Working with Thomas Serre at Brown University, I have studied neural mechanisms that support our ability to segment objects in complex scenes, and how models for these mechanisms can solve challenges to artificial vision.

Consider the above image, which poses a simple question about scenes of varying complexity: do two dots fall on the same object or on different objects? We answer such questions with a glance or two, whereas current artificial vision systems struggle with this task as scene complexity increases (left vs. middle columns). This is because artificial vision lacks many of the visual routines that we depend on to reason efficiently and accurately about the world. The foundations of biological vision are therefore a roadmap for building better artificial vision. Building on this insight, I pair real-world and synthetic datasets to systematically probe the limitations of artificial vs. biological vision.
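To make the task concrete, here is a minimal sketch of how the ground-truth answer could be computed for a synthetic scene. It assumes the scene is given as a binary mask and that an "object" is a 4-connected foreground region; the function name `same_object` and the toy `scene` are hypothetical and only for illustration. This shows how a label might be defined, not how a vision model solves the task.

```python
from collections import deque

def same_object(mask, dot_a, dot_b):
    """Return True if both dots lie on the same 4-connected foreground region.

    mask is a list of rows of 0/1 values; dots are (row, col) tuples.
    """
    rows, cols = len(mask), len(mask[0])
    (ra, ca), (rb, cb) = dot_a, dot_b
    # A dot that falls on background belongs to no object.
    if not mask[ra][ca] or not mask[rb][cb]:
        return False
    # Breadth-first flood fill outward from the first dot.
    seen = {(ra, ca)}
    queue = deque([(ra, ca)])
    while queue:
        r, c = queue.popleft()
        if (r, c) == (rb, cb):
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and mask[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False

# Toy scene with two separate objects (hypothetical example data).
scene = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
same = same_object(scene, (0, 0), (2, 1))       # both dots on the left object
different = same_object(scene, (0, 0), (3, 4))  # dots on different objects
```

The flood fill has to trace the whole object between the two dots, which hints at why the task becomes harder as objects get longer and scenes get more cluttered.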

Neural circuits for machine vision

We hypothesized that a canonical visual circuit, which implements intra-areal interactions between neurons, is key for segmenting/grouping and maintaining constant object percepts in primate vision.

In initial theoretical work [1], we demonstrated that a dynamical neural field model of this circuit explained both physiological and behavioral data describing contextual illusions in diverse visual domains — from the orientation tilt illusion to color and motion induction.
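As a rough illustration of how intra-areal interactions can make a unit's response depend on its context, the sketch below simulates a toy rate model with subtractive surround inhibition. It is not the model from [1]: the function `simulate`, the 1D layout, and the parameters `w_inh`, `radius`, `dt`, and `steps` are all assumptions chosen for brevity.

```python
def simulate(inputs, w_inh=0.5, radius=2, dt=0.1, steps=200):
    """Euler-integrate dr/dt = -r + max(0, input - w_inh * mean(neighbor rates)).

    inputs: feedforward drive to each unit in a 1D row (hypothetical setup).
    Returns the firing rates after `steps` integration steps.
    """
    n = len(inputs)
    rates = [0.0] * n
    for _ in range(steps):
        updated = []
        for i in range(n):
            # Mean activity of nearby units, excluding the unit itself.
            lo, hi = max(0, i - radius), min(n, i + radius + 1)
            neighbors = [rates[j] for j in range(lo, hi) if j != i]
            surround = sum(neighbors) / len(neighbors) if neighbors else 0.0
            # Rectified drive: feedforward input minus surround inhibition.
            drive = max(0.0, inputs[i] - w_inh * surround)
            updated.append(rates[i] + dt * (-rates[i] + drive))
        rates = updated
    return rates

# Same center input, with and without an active surround.
isolated = simulate([0, 0, 1, 0, 0])
surrounded = simulate([1, 1, 1, 1, 1])
center_isolated, center_surrounded = isolated[2], surrounded[2]
```

With these assumed settings the center unit settles near 1 when stimulated alone and near 2/3 when its neighbors are also driven, so an identical input produces a different response depending on context.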

I next extended this circuit into a machine learning module, making it possible to learn its dynamical parameters and synaptic connections from large datasets of images or neural data [2]. Through fitting circuit parameters to natural images, the model learned synaptic connections that resemble those thought to underlie “gestalt”-based perceptual grouping, and was able to solve simple segmentation tasks despite having orders-of-magnitude fewer free parameters than state-of-the-art convolutional neural network (CNN) artificial vision systems.

I have since revised this model by adding inter-areal dynamic connections, which span regions of the visual hierarchy, to the circuit. In one project using this new model I showed that intra- vs. inter-areal connections serve complementary roles for segmenting complex natural scenes, and that a model containing both explains visual perception better than dominant feedforward CNN models of the visual system [3].

In another project I described how both types of connections help biological and artificial vision more efficiently learn how to segment the world. An artificial vision system with these connections also shows contextual illusions, suggesting that such illusions are not a computational “bug” of the visual system, but instead a feature of neural biases for segmenting visual scenes [4].

Separately, I demonstrated how a modified version of the circuit can implement spatial- and feature-based visual attention, which promotes representations of object features that are more similar to those of human observers, and more generalizable, than the representations of state-of-the-art CNNs [5].

We have now discovered how to extend these circuit models to complex scene segmentation tasks, where they learn visual routines that resemble classic theory from cognitive science, without being constrained to do so, and outperform state-of-the-art approaches that rely on “feedforward” processing [6].

Computational, systems, and cognitive neuroscience

A popular approach in visual neuroscience is to compare the responses of artificial vision models, like CNNs, to neural and behavioral data. However, dominant models of the visual system have little in common with biology, and it is reasonable to expect that including mechanisms similar to those used by biological vision will improve model explanations of neural data.

We have preliminary evidence that CNN models of visual system recordings from retina/V1/V4/IT are better predictors of neural activity when they incorporate recurrent intra-areal connections.

I have used extensions of my circuit model to build novel tools for analyzing neuroscience data. These include a computer vision model for predicting neurons’ health from their morphology for drug discovery [8], and an algorithm for deconvolving neuron action potentials from calcium imaging data [7].
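For readers unfamiliar with the deconvolution problem, the sketch below shows a classical baseline rather than the learned approach referenced above. It assumes calcium follows a first-order autoregressive model, c[t] = decay * c[t-1] + s[t], so spikes can be recovered by inverting that recursion. The function names and the `decay` value are hypothetical, and the example is noiseless, which real imaging data is not.

```python
def simulate_calcium(spikes, decay=0.9):
    """Generate a noiseless calcium trace from spike counts under an AR(1) model."""
    trace, calcium = [], 0.0
    for s in spikes:
        calcium = decay * calcium + s  # each spike adds a jump that then decays
        trace.append(calcium)
    return trace

def deconvolve_ar1(trace, decay=0.9):
    """Invert the AR(1) model: s[t] = c[t] - decay * c[t-1], clipped at zero."""
    spikes, previous = [], 0.0
    for c in trace:
        spikes.append(max(0.0, c - decay * previous))
        previous = c
    return spikes

# Hypothetical spike train used only to exercise the two functions.
true_spikes = [0, 1, 0, 0, 2, 0, 1]
trace = simulate_calcium(true_spikes)
recovered = deconvolve_ar1(trace)
```

In this noiseless setting the recovered spikes match the true ones up to floating-point error. With measurement noise and an unknown decay constant the simple inversion breaks down, which is the motivation for more robust approaches, including learned ones.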

I am also using an extension of the same circuit model to reconstruct connectomes of mouse retina (David Berson) and identify retinal circuits that support low-level visual constancies [9].

References

[1] Mély D, Linsley D, & Serre T. 2018. Opponent surrounds explain diversity of contextual phenomena across visual modalities. Psychological Review.

[2] Linsley D, Kim J, Veerabadran V, & Serre T. 2018. Learning long-range spatial dependencies with horizontal gated-recurrent units. NeurIPS, 2018.

[3] Linsley D*, Kim J* & Serre T. 2019. Disentangling neural mechanisms for perceptual grouping. ArXiv.

[4] Kim J*, Linsley D*, & Serre T. 2019. Sample-efficient image segmentation through recurrence. ArXiv.

[5] Linsley D, Scheibler D, Eberhardt S, & Serre T. 2018. Learning what and where to attend. ICLR, 2019.

[6] Linsley D*, Ashok A*, Govindarajan L*, Liu R, & Serre T. 2020. Stable and expressive recurrent vision models. NeurIPS (Spotlight), 2020.

[7] Linsley D, … & Serre T. 2018. Learning to predict action potentials end-to-end from calcium imaging data. IEEE Conference on Information Sciences and Systems.

[8] Fang M, Markmiller S, Vu A, Javaherian A, Dowdle W, Jolivet P, Bushway P, Castello N, Baral A, Chan M, Linsley J, Linsley D, Mercola M, Finkbeiner S, Lecuyer E, Lewcock J & Yeo G. 2019. Small molecule modulation of TDP-43 recruitment to stress granules prevents persistent TDP-43 accumulation in ALS/FTD. Neuron.

[9] Linsley D*, Kim J* & Serre T. 2018. Robust neural circuit reconstruction from serial electron microscopy with convolutional recurrent networks. ArXiv.

Drew Linsley
