Analysis of eye movement
[Figure: Eye movements during the first 2 seconds of viewing (Yarbus, 1967)]
During the 1960s, technical development permitted the continuous registration of eye movement during reading,[9] in picture viewing,[10] later in visual problem solving,[11] and, when headset-cameras became available, also during driving.[12]
The picture to the left shows what may happen during the first two seconds of visual inspection. While the background is out of focus, representing peripheral vision, the first eye movement goes to the man's boots (simply because they are very near the starting fixation and have reasonable contrast).
The following fixations jump from face to face. They might even permit comparisons between faces.
It may be concluded that the face is a very attractive search target within the peripheral field of vision; foveal vision then adds detailed information to this peripheral first impression.
It can also be noted that there are three different types of eye movements: vergence movements, saccadic movements and pursuit movements. Vergence movements involve the cooperation of both eyes so that an image falls on the same area of both retinas, producing a single focused image. Saccadic movements are the rapid jumps used to scan a particular scene or image. Lastly, pursuit movements are used to follow objects in motion.[13]
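These movement types differ sharply in angular velocity, which is how eye-tracking software commonly distinguishes them. The following minimal sketch illustrates a simple velocity-threshold classifier; the thresholds and sample velocities are illustrative assumptions rather than values from the cited literature, and vergence is omitted because detecting it requires comparing both eyes.

```python
# Minimal velocity-threshold classifier for gaze samples (I-VT style).
# Thresholds and sample velocities are illustrative assumptions, not
# empirical values; vergence detection would need binocular data.

SACCADE_MIN_DEG_S = 100.0  # saccades are rapid, ballistic jumps
PURSUIT_MIN_DEG_S = 5.0    # smooth pursuit tracks slowly moving targets

def classify(velocity_deg_s: float) -> str:
    """Label one gaze sample by its angular velocity."""
    if velocity_deg_s >= SACCADE_MIN_DEG_S:
        return "saccade"
    if velocity_deg_s >= PURSUIT_MIN_DEG_S:
        return "pursuit"
    return "fixation"

# Hypothetical angular velocities (degrees/second) over time:
samples = [2.0, 450.0, 3.0, 20.0, 15.0, 1.0]
print([classify(v) for v in samples])
# -> ['fixation', 'saccade', 'fixation', 'pursuit', 'pursuit', 'fixation']
```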
Face and object recognition
There is considerable evidence that face and object recognition are accomplished by distinct systems. For example, prosopagnosic patients show deficits in face, but not object processing, while object agnosic patients (most notably, patient C.K.) show deficits in object processing with spared face processing.[14] Behaviorally, it has been shown that faces, but not objects, are subject to inversion effects, leading to the claim that faces are "special."[14][15] Further, face and object processing recruit distinct neural systems.[16] Notably, some have argued that the apparent specialization of the human brain for face processing does not reflect true domain specificity, but rather a more general process of expert-level discrimination within a given class of stimulus,[17] though this latter claim is the subject of substantial debate.
The cognitive and computational approaches
The major problem with the Gestalt laws (and the Gestalt school generally) is that they are descriptive, not explanatory. For example, one cannot explain how humans see continuous contours simply by stating that the brain "prefers good continuity". Computational models of vision have had more success in explaining visual phenomena and have largely superseded Gestalt theory. More recently, computational models of visual perception have been developed for virtual reality systems; these are closer to real-life situations, as they account for the motion and activity that are prevalent in the real world.[18] Regarding the Gestalt influence on the study of visual perception, Bruce, Green & Georgeson conclude:
"The physiological theory of the Gestaltists has fallen by the wayside, leaving us with a set of descriptive principles, but without a model of perceptual processing. Indeed, some of their 'laws' of perceptual organisation today sound vague and inadequate. What is meant by a 'good' or 'simple' shape, for example?"[19]
In the 1970s, David Marr developed a multi-level theory of vision, which analyzed the process of vision at different levels of abstraction. In order to focus on the understanding of specific problems in vision, he identified three levels of analysis: the computational, algorithmic and implementational levels. Many vision scientists, including Tomaso Poggio, have embraced these levels of analysis and employed them to further characterize vision from a computational perspective.
The computational level addresses, at a high level of abstraction, the problems that the visual system must overcome. The algorithmic level attempts to identify the strategy that may be used to solve these problems. Finally, the implementational level attempts to explain how solutions to these problems are realized in neural circuitry.
Marr suggested that it is possible to investigate vision at any of these levels independently. Marr described vision as proceeding from a two-dimensional visual array (on the retina) to a three-dimensional description of the world as output. His stages of vision include:
- A 2D or primal sketch of the scene, based on feature extraction of fundamental components of the scene, including edges, regions, etc. Note the similarity in concept to a pencil sketch drawn quickly by an artist as an impression. (A minimal edge-extraction example in code follows this list.)
- A 2½D sketch of the scene, where textures are acknowledged, etc. Note the similarity in concept to the stage in drawing where an artist highlights or shades areas of a scene to provide depth.
- A 3D model, where the scene is visualized in a continuous, 3-dimensional map.[20]
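To make the primal-sketch stage concrete, the toy sketch below extracts an edge map from a grayscale image using Sobel gradient filters. This illustrates edge extraction in general, not Marr's actual algorithm; the kernels are standard Sobel operators, while the image and threshold are invented for the example. The comments also show how Marr's computational and algorithmic levels separate for this problem.

```python
import numpy as np

# Computational level: find locations of sharp intensity change (edges).
# Algorithmic level: convolve with Sobel kernels and threshold the
# gradient magnitude. (A toy stand-in for a primal sketch, not Marr's
# actual algorithm; image and threshold are invented for the example.)

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def edge_map(image: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Return a binary edge map (the 1-pixel border is left empty)."""
    h, w = image.shape
    edges = np.zeros((h, w), dtype=bool)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = image[y - 1:y + 2, x - 1:x + 2]
            gx = np.sum(SOBEL_X * patch)  # horizontal gradient
            gy = np.sum(SOBEL_Y * patch)  # vertical gradient
            edges[y, x] = np.hypot(gx, gy) > threshold
    return edges

# Hypothetical image: dark left half, bright right half -> one vertical edge.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
print(edge_map(img).astype(int))
```

Under Marr's scheme, the implementational level would then ask how such gradient operators might be realized by receptive fields in neural circuitry.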
Transduction
Transduction is the process through which energy from environmental stimuli is converted to neural activity for the brain to understand and process. The back of the eye contains three different cell layers: the photoreceptor layer, the bipolar cell layer and the ganglion cell layer. The photoreceptor layer is at the very back and contains rod photoreceptors and cone photoreceptors. Cones are responsible for color perception, and there are three different types: red, green and blue. Photoreceptors contain special chemicals called photopigments, which are embedded in the membrane of the lamellae; a single human rod contains approximately 10 million of them. Each photopigment molecule consists of two parts: an opsin (a protein) and retinal (a lipid).[21] There are three specific photopigments (each with its own color sensitivity) that respond to specific wavelengths of light. When the appropriate wavelength of light hits the photoreceptor, its photopigment splits into two, which sends a message to the bipolar cell layer, which in turn sends a message to the ganglion cells, which then send the information through the optic nerve to the brain. If the appropriate photopigment is not in the proper photoreceptor (for example, a green photopigment inside a red cone), a condition called color vision deficiency will occur.[22]
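The wavelength selectivity of the three photopigments can be illustrated with a toy tuning-curve model. In the sketch below, the peak wavelengths are rough textbook approximations, and the Gaussian curve shape and bandwidth are simplifying assumptions, not measured absorption spectra.

```python
import math

# Toy tuning-curve model of the three cone photopigments.
# Peak wavelengths are rough textbook approximations; the Gaussian
# shape and bandwidth are simplifying assumptions, not measured spectra.
CONE_PEAK_NM = {"blue": 420.0, "green": 534.0, "red": 564.0}
BANDWIDTH_NM = 50.0  # assumed tuning width

def cone_responses(wavelength_nm: float) -> dict:
    """Relative response of each cone type to monochromatic light."""
    return {
        cone: math.exp(-((wavelength_nm - peak) / BANDWIDTH_NM) ** 2)
        for cone, peak in CONE_PEAK_NM.items()
    }

# Light at 550 nm drives the green and red cones strongly, the blue barely:
for cone, response in cone_responses(550.0).items():
    print(f"{cone}: {response:.3f}")
```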
Opponent process
Transduction involves chemical messages sent from the photoreceptors to the bipolar cells to the ganglion cells. Several photoreceptors may send their information to one ganglion cell. There are two types of opponent ganglion cells: red/green and yellow/blue. These neurons fire consistently, even when not stimulated; the brain interprets different colors (and, with enough information, an image) when the rate of firing of these neurons changes. Red light stimulates the red cone, which in turn stimulates the red/green ganglion cell. Likewise, green light stimulates the green cone, which also signals the red/green ganglion cell, and blue light stimulates the blue cone, which signals the yellow/blue ganglion cell. The rate of firing of a ganglion cell is increased when it is signaled by one cone and decreased (inhibited) when it is signaled by the other: the first color in the name of the ganglion cell is the color that excites it, and the second is the color that inhibits it. For example, a red cone excites the red/green ganglion cell, while a green cone inhibits it. This is an opponent process. If the rate of firing of a red/green ganglion cell increases, the brain knows that the light is red; if the rate decreases, the brain knows that the light is green.[22]
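The excitation/inhibition arithmetic described above can be written out directly. In the hedged sketch below, the baseline firing rate and gain are invented for illustration; the point is only that the sign of the deviation from baseline carries the color signal.

```python
# Toy opponent-process channel. Baseline rate and gain are invented
# for illustration; only the sign of the deviation from baseline matters.

BASELINE_RATE = 50.0  # spikes/s; opponent cells fire even unstimulated
GAIN = 30.0           # illustrative scaling of cone input

def red_green_rate(red_cone: float, green_cone: float) -> float:
    """Firing rate of a red/green opponent ganglion cell.

    Red cone input excites the cell, green cone input inhibits it;
    inputs are normalized cone activations in [0, 1]."""
    return BASELINE_RATE + GAIN * (red_cone - green_cone)

def decode(rate: float) -> str:
    """How the brain could read the rate relative to baseline."""
    if rate > BASELINE_RATE:
        return "red"    # excitation dominates
    if rate < BASELINE_RATE:
        return "green"  # inhibition dominates
    return "neutral"

rate = red_green_rate(1.0, 0.0)
print(rate, decode(rate))  # 80.0 red
rate = red_green_rate(0.0, 1.0)
print(rate, decode(rate))  # 20.0 green
```

A yellow/blue cell could be sketched analogously, with the blue cone driving one sign and the combined red and green cone signal the other.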
Artificial visual perception
Theories and observations of visual perception have been the main source of inspiration for computer vision (also called machine vision, or computational vision). Special hardware structures and software algorithms provide machines with the capability to interpret the images coming from a camera or a sensor. Artificial visual perception has long been used in industry and is now entering the domains of automotive and robotics.[23][24]
See also