5.6 The Gestalt Principles of Perception

Learning objectives.

By the end of this section, you will be able to:

  • Explain the figure-ground relationship
  • Define Gestalt principles of grouping
  • Describe how perceptual set is influenced by an individual’s characteristics and mental state

   In the early part of the 20th century, Max Wertheimer published a paper demonstrating that individuals perceived motion in rapidly flickering static images—an insight that came to him as he used a child’s toy tachistoscope. Wertheimer, and his assistants Wolfgang Köhler and Kurt Koffka, who later became his partners, believed that perception involved more than simply combining sensory stimuli. This belief led to a new movement within the field of psychology known as Gestalt psychology. The word gestalt literally means form or pattern, but its use reflects the idea that the whole is different from the sum of its parts. In other words, the brain creates a perception that is more than simply the sum of available sensory inputs, and it does so in predictable ways. Gestalt psychologists translated these predictable ways into principles by which we organize sensory information. As a result, Gestalt psychology has been extremely influential in the area of sensation and perception (Rock & Palmer, 1990).

Gestalt perspectives in psychology use ambiguous stimuli to determine where and how the brain resolves these ambiguities. They also aim to understand sensation and perception as the processing of information in groups or wholes, rather than as wholes constructed from many small parts. This perspective has been supported by modern cognitive science through fMRI research demonstrating that some parts of the brain, specifically the lateral occipital lobe and the fusiform gyrus, are involved in processing whole objects, as opposed to the primary occipital areas that process the individual elements of stimuli (Kubilius, Wagemans & Op de Beeck, 2011).

One Gestalt principle is the figure-ground relationship. According to this principle, we tend to segment our visual world into figure and ground. Figure is the object or person that is the focus of the visual field, while the ground is the background. As the figure below shows, our perception can vary tremendously, depending on what is perceived as figure and what is perceived as ground. Presumably, our ability to interpret sensory information depends on what we label as figure and what we label as ground in any particular case, although this assumption has been called into question (Peterson & Gibson, 1994; Vecera & O’Reilly, 1998).

An illustration shows two identical black face-like shapes that face towards one another, and one white vase-like shape that occupies all of the space in between them. Depending on which part of the illustration is focused on, either the black shapes or the white shape may appear to be the object of the illustration, leaving the other(s) perceived as negative space.

The concept of figure-ground relationship explains why this image can be perceived either as a vase or as a pair of faces.

   Another Gestalt principle for organizing sensory stimuli into meaningful perception is proximity. This principle asserts that things that are close to one another tend to be grouped together, as the figure below illustrates.

The Gestalt principle of proximity suggests that you see (a) one block of dots on the left side and (b) three columns on the right side.

   How we read something provides another illustration of the proximity concept. For example, we read this sentence like this, notl iket hiso rt hat. We group the letters of a given word together because there are no spaces between the letters, and we perceive words because there are spaces between each word. Here are some more examples: Cany oum akes enseo ft hiss entence? What doth es e wor dsmea n?

We might also use the principle of similarity to group things in our visual fields. According to this principle, things that are alike tend to be grouped together (figure below). For example, when watching a football game, we tend to group individuals based on the colors of their uniforms. When watching an offensive drive, we can get a sense of the two teams simply by grouping along this dimension.

When looking at this array of dots, we likely perceive alternating rows of colors. We are grouping these dots according to the principle of similarity.

   Two additional Gestalt principles are the law of continuity (or good continuation) and closure. The law of continuity suggests that we are more likely to perceive continuous, smooth flowing lines rather than jagged, broken lines (figure below). The principle of closure states that we organize our perceptions into complete objects rather than as a series of parts (figure below).

Good continuation would suggest that we are more likely to perceive this as two overlapping lines, rather than four lines meeting in the center.

Closure suggests that we will perceive a complete circle and rectangle rather than a series of segments.

   According to Gestalt theorists, pattern perception, or our ability to discriminate among different figures and shapes, occurs by following the principles described above. You probably feel fairly certain that your perception accurately matches the real world, but this is not always the case. Our perceptions are based on perceptual hypotheses: educated guesses that we make while interpreting sensory information. These hypotheses are informed by a number of factors, including our personalities, experiences, and expectations. We use these hypotheses to generate our perceptual set. For instance, research has demonstrated that those who are given verbal priming produce a biased interpretation of complex ambiguous figures (Goolkasian & Woodbury, 2010).

Template Approach

Ulric Neisser (1967), author of one of the first cognitive psychology textbooks, suggested that pattern recognition would be greatly simplified if all the patterns we experienced were identical. According to this view, it would be easier for us to recognize something if it matched exactly what we had perceived before. Obviously, the real environment is infinitely dynamic, producing countless combinations of orientation and size. So how is it that we can still read a letter g whether it is capitalized, lowercase, or written in someone else's handwriting? Neisser suggested that we categorize information by way of the brain creating mental templates: stored models of all possible categorizable patterns (Radvansky & Ashcraft, 2014). When a computer reads your debit card information, it compares the numbers you enter to a template of what that information should look like (a specific number of digits, no letters or symbols, and so on). The template view of perception can easily explain how we recognize pieces of our environment, but it cannot explain why we are still able to recognize things when they are not viewed from the same angle, distance, or in the same context.
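To make the template idea concrete, here is a minimal sketch in Python (the 3x3 letter grids and the matching score are hypothetical illustrations, not part of Neisser's account): an input is recognized by comparing it cell by cell against stored templates, so an exact copy is recognized easily while a slightly shifted copy of the same letter scores poorly.

```python
# A minimal sketch of the template view of recognition (hypothetical letter grids,
# not from the text): an input matches whichever stored template it agrees with most.
import numpy as np

TEMPLATES = {
    "T": np.array([[1, 1, 1],
                   [0, 1, 0],
                   [0, 1, 0]]),
    "L": np.array([[1, 0, 0],
                   [1, 0, 0],
                   [1, 1, 1]]),
}

def match_template(pattern):
    # Score each stored template by the fraction of cells that agree with the input.
    scores = {label: float(np.mean(tpl == pattern)) for label, tpl in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

perfect_T = TEMPLATES["T"].copy()
shifted_T = np.roll(perfect_T, 1, axis=1)   # the same letter, shifted by one column

print(match_template(perfect_T))  # ('T', 1.0): an exact copy is recognized easily
print(match_template(shifted_T))  # a much lower score: templates struggle with
                                  # changes in position, size, or orientation
```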

In order to address the shortfalls of the template model of perception, the feature detection approach to visual perception suggests that we recognize specific features of what we are looking at, for example the straight lines in an H versus the curved line of a letter C. Rather than matching an entire template-like pattern for the capital letter H, we identify the elemental features that are present in the H. Several researchers have proposed theories of feature-based pattern recognition, one of which was described by Selfridge (1959) and is known as the pandemonium model. It suggests that the information being perceived is processed through various stages by what Selfridge described as mental demons, who shout out loud as they attempt to identify patterns in the stimuli. Pattern demons sit at the lowest level of perception; after they identify patterns, computational demons further analyze the features, matching them to templates such as straight or curved lines. Finally, at the highest level of discrimination, cognitive demons allow stimuli to be categorized in terms of context and other higher-order classifications, and the decision demon decides, among all the shouting demons, which interpretation of the stimulus will be selected.
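A toy pandemonium-style sketch is shown below (the features, letters, and scoring are illustrative assumptions, not Selfridge's actual implementation): lower-level demons report the elementary features they find, cognitive demons "shout" in proportion to how many of their preferred features are present, and a decision demon picks the loudest shout.

```python
# A toy pandemonium-style sketch with made-up features and letters.

# Features each cognitive (letter) demon is listening for (illustrative only).
LETTER_FEATURES = {
    "H": {"vertical_line", "horizontal_line"},
    "C": {"open_curve"},
    "O": {"closed_curve"},
}

def computational_demons(stimulus):
    # In a fuller model these would analyze an image; here the stimulus is
    # already described as a set of elementary features.
    return set(stimulus)

def cognitive_demons(found_features):
    # Each letter demon shouts with a loudness equal to the fraction of its
    # preferred features that were found in the input.
    return {letter: len(feats & found_features) / len(feats)
            for letter, feats in LETTER_FEATURES.items()}

def decision_demon(shouts):
    # The decision demon simply selects the loudest shout.
    return max(shouts, key=shouts.get)

stimulus = {"vertical_line", "horizontal_line"}      # an H-like input
shouts = cognitive_demons(computational_demons(stimulus))
print(shouts, "->", decision_demon(shouts))          # "H" shouts loudest
```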


Selfridge’s pandemonium model, showing the various levels of demons that make estimations and pass the information on to the next level before the decision demon makes the best estimation as to what the stimulus is. Adapted from Lindsay and Norman (1972).

Although Selfridge's idea of layers of shouting demons may seem a fanciful way to describe how we discriminate features of our environment, the model actually incorporates several ideas that are important for pattern recognition. First, at its foundation, this model is a feature detection model that incorporates higher levels of processing as the information is processed over time. Second, the many different shouting demons of Selfridge's model incorporate the idea of parallel processing, suggesting that many different forms of stimuli can be analyzed and processed, at least to some extent, at the same time. Third and finally, the model suggests that perception, in a very real sense, is a series of problem-solving procedures in which we take bits of information and piece them together to create something we are able to recognize and classify as meaningful.

In addition to sounding initially improbable because it is based on a series of shouting fictional demons, one of the main critiques of Selfridge's demon model of feature detection is that it is primarily a bottom-up, or data-driven, processing system. This means that the feature detection and processing used for discrimination all come from what we extract from the environment. Modern progress in cognitive science has argued against strictly bottom-up processing models, suggesting that context plays an extremely important role in determining what you are perceiving and in discriminating between stimuli. To build on previous models, cognitive scientists proposed an additional top-down, or conceptually driven, account in which higher-level knowledge, such as the context in which something tends to occur or a person's expectations, influences lower-level processes.
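The difference between purely bottom-up evidence and a top-down contribution from context can be sketched as follows (the lexicon, the evidence values, and the context weighting are hypothetical): feature evidence alone leaves an ambiguous character undecided, but top-down knowledge that only one completion forms a real word settles the interpretation.

```python
# Bottom-up: feature analysis of the ambiguous second character of "W?RK" gives
# nearly equal evidence for "O" and "Q" (illustrative numbers).
feature_evidence = {"O": 0.51, "Q": 0.49}

# Top-down: higher-level knowledge of which completed strings are actual words.
lexicon = {"WORK"}

def recognize(before, after):
    scores = {}
    for letter, bottom_up in feature_evidence.items():
        candidate = before + letter + after
        top_down = 1.0 if candidate in lexicon else 0.1   # context support vs. penalty
        scores[letter] = bottom_up * top_down
    best = max(scores, key=scores.get)
    return best, scores

print(recognize("W", "RK"))   # "O" wins easily once context is taken into account
```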

Finally, the most modern theories that attempt to describe how information is processed for our perception and discrimination are known as connectionist models. Connectionist models incorporate an enormous number of mathematical computations that work in parallel across a series of interrelated, web-like structures, using top-down and bottom-up processes to narrow down the most probable solution for the discrimination. Each unit in a connectionist layer is massively connected in a giant web with many or all of the units in the next layer of discrimination. Within these models, even if there are not many features present in the stimulus, the number of computations in a single run for discrimination becomes incredibly large because of all the connections that exist between each unit and layer.
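A very small connectionist-style sketch follows (the units, weights, and update rule are illustrative assumptions rather than any published network): letter and word units are connected in both directions, all units are updated in parallel on each iteration, and the bottom-up and top-down influences settle on the most probable interpretation.

```python
# A tiny connectionist sketch with bidirectional (bottom-up and top-down) connections.
import numpy as np

letters = ["A", "E"]          # competing interpretations of one ambiguous letter slot
words   = ["CAT", "CET"]      # word units ("CET" is a non-word, given weak support)

# Connection weights between letter units and word units (rows: letters, cols: words).
W = np.array([[1.0, 0.0],     # "A" supports "CAT"
              [0.0, 1.0]])    # "E" supports "CET"

letter_act = np.array([0.50, 0.50])   # ambiguous bottom-up input
word_act   = np.array([0.60, 0.10])   # contextual support: "CAT" is a real word

for _ in range(20):
    # Parallel update: a bottom-up pass (letters drive words) and a top-down pass
    # (words feed activation back to the letters consistent with them).
    word_input   = letter_act @ W + word_act
    letter_input = W @ word_act + letter_act
    word_act   = word_input / word_input.sum()     # simple normalization
    letter_act = letter_input / letter_input.sum()

print("letter activations:", dict(zip(letters, letter_act.round(2))))
print("word activations:  ", dict(zip(words, word_act.round(2))))
# The ambiguous letter settles toward "A" because top-down support for "CAT" feeds back.
```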

The Depths of Perception: Bias, Prejudice, and Cultural Factors

   In this chapter, you have learned that perception is a complex process. Built from sensations, but influenced by our own experiences, biases, prejudices, and cultures, perceptions can be very different from person to person. Research suggests that implicit racial prejudice and stereotypes affect perception. For instance, several studies have demonstrated that non-Black participants identify weapons faster and are more likely to identify non-weapons as weapons when the image of the weapon is paired with the image of a Black person (Payne, 2001; Payne, Shimizu, & Jacoby, 2005). Furthermore, White individuals’ decisions to shoot an armed target in a video game are made more quickly when the target is Black (Correll, Park, Judd, & Wittenbrink, 2002; Correll, Urland, & Ito, 2006). This research is important, considering the number of very high-profile cases in the last few decades in which young Black people were killed by people who claimed to believe that the unarmed individuals were armed and/or represented some threat to their personal safety.

Gestalt theorists have been incredibly influential in the areas of sensation and perception. Gestalt principles such as figure-ground relationship, grouping by proximity or similarity, the law of good continuation, and closure are all used to help explain how we organize sensory information. Our perceptions are not infallible, and they can be influenced by bias, prejudice, and other factors.

References:

Openstax Psychology text by Kathryn Dumper, William Jenkins, Arlene Lacombe, Marilyn Lovett and Marion Perlmutter licensed under CC BY v4.0. https://openstax.org/details/books/psychology

Review Questions:

1. According to the principle of ________, objects that occur close to one another tend to be grouped together.

a. similarity

b. good continuation

c. proximity

2. Our tendency to perceive things as complete objects rather than as a series of parts is known as the principle of ________.

d. similarity

3. According to the law of ________, we are more likely to perceive smoothly flowing lines rather than choppy or jagged lines.

4. The main point of focus in a visual display is known as the ________.

b. perceptual set

Critical Thinking Question:

1. The central tenet of Gestalt psychology is that the whole is different from the sum of its parts. What does this mean in the context of perception?

2. Take a look at the following figure. How might you influence whether people see a duck or a rabbit?

A drawing appears to be a duck when viewed horizontally and a rabbit when viewed vertically.

Personal Application Question:

1. Have you ever listened to a song on the radio and sung along only to find out later that you have been singing the wrong lyrics? Once you found the correct lyrics, did your perception of the song change?


Answers to the Critical Thinking Questions

1. This means that perception cannot be understood completely simply by combining the parts. Rather, the relationship that exists among those parts (which would be established according to the principles described in this chapter) is important in organizing and interpreting sensory information into a perceptual set.

2. Playing on their expectations could be used to influence what they were most likely to see. For instance, telling a story about Peter Rabbit and then presenting this image would bias perception along rabbit lines.

closure:  organizing our perceptions into complete objects rather than as a series of parts

figure-ground relationship:  segmenting our visual world into figure and ground

Gestalt psychology:  field of psychology based on the idea that the whole is different from the sum of its parts

good continuation:  (also, continuity) we are more likely to perceive continuous, smooth flowing lines rather than jagged, broken lines

pattern perception:  ability to discriminate among different figures and shapes

perceptual hypothesis:  educated guess used to interpret sensory information

principle of closure:  organize perceptions into complete objects rather than as a series of parts

proximity:  things that are close to one another tend to be grouped together

similarity:  things that are alike tend to be grouped together


Perceptual Set In Psychology: Definition & Examples

Saul McLeod, PhD, and Olivia Guy-Evans, MSc (Simply Psychology)

Perceptual set in psychology refers to a mental predisposition or readiness to perceive stimuli in a particular way based on previous experiences, expectations, beliefs, and context. It influences how we interpret and make sense of sensory information, shaping our perception and understanding of the world.

Perceptual set theory stresses the idea of perception as an active process involving selection, inference, and interpretation (known as top-down processing ).

The concept of perceptual set is important to the active process of perception.  Allport (1955) defined perceptual set as:

“A perceptual bias or predisposition or readiness to perceive particular features of a stimulus.”

Perceptual set is a tendency to perceive or notice some aspects of the available sensory data and ignore others. According to Vernon (1955), perceptual set works in two ways:

  • The perceiver has certain expectations and focuses attention on particular aspects of the sensory data: this Vernon calls the “Selector”.
  • The perceiver knows how to classify, understand, and name selected data and what inferences to draw from it: this Vernon calls the “Interpreter”.

It has been found that a number of variables, or factors, influence perceptual set, and set in turn influences perception. The factors include:

  • Expectations
  • Emotion
  • Motivation
  • Culture

Expectation and Perceptual Set

(a) Bruner and Minturn (1955) illustrated how expectation could influence set by showing participants an ambiguous figure, “13,” set in the context of either letters or numbers.


The physical stimulus “13” is the same in each case but is perceived differently because of the influence of the context in which it appears. We EXPECT to see a letter in the context of other letters of the alphabet, whereas we EXPECT to see numbers in the context of other numbers.

(b) We may fail to notice printing/writing errors for the same reason. For example:

1. “The Cat Sat on the Map and Licked its Whiskers”.


(a) and (b) are examples of interaction between expectation and past experience.

(c) A study by Bugelski and Alampay (1961) using the “rat-man” ambiguous figure also demonstrated the importance of expectation in inducing set. Participants were shown either a series of animal pictures or neutral pictures prior to exposure to the ambiguous picture. They found participants were significantly more likely to perceive the ambiguous picture as a rat if they had had prior exposure to animal pictures.


Motivation / Emotion and Perceptual Set

Allport (1955) has distinguished 6 types of motivational-emotional influence on perception:

(i) bodily needs (e.g., physiological needs); (ii) reward and punishment; (iii) emotional connotation; (iv) individual values; (v) personality; and (vi) the value of objects.

(a) Sanford (1936) deprived participants of food for varying lengths of time, up to 4 hours, and then showed them ambiguous pictures. Participants were more likely to interpret the pictures as something to do with food if they had been deprived of food for a longer period of time.

Similarly, Gilchrist and Nesberg (1952) found that participants who had gone without food for the longest periods were more likely to rate pictures of food as brighter. This effect did not occur with non-food pictures.

(b) A more recent study into the effect of emotion on perception was carried out by Kunst-Wilson and Zajonc (1980). Participants were repeatedly presented with geometric figures, but at levels of exposure too brief to permit recognition.

Then, on each of a series of test trials, participants were presented with a pair of geometric forms, one of which had previously been presented and one of which was brand new. For each pair, participants had to answer two questions: (a) Which of the two had previously been presented? (a recognition test); and (b) Which of the two was most attractive? (a feeling test).

The hypothesis for this study was based on a well-known finding that the more we are exposed to a stimulus, the more familiar we become with it and the more we like it.  Results showed no discrimination on the recognition test – they were completely unable to tell old forms from new ones, but participants could discriminate on the feeling test, as they consistently favored old forms over new ones. Thus information that is unavailable for conscious recognition seems to be available to an unconscious system that is linked to affect and emotion.

Culture and Perceptual Set


Elephant drawing in split-view and top-view perspective. The split elephant drawing was generally preferred by African children and adults.

(a) Deregowski (1972) investigated whether pictures are seen and understood in the same way in different cultures. His findings suggest that perceiving perspective in drawings is in fact a specific cultural skill, which is learned rather than automatic. He found people from several cultures prefer drawings which don’t show perspective, but instead are split so as to show both sides of an object at the same time.

In one study he found a fairly consistent preference among African children and adults for split-type drawings over perspective-drawings. Split type drawings show all the important features of an object which could not normally be seen at once from that perspective. Perspective drawings give just one view of an object. Deregowski argued that this split-style representation is universal and is found in European children before they are taught differently.

(b) Hudson (1960) noted difficulties among South African Bantu workers in interpreting depth cues in pictures. Such cues are important because they convey information about the spatial relationships among the objects in pictures. A person using depth cues will extract a different meaning from a picture than a person not using such cues.

Hudson tested pictorial depth perception by showing participants a picture like the one below. A correct interpretation is that the hunter is trying to spear the antelope, which is nearer to him than the elephant. An incorrect interpretation is that the elephant is nearer and about to be speared. The picture contains two depth cues: overlapping objects and the known size of objects. Questions were asked in the participants’ native language, such as:

What do you see? Which is nearer, the antelope or the elephant? What is the man doing?

The results indicated that both children and adults found it difficult to perceive depth in the pictures.


The cross-cultural studies seem to indicate that history and culture play an important part in how we perceive our environment. Perceptual set is concerned with the active nature of perceptual processes and clearly there may be a difference cross-culturally in the kinds of factors that affect perceptual set and the nature of the effect.

Allport, F. H. (1955). Theories of perception and the concept of structure . New York: Wiley.

Bruner, J. S., & Minturn, A. L. (1955). Perceptual identification and perceptual organisation. Journal of General Psychology, 53, 21-28.

Bugelski, B. R., & Alampay, D. A., (1961). The role of frequency in developing perceptual sets. Canadian Journal of Psychology , 15, 205-211.

Deregowski, J. B., Muldrow, E. S. & Muldrow, W. F. (1972). Pictorial recognition in a remote Ethiopian population. Perception , 1, 417-425.

Gilchrist, J. C., & Nesberg, L. S. (1952). Need and perceptual change in need-related objects. Journal of Experimental Psychology, 44(6).

Hudson, W. (1960). Pictorial depth perception in sub-cultural groups in Africa. Journal of Social Psychology , 52, 183-208.

Kunst-Wilson, W. R., & Zajonc, R. B. (1980). Affective discrimination of stimuli that cannot be recognised. Science, 207, 557-558.

Necker, L. (1832). LXI. Observations on some remarkable optical phenomena seen in Switzerland; and on an optical phenomenon which occurs on viewing a figure of a crystal or geometrical solid . The London and Edinburgh Philosophical Magazine and Journal of Science, 1 (5), 329-337.

Sanford, R. N. (1936). The effect of abstinence from food upon imaginal processes: a preliminary experiment. Journal of Psychology: Interdisciplinary and Applied , 2, 129-136.

Vernon, M. D. (1955). The functions of schemata in perceiving. Psychological Review, 62(3).

Why should people be skeptical when evaluating the accuracy of their perceptual set?

People should be skeptical when evaluating the accuracy of their perceptual set because it can lead to biased and subjective interpretations of reality. It can limit our ability to consider alternative perspectives or recognize new information that challenges our beliefs. Awareness of our perceptual sets and actively questioning them allows for more open-mindedness, critical thinking, and a more accurate understanding of the world.


ORIGINAL RESEARCH article

Perceptions as hypotheses: saccades as experiments.


  • 1 The Wellcome Trust Centre for Neuroimaging, University College London, London, UK
  • 2 Institut de Neurosciences de la Timone, CNRS - Aix-Marseille University, Marseille, France
  • 3 Queensland Institute of Medical Research, Royal Brisbane Hospital, Brisbane, QLD, Australia

If perception corresponds to hypothesis testing ( Gregory, 1980 ), then visual searches might be construed as experiments that generate sensory data. In this work, we explore the idea that saccadic eye movements are optimal experiments, in which data are gathered to test hypotheses or beliefs about how those data are caused. This provides a plausible model of visual search that can be motivated from the basic principles of self-organized behavior: namely, the imperative to minimize the entropy of hidden states of the world and their sensory consequences. This imperative is met if agents sample hidden states of the world efficiently. This efficient sampling of salient information can be derived in a fairly straightforward way, using approximate Bayesian inference and variational free-energy minimization. Simulations of the resulting active inference scheme reproduce sequential eye movements that are reminiscent of empirically observed saccades and provide some counterintuitive insights into the way that sensory evidence is accumulated or assimilated into beliefs about the world.

Introduction

This paper continues our effort to understand action and perception in terms of variational free-energy minimization ( Friston et al., 2006 ). The minimization of free energy is based on the assumption that biological systems or agents maximize the Bayesian evidence for their model of the world through an active sampling of sensory information. In this context, negative free energy provides a proxy for model evidence that is much easier to evaluate than evidence per se . Under some simplifying assumptions, free-energy reduces to the amount of prediction error. This means that minimizing free-energy corresponds to minimizing prediction errors and can be formulated as predictive coding ( Rao and Ballard, 1999 ; Friston, 2005 ). Expressed like this, minimizing free-energy sounds perfectly plausible and fits comfortably with Bayesian treatments of perception ( Knill and Pouget, 2004 ; Yuille and Kersten, 2006 ). However, log model evidence is the complement of self information or surprise in information theory. This means that maximizing evidence corresponds to minimizing surprise; in other words, agents should sample their world to preclude surprises. Despite the explanatory power of predictive coding as a metaphor for perceptual inference in the brain, it leads to a rather paradoxical conclusion: if we are trying to minimize surprise, we should avoid sensory stimulation and retire to a dark and quiet room.

This is the dark room problem and is often raised as a natural objection to the principle of free-energy minimization. In Friston et al. (2012) , we rehearse the problem and its implications in the form of a three-way conversation between a physicist, a philosopher, and an information theorist. The resolution of the dark room problem is fairly simple: prior beliefs render dark rooms surprising. The existence of these beliefs is assured by natural selection, in the sense that agents that did not find dark rooms surprising would stay there indefinitely, until they die of dehydration or loneliness. However, this answer to the darkroom paradox does not tell us very much about the nature or principles that determine the prior beliefs that are essential for survival. In this paper, we consider prior beliefs more formally using information theory and the free-energy formulation and specify exactly what these prior beliefs are optimizing. In brief, we will see that agents engage actively with their sensorium and must be equipped with prior beliefs that salient features of the world will disclose themselves, or be discovered by active sampling. This leads to a natural explanation for exploratory behavior and visual search strategies, of the sort studied in psychology and psychophysics ( Gibson, 1979 ; Itti and Koch, 2001 ; Humphreys et al., 2009 ; Itti and Baldi, 2009 ; Shires et al., 2010 ; Shen et al., 2011 ; Wurtz et al., 2011 ). Crucially, this behavior is an emergent property of minimizing surprise about sensations and their causes. In brief, this requires an agent to select or sample sensations that are predicted and believe that this sampling will minimize uncertainty about those predictions .

The prior beliefs that emerge from this formulation are sensible from a number of perspectives. We will see that they can be regarded as beliefs that sensory information is acquired to minimize uncertainty about its causes. These sorts of beliefs are commonplace in everyday life and scientific investigation. Perhaps the simplest example is a scientific experiment designed to minimize the uncertainty about some hypothetical mechanism or treatment effect ( Daunizeau et al., 2011 ). In other words, we acquire data we believe will provide evidence for (or against) a hypothesis. In a psychological setting, if we regard perception as hypothesis testing ( Gregory, 1980 ), this translates naturally into an active sampling of sensory data to disclose the hidden objects or causes we believe are generating those data. Neurobiologically, this translates to optimal visual search strategies that optimize the salience of sampling; where salience can be defined operationally in terms of minimizing conditional uncertainty about perceptual representations. We will see that prior beliefs about the active sampling of salient features are exactly consistent with the maximization of Bayesian surprise ( Itti and Baldi, 2009 ), optimizing signal detection ( Morgan, 2011 ), the principle of minimum redundancy ( Barlow, 1961 ), and the principle of maximum information transfer ( Linsker, 1990 ; Bialek et al., 2001 ).

From the point of view of the free-energy principle, a more detailed examination of prior beliefs forces us to consider some important distinctions about hidden states of the world and the controlled nature of perceptual inference. In short, free-energy minimization is applied to both action and perception ( Friston, 2010 ) such that behavior, or more simply movement, tries to minimize prediction errors, and thereby fulfill predictions based upon conditional beliefs about the state of the world. However, the uncertainty associated with those conditional beliefs depends upon the way data are sampled; for example, where we direct our gaze or how we palpate a surface. The physical deployment of sensory epithelia is itself a hidden state of the world that has to be inferred. However, these hidden states can be changed by action, which means there is a subset of hidden states over which we have control. These will be referred to as hidden control states or, more simply, hidden controls. The prior beliefs considered below pertain to these hidden controls and dictate how we engage actively with the environment to minimize the uncertainty of our perceptual inferences. Crucially, this means that prior beliefs have to be encoded physically (neuronally), leading to the notion of fictive or counterfactual representations; in other words, what we would infer about the world if we sampled it in a particular way. This leads naturally to the internal representation of prior beliefs about fictive sampling and the emergence of things like intention and salience. Furthermore, counterfactual representations take us beyond predictive coding of current sensations and into prospective coding about our sensory behavior in the future. This prospective coding rests on an internal model of control (control states) that may be an important element of generative models that endow agents with a sense of agency. This is because, unlike action, hidden controls are inferred, which requires a probabilistic representation of control. We will try to illustrate these points using visual search and the optimal control of saccadic eye movements ( Grossberg et al., 1997 ; Itti and Baldi, 2009 ; Srihasam et al., 2009 ), noting that similar principles should apply to active sampling of any sensory inputs. For example, they should apply to motor control when making inferences about objects causing somatosensory sensations ( Gibson, 1979 ).

This paper comprises four sections. In the first, we focus on theoretical aspects and describe how prior beliefs about hidden control states follow from the basic imperatives of self organization ( Ashby, 1947 ). This section uses a general but rather abstract formulation of agents, in terms of the states they can occupy, that enables us to explain action, perception, and control as corollaries of a single principle. The particular focus here will be on prior beliefs about control and how they can be understood in terms of more familiar constructs such as signal detection theory, the principle of maximum mutual information and specific treatments of visual attention such as Bayesian surprise ( Itti and Baldi, 2009 ). Having established the underlying theory, the second section considers neurobiological implementation in terms of predictive coding and recurrent message passing in the brain. This brief section reprises the implicit neural architecture we have described in many previous publications and extends it to include the encoding of prior beliefs in terms of (place coded) saliency maps. The third and fourth sections provide an illustration of the basic ideas using neuronally plausible simulations of visual search and the control of saccadic eye movements. This illustration allows us to understand Bayes-optimal searches in terms of saliency maps and the saltatory accumulation of evidence during perceptual categorization. We conclude with a brief discussion of the theoretical implications of these ideas and how they could be tested empirically.

Action, Perception, and Control

This section establishes the nature of Bayes-optimal inference in the context of controlled sensory searches. It starts with the basic premise that underlies free-energy minimization; namely, the imperative to minimize the dispersion of sensory states and their hidden causes to ensure a homeostasis of the external and internal milieu ( Ashby, 1947 ). It shows briefly how action and perception follow from this imperative and highlights the important role of prior beliefs about the sampling of sensory states.

This section develops the ideas in a rather compact and formal way. Readers who prefer a non-mathematical description could skip to the summary and discussion of the main results at the end of this section. For people familiar with the free-energy formulation, this paper contains an important extension or generalization of the basic account of action and perception: here, we consider not just the minimization of sensory surprise or entropy but the entropy or dispersion of both sensory states and the hidden states that cause them. In brief, this leads to particular prior beliefs about the active sampling of sensory states, which may offer an explanation for the nature of sensory searches.

Notation and Set up

We will use $X: \Omega \times \ldots \to \mathbb{R}$ for real-valued random variables and $x \in X$ for particular values. A probability density will be denoted by $p(x) = \Pr\{X = x\}$ using the usual conventions, and its entropy $H[p(x)]$ by $H(X)$. The tilde notation $\tilde{x} = (x, x', x'', \ldots)$ denotes variables in generalized coordinates of motion ( Friston, 2008 ), where each prime denotes a temporal derivative (using Lagrange's notation). For simplicity, constant terms will be omitted from equalities.

In what follows, we consider free-energy minimization in terms of active inference. Active inference rests on the tuple $(\Omega, \Psi, S, A, R, q, p)$ that comprises the following:

• A sample space $\Omega$, or non-empty set, from which random fluctuations or outcomes $\omega \in \Omega$ are drawn.

• Hidden states $\Psi: \Psi \times A \times \Omega \to \mathbb{R}$ that constitute the dynamics of states of the world that cause sensory states and depend on action.

• Sensory states $S: \Psi \times A \times \Omega \to \mathbb{R}$ that correspond to the agent's sensations and constitute a probabilistic mapping from action and hidden states.

• Action $A: S \times R \to \mathbb{R}$ corresponding to an agent's action, which depends on its sensory and internal states.

• Internal states $R: R \times S \times \Omega \to \mathbb{R}$ that constitute the dynamics of states of the agent that cause action and depend on sensory states.

• Conditional density $q(\tilde{\psi}) := q(\tilde{\psi} \mid \tilde{\mu})$ – an arbitrary probability density function over hidden states $\tilde{\psi} \in \Psi$ that is parameterized by internal states $\tilde{\mu} \in R$.

• Generative density $p(\tilde{s}, \tilde{\psi} \mid m)$ – a probability density function over sensory and hidden states under a generative model denoted by $m$.

We assume that the imperative for any biological system is to minimize the dispersion of its sensory and hidden states, with respect to action ( Ashby, 1947 ). We will refer to the sensory and hidden states collectively as external states S × Ψ. Mathematically, the dispersion of external states corresponds to the (Shannon) entropy of their probability density that, under ergodic assumptions, equals (almost surely) the long-term time average of Gibbs energy:
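In the notation introduced above, this statement can be written schematically as follows (a reconstruction from the surrounding description; the number matches the later reference to Eq. 1):

\[
H(S, \Psi \mid m) \;=\; \Big\langle G\big(\tilde{s}(t), \tilde{\psi}(t)\big) \Big\rangle_t,
\qquad G(\tilde{s}, \tilde{\psi}) = -\ln p(\tilde{s}, \tilde{\psi} \mid m)
\tag{1}
\]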

Gibbs energy $G(\tilde{s}, \tilde{\psi})$ is defined in terms of the generative density or model. Clearly, agents cannot minimize this energy directly because the hidden states are unknown. However, we can decompose the entropy into the entropy of the sensory states (to which the system has access) and the conditional entropy of hidden states (to which the system does not have access):
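In the present notation, this decomposition is the standard chain rule for entropy (written here as Eq. 2 for consistency with the surrounding numbering):

\[
H(S, \Psi) \;=\; H(S) + H(\Psi \mid S)
\tag{2}
\]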

This means that the entropy of the external states can be minimized through action to minimize sensory surprise $-\ln p(\tilde{s}(t) \mid m)$, under the assumption that the consequences of action minimize conditional entropy:
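Spelled out, and numbered to match the reference to Eq. 3 below, the assumption is roughly:

\[
a(t) \;=\; \arg\min_{a} \big\{ -\ln p\big(\tilde{s}(t) \mid m\big) \big\},
\qquad \text{assuming the consequences of action also minimize } H(\Psi \mid S)
\tag{3}
\]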

The consequences of action are expressed by changes in a subset of external states $U \subset \Psi$ that we will call hidden control states or, more simply, hidden controls. When Eq. 3 is satisfied, the variation of the entropy in Eq. 1 with respect to action and its consequences is zero, which means the entropy has been minimized (at least locally). However, the hidden controls cannot be optimized explicitly because they are hidden from the agent. To resolve this problem, we first consider action and then return to optimizing hidden controls post hoc.

Action and Perception

Action cannot minimize sensory surprise directly (Eq. 3) because this would involve an intractable marginalization over hidden states, so surprise is replaced with an upper bound called variational free energy ( Feynman, 1972 ). This free energy is a functional of the conditional density or a function of its parameters and is relatively easy to evaluate. However, replacing surprise with free energy means that internal states also have to minimize free energy, to ensure it is a tight bound on surprise:
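A standard way of writing this bound, consistent with the description that follows (and labeled Eq. 4 to match the later reference), is:

\[
\big(a(t), \tilde{\mu}(t)\big) \;=\; \arg\min_{a,\,\tilde{\mu}} F\big(\tilde{s}(t), \tilde{\mu}\big),
\qquad
F \;=\; \mathbb{E}_q\big[G(\tilde{s}, \tilde{\psi})\big] - H\big[q(\tilde{\psi} \mid \tilde{\mu})\big]
\;=\; -\ln p(\tilde{s} \mid m) + D_{\mathrm{KL}}\big[q(\tilde{\psi} \mid \tilde{\mu}) \,\big\|\, p(\tilde{\psi} \mid \tilde{s}, m)\big]
\tag{4}
\]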

This induces a dual minimization with respect to action and the internal states that parameterize the conditional density. These minimizations correspond to action and perception respectively. In brief, the need for perception is induced by introducing free energy to finesse the evaluation of surprise; where free energy can be evaluated by an agent fairly easily, given a Gibbs energy or a generative model. The last equality says that free energy is always greater than surprise because the second (Kullback–Leibler divergence) term is non-negative. This means that when free energy is minimized with respect to the internal states, free-energy approximates surprise and the conditional density approximates the posterior density over external states:
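In symbols (a reconstruction, labeled Eq. 5 for consistency):

\[
\min_{\tilde{\mu}} F \;\Rightarrow\;
D_{\mathrm{KL}}\big[q(\tilde{\psi} \mid \tilde{\mu}) \,\big\|\, p(\tilde{\psi} \mid \tilde{s}, m)\big] \approx 0
\;\Rightarrow\;
q(\tilde{\psi} \mid \tilde{\mu}) \approx p(\tilde{\psi} \mid \tilde{s}, m),
\qquad F \approx -\ln p(\tilde{s} \mid m)
\tag{5}
\]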

This is known as approximate Bayesian inference, which becomes exact when the conditional and posterior densities have the same form ( Beal, 2003 ). Minimizing free energy also means that the entropy of the conditional density approximates the conditional entropy. This allows us to revisit the optimization of hidden controls, provided we know how they affect the entropy of the conditional density.

The Maximum Entropy Principle and the Laplace Assumption

If we admit an encoding of the conditional density up to second-order moments, then the maximum entropy principle ( Jaynes, 1957 ) implicit in the definition of free energy (Eq. 4) requires $q(\tilde{\psi} \mid \tilde{\mu}) = \mathcal{N}(\tilde{\mu}, \Sigma)$ to be Gaussian. This is because a Gaussian density has the maximum entropy of all forms that can be specified with two moments. Adopting a Gaussian form is known as the Laplace assumption and enables us to express the entropy of the conditional density in terms of its first moment or expectation. This follows because we can minimize free energy with respect to the conditional covariance as follows:
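The result of that minimization, sketched here in the notation above and numbered to match the later reference to Eq. 6, is:

\[
\frac{\partial F}{\partial \Sigma} = 0
\;\Rightarrow\;
\Pi \;=\; \partial_{\tilde{\mu}\tilde{\mu}}\, G(\tilde{s}, \tilde{\mu}),
\qquad
H\big[q(\tilde{\psi} \mid \tilde{\mu})\big] \;=\; \tfrac{1}{2}\ln\big|\Sigma(\tilde{s}, \tilde{\mu})\big| + \mathrm{const},
\qquad \Sigma = \Pi^{-1}
\tag{6}
\]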

Here, the conditional precision $\Pi(\tilde{s}, \tilde{\mu})$ is the inverse of the conditional covariance $\Sigma(\tilde{s}, \tilde{\mu})$. In short, the entropy of the conditional density and free energy are functions of the conditional expectations and sensory states.

Bayes-Optimal Control

We can now optimize the hidden controls vicariously through prior expectations that are fulfilled by action. This optimization can be expressed in terms of prior expectations about hidden controls:
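One way to express this, using $\tilde{\eta}_u$ here as an (assumed) symbol for the prior expectation about hidden controls and numbering the result to match the later reference to Eq. 7, is:

\[
\tilde{\eta}_u(t+\tau) \;=\; \arg\min_{\tilde{u}} H\Big[q\big(\tilde{\psi} \mid \tilde{\mu}_x(t+\tau), \tilde{u}\big)\Big]
\tag{7}
\]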

This equation means the agent expects hidden controls to minimize a counterfactual uncertainty about hidden states. This uncertainty corresponds to the entropy of a fictive or counterfactual density parameterized by conditional expectations about hidden states in the future, $\tilde{\mu}_x(t+\tau)$, that depend on hidden controls. From Eq. 6, minimizing counterfactual uncertainty is equivalent to maximizing the precision of counterfactual beliefs.

Interestingly, Eqs 4 and 7 say that conditional expectations (about hidden states) maximize conditional uncertainty, while prior expectations (about hidden controls) minimize conditional uncertainty. This means the posterior and prior beliefs are in opposition, trying to maximize and minimize uncertainty (entropy) about hidden states respectively. The latter represent prior beliefs that hidden states are sampled to maximize conditional confidence, while the former minimize conditional confidence to ensure the explanation for sensory data does not depend on very precise values of the hidden states – in accord with the maximum entropy principle (or Laplace’s principle of indifference). In what follows, we will refer to the negative entropy of the counterfactual density as salience, noting that salience is a measure of certainty about hidden states that depends on how they are sampled. In other words, salience is the precision of counterfactual beliefs that depend on where or how sensory data are sampled. This means that prior beliefs about hidden controls entail the expectation that salient features will be sampled.

A subtle but important point in this construction is that it optimizes hidden controls without specifying how they depend on action. The agent is not aware of action because action is not inferred or represented. Instead, the agent has prior beliefs about hidden (and benevolent) causes that minimize conditional uncertainty. The agent may infer that these control states are produced by its own movements and thereby infer agency, although this is not necessary: The agent’s generative model must specify how hidden controls affect sensory samples so that action can realize prior beliefs; however, the agent has no model or representation of how action affects hidden controls. This is important because it eschews the inverse motor control problem; namely, working out which actions produce desired hidden controls. We will return to this later.

To recap, we started with the assumption that biological systems seek to minimize the dispersion or entropy of states in their external milieu to ensure a sustainable and homoeostatic exchange with their environment ( Ashby, 1947 ). Clearly, these states are hidden and therefore cannot be measured or changed directly. However, if agents know how their action changes sensations (for example, if they know contracting certain muscles will necessarily excite primary sensory afferents from stretch receptors), then they can minimize the dispersion of their sensory states by countering surprising deviations from expected values. If the uncertainty about hidden states, given sensory states, is small, then the implicit minimization of sensory surprise through action will be sufficient. Minimizing surprise through action is not as straightforward as it might seem, because the evaluation of surprise per se is intractable. This is where free energy comes in – to provide an upper bound that enables agents to minimize free energy instead of surprise. However, in creating the upper bound the agent now has to minimize the difference between surprise and free energy by changing its internal states. This corresponds to perception and makes the conditional density an approximation to the true posterior density ( Helmholtz, 1866/1962 ; Gregory, 1980 ; Ballard et al., 1983 ; Dayan et al., 1995 ; Friston, 2005 ). When the agent has optimized its conditional density, through Bayes-optimal perception, it is now in a position to minimize the uncertainty about hidden states causing sensations. It can do this by engaging action to realize prior beliefs about states which control this uncertainty. In other words, it only has to believe that hidden states of the world will disclose themselves in an efficient way and then action will make these beliefs come true.

For example, if I am sitting in my garden and register some fluttering in the periphery of my vision, then my internal brain states will change to encode the perceptual hypothesis that the sensations were caused by a bird. This minimizes my surprise about the fluttering sensations. On the basis of this hypothesis I will select prior beliefs about the direction of my gaze that will minimize the uncertainty about my hypothesis. These prior beliefs will produce proprioceptive predictions about my oculomotor system and the visual consequences of looking at the bird. Action will fulfill these proprioceptive predictions and cause me to foveate the bird through classical reflex arcs. If my original hypothesis was correct, the visual evidence discovered by my orienting saccade will enable me to confirm the hypothesis with a high degree of conditional certainty. We will pursue this example later using simulations.

Crucially, placing prior beliefs about hidden controls in the perception–action cycle rests upon having a generative model that includes control. In other words, this sort of Bayes-optimal search calls on an internal model of how we sample our environment. Implicit in a model of controlled sampling is a representation or sense of agency, which extends the free-energy formalism in an important way. Note however, this extension follows naturally from the basic premise that the purpose of action and perception is to minimize the joint entropy of hidden world states and their sensory consequences. In this section, we have seen how prior beliefs, that afford important constraints on free energy, can be harnessed to minimize not just the entropy of sensory states but also the hidden states that cause them. This adds extra dependencies between conditional and prior expectations that have to be encoded by internal brain states (see Figure 1 ). We will see later that this leads to a principled exploration of the sensorium, which shares many features with empirical behavior. Before considering the neurobiological implementation of these dependencies, this section concludes by revisiting counterfactual priors to show that they are remarkably consistent with a number of other perspectives:


Figure 1. This schematic shows the dependencies among various quantities that are assumed when modeling the exchanges of a self organizing system like the brain with the environment . The top panel describes the states of the environment and the system or agent in terms of a probabilistic dependency graph, where connections denote directed dependencies. The quantities are described within the nodes of this graph with exemplar forms for their dependencies on other variables (see main text). Here, hidden and internal states are separated by action and sensory states. Both action and internal states encoding a conditional density minimize free energy, while internal states encoding prior beliefs maximize salience. Both free energy and salience are defined in terms of a generative model that is shown as fictive dependency graph in the lower panel. Note that the variables in the real world and the form of their dynamics are different from that assumed by the generative model; this is why external states are in bold. Furthermore, note that action is a state in the model of the brain but is replaced by hidden controls in the brain’s model of its world. This means that the agent is not aware of action but has beliefs about hidden causes in the world that action can fulfill through minimizing free energy. These beliefs correspond to prior expectations that sensory states will be sampled in a way that optimizes conditional confidence or salience.

The Infomax Perspective

Priors about hidden controls express the belief that conditional uncertainty will be minimal. The long-term average of this conditional uncertainty is the conditional entropy of hidden states, which can be expressed as the entropy over hidden states minus the mutual information between hidden and sensory states:
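In symbols, this is the standard information-theoretic identity:

\[
H(\Psi \mid S) \;=\; H(\Psi) - I(\Psi; S)
\]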

In other words, minimizing conditional uncertainty is equivalent to maximizing the mutual information between external states and their sensory consequences. This is one instance of the Infomax principle ( Linsker, 1990 ). Previously, we have considered the relationship between free-energy minimization and the principle of maximum mutual information, or minimum redundancy ( Barlow, 1961 , 1974 ; Optican and Richmond, 1987 ; Oja, 1989 ; Olshausen and Field, 1996 ; Bialek et al., 2001 ) in terms of the mapping between hidden and internal states ( Friston, 2010 ). In this setting, one can show that “the Infomax principle is a special case of the free-energy principle that obtains when we discount uncertainty and represent sensory data with point estimates of their causes.” Here, we consider the mapping between external and sensory states and find that prior beliefs about how sensory states are sampled further endorse the Infomax principle.

The Signal Detection Perspective

A related perspective comes from signal detection theory ( Morgan, 2011 ) and the sensitivity of sensory mappings to external states of the world: For a sensory mapping with additive Gaussian noise (in which sensory precision is not state dependent):
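As a sketch of the result used in the next paragraph, for $\tilde{s} = \tilde{g}(\tilde{\psi}, \tilde{u}) + \tilde{\omega}$ with noise precision $\Pi^{\omega}$, the conditional precision over hidden states scales with the sensitivity of the sensory mapping (up to prior precision terms):

\[
\Pi(\tilde{s}, \tilde{\mu}) \;\approx\; \partial_{\tilde{\mu}}\tilde{g}^{\mathsf{T}}\, \Pi^{\omega}\, \partial_{\tilde{\mu}}\tilde{g} \;+\; \cdots
\]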

This means minimizing conditional uncertainty (as approximated by the entropy of the conditional density) rests on maximizing the signal-to-noise ratio $\partial_{\tilde{\mu}}\tilde{g}^{\mathsf{T}}\, \Pi^{\omega}\, \partial_{\tilde{\mu}}\tilde{g}$. Here, the gradients of the sensory mapping $\partial_{\tilde{\mu}}\tilde{g}$ can be regarded as the sensitivity of the sensory mapping to changes in hidden states, where this sensitivity depends on hidden controls.

There are several interesting points to be made here: first, when the sensory mapping is linear, its gradient is constant and conditional uncertainty does not depend upon hidden controls. In this instance, everything is equally salient and there are no optimal prior beliefs about hidden controls. This has been the simplifying assumption in previous treatments of the free-energy principle, where “the entropy of hidden states is upper-bounded by the entropy of sensations, assuming their sensitivity to hidden states is constant, over the range of states encountered” ( Friston, 2010 ). However, this assumption fails with sensory mappings that are non-linear in hidden controls. Important examples in the visual domain include visual occlusion, direction of gaze and, most simply, the level of illumination. The last example speaks directly to the dark room problem and illustrates its resolution by prior beliefs: if an agent found itself in a dark room, the simplest way to increase the gain or sensitivity of its sensory mapping would be to switch on a light. This action would be induced by prior beliefs that there will be light, provided the agent has a generative model of the proprioceptive and visual consequences of illuminating the room. Note that action is caused by proprioceptive predictions under beliefs about hidden controls (changes in illumination), which means the agent does not have to know or model how its actions change hidden controls.

Finally, although we will not pursue it in this paper, the conditional entropy or salience also depends on how causes affect sensory precision. This is only relevant when sensory precision is state dependent; however, this may be important in the context of attention and salience. We have previously cast attention as optimizing conditional expectations about precision ( Feldman and Friston, 2010 ). In the current context, this optimization will affect salience and subsequent sensory sampling. This will be pursued in another paper.

The Bayesian Surprise Perspective

Bayesian surprise is a measure of salience based on the Kullback–Leibler divergence between the conditional density (which encodes posterior beliefs) and the prior density ( Itti and Baldi, 2009 ). It measures the information in the data that can be recognized. Empirically, humans direct their gaze toward visual features with high Bayesian surprise: “subjects are strongly attracted toward surprising locations, with 72% of all human gaze shifts directed toward locations more surprising than the average, a figure which rises to 84% when considering only gaze targets simultaneously selected by all subjects” ( Itti and Baldi, 2009 ). In the current setup, Bayesian surprise is the cross entropy or divergence between the posterior and priors over hidden states:
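Written out, and consistent with the sentence that follows, this reads:

\[
D_{\mathrm{KL}}\big[q(\tilde{\psi}) \,\big\|\, p(\tilde{\psi})\big]
\;=\;
\underbrace{-\,\mathbb{E}_q\big[\ln p(\tilde{\psi})\big]}_{\text{cross entropy}}
\;\underbrace{-\; H\big[q(\tilde{\psi})\big]}_{\text{salience}}
\]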

If prior beliefs about hidden states are uninformative, the first term is roughly constant. This means that maximizing salience is the same as maximizing Bayesian surprise. This is an important observation because it links salience in the context of active inference with the large literature on salience in the theoretical ( Humphreys et al., 2009 ) and empirical ( Shen et al., 2011 ; Wardak et al., 2011 ) visual sciences; where Bayesian surprise was introduced to explain visual searches in terms of salience.

Minimizing free energy will generally increase Bayesian surprise, because Bayesian surprise is also the complexity cost associated with updating beliefs to explain sensory data more accurately (Friston, 2010). The current arguments suggest that prior beliefs about how we sample the world - to minimize uncertainty about our inferences - maximize Bayesian surprise explicitly. The term Bayesian surprise can be a bit confusing, because minimizing surprise per se (or maximizing model evidence) involves keeping Bayesian surprise (complexity) as small as possible. This apparent paradox is resolved by noting that agents expect Bayesian surprise to be maximized and then act to minimize their surprise, given what they expect.

In summary, the imperative to maximize salience or conditional confidence about the causes of sensations emerges naturally from the basic premise that self-organizing biological systems (like the brain) minimize the dispersion of their external states when subject to an inconstant and fluctuating environment. This imperative, expressed in terms of prior beliefs about hidden controls in the world that are fulfilled by action, is entirely consistent with the principle of maximum information transfer, sensitivity arguments from signal detection theory and formulations of salience in terms of Bayesian surprise. In what follows, we now consider the neurobiological implementation of free-energy minimization through active inference:

Neurobiological Implementation of Active Inference

In this section, we take the general principles above and consider how they might be implemented in the brain. The equations in this section may appear a bit complicated; however, they are based on just four assumptions:

• The brain minimizes the free energy of sensory inputs defined by a generative model.

• This model includes prior expectations about hidden controls that maximize salience.

• The generative model used by the brain is hierarchical, non-linear, and dynamic.

• Neuronal firing rates encode the expected state of the world, under this model.

The first assumption is the free-energy principle, which leads to active inference in the embodied context of action. The second assumption follows from the arguments of the previous section. The third assumption is motivated easily by noting that the world is both dynamic and non-linear and that hierarchical causal structure emerges inevitably from a separation of temporal scales ( Ginzburg and Landau, 1950 ; Haken, 1983 ). Finally, the fourth assumption is the Laplace assumption that, in terms of neural codes, leads to the Laplace code that is arguably the simplest and most flexible of all neural codes ( Friston, 2009 ).

Given these assumptions, one can simulate a whole variety of neuronal processes by specifying the particular equations that constitute the brain’s generative model. The resulting perception and action are specified completely by the above assumptions and can be implemented in a biologically plausible way as described below (see Table 1 for a list of previous applications of this scheme). In brief, these simulations use differential equations that minimize the free energy of sensory input using a generalized (gradient) descent ( Friston et al., 2010b ).


Table 1. Processes and paradigms that have been modeled using the scheme in this paper.

These coupled differential equations describe perception and action respectively and just say that internal brain states and action change in the direction that reduces free energy. The first is known as (generalized) predictive coding and has the same form as Bayesian (e.g., Kalman–Bucy) filters used in time series analysis; see also (Rao and Ballard, 1999). The first term in Eq. 11 is a prediction based upon a differential matrix operator $\mathcal{D}$ that returns the generalized motion of the expectation, such that $\mathcal{D}\tilde{\mu} = [\mu', \mu'', \mu''', \ldots]^{T}$. The second term is usually expressed as a mixture of prediction errors that ensures the changes in conditional expectations are Bayes-optimal predictions about hidden states of the world. The second differential equation says that action also minimizes free energy - noting that free energy depends on action through the sensory states $S: \Psi \times A \times \Omega \to \mathbb{R}$. The differential equations in (11) are coupled because sensory input depends upon action, which depends upon perception through the conditional expectations. This circular dependency leads to a sampling of sensory input that is both predicted and predictable, thereby minimizing free energy and surprise.
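To make this concrete, the following is a minimal sketch (not the paper's scheme or the SPM code) of the coupled descent for a one-dimensional toy model: the prior expectation eta stands in for a salient location, perception updates the expectation mu by gradient descent on a Laplace-encoded free energy, and action moves the true gaze so that proprioception fulfils that expectation. All variable names and parameter values are illustrative.

dt     = 0.01   # integration step (arbitrary units)
pi_p   = 8.0    # precision of proprioceptive input
pi_eta = 4.0    # precision of the prior belief about gaze
eta    = 1.0    # prior expectation about gaze (a hidden control, e.g., a salient location)
kappa  = 1.0    # reflex gain for action

x_true = 0.0    # true gaze location (hidden state of the world)
mu     = 0.0    # conditional expectation (belief about gaze)

for _ in range(2000):
    s_p     = x_true          # proprioceptive signal (noiseless for clarity)
    eps_p   = s_p - mu        # proprioceptive prediction error
    eps_eta = mu - eta        # prior prediction error
    # Perception: gradient descent on free energy with respect to the expectation
    mu += dt * (pi_p * eps_p - pi_eta * eps_eta)
    # Action: a reflex proportional to the free-energy gradient, which here simply
    # nulls the proprioceptive prediction error (a simplification of da/dt = -dF/da)
    a = -kappa * pi_p * eps_p
    x_true += dt * a          # action moves the true gaze

print(f"gaze = {x_true:.3f}  belief = {mu:.3f}  prior = {eta:.3f}")

At the fixed point the gaze, the belief and the prior coincide, illustrating how action realizes prior beliefs about hidden controls.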

To perform neuronal simulations under this framework, it is only necessary to integrate or solve Eq. 11 to simulate the neuronal dynamics that encode conditional expectations and ensuing action. Conditional expectations depend upon the brain’s generative model of the world, which we assume has the following (hierarchical) form

This equation is just a way of writing down a model that specifies a probability density over the sensory and hidden states, where the hidden states Ψ = X × V × U have been divided into hidden dynamic, causal and control states. Here $(g^{(i)}, f^{(i)})$ are non-linear functions of hidden states that generate sensory inputs at the first (lowest) level, where, for notational convenience, $v^{(0)} := s$.

Hidden causes V ⊂ Ψ can be regarded as functions of hidden dynamic states; hereafter, hidden states X ⊂ Ψ. Random fluctuations $(\omega_x^{(i)}, \omega_v^{(i)})$ on the motion of hidden states and causes are conditionally independent and enter each level of the hierarchy. It is these that make the model probabilistic: they play the role of sensory noise at the first level and induce uncertainty about states at higher levels. The (inverse) amplitudes of these random fluctuations are quantified by their precisions $(\Pi_x^{(i)}, \Pi_v^{(i)})$, which we assume to be fixed in this paper. Hidden causes link hierarchical levels, whereas hidden states link dynamics over time. Hidden states and causes are abstract quantities (like the motion of an object in the field of view) that the brain uses to explain or predict sensations. In hierarchical models of this sort, the output of one level acts as an input to the next. This input can produce complicated (generalized) convolutions with deep (hierarchical) structure.
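For concreteness, a generic two-level instance of this hierarchical form (a sketch in standard notation, not a reproduction of the paper's Eq. 12) reads:

$s = g^{(1)}(x^{(1)}, v^{(1)}) + \omega_v^{(1)}$
$\dot{x}^{(1)} = f^{(1)}(x^{(1)}, v^{(1)}) + \omega_x^{(1)}$
$v^{(1)} = g^{(2)}(x^{(2)}, v^{(2)}) + \omega_v^{(2)}$
$\dot{x}^{(2)} = f^{(2)}(x^{(2)}, v^{(2)}) + \omega_x^{(2)}$

Here the hidden causes $v^{(i)}$ link levels (the output of one level is the input to the level below), the hidden states $x^{(i)}$ endow each level with dynamics, and the fluctuations have precisions $(\Pi_x^{(i)}, \Pi_v^{(i)})$.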

Perception and Predictive Coding

Given the form of the generative model (Eq. 12) we can now write down the differential equations (Eq. 11) describing neuronal dynamics in terms of (precision-weighted) prediction errors on the hidden causes and states. These errors represent the difference between conditional expectations and predicted values, under the generative model (using $A \cdot B := A^{T}B$ and omitting higher-order terms):

Equation 13 can be derived fairly easily by computing the free energy for the hierarchical model in Eq. 12 and inserting its gradients into Eq. 11. What we end up with is a relatively simple update scheme, in which conditional expectations are driven by a mixture of prediction errors, where prediction errors are defined by the equations of the generative model.
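Although Eq. 13 is not reproduced here, the prediction errors it refers to typically take the following generic form in generalized predictive coding (a sketch using the notation above):

$\varepsilon_v^{(i)} = \tilde{\mu}_v^{(i-1)} - \tilde{g}^{(i)}(\tilde{\mu}_x^{(i)}, \tilde{\mu}_v^{(i)}), \qquad \xi_v^{(i)} = \Pi_v^{(i)}\,\varepsilon_v^{(i)}$
$\varepsilon_x^{(i)} = \mathcal{D}\tilde{\mu}_x^{(i)} - \tilde{f}^{(i)}(\tilde{\mu}_x^{(i)}, \tilde{\mu}_v^{(i)}), \qquad \xi_x^{(i)} = \Pi_x^{(i)}\,\varepsilon_x^{(i)}$

That is, each error is the difference between a conditional expectation (or its generalized motion) and the corresponding prediction of the generative model, weighted by its precision.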

It is difficult to overstate the generality and importance of Eq. 13: its solutions grandfather nearly every known statistical estimation scheme, under parametric assumptions about additive or multiplicative noise ( Friston, 2008 ). These range from ordinary least squares to advanced variational deconvolution schemes. The resulting scheme is called generalized filtering or predictive coding ( Friston et al., 2010b ). In neural network terms, Eq. 13 says that error units receive predictions from the same level and the level above. Conversely, conditional expectations (encoded by the activity of state units) are driven by prediction errors from the same level and the level below. These constitute bottom-up and lateral messages that drive conditional expectations toward a better prediction to reduce the prediction error in the level below. This is the essence of recurrent message passing between hierarchical levels to optimize free energy or suppress prediction error: see Friston and Kiebel (2009) for a more detailed discussion. In neurobiological implementations of this scheme, the sources of bottom-up prediction errors, in the cortex, are thought to be superficial pyramidal cells that send forward connections to higher cortical areas. Conversely, predictions are conveyed from deep pyramidal cells, by backward connections, to target (polysynaptically) the superficial pyramidal cells encoding prediction error ( Mumford, 1992 ; Friston and Kiebel, 2009 ). Figure 2 provides a schematic of the proposed message passing among hierarchically deployed cortical areas.


Figure 2. Schematic detailing the neuronal architecture that might encode conditional expectations about the states of a hierarchical model. This shows the speculative cells of origin of forward driving connections that convey prediction error from a lower area to a higher area and the backward connections that construct predictions (Mumford, 1992). These predictions try to explain away prediction error in lower levels. In this scheme, the sources of forward and backward connections are superficial and deep pyramidal cells respectively. The equations represent a generalized descent on free-energy under the hierarchical models described in the main text: see also (Friston, 2008). State units are in black and error units in red. Here, neuronal populations are deployed hierarchically within three cortical areas (or macro-columns). Within each area, the cells are shown in relation to cortical layers: supra-granular (I–III), granular (IV), and infra-granular (V–VI) layers. For simplicity, conditional expectations about control states have been absorbed into conditional expectations about hidden causes.

In active inference, conditional expectations elicit behavior by sending top-down predictions down the hierarchy that are unpacked into proprioceptive predictions at the level of the cranial nerve nuclei and spinal-cord. These engage classical reflex arcs to suppress proprioceptive prediction errors and produce the predicted motor trajectory

The reduction of action to classical reflexes follows because the only way that action can minimize free energy is to change sensory (proprioceptive) prediction errors by changing sensory signals; cf., the equilibrium point formulation of motor control (Feldman and Levin 1995 ). In short, active inference can be regarded as equipping a generalized predictive coding scheme with classical reflex arcs: see ( Friston et al., 2009 , 2010a ) for details. The actual movements produced clearly depend upon top-down predictions that can have a rich and complex structure, due to perceptual optimization based on the sampling of salient exteroceptive and interoceptive inputs.
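In this setting, the reflex arc corresponds to a descent of action on free energy that is mediated by (precision-weighted) proprioceptive prediction errors. Schematically, and only as a generic sketch in the notation above (not a reproduction of the paper's equation for action):

$\dot{a} = -\partial_a F \approx -(\partial_a \tilde{s}_p)^{T}\,\xi_{v,p}^{(1)}$

where $\tilde{s}_p$ denotes (generalized) proprioceptive signals and $\xi_{v,p}^{(1)}$ their precision-weighted prediction errors at the lowest level, consistent with the statement above that action minimizes free energy by changing (proprioceptive) sensory signals.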

Counterfactual Processing

To optimize prior expectations about hidden controls it is necessary to identify those that maximize the salience of counterfactual representations implicit in the counterfactual density in Eq. 7. Clearly, there are many ways this could be implemented. In this paper, we will focus on visual searches and assume that counterfactual expectations are represented explicitly and place-coded in a saliency map over the space of hidden causes. In other words, we will assume that salience is encoded on a grid corresponding to discrete values of counterfactual expectations associated with different hidden control states. The maximum of this map defines the counterfactual expectation with the greatest salience, which then becomes the prior expectation about hidden control states. This prior expectation enters the predictive coding in Eq. 13. The salience of the j-th counterfactual expectation is, from Eqs 9 and 12,

where the counterfactual prediction errors and their precisions are:

Given that we will be simulating visual searches with saccadic eye movements, we will consider the prior expectations to be updated at discrete times to simulate successive saccades, where the hidden controls correspond to locations in the visual scene that attract visual fixation.

In summary, we have derived equations for the dynamics of perception and action using a free-energy formulation of adaptive (Bayes-optimal) exchanges with the world and a generative model that is both generic and biologically plausible. In what follows, we use Eqs 13–15 to simulate neuronal and behavioral responses. A technical treatment of the material above can be found in ( Friston et al., 2010a ), which provides the details of the scheme used to integrate Eq. 11 to produce the simulations in the next section. The only addition to previous illustrations of this scheme is Eq. 15, which maps conditional expectations about hidden states to prior expectations about hidden controls: it is this mapping that underwrites the sampling of salient features and appeals to the existence of hidden control states that action can change. Put simply, this formulation says that action fulfills predictions and we predict that the consequences of action (i.e., hidden controls) minimize the uncertainty about predictions.

Modeling Saccadic Eye Movements

In this section, we will illustrate the theory of the previous section, using simulations of sequential eye movements. Saccadic eye movements are a useful vehicle to illustrate active inference about salient features of the world because they speak directly to visual search strategies and a wealth of psychophysical, neurobiological, and theoretical study (e.g., Grossberg et al., 1997; Ferreira et al., 2008; Srihasam et al., 2009; Bisley and Goldberg, 2010; Shires et al., 2010; Tatler et al., 2011; Wurtz et al., 2011). Having said this, we do not aim to provide detailed neurobiological simulations of oculomotor control, but rather to use the basic phenomenology of saccadic eye movements to illustrate the key features of the optimal inference scheme described above. This scheme can be regarded as a formal example of active vision (Wurtz et al., 2011); sometimes described in enactivist terms as visual palpation (O'Regan and Noë, 2001).

In what follows, we describe the production of visual signals and how they are modeled in terms of a generative model. We will focus on a fairly simple paradigm – the categorization of faces – and therefore sidestep many of the deeper challenges of understanding visual searches. These simulations should not be taken as a serious or quantitative model of saccadic eye movements – they just represent a proof of principle to illustrate the basic phenomenology implied by prior beliefs that constitute a generative model. Specifying a generative model allows us to compute the salience of stimulus features that are sampled and enables us to solve or integrate Eq. 11 to simulate the neuronal encoding of posterior beliefs and ensuing action. We will illustrate this in terms of oculomotor dynamics and the perception of a visual stimulus or scene. The simulations reported below can be reproduced by calling (annotated) Matlab scripts from the DEM graphical user interface ( Visual search ), available as academic freeware ( http://www.fil.ion.ucl.ac.uk/spm/ ).

The Generative Process

To integrate the generalized descent on free energy in Eq. 11, we need to define the processes generating sensory signals as a function of (hidden) states and action:

Note that these hidden states are true states that actually produce sensory signals. These have been written in boldface to distinguish them from the hidden states assumed by the generative model (see below). In these simulations, the world is actually very simple: sensory signals are generated in two modalities – proprioception and vision. Proprioception, $s_p \in \mathbb{R}^2$, reports the center of gaze or foveation as a displacement from the origin of some extrinsic frame of reference. Inputs in the visual modality comprise a list $s_q \in \mathbb{R}^{256}$ of values over an array of sensory channels sampling a two-dimensional image or visual scene $I: \mathbb{R}^2 \to \mathbb{R}$. This sampling uses a grid of 16 × 16 channels that uniformly samples a small part of the image (one sixth of the vertical and horizontal extent). The numerical size of the grid was chosen largely for computational expedience. In terms of its size in retinotopic space, it represents a local high-resolution (foveal) sampling that constitutes an attentional focus. To make this sampling more biologically realistic, each channel is equipped with a center-surround receptive field that samples a local weighted average of the image. The weights correspond to a Gaussian function with a standard deviation of one pixel minus another Gaussian function with a standard deviation of four pixels. This provides an on-off center-surround sampling. Furthermore, the signals are modulated by a two-dimensional Hamming function – to model the loss of precise visual information from the periphery of the visual field. This modulation was meant to model the increasing size of classical receptive fields and an attentional down-weighting of visual input with increasing eccentricity from the center of gaze (Feldman and Friston, 2010).
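The following is a minimal sketch of this sort of sampling: a 16 × 16 grid of difference-of-Gaussians (on-off center-surround) channels modulated by a two-dimensional Hamming window. The kernel size, channel spacing, and the use of cubic-spline interpolation (in place of the sinc interpolation mentioned below) are illustrative choices rather than the SPM implementation.

import numpy as np
from scipy.ndimage import map_coordinates

def dog_kernel(size=9, sigma_c=1.0, sigma_s=4.0):
    """Difference-of-Gaussians (center minus surround) receptive field."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    center   = np.exp(-(xx**2 + yy**2) / (2 * sigma_c**2))
    surround = np.exp(-(xx**2 + yy**2) / (2 * sigma_s**2))
    center   /= center.sum()
    surround /= surround.sum()
    return center - surround

def sample_scene(image, gaze, n=16, spacing=4.0):
    """Sample an image with an n-by-n grid of center-surround channels centred on
    'gaze' (row, col), modulated by a 2-D Hamming window (peripheral attenuation)."""
    kernel = dog_kernel()
    k = kernel.shape[0] // 2
    offsets = (np.arange(n) - n / 2 + 0.5) * spacing      # channel displacements d_i
    hamming = np.outer(np.hamming(n), np.hamming(n))       # peripheral down-weighting
    out = np.zeros((n, n))
    for i, dy in enumerate(offsets):
        for j, dx in enumerate(offsets):
            # local weighted average of the image under this channel's receptive field
            rows = gaze[0] + dy + np.arange(-k, k + 1)
            cols = gaze[1] + dx + np.arange(-k, k + 1)
            rr, cc = np.meshgrid(rows, cols, indexing="ij")
            patch = map_coordinates(image, [rr.ravel(), cc.ravel()],
                                    order=3, mode="nearest").reshape(kernel.shape)
            out[i, j] = (kernel * patch).sum()
    return hamming * out

# Example: sample a random "scene" while fixating its centre
scene = np.random.rand(128, 128)
channels = sample_scene(scene, gaze=(64.0, 64.0))
print(channels.shape)  # (16, 16)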

The only hidden states in this generative process, $x_p \in \mathbb{R}^2$, are the center of oculomotor fixation, whose motion is driven by action and decays with a suitably long time constant of 16 time bins (each time bin corresponds to 12 ms). These hidden states are also subject to random fluctuations, with a temporal smoothness of one half of a time bin (6 ms). The hidden states determine where the visual scene is sampled (foveated). In practice, the visual scene corresponds to a large grayscale image, where the i-th visual channel is sampled at location $d_i + x_p \in \mathbb{R}^2$ using sinc interpolation (as implemented in the SPM image analysis package). Here, $d_i \in \mathbb{R}^2$ specifies the displacement of the i-th channel from the center of the sampling grid. The proprioceptive and visual signals were effectively noiseless: their random fluctuations $(\omega_{v,p}, \omega_{v,q})$ had a log precision of 16. The motion of the fixation point was subject to low amplitude fluctuations $(\omega_{x,p})$ with a log precision of eight. This completes our description of the process generating proprioceptive and visual signals, for any given visual scene and action-dependent trajectory of hidden states (center of fixation). We now turn to the model of this process that generates predictions and action:

The Generative Model

The model of sensory signals used to specify variational free energy and consequent action (visual sampling) is slightly more complicated than the actual process of generating data:

As in the generative process above, proprioceptive signals are just a noisy mapping from hidden proprioceptive states encoding the direction of gaze. The visual input is modeled as a mixture of images sampled at a location specified by the proprioceptive hidden state. This hidden state decays with a time constant of four time bins (48 ms) toward a hidden control state. In other words, the hidden control determines the location that attracts gaze.

The visual input depends on a number of hypotheses or internal images $I_i: \mathbb{R}^2 \to \mathbb{R},\ i \in \{1, \ldots, N\}$ that constitute the agent’s prior beliefs about what could cause its visual input. In this paper, we use N = 3 hypotheses. The input encountered at any particular time is a weighted mixture of these internal images, where the weights correspond to hidden perceptual states. The dynamics of these perceptual states (last equality above) implement a form of dynamic softmax, in the sense that the solution of their equations of motion ensures the weights sum (approximately) to one:

This means we can interpret $\exp(x_{q,i})$ as the (softmax) probability that the i-th internal image or hypothesis is the cause of visual input. The decay term (with a time constant of 512 time bins) just ensures that perceptual states decay slowly to the same value, in the absence of perceptual fluctuations.
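Putting the pieces together, the predicted visual input under this model is (schematically) a mixture of the internal images, sampled at the expected fixation point and weighted by the softmax-like perceptual states; this is a sketch consistent with the description above rather than a reproduction of Eq. 18:

$g_q(x_p, x_q) = \sum_{i=1}^{N} \exp(x_{q,i})\, I_i(d + x_p), \qquad \sum_{i=1}^{N} \exp(x_{q,i}) \approx 1$

where $d$ ranges over the displacements of the sampling grid.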

In summary, given hidden proprioceptive and perceptual states, the agent can predict the proprioceptive and visual input. The generative model is specified by Eq. 18 and the precision of the random fluctuations that determine the agent’s prior certainty about sensory inputs and the motion of hidden states. In the examples below, we used a log precision of eight for proprioceptive sensations and the motion of hidden states, and let the agent believe its visual input was fairly noisy, with a log precision of four. In practice, this means it is more likely to change its (less precise) posterior beliefs about the causes of visual input to reduce prediction error, as opposed to adjusting its (precise) posterior beliefs about where it is looking. All that now remains is to specify prior beliefs about the hidden control state attracting the center of gaze:

Priors and Saliency

To simulate saccadic eye movements, we integrated the active inference scheme for 16 time bins (192 ms) and then computed a map of salience to reset the prior expectations about the hidden control states that attract the center of gaze. This was repeated eight times to give a sequence of eight saccadic eye movements. The simulation of each saccade involves integrating the coupled differential Eqs 11, 14, and 17 to solve for the true hidden states, action, and posterior expectations encoded by neuronal activity. The integration used a local linearization scheme (Ozaki, 1992) in generalized coordinates of motion, as described in several previous publications (Friston et al., 2010a).
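For readers unfamiliar with the integration scheme, local linearization (Ozaki, 1992) advances the states over each time bin using the matrix exponential of the local Jacobian; the following generic sketch (with a finite-difference Jacobian; not the SPM code) illustrates one step.

import numpy as np
from scipy.linalg import expm

def local_linearization_step(f, x, dt, eps=1e-6):
    """One local-linearization (Ozaki, 1992) update of dx/dt = f(x):
    x_next = x + inv(J) @ (expm(J*dt) - I) @ f(x), with J the Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    n = x.size
    # Numerical Jacobian by finite differences
    J = np.zeros((n, n))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        J[:, i] = (np.asarray(f(x + dx)) - fx) / eps
    # Use a pseudo-inverse for robustness when J is (nearly) singular
    return x + np.linalg.pinv(J) @ (expm(J * dt) - np.eye(n)) @ fx

# Example: a linear flow dx/dt = A x, for which the update is exact for any dt
A = np.array([[-1.0, 0.5], [0.0, -0.5]])
x = np.array([1.0, 1.0])
x = local_linearization_step(lambda y: A @ y, x, dt=0.1)
print(x)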

The salience was computed for 1024 = 32 × 32 locations distributed uniformly over the visual image or scene. The prior expectation of the hidden control state was the (generalized) location $\tilde{\eta}_j = [\eta_j, 0, 0, \ldots]^{T}$ that maximized salience, according to Eq. 15:

The fictive prediction errors at each location were evaluated at their solution under the generative model; namely,

In other words, salience is evaluated for proprioceptive and perceptual expectations encoding current posterior beliefs about the content of the visual scene and the fictive point of fixation to which gaze is attracted. The ensuing salience over the 32 × 32 locations constitutes a salience map that drives the next saccade. Notice that salience is a function of, and only of, fictive beliefs about the state of the world and essentially tells the agent where to sample (look) next. Salience depends on sensory signals only vicariously, through the current posterior beliefs. This is important because it means that salience is not an attribute of sensations, but of beliefs about those sensations. In other words, salience is an attribute of features we believe to be present in the world and changes with the way that those features are sampled. In the present setting, salience is a function of where the agent looks. Note that the simulations of saccadic eye movements in this paper are slightly unusual, in that the salience map extends beyond the field of view. This means salient locations in the visual scene are represented outside the field of view: these locations are parts of a scene that should provide confirmatory evidence for current hypotheses about the extended (visual) environment.
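To convey the structure of this computation, the sketch below scores a grid of fictive fixation points by the log-determinant of the posterior precision that sampling there would induce (a Gauss–Newton stand-in for Eqs 15 and 20). The sensitivity function, precisions, and grid are invented for illustration and are not the authors' implementation.

import numpy as np

def fictive_precision(jacobian, pi_sensory, pi_prior):
    """Posterior precision that fictive data sampled with this sensitivity would induce
    (Gauss-Newton form: J^T * Pi_omega * J + prior precision)."""
    return jacobian.T @ pi_sensory @ jacobian + pi_prior

def salience_map(candidate_locations, sensitivity_at, pi_sensory, pi_prior):
    """Score each candidate fixation by log|posterior precision| (negative conditional
    entropy up to a constant). sensitivity_at(loc) returns the Jacobian of the sensory
    mapping with respect to the hidden states at the fictive fixation 'loc'."""
    return np.array([
        np.linalg.slogdet(fictive_precision(sensitivity_at(loc), pi_sensory, pi_prior))[1]
        for loc in candidate_locations
    ])

# Toy usage: 32 x 32 candidate locations; sensitivity falls off away from the image centre,
# so the most salient fictive fixation is the centre. All quantities are invented.
grid = [(i, j) for i in range(32) for j in range(32)]
pi_s = np.eye(2) * 4.0                     # sensory precision
pi_0 = np.eye(3) * 0.1                     # prior precision over three hidden states

def sensitivity_at(loc):
    gain = np.exp(-((loc[0] - 16) ** 2 + (loc[1] - 16) ** 2) / 200.0)
    return gain * np.ones((2, 3))          # 2 sensory channels x 3 hidden states

S = salience_map(grid, sensitivity_at, pi_s, pi_0)
best = grid[int(np.argmax(S))]
print("most salient fictive fixation:", best)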

Figure 3 provides a simple illustration of salience based upon the posterior beliefs or hypothesis that local (foveal) visual inputs are caused by an image of Nefertiti. The left panels summarize the classic results of Yarbus (1967), in terms of a stimulus and the eye movements it elicits. The right panels depict visual input after sampling the image on the right with center-surround receptive fields and the associated saliency map based on a local sampling of 16 × 16 pixels, using Eq. 20. Note how the receptive fields suppress absolute levels of luminance contrast and highlight edges. It is these edges that inform posterior beliefs about both the content of the visual scene and where it is being sampled. This information reduces conditional uncertainty and is therefore salient. The salient features of the image include the ear, eye, and mouth. The location of these features and a number of other salient locations appear to be consistent with the locations that attract saccadic eye movements (as shown on the left). Crucially, the map of salience extends well beyond the field of view (circle on the picture). As noted above, this reflects the fact that salience is not an attribute of what is seen, but of what might be seen under a particular hypothesis about the causes of sensations.


Figure 3. This provides a simple illustration of salience based upon the posterior beliefs or hypothesis that local (foveal) visual inputs are caused by an image of Nefertiti. The left panels summarize the classic results of Yarbus, in terms of a stimulus and the eye movements it elicits. The right panels depict visual input after sampling the image on the right (using conventional center-surround receptive fields) and the associated saliency map based on a local sampling of 16 × 16 pixels, using the generative model described in the main text. The size of the resulting field of view, in relation to the visual scene, is indicated with the circle on the left image. The key thing to note here is that the salient features of the image include the ear, eye, and mouth. The location of these features and other salient locations appear to be consistent with the locations that attract saccadic eye movements (as shown on the left).

To make the simulations a bit more realistic, we added a further prior implementing inhibition of return ( Itti and Koch, 2001 ; Wang and Klein, 2010 ). This involved suppressing the salience of locations that have been recently foveated, using the following scheme:

Here, $S_k = S(\tilde{\eta}_j) - \min(S(\tilde{\eta}_j))$ is the differential salience for the k-th saccade and $R_k$ is an inhibition of return map that remembers recently foveated locations. This map reduces the salience of previous locations if they were visited recently. The function $\rho(S_k) \in [0,1]$ is a Gaussian function (with a standard deviation of 1/16 of the image size) of the distance from the location of maximum salience that attracts the k-th saccade. The addition of inhibition of return ensures that a new location is selected by each saccade and can be motivated ethologically by prior beliefs that the visual scene will change and that previous locations should be revisited.
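The paper's exact update rule for this map is not reproduced above; the following sketch shows one plausible reading of the scheme: remember recently foveated locations with a Gaussian bump (standard deviation 1/16 of the map size), let that memory decay, and suppress the differential salience accordingly. The decay constant and the multiplicative form of the suppression are assumptions of this sketch.

import numpy as np

def gaussian_bump(shape, center, sigma):
    """2-D Gaussian (peak 1) centred on 'center'; sigma in pixels."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def next_saccade(salience, ior, decay=0.5, sigma=None):
    """Pick the next fixation from a salience map with inhibition of return.
    'ior' is a running memory of recently foveated locations; 'decay' (a guess)
    lets old inhibition fade so that locations can eventually be revisited."""
    if sigma is None:
        sigma = salience.shape[0] / 16.0                      # 1/16 of the map size
    s = salience - salience.min()                             # differential salience S_k
    s = s * (1.0 - np.clip(ior, 0.0, 1.0))                    # suppress recently visited spots
    target = tuple(int(v) for v in np.unravel_index(np.argmax(s), s.shape))
    ior = decay * ior + gaussian_bump(s.shape, target, sigma) # remember this fixation
    return target, ior

# Toy usage on a random 32 x 32 salience map
rng = np.random.default_rng(0)
salience = rng.random((32, 32))
ior = np.zeros_like(salience)
for k in range(8):
    target, ior = next_saccade(salience, ior)
    print(f"saccade {k + 1}: fixate {target}")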

Functional Anatomy

Figure 4 provides an intuition as to how active inference under salience priors might be implemented in the brain. This schematic depicts a particular instance of the message passing scheme in Figure 2, based on the generative model above. This model prescribes a particular hierarchical form for generalized predictive coding; shown here in terms of state and error units (black and red, denoting deep and superficial pyramidal cell populations respectively) that have been assigned to different cortical or subcortical regions. The insert on the left shows a visual scene (a picture of Nefertiti) that can be sampled locally by foveating a particular point – the true hidden state of the world. The resulting visual input arrives in primary visual cortex to elicit prediction errors that are passed forward to “what” and “where” streams (Ungerleider and Mishkin, 1982). State units in the “what” stream respond by adjusting their representations to provide better predictions based upon a discrete number of internal images or hypotheses. Crucially, the predictions of visual input depend upon posterior beliefs about the direction of gaze, encoded by the state units in the “where” stream (Bisley and Goldberg, 2010). These posterior expectations are themselves informed by top-down prior beliefs about the direction of gaze that maximizes salience. The salience map shown in the center is updated between saccades based upon conditional expectations about the content of the visual scene. Conditional beliefs about the direction of gaze provide proprioceptive predictions to the oculomotor system in the superior colliculus and pontine nuclei, to elaborate a proprioceptive prediction error (Grossberg et al., 1997; Shires et al., 2010; Shen et al., 2011). This prediction error drives the oculomotor system to fulfill posterior beliefs about where to look next. This can be regarded as an instance of the classical reflex arc, whose set point is determined by top-down proprioceptive predictions. The anatomical designations should not be taken seriously (for example, the salience map may be assembled in the pulvinar or frontal cortex and mapped to the deep layer of the superior colliculus). The important thing to take from this schematic is the functional logic implied by the anatomy that involves reciprocal message passing and nested loops in a hierarchical architecture that is not dissimilar to circuits in the real brain. In particular, note that representations of hidden perceptual states provide bilateral top-down projections to early visual systems (to predict visual input) and to the systems computing salience, which might involve the pulvinar of the thalamus (Wardak et al., 2011; Wurtz et al., 2011).


Figure 4. This schematic depicts a particular instance of the message passing scheme in Figure 2. This example follows from the generative model of visual input described in the main text. The model prescribes a particular hierarchical form for generalized predictive coding; shown here in terms of state and error units (black and red respectively) that have been assigned to different cortical or subcortical regions. The insert on the left shows a visual scene (a picture of Nefertiti) that can be sampled locally by foveating a particular point – the true hidden state of the world. The resulting visual input arrives in primary visual cortex to elicit prediction errors that are passed forward to the “what” and “where” streams. State units in the “what” stream respond by adjusting their representations to provide better predictions based upon a discrete number of internal images or hypotheses. Crucially, the predictions of visual input depend upon posterior beliefs about the direction of gaze encoded by state units in the “where” stream. These conditional expectations are themselves informed by top-down prior beliefs about the direction of gaze that maximizes salience. The salience map shown in the center is updated between saccades based upon posterior beliefs about the content of the visual scene. Posterior beliefs about the content of the visual scene provide predictions of visual input and future hidden states subtending salience. Posterior beliefs about the direction of gaze are used to form predictions of visual input and provide proprioceptive predictions to the oculomotor system in the superior colliculus and pontine nuclei, to elaborate a proprioceptive prediction error. This prediction error drives the oculomotor system to fulfill posterior beliefs about where to look next. This can be regarded as an instance of the classical reflex arc, whose set point is determined by top-down proprioceptive predictions. The variables associated with each region are described in detail in the text, while the arrows connecting regions adopt the same format as in Figure 2 (forward prediction error afferents in red and backward predictions in black).

In this section, we have described the process generating sensory information in terms of a visual scene and hidden states that specify how that scene is sampled. We have described both the likelihood and priors that together comprise a generative model. The special consideration here is that these priors reflect prior beliefs that the agent will sample salient sensory features based upon its current posterior beliefs about the causes of those features. We are now in a position to look at the sorts of behavior this model produces.

Simulating Saccadic Eye Movements

In this section, we present a few examples of visual search under the generative model described above. Our purpose here is to illustrate the nature of active inference, when it is equipped with priors that maximize salience or minimize uncertainty. We will present three simulations: first, a canonical simulation in which the visual scene matches one of three internal images or hypotheses. This simulation illustrates the operation of optimal visual searches that select the hypothesis with the lowest free energy and minimize conditional uncertainty about this hypothesis. We will then repeat the simulation using a visual scene that the agent has not experienced and has no internal image of. This is used to illustrate a failure to select a hypothesis and the consequent itinerant sampling of the scene. Finally, largely for fun, we simulate a “dark room” agent whose prior beliefs compel it to sample the least salient locations, to demonstrate how these priors result in sensory seclusion from the environment.

Figure 5 shows the results of the first simulation, in which the agent had three internal images or hypotheses about the scene it might sample (an upright face, an inverted face, and a rotated face). The agent was presented with an upright face and its posterior expectations were evaluated over 16 (12 ms) time bins, after which salience was evaluated. The agent then emitted a saccade by foveating the most salient location during the subsequent 16 time bins – from its starting location (the center of the visual field). This was repeated for eight saccades. The upper row shows the ensuing eye movements as red dots (in the extrinsic coordinates of the true scene) at the fixation point of each saccade. The corresponding sequence of eye movements is shown in the insert on the upper left, where the red circles correspond roughly to the agent’s field of view. These saccades were driven by prior beliefs about the direction of gaze based upon the salience maps in the second row. Note that these maps change with successive saccades as posterior beliefs about the hidden perceptual states become progressively more confident. Note also that salience is depleted in locations that were foveated in the previous saccade – this reflects the inhibition of return. Posterior beliefs about hidden states provide visual and proprioceptive predictions that suppress visual prediction errors and drive eye movements respectively. Oculomotor responses are shown in the third row in terms of the two hidden oculomotor states corresponding to vertical and horizontal displacements. The portions of the image sampled (at the end of each saccade) are shown in the fourth row (weighted by the Hamming function above). The final two rows show the posterior beliefs in terms of their sufficient statistics (penultimate row) and the perceptual categories (last row) respectively. The posterior beliefs are plotted here in terms of posterior expectations and the 90% confidence interval about the true stimulus. The key thing to note here is that the expectation about the true stimulus supervenes over its competing representations and, as a result, posterior confidence about the stimulus category increases (the posterior confidence intervals shrink to the expectation): see (Churchland et al., 2011) for an empirical study of this sort of phenomenon. The images in the lower row depict the hypothesis selected; their intensity has been scaled to reflect conditional uncertainty, using the entropy (average uncertainty) of the softmax probabilities.


Figure 5. This figure shows the results of the first simulation, in which a face was presented to an agent, whose responses were simulated using the optimal inference scheme described in the main text . In this simulation, the agent had three internal images or hypotheses about the stimuli it might sample (an upright face, an inverted face, and a rotated face). The agent was presented with an upright face and its conditional expectations were evaluated over 16 (12 ms) time bins until the next saccade was emitted. This was repeated for eight saccades. The ensuing eye movements are shown as red dots at the location (in extrinsic coordinates) at the end of each saccade in the upper row. The corresponding sequence of eye movements is shown in the insert on the upper left, where the red circles correspond roughly to the proportion of the image sampled. These saccades are driven by prior beliefs about the direction of gaze based upon the saliency maps in the second row. Note that these maps change with successive saccades as posterior beliefs about the hidden states, including the stimulus, become progressively more confident. Note also that salience is depleted in locations that were foveated in the previous saccade. These posterior beliefs provide both visual and proprioceptive predictions that suppress visual prediction errors and drive eye movements respectively. Oculomotor responses are shown in the third row in terms of the two hidden oculomotor states corresponding to vertical and horizontal displacements. The associated portions of the image sampled (at the end of each saccade) are shown in the fourth row. The final two rows show the posterior beliefs in terms of their sufficient statistics and the stimulus categories respectively. The posterior beliefs are plotted here in terms of conditional expectations and the 90% confidence interval about the true stimulus. The key thing to note here is that the expectation about the true stimulus supervenes over its competing expectations and, as a result, conditional confidence about the stimulus category increases (the confidence intervals shrink to the expectation). This illustrates the nature of evidence accumulation when selecting a hypothesis or percept that best explains sensory data. Within-saccade accumulation is evident even during the initial fixation with further stepwise decreases in uncertainty as salient information is sampled at successive saccades.

This simulation illustrates a number of key points. First, it illustrates the nature of evidence accumulation in selecting a hypothesis or percept that best explains sensory data. One can see that this proceeds over two timescales, both within and between saccades. Within-saccade accumulation is evident even during the initial fixation, with further stepwise decreases in uncertainty as salient information is sampled. The within-saccade accumulation is formally related to evidence accumulation as described in models of perceptual discrimination (Gold and Shadlen, 2003; Churchland et al., 2011). This is meant in the sense that the posterior expectations about perceptual states are driven by prediction errors. However, the accumulation here rests explicitly on the formal priors implied by the generative model. In this case, the prevalence of any particular perceptual category is modeled as a dynamical process that has certain continuity properties. In other words, inherent in the model is the belief that the content of the world changes in a continuous fashion. This means that posterior beliefs have a persistence or slower timescale than would be observed under schemes that just accumulate evidence. This is reflected in the progressive elevation of the correct perceptual state above its competitors and the consequent shrinking of the posterior confidence interval. The transient changes in the posterior beliefs, shortly after each saccade, reflect the fact that new data are being generated as the eye sweeps toward its new target location. It is important to note that the agent is not just predicting visual contrast, but also how contrast changes with eye movements – this induces an increase in conditional uncertainty (in generalized coordinates of motion) during the fast phase of the saccade. However, due to the veracity of the posterior beliefs, the conditional confidence shrinks again when the saccade reaches its target location. This shrinkage is usually to a smaller level than in the previous saccade.

This illustrates the second key point; namely, the circular causality that lies behind perception. Put simply, the only hypothesis that can endure over successive saccades is the one that correctly predicts the salient features that are sampled. This sampling depends upon action or an embodied inference that speaks directly to the notion of visual palpation (sniffing; O’Regan and Noë, 2001 ). This means that the hypothesis prescribes its own verification and can only survive if it is a correct representation of the world. If its salient features are not discovered, it will be discarded in favor of a better hypothesis. This provides a nice perspective on perception as hypothesis testing, where the emphasis is on the selective processes that underlie sequential testing. This is particularly pertinent when hypotheses can make predictions that are more extensive than the data available at any one time.

Finally, although the majority of saccades target the eyes and nose, as one might expect, there is one saccade to the forehead. This is somewhat paradoxical, because the forehead contains no edges and cannot increase posterior confidence about a face. However, this region is highly informative under the remaining two hypotheses (corresponding to the location of the nose in the inverted face and the left eye in the rotated face). This subliminal salience is revealed through inhibition of return and reflects the fact that the two competing hypotheses have not been completely excluded. This illustrates the competitive nature of perceptual selection induced by inhibition of return and can be regarded, heuristically, as occasional checking of alternative hypotheses. This is a bit like a scientist who tries to refute his hypothesis by acquiring data that furnish efficient tests of his competing or null hypotheses.

We then repeated the simulation, but used an unknown (unrecognizable) face – the image of Nefertiti from previous figures. Because the agent has no internal image or hypothesis that can produce veridical predictions about salient locations to foveate, it cannot resolve the causes of its sensory input and is unable to assimilate visual information into a precise posterior belief (see Figure 6). Saccadic movements are generated by a saliency map that represents the most salient locations based upon a mixture of all internal hypotheses about the stimulus. The salience maps here have a lower spatial resolution than in the previous figure because sensory channels are deployed over a larger image. Irrespective of where the agent looks, it can find no posterior beliefs or hypothesis that can explain the sensory input. As a result, there is a persistent conditional uncertainty about the states of the world that fails to resolve itself. The ensuing percepts are poorly formed and change sporadically with successive saccades.


Figure 6. This figure uses the same format as the previous figure, but shows the result of presenting an unknown (unrecognizable) face – the image of Nefertiti from previous figures. Because the agent has no internal image or hypothesis that can produce veridical predictions about salient locations to foveate, it cannot resolve the causes of its sensory input and is unable to assimilate visual information into a precise posterior belief about the stimulus. Saccadic movements are generated by a saliency map that represents the most salient locations based upon a mixture of all internal hypotheses about the stimulus. Irrespective of where the agent looks, it can find no posterior beliefs or hypothesis that can explain the sensory input. As a result, there is a persistent posterior uncertainty about the states of the world that fails to resolve itself. The ensuing percepts are poorly formed and change sporadically with successive saccades.

Finally, we presented a known (recognizable) face to an agent whose prior beliefs minimize salience, as opposed to maximize it. This can be regarded as an agent that ( a priori ) prefers dark rooms. This agent immediately foveates the least informative part of the scene and maintains fixation at that location – see Figure 7 . This results in a progressive increase in uncertainty and ambiguity about the stimulus causing visual input; as reflected in the convergence of posterior expectations about the three hypotheses and an increase in conditional uncertainty (penultimate row). As time progresses, the percept fades (lower row) and the agent is effectively sequestered from its sensorium. Note here how the salience map remains largely unchanged and encodes a composite salience of all (three) prior hypotheses about visual input.


Figure 7. This figure uses the same format as the previous figures but shows the result of presenting a known (recognizable) face to an agent whose prior beliefs about eye movements are that they should minimize salience, as opposed to maximize it . This can be regarded as an agent that prefers dark rooms. This agent immediately foveates the least informative part of the scene and maintains fixation at that location (because the inhibition of return operates by decreasing the salience of the location foveated previously). This results in a progressive increase in uncertainty about the stimulus; as reflected in the convergence of posterior expectations about the three hypotheses and an increase in conditional uncertainty (penultimate row). As time progresses, the percept fades (lower row) and the agent is effectively sequestered from its sensorium. Note here how the salience map remains largely unchanged and encodes a composite salience of all (three) prior hypotheses about states of the world causing visual input.

This work suggests that we can understand exploration of the sensorium in terms of optimality principles based on straightforward ergodic or homoeostatic principles. In other words, to maintain the constancy of our external milieu, it is sufficient to expose ourselves to predicted and predictable stimuli. Being able to predict what is currently seen also enables us to predict fictive sensations that we could experience from another viewpoint. The mathematical arguments presented in the first section suggest that the best viewpoint is the one that confirms our predictions with the greatest precision or certainty. In short, action fulfills our predictions, while we predict the consequences of our actions will maximize confidence in those predictions. This provides a principled way in which to explore and sample the world; for example, with visual searches using saccadic eye movements. These theoretical considerations are remarkably consistent with a number of compelling heuristics; most notably the Infomax principle or the principle of minimum redundancy, signal detection theory, and recent formulations of salience in terms of Bayesian surprise.

Simulations of successive saccadic eye movements or visual search, based on maximizing saliency or posterior precision, reproduce, in a phenomenological sense, some characteristics of visual searches seen empirically. Although these simulations should not be taken as serious proposals for the neurobiology of oculomotor control, they do provide a rough proof of principle for the basic idea. An interesting perspective on perception emerges from the simulations, in which percepts are selected through a form of circular causality: in other words, only the correct percept can survive the cycle of action and perception, when the percept is used to predict where to look next. If the true state of the world and the current hypothesis concur, then the percept can maintain itself by selectively sampling evidence for its own existence. This provides an embodied (enactivist) explanation for perception that fits comfortably with the notion of visual sniffing or palpation (O’Regan and Noë, 2001; Wurtz et al., 2011), in contrast to passive evidence accumulation schemes. Having said this, evidence accumulation is an integral part of optimal inference, in the sense that dynamics on representations of hidden states, representing competing hypotheses, are driven by prediction errors. However, this is only part of the story, in that emerging representations come to play a role in determining where to search for evidence. This is illustrated nicely in the context of saccadic eye movements of the sort we have simulated.

There are many limitations of the current scheme that we have glossed over. For example, there is no principled reason why we should include inhibition of return. Of course, we can appeal to natural selection to say that this sort of prior belief would be more robust in a changing environment; however, this sort of proposition is best substantiated with simulations or analytic arguments. The question here would be whether inhibition of return is an emergent property of free-energy minimization or Bayes-optimal sampling of the visual scene. Another simplifying assumption that we have made is that the agent executes a new saccade or search on a fixed and slow timescale, without considering how saccadic eye movements are actually triggered or when they may be triggered in an optimal fashion (Grossberg et al., 1997). Note that the emission of sporadic movements follows from the sporadic updates of the salience map – the actual movement is responding continuously to proprioceptive predictions based upon salience. One advantage of considering sporadic updates is that the solution of fictive hidden states in the future becomes relatively simple; for example, given prior beliefs about hidden control (the location of a point attractor of foveation), it is a relatively simple matter to compute the hidden states in the future (that are attracted to that location). This advantage may have been exploited by evolution. However, the fixed timescale (16 time bins of 12 ms) does not account for the link between when and where in oculomotor control – observed particularly in reading studies (Rayner, 1978). Solving for fictive states in the future may not be simple when hidden causes are changing quickly – as in reading (Yarbus, 1967). Eye movements have been studied extensively in the context of reading or have been used to infer reading processes. Huge amounts of data are available (including corpus studies) and it would be interesting to see how the current framework could explain robust effects in reading. Moreover, models of oculomotor control in reading – such as EZ-Reader or SWIFT (Rayner, 2009) – are particularly elaborate and include contextual constraints (allowing predictions) and mechanisms linking where and when decisions. These schemes represent interesting task-specific models that may lend themselves to integration in the theoretical framework introduced here. Finally, we have not paid much attention to the vast amount of work on the neurobiology of saccadic eye movements and their functional anatomy. It will be an interesting exercise to see how much of the empirical work on the psychophysics and neurophysiology of saccades can be addressed using the theory above.

There are a number of obvious next steps that suggest themselves. For example, endowing the generative model with a deeper hierarchical structure and allowing it to represent multiple objects at the same time. One can see, at least intuitively, how the ensuing inference would correspond to scene construction and require one to address fundamental issues like translational invariance and the representation of things like occlusion and depth. The hierarchical nature of representations is particularly interesting from the point of view of face processing: for example Miellet et al. (2011) showed that observers can use either a local (sampling foveal information) or a global (sampling diagnostic extra-foveal features) strategy – suggesting “that face identification is not rooted in a single, or even preferred, information-gathering strategy” ( Miellet et al., 2011 ). In the same vein, a central fixation bias has been established for Eastern observers ( Blais et al., 2008 ; Miellet et al., 2010 ). The nature of hierarchical inference may be crucial for a formal understanding of these phenomena: in hierarchical generative models, hidden causes are represented at multiple levels of abstraction, each with conditional dependencies on other levels. This means that each – global or local – level contributes to conditional uncertainty and will therefore compete in determining the most salient sensory samples that resolve uncertainty. One can see how a context-sensitive competition among different levels of representation could manifest as a switching between the sampling of sensory information that informs local and global features ( Miellet et al., 2011 ). In principle, this sort of competition could be simulated by repeating the simulations presented above, using a hierarchical generative model.

It would also be interesting to simulate bistable perception within the current framework, using ambiguous figures and binocular presentation. The illustrations in this paper have used static visual scenes; however, the same principles could be applied to dynamically changing scenes and should, hopefully, reproduce the sorts of saccades seen in reading. The solutions to hidden fictive states in this dynamic context would be more complicated but not computationally intractable. Finally, we have not considered microsaccadic or fixational eye movements. In the setting of active inference, fast microscopic movements represent an interesting area of study because they are the product of closed-loop feedback control with multiple hierarchically deployed loops (see Figure 4). This suggests that their statistics should show some interesting features characteristic of self-organized dynamics that are not bound to a particular temporal scale. We look forward to addressing these and other theoretical issues.

As noted by our reviewers, not all sensory epithelia can be moved around to search the sensorium – as in active touch and vision. For example, how could we generalize this approach to audition? An intriguing possibility is that the prior beliefs that guide the active sampling of somatosensory and visual information could also guide directed attention. In Feldman and Friston (2010), we described the use of prior beliefs – about the location of precise sensory information – to explain the behavioral and electrophysiological correlates of attention (in the context of the Posner paradigm and biased competition). One might imagine that prior beliefs about the location of salient sensory information would not just inform proprioceptive predictions but also predictions about the precision of sensory signals at particular locations in visual and auditory space. Turning this conjecture around, it suggests that (directed) attention could be understood – and implemented – in terms of prior beliefs about salient sensory channels that provide precise sensory confirmation of latent perceptual hypotheses. This perspective may provide a nice (formal) unification of overt active searches and covert attentional searches, in terms of prior beliefs that select where we look and where we attend.

It should be noted that the contribution of this work is purely conceptual and that we have not considered any empirical validation of the underlying ideas. There are many theoretically informed empirical initiatives that touch on the issues we have considered and much more: see Thurtell et al. (2011), Dandekar et al. (2012), Morvan and Maloney (2012), Purcell et al. (2012) for recent examples. There are a number of interesting ways in which the computational ideas above could be linked to empirical studies of saccadic eye movements. First, one could focus on empirically derived salience maps and try to reverse engineer the underlying visual hypotheses that subjects were entertaining. In other words, Eq. 15 provides a model of a salience map – in terms of underlying hypothetical images and precisions, which (in principle) can be estimated, given an empirical salience map based on occupancy during visual searches. One could take this a stage further and use the simulations above as a generative or forward model of real eye movements – in terms of their statistics as measured with eye tracking or in terms of neuronal responses measured with electroencephalography. The exciting thing here is that one could then optimize the model parameters (e.g., internal templates) or compare different models of salience using Bayesian model comparison. As noted by one of our reviewers, the (neuronally plausible) predictive coding scheme we used to simulate saccadic eye movements can also be used to simulate event related potentials. This speaks to the interesting notion of modeling eye movements measured with eye tracking, oculomotor responses with electrooculography, and event related neuronal responses with electroencephalography. In principle, this modeling could furnish a dynamic causal model (David et al., 2006) of multimodal responses – elicited by visual searches – that is both physiologically and computationally informed. This is outstanding but potentially important work that could provide empirical evidence for the theoretical ideas presented in this and other papers.

In summary, we have tried to formalize the intuitive notion that our interactions with the world are akin to sensory experiments, by which we confirm our hypotheses about its causal structure in an optimal and efficient fashion. This mandates prior beliefs that the deployment of sensory epithelia and our physical relationship to the world will disclose its secrets – beliefs that are fulfilled by action. The resulting active or embodied inference means that we can regard not only perception as hypotheses, but also action as performing experiments that confirm or disconfirm those hypotheses.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This work was funded by the Wellcome Trust. We would like to thank our reviewers for very helpful guidance on how to present and contextualize this work. Laurent Perrinet was supported by European Union project number FP7-269921, “Brain-Scales”, and by project “CODDE” from the Seventh Framework Programme FP7/2007-2013 under agreement number 214728-2.

Ashby, W. R. (1947). Principles of the self-organizing dynamic system. J. Gen. Psychol. 37, 125–128.

Ballard, D. H., Hinton, G. E., and Sejnowski, T. J. (1983). Parallel visual computation. Nature 306, 21–26.

Barlow, H. (1961). “Possible principles underlying the transformations of sensory messages,” in Sensory Communication , ed. W. Rosenblith (Cambridge, MA: MIT Press), 217–34.

Barlow, H. B. (1974). Inductive inference, coding, perception, and language. Perception 3, 123–134.

Beal, M. J. (2003). Variational Algorithms for Approximate Bayesian Inference . Ph.D. thesis, University College London, London.

Bialek, W., Nemenman, I., and Tishby, N. (2001). Predictability, complexity, and learning. Neural Comput. 13, 2409–2463.

Bisley, J. W., and Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annu. Rev. Neurosci. 33, 1–21.

Blais, C., Jack, R. E., Scheepers, C., Fiset, D., and Caldara, R. (2008). Culture shapes how we look at faces. PLoS ONE 3, e3022. doi:10.1371/journal.pone.0003022

Churchland, A. K., Kiani, R., Chaudhuri, R., Wang, X. J., Pouget, A., and Shadlen, M. N. (2011). Variance as a signature of neural computations during decision making. Neuron 69, 818–831.

Dandekar, S., Privitera, C., Carney, T., and Klein, S. A. (2012). Neural saccadic response estimation during natural viewing. J. Neurophysiol. 107, 1776–1790.

Daunizeau, J., Preuschoff, K., Friston, K., and Stephan, K. (2011). Optimizing experimental design for comparing models of brain function. PLoS Comput Biol. 7, e1002280. doi:10.1371/journal.pcbi.1002280

David, O., Kiebel, S., Harrison, L., Mattout, J., Kilner, J. M., and Friston, K. J. (2006). Dynamic causal modeling of evoked responses in EEG and MEG. Neuroimage 30, 1255–1272.

Dayan, P., Hinton, G. E., and Neal, R. (1995). The Helmholtz machine. Neural Comput. 7, 889–904.

Feldman, A. G., and Levin, M. F. (1995). The origin and use of positional frames of reference in motor control. Behav. Brain Sci. 18, 723–806.

Feldman, H., and Friston, K. J. (2010). Attention, uncertainty, and free-energy. Front. Hum. Neurosci. 4:215. doi:10.3389/fnhum.2010.00215

Ferreira, F., Apel, J., and Henderson, J. M. (2008). Taking a new look at looking at nothing. Trends Cogn. Sci. (Regul. Ed.) 12, 405–410.

Feynman, R. P. (1972). Statistical Mechanics . Reading, MA: Benjamin.

Friston, K. (2005). A theory of cortical responses. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 360, 815–836.

Friston, K. (2008). Hierarchical models in the brain. PLoS Comput. Biol. 4, e1000211. doi:10.1371/journal.pcbi.1000211

Friston, K. (2009). The free-energy principle: a rough guide to the brain? Trends Cogn. Sci. (Regul. Ed.) 13, 293–301.

Friston, K. (2010). The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138.

Friston, K., and Ao, P. (2011). Free-energy, value and attractors. Comput. Math. Methods Med. 2012, doi:10.1155/2012/937860

Friston, K., Thornton, C., and Clark, A. (2012). Free-energy minimization and the dark-room problem. Front. Psychol. 3:130. doi:10.3389/fpsyg.2012.00130

Friston, K., and Kiebel, S. (2009). Cortical circuits for perceptual inference. Neural Netw. 22, 1093–1104.

Friston, K., Kilner, J., and Harrison, L. (2006). A free energy principle for the brain. J. Physiol. Paris 100, 70–87.

Friston, K., Mattout, J., and Kilner, J. (2011). Action understanding and active inference. Biol. Cybern. 104, 137–160.

Friston, K. J., Daunizeau, J., and Kiebel, S. J. (2009). Active inference or reinforcement learning? PLoS ONE 4, e6421. doi:10.1371/journal.pone.0006421

Friston, K. J., Daunizeau, J., Kilner, J., and Kiebel, S. J. (2010a). Action and behavior: a free-energy formulation. Biol. Cybern. 102, 227–260.

Friston, K., Stephan, K., Li, B., and Daunizeau, J. (2010b). Generalised Filtering. Math. Probl. Eng. 2010, 621670.

Gibson, J. J. (1979). The Ecological Approach to Visual Perception . Boston: Houghton Mifflin.

Ginzburg, V. L., and Landau, L. D. (1950). On the theory of superconductivity. Zh. Eksp. Teor. Fiz. 20, 1064.

Gold, J. I., and Shadlen, M. N. (2003). The influence of behavioral context on the representation of a perceptual decision in developing oculomotor commands. J. Neurosci. 23, 632–651.

Gregory, R. L. (1980). Perceptions as hypotheses. Philos. Trans. R. Soc. Lond. B Biol. Sci. 290, 181–197.

Grossberg, S., Roberts, K., Aguilar, M., and Bullock, D. (1997). A neural model of multimodal adaptive saccadic eye movement control by superior colliculus. J. Neurosci. 17, 9706–9725.

Haken, H. (1983). Synergetics: An Introduction. Non-Equilibrium Phase Transitions and Self-Organisation in Physics, Chemistry and Biology, 3rd Edn. Berlin: Springer Verlag.

Helmholtz, H. (1866/1962). “Concerning the perceptions in general,” in Treatise on Physiological Optics , 3rd Edn, Vol. III, ed. J. Southall, trans. (New York: Dover).

Humphreys, G. W., Allen, H. A., and Mavritsaki, E. (2009). Using biologically plausible neural models to specify the functional and neural mechanisms of visual search. Prog. Brain Res. 176, 135–148.

Itti, L., and Baldi, P. (2009). Bayesian surprise attracts human attention. Vision Res. 49, 1295–1306.

Itti, L., and Koch, C. (2001). Computational modelling of visual attention. Nat. Rev. Neurosci. 2, 194–203.

Jaynes, E. T. (1957). Information theory and statistical mechanics. Phys. Rev. 106, 620–630.

Kiebel, S. J., Daunizeau, J., and Friston, K. J. (2009). Perception and hierarchical dynamics. Front. Neuroinform. 3:20. doi:10.3389/neuro.11.020.2009

Knill, D. C., and Pouget, A. (2004). The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 27, 712–719.

Linsker, R. (1990). Perceptual neural organization: some approaches based on network models and information theory. Annu. Rev. Neurosci. 13, 257–281.

Miellet, S., Caldara, R., and Schyns, P. G. (2011). Local Jekyll and global Hyde: the dual identity of face identification. Psychol. Sci. 22, 1518–1526.

Miellet, S., Zhou, X., He, L., Rodger, H., and Caldara, R. (2010). Investigating cultural diversity for extrafoveal information use in visual scenes. J. Vis. 10, 21.

Morgan, M. J. (2011). Features and the ‘primal sketch.’ Vision Res. 51, 738–753.

Morvan, C., and Maloney, L. T. (2012). Human visual search does not maximize the post-saccadic probability of identifying targets. PLoS Comput. Biol. 8, e1002342. doi:10.1371/journal.pcbi.1002342

Mumford, D. (1992). On the computational architecture of the neocortex. II. Biol. Cybern. 66, 241–251.

Oja, E. (1989). Neural networks, principal components, and subspaces. Int. J. Neural Syst. 1, 61–68.

Olshausen, B. A., and Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609.

Optican, L., and Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior cortex. II Information theoretic analysis. J. Neurophysiol. 57, 132–146.

O’Regan, J. K., and Noë, A. (2001). A sensorimotor account of vision and visual consciousness. Behav. Brain Sci. 24, 939–973.

Ozaki, T. (1992). A bridge between nonlinear time-series models and stochastic dynamical systems: a local linearization approach. Stat. Sin. 2, 113–135.

Purcell, B. A., Schall, J. D., Logan, G. D., and Palmeri, T. J. (2012). From salience to saccades: multiple-alternative gated stochastic accumulator model of visual search. J. Neurosci. 32, 3433–3446.

Rao, R. P., and Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87.

Rayner, K. (1978). Eye movements in reading and information processing. Psychol. Bull. 85, 618–660.

Rayner, K. (2009). Eye movements in reading: models and data. J. Eye Mov. Res. 2, 1–10.

Shen, K., Valero, J., Day, G. S., and Paré, M. (2011). Investigating the role of the superior colliculus in active vision with the visual search paradigm. Eur. J. Neurosci. 33, 2003–2016.

Shires, J., Joshi, S., and Basso, M. A. (2010). Shedding new light on the role of the basal ganglia-superior colliculus pathway in eye movements. Curr. Opin. Neurobiol. 20, 717–725.

Srihasam, K., Bullock, D., and Grossberg, S. (2009). Target selection by the frontal cortex during coordinated saccadic and smooth pursuit eye movements. J. Cogn. Neurosci. 21, 1611–1627.

Tatler, B. W., Hayhoe, M. M., Land, M. F., and Ballard, D. H. (2011). Eye guidance in natural vision: reinterpreting salience. J. Vis. 11, 5.

Thurtell, M. J., Joshi, A. C., Leigh, R. J., and Walker, M. F. (2011). Three-dimensional kinematics of saccadic eye movements in humans: is the “half-angle rule” obeyed? Ann. N. Y. Acad. Sci. 1233, 34–40.

Ungerleider, L. G., and Mishkin, M. (1982). “Two cortical visual systems,” in Analysis of Visual Behavior , eds D. Ingle, M. A. Goodale, and R. J. Mansfield (Cambridge, MA: MIT Press), 549–586.

Wang, Z., and Klein, R. M. (2010). Searching for inhibition of return in visual search: a review. Vision Res. 50, 220–228.

Wardak, C., Olivier, E., and Duhamel, J. R. (2011). The relationship between spatial attention and saccades in the frontoparietal network of the monkey. Eur. J. Neurosci. 33, 1973–1981.

Wurtz, R. H., McAlonan, K., Cavanaugh, J., and Berman, R. A. (2011). Thalamic pathways for active vision. Trends Cogn. Sci. (Regul. Ed.) 5, 177–184.

Yarbus, A. L. (1967). Eye Movements and Vision . New York: Plenum.

Yuille, A., and Kersten, D. (2006). Vision as Bayesian inference: analysis by synthesis? Trends Cogn. Sci. (Regul. Ed.) 10, 301–308.

Keywords: free energy, active inference, visual search, surprise, salience, exploration, Bayesian inference, perception

Citation: Friston K, Adams RA, Perrinet L and Breakspear M (2012) Perceptions as hypotheses: saccades as experiments. Front. Psychology 3 :151. doi: 10.3389/fpsyg.2012.00151

Received: 08 January 2012; Accepted: 26 April 2012; Published online: 28 May 2012.

Copyright: © 2012 Friston, Adams, Perrinet and Breakspear. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License , which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.

*Correspondence: Karl Friston, Wellcome Trust Centre for Neuroimaging, Institute of Neurology, Queen Square, London WC1N 3BG, UK. e-mail: k.friston@ucl.ac.uk

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

The Common Kind Theory and The Concept of Perceptual Experience

  • Original Research
  • Published: 25 October 2021
  • Volume 88, pages 2847–2865 (2023)

  • Neil Mehta, ORCID: orcid.org/0000-0002-6207-6778

In this paper, I advance a new hypothesis about what the ordinary concept of perceptual experience might be. To a first approximation, my hypothesis is that it is the concept of something that seems to present mind-independent objects. Along the way, I reveal two important errors in Michael Martin’s argument for the very different view that the ordinary concept of perceptual experience is the concept of something that is impersonally introspectively indiscriminable from a veridical perception. This conceptual work is significant because it provides three pieces of good news for the common kind theorist.

Martin ( 2004 , p. 37).

I will be focusing on Martin’s most detailed presentation of the argument, which is in his (2004, pp. 47–52). But see also Martin ( 2006 ), which briefly revisits parts of this argument.

The common kind theory has so many advocates that it would be tedious to cite them all. Still, for some paradigms, see Tye ( 1995 ), Schellenberg ( 2018 ).

The notion of a fundamental kind has been developed in several subtly different ways. See for example Martin ( 2006 , pp. 360–361), Brewer ( 2011 , p. 3), Logue ( 2012 b, p. 174) and ( 2013 , p. 109).

Metaphysical disjunctivists (or those who accept some nearby view) include Hinton ( 1967 ), Campbell ( 2002 ), Martin ( 2004 ) and ( 2006 ), Snowdon ( 2005 ), Fish ( 2009 ), Nudds ( 2009 ), Brewer ( 2011 ), Logue ( 2012 ), Allen ( 2015 ), Genone ( 2016 ), Miracchi ( 2017 ), Moran ( 2018 ), French and Gomes ( 2019 ). For a bracingly clear overview of different forms of metaphysical disjunctivism, see Soteriou ( 2016 ).

I use the expressions reasonably natural property and natural kind in the sense of Lewis ( 1983 ).

I use the term “entity” as an especially broad sortal that includes properties, objects, events, states, etc.

Martin dubs this the “immodest view” (2004, pp. 47–48). However, since this is precisely the view that I wish to defend, I prefer the less prejudicial label given above. In addition, Martin does not speak of just one property E ; he speaks of a whole host of properties E 1 … E n . But you can think of E as the conjunction of E 1 … E n .

For a few advocates of the common kind theory, see fn. 3.

For advocates of metaphysical disjunctivism, see fn. 5.

Again, see Martin ( 2004 , pp. 75–76) and ( 2006 , §5). Martin takes inspiration from Hinton ( 1967 ). I have departed from Martin’s presentation in a few minor ways, however. First, Martin dubs this the “modest view,” but I will argue that there is nothing particularly modest about it; thus I prefer the more informative label given in the text. Second, Martin inquires into the ordinary concept of a perceptual experience of a street scene , but for our purposes I find it more helpful to inquire more generally into the ordinary concept of perceptual experience .

See Martin ( 2004 , pp. 74–81) and ( 2006 , pp. 379–96).

Martin ( 2004 , p. 49).

Ibid, p. 50.

Ibid, p. 49.

Ibid, pp. 49–50.

Ibid, p. 51.

I thank my undergraduate student Xianda Wen for the astute observation that seemings are sometimes inconsistent.

I owe this concern to an anonymous referee.

Martin ( 2004 , pp. 50–51).

Ibid, pp. 50–51.

At least, assuming that some perceptual experiences exist. (Otherwise there are no events that instantiate E , so telling whether or not an event instantiates these properties might turn out to be very easy).

Martin ( 2004 , pp. 51–52).

See their (2008, p. 75).

Ibid, p. 78.

I thank an anonymous referee for this suggestion.

My objection presupposes that sensations are not perceptual experiences. Can Martin reinstate his argument by denying this? Perhaps—but at this point in the dialectic, the onus is on him to defend this claim. He does not do so.

See Martin ( 2004 , pp. 37–38).

See Siegel ( 2004 , p. 94). For a response, see Martin ( 2004 , pp. 80–81). For what it is worth, I believe that Martin’s response does not handle all of the problematic cases.

See Siegel ( 2008 , pp. 218–223). For responses, see Nudds ( 2009 , pp. 342–343); Soteriou ( 2016 , ch. 6).

See Sturgeon ( 2008 , p. 134). For a response, see Nudds ( 2009 , p. 342).

See Siegel ( 2008 , pp. 211–214). For a response, see Nudds ( 2009 , pp. 342–343). For the record, I believe that Siegel’s objection is correct.

This phenomenon is well-known, though it has been called many different things—Millar ( 2014 , p. 240) gives an especially perspicuous description of it under the heading of object-immediacy . For other influential descriptions of this phenomenon, see Broad ( 1952 , p. 6); Alston ( 1999 , p. 182); Sturgeon ( 2000 , p. 9); Martin ( 2002 , p. 413); Levine ( 2006 , p. 179); and Brewer ( 2011 , p. 2).

For more discussion of these matters, see Mackie ( 2019 ).

As an anonymous referee observes, this view is by no means irresistible. Another option is to say that perceptual experiences and sensations both simply seem to present objects (while remaining silent on their mind-independence); perhaps perceptual experiences and sensations even belong to the same fundamental kind. If this is right, then we might instead consider:

The variant presentational semantic view : It is a conceptual truth that what it is to be a perceptual experience or sensation is to seem to present objects. (The property of seeming to present objects is thus experience-grounding.) In addition, this property is introspectible, and it is not perception-dependent. For the sake of simplicity, however, I will continue to work with the view in the text.

See Bayern et al. ( 2018 ).

Notice that, on this view, seeming (or purporting ) to present mind-independent objects does not require concept-possession, but introspectively seeming to present such objects does require concept-possession.

For the record, I am not just being coy here: I am not a representationalist. I prefer a pluralist theory of perception, one that blends certain elements of naïve realism and representationalism. See Mehta ( ms ).

Some would reject this last claim. For instance, some will think that veridical perceptions do not seem to present mind-independent objects, but just objects, simpliciter . I discuss this idea in more detail in fn. 40.

This is not quite right, since it is possible to hallucinate an impossible object such as an Escher staircase. But, borrowing an idea from Martin ( 2004 , pp. 80–81), the objection could be reformulated into something like this: surely what it is to seem to present mind-independent objects is just to be exhaustively decomposable into parts that each seem to be perceptions. I will ignore this nuance in what follows.

This seeming is not introspective, so we can still allow that a perception might introspectively seem to present mind- dependent objects. Again, this is one way to understand the case in which the subject mistakes a perception of a faint ringing sound for a ringing sensation.

It is worth mentioning an alternative approach. We might say that what it is to be a perceptual experience is to seem to present external (rather than mind-dependent) objects (see fn. 40); that what it is to be a perception is in fact to present external objects; and that what it is to be a sensation is in fact to present internal objects. Perhaps hallucinations are a subclass of sensations—the ones that in fact present internal objects but seem to present external objects. This approach can allow that some perceptions present, and correctly seem to present, objects that are external but mind-dependent. So this approach lets us reject the biconditional claim that something seems to present mind-independent objects just in case it seems to be a perception. The approach can also allow us to say that perceptions and sensations can be introspectively mistaken for one another, since the seemings invoked in the account are not introspective.

Some experiences might seem to present mind-independent objects and mind-dependent ones. How would I account for these? I would say that they are mixtures of perceptual experiences and sensations. (It is not surprising to posit mixed experiences. It is for instance entirely possible to mix perceptual experiences and imaginative ones, by imagining coffee in a cup that I see to be empty.) However, another option is to say that it is possible to perceive , in an unmixed way, mind-independent objects and mind-dependent ones, as long as all of these objects are external. See fns. 40 and 47 for a way to develop this idea.

See Martin ( 2004 , p. 71).

Ibid, pp. 68–70.

For other metaphysical disjunctivist attempts to fill this lacuna, see Alston ( 1999 , p. 191); Fish ( 2009 , p. 94); Allen ( 2015 ).

I use the terms reasonably natural property and natural kind in the sense of Lewis ( 1983 ).

For further discussion of fundamental kinds, see Mehta ( 2021 ).

Allen, K. (2015). Hallucination and imagination. Australasian Journal of Philosophy, 93 (2), 287–302.

Alston, W. (1999). Back to the theory of appearing. Philosophical Perspectives, 13 , 181–203.

Bayern, A. M. P., Danel, S., Auersperg, A. M. I., et al. (2018). Compound tool construction by New Caledonian crows. Scientific Reports, 8 (15676), 1–8.

Braddon-Mitchell, D. (2003). Qualia and analytical conditionals. Journal of Philosophy, 100 (3), 111–135.

Brewer, B. (2011). Perception and its objects . Oxford University Press.

Broad, C. (1952). Some elementary reflexions on sense-perception. Philosophy, 27 , 3–17.

Byrne, A., & Logue, H. (2008). Either/or. In A. Haddock & F. Macpherson (Eds.), Disjunctivism: Perception, action, knowledge (pp. 57–94). Oxford University Press.

Campbell, J. (2002). Reference and consciousness . Oxford University Press.

Chalmers, D. (2012). Constructing the world . Oxford University Press.

Fish, W. (2009). Perception, hallucination, and illusion . Oxford University Press.

French, C., & Gomes, A. (2019). How naïve realism can explain both the particularity and the generality of experience. Philosophical Quarterly, 69 (274), 41–63.

Genone, J. (2016). Recent work on naïve realism. American Philosophical Quarterly, 53 (1), 1–24.

Hinton, J. (1967). Visual experiences. Mind, 76 , 217–227.

Jackson, F. (1998). From metaphysics to ethics: A defence of conceptual analysis . Clarendon Press.

Kripke, S. (1972). Naming and necessity . Harvard University Press.

Levine, J. (2006). Conscious awareness and self-representation. In U. Kriegel & K. Williford (Eds.), Self-representational approaches to consciousness (pp. 173–198). MIT Press.

Lewis, D. (1983). New work for a theory of universals. Australasian Journal of Philosophy, 61 (4), 343–377.

Lewis, D. (1984). Putnam’s paradox. Australasian Journal of Philosophy, 62 (3), 221–236.

Logue, H. (2012). What should the naïve realist say about total hallucinations? Philosophical Perspectives, 26 , 173–199.

Logue, H. (2013). Good news for the disjunctivist about (one of) the bad cases. Philosophy and Phenomenological Research, 86 (1), 105–133.

Mackie, P. (2019). Perception, mind-independence, and Berkeley. Australasian Journal of Philosophy, 98 (3), 449–464.

Martin, M. (2002). The transparency of experience. Mind and Language, 17 (4), 376–425.

Martin, M. (2004). The limits of self-awareness. Philosophical Studies, 120 , 37–89.

Martin, M. (2006). On being alienated. In T. Gendler & J. Hawthorne (Eds.), Perceptual experience (pp. 354–410). Oxford University Press.

Mehta, N. (2021). “Naïve realism with many fundamental kinds.” Acta Analytica (online).

Mehta (ms). The many problems of perception.

Millar, B. (2014). The phenomenological directness of perceptual experience. Philosophical Studies, 170 , 235–253.

Miracchi, L. (2017). Perception first. Journal of Philosophy, 114 (12), 629–677.

Moran, A. (2018). Naïve realism, hallucination, and causation: A new response to the screening off problem. Australasian Journal of Philosophy, 97 (2), 368–382.

Nudds, M. (2009). Recent work in perception: Naïve realism and its opponents. Analysis Reviews, 69 (2), 334–346.

Schellenberg, S. (2018). The unity of perception: content, consciousness, evidence . Oxford University Press.

Siegel, S. (2004). Indiscriminability and the phenomenal. Philosophical Studies, 120 (1–3), 91–112.

Siegel, S. (2008). The epistemic conception of hallucination. In A. Haddock & F. Macpherson (Eds.), Disjunctivism (pp. 205–224). Oxford University Press.

Snowdon, P. (2005). The formulation of disjunctivism: A response to fish. Proceedings of the Aristotelian Society, 105 (1), 129–141.

Soteriou, M. (2016). Disjunctivism . Routledge.

Sturgeon, S. (2000). Matters of mind . Routledge.

Sturgeon, S. (2008). Disjunctivism about visual experience. In A. Haddock & F. Macpherson (Eds.), Disjunctivism: Perception, action, knowledge (pp. 112–143). Oxford University Press.

Tye, M. (1995). Ten problems of consciousness: A representational theory of the phenomenal mind . MIT Press.

Author information

Authors and affiliations.

Yale-NUS College, Singapore, Singapore

Corresponding author

Correspondence to Neil Mehta .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Mehta, N. The Common Kind Theory and The Concept of Perceptual Experience. Erkenn 88 , 2847–2865 (2023). https://doi.org/10.1007/s10670-021-00480-z

Received : 13 January 2021

Accepted : 10 October 2021

Published : 25 October 2021

Issue Date : October 2023

DOI : https://doi.org/10.1007/s10670-021-00480-z

Perceptual Sets in Psychology

A perceptual set refers to a predisposition to perceive things in a certain way. In other words, we often tend to notice only certain aspects of an object or situation while ignoring other details.

What Is a Perceptual Set?

When it comes to our perceptions of the world around us, you might assume that what you see is what you get. However, in truth, research shows that the way you perceive the world through all of your senses is heavily influenced (and biased) by your own past experiences, expectations, motivations, beliefs, emotions, and even your culture.

For example, think about the last time you started a new class. Did you have any expectations at the outset that might have influenced your experience in the class? If you expected the class to be boring, were you more likely to be uninterested in it?

In psychology , this is what is known as a perceptual set.

A perceptual set is basically a tendency to view things only in a certain way.

What exactly is a perceptual set, why does it happen, and how does it influence how we perceive the world around us?

How do psychologists define perceptual sets?

"Perception can also be influenced by an individual's expectations, motives, and interests. The term perceptual set refers to the tendency to perceive objects or situations from a particular frame of reference," explains authors Susan Nolan and Sandra Hockenbury of the textbook  Discovering Psychology .

Sometimes, perceptual sets can be helpful. They often lead us to make fairly accurate conclusions about what exists in the world around us. In cases where we find ourselves wrong, we often develop new perceptual sets that are more accurate.

Sometimes, our perceptual sets can lead us astray.  

If you have a strong interest in military aircraft, for example, an odd cloud formation in the distance might be interpreted as a fleet of fighter jets, whereas someone else may see it as a group of migrating ducks in flight.

In one experiment that illustrates this tendency, participants were presented with different non-words, such as sael . Those who were told that they would be reading boating-related words read the word as "sail," while those who were told to expect animal-related words read it as "seal."

A perceptual set is a good example of what is known as top-down processing . In top-down processing, perceptions begin with the most general and move toward the more specific. Such perceptions are heavily influenced by context, expectations, and prior knowledge.

If we expect something to appear in a certain way, we are more likely to perceive it according to our expectations.

Existing schemas , mental frameworks, and concepts often guide perceptual sets. For example, people have a strong schema for faces, making it easier to recognize familiar human faces in the world around us. It also means that when we look at an ambiguous image, we are more likely to see it as a face than some other type of object.
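A toy calculation makes this logic concrete. The sketch below is purely illustrative (the numbers and the function are invented, not taken from any study cited here): when the sensory evidence is ambiguous, the prior expectation, i.e., the perceptual set, largely determines which interpretation wins.

    # Toy illustration (invented numbers): with ambiguous evidence, the prior
    # expectation largely decides the interpretation - the formal core of a
    # perceptual set in top-down processing.

    def posterior_face(prior_face, lik_face=0.4, lik_other=0.6):
        """P(face | evidence) by Bayes' rule for two competing interpretations."""
        joint_face = lik_face * prior_face
        joint_other = lik_other * (1.0 - prior_face)
        return joint_face / (joint_face + joint_other)

    # The evidence weakly favours "not a face" (0.4 vs. 0.6), yet the verdict
    # flips with the observer's expectation:
    print(posterior_face(prior_face=0.8))  # ~0.73 -> "I see a face"
    print(posterior_face(prior_face=0.2))  # ~0.14 -> "just an object"

The point of the sketch is only that weakly informative evidence leaves the decision to the prior; with strong, unambiguous evidence the likelihoods would dominate and the expectation would matter much less.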

Researchers have also found that when multiple items appear in a single visual scene, perceptual sets will often lead people to miss additional items after locating the first one. For example, airport security officers might be likely to spot a water bottle in a bag but then miss that the bag also contains a firearm.  

Forces of Influence 

Below are examples of various forces of influence:

  • Motivation can play an important role in perceptual sets and how we interpret the world around us. If we are rooting for our favorite sports team, we might be motivated to view members of the opposing team as overly aggressive, weak, or incompetent. In one classic experiment, researchers deprived participants of food for several hours. When they were later shown a set of ambiguous images, those who had been food-deprived were far more likely to interpret the images as food-related objects. Because they were hungry, they were more motivated to see the images in a certain way.
  • Expectations also play an important role. If we expect people to behave in certain ways in certain situations, these expectations can influence how we perceive these people and their roles. One of the classic experiments on the impact of expectation on perceptual sets involved showing participants either a series of numbers or letters. Then, the participants were shown an ambiguous image that could either be interpreted as the number 13 or the letter B. Those who had viewed the numbers were more likely to see it as a 13, while those who had viewed the letters were more likely to see it as the letter B.
  • Culture also influences how we perceive people, objects, and situations. Surprisingly, researchers have found that people from different cultures even tend to perceive perspective and depth cues differently.
  • Emotions can have a dramatic impact on how we perceive the world around us. For example, if we are angry, we might be more likely to perceive hostility in others. One experiment demonstrated that when people came to associate a nonsense syllable with mild electrical shocks, they experienced physiological reactions to the syllable even when it was presented subliminally.
  • Attitudes can also have a powerful influence on perception. In one experiment, Gordon Allport demonstrated that prejudice could have an influence on how quickly people categorize people of various races.

Real-Life Examples

Researchers have shown that perceptual sets can have a dramatic impact on day-to-day life.

In one experiment, young children were found to enjoy french fries more when they were served in a McDonald's bag rather than just a plain white bag. In another example, people who were told that an image was of the famed "Loch Ness monster" were more likely to see the mythical creature in the picture, while others, who did not have the expectation of seeing a sea creature, saw only a curved tree trunk.

"Once we have formed a wrong idea about reality, we have more difficulty seeing the truth.

As previously mentioned, our perceptual set for faces is so strong that it actually causes us to see faces where there are none. Consider how people often describe seeing a face on the moon or in many of the inanimate objects that we encounter in our everyday lives.

As you can see, perception is not simply a matter of seeing what is in the world around us. A variety of factors can influence how we take in information and how we interpret it, as stimuli are filtered through our personal knowledge, expectations, emotions, and context.

Biggs A, Adamo S, Dowd E, Mitroff S. Examining perceptual and conceptual set biases in multiple-target visual search .  Attention, Perception, & Psychophysics . 2015;77(3):844-855. doi:10.3758/s13414-014-0822-0

Nolan SA, Hockenbury SE. Discovering Psychology . Worth Publishers, 2021.

Hardy M, Heyes S.  Beginning Psychology . Oxford University Press; 1999.

Gaspelin N, Luck SJ. "Top-down" does not mean "voluntary". J Cogn. 2018;1(1):25. doi:10.5334/joc.28

Sanford R. The effects of abstinence from food upon imaginal processes: A preliminary experiment .  J Psychol . 1936;2(1):129-136. doi:10.1080/00223980.1936.9917447

Bruner J, Minturn A. Perceptual identification and perceptual organization .  J Gen Psychol . 1955;53(1):21-28. doi:10.1080/00221309.1955.9710133

de Bruïne G, Vredeveldt A, van Koppen PJ. Cross-cultural differences in object recognition: Comparing asylum seekers from Sub-Saharan Africa and a matched Western European control group .  Appl Cogn Psychol . 2018;32(4):463‐473. doi:10.1002/acp.3419

Lazarus RS, McCleary RA. Autonomic discrimination without awareness: A study of subception .  Psychological Review.  1951;58(2):113–122. doi:10.1037/h0054104

Barlow FK, Hornsey MJ, Thai M, Sengupta NK, Sibley CG. The wallpaper effect: The contact hypothesis fails for minority group members who live in areas with a high proportion of majority group members .  PLoS One . 2013;8(12):e82228. doi:10.1371/journal.pone.0082228

Solomon MR, Russell-Bennett R, Previte J.  Consumer Behaviour: Buying, Having, Being . Frenchs Forest, NSW: 2013.

Campbell S. The Loch Ness Monster: The Evidence . Prometheus Books; 1997.

Myers DG.  Psychology . 7th ed. Worth Publishers; 2004.

By Kendra Cherry, MSEd Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Understanding human perception by human-made illusions

Claus-Christian Carbon

1 Department of General Psychology and Methodology, University of Bamberg, Bamberg, Germany

2 Bamberg Graduate School of Affective and Cognitive Sciences (BaGrACS), Bamberg, Germany

It may be fun to perceive illusions, but the understanding of how they work is even more stimulating and sustainable: They can tell us where the limits and capacity of our perceptual apparatus are found—they can specify how the constraints of perception are set. Furthermore, they let us analyze the cognitive sub-processes underlying our perception. Illusions in a scientific context are not mainly created to reveal the failures of our perception or the dysfunctions of our apparatus, but instead point to the specific power of human perception. The main task of human perception is to amplify and strengthen sensory inputs to be able to perceive, orientate and act very quickly, specifically and efficiently. The present paper strengthens this line of argument, strongly put forth by perceptual pioneer Richard L. Gregory (e.g., Gregory, 2009 ), by discussing specific visual illusions and how they can help us to understand the magic of perception.

About the veridicality of perception

The relationship between reality and object.

Sensory perception is often the most striking proof of something factual—when we perceive something, we interpret it and take it as “objective”, “real”. Most obviously, you can experience this with eyewitness testimonies: if an eyewitness has “seen it with the naked eye”, judges, jury members and attendees take the reports of these percepts not only as strong evidence, but usually as fact—despite the active and biasing processes that operate on the basis of perception and memory. Indeed, it seems that there is no better “proof” of something being factual knowledge than having perceived it. The assumed link between perception and physical reality is particularly strong for the visual sense—in fact, we scrutinize it only when sight conditions have been unfortunate, when people have bad vision or when we know that the eyewitness was under stress or was lacking in cognitive faculties. When people need even more proof of reality than via the naked eye, they intuitively try to touch the to-be-analyzed entity (if at all possible) in order to investigate it haptically. Feeling something by touch seems to be the ultimate perceptual experience in order for humans to speak of physical proof (Carbon and Jakesch, 2013).

We can analyze the quality of our perceptual experiences by standard methodological criteria. By doing so we regularly find that our perception is indeed mostly very reliable and also objective (Gregory and Gombrich, 1973)—but only if we employ standard definitions of “objective” as being consensual among different beholders. Still, even by meeting these methodological criteria, we cannot establish anything about physical reality. It seems that knowledge about the physical properties of objects cannot be gained by perception, so perception is neither “veridical” nor “valid” in the strict sense of the words—the properties of the “thing in itself” remain indeterminate in any empirical sense (Kant, 1787/1998). We might “reliably” and “objectively” perceive the sun going up in the morning and down in the evening, yet the physical relations are definitely different, as we have known at least since Nicolaus Copernicus proposed heliocentrism. It might also be common sense for most people that the Earth is a spheroid, yet the majority of people have neither perceived the Earth as spherical nor represented it as such; one reason for this is that in everyday life contexts the illusion of a plane works perfectly well to guide us in the planning and execution of our actions (Carbon, 2010b).

Limitations of the possibility of objective perception

The limitations of perception are even more far-reaching: our perception is not only limited when we do not have access to the thing in itself, it is also very practically limited by the quality of processing and the general specifications of our perceptual system. For instance, our acoustic sense can only register and process a very narrow band of frequencies, ranging from about 16 Hz to 20 kHz in a young adult—and this band gets narrower and narrower with increasing age. Typically, infrasonic and ultrasonic bands are simply not perceivable, despite being essential for other species such as elephants and bats, respectively. The perception of the environment and, consequently, the perception and representation of the world as such, is different for these species—what would be the favorite music of an elephant, and which preference would a bat indicate if “honestly asked”? What does infrasonic acoustics sound and feel like? Note that infrasonic frequencies can also be perceived by humans, not acoustically in a strict sense but via vibrations—still, the resulting experiences are very different (cf. Nagel, 1974). To make such information accessible we need transformation techniques; for instance, a Geiger-Müller tube for making ionizing radiation perceivable, as we have not developed any sensory system for detecting and feeling this band of extremely high-frequency electromagnetic radiation.

But even if we have access to given information from the environmental world, it would be an illusion to think of “objective perception” of it—differences in perception across different individuals seem to be obvious: this is one reason for different persons having different tastes, but it goes even further: even within the lifetime of one person, the perceptual qualities and quantities which we can process change. Elderly people, for instance, often have yellowish corneas, yielding biased color perception and reducing the ability to detect and differentiate bluish color spectra. So even objectivity of perceptions in the sense of consensual experience is hardly achievable, even within one species, even within one individual—just think of fashion phenomena (Carbon, 2011a), of changes in taste (Martindale, 1990) or of the so-called cycle of preferences (Carbon, 2010a)! Clearly, so-called objective perception is impossible; it is an illusion.

Illusory construction of the world

The problem with the idea of veridical perception of the world is further intensified when we take into account additional perceptual phenomena which demonstrate the highly constructive qualities of our perceptual system. A very prominent example of this kind is the perceptual effect which arises when visual information that we want to process falls on the area of the retina where the so-called blind spot is located (see Figure 1).

Figure 1. Demonstration of the blind spot, the area on the retina where visual information cannot be processed due to a lack of photoreceptors. The demonstration works as follows: fixate, at a distance of approximately 40 cm, the X on the left side with your right eye while keeping your left eye closed; now move your head slightly in a horizontal way from left to right and back until the black disc on the right side seems to vanish.

Interestingly, visual information that is mapped onto the blind spot is not just dropped—this would be the easiest solution for the visual apparatus. Nor is it rigidly interpolated, for instance by just duplicating neighboring information; instead it is intelligently complemented by analyzing the meaning and Gestalt of the context. If we are, for example, exposed to a couple of lines, the perceptual system complements the physically non-existing information at the blind spot with a best-guess heuristic of how the lines are interconnected in each case, mostly yielding a very close approximation to “reality” because it uses the most probable solution. In the end, we experience clear visual information, seemingly of the same quality as that which mirrors physical input—indeed, the “physical perception” and the “constructed perception” are of the same quality, because the “physical perception” is not a depiction of physical reality either, but is likewise constructed by top-down processes based on best-guess heuristics, a kind of hypothesis testing or problem solving (Gregory, 1970).

Besides this prominent example, which has by now become common knowledge, a series of further phenomena exist where we can speak of full perceptual constructions of the world outside without any direct link to the physical realities. A very intriguing example of this kind is described in more detail in the following: when we make fast eye movements (so-called saccades), our perceptual system is suppressed, with the result that we are functionally blind during such saccades. We do not perceive these blind moments of life, although they are highly frequent and, taken together, relatively long: Rayner et al. estimated that typical fixations last about 200–250 ms and saccades last about 20–40 ms (Rayner et al., 2001), so about 10% of our waking time is susceptible to such suppression effects. In accordance with other filling-in phenomena, missing data are filled up with the most plausible information: such a process needs hypotheses about what is going on in the current situation and how the situation will evolve (Gregory, 1970, 1990). If the hypotheses are misleading because the underlying mental model of the situation and its further genesis is incorrect, we face an essential problem: what we then perceive (or fail to perceive) is incompatible with the current situation and will mislead our upcoming action. In the most extreme cases, this could lead to fatal decisions: for instance, if the model does not construct a specific interfering object in our movement axis, we might miss information essential to changing our current trajectory, resulting in a collision course. In such a constellation, we would be totally startled by the crash, as we would not have perceived the target object at all—this is not about missing an object but about entirely overlooking it due to a non-existing trace of perception.
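For readers who want to see where the "about 10%" figure comes from, here is the back-of-the-envelope arithmetic, assuming one saccade per fixation and taking the midpoints of the cited ranges:

    # Rough check of the "about 10%" estimate (one saccade per fixation).
    fixation_ms = (200 + 250) / 2   # typical fixation duration (Rayner et al., 2001)
    saccade_ms = (20 + 40) / 2      # typical saccade duration
    fraction = saccade_ms / (fixation_ms + saccade_ms)
    print(f"{fraction:.1%}")        # ~11.8%, i.e., roughly 10% of waking time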

Despite knowing about these characteristics of the visual system, we might doubt such processes, because the mechanisms work so well in most everyday situations that they provide the perfect illusion of continuous, correct and super-detailed visual input. We can, however, illustrate this mechanism very easily by just observing our eye movements in a mirror: when executing fast eye movements, we cannot observe them by directly inspecting our face in the mirror—we can only perceive our fixations and the slow movements of the eyes. If we, however, film the same scene with a video camera, the whole procedure looks totally different: now we clearly see the fast movements as well; so we can directly experience this specific operation of the visual system by comparing the same scene captured by two differently working visual systems: our own, very cognitively operating, visual system and the rigidly filming video system, which just catches the scene frame by frame without further processing, interpreting and tuning it. We call this moment of temporary functional blindness “saccade blindness” or “saccadic suppression” (Bridgeman et al., 1975; cf. “tactile suppression”, Ziat et al., 2010), which again illustrates the illusory aspects of human perception. We can utilize this phenomenon for testing interesting hypotheses on the mental representation of the visual environment: if we change details of a visual display during such functionally blind phases of saccadic movements, people usually do not become aware of such changes, even if very important details, e.g., the expression of the mouth, are changed (Bohrn et al., 2010).

Illusions by top-down-processes

Gregory proposed that perception shows the quality of hypothesis testing and that illusions make clear to us how these hypotheses are formulated and on which data they are based (Gregory, 1970). One of the key assumptions for hypothesis testing is that perception is a constructive process depending on top-down processing. Such top-down processes can be guided through knowledge gained over the years, but perception can also be guided by pre-formed capabilities of binding and interpreting specific forms as certain Gestalts. The strong reliance of perception on top-down processing is the essential key for assuring reliable perceptual abilities in a world full of ambiguity and incompleteness. If we read a text from an old facsimile where some of the letters have vanished or bleached out over the years, where coffee stains have covered partial information and where decay processes have turned the originally white paper into a yellowish crumbly substance, we might still be very successful in reading the fragments of the text, because our perceptual system interpolates and (re-)constructs (see Figure 2). If we know or understand the general meaning of the target text, we will even read over some passages that do not exist at all: we fill the gaps through our knowledge—we change the meaning towards what we expect.

Figure 2. Demonstration of top-down processing when reading the statement “The Grand Illusion” under highly challenging conditions (at least challenging for automatic character recognition).

A famous example which is often cited and shown in this realm is the so-called man-rat illusion, in which an ambiguous sketch drawing is presented whose content is not clearly decipherable but switches between showing a man and showing a rat—another popular example of this kind is the bistable picture in which the interpretation flips from an old woman to a young woman and vice versa (see Figure 3). Most people interpret this example as a fascinating illusion demonstrating humans’ capability of switching from one meaning to another, but the example also demonstrates an even more intriguing process: what we perceive at first glance is mainly guided by the specific activation of our semantic network. If we have been exposed to a picture of a man before, or if we think of a man or have heard the word “man”, the chance is strongly increased that our perceptual system interprets the ambiguous pattern as a depiction of a man—if the prior experiences were more associated with a rat, a mouse or another animal of that kind, we will, in contrast, tend to interpret the ambiguous pattern more as a rat.

Figure 3. The young-old-woman illusion (also known as the “My Wife and My Mother-In-Law” illusion), already popular in Germany in the 19th century, when it was frequently depicted on postcards. Boring (1930) was the first to present this illusion in a scientific context (image on the right), calling it a “new” illusion (concretely, “a new ambiguous figure”), although it was very probably taken from an image already in circulation in the 19th century within an A and P Condensed Milk advertisement (Lingelbach, 2014).

So, we can literally say that we perceive what we know—if we have no prior knowledge of certain things we can even overlook important details in a pattern because we have no strong association with something meaningful. The intimate processing between sensory inputs and our semantic networks enables us to recognize familiar objects within a few milliseconds, even if they show the complexity of human faces (Locher et al., 1993 ; Willis and Todorov, 2006 ; Carbon, 2011b ).

Top-down processes are powerful in schematizing and easing-up perceptual processes in the sense of compressing the “big data” of the sensory inputs towards tiny data packages with pre-categorized labels on such schematized “icons” (Carbon, 2008 ). Top-down processes, however, are also susceptible to characteristic fallacies or illusions due to their guided, model-based nature: When we have only a brief time slot for a snapshot of a complex scene, the scene is (if we have associations with the general meaning of the inspected scene at all) so simplified that specific details get lost in favor of the processing and interpretation of the general meaning of the whole scene.

Biederman (1981) impressively demonstrated this by exposing participants to a sketch drawing of a typical street scene in which typical objects were placed in a prototypical setting, with the exception that a visible hydrant in the foreground was positioned not on the pavement beside a car but, unusually, directly on the car. When people were exposed to such a scene for only 150 ms, followed by a scrambled backward mask, they “re-arranged” the setting by top-down processes based on their knowledge of hydrants and their typical positions on pavements. In this specific case, people were indeed deceived, because they reported a scene which was in accordance with their knowledge but not with the presented scene—but for everyday actions this seems unproblematic. Although we might indeed lose the link to the fine-detailed structure of a specific entity when strongly relying on top-down processes, such an endeavor works quite brilliantly in most cases, as it is a best-guess estimation or approximation—and it works particularly well when we are running out of resources, e.g., when we are pressed for time and/or engaged in a series of other cognitive processes. Actually, such a mode is the standard mode in everyday life. However, even if we had the time and no other processes needed to be executed, we would not be able to adequately process the big data of the sensory input.

The whole idea of this top-down processing with schematized perception stems from F. C. Bartlett’s pioneering series of experiments in a variety of domains (Bartlett, 1932 ). Bartlett already showed that we do not read the full information from a visual display or a narrative, but that we rely on schemata reflecting the essence of things, stories, and situations being strongly shaped by prior knowledge and its specific activation (see for a critical reflection of Bartlett’s method Carbon and Albrecht, 2012 ).

Perception as a grand illusion

Reconstructing human psychological reality.

There is clearly an enormous gap between the big data provided by the external world and our strictly limited capacity to process them. The gap widens even further when taking into account that we not only have to process the data but ultimately have to make clear sense of the core of the given situation. The goal is to make one (and only one) decision based on the unambiguous interpretation of this situation in order to execute an appropriate action. This very teleological way of processing needs inhibitory capabilities for competing interpretations to strictly favor one single interpretation which enables fast action without quarrelling about alternatives. In order to realize such a clear interpretation of a situation, we need a mental model of the external world which is very clear and without ambiguities and indeterminacies. Ideally, such a model is a kind of caricature of physical reality: If there is an object to be quickly detected, the figure-ground contrast, e.g., should be intensified. If we need to identify the borders of an object under unfavorable viewing conditions, it is helpful to enhance the transitions from one border to another, for instance. If we want to easily diagnose the ripeness of a fruit desired for eating, it is most helpful when color saturation is amplified for familiar kinds of fruits. Our perceptual system has exactly such capabilities of intensifying, enhancing and amplifying—the result is the generation of schematic, prototypical, sketch-like perceptions and representations. Any metaphor for perception as a kind of tool which makes photos is fully misleading because perception is much more than blueprinting: it is a cognitive process aiming at reconstructing any scene at its core.

All these “intelligent perceptual processes” can most easily be demonstrated by perceptual illusions. For instance, when we look at the inner horizontal bar of Figure 4, we observe a continuous shift from light to dark gray from left to right, although there is no physical change in the gray value; in fact, only one gray value is used to create this region. The illusion is induced by the distribution of the peripheral gray values, which do show a continuous shift of gray levels, although in the reverse direction. The phenomenon of simultaneous contrast makes the contrast clearer for us, helping us to identify figure-ground relations more easily, more quickly and more reliably.

Figure 4.

Demonstration of simultaneous contrast, an optical illusion already described as a phenomenon some 200 years ago by Johann Wolfgang von Goethe and presented in high quality and with an intense effect by McCourt (1982): the inner horizontal bar is physically filled with the same gray value throughout; nevertheless, the periphery, with its continuous change of gray from darker to lighter values from left to right, induces the perception of a reverse continuous change of gray values. The first to show the effect in a staircase of gray levels was probably Ewald Hering (see Hering, 1907; I. Teil, XII. Kap., Tafel II), who also proposed the theory of opponent color processing.
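For readers who want to try this out, a minimal sketch (not part of the original paper) that generates a comparable simultaneous-contrast stimulus with NumPy and matplotlib might look like this; the gray values and bar size are arbitrary choices:

```python
# Build a background that ramps from dark to light and overlay a physically
# uniform gray bar; the bar then appears to ramp in the opposite direction.
import numpy as np
import matplotlib.pyplot as plt

height, width = 200, 600
background = np.tile(np.linspace(0.2, 0.8, width), (height, 1))  # left-to-right luminance ramp
image = background.copy()
image[80:120, :] = 0.5                                           # constant gray value everywhere in the bar

plt.imshow(image, cmap="gray", vmin=0, vmax=1)
plt.axis("off")
plt.show()
```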

A similar principle of intensifying given physical relations by the perceptual system is now known as the Chevreul-Mach bands (see Figure 5), independently introduced by the chemist Michel Eugène Chevreul (see Chevreul, 1839) and by the physicist and philosopher Ernst Waldfried Josef Wenzel Mach (Mach, 1865). Via the process of lateral inhibition, luminance changes from one bar to the next are exaggerated, specifically at the edges of the bars. This helps to differentiate between the different areas and to trigger edge detection of the bars.

Figure 5.

Chevreul-Mach bands. Demonstration of contrast exaggeration by lateral inhibition: although every bar is filled with one solid level of gray, we perceive narrow bands at the edges with increased contrast which does not reflect the physical reality of solid gray bars.
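The contrast exaggeration produced by lateral inhibition can also be illustrated with a toy simulation. The sketch below (an illustration only; the center-surround kernel is an arbitrary stand-in, not a physiological model) convolves a luminance staircase with a simple center-surround filter, producing the over- and undershoots at each step that correspond to the perceived Mach bands:

```python
# Convolve a luminance staircase with a crude center-surround kernel to mimic
# lateral inhibition: the response overshoots on the bright side of each edge
# and undershoots on the dark side, although every bar is physically uniform.
import numpy as np
import matplotlib.pyplot as plt

staircase = np.repeat(np.linspace(0.2, 0.8, 5), 100)   # five uniform gray bars

center = np.zeros(21); center[10] = 1.5                 # excitatory center
surround = np.ones(21) / 21 * 0.5                       # inhibitory surround
response = np.convolve(staircase, center - surround, mode="same")

plt.plot(staircase, label="physical luminance")
plt.plot(response, label="response after lateral inhibition")
plt.legend(); plt.xlabel("position"); plt.ylabel("value")
plt.show()
```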

Constructing human psychological reality

This reconstructive capability is impressive and helps us to get rid of ambiguous or indeterminate percepts. However, the power of perception is even more intriguing when we look at a related phenomenon. When we analyze perceptual illusions in which entities or relations are not merely enhanced in their recognizability but are entirely constructed without any physical correspondence, we can quite rightly speak of the “active construction” of human psychological reality. A very prominent example is the Kanizsa triangle (Figure 6), in which we clearly perceive illusory contours and the related Gestalts, although none of them exists at all in a physical sense. The illusion is so strong that we have the feeling of being able to grasp the whole configuration.

Figure 6.

Demonstration of illusory contours which create the clear perception of Gestalts. The so-called Kanizsa triangle, named after Gaetano Kanizsa (see Kanizsa, 1955), is a famous example from a long tradition of such figures, displayed over centuries in architecture, fashion and ornamentation. We not only perceive two triangles, but also interpret the whole configuration as one with clear depth, with the solid white “triangle” in the foreground lying on top of a second, inverted “triangle”.
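Because only three notched discs exist physically, such a figure is easy to construct. A hypothetical matplotlib sketch of a Kanizsa-style stimulus (sizes and positions chosen arbitrarily) could look like this:

```python
# Draw three black "pac-man" discs whose notches face the centroid of an
# equilateral triangle; an illusory white triangle is perceived on top of them.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Wedge

r = 0.35  # distance of each disc center from the centroid at (0.5, 0.5)
corners = np.array([[0.5, 0.5 + r],
                    [0.5 - r * np.sin(np.pi / 3), 0.5 - r / 2],
                    [0.5 + r * np.sin(np.pi / 3), 0.5 - r / 2]])

fig, ax = plt.subplots(figsize=(4, 4))
for cx, cy in corners:
    angle = np.degrees(np.arctan2(0.5 - cy, 0.5 - cx))   # direction towards the centroid
    # Disc with a 60-degree notch centered on that direction.
    ax.add_patch(Wedge((cx, cy), 0.12, angle + 30, angle + 330, color="black"))

ax.set_xlim(0, 1); ax.set_ylim(0, 1)
ax.set_aspect("equal"); ax.axis("off")
plt.show()
```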

Detecting and recognizing such Gestalts is very important for us. Fortunately, we are not only equipped with a cognitive mechanism that helps us to perceive such Gestalts; we also feel rewarded when we have recognized them as Gestalts despite indeterminate patterns (Muth et al., 2013): at the moment of insight into a Gestalt, the now-determinate pattern gains in liking (the so-called “Aesthetic-Aha-effect”, Muth and Carbon, 2013). The detection and recognition process adds affective value to the pattern, which leads to the activation of even more cognitive energy to deal with it, as it now means something to us.

Conclusions

Perceptual illusions can be seen, interpreted and used in two very different ways. On the one hand, and this is the property commonly assigned to illusions, they are used to entertain people; they are part of our everyday culture and can kill time. On the other hand, they are often the starting point for creating insights, and insights, especially when they are actively grounded in personal experience through elaborative processes, are perfect preconditions for increasing understanding and for improving and optimizing mental models (Carbon, 2010b). We can even combine both aspects to create an attractive learning context: by drawing people’s attention with arousing and playful illusions, we generate attraction towards the phenomena underlying them. If people become really interested, they will also invest sufficient time and cognitive energy to solve an illusion or to get an idea of how it works. If they arrive at a higher state of insight, they will benefit from understanding which perceptual mechanism underlies the phenomenon.

We can of course interpret perceptual illusions as malfunctions indicating the typical limits of our perceptual or cognitive system; this is probably the standard perspective on the whole area of illusions. In this view, our systems are fallible, slow, malfunctioning, and imperfect. We can, however, also interpret illusory perceptions as a sign of our incredible, highly complex and efficient capability to transform sensory inputs into an understanding and interpretation of the current situation very quickly, in order to generate adequate, goal-leading actions in good time (see Gregory, 2009); this view is not yet the standard one found in beginners’ textbooks and in typical descriptions or non-scientific papers on illusions. By taking into account how well we act in most everyday situations, we can experience the high “intelligence” of the perceptual system quite easily and intuitively. We may not possess the most perfect system for reproducing the fine details of a scene, but we can assess the core meaning of a complex scene.

Typical perceptual processes work so brilliantly that we can mostly act appropriately and, very importantly for a biological system, we can act in response to sensory inputs very fast. This is the benchmark that any technical, man-made system has to meet, and it will always be the most important benchmark for artificial perceptual systems. Following the research and engineering program of bionics (Xie, 2012), in which systems and processes found in nature are transferred to technical products, we might be well advised to orient developments in the field of perception towards the characteristic processing of biological perceptual systems and their typical behavior when perceptual illusions are encountered.

Conflict of interest statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This paper was strongly inspired by Richard L. Gregory’s talks, texts and theories, which I particularly enjoyed during the first years of my research career. The outcome of these “perceptions” changed my “perception of reality”, and thus “reality” as such. I would also like to thank two anonymous reviewers who put much effort into assisting me to improve a previous version of this paper. Last but not least, I want to express my gratitude to Baingio Pinna, University of Sassari, who edited the whole Research Topic together with Adam Reeves, Northeastern University, USA.

1 There is an interesting technological update for demonstrating this effect, put forward by one of the reviewers. If you use the second camera of your smartphone (the one for shooting “selfies”) or your notebook camera and look very closely at your depicted eyes, the delay in building up the video image is apparently slightly longer than the saccadic suppression, yielding the interesting effect of perceiving your own eye movements directly. Note: I have tried it out and it worked, by the way, best when using older models, which may take longer to build up the images. You will perceive your eye movements particularly clearly when executing relatively large saccades, e.g., from the left periphery to the right and back.

  • Bartlett F. C. (1932). Remembering: A Study in Experimental and Social Psychology. Cambridge: Cambridge University Press.
  • Biederman I. (1981). “On the semantics of a glance at a scene,” in Perceptual Organization, eds Kubovy M., Pomerantz J. R. (Hillsdale, NJ: Lawrence Erlbaum), 213–263.
  • Bohrn I., Carbon C. C., Hutzler F. (2010). Mona Lisa’s smile—perception or deception? Psychol. Sci. 21, 378–380. doi: 10.1177/0956797610362192
  • Boring E. G. (1930). A new ambiguous figure. Am. J. Psychol. 42, 444–445. doi: 10.2307/1415447
  • Bridgeman G., Hendry D., Stark L. (1975). Failure to detect displacement of visual world during saccadic eye movements. Vision Res. 15, 719–722. doi: 10.1016/0042-6989(75)90290-4
  • Carbon C. C. (2008). Famous faces as icons. The illusion of being an expert in the recognition of famous faces. Perception 37, 801–806. doi: 10.1068/p5789
  • Carbon C. C. (2010a). The cycle of preference: long-term dynamics of aesthetic appreciation. Acta Psychol. (Amst) 134, 233–244. doi: 10.1016/j.actpsy.2010.02.004
  • Carbon C. C. (2010b). The earth is flat when personally significant experiences with the sphericity of the earth are absent. Cognition 116, 130–135. doi: 10.1016/j.cognition.2010.03.009
  • Carbon C. C. (2011a). Cognitive mechanisms for explaining dynamics of aesthetic appreciation. Iperception 2, 708–719. doi: 10.1068/i0463aap
  • Carbon C. C. (2011b). The first 100 milliseconds of a face: on the microgenesis of early face processing. Percept. Mot. Skills 113, 859–874. doi: 10.2466/07.17.22.pms.113.6.859-874
  • Carbon C. C., Albrecht S. (2012). Bartlett’s schema theory: the unreplicated “portrait d’homme” series from 1932. Q. J. Exp. Psychol. (Hove) 65, 2258–2270. doi: 10.1080/17470218.2012.696121
  • Carbon C. C., Jakesch M. (2013). A model for haptic aesthetic processing and its implications for design. Proc. IEEE 101, 2123–2133. doi: 10.1109/jproc.2012.2219831
  • Chevreul M.-E. (1839). De La loi du Contraste Simultané des Couleurs et de L’assortiment des Objets Colorés: Considéré D’après cette loi Dans ses rapports avec La peinture, les Tapisseries des Gobelins, les Tapisseries de Beauvais pour meubles, les Tapis, la Mosaique, les Vitraux colorés, L’impression des étoffes, L’imprimerie, L’enluminure, La decoration des édifices, L’habillement et L’horticulture. Paris, France: Pitois-Levrault.
  • Gregory R. L. (1970). The Intelligent Eye. London: Weidenfeld and Nicolson.
  • Gregory R. L. (1990). Eye and Brain: The Psychology of Seeing. 4th Edn. Princeton, NJ: Princeton University Press.
  • Gregory R. L. (2009). Seeing Through Illusions. Oxford: Oxford University Press.
  • Gregory R. L., Gombrich E. H. (1973). Illusion in Nature and Art. London, UK: Gerald Duckworth and Company Ltd.
  • Hering E. (1907). Grundzüge der Lehre vom Lichtsinn. Sonderabdruck a.d. ‘Handbuch der Augenheilkunde’. Leipzig, Germany: W. Engelmann.
  • Kanizsa G. (1955). Margini quasi-percettivi in campi con stimolazione omogenea. Riv. Psicol. 49, 7–30.
  • Kant I. (1787/1998). Kritik der reinen Vernunft [Critique of Pure Reason]. Hamburg: Meiner.
  • Lingelbach B. (2014). The barn. Perception. doi: 10.1068/p7743
  • Locher P., Unger R., Sociedade P., Wahl J. (1993). At 1st glance: accessibility of the physical attractiveness stereotype. Sex Roles 28, 729–743. doi: 10.1007/bf00289990
  • Mach E. (1865). Über die Wirkung der räumlichen Vertheilung des Lichtreizes auf die Netzhaut. Sitzungsberichte der Mathematisch-Naturwissenschaftlichen Classe der Kaiserlichen Akademie der Wissenschaften 52, 303–322.
  • Martindale C. (1990). The Clockwork Muse: The Predictability of Artistic Change. New York: Basic Books.
  • McCourt M. E. (1982). A spatial-frequency dependent grating-induction effect. Vision Res. 22, 119–134. doi: 10.1016/0042-6989(82)90173-0
  • Muth C., Carbon C. C. (2013). The aesthetic aha: on the pleasure of having insights into Gestalt. Acta Psychol. (Amst) 144, 25–30. doi: 10.1016/j.actpsy.2013.05.001
  • Muth C., Pepperell R., Carbon C. C. (2013). Give me Gestalt! Preference for cubist artworks revealing high detectability of objects. Leonardo 46, 488–489. doi: 10.1162/leon_a_00649
  • Nagel T. (1974). What is it like to be a bat? Philos. Rev. 83, 435–450.
  • Rayner K., Foorman B. R., Perfetti C. A., Pesetsky D., Seidenberg M. S. (2001). How psychological science informs the teaching of reading. Psychol. Sci. 2, 31–74. doi: 10.1111/1529-1006.00004
  • Willis J., Todorov A. (2006). First impressions: making up your mind after a 100-ms exposure to a face. Psychol. Sci. 17, 592–598. doi: 10.1111/j.1467-9280.2006.01750.x
  • Xie H. (2012). A study on natural bionics in product design. Manuf. Eng. Automation 591–593, 209–213. doi: 10.4028/www.scientific.net/amr.591-593.209
  • Ziat M., Hayward V., Chapman C. E., Ernst M. O., Lenay C. (2010). Tactile suppression of displacement. Exp. Brain Res. 206, 299–310. doi: 10.1007/s00221-010-2407-z


  • Open access
  • Published: 24 September 2019

Perceptual and conceptual processing of visual objects across the adult lifespan

  • Rose Bruffaerts 1 , 2 , 3 ,
  • Lorraine K. Tyler 1 , 4 ,
  • Meredith Shafto 1 ,
  • Kamen A. Tsvetanov 1 , 4 ,
  • Cambridge Centre for Ageing and Neuroscience &
  • Alex Clarke   ORCID: orcid.org/0000-0001-7768-5229 1  

Scientific Reports, volume 9, Article number: 13771 (2019)


Making sense of the external world is vital for multiple domains of cognition, and so it is crucial that object recognition is maintained across the lifespan. We investigated age differences in perceptual and conceptual processing of visual objects in a population-derived sample of 85 healthy adults (24–87 years old) by relating measures of object processing to cognition across the lifespan. Magnetoencephalography (MEG) was recorded during a picture naming task to provide a direct measure of neural activity that is not confounded by age-related vascular changes. Multiple linear regression was used to estimate neural responsivity for each individual, namely the capacity to represent visual or semantic information relating to the pictures. We find that the capacity to represent semantic information is linked to higher naming accuracy, a measure of task-specific performance. In mature adults, the capacity to represent semantic information also correlated with higher levels of fluid intelligence, reflecting domain-general performance. In contrast, the latency of visual processing did not relate to measures of cognition. These results indicate that neural responsivity measures relate to naming accuracy and fluid intelligence. We propose that maintaining neural responsivity in older age confers benefits in task-related and domain-general cognitive processes, supporting the brain maintenance view of healthy cognitive ageing.


Introduction

Recognizing objects is a fundamental aspect of human cognition. Accessing the meaning of an object is essential in order to interact successfully with the world around us, and is therefore a vitally important cognitive function to maintain across the adult lifespan. Research with young adults suggests that accessing meaning from vision is accomplished within the first half second of seeing an object 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , and involves recurrent activity within the ventral temporal cortex extending into the anteromedial temporal cortex 2 , 10 , 11 , 12 , 13 , 14 . As the visual input is processed along this pathway, it is transformed into an initial coarse grained semantic representation (e.g. animal, tool ) in the inferior temporal cortex before a more semantically specific representation emerges (e.g. cow, hammer ) in the anteromedial temporal cortex 12 , 15 , 16 .

Using multivariate analysis enables quantification of the representation of perceptual and semantic information during this rapid transformation process. Clarke et al . 4 investigated the time course of single object processing using a computational model of vision 17 combined with semantic-feature information 18 . In young participants, perceptual information was represented within the first 150 ms of object presentation, with the addition of semantic information providing a better account of object representations up to 400 ms 4 . The combination of explicit models of vision and semantics provides an integrated account of the processing of perceptual and conceptual information of visual objects 19 , 20 . While it is well-known that visual processing becomes slower in middle-aged and mature people 21 , 22 , 23 , it is unclear whether there are age-related differences in the processing of visual or semantic information of single objects. Here, we evaluated differences in measures of perceptual and semantic information across the lifespan using MEG in a large population-derived ageing cohort from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN; http://www.cam-can.org ). Possible age-related neural differences in object processing may or may not relate to behavior: changes may impact either task-performance or domain-general cognitive function, or both. To address this, we relate neural measures of perceptual and semantic information processing to different metrics of cognition to evaluate their relevance for healthy cognitive ageing.

It is well established that both early and late aspects of visually evoked neural responses show age-related changes, where activity is reduced and delayed with age 22 , 24 , 25 , 26 , 27 , 28 . For example, recently Price et al . 28 used MEG to show that the initial neural response to checkerboards in early visual cortex is increasingly delayed across the lifespan. Further, age-related differences in later visual components have been observed, such as delayed N170 25 and slower information processing of faces 26 , 27 . However, it remains to be determined if age-related changes are related to early visual processes or semantic activation, how changes in the initial visual processes (amplitude or delay) impact semantics, and further if such changes have behavioural consequences.

Rather than merely describing age-related neural differences, the challenge is to relate these differences to cognition in order to elucidate what happens during successful ageing 29, 30. Across the adult lifespan, differences in fluid intelligence and picture naming accuracy can be predicted from the degree to which different brain networks are responsive to these tasks 31. This suggests that maintenance of neural responsivity could support successful cognitive ageing. The “maintenance view” hypothesizes that the brains of mature adults whose neurobiology is well preserved will show activation patterns similar to those of younger adults, which are germane to proficient performance 32. However, many current models of healthy cognitive ageing are primarily based on fMRI studies 29, which can be confounded by the effects of age on the vasculature 33. We therefore need electrophysiological studies to complement fMRI research and extend our current theoretical models of neurocognitive ageing. Research techniques such as MEG provide both a direct measure of neural activity and the ability to examine temporal dynamics, and therefore offer an ideal approach for examining neurocognitive models of ageing.

In the current study, we ask whether the representation of perceptual and semantic information reflected in the MEG signal differs across the adult lifespan, and whether this relates to task-related measures of cognition, e.g. naming accuracy, and to domain-general cognitive measures, namely fluid and crystallized intelligence. We analyzed MEG signals during a picture naming task from the Cam-CAN cohort study 34. By relating single-object measures of vision and semantics to MEG signals, we were able to test (1) whether representations of visual and semantic information differ across the adult lifespan, (2) whether changes in the representation of visual information impact semantics, and (3) whether these age-related differences in neural processing relate to behavioral performance.

Rather than using an approach based on raw MEG signals, we follow the strategy used in our previous study 4 where we modelled MEG signals with explicit models of vision and semantics. The outcome, which is a quantification of the individual’s capacity to represent visual or semantic information, can be seen as a measure of neural responsivity. In other words, the individual’s ability to neurally represent a stimulus – quantified by a higher correlation between neural activity and the visual or semantic model - implies higher neural responsivity. The brain maintenance hypothesis suggests that better neural responsivity supports better cognition in older individuals 32 , and we predict that we will see evidence of this through our measures of visual and semantic processing. Moreover, moderation analysis can be used to test whether age plays a role in the relationship between neural responsivity and behavioral performance. Following Samu et al . 31 , we investigated picture naming accuracy, a task-specific cognitive measure of object naming, which is based on the output of visual and semantic information processing. Additionally, we investigated domain-general performance (fluid and crystallized intelligence) because neural responsivity might reflect a more general neural property of performance across tasks. Fluid intelligence is on average lower in mature adults, while crystallized intelligence is unchanged 31 , 35 , 36 . This difference prompts us to study the relationship between neural responsivity and both cognitive measures.

Behavioural results

Overall object naming accuracy for the 302 common objects was high (90.9%, SD 5.3%), but decreased significantly with age (Pearson’s r = −0.476, p < 0.001) (Fig.  1a ). When dividing the participants into equally sized age groups, we found mean accuracy was 92.3% (SD 4.7%) in the young group (24–37 years old), in the middle-aged group (47–60 years old) it was 93.8% (SD 3.8%) and in the mature group (70–87 years old) it was 85.4% (SD 4.8%). These results are consistent with previously reported age-related differences in accuracy for the same participants during fMRI picture naming 31 . Mean reaction times for correct responses tended to increase with age, but did not reach significance (r = 0.200, p = 0.066; Fig.  1b ).

Figure 1.

Behavioural results for ( a ) naming accuracy, ( b ) reaction times, ( c ) Spot the Word and ( d ) Cattell Culture Fair versus age.

Crystallized intelligence (measured with the Spot the Word task 37) did not change with age in our sample (r = 0.078, p = 0.475, Fig. 1c). As expected, fluid intelligence (measured with the Cattell Culture Fair 38) declined significantly with age (r = −0.712, p < 0.001, Fig. 1d). Crystallized and fluid intelligence both correlated with naming accuracy (r = 0.246, p = 0.023 and r = 0.560, p < 0.001, respectively). Mean reaction times for correct responses were faster when fluid intelligence scores were higher (r = −0.281, p = 0.009), but did not correlate with crystallized intelligence (r = −0.143, p = 0.189).

Visual and semantic model fits decrease across the lifespan

We next evaluated differences in visual and semantic neural processes across the lifespan by quantifying how much of the variability in the MEG signals could be explained by the models of vision and semantics – namely the AlexNet Deep Convolutional Neural Network 39 and a semantic feature-based model 18 , 40 . Regularised regression was performed at each time-point and for every MEG sensor separately, providing a measure of how well the visual or semantic models could explain the MEG signals over time.

First, using the fit between the visual model and the MEG signals, we calculated a single measure of the individual’s visual model fit (Fig.  2a ), and an individual peak latency (Fig.  2c ). After removing effects attributed to the visual model (Fig.  2a ), a second regression was used to calculate how well the semantic-feature based model could explain the residual MEG signals over time (after accounting for the visual model, Fig.  2b ). The individual semantic model fit was determined as the average semantic model fit between 150 and 400 ms (interval derived in an independent sample 4 ). Using this approach, we obtained independent measures of visual and semantic model fits for each individual.

Figure 2.

Schematic representation of the analysis pipeline. Calculation of ( a ) visual model fit, ( b ) semantic model fit and ( c ) peak latency. See method section for details.

Overall, we see positive visual model fits across all ages peaking close to 110 ms (Fig. 3a–d), with the greatest model fits over posterior sensors (Fig. 4a–c). The visual model fit significantly decreased across the adult lifespan (r = −0.274, p = 0.011; Figs 3d and 5a), indicating that the capacity to represent visual information, as reflected by the AlexNet model, is reduced in the mature group. Across all age groups, the semantic model demonstrated increasing model fits between 150 and 400 ms (Fig. 3e–h), with the highest model fits observed over temporal sensors (Fig. 4d–f). Semantic model fits significantly decreased with age (r = −0.284, p = 0.009; Figs 3h and 5b). Variability of the semantic model fits did not change across the lifespan (Fligner-Killeen test of homogeneity of variances: p = 0.593), whilst variability of the visual model fit was lower in the mature group (p = 0.002).

Figure 3.

Model fits across time showing R² values for the (a–d) visual and (e–h) semantic model, for the (a,e) young, (b,f) middle-aged and (c,g) mature groups across all sensors, and averaged across sensors for the three age groups (d,h). Note that the effect sizes cannot be directly compared, as the visual model fit is calculated on the raw MEG signal and the semantic model fit is calculated on the residuals after the visual model fits are regressed out (see methods and Fig. 2).

Figure 4.

Topographies of the visual model fit at 110 ms after stimulus onset, the mean peak latency (a–c), and of the semantic model fit at 290 ms after stimulus onset, derived from Clarke et al. (2015) as the time with maximal classification accuracy for the semantic model (d–f). Topographies for magnetometers and gradiometers are visualized for the young (a,d), middle-aged (b,e) and mature (c,f) age groups.

Figure 5.

Relationship between the visual model fit, the semantic model fit, age and accuracy. ( a ) Correlation between age and the visual model fit, ( b ) Correlation between age and the semantic model fit, ( c ) Correlation between visual and semantic model fit (corrected for age), ( d ) Correlation between accuracy and semantic model fit (corrected for age).

A key question is whether the visual model fit influences the semantic model fit, and how the model fits relate to task performance. We found a significant positive correlation between the visual and semantic model fits (r = 0.353, p < 0.001), which remained even after controlling for age (r = 0.287, p = 0.008) (Fig. 5c). This shows that the initial visual representation of an item has subsequent consequences for its semantic representation, over and above the age-related differences. Further, we observed that higher semantic model fits correlated with higher naming accuracy, over and above the effect of age (r = 0.242, p = 0.026; Fig. 5d). This effect was not present for the visual model fits (r = 0.122, p = 0.264, not shown). No correlation was found between visual or semantic model fits and the domain-general performance measures, namely the Cattell score and the Spot the Word score (p > 0.247).

Effect of age on the relationship between performance and visual and semantic model fits

Having a higher semantic model fit related to better accuracy for object naming. Next, we ask whether the relationship between either of our measures of neural responsivity, the visual and semantic model fits, and cognition differs across the age groups, using moderation analysis. Moderation analysis determines whether the relationship between an independent variable (e.g. visual model fit) and a dependent variable (e.g. accuracy) varies as a function of a third, moderator variable (e.g. age). In terms of the brain maintenance view, it would be expected that when the visual and semantic model fits are higher, and therefore more like those of the younger and middle-aged participants, cognitive performance should be better.

We evaluated whether age moderates the relationship between the visual or semantic model fit and measures of cognition (fluid intelligence, crystallized intelligence, naming accuracy). Fluid intelligence could be predicted from a moderation model including age, the semantic model fit and the interaction of age and the semantic model fit (R² = 0.575, F(80, 4) = 27.0, p < 0.001, Table  1 ). The main effect of age was significant (β = −0.355, p < 0.001), but the main effect of the semantic model fit was not (β = −898, p = 0.072). Critically, the interaction between age and the semantic model fit was significant (β = 20.5, p = 0.013) (Fig.  6a , Table  1 ). Visualization of this relationship for a subsample divided into young, middle-aged and mature groups, shows that the relationship between fluid intelligence and semantic model fit becomes stronger for older individuals, i.e. high fluid intelligence in old age is associated with high semantic model fit (Fig.  6b ). A trend for significance was found for the interaction between age and visual model fit (β = 2.66, p = 0.062, Table  2 ), that produced a qualitatively similar effect. No moderation effects were seen in relation to naming accuracy or crystallized intelligence using the semantic model fit (Table  1 ) or visual model fit (Table  2 ).

Figure 6.

Prediction of fluid intelligence: (a,b) interaction between age and the semantic model fit. (a) The interaction effect is visualized by generating the predicted Cattell score for every combination of age and semantic model fit, based on the interaction term from the moderation model. (b) The correlation within the young, middle-aged and mature groups.

Impact of peak visual latency on visual and semantic information

In addition to the amplitude of the visual model fit, the peak latency of the visual model fit was calculated for every subject, first to test whether the speed of visual information processing related to age, and second to test whether the speed of processing related to the capacity to represent visual and semantic information, as measured by the model fits.

The average peak of the visual model fit across all participants occurred at 110 ms. The latency of individual participants’ visual model fit peaks increased significantly with age (r = 0.379, p < 0.001; Fig. 7a), showing age-related delays in the visual processing of complex objects. We next tested whether the peak latency of the visual model influences the visual and/or semantic model fits. Since both the peak latency and the visual and semantic model fits are negatively affected by age, the following analysis was corrected for age. Peak latency showed no correlation with the visual model fit (r = −0.142, p = 0.195) (Fig. 7b) or the semantic model fit (r = −0.024, p = 0.825) (Fig. 7c).

Figure 7.

Relationship between the peak latency, age and the visual and semantic model fits. ( a ) Correlation between age and peak latency, ( b ) Correlation between peak latency and visual model fit (corrected for age), ( c ) Correlation between peak latency and semantic model fit (corrected for age).

As above, correlation analyses were conducted to ask whether the peak latency of the visual model was linked to measures of cognition (fluid intelligence, crystallized intelligence, naming accuracy), but we found no evidence of this (p > 0.748). Moderation analyses were conducted to test whether the relationship between the peak latency of the visual model and measures of cognition varied as a function of age, but no moderation effects were seen (all p’s > 0.1). Therefore, we find no evidence that neural slowing has a dramatic influence on how visual and semantic information is represented.

We investigated differences in object processing across the adult lifespan in a large population-derived sample of cognitively healthy adults using a well-validated model of object processing in the ventral stream 12 . Here, we (1) characterize visual and semantic processes involved in object processing across the adult lifespan, (2) ask if differences in visual processing impact semantics, and (3) evaluate how measures of visual and semantic representations, which we argue reflect the neural responsivity of the visual and semantic processes, relate to cognitive function. We find clear evidence of differences across the adult lifespan in the representation of visual and semantic information: our results show neural slowing and decreases in measures of representation of visual and semantic information with age, while decreased visual effects also relate to decreased semantic effects. In relation to cognition, we see that higher measures of semantic processing are found in subjects with higher naming accuracy, and that higher semantic processing in older age was associated with increased fluid intelligence scores. Together, our results support a view that maintaining high-levels of neural responsivity is associated with both better task-related performance, and more domain general cognitive functions in line with the brain maintenance hypothesis.

Our results demonstrate a relationship between an individual’s semantic processing and both task-specific and domain-general measures of cognition. We find that higher measures of semantic processing were associated with better naming accuracy (Fig. 5d), showing that the semantic model fits are capturing semantic representations that are related to behaviour. It is well established that picture naming errors increase with age (for a review 41), and this has previously been linked to phonological retrieval errors 41, 42. Our study adds to this by showing that semantic processing in the first 400 ms, likely prior to phonological processing, may also contribute to naming errors. We also observed a second relationship between semantic model fit and cognition, where the model fit became increasingly related to fluid intelligence with increasing age (Fig. 6). Whilst only significant for the semantic model fits, the effects were qualitatively similar and marginally significant for the visual model fits, suggesting that neural responsivity overall becomes increasingly related to fluid intelligence with increasing age. This illustrates that the capacity to represent visual or semantic information in neural signals, a measure of the neural responsivity of the visual system, could be relevant to a general measure of cognition.

Increased neural responsivity has previously been linked to higher fluid intelligence 31, 43 and cognitive control 44. Samu et al. 31 reported that mean-task responsive (MTR) components (also a measure of neural responsivity) that related to task performance showed significant age-related declines. The MTR components in Samu’s study gave an aggregate measure of fMRI task responsivity during either picture naming or a fluid intelligence task, and were able to explain individual variability in task performance. These task-related activations declined with age and increased with task performance. The majority of voxels contributing to the MTR components were from occipitotemporal cortex, the implication being that the greater the task responsivity, the better performance will be maintained into older adulthood.

Based on our model fits, which we view as measures of neural responsivity derived from MEG data, we find additional evidence that better neural responsivity plays a role in healthy cognitive ageing. This is further supported by correlations we observe between our model fits and the MTR components from Samu et al. 31 for the same participants (data for 63/85 of our participants are also in 31). There was a strong correlation between the MTR of the fMRI picture naming task and the visual model fit of the same participants in the MEG picture naming task (r = 0.487, p < 0.001). This suggests that the MTR components at least partially reflect the responsivity of the neural substrate of visual object processing which we derived in this study. In addition, there was a correlation between the MTR of the fMRI picture naming task and the MEG semantic model fit (r = 0.274, p = 0.030). Overall, this provides additional evidence that the model fits are estimates of neural responsivity. Our analyses are consistent with the idea that better cognitive performance is supported by good neural responsivity. We hypothesize that a reduced ability to modulate task-relevant brain networks may contribute to age-related declines in cognition. Like Samu et al. 31, our results are consistent with the brain maintenance hypothesis, which states that individual differences in age-related brain changes, such as neural responsivity, allow some people to show little or no age-related cognitive decline 32. Thus, retaining youth-like neural function is key to the preservation of cognitive performance across the lifespan 45.

Another mechanism that is sometimes proposed to compensate for potential age-related changes is the recruitment of contralateral and prefrontal regions 46, 47. Our study does not allow us to differentiate between maintenance and compensation as the mechanism by which some mature controls perform at similar levels to the younger groups. The focus of our study was the timing and untangling of visual and semantic effects; we did not examine regional effects, which would be required to test for top-down compensation mechanisms or the recruitment of additional regions. Instead, we elected to avoid assumptions about the localization of our effects at the individual level and used data from all available sensors. Our approach leaves open the possibility of a top-down modulatory process on early visual activity, which would be in line with compensation mechanisms. This notion is supported by connectivity studies showing increased frontal-to-posterior connectivity during object naming in older adults 48, 49. However, our MEG effects did correlate with fMRI-based MTR components that are localized to occipitotemporal cortex, which may not be compatible with compensation, suggesting that our results are more consistent with the brain maintenance hypothesis than with compensation.

Several lines of research suggest an age-related slowing of neural responses to visual stimuli 22, 24, 25, 26, 27, 28. Consistent with this, we demonstrate a clear increase in the delay of visual information processing with increasing age, but found no evidence that this delay related to age-related cognitive changes. This may argue against the universality of the general slowing hypothesis, which proposes that general slowing leads to age-related declines in performance 50. Instead, our data argue that although visual slowing does occur across the adult lifespan, it does not necessarily have detrimental consequences for cognition, while the magnitude of the visual and semantic model fits does relate to both task-specific and domain-general measures of cognition. However, it has also been noted that age-related declines in processing speed may only impact cognition in tasks with high cognitive demands 51, 52, 53. In the current study, participants named a series of highly familiar, easily nameable pictures, and it could be the case that the age-related visual delay we observed would only have cognitive impacts in more challenging situations.

Our finding that the capacity to represent visual and semantic information is lower in mature adults might be viewed as supporting evidence for the information degradation hypothesis 54. Correlations have repeatedly been observed between visual perceptual decline and cognitive decline across the adult lifespan in large samples 55, 56, 57. The information degradation hypothesis states that degraded perceptual input resulting from age-related neurobiological changes causes a decline in cognitive processes 54. We find that the capacity to represent visual information correlates with the capacity to represent semantic information, which is consistent with this hypothesis. Because our approach is correlational, we cannot make any claims about the causal nature of the changes in neural responsivity to visual input on semantic processing. To support the information degradation hypothesis and rule out, e.g., the influence of cognition on perceptual processing or other confounding effects, experimental manipulation of perceptual input is required 58. However, our approach does yield a sensitive method to determine neural responsivity to visual input at the individual level, which can benefit further work aimed at corroborating or refuting the information degradation hypothesis.

Even though we have made use of a large sample of healthy adults from the population-representative Cam-CAN cohort 34 , we acknowledge the need for longitudinal research to further examine the hypothesis that neural responsivity decreases across the lifespan, and that these changes have an impact on cognitive function. From our cross-sectional sample, we can only assess age-related differences 59 . The relationships which we observe do not allow us to make causal inferences and might also underestimate nonlinear age trends 60 . Secondly, our findings offer only a partial explanation for the variability in naming accuracy and fluid intelligence in older adulthood. Note that we investigated visual and semantic processing during picture naming, but not phonological retrieval and articulatory response generation. A future direction which might explain additional variability in naming accuracy consists of the implementation of explicit phonological and articulatory models to elucidate these 2 processes. A consideration is the relatively high education level across individuals in our sample. The limited variability of education levels across the age ranges precludes claims about the effect of education on brain maintenance. Importantly, including education as a covariate of no interest did not change our results, suggesting that the observed findings are beyond the effects of education. Specifically targeted large population-based samples are needed to investigate this in more detail.

In conclusion, our results show that in healthy elderly adults, visual object processing is slower and the capacity of the brain to represent visual and semantic object information is reduced. In elderly participants, higher measures of neural responsivity were linked to better measures of fluid intelligence, and higher semantic neural responsivity was associated with higher naming accuracy. These results are in line with the brain maintenance hypothesis, which states that individual differences in age-related brain changes allow some people to show little or no age-related cognitive decline. Our measures of neural responsivity suggest that age-related declines may be partly underpinned by a reduced ability to modulate task-relevant brain networks.

Participants

One hundred and eighteen members of the Cam-CAN cohort of healthy adults aged 18–88 years 34 participated in this study. Exclusion criteria for the Cam-CAN Phase III cohort, which was selected for extensive neuroimaging, included Mini Mental State Examination scores <25 61, poor vision (<20/50 on the Snellen test 62), being a non-native English speaker, drug abuse, a serious psychiatric condition, or serious health conditions (for full exclusion criteria, see 34). Informed consent was obtained from all participants and ethical approval for the study was obtained from the Cambridgeshire 2 (now East of England-Cambridge Central) Research Ethics Committee. All experiments were performed in accordance with relevant guidelines and regulations.

From this subset, 85 participants were included in the current analysis. They were all right-handed and were aged 24–87 years (M = 53.2, SD = 18.0, 44 male). Of the initial total of 118 participants, 19 were excluded because of technical problems during data acquisition, 12 were excluded at the preprocessing stage because of poor data quality (see MEG preprocessing) and 2 were excluded because they were strictly left-handed (assessed by means of the Edinburgh Handedness Inventory). The overall education level in this subset of the population-derived cohort was high: 70.2% obtained a degree, and 88.2% obtained at least an A-level certification. In our sample, age negatively correlated with education level (r: −0.365, p < 0.001). The average score on the HADS depression scale was 2.48 (s.d. 2.89) and on the HADS anxiety scale 4.73 (s.d. 3.34), and these scores did not correlate with age in our dataset (p > 0.167).

Experimental design

Participants named pictures of single objects at the basic-level (e.g., “tiger”,”broom”). The stimulus set is the same as in Clarke et al . 4 and consisted of 302 items from a variety of superordinate categories that represented concepts from an anglicized version of a large property generation study 18 , 40 . The items were presented as colour photographs of single objects on a white background. Each trial began with a black fixation cross (500 ms), followed by presentation of the item (500 ms). Afterwards a blank screen was shown, lasting between 2400 and 2700 ms. Each item was presented once. The order of stimuli was pseudo-randomized such that consecutive stimuli were not phonologically related (i.e., shared an initial phoneme) and no more than 4 living or non-living items could occur in a row. Stimuli were presented using Eprime (version 2; Psychology Software Tools, Pittsburgh, PA, USA) and answers were recorded by the experimenter. Offline, responses were checked for accuracy (synonyms, e.g. “couch” for “sofa”, were scored as correct).

Crystallized and fluid intelligence tests were administered offline during a prior stage of the Cam-CAN study 34 . Crystallized intelligence was measured using the Spot the Word test in which participants performed a lexical decision task on word-nonword pairs (e.g. pinnace-strummage) 37 . This test was designed to measure lifetime acquisition of knowledge. Fluid intelligence was measured using the Cattell Culture Fair, Scale 2 Form A, a timed pen-and-paper test in which participants performed 4 subtests with different types of nonverbal puzzles: series completion, classification, matrices and conditions 38 .

Stimulus measures

Visual information for each item was derived from the AlexNet deep convolutional neural network model 39, as implemented in the Caffe deep learning framework 63, and trained on the ILSVRC12 classification data set from ImageNet. We used layers 2 to 7 of the DNN, consisting of four convolutional layers (conv2–conv5) followed by two fully connected layers (fc6 and fc7). The convolutional kernels learned in each convolutional layer correspond to filters receptive to particular kinds of visual input (conv1 was discarded because conv2 has been shown to mimic the activity in early visual cortex more closely than conv1 8, 19). We presented our 302 stimuli to the DNN, which produced activation values for all nodes in each layer of the network for each image. Activation values for all nodes were concatenated across layers, resulting in an objects-by-nodes matrix. PCA was used to obtain 100 components, as otherwise the blank space surrounding objects would be represented across a large number of nodes 64.
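As a rough illustration of this step, the sketch below extracts activations from a pretrained AlexNet and reduces them to 100 components. It is not the authors' pipeline: it uses torchvision rather than Caffe, the layer indices and the use of pre-ReLU outputs are assumptions, and the image paths are placeholders for the 302 stimulus files.

```python
# Hypothetical re-implementation sketch: AlexNet activations (conv2-conv5, fc6, fc7)
# for each stimulus image, concatenated and reduced to 100 PCA components.
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image
from sklearn.decomposition import PCA

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

activations = []
def hook(_module, _inputs, output):
    activations.append(output.detach().flatten().numpy())

# Assumed torchvision layout: conv2-conv5 at features[3, 6, 8, 10], fc6/fc7 at classifier[1, 4].
for idx in (3, 6, 8, 10):
    model.features[idx].register_forward_hook(hook)
for idx in (1, 4):
    model.classifier[idx].register_forward_hook(hook)

image_paths = ["stimuli/object_001.jpg"]  # placeholder; in practice all 302 stimulus files
rows = []
for path in image_paths:
    activations.clear()
    with torch.no_grad():
        model(preprocess(Image.open(path).convert("RGB")).unsqueeze(0))
    rows.append(np.concatenate(activations))

# PCA to 100 components needs the full item set; with all 302 images this yields a 302 x 100 matrix.
visual_model = PCA(n_components=100).fit_transform(np.vstack(rows))
```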

The semantic measures used were the same as those used as in Clarke et al . 4 , and derived from semantic feature norms 18 , 40 . For every concept, these feature norms consist of an extensive list of features generated by participants in response to this concept. These features are visual, auditory, tactile, encyclopedic, etc. The relationship between items can be captured through the similarity of their features, where similar concepts will share many features, while the distinctive properties of a concept will differentiate it from other category members. For each of the 302 concepts, a binary vector indicates whether semantic features (N = 1510) are associated with the concept or not. PCA was used to reduce the concept-feature matrix from 1510 features for every concept, to 6 components for every concept.
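A minimal sketch of the corresponding reduction of the concept-by-feature matrix, using random binary data as a stand-in for the actual feature norms, might be:

```python
# Reduce a 302 x 1510 binary concept-by-feature matrix to 6 PCA components,
# analogous to the semantic measure described above (random stand-in data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
concept_features = (rng.random((302, 1510)) < 0.02).astype(float)
semantic_model = PCA(n_components=6).fit_transform(concept_features)  # 302 x 6
```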

MEG/MRI recording

MEG and MRI acquisition in the Cam-CAN cohort is described in detail in Taylor et al . 65 . Continuous MEG data were recorded using a whole-head 306 channel (102 magnetometers, 204 planar gradiometers) Vector-view system (Elekta Neuromag, Helsinki, Finland) located at the MRC Cognition and Brain Sciences Unit, Cambridge, UK. Participants were in a seated position. Eye movements were recorded with electro-oculogram (EOG) electrodes. ECG was recorded by means of one pair of bipolar electrodes. Five head-position indicator (HPI) coils were used to record the head position within the MEG helmet every 200 ms. The participant’s head shape was digitally recorded using >50 measuring points by means of a 3D digitizer (Fastrak Polhemus, Inc., Colchester, VA, USA) along with the position of the EOG electrodes, HPI coils and fiducial points (nasion, left and right periauricular). MEG signals were recorded at a sampling rate of 1000 Hz, with a highpass filter of 0.03 Hz. If required, participants were given MEG-compatible glasses to correct their vision.

MEG preprocessing

Initial preprocessing of the raw data used MaxFilter version 2.2 (Elekta-Neuromag Oy, Helsinki, Finland) as described in the Cam-CAN pipeline 66 . For each run, temporal signal space separation 67 was applied to remove noise from external sources and from HPI coils for continuous head-motion correction (correlation threshold: 0.98, 10 s sliding window), and to virtually transform data to a common head position. MaxFilter was also used to remove mains-frequency noise (50 Hz notch filter) and automatically detect and virtually reconstruct noisy channels.
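An approximate open-source analogue of these steps is sketched below with MNE-Python; the study itself used MaxFilter 2.2, so defaults and results will differ, and the file name is hypothetical.

```python
# Rough MNE-Python analogue of the MaxFilter stage: bad-channel detection,
# temporal signal space separation (tSSS) and 50 Hz notch filtering.
import mne

raw = mne.io.read_raw_fif("sub-01_task-naming_meg.fif", preload=True)  # hypothetical file name

noisy, flat = mne.preprocessing.find_bad_channels_maxwell(raw)          # detect channels to reconstruct
raw.info["bads"] = noisy + flat

# tSSS with a 10 s sliding window and 0.98 correlation threshold, mirroring the text.
raw_sss = mne.preprocessing.maxwell_filter(raw, st_duration=10.0, st_correlation=0.98)

raw_sss.notch_filter(freqs=50.0)                                         # remove mains-frequency noise
raw_sss.save("sub-01_task-naming_sss.fif", overwrite=True)
```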

Further preprocessing was performed using SPM12 (Wellcome Institute of Imaging Neuroscience, London, UK). MEG data were low-pass filtered at 200 Hz (fifth order Butterworth filter) and high-pass filtered at 0.1 Hz (fourth order Butterworth filter). Initial epoching from −1s to 1 s was performed before artifact removal by means of Independent Component Analysis (ICA) using RUNICA 68 . Artifactual components were identified using the SASICA toolbox 69 consisting of components related to blinks, eye movements, rare events, muscle artifacts and saccades. Spatial topographies of the ICs suggested by SASICA were visually inspected prior to their rejection. Finally, IC epochs were averaged and correlated with a “speech template” curve that was modelled as a sigmoidal curve with a slope starting at 200 ms reaching a plateau at 1200 ms. ICs with a correlation of >0.8 were removed. ICA was applied to magnetometers and gradiometers separately. Following ICA, items that were not correctly named or only named after a hesitation period, were excluded from further analysis at the subject level. Finally, MEG data were baseline corrected (time window: −200 to 0 ms) and cropped to the epoch of interest from −200 ms to 600 ms. Temporal signal-to-noise ratio (tSNR) was calculated as the ratio between the mean and standard deviation for the baseline period. Participants with tSNR < 1 were excluded from further processing (N = 12). No significant tSNR differences were observed between age groups (p = 0.183). Data were downsampled to 100 Hz to obtain manageable computing times.
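The later steps can likewise be sketched with MNE-Python. This is only an analogue of the SPM12 pipeline (MNE's default FIR filters rather than Butterworth filters, and the ICA/SASICA artifact-rejection stage is omitted), with a hypothetical file name and one plausible reading of the tSNR criterion.

```python
# Filtering, epoching, baseline correction, downsampling and a simple tSNR check.
import numpy as np
import mne

raw = mne.io.read_raw_fif("sub-01_task-naming_sss.fif", preload=True)  # tSSS output (hypothetical)
raw.filter(l_freq=0.1, h_freq=200.0)                                   # band-pass roughly matching the text

events = mne.find_events(raw)                                          # assumes stimulus triggers are present
epochs = mne.Epochs(raw, events, tmin=-0.2, tmax=0.6,
                    baseline=(-0.2, 0.0), preload=True)
epochs.resample(100)                                                   # downsample to 100 Hz

# tSNR taken here as |mean| / SD over the baseline window, channels and trials.
baseline_data = epochs.copy().crop(tmin=None, tmax=0.0).get_data()
tsnr = np.abs(baseline_data.mean()) / baseline_data.std()
print("exclude participant" if tsnr < 1 else "keep participant")
```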

Visual model fit

Using lasso linear regression, R 2 values were calculated that captured how well the MEG signals (dependent variable) were modelled by the AlexNet model (independent variables) (Fig.  2a ). Lasso regression was used to avoid overfitting. Lasso regression was implemented using glmnet for Matlab 70 where the regularization parameter lambda was set using 10 fold cross-validation from a set of 100 potential lambda values defined automatically based on the data. Using the optimal lambda value, R² was calculated for each participant at each timepoint and sensor independently. To derive one model fit value per timepoint, R² values were subsequently averaged across all sensors (magnetometers and gradiometers) to construct a time course for every participant. We averaged across all sensors because visual object processing elicits widespread neural responses and the distribution of these responses might vary between individuals and age groups. For this reason, we did not want to make any assumptions by using predefined regions. To correct for individual differences in model fit unrelated to object processing, we subtracted the average R² values before stimulus onset (−200 to 0 ms) from the R² values after stimulus onset (0 to 600 ms) for every participant. To obtain a measure of each individual’s peak model fit latency, a mean template across all participants was constructed, before each individual’s timecourse was virtually shifted in 10 ms steps relative to the mean template to find the maximal correlation to the template 28 . The individual’s peak latency was calculated from the mean peak latency (110 ms) and the shift needed to maximally correlate to the template (Fig.  2c ). The individual visual model fit is the visual model fit averaged across sensors at the individual’s peak latency (Fig.  2c ).
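A sketch of this regression stage, using scikit-learn's LassoCV in place of glmnet for Matlab and small random arrays as stand-ins for the real single-trial MEG data and visual components, might look as follows; the template-shifting estimate of individual peak latency is omitted.

```python
# For every sensor and time point, fit a cross-validated lasso predicting the MEG
# values across items from the 100 visual components and store the resulting R^2.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_items, n_sensors, n_times = 302, 5, 9                   # reduced sizes so the sketch runs quickly
meg = rng.normal(size=(n_items, n_sensors, n_times))      # stand-in for preprocessed MEG data
visual_model = rng.normal(size=(n_items, 100))            # stand-in for AlexNet PCA components
times = np.linspace(-0.2, 0.6, n_times)

r2_visual = np.zeros((n_sensors, n_times))
for s in range(n_sensors):
    for t in range(n_times):
        y = meg[:, s, t]
        fit = LassoCV(cv=10, n_alphas=100).fit(visual_model, y)  # lambda chosen by 10-fold CV
        r2_visual[s, t] = fit.score(visual_model, y)

# Average over sensors and subtract the pre-stimulus baseline, as described above.
fit_course = r2_visual.mean(axis=0)
fit_course -= fit_course[times < 0].mean()
```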

Semantic model fit

In a second step, multiple linear regression was performed between the semantic model and the residuals from the visual model fit (Fig. 2b). A time window of interest between 150 and 400 ms was derived from Clarke et al. 4 (note that the 14 participants from Clarke et al. 4 are not part of the Cam-CAN cohort). An individual’s semantic model fit was calculated by averaging across time points and sensors between 150 and 400 ms. In this way, we model semantic information in a very stringent way, that is, over and above what the AlexNet model can explain. By regressing out the visual model, all variability which can be explained by the visual model is removed from the MEG signals.
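Continuing the previous sketch, the semantic step could be approximated like this; semantic_model is a random placeholder for the 6 PCA components, and meg, visual_model and times carry over from the sketch above.

```python
# Regress the visual model out of each sensor/time point and fit the semantic
# components to the residuals; average R^2 over sensors and the 150-400 ms window.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

semantic_model = np.random.default_rng(1).normal(size=(meg.shape[0], 6))

n_items, n_sensors, n_times = meg.shape
r2_semantic = np.zeros((n_sensors, n_times))
for s in range(n_sensors):
    for t in range(n_times):
        y = meg[:, s, t]
        residuals = y - LassoCV(cv=10).fit(visual_model, y).predict(visual_model)
        ols = LinearRegression().fit(semantic_model, residuals)
        r2_semantic[s, t] = ols.score(semantic_model, residuals)

window = (times >= 0.15) & (times <= 0.40)
semantic_fit = r2_semantic[:, window].mean()   # one semantic model fit per participant
```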

Statistical analysis

To test for age-related changes in visual and semantic processing, the measures of visual and semantic model fit, as well as the measure of peak latency, were correlated with age. Second, we investigated the relationships between peak latency, visual model fit and semantic model fit, with age added as a covariate of no interest. Next, we correlated peak latency, visual model fit and semantic model fit, on the one hand, with our cognitive measures (naming accuracy, fluid intelligence and crystallized intelligence), on the other, again with age as a covariate of no interest.
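Correlating two measures with age (and gender) as covariates of no interest amounts to a partial correlation: regress the covariates out of both variables and correlate the residuals. A small sketch of that idea follows; the helper name and variable layout are assumptions, not the authors' code.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, covariates):
    """Pearson correlation between x and y after regressing the covariates
    (e.g., age and gender) out of both variables."""
    Z = np.column_stack([np.ones(len(x)), covariates])
    x_res = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    y_res = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return stats.pearsonr(x_res, y_res)  # returns (r, p)

# e.g. r, p = partial_corr(semantic_fit, naming_accuracy, np.column_stack([age, gender]))
```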

Using moderation analysis, we tested whether the relationship between the visual or semantic model fits and our cognitive measures differs across age groups. As in Samu et al . 31 , we used multiple linear regression with an interaction term to test the potential moderating effect of age on the relation between two other variables 71 . More specifically, to investigate the relation between X and Y with Z as the moderator variable “age”, we ran a multiple linear regression with Y as the dependent variable and X, Z and the interaction term XZ as predictor variables. A significantly non-zero coefficient for the predictor XZ indicates a moderating effect of Z (“age”) on the relationship between X and Y. In all correlation and moderation analyses, gender was added as a covariate of no interest 31 . Normality was assessed using Q-Q plots and homogeneity of variances was assessed with the Fligner-Killeen test.
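The moderation model is an ordinary regression of Y on X, age and their product. The sketch below uses statsmodels as an illustrative stand-in for the approach described; mean-centring of the predictors is a common practice added here, not something stated in the text, and the function name is hypothetical.

```python
import numpy as np
import statsmodels.api as sm

def moderation_by_age(x, y, age, gender):
    """Fit Y ~ X + Age + X*Age + gender and return the interaction coefficient and its
    p-value; a significantly non-zero X*Age term indicates moderation by age (sketch)."""
    xc = x - x.mean()                  # mean-centre (assumption) to ease interpretation
    ac = age - age.mean()
    design = sm.add_constant(np.column_stack([xc, ac, xc * ac, gender]))
    fit = sm.OLS(y, design).fit()
    return fit.params[3], fit.pvalues[3]   # columns: const, X, Age, X*Age, gender
```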

The statistical analyses were performed using all 85 subjects (24–87 years old), with age treated as a continuous variable. However, visualization of, for example, the moderation analysis is not always straightforward with continuous age. Therefore, for visualization purposes only, we split the dataset into three equal groups of 21 subjects each, separated by ten-year age gaps, to highlight changes between age groups. The youngest group consisted of all participants between 24 and 37 years old (12 female, 9 male), the middle-aged group of all participants between 47 and 60 years old (10 female, 11 male), and the oldest group of all participants between 70 and 87 years old (10 female, 11 male).

Data Availability

The data set analysed in this study is part of the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) research project ( www.cam-can.com ). The entire Cam-CAN dataset will be made publicly available in the future.

Schendan, H. E. & Maher, S. M. Object knowledge during entry-level categorization is activated and modified by implicit memory after 200 ms. NeuroImage 44 , 1423–1438 (2009).

Clarke, A., Taylor, K. I. & Tyler, L. K. The evolution of meaning: spatio-temporal dynamics of visual object recognition. J. Cogn. Neurosci. 23 , 1887–1899 (2011).

Clarke, A., Taylor, K. I., Devereux, B., Randall, B. & Tyler, L. K. From Perception to Conception: How Meaningful Objects Are Processed over Time. Cereb. Cortex 23 , 187–197 (2013).

Clarke, A., Devereux, B. J., Randall, B. & Tyler, L. K. Predicting the Time Course of Individual Objects with MEG. Cereb. Cortex 25 , 3602–3612 (2015).

Cichy, R. M., Pantazis, D. & Oliva, A. Resolving human object recognition in space and time. Nat. Neurosci. 17 , 455–462 (2014).

Leonardelli, E., Fait, E. & Fairhall, S. L. Temporal dynamics of access to amodal representations of category-level conceptual information. Sci. Rep. 9 , 239 (2019).

Kaiser, D., Azzalini, D. C. & Peelen, M. V. Shape-independent object category responses revealed by MEG and fMRI decoding. J. Neurophysiol. 115 , 2246–2250 (2016).

Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A. & Oliva, A. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci. Rep . 6 (2016).

Isik, L., Meyers, E. M., Leibo, J. Z. & Poggio, T. The dynamics of invariant object recognition in the human visual system. J. Neurophysiol. 111 , 91–102 (2014).

Schendan, H. E. & Ganis, G. Electrophysiological potentials reveal cortical mechanisms for mental imagery, mental simulation, and grounded (embodied) cognition. Front. Psychol. 3 , 329 (2012).

Kravitz, D. J., Saleem, K. S., Baker, C. I., Ungerleider, L. G. & Mishkin, M. The ventral visual pathway: an expanded neural framework for the processing of object quality. Trends Cogn. Sci. 17 , 26–49 (2013).

Clarke, A. & Tyler, L. K. Understanding What We See: How We Derive Meaning From Vision. Trends Cogn. Sci. 19 , 677–687 (2015).

Chen, Y. et al . The ‘when’ and ‘where’ of semantic coding in the anterior temporal lobe: Temporal representational similarity analysis of electrocorticogram data. Cortex 79 , 1–13 (2016).

Rupp, K. et al . Semantic attributes are encoded in human electrocorticographic signals during visual object recognition. NeuroImage 148 , 318–329 (2017).

Moss, H. E. Anteromedial Temporal Cortex Supports Fine-grained Differentiation among Objects. Cereb. Cortex 15 , 616–627 (2004).

Tyler, L. K. et al . Objects and categories: feature statistics and object processing in the ventral stream. J. Cogn. Neurosci. 25 , 1723–1735 (2013).

Serre, T., Wolf, L. & Poggio, T. Object recognition with features inspired by visual cortex. In IEEE Conference on Computer Vision and Pattern Recognition (2005).

Taylor, K. I., Devereux, B. J., Acres, K., Randall, B. & Tyler, L. K. Contrasting effects of feature-based statistics on the categorisation and basic-level identification of visual objects. Cognition 122 , 363–374 (2012).

Devereux, B. J., Clarke, A. & Tyler, L. K. Integrated deep visual and semantic attractor neural networks predict fMRI pattern-information along the ventral object processing pathway. Sci. Rep. 8 , 10636 (2018).

Bruffaerts, R. et al . Redefining the resolution of semantic knowledge in the brain: advances made by the introduction of models of semantics in neuroimaging. Neurosci. Biobehav. Rev , https://doi.org/10.1016/j.neubiorev.2019.05.015 (2019).

Chaby, L., George, N., Renault, B. & Fiori, N. Age-related changes in brain responses to personally known faces: an event-related potential (ERP) study in humans. Neurosci. Lett. 349 , 125–129 (2003).

Onofrj, M., Thomas, A., Iacono, D., D’Andreamatteo, G. & Paci, C. Age-related changes of evoked potentials. Neurophysiol. Clin. Clin. Neurophysiol. 31 , 83–103 (2001).

Spear, P. D. Neural bases of visual deficits during aging. Vision Res. 33 , 2589–2609 (1993).

Allison, T., Hume, A. L., Wood, C. C. & Goff, W. R. Developmental and aging changes in somatosensory, auditory and visual evoked potentials. Electroencephalogr. Clin. Neurophysiol. 58 , 14–24 (1984).

Nakamura, A. et al . Age-related changes in brain neuromagnetic responses to face perception in humans. Neurosci. Lett. 312 , 13–16 (2001).

Rousselet, G. A. et al . Age-related delay in information accrual for faces: evidence from a parametric, single-trial EEG approach. BMC Neurosci. 10 , 114 (2009).

Rousselet, G. A. et al . Healthy aging delays scalp EEG sensitivity to noise in a face discrimination task. Front. Psychol. 1 , 19 (2010).

Price, D. et al. Age-Related Delay in Visual and Auditory Evoked Responses is Mediated by White- and Gray-matter Differences. Nat. Commun. (2017).

Grady, C. The cognitive neuroscience of ageing. Nat. Rev. Neurosci. 13 , 491–505 (2012).

Geerligs, L. & Tsvetanov, K. A. The use of resting state data in an integrative approach to studying neurocognitive ageing–commentary on Campbell and Schacter (2016). Lang. Cogn. Neurosci . 32 (2017).

Samu, D. et al . Preserved cognitive functions with age are determined by domain-dependent shifts in network responsivity. Nat. Commun. 8 , ncomms14743 (2017).

Nyberg, L., Lövdén, M., Riklund, K., Lindenberger, U. & Bäckman, L. Memory aging and brain maintenance. Trends Cogn. Sci. 16 , 292–305 (2012).

Tsvetanov, K. A. et al . The effect of ageing on fMRI: Correction for the confounding effects of vascular reactivity evaluated by joint fMRI and MEG in 335 adults. Hum. Brain Mapp. 36 , 2248–2269 (2015).

Shafto, M. A. et al . The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study protocol: a cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC Neurol. 14 , 204 (2014).

Salthouse, T. A. Quantity and structure of word knowledge across adulthood. Intelligence 46 , 122–130 (2014).

Campbell, K. L. et al . Robust Resilience of the Frontotemporal Syntax System to Aging. J. Neurosci. 36 , 5214–5227 (2016).

Baddeley, A., Emslie, H. & Nimmo-Smith, I. The Spot-the-Word test: A robust estimate of verbal intelligence based on lexical decision. Br. J. Clin. Psychol. 32 , 55–65 (1993).

Cattell, R. B. & Cattell, A. K. S. Handbook for the individual or group Culture Fair Intelligence Test (1960).

Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems 1097–1105 (2012).

McRae, K., Cree, G. S., Seidenberg, M. S. & Mcnorgan, C. Semantic feature production norms for a large set of living and nonliving things. Behav. Res. Methods 37 , 547–559 (2005).

Burke, D. M., Shafto, M. A., Craik, F. I. M. & Salthouse, T. A. Language and aging. Handb. Aging Cogn. 3 , 373–443 (2008).

Shafto, M. A., James, L. E., Abrams, L. & Tyler, L. K., Cam-CAN. Age-Related Increases in Verbal Knowledge Are Not Associated With Word Finding Problems in the Cam-CAN Cohort: What You Know Won’t Hurt You. J. Gerontol. B. Psychol. Sci. Soc. Sci. 72 , 100–106 (2017).

Tsvetanov, K. A. et al . Extrinsic and Intrinsic Brain Network Connectivity Maintains Cognition across the Lifespan Despite Accelerated Decay of Regional Brain Activation. J. Neurosci. 36 , 3115–3126 (2016).

Tsvetanov, K. A. et al . Activity and Connectivity Differences Underlying Inhibitory Control Across the Adult Life Span. J. Neurosci. 38 , 7887–7900 (2018).

Düzel, E., Schütze, H., Yonelinas, A. P. & Heinze, H.-J. Functional phenotyping of successful aging in long-term memory: Preserved performance in the absence of neural compensation. Hippocampus 21 , 803–814 (2011).

Park, D. C. & Reuter-Lorenz, P. The adaptive brain: aging and neurocognitive scaffolding. Annu. Rev. Psychol. 60 , 173–196 (2009).

Davis, S. W., Dennis, N. A., Daselaar, S. M., Fleck, M. S. & Cabeza, R. Que PASA? The posterior-anterior shift in aging. Cereb. Cortex 18 , 1201–1209 (2008).

Gilbert, J. R. & Moran, R. J. Inputs to prefrontal cortex support visual recognition in the aging brain. Sci. Rep . 6 (2016).

Hoyau, E. et al . Aging modulates fronto-temporal cortical interactions during lexical production. A dynamic causal modeling study. Brain Lang. 184 , 11–19 (2018).

Salthouse, T. A Theory of Cognitive Aging . (Elsevier, 1985).

Salthouse, T. A. Aging associations: influence of speed on adult age differences in associative learning. J. Exp. Psychol. Learn. Mem. Cogn. 20 , 1486–1503 (1994).

Salthouse, T. A. The processing-speed theory of adult age differences in cognition. Psychol. Rev. 103 , 403–428 (1996).

Guest, D., Howard, C. J., Brown, L. A. & Gleeson, H. Aging and the rate of visual information processing. J. Vis. 15 , 10 (2015).

Schneider, B., Pichora-Fuller, M., Craik, F. I. M. & Salthouse, T. A. Implication of perceptual deterioration for cognitive aging research. In The handbook of Aging and Cognition 155–219 (2008).

Roberts, K. L. & Allen, H. A. Perception and Cognition in the Ageing Brain: A Brief Review of the Short- and Long-Term Links between Perceptual and Cognitive Decline. Front. Aging Neurosci . 8 (2016).

Chen, S. P., Bhattacharya, J. & Pershing, S. Association of Vision Loss With Cognition in Older Adults. JAMA Ophthalmol , https://doi.org/10.1001/jamaophthalmol.2017.2838 (2017).

Li, K. Z. H. & Lindenberger, U. Relations between aging sensory/sensorimotor and cognitive functions. Neurosci. Biobehav. Rev. 26 , 777–783 (2002).

Monge, Z. A. & Madden, D. J. Linking Cognitive and Visual Perceptual Decline in Healthy Aging: The Information Degradation Hypothesis. Neurosci. Biobehav. Rev. 69 , 166–173 (2016).

Salthouse, T. A. Neuroanatomical substrates of age-related cognitive decline. Psychol. Bull. 137 , 753–784 (2011).

Raz, N. & Lindenberger, U. Only time will tell: Cross-sectional studies offer no solution to the age–brain–cognition triangle: Comment on Salthouse (2011). Psychol. Bull. 137 , 790–795 (2011).

Folstein, M. F., Folstein, S. E. & McHugh, P. R. “Mini-mental state”. J. Psychiatr. Res. 12 , 189–198 (1975).

Snellen, H. Probebuchstaben zur Bestimmung der Sehschärfe (Van de Weijer, 1862).

Jia, Y. et al . Caffe: Convolutional Architecture for Fast Feature Embedding. ArXiv14085093 Cs (2014).

Clarke, A., Devereux, B. J. & Tyler, L. K. Oscillatory Dynamics of Perceptual to Conceptual Transformations in the Ventral Visual Pathway. J. Cogn. Neurosci. 30 , 1590–1605 (2018).

Taylor, J. R. et al . The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) data repository: Structural and functional MRI, MEG, and cognitive data from a cross-sectional adult lifespan sample. NeuroImage 144 , 262–269 (2017).

Taulu, S., Simola, J. & Kajola, M. Applications of the signal space separation method. IEEE Trans. Signal Process. 53 , 3359–3372 (2005).

Delorme, A. & Makeig, S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134 , 9–21 (2004).

Chaumon, M., Bishop, D. V. M. & Busch, N. A. A practical guide to the selection of independent components of the electroencephalogram for artifact correction. J. Neurosci. Methods 250 , 47–63 (2015).

Qian, J., Hastie, T., Friedman, J., Tibshirani, R. & Simon, N. Glmnet for Matlab . Date of access: 2019 (2013).

Hayes, A. F. Introduction to mediation, moderation, and conditional process analysis: a regression-based approach . (The Guilford Press, 2013).

Acknowledgements

RB is a postdoctoral fellow of the Research Foundation Flanders (F.W.O.). The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) research was supported by the Biotechnology and Biological Sciences Research Council (grant number BB/H008217/1). LKT, AC, MS are supported by an ERC Advanced Investigator Grant no 669820 awarded to LKT. KAT is supported by a British Academy Postdoctoral Fellowship (PF160048). We thank the Cam-CAN respondents and their primary care teams in Cambridge for their participation in this study.

Author information

Authors and affiliations.

Department of Psychology, University of Cambridge, Cambridge, CB2 3EB, UK

Rose Bruffaerts, Lorraine K. Tyler, Meredith Shafto, Kamen A. Tsvetanov, William D. Marslen-Wilson & Alex Clarke

Laboratory for Cognitive Neurology, Department of Neurosciences, University of Leuven, 3000, Leuven, Belgium

Rose Bruffaerts

Neurology Department, University Hospitals Leuven, 3000, Leuven, Belgium

Cambridge Centre for Ageing and Neuroscience (Cam-CAN), University of Cambridge and MRC Cognition and Brain Sciences Unit, Cambridge, CB2 7EF, UK

Lorraine K. Tyler, Kamen A. Tsvetanov, Carol Brayne, Edward T. Bullmore, Andrew C. Calder, Rhodri Cusack, Tim Dalgleish, John Duncan, Richard N. Henson, Fiona E. Matthews, William D. Marslen-Wilson, James B. Rowe, Karen Campbell, Teresa Cheung, Simon Davis, Linda Geerligs, Rogier Kievit, Anna McCarrey, Abdur Mustafa, Darren Price, David Samu, Jason R. Taylor, Matthias Treder, Janna van Belle, Nitin Williams, Lauren Bates, Tina Emery, Sharon Erzinçlioglu, Andrew Gadie, Sofia Gerbase, Stanimira Georgieva, Claire Hanley, Beth Parkin, David Troy, Tibor Auer, Marta Correia, Lu Gao, Emma Green, Rafael Henriques, Jodie Allen, Gillian Amery, Liana Amunts, Anne Barcroft, Amanda Castle, Cheryl Dias, Jonathan Dowrick, Melissa Fair, Hayley Fisher, Anna Goulding, Adarsh Grewal, Geoff Hale, Andrew Hilton, Frances Johnson, Patricia Johnston, Thea Kavanagh-Williamson, Magdalena Kwasniewska, Alison McMinn, Kim Norman, Jessica Penrose, Fiona Roby, Diane Rowland, John Sargeant, Maggie Squire, Beth Stevens, Aldabra Stoddart, Cheryl Stone, Tracy Thompson, Ozlem Yazlik, Dan Barnes, Marie Dixon, Jaya Hillman, Joanne Mitchell & Laura Villis

Author notes

A comprehensive list of consortium members appears at the end of the paper.

Cambridge Centre for Ageing and Neuroscience

Carol Brayne, Edward T. Bullmore, Andrew C. Calder, Rhodri Cusack, Tim Dalgleish, John Duncan, Richard N. Henson, Fiona E. Matthews, William D. Marslen-Wilson, James B. Rowe, Karen Campbell, Teresa Cheung, Simon Davis, Linda Geerligs, Rogier Kievit, Anna McCarrey, Abdur Mustafa, Darren Price, David Samu, Jason R. Taylor, Matthias Treder, Janna van Belle, Nitin Williams, Lauren Bates, Tina Emery, Sharon Erzinçlioglu, Andrew Gadie, Sofia Gerbase, Stanimira Georgieva, Claire Hanley, Beth Parkin, David Troy, Tibor Auer, Marta Correia, Emma Green, Rafael Henriques, Jodie Allen, Gillian Amery, Liana Amunts, Anne Barcroft, Amanda Castle, Cheryl Dias, Jonathan Dowrick, Melissa Fair, Hayley Fisher, Anna Goulding, Adarsh Grewal, Geoff Hale, Andrew Hilton, Frances Johnson, Patricia Johnston, Thea Kavanagh-Williamson, Magdalena Kwasniewska, Alison McMinn, Kim Norman, Jessica Penrose, Fiona Roby, Diane Rowland, John Sargeant, Maggie Squire, Beth Stevens, Aldabra Stoddart, Cheryl Stone, Tracy Thompson, Ozlem Yazlik, Dan Barnes, Marie Dixon, Jaya Hillman, Joanne Mitchell & Laura Villis

Contributions

R.B.: contributed unpublished analytic tools, analyzed data, wrote the paper, made figures. L.K.T.: designed research, analyzed data, wrote the paper. M.S.: designed research, contributed unpublished analytic tools, analyzed data. K.T.: performed research, contributed unpublished analytic tools, analyzed data. Cam-CAN: designed research, performed research. A.C.: designed research, contributed unpublished analytic tools, analyzed data, wrote the paper. All authors agreed on the final version of the paper.

Corresponding author

Correspondence to Lorraine K. Tyler .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Bruffaerts, R., Tyler, L.K., Shafto, M. et al. Perceptual and conceptual processing of visual objects across the adult lifespan. Sci Rep 9, 13771 (2019). https://doi.org/10.1038/s41598-019-50254-5

Received: 14 February 2019

Accepted: 02 September 2019

Published: 24 September 2019

DOI: https://doi.org/10.1038/s41598-019-50254-5

Supplement to Philosophy of Linguistics

Whorfianism.

Emergentists tend to follow Edward Sapir in taking an interest in interlinguistic and intralinguistic variation. Linguistic anthropologists have explicitly taken up the task of defending a famous claim associated with Sapir that connects linguistic variation to differences in thinking and cognition more generally. The claim is very often referred to as the Sapir-Whorf Hypothesis (though this is a largely infelicitous label, as we shall see).

This topic is closely related to various forms of relativism—epistemological, ontological, conceptual, and moral—and its general outlines are discussed elsewhere in this encyclopedia; see the section on language in the Summer 2015 archived version of the entry on relativism (§3.1). Cultural versions of moral relativism suggest that, given how much cultures differ, what is moral for you might depend on the culture you were brought up in. A somewhat analogous view would suggest that, given how much language structures differ, what is thinkable for you might depend on the language you use. (This is actually a kind of conceptual relativism, but it is generally called linguistic relativism, and we will continue that practice.)

Even a brief skim of the vast literature on the topic is not remotely plausible in this article; and the primary literature is in any case more often polemical than enlightening. It certainly holds no general answer to what science has discovered about the influences of language on thought. Here we offer just a limited discussion of the alleged hypothesis and the rhetoric used in discussing it, the vapid and not so vapid forms it takes, and the prospects for actually devising testable scientific hypotheses about the influence of language on thought.

Whorf himself did not offer a hypothesis. He presented his “new principle of linguistic relativity” (Whorf 1956: 214) as a fact discovered by linguistic analysis:

When linguists became able to examine critically and scientifically a large number of languages of widely different patterns, their base of reference was expanded; they experienced an interruption of phenomena hitherto held universal, and a whole new order of significances came into their ken. It was found that the background linguistic system (in other words, the grammar) of each language is not merely a reproducing instrument for voicing ideas but rather is itself the shaper of ideas, the program and guide for the individual’s mental activity, for his analysis of impressions, for his synthesis of his mental stock in trade. Formulation of ideas is not an independent process, strictly rational in the old sense, but is part of a particular grammar, and differs, from slightly to greatly, between different grammars. We dissect nature along lines laid down by our native languages. The categories and types that we isolate from the world of phenomena we do not find there because they stare every observer in the face; on the contrary, the world is presented in a kaleidoscopic flux of impressions which has to be organized by our minds—and this means largely by the linguistic systems in our minds. We cut nature up, organize it into concepts, and ascribe significances as we do, largely because we are parties to an agreement to organize it in this way—an agreement that holds throughout our speech community and is codified in the patterns of our language. The agreement is, of course, an implicit and unstated one, but its terms are absolutely obligatory ; we cannot talk at all except by subscribing to the organization and classification of data which the agreement decrees. (Whorf 1956: 212–214; emphasis in original)

Later, Whorf’s speculations about the “sensuously and operationally different” character of different snow types for “an Eskimo” (Whorf 1956: 216) developed into a familiar journalistic meme about the Inuit having dozens or scores or hundreds of words for snow; but few who repeat that urban legend recall Whorf’s emphasis on its being grammar, rather than lexicon, that cuts up and organizes nature for us.

In an article written in 1937, posthumously published in an academic journal (Whorf 1956: 87–101), Whorf clarifies what is most important about the effects of language on thought and world-view. He distinguishes ‘phenotypes’, which are overt grammatical categories typically indicated by morphemic markers, from what he called ‘cryptotypes’, which are covert grammatical categories, marked only implicitly by distributional patterns in a language that are not immediately apparent. In English, the past tense would be an example of a phenotype (it is marked by the - ed suffix in all regular verbs). Gender in personal names and common nouns would be an example of a cryptotype, not systematically marked by anything. In a cryptotype, “class membership of the word is not apparent until there is a question of using it or referring to it in one of these special types of sentence, and then we find that this word belongs to a class requiring some sort of distinctive treatment, which may even be the negative treatment of excluding that type of sentence” (p. 89).

Whorf’s point is the familiar one that linguistic structure is comprised, in part, of distributional patterns in language use that are not explicitly marked. What follows from this, according to Whorf, is not that the existing lexemes in a language (like its words for snow) comprise covert linguistic structure, but that patterns shared by word classes constitute linguistic structure. In ‘Language, mind, and reality’ (1942; published posthumously in Theosophist , a magazine published in India for the followers of the 19th-century spiritualist Helena Blavatsky) he wrote:

Because of the systematic, configurative nature of higher mind, the “patternment” aspect of language always overrides and controls the “lexation”…or name-giving aspect. Hence the meanings of specific words are less important than we fondly fancy. Sentences, not words, are the essence of speech, just as equations and functions, and not bare numbers, are the real meat of mathematics. We are all mistaken in our common belief that any word has an “exact meaning.” We have seen that the higher mind deals in symbols that have no fixed reference to anything, but are like blank checks, to be filled in as required, that stand for “any value” of a given variable, like …the x , y , z of algebra. (Whorf 1942: 258)

Whorf apparently thought that only personal and proper names have an exact meaning or reference (Whorf 1956: 259).

For Whorf, it was an unquestionable fact that language influences thought to some degree:

Actually, thinking is most mysterious, and by far the greatest light upon it that we have is thrown by the study of language. This study shows that the forms of a person’s thoughts are controlled by inexorable laws of pattern of which he is unconscious. These patterns are the unperceived intricate systematizations of his own language—shown readily enough by a candid comparison and contrast with other languages, especially those of a different linguistic family. His thinking itself is in a language—in English, in Sanskrit, in Chinese. [footnote omitted] And every language is a vast pattern-system, different from others, in which are culturally ordained the forms and categories by which the personality not only communicates, but analyzes nature, notices or neglects types of relationship and phenomena, channels his reasoning, and builds the house of his consciousness. (Whorf 1956: 252)

He seems to regard it as necessarily true that language affects thought, given

  • the fact that language must be used in order to think, and
  • the facts about language structure that linguistic analysis discovers.

He also seems to presume that the only structure and logic that thought has is grammatical structure. These views are not the ones that after Whorf’s death came to be known as ‘the Sapir-Whorf Hypothesis’ (a sobriquet due to Hoijer 1954). Nor are they what was called the ‘Whorf thesis’ by Brown and Lenneberg (1954) which was concerned with the relation of obligatory lexical distinctions and thought. Brown and Lenneberg (1954) investigated this question by looking at the relation of color terminology in a language and the classificatory abilities of the speakers of that language. The issue of the relation between obligatory lexical distinctions and thought is at the heart of what is now called ‘the Sapir-Whorf Hypothesis’ or ‘the Whorf Hypothesis’ or ‘Whorfianism’.

1. Banal Whorfianism

No one is going to be impressed with a claim that some aspect of your language may affect how you think in some way or other; that is neither a philosophical thesis nor a psychological hypothesis. So it is appropriate to set aside entirely the kind of so-called hypotheses that Steven Pinker presents in The Stuff of Thought (2007: 126–128) as “five banal versions of the Whorfian hypothesis”:

  • “Language affects thought because we get much of our knowledge through reading and conversation.”
  • “A sentence can frame an event, affecting the way people construe it.”
  • “The stock of words in a language reflects the kinds of things its speakers deal with in their lives and hence think about.”
  • “[I]f one uses the word language in a loose way to refer to meanings,… then language is thought.”
  • “When people think about an entity, among the many attributes they can think about is its name.”

These are just truisms, unrelated to any serious issue about linguistic relativism.

We should also set aside some methodological versions of linguistic relativism discussed in anthropology. It may be excellent advice to a budding anthropologist to be aware of linguistic diversity, and to be on the lookout for ways in which your language may affect your judgment of other cultures; but such advice does not constitute a hypothesis.

2. The so-called Sapir-Whorf hypothesis

The term “Sapir-Whorf Hypothesis” was coined by Harry Hoijer in his contribution (Hoijer 1954) to a conference on the work of Benjamin Lee Whorf in 1953. But anyone looking in Hoijer’s paper for a clear statement of the hypothesis will look in vain. Curiously, despite his stated intent “to review and clarify the Sapir-Whorf hypothesis” (1954: 93), Hoijer did not even attempt to state it. The closest he came was this:

The central idea of the Sapir-Whorf hypothesis is that language functions, not simply as a device for reporting experience, but also, and more significantly, as a way of defining experience for its speakers.

The claim that “language functions…as a way of defining experience” appears to be offered as a kind of vague metaphysical insight rather than either a statement of linguistic relativism or a testable hypothesis.

And if Hoijer seriously meant that what qualitative experiences a speaker can have are constituted by that speaker’s language, then surely the claim is false. There is no reason to doubt that non-linguistic sentient creatures like cats can experience (for example) pain or heat or hunger, so having a language is not a necessary condition for having experiences. And it is surely not sufficient either: a robot with a sophisticated natural language processing capacity could be designed without the capacity for conscious experience.

In short, it is a mystery what Hoijer meant by his “central idea”.

Vague remarks of the same loosely metaphysical sort have continued to be a feature of the literature down to the present. The statements made in some recent papers, even in respected refereed journals, contain non-sequiturs echoing some of the remarks of Sapir, Whorf, and Hoijer. And they come from both sides of the debate.

3. Anti-Whorfian rhetoric

Lila Gleitman is an Essentialist on the other side of the contemporary debate: she is against linguistic relativism, and against the broadly Whorfian work of Stephen Levinson’s group at the Max Planck Institute for Psycholinguistics. In the context of criticizing a particular research design, Li and Gleitman (2002) quote Whorf’s claim that “language is the factor that limits free plasticity and rigidifies channels of development”. But in the claim cited, Whorf seems to be making a psychological claim that holds universally of human conceptual development, not asserting that linguistic relativism is true.

Li and Gleitman then claim (p. 266) that such (Whorfian) views “have diminished considerably in academic favor” in part because of “the universalist position of Chomskian linguistics, with its potential for explaining the striking similarity of language learning in children all over the world.” But there is no clear conflict or even a conceptual connection between Whorf’s views about language placing limits on developmental plasticity, and Chomsky’s thesis of an innate universal architecture for syntax. In short, there is no reason why Chomsky’s I-languages could not be innately constrained, but (once acquired) cognitively and developmentally constraining.

For example, the supposedly deep linguistic universal of ‘recursion’ (Hauser et al. 2002) is surely quite independent of whether the inventory of colour-name lexemes in your language influences the speed with which you can discriminate between color chips. And conversely, universal tendencies in color naming across languages (Kay and Regier 2006) do not show that color-naming differences among languages are without effect on categorical perception (Thierry et al. 2009).

4. Strong and weak Whorfianism

One of the first linguists to defend a general form of universalism against linguistic relativism, thus presupposing that they conflict, was Julia Penn (1972). She was also an early popularizer of the distinction between ‘strong’ and ‘weak’ formulations of the Sapir-Whorf Hypothesis (and an opponent of the ‘strong’ version).

‘Weak’ versions of Whorfianism state that language influences or defeasibly shapes thought. ‘Strong’ versions state that language determines thought, or fixes it in some way. The weak versions are commonly dismissed as banal (because of course there must be some influence), and the stronger versions as implausible.

The weak versions are considered banal because they are not adequately formulated as testable hypotheses that could conflict with relevant evidence about language and thought.

Why would the strong versions be thought implausible? For a language to make us think in a particular way, it might seem that it must at least temporarily prevent us from thinking in other ways, and thus make some thoughts not only inexpressible but unthinkable. If this were true, then strong Whorfianism would conflict with the Katzian effability claim, that is, the claim that every proposition can be expressed by some sentence of every natural language. There would be thoughts that a person couldn’t think because of the language(s) they speak.

Some are fascinated by the idea that there are inaccessible thoughts; and the notion that learning a new language gives access to entirely new thoughts and concepts seems to be a staple of popular writing about the virtues of learning languages. But many scientists and philosophers intuitively rebel against violations of effability: thinking about concepts that no one has yet named is part of their job description.

The resolution lies in seeing that the language could affect certain aspects of our cognitive functioning without making certain thoughts unthinkable for us .

For example, Greek has separate terms for what we call light blue and dark blue, and no word meaning what ‘blue’ means in English: Greek forces a choice on this distinction. Experiments have shown (Thierry et al. 2009) that native speakers of Greek react faster when categorizing light blue and dark blue color chips—apparently a genuine effect of language on thought. But that does not make English speakers blind to the distinction, or imply that Greek speakers cannot grasp the idea of a hue falling somewhere between green and violet in the spectrum.

There is no general or global ineffability problem. There is, though, a peculiar aspect of strong Whorfian claims, giving them a local analog of ineffability: the content of such a claim cannot be expressed in any language it is true of . This does not make the claims self-undermining (as with the standard objections to relativism); it doesn’t even mean that they are untestable. They are somewhat anomalous, but nothing follows concerning the speakers of the language in question (except that they cannot state the hypothesis using the basic vocabulary and grammar that they ordinarily use).

If there were a true hypothesis about the limits that basic English vocabulary and constructions puts on what English speakers can think, the hypothesis would turn out to be inexpressible in English, using basic vocabulary and the usual repertoire of constructions. That might mean it would be hard for us to discuss it in an article in English unless we used terminological innovations or syntactic workarounds. But that doesn’t imply anything about English speakers’ ability to grasp concepts, or to develop new ways of expressing them by coining new words or elaborated syntax.

5. Constructing and evaluating Whorfian hypotheses

A number of considerations are relevant to formulating, testing, and evaluating Whorfian hypotheses.

Genuine hypotheses about the effects of language on thought will always have a duality: there will be a linguistic part and a non-linguistic one. The linguistic part will involve a claim that some feature is present in one language but absent in another.

Whorf himself saw that it was only obligatory features of languages that established “mental patterns” or “habitual thought” (Whorf 1956: 139), since if it were optional then the speaker could optionally do it one way or do it the other way. And so this would not be a case of “constraining the conceptual structure”. So we will likewise restrict our attention to obligatory features here.

Examples of relevant obligatory features would include lexical distinctions like the light vs. dark blue forced choice in Greek, or the forced choice between “in (fitting tightly)” vs. “in (fitting loosely)” in Korean. They also include grammatical distinctions like the forced choice in Spanish 2nd-person pronouns between informal/intimate and formal/distant (informal tú vs. formal usted in the singular; informal vosotros vs. formal ustedes in the plural), or the forced choice in Tamil 1st-person plural pronouns between inclusive (“we = me and you and perhaps others”) and exclusive (“we = me and others not including you”).

The non-linguistic part of a Whorfian hypothesis will contrast the psychological effects that habitually using the two languages has on their speakers. For example, one might conjecture that the habitual use of Spanish induces its speakers to be sensitive to the formal and informal character of the speaker’s relationship with their interlocutor while habitually using English does not.

So testing Whorfian hypotheses requires testing two independent hypotheses with the appropriate kinds of data. In consequence, evaluating them requires the expertise of both linguistics and psychology, and is a multidisciplinary enterprise. Clearly, the linguistic hypothesis may hold up where the psychological hypothesis does not, or conversely.

In addition, if linguists discovered that some linguistic feature was optional in two different languages, then even if psychological experiments showed differences between the two populations of speakers, this would not show linguistic determination or influence. The cognitive differences might depend on (say) cultural differences.

A further important consideration concerns the strength of the inducement relationship that a Whorfian hypothesis posits between a speaker’s language and their non-linguistic capacities. The claim that your language shapes or influences your cognition is quite different from the claim that your language makes certain kinds of cognition impossible (or obligatory) for you. The strength of any Whorfian hypothesis will vary depending on the kind of relationship being claimed, and the ease of revisability of that relation.

A testable Whorfian hypothesis will have a schematic form something like this:

  • Linguistic part: Feature F is obligatory in L1 but optional in L2.
  • Psychological part: Speaking a language with obligatory feature F bears relation R to the cognitive effect C.

The relation R might in principle be causation or determination, but it is important to see that it might merely be correlation, or slight favoring; and the non-linguistic cognitive effect C might be readily suppressible or revisable.

Dan Slobin (1996) presents a view that competes with Whorfian hypotheses as standardly understood. He hypothesizes that when the speakers are using their cognitive abilities in the service of a linguistic ability (speaking, writing, translating, etc.), the language they are planning to use to express their thought will have a temporary online effect on how they express their thought. The claim is that as long as language users are thinking in order to frame their speech or writing or translation in some language, the mandatory features of that language will influence the way they think.

On Slobin’s view, these effects quickly attenuate as soon as the activity of thinking for speaking ends. For example, if a speaker is thinking for writing in Spanish, then Slobin’s hypothesis would predict that given the obligatory formal/informal 2nd-person pronoun distinction they would pay greater attention to the formal/informal character of their social relationships with their audience than if they were writing in English. But this effect is not permanent. As soon as they stop thinking for speaking, the effect of Spanish on their thought ends.

Slobin’s non-Whorfian linguistic relativist hypothesis raises the importance of psychological research on bilinguals or people who currently use two or more languages with a native or near-native facility. This is because one clear way to test Slobin-like hypotheses against Whorfian hypotheses would be to find out whether language-correlated non-linguistic cognitive differences between speakers hold for bilinguals only when they are thinking for speaking in one language, but not when they are thinking for speaking in some other language. If the relevant cognitive differences appeared and disappeared depending on which language speakers were planning to express themselves in, it would go some way to vindicate Slobin-like hypotheses over more traditional Whorfian hypotheses. Of course, one could alternatively accept a broadening of Whorfian hypotheses to include Slobin-like evanescent effects. Either way, attention must be paid to the persistence and revisability of the linguistic effects.

Kousta et al. (2008) shows that “for bilinguals there is intraspeaker relativity in semantic representations and, therefore, [grammatical] gender does not have a conceptual, non-linguistic effect” (843). Grammatical gender is obligatory in the languages in which it occurs and has been claimed by Whorfians to have persistent and enduring non-linguistic effects on representations of objects (Boroditsky et al. 2003). However, Kousta et al. supports the claim that bilinguals’ semantic representations vary depending on which language they are using, and thus have transient effects. This suggests that although some semantic representations of objects may vary from language to language, their non-linguistic cognitive effects are transitory.

Some advocates of Whorfianism have held that if Whorfian hypotheses were true, then meaning would be globally and radically indeterminate. Thus, the truth of Whorfian hypotheses is equated with global linguistic relativism—a well known self-undermining form of relativism. But as we have seen, not all Whorfian hypotheses are global hypotheses: they are about what is induced by particular linguistic features. And the associated non-linguistic perceptual and cognitive differences can be quite small, perhaps insignificant. For example, Thierry et al. (2009) provides evidence that an obligatory lexical distinction between light and dark blue affects Greek speakers’ color perception in the left hemisphere only. And the question of the degree to which this affects sensuous experience is not addressed.

The fact that Whorfian hypotheses need not be global linguistic relativist hypotheses means that they do not conflict with the claim that there are language universals. Structuralists of the first half of the 20th century tended to disfavor the idea of universals: Martin Joos’s characterization of structuralist linguistics as claiming that “languages can differ without limit as to either extent or direction” (Joos 1966, 228) has been much quoted in this connection. If the claim that languages can vary without limit were conjoined with the claim that languages have significant and permanent effects on the concepts and worldview of their speakers, a truly profound global linguistic relativism would result. But neither conjunct should be accepted. Joos’s remark is regarded by nearly all linguists today as overstated (and merely a caricature of the structuralists), and Whorfian hypotheses do not have to take a global or deterministic form.

John Lucy, a conscientious and conservative researcher of Whorfian hypotheses, has remarked:

We still know little about the connections between particular language patterns and mental life—let alone how they operate or how significant they are…a mere handful of empirical studies address the linguistic relativity proposal directly and nearly all are conceptually flawed. (Lucy 1996, 37)

Although further empirical studies on Whorfian hypotheses have been completed since Lucy published his 1996 review article, it is hard to find any that have satisfied the criteria of:

  • adequately utilizing both the relevant linguistic and psychological research,
  • focusing on obligatory rather than optional linguistic features,
  • stating hypotheses in a clear testable way, and
  • ruling out relevant competing Slobin-like hypotheses.

There is much important work yet to be done on testing the range of Whorfian hypotheses and other forms of linguistic conceptual relativism, and on understanding the significance of any Whorfian hypotheses that turn out to be well supported.

Copyright © 2024 by Barbara C. Scholz, Francis Jeffry Pelletier <francisp@ualberta.ca>, Geoffrey K. Pullum <pullum@gmail.com> and Ryan Nefdt <ryan.nefdt@uct.ac.za>
