A Hierarchical Modular Architecture for Embodied Cognition

in Multisensory Research

Cognition can appear complex because the brain is capable of an enormous repertoire of behaviors. However, this complexity is greatly reduced when constraints of time and space are taken into account. The brain is constrained by the body to limit its goal-directed behaviors to just a few independent tasks over the scale of 1–2 min, and it can pursue only a very small number of independent agendas. These limitations have been characterized from a number of different vantage points, such as attention, working memory and dual-task performance. The disparate perspectives of these methodologies may be unified if behaviors are seen as modular and hierarchically organized. From this vantage point, the central problem of cognition is the scheduling of behaviors to achieve short-term goals. Dual-task paradigms then study the concurrent management of simultaneous, competing agendas; attention focuses on the decision of whether to interrupt the current agenda or persevere; and working memory is the bookkeeping necessary to manage the state of the currently active agenda items.

References



  • Adams, F. (2010). Embodied cognition, Phenomenol. Cogn. Sci. 9, 619–628.

  • Anderson, J. (1983). The Architecture of Cognition. Harvard University Press, Cambridge, MA, USA.

  • Arbib, M. A. (1988). The Handbook of Brain Theory and Neural Networks, pp. 830–834. MIT Press, Cambridge, MA, USA.

  • Arkin, R. (1998). Behavior-Based Robotics. MIT Press, Cambridge, MA, USA.

  • Baddeley, A. (1992). Working memory, Science 255, 556–559.

  • Ballard, D. H., Hayhoe, M. M., Pook, P., Rao, R. (1997a). Deictic codes for the embodiment of cognition, Behav. Brain Sci. 20, 723–767.

  • Ballard, D. H., Hayhoe, M. M., Pook, P. K., Rao, R. P. N. (1997b). Deictic codes for the embodiment of cognition, Behav. Brain Sci. 20, 723–742.

  • Barto, A. G., Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning, Discrete Event Dyn. Syst. 13, 41–77.

  • Bonasso, R. P., Firby, R. J., Gat, E., Kortenkamp, D., Miller, D. P., Slack, M. G. (1997). Experiences with an architecture for intelligent reactive agents, J. Exp. Theor. Artif. Intell. 9, 237–256.

  • Bowman, H., Wyble, B. (2007). The simultaneous type, serial token model of temporal attention and working memory, Psychol. Rev. 114, 38–70.

  • Brooks, R. (1986). A robust layered control system for a mobile robot, IEEE J. Robot. Autom. 2, 14–23.

  • Bryson, J. J., Stein, L. A. (2001). Modularity and design in reactive intelligence, in: Internat. Joint Conf. on Artificial Intelligence, Seattle, WA, USA.

  • Clark, A. (1999). An embodied model of cognitive science? Trends Cogn. Sci. 3, 345–351.

  • Boutilier, C., Dearden, R., Goldszmidt, M. (2000). Stochastic dynamic programming with factored representations, Artif. Intell. 121, 49–107.

  • Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B., Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans, Nature 441, 876–879.

  • Dayan, P., Hinton, G. E. (1992). Feudal reinforcement learning, in: Advances in Neural Information Processing Systems 5, pp. 271–278.

  • Doya, K., Samejima, K., Katagiri, K. I., Kawato, M. (2002). Multiple model-based reinforcement learning, Neural Comput. 14, 1347–1369.

  • Droll, J., Hayhoe, M., Triesch, J., Sullivan, B. (2005). Task demands control acquisition and storage of visual information, J. Exp. Psychol. Hum. 31, 1416–1438.

  • Fan, J., McCandliss, B. D., Fossella, J., Flombaum, J. I., Posner, M. I. (2005). The activation of attentional networks, NeuroImage 26, 471–479.

  • Firby, R. J., Kahn, R. E., Prokopowicz, P. N., Swain, M. J. (1995). An architecture for vision and action, in: Int. Joint Conf. on Artificial Intelligence, pp. 72–79.

  • Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Houghton Mifflin, Boston, MA, USA.

  • Guestrin, C. E., Koller, D., Parr, R., Venkataraman, S. (2003). Efficient solution algorithms for factored MDPs, J. Artif. Intell. Res. 19, 399–468.

  • Hayhoe, M. M., Shrivastava, A., Mruczek, R., Pelz, J. (2003). Visual memory and motor planning in a natural task, J. Vision 3, 49–63.

  • Hikosaka, O., Bromberg-Martin, E., Hong, S., Matsumoto, M. (2008). New insights on the subcortical representation of reward, Curr. Opin. Neurobiol. 18, 203–208.

  • Humphrys, M. (1996). Action selection methods using reinforcement learning, in: From Animals to Animats 4: Proc. 4th Internat. Conf. Simulation of Adaptive Behavior, P. Maes, M. Mataric, J.-A. Meyer, J. Pollack and S. W. Wilson (Eds), pp. 135–144. MIT Press/Bradford Books, Cambridge, MA, USA.

  • Hurley, S. (2008). The shared circuits model (SCM): how control, mirroring, and simulation can enable imitation, deliberation, and mindreading, Behav. Brain Sci. 31, 1–22.

  • Itti, L. (2005). Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes, Vis. Cogn. 12, 1093–1123.

  • Itti, L., Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of attention, Vision Research 40, 1489–1506.

  • Karlsson, J. (1997). Learning to solve multiple goals, PhD thesis, University of Rochester, NY, USA.

  • Laird, J. E., Newell, A., Rosenbloom, P. S. (1987). Soar: an architecture for general intelligence, Artif. Intell. 33, 1–64.

  • Langley, P., Choi, D. (2006). Learning recursive control programs from problem solving, J. Mach. Learn. Res. 7, 493–518.

  • Luck, S. J., Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions, Nature 390, 279–281.

  • Meuleau, N., Hauskrecht, M., Kim, K.-E., Peshkin, L., Kaelbling, L., Dean, T., Boutilier, C. (1998). Solving very large weakly coupled Markov decision processes, in: Proc. 15th Natl Conf. on Artificial Intelligence (AAAI/IAAI), Madison, WI, USA, pp. 165–172.

  • Navalpakkam, V., Koch, C., Rangel, A., Perona, P. (2010). Optimal reward harvesting in complex perceptual environments, Proc. Natl Acad. Sci. USA 107, 5232–5237.

  • Neisser, U. (1967). Cognitive Psychology. Appleton-Century-Crofts, New York, NY, USA.

  • Noë, A. (2005). Action in Perception. MIT Press, Cambridge, MA, USA.

  • Nordfang, M., Dyrholm, M., Bundesen, C. (2012). Identifying bottom-up and top-down components of attentional weight by experimental analysis and computational modelling, J. Exp. Psychol. Gen. DOI: 10.1037/a0029631.

  • O’Regan, J. K., Noë, A. (2001). A sensorimotor approach to vision and visual consciousness, Behav. Brain Sci. 24, 939–973.

  • Parr, R., Russell, S. (1997). Reinforcement learning with hierarchies of machines, in: Advances in Neural Information Processing Systems, M. I. Jordan, M. J. Kearns and S. A. Solla (Eds). MIT Press, Cambridge, MA, USA.

  • Pfeifer, R., Scheier, C. (1999). Understanding Intelligence. Bradford Books, Cambridge, MA, USA.

  • Posner, M. I., Rothbart, M. K. (2007). Research on attention networks as a model for the integration of psychological science, Ann. Rev. Psychol. 58, 1–23.

  • Rao, R. P. N., Zelinsky, G. J., Hayhoe, M. M., Ballard, D. H. (2002). Eye movements in iconic visual search, Vision Research 42, 1447–1463.

  • Ritter, S., Anderson, J. R., Cytrynowicz, M., Medvedeva, O. (1998). Authoring content in the PAT algebra tutor, J. Interact. Media Educ. 98, 1–30.

  • Roelfsema, P. R., Khayat, P. S., Spekreijse, H. (2003). Sub-task sequencing in the primary visual cortex, Proc. Natl Acad. Sci. USA 100, 5467–5472.

  • Rothkopf, C. A., Ballard, D. H. (2009). Image statistics at the point of gaze during human navigation, Vis. Neurosci. 26, 81–92.

  • Rothkopf, C. A., Ballard, D. H. (2010). Credit assignment in multiple goal embodied visuomotor behaviour, Front. Psychol. 1, 113 (online).

  • Rothkopf, C. A., Ballard, D. H., Hayhoe, M. M. (2007). Task and context determine where you look, J. Vision 7, 1–20.

  • Roy, D. K., Pentland, A. P. (2002). Learning words from sights and sounds: a computational model, Cogn. Sci. 26, 113–146.

  • Rummery, G. A., Niranjan, M. (1994). Online Q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department, UK.

  • Russell, S., Zimdars, A. (2003). Q-decomposition for reinforcement learning agents, in: Proc. Int. Conf. on Machine Learning.

  • Ruthruff, E., Pashler, H. E., Hazeltine, E. (2003). Dual-task interference with equal task emphasis: graded capacity-sharing or central postponement? Atten. Percept. Psychophys. 65, 801–816.

  • Sallans, B., Hinton, G. E. (2004). Reinforcement learning with factored states and actions, J. Mach. Learn. Res. 5, 1063–1088.

  • Samejima, K., Doya, K., Kawato, M. (2003). Inter-module credit assignment in modular reinforcement learning, Neural Networks 16, 985–994.

  • Schultz, W. (2000). Multiple reward signals in the brain, Nat. Rev. Neurosci. 1, 199–207.

  • Schultz, W., Dayan, P., Montague, P. R. (1997). A neural substrate of prediction and reward, Science 275, 1593–1599.

  • Shapiro, L. (2011). Embodied Cognition. Routledge, New York, NY, USA.

  • Singh, S., Cohn, D. (1998). How to dynamically merge Markov decision processes, in: Advances in Neural Information Processing Systems 10, Denver, CO, USA, 1997, pp. 1057–1063.

  • Sprague, N., Ballard, D. (2003). Multiple-goal reinforcement learning with modular sarsa(0), in: Internat. Joint Conf. on Artificial Intelligence, Acapulco, Mexico.

  • Sprague, N., Ballard, D., Robinson, A. (2007). Modeling embodied visual behaviors, ACM Trans. Appl. Percept. 4, 11.

  • Stewart, J., Gapenne, O., Di Paolo, E. (Eds) (2010). Enaction: Toward a New Paradigm for Cognitive Science. MIT Press, Cambridge, MA, USA.

  • Sun, R. (2006). Cognition and Multi-Agent Interaction, Ch. 4, pp. 79–99. Cambridge University Press, Cambridge, UK.

  • Sutton, R. S., Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA.

  • Sutton, R. S., Precup, D., Singh, S. P. (1999). Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning, Artif. Intell. 112, 181–211.

  • Tolman, E. C. (1948). Cognitive maps in rats and men, Psychol. Rev. 55, 189–208.

  • Torralba, A., Oliva, A., Castelhano, M., Henderson, J. M. (2006). Contextual guidance of attention in natural scenes: the role of global features in object search, Psychol. Rev. 113, 766–786.

  • Treisman, A. M., Gelade, G. (1980). A feature-integration theory of attention, Cogn. Psychol. 12, 97–136.

  • Trick, L. M., Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited-capacity preattentive stage in vision, Psychol. Rev. 101, 80–102.

  • Ullman, S. (1985). Visual routines, Cognition 18, 97–159.

  • Varela, F. J., Thompson, E., Rosch, E. (1991). The Embodied Mind: Cognitive Science and Human Experience. MIT Press, Cambridge, MA, USA.

  • Vigorito, C. M., Barto, A. G. (2010). Intrinsically motivated hierarchical skill learning in structured environments, IEEE Trans. Auton. Mental Dev. 2, 132–143.

  • Watkins, C. J. C. H. (1989). Learning from delayed rewards, PhD thesis, University of Cambridge, UK.

  • Yu, C., Ballard, D. (2004). A multimodal learning interface for grounding spoken language in sensorimotor experience, ACM Trans. Appl. Percept. 1, 57–80.


Figure 1. Four levels of a hierarchical cognitive architecture that operate at different timescales. The central element is the task level, wherein a given task may be described in terms of a module of states and actions. A thread keeps track of the process of execution through the module. The next level down consists of visual and motor routines (arrows) that monitor the fidelity of the state and action spaces, respectively. Above the task level is the operating system level, where priorities for modules are used to select an appropriate small suite of modules to manage a given real-world situation. The topmost level is characterized as an attentional level. If a given module strays from its expectations, it may be re-programmed via simulation and modification.

Figure 2. In a task-directed module, the maintenance of state information is handled by routines that exhibit agenda-driven control strategies. To get information, a hypothesis to be tested is putatively sent to the thalamus, where it is compared to coded image data. The result is state information that may in addition trigger a gaze change.

Figure 3. Human gaze data for the same environment showing striking evidence for visual routines. Humans in a virtual walking environment manipulate gaze location depending on the specific task goal. The small black dots show the location of all fixation points on litter and obstacles. When picking up litter (left), gaze points cluster on the center of the object. When avoiding a similar object (right), gaze points cluster at the edges. From Rothkopf and Ballard (2009). This figure is published in color in the online version.

Figure 4. In walking down a sidewalk, the virtual reality scenery was augmented with a dizzying array of distractors, e.g. an upside-down cow. Subjects viewed the scene with a binocular HMD that was updated for head motion. While waiting for a start command, subjects did fixate distractors (items 5 and 6), but once the sidewalk navigation experiment began, subjects fixated the task-relevant objects almost exclusively (items 1, 2 and 3).

Figure 5. In any period during behavior, only a subset of the total module set is active. We term these periods episodes. Over the time course of behavior, modules that are needed become active and those that are no longer needed become inactive. The diagram depicts two sequential episodes of three modules each, {3, 4, 7} and {2, 8, 10}. The different modules are denoted with different shadings and numbers. The different lengths indicate that modules can exhibit different numbers of states and finish at different times. The horizontal arrows denote the scheduler’s action in activating and deactivating modules. On the right is the large library of possible modules. Our formal results depend only on each module being chosen sufficiently often, not on the details of the selection strategy. The same module may be selected in sequential episodes.
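The episode structure in this caption lends itself to a small simulation. The sketch below is purely illustrative: the function and variable names are our assumptions, not the paper's implementation, and module run lengths are drawn at random rather than from task dynamics.

```python
import random

def simulate_episodes(library, steps, suite_size=3, seed=0):
    """Toy scheduler: keep `suite_size` modules active, swapping in
    replacements from the library as modules finish."""
    rng = random.Random(seed)
    # each active module carries a countdown of remaining steps
    active = {m: rng.randint(1, 5) for m in rng.sample(sorted(library), suite_size)}
    log = []
    for _ in range(steps):
        log.append(sorted(active))  # snapshot of the currently active suite
        for m in list(active):
            active[m] -= 1
            if active[m] == 0:
                # module finished: deactivate it and activate a
                # replacement drawn from the currently inactive modules
                del active[m]
                replacement = rng.choice([x for x in library if x not in active])
                active[replacement] = rng.randint(1, 5)
    return log
```

Because a finished module immediately returns to the library, it may be selected again in a later episode, matching the caption's note that the selection strategy only needs to choose each module sufficiently often.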

Figure 6. A potential job for an alerting module: detecting unusual variations in optic flow while driving. (A) An encroaching car produces a pronounced deviation from the expected background radial flow. Radial flow can be dismissed as a normal expectation, but the horizontal flow of a car changing lanes signals an alert. (B) The timeline shows that this signal, as measured by a space- and time-window integration, is easily detectable. This figure is published in color in the online version.

Figure 7. (A) The Sprague model of gaze allocation. Modules compete for gaze in order to update their measurements. The figure shows a caricature of the basic method for a given module. The trajectory through the agent’s state space is estimated using a Kalman filter that propagates estimates in the absence of measurements and, as a consequence, builds up uncertainty (large shaded area). If the behavior succeeds in obtaining a fixation, state space uncertainty is reduced (dark). The reinforcement learning model allows the value of reducing uncertainty to be calculated. (B) In the sidewalk-navigation venue, three modules are updated using the Sprague protocol, a sequential protocol and a random protocol (reading from left to right). The Sprague protocol outperforms the other two.
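The arbitration scheme in panel (A) can be sketched compactly: each module tracks its state with a Kalman filter whose variance grows without a fixation, and gaze goes to the module with the largest expected value loss from acting on its uncertain estimate. The one-dimensional sketch below is a hedged illustration; the class and method names are assumptions, and the sampled loss stands in for the paper's reinforcement-learning computation.

```python
import numpy as np

class Module:
    """One behavior: a 1-D Kalman state estimate plus a value function."""
    def __init__(self, q, r, value_fn):
        self.mean, self.var = 0.0, 1.0
        self.q, self.r = q, r          # process / measurement noise
        self.value_fn = value_fn       # maps a state to its learned value

    def predict(self):
        # no fixation this step: propagate the estimate, uncertainty grows
        self.var += self.q

    def update(self, z):
        # fixation granted: measurement z shrinks the uncertainty
        k = self.var / (self.var + self.r)
        self.mean += k * (z - self.mean)
        self.var *= 1.0 - k

    def expected_value_loss(self, n=100):
        # expected cost of acting on the uncertain estimate rather than
        # its mean -- the "value of reducing uncertainty"
        samples = self.mean + np.sqrt(self.var) * np.random.randn(n)
        return self.value_fn(self.mean) - np.mean([self.value_fn(s) for s in samples])

def allocate_gaze(modules):
    # the module that stands to lose the most value wins the fixation
    return max(modules, key=lambda m: m.expected_value_loss())
```

With a concave value function, the expected loss grows with the variance of the estimate, so the module that has gone longest without a fixation tends to win gaze, as in panel (A).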

Figure 8. A fundamental problem for a biological agent using a modular architecture. At any given instant, shown with dotted lines, when multiple modules are active and only a global reward signal G is available, each module must be able to calculate how much of the reward is due to its activation. This is known as the credit assignment problem. Our setting simplifies the problem by assuming that individual reinforcement learning modules are independent and communicate only their estimates of their reward values. The modules can be activated and deactivated asynchronously, and may each need different numbers of steps to complete, as suggested by the diagram.
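The caption's credit assignment problem can be illustrated with a simple delta rule: the active modules jointly explain the global reward G by splitting the prediction error between G and the sum of their private reward estimates. This is only a sketch of the general idea under an equal-sharing assumption, not the exact algorithm of Rothkopf and Ballard (2010).

```python
def assign_credit(estimates, active, G, alpha=0.1):
    """Nudge the active modules' reward estimates toward explaining the
    global reward G, splitting the prediction error equally."""
    error = G - sum(estimates[i] for i in active)
    for i in active:
        estimates[i] += alpha * error / len(active)
    return estimates
```

Repeated over episodes with varying active sets, the estimates converge to per-module rewards consistent with every observed G, provided the active sets overlap enough to pin those values down.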

Figure 9. (A) Reward calculations for the walkway navigation task for the three component behaviors. Top row: initial values. Bottom row: final reward estimates. (B) Time course of learning the reward for each of the three component behaviors: RMS error between true and calculated reward as a function of iteration number.

Figure 10. Value functions and their associated policies for each of three modules learned by a virtual avatar walking along a sidewalk strewn with litter and obstacles. The red disk marks the state estimate for each module. The individual states for each module are assumed to be estimated by separate applications of the gaze vector to compute the requisite information. Thus the state for an obstacle is the heading to it, and similarly for a litter object. The state for the sidewalk is a measure of the distance to its edge. In the absence of a gaze update, it is assumed that subjects use vestibular and proprioceptive information to update the individual module states. This figure is published in color in the online version.
