Saturday, March 9, 2024

Implementation of a Simple Visuomotor Environment and Brain-inspired Visuomotor Agent

Abstract: As many animals, including humans, make behavioral decisions based on visual information, a cognitive model of the visuomotor system would serve as a basis in intelligence research, including AGI. This article reports on the implementation of a relatively simple system: a virtual environment that displays shapes and cursors and an agent that performs gaze shift and cursor control based on the information from the environment. The visual system is modeled after that of humans with the central and peripheral fields of view, and the agent architecture is based on the structure of the brain.

1. Introduction

This article reports on the implementation of a simple environment and agent architecture for decision making based on visual information, which would serve as part of more generic cognitive models/architectures. It also addresses human ‘active vision,’ where visual information is collected and integrated through gaze shift.

This work adopts a strategy of starting with a relatively simple model. The implemented two-dimensional visual environment displays simple figures and cursors. Figures and a cursor can be moved (dragged) by instructions from the agent.

As for the agent, the following were modeled and implemented, imitating the human visual system.

1) distinction between central and peripheral vision,

2) gaze shift based on salience in the peripheral vision [1],

3) unsupervised learning of shapes captured in the central vision,

4) reinforcement learning of cursor movement and dragging,

5) “surprise” due to changes in the environment caused by actions and habituation due to learning,

6) reward based on “surprise”.

Here 3), 4), and 5) involve learning and are provided with learning models. The agent's actions consist of gaze shift and cursor movement + dragging. Gaze shift in the model is not learned; it is driven by salience.

2. Environment

The environment has a screen divided into an N × N grid (Figure 1). The center of the screen is a "stage" consisting of an M × M grid (M < N), whose edges are marked with border lines. M different shapes are displayed on the stage. The visual information presented to the agent is a color bitmap of the field of view (an M × M grid) centered on the gaze. The gaze is located at the center of a grid cell on the stage and is shifted when the environment is given a gaze shift signal (a vector whose components range from −M to +M). It does not move off the stage. Two cursors of different colors are displayed on the stage. When the environment is given a cursor movement signal (a vector whose components range from −1 to +1), one of the cursors may move, but it does not move off the stage. If the cursor is superimposed on a figure and the environment is given a non-zero cursor movement signal together with a grab signal, the figure is moved in the same direction and by the same distance as the cursor (i.e., dragged). Figure 1 shows an example display.
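The movement rules above can be sketched as follows. This is a minimal illustration, not the PyGame implementation: the class name Stage, its methods, and the figure name "heart" are hypothetical, only one cursor is modeled, and positions are grid coordinates on the stage.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


class Stage:
    """Hypothetical M x M stage holding a gaze, one cursor, and figures."""

    def __init__(self, size, figures):
        self.size = size  # M
        self.figures = {name: list(pos) for name, pos in figures.items()}
        self.gaze = [size // 2, size // 2]
        self.cursor = [0, 0]

    def _step(self, pos, delta, limit):
        # Clip each signal component to [-limit, limit], then keep the
        # resulting position on the stage.
        return [clamp(p + clamp(d, -limit, limit), 0, self.size - 1)
                for p, d in zip(pos, delta)]

    def shift_gaze(self, delta):
        # Gaze shift components range from -M to +M.
        self.gaze = self._step(self.gaze, delta, self.size)

    def move_cursor(self, delta, grab=False):
        # Cursor movement components range from -1 to +1.
        old = self.cursor
        new = self._step(old, delta, 1)
        moved = [n - o for n, o in zip(new, old)]
        if grab and any(moved):
            # Drag every figure under the cursor by the same displacement.
            for pos in self.figures.values():
                if pos == old:
                    pos[0] += moved[0]
                    pos[1] += moved[1]
        self.cursor = new


stage = Stage(3, {"heart": (0, 0)})
stage.move_cursor((1, 0), grab=True)  # cursor and figure move together
stage.shift_gaze((-10, 0))            # clipped and kept on the stage
```

Clipping the signal first and the position second keeps the two constraints (signal range and stage boundary) separate, which is one plausible reading of the description.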

Figure 1: Environment

3. Agent

The agent receives the input of a color bitmap of the field of view from the environment, and outputs gaze shift, cursor movement, and grab signals to the environment. The agent has an architecture consisting of the following modules (Fig. 2; names in parentheses are the module names in the figure): Salience Calculation Module (Periphery2Saliency), Gaze Shift Module (PriorityMap2Gaze), Central Visual Field Change Prediction Module (FoveaDiffPredictor), Surprise-Reward Calculation Module (SurpriseReward), Object Recognition Module (ObjectRecognizer), and Cursor Control Module (CursorActor). See the figure for connections between modules.

Figure 2: Architecture

The Cursor Control Module uses reinforcement learning rewarded by changes in the external world caused by its own action (contingency detection) [2].

As for correspondence with the brain, the Salience Calculation Module corresponds to the superior colliculus, the Gaze Shift Module corresponds to the neural circuit from the superior colliculus to the eye, and the Object Recognition Module corresponds to the ‘what path’ of the visual cortex, which performs object identification. As the Central Visual Field Change Prediction Module and the Surprise-Reward Calculation Module use the output of the Object Recognition Module, they could correspond to a visual association cortex such as the frontal eye field [3]. The Cursor Control Module would correspond to the motor cortex.

3.1 Salience Calculation Module (Periphery2Saliency)

After reducing the resolution of the input bitmap, it creates a monochrome brightness map corresponding to the peripheral visual field, and adds an edge detection map and a time differential map to it. Though the log-polar coordinate system is said to be used in human peripheral vision, ordinary Cartesian coordinates were used for engineering interpretability and compatibility with off-the-shelf tools such as regular CNNs.

3.2 Gaze Shift Module (PriorityMap2Gaze)

A gaze shift signal is calculated to move the gaze to the part with maximum saliency based on the saliency (priority) map from the saliency calculation module.

3.3 Object Recognition Module (ObjectRecognizer)

It feeds the bitmap of the central visual field to an unsupervised learner, and outputs the latent variables of the learner.

3.4 Central Visual Field Change Prediction Module (FoveaDiffPredictor)

‘Central visual field change’ refers to the scalar (summed) time difference of the Object Recognition Module output. The module predicts it from the outputs of the Object Recognition Module and Cursor Control Module at the previous time. If a gaze shift has occurred at the previous time, no prediction is made and the output is set to zero (saccade suppression). Prediction is learned, and its output is the prediction error.

3.5 Surprise-Reward Calculation Module (SurpriseReward)

It outputs {scalar (summed) value of the time difference of the Object Recognition Module output × prediction error (the output of the Central Visual Field Change Prediction Module)}. The output becomes zero if the prediction error is zero or if there is no time change in the output of the Object Recognition Module.
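Sections 3.4 and 3.5 together amount to a short computation, sketched below. The function names are hypothetical, and two details are assumptions rather than statements from the text: the time difference is summed as absolute values, and the prediction error is taken as an absolute difference.

```python
def central_change(prev_latent, curr_latent):
    """Scalar (summed) time difference of the object recognizer output.

    Assumption: element-wise absolute differences are summed.
    """
    return sum(abs(c - p) for p, c in zip(prev_latent, curr_latent))


def prediction_error(predicted_change, actual_change, gaze_shifted):
    """Output of the change predictor: the error of its prediction.

    After a gaze shift no prediction is made and the output is zero
    (saccade suppression).
    """
    if gaze_shifted:
        return 0.0
    return abs(actual_change - predicted_change)


def surprise_reward(prev_latent, curr_latent, predicted_change, gaze_shifted):
    """Reward = central visual field change x prediction error."""
    change = central_change(prev_latent, curr_latent)
    return change * prediction_error(predicted_change, change, gaze_shifted)


# A change of 1.0 that was predicted as 0.25 yields 1.0 * 0.75.
reward = surprise_reward([0.0, 0.0], [0.5, 0.5], 0.25, gaze_shifted=False)
```

Because the two factors are multiplied, the reward vanishes both when nothing changes and when the change is fully predicted, which is the habituation effect described in the introduction.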

3.6 Cursor Control Module (CursorActor)

It is a reinforcement learner that observes the output of the Object Recognition Module and outputs the cursor control (movement vector + grab) signal. The reward is the output of the Surprise-Reward Calculation Module.

4 Implementation and Test

The code is located here:

4.1 Environment

The environment was implemented with Python and PyGame. Card game symbols (pips) were used as figures. The initial positions of figures and cursors are at random for each episode (the initial position of the cursor controlled by the agent was set on a figure).

4.2 Agent

The agent was implemented with Python and BriCA (Brain-inspired Computing Architecture)[4], a computational platform for developing brain-inspired software. As BriCA supports modular architecture development, the reuse of the implementation in more complex architectures could be easier. With the BriCA platform, architectural design is first specified in a spreadsheet and then converted into an architecture description language (BriCA language). At runtime, the interpreter loads and executes the BriCA language description. BriCA modules exchange numerical vector signals in a token-passing manner. PyTorch was used as a machine learning platform.

Salience Calculation Module (Periphery2Saliency)

It reduces the resolution of the input bitmap, calculates a monochrome brightness map corresponding to the peripheral visual field, and adds an edge detection map and a time differential map to the brightness map with preconfigured weights.
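The weighted sum described above might look like the following pure-Python sketch. It is not the repository code: the function names, the block-averaging downsampler, the neighbor-difference edge detector, and the default weights are all assumptions chosen for brevity.

```python
def brightness(rgb_image):
    """Monochrome brightness map: mean of the R, G, B channels."""
    return [[sum(pixel) / 3.0 for pixel in row] for row in rgb_image]


def downsample(gray, factor):
    """Reduce resolution by averaging factor x factor blocks."""
    h, w = len(gray) // factor, len(gray[0]) // factor
    return [[sum(gray[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]


def edge_map(gray):
    """Crude edge detector: differences to the right and lower neighbors."""
    h, w = len(gray), len(gray[0])
    return [[abs(gray[y][min(x + 1, w - 1)] - gray[y][x])
             + abs(gray[min(y + 1, h - 1)][x] - gray[y][x])
             for x in range(w)] for y in range(h)]


def time_diff(gray, prev_gray):
    """Absolute change of brightness since the previous time step."""
    return [[abs(c - p) for c, p in zip(row, prev_row)]
            for row, prev_row in zip(gray, prev_gray)]


def saliency(rgb_image, prev_gray, factor=1,
             w_brightness=1.0, w_edge=1.0, w_time=1.0):
    """Return (saliency map, current brightness map) for the next call."""
    gray = downsample(brightness(rgb_image), factor)
    edges = edge_map(gray)
    diff = time_diff(gray, prev_gray)
    sal = [[w_brightness * g + w_edge * e + w_time * d
            for g, e, d in zip(g_row, e_row, d_row)]
           for g_row, e_row, d_row in zip(gray, edges, diff)]
    return sal, gray


# A single white pixel that has just appeared on a black 2 x 2 image.
image = [[(255, 255, 255), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
sal_map, gray_map = saliency(image, [[0.0, 0.0], [0.0, 0.0]])
```

The brightness map is returned alongside the saliency map so that the caller can pass it back as prev_gray at the next time step.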

Gaze Shift Module (PriorityMap2Gaze)

It computes the ‘priority map’ by 1) adding random noise to the output of the saliency calculation module (salience map), and 2) adding the priority map at the previous time multiplied by the damping coefficient. The gaze shift signal is calculated so that the gaze moves to the field of view corresponding to the part with the maximum value in the priority map.
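A minimal sketch of this update follows. The function names, the uniform noise, and the parameter values are assumptions; the gaze shift signal is expressed as the offset of the maximum from the center of the map, on the assumption that the map is centered on the current gaze.

```python
import random


def update_priority(saliency, prev_priority, damping=0.5, noise_scale=0.01,
                    rng=random):
    """Priority = saliency + random noise + damping * previous priority."""
    return [[s + noise_scale * rng.random() + damping * p
             for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(saliency, prev_priority)]


def gaze_shift(priority):
    """Offset (dx, dy) from the map center to the maximum-priority cell."""
    h, w = len(priority), len(priority[0])
    best_y, best_x = max(((y, x) for y in range(h) for x in range(w)),
                         key=lambda yx: priority[yx[0]][yx[1]])
    return best_x - w // 2, best_y - h // 2


saliency_map = [[0, 0, 5], [0, 1, 0], [0, 0, 0]]
zeros = [[0.0] * 3 for _ in range(3)]
priority = update_priority(saliency_map, zeros, noise_scale=0.0)
shift = gaze_shift(priority)  # the maximum is one cell right, one cell up
```

The damped carry-over lets a location stay attractive for a few steps after its saliency fades, while the noise breaks ties between equally salient locations.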

Object recognition module (ObjectRecognizer)

βVAE (from Princeton U.: code) was used after several kinds of autoencoders had been compared as unsupervised learners. The choice was made with the expectation that the number of output dimensions would be relatively small and that it would provide interpretable (disentangled) latent variables.

Central Visual Field Change Prediction Module (FoveaDiffPredictor)

It predicts scalar changes in the central visual field from the output of the Object Recognition Module and Cursor Control Module at the previous time, and outputs the prediction error. A three-layer perceptron was used as a predictor.

Surprise-Reward Calculation Module (SurpriseReward)

It outputs {the scalar value of the time difference of the Object Recognition Module output × prediction error (Central Visual Field Change Prediction Module output)}.

Cursor Control Module (CursorActor)

It uses a cerebral cortex/basal ganglia loop model [5] (code), based on the hypothesis that the cerebral cortex predicts actions through learning and the basal ganglia determine (through reinforcement learning) whether to perform the action. The implemented basal ganglia model learns, through reinforcement learning, whether to perform a given type of action (Go/NoGo) based on the observation data. Meanwhile, the cortical model initially selects the type of action at random and, as the learning of the basal ganglia model progresses, begins to predict and present the type of action performed from the observation data. The reinforcement learning algorithm used was DQN (Deep Q-Network).

4.3 Experiments (Tests)

Experiments (tests) and learning were performed by modules starting from the area closest to the visual input.

Salience Calculation Module and Gaze Shift Module

These modules do not depend on other modules and do not perform learning. They were qualitatively tested with their own environment (Vision1Env.py), where circles with multiple colors, intensities, and sizes were presented in the field of view. Gaze shift was observed and parameters (e.g., intensity, edge, and time differential weights for saliency map calculation) were adjusted by the developer.

Object Recognition Module

All combinations of images that would appear in the central visual field were fed to the βVAE (with the number of latent variables=10) to be trained (TrainFovea_VAE.py). While original images were generally reconstructed after about 10,000 episodes, the latent (disentangled) variables corresponding to the elements in the images were not found.

Central Visual Field Change Prediction Module

The three-layer perceptron was trained to predict changes in the central visual field from the outputs of the Object Recognition Module and of the Cursor Control Module except for immediately after saccades. The loss became zero around episode 150.

Surprise-Reward Calculation Module

The multiplication was performed correctly (no learning is performed in this module).

Cursor Control Module

It was trained to output the cursor control (movement vector + grab) signal by observing the output of the Object Recognition Module and rewarded by the output of the Surprise-Reward Calculation Module (the Central Visual Field Change Prediction Module had not been trained).

The amount of reward acquired was tripled compared to random trials (average reward 0.12) (Fig.3).

Figure 3: Cursor Control Module learning results

Horizontal axis: number of episodes

Vertical axis: average reward (average of 5 trials)

5. Conclusion

The article reported on the implementation of an environment that displays shapes and cursors on the screen, and an agent that moves the eye and controls the cursor based on visual information.

Tasks that utilize gaze shift (active vision tasks) have been developed elsewhere. DeepMind has developed PsychLab with tasks using gaze shift [6]^*1. The image recognition learning task using gaze shift is part of what is called object centric learning (👉 review). Working memory tasks such as oculomotor delayed response tasks^*2 use gaze shift. Papers [7] and [8] propose biologically plausible models of active vision.

In this article, learning was performed using “surprise” or prediction errors as reward, which is a common approach in unsupervised learning. Learning about changes in the environment due to one's own actions (contingencies) through prediction errors or “surprises” appears as a theme in psychology [2]. There are various studies related to surprise, exploratory behavior, and curiosity [9][10][11] (chapter 3).

Papers [12] and [13] provide neural models similar to that in this article, though more specific ([12] does not model central/peripheral vision as it is concerned with the rat).

When controlling gaze shift using reinforcement learning, it would be necessary to explicitly model the frontal eye field as the corresponding region of the brain (the model would have a mechanism similar to the Cursor Control Module). The representation of the scene consisting of kinds of objects and their location (presumably integrated around the hippocampus) would also be required in tasks using gaze shift.

A model of areas around the hippocampus is important for the recognition of scene sequences, as the hippocampus is also said to be responsible for episodic memory. The model of the prefrontal cortex would be required for working memory tasks, as the region is said to be involved in it.

Finally, the environment was implemented having in mind the modeling of visual understanding of other people's actions and language acquisition presupposing such understanding. Thus, what additional structures will be needed for those models shall be studied.

*1: In this hackathon, a subset of tasks from PsychLab was used.

*2: In this hackathon, a match-to-sample task that requires working memory and gaze shift was used.

References

[1] Veale, et al.: How is visual salience computed in the brain? Insights from behaviour, neurobiology and modelling, Phil. Trans. R. Soc. B, 372(1714) (2017). https://doi.org/10.1098/rstb.2016.0113
[2] Hiraki, K.: Detecting contingency: A key to understanding development of self and social cognition, Japanese Psychological Research, 48(3) (2006). https://doi.org/10.1111/j.1468-5884.2006.00319.x
[3] Ferrera, V. and Barborica, A.: Internally Generated Error Signals in Monkey Frontal Eye Field during an Inferred Motion Task, Journal of Neuroscience, 30 (35) (2010). https://doi.org/10.1523/JNEUROSCI.2977-10.2010
[4] Kouichi Takahashi et al.: A Generic Software Platform for Brain-inspired Cognitive Computing, Procedia Computer Science, 71 (2015). https://doi.org/10.1016/j.procs.2015.12.185
[5] Arakawa, N.: Implementation of a Model of the Cortex Basal Ganglia Loop, ArXiv (2024). https://doi.org/10.48550/arXiv.2402.13275
[6] Leibo, J., et al.: Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents, ArXiv (2018) https://doi.org/10.48550/arXiv.1801.08116
[7] Hoang, K. et al.: Active vision: on the relevance of a bio-inspired approach for object detection, Bioinspiration & Biomimetics, 15(2) (2020). https://doi.org/10.1088/1748-3190/ab504c
[8] McBride, S., Huelse, M., and Lee, M.: Identifying the Computational Requirements of an Integrated Top-Down-Bottom-Up Model for Overt Visual Attention within an Active Vision System. PLoS ONE 8(2) (2013). https://doi.org/10.1371/journal.pone.0054585
[9] Oudeyer, P.Y., Kaplan, F., and Hafner, V.: Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, 11(2) (2007). https://doi.org/10.1109/TEVC.2006.890271
[10] Schmidhuber, J.: Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010), IEEE Transactions on Autonomous Mental Development, 2(3) (2010). https://doi.org/10.1109/tamd.2010.2056368
[11] Cangelosi, A., et al.: Developmental Robotics: From Babies to Robots, MIT Press (2015) https://doi.org/10.7551/mitpress/9320.001.0001
[12] Fiore V., et al.: Instrumental conditioning driven by neutral stimuli: A model tested with a simulated robotic rat, in Proceedings of the Eighth International Conference on Epigenetic Robotics (2008).
[13] Santucci, V.G., et al.: Biological Cumulative Learning through Intrinsic Motivations: A Simulated Robotic Study on the Development of Visually-Guided Reaching, in Proceedings of the Tenth International Conference on Epigenetic Robotics (2010).

Tuesday, October 3, 2023

AutoEncoder-based Predictor (implementation)

I have been 'playing around' with autoencoder implementations to realize 'a predictor,' as the principal function of the neocortex is supposed to be prediction. I tried a simple autoencoder and a sparse autoencoder from a cerenaut repository and a β-VAE implementation from a project repository of Princeton University (see the explanatory article). I chose the β-VAE, for I'll use it to model the association cortex, where the use of CNN may not be appropriate (the β-VAE does not use CNN but only Linear layers). (And the simple one may not be potent enough.)

I constructed a predictor with the encoder, decoder, and autoencoder factory from the repository with a single modification in the decoder setting. Namely, the predictor differs only in the decoder output setting; while the autoencoder predicts the encoder input, the predictor predicts another input.

The implementation is found here: https://github.com/rondelion/AEPredictor

A test result with MNIST rotation (to predict rotated images) is shown below after 100 epochs of training:

Sunday, July 9, 2023

Basic Salience - Saccade model

I implemented a simple salience-saccade model. Visit the repository for details. The model can be used for any (active) vision-based agent building.

In 2021, I wrote about Visual Task and Architectures. The current implementation is about the where path, saliency map, and active vision (gaze control) in the post. As for the what path, I did a rudimentary implementation in 2021. I implemented a cortico-thalamo-BG control algorithm in 2022. I also worked on the match-to-sample task of a non-visual type this year (previous post).

While I might go for experiments on minimal visual word acquisition, I should add the what path (object recognition) to the current model in any case.

Monday, April 24, 2023

Solving a Delayed Match-to-Sample Task with Sequential Memory

Introduction

This report presents a solution and implementation of a delayed Match-to-Sample Task using episode sequences. (See my post for the importance of the M2S task in AGI research.)

A delayed Match-to-Sample task is a task to determine whether a presented (target) pattern is the same as another one (sample) presented previously in the session. In the case of a graphic-based task, either the shape or the color of the presented graphic can be used as the matching attribute. In this report, a cue (task switch) is presented before sample presentation to specify the matching attribute. Both the cue and the matching pattern are low-dimensional binary vectors for the sake of simplicity.

Working memory is required to solve a DM2S task. The agent needs to remember the cue (task switch), select a part of a pattern presented as the attribute of the sample according to the cue, remember the part, and compare it with the attribute of the target pattern presented later. Due to the need for working memory, it is assumed that simple reinforcement learning cannot solve the problem.

In this report, the agent memorizes the sequences appearing in all task episodes (for a long term) and solves the task by finding a past sequence that would lead to success in the current episode (memory for a short term). Implementation has shown that in the simplest setting, the agent can solve the task after experiencing several hundred episodes in most cases.

The Method

Sequence Memory

The agent memorizes the entire input-output sequences of episodes experienced. The memory has a tree structure with the root at the end of the episode. The tree branches according to inputs-outputs, and its nodes have information on the number of successes and the number of experiences.

Using the Sequence Memory

The agent remembers the input-output sequence in each episode and searches the ‘long-term’ sequence memory for sequences that match the current sequence and lead to success. The sequence memory is indexed with partial observation sequences as keys to allow the longest match. Among the sub-sequences matched by the index, the one with the highest value (success rate × number of successes) at its beginning is used (the number of successes is included to eliminate sub-sequences that succeeded by a fluke), and the action is decided by following the rest of that sub-sequence. The sequence memory for each episode corresponds to the working memory, and the ‘long-term’ sequence memory corresponds to the policy in reinforcement learning.
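The lookup can be illustrated with a deliberately simplified sketch. It replaces the tree described above with a flat dictionary keyed by observation sub-sequences, so it shows only the longest-match search and the value (success rate × number of successes). The class name, method names, and observation and action labels are hypothetical.

```python
from collections import defaultdict


class SequenceMemory:
    """Simplified stand-in for the 'long-term' sequence memory."""

    def __init__(self):
        # observation sub-sequence -> action -> [successes, experiences]
        self.stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def register(self, episode, success):
        """Store an episode given as a list of (observation, action) pairs."""
        observations = tuple(obs for obs, _ in episode)
        for t, (_, action) in enumerate(episode):
            # Index every sub-sequence of observations that ends at step t.
            for start in range(t + 1):
                counts = self.stats[observations[start:t + 1]][action]
                counts[0] += int(success)
                counts[1] += 1

    def choose(self, history):
        """Return the best action for the longest matching sub-sequence."""
        for start in range(len(history)):  # longest match first
            key = tuple(history[start:])
            if key not in self.stats:
                continue
            best_action, best_value = None, 0.0
            for action, (successes, experiences) in self.stats[key].items():
                value = (successes / experiences) * successes
                if value > best_value:
                    best_action, best_value = action, value
            if best_action is not None:
                return best_action
        return None  # nothing known: the caller has to explore


memory = SequenceMemory()
memory.register([("cue", "look"), ("sample", "look"), ("target", "press")], True)
memory.register([("cue", "look"), ("sample", "look"), ("target", "press")], True)
memory.register([("cue", "look"), ("sample", "look"), ("target", "skip")], False)
action = memory.choose(("cue", "sample", "target"))
```

Multiplying the success rate by the number of successes favors actions that have succeeded repeatedly over ones that succeeded once, which is the stated purpose of including the count.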

Architecture

Fig.1 Architecture

The agent consists of the Gate, Episodic Memory, and Action Chooser.

Gate

Attention is paid to a part of the observation and gated observation (non-attended parts are masked) is output. It also outputs whether there has been a change in observation (obs. change).

Attention is determined by the salience of the environmental input and the attention signal from Episodic Memory; if there is a definite attention from Episodic Memory and the target of the attention is salient, the part is selected; otherwise, one of the salient parts is selected as the target of attention with equal probability. If there is no salient part in the observation (if it is a 0 vector), no attention is given and the attention output is a 0 vector.
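The selection rule can be sketched as below. The function names and the representation (one 0/1 salience flag per part of the observation, an optional index requested by Episodic Memory) are assumptions made for illustration.

```python
import random


def select_attention(salient, requested=None, rng=random):
    """Return a one-hot attention vector over the parts of the observation.

    salient: list of 0/1 flags, one per part.
    requested: index of the part requested by Episodic Memory, or None.
    """
    candidates = [i for i, flag in enumerate(salient) if flag]
    attention = [0] * len(salient)
    if not candidates:
        return attention  # nothing salient: attention is a 0 vector
    if requested is not None and requested in candidates:
        chosen = requested  # definite and salient request wins
    else:
        chosen = rng.choice(candidates)  # equal probability otherwise
    attention[chosen] = 1
    return attention


def gate(observation_parts, attention):
    """Mask (zero out) the parts of the observation that are not attended."""
    return [part if attended else [0] * len(part)
            for part, attended in zip(observation_parts, attention)]


attention = select_attention([1, 0, 1], requested=2)
gated = gate([[1, 0], [0, 0], [0, 1]], attention)
```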

Episodic Memory

It receives gated observation, attention, obs. change, and reward from the Gate, and outputs attention instruction to Gate and action instruction to Action Chooser.

At the end of each episode, Episodic Memory registers the input-output sequence of the episode in the sequence memory.

If a (sub-)sequence of the gated observation matches a success sequence in the memory, Episodic Memory determines outputs according to the rest of the sequence. Episodic Memory receives information about attentional and action choices made ('efferent copy') from Gate and Action Chooser respectively, to be recorded in the sequence memory.

For two steps immediately after a change in the observation (obs. change), Episodic Memory chooses only ‘attentions.’ This is to allow the agent to check the situation before outputting to the external environment (it also narrows the search space).

Action Chooser

It receives an action instruction (probability vector) from Episodic Memory, performs action selection, and passes the results to the environment and Episodic Memory.
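One way to read "performs action selection" is sampling from the probability vector, sketched below. The function name and the fallback to a uniform choice when the vector is all zeros are assumptions.

```python
import random


def choose_action(probabilities, rng=random):
    """Sample an action index according to the given probability vector."""
    actions = list(range(len(probabilities)))
    if sum(probabilities) <= 0:
        # Assumption: with no instruction, pick uniformly at random.
        return rng.choice(actions)
    return rng.choices(actions, weights=probabilities, k=1)[0]


action = choose_action([0.0, 0.0, 1.0])  # only action 2 has non-zero weight
```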

Implementation and Experimental Results

Environment/Task

Phases

The task has the following phases:

{task switch presentation, blank, sample presentation, blank, target presentation, blank}

Input/Output

The output from the environment (observation) is a binary sequence consisting of {task switch, attribute sequence, control switch}.

The number of dimensions of an attribute sequence is the number of attributes x the attribute dimension. Each attribute is a one-hot vector having the attribute dimension.

A task switch is a one-hot vector with the attribute dimension that specifies the attribute to be extracted (for implementation convenience, attribute dimension > number of attributes).

The control switch is also a binary vector of the attribute dimension, with the first column being 1 in the sample presentation phase, the second column being 1 in the target presentation and response phase, and all columns being 0 otherwise. The output of the blank phases is a 0 vector.

Reward values are either 0 (failure) or 1 (success).

There are three types of inputs (actions) from the agent: {0, 1, 2}.

Success Conditions

The environment gives success only when the attribute specified in the task switch matches the sample and target and the input from the agent in the target presentation phase is 2, or when the attribute specified in the task switch does not match the sample and target and the input from the agent in the target presentation phase is 1.
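The success condition can be written as a small function. The name reward and the list-based encoding (a one-hot task switch whose hot index selects the attribute, and sample and target given as lists of one-hot attribute vectors) are illustrative assumptions.

```python
def reward(task_switch, sample, target, action):
    """Return 1 on success and 0 on failure in the target phase.

    Success: action 2 when the selected attribute matches,
    action 1 when it does not.
    """
    attribute = task_switch.index(1)  # attribute specified by the task switch
    matched = sample[attribute] == target[attribute]
    if matched and action == 2:
        return 1
    if not matched and action == 1:
        return 1
    return 0


# Two attributes of dimension 2; the task switch selects attribute 0.
sample = [[1, 0], [0, 1]]
target = [[1, 0], [1, 0]]
result = reward([1, 0], sample, target, action=2)  # attribute 0 matches
```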

Implementation

Python and OpenAI Gym are used.

The agent implementation used Python and BriCA (Brain-inspired Computing Architecture), a platform for building cerebral agents, in which information is passed between modules at each time step in defined connections.

Experimental Setup

Length (steps) of the phases

Task switch presentation: 2, Blank: 1, Sample presentation: 2, Target presentation and response: 3

Number of attributes and attribute dimension: both 2, or both 3

Perplexity of inputs and actions (size of the search space)

The number of different inputs and outputs that can appear is the number shown below, and all of these must be experienced in order to gain full knowledge. Since the environment is stochastic, there is no guarantee that a complete experience can be obtained in a finite number of trials.

When number of attributes: 2, attribute dimension: 2: 2 x (4 x 3) x (4 x 5) = 480

When number of attributes: 3 and attribute dimension: 3: 3 x (8 x 4) x (8 x 6) = 4,608

Formula: task switches x (attribute values x attention destinations) x (attribute values x (attention destinations + action types))
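The formula can be checked with a few lines of Python. The function and parameter names are illustrative, and the value of 2 for the action types is inferred from the factor 5 = 3 + 2 in the first setting rather than stated in the text.

```python
def search_space_size(task_switches, attribute_values,
                      attention_destinations, action_types):
    """Number of distinct input-output combinations, per the formula above."""
    return (task_switches
            * (attribute_values * attention_destinations)
            * (attribute_values * (attention_destinations + action_types)))


# Setting with 2 attributes of dimension 2: 2 x (4 x 3) x (4 x 5)
size_2x2 = search_space_size(task_switches=2, attribute_values=4,
                             attention_destinations=3, action_types=2)
```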

Results

Fig. 2 Experimental results

Vertical axis: average reward, horizontal axis: episodes x 100

Blue line: number of attributes: 2, attribute dimension: 2;

Red line: number of attributes: 3, attribute dimension: 3

The learning curves differ according to the number of attributes and attribute dimension settings. In the setting with a minimum complexity (blue line – number of attributes: 2, attribute dimension: 2), the task is solved in a few hundred trials in most cases.

Comparison with Reinforcement Learning

It was examined whether the reinforcement learning agents (vpg, a2c, and ppo from TensorForce) learn the task. The results are shown in the graph below, and it appears that proper learning does not occur.

Fig. 3 Experimental results of reinforcement learning

Vertical axis: average reward; horizontal axis: episodes x 100

Discussions

Comparison with Reinforcement Learning

The proposed system can generally solve the task once it has enough experience to tell that matched sequences did not succeed by a fluke. Given the perplexity of the task (see above), it is assumed that the problem is solved with a minimum number of trials.

While a reinforcement learner may also maintain a graph of the ‘Markov’ series leading to the reward (e.g., Bellman backup tree), the sub-series are not normally memorized and used for matching. In this implementation, the number of successes is also stored to avoid ‘fluke’ sequences, whereas only probability and reward evaluation values are stored in normal RL.

Related Research

[McCallum 1995] uses case trees for problem solving and refers to further works in the context of reinforcement learning.

My post in 2022 proposed a “model of fluid intelligence based on examining experienced sequences,” a mechanism that allows agents to discover the conditions of the sequences required by the task. In the real world, it is not possible to know in advance how far back from the reward the agent should remember, so the proposed strategy could be applied to start with a sequence near the reward and extend the policy sequence if it does not work.

I also reported in another post in 2022 on an attempt to solve a delayed Match-to-Sample task with a brain-inspired working memory architecture, which did not store sequences and learned to select attention and action independently; it could not identify overall successful sequences.

Biological Plausibility

While the current implementation is not biologically plausible in that it does not use artificial neural networks (or other neural mimicking mechanisms), its design was inspired by the information processing mechanisms of the brain.

Gate incorporates the mechanisms of attention and salience maps in the mammalian visual system. If attention is thought of as eye movements, it can also be understood as the mechanism of active vision.

In the brain, episodic memory is believed to be held in the hippocampus. If so, it is conceivable that episodic memory can be recalled from partial input-output sequences and used for action selection (see [Sasaki 2018][Dragoi 2011] for the discussion of hippocampal use of sequential memory).

In the current implementation, a single module (Episodic Memory) was used to manage the control of both attention and action; it might be better to implement modules separately because they differ in terms of timing (Gate runs before Episodic Memory while Action Chooser runs after).

Information Compression

In this implementation, the environmental input is a low-dimensional vector; even so, the number of cases becomes quite large if all of the input-output pattern sequences are to be searched (see above on perplexity). When dealing with real environments, it would be necessary to compress information with deep learning or other methods to reduce the search space. The pattern matching method implemented in this study is based on perfect (strict) matching; with analog data from real environments, the use of a more flexible matching method would be a must. For this purpose, it would also be desirable to use artificial neural networks.

In the current implementation, Episodic Memory stores the masked environmental input (gated observation) as it is; if recognition of the attended attribute and the choice of the attention is used for action, the attribute itself need not be remembered, and it will reduce the perplexity.

Future Directions

Future directions may include: validation with other intelligence test tasks (e.g., analogy tasks), search for more biologically plausible architectures, tasks using images (see the information compression section above), and search for "causal inference" capabilities such as those performed by human infants.

References

[McCallum 1995] McCallum, R.A., Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State, Proceedings of the Twelfth International Conference on Machine Learning (1995) https://doi.org/10.1016/B978-1-55860-377-6.50055-4

[Sasaki 2018] Takuya Sasaki, et al.: Dentate network activity is necessary for spatial working memory by supporting CA3 sharp-wave ripple generation and prospective firing of CA3 neurons, Nature Neuroscience vol. 21 (2018) https://doi.org/10.1038/s41593-017-0061-5

[Dragoi 2011] George Dragoi and Susumu Tonegawa: Preplay of future place cell sequences by hippocampal cellular assemblies, Nature 469 (7330) (2011) https://doi.org/10.1038/nature09633

Thursday, January 26, 2023

Memo: AGI Research Map 2023-01

This memo gives an overview of the AGI research with Fig.1 "AGI Research Map 2023-01" shown below.

Fig. 1 AGI Research Map 2023-01

1. The Choice of Approaches

The upper left portion of the figure shows the approach choices; all choices except No are Yes.

If you don’t go after human cognitive functions, you’d obtain an AGI that is not (necessarily) human-like (e.g., ~~General Problem Solver~~ or AIXI).
Note: "Not-human-like" AGI may not efficiently process tasks that are efficiently processed by humans.

If you go after human cognitive functions, you have a choice whether to go after human modeling (i.e., cognitive science). If you don’t go after human modeling, you may go after functionally human-like (but structurally not human-like) AGI (this may be a rather engineering oriented approach). If you go after human modeling, you have a choice whether to mimic the brain. If you don’t go after mimicking the brain, then you would go after (cognitive) psychological modeling. You can go after mimicking the brain and still be on engineering (reverse engineering).

If you go after human cognitive functions, you would also go after embodiment (in 3D space) and implementing episodic memory.

2. Problem Solving

The upper right portion of the figure is a classification of problem-solving capabilities. There are two broad categories there: statistical problem solving and constraint satisfaction, both of which AGI should use.

In statistical problem solving, predictions and decisions are made based on statistics. Machine learning is a type of statistical problem solving.

Constraint satisfaction requires finding a solution that satisfies given conditions (constraints). Logic (deduction) and GOFAI generally belong to it. In constraint satisfaction, statistical information can be used for efficiency.

Mathematics is a deductive system, so its operation requires constraint satisfaction.

Statistics uses mathematics (but may not use deduction while in action).

Causal inference uses both statistics and constraint satisfaction.

While hypothesis generation (abduction) is constraint satisfaction in nature, statistical information helps hypotheses generation.
In mathematics, hypotheses are created to be proved by deduction.

Algorithm generation (programming) is a kind of constraint satisfaction and is a key element for self-improving superintelligence.

Human beings have all the problem solving capabilities mentioned here.

Scientific practice is a (social) activity in which all of the problem-solving capabilities are put to use.

3. Human-specific Cognitive Capabilities

The bottom center of the figure lists human-specific cognitive capabilities (i.e., non-human animals do not have them). If you go after human cognitive functions, you have to realize these capabilities.

Linguistic functions have been considered the hallmark of human intelligence (cf. Turing Test).
Certain social intelligence such as intention understanding and theory of mind is also considered to be unique to humans.
According to [Tomasello 2009], causal thinking is also unique to humans (humans always ask about causes).
As human children grow, they also develop a concept of quantity that is not found in other animals (mathematical intuition).

With regard to language, the subfields of linguistics, i.e., syntax, semantics, and pragmatics, are listed (phonology is omitted). If you are for generative grammar, you would go for constructive semantics as well. Meanwhile, the semantics successfully used in machine learning is distributional semantics (and embedding). Since constructive semantics is necessary for precise interpretation of sentence meaning, these semantics would have to be integrated.

If you go after development, you would go after language acquisition as well, where a system acquires language by interacting with existing language speakers in the environment (as human infants do); it learns the meaning of linguistic expressions by inferring the intent of others to use language. If you don’t go after development, you might go after systems that learn from corpora (as current large-scale language models do).

4. Essential Elements and Development Priorities

All the capabilities listed in the problem-solving section are required for AGI.

Some human-specific cognitive capabilities are optional when you pursue not-necessarily-human-like AGI; for example, an AGI agent that communicates with humans in logical formulae may not need human social intelligence nor human language acquisition capabilities.

The arrows in the figure show the relationship between the use of functions. You would have to develop those which are used before those which use them. For example, the mathematical capability would require the implementation of a deductive engine beforehand.
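The "develop what is used before what uses it" rule amounts to a topological ordering of the usage arrows. The following is a minimal sketch, assuming a hypothetical dependency table (`USES`) whose edges paraphrase statements in the Problem Solving section; the actual figure may contain more or different arrows.

```python
from graphlib import TopologicalSorter

# Hypothetical table: each capability maps to the capabilities it uses.
# Edges paraphrase the Problem Solving section, not the figure itself.
USES = {
    "mathematics": {"constraint satisfaction"},
    "statistics": {"mathematics"},
    "causal inference": {"statistics", "constraint satisfaction"},
    "hypothesis generation": {"constraint satisfaction", "statistics"},
}


def development_order(uses):
    """Return capabilities ordered so that prerequisites come first."""
    return list(TopologicalSorter(uses).static_order())


order = development_order(USES)
print(order)
```

Any order returned this way places constraint satisfaction before mathematics and mathematics before statistics; a cycle among the arrows would raise `graphlib.CycleError`, signaling that no such development order exists.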

5. Capability Settings and Testability

In designing an artifact, you have to specify its capabilities (functional specifications) in advance. While the settings of capabilities in AGI design must be specific enough for designing tasks to test them, the tasks must be "large enough" to cover functional generality. The trade-off between specificity and generality is subject to discussion with regard to the definition of generality in AGI research.

Reference

[Tomasello 2009] Tomasello, M.: The Cultural Origins of Human Cognition, Harvard University Press (2009).

Tuesday, November 22, 2022

Remaining Issues with AGI as of 2022

Abstract

This article confirms the definition of AGI and discusses unrealized functions of human-like AGI as of 2022, which include fluid intelligence, generative rule handling with case-based AI, dealing with the real world, social intelligence, language acquisition, and mathematics. (The article is an English version of a proceedings article in Japanese for a local workshop on AGI [2022-11-22].)

1. AGI and General Intelligence

General intelligence, which is a part of the term Artificial General Intelligence (AGI), is a psychological term, originally postulated as one or a few general problem-solving factors in the measurement of human intelligence [1]. Factors of intelligence are determined by statistically processing the results of intelligence tests. The CHC model is an attempt to comprehensively enumerate the factors.

While AGI has not been unanimously defined in the community, it is generally considered to be an attempt to provide artifacts with problem-solving abilities that can deal with problems beyond those assumed at the time of design. (AI that solves only the problems assumed at the time of design is called "narrow AI" as opposed to AGI.)

While, as indicated above, human general intelligence and the general intelligence for AGI are different by definition, this article gives examples of what the current AI has not achieved from the standpoint that AGI should achieve "at least" human intelligence or human problem-solving abilities.

2. Fluid Intelligence

Fluid intelligence, posited as one of the intelligence factors, is "the ability to solve novel, abstract problems that do not depend on task-specific knowledge" [2] and is often regarded as a central part of human intelligence. By this definition, fluid intelligence is closely related to the "problem-solving ability to deal with problems beyond design assumptions" required for AGI. (Note: A more general discussion of fluid intelligence as "policy generation" is given in [3] (Chapter 12)).

Fig.1 Raven Progressive Matrix

CC BY-SA 3.0 Life of Riley @Wikimedia

In a matrix reasoning task, subjects are presented with a matrix, where a cell in the last row is blank. Subjects discover a rule from the pattern shown in the other rows and apply the rule to the last row to fill in the blank cell.

Tasks to measure fluid intelligence require the ability to conduct an internal search from one or a small number of examples presented to find a solution while generating hypotheses (cf. my blog article). The Raven Progressive Matrices (RPMs; see Fig.1) are typical intelligence test tasks that measure fluid intelligence. A review article [4] summarizes attempts to solve RPMs using deep learning, and describes the problem of insufficient generalization with deep learning. Humans solve tasks without being given a large number of task examples in advance as they discover the regularities/rules while dealing with the tasks. Thus, to realize fluid intelligence in AGI, it would be important to implement the ability to discover rules (see my previous post).
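The "discover a rule from the complete rows, then apply it to the last row" procedure can be illustrated with a toy numeric stand-in for a matrix task. This is only a sketch under strong assumptions: real RPMs are visual, and the rule vocabulary (`CANDIDATE_RULES`) and function names here are hypothetical.

```python
import operator

# Hypothetical rule vocabulary; real RPM rules concern shapes, counts, etc.
CANDIDATE_RULES = {
    "sum": operator.add,
    "difference": operator.sub,
    "product": operator.mul,
    "maximum": max,
}


def discover_rules(complete_rows):
    """Return names of rules r with r(a, b) == c for every complete row (a, b, c)."""
    return [
        name
        for name, rule in CANDIDATE_RULES.items()
        if all(rule(a, b) == c for a, b, c in complete_rows)
    ]


def fill_blank(complete_rows, last_row):
    """Fill the blank third cell of last_row if exactly one rule survives."""
    surviving = discover_rules(complete_rows)
    if len(surviving) != 1:
        return None  # ambiguous, or no hypothesis fits
    a, b = last_row
    return CANDIDATE_RULES[surviving[0]](a, b)


rows = [(2, 3, 5), (4, 1, 5)]  # two complete rows
answer = fill_blank(rows, (6, 2))  # last row with a blank third cell
print(discover_rules(rows), answer)
```

Two rows suffice here because the hypothesis space is tiny and fixed in advance; with a single row such as (2, 2, 4), both "sum" and "product" survive and the sketch declines to answer. The hard part of fluid intelligence, generating the candidate rules themselves, is not addressed by this toy.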

3. A Theoretical Problem: Generative Rules and Case-Based AI

Current mainstream machine learning-based AI is basically case-based, which tries to solve problems with a large number of examples. Case-based AI cannot, in principle, solve problems that do not exist in the examples or in their generalization. Meanwhile, human languages use generative rules, which can generate an infinite number of patterns from a finite set of rules and vocabulary. A finite set of cases cannot, in principle, cover the infinity to be generated by rules. Besides natural languages, computer languages, logic, and mathematics are examples of systems based on generative rules.
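As a small illustration of how a finite rule set outgrows any finite case list, the sketch below enumerates the sentences of a toy recursive grammar up to a given derivation depth. The grammar and the `expand` helper are hypothetical and chosen only for brevity.

```python
import itertools

# Hypothetical toy grammar; NP and VP refer to each other, so it is recursive.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],
    "VP": [["V"], ["V", "NP"]],
    "N": [["cat"], ["dog"]],
    "V": [["sees"], ["sleeps"]],
}


def expand(symbol, depth):
    """Return all strings derivable from symbol within `depth` nested rule uses."""
    if symbol not in GRAMMAR:
        return {symbol}  # terminal word
    if depth == 0:
        return set()  # depth budget exhausted
    results = set()
    for production in GRAMMAR[symbol]:
        parts = [expand(s, depth - 1) for s in production]
        for combo in itertools.product(*parts):
            results.add(" ".join(combo))
    return results


# Number of distinct sentences per depth bound.
counts = {d: len(expand("S", d)) for d in range(3, 7)}
print(counts)
```

Five rules and four words yield a sentence count that keeps growing with the depth bound, so any fixed list of memorized sentences is eventually exceeded.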

The inability of case-based AI to cover generative rule-based phenomena does not mean that AI in general cannot handle them; "good old" symbolic AI often handled generative rules. Given the success of case-based AI, it will be important to incorporate generative rule handling into case-based AI.

Notes: For a discussion rather favoring the symbolic approach to case-based AI, see [5]. cf. related conference: The Challenge of Compositionality for AI and a recent talk.
For a successful example of combining deep learning and symbolic search, see MuZero [6].

4. Dealing with the Real World

Intelligent robots that work in the real world like humans are not yet available. For example, we are yet to have the robot proposed by Wozniak, which can make coffee in a kitchen it enters for the first time. While the current mainstream ML-based AI is case-based as pointed out above, it lacks enough experience (cases) in the real world. While data for learning is often collected from the Internet, data from the interaction of agents with the real (3D) world, or of "lived experience," is scarce. Note that research on real-world interactions of artificial agents has been conducted in the field of cognitive (developmental) robotics [7] [8].

5. Social Intelligence

Humans begin to infer the intentions of others as infants [9] and often acquire a "theory of mind" before reaching school age. Such intelligence has not been realized in AI. Because society is also part of the real world, lived experience is required to learn social intelligence. While data for social intelligence can be collected in cognitive developmental robotics and cognitive psychology, human social intelligence may require genetically wired mechanisms (or prior knowledge), which are studied in broader cognitive science such as neuroscience.

6. Language Acquisition

Linguistic competence is the ability to appropriately handle the phonological, morphological, syntactic, semantic, and pragmatic aspects of a language. As grammar is a set of generative rules, its appropriate handling requires the ability to handle generative rules (see above) [10]. Case-based AI can handle "meaning" hidden in the distribution of words and associations between words and images appearing in data sources (corpus). Since the meaning of a complex linguistic expression such as a sentence is synthesized from the meanings of its components by generative rules, the ability to handle generative rules is also necessary to handle compositional semantics. Meanwhile, "lived experience" (see above) is required to handle semantics grounded in real-world experience (cf. The symbol grounding problem has been partially solved [11]). Pragmatic competence is social intelligence acquired through the practice of linguistic exchange (language games) with others; so, again, lived experience is necessary. Linguistic competence requires the ability to handle generative rules and the lived experience of language practice, neither of which has yet been fully integrated into current AI.

Human language acquisition begins in infancy. Infants are assumed to have an innate ability to handle generative rules in addition to statistical learning. Infants are also able to infer the intention of their caregivers to understand the relationship between words and their referents (see social intelligence above). Given these facts, AI's acquisition of linguistic abilities would profit from research on human language acquisition.

7. Mathematics

According to mathematical logic, mathematics can be viewed as a system of "generative rules" (see above). In fact, case-based AI cannot even handle addition [12][13]. On the other hand, the part of mathematics formulated in first-order predicate logic can be handled by "good old" symbolic AI (e.g., quantifier elimination solvers).

If AI is to imitate human mathematical abilities, cognitive scientific research on human mathematical abilities (to handle numbers and quantity) would be necessary (cf. this is an area J. Piaget, et al. pioneered).

8. Summary

This article discussed the unrealized functions of current AI compared to human intelligence. Specifically, case-based AI cannot handle generative rules, so it can handle neither the syntax and compositional semantics of language nor mathematics. It was also pointed out that current AI suffers from a paucity of lived experience.

As classical symbolic AI handled generative rules, it is important to make case-based AI handle generative rules (philosophically, it is a synthesis of empiricism and rationalism).

It was suggested that cognitive robotics research will be important to address the issue of lived experience for AI.

Finally, it is noted that the insights of cognitive science in general will be important for AGI research in terms of learning from human intelligence.

References

[1] Spearman, C.: General Intelligence, Objectively Determined and Measured, The American Journal of Psychology, Vol.15, No.2, pp.201–292. doi:10.2307/1412107 (1904).

[2] Kievit, R.A., et al.: A watershed model of individual differences in fluid intelligence, Neuropsychologia, Vol.91, pp.186–198 (2016) doi:10.1016/j.neuropsychologia.2016.08.008

[3] Hernández-Orallo, J.: The Measure of All Minds: Evaluating Natural and Artificial Intelligence, The Cambridge University Press (2017)

[4] Małkiński, M., et al.: Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven’s Progressive Matrices, arXiv, doi:10.48550/arXiv.2201.12382 (2022)

[5] Marcus, G.: The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence, arXiv, doi:10.48550/arXiv.2002.06177 (2020)

[6] Schrittwieser, J. et al.: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, arXiv, doi:10.48550/arXiv.1911.08265 (2020)

[7] Pfeifer, R., Bongard, J.: How the Body Shapes the Way We Think: A New View of Intelligence, MIT Press (2006)

[8] Cangelosi, A., et al.: Developmental Robotics: From Babies to Robots, MIT Press (2015)

[9] Gergely, G., Bekkering, H. & Király, I.: Rational imitation in preverbal infants. Nature 415, 755 (2002). doi: 10.1038/415755a

[10] Delétang, G. et al.: Neural Networks and the Chomsky Hierarchy, arXiv, doi:10.48550/arXiv.2207.02098 (2022)

[11] Steels, L.: The symbol grounding problem has been solved, so what’s next?, in Symbols and Embodiment: Debates on meaning and cognition, doi:10.1093/acprof:oso/9780199217274.003.0012 (2008)

[12] Brown, T., et al.: Language Models are Few-Shot Learners, ArXiv, doi: 10.48550/arXiv.2005.14165 (2020)

[13] Fujisawa, I., et al.: Logical Tasks for Measuring Extrapolation and Rule Comprehension, ArXiv, doi: 10.48550/arXiv.2211.07727 (2022)

Monday, November 21, 2022

A Model of Fluid Intelligence based on Examining Experienced Sequences

[Japanese version]

Abstract

This article proposes a model of rule/policy discovery based on examining experienced sequences. Fluid intelligence, as measured by intelligence tests, can be viewed as the ability to discover policies for solving problems from one or a small number of examples. Agents with fluid intelligence examine a small number of experienced time series to discover common rules. If the sequence is not present, memory recall (replay) would be used. The proposed model "goes over" experienced sequences and extracts elements such as attributes, relationships among the input elements, and agent actions, to generate hypothetical policies.

1. Introduction

AGI as engineering is an attempt to give artifacts a general problem-solving capability beyond design. General intelligence was originally postulated as a factor of one or a few general problem-solving capabilities in the measurement of human intelligence [1]. Fluid intelligence was postulated as one of the factors that make up general intelligence. While the original definition of fluid intelligence [2] was "the ability to recognize relationships," the definitions in the community vary. Kievit et al. summarize fluid intelligence as "the ability to solve novel, abstract problems that do not depend on task-specific knowledge" [3]. More generally, Hernández-Orallo [4](Chapter 12) addresses fluid intelligence as a policy-generating capability. In an intelligence test, the subject is required to find a policy for solving the problem from one or a few examples. This requires the ability to conduct an internal search, generate multiple hypotheses, and find a solution, and it would be the central ability of fluid intelligence. In the following, fluid intelligence is regarded as the ability to discover policies for problem solving from one or a few examples. Note that while there are attempts to solve fluid intelligence tasks such as Raven’s Progressive Matrices (see Appendix) with deep learning methods [5], if they have learned with ample task data similar to the task to be tested, they are using crystallized intelligence rather than fluid intelligence to solve it.

In intelligence test-like tasks (see "Appendix: Assessment Tasks"), abstraction is necessary, for the same situation is not normally repeated. The abstract elements of the solution include the attributes of the input elements, the relationships between the attributes, and the actions of the agent. Policy discovery, including abstraction, is a process of induction. While machine learning is also inductive, a difference lies in the number of samples. Fluid intelligence in intelligence testing requires finding common structures from a small number of samples. This ability is useful in devising solutions to problems encountered in a variety of situations. In the following, a model of rule (policy) discovery based on examining experienced time series is proposed.

2. Model of the Discovery Process

Discovery of rules (policies) from experienced series is done by "going over" the series:

If the entire problem is not presented to the agent at once, a replay is performed; otherwise, the agent goes over the presented scene.
Elements (attributes of input elements, relations among the attributes, and actions of agents) are extracted from the success series to form a hypothetical policy series.
Various heuristics can be used to determine which elements are prioritized in the hypothetical policy. (e.g., Emphasis is placed on elements in close temporal and spatial proximity and the relationships associated with them.)
Elements in the failed series are discounted.
The hypothetical policy is verified with one or more series.
Hypothetical policies that fail verification are stored as rejected policies so that they will not be tested/used again.

3. Required Mechanisms

Mechanism to go over spatial scenes by gaze (eye) movement for problems presented visually
Mechanism to go over a temporal sequence – replay mechanism which recalls memorized sequences for policy generation and validation
Mechanism to generate policy elements; e.g., attribute extraction (abstraction) and discovery of relationships between elements
Mechanism to give preference: preferences are useful for the search process to select policy elements.
Mechanism to create a hypothetical policy series by adopting policy elements
Mechanism to store hypothetical policies
Mechanism to determine whether a hypothetical policy can be applied to a spatial scene or temporal (replayed) series
Mechanism not to use rejected series
Working memory – required for various tasks
Mechanism to control the process as a whole

4. Policy Generation, Verification, and Application

Based on the required mechanisms, the process of policy generation, verification, and application can be summarized as follows:

Policy generation

Go over the successful series and create a series that reproduces the input elements.
Attention is given preferentially to a specific attribute or relation in the sequence.
Generate a hypothetical policy series from the sequence of attributes or relations extracted with the given attention.

Policy Verification

A series may be made by trial runs or by memory recall (replay).
Recall the hypothetical policy from the (trial or recalled) series, and try to apply it (see below).
If a (trial or recalled) success series matches the policy, retain it for further validation with other success series.
If a (trial or recalled) failure series matches the policy, reject the policy.

Applying policy to a series

Apply the policy to the sequence, starting with the first element in the sequence and checking for a match to each recalled policy sequence element in turn.
If the application of a policy element fails, then the policy fails.
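The generation, verification, and application steps can be sketched symbolically as follows. This is a minimal illustration under assumptions that are not part of the model: a series is a list of steps, each step is a dict of fully symbolized attributes, "attention" is a subset of attribute names, and a policy is kept only if it matches every success series and no failure series. All function and attribute names are hypothetical.

```python
from itertools import combinations


def generate_policy(success_series, attended):
    """Policy generation: keep only the attended attributes of each step."""
    return [{k: step[k] for k in attended} for step in success_series]


def applies(policy, series):
    """Apply the policy element by element; fail on the first mismatch."""
    if len(policy) != len(series):
        return False
    return all(
        all(step.get(k) == v for k, v in element.items())
        for element, step in zip(policy, series)
    )


def verify(policy, successes, failures):
    """Assumed reading: keep a policy matching every success and no failure."""
    if any(applies(policy, s) for s in failures):
        return False
    return all(applies(policy, s) for s in successes)


def discover(successes, failures, attributes):
    """Try attention sets from small to large; store rejected policies."""
    rejected = []
    for size in range(1, len(attributes) + 1):
        for attended in combinations(attributes, size):
            policy = generate_policy(successes[0], attended)
            if policy in rejected:
                continue  # rejected policies are not tested again
            if verify(policy, successes, failures):
                return policy, rejected
            rejected.append(policy)
    return None, rejected


def step(shape, color, action):
    return {"shape": shape, "color": color, "action": action}


successes = [
    [step("circle", "red", "press"), step("square", "red", "wait")],
    [step("circle", "blue", "press"), step("square", "green", "wait")],
]
failures = [
    [step("circle", "red", "wait"), step("square", "red", "wait")],
    [step("square", "red", "press"), step("circle", "red", "wait")],
]
policy, rejected = discover(successes, failures, ("shape", "color", "action"))
print(policy, len(rejected))
```

In this toy run, policies attending to a single attribute all match some failure series and are rejected; the first surviving policy attends to shape and action together ("press on the circle, then wait on the square"). Trying small attention sets first is one possible preference heuristic, not one prescribed by the model.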

5. Implementation with (Artificial) Neural Networks

If the elements (attributes and relations) are entirely symbolized and provided, the mechanism above could be implemented by a symbolic (GOFAI) algorithm. If the elements are not clearly defined, it would be difficult to create a symbolic algorithm, and implementation would require fuzzy pattern matching and learning functions as found in (artificial) neural networks. Note that problems must be solved without having been exposed to similar tasks even when learning is required. In the following, hints for implementation with (artificial) neural networks are presented in line with the Required Mechanisms.

Mechanism to go over spatial scenes
The mechanism of saccade control by the brain can be imitated.
Mechanism to go over temporal sequences
Experienced sequences are recalled from other sequences and used for policy generation and validation. Since no generalization is needed in the memory of series, a simple non-learning storage device could be used.
Since replay is believed to occur in the hippocampus in the brain, the hippocampal mechanism can be imitated. Meanwhile, as the phonological loop (working memory for speech) [6] is assumed to be located in the cortex, extra-hippocampal cortical circuits may also have replay-like functions.
Mechanism to generate policy elements

Attribute extraction (Abstraction)
It is known that abstraction occurs in artificial neural networks through learning.
Discovery of relations between elements
Relations (e.g., co-occurrence of attributes) among elements can be extracted with artificial neural networks. In order for a neural network to recognize transformation (such as rotation) of a figure, it must have learned the transformation.
When policy elements are created during replay, it would be better to have a mechanism to control the timing of replay to create a time margin for processing. [Note]

Mechanism to give preferences
Preferences such as for spatial proximity can be incorporated into the structure of a neural network.
Mechanism to create a hypothetical policy series by adopting policy elements/Mechanism to store hypothetical policies

Policy elements are recalled and adopted by attention. A certain mechanism (e.g., winner-takes-all) would be needed to select an element for attention.
The series formed in the system can be stored in a mechanism similar to replay.
Policy elements are pairs of attributes or relations to be selected and attention.
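A winner-takes-all selection combined with a spatial-proximity preference could look like the following sketch. It is a non-neural stand-in for illustration only; the scoring formula, the candidate positions, and the function names are assumptions rather than part of the model.

```python
import math


def proximity_preference(candidates, focus):
    """Hypothetical preference: elements nearer to the focus score higher."""
    return {
        name: 1.0 / (1.0 + math.dist(position, focus))
        for name, position in candidates.items()
    }


def winner_takes_all(saliences):
    """Select the single element with the highest salience for attention."""
    return max(saliences, key=saliences.get)


# Candidate policy elements with assumed positions in the visual scene.
candidates = {"color": (0.0, 1.0), "shape": (3.0, 4.0), "size": (5.0, 5.0)}
saliences = proximity_preference(candidates, focus=(0.0, 0.0))
winner = winner_takes_all(saliences)
print(winner)
```

In a neural implementation, the `max` step would typically be replaced by mutual inhibition among competing units, and the preference would be built into the connectivity rather than computed explicitly.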

Mechanism to determine whether a hypothetical policy can be applied to a spatial scene or replayed temporal series / Mechanism not to use rejected series

Matching a hypothetical policy with memorized series could be implemented with the pattern matching function of a neural network.
Policies that match the failed series are classified as rejected, and will not be used.

Working memory – networks such as a bi-stable network could be used.
Mechanism to control the process as a whole
The process is repeated until a policy consistent with all the presented series is generated.

Note: Policy elements become the object of attention (i.e., made aware) when they are added to the policy. In this sense, policy generation involves System 2 in dual-process theory [7], which also makes policy verbalization possible. However, other processes are not necessarily brought to attention.

6. Conclusion

This article has only suggested a model. Future work would include its psychological validation and/or software implementation. A literature survey on brain regions and functions corresponding to the model will be necessary to support it from the neuroscientific viewpoint. Since policies discovered by the model include the actions (operations) of the agent, the mechanism is to discover at least one class of algorithms. By examining how general the class of algorithms it discovers is, it will be possible to evaluate it as a model of general intelligence.

References

[1] Spearman, C.: General Intelligence, Objectively Determined and Measured, The American Journal of Psychology, Vol.15, No.2, pp.201–292. doi:10.2307/1412107 (1904).

[2] Cattell, R.B.: The measurement of adult intelligence, Psychol. Bull., Vol.40, pp.153–193. doi:10.1037/h0059973 (1943)

[3] Kievit, R.A., et al.: A watershed model of individual differences in fluid intelligence, Neuropsychologia, Vol.91, pp.186–198 (2016) doi:10.1016/j.neuropsychologia.2016.08.008

[4] Hernández-Orallo, J.: The Measure of All Minds: Evaluating Natural and Artificial Intelligence, The Cambridge University Press (2017)

[5] Małkiński, M., et al.: Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven’s Progressive Matrices, arXiv, doi:10.48550/arXiv.2201.12382 (2022)

[6] Baddeley, A.D., Hitch, G.J.: Working Memory, In G.A. Bower (Ed.), Recent advances in learning and motivation (Vol. 8, pp. 47-90), New York: Academic Press (1974)

[7] Kahneman, D.: A perspective on judgement and choice, American Psychologist, Vol.58, No.9, pp.697–720. doi:10.1037/0003-066x.58.9.697 (2003)

[8] Joyner, A., et al.: Using Human Computation to Acquire Novel Methods for Addressing Visual Analogy Problems on Intelligence Tests, ICCC (2015) [PDF]

[9] Carpenter, P.A., Just, M.A., Shell, P.: What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test, Psychological Review, 97(3) doi:10.1037/0033-295X.97.3.404 (1990)