Monday, November 18, 2013

Learning attitude control

In the previous post, a supposedly swimming robot was set up in the Sigverse simulator, where I wrote:
The motion is out of control now.
So the current issue is attitude control perhaps by reinforcement learning.
Experimenting with attitude-control learning took a while.  Though I got ideas from actor-critic learning, I used simpler algorithms, as the task is simple. Here's the idea (& result).
  • Body rotation & the stabilization goal
    The body to be stabilized rotates, and the goal of stabilization is to align its bottom center with the vertical direction.  The body's action is an acceleration in either rotational direction.
  • States
    The learner learns policies mapping from states to actions.
    The states are represented with an analog vector whose dimensions correspond to rotation angles of the body.  The angle of the body is mapped to the vector by Gaussian filters, one per dimension (each centered on the angle allocated to that dimension).  For example, with 36 dimensions, each dimension is allocated to a 10-degree notch.  The Gaussian closest to the current rotation angle outputs the largest value.
  • Linear architecture
    The learner learns the mean acceleration for the current angle.  The mean acceleration is modeled as a linear sum of the state vector so that the learner learns the sum weights.
        average(output) = Σ wi * vi
    The actual output (acceleration) is drawn from a Gaussian distribution whose mean is given by the linear architecture and whose variance is also learned.
  • Weights are learned with one of the following algorithms:
    • Average success
      The weight is set to the average output for the state.
    • Estimation difference
      Move the weights toward the actual output in proportion to (actual output - estimated output), where the estimated output is the mean acceleration given by the linear architecture.
  • Reward
    Action success is measured by the following rewards:
    • When the bottom sways away from the vertical position, speed reduction is rewarded.
    • When the bottom is approaching the vertical position, the approach is rewarded.
    Perhaps simpler rewards may work as well.
  • Variance adjusting
    When the action is successful, adjust the variance σ in the following way (moving σ toward the discrepancy between the actual output and the estimate):
    σ = σ + α × reward × (|output - estimate| - σ)
    where α is the learning rate.
  • Result
    Both algorithms worked (the body stabilizes very quickly).  Learning seems to converge faster with average success.  I opt for average success, as it is an accumulative method (the other method keeps no accumulative memory of the past).  The estimation-difference method would also require a mechanism to decay its learning rate to settle the learning, which I didn't implement this time.
  • Current plan
    This trial was implemented not as a Sigverse simulation but as a stand-alone Java program.  A Sigverse implementation, in which rotation will be three-dimensional, is due.
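The scheme above can be sketched in a few lines (a Python re-sketch for illustration; the original trial was a stand-alone Java program, and the constants N_DIMS, SIGMA_STATE and the learning rate alpha are assumptions of mine; only the estimation-difference update is shown):

```python
import math
import random

N_DIMS = 36          # one state dimension per 10-degree notch
SIGMA_STATE = 10.0   # width (deg) of each state Gaussian (assumed)

def encode_state(angle_deg):
    """Map a rotation angle to an analog state vector via Gaussian filters,
    one per dimension; the filter nearest the angle outputs the largest value."""
    v = []
    for i in range(N_DIMS):
        center = i * (360.0 / N_DIMS)
        # circular distance between the angle and this dimension's center
        d = min(abs(angle_deg - center), 360.0 - abs(angle_deg - center))
        v.append(math.exp(-d * d / (2.0 * SIGMA_STATE ** 2)))
    return v

def mean_acceleration(w, v):
    """Linear architecture: average(output) = sum_i w_i * v_i."""
    return sum(wi * vi for wi, vi in zip(w, v))

def act(w, v, sigma):
    """Actual output: a Gaussian sample around the learned mean."""
    return random.gauss(mean_acceleration(w, v), sigma)

def update_weights(w, v, output, alpha):
    """Estimation difference: move weights toward the actual output
    in proportion to (actual output - estimated output)."""
    err = output - mean_acceleration(w, v)
    for i in range(len(w)):
        w[i] += alpha * err * v[i]

def update_variance(sigma, reward, output, estimate, alpha):
    """Variance adjustment: sigma + alpha * reward * (|output - estimate| - sigma)."""
    return sigma + alpha * reward * (abs(output - estimate) - sigma)
```

A control loop would encode the current angle, sample an acceleration, apply it, compute the reward from the resulting change in angle and speed, and then call the two update functions.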

Friday, October 18, 2013

Phase I Progress Memo 1

In my previous post, I delineated my research plan (as of 2013-09).
In the overall plan for the language acquisition experiment, Phase I was described as:

Phase I: Robot & World Setting

  • Basic Ideas
    • Locomotion: Rover [with wheels] or Swimmer [like fish]
    • Vision: chromatic, binocular & without saccade
    • No manipulator
      Still the robot can interact with objects by physical contact.
      The robot may have a 'mouth-like' manipulator afterwards.
    • The robot acts autonomously/spontaneously. Spontaneity may be motivated by the motion itself, attitude (posture) control, learning new motions, learning new objects, etc.  Learning by spontaneous acts can be said to be learning by 'playing' or curiosity.
    • Designing and implementing drives related to learning will not be done in Phase I.
  • Simulator design & implementation
    Spontaneous actions should be designed elaborately.

Below is the current status of progress for Phase I (as of 2013-10):
  • Swimmer (Fish) has been chosen for the robot setting.  The robot will have vernier-thruster-like kinematic controllers (2nd photo) instead of the tilt rotors in the original sketch (for the sake of …).

  • SigVerse has been chosen for the robot simulator, which can obtain visual data from the robot.
  • Currently it is not possible to give the robot a fish shape in SigVerse, so it has the shape of a humanoid head flying around (see the photo ↓).
    Kinda spooky but so be it for the moment.
  • The motion is out of control now.
    So the current issue is attitude control perhaps by reinforcement learning.
  • Spontaneous motion may be implemented after having made its attitude stable.

Monday, September 9, 2013

My New Research Plan (2013-09)

I changed the plan.
There are two major changes.

  • I would use simulator software rather than a real robot.
  • I would focus on language acquisition rather than using conventional NLP.


Research on language acquisition (symbol grounding) with a robot simulator.

Language Acquisition

Research will follow the human (infant) language acquisition process.  More concretely, the simulated robot shall learn labels for the properties, movements and relations of objects, based on the cognition of bundles of coherent and continuous properties (Spelke's objects). It shall utter the learned labels. Moreover, it shall create adequate internal semantic representations and relate them to corresponding syntactic structures.
(See below for more details.)

Core cognitive architecture

The core cognitive architecture shall have the following functions.
Upon designing and implementing the cognitive architecture, generic mechanisms shall be (re)used whenever possible.
  • Time-series pattern learner/recognizer
    having motor commands and sensory input as their input
  • Attention and situation assessment
    on what will be learned and which action will be taken.
    It is part of the executive control function.
  • Cognitive model based on association
    to memorize and recollect (temporal) generative patterns as associative sequences.
    Linguistic competence will be realized with this model.
    It contains a backtracking mechanism based on the attention and situation assessment function mentioned above.
  • Episodic memory
    Patterns (the representation of situations -- combinations of abstract patterns created by (non-supervised) learning) positively assessed by the attention and situation assessment will be memorized.
 cf. A Figure of Human Cognitive Function


System configuration

  • Robot simulator
    Sigverse, etc.
  • Visual data processing
    OpenCV, etc.
  • Speech recognition/synthesis: option
    (Research can be carried out on a text basis.)
  • Learning module
    SOINN, k-means, SOM, SVM, DeSTIN, HMM, etc.
    (To be used as plug-ins depending on the purpose)

Tentative Research Steps

Phase I: Robot & World Setting

  • Basic Ideas
    • Locomotion: Rover [with wheels] or Swimmer [like fish]
    • Vision: chromatic, binocular & without saccade
    • No manipulator
      Still the robot can interact with objects by physical contact.
      The robot may have a 'mouth-like' manipulator afterwards.
    • The robot acts autonomously/spontaneously. Spontaneity may be motivated by the motion itself, attitude (posture) control, learning new motions, learning new objects, etc.  Learning by spontaneous acts can be said to be learning by 'playing' or curiosity.
    • Designing and implementing drives related to learning will not be done in Phase I.
  • Simulator design & implementation
    Spontaneous actions should be designed elaborately.
an example fish robot

Phase II: Recognizing Spelke's Objects

  • Basic Ideas
    • Spelke's Object: coherent, solid & inert bundle of features of a certain dimension that continues over time.
      Features: colors, shapes (jagginess), texture, visual depth, etc.
    • While recognition of Spelke's objects may be preprogrammed, recognized objects become objects of categorization by means of non-supervised learning.  In this process, hierarchical (deep) learning would be done from the categorization of primitive features to the re-categorization of categorized patterns.
    • Object recognition will be carried out within spontaneous actions of the robot.
    • The robot shall gather information preferentially on 'novel' objects (curiosity-driven behavior) ('novelty' to be defined).
  • Determining the recognition method & implementation
  • Recognition experiments
  • Determining (non-supervised) learning methods  & implementation
  • Experiments on object categorization
  • Designing novelty-driven recognition & behavior & implementation

Phase III: Labeling

  • Basic Ideas
    • The robot shall relate specific types of Spelke's objects it looks at with linguistic labels.
    • Labels may be nominals representing shapes and adjectives representing features such as colors.
      Types of objects may be learned in a supervised manner with labels or have been categorized by non-supervised learning.
    • The robot shall utter labels on recognizing types after learning association between the labels and types.
    • The robot shall recognize/utter the syntactic pattern 'adjective + noun'.
  • Determining the recognition method & implementation
  • Designing and implementing mechanism for handling syntactic structure.
  • Labeling experiment

Phase IV: Relation Learning

  • Basic Ideas
    • The robot shall learn labels for
      • object locomotion such as moving up/down, right/left and closer/away
      • orientational relations between objects such as above/below, right/left and hither/thither
    • Objects should get the robot's attention either by force (programming) or by a certain preprogrammed mechanism of attention (such as attention to moving objects).
  • Designing & implementing labeling mechanism
  • Experiments

Phase V: Linguistic Interaction

  • Basic Ideas
    • The robot shall answer questions using labels learned in Phase III & Phase IV.
    • The robot shall respond to requests on its behavior.
    • The robot shall utter clarification questions.
  • Designing & implementing mechanism for linguistic interaction
  • Experiments

Phase VI: Episodic memory

  • Basic Ideas
    • Episodes (situations) to be memorized are the appearance of objects and changes in relations among them.
    • The memory of novel objects and situations is prioritized.
  • Designing & implementing episodic memory and attentional mechanism
  • Designing & implementing episodic recollection & linguistic report.
  • Experiments

Phase VII: More complicated syntax

  • Basic Idea
    The robot shall understand/utter linguistic expressions having nested phrase structure.
  • Designing & implementing nested phrase structure.
  • Experiments

Sunday, August 11, 2013

My New Research Plan (2013-08)


To create a rover that explores the environment, learns from it, and communicates with human beings in a human language.
The cognitive model of the rover shall be based on association. The real purpose of the research is to verify the feasibility of cognitive models based on association.

The cognitive model based on association

In the model, generative patterns are represented as series of association.
Patterns can be visual, auditory, tactile, and linguistic.

Recognition of situations

The pattern representing a situation could be obtained by integrating pattern recognition results from various modalities within a time series.
An associative series of the situation is a "semantic network", which gives the semantics for the linguistic function.


The learning mechanism will be neural networks (in a broad sense).
The learning includes the recognition of exterior things, the episodic memory and the categorization of situations.  Categorization may be done by supervised and/or unsupervised  learning.
The rover will learn from its voluntary actions.


The interface between syntax (parsing/generation) and semantics (association of situations) shall be learned (acquired), by means of association-based cognitive models.
While morpheme dictionaries may be given initially, vocabulary acquisition will be an issue in the future.
The reason for the linguistic interface is that language will be the key to realizing human-level intelligence; besides, it is an effective means of communicating with human beings.
The rover will have basic language features such as the following.

  • Description of the scene (verbalization of situation recognition)
  • Response to (human) questions
  • Response to (human) instructions


Human (animal) infants tend to have the innate ability to recognize other individuals and communicate with them. The rover will be given certain recognition abilities such as face recognition, motion capture, speech recognition and gaze recognition.

Core cognitive architecture

  • Time-series pattern recognizers having motor commands and sensory input as their input
  • Attention and situation assessment
  • Episodic memory (← pattern recognizers, situation assessment & attention)
  • Backtracking (parsing and planning with certain evaluation functions)

 cf. A Figure of Human Cognitive Function

System configuration


  • Locomotive function
    Roomba / Kobuki, etc.
  • Visual function
    Early stage: Kinect
    Later adding saccade
  • Acceleration sensor
    (works also as collision detector)
  • Audio function
  • On board PC (notebook)
  • Wireless connection (WiFi / BlueTooth) for monitoring


  • OS
    ROS, etc.
  • Visual data processing
    Peripheral vision, central visual field, optical flow, depth recognition
    Object detection, tracking, motion capture, facial recognition, gaze recognition
    Kinect software, OpenCV, etc.
  • Speech recognition
    HARK, etc.
  • Speech synthesis
    Vocaloid, etc.
  • Learning module
    SOINN, k-means, SOM, SVM, DeSTIN, HMM, etc.
    ※To be used as plug-ins depending on the purpose.

Tentative Research Steps

Phase I: Kinect + OpenCV + HARK + Vocaloid (preliminary exercises)

  • Checking for visual functions (facial recognition, motion capture)
  • Checking for speech recognition function
  • Checking for speech synthesis function
  • Implementing a conventional linguistic (speech) interface
  • Experimenting on visual experience reports with a conventional linguistic interface

Phase II: Pattern Recognition

  • Selection and implementation of time-series pattern recognizers
  • Visual pattern recognition experiment
  • Experiments on pattern recognition reports with a conventional linguistic interface

Phase III: Episodic memory

  • Defining the situations to be remembered
  • Implementing episodic memory and attentional mechanism
  • Experimenting on episode reports with a conventional linguistic interface

Phase IV: Eye movement

  • Kinect may be put on a movable (controllable) stage (w. an acceleration sensor)
  • Human tracking
  • Behavior control (extending a conventional language generation mechanism) 
  • Gaze induction by instruction with a conventional linguistic interface
  • Q & A with a conventional linguistic interface

Phase V: Roaming (Roomba / Kobuki)

  • Coupling vision and roaming (reflexive) 
  • Defining the relation between attention and roaming ("curiosity")
  • 3D object learning/recognition via roaming
  • Instruction of motion with a conventional linguistic interface

Phase VI: Design and Implementation of a non-conventional (associative) linguistic information processing

Saturday, July 27, 2013

Information Binding with Dynamic Associative Representations

Now my paper for the FormalMagic workshop (at the AGI-13 conference) is available online.

Thursday, May 23, 2013

Language Generation (Human Cognitive Functions)

This is the last posting of explaining the figure on human cognitive functions. Today's topic is Language Generation.

Language Generation is the process in which a network of association (or a semantic network) is mapped to syntactic structure (phrase structure).  The choice of a part of the network to be uttered leads to the generation of a syntactic structure and a linear phonetic representation.  The generated phonetic representation (image) may be 'parsed back' into a semantic/pragmatic representation so that its effects can be evaluated.  The process is thought to be a special case of planning, though the syntactic structure would be processed automatically in a specialized module, as in the case of parsing.

Wednesday, May 22, 2013

Language Understanding (Human Cognitive Functions)

This is the 7th posting of explaining the figure on human cognitive functions. Today's topic is Parsing (Language Understanding).

As parsing seems to be done in the neo-cortex in human language processing and it apparently requires temporal pattern learning, it is thought to be the function of THC.  On the other hand, parsing is automatically done in a specialized module (see The Modularity of Mind by J. Fodor).   The output of parsing yields the representation of situations (if language understanding is successful), which is thought to be a network of association (or a semantic network).
Part of the representation of situations would be learned through interaction with the exterior world by means of perception and motion, and its relation with linguistic expressions would be learned in the process of language acquisition.  We acquire more abstract concepts without direct correspondence to objects in the exterior world by expanding the basic semantics that is innate or acquired through early interaction with the world.

Tuesday, May 21, 2013

Planning (Human Cognitive Functions)

This is the 6th posting of explaining the figure on human cognitive functions. Today's topic is Planning.

Planning, which comes after Imagining Situations, is thought to be the imagining process that constructs representations of action sequences evaluated (against past (learned) experience) as good enough for a given goal representation. If an action sequence is not evaluated as preferable, another sequence would be imagined by some backtracking mechanism.
Planning is in most cases done consciously and is thought to involve Working Memory (or the executive function).  While WM is omitted from the figure, any cognitive function would involve it if the process is carried out reflectively or consciously.

Monday, May 20, 2013

Imagining Situations (Human Cognitive Functions)

This is the 5th posting of explaining the figure on human cognitive functions. Today's topic is Imagination.

The line Imagining Situations coming from Association is in fact the major function of association.  It presupposes Perceiving Situations, discussed earlier, where it was suggested that situations may be represented as sequences of association.  Since THC (the Temporal Hierarchical Categorizer) can associate temporal patterns with other temporal patterns, if sequences of association representing situations are given to THC as input, it will be able to associate the representation of one situation with another.  Here the input is not sensory but the representation of situations, so that THC constitutes a recurrent loop.
Situations to be imagined are not only past situations but also ‘imaginary’ situations created by combining parts of (past) situations.

Sunday, May 19, 2013

Episodic Memory (Human Cognitive Functions)

This is the 4th posting of explaining the figure on human cognitive functions. Today's topic is Episodic Memory.

Episodic Memory comes next to Perceiving Situations, for an episode is an individual situation; thereby the relation between Perceiving Situations and Episodic Memory parallels that between Object Category Recognition and Individual Object Recognition mentioned earlier.  Objects and relations within an episode should be individuals.
While the types of recognition previously discussed can be learned ‘stochastically,’ episodic memory requires one-shot memorizing.  In the brain, situations to be memorized would be chosen by clues such as novelty so that signals necessary for memorizing them would be provided (perhaps with long-term potentiation in the hippocampus).

Saturday, May 18, 2013

Perceiving Situations (Human Cognitive Functions)

This is the third posting of explaining the figure on human cognitive functions. Today's topic is the perception of situations.

Perceiving Situations is another function of THC (the Temporal Hierarchical Categorizer).  A small animal may perceive, for example, a situation in which it should hide itself upon detecting a moving spot in the sky.   A human being may recall the situation type of Fire upon sensing the smell of burning.   The representation of a Situation normally contains those of objects, their features and relations among them (as in some formal semantic theories).  For example, the representation (frame) of the Eating situation contains the representations of Eater, Food and the (Eating) relation between them.   As the representation of a situation is a combination of its elements, they are normally constructed ad hoc.
While the representation of a situation can be complex, it does not have to be represented at the same time (synchronously) but can be represented dynamically by associating its components one by one.
Dynamic representation can be applied to perception such as complex visual scene perception or multi-modal perceptual integration, where information would be bound by a series of association (see Information Binding, the dotted line from Association in the figure).

Friday, May 17, 2013

Object Recognition (Human Cognitive Functions)

This is the second posting of explaining the figure on human cognitive functions. Today's topic is object recognition.

2a Object Category Recognition

The line Object Category Recognition is rooted in the Temporal Hierarchical Categorizer (THC) in the figure.  This is a function of THC to recognize exterior objects.  It is pattern recognition, but its realization would not be easy.  In vision, for instance, an object hardly enters the visual field again under the same conditions as before; it may have various orientations, be under various lighting conditions, and have its image focused on various parts of the retina.  As multi-layered neural networks such as deep learning networks are achieving good results in object category recognition, one might say the brain works in a similar way.  However, artificial learning mechanisms (including deep learning) normally require extensive tuning, and it is not yet clear how the brain is tuned in learning.

2b Individual Object Recognition

Individual Object Recognition comes next to Object Category Recognition.  The function recognizes an individual, e.g., Tom, after getting acquainted with him.  Unlike a category, an individual does not have instances and does not have different values for a feature at the same time (Leibniz's law).  A human being may inherently have the concept of individuals or may learn it by observing physical objects traversing time-space.  In any case, the mechanism of individual object recognition is different from that of category recognition.

2c Belief in External Objects

Belief in External Objects also comes next to Object Category Recognition.  The function recognizes things as being exterior.  Philosophically speaking, nobody can be completely sure that there are things beyond the mental realm.   However, if an agent recognizes (the category of) ‘external’ objects, learns how to interact with them and thereby gets into ‘healthy’ relation with them, then it could be said to believe in the external objects (with the intentional stance).

2d Belief in Other Minds

The line Belief in Other Minds (Theory of Mind) comes next.  This belief is the recognition of other people as having minds similar to the believer's own.  While Theory of Mind (ToM) may be partly learned from interaction with others, considering the fact that not every human being has ToM, some part of it would be genetic (mirror neurons may be part of it).

Thursday, May 16, 2013

A Figure of Human Cognitive Functions

In this and the subsequent postings, I'm going to present a simplified and organized view of major human cognitive functions with an illustration (mind map -- the figure below) to serve AGI designs.  (The branches in the map indicate cognitive functions, and branching indicates their specialization.)

In this page, I'll explain the path from sensory input to motor output via pattern recognizers and association (the blue arrows 1a & 1b).

1a Temporal Hierarchical Categorizers

Temporal Hierarchical Categorizers (THC, hereafter) are pattern recognizers that classify (categorize) sensory input.
  • THC’s Categorizing function is acquired through supervised or unsupervised learning, so that a Categorizer may acquire categories of sensory input by itself.  In recent decades, various (computational) neural networks have been proposed to explain the automatic categorization of input data, suggesting that biological neural networks may have such a function.
  • THC is Hierarchical.  While many of the proposed neural network models were hierarchical, a new kind of hierarchical model called 'Deep Learning' has had successes in categorization tasks in recent years.  The importance of hierarchy in neo-cortical modeling has also been emphasized in the books by J. Hawkins and R. Kurzweil (the term THC, of course, echoes Hawkins's Hierarchical Temporal Memory).  Since pattern recognition in mammalian cerebra is carried out by the hierarchy of neo-cortices, the Categorizer is thought to be hierarchical.
  • THC is Temporal; for animals or robots to interact with the environment, the patterns they have to deal with are time series.  If a pattern recognizer is to be modeled as a neural network (model), a recurrent network would do the job.  The figure's reference to the diencephalon suggests that circuitry involving that part of the brain constitutes recurrent networks.

1b Association

Now let’s look at the Association part of the figure.  One of the most elementary associations would be that from sensory input to motor output, which associates a sensory pattern (or category) with a motor pattern (or category).  Association is not pattern recognition but involves the recollection of patterns within a modality or between modalities, where the patterns are the ones recognized by pattern recognizers.  This means association appropriates the function of pattern recognizers.
The arrow titled ‘Associating Features’ from Association to THC in the figure suggests that there is association of lower feature patterns with higher patterns (e.g., when we imagine a flower, we recall the color and shape).  In real brains, this function would be assumed by massive efferent connections.
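Association as the recollection of one pattern from another can be illustrated with a toy hetero-associative memory (an illustrative sketch only, not the model proposed in these posts; the Hebbian outer-product rule and ±1 pattern coding are assumptions of mine):

```python
import numpy as np

def hebbian_associator(pairs):
    """Build a hetero-associative weight matrix W = sum_k y_k x_k^T,
    so that W @ x_k recalls (an approximation of) y_k."""
    W = np.zeros((len(pairs[0][1]), len(pairs[0][0])))
    for x, y in pairs:
        W += np.outer(y, x)
    return W

def recall(W, x):
    """Recollect the associated pattern; sign() cleans up interference
    between stored pairs (patterns are assumed to be +/-1 vectors)."""
    return np.sign(W @ x)
```

With (near-)orthogonal cue patterns, each cue recalls its stored partner; cross-modal association (e.g., a sensory pattern recalling a motor pattern) has the same shape.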

Friday, March 1, 2013

Linguistic Faculties & Cognitive Architecture

I made a presentation at a Hasegawa Lab. seminar on my tentative research plan.

Here is the link for the material.

Sorry if it's not comprehensible; it's a presentation material...

Monday, January 14, 2013

A survey on parts of speech

Towards the end of last year, I conducted a small experiment surveying the distribution of words and parts of speech with the CHILDES corpus (the Adam set by Roger Brown, 2004).  The purpose of the experiment was to see to what extent the distribution around a word contributes to determining its part of speech.  The occurrence of certain words directly before and after the target word was used as the discriminative feature vector.  First, the vector of the probabilities of those words was collected for each part of speech.  In the discrimination phase, the part of speech whose probability vector was closest (in Euclidean distance) to that of the target word was chosen as the candidate p.o.s.  As a result, when 200 preceding words and 200 following words were used as features, this method yielded 64% accuracy.  The result was not much different when the target words were limited to non-ambiguous ones.  Cosine similarity did not work at all.
Apparently, this method alone does not determine parts of speech.  One can also see here the difficulty of clustering parts of speech by means of word distribution probabilities (and Euclidean distance).  So I gave up continuing experiments along this line.  While massive data and state-of-the-art mathematical modeling methods such as the iHMM or NPYLM may yield promising results, human children may be using different strategies, such as building semantic categories before learning grammatical categories.
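The discrimination scheme described above can be sketched as follows (a toy reconstruction, not the actual experiment code; the three-word corpus and the function names are hypothetical, and only one word of left and right context is used instead of 200 each):

```python
import math
from collections import Counter, defaultdict

def context_features(corpus, vocab):
    """For each POS tag, estimate the probability that each vocabulary word
    occurs directly before or after a token bearing that tag."""
    counts = defaultdict(Counter)   # tag -> context-word counts
    totals = Counter()              # tag -> number of observed contexts
    for sent in corpus:             # sent: list of (word, tag) pairs
        for i, (word, tag) in enumerate(sent):
            for j in (i - 1, i + 1):
                if 0 <= j < len(sent):
                    counts[tag][sent[j][0]] += 1
                    totals[tag] += 1
    return {tag: [counts[tag][w] / totals[tag] for w in vocab]
            for tag in counts}

def classify(target_vec, pos_vecs):
    """Choose the POS whose context-probability vector is closest
    to the target word's vector in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(pos_vecs, key=lambda tag: dist(target_vec, pos_vecs[tag]))
```

The target word's own vector would be built the same way, from the contexts of its occurrences in the corpus.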

Saturday, January 5, 2013

Vivre avec les Robots

Today, I saw the French documentary on robots:
Vivre avec les Robots / Living with robots (2012) (in French from

Short introduction in English @
Production note in French

It presents the current robot technology and discusses its sociological implication.

The following are some of the research programs introduced in the show (the pointers are what I found on the Web):

Robotics at LAAS CNRS:
Especially, the Jido system performing human-robot collaboration:

Emotion detection with Nao

iCub, the (infantile) robot platform

CB2, the baby humanoid robot

Asimo from Honda (human-robot interaction)



Conversational agents with Jean-Claude Heudin

Angie, another Siri competitor