Tuesday, April 21, 2015

Research Plan Updated (2015-04)

I haven't written a post on this blog for a long time.  I have been loosely affiliated with the AI lab at Dwango Co., Ltd., where issues such as brain-mimetic AI are discussed.  Meanwhile, I have changed my research plan on language acquisition so that it focuses more on the acquisition of verbs.


The plan is intended to create a system that performs language acquisition (symbol grounding), particularly the acquisition of verbs.  The reason for the focus on verbs is their pivotal role in sentence formation.  Human infants usually start acquiring verbs (among other words) at around 1.5 years of age.  The system will follow the human (infant) language acquisition process in a simplified simulated environment.  While the representations of words and concrete objects, including agents, are given, the system must form the concepts of motion patterns for itself and map verbs to those motion concepts.

Verbs and Motion

While not all verbs represent motions, the current plan focuses on motion verbs.  For the system to associate motion verbs with an internal representation of motion, it must have that representation beforehand.
(For a comprehensive review of the subject matter, please refer to Action Meets Word -- How Children Learn Verbs, Oxford 2006.)

The Representation of 'Intended' Motion

As there are myriad motion patterns in the environment, infants must attend to certain types of motion in order to recognize them.  One possibility is that, as indicated by Meltzoff (2007), they first learn to recognize their own motion and then generalize it to other agents' motion.  As the internal representation of one's own motion is associated with the representation of 'intention,' infants would come to associate similar motions of other agents with a certain representation of intention, so that those motions are recognized as 'intended motion.'
Meltzoff, A. N. “The ‘like Me’ Framework for Recognizing and Becoming an Intentional Agent.” Acta psychologica 124.1 (2007): 26–43. PMC. Web. (2015).

Associating Motion with a Verb

This may be the most difficult part of the current plan.  As in any acquisition of word-referent relations, the relation between words and referents is many-to-many.  In a given situation, many words may be uttered in a series, while there are many candidate referents.  To determine the referential relation, the learner must do detective work on hypotheses.  If a word (verb) is uttered with a noun whose referent is known, it is likely to be related to something (or some motion) of that referent.
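The hypothesis-narrowing idea above can be illustrated with a small sketch in Python. It is not the planned mechanism (the plan leaves the mapping scheme open); it only assumes that motions arrive as already-categorized labels and that some nouns are already known. The class name `CrossSituationalLearner`, the agent ids, and the motion labels are all hypothetical.

```python
from collections import defaultdict


class CrossSituationalLearner:
    """Accumulates word/motion co-occurrence evidence across situations.

    Illustrative assumption: motions are given as discrete labels and
    known nouns map directly to agent ids.
    """

    def __init__(self, known_nouns):
        # known_nouns: noun -> agent id, assumed to be acquired already.
        self.known_nouns = dict(known_nouns)
        # counts[word][motion] = accumulated evidence for that pairing.
        self.counts = defaultdict(lambda: defaultdict(float))

    def observe(self, words, motions):
        """words: uttered words; motions: list of (agent, motion_label)."""
        referents = {self.known_nouns[w] for w in words if w in self.known_nouns}
        unknown = [w for w in words if w not in self.known_nouns]
        if referents:
            # A known noun was uttered: consider only its referent's motions.
            candidates = [m for agent, m in motions if agent in referents]
        else:
            # No cue: every observed motion remains a candidate.
            candidates = [m for _, m in motions]
        if not candidates:
            return
        for w in unknown:
            for m in candidates:
                # Spread one unit of evidence evenly over the candidates.
                self.counts[w][m] += 1.0 / len(candidates)

    def best_motion(self, word):
        """Return the motion label with the most evidence, or None."""
        hypotheses = self.counts.get(word)
        if not hypotheses:
            return None
        return max(hypotheses, key=hypotheses.get)


learner = CrossSituationalLearner({"mommy": "LU1", "daddy": "LU2"})
learner.observe(["mommy", "jumps"], [("LU1", "jump"), ("LU2", "walk")])
learner.observe(["daddy", "walks"], [("LU2", "walk"), ("LU1", "jump")])
learner.observe(["jumps"], [("LU1", "jump"), ("LU2", "spin")])
print(learner.best_motion("jumps"))
```

In this toy run the known noun in the first utterance restricts the candidates for "jumps" to a single motion, so that situation contributes more evidence than the third one, where two motions remain possible.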


In the simulation world reside the language learner (LL) and agents who use a language (LU-s). LU-s are programmed to talk to LL, as human parents do. Both have their own repertoire of behavior such as moving around. LL is programmed to acquire language use from the history of behavior and utterances of LU and itself.

Exterior objects are given to LL with their abstract properties in this simulation, as visual pattern recognition is out of scope.

Words are also given to LL as segmented strings, as word segmentation from phonetic streams is out of scope.

Social and internal rewards are given to LL; LU-s' behaviors such as 'smiles' and those in accordance with LL's prediction yield rewards inside LL.

LL probabilistically copies (part of) LU-s' utterances and this babbling is reinforced by rewards.
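As a rough illustration of reward-reinforced babbling, the following Python sketch copies words from an LU utterance with a fixed probability and nudges a per-word value toward the reward received. Everything here is an assumption made for illustration: the class name `Babbler`, the parameters `copy_prob` and `learning_rate`, and the treatment of a social reward such as a 'smile' as the scalar 1.0.

```python
import random


class Babbler:
    """Toy model of LL's babbling (hypothetical names and parameters)."""

    def __init__(self, copy_prob=0.5, learning_rate=0.2, seed=0):
        self.copy_prob = copy_prob          # chance of copying each word
        self.learning_rate = learning_rate  # step size for value updates
        self.rng = random.Random(seed)
        self.values = {}                    # word -> estimated reward

    def hear(self, utterance):
        """Copy each word of an LU utterance with probability copy_prob."""
        copied = [w for w in utterance if self.rng.random() < self.copy_prob]
        for w in copied:
            self.values.setdefault(w, 0.0)
        return copied

    def babble(self):
        """Pick a word to babble, favoring words with higher value."""
        if not self.values:
            return None
        words = list(self.values)
        # A small floor keeps unrewarded words in play occasionally.
        weights = [max(self.values[w], 0.0) + 0.1 for w in words]
        return self.rng.choices(words, weights=weights, k=1)[0]

    def reinforce(self, word, reward):
        """Move the word's value toward the received reward."""
        old = self.values.get(word, 0.0)
        self.values[word] = old + self.learning_rate * (reward - old)


ll = Babbler(copy_prob=1.0)
ll.hear(["look", "ball"])
ll.reinforce("ball", 1.0)  # e.g. an LU 'smiles' after LL says "ball"
print(ll.values)
```

The value update is a simple exponential moving average; a fuller model would also need the prediction-based internal reward mentioned above.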

According to Tomasello, language learning requires the recognition of intentions of speakers. Or at least LL must be able to attend to what LU-s refer to with their utterances. In the case of motion, while the agent (LL or LU) is salient enough, it would be difficult to identify the motion segment being referred to.

A means of visualization will be introduced, without which it is difficult to understand what is going on in the simulation...


In the simulation above, LL must learn motion patterns of itself and LU-s, with a time-series learning mechanism such as an RNN.  Learned recurrent patterns may be categorized with categorizers such as k-means or SOINN.  (A combination of a recurrent network and a categorizer may be realized with the DeSTIN framework.)  The scheme for mapping verbs to motion pattern representations is to be determined.  (ref. a potentially related paper: W. Takano and Y. Nakamura