Sunday, August 11, 2013

My New Research Plan (2013-08)

Purpose

To create a rover that explores its environment, learns from it, and communicates with human beings in a human language.
The rover's cognitive model shall be based on association. The real purpose of the research is to verify the feasibility of association-based cognitive models.

The cognitive model based on association

In the model, generative patterns are represented as series of associations.
Patterns can be visual, auditory, tactile, or linguistic.
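
As a toy illustration of what "generative patterns represented as series of associations" could mean, the Python sketch below stores a series as pairwise links and regenerates it from a cue. The class and the pattern labels are hypothetical, not part of the plan itself.

# A toy associative chain: consecutive patterns are linked pairwise, and a
# stored series can be regenerated from its first element by following links.
class AssociativeChain:
    def __init__(self):
        self.links = {}                      # pattern -> pattern that follows it

    def store(self, series):
        """Memorize a series by associating each element with its successor."""
        for cur, nxt in zip(series, series[1:]):
            self.links[cur] = nxt

    def recall(self, cue, max_len=10):
        """Regenerate a series by chaining associations from a cue."""
        out = [cue]
        while out[-1] in self.links and len(out) < max_len:
            out.append(self.links[out[-1]])
        return out

mem = AssociativeChain()
mem.store(["see-door", "approach", "open", "enter"])
print(mem.recall("see-door"))                # ['see-door', 'approach', 'open', 'enter']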

Recognition of situations

The pattern representing a situation could be obtained by integrating pattern recognition results from various modalities within a time series.
The associative series of situations constitutes a "semantic network", which provides the semantics for the linguistic function.
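
A minimal sketch of this integration, under the assumption that each modality emits one label per timestep: the merged labels form a situation tuple, and transitions between successive tuples form a crude "semantic network" (here just a directed graph). Modality names and labels are fabricated.

from collections import defaultdict

def integrate(visual, auditory, tactile):
    """Merge one timestep of per-modality recognition labels into a situation."""
    return (visual, auditory, tactile)

semantic_net = defaultdict(set)              # situation -> situations that follow it

stream = [
    integrate("person", "speech",  "none"),
    integrate("person", "silence", "touch"),
    integrate("door",   "silence", "none"),
]
for cur, nxt in zip(stream, stream[1:]):
    semantic_net[cur].add(nxt)               # associative link between situations

print(dict(semantic_net))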

Learning

The learning mechanism will be neural networks (in a broad sense).
Learning includes the recognition of external objects, episodic memory, and the categorization of situations. Categorization may be done by supervised and/or unsupervised learning.
The rover will also learn from its voluntary actions.
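
For the unsupervised side, a small sketch using k-means (one of the plug-in learners listed under Software below) to categorize situations; the feature vectors are fabricated stand-ins for integrated sensory patterns.

import numpy as np
from sklearn.cluster import KMeans

situations = np.array([
    [0.9, 0.1],                              # e.g. "bright, quiet"
    [0.8, 0.2],
    [0.1, 0.9],                              # e.g. "dark, noisy"
    [0.2, 0.8],
])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(situations)
print(km.labels_)                            # cluster index = an unlabeled category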

Language

The interface between syntax (parsing/generation) and semantics (association of situations) shall be learned (acquired) by means of association-based cognitive models.
While morpheme dictionaries may be given initially, vocabulary acquisition will be an issue for the future.
The reason for the linguistic interface is that language will be key to realizing human-level intelligence; besides, it is an effective means of communicating with human beings.
The rover will have basic language features such as the following (a toy sketch follows the list).

  • Description of the scene (verbalization of situation recognition)
  • Response to (human) questions
  • Response to (human) instructions
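
A toy example of the "conventional linguistic interface" assumed in the early phases below: template-based scene description and question answering over recognition results whose format is invented here for illustration.

def describe(scene):
    """Verbalize situation recognition with fixed templates."""
    return [f"I see a {obj} {where}." for obj, where in scene]

def answer(scene, asked_obj):
    """Respond to a 'where is X' style question."""
    where = dict(scene).get(asked_obj)
    return f"The {asked_obj} is {where}." if where else f"I do not see a {asked_obj}."

scene = [("person", "in front of me"), ("door", "on the left")]
print(describe(scene))
print(answer(scene, "door"))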


Sociality

Human (animal) infants tend to have an innate ability to recognize other individuals and communicate with them. The rover will be given certain recognition abilities, such as face recognition, motion capture, speech recognition, and gaze recognition.

Core cognitive architecture


  • Time-series pattern recognizers taking motor commands and sensory input as their input (a sketch follows this list)
  • Attention and situation assessment
  • Episodic memory (← pattern recognizers, situation assessment & attention)
  • Backtracking (parsing and planning with certain evaluation functions)
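
A minimal sketch of the first item, assuming motor commands and sensory input arrive as vectors per timestep: each frame concatenates the two, and a sliding window over the stream is labeled by its nearest stored prototype. The prototypes and data are fabricated.

import numpy as np

def make_frame(motor_cmd, sensory):
    """One timestep of input: motor command and sensory vector, concatenated."""
    return np.concatenate([motor_cmd, sensory])

def recognize(frames, prototypes, window=3):
    """Label each window of the stream by its nearest prototype."""
    labels = []
    for t in range(len(frames) - window + 1):
        seg = np.concatenate(frames[t:t + window])
        name = min(prototypes, key=lambda k: np.linalg.norm(seg - prototypes[k]))
        labels.append(name)
    return labels

frames = [make_frame(np.array([1.0, 0.0]), np.array([0.2, 0.1])) for _ in range(5)]
prototypes = {"forward-clear": np.tile([1.0, 0.0, 0.2, 0.1], 3),
              "turn-obstacle": np.tile([0.0, 1.0, 0.9, 0.4], 3)}
print(recognize(frames, prototypes))         # all windows match 'forward-clear'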

 cf. A Figure of Human Cognitive Function

System configuration


Hardware


  • Locomotive function
    Roomba / Kobuki, etc.
  • Visual function
    Early stage: Kinect
    Saccades to be added later
  • Acceleration sensor
    (works also as collision detector)
  • Audio function
  • On board PC (notebook)
  • Wireless connection (WiFi / Bluetooth) for monitoring

Software


  • OS
    ROS, etc.
  • Visual data processing
    Peripheral vision, central visual field, optical flow, depth recognition
    Object detection, tracking, motion capture, facial recognition, gaze recognition
    Kinect software, OpenCV, etc.
  • Speech recognition
    HARK, etc.
  • Speech synthesis
    Vocaloid, etc.
  • Learning module
    SOINN, k-means, SOM, SVM, DeSTIN, HMM, etc.
    Note: to be used as plug-ins depending on the purpose (a sketch of the plug-in interface follows this list).
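
A sketch of the plug-in note above: learning modules share one small fit/predict interface so that k-means, SOM, an HMM, and so on can be swapped per purpose. Only k-means is wired in here; the interface and class names are hypothetical.

from sklearn.cluster import KMeans

class LearnerPlugin:
    """Minimal interface every learning module would implement."""
    def fit(self, X): ...
    def predict(self, X): ...

class KMeansPlugin(LearnerPlugin):
    def __init__(self, k):
        self.model = KMeans(n_clusters=k, n_init=10, random_state=0)
    def fit(self, X):
        self.model.fit(X)
        return self
    def predict(self, X):
        return self.model.predict(X)

# The rest of the system talks only to the shared interface:
learner = KMeansPlugin(k=2).fit([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.0]])
print(learner.predict([[0.05, 0.05], [0.95, 0.95]]))   # two different categories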

Tentative Research Steps

Phase I: Kinect + OpenCV + HARK + Vocaloid (preliminary exercises)


  • Checking for visual functions (facial recognition, motion capture; a detection sketch follows this list)
  • Checking for speech recognition function
  • Checking for speech synthesis function
  • Implementing a conventional linguistic (speech) interface
  • Experimenting on visual experience reports with a conventional linguistic interface
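
For the facial-recognition check, a smoke test using OpenCV's bundled Haar-cascade face detector (detection only, not identification). It assumes the opencv-python package, where cv2.data.haarcascades points at the bundled cascade files, and a camera at index 0.

import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)                    # default camera (or the Kinect RGB stream)

ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    print(f"{len(faces)} face(s) detected")
cap.release()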

Phase II: Pattern Recognition


  • Selection and implementation of time-series pattern recognizers (a comparison sketch follows this list)
  • Visual pattern recognition experiment
  • Experiments on pattern recognition reports with a conventional linguistic interface
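
One way the selection step could be run: candidate recognizers share a fit/predict interface and are scored on held-out labeled sequences. The baseline candidate and the data here are placeholders, not proposals.

import numpy as np

class MajorityBaseline:
    """Trivial candidate: always predicts the most common training label."""
    def fit(self, seqs, labels):
        vals, counts = np.unique(labels, return_counts=True)
        self.label = vals[np.argmax(counts)]
        return self
    def predict(self, seqs):
        return [self.label] * len(seqs)

def score(candidate, train, test):
    """Accuracy of a fitted candidate on held-out sequences."""
    candidate.fit(*train)
    pred = candidate.predict(test[0])
    return np.mean(np.array(pred) == np.array(test[1]))

train = ([[1, 2], [2, 3], [9, 9]], ["walk", "walk", "wave"])
test  = ([[1, 2], [8, 9]], ["walk", "wave"])
print(score(MajorityBaseline(), train, test))          # 0.5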

Phase III: Episodic memory


  • Defining the situations to be remembered
  • Implementing episodic memory and an attentional mechanism (a sketch follows this list)
  • Experimenting on episode reports with a conventional linguistic interface
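
A sketch of the episodic store this phase could start from: episodes are (time, situation, salience) records, written only when attention exceeds a threshold, and recalled by similarity to a cue situation. All values here are fabricated.

import numpy as np

class EpisodicMemory:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.episodes = []                   # (time, situation, salience)

    def observe(self, t, situation, salience):
        """Attention gates what gets stored."""
        if salience >= self.threshold:
            self.episodes.append((t, np.asarray(situation), salience))

    def recall(self, cue):
        """Return the stored episode most similar to the cue situation."""
        cue = np.asarray(cue)
        return min(self.episodes,
                   key=lambda ep: np.linalg.norm(ep[1] - cue), default=None)

mem = EpisodicMemory()
mem.observe(0, [0.9, 0.1], salience=0.8)     # remembered
mem.observe(1, [0.5, 0.5], salience=0.2)     # ignored (low attention)
print(mem.recall([0.8, 0.2]))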

Phase IV: Eye movement


  • The Kinect may be put on a movable (controllable) stage (with an acceleration sensor)
  • Human tracking (a control sketch follows this list)
  • Behavior control (extending a conventional language generation mechanism) 
  • Gaze induction by instruction with a conventional linguistic interface
  • Q & A with a conventional linguistic interface
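
A sketch of the human-tracking step as a proportional controller that pans the stage toward a detected face; set_pan_velocity() is a stand-in for whatever driver the movable stage actually provides, and the gain is made up.

IMAGE_WIDTH = 640
GAIN = 0.005                                 # hypothetical proportional gain

def pan_command(face_center_x):
    """Positive pans right, negative pans left, zero when the face is centered."""
    error = face_center_x - IMAGE_WIDTH / 2
    return GAIN * error

def set_pan_velocity(v):                     # placeholder for the real stage driver
    print(f"pan velocity -> {v:+.2f}")

for x in [100, 300, 320, 500]:               # face x-positions from the detector
    set_pan_velocity(pan_command(x))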

Phase V: Roaming (Roomba / Kobuki)


  • Coupling vision and roaming (reflexive) 
  • Defining the relation between attention and roaming ("curiosity"); a sketch follows this list
  • 3D object learning/recognition via roaming
  • Instruction of motion with a conventional linguistic interface
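
One possible reading of "curiosity" in code: the novelty of a viewed patch is its distance to the nearest already-learned category, and the rover turns toward the most novel direction. Categories, directions, and features are all made up.

import numpy as np

known_categories = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

def novelty(feature):
    """Distance to the nearest learned category centroid."""
    return min(np.linalg.norm(feature - c) for c in known_categories)

views = {"left":  np.array([0.9, 0.1]),      # familiar
         "ahead": np.array([0.5, 0.5]),      # somewhat new
         "right": np.array([0.1, 0.2])}      # most novel

target = max(views, key=lambda d: novelty(views[d]))
print("turn toward:", target)                # turn toward: right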

Phase VI: Design and implementation of non-conventional (associative) linguistic information processing
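
A sketch of the associative direction this phase points at, assuming a cross-situational learning scheme (the plan itself does not specify one): word meanings accumulate as co-occurrence associations between heard words and features of the concurrent situation.

from collections import Counter, defaultdict

assoc = defaultdict(Counter)                 # word -> counts of co-occurring features

def hear(utterance, situation_features):
    """Associate every word of an utterance with the current situation."""
    for word in utterance.split():
        assoc[word].update(situation_features)

hear("red ball", {"red", "round", "near"})
hear("red door", {"red", "flat", "far"})

print(assoc["red"].most_common(1))           # [('red', 2)]: 'red' binds to redness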
