Monday, January 14, 2013

A survey on parts of speech

Towards the end of last year, I conducted a small experiment surveying the distribution of words and parts of speech in the CHILDES corpus (the Adam set by Roger Brown, 2004).  The purpose was to see to what extent the distribution of words around a target word contributes to determining its part of speech.  The occurrences of certain words directly before and after the target word were used as the discriminative feature vector.  First, a vector of the probabilities of those words was collected for each part of speech.  In the discrimination phase, the part of speech whose probability vector was closest (in Euclidean distance) to that of the target word was chosen as the candidate part of speech.  With 200 preceding words and 200 following words as features, this method yielded 64% accuracy.  The result was not much different when the target words were limited to unambiguous ones.  Cosine similarity did not work at all.
Evidently, this method alone is not sufficient to determine parts of speech.  The result also shows how difficult it is to cluster parts of speech by means of word distribution probabilities (and Euclidean distance).  So I gave up on further experiments along this line.  While massive data and state-of-the-art mathematical modeling methods such as iHMM or NPYLM may yield promising results, human children may be using different strategies, such as building semantic categories before learning grammatical categories.
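
For readers who want to try the idea, here is a minimal Python sketch of the nearest-profile method described above. It is not the code used in the experiment: the function names, the toy tagged corpus and the tiny hand-picked feature lists are all assumptions made for illustration (the experiment used 200 preceding and 200 following words on the Adam data).

```python
import math
from collections import Counter, defaultdict


def context_vector(tokens, positions, prev_feats, next_feats):
    """Relative frequency of each feature word directly before / after the given positions."""
    prev_counts, next_counts = Counter(), Counter()
    for i in positions:
        if i > 0:
            prev_counts[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            next_counts[tokens[i + 1]] += 1
    n = max(len(positions), 1)  # avoid division by zero for unseen words
    return ([prev_counts[w] / n for w in prev_feats] +
            [next_counts[w] / n for w in next_feats])


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def train(tagged, prev_feats, next_feats):
    """Build one context-probability vector per part of speech from (word, tag) pairs."""
    tokens = [word for word, _ in tagged]
    positions_by_tag = defaultdict(list)
    for i, (_, tag) in enumerate(tagged):
        positions_by_tag[tag].append(i)
    return {tag: context_vector(tokens, pos, prev_feats, next_feats)
            for tag, pos in positions_by_tag.items()}


def classify(word, tokens, profiles, prev_feats, next_feats):
    """Pick the part of speech whose profile is closest to the word's own context vector."""
    positions = [i for i, w in enumerate(tokens) if w == word]
    vec = context_vector(tokens, positions, prev_feats, next_feats)
    return min(profiles, key=lambda tag: euclidean(vec, profiles[tag]))


# Hypothetical toy data, standing in for a tagged corpus.
tagged = [("the", "det"), ("dog", "n"), ("runs", "v"),
          ("the", "det"), ("cat", "n"), ("sleeps", "v"),
          ("a", "det"), ("bird", "n"), ("sings", "v")]
prev_feats = ["the", "a", "dog", "cat"]      # feature words looked for before the target
next_feats = ["runs", "sleeps", "the", "a"]  # feature words looked for after the target

profiles = train(tagged, prev_feats, next_feats)
untagged = ["the", "fish", "swims", "a", "fish", "runs"]
print(classify("fish", untagged, profiles, prev_feats, next_feats))
print(classify("swims", untagged, profiles, prev_feats, next_feats))
```

On this toy input the nearest profile for "fish" is the noun profile and for "swims" the verb profile, but with so little data that says nothing about the accuracy on real child-directed speech.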

Saturday, January 5, 2013

Vivre avec les Robots


Today, I watched a French documentary on robots:
Vivre avec les Robots / Living with robots (2012)
http://dai.ly/WeIToo (in French from dailymotion.com)

Short introduction in English @ pariscience.fr
http://bit.ly/VByveB
Production note in French

It presents current robot technology and discusses its sociological implications.

The following are some of the research programs introduced in the show (the pointers are what I found on the Web):

Robotics at LAAS CNRS: http://www.laas.fr/robots/
In particular, the Jido system performing human-robot collaboration

Emotion detection with Nao

iCub, the (infantile) robot platform

CB2, the baby humanoid robot

Asimo from Honda (human-robot interaction)

Justin

Armar

Conversational agents with Jean-Claude Heudin

Angie, another Siri competitor