Saturday, November 10, 2012

Finding Nicholas Cassimatis

I came across an article by Dr. Cassimatis today (thanks to the AGI mailing list).

The following is my summary of the article entitled
"A Cognitive Substrate for Achieving Human-Level Intelligence" (AI Magazine 2006).

The profusion problem:
For intelligent systems to be broadly functional and robust, the profusion of knowledge, data structures, and algorithms must be integrated.
The cognitive substrate hypothesis:
There is a cognitive substrate, i.e., a relatively small set of computational problems whose solutions allow a wide range of other cognitive tasks to be solved.
"These include reasoning about temporal intervals, causal relations, identities between objects and events, ontologies, beliefs, and desires."
Findings in linguistic semantics (e.g., Jackendoff) support the hypothesis. 
Polyscheme cognitive architecture (Cassimatis 2005)
The common function principle (CFP):  Many AI algorithms can be implemented in terms of the same basic set of common functions.
The multiple implementation principle (MIP): Each common function can be implemented using multiple computational methods.
The article tries to show a parallel between (folk) physical concepts and grammatical concepts.

I agree with the problem setting (the profusion problem) and am also hopeful about the cognitive substrate hypothesis.  I am less sure about Polyscheme, as it is more of a practical architecture issue, though the two principles would be desirable.

For anyone pursuing a brain-inspired cognitive architecture, the integration issue would be an even more pressing problem, as such projects often start from learning issues and leave other problems (planning, language processing, etc.) aside... (This is not a comment on the article above, but a more general comment.)

Dr. Cassimatis is on the faculty of the Cognitive Science Department at Rensselaer, and has founded a search technology company, SkyPhase.

Thursday, November 8, 2012

Suffix finder

Well, this isn't really an AI topic, but an elementary NLP exercise.
The AWK script below extracts word suffixes from a word histogram.


#! /usr/bin/awk -f
# Creates the suffix histogram of a word histogram ("word count" per line)
/^[^_]/ {
  words[$1] = $2  # word histogram
}
END {
  for (x in words) {
    if (length(x) > 3) {  # word length > 3
      for (i = length(x) - 1; i >= length(x) - 3; i--) {
                          # suffix length <= 3
        stem = substr(x, 1, i)
        # Count a suffix only when its stem part is itself a word.
        # (Testing "stem in words" avoids indexing words[] with a computed
        #  key inside the loop, which is why the original kept a copy of
        #  the array as a workaround.)
        if (stem in words) {
          suffix = substr(x, i + 1)
          suffixes[suffix] += words[x]   # total token count per suffix
          suffix_count[suffix]++         # number of distinct word types
        }
      }
    }
  }
  for (key in suffixes) {
    # print suffixes only used by more than 9 kinds of words
    if (suffix_count[key] >= 10) print key, suffixes[key]
  }
}
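A minimal, self-contained run of the same extraction on a tiny made-up histogram (the input words and file names here are illustrative, and the type-count threshold is dropped so the small sample produces output):

```shell
# Toy word histogram: "word count" per line (hypothetical data)
printf 'walk 10\nwalks 4\nwalked 3\nwalking 2\ntalk 8\ntalks 5\n' > hist.txt
awk '
/^[^_]/ { words[$1] = $2 }
END {
  for (x in words)
    if (length(x) > 3)
      for (i = length(x) - 1; i >= length(x) - 3; i--)
        if (substr(x, 1, i) in words) {
          sfx = substr(x, i + 1)
          suffixes[sfx] += words[x]   # total token count per suffix
        }
  for (k in suffixes) print k, suffixes[k]
}' hist.txt > suffixes.txt
cat suffixes.txt
```

Here "walks" and "talks" contribute their counts to the suffix "s" (9 total), "walked" to "ed" (3), and "walking" to "ing" (2), since "walk" and "talk" are themselves in the histogram.  A sorted listing like the one below can be obtained by piping through sort, e.g. `sort -k2,2nr suffixes.txt`.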
Result from the CHILDES/Brown/Adam corpus (sorted by counts):

's 2664
s 1812
r 935
y 539
ed 446
es 444
e 282
n 250 # broke-n, etc.
d 168
er 145
ly 135
'd 76