I changed the agent architecture. The previous version used competing production-like rules for control. The current one uses simpler perceive-reward-action cycles, and any conflicts that arise are resolved by the action selector. This should be more compatible with standard reinforcement learning architectures. Apart from the removal of the competing rules, the design is the same as the old one.
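To make the cycle concrete, here is a minimal sketch of a perceive-reward-action loop with a separate action selector. It is not the code from the repository: the names `Proposal`, `ActionSelector`, and `Agent`, the epsilon-greedy choice, and the simple running-average value update are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Proposal:
    """A candidate action with its estimated value (hypothetical type)."""
    action: str
    value: float


class ActionSelector:
    """Resolves conflicts between proposals. Epsilon-greedy is an assumption."""

    def __init__(self, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()

    def select(self, proposals):
        if not proposals:
            return None
        # Explore occasionally, otherwise take the highest-valued proposal.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(proposals).action
        return max(proposals, key=lambda p: p.value).action


class Agent:
    """One perceive-reward-action cycle per call to step()."""

    def __init__(self, actions, selector, alpha=0.5):
        self.actions = list(actions)
        self.selector = selector
        self.alpha = alpha   # learning rate (assumed)
        self.q = {}          # (observation, action) -> estimated value
        self.last = None     # previous (observation, action) pair

    def step(self, observation, reward):
        # Reward: move the value of the previous choice toward the reward.
        if self.last is not None:
            old = self.q.get(self.last, 0.0)
            self.q[self.last] = old + self.alpha * (reward - old)
        # Action: every action is proposed; the selector picks exactly one.
        proposals = [Proposal(a, self.q.get((observation, a), 0.0))
                     for a in self.actions]
        action = self.selector.select(proposals)
        self.last = (observation, action)
        return action
```

The point of the split is that the agent only proposes and learns values, while all arbitration lives in one place, the selector, instead of being spread over competing rules.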
BTW, the two agents are now split into the Language Learner (LL) and the Language User (LU, the caregiver). LL chooses its actions randomly, and LU calls LL whenever LL is not looking back at LU. LU's call is displayed as a small rectangular balloon above it, containing LL's current name, "Luca." The agents also 'smile' when they are looking at each other. These behaviors are part of the plan for my verb learning experiments.
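The behavior rules described here can be sketched in a few lines of plain Python, independent of the simulator. The function names, the two-item action set, and the boolean gaze flags are hypothetical; the actual code may represent gaze and actions quite differently.

```python
import random

# Hypothetical action set for the Language Learner (LL).
LL_ACTIONS = ["look_at_lu", "look_away"]


def ll_choose_action(rng):
    """LL picks its next action uniformly at random."""
    return rng.choice(LL_ACTIONS)


def lu_balloon(ll_looking_at_lu, ll_name="Luca"):
    """Text for LU's call balloon, or None when LU does not call.

    LU calls LL by name only while LL is not looking back at LU.
    """
    return None if ll_looking_at_lu else ll_name


def smiling(ll_looking_at_lu, lu_looking_at_ll):
    """Both agents smile only during mutual gaze."""
    return ll_looking_at_lu and lu_looking_at_ll


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        action = ll_choose_action(rng)
        ll_looks = action == "look_at_lu"
        # Assumption for this demo: LU always looks at LL.
        print(action, lu_balloon(ll_looks), smiling(ll_looks, True))
```

Running the script prints a few random steps, showing when the balloon text would appear and when the agents would smile.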
The simulator was made with V-REP and Python.
The code is available on GitHub (Version 2 snapshot: 2016-08-01).