## Monday, November 18, 2013

### Learning attitude control

In the previous post, a supposedly swimming robot was set up in the Sigverse simulator, where I wrote:

> The motion is out of control now.
> So the current issue is attitude control perhaps by reinforcement learning.
It took a while to experiment with attitude control learning. Though I got ideas from actor-critic learning, I used simpler algorithms, as the task is simple. Here's the idea (& result).
• Body rotation & the stabilization goal
The body to be stabilized rotates, and the goal of stabilization is to align the bottom center with the vertical direction. The action of the body is acceleration in either direction.
• States
The learner learns policies mapping from states to actions.
The states are represented by an analog (real-valued) vector whose dimensions correspond to rotation angles of the body. More precisely, the angle of the body is mapped to the vector by Gaussian filters, one per dimension, each centered on the angle allocated to that dimension. For example, if there are 36 dimensions, each of them is allocated to a 10-degree notch. The Gaussian closest to the current rotation angle outputs the largest value.
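
The encoding above can be sketched roughly as follows. This is only an illustration: the function name `encode_angle`, the filter width `sigma_deg`, the choice of centers at multiples of the notch size, and the wrap-around distance are all my assumptions, not details taken from the actual implementation.

```python
import math

def encode_angle(angle_deg, n_dims=36, sigma_deg=10.0):
    """Map a rotation angle (degrees) to a vector of Gaussian activations.

    Assumption: dimension i is centered at i * (360 / n_dims) degrees,
    and sigma_deg is a hypothetical filter width.
    """
    step = 360.0 / n_dims
    state = []
    for i in range(n_dims):
        center = i * step
        # Shortest signed angular distance, wrapped into [-180, 180)
        diff = (angle_deg - center + 180.0) % 360.0 - 180.0
        state.append(math.exp(-0.5 * (diff / sigma_deg) ** 2))
    return state

state = encode_angle(42.0)
# Index of the dimension with the largest activation
peak = max(range(len(state)), key=state.__getitem__)
```

With 36 dimensions, an angle of 42 degrees activates the filter centered at 40 degrees most strongly, and its neighbors get smaller but non-zero values.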
• Linear architecture
The learner learns the mean acceleration for the current angle. The mean acceleration is modeled as a weighted sum of the components of the state vector, so that what the learner learns is the weights of the sum.

$$\mathrm{average}(\mathrm{output}) = \sum_i w_i v_i$$

The actual output (acceleration) is drawn from a Gaussian distribution whose mean is the mean acceleration given by the linear architecture and whose variance is also learned.
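
A minimal sketch of this output stage, assuming the state vector and weights are plain Python lists. The function names are hypothetical, and the spread is passed in as a standard deviation `std` for convenience; how the variance itself is learned is not shown here.

```python
import random

def mean_acceleration(weights, state):
    """Linear architecture: weighted sum of the state vector components."""
    return sum(w * v for w, v in zip(weights, state))

def sample_acceleration(weights, state, std, rng=random):
    """Draw the actual output from a Gaussian around the mean acceleration.

    Assumption: std is the square root of the learned variance.
    """
    return rng.gauss(mean_acceleration(weights, state), std)

weights = [1.0, 2.0, 3.0]
state = [0.5, 0.0, 1.0]
mean = mean_acceleration(weights, state)
action = sample_acceleration(weights, state, std=0.2)
```

A larger `std` makes the learner explore more widely around the current mean; as it shrinks, the output settles on the learned mean.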
• Weights are learned with one of the following algorithms:
• Average success
The weight is set to the average output for the state.
• Estimation difference
Move the weight toward the actual output in proportion to (actual output - estimated output), where the estimated output is the mean acceleration given by the linear architecture.
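
The two update rules could be read along the following lines. This is my interpretation for illustration only: the learning rate `lr`, the scaling of each update by the activation `v`, and the activation-weighted counts in the averaging rule are all assumptions, as is leaving it to the caller to decide which actions count as successful.

```python
def update_average(weights, counts, state, actual):
    """'Average success' (one possible reading): keep a running average of
    the outputs seen for each dimension, weighted by its activation."""
    for i, v in enumerate(state):
        if v > 0.0:
            counts[i] += v
            weights[i] += (v / counts[i]) * (actual - weights[i])

def update_estimation_difference(weights, state, actual, lr=0.1):
    """'Estimation difference': move the weights toward the actual output
    in proportion to (actual output - estimated output)."""
    estimated = sum(w * v for w, v in zip(weights, state))
    error = actual - estimated
    for i, v in enumerate(state):
        # Assumption: each weight moves in proportion to its activation
        weights[i] += lr * error * v

# Usage: one step of the estimation-difference rule
w = [0.0, 0.0]
update_estimation_difference(w, [1.0, 0.0], actual=2.0, lr=0.5)
```

In this sketch only the weights of dimensions that are active for the current angle change, so what is learned for one angle does not overwrite what was learned for distant angles.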
• Reward
Action success is measured by the following rewards:
• When the bottom sways away from the vertical position, then speed reduction is the reward.
• When the bottom is approaching the vertical position, then the approach is the reward.
Perhaps simpler rewards may work as well.
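
As a rough illustration of the two reward cases, here is a sketch under my own assumptions: angles are signed deviations from the vertical, speeds are angular speeds, and both rewards are simple differences between consecutive time steps. The function name and signature are hypothetical.

```python
def reward(prev_angle, prev_speed, angle, speed):
    """Reward for one time step.

    Assumption: angle is the signed deviation of the bottom from the
    vertical, and speed is the angular speed, at consecutive steps.
    """
    if abs(angle) > abs(prev_angle):
        # Swaying away from the vertical: reward the reduction in speed
        return abs(prev_speed) - abs(speed)
    # Approaching the vertical: reward the amount of approach
    return abs(prev_angle) - abs(angle)

r_away = reward(10.0, 5.0, 12.0, 3.0)       # moving away while slowing down
r_approach = reward(10.0, -5.0, 7.0, -4.0)  # moving toward the vertical
```

Both cases give a positive value when the action helps and a negative value when it makes things worse, which is enough to decide whether an action counts as a success.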