Consider the problem of balancing a ball on an adjustable beam, as shown in Figure 4. The ball starts with some initial position and initial velocity, and we require that it be brought to rest at the center of the beam by dynamically adjusting the beam angle. This is a classic regulator-type control problem, which can be posed precisely as follows: given any initial condition (position and velocity), what control signal (the beam angle as a function of time) produces the desired final state (the ball at rest at the center of the beam)? A neural net can be trained to learn such a control by observing the actions of a skilled human operator. The construction of this project is broken into three tasks, as outlined below.
The first task is to model the dynamics of the system shown in Figure 4. Using Newton's second law and neglecting friction, most first-year physics students would have no problem formulating the following second-order differential equation for the position $x$ of the ball along the beam (for a given beam angle $\theta$):

$$\ddot{x} = g \sin\theta \tag{9}$$

where $g$ is the acceleration due to gravity.
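A state space formulation can be used to convert (9) into a system of first-order equations. Using the state assignment $x_1 = x$ (the ball's position) and $x_2 = \dot{x}$ (its velocity), with $\theta$ the beam angle, we obtain the desired system model

$$\dot{x}_1 = x_2 \tag{10}$$

$$\dot{x}_2 = g \sin\theta \tag{11}$$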
A numerical solution of the above continuous-time system can be computed using any numerical integration algorithm. For example, simple forward Euler integration yields the following:

$$x_1(k+1) = x_1(k) + \Delta \, x_2(k) \tag{12}$$

$$x_2(k+1) = x_2(k) + \Delta \, g \sin\theta(k) \tag{13}$$

where $x_1$ and $x_2$ are the ball's position and velocity, $\theta$ is the beam angle, $k$ is the discrete time variable, and $\Delta$ is a small positive constant called the integration step size. To complete the first task, computer graphics are added to display the ball, the beam, and the pivot, with the beam angle held at a fixed value. From a given initial position and starting from rest, the ball position is updated using (12) and (13), and the graphics display is updated accordingly. Since the angle is held fixed, the ball will eventually roll off the beam. In this problem, such a state of the system (the ball off the beam) represents an unrecoverable failure. One way to alert the user that the system has failed is by adding sound; for example, a crashing noise could be sounded whenever the ball leaves the beam.
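The fixed-angle experiment described above can be sketched in a few lines of Python. This is a minimal illustration rather than the original program: it assumes the frictionless model in which the ball's acceleration along the beam is g times the sine of the beam angle, with a positive angle accelerating the ball toward positive positions. The names `euler_step` and `simulate_fixed_angle`, the beam half-length of 1 m, and the step size of 0.01 s are all hypothetical choices, and graphics and sound are omitted.

```python
import math

G = 9.81  # acceleration due to gravity, in m/s^2


def euler_step(x, v, theta, dt):
    """Advance position x and velocity v by one forward Euler step.

    Both updates use the old state, as forward Euler requires.
    """
    x_next = x + dt * v
    v_next = v + dt * G * math.sin(theta)
    return x_next, v_next


def simulate_fixed_angle(x0, theta, half_length=1.0, dt=0.01, max_steps=100_000):
    """Release the ball from rest at x0 with the beam angle held fixed.

    Returns (steps, x, v) at the first step where the ball is off the
    beam, or None if it is still on the beam after max_steps steps.
    half_length and dt are illustrative values, not from the text.
    """
    x, v = x0, 0.0
    for k in range(max_steps):
        if abs(x) > half_length:
            return k, x, v  # unrecoverable failure: ball left the beam
        x, v = euler_step(x, v, theta, dt)
    return None


if __name__ == "__main__":
    print(simulate_fixed_angle(x0=0.0, theta=0.1))
```

In an interactive version, the point where the function returns a failure is where the crash sound would be triggered.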
For this experiment, the user of the program must be able to change the beam angle dynamically. The best way to accomplish this (although not the only way) is with a joystick, so the second (and most difficult) task of the project is to interface a joystick to the computer and to the graphics. In the neutral position, the joystick indicates a beam angle of zero; when pushed to the far left or the far right, it should indicate the extreme beam angle in the corresponding direction. Hardware interfacing tasks are always machine dependent. The joystick we used had 64 levels, so level 32 was mapped to a beam angle of 0, and levels 64 and 0 were mapped to the two extreme angles.
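One simple way to realize this mapping is a linear interpolation around the center level. The sketch below is an assumption-laden illustration: the maximum angle `THETA_MAX` (15 degrees here) is a hypothetical value, the choice that level 64 gives the positive extreme is arbitrary, and the function name `level_to_angle` is invented for this example.

```python
import math

CENTER_LEVEL = 32                # joystick level mapped to a beam angle of 0
MAX_LEVEL = 64                   # highest level reported by the joystick
THETA_MAX = math.radians(15.0)   # hypothetical extreme beam angle


def level_to_angle(level, theta_max=THETA_MAX):
    """Map a joystick level in [0, MAX_LEVEL] linearly to a beam angle.

    Level 32 gives 0, level 64 gives +theta_max, level 0 gives -theta_max.
    Out-of-range readings are clamped to the valid range first.
    """
    level = max(0, min(MAX_LEVEL, level))
    return theta_max * (level - CENTER_LEVEL) / CENTER_LEVEL


if __name__ == "__main__":
    for level in (0, 16, 32, 48, 64):
        print(level, round(math.degrees(level_to_angle(level)), 2))
```

Clamping keeps a noisy or miscalibrated reading from commanding an angle beyond the intended range.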
To complete the second task, the joystick inputs are combined with the dynamic equations and computer graphics generated by the first task. At each time step, we require the following sequence of actions: (i) read the value of the beam angle from the joystick, (ii) draw the beam at that angle, (iii) draw the ball on the beam at its current position, and (iv) use Equations (12) and (13) to compute the new position of the ball. At this point, the user can practice trying to balance the ball on the beam, but every time the ball rolls off the edge, he or she will hear a loud crashhhh!
Now that the dynamics of the system are adequately modeled and the user is able to change the beam angle, the third and final task is to put in place the machine learning portion of the project. An appropriate network for generating a control signal for this system is shown in Figure 5. Here, the input to the network is a 3-tuple describing the current state of the system, and the output of the network is the control angle. To train the network, we produce training pairs by simulating the system dynamics with Equations (12) and (13) to generate a sequence of inputs. Each input also needs a target output; the desired target angles are supplied by the person controlling the joystick.
Figure 5. The neural net structure for controlling the ball on the beam.
The training phase consists of the following six steps: (i) read the (next) value of the beam angle from the joystick; the current state of the system and this joystick angle form a training pair, (ii) draw the beam at this angle and draw the ball on the beam at its current position, (iii) compute a forward pass of the network shown in Figure 5, (iv) compute the required weight updates using Equations (7) and (8), (v) update the weights of the network, and (vi) use Equations (12) and (13) to compute the new position of the ball. Steps (i) through (vi) are repeated until training is complete.
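The training loop above can be sketched as follows. Several things here are assumptions, not the authors' design: the network is a small one-hidden-layer tanh net trained by plain online gradient descent (standing in for Figure 5 and Equations (7) and (8), which are not reproduced here), the 3-tuple input is taken to be (position, velocity, constant 1), and the human operator is replaced by a hypothetical clipped proportional-derivative rule `teacher_angle` so that the example runs without a joystick. Drawing is omitted, and all constants are illustrative.

```python
import math
import random

G, DT = 9.81, 0.02      # gravity (m/s^2) and step size (s); DT is illustrative
HALF_LENGTH = 1.0       # hypothetical beam half-length (m)
THETA_MAX = 0.25        # hypothetical extreme beam angle (rad)


class TinyNet:
    """3 inputs -> tanh hidden layer -> 1 tanh output scaled to THETA_MAX."""

    def __init__(self, n_hidden=6, seed=0):
        rng = random.Random(seed)  # small random initial weights
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]  # last entry: bias

    def forward(self, inp):
        self.inp = inp
        self.h = [math.tanh(sum(w * s for w, s in zip(row, inp))) for row in self.w1]
        self.y = math.tanh(sum(w * h for w, h in zip(self.w2, self.h + [1.0])))
        return THETA_MAX * self.y

    def update(self, target, lr=0.1):
        """One gradient step on squared error for the most recent forward pass."""
        err = target / THETA_MAX - self.y              # error in normalized units
        d_out = err * (1.0 - self.y ** 2)
        d_hid = [d_out * self.w2[j] * (1.0 - h ** 2) for j, h in enumerate(self.h)]
        for j, h in enumerate(self.h + [1.0]):
            self.w2[j] += lr * d_out * h
        for j, d in enumerate(d_hid):
            for i, s in enumerate(self.inp):
                self.w1[j][i] += lr * d * s


def teacher_angle(x, v):
    """Stand-in for the joystick operator (an assumption, not from the text)."""
    return max(-THETA_MAX, min(THETA_MAX, -(0.3 * x + 0.3 * v)))


def train(net, episodes=500, steps_per_episode=60, seed=1):
    rng = random.Random(seed)
    for _ in range(episodes):
        x, v = rng.uniform(-0.8, 0.8), rng.uniform(-0.5, 0.5)
        for _ in range(steps_per_episode):
            theta = teacher_angle(x, v)        # (i) target angle from the operator
            net.forward([x, v, 1.0])           # (iii) forward pass
            net.update(theta)                  # (iv)-(v) compute and apply updates
            x, v = x + DT * v, v + DT * G * math.sin(theta)  # (vi) Euler update
            if abs(x) > HALF_LENGTH:           # ball left the beam: start over
                break


if __name__ == "__main__":
    net = TinyNet()
    train(net)
    print(net.forward([0.5, 0.0, 1.0]), teacher_angle(0.5, 0.0))
```

Note that during training the beam follows the operator's angle, not the network's output; the network only observes state-angle pairs.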
Figure 6. The computer screen layout for the ball balancing problem.
The computer screen setup that we used is shown in Figure 6. Here the function keys are used to select among the various options available. These options could be modified or enhanced based on the preferences and creativity of the students. In our setup, selecting F1 gives the user a chance to get a "feel" for the joystick and practice balancing the ball on the beam a few times before training begins. When ready, the user initiates the training phase by pressing F10; at this point all the weights of the network are initialized to small random values. Once training starts, the user has 5 minutes to train the network. At any time during training, though, the user can interrupt the training phase by pressing F3 to view what the network has learned thus far. By pressing F3, the system enters the recall phase, and the neural net controller takes over. Here, the joystick no longer controls the beam angle; rather, the beam angle is controlled by the output of the neural net. If the neural net controller fails to balance the ball (for example, if the ball rolls off the edge of the beam) then the crash noise is sounded and the ball is given a new random initial condition, and the neural net controller tries again.
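The recall phase described above can be sketched as a loop in which a controller, rather than the joystick, supplies the beam angle, with a random restart after each failure. In this illustration `controller` is any callable mapping (position, velocity) to an angle, so a trained network could be plugged in. The function name `run_recall`, the `on_crash` hook, the initial-condition ranges, and all constants are hypothetical, and the sign convention assumes a positive angle accelerates the ball toward positive positions.

```python
import math
import random

G, DT = 9.81, 0.02   # gravity (m/s^2) and an illustrative step size (s)
HALF_LENGTH = 1.0    # hypothetical beam half-length (m)


def random_state(rng):
    """Draw a new random initial position and velocity (ranges are assumptions)."""
    return rng.uniform(-0.8, 0.8), rng.uniform(-0.5, 0.5)


def run_recall(controller, n_steps, rng, on_crash=lambda: None):
    """Let controller(x, v) -> angle drive the beam for n_steps time steps.

    Whenever the ball leaves the beam, on_crash() is called (for example,
    to play the crash sound) and the ball is given a new random state.
    Returns (number_of_crashes, final_x, final_v).
    """
    x, v = random_state(rng)
    crashes = 0
    for _ in range(n_steps):
        theta = controller(x, v)
        x, v = x + DT * v, v + DT * G * math.sin(theta)  # forward Euler update
        if abs(x) > HALF_LENGTH:
            crashes += 1
            on_crash()
            x, v = random_state(rng)
    return crashes, x, v


if __name__ == "__main__":
    # A hand-written stabilizing rule used only to exercise the loop.
    def demo_controller(x, v):
        return max(-0.25, min(0.25, -(0.3 * x + 0.3 * v)))

    print(run_recall(demo_controller, 500, random.Random(0)))
```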
After 5 minutes, further training is disallowed and the user can evaluate the performance of the neural net controller by choosing F3. Here, 10 preset initial conditions are used to test the performance of the neural net controller. For each initial condition, 10 points are awarded if the neural net controller brings the ball to the center of the beam and to a stop; 5 points if the neural net controller is able to keep the ball on the beam for a sufficiently long time, say 30 seconds, but is not able to stop the ball at the center of the beam; and 0 points if the ball rolls off the edge of the beam. The total score is out of 100 points and can be improved with practice.
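The scoring rule can be written down directly. In this sketch, the tolerances that decide when the ball counts as "at the center and stopped" (`pos_tol` and `vel_tol`) are hypothetical, since the text does not specify them, and each trial is summarized by a hypothetical tuple of final position, final velocity, and whether the ball left the beam before the time limit.

```python
def score_trial(final_x, final_v, fell_off, pos_tol=0.05, vel_tol=0.05):
    """Score one test trial according to the 10 / 5 / 0 rule.

    0  points: the ball rolled off the beam.
    10 points: the ball ended at rest at the center (within tolerances).
    5  points: the ball stayed on the beam for the full trial but did
               not come to rest at the center.
    pos_tol and vel_tol are assumed values, not taken from the text.
    """
    if fell_off:
        return 0
    if abs(final_x) <= pos_tol and abs(final_v) <= vel_tol:
        return 10
    return 5


def total_score(trials):
    """Sum the scores of a list of (final_x, final_v, fell_off) tuples."""
    return sum(score_trial(x, v, fell) for x, v, fell in trials)


if __name__ == "__main__":
    trials = [(0.0, 0.0, False)] * 6 + [(0.4, 0.1, False)] * 3 + [(1.2, 0.8, True)]
    print(total_score(trials))
```

With ten preset initial conditions, a controller that succeeds on every trial reaches the maximum of 100 points.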
Once constructed, this program is useful for demonstrating many of the fundamental concepts of machine learning to undergraduate students, including those outside engineering. For example, for the neural net to balance the ball, the network must be able to generalize; that is, it must construct a control law from the training data. To achieve good generalization, the student must produce a "sufficiently rich" set of inputs during the training phase. Typically, the student's first attempt at training the network results in poor performance precisely because he or she did not generate such a sufficiently rich training set of state-control pairs. Given a second chance at training the network, the student balances the ball in a way that is more suitable for the machine learning process, usually resulting in a more successful neural net controller.