UC Riverside Reinforcement Robotic Interface Description

Current Status

October 2

  • Noted some objectives for present and future development (e.g., December / January).
  • Objective 1: Separate the low level robotic control from the basic GUI and simulation framework, so that others can use the code for their own robot platforms. That is, abstract the notion of physical robotic experimentation into a separate component in the code (or control in the GUI).
  • Objective 2: Write some fast C code (OpenGL) for the graphics portion of the MazeFrame so we can show more than a two-dimensional overview of the robot. We envision using 3D graphics to represent the robot and maze during simulation. We may also make this the primary frame, i.e., flatten the GUI so that statistics and camera movement are tracked in secondary windows.
  • Objective 3: Do some user task analysis for the interface, including, but not limited to, task scenarios and evaluation using heuristics.
  • Objective 4: Most importantly, get the simulation working, at least at a crude level. Compare the algorithms and do some algorithmic analysis before incorporating machine learning. This is another reason to modularize the robot/GUI code better.

August 21

  • All documentation has been converted to HTML format.
  • Need a simple, possibly non-GUI application that uses and tests only the QLearning* classes (~ 2 days).
  • JNI code needs to be changed for the PC platform (~ 3 - 4 days).

August 11

  • ExplorerFrame and MazeFrame are basically complete.
  • StatusFrame is not complete, but the source of the data has already been incorporated, so the fix should not be too time-consuming (~ 3 - 5 hrs).
  • Four neighbor drawing code has been fixed; works fine now for LineRobot. However, CamRobot drawing code is buggy for eight neighbors (~ 2 hrs).
  • Readme dialog won't link hyperlinks correctly (~ 3 hrs).
  • About dialog is empty, but is very simple to implement (~ 20 minutes).
  • Communication in the InterfaceFrame is not finished; in particular, we need an ExperimentManager that tells the QLearningExperiment to do things and communicates the info to the frames (~ 3 days).
  • Open and save utilities not implemented; need a file format and some file stream code in the listener for the open and save buttons (~ 4 - 5 days).
  • Camera.c won't compile, due to Java glue and IMA (~ forever).

Running the Interface Program

Native code is implemented for the Unix system at the UCR Visualization and Intelligent Systems Laboratory. Namely, CamRobot.c, LineRobot.c, Camera.c, and HandyBoard.c are for use specifically with the IMA imaging system and the MIT Handy Board. To use the code for a different platform, simply change the specific references to these classes in the Status, Explorer, and Maze frames, or add your implementation as an option. See Program Structure for details.

All files are in the ucr_reu directory (corresponding to the ucr_reu package). The Java-specific paths must be set as follows: CLASSPATH is set to the directory containing ucr_reu, and LD_LIBRARY_PATH is set to ucr_reu itself.

To recompile the project, do the following:

  1. Make sure the CamRobot.java file static block loads the "CamRobot" library.
  2. Compile the java files using javac *.java, with the -classpath option set to the directory containing ucr_reu.
  3. Create a header file for CamRobot.java using javah -jni CamRobot. You must also reset the CLASSPATH (or use the -classpath option) to ucr_reu so that javah can find the compiled CamRobot class.
  4. Make sure your C implementation file meets the requirements of the header file (same function declarations).
  5. Invoke the C compiler to create the shared library, using cc -G -I/usr/java/include -I/usr/java/include/solaris CamRobot.c -o libCamRobot.so, assuming your Java directory is located in /usr (as on sevenup). For shasta, use: cc -G -I/usr/local/inst/jdk1.2.2/include -I/usr/local/inst/jdk1.2.2/include/solaris CamRobot.c -o libCamRobot.so
  6. Repeat procedure for LineRobot and Camera classes.
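
Step 1 above relies on the Java side asking the JVM to load the shared library by its short name. The following is a minimal sketch of that mechanism, not the project's actual CamRobot code: the class and method names (NativeLibrarySketch, tryLoad, expectedFileName) are hypothetical. It only shows how the short name "CamRobot" relates to the file libCamRobot.so and how a missing library surfaces as an UnsatisfiedLinkError.

```java
/** Hypothetical helper illustrating how a JNI library is located and loaded. */
class NativeLibrarySketch {

    /** Tries to load a native library by short name; returns false if the JVM cannot find it. */
    static boolean tryLoad(String shortName) {
        try {
            // Searches java.library.path (fed by LD_LIBRARY_PATH on Unix systems).
            System.loadLibrary(shortName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    /** Returns the platform-specific file name the JVM looks for, e.g. libCamRobot.so on Solaris or Linux. */
    static String expectedFileName(String shortName) {
        return System.mapLibraryName(shortName);
    }

    public static void main(String[] args) {
        System.out.println("Looking for " + expectedFileName("CamRobot"));
        System.out.println("Loaded: " + tryLoad("CamRobot"));
    }
}
```

If tryLoad returns false for "CamRobot", the usual suspects are a library path that does not include the directory holding the shared library, or a file name that does not match the mapped name.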

To run the program, use java ucr_reu/InterfaceFrame, with -classpath set to the directory containing ucr_reu; or simply java InterfaceFrame (unset your CLASSPATH).

Robot Learning Task

Given starting coordinates and a starting orientation, the task of the robot is to search for the goal coordinate of an arbitrary maze possibly filled with obstacles. When the goal is found, the robot is placed back at the starting coordinates and the next trial (or episode) of the search takes place. The robot and the corresponding search algorithm have no knowledge of the goal coordinates in any trial. They also have no initial knowledge of the environment. As the number of episodes increases, the performance of the robot, as measured by the total number of moves taken to reach the goal, should improve.

Although mazes and algorithms of various shapes and types can be used for this experiment, we chose to implement only 4 and 8 neighbor q learning on a rectangular maze. The interface is flexible enough, however, to allow for mazes of different types (one only has to implement a specific maze class that extends the abstract Maze class). Similarly, different types of algorithms can be created (one needs to make small changes to NewExperimentDialog, and implement an option panel for the new algorithm specifying the user input parameters, a custom Maze or RectMaze class that keeps track of state information, and an Experiment class which takes a subclass of Robot and a subclass of Maze among its arguments and generates specific actions by calling the robot and updating the maze).

An algorithmic implementation necessitates an implementation of the abstract Robot class. Currently, the two direct subclasses of Robot are LineRobot and CamRobot. LineRobot controls the line-sensing robot from the 1999 NSF UCR REU robotics project. See last year's documentation for more details. CamRobot controls the NSF UCR REU robot. It has a mounted camera, four proximity sensors, stepper motors, servo motors for rotating and tilting the camera, encoders for fine position control, and bump sensors. When a new robot is built, we need to construct a new subclass of Robot. We also need to add a specific case for it to the code in the ExplorerFrame and possibly the initialization dialogs.
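
As a rough illustration of what adding a new robot type involves, the sketch below defines a hardware-free robot. Everything here is an assumption: SketchRobot is a hypothetical stand-in for the project's abstract Robot (whose real method names and signatures are not shown in this document), and SimulatedRobot is an invented subclass. The only property taken from the documentation is that the base class keeps track of the robot's heading.

```java
/** Hypothetical stand-in for the abstract Robot class; the real class's methods may differ. */
abstract class SketchRobot {
    private int headingDegrees; // assumed convention: 0 = north, 90 = east, clockwise

    int getHeading() {
        return headingDegrees;
    }

    /** Rotates clockwise by the given degrees (negative = counterclockwise) and updates the heading. */
    final void rotate(int degrees) {
        doRotate(degrees);
        headingDegrees = Math.floorMod(headingDegrees + degrees, 360);
    }

    /** Moves the given number of steps along the current heading; returns false on failure. */
    final boolean forward(int steps) {
        return doForward(steps);
    }

    // Platform-specific behavior supplied by each subclass (native code for a real robot).
    protected abstract void doRotate(int degrees);
    protected abstract boolean doForward(int steps);
}

/** Invented subclass: moves on an unbounded grid in memory, with no hardware behind it. */
class SimulatedRobot extends SketchRobot {
    int x;
    int y;

    @Override
    protected void doRotate(int degrees) {
        // Nothing to drive in simulation; the base class records the heading.
    }

    @Override
    protected boolean doForward(int steps) {
        double radians = Math.toRadians(getHeading());
        // Rounding maps the eight compass headings onto unit grid offsets.
        x += (int) Math.round(Math.sin(radians)) * steps;
        y -= (int) Math.round(Math.cos(radians)) * steps; // north decreases y (screen coordinates)
        return true;
    }
}
```

A real subclass would replace the two protected methods with calls into native code, as LineRobot and CamRobot do.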

This year's project focuses on the use of a mounted camera to detect and avoid obstacles. To improve performance, the camera communicates directly with the computer, which sends signals to the Handy Board on the robot, telling it to check sensors, move forward, rotate, etc. The robot also has error checking capabilities associated with movement (encoders) and emergencies (bump sensors). We hope to demonstrate improved performance, as measured by the number of actions taken to reach the goal state in a maze, in a real-time robotic system.

Learning Algorithm

The only algorithms implemented so far are 4 and 8 neighbor q learning.
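
The action selection rule given in the QLearningExperiment entry of the class table, P(ai | s) = k^Q(s, ai) / (sum over j of k^Q(s, aj)), can be sketched as follows. This is an illustrative sketch rather than the project's code: the class and method names are hypothetical, and the q values are passed in as a plain array (four entries for 4 neighbor learning, eight for 8 neighbor).

```java
import java.util.Random;

/** Hypothetical sketch of probabilistic action selection with exploitation factor k. */
class ActionSelectionSketch {

    /** Returns P(a_i | s) = k^Q(s, a_i) / sum_j k^Q(s, a_j) for each action, with k > 0. */
    static double[] probabilities(double[] qValues, double k) {
        if (k <= 0 || qValues.length == 0) {
            throw new IllegalArgumentException("need k > 0 and at least one action");
        }
        double[] p = new double[qValues.length];
        double sum = 0.0;
        for (int i = 0; i < qValues.length; i++) {
            p[i] = Math.pow(k, qValues[i]);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Samples an action index at random according to the probabilities above. */
    static int select(double[] qValues, double k, Random rng) {
        double[] p = probabilities(qValues, k);
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < p.length; i++) {
            cumulative += p[i];
            if (u < cumulative) {
                return i;
            }
        }
        return p.length - 1; // guards against floating-point rounding in the running sum
    }
}
```

With k = 1, or with all q values equal to 0, every action gets the same probability; with k = 3 and q values {0, 1}, the second action is chosen three times as often as the first. Large q values combined with large k can overflow Math.pow, so a production version would likely rescale the exponents first.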

Input / Output Interface

Sample InterfaceFrame image

Program Structure / Architecture

The following is a list of classes for the project, along with specifications, their superclasses, and links to their present source code. (You can freely access the HTML source and compile the classes directly after removing the first and last lines of the file. Note that package statements have been omitted from the source to allow easy access.)

In addition to these classes, the project contains the interface InterfaceConstants, for the Reinforcement Learning Robotic Interface package. It contains all the programmer-specified constants for the package, including

  1. interface modes
  2. required bounds (in pixels) of frames and graphic components
  3. colors, borders and other GUI specifications
  4. default start-up values for robots and mazes
  5. strings for image files and hyperlinks

Every concrete class in the package implements InterfaceConstants. The three abstract classes (Robot, Experiment, and Maze) can be used in other projects, perhaps as extensions. They are the core classes to extend when expanding the capabilities of the Reinforcement Robotic Interface.

Note that some classes are implemented by native code in conjunction with the Java Native Interface.

All robot code is written in Interactive C, for the MIT Handy Board. HandyBoard.c is the implementation of receiver/transmitter communication from the Handy Board's side. Note that motor and sensor code has not yet been added. A possible way to control the stepper motor in software is presented in StepperMotor.c, which was found on Peter Harrison's web site. The computer/board communication has been tested; optimizations for speed, however, have not yet been made. We hope to increase the baud rate of the communication for CamRobot and decrease the delay time on the computer's side. A version of the code for the LineRobot is working. Specifically, we can command the old line robot directly and show its status and position in the Interface Frame.

Class Description Extends
AboutDialog A modal dialog that provides product information about the reinforcement learning robotic interface package. It also contains links to documentation on Java and the interface package on the web. javax.
AlgorithmOptionPane An opaque panel that obtains the name of the preferred algorithm from the user via a set of radio buttons. At present, only q learning is supported, but extensions for new algorithms have been made. javax.
CamRobot An implementation of Robot capable of moving an arbitrary number of steps and rotating an arbitrary number of degrees. It has a camera mounted on the robot and four proximity sensors that return an integer describing the distance of the nearest obstacle in four different directions. Implementation in native code is found in CamRobot.c and CamRobot.h. Robot
Camera An Object that can grab an image from the Imaging Modular Vision system. It can also describe obstacles detected by the camera. Implementation in native code is found in Camera.c and Camera.h. java.
CommandModeDialog A modal dialog containing setup preferences for a robot command mode session. It allows robot type selection and maze characterization (width, height, start and goal coordinates), but not obstacle initialization. TabbedDialog
DirectionVector An eight dimensional vector containing double values for the eight directions in the order (North, Northeast, East, Southeast, South, Southwest, West, Northwest). The structure can also give the maximum of the direction values without sorting by keeping track of a max value variable. java.
abstract Experiment An abstract class for running robotic experiments. It holds an instance of Robot and specifies methods that control the flow of the experiment by calling the robot (experimental initiation should take place in the subclass's constructor). java.
ExperimentFileChooser A modal file chooser for opening and saving robotic experiment files. At present the only active file filter implemented is for ".qre" files, which record experiment statistics for continuing q learning robotic experiments. javax.
ExplorerFrame A resizable and movable internal frame that captures the live view of the maze from the robot's perspective. For a CamRobot, this is implemented by the camera mounted directly on the robot. The frame also contains tool bars for commanding the robot during a command mode session. javax.
HistoryTableModel A data model for the status frame that displays robot status and experiment progress statistics. It can be modified to display different types of information for different interface modes. javax.
InterfaceFrame The main frame for the Reinforcement Learning Robotic Interface package. It has a menu bar for selecting new experiments, command mode sessions, or simulations, for saving experiments, and for displaying help information. It contains a background image and three internal frames: explorer frame, status frame, and maze frame. It also contains references to Robot, RectMaze, and Experiment objects, which are passed between the internal frames, the initialization dialogs, and the interface frame itself. javax.
LineRobot An implementation of Robot capable of moving in a grid by using line sensors to detect grid lines on a rectangular maze. It can be monitored via an overhead camera. Implementation in native code is found in LineRobot.c and LineRobot.h. Robot
abstract Maze An abstract class that encapsulates the notion of a 2-dimensional space with specific goal coordinates. It keeps track of the start and goal coordinates for a search through the abstract maze space and specifies methods for placing and displacing obstacles. java.
MazeFrame A resizable and movable internal frame that provides a real-time, live view of the maze from an overhead perspective. The robot, maze, start and goal coordinates, obstacles, and search path are illustrated as the experiment or command mode session takes place in real time. javax.
MazeOptionPane An opaque panel that obtains the maze size, start coordinates, and goal coordinates from the user via a set of text fields. It modifies the RectMaze passed in through the constructor directly and provides action-based error checking. javax.
MazePane A graphic component that illustrates the progress of the robot in real time. In a simulation session, the obstacles are pre-drawn on the maze; otherwise, obstacles are drawn as they are discovered. javax.
NewExperimentDialog A modal dialog containing setup preferences for a new robotic experiment. It allows robot type selection, maze characterization (width, height, start and goal coordinates), algorithm selection, and algorithm specification, but not obstacle initialization. TabbedDialog
ObstacleOptionPane An opaque panel that obtains the obstacle coordinates from the user during simulation session initialization. It contains a graphic panel which receives input from the user in the form of mouse clicks and displays the current selection in a label. javax.
ObstacleSelectionPane A graphic component that illustrates the placement and displacement of obstacles in a simulation session initialization. It draws obstacles as the user clicks on a grid, checking for errors dynamically as they occur. javax.
QLearningExperiment An implementation of Experiment for use in a 4 or 8 neighbor Q Learning reinforcement experiment or simulation. It controls the flow of the experiment by calling its Robot object to perform actions and updating the q values via its QLearningRectMaze object. Specifically, the action taken during each robot state is randomly selected with respect to the q values for each possible action:

P(ai | s) = (k ^ Q(s, ai)) / (sum over j (k ^ Q(s, aj)))

where the probability of taking the action ai at state s is calculated from the user-specified exploitation factor k greater than 0, and the q values for each action at state s. As k increases above 1, the equation favors actions with high q values more and more. Note that when k = 1 or when Q(s, ai) = 0 for all ai, the calculated probabilities for taking each available action ai are the same.
QLearningOptionPane An opaque panel that obtains data from the user specifying the q learning experiment parameters: number of neighbors, number of trials, goal reinforcement, discount factor, exploitation factor, etc. It performs action-based error checking on the user input. javax.
QLearningRectMaze A subclass of RectMaze that supports q learning experiments. It keeps track of q values for all possible state actions, previous headings and coordinates, and the number of times a state action has occurred. It can update its state action table and q value table. Q values are updated by treating the experiment as a nondeterministic Markov Decision Process, with the update rule:

Q_n(s, a) = (1 - X) * Q_{n-1}(s, a) + X * (r + d * max over a' (Q_{n-1}(s', a')))

where the Q value at state s while taking the action a is computed from X (the learning rate, which depends on the number of times the specific state action has occurred), r (the reinforcement value associated with the state), Q_{n-1} (the previous q value at s), d (discount factor between 0 and 1), and the maximum of the q values at the new state s'.
ReadmeDialog A modal dialog that brings the interface readme into the program by accessing its URL. It also allows the user to browse through the javadoc documentation. javax.
RectMaze An implementation of Maze that represents a rectangular maze with square cells that are either obstacles or free cells. It also keeps track of the current position of the target within the grid and performs error checking on attempts to modify the maze. Maze
abstract Robot An abstract class that encapsulates the basic functionalities of a movable machine that accepts commands from the user. It keeps track of the current heading of the robot as it traverses through a maze. java.
RobotOptionPane An opaque panel that obtains the type of robot to be used for the experiment from the user. At present only line robot and camera robot are supported, both of which are quite specific in their functionality, and both are implemented via a subclass of Robot. A better implementation would involve identifying the components of the robot and specifying their attributes during experiment initialization. javax.
StatusFrame A resizable and movable internal frame that keeps track of the status of the robot and maze characteristics in a history table. It displays different types of information for different interface modes (simulation, experiment, command). javax.
TabbedDialog A general purpose dialog window with a finite number of tabs and two buttons (ok and cancel). Its subclasses are used to obtain user input during experiment, simulation, and command mode initialization. javax.
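
The nondeterministic update rule given in the QLearningRectMaze entry can be sketched as below. The class and method names are hypothetical. The document says only that the learning rate X depends on how often the state action has occurred; the specific form X = 1 / (1 + visits) used here is an assumption, chosen as one common decaying rate of the kind discussed in Mitchell, Ch. 13.

```java
/** Hypothetical sketch of the nondeterministic Q value update. */
class QUpdateSketch {

    /** Assumed learning rate: X = 1 / (1 + visits), which shrinks as a state action recurs. */
    static double learningRate(int visits) {
        return 1.0 / (1.0 + visits);
    }

    /**
     * Computes Q_n(s, a) = (1 - X) * Q_{n-1}(s, a) + X * (r + d * max over a' of Q_{n-1}(s', a')).
     *
     * @param oldQ     previous q value Q_{n-1}(s, a)
     * @param visits   number of earlier occurrences of this state action (assumed meaning)
     * @param reward   reinforcement value r
     * @param discount discount factor d, between 0 and 1
     * @param nextQ    previous q values for every action at the new state s'
     */
    static double update(double oldQ, int visits, double reward, double discount, double[] nextQ) {
        double maxNext = 0.0; // assumed value when s' has no available actions
        if (nextQ.length > 0) {
            maxNext = nextQ[0];
            for (double q : nextQ) {
                maxNext = Math.max(maxNext, q);
            }
        }
        double x = learningRate(visits);
        return (1.0 - x) * oldQ + x * (reward + discount * maxNext);
    }
}
```

Under this assumed rate, the first update of a state action (visits = 0) has X = 1, so the old value is overwritten entirely; later updates average new evidence into the old value with decreasing weight, which damps the effect of noisy sensor readings and movement errors.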


  • Mitchell, Tom. Machine Learning. McGraw-Hill, 1997. Ch 13.
    Introduces Q learning theory and heuristics such as probability algorithms for selection of the next action and simple algorithms for balancing exploitation and exploration in a real experiment.
  • Jain, Ramesh, Rangachar Kasturi, and Brian G. Schunck. Machine Vision. McGraw-Hill, 1995. Ch 3.
    Overview of simple thresholding algorithms for detecting obstacles in different situations; for example, when the maze grid size is fixed.
  • Jones, Joseph L., Bruce A. Seiger, and Anita M. Flynn. Mobile Robots: Inspiration to Implementation. 2nd Ed. AK Peters, 1999. Robot Programming Chapter.
    Teaches the art of programming for a mobile robot using a real robot example. Uses and teaches the Interactive C paradigm.
  • Martin, Fred. The Handy Board Technical Reference. Available Online.
    Details on Handy Board library functions, usage of the Interactive C programming language, components of the Handy Board, etc.
  • Motorola. Motorola 68HC11 Programming Reference Guide. Available Online.
    Handy reference for the Motorola chip used in the Handy Board, including address locations of control bits and other assembly programming necessities.
  • Kernighan, Brian, and Dennis Ritchie. The C Programming Language. Prentice Hall, 1988. Ch 7, 8.
    Describes the Unix system interface and the use of C to read and write to the serial port on a Unix machine. Also describes files, file pointers, and related techniques for using data files to debug code.
  • Campione, Mary, and Kathy Walrath. The Java Tutorial. 2nd Ed. Addison-Wesley, 1998. GUI section.
    Gives the details of Swing/JFC GUI programming. Look specifically at the Java2D API for drawing the robot image, JNI for native interface implementations, and Essential Java Classes for discussion of file streams and strings.
  • Austin, Calvin, and Monica Pawlan. Advanced Programming for the Java 2 Platform. Available Online. Ch 5 - 6.
    An overview of various Java essentials for advanced GUI building, Java Native Interface, and performance issues.


The Java2 API is available here.
The documentation API for the Reinforcement Learning Robotic Interface package is also available.
An overview of the inheritance hierarchy for the project is here.

For more information, please contact:
Pat Leang, for questions regarding the transmitter / receiver, motor, sensor, encoder and Handy Board-related code
Ray Luo, for questions regarding the interface GUI, and communication between robot, computer, and camera
Thuan Dinh, for questions regarding the camera and the vision processing algorithm
Steve Wong, for questions regarding the testing code

Created by Ray Luo

