Nao Tutorial

This tutorial is intended to provide some familiarity with the basics of the Austin Villa codebase for the purpose of controlling robot behaviors, vision, and localization. Any feedback or requests for clarification may be sent to jmenashe AT cs dot utexas dot edu.
Please refer to the Nao intro to run the initial robot setup before proceeding with this tutorial.

Hello World

This section will walk you through creating a simple Hello World program on the Nao. This assumes you've completed the nao intro.
  1. Open five terminal windows - this isn't always necessary, but will help with debugging.
  2. SSH into the Nao with the first one and run nao stop. When the nao service has stopped, run naoqi.
  3. In the second and third terminal windows, SSH into the nao and run bin/motion and bin/vision, respectively.
  4. In the fourth window, run the tool: ~/nao/trunk/bin/tool.
  5. In the fifth window, navigate to ~/nao/trunk/core/python on your machine.
  6. Open the behavior script that defines the different states the robot can be in; these states are listed in the Files window in the tool. You are going to edit the Playing state.
  7. The Playing state currently inherits from MachineTask, which is a task that runs a state machine. Edit this class so that it inherits directly from Task.
  8. It won't be necessary to override the constructor function for our Hello World test, so remove the __init__ definition from the Playing class.
  9. Add a definition for a run method that makes two calls: core.speech.say('Hello, World!') and self.finish(). The first command will call the text-to-speech engine. The second will indicate that this task has completed.
  10. Upload your python scripts to the robot by clicking the "Send Py/Lua" button on the Files window in the tool. Alternatively, you can run the script ~/nao/trunk/build/copy_robot [Robot IP] python
  11. Restart python on the robot by clicking the "Restart Py/Lua" button in the Files window.
  12. Put the robot in "Playing" by clicking the "Playing" button in the Files window.
  13. Your robot should say "Hello, World!" If it doesn't, or if it runs into an error, check the terminal where you ran bin/vision for error output.
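Putting steps 7-9 together, the edited Playing class can be sketched as below. The Task base class and the core.speech binding are provided by the codebase on the robot; minimal stand-ins are included here (and are assumptions, not the real implementations) so the sketch is self-contained:

```python
# Stand-ins for the codebase's Task base class and core.speech binding,
# so this sketch runs outside the robot; on the Nao they are provided.
class Task(object):
    def __init__(self):
        self.finished = False

    def finish(self):
        self.finished = True   # mark this task as completed

    def processFrame(self):
        self.run()


class _Speech(object):
    def say(self, text):
        print(text)            # the real binding calls the speech engine


class _Core(object):
    speech = _Speech()

core = _Core()


# The edited Playing class from steps 7-9: inherit directly from Task,
# no __init__ override, and a run method that speaks and finishes.
class Playing(Task):
    def run(self):
        core.speech.say('Hello, World!')
        self.finish()
```

Each frame, the framework calls the task's processFrame method, which invokes run; calling self.finish() signals that no further frames are needed.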

Basic Tips

  1. All of your work for this class runs in the vision binary, so to read the output of cout/printf (C++) or print (Python) you will need to view the console output of bin/vision while SSH'd into the Nao.
  2. Joint and Sensor enums are defined in core/common/RobotInfo.h, and robot dimensions in core/common/RobotDimensions.h. Joint values can be accessed in python using the Joint enum, like so: percepts.joint_angles[core.HeadYaw]. Similarly for sensor values: percepts.sensors[core.centerButton].
  3. To do any body movements, you need to turn stiffness on first: commands.setStiffness()
  4. To walk, use commands.setWalkVelocity(x, y, t) where x is forward/backward [-1,1], y is left/right [-1,1], and t is turning angle [-pi,pi].
  5. To move the head left or right, use commands.setHeadPan(theta, time) where theta is the absolute angle [-pi/2,pi/2], time is the amount of time to take for the command to finish. Pass a third parameter True to use a relative head angle.
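The motion helpers from the tips above can be exercised end to end. The commands module is provided by the codebase on the robot; here a minimal recorder stands in for it (an assumption for illustration, not the real module) so the call sequence can be shown:

```python
# Minimal stand-in for the codebase's commands module: it records each
# call so the sequence can be inspected outside the robot.
class _Commands(object):
    def __init__(self):
        self.log = []

    def setStiffness(self):
        # stiffness must be enabled before any body movement
        self.log.append(('stiffness',))

    def setWalkVelocity(self, x, y, t):
        # x: forward/backward [-1,1], y: left/right [-1,1], t: turn [-pi,pi]
        self.log.append(('walk', x, y, t))

    def setHeadPan(self, theta, time, relative=False):
        # theta: head angle [-pi/2,pi/2]; relative=True uses a relative angle
        self.log.append(('pan', theta, time, relative))


commands = _Commands()
commands.setStiffness()                    # always first
commands.setWalkVelocity(0.5, 0.0, 0.0)    # walk forward at half speed
commands.setHeadPan(0.5, 1.0)              # pan head to 0.5 rad over 1 second
```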

Code Structure

The UT Austin Villa codebase is organized into a system of modules and memory blocks. Modules can request or create memory blocks which can then be shared with other modules. Memory blocks are used for passing information between modules and processes through shared memory, as well as for streaming from the robot to the tool and for logging. Data is organized into frames. The Motion process is capable of reading sensors at 100Hz, and thus processes frames at this rate as well. The Vision process can read camera images at 30Hz. The Vision process controls the larger part of the codebase, including behaviors, so most code runs at this rate. Each frame, the robot must read in images, process images, run behaviors that react to the processed images and sensors, and update localization. Throughout the codebase you will see code written with the frame-by-frame paradigm.


Behaviors organize the high level tasks that the robot will carry out. Behaviors rely on underlying vision- and motion-related modules for low level computations such as kinematics and object detection. Behaviors are written in Python because of its flexibility and ease of use. These can be found in $NAO_HOME/core/python.

All Python scripts originate in a single entry script. This script is responsible for requesting low-level computations from C++ modules, as well as for requesting processing from the Python behaviors. After performing initial computations, it calls the current behavior's processFrame method.

The default behavior used in the CS393R class simply checks the game state and calls the corresponding task for that state. Tasks for each game state can be defined alongside it.
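The game-state dispatch described above can be sketched as follows. The state names and task classes here are illustrative stand-ins, not the actual behavior file:

```python
# Illustrative tasks for two game states; the real tasks come from the
# behavior scripts in the codebase.
class Ready(object):
    def processFrame(self):
        return 'ready'


class Playing(object):
    def processFrame(self):
        return 'playing'


# Map each game state to the task that should run in that state.
STATE_TASKS = {'Ready': Ready(), 'Playing': Playing()}


def processFrame(game_state):
    # look up the task registered for the current game state and run it
    return STATE_TASKS[game_state].processFrame()
```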

State Machine

State machines are a basic construct provided by the codebase that you may find useful. They come in two flavors: SimpleStateMachine and StateMachine. The simple version takes a list of state names as its constructor argument and creates a structure with those state names as member variables. The main purpose of the SimpleStateMachine is to help track which state you're in and how long you've been in that state. When more advanced transition graphs are needed, the StateMachine class is appropriate.
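The idea behind SimpleStateMachine can be re-created in a few lines. This is a sketch of its purpose as described above, not the codebase's actual implementation:

```python
import time


# Minimal re-creation of the SimpleStateMachine idea: take state names,
# expose them as attributes, and track the current state and how long
# the machine has been in it.
class SimpleStateMachine(object):
    def __init__(self, *states):
        for s in states:
            setattr(self, s, s)      # each state name becomes a member
        self.state = states[0]       # start in the first state
        self._entered = time.time()

    def transition(self, state):
        self.state = state
        self._entered = time.time()

    def timeSinceTransition(self):
        return time.time() - self._entered


sm = SimpleStateMachine('start', 'walk', 'finish')
sm.transition(sm.walk)
```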

The state machine comes with a set of basic events and a Node definition. Transitions can be defined in a state machine's setup method using the self._adt method, where "adt" stands for "add transition".

The basic event types are as follows:

  1. Null Event - This event always fires.
  2. Success Event - This event fires when its node's success() method returns True.
  3. Failure Event - This event fires when its node's failure() method returns True.
  4. Completion Event - This event fires when its node's complete() method returns True.
  5. Iteration Event - This event fires after a set number of iterations.
  6. Time Event - This event fires after a specified number of seconds.

Each frame the state machine will check all events originating from the current node. When an event fires the state machine will transition to the event's successive node. Thus, a statement like self._adt(start, T(3.0), StandNode()) will tell the state machine to transition from the start node to a standing node after remaining in start for 3 seconds. Naming nodes is necessary when a particular node needs to be referenced multiple times. In some cases a node is referenced only once, so these can be chained together in the _adt method: self._adt(start, T(3.0), StandNode(), T(3.0), finish). The method will take any number of arguments, and expects them in the order "Node, Event, Node, Event, Node, ..." corresponding to their organization in the event chain.
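The chaining form of _adt can be sketched with stand-ins that only record the transition graph. The real StateMachine, Node, and event classes come from the codebase; these are assumptions for illustration:

```python
# Stand-in for the time event T(seconds) used in the examples above.
class T(object):
    def __init__(self, seconds):
        self.seconds = seconds


# Records the transition graph instead of executing it, to show the
# "Node, Event, Node, Event, Node, ..." argument convention of _adt.
class StateMachineSketch(object):
    def __init__(self):
        self.edges = []
        self.setup()

    def _adt(self, *args):
        # walk the chain and record each (source, event, target) edge
        for i in range(0, len(args) - 2, 2):
            source, event, target = args[i], args[i + 1], args[i + 2]
            self.edges.append((source, event, target))


class Example(StateMachineSketch):
    def setup(self):
        # start -> stand after 3 seconds, stand -> finish after 3 more
        self._adt('start', T(3.0), 'stand', T(3.0), 'finish')
```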

When the robot is set to Playing, a simple finite state machine will execute by default. This machine will randomly walk forward, turn left, or turn right until it has chosen 4 actions. At that point the robot will say "Completed" and sit down.


Tasks

Tasks are another basic construct provided by the codebase. Tasks are organized hierarchically, so a single task may consist of multiple subtasks. To set the behavior of a task, define its run method. To invoke the task, call its processFrame method.
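The run/processFrame relationship and the subtask hierarchy can be sketched like this, using a minimal stand-in for the codebase's Task base class:

```python
# Minimal stand-in for the codebase's Task base class.
class Task(object):
    def __init__(self):
        self.finished = False

    def finish(self):
        self.finished = True

    def processFrame(self):
        # callers invoke processFrame; subclasses define run
        if not self.finished:
            self.run()


class Subtask(Task):
    def run(self):
        self.finish()   # trivially completes in one frame


class Parent(Task):
    def __init__(self):
        super(Parent, self).__init__()
        self.subtasks = [Subtask(), Subtask()]

    def run(self):
        # drive each subtask; finish once all subtasks have finished
        for t in self.subtasks:
            t.processFrame()
        if all(t.finished for t in self.subtasks):
            self.finish()
```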

Reading Sensors

At some point it will be necessary to access sensor values in your behavior scripts. These can be accessed directly through the percepts module:

import core
import percepts
print "My head yaw value is %f!" % percepts.joint_angles[core.HeadYaw]

Additionally, high level commands such as walking and standing can be called from the behavior scripts. These are available in the commands module.


Vision

Unlike behaviors, vision processing is restricted to the C++ portion of the codebase. This is due to efficiency concerns; high-level languages like Python don't provide the low-level functionality available in C++ for manipulating memory, and these features are necessary for processing all of the pixels retrieved from the cameras each frame. By default your robots will run at 320x240 resolution for both the top and bottom camera.

The vision module processes both cameras in sequence, starting with the top. Objects that are found by the top camera may be overwritten by the bottom. When an object is successfully found, the vision module is responsible for filling out information in the appropriate WorldObject instance. For example, we might place the following in the core/vision/BallDetector.cpp file:

void BallDetector::detectBall() {
	int imageX, imageY;
	if(!findBall(imageX, imageY)) return; // function defined elsewhere that fills in imageX, imageY by reference
	WorldObject* ball = &vblocks_.world_object->objects_[WO_BALL];

	ball->imageCenterX = imageX;
	ball->imageCenterY = imageY;

	Position p = cmatrix_.getWorldPosition(imageX, imageY);
	ball->visionBearing = cmatrix_.bearing(p);
	ball->visionElevation = cmatrix_.elevation(p);
	ball->visionDistance = cmatrix_.groundDistance(p);

	ball->seen = true;
}
A behavior script may then check the seen variable for the ball object as follows:

import core
ball = core.world_objects.getObjPtr(core.WO_BALL)
if ball.seen:
    walkTowardBall()

To decide when an object has been seen, the vision module relies on a number of detectors. Each detector has access to the blob_detector_ and classifier_ objects. The classifier is responsible for converting the YUV image pixels into a small set of colors corresponding to different field objects and for organizing these into contiguous runs of color. These colors are defined in core/vision/enums/Colors.h. The blob detector is responsible for combining adjacent runs into blobs. Most of this code has been removed for the purposes of the class, but the classifier will still classify pixels according to our standard color set. To access these colors, use the getSegPixelValueAt(x,y) macro available to each detector in the vision module. A simple orange ball detector could then look like this:

bool BallDetector::findBall(int& imageX, int& imageY) {
	imageX = imageY = 0; int total = 0;
	for(int x = 0; x < iparams_.width; x++)
		for(int y = 0; y < iparams_.height; y++)
			if(getSegPixelValueAt(x,y) == c_ORANGE)
				imageX += x, imageY += y, total++;
	if(!total) return false;
	imageX /= total, imageY /= total;
	return true;
}
The detectors in the vision module are all handled by the ImageProcessor class. The main purpose of this class is to hold pointers to these objects, use them in the correct order, and perform some basic decisions about when (not) to use a detector. All of this happens in the ImageProcessor::processFrame method. So if we wanted to actually use the BallDetector class's detectBall method, we'd add it to the ImageProcessor as follows:

void ImageProcessor::processFrame() {
	classifier_->classifyImage(color_table_);
	ball_detector_->detectBall(); // detect the ball from the segmented image
}

Blob Detection

A more sophisticated approach to detecting objects in the vision frame is through blob detection. This is the process of identifying contiguous regions of color in the vision frame. Blob detection methods are available through the BlobDetector class. Here is a snippet that locates all of the orange blobs in the image and prints out various statistics:

/* Call these once at the start of each frame*/

/* ... */

BlobCollection blobs = blob_detector_->horizontalBlob[c_ORANGE];
if(blobs.size() > 0) {
  printf("found %i blobs\n", (int)blobs.size());
  for(int i = 0; i < (int)blobs.size(); i++) {
    Blob& b = blobs[i];
    printf("blob %i is centered at %i, %i, in bounding box (%i,%i) to (%i,%i)\n",
      i, b.avgX, b.avgY, b.xi, b.yi, b.xf, b.yf);
  }
}

While simple objects like the ball can be detected as single blobs, more complex objects like lines will need additional processing.

If you're viewing a log with "Run Core" enabled, you can see the detected orange blobs using the tool by looking at the middle right images of the Vision window. In the window below, the orange balls have been identified as blobs and thus have had their bounding boxes drawn around them. Other runs of contiguous color exist in the image, but these haven't been merged into blobs so they're represented as straight lines. Hovering the mouse over a blob gives more detailed information about it. To enable this feature for other colors, you'll need to edit the tools/UTNaoTool/VisionWindow_Draw.cpp file in the drawHorzLinePoints or drawVertLinePoints methods.

Kicking

Kicking can be accomplished by making calls to the KickRequestModule. An example of this is available in the latest version of the repository. The Kick task accepts a foot choice and a desired distance as constructor parameters. Distances haven't been tuned for each robot or for our particular lab, so they will not be accurate, but they serve as a rough guide for how far/hard the robot kicks the ball. The range is about 500 mm to 5000 mm.
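A call to the Kick task might look like the sketch below. The foot constants and the exact constructor signature are assumptions based on the description above (a foot choice plus a desired distance in millimeters), not the codebase's actual definitions:

```python
# Hypothetical foot constants; the real codebase defines its own.
LEFT, RIGHT = 0, 1


# Stand-in mirroring the described constructor: foot choice + distance.
class Kick(object):
    def __init__(self, foot, distance):
        # distance is a rough target in mm, in the range ~500 to ~5000
        assert 500 <= distance <= 5000
        self.foot = foot
        self.distance = distance


kick = Kick(LEFT, 1500)   # kick with the left foot, aiming for ~1.5 m
```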

Kicking makes the robots unstable, some more than others. Always stand behind your robot, ready to catch it, when it is kicking.


To be completed.