Experimental details

These experiments were conducted through a web interface accessed via Amazon.com's Mechanical Turk service. Each subject's task is a ``HIT'' (Human Intelligence Task) in Amazon's terminology. Subjects received the following initial instructions:
Kermitbot needs your help!

Kermit the Robot Frog wants to be more like a real frog, but he doesn't understand that frogs need water. On top of that, he's always getting lost in the corridors. Poor Robo-kermie. Your job is to teach him how to go to the local watering hole. The videos below give the main instructions, but first:
- The final result of your Robo-kermie pupil will be tested later for how quickly it can get to the water. The trainers of the best 50% of robots will be paid 25% more.
- We strongly suggest closing all other applications currently running on your computer. Otherwise the game might have problems that hurt your chance of being in the top half of trainers.
- Don't refresh your browser! If something is terribly wrong, describe it in detail at the end and you may receive credit.
- Reasons we would reject your HIT: (1) You either did not watch the videos fully or did not answer the questions. (2) The records indicate that you did not honestly try to train the robot. (We will not reject simply for poor robot performance, though. But we will be sad.)
- You can only do this HIT once. We will only pay each worker for one completion.

Start your task here. You'll go through 6 steps. When you are asked for your HIT ID, enter your unique user ID exactly (all the characters that are red above). Do not put in your worker ID!!! Once you finish, you'll be given a number. Then answer the questions below and enter the number at the bottom.
The word ``here'' contains a hyperlink that opens a page with an introductory video. The script for the video is below, with on-screen text in brackets.
[Teach a Kermitbot to find the water]
For this Turk task, you'll be training a simulated robot to play a game. Together you'll form a team: you (the trainer) and the robot (the student). Your robot's performance in the game will be judged against that of other Turkers.
Kermit the Robot Frog is very thirsty. He needs your help to find the water. So your goal is to teach the robot to find the water as fast as possible. [As fast as possible!]
[Play the game yourself] Before you teach the robot, we'll have you do the task yourself by controlling it. Click on the box below, and move Kermit to the water three times.
After watching the video, the subject controls the agent to reach the goal three times. The experiment does not progress until the subject completes this task. To the right of the applet containing the agent and its environment are the instructions, ``To play the game, move Kermitbot to water with the arrow keys.'' Once this stage is complete, the subject clicks a button to go to another page. Again, an instructional video is at the top; the script is below.
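The gating logic of this manual-control stage can be sketched as follows. This is a minimal Python sketch under assumed interfaces; the actual experiment ran as a browser applet, and the names below are hypothetical rather than the authors' code.

\begin{verbatim}
# Hypothetical sketch of the manual-control stage: the subject drives the
# agent with the arrow keys, and the page does not advance until the goal
# has been reached three times.
REQUIRED_GOAL_REACHES = 3   # taken from the instructions above

def manual_control_stage(env, wait_for_arrow_key):
    goal_reaches = 0
    state = env.reset()
    while goal_reaches < REQUIRED_GOAL_REACHES:
        action = wait_for_arrow_key()      # blocks until an arrow key is pressed
        state, reached_goal = env.step(action)
        if reached_goal:
            goal_reaches += 1
            state = env.reset()
    # only now is the button to continue to the next page enabled
\end{verbatim}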
Good job. Now I'll describe how you're going to train the agent.
[Training basics] Here's the challenge. You'll be training the robot through reward and punishment signals. The forward slash button---which is also the question mark button---gives reward. Every time you push it, it gives a little bit of reward to the robot. You'll also see the screen flash blue when you give reward. The 'z' button gives punishment and will make the screen flash red. You can think of this as similar to training a dog or another animal through reward and punishment, but it will be somewhat different.
[Pre-practice pointers] I'll give you a couple of pointers now before you practice.
Number one. You can reward or punish rapidly to send a stronger signal than if you just pushed it once. [1. Rapid presses = stronger feedback] So, if I saw something I really liked, I might push reward, reward, reward, reward, reward, reward (really fast, eventually inarticulate) ... [well, not quite that much] and that would be a lot stronger than if I just pressed reward.
Second pointer. The robot knows that people can only respond so quickly, so don't worry about your feedback being slightly delayed. [1. Rapid presses = stronger feedback, 2. Small delays in feedback are okay.]
When you're ready to start practicing, click on the box below, and follow the instructions beside it.
Try your hardest, but don't be hard on yourself. The first time is often rough, and this is just practice. Remember that, as the robot is learning from you, you are also learning, learning how to teach the robot.
As on the previous page, an applet is below the video. Through the applet, the subject practices training the agent until the agent reaches the goal twice or 200 time steps pass (at 800 milliseconds each). To the right of the applet are the following instructions (a sketch of the interaction loop they describe appears after the list):
To train the robot:
- '/' rewards recent behavior
- 'z' punishes recent behavior
- The arrow keys do nothing. Kermitbot is in control now.
To control the game environment:
- '0' pauses
- '2' unpauses
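To make this interaction concrete, the following minimal Python sketch outlines the practice-training loop described above. It assumes a per-step key poll and a simple count of reward and punishment presses; all names and interfaces are hypothetical, and the learning update itself is described in the paper rather than here.

\begin{verbatim}
# Hypothetical sketch of the practice-training loop, not the original applet code.
TIME_STEP_MS = 800   # step length stated above
MAX_STEPS    = 200   # practice ends after 200 steps ...
GOALS_TO_END = 2     # ... or once the agent reaches the goal twice

def practice_training(env, agent, poll_keys):
    goals, steps, paused = 0, 0, False
    state = env.reset()
    while goals < GOALS_TO_END and steps < MAX_STEPS:
        # Collect every key pressed during one 800 ms time step.  Repeated
        # presses of '/' or 'z' accumulate, so rapid pressing sends a
        # stronger signal, as the video instructions explain.
        feedback = 0
        for key in poll_keys(duration_ms=TIME_STEP_MS):
            if key == '/':
                feedback += 1        # reward; the screen flashes blue
            elif key == 'z':
                feedback -= 1        # punishment; the screen flashes red
            elif key == '0':
                paused = True
            elif key == '2':
                paused = False
            # arrow keys are ignored: the agent chooses its own actions now
        if paused:
            continue                 # the environment does not advance while paused
        agent.receive_feedback(feedback)       # learning update; see the paper
        action = agent.choose_action(state)
        state, reached_goal = env.step(action)
        steps += 1
        if reached_goal:
            goals += 1
            state = env.reset()
\end{verbatim}

The session used as experimental data (described below) uses the same key bindings; only the termination condition changes, to the fixed durations reported in the paper.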
Once practice is complete, the subject clicks a button to move to another page. At the top is another video with the following script:
[The real test] The mandatory practice is over, and the real test begins soon. If your practice training didn't go well, don't worry; it's just practice.
[Closing instructions] For the real test, you'll train a little bit longer. The approximate amount of time is at the bottom of the screen. [3 minutes]
Good luck, and thank you.
Through the applet below the video, the subject trains the agent in the session that is actually used as experimental data. To the right of the applet is the same set of instructions as on the previous page (indicating which keys to press during training). The training session lasts for the duration reported in the paper, which differs between the two experiments. At the end of the training session, the trainer is given a ``finish code'' and is told to return to the original page. On this page is a questionnaire, which we use to screen for non-compliant subjects but have not analyzed further. At the end of the questionnaire, the trainer inputs the finish code received after training, completing their portion of the experiment.