CS3934: Reinforcement Learning -- Final Project

Final Project


 
Your final programming project can take one of two forms.
  • Practice (preferred): An implementation of RL in some domain of your choice - ideally one that you are using for research or in some other class. In this case, please describe the domain and your initial plans on how you intend to implement learning. What will the states and actions be? What algorithm(s) do you expect will be most effective?
  • Theory: A proposal, implementation and testing of an algorithmic modification to an RL algorithm presented in the book. In this case, please describe the modification you propose to investigate and on what type of domain (possibly a toy domain) it is likely to show an improvement over things considered in the book.
  • You may try to build on some of the chapters or research papers you have read (or will read) in the class; you may try to reimplement something you've found interesting that others have done; you can try to do something that has never been done before; you may write new code from scratch; you may modify existing code. It's up to you!

    You are required to work in teams of 2 and are strongly encouraged to work on all aspects together (i.e. pair programming rather than divide and conquer). Teams should only turn in one submission. However, each person must turn in an independently-written summary of each person's contribution to the final product.

    You may build on existing work for this project and utilize existing code (your own or code found on the web), but you must give proper attribution to all existing work that you build on and make clear what your new contribution is. Any unattributed or uncited work that you use will be considered a breach of academic honesty and dealt with according to the course policy in the syllabus. Furthermore, you may not claim your own existing work as a new contribution. You may build on your own work, but it must be clearly cited as existing work and you must do new work for the class project.

    The schedule is as follows.


  • Project Proposal due on Thursday, October 24th at 11:59pm.
  •     Submit a proposal via gradescope including:  
       
    The proposal should be written with the goal of convincing us that what you are proposing to do is interesting and non-trivial (though not necessarily completely original - see below).

    It is completely legitimate to propose to do something based on something you read about provided that you are going to do the coding yourself. Just make sure to acknowledge any ideas (and code) that you borrow and be sure to clearly identify what you are going to do.

    We encourage you to look ahead to topics that will be covered later in the course that may interest you, or to focus on a topic of interest that will not be covered in this course.

    Be as specific as you can at this point. The more specific you are, the more detailed feedback you will get. For example, if you are doing an "applications" project:
  • In what sense is your problem sequential?
  • What is your problem's state space?
  • What is your problem's action space?
  • What reward function will you use?
  • What is the simplest possible first result that you will try to get? What RL algorithm will you use? What will be the baseline you compare against?
  • What will be the stretch goal for your project?
  • Even if you are doing more of an algorithm-based or theory-based project, try to be similarly specific about what you intend to study. What is your main question?
    And of course if you are reimplementing an existing technique or replicating a prior experiment, say exactly which.
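
    As an illustration of the level of specificity asked for above, here is a minimal sketch that writes down a state space, an action space, a reward function, a simple first algorithm, and a baseline. Everything in it is an assumption made for the example: the 5-cell corridor domain, the names N_STATES, ACTIONS, step, run_episode and q_learning, and the hyperparameter values are hypothetical and are not part of the course materials or a required template.

```python
import random

# Hypothetical toy domain (an assumption for illustration): a 5-cell corridor.
N_STATES = 5                  # state space: cell index 0..4
ACTIONS = (+1, -1)            # action space: step right or step left
GOAL = N_STATES - 1           # the episode ends when the agent reaches cell 4


def step(state, action):
    """Reward function: -1 per step, 0 on the step that reaches the goal."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (0.0 if done else -1.0), done


def run_episode(policy, max_steps=100):
    """Roll out one episode from cell 0 and return the undiscounted return."""
    state, total = 0, 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            break
    return total


def q_learning(episodes=200, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (values are arbitrary picks)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


# Simplest first result: return of the greedy policy after learning.
q = q_learning()
learned_return = run_episode(lambda s: max(ACTIONS, key=lambda a: q[(s, a)]))

# Baseline: a uniformly random policy, averaged over 50 episodes.
baseline_rng = random.Random(1)
baseline_returns = [run_episode(lambda s: baseline_rng.choice(ACTIONS)) for _ in range(50)]
baseline_mean = sum(baseline_returns) / len(baseline_returns)

print("learned:", learned_return, "random baseline mean:", baseline_mean)
```

    In this toy corridor the problem is sequential because each action determines the cell from which the next action is taken. The shortest path from cell 0 takes three -1 steps plus a final 0-reward step, so no policy can achieve a return above -3. A real proposal would replace the corridor with your own domain and state the same pieces in prose.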

  • Note that we will not be able to provide feedback on all proposals. This will be treated as a completion grade. However, we will read all proposals and will give feedback on those that need refinement.

  • Literature Survey due on Thursday, November 14th at 11:59pm.

  • Your final project is due on Monday, December 9th at 11:59pm, with an extension to Sunday, December 15th at 11:50pm. Note that each extra day takes 1 point off (~5% of the final project grade).
       Submit your final project including:

  • Source code, executable, and README. We recommend using a GitHub repository to hold the source code, executable, and a README file that provides a brief guide to running your code. In that case, you only need to provide the GitHub link in your final report. If you want to keep your repository private for any reason, please zip your project folder, including the source code, executable, and README file, and upload the zip file to GradeScope under Final Project Code.
  • A 5-minute mp4 YouTube video summarizing your project and the main results (make sure to choose the Unlisted option so that only people with the URL can access your video). The URL should be included at the beginning of your final report PDF, along with your GitHub link. Detailed instructions for uploading the video are on Piazza.
  • A detailed written report describing your project, including its merits and its deficiencies. As much as possible, you should relate your approach to the readings from throughout the course. View this report as a term paper. It is in place of a final exam and will be a large factor in your final grade for the project and for the course. The report should be roughly in the style of a conference paper, including introduction, motivation, related work, etc. All writing should be your own -- all quotes must be clearly attributed.
  • Recall the points from the proposal and literature survey above. In particular, for applications projects be very clear about how you model your problem, and in what sense it's sequential.
  • Include at least 10 citations with full bibliographic references to acknowledge where your ideas came from.
  • Be very clear about what code you've used from other sources, if any. Clear citations are essential. Failure to credit ideas and code from external sources is cheating.
  • Make sure you evaluate both the good and bad points of your approach.
  • Show results of at least one experiment evaluating either some aspect of your approach or the approach as a whole, preferably with error bars or some other statistical measure of significance. Even if you didn't accomplish your goal, evaluate what you did do.
  • A single well-analyzed experiment in a simple domain that compares clearly against a baseline is preferable to a shallow set of experiments across many domains.
  • If any parameters are mentioned in the report, be sure to mention how you arrived at their values. Was it the first thing you tried? Trial and error? Roughly how many trials? etc.
  • Remember to proofread and spell-check!
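
    One common way to obtain the error bars mentioned above is to repeat each experiment with several independent random seeds and report the mean with its standard error. The following is a minimal sketch under that assumption. The function name summarize and the input numbers are hypothetical placeholders, not real results, and the 1.96 multiplier is a normal approximation that is optimistic when only a handful of seeds are available.

```python
import math
import statistics


def summarize(returns_per_seed):
    """Return (mean, standard error, approximate 95% half-width) over independent runs."""
    n = len(returns_per_seed)
    mean = statistics.mean(returns_per_seed)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(returns_per_seed) / math.sqrt(n)
    return mean, sem, 1.96 * sem


# Placeholder numbers for illustration only: one final return per seed.
method_runs = [1.0, 2.0, 3.0]
mean, sem, half_width = summarize(method_runs)
print(f"mean = {mean:.2f} +/- {half_width:.2f} (n = {len(method_runs)} seeds)")
```

    Applying the same summary to the baseline's runs and plotting both means with their intervals gives a rough visual comparison. A formal significance test would be a reasonable alternative.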

  • Each team member should individually (and privately) describe their own role in the overall project and their partner's role. If everything was done together, a short statement to that effect is sufficient. If you feel that your partner has not contributed adequately, this is the opportunity to let us know. Please submit a single PDF containing this statement to GradeScope under Final Project Contribution. Each team member should upload this file individually. Do not forget to include the project title, your name, and your EID in this file.


    Page maintained by Peter Stone
    Questions? Send me mail