Due: Thu, 26 Jun 2014, 10pm
70 pts, 7% of total grade.
Write a Python program to win the Netflix Prize.
Ignore the qualifying data. It's just there for explanation.
Use only the probe data. Your predictions must achieve an RMSE of less than 1.0, and the program must run in less than 1 min.
Note: Because the Netflix files are very large, it's impractical to copy them to your local machine. Your best bet is to develop the solution on the CS machines.
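The RMSE (root mean squared error) target above can be checked with a few lines of Python. This is a minimal sketch, not the grader's implementation; the function name rmse and the sample ratings are made up for illustration.

```python
from math import sqrt


def rmse(predictions, actuals):
    """Root mean squared error between two equal-length sequences."""
    # Pre-condition: both sequences are non-empty and the same length.
    assert len(predictions) == len(actuals) and len(predictions) > 0
    total = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    return sqrt(total / len(predictions))


# Made-up sample: three predictions against three actual ratings.
sample_error = rmse([3.0, 4.0, 5.0], [3, 5, 3])
print(sample_error)
```

A perfect set of predictions gives an RMSE of 0.0; larger errors are penalized quadratically before the square root is taken.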
- 17,770 movies
- 480,189 customers
- about 100,000,000 ratings
- about 5,600 ratings per movie
- about 200 ratings per customer
- qualifying data: 2,836,401 ratings, no customers from training data
- probe data: 1,425,333 ratings, all customers from training data
|/u/downing/cs/netflix/training_set/* (1.4 GB)|
|/u/downing/cs/netflix/training_set/mv_0002043.txt (162 KB)|
|17,770 files total (one per movie)|
|customer ID||rating||date|
|1 - 2,649,429||1 - 5||Oct 1998 - Dec 2005|
|7,872 records total|
|/u/downing/cs/netflix/movie_titles.txt (578 KB)|
|movie ID||year|
|1 - 17,770||1890 - 2005|
|17,770 records total|
|/u/downing/cs/netflix/qualifying.txt (52.5 MB)|
|customer ID||date|
|1 - 2,649,429||Oct 1998 - Dec 2005|
|2,836,401 records total|
|/u/downing/cs/netflix/probe.txt (10.8 MB)|
|RMSE = 0.9474|
|customer ID|
|1 - 2,649,429|
|1,425,333 records total|
standard in (a single test case) and cache files
Mukund will have a clone of the public test repo with all of the cache files here: /u/mukund/netflix-tests, so that you can hardcode the pathnames of the cache files you choose to use.
Mukund will keep the clone up-to-date regularly.
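Loading a cache from a hardcoded pathname might look like the sketch below. The directory comes from the description above, but the file name, the JSON format, and the helper load_cache are all assumptions made for illustration; the actual cache files in the repo may be named and formatted differently.

```python
import json
import os
import tempfile

# Directory from the project description; the file name is hypothetical.
CACHE_DIR = "/u/mukund/netflix-tests"
CUSTOMER_AVG_CACHE = os.path.join(CACHE_DIR, "example-customer-averages.json")


def load_cache(path):
    """Load a JSON object of {id: average rating}, converting keys to int."""
    with open(path) as f:
        return {int(key): float(value) for key, value in json.load(f).items()}


# Demonstration with a throwaway file, so the sketch runs on any machine.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.json")
    with open(demo_path, "w") as f:
        json.dump({"30878": 3.6, "2647871": 3.2}, f)
    demo_cache = load_cache(demo_path)

print(demo_cache[30878])
```

On the CS machines, the same helper would be called once at startup with a hardcoded path such as CUSTOMER_AVG_CACHE, so that each prediction becomes a dictionary lookup instead of a scan of the training set.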
standard out (a single test case)
|1.0 - 5.0|
|1,425,333 records total|
Your program only processes a single test case via standard in and standard out.
Given probe.txt on standard in, your program writes the predicted ratings to standard out.
The output is large (10 MB), so instead of turning it in, you let the grader run your program to produce it.
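The standard in / standard out loop described above could be sketched as follows. It assumes that the input lists a movie ID followed by a colon on its own line, then one customer ID per line, and that the output echoes each movie line followed by one prediction per customer. Both formats are assumptions here, as are the names netflix_solve and predict and the constant 3.7 used as a placeholder rating.

```python
import io


def predict(movie_id, customer_id):
    """Placeholder predictor: returns an arbitrary constant rating."""
    return 3.7


def netflix_solve(reader, writer):
    """Read probe-style input and write one prediction per customer line."""
    movie_id = None
    for line in reader:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A movie line: remember the ID and echo the line.
            movie_id = int(line[:-1])
            writer.write(line + "\n")
        else:
            assert movie_id is not None, "customer line before any movie line"
            rating = predict(movie_id, int(line))
            # Clamp the prediction to the 1.0 - 5.0 output range.
            rating = min(5.0, max(1.0, rating))
            writer.write("{:.1f}\n".format(rating))


# The real program would call netflix_solve(sys.stdin, sys.stdout).
demo_out = io.StringIO()
netflix_solve(io.StringIO("1:\n30878\n2647871\n10:\n1952305\n"), demo_out)
print(demo_out.getvalue(), end="")
```

Passing the reader and writer as arguments keeps the function testable with in-memory streams, which is convenient for both unit tests and acceptance tests.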
These are additional descriptions of the underlying math:
- Estimate time to completion.
- Create a private Git repository at GitHub, named cs373-netflix.
- Add these requirements to the GitHub issue tracker as at least 10 issues.
Add at least 10 more issues, one for each bug or feature, both open and closed, each with a good description and a label.
- Invite the grader to your private code repo.
- Clone your private code repo onto your local directory.
- Make at least 5 commits, one for each bug or feature.
If you cannot describe your changes in a sentence, you are not committing often enough.
Make meaningful commit messages identifying the corresponding issue in the issue tracker.
- Clone the public class repo onto your local directory.
It is critical that you clone the public class repo into a different directory than the one you're using for your private code repo.
- Copy the code files from the clone of the public class repo to the clone of the private code repo.
- Write unit tests in TestNetflix.py that test corner cases and failure cases until you have an average of 3 tests for each function, confirm the expected failures, and add, commit, and push to the private code repo.
- Implement and debug the simplest possible solution in Netflix.py with assertions that check pre-conditions, post-conditions, argument validity, and return-value validity, until all tests pass, and add, commit, and push to the private code repo.
- Create 1000 lines of acceptance tests in RunNetflix.in and RunNetflix.out that test corner cases and failure cases, and add, commit, and push to the private code repo.
- Pass five other students' acceptance tests.
- Clone the public test repo onto your local directory.
It is critical that you clone the public test repo into a different directory than the one you're using for your private code repo.
- Copy your unit tests and your acceptance tests to your clone of the public test repo, rename the files, do a git pull to synchronize your clone, and then add, commit and push to the public test repo.
The files MUST be named <cs-username>-RunNetflix.in, <cs-username>-RunNetflix.out, <cs-username>-TestNetflix.py, and <cs-username>-TestNetflix.out in the public test repo.
- Implement (or reuse) and debug the simplest possible set of caches until all tests pass, and add, commit, and push to the private code repo.
- Run pydoc on Netflix.py to create Netflix.html, which documents the interfaces to your functions.
Create inline comments if you need to explain the why of a particular implementation.
Use a consistent coding convention with good variable names, good indentation, blank lines, and blank spaces.
- Create a log of your commits in Netflix.log.
- Obtain the git SHA with
git rev-parse HEAD
- Fill in the Google Form.
- It is your responsibility to protect your code from the rest of the students in the class. If your code gets out, you are as guilty of academic dishonesty as the recipient.
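The steps on unit tests and assertions can be illustrated with a small sketch. The function netflix_predict and its formula (the movie average shifted by how far the customer's average sits from the overall average) are hypothetical, chosen only to have something to test; they are not the required solution.

```python
import unittest


def netflix_predict(overall_avg, movie_avg, customer_avg):
    """Hypothetical baseline: shift the movie average by the customer's bias."""
    # Pre-conditions: every average must be a valid rating.
    for value in (overall_avg, movie_avg, customer_avg):
        assert 1.0 <= value <= 5.0
    rating = movie_avg + (customer_avg - overall_avg)
    rating = min(5.0, max(1.0, rating))
    # Post-condition: the prediction is a valid rating.
    assert 1.0 <= rating <= 5.0
    return rating


class TestNetflixPredict(unittest.TestCase):
    def test_typical(self):
        self.assertAlmostEqual(netflix_predict(3.5, 4.0, 3.0), 3.5)

    def test_clamped_high(self):
        # Corner case: the raw value 7.0 is clamped to 5.0.
        self.assertEqual(netflix_predict(3.0, 5.0, 5.0), 5.0)

    def test_invalid_average(self):
        # Failure case: an out-of-range average violates the pre-condition.
        self.assertRaises(AssertionError, netflix_predict, 3.5, 0.0, 3.0)


# Run the suite programmatically; a test file would typically end with
# unittest.main() instead.
suite = unittest.TestLoader().loadTestsFromTestCase(TestNetflixPredict)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The three tests cover a typical case, a corner case, and a failure case for one function, which matches the average of 3 tests per function asked for above.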
Requirements for getting a non-zero grade.
- [ 5 pts] GitHub private repo with grader invited as collaborator and a log of the commits.
- [ 5 pts] GitHub issue tracker with issues from requirements and more.
- [15 pts] Standard-compliant Python 3.2.3 with an RMSE of less than 1.0 and a runtime of less than 1 min on probe.txt.
- [15 pts] Average of 3 unit tests per function with good coverage in the public test repo with the precise naming of the files.
- [15 pts] 1000 lines of acceptance tests in the public test repo with the precise naming of the files. Your code must successfully pass five other students' acceptance tests with an RMSE of less than 1.0.
- [10 pts] Pydoc documentation.
- [ 5 pts] Google Form with time estimate.
- You can earn 5 bonus pts if you produce an RMSE of less than 0.9474.
- You can earn another 5 bonus pts if you work with a partner using pair programming and vouch that you worked on the project together for more than 75% of the time.
A pair must turn in only one solution. If two solutions are turned in, there will be a 10% penalty, and the later one will be graded.
- Bonus pts will not increase the total score beyond the max score.
- Git Cheat Sheet
- Git Generating SSH Keys
- Git Guide
- Git Guides
- Git Immersion
- Git Reference
- Google Python Style Guide
- Try GitHub
|Name||GitHub ID||GitHub Test Repository||Google Form|
|Mukund Rathi||004rathim||netflix-tests||Google Form|