Description of Data
The training set (available at http://www.davidandre.com/TrainingSet.zip and http://www.cs.utexas.edu/~pstone/Workshops/2004icml/TrainingSet.zip) and test set will both be .csv files (comma separated values). The first row will have the column labels for the data. Each subsequent row corresponds to a moment in time for which data was collected.
What follows is a brief description of the column labels. Any requests for clarification can be directed to the pdmc_ml mailing list. Note that some of the semantics are intentionally left vague. For example, we're not revealing which gender class corresponds to which gender, nor the nature of the activities for each annotation value. The reason for this is to keep approaches purely data-driven. Further details will be revealed at the workshop.
| userID | Each subject has a unique userID. All rows with the same userID belong to the same subject. This variable should not be used in your models. |
| sessionID | Each time a subject wears their SenseWear Pro armband, the data for that period of time is labeled as a session and is assigned a unique sessionID. This variable should not be used in your models. |
| sessionTime | For each session, time is recorded at 1 minute intervals. The data in the sessionTime column represents the number of milliseconds since the beginning of the current session. |
| characteristic[1..2] | The two characteristic columns represent two different characteristics of the subject. |
| annotation | Subjects wear the armband as they go about their daily lives and they timestamp when they start and stop any of a variety of activities. For example, when they lie down to sleep, they press the timestamp button, and when they get up, they press the timestamp button. They are instructed to annotate their data for each activity. The annotation column gives you the annotation for the current activity. Please note that they are instructed to annotate only when they are 100% sure that they are performing the activity, so the subject might be performing the activity even though it is not annotated. Also note that annotation value 0 is for unknown activities. |
| gender | The gender column has either a 1 or a 0 depending on the gender of the subject. |
| sensor[1..9] | The BodyMedia armbands collect data from the sensors and store it as channels once a minute as described on the website. These channels are shown in the sensor columns. |
Please note: The test set will not contain the following columns: userID, gender, and annotation.
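The column rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact header spellings (for example characteristic1 and sensor1), the sample values, and the helper names feature_columns and read_rows are hypothetical and not taken from the actual files.

```python
import csv
import io

# Columns the description says should not be used in models.
EXCLUDED_FROM_MODELS = {"userID", "sessionID"}
# Columns the description says are absent from the test set.
ABSENT_FROM_TEST = {"userID", "gender", "annotation"}


def feature_columns(header):
    """Return the columns usable as model inputs on both training and test data."""
    blocked = EXCLUDED_FROM_MODELS | ABSENT_FROM_TEST
    return [name for name in header if name not in blocked]


def read_rows(csv_file):
    """Read a CSV whose first row holds the column labels."""
    reader = csv.DictReader(csv_file)
    return reader.fieldnames, list(reader)


# Tiny made-up sample. The header spellings are assumptions based on the
# column descriptions, and only two of the nine sensor columns are shown.
SAMPLE = io.StringIO(
    "userID,sessionID,sessionTime,characteristic1,characteristic2,"
    "annotation,gender,sensor1,sensor2\n"
    "7,101,0,35,1,0,1,0.12,31.5\n"
    "7,101,60000,35,1,3004,1,0.10,31.7\n"
)

header, rows = read_rows(SAMPLE)
features = feature_columns(header)

# sessionTime is in milliseconds, so one-minute sampling gives steps of 60000.
minutes = [int(row["sessionTime"]) // 60000 for row in rows]
```

Note that sessionID is still present in the test set and is needed to group rows by session, even though it is kept out of the model inputs here.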
Tasks
The competition consists of three tasks. Each entrant will be graded separately but equally on each task. The team with the highest combined score will be declared the winner. A small prize will be awarded to the winning competitors, and honorable mentions given to the best competitor on each sub-part of the competition. Each entry will consist of a column of predictions for the test set for each of the parts (gender, context 1, and context 2). The score for each component will just be the accuracy of the predictions (minute by minute for the contexts, session by session for the gender predictions).
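As a rough sketch of the scoring described above, accuracy is the fraction of predictions that match the true labels. The combined_score function below assumes that "graded equally" means an unweighted mean of the three accuracies; the text does not spell out the exact combination rule, so treat that part as a guess.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the prediction equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty set of predictions")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def combined_score(gender_acc, context1_acc, context2_acc):
    """Assumption: equal grading is read as the plain mean of the three scores."""
    return (gender_acc + context1_acc + context2_acc) / 3
```

For the gender task the inputs would be one label per session, and for the two context tasks one label per minute.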
The first task is to return a column labeled predictedGender containing the predicted gender for every sessionID in the test set. Each value should be a 1 or a 0, the same notation as used in the gender column in the training set.
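Because gender is scored session by session, one possible approach (not one prescribed by the organizers) is to predict a gender for each row and then take a majority vote within each session. The sketch below assumes per-row predictions already exist; the function name session_majority, the tie-breaking rule, and writing the result back to every row are all illustrative choices.

```python
from collections import Counter, defaultdict


def session_majority(session_ids, row_predictions):
    """Replace each row's 0/1 prediction with the majority vote of its session."""
    votes = defaultdict(Counter)
    for session_id, prediction in zip(session_ids, row_predictions):
        votes[session_id][prediction] += 1

    # Ties go to class 1 here; this is an arbitrary choice for the sketch.
    session_label = {
        session_id: 1 if counts[1] >= counts[0] else 0
        for session_id, counts in votes.items()
    }
    return [session_label[session_id] for session_id in session_ids]


# Made-up example: session 101 votes 2 to 1 for class 1, session 202 is all 0.
predicted_gender = session_majority([101, 101, 101, 202, 202], [1, 0, 1, 0, 0])
```

Every row of a session ends up with the same value, so the column has the same length as the test set and stays consistent within each sessionID.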
The second task is to correctly identify when a person is participating in context 1, signified by annotations with the value 3004. You should return a column labeled context1 with a 1 if the person is participating in the activity and a 0 if the person is not.
Positive examples of context 1: 3004
Could be in either class of context 1: 0, 3003, 5199, 5101
Negative examples of context 1: All other annotations.
The third and final task is to correctly identify when a person is participating in context 2, signified by annotations with the value 5102. You should return a column labeled context2 with a 1 if the person is participating in the activity and a 0 if the person is not.
Positive examples of context 2: 5102
Could be in either class of context 2: 0, 5103, 2901, 2902
Negative examples of context 2: All other annotations.
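The annotation lists for the two context tasks can be turned into training labels along the following lines. This is a minimal sketch: the annotation values come from the lists above, while the CONTEXTS table, the function name context_label, and the use of None for the "could be in either class" annotations are assumptions made for illustration.

```python
# Annotation values taken from the task descriptions above.
CONTEXTS = {
    "context1": {"positive": {3004}, "ambiguous": {0, 3003, 5199, 5101}},
    "context2": {"positive": {5102}, "ambiguous": {0, 5103, 2901, 2902}},
}


def context_label(annotation, context):
    """Map an annotation value to 1, 0, or None for the given context.

    None marks annotations that could belong to either class, so the caller
    can decide whether to drop those rows or handle them some other way.
    """
    spec = CONTEXTS[context]
    if annotation in spec["positive"]:
        return 1
    if annotation in spec["ambiguous"]:
        return None
    return 0  # all other annotations are negative examples


# Made-up annotation sequence, labeled for context 1.
labels = [context_label(a, "context1") for a in [3004, 0, 5102, 3003]]
```

How the ambiguous rows are treated during training is left open here; dropping them is one option, but the description does not say how they are handled in scoring.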