Details

All of the data to be used in the competition was collected using the BodyMedia SenseWear Pro Armband. The SenseWear armband, shown in the figure below, is a sleek, wireless and accurate wearable body monitor that enables continuous physiological monitoring outside the laboratory.

Figure: SenseWear Pro armband

The armband is worn on the back of the upper right arm and uses a unique combination of sensors. The sensors incorporated into the armband are as follows:

  1. Accelerometer: The accelerometer is a 2-axis MEMS device that measures motion. This motion can be mapped to forces exerted on the body, and the gravity information from the second axis provides valuable context for predictive algorithms. The sensor yields 5 channels: an average channel and an approximate variance channel for each axis, plus a channel reporting the number of steps taken by the user.

  2. Heat Flux: The proprietary heat flux sensor is a robust and reliable device that measures the amount of heat being dissipated by the body. The sensor uses materials with very low thermal resistance and extremely sensitive thermocouple arrays. It is placed in a thermally conductive path between the skin and the side of the armband exposed to the environment. A high-gain internal amplifier brings the signal to a level that can be sampled by the microprocessor located in the armband.

  3. Galvanic Skin Response: Galvanic Skin Response (GSR) represents electrical conductivity between two points on the wearer’s arm. The GSR sensor in the armband includes two hypoallergenic stainless steel electrodes integrated into the underside of the armband connected to a circuit that measures the skin’s conductivity between these two electrodes. Skin conductivity is affected by the sweat from physical activity and by emotional stimuli.

  4. Skin Temperature: Skin temperature is measured using a highly accurate thermistor-based sensor located on the backside of the armband near its edges and in contact with the skin. Continuously measured skin temperature is linearly reflective of the body’s core temperature.

  5. Near-Body Temperature: The near-body temperature sensor measures the air temperature immediately around the wearer’s armband. It uses a highly accurate thermistor and directly reflects changes in environmental conditions around the armband.
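
As a rough illustration of how the five accelerometer channels described in item 1 might be produced for one minute of data, here is a minimal sketch. The source does not say how the "approximate variance" is computed, so this sketch substitutes the mean absolute deviation purely as an assumption; the function names `summarize_axis` and `accelerometer_channels` and the raw-sample inputs are hypothetical and are not taken from BodyMedia's firmware.

```python
from statistics import fmean


def summarize_axis(samples):
    """Return (average, approximate variance) for one axis over one minute.

    ASSUMPTION: the approximate variance is taken to be the mean absolute
    deviation from the average. The actual armband computation is not
    documented in the text.
    """
    avg = fmean(samples)
    approx_var = fmean([abs(s - avg) for s in samples])
    return avg, approx_var


def accelerometer_channels(axis_a, axis_b, step_count):
    """Build the five per-minute accelerometer channels.

    Order (hypothetical): average and approximate variance for the first
    axis, the same pair for the second axis, then the step count.
    """
    avg_a, var_a = summarize_axis(axis_a)
    avg_b, var_b = summarize_axis(axis_b)
    return [avg_a, var_a, avg_b, var_b, step_count]
```

Together with one channel each for heat flux, galvanic skin response, skin temperature, and near-body temperature, this gives nine sensor-derived channels per minute.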


The SenseWear Pro armband is comfortable to wear, can be worn continuously, and stores up to 5 days of physiological data before the data needs to be retrieved. The armband is also equipped with wireless capability, both for retrieving the stored data and for providing real-time feedback. Note that the armband data used in this contest is from the SenseWear Pro armband. The current armband offered for sale by BodyMedia is the SenseWear Pro 2 armband, whose specifications differ somewhat from those of the first version (e.g. it has a replaceable battery rather than a rechargeable one, can store 14 days' worth of data, and has a slightly different sensor configuration). Participants in the contest need only concern themselves with the SenseWear Pro armband.

There are (at least) four categories of things that can be predicted using a continuous body monitoring device such as the armband: continuous metrics such as energy expenditure, contexts such as engaging in physical activity, conditions such as being ill with the flu, and characteristics such as height. Collecting data for conditions and continuous metrics can be difficult, either because the conditions are somewhat rare or because the continuous metrics require expensive laboratory equipment to measure against a gold standard, whereas collecting data for characteristics and contexts is relatively easy.

To help in BodyMedia's algorithm design process, a free-living study has been conducted over the past two years, resulting in more than 20,000 hours of free-living data. Subjects wear the armband as they go about their daily lives, and they timestamp when they start and stop any of a variety of activities. For example, when they lie down to sleep, they press the timestamp button, and when they get up, they press it again. All of the data was collected at the discretion and convenience of the subjects. Additionally, each subject kept a journal specifying times when they were definitely not doing the activity they had collected data on. It was therefore possible to collect free-living samples of both positive and negative examples of each context. This process has yielded more than 60 gigabytes of well-annotated data.

The training data set for the contest consists of approximately 10,000 hours of this data. The test data set will be a bit larger, around 12,000 hours of data. In physiological modeling, it is important to generalize both to new data from the same individuals and to new individuals, so the test set will include both new data from individuals that are in the training set and data from individuals not included in the training data. Efforts will be made to ensure that, in both cases, the data in the two sets comes from the same distribution as far as possible.

Each data set is broken down into on-body sessions, that is, contiguous sessions of minute-by-minute data from the same user. The data sets will include 9 calibrated channels derived from the armband's sensors, 3 characteristic columns (age, smoker, and handedness), and 3 structural variables (user id, session id, and session time). Within each on-body session, an annotation code column denotes the user's context where it has been annotated. For example, the annotation code 1202 might represent running on a treadmill. The annotation code column is 0 where no annotation has been recorded. Additionally, the gender column indicates the gender of the user. The annotation code and gender columns will be included in the training data, but not in the test data.
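
To make the session structure concrete, the following minimal sketch groups minute-by-minute rows into on-body sessions and derives a binary target for one annotation code. The dictionary keys (`user_id`, `session_id`, `session_time`, `annotation`) are hypothetical names chosen for illustration; the actual column names and file layout are not given here. Treating unannotated minutes (code 0) as negatives is likewise an assumption of this sketch.

```python
from collections import defaultdict


def group_by_session(rows):
    """Group per-minute rows into on-body sessions.

    Each row is a dict. Sessions are keyed by (user_id, session_id) and
    their minutes are sorted by session_time. Key names are hypothetical.
    """
    sessions = defaultdict(list)
    for row in rows:
        sessions[(row["user_id"], row["session_id"])].append(row)
    for minutes in sessions.values():
        minutes.sort(key=lambda r: r["session_time"])
    return dict(sessions)


def context_labels(rows, code):
    """Return 1 for minutes annotated with `code`, else 0.

    ASSUMPTION: minutes with annotation 0 (nothing recorded) and minutes
    with any other code are both treated as negatives.
    """
    return [1 if row["annotation"] == code else 0 for row in rows]
```

For example, `context_labels(minutes, 1202)` would mark the minutes in a session that carry the annotation code 1202.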

The contest will consist of three equally weighted parts: identifying two contexts (e.g. annotation code 3004 or not and annotation code 5102 or not) and identifying one characteristic (gender). The learned models should not use the user id or session id variables for prediction; to encourage this, the test set will not include the user id column. Each entry will consist of a column of predictions for the test set for each of the parts (gender, context 1, and context 2). The score for each component will simply be the accuracy of its predictions (minute by minute for the contexts, session by session for the gender predictions).
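
The scoring rule can be sketched as follows. The text states only that the three parts are equally weighted and that each component is scored by accuracy; combining them as a plain mean is an assumption of this sketch, and the function names `accuracy` and `contest_score` are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty set of labels")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


def contest_score(gender_pred, gender_true,
                  ctx1_pred, ctx1_true,
                  ctx2_pred, ctx2_true):
    """Combine the three component accuracies.

    Gender lists hold one label per session; context lists hold one label
    per minute. ASSUMPTION: "equally weighted" is read as an unweighted
    mean of the three accuracies.
    """
    parts = [
        accuracy(gender_pred, gender_true),
        accuracy(ctx1_pred, ctx1_true),
        accuracy(ctx2_pred, ctx2_true),
    ]
    return sum(parts) / len(parts)
```

Note that gender is scored over far fewer items than the contexts, since there is one gender prediction per session but one context prediction per minute.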