All of the data to be used in the competition was collected using the
BodyMedia SenseWear Pro Armband. The SenseWear armband, shown in the
figure below, is a sleek, wireless and accurate wearable body monitor
that enables continuous physiological monitoring outside the laboratory.

The armband is worn on the back of the upper right arm and uses a combination of sensors. The sensors incorporated into the armband are as follows:
Accelerometer: The accelerometer is a 2-axis MEMS device that measures motion. This motion can be mapped to forces exerted on the body, and the gravity information from the second axis provides valuable context for predictive algorithms. This sensor yields 5 channels: an average channel and an approximate variance channel for each axis, plus a channel reporting the number of steps taken by the user.
Heat Flux: The proprietary heat flux sensor is a robust and reliable device that measures the amount of heat being dissipated by the body. The sensor uses materials with very low thermal resistance and extremely sensitive thermocouple arrays. It is placed in a thermally conductive path between the skin and the side of the armband exposed to the environment. A high gain internal amplifier is used to bring the signal to a level that can be sampled by the microprocessor located in the armband.
Galvanic Skin Response: Galvanic Skin Response (GSR) represents electrical conductivity between two points on the wearer’s arm. The GSR sensor in the armband includes two hypoallergenic stainless steel electrodes integrated into the underside of the armband connected to a circuit that measures the skin’s conductivity between these two electrodes. Skin conductivity is affected by the sweat from physical activity and by emotional stimuli.
Skin Temperature: Skin temperature is measured using a highly accurate thermistor-based sensor located on the backside of the armband near its edges and in contact with the skin. Continuously measured skin temperature is linearly reflective of the body’s core temperature.
Near-Body Temperature: The near-body temperature sensor measures the air temperature immediately around the wearer’s armband. It is a highly accurate thermistor-based sensor that directly reflects changes in environmental conditions around the armband.
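As a quick check on the channel arithmetic, the short Python sketch below tallies the channels per sensor. The accelerometer count follows the description above; one channel for each of the other four sensors is an assumption, chosen because it matches the nine calibrated channels mentioned later in this description.

```python
# Channel counts per sensor. The accelerometer breakdown comes from the text;
# a single channel for each remaining sensor is an assumption.
ACCEL_AXES = 2
PER_AXIS_CHANNELS = ("average", "approximate variance")

channels_per_sensor = {
    # two channels per axis, plus one step-count channel
    "accelerometer": ACCEL_AXES * len(PER_AXIS_CHANNELS) + 1,
    "heat flux": 1,
    "galvanic skin response": 1,
    "skin temperature": 1,
    "near-body temperature": 1,
}

total_channels = sum(channels_per_sensor.values())
print(channels_per_sensor["accelerometer"], total_channels)
```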
The SenseWear Pro armband is
comfortable to wear, can be worn continuously, and will store up to 5
days of physiological data before the data needs to be retrieved. The
armband also has wireless capability, both for retrieving stored data
and for providing real-time feedback. Note that the armband data used in
this contest is from the SenseWear Pro armband. The current armband
offered for sale by BodyMedia is the SenseWear Pro 2 armband, whose
specifications differ somewhat from the first version (e.g. it has a
replaceable battery rather than a rechargeable one, can store 14 days'
worth of data, and has a slightly different sensor configuration). The
participants in the contest need only concern themselves with the
SenseWear Pro armband.
There are (at least) four categories of things that can be
predicted using a continuous body monitoring device such as the armband:
continuous metrics such as energy expenditure, contexts such as engaging
in physical activity, conditions such as being ill with the flu, and
characteristics such as height. It turns out that collecting data for
conditions and continuous metrics can be difficult (either due to the
conditions being somewhat rare or the continuous metrics requiring
expensive laboratory equipment to measure against a gold standard),
whereas collecting data for characteristics and contexts is relatively
easy. To help in BodyMedia's algorithm design process, a free-living
study has been conducted over the past two years, resulting in more than
20,000 hours of free-living data. Subjects wear the armband as
they go about their daily lives, and they timestamp when they start and
stop any of a variety of activities. For example, when they lie down to
sleep, they press the timestamp button, and when they get up, they
press it again. All of the data was collected at the
discretion and convenience of the subjects as they went about their
daily lives and routines. Additionally, each subject kept a journal
specifying times when they were definitely not doing the activity they
had collected data on. Therefore, it was possible to collect free-living
samples of both positive and negative examples of each context. This
process has yielded more than 60 gigabytes of well-annotated data.
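The annotation scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not BodyMedia's actual tooling: the function name `label_minutes`, the use of minute offsets within a session, and the half-open `[start, stop)` intervals are all hypothetical. Button presses are paired into start/stop intervals that give positive minutes, journal intervals give negative minutes, and everything else stays unknown.

```python
def label_minutes(n_minutes, presses, negatives):
    """Return one label per minute: True (positive), False (negative), None (unknown).

    presses   -- minute offsets of timestamp-button presses, alternating start/stop
    negatives -- (start, stop) minute intervals taken from the subject's journal
    Intervals are treated as half-open [start, stop), which is an assumption.
    """
    if len(presses) % 2 != 0:
        raise ValueError("timestamp presses must come in start/stop pairs")

    labels = [None] * n_minutes

    # Journal entries: times when the subject was definitely not doing the activity.
    for start, stop in negatives:
        for minute in range(max(start, 0), min(stop, n_minutes)):
            labels[minute] = False

    # Button presses: consecutive pairs bracket the activity itself.
    for start, stop in zip(presses[0::2], presses[1::2]):
        for minute in range(max(start, 0), min(stop, n_minutes)):
            labels[minute] = True

    return labels


labels = label_minutes(10, presses=[2, 5], negatives=[(7, 9)])
print(labels)
```

Minutes covered by neither source remain `None`, which reflects the point that only explicitly recorded intervals supply positive or negative examples.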
The training data set for the contest consists of approximately 10,000
hours of this data. The test data set will be a bit larger,
around 12,000 hours of data. In physiological modeling, it is important
to generalize to both new data from the same individuals and to new
individuals, so the test set will include both new data from
individuals that are in the training set and data from individuals not
included in the training data. Efforts will be made to ensure that, in
both cases, the training and test data are drawn from distributions that
are as similar as possible. Each data set is broken down into on-body
sessions, that is, contiguous sessions of minute-by-minute data from the
same user.
The data sets will include nine calibrated channels derived from the
armband's sensors, 3 characteristic columns (i.e. age, smoker, and
handedness), and 3 structural variables (user id, session id, and
session time). Within each on-body session, an annotation code column
denotes the user's context where it has been annotated. For example,
the annotation code 1202 might represent running on a treadmill. The
annotation code column is 0 where no annotation has been recorded.
Additionally, the gender column indicates the gender of the user. The
annotation code and gender columns will be included in the training
data, but not in the test data.
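A minimal Python sketch of how training rows might be grouped into on-body sessions and turned into a binary context target is given below. The column names, the CSV layout, the sample values, and the 0/1 gender encoding are hypothetical and are not taken from the contest files; the sensor and characteristic columns are omitted for brevity. Treating unannotated minutes (code 0) as negatives is also an assumption.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an assumed CSV layout (not the real contest format).
SAMPLE = """user_id,session_id,session_time,annotation,gender
1,10,0,0,0
1,10,1,3004,0
1,10,2,3004,0
2,20,0,5102,1
2,20,1,0,1
"""


def load_sessions(text):
    """Group rows by (user id, session id), ordered by session time."""
    sessions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        key = (int(row["user_id"]), int(row["session_id"]))
        sessions[key].append(row)
    for rows in sessions.values():
        rows.sort(key=lambda r: int(r["session_time"]))
    return dict(sessions)


def context_target(rows, code):
    """Per-minute target: True where the annotation equals `code`.

    Minutes with annotation 0 (nothing recorded) come out as False here,
    which is an assumption rather than something the description states.
    """
    return [int(r["annotation"]) == code for r in rows]


sessions = load_sessions(SAMPLE)
target_3004 = context_target(sessions[(1, 10)], 3004)
print(len(sessions), target_3004)
```

Since the user id is withheld from the test data, a real pipeline would use the user id only for bookkeeping such as this grouping, not as a model input.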
The contest will consist of three equally weighted parts: identifying two contexts (e.g. annotation code 3004 or not, and annotation code 5102 or not) and identifying one characteristic (gender). The learned models should not use the user id or session id variables for prediction; to encourage this, the test set will not include the user id column. Each entry will consist of one column of predictions on the test set for each part (gender, context 1, and context 2). The score for each part will simply be the accuracy of the predictions, computed minute by minute for the contexts and session by session for gender.
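The scoring rule can be sketched in Python as follows. The function names are hypothetical, and combining the three parts with a simple mean is an assumption about how "equally weighted" is realized; the description does not give the exact formula.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the true value."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty column")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def contest_score(gender, context1, context2):
    """Mean of the three part accuracies (assumed form of equal weighting).

    Each argument is a (predicted, actual) pair. Gender columns hold one
    value per session; context columns hold one value per minute.
    """
    parts = [accuracy(*gender), accuracy(*context1), accuracy(*context2)]
    return sum(parts) / len(parts)


# Toy example: two sessions for gender, four minutes for each context.
score = contest_score(
    gender=([0, 1], [0, 0]),
    context1=([True, False, True, True], [True, False, False, True]),
    context2=([False] * 4, [False] * 4),
)
print(score)
```

In this toy example the part accuracies are 0.5, 0.75, and 1.0.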