UT-Austin Computer Vision Group Datasets

Please find data links below, together with the associated papers.

Pano2Vid data

360° videos with human-annotated viewing directions. Also contains a set of ordinary videos representing the target output distribution.

Predicting the location of "interactees" in human-object interactions

Bounding-box annotations for the objects and persons involved in interactions, for a subset of images from the SUN and PASCAL datasets.

Text detection ground truth on the Grocery Products and the Glass Video

Text detection ground truth for the Grocery Products dataset and the Glass Video dataset (frames are also included, with permission from the authors).

Attributes "shades of meaning" data

Per-user presence/absence label data on 12 attributes, used to discover "shades of meaning" of attributes. Also includes textual explanations the users gave for a select set of labels, and a measure of consistency within a user's own annotations. 

UT Zappos50K

Large shoe dataset containing 50,025 catalog images from Zappos.com, along with fine-grained relative attribute labels, annotator rationales, meta-data, and benchmarks.

YouTube-Objects pixel-level ground truth

Pixel-level object masks for a subset of the YouTube-Objects video dataset. Useful for training or evaluating video foreground segmentation algorithms.
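Pixel-level masks like these are usually scored with intersection-over-union (IoU) between a predicted foreground mask and the ground-truth mask. A minimal sketch, assuming both masks have been loaded as boolean NumPy arrays (the dataset's actual mask file format is not specified here):

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union between two boolean foreground masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention: two empty masks count as a perfect match.
    return inter / union if union > 0 else 1.0

# Toy example on 4x4 masks: 2 overlapping pixels, 6 pixels in the union.
pred = np.zeros((4, 4), dtype=bool); pred[0, :] = True        # 4 foreground pixels
gt = np.zeros((4, 4), dtype=bool); gt[0:2, 2:] = True         # 4 foreground pixels
print(mask_iou(pred, gt))  # 2 / 6
```

Per-video scores are then typically averaged over frames, or over the whole dataset per object class.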

UT Egocentric Data

Four videos from head-mounted cameras, each 3-5 hours long, captured in an uncontrolled environment.  Faces blurred for privacy reasons.

UT Snap Point Dataset

Human judgments of snap-point quality for a subset of frames from the UT Egocentric dataset and a newly collected mobile-robot dataset (frames are also included).

Shoes with Relative Attributes Dataset

Relative attribute annotations for 14,658 shoe images from the Attribute Discovery dataset.

Instance-level relative attribute annotations for PubFig and OSR

Human judgments on the relative strength of attributes present in pairs of images from the PubFig face dataset and the Outdoor Scene Recognition (OSR) dataset.
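Relative-attribute annotations of this kind are usually consumed as ordered pairs: for a given attribute, image i is judged to exhibit it more strongly than image j. A common evaluation is pairwise accuracy of a learned real-valued attribute score against those pairs. A minimal sketch (the pair list and score values here are hypothetical, not the dataset's actual file layout):

```python
def pairwise_accuracy(scores, pairs):
    """Fraction of annotated pairs (i, j), meaning 'i shows the attribute
    more strongly than j', that a real-valued score ranks correctly."""
    correct = sum(1 for i, j in pairs if scores[i] > scores[j])
    return correct / len(pairs)

# Toy example: predicted "smiling" strengths for four face images.
scores = {"img0": 0.9, "img1": 0.2, "img2": 0.5, "img3": 0.7}
pairs = [("img0", "img1"),   # ranked correctly (0.9 > 0.2)
         ("img3", "img2"),   # ranked correctly (0.7 > 0.5)
         ("img1", "img2")]   # ranked wrongly   (0.2 < 0.5)
print(pairwise_accuracy(scores, pairs))  # 2 of 3 pairs correct
```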

Attribute perception annotations

Attribute presence/absence labels from individual annotators, used for learning personalized adapted attribute models.

Interactive image segmentation

Human input annotations with timing information for a subset of images from the MSRC, iCoseg and IIS datasets. Includes bounding box, sloppy contour, and tight polygon masks.

Visual rationales data

Human annotators' visual rationales for scenes, "hot or not" images, and public figures.

List of activity recognition datasets from other sources (compiled by Chao-Yeh Chen)

Video labels and flow (for active video label propagation)

Densely labeled frames with ground-truth labels and optical flow for the LabelMe Video, CamSeq01, and SegTrack datasets.

Ground truth segmentation for MSRC-v0

Ground-truth annotations collected via Amazon Mechanical Turk.

Trimmed action intervals for Hollywood dataset

Tighter action-interval annotations for the Hollywood activity recognition dataset.

Tagged image data for Labelme

Ordered tag lists collected from Mechanical Turk workers for the LabelMe image dataset.

Tagged image data for PASCAL 2007

Ordered tag lists collected from Mechanical Turk workers for PASCAL images.

Key-segments features for SegTrack data

Optical flow, BPLRs, and region segmentations computed for the SegTrack videos.

Annotation time data for MSRC images

Geo-located Flickr images for Rome and NYC

Sequences of Flickr tourist photos with ground-truth GPS coordinates; roughly 60K images per city.
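With ground-truth GPS coordinates available, an image-localization method is typically scored by the great-circle distance between its predicted position and the true one. A minimal haversine sketch (the example coordinates are illustrative, not drawn from the dataset):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Colosseum vs. Pantheon in Rome: roughly 1.5 km apart.
err = haversine_km(41.8902, 12.4922, 41.8986, 12.4769)
print(round(err, 2))
```

A prediction is then counted as correct if this error falls under a chosen threshold (e.g., within a few hundred meters of the true location).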

Abnormal event annotations

Ground-truth annotations of abnormal events in the subway station sequences from the dataset of Adam et al.

PASCAL VOC cat and dog pixel-level ground truth

Ground truth used to evaluate a segmentation-based detector on the cat and dog classes of PASCAL.