UT-Austin Computer Vision Group Datasets


Please find data links below, together with the associated papers.


UT Zappos50K

Large shoe dataset containing 50,025 catalog images from Zappos.com, along with fine-grained relative attribute labels, annotator rationales, metadata, and benchmarks.
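Relative attribute labels like these take the form of pairwise comparisons ("image A is sportier than image B"), which are typically used to train a ranking function rather than a classifier. A minimal sketch of that idea, using a hinge-style update on feature differences (the `learn_ranker` helper, the feature layout, and the pair format here are illustrative assumptions, not the dataset's actual file format):

```python
import numpy as np

def learn_ranker(X, pairs, epochs=100, lr=0.1):
    """Learn a linear ranking function w from pairwise comparisons.

    X     : (n, d) feature matrix, one row per image.
    pairs : list of (i, j) meaning image i shows MORE of the
            attribute than image j (hypothetical label format).
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i, j in pairs:
            diff = X[i] - X[j]
            # hinge-style update: push toward w . diff >= 1
            if w @ diff < 1:
                w += lr * diff
    return w

# Toy example: a single feature that correlates with attribute strength.
X = np.array([[0.1], [0.5], [0.9]])
pairs = [(2, 1), (1, 0), (2, 0)]   # image 2 > image 1 > image 0
w = learn_ranker(X, pairs)
scores = X @ w                      # higher score = stronger attribute
```

Scoring new images with the learned `w` then yields a continuous attribute strength, which is what makes fine-grained comparisons between visually similar shoes possible.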



UT Egocentric Data

Four videos from head-mounted cameras, each 3-5 hours long, captured in an uncontrolled environment. Faces are blurred for privacy reasons.



Shoes with Relative Attributes Dataset

Relative attribute annotations for 14,658 shoe images from the Attribute Discovery dataset.


Instance-level relative attribute annotations for PubFig and OSR

Human judgments on the relative strength of attributes present in pairs of images from the PubFig dataset of face images and the Outdoor Scene Recognition dataset.


Attribute perception annotations

Attribute presence/absence labels from individual annotators, used for learning personalized adapted attribute models.



Interactive image segmentation

Human input annotations with timing information for a subset of images from the MSRC, iCoseg and IIS datasets. Includes bounding box, sloppy contour, and tight polygon masks.


Visual rationales data

Human annotators' visual rationales for scene, "hot or not", and public figure images.



List of activity recognition datasets from other sources (compiled by Chao-Yeh Chen)



Video labels and flow (for active video label propagation)

Densely labeled frames with ground-truth labels and optical flow for the LabelMe Video, CamSeq01 Video, and SegTrack datasets.
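Label propagation of this kind uses the optical flow to carry a per-pixel label map from one frame to the next. A minimal sketch of the warping step (the `propagate_labels` helper and the flow layout, where `flow[y, x]` gives a backward displacement into the source frame, are assumptions for illustration, not the dataset's storage format):

```python
import numpy as np

def propagate_labels(labels, flow):
    """Warp a per-pixel label map into the next frame using backward
    optical flow. Assumed layout: flow[y, x] = (dy, dx) points from a
    pixel in the next frame back to its source in the current frame."""
    h, w = labels.shape
    out = np.zeros_like(labels)
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y, x]
            sy = int(round(y + dy))   # source row in current frame
            sx = int(round(x + dx))   # source column in current frame
            if 0 <= sy < h and 0 <= sx < w:
                out[y, x] = labels[sy, sx]
    return out

# Toy example: a one-column "object" shifts one pixel to the left.
labels = np.zeros((3, 3), dtype=int)
labels[:, 2] = 1
flow = np.zeros((3, 3, 2))
flow[..., 1] = 1.0   # each next-frame pixel came from one column right
moved = propagate_labels(labels, flow)
```

In an active propagation setting, frames where the warped labels are likely unreliable (large flow, occlusions) are the ones routed back to a human annotator.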


Ground truth segmentation for MSRC-v0

Ground-truth annotations collected from Amazon Mechanical Turk.


Trimmed action intervals for Hollywood dataset

Tighter action interval annotations for the Hollywood activity recognition dataset.



Tagged image data for LabelMe

Ordered tag lists collected from Mechanical Turk workers for LabelMe image dataset.


Tagged image data for PASCAL 2007

Ordered tag lists collected from Mechanical Turk workers for PASCAL images.


Key-segments features for SegTrack data

Optical flow, BPLRs, and region segmentations computed for the SegTrack videos.


Annotation time data for MSRC images


Geo-located Flickr images for Rome and NYC

Sequences of Flickr tourist photos with ground-truth GPS coordinates; ~60K images per city.
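With GPS ground truth, localization error is naturally measured as the great-circle distance between predicted and true coordinates. A standard haversine implementation (the `haversine_km` name and the sample coordinates are illustrative, not part of the dataset):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two GPS points,
    using a spherical-Earth approximation (radius 6371 km)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Toy check: central Rome to the Vatican is on the order of a few km.
d = haversine_km(41.9028, 12.4964, 41.9022, 12.4539)
```

Reporting the fraction of test photos localized within a fixed radius (e.g. 1 km) of the true position is a common way to summarize performance on this kind of data.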

Abnormal event annotations

Ground-truth annotations of abnormal events in the subway station sequences from the dataset of Adam et al.

PASCAL VOC cat and dog pixel-level ground truth

Ground truth used to evaluate a segmentation-based detector on the cat and dog classes of PASCAL.