UT-Austin Computer Vision Group Datasets

Please find data links below, together with the associated papers.

Outfit Attributes dataset

Images partially annotated for collar type, pattern, material, and skirt/pants shape.

Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images. W.-L. Hsiao and K. Grauman. In Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy, Oct 2017. [pdf]

Pano2Vid data

360° videos with human-annotated viewing directions. Also contains a set of ordinary videos representing the target output distribution.

Pano2Vid: Automatic Cinematography for Watching 360° Videos. Y.-C. Su, D. Jayaraman and K. Grauman. In Proceedings of the Asian Conference on Computer Vision (ACCV), Taipei, Taiwan, Nov 2016. [pdf]

Predicting the location of "interactees" in human-object interactions

Bounding box annotations for the people and objects involved in interactions, for a subset of images from SUN and PASCAL.

Predicting the Location of "Interactees" in Novel Human-Object Interactions. C.-Y. Chen and K. Grauman. In Proceedings of the Asian Conference on Computer Vision (ACCV), Singapore, Nov 2014. [pdf]

Text detection ground truth on the Grocery Products and Glass Video datasets

Text detection ground truth on the Grocery Products dataset and the Glass Video dataset (frames are also included, with permission from the authors).

Text Detection in Stores Using a Repetition Prior. B. Xiong and K. Grauman. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, USA, March 2016. [pdf]

Attributes "shades of meaning" data

Per-user presence/absence label data on 12 attributes, used to discover "shades of meaning" of attributes. Also includes textual explanations the users gave for a select set of labels, and a measure of consistency within a user's own annotations.

Discovering Attribute Shades of Meaning with the Crowd. Adriana Kovashka and Kristen Grauman. International Journal of Computer Vision, Volume 114, Issue 1, pp. 56-73, 2015. [pdf]

UT Zappos50K

Large shoe dataset containing 50,025 catalog images from Zappos.com, along with fine-grained relative attribute labels, annotator rationales, meta-data, and benchmarks.

Fine-Grained Visual Comparisons with Local Learning. A. Yu and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, June 2014. [pdf]

YouTube-Objects pixel-level ground truth

Pixel-level object masks for a subset of the YouTube-Objects video dataset. Useful to train or evaluate video foreground segmentation algorithms.

Supervoxel-Consistent Foreground Propagation in Video. S. Jain and K. Grauman. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, Sept 2014. [pdf]

UT Egocentric Data

Four videos from head-mounted cameras, each 3-5 hours long, captured in an uncontrolled environment. Faces blurred for privacy reasons.

Story-Driven Summarization for Egocentric Video. Z. Lu and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, June 2013. [pdf]
Discovering Important People and Objects for Egocentric Video Summarization. Y. J. Lee, J. Ghosh, and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, June 2012. [pdf]

UT Snap Point Dataset

Human judgments of snap point quality for a subset of frames from the UT Egocentric dataset and a newly collected mobile robot dataset (frames are also included).

Detecting Snap Points in Egocentric Video with a Web Photo Prior. B. Xiong and K. Grauman. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, Sept 2014. [pdf]

Shoes with Relative Attributes Dataset

Relative attribute annotations for 14,658 shoe images from the Attribute Discovery dataset.

WhittleSearch: Image Search with Relative Attribute Feedback. Adriana Kovashka, Devi Parikh, and Kristen Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, June 2012. [pdf]

Instance-level relative attribute annotations for PubFig and OSR

Human judgments of the relative strength of attributes present in pairs of images from the PubFig dataset of face images and the Outdoor Scene Recognition dataset.

WhittleSearch: Image Search with Relative Attribute Feedback. Adriana Kovashka, Devi Parikh, and Kristen Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, June 2012. [pdf]

Attribute perception annotations

Attribute presence/absence labels from individual annotators, used for learning personalized adapted attribute models.

Attribute Adaptation for Personalized Image Search. Adriana Kovashka and Kristen Grauman. In Proceedings of the International Conference on Computer Vision (ICCV), Sydney, Australia, December 2013. [pdf]

Interactive image segmentation

Human input annotations with timing information for a subset of images from the MSRC, iCoseg and IIS datasets. Includes bounding box, sloppy contour, and tight polygon masks.

Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation. S. Jain and K. Grauman. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, December 2013. [pdf]

Visual rationales data

Human annotators' visual rationales for scenes, "hot or not" and public figures.

Annotator Rationales for Visual Recognition. J. Donahue and K. Grauman. In Proceedings of the International Conference on Computer Vision (ICCV), Barcelona, Spain, November 2011. [pdf]

List of activity recognition datasets from other sources (compiled by Chao-Yeh Chen)

Video labels and flow (for active video label propagation)

Densely labelled images with ground-truth labels and optical flow for the LabelMe Video, CamSeq01 Video, and SegTrack datasets.

Active Frame Selection for Label Propagation in Videos. S. Vijayanarasimhan and K. Grauman. In Proceedings of the European Conference on Computer Vision (ECCV), Florence, Italy, October 2012. [pdf]

Ground truth segmentation for MSRC-v0

Ground-truth annotations collected from Amazon Mechanical Turk.

Object-Graphs for Context-Aware Category Discovery. Y. J. Lee and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, June 2010. (Oral) [pdf]
Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images. Y. J. Lee and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, June 2010. [pdf]

Trimmed action intervals for Hollywood dataset

Tighter action interval annotations on the Hollywood activity recognition dataset.

Active Learning of an Action Detector from Untrimmed Videos. S. Bandla and K. Grauman. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, December 2013. [pdf]

Tagged image data for LabelMe

Ordered tag lists collected from Mechanical Turk workers for the LabelMe image dataset.

Reading Between The Lines: Object Localization Using Implicit Cues from Image Tags. S. J. Hwang and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, June 2010. (Oral) [pdf]
Accounting for the Relative Importance of Objects in Image Retrieval. S. J. Hwang and K. Grauman. In Proceedings of the British Machine Vision Conference (BMVC), Aberystwyth, UK, September 2010. (Oral) [pdf]

Tagged image data for PASCAL 2007

Ordered tag lists collected from Mechanical Turk workers for PASCAL images.

Reading Between The Lines: Object Localization Using Implicit Cues from Image Tags. S. J. Hwang and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, June 2010. (Oral) [pdf]
Accounting for the Relative Importance of Objects in Image Retrieval. S. J. Hwang and K. Grauman. In Proceedings of the British Machine Vision Conference (BMVC), Aberystwyth, UK, September 2010. (Oral) [pdf]

Key-segments features for SegTrack data

Optical flow, BPLRs, and region segmentations computed for the SegTrack videos.

Key-Segments for Video Object Segmentation. Y. J. Lee, J. Kim, and K. Grauman. In Proceedings of the International Conference on Computer Vision (ICCV), Barcelona, Spain, November 2011. [pdf]

Annotation time data for MSRC images

What’s It Going to Cost You?: Predicting Effort vs. Informativeness for Multi-Label Image Annotations. S. Vijayanarasimhan and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, June 2009. [pdf]

Geo-located Flickr images for Rome and NYC

Sequences of Flickr tourist photos with ground truth GPS coordinates. ~60K images per city.

Clues from the Beaten Path: Location Estimation with Bursty Sequences of Tourist Photos. C.-Y. Chen and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, June 2011. [pdf]

Abnormal event annotations

Ground-truth annotations of abnormal events in the subway station sequences from the dataset of Adam et al.

Observe Locally, Infer Globally: a Space-Time MRF for Detecting Abnormal Activities with Incremental Updates. J. Kim and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, June 2009. [pdf]

PASCAL VOC cat and dog pixel-level ground truth

Ground truth used to evaluate a segmentation-based detector on the cat and dog classes of PASCAL.

Efficient Region Search for Object Detection. S. Vijayanarasimhan and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, June 2011. [pdf]