I am a fifth-year (2017-) Computer Science Ph.D. student at The University of Texas at Austin, supervised by Prof. Philipp Krähenbühl. I obtained my bachelor's degree from the School of Computer Science at Fudan University, advised by Prof. Wei Zhang and Prof. Xiangyang Xue. I have interned at Microsoft Research Asia, Google Research, Intel Labs, and Facebook AI Research. My research has been supported by a Facebook Fellowship since 2021.
My research focuses on object-level visual recognition, including object detection, 3D perception, pose estimation, and tracking.
I am looking for a full-time job starting in fall 2022.
CV / Google Scholar / GitHub / LinkedIn
Last updated January 2022
Large-scale, well-curated datasets are treasures in computer vision. However, most datasets focus on a single domain with a specific task and a fixed label set. Computer vision models trained on a single dataset cannot generalize to all applications in the real world.
The goal of my research is to remove the artificial barriers of datasets and make object recognition generalize in the wild. There should be one single computer vision model, not a zoo of dataset-specific models. The model should be trained on a diverse set of datasets and should be able to recognize objects from different data sources in all domains.
Towards this goal, my research focuses on three aspects:
1. How to build a unified object representation for various vision tasks. I propose a point-based representation for object detection, 3D detection, and pose estimation (CenterNet, CenterNet2, CenterPoint).
2. How to generalize object recognition through time. I propose a simple solution to extend our point-based detector into a local tracker (CenterTrack), and then introduce a global tracker that does recognition globally over time.
3. How to expand the vocabulary of an object detection system. I learn a unified label space for different detection datasets (UniDet) and incorporate classification datasets with twenty thousand classes (Detic).
Going further, I am interested in training a unified object recognition system that performs multiple tasks using weak or incomplete supervision.