Arjun Karpur

Computer Science Student @ The University of Texas at Austin

Hello! I am currently a 4th-year undergraduate student at The University of Texas at Austin, studying computer science as a part of the Turing Scholars Program. I am also pursuing a Master's degree in Computer Science via the 5-Year BS/MS Integrated Program, with an expected graduation date of May 2019. At UT, I am a member of the Graphics & AI research lab, directed by Dr. Qixing Huang. My research interests are primarily in 2D/3D computer vision, with projects related to viewpoint/keypoint estimation and domain adaptation.

Over the past few summers, I have worked as a Software Engineering Intern for Apple, Tableau Software, and Intel. This summer (2018), I will be working for Hover Inc. as a Computer Vision Engineering Intern.

In my spare time, I enjoy woodworking (1, 2, 3), basketball, & music (jazz, hip-hop)!

Email Resume LinkedIn GitHub


Enforcing View Consistency With Latent Configurations for 3D Vision Tasks [2018]
Arjun Karpur, Qixing Huang
Undergraduate Honors Thesis - [PDF] [Code]

Common computer vision tasks, such as image classification, object detection, and pose estimation, benefit from large publicly available datasets containing well-annotated exemplars. These datasets make using a Convolutional Neural Network straightforward and often lead to great results. More obscure vision tasks suffer from limited annotated data for the corresponding information, especially in the 3D computer vision domain where annotation is often more difficult to recover. As such, many of these tasks require either intense data pre-processing and labeling per data instance or, more reasonably, use of unsupervised or semi-supervised learning techniques. For general 3D vision tasks, we argue that images from different viewpoints of the same object will yield near-identical results. Enforcing a view consistency constraint provides effective regularization that allows for utilization of unlabeled instances while transferring knowledge from a source domain to a target domain. View consistency can be optimized using a latent variable that persists across all images of the same object instance, increasing efficiency and allowing for output distribution alignment. To show the effectiveness of this constraint, we demonstrate learning 3D keypoint estimation and shape reconstruction in various unlabeled image datasets using a view-consistency approach to domain adaptation.

Unsupervised Domain Adaptation for 3D Keypoint Prediction from a Single Depth Scan [2017]
Xingyi Zhou, Arjun Karpur, Chuang Gan, Linjie Luo, Qixing Huang
arXiv 1712.05765 Preprint - [PDF] [Code] [Data]

We introduce a novel unsupervised domain adaptation technique for the task of 3D keypoint prediction from a single depth scan/image. Our key idea is to utilize the fact that predictions from different views of the same or similar objects should be consistent with each other. Such view consistency provides effective regularization for keypoint prediction on unlabeled instances. In addition, we introduce a geometric alignment term to regularize predictions in the target domain. The resulting loss function can be effectively optimized via alternating minimization. We demonstrate the effectiveness of our approach on real datasets and present experimental results showing that our approach is superior to state-of-the-art general purpose domain adaptation techniques.
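
The view-consistency idea and its alternating minimization can be illustrated with a toy example. This is only a sketch under simplifying assumptions, not the paper's implementation: the per-view predictions are plain coordinate lists already expressed in a shared frame, the function names (`consistency_loss`, `update_latent`, `update_predictions`, `alternate`) are hypothetical, the predictions are updated directly where a real system would update network weights, and the geometric alignment term is left out.

```python
def consistency_loss(preds, latent):
    """Sum of squared distances between each view's prediction and the latent."""
    return sum(
        sum((p - z) ** 2 for p, z in zip(pred, latent))
        for pred in preds
    )


def update_latent(preds):
    """With predictions fixed, the minimizing latent is the per-coordinate mean."""
    n = len(preds)
    return [sum(coords) / n for coords in zip(*preds)]


def update_predictions(preds, latent, step=0.25):
    """With the latent fixed, take one gradient step on each view's prediction.

    The gradient of (p - z)^2 with respect to p is 2 * (p - z). In a real
    model this step would update network weights (assumption: here the
    predictions themselves stand in for the network output).
    """
    return [
        [p - step * 2.0 * (p - z) for p, z in zip(pred, latent)]
        for pred in preds
    ]


def alternate(preds, rounds=5, step=0.25):
    """Alternate between the two updates and record the loss each round."""
    history = []
    latent = update_latent(preds)
    for _ in range(rounds):
        latent = update_latent(preds)
        history.append(consistency_loss(preds, latent))
        preds = update_predictions(preds, latent, step)
    return preds, latent, history
```

Calling `alternate` on three slightly different predictions of the same keypoint pulls them toward a shared latent estimate, and the recorded loss shrinks from round to round.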

StarMap for Category-Agnostic Keypoint and Viewpoint Estimation [2018]
Xingyi Zhou, Arjun Karpur, Linjie Luo, Qixing Huang
arXiv 1803.09331 Preprint - [PDF] [Code]

We propose a category-agnostic keypoint representation in which keypoints are encoded with their 3D locations in the canonical object views. Our intuition is that the 3D locations of the keypoints in canonical object views contain rich semantic and compositional information. Our representation thus consists of a single-channel, multi-peak heatmap (StarMap) for all the keypoints and their corresponding features as 3D locations in the canonical object view (CanViewFeature) defined for each category. Not only is our representation flexible, but we also demonstrate competitive performance in keypoint detection and localization compared to category-specific state-of-the-art methods. Moreover, we show that when augmented with an additional depth channel (DepthMap) to lift the 2D keypoints to 3D, our representation can achieve state-of-the-art results in viewpoint estimation. Finally, we demonstrate that each individual component of our framework can be used on the task of human pose estimation to simplify the state-of-the-art architectures.
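
One way to picture how a single-channel, multi-peak heatmap can carry every keypoint at once is to read keypoints out as local maxima and look up the canonical-view feature stored at each peak. The sketch below assumes a simple 3x3 local-maximum rule with a score threshold; that rule, the data layout, and the names `extract_peaks` and `read_keypoints` are illustrative assumptions, not details taken from the paper.

```python
def extract_peaks(heatmap, threshold=0.5):
    """Return (row, col, score) for every strict local maximum above threshold.

    heatmap is a list of rows of floats. A pixel counts as a peak if its
    score is at least `threshold` and strictly greater than all of its
    (up to eight) neighbors. This readout rule is an assumption.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            neighbors = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score > n for n in neighbors):
                peaks.append((r, c, score))
    return peaks


def read_keypoints(heatmap, canview, threshold=0.5):
    """Pair each peak with the canonical-view 3D feature at the same pixel.

    canview[r][c] is assumed to hold an (x, y, z) tuple per pixel.
    """
    return [
        {"pixel": (r, c), "score": s, "canonical_xyz": canview[r][c]}
        for r, c, s in extract_peaks(heatmap, threshold)
    ]
```

Because the heatmap has one channel regardless of how many keypoints an object has, the same readout works for any category; the per-peak 3D feature is what distinguishes one keypoint from another.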

Cross-Object Viewpoint Estimation via Domain Adaptation [2017]
Arjun Karpur, Xingyi Zhou
Visual Recognition (Grad) Final Project - [PDF] [Code]

In this paper, we study the problem of object viewpoint estimation from a single RGB image. Despite the importance of this problem, little image data with corresponding viewpoint annotations is available, mostly because of the inherent difficulty of annotating 3D training data from a 2D image. We propose a framework that learns an embedding that is invariant to both the domain (synthesized or real) and the object class. The invariant embedding is realized with a gradient reversal layer, which discourages the learned embedding from encoding domain or class information by reversing the gradient during back-propagation in training. We perform testing on the Pascal3D+ dataset and provide comparison and ablation study results.
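
The behavior of a gradient reversal layer can be shown in a few lines: it passes features through unchanged in the forward pass and flips the sign of the gradient in the backward pass. The following is a minimal framework-free sketch, not the project's code; the `GradientReversal` class, its `scale` parameter, and the toy squared-error domain classifier in `domain_loss_grad` are assumptions made for illustration.

```python
class GradientReversal:
    """Identity in the forward pass, negated (and scaled) gradient in the backward pass."""

    def __init__(self, scale=1.0):
        # scale controls how strongly the reversed gradient is weighted.
        self.scale = scale

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, upstream_grad):
        # The encoder receives the opposite of what the classifier wants,
        # so it is pushed to make the domain harder to predict.
        return [-self.scale * g for g in upstream_grad]


def domain_loss_grad(features, weights, label):
    """Gradient of 0.5 * (w . f - label)^2 with respect to the features f.

    A toy linear domain classifier with squared error (an assumption).
    """
    score = sum(w * f for w, f in zip(weights, features))
    return [(score - label) * w for w in weights]
```

In use, the features pass through `forward`, the classifier's gradient is computed as usual, and `backward` hands the negated gradient to the embedding network, so the classifier still learns to tell domains apart while the embedding learns to hide them.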

Multiple User Biometric for Authentication to Secured Resources [2017]
Jim Baca, Arjun Karpur, Dhaval Patel, Preetham Shambhat, Naissa Conde, Prital Shah, A.G. Ramesh, Tobias Kohlenberg
US Patent 9,646,216 - [PDF]

Various embodiments are generally directed to the provision and use of multiple person biometric authentication systems. An apparatus including a processor component and logic executable by the processor component is disclosed. The logic is configured to cause the apparatus to receive information including an indication of a plurality of biometric measurements and generate a combined biometric indicator based in part on the plurality of biometric measurements. The combined biometric indicator can be generated using fuzzy hashing.