Computer Vision Group @ SJTU



We study Computer Vision and Robotics, focusing on the computational principles underlying Artificial Intelligence. We are interested in building robots that automatically understand and interact with the physical world, both inferring its semantics and extracting its 3D structure.

We design end-to-end algorithms to learn deep 3D representations from big 3D data for visual scene understanding. We believe that it is critical to consider the role of a machine as an active explorer in a 3D world, such as a robot, and to learn from rich 3D data close to the natural input to the human visual system.

Specifically, our group is at the frontier of 3D Deep Learning, RGB-D Recognition and Reconstruction, Deep Learning for Robotics, Place-centric 3D Context Representation, Synthesis for Analysis, Big Data Robotics, Robot Learning, Large-scale Crowd-sourcing, and Petascale Big Data. As a real-world test for our research, we also focus on three key applications: Personal Robotics, Autonomous Driving, and Augmented Reality.


  • 3rd/4th place winners at the Amazon Picking Challenge 2016 with Team MIT.
  • Deep Sliding Shapes is featured in Princeton Engineering news.
  • Ari Seff wins the 2016 NDSEG Fellowship.
  • Hosting 3D Deep Learning with Marvin tutorial at CVPR2016.
  • Marvin: our N-D deep learning framework.
  • Our DeepDriving system learns to drive a car using deep learning.
  • Facebook and Google use our LSUN and PLACES to dream deeply.
  • Large-scale Scene Understanding (LSUN Challenge) at CVPR2015.
  • Shuran Song is featured in the Princeton CS Department News.
  • Most popular talk at ECCV 2014 based on view count.
  • “Sliding Shapes” is covered by Princeton Discovery Magazine.
  • Google Research Awards 2014 and 2015.
  • Google Research News.
  • Google Research Best Papers Award 2012.
  • Press interview: “Being a normal kid until getting his first computer”.
  • Most popular talk at ECCV 2012 based on view count.
  • ECCV2012 Best Student Paper Award.