Dian Chen 陈典
I am currently a senior research scientist at Waabi. I obtained my Ph.D. in CS at UT Austin, under the supervision of Prof. Philipp Krähenbühl. During my Ph.D., I interned at Waymo, working on multi-agent behavior prediction.
Before that, I studied Computer Science and Applied Mathematics at UC Berkeley, where I worked on robot manipulation with Prof. Pulkit Agrawal, Prof. Deepak Pathak, Prof. Sergey Levine, Prof. Pieter Abbeel, and Prof. Jitendra Malik.
Email / GitHub / Scholar
Research
My research interests lie in robotics, computer vision, and machine learning, including reinforcement learning. I also work on autonomous driving.
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Ari Seff,
Brian Cera,
Dian Chen,
Aurick Zhou,
Nigamaa Nayakanti,
Khaled S. Refaat,
Rami Al-Rfou,
Benjamin Sapp
International Conference on Computer Vision (ICCV), 2023
arxiv
We present MotionLM, a behavior predictor that represents continuous trajectories as sequences of discrete motion tokens. MotionLM casts multi-agent motion prediction as a language modeling task over this domain.
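To give a flavor of the tokenization idea, here is a minimal sketch (my own toy example, not MotionLM's actual tokenizer or vocabulary) that quantizes per-step displacements of a trajectory onto a uniform grid of discrete motion tokens; the bin count and displacement range are made-up values.

```python
import numpy as np

# Hypothetical uniform quantizer: map each per-step (dx, dy) displacement
# onto a K x K grid of motion tokens. Bin count and range are illustrative,
# not the values used in MotionLM.
K = 13                      # bins per axis -> vocabulary of K*K tokens
MAX_DELTA = 6.0             # max displacement (meters) covered per step

def tokenize(traj_xy: np.ndarray) -> np.ndarray:
    """traj_xy: (T+1, 2) waypoints -> (T,) discrete motion tokens."""
    deltas = np.diff(traj_xy, axis=0)                        # per-step motion
    bins = np.clip(
        np.round((deltas + MAX_DELTA) / (2 * MAX_DELTA) * (K - 1)),
        0, K - 1,
    ).astype(int)                                            # (T, 2) bin ids
    return bins[:, 0] * K + bins[:, 1]                       # flatten to one id

def detokenize(tokens: np.ndarray, start_xy: np.ndarray) -> np.ndarray:
    """Invert tokenize(): reconstruct waypoints from token ids."""
    bins = np.stack([tokens // K, tokens % K], axis=-1)
    deltas = bins / (K - 1) * (2 * MAX_DELTA) - MAX_DELTA
    return np.concatenate([start_xy[None], start_xy + np.cumsum(deltas, axis=0)])

traj = np.cumsum(np.random.randn(9, 2), axis=0)   # toy trajectory
tokens = tokenize(traj)
print(tokens, np.abs(detokenize(tokens, traj[0]) - traj).max())
```

An autoregressive decoder over such token sequences can then be trained and sampled exactly like a language model.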
Coopernaut: End-to-End Driving with Cooperative Perception for Networked Vehicles
Jiaxun Cui,
Hang Qiu,
Dian Chen,
Peter Stone,
Yuke Zhu
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
website /
code /
arxiv
We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving. Our model encodes LiDAR information into compact point-based representations that can be transmitted as messages between vehicles via realistic wireless channels.
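As a rough illustration of the messaging idea (not the COOPERNAUT architecture or its point-based transformer blocks), the sketch below compresses a LiDAR point cloud into a handful of keypoint features that could fit into a small message, then fuses messages from several vehicles; the network shapes and the random subsampling are assumptions of mine.

```python
import torch
import torch.nn as nn

class PointMessageEncoder(nn.Module):
    """Compress a LiDAR point cloud into a few keypoint features.

    Illustrative only: the sampling and attention blocks used in the paper
    are replaced here by random subsampling and a toy point-wise MLP.
    """
    def __init__(self, num_keypoints: int = 32, feat_dim: int = 64):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim)
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (N, 3) -> message: (num_keypoints, 3 + feat_dim)
        idx = torch.randperm(points.shape[0])[: self.num_keypoints]
        kept = points[idx]                       # keypoint positions
        feats = self.mlp(kept)                   # per-keypoint features
        return torch.cat([kept, feats], dim=-1)  # compact message

# Each vehicle encodes its own cloud; the ego fuses received messages
# (here by simple concatenation) before feeding a driving policy.
encoder = PointMessageEncoder()
messages = [encoder(torch.randn(2048, 3)) for _ in range(3)]  # 3 vehicles
fused = torch.cat(messages, dim=0)               # (3 * 32, 67)
print(fused.shape)
```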
Learning from All Vehicles
Dian Chen,
Philipp Krähenbühl
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Winner of the 2021 CARLA AD Challenge
website /
code /
arxiv
We present LAV, a mapless, learning-based end-to-end driving system. LAV takes as input multi-modal sensor readings and learns from all nearby vehicles in the scene for both perception and planning. At test time, LAV predicts multi-modal future trajectories for all detected vehicles, including the ego-vehicle. Our system outperforms all prior methods on the public CARLA Leaderboard by a wide margin, improving the driving score by 25 points and the route completion rate by 24 points.
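The sketch below only illustrates the idea of a single motion head shared by the ego-vehicle and all other detected vehicles, so that every observed vehicle provides training signal for the same planner; the feature dimension, number of modes, and horizon are placeholder values, not those used in LAV.

```python
import torch
import torch.nn as nn

class SharedMotionHead(nn.Module):
    """Toy multi-modal trajectory head shared across all vehicles.

    Illustrative of the "learn from all vehicles" idea only: one head maps
    a per-vehicle feature to M candidate futures. Dimensions are made up.
    """
    def __init__(self, feat_dim: int = 128, num_modes: int = 6, horizon: int = 10):
        super().__init__()
        self.num_modes, self.horizon = num_modes, horizon
        self.head = nn.Linear(feat_dim, num_modes * horizon * 2)

    def forward(self, vehicle_feats: torch.Tensor) -> torch.Tensor:
        # vehicle_feats: (num_vehicles, feat_dim) -> (num_vehicles, M, T, 2)
        out = self.head(vehicle_feats)
        return out.view(-1, self.num_modes, self.horizon, 2)

head = SharedMotionHead()
feats = torch.randn(5, 128)          # ego + 4 detected vehicles
futures = head(feats)                # (5, 6, 10, 2) candidate xy waypoints
print(futures.shape)
```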
Learning to Drive From a World on Rails
Dian Chen,
Vladlen Koltun,
Philipp Krähenbühl
(Oral Presentation) International Conference on Computer Vision (ICCV), 2021
website /
code /
video /
arxiv
We present a model-based RL method for autonomous driving and navigation tasks. The world model is factorized into a passively moving environment, and a compact ego component. Our method significantly simplifies reinforcement learning. It ranks first on the CARLA leaderboard, and outperforms state-of-the-art imitation learning and model-free reinforcement learning on driving tasks. It is also an order of magnitude more sample efficient than model-free RL on the navigation games in the ProcGen benchmark.
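The sketch below is a toy illustration of why the factorization helps (not the paper's implementation): with the environment frozen "on rails", a Bellman backup only has to sweep the compact ego state, here a 1-D position index with made-up dynamics, rewards, and horizon.

```python
import numpy as np

# Toy "world on rails" value backup. The environment rollout env[t] is
# treated as fixed (it does not react to the ego), so only the compact ego
# state needs dynamic programming. All sizes and rewards are placeholders.
T, NUM_POS, NUM_ACT = 20, 50, 3            # horizon, ego positions, actions
env = np.random.rand(T, NUM_POS)           # precomputed per-step reward term
GAMMA = 0.9

def ego_step(pos: int, act: int) -> int:
    """Compact ego dynamics: actions move the ego -1/0/+1 cells."""
    return int(np.clip(pos + (act - 1), 0, NUM_POS - 1))

# Backward induction over the frozen environment rollout.
V = np.zeros((T + 1, NUM_POS))
Q = np.zeros((T, NUM_POS, NUM_ACT))
for t in reversed(range(T)):
    for pos in range(NUM_POS):
        for act in range(NUM_ACT):
            nxt = ego_step(pos, act)
            Q[t, pos, act] = env[t, pos] + GAMMA * V[t + 1, nxt]
        V[t, pos] = Q[t, pos].max()

print(V[0].max())   # value of the best start position under the frozen world
```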
Learning by Cheating
Dian Chen,
Brady Zhou,
Vladlen Koltun,
Philipp Krähenbühl
Conference on Robot Learning (CoRL), 2019
website /
code /
video /
arxiv
We present a two-stage imitation learning method for vision-based driving. Our approach achieves 100% success rate on all tasks in the original CARLA benchmark, sets a new record on the NoCrash benchmark, and reduces the frequency of infractions by an order of magnitude compared to the prior state of the art.
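Here is a minimal two-stage distillation sketch in the spirit of the method (not our actual networks or data pipeline): a privileged teacher is first trained on ground-truth state, then a vision-based student imitates the frozen teacher from images; all shapes and the synthetic batches are placeholders.

```python
import torch
import torch.nn as nn

# Stage 1: a "privileged" teacher maps ground-truth state (e.g. a BEV map)
# to waypoints. Stage 2: a vision student imitates the teacher from images.
teacher = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128),
                        nn.ReLU(), nn.Linear(128, 10))

def fake_batch(n: int = 16):
    state = torch.randn(n, 64)              # privileged ground-truth state
    image = torch.randn(n, 3, 32, 32)       # the student's camera view
    expert = torch.randn(n, 10)             # expert waypoints (flattened)
    return state, image, expert

opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-3)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)

# Stage 1: teacher imitates the expert using privileged state.
for _ in range(100):
    state, _, expert = fake_batch()
    loss = nn.functional.mse_loss(teacher(state), expert)
    opt_t.zero_grad()
    loss.backward()
    opt_t.step()

# Stage 2: student imitates the (now frozen) teacher from images only.
for _ in range(100):
    state, image, _ = fake_batch()
    with torch.no_grad():
        target = teacher(state)             # cheap on-the-fly supervision
    loss = nn.functional.mse_loss(student(image), target)
    opt_s.zero_grad()
    loss.backward()
    opt_s.step()
```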
Learning Instance Segmentation by Interaction
Deepak Pathak*,
Fred Shentu*,
Dian Chen*,
Pulkit Agrawal*,
Trevor Darrell,
Sergey Levine,
Jitendra Malik (*equal contribution)
Robotics Vision Workshop, Conference on Computer Vision and Pattern Recognition (CVPR), 2018
website /
arxiv
We present a robotic system that learns to segment its visual observations into individual objects by experimenting with its environment in a completely self-supervised manner. Our system is on par with a state-of-the-art instance segmentation algorithm trained with strong supervision.
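As a toy illustration of interaction as supervision (not the paper's pipeline), the snippet below derives a binary pseudo-mask from the pixels that change after a poke; a segmentation network would then be trained on many such noisy labels. The threshold and image sizes are arbitrary.

```python
import numpy as np

def motion_pseudo_mask(before: np.ndarray, after: np.ndarray,
                       thresh: float = 25.0) -> np.ndarray:
    """Toy self-supervised label: pixels that changed after a robot poke."""
    diff = np.abs(after.astype(float) - before.astype(float)).mean(axis=-1)
    return diff > thresh          # (H, W) boolean mask of the poked object

before = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
after = before.copy()
after[20:40, 20:40] = 255         # pretend an object moved here
print(motion_pseudo_mask(before, after).sum())
```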
Zero-Shot Visual Imitation
Deepak Pathak*,
Parsa Mahmoudieh*,
Michael Luo*,
Pulkit Agrawal*,
Dian Chen,
Fred Shentu,
Evan Shelhamer,
Jitendra Malik,
Alexei Efros,
Trevor Darrell (*equal contribution)
(Oral Presentation) International Conference on Learning Representations (ICLR), 2018
website /
arxiv
We present a novel skill policy architecture and a dynamics consistency loss that extend visual imitation to more complex environments while improving robustness. Experimental results are shown on a robot knot-tying task and a first-person visual navigation task.
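The sketch below illustrates one way to write a forward-consistency loss (placeholder linear models and feature sizes, not the paper's architecture): the inverse model's predicted action is scored by whether a forward model maps it to the correct next state, rather than by matching the ground-truth action exactly.

```python
import torch
import torch.nn as nn

# Any action that reaches the same next state is acceptable, so the inverse
# model is supervised through the forward model instead of an action label.
STATE, ACTION = 32, 4
inverse_model = nn.Linear(2 * STATE, ACTION)      # (s_t, s_{t+1}) -> a_t
forward_model = nn.Linear(STATE + ACTION, STATE)  # (s_t, a_t) -> s_{t+1}

def forward_consistency_loss(s_t: torch.Tensor, s_next: torch.Tensor):
    a_hat = inverse_model(torch.cat([s_t, s_next], dim=-1))
    s_pred = forward_model(torch.cat([s_t, a_hat], dim=-1))
    return nn.functional.mse_loss(s_pred, s_next)

s_t, s_next = torch.randn(8, STATE), torch.randn(8, STATE)
print(forward_consistency_loss(s_t, s_next).item())
```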
Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation
Ashvin Nair*,
Dian Chen*,
Pulkit Agrawal*,
Phillip Isola,
Jitendra Malik,
Pieter Abbeel,
Sergey Levine (*equal contribution)
IEEE International Conference on Robotics and Automation (ICRA), 2017
website /
arxiv
We present a system in which a robot takes as input a sequence of images of a human manipulating a rope from an initial to a goal configuration, and outputs a sequence of actions that reproduce the human demonstration, using only monocular images as input.
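Here is a toy inverse-model sketch of the idea (not our actual network or action parameterization): given the current image and the next image of the human demonstration, predict the action that moves the rope toward that subgoal. The CNN and the four-dimensional action are assumptions.

```python
import torch
import torch.nn as nn

class InverseModel(nn.Module):
    """Predict the action that transforms the current image into the goal image.

    Illustrative only: a tiny CNN and a generic 4-D action (e.g. pick xy +
    drop xy) stand in for the paper's model.
    """
    def __init__(self, action_dim: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(2 * 32, action_dim)

    def forward(self, current_img: torch.Tensor, goal_img: torch.Tensor):
        feat = torch.cat([self.encoder(current_img),
                          self.encoder(goal_img)], dim=-1)
        return self.head(feat)      # action toward the demonstrated subgoal

model = InverseModel()
current = torch.randn(1, 3, 64, 64)
goal = torch.randn(1, 3, 64, 64)    # next frame of the human demonstration
print(model(current, goal).shape)   # (1, 4)
```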