Research Assistant Professor
Shanghai Jiao Tong University, School of Artificial Intelligence
Assistant Researcher and Master's Student Advisor, School of Artificial Intelligence, Shanghai Jiao Tong University
Member of Machine Vision and Intelligence Group (MVIG) at SJTU
Email: siriusyang at sjtu dot edu dot cn
Office: Bldg. SAI, No. 1954 Huashan Rd., Xuhui Dist., Shanghai, 200230, China
About.  I’m a Research Assistant Professor at Shanghai Jiao Tong University (SJTU),
affiliated with the School of Artificial Intelligence (SAI), which I joined in September 2024.
I obtained my Ph.D. degree in Computer Science from SJTU in 2023, advised by Prof. Cewu Lu
at the Machine Vision and Intelligence Group, and my M.S. degree in Mechanical Engineering from SJTU.
My research interests include 3D Vision and Robotics.
Currently, I am focusing on modeling and imitating how hands manipulate objects,
including 3D hand | object pose | shape estimation,
grasp | motion generation, imitation learning, and dexterous manipulation.
Join Us.  I am looking for Master's students at SJTU SAI and self-motivated research
interns. Contact me if you are interested in the above topics.
We sincerely welcome research interns (paid). Let's do interesting research together.
A cross-embodiment framework that transfers wheeled-humanoid data to bipedal VLA models via morphology-agnostic 6D end-effector trajectories and a heuristic-enhanced online DAgger controller.
Multi-view Hand Reconstruction with a Point-Embedded Transformer
POEM-v2: a generalizable multi-view 3D hand reconstruction model trained on large-scale multi-view datasets. It enables accurate, flexible, and occlusion-robust hand mesh recovery across arbitrary multi-view setups.
AirExo-2: Scaling up Generalizable Robotic Imitation Learning with Low-Cost Exoskeletons
AirExo-2, a low-cost exoskeleton system for large-scale in-the-wild demonstration
collection.
It transforms the collected in-the-wild demonstrations into pseudo-robot demonstrations.
RISE-2,
a generalizable imitation policy that integrates 2D and 3D perceptions.
Dense Policy: Bidirectional Autoregressive Learning of Actions
A bidirectionally expanded learning approach that enhances auto-regressive policies for robotic
manipulation. It employs a lightweight encoder-only architecture to iteratively unfold the action
sequence from an initial single frame into the target sequence in a coarse-to-fine manner with
logarithmic-time inference.
Motion Before Action: Diffusing Object Motion as Manipulation Condition
An MLLM-based method that infuses language instructions into grasp generation; & A new
language-pose dataset, CapGrasp, featuring detailed captions of grasping poses.
OakInk2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
A 4D motion dataset focusing on bimanual object manipulation tasks involved in complex daily
activities; & A three-tiered task abstraction: Object Affordance, Primitive Task, and
Complex Task, to systematically organize manipulation tasks.
FAVOR: Full-Body AR-Driven Virtual Object Rearrangement Guided by Instruction Text
A full-body human motion dataset that captures text-guided desktop object rearrangement through
MoCap and AR glasses; & A pipeline for generating an avatar's object rearrangement motion driven by
text instructions.
Color-NeuS: Reconstructing Neural Implicit Surfaces with Color
Reconstructing 3D implicit surfaces with accurate, view-independent surface color by decoupling view-dependent shading from geometry. It combines a global color network and a relighting network to preserve volume rendering performance while enabling colored mesh extraction.
CHORD: Category-level in-Hand Object Reconstruction via Shape Deformation
A single-view hand-held object reconstruction method that exploits the categorical shape prior to
reconstruct the shape of intra-class objects; & A new synthetic dataset, COMIC, that contains a
category-level collection of objects with diverse shapes, materials, interacting poses, and viewing
directions.
POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo
A multi-view hand mesh recovery (HMR) method built on a Transformer. It leverages the "power of points",
including Basis Point Sets, point positional encoding, and a point Transformer, to unify and merge
information from sparsely arranged cameras.
DART: Articulated Hand Model with Diverse Accessories and Rich Textures
A MANO-derived hand model that contains exquisite hand-crafted texture maps, varying in
appearance and covering different kinds of blemishes, make-ups, and accessories.
OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction
A dataset that focuses on human grasps based on object affordance.
It contains two knowledge bases: 1) Object affordance knowledge (Oak) and 2) Interaction knowledge
(Ink).
A new model: Tink, for transferring interaction pose from one object to another.
ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis
CVPR, 2022   (Oral Presentation) paper / arXiv / code
An online data synthesis tool for articulated hand(-object) pose estimation.
A grasp synthesis method that can generate dexterous hand grasping poses for arbitrary objects.
Learning a Contact Potential Field to Model the Hand-Object Interaction
A novel contact representation (CPF) that is used to improve the modeling of physical hand-object interaction.
A hybrid learning-fitting framework (MIHO) that aligns the top-down pose estimation with bottom-up
contact modeling.
CPF: Learning a Contact Potential Field to Model the Hand-Object Interaction
A novel contact representation (CPF) that is used to improve the modeling of physical hand-object interaction.
A hybrid learning-fitting framework (MIHO) that aligns the top-down pose estimation with bottom-up
contact modeling.
HybrIK-X: Hybrid Analytical-Neural Inverse Kinematics for Whole-body Mesh Recovery
A hybrid inverse kinematics method for 3D body mesh recovery, combining 3D keypoint estimation and
body mesh
recovery. HybrIK-X extends this to model hands and faces, offering fast, accurate whole-body pose
estimation.
HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation
NeurIPS, 2024 paper
Part-aware diffusion for articulated object manipulation and controlled image editing in real scenes.
Talks
Human-Robot Data Companion: Pipeline and Representation. [2025.09]
SII TechFest workshop: Embodied AI Reasoning and Scaling. Thanks to Panpan Cai for hosting.
Paving the Way for Understanding Human Interactions with Objects: The OakInk2 Dataset. [2023.08]
ICCV 2023 HANDS Workshop. Thanks to Linlin Yang for hosting.