Meta introduces HOT3D: A dataset for advancing hand-object interaction research


SOURCE: PHILAVERSE.SUBSTACK.COM
JAN 03, 2025

PHIL SIARRI

Meta Reality Labs has introduced HOT3D, a publicly available dataset designed to enhance machine learning research on hand-object interactions.

The dataset comprises over 833 minutes of multi-view egocentric video streams, captured using Meta's Project Aria glasses and Quest 3 VR headset. It includes 3.7 million annotated images, offering high-quality data on 19 subjects interacting with 33 diverse objects in real-world tasks.

Key features of HOT3D include:

Multi-modal data: RGB/monochrome image streams, eye gaze tracking, and 3D point clouds.

Comprehensive annotations: 3D poses of objects, hands, and cameras; 3D models of hands and objects (a rough per-frame layout is sketched after this list).

Real-world scenarios: Demonstrations range from basic object manipulation to complex activities like typing or using kitchen utensils.
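Concretely, each captured frame pairs the image streams with this ground-truth metadata. A minimal Python sketch of what such a per-frame record could hold is shown below; the class and field names are illustrative assumptions, not the official HOT3D schema or toolkit API.

```python
from dataclasses import dataclass
from typing import Dict, Optional
import numpy as np

@dataclass
class FrameAnnotation:
    """Illustrative per-frame record; not the official HOT3D data layout."""
    timestamp_ns: int                          # capture timestamp in nanoseconds
    camera_poses: Dict[str, np.ndarray]        # stream id -> 4x4 camera-to-world transform
    hand_poses: Dict[str, np.ndarray]          # "left"/"right" -> hand pose parameters
    object_poses: Dict[str, np.ndarray]        # object id -> 4x4 6DoF object pose
    eye_gaze: Optional[np.ndarray] = None      # gaze direction vector (Aria recordings only)
    point_cloud: Optional[np.ndarray] = None   # (N, 3) SLAM point cloud (Aria recordings only)
```

Representing camera and object poses as 4x4 rigid transforms is a common convention that keeps hands, objects, and cameras in one shared coordinate frame.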

Ground-truth poses were captured with a professional motion-capture system, and hand annotations are provided in both the UmeTrack and MANO formats. Initial experiments showed that models trained on HOT3D's multi-view data outperform those trained on single-view data on tasks such as 3D hand tracking, 6DoF object pose estimation, and 3D lifting of objects.
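For context, MANO describes each hand with roughly ten shape coefficients, 45 articulation parameters (15 finger joints, three axis-angle values each), and a global wrist pose. The sketch below shows that standard parameterization in Python as a rough illustration; it is not HOT3D's exact annotation file format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ManoHandParams:
    """Standard MANO-style hand parameterization (illustrative sketch)."""
    betas: np.ndarray          # (10,) shape coefficients describing hand geometry
    hand_pose: np.ndarray      # (45,) axis-angle rotations for 15 finger joints
    global_orient: np.ndarray  # (3,)  axis-angle rotation of the wrist
    transl: np.ndarray         # (3,)  wrist translation in metres

    def __post_init__(self):
        # Basic shape checks so malformed annotations fail early.
        assert self.betas.shape == (10,)
        assert self.hand_pose.shape == (45,)
        assert self.global_orient.shape == (3,)
        assert self.transl.shape == (3,)
```

Providing MANO-format annotations makes it straightforward to plug HOT3D into existing hand-reconstruction pipelines that already consume MANO parameters.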

Released as an open dataset, HOT3D aims to drive innovation in robotics, AR/VR systems, and human-machine interfaces by providing a robust foundation for advances in computer vision and machine learning.


HOT3D overview: The dataset features multi-view egocentric image streams captured using the Aria glasses and Quest 3 headset, annotated with precise 3D poses and models of hands and objects. On the left, three multi-view frames from Aria display contours of 3D models for hands (white) and objects (green) in their ground-truth poses. Additionally, Aria provides 3D point clouds generated by SLAM and includes eye gaze tracking data (right). Credit: Banerjee et al.