Scores of different models across the three tasks in P3D-Bench. The Score is the average of the four bucket scores (Geometry, Topology, Judge, Part), rescaled to 0–100. Multimodal large language ...
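The aggregation described in the caption can be sketched as follows. This is a minimal illustration, not the benchmark's own code: it assumes each of the four bucket scores is already normalized to the range 0 to 1 before rescaling, and the function name `overall_score` and the dictionary layout are hypothetical.

```python
from statistics import mean

# Bucket names taken from the caption.
BUCKETS = ("Geometry", "Topology", "Judge", "Part")


def overall_score(bucket_scores: dict[str, float]) -> float:
    """Average the four bucket scores and rescale to 0-100.

    Assumption (not stated in the source): every bucket score lies in [0, 1].
    """
    missing = [name for name in BUCKETS if name not in bucket_scores]
    if missing:
        raise KeyError(f"missing bucket scores: {missing}")
    for name in BUCKETS:
        value = bucket_scores[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
    # Unweighted mean of the four buckets, then rescale to 0-100.
    return 100.0 * mean(bucket_scores[name] for name in BUCKETS)


# Hypothetical values, for illustration only.
example = {"Geometry": 0.5, "Topology": 0.5, "Judge": 1.0, "Part": 0.0}
print(overall_score(example))
```

If the bucket scores are reported on a different native scale, the normalization step would need to change accordingly; the unweighted mean is the only part the caption states directly.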
HOI-DETR is a transformer-based framework for detecting hands, hand-held objects, and their interactions in images and video. Built on the Co-DETR architecture, it adds a lightweight interaction ...