AI-powered spatial scene understanding with object size estimation from single images
This project combines GroundingDINO, SAM, and Depth Anything V2 to estimate real-world object sizes from single images. Users provide a reference object with known dimensions to calibrate measurements of other objects in the scene.
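The reference calibration described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes a pinhole camera, where real size is proportional to pixel extent times depth, and it assumes the depth values are proportional to true distance. Relative depth output would first need converting to that form. The names `Measurement`, `measure_mask`, `calibrate_scale`, and `estimate_size` are hypothetical, and masks and depth maps are plain nested lists to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Measurement:
    pixel_width: float   # bounding-box width of the mask, in pixels
    pixel_height: float  # bounding-box height of the mask, in pixels
    depth: float         # median depth over the mask (assumed proportional to distance)


def measure_mask(mask, depth_map) -> Measurement:
    """Summarise one object from a binary mask and a depth map of the same shape.

    mask: rows of 0/1 values; depth_map: rows of floats.
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask is empty")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    depths = [depth_map[r][c] for r, c in coords]
    return Measurement(
        pixel_width=max(cols) - min(cols) + 1,
        pixel_height=max(rows) - min(rows) + 1,
        depth=median(depths),
    )


def calibrate_scale(ref: Measurement, ref_real_width: float) -> float:
    """Return k such that real_size = k * pixel_extent * depth (assumed pinhole model)."""
    if ref.pixel_width <= 0 or ref.depth <= 0:
        raise ValueError("reference needs positive pixel width and depth")
    return ref_real_width / (ref.pixel_width * ref.depth)


def estimate_size(target: Measurement, k: float) -> tuple:
    """Estimate (width, height) of a target in the same unit as the reference width."""
    return (
        k * target.pixel_width * target.depth,
        k * target.pixel_height * target.depth,
    )


# Hypothetical usage: a reference 8.56 cm wide that spans 100 px at depth 2.0,
# and a target object at the same depth spanning 400 x 250 px.
reference = Measurement(pixel_width=100, pixel_height=63, depth=2.0)
k = calibrate_scale(reference, ref_real_width=8.56)
target_width_cm, target_height_cm = estimate_size(
    Measurement(pixel_width=400, pixel_height=250, depth=2.0), k
)
```

Taking the median depth over the mask rather than the mean is a design choice in this sketch, since it is less sensitive to depth outliers along the mask boundary.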
AI Models: GroundingDINO • SAM • Depth Anything V2
Metrics: Detection Success • Measurement Accuracy • Processing Time

Depth Anything V2 depth map

SAM precise masks

Combined visualization
Result: Laptop dimensions estimated with 30% accuracy using reference calibration. See detailed analysis →