Can objects that are not visible in an image—but are in the vicinity of the camera—be detected? This study introduces the novel tasks of 2D, 2.5D, and 3D unobserved object detection for predicting the location of nearby objects that are occluded or lie outside the image frame. We adapt several state-of-the-art pre-trained generative models, including diffusion and vision-language models, and show they can infer the presence of unseen objects. Our benchmark metrics and empirical evaluations on indoor scenes (RealEstate10k and NYU Depth V2 datasets) support this approach.
Unobserved Object Detection (UOD) is the task of inferring the presence and spatial location of objects that are not directly visible within an image frame but are present in the surrounding environment. This includes objects that are occluded or lie just outside the camera's field of view. The task is explored in three settings: 2D images with partial views, 2.5D scenes (2D images augmented with depth information), and 3D scenes with occlusions.
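To make the 2D out-of-frame case concrete, the following is a minimal sketch of how one might decide whether an object counts as unobserved because it lies outside the image frame. It is an illustration only, not the method or evaluation protocol of the paper: the `Box` class and the `visible_fraction` and `is_out_of_frame` helpers are hypothetical names, the boxes are assumed to be given in pixel coordinates on the extended image plane, and occlusion is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in image-plane pixel coordinates.

    Hypothetical helper type: coordinates may fall outside the frame,
    which spans [0, width] x [0, height].
    """
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def visible_fraction(box: Box, width: int, height: int) -> float:
    """Return the fraction of the box's area that lies inside the frame."""
    area = (box.x_max - box.x_min) * (box.y_max - box.y_min)
    if area <= 0:
        raise ValueError("box must have positive area")
    # Overlap of the box with the frame along each axis (zero if disjoint).
    inter_w = max(0.0, min(box.x_max, width) - max(box.x_min, 0.0))
    inter_h = max(0.0, min(box.y_max, height) - max(box.y_min, 0.0))
    return inter_w * inter_h / area


def is_out_of_frame(box: Box, width: int, height: int) -> bool:
    """True if no part of the box is inside the frame (assumed criterion)."""
    return visible_fraction(box, width, height) == 0.0


if __name__ == "__main__":
    frame_w, frame_h = 640, 480
    lamp = Box(700, 100, 800, 200)   # entirely to the right of the frame
    chair = Box(600, 100, 700, 200)  # straddles the right edge
    print(is_out_of_frame(lamp, frame_w, frame_h))
    print(visible_fraction(chair, frame_w, frame_h))
```

In this sketch, an object whose box has no overlap with the frame would be a target for unobserved detection, while a partially visible object could be handled by a threshold on `visible_fraction`; the choice of threshold is left open here.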
Once the code is released, detailed instructions for running the experiments and reproducing the results will be provided.
@inproceedings{bhattacharjee2025uod,
  title={{Believing is Seeing}: Unobserved Object Detection using Generative Models},
  author={Bhattacharjee, Subhransu S. and Campbell, Dylan and Shome, Rahul},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025},
  note={To Appear}
}