Believing is Seeing: Unobserved Object Detection using Generative Models

CVPR 2025

Subhransu S. Bhattacharjee, Dylan Campbell, Rahul Shome

Unobserved Object Detection Example
Illustration of Unobserved Object Detection (UOD) using generative models.

Abstract

Can objects that are not visible in an image—but are in the vicinity of the camera—be detected? This study introduces the novel tasks of 2D, 2.5D, and 3D unobserved object detection for predicting the location of nearby objects that are occluded or lie outside the image frame. We adapt several state-of-the-art pre-trained generative models, including diffusion and vision-language models, and show they can infer the presence of unseen objects. Our benchmark metrics and empirical evaluations on indoor scenes (RealEstate10k and NYU Depth V2 datasets) support this approach.

Task Definition

Unobserved Object Detection (UOD) is the task of inferring the presence and spatial location of objects that are not directly visible within an image frame but are present in the surrounding environment. This includes objects that are occluded or lie just outside the camera's field-of-view. The task is explored in three settings — 2D images with partial views, 3D scenes with occlusions, and 2.5D scenes (2D images augmented with depth information).

Resources

Read Paper Code Coming Soon

Results

Detection Results
Each row shows the predicted 2D and top-down 3D spatial distributions generated by each method for various object categories: TV (first row), refrigerator (second row), sink (third row), laptop (fourth row), and sink (fifth row). Notably, in the bottom row, the DFM-based model infers the likely presence of a sink, occluded by the refrigerator, albeit not with a high likelihood. A white triangle marks the camera position, while dashed and dot-dashed lines depict the camera frustums for \(\mathcal{I}\) and \(\mathbb{I}\). The white star indicates the ground-truth position of the object, when visible in 2D. Heatmap colors indicate object likelihood, with warmer tones representing higher probabilities. Since these are spatially-normalized distributions, we use a log-scale for visualization.

Usage

Once released, detailed instructions for running the experiments and reproducing results will be provided.

Cite As

@inproceedings{bhattacharjee2025uod,
  title={{Believing is Seeing}: Unobserved Object Detection using Generative Models},
  author={Bhattacharjee, Subhransu S. and Campbell, Dylan and Shome, Rahul},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025},
  note={To Appear}
}