This paper extends traditional object detection, allowing autonomous systems not only to localize visible objects but also to predict the positions of objects in unobserved regions.
Can objects that are not visible in an image, yet are near the camera, be detected? We introduce 2D, 2.5D, and 3D unobserved object detection, adapting pretrained generative models (2D and 3D diffusion models, and vision–language models) to infer occluded or out-of-frame objects. We benchmark on RealEstate10k and NYU Depth V2 with metrics that capture different aspects of performance.
The task of unobserved object detection is to identify objects that are present in the scene but outside the camera frustum by predicting, from a single RGB image, a conditional spatio-semantic distribution, i.e., a heatmap over spatial regions and object labels.
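To make this output representation concrete, the following Python sketch shows one possible encoding of a spatio-semantic distribution. It assumes, hypothetically, that the space around the camera is discretized into grid cells and that a model emits one unnormalized score (logit) per (cell, label) pair. The function names, the grid, and the joint softmax normalization are illustrative choices only, not the method used in this paper.

```python
import math
from typing import Dict, Tuple

# Hypothetical discretization: a scene location is an integer grid cell (row, col),
# e.g. a cell of a bird's-eye-view grid around the camera.
Cell = Tuple[int, int]
Key = Tuple[Cell, str]  # (grid cell, object label)


def spatio_semantic_distribution(logits: Dict[Key, float]) -> Dict[Key, float]:
    """Joint softmax over all (cell, label) pairs, giving p(cell, label | image)."""
    m = max(logits.values())  # subtract the max logit for numerical stability
    exp = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def label_heatmap(p: Dict[Key, float], label: str) -> Dict[Cell, float]:
    """Spatial heatmap p(cell | label, image); empty if the label never occurs."""
    joint = {cell: prob for (cell, lab), prob in p.items() if lab == label}
    total = sum(joint.values())
    return {cell: prob / total for cell, prob in joint.items()}


def label_marginal(p: Dict[Key, float]) -> Dict[str, float]:
    """p(label | image): total probability mass per label, summed over all cells."""
    out: Dict[str, float] = {}
    for (_, lab), prob in p.items():
        out[lab] = out.get(lab, 0.0) + prob
    return out


if __name__ == "__main__":
    # Toy example: two cells and two labels, with logits set to log-weights 1, 3, 2, 2.
    logits = {
        ((0, 0), "chair"): math.log(1.0),
        ((0, 1), "chair"): math.log(3.0),
        ((0, 0), "bed"): math.log(2.0),
        ((0, 1), "bed"): math.log(2.0),
    }
    p = spatio_semantic_distribution(logits)
    print(label_marginal(p))          # chair and bed each receive about half the mass
    print(label_heatmap(p, "chair"))  # most chair mass lies in cell (0, 1)
```

A joint softmax forces all probability mass to sum to one across both locations and labels. An equally hypothetical alternative is an independent probability per (cell, label) pair, which may better suit scenes that contain several unobserved objects at once.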
Subhransu is supported by the international University Research Scholarship at the Australian National University. This research was partially funded by the U.S. Government under DARPA TIAMAT HR00112490421. The views and conclusions expressed in this document are solely those of the authors and do not represent the official policies or endorsements, either expressed or implied, of the U.S. Government. This research was also funded by the Australian Research Council under the scheme ITRH IH210100030.