Blocks-World Cameras

For several vision and robotics applications, the 3D geometry of man-made environments such as indoor scenes can be represented with a small number of dominant planes. However, conventional 3D vision techniques typically first acquire dense 3D point clouds and only then estimate the compact piecewise-planar representation (e.g., by plane fitting). This approach is costly in terms of both acquisition and computation, and it is potentially unreliable because the point clouds can be noisy. We propose Blocks-World Cameras, a class of imaging systems that directly recover the dominant planes of piecewise-planar scenes (Blocks-World) without requiring point clouds. Blocks-World Cameras are based on a structured-light system that projects a single pattern with a sparse set of cross-shaped features. We develop a novel geometric algorithm that recovers scene planes without explicit correspondence matching, thereby avoiding computationally intensive search or optimization routines. The proposed approach has low device and computational complexity and requires capturing only one or two images. We demonstrate highly efficient and precise sensing of planar scenes in simulations and real experiments across various imaging conditions, including defocus blur, large lighting variations, ambient illumination, and scene clutter.


Jongho Lee, Mohit Gupta

Proc. CVPR 2021

oral presentation

Imaging principle

(a) Blocks-World Cameras are based on a structured-light system consisting of a projector, which projects a single pattern onto the scene, and a camera, which captures the images. (b) The pattern consists of a sparse set of cross-shaped features, which are mapped to cross-shaped features in the image via the homographies induced by the scene planes.
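To make the homography mapping in (b) concrete, here is a minimal sketch that maps one cross-shaped pattern feature into the image through a plane-induced homography. It assumes pinhole models for both devices, a scene plane written as n·X = d in projector coordinates, and camera coordinates X_c = R X + t, which gives H = K_c (R + t n^T / d) K_p^-1. The helper names and the "+"-shaped cross with an `arm` half-length are illustrative assumptions, not the paper's pattern specification.

```python
def matmul3(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def plane_homography(K_c, K_p_inv, R, t, n, d):
    """Homography from projector pixels to camera pixels induced by the plane n.X = d.

    Assumes X_c = R X + t maps projector coordinates to camera coordinates.
    """
    M = [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]
    return matmul3(K_c, matmul3(M, K_p_inv))


def apply_h(H, pt):
    """Apply a homography to a 2D point, including the homogeneous division."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def map_cross(H, center, arm):
    """Map a hypothetical '+'-shaped feature with two arms of half-length `arm`.

    A homography maps lines to lines, so each arm stays a line segment,
    but its length and orientation depend on the plane parameters (n, d).
    """
    cx, cy = center
    horiz = (apply_h(H, (cx - arm, cy)), apply_h(H, (cx + arm, cy)))
    vert = (apply_h(H, (cx, cy - arm)), apply_h(H, (cx, cy + arm)))
    return horiz, vert


EYE = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Normalized coordinates (identity intrinsics), pure translation t = (0.1, 0, 0),
# fronto-parallel plane z = 2.
H = plane_homography(EYE, EYE, EYE, [0.1, 0.0, 0.0], [0.0, 0.0, 1.0], 2.0)
print(map_cross(H, (0.0, 0.0), 0.1))
```

In this fronto-parallel case H reduces to a constant horizontal shift of t_x / d = 0.05, the familiar disparity. Tilting the plane (changing n) additionally stretches or shears the mapped arms, which is the local plane information each imaged cross carries.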

Plane estimation from a known feature correspondence

(a) Line segments u_p and u_c, from the pattern feature and the image feature respectively, each define a plane together with the projector or camera center; this pair of planes meets at a 3D line l_u. Similarly, v_p and v_c create l_v. (b) l_u and l_v define a 3D scene plane, which can therefore be estimated from a known correspondence between an image feature and a pattern feature.
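The construction in (a) and (b) reduces to a few cross and dot products. The sketch below is a minimal illustration under simplifying assumptions of our own, not the paper's calibration model: both devices have the same orientation (pure translation), points are in normalized coordinates (intrinsics removed), planes are written n·X = d, and the cross arms are not parallel to the epipolar lines (otherwise the two back-projected planes coincide and l_u or l_v is undefined). `project_via_plane` is a hypothetical helper that only synthesizes a correspondence for the round-trip check.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(a, s):
    return tuple(x * s for x in a)


def add(*vs):
    return tuple(sum(c) for c in zip(*vs))


def backprojected_plane(center, seg):
    """Plane (n, d), n.X = d, through a device center and a 2D segment."""
    (x0, y0), (x1, y1) = seg
    n = cross((x0, y0, 1.0), (x1, y1, 1.0))  # spanned by the two viewing rays
    return n, dot(n, center)


def intersect3(p1, p2, p3):
    """Point common to three planes (n_i, d_i)."""
    (n1, d1), (n2, d2), (n3, d3) = p1, p2, p3
    det = dot(n1, cross(n2, n3))
    num = add(scale(cross(n2, n3), d1), scale(cross(n3, n1), d2),
              scale(cross(n1, n2), d3))
    return scale(num, 1.0 / det)


def plane_from_cross(proj_c, cam_c, u_p, u_c, v_p, v_c):
    """Scene plane (unit n, d >= 0) from one known pattern/image cross correspondence."""
    Pu_p, Pu_c = backprojected_plane(proj_c, u_p), backprojected_plane(cam_c, u_c)
    Pv_p, Pv_c = backprojected_plane(proj_c, v_p), backprojected_plane(cam_c, v_c)
    l_u = cross(Pu_p[0], Pu_c[0])  # direction of the 3D line l_u
    l_v = cross(Pv_p[0], Pv_c[0])  # direction of the 3D line l_v
    n = cross(l_u, l_v)            # the scene plane contains both lines
    n = scale(n, 1.0 / math.sqrt(dot(n, n)))
    X = intersect3(Pu_p, Pu_c, Pv_p)  # 3D cross center, a point on l_u
    d = dot(n, X)
    return (scale(n, -1.0), -d) if d < 0 else (n, d)


def project_via_plane(n, d, cam_c, p):
    """Cast the projector ray through pattern point p onto n.X = d, then image it."""
    r = (p[0], p[1], 1.0)
    X = scale(r, d / dot(n, r))
    Y = add(X, scale(cam_c, -1.0))
    return (Y[0] / Y[2], Y[1] / Y[2])


# Round trip on a tilted plane; arms at +/-45 degrees to horizontal epipolar lines.
proj_c, cam_c = (0.0, 0.0, 0.0), (0.1, 0.0, 0.0)
n_true = scale((0.2, -0.1, 1.0), 1.0 / math.sqrt(1.05))
u_p = ((-0.05, -0.05), (0.05, 0.05))
v_p = ((-0.05, 0.05), (0.05, -0.05))
u_c = tuple(project_via_plane(n_true, 2.0, cam_c, p) for p in u_p)
v_c = tuple(project_via_plane(n_true, 2.0, cam_c, p) for p in v_p)
n_est, d_est = plane_from_cross(proj_c, cam_c, u_p, u_c, v_p, v_c)
print(n_est, d_est)
```

Because each line direction is the intersection of two back-projected planes, no point cloud or per-pixel triangulation is needed: one correspondence yields both the normal and the offset.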

Plane estimation from unknown correspondences: non-uniform pattern features

(a), (b) N features are placed on an epipolar line of the pattern with (upper row) uniform and (lower row) non-uniform spacing; M of them are imaged as image features. (c) Pairing an image feature I_1 with all the pattern features on the corresponding epipolar line creates a plane parameter locus in the Π space. The locus lies on a plane parallel to the (D − θ) plane. (d, upper row) With a uniform pattern feature distribution, the loci of two different image features lying on the same scene plane overlap substantially, making it impossible to determine the true scene plane containing the features. (d, lower row) With a non-uniform feature distribution, however, the true scene plane can be determined uniquely.
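The ambiguity in (d) can be reproduced in a deliberately reduced setting. Instead of the full Π space, this sketch considers only disparity along one epipolar line and a fronto-parallel scene plane, so every feature shares one disparity. Pairing an image feature with every pattern feature on its epipolar line yields a set of candidate disparities (a 1D stand-in for the locus in (c)), and votes accumulate where those sets overlap. This is our simplification for illustration, not the paper's algorithm; the positive-disparity convention, spacings, and feature positions are made up.

```python
from collections import Counter


def candidate_disparities(x_img, pattern_xs, ndigits=6):
    """1D 'locus': disparities from pairing one image feature with every pattern feature.

    Assumes positive disparity (x_img - x_pattern > 0); rounding merges float noise.
    """
    return {round(x_img - xp, ndigits) for xp in pattern_xs if x_img - xp > 0}


def best_disparities(image_xs, pattern_xs):
    """Each image feature votes once per candidate; return all candidates tied for most votes."""
    votes = Counter()
    for x in image_xs:
        votes.update(candidate_disparities(x, pattern_xs))
    top = max(votes.values())
    return sorted(c for c, k in votes.items() if k == top)


# True disparity is 0.5 in both cases; two pattern features are imaged.
uniform = [float(i) for i in range(10)]
non_uniform = [0.0, 1.0, 3.0, 4.5, 7.0, 8.2, 10.5]
print(best_disparities([3.5, 6.5], uniform))      # [0.5, 1.5, 2.5, 3.5]
print(best_disparities([3.5, 7.5], non_uniform))  # [0.5]
```

With uniform spacing s, every image feature produces candidates of the form δ + m·s, so the wrong candidates tie with the true one no matter how many features vote; non-uniform spacing breaks that shared structure.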

Ground truth comparison

(a) A 3D scene with a projected pattern. (b) 2D Π-space with votes; the dominant planes are illustrated at the detected peak locations. (c) Plane boundaries, formed by identifying the image features that voted for each peak. (d) Recovered plane depths and normals. (e) Ground-truth depths and normals.
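The voting and segmentation steps in (b) and (c) follow a Hough-style accumulator pattern. This sketch assumes each image feature has already produced a list of candidate (D, θ) plane parameters (for instance, one per pattern feature on its epipolar line). The binning, the 8-neighbour peak test, and the `min_votes` threshold are illustrative choices, not the paper's exact procedure.

```python
from collections import defaultdict


def accumulate(candidates, d_bin, theta_bin):
    """Bin candidate (D, theta) pairs; map each bin to the set of features voting for it."""
    acc = defaultdict(set)
    for fid, cands in candidates.items():
        for D, theta in cands:
            acc[(int(D // d_bin), int(theta // theta_bin))].add(fid)
    return acc


def find_peaks(acc, min_votes):
    """Bins with >= min_votes voters that are not exceeded by any 8-neighbour.

    Returns (bin, voters) pairs sorted by vote count; the voters of a peak are the
    image features assigned to that plane. Equal adjacent bins would both be
    reported, so a fuller implementation would suppress such duplicates.
    """
    peaks = []
    for b, voters in acc.items():
        n = len(voters)
        if n < min_votes:
            continue
        neighbours = [(b[0] + i, b[1] + j) for i in (-1, 0, 1) for j in (-1, 0, 1)
                      if (i, j) != (0, 0)]
        if all(len(acc.get(nb, ())) <= n for nb in neighbours):
            peaks.append((b, voters))
    return sorted(peaks, key=lambda p: -len(p[1]))


# Hypothetical candidates: features 0-4 share one plane, 5-8 share another,
# and every feature also casts one spurious vote.
cands = {i: [(1.05, 0.35), (0.5 + 0.3 * i, 2.0)] for i in range(5)}
cands.update({i: [(2.55, 1.25), (3.0 + 0.3 * i, 0.05)] for i in range(5, 9)})
peaks = find_peaks(accumulate(cands, 0.1, 0.1), min_votes=3)
for b, voters in peaks:
    print(b, sorted(voters))
```

Storing voter sets per bin, rather than bare counts, is what lets the segmentation in (c) fall out of the same pass that finds the peaks.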

Comparison with plane-fitting

(a) A 3D scene. (b) Depth map captured by a simulated structured-light system. (c) 3D point cloud created from (b). (d, e, f) Plane segmentation results obtained with the randomized 3D Hough transform, RANSAC, and Blocks-World Cameras, respectively. Blocks-World Cameras achieve more accurate plane segmentation than the conventional approaches because each cross-shaped image feature carries local plane information.
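For contrast with the point-cloud pipeline in (c) and (e), here is a generic textbook RANSAC plane fit. It is not the baseline implementation used for the figure; the threshold, iteration count, and synthetic data are illustrative assumptions. Each hypothesis is scored against every point, so the cost grows with both the number of iterations and the size of the cloud, and extracting several dominant planes typically repeats the fit after removing each plane's inliers.

```python
import math
import random


def fit_plane(p, q, r):
    """Plane (unit n, d) with n.X = d through three points, or None if collinear."""
    a = [q[i] - p[i] for i in range(3)]
    b = [r[i] - p[i] for i in range(3)]
    n = [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    return n, sum(n[i] * p[i] for i in range(3))


def ransac_plane(points, threshold, iterations, rng):
    """Return the plane with the most inliers (|n.X - d| < threshold) and their indices."""
    best, best_inliers = None, []
    for _ in range(iterations):
        plane = fit_plane(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [i for i, X in enumerate(points)
                   if abs(sum(n[k] * X[k] for k in range(3)) - d) < threshold]
        if len(inliers) > len(best_inliers):
            best, best_inliers = plane, inliers
    return best, best_inliers


rng = random.Random(0)
plane_pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0) for _ in range(40)]
outliers = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(2, 3)) for _ in range(10)]
(n, d), inliers = ransac_plane(plane_pts + outliers, 0.01, 100, rng)
print(n, d, len(inliers))
```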

Quantitative performance comparison

(a) Plane-parameter error comparison. (b) Run-time comparison. Blocks-World Cameras extract plane parameters with good accuracy and short run-time, without creating a point cloud.

Robustness to defocus blur

(a, b) A scene with varying amounts of defocus blur. (c, d) Measured plane depths and normals. Our approach is robust to defocus blur.

Robustness to ambient light

(a) A scene under different indoor lighting conditions. (b, c) Recovered plane depths and normals. Our shape features are robust to photometric variations.

Robustness to specular reflections and strong textures

(a) Scenes under challenging conditions, with specular reflections and strong textures. (b, c) Plane depths and surface normals reconstructed by Blocks-World Cameras.

Approximating a non-planar scene with a piecewise-planar scene

(a) Cylinder scene. (b) Plane estimation with relatively small and large Π-space bin sizes, respectively.
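The effect of the bin size can be illustrated with a toy calculation, a simplification rather than the paper's procedure. Along a cylinder the surface-normal angle θ varies continuously, so quantizing θ into Π-space bins groups neighbouring features into one plane per occupied bin: smaller bins give more, narrower planes that follow the curvature, while larger bins give fewer, coarser ones. Only θ is quantized here, and the sampling (one hypothetical feature per degree over a 90° arc) is made up.

```python
def planes_from_bins(normal_angles_deg, bin_deg):
    """Group features by quantized normal angle; each occupied bin acts as one plane."""
    groups = {}
    for i, a in enumerate(normal_angles_deg):
        groups.setdefault(a // bin_deg, []).append(i)
    return list(groups.values())


angles = list(range(90))  # one hypothetical feature per degree of the cylinder arc
print(len(planes_from_bins(angles, 5)))   # 18
print(len(planes_from_bins(angles, 30)))  # 3
```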

