Inertial Safety from Structured Light
Proc. ECCV 2020
A novel scene representation that enables fast detection of obstacles in scenarios involving camera or scene motion using single-shot structured light.
We present inertial safety maps (ISM), a novel scene representation designed for fast detection of obstacles in scenarios involving camera or scene motion, such as robot navigation and human-robot interaction. ISM is a motion-centric representation that encodes both scene geometry and motion; different camera motions result in different ISMs for the same scene. We show that ISM can be estimated with a two-camera stereo setup without explicitly recovering scene depths, by measuring differential changes in disparity over time. We develop an active, single-shot structured light-based approach for robustly measuring ISM in challenging scenarios with textureless objects and complex geometries. The proposed approach is computationally lightweight, and can detect intricate obstacles (e.g., thin wire fences) by processing high-resolution images at high speeds with limited computational resources. ISM can be readily integrated with depth and range maps as a complementary scene representation, potentially enabling high-speed navigation and robotic manipulation in extreme environments, with minimal device complexity.
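The claim that ISM can be measured from disparity changes alone can be sketched with a short derivation. It assumes ISM is the product of depth (written z here) and time-to-contact τ, a rectified stereo pair with baseline b and focal length f, disparity δ, and τ = -z/ż under constant approach velocity. The symbols b, f, and δ are notation introduced here for illustration and may differ from the paper's.

```latex
\begin{align}
\delta &= \frac{b f}{z}, \\
\dot{\delta} &= -\frac{b f}{z^{2}}\,\dot{z}, \\
\tau &= -\frac{z}{\dot{z}}, \\
z\,\tau &= -\frac{z^{2}}{\dot{z}} = \frac{b f}{\dot{\delta}}.
\end{align}
```

Under these assumptions, the product zτ depends only on the rate of change of disparity and the calibration constant bf, so neither z nor τ has to be recovered separately.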
Inertial Safety Map (ISM) is a motion-centric scene representation tailored for fast collision avoidance, which is defined as the product of scene depth z and time-to-contact τ. (a-b) An example scene: a room with several pillars. (c-e) For the same scene, different camera motions result in different ISMs. A low value of ISM indicates a higher likelihood of collision, whereas higher values convey safety in the immediate future. (f) For a given value of ISM, the possible (z, τ) pairs lie on a hyperbolic curve called the z-τ curve, which can be used for navigation policy design.
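The definition above can be illustrated with a minimal sketch that computes ISM as depth times time-to-contact and evaluates a z-τ curve. The function names, the convention that a positive `closing_speed` means approach, the sample values, and the threshold are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np

def inertial_safety_map(depth, closing_speed):
    """Return ISM = z * tau per pixel, with tau = z / closing_speed.

    closing_speed > 0 means the point approaches the camera. Points that
    are static or receding get tau = inf, and hence ISM = inf.
    """
    depth = np.asarray(depth, dtype=float)
    closing_speed = np.asarray(closing_speed, dtype=float)
    approaching = closing_speed > 0
    # Use a dummy divisor of 1 where the point is not approaching.
    safe_speed = np.where(approaching, closing_speed, 1.0)
    tau = np.where(approaching, depth / safe_speed, np.inf)
    return depth * tau

def z_tau_curve(ism_value, depths):
    """Time-to-contact along the hyperbola z * tau = ism_value."""
    return ism_value / np.asarray(depths, dtype=float)

depth = np.array([2.0, 2.0, 0.5])            # metres (illustrative values)
closing_speed = np.array([1.0, -1.0, 1.0])   # metres per second
ism = inertial_safety_map(depth, closing_speed)
unsafe = ism < 1.0                            # hypothetical safety threshold
```

In this toy example only the third point, which is both close and approaching, falls below the threshold. A navigation policy could pick the threshold by choosing an acceptable region under the z-τ curve.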
ISM can be recovered very efficiently by using a single-shot structured light system. (1) The scene is illuminated by a high-frequency sinusoidal pattern, which can mathematically be represented as a multiplication in the spatial domain and a convolution in the frequency domain (ignoring ambient light). The frequency domain images are plotted in log scale. (2-4) The ISM can be recovered via a frequency-domain algorithm, which is simple and can be efficiently parallelized on a GPU. Please refer to the paper for details.
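The multiplication/convolution statement in step (1) can be checked numerically in one dimension. The sketch below is not the paper's recovery algorithm. It only verifies that multiplying a signal by a raised cosine produces scaled copies of its spectrum at zero frequency and at plus and minus the pattern frequency. The test signal and the function name `modulation_check` are hypothetical.

```python
import numpy as np

def modulation_check(scene, f0):
    """Compare the DFT of (scene * sinusoidal pattern) with shifted spectra.

    f0 is the pattern frequency in cycles per signal length (an integer).
    """
    n_samples = scene.size
    n = np.arange(n_samples)
    # Raised cosine in [0, 1], standing in for the projected pattern.
    pattern = 0.5 * (1.0 + np.cos(2.0 * np.pi * f0 * n / n_samples))
    image = scene * pattern  # multiplication in the spatial domain

    scene_spec = np.fft.fft(scene)
    image_spec = np.fft.fft(image)
    # Convolution with the pattern's three spectral impulses (circular shifts).
    predicted = (0.5 * scene_spec
                 + 0.25 * np.roll(scene_spec, f0)
                 + 0.25 * np.roll(scene_spec, -f0))
    return image_spec, predicted

n = np.arange(256)
scene = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * n / 256) + 0.2 * np.sin(2 * np.pi * 5 * n / 256)
image_spec, predicted = modulation_check(scene, f0=32)  # period of 8 samples
print(np.allclose(image_spec, predicted))  # prints True
```

As in the caption, ambient light is ignored here. A constant ambient term would only add to the unshifted copy of the spectrum.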
ISMs can be computed at high speeds even for very high resolution images. A direct comparison between ISM and other single-shot SL methods is difficult since their code is usually not publicly available. Instead, we compare the computational speeds of the proposed ISM algorithm and a few widely-used stereo matching algorithms. The CPU (MATLAB) implementation of the method is up to one order of magnitude faster than MATLAB's semi-global matching algorithm. We also develop a GPU implementation of the proposed method, which is able to reach 1kfps at 1 megapixel resolution, and achieves real-time performance even for very high resolution (90fps at 9 megapixel), which is 9x faster than OpenCV's CUDA implementation of block matching and considerably faster than belief propagation (BP) and constant-space BP.
We simulated a few robot navigation scenarios through thin obstacles using a ray tracer and realistic 3D models. The proposed approach can recover the ISM for scenes with complex, overlapping thin structures (bamboos, tree branches, warehouse racks).
Our prototype structured light system consists of a Canon DSLR camera and an Epson 3LCD projector. The projector projects a 1920x1080 high-frequency sinusoidal pattern with a period of 8 pixels. ISMs estimated using the proposed method are compared with ground truth, which is obtained by projecting binary SL patterns using the same hardware. Depth maps of the scenes are also shown for comparison (not used in computing ISM). (Top) A piecewise planar scene consisting of three books. (Bottom) A spherical ball. Our method recovers the ISMs of both scenes accurately.
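A pattern with the stated resolution and period could be generated as in the following sketch. The text gives only the resolution and the 8-pixel period. The horizontal direction of variation, the [0, 1] intensity range, the zero phase, and the function name `make_sinusoid_pattern` are assumptions.

```python
import numpy as np

def make_sinusoid_pattern(width=1920, height=1080, period=8):
    """Return a height x width image whose intensity varies sinusoidally
    along the horizontal axis with the given period in pixels."""
    x = np.arange(width)
    row = 0.5 + 0.5 * np.cos(2.0 * np.pi * x / period)  # values in [0, 1]
    # Every row is identical, so repeat the single row `height` times.
    return np.tile(row, (height, 1))

pattern = make_sinusoid_pattern()
```

Before projection, the array would typically be quantized to the projector's bit depth, for example with `np.round(pattern * 255).astype(np.uint8)` for an 8-bit display.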
The proposed method can recover extremely thin structures by processing high-resolution images, which is possible due to the hardware simplicity of structured light and the computational efficiency of the proposed algorithm. The thinnest parts of the two scenes are 4 mm and 1.5 mm wide, which can be challenging to resolve from 1.5 m away. We also show results for two commodity depth cameras, Kinect V1 and V2, whose spatial resolutions are 640x480 and 512x424, respectively. From the same distance, the depth cameras are only able to partially recover the thicker parts of the fence in the top scene and completely miss the rings in the bottom scene. This is not meant to be a direct comparison of the three approaches, because the data is acquired from different cameras. With a higher resolution, depth cameras may also be able to recover the scene details, albeit at a higher computational cost.
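To give a rough sense of why such thin structures are hard to resolve at low resolution, the sketch below estimates how many pixels an object of a given width covers under a pinhole camera model. The 60-degree horizontal field of view is a hypothetical value chosen for illustration, not a specification of any camera mentioned here, and the function name `pixels_subtended` is likewise an assumption.

```python
import math

def pixels_subtended(width_m, distance_m, image_width_px, hfov_deg):
    """Approximate image-plane width, in pixels, of a fronto-parallel object
    seen by a pinhole camera with the given horizontal field of view."""
    focal_px = (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return focal_px * width_m / distance_m

# A 1.5 mm structure at 1.5 m, seen at 640 pixels across an assumed 60-degree FOV.
thin = pixels_subtended(0.0015, 1.5, image_width_px=640, hfov_deg=60.0)
# The same structure at 6000 pixels across the same assumed FOV.
thin_hi_res = pixels_subtended(0.0015, 1.5, image_width_px=6000, hfov_deg=60.0)
```

Under these assumed values the structure covers well under one pixel at 640 pixels across and several pixels at 6000 pixels across, which is consistent with the point that image resolution is the limiting factor.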
We show navigation sequences with manually planned trajectories to demonstrate how the proposed ISM can be used in robot navigation scenarios. (Top) A simulated sequence where a drone flies through thin threads. As the drone detects the threads, it aligns its pose to be parallel with the threads to avoid collision. (Bottom) A real video sequence where a robot navigates around a pillar. The unrectified images are shown here to better convey the scene, while the ISM is only computed for the cropped area due to the projector's limited field of view. The robot moves forward (first three frames), detects the pillar, and moves to the left to circumvent it (last frame). Please see the embedded video for the full sequences.
ISMs can also be used to detect collisions between moving objects and a static camera. (Left) A hand moving towards the camera. (Right) A thin cable (held by a person) moving towards the camera. The unrectified images are shown here to better convey the scene, while the ISM is only computed for the cropped area due to the projector's limited field of view.