MC3D: Motion Contrast 3D Scanning
Proc. ICCP 2015
We present a novel structured light technique called Motion Contrast 3D scanning (MC3D) that maximizes bandwidth and light source power to avoid performance trade-offs.
Structured light 3D scanning systems are fundamentally constrained by limited sensor bandwidth and light source power, hindering their performance in real-world applications where depth information is essential, such as industrial automation, autonomous transportation, robotic surgery, and entertainment. We present a novel structured light technique called Motion Contrast 3D scanning (MC3D) that maximizes bandwidth and light source power to avoid performance trade-offs. The technique utilizes motion contrast cameras that sense temporal gradients asynchronously, i.e., independently for each pixel, a property that minimizes redundant sampling. This allows laser scanning resolution with single-shot speed, even in the presence of strong ambient illumination, significant inter-reflections, and highly reflective surfaces. The proposed approach will allow 3D vision systems to be deployed in challenging and hitherto inaccessible real-world scenarios requiring high performance using limited power and bandwidth.
Proc. ICCP 2015
SL systems face trade-offs in acquisition speed, resolution, and light efficiency. Laser scanning (upper left) achieves high resolution at slow speeds. Single-shot methods (mid-right) obtain lower resolution with a single exposure. Other methods such as Gray coding and phase shifting (mid-bottom) balance speed and resolution but have degraded performance in the presence of strong ambient light, scene inter-reflections, and dense participating media. Hybrid techniques from Gupta et al. (curve shown in green) and Taguchi et al. (curve shown in red) strike a balance between these extremes. This paper proposes a new SL method, motion contrast 3D scanning (denoted by the point in the center), that simultaneously achieves high resolution, low acquisition speed, and robust performance in exceptionally challenging 3D scanning environments.
(Left) The space-time volume output of a conventional camera consists of a series of discrete full frame images (here a black circle on a pendulum). (Right) The output of a motion contrast camera for the same scene consists of a small number of pixel change events scattered in time and space. The sampling rate along the time axis in both cameras is limited by the camera bandwidth. The sampling rate for motion contrast is far higher because of the naturally sparse distribution of pixel change events.
A scanning source illuminates projector positions α1 and α2 at times t1 and t2, striking scene points s1 and s2. Correspondence between projector and camera coordinates is not known at runtime. The DVS sensor registers changing pixels at columns i1 and i2 at times t1 and t2, which are output as events containing the location/event time pairs [i1, τ1] and [i2, τ2]. We recover the estimated projector positions j1 and j2 from the event times. Depth can then be calculated using the correspondence between event location and estimated projector location.
Laser scanning performed with laser galvanometer and traditional sensor cropped to 128×128 with total exposure time of 28.5s. Kinect and MC3D methods captured with 1 second exposure at 128×128 resolution (Kinect output cropped to match) and median filtered. Object placed 1m from sensor under ∼150 lux ambient illuminance measured at object. Note that while the image-space resolution for all 3 methods are matched, MC3D produces depth resolution equivalent to laser scanning, whereas the Kinect depth is more coarsely quantized.
Disparity output for both methods captured with 1 second exposure at 128×128 resolution (Kinect output cropped to match) under increasing illumination from 150 lux to 5000 lux measured at middle of the sphere surface. The illuminance from our projector pattern was measured at 150 lux. Note that in addition to outperforming the Kinect, MC3D returns usable data at ambient illuminance levels an order of magnitude higher than the projector power.
The image on the left depicts a test scene consisting of two pieces of white foam board meeting at a 30 degree angle. The middle row of the depth output from Gray coding and MC3D are shown in the plot on the right. Both scans were captured with an exposure time of 1/30th second. Gray coding used 22 consecutive coded frames, while MC3D results were averaged over 22 frames. MC3D faithfully recovers the V-groove shape while the Gray code output contains gross errors.
The image on the left depicts a reflective test scene consisting of a shiny steel sphere. The plot on the right shows the depth output from Gray coding and MC3D. Both scans were captured with an exposure time of 1/30th second. The Gray coding method used 22 consec- utive coded frames, while MC3D results were averaged over 22 frames. The Gray code output produces significant artifacts not present in MC3D output.