Spatial and Temporal Filtering of Depth Data for Telepresence

In recent years, there has been growing interest in telepresence communication systems, which create the impression of being present at a place different from one's true location. A major challenge in this area is to process the imagery acquired at the sender site into a high-quality 3D representation of the scene in real time. High-quality approaches usually require intensive offline processing; conversely, methods that work in real time produce 3D representations of low quality. In recent systems this is caused by the use of low-cost depth sensors, such as the Microsoft Kinect, which deliver 3D representations in real time but still exhibit considerable amounts of disturbing artifacts. The flickering nature of these artifacts is often not considered. To understand their strong temporal component, we will start by experimentally developing a statistical model for the distortions of common depth sensors. In contrast to existing work in this field, we will also consider the temporal aspects of the problem. Guided by the analysis of the gathered data, we will develop a new real-time spatio-temporal filter that simultaneously stabilizes distorted depth data in the spatial and temporal domains. To this end, we propose a combination of a novel depth-outlier detection method, motion estimation for depth cameras, and 3D filtering. Our idea is to smooth every depth pixel based on its 3D spatial and temporal neighborhood. In order to identify temporally related neighbors in the stream of depth frames, we will estimate the motion of scene objects to trace back the history of every depth pixel. As depth images are too unstable for this task, we suggest performing motion estimation on the color images instead, which current depth sensors usually deliver alongside the depth data. Owing to the fixed, close spatial relation between the color and depth cameras, we can transfer the estimation results to the depth images.
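A minimal sketch of this motion-compensated pixel history follows, assuming a per-pixel optical flow field has already been estimated on the color images and transferred to the depth grid. All function names and conventions here are illustrative, not part of the proposed system:

```python
import numpy as np

def warp_depth_by_flow(prev_depth, flow):
    """Warp a previous depth frame onto the current frame's pixel grid.

    `flow[..., 0]` / `flow[..., 1]` are assumed to be the per-pixel x/y
    displacements (in pixels) from the previous to the current frame,
    estimated on the co-registered color images. Nearest-neighbor
    gathering keeps the sketch simple; a real system would interpolate
    and handle occlusions and invalid depth values.
    """
    h, w = prev_depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Trace every current pixel back to its position in the previous frame.
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, h - 1)
    return prev_depth[src_y, src_x]

def temporal_neighborhood(depth_frames, flows):
    """Stack motion-compensated depth values per pixel.

    `depth_frames[0]` is the current frame; `flows[i]` is assumed to map
    the older frame `depth_frames[i + 1]` directly onto the current one.
    """
    stack = [depth_frames[0]]
    for depth, flow in zip(depth_frames[1:], flows):
        stack.append(warp_depth_by_flow(depth, flow))
    return np.stack(stack, axis=0)  # shape: (frames, height, width)
```

On such a per-pixel history, a temporal average or median can then suppress flicker without smearing moving objects, since the neighborhood follows the estimated motion.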
After compiling the spatio-temporal 3D neighborhood of every depth pixel in a frame, we will insert a robust outlier detection and removal step based on 6D linear regression. Here, a substantial amount of research will be invested into the question of how to implement a least-median-of-squares approach, which, on the one hand, is well suited to the task but, on the other hand, is difficult to realize in real time. After removing the outliers, filtering the 3D depth neighborhood will suppress the remaining Gaussian noise. Our new approach will be integrated into a telepresence prototype system comprising an array of RGB-D cameras. Here, we plan to cross-validate the corrected depth data from multiple cameras by extending our previous work towards dynamic 3D representations. For the evaluation of the proposed method, test data will be generated along with ground truth, obtained either from predefined scenery or from artificial imagery with added hardware-conform noise.
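The least-median-of-squares principle can be sketched in its generic form: fit exact models to random minimal subsets, keep the model with the smallest median squared residual, and flag points far from it. The proposal's 6D design matrix over spatio-temporal neighborhoods is left abstract here; all identifiers and thresholds are illustrative:

```python
import numpy as np

def lmeds_fit(X, y, n_trials=200, rng=None):
    """Least-median-of-squares linear fit y ≈ X @ beta.

    Repeatedly fits exact models to random minimal subsets and keeps the
    one whose squared residuals have the smallest median, which tolerates
    up to roughly 50% outliers. The exhaustive random sampling is what
    makes a real-time realization challenging.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    best_beta, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=d, replace=False)  # minimal subset
        try:
            beta = np.linalg.solve(X[idx], y[idx])  # exact fit
        except np.linalg.LinAlgError:
            continue  # degenerate subset, try another
        med = np.median((y - X @ beta) ** 2)
        if med < best_med:
            best_med, best_beta = med, beta
    return best_beta, best_med

def flag_outliers(X, y, beta, med, thresh=2.5):
    """Mark samples whose residual exceeds `thresh` robust standard
    deviations; 1.4826 * sqrt(median) is the usual robust scale estimate."""
    scale = 1.4826 * np.sqrt(med) + 1e-12
    return np.abs(y - X @ beta) > thresh * scale
```

The flagged samples would be discarded from a pixel's spatio-temporal neighborhood before the final smoothing step, so that the subsequent filtering only averages over consistent depth measurements.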