Active Object Recognition

The problem of object recognition has been studied extensively in recent years. In many settings the user has control over the vision system, e.g. a camera mounted on a mobile robot. In that case we can not only acquire multiple views, but also actively steer the system toward a particular viewing angle. This is the setting of active object recognition.
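As a minimal sketch of the idea (all names and numbers below are hypothetical): given per-view estimates of the class probabilities, an active recognizer can steer toward the viewing angle whose predicted distribution is most confident, e.g. the one with the lowest entropy.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def next_best_view(view_predictions):
    """Pick the viewing angle whose predicted class distribution is the
    most confident (lowest entropy). view_predictions maps angle -> probs."""
    return min(view_predictions, key=lambda a: entropy(view_predictions[a]))

# Hypothetical predicted class distributions at three candidate angles.
preds = {
    0:  [0.40, 0.35, 0.25],   # ambiguous view
    45: [0.85, 0.10, 0.05],   # discriminative view
    90: [0.50, 0.30, 0.20],
}
print(next_best_view(preds))  # -> 45
```

Real systems replace the entropy criterion with richer objectives (expected information gain, movement cost), but the selection loop has this shape.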

Improving 3D data with Surface Geometry and Color

3D reconstruction and modeling are useful in many applications. Reconstruction is usually achieved either by laser scanning or by Structure from Motion algorithms; both tend to produce sparse point clouds. We propose an algorithm that interpolates sparse 3D points into denser clouds by estimating the latent surface geometry of the point cloud and exploiting color information. The algorithm estimates the latent surface underlying the 3D points, taking surface normals and point-to-surface distances into account, and performs a more robust segmentation that combines the 3D points with their 2D colors. The resulting method improves on the baseline, achieving lower interpolation error.
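A toy sketch of the color-aware densification idea, not the actual algorithm (thresholds and the midpoint rule are illustrative assumptions): new points are only interpolated between neighbors that are both close in 3D and similar in color, so color acts as a crude segmentation cue that avoids interpolating across object boundaries.

```python
import math

def dist(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def densify(points, colors, max_dist=1.0, max_color_diff=30.0):
    """Insert midpoints between nearby, similarly colored points.
    Color similarity serves as a segmentation cue: pairs straddling a
    color boundary are assumed to lie on different surfaces and are
    not interpolated across."""
    new_pts = []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            if (dist(points[i], points[j]) <= max_dist and
                    dist(colors[i], colors[j]) <= max_color_diff):
                new_pts.append(tuple((a + b) / 2
                                     for a, b in zip(points[i], points[j])))
    return points + new_pts

# Two red points get a midpoint; the blue point is left alone.
pts  = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.6, 0.0, 0.0)]
cols = [(255, 0, 0), (255, 10, 0), (0, 0, 255)]
dense = densify(pts, cols)
```

The full method replaces the midpoint rule with points sampled on the estimated latent surface, but the gating by geometry plus color is the same principle.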

Image-based rendering using a camera array

We build a camera array of roughly 100 networked cameras for image-based rendering, i.e. synthesizing views of a scene from a virtual camera by interpolating among multiple real cameras. With a sufficiently dense array, on the order of hundreds of cameras, we can render the scene from any virtual viewpoint within the array's coverage. The cameras are synchronized to capture images at a relatively high frame rate, and the final rendering results are visually appealing.
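A minimal sketch of the interpolation step, under the simplifying assumptions of a 1-D camera rig and pre-rectified images stored as flat pixel lists (the real system operates on full 2-D images): the virtual view is a linear blend of the two nearest real cameras.

```python
def render_virtual(cam_xs, images, vx):
    """Render a virtual view at position vx along a 1-D camera array by
    linearly blending the two bracketing real cameras, weighted by
    inverse distance (light-field-style interpolation).

    cam_xs: camera positions along the rig; images: one flat list of
    pixel intensities per camera, all the same length."""
    pairs = sorted(zip(cam_xs, images))
    for (xl, img_l), (xr, img_r) in zip(pairs, pairs[1:]):
        if xl <= vx <= xr:
            w = (vx - xl) / (xr - xl)          # 0 at left cam, 1 at right
            return [(1 - w) * a + w * b for a, b in zip(img_l, img_r)]
    raise ValueError("virtual viewpoint outside the array")
```

With a dense array the baseline between neighboring cameras is small, so simple blending already produces plausible virtual views; wider baselines require depth-aware warping before blending.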

Structure from Motion Library

We implement a C++ Structure from Motion library that takes in multi-view images and produces a 3D reconstruction of the scene. Structure from Motion is a computer vision term for a family of algorithms that reconstruct the three-dimensional structure of an object or scene from a set of two-dimensional images: 'structure' refers to recovering the geometry of the scene/object, and 'motion' to recovering the motion of the camera. The library takes a set of images of the same scene together with known intrinsic camera parameters. First, feature detection and matching are applied. Second, the camera poses are estimated from the feature correspondences and the three-dimensional locations of all the features are computed; this step uses algorithms such as factorization or triangulation. Finally, a refinement step such as sparse bundle adjustment minimizes the reconstruction error. The library is built on OpenCV and is available upon request. [man] [code]
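To make the triangulation step concrete, here is a self-contained sketch of the classic midpoint method (not the library's own code): given the two camera centers and the back-projected ray directions of one matched feature, the 3D point is taken halfway between the closest points on the two rays.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint triangulation of one feature match.

    c1, c2: camera centers; d1, d2: ray directions of the matched
    feature as back-projected from each image. Solves for the ray
    parameters minimizing the inter-ray distance, then returns the
    midpoint of the closest-approach segment."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    r = [b - a for a, b in zip(c1, c2)]        # vector c2 - c1
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    e, f = dot(d1, r), dot(d2, r)
    denom = a * c - b * b                      # zero iff rays are parallel
    t1 = (e * c - b * f) / denom               # parameter along ray 1
    t2 = (b * e - a * f) / denom               # parameter along ray 2
    p1 = [x + t1 * d for x, d in zip(c1, d1)]  # closest point on ray 1
    p2 = [x + t2 * d for x, d in zip(c2, d2)]  # closest point on ray 2
    return [(x + y) / 2 for x, y in zip(p1, p2)]
```

With noise-free rays the two closest points coincide at the true 3D location; with noisy correspondences the midpoint is a reasonable initial estimate that bundle adjustment then refines.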

Robot obstacle avoidance

Obstacle avoidance based on a monocular camera has become an active research area in robotics and computer vision. However, most current approaches use a pre-trained model that makes an independent decision on each image captured by the camera: either a collision is impending or the robot is free to go. Such an approach only works in the restricted settings the model was trained in.

In this project, we propose a structured learning approach to obstacle avoidance that captures the temporal structure across the images captured by the camera and makes a joint decision over a group of frames (images in the video). Our experiments show that exploiting this structure across frames performs better than treating each frame independently. In addition, we propose an online-learning formulation of the same algorithm: a robot with no pre-trained model explores a new environment and trains a model online to achieve obstacle avoidance.
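One simple way to realize a joint decision over a window of frames, shown here as an illustrative sketch rather than our actual model: treat the per-frame collision probabilities as unary costs on a binary chain and run Viterbi decoding with a penalty on label changes, which suppresses spurious single-frame flips.

```python
def smooth_decisions(probs, switch_cost=0.5):
    """Jointly label a window of frames safe (0) / collision (1).

    probs[t] is a per-frame collision probability (hypothetical
    convention: unary cost of 'collision' is 1 - p, of 'safe' is p).
    Viterbi decoding on a binary chain with a pairwise switch_cost
    penalizes label changes, so the joint decision is temporally
    consistent rather than frame-independent."""
    unary = [{0: p, 1: 1 - p} for p in probs]
    cost = dict(unary[0])                      # best cost ending in each label
    back = []                                  # backpointers per frame
    for u in unary[1:]:
        new, ptr = {}, {}
        for lab in (0, 1):
            prev = min((0, 1),
                       key=lambda q: cost[q] + switch_cost * (q != lab))
            new[lab] = cost[prev] + switch_cost * (prev != lab) + u[lab]
            ptr[lab] = prev
        cost = new
        back.append(ptr)
    lab = min((0, 1), key=lambda l: cost[l])   # cheapest final label
    labels = [lab]
    for ptr in reversed(back):                 # backtrack to the first frame
        lab = ptr[lab]
        labels.append(lab)
    return labels[::-1]
```

For example, `smooth_decisions([0.9, 0.8, 0.2, 0.9, 0.95])` labels every frame as collision: the lone low-probability frame is outvoted by its temporal context, whereas independent thresholding would flip it.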