This module describes the challenges we faced when detecting objects and how we addressed them.

Resolution difference between template and video

At first, we captured the template image and the video separately. In our experiment, the object to be tracked was a large green breadboard. We first photographed the breadboard lying on a table with a high-resolution camera. We then switched the camera to video mode, at a frame size of only 480×640, and recorded someone moving the board around. When we ran our code on this pair, the tracking result was disappointing.

Template with high resolution
Tracked result – off target (the “matched” points do not even make sense)

We then realized that many details in the template image were not visible on the object in the video at all; the screws and pinholes, for example, had vanished. To guarantee the same level of detail in both images, we cropped the template out of a frame of the video itself. And it worked!

Template that is cropped out from the same source video
Tracking result of a video frame
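
The cropping step can be sketched in a few lines. This is a minimal illustration in Python rather than the MATLAB used for the project; the function name `crop_template`, the crop coordinates, and the synthetic frame are all hypothetical, and the frame is assumed to be 480 rows by 640 columns.

```python
def crop_template(frame, top, left, height, width):
    """Return a height x width sub-image of a frame stored as a list of rows."""
    if top < 0 or left < 0 or height <= 0 or width <= 0:
        raise ValueError("crop box needs a non-negative origin and a positive size")
    if top + height > len(frame) or left + width > len(frame[0]):
        raise ValueError("crop box extends outside the frame")
    return [row[left:left + width] for row in frame[top:top + height]]


# Synthetic stand-in for one video frame (assumed 480 rows x 640 columns).
# Each pixel value encodes its own position so the crop is easy to check.
frame = [[r * 640 + c for c in range(640)] for r in range(480)]

# Hypothetical bounding box around the object in that frame.
template = crop_template(frame, top=100, left=200, height=120, width=240)
```

Because the template is cut from the same frames that are later searched, it cannot contain detail the video is unable to resolve.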

Motion blur and video with lower resolution

Another problem with matching video frames is that when the shutter speed is relatively slow, the image in each frame blurs and the features of the object are no longer distinguishable.

Motion-Blurred Image

Clear image

To avoid motion blur, we had to slow down our movements when recording the video. Otherwise, the tracked position would be completely off.
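
One common heuristic for flagging blurred frames before matching is the variance of the Laplacian: sharp edges give large second differences, while smeared edges give small ones. We did not use this in the project; the sketch below is only an illustration in plain Python, with a synthetic striped image and a simple horizontal averaging filter standing in for motion blur. The function names are hypothetical.

```python
def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian over the interior pixels."""
    h, w = len(img), len(img[0])
    values = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            values.append(
                img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1]
                - 4 * img[r][c]
            )
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def horizontal_blur(img, length):
    """Average each pixel with the pixels to its right (window clipped at the edge)."""
    blurred = []
    for row in img:
        new_row = []
        for c in range(len(row)):
            window = row[c:c + length]
            new_row.append(sum(window) / len(window))
        blurred.append(new_row)
    return blurred


# Synthetic test image: vertical stripes, two pixels wide.
sharp = [[255 if (c // 2) % 2 == 0 else 0 for c in range(16)] for _ in range(16)]
smeared = horizontal_blur(sharp, 4)

sharp_score = laplacian_variance(sharp)
smeared_score = laplacian_variance(smeared)  # lower than sharp_score
```

A frame whose score falls well below that of its neighbours is a candidate for being skipped; any cutoff would have to be tuned on the actual footage.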

However, the overall resolution and quality of the video matter, too. We converted one of our video files with an online tool because one member's MATLAB could not read .mov files. After the conversion, the quality of the tracking result dropped dramatically. We believe this was partly due to the even lower resolution of the converted file: when the resolution is too low, a sharp corner can no longer be distinguished from a round one. The compression used by the website may also have contributed, because we could clearly see non-uniformity in the once-uniform white board area. In effect, the compression introduced spurious features into the video frames.

Before video conversion

After video conversion (notice that the aspect ratio of the image has also changed)

Complex background

Because our recording devices had limited resolution, a complex background was also problematic: it contained plenty of features similar to those of the object, and the video lacked the detail needed to tell them apart. As a result, we had to record against a simple background, such as a wall, and avoid wearing shirts with complex patterns.

Features of template are inaccurately matched to similar features in the background

Objects without distinct features

An object that doesn’t have distinct SURF features is extremely problematic. We experimented with a blue pencil box. When the pencil box, lying on the table by itself, was used as the template image, OpenSurf did not output a single feature descriptor above the threshold. Only after we lowered the threshold all the way to 0.00001 (we used 0.0008 everywhere else in the project) did we find six features. Normally we find more than a dozen for a small template and more than a hundred for a full video frame, and even those six features were degenerate: they were repeats of one another.

Only six repeated features are found when threshold is set to 0.00001
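
The role of the threshold, and why a raw feature count can overstate how much usable information a template has, can be illustrated with a small Python sketch. The detector responses and positions below are made-up values, not output from OpenSurf, and both helper functions are hypothetical; only the two threshold values come from our experiments.

```python
def strong_points(points, threshold):
    """Keep (x, y, response) tuples whose response exceeds the threshold."""
    return [p for p in points if p[2] > threshold]


def distinct_locations(points, tolerance=1.0):
    """Count points lying farther than `tolerance` pixels from every point kept so far."""
    kept = []
    for x, y, _ in points:
        if all((x - kx) ** 2 + (y - ky) ** 2 > tolerance ** 2 for kx, ky in kept):
            kept.append((x, y))
    return len(kept)


# Made-up weak detections clustered around two spots (illustration only).
points = [
    (10.0, 12.0, 0.00004),
    (10.2, 12.1, 0.00003),
    (10.1, 11.9, 0.00005),
    (40.0, 30.0, 0.00002),
    (40.3, 30.2, 0.00002),
    (39.9, 29.8, 0.00006),
]

at_default = strong_points(points, 0.0008)    # nothing passes the usual threshold
at_lowered = strong_points(points, 0.00001)   # all six weak detections pass
n_distinct = distinct_locations(at_lowered)   # but they cover only two locations
```

Lowering the threshold admits more detections, but repeated detections at the same spot add little constraint for matching.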

If we instead cropped the template so that some background was included, only the outline of the box or features of the background were detected, so searching for matching features in the test image was meaningless. As shown below, the matching failed.

All tracking points are attracted to irrelevant edges.

To avoid all of the problems that occurred during our experiments, we decided that the demo video would feature only one dancer, which allowed a closer shot and a higher effective resolution. The dancer would wear plain clothing, perform in front of a wall, move slowly, and avoid out-of-plane rotation in order to increase the accuracy of object tracking.

However, because the feature matching is not very robust, and even one or two mismatches would introduce huge error into the orientation of the object, the output angular velocity is extremely noisy. That is why we finally decided to discard the angular velocity data.
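
The sensitivity to mismatches can be reproduced with a toy example. The Python sketch below fits a 2-D rotation angle to matched point pairs by least squares; this is a standard estimator chosen for illustration and not necessarily how our code computed orientation. The point coordinates, the 10-degree rotation, and the single corrupted match are all made up.

```python
import math


def estimate_rotation_deg(template_pts, frame_pts):
    """Least-squares rotation angle (degrees) mapping template points onto frame points."""
    n = len(template_pts)
    tcx = sum(x for x, _ in template_pts) / n
    tcy = sum(y for _, y in template_pts) / n
    fcx = sum(x for x, _ in frame_pts) / n
    fcy = sum(y for _, y in frame_pts) / n
    cross = dot = 0.0
    for (tx, ty), (fx, fy) in zip(template_pts, frame_pts):
        ax, ay = tx - tcx, ty - tcy   # template point relative to its centroid
        bx, by = fx - fcx, fy - fcy   # frame point relative to its centroid
        cross += ax * by - ay * bx
        dot += ax * bx + ay * by
    return math.degrees(math.atan2(cross, dot))


def rotate_and_shift(pt, angle_deg, shift):
    """Rotate a point about the origin, then translate it."""
    a = math.radians(angle_deg)
    x, y = pt
    return (x * math.cos(a) - y * math.sin(a) + shift[0],
            x * math.sin(a) + y * math.cos(a) + shift[1])


# Made-up template feature positions and their true positions in a frame.
template_pts = [(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)]
frame_pts = [rotate_and_shift(p, 10.0, (5, 3)) for p in template_pts]
clean_angle = estimate_rotation_deg(template_pts, frame_pts)

# Corrupt a single match, as if one feature had latched onto the background.
bad_pts = list(frame_pts)
bad_pts[2] = (bad_pts[2][0] - 6, bad_pts[2][1] + 3)
bad_angle = estimate_rotation_deg(template_pts, bad_pts)
```

With all five matches correct, the estimate recovers the 10-degree rotation; with one of the five displaced by a few pixels, it is off by tens of degrees. Differentiating such an angle from frame to frame to obtain angular velocity amplifies the error further.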

Source:  OpenStax, Dwts - dancing with three-dimensional sound. OpenStax CNX. Dec 14, 2012 Download for free at http://cnx.org/content/col11466/1.1