Project
Introduction
The purpose of this assignment is to give you the opportunity to explore a computer-vision topic of your own choosing.
Project Deliverables
There are three checkpoints: the labs of Weeks 8, 9, and 10.
Preliminary Proposal (Due at the start of the Week 8 Lab)
Submit in hard copy a brief proposal describing what you hope to accomplish. The title information should include your name, the date, the course number & title, and the assignment name. The text of the proposal should answer four questions: (1) What will your program do when it is complete? (2) What part of the project will you implement? (3) What part of the project is already implemented in available code? (4) What will you learn about computer vision while implementing the project? You may answer all these questions in one or two paragraphs, but be sure to answer them in this order, and to answer all the questions. (10% of final project grade)
Based on your preliminary proposal, I will answer three questions. First: What is the quality of the proposal? (Does it meet the requirements above?) This determines your grade for the proposal. Second: Do I think you can realistically accomplish the project in three weeks of lab? If the answer is "No", I will request a revised version, emailed to me by 11pm on Tuesday of Week 8, and will provide suggestions on how to reduce the scope. Third: Do I think there is enough computer-vision-related coding in your project? If not, I will suggest some ways you can make your project include more computer vision.
Preliminary Implementation (Due the night before the Week 9 Lab)
Submit a preliminary version of your program that works at doing at least part of your goal for the project.
Since you will be writing the report during the second week, this should be an 80%-95% complete version of the project.
Preliminary Demo (Due in Week 9 Lab)
Demonstrate in lab the working code that you submitted the night before. Tell me about any changes made since submission. (10% of final project grade)
Final Submission (Due in Week 10 Lab)
Submit a PDF report describing your final project. This report should include the same title information as the preliminary proposal and follow the Word template provided, which includes the following sections, with at least a paragraph of text in each: (1) What does your program do? (2) What did you learn about computer vision while implementing the project?
In addition, the report should include high-level documentation of your project. For Matlab projects, include a call graph generated by m2html, with a high-level discussion of the important functions of your program. For object-oriented languages, include a UML class diagram with a high-level discussion of the important functions of your program. Please note that I am not looking for a play-by-play description of what happens when your program runs, or a class-by-class description of your classes. Rather, I am looking for the "big picture." What really matters in your program? How is it accomplished? What are the most important methods/arrays/classes in your program? How do they work?
Your project should use multiple Matlab (or Java, C, ...) functions (or objects) to organize your code. These should be documented as demonstrated in the labs throughout the quarter.
Additional notes on m2html: you may find the command

    m2html('mfiles','.','htmldir','doc','graph','on');

useful for generating this graph. You will need graphviz installed for this to work, and dot.exe needs to be on the path (it is in graphviz's bin folder). Instructions for adding it to the path are here.
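If editing the system path is inconvenient, one session-only alternative is to append the folder from within Matlab before running m2html (a sketch, assuming a default Windows install location; adjust the folder to match your machine):

    % Append graphviz's bin folder to the PATH for this Matlab session only.
    % The install folder below is an assumption; use your actual location.
    setenv('PATH', [getenv('PATH') ';C:\Program Files\Graphviz\bin']);
    % Then generate the documentation and call graph into the doc folder:
    m2html('mfiles','.','htmldir','doc','graph','on');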
You only need to describe one key aspect of your approach in detail, and how this connects with the rest of your program. For example, if you implemented the SIFT feature-point descriptor, you might choose to only describe in detail how you achieve rotation invariance by sampling a rotated image patch at the interest point. Or, you might choose to describe in detail how you compute a weighted edge histogram in each bin, arranged in a grid around the interest point. You would not need to describe both.
Before this detailed description, give a high-level overview of the algorithm, e.g. "To describe a feature, we first determine the principal orientation of the point from a large histogram. Then we sample an image patch that is rotated to this orientation around the point. Our final feature is a set of histograms computed in bins in a grid around the point."
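To make this concrete, here is a minimal Matlab sketch of the rotated-patch step (illustrative only, not a full SIFT implementation; the image I, interest point (x0, y0), orientation theta, and the patch size are assumed inputs and choices):

    % Sample a rotated patch around an interest point so the descriptor is
    % computed in a rotation-normalized frame. Assumed inputs: grayscale
    % image I, interest point (x0, y0), principal orientation theta (radians).
    half = 8;                                 % half-width, giving a 17x17 patch
    [u, v] = meshgrid(-half:half, -half:half);
    c = cos(theta);  s = sin(theta);
    xs = x0 + c*u - s*v;                      % rotate the sampling grid by theta
    ys = y0 + s*u + c*v;
    patch = interp2(double(I), xs, ys, 'linear', 0);  % bilinear; 0 outside image

Histograms computed on this patch are then (approximately) invariant to the original rotation of the feature.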
In addition to the report, submit working source code for your project. As with the preliminary submission, this should be a zip file that, when unzipped, produces a directory with a program immediately inside it called main.m or Main.java, and a text file called README.txt if required.
Final Demonstration (In the final class of Week 10)
Each team will get up to 10 minutes to demonstrate their project to others in the class (as well as the instructor and possibly visitors). No PowerPoint slides are required, but please verbally describe (1) what your program does and (2) what you learned about computer vision in the process.
Potential Project Ideas
- (One of my personal goals) Do something toward constructing a 3-D model of a flexible-tube protein with the Center for Biomolecular Modeling
- Implement the K-D tree and test its performance properties with a large database of SIFT features
- Detect objects with the Kinect sensor and distinguish between a cube and a sphere
- Track objects with the Kinect sensor, re-displaying object centroids and tracking trails in a separate view
- Find correspondences between two manually-cropped Kinect sensor images by detecting interest points (or using SIFT plus a 3D transform instead of the planar projection that we will use in lab)
- Train a real-time face detector in OpenCV (or possibly Matlab)
- Recognize faces with eigenfaces (PCA/LDA)
- Find pictures of the Eiffel Tower in a personal photo collection using SIFT feature matching to a variety of Eiffel Tower pictures. (Perhaps a different object than the Eiffel Tower would work better.)
- Detect edge segments using the Hough transform (from scratch)
- Study the geometry of multiple-camera calibration. Calibrate several cameras intrinsically using the Matlab camera calibration toolbox. Calibrate multiple cameras given manually-marked feature points. Report on the reprojection error achieved for points not used during calibration.
- Align two photographs taken with different camera settings, then combine them to produce an improved image, e.g. flash/no-flash photography. Szeliski Sec. 10.2.2, pp. 434-435.
- View morphing. Find the 3D correspondence between two camera images, then simulate moving the camera from one image to the other. Sec. 7.2.3, p. 315 gives a sketch of how you might do this. Instead of finding a homography between images, you could find an epipolar transform.
- Implement Dr. Chad Aeschliman's fast multi-scale edge detector
- (Don't know how much is involved...) Create a Google Cardboard app.
- Do something for the Mars semi-autonomous competition that Dr. Bill Farrow is leading.
- Perform background subtraction using a simple per-pixel Gaussian model of each pixel's color; see the sketch after this list. (This is the first step toward an interesting project with the Center for Biomolecular Modeling.)
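A minimal sketch of the per-pixel Gaussian model mentioned above (assumptions: a grayscale H x W x N stack bg of background-only training frames, a new grayscale frame to classify, and an illustrative threshold k):

    % Fit a Gaussian at each pixel from background-only training frames.
    mu    = mean(bg, 3);                % per-pixel mean
    sigma = std(bg, 0, 3) + 1e-3;       % per-pixel std; epsilon avoids zeros
    % A pixel is foreground if it deviates by more than k standard deviations.
    k  = 2.5;                           % illustrative threshold
    fg = abs(double(frame) - mu) > k * sigma;   % logical foreground mask

A color version would fit a mean and variance per channel and combine the three per-channel tests.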
Ambitious (If you are interested in these projects, please talk to me about Undergraduate Research or Independent Study)
- Write a stereo-correspondence algorithm from scratch for a well-fixed camera pair
- Polish and re-submit Dr. Chad Aeschliman's fast multi-scale edge detector for publication
- Implement the HoG detector. (This is a very challenging project.)
- Automatically calibrate two cameras based on detected SIFT features
- Create an automatic exploration of images in a room based only on the images and their SIFT-feature correspondences. This is challenging because SIFT features are often very noisy for a room environment (as opposed to a still-life toy collection)
- Attempt to build a 3-D model by tracking objects with the Kinect sensor
- Implement a QR code reader (the part that reads the bits off of the screen)
- Re-create something like the features shown on the cover of the book using SIFT image correspondences or edge-detection correspondences.