computer-vision interview questions
Top frequently asked computer-vision interview questions
I'm looking to create a base table of images and then compare any new images against that to determine if the new image is an exact (or close) duplicate of the base.
For example: if you want to avoid storing the same image hundreds of times, you could store one copy of it and provide reference links to it. When a new image is entered, you want to compare it against the existing images to make sure it's not a duplicate ... ideas?
One idea of mine was to reduce the image to a small thumbnail and then randomly pick 100 pixel locations and compare them.
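For reference, the thumbnail idea is very close to a standard technique called perceptual (average) hashing: shrink the image, record which pixels are brighter than the mean, and compare hashes by Hamming distance. A minimal Python sketch, assuming PIL and numpy are available (the imagehash library implements this and stronger variants such as pHash):

from PIL import Image
import numpy as np

def average_hash(path, size=8):
    # shrink to size x size grayscale, then threshold at the mean brightness
    img = Image.open(path).convert('L').resize((size, size))
    pixels = np.asarray(img, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()

def hamming(h1, h2):
    # number of differing bits between the two hashes
    return int(np.count_nonzero(h1 != h2))

# usage: a distance of roughly 0-5 bits out of 64 usually means "duplicate"
# print(hamming(average_hash('a.jpg'), average_hash('b.jpg')))

An exact-duplicate fast path (a plain file checksum) can run first, with the perceptual hash catching resized or recompressed copies.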
Source: (StackOverflow)
I'm just throwing out an idea, with the possibility of this getting closed. I need to draw a crystal ball in which red and blue particles are randomly located. I guess I could go with Photoshop, and I even tried to make the ball in an image, but as this is for a research paper and does not have to be fancy, I wonder if there is any way to program it with R, MATLAB, or any other language.
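For reference, a minimal sketch of the particle drawing in Python with matplotlib (the question mentions R or MATLAB; the same idea ports directly). The particle count and colours are arbitrary choices:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 400
# uniform sampling inside a disc: r = sqrt(u) avoids clustering at the centre
r = np.sqrt(rng.random(n))
theta = rng.random(n) * 2 * np.pi
x, y = r * np.cos(theta), r * np.sin(theta)
colors = rng.choice(['red', 'blue'], size=n)

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(x, y, c=colors, s=10)
ax.add_patch(plt.Circle((0, 0), 1.0, fill=False, color='black'))  # ball outline
ax.set_aspect('equal')
ax.axis('off')
plt.savefig('crystal_ball.png', dpi=150)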
Source: (StackOverflow)
I'm trying to compare images to each other to find out whether they are different. First I tried a Pearson correlation of the RGB values, which works quite well unless the pictures are shifted a little bit. So if I have two 100% identical images, but one is moved a little, I get a bad correlation value.
Any suggestions for a better algorithm?
BTW, I'm talking about comparing thousands of images...
Edit:
Here is an example of my pictures (microscopic):
im1: [image]
im2: [image]
im3: [image]
im1 and im2 are the same but a little bit shifted/cropped; im3 should be recognized as completely different...
Edit:
The problem was solved with the suggestions of Peter Hansen! It works very well! Thanks for all the answers! Some results can be found here:
http://labtools.ipk-gatersleben.de/image%20comparison/image%20comparision.pdf
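For reference, a shift-tolerant alternative to plain Pearson correlation is phase correlation: compare the images through the normalized cross-power spectrum, which is insensitive to translation. A minimal numpy sketch, assuming im1 and im2 are same-size 2D grayscale arrays:

import numpy as np

def phase_correlation_peak(im1, im2):
    F1 = np.fft.fft2(im1)
    F2 = np.fft.fft2(im2)
    cross_power = F1 * np.conj(F2)
    cross_power /= np.abs(cross_power) + 1e-12   # keep only the phase
    corr = np.fft.ifft2(cross_power).real
    # a sharp peak near 1 means "same image up to a shift";
    # the peak's position is the shift itself
    return corr.max(), np.unravel_index(corr.argmax(), corr.shape)

The peak height can serve as the similarity score, and it stays high for im1 vs im2 even though they are shifted relative to each other.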
Source: (StackOverflow)
I'm interested in writing some basic computerized object recognition application, so I figure I need some theoretical background in image processing algorithms, along with some AI for decision making capabilities.
I'm a computer science graduate, and one day I plan to get my Master's degree, hopefully in one of these fields. In the meantime, I'd like to get a head start and do some self-learning.
So my question is, where do I start? I'd appreciate a pointer in the right direction, and a few links if possible.
Source: (StackOverflow)
I have a camera that will be stationary, pointed at an indoors area. People will walk past the camera, within about 5 meters of it. Using OpenCV, I want to detect individuals walking past - my ideal return is an array of detected individuals, with bounding rectangles.
I've looked at several of the built-in samples:
- None of the Python samples really apply
- The C blob tracking sample looks promising, but doesn't accept live video, which makes testing difficult. It's also the most complicated of the samples, which makes extracting the relevant knowledge and converting it to the Python API problematic.
- The C 'motempl' sample also looks promising, in that it calculates a silhouette from subsequent video frames. Presumably I could then use that to find connected components and extract individual blobs and their bounding boxes - but I'm still left trying to figure out a way to identify blobs found in subsequent frames as the same blob.
Is anyone able to provide guidance or samples for doing this - preferably in Python?
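For reference, a minimal Python sketch using OpenCV's built-in HOG pedestrian detector, which returns exactly the desired bounding rectangles per frame. This handles detection only; keeping the same identity across frames would still need a separate tracking step, e.g. matching rectangle centroids between frames. The video source index is an assumption:

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture(0)   # 0 = first camera; a file path also works
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # one (x, y, w, h) rectangle per detected person
    rects, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    for (x, y, w, h) in rects:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('people', frame)
    if cv2.waitKey(1) & 0xFF == 27:   # Esc quits
        break
cap.release()
cv2.destroyAllWindows()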
Source: (StackOverflow)
Can anyone recommend a decent Java library for face detection (recognition not required, just detection)?
The library would preferably be pure Java (e.g. no dependencies on other native libs, DLLs or such).
Platforms: Linux is a must; OS X and Windows are very nice to have.
Performance isn't a big deal - it can be slow, as it's for offline server-side processing.
All I need to know is: are there faces in the photo? If yes, what are the coordinates of their bounding boxes?
Thanks
Source: (StackOverflow)
TinEye, the "reverse image search engine", allows you to upload or link to an image, and it is able to search through the billions of images it has crawled and return links to images it has found that are the same image.
However, it isn't a naive checksum or anything related to that. It is often able to find images of both higher and lower resolution, and larger and smaller size, than the original image you supply. This is a good use of the service because I often find an image and want the highest-resolution version of it possible.
Not only that, but I've had it find images of the same image set, where the people in the image are in a different position but the background largely stays the same.
What type of algorithm could TinEye be using that would allow it to compare an image with others of various sizes and compression ratios and yet still accurately figure out that they are the "same" image or set?
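For reference, one family of techniques such engines are generally believed to build on is local feature matching, which survives rescaling and recompression; production systems index compact fingerprints of such features rather than matching pairwise. A hedged OpenCV sketch of the pairwise version (file paths and the distance threshold are assumptions):

import cv2

img1 = cv2.imread('query.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('candidate.jpg', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)    # scale- and rotation-tolerant features
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
good = [m for m in matches if m.distance < 40]
print(len(good), "strong matches - many matches suggests the same image")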
Source: (StackOverflow)
I know that computer vision involves a lot of math, but I need some tips about how programmers gain that knowledge. I've started to use the OpenCV library but I have some major problems in understanding how the math works in the algorithms.
In college I studied some math and we worked with matrices and derivatives, but I didn't pay too much attention to the subject. It seemed so difficult and useless from a programmer's point of view. I suppose there has to be some easy way to understand what a second derivative is without solving an equation. (Derivatives are just an example.)
Do you have any tips for me on how I can gain such knowledge? A forum, book, link, advice, anything?
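On the derivatives example: for discrete data, a second derivative really is just a difference of differences, no symbolic calculus needed; in image processing this is the 1D version of the Laplacian kernel [1, -2, 1]. A tiny numpy illustration:

import numpy as np

signal = np.array([0, 1, 4, 9, 16, 25], dtype=float)  # x^2 sampled at 0..5
first = np.diff(signal)    # [1. 3. 5. 7. 9.]  - the slope keeps growing
second = np.diff(first)    # [2. 2. 2. 2.]     - constant: the curvature of x^2
print(first, second)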
Source: (StackOverflow)
I recently came across Tesseract and OpenCV. It looks like Tesseract is a full-fledged OCR engine and OpenCV can be used as a framework to create an OCR application/service.
I tried using Tesseract on some of my images and its accuracy seems decent. Later, I came across a very simple tutorial on using OpenCV to perform OCR using Python and was impressed. In a few minutes, I finished training the system and its accuracy was good. But of course, taking this approach means I need to train my system extensively using a large training set.
My specific questions are the following:
- How does one choose between Tesseract and using OpenCV to build a custom OCR app?
- There are training datasets available for Tesseract for different languages. Does OpenCV have something similar, so that I don't have to start from the ground up to achieve OCR?
- Which one is better for a would-be commercial application?
Any suggestions?
Note: I am 24 hours old in the area of Computer Vision but am willing to put in time and effort to learn the prerequisites.
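For context, the simple OpenCV tutorial mentioned above is the kNN handwritten-digit demo; a hedged sketch of that approach, assuming the digits.png image that ships with the OpenCV samples (5000 digits, each 20x20 pixels, in a 100x50 grid):

import cv2
import numpy as np

img = cv2.imread('digits.png', cv2.IMREAD_GRAYSCALE)
cells = [np.hsplit(row, 100) for row in np.vsplit(img, 50)]
x = np.array(cells, dtype=np.float32).reshape(-1, 400)  # one row per digit

# grid rows come in blocks of five per digit class -> 500 samples per class
labels = np.repeat(np.arange(10), 500)[:, np.newaxis].astype(np.float32)

train, test = x[::2], x[1::2]                 # 50/50 train/test split
train_labels, test_labels = labels[::2], labels[1::2]

knn = cv2.ml.KNearest_create()
knn.train(train, cv2.ml.ROW_SAMPLE, train_labels)
ret, result, neighbours, dist = knn.findNearest(test, k=5)
print("accuracy:", float(np.mean(result == test_labels)))

This is exactly the "train it yourself" trade-off described above: Tesseract ships with trained language data, while the OpenCV route needs a labeled training set like digits.png for every symbol class.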
Source: (StackOverflow)
Background
For my final project at university, I'm developing a vehicle license plate detection application. I consider myself an intermediate programmer; however, my mathematics knowledge doesn't go beyond secondary school, which makes producing the right formulas harder than it probably should be.
I've spent a good amount of time looking up academic papers such as:
When it comes to the math, I'm lost. Because of this, testing various graphics operations on images proved productive, for example:
[image] to [image]
However, this approach only worked for that particular image, and if the same techniques were applied to different images, I'm sure poorer conversions would occur. I've read about a transformation called the "bottom hat morphology transform", which does the following:
Basically, the transformation keeps all the dark details of the picture and eliminates everything else (including bigger dark regions and light regions).
I can't find much information on this; however, the image within the documentation near the end of the report shows its effectiveness.
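For reference, the bottom-hat (black-hat) transform is a one-liner in OpenCV; a minimal Python sketch (C# wrappers such as Emgu CV expose the same operation). The file path and kernel size are assumptions; the kernel should roughly match the size of the dark details to keep, e.g. the plate characters:

import cv2

img = cv2.imread('plate.jpg', cv2.IMREAD_GRAYSCALE)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 5))
# black-hat = closing(img) - img: keeps dark details smaller than the kernel
blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)
cv2.imwrite('blackhat.png', blackhat)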
Other constraints
- Developing in C#
- Confining the project to UK registration plates only
- I can choose the images to convert as a demonstration
Question
I need advice on what transformation techniques I should focus on developing, and what algorithms can help me.
EDIT: New information posted in the follow-up question, Continued - Vehicle License Plate Detection
Source: (StackOverflow)
I am new to OpenCV and trying to implement image matching between two images. For this purpose, I'm trying to understand the difference between feature detectors, descriptor extractors and descriptor matchers. I came across a lot of terms and tried to read about them on the OpenCV documentation website, but I just can't seem to wrap my head around the concepts. I understood the basic difference here: Difference between Feature Detection and Descriptor Extraction
But I came across the following terms while studying on the topic :
FAST, GFTT, SIFT, SURF, MSER, STAR, ORB, BRISK, FREAK, BRIEF
I understand how FAST, SIFT and SURF work, but can't seem to figure out which of the above are only detectors and which are extractors.
Then there are the matchers.
FlannBased, BruteForce, knnMatch and probably some others.
After some reading, I figured out that certain matchers can only be used with certain extractors, as explained here: How Does OpenCV ORB Feature Detector Work?
The classification given is quite clear, but it's only for a few extractors, and I don't understand the difference between float and uchar descriptors.
So basically, can someone please
- classify the types of detectors, extractors and matchers based on float and uchar, as mentioned, or some other type of classification?
- explain the difference between the float and uchar classification or whichever classification is being used?
- mention how to initialize (in code) the various types of detectors, extractors and matchers? (see the sketch below)
I know it's asking for a lot, but I'll be highly grateful.
Thank you.
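For reference, a hedged initialization sketch using the modern OpenCV Python API (2.x builds used different factory functions). It also shows the float/uchar split in practice: ORB, BRISK, BRIEF and FREAK produce binary (uchar) descriptors matched with Hamming distance, while SIFT and SURF produce float descriptors matched with L2 distance or FLANN. File paths are assumptions:

import cv2

orb = cv2.ORB_create()      # detector + binary (uchar) descriptor
sift = cv2.SIFT_create()    # detector + float descriptor

img1 = cv2.imread('img1.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('img2.png', cv2.IMREAD_GRAYSCALE)

kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# uchar descriptors: brute force with Hamming distance
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

# float descriptors (SIFT/SURF) would use cv2.NORM_L2 instead,
# or a FLANN-based matcher with a KD-tree index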
Source: (StackOverflow)
I have a video file recorded from the front of a moving vehicle. I am going to use OpenCV for object detection and recognition, but I'm stuck on one aspect: how can I determine the distance to a recognized object?
I can know my current speed and real-world GPS position but that is all. I can't make any assumptions about the object I'm tracking. I am planning to use this to track and follow objects without colliding with them. Ideally I would like to use this data to derive the object's real-world position, which I could do if I could determine the distance from the camera to the object.
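For reference, one hedged idea that uses exactly the known quantities here: since the speed is known, two frames taken dt seconds apart form a stereo pair with baseline B = speed * dt, and the classic stereo relation Z = f * B / d (f = focal length in pixels, d = pixel disparity of a feature tracked across the two frames) yields depth. This sketch assumes motion roughly perpendicular to the direction of the object; straight forward motion needs full triangulation instead:

def depth_from_motion(focal_px, speed_mps, dt_s, disparity_px):
    baseline_m = speed_mps * dt_s                # metres travelled between frames
    return focal_px * baseline_m / disparity_px  # depth in metres

# e.g. f = 700 px, 10 m/s, 0.1 s between frames, 35 px disparity -> 20 m
print(depth_from_motion(700, 10.0, 0.1, 35))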
Source: (StackOverflow)
I have a problem with processing digital images. I am trying to detect fingertips, similar to the solution that is presented here: Hand and finger detection using JavaCV.
However, I am not using JavaCV but OpenCV for Android, which is slightly different.
I have managed to do all the steps presented in the tutorial except the filtering of convex hulls and convexity defects. This is what my image looks like:
[image]
Here is an image in another resolution:
[image]
As you can clearly see, there are too many yellow points (convex hull points) and also too many red points (convexity defects). Sometimes between two yellow points there is no red point, which is quite strange (how are convex hulls calculated?).
What I need is to create a filtering function similar to the one in the link provided before, but using the data structures of OpenCV.
Convex hulls are of type MatOfInt...
Convexity defects are of type MatOfInt4...
I also created some additional data structures, because OpenCV uses different types containing the same data in different methods...
convexHullMatOfInt = new MatOfInt();
convexHullPointArrayList = new ArrayList<Point>();
convexHullMatOfPoint = new MatOfPoint();
convexHullMatOfPointArrayList = new ArrayList<MatOfPoint>();
Here is what I did so far, but it is not working well. The problem is probably that the data is being converted in the wrong way:
Creating convex hulls and convexity defects:
public void calculateConvexHulls()
{
    convexHullMatOfInt = new MatOfInt();
    convexHullPointArrayList = new ArrayList<Point>();
    convexHullMatOfPoint = new MatOfPoint();
    convexHullMatOfPointArrayList = new ArrayList<MatOfPoint>();

    try {
        // Calculate convex hulls
        if (aproximatedContours.size() > 0)
        {
            // with returnPoints=false, convexHull returns *indices* into the contour
            Imgproc.convexHull(aproximatedContours.get(0), convexHullMatOfInt, false);

            // resolve the hull indices to actual contour points
            List<Point> contourPoints = aproximatedContours.get(0).toList();
            for (int index : convexHullMatOfInt.toList())
                convexHullPointArrayList.add(contourPoints.get(index));

            convexHullMatOfPoint.fromList(convexHullPointArrayList);
            convexHullMatOfPointArrayList.add(convexHullMatOfPoint);
        }
    } catch (Exception e) {
        Log.e("Calculate convex hulls failed.", "Details below");
        e.printStackTrace();
    }
}
public void calculateConvexityDefects()
{
    mConvexityDefectsMatOfInt4 = new MatOfInt4();
    try {
        // each defect is stored as 4 ints: start index, end index,
        // farthest-point index and fixed-point depth
        Imgproc.convexityDefects(aproximatedContours.get(0), convexHullMatOfInt, mConvexityDefectsMatOfInt4);
        if (!mConvexityDefectsMatOfInt4.empty())
        {
            mConvexityDefectsIntArrayList = mConvexityDefectsMatOfInt4.toArray();
        }
    } catch (Exception e) {
        Log.e("Calculate convexity defects failed.", "Details below");
        e.printStackTrace();
    }
}
Filtering:
public void filterCalculatedPoints()
{
    ArrayList<Point> tipPts = new ArrayList<Point>();
    ArrayList<Point> foldPts = new ArrayList<Point>();
    ArrayList<Integer> depths = new ArrayList<Integer>();
    fingerTips = new ArrayList<Point>();

    List<Point> contourPoints = contours.get(0).toList();

    // each defect is (start, end, farthest, depth); use the start point as the
    // tip candidate and the farthest point as the fold between fingers. Adding
    // both start and end points here would desynchronize tipPts from foldPts.
    for (int i = 0; i < mConvexityDefectsIntArrayList.length / 4; i++)
    {
        tipPts.add(contourPoints.get(mConvexityDefectsIntArrayList[4 * i]));
        foldPts.add(contourPoints.get(mConvexityDefectsIntArrayList[4 * i + 2]));
        depths.add(mConvexityDefectsIntArrayList[4 * i + 3]);
    }

    int numPoints = foldPts.size();
    for (int i = 0; i < numPoints; i++) {
        // shallow defects are contour noise, not the folds between fingers
        if (depths.get(i).intValue() < MIN_FINGER_DEPTH)
            continue;

        // look at fold points on either side of a tip
        int pdx = (i == 0) ? (numPoints - 1) : (i - 1);
        int sdx = (i == numPoints - 1) ? 0 : (i + 1);

        int angle = angleBetween(tipPts.get(i), foldPts.get(pdx), foldPts.get(sdx));
        if (angle >= MAX_FINGER_ANGLE)   // angle between finger and folds too wide
            continue;

        // this point is probably a fingertip, so add it to the list
        fingerTips.add(tipPts.get(i));
    }
}
Results (white points - fingertips after filtering):
[image]
Could you help me to write proper function for filtering?
UPDATE 14.08.2013
I use the standard OpenCV function for contour approximation. I have to change the approximation value with resolution changes and with hand-to-camera distance, which is quite hard to do. At a lower resolution a finger consists of fewer pixels, so the approximation value should be lower. The same goes for the distance. Keeping it high will result in completely losing the finger. So I think approximation is not a good approach to solving the problem; however, a small value could be useful to speed up calculations:
Imgproc.approxPolyDP(frame, frame, 2 , true);
If I use high values, then the result is like the image below, which would only be good if the distance and resolution never changed. Also, I am quite surprised that the default methods for hull points and defect points don't take any useful arguments (min angle, distance, etc.)...
The image below shows the effect that I would like to achieve consistently, independent of resolution or hand-to-camera distance. Also, I don't want to see any yellow points when I close my palm...
To sum everything up, I would like to know:
- how to filter the points
- how I can make a resolution- and distance-independent approximation that will always work
- if someone knows of or has some materials (graphical representations, explanations) about the data structures used in OpenCV (Mat, MatOfInt, MatOfInt4, MatOfPoint, MatOfPoint2f, etc.), I would be happy to read them

Source: (StackOverflow)
I would like to know how to convert an RGB image into a black & white (binary) image.
After conversion, how can I save the modified image to disk?
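For reference, a minimal OpenCV sketch in Python; the file names are placeholders, and cv2.THRESH_OTSU can be added to pick the threshold automatically:

import cv2

img = cv2.imread('input.jpg')                    # loads as BGR
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # colour -> grayscale
_, binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)  # fixed threshold
cv2.imwrite('binary.png', binary)                # save the result to disk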
Source: (StackOverflow)
I have used the SIFT implementation of Andrea Vedaldi to calculate the SIFT descriptors of two similar images (the second image is actually a zoomed-in picture of the same object from a different angle).
Now I am not able to figure out how to compare the descriptors to tell how similar the images are.
I know that this question is not answerable unless you have actually played with this sort of thing before, but I thought that somebody who has done it before might know, so I posted the question.
The little I did to generate the descriptors:
>> i=imread('p1.jpg');
>> j=imread('p2.jpg');
>> i=rgb2gray(i);
>> j=rgb2gray(j);
>> [a, b]=sift(i); % a has the frames and b has the descriptors
>> [c, d]=sift(j);
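For reference, the standard way to compare such descriptors is nearest-neighbour matching with Lowe's ratio test; the number of accepted matches serves as a rough similarity score for the image pair. A hedged numpy sketch, assuming d1 and d2 hold one 128-dimensional descriptor per column (as the b and d outputs above do, exported to numpy arrays):

import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    matches = []
    for i in range(d1.shape[1]):
        # Euclidean distance from descriptor i to every descriptor in d2
        dists = np.linalg.norm(d2 - d1[:, i:i+1], axis=0)
        nearest, second = np.partition(dists, 1)[:2]
        if nearest < ratio * second:    # keep only distinctive matches
            matches.append((i, int(np.argmin(dists))))
    return matches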
Source: (StackOverflow)