EzDevInfo.com

OpenCV interview questions

Top OpenCV frequently asked interview questions

Algorithm improvement for Coca-Cola can shape recognition

One of the most interesting projects I worked on in the past couple of years, back when I was still a student, was a final project about image processing. The goal was to develop a system able to recognize Coca-Cola cans (note that I'm stressing the word cans; you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle with scale and rotation.

[Image: template matching result]

Some constraints on the project:

  • The background could be very noisy.
  • The can could have any scale or rotation, or even orientation (within reasonable limits).
  • The image could have some degree of fuzziness (contours might not be entirely straight).
  • There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!
  • The brightness of the image could vary a lot (so you can't rely "too much" on color detection).
  • The can could be partly hidden on the sides or in the middle (and possibly partly hidden behind a bottle!).
  • There could be no cans at all in the image, in which case you had to find nothing and write a message saying so.

So you could end up with tricky things like this (which, in this case, made my algorithm fail completely):

[Image: a tricky case where the algorithm fails completely]

I did this project a while ago, had a lot of fun doing it, and ended up with a decent implementation. Here are some details about it:

Language: Done in C++ using the OpenCV library.

Pre-processing: By image pre-processing I mean transforming the image into a more usable form to feed to the algorithm. I used three steps (a sketch of them in code follows the list):

  1. Changing the color domain from RGB to HSV (Hue Saturation Value) and filtering based on "red" hue, with saturation above a certain threshold to avoid orange-like colors, and filtering out low value to avoid dark tones. The end result was a binary black-and-white image, where all white pixels represent the pixels that match this threshold. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with. [Image: binarized image]
  2. Noise filtering using median filtering (taking the median pixel value of all neighbors and replacing the pixel with this value) to reduce noise.
  3. Using the Canny edge detection filter to get the contours of all items after the two preceding steps. [Image: contour detection]
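
For illustration, here is a minimal sketch of these three steps in OpenCV C++ (the thresholds and file names are illustrative placeholders, not the original values):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat bgr = cv::imread("scene.jpg");

    // 1. BGR -> HSV, then keep "red" hues with sufficient saturation and value.
    //    Red wraps around hue 0, so two ranges are OR-ed together.
    cv::Mat hsv, maskLo, maskHi, mask;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(0, 100, 60),   cv::Scalar(10, 255, 255),  maskLo);
    cv::inRange(hsv, cv::Scalar(170, 100, 60), cv::Scalar(180, 255, 255), maskHi);
    mask = maskLo | maskHi;

    // 2. Median filter to knock out salt-and-pepper noise in the binary mask.
    cv::medianBlur(mask, mask, 5);

    // 3. Canny edge detection to extract the remaining contours.
    cv::Mat edges;
    cv::Canny(mask, edges, 50, 150);

    cv::imwrite("contours.png", edges);
    return 0;
}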

Algorithm: The algorithm I chose for this task was taken from this (awesome) book on feature extraction; it's called the Generalized Hough Transform (quite different from the regular Hough Transform). It basically says a few things:

  • You can describe an object in space without knowing its analytical equation (which is the case here).
  • It is resistant to image deformations such as scaling and rotation, as it will basically test your image for every combination of scale factor and rotation factor.
  • It uses a base model (a template) that the algorithm will "learn".
  • Each pixel remaining in the contour image will vote for another pixel which will supposedly be the center of gravity of your object, based on what it learned from the model.

In the end, you get a heat map of the votes; for example, here all the pixels of the can's contour will vote for its gravitational center, so you'll have a lot of votes in the same pixel corresponding to the center, and will see a peak in the heat map, as below.

[Image: GHT vote heat map with a peak at the can's center]

Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (the final scale and rotation factors will obviously be relative to your original template). In theory, at least...

Results: Now, while this approach worked in the basic cases, it was severely lacking in some areas:

  • It is extremely slow! I can't stress this enough. Almost a full day was needed to process the 30 test images, obviously because I had a very large search space over scale and rotation, since some of the cans were very small.
  • It was completely lost when bottles were in the image, and for some reason almost always found the bottle instead of the can (perhaps because bottles were bigger, and thus had more pixels and more votes).
  • Fuzzy images were also no good, since the votes ended up in pixels at random locations around the center, resulting in a very noisy heat map.
  • Invariance to translation and rotation was achieved, but not to orientation, meaning that a can that was not directly facing the camera wasn't recognized.

Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned?

I hope some people will learn something from this as well; after all, I think it shouldn't only be the people who ask questions who learn. :)


Source: (StackOverflow)

Increasing camera capture resolution in OpenCV

In my C/C++ program, I'm using OpenCV to capture images from my webcam. The camera (Logitech QuickCam IM) can capture at resolutions 320x240, 640x480 and 1280x960. But, for some strange reason, OpenCV gives me images of resolution 320x240 only. Calls to change the resolution using cvSetCaptureProperty() with other resolution values just don't work. How do I capture images with the other resolutions possible with my webcam?
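
For reference, the equivalent request through the C++ VideoCapture interface looks like the sketch below; note that the backend may silently ignore the request, so the actual frame size should be read back and checked:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::VideoCapture cap(0);
    // Request 1280x960; set() returns false if the property is unsupported,
    // and some drivers accept the call but deliver another size anyway.
    cap.set(CV_CAP_PROP_FRAME_WIDTH, 1280);
    cap.set(CV_CAP_PROP_FRAME_HEIGHT, 960);

    cv::Mat frame;
    cap >> frame;
    std::cout << "got " << frame.cols << "x" << frame.rows << std::endl;
    return 0;
}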


Source: (StackOverflow)


How to detect a Christmas Tree?

Which image processing techniques could be used to implement an application that detects the Christmas trees displayed in the following images?

I'm searching for solutions that are going to work on all these images. Therefore, approaches that require training Haar cascade classifiers or template matching are not very interesting.

I'm looking for something that can be written in any programming language, as long as it uses only open-source technologies. The solution must be tested with the images that are shared in this question. There are 6 input images, and the answer should display the results of processing each of them. Finally, for each output image there must be red lines drawn to surround the detected tree.

How would you go about programmatically detecting the trees in these images?


Source: (StackOverflow)

What is different between all these OpenCV Python interfaces?

There are several Python interfaces to OpenCV: the legacy cv module and the newer cv2 module, among others.

What are the main differences and which one should I use?


Source: (StackOverflow)

How to sharpen an image in OpenCV?

How do you sharpen an image using OpenCV? There are many ways of smoothing or blurring an image, but none that I could see for sharpening.
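
One common technique, shown here as a sketch rather than the only way, is unsharp masking: subtract a Gaussian-blurred copy from the original to boost high-frequency detail.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat image = cv::imread("input.jpg");
    cv::Mat blurred, sharpened;
    cv::GaussianBlur(image, blurred, cv::Size(0, 0), 3);
    // sharpened = 1.5 * image - 0.5 * blurred; the weights control strength.
    cv::addWeighted(image, 1.5, blurred, -0.5, 0, sharpened);
    cv::imwrite("sharpened.jpg", sharpened);
    return 0;
}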


Source: (StackOverflow)

Test NEON-optimized cv::threshold() on mobile device [closed]

I have been writing some optimizations for OpenCV's threshold function, for ARM devices (mobile phones). It should work on both Android and iPhone.

However, I do not have a device to test it on, so I am looking for volunteers to give me a little help. If that motivates you more, I am planning to send it to OpenCV to be integrated into the main repository.

I would be interested in code correctness, and if it happens to work as intended, some statistics for original/optimized performance. Do not forget to look at all scenarios.

So, here is the code. To run it, paste it into opencv/modules/imgproc/src/thresh.cpp at line 228 (as of 2.4.2), just below the SSE block, and recompile OpenCV.

Also, add this line at the top of the file:

#include <arm_neon.h>

Main code body:

#define CV_USE_NEON 1
#if CV_USE_NEON
    //if( checkHardwareSupport(CV_CPU_ARM_NEON) )
    if( true )
    {
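        // Broadcast the scalar threshold and max value across all 16 lanes.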
        uint8x16_t thresh_u = vdupq_n_u8(thresh);
        uint8x16_t maxval_ = vdupq_n_u8(maxval);

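        // The vector loops below cover the first (width & -8) pixels of each
        // row; the function's existing scalar tail code handles the rest.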
        j_scalar = roi.width & -8;

        for( i = 0; i < roi.height; i++ )
        {
            const uchar* src = (const uchar*)(_src.data + _src.step*i);
            uchar* dst = (uchar*)(_dst.data + _dst.step*i);

            switch( type )
            {
            case THRESH_BINARY:
                for( j = 0; j <= roi.width - 32; j += 32 )
                {
                    uint8x16_t v0, v1;
                    v0 = vld1q_u8 ( src + j );
                    v1 = vld1q_u8 ( src + j + 16 );
                    v0 = vcgtq_u8 ( v0, thresh_u );
                    v1 = vcgtq_u8 ( v1, thresh_u );
                    v0 = vandq_u8 ( v0, maxval_ );
                    v1 = vandq_u8 ( v1, maxval_ );
                    vst1q_u8 ( dst + j, v0 );
                    vst1q_u8 ( dst + j + 16, v1 );
                }


                for( ; j <= roi.width - 8; j += 8 )
                {
                    uint8x8_t v2;
                    v2 = vld1_u8( src + j );
                    v2 = vcgt_u8 ( v2, vget_low_u8 ( thresh_u ) );
                    v2 = vand_u8 ( v2, vget_low_u8 ( maxval_ ) );
                    vst1_u8 ( dst + j, v2 );                    
                }
                break;

            case THRESH_BINARY_INV:         
                for( j = 0; j <= roi.width - 32; j += 32 )
                {
                    uint8x16_t v0, v1;
                    v0 = vld1q_u8 ( src + j );
                    v1 = vld1q_u8 ( src + j + 16 );
                    v0 = vcleq_u8 ( v0, thresh_u );
                    v1 = vcleq_u8 ( v1, thresh_u );
                    v0 = vandq_u8 ( v0, maxval_ );
                    v1 = vandq_u8 ( v1, maxval_ );
                    vst1q_u8 ( dst + j, v0 );
                    vst1q_u8 ( dst + j + 16, v1 );
                }


                for( ; j <= roi.width - 8; j += 8 )
                {
                    uint8x8_t v2;
                    v2 = vld1_u8( src + j );
                    v2 = vcle_u8 ( v2, vget_low_u8 ( thresh_u ) );
                    v2 = vand_u8 ( v2, vget_low_u8 ( maxval_ ) );
                    vst1_u8 ( dst + j, v2 );                    
                }
                break;

            case THRESH_TRUNC:
                for( j = 0; j <= roi.width - 32; j += 32 )
                {
                    uint8x16_t v0, v1;
                    v0 = vld1q_u8 ( src + j );
                    v1 = vld1q_u8 ( src + j + 16 );
                    v0 = vminq_u8 ( v0, thresh_u );
                    v1 = vminq_u8 ( v1, thresh_u );                 
                    vst1q_u8 ( dst + j, v0 );
                    vst1q_u8 ( dst + j + 16, v1 );
                }


                for( ; j <= roi.width - 8; j += 8 )
                {
                    uint8x8_t v2;
                    v2 = vld1_u8( src + j );
                    v2 = vmin_u8 ( v2, vget_low_u8 ( thresh_u ) );
                    vst1_u8 ( dst + j, v2 );                    
                }
                break;

            case THRESH_TOZERO:         
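                // dst = src if src > thresh, else 0: the comparison yields an
                // all-ones/all-zeros mask per lane, and ANDing it with
                // max(src, thresh) keeps src where the mask is set.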
                for( j = 0; j <= roi.width - 32; j += 32 )
                {
                    uint8x16_t v0, v1;
                    v0 = vld1q_u8 ( src + j );
                    v1 = vld1q_u8 ( src + j + 16 );             
                    v0 = vandq_u8 ( vcgtq_u8 ( v0, thresh_u ), vmaxq_u8 ( v0, thresh_u ) );
                    v1 = vandq_u8 ( vcgtq_u8 ( v1, thresh_u ), vmaxq_u8 ( v1, thresh_u ) );
                    vst1q_u8 ( dst + j, v0 );
                    vst1q_u8 ( dst + j + 16, v1 );
                }


                for( ; j <= roi.width - 8; j += 8 )
                {
                    uint8x8_t v2;
                    v2 = vld1_u8 ( src + j );                    
                    v2 = vand_u8 ( vcgt_u8 ( v2, vget_low_u8(thresh_u) ), vmax_u8 ( v2, vget_low_u8(thresh_u) ) );
                    vst1_u8 ( dst + j, v2 );                    
                }
                break;

            case THRESH_TOZERO_INV:
                for( j = 0; j <= roi.width - 32; j += 32 )
                {
                    uint8x16_t v0, v1;
                    v0 = vld1q_u8 ( src + j );
                    v1 = vld1q_u8 ( src + j + 16 );             
                    v0 = vandq_u8 ( vcleq_u8 ( v0, thresh_u ), vminq_u8 ( v0, thresh_u ) );
                    v1 = vandq_u8 ( vcleq_u8 ( v1, thresh_u ), vminq_u8 ( v1, thresh_u ) );
                    vst1q_u8 ( dst + j, v0 );
                    vst1q_u8 ( dst + j + 16, v1 );
                }


                for( ; j <= roi.width - 8; j += 8 )
                {
                    uint8x8_t v2;
                    v2 = vld1_u8 ( src + j );                    
                    v2 = vand_u8 ( vcle_u8 ( v2, vget_low_u8(thresh_u) ), vmin_u8 ( v2, vget_low_u8(thresh_u) ) );
                    vst1_u8 ( dst + j, v2 );                    
                }
                break;
            }
        }
    }
#endif

Source: (StackOverflow)

Size of Matrix OpenCV

I know this might be very rudimentary, but I am new to OpenCV. Could you please tell me how to obtain the size of a matrix in OpenCV? I have googled and am still searching, but if any of you know the answer, please help.

Size as in number of rows and columns.

And is there a way to directly obtain the maximum value of a 2D matrix?
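
A minimal sketch covering both parts: mat.rows/mat.cols (or mat.size()) give the dimensions, and cv::minMaxLoc finds the extrema of a single-channel matrix.

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::Mat mat = cv::Mat::eye(4, 6, CV_64F);
    std::cout << "rows: " << mat.rows << ", cols: " << mat.cols << std::endl;

    double minVal, maxVal;
    cv::minMaxLoc(mat, &minVal, &maxVal);   // single-channel input only
    std::cout << "max: " << maxVal << std::endl;
    return 0;
}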


Source: (StackOverflow)

Convert RGB to Black & White in OpenCV

I would like to know how to convert an RGB image into a black & white (binary) image.

After conversion, how can I save the modified image to disk?
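
A minimal sketch, assuming an input file "input.jpg": convert to grayscale, binarize with a fixed threshold, and write the result to disk.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat color = cv::imread("input.jpg");
    cv::Mat gray, binary;
    cv::cvtColor(color, gray, cv::COLOR_BGR2GRAY);
    cv::threshold(gray, binary, 128, 255, cv::THRESH_BINARY);
    cv::imwrite("binary.png", binary);   // saving is just imwrite on the Mat
    return 0;
}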


Source: (StackOverflow)

Checking images for similarity with OpenCV

Does OpenCV support the comparison of two images, returning some value (maybe a percentage) that indicates how similar these images are? E.g. 100% would be returned if the same image was passed twice, 0% would be returned if the images were totally different.

I already read a lot of similar topics here on StackOverflow. I also did quite some Googling. Sadly, I couldn't come up with a satisfying answer.
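
One crude baseline, as a sketch only: normalize the per-pixel L2 distance between the two images to a percentage. Identical images score 100%, but what counts as "totally different" depends on content, so treat the number as a rough heuristic (the images must have the same size and type).

#include <opencv2/opencv.hpp>
#include <cmath>
#include <iostream>

int main()
{
    cv::Mat a = cv::imread("a.png");
    cv::Mat b = cv::imread("b.png");

    double dist = cv::norm(a, b, cv::NORM_L2);
    // Worst case: every pixel differs by the full 255 range.
    double maxDist = 255.0 * std::sqrt((double)a.total() * a.channels());
    std::cout << "similarity: " << 100.0 * (1.0 - dist / maxDist) << "%\n";
    return 0;
}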


Source: (StackOverflow)

Perspective Transform + Crop in iOS with OpenCV

I'm trying to implement a cropping & perspective correction feature into an upcoming app. Whilst doing research, I came across:

Executing cv::warpPerspective for a fake deskewing on a set of cv::Point

http://sudokugrab.blogspot.ch/2009/07/how-does-it-all-work.html

So I decided to try implementing this feature with OpenCV - the framework is there so the installation was fast. However, I'm not getting the results I hoped for: (2nd picture is the result)

[Image: original photo and cropping box]

[Image: cropped photo, bad result]

I've translated all the code to work with Xcode and triple-checked the coordinates. Can you tell me what is wrong with my code? For the sake of completeness, I've also included the UIImage -> Mat conversion + reversal:

- (void)confirmedImage
{
    if ([_adjustRect frameEdited]) {

    cv::Mat src = [self cvMatFromUIImage:_sourceImage];

    // My original Coordinates
    // 4-------3
    // |       |
    // |       |
    // |       |
    // 1-------2

    CGFloat scaleFactor =  [_sourceImageView contentScale];
    CGPoint p1 = [_adjustRect coordinatesForPoint:4 withScaleFactor:scaleFactor];
    CGPoint p2 = [_adjustRect coordinatesForPoint:3 withScaleFactor:scaleFactor];
    CGPoint p3 = [_adjustRect coordinatesForPoint:1 withScaleFactor:scaleFactor];
    CGPoint p4 = [_adjustRect coordinatesForPoint:2 withScaleFactor:scaleFactor];

    std::vector<cv::Point2f> c1;
    c1.push_back(cv::Point2f(p1.x, p1.y));
    c1.push_back(cv::Point2f(p2.x, p2.y));
    c1.push_back(cv::Point2f(p3.x, p3.y));
    c1.push_back(cv::Point2f(p4.x, p4.y));

    cv::RotatedRect box = minAreaRect(cv::Mat(c1));
    cv::Point2f pts[4];
    box.points(pts);

    cv::Point2f src_vertices[3];
    src_vertices[0] = pts[0];
    src_vertices[1] = pts[1];
    src_vertices[2] = pts[3];

    cv::Point2f dst_vertices[4];
    dst_vertices[0].x = 0;
    dst_vertices[0].y = 0;

    dst_vertices[1].x = box.boundingRect().width-1;
    dst_vertices[1].y = 0;

    dst_vertices[2].x = 0;
    dst_vertices[2].y = box.boundingRect().height-1;

    dst_vertices[3].x = box.boundingRect().width-1;
    dst_vertices[3].y = box.boundingRect().height-1;

    cv::Mat warpAffineMatrix = getAffineTransform(src_vertices, dst_vertices);

    cv::Mat rotated;
    cv::Size size(box.boundingRect().width, box.boundingRect().height);
    warpAffine(src, rotated, warpAffineMatrix, size, cv::INTER_LINEAR, cv::BORDER_CONSTANT);


    [_sourceImageView setNeedsDisplay];
    [_sourceImageView setImage:[self UIImageFromCVMat:rotated]];
    [_sourceImageView setContentMode:UIViewContentModeScaleAspectFit];

    rotated.release();
    src.release();

    }
}

- (UIImage *)UIImageFromCVMat:(cv::Mat)cvMat
{
    NSData *data = [NSData dataWithBytes:cvMat.data length:cvMat.elemSize()*cvMat.total()];
    CGColorSpaceRef colorSpace;
    if ( cvMat.elemSize() == 1 ) {
        colorSpace = CGColorSpaceCreateDeviceGray();
    }
    else {
        colorSpace = CGColorSpaceCreateDeviceRGB();
    }
    CGDataProviderRef provider = CGDataProviderCreateWithCFData( (__bridge CFDataRef)data );
    CGImageRef imageRef = CGImageCreate( cvMat.cols, cvMat.rows, 8, 8 * cvMat.elemSize(), cvMat.step[0], colorSpace, kCGImageAlphaNone|kCGBitmapByteOrderDefault, provider, NULL, false, kCGRenderingIntentDefault );
    UIImage *finalImage = [UIImage imageWithCGImage:imageRef];
    CGImageRelease( imageRef );
    CGDataProviderRelease( provider );
    CGColorSpaceRelease( colorSpace );
    return finalImage;
}

- (cv::Mat)cvMatFromUIImage:(UIImage *)image
{
    CGColorSpaceRef colorSpace = CGImageGetColorSpace( image.CGImage );
    CGFloat cols = image.size.width;
    CGFloat rows = image.size.height;
    cv::Mat cvMat( rows, cols, CV_8UC4 );
    CGContextRef contextRef = CGBitmapContextCreate( cvMat.data, cols, rows, 8, cvMat.step[0], colorSpace, kCGImageAlphaNoneSkipLast | kCGBitmapByteOrderDefault );
    CGContextDrawImage( contextRef, CGRectMake(0, 0, rows, cols), image.CGImage );
    CGContextRelease( contextRef );
    CGColorSpaceRelease( colorSpace );
    return cvMat;
}

Is this the correct approach to my problem? Do you have any sample code that could help me out?

Thank you for reading my question!

UPDATE:

I've actually open-sourced my UIImagePickerController replacement here: https://github.com/mmackh/MAImagePickerController-of-InstaPDF which includes the adjustable cropping view, filters and perspective correction.


Source: (StackOverflow)

Classification of detectors, extractors and matchers

I am new to OpenCV and trying to implement image matching between two images. For this purpose, I'm trying to understand the difference between feature detectors, descriptor extractors and descriptor matchers. I came across a lot of terms and tried to read about them on the OpenCV documentation website, but I just can't seem to wrap my head around the concepts. I understood the basic difference here: Difference between Feature Detection and Descriptor Extraction

But I came across the following terms while studying the topic:

FAST, GFTT, SIFT, SURF, MSER, STAR, ORB, BRISK, FREAK, BRIEF

I understand how FAST, SIFT and SURF work, but can't seem to figure out which of the above are only detectors and which are extractors.

Then there are the matchers.

FlannBased, BruteForce, knnMatch and probably some others.

After some reading, I figured out that certain matchers can only be used with certain extractors, as explained here: How Does OpenCV ORB Feature Detector Work? The classification given is quite clear, but it's only for a few extractors, and I don't understand the difference between float and uchar.

So basically, can someone please

  1. classify the types of detectors, extractors and matchers based on float and uchar, as mentioned, or some other type of classification?
  2. explain the difference between the float and uchar classification or whichever classification is being used?
  3. mention how to initialize (code) various types of detectors, extractors and matchers?

I know it's asking for a lot, but I'll be highly grateful. Thank you.
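
As a starting point for question 3, here is a sketch using the 2.4-era factory API (the strings are registered factory names; binary/uchar descriptors such as ORB pair with a Hamming matcher, while float descriptors such as SIFT/SURF pair with L2 or FLANN):

#include <opencv2/opencv.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

int main()
{
    cv::Mat img1 = cv::imread("img1.png", 0);
    cv::Mat img2 = cv::imread("img2.png", 0);

    // Create a detector, extractor and matcher by name.
    cv::Ptr<cv::FeatureDetector> detector = cv::FeatureDetector::create("ORB");
    cv::Ptr<cv::DescriptorExtractor> extractor = cv::DescriptorExtractor::create("ORB");
    cv::Ptr<cv::DescriptorMatcher> matcher = cv::DescriptorMatcher::create("BruteForce-Hamming");

    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    detector->detect(img1, kp1);
    detector->detect(img2, kp2);
    extractor->compute(img1, kp1, desc1);
    extractor->compute(img2, kp2, desc2);

    std::vector<cv::DMatch> matches;
    matcher->match(desc1, desc2, matches);
    return 0;
}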


Source: (StackOverflow)

OpenCV: IplImage versus Mat, which to use?

I'm pretty new to OpenCV (about 2 months now). I have the book Learning OpenCV by Bradski and Kaehler. My question is, if I want to do everything in a 2.0+ manner, when should I use Matrices (Mat) and when should I use IplImage?

Bradski's book states upfront (in the preface) that it's written based on OpenCV 2.0, and it mostly uses IplImage in its sample code, but the more recent online documentation makes it sound like Mat is now a catch-all data type for images, masks, etc., kind of like a basic matrix in Matlab. This leaves me wondering whether IplImage should be considered obsolete.

So, should I be totally avoiding IplImages when writing new code? Or is there important stuff IplImages allow me to do that Mats do not?

Thanks.
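
For what it's worth, a small interop sketch (OpenCV 2.x API): a Mat can wrap an existing IplImage header without copying pixels, which eases migrating old code piecemeal.

#include <opencv2/opencv.hpp>

int main()
{
    IplImage* ipl = cvLoadImage("input.jpg");
    cv::Mat m(ipl);            // shares the IplImage's pixel data (no copy)
    cv::GaussianBlur(m, m, cv::Size(5, 5), 0);   // edits are visible via ipl too
    cvReleaseImage(&ipl);      // Mat does not own the data; free it yourself
    return 0;
}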


Source: (StackOverflow)

Extracting text OpenCV

I am trying to find the bounding boxes of text in an image and am currently using this approach:

// calculate the local variances of the grayscale image
Mat t_mean, t_mean_2;
Mat grayF;
outImg_gray.convertTo(grayF, CV_32F);
int winSize = 35;
blur(grayF, t_mean, cv::Size(winSize,winSize));
blur(grayF.mul(grayF), t_mean_2, cv::Size(winSize,winSize));
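// local variance = E[X^2] - (E[X])^2, computed over each winSize window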
Mat varMat = t_mean_2 - t_mean.mul(t_mean);
varMat.convertTo(varMat, CV_8U);

// threshold the high variance regions
Mat varMatRegions = varMat > 100;
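
A possible continuation (not part of the original code): pull bounding boxes out of the thresholded variance mask with findContours and boundingRect, then filter the boxes.

// findContours modifies its input, so work on a copy of the mask
std::vector<std::vector<cv::Point> > contours;
cv::findContours(varMatRegions.clone(), contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
for (size_t i = 0; i < contours.size(); i++) {
    cv::Rect box = cv::boundingRect(contours[i]);
    // filter by size/aspect ratio here to drop non-text regions
}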

When given an image like this:

[Image: input card]

Then when I show varMatRegions I get this image:

[Image: thresholded variance regions (varMatRegions)]

As you can see, it somewhat combines the left block of text with the header of the card. For most cards this method works great, but on busier cards it can cause problems.

The reason it is bad for those contours to connect is that it makes the bounding box of the contour nearly take up the entire card.

Can anyone suggest a different way of finding the text, to ensure proper detection?

200 points to whoever can find the text in the cards in these two images.

[Images: two further card samples]


Source: (StackOverflow)

Algorithm to detect corners of paper sheet in photo

What is the best way to detect the corners of an invoice/receipt/sheet-of-paper in a photo? This is to be used for subsequent perspective correction, before OCR.

My current approach has been:

RGB > Gray > Canny edge detection with thresholding > Dilate(1) > Remove small objects(6) > clear border objects > pick largest blob based on convex area > [corner detection - not implemented]

I can't help but think there must be a more robust 'intelligent'/statistical approach to handle this type of segmentation. I don't have a lot of training examples, but I could probably get 100 images together.

Broader context:

I'm using Matlab to prototype, and planning to implement the system in OpenCV and Tesseract OCR. This is the first of a number of image processing problems I need to solve for this specific application, so I'm looking to roll my own solution and re-familiarize myself with image processing algorithms.

Here are some sample images that I'd like the algorithm to handle. If you'd like to take up the challenge, the large images are at http://madteckhead.com/tmp

[Images: cases 1-4]

In the best case this gives:

[Images: case 1 - Canny, post-Canny, largest blob]

However it fails easily on other cases:

[Images: case 2 - Canny, post-Canny, largest blob]

Thanks in advance for all the great ideas! I love SO!

EDIT: Hough Transform Progress

Q: What algorithm would cluster the Hough lines to find corners? Following advice from the answers, I was able to use the Hough Transform, pick lines, and filter them. My current approach is rather crude. I've made the assumption that the invoice will always be less than 15 degrees out of alignment with the image. I end up with reasonable results for lines if this is the case (see below). But I am not entirely sure of a suitable algorithm to cluster the lines (or vote) to extrapolate the corners. The Hough lines are not continuous, and in the noisy images there can be parallel lines, so some form of distance-from-line-origin metric is required. Any ideas? A rough sketch of one possible clustering step appears after the images below.

[Images: line-filtering results for cases 1-4]
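
For the clustering step, one sketch under the stated sub-15-degree assumption: bin the (rho, theta) Hough lines into near-horizontal and near-vertical groups, pick lines from each group, and intersect them for corner candidates (thresholds and file names here are illustrative only):

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>
#include <iostream>

// Intersection of two lines given in (rho, theta) normal form.
static cv::Point2f intersect(const cv::Vec2f& a, const cv::Vec2f& b)
{
    double r1 = a[0], t1 = a[1], r2 = b[0], t2 = b[1];
    double det = std::cos(t1) * std::sin(t2) - std::sin(t1) * std::cos(t2);
    return cv::Point2f((float)((r1 * std::sin(t2) - r2 * std::sin(t1)) / det),
                       (float)((r2 * std::cos(t1) - r1 * std::cos(t2)) / det));
}

int main()
{
    cv::Mat edges = cv::imread("canny.png", 0);
    std::vector<cv::Vec2f> lines;
    cv::HoughLines(edges, lines, 1, CV_PI / 180, 100);

    std::vector<cv::Vec2f> horiz, vert;
    for (size_t i = 0; i < lines.size(); i++) {
        float theta = lines[i][1];
        if (std::fabs(theta - CV_PI / 2) < CV_PI / 12)              // within 15 deg of horizontal
            horiz.push_back(lines[i]);
        else if (theta < CV_PI / 12 || theta > CV_PI - CV_PI / 12)  // near vertical
            vert.push_back(lines[i]);
    }

    // A fuller version would pick min/max rho in each bin for the four edges;
    // here we just intersect the first line of each bin as a demonstration.
    if (!horiz.empty() && !vert.empty())
        std::cout << intersect(horiz[0], vert[0]) << std::endl;
    return 0;
}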


Source: (StackOverflow)

linux/videodev.h : no such file or directory - OpenCV on ubuntu 11.04

I tried to install OpenCV 2.2 on Ubuntu 11.04, but the OpenCV compilation fails with an error related to the linux/videodev.h file. The file available in /usr/include/linux is named videodev2.h.

/home/user/OpenCV-2.2.0/modules/highgui/src/cap_v4l.cpp:217:28: fatal error:    linux/videodev.h: No such file or directory
compilation terminated.
make[2]: *** [modules/highgui/CMakeFiles/opencv_highgui.dir/src/cap_v4l.o] Error 1
make[1]: *** [modules/highgui/CMakeFiles/opencv_highgui.dir/all] Error 2
make: *** [all] Error 2

Is there a solution for this?

Thank you.


Source: (StackOverflow)