algorithm interview questions
Top frequently asked algorithm interview questions
Very simply, what is tail-call optimization? More specifically, can anyone show some small code snippets where it could be applied, and where it could not, with an explanation of why?
Source: (StackOverflow)
I was trying various methods to implement a program that gives the digits of pi sequentially. I tried the Taylor series method, but it proved to converge extremely slowly (when I compared my result with the online values after some time). Anyway, I am trying better algorithms.
So, while writing the program I got stuck on a problem, as with all algorithms: how do I know that the n digits that I've calculated are accurate?
Source: (StackOverflow)
What would be the most efficient way to compare two double or two float values?
Simply doing this is not correct:
bool CompareDoubles1 (double A, double B)
{
    return A == B;
}
But something like:
bool CompareDoubles2 (double A, double B)
{
    // EPSILON is assumed to be a small positive constant defined elsewhere
    double diff = A - B;
    return (diff < EPSILON) && (-diff < EPSILON);
}
Seems to waste processing.
Does anyone know a smarter float comparer?
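A minimal sketch of a tolerance-based comparison, written in Python rather than the question's C++. The helper name `nearly_equal` and both default tolerances are assumptions chosen for illustration; `math.isclose` is the standard-library function that combines a relative and an absolute tolerance.

```python
import math

def nearly_equal(a: float, b: float, rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Return True when a and b agree within a relative or an absolute tolerance."""
    # math.isclose checks |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol)
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)

print(0.1 + 0.2 == 0.3)              # exact comparison
print(nearly_equal(0.1 + 0.2, 0.3))  # tolerance-based comparison
```

A relative tolerance scales with the magnitude of the operands, which a single fixed EPSILON does not; the absolute tolerance only matters for values close to zero.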
Source: (StackOverflow)
While answering another Stack Overflow question (this one) I stumbled upon an interesting sub-problem. What is the fastest way to sort an array of 6 ints?
As the question is very low level:
- we can't assume libraries are available (and the call itself has its cost), only plain C
- to avoid emptying the instruction pipeline (which has a very high cost) we should probably minimize branches, jumps, and every other kind of control-flow break (like those hidden behind sequence points in && or ||).
- room is constrained and minimizing register and memory use is an issue; ideally an in-place sort is probably best.
Really this question is a kind of Golf where the goal is not to minimize source length but execution time. I call it 'Zening' code, after the title of the book Zen of Code Optimization by Michael Abrash and its sequels.
As for why it is interesting, there are several layers:
- the example is simple and easy to understand and measure, not much C skill involved
- it shows effects of choice of a good algorithm for the problem, but also effects of the compiler and underlying hardware.
Here is my reference (naive, not optimized) implementation and my test set.
#include <stdio.h>

static __inline__ void sort6(int * d){
    char j, i, imin;
    int tmp;
    for (j = 0 ; j < 5 ; j++){
        imin = j;
        for (i = j + 1; i < 6 ; i++){
            if (d[i] < d[imin]){
                imin = i;
            }
        }
        tmp = d[j];
        d[j] = d[imin];
        d[imin] = tmp;
    }
}

static __inline__ unsigned long long rdtsc(void)
{
    unsigned long long int x;
    __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
    return x;
}

int main(int argc, char ** argv){
    int i;
    /* 6 test arrays of 6 ints each */
    int d[6][6] = {
        {1, 2, 3, 4, 5, 6},
        {6, 5, 4, 3, 2, 1},
        {100, 2, 300, 4, 500, 6},
        {100, 2, 3, 4, 500, 6},
        {1, 200, 3, 4, 5, 600},
        {1, 1, 2, 1, 2, 1}
    };
    unsigned long long cycles = rdtsc();
    for (i = 0; i < 6 ; i++){
        sort6(d[i]);
        /*
         * printf("d%d : %d %d %d %d %d %d\n", i,
         *     d[i][0], d[i][1], d[i][2],
         *     d[i][3], d[i][4], d[i][5]);
         */
    }
    cycles = rdtsc() - cycles;
    printf("Time is %u\n", (unsigned)cycles);
}
Raw results
As the number of variants is becoming large, I gathered them all in a test suite that can be found here. The actual tests used are a bit less naive than those shown above, thanks to Kevin Stock. You can compile and execute it in your own environment. I'm quite interested in the behavior on different target architectures/compilers. (OK guys, put it in answers; I will +1 every contributor of a new result set.)
I gave the answer to Daniel Stutzbach (for golfing) one year ago as he was at the source of the fastest solution at that time (sorting networks).
Linux 64 bits, gcc 4.6.1 64 bits, Intel Core 2 Duo E8400, -O2
- Direct call to qsort library function : 689.38
- Naive implementation (insertion sort) : 285.70
- Insertion Sort (Daniel Stutzbach) : 142.12
- Insertion Sort Unrolled : 125.47
- Rank Order : 102.26
- Rank Order with registers : 58.03
- Sorting Networks (Daniel Stutzbach) : 111.68
- Sorting Networks (Paul R) : 66.36
- Sorting Networks 12 with Fast Swap : 58.86
- Sorting Networks 12 reordered Swap : 53.74
- Sorting Networks 12 reordered Simple Swap : 31.54
- Reordered Sorting Network w/ fast swap : 31.54
- Reordered Sorting Network w/ fast swap V2 : 33.63
- Inlined Bubble Sort (Paolo Bonzini) : 48.85
- Unrolled Insertion Sort (Paolo Bonzini) : 75.30
Linux 64 bits, gcc 4.6.1 64 bits, Intel Core 2 Duo E8400, -O1
- Direct call to qsort library function : 705.93
- Naive implementation (insertion sort) : 135.60
- Insertion Sort (Daniel Stutzbach) : 142.11
- Insertion Sort Unrolled : 126.75
- Rank Order : 46.42
- Rank Order with registers : 43.58
- Sorting Networks (Daniel Stutzbach) : 115.57
- Sorting Networks (Paul R) : 64.44
- Sorting Networks 12 with Fast Swap : 61.98
- Sorting Networks 12 reordered Swap : 54.67
- Sorting Networks 12 reordered Simple Swap : 31.54
- Reordered Sorting Network w/ fast swap : 31.24
- Reordered Sorting Network w/ fast swap V2 : 33.07
- Inlined Bubble Sort (Paolo Bonzini) : 45.79
- Unrolled Insertion Sort (Paolo Bonzini) : 80.15
I included both -O1 and -O2 results because, surprisingly, for several programs -O2 is less efficient than -O1. I wonder which specific optimization has this effect?
Comments on proposed solutions
Insertion Sort (Daniel Stutzbach)
As expected minimizing branches is indeed a good idea.
Sorting Networks (Daniel Stutzbach)
Better than insertion sort. I wondered whether the main effect came from avoiding the outer loop. I tried an unrolled insertion sort to check, and indeed we get roughly the same figures (code is here).
Sorting Networks (Paul R)
The best so far. The actual code I used to test is here. I don't know yet why it is nearly twice as fast as the other sorting network implementation. Parameter passing? Fast max?
Sorting Networks 12 SWAP with Fast Swap
As suggested by Daniel Stutzbach, I combined his 12-swap sorting network with a branchless fast swap (code is here). It is indeed faster, the best so far by a small margin (roughly 5%), as could be expected from using one fewer swap.
It is also interesting to note that on the PPC architecture the branchless swap seems to be much less efficient (4 times) than the simple one using if.
Calling Library qsort
To give another reference point I also tried, as suggested, to just call the library qsort (code is here). As expected it is much slower: 10 to 30 times slower. However, as became obvious with the new test suite, the main problem seems to be the initial load of the library on the first call; after that it does not compare so poorly with the other versions. It is just between 3 and 20 times slower on my Linux box. On some architectures used for tests by others it even seems to be faster (I'm really surprised by that one, as library qsort uses a more complex API).
Rank order
Rex Kerr proposed a completely different method: for each item of the array, directly compute its final position. This is efficient because computing the rank order does not need branches. The drawback of this method is that it takes three times the amount of memory of the array (one copy of the array plus variables to store rank orders). The performance results are very surprising (and interesting). On my reference architecture with a 32-bit OS and Intel Core2 Quad E8300, the cycle count was slightly below 1000 (like sorting networks with branching swap). But when compiled and executed on my 64-bit box (Intel Core2 Duo) it performed much better: it became the fastest so far. I finally found out the true reason. My 32-bit box uses gcc 4.4.1 and my 64-bit box gcc 4.4.3, and the latter seems much better at optimising this particular code (there was very little difference for the other proposals).
update:
As the published figures above show, this effect was further enhanced by later versions of gcc, and Rank Order became consistently twice as fast as any other alternative.
Sorting Networks 12 with reordered Swap
The amazing efficiency of the Rex Kerr proposal with gcc 4.4.3 made me wonder: how could a program with 3 times as much memory usage be faster than branchless sorting networks? My hypothesis was that it had fewer read-after-write dependencies, allowing for better use of the superscalar instruction scheduler of the x86. That gave me an idea: reorder the swaps to minimize read-after-write dependencies. More simply put: when you do SWAP(1, 2); SWAP(0, 2); you have to wait for the first swap to finish before performing the second one, because both access a common memory cell. When you do SWAP(1, 2); SWAP(4, 5); the processor can execute both in parallel. I tried it and it works as expected: the sorting network runs about 10% faster.
Sorting Networks 12 with Simple Swap
One year after the original post, Steinar H. Gunderson suggested that we should not try to outsmart the compiler and should keep the swap code simple. It's indeed a good idea, as the resulting code is about 40% faster! He also proposed a swap optimized by hand using x86 inline assembly code that can spare a few more cycles. The most surprising part (it speaks volumes about programmers' psychology) is that one year ago none of us tried that version of swap. The code I used to test is here. Others suggested other ways to write a fast swap in C, but they yield the same performance as the simple one with a decent compiler.
The "best" code is now as follows:
static inline void sort6_sorting_network_simple_swap(int * d){
#define min(x, y) ((x) < (y) ? (x) : (y))
#define max(x, y) ((x) < (y) ? (y) : (x))
/* backslashes continue the macro definition across lines */
#define SWAP(x,y) { const int a = min(d[x], d[y]); \
                    const int b = max(d[x], d[y]); \
                    d[x] = a; d[y] = b; }
    SWAP(1, 2);
    SWAP(4, 5);
    SWAP(0, 2);
    SWAP(3, 5);
    SWAP(0, 1);
    SWAP(3, 4);
    SWAP(1, 4);
    SWAP(0, 3);
    SWAP(2, 5);
    SWAP(1, 3);
    SWAP(2, 4);
    SWAP(2, 3);
#undef SWAP
#undef min
#undef max
}
If we believe our test set (and yes, it is quite poor; its only benefit is being short, simple, and making it easy to understand what we are measuring), the average number of cycles of the resulting code for one sort is below 40 cycles (6 tests are executed). That puts each swap at an average of 4 cycles. I call that amazingly fast. Any other improvements possible?
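As a sanity check on the comparator sequence, the following Python sketch replays the same twelve SWAP pairs and exhaustively tests them on every ordering of six distinct values. The names `NETWORK`, `sort6_network` and `network_is_correct` are hypothetical, and the sketch says nothing about speed; it only checks that the network sorts.

```python
from itertools import permutations

# Comparator pairs copied from the SWAP sequence above.
NETWORK = [(1, 2), (4, 5), (0, 2), (3, 5), (0, 1), (3, 4),
           (1, 4), (0, 3), (2, 5), (1, 3), (2, 4), (2, 3)]

def sort6_network(values):
    """Return a sorted copy of six values using the fixed comparator sequence."""
    d = list(values)
    for x, y in NETWORK:
        # Same effect as the SWAP macro: smaller value to d[x], larger to d[y].
        lo, hi = min(d[x], d[y]), max(d[x], d[y])
        d[x], d[y] = lo, hi
    return d

def network_is_correct():
    """Check the network on all 720 orderings of six distinct values."""
    return all(sort6_network(p) == sorted(p) for p in permutations(range(6)))

print(network_is_correct())
```

Because the sequence of comparisons is fixed and independent of the data, an exhaustive check like this is feasible for small input sizes.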
Source: (StackOverflow)
What is a plain English explanation of Big O notation? I'd prefer as little formal definition as possible and simple mathematics.
Source: (StackOverflow)
Yesterday I was pairing the socks from the clean laundry and figured out the way I was doing it is not very efficient. I was doing a naive search: picking one sock and "iterating" the pile in order to find its pair. This requires iterating over n/2 * n/4 = n²/8 socks on average.
As a computer scientist I was thinking about what I could do. Sorting (according to size/color/...) of course came to mind to achieve an O(N log N) solution.
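The sorting idea can be sketched in Python as follows. This is only an illustration under stated assumptions: each sock is represented by a comparable attribute value (here a color string), every sock has exactly one match, and `pair_socks` is a hypothetical name. Note that `sorted` builds a copy, so the sketch shows the O(N log N) comparison count rather than an in-place procedure.

```python
def pair_socks(socks):
    """Sort socks by attribute, then pair each sock with its right-hand neighbour."""
    ordered = sorted(socks)  # O(N log N) comparisons
    # After sorting, matching socks sit next to each other.
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered), 2)]

pile = ["red", "blue", "red", "green", "blue", "green"]
print(pair_socks(pile))
```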
Hashing or other not-in-place solutions are not an option, because I am not able to duplicate my socks (though it could be nice if I could).
So, the question is basically:
Given a pile of n pairs of socks, containing 2n elements (assume each sock has exactly one matching pair), what is the best way to pair them up efficiently with up to logarithmic extra space? (I believe I can remember that amount of info if needed.)
I will appreciate an answer that addresses the following aspects:
- A general theoretical solution for a huge number of socks.
- The actual number of socks is not that large, I don't believe my spouse and I have more than 30 pairs. (And it is fairly easy to distinguish between my socks and hers; can this be used as well?)
- Is it equivalent to the element distinctness problem?
Source: (StackOverflow)
One of the most interesting projects I've worked on in the past couple of years, while I was still a student, was a final project about image processing. The goal was to develop a system able to recognize Coca-Cola cans (note that I'm stressing the word cans, you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle with scale and rotation.
Some constraints on the project:
- The background could be very noisy.
- The can could have any scale or rotation or even orientation (within reasonable limits).
- The image could have some degree of fuzziness (contours might not be really straight).
- There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!
- The brightness of the image could vary a lot (so you can't rely "too much" on color detection).
- The can could be partly hidden on the sides or the middle (and possibly partly hidden behind the bottle!).
- There could be no cans at all in the image, in which case you had to find nothing and write a message saying so.
So you could end up with tricky things like this (which in this case had my algorithm totally fail):
Now, I've obviously already done this project, as it was a while ago; I had a lot of fun doing it and ended up with a decent implementation. Here are some details about my implementation:
Language: Done in C++ using OpenCV library.
Pre-processing: By image pre-processing I mean transforming the image into a more raw form to give to the algorithm. I used 2 methods:
- Changing color domain from RGB to HSV (Hue Saturation Value) and filtering based on "red" hue, saturation above a certain threshold to avoid orange-like colors, and filtering of low value to avoid dark tones. The end result was a binary black and white image, where all white pixels represent the pixels that match this threshold. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with.
- Noise filtering using median filtering (taking the median pixel value of all neighbors and replacing the pixel with this value) to reduce noise.
- Using a Canny Edge Detection Filter to get the contours of all items after the 2 preceding steps.
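To make the median-filtering step concrete, here is a small pure-Python sketch on a grayscale image stored as a list of rows. It is not the OpenCV code used in the project; the function name `median_filter_3x3`, the 3x3 window (which includes the pixel itself) and the choice to leave border pixels unchanged are all assumptions made for illustration.

```python
from statistics import median

def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighbourhood."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # border pixels are left unchanged
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out

noisy = [[0, 0, 0],
         [0, 255, 0],
         [0, 0, 0]]
print(median_filter_3x3(noisy))
```

An isolated bright pixel surrounded by dark neighbours is removed, which is why this filter suits salt-and-pepper style noise in a binary image.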
Algorithm: The algorithm itself I chose for this task was taken from this (awesome) book on feature extraction and called Generalized Hough Transform (pretty different from the regular Hough Transform). It basically says a few things:
- You can describe an object in space without knowing its analytical equation (which is the case here).
- It is resistant to image deformations such as scaling and rotation, as it will basically test your image for every combination of scale factor and rotation factor.
- It uses a base model (a template) that the algorithm will "learn".
- Each pixel remaining in the contour image will vote for another pixel which will supposedly be the center (in terms of gravity) of your object, based on what it learned from the model.
In the end, you end up with a heat map of the votes, for example here all the pixels of the contour of the can will vote for its gravitational center, so you'll have a lot of votes in the same pixel corresponding to the center, and will see a peak in the heat map as below.
Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (final scale and rotation factor will obviously be relative to your original template). In theory at least...
Results: Now, while this approach worked in the basic cases, it was severely lacking in some areas:
- It is extremely slow! I cannot stress this enough. Almost a full day was needed to process the 30 test images, obviously because I had a very high scaling factor for rotation and translation, since some of the cans were very small.
- It was completely lost when bottles were in the image, and for some reason almost always found the bottle instead of the can (perhaps because bottles were bigger, thus had more pixels, thus more votes).
- Fuzzy images were also no good, since the votes ended up in pixels at random locations around the center, thus ending with a very noisy heat map.
- Invariance in translation and rotation was achieved, but not in orientation, meaning that a can that was not directly facing the camera objective wasn't recognized.
Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned?
I hope some people will also learn something out of it as well, after all I think not only people who ask questions should learn :)
Source: (StackOverflow)
Whilst starting to learn lisp, I've come across the term tail-recursive. What does it mean?
Source: (StackOverflow)
I had an interesting job interview experience a while back. The question started really easy:
Q1: We have a bag containing numbers 1, 2, 3, …, 100. Each number appears exactly once, so there are 100 numbers. Now one number is randomly picked out of the bag. Find the missing number.
I've heard this interview question before, of course, so I very quickly answered along the lines of:
A1: Well, the sum of the numbers 1 + 2 + 3 + … + N is (N+1)(N/2) (see Wikipedia: sum of arithmetic series). For N = 100, the sum is 5050.
Thus, if all numbers are present in the bag, the sum will be exactly 5050. Since one number is missing, the sum will be less than this, and the difference is that number. So we can find that missing number in O(N) time and O(1) space.
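A minimal Python sketch of answer A1, assuming the bag is given as an iterable of the remaining numbers; `find_missing` is a hypothetical name. It makes a single pass over the input and keeps only a running sum.

```python
def find_missing(numbers, n):
    """Return the one value from 1..n that is absent from numbers."""
    expected = n * (n + 1) // 2      # sum of 1..n, e.g. 5050 for n = 100
    return expected - sum(numbers)   # the shortfall is the missing number

bag = [k for k in range(1, 101) if k != 37]  # 37 picked out for illustration
print(find_missing(bag, 100))
```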
At this point I thought I had done well, but all of a sudden the question took an unexpected turn:
Q2: That is correct, but now how would you do this if TWO numbers are missing?
I had never seen/heard/considered this variation before, so I panicked and couldn't answer the question. The interviewer insisted on knowing my thought process, so I mentioned that perhaps we can get more information by comparing against the expected product, or perhaps doing a second pass after having gathered some information from the first pass, etc, but I really was just shooting in the dark rather than actually having a clear path to the solution.
The interviewer did try to encourage me by saying that having a second equation is indeed one way to solve the problem. At this point I was kind of upset (for not knowing the answer before hand), and asked if this is a general (read: "useful") programming technique, or if it's just a trick/gotcha answer.
The interviewer's answer surprised me: you can generalize the technique to find 3 missing numbers. In fact, you can generalize it to find k missing numbers.
Qk: If exactly k numbers are missing from the bag, how would you find them efficiently?
This was a few months ago, and I still couldn't figure out what this technique is. Obviously there's a Ω(N) time lower bound since we must scan all the numbers at least once, but the interviewer insisted that the TIME and SPACE complexity of the solving technique (minus the O(N) time input scan) is defined in k not N.
So the question here is simple:
- How would you solve Q2?
- How would you solve Q3?
- How would you solve Qk?
Clarifications
- Generally there are N numbers from 1..N, not just 1..100.
- I'm not looking for the obvious set-based solution, e.g. using a bit set, encoding the presence/absence of each number by the value of a designated bit, therefore using O(N) bits in additional space. We can't afford any additional space proportional to N.
- I'm also not looking for the obvious sort-first approach. This and the set-based approach are worth mentioning in an interview (they are easy to implement, and depending on N, can be very practical). I'm looking for the Holy Grail solution (which may or may not be practical to implement, but has the desired asymptotic characteristics nevertheless).
So again, of course you must scan the input in O(N), but you can only capture a small amount of information (defined in terms of k not N), and must then find the k missing numbers somehow.
Source: (StackOverflow)
I feel a bit thick at this point. I've spent days trying to fully wrap my head around suffix tree construction, but because I don't have a mathematical background, many of the explanations elude me as they start to make excessive use of mathematical symbology. The closest to a good explanation that I've found is Fast String Searching With Suffix Trees, but he glosses over various points and some aspects of the algorithm remain unclear.
A step-by-step explanation of this algorithm here on Stack Overflow would be invaluable for many others besides me, I'm sure.
For reference, here's Ukkonen's paper on the algorithm: http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
My basic understanding, so far:
- I need to iterate through each prefix P of a given string T
- I need to iterate through each suffix S in prefix P and add that to tree
- To add suffix S to the tree, I need to iterate through each character in S, with the iterations consisting of either walking down an existing branch that starts with the same set of characters C in S and potentially splitting an edge into descendent nodes when I reach a differing character in the suffix, OR if there was no matching edge to walk down. When no matching edge is found to walk down for C, a new leaf edge is created for C.
The basic algorithm appears to be O(n²), as is pointed out in most explanations, as we need to step through all of the prefixes, then we need to step through each of the suffixes for each prefix. Ukkonen's algorithm is apparently unique because of the suffix pointer technique he uses, though I think that is what I'm having trouble understanding.
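For contrast with Ukkonen's method, the quadratic baseline can be sketched in a few lines of Python. This is deliberately not Ukkonen's algorithm: it inserts every suffix of the full string into an uncompressed trie (one character per edge, nested dicts), with no suffix links or active point. The function names and the `$` terminator are assumptions made for illustration.

```python
def build_suffix_trie(text, terminator="$"):
    """Insert every suffix of text into a nested-dict trie, one character per edge."""
    text += terminator  # ensures no suffix is a prefix of another
    root = {}
    for start in range(len(text)):       # one pass per suffix
        node = root
        for ch in text[start:]:          # walk down, creating edges as needed
            node = node.setdefault(ch, {})
    return root

def contains(trie, pattern):
    """A pattern occurs in the text exactly when it labels a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True

def count_leaves(node):
    """Each leaf corresponds to one suffix of the terminated text."""
    return 1 if not node else sum(count_leaves(child) for child in node.values())

trie = build_suffix_trie("banana")
print(contains(trie, "nan"), contains(trie, "nab"), count_leaves(trie))
```

The nested loops make the quadratic cost visible: a string of length n has n suffixes of average length about n/2.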
I'm also having trouble understanding:
- exactly when and how the "active point" is assigned, used and changed
- what is going on with the canonization aspect of the algorithm
- Why the implementations I've seen need to "fix" bounding variables that they are using
EDIT (April 13, 2012)
Here is the completed source code that I've written and output based on jogojapan's answer below. The code outputs a detailed description and text-based diagram of the steps it takes as it builds the tree. It is a first version and could probably do with optimization and so forth, but it works, which is the main thing.
[Redacted URL, see updated link below]
EDIT (April 15, 2012)
The source code has been completely rewritten from scratch and now not only works correctly, but it supports automatic canonization and renders a nicer looking text graph of the output. Source code and sample output is at:
https://gist.github.com/2373868
Source: (StackOverflow)
I am currently learning about Big O Notation running times and amortized times. I understand the notion of O(n) linear time, meaning that the size of the input affects the growth of the algorithm proportionally, and the same goes for, for example, quadratic time O(n²), or even algorithms such as permutation generators with O(n!) times, which grow by factorials.
For example, the following function is O(n) because the algorithm grows in proportion to its input n:
#include <stdio.h>

void f(int n) {
    int i;
    for (i = 0; i < n; ++i)
        printf("%d", i);
}
Similarly, if there was a nested loop, the time would be O(n²).
But what exactly is O(log n)? For example, what does it mean to say that the height of a complete binary tree is O(log n)?
I do know (maybe not in great detail) what Logarithm is, in the sense that: log10 100 = 2, but I cannot understand how to identify a function with a logarithmic time.
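One way to build intuition, sketched here in Python under the assumption that "logarithmic" means base 2: count how many times n can be halved before reaching 1. The function name `halving_steps` is hypothetical. A loop that discards half of the remaining work on every iteration runs this many times, which is also how many levels a complete binary tree with n nodes has below its root.

```python
def halving_steps(n):
    """Count how many integer halvings it takes to reduce n to 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps

for n in (8, 1000, 1024, 1_000_000):
    print(n, halving_steps(n))
```

Doubling the input adds only one more step, which is the defining behavior of logarithmic growth.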
Source: (StackOverflow)
I have recently stumbled upon the game 2048. You merge similar tiles by moving them in any of the four directions to make "bigger" tiles. After each move, a new tile appears at a random empty position with a value of either 2 or 4. The game terminates when all the boxes are filled and there are no moves that can merge tiles, or you create a tile with a value of 2048.
One, I need to follow a well-defined strategy to reach the goal. So, I thought of writing a program for it.
My current algorithm:
while (!game_over) {
for each possible move:
count_no_of_merges_for_2-tiles and 4-tiles
choose the move with large number of merges
}
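The merge-counting step of that loop could be sketched in Python as below. This covers only a single row sliding left; the representation (a list with 0 for an empty cell), the name `count_merges` and the `targets` parameter are assumptions for illustration, and a full move would apply it to every row or column in the chosen direction.

```python
def count_merges(row, targets=(2, 4)):
    """Count merges of target-valued tiles when a row slides left (0 = empty cell)."""
    tiles = [v for v in row if v != 0]  # sliding removes the gaps
    merges = 0
    i = 0
    while i < len(tiles) - 1:
        if tiles[i] == tiles[i + 1]:
            if tiles[i] in targets:
                merges += 1
            i += 2  # a merged tile cannot merge again in the same move
        else:
            i += 1
    return merges

print(count_merges([2, 2, 4, 4]), count_merges([2, 0, 2, 8]))
```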
What I am doing is, at any point, trying to merge the tiles with values 2 and 4; that is, I try to keep as few 2 and 4 tiles as possible. When I try it this way, all other tiles get merged automatically and the strategy seems good.
But, when I actually use this algorithm, I only get around 4000 points before the game terminates. Maximum points AFAIK is slightly more than 20,000 points which is way larger than my current score. Is there a better algorithm than the above?
Source: (StackOverflow)
I've been developing an internal website for a portfolio management tool. There is a lot of text data, company names etc. I've been really impressed with some search engines' ability to very quickly respond to queries with "Did you mean: xxxx".
I need to be able to intelligently take a user query and respond with not only raw search results but also with a "Did you mean?" response when there is a highly likely alternative answer, etc.
[I'm developing in ASP.NET (VB - don't hold it against me! )]
UPDATE:
OK, how can I mimic this without the millions of 'unpaid users'?
- Generate typos for each 'known' or 'correct' term and perform lookups?
- Some other more elegant method?
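The first option (generate typos for each known term, then look them up) might look like the Python sketch below, rather than the VB of the question. It only covers single-edit typos over lowercase ASCII letters; the function names and the sample company names are hypothetical.

```python
import string

def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def build_typo_index(known_terms):
    """Map every generated typo back to the known terms that could produce it."""
    index = {}
    for term in known_terms:
        for typo in edits1(term):
            index.setdefault(typo, set()).add(term)
    return index

def did_you_mean(query, known_terms, index):
    """Return suggestions for an unknown query, or an empty list for a known term."""
    if query in known_terms:
        return []
    return sorted(index.get(query, ()))

terms = {"microsoft", "oracle", "google"}  # hypothetical company names
index = build_typo_index(terms)
print(did_you_mean("gogle", terms, index))
```

The index is built once, so each query is a single dictionary lookup; the cost is memory, since every term contributes a few hundred variants.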
Source: (StackOverflow)
I want to write a function that takes an array of letters as an argument and a number of those letters to select.
Say you provide an array of 8 letters and want to select 3 letters from that. Then you should get:
8! / ((8 - 3)! * 3!) = 56
Arrays (or words) in return consisting of 3 letters each.
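In Python, the standard library already provides this through `itertools.combinations`; the sketch below wraps it in a hypothetical helper `choose_letters` and checks the count against `math.comb` (available from Python 3.8).

```python
from itertools import combinations
from math import comb

def choose_letters(letters, k):
    """Return every k-letter selection (order ignored) as a string."""
    return ["".join(c) for c in combinations(letters, k)]

words = choose_letters("abcdefgh", 3)
print(len(words), comb(8, 3))
print(words[:3])
```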
Source: (StackOverflow)
Lately I have been playing a game on my iPhone called Scramble. Some of you may know this game as Boggle. Essentially, when the game starts you get a matrix of letters like so:
F X I E
A M L O
E W B X
A S T U
The goal of the game is to find as many words as you can that can be formed by chaining letters together. You can start with any letter, and all the letters that surround it are fair game, and then once you move on to the next letter, all the letters that surround that letter are fair game, except for any previously used letters. So in the grid above, for example, I could come up with the words LOB, TUX, SEA, FAME, etc. Words must be at least 3 characters, and no more than NxN characters, which would be 16 in this game but can vary in some implementations. While this game is fun and addictive, I am apparently not very good at it and I wanted to cheat a little bit by making a program that would give me the best possible words (the longer the word the more points you get).
I am, unfortunately, not very good with algorithms or their efficiencies and so forth. My first attempt uses a dictionary such as this one (~2.3MB) and does a linear search trying to match combinations with dictionary entries. This takes a very long time to find the possible words, and since you only get 2 minutes per round, it is simply not adequate.
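One common alternative to a linear dictionary scan, sketched here in Python as an assumption rather than as any of the posted solutions, is a depth-first walk over the grid that is cut short as soon as the letters collected so far are not a prefix of any dictionary word. The names `GRID` and `solve` and the tiny word list are hypothetical; a real run would load the full dictionary file.

```python
GRID = ["FXIE", "AMLO", "EWBX", "ASTU"]

def solve(grid, words, min_len=3):
    """Find dictionary words traceable through adjacent cells, using each cell once."""
    words = {w.upper() for w in words if len(w) >= min_len}
    prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}
    size = len(grid)
    found = set()

    def walk(r, c, path, used):
        path += grid[r][c]
        if path not in prefixes:
            return  # no dictionary word starts this way, so prune
        if path in words:
            found.add(path)
        used.add((r, c))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < size and 0 <= nc < size and (nr, nc) not in used:
                    walk(nr, nc, path, used)
        used.discard((r, c))  # free the cell for other paths

    for r in range(size):
        for c in range(size):
            walk(r, c, "", set())
    return found

print(sorted(solve(GRID, ["lob", "tux", "sea", "fame", "fix", "axe"])))
```

The prefix set trades memory for speed: most paths are abandoned after two or three letters, so the search visits far fewer combinations than matching every path against every dictionary entry.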
I am interested to see if any Stackoverflowers can come up with more efficient solutions. I am mostly looking for solutions using the Big 3 Ps: Python, PHP, and Perl, although anything with Java or C++ is cool too, since speed is essential.
CURRENT SOLUTIONS:
- Adam Rosenfield, Python, ~20s
- John Fouhy, Python, ~3s
- Kent Fredric, Perl, ~1s
- Darius Bacon, Python, ~1s
- rvarcher, VB.NET (live link), ~1s
- Paolo Bergantino, PHP (live link), ~5s (~2s locally)
BOUNTY:
I am adding a bounty to this question as my way of saying thanks to all the people who pitched in with their programs. Unfortunately I can only give the accepted answer to one of you, so I'll measure who has the fastest boggle solver 7 days from now and award the winner the bounty.
Bounty awarded. Thanks to everyone that participated.
Source: (StackOverflow)