architecture interview questions
Top architecture frequently asked interview questions
Sometimes I'm solving problems on projecteuler.net. Almost all problems are solvable with programs, but these tasks are more mathematical than programmatical.
Maybe someone knows similar sites with:
- design tasks,
- architecture tasks,
- something like "find most elegant C++ solution"?
Source: (StackOverflow)
I've seen quite a few developer job postings recently that include a sentence that reads more or less like this: "Must have experience with N-Tier architecture", or "Must be able to develop N-Tier apps".
This leads me to ask, what is N-Tier architecture? How does one gain experience with it?
Source: (StackOverflow)
How can the theoretical peak performance of 4 floating point operations (double precision) per cycle be achieved on a modern x86-64 Intel CPU?
As far as I understand it take three cycles for an SSE add
and five cycles for a mul
to complete on most of the modern Intel CPUs (see for example Agner Fog's 'Instruction Tables' ). Due to pipelining one can get a throughput of one add
per cycle if the algorithm has at least three independent summations. Since that is true for packed addpd
as well as the scalar addsd
versions and SSE registers can contain two double
's the throughput can be as much as two flops per cycle.
Furthermore, it seems (although I've not seen any proper documentation on this) add
's and mul
's can be executed in parallel giving a theoretical max throughput of four flops per cycle.
However, I've not been able to replicate that performance with a simple C/C++ programme. My best attempt resulted in about 2.7 flops/cycle. If anyone can contribute a simple C/C++ or assembler programme which demonstrates peak performance that'd be greatly appreciated.
My attempt:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>
double stoptime(void) {
struct timeval t;
gettimeofday(&t,NULL);
return (double) t.tv_sec + t.tv_usec/1000000.0;
}
double addmul(double add, double mul, int ops){
// Need to initialise differently otherwise compiler might optimise away
double sum1=0.1, sum2=-0.1, sum3=0.2, sum4=-0.2, sum5=0.0;
double mul1=1.0, mul2= 1.1, mul3=1.2, mul4= 1.3, mul5=1.4;
int loops=ops/10; // We have 10 floating point operations inside the loop
double expected = 5.0*add*loops + (sum1+sum2+sum3+sum4+sum5)
+ pow(mul,loops)*(mul1+mul2+mul3+mul4+mul5);
for (int i=0; i<loops; i++) {
mul1*=mul; mul2*=mul; mul3*=mul; mul4*=mul; mul5*=mul;
sum1+=add; sum2+=add; sum3+=add; sum4+=add; sum5+=add;
}
return sum1+sum2+sum3+sum4+sum5+mul1+mul2+mul3+mul4+mul5 - expected;
}
int main(int argc, char** argv) {
if (argc != 2) {
printf("usage: %s <num>\n", argv[0]);
printf("number of operations: <num> millions\n");
exit(EXIT_FAILURE);
}
int n = atoi(argv[1]) * 1000000;
if (n<=0)
n=1000;
double x = M_PI;
double y = 1.0 + 1e-8;
double t = stoptime();
x = addmul(x, y, n);
t = stoptime() - t;
printf("addmul:\t %.3f s, %.3f Gflops, res=%f\n", t, (double)n/t/1e9, x);
return EXIT_SUCCESS;
}
Compiled with
g++ -O2 -march=native addmul.cpp ; ./a.out 1000
produces the following output on an Intel Core i5-750, 2.66 GHz.
addmul: 0.270 s, 3.707 Gflops, res=1.326463
That is, just about 1.4 flops per cycle. Looking at the assembler code with
g++ -S -O2 -march=native -masm=intel addmul.cpp
the main loop seems kind of
optimal to me:
.L4:
inc eax
mulsd xmm8, xmm3
mulsd xmm7, xmm3
mulsd xmm6, xmm3
mulsd xmm5, xmm3
mulsd xmm1, xmm3
addsd xmm13, xmm2
addsd xmm12, xmm2
addsd xmm11, xmm2
addsd xmm10, xmm2
addsd xmm9, xmm2
cmp eax, ebx
jne .L4
Changing the scalar versions with packed versions (addpd
and mulpd
) would double the flop count without changing the execution time and so I'd get just short of 2.8 flops per cycle. Is there a simple example which achieves four flops per cycle?
Nice little programme by Mysticial; here are my results (run just for a few seconds though):
gcc -O2 -march=nocona
: 5.6 Gflops out of 10.66 Gflops (2.1 flops/cycle)
cl /O2
, openmp removed: 10.1 Gflops out of 10.66 Gflops (3.8 flops/cycle)
It all seems a bit complex, but my conclusions so far:
gcc -O2
changes the order of independent floating point operations with
the aim of alternating
addpd
and mulpd
's if possible. Same applies to gcc-4.6.2 -O2 -march=core2
.
gcc -O2 -march=nocona
seems to keep the order of floating point operations as defined in
the C++ source.
cl /O2
, the 64-bit compiler from the
SDK for Windows 7
does loop-unrolling automatically and seems to try and arrange operations
so that groups of three addpd
's alternate with three mulpd
's (well, at least on my system and for my simple programme).
My Core i5 750 (Nahelem architecture)
doesn't like alternating add's and mul's and seems unable
to run both operations in parallel. However, if grouped in 3's it suddenly works like magic.
Other architectures (possibly Sandy Bridge and others) appear to
be able to execute add/mul in parallel without problems
if they alternate in the assembly code.
Although difficult to admit, but on my system cl /O2
does a much better job at low-level optimising operations for my system and achieves close to peak performance for the little C++ example above. I measured between
1.85-2.01 flops/cycle (have used clock() in Windows which is not that precise. I guess, need to use a better timer - thanks Mackie Messer).
The best I managed with gcc
was to manually loop unroll and arrange
additions and multiplications in groups of three. With
g++ -O2 -march=nocona addmul_unroll.cpp
I get at best 0.207s, 4.825 Gflops
which corresponds to 1.8 flops/cycle
which I'm quite happy with now.
In the C++ code I've replaced the for
loop with
for (int i=0; i<loops/3; i++) {
mul1*=mul; mul2*=mul; mul3*=mul;
sum1+=add; sum2+=add; sum3+=add;
mul4*=mul; mul5*=mul; mul1*=mul;
sum4+=add; sum5+=add; sum1+=add;
mul2*=mul; mul3*=mul; mul4*=mul;
sum2+=add; sum3+=add; sum4+=add;
mul5*=mul; mul1*=mul; mul2*=mul;
sum5+=add; sum1+=add; sum2+=add;
mul3*=mul; mul4*=mul; mul5*=mul;
sum3+=add; sum4+=add; sum5+=add;
}
And the assembly now looks like
.L4:
mulsd xmm8, xmm3
mulsd xmm7, xmm3
mulsd xmm6, xmm3
addsd xmm13, xmm2
addsd xmm12, xmm2
addsd xmm11, xmm2
mulsd xmm5, xmm3
mulsd xmm1, xmm3
mulsd xmm8, xmm3
addsd xmm10, xmm2
addsd xmm9, xmm2
addsd xmm13, xmm2
...
Source: (StackOverflow)
What I want is not a comparison between Redis and MongoDB. I know they are different; the performance and the API is totally different.
Redis is very fast, but the API is very 'atomic'. MongoDB will eat more resources, but the API is very very easy to use, and I am very happy with it.
They're both awesome, and I want to use Redis in deployment as much as I can, but it is hard to code. I want to use MongoDB in development as much as I can, but it needs an expensive machine.
So what do you think about the use of both of them? When to pick Redis? When to pick MongoDB?
Source: (StackOverflow)
I am just getting a grasp on the MVC framework and I often wonder how much code should go in the model. I tend to have a data access class that has methods like this:
public function CheckUsername($connection, $username)
{
try
{
$data = array();
$data['Username'] = $username;
//// SQL
$sql = "SELECT Username FROM" . $this->usersTableName . " WHERE Username = :Username";
//// Execute statement
return $this->ExecuteObject($connection, $sql, $data);
}
catch(Exception $e)
{
throw $e;
}
}
My models tend to be an entity class that is mapped to the database table.
Should the model object have all the database mapped properties as well as the code above or is it OK to separate that code out that actually does the database work?
Will I end up having four layers?
Source: (StackOverflow)
I have been looking at game engine design (specifically focused on 2d game engines, but also applicable to 3d games), and am interested in some information on how to go about it. I have heard that many engines are moving to a component based design nowadays rather than the traditional deep-object hierarchy.
Do you know of any good links with information on how these sorts of designs are often implemented? I have seen evolve your hierarchy, but I can't really find many more with detailed information (most of them just seem to say "use components rather than a hierarchy" but I have found that it takes a bit of effort to switch my thinking between the two models).
Any good links or information on this would be appreciated, and even books, although links and detailed answers here would be preferred.
Source: (StackOverflow)
Could someone explain the difference between Software Design and Software Architecture?
More specifically; if you tell someone to present you the 'design' - what would you expect them to present? Same goes for 'architecture'.
My current understanding is:
- Design: UML diagram/flow chart/simple wireframes (for UI) for a specific module/part of the system
- Architecture: component diagram (showing how the different modules of the system communicates with each other and other systems), what language is to be used, patterns...?
Correct me if I'm wrong. I have referred Wikipedia has articles on http://en.wikipedia.org/wiki/Software_design and http://en.wikipedia.org/wiki/Software_architecture, but I'm not sure if I have understood them correctly.
Source: (StackOverflow)
I just watched the following video: Introduction to Node.js and still don't understand how you get the speed benefits.
Mainly, at one point Ryan Dahl (Node.js' creator) says that Node.js is event-loop based instead of thread-based. Threads are expensive and should only be left to the experts of concurrent programming to be utilized.
Later, he then shows the architecture stack of Node.js which has an underlying C implementation which has its own Thread pool internally. So obviously Node.js developers would never kick off their own threads or use the thread pool directly...they use async call-backs. That much I understand.
What I don't understand is the point that Node.js still is using threads...it's just hiding the implementation so how is this faster if 50 people request 50 files (not currently in memory) well then aren't 50 threads required?
The only difference being that since it's managed internally the Node.js developer doesn't have to code the threaded details but underneath it's still using the threads to process the IO (blocking) file requests.
So aren't you really just taking one problem (threading) and hiding it while that problem still exists: mainly multiple threads, context switching, dead-locks...etc?
There must be some detail I still do not understand here.
Source: (StackOverflow)
I just discovered that every request in an ASP.Net web application gets a Session lock at the beginning of a request, and then releases it at the end of the request!
In case the implications of this are lost on you, as it was for me at first, this basically means the following:
Anytime an ASP.Net webpage is taking a long time to load (maybe due to a slow database call or whatever), and the user decides they want to navigate to a different page because they are tired of waiting, THEY CANT! The ASP.Net session lock forces the new page request to wait until the original request has finished its painfully slow load. Arrrgh.
Anytime an UpdatePanel is loading slowly, and the user decides to navigate to a different page before the UpdadePanel has finished updating... THEY CANT! The ASP.Net session lock forces the new page request to wait until the original request has finished its painfully slow load. Double Arrrgh!
So what are the options? So far I have come up with:
- Implement a Custom SessionStateDataStore, which ASP.Net supports. I haven't found too many out there to copy, and it seems kind of high risk and easy to mess up.
- Keep track of all requests in progress, and if a request comes in from the same user, cancel the original request. Seems kind of extreme, but it would work (I think).
- Don't use Session! When I need some kind of state for the user, I could just use Cache instead, and key items on the authenticated user's name, or some such thing. Again seems kind of extreme.
I really can't believe that the ASP.Net Microsoft team would have left such a huge performance bottleneck in the framework at version 4.0! Am I missing something obvious? How hard would it be to use a ThreadSafe collection for the Session?
Source: (StackOverflow)
I am crafting a small project in mixed C and C++. I am building one small-ish state-machine at the heart of one of my worker thread.
I was wondering if you gurus on SO would share your state-machine design techniques.
NOTE: I am primarily after tried & tested implementation techniques.
UPDATED: Based on all the great input gathered on SO, I've settled on this architecture:
Source: (StackOverflow)
I read that Linux is a monolithic kernel. Does monolithic kernel mean compiling and linking the complete kernel code into an executable?
If Linux is able to support modules, why not break all the subsystems into modules and load them when necessary? In that case, the kernel doesn't have to load all modules initially and could maintain an index of the functions in the module and load them when necessary.
Source: (StackOverflow)
This question is not about when to use GET or POST in general; it is about which is the recommended one for handling logging out of a web application. I have found plenty of information on the differences between GET and POST in the general sense, but I did not find a definite answer for this particular scenario.
As a pragmatist, I'm inclined to use GET, because implementing it is way simpler than POST; just drop a simple link and you're done. This seems to be case with the vast majority of websites I can think of, at least from the top of my head. Even Stack Overflow handles logging out with GET.
The thing making me hesitate is the (albeit old) argument that some web accelerators/proxies pre-cache pages by going and retrieving every link they find in the page, so the user gets a faster response when she clicks on them. I'm not sure if this still applies, but if this was the case, then in theory a user with one of these accelerators would get kicked out of the application as soon as she logs in, because her accelerator would find and retrieve the logout link even if she never clicked on it.
Everything I have read so far suggest that POST should be used for "destructive actions", whereas actions that do not alter the internal state of the application -like querying and such- should be handled with GET. Based on this, the real question here is:
Is logging out of an application considered a destructive action/does it alter the internal state of the application?
Source: (StackOverflow)
I'm aware of questions like this, where people tend to discuss the general Symfony 2 concept of bundle.
The thing is, in a specific application, like, for instance, a twitter-like application, should everything really be inside a generic bundle, like the official docs say?
The reason I'm asking this is because when we develop applications, in general, we don't want to highly couple our code to some full-stack glue framework.
If I develop a Symfony 2 based application and, at some point, I decide Symfony 2 is not really the best choice to keep the development going, will that be a problem for me?
So the general question is: why is everything being a bundle a good thing?
EDIT#1
Almost a year now since I asked this question I wrote an article to share my knowledge on this topic.
Source: (StackOverflow)
I'm working on a large project (for me) which will have many classes and will need to be extensible, but I'm not sure how to plan out my program and how the classes need to interact.
I took an OOD course a few semesters back and learned a lot from it; like writing UML, and translating requirements documents into objects and classes. We learned sequence diagrams too but somehow I missed the lecture or something, they didn't really stick with me.
With previous projects I've tried using methods I learned from the course but usually end up with code that as soon as I can say "yeah that looks something like what I had in mind" i have no desire to dig through the muck to add new features.
I've got a copy of Steve McConnell's Code Complete which I continually hear is amazing, here and elsewhere. I read the chapter on design and didn't seem to come out with the information I'm looking for. I know he says that it's not a cut and dried process, that it's mostly based on heuristics, but I can't seem to take all his information and apply it to my projects.
So what are things you do during the high level design phase (before you begin programming) to determine what are the classes you need (especially ones not based on any 'real world objects') and how will they interact with each other?
Specifically I'm interested in what are the methods you use? What is the process you follow that usually yeilds a good, clean design that will closely represent the final product?
Source: (StackOverflow)