infrastructure interview questions
Top infrastructure frequently asked interview questions
How do VoIP services, such as Skype and Yahoo, connect to landlines?
We have a server connected to a landline using Asterisk, so I'm thinking this server will bridge our VoIP conversation and connect it to the landline.
But if this is the case, wouldn't Skype need a lot of servers placed around the whole world just to connect to landlines?
Source: (StackOverflow)
I'm eager to know (and need to know) about Nutch and the algorithms it uses to fetch, classify, and so on (generally, crawling), because it relates to my project.
I read this material, but it's a little hard to understand.
Is there anyone who can explain this to me in a complete and easy-to-understand way?
Thanks in advance.
Source: (StackOverflow)
Listening to Scott Hanselman's interview with the Stack Overflow team (parts 1 and 2), he was adamant that the SQL server and application server should be on separate machines. Is this just to make sure that if one server is compromised, both systems aren't accessible? Do the security concerns outweigh the complexity of two servers (extra cost, dedicated network connection between the two, more maintenance, etc.), especially for a small application where neither piece is using much CPU or memory? Even with two servers, with one server compromised, an attacker could still do serious damage, either by deleting the database or by messing with the application code.
Why would this be such a big deal if performance isn't an issue?
Source: (StackOverflow)
UPDATE 2009-05-21
I've been testing method #2, using a single network share. It is causing some issues with Windows Server 2003 under load:
http://support.microsoft.com/kb/810886
end update
I've received a proposal for an ASP.NET website that works as follows:
Hardware load-balancer -> 4 IIS6 web servers -> SQL Server DB with failover cluster
Here's the problem...
We are choosing where to store the web files (aspx, html, css, images). Two options have been proposed:
1) Create identical copies of the web files on each of the 4 IIS servers.
2) Put a single copy of the web files on a network share accessible by the 4 web servers. The webroots on the 4 IIS servers will be mapped to the single network share.
Which is the better solution?
Option 2 obviously is simpler for deployments since it requires copying files to only a single location. However, I wonder if there will be scalability issues since four web servers are all accessing a single set of files. Will IIS cache these files locally? Would it hit the network share on every client request?
Also, will access to a network share always be slower than getting a file on a local hard drive?
Does the load on the network share become substantially worse if more IIS servers are added?
To give perspective, this is for a web site that currently receives ~20 million hits per month. At recent peak, it was receiving about 200 hits per second.
Please let me know if you have particular experience with such a setup. Thanks for the input.
UPDATE 2009-03-05
To clarify my situation - the "deployments" in this system are far more frequent than a typical web application. The web site is the front end for a back office CMS. Each time content is published in the CMS, new pages (aspx, html, etc) are automatically pushed to the live site. The deployments are basically "on demand". Theoretically, this push could happen several times within a minute or more. So I'm not sure it would be practical to deploy one web server at time. Thoughts?
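For option 1, each publish from the CMS would need to be pushed to all four webroots. A minimal sketch of that push step in Python (paths and webroot names are placeholders; in a real farm the targets would be UNC paths to the IIS servers, and a tool like robocopy would typically do the copying):

```python
import shutil
from pathlib import Path

def push_site(src, webroots):
    """Mirror the freshly published files into each server's webroot.

    In the real setup `webroots` would be UNC paths to the four IIS
    servers; plain local directories stand in for them here.
    """
    for root in webroots:
        target = Path(root)
        if target.exists():
            shutil.rmtree(target)      # simple full replace on each publish
        shutil.copytree(src, target)

# Example publish: two stand-in webroots.
Path("publish/css").mkdir(parents=True, exist_ok=True)
Path("publish/default.aspx").write_text("<h1>hello</h1>")
Path("publish/css/site.css").write_text("body { }")

push_site("publish", ["webroot1", "webroot2"])
print(Path("webroot2/default.aspx").read_text())
```

Because each publish only touches the changed pages, an incremental copy per server keeps frequent "on demand" pushes cheap; the full-replace above is just the simplest correct sketch.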
Source: (StackOverflow)
I want to plan a scheduled maintenance downtime for one of my production ASP.NET websites, hosted on IIS on Windows Server 2003.
I think this is the preferred behavior:
- All requests to http://www.x.com, including www.x.com/asb/asd/, will be redirected to a notification page ("site is currently down; come back later").
- The maintenance will take around an hour. How do I ensure that this redirection to the maintenance page has the least impact on SEO/Google ranking?
- Preferably, I want to be able to quietly test the production site before it goes back 'live'.
- Preferably, I don't want to rely on pointing DNS elsewhere.
- To keep it simple, please pretend that I don't have any other hardware in front of the web servers (i.e. load balancer, firewall, etc.).
An idea would be:
- to create another app on the same web server
- create an HttpModule or HttpHandler to handle any URL request and 302-redirect it to the maintenance page
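A minimal sketch of that module idea (the maintenance path and the tester IP are illustrative placeholders; in practice they would come from configuration):

```csharp
// Illustrative sketch only - names and addresses are placeholders.
public class MaintenanceModule : System.Web.IHttpModule
{
    public void Init(System.Web.HttpApplication app)
    {
        app.BeginRequest += delegate(object sender, System.EventArgs e)
        {
            var ctx = ((System.Web.HttpApplication)sender).Context;

            // Let the maintenance page itself through (avoids a redirect
            // loop), and let a known tester IP reach the real site.
            if (ctx.Request.Path.EndsWith("maintenance.html") ||
                ctx.Request.UserHostAddress == "203.0.113.10")
                return;

            ctx.Response.Redirect("/maintenance.html", false);   // 302
            ctx.Response.End();
        };
    }

    public void Dispose() { }
}
```

Regarding the SEO concern: for temporary downtime, returning a 503 status with a Retry-After header is generally considered friendlier to search engines than a 302, since it tells crawlers the outage is temporary; the same module could set those on the response instead of redirecting.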
Thanks
Source: (StackOverflow)
I'm working on a project in which we use Ansible to deploy a cluster of servers.
One of the tasks I have to implement is copying a local file to the remote host, but only if that file exists locally.
Now I'm trying to solve this problem with the following:
- hosts: 127.0.0.1
  connection: local
  tasks:
    - name: copy local filetocopy.zip to remote if exists
    - shell: if [[ -f "../filetocopy.zip" ]]; then /bin/true; else /bin/false; fi;
      register: result
    - copy: src=../filetocopy.zip dest=/tmp/filetocopy.zip
      when: result|success
But this is failing with the following message:
ERROR: 'action' or 'local_action' attribute missing in task "copy local filetocopy.zip to remote if exists"
I've tried to build this check with the command task.
I've already tried to create this task with a local_action, but I couldn't make it work.
None of the samples I've found use shell inside local_action; they only cover command, and none of them involve anything beyond a single command.
Is there a way to do this task using Ansible?
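For reference, one common shape for this (a sketch, assuming a reasonably recent Ansible; the host group name is a placeholder) is to check the file on the control machine with the stat module via local_action, then gate the copy on the result:

```yaml
- hosts: remote_servers        # placeholder group name
  tasks:
    - name: check whether filetocopy.zip exists on the control machine
      local_action: stat path=../filetocopy.zip
      register: local_file

    - name: copy local filetocopy.zip to remote if it exists
      copy: src=../filetocopy.zip dest=/tmp/filetocopy.zip
      when: local_file.stat.exists
```

This also avoids the failing-shell problem: stat never fails just because the file is absent, so no ignore_errors workaround is needed.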
Source: (StackOverflow)
Out of curiosity, I'm trying to get some simple async/await code to compile under the .NET 3.5 Client Profile:
async void AwaitFoo()
{
    await new Foo();
}

class Foo
{
    public IFooAwaiter GetAwaiter() { … }
}

interface IFooAwaiter : System.Runtime.CompilerServices.INotifyCompletion
{
    bool IsCompleted { get; }
    void GetResult();
}
I'm perfectly aware that .NET 3.5 does not support this language feature, as expressed by this compilation error:
Cannot find all types required by the async modifier. Are you targeting the wrong framework version, or missing a reference to an assembly?
I am also aware of the NuGet package Microsoft.Bcl.Async, which does not have support for .NET 3.5.
Question: What is the minimum set of types & type members required for async code to compile? Is this minimal set officially documented; and if so, where? (Note that I'm only interested in successful compilation, not execution.)
What I've got so far:
I've been trying to find this minimum set of types by experiment, which appears to be possible since the compiler reports the required but missing types one by one:
Predefined type System.Runtime.CompilerServices.IAsyncStateMachine is not defined or imported.
Defining the reported type according to MSDN reference pages then leads to the next missing type being reported. I have so far:
- System.Runtime.CompilerServices.IAsyncStateMachine
- System.Runtime.CompilerServices.INotifyCompletion (required by the example code above)
- System.Threading.Tasks.CancellationToken (required by Task)
- System.Threading.Tasks.TaskCreationOptions (required by Task)
- System.Threading.Tasks.Task
At this point I stopped, since Task has lots of members, but the compiler does not report exactly which members it requires; it just reports the type as a whole. I might therefore reproduce much more of the type definition than what is actually needed.
Source: (StackOverflow)
We're starting a web application using DDD and CQRS (with the ncqrs framework), and before we get started writing our own infrastructure class library, I wanted to see if any are already available.
I'd think at least some basic interfaces and common implementations for writing to the file system, sending emails, etc. could be used in any project.
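For illustration, the sort of basic interfaces meant here might look like this (hypothetical names, not taken from ncqrs or any existing library):

```csharp
// Hypothetical abstractions - a sketch of what such a shared
// infrastructure library might expose.
public interface IEmailSender
{
    void Send(string to, string subject, string body);
}

public interface IFileStore
{
    void Write(string path, byte[] contents);
    byte[] Read(string path);
}

// Concrete implementations (an SMTP sender, a local-disk store, ...)
// would live in the library and be wired up through the IoC container.
```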
Source: (StackOverflow)
There is one controversy I see in using Web APIs (RESTful services) to access remote infrastructure, and I would be grateful if you could comment on it. The recommendation in the article "RESTful Web Services vs. "Big" Web Services: Making the Right Architectural Decision" [1] is to use Web APIs mainly for ad hoc integration (à la mashup) and rapid prototyping. Empirical studies in [2] show this recommendation is followed in scenarios of re-using existing information and functionality. However, re-using infrastructure with Web APIs does not fit well into the task of ad hoc integration. My impression is rather that infrastructure is usually re-used in scenarios where the resources I have do not scale well for the problem I want to solve: large amounts of data, high bandwidth, high concurrency. Nevertheless, Amazon provides remote access to its infrastructure (storage space, message queuing) both through:
- classical SOAP Web services (so called Big Web services) and
- light RESTful Web services (so called Web APIs).
Although there is nothing written about whether the clients (described in the case studies of Amazon Web Services) employ Big Web services or Web APIs, the fact that Amazon provides access to its infrastructure in the form of Web APIs as an alternative must be meaningful.
Do you know what their motivation might be? Do you know any cases where people re-used infrastructure just for rapid prototyping? Or maybe for testing? In other words, if I wanted to re-use the infrastructure offered by Amazon, which API style should I use, SOAP or REST, and in what situations?
EDIT: In this case, by infrastructure I meant storage space, computational power, and internet bandwidth. Thus I wonder whether such resources are re-used in ad hoc integration.
[1] Cesare Pautasso, Olaf Zimmermann, Frank Leymann, "RESTful Web Services vs. "Big" Web Services: Making the Right Architectural Decision", pp. 805-814, in Jinpeng Huai, Robin Chen, Hsiao-Wuen Hon, Yunhao Liu, Wei-Ying Ma, Andrew Tomkins, Xiaodong Zhang (Eds.), Proceedings of the 17th International World Wide Web Conference, ACM Press, Beijing, China, April 2008.
[2] Björn Hartmann, Scott Doorley, Scott R. Klemmer, "Hacking, Mashing, Gluing: Understanding Opportunistic Design", IEEE Pervasive Computing, vol. 7, no. 3, pp. 46-54 (2008).
Source: (StackOverflow)
I hope my question is not too broad; I'll try to frame it so that it doesn't just collect answers too similar to those of existing questions.
Currently I have deployed my Rails application on Linode. The service works fine and the price is reasonable, but there is administration work to do from time to time, which I could live without. Recently I got interested in other Rails hosting services like Heroku or EngineYard, and there are others as well. The services they provide seem fascinating: they promise to free us from administration chores. OK, maybe I cannot choose the database, but I can have a database that acts like a database, or schema-free DBs, or a cluster. If I don't really want to care about the details and just want the services necessary to provide my own service, then I shouldn't bother. But... I am looking for "buts", and there are probably things to consider. I find that choosing the right infrastructure for a Rails application (or any application) is crucial. These things come to mind regarding choosing the right infrastructure or infrastructure provider:
- simplicity to deploy
- pricing - I see two main models here: paying for processing power (EngineYard) or paying for machine configuration (Heroku). When does which model apply better?
- migration - how simple is it to migrate the Rails application and its data from one provider to another?
- additional services - Heroku, for example, provides WebSolr and monitoring of the Rails application. Such things might be crucial, or at least useful.
What do I need to consider when choosing the infrastructure - whether private, rented, or a mixture? Is there any comparison of Rails hosting services along these lines? Are there any sources that teach how to decide which kind of model applies best to a given design?
I hope my question is not too broad and can be answered on this forum within reasonable bounds. I would like to find a way to design the right cocktail of private infrastructure, VPS, and Rails hosting services. Thanks for suggestions.
Source: (StackOverflow)
I'm working with a start-up, mostly doing system administration, and I've come across some security issues that I'm not really comfortable with. I want to judge whether my expectations are accurate, so I'm looking for some insight into what others have done in this situation, and what risks/problems came up. In particular, how critical are measures like placing admin tools behind a VPN, regular security updates (OS and tools), etc.?
Keep in mind that as this is a start-up, the main goal is to get as many features as possible out the door quickly, so I'll need as much justification as I can get to secure resources for security (i.e. downtime for upgrades, dev time for application security fixes).
Background Info:
- Application is LAMP as well as a custom java client-server.
- Over the next 3 months, I project about 10k anonymous visitors to the site and up to 1000 authenticated users.
- A younger audience (16-25), which is guaranteed to include an above-average number of black hats.
Thanks in advance for your responses, and I'll welcome any related advice.
Source: (StackOverflow)
I'm familiar with the infrastructure or architecture of Cloudera:
Master Nodes include NameNode, SecondaryNameNode, JobTracker, and HMaster.
Slave Nodes include DataNode, TaskTracker, and HRegionServer.
Master daemons should each run on their own node (unless it's a small cluster, in which case SecondaryNameNode, JobTracker, and HMaster may be combined, and even the NameNode if it's a really small cluster).
The slave daemons should always be colocated on the same node. The more slave nodes, the merrier.
SecondaryNameNode is a misnomer, unless you enable it for High Availability.
Does MapR maintain this setup? How is it similar and how is it different?
Source: (StackOverflow)
Does anybody use Scrum and sprints for infrastructure work?
I'm struggling with the concept of a sprint that never finishes, e.g. a network enhancement project.
Also, any suggestions on how item time can be built up into a product backlog, so that I can sanity-check that resources are not overcommitted in the sprint?
Source: (StackOverflow)
In my years of working in multiple teams, I've met multiple infrastructure managers who instituted a policy of weekly server reboots. As a developer, I was always against the policy: it seems to be a hack to work around software bugs and hardware instabilities instead of correcting them.
What are people's opinions, positive and negative, regarding this policy?
Source: (StackOverflow)
I am using Ansible for some infrastructure management problems in my project. I have achieved this with a Linux client, for example copying a bin file from the Ansible server and installing it on the client machine. This involves tasks in my playbooks using normal Linux commands like ssh, scp, ./bin, etc.
Now I want to achieve the same with a Windows client. I couldn't find any good documentation for trying it out. If any of you have tried using Ansible with a Windows client, it would be great if you could share the procedure, a prototype, or any piece of information to help me get started and make progress on my problem.
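For what it's worth, Ansible manages Windows hosts over WinRM rather than SSH (supported from version 1.7 onward) and uses dedicated win_* modules instead of the usual ones. A minimal sketch, with the host name and credentials as placeholders (variable names vary slightly across Ansible versions):

```
# inventory - host and credentials below are placeholders
[windows]
winclient.example.com

[windows:vars]
ansible_user=Administrator
ansible_password=ExamplePassword
ansible_connection=winrm
ansible_port=5986
```

```yaml
# playbook.yml - verify connectivity, then push a binary to the client
- hosts: windows
  tasks:
    - name: check that the Windows host responds
      win_ping:

    - name: copy an installer to the client machine
      win_copy: src=files/setup.exe dest=C:\Temp\setup.exe
```

The Windows side needs PowerShell and a WinRM listener configured to accept the connection; modules such as win_ping and win_copy then take the place of ssh/scp in the playbooks.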
Source: (StackOverflow)