
RAID interview questions

Top frequently asked RAID interview questions

How to interrupt a software RAID resync?

I want to interrupt a running resync operation on a Debian Squeeze software RAID. (This is the regularly scheduled compare resync. The RAID array is still clean in such a case. Do not confuse this with a rebuild after a disk has failed and been replaced.)

How do I stop this scheduled resync operation while it is running? Another RAID array is "resync pending", because they all get checked on the same day (Sunday night), one after another. I want a complete stop of this Sunday-night resyncing.

[Edit: sudo kill -9 1010 doesn't stop it; 1010 is the PID of the md2_resync process.]

I would also like to know how I can control the intervals between resyncs and the remaining time until the next one.

[Edit2: What I did for now was make the resync go very slowly, so it no longer gets in the way:

sudo sysctl -w dev.raid.speed_limit_max=1000

taken from http://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html

During the night I will set it back to a high value, so the resync can terminate.

This workaround is fine for most situations; nonetheless, it would be interesting to know whether what I asked is possible. For example, it does not seem to be possible to grow an array while it is resyncing or while a resync is "pending".]
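
For reference, md's resync runs as a kernel thread, which is why kill -9 has no effect; the supported control path is sysfs. A minimal sketch, assuming the array in question is md2; the checkarray options and the /etc/default/mdadm switch are assumptions worth double-checking on a Squeeze box:

  cat /proc/mdstat                                     # confirm which array is currently checking/resyncing
  echo idle | sudo tee /sys/block/md2/md/sync_action   # abort the check running on md2
  sudo /usr/share/mdadm/checkarray --cancel --all      # also clear checks queued as "pending" (option name assumed)
  # The Sunday-night run itself comes from Debian's cron job (/etc/cron.d/mdadm); disabling it there,
  # or via AUTOCHECK=false in /etc/default/mdadm, controls how often (if ever) the checks happen.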


Source: (StackOverflow)

Software vs hardware RAID performance and cache usage

I've been reading a lot about RAID controllers and setups, and one claim that keeps coming up is that hardware controllers without cache offer the same performance as software RAID. Is this really the case?

I always thought that hardware RAID cards would offer better performance even without cache. I mean, you have dedicated hardware to perform the tasks. If that is the case, what is the benefit of getting a RAID card that has no cache, something like an LSI 9341-4i, which isn't exactly cheap?

Also, if a performance gain is only possible with cache, is there a cache configuration that writes to disk right away but keeps data in cache for read operations, making a BBU less of a priority?
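
For what it's worth, that last configuration is essentially write-through caching with read-ahead left on. A hedged sketch of how it might be set on a MegaRAID-family controller managed with storcli (controller/volume numbers are placeholders, and a cacheless card like the 9341-4i won't expose a cache policy at all):

  storcli /c0/v0 set wrcache=wt    # write-through: writes are acknowledged only after they hit the disks
  storcli /c0/v0 set rdcache=ra    # keep read-ahead caching for read operations
  storcli /c0/v0 show all          # verify the resulting cache policy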


Source: (StackOverflow)


Why is RAID 0 classed as RAID when it's not redundant?

I've worked in IT for quite a number of years, so I know what a RAID array is and what RAID 0, 1, 5, 6, 10, 50, 60, etc. are, but something sprang to mind in a recent conversation at work: if RAID stands for redundant array of independent (or inexpensive) disks, then why is RAID 0 classed as RAID at all and not just a striped array?

Having data striped across multiple disks in one array offers no redundancy whatsoever, so why is it classed as a RAID array? Surely the lowest level should be RAID 1 (mirroring), as that's where redundancy actually starts?


Source: (StackOverflow)

To improve SQL performance, why not just add lots of RAM rather than faster hard disks?

People keep telling me that to improve an SQL server's performance, I should buy the fastest hard disks possible, use RAID 5, and so on.

So I was thinking: instead of spending all that money on RAID 5 and super-duper fast hard disks (which isn't cheap, by the way), why not just get tonnes of RAM? We know that an SQL server loads the database into memory. Memory is way faster than any hard disk.

Why not stuff something like 100 GB of RAM into a server? Then just use regular SCSI hard disks in RAID 1. Wouldn't that be a lot cheaper and faster?


Source: (StackOverflow)

File server storage configuration: RAID vs. LVM vs. ZFS vs. something else?

We are a small company that does video editing, among other things, and need a place to keep backup copies of large media files and make it easy to share them.

I've got a box set up with Ubuntu Server and 4 x 500 GB drives. They're currently set up with Samba as four shared folders that Mac/Windows workstations can see fine, but I want a better solution. There are two major reasons for this:

  1. 500 GB is not really big enough (some projects are larger)
  2. It is cumbersome to manage the current setup, because individual hard drives have different amounts of free space and duplicated data (for backup). It is confusing now, and that will only get worse once there are multiple servers. ("the project is on server2 in share4", etc.)

So I need a way to combine hard drives such that the failure of a single drive does not cause complete data loss, and so that users see only a single share on each server. I've done Linux software RAID 5 and had a bad experience with it, but I would try it again. LVM looks OK, but it seems like no one uses it. ZFS seems interesting, but it is relatively "new".

What is the most efficient and least risky way to combine the hard drives that is also convenient for my users?


Edit: The goal here is basically to create servers that contain an arbitrary number of hard drives but limit complexity from an end-user perspective (i.e., they see one "folder" per server). Backing up data is not an issue here, but how each solution responds to hardware failure is a serious concern. That is why I lump RAID, LVM, ZFS, and who-knows-what together.

My prior experience with RAID 5 was also on an Ubuntu Server box, and a tricky and unlikely set of circumstances led to complete data loss. I could probably avoid that again, but I was left with the feeling that I was adding an unnecessary additional point of failure to the system.

I haven't used RAID 10, but we are on commodity hardware and the number of data drives per box is pretty much capped at 6. With our stock of 500 GB drives, RAID 10 would yield only about 1.5 TB usable, which is pretty small. (Still an option for at least one server, however.)

I have no experience with LVM and have read conflicting reports on how it handles drive failure. If a (non-striped) LVM setup could handle a single drive failing and lose only whichever files had a portion stored on that drive (while storing most files on a single drive only), we could even live with that.

But as long as I have to learn something totally new, I may as well go all the way to ZFS. Unlike with LVM, though, I would also have to change my operating system (?), so that increases the distance between where I am and where I want to be. I used a version of Solaris at uni and wouldn't mind it terribly, though.

On the other end of the IT spectrum, I think I may also explore FreeNAS and/or Openfiler, but that doesn't really solve the how-to-combine-drives issue.
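
For context, one common way to get the "single share per server" layout on stock Ubuntu is md RAID with LVM on top and one big filesystem exported via Samba. A rough sketch, assuming four data drives /dev/sdb through /dev/sde (device and volume names are placeholders; RAID 6 or ZFS raidz would be the equivalent moves if more redundancy is wanted):

  sudo mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
  sudo pvcreate /dev/md0                          # put LVM on top so the space can be grown later
  sudo vgcreate media /dev/md0
  sudo lvcreate -l 100%FREE -n share media
  sudo mkfs.ext4 /dev/media/share                 # one filesystem = one Samba share per server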


Source: (StackOverflow)

Do I need to RAID Fusion-io cards?

Can I run reliably with a single Fusion-io card installed in a server, or do I need to deploy two cards in a software RAID setup?

Fusion-io isn't very clear (almost misleading, even) on the topic in their marketing materials. Given the cost of the cards, I'm curious how other engineers deploy them in real-world scenarios.

I plan to use the HP-branded Fusion-io ioDrive2 1.2TB card for a proprietary standalone database solution running on Linux. This is a single server setup with no real high-availability option. There is asynchronous replication with a 10-minute RPO that mirrors transaction logs to a second physical server.

Traditionally, I would specify a high-end HP ProLiant server with the top CPU stepping for this application. I need to go to SSD, and I'm able to acquire Fusion-io at a lower price than enterprise SAS SSD for the required capacity.

  • Do I need to run two ioDrive2 cards and join them with software RAID (md or ZFS), or is that unnecessary?
  • Should I be concerned about Fusion-io failure any more than I'd be concerned about a RAID controller failure or a motherboard failure?
  • System administrators like RAID. Does this require a different mindset, given the different interface and on-card wear-leveling/error-correction available in this form-factor?
  • What IS the failure rate of these devices?

Edit: I just read a Fusion-io reliability whitepaper from Dell, and the takeaway seems to be "Fusion-io cards have lots of internal redundancies... Don't worry about RAID!!".
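
For what it's worth, if the decision were to mirror two cards with md (per the first bullet above), the setup itself is a one-liner. A hedged sketch, assuming the fio driver exposes the two cards as /dev/fioa and /dev/fiob (the device names are an assumption; check what your driver actually creates):

  sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/fioa /dev/fiob   # RAID 1 across the two cards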


Source: (StackOverflow)

Consumer (or prosumer) SSDs vs. fast HDDs in a server environment

What are the pros and cons of consumer SSDs vs. fast 10-15K RPM spinning drives in a server environment? We cannot use enterprise SSDs in our case, as they are prohibitively expensive. Here are some notes about our particular use case:

  • Hypervisor with 5-10 VMs max. No individual VM will be crazy I/O-intensive.
  • Internal RAID 10, no SAN/NAS...

I know that enterprise SSDs:

  1. are rated for longer lifespans
  2. and perform more consistently over long periods

than consumer SSDs... but does that mean consumer SSDs are completely unsuitable for a server environment, or will they still perform better than fast spinning drives?

Since we're protected via RAID and backups, I'm more concerned about performance than lifespan (as long as lifespan isn't expected to be crazy low).


Source: (StackOverflow)

How do I differentiate "fake RAID" from real RAID?

The Ubuntu wiki page on FakeRaid says the following:

[A] number of hardware products ... claim to be IDE or SATA RAID controllers... Virtually none of these are true hardware RAID controllers. Instead, they are simply multi-channel disk controllers combined with special BIOS configuration options...

Is there a typical way to identify (from a product specification) whether a motherboard has "real" RAID, or are "real" RAID products generally unavailable to consumers?
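
Product sheets rarely say "fake RAID" outright, but once a box is in hand, a quick check from Linux can usually tell the two apart. A hedged sketch (dmraid may need installing, and output varies by distribution):

  lspci | grep -i raid    # a true hardware controller shows up as its own RAID/SAS device
  sudo dmraid -r          # lists BIOS/"fake" RAID metadata found on the disks, if any
  cat /proc/mdstat        # Linux software (md) arrays, for comparison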


Source: (StackOverflow)

Which is better: LVM on RAID or RAID on LVM?

I currently have LVM on software RAID, but I'd like to ask what you think is the better solution, and maybe hear some pros and cons.

Edit: This is about software RAID on LVM versus LVM on software RAID. I know that hardware RAID is better if we are thinking about performance.


Source: (StackOverflow)

Should I defrag my RAID volumes?

It seems to me that since RAID volumes are logical (as opposed to physical), the layout that the OS believes they have might not correspond to the actual physical layout.

So does defrag make sense for RAID?


Source: (StackOverflow)

Do RAID controllers synchronize HDD platter rotation?

I'm in the market for a new storage solution. While I was researching various specs, one of my coworkers said that some RAID controllers can synchronize HDD rotation so that every drive's sector/block 0 passes under the read head at the same time.

I searched online but have not been able to find information proving/disproving this claim.


Source: (StackOverflow)

Should I 'run in' one disk of a new RAID 1 pair to decrease the chance that both fail at a similar time?

I'm setting up a RAID1 array of two new 4TB hard drives.

I heard somewhere that building a RAID 1 array from identical new hard drives bought at the same time increases the chance that they will fail at a similar point in time.

I am therefore considering using one of the hard drives on its own for a period of time (maybe a couple of weeks), in an attempt to reduce the likelihood of both failing within a short amount of time. (The unused drive would be kept disconnected in a drawer.)

Does this seem like a reasonable approach, or am I more likely just wasting my time?


Source: (StackOverflow)

Is bit rot on hard drives a real problem? What can be done about it?

A friend has been talking with me about the problem of bit rot: bits on drives randomly flipping and corrupting data. It's incredibly rare, but with enough time it could be a problem, and it's impossible to detect.

The drive wouldn't consider it to be a bad sector, and backups would just think the file has changed. There's no checksum involved to validate integrity. Even in a RAID setup, the difference would be detected but there would be no way to know which mirror copy is correct.

Is this a real problem? And if so, what can be done about it? My friend recommends ZFS as a solution, but I can't imagine flattening our file servers at work and putting Solaris and ZFS on them.
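
As a point of reference for the "what can be done" part: checksumming filesystems detect this during a scrub. A minimal sketch with ZFS, assuming a pool named "tank" (the name is a placeholder):

  zpool scrub tank        # read every block and verify it against its stored checksum
  zpool status -v tank    # report any checksum errors; on redundant vdevs they are repaired from a good copy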


Source: (StackOverflow)

When using software RAID and LVM on Linux, which IO scheduler and readahead settings are honored?

In the case of multiple layers (physical drives -> md -> dm -> lvm), how do the schedulers, readahead settings, and other disk settings interact?

Imagine you have several disks (/dev/sda - /dev/sdd) that are all part of a software RAID device (/dev/md0) created with mdadm. Each device (including the physical disks and /dev/md0) has its own settings for the I/O scheduler (changeable via sysfs) and readahead (changed using blockdev). When you throw in things like dm (crypto) and LVM, you add even more layers with their own settings.

For example, if the physical device has a readahead of 128 blocks and the RAID has a readahead of 64 blocks, which is honored when I do a read from /dev/md0? Does the md driver attempt a 64-block read, which the physical device driver then translates to a read of 128 blocks? Or does the RAID readahead "pass through" to the underlying device, resulting in a 64-block read?

The same kind of question holds for schedulers: do I have to worry about multiple layers of I/O schedulers and how they interact, or does the scheduler on /dev/md0 effectively override the underlying ones?

In my attempts to answer this question, I've dug up some interesting data on schedulers, as well as tools that might help figure this out.
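
One way to at least see what each layer is currently set to is to query every block device directly. A hedged sketch (device and LVM names are placeholders):

  blockdev --report                       # the RA column shows readahead, in 512-byte sectors, for every block device
  cat /sys/block/sda/queue/scheduler      # scheduler on a physical disk, e.g. noop deadline [cfq]
  cat /sys/block/md0/queue/scheduler      # md/dm devices are bio-based and usually report none here
  sudo blockdev --setra 4096 /dev/md0     # readahead can be set independently at each layer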


Source: (StackOverflow)

High Failure Rate of Large Drives?

I recently deployed a server with 5x 1 TB drives (I won't mention their brand, but it was one of the big two). I was initially warned against getting large-capacity drives; a friend advised me that they have a very low MTBF and that I would be better off getting more, smaller-capacity drives, as they are not being 'pushed to the limit' in terms of what the technology can handle.

Since then, three of the five disks have failed. Thankfully I was able to replace and rebuild the array before the next disk failed, but it's got me very very worried.

What are your thoughts? Did I just get them in a bad batch? Or are newer/higher capacity disks more likely to fail than tried and tested disks?


Source: (StackOverflow)