ext3 interview questions
Top ext3 frequently asked interview questions
I have been having problems mounting a pendrive on two different machines (one with Lubuntu 13.04, the other with Lubuntu 12.04, if that is relevant). I have to chown -R it all the time: at home I copy data to it, then if I want to use it at work I have to chown -R it on that machine before I can write to it, and when I bring it home I have to chown -R it again before I can write to it.
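In case it matters, this is the kind of ownership check I assume is relevant here (the mount point and the group name are placeholders):
id -u                               # compare this number on the two machines
stat -c '%u %g %n' /media/usb       # owner uid/gid of the mounted stick
# if the uids differ, one alternative to chown -R on every machine would be a
# shared group that exists with the same gid on both boxes ("usbshare" is made up):
sudo chgrp -R usbshare /media/usb && sudo chmod -R g+w /media/usb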
Source: (StackOverflow)
We have several cron jobs that ftp proxy logs to a centralized server. These files can be rather large and take some time to transfer. Part of the requirement of this project is to provide a logging mechanism in which we log the success or failure of these transfers. This is simple enough.
My question is: is there a way to check whether a file is currently being written to? My first solution was to check the file size twice within a given timeframe and see whether it changed. But a co-worker said that it might be possible to hook into the ext3 file system via Python and check the file's attributes to see if it is currently being appended to. My Google-fu came up empty.
Is there a module for ext3 or something else that would allow me to check the state of a file? The server is running Fedora Core 9 with an ext3 file system.
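For context, the fallback I have in mind if no ext3-specific hook exists is simply to ask whether any process still has the file open (the path below is a placeholder, and this could just as well be wrapped in a subprocess call from Python):
F=/var/log/proxy/access.log         # placeholder path
if lsof "$F" > /dev/null 2>&1; then
    echo "still open by some process - probably still being written"
else
    echo "no process has it open - should be safe to pick up"
fi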
Source: (StackOverflow)
I am writing a little application which writes JPEG images at a constant rate to an SD card.
I chose an ext3 filesystem, but the same behaviour was observed with ext2.
My writing loop looks like this:
get_image()
fwrite()
fsync()
Or like this:
get_image()
fopen()
fwrite()
fsync()
fclose()
I also display some timing statistics, and I can see that my program is sometimes blocked for several seconds.
The average rate is still good, because if I keep the incoming images in a FIFO, I can write many images in a short period of time after such a stall. Do you know whether it is a problem with the OS or whether it is related to the SD card itself?
How could I move closer to real time? I don't need hard real time, but being stalled for several seconds is not acceptable.
Some clarification:
Yes, it is necessary to fsync after every file, because I want the image to be on disk, not in some user or kernel buffer. Without fsyncing I get much better throughput,
but still unacceptable stalls. I don't think it is a buffering problem, since the first stall happens after 50 MB have been written. And according to the man page, fsync exists precisely to ensure there is no data left buffered.
Clarification regarding the average write rate:
I am writing at a rate that is sustainable by the card I am using. If I queue incoming images while waiting for an fsync to complete, then after the stall the write rate increases and I quickly catch back up to the average rate.
The average transfer rate is around 1.4 MB/s.
The system is a modern laptop running Ubuntu 8.04 with the stock kernel (2.6.24.19).
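One mitigation I'm considering, on the assumption that the stalls come from large bursts of accumulated dirty pages being flushed at once rather than from the card itself (the values are only a starting point, not a recommendation):
sudo sysctl -w vm.dirty_background_ratio=1   # start background writeback almost immediately
sudo sysctl -w vm.dirty_ratio=5              # block writers before seconds' worth of data piles up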
Source: (StackOverflow)
(Not really a programming question, sorry)
I'm working on benchmarking various filesystems (most importantly: ext3) with various filesystem options (for instance: noatime, relatime etc.) for specific situations on a Linux box.
For raw filesystem benchmarks, I'm looking into bonnie and bonnie++.
What is the most useful way to use bonnie and bonnie++ to benchmark filesystems?
What are best practices with regard to filesystem benchmarking?
While we're at it: how do you mount your ext3 filesystems on your machines?
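For concreteness, this is roughly the kind of run I have in mind (the device, sizes, paths, user and mount options below are placeholders, not recommendations):
mount -o noatime,data=ordered /dev/sdb1 /mnt/bench
bonnie++ -d /mnt/bench -s 4g -n 128 -u nobody   # 4 GB sequential tests plus 128*1024 small-file operations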
Source: (StackOverflow)
New to Hadoop; I have only set up a three-node Debian cluster for practice.
I was researching Hadoop best practices and came across:
JBOD no RAID
Filesystem: ext3, ext4, xfs - none of that fancy COW stuff you see with zfs and btrfs
So I raise these questions...
Everywhere I read that JBOD is better than RAID for Hadoop, and that the best filesystems are xfs, ext3 and ext4. Aside from the filesystem part, which totally makes sense, how do you actually implement this JBOD? You will see my confusion if you do the Google search yourself: JBOD usually alludes to a linear appendage or combination of just a bunch of disks, kind of like a logical volume (at least that is how some people explain it), but Hadoop seems to want a JBOD that doesn't combine the disks. Nobody expands on that...
- Question 1) What does everyone in the Hadoop world mean by JBOD, and how do you implement it?
- Question 2) Is it as simple as mounting each disk to a different directory?
- Question 3) Does that mean Hadoop runs best on a JBOD where each disk is simply mounted to a different directory?
- Question 4) And then you just point Hadoop at those data.dirs?
- Question 5) I see JBODs going two ways: either each disk gets its own mount, or the disks are linearly concatenated, which can be done with mdadm in --linear mode (and I bet LVM can do it too), so I don't see the big deal with that... And if what the JBOD people are referring to is this concatenation of disks, which mdadm --linear or LVM can produce, then which is the best way to "JBOD", i.e. linearly concatenate, disks for Hadoop?
This is off topic, but can someone verify whether this is correct as well? Filesystems that use COW (copy on write), like zfs and btrfs, just slow Hadoop down, and on top of that the COW implementation is a waste with Hadoop.
Question 6) Why are COW and things like RAID a waste on Hadoop?
I see it as: if your system crashes and you use the COW-ness of it to restore it, by the time you have restored your system there have been so many changes to HDFS that it will probably just consider that machine faulty, and it would be better to rejoin it from scratch (bring it up as a fresh new datanode)... Or how will the Hadoop system see the older datanode? My guess is it won't think of it as old or new or even a datanode, it will just see it as garbage... I don't know...
Question 7) What happens if Hadoop sees a datanode that fell off the cluster and then the datanode comes back online with data that is slightly older? Is there a limit on how old the data can be, and how does Hadoop handle this?
RE-ASKING QUESTIONS 1 THROUGH 4
I just realized my question is so simple, yet it is so hard for me to explain, that I had to split it up into four questions, and I still didn't get the answer I'm looking for from what sound like very smart individuals, so I must re-ask it differently.
On paper, or with a drawing, I could explain it easily... I'll attempt it with words again.
If you are confused about what I am asking in the JBOD question:
** I am just wondering what kind of JBOD everyone keeps referring to in the Hadoop world **
JBODs are defined differently in Hadoop than in the normal world, and I want to know whether the best way to implement Hadoop is on a concatenation of disks (sda+sdb+sdc+sdd) or with the disks left alone (sda, sdb, sdc, sdd).
I think the graphical representation below explains what I am asking best:
(JBOD METHOD 1)
Normal world: a JBOD is a concatenation of disks. If you were to use Hadoop, you would overlay the data.dir (where HDFS virtually sits) onto a directory inside this concatenation of disks, and all of the disks would appear as one. So if you had sda, sdb and sdc as the data disks in your node, you would make them appear as some entity1 (either with the motherboard's hardware, or with mdadm, or with LVM), which is a linear concatenation of sda, sdb and sdc. You would then mount this entity1 to a folder in the Unix namespace like /mnt/jbod/ and set up Hadoop to run inside it.
TEXT SUMMARY: if disk 1, disk 2 and disk 3 were 100 GB, 200 GB and 300 GB respectively, then this JBOD would be 600 GB, and Hadoop would gain 600 GB of capacity from this node.
* TEXTO-GRAPHICAL OF A LINEAR CONCATENATION OF DISKS BEING A JBOD:
* disks 1, 2 and 3 are used for the Hadoop datanode
* disk1 is sda, 100 GB
* disk2 is sdb, 200 GB
* disk3 is sdc, 300 GB
* sda + sdb + sdc = a JBOD named entity1
* HOW THE JBOD IS MADE DOESN'T MATTER - THAT'S NOT MY QUESTION: maybe we made the entity1 JBOD with LVM, or with mdadm in linear mode, or with hardware JBOD drivers which combine the disks and show them to the operating system as entity1; either way it is still a JBOD
* This is the type of JBOD I am used to and keep coming across when I google "JBOD"
* cat /proc/partitions would show sda, sdb, sdc and entity1, OR, if we used a hardware JBOD, maybe sda, sdb and sdc would not show and only entity1 would; again, it doesn't matter how it shows up
* mount entity1 to /mnt/entity1
* running "df" would show that entity1 is 100+200+300 = 600 GB
* we then set up Hadoop to run its datanode on /mnt/entity1, so that the data.dir property points at /mnt/entity1 and the cluster just gained 600 GB of capacity
..the other perspective is this..
(JBOD METHOD 2)
In Hadoop it seems to me they want every disk separate. So I would mount disks sda, sdb and sdc in the Unix namespace to /mnt/a, /mnt/b and /mnt/c. From reading around the web, lots of Hadoop experts classify JBOD as just that: just a bunch of disks, so to Unix they would look like individual disks, not a concatenation of disks. Of course I could combine them into one entity, either with the logical volume manager (LVM) or with mdadm (in a RAID or linear fashion, linear preferred for JBOD)... but no, let's not combine them, because it seems that in the Hadoop world JBOD means a bunch of disks sitting by themselves.
If disk 1, disk 2 and disk 3 were 100 GB, 200 GB and 300 GB respectively, then the mounts disk1->/mnt/a, disk2->/mnt/b and disk3->/mnt/c would be 100 GB, 200 GB and 300 GB respectively, and Hadoop would gain 600 GB of capacity from this node.
TEXTO-GRAPHICAL OF SEPARATE DISKS BEING A JBOD
* disks 1, 2 and 3 are used for the Hadoop datanode
* disk1 is sda, 100 GB
* disk2 is sdb, 200 GB
* disk3 is sdc, 300 GB
* WE DO NOT COMBINE THEM TO APPEAR AS ONE
* sda is mounted to /mnt/a
* sdb is mounted to /mnt/b
* sdc is mounted to /mnt/c
* running "df" would show that sda, sdb and sdc are 100, 200 and 300 GB respectively
* we then set up Hadoop via its config files to lay its HDFS on this node on the following "data.dirs": /mnt/a, /mnt/b and /mnt/c, gaining 100 GB for the cluster from a, 200 GB from b and 300 GB from c, for a total gain of 600 GB from this node... nobody using the cluster would tell the difference
SUMMARY OF QUESTION
** Which setup is everyone referring to as BEST PRACTICE for Hadoop: the combined JBOD, or the separation of disks (which online documentation also calls a JBOD)? **
- Both cases would gain Hadoop 600 GB... it's just that 1. looks like a concatenation, one entity that is a combination of all the disks, which is what I always thought a JBOD was, while in 2. each disk in the system is mounted to a different directory. The end result is the same to Hadoop capacity-wise; I am just wondering which is the best way for performance (a rough sketch of the per-disk layout I mean follows below).
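To make the per-disk option concrete, this is the kind of setup I mean (the device names, mount points and the exact Hadoop property name are guesses on my part):
mkfs.ext3 /dev/sdb1 && mkdir -p /data/1 && mount -o noatime /dev/sdb1 /data/1
mkfs.ext3 /dev/sdc1 && mkdir -p /data/2 && mount -o noatime /dev/sdc1 /data/2
mkfs.ext3 /dev/sdd1 && mkdir -p /data/3 && mount -o noatime /dev/sdd1 /data/3
# then list every mount in hdfs-site.xml under the datanode data dir property
# (dfs.data.dir in Hadoop 1.x, dfs.datanode.data.dir in 2.x), e.g.:
#   <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>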
Source: (StackOverflow)
I am having a problem understanding how to find the Block Group Descriptor table. The literature (D. Poirier, "The Second Extended File System") states that the block group descriptor table is located in the block right after the superblock.
Now, when I look at the first disk, with a block size of 1024 bytes, the structure is like this:
- MBR, 0-512 bytes
- Superblock, 1536-2560 bytes
- BG Descriptor, 2560 - ... bytes
And this structure is fine, because the superblock starts at the third sector and the BGD follows right after. However, when I look at a second disk with a block size of 4096 bytes, the structure is like this:
- MBR, 0-512 bytes
- Superblock, 1536-2560 bytes
- BG Descriptor, 4608 - ... bytes
In this case, the BGD is located 3072(?) bytes away from the start of the superblock. Could someone enlighten me and tell me how exactly the BGD position is determined? I'm writing a program that reads and analyses the ext structure, and I can't write a generic program without knowing how to find the BGD.
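For reference, this is the computation I'm trying to get right, under the assumption that the descriptor table starts in the first block after the block that contains the superblock, i.e. block 2 for 1024-byte blocks and block 1 for anything larger (the device path is a placeholder, and the offsets are relative to the start of the partition, so the 512-byte offset of the partition itself still has to be added to match the numbers above):
DEV=/dev/sdb1                                  # placeholder device
# s_log_block_size is a little-endian 32-bit field 24 bytes into the superblock,
# and the superblock itself always starts 1024 bytes into the partition
LOG_BS=$(dd if=$DEV bs=1 skip=1048 count=4 2>/dev/null | od -An -tu4 | tr -d ' ')
BS=$((1024 << LOG_BS))                         # od assumes a little-endian host, e.g. x86
# assumed rule: descriptors start in the first block after the superblock's block
if [ "$BS" -eq 1024 ]; then BGD=2048; else BGD=$BS; fi
echo "block size $BS, group descriptor table at byte $BGD from the start of the partition"
This would give 2048 (2560 on disk) for the 1024-byte case and 4096 (4608 on disk) for the 4096-byte case, which is what I observe.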
Source: (StackOverflow)
Where can I find the ext4 file system specifications?
(Unofficial drafts are fine, so long as they're reasonably up-to-date; if unavailable, ext3 would be fine too.)
Source: (StackOverflow)
ext3 has 3 journaling options: journal, ordered, and writeback. According to the Wikipedia entry, these range from least risky to most risky for crash recovery. For some reason, Android's version of Linux only supports the latter two options and defaults to writeback. (I'm running Froyo.)
Is there a way to add support for journal mode? I'd like to do this on the /data partition, which is ext3 and is also where most of the file writes happen. My device doesn't have a battery, so I need to make sure it's crash-proof when someone disconnects power.
In case anyone is interested, the Linux options are defined in kernel/fs/ext3/Kconfig. The specific option is EXT3_DEFAULTS_TO_ORDERED.
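For context, this is the sort of thing I was hoping I could do; my understanding is that EXT3_DEFAULTS_TO_ORDERED only picks the default mode rather than compiling the others out (the device path is a guess, and the data= mode cannot be changed on a plain remount, so it would have to go into the initial mount, e.g. in init.rc, or into the superblock):
mount -t ext3 -o data=journal /dev/block/mmcblk0p2 /data
tune2fs -o journal_data /dev/block/mmcblk0p2    # record journalled data as the default in the superblock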
Source: (StackOverflow)
I've run the following test: I created a folder containing 15,000 files of 400 bytes each using this batch file:
@ECHO off
SET times=15000
FOR /L %%i IN (1,1,%times%) DO (
fsutil file createnew filename%%i.txt 400
)
then I copy it on my Windows computer using this command:
robocopy LargeNumberOfFiles\ LargeNumberOfFiles2\
After it completes, I can see that the transfer rate was 915,810 bytes/sec, which is less than 1 MB/s. It took several seconds to copy 7 MB. Please note that this is very slow.
I've tried the same with a folder containing a single file of 50 MB, and the transfer rate is 1,219,512,195 bytes/sec (yes, GB/s): instantaneous.
Why does copying a large number of files take so much time and so many resources on a Windows filesystem?
Please note that I've tried to do the same on a Linux system which runs on the same computer in a virtual machine (VMware Player) with an ext3 filesystem.
I use the cp command and the copy is instantaneous!
Please also note the following:
- no antivirus
- I've tested that behaviour on multiple Windows computers (always NTFS) and I always get comparable results (transfer rate under 1 MB/s, on average 7-8 seconds to copy 7 MB)
- I've tested on multiple Linux ext3 systems and the copy is always instantaneous for that amount (15,000 files of 400 bytes)
- The question is about understanding what makes the Windows filesystem so slow at copying a large number of files compared to, for instance, a Linux one.
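To make the comparison fairer I also plan to re-time the Linux copy with an explicit sync, since cp returning instantly may just mean the 7 MB are still sitting in the page cache:
time sh -c 'cp -r LargeNumberOfFiles/ LargeNumberOfFiles2/ && sync'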
Source: (StackOverflow)
I would like to simulate filesystem corruption in order to test how our embedded systems react to it and, ultimately, have them fail as gracefully as possible. We use different kinds of block-device-emulated flash storage for data which is modified often and is unsuitable for storage in NAND/NOR.
Since I have a pretty good idea of how often data is modified in different parts of the file tree and where sensitive data is stored, I would like to inject errors in specific areas and not just randomly.
In cases of emergency we use fsck -y
as a sort of last resort in order to attempt to bring the system up and report that it is in a very bad state. I would very much like to cause errors which would trigger fsck to attempt repairs, in order to study the effect on the system's capability to come back up.
dd if=/dev/random
is not precise enough for my purpose, since it can't easily be used to inject controlled errors. Are there any other tools or methods which fit my needs better, or do I have to invent my own?
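The kind of targeted injection I'm picturing is something along these lines (the device, the file path and the $BS/$BLOCK values are placeholders, and the partition is unmounted while it is being poked):
umount /dev/mmcblk0p3
debugfs -R "stat /data/important.db" /dev/mmcblk0p3      # prints the inode's block list
dd if=/dev/urandom of=/dev/mmcblk0p3 bs=$BS seek=$BLOCK count=1 conv=notrunc
fsck -y /dev/mmcblk0p3                                   # then watch how recovery behaves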
Source: (StackOverflow)
How can I create a file in an ext3 filesystem with a specific inode number?
(For example, I want to create a file with inode number 12253.)
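The only approach I have come up with myself is brute force, something like the sketch below (the mount point is a placeholder, and it only terminates if inode 12253 is actually free):
WANT=12253
mkdir -p /mnt/ext3/probe
i=0
while :; do
    f=/mnt/ext3/probe/f$i
    touch "$f"
    [ "$(stat -c %i "$f")" -eq "$WANT" ] && { echo "created $f as inode $WANT"; break; }
    i=$((i+1))
done
# the probe files created along the way still have to be cleaned up afterwards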
Source: (StackOverflow)
Why are there no good drivers for Windows for reading ext2/3/4 filesystems? Googling around indicates that there are two or three out there, but all of them have problems. Is there some technical inconsistency that makes it difficult to correctly code something that would let me open up My Computer and work with an extN partition just like NTFS or FAT? I thought one of the benefits of open source and open standards was that problems like this would be solved fairly quickly.
Source: (StackOverflow)
I have a read-only partition whose data is changing.
The change occurs on the first mount only; subsequent mounts do not change the partition data.
Tried with ext3 and ext2 in case journaling was the issue... no help.
Tried tune2fs with -c -1 -i 0 in order to disable updating timestamps or any other data that may be touched by a check being executed... no help.
Normally I wouldn't care, but I need to compute a hash of this partition for data integrity purposes.
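One workaround I'm considering, on the assumption that the change comes from the journal replay and orphan-inode processing ext3 performs on the first mount even when the mount is read-only (the device path is a placeholder):
mount -t ext3 -o ro,noload /dev/sdX1 /mnt/verify    # noload skips the journal replay
sha1sum /dev/sdX1                                   # hash the raw device rather than the mounted files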
Source: (StackOverflow)
I am struggling to understand what is going on in a test I am running. The test is two shell scripts running on the same machine.
A:
#!/bin/bash
touch target;
for ((i=0; i < 1000; i=i+1)); do
    echo "snafu$i" > $1/file$i;
    mv -f $1/file$i $1/target;
done;
B:
#!/bin/bash
while true; do
    cat $1/target;
done
So I run A /ext3_dir, then run B /ext3_dir > out (so only errors show up on the terminal).
This all works fine and as expected according to the POSIX spec for 'rename':
If the link named by the new argument exists, it shall be removed and old renamed to new. In this case, a link named new shall remain visible to other processes throughout the renaming operation and refer either to the file referred to by new or old before the operation began.
However if I add a hard-link to the temporary file before doing the move:
#!/bin/bash
touch target;
for ((i=0; i < 1000; i=i+1)); do
    echo "snafu$i" > $1/file$i;
    ln $1/file$i $1/link$i
    mv -f $1/file$i $1/target;
done;
I get "No such file or directory" errors on the reading side - seemingly in contravention of the POSIX spec.
Can anyone shed any light on this behaviour? Is the test valid? I can't figure out why creating an extra link to the file I'm moving should impact the ability to read from the move destination.
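In case it helps, this is how I plan to dig further (assuming the writer script above is saved as a.sh, which is just the name I gave it):
strace -f -e trace=rename,link,unlink -o /tmp/writer.trace bash a.sh /ext3_dir
# if mv ever unlinks the destination before (rather than via) rename(), that
# would explain the window in which the reader sees "No such file or directory"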
Source: (StackOverflow)
This is the output of blktrace. I could not understand what "N 0 (00 ..) [multipathd]" means. I'm testing the write I/O performance of the FS.
I have two doubts:
- N is an action, but I can't find it documented in blktrace.pdf.
- What is the difference between iostat and blktrace?
blktrace output:
8,128 7 11 85.638053443 4009 I N 0 (00 ..) [multipathd]
8,128 7 12 85.638054275 4009 D N 0 (00 ..) [multipathd]
8,128 2 88 89.861199377 5210 A W 384 + 8 <- (253,0) 384
8,128 2 89 89.861199876 5210 Q W 384 + 8 [i_worker_0]
8,128 2 90 89.861202645 5210 G W 384 + 8 [i_worker_0]
8,128 2 91 89.861204604 5210 P N [i_worker_0]
8,128 2 92 89.861205587 5210 I WA 384 + 8 [i_worker_0]
8,128 2 93 89.861210869 5210 D WA 384 + 8 [i_worker_0]
8,128 2 94 89.861499857 0 C WA 384 + 8 [0]
8,128 2 95 99.845910681 5230 A W 384 + 8 <- (253,0) 384
8,128 2 96 99.845911148 5230 Q W 384 + 8 [i_worker_20]
8,128 2 97 99.845913846 5230 G W 384 + 8 [i_worker_20]
8,128 2 98 99.845915910 5230 P N [i_worker_20]
8,128 2 99 99.845917081 5230 I WA 384 + 8 [i_worker_20]
8,128 2 100 99.845922597 5230 D WA 384 + 8 [i_worker_20]
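For reference, this is roughly how I'm capturing and post-processing the trace (the device name and capture length are placeholders):
blktrace -d /dev/sdi -o mytrace -w 30     # capture 30 seconds of block-layer events
blkparse -i mytrace -d mytrace.bin        # per-event listing like the output above, plus a binary dump
btt -i mytrace.bin                        # latency breakdown (Q2C, D2C, ...) from the dump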
Source: (StackOverflow)