ext4 interview questions
Top ext4 frequently asked interview questions
I'm writing an app that needs to store a lot of files, up to approximately 10 million.
They are presently named with a UUID and will be around 4 MB each, always the same size. Reading and writing from/to these files will always be sequential.
The 2 main questions I am seeking answers to:
1) Which filesystem would be best for this: XFS or ext4?
2) Would it be necessary to store the files in subdirectories in order to reduce the number of files within a single directory?
For question 2, I note that people have tried to find the limit on the number of files XFS can store in a single directory and haven't hit it, even beyond millions of files, with no performance problems reported. What about under ext4?
Googling around for people doing similar things, I found suggestions to store the inode number instead of the filename as the link to the file, for performance (in a database index, which I'm also using). However, I don't see a usable API for opening a file by inode number. That seemed to be more of a suggestion for improving performance under ext3, which I don't intend to use anyway.
What are the ext4 and XFS limits? What performance benefits are there from one over the other and could you see a reason to use ext4 over XFS in my case?
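For question 2, if sharding does turn out to be needed, the usual trick is to derive the subdirectory from the UUID itself. A minimal sketch in C, assuming hex UUID names and a two-level 2+2 split (the "store" root and the split width are arbitrary illustrative choices, not anything ext4 or XFS requires); with 256x256 leaf directories, 10 million files come to roughly 150 entries per directory:

#include <stdio.h>

/* Build "store/ab/cd/<uuid>" from the first four hex characters of
 * the UUID, e.g. "3f2a..." -> "store/3f/2a/3f2a...". Create the two
 * directory levels once with mkdir() before writing files into them. */
void shard_path(char *out, size_t outlen, const char *uuid)
{
    snprintf(out, outlen, "store/%.2s/%.2s/%s", uuid, uuid + 2, uuid);
}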
Source: (StackOverflow)
Is it safe to call rename(tmppath, path) without calling fsync(tmppath_fd) first?
I want the path to always point to a complete file.
I care mainly about ext4. Is rename() promised to be safe in all future Linux kernel versions?
A usage example in Python:
import os

def store_atomically(path, data):
    tmppath = path + ".tmp"
    output = open(tmppath, "wb")
    output.write(data)
    output.flush()
    os.fsync(output.fileno())  # The needed fsync().
    output.close()
    os.rename(tmppath, path)
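For comparison, the same pattern in C, including the extra step the Python snippet leaves out: an fsync() on the parent directory, without which the rename itself (as opposed to the file contents) may not survive a crash. A sketch only, with hypothetical names:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write data to tmppath, fsync it, rename over path, then fsync the
 * containing directory so the rename is durable. Returns 0 on success. */
int store_atomically(const char *dir, const char *path,
                     const char *tmppath, const void *data, size_t len)
{
    int fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    if (rename(tmppath, path) != 0)
        return -1;
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);  /* persist the rename */
    if (dfd < 0)
        return -1;
    int r = fsync(dfd);
    close(dfd);
    return r;
}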
Source: (StackOverflow)
How does the length of a filename affect remaining storage space on a disk?
I realize this is filesystem dependent. In particular I am thinking about the EXT series of file systems. I don't fully understand how inodes affect disk space and how the filename itself is stored. It's difficult to get relevant search results for this question, too; that's why I'm asking here. On Linux, the maximum file name length is usually 255 or 256 characters. When the file system is created, is that amount of space "reserved" for each and every file name? In other words, is disk storage unaffected by the actual file name because the maximum is already used? Or is it more complicated than that?
Suppose I have a file named "joe.txt" and rename it to "joe2.txt". Has the amount of available disk space decreased after this? What about longer names like "joe_version.txt" or "joe_original_version_with_bug_that_Jim_solved.txt"? I am worried about thresholds at 8, 16, 32, 64, etc. characters. I will be storing millions of images. I have never had to worry about such an issue before, so I'm not completely sure how this works.
Although EXT is the only filesystem I'm using, discussing FAT and others might be useful to somebody else that has a similar question.
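For the ext family specifically, the on-disk directory entry layout (paraphrased below from struct ext4_dir_entry_2 in the kernel's fs/ext4/ext4.h) shows that names are stored inline at their actual length, rounded up to a 4-byte boundary; nothing close to 255 bytes is reserved per name:

/* __le32/__le16/__u8 are little-endian 32-bit, 16-bit and 8-bit fields. */
struct ext4_dir_entry_2 {
    __le32 inode;     /* inode number; 0 marks an unused entry */
    __le16 rec_len;   /* total size of this entry, 4-byte aligned */
    __u8   name_len;  /* actual length of the name in bytes */
    __u8   file_type;
    char   name[];    /* the name itself, no NUL terminator */
};

So renaming "joe.txt" to "joe2.txt" costs at most a few bytes inside an already-allocated directory block, and per-name space grows in 4-byte steps, not at 8/16/32/64-character thresholds.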
Source: (StackOverflow)
Update: Turns out I was being very stupid. I was checking the modification time when I should have been checking the access time. The reason it was not reproducible was that the test files were made with dd if=/dev/urandom of="$target" bs='1K' count=1 || exit 1, which most of the time completed too fast for the modification time (end of dd) of the new files to differ from the access time (start of dd). Another thing to watch out for.
I'm working on a script to apply the access time of one file, plus two years, to another file. This uses stat -c %x, date --rfc-3339=ns and touch -a --date="$result". stat and date both output date strings with nanoseconds, like 2012-11-17 10:22:15.390351800+01:00, and info coreutils 'touch invocation' says it supports nanoseconds. But sometimes when applying touch there is a small difference between the timestamp applied and the one returned afterwards by stat. Here's data from an actual run:
$ for i in {1..100}; do ./t_timecopy.sh 2>/dev/null| grep ASSERT; done
ASSERT:Expecting same access time expected:<2012-11-17 10:58:40.719320935+01:00> but was:<2012-11-17 10:58:40.723322203+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:00:04.342346275+01:00> but was:<2012-11-17 11:00:04.346358718+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:00:39.343348183+01:00> but was:<2012-11-17 11:00:39.347351686+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:01:08.655348312+01:00> but was:<2012-11-17 11:01:08.659347625+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:01:37.930346876+01:00> but was:<2012-11-17 11:01:37.934347311+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:02:16.939319832+01:00> but was:<2012-11-17 11:02:16.943323061+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:02:46.456443149+01:00> but was:<2012-11-17 11:02:46.458379114+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:03:15.487339595+01:00> but was:<2012-11-17 11:03:15.491341426+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:04:04.646335863+01:00> but was:<2012-11-17 11:04:04.650346634+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:04:14.410326608+01:00> but was:<2012-11-17 11:04:14.414331233+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:04:24.159367348+01:00> but was:<2012-11-17 11:04:24.163352418+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:04:33.931387953+01:00> but was:<2012-11-17 11:04:33.935350115+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:05:03.394361030+01:00> but was:<2012-11-17 11:05:03.398320957+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:05:42.054317430+01:00> but was:<2012-11-17 11:05:42.059106497+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:06:40.346320820+01:00> but was:<2012-11-17 11:06:40.350346956+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:08:17.194346778+01:00> but was:<2012-11-17 11:08:17.198338832+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:08:27.102347603+01:00> but was:<2012-11-17 11:08:27.106320380+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:09:16.247322948+01:00> but was:<2012-11-17 11:09:16.251347966+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:09:55.191325266+01:00> but was:<2012-11-17 11:09:55.195320672+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:12:09.915318301+01:00> but was:<2012-11-17 11:12:09.919334099+01:00>
ASSERT:Expecting same access time expected:<2012-11-17 11:12:28.906346914+01:00> but was:<2012-11-17 11:12:28.910348186+01:00>
So 21 out of 100 tests failed, with a mean difference of 3.938 ms and a median of 4.001 ms. Any ideas what could cause this?
$ uname -a
Linux user 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
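For what it's worth, the shell round trip through date and touch can be avoided entirely by setting the timestamp in one step with utimensat(2), which takes nanoseconds directly. A minimal C sketch of the same "atime plus two years" operation (the crude 2x365-day arithmetic is an assumption for illustration, not part of the original script):

#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>

/* Apply src's access time, shifted two years forward, to dst,
 * with full nanosecond precision and no string formatting. */
int copy_atime_plus_two_years(const char *src, const char *dst)
{
    struct stat st;
    if (stat(src, &st) != 0)
        return -1;

    struct timespec times[2];
    times[0] = st.st_atim;                   /* nanosecond atime */
    times[0].tv_sec += 2 * 365 * 24 * 3600;  /* crude "+2 years" */
    times[1].tv_nsec = UTIME_OMIT;           /* leave mtime alone */

    return utimensat(AT_FDCWD, dst, times, 0);
}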
Source: (StackOverflow)
I am creating a radiogroup in Ext JS 4 using an xtype inside a FormPanel. I am trying to enable/disable a textfield as soon as the radio is checked.
{
    xtype: 'radiogroup',
    fieldLabel: 'Enable / Disable ',
    columns: 2,
    vertical: true,
    items: [
        {boxLabel: 'Enable', name: 'formtype', inputValue: '1'},
        {boxLabel: 'Disable', name: 'formtype', inputValue: '2', checked: true}
    ]
}
I am confused about where to add the listeners for the check/click event. Thanks a ton in advance.
Source: (StackOverflow)
I've installed a simple LAMP system based on Debian 7.2.0 (32-bit). On my server I want to know when each PHP file was last used (accessed) by the web server. When I check the last access times of the PHP files (with ls -alu), they are wrong.
I've found that this is because of the relatime option used for mounting the root filesystem. I've tried to edit /etc/fstab and put the norelatime,atime options there, but it does not work. My current /etc/fstab is:
UUID=d4bb10f1-1428-4ee4-916c-55e800263c3f / ext4 atime,norelatime,errors=remount-ro 0 1
UUID=6db7a3c7-6ff9-43ac-b959-5175039bb84b none swap sw 0 0
/dev/sr0 /media/cdrom0 udf,iso9660 user,noauto 0 0
/dev/fd0 /media/floppy0 auto rw,user,noauto 0 0
After a reboot, when I type mount, I get:
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=127786,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=103240k,mode=755)
/dev/disk/by-uuid/d4bb10f1-1428-4ee4-916c-55e800263c3f on / type ext4 (rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /run/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=352700k)
All the partitions have the relatime option. Any help?
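For reference, on kernels since 2.6.30 relatime is the built-in default, and a bare atime/norelatime in fstab may not override it; the strictatime mount option is what requests classic atime updates. A possible line to try, adapted from the fstab above (an untested suggestion, not a confirmed fix):

UUID=d4bb10f1-1428-4ee4-916c-55e800263c3f / ext4 strictatime,errors=remount-ro 0 1

It can also be tested without a reboot via mount -o remount,strictatime /.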
Source: (StackOverflow)
I'm currently patching ext4 for academic purposes (only linux/fs/ext4/*, like file.c, ioctl.c, ext4.h). I'm working in a QEMU virtual machine, and to speed up the whole process I've configured ext4 to compile as a kernel module. The problem occurs when it comes to testing new changes: even though I run make modules ARCH=x86 && make modules_install ARCH=x86 and reboot the machine (/ is ext4), the changes are not visible unless I recompile the whole kernel. It's a little bit weird, as I have various signs that ext4 has been compiled as a module:
It is configured as a module:
$ grep EXT4 .config
CONFIG_EXT4_FS=m
It does compile as a module:
$ make modules ARCH=x86
(...)
CC [M] fs/ext4/ioctl.o
LD [M] fs/ext4/ext4.o
Building modules, stage 2.
MODPOST 3 modules
LD [M] fs/ext4/ext4.ko
After $ make modules_install ARCH=x86 the files in /lib/modules/3.13.3/kernel/fs/ have the proper time stamp.
Finally:
$ lsmod
Module Size Used by
ext4 340817 1
(...)
For some reason I have to do $ make all ARCH=x86 in order to see my changes appear at runtime. What have I missed? Thanks!
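One thing worth ruling out, assuming a Debian-style initramfs boot (an assumption; the question doesn't say how the VM boots): since / itself is ext4, the ext4.ko loaded at boot is the copy packed into the initramfs, not the one under /lib/modules, so make modules_install alone leaves the boot-time module stale until the image is regenerated:

$ make modules ARCH=x86 && make modules_install ARCH=x86
$ update-initramfs -u -k 3.13.3    # Debian/Ubuntu; dracut or mkinitcpio elsewhere

That could explain why the changes only seem to appear after a full rebuild.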
Source: (StackOverflow)
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

void dump_log(int size) {
    char cmd[1024];
    /* Write `size` KB of zeroes to from.bin via dd. */
    snprintf(cmd, sizeof(cmd)/sizeof(cmd[0]),
             "dd if=/dev/zero of=from.bin bs=1024 count=%d", size);
    int ret = system(cmd);
    if (ret < 0) {
        perror("system");
    }
}

int main() {
    const char *filepath = "from.bin";
    while (1) {
        dump_log(1024 * 100);  /* ~100 MB */
        sleep(10);
        unlink(filepath);
    }
    return 0;
}
strace -T ./a.out shows this:
unlink("from.bin")                      = 0 <0.019916>
Unlinking a 100 MB file took 19 ms. What happens when a file is unlinked? Why is it so slow?
System information:
Linux 3.13.0-57-generic, Ubuntu 14.04.2 LTS, ext4
Source: (StackOverflow)
Assume we have a file of FILE_SIZE bytes, and:
- FILE_SIZE <= min(page_size, physical_block_size);
- the file size never changes (i.e. truncate() or appending write() are never performed);
- the file is modified only by completely overwriting its contents with a single pwrite(fd, buf, FILE_SIZE, 0).
Is it guaranteed on ext4 that:
1) Such writes are atomic with respect to concurrent reads?
2) Such writes are transactional with respect to a system crash (i.e., after a crash the file's contents are entirely from some previous write, and we'll never see a partial write or an empty file)?
Is the second true:
- with data=ordered?
- with data=journal, or alternatively with journaling enabled for the single file (using ioctl(fd, EXT4_IOC_SETFLAGS, EXT4_JOURNAL_DATA_FL))?
- when physical_block_size < FILE_SIZE <= page_size?
I've found a related question which links to a discussion from 2011. However:
- I didn't find an explicit answer to my question 2).
- I wonder, if the above is true, whether it is documented somewhere.
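For completeness, the per-file journaling variant mentioned above can be set through the generic inode-flags ioctl; FS_JOURNAL_DATA_FL in <linux/fs.h> is the same bit the question calls EXT4_JOURNAL_DATA_FL. A sketch only; whether this buys the transactional guarantee in question 2) is exactly what is being asked:

#include <linux/fs.h>
#include <sys/ioctl.h>

/* Turn on data journaling for one open file. The kernel transfers
 * the flags as an int for these ioctls. Returns 0 on success. */
int enable_data_journaling(int fd)
{
    int flags;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) != 0)
        return -1;
    flags |= FS_JOURNAL_DATA_FL;
    return ioctl(fd, FS_IOC_SETFLAGS, &flags);
}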
Source: (StackOverflow)
For debugging purposes, I want to open a file on a specific predefined block. For instance, if I suspect a specific block is damaged, I want to write and read from it, and I'd rather do that in user mode, while the partition is mounted.
Is there a way to tell Linux, "hey! open this new file on block 4579 if it's free".
Yes, I could edit the block device directly, but that would be likely to trash the filesystem if the drive is mounted.
Generic answers are welcome, but even an answer for the ext filesystem family is good enough.
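At least the inverse operation is available from user space while mounted: the FIBMAP ioctl reports which physical block backs a given logical block of a file (it needs root). That doesn't place a new file on block 4579, but it lets you find a file that already sits on the suspect block, or check where a freshly created file landed. A sketch:

#include <linux/fs.h>
#include <sys/ioctl.h>

/* Map logical block `n` of the open file to its physical block
 * number on the underlying device; requires CAP_SYS_RAWIO. */
int physical_block(int fd, int n)
{
    int block = n;
    if (ioctl(fd, FIBMAP, &block) != 0)
        return -1;
    return block;  /* 0 means a hole */
}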
Source: (StackOverflow)
I know that my question has an answer here: QFile seek performance. But I am not completely satisfied with the answer. Even after looking at the following implementation of generic_file_llseek() for ext4, I can't seem to understand how the complexity can be measured.
/**
 * generic_file_llseek - generic llseek implementation for regular files
 * @file: file structure to seek on
 * @offset: file offset to seek to
 * @origin: type of seek
 *
 * This is a generic implemenation of ->llseek useable for all normal local
 * filesystems. It just updates the file offset to the value specified by
 * @offset and @origin under i_mutex.
 */
loff_t generic_file_llseek(struct file *file, loff_t offset, int origin)
{
        loff_t rval;

        mutex_lock(&file->f_dentry->d_inode->i_mutex);
        rval = generic_file_llseek_unlocked(file, offset, origin);
        mutex_unlock(&file->f_dentry->d_inode->i_mutex);

        return rval;
}

/**
 * generic_file_llseek_unlocked - lockless generic llseek implementation
 * @file: file structure to seek on
 * @offset: file offset to seek to
 * @origin: type of seek
 *
 * Updates the file offset to the value specified by @offset and @origin.
 * Locking must be provided by the caller.
 */
loff_t
generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin)
{
        struct inode *inode = file->f_mapping->host;

        switch (origin) {
        case SEEK_END:
                offset += inode->i_size;
                break;
        case SEEK_CUR:
                /*
                 * Here we special-case the lseek(fd, 0, SEEK_CUR)
                 * position-querying operation. Avoid rewriting the "same"
                 * f_pos value back to the file because a concurrent read(),
                 * write() or lseek() might have altered it
                 */
                if (offset == 0)
                        return file->f_pos;
                break;
        }

        if (offset < 0 || offset > inode->i_sb->s_maxbytes)
                return -EINVAL;

        /* Special lock needed here? */
        if (offset != file->f_pos) {
                file->f_pos = offset;
                file->f_version = 0;
        }

        return offset;
}
Say, for example, I have a 4 GB file and I know the offset of the middle portion of the file; how exactly does lseek() get me there without traversing the entire file? Does the OS already know where each byte of the file resides?
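To make the point concrete, here is a user-space sketch: the lseek() to the middle does no I/O at all (as the code above shows, it only stores the new offset in file->f_pos); the offset-to-block translation happens later, inside read(), through the inode's extent tree (or indirect-block map on ext2/3), which is a logarithmic lookup rather than a scan of the file's bytes:

#define _FILE_OFFSET_BITS 64   /* so off_t holds >2 GB on 32-bit builds */
#include <fcntl.h>
#include <unistd.h>

/* Read `len` bytes starting at the middle of the file. */
ssize_t read_middle(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    off_t size = lseek(fd, 0, SEEK_END);    /* file length; no I/O */
    lseek(fd, size / 2, SEEK_SET);          /* constant-time jump */
    ssize_t n = read(fd, buf, len);         /* block lookup happens here */

    close(fd);
    return n;
}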
Source: (StackOverflow)
What is the largest number of tables that can be within a single pgsql database while still retaining good performance, given that pgsql stores 1 file per table on the filesystem and searches the pg_catalog for every query to do query planning?
E.g., can pgsql deal with 1 million tables within a single database? Assume that the filesystem used is ext4 and each table contains very little data, so the overall disk storage size isn't an issue. The issue really comes from (1) the impact of having 1 million files on the filesystem and (2) the impact of having 1 million entries in pg_catalog.
From this thread (2005), http://postgresql.1045698.n5.nabble.com/GENERAL-Maximum-number-of-tables-per-database-and-slowness-td1853836.html - the following is said (but I do not know how much of it is still applicable these days):
Benjamin Arai wrote:
What is the current maximum number of tables per database? Also, does
having more tables slow down performance in any way?
For most cases, the answer is no. However, once you get near 6 figure
table counts, pg_catalog ends up being pretty massive. The problem is
that the query planner must check pg_catalog for every query to see what
indexes are available, what the statistics & value distributions are,
etc. in order to build the optimal plan. At some point, a really large
pg_catalog can begin to bog down your system.
...
William Yu <[hidden email]> writes:
Benjamin Arai wrote:
What is the current maximum number of tables per database? Also, does
having more tables slow down performance in any way?
For most cases, the answer is no. However, once you get near 6 figure
table counts, pg_catalog ends up being pretty massive.
You also have to think about the performance implications of having tens
of thousands of files in your database directory. While some newer
filesystems aren't fazed by that particularly, a lot of 'em bog down on
lookups when there are more than a few thousand entries in a directory.
Source: (StackOverflow)
This may sound noobish, especially as I'm (as you may have guessed) trying to write an operating system. At the moment I'm stuck trying to make a file system.
What I want is a file system similar to the one Linux Ubuntu uses, which is EXT4 (at least mine is). I also want to try to write it in C.
Any ideas on how I can go about this, and/or any tutorials you might have found that could help me? (I have tried searching with no luck.)
Thanks in advance!
Jamie.
Source: (StackOverflow)
I have a development tree on a Linux Ubuntu 14.04-LTS machine like this, with three identical branches:
main -+-- leonardo --- project --- htdocs -+- panel --- index.php
| |
| +- config.php
|
+-- federico --- project --- htdocs -+- panel --- index.php
| |
| +- config.php
|
+-- carlo ------ project --- htdocs -+- panel --- index.php
| |
| +- config.php
..... (you get my drift).
There are neither soft links nor hard links. The config.php file is svn-ignored and is different in each branch.
There is an Apache server with a virtualHost for each developer, so I can see my development version at http://leonardo.project.local and Federico's at http://federico.project.local .
While investigating the current weirdness, the two relevant files are these:
<?php // this is panel/index.php
echo "I am " . __FILE__ . "\n";
echo "I will include " . realpath('../config.php') . "\n";
require_once '../config.php';
<?php // this is config.php
echo "I am " . __FILE__ . "\n";
exit();
The expected output of course would be:
I am leonardo/project/htdocs/panel/index.php
I will include /var/www/main/leonardo/project/htdocs/config.php
I am leonardo/project/htdocs/config.php
But the actual output is:
I am leonardo/project/htdocs/panel/index.php
I will include /var/www/main/leonardo/project/htdocs/config.php
I am federico/project/htdocs/config.php
The additional weirdness is that
echo "I will include " . realpath('../config.php') . "\n";
require_once realpath('../config.php');
works.
TL;DR: require_once and realpath disagree about where '../config.php' actually is.
The really strange thing is that I do not see how a script running in leonardo/project/htdocs/panel/ could know about federico/project/htdocs/config.php; it would have to go four directories up and then explore very many subdirectories.
I'm almost beginning to suspect that this could be something filesystem- or even kernel-related.
The filesystem is ext4, the kernel is 3.13.0-55-generic #92-Ubuntu SMP Sun Jun 14 18:32:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux. The machine is a virtual x64 on the latest VMware Workstation.
Checks:
- PHP's include_path only includes . and /usr/local/php5/pear.
- As stated earlier, no files in the branch are symlinks, and the inode counts for all involved files indicate there are no cross links. The files are indeed different.
- All files are really there; it's not a "last ditch include".
- From the command line, in leonardo...panel, I run "cat ../config.php" and I get my config.php, as expected. It is only from PHP that the wrong file gets included.
- Restarting Apache (just in case) availed nothing. I'll next try to reboot the whole VM, but to do that I need to freeze several services and it will take me a while.
- Everything was hunky dory up to yesterday (I wasn't here then). There were no system updates, no reboots, and not even remote logins in the last three days. Uptime is now eight days.
- I'm an idiot: I can too know to the minute when this started happening, by checking the integration test logs. Have asked for them, expecting them after lunch.
Source: (StackOverflow)
I am puzzled by the following sequence of commands.
sh-4.2$ pwd
/home/willard
sh-4.2$ ls -l f
-rwxr-xr-x 1 willard users 59116 Jan 23 14:54 f
sh-4.2$ file f
f: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, BuildID[sha1]=0xea0e08ff2b5a062698d45b78177acdd6bf140d1f, stripped
sh-4.2$ ./f
sh: ./f: No such file or directory
sh-4.2$ strace ./f
execve("./f", ["./f"], [/* 32 vars */]) = -1 ENOENT (No such file or directory)
write(2, "strace: exec: No such file or di"..., 40strace: exec: No such file or directory
) = 40
exit_group(1) = ?
+++ exited with 1 +++
sh-4.2$ ls -l f
-rwxr-xr-x 1 willard users 59116 Jan 23 14:54 f
sh-4.2$ uname -a
Linux xdat10 3.6.2-1-ARCH #1 SMP PREEMPT Fri Oct 12 23:58:58 CEST 2012 x86_64 GNU/Linux
How is this possible?
Source: (StackOverflow)