virtual-memory interview questions
Top frequently asked virtual-memory interview questions
I read in textbooks that the stack grows by decreasing memory address; that is, from a higher address to a lower address. It may be a bad question, but I haven't got the concept right. Can you explain?
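A minimal sketch that makes the direction observable on common platforms (e.g. x86/x86-64); the C standard does not mandate any growth direction, so the output is indicative only:

#include <stdio.h>

/* On a downward-growing stack, a deeper call frame lives at a lower
 * address than its caller's frame. */
void callee(int *caller_local)
{
    int callee_local;
    printf("caller's local: %p\n", (void *)caller_local);
    printf("callee's local: %p\n", (void *)&callee_local);
    /* Typically the second address printed is the smaller one. */
}

int main(void)
{
    int caller_local;
    callee(&caller_local);
    return 0;
}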
Source: (StackOverflow)
I have a problem with a Java application running under Linux.
When I launch the application using the default maximum heap size (64 MB), I see with the top utility that 240 MB of virtual memory is allocated to the application. This creates issues with some other software on the computer, which is relatively resource-limited.
The reserved virtual memory will not be used anyway, as far as I understand, because once we reach the heap limit an OutOfMemoryError is thrown. I ran the same application under Windows and I see that the virtual memory size and the heap size are similar.
Is there any way I can configure the virtual memory in use for a Java process under Linux?
Edit 1: The problem is not the heap. The problem is that if I set a heap of 128 MB, for example, Linux still allocates 210 MB of virtual memory, which is never needed.
Edit 2: Using ulimit -v allows limiting the amount of virtual memory. If the size set is below 204 MB, the application won't run even though it doesn't need 204 MB, only 64 MB. So I want to understand why Java requires so much virtual memory. Can this be changed?
Edit 3: There are several other applications running on the system, which is embedded. And the system does have a virtual memory limit. (From comments; an important detail.)
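A minimal inspection sketch (assuming Linux's /proc filesystem): it dumps /proc/<pid>/maps, the same data that pmap and top summarize. Run against the Java process, it shows which mappings (thread stacks, shared libraries, mapped JARs, the reserved heap) add up to the large virtual size:

#include <stdio.h>

int main(int argc, char **argv)
{
    char path[64], line[512];
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof path, "/proc/%s/maps", argv[1]);
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);  /* one mapped region per line */
    fclose(f);
    return 0;
}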
Source: (StackOverflow)
Running a simple Java program on our production machine, I noticed that this program eats up more than 10 GB of virt. I know that virtual memory is not that relevant, but I would at least like to understand why this is needed.
public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World!");
        try {
            Thread.sleep(10000);
        } catch (InterruptedException e) {
            /* ignored */
        }
    }
}
Here's what top says when I run that little program:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18764 myuser 20 0 10.2g 20m 8128 S 1.7 0.1 0:00.05 java
Does anyone know why this is happening?
uname -a says:
Linux m4fxhpsrm1dg 2.6.32-358.18.1.el6.x86_64 #1 SMP Fri Aug 2 17:04:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
On an older 32-bit Linux machine the same program consumes only about 1 GB of virt. The old machine has 4 GB of RAM, the new one 32 GB.
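A plausible mechanism, shown as a hedged sketch rather than a claim about this particular JVM: on 64-bit Linux a process can reserve huge address ranges that consume almost no physical memory until the pages are touched, which is how VIRT can dwarf RES:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t ten_gb = 10ULL * 1024 * 1024 * 1024;  /* needs a 64-bit build */
    /* PROT_NONE + MAP_NORESERVE: address space only, no physical pages. */
    void *p = mmap(NULL, ten_gb, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("reserved 10 GB at %p; compare VIRT and RES in top now\n", p);
    sleep(60);  /* time to inspect the process */
    munmap(p, ten_gb);
    return 0;
}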
Source: (StackOverflow)
I want to create a program that will simulate an out-of-memory (OOM) situation on a Unix server. I created this super-simple memory eater:
#include <stdio.h>
#include <stdlib.h>

unsigned long long memory_to_eat = 1024 * 50000;
size_t eaten_memory = 0;
void *memory = NULL;

int eat_kilobyte()
{
    /* Assign to a temporary so the old block isn't leaked if realloc fails. */
    void *new_memory = realloc(memory, (eaten_memory * 1024) + 1024);
    if (new_memory == NULL)
    {
        // realloc failed here - we probably can't allocate more memory for whatever reason
        return 1;
    }
    memory = new_memory;
    eaten_memory++;
    return 0;
}

int main(int argc, char **argv)
{
    printf("I will try to eat %llu kB of RAM\n", memory_to_eat);
    int megabyte = 0;
    while (memory_to_eat > 0)
    {
        memory_to_eat--;
        if (eat_kilobyte())
        {
            printf("Failed to allocate more memory! Stuck at %zu kB :(\n", eaten_memory);
            return 200;
        }
        if (++megabyte >= 1024)
        {
            printf("Eaten 1 MB of RAM\n");
            megabyte = 0;
        }
    }
    printf("Successfully eaten requested memory!\n");
    free(memory);
    return 0;
}
It eats as much memory as defined in memory_to_eat, which is now about 50 GB. It allocates memory 1 kB at a time and prints exactly the point where it fails to allocate more, so that I know the maximum it managed to eat.
The problem is that it works. Even on a system with 1 GB of physical memory.
When I check top, I see that the process eats 50 GB of virtual memory but less than 1 MB of resident memory. Is there a way to create a memory eater that really does consume it?
System specifications: Linux kernel 3.16 (Debian), most likely with overcommit enabled (not sure how to check), no swap, and virtualized.
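The standard fix, as a minimal sketch: touch every kilobyte after allocating it, so the kernel has to back the range with real pages instead of bare address space. This is a drop-in variant of eat_kilobyte() above (it uses the same globals); note that with overcommit enabled, the likely outcome is the OOM killer terminating a process rather than realloc returning NULL:

#include <string.h>

int eat_kilobyte_for_real(void)
{
    void *new_memory = realloc(memory, (eaten_memory * 1024) + 1024);
    if (new_memory == NULL)
        return 1;
    memory = new_memory;
    /* Writing to the fresh kilobyte forces the kernel to commit the page. */
    memset((char *)memory + eaten_memory * 1024, 0xFF, 1024);
    eaten_memory++;
    return 0;
}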
Source: (StackOverflow)
So Belady's Anomaly states that, under a FIFO page replacement policy, adding more page frames can result in more page faults.
My intuition says that we should have fewer, or at most the same number of, page faults as we add more page space.
If we think of a FIFO queue as a pipe, adding more page space is like making the pipe bigger:
____
O____O size 4
________
O________O size 8
So, why would you get more page faults? My intuition says that with a longer pipe, you'd take a bit longer to start having page faults (so, with an infinite pipe you'd have no page faults) and then you'd have just as many page faults and just as often as with a smaller pipe.
What is wrong with my reasoning?
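The classic counterexample, as a small FIFO simulation sketch: the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 yields 9 faults with 3 frames but 10 faults with 4, because the larger queue evicts exactly the pages that are about to be reused:

#include <stdio.h>

static int count_faults(const int *refs, int n, int frames)
{
    int slot[16], used = 0, next = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (slot[j] == refs[i]) { hit = 1; break; }
        if (hit)
            continue;
        faults++;
        if (used < frames) {
            slot[used++] = refs[i];      /* fill a free frame */
        } else {
            slot[next] = refs[i];        /* evict the oldest resident page */
            next = (next + 1) % frames;
        }
    }
    return faults;
}

int main(void)
{
    int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    int n = sizeof refs / sizeof refs[0];
    printf("3 frames: %d faults\n", count_faults(refs, n, 3));  /* 9 */
    printf("4 frames: %d faults\n", count_faults(refs, n, 4));  /* 10 */
    return 0;
}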
Source: (StackOverflow)
What are the reasons a malloc() would fail, especially on 64-bit?
My specific problem is trying to malloc a huge 10 GB chunk of RAM on a 64-bit system.
The machine has 12 GB of RAM and 32 GB of swap.
Yes, the malloc is extreme, but why would it be a problem? This is on Windows XP64 with both the Intel and Microsoft compilers. The malloc sometimes succeeds and sometimes doesn't, about 50% of the time. 8 GB mallocs always work; 20 GB mallocs always fail. If a malloc fails, repeated requests won't work unless I quit the process and start a fresh one (which then has the 50% shot at success). No other big apps are running. It happens even immediately after a fresh reboot.
I could imagine a malloc failing in 32 bit if you have used up the 32 (or 31) bits of address space available, such that there's no address range large enough to assign to your request.
I could also imagine malloc failing if you have used up your physical RAM and your hard drive swap space. This isn't the case for me.
But why else could a malloc fail? I can't think of other reasons.
I'm more interested in the general malloc question than my specific example, which I'll likely replace with memory mapped files anyway. The failed malloc() is just more of a puzzle than anything else... that desire to understand your tools and not be surprised by the fundamentals.
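A small probing sketch (not a diagnosis of the XP64 case above): bisect for the largest single malloc the runtime will currently grant. Keep in mind that what a successful malloc means differs by platform; on overcommitting systems the pointer may come back without any physical backing:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t lo = 0, hi = (size_t)64 << 30;      /* search up to 64 GB */
    while (hi - lo > ((size_t)1 << 20)) {      /* stop at 1 MB resolution */
        size_t mid = lo + (hi - lo) / 2;
        void *p = malloc(mid);
        if (p != NULL) {
            free(p);
            lo = mid;                          /* mid bytes were grantable */
        } else {
            hi = mid;
        }
    }
    printf("largest single malloc right now: about %zu MB\n", lo >> 20);
    return 0;
}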
Source: (StackOverflow)
Once again, I find myself with a set of broken assumptions. The article itself is about a 10x performance gain by modifying a proven-optimal algorithm to account for virtual memory:
On a modern multi-issue CPU, running at some gigahertz clock frequency, the worst-case loss is almost 10 million instructions per VM page fault. If you are running with a rotating disk, the number is more like 100 million instructions.

What good is an O(log2(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n^2) algorithm, which avoids page faults, will run circles around it.
Are there more such algorithms around? Should we re-examine all those fundamental building blocks of our education? What else do I need to watch out for when writing my own?
Clarification:
The algorithm in question isn't faster than the proven-optimal one because the Big-O notation is flawed or meaningless. It's faster because the proven-optimal algorithm relies on an assumption that is not true in modern hardware/OSes, namely that all memory access is equal and interchangeable.
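A rough sketch of the effect the quoted passage describes: the same number of reads over the same array, first sequentially and then in a cache- and page-hostile order. Absolute times are machine-dependent; the point is the gap between the loops:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)  /* 64M ints = 256 MB, power of two */

int main(void)
{
    int *a = malloc((size_t)N * sizeof *a);
    if (a == NULL)
        return 1;
    for (long long i = 0; i < N; i++)
        a[i] = (int)i;

    long long sum = 0;
    clock_t t0 = clock();
    for (long long i = 0; i < N; i++)
        sum += a[i];                       /* sequential: locality-friendly */
    clock_t t1 = clock();
    for (long long i = 0; i < N; i++)
        sum += a[(i * 4099) & (N - 1)];    /* scattered: locality-hostile */
    clock_t t2 = clock();

    printf("sequential: %.2f s, scattered: %.2f s (sum=%lld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    free(a);
    return 0;
}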
Source: (StackOverflow)
I have written a converter that takes OpenStreetMap XML files and converts them to a binary runtime rendering format that is typically about 10% of the original size. Input file sizes are typically 3 GB and larger. The input files are not loaded into memory all at once, but streamed as points and polygons are collected; then a BSP is run on them and the file is output. Recently, on larger files, it runs out of memory and dies (the one in question has 14 million points and 1 million polygons). Typically my program is using about 1 GB to 1.2 GB of RAM when this happens. I've tried increasing virtual memory from 2 to 8 GB (on XP) but this change had no effect. Also, since this code is open source, I would like it to work regardless of the available RAM (albeit slower); it runs on Windows, Linux and Mac.
What techniques can I use to avoid having it run out of memory? Processing the data in smaller sub-sets and then merging the final results? Using my own virtual memory type of handler? Any other ideas?
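One way to sketch the "process in chunks, spill to disk" suggestion (process_batch and the merge step are placeholders, not the converter's real API): stream records into fixed-size batches, write partial results to a temporary file, and combine them at the end, so peak RAM stays bounded by the batch size:

#include <stdio.h>
#include <stdlib.h>

#define BATCH 100000  /* records held in RAM at once */

typedef struct { double x, y; } Point;

/* Placeholder for the real work on one in-memory batch,
 * e.g. building a partial BSP and writing its nodes out. */
static void process_batch(const Point *pts, size_t n, FILE *spill)
{
    fwrite(pts, sizeof *pts, n, spill);
}

int main(void)
{
    FILE *spill = tmpfile();
    Point *batch = malloc(BATCH * sizeof *batch);
    if (spill == NULL || batch == NULL)
        return 1;

    size_t n = 0;
    Point p;
    while (scanf("%lf %lf", &p.x, &p.y) == 2) {   /* stream the input */
        batch[n++] = p;
        if (n == BATCH) {
            process_batch(batch, n, spill);
            n = 0;                                /* reuse the buffer */
        }
    }
    if (n > 0)
        process_batch(batch, n, spill);
    /* A final pass would merge the spilled partial results here. */
    free(batch);
    fclose(spill);
    return 0;
}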
Source: (StackOverflow)
Currently, I am trying to understand the value of splice/vmsplice. Regarding the use case of IPC, I stumbled upon the following answer on stackoverflow: http://stackoverflow.com/a/1350550/1305501
Question: How to transfer memory pages from one process to another process using vmsplice without copying data (i.e. zero-copy)?
The answer mentioned above claims that it is possible. However, it doesn't contain any source code. If I understand the documentation of vmsplice correctly, the following function will transfer the memory pages into a pipe (kernel buffer) without copying, provided the memory is properly allocated and aligned. Error handling is omitted for ease of presentation.
// data is aligned to page boundaries,
// and length is a multiple of the page size
#define _GNU_SOURCE
#include <fcntl.h>    /* vmsplice, SPLICE_F_GIFT */
#include <sys/uio.h>  /* struct iovec */

void transfer_to_pipe(int pipe_out, char* data, size_t length)
{
    size_t offset = 0;
    while (offset < length) {
        struct iovec iov = { data + offset, length - offset };
        offset += vmsplice(pipe_out, &iov, 1, SPLICE_F_GIFT);
    }
}
But how can the memory pages be accessed from user space without copying? Apparently the following methods don't work:
- vmsplice: this function can also be used in the reverse direction, but according to the comments in the kernel sources, the data will be copied.
- read: I can imagine that this function does some magic if the memory is properly aligned, but I doubt it.
- mmap: not possible on a pipe. But is there some kind of virtual file that can be used instead, i.e. splice the memory pages to the virtual file and mmap it?
- ... ?
Isn't it possible at all with vmsplice?
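A hedged usage sketch for transfer_to_pipe above, assuming a page-aligned buffer from posix_memalign; once pages are gifted with SPLICE_F_GIFT the donor should not touch them again:

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

void transfer_to_pipe(int pipe_out, char *data, size_t length);  /* above */

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t length = 8 * (size_t)page;   /* a multiple of the page size */
    void *data;
    int fds[2];

    if (posix_memalign(&data, (size_t)page, length) != 0 || pipe(fds) != 0)
        return 1;
    memset(data, 'x', length);          /* fill the pages to be gifted */
    transfer_to_pipe(fds[1], data, length);
    /* A consumer would splice() onward from fds[0]; pulling the data
     * back out with read() copies it, which is exactly the problem
     * raised above. */
    return 0;
}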
Source: (StackOverflow)
I am a little confused about the terms physical/logical/virtual addresses in an operating system (I use Linux, openSUSE).
Here is what I understand:
Physical address: when the processor is in system mode, the address used by the processor is a physical address.
Logical address: when the processor is in user mode, the address used is a logical address. These are in any case mapped to some physical address, e.g. by adding a base register to the offset value, which in a way provides a sort of memory protection.
I have come across discussions that virtual and logical addresses/address spaces are the same. Is that true?
Any help is deeply appreciated.
Source: (StackOverflow)
I would like to know the exact difference between Commit Size (visible in Task Manager) and Virtual Size (visible in Sysinternals Process Explorer).
The Virtual Size parameter in Process Explorer looks like a more accurate indicator of total virtual memory usage by a process. However, the Commit Size is always smaller than the Virtual Size, and I guess it does not include all virtual memory in use by the process. I would like somebody to explain what exactly is included in these parameters.
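A minimal Windows sketch of the distinction: MEM_RESERVE grows Virtual Size without charging the commit limit, while MEM_COMMIT counts against it. Reserved-but-uncommitted regions are roughly the gap between the two counters:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Reserve 1 GB: Virtual Size grows, Commit Size barely moves. */
    void *reserved = VirtualAlloc(NULL, (SIZE_T)1 << 30,
                                  MEM_RESERVE, PAGE_NOACCESS);

    /* Commit 64 MB: counted in the process's commit charge. */
    void *committed = VirtualAlloc(NULL, (SIZE_T)64 << 20,
                                   MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

    printf("reserved=%p committed=%p; compare the two counters now\n",
           reserved, committed);
    getchar();  /* pause so the process can be inspected */
    VirtualFree(reserved, 0, MEM_RELEASE);
    VirtualFree(committed, 0, MEM_RELEASE);
    return 0;
}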
Source: (StackOverflow)
Prologue
I am an operating-system hobbyist; my kernel runs on the 80486 and later, and already supports virtual memory.
Starting from the 80386, the x86 processor family by Intel and various clones thereof has supported virtual memory with paging. It is well known that when the PG bit in CR0 is set, the processor uses virtual address translation. The CR3 register then points to the top-level page directory, which is the root of 2-4 levels of page table structures that map virtual addresses to physical addresses.

The processor does not consult these tables for each virtual address generated; instead, it caches translations in a structure called the Translation Lookaside Buffer, or TLB. However, when changes to the page tables are made, the TLB needs to be flushed. On 80386 processors, this flush was done by reloading (MOV) CR3 with the top-level page directory address, or by a task switch. This supposedly unconditionally flushes all the TLB entries. As I understand it, it would be perfectly valid for a virtual memory system to always reload CR3 after any change.

This is wasteful, since the TLB would then throw out completely good entries, so on 80486 processors the INVLPG instruction was introduced. INVLPG invalidates the TLB entry matching the source operand address.

Yet starting with the Pentium Pro, we also have global pages that are not flushed by moves to CR3 or a task switch; and the AMD x86-64 ISA says that some upper-level page table structures might be cached and not invalidated by INVLPG. To get a coherent picture of what is and is not needed on each ISA, one would really need to download a 1000-page datasheet for each of a multitude of ISAs released since the 80s just to read a couple of pages therein, and even then the documents seem particularly vague about TLB invalidation and about what happens if the TLB is not properly invalidated.
Question
For simplicity, one can assume that we are talking about a uniprocessor system. Also, it can be assumed that no task switch is required after changing the page structures (thus INVLPG is always supposedly at least as good a choice as reloading the CR3 register).

The base assumption is that one would need to reload CR3 after each change to page tables and page directories, and such a system would be correct. However, if one wants to avoid flushing the TLB needlessly, one needs answers to two questions:

1) Provided that INVLPG is supported on the ISA, after what kinds of changes can one safely use it instead of reloading CR3? E.g. "If one unmaps a single page frame (sets the corresponding table entry to not present), one can always use INVLPG"?

2) What kinds of changes can one make to the tables and directories without touching either CR3 or executing INVLPG? E.g. "If a page is not mapped at all (not present), one can write a PTE with Present=1 for it without flushing the TLB at all"?

Even after reading quite a load of ISA documents and everything related to INVLPG here on Stack Overflow, I am not personally sure of either of the examples I presented. Indeed, one notable post stated it right away: "I don't know exactly when you should use it and when you shouldn't." Thus any certain, correct examples, preferably documented, for either IA32 or x86-64, are appreciated.
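For reference, a hedged sketch of the two invalidation primitives as they typically appear in hobby kernels (GCC inline assembly, ring 0, uniprocessor); it shows only the mechanics, and deciding when each one suffices is exactly the question above:

#include <stdint.h>

/* Invalidate the single TLB entry covering one virtual address (80486+). */
static inline void tlb_invlpg(void *vaddr)
{
    __asm__ volatile ("invlpg (%0)" : : "r"(vaddr) : "memory");
}

/* Full flush of non-global TLB entries: reload CR3 with its own value. */
static inline void tlb_flush_all(void)
{
    uintptr_t cr3;
    __asm__ volatile ("mov %%cr3, %0" : "=r"(cr3));
    __asm__ volatile ("mov %0, %%cr3" : : "r"(cr3) : "memory");
}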
Source: (StackOverflow)
My application runs for a few hours. There is no increase in any value (VM size, memory) in Task Manager, but after a few hours I get out-of-memory errors.
In Sysinternals Process Explorer I see that "Virtual Size" is continuously increasing, and when it reaches around 2 GB I start getting memory errors.
So what kind of memory leak is that?
How can I demonstrate it with code? Is it possible to reproduce the same thing with a piece of code where none of the other memory values increase but only the Virtual Size in Sysinternals Process Explorer increases?
Thanks for any suggestions.
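A minimal sketch that reproduces the symptom on Windows: repeatedly reserving address space (never committing or releasing it) makes Virtual Size climb while the committed/working-set counters stay flat, until reservations start failing near the 2 GB mark in a 32-bit process:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    size_t total = 0;
    for (;;) {
        /* Reserve 1 MB of address space and deliberately "leak" it. */
        void *p = VirtualAlloc(NULL, 1 << 20, MEM_RESERVE, PAGE_NOACCESS);
        if (p == NULL) {
            printf("reservation failed after %zu MB of virtual size\n",
                   total >> 20);
            break;
        }
        total += 1 << 20;
        Sleep(10);  /* slow enough to watch in Process Explorer */
    }
    getchar();  /* keep the process alive for inspection */
    return 0;
}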
Source: (StackOverflow)
We ship a Java application whose memory demand can vary quite a lot depending on the size of the data it is processing. If you don't set the maximum VM (virtual memory) size, the JVM quite often quits with a GC failure on big data.
What we'd like to see is the JVM requesting more memory, as GC fails to provide enough, until the total available VM is exhausted, e.g., start with 128 MB and increase geometrically (or in some other step) whenever GC fails.
The java command line allows explicit setting of the maximum VM size (the various -Xm* options), and you'd think that would be designed to be adequate. We try to do this in a .cmd file that we ship with the application. But if you pick any specific number, you get one of two bad behaviors: 1) if your number is small enough to work on most target systems (e.g., 1 GB), it isn't big enough for big data, or 2) if you make it very large, the JVM refuses to run on those systems whose actual VM is smaller than specified.
How does one set up Java to use the available VM when needed, without knowing that number in advance and without grabbing it all at startup?
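One commonly suggested compromise, as a sketch (app.jar is a placeholder): start with a small initial heap via -Xms and set a generous cap via -Xmx, since the JVM grows the heap toward the cap only as GC pressure demands rather than grabbing it all at startup:

java -Xms128m -Xmx2g -jar app.jar

Whether a particular JVM refuses to start when -Xmx exceeds what the system can provide varies by version and platform, which is the remaining problem described above.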
Source: (StackOverflow)
I would like to know which architectures violate the assumptions I've listed below. Also, I would like to know if any of the assumptions are false for all architectures (that is, if any of them are just completely wrong).
1) sizeof(int *) == sizeof(char *) == sizeof(void *) == sizeof(func_ptr *)
2) The in-memory representation of all pointers for a given architecture is the same regardless of the data type pointed to.
3) The in-memory representation of a pointer is the same as an integer of the same bit length as the architecture.
4) Multiplication and division of pointer data types are only forbidden by the compiler. NOTE: Yes, I know this is nonsensical. What I mean is: is there hardware support to forbid this incorrect usage?
5) All pointer values can be cast to a single integer. In other words, what architectures still make use of segments and offsets?
6) Incrementing a pointer is equivalent to adding sizeof(the pointed data type) to the memory address stored by the pointer. If p is an int32* then p+1 is equal to the memory address 4 bytes after p.
I'm most used to pointers being used in a contiguous virtual memory space. For that usage, I can generally get by thinking of them as addresses on a number line. See the Stack Overflow question Pointer comparison.
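A small sketch that probes a few of these assumptions (1, 3, 5, 6) on whatever machine compiles it; note that converting a function pointer to void * is only conditionally supported by the C standard, though POSIX requires it to work:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int x = 42;
    int *p = &x;

    /* Assumption 1: pointer sizes. */
    printf("sizeof(int *)  = %zu\n", sizeof(int *));
    printf("sizeof(char *) = %zu\n", sizeof(char *));
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    printf("sizeof(void (*)(void)) = %zu\n", sizeof(void (*)(void)));

    /* Assumptions 3 and 5: round-trip a data pointer through an integer. */
    uintptr_t bits = (uintptr_t)p;
    int *back = (int *)bits;
    printf("round-trip intact: %d\n", *back == 42);

    /* Assumption 6: p+1 lies sizeof(*p) bytes past p. */
    printf("p = %p, p+1 = %p (step of %zu bytes)\n",
           (void *)p, (void *)(p + 1), sizeof *p);
    return 0;
}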
Source: (StackOverflow)