virtual-memory interview questions
Top frequently asked virtual-memory interview questions
I read in textbooks that the stack grows by decreasing memory address; that is, from a higher address to a lower address. It may be a bad question, but I haven't got the concept right. Can you explain?
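A minimal sketch that makes the direction observable on common platforms (e.g. x86/x86-64); the C standard does not mandate any growth direction, so the output is indicative only:

#include <stdio.h>

/* On a downward-growing stack, a deeper call frame lives at a lower
 * address than its caller's frame. */
void callee(int *caller_local)
{
    int callee_local;
    printf("caller's local: %p\n", (void *)caller_local);
    printf("callee's local: %p\n", (void *)&callee_local);
    /* Typically the second address printed is the smaller one. */
}

int main(void)
{
    int caller_local;
    callee(&caller_local);
    return 0;
}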
Source: (StackOverflow)
I have a problem with a Java application running under Linux.
When I launch the application using the default maximum heap size (64 MB), I see with the top utility that 240 MB of virtual memory is allocated to the application. This creates issues with some other software on the computer, which is relatively resource-limited.
The reserved virtual memory will not be used anyway, as far as I understand, because once we reach the heap limit an OutOfMemoryError is thrown. I ran the same application under Windows and I see that the virtual memory size and the heap size are similar.
Is there any way I can configure the virtual memory in use for a Java process under Linux?
Edit 1: The problem is not the heap. The problem is that if I set a heap of 128 MB, for example, Linux still allocates 210 MB of virtual memory, which is never needed.
Edit 2: Using ulimit -v allows limiting the amount of virtual memory. If the size set is below 204 MB, the application won't run even though it doesn't need 204 MB, only 64 MB. So I want to understand why Java requires so much virtual memory. Can this be changed?
Edit 3: There are several other applications running on the system, which is embedded. And the system does have a virtual memory limit. (From comments; an important detail.)
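A minimal inspection sketch (assuming Linux's /proc filesystem): it dumps /proc/<pid>/maps, the same data that pmap and top summarize. Run against the Java process, it shows which mappings (thread stacks, shared libraries, mapped JARs, the reserved heap) add up to the large virtual size:

#include <stdio.h>

int main(int argc, char **argv)
{
    char path[64], line[512];
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof path, "/proc/%s/maps", argv[1]);
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);  /* one mapped region per line */
    fclose(f);
    return 0;
}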
Source: (StackOverflow)
Running a simple Java program on our production machine, I noticed that this program eats up more than 10 GB of virt. I know that virtual memory is not that relevant, but I would at least like to understand why this is needed.
public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World!");
        try {
            Thread.sleep(10000);
        } catch (InterruptedException e) {
            /* ignored */
        }
    }
}
Here's what top says when I run that little program:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18764 myuser 20 0 10.2g 20m 8128 S 1.7 0.1 0:00.05 java
Does anyone know why this is happening?
uname -a says:
Linux m4fxhpsrm1dg 2.6.32-358.18.1.el6.x86_64 #1 SMP Fri Aug 2 17:04:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
On an older 32-bit Linux machine the same program consumes only about 1 GB of virt. The old machine has 4 GB of RAM, the new one 32 GB.
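A plausible mechanism, shown as a hedged sketch rather than a claim about this particular JVM: on 64-bit Linux a process can reserve huge address ranges that consume almost no physical memory until the pages are touched, which is how VIRT can dwarf RES:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t ten_gb = 10ULL * 1024 * 1024 * 1024;  /* needs a 64-bit build */
    /* PROT_NONE + MAP_NORESERVE: address space only, no physical pages. */
    void *p = mmap(NULL, ten_gb, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("reserved 10 GB at %p; compare VIRT and RES in top now\n", p);
    sleep(60);  /* time to inspect the process */
    munmap(p, ten_gb);
    return 0;
}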
Source: (StackOverflow)
I want to create a program that will simulate an out-of-memory (OOM) situation on a Unix server. I created this super-simple memory eater:
#include <stdio.h>
#include <stdlib.h>

unsigned long long memory_to_eat = 1024 * 50000;
size_t eaten_memory = 0;
void *memory = NULL;

int eat_kilobyte()
{
    /* Assign to a temporary so the old block isn't leaked if realloc fails. */
    void *new_memory = realloc(memory, (eaten_memory * 1024) + 1024);
    if (new_memory == NULL)
    {
        // realloc failed here - we probably can't allocate more memory for whatever reason
        return 1;
    }
    memory = new_memory;
    eaten_memory++;
    return 0;
}

int main(int argc, char **argv)
{
    printf("I will try to eat %llu kB of RAM\n", memory_to_eat);
    int megabyte = 0;
    while (memory_to_eat > 0)
    {
        memory_to_eat--;
        if (eat_kilobyte())
        {
            printf("Failed to allocate more memory! Stuck at %zu kB :(\n", eaten_memory);
            return 200;
        }
        if (++megabyte >= 1024)
        {
            printf("Eaten 1 MB of RAM\n");
            megabyte = 0;
        }
    }
    printf("Successfully eaten requested memory!\n");
    free(memory);
    return 0;
}
It eats as much memory as defined in memory_to_eat, which is now about 50 GB. It allocates memory 1 kB at a time and prints exactly the point where it fails to allocate more, so that I know the maximum it managed to eat.
The problem is that it works. Even on a system with 1 GB of physical memory.
When I check top, I see that the process eats 50 GB of virtual memory but less than 1 MB of resident memory. Is there a way to create a memory eater that really does consume it?
System specifications: Linux kernel 3.16 (Debian), most likely with overcommit enabled (not sure how to check), no swap, and virtualized.
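The standard fix, as a minimal sketch: touch every kilobyte after allocating it, so the kernel has to back the range with real pages instead of bare address space. This is a drop-in variant of eat_kilobyte() above (it uses the same globals); note that with overcommit enabled, the likely outcome is the OOM killer terminating a process rather than realloc returning NULL:

#include <string.h>

int eat_kilobyte_for_real(void)
{
    void *new_memory = realloc(memory, (eaten_memory * 1024) + 1024);
    if (new_memory == NULL)
        return 1;
    memory = new_memory;
    /* Writing to the fresh kilobyte forces the kernel to commit the page. */
    memset((char *)memory + eaten_memory * 1024, 0xFF, 1024);
    eaten_memory++;
    return 0;
}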
Source: (StackOverflow)
So Belady's Anomaly states that, under a FIFO page replacement policy, adding more page frames can result in more page faults.
My intuition says that we should have fewer, or at most the same number of, page faults as we add more page space.
If we think of a FIFO queue as a pipe, adding more page space is like making the pipe bigger:
____
O____O size 4
________
O________O size 8
So, why would you get more page faults? My intuition says that with a longer pipe, you'd take a bit longer to start having page faults (so, with an infinite pipe you'd have no page faults) and then you'd have just as many page faults and just as often as with a smaller pipe.
What is wrong with my reasoning?
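The classic counterexample, as a small FIFO simulation sketch: the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 yields 9 faults with 3 frames but 10 faults with 4, because the larger queue evicts exactly the pages that are about to be reused:

#include <stdio.h>

static int count_faults(const int *refs, int n, int frames)
{
    int slot[16], used = 0, next = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (slot[j] == refs[i]) { hit = 1; break; }
        if (hit)
            continue;
        faults++;
        if (used < frames) {
            slot[used++] = refs[i];      /* fill a free frame */
        } else {
            slot[next] = refs[i];        /* evict the oldest resident page */
            next = (next + 1) % frames;
        }
    }
    return faults;
}

int main(void)
{
    int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    int n = sizeof refs / sizeof refs[0];
    printf("3 frames: %d faults\n", count_faults(refs, n, 3));  /* 9 */
    printf("4 frames: %d faults\n", count_faults(refs, n, 4));  /* 10 */
    return 0;
}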
Source: (StackOverflow)
What are the reasons a malloc() would fail, especially on 64-bit?
My specific problem is trying to malloc a huge 10 GB chunk of RAM on a 64-bit system.
The machine has 12 GB of RAM and 32 GB of swap.
Yes, the malloc is extreme, but why would it be a problem? This is on Windows XP64 with both the Intel and Microsoft compilers. The malloc sometimes succeeds and sometimes doesn't, about 50% of the time. 8 GB mallocs always work; 20 GB mallocs always fail. If a malloc fails, repeated requests won't work unless I quit the process and start a fresh one (which then has the 50% shot at success). No other big apps are running. It happens even immediately after a fresh reboot.
I could imagine a malloc failing in 32 bit if you have used up the 32 (or 31) bits of address space available, such that there's no address range large enough to assign to your request.
I could also imagine malloc failing if you have used up your physical RAM and your hard drive swap space. This isn't the case for me.
But why else could a malloc fail? I can't think of other reasons.
I'm more interested in the general malloc question than my specific example, which I'll likely replace with memory mapped files anyway. The failed malloc() is just more of a puzzle than anything else... that desire to understand your tools and not be surprised by the fundamentals.
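A small probing sketch (not a diagnosis of the XP64 case above): bisect for the largest single malloc the runtime will currently grant. Keep in mind that what a successful malloc means differs by platform; on overcommitting systems the pointer may come back without any physical backing:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t lo = 0, hi = (size_t)64 << 30;      /* search up to 64 GB */
    while (hi - lo > ((size_t)1 << 20)) {      /* stop at 1 MB resolution */
        size_t mid = lo + (hi - lo) / 2;
        void *p = malloc(mid);
        if (p != NULL) {
            free(p);
            lo = mid;                          /* mid bytes were grantable */
        } else {
            hi = mid;
        }
    }
    printf("largest single malloc right now: about %zu MB\n", lo >> 20);
    return 0;
}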
Source: (StackOverflow)
Once again, I find myself with a set of broken assumptions. The article itself is about a 10x performance gain by modifying a proven-optimal algorithm to account for virtual memory:
On a modern multi-issue CPU, running at some gigahertz clock frequency, the worst-case loss is almost 10 million instructions per VM page fault. If you are running with a rotating disk, the number is more like 100 million instructions.

What good is an O(log2(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n^2) algorithm, which avoids page faults, will run circles around it.
Are there more such algorithms around? Should we re-examine all those fundamental building blocks of our education? What else do I need to watch out for when writing my own?
Clarification:
The algorithm in question isn't faster than the proven-optimal one because the Big-O notation is flawed or meaningless. It's faster because the proven-optimal algorithm relies on an assumption that is not true in modern hardware/OSes, namely that all memory access is equal and interchangeable.
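A rough sketch of the effect the quoted passage describes: the same number of reads over the same array, first sequentially and then in a cache- and page-hostile order. Absolute times are machine-dependent; the point is the gap between the loops:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)  /* 64M ints = 256 MB, power of two */

int main(void)
{
    int *a = malloc((size_t)N * sizeof *a);
    if (a == NULL)
        return 1;
    for (long long i = 0; i < N; i++)
        a[i] = (int)i;

    long long sum = 0;
    clock_t t0 = clock();
    for (long long i = 0; i < N; i++)
        sum += a[i];                       /* sequential: locality-friendly */
    clock_t t1 = clock();
    for (long long i = 0; i < N; i++)
        sum += a[(i * 4099) & (N - 1)];    /* scattered: locality-hostile */
    clock_t t2 = clock();

    printf("sequential: %.2f s, scattered: %.2f s (sum=%lld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    free(a);
    return 0;
}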
Source: (StackOverflow)
I have written a converter that takes OpenStreetMap XML files and converts them to a binary runtime rendering format that is typically about 10% of the original size. Input file sizes are typically 3 GB and larger. The input files are not loaded into memory all at once, but streamed as points and polygons are collected; then a BSP is run on them and the file is output. Recently, on larger files, it runs out of memory and dies (the one in question has 14 million points and 1 million polygons). Typically my program is using about 1 GB to 1.2 GB of RAM when this happens. I've tried increasing virtual memory from 2 to 8 GB (on XP) but this change had no effect. Also, since this code is open source, I would like it to work regardless of the available RAM (albeit slower); it runs on Windows, Linux and Mac.
What techniques can I use to avoid having it run out of memory? Processing the data in smaller sub-sets and then merging the final results? Using my own virtual memory type of handler? Any other ideas?
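One way to sketch the "process in chunks, spill to disk" suggestion (process_batch and the merge step are placeholders, not the converter's real API): stream records into fixed-size batches, write partial results to a temporary file, and combine them at the end, so peak RAM stays bounded by the batch size:

#include <stdio.h>
#include <stdlib.h>

#define BATCH 100000  /* records held in RAM at once */

typedef struct { double x, y; } Point;

/* Placeholder for the real work on one in-memory batch,
 * e.g. building a partial BSP and writing its nodes out. */
static void process_batch(const Point *pts, size_t n, FILE *spill)
{
    fwrite(pts, sizeof *pts, n, spill);
}

int main(void)
{
    FILE *spill = tmpfile();
    Point *batch = malloc(BATCH * sizeof *batch);
    if (spill == NULL || batch == NULL)
        return 1;

    size_t n = 0;
    Point p;
    while (scanf("%lf %lf", &p.x, &p.y) == 2) {   /* stream the input */
        batch[n++] = p;
        if (n == BATCH) {
            process_batch(batch, n, spill);
            n = 0;                                /* reuse the buffer */
        }
    }
    if (n > 0)
        process_batch(batch, n, spill);
    /* A final pass would merge the spilled partial results here. */
    free(batch);
    fclose(spill);
    return 0;
}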
Source: (StackOverflow)
Currently, I am trying to understand the value of splice/vmsplice. Regarding the use case of IPC, I stumbled upon the following answer on stackoverflow: http://stackoverflow.com/a/1350550/1305501
Question: How to transfer memory pages from one process to another process using vmsplice without copying data (i.e. zero-copy)?
The answer mentioned above claims that it is possible. However, it doesn't contain any source code. If I understand the documentation of vmsplice correctly, the following function will transfer the memory pages into a pipe (kernel buffer) without copying, provided the memory is properly allocated and aligned. Error handling is omitted for ease of presentation.
// data is aligned to page boundaries,
// and length is a multiple of the page size
#define _GNU_SOURCE
#include <fcntl.h>    /* vmsplice, SPLICE_F_GIFT */
#include <sys/uio.h>  /* struct iovec */

void transfer_to_pipe(int pipe_out, char* data, size_t length)
{
    size_t offset = 0;
    while (offset < length) {
        struct iovec iov = { data + offset, length - offset };
        offset += vmsplice(pipe_out, &iov, 1, SPLICE_F_GIFT);
    }
}
But how can the memory pages be accessed from user space without copying? Apparently the following methods don't work:
- vmsplice: this function can also be used in the reverse direction, but according to the comments in the kernel sources, the data will be copied.
- read: I can imagine that this function does some magic if the memory is properly aligned, but I doubt it.
- mmap: not possible on a pipe. But is there some kind of virtual file that can be used instead, i.e. splice the memory pages to the virtual file and mmap it?
- ... ?
Isn't it possible at all with vmsplice?
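A hedged usage sketch for transfer_to_pipe above, assuming a page-aligned buffer from posix_memalign; once pages are gifted with SPLICE_F_GIFT the donor should not touch them again:

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

void transfer_to_pipe(int pipe_out, char *data, size_t length);  /* above */

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t length = 8 * (size_t)page;   /* a multiple of the page size */
    void *data;
    int fds[2];

    if (posix_memalign(&data, (size_t)page, length) != 0 || pipe(fds) != 0)
        return 1;
    memset(data, 'x', length);          /* fill the pages to be gifted */
    transfer_to_pipe(fds[1], data, length);
    /* A consumer would splice() onward from fds[0]; pulling the data
     * back out with read() copies it, which is exactly the problem
     * raised above. */
    return 0;
}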
Source: (StackOverflow)
I am a little confused about the terms physical/logical/virtual addresses in an operating system (I use Linux, openSUSE).
Here is what I understand:
Physical address: when the processor is in system mode, the address used by the processor is a physical address.
Logical address: when the processor is in user mode, the address used is a logical address. These are in any case mapped to some physical address, e.g. by adding a base register to the offset value, which in a way provides a sort of memory protection.
I have come across discussions that virtual and logical addresses/address spaces are the same. Is that true?
Any help is deeply appreciated.
Source: (StackOverflow)
I would like to know the exact difference between Commit Size (visible in Task Manager) and Virtual Size (visible in Sysinternals Process Explorer).
The Virtual Size parameter in Process Explorer looks like a more accurate indicator of total virtual memory usage by a process. However, the Commit Size is always smaller than the Virtual Size, and I guess it does not include all virtual memory in use by the process. I would like somebody to explain what exactly is included in these parameters.
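A minimal Windows sketch of the distinction: MEM_RESERVE grows Virtual Size without charging the commit limit, while MEM_COMMIT counts against it. Reserved-but-uncommitted regions are roughly the gap between the two counters:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Reserve 1 GB: Virtual Size grows, Commit Size barely moves. */
    void *reserved = VirtualAlloc(NULL, (SIZE_T)1 << 30,
                                  MEM_RESERVE, PAGE_NOACCESS);

    /* Commit 64 MB: counted in the process's commit charge. */
    void *committed = VirtualAlloc(NULL, (SIZE_T)64 << 20,
                                   MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

    printf("reserved=%p committed=%p; compare the two counters now\n",
           reserved, committed);
    getchar();  /* pause so the process can be inspected */
    VirtualFree(reserved, 0, MEM_RELEASE);
    VirtualFree(committed, 0, MEM_RELEASE);
    return 0;
}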
Source: (StackOverflow)
Prologue
I am an operating-system hobbyist; my kernel runs on the 80486 and later, and already supports virtual memory.
Starting from the 80386, the x86 processor family by Intel and various clones thereof has supported virtual memory with paging. It is well known that when the PG bit in CR0 is set, the processor uses virtual address translation. The CR3 register then points to the top-level page directory, which is the root of 2-4 levels of page table structures that map virtual addresses to physical addresses.

The processor does not consult these tables for each virtual address generated; instead, it caches translations in a structure called the Translation Lookaside Buffer, or TLB. However, when changes to the page tables are made, the TLB needs to be flushed. On 80386 processors, this flush was done by reloading (MOV) CR3 with the top-level page directory address, or by a task switch. This supposedly unconditionally flushes all the TLB entries. As I understand it, it would be perfectly valid for a virtual memory system to always reload CR3 after any change.

This is wasteful, since the TLB would then throw out completely good entries, so on 80486 processors the INVLPG instruction was introduced. INVLPG invalidates the TLB entry matching the source operand address.

Yet starting with the Pentium Pro, we also have global pages that are not flushed by moves to CR3 or a task switch; and the AMD x86-64 ISA says that some upper-level page table structures might be cached and not invalidated by INVLPG. To get a coherent picture of what is and is not needed on each ISA, one would really need to download a 1000-page datasheet for each of a multitude of ISAs released since the 80s just to read a couple of pages therein, and even then the documents seem particularly vague about TLB invalidation and about what happens if the TLB is not properly invalidated.
Question
For simplicity, one can assume that we are talking about a uniprocessor system. Also, it can be assumed that no task switch is required after changing the page structures (thus INVLPG is always supposedly at least as good a choice as reloading the CR3 register).

The base assumption is that one would need to reload CR3 after each change to page tables and page directories, and such a system would be correct. However, if one wants to avoid flushing the TLB needlessly, one needs answers to two questions:

1) Provided that INVLPG is supported on the ISA, after what kinds of changes can one safely use it instead of reloading CR3? E.g. "If one unmaps a single page frame (sets the corresponding table entry to not present), one can always use INVLPG"?

2) What kinds of changes can one make to the tables and directories without touching either CR3 or executing INVLPG? E.g. "If a page is not mapped at all (not present), one can write a PTE with Present=1 for it without flushing the TLB at all"?

Even after reading quite a load of ISA documents and everything related to INVLPG here on Stack Overflow, I am not personally sure of either of the examples I presented. Indeed, one notable post stated it right away: "I don't know exactly when you should use it and when you shouldn't." Thus any certain, correct examples, preferably documented, for either IA32 or x86-64, are appreciated.
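For reference, a hedged sketch of the two invalidation primitives as they typically appear in hobby kernels (GCC inline assembly, ring 0, uniprocessor); it shows only the mechanics, and deciding when each one suffices is exactly the question above:

#include <stdint.h>

/* Invalidate the single TLB entry covering one virtual address (80486+). */
static inline void tlb_invlpg(void *vaddr)
{
    __asm__ volatile ("invlpg (%0)" : : "r"(vaddr) : "memory");
}

/* Full flush of non-global TLB entries: reload CR3 with its own value. */
static inline void tlb_flush_all(void)
{
    uintptr_t cr3;
    __asm__ volatile ("mov %%cr3, %0" : "=r"(cr3));
    __asm__ volatile ("mov %0, %%cr3" : : "r"(cr3) : "memory");
}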
Source: (StackOverflow)
My application runs for a few hours. There is no increase in any value (VM size, memory) in Task Manager, but after a few hours I get out-of-memory errors.
In Sysinternals Process Explorer I see that "Virtual Size" is continuously increasing, and when it reaches around 2 GB I start getting memory errors.
So what kind of memory leak is that?
How can I demonstrate it with code? Is it possible to reproduce the same thing with a piece of code where none of the other memory values increase but only the Virtual Size in Sysinternals Process Explorer increases?
Thanks for any suggestions.
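A minimal sketch that reproduces the symptom on Windows: repeatedly reserving address space (never committing or releasing it) makes Virtual Size climb while the committed/working-set counters stay flat, until reservations start failing near the 2 GB mark in a 32-bit process:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    size_t total = 0;
    for (;;) {
        /* Reserve 1 MB of address space and deliberately "leak" it. */
        void *p = VirtualAlloc(NULL, 1 << 20, MEM_RESERVE, PAGE_NOACCESS);
        if (p == NULL) {
            printf("reservation failed after %zu MB of virtual size\n",
                   total >> 20);
            break;
        }
        total += 1 << 20;
        Sleep(10);  /* slow enough to watch in Process Explorer */
    }
    getchar();  /* keep the process alive for inspection */
    return 0;
}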
Source: (StackOverflow)
We ship a Java application whose memory demand can vary quite a lot depending on the size of the data it is processing. If you don't set the maximum VM (virtual memory) size, the JVM quite often quits with a GC failure on big data.
What we'd like to see is the JVM requesting more memory, as GC fails to provide enough, until the total available VM is exhausted, e.g., start with 128 MB and increase geometrically (or in some other step) whenever GC fails.
The java command line allows explicit setting of the maximum VM size (the various -Xm* options), and you'd think that would be designed to be adequate. We try to do this in a .cmd file that we ship with the application. But if you pick any specific number, you get one of two bad behaviors: 1) if your number is small enough to work on most target systems (e.g., 1 GB), it isn't big enough for big data, or 2) if you make it very large, the JVM refuses to run on those systems whose actual VM is smaller than specified.
How does one set up Java to use the available VM when needed, without knowing that number in advance and without grabbing it all at startup?
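One commonly suggested compromise, as a sketch (app.jar is a placeholder): start with a small initial heap via -Xms and set a generous cap via -Xmx, since the JVM grows the heap toward the cap only as GC pressure demands rather than grabbing it all at startup:

java -Xms128m -Xmx2g -jar app.jar

Whether a particular JVM refuses to start when -Xmx exceeds what the system can provide varies by version and platform, which is the remaining problem described above.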
Source: (StackOverflow)
I would like to know which architectures violate the assumptions I've listed below. Also, I would like to know if any of the assumptions are false for all architectures (that is, if any of them are just completely wrong).
1) sizeof(int *) == sizeof(char *) == sizeof(void *) == sizeof(func_ptr *)
2) The in-memory representation of all pointers for a given architecture is the same regardless of the data type pointed to.
3) The in-memory representation of a pointer is the same as an integer of the same bit length as the architecture.
4) Multiplication and division of pointer data types are only forbidden by the compiler. NOTE: Yes, I know this is nonsensical. What I mean is: is there hardware support to forbid this incorrect usage?
5) All pointer values can be cast to a single integer. In other words, what architectures still make use of segments and offsets?
6) Incrementing a pointer is equivalent to adding sizeof(the pointed data type) to the memory address stored by the pointer. If p is an int32* then p+1 is equal to the memory address 4 bytes after p.
I'm most used to pointers being used in a contiguous virtual memory space. For that usage, I can generally get by thinking of them as addresses on a number line. See the Stack Overflow question Pointer comparison.
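A small sketch that probes a few of these assumptions (1, 3, 5, 6) on whatever machine compiles it; note that converting a function pointer to void * is only conditionally supported by the C standard, though POSIX requires it to work:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int x = 42;
    int *p = &x;

    /* Assumption 1: pointer sizes. */
    printf("sizeof(int *)  = %zu\n", sizeof(int *));
    printf("sizeof(char *) = %zu\n", sizeof(char *));
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    printf("sizeof(void (*)(void)) = %zu\n", sizeof(void (*)(void)));

    /* Assumptions 3 and 5: round-trip a data pointer through an integer. */
    uintptr_t bits = (uintptr_t)p;
    int *back = (int *)bits;
    printf("round-trip intact: %d\n", *back == 42);

    /* Assumption 6: p+1 lies sizeof(*p) bytes past p. */
    printf("p = %p, p+1 = %p (step of %zu bytes)\n",
           (void *)p, (void *)(p + 1), sizeof *p);
    return 0;
}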
Source: (StackOverflow)