
Memory Layout of a Program in C, Hacker News


Lecture notes: http://www.cs.utk.edu/~huangj/cs360/360/notes/Memory/lecture.html


The machine layout that I describe is that of the machines in the hydra lab. If you try to go through the programs in this lecture on other machines (e.g. kenner), you will likely get different results. However, you should be able to figure out how the machine you are on is laid out.

Also, you should set up your shell so that you don’t generate core files when doing this lecture. I.e., if it is not already done in your .cshrc file, do:

UNIX> limit coredumpsize 0

This lecture is an introduction to memory as we see it in Unix.

As I have said previously, memory is like a huge array with (say) 0xffffffff elements. A pointer in C is an index into this array. Thus when a C pointer is 0xefffe034, it points to the 0xefffe035th element in the memory array (memory being indexed starting with zero).

Unfortunately, you cannot access all elements of memory. One example that we have seen a lot is element 0. If you try to dereference a pointer with a value of 0, you will get a segmentation violation. This is Unix’s way of telling you that that memory location is illegal.

For example, the following code will generate a segmentation violation:

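int main(void)
{
  char *s;
  char c;

  s = (char *) 0;   /* a pointer with value 0 */
  c = *s;           /* reading through it causes the segmentation violation */
  return 0;
}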

As it turns out, there are 4 regions of memory that are legal. They are:

  1. The code (or “text”): These are the instructions of your program.
  2. The globals: These are your global variables (init data and bss).
  3. The heap: This is memory that you get from malloc().
  4. The stack: This contains your local variables and procedure arguments.

If we view memory as a big array, the regions (or “segments”) look as follows:

     |--------------| 0
     |              |
     |     void     |
     |              |
     |--------------| 0x10000
     |              |
     |     code     |
     |              |
     |--------------|
     |     void     |
     |--------------| 0x20000
     |              |
     |   globals    |
     |              |
     |--------------|
     |              |
     |     heap     |
     |              |
     ||||||||||||||||
     vvvvvvvvvvvvvvvv
     |              |
     |              |
     |     void     |
     |              |
     |              |
     |^^^^^^^^^^^^^^|
     ||||||||||||||||
     |              |
     |    stack     |
     |              | 0xefffffff
     |--------------|

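Note that in this picture, where addresses increase from top to bottom, the heap grows down (toward higher addresses) as you make more malloc() calls, and the stack grows up (toward lower addresses) as you make nested procedure calls.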


An operating system provides each process it loads with a memory address space that starts at 0x0 and goes up to 0xffffffff or 0x8fffffff, depending on what type of system you are on. These addresses are all virtual memory addresses. An analogy that might help is the assignment of a phone number to your house: a phone number is purely logical and can easily be changed, while your street address cannot. Here, the operating system needs to map this virtual address space onto its physical address space, i.e. locations on the chips that make up the memory banks. This mapping is part of the operating system’s memory management job, which also includes deciding how best to use a limited physical address space to meet the needs of a large number of processes. There are many approaches to memory management. Fortunately, all UNIX systems use a fairly standard one, called paging. Let’s use the hydra machines as an example.

On the hydra machines, memory is broken up into 8192-byte chunks. These are called pages. On some machines, pages are 4096 bytes; this is something set by the hardware. Page sizes are mostly on the same order of magnitude.

The way memory works is as follows: The operating system allocates certain pages of memory for you. Whenever you try to read from or write to an address in memory, the hardware first checks with the operating system to see if that address belongs to a page that has been allocated for you. If so, then it goes ahead and performs the read / write. If not, you’ll get a segmentation violation (note that there are many ways to get a segmentation violation, and this is only one of them).

This is what happens when you do:

  s = (char *) 0;
  c = *s;

When you say “c = *s”, the hardware sees that you want to read memory location zero. It checks with the operating system, which says “I haven’t allocated the page containing location zero for you”. This results in a segmentation violation. Note here that a page can be allocated, but not currently in physical memory. This relates to a concept called a “page fault”.

A page fault is generated when the OS detects that a process is trying to access a page in its virtual memory address space, but that page is not in physical memory. As a result, the OS suspends the process until the requested page is read in. In most cases, a page fault is not an error. A segmentation fault, by contrast, is almost always an error.

The exact mechanics of paging are covered in classes on Operating Systems. I won’t go into it further here.

As it turns out, the first 8 pages on our hydra machines are void. This means that trying to read from or write to any address from 0 to 0xffff will result in a segmentation violation.

The next page (starting with address 0x10000) starts the code segment. This segment ends at the variable &etext, which I’ll go over in a bit. The globals segment starts at address 0x20000. It goes until the variable &end. The heap starts immediately after &end, and goes up to sbrk(0), which I’ll talk about still later. The stack ends with address 0xefffffff. Its beginning changes with the different procedure calls you make. We’ll go over this more later in this lecture. Every page between the end of the heap and the beginning of the stack is void, and will generate a segmentation violation upon access.


(For more info on these variables, do man etext, or man edata, etc. These globals are UNIX specific.)

These are three external variables that are defined as follows:

extern etext;
extern edata;
extern end;

Note that they are typeless. Normally, you would never use just “etext” or “end” by value: when these variables are declared with external linkage and have no definition in your code, they are treated the same as .etext, .edata and .end, which are all symbols reserved by ld. You use their addresses instead. These point to the end of the text segment, the end of the initialized data in the globals segment, and the end of the uninitialized data in the globals segment, respectively.

Look at the program testaddr1.c. This prints out the addresses of etext, edata and end. Then it prints out 6 values:

  • main is a pointer to the first instruction of the main() procedure. This is simply a location in the code segment, which should be familiar to you from the assembler lectures.
  • I is a global variable. Thus &I should be an address in the globals segment.
  • i is a local variable. Thus &i should be an address in the stack.
  • argc is an argument to main(). Thus, &argc should be an address in the stack.
  • ii is another local variable. Thus, &ii should be an address in the stack. However, ii is a pointer to memory that has been malloc’d. Thus, ii should be an address in the heap.

When we run testaddr1, we get something like the following:

UNIX> testaddr1
&etext = 0x108b8
&edata = 0x[...]
&end = 0x20a54
main = 0x10688
&I = 0x20a4c
&i = 0xffbef82c
&argc = 0xffbef884
&ii = 0xffbef828
ii = 0x20a68
UNIX>

So, what this says is that the code segment goes from 0x10000 to 0x108b8. The globals segment goes from 0x20000 to 0x20a54. I is uninitialized, so it appears in bss (block started by symbol), right between edata and end. The heap goes from 0x20a54 to some address greater than 0x20a68 (since ii was allocated 4 bytes starting at 0x20a68). The stack goes from some address less than 0xffbef828 to 0xffffffff. All values that are printed by testaddr1 make sense.


Now, look at testaddr2.c.

This is the first really gross piece of C code that you’ll see. What it does is print out &etext and &end, and then prompt the user for an address in hexadecimal. It puts that address into the pointer variable s. You should never do this unless you are writing code like this, which is testing memory. The first thing that it does with s is try to read from that memory location (c = *s). Then it tries to write to the memory location (*s = c). This is a way to see which memory locations are legal.

So, let’s try it out with an illegal memory value of zero:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x0
Reading 0x0: Segmentation Fault
UNIX>

When we tried to read from memory location zero, we got a seg fault. This is because memory location zero is in the void – the hardware recognized this by asking the operating system, and then generating a segmentation violation.

Memory locations 0x0 to 0xffff are illegal – if we try any address in that range, we will get a segmentation violation:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0xffff
Reading 0xffff: Segmentation Fault
UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x4abc
Reading 0x4abc: Segmentation Fault
UNIX>

Memory location 0x10000 is in the code segment. This should be a legal address:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x10000
Reading 0x10000: 127
Writing 127 back to 0x10000: Segmentation Fault
UNIX>

You’ll note that we were able to read from 0x10000. It gave us the byte 127, which begins some instruction in the program. However, we got a seg fault when we wrote to 0x10000. This is by design: The code segment is read-only. You can read from it, but you can’t write to it. This makes sense, because you can’t change your program while it’s running; instead you have to recompile it and rerun it.

Now, what if we try memory location 0x11fff? This is above &etext, so it should be outside of the code segment:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x11fff
Reading 0x11fff: -48
Writing -48 back to 0x11fff: Segmentation Fault
UNIX>

You’ll note that even though 0x11fff is an address outside the code segment, we’re still allowed to read from it. This is because the hardware checks with the operating system to see if an address’s page has been allocated. Since page 8 (0x10000 to 0x11fff) has been allocated for the code segment, the hardware treats any address between 0x10000 and 0x11fff as a legal address. You can read from it, but its value is meaningless.

Now, pages 9 to 15 are unreadable again:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x12000
Reading 0x12000: Segmentation Fault
UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x1f000
Reading 0x1f000: Segmentation Fault
UNIX>

The globals start at 0x20000, so we see that page 16 is readable and writable:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x20000
Reading 0x20000: 127
Writing 127 back to 0x20000: ok
UNIX>

We can read from and write to any location (0x20000 to 0x21fff) in this page. The next page (starting at 0x22000) is unreadable:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x21dff
Reading 0x21dff: 0
Writing 0 back to 0x21dff: ok
UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x22000
Reading 0x22000: Segmentation Fault
UNIX>

What this tells us is that the globals go from 0x20000 to 0x21d90. The heap goes from 0x21d90 up to some higher address in the same page.


sbrk() is a system call that we will get into in a few lectures. sbrk(0) returns to the user the current end of the heap. Since we can keep calling malloc(), sbrk(0) can change over time. testaddr3.c shows the value of sbrk(0); note that it is in page 16 (0x20000 to 0x21fff). Since the hardware performs its check in 8192-byte intervals, we can get at any byte in page 16, even though sbrk(0) returns 0x21e18:

UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0) = 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0x21fff
Reading 0x21fff: 0
Writing 0 back to 0x21fff: ok
UNIX>

We haven’t called malloc() in testaddr3.c. This is the reason why &end and sbrk(0) return the same value. In testaddr3a.c we make a malloc() call at the beginning of the program, and as you see, &end and sbrk(0) return different values:

UNIX> testaddr3a
&etext = 0x119a3
&end = 0x21e28
sbrk(0) = 0x23e28
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0x23fff
Reading 0x23fff: 0
Writing 0 back to 0x23fff: ok
UNIX> testaddr3a
&etext = 0x119a3
&end = 0x21e28
sbrk(0) = 0x23e28
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0x24000
Reading 0x24000: Segmentation Fault
UNIX>

So, where’s the beginning of the stack? If we try addresses above 0xffbee103 in testaddr3, we see that most of them are legal:

UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0) = 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0xffb00000
Reading 0xffb00000: 0
Writing 0 back to 0xffb00000: ok
UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0) = 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0xff3f0000
Reading 0xff3f0000: 0
Writing 0 back to 0xff3f0000: ok
UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0) = 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0xff3effff
Reading 0xff3effff: Segmentation Fault
UNIX>

What gives? As it turns out, the operating system allocates all pages from 0xff3f0000 to the bottom of the stack. Where is the bottom of the stack? Let’s probe:

UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0) = 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0xffbeffff
Reading 0xffbeffff: 0
Writing 0 back to 0xffbeffff: ok
UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0) = 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0xffbf0000
Reading 0xffbf0000: Segmentation Fault
UNIX>

So the stack goes from 0xff3f0000 to 0xffbeffff. That is roughly 8 megabytes.

You can print out the default stack size, and change it, using the limit command (read the man page):

UNIX> limit
...
stacksize 8192 kbytes
...

Whenever you call a procedure, it allocates local variables and arguments (plus a few other things) on the stack. Whenever you return from a procedure, those variables are popped off the stack. So, look at testaddr4.c. It has main() call itself recursively as many times as there are arguments. You’ll see that at each recursive call, the addresses of argc and argv and the local variable i are smaller addresses. This is because each time the procedure is called, the stack grows upward (toward smaller addresses) to allocate its arguments and local variables. You’ve seen this already in the assembler lectures.

UNIX> testaddr4
argc = 1. &argc = 0xffbee15c, &argv = 0xffbee160, &i = 0xffbee104
argc = 0. &argc = 0xffbee0e4, &argv = 0xffbee0e8, &i = 0xffbee08c
UNIX> testaddr4 v
argc = 2. &argc = 0xffbee154, &argv = 0xffbee158, &i = 0xffbee0fc
argc = 1. &argc = 0xffbee0dc, &argv = 0xffbee0e0, &i = 0xffbee084
argc = 0. &argc = 0xffbee064, &argv = 0xffbee068, &i = 0xffbee00c
UNIX> testaddr4 v o l s
argc = 5. &argc = 0xffbee144, &argv = 0xffbee148, &i = 0xffbee0ec
argc = 4. &argc = 0xffbee0cc, &argv = 0xffbee0d0, &i = 0xffbee074
argc = 3. &argc = 0xffbee054, &argv = 0xffbee058, &i = 0xffbedffc
argc = 2. &argc = 0xffbedfdc, &argv = 0xffbedfe0, &i = 0xffbedf84
argc = 1. &argc = 0xffbedf64, &argv = 0xffbedf68, &i = 0xffbedf0c
argc = 0. &argc = 0xffbedeec, &argv = 0xffbedef0, &i = 0xffbede94
UNIX>

Now, let’s break the stack. This can be done by writing a program that allocates too much stack memory. One such program is in breakstack1.c. It performs infinite recursion, and at each recursive step it allocates 10000 bytes of stack memory in the variable iptr. When you run this, you’ll see that you get a segmentation violation when the recursive call is made and the stack is about to dip below 0xff3f0000:

UNIX> breakstack1
...
&c = 0xff3fa347, iptr = 0xff3f7c30 ... ok
&c = 0xff3f7bbf, iptr = 0xff3f54a8 ... ok
&c = 0xff3f5437, iptr = 0xff3f2d20 ... ok
Segmentation Fault
UNIX>

Often when you have infinite recursion and overflow the stack, you get “illegal instruction” instead of a segmentation fault. To get an idea of why, think about which part of the stack is related to instructions.

The second way to break the stack is to simply allocate too much local memory. E.g. look at breakstack2.c. It tries to allocate 10 MB of memory on the stack. It segfaults in a because it tries to reference smaller memory addresses than 0xff3f0000. Exactly where does the seg fault happen? Think about it; the answer is below.

The segfault happens in a when the code attempts to push iptr on the stack for the printf call. This is because the stack pointer is pointing to the void. Had we not referenced anything at the stack pointer, our program should have worked. For example, try breakstack3.c.

UNIX> breakstack3
Calling a. i = 1
After a is done. i = 5
UNIX>

You should understand, and be able to explain this phenomenon.
