Learning About Linux Memory Management (MM) Through Pictures (part 1)
Wouldn't it be great to learn the story of Linux MM through pictures? Together
with some animation, we can follow the flow of what MM is doing on running
programs. Of course we need words and definitions, but it sure helps to
visualize some of the "black magic" that is going on under the hood. (This blog series is in multiple parts. Part 2 gives an overview of page cache and swap cache. Part 3 introduces memory allocation.)
My starting point for this Linux MM adventure was the Round 12 Outreachy application in March 2016.
I picked the Linux kernel project, and the mentor, Rik van Riel, posted two "small" tasks:
Cause the system to swap, and use ftrace and the trace points in
mm/page_alloc.c and mm/vmscan.c to determine how long it takes to allocate
(and free) pages.
Repeat the same exercise when the system is under heavy filesystem read IO,
and when doing lots of filesystem writes (enough to fill up memory).
Use kernelshark to visually examine the observed latencies.
Let's parse these "simple" tasks.
One path of learning is related to tools:
  ftrace, trace points, kernelshark
Another path of learning is related to MM concepts:
  to swap, the swap cache, the swap area, a swap file
  to allocate memory, to free (meaning 'to release') memory, free (meaning 'not allocated') memory
  filesystem: how does the kernel represent a filesystem?
  file: how does the kernel represent a file?
  "fill up memory": what is "full"? what part of memory can "fill up"?
Let's see how much we know now. In a terminal window, run cat /proc/meminfo.
The output lists types of page frames like Anon and Mapped. This post covers those types. The next post covers Buffers, Cached, SwapCached, Active (file or anon), Inactive (file or anon), unevictable, dirty, clean, private, shared. (All terms in meminfo are explained here.)
provides statistics on memory since your last boot. It includes pages swapped in and out, and paged in and out. This information should make more sense as you read on.
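The AnonPages and Mapped counters mentioned above can also be read from a script. Here is a minimal Python sketch, assuming a Linux system that exposes /proc/meminfo. The helper name parse_meminfo is made up for this post and is not part of any library.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer} dict.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts with no unit.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


if __name__ == "__main__":
    path = Path("/proc/meminfo")  # assumption: running on Linux
    if path.exists():
        info = parse_meminfo(path.read_text())
        for key in ("MemTotal", "MemFree", "AnonPages", "Mapped"):
            print(f"{key:>10}: {info.get(key, 0)} kB")
```

Running it before and after starting a memory-hungry program is an easy way to watch AnonPages move.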
The Big Picture
One day I found Gustavo Duarte's blog (from a link on a university
professor's slides) called 'brain food for hackers', which includes great pictures
for telling the MM story. This post refers to two of his posts: Anatomy of a program in memory and How the Kernel Manages Your Memory.
Lesson 1: Anatomy of a program in memory
concepts:
 -- modes: kernel, user
    -- kernel mode process, user mode process
 -- physical memory segments
    -- memory-mapped segment (file-backed, anonymous, shared, private)
 -- physical memory: used (allocated) memory, free (unallocated) memory
 -- page tables
The figure below shows the relationship between a user mode process and kernel mode processes.
Figure: User Mode+Kernel Mode
The data type mm_struct represents a process's address space, including its virtual memory areas (vmaS). The page tables (referred to as a single "page table") map a virtual address to the address of a page descriptor, which describes and points to a physical page frame.
Figure: Virtual Address+Page Tables shows how a virtual address is divided into fields of bits. Each field is used to calculate an offset into a table.
Figure: Virtual Address+Page Tables
Figure: Page Tables shows more detail with respect to the page tables (page global directory (PGD), page middle directory (PMD), page table entries (PTE)) and includes the page descriptor (struct page).
Figure: Page Tables (PGD, PMD, PTE)
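To make the "fields of bits" idea concrete, here is a small Python sketch that slices a virtual address into table indexes. The bit widths are an assumption: they follow the common x86-64 layout with 4-level paging and 4 KiB pages (9 bits per level, 12 bits of offset), which adds a page upper directory (PUD) between the PGD and PMD. Other architectures and configurations use different widths, and split_virtual_address is a name invented for this example.

```python
# Assumed layout: x86-64, 4-level paging, 4 KiB pages.
# Each tuple is (field name, width in bits), from most to least significant.
FIELDS = (("pgd", 9), ("pud", 9), ("pmd", 9), ("pte", 9), ("offset", 12))


def split_virtual_address(vaddr):
    """Return the index into each table level plus the offset within the page."""
    result = {}
    shift = sum(width for _, width in FIELDS)  # 48 bits in this layout
    for name, width in FIELDS:
        shift -= width
        result[name] = (vaddr >> shift) & ((1 << width) - 1)
    return result


if __name__ == "__main__":
    for name, value in split_virtual_address(0x7FFD12345678).items():
        print(f"{name:>6}: {value:#x}")
```

Each index selects one entry in its table, and that entry locates the next table down, until the PTE identifies the page frame and the offset picks the byte inside it.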
Lesson 2: How the Kernel Manages Your Memory
process: a program managed by the kernel; it is either running, ready to run, or waiting for some event before it can resume
process (task_struct): The kernel represents a process as a process descriptor. The process address space is described by its memory descriptor, mm_struct.
process address space (mm_struct)
Figure: process descriptor shows the mm field pointing to the address space of the process (mm -> mm_struct). The mm_struct, in turn, points to the head of a list of virtual memory areas (vm_area_structS).
Other task_struct fields to notice, since they are important in writing device drivers (Part 5 of this blog series):
 -- fs: points to fs_struct, which holds filesystem data
 -- files: points to files_struct, which stores a collection of file descriptors (1 file descriptor (struct file) for each file opened by this process)
Figure: process descriptor
vma (virtual memory area) descriptor (vm_area_struct): identifies a contiguous region of virtual memory addresses. In the figure below (p. 440, Figure 9-2), a vma is represented as a gray region in the linear address space (linear is synonymous with virtual). The mm_struct points to the head of a list of vm_area_structS.
file-backed: memory is associated with a file or device
        example: shared library, text segment, data segment
anonymous: data is not associated with a file or device
        example: stack, heap, bss segment
In Figure: Data Structures+Memory Mapping, notice the VMAs (vm_area_struct). The virtual addresses in the left vma are mapped to 2 page frames that hold data from the file represented by struct inode. The right vma is mapped to one page frame that holds data from the same file. The set of vm_area_structS represents the virtual address space of a user process. If the vm_file member of a vma is not NULL, then the vma is said to be file-backed. In the figure, both vmaS are file-backed and point to the same file. (In the current version of Linux, struct file (an open file object) points directly to struct inode (the underlying file).)
The kernel manages physical memory in units of page frames (PAGE_SIZE is typically 4096 bytes). Each page frame is described by its own descriptor (struct page).
Bottom Line: when a process accesses a virtual address, the address is mapped to an offset in a page frame. The page frame is associated with an offset into a specific file on disk. The part of the file associated with the page frame is said to be "memory-mapped". The vma is said to be "memory-mapped" with memory allocated to the vma.
If the vm_file member of a vma is NULL, the vma is said to be anonymous (not file-backed).
file-backed vs anonymous: determines where the kernel writes the page frame to disk. A file-backed page frame is written to the blocks associated with the file's inode. An anonymous page frame is written to a swap area on disk.
Figure: Data Structures+Memory Mapping
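You can see the file-backed vs anonymous split for a live process in /proc/<pid>/maps, where each line describes one vma. The Python sketch below labels each region. It uses a simple heuristic that is my own assumption, not a kernel rule: a region whose last column is a path starting with "/" is treated as file-backed, and everything else (for example [heap], [stack], or a blank name) is treated as anonymous. The function name classify_maps is hypothetical.

```python
from pathlib import Path


def classify_maps(text):
    """Label each region in /proc/<pid>/maps-style text.

    Returns a list of (start, end, perms, kind, path) tuples, where kind
    is "file-backed" or "anonymous" according to the path heuristic.
    """
    regions = []
    for line in text.splitlines():
        # Columns: address range, perms, offset, dev, inode, optional pathname
        parts = line.split(None, 5)
        if len(parts) < 5:
            continue
        start, end = (int(x, 16) for x in parts[0].split("-"))
        path = parts[5].strip() if len(parts) > 5 else ""
        kind = "file-backed" if path.startswith("/") else "anonymous"
        regions.append((start, end, parts[1], kind, path))
    return regions


if __name__ == "__main__":
    maps = Path("/proc/self/maps")  # assumption: running on Linux
    if maps.exists():
        for start, end, perms, kind, path in classify_maps(maps.read_text()):
            size_kb = (end - start) // 1024
            print(f"{kind:<11} {perms} {size_kb:>8} kB  {path}")
```

The heuristic is deliberately crude. It is only meant to connect the vm_file idea to something you can look at in a terminal.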
PAGE_SIZE: typically 4096 bytes
page: a contiguous sequence of virtual addresses; the first byte has a virtual address that is a multiple of PAGE_SIZE; this term refers to the memory cells as well as the data contents of the memory
page frame: a contiguous sequence of PAGE_SIZE physical memory cells; the first byte has a physical address that is a multiple of PAGE_SIZE
page descriptor (struct page): the kernel's representation of a page frame
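Because every page starts at a multiple of PAGE_SIZE, finding the page that contains an address is just bit masking. Here is a short Python sketch of that arithmetic. The helper names are made up for illustration, and the default page size comes from Python's mmap.PAGESIZE, which reports the page size of the machine the script runs on.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # typically 4096


def page_base(addr, page_size=PAGE_SIZE):
    """Address of the first byte of the page containing addr."""
    return addr & ~(page_size - 1)


def page_offset(addr, page_size=PAGE_SIZE):
    """Offset of addr within its page."""
    return addr & (page_size - 1)


def pages_spanned(addr, length, page_size=PAGE_SIZE):
    """Number of pages touched by the byte range [addr, addr + length)."""
    first = page_base(addr, page_size)
    last = page_base(addr + length - 1, page_size)
    return (last - first) // page_size + 1


if __name__ == "__main__":
    addr = 0x12345
    print(f"page size  : {PAGE_SIZE}")
    print(f"page base  : {page_base(addr):#x}")
    print(f"page offset: {page_offset(addr):#x}")
```

The masking trick relies on the page size being a power of two. Note that a 2-byte object can still span two pages if it straddles a page boundary.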
demand paging: the kernel allocates a page frame to a process only at the time that the process accesses a virtual address that is contained in a valid vma; think of this as lazy page frame allocation
page fault: the mechanism used by the kernel to allocate a page of memory to a process on demand
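Lazy allocation is easy to observe from user space. The Python sketch below maps a block of private anonymous memory and then writes one byte to each page, reading the process's minor page fault counter (ru_minflt from getrusage) along the way. It assumes a Unix-like system where Python's mmap module provides MAP_PRIVATE and MAP_ANONYMOUS. The function names are invented for this example, and the exact fault counts depend on the kernel configuration (for instance, transparent huge pages can satisfy many 4 KiB pages with a single fault).

```python
import mmap
import resource


def minor_faults():
    """Minor page faults taken by this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_pages(size_bytes):
    """Map anonymous memory, then touch each page once.

    Returns (faults_during_mmap, faults_during_touch).
    """
    before_map = minor_faults()
    region = mmap.mmap(-1, size_bytes,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after_map = minor_faults()
    # Writing one byte per page forces the kernel to back that page.
    for offset in range(0, size_bytes, mmap.PAGESIZE):
        region[offset] = 1
    after_touch = minor_faults()
    region.close()
    return after_map - before_map, after_touch - after_map


if __name__ == "__main__":
    on_map, on_touch = touch_pages(64 * 1024 * 1024)
    print(f"faults while mapping : {on_map}")
    print(f"faults while touching: {on_touch}")
```

On a typical system the mmap call itself causes few or no faults, and the faults show up only when the pages are touched. That is the "only at the time that the process accesses a virtual address" part in action.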
Here is the kernelshark visualization of how the kernel handles a page fault.
These pages show the beginning and ending of the function graph for do_page_fault -> find_vma -> vmacache_find -> handle_mm_fault -> filemap_map_pages -> do_set_pte -> do_set_pte -> ... -> do_set_pte
The vmacache_find checks the vma corresponding to the last virtual address accessed. In Figure: Virtual Memory, mmap_cache is vmacache.
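The idea behind that cache check can be modeled in a few lines. The Python sketch below is a toy model, not kernel code: it keeps a sorted list of areas and remembers only the last area that matched, so repeated lookups in the same area skip the search. The class and field names (Vma, AddressSpace, cached) are hypothetical, and this simplified find_vma returns only an area that actually contains the address.

```python
from dataclasses import dataclass


@dataclass
class Vma:
    start: int  # first address in the area (inclusive)
    end: int    # first address past the area (exclusive)
    name: str


class AddressSpace:
    """Toy stand-in for mm_struct: a list of areas plus a one-entry cache."""

    def __init__(self, vmas):
        self.vmas = sorted(vmas, key=lambda v: v.start)
        self.cached = None   # last area returned by find_vma
        self.cache_hits = 0

    def find_vma(self, addr):
        """Return the area containing addr, or None if there is none."""
        cached = self.cached
        if cached is not None and cached.start <= addr < cached.end:
            self.cache_hits += 1
            return cached
        for vma in self.vmas:
            if vma.start <= addr < vma.end:
                self.cached = vma
                return vma
        return None


if __name__ == "__main__":
    space = AddressSpace([
        Vma(0x400000, 0x401000, "text"),
        Vma(0x600000, 0x602000, "heap"),
    ])
    for addr in (0x600010, 0x600020, 0x400abc, 0x500000):
        vma = space.find_vma(addr)
        print(f"{addr:#x} -> {vma.name if vma else 'no vma'}")
    print(f"cache hits: {space.cache_hits}")
```

Programs tend to touch addresses close to the ones they just touched, which is why even a tiny cache like this one pays off.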
Now that we have come this far, let's measure how far we have come.
1. Run the commands again, and see how much of the output you understand.
2. Can you understand the following excerpts from LWN.net?
2a. Memory used for file-backed data can be accessed by direct reads and writes and can be mapped concurrently by multiple processes, which may request differently sized mappings. [ May 11, 2016]
2b. Anonymous memory is only accessed by memory mapping (i.e. with mmap()) and the size of this mapping is usually fixed on allocation. [ May 11, 2016]
2c. ..[W]hat to do if all of the system's memory is tied up in the tmpfs filesystem (which has no backing store and only stores files in memory). [Improving the OOM killer, 4/27/2016]
2d. The SLOB support extracted from grsecurity seems entirely broken. I have no idea what's going on there, I spent my time testing SLAB and SLUB. Having someone else look at SLOB would be nice, but this series doesn't depend on it. [mm: Hardened usercopy, 7/6/2016]
Part 2 of this blog series covers the page cache and swap cache. Part 3 covers memory allocation in the Zone Allocator and the Slab Allocator. Part 4 covers page frame reclamation. Part 5 covers how to interface a kernel driver to the Linux kernel memory management subsystem.