12. Memory management

Functions of memory management

Hardware address relocation

Ensuring the same layout of logical address space for all processes, including multiple processes of the same binary image

Protection

Not allowing processes to access data not explicitly allocated by them
Ensuring correct accesses to the addresses

Dynamic allocation and reallocation

Enlarging and shrinking the size of address space allocated to the process

Virtualization

The process sees the same amount of memory available, regardless of the actual memory size and memory utilization by other processes

Memory management unit

MMU placed between the execution unit and memory
Address generated by the exec unit based on program instructions is called logical/virtual address
The address and access attributes are sent to MMU, which checks them for correctness and links them to the physical memory - physical address

Simple relocation

Early memory management method introduces in early 1960s !
In system mode, physical addresses are the same as logical addresses
MMU is transparent
system software executes in physical address space
User process address space create a single, contiguous block of addresses
addresses starting from 0
processes did not take more than few tens of kilowords

Address translation in relocation unit

Relocation unit contains two registers:

DATUM (base) - physical address of the first memory cell allocated
LIMIT - physical address of the last memory location allocated
some implementations stored the size of area
Logical address is added to the DATUM
The address obtained is compared to LIMIT
if the sum exceedes the LIMIT - memory access fails, error
Every process has its own calues of base and limit
During process switch, OS reloads these registers with values of the new process

Memory allocation

Allocation achieved via expanding the area occupied by the process
Area may be expanded if the area following it is not occupied:

OS must move processes in memory to create a contiguous area of the newly requested size
While moving, BASE and LIMIT are modified in process descriptors

Memory fragmentation

Multiple allocations and deallocations cause creation of small free areas in physical memory
The areas are too small to be used for new allocations despite their total size being significant
This is called memory fragmentation
OS must periodically run a defragmenter - process which moves the occupied area so that they become contiguous

Characteristics

Simple relocation unit supports the first 3 functions of memory management

relocation
protection - uncapable of checking correctness of references
dynamic allocation - only via expanding the single area (low eff.)
Virtualization is not possible due to the fact that the whole space is allocated to the process as single area

Segmentation

Generalization of simple relocation
MMU implementing segmentation is called segmentation unit (SU)
Process' address space is divided according to its logical structure

Code (text)
Static data - constants, variables
Stack
Heap
Areas have different lengths (segments)
Segmentation usually visible in application programming model

Logical addresses in segmentation

Logical address spaces is 2D - contrary to von Neumann paradigm
Consists of two parts:

segment identifier
intrasegment address (offset)
SU translates addresses in 2D space into addresses in 1D (linear) space
Address generated by SU is called linear address
may be used as physical address

Address translation

Execution unit passes to the SU a logical address - pair of segment identifier and intrasegment address, as well as information on access attributes (privilege level, access type)
Based on segment identifier, SU finds the segment descriptor. It contains:

Validity tag
Segment access right
Segment size
Linear base address
SU signals access error and blocks the memory reference when:
Descriptor is invalid (validity tag = 0, not other field are present in an invalid descriptor)
Access attributes are not compatible with access rights in descriptor
Intrasegment address exceeds segment size
If no error is detecter - SU generates linear address by adding intrasegment address (from execution unit) to linear base adderss (in segment descriptor)

Memory allocation in segmentation

Allocation achieved in two ways:

Expanding the already existing heap segment
Creating a new heap segment
In both cases, memory fragmentation can occur

Virtual memory based on segmentation

To implement virtual memory, address space must be divided into many fragments
Usually program address space contains:

1+ code segments
1 static data segment (optionally separate segment for constants)
1 stack segment
1+ heap segments
Division of heap and code into segments makes operations on pointers more difficult
Fragmentation of mass storage implies very time-consuming defragmentation, lasting for minutes

Characteristics

Pros:

proper relocation
excellent protection based on logical structure of address space
Const:
complicates programming model (2-element address)
problems with dynamic allocation
practically not implementable virtual memory
Segmentation in presented form does not appear in contemporary computers
exception - x86 due to historical reasons

Paging

Division of logical and physical address spaces into small, size-aligned blocks of sizes equal to $n^{2}$
Division is not related to logical structure of address space
Blocks in logical/virtual space are called logical/virtual pages or pages
Blocks of physical space - physical pages or page frames
Older implementation used single, fixed page size. New ones allow for different page sizes
Paging is not visible in application programming model

Paging unit

MMU implementing paging is called paging unit
Typical page size - 4 or 8KiB
modern paging units allow for simultaneous use of significantly bigger pages - from 0.5MiB to 1GiB
Address translation implemented as mapping of a logical page to a physical one
For purpose of translation - paging unit divides virtual address into two fields

Virtual Page Number (VPN) - most significant bits
Intrapage address (offset) - least significant bits
During translation VPN is substituted with physical page number (PPN)
offset is not affected

Translation buffer

Basic hardware structure appearing in paging unit is a translation buffer. Different names:

TLB (translation lookaside buffer) - historical name, mostly used
TB (translation buffer)
ATC (address translation cache)
Translation buffer stores some number of recently used valid page descriptors
The buffer is implemented as a cache with high associativity

Address translation in paging unit

Based on VPN, translation buffer finds the page descriptor
if no descriptor is found - TLB miss
The page descriptor contains:

validity tag (only valid descriptors are stored in TLB, but some TLB entries may be empty)
access rights tag (user/system, read/write/execute)
physical page number
additional tags/attributes used by OS and hardware
If current access attributes sent by execution unit are not compatible with access right in page descriptor - paging error, access aborted
Physical address is generated by concatenating PPN and intrapage offset

Translation buffer miss - service

When TLB miss occurs, new descriptor must be found and loaded
Page descriptors are prepared by OS and stored in tables in main memory

CISC-style paging unit - descriptors fetched from memory tables
RISC-style paging unit - OS is responsible for loading descriptors
Invalid descriptors are not loaded into TLB

Virtual memory based on paging

Large number of pages allows for flexible determination of working set
pages not needed atm are moved to secondary storage (fixed disk)
page swapping algorithm uses Accessed and Dirty bits in page descriptors
Fixed page size removes possibility of fragmentation of primary and secondary storage
always possible to allocate a new page in the place of an old one
no need for defragmentation
Implementation of virtual memory is easy and effective - that's why paging was developed

Storing page descriptors in memory

Typical case for a 32-bit processor

Page size - 4KiB
Address space contains $2^{20}$ (> $10^{6}$ ) pages
Simple page descriptor occupies 4 bytes
$2^{20}$ descriptors would occupy 4MiB
To reduce this amount of memory, descriptors are stored in a tree-of-tables structures
Tree-of-tables example - x86
each table contains 1024 descriptors
two levels of tables
invalid descriptor in top-level table means that no next-level table exists and the whole fragment is unallocated
valid descriptor points to the next level descriptor table

Storing page descriptors

Mapping between virtual and physical addresses is unique for every process
Every process has its own descriptor tables
Top level table is pointed to by a special processor register - page table base pointer

Two-level page descriptor tables

For the purpose of table walk, virtual addresses are divided into 3 fields

INDEX2 - 10 most significant bits
INDEX1 - 10 middle bits
OFFSET - 12 least significant bits
Most significant bits are used to select a descriptor from top level descriptor table
every descriptor in this table describes $2^{22} = 4 MiB$ of address space
invalid descriptor marks the whole 4MiB area invalid & inaccessible
valid descriptor points to next level descriptor table
1st level table is indexed using middle bits of virtual address
table contains page descriptors
Each table has size of 4KiB and occupies exactly one page

Memory usage

Process' address space is divided into user and system part
System part mapping is the same for all processes
L2 table describes system part shared between processes
Every process has its own:
L2 level table
set of L1 tables describing the mapping of user part of address space
Typical process occupies from few KB to about 100MB of adderess space in 3/4 areas
one L1 and 3-30 L2 tables are required
total size of tables ranges from 20 to 100 KiB

x86 three-level descriptor tables - PAE mode

With 2-level tables, changes in system memory allocation crossing the 4MiB block boundaries require creation of new L1 table and linking it to all L2 tables.
Solution - 3 level table
extra lebel allows for separation of global, system L2 tables and tasks' private tables
changes in L2 do not imply changes in L3
L2 table location is fixed, only contents may change
PAE paging in x86

8-byte pege descriptors, 512 descriptors per page
L3 contains 4 descriptors pointing to L2 tables

Paging in 64-bit processors

Logical address - 64-bits
Virtual address may be shorter than logical address (48-bits in x86-64)
occupies the bottom and top of logical address space
valid virtual addresses are canonical - all significant bits have the same value
references to non-canonical addreseses are treated as errors
Long virtual address requires many levels of tree-of-tables structures
x86-64/AMD64 - 4 levels

TLB misses

Translation buffer miss causes table walk
TLB misses are infrequent, page descriptors are rarely caches in processor's caches

during table walk - MMU performs 2-4 accesses into slow memory - time needed is equal to exec time of hundreds of instructions
High cost of TLB miss brings the need for reducing the number of misses
Initial misses appear after every process switch
Misses result from limited capacity of TLB

Avoiding initial misses in TLB

Since virtual-to-physical mapping is unique for each process, descriptors should be flushed from TLB during process switch
Mapping of system part of address space is common to all processes
no need to remove them from TLB
Solution - global descriptors
page descriptor may be marked as global by the OS using a flag bit
global descriptors are not flushed from TLB during normal flush operations

Avoiding TLB 'size' misses

Need for virtualization -> small page size -> big number of descriptors
No virtualization == no need to divide address space into small pages
Not virtualized areas:

system kernel
whole physical memory mapped in system part of address space
I/O devices like graphics controller (buffer size >1GiB)
There areas are handled using small number of big page descriptors

Big pages - x86

Normal descriptor in L2 points to L1, describing 4MiB space divided into 4KiB pages
Big descriptor in L2 describtes 4MiB area not divided into smaller pages
single decriptor refers to whole 4MiB area
22 LSb are used are intrapage addresses
Similar solution exists for 3-level tables