12. Memory management
Functions of memory management
Hardware address relocation
Ensuring the same layout of logical address space for all processes, including multiple processes of the same binary image
Protection
Not allowing processes to access data not explicitly allocated by them
Ensuring correct accesses to the addresses
Dynamic allocation and reallocation
Enlarging and shrinking the size of address space allocated to the process
Virtualization
The process sees the same amount of memory available, regardless of the actual memory size and memory utilization by other processes
Memory management unit
MMU placed between the execution unit and memory
Address generated by the exec unit based on program instructions is called logical/virtual address
The address and access attributes are sent to MMU, which checks them for correctness and links them to the physical memory - physical address

Simple relocation
Early memory management method introduces in early 1960s !
In system mode, physical addresses are the same as logical addresses
MMU is transparent
system software executes in physical address space
User process address space create a single, contiguous block of addresses
addresses starting from 0
processes did not take more than few tens of kilowords
Address translation in relocation unit
Relocation unit contains two registers:
- DATUM (base) - physical address of the first memory cell allocated
- LIMIT - physical address of the last memory location allocated
some implementations stored the size of area
Logical address is added to the DATUM
The address obtained is compared to LIMIT
if the sum exceedes the LIMIT - memory access fails, error
Every process has its own calues of base and limit
During process switch, OS reloads these registers with values of the new process
Memory allocation
Allocation achieved via expanding the area occupied by the process
Area may be expanded if the area following it is not occupied:
- OS must move processes in memory to create a contiguous area of the newly requested size
- While moving, BASE and LIMIT are modified in process descriptors
Memory fragmentation
Multiple allocations and deallocations cause creation of small free areas in physical memory
The areas are too small to be used for new allocations despite their total size being significant
This is called memory fragmentation
OS must periodically run a defragmenter - process which moves the occupied area so that they become contiguous
Characteristics
Simple relocation unit supports the first 3 functions of memory management
- relocation
- protection - uncapable of checking correctness of references
- dynamic allocation - only via expanding the single area (low eff.)
Virtualization is not possible due to the fact that the whole space is allocated to the process as single area
Segmentation
Generalization of simple relocation
MMU implementing segmentation is called segmentation unit (SU)
Process' address space is divided according to its logical structure
- Code (text)
- Static data - constants, variables
- Stack
- Heap
Areas have different lengths (segments)
Segmentation usually visible in application programming model
Logical addresses in segmentation
Logical address spaces is 2D - contrary to von Neumann paradigm
Consists of two parts:
- segment identifier
- intrasegment address (offset)
SU translates addresses in 2D space into addresses in 1D (linear) space
Address generated by SU is called linear address
may be used as physical address
Address translation
Execution unit passes to the SU a logical address - pair of segment identifier and intrasegment address, as well as information on access attributes (privilege level, access type)
Based on segment identifier, SU finds the segment descriptor. It contains:
- Validity tag
- Segment access right
- Segment size
- Linear base address
SU signals access error and blocks the memory reference when: - Descriptor is invalid (validity tag = 0, not other field are present in an invalid descriptor)
- Access attributes are not compatible with access rights in descriptor
- Intrasegment address exceeds segment size
If no error is detecter - SU generates linear address by adding intrasegment address (from execution unit) to linear base adderss (in segment descriptor)
Memory allocation in segmentation
Allocation achieved in two ways:
- Expanding the already existing heap segment
- Creating a new heap segment
In both cases, memory fragmentation can occur
Virtual memory based on segmentation
To implement virtual memory, address space must be divided into many fragments
Usually program address space contains:
- 1+ code segments
- 1 static data segment (optionally separate segment for constants)
- 1 stack segment
- 1+ heap segments
Division of heap and code into segments makes operations on pointers more difficult
Fragmentation of mass storage implies very time-consuming defragmentation, lasting for minutes
Characteristics
Pros:
- proper relocation
- excellent protection based on logical structure of address space
Const: - complicates programming model (2-element address)
- problems with dynamic allocation
- practically not implementable virtual memory
Segmentation in presented form does not appear in contemporary computers
exception - x86 due to historical reasons
Paging
Division of logical and physical address spaces into small, size-aligned blocks of sizes equal to
Division is not related to logical structure of address space
Blocks in logical/virtual space are called logical/virtual pages or pages
Blocks of physical space - physical pages or page frames
Older implementation used single, fixed page size. New ones allow for different page sizes
Paging is not visible in application programming model
Paging unit
MMU implementing paging is called paging unit
Typical page size - 4 or 8KiB
modern paging units allow for simultaneous use of significantly bigger pages - from 0.5MiB to 1GiB
Address translation implemented as mapping of a logical page to a physical one
For purpose of translation - paging unit divides virtual address into two fields
- Virtual Page Number (VPN) - most significant bits
- Intrapage address (offset) - least significant bits
During translation VPN is substituted with physical page number (PPN)
offset is not affected
Translation buffer
Basic hardware structure appearing in paging unit is a translation buffer. Different names:
- TLB (translation lookaside buffer) - historical name, mostly used
- TB (translation buffer)
- ATC (address translation cache)
Translation buffer stores some number of recently used valid page descriptors
The buffer is implemented as a cache with high associativity

Address translation in paging unit
Based on VPN, translation buffer finds the page descriptor
if no descriptor is found - TLB miss
The page descriptor contains:
- validity tag (only valid descriptors are stored in TLB, but some TLB entries may be empty)
- access rights tag (user/system, read/write/execute)
- physical page number
- additional tags/attributes used by OS and hardware
If current access attributes sent by execution unit are not compatible with access right in page descriptor - paging error, access aborted
Physical address is generated by concatenating PPN and intrapage offset
Translation buffer miss - service
When TLB miss occurs, new descriptor must be found and loaded
Page descriptors are prepared by OS and stored in tables in main memory
- CISC-style paging unit - descriptors fetched from memory tables
- RISC-style paging unit - OS is responsible for loading descriptors
Invalid descriptors are not loaded into TLB
Virtual memory based on paging
Large number of pages allows for flexible determination of working set
pages not needed atm are moved to secondary storage (fixed disk)
page swapping algorithm uses Accessed and Dirty bits in page descriptors
Fixed page size removes possibility of fragmentation of primary and secondary storage
always possible to allocate a new page in the place of an old one
no need for defragmentation
Implementation of virtual memory is easy and effective - that's why paging was developed
Storing page descriptors in memory
Typical case for a 32-bit processor
- Page size - 4KiB
- Address space contains
(> ) pages - Simple page descriptor occupies 4 bytes
descriptors would occupy 4MiB
To reduce this amount of memory, descriptors are stored in a tree-of-tables structures
Tree-of-tables example - x86
each table contains 1024 descriptors
two levels of tables
invalid descriptor in top-level table means that no next-level table exists and the whole fragment is unallocated
valid descriptor points to the next level descriptor table
Storing page descriptors
Mapping between virtual and physical addresses is unique for every process
Every process has its own descriptor tables
Top level table is pointed to by a special processor register - page table base pointer
Two-level page descriptor tables

For the purpose of table walk, virtual addresses are divided into 3 fields
- INDEX2 - 10 most significant bits
- INDEX1 - 10 middle bits
- OFFSET - 12 least significant bits
Most significant bits are used to select a descriptor from top level descriptor table
every descriptor in this table describesof address space
invalid descriptor marks the whole 4MiB area invalid & inaccessible
valid descriptor points to next level descriptor table
1st level table is indexed using middle bits of virtual address
table contains page descriptors
Each table has size of 4KiB and occupies exactly one page
Memory usage
Process' address space is divided into user and system part
System part mapping is the same for all processes
L2 table describes system part shared between processes
Every process has its own:
L2 level table
set of L1 tables describing the mapping of user part of address space
Typical process occupies from few KB to about 100MB of adderess space in 3/4 areas
one L1 and 3-30 L2 tables are required
total size of tables ranges from 20 to 100 KiB
x86 three-level descriptor tables - PAE mode
With 2-level tables, changes in system memory allocation crossing the 4MiB block boundaries require creation of new L1 table and linking it to all L2 tables.
Solution - 3 level table
extra lebel allows for separation of global, system L2 tables and tasks' private tables
changes in L2 do not imply changes in L3
L2 table location is fixed, only contents may change
PAE paging in x86
- 8-byte pege descriptors, 512 descriptors per page
- L3 contains 4 descriptors pointing to L2 tables
Paging in 64-bit processors
Logical address - 64-bits
Virtual address may be shorter than logical address (48-bits in x86-64)
occupies the bottom and top of logical address space
valid virtual addresses are canonical - all significant bits have the same value
references to non-canonical addreseses are treated as errors
Long virtual address requires many levels of tree-of-tables structures
x86-64/AMD64 - 4 levels
TLB misses
Translation buffer miss causes table walk
TLB misses are infrequent, page descriptors are rarely caches in processor's caches
- during table walk - MMU performs 2-4 accesses into slow memory - time needed is equal to exec time of hundreds of instructions
High cost of TLB miss brings the need for reducing the number of misses
Initial misses appear after every process switch
Misses result from limited capacity of TLB
Avoiding initial misses in TLB
Since virtual-to-physical mapping is unique for each process, descriptors should be flushed from TLB during process switch
Mapping of system part of address space is common to all processes
no need to remove them from TLB
Solution - global descriptors
page descriptor may be marked as global by the OS using a flag bit
global descriptors are not flushed from TLB during normal flush operations
Avoiding TLB 'size' misses
Need for virtualization -> small page size -> big number of descriptors
No virtualization == no need to divide address space into small pages
Not virtualized areas:
- system kernel
- whole physical memory mapped in system part of address space
- I/O devices like graphics controller (buffer size >1GiB)
There areas are handled using small number of big page descriptors
Big pages - x86
Normal descriptor in L2 points to L1, describing 4MiB space divided into 4KiB pages
Big descriptor in L2 describtes 4MiB area not divided into smaller pages
single decriptor refers to whole 4MiB area
22 LSb are used are intrapage addresses
Similar solution exists for 3-level tables