Final prep

7 questions

  1. x86 assembly code
    after each line:
    • determine value of result
    • address/register where the result is stored (numeric values in case of an address)
  2. Cache organization / Paging / Table organization
    Which bits are used to select what (byte indexing, set selection, ...)
    Paging in the PU (paging unit)
    • given page size, size of page descriptor, virtual address size - figure out minimum number of levels in the address table to implement the structure
    • What's the maximum number of address bits which could still be covered by that table structure
  3. Operations of virtual memory
  4. 4 different functions of memory management
  5. Role of exceptions
  6. Role of the paging unit
  7. What is done in hardware/software for virtual memory system
  8. Describe the register sets / model of condition operations / flags
  9. Programming model vs. implementation - properties of programming model
    What characteristics of the programming model make pipelining possible
  10. Calling convention - function prologue/epilogue

Cache organization

Cache design

Data is stored in big units called lines, ~4x the size of a memory word (typically 32/64 bytes)
LSBs of the address are used to select a part of the line

Fully associative cache

Data is accessed based on its association (content), similar to a phone book lookup
Every cell may store data from any address, so all cells must be searched during every access
Results in misses if working set exceeds cache size

Direct mapped cache

Ordinary RAM plus a single equality comparator
LSB part of the address - offset within the line
Middle part - address (index) within the cache
MSB - address tag, the rest of the physical address, compared with the tag stored in the cache
During every access cycle, one line is selected based on the middle bits
Cache hit - the address tag matches the tag read from the cache
Cache miss - the selected line is replaced with data fetched from memory
Speeds up references if the working set is smaller than ~2x the cache capacity
Data from a given address may be stored in only one line; cannot hold data from two addresses with identical middle parts at the same time

Set associative cache

Several direct-mapped caches, number of possible data locations equal to number of blocks
every access searches a single line from every block; this collection of lines is called a set
such sets behave like a tiny fully-associative cache
Number of blocks - cache associativity - two-way / four-way
Allows storing data from addresses with equal middle parts (as many as the associativity)

Selecting data

Capacity=Associativity×LineSize×Sets
24KiB cache, with 32B line size, 4-way associative
24 KiB / 32 B = (24 × 1024 B) / 32 B = 768 - number of lines
768 / 4 = 192 sets; 192 ≤ 2^8, so 8 bits to select the set
32 B = 2^5, so 5 bits to select the offset in the line
Left bits are the MSB of the physical address (compared with one stored in cache)
University/WUT/ECOAR/pictures/Pasted image 20250612204611.png
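
A minimal C sketch of this bit selection for the 24 KiB, 4-way, 32 B-line example above (the address value is arbitrary and the macro names are mine):

	/* Split a physical address into offset / set index / tag
	   for the 24 KiB, 4-way, 32 B-line cache from the example above. */
	#include <stdint.h>
	#include <stdio.h>

	#define LINE_SIZE   32   /* 2^5 bytes per line */
	#define OFFSET_BITS 5
	#define SET_BITS    8    /* ceil(log2(192 sets)); not all 256 index values are used */

	int main(void) {
	    uint32_t addr   = 0x0001A2C4;                                     /* arbitrary  */
	    uint32_t offset = addr & (LINE_SIZE - 1);                         /* lowest 5 bits */
	    uint32_t set    = (addr >> OFFSET_BITS) & ((1u << SET_BITS) - 1); /* next 8 bits   */
	    uint32_t tag    = addr >> (OFFSET_BITS + SET_BITS);               /* remaining MSBs */
	    printf("offset=%u set=%u tag=0x%X\n", offset, set, tag);
	    return 0;
	}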

EXERCISE

System uses a 40-bit virtual address, 8 KiB page size and a 4-byte Page Table Entry. What is the minimum number of levels in a hierarchical page table, if each table must fit in a single page?

8 KiB page size = 2^13 - 13 bits are used as the offset in the virtual address
This leaves 40 - 13 = 27 bits for page indexing
8192 / 4 = 2048 = 2^11 entries per table page, so 11 bits are consumed per level
⌈27 / 11⌉ = ⌈2.45⌉ = 3 levels

EXERCISE

System uses a 52-bit virtual address, 16 KiB page size and a 10-byte PTE

16 KiB page size = 2^14 - 14 bits as offset
52 - 14 = 38 bits left for level indexing
2^14 / 10 = 1638 entries per table page; 1638 < 2048 = 2^11, so 2^10 = 1024 usable - 10 bits per level
⌈38 / 10⌉ = ⌈3.8⌉ = 4 levels
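
A small C sketch of the level calculation used in both exercises above (function names are mine, assuming each table level must fit in one page):

	#include <stdio.h>

	/* floor(log2(x)) for x >= 1, without floating point */
	static int ilog2(long long x) {
	    int b = 0;
	    while (x > 1) { x >>= 1; b++; }
	    return b;
	}

	/* Minimum number of hierarchical page-table levels */
	static int min_levels(int va_bits, long page_size, long pte_size) {
	    int offset_bits    = ilog2(page_size);            /* bits used as page offset */
	    int index_bits     = va_bits - offset_bits;       /* bits left for indexing   */
	    int bits_per_level = ilog2(page_size / pte_size); /* indexable entries/page   */
	    return (index_bits + bits_per_level - 1) / bits_per_level;   /* ceiling       */
	}

	int main(void) {
	    printf("%d\n", min_levels(40,  8 * 1024,  4));    /* first exercise  -> 3 */
	    printf("%d\n", min_levels(52, 16 * 1024, 10));    /* second exercise -> 4 */
	    return 0;
	}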

EXERCISE

In a virtual memory system, size of virtual address is 32-bit, size of physical address is 30-bit, page size is 4 Kbyte and size of each page table entry is 32-bit. The main memory is byte addressable. Which one of the following is the maximum number of bits that can be used for storing protection and other information in each page table entry?

4 KB page - 4096 = 2^12 - 12 bits of offset, 32 - 12 = 20 bits left for page indexing
Physical address is 30 bits - the sum of PFN and offset, so the PFN takes 30 - 12 = 18 bits
PTE contains PFN and additional information, which takes 32 - 18 = 14 bits.

EXERCISE

Consider a system with byte-addressable memory, 32 bit logical addresses, 4 kilobyte page size and page table entries of 4 bytes each. The size of the page table in the system in megabytes is

Total addressable space - 2^32 bytes
Single page addressable space - 2^12 bytes
Number of pages needed: 2^32 / 2^12 = 2^20
With a 4-byte PTE per page: 4 × 2^20 = 4 MiB of space
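
The same arithmetic as a tiny C check for the last two exercises (values hard-coded from the exercise statements):

	#include <stdio.h>

	int main(void) {
	    /* 32-bit VA, 30-bit PA, 4 KiB pages, 32-bit PTE */
	    int offset_bits = 12;                     /* log2(4096)                 */
	    int pfn_bits    = 30 - offset_bits;       /* physical frame number: 18  */
	    int spare_bits  = 32 - pfn_bits;          /* protection/other info: 14  */

	    /* 32-bit VA, 4 KiB pages, 4-byte PTE, flat single-level table */
	    long long entries    = 1LL << (32 - offset_bits);   /* 2^20 pages */
	    long long table_size = entries * 4;                 /* bytes      */

	    printf("spare PTE bits: %d\n", spare_bits);               /* 14    */
	    printf("page table size: %lld MiB\n", table_size >> 20);  /* 4 MiB */
	    return 0;
	}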

Operations of virtual memory

The memory visible in the application programming model (by software) is the virtual memory layer of the memory hierarchy - not the physical one
Virtualization allows the process to see the same amount of memory available, regardless of the actual memory size and memory utilization by other processes.
The virtualization is managed by the Memory Management Unit (MMU), which translates the logical address provided by the processor (from the process' virtual memory) into a physical address.
The most basic way of doing so is called simple relocation and is based on storing the process' base address and adding it to the logical one. This is simple and fast, but causes memory fragmentation due to repeated allocation and deallocation of the processes' memory spaces.
The second mechanism is segmentation, which divides the process' memory space into segments - most commonly 4: text (code), static data, stack and heap. Each segment has its own segment identifier, and data in it is accessed using an intrasegment address, which is added to the base linear address obtained from the segment descriptor (pointed to by the segment identifier). This calculation is done in the segmentation unit. The segment descriptor contains the segment's base linear address, its limit (size) and its access rights.
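
A minimal C sketch of the segmentation-unit calculation described above; the struct layout and the values are illustrative, not the real x86 descriptor format:

	#include <stdint.h>
	#include <stdio.h>

	/* Illustrative segment descriptor: base linear address, limit, access rights */
	struct seg_desc {
	    uint32_t base;    /* base linear address of the segment       */
	    uint32_t limit;   /* highest valid intrasegment offset        */
	    uint32_t rights;  /* read/write/execute, privilege level, ... */
	};

	/* Segmentation unit: linear = descriptor base + intrasegment offset,
	   after checking the offset against the segment limit. */
	static uint32_t to_linear(const struct seg_desc *d, uint32_t offset) {
	    if (offset > d->limit) {
	        fprintf(stderr, "segment limit violation\n");  /* would raise an exception */
	        return 0;
	    }
	    return d->base + offset;
	}

	int main(void) {
	    struct seg_desc data = { 0x00400000, 0xFFFF, 0 };  /* made-up values */
	    printf("linear = 0x%08X\n", to_linear(&data, 0x0123));
	    return 0;
	}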

4 functions of memory management

1. Hardware address relocation

Ensuring the same layout of logical address space for all processes, including multiple processes of the same program (binary image)

2. Protection

Not allowing processes to access data not explicitly allocated to them or that they are not allowed to access. Ensuring that accesses use the permitted access type

3. Dynamic allocation and reallocation

Shrinking and enlarging the size of address space allocated to the process

4. Virtualization

Ensuring the process sees the same amount of memory available, regardless of the actual memory size and memory utilization by other processes

Role of exceptions

Exceptions are a way of signaling important events to the processor during a process' execution. We divide exceptions into asynchronous (not caused by the program being executed) - interrupts, and synchronous (a result of instruction execution) - traps and errors.
Exceptions can be used as event signals - as in the case of interrupts about a finished data transfer, a key press, a mouse move etc., or to report program errors (faults, aborts) - undefined instruction, misaligned memory access, protection rule violation. Traps are used to suspend a program's execution, for example in debugging (instruction stepping).

Role of paging unit

The paging unit is used to translate the virtual address produced by the processor (from the process' virtual memory) into a physical address for the memory. The translation keeps the offset from the virtual address and replaces the virtual page number with the physical page number taken from the page descriptor, which the Translation Lookaside Buffer (TLB) caches from the page tables.
Using the TLB, the paging unit finds the corresponding physical address and checks the access rights and the validity of the descriptor before letting the memory access proceed.
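
A single-level C sketch of that translation (4 KiB pages assumed; the page-descriptor layout is simplified, not any real architecture's):

	#include <stdint.h>
	#include <stdio.h>

	#define OFFSET_BITS 12   /* 4 KiB pages */

	/* Simplified page descriptor: physical page number + validity/rights bits */
	struct pte { uint32_t ppn; int valid; int writable; };

	/* Keep the offset, replace the virtual page number with the physical one
	   taken from the page descriptor (cached by the TLB in hardware). */
	static uint32_t translate(const struct pte *table, uint32_t vaddr) {
	    uint32_t vpn    = vaddr >> OFFSET_BITS;
	    uint32_t offset = vaddr & ((1u << OFFSET_BITS) - 1);
	    if (!table[vpn].valid) return 0;             /* would raise a page fault */
	    return (table[vpn].ppn << OFFSET_BITS) | offset;
	}

	int main(void) {
	    struct pte table[16] = { [2] = { 0x5A, 1, 1 } };   /* VPN 2 -> PPN 0x5A */
	    printf("0x%08X\n", translate(table, (2u << OFFSET_BITS) | 0x123));
	    return 0;
	}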

What is done in hardware/software for virtual memory system

On the hardware level, the modifications depend on the virtual memory solution used. For simple relocation it's the addition of a relocation unit, for segmentation - a segmentation unit, and for paging - a paging unit, all inside the Memory Management Unit.
On the software level, segmentation divides the process' memory space into segments, which is visible in the application programming model. Multiprocessing, which requires dividing the segments (code and heap) further into smaller subsegments, greatly complicates pointer arithmetic and requires time-consuming defragmentation, since repeated allocation/deallocation of memory space causes fragmentation.
Paging on the other hand is invisible to the application programming model, but requires the introduction and allocation of pages at the OS level, as well as their management.

Describe register sets / model of condition operations / flags

Register sets

x86: eight 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP), the instruction pointer EIP, the flags register EFLAGS and the segment registers (CS, DS, SS, ES, FS, GS)

Model of condition operations

Condition operations are based on flag values. Instructions: CMP, TEST and the arithmetic/logic instructions set the flags; conditional jumps (Jcc), SETcc and CMOVcc act on them

Flags

CF (carry), ZF (zero), SF (sign), OF (overflow), plus PF (parity) and AF (auxiliary carry)

Programming model vs. implementation - properties of programming model

what characteristics of the programming model make pipelining possible

For the architecture to be implementable as a pipeline processor, it needs to meet specific requirements which form the base of the pipeline processor concept. These are: fixed instruction length and a small number of instruction formats, memory accessed only through dedicated load/store instructions, simple addressing modes, operations performed register-to-register, and at most one result written per instruction - so every instruction can pass through the same fixed sequence of stages.

Calling convention - function prologue and epilogue

In the x86 C calling convention, before the main part of a routine, there are a few steps that need to be done to ensure proper, safe code execution.
The first of them is saving the caller's EBP and then the initial stack pointer (in EBP), which can then be used to address local variables and passed arguments. Then, if any callee-saved registers such as EBX are used, they have to be saved - easiest by pushing them onto the stack. Every such push must later be matched by a pop, so the stack stays balanced on function return.
After the routine, the function epilogue has to be run, which restores the saved register values. The values pushed onto the stack during the prologue are popped back in reverse order (because of the LIFO nature of the stack), and finally the return instruction is executed to return from the routine.


Last year exam

University/WUT/ECOAR/pictures/att.W0Q0nTKyYiKznEodrKxJHYEVoAZ9xwriKJviRKCeAaU.jpg

1. Simple data types - characteristic, binary representation

numeric, integer (no decimal part): unsigned - natural binary code; signed - two's complement, where the MSB carries a negative weight

2. Describe features of RISC architecture which make it possible to implement a processor as pipeline

Main features of RISC that allow implementing a pipeline processor: fixed instruction length and few instruction formats, memory accessed only via dedicated load/store instructions, simple addressing modes, register-to-register operations, at most one result written per instruction

3. Methods of interacting with I/O devices

There are three main methods of communicating with peripheral devices

  1. Polling - a busy-waiting loop in the executing code that repeatedly checks the device status. Doesn't require any hardware additions, but burns processor time and is only suitable for single-process execution units (see the sketch after this list).
  2. Interrupts - the peripheral announces its ready state to the processor via an interrupt. The exception service routine then receives the data. If the processor is not ready to receive the data, the process is suspended.
    Very small hardware overhead, mainly a software implementation, but causes a lot of switching between 'normal' processes and the exception service, which causes delays.
  3. Direct Memory Access (DMA) - an additional component of the computer whose goal is to communicate with the peripheral and exchange data. The DMA controller receives a whole block of data from the device, placing it directly into memory (without involving the execution unit), and signals the execution unit about the received data by raising an exception. This greatly reduces the number of exceptions the execution unit must serve compared to the previous method.
    Requires additional hardware, but completely disconnects the execution unit from transfer control.
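
A minimal polling sketch in C; the register addresses and the READY bit are hypothetical, only to show the busy-wait structure:

	#include <stdint.h>

	/* Hypothetical memory-mapped device registers (made-up addresses) */
	#define DEV_STATUS (*(volatile uint32_t *)0x40000000)
	#define DEV_DATA   (*(volatile uint32_t *)0x40000004)
	#define READY_BIT  0x1u

	/* Busy-wait until the device reports data ready, then read it */
	uint32_t poll_read(void) {
	    while ((DEV_STATUS & READY_BIT) == 0)
	        ;                                /* processor time is burned here */
	    return DEV_DATA;
	}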

4. Addressing modes

There are multiple addressing modes:

  1. Register direct - mov eax, ebx
  2. Register immediate - mov eax, 123
  3. Register indirect - mov eax, [ebx]
  4. Register indirect with displacement - mov eax, [ebx + 2]
  5. Double register indirect - mov eax, [ebx + ecx]
  6. Absolute direct - mov eax, [memVal]
    For the simplest architecture and basic HLL support, only three are needed - register direct, register immediate and one of register indirects (usually with displacement)

5. Hardware address relocation (definition, purpose, implementation)

Hardware address relocation is the mechanism of translating a software-visible virtual (logical) address into the physical address used by the physical memory. It allows a program to run without manual, per-program memory management, as each program runs in its own virtual address space which is then mapped onto the physical address space.
Its purpose is to simplify memory management at the user-process level, to allow multiple processes to run at the same time and to provide memory protection between processes.
The translation of the virtual address into the physical one happens by adding the internal, virtual address to the program's base address saved in the relocation/base register. With this operation, the memory management hardware is able to calculate the proper physical address for the memory access.

6. Interrupts

Interrupts are a type of exception which can be raised both by software and hardware. If the interrupt is raised at the processor level, its priority level is usually low and the service is handled between instructions; if it is raised at the OS level, its priority is usually the highest and it must be serviced as fast as possible, due to the risk of it being lost. They are serviced by an Interrupt Service Routine (ISR).
Examples of interrupts are: key press, mouse move, disk data transfer finished, network packet received.
During the Interrupt Service Routine, the interrupt priority level is saved in the system status register, and no lower-priority interrupt may interfere with the service routine, only higher-level ones.
During the interrupt service routine, the current processor context is saved (in architectures with a stack - onto the stack; without - into a context save register), and restored after the service has ended.

7. Following x86 assembly routine

	push ebp                ; prologue: save caller's frame pointer
	mov ebp, esp            ; set up the new frame pointer
	mov ebx, [ebp + 8]      ; load the argument
	xor eax, eax            ; result = 0, CF cleared
oclop:
	adc al, 0               ; add the carry from the previous shift
	shr ebx, 1              ; shift the LSB of the argument into CF
	jz fin                  ; done when the argument becomes 0
	jmp oclop
fin:
	adc al, 0               ; add the last shifted-out bit
	pop ebp
	ret

a) The possible C language declaration/prototype of this module

unsigned int ocnt(unsigned int val);

b) The purpose of this routine (the return value) is:

the number of 1's in the binary representation of the input (bit count / population count)
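
A C reconstruction of what the routine computes (my sketch, not part of the exam):

	unsigned int ocnt(unsigned int val)
	{
	    unsigned int count = 0;     /* xor eax, eax                                */
	    while (val != 0) {
	        count += val & 1;       /* the bit that shr shifts out and adc adds up */
	        val >>= 1;              /* shr ebx, 1                                  */
	    }
	    return count;
	}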

c) The unconditional jump may be removed by the following modification:

; original
	;jz fin
	;jmp oclop
; new
	jnz oclop

d) The routine violates the x86 C calling convention because

it uses but does not save EBX register, which is a callee-saved register.

e) To make it compliant with x86 C calling convention, it should be modified by:

pushing EBX onto the stack right after the function prologue and popping it before the epilogue

; prologue
	push ebp
	mov ebp, esp
	push ebx
	mov ebx, [ebp + 8]
	;...
; epilogue
	pop ebx
	pop ebp
	ret