Crates.io | bpu_trasher |
lib.rs | bpu_trasher |
version | 0.2.0 |
source | src |
created_at | 2024-10-20 10:03:13.683271 |
updated_at | 2024-10-23 06:32:41.977342 |
description | A tool to trash the branch prediction unit |
homepage | https://github.com/pseitz/bpu_trasher |
repository | https://github.com/pseitz/bpu_trasher |
max_upload_size | |
id | 1416116 |
size | 1,020,999 |
BPU (Branch Predictor Unit) Trasher tries to trash the BPU of a processor by trying different strategies for the different components of the BPU.
This tool is useful to avoid overtraining the branch predictor unit of a processor an getting skewed results in benchmarks.
The exact components of the BPU can vary from processor to processor and are not public information.
Although intuitively, we think of branches as conditional (e.g. if
), for the CPU anything that changes the flow of the program is a branch.
Therefore, there are different types of branches:
Unconditional branches
Instructions: JMP
, CALL
Conditional branches
Instructions:JNE
, JZ
, JE
Indirect Branch
Instructions: JMP [RAX]
Like Conditional branches, but the target address is not known at compile time, e.g. a virtual function call.
Function Call
Instructions: CALL
Return from Function
This is an unconditional branch. It's its own type, since the BPU has a special handling for it.
Instructions: RET
The 2-level adaptive predictor is a common branch predictor used in modern processors.
The 2-level adaptive predictor consists of two tables:
The BHT is a table that stores the history of the last N branches. The PHT is a table that stores the prediction of the last N branches. The BHT is used to index the PHT. The PHT is used to predict the outcome of the branch.
The BTB is a cache that stores the target address of the most the frequently used conditional and unconditional branches.
Depending on the CPU, there may be different branch target buffers for different types of branches. For example, there may be a separate BTB for indirect branches.
The loop counter is a special case of a branch predictor that is used to predict the number of iterations of a loop. An entry in the BTB will contain information if the branch has loop behaviour and if yes, the number of iterations of the loop.
Indirect jumps are jumps that are not directly to a specific address but to an address that is calculated at runtime. Due to polymorphism, the BTB may need to keep multiple targets.
I think in Rust this would be as simple as:
fn call_me(obj: &dyn MyTrait) {
obj.call_me(); // could have multiple targets
}
In order to reliably trash the BPU, we need to understand how the hashing function works.
" The literature recommends that the index into the pattern history table is generated by an XOR combination of the history bits and the branch address. However, my experimental results do not confirm such a design. The indexing function in figure 3.3 may be a more complex hashing function of the history and the branch address, or it may involve the branch target address, BTB entry address or trace cache address.
Since the indexing function is not known, it is impossible to predict whether two branches will use the same entry in the pattern history table. For the same reason, I have not been able to measure the size of the pattern history table, but must rely on rumors in the literature. "
A perceptron is a type of neural network that is used in the BPU to predict the outcome of a branch. It is used in Zen 4 processors.
This is an magnificient ressource: https://www.agner.org/optimize/microarchitecture.pdf
Deep dive into an older architecture: http://www.ece.uah.edu/~milenka/docs/VladimirUzelac.thesis.pdf