Crate rtm
Intel RTM Extensions.
Please note that this crate only works on x86_64 Intel processors, and only on those built after the Broadwell generation.
Basic Intro:
RTM works much like a database transaction. You can read and write memory, but you have to commit the changes. If another thread modifies the same region you are working on, one of the two RTM transactions will abort (the second chronologically).
An RTM transaction can also be cancelled. If you hit a condition that requires rolling back the transaction rather than committing it, you can do so via the abort(x: u8) interface within this library.
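The commit-or-roll-back behaviour described above can be modelled in plain software. The following is only a sketch of the semantics, not hardware RTM and not this crate's API: the names transact and Outcome are hypothetical, and the u8 abort code merely mirrors the argument of abort(x: u8).

```rust
/// Result of the toy transaction (hypothetical type, not part of this crate).
#[derive(Debug, PartialEq)]
enum Outcome {
    Committed,
    Aborted(u8),
}

/// Runs `body` against a private copy of `data`.
/// The copy is written back only if `body` returns Ok; on Err(code) the
/// original data is left untouched, which models a rollback.
fn transact<F>(data: &mut [u64; 8], body: F) -> Outcome
where
    F: FnOnce(&mut [u64; 8]) -> Result<(), u8>,
{
    let mut scratch = *data; // buffered working copy
    match body(&mut scratch) {
        Ok(()) => {
            *data = scratch; // commit: publish every change at once
            Outcome::Committed
        }
        Err(code) => Outcome::Aborted(code), // rollback: discard the copy
    }
}

fn main() {
    let mut balances = [100u64; 8];

    // A transfer that commits.
    let first = transact(&mut balances, |b| {
        b[0] -= 30;
        b[1] += 30;
        Ok(())
    });
    assert_eq!(first, Outcome::Committed);

    // A transfer that hits a condition and aborts with code 7.
    let second = transact(&mut balances, |b| {
        b[0] = 0; // tentative write, never published
        if b[1] > 120 {
            return Err(7);
        }
        Ok(())
    });
    assert_eq!(second, Outcome::Aborted(7));
    assert_eq!(balances[0], 70); // the tentative write was rolled back

    println!("{:?} then {:?}", first, second);
}
```

Real RTM differs in that the hardware does the buffering and can abort the transaction on its own, for example on a conflict with another thread.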
Deep Dive:
Now we need to take a deep dive into RTM and its implementation. RTM works at the cache-line level. This means RTM treats each region it tracks as belonging exclusively to a cache line. Each cache line on Intel CPUs is 64 bytes, so you will want to ensure that the data structures being modified WITHIN RTM transactions have a size that is a whole multiple of 64 bytes, that is, X * 64 == size_of::<T>() or 0 == size_of::<T>() % 64. At the same time you will want to ensure the allocation sits on a 64-byte boundary (this is called alignment), which simply means &T % 64 == 0 (the address held by the pointer).
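The size and alignment rules above can be checked directly in Rust. This is a minimal sketch: the Slot type and the is_cache_line_friendly helper are hypothetical names used for illustration, and the 64-byte line size is the figure given in the text.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical payload type. `align(64)` forces 64-byte alignment and
/// rounds the size up to a multiple of 64.
#[repr(C, align(64))]
struct Slot {
    values: [u64; 6], // 48 bytes
    len: u32,         // 4 bytes, the rest of the line is padding
}

/// True when both rules from the text hold for `value`:
/// size_of::<T>() % 64 == 0 and the address is a multiple of 64.
fn is_cache_line_friendly<T>(value: &T) -> bool {
    let addr = value as *const T as usize;
    size_of::<T>() % 64 == 0 && addr % 64 == 0
}

fn main() {
    // Box respects the type's alignment, so the heap allocation is aligned too.
    let slot = Box::new(Slot { values: [0; 6], len: 0 });

    assert_eq!(align_of::<Slot>(), 64);
    assert_eq!(size_of::<Slot>(), 64);
    assert!(is_cache_line_friendly(&*slot));

    // A 65-byte array spills into a second line, so it fails the size rule.
    assert!(!is_cache_line_friendly(&[0u8; 65]));

    println!("len = {}, first value = {}", slot.len, slot.values[0]);
}
```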
The reason for this is false sharing. If a different thread modifies a cache line that your RTM transaction is using, your transaction may abort, reducing your performance.
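One common way to avoid false sharing is to give each thread's data its own cache line. The sketch below uses plain atomics rather than RTM; Padded and count_in_parallel are hypothetical names, and the 64-byte line size is the one stated in the text.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// One counter per cache line: `align(64)` pads the 8-byte atomic to 64 bytes,
/// so neighbouring counters in a Vec never share a line.
#[repr(align(64))]
struct Padded(AtomicU64);

/// Each thread increments only its own padded counter `iters` times.
fn count_in_parallel(threads: usize, iters: u64) -> Vec<u64> {
    let counters: Vec<Padded> = (0..threads)
        .map(|_| Padded(AtomicU64::new(0)))
        .collect();

    // Scoped threads may borrow `counters`; all are joined when the scope ends.
    thread::scope(|s| {
        for counter in &counters {
            s.spawn(move || {
                for _ in 0..iters {
                    counter.0.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    counters.iter().map(|c| c.0.load(Ordering::Relaxed)).collect()
}

fn main() {
    assert_eq!(std::mem::size_of::<Padded>(), 64);
    let totals = count_in_parallel(4, 10_000);
    assert_eq!(totals, vec![10_000; 4]);
    println!("{:?}", totals);
}
```

Without the padding, eight such counters would fit in a single 64-byte line and every increment would contend for it.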
RTM works via the MESIF protocol, which defines the states a cache line can be in: M (Modified), E (Exclusive), S (Shared), I (Invalid), and F (Forward). Effectively, RTM attempts to ensure that all the reads and writes you perform are on E/F (Exclusive/Forward) lines. This means you either own the only copy of the data in cache, OR another thread may read the data but not write to it.
If another thread attempts to write to a cache line during the RTM transaction, the status of your cache line will change E -> S or F -> I. If the other thread is not executing RTM code, your transaction will abort.
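The five MESIF states and the E/F condition described above can be written down as a small toy model. This only restates the text's description as code; it does not query or reflect real hardware state, and LineState and safe_for_transaction are hypothetical names.

```rust
/// The five MESIF cache-line states named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LineState {
    Modified,
    Exclusive,
    Shared,
    Invalid,
    Forward,
}

impl LineState {
    /// Per the description above, a transaction wants its lines in the
    /// Exclusive or Forward state.
    fn safe_for_transaction(self) -> bool {
        matches!(self, LineState::Exclusive | LineState::Forward)
    }
}

fn main() {
    let all = [
        LineState::Modified,
        LineState::Exclusive,
        LineState::Shared,
        LineState::Invalid,
        LineState::Forward,
    ];
    for state in all {
        println!("{:?}: {}", state, state.safe_for_transaction());
    }
}
```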
Architecture Notes:
RTM changes are buffered in the L1 cache, so too many changes in one transaction can result in severe performance penalties.
RTM changes act as a full instruction barrier, but they are not the same as an mfence, sfence, or lfence instruction (the barrier applies only to the local cache lines affected by an RTM transaction).
Performance Notes:
For modification of a single cache line, AtomicUsize or AtomicPtr will be faster, even in SeqCst mode. RTM transactions are typically faster for larger transactions, on the order of several cache lines (typically >300 bytes or so).
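For the single-word case mentioned above, a compare-and-swap loop on an AtomicUsize already gives an all-or-nothing update without any transaction. The sketch below uses only the standard library; try_reserve is a hypothetical helper, not part of this crate.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Atomically increments `slots` unless it has already reached `limit`.
/// Returns true if a slot was reserved.
fn try_reserve(slots: &AtomicUsize, limit: usize) -> bool {
    let mut current = slots.load(Ordering::SeqCst);
    loop {
        if current >= limit {
            return false; // nothing was written, so there is nothing to undo
        }
        match slots.compare_exchange(current, current + 1, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(_) => return true,            // our update was published
            Err(actual) => current = actual, // another thread won; retry with its value
        }
    }
}

fn main() {
    let slots = AtomicUsize::new(0);
    assert!(try_reserve(&slots, 2));
    assert!(try_reserve(&slots, 2));
    assert!(!try_reserve(&slots, 2)); // limit reached
    println!("reserved = {}", slots.load(Ordering::SeqCst));
}
```

The retry loop plays the role of re-running an aborted transaction, but it only covers a single machine word, which is why larger multi-line updates are where RTM becomes interesting.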
Reexports
pub use tsx::*;
Modules
tsx: Raw extension bindings
Enums
Abort: Why the transaction aborted