Crates.io | framehop |
lib.rs | framehop |
version | 0.13.1 |
source | src |
created_at | 2022-03-12 22:54:35.257386 |
updated_at | 2024-10-03 20:44:50.358604 |
description | Stack frame unwinding support for various formats |
repository | https://github.com/mstange/framehop/ |
id | 548942 |
size | 254,511 |
Framehop is a stack frame unwinder written in 100% Rust. It produces high-quality stacks at high speed, on multiple platforms and architectures, without an expensive pre-processing step for unwind information. This makes it suitable for sampling profilers.
It currently supports unwinding x86_64 and aarch64, with unwind information formats commonly used on Windows, macOS, Linux and Android.
You give framehop register values, stack memory and unwind data, and framehop produces a list of return addresses.
Framehop can be used in the following scenarios:
samply uses it.
MustNotAllocateDuringUnwind.
As a user of framehop, your responsibilities are the following:
In turn, framehop solves the following problems:
__unwind_info (macOS)
.eh_frame (using .eh_frame_hdr as an index, if available)
.debug_frame
.pdata, .rdata and .xdata (for Windows x86_64)
.debug_frame and for .eh_frame without .eh_frame_hdr.
Framehop is not suitable for debuggers or to implement exception handling. Debuggers usually need to recover all register values for every frame whereas framehop only cares about return addresses. And exception handling needs the ability to call destructors, which is also a non-goal for framehop.
Framehop is so fast that stack walking is a minuscule part of sampling in both scenarios where I've tried it.
In this samply example of profiling a single-threaded Rust application, walking the stack takes a quarter of the time it takes to query macOS for the thread's register values. In another samply example of profiling a Firefox build without frame pointers, the DWARF unwinding takes 4x as long as querying the register values, but is still cheaper overall than the combined cost of thread_suspend + thread_get_state + thread_resume.
In this example of processing a perf.data file, the bottleneck is reading the bytes from disk, rather than stackwalking. With a warm file cache, the cost of stack walking is still comparable to the cost of copying the bytes from the file cache, and most of the stack walking time is spent reading return addresses from the stack bytes.
Framehop achieves this speed in the following ways:
rip, rsp and rbp, and on aarch64 that's lr, sp and fp. All other registers are not needed - in theory they could be used as inputs to DWARF CFI expressions, but in practice they are not.
__unwind_info are only accessed during unwinding, and the binary search happens right inside the original __unwind_info memory. For DWARF unwinding, framehop uses the excellent gimli crate, which was written with performance in mind.
Furthermore, adding a module is fast too because framehop only does minimal up-front parsing and processing - really, the only thing it does is to create the index of FDE offsets for .eh_frame / .debug_frame.
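The index of FDE offsets mentioned above can be pictured as a sorted table that is binary-searched by address. The following is a minimal sketch under stated assumptions, not framehop's actual code: IndexEntry, build_index and lookup are hypothetical names, and the addresses are assumed to be 32-bit offsets relative to the module base.

```rust
/// Hypothetical index entry: where a function starts (relative to the
/// module base) and where its FDE lives inside the unparsed section bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct IndexEntry {
    start_address: u32,
    fde_offset: u32,
}

/// Builds the index once, when a module is added: sort by start address.
fn build_index(mut entries: Vec<IndexEntry>) -> Vec<IndexEntry> {
    entries.sort_by_key(|e| e.start_address);
    entries
}

/// Finds the last entry whose start address is at or before `address`.
/// Returns None if `address` lies before the first indexed function.
fn lookup(index: &[IndexEntry], address: u32) -> Option<IndexEntry> {
    // Number of entries with start_address <= address.
    let count = index.partition_point(|e| e.start_address <= address);
    count.checked_sub(1).map(|i| index[i])
}
```

A lookup only yields a candidate: the caller would still need to parse the FDE at fde_offset and check that its address range really covers the looked-up address.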
Framehop is still a work in progress. Its API is subject to change. The API churn probably won't quieten down at least until we have one or two 32-bit architectures implemented.
That said, framehop works remarkably well on the supported platforms, and is definitely worth a try if you can stomach the frequent API breakages. Please file issues if you run into any trouble or have suggestions.
Eventually I'd like to use framehop as a replacement for Lul in the Gecko profiler (Firefox's built-in profiler). For that we'll also want to add x86 support (for 32-bit Windows and Linux) and EHABI / EXIDX support (for 32-bit ARM Android).
    use framehop::aarch64::{CacheAarch64, UnwindRegsAarch64, UnwinderAarch64};
    use framehop::{ExplicitModuleSectionInfo, FrameAddress, Module, Unwinder};

    let mut cache = CacheAarch64::<_>::new();
    let mut unwinder = UnwinderAarch64::new();

    // Describe the module (binary) whose code we want to unwind through.
    let module = Module::new(
        "mybinary".to_string(),
        0x1003fc000..0x100634000,
        0x1003fc000,
        ExplicitModuleSectionInfo {
            base_svma: 0x100000000,
            text_svma: Some(0x100000b64..0x1001d2d18),
            text: Some(vec![/* __text */]),
            stubs_svma: Some(0x1001d2d18..0x1001d309c),
            stub_helper_svma: Some(0x1001d309c..0x1001d3438),
            got_svma: Some(0x100238000..0x100238010),
            unwind_info: Some(vec![/* __unwind_info */]),
            eh_frame_svma: Some(0x100237f80..0x100237ffc),
            eh_frame: Some(vec![/* __eh_frame */]),
            text_segment_svma: Some(0x1003fc000..0x100634000),
            text_segment: Some(vec![/* __TEXT */]),
            ..Default::default()
        },
    );
    unwinder.add_module(module);

    // Register values at the time of the sample.
    let pc = 0x1003fc000 + 0x1292c0;
    let lr = 0x1003fc000 + 0xe4830;
    let sp = 0x10;
    let fp = 0x20;

    // Mock stack memory: 8-byte words, indexed by address / 8.
    let stack = [
        1, 2, 3, 4, 0x40, 0x1003fc000 + 0x100dc4,
        5, 6, 0x70, 0x1003fc000 + 0x12ca28,
        7, 8, 9, 10, 0x0, 0x0,
    ];
    // Callback that reads one word of stack memory; Err(()) if out of range.
    let mut read_stack = |addr| stack.get((addr / 8) as usize).cloned().ok_or(());

    let mut iter = unwinder.iter_frames(
        pc,
        UnwindRegsAarch64::new(lr, sp, fp),
        &mut cache,
        &mut read_stack,
    );

    // Collect frames until the iterator reports the end of the stack or an error.
    let mut frames = Vec::new();
    while let Ok(Some(frame)) = iter.next() {
        frames.push(frame);
    }

    assert_eq!(
        frames,
        vec![
            FrameAddress::from_instruction_pointer(0x1003fc000 + 0x1292c0),
            FrameAddress::from_return_address(0x1003fc000 + 0x100dc4).unwrap(),
            FrameAddress::from_return_address(0x1003fc000 + 0x12ca28).unwrap()
        ]
    );
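To see where the two return addresses in the assertion come from, it can help to follow the frame pointer chain through the mock stack by hand. The sketch below is an illustration only and does not use framehop or reflect its implementation: read_word and walk_frame_pointers are hypothetical helpers, and the walk assumes the usual aarch64 frame record layout, with the caller's frame pointer at [fp] and the return address at [fp + 8].

```rust
/// Reads one 8-byte word from a mock stack. As in the example above,
/// the word for address `addr` lives at index `addr / 8`.
fn read_word(stack: &[u64], addr: u64) -> Option<u64> {
    stack.get((addr / 8) as usize).copied()
}

/// Hypothetical helper: follows the frame pointer chain starting at `fp`
/// and returns the return addresses it finds, innermost caller first.
fn walk_frame_pointers(stack: &[u64], mut fp: u64) -> Vec<u64> {
    let mut return_addresses = Vec::new();
    while fp != 0 {
        // Frame record: [fp] = caller's fp, [fp + 8] = return address.
        let (Some(next_fp), Some(return_address)) =
            (read_word(stack, fp), read_word(stack, fp + 8))
        else {
            break; // unreadable stack memory: stop walking
        };
        if return_address == 0 {
            break; // a zero return address marks the end of the stack
        }
        return_addresses.push(return_address);
        if next_fp <= fp {
            break; // the chain must move toward higher addresses
        }
        fp = next_fp;
    }
    return_addresses
}
```

Starting from fp = 0x20 on the stack above, the walk reads the record at index 4 and 5, then the one at index 8 and 9, and stops at the zeroed record at index 14 and 15. The lr value is not consulted in this simplified walk.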
Here's a list of articles I found useful during development:
I used these tools very frequently:
llvm-dwarfdump --eh-frame mylib.so to display DWARF unwind information.
llvm-objdump --section-headers mylib.so to display section information.
unwindinfodump mylib.dylib to display compact unwind information. (Install using cargo install --examples macho-unwind-info, see macho-unwind-info.)
Licensed under either of
Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.