| Crates.io | rdump |
|---|---|
| lib.rs | rdump |
| version | 0.1.7 |
| created_at | 2025-07-31 20:04:25.359567+00 |
| updated_at | 2025-10-22 01:32:08.840189+00 |
| description | A fast, expressive, and language-aware file search tool. |
| homepage | |
| repository | https://github.com/almaclaine/rdump |
| max_upload_size | |
| id | 1775642 |
| size | 337,256 |
# rdump: The Definitive Developer's Guide to Code-Aware Search

rdump is a next-generation command-line tool for developers. It finds and processes files by combining filesystem metadata, content matching, and deep structural code analysis.
It's a developer's Swiss Army knife for code discovery. It goes beyond the text-based search of tools like grep and ripgrep by using tree-sitter to parse your code into a syntax tree. This lets you ask structural questions that text-based tools cannot answer efficiently.
rdump is written in Rust for blazing-fast performance, ensuring that even complex structural queries on large codebases are executed in moments.
## Why rdump? A Comparative Look

For decades, developers have relied on text-based search tools like grep, ack, and ripgrep. These tools are phenomenal for finding literal strings and regex patterns. However, they share a fundamental limitation: they don't understand code. They see a file as a flat sequence of characters.
This leads to noisy and inaccurate results for code-related questions. A grep for `User` will find:

- the `struct User` definition,
- identifiers such as `NewUser` and `user_permission`,
- other mentions of `User`,
- the string `"Failed to create User"`.

### The rdump Solution: Structural Awareness

rdump sees code the way a compiler does: as a structured tree of nodes. It uses the powerful tree-sitter library to parse source code into a Concrete Syntax Tree (CST).
This means you can ask for struct:User, and rdump will navigate the syntax tree to find only the node representing the definition of the User struct. This is a paradigm shift in code search.
| Feature | ripgrep / grep | semgrep | rdump |
|---|---|---|---|
| Search Paradigm | Regex / Literal Text | Abstract Semantic Patterns | Metadata + Content + Code Structure |
| Primary Use Case | Finding specific lines of text | Enforcing static analysis rules | Interactive code exploration & filtering |
| Speed | Unmatched for text search | Fast for patterns | Very fast; optimizes by layer |
| Query `func:foo` | `grep "func foo"` (noisy) | `pattern: function foo(...)` | `func:foo` (precise) |
| Query `size:>10kb` | No | No | `size:>10kb` (built-in) |
| Query `import:react` | `grep "import.*react"` (noisy) | `pattern: import ... from "react"` | `import:react` (precise) |
| Combine Filters | Possible via shell pipes | Limited | **Natively via RQL (`&`, `\|`, `!`)** |
## Architecture

rdump's power and simplicity are not accidental; they result from deliberate architectural choices and from best-in-class libraries in the Rust ecosystem. This section details how these pieces fit together to create a performant, modular, and extensible tool.
At its heart, rdump is a highly optimized pipeline. It starts with a massive set of potential files and, at each stage, applies progressively more powerful (and expensive) filters to narrow down the set.
```text
[Query String] -> [1. CLI Parser (clap)] -> [2. RQL Parser (pest)] -> [AST]
    -> [3. Evaluator Engine] -> [Matched Files] -> [6. Formatter (syntect)] -> [Final Output]
              |
              V
    [4. Predicate Trait System]
              |
              +------> [Metadata Predicates]
              |
              +------> [Content Predicates]
              |
              +------> [5. Semantic Engine (tree-sitter)]
```
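The heart of the pipeline, the evaluator walking the AST and dispatching to predicates (steps 3 and 4 in the diagram), can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not rdump's source: the `FileInfo` struct, the `Predicate` trait signature, the simplified `AstNode` variants (separate `And`/`Or` instead of a `LogicalOp`), and the `HashMap` of `Box<dyn Predicate>` are all hypothetical stand-ins. It shows the recursive walk and how `&&` gives short-circuiting for free.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical stand-in for what a predicate can inspect about a file.
struct FileInfo {
    path: String,
    content: String,
}

/// Hypothetical trait: one implementation per predicate key.
trait Predicate {
    fn evaluate(&self, file: &FileInfo, value: &str) -> bool;
}

/// Cheap metadata predicate, like `ext:rs` (case-insensitive).
struct ExtPredicate;
impl Predicate for ExtPredicate {
    fn evaluate(&self, file: &FileInfo, value: &str) -> bool {
        Path::new(&file.path)
            .extension()
            .and_then(|e| e.to_str())
            .map_or(false, |e| e.eq_ignore_ascii_case(value))
    }
}

/// Content predicate, like `contains:...` (literal substring, no regex).
struct ContainsPredicate;
impl Predicate for ContainsPredicate {
    fn evaluate(&self, file: &FileInfo, value: &str) -> bool {
        file.content.contains(value)
    }
}

/// Simplified AST (hypothetical; rdump's real `AstNode` differs).
enum AstNode {
    Predicate { key: String, value: String },
    Not(Box<AstNode>),
    And(Box<AstNode>, Box<AstNode>),
    Or(Box<AstNode>, Box<AstNode>),
}

struct Evaluator {
    predicates: HashMap<&'static str, Box<dyn Predicate>>,
}

impl Evaluator {
    fn new() -> Self {
        let mut predicates: HashMap<&'static str, Box<dyn Predicate>> = HashMap::new();
        predicates.insert("ext", Box::new(ExtPredicate));
        predicates.insert("contains", Box::new(ContainsPredicate));
        Evaluator { predicates }
    }

    /// Recursively walks the tree; unknown keys evaluate to false in this sketch.
    fn eval(&self, node: &AstNode, file: &FileInfo) -> bool {
        match node {
            AstNode::Predicate { key, value } => self
                .predicates
                .get(key.as_str())
                .map_or(false, |p| p.evaluate(file, value)),
            AstNode::Not(inner) => !self.eval(inner, file),
            // `&&` and `||` short-circuit: the right child runs only if needed.
            AstNode::And(l, r) => self.eval(l, file) && self.eval(r, file),
            AstNode::Or(l, r) => self.eval(l, file) || self.eval(r, file),
        }
    }
}

/// Helper to build a boxed predicate leaf.
fn pred(key: &str, value: &str) -> Box<AstNode> {
    Box::new(AstNode::Predicate { key: key.into(), value: value.into() })
}

fn main() {
    // ext:rs & contains:"struct User"
    let query = AstNode::And(pred("ext", "rs"), pred("contains", "struct User"));
    let files = [
        FileInfo { path: "src/user.rs".into(), content: "pub struct User;".into() },
        FileInfo { path: "README.md".into(), content: "struct User".into() },
    ];
    let evaluator = Evaluator::new();
    for f in files.iter().filter(|f| evaluator.eval(&query, f)) {
        println!("{}", f.path); // prints only src/user.rs
    }
}
```

Adding a predicate in this model means writing one more `impl Predicate` and one `insert` call; the evaluator itself does not change.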
### 1. CLI Parser (clap)

`clap` (Command Line Argument Parser) is the face of rdump. It provides a declarative, macro-based API to define the entire CLI structure: subcommands (`search`, `lang`, `preset`), flags (`--format`, `-C`), and arguments (`<QUERY>`).

- `rdump --help` is generated for free, perfectly in sync with the defined CLI.
- CLI dispatch happens in the `main` function, providing a single, clear entry point to the application's logic.

### 2. RQL Parser (pest)

`pest` (Parser-Expressive Syntax Trees) transforms the human-readable RQL query string (e.g., `"ext:rs & (struct:User | !path:tests)"`) into a machine-readable Abstract Syntax Tree (AST).

- The grammar lives in a separate file (`src/rql.pest`). This allows the language syntax to evolve independently of the Rust code that processes it.
- pest generates a robust parser with excellent, human-readable error messages out of the box (e.g., "error: expected logical_op, found...").
- The parse tree produced by pest is what the `build_ast_from_pairs` function in `src/parser.rs` recursively walks to build the `AstNode` enum (e.g., `AstNode::LogicalOp(...)`).

### 3. Evaluator Engine

The evaluator takes the AST from pest and a list of candidate files, and returns only the files that match the query.

- It recursively walks the `AstNode` tree. If it sees a `LogicalOp`, it calls itself on the left and right children. If it sees a `Predicate`, it dispatches to the predicate system.
- It short-circuits: for `ext:rs & struct:User`, if `ext:rs` returns false, the evaluator immediately stops and does not execute the expensive `struct:User` predicate. This is a critical performance optimization.

### 4. Predicate Trait System (trait objects)

This is the key to rdump's modularity. Each predicate (`ext`, `size`, `contains`, `func`, etc.) is an independent module that implements a common `Predicate` trait.

- The evaluator holds the predicates as `Box<dyn Predicate>`. When it encounters a predicate key in the AST, it dynamically finds and executes the correct predicate's `evaluate()` method.
- To add a new predicate, such as `author:<name>`, a developer simply needs to:
  1. Create `src/predicates/author.rs`.
  2. Implement the `Predicate` trait for an `AuthorPredicate` struct.

### 5. Semantic Engine (tree-sitter)

The semantic engine is built on `tree-sitter` and its Rust binding. tree-sitter is the universal parser that powers all code-aware predicates: it takes source code text and produces a concrete syntax tree.

- Each code-aware predicate runs a tree-sitter query against a file's syntax tree.
- Each language needs a tree-sitter grammar (as a crate) and `.scm` files containing tree-sitter queries (e.g., `(function_definition name: (identifier) @func-name)`).
- Adding `func` support for a new language therefore involves writing a one-line query in a text file, not writing complex Rust code to traverse a language-specific AST.

### Parallelism (rayon)

`rayon` is the secret sauce for rdump's performance on multi-core machines. While the evaluator processes a single query, the file search itself is a massively parallel problem, and rayon provides simple, data-parallel iterators.

- With rayon, converting a sequential iterator over files into a parallel one is often a one-line change (e.g., `files.iter()` becomes `files.par_iter()`). rayon handles thread pooling, work-stealing, and synchronization automatically.
- Rust's type system and rayon's design guarantee that this parallelism is memory-safe, preventing data races at compile time.
- This lets rdump scale its performance with the number of available CPU cores, making it fast on modern hardware when searching large numbers of files.

### 6. Formatter (syntect)

`syntect` uses the same syntax and theme definitions as Sublime Text, providing robust, accurate, and beautiful highlighting for a vast number of languages.

- `SYNTAX_SET` and `THEME_SET` are wrapped in `once_cell::sync::Lazy`, so they are loaded and parsed only once, on first use, and every later highlighting call in the same process reuses them.
- A `Format` enum allows the `print_output` function to act as a clean dispatcher, routing to different printing functions (`print_highlighted_content`, `print_markdown_fenced_content`, etc.) based on the user's choice. This keeps the presentation logic clean and separated.

## Installation

If you have the Rust toolchain (`rustup`), you can install directly from Crates.io.
This ensures you have the latest version.
```bash
cargo install rdump
```
Pre-compiled binaries for Linux, macOS, and Windows are available on the GitHub Releases page. Download the appropriate archive, extract the rdump executable, and place it in a directory on your system's PATH.
To build rdump from source, you'll need git and the Rust toolchain.
```bash
git clone https://github.com/almaclaine/rdump.git
cd rdump
cargo build --release
# The executable will be at ./target/release/rdump
./target/release/rdump --help
```
## Examples

```bash
rdump "str:/[A-Za-z0-9_\\-]{20,}/ & !path:test"
```

```bash
rdump "(comment:ignore | comment:skip) & name:*test*"
```
Find `SELECT ... FROM` strings outside the `db` or `repository` package:

```bash
rdump "str:/SELECT.*FROM/ & !(path:/db/ | path:/repository/)"
```
```bash
rdump "call:process_payment" --format hunks -C 3
```

```bash
rdump "ext:go & size:>50kb" --format find
```
```bash
# This is a two-step process, but rdump helps find the candidates
rdump "ext:py & func:." --format json > funcs.json
# Then, a script could check which function names from funcs.json are never found with a `call:` query.
```
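The second step can be scripted. Below is a rough Rust sketch, assuming rdump is on your `PATH` and that you have already extracted the function names from `funcs.json` (its JSON layout isn't described here, so that step is omitted and the names in `main` are hypothetical). It uses only the `call:` predicate and the `--format paths` flag from this guide; a name whose query prints no paths is a dead-code *candidate*, since `call:` matches by name and cannot tell same-named functions apart.

```rust
use std::process::Command;

/// Builds the RQL query for call sites of `name` in Python files.
fn call_query(name: &str) -> String {
    format!("ext:py & call:{name}")
}

/// Runs rdump and reports whether it printed no matching paths.
/// Returns None if rdump could not be started. rdump's exit-code
/// conventions are not documented here, so only stdout is inspected;
/// an invalid query would therefore also look like "no call sites".
fn has_no_call_sites(name: &str) -> Option<bool> {
    let output = Command::new("rdump")
        .arg(call_query(name))
        .args(["--format", "paths"])
        .output()
        .ok()?;
    Some(String::from_utf8_lossy(&output.stdout).trim().is_empty())
}

fn main() {
    // Hypothetical names; in practice they would come from funcs.json.
    for name in ["load_settings", "legacy_export"] {
        match has_no_call_sites(name) {
            Some(true) => println!("{name}: no call sites found (dead-code candidate)"),
            Some(false) => println!("{name}: called somewhere"),
            None => eprintln!("could not run rdump for {name}"),
        }
    }
}
```

Passing the query as a single argument through `Command` avoids the shell quoting issues that arise when building the same loop in a shell script.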
```bash
rdump "ext:rs & (struct:. | enum:.) & !path:tests"
```

```bash
rdump "contains:APP_PORT"
```
Find all functions in the `api/` directory:

```bash
rdump "path:src/api/ & func:."
```
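```bash
# contains: is a literal substring search, so no regex delimiters are needed
rdump "name:Dockerfile & !contains:'@sha256:'"
```

```bash
rdump "ext:toml & size:>1kb & modified:<2d" --format find
```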
Delete `.tmp` files older than a week:

```bash
rdump "ext:tmp & modified:>7d" --format paths | xargs rm -v
```
```bash
# This is an approximation, but effective.
# size: measures the whole file, so this finds files over 1200 bytes
# that define at least one function, a rough proxy for long functions.
rdump "func:. & size:>1200b"
```
Find Go `GET` endpoints that are missing a call to an authentication middleware:

```bash
rdump "ext:go & func:/^Get/ & !call:requireAuth"
```
```bash
# Regex patterns go through matches:, while contains: takes literal text
rdump "(str:. | matches:'/ \d+;/') & !contains:'const ' & !contains:'let ' & !contains:'var '"
```
## rdump Query Language (RQL): A Deep Dive

(This section is intentionally verbose for complete clarity.)
- A predicate is a `key:value` pair (e.g., `ext:rs`).
- `&` (AND) and `|` (OR) combine predicates. Precedence is `!` > `&` > `|`, so wrap groups in parentheses when in doubt.
- `!` negates a predicate or group (e.g., `!ext:md`).
- `()` controls the order of operations (e.g., `ext:rs & (contains:foo | contains:bar)`).
- Use `'` or `"` for values with spaces or special characters (e.g., `contains:'fn main()'`).

rdump is fast, but you can make it even faster by writing efficient queries. The key is to eliminate the most files with the cheapest predicates first.
**Efficient:**

```text
ext:rs & struct:User
```

rdump first finds all `.rs` files (very cheap), then runs the expensive `struct` parser only on that subset.

**Less efficient:**

```text
struct:User & ext:rs
```

While rdump's engine is smart enough to likely reorder this, writing it this way is logically less efficient: it implies parsing every file to look for a struct, then checking its extension.

**Even better:**

```text
path:models/ & ext:rs & struct:User
```
Golden Rule: Always lead with path:, name:, or ext: if you can.
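Why the order matters can be shown with a toy model of left-to-right short-circuiting, the behaviour the evaluator uses for `&`. This sketch is illustrative only: the two closures are hypothetical stand-ins for `ext:rs` (a suffix check) and `struct:User` (which here merely counts how often it runs instead of parsing anything), and rdump may reorder predicates itself, as noted above.

```rust
use std::cell::Cell;

/// Evaluates a two-predicate AND over `paths` in the given order and
/// returns (matches, number of times the expensive predicate ran).
fn run(paths: &[&str], cheap_first: bool) -> (usize, u32) {
    let parses = Cell::new(0u32);
    // Cheap metadata check, standing in for `ext:rs`.
    let ext_rs = |p: &str| p.ends_with(".rs");
    // Expensive check, standing in for `struct:User`; here it only counts calls.
    let struct_user = |_p: &str| {
        parses.set(parses.get() + 1);
        true
    };
    let matches = paths
        .iter()
        .filter(|&&p| {
            if cheap_first {
                ext_rs(p) && struct_user(p) // ext:rs & struct:User
            } else {
                struct_user(p) && ext_rs(p) // struct:User & ext:rs
            }
        })
        .count();
    (matches, parses.get())
}

fn main() {
    // Hypothetical project: one Rust file among 1,000 files.
    let mut paths = vec!["src/user.rs".to_string()];
    paths.extend((1..1000).map(|i| format!("docs/page{i}.md")));
    let refs: Vec<&str> = paths.iter().map(String::as_str).collect();

    let (m, n) = run(&refs, true);
    println!("ext:rs & struct:User -> {m} match, {n} expensive check(s)");
    let (m, n) = run(&refs, false);
    println!("struct:User & ext:rs -> {m} match, {n} expensive check(s)");
}
```

Both orders return the same single match, but the cheap-first order runs the expensive check once instead of 1,000 times.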
Predicates are the core of RQL. They are grouped into three categories based on what they inspect.
### Metadata Predicates

These predicates operate on filesystem metadata and are extremely fast. Always use them first in your query to narrow the search space.

| Key | Example | Description |
|---|---|---|
| `ext` | `ext:ts` | Matches the file extension. Case-insensitive. |
| `name` | `name:"*_test.go"` | Matches the filename (the part after the last `/` or `\`) against a glob pattern. |
| `path` | `path:src/api` | Matches if the given substring appears anywhere in the full relative path of the file. |
| `in` | `in:"src/commands"` | Matches all files that are descendants of the given directory. |
| `size` | `size:>=10kb` | Filters by file size. Operators: `>`, `<`, `>=`, `<=`, `=`. Units: `b`, `kb`, `mb`, `gb`. |
| `modified` | `modified:<2d` | Filters by last modification time relative to now. Units: `m` (minutes), `h` (hours), `d` (days), `w` (weeks), `y` (years). |
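The `size:` and `modified:` values combine a comparison operator, a number, and a unit. The following minimal parsing sketch is not rdump's parser; it assumes binary units (1 kb = 1024 bytes), 1 y = 365 days, and a mandatory operator, none of which this guide specifies. The `Cmp` enum and all function names are hypothetical.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Cmp { Gt, Lt, Ge, Le, Equal }

/// Strips a leading comparison operator; two-character operators are tried first.
fn split_op(s: &str) -> Option<(Cmp, &str)> {
    let ops = [(">=", Cmp::Ge), ("<=", Cmp::Le), (">", Cmp::Gt), ("<", Cmp::Lt), ("=", Cmp::Equal)];
    for (prefix, op) in ops {
        if let Some(rest) = s.strip_prefix(prefix) {
            return Some((op, rest));
        }
    }
    None
}

/// Splits "10kb" into (10, "kb").
fn split_num(s: &str) -> Option<(u64, &str)> {
    let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let n: u64 = s[..end].parse().ok()?;
    Some((n, &s[end..]))
}

/// Parses a `size:` value such as ">=10kb" into (Ge, 10240), assuming 1 kb = 1024 bytes.
fn parse_size(s: &str) -> Option<(Cmp, u64)> {
    let (op, rest) = split_op(s)?;
    let (n, unit) = split_num(rest)?;
    let mult: u64 = match unit.to_ascii_lowercase().as_str() {
        "b" => 1,
        "kb" => 1 << 10,
        "mb" => 1 << 20,
        "gb" => 1 << 30,
        _ => return None,
    };
    Some((op, n.checked_mul(mult)?))
}

/// Parses a `modified:` value such as "<2d" into (Lt, 2 days), assuming 1 y = 365 days.
fn parse_age(s: &str) -> Option<(Cmp, Duration)> {
    let (op, rest) = split_op(s)?;
    let (n, unit) = split_num(rest)?;
    let secs: u64 = match unit {
        "m" => 60,
        "h" => 60 * 60,
        "d" => 24 * 60 * 60,
        "w" => 7 * 24 * 60 * 60,
        "y" => 365 * 24 * 60 * 60,
        _ => return None,
    };
    Some((op, Duration::from_secs(n.checked_mul(secs)?)))
}

/// Applies the operator; for `modified:`, `actual` is the file's age in seconds.
fn compare(op: Cmp, actual: u64, wanted: u64) -> bool {
    match op {
        Cmp::Gt => actual > wanted,
        Cmp::Lt => actual < wanted,
        Cmp::Ge => actual >= wanted,
        Cmp::Le => actual <= wanted,
        Cmp::Equal => actual == wanted,
    }
}

fn main() {
    if let Some((op, bytes)) = parse_size(">=10kb") {
        // A 12,000-byte file satisfies size:>=10kb under this convention.
        println!("{op:?} {bytes} bytes: {}", compare(op, 12_000, bytes));
    }
    if let Some((op, age)) = parse_age("<2d") {
        // A file modified 3 hours ago satisfies modified:<2d.
        println!("{op:?} {age:?}: {}", compare(op, 3 * 3600, age.as_secs()));
    }
}
```

Reading `modified:<2d` as "age less than two days" is what makes `<` mean "recent" and `>` mean "old", matching the cookbook's `modified:>7d` cleanup recipe.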
### Content Predicates

These predicates inspect the raw text content of a file. They are slower than metadata predicates but faster than code-aware ones.

| Key | Example | Description |
|---|---|---|
| `contains` | `contains:"// HACK"` | Fast literal substring search. It does not support regular expressions. |
| `matches` | `matches:"/user_[a-z]+/"` | Slower but powerful regex search. The value must be a valid regular expression. |
### Code-Aware Predicates

These predicates are rdump's most powerful feature. They parse the code with tree-sitter to understand its structure. They are also the most expensive predicates, so use them after narrowing the search with metadata and content predicates.

| Key | Example | Description |
|---|---|---|
| `def` | `def:User` | Finds a generic definition (e.g., a class in Python, a struct in Rust, a type in Go). |
| `func` | `func:get_user` | Finds a function or method definition. |
| `import` | `import:serde` | Finds an import, use, or require statement. |
| `call` | `call:println` | Finds a function or method call site. |
| `comment` | `comment:TODO` | Finds text within any code comment (`//`, `#`, `/* ... */`, etc.). |
| `str` | `str:"api_key"` | Finds text only inside a string literal (e.g., `"api_key"` or `'api_key'`). Much more precise than `contains`. |
| `class` | `class:ApiHandler` | Finds a class definition. |
| `struct` | `struct:Point` | Finds a struct definition (primarily for Rust/Go). |
| `enum` | `enum:Status` | Finds an enum definition. |
| `interface` | `interface:Serializable` | Finds an interface definition (primarily for Go/TypeScript/Java). |
| `trait` | `trait:Runnable` | Finds a trait definition (primarily for Rust). |
| `type` | `type:UserID` | Finds a type alias definition. |
| `impl` | `impl:User` | Finds an impl block (Rust). |
| `macro` | `macro:println` | Finds a macro definition or invocation (Rust). |
| `component` | `component:Button` | React: Finds a JSX element definition (e.g., `<Button ... />`). |
| `element` | `element:div` | React: Finds a specific JSX element by its tag name (e.g., `<div>`). |
| `hook` | `hook:useState` | React: Finds a call to a standard React hook. |
| `customhook` | `customhook:useAuth` | React: Finds a call to a custom hook (a function starting with `use`). |
| `prop` | `prop:onClick` | React: Finds a JSX prop (attribute) being passed to a component. |
**The "Match All" Wildcard:** Using a single dot `.` as a value for a predicate means "match any value". This is useful for checking for the existence of a node type.

- `rdump "ext:rs & struct:."`: Find all Rust files that contain any struct definition.
- `rdump "ext:py & !import:."`: Find all Python files that have no import statements.

**Searching for Absence:** The `!` operator is very powerful when combined with the wildcard.

- `rdump "ext:js & !func:."`: Find JavaScript files that contain no functions (e.g., pure data/config files).

**Escaping Special Characters:** If you need to search for a literal quote, you can escape it.

- `rdump "str:'hello \'world\''"`: Finds string literals containing `hello 'world'`.

**Negating Groups:** Find Rust files that are not in the `tests` or `benches` directory.

```bash
rdump "ext:rs & !(path:tests/ | path:benches/)"
```

**Distinguishing Content Types:** `contains:"foo"` finds `foo` anywhere. `str:"foo"` finds `foo` only inside a string literal, which is much more precise.

**Forcing Evaluation Order:** Use parentheses to ensure logical correctness for complex queries.

```bash
# Find JS or TS files that either import React or define a 'Component' class
rdump "(ext:js | ext:ts) & (import:react | class:Component)"
```

**Filtering OR Groups:** Because `&` binds tighter than `|`, wrap OR chains in parentheses before applying a shared filter.

```bash
rdump "(in:src/frontend/**/* | in:src/backend/**/*) & !ext:ico"
```
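The precedence rule `!` > `&` > `|` is exactly what the parentheses in the last example guard against. To make it concrete, here is a small recursive-descent sketch with one function per precedence level. It is an illustration only: rdump itself uses a pest grammar (`src/rql.pest`), and this hand-written parser assumes predicates are whitespace-free tokens with no quoted values. The `Expr`, `Parser`, `parse`, and `show` names are hypothetical.

```rust
/// Parse tree for a tiny RQL subset (no quoted values).
enum Expr {
    Pred(String),
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

struct Parser<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
}

impl<'a> Parser<'a> {
    /// Skips whitespace and returns the next significant character.
    fn peek(&mut self) -> Option<char> {
        while self.chars.peek().map_or(false, |c| c.is_whitespace()) {
            self.chars.next();
        }
        self.chars.peek().copied()
    }

    // or_expr := and_expr ('|' and_expr)*    (lowest precedence)
    fn or_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.and_expr()?;
        while self.peek() == Some('|') {
            self.chars.next();
            lhs = Expr::Or(Box::new(lhs), Box::new(self.and_expr()?));
        }
        Ok(lhs)
    }

    // and_expr := not_expr ('&' not_expr)*
    fn and_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.not_expr()?;
        while self.peek() == Some('&') {
            self.chars.next();
            lhs = Expr::And(Box::new(lhs), Box::new(self.not_expr()?));
        }
        Ok(lhs)
    }

    // not_expr := '!' not_expr | atom         (highest precedence)
    fn not_expr(&mut self) -> Result<Expr, String> {
        if self.peek() == Some('!') {
            self.chars.next();
            return Ok(Expr::Not(Box::new(self.not_expr()?)));
        }
        self.atom()
    }

    // atom := '(' or_expr ')' | predicate token
    fn atom(&mut self) -> Result<Expr, String> {
        if self.peek() == Some('(') {
            self.chars.next();
            let inner = self.or_expr()?;
            if self.peek() != Some(')') {
                return Err("expected ')'".to_string());
            }
            self.chars.next();
            return Ok(inner);
        }
        let mut word = String::new();
        while let Some(&c) = self.chars.peek() {
            if c.is_whitespace() || "()!&|".contains(c) {
                break;
            }
            word.push(c);
            self.chars.next();
        }
        if word.is_empty() { Err("expected a predicate".to_string()) } else { Ok(Expr::Pred(word)) }
    }
}

/// Parses a whole query, rejecting leftover input.
fn parse(src: &str) -> Result<Expr, String> {
    let mut p = Parser { chars: src.chars().peekable() };
    let expr = p.or_expr()?;
    match p.peek() {
        None => Ok(expr),
        Some(c) => Err(format!("unexpected '{c}'")),
    }
}

/// Renders the tree with explicit parentheses so the grouping is visible.
fn show(e: &Expr) -> String {
    match e {
        Expr::Pred(p) => p.clone(),
        Expr::Not(x) => format!("!{}", show(x)),
        Expr::And(a, b) => format!("({} & {})", show(a), show(b)),
        Expr::Or(a, b) => format!("({} | {})", show(a), show(b)),
    }
}

fn main() {
    // Without parentheses, `& !ext:ico` binds only to the second `in:` predicate.
    println!("{}", show(&parse("in:src/frontend | in:src/backend & !ext:ico").unwrap()));
    println!("{}", show(&parse("(in:src/frontend | in:src/backend) & !ext:ico").unwrap()));
}
```

The first query parses as `(in:src/frontend | (in:src/backend & !ext:ico))`, so `.ico` files under `src/frontend` would still match; the parenthesized form applies the filter to both directories.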
## Command Reference

(Sections for `lang` and `preset` are omitted for brevity but would be here.)

### rdump search

The primary command. It can be omitted: `rdump "ext:rs"` is the same as `rdump search "ext:rs"`.
```text
Usage: rdump [OPTIONS] <QUERY>
```

**Options:**
| Flag | Alias | Description |
|---|---|---|
| `--format <FORMAT>` | `-f` | Sets the output format. See Output Formats. |
| `--context <LINES>` | `-C` | Includes `<LINES>` of context around matches in `hunks` format. |
| `--preset <NAME>` | `-p` | Uses a saved query preset. |
| `--no-ignore` | | Disables all ignore logic. Searches everything. |
| `--hidden` | | Includes hidden files and directories (those starting with `.`). |
| `--config-path <PATH>` | | Path to a specific `rdump.toml` config file. |
| `--help` | `-h` | Displays help information. |
| `--version` | `-V` | Displays version information. |
### Output Formats

| Format | Description |
|---|---|
| `hunks` | (Default) Shows only the matching code blocks, with optional context. |
| `markdown` | Wraps results in Markdown, useful for reports. |
| `json` | Machine-readable JSON output with file paths and content. |
| `paths` | A simple, newline-separated list of matching file paths. Perfect for piping. |
| `cat` | Concatenated content of all matching files. |
| `find` | `ls -l`-style output with permissions, size, modified date, and path. |
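The syntect section describes a `Format` enum that lets `print_output` act as a single dispatcher. A hypothetical sketch of that shape follows: the variant names mirror the table above, while parsing `--format` values through `FromStr`, the `Match` struct, and the `render` function are assumptions made for illustration. Only `paths` and `cat` are filled in; highlighting and the other formats are omitted.

```rust
use std::str::FromStr;

/// Variants mirror the Output Formats table; rdump's real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Format { Hunks, Markdown, Json, Paths, Cat, Find }

impl FromStr for Format {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "hunks" => Ok(Format::Hunks),
            "markdown" => Ok(Format::Markdown),
            "json" => Ok(Format::Json),
            "paths" => Ok(Format::Paths),
            "cat" => Ok(Format::Cat),
            "find" => Ok(Format::Find),
            other => Err(format!("unknown format: {other}")),
        }
    }
}

/// Hypothetical stand-in for one matched file.
struct Match {
    path: String,
    content: String,
}

/// One dispatcher, one arm per format; only two formats are filled in here.
fn render(format: Format, matches: &[Match]) -> String {
    match format {
        Format::Paths => matches.iter().map(|m| format!("{}\n", m.path)).collect(),
        Format::Cat => matches.iter().map(|m| m.content.as_str()).collect(),
        other => format!("{other:?} is not implemented in this sketch\n"),
    }
}

fn main() {
    let matches = vec![
        Match { path: "src/main.rs".into(), content: "fn main() {}\n".into() },
        Match { path: "src/lib.rs".into(), content: "pub fn run() {}\n".into() },
    ];
    // In the CLI this string would come from `--format`.
    match "paths".parse::<Format>() {
        Ok(format) => print!("{}", render(format, &matches)),
        Err(e) => eprintln!("{e}"),
    }
}
```

Because the `match` on `Format` must be exhaustive, adding a new output format forces every dispatcher to handle it, which is one reason this pattern keeps presentation logic tidy.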
## Configuration

### The config.toml File

rdump merges settings from a global and a local config file. Local settings override global ones.

- Global: `~/.config/rdump/config.toml`
- Local: `rdump.toml` (in the current directory or any parent)

### The .rdumpignore System

rdump respects `.gitignore` by default and provides its own `.rdumpignore` for more control.
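The rule that local config settings override global ones amounts to a field-by-field merge. Here is a minimal sketch under loud assumptions: the `Config` struct, its `default_format` field, and storing presets in the config are all hypothetical (this guide does not document the config schema), and both files are taken as already parsed, so TOML loading is omitted.

```rust
use std::collections::HashMap;

/// Hypothetical config shape; the fields are illustrative, not rdump's schema.
#[derive(Debug, Default, Clone, PartialEq)]
struct Config {
    default_format: Option<String>,
    presets: HashMap<String, String>, // preset name -> RQL query
}

/// Merges `local` over `global`: any value set locally wins.
fn merge(global: Config, local: Config) -> Config {
    let mut presets = global.presets;
    // Local presets replace global presets with the same name; others are kept.
    presets.extend(local.presets);
    Config {
        default_format: local.default_format.or(global.default_format),
        presets,
    }
}

fn main() {
    let global = Config {
        default_format: Some("hunks".into()),
        presets: HashMap::from([
            ("rust".to_string(), "ext:rs".to_string()),
            ("docs".to_string(), "ext:md".to_string()),
        ]),
    };
    let local = Config {
        default_format: None, // not set locally, so the global value survives
        presets: HashMap::from([("rust".to_string(), "ext:rs & !path:tests".to_string())]),
    };
    let merged = merge(global, local);
    println!("format = {:?}", merged.default_format);
    println!("rust preset = {}", merged.presets["rust"]);
    println!("docs preset = {}", merged.presets["docs"]);
}
```

Modelling every setting as an `Option` (or a map) is what makes "local overrides global" a one-line `or` per field instead of special-case logic.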
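## rdump: Adding a New Language

Adding support for a new language is possible if there is a tree-sitter grammar available for it. This involves:

1. Adding the language's tree-sitter grammar.
2. Writing `.scm` query files to capture semantic nodes.
3. Registering the language in rdump's language profiles.

## Troubleshooting

**My query is slow.** Make sure you use `ext:`, `path:`, or `name:` first.

**rdump isn't finding a file I know is there.** It may be excluded by ignore rules (`.gitignore` or `.rdumpignore`) or be a hidden file. Use `--no-ignore` (and `--hidden`) to check.

**How do I search for a literal `!` or `&`?** Quote the value, e.g., `contains:'&'`.

## Performance

(Illustrative) rdump is designed for accuracy and expressiveness, but it's still fast. On a large codebase (e.g., the Linux kernel):

- `rg "some_string"`: ~0.1s
- `rdump "contains:some_string"`: ~0.5s
- `rdump "ext:c & func:some_func"`: ~2.0s

rdump will never beat ripgrep on raw text search, but ripgrep can't do structural search at all.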
## Contributing

Contributions are welcome! Please check the GitHub Issues.

## License

This project is licensed under the MIT License.