| Crates.io | dash-em |
| lib.rs | dash-em |
| version | 1.0.1 |
| created_at | 2025-11-15 14:27:04.987013+00 |
| updated_at | 2025-11-16 15:26:59.811203+00 |
| description | Enterprise-Grade Em-Dash Removal Library — SIMD-Accelerated String Processing |
| homepage | https://github.com/Gaurav-Gosain/dash-em#readme |
| repository | https://github.com/Gaurav-Gosain/dash-em |
| max_upload_size | |
| id | 1934415 |
| size | 70,275 |
Enterprise-Grade Em-Dash Removal Infrastructure — Leveraging Advanced SIMD Vectorization for Optimal Character Stream Processing
dash-em is an absurdly over-engineered, deliberately meme-grade, production-ready, enterprise-certified string manipulation library designed with singular, unwavering purpose—removing em-dashes (U+2014) from UTF-8 encoded text—with unprecedented obsession.
Building upon decades of accumulated wisdom in systems programming—combined with cutting-edge SIMD acceleration techniques—dash-em delivers truly unnecessary—yet deeply satisfying—performance characteristics in the em-dash elimination category.
🎭 MEME REPOSITORY DISCLOSURE — This project is a deliberately absurd, tongue-in-cheek exploration of over-engineering. The em-dash removal use case is intentionally ridiculous. This is not serious production software, despite being written with genuine engineering rigor. Enjoy the absurdity.
Instead of checking characters one by one—which is slow—dash-em uses SIMD instructions to process 16–64 bytes in parallel. Modern CPUs can do this crazy fast—we just have to tell them what to do.
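To make the "many bytes at once" idea concrete, here is a minimal portable sketch in Rust. It is not the library's actual SIMD kernel: it assumes a 16-byte chunk width, and the function name `remove_em_dashes_chunked` is made up for illustration. It relies on one fact about UTF-8: the em-dash U+2014 encodes as the three bytes E2 80 94, so any chunk with no 0xE2 byte can be copied wholesale.

```rust
/// UTF-8 encoding of the em-dash U+2014.
const EM_DASH: [u8; 3] = [0xE2, 0x80, 0x94];

/// Illustrative chunk width; real SIMD paths would use 16, 32, or 64 bytes.
const CHUNK: usize = 16;

/// Remove every em-dash from `input`, inspecting one chunk at a time.
/// Hypothetical helper for illustration, not part of the dash-em API.
pub fn remove_em_dashes_chunked(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let end = (i + CHUNK).min(bytes.len());
        let chunk = &bytes[i..end];
        // Fast path: no 0xE2 lead byte means no em-dash starts in this chunk,
        // so the whole chunk is copied in one go.
        if !chunk.contains(&0xE2) {
            out.extend_from_slice(chunk);
            i = end;
            continue;
        }
        // Slow path: walk the chunk byte by byte. An em-dash that starts near
        // the end of the chunk is consumed whole, so `j` may run past `end`.
        let mut j = i;
        while j < end {
            if bytes[j..].starts_with(&EM_DASH) {
                j += 3;
            } else {
                out.push(bytes[j]);
                j += 1;
            }
        }
        i = j;
    }
    // 0xE2 is never a continuation byte in valid UTF-8, so only whole
    // characters were dropped and the result is still valid UTF-8.
    String::from_utf8(out).expect("removing whole characters keeps UTF-8 valid")
}
```

Other characters that share the 0xE2 lead byte, such as the en-dash or the euro sign, take the slow path but are copied unchanged.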
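Real-world speedups, measured across multiple architectures, are listed in the benchmark tables below.
The library auto-detects your CPU and picks the fastest available path: AVX-512, AVX2, SSE4.2, ARM NEON, or a scalar fallback.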
It's absurdly optimized. Maybe too optimized. But it works—and it's fast.
Multi-architecture SIMD performance (statistical benchmarks; throughput in GB/s, with the speedup over the benchmark baseline in parentheses):
| Pattern | macos-14-aarch64 | windows-2022-msvc | ubuntu-22.04-gcc | ubuntu-22.04-clang |
|---|---|---|---|---|
| sparse | 3.89 GB/s (2.63x) | 12.33 GB/s (11.47x) | 12.78 GB/s (8.71x) | 9.67 GB/s (6.61x) |
| moderate | 2.73 GB/s (1.82x) | 7.23 GB/s (6.55x) | 7.55 GB/s (5.58x) | 6.93 GB/s (5.10x) |
| dense | 1.53 GB/s (0.67x) | 1.54 GB/s (1.06x) | 2.57 GB/s (1.07x) | 2.18 GB/s (1.09x) |
| alternating | 1.53 GB/s (0.67x) | 1.58 GB/s (1.09x) | 2.57 GB/s (1.07x) | 2.18 GB/s (0.94x) |
| boundary | 3.28 GB/s (2.16x) | 9.84 GB/s (9.24x) | 10.45 GB/s (7.45x) | 9.57 GB/s (6.88x) |
| no | 4.27 GB/s (2.90x) | 12.02 GB/s (11.10x) | 15.58 GB/s (10.51x) | 13.66 GB/s (9.21x) |
Comparing dash-em bindings against native byte-level implementations:
| Language | Test Pattern | Native (μs) | dash-em (μs) | Speedup |
|---|---|---|---|---|
| javascript | alternating | 66.1 | 19.0 | 3.48x |
| javascript | dense | 127.4 | 53.6 | 2.38x |
| javascript | moderate | 302.7 | 50.0 | 6.05x |
| javascript | no | 2932.3 | 94.1 | 31.14x |
| javascript | sparse | 2861.1 | 107.6 | 26.58x |
| python | alternating | 3624.9 | 15.5 | 234.20x |
| python | dense | 7577.0 | 31.2 | 243.19x |
| python | moderate | 18888.8 | 73.3 | 257.74x |
| python | no | 186188.6 | 345.9 | 538.21x |
| python | sparse | 193434.6 | 915.9 | 211.21x |
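The Speedup column above is consistent with dividing the native time by the dash-em time. The sketch below shows that arithmetic, plus a throughput helper. The function names are hypothetical, and the throughput helper assumes that 1 GB means 10^9 bytes, which the source does not state.

```rust
/// Speedup of the candidate over the baseline, both timed in microseconds.
pub fn speedup(baseline_us: f64, candidate_us: f64) -> f64 {
    baseline_us / candidate_us
}

/// Throughput in GB/s, assuming 1 GB = 10^9 bytes (an assumption, see above).
pub fn throughput_gb_per_s(bytes: u64, elapsed_us: f64) -> f64 {
    let seconds = elapsed_us * 1e-6;
    (bytes as f64) / seconds / 1e9
}
```

For the javascript/alternating row, `speedup(66.1, 19.0)` comes to roughly 3.48, matching the table.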
git clone https://github.com/Gaurav-Gosain/dash-em
cd dash-em
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
sudo make install
npm install dash-em
String API (easy to use):
const dashem = require('dash-em');
const result = dashem.remove('Hello—world');
console.log(result); // Output: Helloworld
Buffer API (high-performance, zero-copy):
const dashem = require('dash-em');
// Process Buffer directly (10-26x faster than string API)
const input = Buffer.from('Hello—world', 'utf-8');
const output = dashem.removeBuffer(input);
console.log(output.toString('utf-8')); // Output: Helloworld
// In-place modification (fastest, modifies input)
const buffer = Buffer.from('Hello—world', 'utf-8');
const newLength = dashem.removeBufferInPlace(buffer);
console.log(buffer.slice(0, newLength).toString('utf-8')); // Output: Helloworld
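As a rough picture of what an in-place removal like removeBufferInPlace does conceptually, here is a scalar two-cursor sketch in Rust. It is an illustration under assumptions, not the binding's implementation, and the name `remove_em_dashes_in_place` is hypothetical. The read cursor walks the input, the write cursor trails behind it, and the returned value is the new logical length.

```rust
/// Compact `buf` in place by dropping every em-dash (bytes E2 80 94).
/// Returns the number of valid bytes now at the front of `buf`.
/// Hypothetical helper for illustration, not part of the dash-em API.
pub fn remove_em_dashes_in_place(buf: &mut [u8]) -> usize {
    let mut read = 0;
    let mut write = 0;
    while read < buf.len() {
        if buf[read..].starts_with(&[0xE2, 0x80, 0x94]) {
            // Skip the three em-dash bytes without writing anything.
            read += 3;
        } else {
            // `write` never overtakes `read`, so this never clobbers unread data.
            buf[write] = buf[read];
            write += 1;
            read += 1;
        }
    }
    write
}
```

Bytes past the returned length are stale leftovers, which is why the JavaScript example above slices the buffer to the new length before decoding it.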
pip install dash-em
import dashem
result = dashem.remove('Hello—world')
print(result) # Output: Helloworld
go get github.com/Gaurav-Gosain/dash-em/go
package main

import (
	"fmt"

	"github.com/Gaurav-Gosain/dash-em/go"
)

func main() {
	result, _ := dashem.Remove("Hello—world")
	fmt.Println(result) // Output: Helloworld
}
[dependencies]
dash-em = "1.0"
fn main() {
    let result = dash_em::remove("Hello—world").unwrap();
    println!("{}", result); // Output: Helloworld
}
public class Example {
    public static void main(String[] args) {
        String result = Dashem.remove("Hello—world");
        System.out.println(result); // Output: Helloworld
    }
}
string result = Dashem.Remove("Hello—world");
Console.WriteLine(result); // Output: Helloworld
<?php
$result = dashem_remove('Hello—world');
echo $result; // Output: Helloworld
?>
require 'dashem'
result = Dashem.remove('Hello—world')
puts result # Output: Helloworld
import Dashem
let result = removeEmDashes("Hello—world")
print(result) // Output: Helloworld
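Comprehensive bindings are provided for, and thoroughly tested against, a range of languages, including those with usage examples above: JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, and Swift.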
dash-em compiles to—high-performance WebAssembly modules supporting multiple target specifications:
# wasm32 (Emscripten)
cd bindings/wasm && ./build.sh
# WASI (WebAssembly System Interface)
WASI_SDK_PATH=/opt/wasi-sdk ./build.sh wasi
The dispatch system checks your CPU once, then uses the best available implementation for all subsequent calls:
graph TD
A["dashem_remove()"] --> B{Input < 32 bytes?}
B -->|Yes| C["fast_small() scalar"]
B -->|No| D["Check CPU capabilities<br/>(cached after first call)"]
D --> E{AVX-512?}
E -->|Yes| F["dashem_remove_avx512"]
E -->|No| G{AVX2?}
G -->|Yes| H["dashem_remove_avx2_unrolled"]
G -->|No| I{SSE4.2?}
I -->|Yes| J["dashem_remove_sse42"]
I -->|No| K{ARM NEON?}
K -->|Yes| L["dashem_remove_neon"]
K -->|No| M["dashem_remove_scalar"]
C --> N["Output"]
F --> N
H --> N
J --> N
L --> N
M --> N
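The decision tree above can be sketched in Rust with the standard library's runtime feature detection. This is only an illustration of the "detect once, then reuse" pattern: the names `Path`, `detect_best`, and `select_path` are hypothetical, the sketch only reports which path would be chosen, and it assumes AVX-512 support is probed via the `avx512f` feature.

```rust
use std::sync::OnceLock;

/// Which implementation the dispatcher would pick (labels only, no kernels).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Path {
    SmallScalar,
    Avx512,
    Avx2,
    Sse42,
    Neon,
    Scalar,
}

/// Cached result of CPU detection, filled on first use.
static BEST: OnceLock<Path> = OnceLock::new();

/// Probe the CPU in the same priority order as the diagram.
fn detect_best() -> Path {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return Path::Avx512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return Path::Avx2;
        }
        if std::arch::is_x86_feature_detected!("sse4.2") {
            return Path::Sse42;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Path::Neon;
        }
    }
    Path::Scalar
}

/// Inputs under 32 bytes skip detection entirely; larger inputs use the
/// cached best path, so the CPU is probed at most once per process.
pub fn select_path(input_len: usize) -> Path {
    if input_len < 32 {
        Path::SmallScalar
    } else {
        *BEST.get_or_init(detect_best)
    }
}
```

The result of `select_path` for large inputs depends on the machine it runs on; only the small-input branch is the same everywhere.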
/**
 * Remove em-dashes from UTF-8 string
 *
 * @param input Input UTF-8 string
 * @param input_len Length of input in bytes
 * @param output Output buffer
 * @param output_capacity Output buffer capacity in bytes
 * @param output_len Output length (set on return)
 * @return 0 on success, -1 on buffer overflow, -2 on invalid input
 */
int dashem_remove(
    const char *input,
    size_t input_len,
    char *output,
    size_t output_capacity,
    size_t *output_len
);
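The caller-supplied buffer contract of dashem_remove can be mimicked in a few lines of Rust. This is a sketch of the contract only, not the C implementation: `remove_into` and the two status constants are hypothetical names, and the -2 "invalid input" case is left out because the source does not say what counts as invalid.

```rust
/// Status codes mirroring the documented return values (names are hypothetical).
pub const STATUS_OK: i32 = 0;
pub const STATUS_OVERFLOW: i32 = -1;

/// Copy `input` into `output` without em-dashes (bytes E2 80 94).
/// On success, stores the number of bytes written in `output_len`.
pub fn remove_into(input: &[u8], output: &mut [u8], output_len: &mut usize) -> i32 {
    let mut read = 0;
    let mut written = 0;
    while read < input.len() {
        if input[read..].starts_with(&[0xE2, 0x80, 0x94]) {
            read += 3;
            continue;
        }
        // Refuse to write past the caller's buffer.
        if written == output.len() {
            return STATUS_OVERFLOW;
        }
        output[written] = input[read];
        written += 1;
        read += 1;
    }
    *output_len = written;
    STATUS_OK
}
```

Because removal can only shrink the data, an output buffer as large as the input is always big enough in this sketch.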
/**
* Get library version
* @return Version string (e.g., "1.0.0")
*/
const char* dashem_version(void);
/**
* Get active implementation name
* @return Implementation name (e.g., "AVX2", "SSE4.2", "Scalar")
*/
const char* dashem_implementation_name(void);
/**
* Detect available CPU features
* @return Bitmask of DASHEM_CPU_* flags
*/
uint32_t dashem_detect_cpu_features(void);
/**
* Remove em-dashes from a string
* @param {string} input - Input string
* @returns {string} String with em-dashes removed
*/
dashem.remove(input);
/**
* Remove em-dashes from a Buffer (zero-copy, high-performance)
* @param {Buffer} buffer - Input Buffer
* @returns {Buffer} New Buffer with em-dashes removed
*/
dashem.removeBuffer(buffer);
/**
* Remove em-dashes from a Buffer in-place (ultra-fast, modifies input!)
* WARNING: This modifies the input Buffer
* @param {Buffer} buffer - Input Buffer (will be modified)
* @returns {number} New length of valid data in buffer
*/
dashem.removeBufferInPlace(buffer);
/**
* Get library version
* @returns {string} Version string
*/
dashem.version();
/**
* Get implementation name
* @returns {string} Implementation name (e.g., "AVX2", "SSE4.2")
*/
dashem.implementationName();
Performance Notes:
- remove(): Easy to use but includes UTF-8 conversion overhead
- removeBuffer(): 10-26x faster than remove(), zero-copy operation
- removeBufferInPlace(): Fastest option, modifies input buffer directly

Performance benchmarks are automatically updated via GitHub Actions. See the Performance section above for the latest results across all architectures and language bindings.
To run benchmarks locally:
C/C++ Core Library:
cd build && ./bench_dashem
Multi-Language Benchmarks:
cd benchmarks
./run_all_benchmarks.sh
Results are generated in JSON format for easy integration with continuous performance monitoring systems.
Comprehensive test suites—validated across all supported platforms—ensure—correctness and reliability:
# C/C++ tests
cd build && ctest
# Language-specific tests
npm test # JavaScript
python -m pytest # Python
cargo test # Rust
go test ./... # Go
dash-em leverages GitHub Actions—to ensure—consistent quality across:
Contributions are welcome! Please ensure:
MIT License — See LICENSE file for details.
If dash-em is utilized in—academic or—commercial contexts, please reference:
@software{gosain2025dashem,
title={dash-em: Enterprise-Grade Em-Dash Removal Infrastructure},
author={Gosain, Gaurav},
year={2025},
url={https://github.com/Gaurav-Gosain/dash-em}
}
This project exists because—em-dashes matter—and they deserve—the most efficient, highly optimized—removal mechanism—available on modern computing platforms.
Building excellence—one em-dash at a time. —
Version: 1.0.1 | Status: Production-Ready | License: MIT | Repository: github.com/Gaurav-Gosain/dash-em