pragmatic-segmenter

Crates.io: pragmatic-segmenter
lib.rs: pragmatic-segmenter
version: 0.1.3
source: src
created_at: 2020-08-20 11:59:54.80333
updated_at: 2023-07-06 14:54:07.098095
description: Rust port of pySBD v3.1.0.
homepage: https://github.com/simnalamburt/rust-pragmatic-segmenter
repository: https://github.com/simnalamburt/rust-pragmatic-segmenter
id: 278603
size: 84,864
Hyeon Kim (김지현) (simnalamburt)

documentation

https://docs.rs/pragmatic-segmenter

README

rust-pragmatic-segmenter

Rust port of pySBD v3.1.0 and Ruby pragmatic_segmenter. Documentation: https://docs.rs/pragmatic-segmenter

rust-pragmatic-segmenter is a rule-based sentence boundary detection (SBD) library. It relies on a large set of regular expressions to split text into sentences.

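use pragmatic_segmenter::Segmenter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Constructing the segmenter is fallible, so propagate any error with `?`.
    let segmenter = Segmenter::new()?;
    // `segment` returns an iterator over sentences; collect it into a Vec.
    let result: Vec<_> = segmenter.segment("Hi Mr. Kim. Let's meet at 3 P.M.").collect();
    //=> vec!["Hi Mr. Kim. ", "Let's meet at 3 P.M."]
    Ok(())
}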
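To make "rule-based" concrete, here is a minimal std-only sketch of the general idea. It is not the crate's implementation, which uses many more rules expressed as regular expressions. The names `split_sentences` and `ABBREVIATIONS` are hypothetical and exist only for this illustration. The sketch applies two rules: a `.`, `!` or `?` followed by whitespace is a candidate boundary, and a candidate is rejected when the token ending there is a known abbreviation.

```rust
/// Hypothetical abbreviation list for this sketch; real rule sets are far larger.
const ABBREVIATIONS: &[&str] = &["Mr.", "Mrs.", "Dr.", "Prof."];

/// Toy rule-based splitter (illustrative only, not the crate's algorithm).
/// Each returned sentence keeps its trailing whitespace.
fn split_sentences(text: &str) -> Vec<&str> {
    let mut sentences = Vec::new();
    let mut start = 0;

    for (i, c) in text.char_indices() {
        // Rule 1a: only terminal punctuation can end a sentence.
        if !matches!(c, '.' | '!' | '?') {
            continue;
        }
        let end = i + c.len_utf8();

        // Rule 1b: the punctuation must be followed by whitespace.
        // This is what keeps "P.M." in one piece.
        let followed_by_space = text[end..]
            .chars()
            .next()
            .map_or(false, char::is_whitespace);
        if !followed_by_space {
            continue;
        }

        // Rule 2: do not split after a known abbreviation such as "Mr.".
        let token = text[..end]
            .rsplit(char::is_whitespace)
            .next()
            .unwrap_or("");
        if ABBREVIATIONS.iter().any(|abbr| *abbr == token) {
            continue;
        }

        // Attach the whitespace run to the sentence that precedes it.
        let rest = &text[end..];
        let boundary = end + (rest.len() - rest.trim_start().len());
        sentences.push(&text[start..boundary]);
        start = boundary;
    }

    // Whatever remains after the last boundary is the final sentence.
    if start < text.len() {
        sentences.push(&text[start..]);
    }
    sentences
}
```

Calling `split_sentences("Hi Mr. Kim. Let's meet at 3 P.M.")` yields the same two pieces as the example above. Hand-written rules like these break down quickly on real text, which is why a port of an existing, well-tested rule set is useful.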

How to build

sudo apt install -y libclang-dev
cargo build

TODOs

  • Match pySBD's behavior exactly (current: 99%)
  • Support languages other than English
  • Remove regexes that use lookaround and backreferences
  • Try Intel Hyperscan
  • Fix mistakes in pySBD and possibly send patches upstream
  • Optimize copies and allocations
  • Use proper error types instead of boxed errors
  • Import test cases from pySBD and Ruby pragmatic_segmenter

 


rust-pragmatic-segmenter is primarily distributed under the terms of both the Apache License (Version 2.0) and the MIT license. See COPYRIGHT for details.

