common_substrings

version: 1.0.0
source: src
created_at: 2020-04-11 23:29:22.271208
updated_at: 2020-04-11 23:29:22.271208
description: Finding all common strings
homepage: https://github.com/hanwencheng/common_substrings_rust
repository: https://github.com/hanwencheng/common_substrings_rust
id: 228791
size: 24,056
Hanwen Cheng (hanwencheng)

Find all common substrings

A method for finding all common substrings in a list of strings, particularly quick for large string samples. It uses only the Rust std library.

The algorithm uses a two-dimensional trie to collect all the fragments. The vertical dimension is a standard suffix trie. In addition, the nodes of the last word in each suffix are linked to one another; I call these nodes virtually horizontally linked.
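To make the vertical dimension concrete, here is a minimal sketch of a plain suffix trie built from several input strings, using only the std library. It is not the crate's implementation: the `Node` type and the `build_suffix_trie` and `sources_of` functions are hypothetical names chosen for illustration, and the horizontal links described above are left out entirely. Each node simply records which input strings pass through it.

```rust
use std::collections::{BTreeSet, HashMap};

/// One node of a plain suffix trie (hypothetical type, not the crate's own).
#[derive(Default)]
struct Node {
    children: HashMap<char, Node>,
    /// Indices of the input strings that have a suffix passing through this node.
    sources: BTreeSet<usize>,
}

/// Insert every suffix of every input string into one shared trie.
fn build_suffix_trie(inputs: &[&str]) -> Node {
    let mut root = Node::default();
    for (index, text) in inputs.iter().enumerate() {
        let chars: Vec<char> = text.chars().collect();
        for start in 0..chars.len() {
            // Walk down from the root, creating nodes as needed.
            let mut node = &mut root;
            for &c in &chars[start..] {
                node = node.children.entry(c).or_default();
                node.sources.insert(index);
            }
        }
    }
    root
}

/// Return the indices of the inputs that contain `fragment`, or None if no input does.
fn sources_of(root: &Node, fragment: &str) -> Option<BTreeSet<usize>> {
    let mut node = root;
    for c in fragment.chars() {
        node = node.children.get(&c)?;
    }
    Some(node.sources.clone())
}

fn main() {
    let trie = build_suffix_trie(&["java", "javascript", "typescript"]);
    // "script" occurs in the inputs at index 1 and index 2.
    println!("{:?}", sources_of(&trie, "script"));
}
```

Because every substring of a string is a prefix of one of its suffixes, looking up a fragment from the root answers "which inputs contain this fragment?" in time proportional to the fragment length. A plain trie like this takes quadratic space per input, which is presumably part of what the linked structure in the crate is meant to improve on.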

Usage

Use the function get_substrings to get all the common substrings in a list of strings.

Example

use common_substrings::get_substrings;
let input_strings = vec!["java", "javascript", "typescript", "coffeescript", "coffee"];
let result_substrings = get_substrings(input_strings, 2, 3);

This gives the following result list:

Substring(sources: {2, 3}, name: escript, weight: 14)
Substring(sources: {1, 0}, name: java, weight: 8)
Substring(sources: {4, 3}, name: coffee, weight: 12)

Arguments

  • input - The target input string vector.
  • min_occurrences - The minimum number of occurrences required for a common substring to be captured.
  • min_length - The minimum length required for a common substring to be captured.
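The two thresholds can be illustrated with a brute-force reference, sketched below. This is not how the crate works and not its API: `naive_common_substrings` is a hypothetical helper, and it assumes that min_occurrences counts the number of distinct input strings containing a substring, which matches the sources sets in the example output but is not confirmed by the README. Unlike get_substrings, it returns every qualifying substring, including fragments of longer matches.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Brute-force reference (hypothetical helper, not part of the crate).
/// Maps each substring of at least `min_length` characters to the set of
/// input indices containing it, keeping only substrings found in at least
/// `min_occurrences` distinct inputs (an assumed reading of the parameter).
fn naive_common_substrings(
    inputs: &[&str],
    min_occurrences: usize,
    min_length: usize,
) -> BTreeMap<String, BTreeSet<usize>> {
    let mut seen: BTreeMap<String, BTreeSet<usize>> = BTreeMap::new();
    for (index, text) in inputs.iter().enumerate() {
        let chars: Vec<char> = text.chars().collect();
        for start in 0..chars.len() {
            // The range is empty when fewer than `min_length` characters remain.
            for end in (start + min_length)..=chars.len() {
                let fragment: String = chars[start..end].iter().collect();
                seen.entry(fragment).or_default().insert(index);
            }
        }
    }
    // Drop substrings that appear in too few inputs.
    seen.retain(|_, sources| sources.len() >= min_occurrences);
    seen
}

fn main() {
    let inputs = ["java", "javascript", "typescript", "coffeescript", "coffee"];
    let found = naive_common_substrings(&inputs, 2, 3);
    for (fragment, sources) in &found {
        println!("{}: {:?}", fragment, sources);
    }
}
```

With the README's inputs and thresholds, this map contains escript, java and coffee with the same source indices as the result list above, alongside many shorter fragments such as script or jav. The crate's result evidently reduces that set to a few representative substrings; this sketch makes no attempt to reproduce that step or the weight field.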

Algorithm

Explanation here

Other implementations

License

Apache-2.0

Commit count: 37
