| Crates.io | random-person-name |
| lib.rs | random-person-name |
| version | 0.1.0 |
| created_at | 2025-06-06 01:09:19.95913+00 |
| updated_at | 2025-06-06 01:09:19.95913+00 |
| description | A markov chain based approach to random name generation. Allows a user to define positive and negative weights for character ngrams. |
| homepage | |
| repository | https://github.com/NSriram5/random-person-name |
| max_upload_size | |
| id | 1702502 |
| size | 47,159 |
A library for reading names and making guesses at derived names, using ngrams to guess the next character in the sequence.
A primary goal of this project is to have a smaller memory footprint than implementations that store lists of names with the library or rely on a lookup within a corpus, while also providing a mechanism that produces some know
1. Create a NameExperiments. N=2 or N=3 are reasonable starting points.
2. Use the Name struct to handle raw &str name text, or manually convert from &str to &[Option<char>]. Dedupe if desired.
3. Call NameExperiments::read_positive_sample on each name.
4. Generate candidates with NameExperiments::build_random_name. Apply external analysis to separate valid names from non-names.
5. Refine the NameExperiments by continuing to call NameExperiments::read_positive_sample and NameExperiments::read_negative_sample using valid and invalid names.

```rust
// Look back 3 characters when predicting the next one.
let mut name_guess_experiments: NameExperiments<3> = NameExperiments::new();

// A slice of string slices (`&[&str]`); `&[str]` would not compile.
let orc_names: &[&str] = &["Morgash", "Nargul", "Snarlgash"];
let names = Name::new_from_batch(
    orc_names,
    "male",
    PaddingBias::Left,
    Some("Orc"),
    None,
    None,
    None,
);

// Feed every sample name into the experiment as a positive observation.
for n in names.iter() {
    let _ = name_guess_experiments.read_positive_sample(&n.text).unwrap();
}

// Build a new name from the learned weights.
let new_name = name_guess_experiments.build_random_name(Some(16)).unwrap();
println!("Hello, {}!", new_name);
```
This library exports a struct of NameExperiments and supports the analysis and extraction of probability distributions of character combinations.
To start, define a new NameExperiments with a generic const parameter N. N indicates how many characters to look backwards while analyzing a name.
Values of N less than 2 will result in a panic when NameExperiments::new() is called.
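The const-parameter check described here can be sketched in isolation. This is only an illustration of the pattern: `NgramModel` and `context_count` are hypothetical names, not part of this crate, and the 29-character alphabet is taken from the memory estimate later in this document.

```rust
/// Hypothetical stand-in for an n-gram model parameterised by a const generic.
/// This is not the crate's `NameExperiments`; it only illustrates the pattern.
pub struct NgramModel<const N: usize> {
    /// One row of next-character counts per possible N-character context.
    rows: Vec<[u8; 29]>,
}

impl<const N: usize> NgramModel<N> {
    /// Panics when `N < 2`, mirroring the behaviour described for `new()`.
    pub fn new() -> Self {
        assert!(N >= 2, "N must be at least 2, got {}", N);
        // Assumption: 29 valid characters, so there are 29^N possible contexts.
        let contexts = 29usize.pow(N as u32);
        Self {
            rows: vec![[0u8; 29]; contexts],
        }
    }

    /// Number of distinct N-character contexts that were allocated.
    pub fn context_count(&self) -> usize {
        self.rows.len()
    }
}
```

With this sketch, `NgramModel::<2>::new()` allocates 841 rows, while `NgramModel::<1>::new()` panics at construction time.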
Call NameExperiments::read_positive_sample on each name while iterating through a list of names. This library assumes that a user will use the text field of the included Name struct,
but this can be bypassed by passing a slice of Option<char> directly into read_positive_sample.
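For the manual route, a conversion from &str to a vector of Option<char> might look like the following. The helper `to_option_chars` and its `min_len` parameter are hypothetical, and using `None` as left padding is an assumption; the padding scheme the Name struct actually applies is not described here.

```rust
/// Converts raw text into a vector of `Option<char>`, one `Some` per character.
/// Assumption: `None` entries act as padding. This sketch pads on the left up
/// to `min_len`, a hypothetical parameter introduced for illustration only.
pub fn to_option_chars(text: &str, min_len: usize) -> Vec<Option<char>> {
    let chars: Vec<Option<char>> = text.chars().map(Some).collect();
    // Number of padding slots needed; zero when the text is already long enough.
    let pad = min_len.saturating_sub(chars.len());
    let mut out = vec![None; pad];
    out.extend(chars);
    out
}
```

The resulting vector can be borrowed as a `&[Option<char>]` slice.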
Note: The read_positive_sample function makes no attempt to de-duplicate text that has already been read. If the same name is read into a NameExperiments struct more than once, the weights around that name's character sequences will become stronger. This might not be the intent; users of this library are advised to apply filtering or de-duplication earlier in their data pipeline.
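One way to de-duplicate before feeding names into the experiment is sketched below. The helper `dedupe_names` is hypothetical, and treating names as equal regardless of case is an assumption that may not suit every data set.

```rust
use std::collections::HashSet;

/// Returns the input names with duplicates removed, preserving first-seen order.
/// Comparison is case-insensitive, which is an assumption made for this sketch.
pub fn dedupe_names<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    names
        .iter()
        .copied()
        // `insert` returns false when the lowercased name was already present.
        .filter(|n| seen.insert(n.to_lowercase()))
        .collect()
}
```

The filtered list can then be read sample by sample, so that each distinct name contributes to the weights only once.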
Aside from gaining data about names, the NameExperiments struct can also read slices of characters that are decidedly not names. Deciding what is or isn't a name is up to the
user of the API, but as a starting point this can help to de-weight ngrams that would result in long sequences of vowels, consonants, or simply letters that don't often follow one another.
Use NameExperiments::read_negative_sample to update weights that should correspond to de-emphasized character sequences.
Note: Again, read_negative_sample does not de-duplicate names.
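The interplay of positive and negative samples can be illustrated with a toy bigram table. Everything in this sketch is hypothetical: `BigramWeights` uses hash maps rather than the crate's preallocated storage, and subtracting negative counts from positive ones is only one plausible way to combine them, not necessarily what this library does.

```rust
use std::collections::HashMap;

/// Hypothetical bigram table with separate positive and negative counts.
#[derive(Default)]
pub struct BigramWeights {
    positive: HashMap<(char, char), u8>,
    negative: HashMap<(char, char), u8>,
}

impl BigramWeights {
    /// Adds one observation for every adjacent character pair in `text`.
    fn bump(table: &mut HashMap<(char, char), u8>, text: &str) {
        let chars: Vec<char> = text.to_lowercase().chars().collect();
        for pair in chars.windows(2) {
            let count = table.entry((pair[0], pair[1])).or_insert(0);
            *count = count.saturating_add(1);
        }
    }

    pub fn read_positive(&mut self, text: &str) {
        Self::bump(&mut self.positive, text);
    }

    pub fn read_negative(&mut self, text: &str) {
        Self::bump(&mut self.negative, text);
    }

    /// Effective weight: positive observations minus negative ones, floored at zero.
    pub fn weight(&self, a: char, b: char) -> u8 {
        let p = self.positive.get(&(a, b)).copied().unwrap_or(0);
        let n = self.negative.get(&(a, b)).copied().unwrap_or(0);
        p.saturating_sub(n)
    }
}
```

Reading the same negative sample repeatedly keeps lowering the effective weight of its pairs, which is why de-duplication matters for negative samples as well.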
Under the hood, the sample weights are stored in four Vec instances that are allocated at full size when new is called. Two of the Vec instances hold observations about character
sequences: for each N-character sequence, the counts of the character observed next (the N+1th character) are kept in an array whose length equals the number of ValidChar variants.
The other two Vec instances hold observation data about character type sequences and the character type that follows, in an array whose length equals the number of CharType variants.
All observations are stored in u8 format to minimize the memory impact of the weights (see Intended Goal), but analysis of larger data sets with frequent occurrences of the same ngram sets may prove this
primitive too small.
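The limit of a u8 counter is easy to demonstrate. The sketch below assumes saturating arithmetic; whether this library saturates, wraps, or fails when a count reaches 255 is not stated here, and both helper functions are hypothetical.

```rust
/// Increments a `u8` observation count without wrapping past `u8::MAX` (255).
pub fn record_observation(count: &mut u8) {
    *count = count.saturating_add(1);
}

/// Records `times` observations and reports how many could not be counted
/// because the counter had already reached `u8::MAX`.
pub fn record_many(count: &mut u8, times: u32) -> u32 {
    let mut lost = 0;
    for _ in 0..times {
        if *count == u8::MAX {
            lost += 1;
        } else {
            record_observation(count);
        }
    }
    lost
}
```

Starting from zero, 300 observations of the same ngram leave the counter at 255 with 45 observations lost, so very common ngrams stop being distinguishable from each other once they pass that ceiling.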
Given `N` preceding characters, and assuming that there are 29 valid characters and 10 character types, the `NameExperiments` holds two `Vec` of capacity `29^N`, and each array within the vec is 29 bytes. Meanwhile, the two char_type sample weights have capacity `10^N`, with arrays of 10 bytes. In the case of `N=2` the memory footprint is estimated to be 51 kB. In the case of `N=3` the memory footprint is estimated to be 1.4 MB.
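The estimate can be reproduced with a few lines of arithmetic. The constants below are the 29 valid characters and 10 character types assumed above; the function name is hypothetical, and the result counts only the u8 payload, not the fixed overhead of each Vec.

```rust
/// Assumed number of `ValidChar` variants.
pub const VALID_CHARS: usize = 29;
/// Assumed number of `CharType` variants.
pub const CHAR_TYPES: usize = 10;

/// Estimated payload size in bytes for a given look-back length `n`.
pub fn estimated_footprint_bytes(n: u32) -> usize {
    // Two character tables: 29^n contexts, each holding 29 one-byte counts.
    let char_tables = 2 * VALID_CHARS.pow(n) * VALID_CHARS;
    // Two character-type tables: 10^n contexts, each holding 10 one-byte counts.
    let type_tables = 2 * CHAR_TYPES.pow(n) * CHAR_TYPES;
    char_tables + type_tables
}
```

Under these assumptions the function returns 50,778 bytes for N=2 and 1,434,562 bytes for N=3, in line with the 51 kB and 1.4 MB figures.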
For reference: a system that loads a corpus of names (of average length 8) could hold around 22,400 names in 1.4 MB, but would depend on the user to provide the names.