Crates.io | hebrew_unicode_script |
lib.rs | hebrew_unicode_script |
version | 0.5.0 |
source | src |
created_at | 2024-07-13 12:36:26.005393 |
updated_at | 2024-11-16 19:44:23.986661 |
description | A lightweight library to check if a hebrew character belongs to certain collections |
repository | https://github.com/Roestdev/hebrew_unicode_script/ |
id | 1302304 |
size | 320,062 |
Hebrew_Unicode_Script
This crate (hebrew_unicode_script) is a low-level library written in Rust, designed to facilitate the identification and validation of Unicode characters (Unicode code points) related to the Hebrew script and its associated Unicode code blocks.
Both checks on individual characters and checks for membership of collections are possible. Examples of collections are vowels, Yiddish characters, and punctuation.
More information can be found in the file ARCHITECTURE.
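To make the idea of an individual check versus a collection check concrete, here is a minimal sketch using only the standard `char` type. The function names `in_hebrew_block` and `is_final_form` are hypothetical and are not part of this crate's API; the sketch only assumes the published Unicode range of the Hebrew block (U+0590 to U+05FF) and the five final consonant forms.

```rust
/// Hypothetical helper (not the crate's API): true if `c` lies in the
/// Unicode block "Hebrew", which spans U+0590..=U+05FF.
pub fn in_hebrew_block(c: char) -> bool {
    matches!(c, '\u{0590}'..='\u{05FF}')
}

/// Hypothetical helper (not the crate's API): true if `c` is one of the
/// five final consonant forms (final kaf, mem, nun, pe, tsadi).
pub fn is_final_form(c: char) -> bool {
    matches!(
        c,
        '\u{05DA}' | '\u{05DD}' | '\u{05DF}' | '\u{05E3}' | '\u{05E5}'
    )
}
```

A block check answers "is this code point in a given range?", while a collection check answers "is it one of a specific set of related characters?".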
This library provides two types of interfaces:
functions
trait (the same functions but behind a trait).
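As an illustration of how the same check can be offered both as a free function and behind a trait, here is a small sketch. The names `is_letter_alef_to_tav` and `HebrewLetterCheck` are made up for this example and should not be read as the crate's actual definitions.

```rust
/// Hypothetical free function: true for the letters alef (U+05D0)
/// through tav (U+05EA) of the Hebrew block.
pub fn is_letter_alef_to_tav(c: char) -> bool {
    matches!(c, '\u{05D0}'..='\u{05EA}')
}

/// Hypothetical trait that exposes the same check as a method.
pub trait HebrewLetterCheck {
    fn is_letter_alef_to_tav(&self) -> bool;
}

impl HebrewLetterCheck for char {
    fn is_letter_alef_to_tav(&self) -> bool {
        // Delegate to the free function so both interfaces stay in sync.
        is_letter_alef_to_tav(*self)
    }
}
```

With the trait in scope, a caller can write either `is_letter_alef_to_tav(c)` or `c.is_letter_alef_to_tav()`, whichever reads better at the call site.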
Each function in this library returns a boolean value, making it easy to integrate these checks into existing or new applications.
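Because every check has the shape `fn(char) -> bool`, it can be passed straight to iterator adapters. The sketch below uses a stand-in predicate (`standin_hebrew_block`, hypothetical) so that it compiles without the crate; in real code one of the crate's functions would presumably be passed instead.

```rust
/// Stand-in predicate for illustration only (not the crate's function).
pub fn standin_hebrew_block(c: char) -> bool {
    matches!(c, '\u{0590}'..='\u{05FF}')
}

/// Counts the characters of `s` that are accepted by `pred`.
pub fn count_matching(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.chars().filter(|&c| pred(c)).count()
}

/// True if every non-whitespace character of `s` is accepted by `pred`.
/// An empty or all-whitespace string yields `true`.
pub fn all_match(s: &str, pred: impl Fn(char) -> bool) -> bool {
    s.chars().filter(|c| !c.is_whitespace()).all(pred)
}
```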
For an overview of released versions see releases.
Basic usage:
use hebrew_unicode_script::is_hbr_consonant_mem;
use hebrew_unicode_script::is_hbr_consonant_normal;
use hebrew_unicode_script::is_hbr_consonant;
use hebrew_unicode_script::is_script_hbr_consonant;
assert!(is_hbr_consonant_mem('מ'));
assert!(is_hbr_consonant_normal('מ'));
assert!(is_hbr_consonant('מ'));
assert!(is_script_hbr_consonant('מ'));
use hebrew_unicode_script::is_hbr_block;
if is_hbr_block('מ') {
println!("The character you entered is part of the 'unicode code block Hebrew'");
}
use hebrew_unicode_script::{is_hbr_consonant_final, is_hbr_consonant};
let test_str = "ךםןףץ";
for c in test_str.chars() {
assert!(is_hbr_consonant_final(c));
assert!(is_hbr_consonant(c));
}
A more complex example:
use hebrew_unicode_script::{is_hbr_accent,is_hbr_mark, is_hbr_point, is_hbr_punctuation};
use hebrew_unicode_script::{is_hbr_consonant_final,is_hbr_yod_triangle,is_hbr_ligature_yiddish};
fn main() {
// define a string of characters
let string_of_chars = "יָ֭דַעְתָּ שִׁבְתִּ֣י abcdefg וְקוּמִ֑י";
// get a structure that indicates whether each character type is present (bool)
let chartypes = get_character_types(string_of_chars);
// print the results
println!("The following letter types are found in: {}", string_of_chars);
println!("{:#?}", chartypes);
}
#[derive(Debug, Default)]
pub struct HebrewCharacterTypes {
accent: bool,
mark: bool,
point: bool,
punctuation: bool,
letter: bool,
letter_normal: bool,
letter_final: bool,
yod_triangle: bool,
ligature_yiddish: bool,
whitespace: bool,
non_hebrew: bool,
}
impl HebrewCharacterTypes {
fn new() -> Self {
Default::default()
}
}
pub fn get_character_types(s: &str) -> HebrewCharacterTypes {
let mut found_character_types = HebrewCharacterTypes::new();
for c in s.chars() {
match c {
c if is_hbr_accent(c) => found_character_types.accent = true,
c if is_hbr_mark(c) => found_character_types.mark = true,
c if is_hbr_point(c) => found_character_types.point = true,
c if is_hbr_punctuation(c) => found_character_types.punctuation = true,
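c if is_hbr_consonant_final(c) => found_character_types.letter_final = true,
// normal (non-final) consonants; the full path is used because this function is not imported above
c if hebrew_unicode_script::is_hbr_consonant_normal(c) => found_character_types.letter_normal = true,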
c if is_hbr_yod_triangle(c) => found_character_types.yod_triangle = true,
c if is_hbr_ligature_yiddish(c) => found_character_types.ligature_yiddish = true,
c if c.is_whitespace() => found_character_types.whitespace = true,
_ => found_character_types.non_hebrew = true,
}
}
// a letter was found if either a normal or a final consonant was seen
found_character_types.letter =
found_character_types.letter_normal || found_character_types.letter_final;
found_character_types
}
Output result:
The following character types were found:
HebrewCharacterTypes {
accent: true,
mark: false,
point: true,
punctuation: false,
letter: true,
letter_normal: true,
letter_final: false,
yod_triangle: false,
ligature_yiddish: false,
whitespace: true,
non_hebrew: true,
}
Using the trait:
use hebrew_unicode_script::HebrewUnicodeScript;
assert!( 'מ'.is_script_hbr() );
assert!( !'מ'.is_script_hbr_point() );
assert!( 'ױ'.is_script_hbr_ligature_yiddisch() );
assert!( 'מ'.is_hbr_block() );
assert!( !'מ'.is_hbr_point() );
See the crate modules for more examples.
This crate (hebrew_unicode_script) uses the #![no_std] attribute.
It depends on neither the standard library nor a system allocator.
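To see why checks of this kind fit a `#![no_std]` crate, consider the following sketch. It is an assumption about how such a check could be written, not the crate's actual source: the hypothetical `in_hebrew_block_const` only compares primitive integers, so it needs nothing beyond the `core` language primitives and can even be evaluated at compile time.

```rust
/// Hypothetical check (not the crate's source). It only compares the
/// code point as a `u32`, so it needs neither `std` nor an allocator.
pub const fn in_hebrew_block_const(c: char) -> bool {
    let cp = c as u32;
    cp >= 0x0590 && cp <= 0x05FF
}

/// Evaluated at compile time: U+05DE (mem) lies inside the Hebrew block.
pub const MEM_IN_BLOCK: bool = in_hebrew_block_const('\u{05DE}');
```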
For installation see the hebrew_unicode_script page at crates.io.
All functions are written in safe Rust.
Panics: none that I am aware of.
All (trait)functions return either true or false.
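Since the checks only return `true` or `false`, a caller who wants error reporting has to build it on top. One possible approach is sketched below; `validate` and `standin_hebrew_block` are hypothetical names used for illustration and are not provided by the crate.

```rust
/// Stand-in predicate for illustration only (not the crate's function).
pub fn standin_hebrew_block(c: char) -> bool {
    matches!(c, '\u{0590}'..='\u{05FF}')
}

/// Returns `Ok(())` if `pred` accepts every character of `s`.
/// Otherwise returns the byte offset and value of the first rejected
/// character, so the caller can report where validation failed.
pub fn validate(s: &str, pred: impl Fn(char) -> bool) -> Result<(), (usize, char)> {
    match s.char_indices().find(|&(_, c)| !pred(c)) {
        Some(rejected) => Err(rejected),
        None => Ok(()),
    }
}
```

Note that the offset is a byte index into the UTF-8 string, not a character count; Hebrew letters in this block occupy two bytes each.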
Current code coverage is 100%. Note that the code coverage figures shown on crates.io are (very) different!
To generate the code coverage, I used grcov (see its documentation for how to use it).
See https://www.unicode.org/charts/PDF/U0590.pdf
See also: https://graphemica.com/blocks/hebrew/
There are some issues with Unicode and Hebrew. These are described on the following web page: Unicode Problems
To learn more about Unicode, see: Unicode main site, Unicode Scripts and Unicode Blocks
See Hebrew Cantillation Marks And Their Encoding for more specifics on this matter.
The hebrew_unicode_script library is dual licensed and distributed under either of its two licenses (one of which is Apache-2.0), at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Any input is welcome. To contribute, you can submit a request at the repository (https://github.com/Roestdev/hebrew_unicode_script/).
Note: it is not yet clear to me whether 'HEBREW POINT JUDEO-SPANISH VARIKA' is a reading sign or not. For the time being, this code point is treated as part of the reading signs.