Crates.io | tokenizer_py |
lib.rs | tokenizer_py |
version | 0.2.0 |
source | src |
created_at | 2024-02-20 10:05:47.217908 |
updated_at | 2024-02-24 14:13:06.061661 |
description | crate with a tokenizer that works like a Python tokenizer |
homepage | |
repository | https://github.com/salam99823/tokenizer |
max_upload_size | |
id | 1146184 |
size | 47,352 |
This project implements a Python-like tokenizer in Rust. It splits a string into a sequence of tokens, each represented by a variant of the Token enum. The supported tokens are:
- Token::Name: a name token, such as a function or variable name.
- Token::Number: a number token, such as an integer or floating-point literal.
- Token::String: a string token, such as a single- or double-quoted string.
- Token::OP: an operator token, such as an arithmetic or comparison operator.
- Token::Indent: an indent token, marking that a block of code is indented one level deeper.
- Token::Dedent: a dedent token, marking that an indented block of code has ended.
- Token::Comment: a comment token, i.e. a single-line # comment.
- Token::NewLine: a newline token, marking the end of a line of source code.
- Token::NL: a token indicating a new line, kept for compatibility with Python's tokenizer, where NL marks a line break that does not end a logical line (for example, a blank line).
- Token::EndMarker: an end-of-input marker.

The tokenizer recognizes the following tokens:
- Whitespace: spaces, tabs, and newlines.
- Numbers: integer, floating-point, and complex literals.
  - int: integer numbers.
  - float: floating-point numbers.
  - complex: complex numbers.
- Names: identifiers and keywords.
- Strings: single- and double-quoted strings.
  - basic string: a plain single- or double-quoted string.
  - format string: a Python f-string.
  - byte string: a Python bytes literal.
  - raw string: a raw string.
  - multi-line string: a triple-quoted string, using single or double quotes.
  - combined string: a string with a combined prefix (for example rb or fr).
- Operators: arithmetic, comparison, and other operators.
- Comments: single-line comments.

The crate also provides a tokenize function that takes a string as input and returns a Result containing a vector of tokens (Vec<Token>).
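Python's tokenizer derives Token::Indent and Token::Dedent from a stack of indentation widths: a deeper line pushes a width and emits an indent, a shallower line pops widths (one dedent per pop) until it matches an outer level. The sketch below illustrates that scheme on its own; it is not tokenizer_py's implementation, and IndentEvent and indent_events are hypothetical names. To stay short it counts only leading spaces (tabs are not expanded), skips blank lines but not comment-only lines, and reports an inconsistent dedent as an error.

```rust
/// Hypothetical stand-in for the roles of Token::Indent and Token::Dedent.
#[derive(Debug, PartialEq)]
enum IndentEvent {
    Indent,
    Dedent,
}

/// Sketch of Python-style indentation tracking with a width stack.
/// Simplifications (assumptions of this sketch, not tokenizer_py behavior):
/// only spaces count, blank lines are skipped, comment-only lines are not special-cased.
fn indent_events(src: &str) -> Result<Vec<IndentEvent>, String> {
    let mut stack: Vec<usize> = vec![0]; // level 0 is always open
    let mut events = Vec::new();
    for (lineno, line) in src.lines().enumerate() {
        if line.trim().is_empty() {
            continue; // blank lines do not change indentation
        }
        // Number of leading spaces on this line.
        let width = line.len() - line.trim_start_matches(' ').len();
        let top = *stack.last().unwrap();
        if width > top {
            stack.push(width);
            events.push(IndentEvent::Indent);
        } else {
            // Close every block that is deeper than this line.
            while width < *stack.last().unwrap() {
                stack.pop();
                events.push(IndentEvent::Dedent);
            }
            if width != *stack.last().unwrap() {
                return Err(format!(
                    "line {}: unindent does not match any outer level",
                    lineno + 1
                ));
            }
        }
    }
    // At end of input, close all blocks that are still open.
    while stack.len() > 1 {
        stack.pop();
        events.push(IndentEvent::Dedent);
    }
    Ok(events)
}
```

For example, "if x:\n    y = 1\n    if z:\n        w\nv\n" yields Indent, Indent, Dedent, Dedent: the final line at column 0 closes both open blocks at once.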
Add this to your Cargo.toml:
[dependencies]
tokenizer_py = "0.2.0"
Basic usage:

use tokenizer_py::{tokenize, Token};

let tokens = tokenize("hello world").unwrap();
assert_eq!(tokens, vec![
    Token::Name("hello".to_string()), // Name token "hello"
    Token::Name("world".to_string()), // Name token "world"
    Token::NewLine,                   // New line token
    Token::EndMarker,                 // End of input token
]);
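To make the shape of that output concrete, here is a toy single-line tokenizer that produces the same sequence for simple inputs. It is only a sketch, not the crate's implementation: Tok is a local stand-in for a subset of Token (so the code compiles without tokenizer_py), toy_tokenize is a hypothetical name, and it handles only ASCII names, decimal integers, and a few one-character operators.

```rust
/// Local stand-in for a subset of tokenizer_py's Token variants (hypothetical copy).
#[derive(Debug, PartialEq)]
enum Tok {
    Name(String),
    Number(String),
    OP(String),
    NewLine,
    EndMarker,
}

/// Toy tokenizer for one line: ASCII names, decimal integers and
/// one-character operators. In this sketch a line that produced tokens
/// is closed with NewLine, and the output always ends with EndMarker.
fn toy_tokenize(src: &str) -> Result<Vec<Tok>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == ' ' || c == '\t' {
            i += 1; // whitespace separates tokens but produces none
        } else if c.is_ascii_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            tokens.push(Tok::Name(chars[start..i].iter().collect()));
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            tokens.push(Tok::Number(chars[start..i].iter().collect()));
        } else if "+-*/%=<>()".contains(c) {
            tokens.push(Tok::OP(c.to_string()));
            i += 1;
        } else {
            return Err(format!("unexpected character {:?} at index {}", c, i));
        }
    }
    if !tokens.is_empty() {
        tokens.push(Tok::NewLine);
    }
    tokens.push(Tok::EndMarker);
    Ok(tokens)
}
```

Keeping every token's text as a String (rather than parsing numbers during tokenization) mirrors the Token::Number("10") form seen in the examples, leaving interpretation to a later stage.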
Evaluating a simple binary expression:

use tokenizer_py::{tokenize, Token};

// Structure representing a binary expression
struct BinaryExp {
    left: Token,
    center: Token,
    right: Token,
}

impl BinaryExp {
    // Method for creating a new instance of BinaryExp
    fn new(left: Token, center: Token, right: Token) -> Self {
        BinaryExp { left, center, right }
    }

    // Method for executing the binary expression
    fn execute(&self) -> Result<isize, <isize as std::str::FromStr>::Err> {
        use Token::{Number, OP};
        match (&self.left, &self.center, &self.right) {
            (Number(ref left), OP(ref op), Number(ref right)) => {
                let (left, right) = (left.parse::<isize>()?, right.parse::<isize>()?);
                match op.as_str() {
                    "+" => Ok(left + right),
                    "-" => Ok(left - right),
                    "*" => Ok(left * right),
                    "/" => Ok(left / right),
                    "%" => Ok(left % right),
                    _ => panic!("Invalid operator"),
                }
            }
            _ => panic!("Invalid tokens"),
        }
    }
}
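// Tokens for "10 + 10": Number, OP, Number, NewLine, EndMarker
let tokens = tokenize("10 + 10").unwrap();
// Take the first three tokens in source order; popping from the end would
// reverse the operands (harmless for 10 + 10, wrong for e.g. 10 - 3).
// The trailing Token::NewLine and Token::EndMarker are not needed.
let mut iter = tokens.into_iter();
let binexp = BinaryExp::new(
    iter.next().unwrap(), // left operand
    iter.next().unwrap(), // operator
    iter.next().unwrap(), // right operand
);
assert_eq!(binexp.execute(), Ok(20)); // Checking the execution result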
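The BinaryExp example panics on unexpected tokens or operators, and "/" or "%" with a zero right operand would also panic at runtime. A minimal non-panicking variant is sketched below, assuming the same Number/OP token shapes used above. Tok is a local stand-in for those two Token variants so the sketch compiles on its own, and EvalError and eval are hypothetical names, not part of tokenizer_py.

```rust
use std::num::ParseIntError;

/// Local stand-in for the two tokenizer_py::Token variants used above.
#[derive(Debug, PartialEq)]
enum Tok {
    Number(String),
    OP(String),
}

/// Hypothetical error type replacing the panics in BinaryExp::execute.
#[derive(Debug, PartialEq)]
enum EvalError {
    Parse(ParseIntError),
    UnknownOperator(String),
    UnexpectedTokens,
    DivisionByZero,
    Overflow,
}

impl From<ParseIntError> for EvalError {
    fn from(e: ParseIntError) -> Self {
        EvalError::Parse(e)
    }
}

/// Evaluates `left op right` on isize values, returning an error instead of panicking.
fn eval(left: &Tok, op: &Tok, right: &Tok) -> Result<isize, EvalError> {
    let (l, op, r) = match (left, op, right) {
        (Tok::Number(l), Tok::OP(op), Tok::Number(r)) => {
            (l.parse::<isize>()?, op.as_str(), r.parse::<isize>()?)
        }
        _ => return Err(EvalError::UnexpectedTokens),
    };
    // checked_* returns None on overflow instead of panicking or wrapping.
    let result = match op {
        "+" => l.checked_add(r),
        "-" => l.checked_sub(r),
        "*" => l.checked_mul(r),
        "/" | "%" if r == 0 => return Err(EvalError::DivisionByZero),
        "/" => l.checked_div(r),
        "%" => l.checked_rem(r),
        other => return Err(EvalError::UnknownOperator(other.to_string())),
    };
    result.ok_or(EvalError::Overflow)
}
```

Returning a Result lets a caller report a bad expression to the user instead of aborting, and the separate DivisionByZero variant distinguishes that case from the isize::MIN / -1 overflow that checked_div also rejects.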