opstr — ‘Operate on strings’ command line utility
author: tajpulo
version: 1.1.0
repository: https://github.com/typho/opstr
created: 2024-04-07
updated: 2024-04-07
As a software developer, I often need to look at strings and apply operations to them. I frequently use Python on the command line or resort to client-side web applications. But the operations are always the same and should be accessible with one CLI call.
I built opstr so that you can throw a bunch of strings in and get the results of various operations out. Or you specify one operation and get a predictable result. It also makes it easy to run string operations from your shell.
Purpose: to apply operations to strings.
Audience: anyone working with text strings (in the Unicode sense, i.e. as sequences of codepoints).
Install me via crates.io:
cargo install opstr
opstr --op utf8-bytes "hello"
to get [104, 101, 108, 108, 111]
Please consult the help menu to see all options to configure opstr.
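Assuming opstr follows the common --help convention (an assumption, not verified against the current release), the full option list can be printed with:
opstr --help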
Most options can also be provided as environment variables. Hence you can avoid specifying the option on every CLI call and instead set it once.
The list of environment variables is:
OPSTR_RADIX
: the radix used for integers printed out
OPSTR_HEX_UPPER
: print hexadecimal alphabetic digits with uppercase letters, not lowercase letters
OPSTR_COLOR_SCHEME
: the color scheme for the output
OPSTR_LOCALE
: locale to use for locale-dependent operations (only en-US works per default)
OPSTR_SYNTAX
: the output representation syntax to use
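For example, to print integers in hexadecimal without passing an option on every call (the radix value 16 is only an illustration):
export OPSTR_RADIX=16
opstr --op utf8-bytes "hello"
The byte values are then printed with radix 16 instead of radix 10.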
Locales are tricky, because the executable would be impractically large if I shipped all locales. Instead, you need to generate locale data yourself; compare with the icu4x data management documentation and replace en-us with your locale in this call:
icu4x-datagen -W -o data/icu4x_en-us.blob2 --include-collations search-all --trie-type small --locales en-us --keys all --format blob
The environment variable OPSTR_LOCALE_DATAFILE needs to point to the .blob2 file to load, and you need to specify the locale as CLI argument or environment variable to make it work properly. Since you might have a different path for every locale, the string {filepath} inside the environment variable will be replaced by the specified locale.
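A minimal sketch of wiring this up, assuming the datagen call above wrote data/icu4x_en-us.blob2 (the exact path is illustrative):
export OPSTR_LOCALE_DATAFILE="data/icu4x_{filepath}.blob2"
export OPSTR_LOCALE=en-us
Here {filepath} would be substituted with en-us, so the file data/icu4x_en-us.blob2 gets loaded.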
If you have a new function to implement …
We follow semver principles:
… --op is specified (more specifically, the internal priority) only requires a patch release.
What to pay attention to before creating a new release:
icu4x-datagen -W -o data/icu4x_en-US.blob2 --include-collations search-all --trie-type small --locales en-us --keys all --format blob
We have one generic op name. If the user specifies a locale, we need to supply a correct Unicode-compatible result (maybe require a proper OPSTR_LOCALE_DATAFILE). If the user specifies no locale, we need to provide a best-effort Unicode-less alternative. We can also expose the Unicode-less algorithm as an additional operation (e.g. sort versus sort-lexicographically), because a suffix like lexicographically indicates that the sorting algorithm does not need/consider Unicode.
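As an illustration only (the two op names are taken from the example above and may not exist verbatim in the current release):
opstr --op sort "Äpfel" "Zebra" "apple"
opstr --op sort-lexicographically "Äpfel" "Zebra" "apple"
The first call would honor OPSTR_LOCALE (and possibly OPSTR_LOCALE_DATAFILE), while the second sorts by plain codepoint order without any Unicode data.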
Currently I only accept UTF-8 strings as arguments. The architecture allows strings as well as bytes as arguments, but no op supports bytes yet. As long as I cannot see a clear path for supporting bytes supplied to Rust through the CLI, I won't pursue that path (NOTE: Rust abstracts CLI argument types away because Windows supplies UTF-16 and POSIX supplies bytes).
The source code is available on GitHub.
See the LICENSE file (Hint: MIT license).
0.7.0: first public release
0.9.0: final evaluation release
1.0.0: uses Unicode Version 15.0, release with backwards compatibility guarantees
1.1.0: Perl support, deterministic output for codepoint-frequencies
Please report any issues on the GitHub issues page.