Crates.io | drug-extraction-cli |
lib.rs | drug-extraction-cli |
version | 1.3.0 |
source | src |
created_at | 2022-04-26 19:02:52.457243 |
updated_at | 2024-04-24 15:00:23.565828 |
description | A CLI for extracting drugs from text records |
repository | https://github.com/UK-IPOP/drug-extraction |
id | 575659 |
size | 55,217 |
This application takes a CSV file of search terms and parses text records from another CSV file, detecting and extracting search-term mentions with string similarity algorithms that account for common misspellings. It is named for the drug searching it most commonly does for us at IPOP, but it is flexible enough to accept any type of search term.
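The misspelling tolerance described above can be illustrated with the optimal string alignment (OSA) edit distance, which the output format lists for the edits column. The following is a minimal, standard-library-only sketch of that metric, not the crate's actual code; the helper within_edit_limit and its max_edits parameter are hypothetical names introduced only for illustration.

```rust
/// Optimal string alignment (restricted Damerau-Levenshtein) distance:
/// the number of insertions, deletions, substitutions, and adjacent
/// transpositions needed to turn `a` into `b`.
fn osa_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let (n, m) = (a.len(), b.len());

    // d[i][j] = distance between the first i chars of a and the first j chars of b.
    let mut d = vec![vec![0usize; m + 1]; n + 1];
    for i in 0..=n {
        d[i][0] = i;
    }
    for j in 0..=m {
        d[0][j] = j;
    }

    for i in 1..=n {
        for j in 1..=m {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            let mut best = (d[i - 1][j] + 1) // deletion
                .min(d[i][j - 1] + 1) // insertion
                .min(d[i - 1][j - 1] + cost); // substitution or match
            // Adjacent transposition, e.g. "yl" <-> "ly".
            if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                best = best.min(d[i - 2][j - 2] + 1);
            }
            d[i][j] = best;
        }
    }
    d[n][m]
}

/// Hypothetical helper (an assumption, not part of the crate's API):
/// accept a candidate only if it is within `max_edits` of the search term.
fn within_edit_limit(search_term: &str, candidate: &str, max_edits: usize) -> bool {
    osa_distance(search_term, candidate) <= max_edits
}
```

Under this sketch, the transposed misspelling "fentanly" is one edit away from "fentanyl", so it would pass a limit of 2, while an unrelated word such as "methadone" compared against "morphine" would not.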
NOTE: In our text preprocessing, we specifically allow hyphens ("-") due to their frequency in drug terminologies. If you want to see this functionality removed or put behind a feature flag, please file an Issue.
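To make the hyphen rule concrete, here is a hedged sketch of what a hyphen-preserving cleaning step could look like. The function name clean_text and the exact rules (ASCII lowercasing, replacing other punctuation with spaces, collapsing whitespace) are assumptions made for illustration; the crate's real preprocessing may differ in every detail except that hyphens are kept.

```rust
/// Hypothetical cleaning step (assumed rules, not the crate's implementation):
/// - keep letters, digits, and hyphens
/// - lowercase ASCII letters
/// - turn every other character into a space, then collapse runs of whitespace
fn clean_text(input: &str) -> String {
    let mapped: String = input
        .chars()
        .map(|c| {
            if c.is_alphanumeric() || c == '-' {
                c.to_ascii_lowercase()
            } else {
                ' '
            }
        })
        .collect();

    // split_whitespace drops leading, trailing, and repeated spaces.
    mapped.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

With these assumed rules, a hyphenated term such as "6-MAM" survives as the single token "6-mam" instead of being split into "6" and "mam".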
If you are wondering about specific use cases, check out the Examples folder!
To install the drug-extraction-cli application, use either of the following.
With Python, please use pipx, since it is designed specifically for installing Python CLI apps into isolated virtual environments:
pipx install extract-drugs
With Rust:
cargo install drug-extraction-cli
IMPORTANT! Whichever packaging index you install from, the executable will be named extract-drugs, for more intuitive commands.
INFO: The naming discrepancy is due to how maturin handles package names, and to wanting both to keep the same CLI command name and to maintain the Rust namespace. Apologies, but you'll be fine 🙂.
This application has two commands: interactive and search. Both have the same underlying functionality. The latter lets you pass command-line arguments and is better suited to automated processing or advanced users; the former lets you declare the same configuration options interactively and is better for new or first-time users.
API documentation for the library can be found on docs.rs.
The interactive command presents you with a series of prompts to help you select the correct options. It is highly recommended for new users or one-off runs.
Usage:
extract-drugs interactive
This command is demoed in the GIF above.
search functions the same as interactive but lets you provide the configuration options declaratively.
This tool writes an output.csv file with the following format:
Column Name | Description | Data Type | Limits/Ranges |
---|---|---|---|
row_id | Identifier from --id-col if provided, else line number of row in --data-file | String | None |
search_term | The search term, cleaned and normalized. This is the actual term that was compared. | String | None |
matched_term | The matched term, cleaned and normalized. This is the actual term that was compared. | String | None |
edits | The osa edit distance | Integer | 0-2 (top limit due to exclusion filter) |
similarity_score | The jaro_winkler similarity score | Float | 0.95-1.0 (bottom limit due to exclusion filter) |
search_field | The field that this match was found in, from --search-cols | String | None |
metadata | The attached metadata to search_term in the search_terms file | String or None | None |
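The similarity_score column reports a Jaro-Winkler score with a documented lower bound of 0.95. As a rough illustration of how such a score behaves, below is a standard-library-only sketch of the textbook Jaro-Winkler formula (prefix length capped at 4, scaling factor 0.1). It is an assumption that the tool's library matches this variant exactly, and passes_similarity_filter is a hypothetical helper, not part of the crate.

```rust
/// Jaro similarity between two character slices, in the range 0.0..=1.0.
fn jaro(a: &[char], b: &[char]) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Characters count as matching only within this distance of each other.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;

    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }

    // Count matched characters that appear in a different order.
    let mut k = 0usize;
    let mut out_of_order = 0usize;
    for i in 0..a.len() {
        if a_matched[i] {
            while !b_matched[k] {
                k += 1;
            }
            if a[i] != b[k] {
                out_of_order += 1;
            }
            k += 1;
        }
    }

    let m = matches as f64;
    let transpositions = (out_of_order / 2) as f64;
    (m / a.len() as f64 + m / b.len() as f64 + (m - transpositions) / m) / 3.0
}

/// Jaro-Winkler: Jaro plus a bonus for a shared prefix of up to 4 characters.
fn jaro_winkler(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let j = jaro(&a, &b);
    let prefix = a
        .iter()
        .zip(b.iter())
        .take(4)
        .take_while(|(x, y)| x == y)
        .count() as f64;
    j + prefix * 0.1 * (1.0 - j)
}

/// Hypothetical helper: the 0.95 lower bound comes from the table above.
fn passes_similarity_filter(score: f64) -> bool {
    score >= 0.95
}
```

Under this sketch, "fentanly" scores about 0.975 against "fentanyl" and so would clear the 0.95 threshold, which shows why a single transposition still yields a match.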
For a full showcase of example runs of this tool, check out the shell scripts inside the examples folder.
For a showcase of the potential analytical value that can be derived from running this tool, check out the Jupyter Notebooks in the same folder!
If you encounter any issues or need support please either contact @nanthony007 or open an issue.
See CONTRIBUTING.md.