Crates.io | whichlicense_detection |
lib.rs | whichlicense_detection |
version | 6.0.0 |
source | src |
created_at | 2023-04-25 11:12:25.385292 |
updated_at | 2023-06-17 13:44:24.336439 |
description | A tool to detect licenses used by the WhichLicense project |
homepage | https://whichlicense.com |
repository | https://github.com/whichlicense/license-detection |
id | 848378 |
size | 106,898 |
This is a library to facilitate the detection of licenses in source code.
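// num_bands, band_width and shingle_text_size are tuning values you choose.
let mut gaoya = GaoyaDetection {
    // 0.5 is the Jaccard similarity threshold passed to the MinHash index.
    index: MinHashIndex::new(num_bands, band_width, 0.5),
    min_hasher: MinHasher32::new(num_bands * band_width),
    shingle_text_size,
    normalization_fn: DEFAULT_NORMALIZATION_FN,
};
// Load the licenses from the file "licenses".
gaoya.load_from_file("licenses");
// OR:
// for l in load_licenses_from_folder("./licenses/RAW") {
//     gaoya.add_plain(&l.name, &strip_spdx_heading(&l.text));
// }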
let mut fuzzy = FuzzyDetection {
    licenses: vec![],
    min_confidence: 50,
    exit_on_exact_match: false,
    normalization_fn: DEFAULT_NORMALIZATION_FN,
};
fuzzy.load_from_file("licenses");
// OR:
// for l in load_licenses_from_folder("./licenses/RAW") {
//     fuzzy.add_plain(&l.name, &strip_spdx_heading(&l.text));
// }
The normalization function normalizes the license text before the algorithm processes it. This lets the algorithm focus on the license text itself rather than on its formatting, which improves accuracy (higher confidence).
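To make the idea concrete, here is a minimal sketch of what a normalization function could look like. The function name `normalize_license_text` and its `&str -> String` shape are assumptions for illustration; this is not the crate's `DEFAULT_NORMALIZATION_FN`, whose exact behavior is not described here. The sketch lowercases the text, replaces punctuation with spaces, and collapses runs of whitespace, so that two copies of a license that differ only in layout normalize to the same string.

```rust
/// Hypothetical normalizer (not the crate's DEFAULT_NORMALIZATION_FN):
/// lowercases the text, turns every non-alphanumeric character into a
/// space, and collapses runs of whitespace into single spaces.
fn normalize_license_text(text: &str) -> String {
    let cleaned: String = text
        .to_lowercase()
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { ' ' })
        .collect();
    // split_whitespace drops leading, trailing and repeated whitespace.
    cleaned.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    let formatted = "MIT License\n\n  Permission is hereby granted, free of charge...";
    let plain = "mit license permission is hereby granted free of charge";
    // Both inputs normalize to the same string despite different formatting.
    assert_eq!(normalize_license_text(formatted), normalize_license_text(plain));
    println!("{}", normalize_license_text(formatted));
}
```

How aggressive the normalization should be is a design choice: stripping more formatting makes matching more tolerant, but it can also erase differences between licenses that are otherwise very similar.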
The pipeline system automatically improves license detection results by allowing further processing when, for example, the confidence is too low. A pipeline executes each segment on the running license text and checks the result against the algorithm after every segment. It stops as soon as the confidence of the top (highest-confidence) license is above the desired confidence.
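The steps are as follows:
1. Execute the next segment on the running license text.
2. Check the result against the algorithm.
3. If the confidence of the top license is above the desired confidence, stop; otherwise, continue with the next segment.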
Batched segments allow you to run multiple segments one after the other without checking against (i.e., testing) the algorithm after each segment. The algorithm will be tested after all batched segments have executed.
let pipeline = Pipeline::new(vec![
    Segment::Remove(Using::Regex(Regex::new(r"...").unwrap())),
    Segment::Remove(Using::Text("...".to_string())),
    Segment::Replace(Using::Text("...".to_string()), "***".to_string()),
    Segment::Batch(vec![
        Segment::Remove(Using::Regex(Regex::new(r"...").unwrap())),
        Segment::Remove(Using::Regex(Regex::new(r"...").unwrap())),
    ]),
]);
let results = pipeline.run(&algorithm, "<your_incoming_license>", 100.0);
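The control flow described above (apply a segment, test, stop early; a batch is tested only once) can be illustrated with a small self-contained sketch. Everything in it is a hypothetical stand-in and not the crate's implementation: the simplified `Segment` enum works on plain strings only, `run_pipeline` and `toy_confidence` are invented names, and the scorer is a toy word-set overlap rather than a real detection algorithm. The sketch also assumes that the unmodified input is tested once before any segment runs and that reaching the threshold exactly counts as success.

```rust
use std::collections::HashSet;

// Hypothetical, simplified stand-in for the crate's segment type.
enum Segment {
    Remove(String),
    Replace(String, String),
    Batch(Vec<Segment>),
}

/// Applies one segment to the text without testing the algorithm.
/// A batch applies all of its inner segments back to back.
fn apply(segment: &Segment, text: &str) -> String {
    match segment {
        Segment::Remove(needle) => text.replace(needle.as_str(), ""),
        Segment::Replace(needle, with) => text.replace(needle.as_str(), with.as_str()),
        Segment::Batch(inner) => inner
            .iter()
            .fold(text.to_string(), |acc, s| apply(s, &acc)),
    }
}

/// Runs the segments in order and scores the text after each top-level
/// segment. Stops once the score reaches `threshold` (assumption: >=).
/// Returns the final text, its score, and how many times it was scored.
fn run_pipeline<F>(segments: &[Segment], input: &str, threshold: f32, score: F) -> (String, f32, usize)
where
    F: Fn(&str) -> f32,
{
    let mut text = input.to_string();
    // Assumption: the raw input is tested once before any segment runs.
    let mut confidence = score(&text);
    let mut tests = 1;
    for segment in segments {
        if confidence >= threshold {
            break;
        }
        text = apply(segment, &text);
        confidence = score(&text);
        tests += 1;
    }
    (text, confidence, tests)
}

/// Toy scorer: Jaccard overlap of the two word sets, scaled to 0..=100.
fn toy_confidence(reference: &str, candidate: &str) -> f32 {
    let a: HashSet<&str> = reference.split_whitespace().collect();
    let b: HashSet<&str> = candidate.split_whitespace().collect();
    let union = a.union(&b).count();
    if union == 0 {
        return 0.0;
    }
    100.0 * a.intersection(&b).count() as f32 / union as f32
}

fn main() {
    let reference = "permission is hereby granted";
    let incoming = "Copyright 2023 ACME permission is hereby granted";
    let segments = vec![
        Segment::Remove("Copyright ".to_string()),
        // The two removals below are tested together, once.
        Segment::Batch(vec![
            Segment::Remove("2023 ".to_string()),
            Segment::Remove("ACME ".to_string()),
        ]),
        // Never executed here: the threshold is reached after the batch.
        Segment::Replace("granted".to_string(), "***".to_string()),
    ];
    let (text, confidence, tests) =
        run_pipeline(&segments, incoming, 100.0, |t: &str| toy_confidence(reference, t));
    assert_eq!(text, "permission is hereby granted");
    assert_eq!(tests, 3); // raw input + first segment + batch
    println!("{text:?} scored {confidence} after {tests} tests");
}
```

Batching trades granularity for speed in this sketch: each test is a full scoring pass, so grouping cheap cleanup steps avoids scoring intermediate texts that are unlikely to cross the threshold on their own.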
The initial database was generated from the license data of the ScanCode toolkit. If you use the ScanCode license database, you must include the following copyright notice in your project. If you choose not to use the ScanCode license database, the notice is not required.
Copyright (c) nexB Inc. and others. All rights reserved. ScanCode is a trademark of nexB Inc. SPDX-License-Identifier: CC-BY-4.0 See https://creativecommons.org/licenses/by/4.0/legalcode for the license text. See https://github.com/nexB/scancode-toolkit for support or download. See https://aboutcode.org for more information about nexB OSS projects.