QBG
===

Command-line interface for NGT with Quantization for indexing high-dimensional data

Command
=======

**qbg** - proximity search for high-dimensional data with quantization

    $ qbg command [option] index [data]

**Note:** When the environment variable POSIXLY_CORRECT is set on some platforms such as Cygwin, you should specify options before the command as follows.

    $ qbg [option] command index [data]

**qbg** handles two types of graphs with quantization: Quantized Graph (QG) and Quantized Blob Graph (QBG).

**command** for the quantized graph is one of:

- *[create-qg](#create-qg)*
- *[build-qg](#build-qg)*
- *[search-qg](#search-qg)*

**command** for the quantized blob graph is one of:

- *[create](#create)*
- *[append](#append)*
- *[build](#build)*
- *[search](#search)*

### CREATE-QG

Make and initialize a QG directory for the quantized graph in the specified NGT index directory, and insert the data in the NGT index into the QG index.

    $ qbg create-qg [-P number_of_extended_dimensions] [-Q number_of_subvector_dimensions] index

*index*

Specify the name of the directory of the existing index, such as an ANNG or ONNG, to be quantized. Only an index that uses L2 distance or normalized cosine similarity can be quantized. To use cosine similarity for the quantized graph, build the ANNG or ONNG with normalized cosine similarity.

**-P** *number_of_extended_dimensions*

Specify the number of extended dimensions. The number should be greater than or equal to the number of genuine dimensions, and should also be a multiple of 4. When this option is not specified, the number of extended dimensions is set to the smallest multiple of 4 that is greater than the dimension.

**-Q** *number_of_subvector_dimensions*

Specify the number of subvector dimensions. The number should be less than or equal to the number of extended dimensions, and should also be a divisor of the number of extended dimensions.
When this option is not specified, the number of subvector dimensions is set to one.

### BUILD-QG

Quantize the objects of the specified index and build a quantized graph into the index.

    $ qbg build-qg [-o number_of_objects_for_quantization] [-E max_number_of_edges] [-M number_of_trials] index

*index*

Specify the name of the directory of the existing index, such as an ANNG or ONNG, to be quantized. Only an index that uses L2 distance or normalized cosine similarity can be quantized. To use cosine similarity for the quantized graph, build the ANNG or ONNG with normalized cosine similarity.

**-o** *number_of_objects_for_quantization*

Specify the number of objects used for quantization and optimization. The number should be less than or equal to the number of registered objects.

**-E** *max_number_of_edges*

Specify the maximum number of edges used to build the quantized graph. Since the objects associated with the edges of each node are processed in groups of 16, the number should be a multiple of 16.

**-M** *number_of_trials*

Specify the number of trials to optimize the subvector quantization.

### SEARCH-QG

Search the index using the specified query data.

    $ qbg search-qg [-n number_of_search_objects] [-e search_range_coefficient] [-p result_expansion] [-r search_radius] index query_data

*index*

Specify the path of the existing quantized index.

*query_data*

Specify the path of the file containing query data. The file shall consist of one query per line, with the elements of each query delimited by a space or tab. When multiple queries are provided, the searches are performed sequentially.

**-n** *number_of_search_objects* (default = 20)

Specify the number of search objects.

**-e** *search_range_coefficient* (default = 0.02)

Specify the magnification coefficient (epsilon) of the search range. A larger value means higher accuracy but slower searching, while a smaller value means a drop in accuracy but faster searching.
While it is desirable to adjust this value within the range of 0 to 0.1, a negative value (> -1.0) may also be specified.

**-p** *result_expansion* (default = 3.0)

Specify the expansion ratio of the number of approximate inner search objects to the number of search objects. For example, when the ratio is 10 and the number of search objects is 20, the number of approximate search objects is set to 200 inside the search processing. A larger value brings higher accuracy but slower searching.

### CREATE

Make and initialize a QBG directory for the quantized blob graph.

    $ qbg create [-d number_of_dimensions] [-P number_of_extended_dimensions] [-O object_type] [-D distance_function] [-C number_of_blobs] [-N number_of_subvectors] index

*index*

Specify the name of the directory for QBG.

**-d** *number_of_dimensions*

Specify the number of dimensions of the registration data.

**-P** *number_of_extended_dimensions*

Specify the number of extended dimensions. The number should be greater than or equal to the number of genuine dimensions, and should also be a multiple of 4. When this option is not specified, the number of extended dimensions is set to the smallest multiple of 4 that is greater than the dimension.

**-O** *object_type*

Specify the data object type.

- __c__: 1-byte unsigned integer
- __f__: 4-byte floating point number (default)

**-D** *distance_function*

Specify the distance function as follows.

- __2__: L2 distance (default)
- __c__: Cosine similarity

**-C** *number_of_blobs*

Specify the number of blobs, which should be less than or equal to the number of quantization clusters.

**-N** *number_of_subvectors*

Specify the number of subvectors, which should be a divisor of the number of extended dimensions.

### APPEND

Append the specified data to the specified index.

    $ qbg append index registration_data

### BUILD

Quantize the objects of the specified index and build a quantized blob graph into the index.


    $ qbg build [-o number_of_objects_for_quantization] [-E max_number_of_edges] [-M number_of_trials] [-P rotation] index

*index*

Specify the name of the directory of the existing index, such as an ANNG or ONNG, to be quantized. Only an index that uses L2 distance or normalized cosine similarity can be quantized. To use cosine similarity for the quantized graph, build the ANNG or ONNG with normalized cosine similarity.

**-o** *number_of_objects_for_quantization*

Specify the number of objects used for quantization and optimization. The number should be less than or equal to the number of registered objects.

**-P** *rotation*

Specify the type of transform matrix applied to inserted and query objects to optimize the subvector quantization.

- __r__: Rotation matrix.
- __R__: Rotation and repositioning matrix.
- __p__: Repositioning matrix.
- __n__: No matrix.

**-M** *number_of_trials*

Specify the number of trials to optimize the subvector quantization.

### SEARCH

Search the index using the specified query data.

    $ qbg search [-n number_of_search_objects] [-e search_range_coefficient] [-B blob_search_range_coefficient] [-N number_of_explored_nodes] [-p result_expansion] index query_data

*index*

Specify the path of the existing quantized index.

*query_data*

Specify the path of the file containing query data. The file shall consist of one query per line, with the elements of each query delimited by a space or tab. When multiple queries are provided, the searches are performed sequentially.

**-n** *number_of_search_objects* (default = 20)

Specify the number of search objects.

**-e** *search_range_coefficient* (default = 0.02)

Specify the magnification coefficient (epsilon) of the search range. A larger value means higher accuracy but slower searching, while a smaller value means a drop in accuracy but faster searching. While it is desirable to adjust this value within the range of 0 to 0.1, a negative value (> -1.0) may also be specified.


**-B** *blob_search_range_coefficient* (default = 0.0)

Specify the magnification coefficient (epsilon) of the search range for the quantized blob graph.

**-N** *number_of_explored_nodes* (default = 256)

Specify the number of nodes to explore in the graph. When the number of explored nodes reaches the specified number, the search is terminated.

**-p** *result_expansion* (default = 0.0)

Specify the expansion ratio of the number of approximate inner search objects to the number of search objects. For example, when the ratio is 10 and the number of search objects is 20, the number of approximate search objects is set to 200 inside the search processing. A larger value brings higher accuracy but slower searching.

Examples of using the quantized graph
-------------------------------------

### Setup data

    $ curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean.tsv
    $ curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean_query.tsv
    $ head -1 sift-128-euclidean_query.tsv > query.tsv

### Build the quantized graph

Build an ANNG for 128-dimensional, floating point data:

    $ ngt create -d 128 -o f -D 2 anng sift-128-euclidean.tsv
    Data loading time=15.4804 (sec) 15480.4 (msec)
    # of objects=1000000
    Processed 100000 objects. time= 4.26452 (sec)
    ...
    Processed 1000000 objects. time= 7.06745 (sec)
    Index creation time=63.3504 (sec) 63350.4 (msec)

Create and initialize the quantized graph:

    $ qbg create-qg anng
    creating...
    appending...

Build the quantized graph:

    $ qbg build-qg anng
    optimizing...
    building the inverted index...
    building the quantized graph...

### Search with the quantized graph

Search k nearest neighbors with the quantized graph:

    $ qbg search-qg -n 20 -e 0.02 anng query.tsv
    Query No.1
    Rank    ID      Distance
    1       932086  232.871
    2       934877  234.715
    3       561814  243.99
    ...
    20      2177    276.781
    Query Time= 0.0005034 (sec), 0.5034 (msec)
    Average Query Time= 0.0005034 (sec), 0.5034 (msec), (0.0005034/1)

Examples of building the quantized graph for higher performance
---------------------------------------------------------------

Build an ANNG having more edges for higher performance:

    $ ngt create -d 128 -o f -D 2 -E 40 anng-40 sift-128-euclidean.tsv

Build an ONNG:

    $ ngt reconstruct-graph -m S -E 64 -o 64 -i 120 anng-40 onng-40

Create and initialize the quantized graph:

    $ qbg create-qg onng-40

Build the quantized graph:

    $ qbg build-qg onng-40

Search k nearest neighbors with the quantized graph:

    $ qbg search-qg -n 20 -e 0.02 onng-40 query.tsv

Examples of using the quantized blob graph
------------------------------------------

### Setup data

    $ curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean.tsv
    $ curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean_query.tsv
    $ head -1 sift-128-euclidean_query.tsv > query.tsv

### Build the quantized blob graph

Create and initialize the quantized blob graph:

    $ qbg create -d 128 -D 2 -N 128 qbg-index

Append objects:

    $ qbg append qbg-index sift-128-euclidean.tsv

Build the quantized blob graph:

    $ qbg build qbg-index

### Search with the quantized blob graph

Search k nearest neighbors with the quantized blob graph:

    $ qbg search -n 20 -e 0.02 qbg-index query.tsv
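
Example of checking option values before a build
------------------------------------------------

The option reference states several numeric constraints: `-P` should be a multiple of 4 and no smaller than the genuine dimension, `-Q` should divide the number of extended dimensions, and `-E` should be a multiple of 16. The following Python sketch checks those rules before a long-running build is started. It is not part of **qbg**; the function names `check_qg_options` and `approximate_search_objects` are hypothetical helpers introduced only for illustration. The `-Q` check is applied only when `-P` is given explicitly, and how **qbg** rounds a non-integer product for `-p` is not stated in this document, so the second helper simply truncates.

```python
def check_qg_options(dimension, extended=None, subvector=None, max_edges=None):
    """Return a list of violated constraints (empty if none).

    Hypothetical helper: it only restates the rules from the option
    reference and does not call qbg or NGT.
    """
    problems = []
    if extended is not None:
        # -P: at least the genuine dimension and a multiple of 4.
        if extended < dimension:
            problems.append("-P must be >= the number of genuine dimensions")
        if extended % 4 != 0:
            problems.append("-P must be a multiple of 4")
    if subvector is not None:
        if subvector < 1:
            # Guard added for this sketch so the divisor check is well defined.
            problems.append("-Q must be a positive integer")
        elif extended is not None:
            # -Q: no larger than -P and a divisor of -P.
            if subvector > extended:
                problems.append("-Q must be <= -P")
            elif extended % subvector != 0:
                problems.append("-Q must be a divisor of -P")
    if max_edges is not None and max_edges % 16 != 0:
        # -E: edges are processed in groups of 16 objects.
        problems.append("-E must be a multiple of 16")
    return problems


def approximate_search_objects(search_objects, expansion_ratio):
    """Number of approximate inner search objects implied by -n and -p.

    Truncation is an assumption of this sketch, not documented behavior.
    """
    return int(search_objects * expansion_ratio)


if __name__ == "__main__":
    print(check_qg_options(128, extended=128, subvector=1, max_edges=64))
    print(check_qg_options(128, extended=130, max_edges=40))
    print(approximate_search_objects(20, 10))
```

The last call reproduces the worked example from the `-p` description: a ratio of 10 with 20 search objects gives 200 approximate search objects.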