| Crates.io | solrcopy |
| lib.rs | solrcopy |
| version | 0.8.1 |
| created_at | 2025-10-10 18:15:39.542394+00 |
| updated_at | 2025-10-12 16:30:51.279511+00 |
| description | Command line tool useful for migration, transformations, backup, and restore of documents stored inside cores of Apache Solr |
| homepage | https://github.com/juarezr/solrcopy |
| repository | https://github.com/juarezr/solrcopy |
| max_upload_size | |
| id | 1877299 |
| size | 274,442 |
Command line tool useful for migration, transformations, backup, and restore of documents stored inside cores of Apache Solr.
Use `solrcopy backup` for dumping documents from a Solr core into local zip files:
- `--query` for filtering the documents extracted by using a Solr Query.
- `--order` for specifying the sorting of documents extracted.
- `--limit` and `--skip` for restricting the number of documents extracted.
- `--select` and `--exclude` for restricting the columns extracted.

Use `solrcopy restore` for uploading the extracted documents from local zip files into the same Solr core, or into another with the same field names as extracted:
- Documents are updated or inserted based on the uniqueKey field defined in the core.
- Use `solrcopy backup` for extracting more than one slice of documents to be updated.

The following environment variables can be used for common parameters:
- SOLR_COPY_URL for the url pointing to the Solr cluster.
- SOLR_COPY_DIR for the existing folder where the zip backup files containing the extracted documents are stored.

These variables can also be stored in a .env file alongside the solrcopy binary. See .env.example.
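For example, a minimal sketch of a backup using these variables (the url and folder values are assumptions for illustration):

# Hypothetical values: adjust the url and the folder to your environment
$ export SOLR_COPY_URL=http://localhost:8983/solr
$ export SOLR_COPY_DIR=/tmp/backups
# With both variables set, --url and --dir can normally be omitted
$ solrcopy backup --core demo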
Extracting and updating documents in huge cores can be challenging. It can take too much time and can fail at any time.
Below are some tricks for dealing with such cores:
- Use `--readers` and `--writers` for executing operations in parallel.
- The `backup` subcommand tends to slow down as time goes on and eventually fails. This is because Solr struggles to fetch batches of docs with high skip/start parameters. For dealing with this:
  - Use `--iterate-by`, `--between` and `--step` for iterating through the parameter `--query` with the variables {begin} and {end}.
  - Example: --query 'date:[{begin} TO {end}]' --iterate-by day --between '2020-04-01' '2020-04-30T23:59:59'
  - Use `--params shards=shard1` for copying shard by shard by name in the `backup` subcommand.
- Use the `--delay-*` switches for avoiding overloading the Solr server.

$ solrcopy --help
Command line tool for backup and restore of documents stored in cores of Apache Solr.
Solrcopy is a command for doing backup and restore of documents stored on Solr cores. It lets you filter docs by using an expression, limit the quantity, and define the order and the desired columns to export. The data is stored as json inside local zip files. It is agnostic to data format, content and storage place. Because of this, data is restored exactly as extracted, and you are responsible for extracting, storing and updating the correct data from and into the correct cores.
Usage: solrcopy <COMMAND>
Commands:
backup Dumps documents from an Apache Solr core into local backup files
restore Restore documents from local backup files into an Apache Solr core
commit Perform a commit in the Solr core index for persisting documents in disk/memory
delete Removes documents from the Solr core permanently
generate Generates man page and completion scripts for different shells
help Print this message or the help of the given subcommand(s)
Options:
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
$ solrcopy backup --help
Dumps documents from an Apache Solr core into local backup files
Usage: solrcopy backup [OPTIONS] --core <core> --dir </path/to/output>
Options:
-u, --url <URL>
Url pointing to the Solr cluster
[env: SOLR_COPY_URL=]
[default: http://localhost:8983/solr]
-c, --core <core>
Case sensitive name of the core in the Solr server
-d, --dir </path/to/output>
Existing folder where the backup files containing the extracted documents are stored
[env: SOLR_COPY_DIR=]
-q, --query <'f1:vl1 AND f2:vl2'>
Solr Query param 'q' for filtering which documents are retrieved. See: <https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html>
-f, --fq <'f1:vl1 AND f2:vl2'>
Solr Filter Query param 'fq' for filtering which documents are retrieved
-o, --order <f1:asc,f2:desc,...>
Solr core field names for sorting documents for retrieval
-k, --skip <quantity>
Skip this quantity of documents in the Solr Query
[default: 0]
-l, --limit <quantity>
Maximum quantity of documents for retrieving from the core (like 100M)
-s, --select <field1,field2,...>
Names of core fields retrieved in each document [default: all but _*]
-e, --exclude <field1,field2,...>
Names of core fields excluded in each document [default: none]
-i, --iterate-by <mode>
Slice the queries by using the variables {begin} and {end} for iterating in `--query`. Used in bigger Solr cores with a huge number of docs, because querying near the end of the docs is expensive and fails frequently
[default: day]
Possible values:
- none
- minute: Break the query in slices by a first ordered date field repeating between {begin} and {end} in the query parameters
- hour
- day
- range: Break the query in slices by a first ordered integer field repeating between {begin} and {end} in the query parameters
-b, --between <begin> <end>
The range of dates/numbers for iterating the queries through slices. Requires that the query parameter contains the variables {begin} and {end} for creating the slices. Use numbers or dates in ISO 8601 format (yyyy-mm-ddTHH:MM:SS)
--step <num>
Number to increment each step in iterative mode
[default: 1]
-p, --params <useParams=mypars>
Extra parameter for Solr Update Handler. See: <https://lucene.apache.org/solr/guide/transforming-and-indexing-custom-json.html>
-m, --max-errors <count>
How many times to continue on source document errors
[default: 0]
--delay-before <time>
Delay before any processing in solr server. Format as: 30s, 15min, 1h
--delay-per-request <time>
Delay between each http operations in solr server. Format as: 3s, 500ms, 1min
--delay-after <time>
Delay after all processing. Useful for letting Solr breathe
--num-docs <quantity>
Number of documents to retrieve from solr in each reader step
[default: 4k]
--archive-files <quantity>
Max number of files of documents stored in each archive file
[default: 40]
--archive-prefix <name>
Optional prefix for naming the archive backup files when storing documents
--archive-compression <compression>
Compression method to use for compressing the archive files [possible values: stored, zip, zstd]
[default: zip]
--workaround-shards <count>
Use only when your Solr Cloud returns a distinct count of docs for some queries in a row. This may be caused by replication problems between cluster nodes of shard replicas of a core. Responses with 'num_found' below the greatest value are ignored for getting all possible docs. Use with `--params shards=shard_name` for retrieving all docs for each shard of the core
[default: 0]
-r, --readers <count>
Number of parallel threads exchanging documents with the Solr core
[default: 1]
-w, --writers <count>
Number of parallel threads syncing documents with the archive files
[default: 1]
--log-level <level>
What level of detail should print messages
[default: INFO]
--log-mode <mode>
Terminal output to print messages
[default: mixed]
--log-file-path <path>
Write messages to a local file
--log-file-level <level>
What level of detail should write messages to the file
[default: DEBUG]
-h, --help
Print help (see a summary with '-h')
$ solrcopy backup --url http://localhost:8983/solr --core demo --query 'price:[1 TO 400] AND NOT popularity:10' --order price:desc,weight:asc --limit 10000 --select id,date,name,price,weight,popularity,manu,cat,store,features --dir ./tmp
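For a huge core, the same backup can be sliced by a date field as described in the tips above (a sketch; the date field name and the range are assumptions for illustration):

$ solrcopy backup --url http://localhost:8983/solr --core demo --dir ./tmp --query 'date:[{begin} TO {end}]' --iterate-by day --between '2020-04-01' '2020-04-30T23:59:59'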
$ solrcopy restore --help
Restore documents from local backup files into an Apache Solr core
Usage: solrcopy restore [OPTIONS] --url <localhost:8983/solr> --core <core> --dir </path/to/output>
Options:
-u, --url <localhost:8983/solr> Url pointing to the Solr cluster [env: SOLR_COPY_URL=]
-c, --core <core> Case sensitive name of the core in the Solr server
-d, --dir </path/to/output> Existing folder where the zip backup files containing the extracted documents are stored [env: SOLR_COPY_DIR=]
-f, --flush <mode> Mode to perform commits of the documents transaction log while updating the core [possible values: none, soft, hard, <interval>] [default: hard]
--no-final-commit Do not perform a final hard commit before finishing
--disable-replication Disable core replication at start and enable again at end
-p, --params <useParams=mypars> Extra parameter for Solr Update Handler. See: https://lucene.apache.org/solr/guide/transforming-and-indexing-custom-json.html
-m, --max-errors <count> How many times to continue on source document errors [default: 0]
--delay-before <time> Delay before any processing in solr server. Format as: 30s, 15min, 1h
--delay-per-request <time> Delay between each http operations in solr server. Format as: 3s, 500ms, 1min
--delay-after <time> Delay after all processing. Useful for letting Solr breathe
-s, --search <core*.zip> Search pattern for matching names of the zip backup files
--order <asc | desc> Optional order for searching the zip archives
-r, --readers <count> Number of parallel threads exchanging documents with the Solr core [default: 1]
-w, --writers <count> Number of parallel threads syncing documents with the zip archives [default: 1]
--log-level <level> What level of detail should print messages [default: info]
--log-mode <mode> Terminal output to print messages [default: mixed]
--log-file-path <path> Write messages to a local file
--log-file-level <level> What level of detail should write messages to the file [default: debug]
-h, --help Print help
$ solrcopy restore --url http://localhost:8983/solr --dir ./tmp --core demo
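When the target server is under load, the restore can be throttled and parallelized with the switches above (a sketch; the delay and thread counts are assumptions for illustration):

$ solrcopy restore --url http://localhost:8983/solr --dir ./tmp --core demo --delay-per-request 500ms --readers 2 --writers 2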
$ solrcopy delete --help
Removes documents from the Solr core permanently
Usage: solrcopy delete [OPTIONS] --query <f1:val1 AND f2:val2> --url <localhost:8983/solr> --core <core>
Options:
-u, --url <localhost:8983/solr> Url pointing to the Solr cluster [env: SOLR_COPY_URL=]
-c, --core <core> Case sensitive name of the core in the Solr server
-q, --query <f1:val1 AND f2:val2> Solr Query for filtering which documents are removed from the core.
Use '*:*' for deleting all documents in the core. There is no way of recovering deleted docs.
Use with caution and check twice
-f, --flush <mode> Whether to perform a commit of the transaction log after removing the documents [default: soft]
--log-level <level> What level of detail should print messages [default: info]
--log-mode <mode> Terminal output to print messages [default: mixed]
--log-file-path <path> Write messages to a local file
--log-file-level <level> What level of detail should write messages to the file [default: debug]
-h, --help Print help
$ solrcopy delete --url http://localhost:8983/solr --core demo --query '*:*'
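A safer pattern is deleting only a filtered subset instead of '*:*' (a sketch; the field and value are assumptions borrowed from the backup example above):

$ solrcopy delete --url http://localhost:8983/solr --core demo --query 'popularity:10'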
$ solrcopy commit --help
Perform a commit in the Solr core index for persisting documents in disk/memory
Usage: solrcopy commit [OPTIONS] --url <localhost:8983/solr> --core <core>
Options:
-u, --url <localhost:8983/solr> Url pointing to the Solr cluster [env: SOLR_COPY_URL=]
-c, --core <core> Case sensitive name of the core in the Solr server
--log-level <level> What level of detail should print messages [default: info]
--log-mode <mode> Terminal output to print messages [default: mixed]
--log-file-path <path> Write messages to a local file
--log-file-level <level> What level of detail should write messages to the file [default: debug]
-h, --help Print help
$ solrcopy commit --url http://localhost:8983/solr --core demo
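For example, a restore started with --no-final-commit can be persisted later with a separate commit (a sketch combining the subcommands shown above):

$ solrcopy restore --url http://localhost:8983/solr --dir ./tmp --core demo --no-final-commit
$ solrcopy commit --url http://localhost:8983/solr --core demo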
Extra query parameters can be passed to the Solr handlers, e.g.: --params 'timeAllowed=15000&segmentTerminatedEarly=false&cache=false&shards=shard1'

For compiling a version from source, run cargo build --release, or cargo install for installing the binary.

For setting up a development environment:
For using Visual Studio Code:
You can also use IntelliJ IDEA, vim, emacs or your preferred IDE.
See also the testing in Visual Studio Code below.
For setting up a testing environment you will need:
- A Solr server with a core containing some documents for testing the solrcopy backup command.
- An empty core for testing the solrcopy restore command.
- The solrcopy parameters set in the command line or in the IDE launch configuration.

Check the Solr docker documentation for help in how to create a Solr container.
You can use cargo make to run all the tasks to set up a Solr server, test the source code against the Solr server, and clean up.
To create a local container using docker, run the following cargo make command:
cargo make test-start
After this you can test the source code against the Solr server by running the following cargo command:
cargo test --features testsolr
To create the local container, test source code and cleanup, run the following cargo make command:
cargo make test
Please also check all available tasks.
The Solr server will respond at http://localhost:8983/solr.

# This command creates the container with a Solr server with two cores: 'demo' and 'target'
$ docker compose -f docker/docker-compose.yml up -d
# Run this command to insert some data into the cores
$ docker compose exec solr solr-ingest-all
# Run this command to test backup
$ cargo run -- backup --url http://localhost:8983/solr --core demo --dir $PWD
# Run this command to test restoring the backup data into an existing empty core
$ cargo run -- restore --url http://localhost:8983/solr --search demo --core target --dir $PWD
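To verify the restore, the standard Solr select handler can be queried for the document count (a sketch using curl, which is not part of solrcopy):

# Should report the same numFound as in the 'demo' core
$ curl 'http://localhost:8983/solr/target/select?q=*:*&rows=0'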
It's possible to create the Solr container using just docker instead of docker compose.
Follow these instructions if you prefer this way:
$ cd docker
# Pull the latest slim Solr image from Docker Hub
$ docker pull solr:slim
...
# 1. Create a container running Solr, and then
# 2. Create the **source** core with the name 'demo'
# 3. Import some docs into the 'demo' core
$ docker run -d --name solr4test -p 8983:8983 solr:slim solr-demo
...
# Create an empty **target** core named 'target'
$ docker exec -it solr4test solr create_core -c target
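When finished testing, the container can be removed with plain docker commands:

$ docker stop solr4test
$ docker rm solr4test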
You can use Cargo Make to execute the most common sequences required for testing, linting, and preparation for committing.
In order to use it, you need to install it first by running the following command:
cargo install --force cargo-make
After installed, you can check all available tasks by running the following command:
$ cargo make --list-all-steps --quiet --hide-uninteresting
Basic
----------
all - Runs all lint checks and runs all tests against a local Solr container
check - Runs all lint checks and runs all basic tests possible without a Solr Server
lint - Verify the source code using all the checks configured
list - List all available tasks [aliases: default]
test - Runs tests against a local solr server created using docker compose
Lint
----------
check-compile - Check if the source code compiles
check-doc - Check if the source code has any documentation issues
check-fmt - Check if the source code follows the formatting rules
check-future - Check if the source code has any future incompatibilities
check-lint - Check if the source code has any language issues
check-msrv - Verify the minimum supported rust version
check-unused - Check if the source code has any unused dependencies
clean - Clean all compiled artifacts
Security
----------
audit - Check if the release build has any security issues and clean the compiled artifacts after
audit-release - Check if the release build has any security issues
Test
----------
test-basic - Runs tests that do not require a Solr container
test-cleanup - Cleanup the local Solr container after testing
test-solr - Runs tests against an existing local solr server
test-start - Setup a local Solr container and ingest some documents allowing to run tests manually after
Upgrade
----------
show - Show the installed and current rust toolchains
upgrade - Upgrade rustup, rust and the rust toolchain
upgrade-check - Check if the rust toolchain is up to date
upgrade-rustup - Upgrade the rustup tool
upgrade-toolchain - Upgrade the stable rust toolchain
There are some pre-configured launch configurations in this repository for debugging solrcopy.
Adjust the parameters in .vscode/launch.json if you'd rather prefer, e.g.: --query, --order, --select, --batch, --skip, --limit.