Crates.io | latte-cli |
lib.rs | latte-cli |
version | 0.27.0 |
source | src |
created_at | 2023-01-30 09:29:50.966689 |
updated_at | 2024-08-05 10:33:23.862971 |
description | A database benchmarking tool for Apache Cassandra |
size | 4,191,708 |
Runs custom CQL workloads against a Cassandra cluster and measures throughput and response times.
Unlike NoSQLBench, Cassandra Stress and tlp-stress, Latte is written in Rust and uses the native Cassandra driver from Scylla. It features a fully asynchronous, thread-per-core execution engine capable of issuing thousands of requests per second from a single thread.
Latte's excellent performance makes it a perfect tool for exploratory benchmarking, when you want to quickly experiment with different workloads.
Other benchmarking tools often use configuration files to specify workload recipes. Although that makes it easy to define simple workloads, it quickly becomes cumbersome when you want to script more realistic scenarios that issue multiple queries or need to generate data in ways other than those built directly into the tool.
Instead of trying to bend a popular configuration file format into a Turing-complete scripting language, Latte simply embeds a real, fully-featured, Turing-complete, modern scripting language. We chose Rune for its painless integration with Rust, first-class async support, satisfying performance and great support from its maintainers.
Rune offers syntax and features similar to Rust, albeit with dynamic typing and easy automatic memory management. Hence, you can not only issue custom CQL queries, but program anything you wish. There are variables, conditional statements, loops, pattern matching, functions, lambdas, user-defined data structures, objects, enums, constants, macros and more.
Latte is still early-stage software under intensive development.
dpkg -i latte-<version>.deb
cargo install latte-cli
Start a Cassandra cluster somewhere (can be a local node). Then run:
latte schema <workload.rn> [<node address>] # create the database schema
latte load <workload.rn> [<node address>] # populate the database with data
latte run <workload.rn> [-f <function>] [<node address>] # execute the workload and measure the performance
You can find a few example workload files in the workloads folder.
For convenience, you can place workload files under /usr/share/latte/workloads or .local/share/latte/workloads, so latte can find them regardless of the current working directory. You can also set up custom workload locations by setting the LATTE_WORKLOAD_PATH environment variable.
Latte produces text reports on stdout but also saves all data to a JSON file in the working directory. The name of the file is derived automatically from the parameters of the run and a timestamp.
You can display the results of a previous run with latte show:
latte show <report.json>
latte show <report.json> -b <previous report.json> # to compare against baseline performance
Run latte --help to display help with the available options.
Workloads for Latte are fully customizable with the embedded scripting language Rune.
A workload script defines a set of public functions that Latte calls automatically. A minimum viable workload script must define at least a single public async function run with two arguments:

- ctx – session context that provides access to Cassandra
- i – current unique cycle number of a 64-bit integer type, starting at 0

The following script would benchmark querying the system.local table:
pub async fn run(ctx, i) {
ctx.execute("SELECT cluster_name FROM system.local LIMIT 1").await
}
Instance functions on ctx are asynchronous, so you should call await on them.
The workload script can provide more than one function for running the benchmark. In this case you can name those functions whatever you like, and then select one of them with the -f / --function parameter.
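For example, a script might expose separate read and write benchmarks. This is an illustrative sketch: the function names and the prepared statements are hypothetical, not part of the examples above.

```rune
const INSERT = "my_insert";
const SELECT = "my_select";

pub async fn prepare(ctx) {
    // Hypothetical prepared statements against a test.test table
    ctx.prepare(INSERT, "INSERT INTO test.test(id, data) VALUES (?, ?)").await?;
    ctx.prepare(SELECT, "SELECT * FROM test.test WHERE id = ?").await?;
}

pub async fn read(ctx, i) {
    ctx.execute_prepared(SELECT, [i]).await
}

pub async fn write(ctx, i) {
    ctx.execute_prepared(INSERT, [i, "some data"]).await
}
```

You would then pick one of them at run time, e.g. latte run <workload.rn> -f write.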
You can (re)create your own keyspaces and tables needed by the benchmark in the schema function. The schema function should also drop the old schema if present. The schema function is executed by running the latte schema command.
pub async fn schema(ctx) {
ctx.execute("CREATE KEYSPACE IF NOT EXISTS test \
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }").await?;
ctx.execute("DROP TABLE IF EXISTS test.test").await?;
ctx.execute("CREATE TABLE test.test(id bigint PRIMARY KEY, data varchar)").await?;
}
Calling ctx.execute is not optimal, because it doesn't use prepared statements. You can prepare statements and register them on the context object in the prepare function:
const INSERT = "my_insert";
const SELECT = "my_select";
pub async fn prepare(ctx) {
ctx.prepare(INSERT, "INSERT INTO test.test(id, data) VALUES (?, ?)").await?;
ctx.prepare(SELECT, "SELECT * FROM test.test WHERE id = ?").await?;
}
pub async fn run(ctx, i) {
ctx.execute_prepared(SELECT, [i]).await
}
Query parameters can also be bound and passed by name:
const INSERT = "my_insert";
pub async fn prepare(ctx) {
ctx.prepare(INSERT, "INSERT INTO test.test(id, data) VALUES (:id, :data)").await?;
}
pub async fn run(ctx, i) {
ctx.execute_prepared(INSERT, #{id: 5, data: "foo"}).await
}
Read queries are more interesting when they return non-empty result sets.
To be able to load data into tables with latte load, you need to set the number of load cycles on the context object and define the load function:
pub async fn prepare(ctx) {
ctx.load_cycle_count = 1000000;
}
pub async fn load(ctx, i) {
ctx.execute_prepared(INSERT, [i, "Lorem ipsum dolor sit amet"]).await
}
We also recommend defining the erase function to erase the data before loading, so that you always get the same dataset regardless of the data that were present in the database before:
pub async fn erase(ctx) {
ctx.execute("TRUNCATE TABLE test.test").await
}
Latte comes with a library of data generating functions. They are accessible in the latte crate. Typically, those functions accept the integer cycle number i, so you can generate data consistently. The data generating functions are pure, i.e. invoking them multiple times with the same parameters always yields the same results.

- latte::uuid(i) – generates a random (type 4) UUID
- latte::hash(i) – generates a non-negative integer hash value
- latte::hash2(a, b) – generates a non-negative integer hash value of two integers
- latte::hash_range(i, max) – generates an integer value in range 0..max
- latte::hash_select(i, vector) – selects an item from a vector based on a hash
- latte::blob(i, len) – generates a random binary blob of length len
- latte::normal(i, mean, std_dev) – generates a floating point number from a normal distribution
- latte::uniform(i, min, max) – generates a floating point number from a uniform distribution

Rune uses 64-bit representation for integers and floats.
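Put together, these generators can derive deterministic yet varied row data from the cycle number. The prepared statement name and column layout below are hypothetical:

```rune
const INSERT = "my_insert"; // hypothetical statement taking (id, bucket, payload, score)

pub async fn run(ctx, i) {
    // Pure generators: the same i always produces the same values.
    let id = latte::hash(i);                  // non-negative integer hash
    let bucket = latte::hash_range(i, 100);   // integer in 0..100
    let payload = latte::blob(i, 64);         // 64-byte random blob
    let score = latte::normal(i, 50.0, 10.0); // normally distributed float
    ctx.execute_prepared(INSERT, [id, bucket, payload, score]).await
}
```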
Since version 0.28, Rune numbers are automatically converted to the proper target query parameter type, so you don't need to do explicit conversions. E.g. you can pass an integer as a parameter of Cassandra type smallint. If the number is too big to fit into the range allowed by the target type, a runtime error is signalled.
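For example, with a hypothetical table that has a smallint column, an ordinary Rune integer can be bound directly:

```rune
// Hypothetical schema: CREATE TABLE test.small(id bigint PRIMARY KEY, val smallint)
const INS = "ins_small";

pub async fn prepare(ctx) {
    ctx.prepare(INS, "INSERT INTO test.small(id, val) VALUES (?, ?)").await?;
}

pub async fn run(ctx, i) {
    // 123 fits into smallint, so it is converted automatically;
    // a value outside the smallint range would signal a runtime error.
    ctx.execute_prepared(INS, [i, 123]).await
}
```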
The following methods are available:

- x.to_integer() – converts a float to an integer
- x.to_float() – converts an integer to a float
- x.to_string() – converts a float or integer to a string
- x.clamp(min, max) – restricts an integer or float value to the given range
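A short sketch chaining these conversions; the prepared statement is hypothetical and assumed to take an integer and a string:

```rune
const INSERT = "my_insert"; // hypothetical statement taking (n, s)

pub async fn run(ctx, i) {
    let f = latte::normal(i, 100.0, 25.0);     // a float
    let n = f.clamp(0.0, 200.0).to_integer();  // restrict the range, then convert to integer
    let s = n.to_string();                     // and to a string
    ctx.execute_prepared(INSERT, [n, s]).await
}
```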
Text data can be loaded from files or resources with functions in the fs module:

- fs::read_to_string(file_path) – returns file contents as a string
- fs::read_lines(file_path) – reads file lines into a vector of strings
- fs::read_resource_to_string(resource_name) – returns builtin resource contents as a string
- fs::read_resource_lines(resource_name) – returns builtin resource lines as a vector of strings

The resources are embedded in the program binary. You can find them under the resources folder in the source tree.
To reduce the cost of memory allocation, it is best to load resources only once, in the prepare function, and store them in the data field of the context for future use in load and run:
pub async fn prepare(ctx) {
ctx.data.last_names = fs::read_lines("lastnames.txt")?;
// ... prepare queries
}
pub async fn run(ctx, i) {
let random_last_name = latte::hash_select(i, ctx.data.last_names);
// ... use random_last_name in queries
}
Workloads can be parameterized with parameters given on the command line. Use the latte::param!(param_name, default_value) macro to initialize script constants from command line parameters:
const ROW_COUNT = latte::param!("row_count", 1000000);
pub async fn prepare(ctx) {
ctx.load_cycle_count = ROW_COUNT;
}
Then you can set the parameter by using -P:
latte run <workload> -P row_count=200
Errors during execution of a workload script are divided into three classes. Errors can be propagated up the call chain with ?. All errors except Cassandra overload errors terminate the workload.

- ctx.elapsed_secs() – returns the number of seconds elapsed since starting the workload, as a float
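ctx.elapsed_secs() can be used, for example, to vary behaviour over time. This is a hypothetical sketch; the INSERT/SELECT prepared statement names are illustrative:

```rune
const INSERT = "my_insert";
const SELECT = "my_select";

pub async fn run(ctx, i) {
    // Write for the first 60 seconds of the run, then switch to reads.
    if ctx.elapsed_secs() < 60.0 {
        ctx.execute_prepared(INSERT, [i, "warm-up data"]).await
    } else {
        ctx.execute_prepared(SELECT, [latte::hash_range(i, 1000000)]).await
    }
}
```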