Example of how to build (short) k-mers for DNA sequences, translate them into a
k-mer presence-absence bit-vector, and run k-mer counting algorithms.
K-mer counts (a Term Frequency vector) can be saved to disk as a
succinct/compressed BLOB.

Based on the TF vector, the example can compute the top N percent of highly
represented k-mers, which are most likely parts of repeats.

How to get test data:
---------------------

>wget bitmagic.io/data/NC_000913.3.fa.gz
>wget bitmagic.io/data/NC_000001.11.fa.gz

How to build:
--------------

0. Use the project makefile
OR
1. Apply environment variables at the BitMagic project root:
>. ./bmenv.sh
or
>source ./bmenv.sh

2. Build the regular version:
>make rebuild
or the AVX2 version:
>make BMOPTFLAGS=-DBMAVX2OPT rebuild

OR just use (it will use GCC to create all build variants):
>./build_all.sh

How to run:
------------

Help:
>./xsample07 -h

Generate a k-mer fingerprint (4 threads):
>./xsample07_avx2 -kd test10.kd -fa NC_000001.11.fa -k 10 -t -j 4

Generate a k-mer fingerprint with diagnostic checks (slower):
>./xsample07 -kd test10.kd -fa NC_000001.11.fa -k 10 -diag

Generate a k-mer fingerprint and count all k-mers (8 threads):
>./xsample07_avx2 -kd test10.kd -kdc test10.kdc -fa NC_000001.11.fa -k 10 -t -j 8

Generate a k-mer fingerprint, count all k-mers, and compute the frequent k-mer
vector for the top 10% of all k-mers. The k-mer frequency histogram is reported
to a file (hmap.tsv). Build and save the k-mer fingerprint cleaned from
over-represented k-mers (test.kdc):
>./xsample07_avx2 -kd test.kd -kdf test.kdf -kdc test.kdc -fa NC_000001.11.fa -k 16 -t -j 4 -kh hmap.tsv -fpc 10
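
Concept sketch:
---------------

The idea behind the fingerprint, the counts and the frequent k-mer filter can
be illustrated with a few lines of Python. This is a minimal sketch, not the
sample's implementation. The 2-bit nucleotide encoding (A=0, C=1, G=2, T=3),
the function names, the use of a plain Python int as the bit-vector, and the
reading of "top N percent" as the top N percent of distinct k-mers ranked by
count are all assumptions made for illustration. With 2 bits per base, a
fingerprint for k-mers of length k needs 4^k bits, which is why k must stay
short.

```python
from collections import Counter

# Assumed 2-bit code per nucleotide; the sample's real mapping may differ.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq, k):
    """Yield the integer code of each k-mer in seq.

    K-mers containing a letter outside A/C/G/T (for example N) are skipped.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        code = 0
        for ch in seq[i:i + k]:
            base = BASE_CODE.get(ch)
            if base is None:
                break  # ambiguous base: drop this k-mer
            code = (code << 2) | base
        else:
            yield code


def fingerprint(seq, k):
    """Presence-absence bit-vector held in an int: bit c is set if code c occurs."""
    bits = 0
    for code in kmer_codes(seq, k):
        bits |= 1 << code
    return bits


def count_kmers(seq, k):
    """Term Frequency vector: k-mer code -> number of occurrences."""
    return Counter(kmer_codes(seq, k))


def frequent_kmers(counts, percent):
    """Codes of the top `percent` of distinct k-mers by count (at least one)."""
    n = max(1, len(counts) * percent // 100)
    return {code for code, _ in counts.most_common(n)}


def cleaned_fingerprint(bits, frequent):
    """Clear the bits of over-represented k-mers in a fingerprint."""
    for code in frequent:
        bits &= ~(1 << code)
    return bits
```

For the toy sequence "ACACACGT" with k=2, count_kmers() reports AC (code 1)
three times, so frequent_kmers(counts, 25) selects it and
cleaned_fingerprint() drops its bit while keeping CA, CG and GT. A real
genome needs a compressed bit-vector and a parallel counting pass, which is
what the sample itself provides.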