.TH "esl\-mixdchlet" 1 "@EASEL_DATE@" "Easel @EASEL_VERSION@" "Easel Manual"
.SH NAME
esl\-mixdchlet \- fitting mixture Dirichlets to count data
.SH SYNOPSIS
.nf
\fBesl\-mixdchlet fit\fR [\fIoptions\fR] \fIQ K in_countfile out_mixdchlet\fR
  (train a new mixture Dirichlet)
\fBesl\-mixdchlet score\fR [\fIoptions\fR] \fImixdchlet_file counts_file\fR
  (calculate log likelihood of count data, given mixture Dirichlet)
\fBesl\-mixdchlet gen\fR [\fIoptions\fR] \fImixdchlet_file\fR
  (generate synthetic count data from mixture Dirichlet)
\fBesl\-mixdchlet sample\fR [\fIoptions\fR]
  (sample a random mixture Dirichlet for testing)
.fi
.SH DESCRIPTION
.PP
The
.B esl\-mixdchlet
miniapp is for training mixture Dirichlet priors, such as the priors
used in HMMER and Infernal. It has four subcommands:
.B fit,
.B score,
.B gen,
and
.B sample.
The most important subcommand is
.B fit,
which fits a new mixture Dirichlet distribution to a collection of
count vectors (for example, emission or transition count vectors from
Pfam or Rfam training sets).
.PP
Specifically,
.B esl\-mixdchlet fit
fits a new mixture Dirichlet distribution with
.I Q
mixture components to the count vectors (of alphabet size
.IR K )
in input file
.I in_countfile,
and saves the mixture Dirichlet into output file
.I out_mixdchlet.
.PP
The input count vector file
.I in_countfile
contains one count vector of
.I K
fields per line, for any number of lines. Blank lines and lines
starting with # (comments) are ignored. Fields are nonnegative real
values; they do not have to be integers, because they can be weighted
counts.
.PP
The format of a mixture Dirichlet file
.I out_mixdchlet
is as follows. The first line has two fields,
.I K Q,
where
.I K
is the alphabet size and
.I Q
is the number of mixture components. Each of the next
.I Q
lines has
.I K+1
fields. The first field is the mixture coefficient
.I q_k,
followed by
.I K
fields with the Dirichlet alpha[k][a] parameters for this component.
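.PP
The file layout described above can be read with a few lines of code.
The following Python sketch is illustrative only and is not part of
Easel: the function name
.I parse_mixdchlet
is hypothetical, the example values are made up, and skipping blank
lines is an assumption rather than documented behavior.
.nf
```python
# Hypothetical helper, not part of Easel: parse the mixture Dirichlet
# text layout described above (first line "K Q", then Q lines of K+1 fields).

def parse_mixdchlet(text):
    """Return (K, Q, q, alpha) parsed from mixture Dirichlet file text."""
    # Assumption: blank lines carry no data and can be skipped.
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    K, Q = int(rows[0][0]), int(rows[0][1])
    body = rows[1:]
    if len(body) != Q:
        raise ValueError("expected %d component lines, found %d" % (Q, len(body)))
    q, alpha = [], []
    for fields in body:
        if len(fields) != K + 1:
            raise ValueError("expected %d fields per component line" % (K + 1))
        q.append(float(fields[0]))                    # mixture coefficient
        alpha.append([float(x) for x in fields[1:]])  # K Dirichlet parameters
    return K, Q, q, alpha


# Made-up example with K=4 (alphabet size) and Q=2 (components).
example = """4 2
0.7 1.0 2.0 0.5 0.5
0.3 0.1 0.1 3.0 4.0
"""
K, Q, q, alpha = parse_mixdchlet(example)
print(K, Q, q)
```
.fi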
.PP
The
.B esl\-mixdchlet score
subcommand calculates the log likelihood of the count vector data in
.I counts_file,
given the mixture Dirichlet in
.I mixdchlet_file.
.PP
The
.B esl\-mixdchlet gen
subcommand generates synthetic count data, given a mixture Dirichlet.
.PP
The
.B esl\-mixdchlet sample
subcommand creates a random mixture Dirichlet distribution and writes
it to standard output.
.SH OPTIONS FOR FIT SUBCOMMAND
.TP
.B \-h
Print brief help specific to the
.B fit
subcommand.
.TP
.BI \-s " <n>"
Set random number generator seed to nonnegative integer
.I <n>.
Default is 0, which means to use a quasirandom arbitrary seed.
Values >0 give reproducible results.
.SH OPTIONS FOR SCORE SUBCOMMAND
.TP
.B \-h
Print brief help specific to the
.B score
subcommand.
.SH OPTIONS FOR GEN SUBCOMMAND
.TP
.B \-h
Print brief help specific to the
.B gen
subcommand.
.TP
.BI \-s " <n>"
Set random number generator seed to nonnegative integer
.I <n>.
Default is 0, which means to use a quasirandom arbitrary seed.
Values >0 give reproducible results.
.TP
.BI \-M " <n>"
Generate
.I <n>
counts per sampled vector. (Default 100.)
.TP
.BI \-N " <n>"
Generate
.I <n>
count vectors. (Default 1000.)
.SH OPTIONS FOR SAMPLE SUBCOMMAND
.TP
.B \-h
Print brief help specific to the
.B sample
subcommand.
.TP
.BI \-s " <n>"
Set random number generator seed to nonnegative integer
.I <n>.
Default is 0, which means to use a quasirandom arbitrary seed.
Values >0 give reproducible results.
.TP
.BI \-K " <n>"
Set the alphabet size to
.I <n>.
(Default is 20, for amino acids.)
.TP
.BI \-Q " <n>"
Set the number of mixture components to
.I <n>.
(Default is 9.)
.SH SEE ALSO
.nf
@EASEL_URL@
.fi
.SH COPYRIGHT
.nf
@EASEL_COPYRIGHT@
@EASEL_LICENSE@
.fi
.SH AUTHOR
.nf
http://eddylab.org
.fi