.TH "esl\-alipid" 1  "@EASEL_DATE@" "Easel @EASEL_VERSION@" "Easel Manual"

.SH NAME
esl\-alipid \- calculate pairwise percent identities for all sequence pairs in an MSA

.SH SYNOPSIS
.B esl\-alipid
[\fIoptions\fR]
.I msafile


.SH DESCRIPTION

.PP
.B esl\-alipid
calculates the pairwise percent identity of each sequence pair in
the MSA(s) in
.IR msafile .
For each sequence pair, it outputs a line of 
.I <seqname1> <seqname2> <%id> <nid> <denomid> <%match> <nmatch> <denommatch>
where 
.I <%id> 
is the percent identity,
.I <nid>
is the number of identical aligned pairs,
and 
.I <denomid> 
is the denominator used for the calculation: the
shorter of the two (unaligned) sequence lengths.
The %identity is defined as 100*nid/denomid.

.PP
The last three fields are the pairwise percent match calculation, in
the pair\-HMM sense of a "match state" that aligns two residues XY
(whether identical or different) versus delete \-Y and insert X\- states
that have a residue in one sequence and a gap character in the other.
That is, the %match is the percentage of the alignment that
consists of aligned residues as opposed to insertions or deletions in
either sequence. The %match is defined as 100*nmatch/denommatch.
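
.PP
As an illustrative worked example (hypothetical sequences, not program
output), consider this aligned pair:
.nf
    seq1  ACDEFGHIKL
    seq2  \-\-DEYG\-IK\-
.fi
Five columns are identical (D, E, G, I, K), so nid = 5. The unaligned
lengths are 10 and 6, so denomid = MIN(10,6) = 6 and
%id = 100*5/6 = 83.33.
Six columns align two residues (the five identities plus F/Y), so
nmatch = 6. Assuming denommatch counts every column that has a residue
in at least one of the two sequences, which the description above
suggests but does not state outright, denommatch = 10 and
%match = 100*6/10 = 60.00.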

.PP
There are many ways that one could choose a denominator for these
percentages. We always define %id using MIN(len1,len2) as the
denominator. In multiple sequence alignments, you will often have short
sequence fragments that overlap very little, or not at all.
Other ways of calculating %identity, such as ignoring columns
with gaps (100 * n_identities / (n_identities + n_mismatches)), or
dividing by the total alignment length (100 * n_identities / ali_len),
are not robust to fragments with little overlap or to long indels,
because they can give spuriously high or low %id's.
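
.PP
To illustrate the difference (a hypothetical pair, not program
output), take two fragments that overlap in only two columns:
.nf
    seq1  ACDEFGHI\-\-\-\-\-\-\-\-
    seq2  \-\-\-\-\-\-HIKLMNPQRS
.fi
Here n_identities = 2, n_mismatches = 0, ali_len = 16, and the
unaligned lengths are 8 and 10.
Ignoring gapped columns gives 100*2/(2+0) = 100%, dividing by the
alignment length gives 100*2/16 = 12.5%, and the MIN(len1,len2)
denominator gives 100*2/8 = 25%.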

.PP
For both %identity and %match calculations, columns with a gap
character in both sequences (\-\-) aren't counted. Also, if the denominator
is zero (which can happen when two sequence fragments in the same MSA
don't overlap each other), the resulting % is defined to be 0.

.PP
If
.I msafile 
is \- (a single dash), alignment input is read from 
stdin.

.PP
Only canonical residues are counted toward
.I <nid> 
and 
.IR <n> .
Degenerate residue codes are not counted.

.SH OPTIONS

.TP
.B \-h 
Print brief help; includes version number and summary of
all options, including expert options.

.TP
.BI \-\-informat " <s>"
Assert that input
.I msafile
is in alignment format
.IR <s> ,
bypassing format autodetection.
Common choices for 
.I <s> 
include:
.BR stockholm , 
.BR a2m ,
.BR afa ,
.BR psiblast ,
.BR clustal ,
.BR phylip .
For more information, and for codes for some less common formats,
see the main documentation.
The string
.I <s>
is case-insensitive (\fBa2m\fR or \fBA2M\fR both work).

.TP
.B \-\-amino
Assert that the 
.I msafile 
contains protein sequences. 

.TP 
.B \-\-dna
Assert that the 
.I msafile 
contains DNA sequences. 

.TP 
.B \-\-rna
Assert that the 
.I msafile 
contains RNA sequences. 
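
.SH EXAMPLES

.PP
A minimal usage sketch, using only the options documented above. The
file name
.I my.sto
is hypothetical, and the commands assume it holds a protein alignment
in Stockholm format. The first command reads the file directly; the
second reads the same alignment from stdin.
.nf
    esl\-alipid \-\-informat stockholm \-\-amino my.sto
    cat my.sto | esl\-alipid \-\-informat stockholm \-
.fi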



.SH SEE ALSO

.nf
@EASEL_URL@
.fi

.SH COPYRIGHT

.nf 
@EASEL_COPYRIGHT@
@EASEL_LICENSE@
.fi 

.SH AUTHOR

.nf
http://eddylab.org
.fi