Comparative sequence analysis for RNA structure determination usually begins with an alignment of related sequences. In our alignment procedure, the closest relatives are aligned first on the basis of primary structure similarity; each group of aligned sequences is then treated collectively and aligned against other groups. Next, sets of conserved nucleotides are identified and used as anchors for aligning the more variable regions. Finally, where little or no primary structure similarity exists, common secondary structural elements are used as additional markers.
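The step of identifying conserved nucleotides to anchor the variable regions can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned as equal-length strings with `-` for gaps. It treats a column as conserved when its most frequent nucleotide occurs in at least a chosen fraction of all sequences. The function name `conserved_columns` and the `min_fraction` threshold are hypothetical and are not part of the procedure described above. The progressive, group-against-group alignment itself is not shown.

```python
from collections import Counter


def conserved_columns(alignment: list[str], min_fraction: float = 1.0, gap: str = "-") -> list[int]:
    """Return indices of alignment columns whose most frequent non-gap
    nucleotide occurs in at least `min_fraction` of all sequences.
    (Hypothetical helper; the threshold is an illustrative assumption.)"""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    anchors = []
    for col in range(length):
        # Count nucleotides in this column, ignoring gap characters.
        counts = Counter(seq[col].upper() for seq in alignment if seq[col] != gap)
        if not counts:
            continue  # column contains only gaps
        _, top = counts.most_common(1)[0]
        # Gapped sequences still count toward n, so gaps lower conservation.
        if top / n >= min_fraction:
            anchors.append(col)
    return anchors


aln = ["GGAUCC-A", "GGACCCUA", "GGAUCCUA"]
print(conserved_columns(aln))  # [0, 1, 2, 4, 5, 7]
```

With a looser threshold such as `min_fraction=0.6`, columns 3 and 6 would also qualify, which shows how the choice of threshold trades anchor density against reliability.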
In deriving a secondary structure, we distinguish between base pairs that are supported by covariances and those that are merely not contradicted. (A covariance is observed when a base pair in one organism differs at both positions from the equivalent base pair in another organism.) If both pairs are Watson-Crick pairs (G-C, A-U), the covariance is a compensating base change (CBC). Covariances and CBCs support the existence of a base pair because, during evolution, a random single mutation that introduced an unstable pairing would generally not have been compensated for by a second mutation restoring its stability unless the pairing was required. Such an observation is positive evidence, and the more CBCs, the stronger the evidence. Negative evidence is a mismatch, which we define as a pair that is neither a Watson-Crick pair nor a G-U pair. Notably, sequence conservation provides neither positive nor negative evidence.
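The definitions of covariance, CBC, and mismatch translate directly into a few predicates. This Python sketch is illustrative only. The names `classify_pair`, `is_covariance`, and `is_cbc` are hypothetical. Pairs are given as (5' base, 3' base) tuples, and the sketch does not treat gaps specially: anything that is neither a Watson-Crick nor a G-U pair is classified as a mismatch.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
GU_WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(five: str, three: str) -> str:
    """Classify a pair as 'watson-crick', 'gu', or 'mismatch'."""
    pair = (five.upper(), three.upper())
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in GU_WOBBLE:
        return "gu"
    return "mismatch"  # neither Watson-Crick nor G-U: negative evidence


def is_covariance(pair_a: tuple[str, str], pair_b: tuple[str, str]) -> bool:
    """True if the two pairs differ at both positions."""
    return (pair_a[0].upper() != pair_b[0].upper()
            and pair_a[1].upper() != pair_b[1].upper())


def is_cbc(pair_a: tuple[str, str], pair_b: tuple[str, str]) -> bool:
    """A compensating base change: a covariance between two Watson-Crick pairs."""
    return (is_covariance(pair_a, pair_b)
            and classify_pair(*pair_a) == "watson-crick"
            and classify_pair(*pair_b) == "watson-crick")


print(is_cbc(("G", "C"), ("A", "U")))   # True: both positions change, both Watson-Crick
print(is_cbc(("G", "C"), ("G", "U")))   # False: only the 3' base changes
print(classify_pair("A", "G"))          # mismatch
```

Note that a change such as G-U to C-G is a covariance under these definitions but not a CBC, because G-U is not a Watson-Crick pair.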
For each base pair, we estimate positive and negative evidence by counting the number of CBCs and mismatches. First, the most conserved base pair at the given pair of alignment positions is identified; the remaining pairs are then counted as CBCs wherever they covary with it. Our guideline is to consider a base pair supported if there is at least twice as much positive evidence as negative evidence. When there is less, we generally prefer not to include the base pair. However, when a base pair is supported in a particular phylogenetic group and disproven in other groups, we include it as specific to that group.
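Here is a minimal Python sketch of the counting step and the 2:1 guideline. Several details are assumptions rather than part of the method described above:

- The hypothetical function `evaluate_pair` takes equal-length aligned strings and two column indices.
- It skips sequences with a gap at either position.
- It counts CBCs per sequence rather than per distinct pair type.
- It requires at least one CBC before calling a pair supported, so that a pair with no evidence either way is only "not contradicted".

Group-specific pairs are not handled.

```python
from collections import Counter

WATSON_CRICK = {"GC", "CG", "AU", "UA"}
GU = {"GU", "UG"}


def evaluate_pair(alignment: list[str], i: int, j: int,
                  gap: str = "-", ratio: float = 2.0) -> tuple[int, int, bool]:
    """Count CBCs (positive evidence) and mismatches (negative evidence) for a
    proposed pairing between columns i and j, then apply the ratio guideline."""
    # Assumption: sequences with a gap at either position are ignored.
    pairs = [seq[i].upper() + seq[j].upper() for seq in alignment
             if seq[i] != gap and seq[j] != gap]
    if not pairs:
        return 0, 0, False
    # The most frequent pair serves as the reference ("most conserved") pair.
    reference, _ = Counter(pairs).most_common(1)[0]
    # A CBC: differs from the reference at both positions, both pairs Watson-Crick.
    cbcs = sum(1 for p in pairs
               if p[0] != reference[0] and p[1] != reference[1]
               and p in WATSON_CRICK and reference in WATSON_CRICK)
    # A mismatch: neither Watson-Crick nor G-U.
    mismatches = sum(1 for p in pairs if p not in WATSON_CRICK and p not in GU)
    # Assumption: at least one CBC is required for "supported".
    supported = cbcs > 0 and cbcs >= ratio * mismatches
    return cbcs, mismatches, supported


aln = ["GAAC", "AAAU", "CAAG", "GAAC", "GAAA"]
print(evaluate_pair(aln, 0, 3))  # (2, 1, True)
```

In the example, G-C is the reference pair. A-U and C-G each contribute one CBC, and G-A contributes one mismatch, so the 2:1 guideline is just met. One more mismatching sequence would tip the result to unsupported.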
Secondary structure models can be derived directly from the alignment. (RNA alignments can also be used to prove or disprove tertiary interactions, as well as interactions between two or more RNA molecules.) Typically, supported base pairs are juxtaposed and connected with a symbol (line, circle, dot, etc.) that indicates the nature of the pairing. When there is negative evidence for a pair, the bases are spaced apart with no symbol between them.
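The display convention can be sketched as a small text renderer in Python. The symbol set is a hypothetical choice for illustration, not the notation of any particular published diagram:

- `-` for Watson-Crick pairs
- `.` for G-U pairs
- `o` for other supported pairs

The `supported` flag is assumed to come from an evidence evaluation such as the 2:1 guideline. The names `render_pair` and `render_helix` are likewise hypothetical.

```python
WATSON_CRICK = {"GC", "CG", "AU", "UA"}
GU = {"GU", "UG"}

# Hypothetical symbol set, chosen for illustration only.
SYMBOLS = {"watson-crick": "-", "gu": ".", "other": "o"}


def render_pair(five: str, three: str, supported: bool) -> str:
    """Render one pair as three characters: 5' base, connector, 3' base.
    Supported pairs are joined by a symbol reflecting the pair type; pairs
    with negative evidence are spaced apart with no connecting symbol."""
    pair = (five + three).upper()
    if not supported:
        return f"{pair[0]} {pair[1]}"
    if pair in WATSON_CRICK:
        symbol = SYMBOLS["watson-crick"]
    elif pair in GU:
        symbol = SYMBOLS["gu"]
    else:
        symbol = SYMBOLS["other"]
    return f"{pair[0]}{symbol}{pair[1]}"


def render_helix(pairs: list[tuple[str, str, bool]]) -> str:
    """Render a stack of (5' base, 3' base, supported) tuples, one pair per line."""
    return "\n".join(render_pair(a, b, s) for a, b, s in pairs)


print(render_helix([("G", "C", True), ("G", "U", True), ("A", "G", False)]))
```

Keeping every rendered pair three characters wide keeps the two strands of a helix in vertical alignment, so an unsupported pair shows up as a gap in the column of connectors.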