Supplementary Materials

Supplementary Figure 1

Supplementary Figure 1. Comparison of the SCA algorithm's conservation score (ΔG) with sequence entropy (H). A: Correlation for the aminotran_3 Pfam family. The Spearman correlation coefficient (r) is -0.93. B: Box plot showing the range of Spearman correlation coefficients for all 138 Pfam families used in this study. The line in the middle of the box is the median. The edges of the box are the 25th and 75th percentiles. The whiskers are the 10th and 90th percentiles.
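As an illustration of the comparison in panel A, the sketch below computes a per-column Shannon entropy and a Spearman rank correlation using only the Python standard library. It is not the code used in this study: the function names are hypothetical, the natural logarithm is an assumed choice of base, and the SCA ΔG score is not implemented here and would have to be supplied as a list of per-column values.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log, an assumption) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Given a list of alignment columns and a matching list of per-column conservation scores, `spearman(scores, [column_entropy(c) for c in columns])` would give one coefficient per family, which could then be summarized across families as in panel B.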

Supplementary Figures 2-3

Shown are the results of running the ELSC and SCA algorithms over 138 Pfam alignments. A PDB file was chosen for each Pfam alignment as described in the methods section of the main paper. The x-axis of each trace is scaled from the minimum covariance score to the maximum covariance score. The y-axis is scaled from the minimum pair distance to the maximum pair distance. The vertical red lines mark the 75 top-scoring pairs of residues. The horizontal red line indicates the 50th percentile of all pair distances. The orange line indicates 8 Å.

Supplementary Figure 2 (ELSC) and Supplementary Figure 3 (SCA)

Supplementary Figure 4


Supplementary Figure 4. The accuracy of predicting residue contacts (Cβ-Cβ within 8 Å) as a function of the number of predictions each algorithm was asked to make for all 138 Pfam families. ELSC is in black, SCA is in red, and a random pairing algorithm is in blue.
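The accuracy measure in this caption can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, each pair is assumed to be a (covariance score, Cβ-Cβ distance in Å) tuple, and a distance exactly equal to the cutoff is assumed to count as a contact.

```python
def contact_accuracy(pairs, n_predictions, cutoff=8.0):
    """Fraction of the n top-scoring pairs whose distance is within the cutoff.

    pairs: list of (covariance_score, cb_cb_distance_in_angstroms) tuples.
    """
    # Highest covariance score first; the top n are the "predictions".
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    top = ranked[:n_predictions]
    if not top:
        raise ValueError("need at least one prediction")
    hits = sum(1 for _, distance in top if distance <= cutoff)
    return hits / len(top)


def accuracy_curve(pairs, max_predictions):
    """Accuracy for 1, 2, ..., max_predictions predictions."""
    return [contact_accuracy(pairs, n) for n in range(1, max_predictions + 1)]
```

Under these assumptions, a curve like the ones in the figure would come from calling `accuracy_curve` on the pooled or per-family pairs for each algorithm; how the 138 families were combined is not specified here.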

Supplementary Figures 5-6

Shown are the results of running the ELSC and SCA algorithms over 138 Pfam alignments. The x-axis of each trace ranges from the 0th to the 100th score percentile. The y-axis of each trace is the pair distance percentile corresponding to Cβ-Cβ distances in the chosen crystal structure. The red line in each trace represents the 50th percentile of pair distance. Each point on each trace is the average of the predictions made for 20 pairs of columns.
Supplementary Figure 5 (ELSC) and Supplementary Figure 6 (SCA). Y-axis label: Pair Distance Average Percentile.
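One way the traces described above could be built is sketched below. The details are assumptions made for illustration, not the procedure used in the study: the percentile of a value is taken as the share of values less than or equal to it, pairs are grouped 20 at a time in order of increasing score, and the function names are hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Percentile (0-100) of each value: share of values <= that value."""
    ordered = sorted(values)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, v) / n for v in values]


def trace_points(pairs, group_size=20):
    """Average score and distance percentiles over groups of consecutive pairs.

    pairs: list of (covariance_score, cb_cb_distance) tuples.
    Returns one (mean score percentile, mean distance percentile) per group.
    """
    scores = [s for s, _ in pairs]
    distances = [d for _, d in pairs]
    score_pct = percentile_ranks(scores)
    dist_pct = percentile_ranks(distances)
    # Walk through the pairs from lowest to highest score.
    order = sorted(range(len(pairs)), key=lambda i: scores[i])
    points = []
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        x = sum(score_pct[i] for i in group) / len(group)
        y = sum(dist_pct[i] for i in group) / len(group)
        points.append((x, y))
    return points
```

With this convention, an algorithm that carries no structural signal would produce points scattered around the 50th distance percentile, while informative high scores would pull the rightmost points below it.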

Supplementary Figure 7

Supplementary Figure 7. Comparison of the ELSC and SCA algorithms for multiple protein families. This figure shows the average of the 138 panels shown in Supplementary Figures 5-6 as histograms, with each point representing 0.025 percentile. The rightmost point in the ELSC panel, for example, is the average pair distance percentile of all the pairs of residues that are within the top 0.025 percentile for each of the 138 protein families. The details of the normalization scheme have been previously reported (Fodor and Aldrich, 2004, submitted).

Pfam families used in this study

The following 138 Pfam families were used in this study. The percentage requirement for each family is the percentage conservation required by the SCA algorithm for inclusion (see methods in the main paper). For example, the adh_zinc family has a percentage requirement of 0.16. This means that any column in the adh_zinc family in which no single residue made up 16% or more of the column was excluded as an i column from our study. So, if a column in adh_zinc had 17% alanine, it was included in our study. But if the most conserved residue was alanine and it was present in only 15% of the sequences in that column, then that column was excluded. The order listed below is the order of the panels of Supplementary Figures 2-6 above, going from left to right and top to bottom.

Pfam Id           Percentage requirement
14-3-3            0.44
2-Hacid_DH_C      0.31
2-oxoacid_dh      0.39
aakinase          0.25
abhydrolase       0.12
aconitase         0.36
Acyl-CoA_dh       0.23
Acyl_transf       0.26
Adenylsucc_synt   0.58
adh_short         0.06
adh_zinc          0.16
AIRS_C            0.2
ALAD              0.61
aldedh            0.1
aldo_ket_red      0.18
alk_phosphatase   0.47
alpha-amylase     0.1
Amidohydro_1      0.17
Amino_oxidase     0.22
aminotran_1_2     0.11
aminotran_3       0.24
aminotran_5       0.25
An_peroxidase     0.5
ANF_receptor      0.15
asp               0.18
ATP-synt          0.45
beta-lactamase    0.23
catalase          0.42
cellulase         0.26
Chal_stil_syntC   0.33
chorismate_bind   0.34
CLP_protease      0.5
CN_hydrolase      0.26
COesterase        0.18
cpn60_TCP1        0.16
CPSase_L_D2       0.2
cyclin            0.17
Cys_Met_Meta_PP   0.37
DAO               0.25
DEAD              0.1
DegT_DnrJ_EryC1   0.4
DNA_pol_A         0.49
DNA_pol_B_exo     0.26
E1-E2_ATPase      0.15
E1_dehydrog       0.35
ECH               0.2
enolase           0.42
Epimerase         0.21
Exo_endo_phos     0.11
fer4_NifH         0.3
FGGY              0.32
FGGY_C            0.37
G6PD_C            0.45
GATase            0.19
GHMP_kinases      0.23
gln-synt          0.22
globin            0.1
Glu_synth_NTN     0.58
Glu_synthase      0.61
Glyco_hydro_1     0.22
Glyco_hydro_17    0.4
Glyco_hydro_18    0.16
Glyco_hydro_19    0.49
Glyco_hydro_28    0.31
Glyco_hydro_9     0.3
GMC_oxred         0.4
gpdh              0.26
gpdh_C            0.41
Gram-ve_porins    0.16
HECT              0.41
Hemagglutinin     0.16
Hist_deacetyl     0.39
HMG-CoA_red       0.53
hormone_rec       0.1
Hydrolase         0.06
isodh             0.31
ketoacyl-synt     0.17
ketoacyl-synt_C   0.22
kinesin           0.23
lactamase_B       0.11
ldh_C             0.26
Lipoprotein_6     0.51
lipoxygenase      0.46
lyase_1           0.28
MCPsignal         0.23
Metallophos       0.1
MIP               0.23
Mur_ligase        0.22
NTP_transferase   0.15
oxidored_FMN      0.4
p450              0.05
PALP              0.18
PEP-utilizers_C   0.47
Peptidase_C1      0.16
Peptidase_M20     0.2
Peptidase_M24     0.24
Peptidase_S8      0.13
Peripla_BP_like   0.19
peroxidase        0.2
PFK               0.39
pfkB              0.2
PGI               0.46
PGK               0.5
Phage_integrase   0.15
phosphorylase     0.65
PK                0.38
polyprenyl_synt   0.23
PP2C              0.23
Pribosyltran      0.12
pro_isomerase     0.3
proteasome        0.21
pyr_redox         0.12
ras               0.1
recA              0.41
response_reg      0.06
ribonuc_red_lgC   0.4
SBP_bac_1         0.16
serine_carbpept   0.38
serpin            0.17
SHMT              0.45
SRP54             0.42
thiolase          0.28
thymidylat_synt   0.53
TIM               0.33
TonB_dep_Rec      0.18
Topoisom_bac      0.38
TPP_enzymes_N     0.31
transket_pyr      0.25
Transpeptidase    0.22
tRNA-synt_1       0.24
tRNA-synt_2b      0.21
Tropomyosin       0.55
trypsin           0.06
tubulin           0.2
tubulin_C         0.16
UvrD-helicase     0.41
vwa               0.16
Y_phosphatase     0.15
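
The percentage requirement defined before the table amounts to a simple per-column test, sketched below for illustration. The function names are hypothetical, and the sketch assumes each column is given as a string of one-letter residue codes with gap characters already removed; how gaps were handled in the study is described in the main paper, not here.

```python
from collections import Counter


def most_common_fraction(column):
    """Fraction of the column occupied by its most frequent residue."""
    counts = Counter(column)
    return counts.most_common(1)[0][1] / len(column)


def passes_requirement(column, requirement):
    """True if the most frequent residue makes up `requirement` or more."""
    return most_common_fraction(column) >= requirement


# adh_zinc has a percentage requirement of 0.16.
# A 100-sequence column with 17% alanine is kept as an i column...
kept = "A" * 17 + "C" * 16 + "D" * 16 + "E" * 16 + "F" * 16 + "G" * 16 + "H" * 3
# ...while one whose most common residue reaches only 15% is excluded.
dropped = "A" * 15 + "C" * 15 + "D" * 15 + "E" * 15 + "F" * 15 + "G" * 15 + "H" * 10

print(passes_requirement(kept, 0.16), passes_requirement(dropped, 0.16))
```

The two toy columns mirror the 17% and 15% alanine cases described in the text for adh_zinc.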