Inadequate reference datasets biased towards short non-epitopes confound B-cell epitope prediction

Rahman, K.S.; Chowdhury, E.U.; Sachse, Konrad; Kaltenboeck, B.

X-ray crystallography has shown that an antibody paratope typically binds 15-22 amino acids (aa) of an epitope, of which 2-5 randomly distributed aa contribute most of the binding energy. In contrast, researchers typically choose for B-cell epitope mapping short peptide antigens in antibody binding assays. Furthermore, short 6-11aa epitopes, and in particular non-epitopes, are overrepresented in published B-cell epitope datasets that are commonly used for development of B-cell epitope prediction approaches from protein antigen sequences. We hypothesized that such suboptimal-length peptides result in weak antibody binding and cause false-negative results. We tested the influence of peptide antigen length on antibody binding by analyzing data on more than 900 peptides used for B-cell epitope mapping of immunodominant proteins of Chlamydia spp. We demonstrate that short 7-12aa peptides of B-cell epitopes bind antibodies poorly, thus epitope mapping with short peptide antigens falsely classifies many B-cell epitopes as non-epitopes. We also show in published datasets of confirmed epitopes and non-epitopes a direct correlation between length of peptide antigens and antibody binding. Elimination of short, ≤11aa epitope/non-epitope sequences improved datasets for evaluation of in silico B-cell epitope prediction. Achieving up to 86% accuracy, protein disorder tendency is the best indicator of B-cell epitope regions for chlamydial and published datasets. For B-cell epitope prediction, the most effective approach is plotting disorder of protein sequences with the IUPred-L scale, followed by antibody reactivity testing of 16-30aa peptides from peak regions. This strategy overcomes the well-known inaccuracy of in silico B-cell epitope prediction from primary protein sequences.
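
The strategy described in the abstract (plot disorder, take peak regions, test 16-30aa peptides) and the removal of ≤11aa sequences from reference datasets could be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: it assumes per-residue disorder scores have already been computed elsewhere (for example with IUPred in its long-disorder mode), and the function names, the 0.5 threshold, and the choice to centre each peptide on the highest-scoring residue are all hypothetical.

```python
from typing import List, Sequence, Tuple


def disorder_peak_peptides(
    sequence: str,
    scores: Sequence[float],
    threshold: float = 0.5,   # assumed cut-off, not taken from the abstract
    min_len: int = 16,        # peptide length range named in the abstract
    max_len: int = 30,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, peptide) for each contiguous high-disorder region.

    `scores` holds one precomputed disorder value per residue.
    Coordinates are 0-based, and `end` is exclusive.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")

    n = len(sequence)
    peptides = []
    i = 0
    while i < n:
        if scores[i] < threshold:
            i += 1
            continue
        # Extend the run [i, j) while scores stay at or above the threshold.
        j = i
        while j < n and scores[j] >= threshold:
            j += 1
        # Clamp the peptide length to 16-30 aa (and to the protein length).
        length = min(max(j - i, min_len), max_len, n)
        # Assumption: centre the peptide on the highest-scoring residue.
        peak = max(range(i, j), key=lambda k: scores[k])
        start = max(0, min(peak - length // 2, n - length))
        peptides.append((start, start + length, sequence[start:start + length]))
        i = j
    return peptides


def drop_short_peptides(peptides: Sequence[str], max_short: int = 11) -> List[str]:
    """Keep only dataset entries longer than `max_short` residues."""
    return [p for p in peptides if len(p) > max_short]
```

The returned peptides are only candidates; per the abstract, they would still need to be confirmed by antibody reactivity testing.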



Citation style:

Rahman, K.S. / Chowdhury, E.U. / Sachse, Konrad / et al: Inadequate reference datasets biased towards short non-epitopes confound B-cell epitope prediction. 2016.


Use and reproduction:
All rights reserved