AlexandrusPS: a user-friendly pipeline for the automated detection of orthologous gene clusters and subsequent positive selection analysis
Detecting adaptive selection with a systems approach that considers all protein-coding genes allows the identification of mechanisms and pathways that enabled adaptation to different environments. Currently available programs for estimating positive selection signals fall into two groups. Some are easy to apply but can analyze only one gene family at a time, which restricts systems-level analysis; others can handle larger sets of gene families but require considerable prerequisite data, such as orthology associations, codon alignments, phylogenetic trees and proper configuration files. Preparing these inputs requires extensive computational expertise, restricting this endeavor to specialists. Here, we introduce AlexandrusPS, a high-throughput pipeline that overcomes technical challenges when conducting transcriptome-wide positive selection analyses on large sets of nucleotide and protein sequences. The pipeline streamlines (1) executing an accurate orthology prediction as a precondition for positive selection analysis, (2) preparing and organizing configuration files for CodeML, (3) performing positive selection analysis using CodeML and (4) generating an output that is easy to interpret, including all maximum likelihood and log likelihood test results. The only inputs needed from the user are the CDS and peptide FASTA files for the proteins of interest. The pipeline is provided as a Docker image and requires no program or module installation, which allows it to be run in any computing environment that supports Docker. AlexandrusPS and its documentation are available via GitHub (https://github.com/alejocn5/AlexandrusPS).
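To make the final reporting step more concrete, the following is a minimal sketch of how a log likelihood comparison between two codon models could be evaluated. It assumes the comparison is a standard likelihood ratio test between a null model and an alternative model that allows positive selection, and it is not taken from the AlexandrusPS code. The function name `likelihood_ratio_test` and the log-likelihood values are hypothetical and serve only as illustration.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt, df=2):
    """Return (statistic, p_value) for a likelihood ratio test.

    lnl_null and lnl_alt are the maximized log-likelihoods of the null and
    alternative models. Only df = 1 or df = 2 are handled, because the
    chi-squared survival function has a simple closed form in those cases.
    """
    # Test statistic: twice the log-likelihood difference, clamped at zero
    # in case the alternative model fit slightly worse numerically.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch supports only df = 1 or df = 2")
    return stat, p_value


# Hypothetical log-likelihoods for one orthologous gene cluster.
stat, p_value = likelihood_ratio_test(lnl_null=-2350.4, lnl_alt=-2342.1)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.2e}")
```

In a transcriptome-wide analysis, a test of this kind would be repeated for every gene cluster, so the resulting p-values would normally also need a multiple-testing correction.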