R package: SRS-Scaling with Ranked Subsampling

Beule, Lukas; Heidrich, Vitor; Karlovsky, Petr

Research data, Sun., 27 Mar. 2022, CC BY-SA 4.0

Published

Analysis of species count data in ecology often requires normalization to an identical sample size. Rarefying (random subsampling without replacement), a popular normalization method, has been widely criticized for its poor reproducibility and potential distortion of the community structure. In the context of microbiome count data, researchers have explicitly advised against rarefying. An alternative to rarefying is scaling with ranked subsampling (SRS).

SRS consists of two steps. In the first step, the total counts for all OTUs (operational taxonomic units) or species in each sample are divided by a scaling factor chosen so that the sum of the scaled counts Cscaled equals Cmin, the total count to which every sample is normalized. In the second step, the non-integer Cscaled values are converted into integers by an algorithm that we dub ranked subsampling. The Cscaled value for each OTU or species is split into the integer part Cint (Cint = floor(Cscaled)) and the fractional part Cfrac (Cfrac = Cscaled - Cint). Since the sum of Cint is smaller than or equal to Cmin, an additional delta C = Cmin - sum(Cint) counts have to be added to the library to reach the total count of Cmin.

This is achieved as follows. OTUs are ranked in descending order of their Cfrac values. Beginning with the OTU of the highest rank, a single count per OTU is added to the normalized library until the total number of added counts reaches delta C and the sum of all counts in the normalized library equals Cmin. When the lowest Cfrac involved in picking delta C counts is shared by several OTUs, the OTUs that receive a single count are selected in the order of their Cint values. This selection minimizes the effect of normalization on the relative frequencies of OTUs. OTUs with identical Cfrac as well as identical Cint are sampled randomly without replacement. See Beule & Karlovsky (2020), doi:10.7717/peerj.9593.
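The two steps described above can be sketched for a single sample in Python. This is an illustrative sketch only, not the code of the SRS R package: the function name `srs_sample` and its parameters are hypothetical, and it assumes that ties in Cfrac are resolved in favour of OTUs with larger Cint. To avoid floating-point rounding, it compares integer remainders instead of the fractions themselves, which gives the same ranking.

```python
import random


def srs_sample(counts, c_min, rng=None):
    """Normalize one sample's OTU counts to a total of exactly c_min.

    Hypothetical helper illustrating the two SRS steps; not the SRS package API.
    """
    rng = rng or random.Random()
    total = sum(counts)
    if c_min <= 0 or total < c_min:
        raise ValueError("c_min must be positive and not exceed the sample total")

    # Step 1 (scaling): Cscaled = count * c_min / total.
    # divmod yields Cint = floor(Cscaled) and a remainder with Cfrac = remainder / total.
    c_int, remainder = [], []
    for c in counts:
        q, r = divmod(c * c_min, total)
        c_int.append(q)
        remainder.append(r)

    # Number of counts still missing to reach c_min (delta C).
    delta_c = c_min - sum(c_int)

    # Step 2 (ranked subsampling): rank OTUs by descending Cfrac.
    # Assumption: ties in Cfrac go to the larger Cint first; OTUs that tie
    # on both are ordered by a random key (random choice without replacement).
    order = sorted(
        range(len(counts)),
        key=lambda k: (-remainder[k], -c_int[k], rng.random()),
    )

    normalized = list(c_int)
    for k in order[:delta_c]:
        normalized[k] += 1  # one additional count per selected OTU
    return normalized
```

For example, `srs_sample([50, 30, 15, 5], 10)` scales the counts to 5, 3, 1.5 and 0.5. The integer parts sum to 9, so delta C is 1. The last two OTUs share Cfrac = 0.5, and under the assumed tie rule the one with the larger Cint receives the extra count, giving `[5, 3, 2, 0]`.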

Classification

Referenced: ‘SRS’ R Package and ‘q2-srs’ QIIME 2 Plugin: Normalization of Microbiome Data Using Scaling with Ranked Subsampling (SRS) (2021)
Publication date: 27 March 2022
Language: English
Resource type: Software, Multimedia
Publisher: The Comprehensive R Archive Network (CRAN)
Keywords: microbiome count data; data normalisation; scaling with ranked subsampling (SRS); rarefying
DDC subject group (DNB): 004 Computer science
Link URL: https://CRAN.R-project.org/package=SRS
Institution: Julius Kühn-Institut, Institute for Ecological Chemistry, Plant Analysis and Stored Product Protection