Bacterial species classification and prediction with MSclassifR package

1. Introduction

MSclassifR is a R package dedicated for Automated classification of mass-spectra with machine learning methods.It was developed with the aim of identifying very similar species or phenotypes of bacteria from mass spectra obtained by Matrix Assisted Laser Desorption Ionisation - Time Of Flight Mass Spectrometry (MALDI-TOF MS). However, the different functions of this package can also be used to classify other categories associated to mass spectra; or from mass spectra obtained with other mass spectrometry techniques. It includes easy-to-use functions for pre-processing mass spectra, functions to determine discriminant mass-over-charge values (m/z) from a library of mass spectra corresponding to different categories, and functions to predict the category (species, phenotypes, etc.) associated to a mass spectrum from a list of selected mass-over-charge values.

In this vignette, we search to discriminate five species of the genus Ecrobia by MALDI-TOF mass spectrometry with a data set of 55 mass spectra (Lasch et al. (2016)) using MSclassifR package. The raw data concerning from a research paper (Wilke et al., 2020) were downloaded from this link.

2. Install packages

The installation of the MSclassifR package requires the installation of packages from Bioconductor, so you you might have to install the latest version of the BiocManager package. The MSclassifR package imports the other necessary packages from the CRAN. In addition, it is recommended to install the latest version of R or versions >= 4.0.

Install MSclassifR using the following code:

## install BiocManager if not installed
    if (!require("BiocManager", quietly = TRUE))
        install.packages("BiocManager")

## Install the mixOmics and multtest packages from Bioconductor
   BiocManager::install(c("multtest","mixOmics"))

## Install MSclassifR package
install.packages("MSclassifR")

Check after install the MSclassifR package:

## For spectral easy spectral treatment and machine learning
require(MSclassifR)

## Le chargement a nécessité le package : MSclassifR

## Warning: le package 'MSclassifR' a été compilé avec la version R 4.1.3

To continue this vignette, we need the MALDIquantForeign, MALDIquant and caret packages from the CRAN. The installation is possible with the command install.packages().

## time measurement
start_time <- Sys.time()

## For additional spectral treatment functions
require(MALDIquantForeign)

## Le chargement a nécessité le package : MALDIquantForeign

## Le chargement a nécessité le package : MALDIquant

## 
## This is MALDIquant version 1.21
## Quantitative Analysis of Mass Spectrometry Data
##  See '?MALDIquant' for more information about this package.

require(MALDIquant)

## For additional machine learning functions  
require(caret)

## Le chargement a nécessité le package : caret

## Warning: le package 'caret' a été compilé avec la version R 4.1.3

## Le chargement a nécessité le package : ggplot2

## Le chargement a nécessité le package : lattice

3. Import spectra and metadata

For this vignette, we will load mass spectra and their respective metadata from a public repository on GitHub.

## url for load the  MS data
MSdataE <- "https://agodmer.github.io/MSData/Ecrobia/MassSpectra_Ecrobia.Rdata"
MSMetadataE <- "https://agodmer.github.io/MSData/Ecrobia/metaData_Ecrobia.Rdata"
## Load mass spectra
### Mass spectra
load(url(MSdataE))
### Metadata
load(url(MSMetadataE))

If you want to use your own data with .flex format from Bruker MALDI-TOF MS, you could use importBrukerFlex from MALDIquantForeign package. Of note, you will have to create your own corresponding metadata as a data frame in this case.

## for data .flex format import
#MyBrukerFlexspectra <- MALDIquantForeign::importBrukerFlex("YourPathway")

4. Plotting mass spectra

It can be useful to inspect the spectra visually before starting the analysis. This step allows eliminating poor quality spectra.

## Visual quality control
MSclassifR::PlotSpectra(SpectralData=MassSpectra_Ecrobia[[1]])

5. Signal processing and peaks detection

## Signal processing
spectra <- MSclassifR::SignalProcessing(MassSpectra_Ecrobia)
## Peaks detection
peaks <- MSclassifR::PeakDetection(spectra, labels = metaData_Ecrobia$Number_strain)

You can plot the spectral processing performed and the peaks selected (black crosses).

## Inspect spectra after signal processing
MSclassifR::PlotSpectra(SpectralData=spectra[[1]],Peaks=peaks[[1]],col_spec="blue",col_peak="black")

At this step, we create an intensity matrix with m/z values in columns and sample names in rows data with the intensityMatrix function in MALDIquant package (Gibb and Strimmer (2012)).

## Perfom an Intensity Matrix with spectra data
IntMat <- MALDIquant::intensityMatrix(peaks)

## Rows are named according to selected metadata
rownames(IntMat) <-  paste(metaData_Ecrobia$Strain_name_spot, metaData_Ecrobia$Number_strain) 

## Remove "NA" in the intensityMatrix
IntMat[is.na(IntMat)] <- 0

## Normalise peaks according to the maximum intensity
## Create basic function
norma<-function(x) x/(max(x)) 
## Apply this function 
IntMat <- apply(IntMat,1,norma)

## Transpose Matrix for statistical analysis and named X
X <- t(IntMat)
Y <- factor(metaData_Ecrobia$Species)

The dataset is splitted it into training (70%) and test (30%) datasets using caretpackage Kuhn (2008) and the createDataPartition function for machine learning step.

set.seed(123) # For reproductibility
## Split data to tune the model according train set

## Index creation with p = 0,7
Index_skfold <- caret::createDataPartition(Y, p = 0.7, list=FALSE)

## Train set creation
X_train <- X[Index_skfold,]
Y_train <- Y[Index_skfold]

## Test set creation
X_test <- X[-Index_skfold,]
Y_test <- Y[-Index_skfold]

## Format Y_test as a factor for analysis
Y_testb <- gsub(" ", ".",Y_test)

6. Variable selection

The selection of discriminant mass-over-charge values can be performed with several methods proposed in the SelectionVar function of our MSclassifR package. The RFERF method uses the Recursive Feature Elimination (RFE) algorithm coupled with Random Forests (RF); the RFEGlmnet method uses RFE coupled with Logistic regression; the sPLSDA method uses sparse partial least squares discriminant analysis (sPLS-DA) (Rohart F et al. (2017)). The sPLSDA method selects variables from the ones kept in latent components of the model using an automatic choice of the number of components; the VSURF method uses RF with the VSURF function and package (Genuer et al. (2015)).

For this example, we used a k-fold cross validation (k = 2) with 5 to 70 variables. The VSURF method performs three steps variables selection : thresholding step, interpretation step and prediction step (Genuer et al. (2015)).

## a. using RFE and Random forest model
set.seed(2) # for  reproducibility
a <- SelectionVar(X_train,
                  Y_train,
                  MethodSelection = c("RFERF"),
                  MethodValidation = c("cv"),
                  NumberCV = 2,
                  PreProcessing = c("center","scale","nzv","corr"),
                  Sizes = c(5:70))


## b. using RFE and Random forest model
set.seed(2) # for  reproducibility
b <- SelectionVar(X_train,
                  Y_train,
                  MethodSelection = c("RFEGlmnet") ,
                  MethodValidation = c("cv"),
                  NumberCV = 2,
                  PreProcessing = c("center","scale","nzv","corr"),
                  Sizes = c(5:70))


## c. using sPLDA method 
set.seed(2) # for  reproducibility
c <- SelectionVar(X_train,
                  Y_train,
                  MethodSelection = c("sPLSDA"),
                  MethodValidation = c("cv"),
                  NumberCV = 2,
                  PreProcessing = c("scale","nzv"),
                  Sizes = c(5:70))

## d. using VSURF
set.seed(2) # for  reproducibility
d <- SelectionVar(X_train,
                  Y_train,
                  MethodSelection = c("VSURF"))

## Keep selected variables in a list
list_moz <- list("RFERF" = a$sel_moz,
                 "RFEglmnet" = b$sel_moz,
                 "sPLSDA" = c$sel_moz,
                 "VSURF" = d$sel_moz)

We can visualize the peaks selected by the RFERF with the red dotted lines.

PlotSpectra(SpectralData=spectra[[1]],Peaks=peaks[[1]],
Peaks2=a$sel_moz,col_spec="blue",col_peak="black")

We can visualize the peaks selected by the RFEglmnet method with the red dotted lines.

PlotSpectra(SpectralData=spectra[[1]],Peaks=peaks[[1]],
Peaks2=b$sel_moz,col_spec="blue",col_peak="black")

We can visualize the peaks selected by the sPLSDA method with the red dotted lines.

MSclassifR::PlotSpectra(SpectralData=spectra[[1]],Peaks=peaks[[1]],
Peaks2=c$sel_moz,col_spec="blue",col_peak="black")

We can visualize the peaks selected by the VSURF method with the red dotted lines.

MSclassifR::PlotSpectra(SpectralData=spectra[[1]],Peaks=peaks[[1]],
Peaks2=d$sel_moz,col_spec="blue",col_peak="black")

7. Estimation of a multinomial logistic regression

With the selected discriminant masses, we can create a prediction model. The LogReg function allows estimating a prediction model with k-fold cross validation in order to predict the category to which a mass spectrum belongs. Two main kinds of models can be estimated: linear or nonlinear (with neural networks (nnet method), random forests (rf method), support vector machines with linear kernel (svm method), or eXtreme Gradient Boosting (xgb method)). Hyperparameters are randomly searched, except for the eXtreme Gradient Boosting where a grid search is performed.

In this part, we will create different prediction models with discriminant mass-over-charge values previously selected ones using the 4 methods described above.

Each prediction models were estimated with a repeated k-fold cross validation (k = 2, and 2 repeats)in the train data set. Next, the accuracies of estimated models are evaluated on the test data set.

7.1 Use the selected variables with RFERF method

## Select variables found with RFERF 
sel_moz=a$sel_moz

## linear multinomial regression
model_lm=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2)
#Estimated model:
#model_lm

## nonlinear multinomial regression using neural networks
model_nn=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="nnet")
#Estimated model:
#model_nn

## nonlinear multinomial regression using random forests
model_rf=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="rf")
#Estimated model:
#model_rf

## nonlinear multinomial regression using xgboost
model_xgb=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number = 2, repeats=2, kind="xgb")
#Estimated model:
#model_xgb

## nonlinear multinomial regression using svm
model_svm=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="svm")
#Estimated model:
#model_svm

Let’s look at the performances of different prediction models on test dataset.

## Keep models in a list
list_model_A <- list("lm" = model_lm$train_mod,
                      "nnet" = model_nn$train_mod,
                      "rf" = model_rf$train_mod,
                      "Xgboost" = model_xgb$train_mod,
                      "svm" = model_svm$train_mod)

## Plot performances of prediction model
model_A <- caret::resamples(list_model_A)
bwplot(model_A)

## Probabilities of belonging to each category for the mass spectra on test set
prob_cat=MSclassifR::Predict_LogReg(peaks = peaks[-Index_skfold],
                                    model = list_model_A,
                                    moz = sel_moz)

Estimating accuracies of each method on the test dataset:

## Split table according method used
ResultatsModelA <- split(prob_cat,prob_cat$method)

## Calcul accuracy for each method
DF_A <- data.frame(lapply(ResultatsModelA, function(x)(sum(Y_testb == x[,8])/length(Y_testb))), row.names = "RFERF accuracy")

t(DF_A)

##             RFERF accuracy
## comb_fisher     0.21428571
## max_vote        0.28571429
## multinom        0.85714286
## nnet            0.85714286
## rf              1.00000000
## svmLinear2      0.07142857
## xgbTree         0.85714286

7.2 Use the selected variables with RFEGlmnet method

## Select variables found with RFGlmnet
sel_moz=b$sel_moz

## linear multinomial regression
model_lm=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2)
#Estimated model:
#model_lm

## nonlinear multinomial regression using neural networks
model_nn=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="nnet")
#Estimated model:
#model_nn

## nonlinear multinomial regression using random forests
model_rf=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="rf")
#Estimated model:
#model_rf

## nonlinear multinomial regression using xgboost
model_xgb=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="xgb")
#Estimated model:
#model_xgb

## nonlinear multinomial regression using svm
model_svm=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="svm")
#Estimated model:
#model_svm

## Keep models in a list
list_model_B <- list("lm" = model_lm$train_mod,
                      "nnet" = model_nn$train_mod,
                      "rf" = model_rf$train_mod,
                      "Xgboost" = model_xgb$train_mod,
                      "svm" = model_svm$train_mod)

## Plot performances of prediction model
model_B <- caret::resamples(list_model_B)
bwplot(model_B)

## Probabilities of belonging to each category for the mass spectra on test set
prob_cat=MSclassifR::Predict_LogReg(peaks = peaks[-Index_skfold],
                                    model = list_model_B,
                                    moz = sel_moz)

## Split table according method used
ResultatsModelB <- split(prob_cat,prob_cat$method)

## Calcul accuracy for each method
DF_B <- data.frame(lapply(ResultatsModelB, function(x)(sum(Y_testb == x[,8])/length(Y_testb))), row.names = "RFEGlmnet accuracy")

t(DF_B)

##             RFEGlmnet accuracy
## comb_fisher          0.2857143
## max_vote             0.2857143
## multinom             1.0000000
## nnet                 1.0000000
## rf                   0.9285714
## svmLinear2           0.4285714
## xgbTree              0.7857143

7.3 Use the selected variables with sPLSDA method

## Select variables found with RFGlmnet
sel_moz=c$sel_moz

## linear multinomial regression
model_lm=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2)
#Estimated model:
#model_lm

## nonlinear multinomial regression using neural networks
model_nn=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="nnet")
#Estimated model:
#model_nn

## nonlinear multinomial regression using random forests
model_rf=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="rf")
#Estimated model:
#model_rf

## nonlinear multinomial regression using xgboost
model_xgb=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="xgb")
#Estimated model:
#model_xgb

## nonlinear multinomial regression using svm
model_svm=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="svm")
#Estimated model:
#model_svm

## Keep models in a list
list_model_C <- list("lm" = model_lm$train_mod,
                      "nnet" = model_nn$train_mod,
                      "rf" = model_rf$train_mod,
                      "Xgboost" = model_xgb$train_mod,
                      "svm" = model_svm$train_mod)

## Plot performances of prediction model
model_C <- caret::resamples(list_model_C)
bwplot(model_C)

## Probabilities of belonging to each category for the mass spectra on test set
prob_cat=MSclassifR::Predict_LogReg(peaks = peaks[-Index_skfold],
                                    model = list_model_C,
                                    moz = sel_moz)

## Split table according method used
ResultatsModelC <- split(prob_cat,prob_cat$method)

## Calcul accuracy for each method
DF_C <- data.frame(lapply(ResultatsModelC, function(x)(sum(Y_testb == x[,8])/length(Y_testb))), row.names = "sPLSDA accuracy")

t(DF_C)

##             sPLSDA accuracy
## comb_fisher       0.2857143
## max_vote          0.2142857
## multinom          1.0000000
## nnet              1.0000000
## rf                1.0000000
## svmLinear2        0.2857143
## xgbTree           0.7857143

7.4 Use the selected variables with VSURF method

## Select variables found with RFGlmnet
sel_moz=d$sel_moz

## linear multinomial regression
model_lm=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2)
#Estimated model:
#model_lm

## nonlinear multinomial regression using neural networks
model_nn=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="nnet")
#Estimated model:
#model_nn

## nonlinear multinomial regression using random forests
model_rf=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="rf")
#Estimated model:
#model_rf

## nonlinear multinomial regression using xgboost
model_xgb=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="xgb")
#Estimated model:
#model_xgb

## nonlinear multinomial regression using svm
model_svm=MSclassifR::LogReg(X=X_train, moz=sel_moz, Y=factor(Y_train), number=2, repeats=2, kind="svm")
#Estimated model:
#model_svm

## Keep models in a list
list_model_D <- list("lm" = model_lm$train_mod,
                      "nnet" = model_nn$train_mod,
                      "rf" = model_rf$train_mod,
                      "Xgboost" = model_xgb$train_mod,
                      "svm" = model_svm$train_mod)

## Plot performances of prediction model
model_D <- caret::resamples(list_model_D)
bwplot(model_D)

## Probabilities of belonging to each category for the mass spectra on test set
prob_cat=MSclassifR::Predict_LogReg(peaks = peaks[-Index_skfold],
                                    model = list_model_D,
                                    moz = as.numeric(sel_moz))

## Split table according method used
ResultatsModelD <- split(prob_cat,prob_cat$method)

## Calcul accuracy for each method
DF_D <- data.frame(lapply(ResultatsModelC, function(x)(sum(Y_testb == x[,8])/length(Y_testb))), row.names = "VSURF accuracy")

t(DF_D)

##             VSURF accuracy
## comb_fisher      0.2857143
## max_vote         0.2142857
## multinom         1.0000000
## nnet             1.0000000
## rf               1.0000000
## svmLinear2       0.2857143
## xgbTree          0.7857143

8. Conclusion

the accuracies of all the used method in the test dataset can be summarized:

## Summarize results on test set
DFf <- rbind(DF_A, DF_B, DF_C, DF_D)

## Print results
t(DFf)

##             RFERF accuracy RFEGlmnet accuracy sPLSDA accuracy VSURF accuracy
## comb_fisher     0.21428571          0.2857143       0.2857143      0.2857143
## max_vote        0.28571429          0.2857143       0.2142857      0.2142857
## multinom        0.85714286          1.0000000       1.0000000      1.0000000
## nnet            0.85714286          1.0000000       1.0000000      1.0000000
## rf              1.00000000          0.9285714       1.0000000      1.0000000
## svmLinear2      0.07142857          0.4285714       0.2857143      0.2857143
## xgbTree         0.85714286          0.7857143       0.7857143      0.7857143

This vignette illustrates how MSclassifR allows estimating many different machine learning-based models to identify two species of the genus Ecrobia from MALDI-TOF mass spectra, and how to use these models to predict the species of a new coming mass spectrum. In the example we used, nines models of the 28 models have accuracy equal to 1. For instance, perfect accuracies were obtained on the test data set with variables selection by RFE-Glmnet, sPLSDA and VSURF coupled with lm, nnet methods.

## time measurement
end_time <- Sys.time()
end_time - start_time

## Time difference of 48.34758 mins

References

sessionInfo()

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
## [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
## [5] LC_TIME=French_France.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] caret_6.0-91           lattice_0.20-45        ggplot2_3.3.5         
## [4] MALDIquantForeign_0.13 MALDIquant_1.21        MSclassifR_0.2.0      
## 
## loaded via a namespace (and not attached):
##   [1] sn_2.0.2                 plyr_1.8.6               igraph_1.2.11           
##   [4] splines_4.1.2            BiocParallel_1.28.3      listenv_0.8.0           
##   [7] qqconf_1.1.1             TH.data_1.1-0            MALDIrppa_1.1.0-1       
##  [10] digest_0.6.29            foreach_1.5.2            htmltools_0.5.2         
##  [13] fansi_1.0.3              magrittr_2.0.2           doParallel_1.0.17       
##  [16] recipes_0.2.0            globals_0.14.0           gower_1.0.0             
##  [19] matrixStats_0.61.0       rARPACK_0.11-0           sandwich_3.0-1          
##  [22] hardhat_0.2.0            colorspace_2.0-3         signal_0.7-7            
##  [25] ggrepel_0.9.1            rbibutils_2.2.7          xfun_0.30               
##  [28] dplyr_1.0.8              crayon_1.5.1             jsonlite_1.8.0          
##  [31] readBrukerFlexData_1.8.5 survival_3.2-13          zoo_1.8-9               
##  [34] iterators_1.0.14         glue_1.6.2               gtable_0.3.0            
##  [37] ipred_0.9-12             readMzXmlData_2.8.1      future.apply_1.8.1      
##  [40] shape_1.4.6              BiocGenerics_0.40.0      DEoptimR_1.0-11         
##  [43] scales_1.2.0             mvtnorm_1.1-3            DBI_1.1.2               
##  [46] Rcpp_1.0.8               metap_1.8                plotrix_3.8-2           
##  [49] tmvnsim_1.0-2            proxy_0.4-26             stats4_4.1.2            
##  [52] lava_1.6.10              prodlim_2019.11.13       Rborist_0.2-3           
##  [55] glmnet_4.1-3             RColorBrewer_1.1-3       TFisher_0.2.0           
##  [58] ellipsis_0.3.2           pkgconfig_2.0.3          XML_3.99-0.9            
##  [61] farver_2.1.0             nnet_7.3-16              sass_0.4.1              
##  [64] utf8_1.2.2               tidyselect_1.1.2         labeling_0.4.2          
##  [67] rlang_1.0.2              reshape2_1.4.4           munsell_0.5.0           
##  [70] tools_4.1.2              xgboost_1.5.2.1          cli_3.2.0               
##  [73] generics_0.1.2           ranger_0.13.1            mathjaxr_1.6-0          
##  [76] evaluate_0.15            stringr_1.4.0            fastmap_1.1.0           
##  [79] yaml_2.3.5               fuzzyjoin_0.1.6          ModelMetrics_1.2.2.2    
##  [82] knitr_1.38               robustbase_0.93-9        purrr_0.3.4             
##  [85] randomForest_4.7-1       future_1.24.0            nlme_3.1-153            
##  [88] compiler_4.1.2           rstudioapi_0.13          e1071_1.7-9             
##  [91] waveslim_1.8.2           tibble_3.1.6             bslib_0.3.1             
##  [94] stringi_1.7.6            highr_0.9                RSpectra_0.16-0         
##  [97] Matrix_1.3-4             VSURF_1.1.0              multtest_2.50.0         
## [100] vctrs_0.3.8              mutoss_0.1-12            pillar_1.7.0            
## [103] lifecycle_1.0.1          Rdpack_2.3               jquerylib_0.1.4         
## [106] data.table_1.14.2        corpcor_1.6.10           R6_2.5.1                
## [109] gridExtra_2.3            parallelly_1.31.0        codetools_0.2-18        
## [112] MASS_7.3-54              assertthat_0.2.1         withr_2.5.0             
## [115] mnormt_2.0.2             multcomp_1.4-18          parallel_4.1.2          
## [118] mixOmics_6.18.1          grid_4.1.2               rpart_4.1-15            
## [121] timeDate_3043.102        tidyr_1.2.0              class_7.3-19            
## [124] rmarkdown_2.13           pROC_1.18.0              numDeriv_2016.8-1.1     
## [127] Biobase_2.54.0           lubridate_1.8.0          base64enc_0.1-3         
## [130] ellipse_0.4.2

Genuer,R. et al. (2015) VSURF: An R Package for Variable Selection Using Random Forests. The R Journal, 7, 19–33.

Gibb,S. and Strimmer,K. (2012) MALDIquant: a versatile R package for the analysis of mass spectrometry data. Bioinformatics, 28, 2270–2271.

Kuhn,M. (2008) Building predictive models in r using the caret package. Journal of Statistical Software, Articles, 28, 1–26.

Lasch,P. et al. (2016) A MALDI-TOF Mass Spectrometry Database for Identification and Classification of Highly Pathogenic Microorganisms from the Robert Koch- Institute (RKI).

Rohart F,G.B. et al. (2017) mixOmics: An r package for ’omics feature selection and multiple data integration. PLoS computational biology, 13, e1005752.

Wilke,T. et al. (2020) Proteomic Fingerprinting Discriminates Cryptic Gastropod Species. Malacologia, 63, 131–137.