SILVA 138 Classifiers

Feature classifiers for the variable regions of prokaryotic 16S rRNA genes.

Workflow


  1. Download SILVA data

    qiime rescript get-silva-data --p-version '138' --p-target 'SSURef_NR99' --p-include-species-labels --o-silva-sequences silva-138-ssu-nr99-seqs.qza --o-silva-taxonomy silva-138-ssu-nr99-tax.qza
    The taxonomy file has the 7 required levels with no gaps (every level is labelled), so the rank propagation step was not performed.
  2. Remove low quality sequences

    qiime rescript cull-seqs --i-sequences silva-138-ssu-nr99-seqs.qza --o-clean-sequences silva-138-ssu-nr99-seqs-cleaned.qza
  3. Remove Eukaryotic taxa

    qiime taxa filter-seqs --i-sequences silva-138-ssu-nr99-seqs-cleaned.qza --i-taxonomy silva-138-ssu-nr99-tax.qza --p-exclude 'd__Eukaryota' --p-mode 'contains' --o-filtered-sequences silva138_noEuk_seqs.qza
  4. Filter by length

    Minimum lengths are 900 (Archaea), 1200 (Bacteria) and 1400 (Eukaryota) bases. Eukaryota is left out here because it was removed in the previous step. The discarded-sequences file name below is assumed.
    qiime rescript filter-seqs-length-by-taxon --i-sequences silva138_noEuk_seqs.qza --i-taxonomy silva-138-ssu-nr99-tax.qza --p-labels Archaea Bacteria --p-min-lens 900 1200 --o-filtered-seqs silva138_noEuk_AB_seqs.qza --o-discarded-seqs silva138_noEuk_AB_seqs_discarded.qza
    This step did not discard any more sequences.
  5. Dereplicate

    Default mode - uniq; Default rank-handles - silva
    qiime rescript dereplicate --i-sequences silva138_noEuk_AB_seqs.qza --i-taxa silva-138-ssu-nr99-tax.qza --p-threads 12 --o-dereplicated-sequences silva138_noEuk_AB_seqs_uniq.qza --o-dereplicated-taxa silva138_noEuk_AB_tax_uniq.qza

    Dereplicated Sequences: silva138_noEuk_AB_seqs_uniq.qza
    Dereplicated Taxa: silva138_noEuk_AB_tax_uniq.qza
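
The five steps above can be chained in a single script. The following is a minimal sketch, not the exact script used here. It assumes the length filter in step 4 is RESCRIPt's `filter-seqs-length-by-taxon` and uses a hypothetical file name for the discarded sequences. The `run` helper, the `prepare_reference` function and the `DRY_RUN` variable are illustrative names and are not part of QIIME 2. With `DRY_RUN=1` (the default in this sketch) the commands are only printed.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Chain the reference-preparation steps; '&&' stops at the first failure.
# Assumption: step 4 uses filter-seqs-length-by-taxon, and the
# discarded-sequences file name is hypothetical.
prepare_reference() {
  run qiime rescript get-silva-data --p-version 138 --p-target SSURef_NR99 \
      --p-include-species-labels \
      --o-silva-sequences silva-138-ssu-nr99-seqs.qza \
      --o-silva-taxonomy silva-138-ssu-nr99-tax.qza &&
  run qiime rescript cull-seqs \
      --i-sequences silva-138-ssu-nr99-seqs.qza \
      --o-clean-sequences silva-138-ssu-nr99-seqs-cleaned.qza &&
  run qiime taxa filter-seqs \
      --i-sequences silva-138-ssu-nr99-seqs-cleaned.qza \
      --i-taxonomy silva-138-ssu-nr99-tax.qza \
      --p-exclude d__Eukaryota --p-mode contains \
      --o-filtered-sequences silva138_noEuk_seqs.qza &&
  run qiime rescript filter-seqs-length-by-taxon \
      --i-sequences silva138_noEuk_seqs.qza \
      --i-taxonomy silva-138-ssu-nr99-tax.qza \
      --p-labels Archaea Bacteria --p-min-lens 900 1200 \
      --o-filtered-seqs silva138_noEuk_AB_seqs.qza \
      --o-discarded-seqs silva138_noEuk_AB_seqs_discarded.qza &&
  run qiime rescript dereplicate \
      --i-sequences silva138_noEuk_AB_seqs.qza \
      --i-taxa silva-138-ssu-nr99-tax.qza --p-threads 12 \
      --o-dereplicated-sequences silva138_noEuk_AB_seqs_uniq.qza \
      --o-dereplicated-taxa silva138_noEuk_AB_tax_uniq.qza
}

# Dry run: prints the five commands, one per line.
prepare_reference
```

Running the script with `DRY_RUN=0` in an activated QIIME 2 environment would execute the steps in order instead of printing them.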

Classifiers



  1. Extract the V1-V2 region

    Primers: 27f & 338r
    qiime feature-classifier extract-reads --i-sequences silva138_noEuk_AB_seqs_uniq.qza --p-f-primer AGAGTTTGATCMTGGCTCAG --p-r-primer TGCTGCCTCCCGTAGGAGT --p-n-jobs 12 --o-reads silva138_AB_V1-V2seqs.qza

    Output: silva138_AB_V1-V2seqs.qza

  2. Dereplicate the target region

    qiime rescript dereplicate --i-sequences silva138_AB_V1-V2seqs.qza --i-taxa silva138_noEuk_AB_tax_uniq.qza --o-dereplicated-sequences silva138_AB_V1-V2seqs_uniq.qza --o-dereplicated-taxa silva138_AB_V1-V2taxa_uniq.qza

    Dereplicated Sequences: silva138_AB_V1-V2seqs_uniq.qza
    Dereplicated Taxa: silva138_AB_V1-V2taxa_uniq.qza

  3. Classify & evaluate with RESCRIPt

    Multiple threads increase memory usage, so this step was run on 1 thread with automatic reads per batch (approx. 19 hrs).
    qiime rescript evaluate-fit-classifier --i-sequences silva138_AB_V1-V2seqs_uniq.qza --i-taxonomy silva138_AB_V1-V2taxa_uniq.qza --o-classifier silva138_AB_V1-V2_classifier.qza --o-observed-taxonomy silva138_AB_V1-V2_predicted_taxonomy.qza --o-evaluation silva138_AB_V1-V2_classifier_eval.qzv

    Classifier: silva138_AB_V1-V2_classifier.qza
    Predicted Taxonomy: silva138_AB_V1-V2_predicted_taxonomy.qza
    Evaluation: silva138_AB_V1-V2_classifier_eval.qzv
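
The primers passed to `extract-reads` contain IUPAC degenerate bases (for example `M` in 27f, which stands for A or C). The sketch below only illustrates what those codes mean by expanding a primer into a regular expression. It is not how `extract-reads` matches primers internally, and `iupac_to_regex` is a hypothetical helper name.

```shell
# Expand IUPAC degenerate bases in a primer into regex character classes.
# The replacements only introduce A, C, G, T and brackets, so later
# substitutions never touch the output of earlier ones.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# 27f forward primer from the V1-V2 step above.
iupac_to_regex AGAGTTTGATCMTGGCTCAG   # prints AGAGTTTGATC[AC]TGGCTCAG
```

The resulting pattern can be used with `grep -E` to check by eye whether a given sequence contains an exact match to one of the primer variants.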

  1. Extract the V3 region

    Primers: 341f & 518r
    qiime feature-classifier extract-reads --i-sequences silva138_noEuk_AB_seqs_uniq.qza --p-f-primer CCTACGGGNGGCWGCAG --p-r-primer GTATTACCGCGGCTGCTGG --p-n-jobs 12 --o-reads silva138_AB_V3seqs.qza

    Output: silva138_AB_V3seqs.qza

  2. Dereplicate the target region

    qiime rescript dereplicate --i-sequences silva138_AB_V3seqs.qza --i-taxa silva138_noEuk_AB_tax_uniq.qza --o-dereplicated-sequences silva138_AB_V3seqs_uniq.qza --o-dereplicated-taxa silva138_AB_V3taxa_uniq.qza

    Dereplicated Sequences: silva138_AB_V3seqs_uniq.qza
    Dereplicated Taxa: silva138_AB_V3taxa_uniq.qza

  3. Classify & evaluate with RESCRIPt

    Multiple threads increase memory usage, so this step was run on 1 thread with automatic reads per batch (approx. 20 hrs).
    qiime rescript evaluate-fit-classifier --i-sequences silva138_AB_V3seqs_uniq.qza --i-taxonomy silva138_AB_V3taxa_uniq.qza --o-classifier silva138_AB_V3_classifier.qza --o-observed-taxonomy silva138_AB_V3_predicted_taxonomy.qza --o-evaluation silva138_AB_V3_classifier_eval.qzv

    Classifier: silva138_AB_V3_classifier.qza
    Predicted Taxonomy: silva138_AB_V3_predicted_taxonomy.qza
    Evaluation: silva138_AB_V3_classifier_eval.qzv

  1. Extract the V3-V4 region

    Primers: 341f & 805r
    qiime feature-classifier extract-reads --i-sequences silva138_noEuk_AB_seqs_uniq.qza --p-f-primer CCTACGGGNGGCWGCAG --p-r-primer GACTACHVGGGTATCTAATCC --p-n-jobs 12 --o-reads silva138_AB_V3-V4seqs.qza

    Output: silva138_AB_V3-V4seqs.qza

  2. Dereplicate the target region

    qiime rescript dereplicate --i-sequences silva138_AB_V3-V4seqs.qza --i-taxa silva138_noEuk_AB_tax_uniq.qza --o-dereplicated-sequences silva138_AB_V3-V4seqs_uniq.qza --o-dereplicated-taxa silva138_AB_V3-V4taxa_uniq.qza

    Dereplicated Sequences: silva138_AB_V3-V4seqs_uniq.qza
    Dereplicated Taxa: silva138_AB_V3-V4taxa_uniq.qza

  3. Classify & evaluate with RESCRIPt

    Multiple threads increase memory usage, so this step was run on 1 thread with automatic reads per batch (approx. 30 hrs).
    qiime rescript evaluate-fit-classifier --i-sequences silva138_AB_V3-V4seqs_uniq.qza --i-taxonomy silva138_AB_V3-V4taxa_uniq.qza --o-classifier silva138_AB_V3-V4_classifier.qza --o-observed-taxonomy silva138_AB_V3-V4_predicted_taxonomy.qza --o-evaluation silva138_AB_V3-V4_classifier_eval.qzv

    Classifier: silva138_AB_V3-V4_classifier.qza
    Predicted Taxonomy: silva138_AB_V3-V4_predicted_taxonomy.qza
    Evaluation: silva138_AB_V3-V4_classifier_eval.qzv

  1. Extract the V4 region

    Primers: 515f & 806r
    qiime feature-classifier extract-reads --i-sequences silva138_noEuk_AB_seqs_uniq.qza --p-f-primer GTGYCAGCMGCCGCGGTAA --p-r-primer GGACTACNVGGGTWTCTAAT --p-n-jobs 12 --o-reads silva138_AB_V4seqs.qza

    Output: silva138_AB_V4seqs.qza

  2. Dereplicate the target region

    qiime rescript dereplicate --i-sequences silva138_AB_V4seqs.qza --i-taxa silva138_noEuk_AB_tax_uniq.qza --o-dereplicated-sequences silva138_AB_V4seqs_uniq.qza --o-dereplicated-taxa silva138_AB_V4taxa-uniq.qza --p-threads 12

    Dereplicated Sequences: silva138_AB_V4seqs_uniq.qza
    Dereplicated Taxa: silva138_AB_V4taxa-uniq.qza

  3. Classify & evaluate with RESCRIPt

    Multiple threads increase memory usage, so this step was run on 1 thread with 10000 reads per batch (approx. 24 hrs).
    qiime rescript evaluate-fit-classifier --i-sequences silva138_AB_V4seqs_uniq.qza --i-taxonomy silva138_AB_V4taxa-uniq.qza --p-reads-per-batch 10000 --o-classifier silva138_AB_V4_classifier.qza --o-observed-taxonomy silva138_AB_V4_predicted_taxonomy.qza --o-evaluation silva138_AB_V4_classifier_eval.qzv

    Classifier: silva138_AB_V4_classifier.qza
    Predicted Taxonomy: silva138_AB_V4_predicted_taxonomy.qza
    Evaluation: silva138_AB_V4_classifier_eval.qzv
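
The four region-specific classifiers above follow the same three commands and differ only in region name and primer pair. A minimal sketch of generating those commands from a small table is shown below. It only prints the commands; the function names are hypothetical, and file names follow the `_uniq` pattern used for most regions, so the V4 taxonomy file name and its `--p-reads-per-batch 10000` setting are not reproduced.

```shell
# Region table: name, forward primer, reverse primer (values from the steps above).
regions='V1-V2 AGAGTTTGATCMTGGCTCAG TGCTGCCTCCCGTAGGAGT
V3 CCTACGGGNGGCWGCAG GTATTACCGCGGCTGCTGG
V3-V4 CCTACGGGNGGCWGCAG GACTACHVGGGTATCTAATCC
V4 GTGYCAGCMGCCGCGGTAA GGACTACNVGGGTWTCTAAT'

# region_commands NAME FWD REV: print the extract, dereplicate and
# evaluate-fit-classifier commands for one region.
region_commands() {
  p="silva138_AB_$1"
  echo "qiime feature-classifier extract-reads --i-sequences silva138_noEuk_AB_seqs_uniq.qza --p-f-primer $2 --p-r-primer $3 --p-n-jobs 12 --o-reads ${p}seqs.qza"
  echo "qiime rescript dereplicate --i-sequences ${p}seqs.qza --i-taxa silva138_noEuk_AB_tax_uniq.qza --o-dereplicated-sequences ${p}seqs_uniq.qza --o-dereplicated-taxa ${p}taxa_uniq.qza"
  echo "qiime rescript evaluate-fit-classifier --i-sequences ${p}seqs_uniq.qza --i-taxonomy ${p}taxa_uniq.qza --o-classifier ${p}_classifier.qza --o-observed-taxonomy ${p}_predicted_taxonomy.qza --o-evaluation ${p}_classifier_eval.qzv"
}

# Print the three commands for every region in the table.
print_all_region_commands() {
  printf '%s\n' "$regions" | while read -r name fwd rev; do
    region_commands "$name" "$fwd" "$rev"
  done
}

print_all_region_commands
```

Adding another region then only requires one more line in the table, provided its primers are known.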

  1. Classify & evaluate the full-length sequences with RESCRIPt

    This step was performed with qiime_2023.5 on a Fedora 39 server.
    qiime rescript evaluate-fit-classifier --i-sequences silva138_noEuk_AB_seqs_uniq.qza --i-taxonomy silva138_noEuk_AB_tax_uniq.qza --o-classifier silva138_noEuk_AB_classifier.qza --o-observed-taxonomy silva138_noEuk_AB_predicted_taxonomy.qza --o-evaluation silva138_noEuk_AB_classifier_eval.qzv --p-n-jobs 10

    Classifier: silva138_noEuk_AB_classifier.qza
    Predicted Taxonomy: silva138_noEuk_AB_predicted_taxonomy.qza
    Evaluation: silva138_noEuk_AB_classifier_eval.qzv

System configuration

(Applies to all classifiers except the full-length one, which was built on the Fedora 39 server.)

  • OS: 34
  • RAM: 16GB
  • CPU: AMD Ryzen 4600H (6C/12T)
  • SWAP: 50GB
  • tmp: 50GB

References


  1. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. https://doi.org/10.1038/s41587-019-0209-9
  2. RESCRIPt: Reproducible sequence taxonomy reference database management. https://doi.org/10.1371/journal.pcbi.1009581
  3. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. https://doi.org/10.1093/nar/gks1219