ITSoneDB: Eukaryal Ribosomal Internal Transcribed Spacer 1 Database

ITSoneDB is a comprehensive collection of eukaryotic ribosomal RNA Internal Transcribed Spacer 1 (ITS1) sequences, intended to support metabarcoding surveys of fungal and other microbial eukaryotic communities in environmental samples. The sequences were extracted from the European Nucleotide Archive (ENA) and arranged on the NCBI taxonomy tree. ITS1 start and end boundaries were defined by ENA annotations and/or inferred by mapping Hidden Markov Model (HMM) profiles of the flanking 18S and 5.8S ribosomal RNA genes onto each sequence.
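As a rough illustration of the HMM-based definition, the ITS1 can be taken as the stretch lying between the 3' end of the 18S hit and the 5' start of the 5.8S hit on the same sequence. The sketch below assumes that profile hits are already available as 1-based, inclusive, forward-strand coordinates; the `HmmHit` type, the profile labels, the E-value cut-off and the `min_len` threshold are hypothetical and are not taken from the ITSoneDB pipeline. In this sketch, a sequence lacking either flank yields no boundaries.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HmmHit:
    """A profile hit on one sequence (hypothetical structure).

    Coordinates are 1-based and inclusive, on the forward strand.
    """
    profile: str   # hypothetical labels: "18S" or "5.8S"
    start: int
    end: int
    evalue: float


def infer_its1(hits: List[HmmHit], max_evalue: float = 1e-5,
               min_len: int = 1) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the region between an 18S hit and a downstream 5.8S hit."""
    def best(name: str) -> Optional[HmmHit]:
        # Keep only hits for this profile below the (assumed) E-value cut-off
        candidates = [h for h in hits if h.profile == name and h.evalue <= max_evalue]
        # Lowest E-value wins; None if the profile did not hit
        return min(candidates, key=lambda h: h.evalue, default=None)

    ssu, r58 = best("18S"), best("5.8S")
    if ssu is None or r58 is None:
        return None                      # a flanking gene is missing
    its1_start, its1_end = ssu.end + 1, r58.start - 1
    if its1_end - its1_start + 1 < min_len:
        return None                      # flanks overlap or are in the wrong order
    return its1_start, its1_end
```

For example, an 18S hit at 1-120 and a 5.8S hit at 361-480 give ITS1 boundaries `(121, 360)`.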

The current ITSoneDB release, 1.144 (June 2024), is based on the April 2024 ENA release.

ITSoneDB also offers a complete data analysis environment, ITSoneWB, for analyzing both shotgun metagenomics and DNA metabarcoding data. Additional tools are available for barcoding gap inference and primer design based on ITSoneDB data.

ITSoneDB Content

Statistic	Count
Number of nucleotide sequences	1,426,820
Total number of species (according to the NCBI taxonomy)	184,597
Number of ITS1 nucleotide sequences with start and end positions inferred from both ENA annotations and HMM profiles	261,410
Number of ITS1 nucleotide sequences with start and end positions inferred only by HMM profile mapping	548,405
Number of ITS1 nucleotide sequences with start and end positions inferred only from ENA annotations	617,005
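The three boundary-source categories in the content table are mutually exclusive, and their counts add up exactly to the total number of sequences in release 1.144. A quick check in Python, using only the figures from the table:

```python
# Counts copied from the ITSoneDB 1.144 content table
total_sequences = 1_426_820
by_source = {
    "ENA annotations and HMM profiles": 261_410,
    "HMM profiles only": 548_405,
    "ENA annotations only": 617_005,
}

# The three categories partition the collection, so they must sum to the total
assert sum(by_source.values()) == total_sequences

for source, n in by_source.items():
    share = 100 * n / total_sequences
    print(f"{source}: {n:,} ({share:.1f}%)")
```

The shares come out at roughly 18%, 38% and 43% respectively, so most ITS1 boundaries rest on a single source of evidence.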

CNR Istituto di Tecnologie Biomediche, Bari - CNR Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Bari - Università degli studi di Bari

The figure illustrates the main steps of the pipeline used to generate ITSoneDB. The pipeline components are implemented in Python, except for the ETL module that populates the database, which is implemented in Java.

In the initial step, a collection of eukaryotic candidate sequences is created by downloading the entire ENA nucleotide release. The collected entries are then processed in two parallel branches of the workflow, which extract or infer the ITS1 boundaries. In the left-hand branch, the procedure uses a pre-compiled dictionary of common synonyms for the ITS1 feature definition to detect and extract the ITS1 start and end sites from the entries' feature-table annotations. In the right-hand branch, ITS1 boundaries are inferred by mapping HMM profiles of the flanking 18S and 5.8S ribosomal RNA genes with the hmmsearch tool included in HMMER 3.1.

The data obtained from both branches are used to populate the database, and entries for which both methods are informative are merged. A reference entry is then defined for each species. Finally, the ETL module combines, for each candidate sequence, the annotation-based and HMM-based ITS1 boundaries and populates ITSoneDB with further information such as species name, taxonomic lineage, ENA description and HMM profile alignments.
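The annotation-based branch and the merge step can be sketched as follows. The synonym set, the `(feature_key, qualifiers, start, end)` feature representation and the source labels are illustrative assumptions, not the dictionary or schema actually used by ITSoneDB; HMM-derived boundaries are simply taken as a precomputed mapping from accession to `(start, end)`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical synonym dictionary; the real pipeline uses its own pre-compiled list
ITS1_SYNONYMS = {
    "its1",
    "its 1",
    "internal transcribed spacer 1",
    "internal transcribed spacer i",
}

Feature = Tuple[str, Dict[str, str], int, int]  # (feature key, qualifiers, start, end)
Boundaries = Tuple[int, int]


def its1_from_annotation(features: List[Feature]) -> Optional[Boundaries]:
    """Return ITS1 start/end from the first feature whose qualifier text is a known synonym."""
    for _key, qualifiers, start, end in features:
        for value in qualifiers.values():
            if value.strip().lower() in ITS1_SYNONYMS:
                return start, end
    return None


def merge_boundaries(annotated: Dict[str, Boundaries],
                     inferred: Dict[str, Boundaries]) -> Dict[str, dict]:
    """Combine both branches per accession and record which method(s) were informative."""
    merged = {}
    for acc in annotated.keys() | inferred.keys():
        ena, hmm = annotated.get(acc), inferred.get(acc)
        if ena and hmm:
            source = "ENA+HMM"
        elif hmm:
            source = "HMM only"
        else:
            source = "ENA only"
        merged[acc] = {"ena": ena, "hmm": hmm, "source": source}
    return merged
```

Tallying the `source` field over all accessions yields the kind of three-way breakdown reported in the ITSoneDB Content table.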

Test Case

To demonstrate the usability of ITSoneDB as a reference database for ITS1-based metagenomics approaches, we carried out a benchmark test on a real 454 dataset freely available in the Sequence Read Archive (SRR174891). The dataset contains 5,160 sequences from a study of soil from nine T. melanosporum/Q. pubescens truffle grounds collected in May 2006 in Cahors. The benchmark aims to show the effectiveness of ITSoneDB as a metagenomics reference database compared with another database, UNITE (Kõljalg et al., 2005). Reads were mapped against both ITSoneDB and UNITE with blastn, and only significant matches were retained according to the following three criteria:

at least half of the query sequence must be aligned;
the query and subject sequences must share at least 95% identity;
only taxonomically informative subject sequences are considered.
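The three criteria can be applied to blastn tabular output. This sketch assumes blastn was run with `-outfmt "6 qseqid sseqid pident qstart qend qlen"`, and it reads "half of the query aligned" as the aligned query span (qend - qstart + 1) covering at least 50% of qlen. Because the test description does not spell out what makes a subject taxonomically informative, that decision is passed in as a precomputed set of subject IDs. Counting each read once if it has at least one passing hit is one plausible reading of "significant matches", not necessarily the one used in the benchmark.

```python
import csv
from typing import Iterable, List, Set


def significant_queries(rows: Iterable[List[str]],
                        informative_subjects: Set[str],
                        min_identity: float = 95.0,
                        min_query_cov: float = 0.5) -> Set[str]:
    """Return query IDs with at least one hit passing all three criteria.

    Each row holds: qseqid, sseqid, pident, qstart, qend, qlen.
    """
    passed = set()
    for qseqid, sseqid, pident, qstart, qend, qlen in rows:
        aligned = abs(int(qend) - int(qstart)) + 1
        if (aligned / int(qlen) >= min_query_cov        # criterion 1: half the query aligned
                and float(pident) >= min_identity        # criterion 2: >= 95% identity
                and sseqid in informative_subjects):     # criterion 3: informative subject
            passed.add(qseqid)
    return passed


def count_significant(path: str, informative_subjects: Set[str]) -> int:
    """Count reads with a significant match in a tab-separated blastn output file."""
    with open(path, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        return len(significant_queries(rows, informative_subjects))
```

Running the same filter on the ITSoneDB and UNITE result files gives directly comparable counts.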

Using ITSoneDB we found 3,206 significant matches, compared with 2,896 using UNITE (310 more). These results support the effectiveness of ITSoneDB as a metagenomics reference database for characterizing fungal populations.

Release history

ITSoneDB release	Release date	Number of sequences	ENA release	ENA release date
1.131	Aug. 2017	985,240	131	Feb. 2017
1.138	Mar. 2019	1,174,761	138	Nov. 2018
1.141	Mar. 2020	1,218,745	141	Oct. 2019
1.144	Jun. 2024	1,426,820		Apr. 2024

Credits

Santamaria Monica

Fosso Bruno

De Caro Giorgio

Grillo Giorgio

Licciulli Flavio

Liuni Sabino

Pesole Graziano
