Bacteria are highly prevalent microorganisms in the microbiota and play an important role in oral homeostasis [41]. The abundance of some bacteria may indicate dysbiosis of the oral microbiome [25, 35]. The core is inferred by comparing different microbiomes. It can reveal host conditions through the presence or absence of certain species, or through the predominance of some species over others. However, some bacteria can be part of the core of all these microbiomes, regardless of the condition of the host. Housekeeping genes are the genes essential to the life of an organism. By analogy, the set of oral bacteria that is present regardless of the host's health status can be regarded as essential to the symbiosis among the microorganisms of the oral cavity.
In this study, 20 bacterial genera were found in more than 450 metagenomes deposited in public databases (Fig. 5). These genera are present in the core of the oral microbiome regardless of the health condition of the host.
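The text does not state the exact criterion used to define the core. The sketch below shows one common way to compute it. It assumes that each metagenome has already been reduced to the set of genera detected in it, and that the core is defined by a prevalence threshold. `core_genera` and `min_prevalence` are hypothetical names, and a value of 1.0 requires a genus to appear in every sample. The sample data are invented for illustration.

```python
from collections import Counter


def core_genera(samples, min_prevalence=1.0):
    """Return genera detected in at least `min_prevalence` of the samples.

    samples: list of sets, each holding the genera detected in one metagenome.
    min_prevalence: hypothetical threshold, fraction of samples (1.0 = all).
    """
    if not samples:
        return set()
    if not 0 < min_prevalence <= 1:
        raise ValueError("min_prevalence must be in (0, 1]")
    # Count in how many samples each genus occurs (set() avoids double counting).
    counts = Counter(genus for sample in samples for genus in set(sample))
    threshold = min_prevalence * len(samples)
    return {genus for genus, n in counts.items() if n >= threshold}


# Hypothetical presence data for three metagenomes from hosts in different conditions.
samples = [
    {"Streptococcus", "Prevotella", "Neisseria", "Lactobacillus"},
    {"Streptococcus", "Prevotella", "Porphyromonas"},
    {"Streptococcus", "Prevotella", "Neisseria"},
]
print(sorted(core_genera(samples)))       # ['Prevotella', 'Streptococcus']
print(sorted(core_genera(samples, 0.6)))  # ['Neisseria', 'Prevotella', 'Streptococcus']
```

A strict threshold of 1.0 mirrors the idea of a set of genera found regardless of host condition. A lower threshold tolerates genera that drop below the detection limit in some samples.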
The cores obtained from both Amplicon sequencing and Shotgun metagenomics included genera previously associated with caries and periodontal diseases, such as Streptococcus, Lactobacillus and Prevotella [10, 42].
The saliva of individuals with high caries experience is associated with a high abundance of Streptococcus and numerous Lactobacillus species. It is also associated with other bacteria capable of degrading sugars and forming extracellular polysaccharides [18]. Samples from healthy individuals with low caries experience were associated with a greater abundance of the genera Neisseria, Haemophilus, and Fusobacterium, most species of which ferment sugars only weakly [43]. According to Tanner et al. [18], the composition of saliva in the oral cavity is one of the main risk factors associated with caries. Biofilm dysbiosis results in an increase in acidogenic and aciduric species, which can modulate the core components of the biofilm. In gingivitis, by contrast, the accumulation of plaque around the gingival margin induces an inflammatory response in the host. This leads to increased levels of anaerobic bacteria, including Gram-negative proteolytic species, especially those belonging to the genera Prevotella, Porphyromonas, Tannerella, Fusobacterium and Treponema [44].
The identification of Corynebacterium, Escherichia, Pseudomonas and Shigella suggests that genera with pathogenic potential may also be part of the core of the oral microbiome obtained from salivary samples. Chitinophaga is a recently described taxon and was observed only in the Shotgun metagenomics data. This genus was highly representative, as were Escherichia, Acinetobacter, Streptococcus, and Shigella (Fig. 3). The pathogenic potential of the genus Chitinophaga has already been reported [45, 46]; however, its role in the oral microbiome is still unknown.
The analysis of the Amplicon sequencing metadata showed inconsistent behavior (Fig. 2). Only Rarefaction and Unknown had a correlation greater than 0.50 (r = 0.63), whereas an inverse correlation was expected. The greater the number of sequences that cannot be assigned, the smaller the number of organisms that can potentially be discovered.
In contrast, the Shotgun metagenomics metadata behaved as expected, as exemplified by the inverse correlation between Failure and Rarefaction. The greater the number of sequences that fail quality control, the smaller the number of sequences available for inference, which in turn affects the rarefaction curve. This is exactly the behavior observed in Shotgun metagenomics for these variables (r = −0.78).
These differing behaviors certainly influenced the comparison between the two approaches (Fig. 1). Furthermore, none of the projects in the Amplicon sequencing dataset presented sequences with quality failures (Failed = 0). This result was unexpected, since Amplicon sequencing can also produce quality failures during sequencing. However, Predicted was the only variable that did not differ between the approaches. This is similar to the results observed by [47], who investigated the microbial composition of the human intestine.
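The text reports correlation coefficients (r) between per-project metadata variables but does not name the coefficient. The sketch below assumes it is Pearson's r. It computes r between two variables using hypothetical Failure and Rarefaction values that are not the study's data. The sketch also shows a property relevant to the Amplicon dataset: a variable that is constant across all projects, such as Failed = 0, has zero variance, so its Pearson correlation with any other variable is undefined.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        # Zero variance in either variable: the coefficient is undefined.
        raise ValueError("correlation is undefined for a constant variable")
    return cov / norm


# Hypothetical per-project values (not the study's data).
failure = [120, 340, 560, 800, 150]
rarefaction = [950, 700, 500, 300, 900]
r = pearson_r(failure, rarefaction)
print(f"r = {r:.2f}")  # negative: more failures go with lower rarefaction

# A variable that is constant across projects (e.g. all zeros) has no variance.
try:
    pearson_r([0, 0, 0, 0, 0], rarefaction)
except ValueError as err:
    print("undefined:", err)
```

In practice, a library routine such as `scipy.stats.pearsonr` would also return a p-value. The hand-written version here only makes the formula explicit.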
Shotgun metagenomics allowed a more detailed characterization of the microbiome than Amplicon sequencing, revealing greater diversity and resolving taxa down to the species level. Amplicon sequencing, which relies on variable regions of a marker gene (typically the 16S rRNA gene for bacteria), generally identifies taxa only down to the genus level [48].
Consistent with the literature [47, 49], the PCA identified a greater number of representative genera in the Shotgun dataset than in the Amplicon dataset (Fig. 3). Differences in how each approach generates sequence data may explain this finding. In Shotgun metagenomics, the DNA of all organisms in the sample is extracted and sequenced directly. In Amplicon sequencing, only the DNA regions to which the primers bind are amplified and sequenced. The choice of primers therefore appears to be a crucial factor in avoiding bias in taxonomic analysis [50].
The specificity of primers may restrict the set of microorganisms detected in Amplicon sequencing studies. Thus, both the choice of sequencing method and the selection of primers are important factors to consider when analyzing microbiome studies [51].
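Primer specificity can restrict the taxa recovered by Amplicon sequencing. The sketch below illustrates this with a simplified in-silico check of whether a degenerate primer, written with IUPAC ambiguity codes, matches a template sequence. The primer, the template sequences and the `max_mismatches` tolerance are all hypothetical. Real in-silico PCR tools also consider the reverse primer, both strands and binding thermodynamics, which this sketch ignores.

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_binds(primer, template, max_mismatches=0):
    """Return True if `primer` matches some window of `template` (same strand,
    5'->3') with at most `max_mismatches` positions outside the allowed bases."""
    primer, template = primer.upper(), template.upper()
    k = len(primer)
    for start in range(len(template) - k + 1):
        window = template[start:start + k]
        mismatches = sum(base not in IUPAC[p] for p, base in zip(primer, window))
        if mismatches <= max_mismatches:
            return True
    return False


primer = "GTGYCAGCM"  # hypothetical degenerate primer
templates = {         # hypothetical target regions of three taxa
    "taxon_A": "TTGTGCCAGCAGCCG",
    "taxon_B": "TTGTGTCAGCCGCCG",
    "taxon_C": "TTGTGACAGCAGCCG",  # A where the primer expects Y (C/T)
}
for name, seq in templates.items():
    print(name, primer_binds(primer, seq))
# taxon_A True
# taxon_B True
# taxon_C False
```

With strict matching, taxon_C would not be amplified and would therefore be missing from the Amplicon dataset, even though Shotgun sequencing of the same sample could still detect it.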
Microbiome studies comparing the two sequencing methods on the same samples suggest that their results may be comparable. In this study, Shotgun metagenomics data from salivary samples available on the MG-RAST platform allowed a greater number of genera to be identified. This highlights the complexity of the oral microbiome, both in the diversity of genera and in the roles they may play in the salivary microbiome [50].
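The sketch below shows one way to compare the genera detected by the two approaches: it lists the genera they share and those unique to each approach. The genus lists are hypothetical toy data, not results of this study. The Jaccard index is not used in the text and is included only as one common summary of overlap between two sets.

```python
def compare_genera(shotgun, amplicon):
    """Summarize the overlap between genera detected by two approaches."""
    shotgun, amplicon = set(shotgun), set(amplicon)
    union = shotgun | amplicon
    return {
        "shared": shotgun & amplicon,
        "shotgun_only": shotgun - amplicon,
        "amplicon_only": amplicon - shotgun,
        # Jaccard index: |A & B| / |A | B| (0 = disjoint, 1 = identical).
        "jaccard": len(shotgun & amplicon) / len(union) if union else 1.0,
    }


# Hypothetical genus lists for illustration only.
shotgun = {"Streptococcus", "Prevotella", "Lactobacillus", "Chitinophaga", "Escherichia"}
amplicon = {"Streptococcus", "Prevotella", "Lactobacillus"}
result = compare_genera(shotgun, amplicon)
print(sorted(result["shotgun_only"]))  # ['Chitinophaga', 'Escherichia']
print(result["jaccard"])               # 0.6
```

As the next paragraph notes, presence lists like these say nothing about abundance or interactions between genera.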
These results should be interpreted with caution, since the presence of a genus alone does not determine the condition of the host. Other characteristics, such as abundance and interactions between genera, play a relevant role in the association between the microbiota and the condition of the host [12].
Metagenomics projects deposited in public databases such as eHOMD and MG-RAST do not always provide information on host health conditions, DNA/RNA extraction techniques, or other details from which microbiome-host relationships could be inferred.
Studies identified through the principal investigator information suggest that they correspond to the data obtained from MG-RAST. However, it is not possible to confirm whether these articles refer to the data investigated in this study. According to the MG-RAST pipeline guidelines (https://help.mg-rast.org/user_manual.html), analyses of eukaryotes or viruses cannot be carried out. This suggests that the DNA/RNA extraction method of the selected projects allows the bacterial microbiota to be inferred.