Metagenomic pipeline to identify co-infections between different SARS-CoV-2 variants of concern: Alpha to Omicron case studies

The SARS-CoV-2 genome has evolved rapidly into multiple variants due not only to the spread in various human populations, but also to the increase in the mutation rate during the year 2021 [3]. In this context, the interaction of multiple viral sequences with each other during simultaneous infection can cause potential differences in epidemiological behavior [11]. Therefore, it is vital to reveal how frequently co-infection events occur in the population and the exact composition of the lineages involved [13].

Here we present an analysis of co-infection by divergent SARS-CoV-2 VOCs, in which samples with two different genotypes were analyzed using a metagenomic approach based on ASV inference. Similar to another work [13], we assumed that the presence of lineage-defining mutations within the viral quasispecies allows the identification of coinfection events (Fig. 3). A customized SARS-CoV-2 database identified ASVs specific to VOCs, as well as non-specific ASVs found in other genotypes. The classification metrics (VOC classes) revealed a high performance of the pipeline, with an accuracy of 96.2% and an AUC of 0.964. This was in stark contrast to the metagenome assembly approach, for which the classification appeared to be random (accuracy = 46.2% and AUC = 0.500). The poor performance of metagenome assembly stems from the generation of a single consensus sequence even for cases with two different genomes: a mutation of one genotype can be masked by the non-mutated nucleotide in the other sequence, creating an incorrect profile with few mutations and an erroneous VOC class assignment (Table 2).
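The masking effect described above can be illustrated with a toy example. This is only a minimal sketch under stated assumptions: the 10-base sequences, the mutation positions, the 6:4 read ratio and the helper `majority_consensus` are all hypothetical and are not part of the pipeline itself.

```python
from collections import Counter

def majority_consensus(reads):
    """Return the most common base at each position across aligned reads."""
    length = len(reads[0])
    return "".join(
        Counter(read[i] for read in reads).most_common(1)[0][0]
        for i in range(length)
    )

# Hypothetical toy sequences (not real SARS-CoV-2 data).
reference = "ACGTACGTAC"
genotype_a = "ATGTACGTAC"  # lineage A: C->T at position 1
genotype_b = "ACGTACGAAC"  # lineage B: T->A at position 7

# Simulated coinfection: 6 reads from lineage A and 4 from lineage B.
reads = [genotype_a] * 6 + [genotype_b] * 4

consensus = majority_consensus(reads)
distinct = sorted(set(reads))  # keeping distinct sequences, as an ASV-like view

print("consensus:", consensus)
print("distinct sequences:", distinct)
```

In this toy case the single consensus is identical to lineage A, so the mutation that characterizes lineage B disappears, whereas keeping the distinct sequences retains evidence of both genotypes.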

In addition, the first version of this work was prepared prior to the arrival of the Omicron variant in November 2021, and it reported an accuracy of 96.3% for the ASV-like approach and 57.7% for the metagenome assembly (data not shown). This update demonstrates the versatility of our approach in incorporating new genotypes to infer coinfections with different VOCs.

Although the pipeline can be adapted to identify coinfection with more than two genotypes, the relevance of implementing the analysis with three genotypes is questionable due to the very low incidence of coinfection with two genotypes (almost impossible with three) and the unnecessary negative impact on performance (mutations combined from three lineages are more likely to create a mutation profile close to that of another single lineage). Thus, we have only considered combinations of two divergent genotypes.

In addition, the ASV approach was originally designed to identify different bacteria using 16S rRNA. By adapting the database, this method is fully suitable for identifying co-infections with other pathogens. However, for different microorganisms, metagenome assembly is a better strategy to identify double infection. In our case, the SARS-CoV-2 genotypes are not different enough for metagenome assembly to identify concomitant infections.

Regarding the sequencing data, it has been reported that the high reliability of Illumina technology preserves the genomic evidence of coinfections or within-host variation [13], which has motivated its use for coinfection studies, including this work. General approaches to identifying the concomitant presence of organisms can be based on: (i) metagenomics strategies, or (ii) strategies based on haplotype reconstruction using mapping. For SARS-CoV-2 coinfections, the latter has been the method of choice due to the availability of ready-to-use bioinformatics tools [26,27,28]. Despite this, viral haplotype reconstruction programs often perform poorly for low-divergence or rare haplotype sequences [29], which represents a possible limitation for samples containing simultaneous SARS-CoV-2 genomes. Therefore, the assembly of single genomes and their subsequent combination into simulated coinfection data was preferred here, not only for the developed pipeline but also to create the ground truth dataset, rather than a comparison with a haplotype caller.
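As an illustration of how a simulated coinfection with a known composition could be built from two single-genotype samples, a minimal sketch follows. The function `simulate_coinfection`, the read labels, the 70:30 proportion and the read counts are hypothetical and only show the idea of a labeled ground truth; they do not reproduce the procedure used in this work.

```python
import random

def simulate_coinfection(reads_a, reads_b, fraction_a, n_reads, seed=0):
    """Mix reads from two single-genotype samples at a chosen proportion.

    Reads are drawn without replacement, so each input list must contain
    at least as many reads as are requested from it.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible mixture
    n_a = round(n_reads * fraction_a)
    n_b = n_reads - n_a
    mixed = rng.sample(reads_a, n_a) + rng.sample(reads_b, n_b)
    rng.shuffle(mixed)
    return mixed

# Hypothetical read identifiers standing in for real sequencing reads.
reads_a = [f"A_read_{i}" for i in range(100)]
reads_b = [f"B_read_{i}" for i in range(100)]

mixed = simulate_coinfection(reads_a, reads_b, fraction_a=0.7, n_reads=50)
n_from_a = sum(read.startswith("A_") for read in mixed)
print(len(mixed), "reads in total,", n_from_a, "from genotype A")
```

Because the origin of every read is known, the expected answer for each simulated sample is fixed in advance, which is what allows accuracy and AUC to be computed.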

Through analysis of haplotype reconstruction, some studies have reported coinfection events caused by the presence of two different genotypes. In 2020, 19 cases of co-infection were identified in Iraq [11]. A co-infection rate of up to 8% was reported in a Singapore study [2], while at least 5% was estimated in the United Arab Emirates [12]. In Brazil, co-infection with local lineages was detected in early 2021 [10]. In September 2021, a study analyzed 30,806 raw sequence data sets, of which approximately 2.6% were identified as co-infections with high confidence [3].

To our knowledge, only one study has implemented a specific pipeline to identify co-infections by different SARS-CoV-2 genotypes, which used a strategy based on intra-host variant calling and a hypergeometric distribution method [13]. Using COVID-19 case sequencing data from the United States of America, the authors recognized only 53 of the 29,993 samples (0.18%) as coinfection cases. A single case with three lineages was reported, while the other 52 were identified with two genotypes. In terms of frequency, these results are in line with our results on the possible occurrence of coinfections in Costa Rica: none of the 1021 cases analyzed was identified as a concomitant presence of VOCs. Because the frequency of co-infections appears to be very low, the cases analyzed may not be sufficient to identify at least one case in this country. In addition, other factors that affect this result are the few sequenced samples compared to the diagnosed cases (0.4% in Costa Rica and 0.41% in Latin America according to the GISAID database), the inability to clinically differentiate the cases of coinfection (see below), as well as the rapid displacement of circulating lineages by VOCs that have a higher transmissibility. However, with the co-dominance of Delta and Omicron during the transition between 2021 and 2022 [30], reports of co-infections could increase in the coming months. Therefore, this work could be a useful tool to investigate the occurrence of this phenomenon.
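The argument that 1021 cases may be too few can be made concrete with a short calculation. As a rough sketch, it assumes that the frequency reported for the United States dataset (53 of 29,993) also applies to the Costa Rican samples and that samples are independent; both are simplifying assumptions made here for illustration only.

```python
# Assumption: the coinfection frequency from the US dataset applies locally.
coinfections, total = 53, 29_993
prevalence = coinfections / total

n_samples = 1021  # cases analyzed in Costa Rica

# Probability of observing no coinfection at all among n independent samples.
p_zero = (1 - prevalence) ** n_samples

# Expected number of coinfections among the n samples.
expected = prevalence * n_samples

print(f"prevalence: {prevalence:.4%}")
print(f"P(no coinfection in {n_samples} samples): {p_zero:.3f}")
print(f"expected coinfections: {expected:.2f}")
```

Under these assumptions, fewer than two coinfections would be expected among 1021 samples, and the chance of finding none is roughly 16%, so the absence of detected cases is compatible with a low but non-zero frequency.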

In terms of biological significance, reports of coinfection are worrisome because other studies have shown that this phenomenon may contribute to RNA virus recombination [1,8,13]. As a result of recombination processes, new virions can acquire different pathogenic properties [1] and can shift the clinical presentation of the disease toward more severe symptoms [13]. In detail, coinfections can affect viral evolution by inducing recombination and possibly generating new genotypes. In this scenario, new properties can also emerge in terms of transmission, vaccine efficacy, or clinical outcome. Thus, the contribution of this study is mainly to support genomic surveillance and, ultimately, to provide an epidemiological context to explain the possible origin of recombinants. This is a possible first step in making decisions about additional enhancements or the development of new vaccines based on genome architecture, as previously reported for mutation-based changes.

Regarding the clinical outcome, cases of coinfection with different SARS-CoV-2 genotypes have been reported with the same symptoms as other patients with COVID-19 [2,10]. However, a single coinfection report, in a young patient without comorbidities presenting with severe COVID-19, suggested concomitant infection as responsible for the clinical presentation [9]. Further studies and up-to-date statistics are needed to establish the relevance of coinfection in terms of the severity and mortality of COVID-19 disease [11]. Also, the detection of possible cases of coinfection is another factor to consider in the interaction between the immune system and SARS-CoV-2 mutations. It has been suggested that immunity elicited by a specific genotype of SARS-CoV-2 does not protect against another, but may lead to a more severe disease pattern [9]. However, the actual immunological implications of coinfection at the cellular or humoral level are not well known [10].

On the other hand, some considerations and limitations of this study need to be taken into account. First, the ASV inference approach requires a custom database that must be constructed using locally circulating genome sequences, i.e., local genomic surveillance is a preliminary step to implementing the pipeline. This includes updating VOC sequences that carry new mutations (sublineages). Second, similar to other approaches to identifying coinfections, only coinfections with divergent sequences (VOCs in this case) can be identified. A co-infection test with VOIs and VUMs was unable to identify concomitant sequences due to low diversity, which resulted in poor performance during genotype classification. Furthermore, de novo intrahost mutations cannot be identified by this approach. This is not a drawback for this implementation because there is a very low probability that new mutations match exactly all the characteristic mutations of a VOC. If a de novo mutation were identical to a characteristic mutation, the ASV could be ruled out by applying the ASV mapping threshold, as we have done here. Finally, although co-infection with SARS-CoV-2 and other pathogens, such as influenza or bacterial agents, has been reported [31,32], and this pipeline could be adapted to identify them, metagenome assembly could be a more appropriate strategy than this approach.

Taken together, this analysis represents a new effort to track SARS-CoV-2 genotypes circulating in Costa Rica, which is complementary to our other local studies for genomic surveillance [7,33], as well as to the identification of clinical patterns of patients with COVID-19 [34]. Infection concomitant with different viral genotypes can lead to the generation of SARS-CoV-2 variants with possible new properties in terms of …
