GenomeBits representation of the SARS-CoV-2 Delta and Omicron genome sequences

In a recent study published in PLoS ONE, researchers identified distinct genomic features of the Delta and Omicron variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Study: GenomeBits view on the omicron and delta variants of the coronavirus pathogen. Image credit: CROCOTHERY / Shutterstock

Understanding SARS-CoV-2, the causative pathogen of the coronavirus disease 2019 (COVID-19) pandemic, is still a challenge. It has been suggested that the SARS-CoV-2 genome may have formed through recombination of genomes close to those of bat and pangolin coronaviruses (CoVs). Investigating the origin of SARS-CoV-2 is essential to preventing future pandemics.

The SARS-CoV-2 Delta and Omicron variants have both common and unique mutations in the spike protein. Previously, the authors described GenomeBits [a statistical algorithm that maps nucleotide bases into a finite alternating sum series of distributed terms of binary values (0, 1)] and revealed distinct genomic patterns for the SARS-CoV-2 variants Alpha, Beta, Gamma, Epsilon, and Eta.

The study and findings

In the present study, the researchers applied the GenomeBits method to uncover distinctive patterns in the SARS-CoV-2 Delta and Omicron genomic sequences. Genomic sequence data were obtained from the Global Initiative on Sharing Avian Influenza Data (GISAID) repository. In similarity plots generated using the Waterman-Eggert algorithm with the lalign36 alignment software, the authors observed a larger deviation of the Omicron variant (B.1.1.529) than of the Delta variant (AY.4.2) from the ancestral SARS-CoV-2 (Wuhan-Hu-1) sequence.

Delta variant sequences from Spain showed more significant deviations when queried against Omicron sequences from Spain. Similar variations were observed for United States (US) Delta sequences versus US Omicron sequences. Conventional similarity methods provide limited information on the nucleotide bases adenine (A), cytosine (C), thymine (T), and guanine (G), and determining the parameters that achieve an optimal alignment can be difficult. In addition, the computational resources required increase substantially with the number and length of sequences.

In contrast, the GenomeBits method works efficiently, with less processing time, on massive genomic data. The technique considers a series of alternating sums whose terms are nucleotide variables converted into binary values (0, 1). The significant difference between GenomeBits and other binary representation techniques is the alternating signs (±) of the terms of the GenomeBits sums. That is, if the term at a given nucleotide position is positive, then the next term is negative, and vice versa.
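The alternating sum described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes one sum per base (A, C, G, T), an indicator of 1 where that base occurs and 0 elsewhere, and a sign that starts positive at the first position and flips at every subsequent position. The function names `genomebits_sums` and `genomebits_profile` are hypothetical.

```python
from itertools import accumulate

BASES = "ACGT"


def genomebits_sums(sequence, base):
    """Running alternating sum of the binary indicator for one base.

    Assumption: the term at position k (1-indexed) carries the sign
    +1 for odd k and -1 for even k, and equals 1 only where `base` occurs.
    """
    signed_terms = []
    for k, nucleotide in enumerate(sequence.upper(), start=1):
        indicator = 1 if nucleotide == base else 0
        sign = 1 if k % 2 == 1 else -1
        signed_terms.append(sign * indicator)
    # Partial sums give one curve value per base position.
    return list(accumulate(signed_terms))


def genomebits_profile(sequence):
    """One alternating-sum curve per nucleotide base."""
    return {base: genomebits_sums(sequence, base) for base in BASES}


# Toy sequence for illustration only (not real viral data).
profile = genomebits_profile("ATGCAATG")
print(profile["A"])  # [1, 1, 1, 1, 2, 1, 1, 1]
```

Because the signs alternate, bases that fall on neighboring odd and even positions partially cancel, so each curve stays bounded where a plain cumulative count would grow steadily.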

In the GenomeBits representation, the authors observed that the curves of the Delta sequences mirrored those of the Omicron sequences. This became more prominent when both curves were averaged. Regions of zero (low noise) or constant mean values were indicative of a perfect reflection. The technique illustrated an ordered (constant) to disordered (peaked) transition near the nonstructural protein (NSP)-5 polymerase, from within the open reading frame (ORF)-1a region to the spike protein segment.

Different patterns were also observed around the spike region. The disordered curves (peaks) diverged rapidly, denoting differences that grow with increasing base position. The positive and negative terms partially canceled, converging on some non-zero values. In addition, data noise could be reduced by including sliding windows of different sizes, up to 500 bases.
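The article does not say how the sliding windows were applied. One plausible reading is a moving average over a curve of values. The sketch below illustrates that idea only. The function name `moving_average` and the sample values are hypothetical, and the study's windowing scheme may differ.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values.

    Assumption: a plain, unweighted window. The study reports window
    sizes of up to 500 bases. A tiny window is used here for clarity.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide by one position: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Hypothetical curve values for illustration only.
curve = [1, 1, 1, 1, 2, 1, 1, 1]
print(moving_average(curve, 4))  # [1.0, 1.25, 1.25, 1.25, 1.25]
```

Larger windows suppress more of the short-range fluctuation, at the cost of blurring sharp features such as the peaks.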

Conclusions

Using the GenomeBits method, the researchers observed ordered (constant) to disordered (peaked) transitions around the spike protein region of the SARS-CoV-2 Delta and Omicron variants. Numerical representations of genomic sequences have been fundamental in bioinformatics and could help manage huge volumes of sequence data. GenomeBits could help with future bioinformatics surveillance of infectious diseases, and sequence-to-numerical mapping methods will likely remain in use for characterizing new sequences.
