Comparison of genome sequences of SARS-CoV-2 and other coronaviruses

In a recent study posted to Research Square*, researchers compared the genome sequence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with those of other coronaviruses (CoVs).

Study: Different genomic representations of new pathogens based on signal processing algorithms: COVID-19 case study. Image credit: Drehstrom / Shutterstock

Background

CoVs are ribonucleic acid (RNA) viruses that cause respiratory and digestive tract infections. CoVs contain spike (S) proteins that mediate entry into host cells. CoVs belong to the Coronaviridae family, which includes four genera: Alpha, Beta, Gamma, and Delta. Alpha- and Beta-CoVs infect mammals, whereas Gamma- and Delta-CoVs predominantly infect birds.

Phylogenetic studies have reported the complex evolution of CoVs. These viruses use "template switching," a unique mechanism that leads to higher rates of homologous RNA recombination. Pathogenic human CoVs include SARS-CoV, which caused the SARS outbreak; Middle East respiratory syndrome (MERS)-CoV, which caused the MERS outbreak; and SARS-CoV-2, the etiological agent of the current coronavirus disease 2019 (COVID-19) pandemic.

SARS-CoV-2, a Beta-CoV, is enveloped and contains a positive-sense single-stranded (ss) RNA genome. Its genome is approximately 29.9 kilobases long. It contains 11 open reading frames and 5′ and 3′ untranslated regions (UTRs). A recent study indicated that the SARS-CoV-2 genome is a consequence of recombination between bat and pangolin CoVs.

The study and findings

The present study used bioinformatics and signal processing tools to understand variations between different CoV genomes and to explore the origin of SARS-CoV-2. The researchers compiled a library of 26 CoV genome sequences, including SARS-CoV-2, that are publicly available on GenBank. First, they constructed a chaos game representation (CGR) plot/image, which is an iterative mapping method in which each nucleotide is assigned a coordinate (X, Y) in a two-dimensional (2D) space.
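The iterative mapping behind a CGR plot can be illustrated with a short sketch. This is not the authors' code: the corner assignment (A, C, G, and T at the corners of the unit square), the starting point at the center, and the function name cgr_points are all assumptions based on a commonly used CGR convention, and the preprint may use a different layout.

```python
# Corner assignment is an assumption (a common CGR convention);
# the preprint may place the nucleotides differently.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return one (x, y) CGR coordinate per recognized nucleotide."""
    x, y = 0.5, 0.5  # start at the center of the unit square
    points = []
    # RNA sequences are handled by treating U as T
    for base in sequence.upper().replace("U", "T"):
        corner = CORNERS.get(base)
        if corner is None:  # skip ambiguous symbols such as N
            continue
        # move halfway from the current point toward the nucleotide's corner
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ATGC"):
        print(point)
```

Plotting the returned points for a whole genome produces the kind of 2D image described above, in which frequently occurring subsequences show up as dense regions.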

The CGR plot was divided into equal sub-images, and the center point (centroid) of each subregion was calculated. For each subregion, the researchers determined the distance between the centroid for the SARS-CoV-2 genome and the centroids for the other sequences. They then mapped the genomic sequences to electron-ion interaction pseudopotential (EIIP) values to obtain numerical signals. These signals were analyzed using the smoothed discrete Fourier transform (SDFT) and continuous wavelet transform (CWT) methods.
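The two steps described above, centroid distances per subregion and spectral analysis of an EIIP signal, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the grid size, the function names, and the use of a plain (unsmoothed) DFT power spectrum are assumptions, and the smoothing step of the SDFT and the wavelet analysis are not reproduced here. The EIIP values are the standard published ones for the four nucleotides.

```python
import cmath
import math

# Standard EIIP values for the four nucleotides
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def subregion_centroids(points, grid=2):
    """Centroid of the (x, y) points in each cell of a grid x grid split of the unit square."""
    sums = {}
    for x, y in points:
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        sx, sy, count = sums.get((row, col), (0.0, 0.0, 0))
        sums[(row, col)] = (sx + x, sy + y, count + 1)
    return {cell: (sx / n, sy / n) for cell, (sx, sy, n) in sums.items()}


def centroid_distances(points_a, points_b, grid=2):
    """Euclidean distance between matching subregion centroids of two point sets."""
    ca = subregion_centroids(points_a, grid)
    cb = subregion_centroids(points_b, grid)
    # only cells that contain points in both sets can be compared
    return {
        cell: math.hypot(ca[cell][0] - cb[cell][0], ca[cell][1] - cb[cell][1])
        for cell in ca.keys() & cb.keys()
    }


def eiip_signal(sequence):
    """Map a nucleotide sequence to its EIIP values, skipping unknown symbols."""
    return [EIIP[b] for b in sequence.upper().replace("U", "T") if b in EIIP]


def power_spectrum(signal):
    """Power spectrum from a naive O(n^2) DFT, computed after removing the mean."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


if __name__ == "__main__":
    a = [(0.25, 0.25), (0.75, 0.75)]
    b = [(0.25, 0.125), (0.75, 0.75)]
    print(centroid_distances(a, b))
    print(power_spectrum(eiip_signal("ACG" * 4)))
```

The naive DFT is only practical for short sequences; for genomes of roughly 30,000 bases an FFT routine from a numerical library would be the usual choice.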

In addition, the authors explored the similarities between the sequences with the Clustal X tool, followed by a recombination analysis using the SimPlot tool. The 25 genomic sequences were transformed into a numerical representation in a 2D CGR plot/image. The authors found similarities between the SARS-CoV-2 genome and five other CoV sequences: bat CoVs RaTG13, CoVZC45, and CoVZXC21, and pangolin CoVs GXP2V and MP789. The genomes closest to SARS-CoV-2 were RaTG13, GXP2V, and MP789.

The similarities between the SARS-CoV-2 genome sequence and the five sequences were confirmed by applying the SDFT and CWT techniques to the EIIP signals. SimPlot analysis revealed similarity between the SARS-CoV-2 genome and those of pangolin CoV MP789 and bat CoV RaTG13.

Conclusions

To summarize, in the present work, the authors proposed original methods of genomic signal representation and processing that help to compare various coronavirus genomes and to identify similarities between viruses that affect humans, such as SARS-CoV-2, and viruses of the same family that affect other species. According to the authors, this research is crucial to understanding the origin and evolution of the SARS-CoV-2 genome.

The researchers compared the results obtained with the different DNA representation methods against each other and against the results of traditional methods, such as SimPlot analysis and BLAST comparison, in an attempt to identify any possible recombination event.

The new algorithms proposed by the authors are based on nucleotide frequencies and could help to classify and identify many other DNA sequences by analyzing the correlation spectra between the sequences. The authors hope that the phylogenetic trees obtained from numerical classification of DNA sequences reflect the performance of the algorithm and that their methods can help solve complex biological problems.

*Important notice

Research Square publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
