Ion Torrent Personal Genome Machine (PGM) technology is a mid-length read, low-cost and high-speed next-generation sequencing platform with a relatively high insertion and deletion (indel) error rate. A full systematic assessment of the effectiveness of various error correction algorithms in PGM viral datasets (e.g., hepatitis B virus (HBV)) has not been performed. We examined 19 quality-trimmed PGM datasets for the HBV reverse transcriptase (RT) region and found a total error rate of 0.48% ± 0.12%. Deletion errors were clearly present at the ends of homopolymer runs. Tests using both real and simulated data showed that the algorithms differed in their abilities to detect and correct errors and that the error rate and sequencing depth significantly affected the performance. Of the algorithms tested, Pollux showed a better overall performance but tended to over-correct 'genuine' substitution variants, whereas Fiona proved to be better at distinguishing these variants from sequencing errors. We found that the combined use of Pollux and Fiona gave the best results when error-correcting Ion Torrent PGM viral data.

Next-generation sequencing (NGS) has been widely used in the study of viruses and has opened new avenues for research and diagnostic applications (e.g., viral mutant spectra [1, 2], virus quasispecies theory and dynamics [3, 4, 5, 6, 7], fitness landscape [8, 9] and discovery of novel viruses [5]). Ion Torrent Personal Genome Machine (PGM) technology is a mid-length read, low-cost and high-speed NGS platform [10] with special applications in microbial sequencing [11]. However, PGM has a relatively high insertion and deletion (indel) error rate of 1.5% (range from 0.46% to 2.4%) [12, 13, 14].

Several algorithms have been proposed to correct sequencing errors for PGM data (Table 1). These algorithms differ with respect to error models, statistical techniques, data features, the determined parameters, and performance. These methods are classified into the following three categories: (1) suffix array/tree-based methods that use a suffix tree to detect and correct substitution and indel errors (e.g., Fiona [15]); (2) k-spectrum-based methods that divide reads into k-mers and generate a k-mer depth profile from which errors are corrected (e.g., Blue [16] and Pollux [17]); and (3) multiple sequence alignment (MSA)-based methods that use k-mers as seeds and construct a consensus sequence from the multiple alignments to correct errors (e.g., Coral [18] and Karect [19]). Two review articles [12, 20] have systematically surveyed these methods for PGM data and provided guidance concerning which tools to consider for benchmarking based on the data properties. Sequencing data generated on NGS platforms for four microbial genomes were analyzed to assess the coverage distribution, bias, GC distribution, variant detection and accuracy [13]. However, these algorithms have not been fully assessed and applied to viral sequencing data (e.g., hepatitis B virus, HBV).

HBV has a partially double-stranded DNA genome, and its replication depends on reverse transcription of an RNA intermediate by reverse transcriptase (RT). Since the RT lacks proofreading, errors in HBV DNA replication occur at a higher rate than in other DNA viruses, with an estimated nucleotide substitution rate of 1.4–3.2 × 10⁻⁵ substitutions per site per year [21]. Nucleos(t)ide analogs (NAs) have been widely used in anti-HBV therapy by directly inhibiting the HBV RT enzyme and effectively suppressing viral replication [22]. However, long-term use of NAs leads to drug resistance. Characterizing the mutation spectrum and reconstructing the viral quasispecies in the HBV RT region has implications for understanding drug resistance due to NA therapy [23].
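To make the k-spectrum idea mentioned above more concrete, here is a minimal Python sketch of building a k-mer depth profile and flagging low-depth k-mers in a read. It is an illustration only, not the procedure used by Blue or Pollux. The function names, the toy reads, and the `min_depth` threshold are all assumptions chosen for this example.

```python
from collections import Counter


def kmer_depth_profile(reads, k):
    """Count how often each k-mer occurs across all reads (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def flag_weak_positions(read, counts, k, min_depth):
    """Return 0-based start positions of k-mers in `read` seen fewer than min_depth times."""
    return [
        i for i in range(len(read) - k + 1)
        if counts[read[i:i + k]] < min_depth
    ]


# Toy data (assumed): five identical reads plus one read with a single
# substitution at index 6 (G -> C).
reads = ["ACGTACGTGA"] * 5 + ["ACGTACCTGA"]

profile = kmer_depth_profile(reads, k=4)
weak = flag_weak_positions(reads[-1], profile, k=4, min_depth=2)
print(weak)  # start positions of the low-depth k-mers in the last read
```

In this toy case the flagged k-mers are exactly those that overlap the substituted base, which is the signal a k-spectrum corrector would act on. Real tools must also handle indels, uneven depth, and genuine low-frequency variants, which is where the over-correction noted for Pollux becomes a concern.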