This post is mainly based on the review article "A comparison of tools for the simulation of genomic next-generation sequencing data", published in Nature Reviews Genetics in 2016.
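When learning about high-throughput sequencing, we often run into two problems: (1) we cannot find the data we want; (2) the data are too large to download and analyze conveniently. This is where software for simulating high-throughput sequencing data comes in handy.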
Simply put, sequencing data simulators are mainly used for the following four purposes:
- planning experiments
- testing hypotheses
- benchmarking tools
- evaluating particular results
The simulation of NGS data can be extremely useful for planning experiments,
testing hypotheses, benchmarking tools and evaluating particular results.
Given a reference genome or dataset, for instance, one can play with
an array of sequencing technologies to choose the best-suited technology and parameters for the particular goal,
possibly optimizing time and costs.
Yet, this is still not the standard practice and researchers often base their choices on
practical considerations like technology and money availability.
As shown throughout this Review, simulation of NGS data from known genomes or transcriptomes can be extremely useful
when evaluating assembly, mapping, phasing or genotyping algorithms, exposing their advantages and drawbacks under different circumstances.
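
To make the benchmarking idea concrete, here is a minimal toy sketch of what a read simulator does: it samples reads from a known reference and records where each read came from, so that a mapper's output can later be checked against the truth. This is not the method of any of the 23 tools in the review. The function names (`simulate_reads`, `to_fastq`), the uniform sampling of start positions, the substitution-only error model and the constant placeholder quality string are all simplifying assumptions made for illustration.

```python
import random


def simulate_reads(reference, n_reads, read_len, error_rate, seed=0):
    """Sample reads uniformly from `reference` and add substitution errors.

    Returns a list of (start, sequence) tuples. `start` is the true origin
    of the read, which is what a mapper under test should recover.
    This is a hypothetical helper, not part of any real simulator.
    """
    if read_len > len(reference):
        raise ValueError("read_len must not exceed the reference length")
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, len(reference) - read_len)
        bases = list(reference[start:start + read_len])
        for i, base in enumerate(bases):
            # Assumed error model: each base is independently substituted
            # with probability `error_rate` (no indels).
            if rng.random() < error_rate:
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


def to_fastq(reads, quality_char="I"):
    """Format reads as FASTQ records with a constant placeholder quality."""
    records = []
    for idx, (start, seq) in enumerate(reads):
        # The true start position is stored in the read name.
        header = f"@read{idx}_pos{start}"
        records.append(f"{header}\n{seq}\n+\n{quality_char * len(seq)}\n")
    return "".join(records)


if __name__ == "__main__":
    rng = random.Random(42)
    reference = "".join(rng.choice("ACGT") for _ in range(1000))
    reads = simulate_reads(reference, n_reads=5, read_len=50, error_rate=0.01)
    print(to_fastq(reads), end="")
```

Because the true start position travels with each read, comparing it with the position reported by an aligner gives a direct accuracy measure. Real simulators replace each assumption here with technology-specific models of read length, error profile and quality scores.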
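The review evaluates 23 sequencing data simulators, describes their distinguishing features, requirements and potential applications, and offers guidance on choosing a suitable tool.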
NGS genomic simulators decision tree.
The decision tree below briefly illustrates the principles for choosing among the different tools.
Main characteristics of current NGS technologies
Some characteristics of the current NGS technologies. Note that "X" indicates that a feature is present.
General overview of the sequencing process and steps that can be parameterized in the simulations
General overview of NGS simulation
General information about 23 NGS genomic simulators
Features of the 23 simulators.
Technical information about 23 NGS genomic simulators
Genomic variants
Finally, here is the article's online summary:
You are welcome to follow my WeChat official account "生信family".