Use of In silico Approaches for Next Generation Sequencing Bioinformatic Pipeline Validation

Publication Date: October 13, 2022
Last Updated: October 24, 2022

General Recommendations

The laboratory may use in silico data files to supplement NGS analytical validation, particularly to assess analytical sensitivity or false-negative rates for specific variants; however, in silico data files cannot supplant the use of physical samples (eg, patient samples).

The laboratory should understand the functional limitations of the type(s) of in silico data being utilized.

The laboratory should understand the limitations of most in silico data for assessing performance in particular genome contexts and variant types susceptible to systematic sequencing errors (eg, homopolymers and tandem repeats) and mapping errors (eg, genes with pseudogenes).

The laboratory may use in silico samples for testing required for minor updates to clinical bioinformatics software pipelines.

Commercial vendors and internal pipeline developers should include options in their analysis pipelines to facilitate easier in silico data file import and analysis by clinical laboratories.
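
As a rough illustration of the first recommendation, the sketch below shows one way a laboratory might score a pipeline run against an in silico truth set to estimate analytical sensitivity and the false-negative rate. The `Variant` tuple and the `sensitivity_report` function are hypothetical names introduced here and are not part of the recommendations or of any particular pipeline. Exact matching on chromosome, position, and alleles is a simplifying assumption; a real comparison would usually need variant normalization first.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record used only for this illustration."""
    chrom: str
    pos: int
    ref: str
    alt: str


def sensitivity_report(truth: Iterable[Variant], called: Iterable[Variant]) -> dict:
    """Compare pipeline calls with the variants known to be in the in silico data.

    Assumes both inputs use the same (already normalized) representation,
    so exact tuple equality is enough to decide whether a variant was found.
    """
    truth_set = set(truth)
    called_set = set(called)
    if not truth_set:
        raise ValueError("truth set is empty; sensitivity is undefined")

    found = truth_set & called_set    # inserted variants the pipeline reported
    missed = truth_set - called_set   # inserted variants the pipeline did not report

    return {
        "true_positives": len(found),
        "false_negatives": len(missed),
        "sensitivity": len(found) / len(truth_set),
        "false_negative_rate": len(missed) / len(truth_set),
        "missed": sorted(missed),
    }
```

Calls that are not in the truth set are ignored here on purpose: the sketch addresses sensitivity only, and it does not replace the testing with physical samples that the recommendations require.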

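Authoring Organizations

Association for Molecular Pathology, Association for Pathology Informatics, College of American Pathologists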

Publication Month/Year

October 13, 2022

Last Updated Month/Year

May 23, 2023

Document Objectives

In silico approaches for next generation sequencing (NGS) data modeling have utility in the clinical laboratory as a tool for clinical assay validation. In silico NGS data can take a variety of forms, including purely simulated data and manipulated data files in which variants are inserted into existing data files. In silico data enables simulation of a range of variants that may be difficult to obtain from a single physical sample. Such data allows laboratories to test the performance of clinical bioinformatics pipelines more accurately without sequencing additional cases. For example, clinical laboratories may use in silico data to simulate variants at low variant allele fraction to test the analytical sensitivity of variant calling software, or to simulate a range of insertion/deletion sizes to determine the performance of indel calling software. In this manuscript, the Working Group reviews the different types of in silico data with their strengths and limitations, the methods used to generate in silico data, and how such data can be used in the clinical molecular diagnostic laboratory. Survey data indicates how in silico NGS data is currently being used. Finally, the manuscript presents potential applications for which in silico data may become useful in the future.
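
The low variant allele fraction example above can be sketched numerically. The code below is a minimal toy model and not a method taken from the manuscript: it assumes reads are sampled independently, ignores sequencing and mapping error, and stands in for a variant caller with a hypothetical rule that reports a variant when at least `min_alt_reads` reads support it. The function names and parameters are likewise assumptions made for illustration. It works on read counts only and does not generate or modify actual sequence data files.

```python
import math
import random


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Exact P(alt reads >= min_alt_reads) when alt reads ~ Binomial(depth, vaf)."""
    p_below = sum(
        math.comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


def simulate_detection_rate(
    depth: int,
    vaf: float,
    min_alt_reads: int,
    n_sites: int = 2000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the same quantity across n_sites simulated variant sites."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    detected = 0
    for _ in range(n_sites):
        # Each read independently carries the variant allele with probability vaf.
        alt_reads = sum(rng.random() < vaf for _ in range(depth))
        if alt_reads >= min_alt_reads:
            detected += 1
    return detected / n_sites


if __name__ == "__main__":
    # Sweep a few allele fractions at a fixed depth and toy caller threshold.
    for vaf in (0.01, 0.02, 0.05, 0.10):
        exact = detection_probability(depth=200, vaf=vaf, min_alt_reads=5)
        simulated = simulate_detection_rate(depth=200, vaf=vaf, min_alt_reads=5)
        print(f"VAF {vaf:.2f}: exact {exact:.3f}, simulated {simulated:.3f}")
```

A sweep like this only suggests where sensitivity would be expected to fall off under idealized sampling. Actual performance depends on the sequencing chemistry, the genome context, and the variant caller, which is why the recommendations treat in silico data as a supplement to physical samples.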

Inclusion Criteria

Male, Female, Adult, Older adult

Health Care Settings

Ambulatory, Laboratory services, Outpatient

Intended Users

Nurse, nurse practitioner, physician, physician assistant


Counseling, Assessment and screening, Management

Diseases/Conditions (MeSH)

D017422 - Sequence Analysis, DNA


Keywords

Sequencing, Next-Generation Sequencing, pipeline validation

Source Citation

Duncavage EJ, Coleman JF, de Baca ME, Kadri S, Leon A, Routbort M, Roy S, Suarez CJ, Vanderbilt C, Zook JM. Recommendations for the Use of In silico Approaches for Next Generation Sequencing Bioinformatic Pipeline Validation: A Joint Report of the Association for Molecular Pathology, Association for Pathology Informatics, and College of American Pathologists. J Mol Diagn. 2022 Oct 13:S1525-1578(22)00287-2. doi: 10.1016/j.jmoldx.2022.09.007. Epub ahead of print. PMID: 36244574.

Supplemental Methodology Resources

Data Supplement