Next-Gen Sequencing

Several processes are available for generating or binning counts, inferring gene structure, and creating and importing variant call format (VCF) files from next-generation sequencing data.

Count SAS Data Generation

The first three processes import a set of files and generate count data, which is combined into SAS data sets containing chromosome, location, and sequence identity with respect to a reference sequence.

Process | Input file format | Input file extension
SAM Input Engine | Sequence Alignment Map (SAM) | .sam
BAM Input Engine | Compressed Binary Sequence Alignment Map (BAM) | .bam
Eland Input Engine | Eland | .txt
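To make "count data" concrete, the sketch below tallies mapped reads by chromosome and start position from SAM text. It is an illustration only, not the SAM Input Engine's implementation: the function name count_read_starts and the example reads are hypothetical, and it relies only on the standard SAM layout (tab-delimited fields, FLAG in column 2, reference name in column 3, 1-based position in column 4, and flag bit 0x4 marking an unmapped read). It does not compute sequence identity.

```python
from collections import Counter


def count_read_starts(sam_lines):
    """Count mapped reads by (reference name, 1-based leftmost position).

    Hypothetical helper for illustration; not part of any SAS or JMP process.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        # Flag bit 0x4 means the read is unmapped, so it has no position.
        if flag & 0x4:
            continue
        counts[(rname, pos)] += 1
    return counts


# Made-up example input: one header line, two mapped reads, one unmapped read.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

for (chrom, pos), n in sorted(count_read_starts(example).items()):
    print(chrom, pos, n)  # prints: chr1 100 2
```

Each printed row corresponds to one row of a tall count table keyed by chromosome and location.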

Binning and Summarization

The following two processes are used for additional condensation and summarization of next-generation sequencing data.

Process | Choose this process for...
Bin Intensities or Read Counts | Binning intensities or read counts stored in rows of a tall SAS data set. Tip: This can be useful to reduce the number of rows in a large data set in preparation for downstream plotting and modeling.
Gene Model Summary | Summarizing position-level intensity data into exon and intron bins as defined by an isoform definition file in UCSC format. Tip: Output from a process such as SAM Input Engine can be used as input for this process.
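The idea behind binning a tall count table can be sketched as follows. This is a minimal illustration that assumes fixed-width, non-overlapping bins over 1-based positions; the Bin Intensities or Read Counts process may define bins differently. The function name bin_counts, the row layout (chromosome, position, count), and the sample rows are all hypothetical.

```python
from collections import defaultdict


def bin_counts(rows, bin_width):
    """Sum counts into fixed-width bins per chromosome.

    rows: iterable of (chromosome, 1-based position, count) tuples.
    Returns sorted (chromosome, bin_start, bin_end, total) tuples.
    Hypothetical helper; fixed-width binning is an assumption.
    """
    totals = defaultdict(int)
    for chrom, pos, count in rows:
        # Positions 1..bin_width fall in the bin starting at 1, and so on.
        bin_start = ((pos - 1) // bin_width) * bin_width + 1
        totals[(chrom, bin_start)] += count
    return [
        (chrom, start, start + bin_width - 1, total)
        for (chrom, start), total in sorted(totals.items())
    ]


# Made-up tall data: one row per (chromosome, position).
tall = [("chr1", 1, 2), ("chr1", 50, 1), ("chr1", 101, 4), ("chr2", 7, 3)]
binned = bin_counts(tall, 100)
for row in binned:
    print(row)
```

Here four input rows collapse to three binned rows, which is the row reduction the tip describes for large data sets.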

VCF File and SAS Data Set Generation from Other Sources

The remaining processes generate VCF files or SAS data sets. Most of them handle single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), also known as deletion-insertion polymorphisms (DIPs).

Process | Choose this process for...
Call Variants with SAMtools | Generating variant call format (VCF) files from SNPs/INDELs called (using SAMtools/BCFtools) from BAM files
CLC Bio Input Engine | Importing CLC bio SNP or DIP Detection Table .csv files into SAS data set(s)
Complete Genomics Input Engine | Importing Complete Genomics files into SAS data set(s)
VCF Input Engine | Importing variant call format (VCF) files into SAS data set(s)
Import Feature-Barcode Matrices | Importing 10x Genomics Single-Cell RNA Sequencing data to SAS data sets
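As a rough picture of what importing a VCF file into a table involves, the sketch below reads the first five fixed VCF columns (CHROM, POS, ID, REF, ALT) and labels each alternate allele as a SNP or an INDEL. It is not the VCF Input Engine's implementation. The function name parse_vcf_records, the output field names, and the length-based classification rule are assumptions made for illustration, and the sample lines are invented.

```python
def parse_vcf_records(lines):
    """Turn VCF data lines into one dict per alternate allele.

    Hypothetical helper: the SNP/INDEL rule here is a simple length
    comparison, and anything else (symbolic alleles, missing ALT,
    multi-base substitutions) is labeled OTHER.
    """
    records = []
    for line in lines:
        # Skip blank lines, '##' meta lines, and the '#CHROM' header line.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        # ALT may list several comma-separated alleles.
        for allele in alt.split(","):
            if allele == "." or allele.startswith("<"):
                kind = "OTHER"
            elif len(ref) == 1 and len(allele) == 1:
                kind = "SNP"
            elif len(ref) != len(allele):
                kind = "INDEL"
            else:
                kind = "OTHER"
            records.append(
                {"chrom": chrom, "pos": int(pos), "id": vid,
                 "ref": ref, "alt": allele, "type": kind}
            )
    return records


# Made-up VCF content for illustration.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\trs1\tAT\tA,ATT\t40\tPASS\t.",
]

for rec in parse_vcf_records(vcf_lines):
    print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"], rec["type"])
```

Splitting multi-allelic sites into one row per alternate allele is a design choice of this sketch; it keeps each output row to a single REF/ALT pair.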

See Import for other subcategories.