Next-Gen Sequencing
Several processes are available for generating or binning counts, inferring gene structure, and creating and importing variant call format (VCF) files from next-generation sequencing data.
Count SAS Data Generation
The first three processes import a set of files and generate count data, which is combined into SAS data sets containing chromosome, location, and sequence identity with respect to a reference sequence.
Binning and Summarization
The following two processes are used for additional condensation and summarization of next-generation sequencing data.
Binning intensities or read counts stored in rows of a tall SAS data set
Tip: Binning can reduce the number of rows in a large data set in preparation for downstream plotting and modeling.
Summarizing position-level intensity data into exon and intron bins as defined by an isoform definition file in UCSC format
Tip: Output from a process such as SAM Input Engine can be used as input for this process.
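The idea behind binning a tall table of read counts can be sketched in a few lines of Python. This is only an illustration of the general technique, not the implementation used by the binning process: the function name `bin_counts`, the row layout (chromosome, position, count), and the fixed-width binning scheme are all assumptions made for the example.

```python
from collections import defaultdict


def bin_counts(rows, bin_size):
    """Sum read counts into fixed-width position bins, per chromosome.

    rows: iterable of (chromosome, position, count) tuples (hypothetical layout).
    bin_size: width of each bin in base pairs (assumed fixed-width bins).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    totals = defaultdict(int)
    for chrom, pos, count in rows:
        # Integer division maps every position to the start of its bin.
        bin_start = (pos // bin_size) * bin_size
        totals[(chrom, bin_start)] += count
    # One output row per (chromosome, bin), sorted for stable output.
    return [(chrom, start, total)
            for (chrom, start), total in sorted(totals.items())]


# Four position-level rows collapse into three binned rows.
rows = [("chr1", 5, 2), ("chr1", 95, 3), ("chr1", 105, 1), ("chr2", 10, 4)]
binned = bin_counts(rows, 100)
```

With a bin width of 100, the two `chr1` rows at positions 5 and 95 fall into the same bin and their counts are summed, which is how binning shrinks the row count before plotting or modeling.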
VCF File and SAS Data Set Generation from Other Sources
The remaining processes focus on the detection of single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), also known as deletion-insertion polymorphisms (DIPs), and generate VCF files or SAS data sets.
Generating variant call format (VCF) files from SNPs and INDELs called from BAM files using SAMtools/BCFtools
Importing CLC bio SNP or DIP Detection Table .csv files into SAS data set(s)
Importing variant call format (VCF) files into SAS data set(s)
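To make the VCF import step more concrete, the sketch below reads the fixed leading columns of VCF data lines (CHROM, POS, ID, REF, ALT) into tabular records, one row per alternate allele. It is a minimal illustration rather than the logic of any of the processes above: the function name `parse_vcf_lines` and the record layout are hypothetical, and the SNP/INDEL classification is a deliberate simplification based only on allele length, so it does not handle symbolic alleles or multi-nucleotide substitutions correctly.

```python
def parse_vcf_lines(lines):
    """Turn VCF text lines into a list of simple variant records.

    Only the first five tab-delimited columns are read. Header lines
    (starting with '#') and malformed lines are skipped.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information, and the column header
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a complete data line
        chrom, pos, variant_id, ref, alt = fields[:5]
        # A data line may list several alternate alleles separated by commas.
        for allele in alt.split(","):
            # Simplified rule (assumption): single-base ref and alt is a SNP,
            # anything else is treated as an INDEL.
            kind = "SNP" if len(ref) == 1 and len(allele) == 1 else "INDEL"
            records.append({
                "chrom": chrom,
                "pos": int(pos),
                "id": variant_id,
                "ref": ref,
                "alt": allele,
                "type": kind,
            })
    return records


example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t60\tPASS\t.",
]
variants = parse_vcf_lines(example_lines)
```

Each resulting record corresponds to one row of a tall table, which is the general shape an imported variant data set would take.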
