Analyzing copy number variation using SNP array data: protocols for calling CNV and association tests

High-density SNP genotyping technology provides a low-cost, effective tool for conducting genome-wide association (GWA) studies. The wide adoption of GWA studies has led to the discovery of disease- or trait-associated SNPs, some of which were subsequently shown to be causal. However, a nearly universal shortcoming of GWA studies, missing heritability, has prompted great interest in searching for other types of genetic variation, such as copy number variation (CNV). Certain CNVs have been reported to alter disease susceptibility. Algorithms and tools have been developed to identify CNVs using SNP array hybridization intensity data. Such an approach provides an additional source of data at almost no extra cost. In this unit, we demonstrate the steps for calling CNVs from Illumina SNP array data using PennCNV and performing association analysis using R and PLINK.