Self-reported ethnicity, genetic structure and the impact of population stratification in a multiethnic study

  • Original Investigation
  • Human Genetics

Abstract

It is well known that population substructure may lead to confounding in case–control association studies. Here, we examined genetic structure in a large racially and ethnically diverse sample consisting of five ethnic groups of the Multiethnic Cohort study (African Americans, Japanese Americans, Latinos, European Americans and Native Hawaiians) using 2,509 SNPs distributed across the genome. Principal component analysis on 6,213 study participants, 18 Native Americans and 11 HapMap III populations revealed four important principal components (PCs): the first two separated Asians, Europeans and Africans, and the third and fourth corresponded to Native American and Native Hawaiian (Polynesian) ancestry, respectively. Individual ethnic composition derived from self-reported parental information matched well to genetic ancestry for Japanese and European Americans. STRUCTURE-estimated individual ancestral proportions for African Americans and Latinos are consistent with previous reports. We quantified the East Asian (mean 27%), European (mean 27%) and Polynesian (mean 46%) ancestral proportions for the first time, to our knowledge, for Native Hawaiians. Simulations based on realistic settings of case–control studies nested in the Multiethnic Cohort found that the effect of population stratification was modest and readily corrected by adjusting for race/ethnicity or by adjusting for top PCs derived from all SNPs or from ancestry informative markers; the power of these approaches was similar when averaged across causal variants simulated based on allele frequencies of the 2,509 genotyped markers. The bias may be large in case-only analysis of gene-by-gene interactions, but it can be corrected by top PCs derived from all SNPs.
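To make the confounding described in the abstract concrete, the following is a minimal sketch and not the authors' simulation. It assumes two hypothetical populations with invented counts (none of the numbers come from the Multiethnic Cohort), in which a variant has no effect on disease within either population but differs in frequency between them. The Mantel-Haenszel estimator is used here only as a simple stand-in for "adjusting for race/ethnicity"; the names `Table`, `odds_ratio`, `pool` and `mantel_haenszel_or` are illustrative.

```python
from typing import List, NamedTuple


class Table(NamedTuple):
    """2x2 case-control table for one population (all counts are invented)."""
    case_carrier: int
    case_noncarrier: int
    control_carrier: int
    control_noncarrier: int


def odds_ratio(t: Table) -> float:
    """Odds ratio of carrier status for cases versus controls."""
    return (t.case_carrier * t.control_noncarrier) / (
        t.case_noncarrier * t.control_carrier
    )


def pool(tables: List[Table]) -> Table:
    """Collapse strata into one table, ignoring population labels."""
    return Table(*(sum(col) for col in zip(*tables)))


def mantel_haenszel_or(tables: List[Table]) -> float:
    """Mantel-Haenszel odds ratio, i.e. an estimate adjusted for stratum."""
    num = sum(t.case_carrier * t.control_noncarrier / sum(t) for t in tables)
    den = sum(t.case_noncarrier * t.control_carrier / sum(t) for t in tables)
    return num / den


# Hypothetical population A: 80% carriers, high disease risk.
# Hypothetical population B: 20% carriers, low disease risk.
# Within each population, carrier status is independent of disease.
strata = [
    Table(case_carrier=320, case_noncarrier=80,
          control_carrier=80, control_noncarrier=20),
    Table(case_carrier=20, case_noncarrier=80,
          control_carrier=80, control_noncarrier=320),
]

crude = odds_ratio(pool(strata))       # inflated by stratification
adjusted = mantel_haenszel_or(strata)  # close to the true null value of 1

if __name__ == "__main__":
    print(f"crude OR = {crude:.3f}, stratum-adjusted OR = {adjusted:.3f}")
```

In this toy setting the pooled analysis suggests a strong association even though none exists within either population, and the stratified estimate removes it. Adjusting for top PCs plays the same role when ancestry varies continuously, as in admixed groups, rather than falling into discrete labels.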

Acknowledgments

We thank the researchers and participants of the Multiethnic Cohort study. We also thank Dr. David Van Den Berg, Loreall Pooler and Xin Sheng for their technical assistance in genotyping as well as Dr. Gary Chen and Christian Caberto for help in genotype pre-processing and data management. This work was supported by grants from the National Institutes of Health (R37CA54281, P01CA33619, R01CA63464, and U01CA98758).

Conflict of interest statement

The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to Hansong Wang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 344 kb)

About this article

Cite this article

Wang, H., Haiman, C.A., Kolonel, L.N. et al. Self-reported ethnicity, genetic structure and the impact of population stratification in a multiethnic study. Hum Genet 128, 165–177 (2010). https://doi.org/10.1007/s00439-010-0841-4

  • DOI: https://doi.org/10.1007/s00439-010-0841-4
