
Genetics and archaeogenetics of South Asia


Title: Genetics and archaeogenetics of South Asia  
Author: World Heritage Encyclopedia
Language: English
Subject: South Asia, Genetics, Human evolutionary genetics, Genetic history of Europe
Publisher: World Heritage Encyclopedia

Genetics and archaeogenetics of South Asia

The study of the genetics and archaeogenetics of the ethnic groups of South Asia aims at uncovering these groups' genetic history. The geographic position of India makes Indian populations important for the study of the early dispersal of all human populations on the Eurasian continent.

Studies based on mtDNA variation have reported genetic unity across various Indian sub-populations.[1][2][3][4] Conclusions of studies based on Y-chromosome variation and autosomal DNA variation have been more varied, although many researchers argue that most of the ancestral nodes of the phylogenetic tree of all the mtDNA types originated in the subcontinent. Recent genome studies appear to show that most South Asians are descendants of two major ancestral components, one restricted to South Asia and the other shared with Central Asia, West Asia and Europe.[5][6]

The ancestral node of the phylogenetic tree of all the mtDNA types typically found in Central Asia, the Middle East and Europe is also found in South Asia at relatively high frequencies. The divergence of this common ancestral node is estimated to have occurred slightly less than 50,000 years ago.[7] In India the major maternal lineages, or mitochondrial DNA haplogroups, are M, R and U, whose coalescence times have been estimated at approximately 50,000 BP.[7]

The major paternal lineages represented by Y chromosomes are haplogroups R1a1, R2, H, L and J2.[8] Many researchers have argued that Y-DNA Haplogroup R1a1 (M17) is of autochthonous Indian origin.[9][10] However, proposals for a Central Asian origin for R1a1 are also quite common.[11][12]


All the mtDNA and Y-chromosome lineages outside Africa descend from three founder lineages:

  • M, N and R haplogroups for mtDNA and
  • C, D and F haplogroups for the Y-chromosome.

All six of these founder haplogroups can be found in the present-day populations of South Asia. Moreover, the mtDNA haplogroup M and the Y-chromosome haplogroups C and D are restricted to South Asia and the area east of it. All the West Eurasian populations derive from the N and R haplogroups of mtDNA and the F haplogroup of the Y-chromosome.[13]

Endicott et al. state that these facts are consistent with the hypothesis of a single exodus from East Africa 65,000 years ago via a southern coastal route, with the West Eurasian lineages separating from the South Asian lineages somewhere between East/Northeast Africa and South Asia.[14]


Hypothesized map of human migration into the Indian subcontinent based on mitochondrial DNA and possible dispersal routes.

The most frequent mtDNA haplogroups in the Indian subcontinent are M, R and U (where U is a descendant of R).[8]

Arguing for the longer term "rival Y-Chromosome model",[9] Stephen Oppenheimer believes that it is highly suggestive that India is the origin of the Eurasian mtDNA haplogroups which he calls the "Eurasian Eves". According to Oppenheimer it is highly probable that nearly all human maternal lineages in Central Asia, the Middle East and Europe descended from only four mtDNA lines that originated in South Asia 50,000-100,000 years ago.[15]

Macrohaplogroup M

The macrohaplogroup M, which is considered a cluster of the proto-Asian maternal lineages,[7] represents more than 60% of Indian mtDNA.[16]

The M macrohaplogroup in India includes many subgroups that differ profoundly from other sublineages in East Asia, especially those of Mongoloid populations.[7] The deep roots of the M phylogeny point to the antiquity of the Indian lineages compared to other M sublineages (in East Asia and elsewhere), suggesting an in-situ origin of these sub-haplogroups in South Asia, most likely in India. These deep-rooting lineages are not language-specific and are spread across all the language groups in India.[16]

Virtually all modern Central Asian mtDNA M lineages seem to belong to the Eastern Eurasian (Mongolian) rather than the Indian subtypes of haplogroup M, which indicates that no large-scale migration from the present Turkic-speaking populations of Central Asia to India occurred. The absence of haplogroup M in Europeans, compared to its equally high frequency among Indians, eastern Asians and some Central Asian populations, contrasts with the Western Eurasian leanings of South Asian paternal lineages.[7]

Most of the extant mtDNA boundaries in South and Southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans.[17]

Haplogroup | Important subclades | Populations
M2 | M2a, M2b | Throughout the subcontinent except the Northwest. Peaks in Bangladesh, Andhra Pradesh, coastal Tamil Nadu and Sri Lanka
M3 | M3a | All of the subcontinent except the Northeast. 20% in Rajasthan and Madhya Pradesh; also very dense in Maharashtra, Uttar Pradesh, Haryana, Gujarat, Karnataka
M4 | M4a | Peaks in Pakistan and Kashmir
M6 | M6a, M6b | Kashmir and near the coasts of the Bay of Bengal, Sri Lanka
M18 | - | Throughout the subcontinent. Peaks in Rajasthan and Andhra Pradesh
M25 | - | Widespread in most of India (but rare outside it). Western Maharashtra and Kerala, Punjab

Macrohaplogroup R

The spatial distribution of M, R and U mtDNA haplogroups and their sub-haplogroups in South Asia.

The macrohaplogroup R (a very large and old subdivision of macrohaplogroup N) is also widely represented and accounts for the other 40% of Indian mtDNA. Its oldest and most important subdivision is haplogroup U, which, while also present in West Eurasia, has several subclades specific to South Asia.

Most important South Asian haplogroups within R:[17]

Haplogroup | Populations
R2 | Distributed widely across the subcontinent
R5 | Widely distributed across most of India. Peaks in coastal SW India
R6 | Widespread at low rates across India. Peaks among Tamils and Kashmiris
W | Found in Pakistan, Kashmir and Punjab. It is rare further east and not found in the rest of India

Haplogroup U

Haplogroup U is a sub-haplogroup of macrohaplogroup R.[17] The distribution of haplogroup U is a mirror image of that for haplogroup M: the former has not been described so far among eastern Asians but is frequent in European populations as well as among Indians.[18] Indian U lineages differ substantially from those in Europe and their coalescence to a common ancestor also dates back to about 50,000 years.[1]

Haplogroup | Populations
U2* | A parahaplogroup, sparsely distributed, especially in the northern half of the subcontinent. It is also found in SW Arabia.
U2a | Shows relatively high density in Pakistan and NW India but also in Karnataka, where it reaches its highest density.
U2b | Has its highest concentration in Uttar Pradesh but is also found in many other places, especially Kerala and Sri Lanka. It is also found in Oman.
U2c | Especially important in Bangladesh and West Bengal.
U2i | Perhaps the most numerically important of the U subclades in South Asia, reaching especially high concentrations (over 10%) in Uttar Pradesh, Sri Lanka, Sindh and parts of Karnataka. It also has some importance in Oman. mtDNA haplogroup U2i is dubbed "Western Eurasian" in the Bamshad et al. study but "Eastern Eurasian (mostly India specific)" in the Kivisild et al. study.
U7 | Has a significant presence in Gujarat, Punjab and Pakistan. The possible homeland of this haplogroup spans Indian Gujarat (highest frequency, 12%) and Iran, because from there its frequency declines steeply both to the east and to the west.

Y chromosome

The divergence of Haplogroup F and its descendants.

The major Y chromosome DNA haplogroups in the subcontinent are Haplogroup F's descendant haplogroups R (mostly R2a, R2 and R1a1), L, H and J (mostly J2).[8]

The South Asian Y-chromosomal gene pool is characterized by five major lineages: R1a, R2, H, L and J2. The geographical origins proposed for them in the studies below are as follows:

Study | H | J2 | L | R1a | R2
Basu et al. (2003) | no comment | no comment | no comment | Central Asia | no comment
Kivisild et al. (2003) | India | Western Asia | India | Southern and Western Asia | South-Central Asia
Cordaux et al. (2004) | India | West or Central Asia | Middle Eastern | Central Asia | South-Central Asia
Sengupta et al. (2006) | India | The Middle East and Central Asia | South India | North India | North India
Thanseem et al. (2006) | India | The Levant | The Middle East | Southern and Central Asia | Southern and Central Asia
Sahoo et al. (2006) | South Asia | The Near East | South Asia | South or West Asia | South Asia
Mirabal et al. (2009) | no comment | no comment | no comment | Northwestern India or Central Asia | no comment
Zhao et al. (2009) | India | The Middle East | The Middle East | Central Asia or West Eurasia | Central Asia or West Eurasia
Sharma et al. (2009) | no comment | no comment | no comment | South Asia | no comment
Thangaraj et al. (2010) | South Asia | The Near East | The Near East | South Asia | South Asia

Haplogroup L


Haplogroup L shows a time of expansion dating to the Neolithic.[19] The clade is present in the Indian population at an overall frequency of ca. 7-15%.[9][11][20][21] Haplogroup L is quite rare among tribal groups (ca. 5.6-7%).[9][11][21]

Earlier reports (e.g. Wells et al. 2001) of a very high frequency (approaching 50%) of Haplogroup L in South India appear to have been due to extrapolation from data obtained from a sample of 84 Yadavas and Kallars, a Tamil-speaking caste of Tamil Nadu, among whom 40 (approx. 48%) displayed the M20 mutation that defines Haplogroup L.
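
The arithmetic behind this caution can be sketched in a few lines. The example below recomputes the observed frequency from the counts given above (40 of 84) and attaches an approximate 95% Wilson score interval, a standard way to express the uncertainty of a proportion from a small sample. The function name `wilson_interval` and the choice of interval are illustrative assumptions, not something taken from the cited studies.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (low, high) Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% interval (an assumption chosen
    for illustration).
    """
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Counts taken from the text: 40 M20 carriers in a sample of 84 men.
carriers, sample_size = 40, 84
frequency = carriers / sample_size
low, high = wilson_interval(carriers, sample_size)

print(f"observed frequency: {frequency:.1%}")
print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

The interval describes only the sampled groups. It says nothing about South India as a whole, which is why extrapolating the figure to the entire region is problematic.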


Haplogroup L3 (M357) is found frequently among Burusho (approx. 12%[22]) and Pashtuns (approx. 7%[22]), with a moderate distribution among the general Pakistani population (approx. 2%[22]). Its highest frequency is found in south-western Balochistan province, along the Makran coast (28%) to the Indus River delta.

L3a (PK3) is found in approximately 23% of Nuristani in northwest Pakistan.[22]

Haplogroup H

Haplogroup H (Y-DNA) is found at a high frequency in South Asia. H is rarely found outside South Asia but is common among the Romanis, particularly the H-M82 subgroup. Haplogroup H is frequently found among populations of India, Sri Lanka, Nepal, Pakistan and the Maldives. All three branches of Haplogroup H (Y-DNA) are found in the Indian subcontinent.

It is a branch of [23] Haplogroup H is by no means restricted to specific populations. For example, H is possessed by about 28.8% of Indo-Aryan castes[9][21] and by about 25-35% of tribals.[11][21]

Haplogroup R2

In South Asia, the frequency of the R2 lineage is around 10-15% in India and Sri Lanka and 7-8% in Pakistan. At least 90% of R-M124 individuals are located in the Indian subcontinent.[24] It is also reported in the Caucasus and Central Asia at lower frequencies.


Significantly high percentages are shown by the people of Southern India at 26%, West Bengal at 23%, Hindus from New Delhi at 20% and Baniya from Bihar at 36%. Among tribal groups, the Lodhas of West Bengal show it at 43% and the Bhil of Gujarat at 18%, while the Chenchu and Pallan of South India show it at 20% and 14% respectively. The Tharu of North India show it at 17%.[4]

It is also significantly high in many Brahmin groups including Punjabi Brahmins (25%), Bengali Brahmins (22%), Bengali Kayasthas (21%), Konkanastha Brahmins (20%), Chaturvedis (32%), Bhargavas (32%), Kashmiri Pandits (14%) and Lingayat Brahmins (30%).[4]

North Indian Muslims have a frequency of 11% (Sunni) and 9% (Shia), while Dawoodi Bohra Muslims in the western state of Gujarat have a frequency of 16% and Mappla Muslims of South India have a frequency of 5%.[25] This lineage also forms 5% of Punjabi males.


The R2 haplogroup is found in 14% of the Burusho people.[22] Among the Hunza it is found at 18% while the Parsis show it at 20%.

Sri Lanka

39% of the Sinhalese of Sri Lanka are found to have R2.


13% of the Maldivian people of Maldives are found to have R2.[26]


In Nepal, R2 percentages range from 2% to 26% within different groups under various studies. Newars show a significantly high frequency of 26% while people of Kathmandu show it at 10%.

Haplogroup R1a1

In South Asia R1a1 has been observed often with high frequency in a number of demographic groups.[10][27] Its parent clade Haplogroup R1a is believed to have its origins in the Indus Valley or the Eurasian Steppe,[28] whereas its successor clade R1a1 has the highest frequency and time depth in South Asia, making it a possible locus of origin.[29][30][31] However, the uneven distribution of this haplogroup among South Asian castes and tribal populations makes a Central Eurasian origin of this lineage a strong possibility as well.[11][12]


In India, a high percentage of this haplogroup is observed in West Bengal Brahmins (72%)[27] to the east, Konkanastha Brahmins (48%)[27] to the west, Khatris (67%)[29] in the north and Iyengar Brahmins (31%)[27] in the south. It has also been found in several South Indian Dravidian-speaking tribals including the Chenchu (26%)[32] and Valmikis of Andhra Pradesh, as well as the Yadav and Kallar of Tamil Nadu, suggesting that M17 is widespread in these southern Indian tribes.[32]

Besides these, studies show high percentages in geographically distant groups in India such as Manipuris (50%)[29] in the extreme North East and in Punjab (47%)[32] to the extreme North West.


In Pakistan it is found at 71% among the Mohanna of Sindh Province to the south and 46% among the Baltis of Gilgit-Baltistan to the north.[29]

Sri Lanka

In Sri Lanka, 13% of the Sinhalese people were found to be R1a1a (M17) positive.[32]


In Maldives, 24% of the Maldivian people were found to be R1a1a (M17) positive.[26]


People in the Terai region of Nepal show R1a1a at 69%.[33]

Haplogroup J2

Haplogroup J2 has been present in the subcontinent since the Neolithic period.[19] J2 is almost absent from tribals, but occurs among some Austro-Asiatic tribals (11%). The frequency of J2 is higher in South Indian castes (19%) than in North Indian castes (11%) or Pakistan (12%).[9] J2 appears at 20% among the Yadavas of South India, while among the Lodhas of West Bengal it is 32%. In the Maldives, 22% of the Maldivian population were found to be haplogroup J2 positive.[34]

Reconstructing Indian population history

The Indian Genome Variation Consortium (2008) divides the population of the subcontinent into four morphological types (Caucasoids, Mongoloids, Australoids, and Negritos, the last largely in the Andaman Islands) and four linguistic groups (Indo-European, Dravidian, Tibeto-Burman and Austro-Asiatic).[35] Molecular anthropology studies use three different types of markers: mitochondrial DNA (mtDNA) variation, which is maternally inherited and highly polymorphic; Y-chromosome variation, which involves uniparental transmission along the male line; and autosomal DNA variation.[4]:04

mtDNA variation

Most of the studies based on mtDNA variation have reported genetic unity of Indian populations across language, caste and tribal groups.[1][2][3] It is likely that haplogroup M was brought to Asia from East Africa along the southern route by the earliest migration wave 60,000 years ago.[1]

According to Kivisild et al. (1999), "Minor overlaps with lineages described in other Eurasian populations clearly demonstrate that recent immigrations have had very little impact on the innate structure of the maternal gene pool of Indians. Despite the variations found within India, these populations stem from a limited number of founder lineages. These lineages were most likely introduced to the Indian subcontinent during the Middle Palaeolithic, before the peopling of Europe and perhaps the Old World in general."[1] Basu et al. (2003) also emphasizes underlying unity of female lineages in India.[20]

Y Chromosome variation

Conclusions based on Y-chromosome variation have been more varied than those based on mtDNA variation. While Kivisild et al. (2003) propose an ancient and shared genetic heritage of male lineages in India, Bamshad et al. (2001) suggest an affinity between Indian male lineages and west Eurasians proportionate to caste rank, and place caste populations of southern Indian states closer to East Europeans.[36]

Basu et al. (2003) conclude that Austro-Asiatic tribal populations entered India first through the Northwest corridor and that some of them entered much later through the Northeastern corridor.[20] Kumar et al. (2007), by contrast, analyzed 25 Indian Austro-Asiatic tribes and found a strong paternal genetic link among the sub-linguistic groups of the Indian Austro-Asiatic populations.[37] Mukherjee et al. (2001) place North Indians between west Asian and Central Asian populations,[38] whereas Cordaux et al. (2004) argue that the Indian caste populations are closer to Central Asian populations.[21] Sahoo et al. (2006) and Sengupta et al. (2006) suggest that Indian caste populations have not been subject to any recent admixture.[9][10] Sahoo et al. conclude their study with:[10]

It is not necessary, based on the current evidence, to look beyond South Asia for the origins of the paternal heritage of the majority of Indians at the time of the onset of settled agriculture. The perennial concept of people, language, and agriculture arriving to India together through the northwest corridor does not hold up to close scrutiny. Recent claims for a linkage of haplogroups J2, L, R1a, and R2 with a contemporaneous origin for the majority of the Indian castes’ paternal lineages from outside the subcontinent are rejected, although our findings do support a local origin of haplogroups F* and H. Of the others, only J2 indicates an unambiguous recent external contribution, from West Asia rather than Central Asia. The current distributions of haplogroup frequencies are, with the exception of the lineages, predominantly driven by geographical, rather than cultural determinants. Ironically, it is in the northeast of India, among the TB groups that there is clear-cut evidence for large-scale demic diffusion traceable by genes, culture, and language, but apparently not by agriculture.

Autosomal DNA variation

Results of studies based upon autosomal DNA variation have also been varied. In a major study (2009) using over 500,000 biallelic autosomal markers, Reich hypothesized that the modern Indian population was the result of admixture between two genetically divergent ancestral populations dating from the post-Holocene era. These two "reconstructed" ancient populations he termed "Ancestral South Indians" (ASI) and "Ancestral North Indians" (ANI). According to Reich: "ANI ancestry is significantly higher in Indo-European than Dravidian speakers, suggesting that the ancestral ASI may have spoken a Dravidian language before mixing with the ANI."[39]

Further building on Reich et al.'s characterization of the South Asian population as historically based on admixture of ANI (Ancestral North Indian) and ASI (Ancestral South Indian) populations, a 2011 session paper by Moorjani et al. states that a "major ANI-ASI mixture occurred in the ancestors of both northern and southern Indians 1,200-3,500 years ago, overlapping the time when Indo-European languages first began to be spoken in the subcontinent."[40]
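
The idea of a population formed by mixing two ancestral groups can be illustrated with a simple linear model: if a fraction alpha of ancestry comes from one source and the rest from the other, the expected allele frequency at each marker is the correspondingly weighted average of the two source frequencies. The sketch below is only a toy illustration of that idea and is not the procedure used in the cited studies. The function names and all frequency values are hypothetical, chosen so the example can be checked by hand.

```python
def mixed_frequency(alpha, p_ani, p_asi):
    """Expected allele frequency when a fraction `alpha` of ancestry is ANI."""
    return alpha * p_ani + (1 - alpha) * p_asi


def estimate_alpha(p_mix, p_ani, p_asi):
    """Least-squares estimate of the ANI fraction across several markers.

    Rearranging the mixing model gives (mix - asi) = alpha * (ani - asi),
    so alpha is the slope of a regression through the origin.
    """
    num = sum((m - s) * (a - s) for m, a, s in zip(p_mix, p_ani, p_asi))
    den = sum((a - s) ** 2 for a, s in zip(p_ani, p_asi))
    if den == 0:
        raise ValueError("source populations are identical at every marker")
    return num / den


# Hypothetical allele frequencies at three markers (not real data).
ani = [0.80, 0.10, 0.55]
asi = [0.20, 0.60, 0.35]

# Simulate a population with 40% ANI ancestry, then recover that fraction.
true_alpha = 0.4
mixed = [mixed_frequency(true_alpha, a, s) for a, s in zip(ani, asi)]
estimated_alpha = estimate_alpha(mixed, ani, asi)

print(f"estimated ANI fraction: {estimated_alpha:.2f}")
```

With real data the ancestral frequencies are not observed directly and sampling noise is present, so published estimates rely on more elaborate statistics than this direct regression.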

Basu et al. (2003) conclude that "Dravidian tribals were possibly widespread throughout India before the arrival of the Indo-European-speaking nomads" and that "formation of populations by fission that resulted in founder and drift effects have left their imprints on the genetic structures of contemporary populations".[20] The geneticist PP Majumder (2010) has argued that the findings of Reich et al. (2009) are in remarkable concordance with previous research using mtDNA and Y-DNA:[41]

Central Asian populations are supposed to have been major contributors to the Indian gene pool, particularly to the northern Indian gene pool, and the migrants had supposedly moved into India through what is now Afghanistan and Pakistan. Using mitochondrial DNA variation data collated from various studies, we have shown that populations of Central Asia and Pakistan show the lowest coefficient of genetic differentiation with the north Indian populations, a higher differentiation with the south Indian populations, and the highest with the northeast Indian populations. Northern Indian populations are genetically closer to Central Asians than populations of other geographical regions of India... . Consistent with the above findings, a recent study using over 500,000 biallelic autosomal markers has found a north to south gradient of genetic proximity of Indian populations to western Eurasians. This feature is likely related to the proportions of ancestry derived from the western Eurasian gene pool, which, as this study has shown, is greater in populations inhabiting northern India than those inhabiting southern India.

Genetic distance between caste groups and tribes

Studies by Watkins et al. (2005) and Kivisild et al. (2003) based on autosomal markers conclude that Indian caste and tribal populations have a common ancestry.[42][43] Reddy et al. (2005) found fairly uniform allele frequency distributions across caste groups of southern Andhra Pradesh, but significantly larger genetic distance between caste groups and tribes indicating genetic isolation of the tribes and castes.[44]

Viswanathan et al. (2004) in a study on genetic structure and affinities among tribal populations of southern India concludes, "Genetic differentiation was high and genetic distances were not significantly correlated with geographic distances. Genetic drift therefore probably played a significant role in shaping the patterns of genetic variation observed in southern Indian tribal populations. Otherwise, analyses of population relationships showed that Indian populations are closely related to one another, regardless of phenotypic characteristics, and do not show particular affinities to Africans. We conclude that the phenotypic similarities of some Indian groups to Africans do not reflect a close relationship between these groups, but are better explained by convergence."[45]

A 2011 study published in the American Journal of Human Genetics[5] indicates that Indian ancestral components are the result of a more complex demographic history than was previously thought. According to the researchers, South Asia harbours two major ancestral components, one of which is spread at comparable frequency and genetic diversity in populations of South and West Asia, the Middle East, the Near East and the Caucasus; the other component is more restricted to South Asia. However, rather than ruling out the possibility of Indo-Aryan migration, these findings suggest that the genetic affinities of both Indian ancestral components are the result of multiple gene flows over the course of thousands of years.[5]



  1. ^ a b c d e
  2. ^ a b
  3. ^ a b
  4. ^ a b c d
  5. ^ a b c
  6. ^
  7. ^ a b c d e
  8. ^ a b c Y Haplogroups of the World, 2005, McDonald
  9. ^ a b c d e f g
  10. ^ a b c d
  11. ^ a b c d e
  12. ^ a b
  13. ^ Endicott, Metspalu & Kivisild 2007, p. 231.
  14. ^ Endicott, Metspalu & Kivisild 2007, pp. 234-235.
  15. ^ Oppenheimer 2003
  16. ^ a b
  17. ^ a b c
  18. ^
  19. ^ a b
  20. ^ a b c d
  21. ^ a b c d e
  22. ^ a b c d e
  23. ^ a b Y-DNA Haplogroup H and its Subclades - 2015
  24. ^ Manoukian, Jean-Grégoire (2006), "A Synthesis of Haplogroup R2 - 2006."
  25. ^
  26. ^ a b Ancestry of Maldives People in Light of Population Genetics
  27. ^ a b c d Sengupta et al. (2005)
  28. ^ ISOGG 2012 Y-DNA Haplogroup R
  29. ^ a b c d
  30. ^
  31. ^
  32. ^ a b c d Kivisild et al. (2003)
  33. ^ Fornarino et al. (2009)
  34. ^ Ancestry of Maldives People in Light of Population Genetics: Maldivian Ancestry in light of Genetics
  35. ^ The Place of the Indian mtDNA Variants in the Global Network of Maternal Lineages and the Peopling of the Old World
  36. ^
  37. ^
  38. ^
  39. ^
  40. ^ Abstract/Presentation
  41. ^
  42. ^
  43. ^
  44. ^
  45. ^

This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply. World Heritage Encyclopedia content is assembled from numerous content providers, Open Access Publishing, and in compliance with The Fair Access to Science and Technology Research Act (FASTR), Wikimedia Foundation, Inc., Public Library of Science, The Encyclopedia of Life, Open Book Publishers (OBP), PubMed, U.S. National Library of Medicine, National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health (NIH), U.S. Department of Health & Human Services, and, which sources content from all federal, state, local, tribal, and territorial government publication portals (.gov, .mil, .edu). Funding for and content contributors is made possible from the U.S. Congress, E-Government Act of 2002.