November 4, 2022

I-Y3120 and I-PH908 frequency in Southeastern Europe

This article uses GenAlEx to analyze 17-marker Y-STR haplotypes from scientific studies. It examines the frequencies of I-Y3120 and I-PH908, and the percentage of I-PH908 ("Dinaric South") within the overall I-Y3120 frequency; the number and frequency of duplicate haplotypes, as an indicator of homogeneity and founder effect within a population; the number of samples per haplotype; the number of haplotypes with specific marker mutations, as a predictor of some I-PH908 subclades; and the number of exact haplotype matches between populations, as a possible indicator of connections and migrations. At the end, Croatian 23-marker Y-STR haplotypes of the non-PH908 ("Dinaric North") subclades are also analyzed and predicted with the Nevgen predictor. The populations from scientific studies are:

  1. Croatia, divided into five regions (1100 samples from Mršić et al. 2012)
  2. Zagreb & Croatia (239 samples from Purps et al. 2014) 
  3. Bosnia and Herzegovina (200 samples from Kovačević et al. 2013 and Dogan et al. 2016)
  4. Serbian region of Vojvodina (185 samples from Veselinović et al. 2008)
  5. Serbia (820 samples from Veselinović et al. 2008, Mirabal et al. 2010, Todorović et al. 2014 & 2015, Scorrano et al. 2017, Zgonjanin et al. 2017)
  6. Serbs of Serbia, B&H, Croatia and Montenegro (303 samples from Kačar et al. 2019)
  7. Montenegro (404 samples from Mirabal et al. 2010)
  8. North Macedonia (204 samples from Purps et al. 2014 and Jankova et al. 2019)
  9. Bulgaria (344 samples from Karachanak et al. 2013 and Martinez-Cruz et al. 2016)
  10. Slovenia (305 samples from Purps et al. 2014 and Drobnic et al. 2017)
  11. Hungary (568 samples from Purps et al. 2014, Martinez-Cruz et al. 2016, Pamjav et al. 2017)
  12. Romania (266 samples from Stanciu et al. 2010 and Martinez-Cruz et al. 2016)

Albania, Kosovo, and Greece were excluded from the comparison because their frequencies are significantly lower and are themselves evidence of admixture with South Slavic peoples.

First, the five Croatian regional populations were analyzed: their frequencies, the number of unique and duplicate haplotypes, including intrapopulation and interpopulation exact pairs (i.e. matches), and the number of samples per haplotype within and between regional populations.


Intrapopulation duplicates are only partly also interpopulation duplicates at the level of Croatia.

In the Croatian regional populations the frequency of I-Y3120 ranges from 25.45% to 54.54%, with an average of 37.81%, while the frequency of I-PH908 ranges from 10.45% to 43.60%, with an average of 26.09%. Both increase from north to south. Bosnian Croats were not taken into account because of a lack of scientific Y-STR samples, but they are known to have the highest frequencies of I-Y3120 (>63%).

Haplotypes with DYS448=19/18, representing I-PH908, make up the majority of the I-Y3120 frequency in the regional populations (>60, 70, 80%), with the exception of the Northern region, where they make up a large minority (>40%). The average Croatian percentage of I-PH908 within the I-Y3120 frequency is roughly 69%. The share of I-PH908 samples among the samples of the two main Slavic macro-haplogroup lineages, I-Y3120 and R1a, has a national average of 43%, while the regional populations form roughly three clusters about 15% apart: highest (Western-Eastern-Southern, 45-52-59%), intermediate (Central, 35%) and lowest (Northern, 19%).
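The roughly 69% share can be checked directly from the two average frequencies. The following is a minimal sketch in Python; the function name `share_within` is only illustrative and is not part of GenAlEx or of any study cited here.

```python
def share_within(sub_freq: float, parent_freq: float) -> float:
    """Return a subclade frequency as a percentage of its parent clade frequency."""
    if parent_freq <= 0:
        raise ValueError("parent frequency must be positive")
    return 100.0 * sub_freq / parent_freq


# Average Croatian regional frequencies quoted in the text (percent of all samples).
i_y3120_avg = 37.81
i_ph908_avg = 26.09

print(round(share_within(i_ph908_avg, i_y3120_avg)))  # 69
```

The same function applies to any population for which both frequencies are known.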




Map from Mršić et al. 2012

The samples from the Croatian capital city of Zagreb (114) and from Croatia (125) are joined into a separate Croatian population (239) in the table, which has 36.4% I-Y3120 and 21.3% I-PH908, the latter making up 59% of I-Y3120. Taken separately, the city's percentages are much like those of the region of Central Croatia.

The Croatian national and most regional frequencies are very similar to the averages of a cluster comprising nearby Bosnia and Herzegovina, Serbia, Serbs from the Western Balkans, and Montenegro (ca. 29-46% I-Y3120 / 20-39% I-PH908 / 68-84% I-PH908 ratio). Frequencies are lower in a cluster comprising North Macedonia, Bulgaria and Romania (19-22% I-Y3120 / 7-11% I-PH908 / 39-51% I-PH908 ratio), to which the North Croatian region could be added, and lowest in a cluster comprising Slovenia and Hungary (15-19% I-Y3120 / 5% I-PH908 / 28-34% I-PH908 ratio). However, in all observed South Slavic and Balkan populations, I-PH908 makes up roughly at least 1/3, and in some 2/3, of the I-Y3120 percentage, and from at least 1/10 (8-12%) up to 6/10 (19-59%) of the cumulative I-Y3120 and R1a percentage. It should also be noted that, according to other scientific studies, some countries such as Romania (Kushniarevich et al. 2015, with 40% I2 on 1638 samples) have much higher average and regional frequencies, but because of differences in the data they could not be included in the comparison.






The number and frequency of duplicate I-PH908 haplotypes correlate with the frequency of I-PH908 in the tested population (for example, Southern Croatia compared to the other Croatian regional populations), in other words, with the number of I-PH908 haplotypes in a population. On a national level, among the observed populations, the highest duplicate frequencies were in the cumulative Serbia + Western Balkans Serbs population on 1123 samples (25.60%), in Hungary on 568 samples (25%), and in Serbia on 820 samples (23.70%), where roughly 1/4 of the I-PH908 samples are duplicates. Frequencies were also high in Croatia on 1100 samples (18.46%) and in Montenegro on 404 samples (15.80%). When 1-step-neighbor haplotypes and subclades are considered, the number is even higher, owing to a demographic boom in which the genealogical lines of tribes and families with a fairly recent TMRCA are repeated.

Among the regional and national populations, the lowest duplicate frequency was in North Croatia and North Macedonia (0%), followed by Central and West Croatia and Slovenia (4.54-5.88%).
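The duplicate frequencies can be reproduced with a short calculation. The sketch below assumes that duplicate frequency means the number of samples minus the number of distinct haplotypes, divided by the number of samples; this definition is an inference, but it agrees with figures quoted in this article (285 I-PH908 samples with 212 unique haplotypes give 25.6%). The function names and the toy haplotypes are illustrative only.

```python
from collections import Counter


def count_haplotypes(haplotypes):
    """Return (number of samples, number of distinct haplotypes)."""
    counts = Counter(haplotypes)
    return sum(counts.values()), len(counts)


def duplicate_frequency(n_samples: int, n_unique: int) -> float:
    """Percentage of samples that repeat an already-seen haplotype (assumed definition)."""
    return 100.0 * (n_samples - n_unique) / n_samples


# Toy 4-marker haplotypes with invented values, not taken from any study.
toy = [
    (13, 24, 11, 19),
    (13, 24, 11, 19),
    (13, 24, 10, 19),
    (13, 25, 11, 18),
    (13, 24, 11, 19),
]

n_samples, n_unique = count_haplotypes(toy)
print(n_samples, n_unique, duplicate_frequency(n_samples, n_unique))  # 5 3 40.0

# Serbia + Western Balkans Serbs figures quoted in the article.
print(round(duplicate_frequency(285, 212), 1))  # 25.6
```

Haplotypes are stored as tuples so that they can be counted and compared as whole units.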

Croatia and Serbia are the national populations with the most samples, but the number of duplicate haplotypes among Croats in Croatia is lower than among Serbs in Serbia, although the Serbian sample is 1.3x smaller than Croatia's, and the cumulative Serbia + Western Balkans Serbs sample, which also includes areas outside Serbia, is only slightly larger. In that sense the founder effect was weaker among Croats than among Serbs, and Serbs are not the population with the highest diversity of I-PH908. The highest diversity should presumably be in Bosnia and Herzegovina (among Bosniaks), but because of the substantially lower number of tested samples this remains uncertain. In any case, the frequency of the haplogroup and subclade in Southeastern Europe is evidently the result of strong founder effect(s), while the diversity and distance between haplotypes are the result of arrival through different Slavic migrations.

The Western Balkans Serbs* and Serbia + Western Balkans Serbs* calculations use 74 instead of 76 I-PH908 samples because 2 of them lack a result on a marker. Hence the calculation of duplicate haplotype frequencies uses 74 instead of 76 samples for Western Balkans Serbs, and 285 instead of 287 I-PH908 samples (1121 instead of 1123 samples overall) for Serbia + Western Balkans Serbs. In addition, 73 out of 74 Western Balkans Serbs haplotypes were unique on 23 markers, which came down to 67 unique on 17 markers.
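The drop from 73 to 67 unique haplotypes follows from comparing fewer markers: haplotypes that differ only on a marker outside the smaller panel become identical. A minimal sketch of that effect is shown below. The marker names are real Y-STR loci, but the values, the panel sizes and the helper `project` are invented for illustration.

```python
def project(haplotype: dict, markers: list) -> tuple:
    """Keep only the listed markers, in a fixed order, so haplotypes can be compared."""
    return tuple(haplotype[m] for m in markers)


# Toy samples; "DYS576" stands in for markers present only in the larger panel.
samples = [
    {"DYS19": 16, "DYS391": 11, "DYS448": 19, "DYS576": 18},
    {"DYS19": 16, "DYS391": 11, "DYS448": 19, "DYS576": 19},
    {"DYS19": 16, "DYS391": 10, "DYS448": 19, "DYS576": 18},
]
full_panel = ["DYS19", "DYS391", "DYS448", "DYS576"]
reduced_panel = ["DYS19", "DYS391", "DYS448"]

unique_full = {project(s, full_panel) for s in samples}
unique_reduced = {project(s, reduced_panel) for s in samples}

print(len(unique_full), len(unique_reduced))  # 3 2
```

The first two samples differ only on DYS576, so they collapse into one haplotype on the reduced panel.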

The Bulgaria* calculation uses 35 instead of 38 I-PH908 samples because 3 of them lack a result on a marker.

The Hungary* calculation uses 28 instead of 30 I-PH908 samples because 2 of them lack a result on a marker.


Again considering the Croatian and Serbian national populations, which have the most samples: the most common haplotype in Croatia occurs in 7 samples, but it is largely intraregional rather than national, since 5 of those samples already come from Southern Croatia. At the national interregional level, the most frequent haplotypes occur in 7 samples, in 6 samples (4 of them from Eastern Croatia), and twice in 5 samples, of which one is spread over 4 interregional samples. The most common haplotype in Serbia occurs in 10 samples, followed by those in 7, 6 and 5 samples, while in the cumulative Serbia + Western Balkans Serbs population the most common occur in 10 (twice), 8, 7 and 6 samples. Unlike in Croatia, these haplotypes were not specific to a single region but recur at the national and international levels.


As for marker mutations as potential predictors of some I-PH908 subclades: DYS448=18 etc., for I-BY93199, makes up 20-35% of I-PH908 in Montenegro (tribe of Ozrinići); there are probably some I-Y56203 samples among Serbs; probably a few I-Y51673 samples everywhere; probably many I-Z16983>A493 samples among Serbs and Croats and in B&H and Montenegro; and probably a few I-A5913>A22312 samples in B&H, Serbia and Montenegro. Although they are the most numerous in the table, haplotypes with a DYS391=10 result are usually not the duplicate ones.


Probably the most extensive and interesting table is the one showing the number of exact matches between the tested populations. Note that it compares unique haplotypes, even at the regional level of Croatia. The 1-step-neighbors were not taken into account because of I-PH908 haplotype modality: sometimes a single marker result differentiates separate subclades. Most strikingly, the comparison of the largest populations, Croatia with 234 and Serbia + Western Balkans Serbs with 212 unique haplotypes, each on more than 1100 samples, shows that several hundred unique haplotypes exist. This indicates high genetic diversity of I-PH908 in the region, and that there is certainly some bias in the smaller tested populations, because they are not representative of all the diversity and difference at the regional, interregional, national, and international levels. The number of matches is also lower than expected, which, contrary to some assumptions, points to old and distant connections and low admixture between the populations. Some matches represent more modal haplotypes. Nevertheless, the number of samples is scientifically reliable and representative, and hence sufficient for such a comparison and for drawing some conclusions from the number of matches.
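Counting exact matches between populations amounts to intersecting their sets of unique haplotypes. The sketch below shows the idea on invented data; the population labels, haplotype values and the function `exact_matches` are hypothetical and are not the GenAlEx procedure itself.

```python
from itertools import combinations


def exact_matches(populations: dict) -> dict:
    """Count haplotypes shared by each pair of populations, each haplotype counted once."""
    unique = {name: set(haps) for name, haps in populations.items()}
    return {
        (a, b): len(unique[a] & unique[b])
        for a, b in combinations(sorted(unique), 2)
    }


# Toy 3-marker haplotypes with invented values.
toy = {
    "A": [(16, 11, 19), (16, 10, 19), (16, 11, 19)],
    "B": [(16, 11, 19), (15, 11, 19)],
    "C": [(17, 11, 18)],
}

print(exact_matches(toy))
# {('A', 'B'): 1, ('A', 'C'): 0, ('B', 'C'): 0}
```

Because each population is reduced to a set first, a haplotype duplicated within one population still counts as a single match, in line with the comparison of unique haplotypes described above.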

In general, there is some cluster proximity between the Croatian, Bosnian and Herzegovinian, Serbian and Montenegrin populations. Among them, the Croatian regional populations cluster together, as do the Serbian regional and international populations. Bosnia and Herzegovina is in between the Croatian and Serbian clusters. Montenegro forms its own cluster, closer to Serbia or in between Serbia, Bosnia and Herzegovina, and Croatia. North Macedonia has neutral proximities. Bulgaria also has neutral proximities, although in comparison with the larger tested populations it is closer to the Serbian than to the Croatian population. Slovenia and Hungary are fairly neutral but closer to the ex-Yugoslav populations; interestingly, neither has matches with North and Central Croatia. Romania is also fairly neutral, seemingly closer to the ex-Yugoslav populations than to neighboring Bulgaria and Hungary. When matches are counted excluding the largest and cumulative populations (one Croatian and two Serbian), the lowest total numbers of international matches (20 or fewer) are found in Hungary (14, despite its high duplicate frequency), North Croatia (19) and North Macedonia (20), followed by Bulgaria, Slovenia and Romania with fewer than 30 matches.

For Croatia*, the matches of 261 unique interregional haplotypes are taken into consideration, and for Serbia*, the matches of 186 unique haplotypes from diverse studies, while Serbia + Western Balkans Serbs* is the sum of these matches and those between Serbia and Western Balkans Serbs. Serbia + Western Balkans Serbs* shows the number of matches obtained from 67 unique haplotypes for Western Balkans Serbs and 161 unique for Serbia, which gives 212 unique haplotypes instead of 228, while the figure in parentheses is calculated from 67 unique for Western Balkans Serbs and 186 for Serbia from diverse studies.
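The difference between 228 and 212 can be read as the number of haplotypes shared by the two sets. The sketch below assumes that 212 is the size of the union of the 67 and 161 unique haplotypes; the function name is illustrative.

```python
def shared_between(n_a: int, n_b: int, n_union: int) -> int:
    """Haplotypes present in both sets, from |A| + |B| - |A union B|."""
    return n_a + n_b - n_union


# 67 unique (Western Balkans Serbs), 161 unique (Serbia), 212 unique combined.
print(shared_between(67, 161, 212))  # 16
```

Under that assumption, 16 haplotypes occur in both Serbia and the Western Balkans Serbs sample.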


Given some peculiarities of the Kajkavian-dialect-speaking North Croatian population, 30 I-Y3120 Y-STR haplotypes were also analyzed from the Chakavian-dialect-speaking Croatian Adriatic islands of Cres, Dugi Otok, Lastovo, Mljet, Pašman, Pag, Ugljan and Vis, from Šarac et al. 2016 (siblings and male relatives were avoided in the selection of samples for DNA analysis on the basis of genealogical data, as were examinees without at least two generations of ancestors living in the sampled subpopulation). In the SNP analysis, 91 samples out of 384 (23.69%) in the island population belonged to "I2a1b-M423". The frequency would be higher if it were corrected for the number of inhabitants of each island, because the samples are not evenly represented, and higher still with the frequencies from the other islands of Krk, Brač, Hvar and Korčula from other studies (Peričić et al. 2005, Primorac et al. 2022). More accurately, when the I2a frequencies are calibrated according to the demographic size of the islands, the insular population of the 8 islands has 29.80%, that of the 12 islands 34.57%, that of the 10 Dalmatian islands 42.75%, and that of the 2 Kvarner islands 8.97% of the haplogroup.
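Calibrating by demographic size is a population-weighted average of the per-island frequencies. The sketch below illustrates the method only: the two islands, their populations and their frequencies are hypothetical and do not come from Šarac et al. 2016 or from any census.

```python
def weighted_frequency(islands):
    """Population-weighted mean frequency (%) from (population, frequency %) pairs."""
    total = sum(pop for pop, _ in islands)
    return sum(pop * freq for pop, freq in islands) / total


# Hypothetical islands: a small one with a low frequency, a large one with a high frequency.
toy = [(1000, 10.0), (3000, 40.0)]

unweighted = sum(freq for _, freq in toy) / len(toy)
print(unweighted, weighted_frequency(toy))  # 25.0 32.5
```

When the more populous islands also have the higher frequencies, the weighted figure rises above the unweighted one, which is the direction of the correction described above.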

With 25 out of 30 samples having DYS448=19 and 2 out of 30 having DYS448=18, a total of 27 out of 30 samples are I-PH908. This is the highest observed percentage of I-PH908 within the I-Y3120 frequency (90%), with an overall I-PH908 frequency (c. 21.32%) equal to the general Croatian average. There was one exact match, within the island of Ugljan, for a total of 26 unique haplotypes of high diversity and a low duplicate frequency (3.7%). With such percentages, it can be said that the Croatian islands have the highest concentration of I-PH908 in the world.
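The island percentages follow from the counts given in the text. The sketch below reproduces them; the DYS448 value of 20 used for the three remaining samples is a placeholder assumption, and the duplicate frequency is assumed to be samples minus unique haplotypes, divided by samples.

```python
from collections import Counter

# 30 island I-Y3120 haplotypes summarized by their DYS448 value:
# 25 with 19, 2 with 18, and 3 others (20 is a placeholder, not a reported value).
dys448 = [19] * 25 + [18] * 2 + [20] * 3

counts = Counter(dys448)
ph908 = counts[19] + counts[18]           # samples treated as I-PH908
share = 100.0 * ph908 / len(dys448)       # I-PH908 within I-Y3120, in percent
overall = 23.69 * share / 100.0           # scaled by the 23.69% island frequency
n_unique = 26                             # unique I-PH908 haplotypes, from the text
dup = 100.0 * (ph908 - n_unique) / ph908  # assumed duplicate-frequency definition

print(ph908, round(share, 1), round(overall, 2), round(dup, 1))  # 27 90.0 21.32 3.7
```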

To find matches, an adequate result for the DYS389II marker was calculated. As in the case of the North Croatian population, the total number of matches was less than 20, actually the lowest observed (17), excluding the cumulative and large Croatian and Serbian populations (in both, 2/3 of the matches are the same Mljet3 and Pašman2 haplotypes), and it similarly showed a low connection to Slovenia and the Eastern Balkans. As expected, it had the most matches with all the Croatian, Bosnian and Herzegovinian, and Serbian populations. The I-PH908 frequency, diversity and matching isolation of the Kajkavian- and Chakavian-dialect Croatian populations seemingly make them form their own clusters.



2. Croatian I-Y3120 "Dinaric North" subclades:

As for the I-Y3120 "Dinaric North" subclades, their frequency is 11.7%, making up 31% of I-Y3120, in Croatia (1100 samples from Mršić et al. 2012), and 15%, with a share of 41%, in Zagreb+Croatia (239 samples from Purps et al. 2014). As previously said, the frequency of non-PH908 subclades is highest in North and Central Croatia. According to the samples in FTDNA's Croatian DNA Project, Croats in Croatia and Bosnia and Herzegovina belong to I-S17250 (subclades I-Z16971>A815 and I-Y4882, both with a TMRCA of 1850 YBP) and I-FT76511 (subclade TMRCA 1500 YBP; sub-subclade I-FT256359 TMRCA 1320 YBP).

In the subclade I-Z16971>A815 both were in the sub-subclade I-FGC92673 (one of them FT398145>FT398005, with a match in Czechia; TMRCA 1700-1400 YBP), while in the subclade I-Y4882 both were in the sub-subclade I-Z16969 but in different branches: one in I-A1328 (Ukraine, Poland, medieval Hungarian "Sárrétudvari 251") > FT27092 (Ukraine, Russia, Belarus, Slovakia; TMRCA 1650 YBP) > Y177643 (Bosnia and Herzegovina; 1550-1350 YBP) > Y220413 (Bosnia and Herzegovina, Montenegro; TMRCA 1500-1200 YBP) > FT109194 (Montenegro/Bosnia; TMRCA 1100 YBP), the second in I-FT28683 > FT226043 (Russia, Ukraine, Slovakia, Poland, Czechia, Hungary; TMRCA 1750-1690 YBP) > FTC45751 (Bulgaria; TMRCA 500 YBP). Almost all subclades have matches located in the Carpathian Mountains and in further locations reminiscent of the movement of the tribe of White Croats from Western Ukraine up to Czechia.

For a better Nevgen haplogroup prediction, only the 36 haplotypes on 23 Y-STR markers from Zagreb+Croatia (Purps et al. 2014) were analyzed. According to the predictor, the subclades I-S17250 and I-Y4460 were almost equal in frequency (6-8% vs 5-8%), but I-Y4460 should have the most frequent individual sub-subclade, I-Y3118 (4-7%), as I-S17250 is mainly divided between I-Z16971 (3-5%) and I-Y4882 (1-2%), with traces of I-BY142076. Other subclades, like I-Z17855 and I-FT76511, were rare.

I-Y3120 "Dinaric North" clades and subclades excluding derivative "Dinaric South" subclade (I-S17250>PH908)