The teeth used for DNA extraction were obtained with relevant institutional permissions from the Institute of Ethnology and Anthropology of Russian Academy of Sciences (Russia), Cherepovets Museum Association (Russia), and Archaeological Research Collection of Tallinn University (Estonia). DNA was extracted from the teeth of 48 individuals: 3 from Stone Age HGs from western Russia (WeRuHG; 10,800 to 4250 cal BCE), 44 from Bronze Age Fatyanovo Culture individuals from western Russia (Fatyanovo; 2900 to 2050 cal BCE), and 1 from a Corded Ware Culture individual from Estonia (EstCWC; 2850 to 2500 cal BCE) (Fig. 1, data S1, table S1, and text S1). Petrous bones of 13 of the Fatyanovo Culture individuals have been sampled for another project. More detailed information about the archaeological periods and the specific sites and burials of this study is given below.

All of the laboratory work was performed in dedicated aDNA laboratories of the Institute of Genomics, University of Tartu. The library quantification and sequencing were performed at the Institute of Genomics Core Facility, University of Tartu. The main steps of the laboratory work are detailed below.

DNA extraction. The teeth of 48 individuals were used to extract DNA. One individual was sampled twice from different teeth. Apical tooth roots were cut off with a drill and used for extraction because root cementum has been shown to contain more endogenous DNA than crown dentine (62). The root pieces were used whole to avoid heat damage during powdering with a drill and to reduce the risk of cross-contamination between samples. Contaminants were removed from the surface of tooth roots by soaking in 6% bleach for 5 min, then rinsing three times with Milli-Q water (Millipore), and afterward soaking in 70% ethanol for 2 min, shaking the tubes during each round to dislodge particles. Last, the samples were left to dry under an ultraviolet light for 2 hours.

Next, the samples were weighed, [20 * sample mass (mg)] μl of EDTA and [sample mass (mg) / 2] μl of proteinase K were added, and the samples were left to digest for 72 hours on a rotating mixer at 20°C to compensate for the smaller surface area of the whole root compared to powder. Undigested material was stored for a second DNA extraction if need be.
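The mass-dependent reagent volumes can be illustrated with a short sketch. The function name and the input check are illustrative assumptions; only the two formulas come from the protocol described above.

```python
def digestion_volumes(sample_mass_mg):
    """Return (EDTA volume in µl, proteinase K volume in µl) for a root piece.

    Implements the two formulas stated in the text:
    EDTA = 20 * mass (mg), proteinase K = mass (mg) / 2.
    The function name and the validation are illustrative assumptions.
    """
    if sample_mass_mg <= 0:
        raise ValueError("sample mass must be positive")
    edta_ul = 20 * sample_mass_mg
    proteinase_k_ul = sample_mass_mg / 2
    return edta_ul, proteinase_k_ul
```

For example, a 50 mg root piece would receive 1000 μl of EDTA and 25 μl of proteinase K.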

The DNA solution was concentrated to 250 μl (Vivaspin Turbo 15, 30,000 MWCO PES, Sartorius) and purified in large-volume columns (High Pure Viral Nucleic Acid Large Volume Kit, Roche) using 2.5 ml of PB buffer, 1 ml of PE buffer, and 100 μl of EB buffer (MinElute PCR Purification Kit, QIAGEN).

Library preparation. Sequencing libraries were built using NEBNext DNA Library Prep Master Mix Set for 454 (E6070, New England Biolabs) and Illumina-specific adaptors (63) following established protocols (63–65). The end repair module was implemented using 30 μl of DNA extract, 12.5 μl of water, 5 μl of buffer, and 2.5 μl of enzyme mix, incubating at 20°C for 30 min. The samples were purified using 500 μl of PB and 650 μl of PE buffer and eluted in 30 μl of EB buffer (MinElute PCR Purification Kit, QIAGEN). The adaptor ligation module was implemented using 10 μl of buffer, 5 μl of T4 ligase, and 5 μl of adaptor mix (63), incubating at 20°C for 15 min. The samples were purified as in the previous step and eluted in 30 μl of EB buffer (MinElute PCR Purification Kit, QIAGEN). The adaptor fill-in module was implemented using 13 μl of water, 5 μl of buffer, and 2 μl of Bst DNA polymerase, incubating at 37°C for 30 min and at 80°C for 20 min. The libraries were amplified, and both the indexed and universal primers (NEBNext Multiplex Oligos for Illumina, New England Biolabs) were added by polymerase chain reaction (PCR) using HGS Diamond Taq DNA polymerase (Eurogentec). The samples were purified and eluted in 35 μl of EB buffer (MinElute PCR Purification Kit, QIAGEN). Three verification steps were implemented to make sure library preparation was successful and to measure the concentration of double-stranded DNA/sequencing libraries: fluorometric quantitation (Qubit, Thermo Fisher Scientific), parallel capillary electrophoresis (Fragment Analyzer, Agilent Technologies), and quantitative PCR. One sample (TIM004) had a DNA concentration lower than our threshold for sequencing and was hence excluded, leaving 48 samples from 47 individuals to be sequenced.

DNA sequencing. DNA was sequenced using the Illumina NextSeq 500 platform with the 75-base pair (bp) single-end method. First, 15 samples were sequenced together on one flow cell. Later, additional data were generated for some samples to increase coverage.

Mapping. Before mapping, the sequences of adaptors and indexes and poly-G tails occurring due to the specifics of the NextSeq 500 technology were cut from the ends of DNA sequences using cutadapt 1.11 (66). Sequences shorter than 30 bp were also removed with the same program to avoid random mapping of sequences from other species. The sequences were mapped to reference sequence GRCh37 (hs37d5) using Burrows-Wheeler Aligner (BWA 0.7.12) (67) and command mem with reseeding disabled.

After mapping, the sequences were converted to BAM format, and only sequences that mapped to the human genome were kept with samtools 1.3 (68). Next, data from different flow cell lanes were merged and duplicates were removed with picard 2.12. Indels were realigned with GATK 3.5 (69), and lastly, reads with mapping quality under 10 were filtered out with samtools 1.3 (68).

The average endogenous DNA content (proportion of reads mapping to the human genome) for the 48 samples is 29% (table S1). The endogenous DNA content is variable as is common in aDNA studies, ranging from under 1 to around 78% (table S1).

aDNA authentication. As a result of degrading over time, aDNA can be distinguished from modern DNA by certain characteristics: short fragments and a high frequency of C→T substitutions at the 5′ ends of sequences due to cytosine deamination. The program mapDamage2.0 (70) was used to estimate the frequency of 5′ C→T transitions.

mtDNA contamination was estimated using the method from (71). This included calling an mtDNA consensus sequence based on reads with mapping quality of at least 30 and positions with at least 5× coverage, aligning the consensus with 311 other human mtDNA sequences from (71), mapping the original mtDNA reads to the consensus sequence, and running contamMix 1.0-10 with the reads mapping to the consensus and the 312 aligned mtDNA sequences while trimming seven bases from the ends of reads with the option trimBases. For the male individuals, contamination was also estimated on the basis of chrX using the two contamination estimation methods first described in (72) and incorporated in the ANGSD software (73) in the script contamination.R.

The samples show 10% C→T substitutions at the 5′ ends on average, ranging from 6 to 17% (table S1). The mtDNA contamination point estimate for samples with >5× mtDNA coverage ranges from 0.03 to 2.02% with an average of 0.4% (table S1). The average of the two chrX contamination methods of male individuals with average chrX coverage of >0.1× is between 0.4 and 0.87% with an average of 0.7% (table S1).

Kinship analysis. A total of 4,375,438 biallelic single-nucleotide variant sites, with minor allele frequency (MAF) > 0.1 in a set of more than 2000 high-coverage genomes of Estonian Genome Center (EGC) (74), were identified and called with ANGSD (73) command -doHaploCall from the 25 BAM files of 24 Fatyanovo individuals with coverage of >0.03×. The ANGSD output files were converted to .tped format as an input for the analyses with READ script to infer pairs with first- and second-degree relatedness (41).

The results are reported for the 100 most similar pairs of individuals of the 300 tested, and the analysis confirmed that the two samples from one individual (NIK008A and NIK008B) were indeed genetically identical (fig. S6). The data from the two samples from one individual were merged (NIK008AB) with samtools 1.3 option merge (68).

Calculating general statistics and determining genetic sex. Samtools 1.3 (68) option stats was used to determine the number of final reads, average read length, average coverage, etc. Genetic sex was calculated using the script from (75), estimating the fraction of reads mapping to chrY out of all reads mapping to either X or Y chromosome.
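The read-fraction idea behind genetic sexing can be sketched as follows. This is not the script from (75); the function name and the normal-approximation 95% confidence interval are assumptions made for illustration, and no classification thresholds are applied because none are stated here.

```python
import math


def y_read_fraction(n_x_reads, n_y_reads):
    """Fraction of chrY reads among reads mapping to chrX or chrY.

    Returns (fraction, lower, upper), where the bounds are a
    normal-approximation 95% confidence interval (an assumption of
    this sketch, not a detail given in the text).
    """
    total = n_x_reads + n_y_reads
    if total == 0:
        raise ValueError("no reads map to chrX or chrY")
    fraction = n_y_reads / total
    std_err = math.sqrt(fraction * (1 - fraction) / total)
    return fraction, fraction - 1.96 * std_err, fraction + 1.96 * std_err
```

A sample with very few chrY reads relative to chrX reads yields a fraction near zero, as expected for a female, while a male sample yields a clearly higher value.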

The average coverage of the whole genome for the samples is between 0.00004× and 5.03× (table S1). Of these, 2 samples have an average coverage of >0.01×, 18 samples have >0.1×, 9 samples have >1×, 1 sample has around 5×, and the rest are lower than 0.01× (table S1). Genetic sexing confirms morphological sex estimates or provides additional information about the sex of the individuals involved in the study. Genetic sex was estimated for samples with an average genomic coverage of >0.005×. The study involves 16 females and 20 males (Table 1 and table S1).

Determining mtDNA hgs. The program bcftools (76) was used to produce VCF files for mitochondrial positions; genotype likelihoods were calculated using the option mpileup, and genotype calls were made using the option call. mtDNA hgs were determined by submitting the mtDNA VCF files to HaploGrep2 (77, 78). Subsequently, the results were checked by looking at all the identified polymorphisms and confirming the hg assignments in PhyloTree (78). Hgs for 41 of the 47 individuals were successfully determined (Table 1, fig. S1, and table S1).

No female samples have reads on the chrY consistent with a hg, indicating that levels of male contamination are negligible. Hgs for 17 (with coverage of >0.005×) of the 20 males were successfully determined (Table 1 and tables S1 and S2).

chrY variant calling and hg determination. In total, 113,217 haplogroup informative chrY variants from regions that uniquely map to chrY (36, 79–82) were called as haploid from the BAM files of the samples using the -doHaploCall function in ANGSD (73). Derived and ancestral allele and hg annotations for each of the called variants were added using BEDTools 2.19.0 intersect option (83). Hg assignments of each individual sample were made manually by determining the hg with the highest proportion of informative positions called in the derived state in the given sample. chrY haplogrouping was blindly performed on all samples regardless of their sex assignment.
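The assignment criterion can be illustrated with a simplified sketch. The assignments in this study were made manually, so the code below only shows the stated rule (highest proportion of informative positions in the derived state); the function name, the input layout, and the haplogroup labels in the usage note are hypothetical.

```python
def best_haplogroup(calls):
    """Pick the haplogroup with the highest proportion of derived calls.

    calls: iterable of (haplogroup, state) pairs, where state is
    "derived" or "ancestral" for one informative position.
    Returns (haplogroup, derived_proportion).
    This is a simplified illustration, not the manual procedure itself.
    """
    counts = {}
    for hg, state in calls:
        derived, total = counts.get(hg, (0, 0))
        counts[hg] = (derived + (state == "derived"), total + 1)
    if not counts:
        raise ValueError("no informative positions were called")
    proportions = {hg: d / t for hg, (d, t) in counts.items()}
    best = max(proportions, key=proportions.get)
    return best, proportions[best]
```

With two derived and one ancestral call for a hypothetical haplogroup "A" and one derived and two ancestral calls for "B", the function returns "A" with a proportion of 2/3.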

Preparing the datasets for autosomal analyses. The HO array dataset was used as the modern DNA background. Individuals from the 1240K dataset were used as the aDNA background.

The data of the comparison datasets and of the individuals of this study were converted to BED format using PLINK 1.90 (84), and the datasets were merged. Two datasets were prepared for analyses: one with HO and 1240K individuals and the individuals of this study, where 584,901 autosomal SNPs of the HO dataset were kept; the other with 1240K individuals and the individuals of this study, where 1,136,395 autosomal and 48,284 chrX SNPs of the 1240K dataset were kept.

Individuals with <10,000 SNPs overlapping with the HO autosomal dataset were removed from further autosomal analyses, leaving 30 individuals of this study to be used in autosomal analyses. These included 3 from WeRuHG, 26 from Fatyanovo, and 1 from EstCWC (table S1).

Principal components analysis. To prepare for PCA, a reduced comparison sample set composed of 813 modern individuals from 53 populations of Europe, Caucasus, and Near East and 737 ancient individuals from 107 populations was assembled (tables S3 and S4). The data were converted to EIGENSTRAT format using the program convertf from the EIGENSOFT 7.2.0 package (85). PCA was performed with the program smartpca from the same package, projecting ancient individuals onto the components constructed based on the modern genotypes using the option lsqproject and trying to account for the shrinkage problem introduced by projecting by using the option autoshrink.

Admixture analysis. For Admixture analysis (86), the same ancient sample set was used as for PCA, and the modern sample set was increased to 1861 individuals from 144 populations from all over the world (tables S3 and S4). The analysis was carried out using ADMIXTURE 1.3 (86) with the P option, projecting ancient individuals into the genetic structure calculated on the modern dataset due to missing data in the ancient samples. The HO dataset of modern individuals was pruned to decrease linkage disequilibrium using the option indep-pairwise with parameters 1000 250 0.4 in PLINK 1.90 (84). This resulted in a set of 269,966 SNPs. Admixture was run on this set using K = 3 to K = 18 in 100 replicates. This enabled us to assess convergence of the different models. K = 10 and K = 9 were the models with the largest number of inferred genetic clusters for which >10% of the runs that reached the highest log likelihood values yielded very similar results. This was used as a proxy to assume that the global likelihood maximum for this particular model was indeed reached. Then, the inferred genetic cluster proportions and allele frequencies of the best run at K = 9 were used to run Admixture to project the aDNA individuals, for which the intersection with the LD pruned modern dataset yielded data for more than 10,000 SNPs, on the inferred clusters. The same projecting approach was taken for all models for which there is good indication that the global likelihood maximum was reached (K = 3 to 18). We present all ancient individuals in fig. S2 but only population averages in Fig. 2B. The resulting membership proportions to K genetic clusters are sometimes called “ancestry components,” which can lead to overinterpretation of the results. The clustering itself is, however, an objective description of genetic structure and thus a valuable tool in population comparisons.

Outgroup f3 statistics. For calculating autosomal outgroup f3 statistics, the same ancient sample set as for previous analyses was used, and the modern sample set included 1177 individuals from 80 populations from Europe, Caucasus, Near East, Siberia and Central Asia, and Yoruba as outgroup (tables S3 and S4). The data were converted to EIGENSTRAT format using the program convertf from the EIGENSOFT 5.0.2 package (85). Outgroup f3 statistics of the form f3(Yoruba; West_Siberia_N/EHG/CentralRussiaHG/Fatyanovo/Yamnaya_Samara/Poland_CWC/Baltic_CWC/Central_CWC, modern/ancient) were computed using the ADMIXTOOLS 6.0 program qp3Pop (87).
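For orientation, the quantity behind an outgroup f3 statistic can be sketched from population allele frequencies. This is a deliberately simplified illustration and not the qp3Pop implementation: it omits the finite-sample bias correction and the block jackknife standard errors, and the function name and the use of None for missing data are assumptions.

```python
def outgroup_f3(freq_outgroup, freq_a, freq_b):
    """Naive f3(Outgroup; A, B) from per-SNP allele frequencies.

    Averages (o - a) * (o - b) over SNPs with data in all three
    populations; None marks a missing frequency (an assumption of this
    sketch). Larger values indicate more shared drift between A and B.
    """
    products = [
        (o - a) * (o - b)
        for o, a, b in zip(freq_outgroup, freq_a, freq_b)
        if o is not None and a is not None and b is not None
    ]
    if not products:
        raise ValueError("no SNPs with data in all three populations")
    return sum(products) / len(products)
```

Two populations that both differ from the outgroup in the same direction at most SNPs give a high value, which is why the statistic is used to rank populations by shared drift with a target.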

To allow chrX versus autosome comparison for ancient populations, outgroup f3 statistics using chrX SNPs were computed. To allow the use of the bigger number of positions in the 1240K over the HO dataset, Mbuti from the Simons Genome Diversity Project (88) was used as the outgroup. The outgroup f3 analyses of the form f3(Mbuti; West_Siberia_N/EHG/CentralRussiaHG/Fatyanovo/Yamnaya_Samara/Poland_CWC/Baltic_CWC/Central_CWC, ancient) were run using not only 1,136,395 autosomal SNPs but also 48,284 chrX positions available in the 1240K dataset. Because all children inherit half of their autosomal material from their father but only female children inherit their chrX from their father, in this comparison chrX data give more information about the female and autosomal data about the male ancestors of a population.

The autosomal outgroup f3 results of the two different SNP sets were compared to each other and to the results based on the chrX positions of the 1240K dataset to see whether the SNPs used affect the trends seen. Outgroup f3 analyses were also run with the form f3(Mbuti; PES001/I0061/Sidelkino, Paleolithic/Mesolithic HG) and admixture f3 analyses with the form f3(Fatyanovo; Yamnaya, EF) using the autosomal positions of the 1240K dataset.

D statistics. D statistics of the form D(Yoruba, West_Siberia_N/EHG/CentralRussiaHG/Fatyanovo/Yamnaya_Samara/Poland_CWC/Baltic_CWC/Central_CWC; Russian, modern/ancient) were calculated on the same dataset as outgroup f3 statistics (tables S3 and S4) using the autosomal positions of the HO dataset. The ADMIXTOOLS 6.0 package program qpDstat was used (87).
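A minimal sketch of the D statistic computed from allele frequencies is given below, assuming the usual frequency-based definition with numerator (w - x)(y - z) and denominator (w + x - 2wx)(y + z - 2yz) summed over SNPs. It is an illustration only, not the qpDstat implementation, and it does not compute the jackknife-based Z scores; the function name is an assumption.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies in four populations.

    Numerator:   sum of (w - x) * (y - z)
    Denominator: sum of (w + x - 2wx) * (y + z - 2yz)
    Values near 0 are consistent with (W, X) and (Y, Z) forming clades.
    """
    numerator = 0.0
    denominator = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        numerator += (wi - xi) * (yi - zi)
        denominator += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator
```

The sign of the result shows which of the last two populations shares more alleles with the second population relative to the first.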

In addition, D statistics of the form D(Mbuti, ancient; Yamnaya_Samara, Fatyanovo/Baltic_CWC/Central_CWC) and D(Mbuti, ancient; Poland_CWC/Baltic_CWC/Central_CWC, Fatyanovo) were calculated using the autosomal positions of the 1240K dataset. However, comparing very similar populations directly using D statistics seems to be affected by batch biases: Central_CWC comes out as significantly closer to almost all populations than Fatyanovo, while this is not the case when comparing less similar Fatyanovo and Yamnaya_Samara. Because of this, the results of D(Mbuti, ancient; Poland_CWC/Baltic_CWC/Central_CWC, Fatyanovo) are not discussed in the main text, but the data are included in table S19.

FST. Weir and Cockerham pairwise average FST (89) was calculated with a custom script for the dataset used for outgroup f3 and D statistics, using the autosomal positions of the HO dataset.

qpAdm. The ADMIXTOOLS 6.0 (87) package programs qpWave and qpAdm were used to estimate which populations and in which proportions are suitable proxies of admixture to form the populations or individuals of this study. The autosomal positions of the 1240K dataset were used. Only samples with more than 100,000 SNPs were used in the analyses. Mota, Ust-Ishim, Kostenki14, GoyetQ116, Vestonice16, MA1, AfontovaGora3, ElMiron, Villabruna, WHG, EHG, CHG, Iran_N, Natufian, Levant_N, and Anatolia_N (and Volosovo in some cases indicated in table S15) were used as right populations. Yamnaya_Samara or Yamnaya_Kalmykia was used as the left population representing Steppe ancestry. Levant_N, Anatolia_N, LBK_EN, Central_MN, Globular_Amphora, Trypillia, Ukraine_Eneolithic, or Ukraine_Neolithic was used as the left population representing EF ancestry. In some cases, WHG, EHG, WesternRussiaHG, or Volosovo was used as the left population representing HG ancestry. Alternatively, one-way models between Fatyanovo, Baltic_CWC, and Central_CWC were tested. Also, PES001 was modeled as a mixture of WHG and AfontovaGora3, MA1, or CHG.

To look at sex bias, four models that were not rejected using autosomal data were also tested using the 48,284 chrX positions of the 1240K dataset. The same samples were used as in the autosomal modeling.

ChromoPainter/NNLS. To infer the admixture proportions of ancient individuals, the ChromoPainter/NNLS pipeline was applied (28). Because of the low coverage of the ancient data, it is not possible to infer haplotypes, and the analysis was performed in unlinked mode (option -u). The autosomal positions of the HO dataset were used. Only samples with more than 20,000 SNPs were used in the analyses. Because ChromoPainter (90) does not tolerate missing data, every ancient target individual was iteratively painted together with one representative individual from potential source populations as recipients. All the remaining modern individuals from the sample set used for Admixture analysis were used as donors (tables S3 and S4). Subsequently, we reconstructed the profile of each target individual as a combination of two or more ancient individuals, using the non-negative least square approach. Let Xg and Yp be vectors summarizing the proportion of DNA that source and target individuals copy from each of the modern donor groups as inferred by ChromoPainter. Yp = β1X1 + β2X2 + … + βzXz was reconstructed using a slight modification of the nnls function in R (91) and implemented in GlobeTrotter (92) under the conditions βg ≥ 0 and ∑βg = 1. To evaluate the fitness of the NNLS estimation, we inferred the sum of the squared residual for every tested model (93). Models identified as plausible with qpAdm with Yamnaya_Samara and Globular_Amphora/Trypillia as sources were used. The resulting painting profiles, which summarize the fraction of the individual’s DNA inherited by each donor individual, were summed over individuals from the same population.
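The constrained fit can be illustrated for the special case of two sources, where the conditions βg ≥ 0 and ∑βg = 1 reduce the problem to a single coefficient with a closed-form solution. This sketch is not the modified nnls routine used in the pipeline; the function name and the example vectors are assumptions.

```python
def two_source_mixture(target, source1, source2):
    """Fit target ~ b * source1 + (1 - b) * source2 with 0 <= b <= 1.

    With two sources, the sum-to-one constraint leaves one free
    coefficient; its unconstrained least-squares value is clipped to
    [0, 1], which is the exact constrained optimum in one dimension.
    Returns (beta1, beta2, sum_of_squared_residuals).
    """
    diff = [a - b for a, b in zip(source1, source2)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two source profiles are identical")
    beta = sum((t - b) * d for t, b, d in zip(target, source2, diff)) / denom
    beta = min(1.0, max(0.0, beta))
    fitted = [beta * a + (1 - beta) * b for a, b in zip(source1, source2)]
    ssr = sum((t - f) ** 2 for t, f in zip(target, fitted))
    return beta, 1 - beta, ssr
```

For copying vectors source1 = [1, 0], source2 = [0, 1], and target = [0.7, 0.3], the fit recovers proportions of about 0.7 and 0.3 with a residual sum of squares close to zero. Models with more than two sources require a general constrained solver.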

DATES. The time of admixture between Yamnaya and EF populations forming the Fatyanovo Culture population was estimated using the program DATES (37). The autosomal positions of the 1240K dataset were used.

Phenotyping. To predict eye, hair, and skin color in the ancient individuals (tables S20 to S22), the HIrisPlex-S variants from 19 genes in nine autosomes were selected (94–96), and the region to be analyzed was selected adding 2 Mb around each SNP, collapsing in the same region the variants separated by less than 5 Mb. A total of 10 regions (2 for chromosome 15 and 1 for each of the remaining autosomes) were obtained, ranging from about 6 to about 1.5 Mb. Similarly, to analyze the other phenotype-informative markers (diet, immunity, and diseases), 2 Mb around each variant was selected, and the overlapping regions were merged, for a total of 47 regions (45 regions in 17 autosomes and 2 regions on chrX). For the local imputation, we used a two-step pipeline (97) as follows: (i) variant calling, (ii) first imputation step using a reference panel as similar as possible to the target samples, (iii) variant filtering, (iv) second imputation step using a larger worldwide reference panel, and (v) final variant filtering. This pipeline has been validated by randomly downsampling a high-coverage Neolithic sample (NE1) (98) to 0.05× and comparing the imputed variants in the low-coverage version with the called variants from the original genome. For a local imputation approach on 2 Mb, we obtained a concordance rate higher than 90% for all the variants, a rate that increased to 99% for common variants (MAF ≥ 0.3). The variants were called using ATLAS v0.9.0 (99) (task = call and method = MLE commands) (step 1) at biallelic SNPs with a MAF ≥ 0.1% in a reference panel composed of more than 2000 high-coverage Estonian genomes (EGC) (74). The variants were called separately for each sample and merged in one VCF file per chromosomal region. The merged VCFs were used as input for the first step of our two-step imputation pipeline [genotype likelihood update; -gl command on Beagle 4.1 (100)], using the EGC panel as reference (step 2). Then, the variants with a genotype probability (GP) less than 0.99 were discarded (step 3), and the missing genotype was imputed with the -gt command of Beagle 5.0 (101) using the large HRC as reference panel (102), with the exception of variants rs333 and rs2430561 [not present in the HRC (Haplotype Reference Consortium)], imputed using the 1000 Genomes as reference panel (step 4) (103). Last, a second GP filter was applied to keep variants with GP ≥ 0.85 (step 5). Then, the 113 phenotype-informative SNPs were extracted, recoded, and organized in tables, using VCFtools (104), PLINK 1.9 (84), and R (91) (tables S21 and S22). The HIrisPlex-S variants were uploaded on the HIrisPlex webtool to perform the color prediction, after tabulating them according to the manual of the tool. Out of 41 variants of the HIrisPlex-S system, two markers were not analyzed, namely, the rs312262906 indel and the rare (MAF = 0 in the HRC) rs201326893 SNP, because of the difficulties in the imputation of such variants.
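The construction of imputation regions by padding each variant and merging overlapping windows can be sketched as follows. The function name, the flank parameter (how the 2 Mb is split around a variant is not specified here), and the positions in the usage note are hypothetical; the sketch covers only the overlap-based merging and not the separate 5 Mb collapsing rule used for the HIrisPlex-S variants.

```python
def merge_regions(variants, flank):
    """Pad each variant by `flank` bp on both sides and merge overlaps.

    variants: iterable of (chromosome, position) pairs.
    flank: padding in bp (a hypothetical parameter of this sketch).
    Returns a sorted list of (chromosome, start, end) regions.
    """
    windows = sorted(
        (chrom, max(0, pos - flank), pos + flank) for chrom, pos in variants
    )
    merged = []
    for chrom, start, end in windows:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous window on the same chromosome: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

With a flank of 1 Mb, two variants on the same chromosome that lie 0.5 Mb apart end up in a single 2.5 Mb region, while a distant variant on that chromosome forms its own region.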

The 28 samples analyzed here were compared with 34 ancient samples from surrounding geographical regions from literature, gathering them in seven groups according to their region and/or culture: (i) 3 Western Russian Stone Age HGs (present study); (ii) 5 Latvian Mesolithic HGs (34); (iii) 7 Estonian and Latvian Corded Ware Culture farmers [present study and (27, 34)]; (iv) 24 Fatyanovo Culture individuals (present study); (v) 10 Estonian Bronze Age individuals (28); (vi) 9 Estonian and Ingrian Iron Age individuals (28); (vii) 4 Estonian Middle Age individuals (28). For each variant, an analysis of variance (ANOVA) test was performed between the seven groups, applying Bonferroni’s correction by the number of tested variants to set the significance threshold (table S20). For the significant variant, a Tukey test was performed to identify the significant pairs of groups.
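The Bonferroni step can be illustrated with a short sketch. The nominal significance level of 0.05, the function names, and the variant labels in the usage note are assumptions; the ANOVA and Tukey tests themselves are not reproduced.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni's correction."""
    if n_tests <= 0:
        raise ValueError("the number of tests must be positive")
    return alpha / n_tests


def significant_variants(p_values, alpha=0.05):
    """Return variants whose p value is below the corrected threshold.

    p_values: dict mapping a variant label to its ANOVA p value.
    alpha: nominal significance level (0.05 is an assumption here).
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [variant for variant, p in p_values.items() if p < threshold]
```

With three tested variants and a nominal level of 0.05, the corrected threshold is about 0.0167, so a variant with p = 0.02 would not be reported as significant while one with p = 0.0001 would.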

