Title:
A METHOD FOR NON-INVASIVE PRENATAL DETECTION OF FETAL SEX CHROMOSOMAL ABNORMALITIES AND FETAL SEX DETERMINATION FOR SINGLETON AND TWIN PREGNANCIES
Document Type and Number:
WIPO Patent Application WO/2019/025004
Kind Code:
A1
Abstract:
The invention relates generally to the field of non-invasive prenatal screening and diagnostics. The invention provides a reliable method applicable to non-invasive prenatal screening for sex chromosome aneuploidies such as monosomy X (X0, Turner syndrome), XXY (Klinefelter syndrome), XXX (triple X syndrome), and XYY (Jacob syndrome), from a blood sample taken from the mother in an early stage of pregnancy. Moreover, the invention provides a novel approach to calculating the fetal fraction of cell-free DNA fragments. The invention relates particularly to single or twin pregnancies; however, extension of the invention to triple or quadruple pregnancies is contemplated.

Inventors:
DURIS FRANTISEK (SK)
GAZDARICA JURAJ (SK)
KUCHARIK MARCEL (SK)
HYBLOVA MICHAELA (SK)
SZEMES TOMAS (SK)
BUDIS JAROSLAV (SK)
Application Number:
PCT/EP2017/069795
Publication Date:
February 07, 2019
Filing Date:
August 04, 2017
Assignee:
TRISOMYTEST S R O (SK)
International Classes:
C12Q1/6869; C12Q1/6879
Domestic Patent References:
WO2013192562A1 2013-12-27
WO2011051283A1 2011-05-05
Foreign References:
EP2183693A1 2010-05-12
EP2366031A1 2011-09-21
US8296076B2 2012-10-23
Other References:
LAU TZE KIN ET AL: "Noninvasive prenatal diagnosis of common fetal chromosomal aneuploidies by maternal plasma DNA sequencing", THE JOURNAL OF MATERNAL FETAL & NEONATAL MEDICINE : THE OFFICIAL JOURNAL OF THE EUROPEAN ASSOCIATION OF PERINATAL MEDICINE, THE FEDERATION OF ASIA AND OCEANIA PERINATAL SOCIETIES, THE INTERNATIONAL SOCIETY OF PERINATAL OBSTETRICIANS, INFORMA HEALTHCA, vol. 25, no. 8, 1 August 2012 (2012-08-01), pages 1370 - 1374, XP008164835, ISSN: 1057-0802, DOI: 10.3109/14767058.2011.635730
S. C. Y. YU ET AL: "Size-based molecular diagnostics using plasma DNA for noninvasive prenatal testing", PROCEEDINGS NATIONAL ACADEMY OF SCIENCES PNAS, vol. 111, no. 23, 19 May 2014 (2014-05-19), US, pages 8583 - 8588, XP055297276, ISSN: 0027-8424, DOI: 10.1073/pnas.1406103111
AMIN R. MAZLOOM ET AL: "Noninvasive prenatal detection of sex chromosomal aneuploidies by sequencing circulating cell-free DNA from maternal plasma", PRENATAL DIAGNOSIS, vol. 33, no. 6, 17 June 2013 (2013-06-17), pages 591 - 597, XP055089609, ISSN: 0197-3851, DOI: 10.1002/pd.4127
LO ET AL., LANCET, vol. 350, 1997, pages 485 - 487
LO, YM DENNIS ET AL.: "Presence of fetal DNA in maternal plasma and serum", THE LANCET, vol. 350.9076, 1997, pages 485 - 4
CHIU, ROSSA WK ET AL.: "Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 105.51, 2008, pages 20458 - 20463
FAN, H. CHRISTINA ET AL.: "Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 105.42, 2008, pages 16266 - 16271, XP002613056, DOI: doi:10.1073/pnas.0808319105
CHIU, ROSSA WK ET AL.: "Non-invasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: large scale validity study", BMJ, vol. 342, 2011, pages c7401
SEHNERT, AMY J. ET AL.: "Optimal detection of fetal chromosomal abnormalities by massively parallel DNA sequencing of cell-free fetal DNA from maternal blood", CLINICAL CHEMISTRY, vol. 57.7, 2011, pages 1042 - 1049
LAU, TZE KIN ET AL.: "Noninvasive prenatal diagnosis of common fetal chromosomal aneuploidies by maternal plasma DNA sequencing", THE JOURNAL OF MATERNAL-FETAL & NEONATAL MEDICINE, vol. 25.8, 2012, pages 1370 - 1374, XP008164835, DOI: doi:10.3109/14767058.2011.635730
BIANCHI, DIANA W. ET AL.: "Genome-wide fetal aneuploidy detection by maternal plasma DNA sequencing", OBSTETRICS & GYNECOLOGY, vol. 119.5, 2012, pages 890 - 901, XP009161880, DOI: doi:10.1097/AOG.0b013e31824fb482
STRAVER, ROY ET AL.: "WISECONDOR: detection of fetal aberrations from shallow sequencing maternal plasma based on a within-sample comparison scheme", NUCLEIC ACIDS RESEARCH, vol. 42.5, 2014, pages e31 - e31, XP055235535, DOI: doi:10.1093/nar/gkt992
STEPHANIE, C. YU ET AL.: "Size-based molecular diagnostics using plasma DNA for noninvasive prenatal testing", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 111.23, 2014, pages 8583 - 8588
TYNAN, J. A. ET AL.: "Application of risk score analysis to low-coverage whole genome sequencing data for the noninvasive detection of trisomy 21, trisomy 18, and trisomy 13", PRENATAL DIAGNOSIS, vol. 36.1, 2016, pages 56 - 62
ZIMMERMANN, BERNHARD ET AL.: "Noninvasive prenatal aneuploidy testing of chromosomes 13, 18, 21, X, and Y, using targeted sequencing of polymorphic loci", PRENATAL DIAGNOSIS, vol. 32.13, 2012, pages 1233 - 1241, XP055119823, DOI: doi:10.1002/pd.3993
MAZLOOM, AMIN R. ET AL.: "Noninvasive prenatal detection of sex chromosomal aneuploidies by sequencing circulating cell-free DNA from maternal plasma", PRENATAL DIAGNOSIS, vol. 33.6, 2013, pages 591 - 597, XP055089609, DOI: doi:10.1002/pd.4127
LIANG, DESHENG ET AL.: "Non-invasive prenatal testing of fetal whole chromosome aneuploidy by massively parallel sequencing", PRENATAL DIAGNOSIS, vol. 33.5, 2013, pages 409 - 415, XP055147281, DOI: doi:10.1002/pd.4033
WANG, YANLIN ET AL.: "Maternal mosaicism is a significant contributor to discordant sex chromosomal aneuploidies associated with noninvasive prenatal testing", CLINICAL CHEMISTRY, vol. 60.1, 2014, pages 251 - 259
KIM, SUNG K. ET AL.: "Determination of fetal DNA fraction from the plasma of pregnant women using sequence read counts", PRENATAL DIAGNOSIS, vol. 35.8, 2015, pages 810 - 815, XP055215002, DOI: doi:10.1002/pd.4615
STRAVER, ROY ET AL.: "Calculating the fetal fraction for noninvasive prenatal testing based on genome-wide nucleosome profiles", PRENATAL DIAGNOSIS, vol. 36.7, 2016, pages 614 - 621
JIANG, PEIYONG ET AL.: "FetalQuantSD: accurate quantification of fetal DNA fraction by shallow-depth sequencing of maternal plasma DNA", GENOMIC MEDICINE, vol. 1, 2016, pages 16013
COCK, PETER JA ET AL.: "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants", NUCLEIC ACIDS RESEARCH, vol. 38.6, 2010, pages 1767 - 1771
LI, HENG ET AL.: "The sequence alignment/map format and SAMtools", BIOINFORMATICS, vol. 25.16, 2009, pages 2078 - 2079, XP055229864, DOI: doi:10.1093/bioinformatics/btp352
BENJAMINI, YUVAL; TERENCE P. SPEED: "Summarizing and correcting the GC content bias in high-throughput sequencing", NUCLEIC ACIDS RESEARCH, 2012, pages gks001
LIAO, CAN ET AL.: "Noninvasive prenatal diagnosis of common aneuploidies by semiconductor sequencing", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 111.20, 2014, pages 7415 - 7420, XP055362638, DOI: doi:10.1073/pnas.1321997111
MINARIK, GABRIEL ET AL.: "Utilization of Benchtop Next Generation Sequencing Platforms Ion Torrent PGM and MiSeq in Noninvasive Prenatal Testing for Chromosome 21 Trisomy and Testing of Impact of In Silico and Physical Size Selection on Its Analytical Performance", PLOS ONE, vol. 10.12, 2015, pages e0144811
DATABASE Assembly [O] 27 February 2009 (2009-02-27), "GRCh37", retrieved from NCBI Database accession no. GCA_000001405.1
SNIJDERS, R. J. M.; N. J. SEBIRE; K. H. NICOLAIDES: "Maternal age and gestational age-specific risk for chromosomal defects", FETAL DIAGNOSIS AND THERAPY, vol. 10.6, 1995, pages 356 - 367
NORTON, MARY E.; LAURA L. JELLIFFE-PAWLOWSKI; ROBERT J. CURRIER.: "Chromosome abnormalities detected by current prenatal screening and noninvasive prenatal testing", OBSTETRICS & GYNECOLOGY, vol. 124.5, 2014, pages 979 - 986
HOOK, ERNEST B.; DOROTHY WARBURTON.: "Turner syndrome revisited: review of new data supports the hypothesis that all viable 45, X cases are cryptic mosaics with a rescue cell line, implying an origin by mitotic loss", HUMAN GENETICS, vol. 133.4, 2014, pages 417 - 424
GRATI, FRANCESCA R. ET AL.: "Fetoplacental mosaicism: potential implications for false-positive and false-negative noninvasive prenatal screening results", GENETICS IN MEDICINE, vol. 16.8, 2014, pages 620 - 624
STUMM, MARKUS ET AL.: "Diagnostic accuracy of random massively parallel sequencing for non-invasive prenatal detection of common autosomal aneuploidies: a collaborative study in Europe", PRENATAL DIAGNOSIS, vol. 34.2, 2014, pages 185 - 191, XP055344715, DOI: doi:10.1002/pd.4278
RUSSELL, STUART; PETER NORVIG: "Artificial Intelligence: A modern approach", ARTIFICIAL INTELLIGENCE. PRENTICE-HALL, ENGLEWOOD CLIFFS, vol. 25, 1995, pages 27
Attorney, Agent or Firm:
HAK, Roman (CZ)
Claims:

1. A method for determining sex chromosome aneuploidy, sex and fetal fraction of one or multiple fetuses from a test sample of maternal blood, plasma or serum comprising a mixture of DNA fragments of fetal and maternal origin, wherein said mixture are circulating cell-free DNA molecules, the method comprising neural network model for calculation of fetal fraction distribution and set of hypotheses about sex chromosomal aberrations of the fetus or fetuses in the test sample, wherein the method further comprises the following steps

a) performing a random sequencing on at least a portion of a plurality of the cell-free DNA molecules contained in the test sample and in each training sample from the training sets Tf comprising female samples, Tm comprising male samples, and Te comprising samples with single euploid male fetus and with various fetal fraction values f, thereby obtaining sequence information for a plurality of DNA fragments of fetal and maternal origin from the test maternal sample and from each training sample, wherein the sequence information comprises sequence reads;

b) mapping the sequence reads to the reference human genome, and obtaining the percentage of sequences mapped to chromosomes X and Y for said test sample and each training sample from said training sets Tf, Tm, and Te;

c) performing GC correction on chromosome X but not on chromosome Y or on chromosome Y and not chromosome X for the said test sample and each of training sample from said training sets Tf, Tin, and Te;

d) obtaining mean mapping ratios with their respective standard deviations for X and Y chromosomes and fictitious Z chromosome by computing the numbers muX, muY, muZ, sdX, sdY, sdZ based on said training sets Tf and Tm, and using said numbers muX, muY, muZ, sdX, sdY, sdZ to prepare the hypotheses about sex chromosomal aberrations of the fetus or fetuses in the test sample, wherein the hypotheses are parametrized by said values muX, muY, muZ, sdX, sdY, sdZ and a separate value of fetal fraction f which ranges from 5% to 100% with a step of 0.1%;

e) training neural network model for calculation of fetal fraction distribution on said training set Te, and using the trained neural network to obtain the estimated fetal fraction f of the test sample and the error of this estimate which are together used to define fetal fraction distribution as a normal distribution with the estimated fetal fraction f as its mean and the error of this estimate as its standard deviation, wherein fetal fraction determination is based on the length of DNA fragments of the test sample;

f) calculating the probability of observing the percentage of sequences mapped to chromosomes X and Y for the test sample under each of the said hypotheses, wherein each hypothesis is parametrized by a value which ranges from 5 % to 100 % with a step of 0.1 %, and the computed values muX, muY, muZ, sdX, sdY, sdZ from step d);

g) adjusting the said probabilities from the step f) by multiplying each of the said hypothesis parametrized by a specific value f with the probability of the value f in the fetal fraction distribution obtained in step e);

h) adjusting the said probabilities from the step f) according to a priori information about prevalence of said hypotheses in the general population and optionally according to further relevant a priori information;

i) selecting the most probable of the said hypotheses with the specific parametrization by means of maximum likelihood analysis, thus obtaining the status of sex chromosomal aberrations, sex and fetal fraction for the fetus or fetuses of the test sample.

2. The method according to claim 1, where the sex chromosome aneuploidy is Turner syndrome, X0; Klinefelter syndrome, XXY; triple X syndrome, XXX; and Jacob syndrome, XYY.

3. The method according to claim 1 or 2, where fetal fraction determination is based on the length of DNA fragments limited to all lengths in base pairs from 100 bp to 200 bp.

4. The method according to any one of claims 1 to 3, where maximum likelihood analysis is performed on the said hypotheses parametrized with the values muX, muY, muZ, sdX, sdY, sdZ from step d) and with discretized fetal fraction f starting at 5 % and by a step of 0.1 % up to 100 %.

5. The method according to any one of claims 1 to 4, where further relevant a priori information is one or more from maternal age, gestation week, maternal history of aneuploid pregnancies, and various biochemical markers such as markers based on alpha-fetoprotein, human chorionic gonadotropin, or estriol.

6. The method according to any one of claims 1 to 5, where the hypotheses about sex chromosomal aberrations of the fetus or fetuses define fetus or fetuses selected from the group consisted of euploid female fetus, euploid male fetus, female fetus with Turner syndrome, female fetus with triple X syndrome, male fetus with Klinefelter syndrome, male fetus with Jacob syndrome for single pregnancy and two euploid female fetuses, two euploid male fetuses, euploid female fetus with euploid male fetus, euploid female fetus with Turner female fetus, euploid female fetus with triple X female fetus, euploid female fetus with Klinefelter male fetus, euploid female fetus with Jacob male fetus, euploid male fetus with Turner female fetus, euploid male fetus with triple X female fetus, euploid male fetus with Klinefelter male fetus, euploid male fetus with Jacob male fetus, two female fetuses with Turner syndrome, Turner female fetus with triple X female fetus, Turner female fetus with Klinefelter male fetus, Turner female fetus with Jacob male fetus, two female fetuses with Triple X syndrome, triple X female fetus with Klinefelter male fetus, triple X female fetus with Jacob male fetus, two male fetuses with Klinefelter syndrome, Klinefelter male fetus with Jacob male fetus and two male fetuses with Jacob syndrome.

7. A computer implemented method for determining sex chromosome aneuploidy, sex and fetal fraction of one or multiple fetuses from a random sequencing on at least a portion of a plurality of the cell-free DNA molecules contained in the sample of maternal blood, plasma or serum comprising a mixture of DNA fragments of fetal and maternal origin, and random sequencing on at least a portion of a plurality of the cell-free DNA molecules contained in the training sample, said method comprising neural network model for calculation of fetal fraction distribution and set of hypotheses about sex chromosomal aberrations of the fetus or fetuses in the test sample, wherein the method further comprises the steps b) to h) of the method of claim 1.

8. The method according to claim 7, where the sex chromosomal aneuploidy is selected from Turner syndrome, X0; Klinefelter syndrome, XXY; triple X syndrome, XXX; and Jacob syndrome, XYY.

9. The method according to claim 7 or 8, where fetal fraction determination is based on the length of DNA fragments limited to all lengths in base pairs from 100 bp to 200 bp.

10. The method according to any one of claims 7 to 9, where maximum likelihood analysis is performed on the said hypotheses parametrized with the values muX, muY, muZ, sdX, sdY, sdZ from step d) and with discretized fetal fraction f starting at 5 % and by a step of 0.1 % up to 100 %.

11. The method according to any one of claims 7 to 10, where further relevant a priori information is one or more from maternal age, gestation week, and maternal history of aneuploid pregnancies, and various biochemical markers such as markers based on alpha-fetoprotein, human chorionic gonadotropin, or estriol.

12. The method according to any one of claims 7 to 11, where the hypotheses about sex chromosomal aberrations of the fetus or fetuses define fetus or fetuses selected from the group consisted of euploid female fetus, euploid male fetus, female fetus with Turner syndrome, female fetus with triple X syndrome, male fetus with Klinefelter syndrome, male fetus with Jacob syndrome for single pregnancy and two euploid female fetuses, two euploid male fetuses, euploid female fetus with euploid male fetus, euploid female fetus with Turner female fetus, euploid female fetus with triple X female fetus, euploid female fetus with Klinefelter male fetus, euploid female fetus with Jacob male fetus, euploid male fetus with Turner female fetus, euploid male fetus with triple X female fetus, euploid male fetus with Klinefelter male fetus, euploid male fetus with Jacob male fetus, two female fetuses with Turner syndrome, Turner female fetus with triple X female fetus, Turner female fetus with Klinefelter male fetus, Turner female fetus with Jacob male fetus, two female fetuses with Triple X syndrome, triple X female fetus with Klinefelter male fetus, triple X female fetus with Jacob male fetus, two male fetuses with Klinefelter syndrome, Klinefelter male fetus with Jacob male fetus and two male fetuses with Jacob syndrome.

13. A computer program product comprising a computer readable medium comprising a plurality of instructions for controlling a computing system to perform the method according to any one of claims 7 to 12.
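As an illustration of the hypothesis set recited in claims 6 and 12, the following minimal Python sketch enumerates the six single-pregnancy hypotheses and all unordered pairs of them for twin pregnancies. The names SINGLETON_STATES and build_hypotheses are hypothetical and chosen only for this example; the sketch merely reproduces the counting of the claimed combinations and is not the applicant's implementation.

```python
from itertools import combinations_with_replacement

# The six fetal states named in the claims (labels are illustrative).
SINGLETON_STATES = [
    "euploid female (XX)",
    "euploid male (XY)",
    "Turner (X0)",
    "triple X (XXX)",
    "Klinefelter (XXY)",
    "Jacob (XYY)",
]


def build_hypotheses():
    """Return single-pregnancy and twin-pregnancy hypotheses as tuples."""
    # One fetus: each state on its own.
    singles = [(state,) for state in SINGLETON_STATES]
    # Two fetuses: every unordered pair, including two fetuses in the same state.
    twins = list(combinations_with_replacement(SINGLETON_STATES, 2))
    return singles, twins


singles, twins = build_hypotheses()
print(len(singles), "single-pregnancy hypotheses")
print(len(twins), "twin-pregnancy hypotheses")
```

Enumerating unordered pairs with repetition yields the same 21 twin combinations that the claims list one by one, so together with the six single-pregnancy cases there are 27 hypotheses before the fetal fraction parametrization is applied.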

AMENDED CLAIMS

received by the International Bureau on 26 November 2018 (26.11.2018)

1. A method for determining sex chromosome aneuploidy, sex and fetal fraction of one or multiple fetuses from a test sample of maternal blood, plasma or serum comprising a mixture of DNA fragments of fetal and maternal origin, wherein said mixture are circulating cell-free DNA molecules, the method comprising neural network model for calculation of fetal fraction distribution and set of hypotheses about sex chromosomal aberrations of the fetus or fetuses in the test sample, wherein the method further comprises the following steps

a) performing a random sequencing on at least a portion of a plurality of the cell-free DNA molecules contained in the test sample and in each training sample from the training sets Tf comprising adult female samples, Tm comprising adult male samples, and Te comprising samples of pregnant women with single euploid male fetus and with various fetal fraction values f, thereby obtaining sequence information for a plurality of DNA fragments of fetal and maternal origin from the test maternal sample and from each training sample, wherein the sequence information comprises sequence reads;

b) mapping the sequence reads to the reference human genome, and obtaining the percentage of sequences mapped to chromosomes X and Y for said test sample and each training sample from said training sets Tf, Tm, and Te;

c) performing GC correction on chromosome X but not on chromosome Y or on chromosome Y and not chromosome X for the said test sample and each of training sample from said training sets Tf, Tm, and Te;

d) obtaining mean mapping ratios with their respective standard deviations for chromosome X and chromosome Y based on the said training set Tf wherein the chromosome Y is referred to as fictitious chromosome Z for the training set Tf and wherein the obtained means and standard deviations related to chromosome X and to fictitious chromosome Z are muX and sdX, and muZ and sdZ, respectively;

e) obtaining mean mapping ratio with its respective standard deviation for chromosome Y based on the said training set Tm, wherein the computed mean and standard deviation are muY and sdY, respectively;

f) using said numbers muX, muY, muZ, sdX, sdY, sdZ to prepare the hypotheses about sex chromosomal aberrations of the fetus or fetuses in the test sample, wherein the hypotheses are parametrized by said values muX, muY, muZ, sdX, sdY, sdZ and a separate value of fetal fraction which ranges from 5% to 100% with a step of 0.1%;

g) training neural network model for calculation of fetal fraction distribution on said training set Te, and using the trained neural network to obtain the estimated fetal fraction f of the test sample and the error of this estimate which are together used to define fetal fraction distribution as a normal distribution with the estimated fetal fraction f as its mean and the error of this estimate as its standard deviation, wherein fetal fraction determination is based on the length of DNA fragments of the test sample;

h) calculating the probability of observing the percentage of sequences mapped to chromosomes X and Y for the test sample under each of the said hypotheses, wherein each hypothesis is parametrized by a value which ranges from 5 % to 100 % with a step of 0.1 %, and the computed values muX, muY, muZ, sdX, sdY, sdZ from steps d) and e);

i) adjusting the said probabilities from the step h) by multiplying each of the said hypothesis parametrized by a specific value f with the probability of the value f in the fetal fraction distribution obtained in step g);

j) adjusting the said probabilities from the step i) according to a priori information about prevalence of said hypotheses in the general population and optionally according to further relevant a priori information;

k) selecting the most probable of the said hypotheses with respect to the probabilities from any of the steps h), i), j) with the specific parametrization by means of maximum likelihood analysis, thus obtaining the status of sex chromosomal aberrations, sex and fetal fraction for the fetus or fetuses of the test sample.

2. The method according to claim 1, where the sex chromosome aneuploidy is Turner syndrome, X0; Klinefelter syndrome, XXY; triple X syndrome, XXX; and Jacob syndrome, XYY.

3. The method according to claim 1 or 2, where fetal fraction determination is based on the length of DNA fragments limited to all lengths in base pairs from 100 bp to 200 bp.

4. The method according to any one of claims 1 to 3, where maximum likelihood analysis is performed on the said hypotheses parametrized with the values muX, muY, muZ, sdX, sdY, sdZ from steps d) and e) and with discretized fetal fraction f starting at 5 % and by a step of 0.1 % up to 100 %.

5. The method according to any one of claims 1 to 4, where further relevant a priori information is one or more from maternal age, gestation week, maternal history of aneuploid pregnancies, and various biochemical markers such as markers based on alpha-fetoprotein, human chorionic gonadotropin, or estriol.

6. The method according to any one of claims 1 to 5, where the hypotheses about sex chromosomal aberrations of the fetus or fetuses define fetus or fetuses selected from the group consisted of euploid female fetus, euploid male fetus, female fetus with Turner syndrome, female fetus with triple X syndrome, male fetus with Klinefelter syndrome, male fetus with Jacob syndrome for single pregnancy and two euploid female fetuses, two euploid male fetuses, euploid female fetus with euploid male fetus, euploid female fetus with Turner female fetus, euploid female fetus with triple X female fetus, euploid female fetus with Klinefelter male fetus, euploid female fetus with Jacob male fetus, euploid male fetus with Turner female fetus, euploid male fetus with triple X female fetus, euploid male fetus with Klinefelter male fetus, euploid male fetus with Jacob male fetus, two female fetuses with Turner syndrome, Turner female fetus with triple X female fetus, Turner female fetus with Klinefelter male fetus, Turner female fetus with Jacob male fetus, two female fetuses with Triple X syndrome, triple X female fetus with Klinefelter male fetus, triple X female fetus with Jacob male fetus, two male fetuses with Klinefelter syndrome, Klinefelter male fetus with Jacob male fetus and two male fetuses with Jacob syndrome.

7. A computer implemented method for determining sex chromosome aneuploidy, sex and fetal fraction of one or multiple fetuses from a random sequencing on at least a portion of a plurality of the cell-free DNA molecules contained in the sample of maternal blood, plasma or serum comprising a mixture of DNA fragments of fetal and maternal origin, and random sequencing on at least a portion of a plurality of the cell-free DNA molecules contained in the training sample, said method comprising neural network model for calculation of fetal fraction distribution and set of hypotheses about sex chromosomal aberrations of the fetus or fetuses in the test sample, wherein the method further comprises the steps b) to h) of the method of claim 1.

8. The method according to claim 7, where the sex chromosomal aneuploidy is selected from Turner syndrome, X0; Klinefelter syndrome, XXY; triple X syndrome, XXX; and Jacob syndrome, XYY.

9. The method according to claim 7 or 8, where fetal fraction determination is based on the length of DNA fragments limited to all lengths in base pairs from 100 bp to 200 bp.

10. The method according to any one of claims 7 to 9, where maximum likelihood analysis is performed on the said hypotheses parametrized with the values muX, muY, muZ, sdX, sdY, sdZ from step d) and with discretized fetal fraction f starting at 5 % and by a step of 0.1 % up to 100 %.

11. The method according to any one of claims 7 to 10, where further relevant a priori information is one or more from maternal age, gestation week, and maternal history of aneuploid pregnancies, and various biochemical markers such as markers based on alpha-fetoprotein, human chorionic gonadotropin, or estriol.

12. The method according to any one of claims 7 to 11, where the hypotheses about sex chromosomal aberrations of the fetus or fetuses define fetus or fetuses selected from the group consisted of euploid female fetus, euploid male fetus, female fetus with Turner syndrome, female fetus with triple X syndrome, male fetus with Klinefelter syndrome, male fetus with Jacob syndrome for single pregnancy and two euploid female fetuses, two euploid male fetuses, euploid female fetus with euploid male fetus, euploid female fetus with Turner female fetus, euploid female fetus with triple X female fetus, euploid female fetus with Klinefelter male fetus, euploid female fetus with Jacob male fetus, euploid male fetus with Turner female fetus, euploid male fetus with triple X female fetus, euploid male fetus with Klinefelter male fetus, euploid male fetus with Jacob male fetus, two female fetuses with Turner syndrome, Turner female fetus with triple X female fetus, Turner female fetus with Klinefelter male fetus, Turner female fetus with Jacob male fetus, two female fetuses with Triple X syndrome, triple X female fetus with Klinefelter male fetus, triple X female fetus with Jacob male fetus, two male fetuses with Klinefelter syndrome, Klinefelter male fetus with Jacob male fetus and two male fetuses with Jacob syndrome.

13. A computer program product comprising a computer readable medium comprising a plurality of instructions for controlling a computing system to perform the method according to any one of claims 7 to 12.
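The hypothesis scoring recited in the amended claim 1 (calculating probabilities over a fetal fraction grid, weighting by the fetal fraction distribution, adjusting by a priori information, and selecting the most probable hypothesis) can be sketched for a single pregnancy as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the numeric training statistics are invented placeholders, the linear mixing model in expected_ratios and the choice of standard deviation per hypothesis are assumptions made only for illustration, and all function names are hypothetical. Only the quantities muX, sdX, muZ, sdZ, muY, sdY, the 5 % to 100 % grid with a 0.1 % step, and the normal fetal fraction distribution come from the claims.

```python
import math

# Placeholder training statistics (invented values, NOT from the application).
MU_X, SD_X = 0.0500, 0.0005    # X mapping ratio in adult female samples (Tf)
MU_Z, SD_Z = 0.0001, 0.00005   # Y mapping ratio in adult females ("fictitious Z")
MU_Y, SD_Y = 0.0020, 0.0002    # Y mapping ratio in adult male samples (Tm)

# Single-pregnancy hypotheses as (fetal X copies, fetal Y copies).
HYPOTHESES = {"XX": (2, 0), "XY": (1, 1), "X0": (1, 0),
              "XXX": (3, 0), "XXY": (2, 1), "XYY": (1, 2)}

# Discretized fetal fraction: 5.0 % to 100.0 % with a step of 0.1 %.
FF_GRID = [step / 1000.0 for step in range(50, 1001)]


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def expected_ratios(x_copies, y_copies, f):
    """ASSUMPTION: maternal (XX) and fetal contributions mix linearly in f."""
    exp_x = MU_X * ((1.0 - f) + f * x_copies / 2.0)
    exp_y = MU_Z + f * y_copies * (MU_Y - MU_Z)
    return exp_x, exp_y


def classify(obs_x, obs_y, ff_mean, ff_sd, priors):
    """Return the (hypothesis, fetal fraction) pair with the highest score."""
    best = None
    for name, (x_copies, y_copies) in HYPOTHESES.items():
        # ASSUMPTION: female-only noise (sdZ) when the fetus carries no Y.
        sd_y = SD_Y if y_copies > 0 else SD_Z
        for f in FF_GRID:
            exp_x, exp_y = expected_ratios(x_copies, y_copies, f)
            # Probability of the observed X and Y ratios under this hypothesis.
            likelihood = normal_pdf(obs_x, exp_x, SD_X) * normal_pdf(obs_y, exp_y, sd_y)
            # Weight by the fetal fraction distribution and the prior.
            score = likelihood * normal_pdf(f, ff_mean, ff_sd) * priors[name]
            if best is None or score > best[0]:
                best = (score, name, f)
    return best[1], best[2]


# Usage: a synthetic sample generated as a euploid male fetus at f = 10 %.
obs_x, obs_y = expected_ratios(1, 1, 0.10)
uniform_priors = {name: 1.0 for name in HYPOTHESES}  # placeholder priors
print(classify(obs_x, obs_y, ff_mean=0.10, ff_sd=0.02, priors=uniform_priors))
```

In practice the priors would be replaced by population prevalences and any further a priori information, and ff_mean and ff_sd would come from the trained neural network; both are left as inputs here because the claims do not fix their values.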

Description:
A method for non-invasive prenatal detection of fetal sex chromosomal abnormalities and fetal sex determination for singleton and twin pregnancies

FIELD OF THE INVENTION

The invention relates generally to the field of non-invasive prenatal screening and diagnostics. The invention provides a method for detecting the presence or absence of sex chromosomal aneuploidies, particularly monosomy and trisomy of chromosome X and the Klinefelter and XYY syndromes involving chromosome Y, and for determining fetal sex and the proportion of fetal DNA fragments, from a blood sample taken from the mother in an early stage of pregnancy. Moreover, the invention provides a novel approach to calculating the fetal fraction of cell-free DNA fragments, which is used internally and which can also be used separately in other fields of non-invasive prenatal screening and diagnostics. The invention relates to single or twin pregnancies. The extension of the invention to triple or quadruple pregnancies with future technologies is contemplated.

BACKGROUND OF THE INVENTION

Nowadays, prenatal testing is an integral component of obstetric practice. The primary aim of prenatal testing is screening for fetal aneuploidies, such as trisomy of chromosome 21 (Down syndrome), trisomy 18 (Edwards syndrome), and trisomy 13 (Patau syndrome). Another major group of abnormalities are sex chromosome aberrations (SCAs), such as monosomy X (X0, Turner syndrome), XXY (Klinefelter syndrome), XXX (triple X syndrome), and XYY (Jacob syndrome). Although the majority of pregnancies with an aneuploid fetus terminate during fetal development, SCAs are rarely lethal and their phenotypic features are less severe than those of autosomal chromosomal aberrations. The most common SCA, monosomy X, has been estimated to occur in 1-1.5% of pregnancies, and it is a common cause of first-trimester pregnancy loss (approx. 23%). Therefore, the prenatal detection of SCAs is an important genetic test for prenatal screening or diagnostics. Reliable invasive prenatal tests are available; however, because of their risky nature, they are currently performed only in high-risk pregnancies. Developing a reliable method for non-invasive prenatal testing (NIPT) for fetal aneuploidies is a current challenge of the utmost importance in prenatal care. The discovery of circulating cell-free fetal DNA (cffDNA) in maternal blood (Lo et al., Lancet 350:485-487, 1997 [1]) has offered the possibility of developing non-invasive processes that use fetal nucleic acids from a maternal peripheral blood sample to determine fetal chromosomal abnormalities. CffDNA constitutes less than approximately 10% of the total circulating cell-free DNA (cfDNA) in maternal plasma; however, it has recently been found that the entire fetal genome, in the form of cffDNA, is present in maternal blood, and thus it is a very promising material for NIPD.

Several documents (Chiu et al. 2008 [2], Fan et al. 2008 [3], Chiu et al. 2011 [4], Sehnert et al. 2011 [5], Lau et al. 2012 [6], Bianchi et al. 2012 [7], Straver et al. 2014 [8], Yu et al. 2014 [9], Tynan et al. 2016 [10], EP2183693, EP2366031, US8296076) disclose methods in which sequencing of DNA in maternal blood is used to obtain information on aberrant chromosome dosage in the fetus. Methods of particular interest for the detection of SCAs are disclosed in Zimmermann et al. 2012 [11], Mazloom et al. 2013 [12], Liang et al. 2013 [13], and Wang et al. 2014 [14]. All methods mentioned above analyze the total cfDNA in maternal plasma without the need to isolate fetal-specific DNA (cffDNA). These methods are based on the detection of an extra or missing copy of a chromosome to distinguish normal cases from trisomy or monosomy cases. In the case of a fetus with trisomy (or monosomy), the number of copies of the trisomic (monosomic) chromosome in the maternal blood is slightly higher (lower) in comparison with other autosomes. Basically, the same applies to sex chromosome aberrations, where an additional or missing copy is either from chromosome X or chromosome Y. The advent of massively parallel DNA sequencing (MPS) permitted sequencing of extremely large quantities of DNA molecules. Thus, next-generation DNA sequencing (NGS) has recently been used for non-invasive detection of fetal trisomy from maternal blood.
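To make the "slightly higher (lower)" chromosome dosage concrete, the following short Python sketch computes the expected relative amount of a chromosome in maternal plasma for a given fetal fraction. It assumes a simple mixture in which the mother carries two copies and maternal and fetal fragments contribute in proportion to the fetal fraction; the function name relative_dosage is hypothetical and the model is an illustration, not a formula taken from the cited documents.

```python
def relative_dosage(fetal_fraction, fetal_copies):
    """Expected amount of a chromosome relative to a fully disomic sample.

    ASSUMPTION: the mother carries two copies, the fetus carries
    `fetal_copies`, and both contribute in proportion to the fetal fraction.
    """
    maternal_part = (1.0 - fetal_fraction) * 2
    fetal_part = fetal_fraction * fetal_copies
    return (maternal_part + fetal_part) / 2.0


# With a fetal fraction of 10 %, a trisomy raises the dosage by about 5 %
# and a monosomy lowers it by about 5 %.
for copies in (1, 2, 3):
    print(copies, round(relative_dosage(0.10, copies), 4))
```

The shift is only half the fetal fraction, which is why the signal is small and why a large number of sequenced fragments is needed to resolve it.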

Generally, the detection of fetal SCAs such as monosomy or trisomy of X in female fetuses using NGS is done through the following process. First, a short region at one end of each DNA molecule of maternal plasma is sequenced and mapped against the reference human genome to determine the chromosomal origin of each sequence. Next, the amount of the sequenced tags from the chromosome of interest (e.g. chromosome X) is compared with a reference value obtained from a cohort of euploid samples. A crucial step of the analysis is to estimate the proportion of DNA fragments originating from the fetus, called the fetal fraction. If this proportion is low, an affected sample may be incorrectly classified as healthy because of the abundance of maternal fragments.

DESCRIPTION OF THE PRIOR ART

The methods of Chiu et al. 2008, Chiu et al. 2011, Sehnert et al. 2011, Bianchi et al. 2012, and Lau et al. 2012 for the detection of autosomal aneuploidies such as trisomy 21 are based on whole genome sequencing using MPS and z-score determination. Although the z-score is widely accepted as the standard parameter for detection of aneuploid samples, there are differences in its calculation. Chiu et al. 2011 disclosed an approach using reads mapped to all chromosomes as the reference for z-score calculation, Lau et al. 2012 disclosed an approach using reads mapped to a specific chromosome (e.g. chromosome 14 as the reference for trisomy 21), and Sehnert et al. 2011 chose chromosome 9 as the optimal internal reference for chromosome 21. We consider these methods to be applicable to SCAs as well, as is particularly noted in some of the documents.

The method of Zimmermann et al. 2012 is based on targeted sequencing of specific polymorphic loci (SNPs). The method generates multiple hypotheses about how the sequencing data should look for a monosomic, disomic, or trisomic chromosome with a particular fetal fraction. The likelihood of each hypothesis is then calculated given the observed data, and the most likely hypothesis is selected. Part of the method relies on an adaptation of the proprietary alignment algorithm Novoalign (Novocraft, Selangor, Malaysia), a proprietary statistical algorithm Parental Support implemented in MATLAB (MathWorks, Natick, MA, USA), and data from the HapMap database 15. A potential disadvantage of this method with respect to the cost of the test is the requirement to sequence parental DNA to measure parental genotypes.

The method of Mazloom et al. 2013 is based on over- and under-representation of the X and Y chromosomes when compared to a cohort of euploid samples with female fetuses in the case of chromosome X, and a cohort of euploid males in the case of chromosome Y. The method appears to be similar to that of Chiu et al. 2008. No z-score or hypothesis likelihood is reported in sample diagnostics. However, a z-score on chromosome X is used internally to define classification regions (see Supporting information). A decision tree is used to classify a sample (45 X, 46 XX, 47 XXX, 47 XXY, 47 XYY, or 46 XY). The method of Liang et al. 2013, which is said to detect trisomy of chromosomes 9, 13, 18, and 21, as well as the sex chromosome aberrations X0, XXX, XXY, and XYY, is based on normalized chromosomal values (NCR), where NCR equals the count of the sequences uniquely mapped to the chromosome of interest divided by the total count of the sequences uniquely mapped to all the autosomal chromosomes. The NCR values are used to produce z-scores in a manner common in the art. Classification of the samples as euploid or aberrant is done by means of this z-score and various cut-off values, different for each aneuploidy.
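
The NCR and z-score computation described above can be illustrated with a minimal sketch. The function names, the reference values, and the cut-off of 3 are illustrative assumptions and are not taken from the cited publications.

```python
from statistics import mean, stdev

def ncr(chrom_count, autosomal_total):
    """Reads uniquely mapped to the chromosome of interest divided by
    reads uniquely mapped to all autosomes."""
    return chrom_count / autosomal_total

def z_score(sample_ncr, reference_ncrs):
    """Standardize a sample NCR against a cohort of euploid reference NCRs."""
    return (sample_ncr - mean(reference_ncrs)) / stdev(reference_ncrs)

# Hypothetical euploid reference cohort (made-up numbers).
reference = [0.0500, 0.0502, 0.0498, 0.0501, 0.0499]
sample = ncr(chrom_count=5_100, autosomal_total=100_000)
z = z_score(sample, reference)
is_flagged = abs(z) > 3.0  # illustrative cut-off; real cut-offs differ per aneuploidy
```

In practice the cut-off would be chosen separately for each aneuploidy, as the text notes.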

The method of Wang et al. 2014 is based on dividing each chromosome into contiguous 20 kbp bins. Given a training set of euploid pregnancies, a special normalized read number for each bin is calculated, and the median value is stored. Then, to ascertain the gain or loss of chromosome regions, the corresponding bins of a test sample are compared with the stored median values by means of a fused lasso algorithm (least absolute shrinkage and selection operator). The level of chromosome mosaicism is reported as well, and it is calculated by means of normalized chromosome representations NCR as (NCRj - NCRf)/NCRf, where NCRj is the NCR of any chromosome of a test sample, and NCRf is the mean NCR of the same chromosome in a set of reference samples.

Generally, there are three main stages in the determination of fetal aneuploidy from a maternal blood, plasma or serum sample: 1) preparation of the DNA sample and DNA library, 2) sequencing, and 3) analysis of the sequence data. Sequencing has made remarkable progress in the past few years; however, in the first and the third stage there is room for improvements that might come at low cost and have a big impact on the quality of the testing.

There are various approaches how to improve the first stage, as disclosed for example in EP2366031 (Rava, R.P. et al., assigned to Verinata Health, Inc., US). The document disclosed a method for prenatal screening and diagnostics of fetal chromosomal aneuploidy on the basis of NGS comprising a novel protocol for preparing sequencing libraries from a maternal sample. The novel approach in preparing sequencing libraries comprises the consecutive steps of end-repairing, dA-tailing and adaptor ligating said nucleic acids, and wherein said consecutive steps exclude purifying the end-repaired products prior to the dA-tailing step and exclude purifying the dA-tailing products prior to the adaptor-ligating step. The method allows for determining copy number variations (CNV) of any sequence of interest. No z-score was determined as a decisive value in this method. Another improvement was disclosed in WO 2011/051283 (Benz, M. et al., assigned to Lifecodexx, AG, DE). The method for non-invasive diagnosis of chromosomal aneuploidy disclosed therein is improved by the enrichment and quantification of selected cfDNA sequences in a maternal blood sample.

Description of the prior art for estimating fetal fraction

Since a male fetus has a different pair of sex chromosomes (X and Y) than the mother (two copies of X), fetal fraction is routinely estimated from the abundance of fragments mapped to these chromosomes. This approach is, however, limited to samples with a male fetus. The fetal fraction of a female fetus may be estimated from characteristics that differ between fetal and maternal fragments.
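
As an illustration of the chromosome Y based estimate mentioned above, the following minimal sketch scales the observed chromosome Y mapping ratio between a female background and a euploid male reference. The linear formula, the function name and all parameter names are assumptions made for illustration; the cited prior art may use a different normalization.

```python
def fetal_fraction_from_chr_y(y_ratio_sample, y_ratio_male, y_ratio_female=0.0):
    """Estimate fetal fraction for a pregnancy with a male fetus.

    y_ratio_sample: fraction of reads mapped to chromosome Y in the test sample
    y_ratio_male:   mean chromosome Y ratio in euploid male reference samples
    y_ratio_female: background chromosome Y ratio seen in female samples
                    (erroneous mappings); assumed constant here
    """
    return (y_ratio_sample - y_ratio_female) / (y_ratio_male - y_ratio_female)

# Made-up numbers: the result is close to 0.10, i.e. a fetal fraction of about 10 %.
estimate = fetal_fraction_from_chr_y(0.00022, 0.00202, 0.00002)
```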

Differences in fragment localization across the genome were utilized in the SeqFF method (Kim et al. 2015 16). The authors demonstrate that fetal fraction varies between regions of the genome and correlates significantly with GC content and the presence of exonic regions. The method therefore uses normalized, binned counts as predictor variables and estimates fetal fraction using standard multivariate regression models.

The Sanefalcon method (Straver et al. 2016 17) uses more detailed information about fragment origin, based on different mechanisms of fragment degradation. It uses the proportion of fragments consistent with precalculated genomic locations of nucleosomes as a fetal fraction predictor, based on the assumption that maternal fragments originate from nucleosome positions more often than fetal fragments.

Another promising characteristic is the length of a fragment. Since fetal fragments tend to be shorter, the profiles of maternal and fetal fragment lengths differ significantly and may be used as a predictor of fetal fraction. The length-based method was proposed in Yu et al. 2014 18.

The fetal genome inherits genomic information evenly from the father and the mother. Depending on the parental origin, it differs at certain positions, most notably in single-nucleotide changes called SNPs. Fragments that are consistent with SNPs specific to the father are most likely of fetal origin and may be used in the prediction. The method FetalQuant (Jiang et al. 2016 19) uses this information; however, it requires another laboratory assay for a genetic map of the parents. Application of the method is thus too time-consuming and expensive for routine diagnosis.

Despite the existence of several methods for non-invasive detection of fetal sex chromosome aneuploidy, there is still a need for an alternative method that would be at least as sensitive and specific as the present methods and less costly at the same time. In addition, current methods for fetal fraction prediction are still limited by their precision or by the requirement for additional laboratory assays.

DESCRIPTION OF THE INVENTION

The present invention provides an alternative and reliable method that is applicable to the practice of non-invasive prenatal screening for sex chromosome aneuploidies such as monosomy X (X0, Turner syndrome), XXY (Klinefelter syndrome), XXX (triple X syndrome), and XYY (Jacob syndrome). It provides simultaneous diagnosis of aneuploidy, sex and fetal fraction using a single model which can be easily expanded to other chromosomal aneuploidies and disorders, such as monosomy or trisomy of the 13th, 18th, 21st or any other chromosome. Additionally, the method internally uses a novel method for the calculation of the fetal fraction of cfDNA fragments in maternal blood, which can be used separately in fields other than SCA determination, such as (but not limited to) trisomy T21, T18, or T13 detection. Furthermore, the method needs a relatively low amount of sequencing data; therefore, it is relatively cheap and would be affordable even for small healthcare institutions.

The following description explains the main features of the method of the present invention; however, it does not imply that the invention must include all features and aspects described herein. The skilled person will get a full understanding of the present invention from the following description together with the examples, where some specific features and aspects are explained in more detail.

The technical and scientific terms used herein have the same meaning as commonly understood by the persons skilled in the art of medicine, molecular genetics, prenatal non-invasive diagnostics, molecular biology and bioinformatics. Some specific terms are explained below.

Definitions

Aneuploidy is used in the common sense known to the person skilled in the art, it means an imbalance of genetic material caused by a loss or gain of a whole chromosome or part of chromosome. In other words, it means the presence of the entire excessive chromosome or the absence of full chromosome or partial chromosome duplication or deletions of a significant size (> 1 kbp). Specifically, T21, T18 and T13 denote the most common types of autosomal trisomies that survive to birth in humans, which are trisomy of chromosome 21 resulting in Down syndrome, trisomy of chromosome 18 resulting in Edwards syndrome and trisomy of chromosome 13 resulting in Patau syndrome. Additionally, X0, XXX, XXY, and XYY denote the common types of sex chromosomal aneuploidies that survive to birth in humans, which are monosomy of chromosome X resulting in Turner syndrome, trisomy of chromosome X resulting in triple X syndrome, karyotype 47 XXY resulting in Klinefelter syndrome, and karyotype 47 XYY resulting in Jacob syndrome.

The term massively parallel sequencing (MPS), used in combination with next generation sequencing (NGS), means techniques, well known to persons skilled in the art, for sequencing a huge number (on the order of millions or tens of millions) of DNA fragments, as practiced in the art using Illumina or IonTorrent analyzers. Protocols for whole genome sequencing are known to the skilled person and can be found in the examples below.

Sequence reads are the short DNA sequences obtained from NGS sequencing that are nevertheless long enough (e.g. at least about 30 bp) to serve as sequence tags, i.e. they can be assigned unambiguously to one of the chromosomes (1-22, X, Y). A small degree of mismatch can usually be allowed (1 bp). In fact, the tags are assigned, or rather mapped, reads. The tags are uniquely mapped to a reference genome, i.e. they are assigned to a single location in the reference genome. Tags that can be mapped to more than one location on a reference genome, i.e., tags that do not map uniquely, are excluded from the analysis.

Mapping means an alignment of the sequence information from NGS (i.e. a DNA fragment whose genomic position is unknown) with a matching sequence in the reference human genome. This can be done in several ways; we used the method of Liu et al. 2014. As used herein, the term human reference genome or reference genome refers to the hg19 sequence 20. As used herein, the terms aligned or alignment refer to one or more sequences that are identified as a match in terms of the order of their nucleic acid molecules to a known sequence from a reference genome. Such alignment can be done manually or by computer algorithms that are well known to the persons skilled in the art of molecular biology and bioinformatics. The matching of a sequence read in aligning can be a 100% sequence match or less than 100% (non-perfect match).

Fetal fraction is the proportion of cfDNA fragments originated from the fetus compared to all sequenced fragments.

Normal (or Gaussian) distribution is a common continuous probability distribution. It is defined by its probability density function

$$f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

We will often denote this function as $N(\mu, \sigma^2)$. If some stochastic variable X has normal distribution, we will denote this with $X \sim N(\mu, \sigma^2)$.

Artificial neural network is computational approach used in computer science and other research disciplines, which is based on a large collection of neural units (artificial neurons), loosely mimicking the way a biological brain solves problems with large clusters of biological neurons connected by axons. Each neural unit is connected with many others, and links can be enforcing or inhibitory in their effect on the activation state of connected neural units. Each individual neural unit may have a summation function which combines the values of all its inputs together. There may be a threshold function or limiting function on each connection and on the unit itself, such that the signal must surpass the limit before propagating to other neurons.

21

FASTQ format is a text-based file format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. Both the sequence letter and quality score are each encoded with a single ASCII character for brevity. It was originally developed at the Wellcome Trust Sanger Institute to bundle a FASTA sequence and its quality data, but has recently become the de facto standard for storing the output of high-throughput sequencing instruments such as the Illumina Genome Analyzer.

SAM is a text-based file format, developed by Heng Li, for storing biological sequences aligned to a reference sequence. The acronym SAM stands for Sequence Alignment/Map. It is widely used for storing data, such as nucleotide sequences, generated by next generation sequencing technologies.

GC content bias describes the dependence between fragment count (read coverage) and GC content found in Illumina sequencing data. This bias can dominate the signal of interest for analyses that focus on measuring fragment abundance within a genome, such as copy number estimation. The bias is not consistent between samples, and there is no consensus as to the best methods to remove it in a single sample. There are many proposed approaches to correcting this bias, such as Benjamini and Speed 2012 23 or Liao et al. 2014 24.

The term hypothesis in the context of this patent refers to the possible status of fetal sex chromosomal aberrations together with the fetal fraction of cfDNA fragments in maternal blood, plasma or serum. A part of the method according to the invention is formulation of such hypotheses in probability theory, and calculating the probability of each such hypothesis based on the observations made from cfDNA fragments found in the said blood, plasma or serum.

SUMMARY OF THE METHOD ACCORDING TO THE INVENTION

The present invention relates to a method for determining aneuploidy of fetal sex chromosomes from a maternal blood, plasma or serum sample comprising a mixture of DNA fragments of fetal and maternal origin, wherein the DNA fragments of fetal and maternal origin are circulating cell-free DNA molecules, said method comprising four main stages: 1) obtaining and treating samples of maternal blood, 2) preparation of the DNA sample and DNA library, 3) sequencing, and 4) analysis of the sequence data to obtain a prediction, or in other words, a diagnosis. An important part of the method is the preparation of the training data, which need to pass through the same laboratory process as the test samples (samples under examination).

The starting point of the present method is the analysis of the maternal sample, i.e., peripheral blood, thus the method is non-invasive. The peripheral blood comprises cell free DNA (cfDNA) that is a mixture of fragments of maternal DNA and fetal DNA (cffDNA). Said fragments are further named also cfDNA fragments or briefly DNA fragments. CffDNA is what matters, therefore fetal DNA can be enriched (by selection of the shorter fragment, either in silico or physical selection as in Minarik et al. 2015 25 ). Then, the total DNA sample (still comprising DNA fragments of both maternal and fetal origin, i.e., it is a mixed sample) is subjected to the massively parallel sequencing by NGS approach to obtain huge number of short sequence reads. These reads serve as sequence tags, i.e., they are mapped to a certain genomic region or chromosome.

The present method determines the likelihood of observing NGS sequencing data for chromosome X and Y given a hypothesis about fetal sex chromosomes (e.g., female fetus with Turner syndrome). Various hypotheses relating to the various fetal SCAs are constructed, and the hypothesis with maximum likelihood is selected as the most probable case. The hypotheses are formulated for single or twin pregnancy, male or female fetus, with euploid or aberrant sex chromosomes, and for all fetal fractions (discrete values ranging from 5 % to 100 % with a step of 0.1 %). To increase the precision of the hypothesis selection, a priori fetal fraction distribution (i.e., the likelihood of a particular value being the true fetal fraction of cfDNA in a given sample) is calculated from other independent sequencing data (but still contained in the same sample's NGS sequencing data, so the cost of the test is not increased). In addition to that, a priori occurrence of SCAs is gathered from the current research to scale the proposed hypotheses so that they reflect the real expectations (e.g., expectation of two female fetuses with Turner syndrome is less probable than the expectation of having two euploid female fetuses). Thus, the method incorporates

1) NGS sequencing data of chromosome X and Y,

2) A priori fetal fraction distribution obtained from other aspects of the same NGS data (e.g. fragment length),

3) A priori information about SCAs prevalence in population (published data or own research or both),

into one consistent model to produce the most probable diagnosis.
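
The hypothesis space described above (karyotype hypotheses for single and twin pregnancies, combined with a discretized fetal fraction) can be enumerated as in the following minimal sketch. The constant names are hypothetical, and treating twin hypotheses as unordered pairs is an assumption made here for brevity.

```python
from itertools import combinations_with_replacement

# Fetal sex chromosome hypotheses for a single pregnancy.
SINGLE = ["X0", "XX", "XXX", "XY", "XXY", "XYY"]

# Twin hypotheses as unordered pairs, e.g. "X0-X0", "X0-XX", "XX-XY".
TWIN = [f"{a}-{b}" for a, b in combinations_with_replacement(SINGLE, 2)]

# Discrete fetal fractions from 5 % to 100 % with a step of 0.1 %.
FETAL_FRACTIONS = [step / 1000 for step in range(50, 1001)]

# Every (hypothesis, fetal fraction) pair scored for a single pregnancy.
single_grid = [(h, f) for h in SINGLE for f in FETAL_FRACTIONS]
```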

The improvement over prior art is manifold:

1) Novel approach to SCAs detection for single pregnancies.

2) Consistent inclusion of twin pregnancies in SCAs detection.

3) Method applicable to monosomy or trisomy of any fetal autosome in single pregnancy in a consistent way.

4) Easy theoretical extension to any larger number of fetuses (e.g. triplets). However, current sequencing technologies do not yet allow this at a reasonable cost in wide practice.

5) Establishing (by a separate calculation) fetal fraction distribution in the process as opposed to single number in prior art. Also, this fetal fraction distribution can be used separately in other fields.

6) Incorporation of separately calculated fetal fraction distribution into the calculation of SCAs.

7) Incorporating the prevalence of the SCAs in the population into the computation. This is especially crucial in twin pregnancies, where different SCAs can compensate each other's deficits/excesses in NGS data.

8) The method does not require sequencing of parental DNA as in Zimmermann et al. 2012.

9) The method is more tailored to the problem at hand when compared with methods adapted from trisomy detection such as in Chiu et al. 2008, Chiu et al. 2011, Sehnert et al. 2011, Bianchi et al. 2012, Lau et al. 2012.

10) Improvements in data processing prior to the analysis (regarding the application of GC correction, see below).

DETAILED DESCRIPTION OF THE METHOD ACCORDING TO THE INVENTION

Below we assume that each sample (i.e. test sample as well as training sample) mentioned in this description is a result of processing of a blood sample in the laboratory according to the description below (see Example 2) and, subsequently, subjected to next generation sequencing, i.e., it is a collection of NGS data (sequenced cfDNA fragments) stored in the FASTQ (or equivalent) format. Moreover, we will assume that the sample's FASTQ file passed a mapping process, where the content of this file is aligned to a reference human genome (such as hg19 26), and this mapping information is stored in a SAM (or equivalent) file format. Finally, we will assume that the content of the SAM file was processed, with optional filtration of the data, such as removing ambiguous mappings, or GC bias correction, so that we know for the given sample the amount of cfDNA fragments that originated from any chromosome in absolute as well as relative terms (here, relative is understood with respect to the global number of cfDNA fragments in the SAM file, perhaps after filtration and bias correction). The method described below is not particularly dependent on the implementation of the described data processing (provided the resulting data are of sufficient quality). The above-mentioned data processing is well known in the prior art, and apart from our improvements (see section Improvements in the initial data processing), it shall not be described further. The cfDNA fragments are further named also DNA fragments or simply cfDNA interchangeably.

The method according to the present invention comprises a likelihood analysis of several tailor-made hypotheses about the condition of the sex chromosomes of the fetus (or fetuses, in case of twin pregnancy). The type of pregnancy, i.e., single or twin, is specified by the operator before using the method (it is determined by ultrasonic examination of the mother).

The method operates in the steps described below, some of which need to be carried out only once. Gathering of the training data and hypothesis formulation need to be performed only once, at the beginning of the implementation of the method into practice. The training of the neural network model for calculation of fetal fraction distribution needs to be performed only once as well. On the other hand, fetal fraction distribution must be calculated by the trained neural network for each test sample. Similarly, the probability of each hypothesis given the specific NGS data must be calculated for each test sample separately.
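
The read counting assumed above (a SAM file reduced to per-chromosome counts and mapping ratios) might look like the following minimal sketch. It reads plain SAM text lines; the function names and the mapping-quality threshold used as a proxy for unique mapping are assumptions, and GC bias correction is deliberately left out.

```python
from collections import Counter

# SAM flag bits: 0x4 unmapped, 0x100 secondary, 0x800 supplementary alignment.
SKIP_FLAGS = 0x4 | 0x100 | 0x800

def count_reads_per_chromosome(sam_lines, min_mapq=30):
    """Count primary, mapped alignments per reference sequence name."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & SKIP_FLAGS or mapq < min_mapq:  # assumed filtration rule
            continue
        counts[rname] += 1
    return counts

def mapping_ratios(counts):
    """Relative counts with respect to all retained reads."""
    total = sum(counts.values())
    return {chrom: n / total for chrom, n in counts.items()}
```

The ratios for chromosomes X and Y obtained this way correspond to the mapping ratios used in the remainder of the description.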

Gathering training data. First, we collect a cohort of female samples (of size at least 100), and a cohort of male samples (of size at least 100). We will denote the first training set as Tf and the second as Tm. From the first training set Tf we obtain a mean mapping ratio for chromosome X (i.e., the average fraction of sequenced reads mapped to chromosome X). However, what we actually observe is the mean mapping ratio for two X chromosomes because euploid females have two copies of chromosome X. We use a statistical analysis to determine the mapping ratio for one chromosome (described below). Furthermore, there usually are some reads mapped to chromosome Y even for women, which are mostly erroneous mappings from homologous areas with chromosome X. Such mappings are in the set Tf as well. Therefore, we create fictitious chromosomes, here termed chromosomes Z, which produce reads mapped to chromosome Y from chromosome X. The number of chromosomes Z will depend on the number of chromosomes X. Thus, a female sample will have two chromosomes Z. A fetus with Turner syndrome will have only one chromosome Z, while a fetus with triple X syndrome will have three. A female with euploid female fetus will have in sum two (a part of the two will come from mother and another part will come from the fetus). A female with Turner female fetus will have a fraction between one and two which will depend on the fetal fraction of cfDNA fragments belonging to the fetus (the mother will supply two chromosomes Z for her part, but the fetus will only supply one; thus, the sum of the chromosomes Z will depend on the proportion of cfDNA fragments supplied by mother and fetus - fetal fraction). Additionally, male samples or samples with male fetus will have chromosomes Z as well. 
This is because such samples also have chromosomes X which produce erroneous reads mapped to chromosome Y as well (the number of chromosomes X, and thus chromosomes Z, depends on the sample's sex and sex chromosome aberrations). We calculate the mean mapping ratio for chromosome Z from the first training set Tf. We use the same statistical approach as in the case of chromosome X, since there are two chromosomes Z because there are two chromosomes X in each sample from Tf. Finally, we obtain the mean mapping ratio for chromosome Y from the male training set Tm. Thus, we have three trained numbers, from here on termed muX, muY, muZ (mathematical details are stated below). Furthermore, we calculate the standard deviations of these numbers in an analogous manner (mathematical details below), which gives us three additional numbers, from here on termed sdX, sdY, sdZ.
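
The dependence of the number of chromosomes X (and therefore chromosomes Z) on the fetal fraction, described above, can be written as a weighted sum of the maternal and fetal contributions. The following sketch is one natural reading of that description; the function name is hypothetical and the mother is assumed to be euploid.

```python
def effective_x_copies(fetal_x_copies, fetal_fraction, maternal_x_copies=2):
    """Average number of chromosomes X (equally, chromosomes Z) per genome
    equivalent in a mixed maternal and fetal cfDNA sample."""
    return ((1 - fetal_fraction) * maternal_x_copies
            + fetal_fraction * fetal_x_copies)

turner = effective_x_copies(1, 0.10)    # between one and two, about 1.9
euploid = effective_x_copies(2, 0.10)   # two in sum
triple_x = effective_x_copies(3, 0.10)  # about 2.1
```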

In addition to the training sets Tf and Tm, we collect a cohort of samples (at least 300) with a single euploid male fetus and with various fetal fraction values f (ranging from 5% to 30%, determined from chromosome Y as in the prior art) that will be used for training the fetal fraction prediction model (see below). This training set will be referred to as Te.

Note that the selection of the three training sets Tf, Tm, and Te does not depend on the sample(s) to be tested for SCAs. We only require that the selection of the samples into the training sets is not biased, i.e., the samples are drawn randomly from a large population, and no sample is repeated in any of the training sets.

Formulating hypotheses. Once we have obtained the six numbers from the previous paragraph (muX, muY, muZ, sdX, sdY, sdZ), we define mapping hypotheses for each sex chromosome aberration including the two euploid cases: X0, XX, XXX, XY, XXY, XYY for single pregnancies, and all their combinations for twin pregnancies (e.g., X0-X0 for twins with Turner syndrome, X0-XX for twins where one is euploid and the other has Turner syndrome, XX-XY for male and female euploid twins, and so on). In each of these hypotheses we assume that the mother is euploid, i.e., she does not have any sex chromosome aberrations nor any large subchromosomal deletions or insertions. Moreover, each of these hypotheses is parameterized by a value of the fetal fraction of cfDNA fragments in maternal blood. In case of single pregnancy, this fetal fraction will be denoted with the symbol f. In case of twin pregnancy, there are two fetal fractions, and we will denote them with symbols f1 and f2.

Fetal fraction distribution - single pregnancy. Naturally, we expect observing a fetal fraction of 10% to be more probable than, say, 60%. To quantify such an expectation with mathematical rigor, we estimate the fetal fraction distribution, i.e., a probability function assigning to any discretized value of fetal fraction the probability of being the true fetal fraction of cfDNA fragments (for a particular sample), using other available sequencing data, namely cfDNA fragment lengths. Note that we use discretized values of fetal fractions because we do not need higher precision than 0.1% (the variation in the source data in the population is in the order of percents).

The method of fetal fraction determination from fragment length is described in Yu et al. 2014. However, in this publication the method assigns only one value to the fetal fraction of a given sample. Such an approach cannot take into account the error of the estimation. Our approach, on the other hand, utilizes the error of the estimation to create a fetal fraction distribution, which is more robust as it incorporates more information. Thus, we do not use any of the algorithms described in the cited publication. Rather, we propose a novel approach not yet described in the prior art (see below). We then use this estimated fetal fraction distribution as a priori information for the assessment of SCAs. For future reference, we denote the fetal fraction distribution with $N(\mathrm{muF}, \mathrm{sgF}^2)$.

Fetal fraction distribution - twin pregnancy. Currently, there is no known method that can determine the fetal fractions f1 and f2 for the twins. Until such a method is conceived, we will assume that the fetal fraction distribution is the same for both fetuses, i.e.,

$$N(\mathrm{muF1}, \mathrm{sgF1}^2) = N(\mathrm{muF2}, \mathrm{sgF2}^2) = N(\mathrm{muF}/2, \mathrm{sgF}^2/2).$$

Here $N(\mathrm{muF}, \mathrm{sgF}^2)$ is the fetal fraction distribution determined as if it was a single pregnancy.

Hypothesis selection. Finally, the likelihood of the given sample's NGS sequencing data is calculated under each hypothesis and for each fetal fraction f (resp. f1 and f2), where the fetal fraction f is discretized, i.e., it starts at 5% and goes to 100% in steps of 0.1% (we require the sample to have at least 5% fetal fraction of cfDNA fragments). Also, the probability of each fetal fraction, given by the determined fetal fraction distribution, is taken into account. The mathematical details are given below. The hypothesis and fetal fraction with maximum likelihood are selected as the most probable case for the given sample. Moreover, the probability of the considered hypotheses is scaled based on their prevalence in the population. Such information can be extracted from the current research on fetal aneuploidy.

Hypothesis interpretation. In case of single pregnancy, the interpretation of the selected hypothesis is straightforward. We discuss several such cases in the examples below. In case of twin pregnancies, the interpretation is not so straightforward because some results cannot be interpreted unambiguously. For example, two euploid female fetuses with equal fetal fractions f1=f2 will appear very similar to one Turner and one triple X female fetus (here the additional chromosome X of one fetus compensates for the missing chromosome X of the other). In this case, the prevalence of the SCAs in the population provides assistance for the decision. Additionally, knowing that there are only two possibilities (XX+XX or X0+XXX) can help the operator to suggest additional tests that provide a decisive answer. More details will be given below in the examples.
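
The hypothesis selection for a single pregnancy can be sketched as a grid search over karyotypes and discretized fetal fractions, combining the data likelihood, the a priori fetal fraction distribution and the prevalence. This is only a minimal illustration: the expected-ratio formulas, the use of fixed standard deviations sdX and sdY, and all numeric values are simplifying assumptions and do not reproduce the mathematical details of the method.

```python
import math

# (number of chromosomes X, number of chromosomes Y) per fetal hypothesis.
KARYOTYPES = {"X0": (1, 0), "XX": (2, 0), "XXX": (3, 0),
              "XY": (1, 1), "XXY": (2, 1), "XYY": (1, 2)}

def log_normal_pdf(x, mu, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

def expected_ratios(n_x, n_y, f, p):
    # ASSUMPTION: X dose relative to a euploid mother (two X copies = 1.0).
    x_dose = (1 - f) + f * n_x / 2
    exp_x = p["muX"] * x_dose
    # ASSUMPTION: true Y signal scales with f; chromosome Z noise with X dose.
    exp_y = p["muY"] * n_y * f + p["muZ"] * x_dose
    return exp_x, exp_y

def select_hypothesis(obs_x, obs_y, p, mu_f, sg_f, prevalence):
    """Return the (karyotype, fetal fraction) pair with the highest score."""
    best = None
    for name, (n_x, n_y) in KARYOTYPES.items():
        for step in range(50, 1001):  # 5.0 % to 100.0 % in 0.1 % steps
            f = step / 1000
            exp_x, exp_y = expected_ratios(n_x, n_y, f, p)
            score = (log_normal_pdf(obs_x, exp_x, p["sdX"])
                     + log_normal_pdf(obs_y, exp_y, p["sdY"])
                     + log_normal_pdf(f, mu_f, sg_f)   # a priori fetal fraction
                     + math.log(prevalence[name]))     # a priori prevalence
            if best is None or score > best[0]:
                best = (score, name, f)
    return best[1], best[2]
```

Working with logarithms keeps the product of small probabilities numerically stable; the twin case would add a second fetal fraction loop and pairs of karyotypes.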

Statistical approach for the determination of the mean mapping ratios

Consider the training set Tf, and let S be any of its samples. The cfDNA fragments that are mapped to chromosome X (and analogously to chromosome Z) originate from either of the sample's two chromosomes X. However, we can observe only the cumulative number of all DNA fragments mapped to chromosome X. Formally, we model this setup as a sum of two independent random variables X1 and X2. We assume that both random variables are normally distributed with unknown parameters muX and sdX^2 (formally, Xi ~ N(muX, sdX^2), i = 1, 2). From the theory of statistics, the sum a*X1 + b*X2 (where a, b are real constants) of independent normally distributed random variables X1 and X2 is again a normally distributed random variable O. It also holds that O ~ N(muO, sdO^2), where muO = a*muX + b*muX = (a + b)*muX and sdO^2 = a^2*sdX^2 + b^2*sdX^2 = (a^2 + b^2)*sdX^2. Since both chromosomes X of sample S are equally likely to contribute to the observed number of fragments mapped to chromosome X, we also have a = b = 1/2. Thus, muO = muX and sdO^2 = (1/2)*sdX^2. The numbers muO and sdO^2 can be determined from the training set Tf. Hence, we have the sought numbers muX and sdX^2.

The numbers muZ and sdZ^2 are determined analogously from the fragments mapped to chromosome Y in the training set Tf.

On the other hand, the numbers muY and sdY^2 can be determined directly from the male training set Tm because there is only one chromosome Y in any sample from this set (we assume that the training set contains only euploid males). However, to account for the errors quantified by chromosome Z, we should subtract the value muZ from the mean chromosome Y ratio obtained from the training set Tm in order to get a better value of muY. Similarly, we should subtract sdZ^2 from the variance of the chromosome Y ratio obtained from the training set Tm in order to get a better value of sdY^2.
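The estimation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name per_copy_parameters and its arguments are hypothetical, the inputs are assumed to be per-sample mapping ratios, and the sample variance is used as the variance estimate (the text does not say which estimator is meant).

```python
from statistics import mean, variance

def per_copy_parameters(female_x, female_y, male_y):
    """Estimate (muX, sdX^2), (muZ, sdZ^2) and (muY, sdY^2).

    female_x, female_y: chromosome X / chromosome Y mapping ratios in Tf.
    male_y: chromosome Y mapping ratios in the male training set.
    """
    # Observed O = X1/2 + X2/2, hence muO = muX and sdO^2 = sdX^2 / 2.
    mu_x = mean(female_x)
    var_x = 2.0 * variance(female_x)
    # Fictitious chromosome Z: fragments mapped to Y in the female set.
    mu_z = mean(female_y)
    var_z = 2.0 * variance(female_y)
    # True chromosome Y: subtract the Z background from the male-set values.
    mu_y = mean(male_y) - mu_z
    var_y = variance(male_y) - var_z
    return {"X": (mu_x, var_x), "Z": (mu_z, var_z), "Y": (mu_y, var_y)}
```

Note that with noisy training data the subtraction for sdY^2 could in principle yield a non-positive value; how to handle that case is not specified in the text.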

Determination of a priori fetal fraction distribution - single pregnancy

We based our fetal fraction model on fragment lengths. This aspect of NGS data carries enough information to sufficiently estimate fetal fraction as shown in Yu et al. 2014, and its nature is independent from the chromosome mapping data. While the prior art method utilized various ratios to train and model fetal fraction correlation with fragment length distribution in order to produce one result value, our approach is built on an artificial neural network. Moreover, our approach results in a probability distribution rather than one number.

Our neural network consists of two layers: a base layer for input nodes and an output layer consisting of one output node (with sufficiently large training set, a more complex network with hidden layers can be designed). Moreover, each input node is connected with the output node. A detailed description of the network can be found in Example 1 below.

First, the neural network is trained on the first third of the samples from the training set Te. A sample's sequencing data, namely the mapped cfDNA fragments from all chromosomes, are classified according to their length, which results in a data histogram (only lengths from 100 bp to 200 bp are considered; all other fragments are discarded). The neural network is trained in the usual way, known to the person skilled in the art. The guiding values during the training are fetal fractions obtained from cfDNA fragments mapped to chromosome Y as in the prior art.

Second, once the neural network is trained, we calculate the fetal fraction of the second third of the samples from the training set Te according to the trained neural network as well as chromosome Y (as in the prior art). By comparing these two sets of fetal fraction, we can fit a linear model f(x) = a*x + b to the data to correct any discrepancy between Y-based and length-based fetal fraction.

Finally, the last third of the samples from the training set Te is used to assess the error of the length-based fetal fraction. We calculate the fetal fraction of this last third of the training samples according to the neural network with the linear transformation as well as chromosome Y. By comparing these values pair-wise for each sample, we calculate the standard deviation of the differences between length-based and Y-based fetal fractions.

Once the neural network with the linear transformation is trained, it can be directly used to predict the fetal fraction of a given sample. Moreover, by comparing the predicted fetal fractions with the known Y-based fetal fractions in the last third of the training set Te, we can calculate the error of the prediction. Thus, instead of using a specific value for the estimated fetal fraction as in the prior art method (which can be off from the true value by some unknown amount), we use a normal distribution N(muF, sgF^2) specifying the probability distribution for the values of fetal fraction f, where muF is the prediction we got from the neural network for a given test sample, and sgF is the estimate of the error we got from the training set. It follows from this approach that the width sgF^2 of the distribution N(muF, sgF^2) is the same for all samples, while the center muF changes with each sample. This approach to fetal fraction estimation by a neural network together with the error estimation has not yet been described in the prior art, and it yields additional information, thus increasing the precision of the method according to the present invention.
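The second and third steps (linear correction and error estimation) can be illustrated as follows. This is a minimal sketch, not the implementation of the invention: the function names are hypothetical, the raw network outputs are taken as given lists, and it is assumed that the linear model maps the length-based value x to the Y-based value f(x).

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Ordinary least squares fit of y ~ a*x + b (second third of Te)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def prediction_error(nn_outputs, y_based, a, b):
    """sgF: standard deviation of pair-wise differences (last third of Te)."""
    return stdev((a * p + b) - y for p, y in zip(nn_outputs, y_based))

def fetal_fraction_distribution(nn_output, a, b, sg_f):
    """Return (muF, sgF^2) of the a priori distribution for one test sample."""
    return a * nn_output + b, sg_f ** 2
```

In this sketch sgF is computed once and reused for every test sample, so only muF varies between samples, as stated above.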

Determination of a priori fetal fraction distribution - twin pregnancy

Since there is no known method to determine separate fetal fraction distributions for the twins, we will assume that they contribute equally to the cumulative number of fetal cfDNA fragments in maternal blood. We can calculate the cumulative fetal fraction distribution as in the case of a single pregnancy, thus obtaining N(muF, sgF^2). Then, by the same theory of probability that we used in "Statistical approach for the determination of the mean mapping ratios", the fetal fraction distribution for one fetus in a twin pregnancy is N(muF/2, sgF^2/2). The other fetus has, by our assumption, the same distribution. We will denote the fetal fraction distribution density function of the first fetus by N(muF1, sgF1^2), and that of the second fetus by N(muF2, sgF2^2).
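As a small worked illustration of the equal-contribution assumption, the split of the cumulative distribution can be written as below. The function name twin_distributions is hypothetical.

```python
def twin_distributions(mu_f, var_f):
    """Split the cumulative N(muF, sgF^2) into two per-fetus distributions.

    Under the equal-contribution assumption each fetus gets
    N(muF/2, sgF^2/2); the means and the variances of the two
    independent parts then add up to the cumulative values again.
    """
    per_fetus = (mu_f / 2.0, var_f / 2.0)
    return per_fetus, per_fetus
```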

Formulation of hypotheses - single pregnancy

Each of the presented hypotheses will have a code indicating how many of each chromosome X, Y, or Z is present in the maternal and fetal source. It will be observed that the maternal part never changes (coded as Mxxzz) because the mother is always assumed to have two whole chromosomes X and two whole fictitious chromosomes Z. However, if we had any specific information regarding the chromosome distribution of the mother, we could very easily adjust the hypotheses to accommodate this change. For this reason, the following hypotheses should also be taken as examples according to which one can produce a multitude of other, perhaps case-specific, custom hypotheses. The part of the code belonging to the fetus (beginning with F) changes depending on fetal sex chromosome aberrations (e.g., a euploid female fetus is Fxxzz, a euploid male fetus is Fxzy because it has one chromosome X, one fictitious chromosome Z, and one chromosome Y, and so on). Again, it is easy to adjust the hypotheses on the fetal part to accommodate other fetal chromosome-specific information (such as large deletions or insertions) that we do not want to include in the analysis.

We keep the maternal part of the code even though it does not change to stress the assumption that mother has to be healthy in order for the given hypotheses to work correctly. However, as was pointed out above, case- specific hypotheses can be easily designed to account for any aberration on maternal part.

Euploid female fetus - MxxzzFxxzz

For chromosome X there are four sources of fragments in this case: two maternal chromosomes X and two fetal chromosomes X. We associate these sources with four random variables X1, X2, X3, and X4, all of which have the same normal distribution N(muX, sdX^2). Therefore, the observed cumulative number of fragments mapped to chromosome X is a random variable O = (1-f)*X1 + (1-f)*X2 + f*X3 + f*X4, which is again normally distributed as N(muO, sdO^2) with muO = 2*muX and sdO^2 = 2*(1-f)^2*sdX^2 + 2*f^2*sdX^2, f being the fetal fraction of all cfDNA fragments (hence, 1-f is the maternal fraction of cfDNA).

For chromosome Z we have the analogous random variable P = (1-f)*Z1 + (1-f)*Z2 + f*Z3 + f*Z4 with muP = 2*muZ and sdP^2 = 2*(1-f)^2*sdZ^2 + 2*f^2*sdZ^2. Finally, there is no true chromosome Y.

Thus, we have the expected mapping distributions for chromosomes X and Y in the euploid female case, which will later be compared with the observed data. For short reference, we will denote these numbers with

• MxxzzFxxzz.muX = 2*muX
• MxxzzFxxzz.sdX^2 = 2*(1-f)^2*sdX^2 + 2*f^2*sdX^2
• MxxzzFxxzz.muY = 2*muZ
• MxxzzFxxzz.sdY^2 = 2*(1-f)^2*sdZ^2 + 2*f^2*sdZ^2

It will be observed that these formulas depend on the fetal fraction f. In other words, we have, in fact, many hypotheses MxxzzFxxzz for different values of f. This f ranges from 5% to 100% in steps of 0.1%. Note that this value of f does not depend on the fetal fraction of the test sample. This holds for any hypothesis defined below.

Euploid male fetus - MxxzzFxzy

For chromosome X, there are three sources of fragments: two maternal chromosomes X and one fetal chromosome X. We associate these sources with three random variables X1, X2, and X3, all of which have the same normal distribution N(muX, sdX^2). Therefore, the observed cumulative number of fragments mapped to chromosome X is a random variable O = (1-f)*X1 + (1-f)*X2 + f*X3, which is again normally distributed as N(muO, sdO^2) with muO = (2-f)*muX and sdO^2 = 2*(1-f)^2*sdX^2 + f^2*sdX^2, f being the fetal fraction of all cfDNA fragments (hence, 1-f is the maternal fraction of cfDNA). For chromosome Z we have analogously the random variable P = (1-f)*Z1 + (1-f)*Z2 + f*Z3 with muP = (2-f)*muZ and sdP^2 = 2*(1-f)^2*sdZ^2 + f^2*sdZ^2.

Finally, for the true chromosome Y we have f*Y ~ N(f*muY, f^2*sdY^2).

Thus, we have the expected mapping distributions for chromosomes X and Y in the euploid male case, which will later be compared with the observed data. For short reference, we will denote these numbers with

• MxxzzFxzy.muX = (2-f)*muX
• MxxzzFxzy.sdX^2 = 2*(1-f)^2*sdX^2 + f^2*sdX^2
• MxxzzFxzy.muY = (2-f)*muZ + f*muY
• MxxzzFxzy.sdY^2 = 2*(1-f)^2*sdZ^2 + f^2*sdZ^2 + f^2*sdY^2

Again, there are, in fact, many hypotheses MxxzzFxzy for different values of f.

Analogously, the other hypotheses will be as follows:

Female fetus with Turner syndrome - MxxzzFxz

• MxxzzFxz.muX = (2-f)*muX
• MxxzzFxz.sdX^2 = 2*(1-f)^2*sdX^2 + f^2*sdX^2
• MxxzzFxz.muY = (2-f)*muZ
• MxxzzFxz.sdY^2 = 2*(1-f)^2*sdZ^2 + f^2*sdZ^2

Female fetus with triple X syndrome - MxxzzFxxxzzz

• MxxzzFxxxzzz.muX = (2+f)*muX
• MxxzzFxxxzzz.sdX^2 = 2*(1-f)^2*sdX^2 + 3*f^2*sdX^2
• MxxzzFxxxzzz.muY = (2+f)*muZ
• MxxzzFxxxzzz.sdY^2 = 2*(1-f)^2*sdZ^2 + 3*f^2*sdZ^2

Male fetus with Klinefelter syndrome - MxxzzFxxzzy

• MxxzzFxxzzy.muX = 2*muX
• MxxzzFxxzzy.sdX^2 = 2*(1-f)^2*sdX^2 + 2*f^2*sdX^2
• MxxzzFxxzzy.muY = 2*muZ + f*muY
• MxxzzFxxzzy.sdY^2 = 2*(1-f)^2*sdZ^2 + 2*f^2*sdZ^2 + f^2*sdY^2

Male fetus with Jacob syndrome - MxxzzFxzyy

• MxxzzFxzyy.muX = (2-f)*muX
• MxxzzFxzyy.sdX^2 = 2*(1-f)^2*sdX^2 + f^2*sdX^2
• MxxzzFxzyy.muY = (2-f)*muZ + 2*f*muY
• MxxzzFxzyy.sdY^2 = 2*(1-f)^2*sdZ^2 + f^2*sdZ^2 + 2*f^2*sdY^2
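All six single-pregnancy hypotheses follow one pattern: two maternal copies weighted by (1-f) plus n fetal copies weighted by f. The sketch below expresses this pattern with the fetal copy numbers read off the hypothesis code. The names COPIES and moments and the dictionary keys are illustrative choices, not part of the method, and the chromosome X variance is assumed to be built only from sdX^2 terms, as in the derivation for the euploid cases.

```python
# Fetal copy numbers (X, Z, Y) read off the part of each code after "F".
COPIES = {
    "MxxzzFxxzz": (2, 2, 0),    # euploid female
    "MxxzzFxzy": (1, 1, 1),     # euploid male
    "MxxzzFxz": (1, 1, 0),      # Turner syndrome
    "MxxzzFxxxzzz": (3, 3, 0),  # triple X syndrome
    "MxxzzFxxzzy": (2, 2, 1),   # Klinefelter syndrome
    "MxxzzFxzyy": (1, 1, 2),    # Jacob syndrome
}

def moments(h, f, p):
    """Return (h.muX, h.sdX^2, h.muY, h.sdY^2) for fetal fraction f.

    p holds the per-copy parameters muX, sdX2, muZ, sdZ2, muY, sdY2.
    """
    n_x, n_z, n_y = COPIES[h]
    m = 1.0 - f  # maternal fraction
    mu_x = (2 * m + n_x * f) * p["muX"]
    var_x = (2 * m * m + n_x * f * f) * p["sdX2"]
    mu_y = (2 * m + n_z * f) * p["muZ"] + n_y * f * p["muY"]
    var_y = (2 * m * m + n_z * f * f) * p["sdZ2"] + n_y * f * f * p["sdY2"]
    return mu_x, var_x, mu_y, var_y
```

For example, with n_x = 1 the mean factor 2*(1-f) + f reduces to (2-f), and with n_x = 3 it reduces to (2+f), matching the listed reference numbers.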

Formulation of hypotheses - twin pregnancies

For twin pregnancies, the situation is analogous except that there are now two fetuses, each with its own fraction of cfDNA fragments, f1 and f2. The hypothesis code will now change to M_F1_F2_ (e.g., MxxzzF1xxzzF2xxzz for two euploid female fetuses, MxxzzF1xxzzF2xzy for a euploid female and a euploid male fetus, and so on). The derivation of the hypotheses is analogous to the single-pregnancy cases, so we skip directly to the derived reference numbers.

Two euploid female fetuses - MxxzzF1xxzzF2xxzz

• MxxzzF1xxzzF2xxzz.muX = 2*muX
• MxxzzF1xxzzF2xxzz.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + 2*f1^2*sdX^2 + 2*f2^2*sdX^2
• MxxzzF1xxzzF2xxzz.muY = 2*muZ
• MxxzzF1xxzzF2xxzz.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + 2*f1^2*sdZ^2 + 2*f2^2*sdZ^2

Euploid female fetus with euploid male fetus - MxxzzF1xxzzF2xzy

• MxxzzF1xxzzF2xzy.muX = (2-f2)*muX
• MxxzzF1xxzzF2xzy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + 2*f1^2*sdX^2 + f2^2*sdX^2
• MxxzzF1xxzzF2xzy.muY = (2-f2)*muZ + f2*muY
• MxxzzF1xxzzF2xzy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + 2*f1^2*sdZ^2 + f2^2*sdZ^2 + f2^2*sdY^2

Euploid female fetus with Turner female fetus - MxxzzF1xxzzF2xz

• MxxzzF1xxzzF2xz.muX = (2-f2)*muX
• MxxzzF1xxzzF2xz.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + 2*f1^2*sdX^2 + f2^2*sdX^2
• MxxzzF1xxzzF2xz.muY = (2-f2)*muZ
• MxxzzF1xxzzF2xz.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + 2*f1^2*sdZ^2 + f2^2*sdZ^2

Euploid female fetus with triple X female fetus - MxxzzF1xxzzF2xxxzzz

• MxxzzF1xxzzF2xxxzzz.muX = (2+f2)*muX
• MxxzzF1xxzzF2xxxzzz.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + 2*f1^2*sdX^2 + 3*f2^2*sdX^2
• MxxzzF1xxzzF2xxxzzz.muY = (2+f2)*muZ
• MxxzzF1xxzzF2xxxzzz.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + 2*f1^2*sdZ^2 + 3*f2^2*sdZ^2

Euploid female fetus with Klinefelter male fetus - MxxzzF1xxzzF2xxzzy

• MxxzzF1xxzzF2xxzzy.muX = 2*muX
• MxxzzF1xxzzF2xxzzy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + 2*f1^2*sdX^2 + 2*f2^2*sdX^2
• MxxzzF1xxzzF2xxzzy.muY = 2*muZ + f2*muY
• MxxzzF1xxzzF2xxzzy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + 2*f1^2*sdZ^2 + 2*f2^2*sdZ^2 + f2^2*sdY^2

Euploid female fetus with Jacob male fetus - MxxzzF1xxzzF2xzyy

• MxxzzF1xxzzF2xzyy.muX = (2-f2)*muX
• MxxzzF1xxzzF2xzyy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + 2*f1^2*sdX^2 + f2^2*sdX^2
• MxxzzF1xxzzF2xzyy.muY = (2-f2)*muZ + 2*f2*muY
• MxxzzF1xxzzF2xzyy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + 2*f1^2*sdZ^2 + f2^2*sdZ^2 + 2*f2^2*sdY^2

Euploid male fetus with euploid female fetus - MxxzzF1xzyF2xxzz

Identical with "Euploid female fetus with euploid male fetus" - MxxzzF1xxzzF2xzy

Two euploid male fetuses - MxxzzF1xzyF2xzy

• MxxzzF1xzyF2xzy.muX = (2-f1-f2)*muX
• MxxzzF1xzyF2xzy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + f1^2*sdX^2 + f2^2*sdX^2
• MxxzzF1xzyF2xzy.muY = (2-f1-f2)*muZ + f1*muY + f2*muY
• MxxzzF1xzyF2xzy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + f1^2*sdZ^2 + f1^2*sdY^2 + f2^2*sdZ^2 + f2^2*sdY^2

Euploid male fetus with Turner female fetus - MxxzzF1xzyF2xz

• MxxzzF1xzyF2xz.muX = (2-f1-f2)*muX
• MxxzzF1xzyF2xz.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + f1^2*sdX^2 + f2^2*sdX^2
• MxxzzF1xzyF2xz.muY = (2-f1-f2)*muZ + f1*muY
• MxxzzF1xzyF2xz.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + f1^2*sdZ^2 + f1^2*sdY^2 + f2^2*sdZ^2

Euploid male fetus with triple X female fetus - MxxzzF1xzyF2xxxzzz

• MxxzzF1xzyF2xxxzzz.muX = (2-f1+f2)*muX
• MxxzzF1xzyF2xxxzzz.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + f1^2*sdX^2 + 3*f2^2*sdX^2
• MxxzzF1xzyF2xxxzzz.muY = (2-f1+f2)*muZ + f1*muY
• MxxzzF1xzyF2xxxzzz.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + f1^2*sdZ^2 + f1^2*sdY^2 + 3*f2^2*sdZ^2

Euploid male fetus with Klinefelter male fetus - MxxzzF1xzyF2xxzzy

• MxxzzF1xzyF2xxzzy.muX = (2-f1)*muX
• MxxzzF1xzyF2xxzzy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + f1^2*sdX^2 + 2*f2^2*sdX^2
• MxxzzF1xzyF2xxzzy.muY = (2-f1)*muZ + f1*muY + f2*muY
• MxxzzF1xzyF2xxzzy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + f1^2*sdZ^2 + f1^2*sdY^2 + 2*f2^2*sdZ^2 + f2^2*sdY^2

Euploid male fetus with Jacob male fetus - MxxzzF1xzyF2xzyy

• MxxzzF1xzyF2xzyy.muX = (2-f1-f2)*muX
• MxxzzF1xzyF2xzyy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + f1^2*sdX^2 + f2^2*sdX^2
• MxxzzF1xzyF2xzyy.muY = (2-f1-f2)*muZ + f1*muY + 2*f2*muY
• MxxzzF1xzyF2xzyy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + f1^2*sdZ^2 + f1^2*sdY^2 + f2^2*sdZ^2 + 2*f2^2*sdY^2

Turner female fetus with euploid female fetus - MxxzzF1xzF2xxzz

Identical with "Euploid female fetus with Turner female fetus" - MxxzzF1xxzzF2xz

Turner female fetus with euploid male fetus - MxxzzF1xzF2xzy

Identical with "Euploid male fetus with Turner female fetus" - MxxzzF1xzyF2xz

Two female fetuses with Turner syndrome - MxxzzF1xzF2xz

• MxxzzF1xzF2xz.muX = (2-f1-f2)*muX
• MxxzzF1xzF2xz.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + f1^2*sdX^2 + f2^2*sdX^2
• MxxzzF1xzF2xz.muY = (2-f1-f2)*muZ
• MxxzzF1xzF2xz.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + f1^2*sdZ^2 + f2^2*sdZ^2

Turner female fetus with triple X female fetus - MxxzzF1xzF2xxxzzz

• MxxzzF1xzF2xxxzzz.muX = (2-f1+f2)*muX
• MxxzzF1xzF2xxxzzz.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + f1^2*sdX^2 + 3*f2^2*sdX^2
• MxxzzF1xzF2xxxzzz.muY = (2-f1+f2)*muZ
• MxxzzF1xzF2xxxzzz.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + f1^2*sdZ^2 + 3*f2^2*sdZ^2

Turner female fetus with Klinefelter male fetus - MxxzzF1xzF2xxzzy

• MxxzzF1xzF2xxzzy.muX = (2-f1)*muX
• MxxzzF1xzF2xxzzy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + f1^2*sdX^2 + 2*f2^2*sdX^2
• MxxzzF1xzF2xxzzy.muY = (2-f1)*muZ + f2*muY
• MxxzzF1xzF2xxzzy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + f1^2*sdZ^2 + 2*f2^2*sdZ^2 + f2^2*sdY^2

Turner female fetus with Jacob male fetus - MxxzzF1xzF2xzyy

• MxxzzF1xzF2xzyy.muX = (2-f1-f2)*muX
• MxxzzF1xzF2xzyy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + f1^2*sdX^2 + f2^2*sdX^2
• MxxzzF1xzF2xzyy.muY = (2-f1-f2)*muZ + 2*f2*muY
• MxxzzF1xzF2xzyy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + f1^2*sdZ^2 + f2^2*sdZ^2 + 2*f2^2*sdY^2

Triple X female fetus with euploid female fetus - MxxzzF1xxxzzzF2xxzz

Identical with "Euploid female fetus with triple X female fetus" - MxxzzF1xxzzF2xxxzzz

Triple X female fetus with euploid male fetus - MxxzzF1xxxzzzF2xzy

Identical with "Euploid male fetus with triple X female fetus" - MxxzzF1xzyF2xxxzzz

Triple X female fetus with Turner female fetus - MxxzzF1xxxzzzF2xz

Identical with "Turner female fetus with triple X female fetus" - MxxzzF1xzF2xxxzzz

Two female fetuses with triple X syndrome - MxxzzF1xxxzzzF2xxxzzz

• MxxzzF1xxxzzzF2xxxzzz.muX = (2+f1+f2)*muX
• MxxzzF1xxxzzzF2xxxzzz.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + 3*f1^2*sdX^2 + 3*f2^2*sdX^2
• MxxzzF1xxxzzzF2xxxzzz.muY = (2+f1+f2)*muZ
• MxxzzF1xxxzzzF2xxxzzz.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + 3*f1^2*sdZ^2 + 3*f2^2*sdZ^2

Triple X female fetus with Klinefelter male fetus - MxxzzF1xxxzzzF2xxzzy

• MxxzzF1xxxzzzF2xxzzy.muX = (2+f1)*muX
• MxxzzF1xxxzzzF2xxzzy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + 3*f1^2*sdX^2 + 2*f2^2*sdX^2
• MxxzzF1xxxzzzF2xxzzy.muY = (2+f1)*muZ + f2*muY
• MxxzzF1xxxzzzF2xxzzy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + 3*f1^2*sdZ^2 + 2*f2^2*sdZ^2 + f2^2*sdY^2

Triple X female fetus with Jacob male fetus - MxxzzF1xxxzzzF2xzyy

• MxxzzF1xxxzzzF2xzyy.muX = (2+f1-f2)*muX
• MxxzzF1xxxzzzF2xzyy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + 3*f1^2*sdX^2 + f2^2*sdX^2
• MxxzzF1xxxzzzF2xzyy.muY = (2+f1-f2)*muZ + 2*f2*muY
• MxxzzF1xxxzzzF2xzyy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + 3*f1^2*sdZ^2 + f2^2*sdZ^2 + 2*f2^2*sdY^2

Klinefelter male fetus with euploid female fetus - MxxzzF1xxzzyF2xxzz

Identical with "Euploid female fetus with Klinefelter male fetus" - MxxzzF1xxzzF2xxzzy

Klinefelter male fetus with euploid male fetus - MxxzzF1xxzzyF2xzy

Identical with "Euploid male fetus with Klinefelter male fetus" - MxxzzF1xzyF2xxzzy

Klinefelter male fetus with Turner female fetus - MxxzzF1xxzzyF2xz

Identical with "Turner female fetus with Klinefelter male fetus" - MxxzzF1xzF2xxzzy

Klinefelter male fetus with triple X female fetus - MxxzzF1xxzzyF2xxxzzz

Identical with "Triple X female fetus with Klinefelter male fetus" - MxxzzF1xxxzzzF2xxzzy

Two male fetuses with Klinefelter syndrome - MxxzzF1xxzzyF2xxzzy

• MxxzzF1xxzzyF2xxzzy.muX = 2*muX
• MxxzzF1xxzzyF2xxzzy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + 2*f1^2*sdX^2 + 2*f2^2*sdX^2
• MxxzzF1xxzzyF2xxzzy.muY = 2*muZ + f1*muY + f2*muY
• MxxzzF1xxzzyF2xxzzy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + 2*f1^2*sdZ^2 + f1^2*sdY^2 + 2*f2^2*sdZ^2 + f2^2*sdY^2

Klinefelter male fetus with Jacob male fetus - MxxzzF1xxzzyF2xzyy

• MxxzzF1xxzzyF2xzyy.muX = (2-f2)*muX
• MxxzzF1xxzzyF2xzyy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + 2*f1^2*sdX^2 + f2^2*sdX^2
• MxxzzF1xxzzyF2xzyy.muY = (2-f2)*muZ + f1*muY + 2*f2*muY
• MxxzzF1xxzzyF2xzyy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + 2*f1^2*sdZ^2 + f1^2*sdY^2 + f2^2*sdZ^2 + 2*f2^2*sdY^2

Jacob male fetus with euploid female fetus - MxxzzF1xzyyF2xxzz

Identical with "Euploid female fetus with Jacob male fetus" - MxxzzF1xxzzF2xzyy

Jacob male fetus with euploid male fetus - MxxzzF1xzyyF2xzy

Identical with "Euploid male fetus with Jacob male fetus" - MxxzzF1xzyF2xzyy

Jacob male fetus with Turner female fetus - MxxzzF1xzyyF2xz

Identical with "Turner female fetus with Jacob male fetus" - MxxzzF1xzF2xzyy

Jacob male fetus with triple X female fetus - MxxzzF1xzyyF2xxxzzz

Identical with "Triple X female fetus with Jacob male fetus" - MxxzzF1xxxzzzF2xzyy

Jacob male fetus with Klinefelter male fetus - MxxzzF1xzyyF2xxzzy

Identical with "Klinefelter male fetus with Jacob male fetus" - MxxzzF1xxzzyF2xzyy

Two male fetuses with Jacob syndrome - MxxzzF1xzyyF2xzyy

• MxxzzF1xzyyF2xzyy.muX = (2-f1-f2)*muX
• MxxzzF1xzyyF2xzyy.sdX^2 = 2*(1-f1-f2)^2*sdX^2 + f1^2*sdX^2 + f2^2*sdX^2
• MxxzzF1xzyyF2xzyy.muY = (2-f1-f2)*muZ + 2*f1*muY + 2*f2*muY
• MxxzzF1xzyyF2xzyy.sdY^2 = 2*(1-f1-f2)^2*sdZ^2 + f1^2*sdZ^2 + 2*f1^2*sdY^2 + f2^2*sdZ^2 + 2*f2^2*sdY^2
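The twin reference numbers can be generated rather than written out by hand. The following sketch enumerates the unordered pairs of the six fetal karyotypes and computes the four reference numbers from the fetal copy numbers. The names FETAL, twin_codes and twin_moments are illustrative, and the sketch assumes that each variance uses only the terms of the chromosomes actually mapped to it, as in the derivation for single pregnancies.

```python
from itertools import combinations_with_replacement

# Fetal copy numbers (X, Z, Y) for each fetal part of the hypothesis code.
FETAL = {"xxzz": (2, 2, 0), "xzy": (1, 1, 1), "xz": (1, 1, 0),
         "xxxzzz": (3, 3, 0), "xxzzy": (2, 2, 1), "xzyy": (1, 1, 2)}

def twin_codes():
    """Codes of all distinct twin hypotheses (order of fetuses ignored)."""
    return ["MxxzzF1%sF2%s" % pair
            for pair in combinations_with_replacement(FETAL, 2)]

def twin_moments(k1, k2, f1, f2, p):
    """Return (muX, sdX^2, muY, sdY^2) for fetal parts k1, k2."""
    (x1, z1, y1), (x2, z2, y2) = FETAL[k1], FETAL[k2]
    m = 1.0 - f1 - f2  # maternal fraction
    mu_x = (2 * m + x1 * f1 + x2 * f2) * p["muX"]
    var_x = (2 * m * m + x1 * f1 * f1 + x2 * f2 * f2) * p["sdX2"]
    mu_y = (2 * m + z1 * f1 + z2 * f2) * p["muZ"] + (y1 * f1 + y2 * f2) * p["muY"]
    var_y = ((2 * m * m + z1 * f1 * f1 + z2 * f2 * f2) * p["sdZ2"]
             + (y1 * f1 * f1 + y2 * f2 * f2) * p["sdY2"])
    return mu_x, var_x, mu_y, var_y
```

With f1 = f2 the pairs ("xxzz", "xxzz") and ("xz", "xxxzzz") give the same muX, which is the kind of compensation that makes some twin results ambiguous.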

Formulation of hypotheses - more than twin pregnancies

There is no theoretical reason to stop at hypotheses for twin pregnancies - we can easily extend the theory to, say, triplets with fetal fractions fl, f2, and f3. However, the necessary higher sequencing depth for such cases is not available for a reasonable cost (regarding widespread usage) with the current sequencing technologies, although in theory (or future) it is easily possible.

Limitations

Certain combinations of SCAs in twin pregnancies may compensate for each other's deficits or excesses. For example, one Turner female fetus together with one triple X female fetus compensate each other's aberrations on chromosome X, so that the sample would appear as two euploid female fetuses if the fetal fractions of both fetuses are equal or close. The following table lists all ambiguous cases when the fetal fraction is equal for both fetuses in a twin pregnancy. The symmetric part of the table is omitted (empty cells). A dash (-) stands where there is no ambiguity.

Thus, as long as there is no method for the determination of separate fetal fractions f1 and f2 for the twins, or as long as such a method indicates that f1 is equal or close to f2, the NGS data cannot, in general, be interpreted unambiguously. However, even in this ambiguous state, the test provides the user with valuable information, because there are always only two possible interpretations. Thus, the user can suggest additional tests specifically targeting one of the possible interpretations to prove or disprove it. Additionally, the prevalence of the SCAs in the population also helps with the interpretation. For example, it is much more likely to have two euploid female twins than one with Turner and one with triple X syndrome.

Prevalence of SCAs in population

The prevalence of SCAs in the population was described in Snijders et al. 1995 [27], Zimmermann et al. 2012, Lau et al. 2012, Mazloom et al. 2013, Norton et al. 2014 [28], Hook et al. 2014 [29], Grati et al. 2014 [30], and Stumm et al. 2014 [31]. It was shown in the prior art that the prevalence of SCAs depends on both maternal age and the week of gestation at the time of the test.

The operator is free to choose any prevalence model he desires. In the examples below, a prevalence model described in Snijders et al. 1995 was used. More particularly, we used Table 6 for the prevalence of XXX, XXY, and XYY, and we used Table 7 for the prevalence of X0. Additionally, we assumed that prevalence of male and female fetus is equal.

As for the twin pregnancies, we considered both fetuses to be independent. For this reason, the prevalence probabilities are multiplied. For example, the probability of having one euploid female (XX) with one triple X female (XXX) equals the prevalence of XX multiplied by the prevalence of XXX.
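The independence assumption translates into a simple product rule. The sketch below builds the prevalence of every unordered twin combination from single-fetus prevalences. The function name twin_prevalence is hypothetical, and any numbers passed in are to be supplied by the operator from the chosen prevalence model; none are taken from Snijders et al. 1995 here.

```python
from itertools import combinations_with_replacement

def twin_prevalence(single):
    """Map each unordered karyotype pair to the product of prevalences.

    single: dict mapping a karyotype label (e.g. "XX") to the prevalence
    of that karyotype for one fetus.
    """
    return {(a, b): single[a] * single[b]
            for a, b in combinations_with_replacement(sorted(single), 2)}
```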

Maximum likelihood analysis of the hypotheses

The input to the maximum likelihood analysis consists of two numbers, x and y, specifying the ratio of DNA fragments mapped to chromosome X and Y, respectively, for a given sample. The second part of the input is the a priori fetal fraction distribution given by the density function N(muF, sgF^2), or two fetal fraction distributions given by their density functions N(muF1, sgF1^2) and N(muF2, sgF2^2) in the case of a twin pregnancy. We note again that, given the current status of technology, we are forced to assume that N(muF1, sgF1^2) = N(muF2, sgF2^2).

In the case of a single pregnancy, this step calculates, for each hypothesis h (MxxzzFxxzz, MxxzzFxzy, MxxzzFxz, MxxzzFxxxzzz, MxxzzFxxzzy, MxxzzFxzyy) and each value of fetal fraction f (discretized values ranging from 5% to 100% in steps of 0.1%), the probability of observing the data x and y under h and f, i.e., Pr[x|h,f] and Pr[y|h,f]. The probability Pr[x|h,f] (resp. Pr[y|h,f]) is calculated as a definite integral of the probability density function N(h.muX, h.sdX^2) (resp. N(h.muY, h.sdY^2)) on the interval [x - 1e-6, x + 1e-6] (resp. [y - 1e-6, y + 1e-6]), where h.muX (resp. h.muY) and h.sdX^2 (resp. h.sdY^2) are specified by the hypothesis h. Furthermore, the probability Pr[f] of the particular value of fetal fraction f is calculated as a definite integral of the probability density function N(h.muF, h.sdF^2) on the interval [f - 1e-3, f + 1e-3]. The final probability of the hypothesis h given the NGS data, fetal fraction distribution, and prevalence of h in the population is Pr[h,f|x,y] = Pr[x|h,f]*Pr[y|h,f]*Pr[f]*Pr[h], where Pr[h] is the prevalence of hypothesis h in the population. This value is calculated and plotted for each hypothesis h and fetal fraction f.
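For a single pregnancy, the calculation can be sketched as a plain grid search. This is an illustration under assumptions, not the implementation of the invention: the names COPIES, moments, interval_prob and select_hypothesis are hypothetical, the normal integrals are evaluated with the error function, the fetal fraction density is taken to be the a priori distribution N(muF, sgF^2) of the sample, and the hypothesis moments follow the copy-number pattern of the listed reference numbers.

```python
import math

# Fetal copy numbers (X, Z, Y) read off the part of each code after "F".
COPIES = {
    "MxxzzFxxzz": (2, 2, 0), "MxxzzFxzy": (1, 1, 1), "MxxzzFxz": (1, 1, 0),
    "MxxzzFxxxzzz": (3, 3, 0), "MxxzzFxxzzy": (2, 2, 1), "MxxzzFxzyy": (1, 1, 2),
}

def moments(h, f, p):
    """Return (h.muX, h.sdX^2, h.muY, h.sdY^2) for fetal fraction f."""
    n_x, n_z, n_y = COPIES[h]
    m = 1.0 - f
    mu_x = (2 * m + n_x * f) * p["muX"]
    var_x = (2 * m * m + n_x * f * f) * p["sdX2"]
    mu_y = (2 * m + n_z * f) * p["muZ"] + n_y * f * p["muY"]
    var_y = (2 * m * m + n_z * f * f) * p["sdZ2"] + n_y * f * f * p["sdY2"]
    return mu_x, var_x, mu_y, var_y

def interval_prob(center, half_width, mu, var):
    """Integral of the N(mu, var) density over [center - hw, center + hw]."""
    scale = math.sqrt(2.0 * var)
    upper = math.erf((center + half_width - mu) / scale)
    lower = math.erf((center - half_width - mu) / scale)
    return 0.5 * (upper - lower)

def select_hypothesis(x, y, p, mu_f, var_f, prevalence):
    """Return (probability, hypothesis, f) with the maximum Pr[h,f|x,y]."""
    best = (-1.0, None, None)
    for h in COPIES:
        for i in range(951):  # f = 5.0%, 5.1%, ..., 100.0%
            f = 0.05 + 0.001 * i
            mu_x, var_x, mu_y, var_y = moments(h, f, p)
            score = (interval_prob(x, 1e-6, mu_x, var_x)
                     * interval_prob(y, 1e-6, mu_y, var_y)
                     * interval_prob(f, 1e-3, mu_f, var_f)
                     * prevalence[h])
            if score > best[0]:
                best = (score, h, f)
    return best
```

The twin case would add a second loop over f2, restricted to f1 + f2 <= 100%, and a second fetal fraction factor. Because products of very small probabilities can underflow to zero, a practical implementation might prefer to work with logarithms; that is a design choice outside the text.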

In the case of a twin pregnancy, this step calculates, for each hypothesis h (MxxzzF1xxzzF2xxzz, MxxzzF1xxzzF2xzy, MxxzzF1xxzzF2xz, MxxzzF1xxzzF2xxxzzz, MxxzzF1xxzzF2xxzzy, MxxzzF1xxzzF2xzyy, MxxzzF1xzyF2xzy, MxxzzF1xzyF2xz, MxxzzF1xzyF2xxxzzz, MxxzzF1xzyF2xxzzy, MxxzzF1xzyF2xzyy, MxxzzF1xzF2xz, MxxzzF1xzF2xxxzzz, MxxzzF1xzF2xxzzy, MxxzzF1xzF2xzyy, MxxzzF1xxxzzzF2xxxzzz, MxxzzF1xxxzzzF2xxzzy, MxxzzF1xxxzzzF2xzyy, MxxzzF1xxzzyF2xxzzy, MxxzzF1xxzzyF2xzyy, MxxzzF1xzyyF2xzyy, altogether 21 hypotheses) and for each pair of fetal fractions f1, f2 (the two fetal fractions f1 and f2 range from 5% to 100% in steps of 0.1%, subject additionally to f1 + f2 <= 100%), the probability of observing the data x and y under h and f1, f2, i.e., Pr[x|h,f1,f2] and Pr[y|h,f1,f2]. The probability Pr[x|h,f1,f2] (resp. Pr[y|h,f1,f2]) is calculated as a definite integral of the probability density function N(h.muX, h.sdX^2) (resp. N(h.muY, h.sdY^2)) on the interval [x - 1e-6, x + 1e-6] (resp. [y - 1e-6, y + 1e-6]), where h.muX (resp. h.muY) and h.sdX^2 (resp. h.sdY^2) are specified by the hypothesis h. Furthermore, the probability of the particular value of fetal fraction f1 (resp. f2) is calculated as a definite integral of the probability density function N(h.muF1, h.sdF1^2) (resp. N(h.muF2, h.sdF2^2)) on the interval [f1 - 1e-3, f1 + 1e-3] (resp. [f2 - 1e-3, f2 + 1e-3]). The final probability of the hypothesis h and fetal fractions f1 and f2 given the NGS data, fetal fraction distributions, and prevalence of h in the population is Pr[h,f1,f2|x,y] = Pr[x|h,f1,f2]*Pr[y|h,f1,f2]*Pr[f1]*Pr[f2]*Pr[h], where Pr[h] is the prevalence of hypothesis h in the population. This value is calculated and plotted for each hypothesis h and fetal fractions f1 and f2.

Improvements in the initial data processing

This regards the application of GC correction on the mapped NGS data. Normally, a GC correction model is selected and applied to all chromosomes. However, we have observed that for the analysis of SCAs, it is better to apply GC correction to chromosome X and leave the chromosome Y without any correction.

To explain, this step markedly improved the correlation between the mapping ratios of chromosomes X and Y for samples with a euploid male fetus. Note that we would expect to find a high correlation there because the organization of chromosomes X and Y of a euploid male fetus differs from that of his mother, and the scale of the change is directly proportional to the fetal fraction. Indeed, such a correlation exists, and it was significantly improved by not GC correcting chromosome Y. A similar gain in correlation was observed when we applied GC correction to chromosome Y and not to chromosome X. On the other hand, application of GC correction to both chromosomes or to neither resulted in comparable, but lower correlations. The details are given in the table below. Entries pairing a corrected chromosome with an uncorrected one (GC.X with non.Y, non.X with GC.Y) are with mixed GC correction; entries pairing non.X with non.Y or GC.X with GC.Y are with common GC correction.

         non.X      non.Y      GC.X       GC.Y
non.X    1          -0.73336   0.788391   -0.93994
non.Y    -0.73336   1          -0.92773   0.818423
GC.X     0.788391   -0.92773   1          -0.75308
GC.Y     -0.93994   0.818423   -0.75308   1

Note that we used GC correction according to Liao et al. 2014.
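A correlation table of this kind can be computed from per-sample mapping ratios as sketched below. The sketch assumes that the Pearson correlation coefficient is meant (the text does not name the coefficient); the function names and the series labels are illustrative.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(ss_a * ss_b)

def correlation_table(series):
    """series: label (e.g. "non.X", "GC.Y") -> list of mapping ratios,
    one per sample with a euploid male fetus, in the same sample order."""
    labels = list(series)
    return {(r, c): pearson(series[r], series[c]) for r in labels for c in labels}
```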

Since SCAs also represent organizations of chromosomes X and Y that differ from that of the mother, we reasoned that application of GC correction only to chromosome X will improve the results. However, since there are not enough cases with, say, fetuses with Turner syndrome, we cannot establish the improvement in correlation by an experiment.

Application of the present method to common autosomal aneuploidies

This part is limited to single pregnancies only. The said method can be easily trained and applied to common autosomal aneuploidies such as (but not limited to) trisomy or monosomy of chromosome 13, 18 or 21. Such a case is directly analogous to a case where the female fetus has Turner (monosomy) or triple X (trisomy) syndrome. This is because a healthy female fetus has two X chromosomes, which in this analogy correspond to two chromosomes 21 (or any other autosome). Additionally, monosomy of chromosome 21 (or any other autosome) is an analogy of Turner syndrome, and trisomy of chromosome 21 (or any other autosome) is an analogy of triple X syndrome. Thus, the hypotheses for a fetus with Turner and triple X syndrome can be easily adjusted to correspond to the autosomal monosomy or trisomy. The only difference would be the number of considered hypotheses: 1) monosomy (analogous to Turner syndrome), 2) trisomy (analogous to triple X syndrome), and 3) euploid (analogous to a euploid female fetus). The rest of the details are easy to complete by any person skilled in the art (the fetal fraction distribution is the same, and so are the equations for calculating the final probabilities).
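By this analogy, the three autosomal hypotheses reuse the chromosome X formulas with the fetal copy number set to 1, 2 or 3. A minimal sketch follows; the names AUTOSOME_COPIES and autosome_moments, and the per-copy parameters mu_a and var_a (the autosomal counterparts of muX and sdX^2), are hypothetical.

```python
# Fetal copy number of the autosome under each of the three hypotheses.
AUTOSOME_COPIES = {"monosomy": 1, "euploid": 2, "trisomy": 3}

def autosome_moments(hypothesis, f, mu_a, var_a):
    """Expected mean and variance of the autosome mapping ratio.

    Two maternal copies weighted by (1-f) plus n fetal copies weighted
    by f, exactly as for chromosome X in the Turner (n=1), euploid
    female (n=2) and triple X (n=3) hypotheses.
    """
    n = AUTOSOME_COPIES[hypothesis]
    m = 1.0 - f
    return (2 * m + n * f) * mu_a, (2 * m * m + n * f * f) * var_a
```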

Computerization of the method

Nowadays, there is a practical requirement to automatize the methods of the prenatal tests or diagnostics. The method of the present invention can be largely automatized. At least the "bioinformatics" part of the method (i.e. processing of sequencing data and all subsequent determinations and calculations) may be performed using suitable computer system, such as PC equipped with a processor, peripheral input/output devices (e.g. ports, interfaces), memories (e.g. system memory, hard disk), keyboard, monitor, mouse etc., and a specific software, program for instructing the computer system to perform specific step. Preferably, the computer system is in data communication with the sequencing system providing the sequence data, preferably in the form of plurality of sequence reads (by a wire or wireless networking, bluetooth, internet, cloud etc.). It means that the computer system is configured for receiving sequence data from the sequencing system. The suitable computer systems as well as means for connection with sequencing system are well known to the persons skilled in the art.

At least part of the method, specifically the bioinformatics part of the method, can be implemented as software code, i.e. a plurality of instructions (a computer program) to be executed by a processor of a computing system. The code may be stored on or transmitted via a computer readable medium such as, for example, RAM, ROM, hard drive, SSD, CD, DVD, flash memory etc. Furthermore, the code may be transmitted via any suitable wired, optical or wireless network, for example via the internet. For example, the whole computer program can be downloaded by the operator (user) via the internet.

Therefore, another aspect of the present invention relates to the computer implemented method comprising all steps of calculations and determinations needed for processing of the input (sequencing data) into the output parameter(s) characterizing the diagnosis condition (fetal fraction and the most probable hypothesis, from which the presence or absence of aneuploidy as well as sex can be inferred).

Still another aspect of the present invention relates to a computer program product comprising a computer readable medium comprising a plurality of instructions for controlling a computing system to perform at least a portion of the method according to the invention, preferably portion thereof starting with the step of receiving sequence information from the random sequencing step performed with automated sequencing system.

Interpretation of the results

Interpretation of the results is discussed in detail in the examples below. Generally, the output of the present method can be transformed into plots depicting the probability of hypotheses about the fetal sex chromosomes, and how this probability changes with fetal fraction (see Examples 3-11 and Figures 3-11).

Note that a discord can exist between the fetal fraction from the input fetal fraction distribution and the fetal fraction from the hypotheses. This is because the two are calculated from independent data: the fetal fraction for the input fetal fraction distribution is calculated from the distribution of cfDNA fragment lengths, whereas the fetal fraction for the hypotheses is calculated from the NGS data for chromosomes X and Y.

Interpretation of the results - single pregnancy

There are three plots for each sample, organized in three rows (see Examples 3-8 and Figures 3-8). The plot in the bottom row shows how the probability of each hypothesis changes with the value of fetal fraction. The value of fetal fraction at which each hypothesis reaches its maximum probability is given in the legend. The plot in the middle row shows, for each hypothesis, the maximum probability over all considered values of fetal fraction. The value of fetal fraction at which this maximum is reached is shown on the x axis in parentheses (probabilities for values of fetal fraction below 5% are not included). This plot serves to compare the maximum probabilities of all hypotheses. Finally, the plot in the top row adds the prevalence of each hypothesis in the population as given by Snijders et al. 1995.

The operator should make the decision based on the top plot as it contains all of the available information (NGS sequencing data for chromosomes X and Y, fetal fraction distribution, and SCAs prevalence in population). However, the other plots may offer some supporting information as well, especially in cases with mosaicism present, in which case the fetal fraction distribution may be very different from the most probable fetal fraction from the SCAs hypotheses.
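The relation between the three plots can be illustrated with a minimal sketch. It assumes that the bottom-row curves are available as arrays of probabilities over a grid of fetal fraction values, and that the top row is obtained by multiplying each maximum by the population prevalence and normalizing; this weighting rule, the function name summarize_hypotheses and all numbers in the usage note are illustrative assumptions, not the exact procedure of the invention.

```python
import numpy as np

def summarize_hypotheses(ff_grid, prob, prevalence, min_ff=0.05):
    """Derive middle-row and top-row values from bottom-row curves.

    ff_grid    : fetal fraction values (as fractions, e.g. 0.10 for 10%)
    prob       : dict, hypothesis name -> probabilities over ff_grid
    prevalence : dict, hypothesis name -> population prevalence (assumed input)
    """
    ff_grid = np.asarray(ff_grid, dtype=float)
    usable = ff_grid >= min_ff  # values below 5% are not included
    if not usable.any():
        raise ValueError("no fetal fraction value at or above min_ff")

    # Middle row: maximum probability of each hypothesis and the fetal
    # fraction at which that maximum is reached.
    middle = {}
    for name, p in prob.items():
        p = np.asarray(p, dtype=float)
        idx = int(np.argmax(np.where(usable, p, -np.inf)))
        middle[name] = (float(p[idx]), float(ff_grid[idx]))

    # Top row (assumed rule): weight by prevalence, then normalize to 1.
    weighted = {name: middle[name][0] * prevalence[name] for name in middle}
    total = sum(weighted.values())
    top = {name: w / total for name, w in weighted.items()}
    return middle, top
```

With hypothetical curves for two hypotheses, for example summarize_hypotheses([0.02, 0.05, 0.10], {"Female": [0.9, 0.1, 0.2], "Monosomy X": [0.0, 0.3, 0.6]}, {"Female": 0.5, "Monosomy X": 0.001}), the middle row favors "Monosomy X" while the prevalence-weighted top row favors "Female", which is the kind of reversal the operator should be aware of.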

The analysis of typical samples with a euploid female or male fetus is demonstrated in Examples 3 and 4, and the analysis of samples with aneuploid fetuses is demonstrated in Examples 5-8.

Interpretation of the results - twin pregnancy

For twin pregnancies we show only two plots for real samples (see Examples 9-10 and Figures 9-10) because of the abundance of the data, more precisely, only the top two plots from the single pregnancy test (the probability of each case is given either above or below the top of each bar). As was pointed out in the section Limitations above, interpreting results for twin pregnancies is more difficult because the same NGS data about chromosome X and Y can lead to ambiguous results (for more details see the table in section Limitations).

According to the table in the section Limitations, the hypotheses "euploid male - euploid male" and "Turner female - Jacob male" produce almost identical NGS data for chromosomes X and Y. Thus, both hypotheses have high probability. On the other hand, the probability of the latter case is negligible, as is shown in the top plot, when the prevalence of SCAs in the population is taken into account. The conclusion is that the bottom plot gives the user information about the particular measured data (fetal fraction distribution plus NGS data for chromosomes X and Y), while the top plot informs the user about the real expectations. Based on this, the user can propose a follow-up test to distinguish between the two specific possibilities. Additionally, one artificial sample is demonstrated in Example 11 (see also Figure 11).

A subject-matter of the present invention is a method for determining sex chromosome aneuploidy, sex and fetal fraction of one or multiple fetuses from a test sample of maternal blood, plasma or serum, as defined in the attached claims. Still another subject-matter of the present invention is a computer implemented method comprising the steps of the above method that follow the step of performing sequencing and obtaining sequence information, as defined in the attached claims.

Still another subject-matter of the present invention is a computer program product comprising a computer readable medium comprising a plurality of instructions for controlling a computing system to perform said computer implemented method, as defined in the attached claims.

The features and advantages of the present invention will be understood by the person skilled in the art from the following detailed description, examples and figures.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows a flow chart demonstrating the basic steps of the method according to the present invention. References to the corresponding parts of the specification are added.

Fig. 2 demonstrates the neural network for predicting fetal fraction from a sample's fragment length histogram. Fetal fractions measured from the abundance of chromosome Y were used as guiding values for training of the weights w100, w101, ..., w200 as well as the transformation function f(x) = ax + b.

Fig. 3 shows the results of an analysis of a sample of single pregnancy with euploid female fetus.

Fig. 4 shows the results of an analysis of a sample of single pregnancy with euploid male fetus.

Fig. 5 shows the results of an analysis of a sample of single pregnancy with Turner female fetus.

Fig. 6 shows the results of an analysis of a sample of single pregnancy with Klinefelter male fetus.

Fig. 7 shows the results of an analysis of a sample of single pregnancy with triple X female fetus.

Fig. 8 shows the results of an analysis of a sample of single pregnancy with Jacob male fetus.

Fig. 9 shows the results of an analysis of a sample of twin pregnancy with either two euploid male fetuses or one Turner female and one Jacob male fetus.

Fig. 10 shows the results of an analysis of a sample of twin pregnancy with either two euploid female fetuses or one Turner female and one triple X female fetus.

Fig. 11 shows the results of an analysis of an artificial sample of twin pregnancy with two Turner female fetuses.

EXAMPLES

Example 1

Neural network for fetal fraction calculation

Our neural network, as shown in Fig. 2, consists of two layers: a base layer of 101 input nodes and an output layer of one node (with a sufficiently large training set, a more complex network with hidden layers can be designed). Each input node is connected to the output node. A sample's sequencing data, namely mapped cfDNA fragments from all chromosomes, are classified according to their lengths, which results in a histogram (fragment lengths are limited to the range from 100 bp to 200 bp; all other fragments are discarded). The input of the neural network is then the relative counts of the considered lengths for each sample. A set of samples with varying and known fetal fraction (e.g. obtained from chromosome Y in the case of samples with a male fetus) is used to train the neural network in the usual way 32. Fetal fractions measured from the abundance of chromosome Y were used as guiding values for training of the weights w100, w101, ..., w200 as well as the transformation function f(x) = ax + b. Once the model is trained, we can predict the length-based fetal fractions for the training samples, compare them with the guiding Y-based fetal fractions, and determine the error of prediction. If we denote the prediction model as f_len = f_Y + epsilon, where epsilon represents a random normal error, then

is the standard deviation of the prediction error (N is the number of samples in the training set). Note that the mean of epsilon, i.e., the mean error of the prediction, is zero, because otherwise the neural network was not sufficiently trained.
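The model of this example can be sketched as follows. This is a minimal illustration under stated assumptions, not the trained network of the invention: the function names are hypothetical, a closed-form least-squares fit is swapped in for iterative training of the weights (so the weights and the transformation f(x) = ax + b collapse into one weight vector plus a bias), and the standard deviation of the prediction error is assumed to be the root mean square of f_len - f_Y, which is consistent with a zero-mean error.

```python
import numpy as np

MIN_LEN, MAX_LEN = 100, 200  # fragment lengths kept: 101 bins (w100..w200)

def length_histogram(fragment_lengths):
    """Relative counts of fragment lengths in the range 100..200 bp."""
    lengths = np.asarray(fragment_lengths, dtype=int)
    kept = lengths[(lengths >= MIN_LEN) & (lengths <= MAX_LEN)]
    if kept.size == 0:
        raise ValueError("no fragments in the 100-200 bp range")
    counts = np.bincount(kept - MIN_LEN, minlength=MAX_LEN - MIN_LEN + 1)
    return counts / counts.sum()

def fit_linear_network(histograms, ff_y):
    """Fit one linear output node to Y-based fetal fractions.

    Assumption: least squares replaces iterative training. With fewer
    training samples than inputs the fit is underdetermined and lstsq
    returns the minimum-norm solution.
    """
    X = np.column_stack([histograms, np.ones(len(histograms))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(ff_y, dtype=float), rcond=None)
    return coef[:-1], float(coef[-1])  # weights for 100..200 bp, bias

def predict(histograms, weights, bias):
    """Length-based fetal fraction f_len for one or more histograms."""
    return np.asarray(histograms) @ weights + bias

def prediction_error_std(ff_len, ff_y):
    """Root mean square of f_len - f_Y (assumed form of the error std)."""
    err = np.asarray(ff_len, dtype=float) - np.asarray(ff_y, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))
```

In practice the error should be judged on samples not used for fitting; with a training set smaller than the number of inputs, a linear model of this kind reproduces the training values exactly and the training error alone is uninformative.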

Example 2

Sample preparation and sequencing techniques used in the realization of the invention

Sequence analysis

Massively parallel sequencing is necessary for the application of the method according to the invention. The method was specifically developed and validated for small benchtop next generation sequencing systems to allow low initial costs for NIPT service lab setup. The method was validated on the NextSeq500 system (Illumina, Inc., San Diego, CA, USA).

Commercially available sequencing devices, together with the corresponding protocols and reagents recommended by the supplier, were used in the illustrative example; however, the person skilled in the art is aware of a number of sequencing methods and their variations which could also be used in the practice of the present invention. The Illumina NextSeq 500/550 High Output v2 kit (75 cycles) was used. Paired-end sequencing is required in this implementation because determination of fragment sizes, which is part of the method, is needed for interpretation of the results. A read setting of 2 x 35 bp was used. Overall, 111 samples were used for method evaluation, with an average number of reads of 5,979,689 (std 3,242,203, min 1,184,843, max 17,191,421).

Full procedure of sample preparation for NextSeq500 (Illumina Inc., San Diego, CA, USA) sequencer and analysis

a) The method of obtaining the sequence data is described below with reference to the specific kits used, following standard laboratory protocols; only specific modifications to the standard protocols are described. All procedures within the standard protocols are substantially known to persons skilled in the art of molecular biology, bioinformatics and prenatal testing, and such a person is aware of the possible modifications of the procedure. It is advised that sample collection, blood processing, DNA isolation and NGS library preparation be carried out by a female lab technician due to the sensitivity of the method to contamination.

Blood sample collection and plasma separation processing

b) 10 ml of peripheral blood should be collected from the pregnant woman after the 11th week of pregnancy in general EDTA-containing tubes or in tubes which stabilize cell-free circulating nucleic acids (e.g. Streck Cell-Free DNA BCT).

c) Dual-step plasma separation should be carried out not later than 2 days after collection, with care to prevent white blood cell carry-over.

d) Plasma samples are advised to be processed immediately after separation, but storage at -20 °C or -80 °C is possible.

Isolation of DNA

a) To carry out DNA isolation with sufficient quality and yield, the MagMax™ Cell-Free Isolation Kit by Life Technologies should be used.

b) The volume of processed plasma is advised to be at least 1 mL.

c) Standard manufacturer protocol should be used.

d) Plasma isolated from STRECK tubes does require use of proteinase K treatment within sample processing.

Next Generation Sequencing library preparation

a) 30 uL of DNA isolated from maternal plasma with the MagMax Cell-Free Isolation Kit should be processed in library preparation.

b) Sequencing libraries should be prepared using the Illumina TruSeq Nano library preparation kit.

c) Standard manufacturer protocol should be used, with the exceptions described here:

d) Plasma isolated from STRECK tubes does require use of proteinase K treatment within sample processing.

a. END REPAIR

i. According to standard protocol

b. Additional step SIZE SELECTION (To over-represent the size fraction of isolated nucleic acids an additional size selection step should be carried out as described here)

i. Vortex the magnetic beads reservoir for 1 minute. Add 2x volume of magnetic beads (we recommend using Agencourt AMPure beads by Beckman Coulter; the test was validated using this kit). Mix properly and spin briefly.

ii. Incubate 5 minutes at room temperature

iii. Place plate into magnetic stand for 5 min or until the liquid appears clear

iv. Remove and discard supernatant.

v. First wash. Add 200 uL freshly prepared 80% EtOH and incubate 30 s.

vi. Remove and discard supernatant.

vii. Second wash. Add 200 uL freshly prepared 80% EtOH and incubate 30 s.

viii. Remove and discard supernatant.

ix. Leave tubes open and allow 10 minutes for residual ethanol to evaporate.

x. Add 10 uL Resuspension Buffer (RSB), mix thoroughly by vortexing.

xi. Incubate 2 min at room temperature.

xii. Place tubes on the magnetic stand for 5 min or until the liquid appears clear.

xiii. Transfer 8.75 uL of the clear supernatant to the new PCR plate.

xiv. STOPPING POINT - sample can be stored for up to 1 week at -20 °C.

c. A-TAILING

i. According to standard protocol

d. ADAPTOR LIGATION

i. According to standard protocol.

e. PCR AMPLIFICATION

i. This step is completely omitted from the procedure compared to the standard protocol.

ii. As PCR produces biases in chromosomal fragment representation, its removal is a crucial step in reducing bias and increasing confidence in the determination of fetal fraction based on the fragment length distribution of reads.

f. LIBRARY QUANTITATION

i. Measure the final DNA library concentration by using fluorometric Qubit dsDNA HS Assay kit (ng/ul)

ii. Determine the average size of final library on an Agilent Technologies 2100 Bioanalyzer using a High Sensitivity DNA chip (bp)

iii. Normalize the final libraries to 0.5 - 4 nM as described in the manufacturer protocol and pool the libraries into the final library.

iv. Prepare the final library for the Illumina NextSeq 500 run as described in the sequencing system manufacturer protocol.

g. SEQUENCING ANALYSIS ON ILLUMINA NEXTSEQ 500

i. The Illumina NextSeq 500 massively parallel sequencing system should be used.

ii. The Illumina NextSeq 500/550 High Output v2 kit (75 cycles) is used for the sequencing analysis run.

iii. Paired-end sequencing is performed with a 2 x 35 bp read setting with dual index reading.

iv. Only FASTQ files are required for processing in subsequent interpretation steps.
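For the library quantitation and normalization in step f above, the measured mass concentration (ng/ul, Qubit) and average library size (bp, Bioanalyzer) are commonly combined into a molar concentration using an average mass of about 660 g/mol per base pair of double-stranded DNA. The following is a minimal sketch of that arithmetic only; the function names are hypothetical, and the manufacturer protocol remains the authoritative procedure.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)

def library_molarity_nM(conc_ng_per_ul, avg_size_bp):
    """Convert ng/ul and average fragment size (bp) to nM."""
    if conc_ng_per_ul < 0 or avg_size_bp <= 0:
        raise ValueError("concentration must be >= 0 and size must be > 0")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * avg_size_bp)

def dilution_volumes(conc_nM, target_nM, final_volume_ul):
    """Volumes (ul) of library and diluent needed to reach target_nM."""
    if target_nM > conc_nM:
        raise ValueError("library is already below the target concentration")
    library_ul = final_volume_ul * target_nM / conc_nM
    return library_ul, final_volume_ul - library_ul
```

For instance, a library measured at 1 ng/ul with an average size of 300 bp corresponds to roughly 5 nM, and diluting a 5 nM library to 2 nM in 20 ul takes 8 ul of library and 12 ul of diluent.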

Example 3

Biological sample analysis - single pregnancy with euploid female fetus

10 ml of peripheral blood was taken from the mother and processed as described in Example 2. The sequencing data, and subsequently the mapping data, were subjected to the procedure of likelihood analysis of the specific hypotheses (as described in detail earlier in this specification and schematically depicted in Fig. 1). Preparation of the training data, formulation of the hypotheses and training of the neural network to calculate fetal fraction were performed once, before the analysis of the test samples, and these results were used in all the following examples. The fetal fraction distribution and the probability of each hypothesis were calculated and the results were depicted in graphical form. At the same time, the status of the fetus was confirmed by another independent method (amniocentesis followed by a QF PCR aneuploidy test).

In this case we can see on Fig. 3 that "Female" hypothesis (i.e., MxxzzFxxzz) dominates all other hypotheses. Observe the gray box in the bottom graph indicating range of values of fetal fraction for which the test is not considered reliable.

Example 4

Biological sample analysis - single pregnancy with euploid male fetus

In this case we can see on Fig. 4 that "Male" hypothesis (i.e., MxxzzFxzy) dominates all other hypotheses.

Example 5

Biological sample analysis - single pregnancy with Turner female fetus (X0)

In this case we can see on Fig. 5 that "Monosomy X" hypothesis (i.e., MxxzzFxz) dominates all other hypotheses.

Example 6

Biological sample analysis - single pregnancy with Klinefelter male fetus (XXY)

In this case we can see on Fig. 6 that "Klinefelter" hypothesis (i.e., MxxzzFxxzzy) dominates all other hypotheses.

Example 7

Biological sample analysis - single pregnancy with triple X female fetus (XXX)

In this case we can see on Fig. 7 that "Triple X" hypothesis (i.e., MxxzzFxxxzzz) dominates all other hypotheses.

Example 8

Biological sample analysis - single pregnancy with Jacob male fetus (XYY)

In this case we can see on Fig. 8 that "Jacob" hypothesis (i.e., MxxzzFxzyy) dominates all other hypotheses.

Example 9

Biological sample analysis - twin pregnancy with either two euploid male (XY) fetuses or one Turner female (X0) and one Jacob male (XYY) fetus

A sample from a mother with two male fetuses was analyzed. The NGS data for the two cases mentioned in the title above are almost identical (see Fig. 9), hence the cases have comparable probability in the bottom plot. On the other hand, when case prevalence in the population is taken into account, the euploid case is much more probable than the Turner/Jacob case (this probability is easily quantified from the disease prevalence table in the section Prevalence of SCAs in population). Furthermore, one can observe that the Turner/Jacob case (X0 - XYY) has a slight edge in the bottom plot. This is a consequence of the fact that the euploid case is symmetric: both fetuses are forced to have the same fetal fraction (in other words, there is no reason to prefer a different assignment). The Turner/Jacob case, by contrast, is asymmetric, and thus it can be better adjusted to the specific data shown by varying the two fetal fractions. However, this does not reflect a true probabilistic difference between the two cases (i.e., the Turner/Jacob case can be better fitted to the data only because it has one more free parameter). The same holds for any other pair of possibilities from the section Limitations.
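The ambiguity and the extra free parameter can be illustrated with a simplified dosage calculation. The sketch below assumes that the expected relative amount of chromosome X and Y material is a fetal-fraction-weighted mixture of chromosome copy numbers of a euploid mother and of each fetus; it ignores mapping biases and regions shared between X and Y, and the function name and numerical values are hypothetical.

```python
MOTHER = (2, 0)  # (X copies, Y copies) of a euploid mother

# (X copies, Y copies) for each fetal karyotype
KARYOTYPES = {
    "XX": (2, 0), "XY": (1, 1), "X0": (1, 0),
    "XXY": (2, 1), "XXX": (3, 0), "XYY": (1, 2),
}

def expected_dosage(fetuses, fractions):
    """Expected X and Y copy numbers per genome equivalent of cfDNA.

    fetuses   : karyotype names, one per fetus
    fractions : fetal fraction contributed by each fetus
    """
    maternal = 1.0 - sum(fractions)
    x = maternal * MOTHER[0]
    y = maternal * MOTHER[1]
    for karyotype, fraction in zip(fetuses, fractions):
        copies_x, copies_y = KARYOTYPES[karyotype]
        x += fraction * copies_x
        y += fraction * copies_y
    return x, y

# Equal fetal fractions: the two hypotheses give the same expected data.
print(expected_dosage(["XY", "XY"], [0.06, 0.06]))
print(expected_dosage(["X0", "XYY"], [0.06, 0.06]))
# Unequal fetal fractions change Y for Turner/Jacob only.
print(expected_dosage(["X0", "XYY"], [0.08, 0.05]))
```

Under this simplification the male/male dosages depend only on the sum of the two fetal fractions, whereas for Turner/Jacob the Y dosage depends on the fraction of the Jacob fetus alone, so the asymmetric hypothesis can match a wider range of observed X and Y values.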

Example 10

Biological sample analysis - twin pregnancy with either two euploid female (XX) fetuses or one Turner female (X0) and one triple X female (XXX) fetus

The sample of mother with two female fetuses was analyzed, as shown on Fig. 10. The evaluation is essentially analogous to the previous Example 9.

Example 11

Artificial sample analysis - twin pregnancy with two Turner female (X0) fetuses

This is an artificial sample that was created by taking one sample with two male XY fetuses and one sample with two female XX fetuses, and combining them in such a way that the chromosome X mapping data was taken from the former sample and the chromosome Y mapping data was taken from the latter. This case does not have any alternative with the same NGS data (see section Limitations). This is also indicated by the bottom plot on Fig. 11, where the case X0 - X0 dominates. On the other hand, the a priori probability of having two Turner female fetuses is so low that it overrides the NGS data in the top plot, thus making the second most probable case from the bottom plot (X0 - XX) the most probable case in the top plot.

References

1 Lo, YM Dennis, et al. "Presence of fetal DNA in maternal plasma and serum." The Lancet 350.9076 (1997): 485-4

2 Chiu, Rossa WK, et al. "Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma." Proceedings of the National Academy of Sciences 105.51 (2008): 20458-20463.

3 Fan, H. Christina, et al. "Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood." Proceedings of the National Academy of Sciences 105.42 (2008): 16266-16271.

4 Chiu, Rossa WK, et al. "Non-invasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: large scale validity study." Bmj 342 (2011): c7401.

5 Sehnert, Amy J., et al. "Optimal detection of fetal chromosomal abnormalities by massively parallel DNA sequencing of cell-free fetal DNA from maternal blood." Clinical chemistry 57.7 (2011): 1042-1049.

6 Lau, Tze Kin, et al. "Noninvasive prenatal diagnosis of common fetal chromosomal aneuploidies by maternal plasma DNA sequencing." The Journal of Maternal-Fetal & Neonatal Medicine 25.8 (2012): 1370-1374.

7 Bianchi, Diana W., et al. "Genome-wide fetal aneuploidy detection by maternal plasma DNA sequencing." Obstetrics & Gynecology 119.5 (2012): 890-901.

8 Straver, Roy, et al. "WISECONDOR: detection of fetal aberrations from shallow sequencing maternal plasma based on a within-sample comparison scheme." Nucleic acids research 42.5 (2014): e31-e31.

9 Stephanie, C. Yu, et al. "Size-based molecular diagnostics using plasma DNA for noninvasive prenatal testing." Proceedings of the National Academy of Sciences 111.23 (2014): 8583-8588.

10 Tynan, J. A., et al. "Application of risk score analysis to low-coverage whole genome sequencing data for the noninvasive detection of trisomy 21, trisomy 18, and trisomy 13." Prenatal diagnosis 36.1 (2016): 56-62.

11 Zimmermann, Bernhard, et al. "Noninvasive prenatal aneuploidy testing of chromosomes 13, 18, 21, X, and Y, using targeted sequencing of polymorphic loci." Prenatal diagnosis 32.13 (2012): 1233-1241.

12 Mazloom, Amin R., et al. "Noninvasive prenatal detection of sex chromosomal aneuploidies by sequencing circulating cell-free DNA from maternal plasma." Prenatal diagnosis 33.6 (2013): 591-597.

13 Liang, Desheng, et al. "Non-invasive prenatal testing of fetal whole chromosome aneuploidy by massively parallel sequencing." Prenatal diagnosis 33.5 (2013): 409-415.

14 Wang, Yanlin, et al. "Maternal mosaicism is a significant contributor to discordant sex chromosomal aneuploidies associated with noninvasive prenatal testing." Clinical chemistry 60.1 (2014): 251-259.

15 https://www.ncbi.nlm.nih.gov/probe/docs/proihapmap/

16 Kim, Sung K., et al. "Determination of fetal DNA fraction from the plasma of pregnant women using sequence read counts." Prenatal diagnosis 35.8 (2015): 810-815.

17 Straver, Roy, et al. "Calculating the fetal fraction for noninvasive prenatal testing based on genome-wide nucleosome profiles." Prenatal diagnosis 36.7 (2016): 614-621.

18 Stephanie, C. Yu, et al. "Size-based molecular diagnostics using plasma DNA for noninvasive prenatal testing." Proceedings of the National Academy of Sciences 111.23 (2014): 8583-8588.

19 Jiang, Peiyong, et al. "FetalQuantSD: accurate quantification of fetal DNA fraction by shallow-depth sequencing of maternal plasma DNA." Genomic Medicine 1 (2016): 16013.

20 https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13/

21 Cock, Peter JA, et al. "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants." Nucleic acids research 38.6 (2010): 1767-1771.

22 Li, Heng, et al. "The sequence alignment/map format and SAMtools." Bioinformatics 25.16 (2009): 2078-2079.

23 Benjamini, Yuval, and Terence P. Speed. "Summarizing and correcting the GC content bias in high-throughput sequencing." Nucleic acids research (2012): gks001.

24 Liao, Can, et al. "Noninvasive prenatal diagnosis of common aneuploidies by semiconductor sequencing." Proceedings of the National Academy of Sciences 111.20 (2014): 7415-7420.

25 Minarik, Gabriel, et al. "Utilization of Benchtop Next Generation Sequencing Platforms Ion Torrent PGM and MiSeq in Noninvasive Prenatal Testing for Chromosome 21 Trisomy and Testing of Impact of In Silico and Physical Size Selection on Its Analytical Performance." PloS one 10.12 (2015): e0144811.

26 Genome Reference Consortium Human Build 37 (GRCh37), Feb. 2009, GenBank assembly accession: GCA_000001405.1, RefSeq assembly accession: GCF_000001405.13

27 Snijders, R. J. M., N. J. Sebire, and K. H. Nicolaides. "Maternal age and gestational age-specific risk for chromosomal defects." Fetal diagnosis and therapy 10.6 (1995): 356-367.

28 Norton, Mary E., Laura L. Jelliffe-Pawlowski, and Robert J. Currier. "Chromosome abnormalities detected by current prenatal screening and noninvasive prenatal testing." Obstetrics & Gynecology 124.5 (2014): 979-986.

29 Hook, Ernest B., and Dorothy Warburton. "Turner syndrome revisited: review of new data supports the hypothesis that all viable 45, X cases are cryptic mosaics with a rescue cell line, implying an origin by mitotic loss." Human genetics 133.4 (2014): 417-424.

30 Grati, Francesca R., et al. "Fetoplacental mosaicism: potential implications for false-positive and false-negative noninvasive prenatal screening results." Genetics in Medicine 16.8 (2014): 620-624.

31 Stumm, Markus, et al. "Diagnostic accuracy of random massively parallel sequencing for non-invasive prenatal detection of common autosomal aneuploidies: a collaborative study in Europe." Prenatal diagnosis 34.2 (2014): 185-191.

32 Russell, Stuart, and Peter Norvig. Artificial Intelligence: A modern approach. Prentice-Hall, Englewood Cliffs 25 (1995): 27.