Title:
PREIMPLANTATION GENETIC TESTING FOR POLYGENIC DISEASE RELATIVE RISK REDUCTION
Document Type and Number:
WIPO Patent Application WO/2022/055747
Kind Code:
A1
Abstract:
Disclosed herein are methods and systems of generating a genomic index Gi, descriptive of the aggregate risk from multiple polygenic diseases and traits for an embryo, also optionally aggregating monogenic, structural, karyotypal, and other risks of the embryo into this single index score. The genomic index Gi generated may be used in methods and systems for selecting an embryo from a set of non-aneuploid embryos or for ranking a set of non-aneuploid embryos for suitability for intrauterine transfer from the set of non-aneuploid embryos. Also disclosed herein are methods of correcting the genomic sequence of an embryo measured using a genomic sequencer and/or measured in replicate using parent and sibling genotypes measured using a genotyping array. Also disclosed herein are methods and systems for validating the foregoing methods and systems.

Inventors:
TELLIER LAURENT CHRISTIAN ASKER MELCHIOR (US)
Application Number:
PCT/US2021/048367
Publication Date:
March 17, 2022
Filing Date:
August 31, 2021
Assignee:
GENOMIC PREDICTION (US)
International Classes:
A61B5/00; C12N15/00; G16B20/20; G16B20/40; G16H50/20; G16H50/30
Foreign References:
US20190065670A1 2019-02-28
US20200118647A1 2020-04-16
US20190017119A1 2019-01-17
US20150356243A1 2015-12-10
US20190323081A1 2019-10-24
Attorney, Agent or Firm:
COHEN, Mark et al. (US)
Claims:
What is claimed is:

1. A method of generating a genomic index Gi, descriptive of the aggregate risk from (1,..., n) polygenic diseases and traits for an embryo, also optionally aggregating monogenic, structural, karyotypal, and other risks of the embryo into this single index score; the method implemented by a computer processor executing program instructions, the method comprising the steps of: a. for each i = 1,..., n of the plurality of polygenic diseases identifying a genotype at each of a plurality of defined disease specific genetic loci (1,..., ki) from a genomic DNA sample obtained from the embryo; b. using the processor, for one of the plurality of polygenic diseases constructing a disease specific genomic profile including the genotype identified at each of the plurality of defined disease specific genetic loci; c. using the processor, comparing the disease specific genomic profile to one or more databases to determine an absolute probability Pi of an embryo with said disease specific genomic profile developing said polygenic disease; d. repeating steps (b) and (c) for each of said plurality of polygenic diseases; e. generating for the embryo the genomic index Gi, wherein the genomic index is an aggregate score, calculated using as components (whether by addition, multiplication, or any other operation), for each of the plurality of polygenic diseases: i. the population average probability of getting the disease (PAi), ii. the absolute probability of the embryo developing said polygenic disease (Pi), iii. the effect on life expectancy from each disease measured as lifespan impact years (Qi), the effect on life quality from each disease measured as lifespan quality (LQi), the effect on life treatment cost from each disease measured as lifetime treatment cost (LCi), or any other such factor as may be used to appropriately weight the impact of the disease; and f. optionally aggregating monogenic, structural, karyotypal, and other risks of the embryo into this single index score.

2. The method according to claim 1, wherein the genomic index Gi is calculated by the following:

$$G_i = \sum_i \left( Q_i \left( PA_i - P_i \right) \right)$$

3. The method according to claim 1 where there is a contribution from a monogenic or structural variant, the genomic index Gi is calculated by the following:

$$G_i = \sum_i \left( Q_i \left( PA_i - P_i \right) \right) + \sum_j \left( MQ_j \left( MPA_j - MP_j \right) \right)$$

wherein: i. the difference between the population average probability of getting the monogenic (or structural, or karyotypal) disease (MPAj), ii. the absolute probability of the embryo developing said monogenic (or structural, or karyotypal) disease (MPj), iii. the effect on life expectancy from each disease measured as lifespan impact years (MQj), or any other such factor as may be used to appropriately weight the impact of the monogenic (or structural, or karyotypal) disease.

4. The method according to claim 1, wherein the genomic index Gi is primarily calculated to reflect a subset of polygenic diseases such as cancer, heart disease, embryo miscarriage risk for the embryo in question, psychiatric disorders, or diabetes.

5. The method according to claim 1, wherein the embryo has an unknown family history of polygenic disease.

6. The method according to claim 1, wherein the embryo has a known family history of disease for one or more of said plurality of polygenic diseases, and this family history is used to adjust the calculation of the genomic index Gi.

7. The method according to claim 1, further comprising the step of screening the embryo for aneuploidy, or monogenic disease, or structural variants, or any combination of these three.

8. The method according to claim 1, further comprising the step of assessing the morphology of the embryo.

9. A method of selecting an embryo for suitability for intrauterine transfer from a set of non-aneuploid embryos, the method implemented by a computer processor executing program instructions, the method comprising the steps of: a. using the processor, generating a genomic index Gi according to the method of any one of claims 1 - 8 for each embryo of the set of non-aneuploid embryos; b. including, excluding, up-prioritizing or down-prioritizing the embryo from the set of non-aneuploid embryos as suitable or unsuitable for intrauterine transfer, if the genomic index Gi of the embryo is better than or worse than the genomic index Gi of another embryo of said set of non-aneuploid embryos, or if the genomic index Gi falls above or below a pre-determined threshold; and c. repeating steps (a) - (b) for each embryo of said set of non-aneuploid embryos.

10. The method according to claim 9, wherein the set of non-aneuploid embryos are siblings.

11. A method of ranking a set of non-aneuploid embryos for suitability for intrauterine transfer, the method implemented by a computer processor executing program instructions, the method comprising the steps of: a. using the processor, generating a genomic index Gi according to the method of any one of claims 1 - 8 for each embryo of the set of non-aneuploid embryos; b. ranking each embryo from the set of non-aneuploid embryos for suitability for intrauterine transfer based on the genomic index Gi of that embryo, wherein the ranking is relative to the genomic index Gi of another embryo of said set of non-aneuploid embryos or the ranking is relative to a pre-determined standard; and c. repeating steps (a) - (b) for each embryo of said set of non-aneuploid embryos.

12. The method according to claim 11, wherein the set of non-aneuploid embryos are siblings.

13. A system for generating a genomic index Gi, descriptive of the aggregate risk from (1,..., n) polygenic diseases and traits for an embryo, also optionally aggregating monogenic, structural, karyotypal, and other risks of the embryo into this single index score, the system comprising: a memory; and a computer processor to: a. identify a genotype at each of a plurality of defined disease specific genetic loci (1,..., ki) for each i = 1,..., n of the plurality of polygenic diseases from a genomic DNA sample obtained from the embryo; b. construct for one of the plurality of polygenic diseases a disease specific genomic profile including the genotype identified at each of the plurality of defined disease specific genetic loci; c. compare the disease specific genomic profile to one or more databases to determine an absolute probability Pi of an embryo with said disease specific genomic profile developing said polygenic disease; and d. repeat steps (b) and (c) for each of said plurality of polygenic diseases; and e. generate for the embryo the genomic index Gi of the risk for said a plurality of polygenic diseases, wherein the genomic index is a sum, for each of the plurality of polygenic diseases, of the difference between the population average probability of getting the disease (PAi) and the absolute probability of the embryo developing said polygenic disease (Pi) and wherein the sum is weighted by the effect on life expectancy from each disease measured as lifespan impact years Qi and f. optionally aggregate monogenic, structural, karyotypal, and other risks of the embryo into this single index score.

14. The system according to claim 13, wherein the genomic index Gi is

$$\sum_i Q_i \left( PA_i - P_i \right)$$

15. The system according to claim 13, wherein the plurality of polygenic diseases includes cancer, heart disease, and diabetes.

16. The system according to claim 13, wherein the embryo has an unknown family history of polygenic disease.

17. The system according to claim 13, wherein the embryo has a known family history of disease for at least one of a said plurality of polygenic diseases.

18. A bioinformatics method of error correction using parent and sibling genotypes, comprising correcting genomic sequences of embryos using genomic sequences of an embryo’s mother and father, wherein the genomic sequences of the mother and father are measured using a genotyping array, while the genomic sequences of the embryos are measured using a genomic sequencer.

19. A bioinformatics method of error correction using parent and sibling genotypes, comprising correcting genomic sequences of embryos using genomic sequences of an embryo’s mother and father, wherein the genomic sequences of the mother and father are measured using a genotyping array, while the genomic sequences of the embryos are measured in replicate (whether by genomic sequencer, by genotyping array, or by a combination of the two), and generating a unified, replicate-corrected embryo genome by using the replicate measurements of the genomes of the same embryo to correct one another.

20. The method according to claim 18 or claim 19, comprises comparing a state of a base or set of bases in a specific region in the embryo genome to the relevant regions of the maternal, paternal, and optionally population level genomes, wherein if the region is missing in the embryo genome setting the missing base values to those on the parental or population genomes and/or if all but one or a few bases in a region of the embryo genome could have originated from specific, corresponding blocks from the parental or population genomes, setting the discrepant base values to those on the respective parental or population blocks.

21. A method of validating that the above methods according to any of claims 18-20 are accurate by comparing the genomic sequence of a cell line as measured with two different systems, wherein one of these measurements is a baseline sequence, due to having been characterized at a high level of genotyping resolution and wherein the second of these measurements is treated as an “embryo stand-in” or a test of the method by sampling the cell line sequence at a reduced resolution; correcting the reduced resolution sequence according to the methods of any of claims 18-20; and comparing the corrected reduced resolution sequence to the baseline sequence.
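The region-level correction rule recited above (comparing an embryo region to parental or population blocks) and the validation comparison against a baseline sequence can be sketched as follows. This is only an illustration under simplifying assumptions: regions are represented as haploid base lists, `None` marks a missing call, and the names `correct_region`, `concordance`, and `max_discrepant` are hypothetical. The claims do not specify how candidate blocks are chosen or how many discrepant bases count as "one or a few".

```python
from typing import Optional, Sequence

Base = Optional[str]  # None marks a position with no call in the embryo data


def count_mismatches(embryo: Sequence[Base], block: Sequence[str]) -> int:
    """Count called embryo bases that disagree with a candidate block."""
    if len(embryo) != len(block):
        raise ValueError("embryo region and block must have equal length")
    return sum(1 for e, b in zip(embryo, block) if e is not None and e != b)


def correct_region(
    embryo: Sequence[Base],
    candidate_blocks: Sequence[Sequence[str]],
    max_discrepant: int = 1,  # assumed cutoff for "one or a few" bases
) -> list[Base]:
    """Fill missing bases and overwrite discrepant ones from the closest block.

    If no candidate block explains all but `max_discrepant` called bases,
    the region is returned unchanged. Ties go to the first candidate listed.
    """
    if not candidate_blocks:
        return list(embryo)
    best = min(candidate_blocks, key=lambda block: count_mismatches(embryo, block))
    if count_mismatches(embryo, best) > max_discrepant:
        return list(embryo)
    # Missing and discrepant positions both take the block's value,
    # so the corrected region equals the chosen block.
    return list(best)


def concordance(corrected: Sequence[Base], baseline: Sequence[str]) -> float:
    """Fraction of baseline positions reproduced by the corrected sequence."""
    if not baseline or len(corrected) != len(baseline):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for c, b in zip(corrected, baseline) if c == b) / len(baseline)


# Hypothetical example: one missing call and one discrepant call.
maternal_block = list("ACGTAC")
paternal_block = list("ACGAAT")
embryo_region = ["A", None, "G", "T", "A", "G"]
corrected = correct_region(embryo_region, [maternal_block, paternal_block])
unchanged = correct_region(list("TTTTTT"), [maternal_block, paternal_block])
```

In the validation setting, `concordance(corrected, baseline)` would be evaluated with the high-resolution cell line measurement as `baseline`.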


Description:
PREIMPLANTATION GENETIC TESTING FOR POLYGENIC DISEASE RELATIVE RISK REDUCTION

BACKGROUND OF THE INVENTION

[001] In vitro fertilization (IVF) is the most effective treatment for infertility. As clinical and laboratory methods have improved, so has the efficiency of producing blastocysts suitable for intrauterine transfer. As a result, IVF patients and physicians are often faced with determining which specific embryo to transfer. The default strategy for choosing which embryo to transfer involves ranking embryos through careful microscopy-based characterization of development and morphology. However, preimplantation genetic testing (PGT) has become a routine method for embryo selection, now implemented in 40% of all IVF cycles in the United States. PGT is most commonly applied to select euploid (or likely euploid, hereafter described as non-aneuploid) embryos for transfer, while avoiding those embryos designated as aneuploid (PGT-A). The primary objective of PGT-A is to improve the success of IVF in the first attempted embryo transfer. Even then, the default strategy for choosing which non-aneuploid embryo to transfer involves ranking embryos through careful microscopy-based characterization of development and morphology.

[002] More recently, it has become possible to characterize the risk of polygenic disease in the preimplantation embryo. Polygenic disorders, conditions influenced by genetic variants in multiple genes, account for a large percentage of premature death in humans, chiefly through cancer, heart disease, and diabetes. There is a growing body of evidence that the risk of these diseases is higher in individuals seeking fertility treatments. Despite the potential for environmental influence, polygenic disease risk can now be accurately predicted for several common diseases including cancer, heart disease, and diabetes using DNA alone. It was recently demonstrated that genome-wide genotyping of DNA from a preimplantation embryo can achieve accuracy equivalent to that already achieved when DNA from adults is tested. Therefore, the performance achieved in predicting polygenic disease in adults can now be achieved in preimplantation embryos.

[003] Polygenic risk scoring in adults is often performed and evaluated in the context of entire populations of unrelated people. In contrast, PGT involves evaluating genetic risks among sibling embryos within a single family. This was addressed previously by evaluating blinded DNA from 2,601 adult sibling pair families with known type 1 diabetes status. Results demonstrated a 45-72% reduction in the incidence of type 1 diabetes when one sibling was chosen based on a polygenic risk score compared to when one sibling was chosen randomly. This study demonstrated the clinical utility of PGT for polygenic disease (PGT-P) in a situation that is similar to PGT for monogenic disease, where intended parents have a known risk of passing on the disease.

[004] Accordingly, there is a need for methods and systems for embryo selection and for reducing the risk of polygenic disease that go beyond conventional aneuploidy screening and morphological assessment and that are applicable to intended parents whose embryos have family histories ranging from an affected first order relative to no known history.
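The reduction in incidence reported for the sibling study is a relative risk reduction. As a rough illustration, it can be computed from the disease frequency under random selection and under score-based selection. The function name and the incidence figures below are hypothetical and are not taken from the study.

```python
def relative_risk_reduction(incidence_random: float, incidence_selected: float) -> float:
    """Fraction by which incidence falls when selection replaces random choice."""
    if not 0.0 < incidence_random <= 1.0:
        raise ValueError("incidence_random must be in (0, 1]")
    if not 0.0 <= incidence_selected <= 1.0:
        raise ValueError("incidence_selected must be in [0, 1]")
    return 1.0 - incidence_selected / incidence_random


# Hypothetical incidences, for illustration only.
rrr = relative_risk_reduction(incidence_random=0.20, incidence_selected=0.10)
print(f"{rrr:.0%}")  # prints 50%
```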

SUMMARY OF THE INVENTION

[005] In one aspect, provided herein are methods of generating a genomic index Gi, descriptive of the aggregate risk from (1,..., n) polygenic diseases and traits for an embryo, also optionally aggregating monogenic, structural, karyotypal, and other risks of the embryo into this single index score; the method implemented by a computer processor executing program instructions, the method comprising the steps of: (a) for each i = 1,..., n of the plurality of polygenic diseases identifying a genotype at each of a plurality of defined disease specific genetic loci (1,..., ki) from a genomic DNA sample obtained from the embryo; (b) using the processor, for one of the plurality of polygenic diseases constructing a disease specific genomic profile including the genotype identified at each of the plurality of defined disease specific genetic loci; (c) using the processor, comparing the disease specific genomic profile to one or more databases to determine an absolute probability Pi of an embryo with said disease specific genomic profile developing said polygenic disease; (d) repeating steps (b) and (c) for each of said plurality of polygenic diseases; and (e) generating for the embryo the genomic index Gi, wherein the genomic index is an aggregate score, calculated using as components (whether by addition, multiplication, or another operation), for each of the plurality of polygenic diseases: (i) the population average probability of getting the disease (PAi), (ii) the absolute probability of the embryo developing said polygenic disease (Pi), (iii) the effect on life expectancy from each disease measured as lifespan impact years (Qi), the effect on life quality from each disease measured as lifespan quality (LQi), the effect on life treatment cost from each disease measured as lifetime treatment cost (LCi), or any other such factor as may be used to appropriately weight the impact of the disease; and (f) optionally aggregating monogenic, structural, karyotypal, and other risks of the embryo into this single index score.

[006] In another aspect, provided herein are methods of selecting an embryo for suitability for intrauterine transfer from a set of non-aneuploid embryos, the method implemented by a computer processor executing program instructions, the method comprising the steps of: (a) using the processor, generating a genomic index Gi as described herein for each embryo of the set of non-aneuploid embryos; (b) including, excluding, up-prioritizing or down-prioritizing the embryo from the set of non-aneuploid embryos as suitable or unsuitable for intrauterine transfer, if the genomic index Gi of the embryo is better than or worse than the genomic index Gi of another embryo of the set of non-aneuploid embryos, or if the genomic index Gi falls above or below a pre-determined threshold; and (c) repeating steps (a) - (b) for each embryo of the set of non-aneuploid embryos.
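The threshold branch of step (b) can be sketched as a simple filter over index values that have already been computed. This sketch assumes that a larger index is better and uses a hypothetical threshold of zero. The text above fixes neither the direction of comparison nor a threshold value, and the function name is illustrative.

```python
def triage_embryos(indices: dict[str, float], threshold: float) -> dict[str, str]:
    """Label each embryo 'include' or 'exclude' against a pre-determined threshold.

    Assumption: higher index values are better.
    """
    return {
        embryo_id: "include" if value >= threshold else "exclude"
        for embryo_id, value in indices.items()
    }


# Hypothetical index values for three sibling embryos.
labels = triage_embryos({"E1": 0.8, "E2": -0.3, "E3": 0.1}, threshold=0.0)
```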

[007] In another aspect, provided herein are methods of ranking a set of non-aneuploid embryos for suitability for intrauterine transfer, the method implemented by a computer processor executing program instructions, the method comprising the steps of: (a) using the processor, generating a genomic index Gi as described herein for each embryo of the set of non-aneuploid embryos; (b) ranking each embryo from the set of non-aneuploid embryos for suitability for intrauterine transfer based on the genomic index Gi of that embryo, wherein the ranking is relative to the genomic index Gi of another embryo of the set of non-aneuploid embryos or the ranking is relative to a pre-determined standard; and (c) repeating steps (a) - (b) for each embryo of the set of non-aneuploid embryos.
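Ranking relative to the other embryos of the set can be sketched as a sort over precomputed index values. As an assumption not stated above, the sketch treats a larger index as more suitable. Embryos with equal values keep their input order, and the function name is hypothetical.

```python
def rank_embryos(indices: dict[str, float]) -> list[tuple[int, str, float]]:
    """Return (rank, embryo_id, index) tuples, best first.

    Assumption: higher index values are better.
    """
    ordered = sorted(indices.items(), key=lambda item: item[1], reverse=True)
    return [(rank, embryo_id, value) for rank, (embryo_id, value) in enumerate(ordered, start=1)]


# Hypothetical index values for three sibling embryos.
ranking = rank_embryos({"E1": 0.8, "E2": -0.3, "E3": 0.1})
```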

[008] In another aspect, provided herein are systems for generating a genomic index Gi, descriptive of the aggregate risk from (1,..., n) polygenic diseases and traits for an embryo, also optionally aggregating monogenic, structural, karyotypal, and other risks of the embryo into this single index score, the system comprising: a memory and a computer processor to: (a) identify a genotype at each of a plurality of defined disease specific genetic loci (1,..., ki) for each i = 1,..., n of the plurality of polygenic diseases from a genomic DNA sample obtained from the embryo; (b) construct for one of the plurality of polygenic diseases a disease specific genomic profile including the genotype identified at each of the plurality of defined disease specific genetic loci; (c) compare the disease specific genomic profile to one or more databases to determine an absolute probability Pi of an embryo with the disease specific genomic profile developing the polygenic disease; (d) repeat steps (b) and (c) for each of the plurality of polygenic diseases; (e) generate for the embryo the genomic index Gi of the risk for the plurality of polygenic diseases, wherein the genomic index is a sum, for each of the plurality of polygenic diseases, of the difference between the population average probability of getting the disease (PAi) and the absolute probability of the embryo developing the polygenic disease (Pi) and wherein the sum is weighted by the effect on life expectancy from each disease measured as lifespan impact years Qi and (f) optionally aggregate monogenic, structural, karyotypal, and other risks of the embryo into this single index score.
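Step (e) describes the index as a weighted sum over diseases of the difference between the population average probability and the embryo's probability, weighted by lifespan impact years. A minimal sketch of that arithmetic follows. It assumes the per-disease probabilities have already been determined in steps (a) to (d). The class and function names, and all numeric values, are hypothetical.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class DiseaseRisk:
    name: str
    population_probability: float  # PAi: population average probability
    embryo_probability: float      # Pi: absolute probability for this embryo
    lifespan_impact_years: float   # Qi: weight in lifespan impact years


def genomic_index(risks: Iterable[DiseaseRisk]) -> float:
    """Sum of Qi * (PAi - Pi) over all diseases."""
    return sum(
        r.lifespan_impact_years * (r.population_probability - r.embryo_probability)
        for r in risks
    )


# Hypothetical inputs: below-average risk for A, above-average risk for B.
example = [
    DiseaseRisk("disease A", population_probability=0.10,
                embryo_probability=0.04, lifespan_impact_years=10.0),
    DiseaseRisk("disease B", population_probability=0.20,
                embryo_probability=0.30, lifespan_impact_years=5.0),
]
g = genomic_index(example)  # 10 * 0.06 + 5 * (-0.10), approximately 0.1
```

With this sign convention, a positive value means the embryo's weighted risk is below the population average. If monogenic or structural risks are assumed to take the same weighted-difference form, they could be passed as additional entries.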

[009] In another aspect, provided herein are systems for selecting an embryo for suitability for intrauterine transfer from a set of non-aneuploid embryos, the system comprising: a memory and a computer processor to: (a) generate a genomic index Gi as described herein for each embryo of the set of non-aneuploid embryos; (b) include, exclude, up-prioritize or down-prioritize the embryo from the set of non-aneuploid embryos as suitable or unsuitable for intrauterine transfer, if the genomic index Gi of the embryo is better than or worse than the genomic index Gi of another embryo of the set of non-aneuploid embryos, or if the genomic index Gi falls above or below a pre-determined threshold; and (c) repeat steps (a) - (b) for each embryo of the set of non-aneuploid embryos.

[0010] In another aspect, provided herein are systems for ranking a set of non-aneuploid embryos for suitability for intrauterine transfer, the system comprising: a memory and a computer processor to: (a) generate a genomic index Gi as described herein for each embryo of the set of non-aneuploid embryos; (b) rank each embryo from the set of non-aneuploid embryos for suitability for intrauterine transfer based on the genomic index Gi of that embryo, wherein the ranking is relative to the genomic index Gi of another embryo of the set of non-aneuploid embryos or the ranking is relative to a pre-determined standard; and (c) repeat steps (a) - (b) for each embryo of the set of non-aneuploid embryos.

[0011] In another aspect, provided herein are systems for dividing the potential offspring embryos of a given couple of parents into a series of 100 percentiles with respect to genetic health, and the demarcation of any given embryo as falling into one of these 100 percentiles.
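One way to picture the percentile assignment is to place an embryo's index value within a reference distribution of index values for the couple's potential offspring. How such a distribution would be obtained is not described here. The sketch below simply assumes it is available as a list of numbers, and the function name is hypothetical.

```python
from bisect import bisect_right


def percentile_bin(value: float, reference: list[float]) -> int:
    """Return the percentile bin (1 to 100) of `value` within `reference`."""
    if not reference:
        raise ValueError("reference distribution must not be empty")
    ordered = sorted(reference)
    n = len(ordered)
    at_or_below = bisect_right(ordered, value)  # reference values <= value
    bin_index = (at_or_below * 100 + n - 1) // n  # integer ceiling of the percentage
    return min(100, max(1, bin_index))


# Hypothetical reference distribution: the values 1.0 through 100.0.
reference = [float(x) for x in range(1, 101)]
middle = percentile_bin(50.0, reference)
```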

[0012] Other features and advantages will become apparent from the following detailed description, examples, and figures. It should be understood, however, that the detailed description and the specific examples while indicating preferred embodiments of the invention are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

BRIEF DESCRIPTION OF THE FIGURES

[0013] The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings.

[0014] Figure 1: A pedigree of a case presenting for PGT-P with a family history of type 1 diabetes and breast cancer (shown in purple and turquoise respectively).

[0015] Figure 2: Type 1 Diabetes Case PGT-P Results. Risk Percentile indicates the predicted risk in terms of the computed polygenic risk score with respect to the distribution of risk scores from the UK Biobank cohort. Risk is classified as high when the average population-matched sample with the polygenic risk score is in the top 2% of risk; otherwise it is classified as normal risk.

[0016] Figure 3: Relative risk reduction (RRR) across 11 diseases using genomic index selection compared to random selection within 11,883 sibling pairs. The frequency of disease with random selection is shown in blue, while the frequency of disease with genomic index selection is shown in orange. These data show a clear benefit from genetic selection of 1 of only 2 siblings with unknown family history of disease. *p-value < 0.05 (Table 1).

[0017] Figure 4: Preimplantation embryo genomic index versus family history. Embryos with a first degree affected relative have significantly higher risk of polygenic disease than embryos with unknown family history of polygenic disease. **p = 0.0015 vs unknown. *p = 0.0129 vs unknown.

[0018] Figure 5: Schematically illustrates a system for executing one or more methods according to embodiments of the invention.

[0019] Figures 6A-6C: Schematically illustrates a system for grouping individual line items from the embryo health score into factors according to embodiments of the invention.

[0020] It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.

DETAILED DESCRIPTION OF THE INVENTION

[0021] In the following description, various aspects of the present invention will be described. For purposes of explanation, specific configurations and details are set forth to provide a thorough understanding of this invention. However, it will also be apparent to one skilled in the art that the present invention may be practiced without the specific details presented herein. Furthermore, well known features may be omitted or simplified in order not to obscure the present invention.

[0022] Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical, such as electronic, quantities within the computing system’s registers and/or memories into other data similarly represented as physical quantities within the computing system’s memories, registers or other such information storage, transmission or display devices.

[0023] In accordance with embodiments of the present invention and as used herein, the following terms are defined with the following meanings, unless explicitly stated otherwise.

[0024] As used herein, “haploid cell” refers to a cell with a haploid number (n) of chromosomes.

[0025] “Gametes”, as used herein, are specialized haploid cells (e.g., spermatozoa and oocytes) produced through the process of meiosis and involved in sexual reproduction. As used herein, “gametotype” refers to single genome copies with one allele of each of one or more loci in the haploid genome of a single gamete.

[0026] As used herein, an “autosome” is any chromosome exclusive of the X and Y sex chromosomes.

[0027] As used herein, “diploid cell” has a homologous pair of each of its autosomal chromosomes and has two copies (2n) of each autosomal genetic locus.

[0028] The term “chromosome”, as used herein, refers to a molecule of DNA with a sequence of base pairs that corresponds closely to a defined chromosome reference sequence of the organism in question.

[0029] As used herein, “euploid cell” or “euploid organism” refers to a cell or organism, respectively, having an exact multiple of the haploid number of chromosomes.

[0030] As used herein, “aneuploid cell” or “aneuploid organism” refers to a cell or organism, respectively, having a chromosome number that is not an exact multiple of the usual haploid number.
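The two count-based definitions above can be illustrated with a short check. The sketch assumes the human haploid number of 23 as a default. The function name is hypothetical, and the check looks only at the total chromosome count, as the definitions do.

```python
HUMAN_HAPLOID_NUMBER = 23


def classify_ploidy(chromosome_count: int, haploid_number: int = HUMAN_HAPLOID_NUMBER) -> str:
    """Return 'euploid' if the count is an exact multiple of the haploid number."""
    if chromosome_count <= 0 or haploid_number <= 0:
        raise ValueError("counts must be positive")
    return "euploid" if chromosome_count % haploid_number == 0 else "aneuploid"


diploid = classify_ploidy(46)   # two full haploid sets
trisomic = classify_ploidy(47)  # one extra chromosome
```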

[0031] The term “gene”, as used herein, refers to a DNA sequence in a chromosome that codes for a product (either RNA or its translation product, a polypeptide) or otherwise plays a role in the expression of said product. A gene contains a DNA sequence with biological function. The biological function may be contained within the structure of the RNA product or a coding region for a polypeptide. The coding region includes a plurality of coding segments (“exons”) and intervening non-coding sequences (“introns”) between individual coding segments and non-coding regions preceding and following the first and last coding regions, respectively.

[0032] The term “gene product”, as used herein, refers to a product (either RNA or its translation product, a polypeptide) that is encoded by a gene and that has biological function.

[0033] As used herein, “locus” refers to any segment of DNA sequence defined by chromosomal coordinates in a reference genome known to the art, irrespective of biological function. A DNA locus may contain multiple genes or no genes; it may be a single base pair or millions of base pairs.

[0034] As used herein, a “polymorphic locus” is a genomic locus at which two or more alleles have been identified.

[0035] As used herein, an “allele” is one of two or more existing genetic variants of a specific polymorphic genomic locus.

[0036] As used herein, a “single nucleotide polymorphism” or “SNP” is a particular base position in the genome where alternative bases are known to distinguish one individual from another. Most categories of more complex genetic variants may be reduced for analytical purposes to one or a few defining SNPs.

[0037] As used herein, a “copy number variant” or “CNV” is a deletion or duplication of a large block of genetic material that exists in a population at a frequency less than 1%. As used herein, a “copy number polymorphism” or “CNP” is a deletion or duplication of a large block of genetic material that exists in a population at a frequency of greater than 1%. Since a CNV in one population may be a CNP in a second population, the two terms may be used interchangeably.
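Because the distinction rests only on population frequency, the same deletion or duplication can fall on either side of the 1% line depending on the population. A small sketch, with a hypothetical function name and hypothetical frequencies:

```python
def classify_copy_number_variation(population_frequency: float) -> str:
    """Label a deletion/duplication by its frequency in one population."""
    if not 0.0 <= population_frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if population_frequency < 0.01:
        return "CNV"
    if population_frequency > 0.01:
        return "CNP"
    return "unspecified"  # exactly 1% is covered by neither definition


# The same variant observed at hypothetical frequencies in two populations.
in_population_1 = classify_copy_number_variation(0.004)
in_population_2 = classify_copy_number_variation(0.05)
```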

[0038] As used herein, “genotype” refers to the diploid combination of alleles at a given genetic locus, or set of related loci, in a given cell or organism. A homozygous subject carries two copies of the same allele and a heterozygous subject carries two distinct alleles. In the simplest case of a locus with two alleles “A” and “a”, three genotypes may be formed: A/A, A/a, and a/a.
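The three genotypes in the two-allele case are the unordered pairs of alleles, which can be enumerated directly. The sketch below generalizes this to any list of alleles. The function name is illustrative.

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles: list[str]) -> list[str]:
    """List every unordered diploid combination of the given alleles."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]


print(possible_genotypes(["A", "a"]))  # ['A/A', 'A/a', 'a/a']
```

For a locus with k alleles this yields k(k + 1)/2 genotypes, so a three-allele locus has six.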

[0039] As used herein, “genotyping” refers to any experimental, computational, or observational protocol for distinguishing an individual’s genotype at one or more well-defined loci.

[0040] As used herein, a “haplotype” is a unique set of alleles at separate loci that are normally grouped closely together on the same DNA molecule and are observed to be inherited as a group. A haplotype may be defined by a set of specific alleles at each defined polymorphic locus within a haploblock. As used herein, a “haploblock” refers to a genomic region that maintains genetic integrity over multiple generations and is recognized by linkage disequilibrium within a population. Haploblocks are defined empirically for a given population of individuals.

[0041] As used herein, “linkage disequilibrium” is the non-random association of alleles at two or more loci within a particular population. Linkage disequilibrium is measured as a departure from the null hypothesis of linkage equilibrium, where each allele at one locus associates randomly with each allele at a second locus in a population of individual genomes.
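The departure from linkage equilibrium is commonly quantified by the coefficient D = p(AB) - p(A)p(B) and the normalized statistic r². The text above does not name a specific statistic, so these are shown as a conventional choice. The sketch assumes phased two-locus haplotypes given as tuples, and the function name is hypothetical.

```python
import math


def linkage_disequilibrium(
    haplotypes: list[tuple[str, str]], allele_a: str, allele_b: str
) -> tuple[float, float]:
    """Return (D, r_squared) for allele_a at locus 1 and allele_b at locus 2."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(1 for h in haplotypes if h[0] == allele_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == allele_b) / n
    p_ab = sum(1 for h in haplotypes if h == (allele_a, allele_b)) / n
    d = p_ab - p_a * p_b  # zero under linkage equilibrium
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else math.nan
    return d, r_squared


# Hypothetical samples: complete association versus random association.
linked = [("A", "B")] * 50 + [("a", "b")] * 50
random_assoc = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")] * 25
d_linked, r2_linked = linkage_disequilibrium(linked, "A", "B")
d_random, r2_random = linkage_disequilibrium(random_assoc, "A", "B")
```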

[0042] As used herein, a “genome” is the total genetic information carried by an individual organism or cell, represented by the complete DNA sequences of its chromosomes.

[0043] As used herein, a “genomic profile” is a representative subset of the total information contained within a genome. A genomic profile contains genotypes at a particular set of polymorphic loci.

[0044] As used herein, a genetic “trait” is a distinguishing attribute of an individual, whose expression is fully or partially influenced by an individual’s genetic constitution.

[0045] As used herein, “disease” refers to a trait that is at least partially heritable and causes a reduction in the quality of life of an individual person.

[0046] As used herein, a “phenotype” includes alternative traits which may be discrete or continuous. Phenotypes may include both traits and diseases.

[0047] As used herein, “NCBI” refers to the National Center for Biotechnology Information which is a division of the National Library of Medicine at the U.S. National Institutes of Health. The NCBI operates under a Congressional mandate to develop, maintain, and distribute databases and software to the research and medical communities.

[0048] As used herein, a “variant” is a particular allele at a locus where at least two alleles have been identified.

[0049] As used herein, a “mutation” has the same meaning as a “mutant allele” which is a variant that causes a gene to function abnormally.

[0050] As used herein, a “single gene phenotype” is a phenotype that may be caused by the expression of a genotype of a single gene. As used herein, a “single gene disease” is a disease that may be caused by a mutation or mutations in a single gene.

[0051] As used herein, a “polygenic phenotype” is a phenotype that may be caused by the expression of genotypes of more than one gene. As used herein, a “polygenic disease” is a disease that may be caused by mutations in more than one gene.

[0052] As used herein, a “recessive phenotype” is a single gene phenotype whose expression is restricted to individuals who inherit a genotype with two copies of a particular gene. As used herein, a “dominant phenotype” is a single gene phenotype whose expression is restricted to individuals who inherit a genotype with at least one copy of a particular gene.

[0053] As used herein, “disease risk” refers to the likelihood that an existing person or a person born via IVF from an embryo will express a specified disease based on an interpretation of genetic data which is informed by empirical data or bioinformatic modeling.

[0054] As used herein, a “non-synonymous variant” is a DNA variant that alters the coding sequence of a gene, thereby altering the amino acid sequence of the protein product of the gene.

[0055] As used herein, “altering the gene product” (and grammatical variations thereof and the like) from a gene, refers to a change of the wild-type or normal biological function of the gene and that is caused by mutations of the gene. Alteration of the gene product from a gene includes alterations to transcription of the gene, alterations to translation of the gene, and alterations to the gene product itself.

[0056] It may be appreciated by persons of skill in the art that the discussion herein of disease, mutations, variants and other defective or negative functions are only examples of phenotypes and that such embodiments relate to any phenotype having negative, positive or neutral function.

[0057] Embodiments of the invention may provide a system and method for testing for a probability of future emergence of phenotypes in living organisms (real progeny that do not currently express the phenotypes). Living organisms may include organisms that were living at any time including those that are now dead.

[0058] In one aspect, provided herein are methods of generating a genomic index Gi, descriptive of the aggregate risk from (1,..., n) polygenic diseases and traits for an embryo, also optionally aggregating monogenic, structural, karyotypal, and other risks of the embryo into this single index score; the method implemented by a computer processor executing program instructions, the method comprising the steps of: (a) for each i = 1,..., n of the plurality of polygenic diseases identifying a genotype at each of a plurality of defined disease specific genetic loci (1,..., ki) from a genomic DNA sample obtained from the embryo; (b) using the processor, for one of the plurality of polygenic diseases constructing a disease specific genomic profile including the genotype identified at each of the plurality of defined disease specific genetic loci; (c) using the processor, comparing the disease specific genomic profile to one or more databases to determine an absolute probability Pi of an embryo with said disease specific genomic profile developing said polygenic disease; (d) repeating steps (b) and (c) for each of said plurality of polygenic diseases; (e) generating for the embryo the genomic index Gi, wherein the genomic index is an aggregate score, calculated using as components (whether by addition, multiplication, or any other operation), for each of the plurality of polygenic diseases: (i) the population average probability of getting the disease (PAi), (ii) the absolute probability of the embryo developing said polygenic disease (Pi), (iii) the effect on life expectancy from each disease measured as lifespan impact years (Qi), the effect on life quality from each disease measured as lifespan quality (LQi), the effect on life treatment cost from each disease measured as lifetime treatment cost (LCi), or any other such factor as may be used to appropriately weight the impact of the disease; and (f) optionally aggregating monogenic, structural, karyotypal, and other risks of the embryo into this single index score.

[0059] In another aspect, provided herein are methods of selecting an embryo for suitability for intrauterine transfer from a set of non-aneuploid embryos, the method implemented by a computer processor executing program instructions, the method comprising the steps of: (a) using the processor, generating a genomic index Gi as described herein for each embryo of the set of non-aneuploid embryos; (b) including, excluding, up-prioritizing or down-prioritizing the embryo from the set of non-aneuploid embryos as suitable or unsuitable for intrauterine transfer, if the genomic index Gi of the embryo is better than or worse than the genomic index Gi of another embryo of said set of non-aneuploid embryos, or if the genomic index Gi falls above or below a pre-determined threshold; and (c) repeating steps (a) - (b) for each embryo of said set of non-aneuploid embryos. In some embodiments, the set of non-aneuploid embryos are siblings.

[0060] In another aspect, provided herein are methods of ranking a set of non-aneuploid embryos for suitability for intrauterine transfer, the method implemented by a computer processor executing program instructions, the method comprising the steps of: (a) using the processor, generating a genomic index Gi as described herein for each embryo of the set of non-aneuploid embryos; (b) ranking each embryo from the set of non-aneuploid embryos for suitability for intrauterine transfer based on the genomic index Gi of that embryo, wherein the ranking is relative to the genomic index Gi of another embryo of said set of non-aneuploid embryos or the ranking is relative to a pre-determined standard; and (c) repeating steps (a) - (b) for each embryo of said set of non-aneuploid embryos. In some embodiments, the set of non-aneuploid embryos are siblings.
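
The selecting and ranking steps of [0059] and [0060] can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes a genomic index has already been computed for each embryo and that a larger index is better (under the formula of [0065], a positive term means below-average risk). The function names `rank_embryos` and `select_embryos`, the embryo labels, the index values, and the threshold are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def rank_embryos(index_by_embryo: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (embryo_id, index) pairs sorted from best to worst.

    Assumption: a larger genomic index means lower aggregate risk.
    Ties are broken by embryo id so the ordering is deterministic.
    """
    return sorted(index_by_embryo.items(), key=lambda kv: (-kv[1], kv[0]))


def select_embryos(index_by_embryo: Dict[str, float],
                   threshold: Optional[float] = None) -> List[str]:
    """Return ids of embryos that meet the threshold, best first.

    With no threshold, only the top-ranked embryo is returned.
    """
    ranked = rank_embryos(index_by_embryo)
    if not ranked:
        return []
    if threshold is None:
        return [ranked[0][0]]
    return [embryo_id for embryo_id, g in ranked if g >= threshold]


# Hypothetical index values for three sibling embryos.
indices = {"E1": 0.12, "E2": -0.30, "E3": 0.45}
ranking = rank_embryos(indices)
chosen = select_embryos(indices, threshold=0.0)
```

Ranking relative to the other embryos and filtering against a pre-determined threshold are kept as separate functions because the text presents them as alternative criteria.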

[0061] In some embodiments of the methods described herein, the method further comprises the step of assessing the morphology of the embryo. In some embodiments of the methods described herein, the method further comprises the step of screening the embryo for aneuploidy, or monogenic disease, or structural variants, or any combination thereof.

[0062] In another aspect, provided herein are systems for generating a genomic index Gi, descriptive of the aggregate risk from (1,..., n) polygenic diseases and traits for an embryo, also optionally aggregating monogenic, structural, karyotypal, and other risks of the embryo into this single index score, the system comprising: a memory and a computer processor to: (a) identify a genotype at each of a plurality of defined disease specific genetic loci (1,..., ki) for each i = 1,..., n of the plurality of polygenic diseases from a genomic DNA sample obtained from the embryo; (b) construct for one of the plurality of polygenic diseases a disease specific genomic profile including the genotype identified at each of the plurality of defined disease specific genetic loci; (c) compare the disease specific genomic profile to one or more databases to determine an absolute probability Pi of an embryo with said disease specific genomic profile developing said polygenic disease; (d) repeat steps (b) and (c) for each of said plurality of polygenic diseases; (e) generate for the embryo the genomic index Gi of the risk for said plurality of polygenic diseases, wherein the genomic index is a sum, for each of the plurality of polygenic diseases, of the difference between the population average probability of getting the disease (PAi) and the absolute probability of the embryo developing said polygenic disease (Pi), and wherein the sum is weighted by the effect on life expectancy from each disease measured as lifespan impact years (Qi); and (f) optionally aggregate monogenic, structural, karyotypal, and other risks of the embryo into this single index score. In some embodiments, the set of non-aneuploid embryos are siblings.

[0063] In another aspect, provided herein are systems for selecting an embryo for suitability for intrauterine transfer from a set of non-aneuploid embryos, the system comprising: a memory and a computer processor to: (a) generate a genomic index Gi as described herein for each embryo of the set of non-aneuploid embryos; (b) include, exclude, up-prioritize or down-prioritize the embryo from the set of non-aneuploid embryos as suitable or unsuitable for intrauterine transfer, if the genomic index Gi of the embryo is better than or worse than the genomic index Gi of another embryo of said set of non-aneuploid embryos, or if the genomic index Gi falls above or below a pre-determined threshold; and (c) repeat steps (a) - (b) for each embryo of said set of non-aneuploid embryos. In some embodiments, the set of non-aneuploid embryos are siblings.

[0064] In another aspect, provided herein are systems for ranking a set of non-aneuploid embryos for suitability for intrauterine transfer, the system comprising: a memory and a computer processor to: (a) generate a genomic index Gi as described herein for each embryo of the set of non-aneuploid embryos; (b) rank each embryo from the set of non-aneuploid embryos for suitability for intrauterine transfer based on the genomic index Gi of that embryo, wherein the ranking is relative to the genomic index Gi of another embryo of said set of non-aneuploid embryos or the ranking is relative to a pre-determined standard; and (c) repeat steps (a) - (b) for each embryo of said set of non-aneuploid embryos. In some embodiments, the set of non-aneuploid embryos are siblings.

[0065] In some embodiments of the methods and systems described herein, the genomic index Gi is calculated by the following:

$$G_i = \sum_i Q_i \, (\mathrm{PA}_i - P_i)$$

In some embodiments of the methods and systems described herein, where there is a contribution from a monogenic or structural variant, the genomic index Gi is calculated by the following:

$$G_i = \sum_i Q_i \, (\mathrm{PA}_i - P_i) + \sum_j \mathrm{MQ}_j \, (\mathrm{MPA}_j - \mathrm{MP}_j)$$

wherein the second sum extends over the monogenic (or structural, or karyotypal) diseases j, and: (i) MPAj is the population average probability of getting the monogenic (or structural, or karyotypal) disease, (ii) MPj is the absolute probability of the embryo developing said monogenic (or structural, or karyotypal) disease, and (iii) MQj is the effect on life expectancy from each disease measured as lifespan impact years, or any other such factor as may be used to appropriately weight the impact of the monogenic (or structural, or karyotypal) disease.
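
The two formulas of [0065] can be sketched in Python as below. This is an illustration under stated assumptions rather than the disclosed implementation: each disease contributes its lifespan-impact weight times the gap between the population average probability and the embryo's probability, and the monogenic (or structural, or karyotypal) terms are added in the same way. The class `RiskTerm`, the function names, the disease labels, and every numeric value are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RiskTerm:
    """One disease's inputs to the index (all values here are hypothetical)."""
    name: str
    q_years: float        # Qi (or MQj): lifespan impact years
    p_population: float   # PAi (or MPAj): population average probability
    p_embryo: float       # Pi (or MPj): embryo's absolute probability


def weighted_gap(terms: Iterable[RiskTerm]) -> float:
    """Sum of Q * (PA - P) over the given terms."""
    total = 0.0
    for t in terms:
        if not (0.0 <= t.p_population <= 1.0 and 0.0 <= t.p_embryo <= 1.0):
            raise ValueError(f"probabilities for {t.name} must lie in [0, 1]")
        total += t.q_years * (t.p_population - t.p_embryo)
    return total


def genomic_index(polygenic: Iterable[RiskTerm],
                  monogenic: Iterable[RiskTerm] = ()) -> float:
    """Polygenic sum plus the optional monogenic/structural/karyotypal sum."""
    return weighted_gap(polygenic) + weighted_gap(monogenic)


# Hypothetical inputs: one below-average and one above-average polygenic risk.
polygenic_terms = [
    RiskTerm("type 2 diabetes", q_years=5.0, p_population=0.10, p_embryo=0.06),
    RiskTerm("coronary artery disease", q_years=8.0, p_population=0.05, p_embryo=0.07),
]
monogenic_terms = [
    RiskTerm("hypothetical monogenic condition", q_years=20.0,
             p_population=0.001, p_embryo=0.0),
]
g_poly = genomic_index(polygenic_terms)
g_total = genomic_index(polygenic_terms, monogenic_terms)
```

With this sign convention, an embryo whose probabilities all equal the population averages scores zero, and a positive index indicates lower-than-average weighted risk.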

[0066] In some embodiments of the methods and systems described herein, the genomic index Gi is primarily calculated to reflect a subset of polygenic diseases such as cancer, heart disease, embryo miscarriage risk for the embryo in question, psychiatric disorders, or diabetes. In some embodiments of the methods and systems described herein, the plurality of polygenic diseases includes cancer, heart disease, and diabetes. In some embodiments of the methods and systems described herein, the embryo has an unknown family history of polygenic disease. In some embodiments of the methods and systems described herein, the embryo has a known family history of disease for one or more of said plurality of polygenic diseases, and in some of these embodiments, this family history is used to adjust the calculation of the genomic index Gi.

[0067] In yet another aspect, provided herein are bioinformatic systems and methods of error correction for a genomic sequence of an embryo using parent and sibling genotypes. In some embodiments, the genomic sequence of the embryo is corrected using the genomes of the mother and the father, where the two genomes of the mother and father are measured using a genotyping array, while the genomes of the embryos are measured using a genomic sequencer. In some embodiments, the embryo’s genomic sequence is corrected using the genomes of the mother and the father, when the two genomes of the mother and father are measured using a genotyping array, while the embryos are measured by a genomic sequencer, or by another genotyping array. In some embodiments, the genome of each embryo is measured in replicate (whether measured by a genomic sequencer, by a genotyping array, or by a combination thereof), and these replicate measurements of the genome of the same embryo are used to correct one another, yielding a unified, replicate-corrected embryo genome.

[0068] In combination with these replicates, further methods of error correction, using parent and sibling genotypes, may optionally be employed. In some embodiments, the genomic sequence of the embryo is corrected using the genomes of the mother and the father, in combination with a population reference panel, as a unified method, where the two are combined in particular by using the population reference panel as a tie-breaking vote during the process when the two parental genomes are being used to error correct, and the inferences made from the two parental genomes are inconclusive, as well as imputing with the population reference panel both prior to, and posterior to, applying error correction from the parental genomes.
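
The tie-breaking role of the population reference panel in [0068] can be illustrated with a deliberately simplified, per-site sketch. The assumptions are loud ones: genotypes are coded as alternate-allele counts (0, 1, 2), the check is plain Mendelian consistency at a single biallelic site rather than a block-based comparison, and the panel contributes one most-likely genotype per site. All names (`correct_site`, `consistent_genotypes`, `transmissible`) and the example data are hypothetical.

```python
from typing import List, Optional, Set

Genotype = Optional[int]  # alternate-allele count 0, 1 or 2; None means no call


def transmissible(parent: int) -> Set[int]:
    """Alleles (0 = reference, 1 = alternate) a parent can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[parent]


def consistent_genotypes(mother: int, father: int) -> Set[int]:
    """Child genotypes compatible with Mendelian inheritance at one site."""
    return {a + b for a in transmissible(mother) for b in transmissible(father)}


def correct_site(embryo: Genotype, mother: int, father: int,
                 panel: Genotype) -> Genotype:
    """Correct one embryo call, using the panel only when parents are inconclusive."""
    allowed = consistent_genotypes(mother, father)
    if embryo in allowed:
        return embryo                 # call already agrees with the parents
    if len(allowed) == 1:
        return next(iter(allowed))    # parents alone determine the genotype
    if panel in allowed:
        return panel                  # inconclusive: the panel breaks the tie
    return None                       # still unresolved, leave as no call


# Hypothetical data for four sites.
mother = [0, 1, 2, 1]
father = [0, 1, 2, 0]
embryo = [1, None, 2, 2]      # site 0 and site 3 are inconsistent, site 1 is missing
panel = [0, 1, 2, 0]          # assumed most-likely genotype from a reference panel
corrected: List[Genotype] = [
    correct_site(e, m, f, p) for e, m, f, p in zip(embryo, mother, father, panel)
]
```

A block-based version would apply the same decision order to whole inherited segments instead of single sites.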

[0069] An instantiation, by no means exhaustive of all possibilities covered, of the foregoing methods and systems of error correction for a genomic sequence of an embryo includes ones wherein the embryo and the parents have their DNA sequences initially characterized using the aforementioned technologies; and because DNA is inherited in blocks by the embryo from the original genome sequences of the mother and father, the state of a base or set of bases in a specific region in the embryo genome can be compared to the relevant regions of the maternal, paternal, and population level genomes. If, for example, all but one or a few bases in a region of the embryo genome could have originated from specific, corresponding blocks from the parents’ genomes, the most likely explanation for the discrepancy is an error in reading out the embryo DNA, which has been amplified from just a few cells. Setting the discrepant base values to those on the parental or population blocks will likely correct the error. Similarly, when a read of the embryo’s genome is entirely missing, setting the missing base values to those on the parental or population blocks will likely correct the error. Processes implementing the correction using only parental genomes, or only population genomes, have been tested and shown to substantially improve accuracy.

[0070] In yet another aspect, a method of validating that the methods and systems described herein are accurate is to compare the genomic sequence of a cell line as measured using two different systems. One of these measurements can be treated as a baseline truth, due to having been characterized at a high level of genotyping resolution. The second of these measurements can be treated as an “embryo stand-in”, or as a test of the method or system. The second “embryo stand-in” cell line will in this case be sampled at a reduced resolution (“downsampled”) as a way of having it play the part of an embryo DNA sample. This “embryo stand-in” low resolution is significant, because the smaller amount of DNA present in a sample from an embryo generally results in a lower resolution genome. The “embryo stand-in” genome is then fixed or corrected using the genomes of other cell lines which are parental to the “embryo stand-in”, and still other genomes which are sibling to the “embryo stand-in”, and which are, in some cases, either downsampled in the same way, or not downsampled, respectively representing sibling embryos and previously birthed siblings. The methods and systems of validation can optionally be combined (or not combined) with population reference panels, as described above. Finally, the validation methods and systems described herein can also be used to validate other methods and systems described herein, such as for demonstration of the preservation of embryo ranking or embryo selection as determined using a genomic index Gi as described herein.
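
The downsample-then-correct validation idea can be sketched as follows. This is a toy illustration under assumptions not stated in the source: genotypes are alternate-allele counts, downsampling is modelled as randomly dropping calls, concordance is the fraction of sites matching the baseline, and the corrector is a trivial stand-in that fills no-calls only where both parents are homozygous. The function names and data are hypothetical.

```python
import random
from typing import List, Optional, Sequence

Genotype = Optional[int]  # alternate-allele count 0, 1 or 2; None means no call


def downsample(truth: Sequence[int], keep_fraction: float, seed: int) -> List[Genotype]:
    """Simulate a low-resolution 'embryo stand-in' by randomly dropping calls."""
    rng = random.Random(seed)
    return [g if rng.random() < keep_fraction else None for g in truth]


def concordance(truth: Sequence[int], calls: Sequence[Genotype]) -> float:
    """Fraction of all sites whose call equals the baseline; no-calls count as misses."""
    if len(truth) != len(calls):
        raise ValueError("truth and calls must cover the same sites")
    if not truth:
        return 0.0
    return sum(1 for t, c in zip(truth, calls) if c == t) / len(truth)


def fill_from_homozygous_parents(calls: Sequence[Genotype],
                                 mother: Sequence[int],
                                 father: Sequence[int]) -> List[Genotype]:
    """Stand-in corrector: fill no-calls where both parents are homozygous."""
    filled: List[Genotype] = []
    for c, m, f in zip(calls, mother, father):
        if c is None and m in (0, 2) and f in (0, 2):
            c = (m + f) // 2   # one allele from each homozygous parent
        filled.append(c)
    return filled


# Hypothetical baseline genotypes, consistent with the hypothetical parents.
mother = [0, 2, 0, 1, 2, 0]
father = [0, 2, 2, 1, 0, 1]
truth = [0, 2, 1, 1, 1, 0]

low_res = downsample(truth, keep_fraction=0.5, seed=7)
corrected = fill_from_homozygous_parents(low_res, mother, father)
before, after = concordance(truth, low_res), concordance(truth, corrected)
```

Comparing `before` and `after` against the high-resolution baseline is the accuracy check; the same comparison could be applied to embryo rankings rather than genotypes.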

[0071] Fig. 5 is a schematic illustration of a system 500 according to an embodiment of the invention. Methods disclosed herein may be performed using the system of Fig. 5.

[0072] System 500 may include a genetic sequencing module 502 that accepts genetic material or DNA samples from an existing person or embryo and generates a genome or genomic profile for each. Genetic sequencing module 502 may include a processor 504 for generating each genomic profile and a memory 506 for storing each genomic profile.

[0073] Computing device 508 may include, for example, any suitable processing system, computing system, computing device, processing device, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. Computing device 508 may include for example one or more processor(s) 512, memory 514 and software 516. Data generated by genetic sequencing module 502, such as each genomic profile, may be transferred, for example, to computing device 508. The data may be stored in the memory 514 as for example digital information and transferred to computing device 508 by uploading, copying or transmitting the digital information. Processor 504 may communicate with computing device 508 via wired or wireless command and execution signals.

[0074] Computing device 508 may use each person’s or embryo’s genome information to generate a genomic profile. Computing device 508 may combine the probabilities or likelihoods of a person born via IVF from a specified embryo with a specified genomic profile of developing a polygenic disease to generate a genomic index for that embryo. Computing device 508 may repeat the process to generate a genomic index for each of a set of embryos.

[0075] In some embodiments using a matching method, computing device 508 may compare genotypes in the embryo to genotypes in a database 510 to detect any genotype-phenotype matches. Database 510 may connect to computing device 508 via a wired or wireless connection.

[0076] In some embodiments using a scoring method, computing device 508 may compute scores and weightings associated with genotypes in the embryo or retrieve scores and weightings from an external database.

[0077] Memory 506 and 514 and database 510 may include cache memory, long term memory such as a hard drive, and/or external memory, for example, including random access memory (RAM), read only memory (ROM), dynamic RAM (DRAM), synchronous DRAM (SD-RAM), flash memory, volatile memory, non-volatile memory, buffer, short term memory unit, long term memory unit, or other suitable memory units or storage units. Memory 506 and 514 and database 510 may store instructions (e.g., software 516) and data to execute embodiments of the methods, steps and functionality described herein (e.g., in long term memory such as a hard drive).

[0078] Computing device 508 may include a computing module having machine-executable instructions. The instructions may include, for example, a data processing mechanism (including, for example, embodiments of methods described herein) and a modeling mechanism. These instructions may be used to cause processor 512 using associated software 516 modules programmed with the instructions to perform the operations described. Alternatively, the operations may be performed by specific hardware that may contain hardwired logic for performing the operations, or by any combination of programmed computer components and custom hardware components.

[0079] Embodiments of the invention may include an article such as a computer or processor readable medium, or a computer or processor storage medium, such as for example a memory, a disk drive, or a USB flash memory, encoding, including or storing instructions, e.g., computer-executable instructions, which when executed by a processor or controller, carry out methods disclosed herein.

[0080] Processor 512 may perform and execute various methods described herein.

[0081] Display 518 may display results and/or intermediate data such as outcomes, probabilities, and genomic profiles. Display 518 may include a monitor or screen, such as an organic light emitting diode (OLED) screen, liquid crystal display (LCD) screen, thin film transistor display, or the like. In one embodiment, the user may interact with display 518 using input device(s) 520.

[0082] Input device(s) 520 may include a keyboard, pointing device (e.g., mouse, trackball, pen), a touch screen or cursor direction keys, communicating information and command selections to processor 512. Input device 520 may communicate user direction information and command selections to the processor 512. For example, a user may use input device 520 to select embryos for testing, define genes, phenotypes and/or diseases to be under investigation, set thresholds or categories, set margins of error or certainty of calculations, etc.

[0083] Processors 504 and 512 may include, for example, one or more processors, controllers, central processing units (“CPUs”), or graphical processing units (“GPUs”). Software 516 may be stored, for example, in memory 514.

[0084] Computer Systems/Processors (the following computer systems/processors may be used in combination with, or as an alternative to, computer systems/processors described in reference to Fig. 5).

[0085] The methods and systems described herein may be used in combination with one or more processors, having either single or multiple cores. The processor may be operatively connected to a memory. For instance, the memory may be solid state, flash, or nanoparticle based. The processor and/or memory may be operatively connected to a network via a network adapter. The network may be digital, analog, or a combination of the two. The processor may be operatively connected to the memory to execute computer program instructions to perform one or more steps described herein. Any computer language known to those skilled in the art may be used.

[0086] Input/output circuitry may be included to provide the capability to input data to, or output data from, the processor and/or memory. For example, input/output circuitry may include input devices, such as keyboards, mice, touch pads, trackballs, scanners, and the like; output devices, such as video adapters, monitors, printers, and the like; and input/output devices, such as, modems and the like.

[0087] The memory may store program instructions that are executed by, and data that are used and processed by, CPUs to perform various functions. The memory may include electronic memory devices, such as random-access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), and flash memory, and electro-mechanical memory, such as magnetic disk drives, tape drives, and optical disk drives, which may use an integrated drive electronics (IDE) interface, or a variation or enhancement thereof, such as enhanced IDE (EIDE) or ultra direct memory access (UDMA), or a small computer system interface (SCSI) based interface, or a variation or enhancement thereof, such as fast-SCSI, wide-SCSI, fast and wide-SCSI, etc., or a fiber channel-arbitrated loop (FC-AL) interface.

[0088] The systems described herein may also include an operating system that runs on the processor, including UNIX®, OS/2®, and WINDOWS®, each of which may be configured to run many tasks at the same time, e.g., multitasking operating systems. In one aspect, the methods are utilized with a wireless communication and/or computation device, such as a mobile phone, personal digital assistant, personal computer, and the like. Moreover, the computing system may be operable to wirelessly transmit data to wireless or wired communication devices using a data network, such as the Internet, or a local area network (LAN), wide-area network (WAN), cellular network, or other wireless networks known to those skilled in the art.

[0089] In one embodiment, a graphical user interface may be included to allow human interaction with the computing system. The graphical user interface may comprise a screen, such as an organic light emitting diode screen, liquid crystal display screen, thin film transistor display, and the like. The graphical user interface may generate a wide range of colors, or a black and white screen may be used.

[0090] In certain instances, the graphical user interface may be touch sensitive, and it may use any technology known to skilled artisans including, but not limited to, resistive, surface acoustic wave, capacitive, infrared, strain gauge, optical imaging, dispersive signal technology, acoustic pulse recognition, frustrated total internal reflection, and diffused laser imaging.

[0091] The following examples are presented to more fully illustrate preferred embodiments of the invention. They should in no way be construed, however, as limiting the broad scope of the invention.

EXAMPLES

Evaluation of Genomic Index Performance in 11,883 Adult Sibling Pairs

[0092] Preimplantation genetic testing for polygenic disease risk (PGT-P) represents a new tool to aid in embryo selection. Previous studies demonstrated the ability to obtain necessary genotypes in the embryo with accuracy equivalent to adults. When applied to select adult siblings with known type 1 diabetes status, a reduction in disease incidence of 45-72% compared to random selection was achieved. This Example extends analysis to 11,883 sibling pairs to evaluate clinical utility of embryo selection with PGT-P. Results demonstrate relative risk reduction for all diseases tested, including diabetes, cancer, and heart disease, and indicate applicability beyond patients with a known family history of disease.

[0093] Independent of fertility, polygenic conditions present in families at a much higher rate compared to monogenic disease, with most polygenic disorders manifesting in adulthood. As such, while it is common for a couple to report a family history of polygenic disorders, it is rarer for a couple to present for PGT-P based on having a previously affected child. Exceptions to this involve polygenic disorders that present with early age of onset, such as type 1 diabetes. In this case, intended parents seeking IVF treatment may already be the parents of an affected child. As reported here, this very case presented for PGT-P. Still, the spectrum of patients who may consider PGT-P could vary from being affected, or having an affected child, to having an unknown family history of any of the polygenic diseases being tested. To address whether PGT-P may apply to intended parents with unknown family history of polygenic disease, several thousand sibling pairs represented in the United Kingdom (UK) Biobank repository were evaluated using a blinded genomic disease index methodology. Whether preimplantation embryo genomic index values correlate with the extent of the embryos’ family history was also tested.

Materials and Methods

2.1. PGT-P Case with First Degree Affected Family History

[0094] A couple with a family history of type 1 diabetes (T1D) presented to Genomic Prediction Clinical Laboratory and was counseled for and consented to PGT-P as previously described. The couple reported that their 5-year-old son was diagnosed with T1D at 3 years of age, and that two additional relatives, a paternal first cousin and a maternal second cousin, were diagnosed with T1D in their 20s (Figure 1). The patient reported two maternal relatives who were diagnosed with breast cancer. The couple otherwise denied a personal or family history of polygenic conditions that are included on the current PGT-P panel. The couple also reported three first trimester miscarriages with a normal karyotype. The couple denied a history of parental chromosome rearrangements, and previous pregnancies or family history of aneuploidy. The couple denied a family history of additional genetic conditions that they wished to test for via PGT studies.

2.2. PGT-P Case Series including Unknown Family History

[0095] To begin to evaluate the frequency of high-risk embryos across different degrees of family history, 24 consecutive PGT cases were analyzed and compared. PGT was performed using trophectoderm biopsy derived DNA, followed by whole genome amplification and Axiom UKBB SNP array-based analyses as previously described. For each case, parental DNA was analyzed, and the ethnicity was predicted (Caucasian, Asian, African, other) with a pipeline built on a previously established supervised admixture methodology and trained with 551 known-ancestry samples. The internal validation was performed on 229 samples from Coriell Cell Repository resulting in an accuracy of 99.6% (228/229).

[0096] This consecutive PGT case series included couples who consented to research during genetic counseling for routine clinical use of PGT. Indications ranged from unknown family history to having 1st degree relatives (i.e. the embryo’s sibling or parent) affected with a polygenic disease. In all cases, PGT-A was performed in parallel and from the same biopsy, as previously described. Risk of type 1 and 2 diabetes, breast, prostate, and testicular cancer, malignant melanoma, basal cell carcinoma, heart attack, coronary artery disease, hypercholesterolemia, and hypertension was tested in embryos with Caucasian ancestry, and risk of type 2 diabetes, hypercholesterolemia, and hypertension in embryos with Asian ancestry. High risk of polygenic disease was defined as previously described.

2.3. PGT-P in 11,883 Adult Sibling Pairs

[0097] A recent study reported that SNPs which are predictive of specific diseases do not overlap with one another. This suggests that genetic selection to avoid one disease may not result in increasing another (pleiotropy). Instead, there may exist a positive effect of combining predictors into one “index” score. A genomic index algorithm was developed by combining Pi (the absolute probability of getting the disease computed from SNP genotypes), with QALY weights determined by Qi (the effect on life expectancy from each disease measured as lifespan impact years), and PAi (the population average probability of getting the disease):

$$G_i = \sum_i Q_i \, (\mathrm{PA}_i - P_i) \qquad (1)$$

[0098] Here, i extends over all of the disease predictors, including type 1 and 2 diabetes, breast, prostate, and testicular cancer, malignant melanoma, basal cell carcinoma, heart attack, coronary artery disease, high cholesterol, and hypertension. The genomic index Gi is the sum of each of these contributions. Life expectancy effects Qi are sourced from the medical literature.

[0099] Predictors were constructed from data obtained from the UK Biobank by first selecting the top 50,000 SNPs (by p-value) obtained from GWAS generated using the PLINK software and then using the LASSO-path algorithm from the Python scikit-learn package. The UK Biobank identified all pairwise relationships stronger than 2nd cousins using the KING kinship software. These results were used to identify all individuals who were within a sibling pair. This set of sibling pairs was further restricted to all individuals who self-reported their ethnic background to be “White, British, Irish or Any Other White Background” and was set aside as a final testing set. The remaining non-sibling paired self-reported white individuals were used as a training cohort. A small set of 500 cases/controls was withheld from the training cohort to tune the LASSO hyperparameter and select the final model; the value chosen was such that the AUC between cases and controls was maximized.
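
The hyperparameter selection step can be sketched in plain Python. This is a sketch under assumptions, not the authors' pipeline: it presumes each candidate LASSO penalty has already produced a risk score for every withheld individual, and it computes AUC directly as the probability that a case outscores a control (ties count one half). The function names, penalty values, scores, and labels are hypothetical.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise case/control comparison."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def select_penalty(scores_by_penalty: Dict[float, Sequence[float]],
                   labels: Sequence[int]) -> float:
    """Return the candidate penalty whose held-out scores maximize AUC."""
    return max(scores_by_penalty,
               key=lambda penalty: auc(scores_by_penalty[penalty], labels))


# Hypothetical held-out set: two cases (1) and two controls (0).
labels = [1, 1, 0, 0]
scores_by_penalty = {
    0.01: [0.9, 0.4, 0.5, 0.1],   # one control outranks a case
    0.10: [0.8, 0.7, 0.3, 0.2],   # cases and controls fully separated
    1.00: [0.5, 0.5, 0.5, 0.5],   # over-penalized: no discrimination
}
best = select_penalty(scores_by_penalty, labels)
```

The pairwise form is quadratic in the number of individuals, which is acceptable for a withheld set of a few hundred; a rank-based computation would be preferable for larger sets.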

[00100] To validate the application of the genomic index to real sibling data, and to address the potential impact of pleiotropy upon PGT-P, a genomic index score was generated for same-sex sibling pairs from the genome-wide genotyping data of the UK Biobank. Within each pair, the sibling with the worse index score was assigned to the “higher risk sibling” cohort and the sibling with the better index score to the “lower risk sibling” cohort. The prevalence of disease was then calculated in each cohort. The prevalence of disease in the selected lower risk sibling cohort was compared to that in a randomly selected cohort using binomial testing. Sex-specific relative risk reductions for diseases which affect both sexes were averaged.
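One way the sibling comparison could be carried out is sketched below. Several choices here are assumptions rather than details given in the text: a larger index is treated as the better score (the sign convention PAi - Pi suggests this), the prevalence under random selection is taken to be the prevalence across all siblings, and the test is a one-sided exact binomial test. The function names and the toy data are hypothetical.

```python
from math import exp, lgamma, log


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space so that
    cohorts of many thousands of pairs do not overflow."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for j in range(min(k, n) + 1):
        log_pmf = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log(p) + (n - j) * log(1.0 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def evaluate_selection(pairs):
    """pairs: list of ((index_a, affected_a), (index_b, affected_b)),
    where affected is 0 or 1. Selects the higher-index sibling per pair."""
    n = len(pairs)
    selected_cases = 0
    total_cases = 0
    for a, b in pairs:
        chosen = a if a[0] >= b[0] else b  # assumption: higher index is better
        selected_cases += chosen[1]
        total_cases += a[1] + b[1]
    random_prev = total_cases / (2 * n)    # expected under random selection
    selected_prev = selected_cases / n
    rrr = 1.0 - selected_prev / random_prev if random_prev > 0 else 0.0
    return {
        "selected_cases": selected_cases,
        "random_prevalence": random_prev,
        "selected_prevalence": selected_prev,
        "relative_risk_reduction": rrr,
        "p_value": binomial_lower_tail(selected_cases, n, random_prev),
    }


# Toy data (illustrative only): four sibling pairs.
pairs = [
    ((0.3, 0), (-0.2, 1)),
    ((0.1, 0), (0.0, 0)),
    ((-0.1, 1), (0.4, 0)),
    ((0.2, 0), (-0.3, 1)),
]
result = evaluate_selection(pairs)
print(result)
```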

[00101] Finally, genomic indexing was tested on blastocysts from the consecutive case series cohort described in section 2.2, and evaluated with respect to the extent of family history of polygenic disease. Family history of polygenic disease was divided into three main categories with respect to the tested embryos: 1) having one or more first-degree affected relative(s), for example an affected parent or an existing affected sibling; 2) having one or more second-degree or more distant affected relative(s), such as a grandparent or cousin; and 3) unknown or not reported by the patient. A two-tailed pairwise t-test was computed to compare the average genomic index of embryos among the three family history categories.

Results

3.1. PGT-P Case with First Degree Affected Family History

[00102] Four euploid embryos were evaluated for polygenic disease risk and resulted in identifying 2 at high risk for T1D (Figure 2).

3.2. PGT-P Case Series including Unknown Family History

[00103] Based upon these results, involving a case where the embryos had a first-degree relative affected by a polygenic disease, and prior results where embryos had a more distant relative (second-degree relative or higher) affected by a polygenic disease, the potential for correlation between the frequency of high risk embryos produced and the extent of an embryo’s family history of polygenic disease in a larger cohort of cases was investigated. A consecutive series of 24 PGT cases with 181 embryos was evaluated by PGT-P analysis. The mean maternal age was 34.5. Thirty-seven percent of the embryos were aneuploid (67/181). Ten couples were predicted as Asian and 14 as Caucasian. There were no high-risk embryos identified from the Asian euploid embryo cohort (0/28 with no known history, and 0/3 with a more distant affected relative). Among Caucasian cases, 3 out of 51 euploid embryos (6%) were identified as high risk from couples with no known or reported family history, 3 out of 28 (11%) in cases with a more distant relative, and 4 out of 4 (100%) in a case with a first-degree affected relative.
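The counts reported in this paragraph can be tabulated with a short script, shown here as a sketch that simply recomputes the stated fractions. The dictionary layout and variable names are illustrative choices; the counts themselves are those given in the text.

```python
# (high-risk embryos, euploid embryos) per group, as reported in the text.
counts = {
    ("Asian", "no known history"): (0, 28),
    ("Asian", "more distant relative"): (0, 3),
    ("Caucasian", "no known history"): (3, 51),
    ("Caucasian", "more distant relative"): (3, 28),
    ("Caucasian", "first-degree relative"): (4, 4),
}

total_embryos = 181
aneuploid = 67

euploid_total = sum(n for _, n in counts.values())
high_risk_total = sum(k for k, _ in counts.values())
fractions = {group: k / n for group, (k, n) in counts.items()}

for (ancestry, history), frac in fractions.items():
    print(f"{ancestry:10s} {history:24s} {frac:6.1%}")
print(f"euploid embryos: {euploid_total} of {total_embryos}")
print(f"high-risk euploid embryos: {high_risk_total} of {euploid_total}")
```

The per-group euploid counts sum to the number of embryos that were not aneuploid, which is a quick consistency check on the reported figures.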

3.3. Genomic Index Selection in 11,883 Adult Sibling Pairs

[00104] A cohort size of 11,883 sibling pairs was available for analysis from the UK Biobank. The intent of evaluating genomic indexing in this cohort was to model application of PGT-P in families with no known history of disease. However, the prevalence of disease in this cohort was often lower than what has been reported for the general population, which would bias results of PGT-P in the direction of finding no reduction in risk. For instance, the prevalence of breast cancer in this UK Biobank cohort was 8.0%, while it has been reported as a 12.3% lifetime risk in the general population. Likewise, 7.4% of individuals in the UK Biobank adult sibling cohort were affected with type 2 diabetes, whereas a prevalence of 9.8% has been estimated in the United States. Nonetheless, these sibling pairs were used to compare the relative risk of disease with either random selection or blinded genetic selection of one of the two siblings. Results indicate a relative risk reduction for all diseases tested (Figure 3) (Table 1).

[00105] Genomic indexing was also performed on embryos evaluated in the PGT-P case series described in section 3.2. Each embryo was classified based on the aforementioned categories of family history. Results indicate that embryos with a first-degree affected relative have a higher genomic risk index compared to embryos with a more distant affected relative (p=0.0132, t=3.09, df=9, α=0.05) or with unknown family history (p=0.0015, t=4.34, df=10, α=0.05). Likewise, even embryos with at least one distant affected relative presented a higher average genomic index compared to those with unknown family history (p=0.0129, t=2.55, df=66, α=0.05) (Figure 4).

Table 1. Binomial test p-values for relative disease risk reduction between random selection and genomic index selection of 11,883 sibling pairs.

| Disease | Male | Female |
| Basal Cell Carcinoma | 0.0224 | 0.2655 |
| Breast Cancer | - | 0.0001 |
| Malignant Melanoma | 0.3518 | 0.4661 |
| Prostate Cancer | 0.0224 | - |
| Testicular Cancer | 0.5 | - |
| Coronary Artery Disease | 9.53E-16 | 3.09E-07 |
| Heart Attack | 7.31E-22 | 1.24E-06 |
| Hypercholesterolemia | 4.73E-10 | 1.21E-11 |
| Hypertension | 3.03E-25 | 3.08E-33 |
| Type 1 Diabetes | 0.0019 | 0.0083 |
| Type 2 Diabetes | 1.64E-17 | 2.09E-21 |

Discussion

[00107] This study extends the validity of PGT-P to reduce disease risk beyond families with known history of disease. While many patients may elect to utilize PGT-P specifically because of a personal or family history of disease, the data presented here demonstrate utility in a more general application to routine embryo selection. One unique feature of this method is that PGT-A results are obtained in parallel with PGT-P, allowing patients to elect for additional information after knowing how many euploid embryos are suitable for transfer. In other words, instead of choosing which embryo to transfer based on morphology, choosing based upon PGT-P provides an option for patients to reduce the risk of polygenic disease, even when only 2 euploid embryos are available to choose from, and when the intended parents have no known family history of polygenic disease.

[00108] In the case series reported here, 114 of 181 embryos tested were chromosomally normal (63%). Among the euploid embryos, only ten of 114 (9%) were identified as having a high risk of a polygenic disease. With this information, patients would still be faced with deciding which euploid normal risk embryo to choose for embryo transfer. Additional empirical analyses with the use of genomic indexing demonstrated relative risk reduction in all diseases tested, thereby providing additional criteria for patients to choose which embryo to transfer. Again, risk reduction was demonstrated with only two siblings to select from. Based upon a previous study, the availability of more than two siblings is expected to further improve the relative risk reductions observed here.

[00109] Another important consideration relates to the potential for pleiotropy, the effect of a single gene on multiple phenotypic traits. With respect to PGT-P, selecting against high risk of one disease could in principle enrich for risk of another. In the present study, however, such negative pleiotropy was not observed: selection with PGT-P resulted in a reduction in risk for all diseases in parallel. In support of this observation, a recent study reported that SNP sets used to predict risk of different diseases were largely disjoint.

[00110] While the present study demonstrates the utility of PGT-P based sibling selection to reduce the relative risk of disease, several improvements may still be possible. The metric of disease impact currently used in the genomic index is limited to the reported years of life lost. Several studies on the burden of disease have incorporated more comprehensive metrics, including reduced quality of life. The genomic index could be further validated and optimized by testing it against the life span and quality of life outcome data from the UK Biobank (UKBB). In addition, patients may have unique interests in reducing risks of certain diseases over others. More careful curation of these metrics will likely improve the utility of PGT-P.

[00111] While the clinical utility of tracing monogenic disorders through detailed pedigree analysis is well established, family history alone has been shown to be less effective as a single predictor of polygenic disease. The results presented here may also have implications similar to when expanded carrier screening was introduced to contemporary genetic testing strategies. Just as ethnicity and family history cannot be completely relied on to identify couples at risk for recessive disease, family history and ethnicity cannot be relied on alone to predict polygenic risk. That is, there is clear benefit to PGT-P in situations where “no known family history” exists, given that this status may only indicate that there was no reported history or no confirmed history, and that most families have a relative with at least one of the polygenic diseases tested by PGT-P. This also may further benefit couples who have no known history because they know very little about their family tree, were adopted, or are using gamete donation.

[00112] In conclusion, PGT-P provides an additional method for embryo selection beyond conventional aneuploidy screening and morphological assessment, and is applicable to intended parents whose embryos have family histories ranging from an affected first-degree relative to no known history. At each level of embryonic family history evaluated, and in consideration of reducing the risk of polygenic disease through selection, this study demonstrates a measurable reduction in disease risk. Future work will involve incorporating additional quality of life metrics and DNA repository datasets, additional disease predictors, analysis of correlation with embryonic morphological characteristics, and relative risk reduction with more than 2 siblings to select from. The ability of genomic indexing to reduce risk of multiple diseases in parallel may allow indirect reduction in risk of diseases where direct genomic predictors are not yet available.

[00113] All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference in their entirety.

[00114] Having described preferred embodiments of the invention with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments, and that various changes and modifications may be effected therein by those skilled in the art without departing from the scope or spirit of the invention as defined in the appended claims.