Title:
GENETIC MARKERS PREDICTING AGE AT MENOPAUSE
Document Type and Number:
WIPO Patent Application WO/2009/142483
Kind Code:
A3
Abstract:
The present invention relates to a method of predicting the age at natural menopause in a female individual comprising detecting at least one SNP in a gene selected from the group consisting of CLCA4, KALRN, TOPBP1, the gene for hypothetical protein FLJ16641, VCAN, MCTP1, BAI3, AGR2, LSM5, SEMA3E, CSMD1, CPA6, LINGO2, the gene for hypothetical protein C9orf85, EHMT1, ADARB2, DNA2L, MGMT, SPATA19, the gene for hypothetical protein LOC146167, MSI2, BRSK1, SUV4-20H2, HSPB1, and SCARF2.

Inventors:
UITTERLINDEN ANDREAS GERARDUS (NL)
STOLK LISETTE (NL)
SPECTOR TIMOTHY DAVID (GB)
ZHAI GUANGJU (GB)
Application Number:
PCT/NL2009/050265
Publication Date:
January 28, 2010
Filing Date:
May 18, 2009
Assignee:
UNIV ERASMUS MEDICAL CT (NL)
KING S COLLEGE LONDON (GB)
UITTERLINDEN ANDREAS GERARDUS (NL)
STOLK LISETTE (NL)
SPECTOR TIMOTHY DAVID (GB)
ZHAI GUANGJU (GB)
International Classes:
C12Q1/68
Foreign References:
US20050064453A12005-03-24
Other References:
DATABASE DBSNP [online] 28 August 2007 (2007-08-28), XP002500979, retrieved from NCBI Database accession no. ss76507267
DATABASE DBSNP [online] 28 August 2007 (2007-08-28), ABASE ENTRY SS76507267, XP002500980, retrieved from NCBI Database accession no. ss76507267
TEMPFER CLEMENS B ET AL: "Polymorphisms associated with thrombophilia and vascular homeostasis and the timing of menarche and menopause in 728 white women.", May 2005, MENOPAUSE (NEW YORK, N.Y.) 2005 MAY-JUN, VOL. 12, NR. 3, PAGE(S) 325 - 330, ISSN: 1072-3714, XP008097884
ZHANG ET AL: "HDC gene polymorphisms are associated with age at natural menopause in Caucasian women", BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, ACADEMIC PRESS INC. ORLANDO, FL, US, vol. 348, no. 4, 6 October 2006 (2006-10-06), pages 1378 - 1382, XP005614418, ISSN: 0006-291X
BROEKMANS ET AL: "Female reproductive ageing: current knowledge and future trends", TRENDS IN ENDOCRINOLOGY AND METABOLISM, ELSEVIER SCIENCE PUBLISHING, NEW YORK, NY, US, vol. 18, no. 2, 21 February 2007 (2007-02-21), pages 58 - 65, XP005897498, ISSN: 1043-2760
HEFLER L A ET AL: "Estrogen-metabolizing gene polymorphisms and age at natural menopause in Caucasian women", HUMAN REPRODUCTION (OXFORD), vol. 20, no. 5, May 2005 (2005-05-01), pages 1422 - 1427, XP002500981, ISSN: 0268-1161
HE LI-NA ET AL: "Association study of the oestrogen signalling pathway genes in relation to age at natural menopause", JOURNAL OF GENETICS, vol. 86, no. 3, December 2007 (2007-12-01), pages 269 - 276, XP002500982, ISSN: 0022-1333
Attorney, Agent or Firm:
HATZMANN, M.J. (Johan de Wittlaan 7, JR Den Haag, NL)
Claims:

Claims

1. A method of predicting the age at natural menopause in a female individual comprising detecting at least one SNP in a gene selected from the group consisting of CLCA4, KALRN, TOPBP1, the gene for hypothetical protein FLJ16641, VCAN, MCTP1, BAI3, AGR2, LSM5, SEMA3E, CSMD1, CPA6, LINGO2, the gene for hypothetical protein C9orf85, EHMT1, ADARB2, DNA2L, MGMT, SPATA19, the gene for hypothetical protein LOC146167, MSI2, BRSK1, SUV4-20H2, HSPB1, and SCARF2.

2. Method according to claim 1, wherein said SNP is selected from the group consisting of:

- SNP rs1321678 in gene CLCA4 on chromosome 1,

- SNP rs1373606 in gene KALRN on chromosome 3,

- SNP rs11711706 in gene TOPBP1 on chromosome 3,

- SNP rs11706413 in gene TOPBP1 on chromosome 3,

- SNP rs10512907 in gene TOPBP1 on chromosome 3,

- SNP rs10936057 in the gene for hypothetical protein FLJ16641 on chromosome 3,

- SNP rs529998 in gene VCAN on chromosome 5,

- SNP rs1426100 in gene MCTP1 on chromosome 5,

- SNP rs10485252 in gene BAI3 on chromosome 6,

- SNP rs3799081 in gene BAI3 on chromosome 6,

- SNP rs706076 in gene AGR2 on chromosome 7,

- SNP rs1404966 in gene AGR2 on chromosome 7,

- SNP rs17458671 in gene LSM5 on chromosome 7,

- SNP rs7783211 in gene SEMA3E on chromosome 7,

- SNP rs11786333 in gene CSMD1 on chromosome 8,

- SNP rs594006 in gene CPA6 on chromosome 8,

- SNP rs10968337 in gene LINGO2 on chromosome 9,

- SNP rs11143040 in the gene for hypothetical protein C9orf85 on chromosome 9,

- SNP rs2151145 on chromosome 9,

- SNP rs10780435 in an unknown gene on chromosome 9,

- SNP rs3125785 in gene EHMT1 on chromosome 9,

- SNP rs6560768 in gene ADARB2 on chromosome 10,

- SNP rs7072505 in gene ADARB2 on chromosome 10,

- SNP rs17157052 in gene ADARB2 on chromosome 10,

- SNP rs10823195 in gene DNA2L on chromosome 10,

- SNP rs2622436 in gene MGMT on chromosome 10,

- SNP rs4397868 in gene SPATA19 on chromosome 11,

- SNP rs7333181 on chromosome 13,

- SNP rs12431748 in an unknown gene on chromosome 14,

- SNP rs2875853 in the gene for hypothetical protein LOC146167 on chromosome 16,

- SNP rs2332902 in gene MSI2 on chromosome 17,

- SNP rs1551562 in gene BRSK1 on chromosome 19,

- SNP rs1172822 in gene BRSK1 on chromosome 19,

- SNP rs7246479 in gene BRSK1, SUV4-20H2, or HSPB1 on chromosome 19,

- SNP rs2384687 in gene BRSK1, SUV4-20H2, or HSPB1 on chromosome 19,

- SNP rs897798 in gene BRSK1, SUV4-20H2, or HSPB1 on chromosome 19,

- SNP rs236114 on chromosome 20, and

- SNP rs8137004 in gene SCARF2 on chromosome 22.

3. Method according to claim 1 or 2, wherein said gene is TOPBP1, BAI3, AGR2, ADARB2, BRSK1, SUV4-20H2, or HSPB1.

4. Method according to any one of claims 1-3, wherein said gene is BRSK1.

5. Method according to claim 4, wherein said SNP is SNP rs2384687 in gene BRSK1, SUV4-20H2, or HSPB1 on chromosome 19 and/or SNP rs1172822 in the BRSK1, SUV4-20H2, or HSPB1 gene on chromosome 19.

6. An isolated nucleic acid molecule comprising a polymorphic nucleotide position and selectively hybridizing under high stringency conditions to a nucleotide sequence encoding a gene selected from the group consisting of CLCA4, KALRN, TOPBP1, the gene for hypothetical protein FLJ16641, VCAN, MCTP1, BAI3, AGR2, LSM5, SEMA3E, CSMD1, CPA6, LINGO2, the gene for hypothetical protein C9orf85, EHMT1, ADARB2, DNA2L, MGMT, SPATA19, the gene for hypothetical protein LOC146167, MSI2, BRSK1, SUV4-20H2, HSPB1, and SCARF2, or to the complement thereof, wherein the polymorphic position is an SNP selected from the group consisting of:

- SNP rs1321678 in gene CLCA4,

- SNP rs1373606 in gene KALRN,

- SNPs rs11711706, rs11706413 and rs10512907 in gene TOPBP1,

- SNP rs10936057 in the gene for hypothetical protein FLJ16641,

- SNP rs529998 in gene VCAN,

- SNP rs1426100 in gene MCTP1,

- SNPs rs10485252 and rs3799081 in gene BAI3,

- SNPs rs706076 and rs1404966 in gene AGR2,

- SNP rs17458671 in gene LSM5,

- SNP rs7783211 in gene SEMA3E,

- SNP rs11786333 in gene CSMD1,

- SNP rs594006 in gene CPA6,

- SNP rs10968337 in gene LINGO2,

- SNP rs11143040 in the gene for hypothetical protein C9orf85,

- SNPs rs10780435 and rs2151145 in an unknown gene on chromosome 9,

- SNP rs3125785 in gene EHMT1,

- SNPs rs6560768, rs7072505 and rs17157052 in gene ADARB2,

- SNP rs10823195 in gene DNA2L,

- SNP rs2622436 in gene MGMT,

- SNP rs4397868 in gene SPATA19,

- SNP rs7333181 in an unknown gene on chromosome 13,

- SNP rs12431748 in an unknown gene on chromosome 14,

- SNP rs2875853 in the gene for hypothetical protein LOC146167,

- SNP rs2332902 in gene MSI2,

- SNPs rs1551562, rs1172822, rs7246479, rs2384687 and rs897798 in gene BRSK1, SUV4-20H2, or HSPB1,

- SNP rs236114 on chromosome 20, and

- SNP rs8137004 in gene SCARF2.

7. An oligonucleotide that specifically hybridizes to the isolated nucleic acid molecule according to claim 6, and wherein the oligonucleotide hybridizes to a portion of the isolated nucleic acid molecule comprising the polymorphic nucleotide position.

8. An oligonucleotide that specifically hybridizes under high stringency conditions to the isolated nucleic acid molecule according to claim 6, wherein the oligonucleotide hybridizes to the polymorphic position and wherein the oligonucleotide is between about 18 nucleotides and about 50 nucleotides in length.

9. An oligonucleotide according to claim 7, wherein a central nucleotide of the oligonucleotide specifically hybridizes with the polymorphic position of the portion of the nucleic acid molecule.

10. A method of genetic screening comprising detecting in a nucleic acid sample the presence of a polymorphic gene wherein at least one oligonucleotide as defined in any one of claims 7-9 is allowed to hybridize under stringent conditions to the nucleic acid in said sample.

11. The method of claim 10, further comprising the step of amplifying a region of the gene or a portion thereof that contains the polymorphism.

12. The method according to claim 10 or 11, wherein the polymorphism is identified by a method selected from the group consisting of: restriction fragment length polymorphism (RFLP) analysis, minisequencing, MALDI-TOF, short interspersed repeat element (SINE), heteroduplex analysis, single strand conformational polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE), Q-PCR, RT-PCR, restriction enzyme analysis and DNA array hybridization.

13. A kit for identifying an SNP in a polymorphic gene, comprising an oligonucleotide according to any one of claims 7-9 and packaging and instructions for characterizing the genotype of a female individual with reference to a polymorphic nucleotide position in a gene selected from the group consisting of CLCA4, KALRN, TOPBP1, the gene for hypothetical protein FLJ16641, VCAN, MCTP1, BAI3, AGR2, LSM5, SEMA3E, CSMD1, CPA6, LINGO2, the gene for hypothetical protein C9orf85, EHMT1, ADARB2, DNA2L, MGMT, SPATA19, the gene for hypothetical protein LOC146167, MSI2, BRSK1, SUV4-20H2, HSPB1, and SCARF2.

Description:

Title: Genetic Markers Predicting Age at Menopause

FIELD OF THE INVENTION

The present invention is in the field of medical diagnostics. In particular, the present invention relates to the diagnosis of menopause onset, more in particular to the use of genetic markers predicting age at menopause.

BACKGROUND OF THE INVENTION

The menopause is the time of life when menstrual cycles cease and it marks the end of a woman's reproductive lifespan. The age-related decrease in ovarian follicle numbers and a decay in the oocyte quality lead to the occurrence of natural loss of fecundity and, ultimately, menopause. The rate of this ovarian ageing process is highly variable among women, and age at menopause can range between 40 and 60 years. This variability is to a large extent determined by genetic factors, with twin studies indicating heritabilities of up to 50-60% (Broekmans et al. 2007, Nelson 2008, DeLellis Henderson 2008). Early menopause is associated with increased risk for many age-related diseases including cardiovascular disease, osteoporosis and osteoarthritis, while it might protect against certain cancers such as breast cancer. The identification of women who have a decreased ovarian reserve for their age and might enter relatively early into menopause is, therefore, clinically relevant. Age at menopause is also a relevant feature for reasons of family planning.

Over the past few decades, postponement of childbearing has led to a decrease in family size and increased rates of age-related female sub-fertility.

Endocrine and imaging tests for ovarian reserve measure mainly quantitative aspects of ovarian reserve, but their capacity to predict age at menopause is limited and prediction of the chances for pregnancy is also poor. Genetic factors regulating the size of the follicle pool and the rate of its depletion might be identified in the near future and, possibly, assist the accurate prediction of a woman's reproductive lifespan, including predicting age at menopause.

The identification of such genetic factors has been problematic given the complex nature of the phenotype of age-at-menopause, i.e., it is influenced by many genetic factors probably of modest effect size and the variability is most likely also influenced by environmental factors in interaction with the genetic factors.

Hence, the availability of genetic markers that are predictive for age at natural menopause is highly desirable. At present, no such markers exist.

SUMMARY OF THE INVENTION

By using Genome Wide Association Studies, the present inventors have discovered single nucleotide polymorphisms (SNPs) in the genome of women that strongly correlate with age at natural menopause. These SNPs are located in or near genes which have not previously been associated with menopause onset.

In a first aspect, the present invention provides a method of predicting the age at natural menopause in a female individual comprising detecting at least one SNP in a gene selected from the group consisting of CLCA4, KALRN, TOPBP1, the gene for hypothetical protein FLJ16641, VCAN, MCTP1, BAI3, AGR2, LSM5, SEMA3E, CSMD1, CPA6, LINGO2, the gene for hypothetical protein C9orf85, EHMT1, ADARB2, DNA2L, MGMT, SPATA19, the gene for hypothetical protein LOC146167, MSI2, BRSK1, SUV4-20H2, HSPB1, and SCARF2.

In a preferred embodiment of a method of the invention, said SNP is selected from the group consisting of:

- SNP rs1321678 in gene CLCA4 on chromosome 1,

- SNP rs1373606 in gene KALRN on chromosome 3,

- SNP rs11711706 in gene TOPBP1 on chromosome 3,

- SNP rs11706413 in gene TOPBP1 on chromosome 3,

- SNP rs10512907 in gene TOPBP1 on chromosome 3,

- SNP rs10936057 in the gene for hypothetical protein FLJ16641 on chromosome 3,

- SNP rs529998 in gene VCAN on chromosome 5,

- SNP rs1426100 in gene MCTP1 on chromosome 5,

- SNP rs10485252 in gene BAI3 on chromosome 6,

- SNP rs3799081 in gene BAI3 on chromosome 6,

- SNP rs706076 in gene AGR2 on chromosome 7,

- SNP rs1404966 in gene AGR2 on chromosome 7,

- SNP rs17458671 in gene LSM5 on chromosome 7,

- SNP rs7783211 in gene SEMA3E on chromosome 7,

- SNP rs11786333 in gene CSMD1 on chromosome 8,

- SNP rs594006 in gene CPA6 on chromosome 8,

- SNP rs10968337 in gene LINGO2 on chromosome 9,

- SNP rs11143040 in the gene for hypothetical protein C9orf85 on chromosome 9,

- SNP rs10780435 in an unknown gene on chromosome 9,

- SNP rs2151145 on chromosome 9,

- SNP rs3125785 in gene EHMT1 on chromosome 9,

- SNP rs6560768 in gene ADARB2 on chromosome 10,

- SNP rs7072505 in gene ADARB2 on chromosome 10,

- SNP rs17157052 in gene ADARB2 on chromosome 10,

- SNP rs10823195 in gene DNA2L on chromosome 10,

- SNP rs2622436 in gene MGMT on chromosome 10,

- SNP rs4397868 in gene SPATA19 on chromosome 11,

- SNP rs12431748 in an unknown gene on chromosome 14,

- SNP rs2875853 in the gene for hypothetical protein LOC146167 on chromosome 16,

- SNP rs2332902 in gene MSI2 on chromosome 17,

- SNP rs1551562 in gene BRSK1 on chromosome 19,

- SNP rs1172822 in gene BRSK1 on chromosome 19,

- SNP rs7246479 in gene BRSK1, SUV4-20H2, or HSPB1 on chromosome 19,

- SNP rs2384687 in gene BRSK1, SUV4-20H2, or HSPB1 on chromosome 19,

- SNP rs897798 in gene BRSK1, SUV4-20H2, or HSPB1 on chromosome 19,

- SNP rs236114 on chromosome 20, and

- SNP rs8137004 in gene SCARF2 on chromosome 22.

In a preferred embodiment of a method of the present invention, said SNP is a SNP from the list of SNPs that are highly correlated with the said SNP through linkage disequilibrium (LD). Highly correlated is defined as having an r² of at least 0.8 to the said SNP.

In a preferred embodiment of a method of the present invention, said gene is TOPBP1, BAI3, AGR2, ADARB2, BRSK1, SUV4-20H2, or HSPB1; most preferably said gene is BRSK1.

Preferably, the methods and aspects of the invention relate to female individuals of European ancestry.

In another preferred embodiment of a method of the present invention, said SNP is SNP rs2384687 in gene BRSK1, SUV4-20H2, or HSPB1 on chromosome 19 and/or SNP rs1172822 in the BRSK1, SUV4-20H2, or HSPB1 gene on chromosome 19.

In another aspect, the present invention provides an isolated nucleic acid molecule comprising a polymorphic nucleotide position and selectively hybridizing under high stringency conditions to a nucleotide sequence encoding a gene selected from the group consisting of CLCA4, KALRN, TOPBP1, the gene for hypothetical protein FLJ16641, VCAN, MCTP1, BAI3, AGR2, LSM5, SEMA3E, CSMD1, CPA6, LINGO2, the gene for hypothetical protein C9orf85, EHMT1, ADARB2, DNA2L, MGMT, SPATA19, the gene for hypothetical protein LOC146167, MSI2, BRSK1, SUV4-20H2, HSPB1, and SCARF2, or to the complement thereof, wherein the polymorphic position is a SNP selected from the group consisting of:

- SNP rs1321678 in gene CLCA4,

- SNP rs1373606 in gene KALRN,

- SNPs rs11711706, rs11706413 and rs10512907 in gene TOPBP1,

- SNP rs10936057 in the gene for hypothetical protein FLJ16641,

- SNP rs529998 in gene VCAN,

- SNP rs1426100 in gene MCTP1,

- SNPs rs10485252 and rs3799081 in gene BAI3,

- SNPs rs706076 and rs1404966 in gene AGR2,

- SNP rs17458671 in gene LSM5,

- SNP rs7783211 in gene SEMA3E,

- SNP rs11786333 in gene CSMD1,

- SNP rs594006 in gene CPA6,

- SNP rs10968337 in gene LINGO2,

- SNP rs11143040 in the gene for hypothetical protein C9orf85,

- SNP rs10780435 in an unknown gene on chromosome 9,

- SNP rs3125785 in gene EHMT1,

- SNPs rs6560768, rs7072505 and rs17157052 in gene ADARB2,

- SNP rs10823195 in gene DNA2L,

- SNP rs2622436 in gene MGMT,

- SNP rs4397868 in gene SPATA19,

- SNP rs12431748 in an unknown gene on chromosome 14,

- SNP rs2875853 in the gene for hypothetical protein LOC146167,

- SNP rs2332902 in gene MSI2,

- SNPs rs1551562, rs1172822, rs7246479, rs2384687 and rs897798 in gene BRSK1, SUV4-20H2, or HSPB1, and

- SNP rs8137004 in gene SCARF2.

The above SNPs may be used either alone or in combination in aspects of the present invention. Preferably, in aspects of the invention a combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 SNPs is used.

Preferably, the method of the invention makes use of any of the above SNPs, preferably in combination with one or more of the SNPs rs1172822, rs236114, rs2384687, rs1551562, rs7333181, and/or rs897798. In another preferred embodiment, the at least one SNP is selected from the SNPs rs1172822, rs236114, rs2384687, rs1551562, rs7333181, and/or rs897798.

Again, these 6 SNPs may be used either alone or in combination in aspects of the present invention.

Most preferred SNPs are SNP rs2151145 on chromosome 9, SNP rs7333181 on chromosome 13, SNP rs236114 on chromosome 20, and SNP rs1172822 on chromosome 19. Hence, these 4 SNPs, used either alone or in combination with each other or with any one of the above-mentioned SNPs, constitute even more preferred embodiments in aspects of the invention. Of these, SNP rs236114 on chromosome 20 and SNP rs1172822 on chromosome 19 are highly preferred. In general, mutations in chromosome 19 and/or 20 are preferred. In particular, the genes associated with the loci of the SNPs described for chromosomes 19 and 20 are highly preferred embodiments in aspects of the invention.

In another aspect, the present invention provides an oligonucleotide that specifically hybridizes to the isolated nucleic acid molecule of the present invention, and wherein the oligonucleotide hybridizes to a portion of the isolated nucleic acid molecule comprising the polymorphic nucleotide position. In a preferred embodiment of an oligonucleotide of the present invention, said oligonucleotide specifically hybridizes under high stringency conditions to the isolated nucleic acid molecule of the present invention, wherein the oligonucleotide hybridizes to the polymorphic position and wherein the oligonucleotide is between about 18 nucleotides and about 50 nucleotides in length.

In a preferred embodiment of an oligonucleotide of the present invention, a central nucleotide of the oligonucleotide specifically hybridizes with the polymorphic position of the portion of the nucleic acid molecule.

In another aspect, the present invention provides a method of genetic screening comprising detecting in a nucleic acid sample the presence of a polymorphic gene wherein at least one oligonucleotide as defined above is allowed to hybridize under stringent conditions to the nucleic acid in said sample.

In a preferred embodiment, said method of genetic screening further comprises the step of amplifying a region of the gene or a portion thereof that contains the polymorphism.

In a preferred embodiment of a method of genetic screening of the invention, the polymorphism is identified by a method selected from the group consisting of: restriction fragment length polymorphism (RFLP) analysis, minisequencing, MALDI-TOF, SINE, heteroduplex analysis, single strand conformational polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE), Q-PCR, RT-PCR, restriction enzyme analysis and DNA array hybridization.

In another aspect, the present invention provides a kit for identifying an SNP in a polymorphic gene, comprising an oligonucleotide according to any one of claims 7-9 and packaging and instructions for characterizing the genotype of a female individual with reference to a polymorphic nucleotide position in a gene selected from the group consisting of CLCA4, KALRN, TOPBP1, the gene for hypothetical protein FLJ16641, VCAN, MCTP1, BAI3, AGR2, LSM5, SEMA3E, CSMD1, CPA6, LINGO2, the gene for hypothetical protein C9orf85, EHMT1, ADARB2, DNA2L, MGMT, SPATA19, the gene for hypothetical protein LOC146167, MSI2, BRSK1, SUV4-20H2, HSPB1, and SCARF2.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1: Figure 1a shows the quantile-quantile plot for age at natural menopause in 2,419 women of the Rotterdam study as explained in the Examples. Figure 1b shows the negative log p-value for each single SNP for age at natural menopause in 2,419 women of the Rotterdam study; the red line indicates genome-wide significance (p = 5 × 10⁻⁸).

Figure 2: Figures 2a to 2e show the r²-based LD plots for the 5 loci with multiple SNPs associated with age at natural menopause in the meta-analysis. In the lower panel of Figure 2f the genes located at the 200kb chromosome 19 locus, containing the 5 SNPs associated with age at natural menopause, are shown. The upper panel shows the p-values for all the SNPs in this 200kb region that are on the 550K array in the Rotterdam study. The 5 most significant SNPs in this region are all located in a 20kb region which contains the last 9kb part of the BRSK1 gene and the immediate 11kb 3' region. Figure 2a. Plot showing the r²-based LD for the chromosome 3 locus. 2b. Plot showing the r²-based LD for the chromosome 6 locus. 2c. Plot showing the r²-based LD for the chromosome 7 locus. 2d. Plot showing the r²-based LD for the chromosome 10 locus. 2e. Plot showing the r²-based LD for the chromosome 19 locus. 2f. Scheme showing the negative log p-values for SNPs in the 200kb region on chromosome 19 from the Rotterdam study data (upper panel) and the genes located in this region (lower panel).
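The negative log p-value scale and the genome-wide significance line mentioned for Figures 1b and 2f can be illustrated with a short sketch. The function names are hypothetical and a base-10 logarithm is assumed, as is customary for such plots; only the threshold value p = 5 × 10⁻⁸ comes from the figure description.

```python
import math

# Genome-wide significance threshold as stated for Figure 1b.
GENOME_WIDE_ALPHA = 5e-8


def neg_log10_p(p_value: float) -> float:
    """Return -log10(p), the height of a SNP on a negative log p-value plot."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    return -math.log10(p_value)


def is_genome_wide_significant(p_value: float,
                               alpha: float = GENOME_WIDE_ALPHA) -> bool:
    """True when the p-value falls below the genome-wide threshold."""
    return p_value < alpha


# Height at which the significance line would be drawn (about 7.3).
threshold_height = neg_log10_p(GENOME_WIDE_ALPHA)
```

A SNP is plotted above the line exactly when its p-value is below the threshold.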

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The term "natural menopause" indicates that the menopause is not the result of a disorder, syndrome or medical condition. The term reflects the occurrence of a natural process in females when menstrual cycles cease.

Menopause is preceded by the age-related decrease in ovarian follicle numbers and a decay in the oocyte quality. This leads to the occurrence of natural loss of fecundity and, ultimately, menopause. Hence, in addition to predicting the age of natural menopause, the aspects of the invention may aid in the prediction of age at which processes that lead up to menopause may be observed, including decrease in ovarian follicle numbers, decay in the oocyte quality and natural loss of fecundity.

The term "female individual" refers to a female individual of a mammalian species, preferably a human female, i.e. a woman, more in particular a Caucasian.

The term "single nucleotide polymorphism", abbreviated "SNP", as used herein refers to a DNA sequence variation that involves a substitution, insertion or deletion, generally an alteration, of a single nucleotide position. The term "in a gene" includes reference to "near a gene". The present inventors have found SNPs that are predictive of a particular phenotype. The location of these SNPs in the genome is defined and provided in Table 2. They can be located in genes (having a known coding function) or located in genetic regions with no known coding function near a gene. In each case this is referred to herein as "in a gene".

The term "linked thereto" refers to a genetic coupling between a SNP and another SNP or refers to a coupling between a SNP and the age at menopause. This coupling is based on so-called Linkage Disequilibrium (LD) between the SNPs or between the SNP and the age at menopause as observed herein. The degree of LD is expressed as an r² value. Values for r² above 0.8 are indicative of such strong coupling that the SNP fully explains the association observed. The term "linked thereto" therefore refers to two genetic positions (such as SNPs or other detectable polymorphic markers) having a value for r² between them of >0.8. Usually this means that segregation between the markers is rarely observed in or between populations. Usually, this means that they are inherited together and hence, that they are on the same chromosome and the genetic distance between them is small.

The present inventors have now discovered polymorphic positions (in particular SNPs) in the DNA of female individuals that are associated with age at menopause. This means that the polymorphisms themselves are markers for this phenotype, but also markers associated with these polymorphisms can be used as markers for this phenotype. Therefore, the present inventors have enabled the provision of a test wherein not only the specific SNPs as defined in Table 2 (i.e. the primary SNPs) are suitable for use as a marker for predicting age at menopause, but also a test wherein another (secondary) SNP (referred to herein as "tag SNP") or other polymorphic marker, which is known to occur in female individuals and for which no association with any phenotype is necessarily known, but which is genetically linked to the primary SNP, can be used to predict age at menopause. Such polymorphisms can be found in databases.
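As a minimal sketch of how an r² value between two SNPs can be computed, the following assumes phased haplotypes for two biallelic SNPs with alleles coded 0 and 1. The function names and the toy haplotypes are illustrative assumptions and are not taken from the application; only the 0.8 cut-off comes from the definition above.

```python
from typing import Sequence, Tuple

# Cut-off for "linked thereto" as defined in the text (r² > 0.8).
LINKED_R2_THRESHOLD = 0.8


def ld_r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Compute r² between two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele at SNP A, allele at SNP B), coded 0 or 1.
    r² = D² / (pA(1-pA) pB(1-pB)) with D = pAB - pA*pB.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b
    return d * d / denominator


def is_linked(r_squared: float) -> bool:
    """Apply the r² > 0.8 criterion from the definition of 'linked thereto'."""
    return r_squared > LINKED_R2_THRESHOLD


# Toy data (hypothetical): alleles always co-occur, or are independent.
perfect_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]
```

In practice r² is usually estimated from unphased genotypes with dedicated software; the haplotype-based formula is shown only because it makes the definition explicit.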

The term "polymorphism" as used herein refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population. A single nucleotide polymorphism occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. A single nucleotide polymorphism usually arises due to substitution of one nucleotide for another at the polymorphic site.

The term "tag SNPs" as used herein refers to SNPs which are known to reside in the region where the primary SNPs identified in the meta-analysis as described in the Examples below were found. Tag SNPs are determined as follows: For each of the 35 SNPs identified in Table 2, information on other SNPs located in the 500kb surrounding the primary SNP was downloaded from the HapMap website (release 23a) (URL: http://www.hapmap.org/cgi-perl/gbrowse/hap_map_B36/ ) for the CEU population. Using the tagger option in Haploview v4.1 (URL: http://www.broad.mit.edu/mpg/haploview/ ; Barrett et al., 2005) the tagged SNPs per primary SNP were determined. A SNP is tagged to one of the primary SNPs if r² > 0.8. The results of this analysis are shown in Table 3. Column 2 shows the primary SNP, tagSNP, and column 3 the SNPs tagged by the primary SNP (referred to as TOP 35 in the Example). Column 4 shows the correlation. More detailed information can be found i.a. in Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005 Jan 15.
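The tagging criterion described above (a SNP within the surrounding region is tagged when its r² with the primary SNP exceeds 0.8) can be sketched as a simple filter. This is not the Haploview Tagger algorithm itself; the `Candidate` type, the function name, the identifiers and positions are hypothetical, and reading "500kb surrounding the primary SNP" as a window centred on the primary SNP is an assumption.

```python
from typing import Iterable, List, NamedTuple


class Candidate(NamedTuple):
    """A SNP near a primary SNP, with a precomputed r² (hypothetical record)."""
    snp_id: str
    position: int            # base-pair coordinate on the same chromosome
    r2_with_primary: float


def tagged_snps(primary_position: int,
                candidates: Iterable[Candidate],
                window_bp: int = 500_000,      # assumed total window width
                r2_threshold: float = 0.8) -> List[str]:
    """Return IDs of candidates inside the window with r² above the threshold."""
    half_window = window_bp // 2
    return [
        c.snp_id
        for c in candidates
        if abs(c.position - primary_position) <= half_window
        and c.r2_with_primary > r2_threshold
    ]


# Placeholder identifiers and coordinates, for illustration only.
example = [
    Candidate("snp_near_high", 1_010_000, 0.95),
    Candidate("snp_near_low", 1_020_000, 0.40),
    Candidate("snp_far_high", 2_000_000, 0.99),
]
result = tagged_snps(1_000_000, example)
```

Only the first candidate is returned here: the second fails the r² criterion and the third lies outside the window.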

The term "genotype" in the context of this invention refers to the particular allelic form of a gene, which can be defined by the particular nucleotide(s) present in a nucleic acid sequence at a particular site(s).

As used herein, "nucleic acid" includes reference to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, and unless otherwise limited, encompasses known analogues having the essential nature of natural nucleotides in that they hybridize to single-stranded nucleic acids in a manner similar to naturally occurring nucleotides (e.g., peptide nucleic acids).

The term "isolated", as in "isolated nucleic acid molecule", refers to material, such as a nucleic acid, which is substantially or essentially free from components that normally accompany or interact with it as found in its naturally occurring environment. An isolated DNA molecule is a fragment of DNA that has been separated and that is no longer integrated in the genomic DNA of the organism from which it is derived.

"Nucleotides" are referred to by their commonly accepted single-letter codes following IUPAC nomenclature: A (Adenine), C (Cytosine), T (Thymine),
G (Guanine), U (Uracil), W (A or T), R (A or G), K (G or T), Y (C or T), S (C or G), M (A or C), B (C, G or T), H (A, C, or T), D (A, G, or T), V (A, C, or G), N (A, C, G, or T).
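The single-letter codes listed above can be captured in a small lookup table. The sketch below is illustrative only; the names `IUPAC_CODES`, `matches` and `ambiguity_code` are hypothetical, and the mapping contains exactly the codes named in the preceding paragraph.

```python
# Each code maps to the set of bases it may stand for, as listed in the text.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "W": "AT", "R": "AG", "K": "GT", "Y": "CT", "S": "CG", "M": "AC",
    "B": "CGT", "H": "ACT", "D": "AGT", "V": "ACG",
    "N": "ACGT",
}

# Reverse lookup for two-allele codes, e.g. {"A", "G"} -> "R".
_TWO_BASE_CODES = {
    frozenset(bases): code
    for code, bases in IUPAC_CODES.items()
    if len(bases) == 2
}


def matches(code: str, base: str) -> bool:
    """True if the concrete base is one of those the IUPAC code allows."""
    return base.upper() in IUPAC_CODES[code.upper()]


def ambiguity_code(allele_1: str, allele_2: str) -> str:
    """Return the IUPAC code describing a SNP with two distinct alleles."""
    key = frozenset((allele_1.upper(), allele_2.upper()))
    if key not in _TWO_BASE_CODES:
        raise ValueError("expected two distinct alleles among A, C, G, T")
    return _TWO_BASE_CODES[key]
```

For example, a C/T polymorphism is written Y, and N matches any of A, C, G or T.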

A "central nucleotide" refers to a nucleotide positioned essentially in the middle of a target region for hybridization or positioned essentially in the middle of a probe or primer for hybridization.
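A minimal sketch of placing the polymorphic position essentially in the middle of a probe is given below. The function name, the default length of 25 and the toy sequence are assumptions for illustration; the 18 to 50 nucleotide bounds follow the oligonucleotide length range given earlier in this description, which the text qualifies with "about".

```python
def centered_probe(sequence: str, snp_index: int, length: int = 25) -> str:
    """Extract a probe of the given length with the SNP at its centre.

    snp_index is the 0-based position of the polymorphic nucleotide in
    `sequence`. For even lengths the SNP sits just left of the exact middle.
    """
    if not 18 <= length <= 50:
        raise ValueError("probe length is expected to be about 18 to 50 nt")
    left_flank = (length - 1) // 2
    start = snp_index - left_flank
    end = start + length
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence around the SNP")
    return sequence[start:end]


# Toy sequence (hypothetical): a single G flanked by 20 A's and 20 T's.
toy_sequence = "A" * 20 + "G" + "T" * 20
probe = centered_probe(toy_sequence, snp_index=20, length=25)
```

With a length of 25, the polymorphic nucleotide is at index 12 of the probe, with 12 nucleotides on either side.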

A "coding" or "encoding" sequence is the part of a gene that codes for the amino acid sequence of a protein, or for a functional RNA such as a tRNA or rRNA. The terms "hybridise" or "anneal" refer to the process by which single strands of nucleic acid sequences form double-helical segments through hydrogen bonding between complementary nucleotides.
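Because hybridisation relies on pairing between complementary nucleotides, the strand that anneals to a given DNA sequence is its reverse complement. The following sketch illustrates this for plain A, C, G, T sequences; the function name is hypothetical.

```python
# Watson-Crick pairing for DNA: A with T, C with G.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the strand that would anneal to `dna`, written 5' to 3'."""
    if set(dna.upper()) - set("ACGT"):
        raise ValueError("only unambiguous DNA bases are supported")
    return dna.translate(_COMPLEMENT)[::-1]
```

For instance, `reverse_complement("AAGCT")` gives `"AGCTT"`.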

The terms "stringency" or "stringent hybridization conditions" refer to hybridization conditions that affect the stability of hybrids, e.g., temperature, salt concentration, pH, formamide concentration and the like. These conditions are empirically optimised to maximize specific binding and minimize nonspecific binding of primer or probe to its target nucleic acid sequence. The terms as used include reference to conditions under which a probe or primer will hybridise to its target sequence, to a detectably greater degree than other sequences (e.g. at least 2-fold over background). Stringent conditions are sequence dependent and will be different in different circumstances. Longer sequences hybridise specifically at higher temperatures. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridises to a perfectly matched probe or primer. Typically, stringent conditions will be those in which the salt concentration is less than about 1.0 M Na+ ion, typically about 0.01 to 1.0 M Na+ ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes or primers (e.g. 10 to 50 nucleotides) and at least about 60°C for long probes or primers (e.g. greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. Exemplary low stringent conditions or "conditions of reduced stringency" include hybridization with a buffer solution of 30% formamide, 1 M NaCl, 1% SDS at 37°C and a wash in 2x SSC at 40°C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 0.1x SSC at 60°C. Hybridization procedures are well known in the art. The terms "under stringent hybridization conditions" and "high stringency conditions" are equivalent.
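The relation between Tm and the hybridization temperature can be illustrated numerically. The text does not say how Tm is obtained, so the sketch below swaps in the Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is only a rough estimate for short oligonucleotides; the function names and the example oligonucleotide are hypothetical. The 5°C offset and the minimum temperatures of 30°C and 60°C are taken from the paragraph above.

```python
def wallace_tm(oligo: str) -> float:
    """Rough Tm estimate in °C using the Wallace rule (an assumption here)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("only A, C, G and T are supported")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(tm: float, offset: float = 5.0) -> float:
    """Temperature about 5 °C below Tm, as described for stringent conditions."""
    return tm - offset


def minimum_stringent_temperature(length: int) -> float:
    """Lower bound from the text: 30 °C up to 50 nt, 60 °C for longer probes."""
    return 30.0 if length <= 50 else 60.0


# Hypothetical 18-mer with 9 A/T and 9 G/C bases.
example_oligo = "ACGTACGTACGTACGTAC"
example_tm = wallace_tm(example_oligo)
example_temp = stringent_temperature(example_tm)
```

For this 18-mer the estimate is 54°C, giving a stringent hybridization temperature of about 49°C, which is above the 30°C minimum stated for short probes. Real assays would use empirically optimised conditions, as the text notes.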

The term "oligonucleotide" refers to a short sequence of nucleotide monomers (usually 6 to 100 nucleotides) joined by phosphorous linkages (e.g., phosphodiester, alkyl and aryl-phosphate, phosphorothioate), or non-phosphorous linkages (e.g., peptide, sulfamate and others). An oligonucleotide may contain modified nucleotides having modified bases (e.g., 5-methyl cytosine) and modified sugar groups (e.g., 2'-O-methyl ribosyl, 2'-O-methoxyethyl ribosyl, 2'-fluoro ribosyl, 2'-amino ribosyl, and the like). Oligonucleotides may be naturally-occurring or synthetic molecules of double- and single-stranded DNA and double- and single-stranded RNA with circular, branched or linear shapes and optionally including domains capable of forming stable secondary structures (e.g., stem-and-loop and loop-stem-loop structures).

The term "primer" as used herein refers to an oligonucleotide which is capable of annealing to the amplification target allowing a DNA polymerase to attach thereby serving as a point of initiation of DNA synthesis when placed under conditions in which synthesis of primer extension product which is complementary to a nucleic acid strand is induced, i.e., in the presence of nucleotides and an agent for polymerization such as DNA polymerase and at a suitable temperature and pH. The (amplification) primer is preferably single stranded for maximum efficiency in amplification. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the agent for polymerization. The exact lengths of the primers will depend on many factors, including temperature and source of primer. A "pair of bi-directional primers" as used herein refers to one forward and one reverse primer as commonly used in the art of DNA amplification such as in PCR amplification, and may be directed to the coding strand of the DNA or the complementary strand.

The term "probe" refers to a single-stranded oligonucleotide sequence that will recognize and form a hydrogen-bonded duplex with a complementary sequence in a target nucleic acid sequence analyte or its cDNA derivative.

The probes and primers herein are selected to be "substantially" complementary (i.e. at least 65%, more preferably at least 80% perfectly complementary) to their target regions present on the different strands of each specific sequence to be amplified. It is possible to use primer sequences containing e.g. inositol residues or ambiguous bases or even primers that contain one or more mismatches when compared to the target sequence. In general, sequences that exhibit at least 65%, more preferably at least 80% homology with the target DNA oligonucleotide sequences, are considered suitable for use in a method of the present invention. Sequence mismatches are also not critical when using low stringency hybridization conditions.
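The "substantially complementary" criterion can be made concrete with a short Python sketch that counts Watson-Crick pairs between a primer and its target site and compares the percentage with the 65% and 80% figures given above. The function names are hypothetical, and the assumption that primer and target site have equal length and are compared without gaps is a simplification made for this sketch.

```python
# Watson-Crick partners for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(primer: str, target_site: str) -> float:
    """Percentage of primer bases that pair with the target site.

    Both sequences are written 5'->3'. Because the two strands of a duplex
    are antiparallel, primer base i is compared with the target site read
    from its 3' end. Ungapped, equal-length comparison is an assumption.
    """
    primer, target_site = primer.upper(), target_site.upper()
    if not primer or len(primer) != len(target_site):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        COMPLEMENT.get(p) == t for p, t in zip(primer, reversed(target_site))
    )
    return 100.0 * paired / len(primer)


def is_substantially_complementary(
    primer: str, target_site: str, threshold: float = 65.0
) -> bool:
    """True if complementarity reaches the threshold (65% by default)."""
    return percent_complementarity(primer, target_site) >= threshold
```

Passing `threshold=80.0` applies the "more preferably" figure instead of the default.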

A "complement" or "complementary sequence" is a sequence of nucleotides which forms a hydrogen-bonded duplex with another sequence of nucleotides according to Watson-Crick base-pairing rules. For example, the complementary base sequence for 5'-AAGGCT-3' is 3'-TTCCGA-5'.

The term "gene", as used herein, refers to a DNA sequence including but not limited to a DNA sequence that can be transcribed into mRNA which can be translated into polypeptide chains, transcribed into rRNA or tRNA or serve as recognition sites for enzymes and other proteins involved in DNA replication, transcription and regulation. The term refers to any DNA sequence comprising several operably linked DNA fragments such as a promoter region, a 5' untranslated region (the 5' UTR), a coding region (which may or may not code for a protein), and an untranslated 3' region (3' UTR) comprising a polyadenylation site. Typically, the 5' UTR, the coding region and the 3' UTR are transcribed into an RNA of which, in the case of a protein-encoding gene, the coding region is translated into a protein. The gene usually comprises introns and exons. A gene may include additional DNA fragments such as, for example, introns. Thus the term "in a gene" as used herein may well refer to a 5' or 3' UTR or other sequence which is in close proximity to any intron or exon sequence of a gene as described. Close proximity refers to less than 2500 bases, preferably less than 1000 bases, 500 bases, 200 bases, more preferably less than 150 bases or 100 bases, most preferably less than 50 bases removed from any intron or exon of a gene as described herein.
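The "close proximity" definition lends itself to a small worked example. The Python sketch below tests whether a SNP position counts as "in a gene" by measuring its distance to the nearest intron or exon and comparing it with the 2500-base limit stated above. The coordinate convention (inclusive start and end on a single chromosome) and the function names are assumptions made for illustration.

```python
def distance_to_nearest_feature(pos: int, features: list[tuple[int, int]]) -> int:
    """Distance in bases from `pos` to the nearest intron/exon interval.

    `features` holds (start, end) pairs with inclusive coordinates; a
    position inside an interval has distance 0.
    """
    if not features:
        raise ValueError("at least one intron or exon interval is required")
    best = None
    for start, end in features:
        if start <= pos <= end:
            return 0
        d = start - pos if pos < start else pos - end
        best = d if best is None else min(best, d)
    return best


def is_in_gene(pos: int, features: list[tuple[int, int]], max_distance: int = 2500) -> bool:
    """True if `pos` lies less than `max_distance` bases from any feature."""
    return distance_to_nearest_feature(pos, features) < max_distance
```

Tightening `max_distance` to 1000, 500 or 50 applies the preferred limits listed in the definition.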

"Sample" is used in its broadest sense as containing nucleic acids. A sample may comprise a bodily fluid such as blood; the soluble fraction of a cell preparation, or an aliquot of media in which cells were grown; a chromosome, an organelle, or membrane isolated or extracted from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; a cell; a tissue; a tissue print; a fingerprint, buccal cells, skin, or hair; and the like.

By "amplified" is meant the construction of multiple copies of a nucleic acid sequence or multiple copies complementary to the nucleic acid sequence using at least one of the nucleic acid sequences as a template. The product of amplification is termed an amplicon. Methods of the invention can in principle be performed by using any nucleic acid amplification system. Amplification systems include the Polymerase Chain Reaction (PCR; U.S. 4,683,195, 4,683,202, and 4,800,159), the Ligase Chain Reaction (LCR; EP 0 320 308), Self-Sustained Sequence Replication (3SR), Strand Displacement Amplification (SDA; U.S. 5,270,184, and 5,455,166), Transcriptional Amplification System (TAS), Q-Beta Replicase, Rolling Circle Amplification (RCA; U.S. 5,871,921), Nucleic Acid Sequence Based Amplification (NASBA), Cleavase Fragment Length Polymorphism (U.S. 5,719,028), Isothermal and Chimeric Primer-initiated Amplification of Nucleic Acid (ICAN), Ramification-extension Amplification Method (RAM; U.S. 5,719,028 and 5,942,391) or other suitable methods for amplification of DNA.

In order to amplify DNA with a small number of mismatches to one or more of the amplification primers, an amplification reaction may be performed under conditions of reduced stringency (e.g. a PCR amplification using an annealing temperature of 38°C, or the presence of 3.5 mM MgCl2). The person skilled in the art will be able to select conditions of suitable stringency. The detection of the amplification products can in principle be accomplished by any suitable method known in the art.

"DNA Array" refers to an ordered arrangement of at least two cDNAs on a substrate. At least one of the cDNAs represents a control or standard, and the other, a cDNA of diagnostic or therapeutic interest. The arrangement of two to about 40,000 cDNAs on the substrate assures that the size and signal intensity of each labeled hybridization complex, formed between each cDNA and at least one nucleic acid, is individually distinguishable.

"cDNA" refers to an isolated polynucleotide, nucleic acid molecule, or any fragment or complement thereof. It may have originated recombinantly or synthetically, may be double-stranded or single-stranded, represents coding and noncoding 3' or 5' sequence, and lacks introns.

"Portion" refers to any part of a nucleic acid sequence encoding a gene as defined herein used for any purpose; but especially, to a fragment of said gene comprising the polymorphic nucleotide position.

The present inventors have discovered specific SNPs that are linked to the phenotype age at menopause. Some of these SNPs indicate that the female individual wherein the SNP is detected has an increased chance of experiencing an age at menopause which is earlier than average. Other SNPs indicate that the age at menopause is later than that of the average population. In Table 2, the direction of the effect on the age at menopause is provided relative to the effective allele (Allele 1) as an age decrease or earlier menopause (--) or as an age increase or later than average menopause (++). Women in the population having a (--) marker will on average experience an age at menopause that is 1.2 to 1.4 years earlier than women that do not have the marker.

The method of the present invention may thus involve the determination of 1 marker, but may suitably involve the determination of 2, 3, 4, 5 or more markers, as well as the grouping of women according to number of markers and assignment of a female individual tested to one of these groups.
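The grouping step described above can be sketched in a few lines of Python: count, for each woman, the number of marker alleles she carries, then group individuals by that count. The SNP names `snp_a` and `snp_b`, the genotype encoding (a two-letter string per SNP) and the function names are hypothetical placeholders; the specification does not prescribe a data format.

```python
from collections import defaultdict


def count_markers(genotypes: dict[str, str], marker_alleles: dict[str, str]) -> int:
    """Number of SNPs at which at least one copy of the marker allele is present.

    `genotypes` maps SNP id -> two-letter genotype such as "CT";
    `marker_alleles` maps SNP id -> the allele treated as the marker.
    Missing genotypes are counted as not carrying the marker.
    """
    return sum(
        1 for snp, allele in marker_alleles.items() if allele in genotypes.get(snp, "")
    )


def group_by_marker_count(
    cohort: dict[str, dict[str, str]], marker_alleles: dict[str, str]
) -> dict[int, list[str]]:
    """Group individual ids by the number of markers each one carries."""
    groups: dict[int, list[str]] = defaultdict(list)
    for person_id, genotypes in cohort.items():
        groups[count_markers(genotypes, marker_alleles)].append(person_id)
    return dict(groups)
```

A newly tested individual is assigned to a group by calling `count_markers` on her genotypes and looking up the group with the same count.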

In one aspect, the present invention provides a method of predicting the age at natural menopause in a female individual comprising detecting at least one SNP in a specific gene. The method may typically involve the provision of a nucleic acid sample of a subject in which the age at natural menopause is to be predicted. This sample is suitably a blood sample. The sample will typically be treated in order to preserve the nucleic acids comprised therein. It is advantageous if the sample is treated and purified in order to obtain a sample essentially consisting of nucleic acids. Such a sample can suitably be used for performing genetic screening methods of the invention.

In order to detect a polymorphism in a gene, the gene, or a portion thereof, may be amplified from the sample, for instance by using PCR in combination with a set of bi-directional primers for gene-specific sequences.

Once the amplicon representing the gene, or a portion thereof is obtained it can be used in a detection assay for detecting a single nucleotide polymorphism as defined herein that is predictive for age at natural menopause. Alternatively, one may use SNP-specific primers in an amplification reaction in order to amplify only the polymorphic gene sequence. In that case, the successful production of an amplicon will immediately indicate the presence of the polymorphic gene in the nucleic acid sample.

An amplicon can be prepared from any type of nucleic acid present in the sample that reflects the polymorphic position, such as DNA and RNA. The polymorphic genes are an aspect of the present invention. In general they can be characterized in comprising a polymorphic nucleotide position, or single nucleotide polymorphism as defined herein, that is predictive for age at natural menopause. In addition, the polymorphic genes of the present invention will selectively hybridize under high stringency conditions to a nucleotide sequence encoding the gene, or to its complement. The presence of the single nucleotide polymorphism will not affect the hybridization characteristics of the nucleic acid molecules of the invention.

The utility of the isolated nucleic acid molecules of the invention is that they are predictive for age at natural menopause in a female individual. Other aspects of the invention include oligonucleotides useful in the methods of the present invention. Typically, the oligonucleotides will find utility in the form of primers or probes suitable for detecting the polymorphisms that are predictive for age at natural menopause in a female individual as defined herein. The skilled person is well aware of the various methods and assay types available for determining the presence of an SNP in a nucleic acid sample. Such genetic screening or genotyping techniques need not be described in great detail herein.

Essentially, in an assay format for direct discriminative amplification of the polymorphic sequence (while not amplifying the "wild-type" sequence) at least one bidirectional amplification primer should be able to hybridize under stringent conditions to the polymorphic sequence. This can for instance be achieved by selecting the target region for hybridization such that it overlaps (at least over a length of one or more nucleotide positions) with the polymorphic position. If the polymorphic position is present, the primer will anneal; if not, primer annealing will be compromised, resulting in less efficient amplification. Alternatively, the portion of the gene comprising the (suspected) polymorphic position can be amplified using generic amplification primers for the target gene and the presence of the SNP can be determined by selective probe hybridization, restriction fragment analysis or minisequencing. The skilled person is well aware of the various possibilities available for detecting the presence of a defined SNP in a gene. Primer and/or probe annealing will generally involve the provision of conditions wherein the probe and/or primer is contacted with the nucleic acid to be tested and is allowed to hybridize to the nucleic acid under conditions that favour the formation of hybrids or duplexes between fully complementary nucleic acid sequences, but that hinder the formation of hybrids between partially complementary sequences.

The detection of a polymorphic gene can also be accomplished by immunological methods in case the SNP results in an amino acid change in the protein encoded by the gene. The skilled person is well aware of how such immunological methods may be performed. For instance, if the SNP results in a genetic change that leads to a polymorphic protein, then this protein may be synthesised and used for the production of specific antibodies with which the protein, and, hence, the polymorphism, may be detected in a sample of a subject.

In another aspect, the present invention provides a genetic screening method comprising detecting in a nucleic acid sample the presence of a polymorphic gene wherein at least one oligonucleotide of the present invention is allowed to hybridize under stringent conditions to the nucleic acid in said sample. In principle, the polymorphic gene (the gene comprising the polymorphism) can be detected by identifying the polymorphic sequence. This can be achieved, for instance, by a method selected from the group consisting of: restriction fragment length polymorphism (RFLP) analysis, minisequencing, MALDI-TOF, short interspersed repeat element (SINE), heteroduplex analysis, single strand conformational polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE), Q-PCR, RT-PCR, restriction enzyme analysis and DNA array hybridization. An oligonucleotide that specifically hybridizes to the isolated nucleic acid molecule according to claim 6, and wherein the oligonucleotide hybridizes to a portion of the isolated nucleic acid molecule comprising the polymorphic nucleotide position.

The oligonucleotides of the present invention are capable of hybridizing under stringent conditions to the polymorphic gene sequences as identified by the present inventors. Essentially these oligonucleotides will hybridize to the gene in isolated, as well as in non-isolated form. Hence, the oligonucleotides are also suitable for "in situ" hybridization assays. Essentially, the oligonucleotides of the present invention are characterized in that they hybridize to the polymorphic position. As noted above, this will mean that their projected target sequence in the gene contains the SNP. Suitably, as stated earlier, the oligonucleotides overlap with the polymorphic position, so that their target region extends towards the 3' and 5' ends from the polymorphic position. The overlap can suitably be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 or more nucleotides in either direction of the polymorphic position. The oligonucleotide will generally be between about 18 nucleotides and about 50 nucleotides in length. Although longer oligonucleotides are possible, they are not preferred in specific hybridization formats. As an example, a central nucleotide of the oligonucleotide specifically hybridizes with the polymorphic position of the portion of the nucleic acid molecule. The oligonucleotides of the present invention may be DNA or DNA analogues.

A further aspect of the present invention is a kit-of-parts for use in a method for detecting a polymorphic gene as defined herein in a sample, said kit comprising at least one pair of bidirectional oligonucleotide primers as described herein above, or at least one oligonucleotide probe described herein above; and one of the following: an instruction for performing a method of the present invention, specific packaging material for collecting or storing nucleic acid samples, and a reagent for performing a nucleic acid amplification or hybridization reaction.
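The description of an oligonucleotide that overlaps the polymorphic position on both sides, with a central nucleotide at the SNP, can be illustrated by the following Python sketch. It cuts a probe-length window out of a reference sequence, places the chosen allele at the centre and enforces the approximate 18 to 50 nucleotide length range mentioned above. The function name, the 0-based indexing and the default flank of 12 nucleotides are assumptions of this sketch; the returned sequence is the allele-bearing strand, and its reverse complement would be the hybridizing oligonucleotide.

```python
def design_probe(reference: str, snp_index: int, allele: str, flank: int = 12) -> str:
    """Return a window of length 2*flank + 1 centred on the SNP.

    `reference` is the strand containing the SNP (5'->3'), `snp_index` is
    the 0-based position of the polymorphic nucleotide and `allele` is the
    base to place there. Raises ValueError if the window does not fit or if
    the resulting length falls outside 18-50 nucleotides.
    """
    if len(allele) != 1:
        raise ValueError("allele must be a single nucleotide")
    start, end = snp_index - flank, snp_index + flank + 1
    if start < 0 or end > len(reference):
        raise ValueError("not enough flanking sequence around the SNP")
    probe = reference[start:snp_index] + allele + reference[snp_index + 1:end]
    if not 18 <= len(probe) <= 50:
        raise ValueError("probe length outside the 18-50 nucleotide range")
    return probe
```

Calling the function once per allele yields an allele-specific pair of sequences that differ only at the central position.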

EXAMPLES

Introduction

In this study we used Genome Wide Association Analysis (GWA). GWA allows identification of common genetic risk alleles. Essentially, GWA comprises screening the genome of several hundreds or thousands of subjects in a case-control study or population-based cohort study with a large number (usually >500,000) of Single Nucleotide Polymorphisms (SNPs). Subsequently, an association analysis may be performed between a certain phenotype and the SNPs. In this way statistically significant associations may be identified between genetic markers and phenotype. Using a discovery sample with GWA data, candidate genetic markers having a significance of p<1*10^-4 may be selected. Subsequently, a replication sample with GWA data is used to further focus on high potential candidate markers (having p<1*10^-5). Thereafter, these high potential markers may be analysed in subsequent replication cohorts which do not necessarily have GWA data associated with them but which can be genotyped for the particular genetic markers identified.
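The two-step selection described in this introduction can be written down as a simple filter. The Python sketch below keeps SNPs whose discovery p-value is below 1*10^-4 and whose replication p-value is below 1*10^-5. The function name, the dictionary-based input and the SNP labels in any usage are hypothetical; treating SNPs absent from the replication data as not replicated is also an assumption.

```python
def select_candidates(
    discovery_p: dict[str, float],
    replication_p: dict[str, float],
    discovery_threshold: float = 1e-4,
    replication_threshold: float = 1e-5,
) -> list[str]:
    """Return SNP ids passing both thresholds, sorted by id.

    Stage 1 keeps SNPs with discovery p < discovery_threshold; stage 2
    keeps those of them with replication p < replication_threshold.
    SNPs missing from `replication_p` are dropped (assumption).
    """
    stage1 = {snp for snp, p in discovery_p.items() if p < discovery_threshold}
    return sorted(
        snp for snp in stage1 if replication_p.get(snp, 1.0) < replication_threshold
    )
```

The surviving SNPs would then be genotyped directly in further replication cohorts, as described above.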

The Study

In the present GWA study we used the so-called baseline cohort of the Rotterdam Study (Hofman et al., Eur J Epi 2007) that was genotyped with the Illumina 550K SNP microarray, and compared the top hits with p<1*10^-4 with the GWA data from the so-called TwinsUK cohort that was genotyped with the Illumina 300K SNP microarray (Richards et al. 2008). In Table 1 we have presented the baseline characteristics of these two cohorts.

Table 1. Baseline characteristics of the Rotterdam study and the TwinsUK study

Study               Number of subjects  Age (years)  Height (cm)  Weight (kg)  Age at natural menopause (yrs)
Rotterdam baseline  2419                70.0 (9.1)   161.1 (6.7)  69.5 (11.4)  49.6 (4.5)
TwinsUK             612                 55.6 (6.7)   161.9 (6.1)  66.0 (11.1)  48.5 (3.8)

Results

1. Menopause

Stage 1

In stage 1 we genotyped 550K SNPs in 2420 women of the Rotterdam study using Illumina HumanHap 550v3 (Richards, Rivadeneira et al., Lancet 2008). After quality control 535,354 SNPs in 2419 women were left. Using PLINK v1.01 software (URL: http://pngu.mgh.harvard.edu/purcell/plink/; Purcell et al., Am J Hum Genet 2007) we analyzed the association of these SNPs with age at natural menopause (genomic inflation factor: 1.01669). Figure 1a shows the quantile-quantile plot (QQ-plot) of this analysis, indicating that there is no genomic inflation of the data. In Figure 1b the genomic position and the negative log p value of the association of each SNP with age at natural menopause are shown. This plot shows no genome-wide significant SNPs (P<5*10^-8). The strongest signals were found on chromosome 19: rs2384687 (P=4*10^-7) and rs1172822 (P=7*10^-7).
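For readers unfamiliar with the genomic inflation factor quoted above, the following Python sketch shows one common way to compute it from a list of association p-values: convert each p-value to a 1-degree-of-freedom chi-square statistic and divide the observed median by the expected median under the null (about 0.455). This is the standard median-based definition and is assumed here; the text does not state how PLINK computed the reported value, and the function name is hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(p_values: list[float]) -> float:
    """Median-based genomic inflation factor (lambda).

    Each two-sided p-value is mapped to a 1-df chi-square statistic via
    chi2 = z**2 with z = Phi^-1(p / 2). The expected median of a 1-df
    chi-square is Phi^-1(0.75)**2, roughly 0.455.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median
```

A value close to 1, such as the 1.01669 reported for stage 1, indicates that the bulk of the test statistics follows the null distribution.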

Stage 2

In stage 2 we performed a meta-analysis using the METAL software package (URL: http://www.sph.umich.edu/csg/abecasis/metal/index.html) on summary statistics of the Rotterdam study and the TwinsUK study. We used summary statistics, because in the Rotterdam study unrelated population-based samples were used, whereas in the TwinsUK study monozygous and dizygous twins were included. From the TwinsUK study we selected all singletons and randomly selected one individual per sibship. In total 612 women from the TwinsUK study were genotyped using the 300K Duo chip arrays (Richards et al., Lancet 2008). After quality control 317,818 SNPs were left for analysis using MERLIN (Abecasis et al., Nat Genet 2002).

The meta-analysis was performed on 315,418 SNPs, genotyped in both cohorts. From this combined analysis 2 SNPs on chromosome 19 showed genome-wide significance, rs2384687 (P=1*10^-8) and rs1172822 (P=2*10^-8). Thirty-five SNPs from this analysis have a P<1*10^-4 (Table 2) and they correspond to 25 loci of which 5 loci have multiple SNPs.
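A meta-analysis on summary statistics of two cohorts can be sketched as a sample-size-weighted combination of Z-scores, which is one of the schemes METAL offers. Whether this scheme or another weighting was used for the stage 2 analysis above is not stated in the text, so the choice is an assumption of this Python sketch, as are the function names and the input layout.

```python
from math import erfc, sqrt
from statistics import NormalDist

_ND = NormalDist()


def p_to_signed_z(p: float, direction: int) -> float:
    """Convert a two-sided p-value and an effect direction (+1/-1) to a Z-score."""
    z = abs(_ND.inv_cdf(p / 2.0))
    return z if direction >= 0 else -z


def weighted_z_meta(studies: list[tuple[float, int, int]]) -> tuple[float, float]:
    """Combine (p_value, direction, sample_size) triples.

    Z = sum(sqrt(N_i) * z_i) / sqrt(sum(N_i)); returns (Z, two-sided p).
    """
    if not studies:
        raise ValueError("at least one study is required")
    num = sum(sqrt(n) * p_to_signed_z(p, d) for p, d, n in studies)
    den = sqrt(sum(n for _, _, n in studies))
    z = num / den
    return z, erfc(abs(z) / sqrt(2.0))
```

Two cohorts that agree in direction reinforce each other, while equally sized cohorts with opposite directions and equal p-values cancel to Z = 0.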

Table 2. P-values and overall effect size of the 35 top SNPs

Top p<10^-4 (n=35) for ERGO (n=2400) & TwinsUK (n=600) from meta-analysis of 315K SNPs

Same colour means same locus; no colour means there is only 1 SNP at that locus in the top 35. * Based on NCBI Genome build 36.3

Same colours are indicated by a bracket ("{").

Figures 2a to 2e show the r2-based LD plots for the 5 loci with multiple SNPs associated with age at natural menopause in the meta-analysis. In the lower panel of Figure 2f the genes located at the 200kb chromosome 19 locus, containing the 5 SNPs associated with age at natural menopause, are shown. The upper panel shows the p-values for all the SNPs in this 200kb region that are on the 550K array in the Rotterdam study. The 5 most significant SNPs in this region are all located in a 20kb region which contains the last 9kb part of the BRSK1 gene and the immediate 11kb 3' region. This points to BRSK1 as the causal gene.

A literature search (OMIM, Pubmed, etc) did not reveal any relationship with infertility, ovaries, ovarian failure or menopause. A search across the mouse genome informatics database (URL: http://www.informatics.jax.org/ ) indicated the existence of a knock-out mouse, but without any phenotype (only when both BRSK1 and BRSK2 (chr 11p15.5, also a serine/threonine kinase) were knocked out were neonatal mortality and nervous system dysmorphia seen). When we analysed the relation between SNPs in the BRSK1 region and mRNA gene expression profile data (Illumina ref6; 48K transcripts) in 60 HapMap (CEU) EBV transformed blood cell lines, no relationship was observed (Sanger Institute; GENEVAR dataset; URL: http://www.sanger.ac.uk/humgen/genevar/ ).

So, while BRSK1 is no obvious candidate from this limited search, we suspect that it may explain the association observed. In addition, the other genes in the 200kb region may also be involved. For the genes screened in the immediate neighbourhood (HSPBP1 and SUV420H2) and the other 7 genes, we could also not find any clues to a possible causal relationship with age at natural menopause.

2. Osteoporosis

Early menopause is associated with a higher risk for osteoporosis. Therefore, we also analyzed all 535,354 SNPs in the Rotterdam study to identify SNPs that show an association with a combination of lumbar spine BMD and femoral neck BMD and age at natural menopause. SNPs were significant if they reached p<0.01 for all three phenotypes. Table 3 shows the 15 SNPs on 11 loci that were significant in this analysis. None of these SNPs overlapped with the top 35 from the menopause analysis. The genes do not contain any obvious bone candidate genes, nor do they overlap with the genes recently reported by Richards, Rivadeneira et al. (Lancet 2008) and Styrkarsdottir et al. (NEJM 2008) in their GWAS analyses for BMD genes.

Table 3 provides a list of Tag SNPs as described above. These Tag SNPs can also be used in aspects of the present invention, as they are in linkage disequilibrium with the top 35 SNPs. Table 3 provides information on the chromosome (Chr), the primary SNP as defined herein (the tagged SNP), the SNP linked thereto (tag SNP) and the correlation indicating the degree of LD (r2).

Chr  Tagged SNP   Tag SNP      r2 #
10                rs11250802   1.00
10   rs17157052   rs7091914    1.00
10                rs17157052
10                rs4545465    1.00
10                rs17157015   1.00
10   rs10823195   rs10823195
10   rs2622436    rs2542638    1.00
10                rs2622436
10                rs2622437    0.88
10                rs1345475    0.96
11   rs4397868    rs4397868
11                rs6590727    0.86
11                rs4936202    0.82
14   rs12431748   rs12431748   -
14                rs10136524   1.00
14                rs7155692    1.00
14                rs12431723   1.00
16   rs2875853    rs7203786    0.85
16                rs2875853
16                rs2136660    0.85
16                rs11860273   1.00
17   rs2332902    rs13353194   1.00
17                rs11079287   1.00
17                rs1815198    1.00
17                rs4341799    0.96
17                rs4794716    1.00
17                rs2332902
19   rs1551562    rs1551562    1.00
19   rs1172822    rs11668344   0.89
19                rs1172822    1.00
19                rs11668309   0.85
19                rs4806660    0.96
19   rs7246479    rs10411773   0.93
19                rs10412726   0.93
19                rs897798     0.82
19                rs8113016    0.84
19                rs734518     0.93
19                rs7246479    1.00
19   rs2384687    rs2384687    1.00
19   rs897798     rs897798     1.00
19                rs7252864    0.81
22   rs8137004    rs8137004    -

# Based on HapMap release 23a, 60 CEU HapMap subjects.
"-" is the tag SNP itself, so r2 is always 1.00, but it adds no extra tagged SNPs.
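The r2 values listed in the table measure linkage disequilibrium between a tagged SNP and its tag SNP. As a reminder of how such a value arises, the Python sketch below computes r2 from the frequency of the haplotype carrying both reference alleles and the two allele frequencies, using the standard definition r2 = D^2 / (pA(1-pA)pB(1-pB)) with D = pAB - pA*pB. The function name and the assumption that phased haplotype frequencies are available are illustrative only; the table itself was derived from HapMap data.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r2) between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and
          allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        raise ValueError("both SNPs must be polymorphic (0 < frequency < 1)")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom
```

Two SNPs whose alleles always travel together give r2 = 1.00, as for several tag SNPs in the table, while independently segregating SNPs give r2 = 0.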

Table 4 provides detailed sequence information of the SNPs as defined herein.

Table 4. Information on the sequence of the SNPs as defined herein.

Example 2

We conducted a genome-wide association study for age at natural menopause in 2,979 European women and identified six SNPs in three loci associated with age at natural menopause: chromosome 19q13.4 (rs1172822; -0.4 year per T allele (39%); P = 6.3 x 10^-11), chromosome 20p12.3 (rs236114; +0.5 year per A allele (21%); P = 9.7 x 10^-11) and chromosome 13q34 (rs7333181; +0.5 year per A allele (12%); P = 2.5 x 10^-8). These common genetic variants regulate timing of ovarian aging, an important risk factor for breast cancer, osteoporosis and cardiovascular disease.

Menopause, the time of a woman's life when the menstrual cycle ceases owing to depletion of the follicle pool, is a key event in reproductive aging. It influences a woman's well-being and is an important risk factor for several major age-related diseases including cardiovascular disease, breast cancer and osteoporosis. Age at menopause averages around 50-51 years and ranges between 40 and 60 years of age; twin studies have shown this variability to be genetically determined with heritabilities of 44-65%. Such genetic factors might regulate the size of the follicle pool and the rate of its depletion, and their identification could have biological and clinical applications.

Typical for complex quantitative traits, genome-wide linkage studies of menopause have been unsuccessful, and candidate gene studies have mainly focused on the estrogen pathway and have had conflicting results. This suggests that the apparent effect sizes for genetic variants are small and that the major causative loci have not been identified. Genome-wide association studies (GWAS) have proven successful in identifying common susceptibility genes with small effect sizes for many complex diseases and traits and might be suitable to identify genetic factors involved in determining age at menopause. In this GWAS we used a two-stage design to identify previously unknown loci influencing age at menopause. We included women with self-reported natural age at menopause (defined as 12 months without regular periods) between 40 and 60 years, excluding those with hysterectomy, uni- or bilateral ovariectomy, menopause induced by irradiation, or menopause occurring after stopping the contraceptive pill or hormone replacement therapy. In stage 1 we genotyped 2,368 women of the Rotterdam Study baseline with the Illumina HumanHap 550v3 Beadarray. After quality control, 535,354 SNPs were left for analysis. Allelic association tests were carried out using PLINK v1.01 software for age at natural menopause. The genomic inflation factor (λ) was 1.01669 for this analysis, indicating no population stratification, so we based our results on the uncorrected P values. The strongest association signals were found for rs2151145 (P = 5.3 x 10^-6) on chromosome 9, rs236114 (P = 5.6 x 10^-6) on chromosome 20 and rs1172822 (P = 6.3 x 10^-6) on chromosome 19.
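An allelic association test for a quantitative trait such as age at natural menopause is commonly carried out as a linear regression of the trait on the number of copies of an allele. The Python sketch below fits that regression by ordinary least squares and returns the per-allele effect (beta, in years) with its standard error. It is meant as an illustration of the idea, assuming additive coding of genotypes as 0, 1 or 2; it is not PLINK's implementation, and the function name is hypothetical.

```python
def linear_association(dosages: list[int], ages: list[float]) -> tuple[float, float]:
    """Ordinary least squares of trait on allele count.

    `dosages` holds 0, 1 or 2 copies of the tested allele per woman and
    `ages` the corresponding age at natural menopause in years.
    Returns (beta, standard_error), beta being years per allele copy.
    """
    n = len(dosages)
    if n != len(ages) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(ages) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, ages))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, ages))
    se = (rss / (n - 2) / sxx) ** 0.5
    return beta, se
```

The ratio beta / se is the test statistic from which a P value per SNP is derived; repeating the fit for every SNP gives the genome-wide scan.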

We combined the results from the Rotterdam Study baseline with GWA data from the TwinsUK study. A total of 611 women with natural menopause using the same definitions and exclusions as above were genotyped with the Illumina HumanHap 300K beadarray, and after quality control 317,818 SNPs were left for analysis. After adjusting for relatedness and genomic control, we did not observe any genome-wide significant signals in this study. Because of the different study designs we conducted meta-analysis on summary statistics of the two studies using METAL (http://www.sph.umich.edu/csg/abecasis/metal/) on 315,418 SNPs common to both cohorts (2,979 women), but we did not observe any genome-wide significant SNPs. From this meta-analysis, all SNPs with P < 1 x 10^-4, corresponding to 32 SNPs from 24 loci (with five loci having multiple significant SNPs), were followed up in stage 2. Twenty-four SNPs were genotyped using Sequenom iPLEX genotyping and seven SNPs using Taqman allelic discrimination (Applied Biosystems) in 2,560 samples of four additional cohorts of postmenopausal females of European ancestry; one of the SNPs (rs11786333) failed genotyping. For the remaining 31 SNPs, we calculated combined P values, betas and standard errors using inverse variance fixed-effects meta-analysis and identified six common SNPs that were genome-wide significant in the combined stage 1 and 2 analysis.

Four SNPs on chromosome 19 were significant: rs1172822 (MAF = 0.39), P = 6.28 x 10^-11, beta = -0.391 year per T allele (s.e.m. = 0.0598); rs2384687 (MAF = 0.40), P = 1.39 x 10^-10, beta = 0.381 year per C allele (s.e.m. = 0.0594); rs1551562 (MAF = 0.25), P = 1.04 x 10^-9, beta = 0.4279 year per G allele (s.e.m. = 0.0701); and rs897798 (MAF = 0.48), P = 3.91 x 10^-8, beta = 0.308 year per G allele (s.e.m. = 0.056). These four SNPs are likely to report the same signal because the linkage is high (D' > 0.92, r2 > 0.5). On chromosome 20, rs236114 (MAF = 0.21) was genome-wide significantly associated with age at natural menopause (P = 9.71 x 10^-11, beta = 0.4953 year per A allele (s.e.m. = 0.0765)). Furthermore, on chromosome 13 rs7333181 (MAF = 0.12) was genome-wide significant: P = 2.50 x 10^-8, beta = 0.5201 (s.e.m. = 0.0933). The six genome-wide-significant hits showed no heterogeneity (I^2 < 25%), so fixed effects models were used.

In addition, we estimated the risk for menopause before the age of 50 by allele of the six genome-wide significant SNPs. We conducted fixed-effects meta-analysis for SNPs not showing heterogeneity (rs7333181, rs1551562, rs1172822, rs2384687, rs897798), and random effects meta-analysis for rs236114, for which I^2 was 31%. This meta-analysis showed that the A allele of rs1172822 is associated with a 19% increased risk for natural menopause before 50 years (OR = 1.19, 95% CI = 1.09-1.29, P = 6.2 x 10^-5). The other SNPs on chromosome 13, 19 and 20 showed a similar increase or decrease in risk. The initial analysis was not adjusted for covariates such as age, body mass index, smoking, age at menarche, parity and use of oral contraceptives and female hormones.
To rule out an effect of these covariates on the association of the genome-wide-significant hits, we carried out adjusted linear regression of these SNPs in the Rotterdam Study baseline cohort. None of the previously found associations was affected by the adjustment for these covariates, indicating that the effect of the SNP occurs directly on age at natural menopause and not via one of the covariates. We calculated the total explained variance in age at natural menopause for these SNPs in the combined replication studies to be 1.1% (range 0.1-0.5% per SNP).
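The inverse variance fixed-effects meta-analysis and the I^2 heterogeneity measure used in this section follow textbook formulas, which the Python sketch below spells out: each cohort's beta is weighted by 1/se^2, the pooled standard error is the square root of the inverse of the summed weights, and I^2 is derived from Cochran's Q. The function name and the dictionary returned are assumptions of this sketch, and the two-sided P value is taken from a normal approximation.

```python
from math import erfc, sqrt


def fixed_effects_meta(betas: list[float], ses: list[float]) -> dict[str, float]:
    """Inverse-variance fixed-effects meta-analysis of per-cohort estimates.

    Returns the pooled beta, its standard error, a two-sided P value
    (normal approximation) and I^2 in percent.
    """
    if not betas or len(betas) != len(ses) or any(se <= 0 for se in ses):
        raise ValueError("need matching lists of betas and positive standard errors")
    weights = [1.0 / (se * se) for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = sqrt(1.0 / w_sum)
    p = erfc(abs(beta / se) / sqrt(2.0))
    # Cochran's Q and I^2 = max(0, (Q - df) / Q) * 100
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "i2": i2}
```

As a plausibility check, a beta of -0.391 with a standard error of 0.0598 corresponds under this normal approximation to a P value of roughly 6 x 10^-11, in line with the value reported for rs1172822.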

We then conducted fine mapping of these signals using meta-analysis of imputed data of the stage 1 studies, and found three SNPs on chromosome 13 with significance equal to or greater than that of rs7333181, and two SNPs on chromosome 19 and one on chromosome 20 with higher significance compared to the previously reported SNPs. For all three loci, the imputed SNPs are located in the same linkage disequilibrium (LD) block as the genome-wide-significant SNPs.

The four chromosome 19 SNPs are located within an LD block covering almost 20 kb and are located intronic and 3' of the BRSK1 gene (BR serine threonine kinase 1), in the 3' region and inside the hypothetical gene LOC284417, and 5' of the SUV420H2 gene (suppressor of variegation 4-20 homolog 2, a lysine methyltransferase). Literature analysis for these genes did not indicate an immediate functional explanation for the observed association, although for both BRSK1 and SUV420H2 a possible involvement in ovarian aging was suggested. rs7333181 on chromosome 13 is located 4250 kb 3' of the hypothetical gene LOC121793 and the ARHGEF7 (rho guanine nucleotide exchange factor 7) gene, also known as COOL1 (cloned out of library 1). ARHGEF7 has a role in cell proliferation through phosphorylation of FOXO3a. FOXO3a knockout mice are infertile due to early depletion of the follicle pool, indicating a possible role of this gene in menopause. The chromosome 20 SNP is located in an intron of the MCM8 (minichromosome maintenance complex component 8) gene. The more significantly associated SNP from the imputed data is a nonsynonymous SNP in exon 9 of this gene (E341K), and could influence the protein structure or function of MCM8. For the gene on chromosome 20 no involvement in ovarian aging or menopause was suggested.

Identification of the causative variant(s) and the responsible gene(s) underlying the observed associations requires further research, which will enhance our molecular understanding of the genetic regulation of the ovarian reserve and the aging process. Although the rate of ovarian aging is highly variable among women, identifying women with decreased ovarian reserve is clinically relevant, because the timing of menopause is an important risk factor for conditions such as breast cancer, cardiovascular disease and osteoporosis.

Table 1. Genome-wide significant SNPs and association with age at menopause (table body not recoverable).

REFERENCES

Broekmans FJ, Knauff EAH, te Velde ER, Macklon NS, Fauser BC; Female reproductive ageing: current knowledge and future trends; TRENDS in Endocrinology and Metabolism (2007) 18(2):58-65

Nelson HD; Menopause; Lancet (2008) 371:760-770

DeLellis Henderson K, Bernstein L, Henderson B, Kolonel L, Pike MC; Predictors of the timing of natural menopause in the multiethnic cohort study; Am J Epidemiol. (2008) Epub 21 March 2008

Hofman A, Breteler MM, van Duijn CM, Krestin GP, Pols HA, Stricker BH, Tiemeier H, Uitterlinden AG, Vingerling JR, Witteman JC; The Rotterdam Study: objectives and design update; Eur J Epidemiol. (2007) 22(11):819-29

Richards JB, Rivadeneira F, Inouye M, Pastinen TM, Soranzo N, Wilson SG, Andrew T, Falchi M, Gwilliam R, Ahmadi KR, Valdes AM, Arp P, Whittaker P, Verlaan DJ, Jhamai M, Kumanduri V, Moorhouse M, van Meurs JB, Hofman A, Pols HA, Hart D, Zhai G, Kato BS, Mullin BH, Zhang F, Deloukas P, Uitterlinden AG, Spector TD; Bone mineral density, osteoporosis, and osteoporotic fractures: a genome-wide association study; Lancet (2008) 371(9623):1505-12

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC; PLINK: a toolset for whole-genome association and population-based linkage analysis; American Journal of Human Genetics (2007) 81(3):559-75

Abecasis GR, Cherny SS, Cookson WO, Cardon LR; Merlin - rapid analysis of dense genetic maps using sparse gene flow trees; Nat Genet (2002) 30(1):97-101

Styrkarsdottir U, Halldorsson BV, Gretarsdottir S, Gudbjartsson DF, Walters GB, Ingvarsson T, Jonsdottir T, Saemundsdottir J, Center JR, Nguyen TV, Bagger Y, Gulcher JR, Eisman JA, Christiansen C, Sigurdsson G, Kong A, Thorsteinsdottir U, Stefansson K; Multiple Genetic Loci for Bone Mineral Density and Fractures; N Engl J Med. 2008 Apr 29; Epub.