Title:
PROSTATE CANCER SUSCEPTIBILITY SCREENING
Document Type and Number:
WIPO Patent Application WO/2009/056862
Kind Code:
A3
Abstract:
A panel of prostate cancer susceptibility markers allows screening to identify individuals at high risk of developing prostate cancer. Such individuals can be offered closer clinical follow-up and management to ensure timely diagnosis and intervention. A number of the single nucleotide polymorphisms (SNPs) have been found to be linked to sites/genes that are expected to play functional/causative roles in prostate cancer. The most highly associated SNP (rs10993994) is 2 bp upstream of the transcriptional start site of the MSMB gene. The MSMB gene encodes PSP94, an immunoglobulin binding factor that is produced by prostate epithelial cells and secreted into the seminal fluid. It has been reported that loss of PSP94 is associated with tumour recurrence in patients following prostatectomy. The second most significant association is for the SNP rs2735839, which is located between the kallikrein 2 and 3 genes. These genes have also been linked to prostate cancer and proposed as possible screening markers. Further SNPs are significantly associated with prostate cancer susceptibility in individuals.

Inventors:
EELES ROSALIND (GB)
EASTON DOUGLAS (GB)
NEAL DAVID AUSTIN (US)
GILES GRAHAM (AU)
Application Number:
PCT/GB2008/003711
Publication Date:
July 23, 2009
Filing Date:
October 31, 2008
Assignee:
CANCER REC TECH LTD (GB)
EELES ROSALIND (GB)
EASTON DOUGLAS (GB)
NEAL DAVID AUSTIN (US)
GILES GRAHAM (AU)
International Classes:
C12Q1/68
Domestic Patent References:
WO2003009814A22003-02-06
WO2005007830A22005-01-27
Other References:
AMUNDADOTTIR L T ET AL: "A common variant associated with prostate cancer in European and African populations", NATURE GENETICS, NATURE PUBLISHING GROUP, NEW YORK, US, vol. 38, no. 6, 7 May 2006 (2006-05-07), pages 652 - 658, XP002396336, ISSN: 1061-4036
SCHAID D J: "The complex genetic epidemiology of prostate cancer", HUMAN MOLECULAR GENETICS, OXFORD UNIVERSITY PRESS, SURREY, vol. 13, no. REVIEW NR 1, 28 January 2004 (2004-01-28), pages R103 - R121, XP002396337, ISSN: 0964-6906
NAM ET AL: "A Novel Serum Marker, Total Prostate Secretory Protein of 94 Amino Acids, Improves Prostate Cancer Detection and Helps Identify High Grade Cancers at Diagnosis", JOURNAL OF UROLOGY, BALTIMORE, MD, US, vol. 175, no. 4, 1 April 2006 (2006-04-01), pages 1291 - 1297, XP005363045, ISSN: 0022-5347
BUCKLAND PAUL R ET AL: "Strong bias in the location of functional promoter polymorphisms", HUMAN MUTATION, vol. 26, no. 3, September 2005 (2005-09-01), pages 214 - 223, XP002513954, ISSN: 1059-7794
BEKE L ET AL: "The gene encoding the prostatic tumor suppressor PSP94 is a target for repression by the Polycomb group protein EZH2", ONCOGENE, vol. 26, no. 31, July 2007 (2007-07-01), pages 4590 - 4595, XP002513955, ISSN: 0950-9232
EELES ROSALIND A ET AL: "Multiple newly identified loci associated with prostate cancer susceptibility", NATURE GENETICS, vol. 40, no. 3, March 2008 (2008-03-01), pages 316 - 321, XP002513956, ISSN: 1061-4036
THOMAS GILLES ET AL: "Multiple loci identified in a genome-wide association study of prostate cancer", NATURE GENETICS, vol. 40, no. 3, March 2008 (2008-03-01), pages 310 - 315, XP002513957, ISSN: 1061-4036
Attorney, Agent or Firm:
WILLIAMS, Richard (40-43 Chancery Lane, London WC2A 1JA, GB)
Claims:

1. A method of determining susceptibility of an individual to prostate cancer comprising obtaining a sample of genetic material from the individual and determining the presence of at least one minor allele selected from one or more of the following major/minor alleles:

(i) C/T on chromosome 10 between human genome Build 36 positions 51127568 to 51233999;

(ii) C/T on chromosome 3 between human genome Build 36 positions 87171855 to 87494644;

(iii) A/G on chromosome 10 between human genome Build 36 positions 51127568 to 51233999;

(iv) G/A on chromosome 19 between human genome Build 36 positions 56030318 to 56067528;

(v) T/C on chromosome 7 between human genome Build 36 positions 97471108 to 97957786;

(vi) C/T on chromosome 6 between human genome Build 36 positions 160599390 to 160896138;

(vii) G/T on chromosome 11 between human genome Build 36 positions 68572099 to 68797456;

(viii) G/A on chromosome 12 between human genome Build 36 positions 51542070 to 51656362;

(ix) A/G on chromosome 19 between human genome Build 36 positions 56025050 to 56074922;

(x) A/G on chromosome 19 between human genome Build 36 positions 56025050 to 56074922;

(xi) T/C on chromosome X between human genome Build 36 positions 50992264 to 53073243.

2. A method as claimed in claim 1, further comprising determining the presence of one or more major/minor alleles of 8q24:

G/T on chromosome 8 between human genome Build 36 positions 128391369 - 128616342, preferably position 128482487; and/or

C/T on chromosome 8 between human genome Build 36 positions 127970831 - 128196434, preferably position 128162479; and/or

A/C on chromosome 8 between human genome Build 36 positions 128391369 - 128616342, preferably 128587736.

3. A method as claimed in claim 1 or claim 2, further comprising determining the presence of one or more major/minor alleles of 17q12; preferably wherein the allele is associated with TCF2; more preferably wherein the allele is C/T on chromosome 17 between human genome Build 36 positions 33119634 - 33207727; even more preferably at position 33175269.

4. A method as claimed in any preceding claim, further comprising determining the presence of one or more major/minor alleles of 17q24, preferably T/G on chromosome 17 between human genome Build 36 positions 66616213 - 66754527; preferably position 66620348.

5. A method as claimed in any preceding claim, wherein the one or more alleles are selected from:

a. (i), (ii), (iii), (iv), (v), (vi) and (xi); b. (i), (iii) and (xi); c. (iv), (v) and (vi); d. (i), (iv), (vii) and (xi); or e. (ii).

6. A method as claimed in any preceding claim, wherein the alleles of (iii) and/or (i) are associated with the microseminoprotein beta gene (MSMB), preferably the alleles lie within MSMB.

7. A method as claimed in any preceding claim, wherein the allele of (iii) is A/G on chromosome 10 as shown in SEQ ID NO: 4, preferably at human genome Build 36 position 51202627.

8. A method as claimed in any preceding claim, wherein the allele of (i) lies upstream of the transcription start site of MSMB, preferably 2 base pairs upstream thereof; and/or the allele of (i) is associated with removal of multiple predicted binding sites for transcription and splicing factors; and/or the allele of (i) is associated with androgen and/or estrogen receptor binding sites, preferably such sites are less than 50bp upstream of the allele (i).

9. A method as claimed in any preceding claim, wherein the allele of (i) is C/T on chromosome 10 as shown in SEQ ID NO: 5, preferably at human genome Build 36 position 51219502.

10. A method as claimed in any preceding claim, wherein allele (iv) is associated with Kallikrein-related peptidase 2 (KLK2) and/or KLK3 (PSA), preferably located between KLK2 and KLK3, preferably located 3' of KLK3.

11. A method as claimed in any preceding claim, wherein the allele of (iv) is G/A on chromosome 19 as shown in SEQ ID NO: 10, preferably at human genome Build 36 position 56056435.

12. A method as claimed in any preceding claim, wherein allele (v) is associated with LMTK2 encoding a neuronal kinase, cyclin-dependent kinase 5 (cdk5)/p35-regulated kinase (cprk) and Brain-Enriched Kinase (BREK), preferably wherein allele (v) lies in LMTK2, more preferably the allele lies in an intron of LMTK2, more preferably intron 9 of LMTK2.

13. A method as claimed in any preceding claim, wherein allele (v) is associated with basic helix-loop-helix BHLHB8, preferably wherein the allele is located upstream of BHLHB8.

14. A method as claimed in any preceding claim, wherein the allele of (v) is T/C on chromosome 7 as shown in SEQ ID NO: 3, preferably at human genome Build 36 position 97460978.

15. A method as claimed in any preceding claim, wherein allele (vi) is associated with solute carrier family 22 (organic cation transporter (OCT)) genes and/or LPAL2 and/or LPA genes, preferably wherein the allele lies in SLC22A2 and/or SLC22A3, more preferably wherein the allele lies in an intron of SLC22A3, even more preferably intron 5 of SLC22A3.

16. A method as claimed in any preceding claim, wherein the allele of (vi) is C/T on chromosome 6 as shown in SEQ ID NO: 2, preferably at human genome Build 36 position 160804075.

17. A method as claimed in any preceding claim, wherein allele (xi) is associated with one or more of: a. nudix (nucleoside diphosphate linked moiety X)-type motif 10 (NUDT10); b. nudix (nucleoside diphosphate linked moiety X)-type motif 11 (NUDT11); c. GTP binding protein (GSPT2); d. MAGED 1; e. MAGED 4; f. MAGED 4B; g. CTD-2267G17.3; h. XAGE 2; i. XAGE 1C; j. XAGE 1D; k. XAGE 3; l. XAGE 5; m. SSX 2; n. SSX 2B; o. SSX 7; p. SSX 8; q. SPANXN5; r. TMEM 29; s. TMEM 29B; preferably wherein the allele is located between NUDT10 and NUDT11, more preferably about 2 kb upstream of NUDT11.

18. A method as claimed in any preceding claim, wherein the allele of (xi) is T/C on chromosome X as shown in SEQ ID NO: 11, preferably at human genome Build 36 position 51074708.

19. A method as claimed in any preceding claim, wherein allele (ii) is associated with a gene poor region, preferably with chromatin modifying protein 2B (CHMP2B) and/or pituitary-specific transcription factor (POU1F1 (PIT1)), preferably upstream of CHMP2B, more preferably 170 kb upstream of CHMP2B.

20. A method as claimed in any preceding claim, wherein the allele of (ii) is C/T on chromosome 3 as shown in SEQ ID NO: 1, preferably at human genome Build 36 position 87193364.

21. A method as claimed in any preceding claim, wherein allele (vii) lies in an LD block of 70kb on chromosome 11.

22. A method as claimed in any preceding claim, wherein the allele of (vii) is G/T on chromosome 11 as shown in SEQ ID NO: 6, preferably at human genome Build 36 position 68751073.

23. A method as claimed in any preceding claim, wherein the allele of (viii) is G/A on chromosome 12 as shown in SEQ ID NO: 7, preferably at human genome Build 36 position 51560171.

24. A method as claimed in any preceding claim, wherein the allele of (ix) is A/G on chromosome 19 as shown in SEQ ID NO: 8, preferably at human genome Build 36 position 56027755.

25. A method as claimed in any preceding claim, wherein the allele of (x) is A/G on chromosome 19 as shown in SEQ ID NO: 9, preferably at human genome Build 36 position 56040902.

26. A method as claimed in any preceding claim, further comprising obtaining a fluid sample from the individual and determining the amount of prostate cancer marker substance therein.

27. A method as claimed in claim 26, wherein the marker substance is specific for the presence of prostate cancer in an individual, e.g. prostate specific antigen (PSA).

28. A method as claimed in claim 26 or claim 27, wherein the fluid is blood, plasma or serum.

29. A method as claimed in any preceding claim, wherein the allele is (iv) and/or (xi) and optionally the individual is less than 55 years old.

30. A device for detection of risk of prostate cancer in an individual, comprising at least one surface or volume for receiving a sample, and a nucleic acid molecule hybridisable to at least one target nucleic acid in the sample, the target nucleic acid comprising one of the following major/minor alleles:

(i) C/T on chromosome 10 between human genome Build 36 positions 51127568 to 51233999;

(ii) C/T on chromosome 3 between human genome Build 36 positions 87171855 to 87494644;

(iii) A/G on chromosome 10 between human genome Build 36 positions 51127568 to 51233999;

(iv) G/A on chromosome 19 between human genome Build 36 positions 56030318 to 56067528;

(v) T/C on chromosome 7 between human genome Build 36 positions 97471108 to 97957786;

(vi) C/T on chromosome 6 between human genome Build 36 positions 160599390 to 160896138;

(vii) G/T on chromosome 11 between human genome Build 36 positions 68572099 to 68797456;

(viii) G/A on chromosome 12 between human genome Build 36 positions 51542070 to 51656362;

(ix) A/G on chromosome 19 between human genome Build 36 positions 56025050 to 56074922;

(x) A/G on chromosome 19 between human genome Build 36 positions 56025050 to 56074922;

(xi) T/C on chromosome X between human genome Build 36 positions 50992264 to 53073243.

31. A device as claimed in claim 30, wherein the nucleic acid molecule is hybridisable to the target under stringent conditions.

32. A device as claimed in claim 30 or claim 31, wherein a tag is present, such that on hybridization of the nucleic acid molecule to a target nucleic acid molecule the tag is retained in association with the target nucleic acid.

33. A device as claimed in any of claims 30 to 32, wherein the tag is selected from a fluorescent substance, a luminescent substance, a protein, an antibody or fragment thereof, a radiolabel, a metal particle, a nanomaterial or nanoparticle.

34. A device as claimed in any of claims 30 to 33, wherein the nucleic acid molecule is linked to a surface, optionally wherein the nucleic acid molecule forms part of an array of a multiplicity of species of nucleic acid molecule.

35. A device as claimed in any of claims 30 to 34, wherein there are fewer than 100,000 species of nucleic acid molecule; preferably fewer than 50,000, more preferably fewer than 10,000, even more preferably fewer than 1000.

36. A device as claimed in any of claims 30 to 35, comprising indicia or label identifying the location of the nucleic acid molecule on a surface or within an array.

37. A device as claimed in any of claims 30 to 36, further comprising instructions directing a user to obtain a sample of genetic material and expose this to the device for the purpose of determining prostate cancer risk in the individual from whom the sample was taken.

38. A device as claimed in any of claims 30 to 37, further comprising a detector for the tag and wherein the detector generates a signal and thereby data corresponding to the presence of tag.

39. A device as claimed in any of claims 30 to 38, further comprising (a) a light source, plasmon resonance detection surface and a plasmon resonance detector; or (b) a mass detector arranged for detection of change in mass on the sample receiving surface or in the sample receiving volume.

40. A device as claimed in claim 38 or claim 39, further comprising a microprocessor programmed to receive tag data and perform operations on the data and to generate a data output, preferably wherein the operations include comparison of the tag data to reference data.

41. A system for determining the risk of prostate cancer in an individual from a sample of genetic material obtained from the individual, comprising a device as claimed in any of claims 32 to 37 and an apparatus for receiving the device, wherein the apparatus comprises a detector for the tag and wherein the detector generates a signal and thereby data corresponding to the presence of tag.

42. A method of screening for risk of prostate cancer in an individual comprising obtaining a sample from the individual and determining the presence and/or amount of one or more of the MSMB, LMTK2 and KLK3 gene products.

43. A method of screening for agents active against or preventative of prostate cancer, comprising exposing a test agent to an expression system expressing one or more of MSMB, LMTK2 and KLK3 gene products or fragments thereof, and measuring any change in the expression and/or activity of the gene product(s) or fragments thereof.

Description:

PROSTATE CANCER SUSCEPTIBILITY SCREENING

Field of the Invention

The invention relates to oncology and methods of screening for prostate cancer susceptibility. The field of the invention therefore concerns markers of predictive or clinical value in prostate cancer diagnosis. The invention also includes apparatus and systems used in the methods of screening.

Background to the Invention

Prostate cancer (PrCa) is now the commonest cancer in men in developed countries. There is strong evidence from family studies of genetic predisposition to PrCa: the disease is approximately twice as common in first degree relatives of affected men, and this familial relative risk is higher for cases diagnosed at younger ages [1].

Despite this, few susceptibility genes for PrCa have been identified. Genetic linkage studies based on large series of multiple case families have failed to identify consistently reproducible susceptibility loci. Predisposition may be mediated through multiple common low penetrance alleles. Recent association studies have identified common alleles on 8q24 and on 17q to be associated with PrCa risk [2-7]. In addition, rare mutations in BRCA2 (and perhaps BRCA1) are associated with an increased risk of the disease, particularly at young ages [8-10]. Taken together, however, these loci explain less than 10% of the familial relative risk of PrCa.

The kallikreins are a subgroup of serine proteases. KLK2 and KLK3 are 2 of 15 kallikrein subfamily members located in a cluster on chromosome 19. Prostate specific antigen (PSA) is a serine protease which liquefies semen and, as a serum marker, is used in screening and disease monitoring; there is also evidence that hK2 may be useful for screening and prognosis [16, 17]. There is evidence that many kallikreins are implicated in carcinogenesis [18]. Multiple SNPs in the promoter region have been associated with PSA levels [19] and some have been suggested to be associated with PrCa risk [20].

Genome-wide association studies have been used to identify common disease alleles without prior knowledge of position or function [11, 12]. These studies compare genotype frequencies between cases and controls at large numbers of single nucleotide polymorphisms (SNPs), chosen so that they can report on most known common variants in the genome.

Summary of the Invention

The inventors have discovered that certain SNPs are associated with increased risk of prostate cancer.

The invention provides a method of determining susceptibility of an individual to prostate cancer comprising obtaining a sample of genetic material from the individual and determining the presence of at least one minor allele selected from one or more of the following major/minor alleles:

(i) C/T on chromosome 10 between human genome Build 36 positions 51127568 to 51233999;

(ii) C/T on chromosome 3 between human genome Build 36 positions 87171855 to 87494644;

(iii) A/G on chromosome 10 between human genome Build 36 positions 51127568 to 51233999;

(iv) G/A on chromosome 19 between human genome Build 36 positions 56030318 to 56067528;

(v) T/C on chromosome 7 between human genome Build 36 positions 97471108 to 97957786;

(vi) C/T on chromosome 6 between human genome Build 36 positions 160599390 to 160896138;

(vii) G/T on chromosome 11 between human genome Build 36 positions 68572099 to 68797456;

(viii) G/A on chromosome 12 between human genome Build 36 positions 51542070 to 51656362;

(ix) A/G on chromosome 19 between human genome Build 36 positions 56025050 to 56074922;

(x) A/G on chromosome 19 between human genome Build 36 positions 56025050 to 56074922;

(xi) T/C on chromosome X between human genome Build 36 positions 50992264 to 53073243.

The method of the invention may also be used in conjunction with one or more known methods of screening for prostate cancer susceptibility.

In one embodiment the method may further comprise determining the presence of one or more major/minor alleles of 8q24:

a. G/T on chromosome 8 between human genome Build 36 positions 128391369 - 128616342, preferably position 128482487; and/or

b. C/T on chromosome 8 between human genome Build 36 positions 127970831 - 128196434, preferably position 128162479; and/or

c. A/C on chromosome 8 between human genome Build 36 positions 128391369 - 128616342, preferably 128587736.

In a further embodiment the method may further comprise determining the presence of one or more major/minor alleles of 17q12; preferably wherein the allele is associated with TCF2; more preferably wherein the allele is C/T on chromosome 17 between human genome Build 36 positions 33119634 - 33207727; even more preferably at position 33175269.

In a further embodiment the method may further comprise determining the presence of one or more major/minor alleles of 17q24, preferably T/G on chromosome 17 between human genome Build 36 positions 66616213 - 66754527; preferably position 66620348.

In accordance with the invention any combination of one or more alleles may be selected. For example, a. (i), (ii), (iii), (iv), (v), (vi) and (xi); b. (i), (iii) and (xi); c. (iv), (v) and (vi); d. (i), (iv), (vii) and (xi); or e. (ii).

Alleles of (iii) and/or (i) may be associated with the microseminoprotein beta gene (MSMB); preferably the alleles lie within MSMB.

Allele (iii) which is A/G on chromosome 10 may be as shown in SEQ ID NO: 4, preferably at human genome Build 36 position 51202627.

Preferably allele (i) lies upstream of the transcription start site of MSMB, preferably 2 base pairs upstream thereof; and/or the allele of (i) may be associated with removal of multiple predicted binding sites for transcription and splicing factors; and/or the allele of (i) may be associated with androgen and/or estrogen receptor binding sites, preferably such sites are less than 50 bp upstream of the allele (i).

In a preferred embodiment the allele of (i) is C/T on chromosome 10 as shown in SEQ ID NO: 5; preferably at human genome Build 36 position 51219502.

Allele (iv) may be associated with Kallikrein-related peptidase 2 (KLK2) and/or KLK3 (PSA), preferably the allele is located between KLK2 and KLK3, preferably located 3' of KLK3.

In a preferred embodiment the allele of (iv) is G/A on chromosome 19 as shown in SEQ ID NO: 10, preferably at human genome Build 36 position 56056435.

Allele (v) may be associated with LMTK2 encoding a neuronal kinase, cyclin-dependent kinase 5 (cdk5)/p35-regulated kinase (cprk) and Brain-Enriched Kinase (BREK), preferably wherein allele (v) lies in LMTK2, more preferably the allele lies in an intron of LMTK2, more preferably intron 9 of LMTK2. Also, allele (v) may be associated with basic helix-loop-helix BHLHB8, preferably wherein the allele is located upstream of BHLHB8.

In a preferred embodiment the allele of (v) is T/C on chromosome 7 as shown in SEQ ID NO: 3, preferably at human genome Build 36 position 97460978.

Allele (vi) may be associated with solute carrier family 22 (organic cation transporter (OCT)) genes and/or LPAL2 and/or LPA genes, preferably wherein the allele lies in SLC22A2 and/or SLC22A3, more preferably wherein the allele lies in an intron of SLC22A3, even more preferably intron 5 of SLC22A3.

In a preferred embodiment, the allele of (vi) is C/T on chromosome 6 as shown in SEQ ID NO: 2, preferably at human genome Build 36 position 160804075.

Allele (xi) may be associated with one or more of: a. nudix (nucleoside diphosphate linked moiety X)-type motif 10 (NUDT10); b. nudix (nucleoside diphosphate linked moiety X)-type motif 11 (NUDT11); c. GTP binding protein (GSPT2); d. MAGED 1; e. MAGED 4; f. MAGED 4B; g. CTD-2267G17.3; h. XAGE 2; i. XAGE 1C; j. XAGE 1D; k. XAGE 3; l. XAGE 5; m. SSX 2; n. SSX 2B; o. SSX 7; p. SSX 8; q. SPANXN5; r. TMEM 29; s. TMEM 29B; preferably the allele is located between NUDT10 and NUDT11, more preferably about 2 kb upstream of NUDT11.

In a preferred embodiment the allele (xi) is T/C on chromosome X as shown in SEQ ID NO: 11, preferably at human genome Build 36 position 51074708.

Allele (ii) may be associated with a gene poor region, preferably with chromatin modifying protein 2B (CHMP2B) and/or pituitary-specific transcription factor (POU1F1 (PIT1)), preferably upstream of CHMP2B, more preferably 170 kb upstream of CHMP2B. Furthermore, the allele of (ii) is C/T on chromosome 3 as shown in SEQ ID NO: 1, preferably at human genome Build 36 position 87193364.

Allele (vii) may lie in an LD block of 70 kb on chromosome 11.

In a preferred embodiment the allele of (vii) is G/T on chromosome 11 as shown in SEQ ID NO: 6, preferably at human genome Build 36 position 68751073.

Allele (viii) which is G/A on chromosome 12 may be as shown in SEQ ID NO: 7, preferably at human genome Build 36 position 51560171.

Allele (ix) which is A/G on chromosome 19 may be as shown in SEQ ID NO: 8, preferably at human genome Build 36 position 56027755.

Allele (x) which is A/G on chromosome 19 may be as shown in SEQ ID NO: 9, preferably at human genome Build 36 position 56040902.
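By way of illustration only, the preferred Build 36 positions and major/minor alleles recited above can be gathered into a lookup table and a genotype checked for the presence of minor alleles. The following Python sketch is not part of the specification: the names `RISK_LOCI` and `minor_alleles_present`, and the genotype input format (a mapping from locus label to the observed bases), are assumptions made for the example.

```python
# Preferred Build 36 positions and major/minor alleles as recited in the text.
# Layout: locus label -> (chromosome, position, major allele, minor allele)
RISK_LOCI = {
    "i":    ("10", 51219502, "C", "T"),
    "ii":   ("3", 87193364, "C", "T"),
    "iii":  ("10", 51202627, "A", "G"),
    "iv":   ("19", 56056435, "G", "A"),
    "v":    ("7", 97460978, "T", "C"),
    "vi":   ("6", 160804075, "C", "T"),
    "vii":  ("11", 68751073, "G", "T"),
    "viii": ("12", 51560171, "G", "A"),
    "ix":   ("19", 56027755, "A", "G"),
    "x":    ("19", 56040902, "A", "G"),
    "xi":   ("X", 51074708, "T", "C"),
}


def minor_alleles_present(genotype):
    """Return {locus label: copies of the minor allele} for loci with >= 1 copy.

    `genotype` is a hypothetical input format: a dict mapping a locus label
    to a string of the observed bases, e.g. {"i": "CT"} for a heterozygote
    or {"xi": "C"} for a single X-chromosome call in a male.
    """
    found = {}
    for locus, bases in genotype.items():
        _chrom, _pos, _major, minor = RISK_LOCI[locus]
        copies = bases.upper().count(minor)
        if copies:
            found[locus] = copies
    return found
```

A call such as `minor_alleles_present({"i": "CT", "iv": "GG"})` reports only locus (i), since the (iv) genotype carries no copy of the minor allele A.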

The method may further comprise obtaining a fluid sample from the individual and determining the amount of prostate cancer marker substance therein. The marker substance may be specific for the presence of prostate cancer in an individual, e.g. prostate specific antigen (PSA).

In alternative embodiments samples may be obtained from patients by other methods well known in the art, including but not limited to, samples of blood, serum, urine, ascites and intraperitoneal fluids.

Blood samples may be taken via venepuncture (e.g. by vacuum collection tube or syringe), catheter, cannula, or by finger prick or heel prick as appropriate to the needs of the patient and the amount of blood required. Once a blood sample has been taken it may be treated prior to analysis (e.g. with sodium citrate, EDTA, ethanol or heparin) for the purposes of preservation or in order to maximise the accuracy and/or reliability of the signal obtained by analysis of the sample.

Methods of processing (e.g. centrifugation and/or filtration) may be used to separate a blood sample into fractions each of which may be tested independently. For example, a blood serum sample is produced by allowing a whole-blood sample to clot on contact with air where the clotted fraction is removed by centrifugation to leave the serum as the supernatant.

Urine samples are preferably collected by urination or catheterisation.

The cells and/or liquid collected in a sample taken from a patient may be processed immediately or preserved in a suitable storage medium for later processing. For example, in the case of a blood sample the cells are often preserved in an EDTA containing storage medium for later processing and analysis. The sample may be treated for the purposes of preservation or for maximising the accuracy and/or reliability of the signal obtained by analysis of the sample. Methods of processing (e.g. centrifugation and/or filtration) may be used to separate a sample into fractions each of which may be tested independently.

The method may be used to test individuals of any age and for any combination of alleles. However, the method may include testing for the allele (iv) and/or (xi), optionally in instances when the individual may be less than 55 years old; optionally less than 50, 45, 40, 35, 25, 20 or 15 years old.

In preferred embodiments the individual is a male. Females can also be screened for presence or absence of the risk alleles that may be passed down through the germ line.

In further preferred embodiments the quantity of each allele in a sample may be measured using a quantitative polymerase chain reaction (qPCR) method. Preferably the amount of each allele is compared to an amount of another marker, including those alleles being a part of the invention, within the sample and/or with reference to samples containing a known amount of the marker.
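The specification does not prescribe how the comparison with samples of known amount is carried out. One widely used approach is a standard curve, in which threshold cycle (Ct) values from reference samples are fitted against the logarithm of their known quantities. The Python sketch below illustrates that approach under stated assumptions: the function names and the numerical values in the usage note are hypothetical, and the linear fit is ordinary least squares.

```python
import math


def fit_standard_curve(known_quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in known_quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the quantity for a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


def relative_amount(allele_quantity, reference_quantity):
    """Amount of an allele expressed relative to another marker in the sample."""
    return allele_quantity / reference_quantity
```

For instance, fitting hypothetical standards of 10, 100 and 1000 copies with Ct values of 30.0, 26.7 and 23.4 gives a curve from which a sample Ct can be converted to an estimated quantity, and two such estimates can then be passed to `relative_amount`.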

The invention also provides a device for detection of risk of prostate cancer in an individual, comprising at least one surface or volume for receiving a sample, and a nucleic acid molecule hybridisable to at least one target nucleic acid in the sample, the target nucleic acid comprising one of the following major/minor alleles:

(i) C/T on chromosome 10 between human genome Build 36 positions 51127568 to 51233999;

(ii) C/T on chromosome 3 between human genome Build 36 positions 87171855 to 87494644;

(iii) A/G on chromosome 10 between human genome Build 36 positions 51127568 to 51233999;

(iv) G/A on chromosome 19 between human genome Build 36 positions 56030318 to 56067528;

(v) T/C on chromosome 7 between human genome Build 36 positions 97471108 to 97957786;

(vi) C/T on chromosome 6 between human genome Build 36 positions 160599390 to 160896138;

(vii) G/T on chromosome 11 between human genome Build 36 positions 68572099 to 68797456;

(viii) G/A on chromosome 12 between human genome Build 36 positions 51542070 to 51656362;

(ix) A/G on chromosome 19 between human genome Build 36 positions 56025050 to 56074922;

(x) A/G on chromosome 19 between human genome Build 36 positions 56025050 to 56074922;

(xi) T/C on chromosome X between human genome Build 36 positions 50992264 to 53073243.

In preferred embodiments the nucleic acid molecule may be hybridisable to the target under stringent conditions.

The term 'stringent conditions' is a term well known in the art and refers to conditions wherein only DNA molecules with a defined degree of sequence similarity can hybridise efficiently to complementary DNA. The 'stringent conditions' are normally user defined by choices in, for example, temperature, divalent cation concentration (usually Mg2+ or Mn2+) and pH. In this way, by varying the conditions, the user can limit efficient hybridisation to only that subset of DNA molecules that hybridise most accurately.

A DNA microarray (also known as gene or genome chip, DNA chip, or gene array) is a collection of microscopic DNA spots, commonly representing single genes, arrayed on a solid surface by covalent attachment to chemically suitable matrices. Qualitative or quantitative measurements with DNA microarrays utilize the selective nature of DNA-DNA or DNA-RNA hybridization under high-stringency conditions.

Fluorophore-based detection may be used to determine the degree of hybridisation from which a quantitative measurement may be calculated.

In further preferred embodiments of the invention, a tag may be present, such that on hybridization of the nucleic acid molecule to a target nucleic acid molecule the tag may be retained in association with the target nucleic acid.

The tag may be selected from one or more of the following: fluorescent substance, a luminescent substance, a protein, an antibody or fragment thereof, a radiolabel, a metal particle, a nanomaterial, a nanoparticle.

Fluorescent substances may be created by including a fluorophore (e.g. FITC, rhodamine, Texas Red), either attached to or within the substance, that emits a detectable signal when excited by a suitable source of energy. Normally this is light of a specific wavelength, often from a laser.

Luminescent substances produce light. Normally this is achieved by conjugating an enzyme that catalyses a light producing chemical reaction in the presence of a suitable substrate. Such enzymes include luciferase and alkaline phosphatase.

Particles of colloidal gold or tungsten may be attached to molecules and used as labels for these molecules. The metal allows the detection of the molecule attached by detecting the mass, electron density or other property of the metal.

A nanoparticle (or nanocluster or nanocrystal) is a microscopic particle with at least one dimension less than 100 nm. A wide variety of nanoparticles have been created and are known to the art, e.g. nanospheres, nanorods, and nanocups, which collectively go by the name of nanomaterials. Furthermore metal, dielectric, and semiconductor nanoparticles have been formed, as well as hybrid structures (e.g., core-shell nanoparticles). Nanoparticles made of semiconducting material may also be called quantum dots if they are small enough (typically sub 10 nm) that quantization of electronic energy levels occurs. Such nanoscale particles are known in the art and used in biomedical applications e.g. as labels for specific molecules or drug carriers.

Radioisotopic labelling is a technique that uses radioactive isotopes for tracking the passage of a sample of substance through a system. The substance is "labelled" by including radioactive isotopes in its chemical composition. When these 'radiolabels' decay, their presence can be determined by detecting the radiation they emit. Examples of this include the incorporation of 32P or 33P into DNA probes or labelling of proteins with 125I and/or 35S.

In a device of the invention the nucleic acid molecule may be linked to a surface, optionally wherein the nucleic acid molecule forms part of an array of a multiplicity of species of nucleic acid molecule.

The number and density of this multiplicity of species of nucleic acid molecule may be varied over a wide range and in alternative embodiments of the invention there are fewer than 100,000 species of nucleic acid molecule; preferably fewer than 50,000, more preferably fewer than 10,000, even more preferably fewer than 1000, 500, 400, 300, 200, 100, 75, 50, 25.

Another advantage of the invention may be the inclusion of indicia or label identifying the location of the nucleic acid molecule on a surface or within an array.

The multiplicity of DNA species used on the surface of or within an array has led to problems of organisation in cross-referencing the identity of particular DNA species with their location. Including separate indicia within the substance to which the particular DNA species is bound (e.g. etching a bar code at the location of, or on the element bearing, a particular DNA species) provides another method of cross-referencing in order to keep account of the location and identity of the DNA species that comprise an array.

An embodiment of the invention may also comprise instructions directing a user to obtain a sample of genetic material and expose this to the device for the purpose of determining prostate cancer risk in the individual from whom the sample was taken.

A device as described herein may comprise a detector for the tag, wherein the detector generates a signal and thereby data corresponding to the presence of tag. These embodiments may further comprise:

a. a light source, plasmon resonance detection surface and a plasmon resonance detector; or

b. a mass detector arranged for detection of change in mass on the sample receiving surface or in the sample receiving volume.

Advantageously, any of the devices and/or methods as described herein may further comprise a microprocessor programmed to receive tag data, perform operations on the data and generate a data output, preferably wherein the operations include comparison of the tag data to reference data.

The invention also provides a system for determining the risk of prostate cancer in an individual from a sample of genetic material obtained from the individual, comprising any of the devices described herein and an apparatus for receiving the device, wherein the apparatus comprises a detector for the tag and wherein the detector generates a signal and thereby data corresponding to the presence of tag.

The invention further provides a method of screening for risk of prostate cancer in an individual comprising obtaining a sample from the individual and determining the presence and/or amount of one or more of the MSMB, LMTK2 and KLK3 gene products.

The invention also provides a method of screening for agents active against or preventative of prostate cancer, comprising exposing a test agent to an expression system expressing one or more of MSMB, LMTK2 and KLK3 gene products or fragments thereof, and measuring any change in the expression and/or activity of the gene product(s) or fragments thereof.

The gene products screened in the conduct of the invention may include mRNA, proteins and fragments thereof.

In a preferred embodiment the mRNA level may be measured by a quantitative polymerase chain reaction (qPCR) method, preferably a qPCR method where the template is the product of a reverse transcriptase reaction (RT-qPCR).

In other preferred embodiments mRNA may be extracted from the sample and reverse transcribed to produce cDNA prior to qPCR.

The presence and/or amount of one or more of the MSMB, LMTK2 and KLK3 gene products may be determined using a nuclease protection assay; preferably the probe used is specific for MSMB and/or LMTK2 and/or KLK3.

In other preferred embodiments the mRNA level may be measured using a DNA microarray.

Gene products may also be detected by means of an antibody specific for the gene product or a fragment thereof; preferably the antibody is monoclonal.

In another embodiment, the antibody may be a Fab fragment wherein said Fab fragment may be selected from the group consisting of: scFv, F(ab')2, Fab, Fv and Fd fragments; or CDR3 regions.

The fragment antigen binding (Fab fragment) is a region on an antibody which binds to antigens. It is composed of one constant and one variable domain of each of the heavy and the light chain. These domains shape the paratope — the antigen binding site — at the amino terminal end of the monomer. The two variable domains bind the epitope on their specific antigens.

Fc and Fab fragments can be generated. The enzyme papain can be used to cleave an immunoglobulin monomer into two Fab fragments and an Fc fragment. The enzyme pepsin cleaves below the hinge region, so a F(ab')2 fragment and a Fc fragment may be formed. The variable regions of the heavy and light chains can be fused together to form a single chain variable fragment (scFv), which is only half the size of the Fab fragment yet retains the original specificity of the parent immunoglobulin.

A complementarity determining region (CDR) is a short amino acid sequence found in the variable domains of antigen receptor (e.g. immunoglobulin and T cell receptor) proteins that complements an antigen and therefore provides the receptor with its specificity for that particular antigen. Most of the sequence variation associated with immunoglobulins and T cell receptors is found in the CDR regions, which are sometimes referred to as hypervariable domains. Among these, CDR3 shows the greatest variability as it is encoded by a recombination of the VJ regions.

Brief Description of the Drawings

Figure 1. Quantile-quantile plot for the test statistics (Cochran-Armitage 1 df chi squared trend tests) for stage 1. The continuous line gives the expected distribution assuming no inflation of the test statistics.

Figure 2. Table showing the characteristics of the study sets used in the final analysis of genotypes.

Figure 3A, 3B and 3C. Table showing the summary results for 15 SNPs selected for genotyping in stage 2.

Figure 4A, 4B, 4C and 4D. Table showing the details of 53 SNPs reaching p<10^-6 in stage 1.

Figure 5. Table showing the PSA levels by genotype.

Figure 6. Table showing the comparisons between groups.

Figure 7. Table showing the age-specific odds ratios in stage 2 for each SNP.

Figure 8A and 8B. Table showing the call rates, Hardy-Weinberg tests and Homogeneity tests.

Figure 9. Genotype counts, by study and SNP.

Figure 10. Table summarising SNP position and LD plots.

Figure 11. Nucleotide sequence associated with SNP rs2660753 (SEQ ID NO:1)

Figure 12. Nucleotide sequence associated with SNP rs9364554 (SEQ ID NO:2)

Figure 13. Nucleotide sequence associated with SNP rs6465657 (SEQ ID NO:3)

Figure 14. Nucleotide sequence associated with SNP rs7920517 (SEQ ID NO:4)

Figure 15. Nucleotide sequence associated with SNP rs10993994 (SEQ ID NO:5)

Figure 16. Nucleotide sequence associated with SNP rs7931342 (SEQ ID NO:6)

Figure 17. Nucleotide sequence associated with SNP rs902774 (SEQ ID NO:7)

Figure 18. Nucleotide sequence associated with SNP rs2659056 (SEQ ID NO:8)

Figure 19. Nucleotide sequence associated with SNP rs266849 (SEQ ID NO:9)

Figure 20. Nucleotide sequence associated with SNP rs2735839 (SEQ ID NO:10)

Figure 21. Nucleotide sequence associated with SNP rs5945619 (SEQ ID NO:11)

Detailed Description of the Invention

The inventors conducted a two-stage genome-wide association study. In the first stage, 1,906 prostate cancer cases and 1,934 controls collected through national studies in the UK were studied; the final number analysed after exclusions (see methods) was 1854 cases and 1894 controls (figure 2). Cases were selected for whom the disease was detected through clinical symptoms, rather than through routine screening by prostate specific antigen (PSA) measurement, because these are of known clinical relevance. The case series was further "enriched" by including men diagnosed aged <60 years or with a family history of PrCa, since these cases are thought to be more likely to carry susceptibility alleles, thereby increasing the statistical power of the study. Controls were men aged >50 years identified through a national case-finding study (ProtecT), who had a baseline PSA of <0.5ng/ml. Men with a low PSA level are known to be at a low risk for the subsequent development of clinically significant PrCa [13], potentially further improving statistical power.

In this first stage, DNA samples from these individuals were evaluated for a set of 569,243 SNPs using the Illumina Infinium platform (see example). This SNP set has been estimated to report on approximately 90% of known SNPs, based on data in HapMap, at an LD coefficient (r^2) >0.80. In this analysis, data were used for 541,129 SNPs that were genotyped on all subjects, passed quality control (QC) criteria, and had a minor allele frequency of at least 1% in controls (see example).

The inventors have discovered seven PrCa susceptibility loci. In addition, the inventors have confirmed three loci on 8q24 and two on 17q as PrCa susceptibility loci.

Based on the effect size seen in stage 2 (that is, ignoring the effect of enrichment of the stage 1 set), the inventors had approximately 52% power to detect the MSMB association, rising to close to 100% power based on the effect size seen in stage 1. Based on the estimated relative risks in stage 2 of this study, the susceptibility loci discovered would together explain approximately 6% of the familial risk of PrCa, with MSMB being the most significant (~2% of the familial risk, comparable to the two strongest 8q loci).
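By way of illustration only, the contribution of a single locus to familial risk can be sketched using the standard expression for the parent-offspring familial relative risk attributable to one biallelic locus. The allele frequency, per allele relative risk and overall familial relative risk used below are hypothetical values chosen for the example, not figures from this study, and expressing the share on a logarithmic scale assumes that loci combine multiplicatively.

```python
import math

def locus_familial_rr(p, r_het, r_hom):
    """Familial relative risk to the offspring of an affected parent that is
    attributable to one biallelic locus.

    p      -- frequency of the risk allele
    r_het  -- relative risk of heterozygotes versus common homozygotes
    r_hom  -- relative risk of rare homozygotes versus common homozygotes
    """
    q = 1.0 - p
    numerator = p * (p * r_hom + q * r_het) ** 2 + q * (p * r_het + q) ** 2
    denominator = (p * p * r_hom + 2.0 * p * q * r_het + q * q) ** 2
    return numerator / denominator

def share_of_familial_risk(locus_lambda, overall_lambda):
    """Proportion of the overall familial relative risk explained, log scale."""
    return math.log(locus_lambda) / math.log(overall_lambda)

# Hypothetical inputs (assumptions for illustration only)
risk_allele_freq = 0.40
per_allele_rr = 1.25            # multiplicative model: r_hom = r_het ** 2
overall_familial_rr = 2.0

lam = locus_familial_rr(risk_allele_freq, per_allele_rr, per_allele_rr ** 2)
share = share_of_familial_risk(lam, overall_familial_rr)
print(f"locus familial relative risk: {lam:.4f}")
print(f"share of familial risk: {share:.1%}")
```

Under a multiplicative model the expression reduces to (p*r^2 + q)/(p*r + q)^2, which is why a common allele with a modest per allele risk accounts for only a small percentage of the familial risk.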

Together with the previously reported loci, approximately 15% of familial risk in PrCa can now be explained. The inventors have confirmed that PrCa is genetically complex. The loci include plausible candidates, including a kinase gene, loci without obvious candidates and one gene desert, suggesting that diverse pathways are likely to be involved. The involvement of MSMB highlights a role for its product in PrCa screening, whilst LMTK2 provides a potential therapeutic target. There is also potential for risk counselling. The relative risks conferred by these loci are modest: the homozygote relative risk for rs10993994 at MSMB was 1.61 fold (95%CI 1.40-1.86). rs2660753 had the highest homozygote OR (2.08), but with a wide confidence interval. The associations found by the inventors using tag SNPs may reflect stronger associations with the causal variants. Furthermore, the combined effect of these SNPs may be substantial, and as other SNPs are identified it may be possible to define genotypes that are sufficiently predictive of risk to be useful clinically.

Embodiments of the invention will now be described by way of example, in which the following materials and methods were employed:

EXAMPLE

Samples

PrCa cases for stage 1 were selected from the UK Genetic Prostate Cancer Study (UKGPCS) [30]. Cases were selected on the basis of either a diagnosis at age <60 years (n=1291) or a first or second degree family history of prostate cancer (n=726). Men who reported being non-white and men who were known to have been diagnosed through asymptomatic PSA screening were excluded. Controls for stage 1 were selected through the ProtecT study. ProtecT is a national study of community-based PSA testing and a randomised trial of subsequent prostate cancer treatment.

Approximately 200,000 men between the ages of 50 and 69 years, ascertained through general practices in nine regions in the UK, are being recruited.

For controls for stage 1, men aged >50 years with a PSA of <0.5ng/ml were selected. Men known to be non-white were excluded. 2,001 controls were selected to be frequency matched to the geographical distribution of the cases.

Stage 2 comprised PrCa cases and controls from the UK and Australia. The UK cases were ascertained through the UKGPCS as above (n=328) and through a systematically collected series from PrCa clinics in the Urology Unit at The Royal Marsden NHS Foundation Trust (n=1665) over a 14 year period. UK controls were identified through two sources. Four hundred and forty nine controls were drawn from the UKGPCS (Prostate Cancer Research Foundation Study component) and were geographically, ethnically and age matched to the UKGPCS young onset cases. They had no family or personal history of PrCa. The remaining controls (n=1684) were selected from men in the ProtecT study who had a PSA of <10ng/ml. Men with PSA >4ng/ml were excluded if they had a positive prostatic biopsy. As for stage 1, men known to be non-white were excluded.

The Australian cases were ascertained from three studies:

(i) a population-based series of PrCa cases identified from the Victorian Cancer Registry since 1999, diagnosed at <56 years (Early Onset Prostate Cancer Study, EOPCFS; n=526);

(ii) a population based case control study based on cases diagnosed in Melbourne and Perth (Risk Factors for Prostate Cancer Study, RFPCS; n=590). Cases were identified from the population cancer registries, with histopathologically confirmed prostate cancer, excluding tumours with Gleason scores of less than 5, diagnosed at <70 years with sampling stratified by age at diagnosis [31-33];

(iii) a prospective cohort study of 17,154 men aged 40 to 69 years at recruitment in 1990-4 (Melbourne Collaborative Cohort Study, MCCS; n=190) [34, 35].

Controls were selected from the RFPCS study, in which they were identified through government electoral rolls and frequency matched to the age distribution of the RFPCS cases (n=509), together with a random sample from the MCCS cohort (n=756). All studies were approved by the appropriate ethics committees.

Genotyping

Stage 1 genotypes were generated using the Illumina Infinium HumanHap550 array. Only samples which called on at least 97% of SNPs at a confidence score of ≥0.25 were used. Owing to a re-synthesis of the bead set between the stages, the marker sets (versions 1 and 3) are slightly different: 534,446 SNPs were common to both sets, 14,356 markers were unique to version 1 and 20,441 markers were unique to version 3. Data on 3840 individuals (1906 cases, 1934 controls) were used: 323 typed on version 1 and 3525 on version 3 (including 8 duplicates typed on both versions). All the SNPs re-evaluated in stage 2 were from the common set of SNPs, and for simplicity the QQ plot and summary results utilise this SNP set.

SNPs were selected for evaluation in stage 2 on the basis of a significance level of p<10^-6 based on a 1df trend test. SNPs from the previously reported regions of association on 8q24 and 17q were excluded. Multiple logistic regression was conducted using the set of SNPs in each of the remaining regions, to define SNPs that showed evidence of independent association at p<.05.

Genotyping in stage 2 was performed by 5' nuclease assay (Taqman™) using the ABI Prism 7900HT sequence detection system according to the manufacturer's instructions. Primers and probes were supplied directly by Applied Biosystems (http://www.appliedbiosystems.com/) as Assays-By-Design™. All assays were carried out in 384-well format. Each plate included at least 2 negative controls and 2 duplicates.
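By way of illustration only, the 1df trend test on which SNPs were selected for stage 2 can be sketched as a Cochran-Armitage test with allele dosage scores 0, 1 and 2. The function name and the genotype counts below are hypothetical and are not taken from the study data; the p-value uses the identity that the upper tail of a 1df chi-squared distribution at x equals erfc(sqrt(x/2)).

```python
import math

def trend_test_1df(case_counts, control_counts):
    """Cochran-Armitage trend test with scores 0, 1, 2.

    Each argument gives genotype counts ordered by the number of copies
    of the tested allele (0 copies, 1 copy, 2 copies).
    Returns (chi-squared statistic, p-value).
    """
    scores = (0, 1, 2)
    n_cases = sum(case_counts)
    n_total = n_cases + sum(control_counts)
    totals = [a + b for a, b in zip(case_counts, control_counts)]

    case_score = sum(s * r for s, r in zip(scores, case_counts))
    all_score = sum(s * n for s, n in zip(scores, totals))
    all_score_sq = sum(s * s * n for s, n in zip(scores, totals))

    numerator = n_total * (n_total * case_score - n_cases * all_score) ** 2
    denominator = (n_cases * (n_total - n_cases)
                   * (n_total * all_score_sq - all_score ** 2))
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical genotype counts (illustration only)
chi2, p_value = trend_test_1df(case_counts=(10, 20, 30),
                               control_counts=(30, 20, 10))
print(chi2, p_value)
```

A SNP would then be carried forward only if the resulting p-value fell below the chosen selection threshold.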

Statistical Methods

To identify close relatives, identity-by-state (IBS) probabilities for all pairs were computed. 27 cryptic duplicate samples (or MZ twins) and 3 pairs of probable brothers (IBS >0.86) were identified. In each case the individual with the lower call rate was excluded. By computing IBS scores between participants and individuals in HapMap and using multi-dimensional scaling, 59 individuals who appeared to have significant Asian or African ancestry (approximately 10% non-European ancestry) were identified. Five cases of apparent Klinefelter's syndrome were removed. After these exclusions, 1854 cases and 1894 controls were used in the final analysis of stage 1.

All SNPs with a call rate <95%, a minor allele frequency in controls of <1%, or whose genotype frequency in controls departed from Hardy-Weinberg equilibrium at p<.00001 were filtered out. After these exclusions, 541,129 SNPs were analyzed, of which 509,295 were in both versions and common to all samples. Duplicate concordance was 98.8%. In stage 2, 123 samples that failed on four or more of the assays used were excluded. The call rates were at least 0.97 for each SNP in each population. Genotype distributions in each control population for each SNP were consistent with Hardy-Weinberg equilibrium.

Associations were assessed between each SNP and disease at stage 1 using a 1df Cochran-Armitage trend test and a general 2df chi-squared test. Inflation in the chi-squared statistic was assessed using the genomic control approach: an inflation factor (λ) was derived by dividing the median of the lowest 90% of the 1df statistics by the 45th percentile of a 1df chi-squared distribution (0.357). After stage 2, stratified 1df and 2df tests were performed, stratifying by stage and country. Odds ratios and confidence limits were estimated from the stage 2 data using unconditional logistic regression, stratified by country. Homogeneity of the odds ratios across strata was assessed using likelihood ratio tests. Geographical variation in allele frequencies within the UK was assessed by classifying individuals into nine regions.

Modification of the odds ratios by age was assessed using a case only analysis, assessing the effect of age on SNP genotype in the cases using polytomous regression. The effects of SNP genotypes on PSA level were assessed using linear regression, after log-transformation of PSA level to correct for skewness. Analyses were performed in R (principally using SNPMatrix [36]) and Stata. Binding site predictions were examined using MatInspector from Genomatix (http://www.genomatix.de/).
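By way of illustration only, the genomic control calculation described in these methods (the median of the lowest 90% of the 1df statistics divided by the 45th percentile of a 1df chi-squared distribution) can be sketched as follows. The function names are hypothetical and the test statistics are simulated under the null hypothesis; they are not study data. The analyses themselves were performed in R and Stata, so this Python sketch is not the inventors' implementation.

```python
import random
import statistics

def null_reference_quantile(fraction=0.90):
    """Quantile of a 1df chi-squared distribution that corresponds to the
    median of the lowest `fraction` of statistics (the 45th percentile
    when fraction is 0.90)."""
    quantile = fraction / 2.0
    # A 1df chi-squared variable is a squared standard normal, so
    # P(Z^2 <= x) = 2 * Phi(sqrt(x)) - 1.
    z = statistics.NormalDist().inv_cdf((1.0 + quantile) / 2.0)
    return z * z

def inflation_factor(trend_stats, fraction=0.90):
    """Genomic control inflation factor from a list of 1df statistics."""
    ordered = sorted(trend_stats)
    kept = ordered[: max(1, int(len(ordered) * fraction))]
    return statistics.median(kept) / null_reference_quantile(fraction)

# Simulated null statistics (hypothetical data for illustration)
rng = random.Random(0)
simulated = [rng.gauss(0.0, 1.0) ** 2 for _ in range(20000)]
lam = inflation_factor(simulated)
print(round(null_reference_quantile(), 3), round(lam, 2))
```

Restricting the calculation to the lowest 90% of the statistics keeps the estimate from being driven by the strongest associations, which are the ones most likely to be genuine.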

Results

Figure 1 shows the Q-Q plot for the distribution of test statistics for comparison of genotype frequencies in cases versus controls (1 degree of freedom (df) Cochran-Armitage trend test). There is little evidence of any general inflation of the test statistics (estimated inflation factor λ=1.05 based on the bottom 90% of the distribution). This was as expected, since cases and controls were of European origin, and were broadly matched for region of residence. This is consistent with previous observations that population structure across the UK causes little inflation of the test statistics in association studies [11].

There was, however, a marked excess of "significant" associations. A total of 197 SNPs were significant at the p<10^-4 level, compared with the 54.1 that would have been expected by chance alone, and of these, 53 were significant at the p<10^-6 level, compared with 0.5 expected by chance (figure 4). Of the 53 SNPs significant at the p<10^-6 level, 20 were on 8q24, a region previously shown to harbour susceptibility loci (figures 4 and 5). These occurred in three distinct LD blocks. The strongest associations in these regions were found with rs6983267 (per allele odds ratio (OR) 0.7, p=9x10^-14), rs1016343 (per allele OR 1.37, p=1.6x10^-8) and rs4242384 (per allele OR 1.88, p=2.8x10^-17). Six SNPs on chromosome 17 reached p<10^-6. Four of these were located at 17q12, the strongest association being with rs7501939 in TCF2 (per allele OR 0.71, p=10^-12). The other two SNPs were located at 17q24, the strongest association being with rs1859962 (per allele OR 1.26, p=5.5x10^-7). These results confirm previous observations by Gudmundsson et al [7].

The remaining 27 SNPs were from eight genomic regions (figure 4). Multiple logistic regression analyses were carried out, based on the SNPs in each of the regions. These identified eleven SNPs that appeared to be independently significant. To confirm these associations, SNPs were evaluated in a second stage that included 3,245 cases and 3,329 controls from studies in the UK and Australia. For 8 of these 11 SNPs, from seven regions, there was confirmatory evidence (at least p<.002 and in the same direction as in stage 1) with a combined significance level over both stages of p=2.4x10 " or better. This provides strong evidence of association at a level of significance appropriate for a genome-wide study (figure 3) [11, 14].

One SNP on chromosome 12 (rs902774) showed a strong association with disease in stage 1 (p=2x10^-7) but this was not replicated in stage 2, suggesting that the initial association was likely to be a type I error. Of the three SNPs typed on chromosome 19, rs2735839, which showed the strongest association in stage 1 (p=2.4x10 ~ ), also gave evidence of association in stage 2 (p=.0003; p=2.3x10^-18 overall), but the other two SNPs did not replicate in stage 2. Both of the SNPs tested on chromosome 10 showed strong evidence of association. However, the association with rs10993994 was far stronger in both stages, and the association with rs7920517 was not significant after adjustment for rs10993994 in stage 2.
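By way of illustration only, the numbers of associations expected by chance that are quoted in these results follow from multiplying the number of SNPs analysed (541,129) by the significance threshold, assuming that no SNP is truly associated. The function name is hypothetical.

```python
def expected_by_chance(n_tests, threshold):
    """Expected number of tests below `threshold` when every null is true."""
    return n_tests * threshold

n_snps = 541_129  # SNPs analysed in stage 1

expected_1e4 = expected_by_chance(n_snps, 1e-4)
expected_1e6 = expected_by_chance(n_snps, 1e-6)

# Observed counts reported in the results: 197 and 53 respectively
print(f"p<10^-4: expected {expected_1e4:.1f}, observed 197")
print(f"p<10^-6: expected {expected_1e6:.2f}, observed 53")
```

The calculation gives an expectation only; it does not depend on the tests being independent, although the variability around that expectation does.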

For each confirmed SNP, the estimated per allele odds ratio is stronger for stage 1 than stage 2. This may reflect the restriction of attention to highly significant loci (the so-called "winner's curse"), or it may reflect, at least in part, the enriched nature of the cases and controls in stage 1. For these reasons only the OR estimates from stage 2 are regarded as valid estimates of the relative risks in the general population.

The associations of the SNPs with PSA level in a sample of 1679 UK controls in stage 2 were investigated. Three of the SNPs, rs10993994, rs7920517 and rs2735839, were strongly associated with PSA level in the same direction as the association with PrCa risk (figure 5). Of the two chromosome 10 SNPs, the association with rs7920517 was not significant after adjustment for rs10993994, consistent with the pattern for PrCa risk. A weaker association with PSA levels was also observed for rs5945619, again in the same direction as the PrCa risk. It is notable that rs10993994 and rs2735839 (but no other SNPs) show a marked difference in allele frequency for controls between stages 1 and 2. These results indicate that the stronger associations seen in stage 1 for the chromosome 10 and 19 loci may partly reflect the selection of low PSA controls. However, the persistence of the associations in stage 2 indicates that the association with disease is not solely due to selection on PSA levels.

There were no marked differences in the allele frequencies between the controls from UK and Australia, or between either of the control groups used in each country, for any SNP (figures 3 and 6). It was also noted that the control frequencies for all SNPs in stage 2 are very close to those for the 1958 Birth Cohort, a UK based cohort study used in other genome scans (http://www.b58cgene.sgul.ac.uk). No evidence of regional variation in genotype frequencies was observed in this work or in the 1958 Birth Cohort. The control frequencies are robust and are not subject to significant regional variation.

In stage 1, two of the SNPs (rs10993994 and rs7931342) showed stronger associations for familial cases (p=.04 and p=.0002 respectively). In stage 2, there was some suggestion of a higher relative risk using cases diagnosed before age 55 years for rs2735839 (figure 7, p=.14) and rs5945619 (p=.11).

The results were compared with the publicly available results from the Cancer Genetic Markers of Susceptibility (CGEMS) study, a genome scan of 1,117 screen detected PrCa cases and 1,105 controls that used the same platform (http://cgems.cancer.gov/). There is evidence of association for each of the SNPs on chromosomes 10 (rs10993994; p=.009), 11 (rs7931342; p=.015), 19 (rs2735839; p=.004) and X (rs5945619; p=.0004), in addition to the TCF2 association (rs7501939; p=.002), but no evidence for the novel associations on chromosomes 3, 6 or 7.

For all but two SNPs, the genotype-specific ORs were consistent with a multiplicative (allele dosage) model. For rs2660753, the rare homozygote OR (2.08) was greater than would be expected under this model (p=.02), whilst for rs9364554, there was no apparent difference in risk between rare homozygotes and heterozygotes (figures 3 and 8).
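By way of illustration only, under a multiplicative (allele dosage) model the odds ratio for rare homozygotes is the square of the per allele odds ratio, so a departure from the model can be seen by comparing an observed homozygote OR with that square. In the sketch below the function names are hypothetical; the only figure taken from the text is the homozygote relative risk of 1.61 reported for rs10993994, and the per allele value derived from it is what the model would imply, not a reported estimate.

```python
import math

def expected_homozygote_or(per_allele_or):
    """Homozygote OR implied by a multiplicative (allele dosage) model."""
    return per_allele_or ** 2

def implied_per_allele_or(homozygote_or):
    """Per allele OR that a multiplicative model would imply for a given
    homozygote OR."""
    return math.sqrt(homozygote_or)

# Homozygote relative risk reported for rs10993994 at MSMB
implied = implied_per_allele_or(1.61)
print(f"per allele OR implied by the model: {implied:.2f}")

# Hypothetical per allele OR, used only to show the comparison
hypothetical_per_allele = 1.20
print(f"expected homozygote OR: {expected_homozygote_or(hypothetical_per_allele):.2f}")
```

A formal assessment compares the fit of the 1df and 2df models rather than the point estimates alone.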

The seven novel susceptibility regions contain several strong candidate genes. rs10993994 and rs7920517 lie within an LD block of ~100kb on chromosome 10, containing the microseminoprotein beta gene, MSMB. The most strongly associated SNP, rs10993994, lies 2bp upstream of the transcription start site of MSMB. This SNP may be causally related to disease risk. It is noted that the risk allele of this SNP removes multiple predicted binding sites for transcription and splicing factors (http://www.genomatics.com). Putative androgen and estrogen receptor binding sites lie less than 50bp upstream of this SNP. MSMB encodes PSP94, a member of the immunoglobulin binding factor family which is synthesized by the epithelial cells of the prostate gland and secreted into seminal plasma. The expression of the encoded protein has been found to be decreased in prostate cancer and loss of expression of PSP94 has been found to be associated with recurrence after radical prostatectomy [15].

rs2735839 lies between KLK2 (Kallikrein-related peptidase 2; hK2) and KLK3 (PSA). rs2735839 lies 3' of KLK3 and shows a much stronger association with PSA levels than those previously reported, suggesting a novel functional effect.

rs6465657 lies in intron 9 of LMTK2 (cprk; BREK; Brain-Enriched Kinase) [21]. It encodes a neuronal kinase, cyclin-dependent kinase 5 (cdk5)/p35-regulated kinase (cprk). Cprk is expressed in a number of tissues but is enriched in brain and muscle. Somatic mutations in LMTK2 have been found in a small proportion of lung and colon cancers [22]. rs6465657 also lies upstream of basic helix-loop-helix BHLHB8 which is a transcription factor expressed at high levels in the adult seminal vesicle and during seminal gland differentiation. Mice that do not express bhlhb8 have disruption of the epithelial cellular architecture [23].

rs9364554 lies in intron 5 of the gene SLC22A3 which is one of the solute carrier family 22 (organic cation transporter; OCT) genes. Polyspecific OCTs in the liver, kidney, intestine, and other organs are critical for elimination of many endogenous small organic cations as well as a wide array of drugs and environmental toxins. This gene is one of three similar cation transporter genes located in a cluster on chromosome 6. Two of these, SLC22A3 and SLC22A2, are both in the LD block containing rs9364554. SLC22A2 is part of the PI3 kinase and cyclic AMP-dependent protein kinase A catalytic subunit pathway [24] and is activated by a calmodulin-dependent signalling pathway [25]. Downstream of these genes in the LD block are genes involved in lipoprotein metabolism: LPAL2 and LPA. Polymorphisms in these genes have been reported to be associated with an increased risk of coronary artery disease [26]. Some studies have shown a protective effect of lipid lowering drugs (statins) on risk of advanced PrCa [27].

rs5945619 is in an LD block of approximately 2Mb on Xp. It lies between NUDT10 and NUDT11 [nudix (nucleoside diphosphate linked moiety X)-type motif 11], about 2kb upstream of the latter. These genes encode isoforms of diphosphoinositol polyphosphate phosphohydrolase which determine the rate of phosphorylation in DNA repair, stress responses and apoptosis [28]. Also in the LD block are GSPT2 (a GTP binding protein); MAGEDs 1, 4B, 4 (melanoma antigen genes expressed in the testes); CTD-2267G17.3; XAGEs 2, 1C, 1D, 5, 3 (encoding a family of cancer/testis-associated antigens); SSXs 8, 7, 2, 2B (a family of synovial sarcoma X breakpoint proteins which may function as transcriptional repressors); SPANXN5 and TMEMs 29B and 29.

rs2660753 is in a gene-poor region on chromosome 3. It is 170kb upstream of CHMP2B (chromatin modifying protein 2B), which encodes a component of the endosomal ESCRTIII complex; mutations in this gene have been described in frontotemporal and neurodegenerative disease [29]. The LD block also contains POU1F1 (PIT1) which is a pituitary-specific transcription factor.

Finally, rs7931342 lies in an LD block of 70kb on chromosome 11 that contains no recognised genes.

References

1. Edwards, S. M. & Eeles, R. A. Unravelling the genetics of prostate cancer. American Journal of Medical Genetics Part C-Seminars in Medical Genetics 129C, 65-73 (2004).

2. Amundadottir, L. T. et al. A common variant associated with prostate cancer in European and African populations. Nature Genetics 38, 652-658 (2006).

3. Gudmundsson, J. et al. Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nature Genetics 39, 631-637 (2007).

4. Yeager, M. et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nature Genetics 39, 645-649 (2007).

5. Freedman, M. L. et al. Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proceedings of the National Academy of Sciences of the United States of America 103, 14068-14073 (2006).

6. Haiman, C. A. et al. Multiple regions within 8q24 independently affect risk for prostate cancer. Nature Genetics 39, 638-644 (2007).

7. Gudmundsson, J. et al. Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nature Genetics 39, 977-983 (2007).

8. Edwards, S. M. et al. Two percent of men with early-onset prostate cancer harbor germline mutations in the BRCA2 gene. American Journal of Human Genetics 72, 1-12 (2003).

9. Easton, D. F., Thompson, D. & The Breast Cancer Linkage Consortium. Cancer risks in BRCA2 mutation carriers. J Natl Cancer Inst 91, 1310-1316 (1999).

10. Thompson, D., Easton, D. F. & The Breast Cancer Linkage Consortium. Cancer Incidence in BRCA1 Mutations. J Natl Cancer Inst 94, 1358-1365 (2002).

11. The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-678 (2007).

12. Easton, D. F. et al. Genome-wide association study identifies breast cancer susceptibility loci. Nature 447, 1087-1093 (2007).

13. Lilja, H. et al. Long-term prediction of prostate cancer up to 25 years before diagnosis of prostate cancer using prostate kallikreins measured at age 44 to 50 years. Journal of Clinical Oncology 25, 431-436 (2007).

14. Thomas, D. C., Haile, R. W., & Duggan, D. Recent developments in genomewide association scans: A workshop summary and review. American Journal of Human Genetics 77, 337-345 (2005).

15. Reeves, J. R., Dulude, H., Panchal, C, Daigneault, L., & Ramnani, D. M. Prognostic value of prostate secretory protein of 94 amino acids and its binding protein after radical prostatectomy. Clinical Cancer Research 12, 6018-6022 (2006).

16. Steuber, T., HeIo, P., & Lilja, H. Circulating biomarkers for prostate cancer. World Journal of Urology 25, 111-119 (2007).

17. Steuber, T. et al. Risk assessment for biochemical recurrence prior to radical prostatectomy: significant enhancement contributed by human

glandular kallikrein 2 (hk2) and free prostate specific antigen (PSA) in men with moderate PSA-elevation in serum. InternationalJournal of Cancer 118, 1234-1240 (2006).

18. Borgono, C. A., Michael, I. P., & Diamandis, E. P. Human tissue kallikreins: Physiologic roles and applications in cancer. Molecular Cancer Research 2, 257-280 (2004).

19. Cramer, S. D. et al. Association between genetic polymorphisms in the prostate-specific antigen gene promoter and serum prostate-specific antigen levels. Journal of the National Cancer Institute 95, 1044-1053 (2003).

20. Severi, G. et al. Variants in the prostate-specific antigen (PSA) gene and prostate cancer risk, survival, and circulating PSA. Cancer Epidemiology Biomarkers & Prevention 15, 1142-1147 (2006).

21. Kawa, S., Fujimoto, J., Tezuka, T., Nakazawa, T., & Yamamoto, T. Involvement of BREK, a serine/threonine kinase enriched in brain, in NGF signalling. Genes to Cells 9, 219-232 (2004).

22. Greenman, C. et al. Patterns of somatic mutation in human cancer genomes. Nature 446, 153-158 (2007).

23. Pin CL et al. Identification of a Transcription Factor, BHLHB8, Involved in Mouse Seminal Vesicle Epithelium Differentiation and Function. Biol. Reprod. 2007 (in press).

24. Fang J, Ding M, Yang L, Liu L-Z, & Jiang B-H. PI3K/PTEN/AKT signaling regulates prostate tumor angiogenesis. Cellular Signalling 2007 (in press).

25. Cetinkaya, I. et al. Regulation of human organic cation transporter hOCT2 by PKA, PI3K, and calmodulin-dependent kinases. American Journal of Physiology-Renal Physiology 284, F293-F302 (2003).

26. Luke, M. M. et al. A polymorphism in the protease-like domain of apolipoprotein(a) is associated with severe coronary artery disease. Arteriosclerosis Thrombosis and Vascular Biology 27, 2030-2036 (2007).

27. Platz, E. A. et al. Statin drugs and risk of advanced prostate cancer. Journal of the National Cancer Institute 98, 1819-1825 (2006).

28. Hidaka, K. et al. An adjacent pair of human NUDT genes on chromosome X are preferentially expressed in testis and encode two new isoforms of diphosphoinositol polyphosphate phosphohydrolase. Journal of Biological Chemistry 277, 32730-32738 (2002).

29. Skibinski, G. et al. Mutations in the endosomal ESCRTIII-complex subunit CHMP2B in frontotemporal dementia. Nature Genetics 37, 806-808 (2005).

30. Eeles, R. A. Genetic predisposition to prostate cancer. Prostate Cancer and Prostatic Diseases 2, 9-15 (1999).

31. Giles, G. G. et al. Smoking and prostate cancer: Findings from an Australian case-control study. Annals of Oncology 12, 761-765 (2001).

32. Giles, G. G. et al. Androgenetic alopecia and prostate cancer: Findings from an Australian case-control study. Cancer Epidemiology Biomarkers & Prevention 11, 549-553 (2002).

33. Severi, G. et al. ELAC2/HPC2 polymorphisms, prostate-specific antigen levels, and prostate cancer. Journal of the National Cancer Institute 95, 818-824 (2003).

34. MacInnis, R. J., English, D. R., Gertig, D. M., Hopper, J. L., & Giles, G. G. Body size and composition and prostate cancer risk. Cancer Epidemiology Biomarkers & Prevention 12, 1417-1421 (2003).

35. Severi, G. et al. Circulating steroid hormones and the risk of prostate cancer. Cancer Epidemiology Biomarkers & Prevention 15, 86-91 (2006).

36. Clayton, D. & Leung, H. T. An R package for analysis of whole-genome association studies. Human Heredity 64, 45-51 (2007).