Title:
METHODS FOR COHORT SELECTION AND LONGEVITY STUDIES
Document Type and Number:
WIPO Patent Application WO/2004/048591
Kind Code:
A2
Abstract:
This application includes methods and compositions for evaluating a genetic locus associated with longevity and methods for evaluating information from groups of individuals. A longevity-associated locus was identified in a gene that encodes the microsomal triglyceride transfer protein (MTP) on chromosome IV. The polymorphic markers in the region and their methods of use are described. Also disclosed is a method for evaluating information in which information for each individual of a first group of individuals and each individual of a second group of individuals is used to select a subset of individuals from the second group. The information can be about a plurality of different biological features. The selection can use a comparison between information for members of the first group and information for members of the subset. It is also possible to compare members of the first group to members of the selected subset with respect to at least one factor. The method can be used to reduce stratification, for example, in the analysis of genetic associations.

Inventors:
GEESAMAN BARD J (US)
DALY MARK (US)
PUCA ANNIBALE (US)
Application Number:
PCT/US2003/015370
Publication Date:
June 10, 2004
Filing Date:
May 15, 2003
Assignee:
ELIXIR PHARMACEUTICALS INC (US)
GEESAMAN BARD J (US)
DALY MARK (US)
PUCA ANNIBALE (US)
International Classes:
A61K31/7088; A61K31/7105; A61K38/00; A61K45/00; A61K48/00; C07H21/00; C07K2/00; C07K14/00; C12Q1/02; C12Q1/68; G01N33/48; G01N33/50; G01N33/53; G06F19/00; G06G7/48; G06G7/58; C12Q; (IPC1-7): C12Q/
Foreign References:
US6132724A 2000-10-17
US5292639A 1994-03-08
Other References:
TAYLOR ET AL: 'Detection of obesity QTLs on mouse chromosomes 1 and 7 by selective DNA pooling' GENOMICS vol. 34, 1996, pages 389 - 398, XP002239090
Attorney, Agent or Firm:
Myers, Louis (225 Franklin Street Boston, MA, US)
Claims:
What is claimed:
1. A method comprising: receiving information for each individual of a first group of individuals and each individual of a second group of individuals, wherein the information for each individual comprises indications about a plurality of different biological features; selecting a subset of individuals from the second group using a comparison between information for members of the first group and information for members of the subset; and evaluating the relationship of at least one factor to members of the first group relative to members of the selected subset.
2. The method of claim 1, wherein the different biological features comprise a property of a biomolecule.
3. The method of claim 2 wherein the biomolecule is a protein, nucleic acid, lipid, or carbohydrate.
4. The method of claim 3 wherein the different biological features comprise polymorphisms of genomic DNA.
5. The method of claim 1 wherein the different biological features comprise a property of a cell.
6. The method of claim 1 wherein the plurality of different biological features comprises at least ten features.
7. The method of claim 1 wherein the comparison comprises representing the information for each member as a multidimensional vector or matrix.
8. The method of claim 1 wherein the comparison is weighted by covariance of at least two different features.
9. The method of claim 7 wherein the comparison is weighted by a covariance matrix for the plurality of different features.
10. The method of claim 4, wherein the individuals are humans, the first group of individuals is associated with a particular phenotypic trait, and the evaluating comprises evaluating association of a genetic marker with individuals of the first group relative to individuals of the selected subset.
11. The method of claim 10 wherein the plurality of different biological features comprises genetic polymorphisms located on at least four different chromosomes.
12. The method of claim 10 wherein the comparison comprises assessing a multivariate distance.
13. The method of claim 11 wherein the evaluating association of the genetic marker comprises evaluating a LOD score.
14. A method of evaluating the relationship between a genetic polymorphism and a trait, the method comprising: obtaining nucleic acid from each individual of a plurality of individuals, wherein a first group of the individuals are associated with a trait and a second group of individuals are not associated with the trait; analyzing the nucleic acid to determine genetic information about a plurality of genetic loci for each individual of the plurality; selecting a subset of individuals from the second group based on a comparison between the genetic information for members of the first group and the genetic information for members of the subset; and evaluating association of a genetic locus of interest and individuals of the first group relative to association of the genetic locus of interest and individuals of the selected subset.
15. The method of claim 14 wherein the genetic information comprises indications of presence or absence of single nucleotide polymorphisms at at least some genetic loci.
16. The method of claim 14 wherein the selecting comprises selecting a subset that compares to the first group more favorably than at least another subset.
17. The method of claim 14 wherein the selecting comprises incrementally adding members of the second group to the subset.
18. The method of claim 17 wherein the incremental adding comprises selecting one or more members of the second group based on how a group that includes the one or more members compares to the first group.
19. The method of claim 18 wherein the incremental adding comprises selecting a single member of the second group that minimizes a comparative function for a comparison between a group that includes the single member and the first group.
20. The method of claim 14 wherein the comparison comprises a comparative function that returns a scalar value.
21. The method of claim 20 wherein the selecting comprises minimizing the comparative function.
22. The method of claim 20 wherein the comparative function is a function of distance.
23. The method of claim 22 wherein the distance is weighted for allele variability.
24. The method of claim 22 wherein the distance is weighted for allele covariance.
25. The method of claim 22 wherein the distance is a Mahalanobis distance.
26. The method of claim 14 wherein the selecting comprises pairing each member of the first group to a unique member of the second group.
27. The method of claim 14 wherein the evaluating of the association comprises evaluating a LOD score for the marker of interest.
28. The method of claim 14 wherein the plurality of genetic markers excludes the marker of interest.
29. The method of claim 14 wherein the plurality of genetic markers contains between 10 and 100 markers.
30. The method of claim 25 wherein the selecting comprises a filter that requires that the mean chi-square of the G-test is less than 1.5.
31. A system comprising: a memory that stores information for each individual of a first group of individuals and each individual of a second group of individuals, wherein the information for each individual comprises indications about a plurality of different biological features; a communications interface; and a processor configured to select a subset of individuals from the second group using a comparison between information for members of the first group and information for members of the subset; evaluate the relationship of at least one factor to members of the first group relative to members of the selected subset; and communicate results of the evaluation using the interface.
32. A method comprising: obtaining nucleic acid samples from each individual of a first group of individuals and each individual of a second group of individuals; analyzing the nucleic acid samples to determine information about a plurality of genetic markers for each individual of the first and second groups; selecting a subset of individuals from the second group using a comparison between the information for members of the first group and the information for members of the subset; and comparing members of the first group to members of the selected subset with respect to at least one factor.
33. The method of claim 32 wherein the comparing comprises subjecting members of the first group, but not the second group, to a condition and evaluating members of the first group and members of the second group.
34. The method of claim 33 wherein the condition is a medical procedure.
35. The method of claim 32 wherein the comparison comprises a distance function that returns a scalar value.
36. The method of claim 35 wherein the distance function is weighted for marker covariance.
37. A method comprising: obtaining DNA samples from and information about each individual of a first group of the individuals are associated with a trait; analyzing the DNA samples to determine genetic information about a plurality of genetic loci for each individual of the plurality; sending the allelic information to a server that stores genetic information for each individual of a second group of individuals; and receiving information about a subset of individuals selected from the second group of individuals, wherein the subset of individuals is selected using a comparison between the genetic information for members of the first group and genetic information for members of the selected subset.
38. A server comprising a memory that stores allelic information for a plurality of genetic markers for each individual of a first group of individuals; and software configured to: receive genetic information about a plurality of genetic loci for each individual of a plurality of individuals; select a subset of individuals from the second group using a comparison between genetic information for members of the plurality of individuals and genetic information for members of the selected subset; and communicate information about individuals of the subset to a user.
39. A method of comparing a first and second population of individuals, the method comprising: receiving genetic information for the first and second populations of individuals, the genetic information including information about a plurality of genetic markers for each of the individuals, the plurality including markers located on at least four different chromosomes and at least twenty different markers; and returning a scalar value that is a function of the marker distribution for the first and second population and the degree of covariance among the genetic markers.
40. The method of claim 39 wherein the function further weights each marker by the degree of variability of the respective marker.
41. The method of claim 40 wherein the function is a function of the Mahalanobis distance between the genetic information for the first and second populations.
42. The method of claim 39 wherein each allele is weighted by its allele frequency in a third population.
43. A method of performing a controlled study, the method comprising: identifying a first and second subset of individuals from the plurality of individuals by comparing occurrences of a plurality of genetic markers among individuals of the first and second subsets; and subjecting the first subset of individuals to a first condition and the second subset of individuals to a second condition.
44. The method of claim 43 wherein the plurality of genetic markers includes markers located on at least four different chromosomes and at least twenty different markers.
45. The method of claim 44 wherein the first condition comprises administering a test treatment, and the second condition comprises administering a control/placebo treatment.
46. The method of claim 43 wherein the comparing comprises evaluating a function that returns a scalar value and depends on the marker distribution for the first and second subset and the degree of covariance among the genetic markers in the respective subsets.
47. The method of claim 44 where the subsets are complementary.
48. A machine readable medium having encoded thereon information comprising: a first list of records; a second list of records, wherein each record of the first and second list corresponds to a genome and comprises genetic information about each of a plurality of genetic markers in the genome; and information describing a relationship between records of the first list and records of the second list, wherein the relationship is a function of the genetic information for at least a subset of the genetic markers, the markers of the subset including markers on at least two different chromosomes, and covariance of genetic markers of the subset between records of each list.
49. The medium of claim 47 wherein the relationship is a function of distance.
50. The medium of claim 48 wherein the distance is a Mahalanobis distance.
51. A method for evaluating propensity for longevity, the method comprising: providing a sample comprising nucleic acid from a subject; evaluating the nucleic acid for presence or absence of a polymorphism in an MTP gene; and outputting an evaluation of propensity for longevity as a function of the presence or absence of the polymorphism.
52. The method of claim 51 wherein the polymorphism comprises a polymorphic nucleotide at a noncoding region of the gene.
53. The method of claim 51 wherein the polymorphism comprises a polymorphic nucleotide at a coding region of the gene.
54. The method of claim 51 wherein the polymorphism comprises a polymorphic nucleotide within 10 nucleotides of the position defined by rs1800591 or rs2866164.
55. The method of claim 54 wherein the polymorphism comprises a polymorphic nucleotide at the same nucleotide position as the nucleotide defined by rs1800591 or rs2866164.
56. The method of claim 51 wherein the sample is derived from a somatic cell.
57. The method of claim 51 wherein the sample is derived from a gamete.
58. The method of claim 51 wherein the sample is derived from embryonic or fetal tissue.
59. The method of claim 51 wherein the evaluating comprises hybridization, an enzymatic reaction, and/or mass spectroscopy.
60. The method of claim 54 wherein the evaluating comprises evaluating both the polymorphic positions defined by rs1800591 and rs2866164.
61. A method for evaluating propensity for longevity, the method comprising: evaluating one or more nucleotide positions in the MTP gene of a first human subject and of a second human subject, wherein the second subject is a long-lived individual and the first subject is a genetic relative of the second subject; and comparing the nucleotides present at the one or more corresponding nucleotide positions between the first and second subject.
62. The method of claim 61 wherein the evaluating comprises evaluating a nucleotide polymorphism defined by rs1800591 or rs2866164 in the MTP gene.
63. The method of claim 61 wherein the comparing comprises evaluating a statistical function that depends on the presence or absence of the polymorphisms at rs1800591 or rs2866164 in the MTP gene.
64. A method of altering MTP activity in a cell, the method comprising: providing an agent that alters MTP gene, mRNA, or protein activity; contacting the agent to a cell, in an amount effective to alter the MTP activity in the cell; and evaluating an age-associated parameter of the cell.
65. A method of altering MTP activity in a cell, the method comprising: providing an agent that alters MTP gene, mRNA, or protein activity; identifying a subject lacking an MTP allele predisposed for longevity; and administering the agent to the subject, or to a cell of the subject, in an amount effective to alter the MTP activity.
66. The method of claim 65 wherein the subject lacks the -493G allele and/or the 95Q allele.
67. The method of claim 65 wherein the agent comprises a double stranded RNA.
68. A method for inducing longevity in a cell, the method comprising introducing a longevity-associated locus comprising a polynucleotide sequence that encodes for MTP into said cell.
69. The method of claim 68 further comprising evaluating an age-associated parameter of the cell.
70. The method of claim 68 further comprising evaluating lifespan of the cell.
71. The method of claim 68 further comprising transferring the cell into an organism.
72. The method of claim 68 wherein the cell is in vivo, in an organism, during the introducing.
73. The method of claim 71 or 72 further comprising evaluating an age-associated parameter of the organism or lifespan of the organism.
74. A method of screening for an agent that induces longevity, the method comprising the steps of: culturing a cell that does not have an rs1800591 or rs2866164 allele that is associated with increased longevity; contacting the cell with a test compound; and evaluating a property of the cell.
75. The method of claim 74 wherein the cell does not include the -493G allele nor the 95Q allele.
76. The method of claim 74 wherein the evaluating comprises evaluating an age-associated parameter of the cell.
77. The method of claim 74 wherein the evaluating comprises evaluating lipoprotein production or assembly.
78. The method of claim 74 wherein the method is effected in parallel for at least ten different compounds, and results of the evaluating are stored in a machine readable format.
79. The method of claim 74 further comprising comparing the evaluated property of the cell to a corresponding property of a cell that is not treated or of a cell that includes an MTP polymorphism that increases predisposition for longevity.
80. A method of altering MTP activity in a cell, the method comprising: providing an agent that alters MTP gene transcription or expression; and administering the agent to a human adult subject, or to a cell of the subject, in an amount effective to alter MTP activity in a cell of the subject.
81. The method of claim 80 wherein the agent is an artificial zinc finger protein that interacts with an MTP regulatory sequence.
82. The method of claim 80 wherein the agent is an siRNA.
83. A method of providing an insurance policy for a subject, the method comprising: evaluating a nucleic acid of the subject for a polymorphism in the MTP gene at a nucleotide position defined by rs1800591 and/or rs2866164; calculating a risk factor as a function of presence or absence of the polymorphism; and providing information about an insurance policy premium as a function of the risk factor to the subject.
84. A kit comprising a polynucleotide that anneals within 200 nucleotides of the polymorphic position defined by rs1800591 or rs2866164, and a reference sample derived from a long-lived human individual.
Description:
METHODS FOR COHORT SELECTION AND LONGEVITY STUDIES

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Application Serial No. 10/378,397, filed on March 3, 2003, and U.S. Provisional Application Serial No. 60/381,014, filed on May 15, 2002, the contents of each of which are incorporated by reference in their entireties.

BACKGROUND

Genetic association studies are used to identify genetic markers and genes associated with a particular trait. Typically, genetic information is obtained from individuals that have the particular trait (the "cases") and is compared with genetic information from control individuals that do not have the particular trait (the "controls") or have the trait to a different degree. Multiple hypotheses are generated that test whether genetic markers are over- or under-represented in the case individuals compared to the control individuals.

However, some genetic association studies have been plagued by both false positive and false negative results (type I and type II error, respectively). One recognized problem is a failure to adequately match the genetic background of the cases and controls. This phenomenon is referred to as stratification. Stratification can arise, for example, in a study of a genetic trait where individuals in the class being studied are non-randomly distributed with respect to a particular genetic background. Alleles that are associated with that genetic background, but that are not causative for the trait, can be erroneously associated with the trait.
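The effect of stratification can be illustrated with a small worked calculation. The following Python sketch is only an illustration under assumed numbers: the function name, the two hypothetical subpopulations, and their sizes, allele frequencies, and trait prevalences are not taken from this application. Within each subpopulation the allele is unrelated to the trait, yet pooling the subpopulations makes the allele appear enriched among cases.

```python
def allele_frequency_by_status(subpopulations):
    """Pool subpopulations and return (case_freq, control_freq) for one allele.

    Each entry is (size, allele_freq, trait_prevalence). Within every
    subpopulation the allele is independent of the trait, so cases and
    controls drawn from the same subpopulation share the same allele_freq.
    """
    case_total = control_total = 0.0
    case_allele = control_allele = 0.0
    for size, allele_freq, prevalence in subpopulations:
        cases = size * prevalence
        controls = size * (1.0 - prevalence)
        case_total += cases
        control_total += controls
        case_allele += cases * allele_freq
        control_allele += controls * allele_freq
    return case_allele / case_total, control_allele / control_total


# Hypothetical numbers: background A carries the allele often and has the
# trait often; background B carries it rarely and has the trait rarely.
stratified = [(1000, 0.8, 0.10), (1000, 0.2, 0.01)]
case_freq, control_freq = allele_frequency_by_status(stratified)
print(f"cases: {case_freq:.3f}  controls: {control_freq:.3f}")

# With a single genetic background the apparent difference disappears.
matched_case, matched_control = allele_frequency_by_status([(1000, 0.8, 0.10)])
```

In this sketch the difference between cases and controls arises entirely from the unequal mix of backgrounds, which is the situation that matching controls to cases is meant to avoid.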

SUMMARY

This application includes methods and compositions for evaluating a genetic locus associated with longevity and methods for evaluating information from groups of individuals.

It has been postulated that there is a genetic basis for longevity. About one in ten thousand people live to become centenarians. Since the siblings of centenarians are often long-lived, centenarians and their long-lived relatives may possess one or more genes that enable them to live longer, healthier lives, and they may also lack genes that would otherwise predispose them to diseases that cause mortality in individuals that are not so long-lived.

The invention provides, in part, polymorphic loci with significant influence upon life-span. In particular, the invention provides a longevity locus with familial linkage and genetic association at the single nucleotide polymorphism (SNP) marker rs1553432 on human chromosome IV. Furthermore, the longevity locus includes the gene microsomal triglyceride transfer protein (MTP, also known as MTTP). An exemplary MTP gene encodes an amino acid sequence described in GenBank reference NP_000244.1.

A method of diagnosing a predisposition to longevity, or lack thereof, is provided including the steps of obtaining a biological sample, and analyzing the sample for the presence or absence of the longevity locus.

The invention further provides a method for inducing longevity in a cell, including introducing a longevity-associated-locus that includes a polynucleotide sequence which encodes MTP into the cell.

Also provided by the invention is a method to screen for agents that induce longevity. The method includes treating a transgenic animal that does not have the longevity-associated locus including a polynucleotide sequence which encodes MTP, with the agent to be tested, and comparing the longevity of the treated transgenic animal with an animal, which can be a transgenic animal, and which does have the longevity-associated locus. A transgenic animal which was treated with the agent and which exhibits a longevity comparable to an animal that possesses the longevity-associated locus indicates that the agent induces longevity.

Agents of the invention can also be used to prevent or lessen the effects of diseases associated with aging. Diseases associated with aging include heart disease, cardiovascular diseases, stroke, Alzheimer's disease, Parkinson's, cancer, diabetes, obesity, ocular disease, arthritis, osteoporosis, and liver disease.

In one aspect, the invention features a method for evaluating propensity for longevity, the method including: providing a sample including nucleic acid, e.g., from a genome or a subject; evaluating the nucleic acid for presence or absence of a polymorphism in an MTP gene; and outputting an evaluation of propensity for longevity as a function of the presence or absence of the polymorphism.

In an embodiment, the polymorphism includes a polymorphic nucleotide at a non-coding region of the gene. In another embodiment, the polymorphism includes a polymorphic nucleotide at a coding region of the gene.

In an embodiment, the polymorphism includes a polymorphic nucleotide at the position defined by rs1800591 or rs2866164. In an embodiment, the nucleic acid includes genomic DNA or a derivative product thereof.

In an embodiment, the sample is derived from a somatic cell. In another embodiment, the sample is derived from a gamete. In still another embodiment, the sample is derived from embryonic or fetal tissue.

For example, the evaluating includes hybridization, an enzymatic reaction, and/or mass spectroscopy. The evaluating can include evaluating one or both of the polymorphic positions defined by rs1800591 and rs2866164, e.g., evaluating for the rs1800591 or the rs2866164. The evaluating can include evaluating a nucleotide in the codon that encodes position 95 of MTP or a non-coding nucleotide, e.g., a nucleotide in the MTP promoter. Other polymorphisms that can be evaluated can be within 100, 50, 20, 10, 5, or 3 nucleotides of the polymorphic position defined by rs1800591 or rs2866164. The method can include other features described herein.

In one embodiment, the method further includes using the evaluation to calculate a risk, an insurance premium, or an outcome (e.g., a binary outcome). The method can further include executing a decision (e.g., a transaction, e.g., a monetary transaction) based on the evaluation.

In another aspect, the invention features a method for evaluating propensity for longevity, the method including: evaluating one or more nucleotide positions in the MTP gene of a first genome or human subject and of a second genome or human subject, wherein the second subject is a long-lived individual (e.g., living to at least the 65th, 75th, 80th, 85th, 90th, 95th, or 98th percentile) and the first subject is a genetic relative of the second subject; and comparing the nucleotides present at the one or more corresponding nucleotide positions between the first and second subject.

For example, the evaluating includes evaluating a nucleotide polymorphism that is under the D4S1564 marker, e.g., in the 10-20 cM region linked to the D4S1564 marker, e.g., in a region between rs1800591 and rs2866164. In another example, the evaluating includes evaluating a nucleotide polymorphism defined by rs1800591 or rs2866164 in the MTP gene. The method can also include evaluating at least two, three, four, five, or ten markers in a particular region.

In one embodiment, the comparing includes evaluating a statistical function that depends on the presence or absence of the polymorphisms at rs1800591 or rs2866164 in the MTP gene. The method can include other features described herein.

In another aspect, the invention features a method of altering MTP activity in a cell, the method including: providing an agent that alters MTP gene, mRNA, or protein activity; contacting the agent to a cell, in an amount effective to alter the MTP activity in the cell; and evaluating an age-associated parameter of the cell. The method can include other features described herein.

In another aspect, the invention features a method of altering MTP activity in a cell, the method including: providing an agent that alters MTP gene, mRNA, or protein activity; identifying a subject lacking an MTP allele predisposed for longevity; and administering the agent to the subject, or to a cell of the subject, in an amount effective to alter the MTP activity. In some embodiments, the subject lacks the -493G allele and/or the 95Q allele. For example, the agent is an siRNA, e.g., an siRNA specific for the mRNA encoded by the MTP Q95H allele. The method can include other features described herein. In another embodiment, the agent is a small molecule (non-polymeric) inhibitor of MTP, PDI, or a lipoprotein. The method can further include evaluating the subject, e.g., for a parameter, e.g., an age-associated parameter or for MTP activity or lipoprotein levels.

In another aspect, the invention features a method for inducing longevity in a cell, the method including introducing a longevity-associated locus including a polynucleotide sequence that encodes for MTP into said cell. In one embodiment, the method further includes evaluating an age-associated parameter of the cell or lifespan of the cell. For example, the cell is in vitro during the introducing. In one embodiment, the method further includes transferring the cell into an organism. For example, the cell is in vivo, in an organism, during the introducing. The method can further include evaluating an age-associated parameter of the organism or lifespan of the organism. In one embodiment, the introducing replaces an endogenous allele.

The method can include other features described herein.

In another aspect, the invention features a method of screening for an agent that induces longevity, the method including: culturing a cell that does not have an rs1800591 or rs2866164 allele that is associated with increased longevity; contacting the cell with a test compound; and evaluating a property of the cell. For example, the cell does not include the -493G allele nor the 95Q allele. The evaluating can include evaluating an age-associated parameter of the cell, lipoprotein production or assembly, or lifespan.

In one embodiment, the method is effected in parallel for at least ten different compounds, and results of the evaluating are stored in a machine readable format.

In one embodiment, the cell is treated with an siRNA, e.g., an siRNA specific for the MTP gene.

The method can further include, e.g., comparing the evaluated property of the cell to a corresponding property of a cell that is not treated or of a cell that includes an MTP polymorphism that increases predisposition for longevity. The method can include other features described herein.

In another aspect, the invention features a method of altering MTP activity in a cell, the method including: providing an agent that alters MTP gene transcription or expression; and administering the agent to a human adult subject, or to a cell of the subject, in an amount effective to alter MTP activity in a cell of the subject. The method can be effected, e.g., without information about the cardiovascular fitness/history of the subject, without information about a blood parameter, and so forth.

For example, the agent is an artificial zinc finger protein that interacts with an MTP regulatory sequence, e.g., a zinc finger protein that interacts with a nucleotide that is within ten nucleotides of position -493. In another example, the agent is an siRNA.

In another aspect, the invention features a kit that includes a polynucleotide probe or primer for a longevity-associated polymorphism, e.g., a polymorphism described herein. For example, the polynucleotide can anneal within 500, 300, 200, 100, 50, 20, 10, 5, 3, 2, or 1 nucleotides of the polymorphic position defined by rs1800591 or rs2866164. The kit can further include a reference sample derived from a long-lived individual (e.g., at least the 85th percentile) or a machine readable article having information encoded thereon that indicates information about the polymorphism and its association with longevity. In one embodiment, the polynucleotide is an extendable primer or a complementary probe.

In one aspect, the invention features a method that analyzes information for each individual of a first group of individuals and each individual of a second group of individuals. The information for each individual includes indications about a plurality of different biological features. The method includes selecting a subset of individuals from the second group using a comparison between information for members of the first group (or a subset thereof) and information for members of the subset; and evaluating the relationship of at least one factor to members of the first group relative to members of the selected subset. In a machine based implementation, at least part of the information can be received, e.g., from a user, or from instrumentation that analyzes a biologic. The method can also include outputting (e.g., displaying, sending, storing, or transmitting) a result of the comparing, e.g., to a user, a computer, a memory, and so forth.
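One way such a selection could be carried out is sketched below in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names, the example data, and the choice of an inverse-variance weighted squared distance between group means are hypothetical. The sketch adds members of the second group to the subset one at a time, each time choosing the member that minimizes a scalar comparison with the first group. A covariance-weighted distance (for example, a Mahalanobis distance) could be substituted for the simpler weighting assumed here.

```python
from statistics import mean, pvariance


def weighted_distance(group_a, group_b, weights):
    """Scalar comparison: weighted squared distance between feature means."""
    total = 0.0
    for j, weight in enumerate(weights):
        mean_a = mean(row[j] for row in group_a)
        mean_b = mean(row[j] for row in group_b)
        total += weight * (mean_a - mean_b) ** 2
    return total


def select_subset(first_group, second_group, subset_size):
    """Greedily pick indices of second_group members that match first_group.

    At each step the candidate whose inclusion gives the smallest distance
    between the growing subset and the first group is added.
    """
    pooled = first_group + second_group
    n_features = len(first_group[0])
    # Assumed weighting: inverse of each feature's pooled variance, with a
    # small floor so that constant features do not cause division by zero.
    weights = [
        1.0 / max(pvariance([row[j] for row in pooled]), 1e-9)
        for j in range(n_features)
    ]
    chosen = []
    remaining = list(range(len(second_group)))
    while remaining and len(chosen) < subset_size:
        best = min(
            remaining,
            key=lambda i: weighted_distance(
                first_group, [second_group[k] for k in chosen + [i]], weights
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical binary feature vectors (one row per individual).
cases = [[1, 0, 1], [1, 1, 1], [1, 0, 1]]
candidates = [[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]]
chosen_indices = select_subset(cases, candidates, subset_size=2)
print(chosen_indices)
```

A greedy search is used here only because it is short and easy to follow; it does not guarantee the best possible subset, and other search strategies could be used with the same comparison function.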

In one embodiment, the different biological features can include at least one property of a biomolecule, e. g. , a protein, nucleic acid, lipid, or carbohydrate. For example, the property of the biomolecule relates to one or more of nucleic acid sequence, DNA methylation state, DNA accessibility, transcription factor binding, protein sequence, protein structure, protein conformation, protein aggregation state, protein localization, post-translational modification, mRNA sequence, mRNA structure, mRNA localization, mRNA chemical modification, carbohydrate structure, carbohydrate sequence, membrane composition, membrane fluidity, and so forth.

In one embodiment, the different biological features include a property of a cell, e.g., cell differentiation state, cell size, cell number or abundance, mitotic index, divisional state, gene expression state, metabolic state, extracellular-associated molecules, tissue localization, and so forth. In one embodiment, the different biological features include a property of an organism, e.g., anatomical features, blood pressure, pigmentation (hair, eye, skin). In some embodiments, the different biological features include various combinations of properties about biomolecules, cells, and organisms. The plurality of different biological features can also be restricted to features of exclusively one category, e.g., features only about post-translational modifications, or only about nucleic acid sequence.

For example, the different biological features can include information about a plurality of genetic polymorphisms, e.g., an indication of presence or absence of at least one polymorphism at a genetic locus, e.g., an indication about the presence or absence of a minor or major allele. In one embodiment, the features include an indication of presence or absence of a minor allele and a corresponding indication for the major allele. This allelic information can be phased or unphased. In one embodiment, the polymorphisms include one or more of: a SNP, RFLP, a repeat sequence, a transposon, a retroviral sequence (e.g., LTR), a microsatellite marker (e.g., LINE or SINE), insertion, deletion, substitution, or inversion. In one embodiment, the polymorphism is a biallelic polymorphism. In another embodiment, the polymorphism is a multiallelic polymorphism.

The plurality of different biological features can include at least some quantitative features. The plurality of different biological features can include at least some qualitative features. The plurality of different biological features can include at least some features that are represented by a binary variable.

In one embodiment, the plurality of different biological features includes at least five or ten features, e.g., between 10-500, 20-200, or 50-100 features.

The comparison can include use of a model (e.g., a Bayesian network or information theory model) or a comparative function. In one embodiment, the comparison includes representing the information for each member as a multi-dimensional vector or matrix.

In one embodiment, the comparison is weighted by covariance of at least two different features, e.g., by a covariance matrix for at least some or all features of the plurality of different features.
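By way of illustration only, a covariance-weighted comparison can be sketched as a Mahalanobis-type distance between the mean feature vectors of two groups. The sketch below is not taken from the specification; the function names, the pooling of both groups to estimate the covariance matrix, and the small `ridge` term added to keep that matrix invertible are all assumptions.

```python
def mean_vector(rows):
    """Per-feature mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of the feature vectors."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mu)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def mahalanobis_group_distance(group_a, group_b, ridge=1e-6):
    """Distance between group means, weighted by the pooled covariance.

    ridge is a hypothetical regularization term (an assumption of this
    sketch) so that the covariance matrix can always be inverted.
    """
    diff = [a - b for a, b in zip(mean_vector(group_a), mean_vector(group_b))]
    cov = covariance_matrix(list(group_a) + list(group_b))
    for i in range(len(diff)):
        cov[i][i] += ridge
    weighted = solve(cov, diff)
    return max(0.0, sum(d * w for d, w in zip(diff, weighted))) ** 0.5
```

In such a sketch, correlated features contribute less to the distance than they would under an unweighted Euclidean distance, which is the practical effect of weighting by covariance.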

In one embodiment, the selecting includes selecting a subset that compares to the first group more favorably than at least another subset, e.g., more favorably than average or median or more favorably than at least 70, 80, 90, or 95% of other possible subsets, e.g., most favorably.

In one embodiment, the selecting includes incrementally adding members of the second group to the subset. For example, the incremental adding is repeated until the subset contains the same number of members as the first group.

In one embodiment, the incremental adding includes selecting a single member of the second group based on how a group that includes the single member (e.g., the single member plus the previously selected subgroup) compares to the first group. For example, the incremental adding includes selecting a single member of the second group that minimizes a comparative function for a comparison between a group that includes the single member (e.g., the single member plus the previously selected subgroup) and the first group.
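The incremental adding of single members can be sketched as a greedy loop. The following is a minimal illustration under stated assumptions: the comparative function used by default (Euclidean distance between group mean vectors) and the names `greedy_select` and `target_size` are hypothetical choices for this sketch, and any comparative function that returns a scalar could be substituted.

```python
def mean_vector(rows):
    """Per-feature mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def euclidean_mean_distance(group_a, group_b):
    """Assumed comparative function: distance between the group means."""
    mu_a, mu_b = mean_vector(group_a), mean_vector(group_b)
    return sum((a - b) ** 2 for a, b in zip(mu_a, mu_b)) ** 0.5


def greedy_select(first_group, second_group,
                  compare=euclidean_mean_distance, target_size=None):
    """Return indices of second_group members, added one at a time.

    At each step, the member whose addition to the previously selected
    subgroup minimizes the comparative function is chosen. By default the
    loop stops when the subset is as large as the first group.
    """
    if target_size is None:
        target_size = len(first_group)
    target_size = min(target_size, len(second_group))
    chosen = []
    remaining = list(range(len(second_group)))
    while len(chosen) < target_size:
        def score(idx):
            candidate = [second_group[i] for i in chosen] + [second_group[idx]]
            return compare(first_group, candidate)
        best = min(remaining, key=score)  # ties go to the lowest index
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

A greedy loop of this kind evaluates the comparative function once per remaining candidate per step, so it does not guarantee the globally best subset; it is shown only because it mirrors the incremental procedure described above.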

In another embodiment, the incremental adding includes selecting a cluster of members of the second group based on how a group that includes the cluster (e.g., the cluster plus the previously selected subgroup) compares to the first group. For example, the incremental adding includes selecting a cluster of members of the second group that minimizes a comparative function for a comparison between a group that includes the cluster (e.g., the cluster plus the previously selected subgroup) and the first group.

In another embodiment, the selecting includes pairing each member of the first group to a unique member of the second group. The pairing can include evaluating a comparative function. The pairing can include identifying a member of the second group that compares most favorably to the respective member of the first group.
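The pairing embodiment can be illustrated with a short sketch. The code below assumes a City Block (Manhattan) distance over numeric feature vectors and processes first-group members in order, assigning each the closest second-group member not yet used; both the function names and the in-order greedy strategy are assumptions of this sketch rather than requirements of the method.

```python
def manhattan(u, v):
    """City Block distance between two equal-length feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def pair_members(first_group, second_group, distance=manhattan):
    """Map each first-group index to a unique second-group index.

    Each first-group member is paired with the still-available member of
    the second group that compares most favorably (smallest distance).
    """
    if len(second_group) < len(first_group):
        raise ValueError("second group must be at least as large as the first")
    available = set(range(len(second_group)))
    pairs = {}
    for i, member in enumerate(first_group):
        best = min(sorted(available),
                   key=lambda j: distance(member, second_group[j]))
        pairs[i] = best
        available.remove(best)
    return pairs
```

Because earlier pairings can consume a member that would have suited a later individual better, an implementation that needs the best overall pairing could instead solve an assignment problem over the full distance matrix.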

In one embodiment, the comparison includes a comparative function that returns a value, e.g., a scalar or multivariate value. The selecting can include minimizing the comparative function.

For example, the comparative function can be a function of distance. The distance can be weighted, e.g., for genetic (e.g., allelic) variability, variance, and co-variance. The distance can be a function of a Euclidean distance, z-score distance, Bhattacharyya distance, Mahalanobis distance, Matusita distance, divergence metric, Chernoff distance, angular metric, Earth Mover's distance, Hausdorff distance, City Block (Manhattan) distance, Chebychev distance, Minkowski distance, or Canberra distance. In another example, the comparative function is a function of a statistical test, e.g., the mean chi-square of the G-test or one minus the Pearson correlation.

In another embodiment, the comparison includes assessing similarity using neural networks, Bayesian networks, support vector machines, or information theory.

In one embodiment, multiple subsets are selected.

In one embodiment, the individuals are animals. In another embodiment, the individuals are plants. In still another embodiment, the individuals are protists.

Typically, the individuals are all from the same species.

The evaluating of the relationship of at least one factor to members of the first group relative to members of the selected subset can include determining a statistical association of the factor among members of the first group relative to members of the selected subset. For example, the factor can be a feature common to at least 30, 50, 70, 80, 90, or 95% of members of the first group. For example, the factor can be a genetic polymorphism or other biological feature.

In another aspect, the invention features a method that includes: obtaining nucleic acid samples from each individual of a plurality of individuals, wherein a first group of the individuals are associated with a trait and a second group of individuals are not associated with the trait; analyzing the nucleic acid samples to determine genetic information about a plurality of genetic loci for each individual of the plurality; selecting a subset of individuals from the second group based on a comparison between the genetic information for members of the first group and the genetic information for members of the subset; and evaluating association of a genetic locus of interest and individuals of the first group relative to association of the genetic locus of interest and individuals of the selected subset. The method can be used, for example, to evaluate the relationship between a genetic polymorphism and a trait.

For example, the genetic information includes an indication of presence or absence of at least one polymorphism at a genetic locus, e.g., an indication about the presence or absence of a minor or major allele. In one embodiment, the genetic information includes an indication of presence or absence of a minor allele and a corresponding indication for the major allele. The genetic information can be phased or unphased.

In one embodiment, the polymorphism is a SNP, RFLP, a repeat sequence, a transposon, a retroviral sequence (e.g., LTR), a microsatellite marker (e.g., LINE or SINE), insertion, deletion, substitution, or inversion. In one embodiment, the polymorphism is a biallelic polymorphism. In another embodiment, the polymorphism is a multiallelic polymorphism.

In one embodiment, the selecting includes selecting a subset that compares to the first group more favorably than at least another subset, e.g., more favorably than average or median or more favorably than at least 70, 80, 90, or 95% of other possible subsets, e.g., most favorably.

In one embodiment, the selecting includes incrementally adding members of the second group to the subset. For example, the incremental adding is repeated until the subset contains a particular number of members relative to the size of the first group, e.g., the same number of members as the first group. In another example, the incremental adding is repeated until no additional members of the second group can be identified which can be added to the selected subset without exceeding a threshold value.

In one embodiment, the incremental adding includes selecting a single member of the second group based on how a group that includes the single member (e.g., the single member plus the previously selected subgroup) compares to the first group. For example, the incremental adding includes selecting a single member of the second group that minimizes a comparative function for a comparison between a group that includes the single member (e.g., the single member plus the previously selected subgroup) and the first group.

In another embodiment, the incremental adding includes selecting a cluster of members of the second group based on how a group that includes the cluster (e.g., the cluster plus the previously selected subgroup) compares to the first group. For example, the incremental adding includes selecting a cluster of members of the second group that minimizes a comparative function for a comparison between a group that includes the cluster (e.g., the cluster plus the previously selected subgroup) and the first group.

In another embodiment, the selecting includes pairing each member of the first group to a unique member of the second group. The pairing can include evaluating a comparative function. The pairing can include identifying a member of the second group that compares most favorably to the respective member of the first group.

In one embodiment, the comparison includes a comparative function that returns a value, e.g., a scalar or multivariate value. The selecting can include minimizing the comparative function.

For example, the comparative function can be a function of distance. The distance can be weighted, e.g., for genetic (e.g., allelic) variability, variance, and co-variance. The distance can be a function of a Euclidean distance, z-score distance, Bhattacharyya distance, Mahalanobis distance, Matusita distance, divergence metric, Chernoff distance, angular metric, Earth Mover's distance, Hausdorff distance, City Block (Manhattan) distance, Chebychev distance, Minkowski distance, or Canberra distance. In another example, the comparative function is a function of a statistical test, e.g., the mean chi-square of the G-test or one minus the Pearson correlation.

In another embodiment, the comparison includes assessing similarity using neural networks, Bayesian networks, support vector machines, or information theory.

In one embodiment, multiple subsets are selected.

In one embodiment, the evaluating of the association includes evaluating a LOD score for one or more genetic loci (e.g., polymorphic markers) of interest. The plurality of genetic markers can exclude the marker of interest. In one embodiment, the plurality of genetic markers contains between 5-500, 5-200, 10-100, or 10-80 different markers or at least 5, 10, 20, 30, or 50 markers, or less than 500, 200, 100, 80, or 50 markers. The plurality of genetic markers can be preselected, e.g., randomly selected or selected, e.g., to distribute over two or more chromosomes (e.g., at least 5, 10, 12, or 18 chromosomes), to distribute between various distances from a centromere or telomere, to include various degrees of heterozygosity, or to exclude one or more regions of interest (e.g., suspect regions).

The method can further include obtaining information about each individual of the plurality, e.g., medical information about each individual. The method can include examining each individual, e.g., for a trait, symptom, disease, or other discernable phenotype. Examining can include invasive and non-invasive techniques (e.g., imaging techniques). For example, the individuals are humans. The method can include interviewing the individual (e.g., about medical history, family history, environmental exposure, behavior, social or societal perceptions, etc.). The information can include information about one or more symptoms for a disease of interest.

The second group is typically larger than the first group. For example, the second group includes at least 0.2, 0.5, 1.0, 1.5, 2, 2.5, 5, or 10 times more members than the first group. The selected subset can be any size relative to the first group, e.g., the same size, or within 10, 20, or 30% of the size of the first group, e.g., larger or smaller than the first group. The selecting can include using more than one comparison, e.g., in addition to a first comparison, filtering a result using a second comparison. For example, the selecting can include filtering the results using a statistical test, e.g., the mean chi-square of the G-test. In one embodiment, the selecting includes a filter that requires that the mean chi-square of the G-test is less than 1.5.
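One possible reading of the G-test filter is sketched below. The sketch assumes that, for each marker, a 2 x 2 table of minor and major allele counts is formed for the first group and for the candidate subset, that a G statistic is computed per marker, and that the statistics are averaged over markers before comparison with the threshold. The table layout and the names `g_statistic`, `mean_g`, and `passes_filter` are assumptions of this sketch; the specification does not spell out the computation in this passage.

```python
import math


def g_statistic(table):
    """G-test statistic, 2 * sum(O * ln(O / E)), for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g += observed * math.log(observed / expected)
    return 2.0 * g


def mean_g(first_counts, subset_counts):
    """Average G over markers.

    Each argument is a list with one (minor_count, major_count) pair per
    marker, for the first group and for the candidate subset respectively.
    """
    stats = [g_statistic([list(a), list(b)])
             for a, b in zip(first_counts, subset_counts)]
    return sum(stats) / len(stats)


def passes_filter(first_counts, subset_counts, threshold=1.5):
    """True when the mean statistic is below the threshold (1.5 by default)."""
    return mean_g(first_counts, subset_counts) < threshold
```

Under this reading, a candidate subset whose allele counts closely track those of the first group yields a mean statistic near zero and passes the filter, whereas a subset with markedly different allele counts is rejected.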

The method can include other features described herein.

In another aspect, the invention features a method that includes: obtaining DNA samples from each individual of a first group of individuals and each individual of a second group of individuals; analyzing the DNA samples to determine information about a plurality of genetic markers for each individual of the first and second groups; selecting a subset of individuals from the second group using a comparison between the information for members of the first group and the information for members of the subset; and comparing members of the first group to members of the selected subset with respect to at least one factor.

In one embodiment, the comparing can include subjecting members of the first group, but not the second group, to a condition and evaluating members of the first group and members of the second group. For example, the condition is a medical procedure, e.g., a therapeutic or diagnostic procedure (e.g., a drug regimen, a diet, a physical therapy plan, a psychological treatment, and so forth). In another example, the condition is a behavior or social procedure.

The method can include other features described herein.

In another aspect, the invention features a method that includes: obtaining DNA samples from, and information about, each individual of a first group of individuals, wherein the individuals of the first group are associated with a trait; analyzing the DNA samples to determine genetic information about a plurality of genetic loci for each individual of the plurality; sending the allelic information to a server that stores genetic information for each individual of a second group of individuals; and receiving information about a subset of individuals selected from the second group of individuals, wherein the subset of individuals is selected using a comparison between the genetic information for members of the first group and genetic information for members of the selected subset. The method can include other features described herein.

In still another aspect, the invention features a server that includes: a memory that stores allelic information for a plurality of genetic markers for each individual of a first group of individuals; and software configured to: receive genetic information about a plurality of genetic loci for each individual of a plurality of individuals; select a subset of individuals from the second group using a comparison between genetic information for members of the plurality of individuals and genetic information for members of the selected subset; and communicate information about individuals of the subset. The software can be configured according to other features described herein.

In another aspect, the invention features a method (e.g., a machine-based method) that includes: receiving genetic information for first and second populations of individuals, the information including information about a plurality of genetic markers for each of the individuals; and returning a scalar value that is a function of the marker distribution for the first and second populations and the degree of covariance among the markers. The method can be used, e.g., for comparing a first and second population of individuals. For example, the function is a distance function.

The distance can be weighted, e.g., for genetic (e.g., allelic) variability, variance, and co-variance. The distance can be a function of a Euclidean distance, z-score distance, Bhattacharyya distance, Mahalanobis distance, Matusita distance, divergence metric, Chernoff distance, angular metric, Earth Mover's distance, Hausdorff distance, City Block (Manhattan) distance, Chebychev distance, Minkowski distance, or Canberra distance. In another example, the comparative function is a function of a statistical test, e.g., the mean chi-square of the G-test or one minus the Pearson correlation.

In one example, the function weights each allele by the degree of variability of the respective allele, e.g., by its allele frequency in a third population or the first or second population. The method can include other features described herein.
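A weighting of this kind can be sketched as follows. The sketch assumes genotypes coded as 0, 1, or 2 copies of the minor allele, and it takes the variability of each marker to be p(1 - p), where p is the allele frequency in a reference (third) population. The coding, the choice of p(1 - p), the `floor` parameter, and the function names are assumptions of this illustration, and covariance among markers is deliberately not modeled here.

```python
def allele_frequencies(genotypes):
    """Minor-allele frequency per marker.

    genotypes: one row per individual, one column per marker, each entry
    being the number of minor-allele copies (0, 1, or 2).
    """
    n = len(genotypes)
    return [sum(col) / (2 * n) for col in zip(*genotypes)]


def weighted_frequency_distance(pop_a, pop_b, reference, floor=1e-3):
    """Scalar distance between two populations' allele frequencies.

    Each marker's squared frequency difference is divided by the marker's
    variability in the reference population. floor is a hypothetical lower
    bound that avoids division by zero for monomorphic markers.
    """
    freq_a = allele_frequencies(pop_a)
    freq_b = allele_frequencies(pop_b)
    freq_ref = allele_frequencies(reference)
    total = 0.0
    for a, b, r in zip(freq_a, freq_b, freq_ref):
        variability = max(r * (1.0 - r), floor)
        total += (a - b) ** 2 / variability
    return total ** 0.5
```

With this weighting, a given frequency difference at a marker that varies little in the reference population counts for more than the same difference at a highly variable marker.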

In another aspect, the invention features a method that includes: receiving information for each individual of a first group of individuals and each individual of a second group of individuals, wherein the information for each individual includes indication about a plurality of different biological features; and evaluating a comparative function that returns a scalar value, compares the information for the first group to information of the second group, and depends on a covariance matrix for at least some features of the plurality of different features. The method can include other features described herein.

In another aspect, the invention features a method that includes: receiving genetic information for a plurality of individuals; identifying a first and second subset of individuals from the plurality of individuals by comparing occurrences of the genetic markers among individuals of the first and second subsets (e.g., complementary, overlapping, or non-complementary subsets); and subjecting the first subset of individuals to a first condition and the second subset of individuals to a second condition. The method can be used, e.g., to perform a controlled study. For example, the first condition can include administering a test treatment, and the second condition includes administering a control/placebo treatment. In one embodiment, the plurality of individuals includes human individuals consenting to participate in a study.

In another aspect, the invention features a machine-readable medium having encoded thereon information including: a first list of records; a second list of records, wherein each record of the first and second list corresponds to a genome and includes genetic information about each of a plurality of genetic loci in the genome; and information describing a relationship between records of the first list and records of the second list, wherein the relationship is a function of the genetic information for at least a subset of the genetic markers, the markers of the subset including markers on at least two different chromosomes, and covariance between alleles of the genetic markers of the subset. The information about the relationship can be stored in a data type that includes a pointer to the first list and a pointer to the second list. For example, the relationship can be based on a result returned by a function or model described herein. For example, the relationship can be a function of distance, e.g., a Mahalanobis distance.

In one aspect, the invention features a method of analyzing information that uses information for each item of a first group of items (e.g., biological samples, chemical samples, materials, or any other item) and each item of a second group of items. The information for each item includes indications about a plurality of different properties. The method includes selecting a subset of items from the second group using a comparison between information for members of the first group (or a subset thereof) and information for members of the subset; and evaluating the relationship of at least one factor to members of the first group relative to members of the selected subset. In a machine-based implementation, at least part of the information can be received, e.g., from a user, or from instrumentation that analyzes a property of an item. The method can also include outputting (e.g., displaying, sending, storing, or transmitting) a result of the comparing, e.g., to a user, a computer, a memory, and so forth. The method can be used, e.g., to make appropriately controlled comparisons between groups of items.

The invention also features algorithms used to implement a comparison described herein and software and systems configured to execute a method described herein. A system can also include a user interface that enables a user to enter, filter, or select information to be used in a comparison and/or to receive a result based on a comparison or information about individuals selected by the system based on a comparison. Instructions for software can be encoded on or in a machine-readable or machine-accessible medium. Computer-based methods can be interfaced with a method that includes evaluating a biological sample and generating a computer-interpretable representation about a feature of the biological sample. Computer-based methods can also be interfaced with a user or another computer system, e.g., to provide an interpretable output, e.g., text, graphic, electronic message, sound, or other signal that can be processed by a user. For example, a computer can send identifiers of members of a selected subset to a user or to another computer system.

The term "trait" refers to any detectable property, e.g., a property of an organism, a cell, or a molecule (except a sequence of a genomic DNA). The term "individual" refers to a discrete entity or an item referenced by the discrete entity. For example, in some implementations, an individual can refer to a sample obtained from a cell or organism. An "allele" refers to a particular genetic variation in a nucleic acid sequence. Such variation can be present in a gene or outside of a gene. For example, the variation can be present in a coding, non-coding, regulatory, or non-functional region of a nucleic acid sequence. Variations can be present in euchromatin or heterochromatin and so forth.

As used herein, the term "polymorphism" generally refers to any variation in sequence at a given position or region of nucleic acid sequence between individuals in a population, e.g., human individuals. Variations include nucleotide substitutions (e.g., transitions and transversions), insertions, deletions, inversions, and other rearrangements. A variation can encompass one or more nucleotide positions in a reference sequence that are absent, altered, inverted, or otherwise rearranged in another sequence. Some exemplary polymorphisms cause one or more changes in the amino acid sequence of an encoded protein. For example, the MTP rs2866164 SNP causes an amino acid substitution at position 95. Other exemplary polymorphisms can affect regulation, e.g., transcription, translation, splicing, mRNA or protein stability, mRNA or protein localization, chromatin organization, and so forth. Still other exemplary polymorphisms are silent or are only manifest under particular circumstances. Even completely silent markers are useful, e.g., as indicators. For example, they may be tightly linked to a marker that is causative of a particular property. Typically a polymorphic marker described herein is an inherited variant, but may also arise through a spontaneous recombination event, or by artificial means, e.g., by a targeted genetic manipulation.

An "MTP protein complex" refers to a protein complex that includes MTP and associated proteins, e.g., protein disulfide isomerase (PDI) and Apo B particles.

As used herein, the term "nucleic acid molecule" includes DNA molecules (e.g., a cDNA or genomic DNA), RNA molecules (e.g., an mRNA) and analogs of the DNA or RNA. A DNA or RNA analog can be synthesized from nucleotide analogs.

The nucleic acid molecule can be single-stranded or double-stranded, e.g., double-stranded DNA or a double-stranded RNA.

The term "isolated nucleic acid molecule" or "purified nucleic acid molecule" includes nucleic acid molecules that are separated from other nucleic acid molecules present in the natural source of the nucleic acid. For example, the invention features isolated nucleic acids that include one or more polymorphic positions associated with a predisposition for longevity. A position associated with a predisposition may be an association that confers a property that favors longevity or an association with a property that impairs longevity. In some embodiments, an "isolated" nucleic acid is free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5' and/or 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of 5' and/or 3' nucleotide sequences which naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. Examples of flanking sequences include adjacent genes, transposons, and regulatory sequences.

Moreover, an "isolated" nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material, of culture medium when produced by recombinant techniques, or of chemical precursors or other chemicals when chemically synthesized.

As used herein, the term "hybridizes under low stringency, medium stringency, high stringency, or very high stringency conditions" describes conditions for hybridization and washing. Guidance for performing hybridization reactions can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6, which is incorporated by reference. Aqueous and nonaqueous methods are described in that reference and either can be used. Specific hybridization conditions referred to herein are as follows: 1) low stringency hybridization conditions in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by two washes in 0.2X SSC, 0.1% SDS at least at 50°C (the temperature of the washes can be increased to 55°C for low stringency conditions); 2) medium stringency hybridization conditions in 6X SSC at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 60°C; 3) high stringency hybridization conditions in 6X SSC at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 65°C; and preferably 4) very high stringency hybridization conditions are 0.5 M sodium phosphate, 7% SDS at 65°C, followed by one or more washes at 0.2X SSC, 1% SDS at 65°C. Very high stringency conditions (4) are the preferred conditions and the ones that should be used unless otherwise specified. Methods of the invention can include use of an isolated nucleic acid molecule of the invention that hybridizes under a stringency condition described herein to a sequence described herein or use of a polypeptide encoded by such a sequence, e.g., the molecule can be a naturally occurring variant.

An "isolated" or "purified" polypeptide or protein is substantially free of cellular material or other contaminating proteins from the cell or tissue source from which the protein is derived, or substantially free from chemical precursors or other chemicals when chemically synthesized. "Substantially free" means that the protein of interest in the preparation is at least 10% pure.

Calculations of homology or sequence identity between sequences (the terms are used interchangeably herein) are performed as follows. To determine the percent identity of two amino acid sequences, or of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). In a preferred embodiment, the length of a reference sequence aligned for comparison purposes is at least 30%, preferably at least 40%, more preferably at least 50%, 60%, and even more preferably at least 70%, 80%, 90%, 100% of the length of the reference sequence. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein amino acid or nucleic acid "identity" is equivalent to amino acid or nucleic acid "homology"). The invention includes sequences that are at least 70% homologous to human MTP nucleic acid or amino acid sequences. Such related sequences can also be used to evaluate subjects. The percent identity between two amino acid or nucleotide sequences can be determined using the algorithm of Meyers and Miller ((1989) CABIOS, 4:11-17) which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4.
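The position-by-position comparison described above can be illustrated with a short sketch that operates on two sequences that have already been aligned. The sketch does not perform the alignment itself and does not reproduce the ALIGN program; the gap character, the treatment of a gap opposite a residue as a non-identical position, and the choice of denominator (all aligned columns other than gap-gap columns) are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared alignment columns holding identical residues.

    aligned_a, aligned_b: equal-length strings in which `gap` marks
    positions introduced by the alignment. Columns where both sequences
    have a gap are skipped; a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no positions to compare")
    return 100.0 * identical / compared
```

For instance, `percent_identity("ACGT-A", "ACCTTA")` compares six columns, four of which hold the same nucleotide, so under the assumptions above the sequences are about 66.7% identical and would fall short of a 70% threshold.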

"Subject,"as used herein, refers to human and non-human animals. The term "non-human animals"of the invention includes all vertebrates, e. g. , mammals, such as non-human primates (particularly higher primates), sheep, dog, rodent (e. g. , mouse or rat), guinea pig, goat, pig, cat, rabbits, cow, and non-mammals, such as chickens, amphibians, reptiles, etc. In a preferred embodiment, the subject is a human. In another embodiment, the subject is an experimental animal or animal suitable as a disease model.

A "purified preparation of cells", as used herein, refers to an in vitro preparation of cells. In the case of cells from multicellular organisms (e.g., plants and animals), a purified preparation of cells is a subset of cells obtained from the organism, not the entire intact organism. In the case of unicellular microorganisms (e.g., cultured cells and microbial cells), it consists of a preparation of at least 10% and more preferably 50% of the subject cells.

The term "longevity" refers to the ability to live to a chronological age that is at least above the median life expectancy for a population. For example, longevity can be manifest as the ability or enhanced probability of living to an age of at least the 65th, 70th, 80th, 85th, 90th, 95th, 97th, or 98th percentile for a given population, or as a decreased susceptibility to a disease or disorder, e.g., an age-related disease and/or disorder.

In humans, for example, longevity can include living to at least 80, 85, 90, 95, 96, 97, 98, or 100 years of age.
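A percentile-based criterion of this kind can be sketched as follows. The sketch assumes that a list of ages at death is available for the reference population and uses the nearest-rank method to locate the age at a given percentile; the data source, the percentile method, and the function names are assumptions made for illustration only.

```python
import math


def percentile_age(ages_at_death, percentile):
    """Age at the given percentile of the population (nearest-rank method)."""
    ordered = sorted(ages_at_death)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def is_long_lived(age, ages_at_death, percentile=85):
    """True when the age reaches at least the stated population percentile."""
    return age >= percentile_age(ages_at_death, percentile)
```

In a hypothetical population whose ages at death are the integers 1 through 100, the 85th percentile age is 85, so an individual aged 90 would satisfy an 85th percentile criterion and an individual aged 80 would not.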

The term "chronological age" as used herein refers to time elapsed since a preselected event, such as conception, a defined embryological or fetal stage, or, more preferably, birth.

In contrast, the term "biological age" refers to manifestations of the passage of time that are not linearly fixed with the amount of time elapsed. The manifestations of biological aging are varied and may depend on the species of organism, environmental conditions, and, as discussed herein, genotype. Exemplary manifestations of biological aging in mammals include endocrine changes (for example, puberty, menses, changes in fertility or fecundity, menopause, and secondary sex characteristics, such as balding), metabolic changes (for example, changes in appetite and activity), and immunological changes (for example, changes in resistance to disease). The appearance of mammals also changes with biological age, for example, graying of hair, wrinkling of skin, and so forth. With respect to a different class of animals, the nematode C. elegans also has manifestations of biological aging, for example, changes in fecundity, activity, responsiveness to stimuli, and appearance (e.g., change in intestinal autofluorescence and flaccidity). In many cases, the remaining potential lifespan of an individual is a function of its biological age.

Methods of the invention have many uses and provide numerous advantages.

For example, some methods of the invention can be used to control for the stratification problem and to correct for type I and type II errors. The methods can be used, for example, to identify a cohort of individuals, or to cluster individuals.

Accordingly, methods of the invention can greatly assist the analysis of biological information, for example, genetic analysis and other studies that may be affected by the genetic composition of their subjects. As with all methods pertaining to genetic analysis, methods described herein should accord, in their application, with the highest ethical standards.

Identifying genetic loci that affect longevity provides benefit, for example, for predicting individual lifespan and propensity for a disease normally associated with old age, and for drug discovery (e.g., the identification of drugs that mimic the effect of a long-lived allele). Predictive tests would allow for early prophylactic intervention. Accordingly, there is a need for the identification of genetic markers or genes that are indicative of the propensity for longevity and for methods of using such markers and genes.

Other features and advantages of the instant invention will become more apparent from the following detailed description and claims. Embodiments of the invention can include any combination of features described herein. All patents, patent applications, and publications cited herein are incorporated by reference in their entirety.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts haplotype blocks identified in the MTP region of chromosome IV.

FIG. 2 is a flowchart describing an exemplary strategy for identifying a polymorphism associated with longevity.

FIG. 3 is a flowchart of an exemplary method for comparing a case group to a selected control group.

FIG. 4 is a schematic of exemplary data structures.

FIG. 5 is a schematic of an exemplary computer system that can be used to implement aspects of the invention.

DETAILED DESCRIPTION

This application describes, in part, methods for evaluating information from groups of individuals. One such method was used to identify a genetic locus associated with longevity in humans. Accordingly, in another part, the application describes methods and compositions for evaluating a genetic locus associated with longevity.

The MTP locus

As detailed in the Examples below, longevity in humans is associated with certain genetic polymorphisms on chromosome IV. The most significantly implicated locus is linked to single nucleotide polymorphism (SNP) marker rs1553432 and corresponds to the MTP locus, e.g., a region within 80 kb of the MTP gene ATG. This locus is within a larger region that has linkage at the D4S1564 marker (also referred to as AFM248zg9, 248zg9, and Z23817) on human chromosome 4, and which is approximately 10-20 cM in size. This larger 10-20 cM region with linkage at the D4S1564 marker is described in PCT WO 02/14552.

Markers in the MTP locus can be used as indicators for the presence of a genetic predisposition that is manifested as increased longevity (or, conversely, reduced susceptibility to diseases causing mortality). Particularly useful markers are described herein, and include, for example, rs1800591, rs2866164, and rs1553432.

In particular, the following two alleles were sufficient to distinguish this allele from all others: 1) rs1800591 (also known as -493 G/T), and 2) rs2866164 (also known as MTP Q/H 95).

Indeed, in the observed individuals, rs2866164 was perfectly correlated with rs1800591. rs1800591 is a polymorphic variant in the MTP promoter. The region surrounding the rs1800591 polymorphic position is as follows:

>gnl dbSNP rs1800591 allelePos=26 totalLen=51 taxid=9606 snpclass=1 alleles='T/G'|mol=Genomic|build=89
TTAACATTAT TTTGAAGTGA TTGGTKGTGGTATGAA TTAACAGTTT AAATT (SEQ ID NO: 1)

wherein K is the polymorphic position and is typically T or G.

rs2866164 (MTP Q/H 95) results in a semi-conservative amino acid change (from glutamine to histidine) in exon three at the protein's 95th translated amino acid. The region surrounding the rs2866164 polymorphic position is as follows:

>gnl dbSNP rs2866164 allelePos=404 totalLen=590 taxid=9606 snpclass=1 alleles='C/G'|mol=Genomic|build=101
GAattaggta aaaacaaaag taacctaaat aaagtatgaa ttttagttaa taatatatta atattggttt attaataata acaaatcaac tacactaatg taagttatta acagaagtaa ctgtgcaggc tatatgagaa atctttgtgt tattttagca atttttctgt aaatctaaaa ctgttataaa caataaaCCT ATTTTAAAGT AACATATACA TTATACAATG TGAGAAAATA TAATGGCAAA AAAAGAAAAT GCAAGCACAA ACTATATTCC GTCTCTCAGG GATAATAATT ATTAAGGTGT TCAGTGAATA TTGTTCTAAT CCTTTTTCTG TGCTCATATG TATGTATATT GTTTAAACAA GAAATCTCAA ACCATATCTA CATCTGCTGT TTAS CAGACTGCTT CCTTTACTTA CTAAGATATG GTAAATATAC TATATTTACA AGTTTACATA TTTACATATA CCAAACTATA CAAAATACAC ACTTTTATAG TATTATTTTA AAATGGCTTC ATGGAATTCT GTTGAATTGA TACGCCATAA TTTATTGAAC TATTTTTGTC ATTAACCAAA CATGGT (SEQ ID NO: 2)

wherein S is the polymorphic position and is typically G or C.
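As an illustration only, the relationship between the ambiguity code at the polymorphic position and the two allelic sequences can be sketched in Python. The function name `expand_alleles` and the dictionary `IUPAC` are hypothetical names introduced for this sketch; the sequence string and the position value are taken from the rs1800591 record above, and the code letters K, S, and R follow the standard IUPAC nucleotide ambiguity codes.

```python
# Standard IUPAC ambiguity codes for the three markers discussed in this section.
IUPAC = {"K": ("G", "T"), "S": ("C", "G"), "R": ("A", "G")}

# SEQ ID NO: 1 as printed above, with the block spacing removed.
SEQ_ID_NO_1 = "".join(
    "TTAACATTAT TTTGAAGTGA TTGGTKGTGGTATGAA TTAACAGTTT AAATT".split()
)
ALLELE_POS = 26  # 1-based position, from allelePos=26 in the dbSNP header


def expand_alleles(seq, allele_pos):
    """Return {base: sequence} with the ambiguity code replaced by each allele."""
    idx = allele_pos - 1  # convert the 1-based position to a 0-based index
    code = seq[idx]
    if code not in IUPAC:
        raise ValueError(f"position {allele_pos} holds {code!r}, not an ambiguity code")
    return {base: seq[:idx] + base + seq[idx + 1:] for base in IUPAC[code]}


variants = expand_alleles(SEQ_ID_NO_1, ALLELE_POS)
for base in sorted(variants):
    print(base, variants[base])
```

The same function could be applied to SEQ ID NO: 2 with position 404, where the code S stands for C or G.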

An exemplary MTP protein includes the following amino acid sequence (see, e.g., gi|4557763|ref|NP_000244.1):

MILLAVLFLCFISSYSASVKGHTTGLSLNNDRLYKLTYSTEVLLDRGKGKLQDSVGYRIS SNVDVALLW R NPDGDDDQLIQITMKDVNVENVNQQRGEKSIFKGKSPSKIMGKENLEALQRPTLLHLIHG KVKEFYSYQ N EAVAIENIKRGLASLFQTQLSSGTTNEVDISGNCKVTYQAHQDKVIKIKALDSCKIARSG FTTPNQVLG V SSKATSVTTYKIEDSFVIAVLAEETHNFGLNFLQTIKGKIVSKQKLELKTTEAGPRLMSG KQAAAIIKA V DSKYTAIPIVGQVFQSHCKGCPSLSELWRSTRKYLQPDNLSKAEAVRNFLAFIQHLRTAK KEEILQILK M ENKEVLPQLVDAVTSAQTSDSLEAILDFLDFKSDSSIILQERFLYACGFASHPNEELLRA LISKFKGSI G SSDIRETVMIITGTLVRKLCQNEGCKLKAWEAKKLILGGLEKAEKKEDTRMYLLALKNAL LPEGIPSL L KYAEAGEGPISHLATTALQRYDLPFITDEVKKTLNRIYHQNRKVHEKTVRTAAAAIILNN NPSYMDVKN I LLSIGELPQEMNKYMLAIVQDILRLEMPASKIVRRVLKEMVAHNYDRFSRSGSSSAYTGY IERSPRSAS T YSLDILYSGSGILRRSNLNIFQYIGKAGLHGSQWIEAQGLEALIAATPDEGEENLDSYAG MSAILFDV Q LRPVTFFNGYSDLMSKMLSASGDPISVVKGLILLIDHSQELQLQSGLKANIEVQGGLAID ISGAMEFSL W YRESKTRVKNRVTWITTDITVDSSFVKAGLETSTETEAGLEFISTVQFSQYPFLVCMQMD KDEAPFRQ F EKKYERLSTGRGYVSQKRKESVLAGCEFPLHQENSEMCKWFAPQPDSTSSGWF (SEQ ID NO: 3)

The above sequence includes Q at position 95.

The Q95H variant can include the following amino acid sequence:

MILLAVLFLCFISSYSASVKGHTTGLSLNNDRLYKLTYSTEVLLDRGKGKLQDSVGYRIS SNVDVALLW R NPDGDDDQLIQITMKDVNVENVNQHRGEKSIFKGKSPSKIMGKENLEALQRPTLLHLIHG KVKEFYSYQ N EAVAIENIKRGLASLFQTQLSSGTTNEVDISGNCKVTYQAHQDKVIKIKALDSCKIARSG FTTPNQVLG V SSKATSVTTYKIEDSFVIAVLAEETHNFGLNFLQTIKGKIVSKQKLELKTTEAGPRLMSG KQAAAIIKA V DSKYTAIPIVGQVFQSHCKGCPSLSELWRSTRKYLQPDNLSKAEAVRNFLAFIQHLRTAK KEEILQILK M ENKEVLPQLVDAVTSAQTSDSLEAILDFLDFKSDSSIILQERFLYACGFASHPNEELLRA LISKFKGSI G SSDIRETVMIITGTLVRKLCQNEGCKLKAWEAKKLILGGLEKAEKKEDTRMYLLALKNAL LPEGIPSL L KYAEAGEGPISHLATTALQRYDLPFITDEVKKTLNRIYHQNRKVHEKTVRTAAAAIILNN NPSYMDVKN I LLSIGELPQEMNKYMLAIVQDILRLEMPASKIVRRVLKEMVAHNYDRFSRSGSSSAYTGY IERSPRSAS T YSLDILYSGSGILRRSNLNIFQYIGKAGLHGSQWIEAQGLEALIAATPDEGEENLDSYAG MSAILFDV Q LRPVTFFNGYSDLMSKMLSASGDPISVVKGLILLIDHSQELQLQSGLKANIEVQGGLAID ISGAMEFSL W YRESKTRVKNRVTWITTDITVDSSFVKAGLETSTETEAGLEFISTVQFSQYPFLVCMQMD KDEAPFRQ F EKKYERLSTGRGYVSQKRKESVLAGCEFPLHQENSEMCKWFAPQPDSTSSGWF (SEQ ID NO: 4)

Other markers in the MTP locus and in the 10-20 cM region linked to the D4S1564 marker are also useful for evaluating predisposition for longevity.

Another exemplary marker is rs1553432, which is described as follows: >gnl dbSNP rs1553432 allelePos=705 totalLen=1049 taxid=9606 snpclass=1 alleles='A/G'|mol=Genomic|build=88 wherein R is A or G.

Presence of an inherited variant at the rs1553432 marker, or at a locus surrounding the rs1553432 marker which comprises a polynucleotide sequence that encodes MTP, or of a polymorphism within the longevity locus containing the MTP gene, is indicative of the propensity for longevity, e.g., the ability or enhanced probability of living to an age of at least the 65th, 70th, 80th, 85th, 90th, 95th, 97th, or 98th percentile for a given population or to at least 75, 80, 85, 90, 95, or 100 years of age, or of a decreased susceptibility to a disease or disorder, e.g., an age-related disease or disorder.

Typically a polymorphic marker described herein is an inherited variant, but may also arise through a spontaneous recombination event, or by artificial means, e. g., by a targeted genetic manipulation.

In another preferred embodiment, the invention comprises methods for detecting a longevity marker in a biological sample; in particular, the longevity marker is a polymorphic marker at the rs1553432 marker, or at a locus surrounding the rs1553432 marker which comprises a polynucleotide sequence that encodes MTP, or a polymorphism within the longevity locus comprising the MTP gene. Such methods can use any suitable means, as will be readily understood by one of skill in the art.

For example, a suitable method can comprise amplifying DNA in the region of human chromosome 4 comprising the MTP gene by means such as PCR. The amplification product comprises a region of human chromosome 4 that contains the locus associated with longevity, by familial linkage and genetic association. The common presence of a variant of the longevity marker among individuals of old age, compared with a control population, is evidence of the presence of a polymorphic variant associated with the propensity for old age. In addition, the common presence of a variant of the longevity marker among individuals of old age, compared with a control population, is evidence of the presence of a polymorphic variant associated with a decreased propensity for age-related diseases or disorders. Preferably, the analysis is conducted with age-stratified samples. Generally, "old age" refers to people surviving to at least about 90 years of age, more preferably to at least about 98 years of age. The control population refers to an ethnically balanced set of individuals.

More particularly, the biological sample to be assessed can be obtained from any nucleated cell from the individual. For assay of genomic DNA, virtually any biological sample, except pure red blood cells, is suitable. For example, convenient tissue samples include whole blood, skin, hair, semen, saliva, sweat, tears, fecal material, and urine. For assay of cDNA or mRNA, the tissue sample must be obtained from an organ in which the target nucleic acid is expressed.

Variants of the rs1553432 marker or MTP gene in association with extreme longevity, by familial linkage and genetic association, are indicative of the propensity for old age and the likelihood of avoiding diseases of old age. For example, the invention provides the ability to predict the propensity for diseases such as heart disease, cardiovascular diseases, stroke, Alzheimer's disease, Parkinson's disease and other such neurodegenerative diseases, cancer, diabetes, obesity, ocular disease, arthritis, osteoporosis, liver disease, and the like, which are associated with the aging process. Accordingly, methods of the invention are useful not only to predict the likelihood of longevity, but also to indicate possible early therapeutic intervention to prevent (prophylactic treatment) or to lessen the effects of (therapeutic treatment) diseases associated with aging.

One genetic study that indicates that the MTP locus affects human longevity is described in the example below.

The identification of the MTP locus as a genetic locus that affects longevity and the rate of aging confirms the hypothesis that aging has a genetic component.

The MTP locus as shown herein contributes to properties that facilitate longevity, e.g., a lifespan that is in at least the 65th, 70th, 80th, 85th, 90th, 95th, 97th, or 98th percentile for a given population. Independent support for a genetic component is suggested by, for example, the finding that certain genetic polymorphisms of the apolipoprotein E ε4 allele are rare among centenarians (Schachter et al., Nat. Genet., 6: 29-32 (1994) and Rebeck et al., Neurology, 44: 1513-16 (1994)). These under-represented polymorphisms in apolipoprotein E ε4 have been associated with Alzheimer's disease and cardiovascular disease, two diseases often associated with increasing age.

Another study observed that the offspring of centenarians had more favorable lipid profile characteristics compared to ethnically matched and unmatched controls (Barzilai et al., J. Am. Geriatr. Soc., 49: 76-79 (2001)). These and other observations indicate that there may be one or more genetic loci that influence longevity.

Genetic studies in other species, including mammals, Drosophila and C. elegans, indicate that specific genetic polymorphisms have powerful influences upon life span (defined by the age of the oldest member of the species). Some genetic variations in model organisms are known to be in genes that encode factors that participate in basic mechanisms of metabolism and aging.

Group Selection

In another aspect, the invention features a method for comparing individuals using multiple variables. The method can be used, e.g., to compare one group of individuals to another group and to cluster individuals into groups based on similarities. For example, the method can be used to classify individuals based on a multivariate comparison to a predetermined group of individuals. In one embodiment, the method selects a subset of individuals from a pool based on a multivariate comparison of members of the pool to members of the predetermined group.

The comparison can be used affirmatively to select a subset of individuals that are similar (e.g., most similar) to the predetermined group or it can be used negatively to select a subset of individuals that are dissimilar to the predetermined group. The method is not limited to information about genetic composition and may include information about other characteristics (e.g., in addition to genetic information or instead of genetic information). Application of the method to classify individuals based on genetic compositions is used only as a convenient illustration.

In one implementation, individuals are matched in order to select a control group for another group of individuals (the "case group"). Referring to the exemplary method in FIG. 3, case group members are identified 110. Similarly, potential control group members are identified 130 (e.g., before, after, or concurrently with the case group). The genotypes of members of each group are evaluated 120, 140. The genotype can include information about at least one genetic polymorphism. A subset of the potential control group members is selected 150. The subgroup is used to define the "control group." A feature (typically independent of the information used to classify the control group) of members of the case group is compared 160 to members of the control group. For example, statistical methods can be used to evaluate association of a feature with the case group relative to the control group. In one implementation, a LOD score (logarithm of the odds) is determined that evaluates the probability that a genetic polymorphism is associated with the case group relative to the control group.

In one example, the case group may be preselected for a particular criterion (e.g., a phenotypic trait). To correlate a genetic polymorphism with a phenotypic trait, the presence of a genetic polymorphism among members of a case group defined by individuals that have the phenotypic trait can be compared to the presence of that polymorphism among members of the selected control group. The LOD score for association between the polymorphism and the trait can be determined. In another example, the case group can be human persons volunteering for an experimental protocol.

In a related aspect, any two groups of individuals are matched. The two groups are identified by a relationship (e.g., a similarity relationship) using a particular model (e.g., a neural network, Bayesian network, or information theory model) or comparison function. The two groups can be distinguished by prior, concurrent, or subsequent criteria. In one example, the two groups can be subjected to separate conditions after the matching. In another example, one group is distinguished by a prior criterion; that is, prior to the matching, the first group is selected based on a criterion, and the second group is selected from a general pool based on similarity to the first group. It is possible to use a general pool that has not been evaluated for the criterion (see, for example, the longevity study below).

Sample matching enables acquisition of statistical information about the association of a feature or multiple features with one group or the other. Additional groups (e.g., three or more groups) can be identified as needed, e.g., for more complex analyses.

Genetic Information

Genetic information refers to any indication about nucleic acid sequence content. Genetic information can include, for example, an indication about the presence or absence of a particular polymorphism, e.g., one or more nucleotide variations. Exemplary polymorphisms include a single nucleotide polymorphism (SNP), a restriction site or restriction fragment length, an insertion, an inversion, a deletion, a repeat (e.g., a trinucleotide repeat or a retroviral repeat), and so forth. In some embodiments, the genetic information describes a haplotype, e.g., a plurality of polymorphisms on the same chromosome. However, in many embodiments, the genetic information is unphased.

It is possible to digitally record or communicate genetic information in a variety of ways. Typical representations include one or more bits, or a text string. For example, a biallelic marker can be described using two bits. In one embodiment, the first bit indicates whether the first allele (e.g., the minor allele) is present, and the second bit indicates whether the other allele (e.g., the major allele) is present. For markers that are multi-allelic, e.g., where more than two alleles are possible, additional bits can be used, as well as other forms of encoding (e.g., binary, hexadecimal, or text, e.g., ASCII or Unicode, and so forth). The information is typically unphased.

In another embodiment, which uses phased genetic information, the first bit is associated with a particular chromosome, e.g., the maternal chromosome, and "0" can be assigned to the minor allele, and "1" can be assigned to the major allele. The second bit is similarly associated with the other chromosome, e.g., the paternal chromosome. In still another embodiment, which can be used with unphased genetic information, two bits are used to encode the numbers -1, 0, and 1. Homozygotes for the minor allele are assigned the value -1, heterozygotes 0, and major allele homozygotes 1.
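The three encodings described above can be sketched as follows. This is a minimal Python illustration, not a prescribed implementation: the function names are hypothetical, a genotype is represented as a pair of allele letters, and the choice of "T" as the minor allele and "G" as the major allele in the usage lines is an assumption made only for the example.

```python
def presence_bits(genotype, minor, major):
    """Unphased two-bit code: (minor allele present?, major allele present?)."""
    return (int(minor in genotype), int(major in genotype))


def phased_bits(maternal, paternal, major):
    """Phased two-bit code: one bit per chromosome, 0 = minor allele, 1 = major allele."""
    return (int(maternal == major), int(paternal == major))


def signed_code(genotype, minor, major):
    """Unphased code: -1 minor homozygote, 0 heterozygote, 1 major homozygote."""
    alleles = sorted(genotype)
    if alleles == [minor, minor]:
        return -1
    if alleles == [major, major]:
        return 1
    if alleles == sorted([minor, major]):
        return 0
    raise ValueError(f"genotype {genotype!r} is not biallelic for {minor}/{major}")


# Hypothetical biallelic marker with assumed minor allele "T" and major allele "G".
for gt in [("T", "T"), ("G", "T"), ("G", "G")]:
    print(gt, presence_bits(gt, "T", "G"), signed_code(gt, "T", "G"))
```

The signed code yields one number per marker, so a panel of markers maps each individual to a numeric vector that a distance function can consume directly.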

Distance Measures

A distance measure can be used to compare two multivariate variables. The distance is a scalar value that represents a degree of similarity.

One exemplary distance is the Mahalanobis distance. The Mahalanobis distance is a measure of distance between two multivariate means that normalizes each dimension based on the covariance matrix: $D^2 = (V_1 - V_2)^T S^{-1} (V_1 - V_2)$, where $V_1$ is the mean vector for the cases, $V_2$ is the mean vector for the controls, and $S^{-1}$ is the inverse of the covariance matrix $S$. The superscript $T$ designates the transpose of the difference vector. $D$ is the Mahalanobis distance. However, for most purposes, $D^2$, or any monotonic function of $D$, can be used as an indicator of distance. The member $S_{ij}$ of the covariance matrix $S$ is the covariance between the values of the i'th and j'th variables, as calculated from pooling data from both the case and control groups. In this matrix, values along the diagonal represent the variance of a particular variable.
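A minimal Python sketch of this calculation is given below, using only the standard library. It assumes that "pooling" means computing a single sample covariance matrix over the concatenated case and control rows, which is one reading of the description above; the function names and the toy data are hypothetical. The product of the inverse covariance matrix with the difference vector is obtained by solving a linear system rather than by forming the inverse explicitly.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix S; S[i][j] is the covariance of variables i and j."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_sq(cases, controls):
    """Squared Mahalanobis distance between the two group mean vectors."""
    v1, v2 = mean_vector(cases), mean_vector(controls)
    diff = [a - b for a, b in zip(v1, v2)]
    s = covariance_matrix(cases + controls)  # assumption: covariance of pooled rows
    y = solve(s, diff)                       # y = S^{-1} (V1 - V2)
    return sum(d * yi for d, yi in zip(diff, y))


# Toy data: two variables per individual, coded -1/0/1.
cases = [[1, 1], [1, -1]]
controls = [[-1, 1], [-1, -1]]
print(mahalanobis_sq(cases, controls))
```

In this toy example the two variables are uncorrelated in the pooled data, so the result reduces to the squared difference of the first-variable means divided by that variable's variance.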

Other measures of multivariate distance or similarity that can be used include Euclidean distance, z-score distance, Bhattacharyya distance, Matusita distance, divergence metric, Chernoff distance, angular metric, Earth Mover's distance, Hausdorff distance, one minus Pearson correlation, City Block (Manhattan) distance, Chebyshev distance, Minkowski distance, and the Canberra distance. The Euclidean distance, for example, does not account for the variance of a particular variable or the covariance between different variables.
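For orientation, a few of the simpler measures listed above can be written in a handful of lines. The following Python sketch uses the textbook definitions of these distances; the function names and the example vectors are hypothetical and are not taken from the study described herein.

```python
def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def manhattan(u, v):
    """City Block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def minkowski(u, v, p):
    """Generalizes Manhattan (p=1) and Euclidean (p=2) distance."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def canberra(u, v):
    """Sum of |a - b| / (|a| + |b|), skipping coordinates where both are zero."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if a or b)


u, v = [0, 0], [3, 4]
for name, d in [("euclidean", euclidean(u, v)), ("manhattan", manhattan(u, v)),
                ("chebyshev", chebyshev(u, v)), ("canberra", canberra(u, v))]:
    print(name, d)

# Rescaling one variable changes the Euclidean distance, which illustrates
# why it does not account for per-variable variance.
print(euclidean([0, 0], [30, 4]))
```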

Group Selection

There are many ways of selecting a subset of control samples from a set of potential control samples that minimizes a multivariate distance between the case and control groups.

Incremental searching. One example is an incremental search. In one implementation, the single sample that minimizes the distance to the case group is selected from all the potential control samples for inclusion in the control group. Then, additional samples are added in similar fashion. In other words, in a subsequent cycle, from the remaining potential control samples, the single sample that, when added to the previously selected control sample(s), minimizes the distance to the case group is selected. In one implementation, the distance is minimized by iteratively calculating the distance between the case group and each subset formed by a possible addition. The subset with the smallest distance is advanced to the next cycle. This step is repeated until the desired number of control samples is selected.
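The incremental search can be sketched as a greedy loop. In the following Python illustration, the squared Euclidean distance between group mean vectors is used as a simple stand-in for the group distance (a Mahalanobis distance could be substituted); the function names and toy data are hypothetical.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def group_distance(cases, controls):
    """Stand-in distance: squared Euclidean distance between group mean vectors."""
    v1, v2 = mean_vector(cases), mean_vector(controls)
    return sum((a - b) ** 2 for a, b in zip(v1, v2))


def incremental_search(cases, pool, n_controls):
    """Greedily pick pool indices so each addition minimizes the group distance."""
    selected = []
    remaining = list(range(len(pool)))
    while len(selected) < n_controls and remaining:
        # Try each remaining sample as the next addition and keep the best one.
        best = min(
            remaining,
            key=lambda i: group_distance(cases, [pool[j] for j in selected + [i]]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: two markers per individual, coded -1/0/1.
cases = [[1, 1], [1, 1], [1, 0]]
pool = [[1, 1], [-1, -1], [1, 0], [0, -1]]
print(incremental_search(cases, pool, 2))
```

Each cycle evaluates one candidate subset per remaining sample, so the cost grows with the product of the pool size and the number of controls requested rather than with the number of all possible subsets.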

One-to-one matching. For each sample in the case group, a sample that is most "similar" or "nearest" in multivariate space is selected from the set of potential controls. The set of one-to-one matched samples is then used as the control group or subjected to other minimization procedures.
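A minimal sketch of one-to-one matching follows, again in Python. It assumes matching without replacement (each potential control is used at most once) and processes the cases in the order given; both choices are assumptions made for the illustration, as are the function names and the squared Euclidean distance used to define "nearest".

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two individuals' marker vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def one_to_one_match(cases, pool):
    """For each case, pick the nearest still-unused pool index."""
    available = set(range(len(pool)))
    matches = []
    for case in cases:
        if not available:
            break  # pool exhausted before every case was matched
        best = min(sorted(available), key=lambda i: sq_dist(case, pool[i]))
        matches.append(best)
        available.remove(best)
    return matches


cases = [[1, 1], [-1, 0]]
pool = [[0, 0], [1, 1], [-1, -1], [-1, 0]]
print(one_to_one_match(cases, pool))
```

Because the loop is order-dependent, a different ordering of the cases can give a different set of matches, which is one reason the matched set may be passed to a further minimization step.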

Exhaustive search. Another example is an exhaustive search. All possible subgroups (e.g., of a predetermined size or size range) are enumerated and each subgroup is compared to the case group. The subgroup that compares favorably (e.g., most favorably or other favored subgroups) is selected.
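For small pools, the exhaustive search can be written directly with the standard library. In this Python sketch the subgroup size is fixed, and the squared Euclidean distance between group mean vectors again serves as a stand-in comparison; the function names and toy data are hypothetical.

```python
from itertools import combinations
from math import comb


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def group_distance(cases, controls):
    """Stand-in distance: squared Euclidean distance between group mean vectors."""
    v1, v2 = mean_vector(cases), mean_vector(controls)
    return sum((a - b) ** 2 for a, b in zip(v1, v2))


def exhaustive_search(cases, pool, size):
    """Enumerate every subgroup of the given size and return the closest one."""
    return min(
        combinations(range(len(pool)), size),
        key=lambda idx: group_distance(cases, [pool[i] for i in idx]),
    )


cases = [[1, 1], [1, 1], [1, 0]]
pool = [[1, 1], [-1, -1], [1, 0], [0, -1]]
print(comb(len(pool), 2), "subgroups of size 2")
print(exhaustive_search(cases, pool, 2))
```

The number of subgroups grows combinatorially with pool size, which is the practical motivation for the reduced searches described next.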

Branched searches. This method limits the exhaustive search to a reduced set of possibilities. As subgroups are compared, possible combinations are eliminated, e.g., using the dead-end theorem or other branching methods, to enumerate only some subgroups from the universe of possible subgroups.

Preclustering. It is also possible to compare members of the potential controls to one another to identify clusters of similar members using a comparison function. Then clusters that are similar to individual members or clusters in the case group are selected for inclusion in the control group.

Prefiltering. Prefiltering criteria can be defined to reduce the search size. For example, if all members of the case group have certain properties, it is possible to eliminate members of the potential controls that do not have these properties. For example, if all members of the case group have the same alleles at particular loci, potential controls that do not have these alleles are discarded.
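The allele-based prefilter can be sketched as a set intersection followed by a subset test. In this Python illustration each individual is a dictionary from a marker name to a pair of alleles; the marker names "m1" and "m2", the function names, and the data are all hypothetical.

```python
def shared_alleles(cases):
    """For each locus, the set of alleles carried by every member of the case group."""
    loci = cases[0].keys()
    return {
        locus: set.intersection(*(set(ind[locus]) for ind in cases))
        for locus in loci
    }


def prefilter(cases, pool):
    """Keep only potential controls that carry every allele shared by all cases."""
    required = shared_alleles(cases)
    return [
        ind for ind in pool
        if all(required[locus] <= set(ind[locus]) for locus in required)
    ]


cases = [
    {"m1": ("A", "G"), "m2": ("C", "C")},
    {"m1": ("A", "A"), "m2": ("C", "T")},
]
pool = [
    {"m1": ("A", "A"), "m2": ("C", "C")},  # carries A at m1 and C at m2: kept
    {"m1": ("G", "G"), "m2": ("C", "T")},  # lacks A at m1: discarded
    {"m1": ("A", "G"), "m2": ("T", "T")},  # lacks C at m2: discarded
]
print(prefilter(cases, pool))
```

A locus at which the cases share no allele imposes no constraint, since the empty set is a subset of any genotype.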

Boundary methods. A distance measure can also be used to define a boundary in multivariate space that defines a range of similarity to members of the case group. For example, a Mahalanobis group can be defined using the Mahalanobis distance function. Controls can be selected from the subset of members that are within the boundary.
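One way to picture a boundary is as a radius around the centre of the case group in variance-scaled units. The Python sketch below uses a diagonal simplification of the Mahalanobis distance (each variable is scaled by its own variance in the case group, and covariances are ignored); this simplification, the function names, the radius value, and the data are assumptions made for illustration only.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def variances(rows):
    """Per-variable sample variance."""
    mu = mean_vector(rows)
    n = len(rows)
    return [sum((r[k] - mu[k]) ** 2 for r in rows) / (n - 1) for k in range(len(mu))]


def within_boundary(cases, pool, radius):
    """Indices of pool members whose scaled distance to the case centre is <= radius."""
    mu = mean_vector(cases)
    var = variances(cases)
    if any(v == 0 for v in var):
        raise ValueError("a variable has zero variance in the case group")

    def scaled_sq(x):
        return sum((a - m) ** 2 / v for a, m, v in zip(x, mu, var))

    return [i for i, x in enumerate(pool) if scaled_sq(x) <= radius ** 2]


cases = [[0, 0], [2, 0], [0, 2], [2, 2]]
pool = [[1, 1], [3, 1], [5, 5]]
print(within_boundary(cases, pool, radius=2))
```

Controls could then be drawn from the returned indices, optionally followed by one of the search procedures above.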

As described above, matching is evaluated using a multivariate distance function. However, other methods can be used. For example, a Bayesian network or a model based on information theory can be used.

The success of the matching can depend on the number of markers used, the informativeness of the markers with respect to genetic background, the similarities between the cases and controls being matched, and the degree of oversampling that occurs. Although described above as a selection of "controls" best matched to "cases," the opposite works equally well, and "case" and "control" are only labels to distinguish two groups of samples that are distinguished by some covariate (e.g., trait, phenotype, etc.). Similarly, the comparisons need not be based only on genetic information, but can include, in addition, other biological information, or exclusively non-genetic information.

The matching can be evaluated using a second function, e.g., another distance metric or a statistical function. For matching genetic backgrounds, the mean chi-square of the G-Test statistics can be used to evaluate the matching. If the genetic backgrounds of the two-armed study were perfectly matched, the mean chi-square of the G-Test statistics for these markers would have an expected value of 1.0. In some embodiments, a threshold may be set for the mean chi-square of the G-Test statistics, e.g., less than 1.4, 1.3, 1.2, 1.1, or 1.0.
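A minimal Python sketch of this check is shown below. It assumes that each marker is summarized as a 2 x 2 table of allele counts (rows: case group and control group; columns: the two alleles), so that each G statistic has one degree of freedom and an expected value of 1.0 under perfect matching. The function names, the example counts, and the threshold of 1.4 chosen in the last line are illustrative assumptions.

```python
import math


def g_statistic(table):
    """G-test statistic, 2 * sum(O * ln(O / E)), for a contingency table of counts."""
    total = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # a zero cell contributes nothing to the sum
                expected = row_tot[i] * col_tot[j] / total
                g += obs * math.log(obs / expected)
    return 2.0 * g


def mean_g_statistic(tables):
    """Mean of the per-marker G statistics."""
    return sum(g_statistic(t) for t in tables) / len(tables)


# Hypothetical allele counts for three markers: [cases, controls] x [allele 1, allele 2].
tables = [
    [[30, 70], [28, 72]],
    [[50, 50], [55, 45]],
    [[10, 90], [12, 88]],
]
mean_g = mean_g_statistic(tables)
print(mean_g, "acceptable" if mean_g < 1.4 else "poorly matched")
```

Markers used for this evaluation would ordinarily be ones not expected to be associated with the trait, so that a mean well above 1.0 points to mismatched backgrounds rather than to a true association.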

Exemplary applications

In one embodiment, the method can be used to identify two cohorts of genomes that are balanced relative to each other. The genomes can be from individual organisms, cells, and so forth. One application is to identify a control group of individuals for an experimental (or test) group, particularly where matching the genetic backgrounds of the two groups is important for evaluating data from the experimental and control groups.

The method can be used to identify a control group of individuals that is balanced relative to a test group. For example, the method can be used to evenly match individuals in test and control groups. The method can be used to partition individuals into two groups balanced for a plurality of biological parameters, e.g., genetic composition and/or other biological parameters described herein. Balancing can be general or targeted. General balancing typically involves, e.g., selecting genetic markers without regard for their chromosomal position or association with particular traits. For example, these genetic markers may be distributed randomly throughout the genome, e.g., on at least two chromosomes. General balancing can be used to optimize the genetic backgrounds of the test and control groups. In contrast, targeted balancing can be used to optimize the distribution of heterogeneity in one or more specific regions of the genome between the test and control groups. For example, in a study of a treatment for Alzheimer's disease, it may be useful if the test and control groups include similar distributions of alleles known to be associated with that disease.

It is also possible to select genetic markers based on certain criteria, e.g., criteria that are independent of map position. Exemplary criteria include criteria that depend on distribution of the marker in a population, e.g., a sample population. Such criteria include the relative prevalence of the major and minor allele, and the degree of heterozygosity (e.g., between 0.1-5%, 3-20%, 20-45%, or 30-50%). Exemplary criteria can also include experimental factors, e.g., the degree of certainty that the allele can unambiguously be identified. Other criteria may include reliability of the assay with respect to a specific platform and informativeness of a marker with respect to the genetic background of individuals sampled.

It is possible to survey a broad class of individuals that can qualify as potential controls and identify a panel of biological markers (e. g. , genetic markers) that vary among the potential controls. The panel of markers can then be used to select the subset of controls by comparison to the case group. If required, variance and/or covariance is used as a component of the comparison function to control for the degree of variation.

In some embodiments, the genetic markers are selected based on map position, e. g. , distance from another marker, distance from a centromere or telomere, and distance from heterochromatin.

The methods can be used to map genes that affect a trait of any organism, particularly a polyploid (e. g. , diploid) sexual organism. For example, the method can be used to map genes that may be associated with a human disease, and other human traits, such as resistance to environmental conditions, physical manifestations, and behaviors. In just one application, the method is used to evaluate genes that affect lifespan regulation or an age-related disease or predisposition to such a disease.

Exemplary age-related diseases include: cancer (e.g., breast cancer, colorectal cancer, CCL, CML, prostate cancer); skeletal muscle atrophy; adult-onset diabetes; diabetic nephropathy, neuropathy (e.g., sensory neuropathy, autonomic neuropathy, motor neuropathy, retinopathy); obesity; bone resorption; age-related macular degeneration, AIDS related dementia, ALS, Alzheimer's, Bell's Palsy, atherosclerosis, cardiac diseases (e.g., cardiac dysrhythmias, chronic congestive heart failure, ischemic stroke, coronary artery disease and cardiomyopathy), chronic renal failure, type 2 diabetes, ulceration, cataract, presbyopia, glomerulonephritis, Guillain-Barré syndrome, hemorrhagic stroke, rheumatoid arthritis, inflammatory bowel disease, multiple sclerosis, SLE, Crohn's disease, osteoarthritis, Parkinson's disease, pneumonia, and urinary incontinence. Symptoms and diagnosis of such diseases are well known to medical practitioners.

Similarly, the method can be used to map genes that affect traits of other animals, e. g., agricultural livestock and wild animals. Further, the method can be used to map genes of plants, and sexual parasites.

In another embodiment, the method can be used to identify two cohorts of individuals that are balanced relative to each other based on biological parameters, e. g. , molecular parameters, levels of metabolites, gene expression, protein modification and so forth. The parameters can be evaluated by analyzing individual organisms, organs, tissues, cells, and so forth. One application is to identify a control group of individuals for an experimental (or test) group, particularly where matching the biological state of the two groups is important for evaluating data from the experimental and control groups.

Methods of Evaluating Genetic Information

There are numerous ways of evaluating genetic information. Nucleic acid samples can be analyzed using biophysical techniques (e.g., hybridization, electrophoresis, and so forth), sequencing, enzyme-based techniques, and combinations thereof. For example, hybridization of sample nucleic acids to nucleic acid microarrays can be used to evaluate sequences in an mRNA population and to evaluate genetic polymorphisms. Other hybridization-based techniques include sequence-specific primer binding (e.g., PCR or LCR); Southern analysis of DNA, e.g., genomic DNA; Northern analysis of RNA, e.g., mRNA; fluorescent probe-based techniques (see, e.g., Beaudet et al. (2001) Genome Res. 11(4): 600-8); and allele-specific amplification. Enzymatic techniques include restriction enzyme digestion; sequencing; and single base extension (SBE). These and other techniques are well known to those skilled in the art.

Electrophoretic techniques include capillary electrophoresis and Single-Strand Conformation Polymorphism (SSCP) detection (see, e. g. , Myers et al. (1985) Nature 313: 495-8 and Ganguly (2002) Hum Mutat. 19 (4): 334-42). Other biophysical methods include denaturing high pressure liquid chromatography (DHPLC).

In one embodiment, allele-specific amplification technology that depends on selective PCR amplification may be used to obtain genetic information. Oligonucleotides used as primers for specific amplification may carry the mutation of interest in the center of the molecule (so that amplification depends on differential hybridization) (Gibbs et al. (1989) Nucleic Acids Res. 17: 2437-2448) or at the extreme 3' end of one primer where, under appropriate conditions, mismatch can prevent or reduce polymerase extension (Prossner (1993) Tibtech 11: 238). In addition, it is possible to introduce a restriction site in the region of the mutation to create cleavage-based detection (Gasparini et al. (1992) Mol. Cell Probes 6: 1). In another embodiment, amplification can be performed using Taq ligase (Barany (1991) Proc. Natl. Acad. Sci. USA 88: 189). In such cases, ligation will occur only if there is a perfect match at the 3' end of the 5' sequence, making it possible to detect the presence of a known mutation at a specific site by looking for the presence or absence of amplification.

Enzymatic methods for detecting sequences include amplification-based methods such as the polymerase chain reaction (PCR; Saiki et al. (1985) Science 230: 1350-1354) and ligase chain reaction (LCR; Wu et al. (1989) Genomics 4: 560-569; Barringer et al. (1990) Gene 89: 117-122; F. Barany (1991) Proc. Natl. Acad. Sci. USA 88: 189-193); transcription-based methods, which utilize RNA synthesis by RNA polymerases to amplify nucleic acid (U.S. Pat. No. 6,066,457; U.S. Pat. No. 6,132,997; U.S. Pat. No. 5,716,785; Sarkar et al., Science (1989) 244: 331-34; Stoflet et al., Science (1988) 239: 491); NASBA (U.S. Patent Nos. 5,130,238; 5,409,818; and 5,554,517); rolling circle amplification (RCA; U.S. Patent Nos. 5,854,033 and 6,143,495); and strand displacement amplification (SDA; U.S. Patent Nos. 5,455,166 and 5,624,825). Amplification methods can be used in combination with other techniques.

Other enzymatic techniques include sequencing using polymerases, e.g., DNA polymerases and variations thereof such as single base extension technology. See, e.g., U.S. 6,294,336; U.S. 6,013,431; and U.S. 5,952,174.

Mass spectroscopy (e.g., MALDI-TOF mass spectroscopy) can be used to detect nucleic acid polymorphisms. In one embodiment (e.g., the MassEXTEND™ assay, SEQUENOM, Inc.), a selected nucleotide mixture, missing at least one dNTP and including a single ddNTP, is used to extend a primer that hybridizes near a polymorphism. The nucleotide mixture is selected so that the extension products for the different polymorphisms at the site differ as much as possible in molecular size. The extension reaction is placed on a plate for mass spectroscopy analysis.

Fluorescence-based detection can also be used to detect nucleic acid polymorphisms. For example, different terminator ddNTPs can be labeled with different fluorescent dyes. A primer can be annealed near or immediately adjacent to a polymorphism, and the nucleotide at the polymorphic site can be detected by the type (e.g., "color") of the fluorescent dye that is incorporated.

Hybridization to microarrays can also be used to detect polymorphisms, including SNPs. For example, a set of different oligonucleotides, with the polymorphic nucleotide at varying positions within the oligonucleotides, can be positioned on a nucleic acid array. The extent of hybridization as a function of position, together with hybridization to oligonucleotides specific for the other allele, can be used to determine whether a particular polymorphism is present. See, e.g., U.S. 6,066,454.
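The tiling idea in the preceding paragraph can be illustrated with a short sketch. It assumes a fixed probe length and a known sequence context around the SNP; the function name `tile_probes` and its parameters are hypothetical and are not part of the disclosure. It lists every window of a given length that covers the polymorphic site, so the variable base appears at each possible position within the probe.

```python
def tile_probes(context, site, allele, length):
    """Return (offset, probe) pairs for every window of `length` covering `site`.

    context -- sequence surrounding the SNP (one strand, 5'->3')
    site    -- 0-based index of the polymorphic base within `context`
    allele  -- base to place at the polymorphic position
    offset  -- position of the polymorphic base inside the returned probe
    """
    variant = context[:site] + allele + context[site + 1:]
    first = max(0, site - length + 1)
    last = min(site, len(variant) - length)
    return [(site - start, variant[start:start + length])
            for start in range(first, last + 1)]


if __name__ == "__main__":
    # Context taken from the rs2866164 probe sequence (SEQ ID NO: 28).
    for offset, probe in tile_probes("CTGCTGTTTAGCAGACTGCTTC", 10, "C", 9):
        print(offset, probe)
```

Running the same function once per allele gives the matched probe sets whose hybridization signals would be compared position by position.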

In one implementation, hybridization probes can include one or more additional mismatches to destabilize duplex formation and sensitize the assay. The mismatch may be directly adjacent to the query position, or within 10, 7, 5, 4, 3, or 2 nucleotides of the query position. Hybridization probes can also be selected to have a particular Tm, e.g., between 45-60°C, 55-65°C, or 60-75°C. In a multiplex assay, Tm's can be selected to be within 5, 3, or 2°C of each other, e.g., probes for rs1800591 and rs2866164 can be selected with these criteria.
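As a rough illustration of the multiplex Tm criterion, the sketch below estimates Tm with the Wallace rule (2°C per A/T, 4°C per G/C) and checks whether a probe set falls within a chosen spread. The Wallace rule is a coarse approximation for short oligonucleotides and is an assumption made here for illustration; the text does not specify how Tm is to be calculated, and the function names are hypothetical.

```python
def wallace_tm(seq):
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_compatible(probes, max_spread):
    """True if all estimated Tm values lie within `max_spread` degrees of each other."""
    tms = [wallace_tm(p) for p in probes]
    return max(tms) - min(tms) <= max_spread


if __name__ == "__main__":
    rs1800591_probes = ["TGATTGGTTGTGGTATG", "TGATTGGTGGTGGTATG"]  # SEQ ID NO: 26, 27
    rs2866164_probe = "CTGCTGTTTAGCAGACTGCTTC"                      # SEQ ID NO: 28
    print(tm_compatible(rs1800591_probes, 5))
    print(tm_compatible(rs1800591_probes + [rs2866164_probe], 5))
```

Under this estimate the full-length rs2866164 probe is well outside a 5°C window of the rs1800591 probes, which is the kind of case where a probe would be shortened or lengthened before multiplexing.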

It is also possible to directly sequence the nucleic acid for a particular genetic locus, e.g., by amplification and sequencing, or by amplification, cloning, and sequencing. High-throughput automated (e.g., capillary or microchip-based) sequencing apparatus can be used. In still other embodiments, the sequence of a protein of interest is analyzed to infer its genetic sequence. Methods of analyzing a protein sequence include protein sequencing, mass spectroscopy, sequence-specific immunoglobulins, and protease digestion.

Any combination of the above methods can also be used. The above methods can be used to evaluate any genetic locus, e.g., in a method for analyzing genetic information from particular groups of individuals or in a method for analyzing a polymorphism associated with longevity, e.g., the MTP locus.

Exemplary methods for evaluating the MTP locus

Primers for evaluating an MTP locus can include an extendible 3' terminus that is located, e.g., within 200, 100, 50, 10, 5, 4, 3, 2, or 1 nucleotide of a polymorphic position. For example, primers that include:

TTGAAGTGATTGGT (SEQ ID NO: 6)
TTTTGAAGTGATTGG (SEQ ID NO: 7)
TTATTTTGAAGTGATT (SEQ ID NO: 8)
CATTATTTTGAAGTGA (SEQ ID NO: 9)

can be extended to evaluate the rs1800591 SNP.

On the opposing strand, primers that include, e.g.:

AATTCATACCAC (SEQ ID NO: 10)
TAATTCATACCA (SEQ ID NO: 11)

can be extended to evaluate the rs1800591 SNP.

It is also possible to use primers whose final nucleotide is positioned at the polymorphic site. For example:

TTGAAGTGATTGGTT (SEQ ID NO: 12)
TTGAAGTGATTGGTG (SEQ ID NO: 13)
TTAATTCATACCACA (SEQ ID NO: 14)
TTAATTCATACCACC (SEQ ID NO: 15)

If the final nucleotide is labeled, a mismatch can be detected as a result of exonuclease proof-reading activity.

For example, primers that include:

CATCTGCTGTTTA (SEQ ID NO: 16)
ACATCTGCTGTTT (SEQ ID NO: 17)
TCTACATCTGCTG (SEQ ID NO: 18)

can be extended to evaluate the rs2866164 SNP.

On the opposing strand, primers that include, e.g.:

GGAAGCAGTCTG (SEQ ID NO: 19)
AGGAAGCAGTCT (SEQ ID NO: 20)
GTAAAGGAAGCAGTC (SEQ ID NO: 21)

can be extended to evaluate the rs2866164 SNP. Primers can include additional complementary 5' sequences, e.g., to achieve a desired annealing.

It is also possible to use primers whose final nucleotide is positioned at the polymorphic site. For example:

CATCTGCTGTTTAG (SEQ ID NO: 22)
CATCTGCTGTTTAC (SEQ ID NO: 23)
GGAAGCAGTCTGG (SEQ ID NO: 24)
GGAAGCAGTCTGC (SEQ ID NO: 25)

Probes that bind to the rs1800591 SNP can include the sequence:

TGATTGGTTGTGGTATG (SEQ ID NO: 26)
TGATTGGTGGTGGTATG (SEQ ID NO: 27)

or complements thereof.

Probes that bind to the rs2866164 SNP can include the sequence:

CTGCTGTTTAGCAGACTGCTTC (SEQ ID NO: 28)
CTGCTGTTTACCAGACTGCTTC (SEQ ID NO: 29)

or complements thereof.

Probes can include additional complementary 5' sequences, e.g., to achieve a desired annealing, and/or one, two, or three mismatches, e.g., to destabilize and sensitize the probe as described above.

Similar probes and primers can be constructed for these and other SNPs, described herein or associated with longevity or another property.
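The relationship between the allele-specific primers and the probes listed above can be checked mechanically. The following sketch is illustrative only; the helper names are hypothetical. It locates the polymorphic position by comparing the two allele probes, and then tests whether a primer's 3'-terminal base lines up with that position on either strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def polymorphic_index(probe_a, probe_b):
    """Index of the single base at which two equal-length allele probes differ."""
    if len(probe_a) != len(probe_b):
        raise ValueError("probes must have equal length")
    diffs = [i for i, (a, b) in enumerate(zip(probe_a, probe_b)) if a != b]
    if len(diffs) != 1:
        raise ValueError("probes must differ at exactly one position")
    return diffs[0]


def primer_ends_at_site(primer, probe, site):
    """True if the primer's 3' end matches `probe` up to and including `site`."""
    upstream = probe[:site + 1]
    n = min(len(primer), len(upstream))
    return primer[-n:] == upstream[-n:]


if __name__ == "__main__":
    probe_t, probe_g = "TGATTGGTTGTGGTATG", "TGATTGGTGGTGGTATG"  # SEQ ID NO: 26, 27
    site = polymorphic_index(probe_t, probe_g)
    print(primer_ends_at_site("TTGAAGTGATTGGTT", probe_t, site))  # SEQ ID NO: 12
    # Opposing strand: compare against the reverse complement of the probe.
    rc_site = len(probe_t) - 1 - site
    print(primer_ends_at_site("TTAATTCATACCACA",
                              reverse_complement(probe_t), rc_site))  # SEQ ID NO: 14
```

The same check can be applied to primers and probes designed for other SNPs, given the two allele sequences.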

Other methods of evaluating biological parameters

Other molecular, genetic, cellular, immunological, and other biological methods known in the art can also be used to evaluate a property of a biological system. For general guidance, see, e.g., techniques described in Sambrook & Russell, Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory, N.Y. (2001); Ausubel et al., Current Protocols in Molecular Biology (Greene Publishing Associates and Wiley Interscience, N.Y. (1989)); Harlow, E. and Lane, D. (1988) Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; and updated editions thereof.

For example, antibodies, other immunoglobulins, and other specific binding ligands can be used to detect a biomolecule, e.g., a protein or other antigen. For example, one or more specific antibodies can be used to probe a sample. Various formats are possible, e.g., ELISAs, fluorescence-based assays, Western blots, and protein arrays. Methods of producing polypeptide arrays are described in the art, e.g., in De Wildt et al. (2000) Nature Biotech. 18: 989-994; Lueking et al. (1999) Anal. Biochem. 270: 103-111; Ge, H. (2000) Nucleic Acids Res. 28: e3, I-VII; MacBeath, G., and Schreiber, S. L. (2000) Science 289: 1760-1763; and WO 99/51773A1.

Proteins can also be analyzed using mass spectroscopy, chromatography, electrophoresis, enzyme interaction, or probes that detect post-translational modification (e.g., a phosphorylation, ubiquitination, glycosylation, methylation, or acetylation).

Nucleic acid expression can be detected, e.g., for one or more genes by hybridization-based techniques, e.g., Northern analysis, RT-PCR, SAGE, and nucleic acid arrays. Nucleic acid arrays are useful for profiling multiple mRNA species in a sample. A nucleic acid array can be generated by various methods, e.g., by photolithographic methods (see, e.g., U.S. Patent Nos. 5,143,854; 5,510,270; and 5,527,681), mechanical methods (e.g., directed-flow methods as described in U.S. Patent No. 5,384,261), pin-based methods (e.g., as described in U.S. Pat. Nos. 5,288,514 and 6,101,946), and bead-based techniques (e.g., as described in PCT US/93/04145).

Metabolites can be detected by a variety of means, including enzyme-coupled assays, labeled precursors, and nuclear magnetic resonance (NMR). For example, NMR can be used to determine the relative concentrations of phosphate-based compounds in a sample, e.g., creatine levels. Other metabolic parameters such as redox state, ion concentration (e.g., Ca2+, e.g., using ion-sensitive dyes), and membrane potential can also be detected (e.g., using patch-clamp technology).

Imaging techniques (including NMR, tomographic, radiological, and microscopic methods) can be used to image a sample or an organism. Examples of imaging information include the localization (e.g., tissue or sub-cellular) of a biomolecule (e.g., a protein, mRNA, or metabolite). Some imaging techniques use probes, e.g., fluorescent labels such as fluorescein and rhodamine, nuclear magnetic resonance active labels, short-range radiation emitters, positron emitting isotopes detectable by a positron emission tomography ("PET") scanner, chemiluminescers such as luciferin, and enzymatic markers such as peroxidase or phosphatase.

Fluorescence activated cell sorting (FACS) can be used to profile a cell population (e.g., blood cells). FACS analysis can use one or more labeled antibodies for typing cells, e.g., using cell surface markers. Cells can also be assayed for response to a stimulus, e.g., to a signalling molecule or other perturbation.

Numerous other assays can be used to detect the presence, quality, or quantity of a biomolecule or other biological property. Whole organisms can be assayed, e.g., by exposure to a pathogen, for a behavioral response, and so forth.

Molecular Biology Techniques

Methods described herein can include use of routine techniques in the field of molecular biology, biochemistry, classical genetics, and recombinant genetics. Basic texts disclosing the general methods of use in this invention include Sambrook & Russell, Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory, N.Y. (2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); Scopes (1994) Protein Purification: Principles and Practice, New York: Springer-Verlag; and Ausubel et al., Current Protocols in Molecular Biology (Greene Publishing Associates and Wiley Interscience, N.Y. (1989)).

Computer Implementations

The invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. Methods of the invention can be implemented using a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method actions can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. For example, the invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. A processor can receive instructions and data from a read-only memory and/or a random access memory.

Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including, by way of example, semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).

An example of one such type of computer is depicted in FIG. 5, which shows a block diagram of a programmable processing system (system) 410 suitable for implementing or performing the apparatus or methods of the invention. The system 410 includes a processor 420, a random access memory (RAM) 421, a program memory 422 (for example, a writable read-only memory (ROM) such as a flash ROM), a hard drive controller 423, and an input/output (I/O) controller 424 coupled by a processor (CPU) bus 425. The system 410 can be preprogrammed, in ROM, for example, or it can be programmed (and reprogrammed) by loading a program from another source (for example, from a floppy disk, a CD-ROM, or another computer).

The hard drive controller 423 is coupled to a hard disk 430 suitable for storing executable computer programs, including programs embodying the present invention, and data including storage. The I/O controller 424 is coupled by means of an I/O bus 426 to an I/O interface 427. The I/O interface 427 receives and transmits data in analog or digital form over communication links such as a serial link, local area network, wireless link, and parallel link.

One non-limiting example of an execution environment includes computers running Red Hat Linux, Windows XP (Microsoft), Windows NT 4.0 (Microsoft) or better, or Solaris 2.6 or better (Sun Microsystems) operating systems. Browsers can be Microsoft Internet Explorer version 4.0 or greater or Netscape Navigator or Communicator version 4.0 or greater. Computers for databases and administration servers can include Windows NT 4.0 with a 400 MHz Pentium II (Intel) processor or equivalent using 256 MB memory and a 9 GB SCSI drive. For example, a Solaris 2.6 Ultra 10 (400 MHz) with 256 MB memory and a 9 GB SCSI drive can be used. Other environments can also be used.

In one implementation, information about a set of potential controls is stored on a server. A user can send information about case groups to the server, e.g., from a remote computer that communicates with the server using a network, e.g., the Internet. The server can compare the information about the case groups to the information about the potential controls and select a subset of members from the potential controls, e.g., to minimize a distance measure that is a function of the case groups and the selected subset. The server can return information about the subset (e.g., identifiers or other data) to the user or can return an evaluation that compares a feature of the case group to the members of the selected subset (e.g., a statistical score that evaluates probability of association with the case group relative to the selected subset). Accordingly, the server can include an electronic interface for receiving information from a user or from an apparatus that provides information about a biological property, and software configured to identify a subset of data objects using a comparison described herein.

Referring to the exemplary data structures in FIG. 4, the server can store a data type 210 which includes information (RL) that relates two sets of the individuals and a table 240 which includes information about the individuals (indexed by I1, I2, ... In).

For example, the information about the first individual in the table 240 can include an index I1, and features (F1,1, F1,2, and so on to F1,m). The features can be, e.g., the presence of a genetic polymorphism. The data type 210 includes a first pointer (P1) and a second pointer (P2). P1 references a list 220 of individuals by their index in the table 240. P2 references another list 230 of individuals in the table 240. Other methods of referencing the individuals (e.g., without an index) can also be used. The field RL in the data type 210 can be used to store information about how the first list 220 relates to the second list 230. For example, RL can be used to store a scalar distance value or a vectorial value that is the result of a comparison function or a model that compares the members of the two lists.
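A minimal sketch of these data structures, together with one possible way of selecting a control subset, is given below. Several choices are assumptions made for illustration and are not specified in the text: features are coded 0/1, the distance stored in RL is the Euclidean distance between per-feature means of the two lists, and the subset is built by greedy forward selection. The names `GroupRelation`, `group_distance`, and `select_controls` are hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List

# Table 240: individual index -> feature vector (1 = polymorphism present).
Table = Dict[str, List[int]]


@dataclass
class GroupRelation:
    """Data type 210: two lists of individuals and a value relating them."""
    p1: List[str]  # P1 -> list 220 (e.g., cases), as indices into table 240
    p2: List[str]  # P2 -> list 230 (e.g., selected controls)
    rl: float      # RL, here a scalar distance between the two lists


def mean_features(table: Table, members: List[str]) -> List[float]:
    """Per-feature mean over the listed individuals."""
    rows = [table[i] for i in members]
    return [sum(col) / len(rows) for col in zip(*rows)]


def group_distance(table: Table, first: List[str], second: List[str]) -> float:
    """Assumed distance measure: Euclidean distance between feature means."""
    return dist(mean_features(table, first), mean_features(table, second))


def select_controls(table: Table, cases: List[str],
                    candidates: List[str], k: int) -> GroupRelation:
    """Greedily pick k candidates that keep the group distance small."""
    if k < 1 or k > len(candidates):
        raise ValueError("k must be between 1 and the number of candidates")
    chosen: List[str] = []
    pool = list(candidates)
    for _ in range(k):
        best = min(pool, key=lambda c: group_distance(table, cases, chosen + [c]))
        chosen.append(best)
        pool.remove(best)
    return GroupRelation(list(cases), chosen, group_distance(table, cases, chosen))
```

Greedy selection does not guarantee the global minimum of the distance measure; it is used here only because it is short and easy to follow.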

In some implementations, it is possible to include a table (not shown) that stores the data type 210 in each row, and optionally additional fields. Such a table can be used during a procedure that searches for a favored set of related groups. Thus, a relational database of the invention can include three tables: the table 240, a table that includes the data type 210, and a table of lists 220 and 230.
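One way such a three-table layout might look is sketched below using an in-memory SQLite database. The table names, column names, and sample rows are hypothetical and serve only to show how a relation row can point at two membership lists that in turn reference individuals.

```python
import sqlite3

SCHEMA = """
CREATE TABLE individuals (          -- table 240
    idx TEXT PRIMARY KEY,
    f1  INTEGER,
    f2  INTEGER
);
CREATE TABLE list_members (         -- lists 220 and 230
    list_id INTEGER,
    idx     TEXT REFERENCES individuals(idx)
);
CREATE TABLE relations (            -- one data type 210 per row
    p1 INTEGER,
    p2 INTEGER,
    rl REAL
);
"""


def build_db():
    """Create the three tables and load a few illustrative rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO individuals VALUES (?, ?, ?)",
                    [("I1", 1, 0), ("I2", 0, 1), ("I3", 1, 0)])
    con.executemany("INSERT INTO list_members VALUES (?, ?)",
                    [(220, "I1"), (230, "I2"), (230, "I3"), (231, "I3")])
    con.executemany("INSERT INTO relations VALUES (?, ?, ?)",
                    [(220, 230, 0.5), (220, 231, 0.0)])
    return con


def members_of_best_relation(con):
    """Return the second-list members of the relation with the smallest RL."""
    rows = con.execute(
        """
        SELECT m.idx
        FROM relations r
        JOIN list_members m ON m.list_id = r.p2
        WHERE r.rl = (SELECT MIN(rl) FROM relations)
        ORDER BY m.idx
        """
    ).fetchall()
    return [row[0] for row in rows]
```

Storing one relation per row lets a search procedure insert candidate groupings as it goes and then retrieve the favored one with a single query.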

In another implementation, information about a subject's MTP locus is stored on a server. A user can send information about the subject (e.g., a patient, a relative of a patient, a sample of gametes (e.g., sperm or oocytes), fetal cells, or a candidate for a treatment) to the server, e.g., from a remote computer that communicates with the server using a network, e.g., the Internet. The server can compare the information about the subject, e.g., to a reference individual or to a particular sequence, e.g., an allele described herein, and produce an indication as to the individual's propensity for longevity. The indication can be, for example, qualitative or quantitative. An exemplary qualitative indication includes a binary output (e.g., text or other symbols indicating long-lived or average). An exemplary quantitative indication includes a statistical measure of the probability of reaching a certain age. The server can return the indication or information about related subjects (e.g., family members), e.g., to the user. For example, the server can build a family tree based on a set of related subjects. Each individual can be, e.g., assigned a statistical score that evaluates probability of achieving longevity or a predetermined age as a function of a longevity gene locus, e.g., the MTP locus, and/or other factors. Accordingly, the server can include an electronic interface for receiving information from a user or from an apparatus that provides information about a longevity gene locus.

In one method, information about the subject's MTP locus, e.g., the result of evaluating a polymorphism described herein, is provided (e.g., communicated, e.g., electronically communicated) to a third party, e.g., a hospital, clinic (e.g., an in vitro fertilization service), a government entity, reimbursing party, or insurance company (e.g., a life insurance company). For example, choice of medical procedure, payment for a medical procedure, payment by a reimbursing party, or cost for a service or insurance can be a function of the information.

In one embodiment, a premium for insurance (e.g., life or medical) is evaluated as a function of information about one or more longevity-associated polymorphisms, e.g., a polymorphism described herein, e.g., an MTP gene polymorphism. For example, premiums can be increased (e.g., by a certain percentage) if a first polymorphism is present in the candidate insured, or decreased if a second polymorphism is present. Premiums can also be scaled depending on heterozygosity or homozygosity. For example, premiums can be assessed to distribute risk, e.g., commensurate with the allele distribution for the particular polymorphism. In the case of the rs1800591 and rs2866164 alleles, the premiums can be assessed in order to distribute risk according to the allele distribution in Table 1.

In another example, premiums are assessed as a function of actuarial data that is obtained from individuals with one or more longevity-associated polymorphisms, e.g., a polymorphism described herein, e.g., an MTP gene polymorphism.

Genetic information about one or more longevity-associated polymorphisms, e.g., a polymorphism described herein, e.g., an MTP gene polymorphism, can be used, e.g., in an underwriting process for life insurance. The information can be incorporated into a profile about a subject. Other information in the profile can include, for example, date of birth, gender, marital status, banking information, credit information, children, and so forth. An insurance policy can be recommended as a function of the genetic information along with one or more other items of information in the profile. An insurance premium or risk assessment can also be evaluated as a function of the genetic information. In one implementation, points are assigned for presence or absence of a particular allele, e.g., a particular rs1800591 or rs2866164 allele, e.g., -493T or MTP 95H. The total points for longevity polymorphisms and other risk parameters are summed. A premium is calculated as a function of the points, and optionally one or more other parameters.
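The point-based calculation can be sketched as follows. All marker names, allele labels, point values, and the linear premium formula are placeholders chosen for illustration; none of them are taken from the text or from Table 1. Points are counted per allele copy, so a homozygote scores twice the value of a heterozygote.

```python
def longevity_points(genotype, allele_points):
    """Sum points over both allele copies at every genotyped marker.

    genotype      -- dict: marker -> (allele, allele)
    allele_points -- dict: (marker, allele) -> points per copy (placeholder values)
    """
    return sum(allele_points.get((marker, allele), 0)
               for marker, alleles in genotype.items()
               for allele in alleles)


def premium(base, total_points, rate_per_point=0.01):
    """Assumed linear rule: each point changes the base premium by a fixed fraction."""
    return round(base * (1 + rate_per_point * total_points), 2)


if __name__ == "__main__":
    # Hypothetical markers and point values, for illustration only.
    points_table = {("SNP_A", "x"): -2, ("SNP_B", "z"): 3}
    subject = {"SNP_A": ("x", "x"), "SNP_B": ("y", "z")}
    other_risk_points = 5  # e.g., points from non-genetic profile items
    total = longevity_points(subject, points_table) + other_risk_points
    print(total, premium(1000.0, total))
```

Any monotonic function of the total could replace the linear rule; the linear form is used only to keep the example short.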

In one embodiment, information about a longevity-associated polymorphism, e.g., a polymorphism described herein, is analyzed by a function that determines whether to authorize a transfer of funds to pay for a service or treatment provided to a subject. For example, an allele that is not associated with increased longevity can trigger an outcome that indicates or causes a refusal to pay for a service or treatment provided to a subject. For example, an entity, e.g., a hospital, care giver, government entity, insurance company, or other entity which pays for or reimburses medical expenses, can use the outcome of a method described herein to determine whether a party, e.g., a party other than the subject patient, will pay for services or treatment provided to the patient. For example, a first entity, e.g., an insurance company, can use the outcome of a method described herein to determine whether to provide financial payment to, or on behalf of, a patient, e.g., whether to reimburse a third party, e.g., a vendor of goods or services, a hospital, physician, or other care-giver, for a service or treatment provided to a patient. For example, a first entity, e.g., an insurance company, can use the outcome of a method described herein to determine whether to continue, discontinue, or enroll an individual in an insurance plan or program, e.g., a health insurance or life insurance plan or program.

MTP biology

The gene product of MTP participates in lipoprotein assembly and is a target for treating combined hyperlipidemia and obesity (Wetterau, Lin et al. 1997; Shelness and Sellers 2001). MTP is thought to function at the rate-limiting step in production of apoB-containing particles (Jamil, Chu et al. 1998), making it a particularly appealing target for next-generation lipid-lowering drugs. Structurally, the protein dimerizes with the ubiquitous protein disulfide isomerase (PDI) and resides on the luminal surface of the endoplasmic reticulum (ER), where it facilitates the proper manufacture of very low density lipoprotein (VLDL) and chylomicron particles. Functionally, MTP is directly involved in the packaging of apoB and triglyceride into these particles, and MTP and apoB are thought to directly bind one another during this assembly (Wu, Zhou et al. 1996). Rare humans with two non-functioning copies of the gene suffer from abetalipoproteinemia, and are characterized by the near absence of apoB particles in serum (Berriot-Varoqueaux, Aggerbeck et al. 2000). To survive, these individuals must be aggressively treated with fat-soluble vitamin supplementation.

The single-copy knockout of MTP in mice resulted in a 28% reduction in apoB levels, while homozygotes died during embryonic development (Raabe, Flynn et al. 1998). Hepatic overexpression in transgenic mice results in increased in vivo secretion of VLDL and apoB (Tietge, Bakillah et al. 1999). A liver-specific double knockout in mice lowered apoB-100 levels by 95% and apoB-48 levels by only 20% (Raabe, Veniant et al. 1999). Liver-specific single-copy MTP knockout mice demonstrate reduced serum glucose, insulin, and triglyceride levels, suggesting the additional importance of this gene in metabolic disease (Bjorkegren, Beigneux et al. 2002). Numerous classes of drugs that inhibit MTP activity have been shown to improve lipoprotein profiles (Wetterau, Gregg et al. 1998). Several food products have also been shown to reduce MTP activity, including garlic (Lin, Wang et al. 2002), ethanol (Lin, Li et al. 1997), and citrus flavonoids (Wilcox, Borradaile et al. 2001). One study found that the MTP promoter allele -493T up-regulated MTP expression by two-fold (Karpe, Lundahl et al. 1998).

MTP has been associated with phenotypes including lipoprotein profiles, insulin resistance, and fat distribution, and most of these studies focused on the -493G/T marker (Herrmann, Poirier et al. 1998; Karpe, Lundahl et al. 1998; Couture, Otvos et al. 2000; Juo, Han et al. 2000; Talmud, Palmen et al. 2000; Ledmyr, Karpe et al. 2002; St-Pierre, Lemieux et al. 2002). In terms of linkage studies, one investigation uncovered a quantitative trait locus (QTL) for lipoprotein particle size that included the MTP gene (Rainwater, Almasy et al. 1999). A linkage study of dizygotic twins implicated MTP in regulating triglyceride levels, which have been tentatively identified as a coronary artery disease modulator (Austin, Talmud et al. 1998).

The known activity of MTP, as a rate-limiting step in lipid metabolism, is consistent with a relationship between MTP and human longevity. Coronary artery disease and other vasculopathies attributed to unfavorable lipid profiles (peripheral vascular disease, renal-vascular disease, and stroke) account for a large percentage of human mortality. Common genetic variants that impact the function of lipid metabolism should be expected to impact human lifespan; for example, the offspring of centenarians have higher levels of HDL ("good" cholesterol) and lower levels of LDL ("bad" cholesterol) than age-matched controls, and they demonstrate significantly lower risks of heart disease and stroke compared with age-matched controls (Barzilai, Gabriely et al. 2001; Terry, Wilcox et al. 2003). In addition, a "longevity syndrome" was described amongst families with extremely low levels of LDL particles (Glueck, Gartside et al. 1977). Although it is reasonable to believe that the impact of MTP on human longevity is through its impact on lipid profiles, the association studies above suggest that this gene may also affect susceptibility to insulin resistance and obesity.

The promoter region of the MTP gene is highly conserved among mammalian species and contains potential control sequences for regulating MTP expression in different cell types and in response to metabolic regulators. Transcriptional activation of the human MTP promoter is suppressed by insulin and enhanced by cholesterol (Hagan et al., J. Biol. Chem., 269: 28737-28744, 1994). The insulin response has also been demonstrated in HepG2 human liver carcinoma cells (Lin et al., J. Lipid Res., 36: 1073-1081, 1995). A high-fat or a cholesterol-enriched diet may also cause higher concentrations of MTP mRNA.

MTP and APOE

There are many parallels between the associations of MTP and APOE. Both genes are risk factors implicated in cardiovascular disease as well as longevity, and the latter is also associated with Alzheimer's disease. The genetic epidemiology of MTP can be compared to incidence and predisposition of age-related diseases, such as Alzheimer's disease. Before starting the current study, as a quasi positive control, it was confirmed that in the subject population the apoE ε2 allele is protective, the ε3 allele is neutral, and the ε4 allele is detrimental with respect to lifespan extension. No interaction between the MTP and APOE alleles with respect to lifespan was detected, although sample size may have been inadequate.

Screening Assays

In one aspect, the invention provides assays for screening for a test compound, or more typically, a library of test compounds, to evaluate an effect of the test compound on MTP activity in vitro, in a cell, or in an organism, or to evaluate interaction between the test compound and an MTP protein complex component, e.g., MTP or an MTP-associated protein. It is possible to use a screen for MTP interaction or MTP activity to find agonists or antagonists of MTP protein complexes. It is also possible to screen for compounds that are mimics of a genetic polymorphism described herein. The term "mimic" refers to agents that cause effects in the same manner as the longevity marker and/or its encoded polynucleotide product.

Test Compounds. A "test compound" can be any chemical compound, for example, a macromolecule (e.g., a polypeptide, a protein complex, or a nucleic acid) or a small molecule (e.g., an amino acid, a nucleotide, an organic or inorganic compound). The test compound can have a formula weight of less than about 10,000 grams per mole, less than 5,000 grams per mole, less than 1,000 grams per mole, or less than about 500 grams per mole. The test compound can be naturally occurring (e.g., an herb or a natural product), synthetic, or both. Examples of macromolecules are proteins, protein complexes, glycoproteins, and nucleic acids, e.g., DNA, RNA, and PNA (peptide nucleic acid). Examples of small molecules are peptides, peptidomimetics (e.g., peptoids), amino acids, amino acid analogs, polynucleotides, polynucleotide analogs, nucleotides, nucleotide analogs, and organic or inorganic compounds, e.g., heteroorganic or organometallic compounds. A test compound can be the only substance assayed by the method described herein. Alternatively, a collection of test compounds can be assayed either consecutively or concurrently by the methods described herein.

In one preferred embodiment, high throughput screening methods involve providing a combinatorial chemical or peptide library containing a large number of potential therapeutic compounds (potential modulator or ligand compounds). Such "combinatorial chemical libraries" or "ligand libraries" are then screened in one or more assays, as described herein, to identify those library members (particular chemical species or subclasses) that display a desired characteristic activity. The compounds thus identified can serve as conventional "lead compounds" or can themselves be used as potential or actual therapeutics.

A combinatorial chemical library is a collection of diverse chemical compounds generated by either chemical synthesis or biological synthesis, by combining a number of chemical "building blocks" such as reagents. For example, a linear combinatorial chemical library such as a polypeptide library is formed by combining a set of chemical building blocks (amino acids) in every possible way for a given compound length (i.e., the number of amino acids in a polypeptide compound). Millions of chemical compounds can be synthesized through such combinatorial mixing of chemical building blocks.
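The size of a linear combinatorial library follows directly from this description: with n building blocks and a compound length of L, there are n^L ordered combinations. The short sketch below, with hypothetical function names, counts and enumerates such a library; one-letter codes stand in for the building blocks.

```python
from itertools import product


def library_size(n_blocks, length):
    """Number of linear compounds of `length` built from `n_blocks` building blocks."""
    return n_blocks ** length


def enumerate_library(blocks, length):
    """Yield every ordered combination of building blocks of the given length."""
    for combo in product(blocks, repeat=length):
        yield "".join(combo)


if __name__ == "__main__":
    # 20 amino acids combined into pentapeptides.
    print(library_size(20, 5))
    # A toy two-block library of length 2.
    print(list(enumerate_library("AG", 2)))
```

With the 20 standard amino acids, pentapeptides alone already give 3.2 million members, which is consistent with the "millions of chemical compounds" mentioned above.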

Preparation and screening of combinatorial chemical libraries is well known to those of skill in the art. Such combinatorial chemical libraries include, but are not limited to, peptide libraries (see, e.g., U.S. Patent 5,010,175, Furka, Int. J. Pept. Prot. Res. 37: 487-493 (1991) and Houghton et al., Nature 354: 84-88 (1991)). Other chemistries for generating chemical diversity libraries can also be used. Such chemistries include, but are not limited to: peptoids (e.g., PCT Publication No. WO 91/19735), encoded peptides (e.g., PCT Publication No. WO 93/20242), random bio-oligomers (e.g., PCT Publication No. WO 92/00091), benzodiazepines (e.g., U.S. Pat. No. 5,288,514), diversomers such as hydantoins, benzodiazepines and dipeptides (Hobbs et al., Proc. Nat. Acad. Sci. USA 90: 6909-6913 (1993)), vinylogous polypeptides (Hagihara et al., J. Amer. Chem. Soc. 114: 6568 (1992)), nonpeptidal peptidomimetics with glucose scaffolding (Hirschmann et al., J. Amer. Chem. Soc. 114: 9217-9218 (1992)), analogous organic syntheses of small compound libraries (Chen et al., J. Amer. Chem. Soc. 116: 2661 (1994)), oligocarbamates (Cho et al., Science 261: 1303 (1993)), and/or peptidyl phosphonates (Campbell et al., J. Org. Chem. 59: 658 (1994)), nucleic acid libraries (see Ausubel, Berger and Sambrook, all supra), peptide nucleic acid libraries (see, e.g., U.S. Patent 5,539,083), antibody libraries (see, e.g., Vaughn et al., Nature Biotechnology, 14(3): 309-314 (1996) and PCT/US96/10287), carbohydrate libraries (see, e.g., Liang et al., Science, 274: 1520-1522 (1996) and U.S. Patent 5,593,853), small organic molecule libraries (see, e.g., benzodiazepines, Baum C&EN, Jan 18, page 33 (1993); isoprenoids, U.S. Patent 5,569,588; thiazolidinones and metathiazanones, U.S. Patent 5,549,974; pyrrolidines, U.S. Patents 5,525,735 and 5,519,134; morpholino compounds, U.S. Patent 5,506,337; benzodiazepines, 5,288,514, and the like). Additional examples of methods for the synthesis of molecular libraries can be found in the art, for example in: DeWitt et al. (1993) Proc. Natl. Acad. Sci. U.S.A. 90: 6909; Erb et al. (1994) Proc. Natl. Acad. Sci. USA 91: 11422; Zuckermann et al. (1994) J. Med. Chem. 37: 2678; Cho et al. (1993) Science 261: 1303; Carell et al. (1994) Angew. Chem. Int. Ed. Engl. 33: 2059; Carell et al. (1994) Angew. Chem. Int. Ed. Engl. 33: 2061; and Gallop et al. (1994) J. Med. Chem. 37: 1233.

Some exemplary libraries are used to generate variants from a particular lead compound. One method includes generating a combinatorial library in which one or more functional groups of the lead compound are varied, e. g. , by derivatization. Thus, the combinatorial library can include a class of compounds which have a common structural feature (e. g. , framework). Examples of lead compounds which can be used as starting molecules for library generation include: known inhibitors of MTP (see, e. g. , below).

Devices for the preparation of combinatorial libraries are commercially available (see, e. g. , 357 MPS, 390 MPS, Advanced Chem Tech, Louisville KY, Symphony, Rainin, Woburn, MA, 433A Applied Biosystems, Foster City, CA, 9050 Plus, Millipore, Bedford, MA). In addition, numerous combinatorial libraries are themselves commercially available (see, e. g. , ComGenex, Princeton, N. J. , Asinex, Moscow, Ru, Tripos, Inc. , St. Louis, MO, ChemStar, Ltd, Moscow, RU, 3D Pharmaceuticals, Exton, PA, Martek Biosciences, Columbia, MD, etc. ).

The test compounds of the present invention can also be obtained from: biological libraries; peptoid libraries (libraries of molecules having the functionalities of peptides, but with a novel, non-peptide backbone which are resistant to enzymatic degradation but which nevertheless remain bioactive; see, e.g., Zuckermann, R. N. et al. (1994) J. Med. Chem. 37: 2678-85); spatially addressable parallel solid phase or solution phase libraries; synthetic library methods requiring deconvolution; the 'one-bead one-compound' library method; and synthetic library methods using affinity chromatography selection. The biological libraries include libraries of nucleic acids and libraries of proteins. Some nucleic acid libraries encode a diverse set of proteins (e.g., natural and artificial proteins); others provide, for example, functional RNA and DNA molecules such as nucleic acid aptamers or ribozymes. A peptoid library can be made to include structures similar to a peptide library. (See also Lam (1997) Anticancer Drug Des. 12: 145). A library of proteins may be produced by an expression library or a display library (e.g., a phage display library).

Libraries of compounds may be presented in solution (e.g., Houghten (1992) Biotechniques 13: 412-421), or on beads (Lam (1991) Nature 354: 82-84), chips (Fodor (1993) Nature 364: 555-556), bacteria (Ladner, U.S. Patent No. 5,223,409), spores (Ladner, U.S. Patent No. 5,223,409), plasmids (Cull et al. (1992) Proc. Natl. Acad. Sci. USA 89: 1865-1869) or on phage (Scott and Smith (1990) Science 249: 386-390; Devlin (1990) Science 249: 404-406; Cwirla et al. (1990) Proc. Natl. Acad. Sci. 87: 6378-6382; Felici (1991) J. Mol. Biol. 222: 301-310; Ladner, supra).

In vitro Assays. MTP activity can be assayed, e.g., in vitro. A variety of assays can be used to evaluate MTP activity. For example, U.S. 6,492,365 describes MTP activity assays, in part, as follows.

Compounds that interact with MTP (e.g., inhibit or enhance MTP activity) can be found by conducting a transfer reaction in the presence of a test compound, and optionally comparing transfer to a control reaction (e.g., a reaction in which the compound is absent, substituted by buffer, or replaced by an inactive compound). Generally, the process can include: (a) incubating a sample thought to contain an inhibitor of MTP with detectably labeled lipids in donor particles, acceptor particles and MTP; and (b) measuring the MTP-stimulated transfer of the detectably labeled lipids from the donor particles to the acceptor particles. In this assay, an inhibitor would decrease the rate of MTP-stimulated transfer of detectably labeled lipid from donor to acceptor particles.

The detection may be carried out by nuclear magnetic resonance (NMR), electron spin resonance (ESR), radiolabeling (which is preferred), fluorescent labeling, and the like. The donor and acceptor particles may be membranes, HDL, low density lipoproteins (LDL), small unilamellar vesicles (SUV), lipoproteins and the like. HDL and SUV are the preferred donor particles; LDL and SUV are the preferred acceptor particles.

In one particular example, MTP transfer is assayed as follows: Triglyceride (TG) transfer activity was measured as the protein-stimulated rate of TG transfer from donor small unilamellar vesicles (SUV) to acceptor SUV. To prepare donor and acceptor vesicles, the appropriate lipids in chloroform were mixed and then dried under a stream of nitrogen. Two mL 15/40 buffer (15 mM Tris, pH 7.4, 40 mM sodium chloride, 1 mM EDTA, and 0.02% NaN3) were added to the dried lipids, a stream of nitrogen was blown over the buffer, then the cap was quickly screwed on to trap a nitrogen atmosphere over the lipid suspension. Lipids in the buffer were bath-sonicated. The donor and acceptor phosphatidylcholine (PC) (egg L-alpha-phosphatidylcholine, Sigma Chem. Co., St. Louis, Mo.) was radiolabeled by adding traces of [3H] dipalmitoylphosphatidylcholine (phosphatidylcholine L-alpha-dipalmitoyl [2-palmitoyl-9,10-3H(N)], 33 Ci/mmol, DuPont NEN) to an approximate specific activity of 100 cpm/nmol. Donor vesicles containing 40 nmol egg PC, 0.2 mol % [14C] TG [mixture of labeled (triolein [carboxyl-14C]-, about 100 mCi/mmol, DuPont NEN) and unlabeled (triolein, Sigma Chem. Co., St. Louis, Mo.) triolein for a final specific activity of about 200,000 cpm/nmol], and 7.3 mol % cardiolipin (bovine heart cardiolipin, Sigma Chemical Co.) and acceptor vesicles containing 240 nmol egg PC and 0.2 mol % TG were mixed with 5 mg fatty acid free bovine serum albumin (BSA) and an aliquot of the MTP samples in 0.7 to 0.9 mL 15/40 buffer and incubated for 1 hour at 37°C. The transfer reaction was terminated by the addition of 0.5 mL DEAE-cellulose suspension (1:1 suspension DE-52, preswollen DEAE-cellulose anion exchange, Fisher, Cat. no. 05720-5 to 15 mM Tris, pH 7.4, 1 mM EDTA, and 0.02% NaN3). The reaction mixture was agitated for 5 minutes and the DEAE-cellulose with bound donor membranes (the donor membranes contained the negatively charged cardiolipin and bound to the DEAE) was sedimented by low speed centrifugation.

In another example for a transfer assay in 150 µL format, transfer activity is determined by measuring the transfer of radiolabeled TG from [3H]-HDL (5 µg cholesterol) donor particles to LDL (50 µg cholesterol) acceptor particles at 37°C for three hours in 15 mM Tris, pH 7.4, 125 mM MOPS, 30 mM Na acetate, 160 mM NaCl, 2.5 mM Na2 EDTA, 0.02% NaN3, 0.5% BSA with about 50-200 ng purified MTP in the well of a 96-well plate. The material to be tested (e.g., material in an assay compatible solvent such as ethanol, methanol or DMSO) can be screened by addition to a well prior to incubation. The transfer is terminated with the addition of 10 µL of freshly prepared, 4°C heparin/MnCl2 solution (1.0 g heparin, Sigma Cat. No. H3393 187 U/mg, to 13.9 mL, 1.5 M MnCl2. 0.4% heparin (187 I.U.)/0.1 M MnCl2) to precipitate the 3H-TG-LDL acceptor particles, and the plate is centrifuged at 800 g. An aliquot of the supernatant from each well containing the [3H]-TG-HDL donor particles is transferred to scintillation cocktail and the radioactivity quantitated. The enzyme activity is calculated based on the percentage of TG transfer. The percent TG transfer will increase with increasing MTP concentration. An inhibitor candidate will decrease the percent TG transfer. A similar assay could be performed with labeled CE or PC.
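By way of illustration only, the percent TG transfer and the effect of an inhibitor candidate could be computed as in the following sketch. The text does not give a formula, so the calculation is an assumption: because the labeled acceptor particles are precipitated and the supernatant retains the labeled donor particles, transfer is taken here as the fractional loss of supernatant counts relative to a hypothetical no-MTP blank well. All function names and count values are illustrative.

```python
def percent_tg_transfer(blank_cpm, sample_cpm):
    """Percent of labeled TG that left the donor particles.

    blank_cpm: supernatant counts from a well without MTP (assumed: no transfer).
    sample_cpm: supernatant counts from a well with MTP (donor label remaining).
    """
    if blank_cpm <= 0:
        raise ValueError("blank counts must be positive")
    return 100.0 * (blank_cpm - sample_cpm) / blank_cpm


def percent_inhibition(transfer_with_compound, transfer_without_compound):
    """Reduction in percent TG transfer caused by a test compound."""
    if transfer_without_compound <= 0:
        raise ValueError("reference transfer must be positive")
    return 100.0 * (1.0 - transfer_with_compound / transfer_without_compound)


if __name__ == "__main__":
    # Hypothetical scintillation counts (cpm) for three wells.
    blank, vehicle_well, compound_well = 10000.0, 7000.0, 9100.0
    vehicle_transfer = percent_tg_transfer(blank, vehicle_well)
    compound_transfer = percent_tg_transfer(blank, compound_well)
    print(round(vehicle_transfer, 1), round(compound_transfer, 1))
    print(round(percent_inhibition(compound_transfer, vehicle_transfer), 1))
```

In this sketch, a compound that lowers the percent TG transfer relative to the solvent-only well yields a positive percent inhibition, in keeping with the statement that an inhibitor candidate will decrease the percent TG transfer.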

Generally, assay volumes can be scaled down so each assay is in less than 100, 10, 5, or 1 µL. For example, at least 10, 50, 100, or 300 compounds can be assayed at the same time, e.g., using multi-well plates.

Other screens can be for an interaction between a compound and the MTP protein, or between a compound and PDI. For example, a screen can be used to identify compounds that affect the interaction between the MTP protein and PDI.

Another type of in vitro assay evaluates the ability of a test compound to modulate interaction between a first MTP complex component and a second MTP complex component, e.g., between MTP and PDI. One of the proteins of the complex can be labeled with a fluorophore, and the other with a quencher. Compounds which cause dissociation may increase fluorescence in an assay sample, whereas compounds which promote association may decrease fluorescence. This type of assay can also be accomplished, for example, by coupling one of the components with a radioisotope or enzymatic label such that binding of the labeled component to the other MTP complex component can be determined by detecting the labeled compound in a complex. A MTP complex component can be labeled with 125I, 35S, 14C, or 3H, either directly or indirectly, and the radioisotope detected by direct counting of radioemission or by scintillation counting. Alternatively, a component can be enzymatically labeled with, for example, horseradish peroxidase, alkaline phosphatase, or luciferase, and the enzymatic label detected by determination of conversion of an appropriate substrate to product. Competition assays can also be used to evaluate a physical interaction between a test compound and a target.

Soluble and/or membrane-bound forms of isolated proteins (e.g., MTP complex components) can be used in the cell-free assays of the invention. When membrane-bound forms of the protein are used, it may be desirable to utilize a solubilizing agent. Examples of such solubilizing agents include non-ionic detergents such as n-octylglucoside, n-dodecylglucoside, n-dodecylmaltoside, octanoyl-N-methylglucamide, decanoyl-N-methylglucamide, Triton X-100, Triton X-114, Thesit®, Isotridecypoly(ethylene glycol ether)n, 3-[(3-cholamidopropyl)dimethylamminio]-1-propane sulfonate (CHAPS), 3-[(3-cholamidopropyl)dimethylamminio]-2-hydroxy-1-propane sulfonate (CHAPSO), or N-dodecyl-N,N-dimethyl-3-ammonio-1-propane sulfonate. In another example, the MTP complex component (e.g., MTP) can reside in a membrane, e.g., a liposome or other vesicle.

Cell-free assays involve preparing a reaction mixture of the target protein (e. g., the MTP complex component) and the test compound under conditions and for a time sufficient to allow the two components to interact and bind, thus forming a complex that can be removed and/or detected.

The interaction between two molecules can also be detected, e.g., using a fluorescence assay in which at least one molecule is fluorescently labeled. One example of such an assay includes fluorescence energy transfer (FET or FRET for fluorescence resonance energy transfer) (see, for example, Lakowicz et al., U.S. Patent No. 5,631,169; Stavrianopoulos, et al., U.S. Patent No. 4,868,103). A fluorophore label on the first, 'donor' molecule is selected such that its emitted fluorescent energy will be absorbed by a fluorescent label on a second, 'acceptor' molecule, which in turn is able to fluoresce due to the absorbed energy. Alternately, the 'donor' protein molecule may simply utilize the natural fluorescent energy of tryptophan residues. Labels are chosen that emit different wavelengths of light, such that the 'acceptor' molecule label may be differentiated from that of the 'donor'. Since the efficiency of energy transfer between the labels is related to the distance separating the molecules, the spatial relationship between the molecules can be assessed. In a situation in which binding occurs between the molecules, the fluorescent emission of the 'acceptor' molecule label in the assay should be maximal. A FET binding event can be conveniently measured through standard fluorometric detection means well known in the art (e.g., using a fluorimeter).
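The dependence of energy transfer on distance noted above is commonly described by the Förster relation, E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer is 50% efficient. The sketch below is illustrative only; the relation is standard, but the R0 value of 5 nm and the function name are assumptions and are not taken from the assays described herein.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Forster energy-transfer efficiency for a donor-acceptor pair.

    distance_nm: separation between donor and acceptor labels.
    forster_radius_nm: R0, the separation giving 50% transfer (pair-specific).
    """
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


if __name__ == "__main__":
    r0 = 5.0  # hypothetical Forster radius in nanometers
    for r in (2.5, 5.0, 10.0):
        # Efficiency falls steeply once the labels are farther apart than R0.
        print(r, round(fret_efficiency(r, r0), 3))
```

Because efficiency falls with the sixth power of distance, acceptor emission is appreciable only when the labeled molecules are in close proximity, which is why bound and unbound states can be distinguished.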

Another example of a fluorescence assay is fluorescence polarization (FP). For FP, only one component needs to be labeled. A binding interaction is detected by a change in molecular size of the labeled component. The size change alters the tumbling rate of the component in solution and is detected as a change in FP. See, e.g., Nasir et al. (1999) Comb Chem HTS 2: 177-190; Jameson et al. (1995) Methods Enzymol 246: 283; Seethala et al. (1998) Anal Biochem. 255: 257. Fluorescence polarization can be monitored in multiwell plates, e.g., using the Tecan Polarion™ reader. See, e.g., Parker et al. (2000) Journal of Biomolecular Screening 5: 77-88; and Shoeman, et al. (1999) 38, 16802-16809.
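As a hedged illustration of how an FP readout is typically derived, polarization can be computed from the emission intensities measured parallel and perpendicular to the excitation plane, P = (I_par - G x I_perp) / (I_par + G x I_perp), and is often reported in millipolarization units (mP). The intensity values, the default instrument correction factor G of 1.0, and the function name below are assumptions for illustration.

```python
def polarization_mp(i_parallel, i_perpendicular, g_factor=1.0):
    """Fluorescence polarization in millipolarization units (mP).

    i_parallel, i_perpendicular: emission intensities measured parallel and
    perpendicular to the excitation polarization.
    g_factor: instrument correction factor (assumed 1.0 if not calibrated).
    """
    denominator = i_parallel + g_factor * i_perpendicular
    if denominator <= 0:
        raise ValueError("total intensity must be positive")
    return 1000.0 * (i_parallel - g_factor * i_perpendicular) / denominator


if __name__ == "__main__":
    # Hypothetical readings: a small free tracer tumbles quickly (low mP);
    # the same tracer bound to a larger partner tumbles slowly (higher mP).
    free = polarization_mp(600.0, 400.0)
    bound = polarization_mp(700.0, 300.0)
    print(free, bound, bound - free)
```

An increase in mP on addition of the unlabeled partner is the signature of binding; a test compound that disrupts the interaction would be expected to return the signal toward the free-tracer value.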

In another embodiment, determining the ability of the MTP complex component protein to bind to a target molecule can be accomplished using real-time Biomolecular Interaction Analysis (BIA) (see, e.g., Sjolander, S. and Urbaniczky, C. (1991) Anal. Chem. 63: 2338-2345 and Szabo et al. (1995) Curr. Opin. Struct. Biol. 5: 699-705). "Surface plasmon resonance" or "BIA" detects biospecific interactions in real time, without labeling any of the interactants (e.g., BIAcore). Changes in the mass at the binding surface (indicative of a binding event) result in alterations of the refractive index of light near the surface (the optical phenomenon of surface plasmon resonance (SPR)), resulting in a detectable signal which can be used as an indication of real-time reactions between biological molecules.

In one embodiment, the MTP complex component is anchored onto a solid phase. The MTP complex component/test compound complexes anchored on the solid phase can be detected at the end of the reaction, e. g. , the binding reaction. For example, the MTP complex component can be anchored onto a solid surface, and the test compound, (which is not anchored), can be labeled, either directly or indirectly, with detectable labels discussed herein.

It may be desirable to immobilize either the MTP complex component or an anti-MTP complex component antibody to facilitate separation of complexed from uncomplexed forms of one or both of the proteins, as well as to accommodate automation of the assay. Binding of a test compound to a MTP complex component protein, or interaction of a MTP complex component protein with a second component in the presence and absence of a candidate compound, can be accomplished in any vessel suitable for containing the reactants. Examples of such vessels include microtiter plates, test tubes, and micro-centrifuge tubes. In one embodiment, a fusion protein can be provided which adds a domain that allows one or both of the proteins to be bound to a matrix. For example, glutathione-S-transferase/MTP complex component fusion proteins or glutathione-S-transferase/target fusion proteins can be adsorbed onto glutathione sepharose beads (Sigma Chemical, St. Louis, MO) or glutathione derivatized microtiter plates, which are then combined with the test compound or the test compound and either the non-adsorbed target protein or MTP complex component protein, and the mixture incubated under conditions conducive to complex formation (e.g., at physiological conditions for salt and pH). Following incubation, the beads or microtiter plate wells are washed to remove any unbound components, the matrix is immobilized in the case of beads, and complex formation is determined either directly or indirectly, for example, as described above. Alternatively, the complexes can be dissociated from the matrix, and the level of MTP complex component binding or activity determined using standard techniques.

Other techniques for immobilizing either a MTP complex component protein or a target molecule on matrices include using conjugation of biotin and streptavidin. Biotinylated MTP complex component protein or target molecules can be prepared from biotin-NHS (N-hydroxy-succinimide) using techniques known in the art (e.g., biotinylation kit, Pierce Chemicals, Rockford, IL), and immobilized in the wells of streptavidin-coated 96 well plates (Pierce Chemical).

In order to conduct the assay, the non-immobilized component is added to the coated surface containing the anchored component. After the reaction is complete, unreacted components are removed (e.g., by washing) under conditions such that any complexes formed will remain immobilized on the solid surface. The detection of complexes anchored on the solid surface can be accomplished in a number of ways. Where the previously non-immobilized component is pre-labeled, the detection of label immobilized on the surface indicates that complexes were formed. Where the previously non-immobilized component is not pre-labeled, an indirect label can be used to detect complexes anchored on the surface, e.g., using a labeled antibody specific for the immobilized component (the antibody, in turn, can be directly labeled or indirectly labeled with, e.g., a labeled anti-Ig antibody).

In one embodiment, this assay is performed utilizing antibodies reactive with a MTP complex component protein or target molecules but which do not interfere with binding of the MTP complex component protein to its target molecule. Such antibodies can be derivatized to the wells of the plate, and unbound target or the MTP complex component protein trapped in the wells by antibody conjugation. Methods for detecting such complexes, in addition to those described above for the GST-immobilized complexes, include immunodetection of complexes using antibodies reactive with the MTP complex component protein or target molecule, as well as enzyme-linked assays which rely on detecting an enzymatic activity associated with the MTP complex component protein or target molecule.

Alternatively, cell free assays can be conducted in a liquid phase. In such an assay, the reaction products are separated from unreacted components by any of a number of standard techniques, including but not limited to: differential centrifugation (see, for example, Rivas, G., and Minton, A. P. (1993) Trends Biochem Sci 18: 284-7); chromatography (gel filtration chromatography, ion-exchange chromatography); electrophoresis (see, e.g., Ausubel, F. et al., eds. Current Protocols in Molecular Biology 1999, J. Wiley: New York); and immunoprecipitation (see, for example, Ausubel, F. et al., eds. (1999) Current Protocols in Molecular Biology, J. Wiley: New York). Such resins and chromatographic techniques are known to one skilled in the art (see, e.g., Heegaard, N. H. (1998) J Mol Recognit 11: 141-8; Hage, D. S., and Tweed, S. A. (1997) J Chromatogr B Biomed Sci Appl. 699: 499-525). Further, fluorescence energy transfer may also be conveniently utilized, as described herein, to detect binding without further purification of the complex from solution.

In a preferred embodiment, the assay includes contacting the MTP complex component protein or biologically active portion thereof with a known compound which binds a MTP complex component to form an assay mixture, contacting the assay mixture with a test compound, and determining the ability of the test compound to interact with a MTP complex component protein, wherein determining the ability of the test compound to interact with the MTP complex component protein includes determining the ability of the test compound to preferentially bind to the MTP complex component or biologically active portion thereof, or to modulate the activity of a target molecule, as compared to the known compound.

The target products of the invention can, in vivo, interact with one or more cellular or extracellular macromolecules, such as proteins. For the purposes of this discussion, such cellular and extracellular macromolecules are referred to herein as "binding partners." Compounds that disrupt such interactions can be useful in regulating the activity of the target product. Such compounds can include, but are not limited to, molecules such as antibodies, peptides, and small molecules. The preferred targets/products for use in this embodiment are the MTP complex components. In an alternative embodiment, the invention provides methods for determining the ability of the test compound to modulate the activity of a MTP complex component protein through modulation of the activity of a downstream effector of a MTP complex component target molecule. For example, the activity of the effector molecule on an appropriate target can be determined, or the binding of the effector to an appropriate target can be determined, as previously described.

To identify compounds that interfere with the interaction between the target product and its cellular or extracellular binding partner(s), a reaction mixture containing the target product and the binding partner is prepared, under conditions and for a time sufficient to allow the two products to form a complex. In order to test an inhibitory agent, the reaction mixture is provided in the presence and absence of the test compound. The test compound can be initially included in the reaction mixture, or can be added at a time subsequent to the addition of the target and its cellular or extracellular binding partner. Control reaction mixtures are incubated without the test compound or with a placebo. The formation of any complexes between the target product and the cellular or extracellular binding partner is then detected. The formation of a complex in the control reaction, but not in the reaction mixture containing the test compound, indicates that the compound interferes with the interaction of the target product and the interactive binding partner. Additionally, complex formation within reaction mixtures containing the test compound and normal target product can also be compared to complex formation within reaction mixtures containing the test compound and mutant target product. This comparison can be important in those cases wherein it is desirable to identify compounds that disrupt interactions of mutant but not normal target products.

These assays can be conducted in a heterogeneous or homogeneous format. Heterogeneous assays involve anchoring either the target product or the binding partner onto a solid phase, and detecting complexes anchored on the solid phase at the end of the reaction. In homogeneous assays, the entire reaction is carried out in a liquid phase. In either approach, the order of addition of reactants can be varied to obtain different information about the compounds being tested. For example, test compounds that interfere with the interaction between the target products and the binding partners, e.g., by competition, can be identified by conducting the reaction in the presence of the test substance. Alternatively, test compounds that disrupt preformed complexes, e.g., compounds with higher binding constants that displace one of the components from the complex, can be tested by adding the test compound to the reaction mixture after complexes have been formed. The various formats are briefly described below.

In a heterogeneous assay system, either the target product or the interactive cellular or extracellular binding partner, is anchored onto a solid surface (e. g. , a microtiter plate), while the non-anchored species is labeled, either directly or indirectly. The anchored species can be immobilized by non-covalent or covalent attachments. Alternatively, an immobilized antibody specific for the species to be anchored can be used to anchor the species to the solid surface.

In order to conduct the assay, the partner of the immobilized species is exposed to the coated surface with or without the test compound. After the reaction is complete, unreacted components are removed (e. g. , by washing) and any complexes formed will remain immobilized on the solid surface. Where the non-immobilized species is pre-labeled, the detection of label immobilized on the surface indicates that complexes were formed. Where the non-immobilized species is not pre-labeled, an indirect label can be used to detect complexes anchored on the surface ; e. g. , using a labeled antibody specific for the initially non-immobilized species (the antibody, in turn, can be directly labeled or indirectly labeled with, e. g. , a labeled anti-Ig antibody).

Depending upon the order of addition of reaction components, test compounds that inhibit complex formation or that disrupt preformed complexes can be detected.

Alternatively, the reaction can be conducted in a liquid phase in the presence or absence of the test compound, the reaction products separated from unreacted components, and complexes detected, e.g., using an immobilized antibody specific for one of the binding components to anchor any complexes formed in solution, and a labeled antibody specific for the other partner to detect anchored complexes. Again, depending upon the order of addition of reactants to the liquid phase, test compounds that inhibit complex formation or that disrupt preformed complexes can be identified.

In an alternate embodiment of the invention, a homogeneous assay can be used. For example, a preformed complex of the target product and the interactive cellular or extracellular binding partner product is prepared in which either the target products or their binding partners are labeled, but the signal generated by the label is quenched due to complex formation (see, e.g., U.S. Patent No. 4,109,496 that utilizes this approach for immunoassays). The addition of a test substance that competes with and displaces one of the species from the preformed complex will result in the generation of a signal above background. In this way, test substances that disrupt target product-binding partner interaction can be identified.

In yet another aspect, the MTP complex component proteins or regions thereof can be used as "bait proteins" in a two-hybrid assay (see, e.g., U.S. Patent No. 5,283,317; Zervos et al. (1993) Cell 72: 223-232; Madura et al. (1993) J. Biol. Chem. 268: 12046-12054; Bartel et al. (1993) Biotechniques 14: 920-924; Iwabuchi et al. (1993) Oncogene 8: 1693-1696; and Brent WO 94/10300).

In another embodiment, modulators of an MTP complex component gene expression are identified. For example, a cell or cell free mixture is contacted with a candidate compound and the expression of the MTP complex component mRNA or protein evaluated relative to the level of expression of MTP complex component mRNA or protein in the absence of the candidate compound. When expression of the MTP complex component mRNA or protein is greater in the presence of the candidate compound than in its absence, the candidate compound is identified as a stimulator of MTP complex component mRNA or protein expression. Alternatively, when expression of the MTP complex component mRNA or protein is less (statistically significantly less) in the presence of the candidate compound than in its absence, the candidate compound is identified as an inhibitor of the MTP complex component mRNA or protein expression. The level of the MTP complex component mRNA or protein expression can be determined by methods for detecting MTP complex component mRNA or protein.
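The comparison described above can be sketched as follows. The text requires only that a decrease be statistically significant and does not prescribe a test, so the exact two-sided permutation test on the difference of means used here is an assumption, as are the replicate values, the 0.05 threshold, and the function names.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, untreated):
    """Exact two-sided permutation p-value for the difference of group means."""
    pooled = list(treated) + list(untreated)
    observed = abs(mean(treated) - mean(untreated))
    extreme = 0
    total = 0
    # Enumerate every way of relabeling the pooled values into two groups.
    for indices in combinations(range(len(pooled)), len(treated)):
        chosen = set(indices)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def classify_modulator(treated, untreated, alpha=0.05):
    """Label a candidate compound from expression with and without the compound."""
    if permutation_p_value(treated, untreated) >= alpha:
        return "no significant change"
    return "stimulator" if mean(treated) > mean(untreated) else "inhibitor"


if __name__ == "__main__":
    # Hypothetical relative MTP mRNA levels, four replicates per condition.
    with_compound = [0.41, 0.38, 0.45, 0.40]
    without_compound = [1.00, 1.10, 0.95, 1.05]
    print(classify_modulator(with_compound, without_compound))
```

With four replicates per condition there are 70 possible relabelings, so completely separated groups give a two-sided p-value of 2/70, which falls below the assumed 0.05 threshold.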

Organismal Assays. Still other methods for evaluating a test compound include organismal based assays, e. g. , using a mammal (e. g. , a mouse, rat, primate, or other non-human), or other animal (e. g. , Xenopus, zebrafish, or an invertebrate such as a fly or nematode).

In some cases, the organism is a transgenic organism, e.g., an organism which includes a heterologous MTP complex component (e.g., from a mammal, e.g., a human). In one embodiment, transgenic animals that express one or more variant alleles associated with longevity, as described herein, are produced. The animals can be either heterozygous or homozygous. The animals can be tested with agents to identify agents that are mimics, agonists, or antagonists of longevity. The longevity of the treated transgenic animal is compared with the longevity of a comparable animal that does not possess the same variant allele or combination of alleles and which is treated with the same agent. Alternatively, the longevity of the tested transgenic animal is compared to a transgenic animal which is not treated with the agent. Further still, the longevity of the treated transgenic animal can be compared with an untreated transgenic animal. Transgenic animals can express one or more variant alleles, or combinations of variant alleles, or they can be knockout animals that lack the variant allele locus or that lack an endogenous gene but include a heterologous copy, e.g., a human MTP gene. Furthermore, transgenic animals may have all or only a portion of the longevity-associated locus, as will be understood by one of skill in the art.

The test compound can be administered to the organism once or as a regimen (regular or irregular). A parameter of the organism is then evaluated, e. g. , an age- associated parameter or a parameter of the MTP complex. Test compounds that are indicated as of interest result in a change in the parameter relative to a reference, e. g., a parameter of a control organism. Other parameters (e. g. , related to toxicity, clearance, and pharmacokinetics) can also be evaluated.

In some embodiments, the test compound is evaluated using an animal that has a particular disorder, e.g., an age-associated disorder. These disorders provide a sensitized system in which the test compound's effects on physiology can be observed. Exemplary disorders include: denervation, disuse atrophy; metabolic disorders (e.g., disorder of obese and/or diabetic animals such as db/db mouse and ob/ob mouse); cerebral, liver ischemia; cisplatin/taxol/vincristine models; various tissue (xenograft) transplants; transgenic bone models; pain syndromes (including inflammatory and neuropathic disorders); paraquat, genotoxic, oxidative stress models; pulmonary obstruction (e.g., asthma models); and tumor models.

To evaluate a test compound, it is administered to the animal, and a parameter of the animal is evaluated, e.g., after a period of time. The animal can be fed ad libitum or normally (e.g., not under caloric restriction, although some parameters can be evaluated under such conditions). Typically, a cohort of such animals is used for the assay. Generally, a test compound can be indicated as favorably altering lifespan regulation in the animal if the test compound affects the parameter in the direction of the phenotype of a similar animal subject to caloric restriction. Such test compounds may cause at least some of the lifespan regulatory effects of caloric restriction, e.g., a subset of such effects, without having to deprive the organism of caloric intake.
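One way to express the criterion above numerically is sketched below. The measure used, the fraction of the caloric-restriction effect reproduced by the compound, is an assumption introduced for illustration; the parameter values and function names are hypothetical cohort means and are not taken from the examples herein.

```python
def fraction_of_cr_effect(control, treated, caloric_restricted):
    """Fraction of the caloric-restriction (CR) shift reproduced by a compound.

    All arguments are cohort means of the same parameter. A positive result
    means the compound moved the parameter in the same direction as CR; a
    negative result means it moved the parameter in the opposite direction.
    """
    cr_shift = caloric_restricted - control
    if cr_shift == 0:
        raise ValueError("CR cohort does not differ from control for this parameter")
    return (treated - control) / cr_shift


def favorably_indicated(control, treated, caloric_restricted, min_fraction=0.0):
    """True if the compound shifts the parameter toward the CR phenotype."""
    return fraction_of_cr_effect(control, treated, caloric_restricted) > min_fraction


if __name__ == "__main__":
    # Hypothetical cohort means for a blood lipid parameter (arbitrary units).
    control_mean, cr_mean = 100.0, 70.0
    print(fraction_of_cr_effect(control_mean, 85.0, cr_mean))
    print(favorably_indicated(control_mean, 85.0, cr_mean))
    print(favorably_indicated(control_mean, 110.0, cr_mean))
```

A threshold above zero (the hypothetical `min_fraction`) could be used where only compounds reproducing a substantial part of the caloric-restriction effect are of interest.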

In one embodiment, the parameter is an age-associated or disease-associated parameter, e.g., a symptom of the disorder associated with the animal model. For example, the test compound can be administered to the spontaneously hypertensive (SH) rat, and blood pressure is monitored. A test compound that is favorably indicated can cause an amelioration of the symptom relative to a similar reference animal not treated with the compound. In a related embodiment, the parameter is a parameter of MTP activity, e.g., blood lipid composition.

In assessing whether a test compound is capable of inhibiting the MTP complex for the purpose of altering lifespan regulation, a number of age-associated parameters or biomarkers can be monitored or evaluated. Exemplary age-associated parameters include: (i) lifespan of the cell or the organism; (ii) presence or abundance of a gene transcript or gene product in the cell or organism that has a biological age-dependent expression pattern; (iii) resistance of the cell or organism to stress; (iv) one or more metabolic parameters of the cell or organism; (v) proliferative capacity of the cell or a set of cells present in the organism; and (vi) physical appearance or behavior of the cell or organism.

Characterization of molecular differences between two such organisms, e.g., one reference organism and one organism treated with an MTP complex modulator, can reveal a difference in the physiological state of the organisms. The reference organism and the treated organism are typically the same chronological age.

Generally, organisms of the same chronological age may have lived for an amount of time within 15, 10, 5, 3, 2, or 1% of the average lifespan of a wildtype organism of that species. In a preferred embodiment, the organisms are adult organisms, e.g., the organisms have lived for at least an amount of time in which the average wildtype organism has matured to an age at which it is competent to reproduce.

In some embodiments, the organismal screening assay is performed before the organisms exhibit overt physical features of aging. For example, the organisms may be adults that have lived only 10, 30, 40, 50, 60, or 70% of the average lifespan of a wildtype organism of the same species.

Age-associated changes in metabolism, immune competence, and chromosomal structure have been reported. Any of these changes can be evaluated, either in a test subject (e.g., for an organism-based assay), or for a patient (e.g., prior to, during, or after treatment with a therapeutic described herein).

In another embodiment, a marker associated with caloric restriction is evaluated in a subject organism of a screening assay (or a treated subject). Although these markers may not be age-associated, they may be indicative of a physiological state that is altered when MTP activity is modulated. The marker can be an mRNA or protein whose abundance changes, for example, in calorically restricted animals. WO 01/12851 and US 6,406,853 describe exemplary markers.

In a related aspect, the invention features a method of evaluating a test compound using a plurality of biomarkers. This can be done by profiling the sample.

The method includes providing a cell or organism and a test compound; contacting the test compound to the cell; obtaining a subject expression profile for the contacted cell; and comparing the subject expression profile to one or more reference profiles.

The profiles include a value representing the level of expression of molecules previously determined to be correlated with MTP activity (see, e.g., below). In a preferred embodiment, the subject expression profile is compared to a target profile, e.g., a profile for a normal cell or for a desired condition of a cell. The test compound is evaluated favorably if the subject expression profile is more similar to the target profile than an expression profile obtained from an uncontacted cell.

Similarity of profiles can be determined by a variety of metrics, including Euclidean distance in an n-dimensional space, where n is the number of different values within the profile. Other metrics, for example, include weighting factors that bias different values according to their importance for the comparison.
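By way of non-limiting illustration, the profile comparison described above can be sketched in Python as follows. The function names, the list-based profile representation, and the numeric values are assumptions made for this example and do not appear in the specification; the weights stand in for the weighting factors mentioned above.

```python
import math


def weighted_euclidean(profile_a, profile_b, weights=None):
    """Distance between two expression profiles with n values each.

    With weights=None this is the ordinary Euclidean distance in
    n-dimensional space; otherwise each squared difference is scaled
    by its (hypothetical) importance weight.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must contain the same number of values")
    if weights is None:
        weights = [1.0] * len(profile_a)
    if len(weights) != len(profile_a):
        raise ValueError("one weight is required per profile value")
    return math.sqrt(
        sum(w * (a - b) ** 2 for w, a, b in zip(weights, profile_a, profile_b))
    )


def is_favorable(contacted, uncontacted, target, weights=None):
    """True if the contacted-cell profile is closer to the target profile
    than the uncontacted-cell profile is."""
    return (weighted_euclidean(contacted, target, weights)
            < weighted_euclidean(uncontacted, target, weights))


# Illustrative values only.
target = [0.0, 0.0]
print(weighted_euclidean([3.0, 4.0], target))          # unweighted distance
print(is_favorable([1.0, 1.0], [5.0, 5.0], target))    # closer to the target?
```

Setting a weight to zero removes the corresponding value from the comparison, which is one simple way to express that a value is unimportant for a particular screen.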

Profiles, e.g., profiles obtained from nucleic acid arrays or protein arrays, can be used to compare samples and/or cells in a variety of states as described in Golub et al. ((1999) Science 286: 531). In one embodiment, multiple expression profiles from different conditions and including replicates or like samples from similar conditions are compared to identify nucleic acids whose expression level is predictive of the sample and/or condition. Each candidate nucleic acid can be given a weighted "voting" factor dependent on the degree of correlation of the nucleic acid's expression and the sample identity. A correlation can be measured using a Euclidean distance or the Pearson correlation coefficient.
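A simplified, non-limiting sketch of such a weighted voting scheme is given below. It assumes two sample classes labeled 0 and 1 and uses the Pearson correlation between each nucleic acid's expression levels and the class labels as the voting weight. The gene names, data, and the midpoint-based vote are illustrative assumptions and are not taken from Golub et al. or from the specification.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series carries no information
    return cov / (sd_x * sd_y)


def predict_class(expression, labels, new_sample):
    """Classify new_sample by summing weighted votes over all genes.

    expression: dict mapping gene name -> expression level per training sample
    labels:     class label (0 or 1) per training sample
    new_sample: dict mapping gene name -> expression level in the new sample
    """
    total = 0.0
    for gene, levels in expression.items():
        weight = pearson(levels, labels)  # the "voting" factor
        class0 = [v for v, lab in zip(levels, labels) if lab == 0]
        class1 = [v for v, lab in zip(levels, labels) if lab == 1]
        midpoint = (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2
        total += weight * (new_sample[gene] - midpoint)
    return 1 if total > 0 else 0


# Hypothetical training data: g1 is high in class 1, g2 is high in class 0.
training = {"g1": [1.0, 2.0, 9.0, 10.0], "g2": [8.0, 9.0, 1.0, 2.0]}
labels = [0, 0, 1, 1]
print(predict_class(training, labels, {"g1": 9.5, "g2": 1.5}))
```

A gene whose expression is uncorrelated with sample identity receives a weight near zero and therefore contributes little to the vote.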

It is also possible to use structure-activity relationships (SAR) and structure-based design principles to find compounds that affect MTP activity.

Structure-based design can also be used to identify a pharmacophore which may lead to drug optimization.

Once a compound is identified that matches the pharmacophore, it can be tested for activity, e.g., for binding to a component of the MTP complex and/or for a biological activity, e.g., modulation of MTP activity, e.g., MTP inhibition. See, e.g., "Screening Methods".

siRNA

It is also possible to regulate MTP complex activity using a double-stranded RNA (dsRNA) that mediates RNA interference (RNAi). The dsRNA can be delivered to cells or to an organism. Endogenous components of the cell or organism can trigger RNA interference (RNAi), which silences expression of genes that include the target sequence. dsRNA can be produced by transcribing a cassette in both directions, for example, by including a T7 promoter on either side of the cassette. The insert in the cassette is selected so that it includes a sequence complementary to a nucleic acid encoding MTP or an MTP protein complex component. The sequence need not be full length; it can be, for example, an exon, or at least 50 nucleotides. The sequence can be from the 5' half of the transcript, e.g., within 1000, 600, 400, or 300 nucleotides of the ATG. See also the HiScribe RNAi Transcription Kit (New England Biolabs, MA) and Fire, A. (1999) Trends Genet. 15, 358-363. dsRNA can be digested into smaller fragments. See, e.g., US Patent Applications 2002-0086356 and 2003-0084471.

In one embodiment, an siRNA is used. siRNAs are small double-stranded RNAs (dsRNAs) that optionally include overhangs. For example, the duplex region is about 18 to 25 nucleotides in length, e.g., about 19, 20, 21, 22, 23, or 24 nucleotides in length. Typically the siRNA sequences are exactly complementary to a target mRNA.
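As a non-limiting illustration of the length and complementarity constraints above, the following Python sketch derives the strand exactly complementary to a target mRNA site and checks that the duplex region falls within the 18 to 25 nucleotide range. The target sequence shown is an arbitrary placeholder and is not an MTP sequence; the function names are assumptions made for this example, and overhangs are not modeled.

```python
# Watson-Crick pairing for RNA bases (input must use the A/C/G/U alphabet).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_strand(target_site):
    """Return the strand exactly complementary to an mRNA target site,
    written 5' to 3' (i.e., the reverse complement)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(target_site))


def is_typical_duplex_length(sequence, shortest=18, longest=25):
    """True if the duplex region is about 18 to 25 nucleotides long."""
    return shortest <= len(sequence) <= longest


# Arbitrary 21-nucleotide placeholder, not derived from any real transcript.
site = "AUGGCUACGUUAGCCAUGCAA"
guide = antisense_strand(site)
print(guide, is_typical_duplex_length(guide))
```

An allele-specific siRNA of the kind described below would use a target site that spans the variant position, so that only the transcript from one allele is exactly complementary.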

In one embodiment, an siRNA that is specific for an MTP allele (e.g., the Q95H allele) is used. dsRNAs can be used to silence or reduce gene expression in mammalian cells, e.g., MTP gene expression, and may be administered to an organism to silence or reduce gene expression in a cell in the organism. See, e.g., Clemens, J. C. et al. (2000) Proc. Natl. Acad. Sci. USA 97, 6499-6503; Billy, E. et al. (2001) Proc. Natl. Acad. Sci. USA 98, 14428-14433; Elbashir et al. (2001) Nature 411 (6836): 494-8; Yang, D. et al. (2002) Proc. Natl. Acad. Sci. USA 99, 9942-9947. Such molecules can be used to contact cells ex vivo or in vivo to regulate MTP activity in those cells. dsRNA molecules can be used to provide cells and organisms (e.g., mammalian cells and organisms, and nematode cells and organisms) that are deficient in an MTP activity, e.g., to establish a baseline for a compound that inhibits mRNA activity. Such cells and organisms are useful tools for evaluating heterologous MTP molecules and test compounds for activity, e.g., an activity that modulates lifespan regulation.

Other agents

Other agents include artificial ligands that can regulate MTP activity. For example, artificial zinc finger proteins can be designed to bind to an MTP promoter, e.g., within 20, 10, or 5 nucleotides of position -493. The artificial zinc finger proteins can include a repression domain, e.g., to reduce MTP production.

Exemplary methods for producing artificial zinc finger proteins are described in US 6,534,261; 6,511,808; 6,453,242; and 6,410,248.

Pharmacogenomics

Both prophylactic and therapeutic methods of treatment may be specifically tailored or modified, based on knowledge obtained from a pharmacogenomics analysis. In particular, a subject can be treated based on the presence or absence of a genetic polymorphism associated with longevity, e.g., a polymorphism associated with the MTP locus. Pharmacogenomics allows a clinician or physician to target prophylactic or therapeutic treatments to patients who will most benefit from the treatment and to avoid the treatment of patients who will experience toxic or other undesirable drug-related side effects. In particular, a diet or drug that affects an age-related disorder can be prescribed as a function of the subject's MTP locus.

Pharmaceutical Compositions

Another embodiment of the invention is a method to screen for agents that are agonists, mimics, or antagonists of the longevity marker and its encoded polynucleotide product. The term "agonist" refers to agents that potentiate or stimulate the activities of the longevity marker and/or its encoded polynucleotide product. The term "mimic" refers to agents that cause effects in the same manner as the longevity marker and/or its encoded polynucleotide product. The term "antagonist" refers to agents that oppose or interfere with the activities of the longevity marker and/or its encoded polynucleotide product.

A compound that modulates an MTP pathway component can be incorporated into a pharmaceutical composition for administration to a subject, e.g., a human, a non-human animal, e.g., an animal patient (e.g., pet or agricultural animal) or an animal model (e.g., an animal model for aging or a metabolic disorder (e.g., a pancreatic or insulin-related disorder)). Such compositions typically include a small molecule (e.g., a small molecule that is an MTP inhibitor), nucleic acid molecule, protein, or antibody and a pharmaceutically acceptable carrier. As used herein, the language "pharmaceutically acceptable carrier" includes solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. Other active compounds can also be incorporated into the compositions.

Exemplary compounds that can be used to inhibit MTP or the MTP protein complex include compounds disclosed in US 6,492,365; 6,472,414; 6,281,228; 6,066,650; 5,919,795; and Chang et al. (2002) Curr Opin Drug Discov Devel 5 (4): 562-70.

A pharmaceutical composition is formulated to be compatible with its intended route of administration. Examples of routes of administration include parenteral, e. g. , intravenous, intradermal, subcutaneous, oral (e. g. , inhalation), transdermal (topical), transmucosal, and rectal administration. Solutions or suspensions used for parenteral, intradermal, or subcutaneous application can include the following components: a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite ; chelating agents such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates or phosphates and agents for the adjustment of tonicity such as sodium chloride or dextrose. pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of glass or plastic.

Pharmaceutical compositions suitable for injectable use include sterile aqueous solutions (where water soluble) or dispersions and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersions. For intravenous administration, suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, NJ) or phosphate buffered saline (PBS). In all cases, the composition must be sterile and should be fluid to the extent that easy syringability exists. It should be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersion and by the use of surfactants. Prevention of the action of microorganisms can be achieved by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. In many cases, it will be preferable to include isotonic agents, for example, sugars, polyalcohols such as mannitol, sorbitol, or sodium chloride in the composition. Prolonged absorption of the injectable compositions can be brought about by including in the composition an agent which delays absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the active compound in the required amount in an appropriate solvent with one or a combination of ingredients enumerated above, as required, followed by filtered sterilization.

Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle which contains a basic dispersion medium and the required other ingredients from those enumerated above. In the case of sterile powders for the preparation of sterile injectable solutions, the preferred methods of preparation are vacuum drying and freeze-drying which yields a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.

Oral compositions generally include an inert diluent or an edible carrier. For the purpose of oral therapeutic administration, the active compound can be incorporated with excipients and used in the form of tablets, troches, or capsules, e. g., gelatin capsules. Oral compositions can also be prepared using a fluid carrier for use as a mouthwash. Pharmaceutically compatible binding agents, and/or adjuvant materials can be included as part of the composition. The tablets, pills, capsules, troches and the like can contain any of the following ingredients, or compounds of a similar nature: a binder such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or lactose, a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or orange flavoring.

For administration by inhalation, the compounds are delivered in the form of an aerosol spray from a pressurized container or dispenser which contains a suitable propellant, e.g., a gas such as carbon dioxide, or a nebulizer.

Systemic administration can also be by transmucosal or transdermal means.

For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration, detergents, bile salts, and fusidic acid derivatives. Transmucosal administration can be accomplished through the use of nasal sprays or suppositories. For transdermal administration, the active compounds are formulated into ointments, salves, gels, or creams as generally known in the art. The compounds can also be prepared in the form of suppositories or retention enemas for rectal delivery.

In one embodiment, the active compounds are prepared with carriers that will protect the compound against rapid elimination from the body, such as a controlled release formulation, including implants and microencapsulated delivery systems.

Biodegradable, biocompatible polymers can be used, such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid, collagen, polyorthoesters, and polylactic acid.

Methods for preparation of such formulations will be apparent to those skilled in the art. The materials can also be obtained commercially from Alza Corporation and Nova Pharmaceuticals, Inc. Liposomal suspensions (including liposomes targeted to particular cells, e.g., a pituitary cell) can also be used as pharmaceutically acceptable carriers. These can be prepared according to methods known to those skilled in the art, for example, as described in U.S. Patent No. 4,522,811.

It is advantageous to formulate oral or parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers to physically discrete units suited as unitary dosages for the subject to be treated; each unit containing a predetermined quantity of active compound calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier.

Toxicity and therapeutic efficacy of such compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e. g., for determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD50/ED50. Compounds which exhibit high therapeutic indices are preferred.
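The ratio defined above can be illustrated with a short, non-limiting Python sketch. The compound names and dose values are hypothetical and are used only to show the arithmetic; LD50 and ED50 must be expressed in the same units.

```python
def therapeutic_index(ld50, ed50):
    """Therapeutic index expressed as the ratio LD50/ED50."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


def rank_by_therapeutic_index(compounds):
    """Order compound names from highest to lowest therapeutic index.

    compounds: dict mapping name -> (LD50, ED50) in the same dose units.
    """
    return sorted(
        compounds,
        key=lambda name: therapeutic_index(*compounds[name]),
        reverse=True,
    )


# Hypothetical values in mg/kg, for illustration only.
candidates = {"compound A": (200.0, 10.0), "compound B": (50.0, 10.0)}
print(therapeutic_index(*candidates["compound A"]))
print(rank_by_therapeutic_index(candidates))
```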

While compounds that exhibit toxic side effects may be used, care should be taken to design a delivery system that targets such compounds to the site of affected tissue in order to minimize potential damage to uninfected cells and, thereby, reduce side effects.

The data obtained from the cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage of such compounds lies preferably within a range of circulating concentrations that include the ED50 with little or no toxicity. The dosage may vary within this range depending upon the dosage form employed and the route of administration utilized. For any compound used in the method of the invention, the therapeutically effective dose can be estimated initially from cell culture assays. A dose may be formulated in animal models to achieve a circulating plasma concentration range that includes the IC50 (i.e., the concentration of the test compound which achieves a half-maximal inhibition of symptoms) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma may be measured, for example, by high performance liquid chromatography.

As defined herein, a therapeutically effective amount of protein or polypeptide (i.e., an effective dosage) ranges from about 0.001 to 30 mg/kg body weight, preferably about 0.01 to 25 mg/kg body weight, more preferably about 0.1 to 20 mg/kg body weight, and even more preferably about 1 to 10 mg/kg, 2 to 9 mg/kg, 3 to 8 mg/kg, 4 to 7 mg/kg, or 5 to 6 mg/kg body weight. The protein or polypeptide can be administered one time per week for between about 1 to 10 weeks, preferably between 2 to 8 weeks, more preferably between about 3 to 7 weeks, and even more preferably for about 4, 5, or 6 weeks. The skilled artisan will appreciate that certain factors may influence the dosage and timing required to effectively treat a subject, including but not limited to the severity of the disease or disorder, previous treatments, the general health and/or age of the subject, and other diseases present. Moreover, treatment of a subject with a therapeutically effective amount of a compound can include a single treatment or, preferably, can include a series of treatments.

For antibody compounds that modulate MTP complex components, one preferred dosage is 0.1 mg/kg of body weight (generally 10 mg/kg to 20 mg/kg).

Generally, partially human antibodies and fully human antibodies have a longer half-life within the human body than other antibodies. Accordingly, lower dosages and less frequent administration are often possible. Modifications such as lipidation can be used to stabilize antibodies and to enhance uptake and tissue penetration. A method for lipidation of antibodies is described by Cruikshank et al. ((1997) J Acquired Immune Deficiency Syndromes and Human Retrovirology 14: 193).

The present invention encompasses agents that modulate expression or activity of MTP complex components. An agent may, for example, be a small molecule. For example, agents include, but are not limited to, peptides, peptidomimetics (e.g., peptoids), amino acids, amino acid analogs, polynucleotides (e.g., antisense and dsRNAs, e.g., siRNAs), polynucleotide analogs, nucleotides, nucleotide analogs, organic or inorganic compounds (i.e., including heteroorganic and organometallic compounds) having a molecular weight less than about 10,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 5,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 1,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 500 grams per mole, and salts, esters, and other pharmaceutically acceptable forms of such compounds.

Exemplary doses include milligram or microgram amounts of the small molecule per kilogram of subject or sample weight (e.g., about 1 microgram per kilogram to about 500 milligrams per kilogram, about 100 micrograms per kilogram to about 5 milligrams per kilogram, or about 1 microgram per kilogram to about 50 micrograms per kilogram). It is furthermore understood that appropriate doses of a small molecule depend upon the potency of the small molecule with respect to the expression or activity to be modulated. When one or more of these small molecules is to be administered to an animal (e.g., a human) in order to modulate expression or activity of a polypeptide or nucleic acid of the invention, a physician, veterinarian, or researcher may, for example, prescribe a relatively low dose at first, subsequently increasing the dose until an appropriate response is obtained. In addition, it is understood that the specific dose level for any particular animal subject will depend upon a variety of factors including the activity of the specific compound employed, the age, body weight, general health, gender, and diet of the subject, the time of administration, the route of administration, the rate of excretion, any drug combination, and the degree of expression or activity to be modulated.

The nucleic acid molecules that modulate MTP complex components can be inserted into vectors and used as gene therapy vectors. Gene therapy vectors can be delivered to a subject by, for example, intravenous injection, local administration (see U.S. Patent 5,328,470) or by stereotactic injection (see, e.g., Chen et al. Proc. Natl. Acad. Sci. USA 91: 3054-3057, 1994). The pharmaceutical preparation of the gene therapy vector can include the gene therapy vector in an acceptable diluent, or can comprise a slow release matrix in which the gene delivery vehicle is imbedded.

Alternatively, where the complete gene delivery vector can be produced intact from recombinant cells, e.g., retroviral vectors, the pharmaceutical preparation can include one or more cells which produce the gene delivery system. Ex vivo gene delivery is also possible. Gene transfer methods for gene therapy fall into three broad categories: physical (e.g., electroporation, direct gene transfer, and particle bombardment), chemical (e.g., lipid-based carriers, or other non-viral vectors), and biological (e.g., virus-derived vector and receptor uptake). For example, non-viral vectors can be used which include liposomes coated with DNA. Such liposome/DNA complexes can be directly injected intravenously into the patient. Additionally, vectors or the "naked" DNA of the gene can be directly injected into the desired organ, tissue, or tumor for targeted delivery of the therapeutic DNA.

The pharmaceutical compositions can be included in a container, pack, or dispenser together with instructions for administration.

Modulating Lifespan Regulation in Subjects

Agents that alter MTP activity can be used to modulate lifespan regulation in subjects, e.g., animals (e.g., mammals, e.g., human subjects). The compositions can be administered to a subject, e.g., an adult subject, e.g., a healthy adult subject or a subject having an age-related disease. In the latter case, the method can include evaluating a subject, e.g., to characterize a symptom of an age-related disease or other disease marker, and thereby identifying a subject as having an age-related disease or being pre-disposed to such a disease. Exemplary age-related diseases include: cancer (e.g., breast cancer, colorectal cancer, CCL, CML, prostate cancer); skeletal muscle atrophy; adult-onset diabetes; diabetic nephropathy, neuropathy (e.g., sensory neuropathy, autonomic neuropathy, motor neuropathy, retinopathy); obesity; bone resorption; age-related macular degeneration, AIDS-related dementia, ALS, Alzheimer's, Bell's Palsy, atherosclerosis, cardiac diseases (e.g., cardiac dysrhythmias, chronic congestive heart failure, ischemic stroke, coronary artery disease, and cardiomyopathy), chronic renal failure, type 2 diabetes, ulceration, cataract, presbyopia, glomerulonephritis, Guillain-Barré syndrome, hemorrhagic stroke, rheumatoid arthritis, inflammatory bowel disease, multiple sclerosis, SLE, Crohn's disease, osteoarthritis, Parkinson's disease, pneumonia, and urinary incontinence.

Symptoms and diagnosis of such diseases are well known to medical practitioners.

The compositions may also be administered to individuals being treated by other means for such diseases, for example, individuals being treated with a chemotherapeutic (e. g. , and having neutropenia, atrophy, cachexia, nephropathy, neuropathy) or an elective surgery.

Subjects can be diagnosed and evaluated, e. g. , before, during, and after treatment. Standard medical procedures can be used to monitor the health and fitness of the subject. In addition, a parameter of metabolic activity (e. g. , insulin levels) can be monitored.

In some embodiments, the MTP modulating agent is directed to a particular cell (e.g., by using a targeting vehicle or by using a cell-type specific regulatory sequence for a nucleic acid). For example, the agent can be targeted to an adipose, liver, pancreatic, brain, or skeletal muscle cell. In some examples, the targeted tissue participates in metabolic regulation.

* * *

The following non-limiting example illustrates a particular implementation of sample matching.

EXAMPLE 1

In a genome-wide linkage study for human longevity using 308 long-lived individuals (centenarians or near-centenarians) in 137 sibships, a locus was identified with statistically significant linkage within chromosome IV near microsatellite D4S1564. This interval spans 12 million base pairs and contains approximately 50 putative genes. Haplotype-based fine mapping was used to study the interval and identify the specific gene and gene variants impacting lifespan. The resulting genetic association study identified the gene for microsomal triglyceride transfer protein (MTP) as accounting for significant variance in human lifespan. MTP has been identified as the rate-limiting step in lipoprotein synthesis and may affect longevity by subtly modulating this pathway. This study provides proof of concept for the feasibility of fine mapping linkage peaks using association studies and for the power of using the centenarian genome to identify genes impacting longevity.

The ability to survive to old age is partially under genetic influence (McGue, Vaupel et al. 1993; Herskind, McGue et al. 1996; Gudmundsson, Gudbjartsson et al. 2000; Perls, Shea-Drinkwater et al. 2000). Clearly, individuals burdened by the fatal monogenic diseases of youth, such as cystic fibrosis, retinoblastoma, and muscular dystrophy, have a reduced lifespan compared with the general population. However, although the effects of these harmful gene variants are large in magnitude with respect to affected individuals, because these mutations are extremely rare, all the monogenic diseases combined contribute little to the population variance in human lifespan.

There is demographic evidence that there is considerable heritability of human lifespan. Based on an analysis of longevity in twins, this heritability has been estimated at 25%; however, the importance of genetic factors is likely greater at the extremes of age. For example, male and female siblings of centenarians have 17-fold and 8-fold greater relative risks, respectively, of surviving to age 100 and about half the death rate from age 20 to age 100 of birth-cohort matched individuals (Perls, Wilmoth et al. 2002).

These studies suggest that exceptional longevity is amenable to genetic studies, but not without the realization that for humans to achieve an age of one hundred years represents a complex interaction of genetics, environment, and chance. Lifespan can be conceptualized as the most complex trait of all, as this trait necessarily integrates genetic and environmental factors contributing to all diseases affecting human mortality. Accordingly, genetic variance in human lifespan within a population may be distributed over many genes with relatively subtle influences by any single gene.

The distribution of these effects (e.g., the number of genes accounting for much of the genetic variance) is an unanswered empirical question. If the variance in human lifespan is evenly distributed over large numbers of genes and gene variants (alleles), the likelihood of deciphering the individual contributions is small. Furthermore, if unspecified gene-gene and gene-environment interactions account for the majority of the variance, these difficulties will be compounded. Despite these concerns, an increasing number of genetic studies are reporting genes associated with human longevity. These genes include ApoE, ApoB, and klotho (Kervinen, Savolainen et al. 1994; Schachter, Faure-Delanef et al. 1994; van Bockxmeer 1994; Arking, Krebsova et al. 2002), although only ApoE has been reproduced consistently. In order to achieve their extreme age, centenarians likely lack numerous gene variants that are associated with premature mortality, and there is also the possibility that they are more likely to carry protective variants as well (Wachter 1997; Schachter 1998).

From linkage study to association study Results of a genome-wide linkage scan using 308 extremely long lived individuals in 137 sibships and linkage to exceptional longevity (i. e. living beyond the 5% survival tail) at chromosome IV near microsatellite D4S 1564 with a maximum LOD score of 3.65 (p = 0.044 genome-wide with non-parametric analysis) have been reported (Puca, Daly et al. 2001). No other chromosomal region achieved statistically significant linkage in this study. There are approximately 50 putative genes in the 12 million base pairs spanning the 85% confidence interval of this linkage peak, and a priori it was difficult to exclude any of the genes based on functional considerations.

In addition, it was possible that the polymorphism underlying the linkage was not within any of these 50 "genes." Therefore, an unbiased, systematic fine mapping of the region was desired. Larger numbers of sibling-pairs could be collected to study all specific polymorphisms in the region.

This study finely mapped the chromosome IV locus with the hope of identifying specific gene variants associated with exceptional longevity. Rather than bias the potential findings to regions of the locus containing well characterized genes, a systematic exploration of the linkage peak was conducted. With this aim, 2,000 single nucleotide polymorphisms (SNPs) (an average of one every 6 kb) within the longevity linkage locus were selected from The SNP Consortium (TSC) database.

Based on experience with an earlier pilot study, only a fraction of these markers were expected to be useful in an association study. Of the 2000, a total of 875 SNPs were converted into successful genotyping assays and were determined to be polymorphisms with minor allele frequency greater than 5%.

From SNPs to haplotypes

Although these validated SNP assays could have been used alone as markers in the association study described below, there were strong arguments to additionally build a haplotype map of the locus from these SNPs and then leverage the reconstructed haplotypes as genetic markers. A haplotype is a specific combination of alleles of nearby markers. In most cases, the power (informativeness) of a genetic marker with respect to an association study is increased when there are large numbers of variants of a single marker (unless the marker is the causative variant). Accordingly, SNP markers, which are biallelic, have less power to detect associations than multi-SNP haplotypes. Secondly, the diversity of the genome can be effectively captured by reducing it to sequential blocks of haplotypes with limited diversity (Johnson, Esposito et al. 2001; Patil, Berno et al. 2001; Stephens, Schneider et al. 2001).

Defining haplotypes provides the opportunity to select groups of markers that are minimally correlated with one another, which maximizes the statistical power per marker. Once the common haplotypes within a block have been defined, SNPs within the same block that are redundant for discriminating between the different haplotypes can be omitted. After removing such redundant SNPs within each block, approximately 700 "maximally informative SNPs" out of the 875 validated SNPs remained for use in association studies (see supplementary information). Finally, haplotype reconstruction provides a number of ways to assess the statistical coverage of a mapping effort and to model the recombination history within a locus.
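The omission of redundant SNPs within a block can be illustrated with a small sketch. It assumes that the common haplotypes of a block are already known and written as strings of 0/1 alleles, and it uses a simple greedy rule to pick SNP positions until every pair of haplotypes is distinguished. The function name `select_informative_snps` and the greedy rule are illustrative assumptions; the source does not state which selection procedure was used.

```python
from itertools import combinations


def select_informative_snps(haplotypes):
    """Greedily choose SNP indices that distinguish every pair of distinct haplotypes.

    haplotypes: list of equal-length strings of '0'/'1' alleles (one per common haplotype).
    Returns a sorted list of SNP indices. The greedy rule is an illustrative assumption.
    """
    # All pairs of haplotypes that still need at least one SNP to tell them apart.
    unresolved = {(a, b) for a, b in combinations(range(len(haplotypes)), 2)
                  if haplotypes[a] != haplotypes[b]}
    n_snps = len(haplotypes[0])
    chosen = []
    while unresolved:
        def pairs_split(snp):
            # Number of unresolved pairs that differ at this SNP.
            return sum(1 for a, b in unresolved
                       if haplotypes[a][snp] != haplotypes[b][snp])

        best = max(range(n_snps), key=pairs_split)
        chosen.append(best)
        # Keep only the pairs that the chosen SNP failed to separate.
        unresolved = {(a, b) for a, b in unresolved
                      if haplotypes[a][best] == haplotypes[b][best]}
    return sorted(chosen)


# Toy block with four SNPs and three common haplotypes.
block = ["0000", "0011", "1111"]
tags = select_informative_snps(block)
print(tags, [tuple(h[i] for i in tags) for h in block])
```

In this toy block two of the four SNPs are enough to separate the three haplotypes, so the other two are redundant for haplotype definition.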

Haplotype based approaches applied to smaller genomic regions have been demonstrated by others (Daly, Rioux et al. 2001; Johnson, Esposito et al. 2001; Rioux, Daly et al. 2001) and advantages over single markers have been shown (De Benedictis, Falcone et al. 1997; Stephens 1999; Davidson 2000). There is no generally accepted method for defining and recovering haplotypes from SNP-based data. The algorithms used in this study are outlined in the Methods.

Testing for association

By densely genotyping across the 12 Mb region, a good draft of the underlying haplotype structure was constructed. Approximately 75% of the mapped region was within regions of strong linkage disequilibrium. Using this carefully reconstructed assortment of SNP-based haplotype markers, a case/control association study between groups of unrelated long-lived individuals (age 98 and older) and a much younger control population (less than 50 years of age) was conducted.

To reduce genotyping costs and to increase the power by confirming the hypothesis in independent populations, the study was divided into two sequential tiers of samples, with the first tier comparing 190 centenarians with 190 controls at SNP-based haplotype markers. These initial sample sizes were intended only as a preliminary screen of the region. This first attempt pointed in the direction of the MTP gene. Several SNPs and haplotype markers were "significant" at p < 0.05. All such markers are useful for evaluating a nucleic acid, e.g., a nucleic acid from a subject, e.g., to evaluate the subject's predisposition for longevity.

The marker showing the strongest association (p = 0.0005) was the SNP rs1553432, located 72 kb upstream from MTP. This association provided a potentially interesting first hypothesis to follow up with dense genotyping and haplotype mapping of the surrounding genes. A review of the December 2001 human genome draft showed four nearby areas of interest: the alcohol dehydrogenase (ADH) gene cluster, the partially characterized transcripts AL136838 and AK000332, and microsomal transfer protein (MTP).

In the 250 kb region bracketing rs1553432, 60 SNPs were identified and validated. Several of these densely spaced SNPs showed strong associations when analyzed in the set of 190 cases and controls used above; most of these markers were located near the 5' end of MTP or just upstream of this gene, and were particularly dense near the promoter. All of the newly identified associations were in strong linkage disequilibrium with rs1553432 (e.g., they fell on the same "long-range" haplotype).

With interest narrowing in on a single gene, all known SNP polymorphisms for MTP and its promoter were genotyped in the original 190 cases (long-lived individuals) and 190 controls (young individuals). After haplotype reconstruction of the area was completed, a single haplotype (see FIG. 1a), which was underrepresented in the long-lived individuals, accounted for the majority of the statistical distortion at the locus.

Genotyping an additional 190 cases and controls further increased the strength of the association at this locus (p = 0.000005, relative risk = 0.56). See Table 1 for counts and frequencies of the haplotypes compared. This haplotype was seen in 27% of controls and 17% of long-lived individuals. Two of the many SNPs within this block (rs2866164 and MTP Q/H 95) were sufficient to distinguish this allele from all others.

These two SNPs were interesting because of their potential functional significance. rs2866164 is perfectly correlated with another MTP promoter SNP, rs1800591 (also known as -493 G/T), that has been previously associated with several phenotypes including lipoprotein profiles, central obesity, and insulin resistance (see below). MTP Q/H 95 results in a semi-conservative amino acid change (from glutamine to histidine) in exon three at the protein's 95th translated amino acid.

Table 1. Risk haplotype allele frequencies

Cases (long-lived):
95Q allele: -493G allele 546 (76%); -493T allele 127 (17%)
95H allele: 53 (7%)

Controls:
95Q allele: -493G allele 498 (68%); -493T allele 201 (27%)
95H allele: 36 (5%)

Table 1 shows, for cases (long-lived) and controls, the frequencies of the four possible haplotypes defined by the promoter (-493 G/T) and exon 3 (95 Q/H) polymorphisms. Note that only three of the four haplotypes were observed, fulfilling the criterion of no historic recombination between the two SNPs. In the long-lived individuals, 726 out of 760 case chromosomes were successfully genotyped at both markers, compared to 735 out of 760 for the controls. As discussed in the text, the haplotype composed of the -493T allele and the 95Q allele is underrepresented in long-lived individuals, suggesting this variant confers mortality risk. Note that the MTP-493 marker has multiple "twins" displaying identical statistical behaviour (see text).
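The percentages in Table 1 can be recovered directly from the chromosome counts. The sketch below also computes an odds ratio for the risk haplotype (-493T with 95Q) from the same counts. Treating the reported "relative risk" as an odds ratio on chromosome counts is an assumption, since the text does not name the estimator; the function name `risk_summary` is hypothetical.

```python
def risk_summary(risk_cases, total_cases, risk_controls, total_controls):
    """Return (frequency in cases, frequency in controls, odds ratio) for one haplotype.

    Counts are chromosomes, not individuals. Using an odds ratio here is an
    assumption about how the reported relative risk was derived.
    """
    freq_cases = risk_cases / total_cases
    freq_controls = risk_controls / total_controls
    odds_cases = risk_cases / (total_cases - risk_cases)
    odds_controls = risk_controls / (total_controls - risk_controls)
    return freq_cases, freq_controls, odds_cases / odds_controls


# Table 1 counts: 127 of 726 case chromosomes and 201 of 735 control chromosomes
# carry the -493T / 95Q haplotype.
freq_cases, freq_controls, odds_ratio = risk_summary(127, 726, 201, 735)
print(f"cases {freq_cases:.1%}, controls {freq_controls:.1%}, odds ratio {odds_ratio:.2f}")
```

Under this assumption the counts give frequencies of about 17% and 27% and an odds ratio of about 0.56, in line with the values quoted in the text.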

Genetic stratification and controlling type I error

Some genetic association studies have been plagued with false positive or other problematic results (Hirschhorn, Lohmueller et al. 2002). A recognized problem affecting genetic association studies is a failure to adequately match the genetic backgrounds of the cases and controls, a phenomenon called stratification.

This association study, which compares individuals born decades apart, is potentially vulnerable to this confounder because the geographic distribution of ethnicities has changed over the past 100 years. Specifically, this case population reflects the ethnic distribution of the United States near the beginning of the last century, while the control population was sampled from more recent generations. To minimize this problem, only DNA from people who identified themselves as "Caucasian" was used, but even this class is obviously a diverse group.

Consequently, cases and controls would differ not only with respect to the longevity phenotype but also have ethnicity as an uncontrolled confounder. If the effect is strong enough, associations will be found reflecting these ethnic differences rather than differences in lifespan. There are accepted ways of checking and correcting for potential stratification, one of which is described in the Methods. The mean chi-square for randomly selected SNP markers (representing differences in genetic background) for the 380 cases and controls tested above was 1.51 (compared with an expected value of 1.0). Although this inflation is modest, any amount of stratification is undesirable, and the methods of correcting for this potential confounder have not been empirically well validated.

To avoid correcting for the hundreds of partially independent hypotheses tested with the original sample set and to simultaneously eliminate stratification as a problem, proactive sample matching was used. A set of 250 cases was proactively matched (see also below) against individuals selected from a new set of 463 potential controls. Using the approach discussed in the Methods section, a subgroup of 250 controls was selected from the potential controls that best matched the cases with respect to genetic background. The mean chi-square for this group of samples (using an independent group of SNPs) was 0.92, indicating a very high level of genetic balance. None of these samples was used to generate the single hypothesis being pursued, which allowed testing of the single inference that the risk haplotype was underrepresented in long-lived individuals. The association at this haplotype was confirmed with this well-matched group of cases and controls (p = 0.01 by G-Test, p = 0.0027 by Hotelling T test, relative risk = 0.69).

Although the interaction between rs2866164 and Q/H 95 was sufficient to account for all the association at the locus, it is imprudent to conclude, at least without the following analysis, that the polymorphisms were causative with respect to longevity. In particular, a few "twins" (SNPs whose alleles are perfectly correlated) of -493 G/T were identified that, in combination with Q/H 95, could equally explain the data. Ideally, because simpler models are preferred over more complex solutions, a single SNP "tagging" (i.e., distinguishing) the risk haplotype would be favored over the two-SNP interaction model.

To search for "tagging" SNPs, a resequencing strategy intended to minimize the number of samples assayed was used. The details of this strategy are described in the Methods. This procedure was applied to the 12 kb within the risk block and the 72 kb block of DNA extending up to the initial rs1553432 SNP. In addition, all 18 exons of MTP were sequenced in a group of 50 long-lived individuals to search for rare functional polymorphisms that would not fall on well-defined haplotypes. Altogether, 104 SNPs were identified, although none uniquely tagged the risk haplotype. After adding the additional SNPs to the map, a new block structure was defined with significant changes (FIG. 1b), but no evidence of recombination between MTP-493 G/T and MTP Q/H 95 was observed. Because a single SNP marker could not explain the association, the most parsimonious model involved an interaction between the two original functional SNPs.

After confirming the MTP finding, there remained the possibility that additional genes associated with longevity contribute to the linkage peak. To be as thorough as possible, all of the hundreds of SNPs or SNP-based haplotypes genotyped in the first set of 190 cases and controls significantly associated at p < 0.05 were tested in independent samples, as described above for MTP. At the end of this sequential process, there were no additional associations that survived the proper corrections discussed above, although larger sample sizes and/or more perfect sample matching may reveal additional associations in the future. 190 cases and controls were genotyped using at least 5 SNPs near all the well-characterized genes under the locus, which involved assaying an additional 55 SNP markers. This effort yielded no additional associations. Thus the MTP locus is the locus that most adequately explains the original linkage result.

Some Implications

This study demonstrates that centenarians and near-centenarians can serve as a model for studying human longevity and disease resistance (Barzilai, Gabriely et al. 2001). A population that has escaped or delayed the lethal pathologies of old age is useful for detecting genetic factors that impact the diseases of aging (Silverman, Smith et al. 1999). Here, a haplotype-based linkage disequilibrium mapping approach identified a risk allele based on an initial finding contributed by a linkage study. The complex trait linkage peak ultimately resulted in the identification of a specific gene variant.

FIG. 1 depicts haplotype blocks at the MTP locus: (a) The original haplotype block defined by publicly available SNPs, containing rs2866164 (circled box) and MTP Q/H 95 (boxed). The arrow indicates the risk haplotype. (b) A more refined map that includes 61 novel SNPs, showing MTP-493 G/T (boxed) and MTP Q/H 95 belonging to different haplotype blocks but in strong linkage disequilibrium. SNPs perfectly correlated with MTP-493 G/T are shown in circled boxes. Dashed lines indicate haplotypes that are commonly linked across haplotype boundaries. Asterisks indicate maximally informative SNPs. (c) Relative frequency of the different haplotypes in trios and their sum. (d) Degree of linkage disequilibrium between the blocks, estimated as d-prime. To conserve space, many statistically redundant SNPs were removed from the figure. For more details see figure 2 of Daly, Rioux et al. 2001.

FIG. 2 is a schematic describing the search for genes affecting human longevity. Before any genotyping began, it was important to demonstrate evidence that longevity runs in families (70) and, consequently, that the prior probability of finding longevity-modulating genes was high. The subsequent genome-wide linkage scan focused attention on an extended region of chromosome IV (72). To identify the specific alleles involved, a haplotype map of the region was created using familial trios (74) and this map was used to identify a specific risk haplotype, as described herein. The study included a haplotype association study of long-lived individuals compared to controls (76) and testing of associations with independent samples (78).

Several rounds of SNP discovery (80), haplotype reconstruction, and mapping (82) were required to exhaust the search for potentially causative variants (84). Because MTP can only explain a small fraction of the total genetic variance in human longevity and there may be dozens of genes with a similar association, in the near future additional studies will likely yield an insight into the genetic basis of longevity, aging, and disease resistance.

Methods

Sample ascertainment and phenotyping. The study sample consists of individuals 98 years and older. Individuals were identified and recruited by a variety of methods including institutional websites, direct mailings, and advertisement in newspapers geared towards potential participants or organizations involved with the aging community. Physical and cognitive health were not used as participation criteria.

All participants and/or their legally authorized representatives took part in a written informed consent process. Additional collected data included health and socio-demographic histories, proof of age (usually in the form of a birth certificate), a three-generation pedigree, and measures to assess functional independence and cognitive status.

Potential biases in the study may include subtle sample bias towards healthier study participants as a result of recruitment methods. For example, contact may result in part from the families of potential study participants with higher physical and cognitive status than the average nonagenarian/centenarian. This may explain the lower incidence of age-associated diseases (i.e., cardiovascular disease, stroke) in the study group than expected. Controls (self-identified as "Caucasian" and less than 50 years of age) were obtained from several anonymous sources in the U.S. and Europe.

SNP validation. To screen this initial set of SNPs, 19 familial trios (mother, father, and offspring) acquired from the Centre d'Etude du Polymorphisme Humain (CEPH) Repository were genotyped at all selected markers. Of these 2000 markers, 1494 had high confidence calls on the MassARRAY™ platform. Of these markers, 990 had a minor allele frequency of at least 5%. SNPs of lower heterozygosity were excluded because of the reduced power of such markers with respect to mapping complex traits in association studies with limited sample size. Of the remaining SNPs, 113 were eliminated because the frequency distribution of the two types of homozygotes and the heterozygotes was not statistically compatible with Hardy-Weinberg equilibrium. These failures were attributed to systematic artifacts introduced by the genotyping platform. The use of familial trios allowed a Mendelian check on the validity of each SNP assay. If more than one Mendelian inheritance error per assay was detected within the 19 trios, the assay was judged unreliable. Finally, of the 875 remaining SNPs, approximately 700 "maximally informative SNPs" were required to reconstruct all the identified haplotypes.
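The minor allele frequency and Hardy-Weinberg filters described above can be sketched as follows. This is a minimal illustration that assumes genotype counts per SNP, a one-degree-of-freedom chi-square goodness-of-fit test, and a significance cutoff of 0.05. The cutoff and the function names are assumptions, since the text gives only the 5% minor allele frequency threshold.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of AA, AB and BB genotypes."""
    p = (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))
    return min(p, 1.0 - p)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) and p-value for departure from Hardy-Weinberg."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def passes_filters(n_aa, n_ab, n_bb, maf_min=0.05, hwe_alpha=0.05):
    """True if the SNP meets the MAF threshold and is compatible with HWE.

    hwe_alpha is a hypothetical cutoff; the source does not state the one used.
    """
    _, p_value = hwe_chi_square(n_aa, n_ab, n_bb)
    return minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min and p_value >= hwe_alpha


print(passes_filters(25, 50, 25))   # genotype counts in exact HWE proportions
print(passes_filters(50, 0, 50))    # no heterozygotes at all: strong departure
```

With only 19 trios the genotype counts are small, so an exact test might be preferred in practice; the chi-square form is used here only because it is short.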

Genotyping. Potential SNPs were retrieved from the Human genome draft database. Assays were designed using spectroDESIGNER software (Sequenom, Inc.) to be multiplexed up to five times.

SNP genotyping was performed by Sequenom's chip-based matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (DNA MassARRAY) on PCR-based extension products from individual DNA samples.

Cases and controls were always run on the same chip to avoid potential artifacts due to chip-specific miscalls.

Sequencing. Samples homozygous with respect to the risk block were identified. Two homozygotes for each of the five haplotypes were selected for the sequencing of 84 kb spanning rs1553432 and the risk block. Sequencing was performed on the AB 3100 using BigDye terminator (version 3) chemistry on RapXtract (from Prolinks Inc.) purified PCR products. The Phred program (by Codoncode) was used for quality scores and Sequencher (by Gene Codes) for sequence comparisons and SNP detection.

Haplotype reconstruction. 19 familial trios (mother, father, offspring) were genotyped with densely spaced SNP markers in order to create a haplotype map of the 12 cM region. For each trio, the parental origin of offspring alleles was determined for all cases where phase could be resolved unambiguously. In cases where phase was ambiguous (i.e., triple heterozygotes), the data were treated as missing. By applying this method, four parental chromosomes were reconstructed, with intermittent missing allele data. For this example, haplotypes were used that correspond to a region of DNA with little evidence (<2.5%) for meiotic recombination within the common genetic history of the individuals genotyped.

In situations where the boundaries were ambiguous, a second heuristic was applied that assigned boundaries in such a way as to minimize the size (i.e., base pairs) of each block. With haplotype boundaries assigned, haplotype frequencies were estimated for each haplotype allele using an Expectation Maximization (EM) algorithm (Excoffier and Slatkin 1995). Any haplotype that had a frequency of less than 2.5% was excluded from further analysis to avoid possible errors in either the genotyping or the estimation process. Within each haplotype block, between 2 and 6 common SNP-based haplotypes were observed, and each of these haplotypes could be used as a genetic marker.
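The EM estimation of haplotype frequencies can be illustrated on the smallest possible case: two biallelic SNPs genotyped in unrelated individuals, where only double heterozygotes have ambiguous phase. The sketch below is a simplified illustration of the general idea rather than the procedure of Excoffier and Slatkin as applied in this study; it ignores missing data and multi-SNP blocks. The function name `em_two_snp` and the optional `init` argument (which stands in for seeding with previously estimated frequencies) are hypothetical.

```python
def em_two_snp(genotypes, n_iter=100, init=None):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM.

    genotypes: list of (g1, g2), each the number of copies (0, 1 or 2) of allele '1'.
    init: optional dict of starting frequencies for '00', '01', '10', '11'.
    Returns a dict mapping each haplotype to its estimated frequency.
    """
    haps = ("00", "01", "10", "11")
    freqs = dict(init) if init else {h: 0.25 for h in haps}
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: a double heterozygote is either 00/11 or 01/10;
                # weight the two phasings by the current frequency estimates.
                cis = freqs["00"] * freqs["11"]
                trans = freqs["01"] * freqs["10"]
                share = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts["00"] += share
                counts["11"] += share
                counts["01"] += 1.0 - share
                counts["10"] += 1.0 - share
            else:
                # Phase is unambiguous when at most one locus is heterozygous.
                a1 = (g1 // 2, (g1 + 1) // 2)
                a2 = (g2 // 2, (g2 + 1) // 2)
                counts[f"{a1[0]}{a2[0]}"] += 1.0
                counts[f"{a1[1]}{a2[1]}"] += 1.0
        # M-step: new frequencies are the expected haplotype counts per chromosome.
        freqs = {h: counts[h] / n_chrom for h in haps}
    return freqs


sample = [(0, 0), (2, 2), (1, 1), (1, 1), (0, 1)]
print(em_two_snp(sample))
```

Because the unambiguous individuals in the example carry mostly 00 and 11 haplotypes, the double heterozygotes are resolved predominantly as 00/11, which is the sense in which common allele combinations help reconstruct ambiguous phase.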

In order to reconstruct haplotypes for the case/control association studies, the haplotype boundaries and allele frequency estimates established in the trios were used as initial parameters to seed the haplotype allele frequency estimations from genotyping the cases and controls. This seeding is important because of the significant amount of ambiguous phase information present in pairs of unrelated chromosomes. In cases where haplotype data could not be estimated with >95% confidence, the haplotype allele was treated as missing.

Tests of association. The G-Test with Williams correction (a statistic following a chi-square distribution) was used to test inferences about associating genetic markers (haplotype or SNP) with the longevity phenotype (Sokal and Rohlf 2000). For each allele, 2x2 contingency tables were constructed as +/- allele vs. +/- longevity. For tests where only one direction of allele frequency difference was tested, p values were divided by two. The Hotelling T test is the multivariate extension of the Student's T test and has recently been applied to genetic data (Xiong, Zhao et al. 2002).
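The G-Test with Williams correction on a 2x2 table can be sketched as follows. The correction factor follows the standard form given in Sokal and Rohlf, and the p-value uses the one-degree-of-freedom chi-square tail. The function name `g_test_williams` and the choice to return a two-sided p-value are illustrative assumptions.

```python
import math


def g_test_williams(a, b, c, d):
    """G-test of independence for the 2x2 table [[a, b], [c, d]].

    Rows could be cases/controls and columns allele present/absent.
    Returns (Williams-corrected G, two-sided p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                expected = rows[i] * cols[j] / n
                g += 2.0 * o * math.log(o / expected)
    # Williams correction factor q for a 2x2 table.
    q = 1.0 + ((n / rows[0] + n / rows[1] - 1.0)
               * (n / cols[0] + n / cols[1] - 1.0)) / (6.0 * n)
    g_adj = g / q
    return g_adj, math.erfc(math.sqrt(g_adj / 2.0))


# Risk haplotype present/absent in case and control chromosomes (Table 1 counts).
g_adj, p_two_sided = g_test_williams(127, 726 - 127, 201, 735 - 201)
p_one_sided = p_two_sided / 2.0   # halved when only one direction is tested
print(g_adj, p_two_sided, p_one_sided)
```

The example is meant to show the mechanics only; it is not claimed to reproduce the p-values reported in the text, which may have been computed on differently constructed tables.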

Testing for stratification. 60 random SNP markers were genotyped in all cases and controls, and chi-square values were calculated from the allele counts. Because these SNPs were selected at random, any differences in allele frequencies were taken to represent differences in genetic background between cases and controls. If the genetic backgrounds of the two-armed study were perfectly matched, the mean of the G-Test statistics (which follow a chi-square distribution) for these markers has an expected value of 1.0.
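The stratification check reduces to averaging a one-degree-of-freedom statistic over the random markers and comparing the mean with 1.0. A minimal sketch, assuming each marker is summarized as a 2x2 table of allele counts (cases and controls by the two alleles) and using the uncorrected G statistic; the function names are hypothetical.

```python
import math


def g_statistic(table):
    """Uncorrected G statistic (1 df) for a 2x2 table given as ((a, b), (c, d))."""
    (a, b), (c, d) = table
    n = a + b + c + d
    cells = ((a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d))
    g = 0.0
    for obs, row_total, col_total in cells:
        if obs > 0:
            g += 2.0 * obs * math.log(obs * n / (row_total * col_total))
    return g


def mean_chi_square(tables):
    """Mean test statistic over markers; about 1.0 is expected without stratification."""
    return sum(g_statistic(t) for t in tables) / len(tables)


# Three hypothetical markers: (case allele counts), (control allele counts).
markers = [((60, 40), (55, 45)), ((30, 70), (33, 67)), ((80, 20), (70, 30))]
print(mean_chi_square(markers))
```

Values well above 1.0, such as the 1.51 observed for the original sample set, suggest a systematic difference in genetic background, whereas the 0.92 reported for the matched samples is close to the null expectation.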

Proactive sample matching. 60 random SNP markers (non-overlapping with the stratification panel described above) were genotyped in 250 cases and 463 controls. Homozygotes for the minor allele were assigned the value -1, heterozygotes 0, and major allele homozygotes 1. Based on the multivariate means calculated from this coded data, a subgroup of 250 of the controls was selected that minimized the Mahalanobis distance with respect to the case samples. The Mahalanobis distance is a measure of distance between two multivariate means that normalizes each dimension based on the covariance matrix: $D^2 = (\bar{x}_1 - \bar{x}_2)^{T} S^{-1} (\bar{x}_1 - \bar{x}_2)$, where $\bar{x}_1$ is a vector representing the mean genotyping values of the cases, $\bar{x}_2$ is the mean vector for the controls, and $S^{-1}$ is the inverse of the covariance matrix.
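The matching step can be sketched in a few lines. The text does not say how the subgroup of controls was searched for, nor which samples the covariance matrix was estimated from, so the sketch below makes two loudly stated assumptions: the covariance is computed from the pooled cases and candidate controls, and the subgroup is found by greedily dropping, one at a time, the control whose removal gives the smallest squared Mahalanobis distance. All function names are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length genotype vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix of a list of genotype vectors."""
    n, mu = len(rows), mean_vector(rows)
    d = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_sq(cases, controls):
    """Squared Mahalanobis distance between the case and control mean vectors."""
    diff = [a - b for a, b in zip(mean_vector(cases), mean_vector(controls))]
    s = covariance(cases + controls)  # pooled covariance: an assumption
    return sum(d * y for d, y in zip(diff, solve(s, diff)))


def greedy_match(cases, controls, k):
    """Drop controls one at a time until k remain (greedy search: an assumption)."""
    selected = list(controls)
    while len(selected) > k:
        best = min(range(len(selected)),
                   key=lambda i: mahalanobis_sq(cases, selected[:i] + selected[i + 1:]))
        del selected[best]
    return selected


# Toy data with two SNPs coded -1 / 0 / 1 as described in the text.
cases = [[1, 1], [1, 0], [0, 1], [0, 0]]
pool = [[1, 1], [1, 0], [0, 1], [0, 0], [-1, -1], [-1, -1]]
matched = greedy_match(cases, pool, 4)
print(matched, mahalanobis_sq(cases, matched))
```

A greedy search does not guarantee the globally best subgroup, and with 60 markers the covariance matrix needs enough samples to be invertible; both points would need attention in a real analysis.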

References

1. Gudmundsson, H., Gudbjartsson, D. F., Frigge, M., Gulcher, J. R. & Stefansson, K. Inheritance of human longevity in Iceland. Eur J Hum Genet 8, 743-9 (2000).

2. Perls, T. et al. Exceptional Familial Clustering for Extreme Longevity in Human. J Am Geriatr Soc 48, 1483-1485 (2000).

3. Herskind, A. M. et al. The heritability of human longevity: a population-based study of 2872 Danish twin pairs born 1870-1900. Hum Genet 97, 319-23 (1996).

4. McGue, M., Vaupel, J. W., Holm, N. & Harvald, B. Longevity is moderately heritable in a sample of Danish twins born 1870-1880. J Gerontol 48, B237-44 (1993).

5. Perls, T. et al. Life-long sustained mortality advantage of siblings of centenarians. Proc Natl Acad Sci U S A 99, 8442-8447 (2002).

6. van Bockxmeer, F. M. ApoE and ACE genes: impact on human longevity. Nat Genet 6, 4-5 (1994).

7. Schachter, F. et al. Genetic associations with human longevity at the APOE and ACE loci. Nat Genet 6, 29-32 (1994).

8. Arking, D. E. et al. Association of human aging with a functional variant of klotho. Proc Natl Acad Sci U S A 99, 856-861 (2002).

9. Kervinen, K. et al. Apolipoprotein E and B polymorphisms--longevity factors assessed in nonagenarians. Atherosclerosis 105, 89-95 (1994).

10. Schachter, F. Causes, effects, and constraints in the genetics of human longevity. Am J Hum Genet 62, 1008-14 (1998).

11. Wachter, K. W. In Between Zeus and the Salmon. The Biodemography of Longevity (National Academy Press, Washington, D.C., 1997).

12. Puca, A. A. et al. A genome-wide scan for linkage to human exceptional longevity identifies a locus on chromosome 4. Proc Natl Acad Sci U S A 98, 10505-8 (2001).

13. Patil, N. et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 1719-23 (2001).

14. Stephens, J. C. et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science 293, 489-93 (2001).

15. Johnson, G. C. et al. Haplotype tagging for the identification of common disease genes. Nat Genet 29, 233-7 (2001).

16. Daly, M. J., Rioux, J. D., Schaffner, S. F., Hudson, T. J. & Lander, E. S. High-resolution haplotype structure in the human genome. Nat Genet 29, 229-32 (2001).

17. Rioux, J. D. et al. Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nat Genet 29, 223-8 (2001).

18. De Benedictis, G. et al. DNA multiallelic systems reveal gene/longevity associations not detected by diallelic systems. The APOB locus. Hum Genet 99, 312-8 (1997).

19. Stephens, J. C. Single-nucleotide polymorphisms, haplotypes, and their relevance to pharmacogenetics. Mol Diagn 4, 309-17 (1999).

20. Davidson, S. Research suggests importance of haplotypes over SNPs. Nat Biotechnol 18, 1134-5 (2000).

21. Hirschhorn, J. N., Lohmueller, K., Byrne, E. & Hirschhorn, K. A comprehensive review of genetic association studies. Genet Med 4, 45-61 (2002).

22. Wetterau, J. R., Lin, M. C. & Jamil, H. Microsomal triglyceride transfer protein. Biochim Biophys Acta 1345, 136-50 (1997).

23. Shelness, G. S. & Sellers, J. A. Very-low-density lipoprotein assembly and secretion. Curr Opin Lipidol 12, 151-7 (2001).

24. Jamil, H. et al. Evidence that microsomal triglyceride transfer protein is limiting in the production of apolipoprotein B-containing lipoproteins in hepatic cells. J Lipid Res 39, 1448-54 (1998).

25. Wu, X., Zhou, M., Huang, L. S., Wetterau, J. & Ginsberg, H. N. Demonstration of a physical interaction between microsomal triglyceride transfer protein and apolipoprotein B during the assembly of ApoB-containing lipoproteins. J Biol Chem 271, 10277-81 (1996).

26. Berriot-Varoqueaux, N., Aggerbeck, L. P., Samson-Bouma, M. & Wetterau, J. R. The role of the microsomal triglygeride transfer protein in abetalipoproteinemia. Annu Rev Nutr 20, 663-97 (2000).

27. Raabe, M. et al. Knockout of the abetalipoproteinemia gene in mice: reduced lipoprotein secretion in heterozygotes and embryonic lethality in homozygotes. Proc Natl Acad Sci U S A 95, 8686-91 (1998).

28. Tietge, U. J. et al. Hepatic overexpression of microsomal triglyceride transfer protein (MTP) results in increased in vivo secretion of VLDL triglycerides and apolipoprotein B. J Lipid Res 40, 2134-9 (1999).

29. Raabe, M. et al. Analysis of the role of microsomal triglyceride transfer protein in the liver of tissue-specific knockout mice. J Clin Invest 103, 1287-98 (1999).

30. Bjorkegren, J., Beigneux, A., Bergo, M. O., Maher, J. J. & Young, S. G. Blocking the secretion of hepatic very low density lipoproteins renders the liver more susceptible to toxin-induced injury. J Biol Chem 277, 5476-83 (2002).

31. Wetterau, J. R. et al. An MTP inhibitor that normalizes atherogenic lipoprotein levels in WHHL rabbits. Science 282, 751-4 (1998).

32. Lin, M. C. et al. Garlic inhibits microsomal triglyceride transfer protein gene expression in human liver and intestinal cell lines and in rat intestine. J Nutr 132, 1165-8 (2002).

33. Lin, M. C. et al. Ethanol down-regulates the transcription of microsomal triglyceride transfer protein gene. Faseb J 11, 1145-52 (1997).

34. Wilcox, L. J., Borradaile, N. M., de Dreu, L. E. & Huff, M. W. Secretion of hepatocyte apoB is inhibited by the flavonoids, naringenin and hesperetin, via reduced activity and expression of ACAT2 and MTP. J Lipid Res 42, 725-34 (2001).

35. Karpe, F., Lundahl, B., Ehrenborg, E., Eriksson, P. & Hamsten, A. A common functional polymorphism in the promoter region of the microsomal triglyceride transfer protein gene influences plasma LDL levels. Arterioscler Thromb Vasc Biol 18, 756-61 (1998).

36. Couture, P. et al. Absence of association between genetic variation in the promoter of the microsomal triglyceride transfer protein gene and plasma lipoproteins in the Framingham Offspring Study. Atherosclerosis 148, 337-43 (2000).

37. Juo, S. H., Han, Z., Smith, J. D., Colangelo, L. & Liu, K. Common polymorphism in promoter of microsomal triglyceride transfer protein gene influences cholesterol, ApoB, and triglyceride levels in young African American men: results from the coronary artery risk development in young adults (CARDIA) study. Arterioscler Thromb Vasc Biol 20, 1316-22 (2000).

38. Ledmyr, H. et al. Variants of the microsomal triglyceride transfer protein gene are associated with plasma cholesterol levels and body mass index. J Lipid Res 43, 51-8 (2002).

39. St-Pierre, J. et al. Visceral obesity and hyperinsulinemia modulate the impact of the microsomal triglyceride transfer protein -493G/T polymorphism on plasma lipoprotein levels in men. Atherosclerosis 160, 317-24 (2002).

40. Talmud, P. J., Palmen, J., Miller, G. & Humphries, S. E. Effect of microsomal triglyceride transfer protein gene variants (-493G > T, Q95H and H297Q) on plasma lipid levels in healthy middle-aged UK men. Ann Hum Genet 64, 269-76 (2000).

41. Herrmann, S. M. et al. Identification of two polymorphisms in the promoter of the microsomal triglyceride transfer protein (MTP) gene: lack of association with lipoprotein profiles. J Lipid Res 39, 2432-5 (1998).

42. Rainwater, D. L. et al. A genome search identifies major quantitative trait loci on human chromosomes 3 and 4 that influence cholesterol concentrations in small LDL particles. Arterioscler Thromb Vasc Biol 19, 777-83 (1999).

43. Austin, M. A. et al. Candidate-gene studies of the atherogenic lipoprotein phenotype: a sib-pair linkage analysis of DZ women twins. Am J Hum Genet 62, 406-19 (1998).

44. Barzilai, N., Gabriely, I., Gabriely, M., Iankowitz, N. & Sorkin, J. D. Offspring of centenarians have a favorable lipid profile. J Am Geriatr Soc 49, 76-9 (2001).

45. Terry, D., Wilcox, M., McCormick, M., Lawler, E. & Perls, T. Cardiovascular Advantages Among the Offspring of Centenarians. Journal Gerontological Medical Science In Press (2003).

46. Glueck, C. J., Gartside, P. S., Mellies, M. J. & Steiner, P. M. Familial hypobeta-lipoproteinemia: studies in 13 kindreds. Trans Assoc Am Physicians 90, 184-203 (1977).

47. Silverman, J. M. et al. Identifying families with likely genetic protective factors against Alzheimer disease. Am J Hum Genet 64, 832-8 (1999).

48. Excoffier, L. & Slatkin, M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 12, 921-7 (1995).

49. Sokal, R. R. & Rohlf, F. J. Biometry (W. H. Freeman and Company, New York, 2000).

50. Xiong, M., Zhao, J. & Boerwinkle, E. Generalized T2 test for genome association studies. Am J Hum Genet 70, 1257-68 (2002).

EXAMPLE 2 Association with Exceptional Longevity on Human Chromosome 4 at Marker rs1553432

Prior to the fine resolution mapping described above in Example 1, the following analysis was used to identify SNP haplotypes in the chromosomal region chr4: 99990059-112943333. Both SNPs and blocks of SNP haplotypes were used as genetic markers for association analysis. Although SNP frequencies can be directly and exactly calculated from the experimental data (collected using the Sequenom MassARRAY™), haplotype frequencies must be estimated using a probabilistic algorithm and reconstructed with a set of heuristics.

A SNP haplotype is a chromosome-specific unique permutation of nearby bi-allelic markers. For each chromosomal region, each diploid organism has two haplotypes (one each for the maternally and paternally derived chromosomes). The SNP genotype data provided by the Sequenom MassARRAY does not distinguish phase: if a locus is heterozygous, it is not known how the two alleles segregate between the maternal and paternal chromosomes. Recovering the genetic phase of the data is greatly assisted by the use of familial trios (parents plus one offspring). Each familial trio represents four unique chromosomes, and the phase of each allele can be inferred via Mendelian inheritance rules except where all trio members are heterozygous at a particular locus or genotyping fails for one of the individuals.
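The trio rule just described can be illustrated with a minimal sketch for a single bi-allelic locus. The genotype coding (count of allele 1, with None for a failed genotype) and the function names are assumptions made for illustration; they are not taken from the application.

```python
def alleles(genotype):
    """Alleles carried by a genotype coded as the count of allele 1 (0, 1, or 2)."""
    return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[genotype]


def phase_child(child, mother, father):
    """Return (maternal_allele, paternal_allele) for the child at one locus.

    Returns None when phase cannot be resolved: a genotype is missing, or
    more than one transmission pattern is consistent with the trio (which
    happens when all three members are heterozygous).
    Raises ValueError when no transmission pattern is consistent
    (a Mendelian inheritance error).
    """
    if None in (child, mother, father):
        return None  # genotyping failed for at least one trio member
    # Enumerate every (maternal, paternal) transmission that yields the child.
    options = {
        (m, p)
        for m in alleles(mother)
        for p in alleles(father)
        if m + p == child
    }
    if not options:
        raise ValueError("Mendelian inheritance error")
    return options.pop() if len(options) == 1 else None
```

For example, a heterozygous child with a homozygous 0/0 mother must have received allele 0 maternally and allele 1 paternally, whereas a fully heterozygous trio stays unresolved and is left to the probabilistic step.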

To reconstruct ambiguous or missing data, the most likely information is inferred using an expectation maximization (EM) algorithm, as described elsewhere (Dempster et al., J. R. Stat. Soc. B 39: 1-38 (1977)). This method is successful because certain permutations of nearby SNP alleles are far more common than others, and this information can help reconstruct the most likely population frequencies.
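As an illustration of how EM can recover haplotype frequencies from unphased genotypes, the following is a minimal sketch, not the implementation used in this Example. It assumes bi-allelic loci, genotypes coded as tuples of allele-1 counts, random mating, and a fixed iteration count; all names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """All ordered haplotype pairs (h1, h2) consistent with an unphased genotype."""
    pairs = []
    for h1 in product((0, 1), repeat=len(genotype)):
        h2 = tuple(g - a for g, a in zip(genotype, h1))
        if all(b in (0, 1) for b in h2):
            pairs.append((h1, h2))
    return pairs


def em_haplotype_freqs(genotypes, n_iter=50):
    """Estimate haplotype frequencies by expectation maximization."""
    n_loci = len(genotypes[0])
    haps = list(product((0, 1), repeat=n_loci))
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = defaultdict(float)
        for g in genotypes:
            pairs = compatible_pairs(g)
            # E-step: posterior weight of each phase resolution.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts become the new frequencies.
        n_chrom = 2 * len(genotypes)
        freq = {h: counts[h] / n_chrom for h in haps}
    return freq
```

In a sample where most individuals are homozygous for haplotype 00 or 11, the few double heterozygotes are resolved almost entirely as 00/11 rather than 01/10, which is the "common permutations inform the rare ambiguous cases" behavior described above.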

The phase-resolved data is divided into a series of haplotype blocks using a coalescent-based zero-recombination model. Briefly, haplotype diversity under coalescent theory is the consequence of two independent processes: new mutation and genetic recombination during meiosis. In the absence of recombination, certain combinations of alleles are not possible. For example, in the two-locus bi-allelic case, if the haplotypes 01, 00, and 11 occur in the population, 10 will be absent unless recombination has occurred. The chance of recombination occurring increases monotonically with the distance between two markers. The haplotype block has been defined as a series of bi-allelic loci (SNPs) where very limited (less than 2.5%) recombination is experimentally observed. Although the zero-recombination rule constrains the solution space, for extended genomic regions the haplotype map reconstructed by this method is not unique. To arrive at a single solution, an additional heuristic is introduced to group SNPs that are physically close to one another.
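The two-locus rule in the preceding paragraph is often called the four-gamete test. The sketch below shows that test together with a simple left-to-right grouping of adjacent SNPs. The greedy grouping is an assumed stand-in for the proximity heuristic, it applies a strict zero-violation rule rather than the 2.5% tolerance, and the function names are hypothetical.

```python
def four_gamete_violation(haplotypes, i, j):
    """True if all four allele combinations occur at loci i and j.

    With bi-allelic loci and no recombination (or recurrent mutation),
    at most three of the four combinations can be present.
    """
    observed = {(h[i], h[j]) for h in haplotypes}
    return len(observed) == 4


def greedy_blocks(haplotypes):
    """Group loci, in map order, into runs with no four-gamete violation."""
    n_loci = len(haplotypes[0])
    blocks, current = [], [0]
    for j in range(1, n_loci):
        if any(four_gamete_violation(haplotypes, i, j) for i in current):
            blocks.append(current)  # evidence of recombination: close the block
            current = [j]
        else:
            current.append(j)
    blocks.append(current)
    return blocks
```

Because loci are visited in physical order, each block is a contiguous run of neighboring SNPs, which is one way to pick a single solution among the many partitions that satisfy the zero-recombination rule.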

With the trio-based haplotype map for a genomic region, a subset of SNPs that recovers the majority of the diversity (> 95%) is selected for further genotyping in the case and control samples. Eliminating all but these "maximally informative" SNPs (MIS) saves genotyping costs and time. In addition, SNPs are removed from further consideration if they cannot be reliably genotyped, have excessive Mendelian inheritance errors, are out of Hardy-Weinberg equilibrium, or have a minor allele frequency of less than 5%.
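Two of the exclusion criteria above, minor allele frequency and Hardy-Weinberg equilibrium, can be sketched from genotype counts as follows. The 5% frequency cutoff is from the text; the chi-square form of the equilibrium test and the `hwe_alpha` threshold are assumptions, since the application does not state how departure from equilibrium was judged.

```python
import math


def minor_allele_freq(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of the three genotypes."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_snp(n_aa, n_ab, n_bb, min_maf=0.05, hwe_alpha=0.001):
    """Apply both filters; hwe_alpha is an illustrative threshold."""
    if minor_allele_freq(n_aa, n_ab, n_bb) < min_maf:
        return False
    return hwe_p_value(n_aa, n_ab, n_bb) >= hwe_alpha
```

In trio data the counts would normally be taken from unrelated individuals (for example, the parents), since offspring genotypes are not independent of their parents'.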

After the case/control samples are genotyped using the maximally informative SNPs, population frequencies of the SNP and haplotype markers are recovered as described above. For each marker, a two-by-two contingency table is constructed as absence vs. presence of the marker against case vs. control population. A goodness-of-fit statistic based on likelihood ratios (the G-test), with Williams' correction applied, is tested against a chi-square distribution with one degree of freedom. A significant association is assumed when the Bonferroni-corrected p value for rejecting the null hypothesis is less than 0.05.
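A minimal sketch of the likelihood-ratio (G) statistic with Williams' correction for a two-by-two table is given below, following the textbook form of the correction. The function name and table layout are assumptions, and the counts in the usage note are illustrative rather than data from this Example.

```python
import math


def g_test_williams(a, b, c, d):
    """G-test with Williams' correction for the 2x2 table [[a, b], [c, d]].

    Returns (adjusted G, p-value against chi-square with 1 df).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to G
                expected = rows[i] * cols[j] / n
                g += 2.0 * o * math.log(o / expected)
    # Williams' correction factor for a 2x2 table.
    q = 1.0 + (n / rows[0] + n / rows[1] - 1.0) * (
        n / cols[0] + n / cols[1] - 1.0
    ) / (6.0 * n)
    g_adj = max(g / q, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(g_adj / 2.0))  # chi-square survival, 1 df
    return g_adj, p_value
```

With illustrative counts such as `g_test_williams(40, 760, 90, 710)` (marker present vs. absent in two arms of 800 chromosomes each), the adjusted statistic is slightly smaller than the uncorrected G, which makes the test marginally more conservative.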

According to this method, a haplotype map for the chromosomal region chr4: 99990059-112943333 (between markers D4S2986 and D4S406) was constructed by genotyping 2000 SNP markers in 20 familial trios (60 individuals, representing 80 unique chromosomes). All genomic coordinates reference the December 2001 draft of the human genome (archival materials available online at the UCSC Human Genome Project website). Slightly more than half the markers were excluded from further consideration for the reasons described above. Using the remaining 991 SNPs, a haplotype block map was reconstructed and 715 MIS were genotyped in 100 centenarians and 100 controls. SNP and haplotype marker frequencies in the two populations were tested for association by the G-test. Forty markers were selected for further study based on rejection of the null hypothesis at an uncorrected p < 0.05, although no locus was statistically significant after correcting for multiple testing. These 40 loci were genotyped in an additional 100 centenarians and 100 controls.

For the pooled analysis of the first 400 samples, three loci were initially identified, although none was statistically significant after Bonferroni correction was applied. These three genomic loci were tested in an additional 200 centenarians and 200 controls. One of these loci (represented by the SNP marker rs1553432, at genomic location chr4: 100813844) showed a highly significant association in the pooled comparison of 400 centenarians against 400 controls (representing 800 chromosomes in each arm), with p = 0.000008 (p = 0.006 after Bonferroni correction). The minor allele of this bi-allelic marker was significantly underrepresented in the centenarians (5% in centenarians vs. 11.2% in controls).
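The Bonferroni correction multiplies a nominal p value by the number of tests performed, capped at 1. The short sketch below shows the arithmetic. The number of tests behind the corrected value reported above is not stated in the text; 715 (the number of MIS genotyped) is used here purely as an assumption, and it happens to give a value that rounds to the reported 0.006.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p value: nominal p times number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


# Assumed number of tests (715 MIS); the application does not state the figure used.
corrected = bonferroni(0.000008, 715)
is_significant = corrected < 0.05
```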

To further analyze the genetic diversity in this region, 350 kb centered on rs1553432 was densely genotyped and a haplotype block map of this region was constructed in trios. The second set of 200 centenarians and 200 controls was genotyped for this denser set of markers, and because only one locus was tested, no Bonferroni correction was applied. With respect to the minor allele in centenarians, the markers skewed as follows:

rs1354368: p = 0.026
rs1873517: p = 0.0041
rs1873516: p = 0.021
rs1491235: p = 0.0076
rs1491233: p = 0.00083
rs1503777: p = 0.016
rs2866164: p = 0.0010
rs1057613: p = 0.0013
rs745075: p = 0.030
rs1491245: p = 0.037
rs1061271: p = 0.022

In addition, the haplotype blocks consisting of the SNPs rs2654849-rs1553432-rs1032827, rs1873517-rs1873516-rs1491235-rs1491233, and rs1503777-rs2866164-rs1057613 all contain haplotypes significantly skewed between the two populations.

There are at least two genes in the area of maximal statistical skewing: DKFZP434G072 and MTP. The DKFZP434G072 gene is expressed in testes. There is also evidence for ADH7 being associated with the longevity locus.

Other embodiments are within the following claims.