INTERROGATORY CELL-BASED ASSAYS FOR IDENTIFYING DRUG-INDUCED TOXICITY MARKERS

Document Type and Number:

WIPO Patent Application WO/2013/176694

Abstract:

Described herein is a discovery Platform Technology for analyzing a drug-induced toxicity condition, such as cardiotoxicity, via model building.

Inventors:

NARAIN NIVEN RAJIN (US)
SARANGARAJAN RANGAPRASAD (US)
VISHNUDAS VIVEK K (US)

Application Number:

PCT/US2012/054323

Publication Date:

November 28, 2013

Filing Date:

September 07, 2012

Assignee:

BERG PHARMA LLC (US)
NARAIN NIVEN RAJIN (US)
SARANGARAJAN RANGAPRASAD (US)
VISHNUDAS VIVEK K (US)

International Classes:

G01N33/48; G16B5/00; G01N33/53; G16B20/20; G16B20/50

Domestic Patent References:

WO2004063334A2	2004-07-29
WO2007084187A2	2007-07-26

Foreign References:

US20070054269A1	2007-03-08
US20110059089A1	2011-03-10
US20060019888A1	2006-01-26
US20110275563A1	2011-11-10
US20110003707A1	2011-01-06

Other References:

VIJAYAKUMAR SUKUMARAN ET AL., INTERNATIONAL JOURNAL OF BIOLOGICAL SCIENCES, 11 February 2011 (2011-02-11), pages 2
See also references of EP 2852839A4

Attorney, Agent or Firm:

HANLEY, Elizabeth, A. et al. (LLP, 265 Franklin Street, Boston MA, US)

Claims:

CLAIMS

We claim:

1. A method for identifying a drug that causes or is at risk for causing drug-induced cardiotoxicity, comprising: comparing (i) a level of expression of one or more biomarkers present in a first cell sample obtained prior to the treatment with the drug; with (ii) a level of expression of the one or more biomarkers present in a second cell sample obtained following the treatment with the drug; wherein the one or more biomarkers is selected from the markers listed in table 2; wherein a modulation in the level of expression of the one or more biomarkers in the second sample as compared to the first sample is an indication that the drug causes or is at risk for causing drug-induced cardiotoxicity.

2. A method for identifying a rescue agent that can reduce or prevent drug-induced cardiotoxicity comprising: (i) determining a normal level of expression of one or more biomarkers present in a first cell sample obtained prior to the treatment with a cardiotoxicity inducing drug; (ii) determining a treated level of expression of the one or more biomarkers present in a second cell sample obtained following the treatment with the cardiotoxicity inducing drug to identify one or more biomarkers with a change of expression in the treated cell sample; (iii) determining the level of expression of the one or more biomarkers with a changed level of expression in the cardiotoxicity inducing drug treated sample present in a third cell sample obtained following the treatment with the cardiotoxicity inducing drug and the rescue agent; and (iv) comparing the level of expression of the one or more biomarkers determined in the third sample with the level of expression of the one or more biomarkers present in the first sample; wherein the one or more biomarkers is selected from the markers listed in table 2; and wherein a normalized level of expression of the one or more biomarkers in the third sample as compared to the first sample is an indication that the rescue agent can reduce or prevent drug-induced cardiotoxicity.

3. A method for alleviating, reducing or preventing drug-induced cardiotoxicity, comprising administering to a subject a rescue agent identified by the method of claim 2, thereby reducing or preventing drug-induced cardiotoxicity in the subject.

4. The method of any one of claims 1-3, wherein the one or more biomarkers is selected from the group consisting of TIMP1, PTX3, HSP76, FINC, CYB5, PAI1, IBP7 (IGFBP7), 1C17, EDIL3, HMOX1, NUCB1, CSOIO, and HSPA4.

5. The method of claim 4, wherein the drug-induced cardiotoxicity is cardiomyopathy, heart failure, atrial fibrillation, cardiomyopathy and heart failure, heart failure and LV dysfunction, atrial flutter and fibrillation, or heart valve damage and heart failure.

6. The method of any one of claims 1-5, wherein the cell samples are cardiomyocytes or diabetic cardiomyocytes.

7. The method of any one of claims 1-3, wherein the drug is a cancer drug, diabetic drug, neurological drug, or anti-inflammatory drug.

8. The method of any one of claims 1-7, wherein the drug is Anthracyclines, 5-Fluorouracil, Cisplatin, Trastuzumab, Gemcitabine, Rosiglitazone, Pioglitazone, Troglitazone, Cabergoline, Pergolide, Sumatriptan, Bisphosphonates, or TNF antagonists.

9. The method of claim 3, wherein the subject is a mammal, a human, or a non-human animal.

10. The method of claim 3, wherein the rescue agent is administered to a subject that has already been treated with a cardiotoxicity-inducing drug.

11. The method of claim 3, wherein the rescue agent is administered to a subject at the same time as treatment of the subject with a cardiotoxicity-inducing drug.

12. The method of claim 3, wherein the rescue agent is administered to a subject prior to treatment of the subject with a cardiotoxicity-inducing drug.

13. The method of claim 3, wherein the rescue agent is Coenzyme Q10.

14. The method of claim 3, wherein the rescue agent is not Coenzyme Q10.

15. The method of claim 3, further comprising monitoring the subject for drug-induced cardiotoxicity.

16. A method for identifying a modulator of drug-induced toxicity, said method comprising:

(1) establishing a model for drug-induced toxicity, using cells associated with drug-induced toxicity, to represent a characteristic aspect of drug-induced toxicity;

(2) obtaining a first data set from the model for drug-induced toxicity, wherein the first data set represents one or more of genomics, lipidomics, proteomics, metabolomics, transcriptomics, and single nucleotide polymorphism (SNP) data characterizing the cells associated with drug-induced toxicity;

(3) obtaining a second data set from the model for drug-induced toxicity, wherein the second data set represents a functional activity or a cellular response of the cells associated with drug-induced toxicity;

(4) generating a consensus causal relationship network among the expression levels of the one or more of genomics, lipidomics, proteomics, metabolomics, transcriptomics, and single nucleotide polymorphism (SNP) data and the functional activity or cellular response based solely on the first data set and the second data set using a programmed computing device, wherein the generation of the consensus causal relationship network is not based on any known biological relationships other than the first data set and the second data set;

(5) identifying, from the consensus causal relationship network, a causal relationship unique in drug-induced toxicity, wherein a gene, lipid, protein, metabolite, transcript, or SNP associated with the unique causal relationship is identified as a modulator of drug-induced toxicity.

17. The method of claim 16, wherein the second data set representing the functional activity or cellular response of the cells comprises one or more of bioenergetics, cell proliferation, apoptosis, organellar function, a genotype-phenotype association actualized by functional models selected from ATP, ROS, OXPHOS, and Seahorse assays, global enzyme activity, and an effect of global enzyme activity on the enzyme metabolic substrates of cells associated with drug-induced toxicity.

18. The method of claim 17, wherein the global enzyme activity is global kinase activity, and wherein the effect of global enzyme activity on the enzyme metabolic substrates is the phosphoproteome.

19. The method of any one of claims 1 to 3 and 16, wherein the first data set comprises two or more of genomics, lipidomics, proteomics, metabolomics, transcriptomics, and single nucleotide polymorphism (SNP) data.

20. The method of claim 16, wherein step (4) is carried out by an artificial intelligence (AI)-based informatics platform.

21. The method of claim 20, wherein the AI-based informatics platform comprises REFS™.

22. The method of claim 21, wherein the AI-based informatics platform receives all data input from the first data set and the second data set without applying a statistical cut-off point.

23. The method of claim 16, wherein the consensus causal relationship network established in step (4) is further refined to a simulation causal relationship network, before step (5), by in silico simulation based on input data, to provide a confidence level of prediction for one or more causal relationships within the consensus causal relationship network.

24. The method of claim 16, wherein the unique causal relationship is identified as part of a differential causal relationship network that is uniquely present in cells, and absent in the matching control cells.

25. The method of claim 16, wherein the unique causal relationship identified is a relationship between at least one pair selected from the group consisting of expression of a gene and level of a lipid; expression of a gene and level of a transcript; expression of a gene and level of a metabolite; expression of a first gene and a second gene; expression of a gene and presence of a SNP; expression of a gene and a functional activity; level of a lipid and level of a transcript; level of a lipid and level of a metabolite; level of a first lipid and a second lipid; level of a lipid and presence of a SNP; level of a lipid and a functional activity; level of a first transcript and level of a second transcript; level of a transcript and level of a metabolite; level of a transcript and presence of a SNP; level of a transcript and a functional activity; level of a first metabolite and level of a second metabolite; level of a metabolite and presence of a SNP; level of a metabolite and a functional activity; presence of a first SNP and presence of a second SNP; and presence of a SNP and a functional activity.

26. The method of claim 25, wherein the functional activity is selected from the group consisting of bioenergetics, cell proliferation, apoptosis, organellar function, kinase activity, protease activity, and a genotype-phenotype association actualized by functional models selected from ATP, ROS, OXPHOS, and Seahorse assays.

27. The method of claim 16, further comprising validating the identified unique causal relationship in drug-induced toxicity.

28. The method of claim 16, wherein the drug-induced toxicity is drug-induced cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renal toxicity, or myotoxicity.

29. The method of claim 28, wherein the drug-induced cardiotoxicity is cardiomyopathy, heart failure, atrial fibrillation, cardiomyopathy and heart failure, heart failure and LV dysfunction, atrial flutter and fibrillation, or heart valve damage and heart failure.

30. The method of claim 16, wherein the model for drug-induced toxicity comprises cardiomyocytes, diabetic cardiomyocytes, hepatocytes, kidney cells, neuronal cells, renal cells, or myoblasts.

31. The method of claim 16, wherein the model for drug-induced toxicity comprises a toxicity inducing drug, cancer drug, diabetic drug, neurological drug, or anti-inflammatory drug.

32. The method of claim 16, wherein the drug is Anthracyclines, 5-Fluorouracil, Cisplatin, Trastuzumab, Gemcitabine, Rosiglitazone, Pioglitazone, Troglitazone, Cabergoline, Pergolide, Sumatriptan, Bisphosphonates, or TNF antagonists.

33. A method for identifying a drug that causes or is at risk for causing drug-induced toxicity, comprising: comparing (i) a level of one or more biomarkers present in a first cell sample obtained prior to the treatment with the drug; with (ii) a level of the one or more biomarkers present in a second cell sample obtained following the treatment with the drug; wherein the one or more biomarkers is selected from the modulators identified by the methods of any one of claims 16-32; wherein a modulation in the level of the one or more biomarkers in the second sample as compared to the first sample is an indication that the drug causes or is at risk for causing drug-induced toxicity.

34. A method for identifying a rescue agent that can reduce or prevent drug-induced toxicity comprising: (i) determining a normal level of one or more biomarkers present in a first cell sample obtained prior to the treatment with a toxicity inducing drug; (ii) determining a treated level of the one or more biomarkers present in a second cell sample obtained following the treatment with the toxicity inducing drug to identify one or more biomarkers with a change of level in the treated cell sample; (iii) determining the level of the one or more biomarkers with a changed level in the toxicity inducing drug treated sample present in a third cell sample obtained following the treatment with the toxicity inducing drug and the rescue agent; and (iv) comparing the level of the one or more biomarkers determined in the third sample with the level of the one or more biomarkers present in the first sample; wherein the one or more biomarkers is selected from the modulators identified by the methods of any one of claims 16-32 and wherein a normalized level of the one or more biomarkers in the third sample as compared to the first sample is an indication that the rescue agent can reduce or prevent drug-induced toxicity.

35. A method for alleviating, reducing or preventing drug-induced toxicity, comprising administering to a subject the rescue agent of claim 34, thereby reducing or preventing drug-induced toxicity in the subject.

Description:

INTERROGATORY CELL-BASED ASSAYS FOR IDENTIFYING DRUG-INDUCED TOXICITY MARKERS

Cross-Reference to Related Applications

This application claims priority to U.S. Provisional Application Serial No. 61/650462, filed May 22, 2012, the entire content of which is incorporated herein.

Background of the Invention

The pharmaceutical industry is currently witnessing a 90% attrition rate among potential compounds entering clinical development, 30% of which is attributable to poor clinical safety (Kola et al. (2004) Nat Rev Drug Discov 3:711-715). In the U.S., fatal adverse drug reactions (ADRs) rank between the 4th and 6th leading cause of death. Costs directly attributable to ADRs may amount to an additional $1.56 to $4 billion in direct hospital costs per year in the U.S. (Lazarou J et al. (1998) JAMA 279(15):1200-1205). The cost of discovering and developing a new drug has increased to about $1 billion, partly due to increased attrition of compounds and new molecular entities (NMEs) late in clinical development (Adams CP, Brantner VV (2010) "Spending on New Drug Development" Health Econ 19:130-141). The lack of reliable tools for predicting toxicity early in drug development is partly to blame for increasing costs and lower return on investment. Further, drug safety issues are the leading cause of increased litigation and settlements in the pharmaceutical industry. Between January 2009 and May 2011, the industry spent over USD 8 billion on litigation related to drug safety issues.

To support a "kill early" policy for compounds in early clinical trials and drug development, the FDA is encouraging the drug industry and the wider research community to adopt innovative strategies. The FDA white paper Innovation or Stagnation: Challenge and Opportunity on the Critical Path to New Medical Products states, "A new product development toolkit, containing powerful new scientific and technical methods such as animal or computer-based predictive models, biomarkers for safety and effectiveness, and new clinical evaluation techniques, is urgently needed to improve predictability and efficiency along the critical path from laboratory concept to commercial product" (FDA, 2005). The FDA statement underscores the lack of innovative technologies that can aid efficient decision making in drug development.

Cardiotoxicity refers to a broad range of adverse effects on heart function induced by therapeutic molecules. Cardiotoxicity may emerge early in pre-clinical studies or become apparent later in the clinical setting. It is a leading cause of drug withdrawal, accounting for over 45% of all drugs withdrawn since 1994, which places a significant financial burden on drug development. Cardiovascular toxicity includes increased QT duration, arrhythmias, myocardial ischemia, hypertension and thromboembolic complications, and myocardial dysfunction.

Cardiac safety biomarkers currently used by the FDA are QTc prolongation (electrophysiological arrhythmias), circulating troponin C, heart rate, blood pressure, lipids, troponin, C-reactive protein (CRP), brain or B-type natriuretic peptide (BNP), ex vivo platelet aggregation, and imaging biomarkers (cardiac magnetic resonance imaging). QTc prolongation is a very robust but complex marker. However, a decision on whether to kill or sustain a drug in early development is hard to make based on QTc alone. In addition, QTc interpretation is subjective and depends on underlying pathologies that can lead to tachyarrhythmias.
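QTc is the QT interval corrected for heart rate. The passage does not say which correction is meant; purely as an illustration, the sketch below implements two widely used corrections, Bazett's (QT divided by the square root of the RR interval) and Fridericia's (QT divided by the cube root of the RR interval), with QT in milliseconds and RR in seconds. The function names are hypothetical and are not part of any FDA tool.

```python
import math


def rr_from_heart_rate(bpm: float) -> float:
    """RR interval in seconds for a heart rate in beats per minute."""
    return 60.0 / bpm


def qtc_bazett(qt_ms: float, rr_s: float) -> float:
    """Bazett correction: QTc = QT / sqrt(RR), RR in seconds."""
    return qt_ms / math.sqrt(rr_s)


def qtc_fridericia(qt_ms: float, rr_s: float) -> float:
    """Fridericia correction: QTc = QT / RR**(1/3), RR in seconds."""
    return qt_ms / rr_s ** (1.0 / 3.0)


if __name__ == "__main__":
    rr = rr_from_heart_rate(75)  # 0.8 s between beats
    print(f"Bazett: {qtc_bazett(400, rr):.1f} ms")
    print(f"Fridericia: {qtc_fridericia(400, rr):.1f} ms")
```

At 60 beats per minute (RR = 1 s) both formulas return the measured QT unchanged; at faster rates (RR below 1 s) Bazett's correction gives the larger value, so the same recording can yield different QTc values depending on the formula chosen.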

In view of the foregoing, it is evident that new cardiac safety biomarkers, such as molecular cardiac safety biomarkers, are needed in the art.

Summary of the Invention

The platform technology described herein is useful for identifying markers associated with drug-induced toxicity. This platform technology integrates molecular interactions within and across a hierarchy of models, from primary human cell-based models to human clinical samples. This approach leads to the identification of biomarkers that reflect an underlying toxicity caused by a compound or NME that is a potential drug, such as a drug candidate ready to enter phase I clinical trials. Drug-induced toxicities can include cardiac, renal, hepatic and other tissue toxicities. The instant application provides several novel biomarkers associated with drug-induced toxicity, which are useful in methods for predicting the potential toxicity of a molecule or drug candidate, and as potential therapeutic targets for treating, preventing or counteracting drug-induced toxicity. The invention described herein is based, at least in part, on a novel, collaborative utilization of network biology, genomic, proteomic, metabolomic, transcriptomic, and bioinformatics tools and methodologies, which, when combined, may be used to study any biological system of interest, for example, to obtain insight into the molecular mechanisms associated with or causal for drug-induced toxicity. The platform technology is further described in international PCT Application PCT/US2012/027615, the entire contents of which are hereby expressly incorporated herein. Additional embodiments of the platform technology, including a description of how to carry out platform technology methods involving incorporation of enzyme (e.g., kinase) activity data, are described in U.S. Application Serial No. 13/607,587, filed on September 7, 2012, the entire contents of which are expressly incorporated herein by reference.

In a first step, cellular modeling systems are developed to probe drug-induced toxicities, such as cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renal toxicity or myotoxicity. A cellular system modeling drug-induced toxicity can comprise toxicity-related cells subjected to various relevant environmental stimuli (e.g., hyperglycemia, hypoxia, immuno-stress, lipid peroxidation, or exposure to a test molecule or drug candidate). In some embodiments, the cellular modeling system involves cellular crosstalk mechanisms between various interacting cell types related to a specific drug-induced toxicity, such as cardiomyocytes, diabetic cardiomyocytes, hepatocytes, kidney cells, neuronal cells, renal cells, or myoblasts. High throughput biological readouts from the cell model system are obtained by using a combination of techniques, including, for example, mass spectrometry (LC-MS/MS), flow cytometry, cell-based assays, and functional assays. The high throughput biological readouts are then subjected to bioinformatic analysis to study congruent data trends by in vitro, in vivo, and in silico modeling. The resulting matrices allow for cross-related data mining, in which linear and non-linear regression analyses are used to identify key pressure points (or "hubs"). These "hubs", as presented herein, are candidates for drug discovery. In particular, these hubs represent potential drug targets for reducing or alleviating drug-induced toxicity and/or drug-induced toxicity markers.

The molecular signatures of the differentials provide insight into the mechanisms that dictate the alterations in the tissue microenvironment that lead to drug-induced toxicity. Taken together, the combination of the aforementioned technology platform with strategic cellular modeling yields robust information that can be used to establish an understanding of the underlying mechanisms and molecular drivers contributing to drug-induced toxicity, e.g., cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renal toxicity or myotoxicity, while creating biomarker libraries that may allow early identification of drug candidates at risk for causing drug-induced toxic effects, as well as drug targets that may reduce or alleviate drug-induced toxicity.

A significant feature of the platform of the invention is that the AI-based system relies solely on the data sets obtained from the drug-induced toxicity cell model system, without resorting to or taking into consideration any existing knowledge in the art concerning the drug-induced toxicity, such as known biological relationships (i.e., no data points are artificial). Accordingly, the resulting statistical models generated from the platform are unbiased. Another significant feature of the platform of the invention and its components, e.g., the cell model systems and data sets obtained therefrom, is that it allows for continual building on the drug-induced toxicity cell models over time (e.g., by the introduction of new cells and/or conditions), such that an initial, "first generation" consensus causal relationship network generated from a cell model for a drug-induced toxicity can evolve along with the cell model itself into a multiple generation causal relationship network (and delta or delta-delta networks obtained therefrom). In this way, the drug-induced toxicity cell models, the data sets from those models, and the causal relationship networks generated from them by using the Platform Technology methods can constantly evolve and build upon previous knowledge obtained from the Platform Technology.
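One way to read the delta networks mentioned above is as the differential networks of claim 24: causal relationships present in the toxicity model but absent from matching control cells. Assuming each network is represented as a set of directed (cause, effect) edges, this reduces to a set difference. The delta-delta helper shows one plausible reading (the delta of one comparison minus the delta of a second comparison), which is an assumption rather than a definition taken from the source. All names and edges are hypothetical.

```python
Edge = tuple[str, str]  # (cause, effect)


def delta_network(case_edges: set[Edge], control_edges: set[Edge]) -> set[Edge]:
    """Causal edges present in the case network but absent from the control."""
    return case_edges - control_edges


def delta_delta_network(case_a: set[Edge], control_a: set[Edge],
                        case_b: set[Edge], control_b: set[Edge]) -> set[Edge]:
    """Edges unique to comparison A's delta network relative to comparison B's.

    Hypothetical interpretation, e.g. A = drug-treated vs. untreated diabetic
    cardiomyocytes and B = drug-treated vs. untreated normal cardiomyocytes.
    """
    return delta_network(case_a, control_a) - delta_network(case_b, control_b)


if __name__ == "__main__":
    treated = {("gene_X", "ATP"), ("lipid_Y", "gene_X"), ("gene_Z", "ROS")}
    control = {("gene_X", "ATP")}
    print(sorted(delta_network(treated, control)))
```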

The present invention is based, at least in part, on the identification of novel biomarkers that are associated with drug-induced cardiotoxicity. The invention is further based, at least in part, on the discovery that Coenzyme Q10 is capable of reducing or preventing drug-induced cardiotoxicity.

Accordingly, the invention provides methods for identifying an agent that causes or is at risk for causing cardiotoxicity. In one embodiment, the agent is a drug or drug candidate. In one embodiment, the toxicity is drug-induced toxicity, e.g., cardiotoxicity. In one embodiment, the agent is a drug or drug candidate for treating diabetes, obesity, a cardiovascular disorder, cancer, a neurological disorder, or an inflammatory disorder. In these methods, the amount of one or more biomarkers/proteins in a pair of samples (a first sample not subject to the drug treatment, and a second sample subjected to the drug treatment) is assessed. A modulation in the level, expression level, or activity of the one or more biomarkers in the second sample as compared to the level of expression of the one or more biomarkers in the first sample is an indication that the drug causes or is at risk for causing drug-induced cardiotoxicity. In one embodiment, the one or more biomarkers is selected from the markers listed in table 2. The methods of the present invention can be practiced in conjunction with any other method used by the skilled practitioner to identify a drug at risk for causing drug-induced cardiotoxicity.
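The text does not define how large a change counts as a "modulation". The sketch below assumes a single positive level per marker per sample and treats a marker as modulated when the absolute log2 fold change between the treated and untreated samples is at least 1 (a twofold increase or decrease). The threshold, the minimum number of modulated markers, the function names, and the example values are hypothetical; a real assay would likely also use replicate measurements.

```python
import math


def modulated_biomarkers(before: dict[str, float],
                         after: dict[str, float],
                         min_abs_log2fc: float = 1.0) -> dict[str, float]:
    """Return {marker: log2(after / before)} for markers meeting the threshold.

    Markers missing from the treated sample are skipped; levels must be positive.
    """
    changes = {}
    for marker, baseline in before.items():
        if marker not in after:
            continue
        treated = after[marker]
        if baseline <= 0 or treated <= 0:
            raise ValueError(f"levels for {marker} must be positive")
        lfc = math.log2(treated / baseline)
        if abs(lfc) >= min_abs_log2fc:
            changes[marker] = lfc
    return changes


def drug_at_risk(before: dict[str, float], after: dict[str, float],
                 min_markers: int = 1, min_abs_log2fc: float = 1.0) -> bool:
    """Flag the drug when at least min_markers biomarkers are modulated."""
    return len(modulated_biomarkers(before, after, min_abs_log2fc)) >= min_markers


if __name__ == "__main__":
    untreated = {"marker_A": 10.0, "marker_B": 5.0, "marker_C": 8.0}  # hypothetical
    treated = {"marker_A": 40.0, "marker_B": 5.5, "marker_C": 2.0}
    print(modulated_biomarkers(untreated, treated))
    print(drug_at_risk(untreated, treated, min_markers=2))
```

Working on the log2 scale makes increases and decreases symmetric, so a fourfold rise and a fourfold fall both register as a magnitude of 2.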

In one embodiment, a drug that may be used in the methods of the invention includes, but is not limited to, Anthracyclines, 5-Fluorouracil, Cisplatin, Trastuzumab, Gemcitabine, Rosiglitazone, Pioglitazone, Troglitazone, Cabergoline, Pergolide, Sumatriptan, Bisphosphonates, and TNF antagonists.

Accordingly, in one aspect, the invention provides a method for identifying a drug that causes or is at risk for causing drug-induced cardiotoxicity, comprising:

comparing (i) the level of expression of one or more biomarkers present in a first cell sample obtained prior to the treatment with the drug; with (ii) the level of expression of the one or more biomarkers present in a second cell sample obtained following the treatment with the drug; wherein the one or more biomarkers is selected from the markers listed in table 2; wherein a modulation in the level of expression of the one or more biomarkers in the second sample as compared to the first sample is an indication that the drug causes or is at risk for causing drug-induced cardiotoxicity.

In one embodiment, the cells are cells of the cardiovascular system, e.g., cardiomyocytes. In one embodiment, the cells are diabetic cardiomyocytes. In one embodiment, the drug is a drug or candidate drug for treating diabetes, obesity, cardiovascular disease, cancer, neurological disorder, or inflammatory disorder. In one embodiment, the drug is any one of Anthracyclines, 5-Fluorouracil, Cisplatin, Trastuzumab, Gemcitabine, Rosiglitazone, Pioglitazone, Troglitazone, Cabergoline, Pergolide, Sumatriptan, Bisphosphonates, and TNF antagonists.

In one embodiment, a modulation (e.g., an increase or a decrease) in the level of expression of one, two, three, four, five, six, seven, eight, nine, ten, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, or more of the biomarkers selected from the markers listed in table 2 in the second sample as compared to the first sample is an indication that the drug causes or is at risk for causing drug-induced cardiotoxicity.

In one embodiment, a modulation (e.g., an increase or a decrease) in the level of expression of a panel of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or thirteen markers selected from the group consisting of TIMP1, PTX3, HSP76, FINC, CYB5, PAI1, IBP7 (IGFBP7), 1C17, EDIL3, HMOX1, NUCB1, CSOIO, and HSPA4 in the second sample as compared to the first sample is an indication that the drug causes or is at risk for causing drug-induced cardiotoxicity.
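One way to read the panel embodiment is as a count over the thirteen named markers: the drug is flagged when at least k of them are modulated, with k between two and thirteen. The sketch below implements that reading with a hypothetical twofold threshold (absolute log2 fold change of at least 1). The marker symbols are copied as they appear in the text, while the function name, defaults, and any example levels are assumptions.

```python
import math

# The thirteen panel markers named in the text, spelled as in the source.
PANEL = ("TIMP1", "PTX3", "HSP76", "FINC", "CYB5", "PAI1", "IBP7",
         "1C17", "EDIL3", "HMOX1", "NUCB1", "CSOIO", "HSPA4")


def panel_call(before: dict[str, float], after: dict[str, float],
               min_hits: int = 2,
               min_abs_log2fc: float = 1.0) -> tuple[bool, list[str]]:
    """Return (flag, hits); hits are panel markers modulated past the threshold."""
    if not 2 <= min_hits <= len(PANEL):
        raise ValueError("min_hits must be between 2 and 13")
    hits = [m for m in PANEL
            if m in before and m in after
            and abs(math.log2(after[m] / before[m])) >= min_abs_log2fc]
    return len(hits) >= min_hits, hits
```

Markers absent from either sample are ignored, so a partially measured panel can still be scored, although it then cannot reach the higher values of k.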

Methods for identifying a rescue agent that can reduce or prevent drug-induced cardiotoxicity are also provided by the invention. In these methods, the amount of one or more biomarkers in three samples (a first sample not subjected to the drug treatment, a second sample subjected to the drug treatment, and a third sample subjected both to the drug treatment and the agent) is assessed. For biomarkers whose expression changed in the second sample treated with the drug, a normalized level of expression in the third sample as compared to the first sample is an indication that the rescue agent can reduce or prevent drug-induced cardiotoxicity. In one embodiment, the one or more biomarkers is selected from the markers listed in table 2.

Using the methods described herein, a variety of molecules, particularly including molecules sufficiently small to be able to cross the cell membrane, may be screened in order to identify molecules which modulate, e.g., increase or decrease the expression and/or activity of a marker of the invention. Compounds so identified can be provided to a subject in order to reduce, alleviate or prevent drug-induced toxicity in the subject.

Accordingly, in another aspect, the invention provides a method for identifying a rescue agent that can reduce or prevent drug-induced cardiotoxicity comprising: (i) determining a normal level of expression of one or more biomarkers present in a first cell sample obtained prior to the treatment with a cardiotoxicity inducing drug; (ii) determining a treated level of expression of the one or more biomarkers present in a second cell sample obtained following the treatment with the cardiotoxicity inducing drug to identify one or more biomarkers with a change of expression in the treated cell sample; (iii) determining the level of expression of the one or more biomarkers with a changed level of expression in the cardiotoxicity inducing drug treated sample present in a third cell sample obtained following the treatment with the cardiotoxicity inducing drug and the rescue agent; and (iv) comparing the level of expression of the one or more biomarkers determined in the third sample with the level of expression of the one or more biomarkers present in the first sample; wherein the one or more biomarkers is selected from the markers listed in table 2; and wherein a normalized level of expression of the one or more biomarkers in the third sample as compared to the first sample is an indication that the rescue agent can reduce or prevent drug-induced cardiotoxicity.

In one embodiment, the drug is a drug or candidate drug for treating diabetes, obesity, cardiovascular disease, cancer, neurological disorder, or inflammatory disorder. In one embodiment, the drug is any one of Anthracyclines, 5-Fluorouracil, Cisplatin, Trastuzumab, Gemcitabine, Rosiglitazone, Pioglitazone, Troglitazone, Cabergoline, Pergolide, Sumatriptan, Bisphosphonates, and TNF antagonists. In one embodiment, about the same level of expression of one, two, three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, or more of the biomarkers selected from the markers listed in table 2 in the third sample as compared to the first sample is an indication that the rescue agent can reduce or prevent drug-induced cardiotoxicity.

In one embodiment, a normalized level of expression of a panel of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or thirteen markers selected from the group consisting of TIMP1, PTX3, HSP76, FINC, CYB5, PAI1, IBP7 (IGFBP7), 1C17, EDIL3, HMOX1, NUCB1, CSOIO, and HSPA4 in the third sample as compared to the first sample is an indication that the rescue agent can reduce or prevent drug-induced cardiotoxicity.
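The three-sample rescue comparison can be sketched as follows, assuming a single positive level per marker per sample. A marker counts as "changed" when the drug alone shifts it at least twofold from baseline, and as "normalized" when the drug-plus-rescue-agent sample returns to within 0.5 log2 units (about 1.4-fold) of baseline. Both cut-offs, the function name, and the example values are hypothetical, since the text does not quantify "normalized" or "about the same level".

```python
import math


def rescue_assessment(baseline: dict[str, float],
                      treated: dict[str, float],
                      rescued: dict[str, float],
                      change_log2fc: float = 1.0,
                      normal_log2fc: float = 0.5) -> tuple[list[str], list[str]]:
    """Compare untreated, drug-treated and drug-plus-rescue-agent samples.

    Returns (changed, normalized):
      changed: markers the drug alone moved by >= change_log2fc from baseline;
      normalized: changed markers that the rescue sample brings back to within
                  normal_log2fc of baseline.
    """
    def log2_ratio(sample: dict[str, float], marker: str) -> float:
        return math.log2(sample[marker] / baseline[marker])

    changed = [m for m in baseline
               if m in treated and abs(log2_ratio(treated, m)) >= change_log2fc]
    normalized = [m for m in changed
                  if m in rescued and abs(log2_ratio(rescued, m)) <= normal_log2fc]
    return changed, normalized


if __name__ == "__main__":
    base = {"marker_A": 10.0, "marker_B": 10.0, "marker_C": 10.0}  # hypothetical
    drug = {"marker_A": 40.0, "marker_B": 2.5, "marker_C": 11.0}
    drug_plus_agent = {"marker_A": 12.0, "marker_B": 3.0, "marker_C": 10.0}
    changed, normalized = rescue_assessment(base, drug, drug_plus_agent)
    print(changed, normalized)
```

Only markers perturbed by the drug are tested for normalization, mirroring step (iii) of the method; the ratio len(normalized) / len(changed) is one simple way to summarize how completely a candidate agent restores the perturbed markers.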

The invention further provides methods for alleviating, reducing or preventing drug-induced cardiotoxicity in a subject in need thereof, comprising administering to a subject (e.g., a mammal, a human, or a non-human animal) an agent identified by the screening methods provided herein, thereby reducing or preventing drug-induced cardiotoxicity in the subject. In one embodiment, the agent is administered to a subject that has already been treated with a cardiotoxicity-inducing drug. In one embodiment, the agent is administered to a subject at the same time as treatment of the subject with a cardiotoxicity-inducing drug. In one embodiment, the agent is administered to a subject prior to treatment of the subject with a cardiotoxicity-inducing drug.

The invention further provides methods for alleviating, reducing or preventing drug-induced cardiotoxicity in a subject in need thereof, comprising administering Coenzyme Q10 to the subject (e.g., a mammal, a human, or a non-human animal), thereby reducing or preventing drug-induced cardiotoxicity in the subject. In one embodiment, the Coenzyme Q10 is administered to a subject that has already been treated with a cardiotoxicity-inducing drug. In one embodiment, the Coenzyme Q10 is administered to a subject at the same time as treatment of the subject with a cardiotoxicity-inducing drug. In one embodiment, the Coenzyme Q10 is administered to a subject prior to treatment of the subject with a cardiotoxicity-inducing drug. In one embodiment, the drug-induced cardiotoxicity is associated with modulation of expression of one, two, three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, or more of the biomarkers selected from the markers listed in table 2. All values presented in the foregoing list can also be the upper or lower limit of ranges that are intended to be a part of this invention, e.g., between 1 and 5, 1 and 10, 2 and 5, 2 and 10, or 5 and 10 of the foregoing genes (or proteins).

In one embodiment, the drug-induced cardiotoxicity is cardiomyopathy, heart failure, atrial fibrillation, cardiomyopathy and heart failure, heart failure and LV dysfunction, atrial flutter and fibrillation, or heart valve damage and heart failure.

The invention further provides biomarkers (e.g., genes and/or proteins) that are useful as predictive markers for drug-induced cardiotoxicity. These biomarkers include the markers listed in table 2.

In one embodiment, the drug-induced cardiotoxicity is associated with modulation of a panel of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or thirteen markers selected from the group consisting of TIMP1, PTX3, HSP76, FINC, CYB5, PAI1, IBP7 (IGFBP7), 1C17, EDIL3, HMOX1, NUCB1, CSOIO, and HSPA4.

In one embodiment, the predictive markers for drug-induced cardiotoxicity are a panel of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or thirteen markers selected from the group consisting of TIMP1, PTX3, HSP76, FINC, CYB5, PAI1, IBP7 (IGFBP7), 1C17, EDIL3, HMOX1, NUCB1, CSOIO, and HSPA4. The ordinary skilled artisan would, however, be able to identify additional biomarkers predictive of drug-induced cardiotoxicity by employing the methods described herein, e.g., by carrying out the methods described in Example 3 but using a different drug known to induce cardiotoxicity. Exemplary drug-induced cardiotoxicity biomarkers of the invention are further described below.

In one aspect, the invention relates to a method for identifying a modulator of a drug-induced toxicity, said method comprising: (1) establishing a model for drug-induced toxicity, using cells associated with drug-induced toxicity, to represent a characteristic aspect of drug-induced toxicity; (2) obtaining a first data set from the model for drug-induced toxicity, wherein the first data set represents one or more of genomics, lipidomics, proteomics, metabolomics, transcriptomics, and single nucleotide polymorphism (SNP) data characterizing the cells associated with drug-induced toxicity; (3) obtaining a second data set from the model for drug-induced toxicity, wherein the second data set represents a functional activity or a cellular response of the cells associated with drug-induced toxicity; (4) generating a consensus causal relationship network among the expression levels of the one or more of genomics, lipidomics, proteomics, metabolomics, transcriptomics, and single nucleotide polymorphism (SNP) data and the functional activity or cellular response based solely on the first data set and the second data set using a programmed computing device, wherein the generation of the consensus causal relationship network is not based on any known biological relationships other than the first data set and the second data set; and (5) identifying, from the consensus causal relationship network, a causal relationship unique in drug-induced toxicity, wherein a gene, lipid, protein, metabolite, transcript, or SNP associated with the unique causal relationship is identified as a modulator of drug-induced toxicity.

In certain embodiments, the modulator stimulates or promotes the drug-induced toxicity.

In certain embodiments, the modulator inhibits the drug-induced toxicity.

In certain embodiments, the model of the drug-induced toxicity comprises an in vitro culture of cells associated with the drug-induced toxicity, optionally further comprising a matching in vitro culture of control cells.

In certain embodiments, the in vitro culture of the cells is subjected to an environmental perturbation, and the in vitro culture of the matching control cells comprises identical cells not subjected to the environmental perturbation. In certain embodiments, the environmental perturbation comprises one or more of a contact with an agent, a change in culture condition, an introduced genetic modification / mutation, and a vehicle (e.g., vector) that causes a genetic modification / mutation.

In certain embodiments, the first data set comprises protein and/or mRNA expression levels of a plurality of genes.

In certain embodiments, the first data set further comprises two or more of genomics, lipidomics, proteomics, metabolomics, transcriptomics, and single nucleotide polymorphism (SNP) data. In certain embodiments, the first data set further comprises three or more of genomics, lipidomics, proteomics, metabolomics, transcriptomics, and single nucleotide polymorphism (SNP) data.

In certain embodiments, the second data set representing the functional activity or cellular response of the cells comprises one or more of bioenergetics, cell proliferation, apoptosis, organellar function, a genotype-phenotype association actualized by functional models selected from ATP, ROS, OXPHOS, and Seahorse assays, global enzyme activity, and an effect of global enzyme activity on the enzyme metabolic substrates of cells associated with drug-induced toxicity. In one embodiment, the global enzyme activity is global kinase activity. In one embodiment, the effect of global enzyme activity on the enzyme metabolic substrates is the phosphoproteome of the cell.

In certain embodiments, step (4) is carried out by an artificial intelligence (AI)-based informatics platform.

In certain embodiments, the AI-based informatics platform comprises REFS™.

In certain embodiments, the AI-based informatics platform receives all data input from the first data set and the second data set without applying a statistical cut-off point.

In certain embodiments, the consensus causal relationship network established in step (4) is further refined to a simulation causal relationship network, before step (5), by in silico simulation based on input data, to provide a confidence level of prediction for one or more causal relationships within the consensus causal relationship network.
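REFS™ itself is proprietary and its internals are not described here, so the following is only a minimal sketch of one common way to attach a confidence level to causal relationships. It assumes, hypothetically, that an ensemble of candidate networks has already been produced, each given as a set of directed (cause, effect) edges. An edge is kept in the consensus network when it appears in at least a chosen fraction of the ensemble, and that fraction serves as its confidence score. The function name `consensus_network`, the `min_frequency` parameter, and the frequency-based score are illustrative assumptions; this is not the platform's method and not a replacement for the in silico simulation step.

```python
from collections import Counter

# A directed causal edge: (cause, effect), e.g. ("PTX3", "ATP").
Edge = tuple[str, str]


def consensus_network(ensemble: list[set[Edge]], min_frequency: float = 0.5) -> dict[Edge, float]:
    """Keep edges found in at least `min_frequency` of the ensemble networks.

    Hypothetical sketch: the value returned for each edge is the fraction of
    ensemble members that contain it, used here as a simple confidence level.
    """
    if not ensemble:
        return {}
    n = len(ensemble)
    # Each network is a set, so an edge is counted at most once per network.
    counts = Counter(edge for network in ensemble for edge in network)
    return {edge: count / n for edge, count in counts.items() if count / n >= min_frequency}


# Hypothetical ensemble of three networks over example node names.
ensemble = [
    {("PTX3", "ATP"), ("TIMP1", "PTX3")},
    {("PTX3", "ATP")},
    {("PTX3", "ATP"), ("HMOX1", "ATP")},
]
print(consensus_network(ensemble))  # {('PTX3', 'ATP'): 1.0}
```

Lowering `min_frequency` admits less frequently recovered edges at correspondingly lower confidence values.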

In certain embodiments, the unique causal relationship is identified as part of a differential causal relationship network that is uniquely present in the cells associated with drug-induced toxicity and absent from the matching control cells. In one embodiment, the unique causal relationship identified is a relationship between at least one pair selected from the group consisting of expression of a gene and level of a lipid; expression of a gene and level of a transcript; expression of a gene and level of a metabolite; expression of a first gene and expression of a second gene; expression of a gene and presence of a SNP; expression of a gene and a functional activity; level of a lipid and level of a transcript; level of a lipid and level of a metabolite; level of a first lipid and level of a second lipid; level of a lipid and presence of a SNP; level of a lipid and a functional activity; level of a first transcript and level of a second transcript; level of a transcript and level of a metabolite; level of a transcript and presence of a SNP; level of a transcript and a functional activity; level of a first metabolite and level of a second metabolite; level of a metabolite and presence of a SNP; level of a metabolite and a functional activity; presence of a first SNP and presence of a second SNP; and presence of a SNP and a functional activity.
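A differential causal relationship network of this kind can be pictured as a set difference between two edge sets. The minimal sketch below assumes each network is already available as a set of directed (cause, effect) pairs, one from the perturbed cells and one from the matching control cells. The function names, the example edges, and the choice to exclude functional-readout nodes (such as ATP) from the candidate modulators are illustrative assumptions.

```python
Edge = tuple[str, str]  # directed (cause, effect) pair


def unique_edges(perturbed: set[Edge], control: set[Edge]) -> set[Edge]:
    """Edges present in the perturbed-cell network but absent from the matched control network."""
    return perturbed - control


def candidate_modulators(edges: set[Edge], functional_nodes: set[str]) -> set[str]:
    """Molecular nodes (genes, lipids, metabolites, ...) taking part in the given edges.

    Functional readouts (e.g. "ATP", "OCR") are excluded, since only molecular
    entities are treated as candidate modulators in this sketch.
    """
    return {node for edge in edges for node in edge} - functional_nodes


# Hypothetical networks for drug-treated cells and matched control cells.
treated = {("PTX3", "ATP"), ("TIMP1", "PTX3"), ("HMOX1", "NUCB1")}
control = {("HMOX1", "NUCB1")}

diff = unique_edges(treated, control)
print(sorted(diff))                                  # [('PTX3', 'ATP'), ('TIMP1', 'PTX3')]
print(sorted(candidate_modulators(diff, {"ATP"})))   # ['PTX3', 'TIMP1']
```

Because edges are ordered pairs, ("A", "B") and ("B", "A") count as different relationships here; an undirected comparison would need the pairs normalized first.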

In one embodiment, the functional activity is selected from the group consisting of bioenergetics, cell proliferation, apoptosis, organellar function, kinase activity, protease activity, and a genotype-phenotype association actualized by functional models selected from ATP, ROS, OXPHOS, and Seahorse assays. In certain embodiments, the method further comprises validating the identified unique causal relationship in a drug-induced toxicity model.

In one embodiment, the drug-induced toxicity is drug-induced cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renaltoxicity or myotoxicity.

In one embodiment, the model for drug-induced toxicity comprises cardiomyocytes, diabetic cardiomyocytes, hepatocytes, kidney cells, neuronal cells, renal cells, or myoblasts.

In one embodiment, the model for drug-induced toxicity comprises a toxicity inducing drug, cancer drug, diabetic drug, neurological drug, or anti-inflammatory drug. In one embodiment, the drug is Anthracyclines, 5-Fluorouracil, Cisplatin, Trastuzumab, Gemcitabine, Rosiglitazone, Pioglitazone, Troglitazone, Cabergoline, Pergolide, Sumatriptan, Bisphosphonates, or TNF antagonists.

In one aspect, the invention provides a method for identifying a drug that causes or is at risk for causing drug-induced toxicity, comprising: comparing (i) a level of one or more biomarkers present in a first cell sample obtained prior to treatment with the drug with (ii) a level of the one or more biomarkers present in a second cell sample obtained following treatment with the drug; wherein the one or more biomarkers are selected from the modulators identified by the methods described above; and wherein a modulation in the level of the one or more biomarkers in the second sample as compared to the first sample is an indication that the drug causes or is at risk for causing drug-induced toxicity.
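The before/after comparison in this method can be expressed as a log2 fold change per biomarker. The sketch below is a minimal illustration that assumes strictly positive measured levels, a 1.5-fold threshold (one of the fold values listed in this description), and a hypothetical rule that a drug is flagged when at least `min_markers` biomarkers are modulated. The marker names come from the panel described herein, but the numbers are invented for illustration and are not experimental data.

```python
import math


def modulated_biomarkers(before: dict[str, float], after: dict[str, float],
                         min_fold: float = 1.5) -> dict[str, float]:
    """Return log2(after/before) for biomarkers changed by at least `min_fold` in either direction.

    Only biomarkers measured in both samples are compared; levels are assumed
    to be strictly positive so that the log ratio is defined.
    """
    threshold = math.log2(min_fold)
    changes = {}
    for marker in before.keys() & after.keys():
        log_fc = math.log2(after[marker] / before[marker])
        if abs(log_fc) >= threshold:
            changes[marker] = log_fc
    return changes


def flags_toxicity_risk(before: dict[str, float], after: dict[str, float],
                        min_fold: float = 1.5, min_markers: int = 1) -> bool:
    """Hypothetical decision rule: flag the drug if enough biomarkers are modulated."""
    return len(modulated_biomarkers(before, after, min_fold)) >= min_markers


# Illustrative (made-up) levels before and after drug treatment.
before = {"TIMP1": 10.0, "PTX3": 4.0, "HMOX1": 8.0}
after = {"TIMP1": 25.0, "PTX3": 1.0, "HMOX1": 8.5}
print(sorted(modulated_biomarkers(before, after)))        # ['PTX3', 'TIMP1']
print(flags_toxicity_risk(before, after, min_markers=2))  # True
```

Working on the log2 scale makes a 2-fold increase and a 2-fold decrease symmetric (+1 and -1), so a single threshold covers both directions of modulation.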

In one aspect, the invention provides a method for identifying a rescue agent that can reduce or prevent drug-induced toxicity comprising: (i) determining a normal level of one or more biomarkers present in a first cell sample obtained prior to the treatment with a toxicity inducing drug; (ii) determining a treated level of the one or more biomarkers present in a second cell sample obtained following the treatment with the toxicity inducing drug to identify one or more biomarkers with a change of level in the treated cell sample; (iii) determining the level of the one or more biomarkers with a changed level in the toxicity inducing drug treated sample present in a third cell sample obtained following the treatment with the toxicity inducing drug and the rescue agent; and (iv) comparing the level of the one or more biomarkers determined in the third sample with the level of the one or more biomarkers present in the first sample; wherein the one or more biomarkers is selected from the modulators identified by the methods described above and wherein a normalized level of the one or more biomarkers in the third sample as compared to the first sample is an indication that the rescue agent can reduce or prevent drug-induced toxicity.
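Steps (i) to (iv) can be sketched as two filters: first find the biomarkers whose level changed with the toxicity-inducing drug, then check which of those return to near their normal level when the rescue agent is added. This is a minimal sketch under stated assumptions: levels are strictly positive, "changed" means at least a 1.5-fold shift, and "normalized" is taken to mean within 1.2-fold of the normal level. Both thresholds, the function names, and the example values are hypothetical.

```python
import math


def _log_fc(new: float, reference: float) -> float:
    """log2 fold change; levels are assumed strictly positive."""
    if new <= 0 or reference <= 0:
        raise ValueError("levels must be positive")
    return math.log2(new / reference)


def changed_markers(normal: dict[str, float], treated: dict[str, float],
                    min_fold: float = 1.5) -> set[str]:
    """Steps (i)-(ii): biomarkers whose level shifts by at least `min_fold` after drug treatment."""
    limit = math.log2(min_fold)
    return {m for m in normal.keys() & treated.keys()
            if abs(_log_fc(treated[m], normal[m])) >= limit}


def normalized_markers(normal: dict[str, float], treated: dict[str, float],
                       rescued: dict[str, float], min_fold: float = 1.5,
                       tolerance_fold: float = 1.2) -> set[str]:
    """Steps (iii)-(iv): changed biomarkers whose drug + rescue level is within
    `tolerance_fold` of the normal level."""
    tol = math.log2(tolerance_fold)
    return {m for m in changed_markers(normal, treated, min_fold)
            if m in rescued and abs(_log_fc(rescued[m], normal[m])) <= tol}


# Made-up levels: untreated, drug-treated, and drug + rescue agent.
normal = {"TIMP1": 10.0, "PTX3": 4.0, "HSPA4": 6.0}
treated = {"TIMP1": 25.0, "PTX3": 1.0, "HSPA4": 6.2}
rescued = {"TIMP1": 11.0, "PTX3": 1.2, "HSPA4": 6.0}
print(sorted(changed_markers(normal, treated)))              # ['PTX3', 'TIMP1']
print(sorted(normalized_markers(normal, treated, rescued)))  # ['TIMP1']
```

In this made-up example the rescue agent restores TIMP1 but not PTX3, and HSPA4 is never considered because the drug did not change it appreciably.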

In another aspect, the invention relates to a method for alleviating, reducing or preventing drug-induced toxicity, comprising administering to a subject the rescue agent identified by the methods described above, thereby reducing or preventing drug-induced toxicity in the subject. In another aspect, the invention relates to a method for providing a model for drug-induced toxicity for use in a platform method, comprising: establishing a drug-induced toxicity model, using cells associated with the drug-induced toxicity, to represent a characteristic aspect of the drug-induced toxicity, wherein the model for the drug-induced toxicity is useful for generating data sets used in the platform method; thereby providing a model for drug-induced toxicity for use in a platform method.

In one embodiment, the model for drug-induced toxicity comprises cardiomyocytes, diabetic cardiomyocytes, hepatocytes, kidney cells, neuronal cells, renal cells, or myoblasts.

In another aspect, the invention relates to a method for obtaining a first data set and second data set from a model for drug-induced toxicity for use in a platform method, comprising: (1) obtaining a first data set from the model for drug-induced toxicity for use in a platform method, wherein the model for the drug-induced toxicity comprises cells associated with the drug-induced toxicity, and wherein the first data set represents expression levels of a plurality of genes in the cells associated with the drug-induced toxicity; (2) obtaining a second data set from the model for drug-induced toxicity for use in the platform method, wherein the second data set represents a functional activity or a cellular response of the cells associated with the drug-induced toxicity; thereby obtaining a first data set and second data set from the model for the drug-induced toxicity for use in a platform method.

In another aspect, the invention relates to a method for identifying a modulator of drug-induced toxicity, said method comprising: (1) generating a consensus causal relationship network among a first data set and second data set obtained from a model for the drug-induced toxicity, wherein the model comprises cells associated with the drug-induced toxicity, and wherein the first data set represents expression levels of a plurality of genes in the cells and the second data set represents a functional activity or a cellular response of the cells, using a programmed computing device, wherein the generation of the consensus causal relationship network is not based on any known biological relationships other than the first data set and the second data set; (2) identifying, from the consensus causal relationship network, a causal relationship unique in the drug-induced toxicity, wherein a gene associated with the unique causal relationship is identified as a modulator of the drug-induced toxicity; thereby identifying a modulator of drug-induced toxicity. In another aspect, the invention relates to a method for identifying a modulator of a drug-induced toxicity, said method comprising: 1) providing a consensus causal relationship network generated from a model for the drug-induced toxicity; 2) identifying, from the consensus causal relationship network, a causal relationship unique in the drug-induced toxicity, wherein a gene associated with the unique causal relationship is identified as a modulator of the drug-induced toxicity; thereby identifying a modulator of a drug-induced toxicity.

In certain embodiments of the various methods, the consensus causal relationship network is generated among a first data set and second data set obtained from the model for the drug-induced toxicity, wherein the model comprises cells associated with the drug-induced toxicity, and wherein the first data set represents expression levels of a plurality of genes in the cells and the second data set represents a functional activity or a cellular response of the cells, using a programmed computing device, wherein the generation of the consensus causal relationship network is not based on any known biological relationships other than the first data set and the second data set.

In certain embodiments, the "environmental perturbation", also referred to herein as "external stimulus component", is a therapeutic agent. In certain embodiments, the external stimulus component is a small molecule (e.g., a small molecule of no more than 5 kDa, 4 kDa, 3 kDa, 2 kDa, 1 kDa, 500 Da, or 250 Da). In certain embodiments, the external stimulus component is a biologic. In certain embodiments, the external stimulus component is a chemical. In certain embodiments, the external stimulus component is endogenous or exogenous to cells. In certain embodiments, the external stimulus component is a MIM or epishifter. In certain embodiments, the external stimulus component is a stress factor for the cell system, such as hypoxia, hyperglycemia, hyperlipidemia, hyperinsulinemia, and/or lactic acid-rich conditions.

In certain embodiments, the external stimulus component may include a therapeutic agent or a candidate therapeutic agent for treating a drug-induced toxicity, including chemotherapeutic agents, protein-based biological drugs, antibodies, fusion proteins, small molecule drugs, lipids, polysaccharides, nucleic acids, etc.

In certain embodiments, the external stimulus component may be one or more stress factors, such as those typically encountered in vivo under the various drug-induced toxicities, including hypoxia, hyperglycemic conditions, acidic environment (that may be mimicked by lactic acid treatment), etc.

In other embodiments, the external stimulus component may include one or more MIMs and/or epishifters, as defined herein below. Exemplary MIMs include Coenzyme Q10 (also referred to herein as CoQ10) and compounds in the Vitamin B family, or nucleosides, mononucleotides or dinucleotides that comprise a compound in the Vitamin B family.

In making cellular output measurements (such as protein expression), either absolute amount (e.g., expression amount) or relative level (e.g., relative expression level) may be used. In one embodiment, absolute amounts (e.g., expression amounts) are used. In one embodiment, relative levels or amounts (e.g., relative expression levels) are used. For example, to determine the relative protein expression level of a cell system, the amount of any given protein in the cell system, with or without the external stimulus to the cell system, may be compared to a suitable control cell line or mixture of cell lines (such as all cells used in the same experiment) and given a fold-increase or fold-decrease value. The skilled person will appreciate that absolute amounts or relative amounts can be employed in any cellular output measurement, such as gene and/or RNA transcription level, level of lipid, level of metabolite, or any functional output, e.g., level of apoptosis, level of toxicity, level of enzyme (e.g., kinase) activity, or ECAR or OCR as described herein. A pre-determined threshold level for a fold-increase (e.g., at least 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 75 or 100 or more fold increase) or fold-decrease (e.g., at least a decrease to 0.9, 0.8, 0.75, 0.7, 0.6, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1 or 0.05 fold, or a decrease to 90%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10% or 5% or less) may be used to select significant differentials, and the cellular output data for the significant differentials may then be included in the data sets (e.g., first and second data sets) utilized in the platform technology methods of the invention. All values presented in the foregoing list can also be the upper or lower limit of ranges (e.g., between 1.5 and 5 fold, 5 and 10 fold, 2 and 5 fold, or between 0.9 and 0.7, 0.9 and 0.5, or 0.7 and 0.3 fold) that are intended to be a part of this invention.
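The relative-level calculation and threshold selection described above can be sketched as follows. This is a minimal illustration, assuming each cell system's output is a mapping from analyte name to a positive amount. The reference can be either a control cell line or the mean over all cell systems in the same experiment, and the 1.5-fold increase and 0.5-fold decrease cut-offs are two of the listed example values, chosen here only for illustration. All function names and numbers are hypothetical.

```python
from statistics import mean


def pooled_reference(systems: list[dict[str, float]]) -> dict[str, float]:
    """Mean amount of each analyte across all cell systems in the experiment
    (only analytes measured in every system are kept)."""
    common = set.intersection(*(set(s) for s in systems))
    return {a: mean(s[a] for s in systems) for a in common}


def fold_values(sample: dict[str, float], reference: dict[str, float]) -> dict[str, float]:
    """Relative level of each analyte: sample amount divided by reference amount."""
    return {a: sample[a] / reference[a]
            for a in sample.keys() & reference.keys() if reference[a] > 0}


def significant_differentials(folds: dict[str, float], min_increase: float = 1.5,
                              max_decrease: float = 0.5) -> dict[str, float]:
    """Keep analytes whose fold value meets the increase or decrease threshold."""
    return {a: f for a, f in folds.items() if f >= min_increase or f <= max_decrease}


# Made-up protein amounts for an untreated and a treated cell system.
control = {"P1": 10.0, "P2": 10.0, "P3": 10.0}
treated = {"P1": 30.0, "P2": 2.0, "P3": 11.0}

vs_control = fold_values(treated, control)
print(sorted(significant_differentials(vs_control)))  # ['P1', 'P2']
print(pooled_reference([control, treated])["P1"])     # 20.0
```

Only the analytes that pass the threshold would then be carried forward into the first and second data sets.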

Throughout the present application, all values presented in a list, e.g., such as those above, can also be the upper or lower limit of ranges that are intended to be a part of this invention. In one embodiment of the methods of the invention, not every observed causal relationship in a causal relationship network may be of biological significance. With respect to any given drug-induced toxicity to which the subject interrogative biological assessment is applied, some (or possibly all) of the causal relationships (and the genes associated therewith) may be "determinative" with respect to the specific biological problem at issue, e.g., either responsible for causing a drug-induced toxicity (a potential target for therapeutic intervention) or serving as a biomarker for the drug-induced toxicity (a potential diagnostic or prognostic factor). In one embodiment, an observed causal relationship unique in the drug-induced toxicity is determinative with respect to the specific biological problem at issue. In one embodiment, not every observed causal relationship unique in the drug-induced toxicity is determinative with respect to the specific problem at issue.

Such determinative causal relationships may be selected by an end user of the subject method, or they may be selected by a bioinformatics software program, such as REFS, a DAVID-enabled comparative pathway analysis program, or the KEGG pathway analysis program. In certain embodiments, more than one bioinformatics software program is used, and consensus results from two or more bioinformatics software programs are preferred.
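Preferring consensus results from two or more programs amounts to a vote count over each program's selections. The minimal sketch below assumes each program's output has already been exported as a set of selected items (for example, gene names or causal relationships). It does not call REFS, DAVID, or KEGG; the program labels, the selections, and the `min_programs` parameter are hypothetical.

```python
from collections import Counter


def consensus_selection(selections: dict[str, set[str]], min_programs: int = 2) -> set[str]:
    """Return items selected by at least `min_programs` of the programs.

    `selections` maps a program label to the set of items it flagged as
    determinative; using sets ensures each program votes at most once per item.
    """
    votes = Counter(item for chosen in selections.values() for item in chosen)
    return {item for item, count in votes.items() if count >= min_programs}


# Hypothetical exported selections from three analysis tools.
selections = {
    "REFS": {"PTX3", "TIMP1", "P4HB"},
    "KEGG": {"PTX3", "HMOX1"},
    "DAVID": {"TIMP1", "PTX3"},
}
print(sorted(consensus_selection(selections)))                  # ['PTX3', 'TIMP1']
print(sorted(consensus_selection(selections, min_programs=3)))  # ['PTX3']
```

Raising `min_programs` trades recall for agreement: fewer items survive, but each is supported by more of the tools.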

As used herein, "differentials" of cellular outputs include differences (e.g., increased or decreased levels) in any one or more parameters of the cellular outputs. In certain embodiments, the differentials are each independently selected from the group consisting of differentials in mRNA transcription, protein expression, lipid expression, protein activity, kinase activity, metabolite / intermediate level, and/or ligand-target interaction. For example, in terms of protein expression level, differentials between two cellular outputs, such as the outputs associated with a cell system before and after the treatment by an external stimulus component, can be measured and quantitated by using art-recognized technologies, such as mass-spectrometry based assays (e.g., iTRAQ, 2D-LC-MS/MS, etc.).

In one aspect, the cell model for a drug-induced toxicity comprises a cellular cross-talking system, wherein a first cell system having a first cellular environment with an external stimulus component generates a first modified cellular environment; such that a cross-talking cell system is established by exposing a second cell system having a second cellular environment to the first modified cellular environment. In one embodiment, at least one significant cellular cross-talking differential from the cross-talking cell system is generated; and at least one determinative cellular cross-talking differential is identified such that an interrogative biological assessment occurs. In certain embodiments, the at least one significant cellular cross-talking differential is a plurality of differentials.

In certain embodiments, the at least one determinative cellular cross-talking differential is selected by the end user. Alternatively, in another embodiment, the at least one determinative cellular cross-talking differential is selected by a bioinformatics software program (such as, e.g., REFS, KEGG pathway analysis or DAVID-enabled comparative pathway analysis) based on the quantitative proteomics data.

In certain embodiments, the method further comprises generating a significant cellular output differential for the first cell system.

In certain embodiments, the differentials are each independently selected from the group consisting of differentials in mRNA transcription, protein expression, lipid expression, protein activity, metabolite / intermediate level, and/or ligand-target interaction.

In certain embodiments, the first cell system and the second cell system are independently selected from: a homogeneous population of primary cells, a drug-induced toxicity related cell line, or a normal cell line.

In certain embodiments, the first modified cellular environment comprises factors secreted by the first cell system into the first cellular environment, as a result of contacting the first cell system with the external stimulus component. The factors may comprise secreted proteins or other signaling molecules. In certain embodiments, the first modified cellular environment is substantially free of the original external stimulus component.

In certain embodiments, the cross-talking cell system comprises a transwell having an insert compartment and a well compartment separated by a membrane. For example, the first cell system may grow in the insert compartment (or the well compartment), and the second cell system may grow in the well compartment (or the insert compartment).

In certain embodiments, the cross-talking cell system comprises a first culture for growing the first cell system, and a second culture for growing the second cell system. In this case, the first modified cellular environment may be a conditioned medium from the first cell system.

In certain embodiments, the first cellular environment and the second cellular environment can be identical. In certain embodiments, the first cellular environment and the second cellular environment can be different.

In certain embodiments, the cross-talking cell system comprises a co-culture of the first cell system and the second cell system.

The methods of the invention may be used for, or applied to, any number of "interrogative biological assessments." Application of the methods of the invention to an interrogative biological assessment allows for the identification of one or more modulators of a drug-induced toxicity or determinative cellular process "drivers" of a drug-induced toxicity.

In one embodiment, the interrogative biological assessment is the assessment of the toxicological profile of an agent, e.g., a drug, on a cell, tissue, organ or organism, wherein the identified modulators of drug-induced toxicity, e.g., determinative cellular process driver (e.g., cellular cross-talk differentials or causal relationships unique in drug-induced toxicity) may be indicators of toxicity, e.g., cytotoxicity, cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renaltoxicity or myotoxicity, and may in turn be used to predict or identify the toxicological profile of the agent. In one embodiment, the identified modulators of a drug-induced toxicity, e.g., determinative cellular process driver (e.g., cellular cross-talk differentials or causal relationships unique in a drug-induced toxicity) is an indicator of cardiotoxicity of a drug or drug candidate, and may in turn be used to predict or identify the cardiotoxicological profile of the drug or drug candidate.

In another aspect, the invention provides a kit for conducting an interrogative biological assessment using a discovery Platform Technology, comprising one or more reagents for detecting the presence of, and/or for quantitating the amount of, an analyte that is the subject of a causal relationship network generated from the methods of the invention. In one embodiment, said analyte is the subject of a unique causal relationship in the drug-induced toxicity, e.g., a gene associated with a unique causal relationship in the drug-induced toxicity. In certain embodiments, the analyte is a protein, and the reagents comprise an antibody against the protein, a label for the protein, and/or one or more agents for preparing the protein for high throughput analysis (e.g., mass spectrometry based sequencing).

It should be understood that all embodiments described herein, including those described only in examples, are parts of the general description of the invention, and can be combined with any other embodiments of the invention unless explicitly disclaimed or inapplicable.

Brief Description of the Drawings

Various embodiments of the present disclosure will be described herein below with reference to the figures wherein:

Figure 1: Illustration of approach to identify therapeutics.

Figure 2: Illustration of systems biology of cancer and consequence of integrated multi-physiological interactive output regulation.

Figure 3: Illustration of systematic interrogation of biological relevance using MIMS.

Figure 4: Illustration of modeling cancer network to enable interrogative biological query.

Figure 5: Illustration of the interrogative biology platform technology.

Figure 6: Illustration of technologies employed in the platform technology.

Figure 7: Schematic representation of the components of the platform including data collection, data integration, and data mining.

Figure 8: Schematic representation of the systematic interrogation using MIMS and collection of response data from the "omics" cascade.

Figure 9: Sketch of the components employed to build the in vitro models representing normal and diabetic states.

Figure 10: Schematic representation of the informatics platform REFS™ used to generate causal networks of the protein as they relate to disease pathophysiology.

Figure 11: Schematic representation of the approach towards generation of differential network in diabetic versus normal states and diabetic nodes that are restored to normal states by treatment with MIMS.

Figure 12: A representative differential network in diabetic versus normal states.

Figure 13: A schematic representation of a node and associated edges of interest (Node1 in the center). The cellular functionality associated with each edge is represented.

Figure 14: High level flow chart of an exemplary method, in accordance with some embodiments.

Figure 15A-15D: High level schematic illustration of the components and process for an AI-based informatics system that may be used with exemplary embodiments.

Figure 16: Flow chart of process in AI-based informatics system that may be used with some exemplary embodiments.

Figure 17: Schematically depicts an exemplary computing environment suitable for practicing exemplary embodiments taught herein.

Figure 18: Illustration of the mathematical approach towards generation of delta-delta networks.

Figure 19: A schematic representing experimental design and modeling parameters used to study drug induced toxicity in diabetic cardiomyocytes.

Figure 20: Dysregulation of transcriptional network and expression of human mitochondrial energy metabolism genes in diabetic cardiomyocytes by drug treatment (T): rescue molecule (R) normalizes gene expression.

Figure 21: A. Drug treatment (T) induced expression of GPAT1 and TAZ in mitochondria from cardiomyocytes conditioned in hyperglycemia. In combination with the rescue molecule (T+R), the levels of GPAT1 and TAZ were normalized. B. Synthesis of TAG from G3P.

Figure 22: A. Drug treatment (T) decreases mitochondrial OCR (oxygen consumption rate) in cardiomyocytes conditioned in hyperglycemia. The rescue molecule (T+R) normalizes OCR. B. Drug treatment (T) represses mitochondrial ATP synthesis in cardiomyocytes conditioned in hyperglycemia.

Figure 23: GO annotation of proteins downregulated by drug treatment. Proteins involved in mitochondrial energy metabolism were downregulated with drug treatment.

Figure 24: Illustration of the mathematical approach towards generation of delta networks. Unique edges from T are compared with those from UT, both models being in a diabetic environment.

Figure 25: A schematic representing potential protein hubs and networks that drive pathophysiology of drug-induced toxicity.

Figure 26: Schematic representation of the Interrogative biology platform.

Figure 27: Illustration of cellular functional models, data integration and mathematical model building.

Figure 28: Causal molecular interaction network that drives pathophysiology of drug-induced toxicity.

Figure 29: Causal molecular interaction sub-network of PTX3 as the central hub that drives pathophysiology of drug-induced toxicity.

Figure 30: Mitochondrial ATP synthesis capacity of cardiomyocytes in normal glucose and high glucose conditions.

Figure 31: Causal molecular interaction network of ATP drivers.

Figure 32: Causal molecular interaction sub-network of ATP drivers with P4HB as the central hub.

Figure 33: Unique edges of causal molecular interaction sub-network of ATP drivers with P4HB as the central hub.

Figure 34: Illustration of functional toxicomics: multi-omics integration.

Attached herewith, as Appendix A, are the sequences of all biomarkers referenced herein. All of the information associated with the GenBank accession numbers listed in Appendix A and throughout this application is incorporated herein by reference in the versions available on the filing date of this application.

Detailed Description of the Invention

I. Overview

Exemplary embodiments of the present invention incorporate methods that may be performed using an interrogative biology platform ("the Platform") that is a tool for understanding a wide variety of drug-induced toxicities, such as cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renaltoxicity or myotoxicity, and the key molecular drivers underlying such drug-induced toxicities, including factors that enable a drug-induced toxicity. Some exemplary embodiments include systems that may incorporate at least a portion of, or all of, the Platform. Some exemplary methods may employ at least some of, or all of, the Platform. Goals and objectives of some exemplary embodiments involving the Platform are generally outlined below for illustrative purposes:

i) to create specific molecular signatures as drivers of critical components of the drug-induced toxicity as they relate to overall pathophysiology of the relevant cells, tissues, and/or organs;

ii) to generate molecular signatures or differential maps pertaining to the drug-induced toxicity, which may help to identify differential molecular signatures that distinguish one biological state (e.g., a drug-induced toxicity state) from a different biological state (e.g., a normal state), and to develop an understanding of signatures or molecular entities as they arbitrate mechanisms of change between the two biological states (e.g., from normal to drug-induced toxicity state); and,

iii) to investigate the role of "hubs" of molecular activity as potential intervention targets for external control of the drug-induced toxicity (e.g., to use the hub as a potential therapeutic target), or as potential biomarkers for the drug-induced toxicity in question (e.g., drug-induced toxicity specific biomarkers, in prognostic and/or theranostic uses).

Some exemplary methods involving the Drug-induced Toxicity Platform may include one or more of the following features:

1) modeling the drug-induced toxicities (e.g., cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renaltoxicity, or myotoxicity) and/or components of the drug-induced toxicity (e.g., physiology and pathophysiology associated with toxicities) in one or more models, preferably in vitro models, using cells associated with the drug-induced toxicity. For example, the cells may be human-derived cells which normally participate in the drug-induced toxicity in question (e.g., heart muscle cells involved in cardiotoxicity). The model may include various cellular cues / conditions / perturbations that are specific to the drug-induced toxicity. Ideally, the model represents various drug-induced toxicity states and flux components, instead of a static assessment of the drug-induced toxicity condition.

2) profiling mRNA and/or protein signatures using any art-recognized means, for example, quantitative polymerase chain reaction (qPCR) and proteomics analysis tools such as mass spectrometry (MS). Such mRNA and protein data sets represent biological reaction to environment / perturbation. Where applicable and possible, lipidomics, metabolomics, and transcriptomics data may also be integrated as supplemental or alternative measures for the drug-induced toxicity in question. SNP analysis is another component that may be used at times in the process. It may be helpful for investigating, for example, whether a SNP or a specific mutation has any effect on the drug-induced toxicity. These variables may be used to describe the drug-induced toxicity, either as a static "snapshot," or as a representation of a dynamic process.

3) assaying for one or more functional activities or cellular responses to cues and perturbations, including but not limited to bioenergetics, cell proliferation, apoptosis, and organellar function. True genotype-phenotype association is achieved by employing functional models, such as ATP, ROS, OXPHOS, and Seahorse assays. Such functional activities can involve global enzyme activity, such as kinase activity, and/or effects of global enzyme activity or the enzyme metabolites or substrates in the cells, e.g., the phosphoproteome of the cells. Such cellular responses represent the reaction of the cells in the drug-induced toxicity process (or models thereof) to the corresponding drug-induced toxicity state(s) of the mRNA / protein expression, and any other related states in 2) above.

4) integrating the functional assay data obtained in 3) with the proteomics and other data obtained in 2), and determining protein, gene, lipid, enzyme activity, and other functional activity associations as driven by causality, by employing an artificial intelligence based (AI-based) informatics system or platform. Such an AI-based system is based on, and preferably based only on, the data sets obtained in 2) and/or 3), without resorting to existing knowledge concerning the drug-induced toxicity process.

Preferably, no data points are statistically or artificially cut off. Instead, all obtained data is fed into the AI system for determining protein, gene, lipid, enzyme activity, and other functional activity associations. One goal or output of the integration process is one or more differential networks (otherwise referred to herein as "delta networks," or, in some cases, "delta-delta networks," as the case may be) between the different biological states (e.g., drug-induced toxicity vs. normal states).

5) profiling the outputs from the AI-based informatics platform to explore each hub of activity as a potential therapeutic target and/or biomarker. Such profiling can be done entirely in silico based on the obtained data sets, without resorting to any actual wet-lab experiments.

6) validating hubs of activity by employing molecular and cellular techniques. Such post-informatic validation of output with wet-lab cell-based experiments is optional, but it helps to create a full circle of interrogation.

Any or all of the approaches outlined above may be used in any specific application concerning any drug-induced toxicity, depending, at least in part, on the nature of the specific application. That is, one or more approaches outlined above may be omitted or modified, and one or more additional approaches may be employed, depending on specific application.

Various schematics illustrating the platform are provided. In particular, an illustration of an exemplary approach to identify therapeutics using the platform is depicted in Figure 1. An illustration of systems biology of cancer and the consequence of integrated multi-physiological interactive output regulation is depicted in Figure 2. An illustration of a systematic interrogation of biological relevance using MIMS is depicted in Figure 3. An illustration of modeling a cancer network to enable an interrogative biological query is depicted in Figure 4.

Illustrations of the interrogative biology platform and technologies employed in the platform are depicted in Figures 5 and 6. A schematic representation of the components of the platform including data collection, data integration, and data mining is depicted in Figure 7. A schematic representation of a systematic interrogation using MIMS and collection of response data from the "omics" cascade is depicted in Figure 8.

Figure 14 is a high-level flow chart of an exemplary method 10, in which components of an exemplary system that may be used to perform the exemplary method are indicated. Initially, a model (e.g., an in vitro model) is established for a biological process (e.g., a drug-induced toxicity process) and/or components of the biological process (e.g., drug-induced toxicity physiology and pathophysiology) using cells normally associated with the process (step 12). For example, the cells may be human-derived cells that normally participate in the biological process (e.g., drug-induced toxicity). The cell model may include various cellular cues, conditions, and/or perturbations that are specific to the biological process (e.g., drug-induced toxicity). Ideally, the cell model represents various (drug-induced toxicity) states and flux components of the biological process (e.g., drug-induced toxicity), instead of a static assessment of the biological process. The comparison cell model may include control cells or normal cells, e.g., cells not exposed to a drug which induces toxicity. Additional description of the cell models appears below in sections III.A and IV.

A first data set is obtained from the cell model for the biological process (e.g., drug-induced toxicity), which includes information representing, by way of example, expression levels of a plurality of genes (e.g., mRNA and/or protein signatures) (step 16) using any known process or system (e.g., quantitative polymerase chain reaction (qPCR) and proteomics analysis tools such as mass spectrometry (MS)).

A third data set is obtained from the comparison cell model for the biological process (e.g. drug-induced toxicity) (step 18). The third data set includes information representing, e.g., expression levels of a plurality of genes in the comparison cells from the comparison cell model.

In certain embodiments of the methods of the invention, these first and third data sets are collectively referred to herein as a "first data set" that represents, e.g., expression levels of a plurality of genes in the cells (all cells including comparison cells) associated with the biological system (e.g. drug-induced toxicity model).

The first data set and third data set may be obtained from one or more mRNA and/or Protein Signature Analysis System(s). The mRNA and protein data in the first and third data sets may represent biological reactions to environment and/or perturbation. Where applicable and possible, lipidomics, metabolomics, and transcriptomics data may also be integrated into the first data set as supplemental or alternative measures for the biological process (e.g., drug-induced toxicity). SNP analysis is another component that may be used at times in the process. It may be helpful for investigating, for example, whether a single-nucleotide polymorphism (SNP) or a specific mutation has any effect on the biological process (e.g., drug-induced toxicity). The data variables may be used to describe the biological process (e.g., drug-induced toxicity) either as a static "snapshot," or as a representation of a dynamic process. Additional description regarding obtaining information representing expression levels of a plurality of genes in cells appears below in section III.B.

A second data set is obtained from the cell model for the biological process (e.g. drug-induced toxicity), which includes information representing a functional activity or response of cells (step 20). Similarly, a fourth data set is obtained from the comparison cell model for the biological process (e.g. drug-induced toxicity), which includes information representing a functional activity or response of the comparison cells (step 22).

In certain embodiments of the methods of the invention, these second and fourth data sets are collectively referred to herein as a "second data set" that represents a functional activity or a cellular response of the cells (all cells including comparison cells) associated with the biological system (e.g. drug-induced toxicity).
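The pooling of the four data sets into the collective first and second data sets can be pictured with a small Python sketch. It assumes each sample is held as a record of expression values and functional readouts; the `Sample` class and its field names are hypothetical and are not part of the Platform. Each pooled row keeps its condition label, so the cell and comparison-cell subsets remain recoverable.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One measured sample; all field names are illustrative assumptions."""
    sample_id: str
    condition: str  # e.g. "treated" (cell model) or "untreated" (comparison cell model)
    expression: dict[str, float] = field(default_factory=dict)  # mRNA / protein levels
    functional: dict[str, float] = field(default_factory=dict)  # e.g. OCR, ATP, ROS readouts


def collective_data_sets(samples):
    """Pool samples into the collective first (expression) and second (functional) data sets.

    Every row keeps its condition label so per-condition networks can still be learned.
    """
    first = [{"sample_id": s.sample_id, "condition": s.condition, **s.expression} for s in samples]
    second = [{"sample_id": s.sample_id, "condition": s.condition, **s.functional} for s in samples]
    return first, second


def split_by_condition(rows, condition):
    """Recover the rows of one condition, e.g. the third or fourth data set."""
    return [row for row in rows if row["condition"] == condition]
```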

One or more functional assay systems may be used to obtain information regarding the functional activity or response of cells or of comparison cells. The information regarding functional cellular responses to cues and perturbations may include, but is not limited to, bioenergetics profiling, cell proliferation, apoptosis, and organellar function. Functional models for processes and pathways (e.g., adenosine triphosphate (ATP), reactive oxygen species (ROS), oxidative phosphorylation (OXPHOS), Seahorse assays, etc.) may be employed to obtain true genotype-phenotype association. Such functional activities can involve global enzyme activity, such as kinase activity, and/or effects of global enzyme activity, or the enzyme metabolites or substrates in the cells, e.g., the phosphoproteome of the cells. The functional activity or cellular responses represent the reaction of the cells in the biological process (or models thereof) in response to the corresponding state(s) of the mRNA / protein expression, and any other related applied conditions or perturbations. Additional information regarding obtaining information representing functional activity or response of cells is provided below in section III.B.

The method also includes generating computer-implemented models of the biological processes (e.g. drug-induced toxicity) in the cells and in the control cells. For example, one or more (e.g., an ensemble of) Bayesian networks of causal relationships between the expression level of the plurality of genes and the functional activity or cellular response may be generated for the cell model (the "generated cell model networks") from the first data set and the second data set (step 24). The generated cell model networks, individually or collectively, include quantitative probabilistic directional information regarding relationships. The generated cell model networks are not based on known biological relationships between gene expression and/or functional activity or cellular response, other than information from the first data set and second data set. The one or more generated cell model networks may collectively be referred to as a consensus cell model network.

One or more (e.g., an ensemble of) Bayesian networks of causal relationships between the expression level of the plurality of genes and the functional activity or cellular response may be generated for the comparison cell model (the "generated comparison cell model networks") from the first data set and the second data set (step 26). The generated comparison cell model networks, individually or collectively, include quantitative probabilistic directional information regarding relationships. The generated comparison cell model networks are not based on known biological relationships between gene expression and/or functional activity or cellular response, other than the information in the first data set and the second data set. The one or more generated comparison cell model networks may collectively be referred to as a consensus comparison cell model network.
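One simple way to summarize an ensemble of generated networks into a consensus network, assumed here for illustration rather than taken from the REFS™ system, is to keep each directed edge that appears in at least a chosen fraction of the networks. In the sketch below, each network is a set of `(parent, child)` edges, and the `min_frequency` cutoff is a hypothetical parameter.

```python
from collections import Counter


def consensus_edges(ensemble, min_frequency=0.5):
    """Directed edges present in at least `min_frequency` of the ensemble networks.

    `ensemble` is a list of networks, each an iterable of (parent, child) edges.
    Returns a mapping from each retained edge to the fraction of networks containing it.
    """
    if not ensemble:
        return {}
    # Count each edge at most once per network.
    counts = Counter(edge for network in ensemble for edge in set(network))
    n = len(ensemble)
    return {edge: c / n for edge, c in counts.items() if c / n >= min_frequency}
```

Keeping the edge frequency alongside each edge preserves some of the ensemble's uncertainty, which a bare yes/no edge list would lose.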

The generated cell model networks and the generated comparison cell model networks may be created using an artificial intelligence based (AI-based) informatics platform. Further details regarding the creation of the generated cell model networks, the creation of the generated comparison cell model networks and the AI-based informatics system appear below in section III.C and in the description of Figures 2A-3.

It should be noted that many different AI-based platforms or systems may be employed to generate the Bayesian networks of causal relationships including quantitative probabilistic directional information. Although certain examples described herein employ one specific commercially available system, i.e., REFS™ (Reverse Engineering/Forward Simulation) from GNS (Cambridge, MA), embodiments are not so limited. AI-based systems or platforms suitable to implement some embodiments employ mathematical algorithms to establish causal relationships among the input variables (e.g., the first and second data sets), based only on the input data without taking into consideration prior existing knowledge about any potential, established, and/or verified biological relationships.

For example, the REFS™ AI-based informatics platform utilizes experimentally derived raw (original) or minimally processed input biological data (e.g., genetic, genomic, epigenetic, proteomic, metabolomic, and clinical data), and rapidly performs trillions of calculations to determine how molecules interact with one another in a complete system. The REFS™ AI-based informatics platform performs a reverse engineering process aimed at creating an in silico computer-implemented cell model (e.g., generated cell model networks), based on the input data, that quantitatively represents the underlying biological system (e.g. drug-induced toxicity). Further, hypotheses about the underlying biological system can be developed and rapidly simulated based on the computer-implemented cell model, in order to obtain predictions, accompanied by associated confidence levels, regarding the hypotheses.

With this approach, biological systems are represented by quantitative computer- implemented cell models in which "interventions" are simulated to learn detailed mechanisms of the biological system (e.g., drug-induced toxicity), effective intervention strategies, and/or clinical biomarkers that determine which patients will respond to a given treatment regimen. Conventional bioinformatics and statistical approaches, as well as approaches based on the modeling of known biology, are typically unable to provide these types of insights.

After the generated cell model networks and the generated comparison cell model networks are created, they are compared. One or more causal relationships present in at least some of the generated cell model networks, and absent from, or having at least one significantly different parameter in, the generated comparison cell model networks are identified (step 28). Such a comparison may result in the creation of a differential network. The comparison, identification, and/or differential (delta) network creation may be conducted using a differential network creation module, which is described in further detail below in section III.D and with respect to the description of Figure 18.
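The comparison in step 28 can be sketched as follows, assuming each consensus network is stored as a mapping from a directed edge to a single quantitative parameter (for example, an edge strength), and using a fixed `tolerance` as a stand-in for a proper test of a "significantly different parameter." The function name, the tolerance, and the toy edge values are illustrative only.

```python
def differential_network(cell_net, comparison_net, tolerance=0.5):
    """Edges of the cell-model network that are absent from the comparison network,
    or whose parameter differs from the comparison value by more than `tolerance`.

    Both arguments map (source, target) -> edge parameter.
    Returns (source, target) -> (cell_value, comparison_value or None).
    """
    delta = {}
    for edge, value in cell_net.items():
        if edge not in comparison_net:
            delta[edge] = (value, None)  # edge unique to the cell model
        elif abs(value - comparison_net[edge]) > tolerance:
            delta[edge] = (value, comparison_net[edge])  # shared edge, changed parameter
    return delta
```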

In some embodiments, input data sets are from one cell type and one comparison cell type, which creates an ensemble of cell model networks based on the one cell type and another ensemble of comparison cell model networks based on the one comparison control cell type. A differential may be performed between the ensemble of networks of the one cell type and the ensemble of networks of the comparison cell type(s).

In other embodiments, input data sets are from multiple cell types (e.g., two or more cell types that are normally associated with the particular type of drug-induced toxicity) and multiple comparison cell types (e.g., two or more normal cell types, e.g., the same cells not exposed to the drug). An ensemble of cell model networks may be generated for each cell type and each comparison cell type individually, and/or data from the multiple cell types and the multiple comparison cell types may be combined into respective composite data sets. The composite data sets produce an ensemble of networks corresponding to the multiple cell types (composite data) and another ensemble of networks corresponding to the multiple comparison cell types (comparison composite data). A differential may be performed on the ensemble of networks for the composite data as compared to the ensemble of networks for the comparison composite data.

In some embodiments, a differential may be performed between two different differential networks. This output may be referred to as a delta-delta network, and is described below with respect to Figure 18.
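A delta-delta network can be illustrated by comparing the edge sets of two differential networks, for example a treated-versus-untreated delta network built under high glucose and another built under normal glucose. The sketch below assumes each differential network is already reduced to a set of directed edges; splitting the result into edges unique to each network and edges shared by both is an illustrative choice, not the Platform's definition.

```python
def delta_delta(delta_a, delta_b):
    """Compare two differential networks given as iterables of (source, target) edges.

    Returns the edges found only in `delta_a`, only in `delta_b`, and in both.
    """
    a, b = set(delta_a), set(delta_b)
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}
```

In the high-glucose versus normal-glucose example, the `only_a` edges would be the treatment effects that appear only under hyperglycemic conditioning.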

Quantitative relationship information may be identified for each relationship in the generated cell model networks (step 30). Similarly, quantitative relationship information for each relationship in the generated comparison cell model networks may be identified (step 32). The quantitative information regarding the relationship may include a direction indicating causality, a measure of the statistical uncertainty regarding the relationship (e.g., an Area Under the Curve (AUC) statistical measurement), and/or an expression of the quantitative magnitude of the strength of the relationship (e.g., a fold change). The various relationships in the generated cell model networks may be profiled using the quantitative relationship information to explore each hub of activity in the networks as a potential therapeutic target and/or biomarker. Such profiling can be done entirely in silico based on the results from the generated cell model networks, without resorting to any actual wet-lab experiments.
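Profiling hubs from the quantitative relationship information might look like the sketch below, which counts how many relationships each node takes part in after discarding relationships whose AUC falls under a threshold. Ranking hubs by connection count, the `Relationship` record, and the `min_auc` value are assumptions made for illustration; any relationship values used with it here are toy numbers.

```python
from collections import defaultdict
from typing import NamedTuple


class Relationship(NamedTuple):
    source: str   # causal parent; direction of causality is source -> target
    target: str
    auc: float    # statistical confidence in the relationship
    fold: float   # magnitude of the effect


def rank_hubs(relationships, min_auc=0.6):
    """Rank nodes by how many confident relationships they take part in.

    Returns (node, degree, out_degree) tuples sorted by degree, then out-degree,
    then node name; out-degree counts edges in which the node is the causal source.
    """
    degree = defaultdict(int)
    out_degree = defaultdict(int)
    for r in relationships:
        if r.auc < min_auc:
            continue  # skip low-confidence relationships
        degree[r.source] += 1
        degree[r.target] += 1
        out_degree[r.source] += 1
    return sorted(((n, degree[n], out_degree[n]) for n in degree),
                  key=lambda t: (-t[1], -t[2], t[0]))
```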

In some embodiments, a hub of activity in the networks may be validated by employing molecular and cellular techniques. Such post-informatic validation of output with wet-lab cell-based experiments need not be performed, but it may help to create a full circle of interrogation.

Figure 15 schematically depicts a simplified high-level representation of the functionality of an exemplary AI-based informatics system (e.g., REFS™ AI-based informatics system) and interactions between the AI-based system and other elements or portions of an interrogative biology platform ("the Platform"). In Figure 15A, various data sets obtained from a model for a biological process (e.g., a drug-induced toxicity model), such as drug dosage, treatment dosage, protein expression, mRNA expression, lipid levels, metabolite levels, kinase activity, and any of many other associated functional measures (such as OCR and ECAR), are fed into an AI-based system. As shown in Figure 15B, from the input data sets, the AI system creates a library of "network fragments" that includes variables (e.g., proteins, lipids, kinases, and metabolites) that drive molecular mechanisms in the biological process (e.g., drug-induced toxicity), in a process referred to as Bayesian Fragment Enumeration.

In Figure 15C, the AI-based system selects a subset of the network fragments in the library and constructs an initial trial network from the fragments. The AI-based system also selects a different subset of the network fragments in the library to construct another initial trial network. Eventually, an ensemble of initial trial networks is created (e.g., 1000 networks) from different subsets of network fragments in the library. This process may be termed parallel ensemble sampling. Each trial network in the ensemble is evolved or optimized by adding, subtracting, and/or substituting network fragments from the library. If additional data is obtained, it may be incorporated into the network fragments in the library and, through the evolution of each trial network, into the ensemble of trial networks. After completion of the optimization/evolution process, the ensemble of trial networks may be described as the generated cell model networks.
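The fragment-based construction and evolution of trial networks can be mimicked with a toy stochastic search. This is not the REFS™ algorithm: it assumes each network fragment is a set of directed edges, uses a caller-supplied `score` function in place of a Bayesian score, and accepts add, remove, or substitute moves that keep the network acyclic and do not lower the score. All function names and parameter values are hypothetical.

```python
import random


def is_acyclic(edges):
    """Kahn's algorithm: True if the directed edge set contains no cycle."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for s, t in edges:
        indegree[t] += 1
        children[s].append(t)
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        n = queue.pop()
        visited += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return visited == len(nodes)


def edges_of(chosen, fragments):
    """Union of the edges of the chosen fragments (indices into `fragments`)."""
    return set().union(*(fragments[i] for i in chosen))


def sample_ensemble(fragments, score, n_networks=10, start_size=2, n_steps=50, seed=0):
    """Build trial networks from random fragment subsets, then evolve each one."""
    rng = random.Random(seed)
    all_ids = list(range(len(fragments)))
    ensemble = []
    for _ in range(n_networks):
        # Initial trial network: fragments added in random order while staying acyclic.
        chosen = set()
        for i in rng.sample(all_ids, len(all_ids)):
            if len(chosen) == start_size:
                break
            if is_acyclic(edges_of(chosen | {i}, fragments)):
                chosen.add(i)
        current = score(edges_of(chosen, fragments))
        # Evolution: propose add / remove / substitute moves; keep non-worsening ones.
        for _ in range(n_steps):
            proposal = set(chosen)
            move = rng.choice(("add", "remove", "substitute"))
            unused = [i for i in all_ids if i not in proposal]
            if move in ("remove", "substitute") and proposal:
                proposal.discard(rng.choice(sorted(proposal)))
            if move in ("add", "substitute") and unused:
                proposal.add(rng.choice(unused))
            edges = edges_of(proposal, fragments)
            if is_acyclic(edges) and score(edges) >= current:
                chosen, current = proposal, score(edges)
        ensemble.append(edges_of(chosen, fragments))
    return ensemble
```

Because each network starts from a different random subset, the ensemble members can settle on different structures, which is what later allows disagreement between networks to be read as uncertainty.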

As shown in Figure 15D, the ensemble of generated cell model networks may be used to simulate the behavior of the biological system (e.g., drug-induced toxicity). The simulation may be used to predict the response of the biological system (e.g., drug-induced toxicity) to changes in conditions, which may be experimentally verified using wet-lab cell-based or animal-based experiments. Also, quantitative parameters of relationships in the generated cell model networks may be extracted using the simulation functionality by applying simulated perturbations to each node individually while observing the effects on the other nodes in the generated cell model networks. Further detail is provided below in section III.C.
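Extracting quantitative parameters by perturbing each node individually can be sketched with a linear causal model, an assumption made here for simplicity: each node's level is its baseline plus a weighted sum of its parents' levels. The sketch adds a fixed amount to one node at a time and records the change in every other node. The weights and baselines are hypothetical inputs, every node appearing in `weights` must also appear in `baseline`, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def simulate(weights, baseline, shift=None):
    """Propagate node levels through a linear causal DAG.

    weights: {(parent, child): coefficient}; baseline: {node: intrinsic level};
    shift: optional {node: amount added}, i.e. a simulated intervention.
    """
    shift = shift or {}
    parents = {n: [] for n in baseline}
    for (p, c), w in weights.items():
        parents[c].append((p, w))
    graph = {n: [p for p, _ in ps] for n, ps in parents.items()}
    values = {}
    for n in TopologicalSorter(graph).static_order():  # parents before children
        values[n] = baseline[n] + shift.get(n, 0.0) + sum(w * values[p] for p, w in parents[n])
    return values


def sensitivity_matrix(weights, baseline, delta=1.0):
    """Perturb each node individually and record the change in every other node."""
    base = simulate(weights, baseline)
    matrix = {}
    for node in baseline:
        perturbed = simulate(weights, baseline, {node: delta})
        matrix[node] = {m: perturbed[m] - base[m] for m in baseline if m != node}
    return matrix
```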

The automated reverse engineering process of the AI-based informatics system, which is depicted in Figures 2A-2D, creates an ensemble of generated cell model networks that is an unbiased and systematic computer-based model of the cells.

The reverse engineering determines the probabilistic directional network connections between the molecular measurements in the data, and the phenotypic outcomes of interest. The variation in the molecular measurements enables learning of the probabilistic cause and effect relationships between these entities and changes in endpoints. The machine learning nature of the platform also enables cross training and predictions based on a data set that is constantly evolving.

The network connections between the molecular measurements in the data are "probabilistic," partly because the connection may be based on correlations between the observed data sets "learned" by the computer algorithm. For example, if the expression levels of protein X and protein Y are positively or negatively correlated, based on statistical analysis of the data set, a causal relationship may be assigned to establish a network connection between proteins X and Y. The reliability of such a putative causal relationship may be further defined by a likelihood of the connection, which can be measured by a p-value (e.g., p < 0.1, 0.05, 0.01, etc.).

The network connections between the molecular measurements in the data are "directional," partly because the network connections, as determined by the reverse-engineering process, reflect the cause and effect of the relationship between the connected genes / proteins, such that raising the expression level of one protein may cause the expression level of the other to rise or fall, depending on whether the connection is stimulatory or inhibitory.

The network connections between the molecular measurements in the data are "quantitative," partly because the network connections between the molecular measurements, as determined by the process, may be simulated in silico, based on the existing data set and the probabilistic measures associated therewith. For example, in the established network connections between the molecular measurements, it may be possible to theoretically increase or decrease (e.g., by 1, 2, 3, 5, 10, 20, 30, 50, 100-fold or more) the expression level of a given protein (or a "node" in the network), and quantitatively simulate its effects on other connected proteins in the network.
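Taken together, the "directional" and "quantitative" properties can be illustrated by simulating a fold change of one node and propagating it downstream. The sketch assumes a log-linear model on a log2 scale, in which each edge coefficient is positive for a stimulatory connection and negative for an inhibitory one; the model form and the coefficients are assumptions, not values learned by the Platform.

```python
import math
from graphlib import TopologicalSorter


def simulate_fold_change(edges, node, fold):
    """Predict each node's fold change when `node` is set to `fold` times its level.

    edges: {(source, target): coefficient} on a log2 scale; positive coefficients are
    stimulatory, negative ones inhibitory. Returns {node: predicted fold change}.
    """
    predecessors = {}
    for s, t in edges:
        predecessors.setdefault(t, set()).add(s)
        predecessors.setdefault(s, set())
    predecessors.setdefault(node, set())
    log_change = {}
    for n in TopologicalSorter(predecessors).static_order():
        if n == node:
            log_change[n] = math.log2(fold)  # the intervened node is clamped
        else:
            log_change[n] = sum(edges[(p, n)] * log_change[p] for p in predecessors[n])
    return {n: 2 ** v for n, v in log_change.items()}
```

Nodes that are not downstream of the perturbed node come out with a fold change of 1, i.e. unaffected.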

The network connections between the molecular measurements in the data are "unbiased," at least partly because no data points are statistically or artificially cut-off, and partly because the network connections are based on input data alone, without referring to pre-existing knowledge about the biological process in question.

The network connections between the molecular measurements in the data are "systematic" (and unbiased), partly because all potential connections among all input variables have been systematically explored, for example, in a pairwise fashion. The computing power required to execute such systematic probing increases rapidly as the number of input variables increases. In general, an ensemble of ~1,000 networks is usually sufficient to predict probabilistic causal quantitative relationships among all of the measured entities. The ensemble of networks captures uncertainty in the data and enables the calculation of confidence metrics for each model prediction. Predictions are generated using the ensemble of networks together, and differences in the predictions from individual networks in the ensemble represent the degree of uncertainty in the prediction. This feature enables the assignment of confidence metrics for predictions of clinical response generated from the model.
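How an ensemble turns into a prediction with a confidence metric can be sketched as follows, assuming every network in the ensemble has already produced a prediction (here, a log2 fold change) for the same quantity. The mean serves as the ensemble prediction, the standard deviation reflects disagreement between networks, and the fraction of networks agreeing with the sign of the mean is one possible, assumed confidence metric.

```python
import statistics


def ensemble_prediction(predictions):
    """Summarize one prediction made by every network in an ensemble.

    predictions: per-network predicted log2 fold changes for the same quantity.
    Returns (mean prediction, population standard deviation, sign agreement fraction).
    """
    mean = statistics.fmean(predictions)
    spread = statistics.pstdev(predictions)
    # Fraction of networks predicting the same direction of change as the mean.
    agreement = sum((p > 0) == (mean > 0) for p in predictions) / len(predictions)
    return mean, spread, agreement
```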

Once the models are reverse-engineered, further simulation queries may be conducted on the ensemble of models to determine key molecular drivers for the biological process in question, such as a drug-induced toxicity condition.

A sketch of components employed to build exemplary in vitro models representing normal and diabetic states is depicted in Figure 9. A schematic representation of an exemplary informatics platform, REFS™, used to generate causal networks of proteins as they relate to disease pathophysiology is depicted in Figure 10. A schematic representation of an exemplary approach towards generation of a differential network in diabetic versus normal states, and of diabetic nodes that are restored to normal states by treatment with MIMS, is depicted in Figure 11. A representative differential network in diabetic versus normal states is depicted in Figure 12. A schematic representation of a node and associated edges of interest (Node1 in the center) and the cellular functionality associated with each edge is depicted in Figure 13.

The invention having been generally described above, the sections below provide more detailed description for various aspects or elements of the general invention, in conjunction with one or more specific biological systems (e.g., drug-induced toxicity) that can be analyzed using the methods herein. It should be noted, however, that the specific drug-induced toxicities used for illustration purposes below are not limiting. To the contrary, it is intended that other distinct drug-induced toxicities, including any alternatives, modifications, and equivalents thereof, may be analyzed similarly using the subject Platform technology.

II. Definitions

As used herein, certain terms that are intended to be specifically defined, but are not already defined in other sections of the specification, are defined below.

The articles "a" and "an" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element.

The term "including" is used herein to mean, and is used interchangeably with, the phrase "including but not limited to."

The term "or" is used herein to mean, and is used interchangeably with, the term "and/or," unless context clearly indicates otherwise.

The term "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."

"Metabolic pathway" refers to a sequence of enzyme-mediated reactions that transform one compound to another and provide intermediates and energy for cellular functions. The metabolic pathway can be linear or cyclic or branched.

"Metabolic state" refers to the molecular content of a particular cellular, multicellular or tissue environment at a given point in time as measured by various chemical and biological indicators as they relate to a state of health or disease.

The term "microarray" refers to an array of distinct polynucleotides, oligonucleotides, polypeptides (e.g., antibodies) or peptides synthesized on a substrate, such as paper, nylon or other type of membrane, filter, chip, glass slide, or any other suitable solid support.

The terms "disorders" and "diseases" are used inclusively and refer to any deviation from the normal structure or function of any part, organ or system of the body (or any combination thereof). A specific disease is manifested by characteristic symptoms and signs, including biological, chemical and physical changes, and is often associated with a variety of other factors including, but not limited to, demographic, environmental, employment, genetic and medically historical factors. Certain characteristic signs, symptoms, and related factors can be quantitated through a variety of methods to yield important diagnostic information.

The term "drug-induced toxicity" includes but is not limited to cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renaltoxicity or myotoxicity. The term "cardiotoxicity" refers to a broad range of adverse effects on heart function induced by therapeutic molecules. It may emerge early in pre-clinical studies or become apparent later in the clinical setting. Cardiovascular toxicity described herein includes, but is not limited to, any one or more of increased QT duration, arrhythmias, myocardial ischemia, hypertension and thromboembolic complications, myocardial dysfunction, cardiomyopathy, heart failure, atrial fibrillation, cardiomyopathy and heart failure, heart failure and LV dysfunction, atrial flutter and fibrillation, and heart valve damage and heart failure.

The term "expression" includes the process by which a polypeptide is produced from polynucleotides, such as DNA. The process may involve the transcription of a gene into mRNA and the translation of this mRNA into a polypeptide. Depending on the context in which it is used, "expression" may refer to the production of RNA, protein or both.

The terms "level of expression of a gene" or "gene expression level" refer to the level of mRNA, as well as pre-mRNA nascent transcript(s), transcript processing intermediates, mature mRNA(s) and degradation products, or the level of protein, encoded by the gene in the cell.

The term "modulation" refers to upregulation (i.e., activation or stimulation) or downregulation (i.e., inhibition or suppression) of a response, either alone or in combination. A "modulator" is a compound or molecule that modulates, and may be, e.g., an agonist, antagonist, activator, stimulator, suppressor, or inhibitor.

"Normal level" of a protein, a lipid, a transcript, a metabolite, or gene expression refers to the level of the protein, lipid, transcript, metabolite, or gene expression prior to contacting the cells with the potentially toxic drug. A "normal level" can be determined in cells grown under various conditions, e.g., hyperglycemia or hypoxia, when the toxicity of the drug is to be tested under the same conditions.

"Modulated level" refers to a changed value relative to the normal level which is based on historical normal control samples or preferably normal control samples tested in the same experiment. The specific "normal" value will depend, for example, on the type of assay (e.g., ELISA, enzyme activity, immunohistochemistry, PCR), the sample to be tested (e.g., cell type and culture conditions), and other considerations known to those of skill in the art. Control samples can be used to define cut-offs between normal and abnormal.
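One common way to turn parallel control samples into cut-offs, offered here only as an illustrative sketch, is to flag a marker level as abnormal when it falls outside the control mean plus or minus k sample standard deviations. The choice k = 2, the example values and the function names `control_cutoffs` and `classify` are assumptions, not values or methods stated in this document.

```python
from statistics import mean, stdev


def control_cutoffs(control_levels, k=2.0):
    """Return (lower, upper) cut-offs as control mean -/+ k sample standard deviations.

    k = 2.0 is an illustrative assumption, not a value specified in the text.
    """
    if len(control_levels) < 2:
        raise ValueError("need at least two control measurements")
    m = mean(control_levels)
    s = stdev(control_levels)  # sample (n - 1) standard deviation
    return m - k * s, m + k * s


def classify(level, cutoffs):
    """Label a marker level relative to the control-derived cut-offs."""
    lower, upper = cutoffs
    if level < lower:
        return "decreased"
    if level > upper:
        return "increased"
    return "normal"


# Hypothetical normalized marker levels from control wells run in parallel.
controls = [1.00, 1.05, 0.95, 1.02, 0.98]
cutoffs = control_cutoffs(controls)
print(cutoffs)
print(classify(1.40, cutoffs))  # -> increased
```

Cut-offs derived from controls tested in the same experiment, as the text prefers, absorb run-to-run variation that historical controls would not.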

A drug is considered to be toxic if treatment of cells with the drug results in a statistically significant change in the level of at least one marker relative to a "normal" or appropriate control level. It is understood that not all concentrations of a drug need to result in a statistically significant change in the level of the at least one marker. In a preferred embodiment, a drug is considered to potentially have toxicities if a therapeutically relevant concentration of the drug results in a statistically significant change in the level of at least one marker.
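The text does not name a statistical test. As one possible choice, shown here as a sketch, an exact two-sided permutation test on the difference of group means can compare marker levels in drug-treated cells with those in control cells. The replicate values, the 0.05 threshold and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(control, treated):
    """Exact two-sided permutation test on the difference of group means.

    Every way of relabelling the pooled measurements into groups of the
    original sizes is enumerated, so this is only practical for small n.
    """
    pooled = list(control) + list(treated)
    n_treated = len(treated)
    observed = abs(mean(treated) - mean(control))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        group_t = [pooled[i] for i in idx]
        group_c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance so ties with the observed statistic are counted.
        if abs(mean(group_t) - mean(group_c)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


def marker_changed(control, treated, alpha=0.05):
    """True if the marker shift is statistically significant at level alpha."""
    return permutation_p_value(control, treated) < alpha


# Hypothetical marker levels, four replicate wells per group.
control = [1.00, 1.10, 0.90, 1.05]
treated = [2.00, 2.10, 1.90, 2.20]
print(permutation_p_value(control, treated))  # 2/70, about 0.029
print(marker_changed(control, treated))
```

With only three replicates per group the smallest attainable p-value of this test is 2/20 = 0.1, so at least four replicates per group are needed to reach significance at 0.05.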

A "rescue agent" is considered to be effective in reducing toxicity if the level of the marker is modulated in a statistically significant manner towards the marker level in the "normal cells" when the rescue agent is present at a therapeutically relevant concentration. In a preferred embodiment, the rescue agent returns the marker to a level that is not statistically different from the level of the marker in the control cells.
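A simple effect-size summary consistent with this definition, offered as a hypothetical sketch, is the percentage of the drug-induced shift in a marker that the rescue agent reverses: 100% means the marker is back at the normal mean and 0% means no effect relative to drug alone. The function name `percent_rescue` and the example values are assumptions; the rescued and normal levels would still need to be compared with a statistical test, as the text requires.

```python
from statistics import mean


def percent_rescue(normal, drug_only, drug_plus_rescue):
    """Percentage of the drug-induced marker shift reversed by the rescue agent.

    100 means the marker is back at the normal mean, 0 means no change
    relative to drug alone, and negative values mean the shift worsened.
    Works for both drug-induced increases and decreases.
    """
    shift = mean(drug_only) - mean(normal)
    if shift == 0:
        raise ValueError("the drug produced no shift to rescue")
    return 100.0 * (mean(drug_only) - mean(drug_plus_rescue)) / shift


# Hypothetical marker levels (arbitrary units), triplicate wells.
normal = [1.0, 1.0, 1.0]
drug_only = [3.0, 3.0, 3.0]
drug_plus_rescue = [1.5, 1.5, 1.5]
print(percent_rescue(normal, drug_only, drug_plus_rescue))  # -> 75.0
```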

The term "control level" refers to an accepted or pre-determined level of a marker, or preferably the marker level determined in a control sample tested in parallel with the test sample, against which the level of the marker in the test sample is compared. The control sample is derived from cells not treated with the potentially toxic drug or rescue agent. A "control level" is obtained from cells that are cultured under the same conditions as the test cells, e.g., hypoxia, hyperglycemia, lactic acid, etc.

The term "Trolamine," as used herein, refers to Trolamine NF, Triethanolamine, TEALAN®, TEAlan 99%, Triethanolamine, 99%, Triethanolamine, NF or Triethanolamine, 99%, NF. These terms may be used interchangeably herein.

The term "genome" refers to the entirety of a biological entity's (cell, tissue, organ, system, organism) genetic information. It is encoded either in DNA or RNA (in certain viruses, for example). The genome includes both the genes and the non-coding sequences of the DNA.

The term "proteome" refers to the entire set of proteins expressed by a genome, a cell, a tissue, or an organism at a given time. More specifically, it may refer to the entire set of expressed proteins in a given type of cells or an organism at a given time under defined conditions. Proteome may include protein variants due to, for example, alternative splicing of genes and/or post-translational modifications (such as glycosylation or phosphorylation).

The term "transcriptome" refers to the entire set of transcribed RNA molecules, including mRNA, rRNA, tRNA, microRNA and other non-coding RNA produced in one or a population of cells at a given time. The term can be applied to the total set of transcripts in a given organism, or to the specific subset of transcripts present in a particular cell type. Unlike the genome, which is roughly fixed for a given cell line (excluding mutations), the transcriptome can vary with external environmental conditions. Because it includes all mRNA transcripts in the cell, the transcriptome reflects the genes that are being actively expressed at any given time, with the exception of mRNA degradation phenomena such as transcriptional attenuation.

The study of transcriptomics, also referred to as expression profiling, examines the expression level of mRNAs in a given cell population, often using high-throughput techniques based on DNA microarray technology.

The term "metabolome" refers to the complete set of small-molecule metabolites (such as metabolic intermediates, hormones and other signalling molecules, and secondary metabolites) to be found within a biological sample, such as a single organism, at a given time under a given condition. The metabolome is dynamic, and may change from second to second.

The term "lipidome" refers to the complete set of lipids to be found within a biological sample, such as a single organism, at a given time under a given condition. The lipidome is dynamic, and may change from second to second.

The term "interactome" refers to the whole set of molecular interactions in a biological system under study (e.g., cells). It can be displayed as a directed graph. Molecular interactions can occur between molecules belonging to different biochemical families (proteins, nucleic acids, lipids, carbohydrates, etc.) and also within a given family. In proteomics, the interactome refers to the protein-protein interaction (PPI) network, also called the protein interaction network (PIN). Another extensively studied type of interactome is the protein-DNA interactome, the network formed by transcription factors (and DNA or chromatin regulatory proteins) and their target genes.
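Because the interactome can be represented as a directed graph, a minimal adjacency-set structure is enough to store and query it. The sketch below is illustrative only: the class name, methods and node names (`TF_A`, `GENE_X`, etc.) are hypothetical, and an undirected protein-protein interaction is modelled, as an assumption, by a pair of opposing edges.

```python
class Interactome:
    """Directed graph of molecular interactions, stored as adjacency sets."""

    def __init__(self):
        self._targets = {}  # source molecule -> set of target molecules

    def add_interaction(self, source, target):
        """Record a directed edge source -> target (e.g., TF -> target gene)."""
        self._targets.setdefault(source, set()).add(target)
        self._targets.setdefault(target, set())  # make sure the target is a node

    def add_mutual_interaction(self, a, b):
        """Record an undirected interaction (e.g., protein-protein binding) as two edges."""
        self.add_interaction(a, b)
        self.add_interaction(b, a)

    def targets(self, node):
        return sorted(self._targets.get(node, ()))

    def regulators(self, node):
        return sorted(src for src, tgts in self._targets.items() if node in tgts)


# Hypothetical protein-DNA interactome: two transcription factors and their targets.
net = Interactome()
net.add_interaction("TF_A", "GENE_X")
net.add_interaction("TF_A", "GENE_Y")
net.add_interaction("TF_B", "GENE_Y")
print(net.targets("TF_A"))       # -> ['GENE_X', 'GENE_Y']
print(net.regulators("GENE_Y"))  # -> ['TF_A', 'TF_B']
```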

The term "cellular output" includes a collection of parameters, preferably measurable parameters, relating to cellular status, including (without limitation): level of transcription for one or more genes (e.g., measurable by RT-PCR, qPCR, microarray, etc.), level of expression for one or more proteins (e.g., measurable by mass spectrometry or Western blot), absolute activity (e.g., measurable as substrate conversion rates) or relative activity (e.g., measurable as a % value compared to maximum activity) of one or more enzymes or proteins, level of one or more metabolites or intermediates, level of oxidative phosphorylation (e.g., measurable by Oxygen Consumption Rate or OCR), level of glycolysis (e.g., measurable by Extracellular Acidification Rate or ECAR), extent of ligand-target binding or interaction, activity of extracellular secreted molecules, etc. The cellular output may include data for a predetermined number of target genes or proteins, etc., or may include a global assessment for all detectable genes or proteins. For example, mass spectrometry may be used to identify and/or quantitate all detectable proteins expressed in a given sample or cell population, without prior knowledge as to whether any specific protein may be expressed in the sample or cell population.

As used herein, a "cell system" includes a population of homogeneous or heterogeneous cells. The cells within the system may be growing in vivo, under the natural or physiological environment, or may be growing in vitro in, for example, controlled tissue culture environments. The cells within the system may be relatively homogeneous (e.g., no less than 70%, 80%, 90%, 95%, 99%, 99.5%, or 99.9% homogeneous), or may contain two or more cell types, such as cell types usually found to grow in close proximity in vivo, or cell types that may interact with one another in vivo through, e.g., paracrine or other long-distance intercellular communication. The cells within the cell system may be derived from established cell lines, including cancer cell lines, immortal cell lines, or normal cell lines, or may be primary cells or cells freshly isolated from live tissues or organs.

Cells in the cell system are typically in contact with a "cellular environment" that may provide nutrients, gases (oxygen or CO₂, etc.), chemicals, or proteinaceous / non-proteinaceous stimulants that may define the conditions that affect cellular behavior. The cellular environment may be a chemical medium with defined chemical components and/or less well-defined tissue extracts or serum components, and may include a specific pH, CO₂ content, pressure, and temperature under which the cells grow. Alternatively, the cellular environment may be the natural or physiological environment found in vivo for the specific cell system. In certain embodiments, a cell environment comprises conditions that simulate an aspect of a biological system or process, e.g., simulate a disease state, process, or environment. Such culture conditions include, for example, hyperglycemia, hypoxia, or lactic acid-rich conditions. Numerous other such conditions are described herein.

In certain embodiments, a cellular environment for a specific cell system also includes certain cell surface features of the cell system, such as the types of receptors or ligands on the cell surface and their respective activities, the structure of carbohydrate or lipid molecules, membrane polarity or fluidity, status of clustering of certain membrane proteins, etc. These cell surface features may affect the function of nearby cells, such as cells belonging to a different cell system. In certain other embodiments, however, the cellular environment of a cell system does not include cell surface features of the cell system.

The cellular environment may be altered to become a "modified cellular environment." Alterations may include changes (e.g., increase or decrease) in any one or more components found in the cellular environment, including addition of one or more "external stimulus components" to the cellular environment. The environmental perturbation or external stimulus component may be endogenous to the cellular environment (e.g., the cellular environment contains some levels of the stimulant, and more of the same is added to increase its level), or may be exogenous to the cellular environment (e.g., the stimulant is largely absent from the cellular environment prior to the alteration). The cellular environment may further be altered by secondary changes resulting from adding the external stimulus component, since the external stimulus component may change the cellular output of the cell system, including molecules secreted into the cellular environment by the cell system.

As used herein, an "external stimulus component", also referred to herein as an "environmental perturbation", includes any external physical and/or chemical stimulus that may affect cellular function. This may include any large or small organic or inorganic molecules, natural or synthetic chemicals, temperature shift, pH change, radiation, light (UVA, UVB, etc.), microwave, sonic wave, electrical current, modulated or unmodulated magnetic fields, etc.

The term "Multidimensional Intracellular Molecule (MIM)" refers to an isolated version or synthetically produced version of an endogenous molecule that is naturally produced by the body and/or is present in at least one cell of a human. A MIM is capable of entering a cell, and entry into the cell includes complete or partial entry, as long as the biologically active portion of the molecule wholly enters the cell. MIMs are capable of inducing a signal transduction and/or gene expression mechanism within a cell. MIMs are multidimensional because the molecules have both a therapeutic and a carrier (e.g., drug delivery) effect. MIMs also are multidimensional because the molecules act one way in a disease state and a different way in a normal state. For example, in the case of CoQ-10, administration of CoQ-10 to a melanoma cell in the presence of VEGF leads to a decreased level of Bcl2 which, in turn, leads to a decreased oncogenic potential for the melanoma cell. In contrast, in a normal fibroblast, co-administration of CoQ-10 and VEGF has no effect on the levels of Bcl2.

In one embodiment, a MIM is also an epi-shifter. In another embodiment, a MIM is not an epi-shifter. In another embodiment, a MIM is characterized by one or more of the foregoing functions. In another embodiment, a MIM is characterized by two or more of the foregoing functions. In a further embodiment, a MIM is characterized by three or more of the foregoing functions. In yet another embodiment, a MIM is characterized by all of the foregoing functions. The skilled artisan will appreciate that a MIM of the invention is also intended to encompass a mixture of two or more endogenous molecules, wherein the mixture is characterized by one or more of the foregoing functions. The endogenous molecules in the mixture are present at a ratio such that the mixture functions as a MIM.

MIMs can be lipid-based or non-lipid-based molecules. Examples of MIMs include, but are not limited to, CoQ10, acetyl CoA, palmityl CoA, L-carnitine, and amino acids such as, for example, tyrosine, phenylalanine, and cysteine. In one embodiment, the MIM is a small molecule. In one embodiment of the invention, the MIM is not CoQ10. MIMs can be routinely identified by one of skill in the art using any of the assays described in detail herein. MIMs are described in further detail in US 12/777,902 (US 2011-0110914), the entire contents of which are expressly incorporated herein by reference.

As used herein, an "epimetabolic shifter" (epi-shifter) is a molecule that modulates the metabolic shift from a healthy (or normal) state to a disease state and vice versa, thereby maintaining or reestablishing cellular, tissue, organ, system and/or host health in a human. Epi-shifters are capable of effectuating normalization in a tissue microenvironment. For example, an epi-shifter includes any molecule which is capable, when added to or depleted from a cell, of affecting the microenvironment (e.g., the metabolic state) of a cell. The skilled artisan will appreciate that an epi-shifter of the invention is also intended to encompass a mixture of two or more molecules, wherein the mixture is characterized by one or more of the foregoing functions. The molecules in the mixture are present at a ratio such that the mixture functions as an epi-shifter. Examples of epi-shifters include, but are not limited to, CoQ-10; vitamin D3; ECM components such as fibronectin; immunomodulators, such as TNFα or any of the interleukins, e.g., IL-5, IL-12, IL-23; angiogenic factors; and apoptotic factors.

In one embodiment, the epi-shifter also is a MIM. In one embodiment, the epi-shifter is not CoQ10. Epi-shifters can be routinely identified by one of skill in the art using any of the assays described in detail herein. Epi-shifters are described in further detail in US 12/777,902 (US 2011-0110914), the entire contents of which are expressly incorporated herein by reference.

Other terms not explicitly defined in the instant application have the meaning that would be understood by one of ordinary skill in the art.

III. Exemplary Steps and Components of the Platform Technology

For illustration purposes only, the steps of the subject Platform Technology are described herein below as applied to integrating data obtained from a custom built drug-induced toxicity model, and to identifying novel proteins / pathways driving the pathogenesis of drug-induced toxicity. Relational maps resulting from this analysis provide drug-induced toxicity treatment targets, as well as diagnostic / prognostic markers associated with drug-induced toxicity. However, the subject Platform Technology has general applicability for any drug-induced toxicity, and is not limited to any particular drug-induced toxicity or other specific drug-induced toxicity models.

In addition, although portions of the description below are presented as discrete steps, this is for illustration and simplicity only and does not imply a rigid order and/or demarcation of steps. Moreover, the steps of the invention may be performed separately, and the invention provided herein is intended to encompass each of the individual steps separately, as well as combinations of one or more (e.g., any one, two, three, four, five, six or all seven steps) steps of the subject Platform Technology, which may be carried out independently of the remaining steps.

The invention also is intended to include all aspects of the Drug-induced Toxicity Platform Technology as separate components and embodiments of the invention. For example, the generated data sets are intended to be embodiments of the invention. As further examples, the generated causal relationship networks, generated consensus causal relationship networks, and/or generated simulated causal relationship networks, are also intended to be embodiments of the invention. The causal relationships identified as being unique in the drug-induced toxicity system are intended to be embodiments of the invention. Further, the custom built models for a particular drug-induced toxicity system are also intended to be embodiments of the invention. For example, custom built models for a drug-induced toxicity state or process, such as, e.g., a custom built model for toxicity (e.g., cardiotoxicity) of a drug, are also intended to be embodiments of the invention.

A. Custom Model Building

The first step in the Platform Technology is the establishment of a model for a drug-induced toxicity system or process. An example of a drug-induced toxicity system or process is cardiotoxicity. Like any other complicated biological process or system, cardiotoxicity is a complicated pathological condition characterized by multiple unique aspects. For example, chronic imbalance in uptake, utilization, organellar biogenesis and secretion in non-adipose tissue (heart and liver) is thought to be at the center of mitochondrial damage and dysfunction and a key player in drug-induced cardiotoxicity. To this end, a custom cardiotoxicity model comprising diabetic and normal cardiomyocytes may be established to simulate the environment of cardiotoxicity, e.g., by creating cell culture conditions closely approximating the conditions of a cardiac cell experiencing cardiotoxicity. One or more relevant types of cells may be used in the model, such as, for example, cardiomyocytes, diabetic cardiomyocytes, hepatocytes, kidney cells, neural cells, renal cells, or myoblasts.

One such "environment", or growth stress condition, is hypoxia, a condition typically found in a number of disease states and in late stage diabetes or in cardiovascular disease due to ischemia and poor circulation. Hypoxia can be induced in cells using art-recognized methods. For example, hypoxia can be induced by placing cell systems in a Modular Incubator Chamber (MIC-101, Billups-Rothenberg Inc., Del Mar, CA), which can be flooded with an industrial gas mix containing 5% CO₂, 2% O₂ and 93% nitrogen. Effects can be measured after a pre-determined period, e.g., at 24 hours after hypoxia treatment, with and without additional external stimulus components (e.g., CoQ10 at 0, 50, or 100 μM).

Likewise, lactic acid treatment of cells mimics a cellular environment where glycolysis activity is high. Lactic acid induced stress can be investigated at a final lactic acid concentration of about 12.5 mM at a pre-determined time, e.g., at 24 hours, with or without additional external stimulus components (e.g., CoQ10 at 0, 50, or 100 μM).

Hyperglycemia is normally a condition found in diabetes. As high glucose is known to alter cellular metabolism, agents for the treatment of diabetes can be tested in cells cultured under hyperglycemic conditions. Exposing subject cells to a typical hyperglycemic condition may include adding 10% culture grade glucose to suitable media, such that the final concentration of glucose in the media is about 22 mM.
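The volume of concentrated glucose stock needed to reach about 22 mM follows from a mass balance, (V_medium·C_base + V_add·C_stock) / (V_medium + V_add) = C_target. The sketch below assumes, since the text does not say, that "10%" means 10% w/v (100 g/L), that the basal medium already contains 5.5 mM glucose, and a glucose molar mass of 180.16 g/mol; the function names are hypothetical.

```python
GLUCOSE_MOLAR_MASS = 180.16  # g/mol


def stock_concentration_mM(percent_w_v, molar_mass):
    """Convert a % w/v stock (grams per 100 mL) into mM."""
    grams_per_litre = percent_w_v * 10.0
    return grams_per_litre / molar_mass * 1000.0


def volume_to_add_mL(medium_mL, base_mM, target_mM, stock_mM):
    """Volume of stock to add so that the mixture reaches target_mM.

    Solves (medium*base + v*stock) / (medium + v) = target for v.
    """
    if not (base_mM <= target_mM < stock_mM):
        raise ValueError("need base <= target < stock")
    return medium_mL * (target_mM - base_mM) / (stock_mM - target_mM)


stock = stock_concentration_mM(10.0, GLUCOSE_MOLAR_MASS)  # about 555 mM
v = volume_to_add_mL(100.0, base_mM=5.5, target_mM=22.0, stock_mM=stock)
print(round(stock, 1), round(v, 2))  # -> 555.1 3.1
```

For a glucose-free basal medium, `base_mM=0.0` gives the corresponding volume.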

Because subjects with type 2 diabetes are frequently overweight or obese, they are often also treated for other diseases or conditions with other agents, e.g., arthritis with anti-inflammatory agents, or cardiovascular disease with cholesterol-lowering, blood-pressure-lowering, or blood-thinning agents. Thus, custom built models can be used to compare drug toxicity in normal subjects with drug toxicity in subjects who are treated for a first condition with a first agent and who also have other diseases or conditions. For example, cells exposed and not exposed to hyperglycemic conditions can be tested side by side to detect differential toxicities of agents in subjects with or without diabetes.

Hyperlipidemia is a condition found, for example, in obesity and cardiovascular disease. Hyperlipidemia is also a condition which mimics one aspect of cardiotoxicity. The hyperlipidemic conditions can be provided by culturing cells in media containing 0.15 mM sodium palmitate.
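The amount of sodium palmitate needed for a given volume of 0.15 mM medium is a one-line calculation. The sketch below uses a molar mass of about 278.41 g/mol for sodium palmitate (C16H31NaO2); the function name is hypothetical, and the sketch ignores how the palmitate is solubilized in the medium.

```python
SODIUM_PALMITATE_MOLAR_MASS = 278.41  # g/mol, C16H31NaO2


def mass_needed_mg(conc_mM, volume_mL, molar_mass):
    """Mass of solute (mg) for volume_mL of medium at conc_mM.

    mM * mL gives micromoles; micromoles * g/mol gives micrograms.
    """
    micromoles = conc_mM * volume_mL
    return micromoles * molar_mass / 1000.0


print(round(mass_needed_mg(0.15, 1000.0, SODIUM_PALMITATE_MOLAR_MASS), 1))  # -> 41.8 (mg per litre)
```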

Individual conditions reflecting different aspects of toxicity may be investigated separately in the custom built toxicity model, and/or may be combined together. In one embodiment, combinations of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50 or more conditions reflecting or simulating different aspects of toxicity conditions are investigated in the custom built toxicity model. In one embodiment, individual conditions and, in addition, combinations of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50 or more of the conditions reflecting or simulating different aspects of toxicity conditions are investigated in the custom built toxicity model. All values presented in the foregoing list can also be the upper or lower limit of ranges that are intended to be a part of this invention, e.g., between 1 and 5, 1 and 10, 1 and 20, 1 and 30, 2 and 5, 2 and 10, 5 and 10, 5 and 20, 10 and 20, 10 and 25, 10 and 30 or 10 and 50 different conditions.

Listed herein below are a few exemplary combinations of conditions that can be used to treat cells for building drug-induced toxicity models. Other combinations can be readily formulated depending on the specific interrogative biological assessment that is being conducted.

1. Media only

2. 50 μM CTL Coenzyme Q10 (CoQ10)

3. 100 μM CTL Coenzyme Q10

4. 12.5 mM Lactic Acid

5. 12.5 mM Lactic Acid + 50 μM CTL Coenzyme Q10

6. 12.5 mM Lactic Acid + 100 μM CTL Coenzyme Q10

7. Hypoxia

8. Hypoxia + 50 μM CTL Coenzyme Q10

9. Hypoxia + 100 μM CTL Coenzyme Q10

10. Hypoxia + 12.5 mM Lactic Acid

11. Hypoxia + 12.5 mM Lactic Acid + 50 μM CTL Coenzyme Q10

12. Hypoxia + 12.5 mM Lactic Acid + 100 μM CTL Coenzyme Q10

13. Media + 22 mM Glucose

14. 50 μM CTL Coenzyme Q10 + 22 mM Glucose

15. 100 μM CTL Coenzyme Q10 + 22 mM Glucose

16. 12.5 mM Lactic Acid + 22 mM Glucose

17. 12.5 mM Lactic Acid + 22 mM Glucose + 50 μM CTL Coenzyme Q10

18. 12.5 mM Lactic Acid + 22 mM Glucose + 100 μM CTL Coenzyme Q10

19. Hypoxia + 22 mM Glucose

20. Hypoxia + 22 mM Glucose + 50 μM CTL Coenzyme Q10

21. Hypoxia + 22 mM Glucose + 100 μM CTL Coenzyme Q10

22. Hypoxia + 12.5 mM Lactic Acid + 22 mM Glucose

23. Hypoxia + 12.5 mM Lactic Acid + 22 mM Glucose + 50 μM CTL Coenzyme Q10

24. Hypoxia + 12.5 mM Lactic Acid + 22 mM Glucose + 100 μM CTL Coenzyme Q10
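The 24 combinations listed above form a full factorial design: glucose (absent or 22 mM) × hypoxia (absent or present) × lactic acid (absent or 12.5 mM) × CoQ10 (0, 50 or 100 μM), that is 2 × 2 × 2 × 3 = 24. The short sketch below, with hypothetical variable names, generates such a list in the same order as the numbered items, which may be convenient when further factors are added to a custom model. The labels are built in one consistent order rather than copied verbatim.

```python
from itertools import product

# Levels of each factor; None means the factor is absent.
GLUCOSE = (None, "22 mM Glucose")
HYPOXIA = (None, "Hypoxia")
LACTIC_ACID = (None, "12.5 mM Lactic Acid")
COQ10 = (None, "50 μM CTL Coenzyme Q10", "100 μM CTL Coenzyme Q10")


def build_conditions():
    """Enumerate every factor combination; glucose varies slowest, CoQ10 fastest."""
    conditions = []
    for glucose, hypoxia, lactic, coq10 in product(GLUCOSE, HYPOXIA, LACTIC_ACID, COQ10):
        parts = [p for p in (hypoxia, lactic, glucose, coq10) if p]
        conditions.append(" + ".join(parts) if parts else "Media only")
    return conditions


for number, label in enumerate(build_conditions(), start=1):
    print(f"{number}. {label}")
```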

As a control, one or more cell lines (e.g., cardiomyocytes, diabetic cardiomyocytes, hepatocytes, kidney cells, neural cells, renal cells, or myoblasts) are cultured under control conditions in order to identify toxicity-unique proteins or pathways (see below). The control may be the comparison cell model described above.

Multiple cells of the same or different origin (for example, cardiomyocytes, diabetic cardiomyocytes, hepatocytes, kidney cells, neural cells, renal cells, or myoblasts), as opposed to a single cell type, may be included in the toxicity model. In certain situations, cross talk or ECS experiments between different cells (cardiomyocytes, diabetic cardiomyocytes, hepatocytes, kidney cells, neural cells, renal cells, or myoblasts) may be conducted for several inter-related purposes.

In some embodiments that involve cross talk, experiments conducted on the cell models are designed to determine modulation of the cellular state or function of one cell system or population (e.g., cardiomyocytes) by another cell system or population (e.g., diabetic cardiomyocytes) under defined treatment conditions (e.g., hyperglycemia, hypoxia (ischemia)). In a typical setting, a first cell system / population is contacted by an external stimulus component, such as a candidate molecule (e.g., a small drug molecule, a protein) or a candidate condition (e.g., hypoxia, high glucose environment). In response, the first cell system / population changes its transcriptome, proteome, metabolome, and/or interactome, leading to changes that can be readily detected both inside and outside the cell. For example, changes in the transcriptome can be measured by the transcription level of a plurality of target mRNAs; changes in the proteome can be measured by the expression level of a plurality of target proteins; and changes in the metabolome can be measured by the level of a plurality of target metabolites using assays designed specifically for the given metabolites. Alternatively, the above-referenced changes in the metabolome and/or proteome, at least with respect to certain secreted metabolites or proteins, can also be measured by their effects on the second cell system / population, including the modulation of the transcriptome, proteome, metabolome, and interactome of the second cell system / population. Therefore, the experiments can be used to identify the effects of the molecule(s) of interest secreted by the first cell system / population on a second cell system / population under different treatment conditions. The experiments can also be used to identify any proteins that are modulated as a result of signaling from the first cell system (in response to the external stimulus component treatment) to another cell system, by, for example, differential proteomic screening.
The same experimental setting can also be adapted for a reverse setting, so that reciprocal effects between the two cell systems can also be assessed. In general, for this type of experiment, the choice of cell line pairs is largely based on factors such as origin, toxicity state and cellular function.
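The differential screening step in these cross-talk experiments can be sketched, under simplifying assumptions, as a log2 fold-change filter comparing protein levels in the second cell system exposed to medium from the stimulus-treated first cell system versus medium from the untreated first cell system. The 2-fold (|log2FC| ≥ 1) threshold, the single value per protein and the protein names are hypothetical; a real screen would use replicates and a statistical test.

```python
import math


def differential_hits(baseline, conditioned, min_abs_log2fc=1.0):
    """Proteins whose level changes by at least min_abs_log2fc (log2 units).

    baseline / conditioned map protein name -> abundance in the second cell
    system exposed to medium from the untreated / treated first cell system.
    Proteins missing from either map, or with non-positive values, are skipped.
    """
    hits = {}
    for protein, base in baseline.items():
        cond = conditioned.get(protein)
        if cond is None or base <= 0 or cond <= 0:
            continue
        log2fc = math.log2(cond / base)
        if abs(log2fc) >= min_abs_log2fc:
            hits[protein] = log2fc
    # Largest absolute changes first.
    return dict(sorted(hits.items(), key=lambda item: -abs(item[1])))


baseline = {"P1": 100.0, "P2": 50.0, "P3": 80.0}
conditioned = {"P1": 400.0, "P2": 55.0, "P3": 20.0}
print(differential_hits(baseline, conditioned))  # -> {'P1': 2.0, 'P3': -2.0}
```

Running the same function with the roles of the two cell systems swapped covers the reverse setting.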

Although two-cell systems are typically involved in this type of experimental setting, similar experiments can also be designed for more than two cell systems by, for example, immobilizing each distinct cell system on a separate solid support.

Once the custom model is built, one or more "perturbations" may be applied to the system, such as genetic variation from patient to patient, or with / without treatment by certain drugs or pro-drugs. See Figure 15D. The effects of such perturbations to the system, including the effect on cells related to drug-induced toxicity, and normal control cells, can be measured using various art-recognized or proprietary means, as described in section III.B below.

In an exemplary experiment, cardiomyocytes are conditioned under hyperglycemic and hyperlipidemic conditions, with or without an environmental perturbation, specifically treatment with an antidiabetic drug known to induce cardiotoxicity and/or the potential rescue agent Coenzyme Q10.

The custom built cell model may be established and used throughout the steps of the Platform Technology of the invention to ultimately identify a causal relationship unique in the drug-induced toxicity system, by carrying out the steps described herein. It will be understood by the skilled artisan, however, that a custom built cell model that is used to generate an initial, "first generation" consensus causal relationship network for a drug-induced toxicity can continually evolve or expand over time, e.g., by the introduction of additional drug-induced toxicity related cell lines and/or additional drug-induced toxicity related conditions. Additional data from the evolved cell model, i.e., data from the newly added portion(s) of the cell model, can be collected. The new data collected from an expanded or evolved cell model, i.e., from newly added portion(s) of the cell model, can then be introduced to the data sets previously used to generate the "first generation" consensus causal relationship network in order to generate a more robust "second generation" consensus causal relationship network. New causal relationships unique to the drug-induced toxicity can then be identified from the "second generation" consensus causal relationship network. In this way, the evolution of the cell model provides an evolution of the consensus causal relationship networks, thereby providing new and/or more reliable insights into the modulators of the drug-induced toxicity.

Custom models can also be designed to assess toxicity of drugs used in combination. For example, therapeutic agents for the treatment of a number of conditions, including cancer, autoimmune disease, or HIV, are typically administered as combinations (cocktails) of agents. Further, many subjects have multiple, unrelated conditions to be treated simultaneously (e.g., diabetes, arthritis, cardiovascular disease). Models can be built, either in normal cells or in cells subjected to various culture conditions, to identify combinations of agents that may result in toxicities when administered simultaneously. Thus, the methods provided include testing combinations of agents (e.g., 2, 3, 4, 5, 6, 7, 8 or more) together to determine whether the combination results in drug-related toxicities, including combinations of agents that do not result in toxicities alone.
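Screening for combinations that are toxic only together can be organized as a loop over subsets of a panel of agents. In this sketch the assay itself is represented by a hypothetical callback `is_toxic`, standing in for whatever marker-based toxicity call is used; combinations containing an agent that is already toxic on its own are skipped, so only combination-specific toxicities are reported. The agent names and assay results are invented for illustration.

```python
from itertools import combinations


def combination_specific_toxicities(agents, is_toxic, max_size=3):
    """Return combinations flagged toxic although no member is toxic alone.

    is_toxic(combo) is a hypothetical callback that runs (or looks up) the
    toxicity assay for a tuple of agents and returns True or False.
    """
    toxic_alone = {a for a in agents if is_toxic((a,))}
    flagged = []
    for size in range(2, max_size + 1):
        for combo in combinations(agents, size):
            if toxic_alone.intersection(combo):
                continue  # toxicity already explained by a single agent
            if is_toxic(combo):
                flagged.append(combo)
    return flagged


# Hypothetical assay results: only drug_A together with drug_B is toxic.
TOXIC_SETS = {frozenset({"drug_A", "drug_B"})}


def fake_assay(combo):
    return frozenset(combo) in TOXIC_SETS


print(combination_specific_toxicities(["drug_A", "drug_B", "drug_C"], fake_assay))
# -> [('drug_A', 'drug_B')]
```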

Models can also be built for "personalized medicine" applications, in which the specific combination of drugs being administered or considered for administration can be tested using the methods provided herein to determine whether the combination is likely to have unacceptable toxicities. Such combinations can be tested in various cell types (e.g., cardiac cells, kidney cells, nerve cells, muscle cells, liver cells; either cell lines or primary cells cultured from the subject) grown under various conditions to mimic the subject of interest (e.g., grown in high glucose for a subject with diabetes or under hypoxia for a subject with ischemia).

Additional examples of custom built cell models are described in detail herein.

B. Data Collection

In general, two types of data may be collected from any custom built model systems. One type of data (e.g., the first set of data, the third set of data) usually relates to the level of certain macromolecules, such as DNA, RNA, protein, lipid, etc. An exemplary data set in this category is proteomic data (e.g., qualitative and quantitative data concerning the expression of all or substantially all measurable proteins from a sample). The other type of data is generally functional data (e.g., the second set of data, the fourth set of data) that reflects the phenotypic changes resulting from the changes in the first type of data. Functional activity or cellular response of the cells can include any one or more of bioenergetics, cell proliferation, apoptosis, organellar function, a genotype-phenotype association actualized by functional models selected from ATP, ROS, OXPHOS, and Seahorse assays, global enzyme activity (e.g., global kinase activity), and an effect of global enzyme activity on the enzyme metabolic substrates of cells associated with drug-induced toxicity (e.g., phosphoproteomic data).

With respect to the first type of data, in some exemplary embodiments, quantitative polymerase chain reaction (qPCR) and proteomics are performed to profile changes in cellular mRNA and protein expression, respectively. Total RNA can be isolated using a commercial RNA isolation kit. Following cDNA synthesis, commercially available qPCR arrays (e.g., those from SA Biosciences) for disease areas or cellular processes such as angiogenesis, apoptosis, and diabetes may be employed to profile a predetermined set of genes by following the manufacturer's instructions. For example, the Bio-Rad CFX384 amplification system can be used for all transcriptional profiling experiments. Following data collection (Ct values), the final fold change over control can be determined using the ΔΔCt method as outlined in the manufacturer's protocol. Proteomic sample analysis can be performed as described in subsequent sections.
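Fold change over control from Ct values is commonly computed with the comparative (2^-ΔΔCt) method. The sketch below is a minimal illustration, assuming roughly 100% amplification efficiency for both the target gene and a single reference (housekeeping) gene; the function name and the triplicate Ct values are hypothetical, and the array manufacturer's protocol remains the authority for the exact procedure.

```python
from statistics import mean


def fold_change_ddct(target_treated, ref_treated, target_control, ref_control):
    """Fold change over control by the comparative 2^-ΔΔCt method.

    Each argument is a list of replicate Ct values. Assumes ~100% PCR
    efficiency for both the target gene and the reference gene.
    """
    # ΔCt: normalize the target gene to the reference gene within each condition.
    d_ct_treated = mean(target_treated) - mean(ref_treated)
    d_ct_control = mean(target_control) - mean(ref_control)
    # ΔΔCt: compare the normalized treated value with the normalized control value.
    dd_ct = d_ct_treated - d_ct_control
    # A lower Ct means more template, so a negative ΔΔCt corresponds to up-regulation.
    return 2.0 ** (-dd_ct)


# Hypothetical triplicate Ct values for one gene on a profiling array.
fc = fold_change_ddct([24.0, 24.1, 23.9], [18.0, 18.1, 17.9],
                      [26.0, 26.1, 25.9], [18.0, 18.0, 18.0])
print(round(fc, 2))  # 4.0
```

Here the treated sample reaches threshold two cycles earlier than the control after normalization, which corresponds to a 2^2 = 4-fold increase.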

The subject method may employ large-scale, high-throughput quantitative proteomic analysis of hundreds of samples of similar character, which provides the data necessary for identifying the cellular output differentials.

There are numerous art-recognized technologies suitable for this purpose. An exemplary technique, iTRAQ analysis in combination with mass spectrometry, is briefly described below.

The quantitative proteomics approach is based on stable isotope labeling with the 8-plex iTRAQ reagent and 2D-LC MALDI MS/MS for peptide identification and quantification. Quantification with this technique is relative: peptides and proteins are assigned abundance ratios relative to a reference sample. Including a common reference sample in multiple iTRAQ experiments facilitates the comparison of samples across those experiments. For example, to implement this analysis scheme, six primary samples and two control pool samples can be combined into one 8-plex iTRAQ mix according to the manufacturer's suggestions. This mixture of eight samples then can be fractionated by two-dimensional liquid chromatography (strong cation exchange (SCX) in the first dimension and reversed-phase HPLC in the second dimension) and then subjected to mass spectrometric analysis.
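The role of the shared pooled reference can be sketched as follows: within each 8-plex run, every sample channel is expressed as a log2 ratio to the pooled-control channels, and because every run carries the same pool, those ratios can be subtracted across runs. This is an illustrative sketch only; the channel layout (pool in channels 113 and 121), the function names, and the intensities are hypothetical, and isotope-impurity correction is not modeled.

```python
import math

# Reporter-ion channels (nominal m/z) of the iTRAQ 8-plex reagent set.
CHANNELS = (113, 114, 115, 116, 117, 118, 119, 121)
# Hypothetical layout: two pooled-control channels; the other six carry primary samples.
POOL_CHANNELS = (113, 121)


def log2_ratios_to_pool(intensities):
    """Convert one protein's reporter intensities {channel: value} into
    log2(sample / pooled reference) for each of the six sample channels."""
    missing = [c for c in CHANNELS if c not in intensities]
    if missing:
        raise ValueError(f"missing reporter channels: {missing}")
    pool = sum(intensities[c] for c in POOL_CHANNELS) / len(POOL_CHANNELS)
    return {c: math.log2(intensities[c] / pool)
            for c in CHANNELS if c not in POOL_CHANNELS}


def cross_run_log2_ratio(run_a, channel_a, run_b, channel_b):
    """Compare samples from two separate 8-plex runs through the shared pooled reference."""
    return log2_ratios_to_pool(run_a)[channel_a] - log2_ratios_to_pool(run_b)[channel_b]


# Hypothetical reporter intensities for one protein in two runs.
run1 = {c: 1000.0 for c in CHANNELS}
run1[114] = 2000.0
run2 = {c: 500.0 for c in CHANNELS}
print(cross_run_log2_ratio(run1, 114, run2, 114))  # 1.0
```

Although the raw intensities in the two runs differ twofold overall, the pool-referenced comparison reports only the twofold difference specific to the sample in channel 114.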

A brief overview of exemplary laboratory procedures that can be employed is provided herein.

Protein extraction: Cells can be lysed with 8 M urea lysis buffer containing protease inhibitors (Thermo Scientific Halt Protease Inhibitor, EDTA-free) and incubated on ice for 30 minutes, with vortexing for 5 seconds every 10 minutes. Lysis can be completed by ultrasonication in 5-second pulses. Cell lysates can be centrifuged at 14,000 x g for 15 minutes (4 °C) to remove cellular debris. A Bradford assay can be performed to determine the protein concentration. 100 μg of protein from each sample can be reduced (10 mM dithiothreitol (DTT), 55 °C, 1 h), alkylated (25 mM iodoacetamide, room temperature, 30 minutes), and digested with trypsin (1:25 w/w, 200 mM triethylammonium bicarbonate (TEAB), 37 °C, 16 h).
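The per-sample amounts in the digestion step follow from simple arithmetic on the Bradford result: the lysate volume that holds 100 μg of protein, and the trypsin mass for a 1:25 (w/w) enzyme-to-protein ratio. The sketch below is only a worked illustration; the function name and the 2.5 μg/μL concentration are hypothetical.

```python
def digest_amounts(protein_ug_per_ul, protein_ug=100.0, trypsin_ratio=25.0):
    """Return (µL of lysate to take, µg of trypsin to add) for one sample.

    protein_ug_per_ul: protein concentration from the Bradford assay.
    trypsin_ratio: protein-to-trypsin mass ratio (25 for a 1:25 w/w digest).
    """
    lysate_ul = protein_ug / protein_ug_per_ul  # volume holding the target protein mass
    trypsin_ug = protein_ug / trypsin_ratio     # 1:25 (w/w) trypsin:protein
    return lysate_ul, trypsin_ug


# Hypothetical Bradford reading of 2.5 µg/µL.
print(digest_amounts(2.5))  # (40.0, 4.0)
```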

Secretome sample preparation: In one embodiment, the cells can be cultured in serum-free medium. Conditioned media can be concentrated by freeze-drying, reduced (10 mM dithiothreitol (DTT), 55 °C, 1 h), alkylated (25 mM iodoacetamide, room temperature, 30 minutes), and then desalted by acetone precipitation. Equal amounts of protein from the concentrated conditioned media can be digested with trypsin (1:25 w/w, 200 mM triethylammonium bicarbonate (TEAB), 37 °C, 16 h).

In another embodiment, the cells can be cultured in serum-containing medium. The volume of the medium can be reduced using 3k MWCO Vivaspin columns (GE Healthcare Life Sciences), and the concentrate can then be reconstituted with 1x PBS (Invitrogen). Serum albumin can be depleted from all samples using an AlbuVoid column (Biotech Support Group, LLC) following the manufacturer's instructions, with a modified buffer-exchange step optimized for conditioned medium.

iTRAQ 8-plex labeling: Aliquots of each tryptic digest in each experimental set can be pooled together to create the pooled control sample. Equal aliquots from each sample and from the pooled control sample can be labeled with iTRAQ 8-plex reagents according to the manufacturer's protocols (AB Sciex). The reactions can be combined, dried under vacuum, resuspended in 0.1% formic acid, and analyzed by LC-MS/MS.

2D-NanoLC-MS/MS: All labeled peptide mixtures can be separated by online 2D-nanoLC and analyzed by electrospray tandem mass spectrometry. The experiments can be carried out on an Eksigent 2D NanoLC Ultra system connected to an LTQ Orbitrap Velos mass spectrometer equipped with a nanoelectrospray ion source (Thermo Electron, Bremen, Germany).

The peptide mixtures can be injected onto a 5 cm SCX column (300 μm ID, 5 μm, PolySULFOETHYL Aspartamide column from PolyLC, Columbia, MD) at a flow rate of 4 μL/min and eluted in 10 ion exchange elution segments onto a C18 trap column (2.5 cm, 100 μm ID, 5 μm, 300 Å ProteoPep II from New Objective, Woburn, MA), then washed for 5 min with H2O/0.1% FA. The separation then can be further carried out at 300 nL/min using a gradient of 2-45% B for 120 minutes on a 15 cm fused silica column (75 μm ID, 5 μm, 300 Å ProteoPep II from New Objective, Woburn, MA), where solvent A is H2O/0.1% FA and solvent B is ACN/0.1% FA.

Full scan MS spectra (m/z 300-2000) can be acquired in the Orbitrap at a resolution of 30,000. The most intense ions (up to 10) can be sequentially isolated for fragmentation using higher-energy C-trap dissociation (HCD) and then dynamically excluded for 30 seconds. HCD can be conducted with an isolation width of 1.2 Da. The resulting fragment ions can be scanned in the Orbitrap at a resolution of 7,500. The LTQ Orbitrap Velos can be controlled by Xcalibur 2.1 with Foundation 1.0.1.
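The data-dependent selection rule described above (up to 10 most intense precursors per survey scan, each excluded for 30 seconds after selection) can be illustrated with a toy selector. This is a simplified sketch of the selection logic only, not instrument firmware or Xcalibur behavior; the 10 ppm exclusion mass window, the class name, and the example peaks are assumptions, and charge-state screening, the 1.2 Da isolation window, and HCD fragmentation are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class TopNSelector:
    """Toy model of data-dependent precursor selection with dynamic exclusion."""
    top_n: int = 10                 # up to 10 most intense ions per survey scan
    exclusion_s: float = 30.0       # dynamic exclusion duration in seconds
    tolerance_ppm: float = 10.0     # assumed m/z window for "same precursor"
    excluded: list = field(default_factory=list)  # (m/z, expiry time) pairs

    def _is_excluded(self, mz):
        return any(abs(mz - ex_mz) / ex_mz * 1e6 <= self.tolerance_ppm
                   for ex_mz, _ in self.excluded)

    def select(self, survey_scan, t):
        """Pick precursors from a survey scan of (m/z, intensity) at time t (s)."""
        # Forget exclusions whose window has elapsed.
        self.excluded = [(mz, exp) for mz, exp in self.excluded if exp > t]
        picked = []
        for mz, _ in sorted(survey_scan, key=lambda peak: peak[1], reverse=True):
            if len(picked) == self.top_n:
                break
            if not self._is_excluded(mz):
                picked.append(mz)
                self.excluded.append((mz, t + self.exclusion_s))
        return picked


sel = TopNSelector(top_n=2)
scan = [(500.0, 1e6), (600.0, 5e5), (700.0, 1e5)]
print(sel.select(scan, 0.0))   # [500.0, 600.0]
print(sel.select(scan, 10.0))  # [700.0]
```

The second call shows why dynamic exclusion matters: the two abundant ions are skipped while excluded, so the instrument spends its MS/MS time on the less intense precursor instead of re-sequencing the same peptides.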

Peptide and protein identification and quantification: Peptides and proteins can be identified by automated database searching using Proteome Discoverer software (Thermo Electron) with the Mascot search engine against the SwissProt database. Search parameters can include a 10 ppm MS tolerance, a 0.02 Da MS2 tolerance, and full trypsin digestion allowing up to 2 missed cleavages. Carbamidomethylation (C) can be set as a fixed modification. Oxidation (M), TMT6, and deamidation (NQ) can be set as dynamic modifications. Peptide and protein identifications can be filtered with the Mascot significance threshold (p<0.05). The filters can allow a 99% confidence level of protein identification (1% FDR).
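The two search tolerances behave differently: the precursor (MS) tolerance is relative, in parts per million, while the fragment (MS2) tolerance is an absolute window. The search engine applies these internally; the sketch below only illustrates the arithmetic, and the function names are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_matches(observed_mz, theoretical_mz, tol_ppm=10.0):
    # Relative tolerance: 10 ppm is 0.01 m/z units at m/z 1000 but 0.005 at m/z 500.
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def fragment_matches(observed_mz, theoretical_mz, tol_da=0.02):
    # Absolute tolerance: +/- 0.02 regardless of m/z.
    return abs(observed_mz - theoretical_mz) <= tol_da


print(precursor_matches(1000.009, 1000.0), precursor_matches(500.006, 500.0))  # True False
```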

The Proteome Discoverer software can apply correction factors to the reporter ions, and can reject all quantitation values if not all quantitation channels are present. Relative protein quantitation can be achieved by normalization to the mean intensity.

With respect to the second type of data, in some exemplary embodiments, bioenergetics profiling of cancer and normal models may employ the Seahorse™ XF24 analyzer to characterize the glycolysis and oxidative phosphorylation components.

Specifically, cells can be plated on Seahorse culture plates at optimal densities. The cells can be plated in 100 μL of media or treatment and left in a 37 °C incubator with 5% CO₂. Two hours later, when the cells have adhered to the 24-well plate, an additional 150 μL of either media or treatment solution can be added, and the plates can be left in the culture incubator overnight. This two-step seeding procedure allows for even distribution of cells in the culture plate. Seahorse cartridges that contain the oxygen and pH sensors can be hydrated overnight in the calibrating fluid in a non-CO₂ incubator at 37 °C. Three mitochondrial drugs are typically loaded into three ports in the cartridge: oligomycin, an ATP synthase (complex V) inhibitor; FCCP, an uncoupler; and rotenone, a complex I inhibitor, can be loaded into ports A, B and C, respectively. All stock drugs can be prepared at 10x concentration in unbuffered DMEM media. The cartridges can first be incubated with the mitochondrial compounds in a non-CO₂ incubator for about 15 minutes prior to the assay. Seahorse culture plates can be washed in DMEM-based unbuffered media that contains glucose at the concentration found in the normal growth media. The cells can be layered with 630 μL of the unbuffered media and equilibrated in a non-CO₂ incubator before the plate is placed in the Seahorse instrument with a precalibrated cartridge. The instrument can be run for three to four loops of a mix, wait and measure cycle to establish a baseline before injection of drugs through the ports is initiated. There can be two loops before the next drug is introduced.

OCR (oxygen consumption rate) and ECAR (extracellular acidification rate) can be recorded by the sensors in a 7 μL chamber that is created when the cartridge presses against the Seahorse culture plate.
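The OCR trace from the oligomycin, FCCP and rotenone injection sequence is commonly reduced to a handful of respiration parameters. The sketch below follows the conventional mitochondrial stress test arithmetic; it assumes that rotenone alone (port C) defines non-mitochondrial respiration (many protocols combine rotenone with antimycin A), that the last baseline reading represents basal respiration, and that minimum/maximum readings within each phase are used. The class, function and OCR values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MitoStressParameters:
    basal: float
    atp_linked: float
    proton_leak: float
    maximal: float
    spare_capacity: float
    non_mitochondrial: float


def mito_stress_parameters(baseline, post_oligomycin, post_fccp, post_rotenone):
    """Derive respiration parameters from OCR readings (e.g. pmol O2/min) per phase."""
    non_mito = min(post_rotenone)        # respiration remaining after complex I block
    last_baseline = baseline[-1]         # last reading before the first injection
    oligo_min = min(post_oligomycin)     # respiration not coupled to ATP synthase
    basal = last_baseline - non_mito
    maximal = max(post_fccp) - non_mito  # uncoupled, maximal electron transport
    return MitoStressParameters(
        basal=basal,
        atp_linked=last_baseline - oligo_min,
        proton_leak=oligo_min - non_mito,
        maximal=maximal,
        spare_capacity=maximal - basal,
        non_mitochondrial=non_mito,
    )


# Hypothetical OCR readings: three baseline loops, then two loops after each injection.
params = mito_stress_parameters([100.0, 102.0, 100.0], [40.0, 38.0], [180.0, 190.0], [20.0, 22.0])
print(params.basal, params.spare_capacity)  # 80.0 90.0
```

A drug that erodes spare respiratory capacity (maximal minus basal) without changing basal OCR is one example of a functional change that this data type can capture even when basal readings look normal.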

C. Data Integration and in silico Model Generation

Once relevant data sets have been obtained, integration of the data sets and generation of computer-implemented statistical models may be performed using an AI-based informatics system or platform (e.g., the REFS™ platform). For example, an exemplary AI-based system may produce simulation-based networks of protein associations as key drivers of metabolic end points (ECAR/OCR). See Figure 15. Some background details regarding the REFS™ system may be found in Xing et al., "Causal Modeling Using Network Ensemble Simulations of Genetic and Gene Expression Data Predicts Genes Involved in Rheumatoid Arthritis," PLoS Computational Biology, vol. 7, issue 3, 1-19 (March 2011) (el00105) and U.S. Patent 7,512,497 to Periwal, the entire contents of each of which are expressly incorporated herein by reference. In essence, as described earlier, the REFS™ system is an AI-based system that employs mathematical algorithms to establish causal relationships among the input variables (e.g., protein expression levels, mRNA expression levels, and the corresponding functional data, such as the OCR/ECAR values measured on Seahorse culture plates). This process is based on the input data alone, without taking into consideration prior knowledge about any potential, established, and/or verified biological relationships.

In particular, a significant advantage of the platform of the invention is that the AI-based system is based on the data sets obtained from the cell model, without resorting to or taking into consideration any existing knowledge in the art concerning the biological process. Further, preferably, no data points are statistically or artificially cut off; instead, all obtained data are fed into the AI-based system for determining protein associations. Accordingly, the resulting statistical models generated from the platform are unbiased, since they do not take into consideration any known biological relationships.

Specifically, data from the proteomics and ECAR/OCR measurements can be input into the AI-based informatics system, which builds statistical models based on data associations, as described above. Simulation-based networks of protein associations are then derived for each disease versus normal scenario, including treatments and conditions, using the following methods.

A detailed description of an exemplary process for building the generated (e.g., optimized or evolved) networks appears below with respect to Figure 16. As described above, data from the proteomics and functional cell data are input into the AI-based system (step 210). The input data, which may be raw data or minimally processed data, are pre-processed, which may include normalization (e.g., using a quantile function or internal standards) (step 212). The pre-processing may also include imputing missing data values (e.g., by using the K-nearest neighbor (K-NN) algorithm) (step 212). The pre-processed data are used to construct a network fragment library (step 214). The network fragments define quantitative, continuous relationships among all possible small sets (e.g., 2-3 member sets or 2-4 member sets) of measured variables (input data). The relationships between the variables in a fragment may be linear, logistic, multinomial, dominant or recessive homozygous, etc. The relationship in each fragment is assigned a Bayesian probabilistic score that reflects how likely the candidate relationship is given the input data, and that also penalizes the relationship for its mathematical complexity. By scoring all of the possible pairwise and three-way relationships (and in some embodiments also four-way relationships) inferred from the input data, the most likely fragments in the library can be identified (the likely fragments). Quantitative parameters of each relationship are also computed based on the input data and stored for each fragment. Various model types may be used in fragment enumeration, including but not limited to linear regression, logistic regression, analysis of variance (ANOVA) models, analysis of covariance (ANCOVA) models, nonlinear/polynomial regression models, and even non-parametric regression. The prior assumptions on model parameters may assume Gull distributions or Bayesian Information Criterion (BIC) penalties related to the number of parameters used in the model. In a network inference process, each network in an ensemble of initial trial networks is constructed from a subset of fragments in the fragment library. Each initial trial network in the ensemble is constructed with a different subset of the fragments from the fragment library (step 216).
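The two pre-processing operations named for step 212, K-NN imputation and quantile normalization, can be sketched in a few lines. This is not the platform's implementation; it is a minimal illustration under stated assumptions: rows are features (e.g., proteins) and columns are samples, None marks a missing value, neighbor distance is the root-mean-square difference over jointly observed columns, and K defaults to 3. Because this quantile sketch needs a complete matrix, the usage example imputes before normalizing; that ordering is an assumption.

```python
import math


def knn_impute(rows, k=3):
    """Fill missing values (None) with the mean of the k nearest neighbouring rows.

    rows: list of feature vectors (e.g. one row per protein, one column per sample).
    Distance is the root-mean-square difference over columns observed in both rows.
    """
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            neighbours = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    neighbours.append((dist, other[j]))
            neighbours.sort(key=lambda pair: pair[0])
            nearest = neighbours[:k]
            if nearest:  # leave the gap if no usable neighbour exists
                out[i][j] = sum(v for _, v in nearest) / len(nearest)
    return out


def quantile_normalize(rows):
    """Give every column (sample) the same distribution of values."""
    n_rows, n_cols = len(rows), len(rows[0])
    cols = [[rows[i][j] for i in range(n_rows)] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in cols]
    # Reference distribution: mean across samples of the r-th smallest value.
    reference = [sum(sc[r] for sc in sorted_cols) / n_cols for r in range(n_rows)]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(cols):
        # Replace each value by the reference value of the same rank (ties by position).
        for rank, i in enumerate(sorted(range(n_rows), key=lambda idx: col[idx])):
            out[i][j] = reference[rank]
    return out


raw = [[1.0, 2.0, 3.0], [1.1, 2.1, None], [5.0, 6.0, 7.0]]
clean = quantile_normalize(knn_impute(raw, k=1))
print(clean)
```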

An overview of the mathematical representations underlying the Bayesian networks and network fragments, which is based on Xing et al., "Causal Modeling Using Network Ensemble Simulations of Genetic and Gene Expression Data Predicts Genes Involved in Rheumatoid Arthritis," PLoS Computational Biology, vol. 7, issue 3, 1-19 (March 2011) (el00105), is presented below.

A multivariate system with random variables $X = X_1, \ldots, X_n$ may be characterized by a multivariate probability distribution function $P(X_1, \ldots, X_n; \Theta)$ that includes a large number of parameters $\Theta$. The multivariate probability distribution function may be factorized and represented by a product of local conditional probability distributions:

$$P(X_1, \ldots, X_n; \Theta) = \prod_{i=1}^{n} P_i\left(X_i \mid Y_{j_1}, \ldots, Y_{j_{K_i}}; \Theta_i\right)$$

in which each variable $X_i$ is independent of its non-descendant variables given its $K_i$ parent variables, $Y_{j_1}, \ldots, Y_{j_{K_i}}$. After factorization, each local probability distribution has its own parameters $\Theta_i$.

The multivariate probability distribution function may be factorized in different ways, with each particular factorization and its corresponding parameters being a distinct probabilistic model. Each particular factorization (model) can be represented by a directed acyclic graph (DAG) having a vertex for each variable $X_i$ and directed edges between vertices representing dependences between variables in the local conditional distributions $P_i(X_i \mid Y_{j_1}, \ldots, Y_{j_{K_i}})$. Subgraphs of a DAG, each including a vertex and its associated directed edges, are network fragments.

A model is evolved or optimized by determining the most likely factorization and the most likely parameters given the input data. This may be described as "learning a Bayesian network," or, in other words, given a training set of input data, finding a network that best matches the input data. This is accomplished by using a scoring function that evaluates each network with respect to the input data.

A Bayesian framework is used to determine the likelihood of a factorization given the input data. Bayes' law states that the posterior probability, $P(M \mid D)$, of a model $M$ given data $D$ is proportional to the probability of the data given the model assumptions, $P(D \mid M)$, multiplied by the prior probability of the model, $P(M)$, assuming that the probability of the data, $P(D)$, is constant across models. This is expressed in the following equation:

$$P(M \mid D) = \frac{P(D \mid M)\,P(M)}{P(D)} \propto P(D \mid M)\,P(M)$$

The probability of the data assuming the model (the marginal likelihood) is the integral of the data likelihood over the prior distribution of parameters:

$$P(D \mid M) = \int P(D \mid M, \Theta)\,P(\Theta \mid M)\,d\Theta$$

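Assuming all models are equally likely (i.e., that $P(M)$ is a constant), the posterior probability of model $M$ given the data $D$ may be factored into the product of integrals over parameters for each local network fragment $M_i$, where $D_i$ denotes the data for the variables in fragment $M_i$, as follows:

$$P(M \mid D) \propto \prod_{i} \int P(D_i \mid M_i, \Theta_i)\,P(\Theta_i \mid M_i)\,d\Theta_i$$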

Note that a leading constant term has been omitted from the equation above. In some embodiments, a Bayesian Information Criterion (BIC), which takes the negative logarithm of the posterior probability of the model, may be used to score each model as follows (up to the omitted constant):

$$S_{\text{tot}}(M) = -\log P(M \mid D) = \sum_{i} S(M_i)$$

where the total score $S_{\text{tot}}$ for a model $M$ is a sum of the local scores $S(M_i)$ for each local network fragment. The BIC further gives an expression for determining a score for each individual network fragment:

$$S(M_i) \approx S_{\text{BIC}}(M_i) = S_{\text{MLE}}(M_i) + \frac{\kappa(M_i)}{2}\log N$$

where $\kappa(M_i)$ is the number of fitting parameters in fragment $M_i$, and $N$ is the number of samples (data points). $S_{\text{MLE}}(M_i)$ is the negative logarithm of the likelihood function for a network fragment, which may be calculated from the functional relationships used for each network fragment. For a BIC score, the lower the score, the more likely it is that a model fits the input data.

The ensemble of trial networks is globally optimized, which may be described as optimizing or evolving the networks (step 218). For example, the trial networks may be evolved and optimized according to a Metropolis Monte Carlo sampling algorithm. Simulated annealing may be used to optimize or evolve each trial network in the ensemble through local transformations. In an example simulated annealing process, each trial network is changed by adding a network fragment from the library, by deleting a network fragment from the trial network, by substituting a network fragment, or by otherwise changing the network topology, and a new score for the network is then calculated. Generally speaking, if the score improves, the change is kept, and if the score worsens, the change is rejected. A "temperature" parameter allows some local changes that worsen the score to be kept, which helps the optimization process avoid some local minima. The "temperature" parameter is decreased over time to allow the optimization/evolution process to converge. All or part of the network inference process may be conducted in parallel for the different trial networks. Each network may be optimized in parallel on a separate processor and/or on a separate computing device. In some embodiments, the optimization process may be conducted on a supercomputer incorporating hundreds to thousands of processors that operate in parallel. Information may be shared among the optimization processes conducted on parallel processors.
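A single trial network's Metropolis simulated annealing can be sketched as follows. The sketch assumes a simplified representation in which each child variable holds exactly one candidate parent-set fragment, so that adding, deleting, and substituting fragments all become "swap this child's fragment" (deletion corresponds to choosing the empty parent set). Proposals that would create a directed cycle are rejected, and the geometric cooling schedule, step count, and candidate scores are hypothetical. An ensemble, as described above, would correspond to running this with different seeds or starting fragments, possibly in parallel.

```python
import math
import random


def is_acyclic(parents_of):
    """Return True if the child -> parents mapping contains no directed cycle."""
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "visiting":
            return False  # back edge: a cycle
        if state.get(node) == "done":
            return True
        state[node] = "visiting"
        ok = all(visit(p) for p in parents_of.get(node, ()))
        state[node] = "done"
        return ok

    return all(visit(node) for node in parents_of)


def anneal(candidates, steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Metropolis simulated annealing over one parent-set fragment per child.

    candidates: {child: [(local_score, parents), ...]} with lower scores better;
    index 0 of each list is used as the starting (e.g. empty) parent set.
    """
    rng = random.Random(seed)
    children = sorted(candidates)
    choice = {c: 0 for c in children}

    def parents_map(ch):
        return {c: candidates[c][i][1] for c, i in ch.items()}

    total = sum(candidates[c][0][0] for c in children)
    best, best_total = dict(choice), total
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        child = rng.choice(children)
        new_idx = rng.randrange(len(candidates[child]))
        proposal = dict(choice)
        proposal[child] = new_idx
        if not is_acyclic(parents_map(proposal)):
            continue  # reject moves that would break the DAG property
        delta = candidates[child][new_idx][0] - candidates[child][choice[child]][0]
        # Accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            choice, total = proposal, total + delta
            if total < best_total:
                best, best_total = dict(choice), total
    return parents_map(best), best_total


# Hypothetical local (BIC-like) scores for each child's candidate parent sets.
CANDIDATES = {
    "A": [(10.0, ()), (2.0, ("B",))],
    "B": [(10.0, ()), (1.0, ("A",))],
    "C": [(10.0, ()), (3.0, ("A",)), (4.0, ("B",))],
}
network, score = anneal(CANDIDATES)
print(network, score)
```

In this toy case the individually best fragments for A and B (B drives A, and A drives B) would form a cycle, so the acyclicity check forces a trade-off and the search settles on the best valid combination instead.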

The optimization process may include a network filter that drops from the ensemble any networks that fail to meet a threshold standard for overall score. A dropped network may be replaced by a new initial network. Further, any networks that are not "scale free" may be dropped from the ensemble. After the ensemble of networks has been optimized or evolved, the result may be termed an ensemble of generated cell model networks, which may be collectively referred to as the generated consensus network.

D. Simulation to Extract Quantitative Relationship Information and for Prediction

Simulation may be used to extract quantitative parameter information regarding each relationship in the generated cell model networks (step 220). For example, the simulation for quantitative information extraction may involve perturbing (increasing or decreasing) each node in the network 10-fold and calculating the posterior distributions for the other nodes (e.g., proteins) in the models. The endpoints are compared by t-test with the assumption of 100 samples per group and a 0.01 significance cut-off. The t-test statistic is the median of 100 t-tests. Through use of this simulation technique, an AUC (area under the curve) representing the strength of prediction and a fold change representing the in silico magnitude of a node driving an end point are generated for each relationship in the ensemble of networks.
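The perturbation statistics described above can be illustrated on a single edge. In the platform the downstream values come from posterior distributions of the learned network ensemble; in this sketch a linear-Gaussian child with known, hypothetical coefficients stands in for that posterior. The parent is scaled 10-fold, 100 samples per group are simulated, Welch's t statistic is taken as the median over repeated simulations, AUC is computed as the probability that a perturbed value exceeds a baseline value, and the p-value uses a normal approximation to the t distribution (reasonable at roughly 198 degrees of freedom). All function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, median, variance


def welch_t(a, b):
    """Welch's two-sample t statistic for mean(b) - mean(a)."""
    return (mean(b) - mean(a)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5


def auc(a, b):
    """Probability that a random value from b exceeds one from a (ties count 1/2)."""
    wins = sum((y > x) + 0.5 * (y == x) for x in a for y in b)
    return wins / (len(a) * len(b))


def perturb_edge(intercept, slope, noise_sd, parent_mu, parent_sd,
                 factor=10.0, n=100, repeats=100, alpha=0.01, seed=0):
    """Simulate a `factor`-fold perturbation of a parent node on a linear-Gaussian child."""
    rng = random.Random(seed)

    def child(parent_values):
        return [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in parent_values]

    t_stats, aucs, folds = [], [], []
    for _ in range(repeats):
        parent = [rng.gauss(parent_mu, parent_sd) for _ in range(n)]
        baseline = child(parent)
        perturbed = child([factor * x for x in parent])
        t_stats.append(welch_t(baseline, perturbed))
        aucs.append(auc(baseline, perturbed))
        folds.append(mean(perturbed) / mean(baseline))
    t_med = median(t_stats)  # summary statistic: median of the repeated t-tests
    # Two-sided p-value via a normal approximation (about 2n - 2 degrees of freedom).
    p = 2 * (1 - NormalDist().cdf(abs(t_med)))
    return {"t": t_med, "p": p, "significant": p < alpha,
            "auc": median(aucs), "fold_change": median(folds)}


# Hypothetical edge: child = 1 + 2 * parent + noise, parent centred on 1.
result = perturb_edge(intercept=1.0, slope=2.0, noise_sd=0.5, parent_mu=1.0, parent_sd=0.1)
print(result["significant"], round(result["fold_change"], 1))
```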

A relationship quantification module of a local computer system may be employed to direct the AI-based system to perform the perturbations and to extract the AUC information and fold change information. The extracted quantitative information may include the fold change and AUC for each edge connecting a parent node to a child node. In some embodiments, a custom-built R program may be used to extract the quantitative information. In some embodiments, the ensemble of generated cell model networks can be used through simulation to predict responses to changes in conditions, which may later be verified through wet-lab cell-based or animal-based experiments.

The output of the AI-based system may be quantitative relationship parameters and/or other simulation predictions (step 222).

E. Generation of Differential (Delta) Networks

A differential network creation module may be used to generate differential (delta) networks between generated cell model networks and generated comparison cell model networks. As described above, in some embodiments, the differential network compares all of the quantitative parameters of the relationships in the generated cell model networks and the generated comparison cell model networks. The quantitative parameters for each relationship in the differential network are based on the comparison. In some embodiments, a differential may be performed between various differential networks, which may be termed a delta-delta network. An example of a delta-delta network is described below with respect to Figure 18 in the Examples section. The differential network creation module may be a program or script written in Perl.
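The edge-wise comparison behind delta and delta-delta networks can be sketched as a difference of per-edge quantities. The module described above may be written in Perl; this Python sketch is only illustrative. It assumes that each network is summarized as a mapping from (parent, child) edges to a log2 fold change, so that an edge missing from one network can be treated as 0 (no effect); the function names and edge values are hypothetical.

```python
def delta_network(case, control):
    """Edge-wise difference of a quantitative edge attribute between two networks.

    case, control: {(parent, child): value}, e.g. log2 fold change per edge.
    An edge missing from one network is treated as 0 there (no effect on the log scale).
    """
    return {edge: case.get(edge, 0.0) - control.get(edge, 0.0)
            for edge in set(case) | set(control)}


def unique_edges(case, control):
    """Edges present in the case network but absent from the comparison network."""
    return set(case) - set(control)


# Hypothetical edges: (parent, child) -> log2 fold change.
toxic = {("P1", "P2"): 1.5, ("P2", "OCR"): 0.8}
normal = {("P1", "P2"): 0.5}
delta = delta_network(toxic, normal)
# A delta-delta network is a delta computed between two delta networks.
delta_delta = delta_network(delta, {("P1", "P2"): 1.0})
print(sorted(unique_edges(toxic, normal)))  # [('P2', 'OCR')]
```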

F. Visualization of Networks

The relationship values for the ensemble of networks and for the differential networks may be visualized using a network visualization program (e.g., Cytoscape, an open source platform for complex network analysis and visualization from the Cytoscape Consortium). In the visual depictions of the networks, the thickness of each edge (e.g., each line connecting the proteins) represents the strength of the fold change. The edges are also directional, indicating causality, and each edge has an associated prediction confidence level.

G. Exemplary Computer System

Figure 17 schematically depicts an exemplary computer system/environment that may be employed in some embodiments for communicating with the AI-based informatics system, for generating differential networks, for visualizing networks, for saving and storing data, and/or for interacting with a user. As explained above, calculations for an AI-based informatics system may be performed on a separate supercomputer with hundreds or thousands of parallel processors that interacts, directly or indirectly, with the exemplary computer system. The environment includes a computing device 100 with associated peripheral devices. Computing device 100 is programmable to implement executable code 150 for performing various methods, or portions of methods, taught herein. Computing device 100 includes a storage device 116, such as a hard drive, CD-ROM, or other non-transitory computer readable media. Storage device 116 may store an operating system 118 and other related software.

Computing device 100 may further include memory 106. Memory 106 may comprise a computer system memory or random access memory, such as DRAM, SRAM, EDO RAM, etc. Memory 106 may comprise other types of memory as well, or combinations thereof. Computing device 100 may store, in storage device 116 and/or memory 106, instructions for implementing and processing each portion of the executable code 150.

The executable code 150 may include code for communicating with the AI-based informatics system 190, for generating differential networks (e.g., a differential network creation module), for extracting quantitative relationship information from the AI-based informatics system (e.g., a relationship quantification module), and for visualizing networks (e.g., Cytoscape).

In some embodiments, the computing device 100 may communicate directly or indirectly with the AI-based informatics system 190 (e.g., a system for executing REFS). For example, the computing device 100 may communicate with the AI-based informatics system 190 by transferring data files (e.g., data frames) to the AI-based informatics system 190 through a network. Further, the computing device 100 may have executable code 150 that provides an interface and instructions to the AI-based informatics system 190.

In some embodiments, the computing device 100 may communicate directly or indirectly with one or more experimental systems 180 that provide data for the input data set. Experimental systems 180 for generating data may include systems for mass spectrometry-based proteomics, microarray gene expression, qPCR gene expression, mass spectrometry-based metabolomics, mass spectrometry-based lipidomics, SNP microarrays, a panel of functional assays, and other in vitro biology platforms and technologies.

Computing device 100 also includes processor 102, and may include one or more additional processor(s) 102', for executing software stored in the memory 106 and other programs for controlling system hardware, peripheral devices and/or peripheral hardware. Processor 102 and processor(s) 102' each can be a single core processor or multiple core (104 and 104') processor. Virtualization may be employed in computing device 100 so that infrastructure and resources in the computing device can be shared dynamically. Virtualized processors may also be used with executable code 150 and other software in storage device 116. A virtual machine 114 may be provided to handle a process running on multiple processors so that the process appears to be using only one computing resource rather than multiple. Multiple virtual machines can also be used with one processor.

A user may interact with computing device 100 through a visual display device 122, such as a computer monitor, which may display a user interface 124 or any other interface. The user interface 124 of the display device 122 may be used to display raw data, visual representations of networks, etc. The visual display device 122 may also display other aspects or elements of exemplary embodiments (e.g., an icon for storage device 116). Computing device 100 may include other I/O devices, such as a keyboard or a multi-point touch interface (e.g., a touchscreen) 108 and a pointing device 110 (e.g., a mouse, trackball and/or trackpad), for receiving input from a user. The keyboard 108 and the pointing device 110 may be connected to the visual display device 122 and/or to the computing device 100 via a wired and/or a wireless connection.

Computing device 100 may include a network interface 112 to interface with a network device 126 via a Local Area Network (LAN), Wide Area Network (WAN) or the Internet through a variety of connections including, but not limited to, standard telephone lines, LAN or WAN links (e.g., 802.11, T1, T3, 56kb, X.25), broadband connections (e.g., ISDN, Frame Relay, ATM), wireless connections, controller area network (CAN), or some combination of any or all of the above. The network interface 112 may comprise a built-in network adapter, network interface card, PCMCIA network card, card bus network adapter, wireless network adapter, USB network adapter, modem or any other device suitable for enabling computing device 100 to interface with any type of network capable of communication and performing the operations described herein.

Moreover, computing device 100 may be any computer system such as a workstation, desktop computer, server, laptop, handheld computer or other form of computing or telecommunications device that is capable of communication and that has sufficient processor power and memory capacity to perform the operations described herein.

Computing device 100 can be running any operating system 118 such as any of the versions of the MICROSOFT WINDOWS operating systems, the different releases of the Unix and Linux operating systems, any version of the MACOS for Macintosh computers, any embedded operating system, any real-time operating system, any open source operating system, any proprietary operating system, any operating systems for mobile computing devices, or any other operating system capable of running on the computing device and performing the operations described herein. The operating system may be running in native mode or emulated mode.

IV. Models for Drug-induced Toxicity and Uses Therefor

A. Establishing a Model for Drug-induced Toxicity

Virtually all drug-induced toxicity involves complicated interactions among different cell types and/or organ systems. Perturbation of critical functions in one cell type or organ may lead to secondary effects on other interacting cell types and organs, and such downstream changes may in turn feed back to the initial changes and cause further complications. Therefore, it is beneficial to dissect a given drug-induced toxicity into its components, such as interactions between pairs of cell types or organs, and to systematically probe the interactions between these components in order to gain a more complete, global view of the drug-induced toxicity process.

Accordingly, the present invention provides cell models for drug-induced toxicity. To this end, Applicants have built cell models for an exemplary drug-induced toxicity (e.g., cardiotoxicity) that have been employed in the subject discovery Platform Technology. Applicants have conducted experiments with the cell models using the subject discovery Platform Technology to generate consensus causal relationship networks, including causal relationships unique to the drug-induced toxicity, and thereby identify "modulators" or critical molecular "drivers" important for the particular drug-induced toxicity.

One significant advantage of the Platform Technology and its components, e.g., the custom built cell models and data sets obtained from the drug-induced toxicity cell models, is that an initial, "first generation" consensus causal relationship network generated for a drug-induced toxicity can continually evolve or expand over time, e.g., by the introduction of additional cell lines/types and/or additional conditions.

Additional data from the evolved cell model, i.e., data from the newly added portion(s) of the cell model, can be collected. The new data collected from an expanded or evolved cell model can then be introduced to the data sets previously used to generate the "first generation" consensus causal relationship network in order to generate a more robust "second generation" consensus causal relationship network. New causal relationships unique to the drug-induced toxicity can then be identified from the "second generation" consensus causal relationship network. In this way, the evolution of the drug-induced toxicity cell model provides an evolution of the consensus causal relationship networks, thereby providing new and/or more reliable insights into the modulators of the drug-induced toxicity. Thus, the drug-induced toxicity cell models, the data sets from the cell models, and the causal relationship networks generated from the drug-induced toxicity cell models using the Platform Technology methods can constantly evolve and build upon previous knowledge obtained from the Platform Technology.

Accordingly, the invention provides consensus causal relationship networks generated from the drug-induced toxicity cell models employed in the Platform Technology. These consensus causal relationship networks may be first generation consensus causal relationship networks, or may be multiple generation consensus causal relationship networks, e.g., 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th, 10th, 11th, 12th, 13th, 14th, 15th, 16th, 17th, 18th, 19th, 20th or greater generation consensus causal relationship networks. Further, the invention provides simulated consensus causal relationship networks generated from the drug-induced toxicity cell models employed in the Platform Technology. These simulated consensus causal relationship networks may be first generation simulated consensus causal relationship networks, or may be multiple generation simulated consensus causal relationship networks, e.g., 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th, 10th, 11th, 12th, 13th, 14th, 15th, 16th, 17th, 18th, 19th, 20th or greater generation simulated consensus causal relationship networks. The invention further provides delta networks and delta-delta networks generated from any of the consensus causal relationship networks of the invention.

A custom built cell model for a drug-induced toxicity comprises one or more cells associated with the drug-induced toxicity. The model for a drug-induced toxicity may be established to simulate an environment of the drug-induced toxicity, e.g., environment of drug-induced cardiotoxicity in vivo, by creating conditions (e.g., cell culture conditions) that mimic a characteristic aspect of the drug-induced toxicity.

Multiple cells of the same or different origin, as opposed to a single cell type, may be included in the cell model. In one embodiment, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 50 or more different cell lines or cell types are included in the drug-induced toxicity cell model. In one embodiment, the cells are all of the same type, e.g., all cardiomyocytes, but are different established cell lines, e.g., different established cell lines of cardiomyocytes. All values presented in the foregoing list can also be the upper or lower limit of ranges, that are intended to be a part of this invention, e.g., between 1 and 5, 1 and 10, 2 and 5, or 5 and 15 different cell lines or cell types.

Examples of cell types that may be included in the cell models of the invention include, without limitation, human cells, animal cells, mammalian cells, plant cells, yeast, bacteria, or fungi. In one embodiment, cells of the cell model can include diseased cells, such as cancer cells or bacterially or virally infected cells. In one embodiment, cells of the cell model can include drug-induced toxicity associated cells, such as cells involved in a diabetes, obesity or cardiovascular drug-induced toxicity state, e.g., aortic smooth muscle cells or hepatocytes. The skilled person would recognize those cells that are involved in or associated with a particular drug-induced toxicity, e.g., cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renaltoxicity, or myotoxicity, and any such cells may be included in a cell model of the invention, e.g., cardiomyocytes, diabetic cardiomyocytes, hepatocytes, kidney cells, neuro cells, renal cells, or myoblasts.

Cell models of the invention may include one or more "control cells." In one embodiment, a control cell may be an untreated or unperturbed cell. In another embodiment, a "control cell" may be a normal cell, e.g., a cell that has not been exposed to a toxicity-causing agent or drug. In one embodiment, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50 or more different control cells are included in the cell model. All values presented in the foregoing list can also be the upper or lower limit of ranges, that are intended to be a part of this invention, e.g., between 1 and 5, 1 and 10, 2 and 5, or 5 and 15 different control cell lines or control cell types. In one embodiment, the control cells are all of the same type but are different established cell lines of that cell type. In one embodiment, as a control, one or more normal, e.g., non-diseased, cell lines are cultured under similar conditions, and/or are exposed to the same perturbation, as the primary cells of the cell model in order to identify proteins or pathways unique to the drug-induced toxicity.

A custom cell model of the invention may also comprise conditions that mimic a characteristic aspect of the drug-induced toxicity. For example, cell culture conditions may be selected that closely approximate the conditions of a cell in a diabetic environment in vivo for probing diabetic drug-induced toxicity, or of an aortic smooth muscle cell of a patient suffering from drug-induced cardiotoxicity. In some instances, the conditions are stress conditions. Various conditions / stressors may be employed in the cell models of the invention. In one embodiment, these stressors / conditions may constitute the "perturbation", e.g., external stimulus, for the cell systems. One exemplary stress condition is hypoxia, a condition typically found, for example, in patients with advanced-stage diabetes. Hypoxia can be induced using art-recognized methods. For example, hypoxia can be induced by placing cell systems in a Modular Incubator Chamber (MIC-101, Billups-Rothenberg Inc., Del Mar, CA), which can be flooded with an industrial gas mix containing 5% CO₂, 2% O₂ and 93% nitrogen.

Effects can be measured after a pre-determined period, e.g., at 24 hours after hypoxia treatment, with and without additional external stimulus components (e.g., CoQ10 at 0, 50, or 100 μM). Likewise, lactic acid treatment mimics a cellular environment where glycolysis activity is high. Lactic acid-induced stress can be investigated at a final lactic acid concentration of about 12.5 mM at a pre-determined time, e.g., at 24 hours, with or without additional external stimulus components (e.g., CoQ10 at 0, 50, or 100 μM). Hyperglycemia is a condition found in diabetes as well as in diabetic drug-induced toxicity. A typical hyperglycemic condition that can be used to treat the subject cells involves adding 10% culture-grade glucose to suitable media to bring the final concentration of glucose in the media to about 22 mM. Hyperlipidemia is a condition found, for example, in obesity and cardiovascular disease, and can be used to simulate drug-induced cardiotoxicity. Hyperlipidemic conditions can be provided by culturing cells in media containing 0.15 mM sodium palmitate. Hyperinsulinemia is a condition found, for example, in diabetes, as well as in diabetic drug-induced toxicity. Hyperinsulinemic conditions may be induced by culturing the cells in media containing 1000 nM insulin.
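The hyperglycemic condition above is a mixing calculation. The following is a minimal sketch, not part of the original protocol, of how much 10% glucose stock to add to reach about 22 mM. It assumes the 10% figure is weight per volume (10 g/100 mL) and that the base medium already contains 5.5 mM glucose; that baseline is a hypothetical value and should be replaced with the actual medium formulation.

```python
GLUCOSE_MW_G_PER_MOL = 180.16  # molar mass of D-glucose


def stock_molarity_mM(percent_w_v: float) -> float:
    """Convert a % (w/v) glucose stock to mM (1% w/v = 10 g/L). Assumes w/v."""
    grams_per_litre = percent_w_v * 10.0
    return grams_per_litre / GLUCOSE_MW_G_PER_MOL * 1000.0


def volume_to_add_mL(media_mL: float, baseline_mM: float,
                     target_mM: float, stock_mM: float) -> float:
    """Solve (baseline*V + stock*v) / (V + v) = target for the stock volume v."""
    if not baseline_mM <= target_mM < stock_mM:
        raise ValueError("target must lie between baseline and stock concentration")
    return media_mL * (target_mM - baseline_mM) / (stock_mM - target_mM)


stock = stock_molarity_mM(10.0)  # roughly 555 mM for a 10% (w/v) stock
# Hypothetical baseline: 5.5 mM glucose already present in the base medium.
v = volume_to_add_mL(100.0, baseline_mM=5.5, target_mM=22.0, stock_mM=stock)
print(f"stock = {stock:.0f} mM; add {v:.2f} mL per 100 mL medium")
```

Solving the mass balance for the added volume (rather than simply scaling by the final concentration) accounts for the dilution caused by the stock itself.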

Individual conditions may be investigated separately in the custom built cell models of the invention, and/or may be combined together. In one embodiment, a combination of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50 or more conditions reflecting or simulating different characteristic aspects of the biological system are investigated in the custom built cell model. In one embodiment, individual conditions and, in addition, combinations of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50 or more of the conditions reflecting or simulating different characteristic aspects of the drug-induced toxicity are investigated in the custom built drug-induced toxicity cell model. All values presented in the foregoing list can also be the upper or lower limit of ranges, that are intended to be a part of this invention, e.g., between 1 and 5, 1 and 10, 1 and 20, 1 and 30, 2 and 5, 2 and 10, 5 and 10, 5 and 20, 10 and 20, 10 and 25, 10 and 30 or 10 and 50 different conditions.
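To make the number of single and combined conditions concrete, the sketch below enumerates every combination of the five stress conditions described above (hypoxia, lactic acid, hyperglycemia, hyperlipidemia, hyperinsulinemia). The condition labels are illustrative strings chosen for this example, not identifiers used by the Platform Technology.

```python
from itertools import combinations

# Illustrative labels for the five stress conditions described in this section.
CONDITIONS = ["hypoxia", "lactic_acid", "hyperglycemia",
              "hyperlipidemia", "hyperinsulinemia"]


def condition_sets(conditions, min_size=1, max_size=None):
    """Return every combination of conditions with min_size..max_size members."""
    max_size = len(conditions) if max_size is None else max_size
    sets = []
    for k in range(min_size, max_size + 1):
        sets.extend(combinations(conditions, k))
    return sets


singles = condition_sets(CONDITIONS, 1, 1)
combos = condition_sets(CONDITIONS, 2)  # combinations of at least 2 conditions
print(len(singles), len(combos))  # 5 26
```

With n conditions there are 2^n - 1 non-empty combinations, so the experimental design grows quickly; this is one reason to investigate a chosen subset of combinations rather than all of them.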

Once the custom drug-induced toxicity cell model is built, one or more "perturbations" may be applied to the system, such as genetic variation from patient to patient, or treatment with or without certain drugs or pro-drugs. See Figure 15D. The effects of such perturbations on the cell model system can be measured using various art-recognized or proprietary means, as described in section III.B below.

The custom built drug-induced toxicity cell model may be exposed to a perturbation, e.g., an "environmental perturbation" or "external stimulus component". The "environmental perturbation" or "external stimulus component" may be endogenous to the cellular environment (e.g., the cellular environment contains some level of the stimulant, and more of the same is added to increase its level), or may be exogenous to the cellular environment (e.g., the stimulant/perturbation is largely absent from the cellular environment prior to the alteration). The cellular environment may further be altered by secondary changes resulting from adding the environmental perturbation or external stimulus component, since the external stimulus component may change the cellular output of the cell system, including molecules secreted into the cellular environment by the cell system. The environmental perturbation or external stimulus component may include any external physical and/or chemical stimulus that may affect cellular function. This may include any large or small organic or inorganic molecules, natural or synthetic chemicals, temperature shift, pH change, radiation, light (UVA, UVB, etc.), microwave, sonic wave, electrical current, modulated or unmodulated magnetic fields, etc. The environmental perturbation or external stimulus component may also include an introduced genetic modification or mutation or a vehicle (e.g., vector) that causes a genetic modification / mutation.

(i) Cross-talk cell systems

In certain situations, where interaction between two or more cell systems is to be investigated, a "cross-talking cell system" may be formed by, for example, bringing the modified cellular environment of a first cell system into contact with a second cell system to affect the cellular output of the second cell system.

As used herein, "cross-talk cell system" comprises two or more cell systems, in which the cellular environment of at least one cell system comes into contact with a second cell system, such that at least one cellular output in the second cell system is changed or affected. In certain embodiments, the cell systems within the cross-talk cell system may be in direct contact with one another. In other embodiments, none of the cell systems are in direct contact with one another.

For example, in certain embodiments, the cross-talk cell system may be in the form of a transwell, in which a first cell system grows in an insert and a second cell system grows in the corresponding well compartment. The two cell systems may be in contact with the same or different media, and may exchange some or all of the media components. An external stimulus component added to one cell system may be substantially absorbed by that cell system and/or degraded before it has a chance to diffuse to the other cell system. Alternatively, the external stimulus component may eventually approach or reach an equilibrium between the two cell systems.

In certain embodiments, the cross-talk cell system may adopt the form of separately cultured cell systems, where each cell system may have its own medium and/or culture conditions (temperature, CO₂ content, pH, etc.), or similar or identical culture conditions. The two cell systems may come into contact by, for example, taking the conditioned medium from one cell system and bringing it into contact with another cell system. Direct cell-cell contacts between the two cell systems can also be effected if desired. For example, the cells of the two cell systems may be co-cultured at any point if desired, and the co-cultured cell systems can later be separated by, for example, FACS sorting when cells in at least one cell system have a sortable marker or label (such as a stably expressed fluorescent marker protein, e.g., GFP).

Similarly, in certain embodiments, the cross-talk cell system may simply be a co-culture. Selective treatment of cells in one cell system can be effected by first treating the cells in that cell system, before culturing the treated cells in co-culture with cells in another cell system. The co-culture cross-talk cell system setting may be helpful when it is desired to study, for example, effects on a second cell system caused by cell surface changes in a first cell system, after stimulation of the first cell system by an external stimulus component.

The cross-talk cell system of the invention is particularly suitable for exploring the effect of a pre-determined external stimulus component on the cellular output of one or both cell systems. The primary effect of such a stimulus on the first cell system (with which the stimulus comes into direct contact) may be determined by comparing cellular outputs (e.g., protein expression level) before and after the first cell system's contact with the external stimulus; as used herein, the resulting differences may be referred to as "(significant) cellular output differentials." The secondary effect of such a stimulus on the second cell system, which is mediated through the modified cellular environment of the first cell system (such as its secretome), can be measured similarly. Here, for example, the proteome of the second cell system when the first cell system has received the external stimulus treatment can be compared with the proteome of the second cell system when the first cell system has not received that treatment. Any significant changes observed (in the proteome or any other cellular outputs of interest) may be referred to as a "significant cellular cross-talk differential."

In making cellular output measurements (such as protein expression), either absolute expression amount or relative expression level may be used. For example, to determine the relative protein expression level of a second cell system, the amount of any given protein in the second cell system, with or without the external stimulus to the first cell system, may be compared to a suitable control cell line or mixture of cell lines and given a fold-increase or fold-decrease value. A pre-determined threshold level for such fold-increase (e.g., at least 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 75 or 100 or more fold increase) or fold-decrease (e.g., at least a decrease to 0.95, 0.9, 0.8, 0.75, 0.7, 0.6, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1 or 0.05 fold, or 90%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10% or 5% or less) may be used to select significant cellular cross-talk differentials. All values presented in the foregoing list can also be the upper or lower limit of ranges, e.g., between 1.5 and 5 fold, between 2 and 10 fold, between 1 and 2 fold, or between 0.9 and 0.7 fold, that are intended to be a part of this invention.
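A minimal sketch of threshold-based selection of significant differentials, assuming protein levels are already quantified as positive numbers keyed by a protein identifier. The thresholds (1.5-fold increase, decrease to 0.7-fold) are two values picked from the lists above, and the protein names and levels are hypothetical.

```python
def fold_change(test_level: float, control_level: float) -> float:
    """Ratio of a test level to its control level (control must be positive)."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return test_level / control_level


def significant_differentials(test, control, up=1.5, down=0.7):
    """Return {protein: fold change} for proteins at or beyond either threshold.

    test/control map protein IDs to levels, e.g. second-cell-system proteome
    with vs. without the external stimulus applied to the first cell system.
    """
    hits = {}
    for protein, level in test.items():
        if protein not in control:
            continue  # no matched control measurement for this protein
        fc = fold_change(level, control[protein])
        if fc >= up or fc <= down:
            hits[protein] = fc
    return hits


# Hypothetical levels for three proteins in the second cell system.
with_stimulus = {"P1": 3.0, "P2": 1.1, "P3": 0.5}
without_stimulus = {"P1": 1.0, "P2": 1.0, "P3": 1.0}
print(significant_differentials(with_stimulus, without_stimulus))
```

Using separate up and down thresholds mirrors the text, which lists fold-increase and fold-decrease cut-offs independently rather than as reciprocals of each other.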

Throughout the present application, all values presented in a list, e.g., such as those above, can also be the upper or lower limit of ranges that are intended to be a part of this invention.

To illustrate, in one exemplary two-cell system established to imitate aspects of a drug-induced cardiotoxicity and nephrotoxicity model, a heart smooth muscle cell line (first cell system) may be treated with a hypoxia condition (an external stimulus component), and proteome changes in a kidney cell line (second cell system) resulting from contacting the kidney cells with conditioned medium of the heart smooth muscle cells may be measured using conventional quantitative mass spectrometry. Significant cellular cross-talking differentials in these kidney cells may be determined based on comparison with a proper control (e.g., similarly cultured kidney cells contacted with conditioned medium from similarly cultured heart smooth muscle cells not treated with hypoxia conditions).

Not every observed significant cellular cross-talking differential may be of biological significance. With respect to any given drug-induced toxicity to which the subject interrogative biological assessment is applied, some (or perhaps all) of the significant cellular cross-talking differentials may be "determinative" with respect to the specific biological problem at issue, e.g., either responsible for causing a drug-induced toxicity (a potential target for therapeutic intervention) or a biomarker for the drug-induced toxicity (a potential diagnostic or prognostic factor).

Such determinative cross-talking differentials may be selected by an end user of the subject method, or they may be selected by a bioinformatics software program, such as a DAVID-enabled comparative pathway analysis program or the KEGG pathway analysis program. In certain embodiments, more than one bioinformatics software program is used, and consensus results from two or more bioinformatics software programs are preferred.
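One simple way to form a consensus across programs is a vote count: keep each item reported by at least a minimum number of tools. The sketch below assumes each program's output has already been reduced to a collection of pathway names; the tool keys and pathway names are hypothetical, and no DAVID or KEGG API is called.

```python
from collections import Counter


def consensus(results_by_tool, min_tools=2):
    """Return items reported by at least `min_tools` different programs.

    results_by_tool maps a tool label to the pathways (or proteins) it flagged.
    Each tool contributes at most one vote per item.
    """
    votes = Counter()
    for hits in results_by_tool.values():
        votes.update(set(hits))
    return sorted(item for item, n in votes.items() if n >= min_tools)


# Hypothetical outputs already exported from three analysis programs.
results = {
    "DAVID": ["glycolysis", "apoptosis"],
    "KEGG": ["glycolysis", "fatty_acid_oxidation"],
    "tool_c": ["apoptosis", "glycolysis"],
}
print(consensus(results))
```

Raising `min_tools` to the number of tools gives a strict intersection; lowering it to 1 gives a union.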

As used herein, "differentials" of cellular outputs include differences (e.g., increased or decreased levels) in any one or more parameters of the cellular outputs. For example, in terms of protein expression level, differentials between two cellular outputs, such as the outputs associated with a cell system before and after the treatment by an external stimulus component, can be measured and quantitated by using art-recognized technologies, such as mass-spectrometry based assays (e.g., iTRAQ, 2D-LC-MSMS, etc.).

B. Use of Cell Models for Interrogative Biological Assessments

The methods and cell models described herein, and further described in International Application No. PCT/US2012/027615, may be used for, or applied to, any number of "interrogative biological assessments." Use of the methods of the invention for an interrogative biological assessment facilitates the identification of "modulators" or determinative cellular process "drivers" of a drug-induced toxicity.

As used herein, an "interrogative biological assessment" may include the identification of one or more modulators of a biological system, e.g., determinative cellular process "drivers" (e.g., an increase or decrease in activity of a biological pathway, or key members of the pathway, or key regulators of members of the pathway) associated with the environmental perturbation or external stimulus component, or a causal relationship unique to a biological system or process. It may further include additional steps designed to test or verify whether the identified determinative cellular process drivers are necessary and/or sufficient for the downstream events associated with the environmental perturbation or external stimulus component, including in vivo animal models and/or in vitro tissue culture experiments.

In a preferred embodiment, the interrogative biological assessment is the assessment of the drug-induced toxicological profile of an agent, e.g., a drug, on a cell, tissue, organ or organism, wherein the identified modulators of a biological system, e.g., determinative cellular process drivers (e.g., cellular cross-talk differentials or causal relationships unique to a biological system or process), may be indicators of drug-induced toxicities, e.g., cytotoxicity, cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renaltoxicity, or myotoxicity, and may in turn be used to predict or identify the toxicological profile of the drug. In one embodiment, the identified modulator of a drug-induced toxicity, e.g., a determinative cellular process driver (e.g., cellular cross-talk differentials or causal relationships unique to a drug-induced toxicity), is an indicator of cardiotoxicity of a drug or drug candidate, and may in turn be used to predict or identify the cardiotoxicological profile of the drug or drug candidate.

V. Proteomic Sample Analysis

In certain embodiments, the subject method employs large-scale high-throughput quantitative proteomic analysis of hundreds of samples of similar character, and provides the data necessary for identifying the cellular output differentials.

There are numerous art-recognized technologies suitable for this purpose. An exemplary technique, iTRAQ analysis in combination with mass spectrometry, is briefly described below.

To provide reference samples for relative quantification with the iTRAQ technique, multiple QC pools are created. Two separate QC pools, consisting of aliquots of each sample, were generated from the Cell #1 and Cell #2 samples; these pools are denoted QCS1 and QCS2 for supernatants, and QCP1 and QCP2 for pellets. To allow protein concentrations to be compared across the two cell lines, cell pellet aliquots from the QC pools described above are combined in equal volumes to generate reference samples (QCP).

To implement this analysis scheme, six primary samples and two control pool samples are combined into one 8-plex iTRAQ mix, with the control pool samples labeled with the 113 and 117 reagents according to the manufacturer's suggestions. This mixture of eight samples is then fractionated by two-dimensional liquid chromatography: strong cation exchange (SCX) in the first dimension, and reversed-phase HPLC in the second dimension. The HPLC eluent is directly fractionated onto MALDI plates, and the plates are analyzed on an MDS SCIEX/AB 4800 MALDI TOF/TOF mass spectrometer.

In the absence of additional information, it is assumed that the most important changes in protein expression are those within the same cell type under different treatment conditions. For this reason, primary samples from Cell #1 and Cell #2 are analyzed in separate iTRAQ mixes. To facilitate comparison of protein expression in Cell #1 vs. Cell #2 samples, universal QCP samples are analyzed in the available "iTRAQ slots" not occupied by primary or cell line-specific QC samples (QC1 and QC2).
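The mix design above can be expressed as a small layout routine. This is a sketch under stated assumptions: the cell line-specific QC pool occupies reporter channels 113 and 117, primary samples fill the remaining channels (114, 115, 116, 118, 119, 121, as listed in the quantification step), and any leftover slot is filled with the universal QCP reference. Sample names are hypothetical.

```python
PRIMARY_CHANNELS = [114, 115, 116, 118, 119, 121]  # slots for primary samples
POOL_CHANNELS = [113, 117]                          # cell line-specific QC pool


def build_mixes(primary_samples, cell_qc_pool, universal_pool="QCP"):
    """Split one cell line's primary samples into 8-plex mix layouts.

    Returns a list of {reporter channel: sample label} dicts. Empty primary
    slots in the last mix are filled with the universal QCP reference.
    """
    mixes = []
    step = len(PRIMARY_CHANNELS)
    for start in range(0, len(primary_samples), step):
        batch = primary_samples[start:start + step]
        mix = {ch: cell_qc_pool for ch in POOL_CHANNELS}
        for i, ch in enumerate(PRIMARY_CHANNELS):
            mix[ch] = batch[i] if i < len(batch) else universal_pool
        mixes.append(mix)
    return mixes


# Hypothetical: eight Cell #1 primary samples, analyzed apart from Cell #2.
cell1 = [f"C1-S{i}" for i in range(1, 9)]
for mix in build_mixes(cell1, "QC1"):
    print(dict(sorted(mix.items())))
```

Keeping each cell line in its own mixes follows the assumption stated above that within-cell-type treatment effects matter most, while the shared QCP channels provide the bridge between cell lines.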

A brief overview of the laboratory procedures employed is provided herein.

A. Protein Extraction From Cell Supernatant Samples

For cell supernatant samples (CSN), proteins from the culture medium are present in large excess over proteins secreted by the cultured cells. To reduce this background, upfront depletion of abundant proteins was implemented. As specific affinity columns are not available for bovine or horse serum proteins, an anti-human IgY14 column was used. Although the antibodies are directed against human proteins, the broad specificity provided by their polyclonal nature was anticipated to accomplish depletion of both bovine and equine proteins present in the cell culture media used.

A 200-μL aliquot of the CSN QC material is loaded on a 10-mL IgY14 depletion column before the start of the study, and the total protein concentration in the flow-through material is determined by the bicinchoninic acid (BCA) assay. The loading volume is then selected to achieve a depleted fraction containing approximately 40 μg total protein.
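Selecting the loading volume amounts to scaling the test load by protein recovery. The sketch below assumes recovery in the depleted flow-through is linear in the volume loaded; the flow-through concentration and volume used in the example are hypothetical numbers, not measured values.

```python
def loading_volume_uL(test_load_uL: float,
                      flowthrough_conc_ug_per_uL: float,
                      flowthrough_vol_uL: float,
                      target_ug: float = 40.0) -> float:
    """Volume of CSN material to load so the depleted fraction holds target_ug.

    Assumes depleted-protein recovery scales linearly with the volume loaded.
    """
    recovered_ug = flowthrough_conc_ug_per_uL * flowthrough_vol_uL  # from BCA
    ug_per_uL_loaded = recovered_ug / test_load_uL
    return target_ug / ug_per_uL_loaded


# Hypothetical BCA result: 200 uL test load -> 1000 uL flow-through at 0.05 ug/uL.
print(loading_volume_uL(200.0, 0.05, 1000.0))
```

If recovery turns out to be non-linear at larger loads, the test load can be repeated closer to the computed volume.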

B. Protein Extraction From Cell Pellets

An aliquot of Cell #1 and Cell #2 is lysed in the "standard" lysis buffer used for the analysis of tissue samples at BGM, and total protein content is determined by the BCA assay. Having established the protein content of these representative cell lysates, all cell pellet samples (including the QC samples described in Section 1.1) were processed to cell lysates. Lysate amounts of approximately 40 μg of total protein were carried forward in the processing workflow.

C. Sample Preparation for Mass Spectrometry

Sample preparation follows standard operating procedures and consists of the following:

• Reduction and alkylation of proteins

• Protein clean-up on reversed-phase column (cell pellets only)

• Digestion with trypsin

• iTRAQ labeling

• Strong cation exchange chromatography - collection of six fractions (Agilent 1200 system)

• HPLC fractionation and spotting to MALDI plates (Dionex Ultimate3000/Probot system)

D. MALDI MS and MS/MS

HPLC-MS generally employs online ESI MS/MS strategies. BG Medicine uses an off-line LC-MALDI MS/MS platform that results in better concordance of observed protein sets across the primary samples without the need to inject the same sample multiple times. Because the peptide fractions are retained on the MALDI target plates, after first-pass data collection across all iTRAQ mixes the samples can be analyzed a second time using a targeted MS/MS acquisition pattern derived from knowledge gained during the first acquisition. In this manner, maximum observation frequency for all of the identified proteins is accomplished (ideally, every protein should be measured in every iTRAQ mix).

E. Data Processing

Data processing within the BGM Proteomics workflow can be separated into procedures that are completed for each iTRAQ mix individually, such as preliminary peptide identification and quantification (Section 1.5.1), and processes that are not completed until data acquisition is finished for the project, such as final assignment of peptides to proteins and final quantification of proteins (Section 1.5.2).

The main data processing steps within the BGM Proteomics workflow are:

• Peptide identification using the Mascot (Matrix Sciences) database search engine

• Automated in house validation of Mascot IDs

• Quantification of peptides and preliminary quantification of proteins

• Expert curation of final dataset

• Final assignment of peptides from each mix into a common set of proteins using the automated PVT tool

• Outlier elimination and final quantification of proteins

(i) Data Processing of Individual iTRAQ Mixes

As each iTRAQ mix is processed through the workflow, the MS/MS spectra are analyzed using proprietary BGM software tools for peptide and protein identification, as well as for initial assessment of quantification information. Based on the results of this preliminary analysis, the quality of the workflow for each primary sample in the mix is judged against a set of BGM performance metrics. If a given sample (or mix) does not pass the specified minimal performance metrics, and additional material is available, that sample is repeated in its entirety, and the data from this second implementation of the workflow are incorporated in the final dataset.

(ii) Peptide Identification

MS/MS spectra were searched against the UniProt protein sequence database containing human, bovine, and horse sequences, augmented by common contaminant sequences such as porcine trypsin. The details of the Mascot search parameters, including the complete list of modifications, are given in Table 1.

Table 1: Mascot Search Parameters

After the Mascot search is complete, an auto-validation procedure is used to promote (i.e., validate) specific Mascot peptide matches. Valid and invalid matches are differentiated based on the attained Mascot score relative to the expected Mascot score, and on the difference between the Mascot scores of the Rank 1 and Rank 2 peptide matches. The criteria required for validation are somewhat relaxed if the peptide is one of several matched to a single protein in the iTRAQ mix, or if the peptide is present in a catalogue of previously validated peptides.
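The validation rule can be sketched as a two-criterion test with a relaxed tier. The BGM procedure is in-house and its thresholds are not given, so every numeric cut-off below is a hypothetical placeholder, and the "relative to the expected score" criterion is modeled as a simple difference, which is itself an assumption.

```python
from dataclasses import dataclass


@dataclass
class PeptideMatch:
    sequence: str
    score: float             # Mascot score of the Rank 1 match
    expected_score: float    # expected Mascot score for this spectrum
    rank2_score: float       # Mascot score of the Rank 2 match
    peptides_for_protein: int  # candidate peptides mapping to the same protein


def is_valid(m: PeptideMatch, catalogue: set,
             margin=0.0, delta=10.0,               # hypothetical strict cut-offs
             relaxed_margin=-5.0, relaxed_delta=5.0) -> bool:  # hypothetical relaxed cut-offs
    """Auto-validate a match; criteria relax for multi-peptide or catalogued hits."""
    relaxed = m.peptides_for_protein > 1 or m.sequence in catalogue
    need_margin = relaxed_margin if relaxed else margin
    need_delta = relaxed_delta if relaxed else delta
    return ((m.score - m.expected_score) >= need_margin
            and (m.score - m.rank2_score) >= need_delta)


catalogue = {"LVNELTEFAK"}  # hypothetical catalogue of previously validated peptides
strong = PeptideMatch("AEFVEVTK", 40.0, 35.0, 28.0, 1)
weak = PeptideMatch("LVNELTEFAK", 33.0, 35.0, 26.0, 1)
print(is_valid(strong, catalogue), is_valid(weak, catalogue))
```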

(iii) Peptide and Protein Quantification

The set of validated peptides for each mix is used to calculate preliminary protein quantification metrics for each mix. Peptide ratios are calculated by dividing the peak area from the iTRAQ label (i.e., m/z 114, 115, 116, 118, 119, or 121) for each validated peptide by the best representation of the peak area of the reference pool (QC1 or QC2). This peak area is the average of the 113 and 117 peaks, provided both samples pass QC acceptance criteria. Preliminary protein ratios are determined by calculating the median ratio of all "useful" validated peptides matching to that protein. "Useful" peptides are fully iTRAQ labeled (all N-termini are labeled with either Lysine or PyroGlu) and fully Cysteine labeled (i.e., all Cys residues are alkylated with Carbamidomethyl or N-terminal Pyro-cmc).
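The ratio calculation above can be sketched as follows, assuming each peptide record carries its reporter-ion peak areas keyed by m/z channel and a precomputed "useful" flag. Falling back to a single reference channel when only one of 113/117 passes QC is an assumption made for this example; the text only defines the case where both pass.

```python
from statistics import mean, median


def reference_area(area_113, area_117, ok_113=True, ok_117=True):
    """Best representation of the reference-pool peak area."""
    if ok_113 and ok_117:
        return mean([area_113, area_117])
    # Assumed fallback: use whichever reference channel passed QC.
    if ok_113:
        return area_113
    if ok_117:
        return area_117
    raise ValueError("no reference channel passed QC")


def protein_ratio(peptides, channel):
    """Median of reference-normalized ratios over the protein's useful peptides."""
    ratios = [p["areas"][channel] / reference_area(p["areas"][113], p["areas"][117])
              for p in peptides if p["useful"]]
    return median(ratios)  # raises StatisticsError if no useful peptides


# Hypothetical peak areas for three peptides of one protein.
peptides = [
    {"useful": True, "areas": {113: 100, 117: 100, 114: 200}},
    {"useful": True, "areas": {113: 50, 117: 150, 114: 300}},
    {"useful": False, "areas": {113: 100, 117: 100, 114: 1000}},  # excluded
]
print(protein_ratio(peptides, 114))
```

The median, rather than the mean, keeps a single poorly quantified peptide from dominating the preliminary protein ratio.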

(iv) Post-acquisition Processing

Once all passes of MS/MS data acquisition are complete for every mix in the project, the data are collated using the three steps discussed below, which aim to enable the results from each primary sample to be simply and meaningfully compared to those of another.

(v) Global Assignment of Peptide Sequences to Proteins

Final assignment of peptide sequences to protein accession numbers is carried out through the proprietary Protein Validation Tool (PVT). The PVT procedure determines the best, minimum non-redundant protein set to describe the entire collection of peptides identified in the project. This is an automated procedure that has been optimized to handle data from a homogeneous taxonomy.
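The "minimum non-redundant protein set" problem is an instance of set cover. The Protein Validation Tool itself is proprietary; the sketch below substitutes the common greedy parsimony heuristic, which repeatedly picks the protein explaining the most still-unassigned peptides. Greedy selection is not guaranteed to find the true minimum, and the peptide and protein identifiers are hypothetical.

```python
def minimal_protein_set(peptide_to_proteins):
    """Greedy set cover: choose proteins until every assignable peptide is explained.

    peptide_to_proteins maps a peptide sequence to the set of protein accessions
    containing it. Ties are broken alphabetically for reproducibility.
    """
    protein_to_peps = {}
    for pep, prots in peptide_to_proteins.items():
        for prot in prots:
            protein_to_peps.setdefault(prot, set()).add(pep)

    uncovered = set(peptide_to_proteins)
    chosen = []
    while uncovered and protein_to_peps:
        best = max(sorted(protein_to_peps),
                   key=lambda p: len(protein_to_peps[p] & uncovered))
        gained = protein_to_peps[best] & uncovered
        if not gained:
            break  # remaining peptides map to no protein
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical peptide-to-protein matches for one project.
matches = {
    "pepA": {"P1", "P2"},
    "pepB": {"P1"},
    "pepC": {"P3"},
    "pepD": {"P2", "P3"},
}
print(minimal_protein_set(matches))
```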

Protein assignments for the supernatant experiments were manually curated in order to deal with the complexities of mixed taxonomies in the database. Since the automated paradigm is not valid for cell cultures grown in bovine and horse serum supplemented media, extensive manual curation is necessary to minimize the ambiguity of the source of any given protein.

(vi) Normalization of Peptide Ratios

The peptide ratios for each sample are normalized based on the method of Vandesompele et al., Genome Biology, 2002, 3(7), research 0034.1-11. This procedure is applied to the cell pellet measurements only. For the supernatant samples, quantitative data are not normalized, because the largest contribution to peptide identifications comes from the media.
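How the cited method is adapted to peptide ratios is not spelled out, so the following is only a simplified sketch of the Vandesompele (geNorm) idea applied to a sample-by-protein ratio matrix: compute each protein's stability measure M (the mean pairwise standard deviation of log2 ratios against every other protein), take the most stable proteins as references, and divide each sample by the geometric mean of its reference values. Unlike geNorm, this version ranks once instead of iteratively discarding the least stable protein, and `n_reference=3` is an arbitrary choice.

```python
import math
import statistics


def genorm_stability(matrix):
    """geNorm-style M value per protein column (lower = more stable).

    matrix[i][j] is the positive ratio of protein j in sample i; at least
    two samples and two proteins are required.
    """
    n_proteins = len(matrix[0])
    stability = []
    for j in range(n_proteins):
        pairwise = []
        for k in range(n_proteins):
            if k == j:
                continue
            # Pairwise variation: SD across samples of log2(a_j / a_k).
            log_ratios = [math.log2(row[j] / row[k]) for row in matrix]
            pairwise.append(statistics.stdev(log_ratios))
        stability.append(statistics.mean(pairwise))
    return stability


def normalize_samples(matrix, n_reference=3):
    """Divide each sample by the geometric mean of its most stable proteins."""
    m = genorm_stability(matrix)
    reference = sorted(range(len(m)), key=lambda j: m[j])[:n_reference]
    normalized = []
    for row in matrix:
        nf = math.exp(statistics.mean([math.log(row[j]) for j in reference]))
        normalized.append([value / nf for value in row])
    return normalized


# Hypothetical ratios: proteins 0-2 move with a per-sample loading factor.
data = [[1.0, 1.0, 1.0, 5.0], [2.0, 2.0, 2.0, 2.0], [0.5, 0.5, 0.5, 4.0]]
print(normalize_samples(data))
```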

(vii) Final Calculation of Protein Ratios

A standard statistical outlier elimination procedure is used to remove outliers lying beyond the 1.96 σ level around each protein median ratio in the log-transformed data set. Following this elimination process, the final set of protein ratios is recalculated.
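A minimal sketch of this outlier step for one protein, assuming a single pass in which σ is the sample standard deviation of all log2 peptide ratios (including any outliers) and the recalculated protein ratio is the median of the retained values. Whether the original procedure iterates or estimates σ differently is not stated.

```python
import math
import statistics


def protein_ratio_without_outliers(peptide_ratios, z=1.96):
    """Return (final_ratio, n_kept) after removing log2 outliers beyond z*sigma."""
    logs = [math.log2(r) for r in peptide_ratios]
    center = statistics.median(logs)
    if len(logs) < 3:
        return 2 ** center, len(logs)  # too few values to estimate sigma usefully
    sigma = statistics.stdev(logs)
    kept = [x for x in logs if abs(x - center) <= z * sigma]
    if not kept:
        kept = logs  # defensive; cannot normally happen
    return 2 ** statistics.median(kept), len(kept)


# Hypothetical peptide ratios for one protein; 8.0 is an outlier.
print(protein_ratio_without_outliers([1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 8.0]))
```

Working in log space makes a two-fold increase and a two-fold decrease symmetric around the median, which is why the σ cut-off is applied to log-transformed data.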

VI. Markers of the Invention and Uses Thereof

The present invention is based, at least in part, on the identification of novel biomarkers that are associated with drug-induced toxicities, such as a drug-induced cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renaltoxicity, or myotoxicity, or response of a drug-induced toxicity to a perturbation, such as a therapeutic agent.

In particular, the invention relates to markers (hereinafter "markers" or "markers of the invention"), which are described in the examples. The invention provides nucleic acids and proteins that are encoded by or correspond to the markers (hereinafter "marker nucleic acids" and "marker proteins," respectively). These markers are particularly useful in diagnosing drug-induced toxicity states; prognosing drug-induced toxicity states; developing drug targets for various drug-induced toxicity states; screening for the presence of toxicity, preferably drug-induced toxicities, e.g., cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renaltoxicity, or myotoxicity; identifying an agent that causes or is at risk for causing drug-induced toxicity; identifying an agent that can reduce or prevent drug-induced toxicity; alleviating, reducing or preventing drug-induced toxicity; and identifying markers predictive of drug-induced toxicity. A "marker" is a gene whose altered level of expression in a tissue or cell, relative to its expression level in normal or healthy tissue or cells, is associated with a toxicity state, such as a drug-induced toxicity, e.g., cardiotoxicity. A "marker nucleic acid" is a nucleic acid (e.g., mRNA, cDNA) encoded by or corresponding to a marker of the invention. Such marker nucleic acids include DNA (e.g., cDNA) comprising the entire or a partial sequence of any of the genes that are markers of the invention or the complement of such a sequence. Such sequences are known to one of skill in the art and can be found, for example, on the NIH PubMed website. The marker nucleic acids also include RNA comprising the entire or a partial sequence of any of the gene markers of the invention or the complement of such a sequence, wherein all thymidine residues are replaced with uridine residues. A "marker protein" is a protein encoded by or corresponding to a marker of the invention.
A marker protein comprises the entire or a partial sequence of any of the marker proteins of the invention. Such sequences are known to one of skill in the art and can be found, for example, on the NIH PubMed website. The terms "protein" and "polypeptide" are used interchangeably.

A "toxic state associated" body fluid is a fluid which, when in the body of a patient, contacts or passes through cells of the tissue affected by the toxicity state (e.g., cardiomyocytes in cardiotoxicity) or into which cells or proteins shed from such cells are capable of passing. Exemplary disease state or toxic state associated body fluids include blood fluids (e.g., whole blood, blood serum, blood having platelets removed therefrom), and are described in more detail below. Disease state or toxic state associated body fluids include, but are not limited to, whole blood, blood having platelets removed therefrom, lymph, prostatic fluid, urine and semen.

The "normal" level of expression of a marker is the level of expression of the marker in cells of a human subject or patient not afflicted with a toxicity state.

An "over-expression" or "higher level of expression" of a marker refers to an expression level in a test sample that is greater than the standard error of the assay employed to assess expression, and is preferably at least twice, and more preferably three, four, five, six, seven, eight, nine or ten times, the expression level of the marker in a control sample (e.g., a sample from a healthy subject not having the drug-induced toxicity state associated with the marker, e.g., cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renal toxicity, or myotoxicity) and preferably, the average expression level of the marker in several control samples. A "lower level of expression" of a marker refers to an expression level in a test sample that is at least twice, and more preferably three, four, five, six, seven, eight, nine or ten times, lower than the expression level of the marker in a control sample (e.g., a sample from a healthy subject not having the drug-induced toxicity state associated with the marker, e.g., cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, renal toxicity, or myotoxicity) and preferably, the average expression level of the marker in several control samples.
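The fold-change comparison in these definitions can be illustrated with a minimal sketch. It is not the method of the application: the function name `classify_expression`, the arithmetic mean of the control samples, and the default 2-fold threshold are assumptions, and the requirement that the difference exceed the standard error of the assay is not modeled.

```python
from statistics import mean
from typing import Sequence


def classify_expression(test_level: float, control_levels: Sequence[float],
                        fold_threshold: float = 2.0) -> str:
    """Classify one marker as 'higher', 'lower' or 'unchanged' (hypothetical helper).

    The test level is compared with the average of the control levels using a
    fold-change threshold. The assay standard-error check is not modeled.
    """
    if not control_levels:
        raise ValueError("at least one control sample is required")
    control = mean(control_levels)  # average expression across control samples
    if control <= 0 or test_level <= 0:
        raise ValueError("expression levels must be positive")
    if test_level >= fold_threshold * control:
        return "higher"
    if test_level <= control / fold_threshold:
        return "lower"
    return "unchanged"


# Controls average 3.0, so a test level of 9.0 is 3-fold higher.
print(classify_expression(9.0, [2.0, 4.0]))  # higher
```

Passing `fold_threshold=3.0` (or any of the higher multiples recited above) tightens the call accordingly.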

A "transcribed polynucleotide" or "nucleotide transcript" is a polynucleotide (e.g. an mRNA, hnRNA, a cDNA, or an analog of such RNA or cDNA) which is complementary to or homologous with all or a portion of a mature mRNA made by transcription of a marker of the invention and normal post-transcriptional processing (e.g. splicing), if any, of the RNA transcript, and reverse transcription of the RNA transcript.

"Complementary" refers to the broad concept of sequence complementarity between regions of two nucleic acid strands or between two regions of the same nucleic acid strand. It is known that an adenine residue of a first nucleic acid region is capable of forming specific hydrogen bonds ("base pairing") with a residue of a second nucleic acid region which is antiparallel to the first region if the residue is thymine or uracil. Similarly, it is known that a cytosine residue of a first nucleic acid strand is capable of base pairing with a residue of a second nucleic acid strand which is antiparallel to the first strand if the residue is guanine. A first region of a nucleic acid is complementary to a second region of the same or a different nucleic acid if, when the two regions are arranged in an antiparallel fashion, at least one nucleotide residue of the first region is capable of base pairing with a residue of the second region. Preferably, the first region comprises a first portion and the second region comprises a second portion, whereby, when the first and second portions are arranged in an antiparallel fashion, at least about 50%, and preferably at least about 75%, at least about 90%, or at least about 95% of the nucleotide residues of the first portion are capable of base pairing with nucleotide residues in the second portion. More preferably, all nucleotide residues of the first portion are capable of base pairing with nucleotide residues in the second portion.
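The antiparallel pairing test described above can be sketched as follows. This is an illustrative assumption, not part of the application: both regions are taken as equal-length, ungapped strings written 5' to 3', only the pairs named in the text (A with T or U, C with G) count, and the function name is hypothetical.

```python
# Watson-Crick pairs named in the text: A with T or U, and C with G.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
         ("C", "G"), ("G", "C")}


def percent_complementary(first: str, second: str) -> float:
    """Percent of residues in `first` that base-pair with `second` (hypothetical helper).

    Both regions are written 5'->3' and must be the same length. Arranging them
    antiparallel means residue i of `first` faces residue len(second) - 1 - i
    of `second`, hence the reversal below.
    """
    if not first or len(first) != len(second):
        raise ValueError("regions must be non-empty and of equal length")
    facing = zip(first.upper(), reversed(second.upper()))
    paired = sum((a, b) in PAIRS for a, b in facing)
    return 100.0 * paired / len(first)


print(percent_complementary("ATGC", "GCAT"))  # 100.0
print(percent_complementary("ATGC", "GCAA"))  # 75.0
```

Wobble pairs such as G-U are deliberately excluded, since the text does not mention them.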

"Homologous" as used herein, refers to nucleotide sequence similarity between two regions of the same nucleic acid strand or between regions of two different nucleic acid strands. When a nucleotide residue position in both regions is occupied by the same nucleotide residue, then the regions are homologous at that position. A first region is homologous to a second region if at least one nucleotide residue position of each region is occupied by the same residue. Homology between two regions is expressed in terms of the proportion of nucleotide residue positions of the two regions that are occupied by the same nucleotide residue. By way of example, a region having the nucleotide sequence 5'-ATTGCC-3' and a region having the nucleotide sequence 5'-TATGGC-3' share 50% homology. Preferably, the first region comprises a first portion and the second region comprises a second portion, whereby, at least about 50%, and preferably at least about 75%, at least about 90%, or at least about 95% of the nucleotide residue positions of each of the portions are occupied by the same nucleotide residue. More preferably, all nucleotide residue positions of each of the portions are occupied by the same nucleotide residue.
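The homology calculation above, including its worked example, can be reproduced with a short sketch. It assumes the two regions are already aligned position by position (same length, same 5' to 3' orientation, no gaps); the function name is hypothetical.

```python
def percent_homology(first: str, second: str) -> float:
    """Percent of positions occupied by the same residue in both regions (hypothetical helper).

    Assumes the regions are aligned position by position: same length,
    same 5'->3' orientation, and no gaps.
    """
    if not first or len(first) != len(second):
        raise ValueError("regions must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(first.upper(), second.upper()))
    return 100.0 * same / len(first)


# Worked example from the text: positions 3, 4 and 6 match, so 3 of 6 positions.
print(percent_homology("ATTGCC", "TATGGC"))  # 50.0
```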

"Proteins of the invention" encompass marker proteins and their fragments; variant marker proteins and their fragments; peptides and polypeptides comprising an at least 15 amino acid segment of a marker or variant marker protein; and fusion proteins comprising a marker or variant marker protein, or an at least 15 amino acid segment of a marker or variant marker protein.

The invention further provides antibodies, antibody derivatives and antibody fragments which specifically bind with the marker proteins and fragments of the marker proteins of the present invention. Unless otherwise specified herein, the terms "antibody" and "antibodies" broadly encompass naturally-occurring forms of antibodies (e.g., IgG, IgA, IgM, IgE) and recombinant antibodies such as single-chain antibodies, chimeric and humanized antibodies and multispecific antibodies, as well as fragments and derivatives of all of the foregoing, which fragments and derivatives have at least an antigen-binding site. Antibody derivatives may comprise a protein or chemical moiety conjugated to an antibody.

In one embodiment, the markers of the invention are genes or proteins associated with or involved in drug-induced toxicity. Such genes or proteins involved in drug-induced toxicity include, for example, the markers listed in table 2. In some embodiments, the markers of the invention are a combination of at least two, three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, or more of the foregoing genes (or proteins). All values presented in the foregoing list can also be the upper or lower limit of ranges that are intended to be a part of this invention, e.g., between 1 and 5, 1 and 10, 1 and 20, 1 and 30, 2 and 5, 2 and 10, 5 and 10, 5 and 20, 10 and 20, 10 and 25, or 10 and 30 of the foregoing genes (or proteins).

A. Cardiotoxicity Associated Markers

Accordingly, the invention provides methods for identifying an agent that causes or is at risk for causing drug-induced cardiotoxicity. In one embodiment, the agent is a drug or drug candidate. In these methods, the amount of one or more biomarkers/proteins in a pair of samples (a first sample not subjected to the drug treatment, and a second sample subjected to the drug treatment) is assessed. A modulation in the level of expression of the one or more biomarkers in the second sample as compared to the first sample is an indication that the drug causes or is at risk for causing drug-induced cardiotoxicity. In one embodiment, the one or more biomarkers is selected from the markers listed in table 2. The methods of the present invention can be practiced in conjunction with any other method used by the skilled practitioner to identify a drug at risk for causing drug-induced cardiotoxicity.

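Accordingly, in one aspect, the invention provides a method for identifying a drug that causes or is at risk for causing drug-induced cardiotoxicity, comprising: comparing (i) a level of expression of one or more biomarkers present in a first cell sample obtained prior to the treatment with the drug with (ii) a level of expression of the one or more biomarkers present in a second cell sample obtained following the treatment with the drug; wherein the one or more biomarkers is selected from the markers listed in table 2; and wherein a modulation in the level of expression of the one or more biomarkers in the second sample as compared to the first sample is an indication that the drug causes or is at risk for causing drug-induced cardiotoxicity.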

In one embodiment, the cells are cells of the cardiovascular system, e.g., cardiomyocytes. In one embodiment, the cells are diabetic cardiomyocytes. In one embodiment, the drug is a drug or candidate drug for treating diabetes, obesity or cardiovascular disease. In one embodiment, a modulation (e.g., an increase or a decrease) in the level of expression of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-five, thirty, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160 or more of the biomarkers selected from the markers listed in table 2 in the second sample as compared to the first sample is an indication that the drug causes or is at risk for causing drug-induced cardiotoxicity.

In one embodiment, a modulation (e.g., an increase or a decrease) in the level of expression of a panel of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or thirteen markers selected from the group consisting of TIMP1, PTX3, HSP76, FINC, CYB5, PAI1, IBP7 (IGFBP7), 1C17, EDIL3, HMOX1, NUCB1, CS010, and HSPA4 in the second sample as compared to the first sample is an indication that the drug causes or is at risk for causing drug-induced cardiotoxicity.
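The before/after panel comparison above can be illustrated with a minimal sketch. The application does not define a numeric cutoff for "modulation", so the 2-fold change (|log2 ratio| >= 1), the two-marker minimum, the function names and the marker levels below are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence


def modulated_markers(before: Dict[str, float], after: Dict[str, float],
                      panel: Sequence[str], min_log2_fc: float = 1.0) -> List[str]:
    """Return panel markers whose |log2(after / before)| >= min_log2_fc (hypothetical).

    A threshold of 1.0 corresponds to a 2-fold increase or decrease.
    """
    hits = []
    for marker in panel:
        b, a = before[marker], after[marker]
        if b <= 0 or a <= 0:
            raise ValueError(f"non-positive expression level for {marker}")
        if abs(math.log2(a / b)) >= min_log2_fc:
            hits.append(marker)
    return hits


def flags_cardiotoxicity_risk(before: Dict[str, float], after: Dict[str, float],
                              panel: Sequence[str], min_markers: int = 2,
                              min_log2_fc: float = 1.0) -> bool:
    """True if at least `min_markers` panel markers are modulated (hypothetical rule)."""
    return len(modulated_markers(before, after, panel, min_log2_fc)) >= min_markers


# Made-up levels for three panel members; not data from this application.
before = {"TIMP1": 10.0, "PTX3": 5.0, "HMOX1": 8.0}
after = {"TIMP1": 25.0, "PTX3": 5.0, "HMOX1": 3.0}
panel = ["TIMP1", "PTX3", "HMOX1"]
print(modulated_markers(before, after, panel))          # ['TIMP1', 'HMOX1']
print(flags_cardiotoxicity_risk(before, after, panel))  # True
```

Working on log2 ratios treats a 2-fold increase and a 2-fold decrease symmetrically, which matches the "increase or decrease" wording of the embodiment.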

Methods for identifying a rescue agent that can reduce or prevent drug-induced cardiotoxicity are also provided by the invention. In one embodiment, the drug is a drug or drug candidate for treating diabetes, obesity or a cardiovascular disorder. In these methods, the amount of one or more biomarkers in three samples (a first sample not subjected to the drug treatment, a second sample subjected to the drug treatment, and a third sample subjected both to the drug treatment and the agent) is assessed.

An approximately normalized level of expression of the one or more biomarkers in the third sample as compared to the first sample, for biomarkers whose level of expression is changed in the second sample, is an indication that the rescue agent can reduce or prevent drug-induced cardiotoxicity. In one embodiment, the one or more biomarkers is selected from the markers listed in table 2.

Accordingly, in another aspect, the invention provides a method for identifying an agent that can reduce or prevent drug-induced cardiotoxicity, comprising: (i) determining a normal level of expression of one or more biomarkers present in a first cell sample obtained prior to treatment with a toxicity-inducing drug; (ii) determining a treated level of expression of the one or more biomarkers present in a second cell sample obtained following treatment with the toxicity-inducing drug, to identify one or more biomarkers whose expression is changed in the treated cell sample; (iii) determining, in a third cell sample obtained following treatment with the toxicity-inducing drug and the rescue agent, the level of expression of the one or more biomarkers whose expression is changed in the drug-treated sample; and (iv) comparing the level of expression of the one or more biomarkers determined in the third sample with the level of expression of the one or more biomarkers determined in the first sample; wherein a normalized level of expression of the one or more biomarkers in the third sample as compared to the first sample is an indication that the agent can reduce or prevent drug-induced cardiotoxicity. In one embodiment, the one or more biomarkers is selected from the markers listed in table 2.
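Steps (i) to (iv) of this rescue-agent method can be sketched as follows. The sketch is hypothetical: "changed" is taken to mean at least a 2-fold change, "normalized" to mean within 0.5 log2 units of the normal level, and the function names and numbers are illustrative only.

```python
import math
from typing import Dict, List


def _log2_ratio(numerator: float, denominator: float) -> float:
    if numerator <= 0 or denominator <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(numerator / denominator)


def changed_markers(normal: Dict[str, float], treated: Dict[str, float],
                    min_log2_fc: float = 1.0) -> List[str]:
    """Step (ii): markers whose level changed after the toxic drug alone (hypothetical)."""
    return [m for m in normal
            if abs(_log2_ratio(treated[m], normal[m])) >= min_log2_fc]


def agent_normalizes(normal: Dict[str, float], treated: Dict[str, float],
                     rescued: Dict[str, float], min_log2_fc: float = 1.0,
                     tolerance_log2: float = 0.5) -> bool:
    """Steps (iii)-(iv): True if every changed marker returns near its normal level.

    `rescued` holds levels from the third sample (drug plus candidate agent).
    Returns False when the drug changed no markers, since there is then
    nothing for the agent to normalize.
    """
    changed = changed_markers(normal, treated, min_log2_fc)
    if not changed:
        return False
    return all(abs(_log2_ratio(rescued[m], normal[m])) <= tolerance_log2
               for m in changed)


# Illustrative numbers only.
normal = {"TIMP1": 10.0, "PTX3": 5.0}
treated = {"TIMP1": 30.0, "PTX3": 5.5}  # only TIMP1 changes (3-fold)
rescued = {"TIMP1": 11.0, "PTX3": 5.2}
print(changed_markers(normal, treated))            # ['TIMP1']
print(agent_normalizes(normal, treated, rescued))  # True
```

Only markers flagged in step (ii) are checked in the third sample, mirroring the restriction in step (iii).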

In one embodiment, the cells are cells of the cardiovascular system, e.g., cardiomyocytes. In one embodiment, the cells are diabetic cardiomyocytes. In one embodiment, the drug is a drug or candidate drug for treating diabetes, obesity or cardiovascular disease. In one embodiment, the drug is an anthracycline, 5-fluorouracil, cisplatin, trastuzumab, gemcitabine, rosiglitazone, pioglitazone, troglitazone, cabergoline, pergolide, sumatriptan, a bisphosphonate, or a TNF antagonist. In one embodiment, a normalized level of expression of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-five, thirty, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, or more of the biomarkers selected from the markers listed in table 2 in the third sample as compared to the first sample is an indication that the rescue agent can reduce or prevent drug-induced cardiotoxicity.

In one embodiment, a normalized level of expression of a panel of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or thirteen markers selected from the group consisting of TIMP1, PTX3, HSP76, FINC, CYB5, PAI1, IBP7 (IGFBP7), 1C17, EDIL3, HMOX1, NUCB1, CS010, and HSPA4 in the third sample as compared to the first sample is an indication that the rescue agent can reduce or prevent drug-induced cardiotoxicity.

In one embodiment, the sample comprises a fluid obtained from the subject. In one embodiment, the fluid is selected from the group consisting of blood fluids, vomit, saliva, lymph, cystic fluid, urine, fluids collected by bronchial lavage, fluids collected by peritoneal rinsing, and gynecological fluids. In one embodiment, the sample is a blood sample or a component thereof.

In another embodiment, the sample comprises a tissue or component thereof obtained from the subject. In one embodiment, the tissue is selected from the group consisting of bone, connective tissue, cartilage, lung, liver, kidney, muscle tissue, heart, pancreas, and skin.

In one embodiment, the subject is a human.

In one embodiment, the level of expression of the one or more markers in the biological sample is determined by assaying a transcribed polynucleotide or a portion thereof in the sample. In one embodiment, assaying the transcribed polynucleotide comprises amplifying the transcribed polynucleotide.

In one embodiment, the level of expression of the marker in the subject sample is determined by assaying a protein or a portion thereof in the sample. In one embodiment, the protein is assayed using a reagent which specifically binds with the protein.

In one embodiment, the level of expression of the one or more markers in the sample is determined using a technique selected from the group consisting of polymerase chain reaction (PCR) amplification, reverse-transcriptase PCR (RT-PCR) analysis, single-strand conformation polymorphism (SSCP) analysis, mismatch cleavage detection, heteroduplex analysis, Southern blot analysis, Northern blot analysis, Western blot analysis, in situ hybridization, array analysis, deoxyribonucleic acid sequencing, restriction fragment length polymorphism analysis, and combinations or subcombinations thereof.
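The application does not say how RT-PCR data would be converted into an expression level. One widely used option, shown here only as a sketch, is the comparative Ct (2^-ΔΔCt) method: the target's threshold cycle is normalized to a reference (housekeeping) gene in each sample, and the difference between treated and control samples is converted into a fold change. The choice of method, the reference gene, the function name and the Ct values are assumptions, and the method itself presumes roughly 100% amplification efficiency.

```python
def relative_expression_ddct(ct_target_treated: float, ct_ref_treated: float,
                             ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target transcript, treated vs. control, by 2^-ddCt (hypothetical).

    Assumes roughly 100% (doubling-per-cycle) amplification efficiency for
    both the target and the reference gene.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated  # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# After normalization the target crosses threshold 2 cycles earlier when treated,
# i.e. about 4-fold more transcript.
print(relative_expression_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0
```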

In one embodiment, the level of expression of the marker in the sample is determined using a technique selected from the group consisting of immunohistochemistry, immunocytochemistry, flow cytometry, ELISA and mass spectrometry.

In one embodiment, the level of expression of a plurality of markers is determined.

cardiotoxicity-inducing drug. In one embodiment, the Coenzyme Q10 is administered to a subject prior to treatment of the subject with a cardiotoxicity-inducing drug. In one embodiment, the drug-induced cardiotoxicity is associated with modulation of expression of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-five, thirty, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, or more of the biomarkers selected from the markers listed in table 2. All values presented in the foregoing list can also be the upper or lower limit of ranges that are intended to be a part of this invention, e.g., between 1 and 5, 1 and 10, 2 and 5, 2 and 10, or 5 and 10 of the foregoing genes (or proteins).

In one embodiment, the drug-induced cardiotoxicity is associated with modulation of a panel of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or thirteen markers selected from the group consisting of TIMP1, PTX3, HSP76, FINC, CYB5, PAI1, IBP7 (IGFBP7), 1C17, EDIL3, HMOX1, NUCB1, CS010, and HSPA4.

The invention further provides biomarkers (e.g., genes and/or proteins) that are useful as predictive markers for drug-induced cardiotoxicity. These biomarkers include the markers listed in table 2. In one embodiment, the predictive markers for drug-induced cardiotoxicity are a panel of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or thirteen markers selected from the group consisting of TIMP1, PTX3, HSP76, FINC, CYB5, PAI1, IBP7 (IGFBP7), 1C17, EDIL3, HMOX1, NUCB1, CS010, and HSPA4. The skilled artisan would, however, be able to identify additional biomarkers predictive of drug-induced cardiotoxicity by employing the methods described herein, e.g., by carrying out the methods described in Example 3 using a different drug known to induce cardiotoxicity. Exemplary drug-induced cardiotoxicity biomarkers of the invention are further described below.

GRP78 and GRP75 are also referred to as glucose-regulated proteins. These proteins are associated with endoplasmic/sarcoplasmic reticulum stress (ER stress) in cardiomyocytes. SERCA, or sarcoendoplasmic reticulum calcium ATPase, regulates Ca2+ homeostasis in cardiac cells, and disruption of these ATPases can lead to cardiac dysfunction and heart failure. Based upon the data provided herein, GRP75 and GRP78 and the edges around them in the network are novel predictors of drug-induced cardiotoxicity.

TIMP1, also referred to as TIMP metallopeptidase inhibitor 1 (tissue inhibitor of metalloproteinases 1), is involved in remodeling of the extracellular matrix in association with matrix metalloproteinases (MMPs). TIMP1 expression is correlated with fibrosis of the heart, and hypoxia of vascular endothelial cells also induces TIMP1 expression. Based upon the data provided herein, TIMP1 is a novel predictor of drug-induced cardiotoxicity.

PTX3, also referred to as pentraxin 3, belongs to the pentraxin family, which also includes C-reactive protein (CRP), and is a good marker of an inflammatory condition of the heart.

However, plasma PTX3 could also reflect a systemic inflammatory response due to sepsis or other medical conditions. Based upon the data provided herein, PTX3 may be a novel marker of cardiac function or cardiotoxicity. Additionally, the edges associated with PTX3 in the network could form a novel panel of biomarkers.

HSP76, also referred to as HSPA6, is known to be expressed only in endothelial cells and B lymphocytes. There is no known role for this protein in cardiac function. Based upon the data provided herein, HSP76 may be a novel predictor of drug-induced cardiotoxicity.

PDIA4 and PDIA1, members of the protein disulfide isomerase family A, are associated with the ER stress response, like the GRPs. There is no known role for these proteins in cardiac function. Based upon the data provided herein, these proteins may be novel predictors of drug-induced cardiotoxicity.

CA2D1 is also referred to as calcium channel, voltage-dependent, alpha 2/delta subunit 1. The alpha-2/delta subunit of the voltage-dependent calcium channel regulates calcium current density and the activation/inactivation kinetics of the channel, and CA2D1 plays an important role in excitation-contraction coupling in the heart. However, no role for this protein in drug-induced cardiac dysfunction is known. Based upon the data provided herein, CA2D1 is a novel predictor of drug-induced cardiotoxicity.

GPAT1 is one of four known glycerol-3-phosphate acyltransferase isoforms and is located on the mitochondrial outer membrane, allowing reciprocal regulation with carnitine palmitoyltransferase-1. GPAT1 is upregulated transcriptionally by insulin and SREBP-1c and downregulated acutely by AMP-activated protein kinase, consistent with a role in triacylglycerol synthesis. Based upon the data provided herein, GPAT1 is a novel predictor of drug-induced cardiotoxicity.

TAZ, also referred to as tafazzin, is highly expressed in cardiac and skeletal muscle. TAZ is involved in the metabolism of cardiolipin and functions as a phospholipid-lysophospholipid transacylase; it is responsible for remodeling of the phospholipid cardiolipin (CL), the signature lipid of the mitochondrial inner membrane. Based upon the data provided herein, TAZ is a novel predictor of drug-induced cardiotoxicity.

Various aspects of the invention are described in further detail in the following subsections.

B. Isolated Nucleic Acid Molecules

One aspect of the invention pertains to isolated nucleic acid molecules, including nucleic acids which encode a marker protein or a portion thereof. Isolated nucleic acids of the invention also include nucleic acid molecules sufficient for use as hybridization probes to identify marker nucleic acid molecules, and fragments of marker nucleic acid molecules, e.g., those suitable for use as PCR primers for the amplification or mutation of marker nucleic acid molecules. As used herein, the term "nucleic acid molecule" is intended to include DNA molecules (e.g., cDNA or genomic DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule can be single-stranded or double-stranded, but preferably is double-stranded DNA.

An "isolated" nucleic acid molecule is one which is separated from other nucleic acid molecules which are present in the natural source of the nucleic acid molecule. In one embodiment, an "isolated" nucleic acid molecule is free of sequences (preferably protein-encoding sequences) which naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. In another embodiment, an "isolated" nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. A nucleic acid molecule that is substantially free of cellular material includes preparations having less than about 30%, 20%, 10%, or 5% of heterologous nucleic acid (also referred to herein as a "contaminating nucleic acid").

A nucleic acid molecule of the present invention can be isolated using standard molecular biology techniques and the sequence information in the database records described herein. Using all or a portion of such nucleic acid sequences, nucleic acid molecules of the invention can be isolated using standard hybridization and cloning techniques (e.g., as described in Sambrook et al., ed., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989).

A nucleic acid molecule of the invention can be amplified using cDNA, mRNA, or genomic DNA as a template and appropriate oligonucleotide primers according to standard PCR amplification techniques. The nucleic acid so amplified can be cloned into an appropriate vector and characterized by DNA sequence analysis. Furthermore, nucleotides corresponding to all or a portion of a nucleic acid molecule of the invention can be prepared by standard synthetic techniques, e.g., using an automated DNA synthesizer.
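How a pair of "appropriate oligonucleotide primers" relates to a template can be shown with a naive sketch: the forward primer copies the 5' end of the template and the reverse primer is the reverse complement of its 3' end, so the two primers extend toward each other. The function names are hypothetical, and real primer design also weighs melting temperature, GC content, specificity and secondary structure, none of which is modeled here.

```python
from typing import Tuple

DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def primer_pair(template: str, length: int = 20) -> Tuple[str, str]:
    """Naive forward/reverse primers flanking the whole template (hypothetical helper).

    The forward primer is the first `length` bases of the template; the reverse
    primer is the reverse complement of the last `length` bases, so both primers
    read 5'->3' toward each other.
    """
    if length <= 0 or 2 * length > len(template):
        raise ValueError("template too short for two non-overlapping primers")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse


print(primer_pair("ATGCGTACGTTAGC", length=4))  # ('ATGC', 'GCTA')
```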

In another preferred embodiment, an isolated nucleic acid molecule of the invention comprises a nucleic acid molecule which has a nucleotide sequence complementary to the nucleotide sequence of a marker nucleic acid or to the nucleotide sequence of a nucleic acid encoding a marker protein. A nucleic acid molecule which is complementary to a given nucleotide sequence is one which is sufficiently complementary to the given nucleotide sequence that it can hybridize to the given nucleotide sequence, thereby forming a stable duplex.

Moreover, a nucleic acid molecule of the invention can comprise only a portion of a nucleic acid sequence, wherein the full length nucleic acid sequence comprises a marker nucleic acid or which encodes a marker protein. Such nucleic acids can be used, for example, as a probe or primer. The probe/primer typically is used as one or more substantially purified oligonucleotides. The oligonucleotide typically comprises a region of nucleotide sequence that hybridizes under stringent conditions to at least about 7, preferably about 15, more preferably about 25, 50, 75, 100, 125, 150, 175, 200, 250, 300, 350, or 400 or more consecutive nucleotides of a nucleic acid of the invention.
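Candidate probe or primer regions spanning a given number of consecutive nucleotides can be enumerated with a sliding window, as in the hypothetical sketch below. The function name, the default window of 15 nucleotides (one of the lengths recited above) and the step size are illustrative assumptions; whether a window actually hybridizes under stringent conditions must still be determined experimentally.

```python
from typing import List, Tuple


def candidate_probes(sequence: str, k: int = 15, step: int = 1) -> List[Tuple[int, str]]:
    """All windows of k consecutive nucleotides with 0-based start offsets (hypothetical).

    Returns an empty list when the sequence is shorter than k.
    """
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    seq = sequence.upper()
    return [(i, seq[i:i + k]) for i in range(0, len(seq) - k + 1, step)]


print(candidate_probes("ACGTACGT", k=7))  # [(0, 'ACGTACG'), (1, 'CGTACGT')]
```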

Probes based on the sequence of a nucleic acid molecule of the invention can be used to detect transcripts or genomic sequences corresponding to one or more markers of the invention. The probe comprises a label group attached thereto, e.g., a radioisotope, a fluorescent compound, an enzyme, or an enzyme co-factor. Such probes can be used as part of a diagnostic test kit for identifying cells or tissues which mis-express the protein, such as by measuring levels of a nucleic acid molecule encoding the protein in a sample of cells from a subject, e.g., detecting mRNA levels or determining whether a gene encoding the protein has been mutated or deleted.

The invention further encompasses nucleic acid molecules that differ, due to degeneracy of the genetic code, from the nucleotide sequence of nucleic acids encoding a marker protein, and thus encode the same protein.
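Degeneracy of the genetic code can be illustrated with two different coding sequences that translate to the same peptide. The sketch uses a deliberately partial codon table from the standard genetic code, covering only the codons in the example; the function name and sequences are hypothetical.

```python
# Partial standard-genetic-code table covering only the codons used below.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}


def translate(cds: str) -> str:
    """Translate a coding sequence until the first stop codon (hypothetical helper)."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        residue = CODON_TABLE[cds[i:i + 3]]  # KeyError for codons not in this partial table
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


seq_a = "ATGGCTAAATAA"
seq_b = "ATGGCGAAGTAG"  # differs from seq_a at three third-codon positions
print(translate(seq_a), translate(seq_b))  # MAK MAK
```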

It will be appreciated by those skilled in the art that DNA sequence polymorphisms that lead to changes in the amino acid sequence can exist within a population (e.g., the human population). Such genetic polymorphisms can exist among individuals within a population due to natural allelic variation. An allele is one of a group of genes which occur alternatively at a given genetic locus. In addition, it will be appreciated that DNA polymorphisms that affect RNA expression levels can also exist that may affect the overall expression level of that gene (e.g., by affecting regulation or degradation).

As used herein, the phrase "allelic variant" refers to a nucleotide sequence which occurs at a given locus or to a polypeptide encoded by the nucleotide sequence.

As used herein, the terms "gene" and "recombinant gene" refer to nucleic acid molecules comprising an open reading frame encoding a polypeptide corresponding to a marker of the invention. Natural allelic variations can typically result in 1-5% variance in the nucleotide sequence of a given gene. Alternative alleles can be identified by sequencing the gene of interest in a number of different individuals. This can be readily carried out by using hybridization probes to identify the same genetic locus in a variety of individuals. Any and all such nucleotide variations and resulting amino acid polymorphisms or variations that are the result of natural allelic variation and that do not alter the functional activity are intended to be within the scope of the invention.

In another embodiment, an isolated nucleic acid molecule of the invention is at least 7, 15, 20, 25, 30, 40, 60, 80, 100, 150, 200, 250, 300, 350, 400, 450, 550, 650, 700, 800, 900, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800, 3000, 3500, 4000, 4500, or more nucleotides in length and hybridizes under stringent conditions to a marker nucleic acid or to a nucleic acid encoding a marker protein. As used herein, the term "hybridizes under stringent conditions" is intended to describe conditions for hybridization and washing under which nucleotide sequences at least 60% (65%, 70%, preferably 75%) identical to each other typically remain hybridized to each other. Such stringent conditions are known to those skilled in the art and can be found in sections 6.3.1-6.3.6 of Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989). A preferred, non-limiting example of stringent hybridization conditions is hybridization in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 50-65°C.

In addition to naturally-occurring allelic variants of a nucleic acid molecule of the invention that can exist in the population, the skilled artisan will further appreciate that sequence changes can be introduced by mutation thereby leading to changes in the amino acid sequence of the encoded protein, without altering the biological activity of the protein encoded thereby. For example, one can make nucleotide substitutions leading to amino acid substitutions at "non-essential" amino acid residues. A "non-essential" amino acid residue is a residue that can be altered from the wild-type sequence without altering the biological activity, whereas an "essential" amino acid residue is required for biological activity. For example, amino acid residues that are not conserved or only semi-conserved among homologs of various species may be non-essential for activity and thus would be likely targets for alteration. Alternatively, amino acid residues that are conserved among the homologs of various species (e.g., murine and human) may be essential for activity and thus would not be likely targets for alteration.

Accordingly, another aspect of the invention pertains to nucleic acid molecules encoding a variant marker protein that contain changes in amino acid residues that are not essential for activity. Such variant marker proteins differ in amino acid sequence from the naturally-occurring marker proteins, yet retain biological activity. In one embodiment, such a variant marker protein has an amino acid sequence that is at least about 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical to the amino acid sequence of a marker protein.

An isolated nucleic acid molecule encoding a variant marker protein can be created by introducing one or more nucleotide substitutions, additions or deletions into the nucleotide sequence of marker nucleic acids, such that one or more amino acid residue substitutions, additions, or deletions are introduced into the encoded protein. Mutations can be introduced by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis. Preferably, conservative amino acid substitutions are made at one or more predicted non-essential amino acid residues. A "conservative amino acid substitution" is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), non-polar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Alternatively, mutations can be introduced randomly along all or part of the coding sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for biological activity to identify mutants that retain activity. Following mutagenesis, the encoded protein can be expressed recombinantly and the activity of the protein can be determined.
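The side-chain families listed above can be encoded directly to test whether a proposed substitution is conservative. The sketch below is hypothetical: it uses one-letter amino acid codes, treats a substitution as conservative when the two residues share at least one of the listed families, and does not attempt any scoring beyond that.

```python
# Side-chain families listed in the text, as one-letter amino acid codes.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "non-polar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues share a side-chain family (hypothetical helper)."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # not a substitution at all
    return any(original in family and replacement in family
               for family in SIDE_CHAIN_FAMILIES.values())


print(is_conservative("K", "R"))  # True  (both basic)
print(is_conservative("D", "K"))  # False (acidic vs. basic)
print(is_conservative("T", "V"))  # True  (both beta-branched)
```

Because the families overlap (histidine is both basic and aromatic, for example), checking membership in any shared family is simpler than assigning each residue a single class.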

The present invention encompasses antisense nucleic acid molecules, i.e., molecules which are complementary to a sense nucleic acid of the invention, e.g., complementary to the coding strand of a double-stranded marker cDNA molecule or complementary to a marker mRNA sequence. Accordingly, an antisense nucleic acid of the invention can hydrogen bond to (i.e., anneal with) a sense nucleic acid of the invention. The antisense nucleic acid can be complementary to an entire coding strand, or to only a portion thereof, e.g., all or part of the protein coding region (or open reading frame). An antisense nucleic acid molecule can also be antisense to all or part of a non-coding region of the coding strand of a nucleotide sequence encoding a marker protein. The non-coding regions ("5' and 3' untranslated regions") are the 5' and 3' sequences which flank the coding region and are not translated into amino acids.
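An antisense sequence to all or part of an mRNA is its reverse complement, written 5' to 3'. The sketch below is a hypothetical illustration using unmodified ribonucleotides only; the function name and the example mRNA are assumptions, and it does not model the chemically modified nucleotides described in this application.

```python
from typing import Optional

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna: str, start: int = 0, end: Optional[int] = None) -> str:
    """Antisense RNA (5'->3') to mrna[start:end], itself written 5'->3' (hypothetical)."""
    target = mrna.upper()[start:end]
    if not target or set(target) - set("ACGU"):
        raise ValueError("target region must be a non-empty A/C/G/U sequence")
    return target.translate(RNA_COMPLEMENT)[::-1]


mrna = "AUGGCUAAAUAA"
print(antisense_rna(mrna, 0, 6))  # AGCCAU (antisense to AUGGCU)
```

Selecting `start` and `end` corresponds to targeting only a portion of the transcript, such as part of the coding region or an untranslated region.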

An antisense oligonucleotide can be, for example, about 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 or more nucleotides in length. An antisense nucleic acid of the invention can be constructed using chemical synthesis and enzymatic ligation reactions using procedures known in the art. For example, an antisense nucleic acid (e.g., an antisense oligonucleotide) can be chemically synthesized using naturally occurring nucleotides or variously modified nucleotides designed to increase the biological stability of the molecules or to increase the physical stability of the duplex formed between the antisense and sense nucleic acids, e.g., phosphorothioate derivatives and acridine-substituted nucleotides can be used. Examples of modified nucleotides which can be used to generate the antisense nucleic acid include 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, 3-(3-amino-3-N-2-carboxypropyl)uracil, (acp3)w, and 2,6-diaminopurine.
Alternatively, the antisense nucleic acid can be produced biologically using an expression vector into which a nucleic acid has been sub-cloned in an antisense orientation (i.e., RNA transcribed from the inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest, described further in the following subsection).

The antisense nucleic acid molecules of the invention are typically administered to a subject or generated in situ such that they hybridize with or bind to cellular mRNA and/or genomic DNA encoding a marker protein to thereby inhibit expression of the marker, e.g., by inhibiting transcription and/or translation. The hybridization can be by conventional nucleotide complementarity to form a stable duplex, or, for example, in the case of an antisense nucleic acid molecule which binds to DNA duplexes, through specific interactions in the major groove of the double helix. Examples of routes of administration of antisense nucleic acid molecules of the invention include direct injection at a tissue site or infusion of the antisense nucleic acid into a body fluid associated with the toxicity state. Alternatively, antisense nucleic acid molecules can be modified to target selected cells and then administered systemically. For example, for systemic administration, antisense molecules can be modified such that they specifically bind to receptors or antigens expressed on a selected cell surface, e.g., by linking the antisense nucleic acid molecules to peptides or antibodies which bind to cell surface receptors or antigens. The antisense nucleic acid molecules can also be delivered to cells using the vectors described herein. To achieve sufficient intracellular concentrations of the antisense molecules, vector constructs in which the antisense nucleic acid molecule is placed under the control of a strong pol II or pol III promoter are preferred.

An antisense nucleic acid molecule of the invention can be an α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (Gaultier et al., 1987, Nucleic Acids Res. 15:6625-6641). The antisense nucleic acid molecule can also comprise a 2'-O-methylribonucleotide (Inoue et al., 1987, Nucleic Acids Res. 15:6131-6148) or a chimeric RNA-DNA analogue (Inoue et al., 1987, FEBS Lett. 215:327-330).

The invention also encompasses ribozymes. Ribozymes are catalytic RNA molecules with ribonuclease activity which are capable of cleaving a single-stranded nucleic acid, such as an mRNA, to which they have a complementary region. Thus, ribozymes (e.g., hammerhead ribozymes as described in Haselhoff and Gerlach, 1988, Nature 334:585-591) can be used to catalytically cleave mRNA transcripts to thereby inhibit translation of the protein encoded by the mRNA. A ribozyme having specificity for a nucleic acid molecule encoding a marker protein can be designed based upon the nucleotide sequence of a cDNA corresponding to the marker. For example, a derivative of a Tetrahymena L-19 IVS RNA can be constructed in which the nucleotide sequence of the active site is complementary to the nucleotide sequence to be cleaved (see Cech et al., U.S. Patent No. 4,987,071; and Cech et al., U.S. Patent No. 5,116,742).

Alternatively, an mRNA encoding a polypeptide of the invention can be used to select a catalytic RNA having a specific ribonuclease activity from a pool of RNA molecules (see, e.g., Bartel and Szostak, 1993, Science 261:1411-1418).

The invention also encompasses nucleic acid molecules which form triple helical structures. For example, expression of a marker of the invention can be inhibited by targeting nucleotide sequences complementary to the regulatory region of the gene encoding the marker nucleic acid or protein (e.g., the promoter and/or enhancer) to form triple helical structures that prevent transcription of the gene in target cells. See generally Helene (1991) Anticancer Drug Des. 6(6):569-84; Helene (1992) Ann. N.Y. Acad. Sci. 660:27-36; and Maher (1992) BioEssays 14(12):807-15.

In various embodiments, the nucleic acid molecules of the invention can be modified at the base moiety, sugar moiety or phosphate backbone to improve, e.g., the stability, hybridization, or solubility of the molecule. For example, the deoxyribose phosphate backbone of the nucleic acids can be modified to generate peptide nucleic acids (see Hyrup et al., 1996, Bioorganic & Medicinal Chemistry 4(1):5-23). As used herein, the terms "peptide nucleic acids" or "PNAs" refer to nucleic acid mimics, e.g., DNA mimics, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide backbone and only the four natural nucleobases are retained. The neutral backbone of PNAs has been shown to allow for specific hybridization to DNA and RNA under conditions of low ionic strength. The synthesis of PNA oligomers can be performed using standard solid phase peptide synthesis protocols as described in Hyrup et al. (1996), supra; Perry-O'Keefe et al. (1996) Proc. Natl. Acad. Sci. USA 93:14670-14675.

PNAs can be used in therapeutic and diagnostic applications. For example, PNAs can be used as antisense or antigene agents for sequence-specific modulation of gene expression by, e.g., inducing transcription or translation arrest or inhibiting replication. PNAs can also be used, e.g., in the analysis of single base pair mutations in a gene by, e.g., PNA-directed PCR clamping; as artificial restriction enzymes when used in combination with other enzymes, e.g., S1 nucleases (Hyrup (1996), supra); or as probes or primers for DNA sequencing and hybridization (Hyrup, 1996, supra; Perry-O'Keefe et al., 1996, Proc. Natl. Acad. Sci. USA 93:14670-14675).

In another embodiment, PNAs can be modified, e.g., to enhance their stability or cellular uptake, by attaching lipophilic or other helper groups to PNA, by the formation of PNA-DNA chimeras, or by the use of liposomes or other techniques of drug delivery known in the art. For example, PNA-DNA chimeras can be generated which combine the advantageous properties of PNA and DNA. Such chimeras allow DNA recognition enzymes, e.g., RNase H and DNA polymerases, to interact with the DNA portion while the PNA portion provides high binding affinity and specificity. PNA-DNA chimeras can be linked using linkers of appropriate lengths selected in terms of base stacking, number of bonds between the nucleobases, and orientation (Hyrup, 1996, supra). The synthesis of PNA-DNA chimeras can be performed as described in Hyrup (1996), supra, and Finn et al. (1996) Nucleic Acids Res. 24(17):3357-63. For example, a DNA chain can be synthesized on a solid support using standard phosphoramidite coupling chemistry and modified nucleoside analogs. Compounds such as 5'-(4-methoxytrityl)amino-5'-deoxy-thymidine phosphoramidite can be used as a link between the PNA and the 5' end of DNA (Mag et al., 1989, Nucleic Acids Res. 17:5973-88). PNA monomers are then coupled in a stepwise manner to produce a chimeric molecule with a 5' PNA segment and a 3' DNA segment (Finn et al., 1996, Nucleic Acids Res. 24(17):3357-63). Alternatively, chimeric molecules can be synthesized with a 5' DNA segment and a 3' PNA segment (Petersen et al., 1995, Bioorganic Med. Chem. Lett. 5:1119-1124).

In other embodiments, the oligonucleotide can include other appended groups such as peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating transport across the cell membrane (see, e.g., Letsinger et al., 1989, Proc. Natl. Acad. Sci. USA 86:6553-6556; Lemaitre et al., 1987, Proc. Natl. Acad. Sci. USA 84:648-652; PCT Publication No. WO 88/09810) or the blood-brain barrier (see, e.g., PCT Publication No. WO 89/10134). In addition, oligonucleotides can be modified with hybridization-triggered cleavage agents (see, e.g., Krol et al., 1988, BioTechniques 6:958-976) or intercalating agents (see, e.g., Zon, 1988, Pharm. Res. 5:539-549). To this end, the oligonucleotide can be conjugated to another molecule, e.g., a peptide, hybridization-triggered cross-linking agent, transport agent, hybridization-triggered cleavage agent, etc.

The invention also includes molecular beacon nucleic acids having at least one region which is complementary to a nucleic acid of the invention, such that the molecular beacon is useful for quantitating the presence of the nucleic acid of the invention in a sample. A "molecular beacon" nucleic acid is a nucleic acid comprising a pair of complementary regions and having a fluorophore and a fluorescent quencher associated therewith. The fluorophore and quencher are associated with different portions of the nucleic acid in such an orientation that when the complementary regions are annealed with one another, fluorescence of the fluorophore is quenched by the quencher. When the complementary regions of the nucleic acid are not annealed with one another, fluorescence of the fluorophore is quenched to a lesser degree. Molecular beacon nucleic acids are described, for example, in U.S. Patent 5,876,930.
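The molecular beacon described above can be pictured as a stem-loop: two complementary end regions (the stem) flank a loop that is complementary to the target. The sketch below checks that layout for a DNA beacon, assuming exact Watson-Crick complementarity, a symmetric stem of known length, and a loop that is the probe region; the fluorophore and quencher are not modeled, and `check_beacon` is a hypothetical helper, not a method from the cited patent.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (5'->3' in, 5'->3' out)."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def check_beacon(beacon: str, stem_len: int, target: str) -> dict:
    """Split a hypothetical beacon into 5' arm, loop and 3' arm and test pairing.

    stem_closes: the two arms can anneal to each other (hairpin, quenched state).
    loop_binds_target: the loop is complementary to the target sequence.
    """
    beacon = beacon.upper()
    if stem_len < 1 or len(beacon) <= 2 * stem_len:
        raise ValueError("beacon too short for the given stem length")
    arm5 = beacon[:stem_len]
    loop = beacon[stem_len:-stem_len]
    arm3 = beacon[-stem_len:]
    return {
        "stem_closes": arm3 == reverse_complement(arm5),
        "loop_binds_target": loop == reverse_complement(target),
    }
```

For a made-up beacon `GCGAG` + `TTGCAT` + `CTCGC` and target `ATGCAA`, both checks return `True`; changing one base of the target makes `loop_binds_target` false.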

C. Isolated Proteins and Antibodies

One aspect of the invention pertains to isolated marker proteins and biologically active portions thereof, as well as polypeptide fragments suitable for use as immunogens to raise antibodies directed against a marker protein or a fragment thereof. In one embodiment, the native marker protein can be isolated from cells or tissue sources by an appropriate purification scheme using standard protein purification techniques. In another embodiment, a protein or peptide comprising the whole or a segment of the marker protein is produced by recombinant DNA techniques. As an alternative to recombinant expression, such a protein or peptide can be synthesized chemically using standard peptide synthesis techniques.

An "isolated" or "purified" protein or biologically active portion thereof is substantially free of cellular material or other contaminating proteins from the cell or tissue source from which the protein is derived, or substantially free of chemical precursors or other chemicals when chemically synthesized. The language "substantially free of cellular material" includes preparations of protein in which the protein is separated from cellular components of the cells from which it is isolated or recombinantly produced. Thus, protein that is substantially free of cellular material includes preparations of protein having less than about 30%, 20%, 10%, or 5% (by dry weight) of heterologous protein (also referred to herein as a "contaminating protein"). When the protein or biologically active portion thereof is recombinantly produced, it is also preferably substantially free of culture medium, i.e., culture medium represents less than about 20%, 10%, or 5% of the volume of the protein preparation. When the protein is produced by chemical synthesis, it is preferably substantially free of chemical precursors or other chemicals, i.e., it is separated from chemical precursors or other chemicals which are involved in the synthesis of the protein. Accordingly, such preparations of the protein have less than about 30%, 20%, 10%, or 5% (by dry weight) of chemical precursors or compounds other than the polypeptide of interest.
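The purity thresholds above are simple dry-weight percentages. The sketch below computes the contaminating fraction of a preparation and reports the strictest of the listed thresholds (30%, 20%, 10%, 5%) that it falls below. Treating "less than about" as a strict `<` with no tolerance is an assumption, and both function names are hypothetical.

```python
def contaminant_percent(contaminant_mg: float, total_dry_mg: float) -> float:
    """Percent (by dry weight) of heterologous protein in a preparation."""
    if total_dry_mg <= 0:
        raise ValueError("total dry weight must be positive")
    return 100.0 * contaminant_mg / total_dry_mg


def strictest_threshold_met(percent: float, thresholds=(30, 20, 10, 5)):
    """Return the lowest listed threshold (in %) that `percent` is below, or None."""
    met = [t for t in thresholds if percent < t]
    return min(met) if met else None
```

For instance, a hypothetical 50 mg (dry weight) preparation containing 3 mg of contaminating protein is 6% contaminant, which satisfies the 10% level but not the 5% level.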

Biologically active portions of a marker protein include polypeptides comprising amino acid sequences sufficiently identical to or derived from the amino acid sequence of the marker protein, which include fewer amino acids than the full length protein, and exhibit at least one activity of the corresponding full-length protein. Typically, biologically active portions comprise a domain or motif with at least one activity of the corresponding full-length protein. A biologically active portion of a marker protein of the invention can be a polypeptide which is, for example, 10, 25, 50, 100 or more amino acids in length. Moreover, other biologically active portions, in which other regions of the marker protein are deleted, can be prepared by recombinant techniques and evaluated for one or more of the functional activities of the native form of the marker protein.

Preferred marker proteins are encoded by nucleotide sequences comprising the sequences encoding any of the genes described in the examples. Other useful proteins are substantially identical (e.g. , at least about 40%, preferably 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%) to one of these sequences and retain the functional activity of the corresponding naturally-occurring marker protein yet differ in amino acid sequence due to natural allelic variation or mutagenesis.

To determine the percent identity of two amino acid sequences or of two nucleic acids, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a first amino acid or nucleic acid sequence for optimal alignment with a second amino acid or nucleic acid sequence). The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, the molecules are identical at that position. Preferably, the percent identity between the two sequences is calculated using a global alignment. Alternatively, the percent identity between the two sequences is calculated using a local alignment. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = number of identical positions / total number of positions (e.g., overlapping positions) × 100). In one embodiment, the two sequences are the same length. In another embodiment, the two sequences are not the same length.

The determination of percent identity between two sequences can be accomplished using a mathematical algorithm. A preferred, non-limiting example of a mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264-2268, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877. Such an algorithm is incorporated into the BLASTN and BLASTX programs of Altschul et al. (1990) J. Mol. Biol. 215:403-410. BLAST nucleotide searches can be performed with the BLASTN program, score = 100, wordlength = 12, to obtain nucleotide sequences homologous to nucleic acid molecules of the invention. BLAST protein searches can be performed with the BLASTP program, score = 50, wordlength = 3, to obtain amino acid sequences homologous to protein molecules of the invention. To obtain gapped alignments for comparison purposes, a newer version of the BLAST algorithm called Gapped BLAST can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402, which is able to perform gapped local alignments for the programs BLASTN, BLASTP and BLASTX. Alternatively, PSI-BLAST can be used to perform an iterated search which detects distant relationships between molecules. When utilizing the BLAST, Gapped BLAST, and PSI-BLAST programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) can be used. See http://www.ncbi.nlm.nih.gov. Another preferred, non-limiting example of a mathematical algorithm utilized for the comparison of sequences is the algorithm of Myers and Miller (1988) CABIOS 4:11-17. Such an algorithm is incorporated into the ALIGN program (version 2.0), which is part of the GCG sequence alignment software package. When utilizing the ALIGN program for comparing amino acid sequences, a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 can be used. Yet another useful algorithm for identifying regions of local sequence similarity and alignment is the FASTA algorithm as described in Pearson and Lipman (1988) Proc. Natl. Acad. Sci. USA 85:2444-2448. When using the FASTA algorithm for comparing nucleotide or amino acid sequences, a PAM120 weight residue table can, for example, be used with a k-tuple value of 2.
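To make the percent identity formula concrete, the sketch below performs a textbook Needleman-Wunsch global alignment and then computes identical positions divided by aligned positions, times 100. It is an illustration only: the match, mismatch and linear gap scores are arbitrary assumptions (not BLAST, ALIGN, or PAM120 parameters), gap columns are counted in the denominator (one reading of "total number of positions"), and the function names are hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """% identity = identical positions / aligned positions x 100 (gap columns included)."""
    if not aln_a or len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * identical / len(aln_a)
```

With these scores, `ACGT` and `ACT` align as `ACGT` / `AC-T`, giving 3 identical positions out of 4, or 75% identity. Dividing by only the non-gap (overlapping) columns instead would give 100%, which illustrates why the choice of denominator should be stated.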

The percent identity between two sequences can be determined using techniques similar to those described above, with or without allowing gaps. In calculating percent identity, only exact matches are counted.

The invention also provides chimeric or fusion proteins comprising a marker protein or a segment thereof. As used herein, a "chimeric protein" or "fusion protein" comprises all or part (preferably a biologically active part) of a marker protein operably linked to a heterologous polypeptide (i.e., a polypeptide other than the marker protein). Within the fusion protein, the term "operably linked" is intended to indicate that the marker protein or segment thereof and the heterologous polypeptide are fused in-frame to each other. The heterologous polypeptide can be fused to the amino-terminus or the carboxyl-terminus of the marker protein or segment.

One useful fusion protein is a GST fusion protein in which a marker protein or segment is fused to the carboxyl terminus of GST sequences. Such fusion proteins can facilitate the purification of a recombinant polypeptide of the invention.

In another embodiment, the fusion protein contains a heterologous signal sequence at its amino terminus. For example, the native signal sequence of a marker protein can be removed and replaced with a signal sequence from another protein. For example, the gp67 secretory sequence of the baculovirus envelope protein can be used as a heterologous signal sequence (Ausubel et al., eds., Current Protocols in Molecular Biology, John Wiley & Sons, NY, 1992). Other examples of eukaryotic heterologous signal sequences include the secretory sequences of melittin and human placental alkaline phosphatase (Stratagene; La Jolla, California). In yet another example, useful prokaryotic heterologous signal sequences include the phoA secretory signal (Sambrook et al., supra) and the protein A secretory signal (Pharmacia Biotech; Piscataway, New Jersey).

In yet another embodiment, the fusion protein is an immunoglobulin fusion protein in which all or part of a marker protein is fused to sequences derived from a member of the immunoglobulin protein family. The immunoglobulin fusion proteins of the invention can be incorporated into pharmaceutical compositions and administered to a subject to inhibit an interaction between a ligand (soluble or membrane-bound) and a protein on the surface of a cell (receptor), to thereby suppress signal transduction in vivo. The immunoglobulin fusion protein can be used to affect the bioavailability of a cognate ligand of a marker protein. Inhibition of ligand/receptor interaction can be useful therapeutically, both for treating proliferative and differentiative disorders and for modulating (e.g., promoting or inhibiting) cell survival. Moreover, the immunoglobulin fusion proteins of the invention can be used as immunogens to produce antibodies directed against a marker protein in a subject, to purify ligands, and in screening assays to identify molecules which inhibit the interaction of the marker protein with ligands.

Chimeric and fusion proteins of the invention can be produced by standard recombinant DNA techniques. For example, the fusion gene can be synthesized by conventional techniques, including automated DNA synthesizers.

Alternatively, PCR amplification of gene fragments can be carried out using anchor primers which give rise to complementary overhangs between two consecutive gene fragments, which can subsequently be annealed and re-amplified to generate a chimeric gene sequence (see, e.g., Ausubel et al., supra). Moreover, many expression vectors are commercially available that already encode a fusion moiety (e.g., a GST polypeptide). A nucleic acid encoding a polypeptide of the invention can be cloned into such an expression vector such that the fusion moiety is linked in-frame to the polypeptide of the invention.
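"In-frame" fusion means the upstream coding sequence contributes a whole number of codons and contains no stop codon, so translation reads straight through into the downstream partner. The sketch below performs that check when joining two DNA coding sequences, for example a GST sequence upstream of a marker sequence. It assumes the upstream partner starts at its start codon and has had its own stop codon removed; it does not validate the downstream partner, and `fuse_in_frame` is a hypothetical helper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split a sequence into complete codons, reading from the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def fuse_in_frame(upstream_cds: str, downstream_cds: str) -> str:
    """Join two coding sequences, checking that the junction preserves the frame.

    upstream_cds: N-terminal partner, from its start codon, without its stop codon.
    downstream_cds: C-terminal partner (may end in a stop codon).
    """
    up = upstream_cds.upper()
    down = downstream_cds.upper()
    if len(up) % 3 != 0:
        raise ValueError("upstream length is not a multiple of 3; the fusion would shift the frame")
    if any(codon in STOP_CODONS for codon in codons(up)):
        raise ValueError("upstream sequence contains an in-frame stop codon")
    return up + down
```

For example, joining the made-up sequences `ATGGCT` and `GGCTAA` yields `ATGGCTGGCTAA`, whereas a 5-nucleotide upstream fragment is rejected because it would shift the reading frame of everything downstream.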

A signal sequence can be used to facilitate secretion and isolation of marker proteins. Signal sequences are typically characterized by a core of hydrophobic amino acids which are generally cleaved from the mature protein during secretion in one or more cleavage events. Such signal peptides contain processing sites that allow cleavage of the signal sequence from the mature proteins as they pass through the secretory pathway. Thus, the invention pertains to marker proteins, fusion proteins or segments thereof having a signal sequence, as well as to such proteins from which the signal sequence has been proteolytically cleaved (i.e., the cleavage products). In one embodiment, a nucleic acid sequence encoding a signal sequence can be operably linked in an expression vector to a protein of interest, such as a marker protein or a segment thereof. The signal sequence directs secretion of the protein, such as from a eukaryotic host into which the expression vector is transformed, and the signal sequence is subsequently or concurrently cleaved. The protein can then be readily purified from the extracellular medium by art-recognized methods. Alternatively, the signal sequence can be linked to the protein of interest using a sequence which facilitates purification, such as with a GST domain.

The present invention also pertains to variants of the marker proteins. Such variants have an altered amino acid sequence which can function as either agonists (mimetics) or as antagonists. Variants can be generated by mutagenesis, e.g. , discrete point mutation or truncation. An agonist can retain substantially the same, or a subset, of the biological activities of the naturally occurring form of the protein. An antagonist of a protein can inhibit one or more of the activities of the naturally occurring form of the protein by, for example, competitively binding to a downstream or upstream member of a cellular signaling cascade which includes the protein of interest. Thus, specific biological effects can be elicited by treatment with a variant of limited function.

Treatment of a subject with a variant having a subset of the biological activities of the naturally occurring form of the protein can have fewer side effects in a subject relative to treatment with the naturally occurring form of the protein.

Variants of a marker protein which function as either agonists (mimetics) or as antagonists can be identified by screening combinatorial libraries of mutants, e.g., truncation mutants, of the protein of the invention for agonist or antagonist activity. In one embodiment, a variegated library of variants is generated by combinatorial mutagenesis at the nucleic acid level and is encoded by a variegated gene library. A variegated library of variants can be produced by, for example, enzymatically ligating a mixture of synthetic oligonucleotides into gene sequences such that a degenerate set of potential protein sequences is expressible as individual polypeptides, or alternatively, as a set of larger fusion proteins (e.g., for phage display). There are a variety of methods which can be used to produce libraries of potential variants of the marker proteins from a degenerate oligonucleotide sequence. Methods for synthesizing degenerate oligonucleotides are known in the art (see, e.g., Narang, 1983, Tetrahedron 39:3; Itakura et al., 1984, Annu. Rev. Biochem. 53:323; Itakura et al., 1977, Science 198:1056; Ike et al., 1983, Nucleic Acids Res. 11:477).
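A degenerate oligonucleotide encodes a set of concrete sequences, one for each combination of allowed bases at the degenerate positions. The specification does not name a degeneracy notation; the sketch below assumes the standard IUPAC nucleotide ambiguity codes and expands a degenerate oligonucleotide into every sequence it represents. The function name `expand_degenerate` is hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (assumed notation; not specified in the text).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo: str) -> list[str]:
    """List every concrete DNA sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]
```

As an example, the degenerate codon `NNK` (one common choice, used here only for illustration) expands to 4 × 4 × 2 = 32 codons, so a library built from n such codons contains 32^n distinct coding sequences.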

In addition, libraries of segments of a marker protein can be used to generate a variegated population of polypeptides for screening and subsequent selection of variant marker proteins or segments thereof. For example, a library of coding sequence fragments can be generated by treating a double-stranded PCR fragment of the coding sequence of interest with a nuclease under conditions wherein nicking occurs only about once per molecule, denaturing the double-stranded DNA, renaturing the DNA to form double-stranded DNA which can include sense/antisense pairs from different nicked products, removing single-stranded portions from reformed duplexes by treatment with S1 nuclease, and ligating the resulting fragment library into an expression vector. By this method, an expression library can be derived which encodes amino-terminal and internal fragments of various sizes of the protein of interest. Several techniques are known in the art for screening gene products of combinatorial libraries made by point mutations or truncation, and for screening cDNA libraries for gene products having a selected property. The most widely used techniques for screening large gene libraries, which are amenable to high-throughput analysis, typically include cloning the gene library into replicable expression vectors, transforming appropriate cells with the resulting library of vectors, and expressing the combinatorial genes under conditions in which detection of a desired activity facilitates isolation of the vector encoding the gene whose product was detected. Recursive ensemble mutagenesis (REM), a technique which enhances the frequency of functional mutants in the libraries, can be used in combination with the screening assays to identify variants of a protein of the invention (Arkin and Youvan, 1992, Proc. Natl. Acad. Sci. USA 89:7811-7815; Delagrave et al., 1993, Protein Engineering 6(3):327-331).

Another aspect of the invention pertains to antibodies directed against a protein of the invention. In preferred embodiments, the antibodies specifically bind a marker protein or a fragment thereof. The terms "antibody" and "antibodies" as used interchangeably herein refer to immunoglobulin molecules as well as fragments and derivatives thereof that comprise an immunologically active portion of an immunoglobulin molecule (i.e., such a portion contains an antigen binding site which specifically binds an antigen, such as a marker protein, e.g., an epitope of a marker protein). An antibody which specifically binds to a protein of the invention is an antibody which binds the protein, but does not substantially bind other molecules in a sample, e.g., a biological sample, which naturally contains the protein. Examples of an immunologically active portion of an immunoglobulin molecule include, but are not limited to, single-chain antibodies (scAb), F(ab) and F(ab')₂ fragments.

An isolated protein of the invention or a fragment thereof can be used as an immunogen to generate antibodies. The full-length protein can be used or, alternatively, the invention provides antigenic peptide fragments for use as immunogens. The antigenic peptide of a protein of the invention comprises at least 8 (preferably 10, 15, 20, or 30 or more) amino acid residues of the amino acid sequence of one of the proteins of the invention, and encompasses at least one epitope of the protein such that an antibody raised against the peptide forms a specific immune complex with the protein. Preferred epitopes encompassed by the antigenic peptide are regions that are located on the surface of the protein, e.g., hydrophilic regions. Hydrophobicity sequence analysis, hydrophilicity sequence analysis, or similar analyses can be used to identify hydrophilic regions. In preferred embodiments, an isolated marker protein or fragment thereof is used as an immunogen.
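One way to carry out the hydrophilicity analysis mentioned above is a sliding-window average over a hydropathy scale. The text does not name a scale; this sketch assumes the Kyte-Doolittle hydropathy values, a 9-residue window (consistent with the minimum 8-residue antigenic peptide), and a cutoff of 0, below which a window is flagged as hydrophilic. These parameters and the helper names are assumptions for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    seq = protein.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def hydrophilic_windows(protein: str, window: int = 9, cutoff: float = 0.0) -> list[tuple[int, str]]:
    """(0-based start, peptide) for each window whose mean hydropathy is below cutoff."""
    seq = protein.upper()
    profile = hydropathy_profile(seq, window)
    return [(i, seq[i:i + window]) for i, score in enumerate(profile) if score < cutoff]
```

Windows returned by `hydrophilic_windows` are candidate surface-exposed segments; they would still need to be checked against the full protein for specificity before use as immunogens.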

An immunogen typically is used to prepare antibodies by immunizing a suitable (i.e., immunocompetent) subject such as a rabbit, goat, mouse, or other mammal or vertebrate. An appropriate immunogenic preparation can contain, for example, recombinantly-expressed or chemically-synthesized protein or peptide. The preparation can further include an adjuvant, such as Freund's complete or incomplete adjuvant, or a similar immunostimulatory agent. Preferred immunogen compositions are those that contain no other human proteins such as, for example, immunogen compositions made using a non-human host cell for recombinant expression of a protein of the invention. In such a manner, the resulting antibody compositions have reduced or no binding of human proteins other than a protein of the invention.

The invention provides polyclonal and monoclonal antibodies. The term "monoclonal antibody" or "monoclonal antibody composition", as used herein, refers to a population of antibody molecules that contain only one species of an antigen binding site capable of immunoreacting with a particular epitope. Preferred polyclonal and monoclonal antibody compositions are ones that have been selected for antibodies directed against a protein of the invention. Particularly preferred polyclonal and monoclonal antibody preparations are ones that contain only antibodies directed against a marker protein or fragment thereof.

Polyclonal antibodies can be prepared by immunizing a suitable subject with a protein of the invention as an immunogen. The antibody titer in the immunized subject can be monitored over time by standard techniques, such as with an enzyme-linked immunosorbent assay (ELISA) using immobilized polypeptide. At an appropriate time after immunization, e.g., when the specific antibody titers are highest, antibody-producing cells can be obtained from the subject and used to prepare monoclonal antibodies (mAb) by standard techniques, such as the hybridoma technique originally described by Kohler and Milstein (1975) Nature 256:495-497, the human B cell hybridoma technique (see Kozbor et al., 1983, Immunol. Today 4:72), the EBV-hybridoma technique (see Cole et al., pp. 77-96 in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., 1985), or trioma techniques. The technology for producing hybridomas is well known (see generally Current Protocols in Immunology, Coligan et al., eds., John Wiley & Sons, New York, 1994). Hybridoma cells producing a monoclonal antibody of the invention are detected by screening the hybridoma culture supernatants for antibodies that bind the polypeptide of interest, e.g., using a standard ELISA assay.

As an alternative to preparing monoclonal antibody-secreting hybridomas, a monoclonal antibody directed against a protein of the invention can be identified and isolated by screening a recombinant combinatorial immunoglobulin library (e.g., an antibody phage display library) with the polypeptide of interest. Kits for generating and screening phage display libraries are commercially available (e.g., the Pharmacia Recombinant Phage Antibody System, Catalog No. 27-9400-01; and the Stratagene SurfZAP Phage Display Kit, Catalog No. 240612). Additionally, examples of methods and reagents particularly amenable for use in generating and screening antibody display libraries can be found in, for example, U.S. Patent No. 5,223,409; PCT Publication No. WO 92/18619; PCT Publication No. WO 91/17271; PCT Publication No. WO 92/20791; PCT Publication No. WO 92/15679; PCT Publication No. WO 93/01288; PCT Publication No. WO 92/01047; PCT Publication No. WO 92/09690; PCT Publication No. WO 90/02809; Fuchs et al. (1991) Bio/Technology 9:1370-1372; Hay et al. (1992) Hum. Antibod. Hybridomas 3:81-85; Huse et al. (1989) Science 246:1275-1281; and Griffiths et al. (1993) EMBO J. 12:725-734.

The invention also provides recombinant antibodies that specifically bind a protein of the invention. In preferred embodiments, the recombinant antibodies specifically bind a marker protein or fragment thereof. Recombinant antibodies include, but are not limited to, chimeric and humanized monoclonal antibodies comprising both human and non-human portions, single-chain antibodies, and multi-specific antibodies. A chimeric antibody is a molecule in which different portions are derived from different animal species, such as one having a variable region derived from a murine mAb and a human immunoglobulin constant region. (See, e.g., Cabilly et al., U.S. Patent No. 4,816,567; and Boss et al., U.S. Patent No. 4,816,397, which are incorporated herein by reference in their entirety.) Single-chain antibodies have an antigen-binding site and consist of a single polypeptide. They can be produced by techniques known in the art, for example using methods described in Ladner et al., U.S. Pat. No. 4,946,778 (which is incorporated herein by reference in its entirety); Bird et al. (1988) Science 242:423-426; Whitlow et al. (1991) Methods in Enzymology 2:1-9; Whitlow et al. (1991) Methods in Enzymology 2:97-105; and Huston et al. (1991) Methods in Enzymology Molecular Design and Modeling: Concepts and Applications 203:46-88. Multi-specific antibodies are antibody molecules having at least two antigen-binding sites that specifically bind different antigens. Such molecules can be produced by techniques known in the art, for example using methods described in Segal, U.S. Patent No. 4,676,980 (the disclosure of which is incorporated herein by reference in its entirety); Holliger et al. (1993) Proc. Natl. Acad. Sci. USA 90:6444-6448; Whitlow et al. (1994) Protein Eng. 7:1017-1026; and U.S. Pat. No. 6,121,424.

Humanized antibodies are antibody molecules from non-human species having one or more complementarity-determining regions (CDRs) from the non-human species and a framework region from a human immunoglobulin molecule. (See, e.g., Queen, U.S. Patent No. 5,585,089, which is incorporated herein by reference in its entirety.) Humanized monoclonal antibodies can be produced by recombinant DNA techniques known in the art, for example using methods described in PCT Publication No. WO 87/02671; European Patent Application 184,187; European Patent Application 171,496; European Patent Application 173,494; PCT Publication No. WO 86/01533; U.S. Patent No. 4,816,567; European Patent Application 125,023; Better et al. (1988) Science 240:1041-1043; Liu et al. (1987) Proc. Natl. Acad. Sci. USA 84:3439-3443; Liu et al. (1987) J. Immunol. 139:3521-3526; Sun et al. (1987) Proc. Natl. Acad. Sci. USA 84:214-218; Nishimura et al. (1987) Cancer Res. 47:999-1005; Wood et al. (1985) Nature 314:446-449; Shaw et al. (1988) J. Natl. Cancer Inst. 80:1553-1559; Morrison (1985) Science 229:1202-1207; Oi et al. (1986) Bio/Techniques 4:214; U.S. Patent 5,225,539; Jones et al. (1986) Nature 321:522-525; Verhoeyen et al. (1988) Science 239:1534; and Beidler et al. (1988) J. Immunol. 141:4053-4060.

More particularly, humanized antibodies can be produced, for example, using transgenic mice that are incapable of expressing endogenous immunoglobulin heavy and light chain genes but can express human heavy and light chain genes. The transgenic mice are immunized in the normal fashion with a selected antigen, e.g., all or a portion of a polypeptide corresponding to a marker of the invention. Monoclonal antibodies directed against the antigen can be obtained using conventional hybridoma technology. The human immunoglobulin transgenes harbored by the transgenic mice rearrange during B cell differentiation and subsequently undergo class switching and somatic mutation. Thus, using such a technique, it is possible to produce therapeutically useful IgG, IgA and IgE antibodies. For an overview of this technology for producing human antibodies, see Lonberg and Huszar (1995) Int. Rev. Immunol. 13:65-93. For a detailed discussion of this technology for producing human antibodies and human monoclonal antibodies, and protocols for producing such antibodies, see, e.g., U.S. Patent 5,625,126; U.S. Patent 5,633,425; U.S. Patent 5,569,825; U.S. Patent 5,661,016; and U.S. Patent 5,545,806. In addition, companies such as Abgenix, Inc. (Fremont, CA) can be engaged to provide human antibodies directed against a selected antigen using technology similar to that described above.

Completely human antibodies that recognize a selected epitope can be generated using a technique referred to as "guided selection." In this approach, a selected non-human monoclonal antibody, e.g., a murine antibody, is used to guide the selection of a completely human antibody recognizing the same epitope (Jespers et al., 1994, Bio/Technology 12:899-903).

The antibodies of the invention can be isolated after production (e.g., from the blood or serum of the subject) or synthesis and further purified by well-known techniques. For example, IgG antibodies can be purified using protein A chromatography. Antibodies specific for a protein of the invention can be selected (e.g., partially purified) or purified by, e.g., affinity chromatography. For example, a recombinantly expressed and purified (or partially purified) protein of the invention is produced as described herein and covalently or non-covalently coupled to a solid support such as, for example, a chromatography column. The column can then be used to affinity purify antibodies specific for the proteins of the invention from a sample containing antibodies directed against a large number of different epitopes, thereby generating a substantially purified antibody composition, i.e., one that is substantially free of contaminating antibodies. In this context, a substantially purified antibody composition means that the antibody sample contains at most 30% (by dry weight) of contaminating antibodies directed against epitopes other than those of the desired protein of the invention, preferably at most 20%, yet more preferably at most 10%, and most preferably at most 5% (by dry weight). A purified antibody composition means that at least 99% of the antibodies in the composition are directed against the desired protein of the invention. In a preferred embodiment, the substantially purified antibodies of the invention may specifically bind to a signal peptide, a secreted sequence, an extracellular domain, a transmembrane or cytoplasmic domain, or a cytoplasmic membrane of a protein of the invention. In a particularly preferred embodiment, the substantially purified antibodies of the invention specifically bind to a secreted sequence or an extracellular domain of the amino acid sequence of a protein of the invention. In a more preferred embodiment, the substantially purified antibodies of the invention specifically bind to a secreted sequence or an extracellular domain of the amino acid sequence of a marker protein.
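The dry-weight thresholds defined above can be expressed as a simple classification. This is a minimal sketch, assuming that the 99% criterion for a purified composition is also measured by dry weight (the text does not say) and that the preparation contains only two antibody pools; `purity_grade` and its label strings are hypothetical.

```python
from typing import Tuple

# Contaminant ceilings taken from the definitions above, expressed as a fraction
# of total antibody dry weight and ordered from most to least stringent.
# The label strings are hypothetical names, not terms used in the text.
_GRADES: Tuple[Tuple[float, str], ...] = (
    (0.01, "purified (>=99% against the desired protein)"),
    (0.05, "substantially purified, most preferred (<=5%)"),
    (0.10, "substantially purified, more preferred (<=10%)"),
    (0.20, "substantially purified, preferred (<=20%)"),
    (0.30, "substantially purified (<=30%)"),
)


def purity_grade(target_mg: float, contaminating_mg: float) -> str:
    """Grade a preparation from the dry weights of its two antibody pools.

    target_mg: antibodies directed against the desired protein.
    contaminating_mg: antibodies directed against other epitopes.
    """
    total = target_mg + contaminating_mg
    if target_mg < 0 or contaminating_mg < 0 or total == 0:
        raise ValueError("dry weights must be non-negative and not both zero")
    fraction = contaminating_mg / total
    for ceiling, label in _GRADES:
        if fraction <= ceiling:
            return label
    return "not substantially purified"


print(purity_grade(90.0, 10.0))  # substantially purified, more preferred (<=10%)
```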

An antibody directed against a protein of the invention can be used to isolate the protein by standard techniques, such as affinity chromatography or immunoprecipitation. Moreover, such an antibody can be used to detect the marker protein or fragment thereof (e.g., in a cellular lysate or cell supernatant) in order to evaluate the level and pattern of expression of the marker. The antibodies can also be used diagnostically to monitor protein levels in tissues or body fluids (e.g., in a toxicity-associated body fluid) as part of a clinical testing procedure, e.g., to determine the efficacy of a given treatment regimen. Detection can be facilitated by the use of an antibody derivative, which comprises an antibody of the invention coupled to a detectable substance.

Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive materials include ¹²⁵I, ¹³¹I, ³⁵S or ³H.

Antibodies of the invention may also be used as therapeutic agents in treating cancers. In a preferred embodiment, completely human antibodies of the invention are used for therapeutic treatment of human patients, particularly those having a cancer. In another preferred embodiment, antibodies that bind specifically to a marker protein or fragment thereof are used for therapeutic treatment. Further, such a therapeutic antibody may be an antibody derivative or immunotoxin comprising an antibody conjugated to a therapeutic moiety such as a cytotoxin, a therapeutic agent or a radioactive metal ion. A cytotoxin or cytotoxic agent includes any agent that is detrimental to cells. Examples include taxol, cytochalasin B, gramicidin D, ethidium bromide, emetine, mitomycin, etoposide, teniposide, vincristine, vinblastine, colchicine, doxorubicin, daunorubicin, dihydroxyanthracenedione, mitoxantrone, mithramycin, actinomycin D, 1-dehydrotestosterone, glucocorticoids, procaine, tetracaine, lidocaine, propranolol, and puromycin, and analogs or homologs thereof. Therapeutic agents include, but are not limited to, antimetabolites (e.g., methotrexate, 6-mercaptopurine, 6-thioguanine, cytarabine, 5-fluorouracil, dacarbazine), alkylating agents (e.g., mechlorethamine, thiotepa, chlorambucil, melphalan, carmustine (BCNU), lomustine (CCNU), cyclophosphamide, busulfan, dibromomannitol, streptozotocin, mitomycin C, and cis-dichlorodiamine platinum (II) (DDP, cisplatin)), anthracyclines (e.g., daunorubicin (formerly daunomycin) and doxorubicin), antibiotics (e.g., dactinomycin (formerly actinomycin), bleomycin, mithramycin, and anthramycin (AMC)), and anti-mitotic agents (e.g., vincristine and vinblastine).

The conjugated antibodies of the invention can be used for modifying a given biological response, and the drug moiety is not to be construed as limited to classical chemical therapeutic agents. For example, the drug moiety may be a protein or polypeptide possessing a desired biological activity. Such proteins may include, for example, a toxin such as ribosome-inhibiting protein (see Better et al., U.S. Patent No. 6,146,631, the disclosure of which is incorporated herein in its entirety), abrin, ricin A, pseudomonas exotoxin, or diphtheria toxin; a protein such as tumor necrosis factor, α-interferon, β-interferon, nerve growth factor, platelet-derived growth factor, or tissue plasminogen activator; or biological response modifiers such as, for example, lymphokines, interleukin-1 ("IL-1"), interleukin-2 ("IL-2"), interleukin-6 ("IL-6"), granulocyte macrophage colony stimulating factor ("GM-CSF"), granulocyte colony stimulating factor ("G-CSF"), or other growth factors.

Techniques for conjugating such therapeutic moieties to antibodies are well known; see, e.g., Arnon et al., "Monoclonal Antibodies For Immunotargeting Of Drugs In Cancer Therapy", in Monoclonal Antibodies And Cancer Therapy, Reisfeld et al. (eds.), pp. 243-56 (Alan R. Liss, Inc. 1985); Hellstrom et al., "Antibodies For Drug Delivery", in Controlled Drug Delivery (2nd Ed.), Robinson et al. (eds.), pp. 623-53 (Marcel Dekker, Inc. 1987); Thorpe, "Antibody Carriers Of Cytotoxic Agents In Cancer Therapy: A Review", in Monoclonal Antibodies '84: Biological And Clinical Applications, Pinchera et al. (eds.), pp. 475-506 (1985); "Analysis, Results, And Future Prospective Of The Therapeutic Use Of Radiolabeled Antibody In Cancer Therapy", in Monoclonal Antibodies For Cancer Detection And Therapy, Baldwin et al. (eds.), pp. 303-16 (Academic Press 1985); and Thorpe et al., "The Preparation And Cytotoxic Properties Of Antibody-Toxin Conjugates", Immunol. Rev., 62:119-58 (1982).

Accordingly, in one aspect, the invention provides substantially purified antibodies, antibody fragments and derivatives, all of which specifically bind to a protein of the invention and preferably, a marker protein. In various embodiments, the substantially purified antibodies of the invention, or fragments or derivatives thereof, can be human, non-human, chimeric and/or humanized antibodies. In another aspect, the invention provides non-human antibodies, antibody fragments and derivatives, all of which specifically bind to a protein of the invention and preferably, a marker protein. Such non-human antibodies can be goat, mouse, sheep, horse, chicken, rabbit, or rat antibodies. Alternatively, the non-human antibodies of the invention can be chimeric and/or humanized antibodies. In addition, the non-human antibodies of the invention can be polyclonal antibodies or monoclonal antibodies. In still a further aspect, the invention provides monoclonal antibodies, antibody fragments and derivatives, all of which specifically bind to a protein of the invention and preferably, a marker protein. The monoclonal antibodies can be human, humanized, chimeric and/or non-human antibodies.

The invention also provides a kit containing an antibody of the invention conjugated to a detectable substance, and instructions for use. Still another aspect of the invention is a pharmaceutical composition comprising an antibody of the invention. In one embodiment, the pharmaceutical composition comprises an antibody of the invention and a pharmaceutically acceptable carrier.

D. Predictive Medicine

The present invention pertains to the field of predictive medicine, in which diagnostic assays, prognostic assays, pharmacogenomics, and monitoring of clinical trials are used for prognostic (predictive) purposes to thereby treat an individual prophylactically. Accordingly, one aspect of the present invention relates to diagnostic assays for determining the level of expression of one or more marker proteins or nucleic acids, in order to determine whether an individual is at risk of developing drug-induced toxicity. Such assays can be used for prognostic or predictive purposes to thereby prophylactically treat an individual prior to the onset of the disorder.

Yet another aspect of the invention pertains to monitoring the influence of agents (e.g., drugs or other compounds administered to inhibit, treat, or prevent a disorder or drug-induced toxicity, i.e., in order to understand any drug-induced toxic effects that such treatment may have) on the expression or activity of a marker of the invention in clinical trials. These and other agents are described in further detail in the following sections.

E. Diagnostic Assays

An exemplary method for detecting the presence or absence of a marker protein or nucleic acid in a biological sample involves obtaining a biological sample (e.g., a toxicity-associated body fluid or tissue sample) from a test subject and contacting the biological sample with a compound or an agent capable of detecting the polypeptide or nucleic acid (e.g., mRNA, genomic DNA, or cDNA). The detection methods of the invention can thus be used to detect mRNA, protein, cDNA, or genomic DNA, for example, in a biological sample in vitro as well as in vivo. For example, in vitro techniques for detection of mRNA include Northern hybridizations, in situ hybridizations, and polymerase chain reaction (PCR). In vitro techniques for detection of a marker protein include enzyme-linked immunosorbent assays (ELISAs), Western blots, immunoprecipitations and immunofluorescence. In vitro techniques for detection of genomic DNA include Southern hybridizations. Furthermore, in vivo techniques for detection of a marker protein include introducing into a subject a labeled antibody directed against the protein or fragment thereof. For example, the antibody can be labeled with a radioactive marker whose presence and location in a subject can be detected by standard imaging techniques.

A general principle of such diagnostic and prognostic assays involves preparing a sample or reaction mixture that may contain a marker, and a probe, under appropriate conditions and for a time sufficient to allow the marker and probe to interact and bind, thus forming a complex that can be removed and/or detected in the reaction mixture. These assays can be conducted in a variety of ways. For example, one method to conduct such an assay would involve anchoring the marker or probe onto a solid phase support, also referred to as a substrate, and detecting target marker/probe complexes anchored on the solid phase at the end of the reaction. In one embodiment of such a method, a sample from a subject, which is to be assayed for presence and/or concentration of marker, can be anchored onto a carrier or solid phase support. In another embodiment, the reverse situation is possible, in which the probe can be anchored to a solid phase and a sample from a subject can be allowed to react as an unanchored component of the assay.

There are many established methods for anchoring assay components to a solid phase. These include, without limitation, marker or probe molecules that are immobilized through conjugation of biotin and streptavidin. Such biotinylated assay components can be prepared from biotin-NHS (N-hydroxysuccinimide) using techniques known in the art (e.g., a biotinylation kit, Pierce Chemicals, Rockford, IL) and immobilized in the wells of streptavidin-coated 96-well plates (Pierce Chemical). In certain embodiments, the surfaces with immobilized assay components can be prepared in advance and stored.

Other suitable carriers or solid phase supports for such assays include any material capable of binding the class of molecule to which the marker or probe belongs. Well-known supports or carriers include, but are not limited to, glass, polystyrene, nylon, polypropylene, polyethylene, dextran, amylases, natural and modified celluloses, polyacrylamides, gabbros, and magnetite.

In order to conduct assays with the above-mentioned approaches, the non-immobilized component is added to the solid phase upon which the second component is anchored. After the reaction is complete, uncomplexed components may be removed (e.g., by washing) under conditions such that any complexes formed will remain immobilized upon the solid phase. The detection of marker/probe complexes anchored to the solid phase can be accomplished by a number of methods outlined herein.

In a preferred embodiment, the probe, when it is the unanchored assay component, can be labeled for the purpose of detection and readout of the assay, either directly or indirectly, with detectable labels discussed herein and which are well-known to one skilled in the art.

It is also possible to directly detect marker/probe complex formation without further manipulation or labeling of either component (marker or probe), for example by utilizing the technique of fluorescence energy transfer (FET) (see, for example, Lakowicz et al., U.S. Patent No. 5,631,169; Stavrianopoulos et al., U.S. Patent No. 4,868,103). A fluorophore label on the first, 'donor' molecule is selected such that, upon excitation with incident light of appropriate wavelength, its emitted fluorescent energy will be absorbed by a fluorescent label on a second 'acceptor' molecule, which in turn is able to fluoresce due to the absorbed energy. Alternatively, the 'donor' protein molecule may simply utilize the natural fluorescent energy of tryptophan residues. Labels are chosen that emit different wavelengths of light, such that the 'acceptor' molecule label may be differentiated from that of the 'donor'. Since the efficiency of energy transfer between the labels is related to the distance separating the molecules, spatial relationships between the molecules can be assessed. In a situation in which binding occurs between the molecules, the fluorescent emission of the 'acceptor' molecule label in the assay should be maximal. An FET binding event can be conveniently measured through standard fluorometric detection means well known in the art (e.g., using a fluorimeter).
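The distance dependence mentioned above is usually described by the Förster relation, E = 1/(1 + (r/R0)^6), where r is the donor-acceptor separation and R0 is the separation at which transfer is 50% efficient for a given dye pair. The sketch below computes E, estimates an apparent E from donor quenching (E = 1 - F_DA/F_D), and inverts the relation to recover a distance. The R0 value in the example is an assumed placeholder; real values depend on the label pair used.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Förster relation: E = 1 / (1 + (r / R0)**6)."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


def efficiency_from_donor_quenching(donor_with_acceptor: float, donor_alone: float) -> float:
    """Apparent efficiency from donor intensities: E = 1 - F_DA / F_D."""
    if donor_alone <= 0:
        raise ValueError("donor-only intensity must be positive")
    return 1.0 - donor_with_acceptor / donor_alone


def distance_from_efficiency(efficiency: float, forster_radius_nm: float) -> float:
    """Invert the Förster relation: r = R0 * (1/E - 1)**(1/6), for 0 < E < 1."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return forster_radius_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0 = 5.0  # assumed Förster radius in nm; dye-pair dependent
e = efficiency_from_donor_quenching(500.0, 1000.0)  # hypothetical donor counts
print(e, distance_from_efficiency(e, R0))  # 0.5 5.0
```

Because E falls with the sixth power of distance, transfer is strong only when donor and acceptor are brought close together, which is why binding of marker and probe produces the maximal acceptor emission described in the text.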

In another embodiment, determination of the ability of a probe to recognize a marker can be accomplished without labeling either assay component (probe or marker) by utilizing a technology such as real-time Biomolecular Interaction Analysis (BIA) (see, e.g., Sjolander, S. and Urbaniczky, C., 1991, Anal. Chem. 63:2338-2345 and Szabo et al., 1995, Curr. Opin. Struct. Biol. 5:699-705). As used herein, "BIA" or "surface plasmon resonance" is a technology for studying biospecific interactions in real time, without labeling any of the interactants (e.g., BIAcore). Changes in the mass at the binding surface (indicative of a binding event) result in alterations of the refractive index of light near the surface (the optical phenomenon of surface plasmon resonance (SPR)), resulting in a detectable signal that can be used as an indication of real-time reactions between biological molecules.
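SPR signals recorded in real time, as described above, are commonly interpreted with a 1:1 Langmuir binding model, which the text does not itself specify. The sketch below evaluates the association and dissociation phases of that model in response units (RU). The rate constants in the example are hypothetical, and fitting them to measured sensorgrams (normally done with the instrument's evaluation software) is not shown.

```python
import math


def association_response(t_s: float, conc_m: float, ka: float, kd: float, rmax_ru: float) -> float:
    """Response during analyte injection for a 1:1 Langmuir interaction.

    R(t) = Req * (1 - exp(-(ka*C + kd) * t)), with Req = Rmax * C / (C + KD)
    and KD = kd / ka.  Units: ka in 1/(M*s), kd in 1/s, C in M, t in s.
    """
    kobs = ka * conc_m + kd  # observed rate constant of the approach to equilibrium
    req = rmax_ru * conc_m / (conc_m + kd / ka)  # equilibrium response at this C
    return req * (1.0 - math.exp(-kobs * t_s))


def dissociation_response(t_s: float, r0_ru: float, kd: float) -> float:
    """Response after the injection ends: R(t) = R0 * exp(-kd * t)."""
    return r0_ru * math.exp(-kd * t_s)


# Hypothetical constants: ka = 1e5 M^-1 s^-1, kd = 1e-3 s^-1, so KD = 10 nM.
# An analyte injected at C = KD should plateau at half of Rmax.
print(round(association_response(1e5, 1e-8, 1e5, 1e-3, 100.0), 6))  # 50.0
```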

Alternatively, in another embodiment, analogous diagnostic and prognostic assays can be conducted with marker and probe as solutes in a liquid phase. In such an assay, the complexed marker and probe are separated from uncomplexed components by any of a number of standard techniques, including but not limited to differential centrifugation, chromatography, electrophoresis and immunoprecipitation. In differential centrifugation, marker/probe complexes may be separated from uncomplexed assay components through a series of centrifugal steps, due to the different sedimentation equilibria of complexes based on their different sizes and densities (see, for example, Rivas, G., and Minton, A.P., 1993, Trends Biochem Sci. 18(8):284-7). Standard chromatographic techniques may also be utilized to separate complexed molecules from uncomplexed ones. For example, gel filtration chromatography separates molecules based on size, and through the utilization of an appropriate gel filtration resin in a column format, for example, the relatively larger complex may be separated from the relatively smaller uncomplexed components. Similarly, the relatively different charge properties of the marker/probe complex as compared to the uncomplexed components may be exploited to differentiate the complex from uncomplexed components, for example through the utilization of ion-exchange chromatography resins. Such resins and chromatographic techniques are well known to one skilled in the art (see, e.g., Heegaard, N.H., 1998, J. Mol. Recognit. Winter 11(1-6):141-8; Hage, D.S., and Tweed, S.A. J Chromatogr B Biomed Sci Appl 1997 Oct 10;699(1-2):499-525). Gel electrophoresis may also be employed to separate complexed assay components from unbound components (see, e.g., Ausubel et al., eds., Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1987-1999). In this technique, protein or nucleic acid complexes are separated based on size or charge, for example. In order to maintain the binding interaction during the electrophoretic process, non-denaturing gel matrix materials and conditions in the absence of reducing agent are typically preferred. Conditions appropriate to the particular assay and components thereof will be well known to one skilled in the art.

In a particular embodiment, the level of marker mRNA can be determined both by in situ and by in vitro formats in a biological sample using methods known in the art. The term "biological sample" is intended to include tissues, cells, biological fluids and isolates thereof, isolated from a subject, as well as tissues, cells and fluids present within a subject. Many expression detection methods use isolated RNA. For in vitro methods, any RNA isolation technique that does not select against the isolation of mRNA can be utilized for the purification of RNA from cells (see, e.g., Ausubel et al., eds., Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1987-1999).

Additionally, large numbers of tissue samples can readily be processed using techniques well known to those of skill in the art, such as, for example, the single-step RNA isolation process of Chomczynski (1989, U.S. Patent No. 4,843,155).

The isolated mRNA can be used in hybridization or amplification assays that include, but are not limited to, Southern or Northern analyses, polymerase chain reaction analyses and probe arrays. One preferred diagnostic method for the detection of mRNA levels involves contacting the isolated mRNA with a nucleic acid molecule (probe) that can hybridize to the mRNA encoded by the gene being detected. The nucleic acid probe can be, for example, a full-length cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to an mRNA or genomic DNA encoding a marker of the present invention. Other suitable probes for use in the diagnostic assays of the invention are described herein. Hybridization of an mRNA with the probe indicates that the marker in question is being expressed.
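As a minimal illustration of probe/target complementarity, the sketch below locates the site on an mRNA to which an antisense DNA probe would pair perfectly. It is a deliberate simplification: real hybridization under stringent conditions depends on probe length, GC content, salt concentration and temperature, and none of that is modeled. The function names and sequences are hypothetical.

```python
# Watson-Crick complements in the DNA alphabet.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def probe_binding_site(probe_dna: str, mrna: str, min_length: int = 7) -> int:
    """Index in the mRNA where an antisense DNA probe pairs perfectly, or -1.

    The probe pairs with the mRNA stretch equal to the probe's reverse
    complement, so the mRNA is first rewritten in the DNA alphabet (U -> T).
    Only perfect matches are reported; mismatches are not tolerated.
    """
    if len(probe_dna) < min_length:
        raise ValueError(f"probe shorter than {min_length} nt")
    target = mrna.upper().replace("U", "T")
    return target.find(reverse_complement(probe_dna))


# Hypothetical 9-nt probe against a short hypothetical transcript.
print(probe_binding_site("TACAATGGC", "AUGGCCAUUGUAAUGGGCCGC"))  # 3
```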

In one format, the mRNA is immobilized on a solid surface and contacted with a probe, for example by running the isolated mRNA on an agarose gel and transferring the mRNA from the gel to a membrane, such as nitrocellulose. In an alternative format, the probe(s) are immobilized on a solid surface and the mRNA is contacted with the probe(s), for example, in an Affymetrix gene chip array. A skilled artisan can readily adapt known mRNA detection methods for use in detecting the level of mRNA encoded by the markers of the present invention.

An alternative method for determining the level of mRNA marker in a sample involves the process of nucleic acid amplification, e.g., by RT-PCR (the experimental embodiment set forth in Mullis, 1987, U.S. Patent No. 4,683,202), ligase chain reaction (Barany, 1991, Proc. Natl. Acad. Sci. USA 88:189-193), self-sustained sequence replication (Guatelli et al., 1990, Proc. Natl. Acad. Sci. USA 87:1874-1878), transcriptional amplification system (Kwoh et al., 1989, Proc. Natl. Acad. Sci. USA 86:1173-1177), Q-Beta Replicase (Lizardi et al., 1988, Bio/Technology 6:1197), rolling circle replication (Lizardi et al., U.S. Patent No. 5,854,033) or any other nucleic acid amplification method, followed by the detection of the amplified molecules using techniques well known to those of skill in the art. These detection schemes are especially useful for the detection of nucleic acid molecules if such molecules are present in very low numbers. As used herein, amplification primers are defined as being a pair of nucleic acid molecules that can anneal to 5' or 3' regions of a gene (plus and minus strands, respectively, or vice versa) and flank a short region in between. In general, amplification primers are from about 10 to 30 nucleotides in length and flank a region from about 50 to 200 nucleotides in length. Under appropriate conditions and with appropriate reagents, such primers permit the amplification of a nucleic acid molecule comprising the nucleotide sequence flanked by the primers.
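The primer definition above can be illustrated by locating the product a primer pair would generate from a known plus-strand sequence. This sketch assumes perfect primer matches and ignores the amplification chemistry entirely; the helper names and sequences are hypothetical. The forward primer is written as the plus strand, while the reverse primer, which anneals to the plus strand, appears there as its reverse complement.

```python
from typing import Optional

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def find_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the product a primer pair would amplify from a plus-strand template.

    The forward primer must occur in the plus strand as written; the reverse
    primer's binding site (its reverse complement) must occur downstream.
    Returns None if either site is missing.  Perfect matches only.
    """
    plus = template.upper()
    fwd = forward.upper()
    f_start = plus.find(fwd)
    if f_start < 0:
        return None
    rev_site = reverse_complement(reverse)
    r_start = plus.find(rev_site, f_start + len(fwd))
    if r_start < 0:
        return None
    return plus[f_start : r_start + len(rev_site)]


def flanked_length(amplicon: str, forward: str, reverse: str) -> int:
    """Length of the region lying between the two primer sites."""
    return len(amplicon) - len(forward) - len(reverse)


fwd, rev = "CAGTCAGTCA", "GGATCCGGAT"  # hypothetical 10-nt primers
template = "TTTT" + fwd + "A" * 60 + reverse_complement(rev) + "TTTT"
product = find_amplicon(template, fwd, rev)
print(len(product), flanked_length(product, fwd, rev))  # 80 60
```

Here the flanked region is 60 nucleotides, inside the roughly 50 to 200 nucleotide range given in the text.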

For in situ methods, mRNA does not need to be isolated from the cells prior to detection. In such methods, a cell or tissue sample is prepared/processed using known histological methods. The sample is then immobilized on a support, typically a glass slide, and then contacted with a probe that can hybridize to mRNA that encodes the marker.

As an alternative to making determinations based on the absolute expression level of the marker, determinations may be based on the normalized expression level of the marker. Expression levels are normalized by correcting the absolute expression level of a marker relative to the expression of a gene that is not a marker, e.g., a housekeeping gene that is constitutively expressed. Suitable genes for normalization include housekeeping genes such as the actin gene, or epithelial cell-specific genes. This normalization allows the comparison of the expression level in one sample, e.g., a patient sample, to another sample, e.g., a non-disease or non-toxic sample, or between samples from different sources.
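Normalization against a reference gene, as described above, amounts to a ratio. This sketch assumes linear-scale signals (for example, array intensities) and uses hypothetical marker and actin readings; it does not cover log-scale approaches such as the qPCR delta-Ct method.

```python
def normalized_expression(marker_level: float, reference_level: float) -> float:
    """Marker signal divided by the signal of a non-marker reference gene."""
    if reference_level <= 0:
        raise ValueError("reference gene signal must be positive")
    return marker_level / reference_level


def normalized_fold_change(
    test_marker: float, test_reference: float,
    control_marker: float, control_reference: float,
) -> float:
    """Ratio of normalized levels, e.g. patient sample versus non-toxic sample."""
    return normalized_expression(test_marker, test_reference) / normalized_expression(
        control_marker, control_reference
    )


# Hypothetical readings: the raw marker ratio 800/300 suggests about 2.67-fold,
# but the patient sample also carried more reference (actin) signal.
print(normalized_fold_change(800.0, 400.0, 300.0, 300.0))  # 2.0
```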

Alternatively, the expression level can be provided as a relative expression level. To determine a relative expression level of a marker, the level of expression of the marker is determined for 10 or more samples of normal versus disease or toxic cell isolates, preferably 50 or more samples, prior to the determination of the expression level for the sample in question. The mean expression level of each of the genes assayed in the larger number of samples is determined and this is used as a baseline expression level for the marker. The expression level of the marker determined for the test sample (absolute level of expression) is then divided by the mean expression value obtained for that marker. This provides a relative expression level.
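The relative expression calculation above can be sketched as follows, assuming each marker's baseline is the arithmetic mean of at least 10 reference samples, as the text specifies. Marker names and values are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def relative_expression(
    test_levels: Dict[str, float],
    baseline_samples: Dict[str, Sequence[float]],
    min_samples: int = 10,
) -> Dict[str, float]:
    """Divide each marker's test level by that marker's mean baseline level."""
    result: Dict[str, float] = {}
    for marker, level in test_levels.items():
        samples = baseline_samples[marker]
        if len(samples) < min_samples:
            raise ValueError(f"{marker}: need at least {min_samples} baseline samples")
        baseline = mean(samples)  # baseline expression level for this marker
        if baseline <= 0:
            raise ValueError(f"{marker}: baseline mean must be positive")
        result[marker] = level / baseline
    return result


# Hypothetical marker measured in 10 baseline samples and one test sample.
print(relative_expression({"GENE_A": 250.0}, {"GENE_A": [100.0] * 10}))  # {'GENE_A': 2.5}
```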

Preferably, the samples used in the baseline determination will be from non-toxic cells. The choice of the cell source is dependent on the use of the relative expression level. Using expression found in normal tissues as a mean expression score aids in validating whether the marker assayed is toxicity-specific (versus normal cells). In addition, as more data are accumulated, the mean expression value can be revised, providing improved relative expression values based on the accumulated data. Expression data from disease cells or toxic cells provide a means for grading the severity of the disease or toxic state.
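Revising the baseline mean as data accumulate does not require storing every past measurement; an incremental mean update is sufficient. This is a sketch of one standard way to do it, not a procedure given in the text; `RunningBaseline` and the example values are hypothetical. The relative expression level of a test sample is then its measured level divided by the current `mean`.

```python
from dataclasses import dataclass


@dataclass
class RunningBaseline:
    """Running mean of one marker's expression in non-toxic reference samples."""

    count: int = 0
    mean: float = 0.0

    def add(self, level: float) -> float:
        """Fold one new reference measurement into the mean and return the new mean."""
        self.count += 1
        self.mean += (level - self.mean) / self.count
        return self.mean


baseline = RunningBaseline()
for level in (90.0, 110.0, 100.0):  # hypothetical reference measurements
    baseline.add(level)
print(baseline.count, baseline.mean)  # 3 100.0
```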

In another embodiment of the present invention, a marker protein is detected. A preferred agent for detecting a marker protein of the invention is an antibody capable of binding to such a protein or a fragment thereof, preferably an antibody with a detectable label. Antibodies can be polyclonal or, more preferably, monoclonal. An intact antibody, or a fragment or derivative thereof (e.g., Fab or F(ab')₂), can be used. The term "labeled", with regard to the probe or antibody, is intended to encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or antibody, as well as indirect labeling of the probe or antibody by reactivity with another reagent that is directly labeled. Examples of indirect labeling include detection of a primary antibody using a fluorescently labeled secondary antibody and end-labeling of a DNA probe with biotin such that it can be detected with fluorescently labeled streptavidin.

Proteins from cells can be isolated using techniques that are well known to those of skill in the art. The protein isolation methods employed can, for example, be those described in Harlow and Lane (1988, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York).

A variety of formats can be employed to determine whether a sample contains a protein that binds to a given antibody. Examples of such formats include, but are not limited to, enzyme immunoassay (EIA), radioimmunoassay (RIA), Western blot analysis and enzyme-linked immunosorbent assay (ELISA). A skilled artisan can readily adapt known protein/antibody detection methods for use in determining whether cells express a marker of the present invention.
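To turn ELISA or EIA signals into concentrations, a standard curve is commonly fitted with a four-parameter logistic (4PL) model; the text does not prescribe one. The sketch below evaluates the 4PL curve and inverts it to read a concentration from a measured response. The parameter values are hypothetical and would normally come from fitting known standards (for example with `scipy.optimize.curve_fit`, not shown).

```python
def four_pl(conc: float, a: float, b: float, c: float, d: float) -> float:
    """4PL curve: y = d + (a - d) / (1 + (x / c)**b).

    a: response at zero concentration; d: response at infinite concentration;
    c: inflection-point concentration; b: slope factor.
    """
    return d + (a - d) / (1.0 + (conc / c) ** b)


def conc_from_response(y: float, a: float, b: float, c: float, d: float) -> float:
    """Read a concentration off the curve: x = c * ((a - d) / (y - d) - 1)**(1/b)."""
    low, high = min(a, d), max(a, d)
    if not low < y < high:
        raise ValueError("response lies outside the curve's dynamic range")
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


params = dict(a=0.05, b=1.2, c=100.0, d=2.5)  # hypothetical fit (OD vs. pg/mL)
od = four_pl(50.0, **params)
print(round(conc_from_response(od, **params), 6))  # 50.0
```

Responses outside the range between `a` and `d` cannot be interpolated, which is why the inverse raises an error rather than extrapolating.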

In one format, antibodies, or antibody fragments or derivatives, can be used in methods such as Western blots or immunofluorescence techniques to detect the expressed proteins. In such uses, it is generally preferable to immobilize either the antibody or the proteins on a solid support. Suitable solid-phase supports or carriers include any support capable of binding an antigen or an antibody. Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amylases, natural and modified celluloses, polyacrylamides, gabbros, and magnetite.

One skilled in the art will know many other suitable carriers for binding antibody or antigen, and will be able to adapt such supports for use with the present invention. For example, protein isolated from disease or toxic cells can be separated by polyacrylamide gel electrophoresis and immobilized onto a solid-phase support such as nitrocellulose. The support can then be washed with suitable buffers, followed by treatment with the detectably labeled antibody. The solid-phase support can then be washed with the buffer a second time to remove unbound antibody. The amount of bound label on the solid support can then be detected by conventional means.

The invention also encompasses kits for detecting the presence of a marker protein or nucleic acid in a biological sample. Such kits can be used to determine if a subject is suffering from or is at increased risk of developing drug-induced toxicity. For example, the kit can comprise a labeled compound or agent capable of detecting a marker protein or nucleic acid in a biological sample and means for determining the amount of the protein or mRNA in the sample (e.g., an antibody which binds the protein or a fragment thereof, or an oligonucleotide probe which binds to DNA or mRNA encoding the protein). Kits can also include instructions for interpreting the results obtained using the kit.

For antibody-based kits, the kit can comprise, for example: (1) a first antibody (e.g., attached to a solid support) which binds to a marker protein; and, optionally, (2) a second, different antibody which binds to either the protein or the first antibody and is conjugated to a detectable label.

For oligonucleotide-based kits, the kit can comprise, for example: (1) an oligonucleotide, e.g., a detectably labeled oligonucleotide, which hybridizes to a nucleic acid sequence encoding a marker protein or (2) a pair of primers useful for amplifying a marker nucleic acid molecule. The kit can also comprise, e.g., a buffering agent, a preservative, or a protein stabilizing agent. The kit can further comprise components necessary for detecting the detectable label (e.g., an enzyme or a substrate). The kit can also contain a control sample or a series of control samples which can be assayed and compared to the test sample. Each component of the kit can be enclosed within an individual container, and all of the various containers can be within a single package, along with instructions for interpreting the results of the assays performed using the kit.

F. Pharmacogenomics

The markers of the invention are also useful as pharmacogenomic markers. As used herein, a "pharmacogenomic marker" is an objective biochemical marker whose expression level correlates with a specific clinical drug response or susceptibility in a patient (see, e.g., McLeod et al. (1999) Eur. J. Cancer 35(12): 1650-1652). The presence or quantity of the pharmacogenomic marker expression is related to the predicted response of the patient, and more particularly of the patient's diseased or toxic cells, to therapy with a specific drug or class of drugs. By assessing the presence or quantity of the expression of one or more pharmacogenomic markers in a patient, a drug therapy which is most appropriate for the patient, or which is predicted to have a greater degree of success, may be selected. For example, based on the presence or quantity of RNA or protein encoded by specific tumor markers in a patient, a drug or course of treatment may be selected that is optimized for the treatment of the specific tumor likely to be present in the patient. The use of pharmacogenomic markers therefore permits selecting or designing the most appropriate treatment for each cancer patient without trying different drugs or regimens.

Another aspect of pharmacogenomics deals with genetic conditions that alter the way the body acts on drugs. These pharmacogenetic conditions can occur either as rare defects or as polymorphisms. For example, glucose-6-phosphate dehydrogenase (G6PD) deficiency is a common inherited enzymopathy in which the main clinical complication is hemolysis after ingestion of oxidant drugs (anti-malarials, sulfonamides, analgesics, nitrofurans) and consumption of fava beans.

As an illustrative embodiment, the activity of drug-metabolizing enzymes is a major determinant of both the intensity and duration of drug action. The discovery of genetic polymorphisms of drug-metabolizing enzymes (e.g., N-acetyltransferase 2 (NAT2) and cytochrome P450 enzymes CYP2D6 and CYP2C19) has provided an explanation as to why some patients do not obtain the expected drug effects, or show exaggerated drug response and serious toxicity, after taking the standard and safe dose of a drug. These polymorphisms are expressed in two phenotypes in the population, the extensive metabolizer (EM) and the poor metabolizer (PM). The prevalence of PM differs among populations. For example, the gene coding for CYP2D6 is highly polymorphic, and several mutations have been identified in PMs, all of which lead to the absence of functional CYP2D6. Poor metabolizers of CYP2D6 and CYP2C19 quite frequently experience exaggerated drug response and side effects when they receive standard doses. If a metabolite is the active therapeutic moiety, a PM will show no therapeutic response, as demonstrated for the analgesic effect of codeine mediated by its CYP2D6-formed metabolite morphine. At the other extreme are the so-called ultra-rapid metabolizers, who do not respond to standard doses. Recently, the molecular basis of ultra-rapid metabolism has been identified as CYP2D6 gene amplification.

Thus, the level of expression of a marker of the invention in an individual can be determined to thereby select appropriate agent(s) for therapeutic or prophylactic treatment of the individual. In addition, pharmacogenetic studies can be used to apply genotyping of polymorphic alleles encoding drug-metabolizing enzymes to the identification of an individual's drug responsiveness phenotype. This knowledge, when applied to dosing or drug selection, can avoid adverse reactions or therapeutic failure and thus enhance therapeutic or prophylactic efficiency when treating a subject with a modulator of expression of a marker of the invention.
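The genotype-to-phenotype step described above can be sketched in Python. This is a deliberately simplified illustration that assumes the only input is the number of functional CYP2D6 gene copies: zero copies corresponds to the absence of functional CYP2D6 (PM), and more than two copies to gene amplification (ultra-rapid metabolizer, UM). The function name and the copy-number cut-offs are hypothetical; real allele-function assignments are considerably more detailed.

```python
def cyp2d6_phenotype(functional_copies: int) -> str:
    """Hypothetical, simplified CYP2D6 metabolizer call from functional gene copy number."""
    if functional_copies < 0:
        raise ValueError("copy number cannot be negative")
    if functional_copies == 0:
        return "PM"  # poor metabolizer: no functional CYP2D6
    if functional_copies <= 2:
        return "EM"  # extensive metabolizer
    return "UM"      # ultra-rapid metabolizer: CYP2D6 gene amplification


# Illustrative use: a PM given a prodrug activated by CYP2D6 (e.g., codeine)
# would be expected to show little or no therapeutic response.
for copies in (0, 2, 3):
    print(copies, cyp2d6_phenotype(copies))
```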

G. Monitoring Clinical Trials

Monitoring the influence of agents (e.g., drug compounds) on the level of expression of a marker of the invention can be applied not only in basic drug screening, but also in clinical trials. For example, the effectiveness of an agent to affect marker expression can be monitored in clinical trials of subjects receiving treatment for cardiotoxicity or drug-induced toxicity. In a preferred embodiment, the present invention provides a method for monitoring the effectiveness of treatment of a subject with an agent (e.g., an agonist, antagonist, peptidomimetic, protein, peptide, nucleic acid, small molecule, or other drug candidate) comprising the steps of (i) obtaining a pre-administration sample from a subject prior to administration of the agent; (ii) detecting the level of expression of one or more selected markers of the invention in the pre-administration sample; (iii) obtaining one or more post-administration samples from the subject; (iv) detecting the level of expression of the marker(s) in the post-administration samples; (v) comparing the level of expression of the marker(s) in the pre-administration sample with the level of expression of the marker(s) in the post-administration sample or samples; and (vi) altering the administration of the agent to the subject accordingly. For example, increased expression of the marker gene(s) during the course of treatment may indicate ineffective dosage and the desirability of increasing the dosage. Conversely, decreased expression of the marker gene(s) may indicate efficacious treatment and no need to change dosage.

H. Arrays

The invention also includes an array comprising a marker of the present invention. The array can be used to assay expression of one or more genes in the array. In one embodiment, the array can be used to assay gene expression in a tissue to ascertain tissue specificity of genes in the array. In this manner, up to about 7600 genes can be simultaneously assayed for expression. This allows a profile to be developed showing a battery of genes specifically expressed in one or more tissues.

In addition to such qualitative determination, the invention allows the quantitation of gene expression. Thus, not only tissue specificity, but also the level of expression of a battery of genes in the tissue is ascertainable. Thus, genes can be grouped on the basis of their tissue expression per se and level of expression in that tissue. This is useful, for example, in ascertaining the relationship of gene expression between or among tissues. Thus, one tissue can be perturbed and the effect on gene expression in a second tissue can be determined. In this context, the effect of one cell type on another cell type in response to a biological stimulus can be determined. Such a determination is useful, for example, to know the effect of cell-cell interaction at the level of gene expression. If an agent is administered therapeutically to treat one cell type but has an undesirable effect on another cell type, the invention provides an assay to determine the molecular basis of the undesirable effect and thus provides the opportunity to co-administer a counteracting agent or otherwise treat the undesired effect. Similarly, even within a single cell type, undesirable biological effects can be determined at the molecular level. Thus, the effects of an agent on expression of other than the target gene can be ascertained and counteracted.

In another embodiment, the array can be used to monitor the time course of expression of one or more genes in the array. This can occur in various biological contexts, as disclosed herein, for example the development of drug-induced toxicity, the progression of drug-induced toxicity, and processes such as cellular transformation associated with drug-induced toxicity.

The array is also useful for ascertaining the effect of the expression of a gene on the expression of other genes in the same cell or in different cells. This provides, for example, for a selection of alternate molecular targets for therapeutic intervention if the ultimate or downstream target cannot be regulated. The array is also useful for ascertaining differential expression patterns of one or more genes in normal and abnormal cells. This provides a battery of genes that could serve as a molecular target for diagnosis or therapeutic intervention.

VII. Methods for Obtaining Samples

Samples useful in the methods of the invention include any tissue, cell, biopsy, or bodily fluid sample that expresses a marker of the invention. In one embodiment, a sample may be a tissue, a cell, whole blood, serum, plasma, buccal scrape, saliva, cerebrospinal fluid, urine, stool, or bronchoalveolar lavage. In preferred embodiments, the tissue sample is a toxicity state sample. In more preferred embodiments, the tissue sample is a cardiovascular sample or a drug-induced toxicity sample.

Body samples may be obtained from a subject by a variety of techniques known in the art including, for example, by the use of a biopsy or by scraping or swabbing an area or by using a needle to aspirate bodily fluids. Methods for collecting various body samples are well known in the art.

Tissue samples suitable for detecting and quantitating a marker of the invention may be fresh, frozen, or fixed according to methods known to one of skill in the art. Suitable tissue samples are preferably sectioned and placed on a microscope slide for further analyses. Alternatively, solid samples, i.e., tissue samples, may be solubilized and/or homogenized and subsequently analyzed as soluble extracts.

In one embodiment, a freshly obtained biopsy sample is frozen using, for example, liquid nitrogen or difluorodichloromethane. The frozen sample is mounted for sectioning using, for example, OCT, and serially sectioned in a cryostat. The serial sections are collected on a glass microscope slide. For immunohistochemical staining, the slides may be coated with, for example, chrome-alum, gelatine or poly-L-lysine to ensure that the sections stick to the slides. In another embodiment, samples are fixed and embedded prior to sectioning. For example, a tissue sample may be fixed in, for example, formalin, serially dehydrated and embedded in, for example, paraffin.

Once the sample is obtained, any method known in the art to be suitable for detecting and quantitating a marker of the invention may be used (either at the nucleic acid or at the protein level). Such methods are well known in the art and include, but are not limited to, Western blots, Northern blots, Southern blots, immunohistochemistry, ELISA, e.g., amplified ELISA, immunoprecipitation, immunofluorescence, flow cytometry, immunocytochemistry, mass spectrometric analyses, e.g., MALDI-TOF and SELDI-TOF, nucleic acid hybridization techniques, nucleic acid reverse transcription methods, and nucleic acid amplification methods. In particular embodiments, the expression of a marker of the invention is detected at the protein level using, for example, antibodies that specifically bind these proteins.

Samples may need to be modified in order to make a marker of the invention accessible to antibody binding. In a particular aspect of the immunocytochemistry or immunohistochemistry methods, slides may be transferred to a pretreatment buffer and optionally heated to increase antigen accessibility. Heating the sample in the pretreatment buffer rapidly disrupts the lipid bilayer of the cells and makes the antigens more accessible for antibody binding (this may be the case in fresh specimens, but is not typically what occurs in fixed specimens). The terms "pretreatment buffer" and "preparation buffer" are used interchangeably herein to refer to a buffer that is used to prepare cytology or histology samples for immunostaining, particularly by increasing the accessibility of a marker of the invention for antibody binding. The pretreatment buffer may comprise a pH-specific salt solution, a polymer, a detergent, or a nonionic or anionic surfactant such as, for example, an ethoxylated anionic or nonionic surfactant, an alkanoate or an alkoxylate, or blends of these surfactants, or a bile salt. The pretreatment buffer may, for example, be a solution of 0.1% to 1% deoxycholic acid, sodium salt, or a solution of sodium laureth-13-carboxylate (e.g., Sandopan LS) or an ethoxylated anionic complex. In some embodiments, the pretreatment buffer may also be used as a slide storage buffer.

Any method for making marker proteins of the invention more accessible for antibody binding may be used in the practice of the invention, including the antigen retrieval methods known in the art. See, for example, Bibbo, et al. (2002) Acta. Cytol. 46:25-29; Saqi, et al. (2003) Diagn. Cytopathol. 27:365-370; Bibbo, et al. (2003) Anal. Quant. Cytol. Histol. 25:8-11, the entire contents of each of which are incorporated herein by reference.

Following pretreatment to increase marker protein accessibility, samples may be blocked using an appropriate blocking agent, e.g., a peroxidase blocking reagent such as hydrogen peroxide. In some embodiments, the samples may be blocked using a protein blocking reagent to prevent non-specific binding of the antibody. The protein blocking reagent may comprise, for example, purified casein. An antibody, particularly a monoclonal or polyclonal antibody, that specifically binds to a marker of the invention is then incubated with the sample. One of skill in the art will appreciate that a more accurate prognosis or diagnosis may be obtained in some cases by detecting multiple epitopes on a marker protein of the invention in a patient sample. Therefore, in particular embodiments, at least two antibodies directed to different epitopes of a marker of the invention are used. Where more than one antibody is used, these antibodies may be added to a single sample sequentially as individual antibody reagents or simultaneously as an antibody cocktail. Alternatively, each individual antibody may be added to a separate sample from the same patient, and the resulting data pooled.

Techniques for detecting antibody binding are well known in the art. Antibody binding to a marker of the invention may be detected through the use of chemical reagents that generate a detectable signal that corresponds to the level of antibody binding and, accordingly, to the level of marker protein expression. In one of the immunohistochemistry or immunocytochemistry methods of the invention, antibody binding is detected through the use of a secondary antibody that is conjugated to a labeled polymer. Examples of labeled polymers include, but are not limited to, polymer-enzyme conjugates. The enzymes in these complexes are typically used to catalyze the deposition of a chromogen at the antigen-antibody binding site, thereby resulting in cell staining that corresponds to the expression level of the biomarker of interest. Enzymes of particular interest include, but are not limited to, horseradish peroxidase (HRP) and alkaline phosphatase (AP).

In one particular immunohistochemistry or immunocytochemistry method of the invention, antibody binding to a marker of the invention is detected through the use of an HRP-labeled polymer that is conjugated to a secondary antibody. Antibody binding can also be detected through the use of a species-specific probe reagent, which binds to monoclonal or polyclonal antibodies, and a polymer conjugated to HRP, which binds to the species-specific probe reagent. Slides are stained for antibody binding using any chromogen, e.g., the chromogen 3,3'-diaminobenzidine (DAB), and then counterstained with hematoxylin and, optionally, a bluing agent such as ammonium hydroxide or TBS/Tween-20. Other suitable chromogens include, for example, 3-amino-9-ethylcarbazole (AEC). In some aspects of the invention, slides are reviewed microscopically by a cytotechnologist and/or a pathologist to assess cell staining, e.g., fluorescent staining (i.e., marker expression). Alternatively, samples may be reviewed via automated microscopy or by personnel with the assistance of computer software that facilitates the identification of positive staining cells.
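Where staining is reviewed with the help of software, the per-cell calls still have to be summarized into a score. The Python sketch below assumes a hypothetical upstream image-analysis step that assigns each cell an intensity category from 0 (negative) to 3 (strong) and reports the percentage of positive cells together with an H-score, a commonly used semi-quantitative immunohistochemistry score. The application itself does not specify this scoring scheme.

```python
from collections import Counter


def staining_summary(intensity_scores):
    """Summarize per-cell staining calls from (hypothetical) automated image analysis.

    intensity_scores: one integer per cell, 0 = negative, 1 = weak,
    2 = moderate, 3 = strong staining.
    Returns (percent positive cells, H-score on a 0-300 scale).
    """
    if not intensity_scores:
        raise ValueError("no cells scored")
    counts = Counter(intensity_scores)
    if any(s not in (0, 1, 2, 3) for s in counts):
        raise ValueError("intensity scores must be 0, 1, 2 or 3")
    n = len(intensity_scores)
    pct = {s: 100.0 * counts.get(s, 0) / n for s in (0, 1, 2, 3)}
    percent_positive = pct[1] + pct[2] + pct[3]
    # H-score: percentage of cells at each intensity, weighted by that intensity.
    h_score = 1 * pct[1] + 2 * pct[2] + 3 * pct[3]
    return percent_positive, h_score


cells = [0, 0, 1, 1, 2, 3, 3, 3, 0, 2]  # illustrative per-cell calls
print(staining_summary(cells))  # (70.0, 150.0)
```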

Detection of antibody binding can be facilitated by coupling the anti-marker antibodies to a detectable substance. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive material include ¹²⁵I, ¹³¹I, ³⁵S, ¹⁴C, or ³H.

In one embodiment of the invention, frozen samples are prepared as described above and subsequently stained with antibodies against a marker of the invention diluted to an appropriate concentration using, for example, Tris-buffered saline (TBS). Primary antibodies can be detected by incubating the slides in biotinylated anti-immunoglobulin. This signal can optionally be amplified and visualized using diaminobenzidine precipitation at the site of the antigen. Furthermore, slides can optionally be counterstained with, for example, hematoxylin, to visualize the cells.

In another embodiment, fixed and embedded samples are stained with antibodies against a marker of the invention and counterstained as described above for frozen sections. In addition, samples may be optionally treated with agents to amplify the signal in order to visualize antibody staining. For example, a peroxidase-catalyzed deposition of biotinyl-tyramide, which in turn is reacted with peroxidase-conjugated streptavidin (Catalyzed Signal Amplification (CSA) System, DAKO, Carpinteria, CA) may be used.

Tissue-based assays (i.e., immunohistochemistry) are the preferred methods of detecting and quantitating a marker of the invention. In one embodiment, the presence or absence of a marker of the invention may be determined by immunohistochemistry. In one embodiment, the immunohistochemical analysis uses low concentrations of an anti-marker antibody such that cells lacking the marker do not stain. In another embodiment, the presence or absence of a marker of the invention is determined using an immunohistochemical method that uses high concentrations of an anti-marker antibody such that cells lacking the marker protein stain heavily. Cells that do not stain either contain mutated marker and fail to produce antigenically recognizable marker protein, or are cells in which the pathways that regulate marker levels are dysregulated, resulting in negligible steady-state levels of marker protein.

One of skill in the art will recognize that the concentration of a particular antibody used to practice the methods of the invention will vary depending on such factors as time for binding, level of specificity of the antibody for a marker of the invention, and method of sample preparation. Moreover, when multiple antibodies are used, the required concentration may be affected by the order in which the antibodies are applied to the sample, e.g., simultaneously as a cocktail or sequentially as individual antibody reagents. Furthermore, the detection chemistry used to visualize antibody binding to a marker of the invention must also be optimized to produce the desired signal-to-noise ratio.
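The optimization of antibody concentration against signal-to-noise ratio can be sketched as a simple titration comparison. This minimal Python example assumes that, for each candidate dilution, a signal has been measured on a marker-positive control and a background on a marker-negative control, and it takes the signal-to-noise ratio as signal divided by background. The dilution factors and readings are invented for illustration.

```python
def best_dilution(titration):
    """Pick the antibody dilution with the highest signal-to-noise ratio.

    titration: dict mapping dilution factor -> (signal in a marker-positive
    control, background in a marker-negative control). Values are hypothetical.
    """
    def snr(item):
        _, (signal, background) = item
        if background <= 0:
            raise ValueError("background must be positive")
        return signal / background

    dilution, _ = max(titration.items(), key=snr)
    return dilution


titration = {
    50: (1800.0, 600.0),   # strong signal but high background: S/N 3.0
    200: (1500.0, 150.0),  # S/N 10.0
    800: (600.0, 100.0),   # weaker signal: S/N 6.0
}
print(best_dilution(titration))  # 200
```

The most concentrated antibody gives the strongest raw signal here but not the best signal-to-noise ratio, which is the trade-off the paragraph above describes.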

In one embodiment of the invention, proteomic methods, e.g., mass spectrometry, are used for detecting and quantitating the marker proteins of the invention. For example, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) or surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS), which involves the application of a biological sample, such as serum, to a protein-binding chip (Wright, G.L., Jr., et al. (2002) Expert Rev Mol Diagn 2:549; Li, J., et al. (2002) Clin Chem 48: 1296; Laronga, C, et al. (2003) Dis Markers 19:229; Petricoin, E.F., et al. (2002) 359:572; Adam, B.L., et al. (2002) Cancer Res 62:3609; Tolson, J., et al. (2004) Lab Invest 84:845; Xiao, Z., et al. (2001) Cancer Res 61:6029), can be used to detect and quantitate the PY-Shc and/or p66-Shc proteins. Mass spectrometric methods are described in, for example, U.S. Patent Nos. 5,622,824, 5,605,798 and 5,547,835, the entire contents of each of which are incorporated herein by reference.

In other embodiments, the expression of a marker of the invention is detected at the nucleic acid level. Nucleic acid-based techniques for assessing expression are well known in the art and include, for example, determining the level of marker mRNA in a sample from a subject. Many expression detection methods use isolated RNA. Any RNA isolation technique that does not select against the isolation of mRNA can be utilized for the purification of RNA from cells that express a marker of the invention (see, e.g., Ausubel et al., ed., (1987-1999) Current Protocols in Molecular Biology (John Wiley & Sons, New York)). Additionally, large numbers of tissue samples can readily be processed using techniques well known to those of skill in the art, such as, for example, the single-step RNA isolation process of Chomczynski (1989, U.S. Pat. No. 4,843,155).

The term "probe" refers to any molecule that is capable of selectively binding to a marker of the invention, for example, a nucleotide transcript and/or protein. Probes can be synthesized by one of skill in the art, or derived from appropriate biological preparations. Probes may be specifically designed to be labeled. Examples of molecules that can be utilized as probes include, but are not limited to, RNA, DNA, proteins, antibodies, and organic molecules.

Isolated mRNA can be used in hybridization or amplification assays that include, but are not limited to, Southern or Northern analyses, polymerase chain reaction analyses and probe arrays. One method for the detection of mRNA levels involves contacting the isolated mRNA with a nucleic acid molecule (probe) that can hybridize to the marker mRNA. The nucleic acid probe can be, for example, a full-length cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to marker genomic DNA.

In one embodiment, the mRNA is immobilized on a solid surface and contacted with a probe, for example by running the isolated mRNA on an agarose gel and transferring the mRNA from the gel to a membrane, such as nitrocellulose. In an alternative embodiment, the probe(s) are immobilized on a solid surface and the mRNA is contacted with the probe(s), for example, in an Affymetrix gene chip array. A skilled artisan can readily adapt known mRNA detection methods for use in detecting the level of marker mRNA.

An alternative method for determining the level of marker mRNA in a sample involves the process of nucleic acid amplification, e.g., by RT-PCR (the experimental embodiment set forth in Mullis, 1987, U.S. Pat. No. 4,683,202), ligase chain reaction (Barany (1991) Proc. Natl. Acad. Sci. USA 88: 189-193), self-sustained sequence replication (Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87: 1874-1878), transcriptional amplification system (Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86: 1173-1177), Q-Beta Replicase (Lizardi et al. (1988) Bio/Technology 6: 1197), rolling circle replication (Lizardi et al., U.S. Pat. No. 5,854,033) or any other nucleic acid amplification method, followed by the detection of the amplified molecules using techniques well known to those of skill in the art. These detection schemes are especially useful for the detection of nucleic acid molecules present in very low numbers. In particular aspects of the invention, marker expression is assessed by quantitative fluorogenic RT-PCR (i.e., the TaqMan™ System). Such methods typically utilize pairs of oligonucleotide primers that are specific for a marker of the invention. Methods for designing oligonucleotide primers specific for a known sequence are well known in the art.
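The text does not state how TaqMan™ threshold-cycle (Ct) values are converted into expression levels. One widely used option, assumed here rather than taken from the application, is the comparative Ct (2^(-ΔΔCt)) method, which normalizes the marker to a reference gene and then to an untreated control. It presumes roughly 100% amplification efficiency for both assays. The Ct values below are illustrative.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Assumes roughly 100% amplification efficiency for both the marker and the
    reference (housekeeping) assay, so each cycle doubles the product.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated  # normalize marker to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control                # then normalize to untreated control
    return 2.0 ** (-dd_ct)


# Marker crosses the threshold 2 cycles earlier after drug treatment; reference unchanged.
print(fold_change_ddct(23.0, 18.0, 25.0, 18.0))  # 4.0
```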

The expression levels of a marker of the invention may be monitored using a membrane blot (such as used in hybridization analysis such as Northern, Southern, dot, and the like), or microwells, sample tubes, gels, beads or fibers (or any solid support comprising bound nucleic acids). See U.S. Pat. Nos. 5,770,722, 5,874,219, 5,744,305, 5,677,195 and 5,445,934, which are incorporated herein by reference. The detection of marker expression may also comprise using nucleic acid probes in solution.

In one embodiment of the invention, microarrays are used to detect the expression of a marker of the invention. Microarrays are particularly well suited for this purpose because of the reproducibility between different experiments. DNA microarrays provide one method for the simultaneous measurement of the expression levels of large numbers of genes. Each array consists of a reproducible pattern of capture probes attached to a solid support. Labeled RNA or DNA is hybridized to complementary probes on the array and then detected by laser scanning. Hybridization intensities for each probe on the array are determined and converted to a quantitative value representing relative gene expression levels. See U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135, 6,033,860, and 6,344,316, which are incorporated herein by reference. High-density oligonucleotide arrays are particularly useful for determining the gene expression profile for a large number of RNAs in a sample.
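The conversion of hybridization intensities into relative expression values can be sketched as follows. This Python example assumes a log2 transform followed by median centering of each array, one simple global normalization among many, and then takes per-probe differences between a treated and a control array. The probe identifiers and intensities are hypothetical.

```python
import math
from statistics import median


def median_normalized_log2(intensities):
    """Log2-transform one array's probe intensities and center them on the array median.

    intensities: dict probe_id -> raw hybridization intensity (> 0).
    Median centering removes array-wide brightness differences.
    """
    logs = {p: math.log2(v) for p, v in intensities.items()}
    m = median(logs.values())
    return {p: v - m for p, v in logs.items()}


def log2_ratios(treated, control):
    # Per-probe log2(treated / control) after normalizing each array separately.
    t = median_normalized_log2(treated)
    c = median_normalized_log2(control)
    return {p: t[p] - c[p] for p in sorted(t.keys() & c.keys())}


control = {"probe_A": 100.0, "probe_B": 200.0, "probe_C": 400.0}
treated = {"probe_A": 400.0, "probe_B": 400.0, "probe_C": 800.0}  # brighter array overall
print(log2_ratios(treated, control))
```

In this example the treated array is uniformly about two-fold brighter, so after normalization only probe_A remains changed, at roughly +1 (a two-fold increase).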

The amounts of a marker, and/or a mathematical relationship of the amounts of a marker of the invention, may be used to calculate the risk of a toxicity state, e.g., a drug-induced toxicity or cardiotoxicity, in a subject being treated with a drug, the efficacy of a treatment regimen for treating, preventing or counteracting a toxicity state, and the like, using the methods of the invention, which may include methods of regression analysis known to one of skill in the art. For example, suitable regression models include, but are not limited to, CART (e.g., Hill, T. and Lewicki, P. (2006) "STATISTICS Methods and Applications" StatSoft, Tulsa, OK), Cox (e.g., www.evidence-based-medicine.co.uk), exponential, normal and log normal (e.g., www.obgyn.cam.ac.uk/mrg/statsbook/stsurvan.html), logistic (e.g., www.en.wikipedia.org/wiki/Logistic_regression), parametric, non-parametric, semi-parametric (e.g., www.socserv.mcmaster.ca/jfox/Books/Companion), linear (e.g., www.en.wikipedia.org/wiki/Linear_regression), or additive (e.g., www.en.wikipedia.org/wiki/Generalized_additive_model).

In one embodiment, a regression analysis includes the amounts of marker. In another embodiment, a regression analysis includes a marker mathematical relationship. In yet another embodiment, a regression analysis of the amounts of marker and/or a marker mathematical relationship may include additional clinical and/or molecular covariates. Such clinical covariates include, but are not limited to, nodal status, tumor stage, tumor grade, tumor size, treatment regimen, e.g., chemotherapy and/or radiation therapy, clinical outcome (e.g., relapse, disease-specific survival, therapy failure), and/or clinical outcome as a function of time after diagnosis, time after initiation of therapy, and/or time after completion of treatment.
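Of the regression families listed above, logistic regression is a direct way to turn marker amounts plus covariates into a probability of a toxicity state. The stdlib-only Python sketch below fits one by plain batch gradient descent on the log-loss. The features (a log2 marker change and a 0/1 dose-group covariate), the toy outcomes, the learning rate, and the number of epochs are all assumptions for illustration; in practice a statistics package would normally be used.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_risk(b, w, row):
    # Predicted probability of the toxicity state for one feature row.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit an unregularized logistic regression by batch gradient descent.

    X: feature rows, e.g. [log2 marker change, dose-group covariate].
    y: 0/1 outcomes (1 = toxicity observed). Returns (intercept, weights).
    """
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for row, label in zip(X, y):
            err = predict_risk(b, w, row) - label  # d(log-loss)/d(linear score)
            grad_b += err
            for j in range(k):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


# Toy data (hypothetical): [log2 marker change, dose group 0/1] -> toxicity 0/1
X = [[-1.0, 0], [-0.5, 0], [0.0, 1], [0.2, 0], [0.5, 1],
     [1.0, 0], [1.5, 1], [2.0, 1], [0.1, 1], [-0.2, 0]]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

b, w = fit_logistic(X, y)
print(predict_risk(b, w, [-1.0, 1]), predict_risk(b, w, [2.0, 1]))
```

With these toy data the fitted marker coefficient is positive, so a larger marker increase yields a higher predicted risk at the same dose group.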

VIII. Kits

The invention also provides compositions and kits for identifying an agent at risk for causing drug-induced toxicity, e.g., cardiotoxicity, and for prognosing a cardiotoxic state, e.g., a drug-induced cardiotoxicity, recurrence of cardiotoxicity, or survival of a subject being treated for cardiotoxicity. These kits include one or more of the following: a detectable antibody that specifically binds to a marker of the invention, reagents for obtaining and/or preparing subject tissue samples for staining, and instructions for use.

The kits of the invention may optionally comprise additional components useful for performing the methods of the invention. By way of example, the kits may comprise fluids (e.g., SSC buffer) suitable for annealing complementary nucleic acids or for binding an antibody with a protein with which it specifically binds, one or more sample compartments, an instructional material which describes performance of a method of the invention, and tissue-specific controls/standards.

IX. Screening Assays

Targets of the invention include, but are not limited to, the genes and/or proteins listed herein. Based on the results of experiments described by Applicants herein, the key proteins modulated in a toxicity state are associated with or can be classified into different pathways or groups of molecules, including cytoskeletal components, transcription factors, apoptotic response, pentose phosphate pathway, biosynthetic pathway, oxidative stress (pro-oxidant), membrane alterations, and oxidative phosphorylation metabolism. Accordingly, in one embodiment of the invention, a marker may include one or more genes (or proteins) selected from the markers listed in table 2. In some embodiments, the markers are a combination of at least two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-five, thirty, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, or more of the foregoing genes (or proteins).

Screening assays useful for identifying modulators of identified markers are described below.

The invention also provides methods (also referred to herein as "screening assays") for identifying modulators, i.e., candidate or test compounds or agents (e.g., proteins, peptides, peptidomimetics, peptoids, small molecules or other drugs), which are useful for treating or preventing a toxicity state by modulating the expression and/or activity of a marker of the invention. Such assays typically comprise a reaction between a marker of the invention and one or more assay components. The other components may be either the test compound itself, or a combination of test compounds and a natural binding partner of a marker of the invention. Compounds identified via assays such as those described herein may be useful, for example, for modulating, e.g., inhibiting, ameliorating, treating, or preventing aggressiveness of a disease state or toxicity state.

The test compounds used in the screening assays of the present invention may be obtained from any available source, including systematic libraries of natural and/or synthetic compounds. Test compounds may also be obtained by any of the numerous approaches in combinatorial library methods known in the art, including: biological libraries; peptoid libraries (libraries of molecules having the functionalities of peptides, but with a novel, non-peptide backbone which are resistant to enzymatic degradation but which nevertheless remain bioactive; see, e.g., Zuckermann et al., 1994, J. Med. Chem. 37:2678-85); spatially addressable parallel solid phase or solution phase libraries; synthetic library methods requiring deconvolution; the 'one-bead one-compound' library method; and synthetic library methods using affinity chromatography selection. The biological library and peptoid library approaches are limited to peptide libraries, while the other four approaches are applicable to peptide, non-peptide oligomer or small molecule libraries of compounds (Lam, 1997, Anticancer Drug Des. 12:145).

Examples of methods for the synthesis of molecular libraries can be found in the art, for example in: DeWitt et al. (1993) Proc. Natl. Acad. Sci. U.S.A. 90:6909; Erb et al. (1994) Proc. Natl. Acad. Sci. USA 91:11422; Zuckermann et al. (1994) J. Med. Chem. 37:2678; Cho et al. (1993) Science 261:1303; Carell et al. (1994) Angew. Chem. Int. Ed. Engl. 33:2059; Carell et al. (1994) Angew. Chem. Int. Ed. Engl. 33:2061; and Gallop et al. (1994) J. Med. Chem. 37:1233.

Libraries of compounds may be presented in solution (e.g., Houghten, 1992, Biotechniques 13:412-421), or on beads (Lam, 1991, Nature 354:82-84), chips (Fodor, 1993, Nature 364:555-556), bacteria and/or spores (Ladner, USP 5,223,409), plasmids (Cull et al., 1992, Proc. Natl. Acad. Sci. USA 89:1865-1869) or on phage (Scott and Smith, 1990, Science 249:386-390; Devlin, 1990, Science 249:404-406; Cwirla et al., 1990, Proc. Natl. Acad. Sci. 87:6378-6382; Felici, 1991, J. Mol. Biol. 222:301-310; Ladner, supra).

The screening methods of the invention comprise contacting a toxicity state cell with a test compound and determining the ability of the test compound to modulate the expression and/or activity of a marker of the invention in the cell. The expression and/or activity of a marker of the invention can be determined as described herein.

In another embodiment, the invention provides assays for screening candidate or test compounds which are substrates of a marker of the invention or biologically active portions thereof. In yet another embodiment, the invention provides assays for screening candidate or test compounds which bind to a marker of the invention or biologically active portions thereof. Determining the ability of the test compound to directly bind to a marker can be accomplished, for example, by coupling the compound with a radioisotope or enzymatic label such that binding of the compound to the marker can be determined by detecting the labeled marker compound in a complex. For example, compounds (e.g., marker substrates) can be labeled with ¹³¹I, ¹²⁵I, ³⁵S, ¹⁴C, or ³H, either directly or indirectly, and the radioisotope detected by direct counting of radioemission or by scintillation counting. Alternatively, assay components can be enzymatically labeled with, for example, horseradish peroxidase, alkaline phosphatase, or luciferase, and the enzymatic label detected by determination of conversion of an appropriate substrate to product.

This invention further pertains to novel agents identified by the above-described screening assays. Accordingly, it is within the scope of this invention to further use an agent identified as described herein in an appropriate animal model. For example, an agent capable of modulating the expression and/or activity of a marker of the invention identified as described herein can be used in an animal model to determine the efficacy, toxicity, or side effects of treatment with such an agent. Alternatively, an agent identified as described herein can be used in an animal model to determine the mechanism of action of such an agent. Furthermore, this invention pertains to uses of novel agents identified by the above-described screening assays for treatment as described above.

Exemplification of the Invention

EXAMPLE 1: Employing Platform Technology to Build Models of Drug-Induced Cardiotoxicity

In this example, the platform technology described in detail in international PCT Application No. PCT/US2012/027615 was employed to integrate data obtained from a custom-built drug-induced cardiotoxicity model, and to identify novel proteins/pathways driving the pathogenesis/cardiotoxicity of drugs. Relational maps resulting from this analysis have provided drug-induced cardiotoxicity biomarkers.

In the healthy heart, contractile function depends on a balance of fatty acid and carbohydrate oxidation. Chronic imbalance in uptake, utilization, organellar biogenesis and secretion in non-adipose tissue (heart and liver) is thought to be at the center of mitochondrial damage and dysfunction and a key player in drug-induced cardiotoxicity. Here Applicants describe a systems approach combining protein and lipid signatures with functional end-point assays specifically looking at cellular bioenergetics and mitochondrial membrane function. In vitro models comprising diabetic and normal cardiomyocytes supplemented with excess fatty acid and exposed to hyperglycemia were treated with a panel of drugs to create signatures and identify potential mechanisms of toxicity.

Applicants demonstrated the varied effects of drugs in destabilizing the mitochondria by disrupting energy metabolism at various levels, including (i) dysregulation of the transcriptional networks that control expression of mitochondrial energy metabolism genes; (ii) induction of GPAT1 and tafazzin in diabetic cardiomyocytes, thereby initiating de novo phospholipid synthesis and remodeling in the mitochondrial membrane; and (iii) altered fate of fatty acid in diabetic cardiomyocytes, influencing uptake, fatty acid oxidation and ATP synthesis. Further, Applicants combined wet-lab biology with an AI-based data mining platform to generate causal networks based on Bayesian models. Networks of proteins and lipids that are causal for loss of normal cell function were used to discern mechanisms of drug-induced toxicity from cellular protective mechanisms. This novel approach will serve as a powerful new tool for understanding mechanisms of toxicity while allowing for development of safer therapeutics that correct an altered phenotype.

Human cardiomyocytes were subjected to conditions simulating the diabetic environment experienced by the disease-relevant cells in vivo. Specifically, the cells were exposed to hyperglycemic and hyperlipidemic conditions. The hyperglycemic condition was induced by culturing cells in media containing 22 mM glucose. The hyperlipidemic condition was induced by culturing the cells in media containing 1 mM L-carnitine, 0.7 mM oleic acid and 0.7 mM linoleic acid.
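For preparing such media, the stated millimolar concentrations can be converted to mass per litre. A minimal sketch, assuming standard molar masses for D-glucose, L-carnitine, oleic acid and linoleic acid (these values are not given in the source), is:

```python
# Approximate molar masses in g/mol; standard values assumed here, not from the source.
MOLAR_MASS_G_PER_MOL = {
    "D-glucose": 180.16,
    "L-carnitine": 161.20,
    "oleic acid": 282.47,
    "linoleic acid": 280.45,
}

# Final concentrations (mM) stated for the hyperglycemic and hyperlipidemic media.
FINAL_CONC_MM = {
    "D-glucose": 22.0,
    "L-carnitine": 1.0,
    "oleic acid": 0.7,
    "linoleic acid": 0.7,
}


def grams_per_litre(conc_mm: float, molar_mass: float) -> float:
    """Convert mM (mmol/L) to g/L: mmol/L * g/mol / 1000."""
    return conc_mm * molar_mass / 1000.0


for name, conc in FINAL_CONC_MM.items():
    mass = grams_per_litre(conc, MOLAR_MASS_G_PER_MOL[name])
    print(f"{name}: {conc} mM -> {mass * 1000:.1f} mg/L")
```

With these assumed molar masses, 22 mM glucose corresponds to roughly 3.96 g/L.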

The cell model comprising the above-mentioned cells, with the cells exposed to each condition described above, was additionally "interrogated" by exposing the cells to an "environmental perturbation": treatment with a diabetic drug (T) known to cause cardiotoxicity, a rescue molecule (R), or both the diabetic drug and the rescue molecule (T+R). Specifically, the cells were treated with the diabetic drug; treated with the rescue molecule Coenzyme Q10 at 0, 50 μM, or 100 μM; or treated with both the diabetic drug and the rescue molecule Coenzyme Q10.

Cell samples from each condition with each perturbation treatment were collected at various times following treatment, including after 6 hours of treatment. For certain conditions, media samples were also collected and analyzed.

Profiling of changes in total cellular protein expression by quantitative proteomics was performed for cell and media samples collected for each condition and with each "environmental perturbation", i.e., diabetic drug treatment, Coenzyme Q10 treatment, or both, using the techniques described above in the detailed description. Transcriptional profiling experiments were carried out using the Bio-Rad CFX384 amplification system. Following data collection (Ct), the final fold change over control was determined using the ΔΔCt method as outlined in the manufacturer's protocol.
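The comparative Ct calculation referred to above can be sketched as follows. The sketch assumes a single reference (housekeeping) gene, which the text does not name, and approximately 100% amplification efficiency, so that each cycle doubles the product (the common 2^(-ΔΔCt) form). The function name and the Ct values are hypothetical.

```python
def fold_change_ddct(
    ct_target_treated: float,
    ct_ref_treated: float,
    ct_target_control: float,
    ct_ref_control: float,
    amplification_base: float = 2.0,
) -> float:
    """Relative expression (treated vs control) by the comparative Ct method."""
    # Normalize the target gene to the reference gene within each sample.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Difference of the normalized values between treated and control samples.
    delta_delta = delta_treated - delta_control
    # A lower Ct means more starting template, hence the negative exponent.
    return amplification_base ** (-delta_delta)


# Hypothetical Ct values (not from the source).
print(fold_change_ddct(25.0, 20.0, 27.0, 20.0))  # 4.0
```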

Lipidomics experiments were carried out using mass spectrometry. Functional end points such as oxygen consumption rate (OCR) were measured using the Seahorse analyzer essentially as recommended by the manufacturer. OCR was recorded by the electrodes in a 7 μl chamber created by the cartridge pressing against the Seahorse culture plate.
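The analyzer software derives OCR with its own algorithm, which is not described here. As a simplified illustration only, assuming dissolved O2 readings in μM taken over time within the 7 μl measurement chamber, OCR can be approximated as the negated least-squares slope of O2 versus time multiplied by the chamber volume. The function names and readings are hypothetical.

```python
from typing import Sequence


def least_squares_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def ocr_pmol_per_min(
    times_min: Sequence[float], o2_um: Sequence[float], chamber_ul: float = 7.0
) -> float:
    """Approximate oxygen consumption rate from the O2 decline in a closed chamber."""
    # 1 uM O2 equals 1 pmol/uL, so (uM/min) * uL gives pmol/min.
    # O2 falls as cells respire, so the consumption rate is the negated slope.
    return -least_squares_slope(times_min, o2_um) * chamber_ul


# Hypothetical O2 readings (not from the source).
print(ocr_pmol_per_min([0, 1, 2, 3], [200, 190, 180, 170]))  # 70.0
```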

As shown in Figure 20, the transcriptional network and expression of human mitochondrial energy metabolism genes in diabetic cardiomyocytes (cardiomyocytes conditioned under hyperglycemic and hyperlipidemic conditions) were compared between perturbed and unperturbed treatments. Specifically, these data were compared between diabetic cardiomyocytes treated with the diabetic drug (T) and untreated diabetic cardiomyocytes (UT), and between diabetic cardiomyocytes treated with both the diabetic drug and the rescue molecule Coenzyme Q10 (T+R) and untreated diabetic cardiomyocytes (UT). Compared with untreated diabetic cardiomyocytes, the expression of certain genes was altered when diabetic cardiomyocytes were treated with the diabetic drug. The rescue molecule Coenzyme Q10 was shown to reverse the toxic effect of the diabetic drug and to normalize gene expression.

As shown in Figure 21A, cardiomyocytes were cultured under either normoglycemic (NG) or hyperglycemic (HG) conditions and treated with either the diabetic drug alone (T) or both the diabetic drug and the rescue molecule Coenzyme Q10 (T+R). Protein expression levels of GPAT1 and TAZ for each condition and each treatment were measured by western blotting. Both GPAT1 and TAZ were upregulated in hyperglycemia-conditioned, diabetic drug-treated cardiomyocytes. When hyperglycemia-conditioned cardiomyocytes were treated with both the diabetic drug and the rescue molecule Coenzyme Q10, the elevated protein levels of GPAT1 and TAZ were normalized.

As shown in Figure 22A, mitochondrial oxygen consumption rate (OCR) experiments were carried out for hyperglycemia-conditioned cardiomyocyte samples. Hyperglycemia-conditioned cardiomyocytes were either untreated (UT), treated with diabetic drug T1, which is known to cause cardiotoxicity, treated with diabetic drug T2, which is known to cause cardiotoxicity, treated with both diabetic drug T1 and the rescue molecule Coenzyme Q10 (T1+R), or treated with both diabetic drug T2 and the rescue molecule Coenzyme Q10 (T2+R). Compared with untreated control samples, mitochondrial OCR was decreased when hyperglycemia-conditioned cardiomyocytes were treated with diabetic drug T1 or T2. However, mitochondrial OCR was normalized when hyperglycemia-conditioned cardiomyocytes were treated with both a diabetic drug and the rescue molecule Coenzyme Q10 (T1+R or T2+R).

As shown in Figure 22B, mitochondrial ATP synthesis experiments were carried out for hyperglycemia-conditioned cardiomyocyte samples. Hyperglycemia-conditioned cardiomyocytes were either untreated (UT), treated with a diabetic drug (T), or treated with both the diabetic drug and the rescue molecule Coenzyme Q10 (T+R). Compared with untreated control samples, mitochondrial ATP synthesis was repressed when hyperglycemia-conditioned cardiomyocytes were treated with the diabetic drug (T).

As shown in Figure 23, proteins downregulated by drug treatment in the collected proteomic data were annotated with Gene Ontology (GO) terms. Proteins involved in mitochondrial energy metabolism were downregulated when hyperglycemia-conditioned cardiomyocytes were treated with a diabetic drug known to cause cardiotoxicity.
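How proteins were classified as downregulated is not stated. A minimal sketch, assuming a hypothetical fold-change cutoff of 0.5 (treated/untreated) and a precomputed mapping from each protein to its GO term names, tallies the GO terms among the downregulated proteins. The protein identifiers, fold changes and annotations below are invented for illustration and are not data from this Example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def downregulated_go_counts(
    fold_changes: Dict[str, float],
    go_terms: Dict[str, Iterable[str]],
    max_fold: float = 0.5,
) -> Tuple[List[str], Counter]:
    """Select proteins with fold change <= max_fold and tally their GO terms."""
    down = [protein for protein, fc in fold_changes.items() if fc <= max_fold]
    counts = Counter(term for protein in down for term in go_terms.get(protein, ()))
    return down, counts


# Hypothetical fold changes (drug-treated / untreated) and annotations.
fold_changes = {"P1": 0.40, "P2": 0.45, "P3": 1.80, "P4": 0.95}
go_terms = {
    "P1": ["mitochondrion", "oxidative phosphorylation"],
    "P2": ["mitochondrion", "tricarboxylic acid cycle"],
    "P3": ["endoplasmic reticulum"],
}
down, counts = downregulated_go_counts(fold_changes, go_terms)
print(down)                   # ['P1', 'P2']
print(counts.most_common(1))  # [('mitochondrion', 2)]
```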

Proteomics, lipidomics, transcriptional profiling, functional assay, and western blotting data collected for each condition and with each perturbation were then processed by the REFS™ system. Composite perturbed networks were generated from combined data obtained from one specific condition (e.g., hyperglycemia or hyperlipidemia) exposed to each perturbation (e.g., diabetic drug, CoQ10, or both). Composite unperturbed networks were generated from combined data obtained from the same specific condition (e.g., hyperglycemia or hyperlipidemia) without perturbation (untreated). Similarly, composite perturbed networks were generated from combined data obtained for a second, control condition (e.g., normal glycemia) exposed to each perturbation (e.g., diabetic drug, CoQ10, or both). Composite unperturbed networks were generated from combined data obtained from the same second, control condition (e.g., normal glycemia) without perturbation (untreated).

Each node in the consensus composite networks described above was simulated (by increasing or decreasing by 10-fold) to generate simulation networks using REFS™, as described in detail above in the detailed description.
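REFS™ is proprietary and its simulation procedure is not reproduced here. The following toy sketch only illustrates what increasing or decreasing one node 10-fold means in a network. It assumes a hypothetical linear model on the log10 scale, in which each child's log10 deviation from baseline is a weighted sum of its parents' deviations, and it clamps the perturbed node. It is not the REFS™ algorithm, and the node names, weights and baselines are invented. It requires Python 3.9+ for `graphlib`.

```python
import math
from graphlib import TopologicalSorter
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (parent, child)


def simulate_fold_change(
    baseline: Dict[str, float],
    edges: Dict[Edge, float],
    node: str,
    fold: float,
) -> Dict[str, float]:
    """Set `node` to baseline * fold and propagate log10 deviations to descendants."""
    parents: Dict[str, Dict[str, float]] = {}
    for (parent, child), weight in edges.items():
        parents.setdefault(child, {})[parent] = weight
    # Predecessor map for the topological sort (every node in `baseline`).
    graph = {n: set(parents.get(n, {})) for n in baseline}
    log_delta: Dict[str, float] = {}
    for n in TopologicalSorter(graph).static_order():
        if n == node:
            # The perturbed node is clamped; its incoming edges are ignored.
            log_delta[n] = math.log10(fold)
        else:
            log_delta[n] = sum(w * log_delta[p] for p, w in parents.get(n, {}).items())
    return {n: baseline[n] * 10 ** log_delta[n] for n in baseline}


# Hypothetical three-node network: A -> B, B -> C, A -> C.
baseline = {"A": 1.0, "B": 2.0, "C": 4.0}
edges = {("A", "B"): 0.8, ("B", "C"): -0.5, ("A", "C"): 0.3}
up_sim = simulate_fold_change(baseline, edges, "A", 10.0)    # 10-fold increase
down_sim = simulate_fold_change(baseline, edges, "A", 0.1)   # 10-fold decrease
print({n: round(v, 3) for n, v in up_sim.items()})  # {'A': 10.0, 'B': 12.619, 'C': 3.177}
```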

The area under the curve and the fold change for each edge connecting a parent node to a child node in the simulation networks were extracted by a custom-built program written in R, an open-source software environment for statistical computing and graphics.
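The text does not define the curve whose area is taken. Assuming, for illustration only, that for each edge the child node's simulated level is recorded at several simulated levels of the parent node, a trapezoidal-rule area and an end-to-start fold change could be computed as follows (in Python here, although the program in this Example was written in R). The function name and values are hypothetical.

```python
from typing import Sequence, Tuple


def edge_auc_and_fold_change(
    parent_levels: Sequence[float], child_levels: Sequence[float]
) -> Tuple[float, float]:
    """Trapezoidal AUC of the child response curve and its end-to-start fold change."""
    if len(parent_levels) != len(child_levels) or len(parent_levels) < 2:
        raise ValueError("need two or more paired parent/child levels")
    auc = sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(
            parent_levels, parent_levels[1:], child_levels, child_levels[1:]
        )
    )
    fold_change = child_levels[-1] / child_levels[0]
    return auc, fold_change


# Hypothetical simulated levels for one parent -> child edge.
auc, fc = edge_auc_and_fold_change([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(auc, fc)  # 6.0 5.0
```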

Delta networks were generated from the simulated composite networks. To generate a drug-induced cardiotoxicity condition vs. normal condition differential network in response to the diabetic drug (delta network), the comparison steps illustrated in Figure 24 were performed by a custom-built program written in Perl.

Specifically, as shown in Figure 24, Untreated refers to the protein expression networks of untreated control cardiomyocytes in the hyperglycemic condition, and Drug refers to the protein expression networks of diabetic drug-treated cardiomyocytes in the hyperglycemic condition. Unique edges from Drug in the Drug ∩ Untreated delta network are presented in Figure 25.

Specifically, a simulated composite map of untreated cardiomyocytes in the hyperglycemic condition and a simulated composite map of diabetic drug-treated cardiomyocytes in the hyperglycemic condition were compared using a custom-made Perl program to identify the edges unique to the diabetic drug-treated cardiomyocytes in the hyperglycemic condition. Output from the Perl and R programs was imported into Cytoscape, an open-source program, to generate a visual representation of the delta network. As shown in Figure 25, the network represents the delta network driven by the diabetic drug versus untreated in the cardiomyocyte/cardiotoxicity models in the hyperglycemic condition.
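The comparison that yields the edges unique to Drug can be expressed as set operations on edge lists. A minimal sketch, assuming each network is represented as a set of (parent, child) pairs, written in Python rather than the Perl used in this Example, and emitting tab-separated SIF lines (a simple interaction format that Cytoscape can import). The edge names and the relation label `causes` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def unique_edges(drug: Set[Edge], untreated: Set[Edge]) -> Set[Edge]:
    """Edges present in the Drug network but absent from the Untreated network."""
    shared = drug & untreated  # Drug intersect Untreated
    return drug - shared


def to_sif(edges: Iterable[Edge], relation: str = "causes") -> List[str]:
    """Tab-separated SIF lines (source, relation, target), sorted for stable output."""
    return [f"{parent}\t{relation}\t{child}" for parent, child in sorted(edges)]


# Hypothetical edge sets (not from the source).
drug = {("A", "B"), ("B", "C"), ("C", "D")}
untreated = {("B", "C"), ("D", "E")}
delta = unique_edges(drug, untreated)
print(to_sif(delta))  # ['A\tcauses\tB', 'C\tcauses\tD']
```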

From the drug-induced toxicity condition vs. normal condition differential network shown in Figure 25, proteins that drive the pathophysiology of drug-induced cardiotoxicity were identified, such as GRP78, GRP75, TIMP1, PTX3, HSP76, PDIA4, PDIA1, and CA2D1. These proteins can function as biomarkers for identification of other cardiotoxicity-inducing drugs. These proteins can also function as biomarkers for identification of agents that can alleviate cardiotoxicity.

The experiments described in this Example demonstrate that perturbed membrane biology and altered fate of free fatty acid in diabetic cardiomyocytes exposed to drug treatment represent the centerpiece of drug-induced toxicity. Data integration and network biology have allowed for an enhanced understanding of cardiotoxicity and the identification of novel biomarkers predictive of cardiotoxicity.

EXAMPLE 2: Employing Models of Drug Induced Cardiotoxicity to Identify Additional Markers of Cardiotoxicity

The platform technology described above in Example 1 was similarly employed to integrate further data obtained from the same custom-built cardiotoxicity model. Five patient cardiomyocyte lines were used to create a model of cardiotoxicity as explained in the above detailed description. The five cardiomyocyte lines were then subjected to a mitochondrial ATP assay to assess mitochondrial dysfunction imposed by drug treatment or its absence (indicated as + and -) under diabetic conditions (hyperglycemia) and normal conditions (normoglycemia). A reduction of mitochondrial ATP was observed under diabetic conditions upon drug treatment in only 2 of the 5 cardiotoxicity models (see Figure 30). The results of these further experiments led to the identification of additional novel proteins/pathways driving the pathogenesis of drug cardiotoxicity, as summarized in Figures 26-34.

The causal interaction network identified several novel biomarkers and potential therapeutic targets for drug-induced cardiotoxicity. Relational maps resulting from this analysis as shown in Figures 28, 29, 31-33 have provided additional drug-induced cardiotoxicity biomarkers, which are listed below in Table 2. These biomarkers may be used for predicting drug-induced cardiotoxicity of a drug, for diagnosis/prognosis of drug-induced cardiotoxicity, and for identifying a rescue agent which can reduce or alleviate drug-induced cardiotoxicity.

Table 2: biomarkers identified by the Interrogative Biology Discovery Platform

1A69, 1C17, ACBD3, ACLY, ACTR2, ANXA6, ANXA7, AP2A1, ARCN1, ASNA1, ATAD3A, ATP5A, ATP5B, ATP5D, ATP5F1, ATP5H, ATPIF1, BSG, C14orf166, CA2D1, CAPN1, CAPZA2, CARS, CCDC22, CCDC47, CCT7, CLIC4, CMPK1, CNN2, CO1A2, CO6A1, COTL1, COX6B1, CRTAP, CS010, CTSA, CTSB, CYB5, DDX1, DDX17, DDX18, DLD, EDIL3, EHD2, EIF4A3, ENO2, EPHX1, ETFA, FERMT2, FINC, FKB10, FKBP2, FLNC, G3BP2, GOLGA3, GPAT1, GPSN2, GRP75, GRP78, HMOX1, HNRNPD, HNRNPH1, HNRPG, HPX, HSP76, HSP90AB1, HSPA1A, HSPA4, HSPA9, IBP7, IDH1, IQGAP1, ITB1, ITGB1, KARS, KIF5B, KPNA3, KPNB1, LAMC1, LGALS1, LMO7, M6PRBP1, MACF1, MAP1B, MARS, MDH1, MPR1, MTHFD1, MYH10, NCL, NHP2L1, NUCB1, OLA1, P08621, P3H1, P4HA2, P4HB, SEC61A1 (P61619), PAI1, PAPSS2, PCBP2, PDCD6, PDIA1, PDIA3, PDIA4, PDLIM7, PEBP1, PFKM, PH4B, PLIN2, POFUT1, PRKDC, PSMA1, PSMA7, PSMD12, PSMD3, PSMD4, PSMD6, PSME2, PTBP1, PTX3, Q9BQE5, Q9Y262, RAB1B, RPS15A, RPL32, RPL7A, RPL8, RPS25, RPS6, RRAS2, RRP1, SAR1B, SDHA, SENP1, SEPT11, SEPT7, SERPH, SERPINE1, SFRS2, SH3BGRL, SNRPB, SNX12, SOD1, SPRC, ST13, SUB1, SYNCRIP, TAGLN, TAZ, TGM2, TIMP1, TLN1, TPM4, TRAP1, TSP1, TTLL12, TXNDC12, UBA1C, UGDH, UGP2, UQCRH, VAMP3, VAPA

In one embodiment, a panel of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or thirteen markers selected from the group consisting of TIMP1, PTX3, HSP76, FINC, CYB5, PAI1, IBP7 (IGFBP7), 1C17, EDIL3, HMOX1, NUCB1, CS010, and HSPA4 can be used for predicting drug-induced cardiotoxicity of a drug, for diagnosis/prognosis of drug-induced cardiotoxicity, or for identifying a rescue agent which can reduce or alleviate drug-induced cardiotoxicity.
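The scope of this embodiment can be made concrete by counting the distinct panels it covers. A short arithmetic check, using only the thirteen marker names listed above, gives 2^13 - 1 - 13 = 8178 possible panels of two to thirteen markers:

```python
from math import comb

PANEL_MARKERS = [
    "TIMP1", "PTX3", "HSP76", "FINC", "CYB5", "PAI1", "IBP7",
    "1C17", "EDIL3", "HMOX1", "NUCB1", "CS010", "HSPA4",
]

n = len(PANEL_MARKERS)  # 13 candidate markers
# Number of distinct panels containing between two and thirteen of the markers.
n_panels = sum(comb(n, k) for k in range(2, n + 1))
print(n, n_panels)  # 13 8178
```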

Among the markers listed in Table 2, PTX3, PAI1, and IBP7 (IGFBP7) have previously been reported as markers of cardiomyopathy. GRP78 and PDIA3 have been reported as important indicators of ER stress and hypoxic insult. The identification of these known markers by the above-described platform technology for drug-induced cardiotoxicity validates this platform technology for probing novel drug-induced cardiotoxicity biomarkers.

The cDNA sequences of the markers listed in Table 2 are set forth in Appendix A, and are known in the art.

Incorporation by Reference

The contents of all cited references (including literature references, patents, patent applications, and websites) that may be cited throughout this application are hereby expressly incorporated by reference in their entirety, as are the references cited therein. The practice of the present invention will employ, unless otherwise indicated, conventional techniques of protein formulation, which are well known in the art.

Equivalents

The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting of the invention described herein. Scope of the invention is thus indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced herein.

Appendix A

Grp78

Official Symbol: HSPA5

Official Name: heat shock 70kDa protein 5 (glucose-regulated protein, 78kDa)

Gene ID: 3309

Organism: Homo sapiens

Other Aliases: BIP; MIF2; GRP78

Other Designations: 78 kDa glucose-regulated protein; endoplasmic reticulum lumenal Ca(2+)-binding protein grp78; immunoglobulin heavy chain-binding protein

Nucleotide sequence:

NCBI Reference Sequence: NM_005347.4

LOCUS NM_005347

ACCESSION NM_005347

agggtatata agccgagtag gcgacggtga ggtcgacgcc ggccaagaca

61 ς attgacctat tggggtgttt cgcgagtgtg agagggaagc gccgcggcct

121 ς acctgccctt cgcctggttc gtggcgcctt gtgaccccgg gcccctgccg

181 c ggaaattgcg ctgtgctcct gtgctacggc ctgtggctgg actgcctgct

241 ς tggctggcaa gatgaagctc tccctggtgg ccgcgatgct gctgctgctc

301 a gggccgagga ggaggacaag aaggaggacg tgggcacggt ggtcggcatc

361 ς ccacctactc ctgcgtcggc gtgttcaaga acggccgcgt ggagatcatc

421 ς agggcaaccg catcacgccg tcctatgtcg ccttcactcc tgaaggggaa

481 c gcgatgccgc caagaaccag ctcacctcca accccgagaa cacggtcttt

541 ς ggctcatcgg ccgcacgtgg aatgacccgt ctgtgcagca ggacatcaag

601 t tcaaggtggt tgaaaagaaa actaaaccat acattcaagt tgatattgga

661 ς caaagacatt tgctcctgaa gaaatttctg ccatggttct cactaaaatg

721 a ctgaggctta tttgggaaag aaggttaccc atgcagttgt tactgtacca

781 ς atgatgccca acgccaagca accaaagacg ctggaactat tgctggccta

841 a ggatcatcaa cgagcctacg gcagctgcta ttgcttatgg cctggataag

901 a agaagaacat cctggtgttt gacctgggtg gcggaacctt cgatgtgtct

961 cttctcacca ttgacaatgg tgtcttcgaa gttgtggcca ctaatggaga tactcatctg

1021 ggtggagaag actttgacca gcgtgtcatg gaacacttca tcaaactgta caaaaagaag

1081 acgggcaaag atgtcaggaa agacaataga gctgtgcaga aactccggcg cgaggtagaa

1141 aaggccaaac gggccctgtc ttctcagcat caagcaagaa ttgaaattga gtccttctat

1201 gaaggagaag acttttctga gaccctgact cgggccaaat ttgaagagct caacatggat

1261 ctgttccggt ctactatgaa gcccgtccag aaagtgttgg aagattctga tttgaagaag

1321 tctgatattg atgaaattgt tcttgttggt ggctcgactc gaattccaaa gattcagcaa

1381 ctggttaaag agttcttcaa tggcaaggaa ccatcccgtg gcataaaccc agatgaagct

1441 gtagcgtatg gtgctgctgt ccaggctggt gtgctctctg gtgatcaaga tacaggtgac

1501 ctggtactgc ttgatgtatg tccccttaca cttggtattg aaactgtggg aggtgtcatg

1561 accaaactga ttccaaggaa cacagtggtg cctaccaaga agtctcagat cttttctaca

1621 gcttctgata atcaaccaac tgttacaatc aaggtctatg aaggtgaaag acccctgaca

1681 aaagacaatc atcttctggg tacatttgat ctgactggaa ttcctcctgc tcctcgtggg

1741 gtcccacaga ttgaagtcac ctttgagata gatgtgaatg gtattcttcg agtgacagct

1801 gaagacaagg gtacagggaa caaaaataag atcacaatca ccaatgacca gaatcgcctg

1861 acacctgaag aaatcgaaag gatggttaat gatgctgaga agtttgctga ggaagacaaa

1921 aagctcaagg agcgcattga tactagaaat gagttggaaa gctatgccta ttctctaaag

1981 aatcagattg gagataaaga aaagctggga ggtaaacttt cctctgaaga taaggagacc

2041 atggaaaaag ctgtagaaga aaagattgaa tggctggaaa gccaccaaga tgctgacatt

2101 gaagacttca aagctaagaa gaaggaactg gaagaaattg ttcaaccaat tatcagcaaa

2161 ctctatggaa gtgcaggccc tcccccaact ggtgaagagg atacagcaga aaaagatgag

2221 ttgtagacac tgatctgcta gtgctgtaat attgtaaata ctggactcag gaacttttgt

2281 taggaaaaaa ttgaaagaac ttaagtctcg aatgtaattg gaatcttcac ctcagagtgg

2341 agttgaaact gctatagcct aagcggctgt ttactgcttt tcattagcag ttgctcacat

2401 gtctttgggt gggggggaga agaagaattg gccatcttaa aaagcgggta aaaaacctgg

2461 gttagggtgt gtgttcacct tcaaaatgtt ctatttaaca actgggtcat gtgcatctgg

2521 tgtaggaagt tttttctacc ataagtgaca ccaataaatg tttgttattt acactggtct

2581 aatgtttgtg agaagcttct aattagatca attacttatt ttaggaaatt taagactaga

2641 tactcgtgtg tggggtgagg ggagggagta tttggtatgt tgggataagg aaacacttct

2701 atttaatgct tccagggatt tttttttttt tttttaaccc tcctgggccc aagtgatcct

2761 tccacctcag tctcccagct aattgagacc acaggcttgt taccaccatg ctcggctttt

2821 gcattaatct aagaaaaggg gagagaagtt aatccacatc tttactcagg caaggggcat

2881 ttcacagtgc ccaagagtgg ggttttcttg aacatacttg gtttcctatt tccccttatc

2941 tttctaaaac tgcctttctg gtggcttttt ttaaaattat tactaatgat gcttttatag

3001 ctgcttggat tctctgagaa atgatgggga gtgagtgatc actggtatta actttataca

3061 cttggatttc atttgtaact ttaggatgta aaggtatatt gtgaacccta gctgtgtcag

3121 aatctccatc cctgaaattt ctcattagtg gtactggggt gggatcttgg atggtgacat

3181 tgaaactaca ctaaatcccc tcactatgaa tgggttgtta aaggcaatgg tttgtgtcaa

3241 aactggttta ggattactta gattgtgttc ctgaagaaaa gagtccaggt aaatggtatg

3301 atcaataaag gacaggctgg tgctaacata aaatccaata ttgtaatcct agcactttgg

3361 gaggccaagg cgggtggatc acaaggtcaa gagatagaga ccatctttgc caacatggtg

3421 aaactccatc tctactgaaa atacaaaaat tagctgggcg tggtagtgca agctgaaggc

3481 tgaggcagga gaatcactcg aacccgggag gcagaggttg cagtgagccg agatcacacc

3541 actgtactcc agcccggcac tccagcctgg cgacaagagt gagactccac ctcaaaaaaa

3601 aaaaaaagaa tccaatactg cccaaggata ggtattttat agatgggcaa ctggctgaaa

3661 ggttaattct ctagggctag tagaactgga tcccaacacc aaactcttaa ttagacctag

3721 gcctcagctg cactgcccga aaagcatttg ggcagaccct gagcagaata ctggtctcag

3781 gccaagccca atacagccat taaagatgac ctacagtgct gtgtaccctg gggcaatagg

3841 gttaaatggt agttagcaac tagggctagt cttcccttac ctcaaaggct ctcactaccg

3901 tggaccacct agtctgtaac tctttctgag gagctgttac tgaatattaa aaagatagac

3961 ttcaactatg aaa

Protein sequence:

NCBI Reference Sequence: NP_005338.1

LOCUS NP_005338

ACCESSION NP_005338

1 mklslvaaml lllsaaraee edkkedvgtv vgidlgttys cvgvfkngrv eiiandqgnr

61 itpsyvaftp egerligdaa knqltsnpen tvfdakrlig rtwndpsvqq dikflpfkvv

121 ekktkpyiqv digggqtktf apeeisamvl tkmketaeay lgkkvthavv tvpayfndaq

181 rqatkdagti aglnvmriin eptaaaiayg ldkregekni lvfdlgggtf dvslltidng

241 vfevvatngd thlggedfdq rvmehfikly kkktgkdvrk dnravqklrr evekakrals

301 sqhqarieie sfyegedfse tltrakfeel nmdlfrstmk pvqkvledsd lkksdideiv

361 lvggstripk iqqlvkeffn gkepsrginp deavaygaav qagvlsgdqd tgdlvlldvc

421 pltlgietvg gvmtkliprn tvvptkksqi fstasdnqpt vtikvyeger pltkdnhllg

481 tfdltgippa prgvpqievt feidvngilr vtaedkgtgn knkititndq nrltpeeier

541 mvndaekfae edkklkerid trnelesyay slknqigdke klggklssed ketmekavee

601 kiewleshqd adiedfkakk keleeivqpi isklygsagp pptgeedtae

Grp75

Official Symbol: HSPA9

Official Name: heat shock 70kDa protein 9 (mortalin)

Gene ID: 3313

Organism: Homo sapiens

Other Aliases: CSA; MOT; MOT2; GRP75; PBP74; GRP-75; HSPA9B; MTHSP75

Other Designations: 75 kDa glucose-regulated protein; heat shock 70kD protein 9B; mortalin, perinuclear; mortalin-2; p66-mortalin; peptide-binding protein 74; stress-70 protein, mitochondrial

Nucleotide sequence:

NCBI Reference Sequence: NM_004134.6

LOCUS NM_004134

ACCESSION NM_004134

1 ttcctcccct ggactctttc tgagctcaga gccgccgcag ccgggacagg agggcaggct

61 ttctccaacc atcatgctgc ggagcatatt acctgtacgc cctggctccg ggagcggcag

121 tcgagtatcc tctggtcagg cggcgcgggc ggcgcctcag cggaagagcg ggcctctggg

181 ccgcagtgac caacccccgc ccctcacccc acgtggttgg aggtttccag aagcgctgcc

241 gccaccgcat cgcgcagctc tttgccgtcg gagcgcttgt ttgctgcctc gtactcctcc

301 atttatccgc catgataagt gccagccgag ctgcagcagc ccgtctcgtg ggcgccgcag

361 cctcccgggg ccctacggcc gcccgccacc aggatagctg gaatggcctt agtcatgagg

421 cttttagact tgtttcaagg cgggattatg catcagaagc aatcaaggga gcagttgttg

481 gtattgattt gggtactacc aactcctgcg tggcagttat ggaaggtaaa caagcaaagg

541 tgctggagaa tgccgaaggt gccagaacca ccccttcagt tgtggccttt acagcagatg

601 gtgagcgact tgttggaatg ccggccaagc gacaggctgt caccaaccca aacaatacat

661 tttatgctac caagcgtctc attggccggc gatatgatga tcctgaagta cagaaagaca

721 ttaaaaatgt tccctttaaa attgtccgtg cctccaatgg tgatgcctgg gttgaggctc

781 atgggaaatt gtattctccg agtcagattg gagcatttgt gttgatgaag atgaaagaga

841 ctgcagaaaa ttacttgggg cacacagcaa aaaatgctgt gatcacagtc ccagcttatt

901 tcaatgactc gcagagacag gccactaaag atgctggcca gatatctgga ctgaatgtgc

961 ttcgggtgat taatgagccc acagctgctg ctcttgccta tggtctagac aaatcagaag

1021 acaaagtcat tgctgtatat gatttaggtg gtggaacttt tgatatttct atcctggaaa

1081 ttcagaaagg agtatttgag gtgaaatcca caaatgggga taccttctta ggtggggaag

1141 actttgacca ggccttgcta cggcacattg tgaaggagtt caagagagag acaggggttg

1201 atttgactaa agacaacatg gcacttcaga gggtacggga agctgctgaa aaggctaaat

1261 gtgaactctc ctcatctgtg cagactgaca tcaatttgcc ctatcttaca atggattctt

1321 ctggacccaa gcatttgaat atgaagttga cccgtgctca atttgaaggg attgtcactg

1381 atctaatcag aaggactatc gctccatgcc aaaaagctat gcaagatgca gaagtcagca

1441 agagtgacat aggagaagtg attcttgtgg gtggcatgac taggatgccc aaggttcagc

1501 agactgtaca ggatcttttt ggcagagccc caagtaaagc tgtcaatcct gatgaggctg

1561 tggccattgg agctgccatt cagggaggtg tgttggccgg cgatgtcacg gatgtgctgc

1621 tccttgatgt cactcccctg tctctgggta ttgaaactct aggaggtgtc tttaccaaac

1681 ttattaatag gaataccact attccaacca agaagagcca ggtattctct actgccgctg

1741 atggtcaaac gcaagtggaa attaaagtgt gtcagggtga aagagagatg gctggagaca

1801 acaaactcct tggacagttt actttgattg gaattccacc agcccctcgt ggagttcctc

1861 agattgaagt tacatttgac attgatgcca atgggatagt acatgtttct gctaaagata

1921 aaggcacagg acgtgagcag cagattgtaa tccagtcttc tggtggatta agcaaagatg

1981 atattgaaaa tatggttaaa aatgcagaga aatatgctga agaagaccgg cgaaagaagg

2041 aacgagttga agcagttaat atggctgaag gaatcattca cgacacagaa accaagatgg

2101 aagaattcaa ggaccaatta cctgctgatg agtgcaacaa gctgaaagaa gagatttcca

2161 aaatgaggga gctcctggct agaaaagaca gcgaaacagg agaaaatatt agacaggcag

2221 catcctctct tcagcaggca tcactgaagc tgttcgaaat ggcatacaaa aagatggcat

2281 ctgagcgaga aggctctgga agttctggca ctggggaaca aaaggaagat caaaaggagg

2341 aaaaacagta ataatagcag aaattttgaa gccagaagga caacatatga agcttaggag

2401 tgaagagact tcctgagcag aaatgggcga acttcagtct ttttactgtg tttttgcagt

2461 attctatata taatttcctt aatttgtaaa tttagtgacc attagctagt gatcatttaa

2521 tggacagtga ttctaacagt ataaagttca caatattcta tgtccctagc ctgtcatttt

2581 tcagctgcat gtaaaaggag gtaggatgaa ttgatcatta taaagattta actattttat

2641 gctgaagtga ccatattttc aaggggtgaa accatctcgc acacagcaat gaaggtagtc

2701 atccatagac ttgaaatgag accacatatg gggatgagat ccttctagtt agcctagtac

2761 tgctgtactg gcctgtatgt acatggggtc cttcaactga ggccttgcaa gtcaagctgg

2821 ctgtgccatg tttgtagatg gggcagagga atctagaaca atgggaaact tagctattta

2881 tattaggtac agctattaaa acaaggtagg aatgaggcta gacctttaac ttccctaagg

2941 catacttttc tagctacctt ctgccctgtg tctggcacct acatccttga tgattgttct

3001 cttacccatt ctggaatttt ttttttttta aataaataca gaaagcatct tgatctcttg

3061 tttgtgaggg gtgatgccct gagatttagc ttcaagaata tgccatggct catgcttccc

3121 atatttccca aagagggaaa tacaggattt gctaacactg gttaaaaatg caaattcaag

3181 atttggaagg gctgttataa tgaaataatg agcagtatca gcatgtgcaa atcttgtttg

3241 aaggatttta ttttctcccc ttagaccttt ggtacattta gaatcttgaa agtttctaga

3301 tctctaacat gaaagtttct agatctctaa catgaaagtt tttagatctc taacatgaaa

3361 accaaggtgg ctattttcag gttgctttca gctccaagta gaaataacca gaattggctt

3421 acattaaaga aactgcatct agaaataagt cctaagatac tatttctatg gctcaaaaat

3481 aaaaggaacc cagatttctt tcccta

Protein sequence:

NCBI Reference Sequence: NP_004125.3

LOCUS NP_004125

ACCESSION NP_004125

1 misasraaaa rlvgaaasrg ptaarhqdsw nglsheafrl vsrrdyasea ikgavvgidl

61 gttnscvavm egkqakvlen aegarttpsv vaftadgerl vgmpakrqav tnpnntfyat

121 krligrrydd pevqkdiknv pfkivrasng dawveahgkl yspsqigafv lmkmketaen

181 ylghtaknav itvpayfnds qrqatkdagq isglnvlrvi neptaaalay gldksedkvi

241 avydlgggtf disileiqkg vfevkstngd tflggedfdq allrhivkef kretgvdltk

301 dnmalqrvre aaekakcels ssvqtdinlp yltmdssgpk hlnmkltraq fegivtdlir

361 rtiapcqkam qdaevsksdi gevilvggmt rmpkvqqtvq dlfgrapska vnpdeavaig

421 aaiqggvlag dvtdvllldv tplslgietl ggvftklinr nttiptkksq vfstaadgqt

481 qveikvcqge remagdnkll gqftligipp aprgvpqiev tfdidangiv hvsakdkgtg

541 reqqiviqss gglskddien mvknaekyae edrrkkerve avnmaegiih dtetkmeefk

601 dqlpadecnk lkeeiskmre llarkdsetg enirqaassl qqaslklfem aykkmasere

661 gsgssgtgeq kedqkeekq

TIMP1

Official Symbol: TIMP1

Official Name: TIMP metallopeptidase inhibitor 1

Gene ID: 7076

Organism: Homo sapiens

Other Aliases: RP1-230G1.3, CLGI, EPA, EPO, HCI, TIMP

Other Designations: TIMP-1; collagenase inhibitor; erythroid potentiating activity; erythroid-potentiating activity; fibroblast collagenase inhibitor; metalloproteinase inhibitor 1; tissue inhibitor of metalloproteinases 1

Nucleotide sequence:

NCBI Reference Sequence: NM_003254.2

LOCUS NM_003254

ACCESSION NM_003254

1 tttcgtcggc ccgccccttg gcttctgcac tgatggtggg tggatgagta atgcatccag

61 gaagcctgga ggcctgtggt ttccgcaccc gctgccaccc ccgcccctag cgtggacatt

121 tatcctctag cgctcaggcc ctgccgccat cgccgcagat ccagcgccca gagagacacc

181 agagaaccca ccatggcccc ctttgagccc ctggcttctg gcatcctgtt gttgctgtgg

241 ctgatagccc ccagcagggc ctgcacctgt gtcccacccc acccacagac ggccttctgc

301 aattccgacc tcgtcatcag ggccaagttc gtggggacac cagaagtcaa ccagaccacc

361 ttataccagc gttatgagat caagatgacc aagatgtata aagggttcca agccttaggg

421 gatgccgctg acatccggtt cgtctacacc cccgccatgg agagtgtctg cggatacttc

481 cacaggtccc acaaccgcag cgaggagttt ctcattgctg gaaaactgca ggatggactc

541 ttgcacatca ctacctgcag ttttgtggct ccctggaaca gcctgagctt agctcagcgc

601 cggggcttca ccaagaccta cactgttggc tgtgaggaat gcacagtgtt tccctgttta

661 tccatcccct gcaaactgca gagtggcact cattgcttgt ggacggacca gctcctccaa

721 ggctctgaaa agggcttcca gtcccgtcac cttgcctgcc tgcctcggga gccagggctg

781 tgcacctggc agtccctgcg gtcccagata gcctgaatcc tgcccggagt ggaagctgaa

841 gcctgcacag tgtccaccct gttcccactc ccatctttct tccggacaat gaaataaaga

901 gttaccaccc agcagaaaaa aaaaaaaaaa

Protein sequence:

NCBI Reference Sequence: NP_003245.1

LOCUS NP_003245

ACCESSION NP_003245

1 mapfeplasg illllwliap sractcvpph pqtafcnsdl virakfvgtp evnqttlyqr

61 yeikmtkmyk gfqalgdaad irfvytpame svcgyfhrsh nrseefliag klqdgllhit

121 tcsfvapwns lslaqrrgft ktytvgceec tvfpclsipc klqsgthclw tdqllqgsek

181 gfqsrhlacl prepglctwq slrsqia

PTX3

Official Symbol: PTX3

Official Name: pentraxin 3, long

Gene ID: 5806

Organism: Homo sapiens

Other Aliases: TNFAIP5, TSG-14

Other Designations: TNF alpha-induced protein 5; pentaxin-related gene, rapidly induced by IL-1 beta, tumor necrosis factor, alpha-induced protein 5; pentaxin-related protein PTX3; pentraxin-3; pentraxin-related gene, rapidly induced by IL-1 beta; pentraxin-related protein PTX3; tumor necrosis factor alpha-induced protein 5; tumor necrosis factor, alpha-induced protein 5; tumor necrosis factor-inducible gene 14 protein; tumor necrosis factor-inducible protein TSG-14

Nucleotide sequence:

NCBI Reference Sequence: NM_002852.3

LOCUS NM_002852

ACCESSION NM_002852

1 attcatcccc attcaggctt tcctcagcat ttattaagga ctctctgctc cagcctctca

61 ctctcactct cctccgctca aactcagctc acttgagagt ctcctcccgc cagctgtgga

121 aagaactttg cgtctctcca gcaatgcatc tccttgcgat tctgttttgt gctctctggt

181 ctgcagtgtt ggccgagaac tcggatgatt atgatctcat gtatgtgaat ttggacaacg

241 aaatagacaa tggactccat cccactgagg accccacgcc gtgcgcctgc ggtcaggagc

301 actcggaatg ggacaagctc ttcatcatgc tggagaactc gcagatgaga gagcgcatgc

361 tgctgcaagc cacggacgac gtcctgcggg gcgagctgca gaggctgcgg gaggagctgg

421 gccggctcgc ggaaagcctg gcgaggccgt gcgcgccggg ggctcccgca gaggccaggc

481 tgaccagtgc tctggacgag ctgctgcagg cgacccgcga cgcgggccgc aggctggcgc

541 gtatggaggg cgcggaggcg cagcgcccag aggaggcggg gcgcgccctg gccgcggtgc

601 tagaggagct gcggcagacg cgagccgacc tgcacgcggt gcagggctgg gctgcccgga

661 gctggctgcc ggcaggttgt gaaacagcta ttttattccc aatgcgttcc aagaagattt

721 ttggaagcgt gcatccagtg agaccaatga ggcttgagtc ttttagtgcc tgcatttggg

781 tcaaagccac agatgtatta aacaaaacca tcctgttttc ctatggcaca aagaggaatc

841 catatgaaat ccagctgtat ctcagctacc aatccatagt gtttgtggtg ggtggagagg

901 agaacaaact ggttgctgaa gccatggttt ccctgggaag gtggacccac ctgtgcggca

961 cctggaattc agaggaaggg ctcacatcct tgtgggtaaa tggtgaactg gcggctacca

1021 ctgttgagat ggccacaggt cacattgttc ctgagggagg aatcctgcag attggccaag

1081 aaaagaatgg ctgctgtgtg ggtggtggct ttgatgaaac attagccttc tctgggagac

1141 tcacaggctt caatatctgg gatagtgttc ttagcaatga agagataaga gagaccggag

1201 gagcagagtc ttgtcacatc cgggggaata ttgttgggtg gggagtcaca gagatccagc

1261 cacatggagg agctcagtat gtttcataaa tgttgtgaaa ctccacttga agccaaagaa

1321 agaaactcac acttaaaaca catgccagtt gggaaggtct gaaaactcag tgcataatag

1381 gaacacttga gactaatgaa agagagagtt gagaccaatc tttatttgta ctggccaaat

1441 actgaataaa cagttgaagg aaagacattg gaaaaagctt ttgaggataa tgttactaga

1501 ctttatgcca tggtgctttc agtttaatgc tgtgtctctg tcagataaac tctcaaataa

1561 ttaaaaagga ctgtattgtt gaacagaggg acaattgttt tacttttctt tggttaattt

1621 tgttttggcc agagatgaat tttacattgg aagaataaca aaataagatt tgttgtccat

1681 tgttcattgt tattggtatg taccttatta caaaaaaaag atgaaaacat atttatacta

1741 caaggtgact taacaactat aaatgtagtt tatgtgttat aatcgaatgt cacgtttttg

1801 agaagatagt catataagtt atattgcaaa agggatttgt attaatttaa gactattttt

1861 gtaaagctct actgtaaata aaatatttta taaaactagc tcacgtcatt taattataaa

1921 tttaagagat gttttggaaa aaaaaaaaaa aaaaa

Protein sequence:

NCBI Reference Sequence: NP_002843.2

LOCUS NP_002843

ACCESSION NP_002843

1 mhllailfca lwsavlaens ddydlmyvnl dneidnglhp tedptpcacg qehsewdklf

61 imlensqmre rmllqatddv lrgelqrlre elgrlaesla rpcapgapae arltsaldel

121 lqatrdagrr larmegaeaq rpeeagrala avleelrqtr adlhavqgwa arswlpagce

181 tailfpmrsk kifgsvhpvr pmrlesfsac iwvkatdvln ktilfsygtk rnpyeiqlyl

241 syqsivfvvg geenklvaea mvslgrwthl cgtwnseegl tslwvngela attvematgh

301 ivpeggilqi gqekngccvg ggfdetlafs grltgfniwd svlsneeire tggaeschir

361 gnivgwgvte iqphggaqyv s

HSP76

Official Symbol: HSPA6

Official Name: heat shock 70kDa protein 6 (HSP70B')

Gene ID: 3310

Organism: Homo sapiens

Other Aliases:

Other Designations: heat shock 70 kDa protein 6; heat shock 70 kDa protein B'; heat shock 70kD protein 6 (HSP70B')

Nucleotide sequence:

NCBI Reference Sequence: NM_002155.3

LOCUS NM_002155

ACCESSION NM_002155

1 agagccagcc cggaggagct agaaccttcc ccgcatttct ttcagcagcc gag

61 gcgggctggc ctggcgtagc cgcccagcct cgcggctcat gccccgatct

121 tctcccgggg tcagcgccgc gccgcgccac ccggctgagt cagcccgggc gggcgagagg

181 ctctcaactg ggcgggaagg tgcgggaagg tgcggaaagg ttcgcgaaag ttcgcggcgg

241 cgggggtcgg gtgaggcgca aaaggataaa aagcccgtgg aagcggagct gagcagatcc

301 gagccgggct ggctgcagag aaaccgcagg gagagcctca ctgctgagcg cccctcgacg

361 gcggagcggc agcagcctcc gtggcctcca gcatccgaca agaagcttca gccatgcagg

421 ccccacggga gctcgcggtg ggcatcgacc tgggcaccac ctactcgtgc gtgggcgtgt

481 ttcagcaggg ccgcgtggag atcctggcca acgaccaggg caaccgcacc acgcccagct

541 acgtggcctt caccgacacc gagcggctgg tcggggacgc ggccaagagc caggcggccc

601 tgaaccccca caacaccgtg ttcgatgcca agcggctgat cgggcgcaag ttcgcggaca

661 ccacggtgca gtcggacatg aagcactggc ccttccgggt ggtgagcgag ggcggcaagc

721 ccaaggtgcg cgtatgctac cgcggggagg acaagacgtt ctaccccgag gagatctcgt

781 ccatggtgct gagcaagatg aaggagacgg ccgaggcgta cctgggccag cccgtgaagc

841 acgcagtgat caccgtgccc gcctatttca atgactcgca gcgccaggcc accaaggacg

901 cgggggccat cgcggggctc aacgtgttgc ggatcatcaa tgagcccacg gcagctgcca

961 tcgcctatgg gctggaccgg cggggcgcgg gagagcgcaa cgtgctcatt tttgacctgg

1021 gtgggggcac cttcgatgtg tcggttctct ccattgacgc tggtgtcttt gaggtgaaag

1081 ccactgctgg agatacccac ctgggaggag aggacttcga caaccggctc gtgaaccact

1141 tcatggaaga attccggcgg aagcatggga aggacctgag cgggaacaag cgtgccctgc

1201 gcaggctgcg cacagcctgt gagcgcgcca agcgcaccct gtcctccagc acccaggcca

1261 ccctggagat agactccctg ttcgagggcg tggacttcta cacgtccatc actcgtgccc

1321 gctttgagga actgtgctca gacctcttcc gcagcaccct ggagccggtg gagaaggccc

1381 tgcgggatgc caagctggac aaggcccaga ttcatgacgt cgtcctggtg gggggctcca

1441 ctcgcatccc caaggtgcag aagttgctgc aggacttctt caacggcaag gagctgaaca

1501 agagcatcaa ccctgatgag gctgtggcct atggggctgc tgtgcaggcg gccgtgttga

1561 tgggggacaa atgtgagaaa gtgcaggatc tcctgctgct ggatgtggct cccctgtctc

1621 tggggctgga gacagcaggt ggggtgatga ccacgctgat ccagaggaac gccactatcc

1681 ccaccaagca gacccagact ttcaccacct actcggacaa ccagcctggg gtcttcatcc

1741 aggtgtatga gggtgagagg gccatgacca aggacaacaa cctgctgggg cgttttgaac

1801 tcagtggcat ccctcctgcc ccacgtggag tcccccagat agaggtgacc tttgacattg

1861 atgctaatgg catcctgagc gtgacagcca ctgacaggag cacaggtaag gctaacaaga

1921 tcaccatcac caatgacaag ggccggctga gcaaggagga ggtggagagg atggttcatg

1981 aagccgagca gtacaaggct gaggatgagg cccagaggga cagagtggct gccaaaaact

2041 cgctggaggc ccatgtcttc catgtgaaag gttctttgca agaggaaagc cttagggaca

2101 agattcccga agaggacagg cgcaaaatgc aagacaagtg tcgggaagtc cttgcctggc

2161 tggagcacaa ccagctggca gagaaggagg agtatgagca tcagaagagg gagctggagc

2221 aaatctgtcg ccccatcttc tccaggctct atggggggcc tggtgtccct gggggcagca

2281 gttgtggcac tcaagcccgc cagggggacc ccagcaccgg ccccatcatt gaggaggttg

2341 attgaatggc ccttcgtgat aagtcagctg tgactgtcag ggctatgcta tgggccttct

2401 agactgtctt ctatgatcct gcccttcaga gatgaacttt ccctccaaag ctagaacttt

2461 cttcccagga taactgaagt cttttgactt tttgggggga gggcggttca tcctcttctg

2521 cttcaaataa aaagtcatta atttattaaa acttgtgtgg cactttaaca ttgctttcac

2581 ctatattttg tgtactttgt tacttgcatg tatgaatttt gttatgtaaa atatagttat

2641 agacctaaat aaaaaaaaaa aaaa

Protein sequence:

NCBI Reference Sequence: NP_002146.2

LOCUS NP_002146

ACCESSION NP_002146

1 mqaprelavg idlgttyscv gvfqqgrvei landqgnrtt psyvaftdte rlvgdaaksq

61 aalnphntvf dakrligrkf adttvqsdmk hwpfrvvseg gkpkvrvcyr gedktfypee

121 issmvlskmk etaeaylgqp vkhavitvpa yfndsqrqat kdagaiagln vlriinepta

181 aaiaygldrr gagernvlif dlgggtfdvs vlsidagvfe vkatagdthl ggedfdnrlv

241 nhfmeefrrk hgkdlsgnkr alrrlrtace rakrtlssst qatleidslf egvdfytsit

301 rarfeelcsd lfrstlepve kalrdakldk aqihdvvlvg gstripkvqk llqdffngke

361 lnksinpdea vaygaavqaa vlmgdkcekv qdlllldvap lslgletagg vmttliqrna

421 tiptkqtqtf ttysdnqpgv fiqvyegera mtkdnnllgr felsgippap rgvpqievtf

481 didangilsv tatdrstgka nkititndkg rlskeeverm vheaeqykae deaqrdrvaa

541 knsleahvfh vkgslqeesl rdkipeedrr kmqdkcrevl awlehnqlae keeyehqkre

601 leqicrpifs rlyggpgvpg gsscgtqarq gdpstgpiie evd

PDIA4

Official Symbol: PDIA4

Official Name: protein disulfide isomerase family A, member 4

Gene ID: 9601

Organism: Homo sapiens

Other Aliases: ERP70, ERP72, ERp-72

Other Designations: ER protein 70; ER protein 72; endoplasmic reticulum resident protein 70; endoplasmic reticulum resident protein 72; protein disulfide isomerase related protein (calcium-binding protein, intestinal-related); protein disulfide isomerase-associated 4; protein disulfide-isomerase A4

Nucleotide sequence:

NCBI Reference Sequence: NM_004911.4

LOCUS NM_004911

ACCESSION NM_004911

1 gttttaaacg cgcagccgag ggccgcgcgc aggagtaggg agggcctagg gcggcggagc

61 cgactcgtcg cggccgaggc gcgcgcggtc cgtgccggcg tcagtctggg attggccggc

121 ccgcgacttc ctccgccccc tgccaatcgc cggggacgac ttccgtgggt ttttccggct

181 cccccgcgtc gctaaggagc gacgggctgt cggccagacc ccgagttctc ggtgcgctca

241 gcggccgccg acgctaggag gccgcgctcc gcccccgcta ccatgaggcc ccggaaagcc

301 ttcctgctcc tgctgctctt ggggctggtg cagctgctgg ccgtggcggg tgccgagggc

361 ccggacgagg attcttctaa cagagaaaat gccattgagg atgaagagga ggaggaggag

421 gaagatgatg atgaggaaga agacgacttg gaagttaagg aagaaaatgg agtcttggtc

481 ctaaatgatg caaactttga taattttgtg gctgacaaag acacagtgct gctggagttt

541 tatgctccat ggtgtggaca ttgcaagcag tttgctccgg aatatgaaaa aattgccaac

601 atattaaagg ataaagatcc tcccattcct gttgccaaga tcgatgcaac ctcagcgtct

661 gtgctggcca gcaggtttga tgtgagtggc taccccacca tcaagatcct taagaagggg

721 caggctgtag actacgaggg ctccagaacc caggaagaaa ttgttgccaa ggtcagagaa

781 gtctcccagc ccgactggac gcctccacca gaagtcacgc ttgtgttgac caaagagaac

841 tttgatgaag ttgtgaatga tgcagatatc attctggtgg agttttatgc cccatggtgt

901 ggacactgca agaaacttgc ccccgagtat gagaaggccg ccaaggagct cagcaagcgt

961 tctcctccaa ttcccctggc aaaggtcgac gccaccgcag aaacagacct ggccaagagg

1021 tttgatgtct ctggctatcc caccctgaaa attttccgca aaggaaggcc ttatgactac

1081 aacggcccac gagaaaaata tggaatcgtt gattacatga tcgagcagtc cgggcctccc

1141 tccaaggaga ttctgaccct gaagcaggtc caggagttcc tgaaggatgg agacgatgtc

1201 atcatcatcg gggtctttaa gggggagagt gacccagcct accagcaata ccaggatgcc

1261 gctaacaacc tgagagaaga ttacaaattt caccacactt tcagcacaga aatagcaaag

1321 ttcttgaaag tctcccaggg gcagttggtt gtaatgcagc ctgagaaatt ccagtccaag

1381 tatgagcccc ggagccacat gatggacgtc cagggctcca cccaggactc ggccatcaag

1441 gacttcgtgc tgaagtacgc cctgcccctg gttggccacc gcaaggtgtc aaacgatgct

1501 aagcgctaca ccaggcgccc cctggtggtc gtctactaca gtgtggactt cagctttgat

1561 tacagagctg caactcagtt ttggcggagc aaagtcctag aggtggccaa ggacttccct

1621 gagtacacct ttgccattgc ggacgaagag gactatgctg gggaggtgaa ggacctgggg

1681 ctcagcgaga gtggggagga tgtcaatgcc gccatcctgg acgagagtgg gaagaagttc

1741 gccatggagc cagaggagtt tgactctgac accctccgcg agtttgtcac tgctttcaaa

1801 aaaggaaaac tgaagccagt catcaaatcc cagccagtgc ccaagaacaa caagggaccc

1861 gtcaaggtcg tggtgggaaa gacctttgac tccattgtga tggaccccaa gaaggacgtc

1921 ctcatcgagt tctacgcgcc atggtgcggg cactgcaagc agctagagcc cgtgtacaac

1981 agcctggcca agaagtacaa gggccaaaag ggcctggtca tcgccaagat ggacgccact

2041 gccaacgacg tccccagcga ccgctataag gtggagggct tccccaccat ctacttcgcc

2101 cccagtgggg acaaaaagaa cccagttaaa tttgagggtg gagacagaga tctggagcat

2161 ttgagcaagt ttatagaaga acatgccaca aaactgagca ggaccaagga agagctttga

2221 aggcctgagg tctgcggaag gtgggaggag gcagacgccc tgcgtggccc atggtcgggg

2281 cgtccacgcc gaggccggca acaaacgaca gtatctcgga ttcctttttt ttttttttta

2341 attttttata ctttggtgtt tcacttcatg ctctgaatac tgaataacca tgaatgactg

2401 aatagtttag tccagatttt tacagaggat acatctattt ttatcattat ttggggtttg

2461 aaaaattttt ttttacacct tctaatttct ttatttctca aagcagataa ttcttctgtg

2521 tgaaaatgtt ttcttttttt aatttaaggt ttaaaattcc ttttccaaat catgttgatt

2581 ttgctctttg ctttttcgtt gtctgagaaa ttgttggcgt agatttggct tctggtatgt

2641 gtttctgatt gcttcctgtt gagcacaaag tgagagctgc cactgagcag ccctgccagg

2701 ggtgctgttt caggctgggc atcgccaggc ggcctccctg caaaccaagg gctgggggca

2761 aaggggcatg atccagggtc ccccagggtg ggctcagctc cagggagagg ccacccacgt

2821 ggcagcccca cctcttgaga gcccccagtg ccggagcaga aaggaccctg gacccagagg

2881 cagatactgc ggggtggtag aaaaggtaga gtaggctgtg gcaatggaat aaaacacgat

2941 taaaaacgtt aaaaaaaaaa aaaaaaaaaa

Protein sequence:

NCBI Reference Sequence: NP_004902.1

LOCUS NP_004902

ACCESSION NP_004902

1 mrprkaflll lllglvqlla vagaegpded ssnrenaied eeeeeeeddd eeeddlevke

61 engvlvlnda nfdnfvadkd tvllefyapw cghckqfape yekianilkd kdppipvaki

121 datsasvlas rfdvsgypti kilkkgqavd yegsrtqeei vakvrevsqp dwtpppevtl

181 vltkenfdev vndadiilve fyapwcghck klapeyekaa kelskrsppi plakvdatae

241 tdlakrfdvs gyptlkifrk grpydyngpr ekygivdymi eqsgppskei ltlkqvqefl

301 kdgddviiig vfkgesdpay qqyqdaannl redykfhhtf steiakflkv sqgqlvvmqp

361 ekfqskyepr shmmdvqgst qdsaikdfvl kyalplvghr kvsndakryt rrplvvvyys

421 vdfsfdyraa tqfwrskvle vakdfpeytf aiadeedyag evkdlglses gedvnaaild

481 esgkkfamep eefdsdtlre fvtafkkgkl kpviksqpvp knnkgpvkvv vgktfdsivm

541 dpkkdvlief yapwcghckq lepvynslak kykgqkglvi akmdatandv psdrykvegf

601 ptiyfapsgd kknpvkfegg drdlehlskf ieehatklsr tkeel

PDIA1

Official Symbol: P4HB

Official Name: prolyl 4-hydroxylase, beta polypeptide

Gene ID: 5034

Organism: Homo sapiens

Other Aliases: DSI, ERBA2L, GIT, P4Hbeta, PDI, PDIA1, PHDB, PO4DB, PO4HB, PROHB

Other Designations: cellular thyroid hormone-binding protein; collagen prolyl 4-hydroxylase beta; glutathione-insulin transhydrogenase; p55; procollagen-proline, 2-oxoglutarate 4-dioxygenase (proline 4-hydroxylase), beta polypeptide; prolyl 4-hydroxylase subunit beta; protein disulfide isomerase family A, member 1; protein disulfide isomerase-associated 1; protein disulfide isomerase/oxidoreductase; protein disulfide-isomerase; protocollagen hydroxylase; thyroid hormone-binding protein p55

Nucleotide sequence:

NCBI Reference Sequence: NM_000918.3

LOCUS NM_000918

ACCESSION NM_000918

1 gagcctcgaa gtccgccggc caatcgaagg cgggccccag cggcgcgtgc gcgccgcggc

61 cagcgcgcgc gggcgggggg gcaggcgcgc cccggaccca ggatttataa aggcgaggcc

121 gggaccggcg cgcgctctcg tcgcccccgc tgtcccggcg gcgccaaccg aagcgccccg

181 cctgatccgt gtccgacatg ctgcgccgcg ctctgctgtg cctggccgtg gccgccctgg

241 tgcgcgccga cgcccccgag gaggaggacc acgtcctggt gctgcggaaa agcaacttcg

301 cggaggcgct ggcggcccac aagtacctgc tggtggagtt ctatgcccct tggtgtggcc

361 actgcaaggc tctggcccct gagtatgcca aagccgctgg gaagctgaag gcagaaggtt

421 ccgagatcag gttggccaag gtggacgcca cggaggagtc tgacctggcc cagcagtacg

481 gcgtgcgcgg ctatcccacc atcaagttct tcaggaatgg agacacggct tcccccaagg

541 aatatacagc tggcagagag gctgatgaca tcgtgaactg gctgaagaag cgcacgggcc

601 cggctgccac caccctgcct gacggcgcag ctgcagagtc cttggtggag tccagcgagg

661 tggctgtcat cggcttcttc aaggacgtgg agtcggactc tgccaagcag tttttgcagg

721 cagcagaggc catcgatgac ataccatttg ggatcacttc caacagtgac gtgttctcca

781 aataccagct cgacaaagat ggggttgtcc tctttaagaa gtttgatgaa ggccggaaca

841 actttgaagg ggaggtcacc aaggagaacc tgctggactt tatcaaacac aaccagctgc

901 cccttgtcat cgagttcacc gagcagacag ccccgaagat ttttggaggt gaaatcaaga

961 ctcacatcct gctgttcttg cccaagagtg tgtctgacta tgacggcaaa ctgagcaact

1021 tcaaaacagc agccgagagc ttcaagggca agatcctgtt catcttcatc gacagcgacc

1081 acaccgacaa ccagcgcatc ctcgagttct ttggcctgaa gaaggaagag tgcccggccg

1141 tgcgcctcat caccctggag gaggagatga ccaagtacaa gcccgaatcg gaggagctga

1201 cggcagagag gatcacagag ttctgccacc gcttcctgga gggcaaaatc aagccccacc

1261 tgatgagcca ggagctgccg gaggactggg acaagcagcc tgtcaaggtg cttgttggga

1321 agaactttga agacgtggct tttgatgaga aaaaaaacgt ctttgtggag ttctatgccc

1381 catggtgtgg tcactgcaaa cagttggctc ccatttggga taaactggga gagacgtaca

1441 … gaacatcgtc atcgccaaga tggactcgac tgccaacgag gtggaggccg

1501 t cagcttcccc acactcaagt tctttcctgc cagtgccgac aggacggtca

1561 t cggggaacgc acgctggatg gttttaagaa attcctggag agcggtggcc

1621 a aggggatgat gacgatctcg aggacctgga agaagcagag gagccagaca

1681 t cgatgatcag aaagctgtga aagatgaact gtaatacgca aagccagacc

1741 c cgagacccct cgggggctgc acacccagca gcagcgcacg cctccgaagc

1801 c gcttgaagga gggcgtcgcc ggaaacccag ggaacctctc tgaagtgaca

1861 c acacaccgtc cgttcacccc cgtctcttcc ttctgctttt cggtttttgg

1921 a atctccaggc agcccaccct ggtggggctt gtttcctgaa accatgatgt

1981 a acatgagtct gtccagagtg cttgctaccg tgttcggagt ctcgctgcct

2041 c ggaggtttct cctctttttg aaaattccgt ctgtgggatt tttagacatt

2101 t agggtatttg ttccaccttg gccaggcctc ctcggagaag cttgtccccc

2161 ς gacggagccg gactggacat ggtcactcag taccgcctgc agtgtcgcca

2221 t tggctcttgc atttttgggt aaatggagac ttccggatcc tgtcagggtg

2281 t ctggaagagg agctggtggc tgccagccct ggggcccggc acaggcctgg

2341 ς tccctcaagc cagggctcct cctcctgtcg tgggctcatt gtgaccactg

2401 ς agcacggcct gtggcctgtt caaggcagaa ccacgaccct tgactcccgg

2461 ς ggccaaggat gctggagctg aatcagacgc tgacagttct tcaggcattt

2521 … atcgaattga acacattggc caaataaagt tgaaatttta ccacctgtaa

2581 aaaaaaaaaa aaaaaa

Protein sequence:

NCBI Reference Sequence: NP_000909.2

LOCUS NP_000909

ACCESSION NP_000909

1 mlrrallcla vaalvradap eeedhvlvlr ksnfaealaa hkyllvefya pwcghckala

61 peyakaagkl kaegseirla kvdateesdl aqqygvrgyp tikffrngdt aspkeytagr

121 eaddivnwlk krtgpaattl pdgaaaeslv essevavigf fkdvesdsak qflqaaeaid

181 dipfgitsns dvfskyqldk dgvvlfkkfd egrnnfegev tkenlldfik hnqlplvief

241 teqtapkifg geikthillf lpksvsdydg klsnfktaae sfkgkilfif idsdhtdnqr

301 ileffglkke ecpavrlitl eeemtkykpe seeltaerit efchrflegk ikphlmsqel

361 pedwdkqpvk vlvgknfedv afdekknvfv efyapwcghc kqlapiwdkl getykdheni

421 viakmdstan eveavkvhsf ptlkffpasa drtvidynge rtldgfkkfl esggqdgagd

481 dddledleea eepdmeeddd qkavkdel

CA2D1

Official Symbol: CACNA2D1

Official Name: calcium channel, voltage-dependent, alpha 2/delta subunit 1

Gene ID: 781

Organism: Homo sapiens

Other Aliases: H_DJ0560O14.1, CACNA2, CACNL2A, CCHL2A

Other Designations: calcium channel, L type, alpha 2 polypeptide; dihydropyridine-sensitive L-type, calcium channel alpha-2/delta subunit; voltage-dependent calcium channel subunit alpha-2/delta-1; voltage-gated calcium channel subunit alpha-2/delta-1

Nucleotide sequence:

NCBI Reference Sequence: NM_000722.2

LOCUS NM_000722

ACCESSION NM_000722

1 cggcggaggc aaggcggccg cggcgcggag cagccgacgc acgctagtgg gtccgcccgc

61 caccgcccct tcctcggcgt ccgctcccgc ccttgccgtc ccccgcgcgg ctccgcgcct

121 cgggccccgg gcgcagccag ccctccagac gcccgcggtc ccggcggcgt gtgctgctct

181 tcctccgccc gcggtttcca gcgccgctcc ttcccccgct tgggcaggga gggggcattg

241 atcttcgatc gcgaagatgg ctgctggctg cctgctggcc ttgactctga cacttttcca

301 atctttgctc atcggcccct cgtcggagga gccgttccct tcggccgtca ctatcaaatc

361 atgggtggat aagatgcaag aagaccttgt cacactggca aaaacagcaa gtggagtcaa

421 tcagcttgtt gatatttatg agaaatatca agatttgtat actgtggaac caaataatgc

481 acgccagctg gtagaaattg cagccaggga tattgagaaa cttctgagca acagatctaa

541 agccctggtg cgcctggcat tggaagcgga gaaagttcaa gcagctcacc agtggagaga

601 agattttgca agcaatgaag ttgtctacta caatgcaaag gatgatctcg atcctgagaa

661 aaatgacagt gagccaggca gccagaggat aaaacctgtt ttcattgaag atgctaattt

721 tggacgacaa atatcttatc agcacgcagc agtccatatt cctactgaca tctatgaggg

781 ctcaacaatt gtgttaaatg aactcaactg gacaagtgcc ttagatgaag ttttcaaaaa

841 gaatcgcgag gaagaccctt cattattgtg gcaggttttt ggcagtgcca ctggcctagc

901 tcgatattat ccagcttcac catgggttga taatagtaga actccaaata agattgacct

961 ttatgatgta cgcagaagac catggtacat ccaaggagct gcatctccta aagacatgct

1021 tattctggtg gatgtgagtg gaagtgttag tggattgaca cttaaactga tccgaacatc

1081 tgtctccgaa atgttagaaa ccctctcaga tgatgatttc gtgaatgtag cttcatttaa

1141 cagcaatgct caggatgtaa gctgttttca gcaccttgtc caagcaaatg taagaaataa

1201 aaaagtgttg aaagacgcgg tgaataatat cacagccaaa ggaattacag attataagaa

1261 gggctttagt tttgcttttg aacagctgct taattataat gtttccagag caaactgcaa

1321 taagattatt atgctattca cggatggagg agaagagaga gcccaggaga tatttaacaa

1381 atacaataaa gataaaaaag tacgtgtatt cacgttttca gttggtcaac acaattatga

1441 cagaggacct attcagtgga tggcctgtga aaacaaaggt tattattatg aaattccttc

1501 cattggtgca ataagaatca atactcagga atatttggat gttttgggaa gaccaatggt

1561 tttagcagga gacaaagcta agcaagtcca atggacaaat gtgtacctgg atgcattgga

1621 actgggactt gtcattactg gaactcttcc ggtcttcaac ataaccggcc aatttgaaaa

1681 taagacaaac ttaaagaacc agctgattct tggtgtgatg ggagtagatg tgtctttgga

1741 agatattaaa agactgacac cacgttttac actgtgcccc aatgggtatt actttgcaat

1801 cgatcctaat ggttatgttt tattacatcc aaatcttcag ccaaagaacc ccaaatctca

1861 ggagccagta acattggatt tccttgatgc agagttagag aatgatatta aagtggagat

1921 tcgaaataag atgattgatg gggaaagtgg agaaaaaaca ttcagaactc tggttaaatc

1981 tcaagatgag agatatattg acaaaggaaa caggacatac acatggacac ctgtcaatgg

2041 cacagattac agtttggcct tggtattacc aacctacagt ttttactata taaaagccaa

2101 actagaagag acaataactc aggccagatc aaaaaagggc aaaatgaagg attcggaaac

2161 cctgaagcca gataattttg aagaatctgg ctatacattc atagcaccaa gagattactg

2221 caatgacctg aaaatatcgg ataataacac tgaatttctt ttaaatttca acgagtttat

2281 tgatagaaaa actccaaaca acccatcatg taacgcggat ttgattaata gagtcttgct

2341 tgatgcaggc tttacaaatg aacttgtcca aaattactgg agtaagcaga aaaatatcaa

2401 gggagtgaaa gcacgatttg ttgtgactga tggtgggatt accagagttt atcccaaaga

2461 ggctggagaa aattggcaag aaaacccaga gacatatgag gacagcttct ataaaaggag

2521 cctagataat gataactatg ttttcactgc tccctacttt aacaaaagtg gacctggtgc

2581 ctatgaatcg ggcattatgg taagcaaagc tgtagaaata tatattcaag ggaaacttct

2641 taaacctgca gttgttggaa ttaaaattga tgtaaattcc tggatagaga atttcaccaa

2701 aacctcaatc agagatccgt gtgctggtcc agtttgtgac tgcaaaagaa acagtgacgt

2761 aatggattgt gtgattctgg atgatggtgg gtttcttctg atggcaaatc atgatgatta

2821 tactaatcag attggaagat tttttggaga gattgatccc agcttgatga gacacctggt

2881 taatatatca gtttatgctt ttaacaaatc ttatgattat cagtcagtat gtgagcccgg

2941 tgctgcacca aaacaaggag caggacatcg ctcagcatat gtgccatcag tagcagacat

3001 attacaaatt ggctggtggg ccactgctgc tgcctggtct attctacagc agtttctctt

3061 gagtttgacc tttccacgac tccttgaggc agttgagatg gaggatgatg acttcacggc

3121 ctccctgtcc aagcagagct gcattactga acaaacccag tatttcttcg ataacgacag

3181 taaatcattc agtggtgtat tagactgtgg aaactgttcc agaatctttc atggagaaaa

3241 gcttatgaac accaacttaa tattcataat ggttgagagc aaagggacat gtccatgtga

3301 cacacgactg ctcatacaag cggagcagac ttctgacggt ccaaatcctt gtgacatggt

3361 taagcaaccc agataccgaa aagggcctga tgtctgcttt gataacaatg tcttggagga

3421 ttatactgac tgtggtggtg tttctggatt aaatccctcc ctgtggtata tcattggaat

3481 ccagtttcta ctactttggc tggtatctgg cagcacacac cgcctgttat gaccttctaa

3541 aaaccaaatc tgcatagtta aactccagac cctgccaaaa catgagccct gccctcaatt

3601 acagtaacgt agggtcagct ataaaatcag acaaacatta gctgggcctg ttccatggca

3661 taacactaag gcgcagactc ctaaggcacc cactggctgc atgtcagggt gtcagatcct

3721 taaacgtgtg tgaatgctgc atcatctatg tgtaacatca aagcaaaatc ctatacgtgt

3781 cctctattgg aaaatttggg agtttgttgt tgcattgttg gt

Protein sequence:

NCBI Reference Sequence: NP_000713.2

LOCUS NP_000713

ACCESSION NP_000713

1 maagcllalt ltlfqsllig psseepfpsa vtikswvdkm qedlvtlakt asgvnqlvdi

61 yekyqdlytv epnnarqlve iaardiekll snrskalvrl aleaekvqaa hqwredfasn

121 evvyynakdd ldpekndsep gsqrikpvfi edanfgrqis yqhaavhipt diyegstivl

181 nelnwtsald evfkknreed psllwqvfgs atglaryypa spwvdnsrtp nkidlydvrr

241 rpwyiqgaas pkdmlilvdv sgsvsgltlk lirtsvseml etlsdddfvn vasfnsnaqd

301 vscfqhlvqa nvrnkkvlkd avnnitakgi tdykkgfsfa feqllnynvs rancnkiiml

361 ftdggeeraq eifnkynkdk kvrvftfsvg qhnydrgpiq wmacenkgyy yeipsigair

421 intqeyldvl grpmvlagdk akqvqwtnvy ldalelglvi tgtlpvfnit gqfenktnlk

481 nqlilgvmgv dvsledikrl tprftlcpng yyfaidpngy vllhpnlqpk npksqepvtl

541 dfldaelend ikveirnkmi dgesgektfr tlvksqdery idkgnrtytw tpvngtdysl

601 alvlptysfy yikakleeti tqarskkgkm kdsetlkpdn feesgytfia prdycndlki

661 sdnnteflln fnefidrktp nnpscnadli nrvlldagft nelvqnywsk qknikgvkar

721 fvvtdggitr vypkeagenw qenpetyeds fykrsldndn yvftapyfnk sgpgayesgi

781 mvskaveiyi qgkllkpavv gikidvnswi enftktsird pcagpvcdck rnsdvmdcvi

841 lddggfllma nhddytnqig rffgeidpsl mrhlvnisvy afnksydyqs vcepgaapkq

901 gaghrsayvp svadilqigw wataaawsil qqfllsltfp rlleavemed ddftaslskq

961 sciteqtqyf fdndsksfsg vldcgncsri fhgeklmntn lifimveskg tcpcdtrlli

1021 qaeqtsdgpn pcdmvkqpry rkgpdvcfdn nvledytdcg gvsglnpslw yiigiqflll

1081 wlvsgsthrl l

GPAT1

Official Symbol: GPAM

Official Name: glycerol-3-phosphate acyltransferase, mitochondrial

Gene ID: 57678

Organism: Homo sapiens

Other Aliases: RP11-426E5.2, GPAT, GPAT1

Other Designations: GPAT-1; glycerol 3-phosphate acyltransferase, mitochondrial; glycerol-3-phosphate acyltransferase 1, mitochondrial

Nucleotide sequence:

NCBI Reference Sequence: NM_001244949.

LOCUS NM_001244949

ACCESSION NM_001244949

1 tgcgtcatca gggtgcgcca ctgcagctgg cattggccgg gactggaagt gcgggcttct

61 gcagcagccg aagctggagc tgctagggca gcagcggctc ccctgttgta tggacattct

121 gcacccgaaa ctgatagctg agtcctgaag ttttatgtta tgaaacagaa gaactttcat

181 cccagcacat gatttgggaa ttacactttg tgacatggat gaatctgcac tgacccttgg

241 tacaatagat gtttcttatc tgccacattc atcagaatac agtgttggtc gatgtaagca

301 cacaagtgag gaatggggtg agtgtggctt tagacccacc atcttcagat ctgcaacttt

361 aaaatggaaa gaaagcctaa tgagtcggaa aaggccattt gttggaagat gttgttactc

421 ctgcactccc cagagctggg acaaattttt caaccccagt atcccgtctt tgggtttgcg

481 gaatgttatt tatatcaatg aaactcacac aagacaccgc ggatggcttg caagacgcct

541 ttcttacgtt ctttttattc aagagcgaga tgtgcataag ggcatgtttg ccaccaatgt

601 gactgaaaat gtgctgaaca gcagtagagt acaagaggca attgcagaag tggctgctga

661 attaaaccct gatggttctg cccagcagca atcaaaagcc gttaacaaag tgaaaaagaa

721 agctaaaagg attcttcaag aaatggttgc cactgtctca ccggcaatga tcagactgac

781 tgggtgggtg ctgctaaaac tgttcaacag cttcttttgg aacattcaaa ttcacaaagg

841 tcaacttgag atggttaaag ctgcaactga gacgaatttg ccgcttctgt ttctaccagt

901 tcatagatcc catattgact atctgctgct cactttcatt ctcttctgcc ataacatcaa

961 agcaccatac attgcttcag gcaataatct caacatccca atcttcagta ccttgatcca

1021 taagcttggg ggcttcttca tacgacgaag gctcgatgaa acaccagatg gacggaaaga

1081 tgttctctat agagctttgc tccatgggca tatagttgaa ttacttcgac agcagcaatt

1141 cttggagatc ttcctggaag gcacacgttc taggagtgga aaaacctctt gtgctcgggc

1201 aggacttttg tcagttgtgg tagatactct gtctaccaat gtcatcccag acatcttgat

1261 aatacctgtt ggaatctcct atgatcgcat tatcgaaggt cactacaatg gtgaacaact

1321 gggcaaacct aagaagaatg agagcctgtg gagtgtagca agaggtgtta ttagaatgtt

1381 acgaaaaaac tatggttgtg tccgagtgga ttttgcacag ccattttcct taaaggaata

1441 tttagaaagc caaagtcaga aaccggtgtc tgctctactt tccctggagc aagcgttgtt

1501 accagctata cttccttcaa gacccagtga tgctgctgat gaaggtagag acacgtccat

1561 taatgagtcc agaaatgcaa cagatgaatc cctacgaagg aggttgattg caaatctggc

1621 tgagcatatt ctattcactg ctagcaagtc ctgtgccatt atgtccacac acattgtggc

1681 ttgcctgctc ctctacagac acaggcaggg aattgatctc tccacattgg tcgaagactt

1741 ctttgtgatg aaagaggaag tcctggctcg tgattttgac ctggggttct caggaaattc

1801 agaagatgta gtaatgcatg ccatacagct gctgggaaat tgtgtcacaa tcacccacac

1861 tagcaggaac gatgagtttt ttatcacccc cagcacaact gtcccatcag tcttcgaact

1921 caacttctac agcaatgggg tacttcatgt ctttatcatg gaggccatca tagcttgcag

1981 cctttatgca gttctgaaca agaggggact ggggggtccc actagcaccc cacctaacct

2041 gatcagccag gagcagctgg tgcggaaggc ggccagcctg tgctaccttc tctccaatga

2101 aggcaccatc tcactgcctt gccagacatt ttaccaagtc tgccatgaaa cagtaggaaa

2161 gtttatccag tatggcattc ttacagtggc agagcacgat gaccaggaag atatcagtcc

2221 tagtcttgct gagcagcagt gggacaagaa gcttccagaa cctttgtctt ggagaagtga

2281 tgaagaagat gaagacagtg actttgggga ggaacagcga gattgctacc tgaaggtgag

2341 ccaatccaag gagcaccagc agtttatcac cttcttacag agactccttg ggcctttgct

2401 ggaggcctac agctctgctg ccatctttgt tcacaacttc agtggtcctg ttccagaacc

2461 tgagtatctg caaaagttgc acaaatacct aataaccaga acagaaagaa atgttgcagt

2521 atatgctgag agtgccacat attgtcttgt gaagaatgct gtgaaaatgt ttaaggatat

2581 tggggttttc aaggagacca aacaaaagag agtgtctgtt ttagaactga gcagcacttt

2641 tctacctcaa tgcaaccgac aaaaacttct agaatatatt ctgagttttg tggtgctgta

2701 ggtaacgtgt ggcactgctg gcaaatgaag gtcatgagat gagttccttg taggtaccag

2761 cttctggctc aagagttgaa ggtgccatcg cagggtcagg cctgccctgt cccgaagtga

2821 tctcctggaa gacaagtgcc ttctccctcc atggatctgt gatcttccca gctctgcatc

2881 aacacagcag cctgcagata acacttgggg ggacctcagc ctctattcgc aactcataat

2941 ccgtagacta caagatgaaa tctcaataaa ttatttttga gtttattaaa gattgacatt

3001 ttaagtacaa cttttaagga ctaattactg tgatggacac agaaatgtag ctgtgttctg

3061 gaactgaatc ttacatggta tacttagtgc tgctgggtaa tttgttggta tattatctgg

3121 ttagtggtta atgcttcctt taaaaataat tgagtcatcc attcactctt tttcagtttt

3181 atctgtcaat agtagctaca tttttaatgg gagcaccttt tatcccaaag tgctttataa

3241 attgagtgga ctgatatata tcacacccag gtatcactgt gctgtccttt gctgtcagat

3301 ttagaaatgt ttttaagagc tatgtgaaaa cagacaatat tagtttaggt cgggaactga

3361 gatattgtaa tcaaatagtt aacatcagga agttaatttg gctggcaaaa ttctagggaa

3421 acttggccag aaaactggtg ttgaaggctt ttgctcatat aaacaagtgc cattgagttt

3481 caaatgacca gcaaatatat ttagaaccct tcctgtttta tgtctgtacc tcgtccaccc

3541 ctcaggtaat acctgcctct cacaggtaca gctgtttctt ggaaatcctc caaccaaata

3601 gcagttttcc taacttgatt agcttgagct gacagactgt tagaatacag ttctctggcc

3661 acagctgatg agggctttct gtactgcaca cagattgtgt actgcacccc agtccaggtg

3721 actggtaccc actcgagttg tgccgtgcac aacctgtcca gtatatgcat gtggtggccc

3781 tactgactgg taatggttag aggcatttat ggatttttag ctttgaggaa aaaccatgac

3841 ttttaacaaa tttttatggg ttatatgcct aaacccttat gccacatagt ggtaaataat

3901 tatgaaaaat ggtctgttca taattggtag gtgccttttg tgagcaggga gcataattat

3961 tggtttatta tggtaattat ggtgattttt taaatatcat gtaatgttaa aacgttttct

4021 aacagtttac tgttgcttat ctccaagata ttatggaatt aagaattttt ccagatgagt

4081 gttacataga ttctttgaat ttagtataaa agtactgaga attaagtttg tacttccata

4141 agcttggatt ttaaacactg atagtatctc atgagtaatg tgtgttttgg gagagggagg

4201 gatgctgatt gatatttcac attgtatgaa ataccatgtt tgaaactcat agcaataatg

4261 ctatgctgtt gtgatccctc tcaagttctg catttaaaat atattttttc tttataggaa

4321 ttgatgtata ccatgaagtc attgtcagtt gtagtagctc tgatgttgaa tgagatatca

4381 tgttttagca ttccatttta ctgactaggg tagaagaaca cttttcttgg ctacatttgg

4441 aggataccca gggagtcttg ggtgttcctt atctggggaa gcaaacattt cactagtctc

4501 tttttttcat cctttaaatt gtaaattaag gattactcaa gctcaccatt attcaagatt

4561 gggactcgct tcccagtcga cactctgccc tgcctgtcat tgctgcaaag agctgctgct

4621 ttgccaacct aagcaaagaa aatacggctt ctcttgcatt attttccctt ttggttggtt

4681 tgttttctag aagtacgttc agatgctttg gggaatgcaa tgtatgattt gctagctctc

4741 tcaccactta actcactgtg aggataaata tgcatgcttt ttgtaattaa ctggtgcttt

4801 gaaaatcttt tttaagggag aaaaatctca accaaagtta tgctcatcca gacaagctga

4861 cctttgagtt aatttcagca caactcattc ttcagtgcct catgactgaa aacaaaaaac

4921 aaaaaaacga aagcatcttc acaatgaagc ttccagatag caccgttttg ctaaaagata

4981 cattctcatt gttttccaac agtgatggct tccacataag gttaaacaaa ctaggtgctt

5041 gtaaataatt tattacagtt tactctatcg catttctgta acatgaaatg catgcccttc

5101 ttcaggggaa gactgtggtc aagttaaaaa aaaaaaacaa tattaaacaa catgaaactg

5161 cagtctgttt ttgaaaatga gaatgtccta agtgattcag aagagaggag ggaagttgtg

5221 cactctgaaa atgcatgaaa aacaaaggca aaaactagtg ggaaatgtgt agaactgtta

5281 actgagatgg cttcgagtct tccttctgga atctgttaaa tttcacaaag tcatgagggt

5341 aaatggagaa aatatttctg ggattacaat gaatgtaagc ccaaattgtg gaattgccag ggggaaaagc atttcccata gcactccatg taatatgagt gctctgtgag

5461 a gtgttttata gaaatggtgt tgctgggaaa ccaagtttgc acctggaaac

5521 t ctttagcgca gtaagggctt ggcatccggt agtgaaaaac tgtctaaccc

5581 a aaactatttt gacaccagga cctttttctc ctttgggata cttatgaacc

5641 t gtcctgtgga gaacattttg ggaaacacta tgttagatag ttctttaagg

5701 a gtaatgaaca gatagcactg gggcagaata tgcatgcatt ttgtaacgtc

5761 c ttgaatagat gtgtatttcc tcccctgcag aaaataagca cagaaaatta

5821 t gatcggagct ctttcctttg atagagagaa cagccccaat gatcctggct

5881 t acgtatcaga atacatggat gaattggggt aaataaggtt ttaattcaga

5941 t agtattgtac gtttgaatgc agatttttat ccacagatag ttgtagtgtt

6001 t aggacctatc gttgaggttt ctaagactta ctatgggctg taaacctgtt

6061 t attttagaaa cctgagactt gccgtctggc attttagttt aatacaaact

6121 a tttgaaagag attcttgacc ttatttctaa acgtctagag ctctgaaatg

6181 t aggtattaaa ctatttgcct gttgtacaaa gaaatgttaa gactcgtgaa

6241 a ataaggtact gtgaaataac tgcgattttg tgagcaaaac atacttggaa

6301 a atttttatgc ttgttagtgt attgcaagaa acacagaaaa tgtagttttg

ccaaaaattg aacatacaaa aaaaaaaaaa aaaaaa

Protein sequence:

NCBI Reference Sequence: NP_001231878.1

LOCUS NP_001231878

ACCESSION NP_001231878

1 mdesaltlgt idvsylphss eysvgrckht seewgecgfr ptifrsatlk wkeslmsrkr

61 pfvgrccysc tpqswdkffn psipslglrn viyinethtr hrgwlarrls yvlfiqerdv

121 hkgmfatnvt envlnssrvq eaiaevaael npdgsaqqqs kavnkvkkka krilqemvat

181 vspamirltg wvllklfnsf fwniqihkgq lemvkaatet nlpllflpvh rshidylllt

241 filfchnika pyiasgnnln ipifstlihk lggffirrrl detpdgrkdv lyrallhghi

301 vellrqqqfl eiflegtrsr sgktscarag llsvvvdtls tnvipdilii pvgisydrii

361 eghyngeqlg kpkkneslws vargvirmlr knygcvrvdf aqpfslkeyl esqsqkpvsa

421 llsleqallp ailpsrpsda adegrdtsin esrnatdesl rrrlianlae hilftasksc 481 aimsthivac lllyrhrqgi dlstlvedff vmkeevlard fdlgfsgnse dvvmhaiqll

541 gncvtithts rndeffitps ttvpsvfeln fysngvlhvf imeaiiacsl yavlnkrglg

601 gptstppnli sqeqlvrkaa slcyllsneg tislpcqtfy qvchetvgkf iqygiltvae

661 hddqedisps laeqqwdkkl peplswrsde ededsdfgee qrdcylkvsq skehqqfitf

721 lqrllgplle ayssaaifvh nfsgpvpepe ylqklhkyli trternvavy aesatyclvk

781 navkmfkdig vfketkqkrv svlelsstfl pqcnrqklle yilsfvvl

TAZ

Official Symbol: TAZ

Official Name: tafazzin

Gene ID: 6901

Organism: Homo sapiens

Other Aliases: XX-FW83563B9.3, BTHS, CMD3A, EFE, EFE2, G4.5, LVNCX, Taz1

Other Designations: protein G4.5

Nucleotide sequence:

NCBI Reference Sequence: NM_000116.3

LOCUS NM_000116

ACCESSION NM_000116

1 tttccggcgg ttgcaccggg ccggggtgcc agcgcccgcc ttcccgtttc ctcccgttcc

61 gcagcgcgcc cacggcctgt gaccccggcg accgctcccc agtgacgaga gagcggggcc

121 gggcgctgct ccggcctgac ctgcgaaggg acctcggtcc agtcccctgt tgcgccgcgc

181 cccctgtccg tccgtgcgcg ggccagtcag gggccagtgt ctcgagcggt cgaggtcgca

241 gacctagagg cgccccacag gccggcccgg ggcgctggga gcgccggccg cgggccgggt

301 ggggatgcct ctgcacgtga agtggccgtt ccccgcggtg ccgccgctca cctggaccct

361 ggccagcagc gtcgtcatgg gcttggtggg cacctacagc tgcttctgga ccaagtacat

421 gaaccacctg accgtgcaca acagggaggt gctgtacgag ctcatcgaga agcgaggccc

481 ggccacgccc ctcatcaccg tgtccaatca ccagtcctgc atggacgacc ctcatctctg

541 ggggatcctg aaactccgcc acatctggaa cctgaagttg atgcgttgga cccctgcagc

601 tgcagacatc tgcttcacca aggagctaca ctcccacttc ttcagcttgg gcaagtgtgt

661 gcctgtgtgc cgaggagcag aatttttcca agcagagaat gaggggaaag gtgttctaga

721 cacaggcagg cacatgccag gtgctggaaa aagaagagag aaaggagatg gcgtctacca

781 gaaggggatg gacttcattt tggagaagct caaccatggg gactgggtgc atatcttccc

841 agaagggaaa gtgaacatga gttccgaatt cctgcgtttc aagtggggaa tcgggcgcct

901 gattgctgag tgtcatctca accccatcat cctgcccctg tggcatgtcg gaatgaatga

961 cgtccttcct aacagtccgc cctacttccc ccgctttgga cagaaaatca ctgtgctgat

1021 cgggaagccc ttcagtgccc tgcctgtact cgagcggctc cgggcggaga acaagtcggc

1081 tgtggagatg cggaaagccc tgacggactt cattcaagag gaattccagc atctgaagac

1141 tcaggcagag cagctccaca accacctcca gcctgggaga taggccttgc ttgctgcctt

1201 ctggattctt ggcccgcaca gagctggggc tgagggatgg actgatgctt ttagctcaaa

1261 cgtggctttt agacagattt gttcatagac cctctcaagt gccctctccg agctggtagg

1321 cattccagct cctccgtgct tcctcagtta cacaaaggac ctcagctgct tctcccactt

1381 ggccaagcag ggaggaagaa gcttaggcag ggctctcttt ccttcttgcc ttcagatgtt

1441 ctctcccagg ggctggcttc aggagggagc atagaaggca ggtgagcaac cagttggcta

1501 ggggagcagg gggcccacca gagctgtgga gaggggaccc taagactcct cggcctggct

1561 cctacccacc gcccttgccg aaccaggagc tgctcactac ctcctcaggg atggccgttg

1621 gccacgtctt ccttctgcct gagcttcccc ccgaccacag gccctttcct caggcaaggt

1681 ctggcctcag gtgggccgca ggcgggaaaa gcagcccttg gccagaagtc aagcccagcc

1741 acgtggagcc tagagtgagg gcctgaggtc tggctgcttg cccccatgct ggcgccaaca

1801 acttctccat cctttctgcc tctcaacatc acttgaatcc tagggcctgg gttttcatgt

1861 ttttgaaaca gaaccataaa gcatatgtgt tggcttgttg taaaaaaaaa aaaaaaaaa

Protein sequence:

NCBI Reference Sequence: NP_000107.1

LOCUS NP_000107

ACCESSION NP_000107

1 mplhvkwpfp avppltwtla ssvvmglvgt yscfwtkymn hltvhnrevl yeliekrgpa

61 tplitvsnhq scmddphlwg ilklrhiwnl klmrwtpaaa dicftkelhs hffslgkcvp

121 vcrgaeffqa enegkgvldt grhmpgagkr rekgdgvyqk gmdfilekln hgdwvhifpe

181 gkvnmssefl rfkwgigrli aechlnpiil plwhvgmndv lpnsppyfpr fgqkitvlig

241 kpfsalpvle rlraenksav emrkaltdfi qeefqhlktq aeqlhnhlqp

COL1A2

Official Symbol: COL1A2

Official Name: collagen, type I, alpha 2

Gene ID: 1278

Organism: Homo sapiens

Other Aliases: OI4

Other Designations: alpha 2(I)-collagen; alpha-2 type I collagen; collagen I, alpha-2 polypeptide; collagen alpha-2(I) chain; collagen of skin, tendon and bone, alpha-2 chain; type I procollagen

Nucleotide sequence:

NCBI Reference Sequence: NM_000089.3

LOCUS NM_000089

ACCESSION NM_000089

1 gtgtcccata gtgtttccaa acttggaaag ggcgggggag ggcgggagga tgcggagggc

61 ggaggtatgc agacaacgag tcagagtttc cccttgaaag cctcaaaagt gtccacgtcc

121 tcaaaaagaa tggaaccaat ttaagaagcc agccccgtgg ccacgtccct tcccccattc

181 gctccctcct ctgcgccccc gcaggctcct cccagctgtg gctgcccggg cccccagccc

241 cagccctccc attggtggag gcccttttgg aggcacccta gggccaggga aacttttgcc

301 gtataaatag ggcagatccg ggctttatta ttttagcacc acggcagcag gaggtttcgg

361 ctaagttgga ggtactggcc acgactgcat gcccgcgccc gccaggtgat acctccgccg

421 gtgacccagg ggctctgcga cacaaggagt ctgcatgtct aagtgctaga catgctcagc

481 tttgtggata cgcggacttt gttgctgctt gcagtaacct tatgcctagc aacatgccaa

541 tctttacaag aggaaactgt aagaaagggc ccagccggag atagaggacc acgtggagaa

601 aggggtccac caggcccccc aggcagagat ggtgaagatg gtcccacagg ccctcctggt

661 ccacctggtc ctcctggccc ccctggtctc ggtgggaact ttgctgctca gtatgatgga

721 aaaggagttg gacttggccc tggaccaatg ggcttaatgg gacctagagg cccacctggt

781 gcagctggag ccccaggccc tcaaggtttc caaggacctg ctggtgagcc tggtgaacct

841 ggtcaaactg gtcctgcagg tgctcgtggt ccagctggcc ctcctggcaa ggctggtgaa

901 gatggtcacc ctggaaaacc cggacgacct ggtgagagag gagttgttgg accacagggt

961 gctcgtggtt tccctggaac tcctggactt cctggcttca aaggcattag gggacacaat

1021 ggtctggatg gattgaaggg acagcccggt gctcctggtg tgaagggtga acctggtgcc

1081 cctggtgaaa atggaactcc aggtcaaaca ggagcccgtg ggcttcctgg tgagagagga

1141 cgtgttggtg cccctggccc agctggtgcc cgtggcagtg atggaagtgt gggtcccgtg

1201 ggtcctgctg gtcccattgg gtctgctggc cctccaggct tcccaggtgc ccctggcccc

1261 aagggtgaaa ttggagctgt tggtaacgct ggtcctgctg gtcccgccgg tccccgtggt

1321 gaagtgggtc ttccaggcct ctccggcccc gttggacctc ctggtaatcc tggagcaaac

1381 ggccttactg gtgccaaggg tgctgctggc cttcccggcg ttgctggggc tcccggcctc

1441 cctggacccc gcggtattcc tggccctgtt ggtgctgccg gtgctactgg tgccagagga

1501 cttgttggtg agcctggtcc agctggctcc aaaggagaga gcggtaacaa gggtgagccc

1561 ggctctgctg ggccccaagg tcctcctggt cccagtggtg aagaaggaaa gagaggccct

1621 aatggggaag ctggatctgc cggccctcca ggacctcctg ggctgagagg tagtcctggt

1681 tctcgtggtc ttcctggagc tgatggcaga gctggcgtca tgggccctcc tggtagtcgt

1741 ggtgcaagtg gccctgctgg agtccgagga cctaatggag atgctggtcg ccctggggag

1801 cctggtctca tgggacccag aggtcttcct ggttcccctg gaaatatcgg ccccgctgga

1861 aaagaaggtc ctgtcggcct ccctggcatc gacggcaggc ctggcccaat tggcccagct

1921 ggagcaagag gagagcctgg caacattgga ttccctggac ccaaaggccc cactggtgat

1981 cctggcaaaa acggtgataa aggtcatgct ggtcttgctg gtgctcgggg tgctccaggt

2041 cctgatggaa acaatggtgc tcagggacct cctggaccac agggtgttca aggtggaaaa

2101 ggtgaacagg gtccccctgg tcctccaggc ttccagggtc tgcctggccc ctcaggtccc

2161 gctggtgaag ttggcaaacc aggagaaagg ggtctccatg gtgagtttgg tctccctggt

2221 cctgctggtc caagagggga acgcggtccc ccaggtgaga gtggtgctgc cggtcctact

2281 ggtcctattg gaagccgagg tccttctgga cccccagggc ctgatggaaa caagggtgaa

2341 cctggtgtgg ttggtgctgt gggcactgct ggtccatctg gtcctagtgg actcccagga

2401 gagaggggtg ctgctggcat acctggaggc aagggagaaa agggtgaacc tggtctcaga

2461 ggtgaaattg gtaaccctgg cagagatggt gctcgtggtg ctcctggtgc tgtaggtgcc

2521 cctggtcctg ctggagccac aggtgaccgg ggcgaagctg gggctgctgg tcctgctggt

2581 cctgctggtc ctcggggaag ccctggtgaa cgtggtgagg tcggtcctgc tggccccaat

2641 ggatttgctg gtcctgctgg tgctgctggt caacctggtg ctaaaggaga aagaggagcc

2701 aaagggccta agggtgaaaa cggtgttgtt ggtcccacag gccccgttgg agctgctggc

2761 ccagctggtc caaatggtcc ccccggtcct gctggaagtc gtggtgatgg aggcccccct

2821 ggtatgactg gtttccctgg tgctgctgga cggactggtc ccccaggacc ctctggtatt

2881 tctggccctc ctggtccccc tggtcctgct gggaaagaag ggcttcgtgg tcctcgtggt

2941 gaccaaggtc cagttggccg aactggagaa gtaggtgcag ttggtccccc tggcttcgct

3001 ggtgagaagg gtccctctgg agaggctggt actgctggac ctcctggcac tccaggtcct

3061 cagggtcttc ttggtgctcc tggtattctg ggtctccctg gctcgagagg tgaacgtggt

3121 ctaccaggtg ttgctggtgc tgtgggtgaa cctggtcctc ttggcattgc cggccctcct

3181 ggggcccgtg gtcctcctgg tgctgtgggt agtcctggag tcaacggtgc tcctggtgaa

3241 gctggtcgtg atggcaaccc tgggaacgat ggtcccccag gtcgcgatgg tcaacccgga

3301 cacaagggag agcgcggtta ccctggcaat attggtcccg ttggtgctgc aggtgcacct

3361 ggtcctcatg gccccgtggg tcctgctggc aaacatggaa accgtggtga aactggtcct

3421 tctggtcctg ttggtcctgc tggtgctgtt ggcccaagag gtcctagtgg cccacaaggc

3481 attcgtggcg ataagggaga gcccggtgaa aaggggccca gaggtcttcc tggcttaaag

3541 ggacacaatg gattgcaagg tctgcctggt atcgctggtc accatggtga tcaaggtgct

3601 cctggctccg tgggtcctgc tggtcctagg ggccctgctg gtccttctgg ccctgctgga

3661 aaagatggtc gcactggaca tcctggtaca gttggacctg ctggcattcg aggccctcag

3721 ggtcaccaag gccctgctgg cccccctggt ccccctggcc ctcctggacc tccaggtgta

3781 agcggtggtg gttatgactt tggttacgat ggagacttct acagggctga ccagcctcgc

3841 tcagcacctt ctctcagacc caaggactat gaagttgatg ctactctgaa gtctctcaac

3901 aaccagattg agacccttct tactcctgaa ggctctagaa agaacccagc tcgcacatgc

3961 cgtgacttga gactcagcca cccagagtgg agcagtggtt actactggat tgaccctaac

4021 caaggatgca ctatggatgc tatcaaagta tactgtgatt tctctactgg cgaaacctgt

4081 atccgggccc aacctgaaaa catcccagcc aagaactggt ataggagctc caaggacaag

4141 aaacacgtct ggctaggaga aactatcaat gctggcagcc agtttgaata taatgtcaaggaaaa

4201 ggagtgactt ccaaggaaat ggctacccaa cttgccttca tgcgcctgct ggccaactat

4261 gcctctcaga acatcaccta ccactgcaag aacagcattg catacatgga tgaggagact

4321 ggcaacctga aaaaggctgt cattctacag ggctctaatg atgttgaact tgttgctgag

4381 ggcaacagca ggttcactta cactgttctt gtagatggct gctctaaaaa gacaaatgaa

4441 tggggaaaga caatcattga atacaaaaca aataagccat cacgcctgcc cttccttgat

4501 attgcacctt tggacatcgg tggtgctgac caggaattct ttgtggacat tggcccagtc

4561 tgtttcaaat aaatgaactc aatctaaatt aaaaaagaaa gaaatttgaa aaaactttct

4621 ctttgccatt tcttcttctt cttttttaac tgaaagctga atccttccat ttcttctgca

4681 catctacttg cttaaattgt gggcaaaaga gaaaaagaag gattgatcag agcattgtgc

4741 aatacagttt cattaactcc ttcccccgct cccccaaaaa tttgaatttt tttttcaaca

4801 ctcttacacc tgttatggaa aatgtcaacc tttgtaagaa aaccaaaata aaaattgaaa

4861 aataaaaacc ataaacattt gcaccacttg tggcttttga atatcttcca cagagggaag

4921 tttaaaaccc aaacttccaa aggtttaaac tacctcaaaa cactttccca tgagtgtgat

4981 ccacattgtt aggtgctgac ctagacagag atgaactgag gtccttgttt tgttttgttc

5041 ataatacaaa ggtgctaatt aatagtattt cagatacttg aagaatgttg atggtgctag

5101 aagaatttga gaagaaatac tcctgtattg agttgtatcg tgtggtgtat tttttaaaaa

5161 atttgattta gcattcatat tttccatctt attcccaatt aaaagtatgc agattatttg

5221 cccaaatctt cttcagattc agcatttgtt ctttgccagt ctcattttca tcttcttcca

5281 tggttccaca gaagctttgt ttcttgggca agcagaaaaa ttaaattgta cctattttgt

5341 atatgtgaga tgtttaaata aattgtgaaa aaaatgaaat aaagcatgtt tggttttcca

5401 aaagaacata t

Protein sequence:

NCBI Reference Sequence: NP_000080.2

LOCUS NP_000080

ACCESSION NP_000080

1 mlsfvdtrtl lllavtlcla tcqslqeetv rkgpagdrgp rgergppgpp grdgedgptg

61 ppgppgppgp pglggnfaaq ydgkgvglgp gpmglmgprg ppgaagapgp qgfqgpagep

121 gepgqtgpag argpagppgk agedghpgkp grpgergvvg pqgargfpgt pglpgfkgir

181 ghngldglkg qpgapgvkge pgapgengtp gqtgarglpg ergrvgapgp agargsdgsv

241 gpvgpagpig sagppgfpga pgpkgeigav gnagpagpag prgevglpgl sgpvgppgnp

301 gangltgakg aaglpgvaga pglpgprgip gpvgaagatg arglvgepgp agskgesgnk

361 gepgsagpqg ppgpsgeegk rgpngeagsa gppgppglrg spgsrglpga dgragvmgpp

421 gsrgasgpag vrgpngdagr pgepglmgpr glpgspgnig pagkegpvgl pgidgrpgpi nigfpgpkgp tgdpgkngdk ghaglagarg apgpdgnnga qgppgpqgvq

541 ς ppgfqglpgp sgpagevgkp gerglhgefg lpgpagprge rgppgesgaa

601 ς psgppgpdgn kgepgvvgav gtagpsgpsg lpgergaagi pggkgekgep

661 ς rdgargapga vgapgpagat gdrgeagaag pagpagprgs pgergevgpa

721 ς aagqpgakge rgakgpkgen gvvgptgpvg aagpagpngp pgpagsrgdg

781 ς aagrtgppgp sgisgppgpp gpagkeglrg prgdqgpvgr tgevgavgpp

841 ς eagtagppgt pgpqgllgap gilglpgsrg erglpgvaga vgepgplgia

901 ς avgspgvnga pgeagrdgnp gndgppgrdg qpghkgergy pgnigpvgaa

961 ς pagkhgnrge tgpsgpvgpa gavgprgpsg pqgirgdkge pgekgprglp

1021 ς lpgiaghhgd qgapgsvgpa gprgpagpsg pagkdgrtgh pgtvgpagir

1081 ς ppgppgppgp pgvsgggydf gydgdfyrad qprsapslrp kdyevdatlk

1141 s tpegsrknpa rtcrdlrlsh pewssgyywi dpnqgctmda ikvycdfstg

1201 e ipaknwyrss kdkkhvwlge tinagsqfey nvegvtskem atqlafmrll

1261 a hcknsiaymd eetgnlkkav ilqgsndvel vaegnsrfty tvlvdgcskk

yktnkpsrlp fldiapldig gadqeffvdi gpvcfk

LAMC1

Official Symbol: LAMC1

Official Name: laminin, gamma 1 (formerly LAMB2)

Gene ID: 3915

Organism: Homo sapiens

Other Aliases: RP11-181K3.1, LAMB2

Other Designations: S-LAM gamma; S-laminin subunit gamma; laminin B2 chain; laminin subunit gamma-1; laminin-10 subunit gamma; laminin-11 subunit gamma; laminin-2 subunit gamma; laminin-3 subunit gamma; laminin-4 subunit gamma; laminin-6 subunit gamma; laminin-7 subunit gamma; laminin-8 subunit gamma; laminin-9 subunit gamma

Nucleotide sequence:

NCBI Reference Sequence: NM_002293.3

LOCUS NM_002293

ACCESSION NM_002293

1 gtgcaggctg ctcccggggt aggtgaggga agcgcggagg cggcgcgcgg gggcagtggt

61 cggcgagcag cgcggtcctc gctaggggcg cccacccgtc agtctctccg gcgcgagccg

121 ccgccaccgc ccgcgccgga gtcaggcccc tgggccccca ggctcaagca gcgaagcggc

181 ctccggggga cgccgctagg cgagaggaac gcgccggtgc ccttgccttc gccgtgaccc

241 agcgtgcggg cggcgggatg agagggagcc atcgggccgc gccggccctg cggccccggg

301 ggcggctctg gcccgtgctg gccgtgctgg cggcggccgc cgcggcgggc tgtgcccagg

361 cagccatgga cgagtgcacg gacgagggcg ggcggccgca gcgctgcatg cccgagttcg

421 tcaacgccgc cttcaacgtg actgtggtgg ccaccaacac gtgtgggact ccgcccgagg

481 aatactgtgt gcagaccggg gtgaccgggg tcaccaagtc ctgtcacctg tgcgacgccg

541 ggcagcccca cctgcagcac ggggcagcct tcctgaccga ctacaacaac caggccgaca

601 ccacctggtg gcaaagccag accatgctgg ccggggtgca gtaccccagc tccatcaacc

661 tcacgctgca cctgggaaaa gcttttgaca tcacctatgt gcgtctcaag ttccacacca

721 gccgcccgga gagctttgcc atttacaagc gcacacggga agacgggccc tggattcctt

781 accagtacta cagtggttcc tgtgagaaca cctactccaa ggcaaaccgc ggcttcatca

841 ggacaggagg ggacgagcag caggccttgt gtactgatga attcagtgac atttctcccc

901 tcactggggg caacgtggcc ttttctaccc tggaaggaag gcccagcgcc tataactttg

961 acaatagccc tgtgctgcag gaatgggtaa ctgccactga catcagagta actcttaatc

1021 gcctgaacac ttttggagat gaagtgttta acgatcccaa agttctcaag tcctattatt

1081 atgccatctc tgattttgct gtaggtggca gatgtaaatg taatggacac gcaagcgagt

1141 gtatgaagaa cgaatttgat aagctggtgt gtaattgcaa acataacaca tatggagtag

1201 actgtgaaaa gtgtcttcct ttcttcaatg accggccgtg gaggagggca actgcggaaa

1261 gtgccagtga atgcctgccc tgtgattgca atggtcgatc ccaggaatgc tacttcgacc

1321 ctgaactcta tcgttccact ggccatgggg gccactgtac caactgccag gataacacag

1381 atggcgccca ctgtgagagg tgccgagaga acttcttccg ccttggcaac aatgaagcct

1441 gctcttcatg ccactgtagt cctgtgggct ctctaagcac acagtgtgat agttacggca

1501 gatgcagctg taagccagga gtgatggggg acaaatgtga ccgttgccag cctggattcc

1561 attctctcac tgaagcagga tgcaggccat gctcttgtga tccctctggc agcatagatg

1621 aatgtaatat tgaaacagga agatgtgttt gcaaagacaa tgtcgaaggc ttcaattgtg

1681 aaagatgcaa acctggattt tttaatctgg aatcatctaa tcctcggggt tgcacaccct

1741 gcttctgctt tgggcattct tctgtctgta caaacgctgt tggctacagt gtttattcta

1801 tctcctctac ctttcagatt gatgaggatg ggtggcgtgc ggaacagaga gatggctctg

1861 aagcatctct cgagtggtcc tctgagaggc aagatatcgc cgtgatctca gacagctact

1921 ttcctcggta cttcattgct cctgcaaagt tcttgggcaa gcaggtgttg agttatggtc

1981 agaacctctc cttctccttt cgagtggaca ggcgagatac tcgcctctct gcagaagacc

2041 ttgtgcttga gggagctggc ttaagagtat ctgtaccctt gatcgctcag ggcaattcct

2101 atccaagtga gaccactgtg aagtatgtct tcaggctcca tgaagcaaca gattaccctt

2161 ggaggcctgc tcttacccct tttgaatttc agaagctcct aaacaacttg acctctatca

2221 agatacgtgg gacatacagt gagagaagtg ctggatattt ggatgatgtc accctggcaa

2281 gtgctcgtcc tgggcctgga gtccctgcaa cttgggtgga gtcctgcacc tgtcctgtgg

2341 gatatggagg gcagttttgt gagatgtgcc tctcaggtta cagaagagaa actcctaatc

2401 ttggaccata cagtccatgt gtgctttgcg cctgcaatgg acacagcgag acctgtgatc

2461 ctgagacagg tgtttgtaac tgcagagaca atacggctgg cccgcactgt gagaagtgca

2521 gtgatgggta ctatggagat tcaactgcag gcacctcctc cgattgccaa ccctgtccgt

2581 gtcctggagg ttcaagttgt gctgttgttc ccaagacaaa ggaggtggtg tgcaccaact

2641 gtcctactgg caccactggt aagagatgtg agctctgtga tgatggctac tttggagacc

2701 ccctgggtag aaacggccct gtgagacttt gccgcctgtg ccagtgcagt gacaacatcg

2761 atcccaatgc agttggaaat tgcaatcgct tgacgggaga atgcctgaag tgcatctata

2821 acactgctgg cttctattgt gaccggtgca aagacggatt ttttggaaat cccctggctc

2881 ccaatccagc agacaaatgc aaagcctgca attgcaatct gtatgggacc atgaagcagc

2941 agagcagctg taaccccgtg acggggcagt gtgaatgttt gcctcacgtg actggccagg

3001 actgtggtgc ttgtgaccct ggattctaca atctgcagag tgggcaaggc tgtgagaggt

3061 gtgactgcca tgccttgggc tccaccaatg ggcagtgtga catccgcacc ggccagtgtg

3121 agtgccagcc cggcatcact ggtcagcact gtgagcgctg tgaggtcaac cactttgggt

3181 ttggacctga aggctgcaaa ccctgtgact gtcatcctga gggatctctt tcacttcagt

3241 gcaaagatga tggtcgctgt gaatgcagag aaggctttgt gggaaatcgc tgtgaccagt

3301 gtgaagaaaa ctatttctac aatcggtctt ggcctggctg ccaggaatgt ccagcttgtt

3361 accggctggt aaaggataag gttgctgatc atagagtgaa gctccaggaa ttagagagtc

3421 tcatagcaaa ccttggaact ggggatgaga tggtgacaga tcaagccttc gaggatagac

3481 taaaggaagc agagagggaa gttatggacc tccttcgtga ggcccaggat gtcaaagatg

3541 ttgaccagaa tttgatggat cgcctacaga gagtgaataa cactctgtcc agccaaatta

3601 gccgtttaca gaatatccgg aataccattg aagagactgg aaacttggct gaacaagcgc

3661 gtgcccatgt agagaacaca gagcggttga ttgaaatcgc atccagagaa cttgagaaag

3721 caaaagtcgc tgctgccaat gtgtcagtca ctcagccaga atctacaggg gacccaaaca

3781 acatgactct tttggcagaa gaggctcgaa agcttgctga acgtcataaa caggaagctg

3841 atgacattgt tcgagtggca aagacagcca atgatacgtc aactgaggca tacaacctgc

3901 ttctgaggac actggcagga gaaaatcaaa cagcatttga gattgaagag cttaatagga

3961 agtatgaaca agcgaagaac atctcacagg atctggaaaa acaagctgcc cgagtacatg

4021 aggaggccaa aagggccggt gacaaagctg tggagatcta tgccagcgtg gctcagctga

4081 gccctttgga ctctgagaca ctggagaatg aagcaaataa cataaagatg gaagctgaga

4141 atctggaaca actgattgac cagaaattaa aagattatga ggacctcaga gaagatatga

4201 gagggaagga acttgaagtc aagaaccttc tggagaaagg caagactgaa cagcagaccg

4261 cagaccaact cctagcccga gctgatgctg ccaaggccct cgctgaagaa gctgcaaaga

4321 agggacggga taccttacaa gaagctaatg acattctcaa caacctgaaa gattttgata

4381 ggcgtgtgaa cgataacaag acggccgcag aggaggcact aaggaagatt cctgccatca

4441 accagaccat cactgaagcc aatgaaaaga ccagagaagc ccagcaggcc ctgggcagtg

4501 ctgcggcgga tgccacagag gccaagaaca aggcccatga ggcggagagg atcgcgagcg

4561 ctgtccaaaa gaatgccacc agcaccaagg cagaagctga aagaactttt gcagaagtta

4621 cagatctgga taatgaggtg aacaatatgt tgaagcaact gcaggaagca gaaaaagagc

4681 taaagagaaa acaagatgac gctgaccagg acatgatgat ggcagggatg gcttcacagg

4741 ctgctcaaga agccgagatc aatgccagaa aagccaaaaa ctctgttact agcctcctca

4801 gcattattaa tgacctcttg gagcagctgg ggcagctgga tacagtggac ctgaataagc

4861 taaacgagat tgaaggcacc ctaaacaaag ccaaagatga aatgaaggtc agcgatcttg

4921 ataggaaagt gtctgacctg gagaatgaag ccaagaagca ggaggctgcc atcatggact

4981 ataaccgaga tatcgaggag atcatgaagg acattcgcaa tctggaggac atcaggaaga

5041 ccttaccatc tggctgcttc aacaccccgt ccattgaaaa gccctagtgt ctttagggct

5101 ggaaggcagc atccctctga caggggggca gttgtgaggc cacagagtgc cttgacacaa

5161 agattacatt tttcagaccc ccactcctct gctgctgtcc atgactgtcc ttttgaacca

5221 ggaaaagtca cagagtttaa agagaagcaa attaaacatc ctgaatcggg aacaaagggt

5281 tttatctaat aaagtgtctc ttccattcac gttgctacct tacccacact ttcccttctg

5341 atttgcgtga ggacgtggca tcctacgtta ctgtacagtg gcataagcac atcgtgtgag

5401 cccatgtatg ctggggtaga gcaagtagcc ctcccctgtc tcatcgatac cagcagaacc

5461 tcctcagtct cagtactctt gtttctatga aggaaaagtt tggctactaa cagtagcatt

5521 gtgatggcca gtatatccag tccatggata aagaaaatgc atctgcatct cctacccctc

5581 ttccttctaa gcaaaaggaa ataaacatcc tgtgccaaag gtattggtca tttagaatgt

5641 cggtagccat ccatcagtgc ttttagttat tatgagtgta ggacactgag ccatccgtgg

5701 gtcaggatgc aattatttat aaaagtctcc aggtgaacat ggctgaagat ttttctagta

5761 tattaataat tgactaggaa gatgaacttt ttttcagatc tttgggcagc tgataattta

5821 aatctggatg ggcagcttgc actcaccaat agaccaaaag acatcttttg atattcttat

5881 aaatggaact tacacagaag aaatagggat atgataacca ctaaaatttt gttttcaaaa

5941 tcaaactaat tcttacagct tttttattag ttagtcttgg aactagtgtt aagtatctgg

6001 cagagaacag ttaatcccta aggtcttgac aaaacagaag aaaaacaagc ctcctcgtcc

6061 tagtcttttc tagcaaaggg ataaaactta gatggcagct tgtactgtca gaatcccgtg

6121 tatccatttg ttcttctgtt ggagagatga gacatttgac ccttagctcc agttttcttc

6181 tgatgtttcc atcttccaga atccctcaaa aaacattgtt tgccaaatcc tggtggcaaa

6241 tacttgcact cagtatttca cacagctgcc aacgctatcg agttcctgca ctttgtgatt

6301 taaatccact ctaaaccttc cctctaagtg tagagggaag acccttacgt ggagtttcct

6361 agtgggcttc tcaacttttg atcctcagct ctgtggtttt aagaccacag tgtgacagtt

6421 ccctgccaca cacccccttc ctcctaccaa cccacctttg agattcatat atagccttta

6481 acactatgca actttgtact ttgcgtagca ggggcggggt ggggggaaag aaactattat

6541 ctgacacact ggtgctatta attatttcaa atttatattt ttgtgtgaat gttttgtgtt

6601 ttgtttatca tgattataga ataaggaatt tatgtaaata tacttagtcc tatttctaga

6661 atgacactct gttcactttg ctcaattttt cctcttcact ggcacaatgt atctgaatac

6721 ctccttccct cccttctaga attctttgga ttgtactcca aagaattgtg ccttgtgttt

6781 gcagcatctc cattctctaa aattaatata attgctttcc tccacaccca gccactgtaa

6841 agaggtaact tgggtcctct tccattgcag tcctgatgat cctaacctgc agcacggtgg

6901 ttttacaatg ttccagagca ggaacgccag gttgacaagc tatggtagga ttaggaaagt

6961 ttgctgaaga ggatctttga cgccacagtg ggactagcca ggaatgaggg agaaatgccc

7021 tttctggcaa ttgttggagc tggataggta agttttataa gggagtacat tttgactgag

7081 cacttagggc atcaggaaca gtgctactta ctgatgggta gactgggaga ggtggtgtaa

7141 cttagttctt gatgatccca cttcctgttt ccatctgctt gggatatacc agagtttacc

7201 acaagtgttt tgacgatata ctcctgagct ttcactctgc tgcttctccc aggcctcttc

7261 tactatggca ggagatgtgg cgtgctgttg caaagttttc acgtcattgt ttcctggcta

7321 gttcatttca ttaagtggct acatcctaac atatgcattt ggtcaaggtt gcagaagagg

7381 actgaagatt gactgccaag ctagtttggg tgaagttcac tccagcaagt ctcaggccac

7441 aatggggtgg tttggtttgg tttcctttta actttctttt tgttatttgc ttttctcctc

7501 cacctgtgtg gtatattttt taagcagaat tttatttttt aaaataaaag gttctttaca

7561 agatgatacc ttaattacac tcccgcaaca cagccattat tttattgtct agctccagtt

7621 atctgtattt tatgtaatgt aattgacagg atggctgctg cagaatgctg gttgacacag

7681 ggattattat actgctattt ttccctgaat ttttttcctt tgaattccaa ctgtggacct

7741 tttatatgtg ccttcacttt agctgtttgc cttaatctct acagccttgc tctccggggt

7801 ggttaataaa atgcaacact tggcattttt atgttttaag aaaaacagta ttttatttat

7861 aataaaatct gaatatttgt aacccttt

Protein sequence:

NCBI Reference Sequence: NP_002284.3

LOCUS NP_002284

ACCESSION NP_002284

1 mrgshraapa lrprgrlwpv lavlaaaaaa gcaqaamdec tdeggrpqrc mpefvnaafn

61 vtvvatntcg tppeeycvqt gvtgvtksch lcdagqphlq hgaafltdyn nqadttwwqs

121 qtmlagvqyp ssinltlhlg kafdityvrl kfhtsrpesf aiykrtredg pwipyqyysg

181 scentyskan rgfirtggde qqalctdefs displtggnv afstlegrps aynfdnspvl

241 qewvtatdir vtlnrlntfg devfndpkvl ksyyyaisdf avggrckcng hasecmknef

301 dklvcnckhn tygvdcekcl pffndrpwrr ataesasecl pcdcngrsqe cyfdpelyrs

361 tghgghctnc qdntdgahce rcrenffrlg nneacsschc spvgslstqc dsygrcsckp

421 gvmgdkcdrc qpgfhsltea gcrpcscdps gsidecniet grcvckdnve gfncerckpg

481 ffnlessnpr gctpcfcfgh ssvctnavgy svysisstfq idedgwraeq rdgseaslew

541 sserqdiavi sdsyfpryfi apakflgkqv lsygqnlsfs frvdrrdtrl saedlvlega

601 glrvsvplia qgnsypsett vkyvfrlhea tdypwrpalt pfefqkllnn ltsikirgty

661 sersagyldd vtlasarpgp gvpatwvesc tcpvgyggqf cemclsgyrr etpnlgpysp

721 cvlcacnghs etcdpetgvc ncrdntagph cekcsdgyyg dstagtssdc qpcpcpggss

781 cavvpktkev vctncptgtt gkrcelcddg yfgdplgrng pvrlcrlcqc sdnidpnavg

841 ncnrltgecl kciyntagfy cdrckdgffg nplapnpadk ckacncnlyg tmkqqsscnp

901 vtgqceclph vtgqdcgacd pgfynlqsgq gcercdchal gstngqcdir tgqcecqpgi

961 tgqhcercev nhfgfgpegc kpcdchpegs lslqckddgr cecregfvgn rcdqceenyf

1021 ynrswpgcqe cpacyrlvkd kvadhrvklq eleslianlg tgdemvtdqa fedrlkeaer

1081 evmdllreaq dvkdvdqnlm drlqrvnntl ssqisrlqni rntieetgnl aeqarahven

1141 terlieiasr elekakvaaa nvsvtqpest gdpnnmtlla eearklaerh kqeaddivrv

1201 aktandtste aynlllrtla genqtafeie elnrkyeqak nisqdlekqa arvheeakra

1261 gdkaveiyas vaqlspldse tleneannik meaenleqli dqklkdyedl redmrgkele

1321 vknllekgkt eqqtadqlla radaakalae eaakkgrdtl qeandilnnl kdfdrrvndn

1381 ktaaeealrk ipainqtite anektreaqq algsaaadat eaknkaheae riasavqkna

1441 tstkaeaert faevtdldne vnnmlkqlqe aekelkrkqd dadqdmmmag masqaaqeae

1501 inarkaknsv tsllsiindl leqlgqldtv dlnklneieg tlnkakdemk vsdldrkvsd

1561 leneakkqea aimdynrdie eimkdirnle dirktlpsgc fntpsiekp

SPARC

Official Symbol: SPARC

Official Name: secreted protein, acidic, cysteine-rich (osteonectin)

Gene ID: 6678

Organism: Homo sapiens

Other Aliases: ON

Other Designations: BM-40; basement-membrane protein 40; cysteine-rich protein; osteonectin; secreted protein acidic and rich in cysteine

Nucleotide sequence:

NCBI Reference Sequence: NM_003118.3

LOCUS NM_003118

ACCESSION NM_003118

1 gggagaagga ggaggccggg ggaaggagga gacaggagga ggagggacca cggggtggag

61 gggagataga cccagcccag agctctgagt ggtttcctgt tgcctgtctc taaacccctc

121 cacattcccg cggtccttca gactgcccgg agagcgcgct ctgcctgccg cctgcctgcc

181 tgccactgag ggttcccagc accatgaggg cctggatctt ctttctcctt tgcctggccg

241 ggagggcctt ggcagcccct cagcaagaag ccctgcctga tgagacagag gtggtggaag

301 aaactgtggc agaggtgact gaggtatctg tgggagctaa tcctgtccag gtggaagtag

361 gagaatttga tgatggtgca gaggaaaccg aagaggaggt ggtggcggaa aatccctgcc

421 agaaccacca ctgcaaacac ggcaaggtgt gcgagctgga tgagaacaac acccccatgt

481 gcgtgtgcca ggaccccacc agctgcccag cccccattgg cgagtttgag aaggtgtgca

541 gcaatgacaa caagaccttc gactcttcct gccacttctt tgccacaaag tgcaccctgg

601 agggcaccaa gaagggccac aagctccacc tggactacat cgggccttgc aaatacatcc

661 ccccttgcct ggactctgag ctgaccgaat tccccctgcg catgcgggac tggctcaaga

721 acgtcctggt caccctgtat gagagggatg aggacaacaa ccttctgact gagaagcaga

781 agctgcgggt gaagaagatc catgagaatg agaagcgcct ggaggcagga gaccaccccg

841 tggagctgct ggcccgggac ttcgagaaga actataacat gtacatcttc cctgtacact

901 ggcagttcgg ccagctggac cagcacccca ttgacgggta cctctcccac accgagctgg

961 ctccactgcg tgctcccctc atccccatgg agcattgcac cacccgcttt ttcgagacct

1021 gtgacctgga caatgacaag tacatcgccc tggatgagtg ggccggctgc ttcggcatca

1081 agcagaagga tatcgacaag gatcttgtga tctaaatcca ctccttccac agtaccggat

1141 tctctcttta accctcccct tcgtgtttcc cccaatgttt aaaatgtttg gatggtttgt

1201 tgttctgcct ggagacaagg tgctaacata gatttaagtg aatacattaa cggtgctaaa

1261 aatgaaaatt ctaacccaag acatgacatt cttagctgta acttaactat taaggccttt

1321 tccacacgca ttaatagtcc catttttctc ttgccatttg tagctttgcc cattgtctta

1381 ttggcacatg ggtggacacg gatctgctgg gctctgcctt aaacacacat tgcagcttca

1441 acttttctct ttagtgttct gtttgaaact aatacttacc gagtcagact ttgtgttcat

1501 ttcatttcag ggtcttggct gcctgtgggc ttccccaggt ggcctggagg tgggcaaagg

1561 gaagtaacag acacacgatg ttgtcaagga tggttttggg actagaggct cagtggtggg

1621 agagatccct gcagaaccca ccaaccagaa cgtggtttgc ctgaggctgt aactgagaga

1681 aagattctgg ggctgtgtta tgaaaatata gacattctca cataagccca gttcatcacc

1741 atttcctcct ttacctttca gtgcagtttc ttttcacatt aggctgttgg ttcaaacttt

1801 tgggagcacg gactgtcagt tctctgggaa gtggtcagcg catcctgcag ggcttctcct

1861 cctctgtctt ttggagaacc agggctcttc tcaggggctc tagggactgc caggctgttt

1921 cagccaggaa ggccaaaatc aagagtgaga tgtagaaagt tgtaaaatag aaaaagtgga

1981 gttggtgaat cggttgttct ttcctcacat ttggatgatt gtcataaggt ttttagcatg

2041 ttcctccttt tcttcaccct cccctttttt cttctattaa tcaagagaaa cttcaaagtt

2101 aatgggatgg tcggatctca caggctgaga actcgttcac ctccaagcat ttcatgaaaa

2161 agctgcttct tattaatcat acaaactctc accatgatgt gaagagtttc acaaatcctt

2221 caaaataaaa agtaatgact tagaaactgc cttcctgggt gatttgcatg tgtcttagtc

2281 ttagtcacct tattatcctg acacaaaaac acatgagcat acatgtctac acatgactac

2341 acaaatgcaa acctttgcaa acacattatg cttttgcaca cacacacctg tacacacaca

2401 ccggcatgtt tatacacagg gagtgtatgg ttcctgtaag cactaagtta gctgttttca

2461 tttaatgacc tgtggtttaa cccttttgat cactaccacc attatcagca ccagactgag

2521 cagctatatc cttttattaa tcatggtcat tcattcattc attcattcac aaaatattta

2581 tgatgtattt actctgcacc aggtcccatg ccaagcactg gggacacagt tatggcaaag

2641 tagacaaagc atttgttcat ttggagctta gagtccagga ggaatacatt agataatgac

2701 acaatcaaat ataaattgca agatgtcaca ggtgtgatga agggagagta ggagagacca

2761 tgagtatgtg taacaggagg acacagcatt attctagtgc tgtactgttc cgtacggcag

2821 ccactaccca catgtaactt tttaagattt aaatttaaat tagttaacat tcaaaacgca

2881 gctccccaat cacactagca acatttcaag tgcttgagag ccatgcatga ttagtggtta

2941 ccctattgaa taggtcagaa gtagaatctt ttcatcatca cagaaagttc tattggacag

3001 tgctcttcta gatcatcata agactacaga gcacttttca aagctcatgc atgttcatca

3061 tgttagtgtc gtattttgag ctggggtttt gagactcccc ttagagatag agaaacagac

3121 ccaagaaatg tgctcaattg caatgggcca catacctaga tctccagatg tcatttcccc

3181 tctcttattt taagttatgt taagattact aaaacaataa aagctcctaa aaaatcaaac

3241 tgtattctgg tgttctcttc tacacagtgg gagggcgagc agtaggagag attggcccat

3301 ttggtgctgg ccatttgagg aatgcaagcc cagcactagt ctcataatct ctaggaatct

3361 gtagagagag gaattgaagt aaatttcagc attggctcat tcagtcattc ggcgacattc

3421 atcaggtacc tgcaatgtgt taggggatct tatgagtagg cagcgtgcgt gatccttgct

3481 cccctggagc tttctaacat tctagcaggc agaccacaca taaatttgca atactgtttc

3541 tgataaaaac gtgctgtaaa ggaaataaag cagagaacta tcatggaaaa aaaaaaaaaa

3601 aaaa

Protein sequence:

NCBI Reference Sequence: NP_003109.1

LOCUS NP_003109

ACCESSION NP_003109

1 mrawiffllc lagralaapq qealpdetev veetvaevte vsvganpvqv evgefddgae

61 eteeevvaen pcqnhhckhg kvceldennt pmcvcqdpts cpapigefek vcsndnktfd

121 sschffatkc tlegtkkghk lhldyigpck yippcldsel tefplrmrdw lknvlvtlye

181 rdednnllte kqklrvkkih enekrleagd hpvellardf eknynmyifp vhwqfgqldq

241 hpidgylsht elaplrapli pmehcttrff etcdldndky ialdewagcf gikqkdidkd

301 lvi

P3H1

Official Symbol: LEPRE1

Official Name: leucine proline-enriched proteoglycan (leprecan) 1

Gene ID: 64175

Organism: Homo sapiens

Other Aliases: PSEC0109, GROS1, OI8, P3H1

Other Designations: growth suppressor 1; leprecan; leucine- and proline-enriched proteoglycan 1; prolyl 3-hydroxylase 1

Nucleotide sequence:

NCBI Reference Sequence: NM_001146289.1

LOCUS NM_001146289

ACCESSION NM_001146289

1 atgcgccgcc cggcttggaa ggtggggctt cgcccggggg cgggccttcg ccgggggtag

61 gactccggcc ttggtggcgg gtggctggcg gttccgttag gtctgaggga gcgatggcgg

121 tacgcgcgtt gaagctgctg accacactgc tggctgtcgt ggccgctgcc tcccaagccg

181 aggtcgagtc cgaggcagga tggggcatgg tgacgcctga tctgctcttc gccgagggga

241 ccgcagccta cgcgcgcggg gactggcccg gggtggtcct gagcatggaa cgggcgctgc

301 gctcccgggc agccctccgc gcccttcgcc tgcgctgccg cacccagtgt gccgccgact

361 tcccgtggga gctggacccc gactggtccc ccagcccggc ccaggcctcg ggcgccgccg

421 ccctgcgcga cctgagcttc ttcgggggcc ttctgcgtcg cgctgcctgc ctgcgccgct

481 gcctcgggcc gccggccgcc cactcgctca gcgaagagat ggagctggag ttccgcaagc

541 ggagccccta caactacctg caggtcgcct acttcaagat caacaagttg gagaaagctg

601 ttgctgcagc acacaccttc ttcgtgggca atcctgagca catggaaatg cagcagaacc

661 tagactatta ccaaaccatg tctggagtga aggaggccga cttcaaggat cttgagactc

721 aaccccatat gcaagaattt cgactgggag tgcgactcta ctcagaggaa cagccacagg

781 aagctgtgcc ccacctagag gcggcgctgc aagaatactt tgtggcctat gaggagtgcc

841 gtgccctctg cgaagggccc tatgactacg atggctacaa ctaccttgag tacaacgctg

901 acctcttcca ggccatcaca gatcattaca tccaggtcct caactgtaag cagaactgtg

961 tcacggagct tgcttcccac ccaagtcgag agaagccctt tgaagacttc ctcccatcgc

1021 attataatta tctgcagttt gcctactata acattgggaa ttatacacag gctgttgaat

1081 gtgccaagac ctatcttctc ttcttcccca atgacgaggt gatgaaccaa aatttggcct

1141 attatgcagc tatgcttgga gaagaacaca ccagatccat cggcccccgt gagagtgcca

1201 aggagtaccg acagcgaagc ctactggaaa aagaactgct tttcttcgct tatgatgttt

1261 ttggaattcc ctttgtggat ccggattcat ggactccaga agaagtgatt cccaagagat

1321 tgcaagagaa acagaagtca gaacgggaaa cagccgtacg catctcccag gagattggga

1381 accttatgaa ggaaatcgag acccttgtgg aagagaagac caaggagtca ctggatgtga

1441 gcagactgac ccgggaaggt ggccccctgc tgtatgaagg catcagtctc accatgaact

1501 ccaaactcct gaatggttcc cagcgggtgg tgatggacgg cgtaatctct gaccacgagt

1561 gtcaggagct gcagagactg accaatgtgg cagcaacctc aggagatggc taccggggtc

1621 agacctcccc acatactccc aatgaaaagt tctatggtgt cactgtcttc aaagccctca

1681 agctggggca agaaggcaaa gttcctctgc agagtgccca cctgtactac aacgtgacgg

1741 agaaggtgcg gcgcatcatg gagtcctact tccgcctgga tacgcccctc tacttttcct

1801 actctcatct ggtgtgccgc actgccatcg aagaggtcca ggcagagagg aaggatgata

1861 gtcatccagt ccacgtggac aactgcatcc tgaatgccga gaccctcgtg tgtgtcaaag

1921 agcccccagc ctacaccttc cgcgactaca gcgccatcct ttacctaaat ggggacttcg

1981 atggcggaaa cttttatttc actgaactgg atgccaagac cgtgacggca gaggtgcagc

2041 ctcagtgtgg aagagccgtg ggattctctt caggcactga aaacccacat ggagtgaagg

2101 ctgtcaccag ggggcagcgc tgtgccatcg ccctgtggtt caccctggac cctcgacaca

2161 gcgagcgggt gagagcagct cgagcgggac agggtgcagg cagatgacct ggtgaagatg

2221 ctcttcagcc cagaagagat ggacctctcc caggagcagc ccctggatgc ccagcagggc

2281 ccccccgaac ctgcacaaga gtctctctca ggcagtgaat cgaagcccaa ggatgagcta

2341 tgacagcgtc caggtcagac ggatgggtga ctagacccat ggagaggaac tcttctgcac

2401 tctgagctgg ccagcccctc ggggctgcag agcagtgagc ctacatctgc cactcagccg

2461 aggggaccct gctcacagcc ttctacatgg tgctactgct cttggagtgg acatgaccag

2521 acaccgcacc ccctggatct ggctgagggc tcaggacaca ggcccagcca cccccagggg

2581 cctccacagg ccgctgcatg acagcgatac agtacttaag tgtctgtgta gacaaccaaa

2641 gaataaatga ttcatggttt tttttacttg gtttgttcag acaatggaaa tttgcccatt

2701 ctgtcaaaaa aaaaa

Protein sequence:

NCBI Reference Sequence: NP_001139761.1

LOCUS NP_001139761

ACCESSION NP_001139761

1 mavralkllt tllavvaaas qaeveseagw gmvtpdllfa egtaayargd wpgvvlsmer

61 alrsraalra lrlrcrtqca adfpweldpd wspspaqasg aaalrdlsff ggllrraacl

121 rrclgppaah slseemelef rkrspynylq vayfkinkle kavaaahtff vgnpehmemq

181 qnldyyqtms gvkeadfkdl etqphmqefr lgvrlyseeq pqeavphlea alqeyfvaye

241 ecralcegpy dydgynyley nadlfqaitd hyiqvlnckq ncvtelashp srekpfedfl

301 pshynylqfa yynignytqa vecaktyllf fpndevmnqn layyaamlge ehtrsigpre

361 sakeyrqrsl lekellffay dvfgipfvdp dswtpeevip krlqekqkse retavrisqe

421 ignlmkeiet lveektkesl dvsrltregg pllyegislt mnskllngsq rvvmdgvisd

481 hecqelqrlt nvaatsgdgy rgqtsphtpn ekfygvtvfk alklgqegkv plqsahlyyn

541 vtekvrrime syfrldtply fsyshlvcrt aieevqaerk ddshpvhvdn cilnaetlvc

601 vkeppaytfr dysailylng dfdggnfyft eldaktvtae vqpqcgravg fssgtenphg

661 vkavtrgqrc aialwftldp rhservraar agqgagr

CO6A1

Official Symbol: COL6A1

Official Name: collagen, type VI, alpha 1

Gene ID: 1291

Organism: Homo sapiens

Other Aliases: OPLL

Other Designations: alpha 1 (VI) chain (61 AA); collagen VI, alpha-1 polypeptide; collagen alpha-1 (VI) chain

Nucleotide sequence:

NCBI Reference Sequence: NM_001848.2

LOCUS NM_001848

ACCESSION NM_001848

1 gctctcactc tggctgggag cagaaggcag cctcggtctc tgggcggcgg cggcggccca

61 ctctgccctg gccgcgctgt gtggtgaccg caggccccag acatgagggc ggcccgtgct

121 ctgctgcccc tgctgctgca ggcctgctgg acagccgcgc aggatgagcc ggagaccccg

181 agggccgtgg ccttccagga ctgccccgtg gacctgttct ttgtgctgga cacctctgag

241 agcgtggccc tgaggctgaa gccctacggg gccctcgtgg acaaagtcaa gtccttcacc

301 aagcgcttca tcgacaacct gagggacagg tactaccgct gtgaccgaaa cctggtgtgg

361 aacgcaggcg cgctgcacta cagtgacgag gtggagatca tccaaggcct cacgcgcatg

421 cctggcggcc gcgacgcact caaaagcagc gtggacgcgg tcaagtactt tgggaagggc

481 acctacaccg actgcgctat caagaagggg ctggagcagc tcctcgtggg gggctcccac

541 ctgaaggaga ataagtacct gattgtggtg accgacgggc accccctgga gggctacaag

601 gaaccctgtg gggggctgga ggatgctgtg aacgaggcca agcacctggg cgtcaaagtc

661 ttctcggtgg ccatcacacc cgaccacctg gagccgcgtc tgagcatcat cgccacggac

721 cacacgtacc ggcgcaactt cacggcggct gactggggcc agagccgcga cgcagaggag

781 gccatcagcc agaccatcga caccatcgtg gacatgatca aaaataacgt ggagcaagtg

841 tgctgctcct tcgaatgcca gcctgcaaga ggacctccgg ggctccgggg cgaccccggc

901 tttgagggag aacgaggcaa gccggggctc ccaggagaga agggagaagc cggagatcct

961 ggaagacccg gggacctcgg acctgttggg taccagggaa tgaagggaga aaaagggagc

1021 cgtggggaga agggctccag gggacccaag ggctacaagg gagagaaggg caagcgtggc

1081 atcgacgggg tggacggcgt gaagggggag atggggtacc caggcctgcc aggctgcaag

1141 ggctcgcccg ggtttgacgg cattcaagga ccccctggcc ccaagggaga ccccggtgcc

1201 tttggactga aaggagaaaa gggcgagcct ggagctgacg gggaggcggg gagaccaggg

1261 agctcgggac catctggaga cgagggccag ccgggagagc ctgggccccc cggagagaaa

1321 ggagaggcgg gcgacgaggg gaacccagga cctgacggtg cccccgggga gcggggtggc

1381 cctggagaga gaggaccacg ggggacccca ggcacgcggg gaccaagagg agaccctggt

1441 gaagctggcc cgcagggtga tcagggaaga gaaggccccg ttggtgtccc tggagacccg

1501 ggcgaggctg gccctatcgg acctaaaggc taccgaggcg atgagggtcc cccagggtcc

1561 gagggtgcca gaggagcccc aggacctgcc ggaccccctg gagacccggg gctgatgggt

1621 gaaaggggag aagacggccc cgctggaaat ggcaccgagg gcttccccgg cttccccggg

1681 tatccgggca acaggggcgc tcccgggata aacggcacga agggctaccc cggcctcaag

1741 ggggacgagg gagaagccgg ggaccccgga gacgataaca acgacattgc accccgagga

1801 gtcaaaggag caaaggggta ccggggtccc gagggccccc agggaccccc aggacaccaa

1861 ggaccgcctg ggccggacga atgcgagatt ttggacatca tcatgaaaat gtgctcttgc

1921 tgtgaatgca agtgcggccc catcgacctc ctgttcgtgc tggacagctc agagagcatt

1981 ggcctgcaga acttcgagat tgccaaggac ttcgtcgtca aggtcatcga ccggctgagc

2041 cgggacgagc tggtcaagtt cgagccaggg cagtcgtacg cgggtgtggt gcagtacagc

2101 cacagccaga tgcaggagca cgtgagcctg cgcagcccca gcatccggaa cgtgcaggag

2161 ctcaaggaag ccatcaagag cctgcagtgg atggcgggcg gcaccttcac gggggaggcc

2221 ctgcagtaca cgcgggacca gctgctgccg cccagcccga acaaccgcat cgccctggtc

2281 atcactgacg ggcgctcaga cactcagagg gacaccacac cgctcaacgt gctctgcagc

2341 cccggcatcc aggtggtctc cgtgggcatc aaagacgtgt ttgacttcat cccaggctca

2401 gaccagctca atgtcatttc ttgccaaggc ctggcaccat cccagggccg gcccggcctc

2461 tcgctggtca aggagaacta tgcagagctg ctggaggatg ccttcctgaa gaatgtcacc

2521 gcccagatct gcatagacaa gaagtgtcca gattacacct gccccatcac gttctcctcc

2581 ccggctgaca tcaccatcct gctggacggc tccgccagcg tgggcagcca caactttgac

2641 accaccaagc gcttcgccaa gcgcctggcc gagcgcttcc tcacagcggg caggacggac

2701 cccgcccacg acgtgcgggt ggcggtggtg cagtacagcg gcacgggcca gcagcgccca

2761 gagcgggcgt cgctgcagtt cctgcagaac tacacggccc tggccagtgc cgtcgatgcc

2821 atggacttta tcaacgacgc caccgacgtc aacgatgccc tgggctatgt gacccgcttc

2881 taccgcgagg cctcgtccgg cgctgccaag aagaggctgc tgctcttctc agatggcaac

2941 tcgcagggcg ccacgcccgc tgccatcgag aaggccgtgc aggaagccca gcgggcaggc

3001 atcgagatct tcgtggtggt cgtgggccgc caggtgaatg agccccacat ccgcgtcctg

3061 gtcaccggca agacggccga gtacgacgtg gcctacggcg agagccacct gttccgtgtc

3121 cccagctacc aggccctgct ccgcggtgtc ttccaccaga cagtctccag gaaggtggcg

3181 ctgggctagc ccaccctgca cgccggcacc aaaccctgtc ctcccacccc tccccactca

3241 tcactaaaca gagtaaaatg tgatgcgaat tttcccgacc aacctgattc gctagatttt

3301 ttttaaggaa aagcttggaa agccaggaca caacgctgct gcctgctttg tgcagggtcc

3361 tccggggctc agccctgagt tggcatcacc tgcgcagggc cctctggggc tcagccctga

3421 gctagtgtca cctgcacagg gccctctgag gctcagccct gagctggcgt cacctgtgca

3481 gggccctctg gggctcagcc ctgagctggc ctcacctggg ttccccaccc cgggctctcc

3541 tgccctgccc tcctgcccgc cctccctcct gcctgcgcag ctccttccct aggcacctct

3601 gtgctgcatc ccaccagcct gagcaagacg ccctctcggg gcctgtgccg cactagcctc

3661 cctctcctct gtccccatag ctggtttttc ccaccaatcc tcacctaaca gttactttac

3721 aattaaactc aaagcaagct cttctcctca gcttggggca gccattggcc tctgtctcgt

3781 tttgggaaac caaggtcagg aggccgttgc agacataaat ctcggcgact cggccccgtc

3841 tcctgagggt cctgctggtg accggcctgg accttggccc tacagccctg gaggccgctg

3901 ctgaccagca ctgaccccga cctcagagag tactcgcagg ggcgctggct gcactcaaga

3961 ccctcgagat taacggtgct aaccccgtct gctcctccct cccgcagaga ctggggcctg

4021 gactggacat gagagcccct tggtgccaca gagggctgtg tcttactaga aacaacgcaa

4081 acctctcctt cctcagaata gtgatgtgtt cgacgtttta tcaaaggccc cctttctatg

4141 ttcatgttag ttttgctcct tctgtgtttt tttctgaacc atatccatgt tgctgacttt

4201 tccaaataaa ggttttcact cctctaaaaa aaaaaaaaaa aaaaaa

Protein sequence:

NCBI Reference Sequence: NP

LOCUS NP_001839

ACCESSION NP_001839

1 mraarallpl llqacwtaaq depetprava fqdcpvdlff vldtsesval rlkpygalvd

61 kvksftkrfi dnlrdryyrc drnlvwnaga lhysdeveii qgltrmpggr dalkssvdav

121 kyfgkgtytd caikkgleql lvggshlken kylivvtdgh plegykepcg gledavneak

181 hlgvkvfsva itpdhleprl siiatdhtyr rnftaadwgq srdaeeaisq tidtivdmik

241 nnveqvccsf ecqpargppg lrgdpgfege rgkpglpgek geagdpgrpg dlgpvgyqgm

301 kgekgsrgek gsrgpkgykg ekgkrgidgv dgvkgemgyp glpgckgspg fdgiqgppgp

361 kgdpgafglk gekgepgadg eagrpgssgp sgdegqpgep gppgekgeag degnpgpdga

421 pgerggpger gprgtpgtrg prgdpgeagp qgdqgregpv gvpgdpgeag pigpkgyrgd

481 egppgsegar gapgpagppg dpglmgerge dgpagngteg fpgfpgypgn rgapgingtk

541 gypglkgdeg eagdpgddnn diaprgvkga kgyrgpegpq gppghqgppg pdeceildii

601 mkmcscceck cgpidllfvl dssesiglqn feiakdfvvk vidrlsrdel vkfepgqsya

661 gvvqyshsqm qehvslrsps irnvqelkea ikslqwmagg tftgealqyt rdqllppspn

721 nrialvitdg rsdtqrdttp lnvlcspgiq vvsvgikdvf dfipgsdqln viscqglaps

781 qgrpglslvk enyaelleda flknvtaqic idkkcpdytc pitfsspadi tilldgsasv

841 gshnfdttkr fakrlaerfl tagrtdpahd vrvavvqysg tgqqrperas lqflqnytal

901 asavdamdfi ndatdvndal gyvtrfyrea ssgaakkrll lfsdgnsqga tpaaiekavq

961 eaqragieif vvvvgrqvne phirvlvtgk taeydvayge shlfrvpsyq allrgvfhqt

1021 vsrkvalg

CRTAP

Official Symbol: CRTAP

Official Name: cartilage associated protein

Gene ID: 10491

Organism: Homo sapiens

Other Aliases: CASP, LEPREL3, OI7

Other Designations: cartilage-associated protein; leprecan-like 3

Nucleotide sequence:

NCBI Reference Sequence: NM_006371.4

LOCUS NM_006371

ACCESSION NM_006371

1 aggctggcgt ccccgccccg aaagcactgg gcccgccgcg tcgcaccgtc ctctttcctt

61 tccttctccc tccccttttc ccttccttcg tcccttcctt ccttcctttc gccgggcgcg

121 atggagccgg ggcgccgggg ggccgcggcg ctgctagcgc tgctgtgcgt ggcctgcgcg

181 ctgcgcgccg ggcgcgccca atacgaacgc tacagcttcc gcagcttccc acgggacgag

241 ctgatgccgc tcgagtcggc ctaccggcac gcgctggaca agtacagcgg cgagcactgg

301 gccgagagcg tgggctacct ggagatcagc ctgcggctgc accgcttgct gcgcgacagc

361 gaggccttct gccaccgcaa ctgcagcgcc gcgccgcagc ccgagcccgc cgccggcctc

421 gccagctatc ccgagctgcg cctcttcggg ggcctgctgc gccgcgcgca ctgcctcaag

481 cgctgcaagc agggcctgcc agccttccgc cagtcccagc ccagccgcga ggtgctggcg

541 gacttccagc gccgcgagcc ctacaagttc ctgcagttcg cttacttcaa ggcaaataat

601 ctccccaaag ccatcgccgc tgctcacacc tttctactga agcatcctga tgacgaaatg

661 atgaagagga acatggcata ttataagagc ctgcctggtg ccgaggacta cattaaagac

721 ctggaaacca agtcatatga aagcctgttc atccgagcag tgcgggcata caacggtgag

781 aactggagaa catccatcac agacatggag ctggcccttc ccgacttctt caaagccttt

841 tacgagtgtc tcgcagcctg cgagggttcc agggagatca aggacttcaa ggatttctac

901 ctttccatag cagatcatta tgtagaagtt ctggaatgca aaatacagtg tgaagagaac

961 ctcaccccag ttataggagg ctatccggtt gagaaatttg tggctaccat gtatcattac

1021 ttgcagtttg cctattataa gttgaacgac ctgaagaatg cagccccctg tgcagtcagc

1081 tatctgctct ttgatcagaa tgacaaggtc atgcagcaga acctggtgta ttaccagtac

1141 cacagggaca cttggggcct ctcggatgag cacttccagc ccagacctga agcagttcag

1201 ttctttaatg tgaccacact ccagaaggag ctgtatgact ttgctaagga aaatataatg

1261 gatgatgatg agggagaagt tgtggaatat gtggatgacc tcttggaact ggaggagacc

1321 agctagccca cagcaaccaa agagacttcc tcttggcgtt caggaaacac agattctttg

1381 tccttttccc aacagcccag gctgttgata cctcagagcc ttctctttac tctccaaagt

1441 gaaagggaag cccccgtctc tctaactgca tgtcatcagg ggtgagcctg cctttcctat

1501 cttcacacct gccacctcat gttcacacct atctttctca cctttttttt gagatggagt

1561 ctcgctctct tgcccaggct ggagtgcaat ggcacgttct cagctcactg caacctccgc

1621 ctcttgggtt caagcaattc tgctgcatca gcctcccgag tacctgggat tacaggcatg

1681 tgccaccacg cccggctaat tttgtatttt tagtagagac ggggttttgc catgttggcc

1741 aggctggtct cgaactcttg acttcagatg atccatctgc cttggcctcc cacagtgctg

1801 ggattacagg cgtgagccac catgcccggc ctctttctca cctttacacc tgtcttctta

1861 tcctcacatc tgttttcaca ccttcatccc tgtcttcctc atgttcacac ttgtcttccc

1921 catgttcata gctgcctttc ttaccatttt ggtttgaagg gcagtcttct ctggcttgtt

1981 tttttgtttt tcccagaaaa tcagtattat tttttaaata agaaaaacat tcctagaaga

2041 tgataattgt gaaaacctcc tttggcttat ttgcttttcc agattttagt ctcctttctc

2101 cccatccggg aaagatggtg gaagacatag gctaaatttc tccagcctca caatggtctt

2161 cacttggtct gacttgtacc aattctagca cccactgaaa aacaagttga gtagagagtg

2221 tagagtgcag aaatgtggct tttgccccac tttgcatctc caaaattaca acggttggcc

2281 gatcccattt gaggacaatg cttagttata agtctccgag ttggaaaagg aagaaagcca

2341 gagctgtcta gtttcattca ttctttcagt aaatatttat tgagtaccta ctgtgtgcta

2401 ggcattgacc tgggaactag agatacttca cagaataaca gggaaagttc cctgtgctca

2461 tggagcttac attctacagg gagaaagaga tagccaatac ataggaataa atatatacaa

2521 ggtatcatgt agtgataatt gctgtggaga aaaataaagc aggggaggga gtaagaaatc

2581 ctggagatga ggctgcagtt ttaaatgggg cctcactggg aatgtgacgt tgagcagaga

2641 cgttagggaa gtggatcctg gacaaggcat tccaggcaga ggaacaggat gtgcactgcc

2701 ccaaagtgag aacttgctct acgtggtcag gaaagagcag ggagaccaag cagagtcgtg

2761 ggcaggggta gaatggaagg agaggcggct ggggaggaca ggtggtggag ggccttggct

2821 tctgctaagt gagatgggaa ccactggagg gtttgaacag aggagtgcct tgattgattt

2881 atattttgca agggtcattc tagctgccat attgtgaaaa actttagtgg acaagggcag

2941 aaggaagagg gaagacctgt taggaagcta ctgcaaggtt ccaggcttgg gcctgggcca

3001 cagcaacagc agtggtcaaa tatctagatt tattttgaaa agagccaata ggatttgctg

3061 agagtttgaa tgtggagtgt aagagaagga agagttaatg atgacattaa ggtttttggc

3121 ctgaatagca ggaaagatgg agttaccagt tactgaaata gggaaggatg ggctgggtaa

3181 gtatggaatt tggtgcaaag caggctgtct gtggttggaa tgggaggttc tggctgcaaa

3241 tcaaagtgga gagttctctc aggtcaggtc tgcagcagag ctcgagacag ggatctgaat

3301 gcacttggtt tattgttggg ggtgctctca gaaggaacct gtgaaagcct ttatcagtca

3361 tttattggct gtgagaagtt ctctgggagt gtgggtacat ttgaaggcaa gtgacttcag

3421 ttgagggcaa gtctctggaa aagaggctgt aggcatctgg cagctaccat gcatggtagt

3481 gtgttggggg tgggggtcct gggcactggc tgtgtgaagg gatctggcag ggcaccacag

3541 cgccccctac tgaaccatca gcatgtcagt ggcatttaaa gccatgcagc tggaggggcc

3601 actgagattg tctctgagta ttactgagaa gcaacagaaa agagccatgg atggagccct

3661 tgggctctct gggaaatggg aaatcagcca aaggactgag aaggagttac cttaaggtca

3721 gagaaaacca agagagtgtg gtgttctgga agctgagctt tctttattca acctcattcc

3781 cttctccaaa taagccactt gtgtagttgg gcccctccag ggttgaaggc aagaggagaa

3841 aggcacagcg tttgggaaac aagacttttc ctgcaatagc ctgggaagga ataaaaggat

3901 agagtgtttg ggtttttgtg taatggtggt taattggggt ggaacactca cacgttgtgc

3961 tttttctggg cttcccttat cccccagaac actctaccaa cctcggggaa ctcgggcaca

4021 tccttctgtt tctccttcag ctctatcctg ctttcctcat cccttctgac accacgtcct

4081 cactcacctg cacaagaatc cctgcatcag gttctccttt gagggtaccc acccaggaca

4141 gtcccctacc acttctgtct tgggctgaag ttgcccacgt ccacaaaatc tgtactccca

4201 gcgggggtgt ttggcccgag gagtcagtgt tattactggt ggatgcaccg tgtccacagc

4261 agcccccaat cccagcgatg cgtcagatct tacgtggctt cctgctgggg gagatggcct

4321 tcacccacgg gatgccgggt tctcctttct ttcctcaccc caacctttac tccaccagag

4381 aaacttcctt ttgaactcag tggggaagag ggtgatgaga caggactaga aagtagtggg

4441 ggacccagcg agtggacgcc ctgctccggg attcctgagt ctgtaaatag tgtgcccagc

4501 agctgtgaac tccccttata gcctcaggct gcagtgtcct tcccagctgt gtgagaaaat

4561 gaaagccgac gtccacaggg acccaggcag ggttgggtgt tgtgactcac tccacctctg

4621 tgccctgcag aggtactgtt gggtccttgt cttgtgagcc tggggtgagc tctctgtaca

4681 tgttgttgtt ccacgtatgg gttgacttgg catgctgggg ggtcctcgtt cactctctga

4741 agttggcctc ctttcactgg ggattgaaaa gcacctccac ccctacccta gtgatgtccc

4801 ctgaggaccc gggtgatagt acagtcaata ttgtcagtac tttgctttga ttgaaggctg

4861 tagagctgag ttaccaaaat ttctatttca aaggaaacca aaccttaaaa aaaaaaaaca

4921 aaaactgggc tgggtcttcc aaacctacca tgaaaccctg gtgtgcaggc tgcactcaat

4981 gacctcaacc caacacctcc ctgagtgtgc ttcttggaag agcctagaag attcctggat

5041 ggagacccca ttggttcagc ctcaagtctg gcccgtcttc gaaaaaacaa acacatttgt

5101 aagctttgtg ggagcttcca ggcctgctct aagatgcctt gcttgtcctt tgacccatca

5161 gcatggagct cagtggttgc tgtttggttc tgcaggctgg tggggaggcc gcccatcgtg

5221 gtggggcatc tgtccagccc cattgccact cagggcatcc aaacaggagg cacccgctgg

5281 gaagggtcta aagatactcc ttgtggccac tgctactgtt cacacttgac ttgtggagaa

5341 gcgaagggct gaggggaggt ttgtgtacac ccatgtattt aaaagtgact gactgactga

5401 aatgagcaca taccgacata tgcaacatac taataccttc ctgattttcg agactttcta

5461 attactacaa ctaacctgtt gtgctcacct ctggaattca gaaagagagc cactgcgagc

5521 actgaccaca agggctgcct taggaaggaa atgtgtgctt tcaggagttc ttattagggg

5581 aattttaaga gcacaaaaat tttattgaac ccaccttagt gacaaacaga aaatgatttg

5641 ttagattgtc agttggaagg ggtttattat gactgctggt gaaataaaca gtgaccagtt

5701 tctgccaaca tttatggaaa taacgtttct aggttttaaa tgtgagccgt aaactgaagc

5761 tacttcagtt aaaaaaaaaa aaatcaatac ttaaactgta gggaaaaggt ggattggtgc

5821 cagagaaaac attatttaat agtgtcagaa catggaactg caacacagtt tgtagcaagg

5881 cactgagaaa aacccacaac taatccttca tcgcagctgt ctgaactctt gatcatctgc

5941 catcccccaa acagtgactt ctttttttct ggtgactcca ggcctgaatg acctagtgtg

6001 gagagtttaa actatgtaca agggaggaaa gaaaaaaagg aaaggaacct taagtaagcc

6061 tcagccaaag gttttatgca ttggacttcc tgtgcctgcc cccaggggca agactgatcc

6121 ccatgcctgt gcccatgaca tctccctgaa agaggacacc atgacagccc ggctttgcct

6181 tgactgaccc actgctaccc cagaaataag aatcaagagc agctattgtt atccttagag

6241 tgttttccgt ctagggccgg ctcgtgaaca gccacatatc cttgcacctg acactgtccc

6301 cacacaaata gctggctttc gttgcttgtt gaatgaatga gtgagttggc tctatatccc

6361 cttggagctg gccggtaaga tattagtgcc tcattttaca agaagagaat aggaaggaat

6421 taagcaattt ggccaacaga tacaagatag attccagagt tttatctccc actttagggt

6481 ggcagccagt aggccaaact ccaaagaccg ttgctgatgt ctttttctgc ctcccctctt

6541 tgggttagtg tggtatgtac aagctcactc ttgttgaaaa ttagaaaata gttgaaaaca

6601 aaaggttttt gtttttcttt ttaaatcaca ttaaatgttt tacattgctt aaaaaaaaaa

6661 aaaaaaaa

Protein sequence:

NCBI Reference Sequence: NP_006362.1

LOCUS NP_006362

ACCESSION NP_006362

1 mepgrrgaaa llallcvaca lragraqyer ysfrsfprde lmplesayrh aldkysgehw

61 aesvgyleis lrlhrllrds eafchrncsa apqpepaagl asypelrlfg gllrrahclk

121 rckqglpafr qsqpsrevla dfqrrepykf lqfayfkann lpkaiaaaht fllkhpddem

181 mkrnmayyks lpgaedyikd letksyeslf iravraynge nwrtsitdme lalpdffkaf

241 yeclaacegs reikdfkdfy lsiadhyvev leckiqceen ltpviggypv ekfvatmyhy

301 lqfayyklnd lknaapcavs yllfdqndkv mqqnlvyyqy hrdtwglsde hfqprpeavq

361 ffnvttlqke lydfakenim dddegevvey vddlleleet s

SERPH

Official Symbol: SERPINH1

Official Name: serpin peptidase inhibitor, clade H (heat shock protein 47), member 1, (collagen binding protein 1)

Gene ID: 871

Organism: Homo sapiens

Other Aliases: PIG14, AsTP3, CBP1, CBP2, HSP47, OI10, PPROM, RA-A47, SERPINH2, gp46

Other Designations: 47 kDa heat shock protein; arsenic-transactivated protein 3; cell proliferation-inducing gene 14 protein; colligin-1; colligin-2; rheumatoid arthritis antigen A-47; rheumatoid arthritis-related antigen RA-A47; serine (or cysteine) proteinase inhibitor, clade H (heat shock protein 47), member 1, (collagen binding protein 1); serine (or cysteine) proteinase inhibitor, clade H (heat shock protein 47), member 2, (collagen-binding protein 2); serpin H1

Nucleotide sequence:

NCBI Reference Sequence: NM_001207014.1

LOCUS NM_001207014

ACCESSION NM_001207014

1 agtaggaccc aggggccggg a> gcgccggc agagggaggg gccgggggcc ggggaggttt

61 tgagggaggt ctttggcttt ttttggcgga gctggggcgc cctccggaag cgtttccaac

121 tttccagaag tttctcggga cgggcaggag ggggtgggga ctgccatata tagatcccgg

181 gagcagggga gcgggctaag agtagaatcg tgtcgcggct cgagagcgag agtcacgtcc

241 cggcgctagc ccagcccgac ccagaatgaa aaaggcaggc attgacctcc ctctgaggca

301 gtttccaggc ccaccgtggt gcacgcaaac cacttcctgg ccatgcgctc cctcctgctt

361 ctcagcgcct tctgcctcct ggaggcggcc ctggccgccg aggtgaagaa acctgcagcc

421 gcagcagctc ctggcactgc ggagaagttg agccccaagg cggccacgct tgccgagcgc

481 agcgccggcc tggccttcag cttgtaccag gccatggcca aggaccaggc agtggagaac

541 atcctggtgt cacccgtggt ggtggcctcg tcgctagggc tcgtgtcgct gggcggcaag

601 gcgaccacgg cgtcgcaggc caaggcagtg ctgagcgccg agcagctgcg cgacgaggag

661 gtgcacgccg gcctgggcga gctgctgcgc tcactcagca actccacggc gcgcaacgtg

721 acctggaagc tgggcagccg actgtacgga cccagctcag tgagcttcgc tgatgacttc

781 gtgcgcagca gcaagcagca ctacaactgc gagcactcca agatcaactt ccgcgacaag

841 cgcagcgcgc tgcagtccat caacgagtgg gccgcgcaga ccaccgacgg caagctgccc

901 gaggtcacca aggacgtgga gcgcacggac ggcgccctgc tagtcaacgc catgttcttc

961 aagccacact gggatgagaa attccaccac aagatggtgg acaaccgtgg cttcatggtg

1021 actcggtcct ataccgtggg tgtcatgatg atgcaccgga caggcctcta caactactac

1081 gacgacgaga aggaaaagct gcaaatcgtg gagatgcccc tggcccacaa gctctccagc

1141 ctcatcatcc tcatgcccca tcacgtggag cctctcgagc gccttgaaaa gctgctaacc

1201 aaagagcagc tgaagatctg gatggggaag atgcagaaga aggctgttgc catctccttg

1261 cccaagggtg tggtggaggt gacccatgac ctgcagaaac acctggctgg gctgggcctg

1321 actgaggcca ttgacaagaa caaggccgac ttgtcacgca tgtcaggcaa gaaggacctg

1381 tacctggcca gcgtgttcca cgccaccgcc tttgagttgg acacagatgg caaccccttt

1441 gaccaggaca tctacgggcg cgaggagctg cgcagcccca agctgttcta cgccgaccac

1501 cccttcatct tcctagtgcg ggacacccaa agcggctccc tgctattcat tgggcgcctg

1561 gtccggccta agggtgacaa gatgcgagac gagttatagg gcctcagggt gcacacagga

1621 tggcaggagg catccaaagg ctcctgagac acatgggtgc tattggggtt gggggggagg

1681 tgaggtacca gccttggata ctccatgggg tgggggtgga aaaacagacc ggggttcccg

1741 tgtgcctgag cggaccttcc cagctagaat tcactccact tggacatggg ccccagatac

1801 catgatgctg agcccggaaa ctccacatcc tgtgggacct gggccatagt cattctgcct

1861 gccctgaaag tcccagatca agcctgcctc aatcagtatt catatttata gccaggtacc

1921 ttctcacctg tgagaccaaa ttgagctagg ggggtcagcc agccctcttc tgacactaaa

1981 acacctcagc tgcctcccca gctctatccc aacctctccc aactataaaa ctaggtgctg

2041 cagcccctgg gaccaggcac ccccagaatg acctggccgc agtgaggcgg attgagaagg

2101 agctcccagg aggggcttct gggcagactc tggtcaagaa gcatcgtgtc tggcgttgtg

2161 gggatgaact ttttgttttg tttcttcctt ttttagttct tcaaagatag ggagggaagg

2221 gggaacatga gcctttgttg ctatcaatcc aagaacttat ttgtacattt tttttttcaa

2281 taaaactttt ccaatgacat tttgttggag cgtggaagaa aaaaaaaaaa aaa

Protein sequence:

NCBI Reference Sequence: NP_001193943.1

LOCUS NP_001193943

ACCESSION NP_001193943

1 mrsllllsaf clleaalaae vkkpaaaaap gtaeklspka atlaersagl afslyqamak

61 dqavenilvs pvvvasslgl vslggkatta sqakavlsae qlrdeevhag lgellrslsn

121 starnvtwkl gsrlygpssv sfaddfvrss kqhyncehsk infrdkrsal qsinewaaqt

181 tdgklpevtk dvertdgall vnamffkphw dekfhhkmvd nrgfmvtrsy tvgvmmmhrt

241 glynyyddek eklqivempl ahklssliil mphhvepler leklltkeql kiwmgkmqkk

301 avaislpkgv vevthdlqkh laglglteai dknkadlsrm sgkkdlylas vfhatafeld

361 tdgnpfdqdi ygreelrspk lfyadhpfif lvrdtqsgsl lfigrlvrpk gdkmrdel

ITB1

Official Symbol: ITGB1

Official Name: integrin, beta 1 (fibronectin receptor, beta polypeptide, antigen CD29 includes MDF2, MSK12)

Gene ID: 3688

Organism: Homo sapiens

Other Aliases: RP11-479G22.2, CD29, FNRB, GPIIA, MDF2, MSK12, VLA-BETA, VLAB

Other Designations: integrin VLA-4 beta subunit; integrin beta-1; very late activation protein, beta polypeptide

Nucleotide sequence:

NCBI Reference Sequence: NM_002211.3

LOCUS NM_002211

ACCESSION NM_002211

1 atcagacgcg cagaggaggc ggggccgcgg ctggtttcct gccggggggc ggctctgggc

61 cgccgagtcc cctcctcccg cccctgagga ggaggagccg ccgccacccg ccgcgcccga

121 cacccgggag gccccgccag cccgcgggag aggcccagcg ggagtcgcgg aacagcaggc

181 ccgagcccac cgcgccgggc cccggacgcc gcgcggaaaa gatgaattta caaccaattt

241 tctggattgg actgatcagt tcagtttgct gtgtgtttgc tcaaacagat gaaaatagat

301 gtttaaaagc aaatgccaaa tcatgtggag aatgtataca agcagggcca aattgtgggt

361 ggtgcacaaa ttcaacattt ttacaggaag gaatgcctac ttctgcacga tgtgatgatt

421 tagaagcctt aaaaaagaag ggttgccctc cagatgacat agaaaatccc agaggctcca

481 aagatataaa gaaaaataaa aatgtaacca accgtagcaa aggaacagca gagaagctca

541 agccagagga tattactcag atccaaccac agcagttggt tttgcgatta agatcagggg

601 agccacagac atttacatta aaattcaaga gagctgaaga ctatcccatt gacctctact

661 accttatgga cctgtcttac tcaatgaaag acgatttgga gaatgtaaaa agtcttggaa

721 cagatctgat gaatgaaatg aggaggatta cttcggactt cagaattgga tttggctcat

781 ttgtggaaaa gactgtgatg ccttacatta gcacaacacc agctaagctc aggaaccctt

841 gcacaagtga acagaactgc accagcccat ttagctacaa aaatgtgctc agtcttacta

901 ataaaggaga agtatttaat gaacttgttg gaaaacagcg catatctgga aatttggatt

961 ctccagaagg tggtttcgat gccatcatgc aagttgcagt ttgtggatca ctgattggct

1021 ggaggaatgt tacacggctg ctggtgtttt ccacagatgc cgggtttcac tttgctggag

1081 atgggaaact tggtggcatt gttttaccaa atgatggaca atgtcacctg gaaaataata

1141 tgtacacaat gagccattat tatgattatc cttctattgc tcaccttgtc cagaaactga

1201 gtgaaaataa tattcagaca atttttgcag ttactgaaga atttcagcct gtttacaagg

1261 agctgaaaaa cttgatccct aagtcagcag taggaacatt atctgcaaat tctagcaatg

1321 taattcagtt gatcattgat gcatacaatt ccctttcctc agaagtcatt ttggaaaacg

1381 gcaaattgtc agaaggcgta acaataagtt acaaatctta ctgcaagaac ggggtgaatg

1441 gaacagggga aaatggaaga aaatgttcca atatttccat tggagatgag gttcaatttg

1501 aaattagcat aacttcaaat aagtgtccaa aaaaggattc tgacagcttt aaaattaggc

1561 ctctgggctt tacggaggaa gtagaggtta ttcttcagta catctgtgaa tgtgaatgcc

1621 aaagcgaagg catccctgaa agtcccaagt gtcatgaagg aaatgggaca tttgagtgtg

1681 gcgcgtgcag gtgcaatgaa gggcgtgttg gtagacattg tgaatgcagc acagatgaag

1741 ttaacagtga agacatggat gcttactgca ggaaagaaaa cagttcagaa atctgcagta

1801 acaatggaga gtgcgtctgc ggacagtgtg tttgtaggaa gagggataat acaaatgaaa

1861 tttattctgg caaattctgc gagtgtgata atttcaactg tgatagatcc aatggcttaa

1921 tttgtggagg aaatggtgtt tgcaagtgtc gtgtgtgtga gtgcaacccc aactacactg

1981 gcagtgcatg tgactgttct ttggatacta gtacttgtga agccagcaac ggacagatct

2041 gcaatggccg gggcatctgc gagtgtggtg tctgtaagtg tacagatccg aagtttcaag

2101 ggcaaacgtg tgagatgtgt cagacctgcc ttggtgtctg tgctgagcat aaagaatgtg

2161 ttcagtgcag agccttcaat aaaggagaaa agaaagacac atgcacacag gaatgttcct

2221 attttaacat taccaaggta gaaagtcggg acaaattacc ccagccggtc caacctgatc

2281 ctgtgtccca ttgtaaggag aaggatgttg acgactgttg gttctatttt acgtattcag

2341 tgaatgggaa caacgaggtc atggttcatg ttgtggagaa tccagagtgt cccactggtc

2401 cagacatcat tccaattgta gctggtgtgg ttgctggaat tgttcttatt ggccttgcat

2461 tactgctgat atggaagctt ttaatgataa ttcatgacag aagggagttt gctaaatttg

2521 aaaaggagaa aatgaatgcc aaatgggaca cgggtgaaaa tcctatttat aagagtgccg

2581 taacaactgt ggtcaatccg aagtatgagg gaaaatgagt actgcccgtg caaatcccac

2641 aacactgaat gcaaagtagc aatttccata gtcacagtta ggtagcttta gggcaatatt

2701 gccatggttt tactcatgtg caggttttga aaatgtacaa tatgtataat ttttaaaatg

2761 ttttattatt ttgaaaataa tgttgtaatt catgccaggg actgacaaaa gacttgagac

2821 aggatggtta ctcttgtcag ctaaggtcac attgtgcctt tttgaccttt tcttcctgga

2881 ctattgaaat caagcttatt ggattaagtg atatttctat agcgattgaa agggcaatag

2941 ttaaagtaat gagcatgatg agagtttctg ttaatcatgt attaaaactg atttttagct

3001 ttacaaatat gtcagtttgc agttatgcag aatccaaagt aaatgtcctg ctagctagtt

3061 aaggattgtt ttaaatctgt tattttgcta tttgcctgtt agacatgact gatgacatat

3121 ctgaaagaca agtatgttga gagttgctgg tgtaaaatac gtttgaaata gttgatctac

3181 aaaggccatg ggaaaaattc agagagttag gaaggaaaaa ccaatagctt taaaacctgt

3241 gtgccatttt aagagttact taatgtttgg taacttttat gccttcactt tacaaattca

3301 agccttagat aaaagaaccg agcaattttc tgctaaaaag tccttgattt agcactattt

3361 acatacaggc catactttac aaagtatttg ctgaatgggg accttttgag ttgaatttat

3421 tttattattt ttattttgtt taatgtctgg tgctttctgt cacctcttct aatcttttaa

3481 tgtatttgtt tgcaattttg gggtaagact ttttttatga gtactttttc tttgaagttt

3541 tagcggtcaa tttgcctttt taatgaacat gtgaagttat actgtggcta tgcaacagct

3601 ctcacctacg cgagtcttac tttgagttag tgccataaca gaccactgta tgtttacttc

3661 tcaccatttg agttgcccat cttgtttcac actagtcaca ttcttgtttt aagtgccttt

3721 agttttaaca gttcactttt tacagtgcta tttactgaag ttatttatta aatatgccta

3781 aaatacttaa atcggatgtc ttgactctga tgtattttat caggttgtgt gcatgaaatt

3841 tttatagatt aaagaagttg aggaaaagca aaaaaaaaa

Protein sequence:

NCBI Reference Sequence: NP_002202.2

LOCUS NP_002202

ACCESSION NP_002202

1 mnlqpifwig lissvccvfa qtdenrclka nakscgeciq agpncgwctn stflqegmpt

61 sarcddleal kkkgcppddi enprgskdik knknvtnrsk gtaeklkped itqiqpqqlv

121 lrlrsgepqt ftlkfkraed ypidlyylmd lsysmkddle nvkslgtdlm nemrritsdf

181 rigfgsfvek tvmpyisttp aklrnpctse qnctspfsyk nvlsltnkge vfnelvgkqr

241 isgnldspeg gfdaimqvav cgsligwrnv trllvfstda gfhfagdgkl ggivlpndgq

301 chlennmytm shyydypsia hlvqklsenn iqtifavtee fqpvykelkn lipksavgtl

361 sanssnviql iidaynslss evilengkls egvtisyksy ckngvngtge ngrkcsnisi

421 gdevqfeisi tsnkcpkkds dsfkirplgf teevevilqy icececqseg ipespkcheg

481 ngtfecgacr cnegrvgrhc ecstdevnse dmdaycrken sseicsnnge cvcgqcvcrk

541 rdntneiysg kfcecdnfnc drsnglicgg ngvckcrvce cnpnytgsac dcsldtstce

601 asngqicngr gicecgvckc tdpkfqgqtc emcqtclgvc aehkecvqcr afnkgekkdt

661 ctqecsyfni tkvesrdklp qpvqpdpvsh ckekdvddcw fyftysvngn nevmvhvven

721 pecptgpdii pivagvvagi vliglallli wkllmiihdr refakfekek mnakwdtgen

781 piyksavttv vnpkyegk

FKB10

Official Symbol: FKBP10

Official Name: FK506 binding protein 10, 65 kDa

Gene ID: 60681

Organism: Homo sapiens

Other Aliases: PSEC0056, FKBP65, OI11, OI6, PPIASE, hFKBP65

Other Designations: 65 kDa FK506-binding protein; 65 kDa FKBP; FK506-binding protein 10; FKBP-10; FKBP-65; PPIase FKBP10; immunophilin FKBP65; peptidyl-prolyl cis-trans isomerase FKBP10; rotamase

Nucleotide sequence:

NCBI Reference Sequence: NM_021939.3

LOCUS NM_021939

ACCESSION NM_021939

1 cccgagcctc tctccctggc caggccccag gtctcgcagc cagggatgga gatgggggga

61 gggggaacct agagttcttt gtagtgcctc cctcagactc taacacactc agcctggccc

121 cctcctccta ttgcaacccc ctcccccgct cctcccggcc aggccagctc agtcttccca

181 gcccccattc cacgtggacc agccagggcg ggggtaggga aagaggacag gaagaggggg

241 agccagttct gggaggcggg gggaaggagg ttggtggcga ctccctcgct cgccctcact

301 gccggcggtc ccaactccag gcaccatgtt ccccgcgggc ccccccagcc acagcctcct

361 ccggctcccc ctgctgcagt tgctgctact ggtggtgcag gccgtgggga gggggctggg

421 ccgcgccagc ccggccgggg gccccctgga agatgtggtc atcgagaggt accacatccc

481 cagggcctgt ccccgggaag tgcagatggg ggattttgtg cgctaccact acaacggcac

541 ttttgaagat ggcaagaagt ttgattcaag ctatgatcgc aacaccttgg tggccatcgt

601 ggtgggtgtg gggcgcctca tcactggcat ggaccgaggc ctcatgggca tgtgtgtcaa

661 cgagcggcga cgcctcattg tgcctcccca cctgggctat gggagcatcg gcctggcggg

721 gctcattcca ccggatgcca ccctctactt cgatgtggtt ctgctggatg tgtggaacaa

781 ggaagacacc gtgcaggtga gcacattgct gcgcccgccc cactgccccc gcatggtcca

841 ggacggcgac tttgtccgct accactacaa tggcaccctg ctggacggca cctccttcga

901 caccagctac agtaagggcg gcacttatga cacctacgtc ggctctggtt ggctgatcaa

961 gggcatggac caggggctgc tgggcatgtg tcctggagag agaaggaaga ttatcatccc

1021 tccattcctg gcctatggcg agaaaggcta tgggacagtg atccccccac aggcctcgct

1081 ggtctttcac gtcctcctga ttgacgtgca caacccgaag gacgctgtcc agctagagac

1141 gctggagctc ccccccggct gtgtccgcag agccggggcc ggggacttca tgcgctacca

1201 ctacaatggc tccttgatgg acggcaccct cttcgattcc agctactccc gcaaccacac

1261 ctacaatacc tatatcgggc agggttacat catccccggg atggaccagg ggctgcaggg

1321 tgcctgcatg ggggaacgcc ggagaattac catccccccg cacctcgcct atggggagaa

1381 tggaactgga gacaagatcc ctggctctgc cgtgctaatc ttcaacgtcc atgtcattga

1441 cttccacaac cctgcggatg tggtggaaat caggacactg tcccggccat ctgagacctg

1501 caatgagacc accaagcttg gggactttgt tcgataccat tacaactgtt ctttgctgga

1561 cggcacccag ctgttcacct cgcatgacta cggggccccc caggaggcga ctctcggggc

1621 caacaaggtg atcgaaggcc tggacacggg cctgcagggc atgtgtgtgg gagagaggcg

1681 gcagctcatc gtgcccccgc acctggccca cggggagagt ggagcccggg gagtcccagg

1741 cagtgctgtg ctgctgtttg aggtggagct ggtgtcccgg gaggatgggc tgcccacagg

1801 ctacctgttt gtgtggcaca aggaccctcc tgccaacctg tttgaagaca tggacctcaa

1861 caaggatggc gaggtccctc cggaggagtt ctccaccttc atcaaggctc aagtgagtga

1921 gggcaaagga cgcctcatgc ctgggcagga ccctgagaaa accataggag acatgttcca

1981 gaaccaggac cgcaaccagg acggcaagat cacagtcgac gagctcaagc tgaagtcaga

2041 tgaggacgag gagcgggtcc acgaggagct ctgaggggca gggagcctgg ccaggcctga

2101 gacacagagg cccactgcga gggggacagt ggcggtggga ctgacctgct gacagtcacc

2161 ctccctctgc tgggatgagg tccaggagcc aactaaaaca atggcagagg agacatctct

2221 ggtgttccca ccaccctaga tgaaaatcca cagcacagac ctctaccgtg tttctcttcc

2281 atccctaaac cacttcctta aaatgtttgg atttgcaaag ccaatttggg gcctgtggag

2341 cctggggttg gatagggcca tggctggtcc cccaccatac ctcccctcca catcactgac

2401 acagctgagc ttgttatcca tctccccaaa ctttctcttt ctttgtactt cttgtcatcc

2461 ccactcccag ccccttttcc tctatgtgac agctccctag gacccctctg ccttcctccc

2521 caatcctgac tggctcctag ggaaggggaa ggctcctgga gggcagccct acctctccca

2581 tgccctttgc cctcctccct cgcctccagt ggaggctgag ctgaccctgg gctgctggag

2641 gccagactgg gctgtagtta gcttttcatc cctaaagaag gctcctttcc ctaaggaacc

2701 atagaagaga ggaagaaaac aaagggcatg tgtgagggaa gctgcttggg tgggtgttag

2761 ggctatgaaa tcttggattt ggggctgagg ggtgggaggg agggcagagc tctgcacact

2821 caaaggctaa actggtgtca gtcctttttt cctttgttcc aaataaaaga ttaaaccaat

2881 ggcaaaaa

Protein sequence:

NCBI Reference Sequence: NP_068758.3

LOCUS NP_068758

ACCESSION NP_068758

1 mfpagppshs llrlpllqll llvvqavgrg lgraspaggp ledvvieryh ipracprevq

61 mgdfvryhyn gtfedgkkfd ssydrntlva ivvgvgrlit gmdrglmgmc vnerrrlivp

121 phlgygsigl aglippdatl yfdvvlldvw nkedtvqvst llrpphcprm vqdgdfvryh

181 yngtlldgts fdtsyskggt ydtyvgsgwl ikgmdqgllg mcpgerrkii ippflaygek

241 gygtvippqa slvfhvllid vhnpkdavql etlelppgcv rragagdfmr yhyngslmdg

301 tlfdssysrn htyntyigqg yiipgmdqgl qgacmgerrr itipphlayg engtgdkipg

361 savlifnvhv idfhnpadvv eirtlsrpse tcnettklgd fvryhyncsl ldgtqlftsh

421 dygapqeatl gankviegld tglqgmcvge rrqlivpphl ahgesgargv pgsavllfev

481 elvsredglp tgylfvwhkd ppanlfedmd lnkdgevppe efstfikaqv segkgrlmpg

541 qdpektigdm fqnqdrnqdg kitvdelklk sdedeervhe el

FINC

Official Symbol: FN1

Official Name: fibronectin 1

Gene ID: 2335

Organism: Homo sapiens

Other Aliases: CIG, ED-B, FINC, FN, FNZ, GFND, GFND2, LETS, MSF

Other Designations: cold-insoluble globulin; fibronectin; migration-stimulating factor

Nucleotide sequence:

NCBI Reference Sequence: NM_002026.2

LOCUS NM_002026

ACCESSION NM_002026

1 gcccgcgccg gctgtgctgc acagggggag gagagggaac cccaggcgcg agcgggaaga

61 ggggacctgc agccacaact tctctggtcc tctgcatccc ttctgtccct ccacccgtcc

121 ccttccccac cctctggccc ccaccttctt ggaggcgaca acccccggga ggcattagaa

181 gggatttttc ccgcaggttg cgaagggaag caaacttggt ggcaacttgc ctcccggtgc

241 gggcgtctct cccccaccgt ctcaacatgc ttaggggtcc ggggcccggg ctgctgctgc

301 tggccgtcca gtgcctgggg acagcggtgc cctccacggg agcctcgaag agcaagaggc

361 aggctcagca aatggttcag ccccagtccc cggtggctgt cagtcaaagc aagcccggtt

421 gttatgacaa tggaaaacac tatcagataa atcaacagtg ggagcggacc tacctaggca

481 atgcgttggt ttgtacttgt tatggaggaa gccgaggttt taactgcgag agtaaacctg

541 aagctgaaga gacttgcttt gacaagtaca ctgggaacac ttaccgagtg ggtgacactt

601 atgagcgtcc taaagactcc atgatctggg actgtacctg catcggggct gggcgaggga

661 gaataagctg taccatcgca aaccgctgcc atgaaggggg tcagtcctac aagattggtg

721 acacctggag gagaccacat gagactggtg gttacatgtt agagtgtgtg tgtcttggta

781 atggaaaagg agaatggacc tgcaagccca tagctgagaa gtgttttgat catgctgctg

841 ggacttccta tgtggtcgga gaaacgtggg agaagcccta ccaaggctgg atgatggtag

901 attgtacttg cctgggagaa ggcagcggac gcatcacttg cacttctaga aatagatgca

961 acgatcagga cacaaggaca tcctatagaa ttggagacac ctggagcaag aaggataatc

1021 gaggaaacct gctccagtgc atctgcacag gcaacggccg aggagagtgg aagtgtgaga

1081 ggcacacctc tgtgcagacc acatcgagcg gatctggccc cttcaccgat gttcgtgcag

1141 ctgtttacca accgcagcct cacccccagc ctcctcccta tggccactgt gtcacagaca

1201 gtggtgtggt ctactctgtg gggatgcagt ggctgaagac acaaggaaat aagcaaatgc

1261 tttgcacgtg cctgggcaac ggagtcagct gccaagagac agctgtaacc cagacttacg

1321 gtggcaactc aaatggagag ccatgtgtct taccattcac ctacaatggc aggacgttct

1381 actcctgcac cacagaaggg cgacaggacg gacatctttg gtgcagcaca acttcgaatt

1441 atgagcagga ccagaaatac tctttctgca cagaccacac tgttttggtt cagactcgag

1501 gaggaaattc caatggtgcc ttgtgccact tccccttcct atacaacaac cacaattaca

1561 ctgattgcac ttctgagggc agaagagaca acatgaagtg gtgtgggacc acacagaact

1621 atgatgccga ccagaagttt gggttctgcc ccatggctgc ccacgaggaa atctgcacaa

1681 ccaatgaagg ggtcatgtac cgcattggag atcagtggga taagcagcat gacatgggtc

1741 acatgatgag gtgcacgtgt gttgggaatg gtcgtgggga atggacatgc attgcctact

1801 cgcagcttcg agatcagtgc attgttgatg acatcactta caatgtgaac gacacattcc

1861 acaagcgtca tgaagagggg cacatgctga actgtacatg cttcggtcag ggtcggggca

1921 ggtggaagtg tgatcccgtc gaccaatgcc aggattcaga gactgggacg ttttatcaaa

1981 ttggagattc atgggagaag tatgtgcatg gtgtcagata ccagtgctac tgctatggcc

2041 gtggcattgg ggagtggcat tgccaacctt tacagaccta tccaagctca agtggtcctg

2101 tcgaagtatt tatcactgag actccgagtc agcccaactc ccaccccatc cagtggaatg

2161 caccacagcc atctcacatt tccaagtaca ttctcaggtg gagacctaaa aattctgtag

2221 gccgttggaa ggaagctacc ataccaggcc acttaaactc ctacaccatc aaaggcctga

2281 agcctggtgt ggtatacgag ggccagctca tcagcatcca gcagtacggc caccaagaag

2341 tgactcgctt tgacttcacc accaccagca ccagcacacc tgtgaccagc aacaccgtga

2401 caggagagac gactcccttt tctcctcttg tggccacttc tgaatctgtg accgaaatca

2461 cagccagtag ctttgtggtc tcctgggtct cagcttccga caccgtgtcg ggattccggg

2521 tggaatatga gctgagtgag gagggagatg agccacagta cctggatctt ccaagcacag

2581 ccacttctgt gaacatccct gacctgcttc ctggccgaaa atacattgta aatgtctatc

2641 agatatctga ggatggggag cagagtttga tcctgtctac ttcacaaaca acagcgcctg

2701 atgcccctcc tgacccgact gtggaccaag ttgatgacac ctcaattgtt gttcgctgga

2761 gcagacccca ggctcccatc acagggtaca gaatagtcta ttcgccatca gtagaaggta

2821 gcagcacaga actcaacctt cctgaaactg caaactccgt caccctcagt gacttgcaac

2881 ctggtgttca gtataacatc actatctatg ctgtggaaga aaatcaagaa agtacacctg

2941 ttgtcattca acaagaaacc actggcaccc cacgctcaga tacagtgccc tctcccaggg

3001 acctgcagtt tgtggaagtg acagacgtga aggtcaccat catgtggaca ccgcctgaga

3061 gtgcagtgac cggctaccgt gtggatgtga tccccgtcaa cctgcctggc gagcacgggc

3121 agaggctgcc catcagcagg aacacctttg cagaagtcac cgggctgtcc cctggggtca

3181 cctattactt caaagtcttt gcagtgagcc atgggaggga gagcaagcct ctgactgctc

3241 aacagacaac caaactggat gctcccacta acctccagtt tgtcaatgaa actgattcta

3301 ctgtcctggt gagatggact ccacctcggg cccagataac aggataccga ctgaccgtgg

3361 gccttacccg aagaggacag cccaggcagt acaatgtggg tccctctgtc tccaagtacc

3421 cactgaggaa tctgcagcct gcatctgagt acaccgtatc cctcgtggcc ataaagggca

3481 accaagagag ccccaaagcc actggagtct ttaccacact gcagcctggg agctctattc

3541 caccttacaa caccgaggtg actgagacca ccattgtgat cacatggacg cctgctccaa

3601 gaattggttt taagctgggt gtacgaccaa gccagggagg agaggcacca cgagaagtga

3661 cttcagactc aggaagcatc gttgtgtccg gcttgactcc aggagtagaa tacgtctaca

3721 ccatccaagt cctgagagat ggacaggaaa gagatgcgcc aattgtaaac aaagtggtga

3781 caccattgtc tccaccaaca aacttgcatc tggaggcaaa ccctgacact ggagtgctca

3841 cagtctcctg ggagaggagc accaccccag acattactgg ttatagaatt accacaaccc

3901 ctacaaacgg ccagcaggga aattctttgg aagaagtggt ccatgctgat cagagctcct

3961 gcacttttga taacctgagt cccggcctgg agtacaatgt cagtgtttac actgtcaagg

4021 atgacaagga aagtgtccct atctctgata ccatcatccc agctgttcct cctcccactg

4081 acctgcgatt caccaacatt ggtccagaca ccatgcgtgt cacctgggct ccacccccat

4141 ccattgattt aaccaacttc ctggtgcgtt actcacctgt gaaaaatgag gaagatgttg

4201 cagagttgtc aatttctcct tcagacaatg cagtggtctt aacaaatctc ctgcctggta

4261 cagaatatgt agtgagtgtc tccagtgtct acgaacaaca tgagagcaca cctcttagag

4321 gaagacagaa aacaggtctt gattccccaa ctggcattga cttttctgat attactgcca

4381 actcttttac tgtgcactgg attgctcctc gagccaccat cactggctac aggatccgcc

4441 atcatcccga gcacttcagt gggagacctc gagaagatcg ggtgccccac tctcggaatt

4501 ccatcaccct caccaacctc actccaggca cagagtatgt ggtcagcatc gttgctctta

4561 atggcagaga ggaaagtccc ttattgattg gccaacaatc aacagtttct gatgttccga

4621 gggacctgga agttgttgct gcgaccccca ccagcctact gatcagctgg gatgctcctg

4681 ctgtcacagt gagatattac aggatcactt acggagagac aggaggaaat agccctgtcc

4741 aggagttcac tgtgcctggg agcaagtcta cagctaccat cagcggcctt aaacctggag

4801 ttgattatac catcactgtg tatgctgtca ctggccgtgg agacagcccc gcaagcagca

4861 agccaatttc cattaattac cgaacagaaa ttgacaaacc atcccagatg caagtgaccg

4921 atgttcagga caacagcatt agtgtcaagt ggctgccttc aagttcccct gttactggtt

4981 acagagtaac caccactccc aaaaatggac caggaccaac aaaaactaaa actgcaggtc

5041 cagatcaaac agaaatgact attgaaggct tgcagcccac agtggagtat gtggttagtg

5101 tctatgctca gaatccaagc ggagagagtc agcctctggt tcagactgca gtaaccaaca

5161 ttgatcgccc taaaggactg gcattcactg atgtggatgt cgattccatc aaaattgctt

5221 gggaaagccc acaggggcaa gtttccaggt acagggtgac ctactcgagc cctgaggatg

5281 gaatccatga gctattccct gcacctgatg gtgaagaaga cactgcagag ctgcaaggcc

5341 tcagaccggg ttctgagtac acagtcagtg tggttgcctt gcacgatgat atggagagcc

5401 agcccctgat tggaacccag tccacagcta ttcctgcacc aactgacctg aagttcactc

5461 aggtcacacc cacaagcctg agcgcccagt ggacaccacc caatgttcag ctcactggat

5521 atcgagtgcg ggtgaccccc aaggagaaga ccggaccaat gaaagaaatc aaccttgctc

5581 ctgacagctc atccgtggtt gtatcaggac ttatggtggc caccaaatat gaagtgagtg

5641 tctatgctct taaggacact ttgacaagca gaccagctca gggagttgtc accactctgg

5701 agaatgtcag cccaccaaga agggctcgtg tgacagatgc tactgagacc accatcacca

5761 ttagctggag aaccaagact gagacgatca ctggcttcca agttgatgcc gttccagcca

5821 atggccagac tccaatccag agaaccatca agccagatgt cagaagctac accatcacag

5881 gtttacaacc aggcactgac tacaagatct acctgtacac cttgaatgac aatgctcgga

5941 gctcccctgt ggtcatcgac gcctccactg ccattgatgc accatccaac ctgcgtttcc

6001 tggccaccac acccaattcc ttgctggtat catggcagcc gccacgtgcc aggattaccg

6061 gctacatcat caagtatgag aagcctgggt ctcctcccag agaagtggtc cctcggcccc

6121 gccctggtgt cacagaggct actattactg gcctggaacc gggaaccgaa tatacaattt

6181 atgtcattgc cctgaagaat aatcagaaga gcgagcccct gattggaagg aaaaagacag

6241 acgagcttcc ccaactggta acccttccac accccaatct tcatggacca gagatcttgg

6301 atgttccttc cacagttcaa aagacccctt tcgtcaccca ccctgggtat gacactggaa

6361 atggtattca gcttcctggc acttctggtc agcaacccag tgttgggcaa caaatgatct

6421 ttgaggaaca tggttttagg cggaccacac cgcccacaac ggccaccccc ataaggcata

6481 ggccaagacc atacccgccg aatgtaggac aagaagctct ctctcagaca accatctcat

6541 gggccccatt ccaggacact tctgagtaca tcatttcatg tcatcctgtt ggcactgatg

6601 aagaaccctt acagttcagg gttcctggaa cttctaccag tgccactctg acaggcctca

6661 ccagaggtgc cacctacaac atcatagtgg aggcactgaa agaccagcag aggcataagg

6721 ttcgggaaga ggttgttacc gtgggcaact ctgtcaacga aggcttgaac caacctacgg

6781 atgactcgtg ctttgacccc tacacagttt cccattatgc cgttggagat gagtgggaac

6841 gaatgtctga atcaggcttt aaactgttgt gccagtgctt aggctttgga agtggtcatt

6901 tcagatgtga ttcatctaga tggtgccatg acaatggtgt gaactacaag attggagaga

6961 agtgggaccg tcagggagaa aatggccaga tgatgagctg cacatgtctt gggaacggaa

7021 aaggagaatt caagtgtgac cctcatgagg caacgtgtta tgatgatggg aagacatacc

7081 acgtaggaga acagtggcag aaggaatatc tcggtgccat ttgctcctgc acatgctttg

7141 gaggccagcg gggctggcgc tgtgacaact gccgcagacc tgggggtgaa cccagtcccg

7201 aaggcactac tggccagtcc tacaaccagt attctcagag ataccatcag agaacaaaca

7261 ctaatgttaa ttgcccaatt gagtgcttca tgcctttaga tgtacaggct gacagagaag

7321 attcccgaga gtaaatcatc tttccaatcc agaggaacaa gcatgtctct ctgccaagat

7381 ccatctaaac tggagtgatg ttagcagacc cagcttagag ttcttctttc tttcttaagc

7441 cctttgctct ggaggaagtt ctccagcttc agctcaactc acagcttctc caagcatcac

7501 cctgggagtt tcctgagggt tttctcataa atgagggctg cacattgcct gttctgcttc

7561 gaagtattca ataccgctca gtattttaaa tgaagtgatt ctaagatttg gtttgggatc

7621 aataggaaag catatgcagc caaccaagat gcaaatgttt tgaaatgata tgaccaaaat

7681 tttaagtagg aaagtcaccc aaacacttct gctttcactt aagtgtctgg cccgcaatac

7741 tgtaggaaca agcatgatct tgttactgtg atattttaaa tatccacagt actcactttt

7801 tccaaatgat cctagtaatt gcctagaaat atctttctct tacctgttat ttatcaattt

7861 ttcccagtat ttttatacgg aaaaaattgt attgaaaaca cttagtatgc agttgataag

7921 aggaatttgg tataattatg gtgggtgatt attttttata ctgtatgtgc caaagcttta

7981 ctactgtgga aagacaactg ttttaataaa agatttacat tccacaactt gaagttcatc

8041 tatttgatat aagacacctt cgggggaaat aattcctgtg aatattcttt ttcaattcag

8101 caaacatttg aaaatctatg atgtgcaagt ctaattgttg atttcagtac aagattttct

8161 aaatcagttg ctacaaaaac tgattggttt ttgtcacttc atctcttcac taatggagat

8221 agctttacac tttctgcttt aatagattta agtggacccc aatatttatt aaaattgcta

8281 gtttaccgtt cagaagtata atagaaataa tctttagttg ctcttttcta accattgtaa

8341 ttcttccctt cttccctcca cctttccttc attgaataaa cctctgttca aagagattgc

8401 ctgcaaggga aataaaaatg actaagatat taaaaaaaaa aaaaaaaaa

Protein sequence:

NCBI Reference Sequence: NP_002017.1

LOCUS NP_002017

ACCESSION NP_002017

1 mlrgpgpgll llavqclgta vpstgasksk rqaqqmvqpq spvavsqskp gcydngkhyq

61 inqqwertyl gnalvctcyg gsrgfncesk peaeetcfdk ytgntyrvgd tyerpkdsmi

121 wdctcigagr grisctianr cheggqsyki gdtwrrphet ggymlecvcl gngkgewtck

181 piaekcfdha agtsyvvget wekpyqgwmm vdctclgegs gritctsrnr cndqdtrtsy

241 rigdtwskkd nrgnllqcic tgngrgewkc erhtsvqtts sgsgpftdvr aavyqpqphp

301 qpppyghcvt dsgvvysvgm qwlktqgnkq mlctclgngv scqetavtqt yggnsngepc

361 vlpftyngrt fyscttegrq dghlwcstts nyeqdqkysf ctdhtvlvqt rggnsngalc

421 hfpflynnhn ytdctsegrr dnmkwcgttq nydadqkfgf cpmaaheeic ttnegvmyri

481 gdqwdkqhdm ghmmrctcvg ngrgewtcia ysqlrdqciv dditynvndt fhkrheeghm

541 lnctcfgqgr grwkcdpvdq cqdsetgtfy qigdswekyv hgvryqcycy grgigewhcq

601 plqtypsssg pvevfitetp sqpnshpiqw napqpshisk yilrwrpkns vgrwkeatip

661 ghlnsytikg lkpgvvyegq lisiqqyghq evtrfdfttt ststpvtsnt vtgettpfsp

721 lvatsesvte itassfvvsw vsasdtvsgf rveyelseeg depqyldlps tatsvnipdl

781 lpgrkyivnv yqisedgeqs lilstsqtta pdappdptvd qvddtsivvr wsrpqapitg

841 yrivyspsve gsstelnlpe tansvtlsdl qpgvqyniti yaveenqest pvviqqettg

901 tprsdtvpsp rdlqfvevtd vkvtimwtpp esavtgyrvd vipvnlpgeh gqrlpisrnt

961 faevtglspg vtyyfkvfav shgreskplt aqqttkldap tnlqfvnetd stvlvrwtpp

1021 raqitgyrlt vgltrrgqpr qynvgpsvsk yplrnlqpas eytvslvaik gnqespkatg

1081 vfttlqpgss ippyntevte ttivitwtpa prigfklgvr psqggeapre vtsdsgsivv

1141 sgltpgveyv ytiqvlrdgq erdapivnkv vtplspptnl hleanpdtgv ltvswerstt

1201 pditgyritt tptngqqgns leevvhadqs sctfdnlspg leynvsvytv kddkesvpis

1261 dtiipavppp tdlrftnigp dtmrvtwapp psidltnflv ryspvkneed vaelsispsd

1321 navvltnllp gteyvvsvss vyeqhestpl rgrqktglds ptgidfsdit ansftvhwia

1381 pratitgyri rhhpehfsgr predrvphsr nsitltnltp gteyvvsiva lngreespll

1441 igqqstvsdv prdlevvaat ptslliswda pavtvryyri tygetggnsp vqeftvpgsk

1501 statisglkp gvdytitvya vtgrgdspas skpisinyrt eidkpsqmqv tdvqdnsisv

1561 kwlpssspvt gyrvtttpkn gpgptktkta gpdqtemtie glqptveyvv svyaqnpsge

1621 sqplvqtavt nidrpkglaf tdvdvdsiki awespqgqvs ryrvtysspe dgihelfpap

1681 dgeedtaelq glrpgseytv svvalhddme sqpligtqst aipaptdlkf tqvtptslsa

1741 qwtppnvqlt gyrvrvtpke ktgpmkeinl apdsssvvvs glmvatkyev svyalkdtlt

1801 srpaqgvvtt lenvspprra rvtdatetti tiswrtktet itgfqvdavp angqtpiqrt

1861 ikpdvrsyti tglqpgtdyk iylytlndna rsspvvidas taidapsnlr flattpnsll

1921 vswqpprari tgyiikyekp gspprevvpr prpgvteati tglepgteyt iyvialknnq

1981 ksepligrkk tdelpqlvtl phpnlhgpei ldvpstvqkt pfvthpgydt gngiqlpgts

2041 gqqpsvgqqm ifeehgfrrt tppttatpir hrprpyppnv gqealsqtti swapfqdtse

2101 yiischpvgt deeplqfrvp gtstsatltg ltrgatynii vealkdqqrh kvreevvtvg

2161 nsvneglnqp tddscfdpyt vshyavgdew ermsesgfkl lcqclgfgsg hfrcdssrwc

2221 hdngvnykig ekwdrqgeng qmmsctclgn gkgefkcdph eatcyddgkt yhvgeqwqke

2281 ylgaicsctc fggqrgwrcd ncrrpggeps pegttgqsyn qysqryhqrt ntnvncpiec

2341 fmpldvqadr edsre

CYB5

Official Symbol: CYB5A

Official Name: cytochrome b5 type A (microsomal)

Gene ID: 1528

Organism: Homo sapiens

Other Aliases: CYB5, MCB5

Other Designations: cytochrome b5; type 1 cyt-b5

Note: there are three different isoforms.
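The isoform 1 (NP_683725.1) and isoform 3 (NP_001177736.1) protein records listed below share their N- and C-terminal sequences. As a minimal sketch, assuming the two differ by a single contiguous deletion, the snippet below joins the numbered sequence lines and locates the segment present in isoform 1 but absent from isoform 3. The helper names parse_block and find_deletion are hypothetical. The isoform 3 C-terminus is written as maed, which is what the listed NM_001190807.2 coding sequence encodes (...atg gca gag gac tga).

```python
# Protein records copied from the listing, in its numbered-block format.
ISOFORM_1 = """
1 maeqsdeavk yytleeiqkh nhskstwlil hhkvydltkf leehpggeev lreqaggdat
61 enfedvghst daremsktfi igelhpddrp klnkppetli ttidssssww tnwvipaisa
121 vavalmyrly maed
"""

ISOFORM_3 = """
1 maeqsdeavk yytleeiqkh nhskstwlil hhkvydltkf leehpggeev lreqaggdat
61 enfedvghst daremsktfi igelhpetli ttidssssww tnwvipaisa vavalmyrly
121 maed
"""


def parse_block(text: str) -> str:
    """Join numbered sequence lines ('61 enfedvghst ...') into one string."""
    parts = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]  # drop the running position number
        parts.append("".join(tokens))
    return "".join(parts)


def find_deletion(longer: str, shorter: str):
    """Return (start, end, segment), 1-based inclusive, of the single
    contiguous block present in `longer` but missing from `shorter`."""
    p = 0
    while p < len(shorter) and longer[p] == shorter[p]:
        p += 1
    s = 0
    # Extend the common suffix without letting it overlap the prefix.
    while s < len(shorter) - p and longer[-1 - s] == shorter[-1 - s]:
        s += 1
    if p + s != len(shorter):
        raise ValueError("sequences do not differ by one contiguous deletion")
    end = len(longer) - s
    return p + 1, end, longer[p:end]


iso1 = parse_block(ISOFORM_1)
iso3 = parse_block(ISOFORM_3)
print(len(iso1), len(iso3))       # 134 124
print(find_deletion(iso1, iso3))  # (87, 96, 'ddrpklnkpp')
```

Because the flanking proline repeats (residues 86 and 96 of isoform 1), the same difference could also be placed one residue earlier; the prefix-first scan places it as far toward the C-terminus as possible.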

Isoform 1

Nucleotide sequence:

NCBI Reference Sequence: NM_148923.3

LOCUS NM_148923

ACCESSION NM_148923

1 gcgccccgcc cctgagccgg ccgcccagcc cccagtgggg ttcccggcgc ggggaatgtc

61 ccgggtggag ctggctgagt cgcgcgctct gctccacccg acggggctgt gtgtgctggg

121 cctggctcgc ggcgaaccga gatggcagag cagtcggacg aggccgtgaa gtactacacc

181 ctagaggaga ttcagaagca caaccacagc aagagcacct ggctgatcct gcaccacaag

241 gtgtacgatt tgaccaaatt tctggaagag catcctggtg gggaagaagt tttaagggaa

301 caagctggag gtgacgctac tgagaacttt gaggatgtcg ggcactctac agatgccagg

361 gaaatgtcca aaacattcat cattggggag ctccatccag atgacagacc aaagttaaac

421 aagcctccgg aaactcttat cactactatt gattctagtt ccagttggtg gaccaactgg

481 gtgatccctg ccatctctgc agtggccgtc gccttgatgt atcgcctata catggcagag

541 gactgaacac ctcctcagaa gtcagcgcag gaagagcctg ctttggacac gggagaaaag

601 aagccattgc taactacttc aactgacaga aaccttcact tgaaaacaat gattttaata

661 tatctctttc tttttcttcc gacattagaa acaaaacaaa aagaactgtc ctttctgcgc

721 tcaaattttt cgagtgtgcc tttttattca tctactttat tttgatgttt ccttaatgtg

781 taatttactt attataagca tgatctttta aaaatatatt tggcttttaa agtatgcaaa

841 aaaaaaaaaa

Protein sequence:

NCBI Reference Sequence: NP_683725.1

LOCUS NP_683725

ACCESSION NP_683725

1 maeqsdeavk yytleeiqkh nhskstwlil hhkvydltkf leehpggeev lreqaggdat

61 enfedvghst daremsktfi igelhpddrp klnkppetli ttidssssww tnwvipaisa

121 vavalmyrly maed
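The isoform 1 protein above is the translation of the NM_148923.3 coding sequence, which, reading off the listing, starts at the ATG at nucleotides 142-144 and ends with the TGA stop at 544-546. A minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and a hypothetical translate() helper, checks the first and last codons against the listed protein:

```python
# Standard genetic code (NCBI translation table 1), codons enumerated
# with bases in T, C, A, G order at each of the three positions.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k].lower()
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str, stop_at_stop: bool = True) -> str:
    """Translate a nucleotide string in frame 1; whitespace is ignored."""
    seq = "".join(cds.split()).lower()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "x")  # 'x' for ambiguous codons
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Nucleotides 142-180 (start of the CDS) and 517-546 (end of the CDS,
# including the TGA stop) of NM_148923.3 as listed above.
start = "atggcagag cagtcggacg aggccgtgaa gtactacacc"
tail = "atgtatcgcctatacatggcagaggactga"

print(translate(start))  # maeqsdeavkyyt
print(translate(tail))   # myrlymaed
```

The compact 64-letter string is a common way to encode the standard table; a full CDS check would slice nucleotides 142-546 from the record instead of the two short excerpts used here.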

Isoform 2

Nucleotide sequence:

NCBI Reference Sequence: NM_001914.3

LOCUS NM_001914

ACCESSION NM_001914

1 gcgccccgcc cctgagccgg ccgcccagcc cccagtgggg ttcccggcgc ggggaatgtc

61 ccgggtggag ctggctgagt cgcgcgctct gctccacccg acggggctgt gtgtgctggg

121 cctggctcgc ggcgaaccga gatggcagag cagtcggacg aggccgtgaa gtactacacc

181 ctagaggaga ttcagaagca caaccacagc aagagcacct ggctgatcct gcaccacaag

241 gtgtacgatt tgaccaaatt tctggaagag catcctggtg gggaagaagt tttaagggaa

301 caagctggag gtgacgctac tgagaacttt gaggatgtcg ggcactctac agatgccagg

361 gaaatgtcca aaacattcat cattggggag ctccatccag atgacagacc aaagttaaac

421 aagcctccgg aaccttaaag gcggtgtttc aaggaaactc ttatcactac tattgattct

481 agttccagtt ggtggaccaa ctgggtgatc cctgccatct ctgcagtggc cgtcgccttg

541 atgtatcgcc tatacatggc agaggactga acacctcctc agaagtcagc gcaggaagag

601 cctgctttgg acacgggaga aaagaagcca ttgctaacta cttcaactga cagaaacctt

661 cacttgaaaa caatgatttt aatatatctc tttctttttc ttccgacatt agaaacaaaa

721 caaaaagaac tgtcctttct gcgctcaaat ttttcgagtg tgccttttta ttcatctact

781 ttattttgat gtttccttaa tgtgtaattt acttattata agcatgatct tttaaaaata

841 tatttggctt ttaaagtatg caaaaaaaaa aaaa

Protein sequence:

NCBI Reference Sequence: NP_001905.1

LOCUS NP_001905

ACCESSION NP_001905

1 maeqsdeavk yytleeiqkh nhskstwlil hhkvydltkf leehpggeev lreqaggdat

61 enfedvghst daremsktfi igelhpddrp klnkppep

Isoform 3

Nucleotide sequence:

NCBI Reference Sequence: NM_001190807.2

LOCUS NM_001190807

ACCESSION NM_001190807

1 gcgccccgcc cctgagccgg ccgcccagcc cccagtgggg ttcccggcgc ggggaatgtc

61 ccgggtggag ctggctgagt cgcgcgctct gctccacccg acggggctgt gtgtgctggg

121 cctggctcgc ggcgaaccga gatggcagag cagtcggacg aggccgtgaa gtactacacc

181 ctagaggaga ttcagaagca caaccacagc aagagcacct ggctgatcct gcaccacaag

241 gtgtacgatt tgaccaaatt tctggaagag catcctggtg gggaagaagt tttaagggaa

301 caagctggag gtgacgctac tgagaacttt gaggatgtcg ggcactctac agatgccagg

361 gaaatgtcca aaacattcat cattggggag ctccatccag aaactcttat cactactatt

421 gattctagtt ccagttggtg gaccaactgg gtgatccctg ccatctctgc agtggccgtc

481 gccttgatgt atcgcctata catggcagag gactgaacac ctcctcagaa gtcagcgcag

541 gaagagcctg ctttggacac gggagaaaag aagccattgc taactacttc aactgacaga

601 aaccttcact tgaaaacaat gattttaata tatctctttc tttttcttcc gacattagaa

661 acaaaacaaa aagaactgtc ctttctgcgc tcaaattttt cgagtgtgcc tttttattca

721 tctactttat tttgatgttt ccttaatgtg taatttactt attataagca tgatctttta

781 aaaatatatt tggcttttaa agtatgcaaa aaaaaaaaaa

Protein sequence:

NCBI Reference Sequence: NP_001177736.1

LOCUS NP_001177736

ACCESSION NP_001177736

1 maeqsdeavk yytleeiqkh nhskstwlil hhkvydltkf leehpggeev lreqaggdat

61 enfedvghst daremsktfi igelhpetli ttidssssww tnwvipaisa vavalmyrly

121 maed

PAI1

Official Symbol: SERPINE1

Official Name: serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1

Gene ID: 5054

Organism: Homo sapiens

Other Aliases: PAI, PAI-1, PAI1, PLANH1

Other Designations: endothelial plasminogen activator inhibitor; plasminogen activator inhibitor 1; serine (or cysteine) proteinase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1; serpin E1

Nucleotide sequence (isoform 1):

NCBI Reference Sequence: NM_000602.4

LOCUS NM_000602

ACCESSION NM_000602

1 ggcccacaga ggagcacagc tgtgtttggc tgcagggcca agagcgctgt caagaagacc

61 cacacgcccc cctccagcag ctgaattcct gcagctcagc agccgccgcc agagcaggac

121 gaaccgccaa tcgcaaggca cctctgagaa cttcaggatg cagatgtctc cagccctcac

181 ctgcctagtc ctgggcctgg cccttgtctt tggtgaaggg tctgctgtgc accatccccc

241 atcctacgtg gcccacctgg cctcagactt cggggtgagg gtgtttcagc aggtggcgca

301 ggcctccaag gaccgcaacg tggttttctc accctatggg gtggcctcgg tgttggccat

361 gctccagctg acaacaggag gagaaaccca gcagcagatt caagcagcta tgggattcaa

421 gattgatgac aagggcatgg cccccgccct ccggcatctg tacaaggagc tcatggggcc

481 atggaacaag gatgagatca gcaccacaga cgcgatcttc gtccagcggg atctgaagct

541 ggtccagggc ttcatgcccc acttcttcag gctgttccgg agcacggtca agcaagtgga

601 cttttcagag gtggagagag ccagattcat catcaatgac tgggtgaaga cacacacaaa

661 aggtatgatc agcaacttgc ttgggaaagg agccgtggac cagctgacac ggctggtgct

721 ggtgaatgcc ctctacttca acggccagtg gaagactccc ttccccgact ccagcaccca

781 ccgccgcctc ttccacaaat cagacggcag cactgtctct gtgcccatga tggctcagac

841 caacaagttc aactatactg agttcaccac gcccgatggc cattactacg acatcctgga

901 actgccctac cacggggaca ccctcagcat gttcattgct gccccttatg aaaaagaggt

961 gcctctctct gccctcacca acattctgag tgcccagctc atcagccact ggaaaggcaa

1021 catgaccagg ctgccccgcc tcctggttct gcccaagttc tccctggaga ctgaagtcga

1081 cctcaggaag cccctagaga acctgggaat gaccgacatg ttcagacagt ttcaggctga

1141 cttcacgagt ctttcagacc aagagcctct ccacgtcgcg caggcgctgc agaaagtgaa

1201 gatcgaggtg aacgagagtg gcacggtggc ctcctcatcc acagctgtca tagtctcagc

1261 ccgcatggcc cccgaggaga tcatcatgga cagacccttc ctctttgtgg tccggcacaa

1321 ccccacagga acagtccttt tcatgggcca agtgatggaa ccctgaccct ggggaaagac

1381 gccttcatct gggacaaaac tggagatgca tcgggaaaga agaaactccg aagaaaagaa

1441 ttttagtgtt aatgactctt tctgaaggaa gagaagacat ttgccttttg ttaaaagatg

1501 gtaaaccaga tctgtctcca agaccttggc ctctccttgg aggaccttta ggtcaaactc

1561 cctagtctcc acctgagacc ctgggagaga agtttgaagc acaactccct taaggtctcc

1621 aaaccagacg gtgacgcctg cgggaccatc tggggcacct gcttccaccc gtctctctgc

1681 ccactcgggt ctgcagacct ggttcccact gaggcccttt gcaggatgga actacggggc

1741 ttacaggagc ttttgtgtgc ctggtagaaa ctatttctgt tccagtcaca ttgccatcac

1801 tcttgtactg cctgccaccg cggaggaggc tggtgacagg ccaaaggcca gtggaagaaa

1861 caccctttca tctcagagtc cactgtggca ctggccaccc ctccccagta caggggtgct

1921 gcaggtggca gagtgaatgt cccccatcat gtggcccaac tctcctggcc tggccatctc

1981 cctccccaga aacagtgtgc atgggttatt ttggagtgta ggtgacttgt ttactcattg

2041 aagcagattt ctgcttcctt ttatttttat aggaatagag gaagaaatgt cagatgcgtg

2101 cccagctctt caccccccaa tctcttggtg gggaggggtg tacctaaata tttatcatat

2161 ccttgccctt gagtgcttgt tagagagaaa gagaactact aaggaaaata atattattta

2221 aactcgctcc tagtgtttct ttgtggtctg tgtcaccgta tctcaggaag tccagccact

2281 tgactggcac acacccctcc ggacatccag cgtgacggag cccacactgc caccttgtgg

2341 ccgcctgaga ccctcgcgcc ccccgcgccc ctctttttcc ccttgatgga aattgaccat

2401 acaatttcat cctccttcag gggatcaaaa ggacggagtg gggggacaga gactcagatg

2461 aggacagagt ggtttccaat gtgttcaata gatttaggag cagaaatgca aggggctgca

2521 tgacctacca ggacagaact ttccccaatt acagggtgac tcacagccgc attggtgact

2581 cacttcaatg tgtcatttcc ggctgctgtg tgtgagcagt ggacacgtga ggggggggtg

2641 ggtgagagag acaggcagct cggattcaac taccttagat aatatttctg aaaacctacc

2701 agccagaggg tagggcacaa agatggatgt aatgcacttt gggaggccaa ggcgggagga

2761 ttgcttgagc ccaggagttc aagaccagcc tgggcaacat accaagaccc ccgtctcttt

2821 aaaaatatat atattttaaa tatacttaaa tatatatttc taatatcttt aaatatatat

2881 atatatttta aagaccaatt tatgggagaa ttgcacacag atgtgaaatg aatgtaatct

2941 aatagaagcc taatcagccc accatgttct ccactgaaaa atcctctttc tttggggttt

3001 ttctttcttt cttttttgat tttgcactgg acggtgacgt cagccatgta caggatccac

3061 aggggtggtg tcaaatgcta ttgaaattgt gttgaattgt atgctttttc acttttgata

3121 aataaacatg taaaaatgtt tcaaaaaaat aataaaataa ataaatacga agaatatgtc

3181 aggacagtca aaaaaaaaaa aaaaaaa

Protein sequence (isoform 1):

NCBI Reference Sequence: NP_000593.1

LOCUS NP_000593

ACCESSION NP_000593

1 mqmspaltcl vlglalvfge gsavhhppsy vahlasdfgv rvfqqvaqas kdrnvvfspy

61 gvasvlamlq lttggetqqq iqaamgfkid dkgmapalrh lykelmgpwn kdeisttdai

121 fvqrdlklvq gfmphffrlf rstvkqvdfs everarfiin dwvkthtkgm isnllgkgav

181 dqltrlvlvn alyfngqwkt pfpdssthrr lfhksdgstv svpmmaqtnk fnytefttpd

241 ghyydilelp yhgdtlsmfi aapyekevpl saltnilsaq lishwkgnmt rlprllvlpk

301 fsletevdlr kplenlgmtd mfrqfqadft slsdqeplhv aqalqkvkie vnesgtvass

361 stavivsarm apeeiimdrp flfvvrhnpt gtvlfmgqvm ep

Nucleotide sequence (isoform 2):

NCBI Reference Sequence: NM_001165413.2

LOCUS NM_001165413

ACCESSION NM_001165413

1 ggcccacaga ggagcacagc tgtgtttggc tgcagggcca agagcgctgt caagaagacc

61 cacacgcccc cctccagcag ctgaattcct gcagctcagc agccgccgcc agagcaggac

121 gaaccgccaa tcgcaaggca cctctgagaa cttcaggatg cagatgtctc cagccctcac

181 ctgcctagtc ctgggcctgg cccttgtctt tggtgaaggg tctgctgtgc accatccccc

241 atcctacgtg gcgcaggcct ccaaggaccg caacgtggtt ttctcaccct atggggtggc

301 ctcggtgttg gccatgctcc agctgacaac aggaggagaa acccagcagc agattcaagc

361 agctatggga ttcaagattg atgacaaggg catggccccc gccctccggc atctgtacaa

421 ggagctcatg gggccatgga acaaggatga gatcagcacc acagacgcga tcttcgtcca

481 gcgggatctg aagctggtcc agggcttcat gccccacttc ttcaggctgt tccggagcac

541 ggtcaagcaa gtggactttt cagaggtgga gagagccaga ttcatcatca atgactgggt

601 gaagacacac acaaaaggta tgatcagcaa cttgcttggg aaaggagccg tggaccagct

661 gacacggctg gtgctggtga atgccctcta cttcaacggc cagtggaaga ctcccttccc

721 cgactccagc acccaccgcc gcctcttcca caaatcagac ggcagcactg tctctgtgcc

781 catgatggct cagaccaaca agttcaacta tactgagttc accacgcccg atggccatta

841 ctacgacatc ctggaactgc cctaccacgg ggacaccctc agcatgttca ttgctgcccc

901 ttatgaaaaa gaggtgcctc tctctgccct caccaacatt ctgagtgccc agctcatcag

961 ccactggaaa ggcaacatga ccaggctgcc ccgcctcctg gttctgccca agttctccct

1021 ggagactgaa gtcgacctca ggaagcccct agagaacctg ggaatgaccg acatgttcag

1081 acagtttcag gctgacttca cgagtctttc agaccaagag cctctccacg tcgcgcaggc

1141 gctgcagaaa gtgaagatcg aggtgaacga gagtggcacg gtggcctcct catccacagc

1201 tgtcatagtc tcagcccgca tggcccccga ggagatcatc atggacagac ccttcctctt

1261 tgtggtccgg cacaacccca caggaacagt ccttttcatg ggccaagtga tggaaccctg

1321 accctgggga aagacgcctt catctgggac aaaactggag atgcatcggg aaagaagaaa

1381 ctccgaagaa aagaatttta gtgttaatga ctctttctga aggaagagaa gacatttgcc

1441 ttttgttaaa agatggtaaa ccagatctgt ctccaagacc ttggcctctc cttggaggac

1501 ctttaggtca aactccctag tctccacctg agaccctggg agagaagttt gaagcacaac

1561 tcccttaagg tctccaaacc agacggtgac gcctgcggga ccatctgggg cacctgcttc

1621 cacccgtctc tctgcccact cgggtctgca gacctggttc ccactgaggc cctttgcagg

1681 atggaactac ggggcttaca ggagcttttg tgtgcctggt agaaactatt tctgttccag

1741 tcacattgcc atcactcttg tactgcctgc caccgcggag gaggctggtg acaggccaaa

1801 ggccagtgga agaaacaccc tttcatctca gagtccactg tggcactggc cacccctccc

1861 cagtacaggg gtgctgcagg tggcagagtg aatgtccccc atcatgtggc ccaactctcc

1921 tggcctggcc atctccctcc ccagaaacag tgtgcatggg ttattttgga gtgtaggtga

1981 cttgtttact cattgaagca gatttctgct tccttttatt tttataggaa tagaggaaga

2041 aatgtcagat gcgtgcccag ctcttcaccc cccaatctct tggtggggag gggtgtacct

2101 aaatatttat catatccttg cccttgagtg cttgttagag agaaagagaa ctactaagga

2161 aaataatatt atttaaactc gctcctagtg tttctttgtg gtctgtgtca ccgtatctca

2221 ggaagtccag ccacttgact ggcacacacc cctccggaca tccagcgtga cggagcccac

2281 actgccacct tgtggccgcc tgagaccctc gcgccccccg cgcccctctt tttccccttg

2341 atggaaattg accatacaat ttcatcctcc ttcaggggat caaaaggacg gagtgggggg

2401 acagagactc agatgaggac agagtggttt ccaatgtgtt caatagattt aggagcagaa

2461 atgcaagggg ctgcatgacc taccaggaca gaactttccc caattacagg gtgactcaca

2521 gccgcattgg tgactcactt caatgtgtca tttccggctg ctgtgtgtga gcagtggaca

2581 cgtgaggggg gggtgggtga gagagacagg cagctcggat tcaactacct tagataatat

2641 ttctgaaaac ctaccagcca gagggtaggg cacaaagatg gatgtaatgc actttgggag

2701 gccaaggcgg gaggattgct tgagcccagg agttcaagac cagcctgggc aacataccaa

2761 gacccccgtc tctttaaaaa tatatatatt ttaaatatac ttaaatatat atttctaata

2821 tctttaaata tatatatata ttttaaagac caatttatgg gagaattgca cacagatgtg

2881 aaatgaatgt aatctaatag aagcctaatc agcccaccat gttctccact gaaaaatcct

2941 ctttctttgg ggtttttctt tctttctttt ttgattttgc actggacggt gacgtcagcc

3001 atgtacagga tccacagggg tggtgtcaaa tgctattgaa attgtgttga attgtatgct

3061 ttttcacttt tgataaataa acatgtaaaa atgtttcaaa aaaataataa aataaataaa

3121 tacgaagaat atgtcaggac agtcaaaaaa aaaaaaaaaa aa

Protein sequence (isoform 2):

NCBI Reference Sequence: NP_001158885.1

LOCUS NP_001158885

ACCESSION NP_001158885

1 mqmspaltcl vlglalvfge gsavhhppsy vaqaskdrnv vfspygvasv lamlqlttgg

61 etqqqiqaam gfkiddkgma palrhlykel mgpwnkdeis ttdaifvqrd lklvqgfmph

121 ffrlfrstvk qvdfsevera rfiindwvkt htkgmisnll gkgavdqltr lvlvnalyfn

181 gqwktpfpds sthrrlfhks dgstvsvpmm aqtnkfnyte fttpdghyyd ilelpyhgdt

241 lsmfiaapye kevplsaltn ilsaqlishw kgnmtrlprl lvlpkfslet evdlrkplen

301 lgmtdmfrqf qadftslsdq eplhvaqalq kvkievnesg tvassstavi vsarmapeei

361 imdrpflfvv rhnptgtvlf mgqvmep

MPR1

Official Symbol: IGF2R

Official Name: insulin-like growth factor 2 receptor

Gene ID: 3482

Organism: Homo sapiens

Other Aliases: CD222, CIMPR, M6P-R, MPR1, MPRI

Other Designations: 300 kDa mannose 6-phosphate receptor; CI Man-6-P receptor; CI-MPR; IGF-II receptor; M6P/IGF2 receptor; M6P/IGF2R; M6PR; MPR 300; cation-independent mannose-6 phosphate receptor; cation-independent mannose-6-phosphate receptor; insulin-like growth factor II receptor

Nucleotide sequence:

NCBI Reference Sequence: NM_000876.2

LOCUS NM_000876

ACCESSION NM_000876

1 cgagcccagt cgagccgcgc tcacctcggg ctcccgctcc gtctccacct ccgcctttgc

61 cctggcggcg cgaccccgtc ccgggcgcgg cccccagcag tcgcgcgccg ttagcctcgc

121 gcccgccgcg cagtccgggc ccggcgcgat gggggccgcc gccggccgga gcccccacct

181 ggggcccgcg cccgcccgcc gcccgcagcg ctctctgctc ctgctgcagc tgctgctgct

241 cgtcgctgcc ccggggtcca cgcaggccca ggccgccccg ttccccgagc tgtgcagtta

301 tacatgggaa gctgttgata ccaaaaataa tgtactttat aaaatcaaca tctgtggaag

361 tgtggatatt gtccagtgcg ggccatcaag tgctgtttgt atgcacgact tgaagacacg

421 cacttatcat tcagtgggtg actctgtttt gagaagtgca accagatctc tcctggaatt

481 caacacaaca gtgagctgtg accagcaagg cacaaatcac agagtccaga gcagcattgc

541 cttcctgtgt gggaaaaccc tgggaactcc tgaatttgta actgcaacag aatgtgtgca

601 ctactttgag tggaggacca ctgcagcctg caagaaagac atatttaaag caaataagga

661 ggtgccatgc tatgtgtttg atgaagagtt gaggaagcat gatctcaatc ctctgatcaa

721 gcttagtggt gcctacttgg tggatgactc cgatccggac acttctctat tcatcaatgt

781 ttgtagagac atagacacac tacgagaccc aggttcacag ctgcgggcct gtccccccgg

841 cactgccgcc tgcctggtaa gaggacacca ggcgtttgat gttggccagc cccgggacgg

901 actgaagctg gtgcgcaagg acaggcttgt cctgagttac gtgagggaag aggcaggaaa

961 gctagacttt tgtgatggtc acagccctgc ggtgactatt acatttgttt gcccgtcgga

1021 gcggagagag ggcaccattc ccaaactcac agctaaatcc aactgccgct atgaaattga

1081 gtggattact gagtatgcct gccacagaga ttacctggaa agtaaaactt gttctctgag

1141 cggcgagcag caggatgtct ccatagacct cacaccactt gcccagagcg gaggttcatc

1201 ctatatttca gatggaaaag aatatttgtt ttatttgaat gtctgtggag aaactgaaat

1261 acagttctgt aataaaaaac aagctgcagt ttgccaagtg aaaaagagcg atacctctca

1321 agtcaaagca gcaggaagat accacaatca gaccctccga tattcggatg gagacctcac

1381 cttgatatat tttggaggtg atgaatgcag ctcagggttt cagcggatga gcgtcataaa

1441 ctttgagtgc aataaaaccg caggtaacga tgggaaagga actcctgtat tcacagggga

1501 ggttgactgc acctacttct tcacatggga cacggaatac gcctgtgtta aggagaagga

1561 agacctcctc tgcggtgcca ccgacgggaa gaagcgctat gacctgtccg cgctggtccg

1621 ccatgcagaa ccagagcaga attgggaagc tgtggatggc agtcagacgg aaacagagaa

1681 gaagcatttt ttcattaata tttgtcacag agtgctgcag gaaggcaagg cacgagggtg

1741 tcccgaggac gcggcagtgt gtgcagtgga taaaaatgga agtaaaaatc tgggaaaatt

1801 tatttcctct cccatgaaag agaaaggaaa cattcaactc tcttattcag atggtgatga

1861 ttgtggtcat ggcaagaaaa ttaaaactaa tatcacactt gtatgcaagc caggtgatct

1921 ggaaagtgca ccagtgttga gaacttctgg ggaaggcggt tgcttttatg agtttgagtg

1981 gcacacagct gcggcctgtg tgctgtctaa gacagaaggg gagaactgca cggtctttga

2041 ctcccaggca gggttttctt ttgacttatc acctctcaca aagaaaaatg gtgcctataa

2101 agttgagaca aagaagtatg acttttatat aaatgtgtgt ggcccggtgt ctgtgagccc

2161 ctgtcagcca gactcaggag cctgccaggt ggcaaaaagt gatgagaaga cttggaactt

2221 gggtctgagt aatgcgaagc tttcatatta tgatgggatg atccaactga actacagagg

2281 cggcacaccc tataacaatg aaagacacac accgagagct acgctcatca cctttctctg

2341 tgatcgagac gcgggagtgg gcttccctga atatcaggaa gaggataact ccacctacaa

2401 cttccggtgg tacaccagct atgcctgccc ggaggagccc ctggaatgcg tagtgaccga

2461 cccctccacg ctggagcagt acgacctctc cagtctggca aaatctgaag gtggccttgg

2521 aggaaactgg tatgccatgg acaactcagg ggaacatgtc acgtggagga aatactacat

2581 taacgtgtgt cggcctctga atccagtgcc gggctgcaac cgatatgcat cggcttgcca

2641 gatgaagtat gaaaaagatc agggctcctt cactgaagtg gtttccatca gtaacttggg

2701 aatggcaaag accggcccgg tggttgagga cagcggcagc ctccttctgg aatacgtgaa

2761 tgggtcggcc tgcaccacca gcgatggcag acagaccaca tataccacga ggatccatct

2821 cgtctgctcc aggggcaggc tgaacagcca ccccatcttt tctctcaact gggagtgtgt

2881 ggtcagtttc ctgtggaaca cagaggctgc ctgtcccatt cagacaacga cggatacaga

2941 ccaggcttgc tctataaggg atcccaacag tggatttgtg tttaatctta atccgctaaa

3001 cagttcgcaa ggatataacg tctctggcat tgggaagatt tttatgttta atgtctgcgg

3061 cacaatgcct gtctgtggga ccatcctggg aaaacctgct tctggctgtg aggcagaaac

3121 ccaaactgaa gagctcaaga attggaagcc agcaaggcca gtcggaattg agaaaagcct

3181 ccagctgtcc acagagggct tcatcactct gacctacaaa gggcctctct ctgccaaagg

3241 taccgctgat gcttttatcg tccgctttgt ttgcaatgat gatgtttact cagggcccct

3301 caaattcctg catcaagata tcgactctgg gcaagggatc cgaaacactt actttgagtt

3361 tgaaaccgcg ttggcctgtg ttccttctcc agtggactgc caagtcaccg acctggctgg

3421 aaatgagtac gacctgactg gcctaagcac agtcaggaaa ccttggacgg ctgttgacac

3481 ctctgtcgat gggagaaaga ggactttcta tttgagcgtt tgcaatcctc tcccttacat

3541 tcctggatgc cagggcagcg cagtggggtc ttgcttagtg tcagaaggca atagctggaa

3601 tctgggtgtg gtgcagatga gtccccaagc cgcggcgaat ggatctttga gcatcatgta

3661 tgtcaacggt gacaagtgtg ggaaccagcg cttctccacc aggatcacgt ttgagtgtgc

3721 tcagatatcg ggctcaccag catttcagct tcaggatggt tgtgagtacg tgtttatctg

3781 gagaactgtg gaagcctgtc ccgttgtcag agtggaaggg gacaactgtg aggtgaaaga

3841 cccaaggcat ggcaacttgt atgacctgaa gcccctgggc ctcaacgaca ccatcgtgag

3901 cgctggcgaa tacacttatt acttccgggt ctgtgggaag ctttcctcag acgtctgccc

3961 cacaagtgac aagtccaagg tggtctcctc atgtcaggaa aagcgggaac cgcagggatt

4021 tcacaaagtg gcaggtctcc tgactcagaa gctaacttat gaaaatggct tgttaaaaat

4081 gaacttcacg gggggggaca cttgccataa ggtttatcag cgctccacag ccatcttctt

4141 ctactgtgac cgcggcaccc agcggccagt atttctaaag gagacttcag attgttccta

4201 cttgtttgag tggcgaacgc agtatgcctg cccacctttc gatctgactg aatgttcatt

4261 caaagatggg gctggcaact ccttcgacct ctcgtccctg tcaaggtaca gtgacaactg

4321 ggaagccatc actgggacgg gggacccgga gcactacctc atcaatgtct gcaagtctct

4381 ggccccgcag gctggcactg agccgtgccc tccagaagca gccgcgtgtc tgctgggtgg

4441 ctccaagccc gtgaacctcg gcagggtaag ggacggacct cagtggagag atggcataat

4501 tgtcctgaaa tacgttgatg gcgacttatg tccagatggg attcggaaaa agtcaaccac

4561 catccgattc acctgcagcg agagccaagt gaactccagg cccatgttca tcagcgccgt

4621 ggaggactgt gagtacacct ttgcctggcc cacagccaca gcctgtccca tgaagagcaa

4681 cgagcatgat gactgccagg tcaccaaccc aagcacagga cacctgtttg atctgagctc

4741 cttaagtggc agggcgggat tcacagctgc ttacagcgag aaggggttgg tttacatgag

4801 catctgtggg gagaatgaaa actgccctcc tggcgtgggg gcctgctttg gacagaccag

4861 gattagcgtg ggcaaggcca acaagaggct gagatacgtg gaccaggtcc tgcagctggt

4921 gtacaaggat gggtcccctt gtccctccaa atccggcctg agctataaga gtgtgatcag

4981 tttcgtgtgc aggcctgagg ccgggccaac caataggccc atgctcatct ccctggacaa

5041 gcagacatgc actctcttct tctcctggca cacgccgctg gcctgcgagc aagcgaccga

5101 atgttccgtg aggaatggaa gctctattgt tgacttgtct ccccttattc atcgcactgg

5161 tggttatgag gcttatgatg agagtgagga tgatgcctcc gataccaacc ctgatttcta

5221 catcaatatt tgtcagccac taaatcccat gcacggagtg ccctgtcctg ccggagccgc

5281 tgtgtgcaaa gttcctattg atggtccccc catagatatc ggccgggtag caggaccacc

5341 aatactcaat ccaatagcaa atgagattta cttgaatttt gaaagcagta ctccttgctt

5401 agcggacaag catttcaact acacctcgct catcgcgttt cactgtaaga gaggtgtgag

5461 catgggaacg cctaagctgt taaggaccag cgagtgcgac tttgtgttcg aatgggagac

5521 tcctgtcgtc tgtcctgatg aagtgaggat ggatggctgt accctgacag atgagcagct

5581 cctctacagc ttcaacttgt ccagcctttc cacgagcacc tttaaggtga ctcgcgactc

5641 gcgcacctac agcgttgggg tgtgcacctt tgcagtcggg ccagaacaag gaggctgtaa

5701 ggacggagga gtctgtctgc tctcaggcac caagggggca tcctttggac ggctgcaatc

5761 aatgaaactg gattacaggc accaggatga agcggtcgtt ttaagttacg tgaatggtga

5821 tcgttgccct ccagaaaccg atgacggcgt cccctgtgtc ttccccttca tattcaatgg

5881 gaagagctac gaggagtgca tcatagagag cagggcgaag ctgtggtgta gcacaactgc

5941 ggactacgac agagaccacg agtggggctt ctgcagacac tcaaacagct accggacatc

6001 cagcatcata tttaagtgtg atgaagatga ggacattggg aggccacaag tcttcagtga

6061 agtgcgtggg tgtgatgtga catttgagtg gaaaacaaaa gttgtctgcc ctccaaagaa

6121 gttggagtgc aaattcgtcc agaaacacaa aacctacgac ctgcggctgc tctcctctct

6181 caccgggtcc tggtccctgg tccacaacgg agtctcgtac tatataaatc tgtgccagaa

6241 aatatataaa gggcccctgg gctgctctga aagggccagc atttgcagaa ggaccacaac

6301 tggtgacgtc caggtcctgg gactcgttca cacgcagaag ctgggtgtca taggtgacaa

6361 agttgttgtc acgtactcca aaggttatcc gtgtggtgga aataagaccg catcctccgt

6421 gatagaattg acctgtacaa agacggtggg cagacctgca ttcaagaggt ttgatatcga

6481 cagctgcact tactacttca gctgggactc ccgggctgcc tgcgccgtga agcctcagga

6541 ggtgcagatg gtgaatggga ccatcaccaa ccctataaat ggcaagagct tcagcctcgg

6601 agatatttat tttaagctgt tcagagcctc tggggacatg aggaccaatg gggacaacta

6661 cctgtatgag atccaacttt cctccatcac aagctccaga aacccggcgt gctctggagc

6721 caacatatgc caggtgaagc ccaacgatca gcacttcagt cggaaagttg gaacctctga

6781 caagaccaag tactaccttc aagacggcga tctcgatgtc gtgtttgcct cttcctctaa

6841 gtgcggaaag gataagacca agtctgtttc ttccaccatc ttcttccact gtgaccctct

6901 ggtggaggac gggatccccg agttcagtca cgagactgcc gactgccagt acctcttctc

6961 ttggtacacc tcagccgtgt gtcctctggg ggtgggcttt gacagcgaga atcccgggga

7021 cgacgggcag atgcacaagg ggctgtcaga acggagccag gcagtcggcg cggtgctcag

7081 cctgctgctg gtggcgctca cctgctgcct gctggccctg ttgctctaca agaaggagag

7141 gagggaaaca gtgataagta agctgaccac ttgctgtagg agaagttcca acgtgtccta

7201 caaatactca aaggtgaata aggaagaaga gacagatgag aatgaaacag agtggctgat

7261 ggaagagatc cagctgcctc ctccacggca gggaaaggaa gggcaggaga acggccatat

7321 taccaccaag tcagtgaaag ccctcagctc cctgcatggg gatgaccagg acagtgagga

7381 tgaggttctg accatcccag aggtgaaagt tcactcgggc aggggagctg gggcagagag

7441 ctcccaccca gtgagaaacg cacagagcaa tgcccttcag gagcgtgagg acgatagggt

7501 ggggctggtc aggggtgaga aggcgaggaa agggaagtcc agctctgcac agcagaagac

7561 agtgagctcc accaagctgg tgtccttcca tgacgacagc gacgaggacc tcttacacat

7621 ctgactccgc agtgcctgca ggggagcacg gagccgcggg acagccaagc acctccaacc

7681 aaataagact tccactcgat gatgcttcta taattttgcc tttaacagaa actttcaaaa

7741 gggaagagtt tttgtgatgg gggagagggt gaaggaggtc aggccccact ccttcctgat

7801 tgtttacagt cattggaata aggcatggct cagatcggcc acagggcggt accttgtgcc

7861 cagggttttg ccccaagtcc tcatttaaaa gcataaggcc ggacgcatct caaaacagag

7921 ggctgcattc gaagaaaccc ttgctgcttt agtcccgata gggtatttga ccccgatata

7981 ttttagcatt ttaattctct ccccctattt attgactttg acaattactc aggtttgaga

8041 aaaaggaaaa aaaaacagcc accgtttctt cctgccagca ggggtgtgat gtaccagttt

8101 gtccatcttg agatggtgag gctgtcagtg tatggggcag cttccggcgg gatgttgaac

8161 tgtgtcccct gagttggagc tcattctgtc tcttttctct tttgctttct

8221 gtttctta ggcacacaca cgtgcgtgcg agcacacaca cacatacgtg cacagggtcc

8281 ccgagtgc aggttttgga gagtttgcct gttctatgcc tttagtcagg aatggctgca

8341 cctttttg tgatatcttc aagcctgggc gtacagagca catttgtcag tatttttgcc

8401 ggctggtg ttcaaacaac ctgcccaaag attgatttgt gtgtttgtgt gtgtgtgtgt

8461 gtgtgtgt gtgtgtgtga gtggagttga ggtgtcagag aaaatgaatt ttttccagat

8521 ttggggta ggtctcatct cttcaggttc tcatgatacc acctttactg tgcttatttt

8581 tttaagaa aagtgttgat caaccattcg acctataaga agccttaatt tgcacagtgt

8641 gtgactta gaaactgcat gaaaaatcat gggccagagc ctcggcccta gcattgcact

8701 tggcctca ctggagggag gctgggcggg tacagcgcgg aggaggaggg aggccaggcg

8761 ggcatggc ggaggaggag ggaggccggg cggtcacagc atggaggagg agggaggcgc

8821 tgctggtg cttattctgg cggcagcgcc tttcctgcca tgtttagtga atgacttttc

8881 tcgcattg gaattgtata tagactctgg tgttctattg ctgagaagca aaccgccctg

8941 cagcatcc cagcctgtac cggtttggct ggcttgtttg atttcaacat gagtgtattt

9001 tttaaaat atttttctct tcattttttt ttcaatcaac tttactgtaa tataaagtat

9061 tcaacaattt caataaaaga taaattatta aaa

Protein sequence:

NCBI Reference Sequence: NP_000867.2

LOCUS NP_000867

ACCESSION NP_000867

1 mgaaagrsph lgpaparrpq rsllllqlll lvaapgstqa qaapfpelcs ytweavdtkn

61 nvlykinicg svdivqcgps savcmhdlkt rtyhsvgdsv lrsatrslle fnttvscdqq

121 gtnhrvqssi aflcgktlgt pefvtatecv hyfewrttaa ckkdifkank evpcyvfdee

181 lrkhdlnpli klsgaylvdd sdpdtslfin vcrdidtlrd pgsqlracpp gtaaclvrgh

241 qafdvgqprd glklvrkdrl vlsyvreeag kldfcdghsp avtitfvcps erregtipkl

301 taksncryei ewiteyachr dylesktcsl sgeqqdvsid ltplaqsggs syisdgkeyl

361 fylnvcgete iqfcnkkqaa vcqvkksdts qvkaagryhn qtlrysdgdl tliyfggdec

421 ssgfqrmsvi nfecnktagn dgkgtpvftg evdctyfftw dteyacvkek edllcgatdg

481 kkrydlsalv rhaepeqnwe avdgsqtete kkhffinich rvlqegkarg cpedaavcav

541 dkngsknlgk fisspmkekg niqlsysdgd dcghgkkikt nitlvckpgd lesapvlrts

601 geggcfyefe whtaaacvls ktegenctvf dsqagfsfdl spltkkngay kvetkkydfy

661 invcgpvsvs pcqpdsgacq vaksdektwn lglsnaklsy ydgmiqlnyr ggtpynnerh

721 tpratlitfl cdrdagvgfp eyqeednsty nfrwytsyac peeplecvvt dpstleqydl

781 sslakseggl ggnwyamdns gehvtwrkyy invcrplnpv pgcnryasac qmkyekdqgs

841 ftevvsisnl gmaktgpvve dsgsllleyv ngsacttsdg rqttyttrih lvcsrgrlns

901 hpifslnwec vvsflwntea acpiqtttdt dqacsirdpn sgfvfnlnpl nssqgynvsg

961 igkifmfnvc gtmpvcgtil gkpasgceae tqteelknwk parpvgieks lqlstegfit

1021 ltykgplsak gtadafivrf vcnddvysgp lkflhqdids gqgirntyfe fetalacvps

1081 pvdcqvtdla gneydltgls tvrkpwtavd tsvdgrkrtf ylsvcnplpy ipgcqgsavg

1141 sclvsegnsw nlgvvqmspq aaangslsim yvngdkcgnq rfstritfec aqisgspafq

1201 lqdgceyvfi wrtveacpvv rvegdncevk dprhgnlydl kplglndtiv sageytyyfr

1261 vcgklssdvc ptsdkskvvs scqekrepqg fhkvaglltq kltyengllk mnftggdtch

1321 kvyqrstaif fycdrgtqrp vflketsdcs ylfewrtqya cppfdltecs fkdgagnsfd

1381 lsslsrysdn weaitgtgdp ehylinvcks lapqagtepc ppeaaacllg gskpvnlgrv

1441 rdgpqwrdgi ivlkyvdgdl cpdgirkkst tirftcsesq vnsrpmfisa vedceytfaw

1501 ptatacpmks nehddcqvtn pstghlfdls slsgragfta aysekglvym sicgenencp

1561 pgvgacfgqt risvgkankr lryvdqvlql vykdgspcps ksglsyksvi sfvcrpeagp

1621 tnrpmlisld kqtctlffsw htplaceqat ecsvrngssi vdlsplihrt ggyeaydese

1681 ddasdtnpdf yinicqplnp mhgvpcpaga avckvpidgp pidigrvagp pilnpianei

1741 ylnfesstpc ladkhfnyts liafhckrgv smgtpkllrt secdfvfewe tpvvcpdevr

1801 mdgctltdeq llysfnlssl ststfkvtrd srtysvgvct favgpeqggc kdggvcllsg

1861 tkgasfgrlq smkldyrhqd eavvlsyvng drcppetddg vpcvfpfifn gksyeeciie

1921 sraklwcstt adydrdhewg fcrhsnsyrt ssiifkcded edigrpqvfs evrgcdvtfe

1981 wktkvvcppk kleckfvqkh ktydlrllss ltgswslvhn gvsyyinlcq kiykgplgcs

2041 erasicrrtt tgdvqvlglv htqklgvigd kvvvtyskgy pcggnktass vieltctktv

2101 grpafkrfdi dsctyyfswd sraacavkpq evqmvngtit npingksfsi gdiyfklfra

2161 sgdmrtngdn ylyeiqlssi tssrnpacsg anicqvkpnd qhfsrkvgts dktkyylqdg

2221 dldvvfasss kcgkdktksv sstiffhcdp lvedgipefs hetadcqylf swytsavcpl

2281 gvgfdsenpg ddgqmhkgls ersqavgavl slllvaltcc llalllykke rretvisklt

2341 tccrrssnvs ykyskvnkee etdenetewl meeiqlpppr qgkegqengh ittksvkals

2401 slhgddqdse devltipevk vhsgrgagae sshpvrnaqs nalqereddr vglvrgekar

2461 kgksssaqqk tvsstklvsf hddsdedllh i
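The nucleotide and protein listings in this section use the NCBI flat-file layout: each line opens with the 1-based position of its first residue, followed by blocks of ten residues. Because every label must equal one more than the number of residues already read, the layout can be checked mechanically. The sketch below is a minimal illustration under that assumption; the function name `parse_numbered_sequence` is hypothetical and is not part of the source listing or of any NCBI tool. It joins a listing into one string and raises an error when a position label does not follow from the preceding residues, which flags dropped blocks or fused lines of the kind produced by text extraction.

```python
import re


def parse_numbered_sequence(text: str) -> str:
    """Join a numbered sequence listing into one lowercase string.

    Assumes (hypothetically) that every token is either a 1-based position
    label or a block of ASCII residue letters. Two lines fused onto one
    physical line are still accepted, because each label is checked where
    it occurs.
    """
    blocks: list[str] = []
    length = 0  # residues read so far
    for token in text.split():
        if token.isdigit():
            # A position label must point at the next unread residue.
            start = int(token)
            if start != length + 1:
                raise ValueError(f"expected position {length + 1}, found {start}")
        elif re.fullmatch(r"[A-Za-z]+", token):
            blocks.append(token.lower())
            length += len(token)
        else:
            # Non-ASCII debris or placeholders are rejected, not silently kept.
            raise ValueError(f"unexpected token {token!r}")
    return "".join(blocks)
```

For example, `parse_numbered_sequence("1 acgtacgtac gtacgtacgt\n21 aaaa")` returns a 24-residue string, whereas a listing with a missing block fails at the next position label.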

1A69

Official Symbol: HLA-A

Official Name: major histocompatibility complex, class I, A

Gene ID: 3105

Organism: Homo sapiens

Other Aliases: DAQB-90C11.16-002, HLAA

Other Designations: HLA class I histocompatibility antigen, A-1 alpha chain; MHC class I antigen HLA-A heavy chain; antigen presenting molecule; leukocyte antigen class I-A

Nucleotide sequence (variant 1):

NCBI Reference Sequence: NM_002116.7

LOCUS NM_002116

ACCESSION NM_002116

1 gagaagccaa tcagtgtcgt cgcggtcgct gttctaaagc ccgcacgcac ccaccgggac

61 tcagattctc cccagacgcc gaggatggcc gtcatggcgc cccgaaccct cctcctgcta

121 ctctcggggg ccctggccct gacccagacc tgggcgggct cccactccat gaggtatttc

181 ttcacatccg tgtcccggcc cggccgcggg gagccccgct tcatcgccgt gggctacgtg

241 gacgacacgc agttcgtgcg gttcgacagc gacgccgcga gccagaggat ggagccgcgg

301 gcgccgtgga tagagcagga ggggccggag tattgggacc aggagacacg gaatgtgaag

361 gcccagtcac agactgaccg agtggacctg gggaccctgc gcggctacta caaccagagc

421 gaggccggtt ctcacaccat ccagataatg tatggctgcg acgtggggtc ggacgggcgc

481 ttcctccgcg ggtaccggca ggacgcctac gacggcaagg attacatcgc cctgaacgag

541 gacctgcgct cttggaccgc ggcggacatg gcggctcaga tcaccaagcg caagtgggag

601 gcggcccatg aggcggagca gttgagagcc tacctggatg gcacgtgcgt ggagtggctc

661 cgcagatacc tggagaacgg gaaggagacg ctgcagcgca cggacccccc caagacacat

721 atgacccacc accccatctc tgaccatgag gccaccctga ggtgctgggc cctgggcttc

781 taccctgcgg agatcacact gacctggcag cgggatgggg aggaccagac ccaggacacg

841 [...] agaccaggcc tgcaggggat ggaaccttcc [...] ggctgtggtg

901 [...] gagaggagca gagatacacc tgccatgtgc [...] tctgcccaag

961 [...] tgagatggga gctgtcttcc cagcccacca [...] gggcatcatt

1021 [...] ttctccttgg agctgtgatc actggagctg [...] cgtgatgtgg

1081 [...] gctcagatag aaaaggaggg agttacactc [...] cagtgacagt

1141 [...] ctgatgtgtc cctcacagct tgtaaagtgt [...] ccttgtgtgg

1201 [...] caagagttgt tcctgccctt ccctttgtga [...] cctgactttg

1261 [...] ggcacctgca tgtgtctgtg ttcgtgtagg [...] ggaggtgggg

1321 [...] acccccatgt ccaccatgac cctcttccca [...] tgctccctcc

1381 [...] ttcctgttcc agagaggtgg ggctgaggtg [...] tgtctcaact

1441 [...] ctgagctgta acttcttcct tccctattaa [...] ttagtataaa

1501 [...] caaattcttg ccatgagagg ttgatgagtt [...] gaagattcct

1561 [...] agacaaaata aatggaagac atgagaacct [...] aaaaaaaaaa

1621 aaaaaaaaaa aaaaaa

Protein sequence (variant 1):

NCBI Reference Sequence: NP_002107.3

LOCUS NP_002107

ACCESSION NP_002107

1 mavmaprtll lllsgalalt qtwagshsmr yfftsvsrpg rgeprfiavg yvddtqfvrf

61 dsdaasqrme prapwieqeg peywdqetrn vkaqsqtdrv dlgtlrgyyn qseagshtiq

121 imygcdvgsd grflrgyrqd aydgkdyial nedlrswtaa dmaaqitkrk weaaheaeql

181 rayldgtcve wlrrylengk etlqrtdppk thmthhpisd heatlrcwal gfypaeitlt

241 wqrdgedqtq dtelvetrpa gdgtfqkwaa vvvpsgeeqr ytchvqhegl pkpltlrwel

301 ssqptipivg iiaglvllga vitgavvaav mwrrkssdrk ggsytqaass dsaqgsdvsl

361 tackv

Nucleotide sequence (variant 2):

NCBI Reference Sequence: NM_001242758.1

LOCUS NM_001242758

ACCESSION NM_001242758

1 gagaagccaa tcagtgtcgt cgcggtcgct gttctaaagt ccgcacgcac ccaccgggac

61 tcagattctc cccagacgcc gaggatggcc gtcatggcgc cccgaaccct cctcctgcta

121 ctctcggggg ccctggccct gacccagacc tgggcgggct cccactccat gaggtatttc

181 ttcacatccg tgtcccggcc cggccgcggg gagccccgct tcatcgccgt gggctacgtg

241 gacgacacgc agttcgtgcg gttcgacagc gacgccgcga gccagaagat ggagccgcgg

301 gcgccgtgga tagagcagga ggggccggag tattgggacc aggagacacg gaatatgaag

361 gcccactcac agactgaccg agcgaacctg gggaccctgc gcggctacta caaccagagc

421 gaggacggtt ctcacaccat ccagataatg tatggctgcg acgtggggcc ggacgggcgc

481 ttcctccgcg ggtaccggca ggacgcctac gacggcaagg attacatcgc cctgaacgag

541 gacctgcgct cttggaccgc ggcggacatg gcagctcaga tcaccaagcg caagtgggag

601 gcggtccatg cggcggagca gcggagagtc tacctggagg gccggtgcgt ggacgggctc

661 cgcagatacc tggagaacgg gaaggagacg ctgcagcgca cggacccccc caagacacat

721 atgacccacc accccatctc tgaccatgag gccaccctga ggtgctgggc cctgggcttc

781 taccctgcgg agatcacact gacctggcag cgggatgggg aggaccagac ccaggacacg

841 gagctcgtgg agaccaggcc tgcaggggat ggaaccttcc agaagtgggc ggctgtggtg

901 gtgccttctg gagaggagca gagatacacc tgccatgtgc agcatgaggg tctgcccaag

961 cccctcaccc tgagatggga gctgtcttcc cagcccacca tccccatcgt gggcatcatt

1021 gctggcctgg ttctccttgg agctgtgatc actggagctg tggtcgctgc cgtgatgtgg

1081 aggaggaaga gctcagatag aaaaggaggg agttacactc aggctgcaag cagtgacagt

1141 gcccagggct ctgatgtgtc tctcacagct tgtaaagtgt gagacagctg ccttgtgtgg

1201 gactgagagg caagagttgt tcctgccctt ccctttgtga cttgaagaac cctgactttg

1261 tttctgcaaa ggcacctgca tgtgtctgtg ttcgtgtagg cataatgtga ggaggtgggg

1321 agagcacccc acccccatgt ccaccatgac cctcttccca cgctgacctg tgctccctct

1381 ccaatcatct ttcctgttcc agagaggtgg ggctgaggtg tctccatctc tgtctcaact

1441 tcatggtgca ctgagctgta acttcttcct tccctattaa aattagaacc tgagtataaa

1501 tttactttct caaattcttg ccatgagagg ttgatgagtt aattaaagga gaagattcct

1561 aaaatttgag agacaaaatt aatggaacgc atgagaacct tccagagtcc a

Protein sequence (variant 2):

NCBI Reference Sequence: NP_001229687.1

LOCUS NP_001229687

ACCESSION NP_001229687

1 mavmaprtll lllsgalalt qtwagshsmr yfftsvsrpg rgeprfiavg yvddtqfvrf

61 dsdaasqkme prapwieqeg peywdqetrn mkahsqtdra nlgtlrgyyn qsedgshtiq

121 imygcdvgpd grflrgyrqd aydgkdyial nedlrswtaa dmaaqitkrk weavhaaeqr

181 rvylegrcvd glrrylengk etlqrtdppk thmthhpisd heatlrcwal gfypaeitlt

241 wqrdgedqtq dtelvetrpa gdgtfqkwaa vvvpsgeeqr ytchvqhegl pkpltlrwel

301 ssqptipivg iiaglvllga vitgavvaav mwrrkssdrk ggsytqaass dsaqgsdvsl

361 tackv

P4HA2

Official Symbol: P4HA2

Official Name: prolyl 4-hydroxylase, alpha polypeptide II

Gene ID: 8974

Organism: Homo sapiens

Other Aliases: UNQ290/PRO330

Other Designations: 4-PH alpha 2; 4-PH alpha-2; C-P4Halpha(II); collagen prolyl 4-hydroxylase alpha(II); procollagen-proline, 2-oxoglutarate 4-dioxygenase (proline 4-hydroxylase), alpha polypeptide II; procollagen-proline,2-oxoglutarate-4-dioxygenase subunit alpha-2; prolyl 4-hydroxylase subunit alpha-2

Nucleotide sequence (variant 1):

NCBI Reference Sequence: NM_004199.2

LOCUS NM_004199

ACCESSION NM_004199

1 agcgttgttt ttccttggca gctgcggaga cccgtgataa ttcgttaact aattcaacaa

61 acgggaccct tctgtgtgcc agaaaccgca agcagttgct aacccagtgg gacaggcgga

121 ttggaagagc gggaaggtcc tggcccagag cagtgtggtg agcgctgtgc tggaagggaa

181 tgcgggcagt gggtacttgg tagagcactg actgcctccg gccagaggac ttcccggagg

241 aggtgaccca tgagctggag tggtcagagg aaggctggca aaagggcatc gtggacagag

301 gaacagccta tgtgagtggg agcagagacc ttggccaatg ccattcctta tggccttgta

361 gtggaagcaa ggtgatgggg aaggaacact gtaggggata gctgtccacg gacgctgtct

421 acaagaccct ggagtgagat aacgtgcctg gtactgtgcc ctgcatgtgt aagatgccca

481 gttgaccttc gcagcaggag cctggatcag ggcacttcct gcctcaggta ttgctggaca

541 gcccagacac ttccctctgt gaccatgaaa ctctgggtgt ctgcattgct gatggcctgg

601 tttggtgtcc tgagctgtgt gcaggccgaa ttcttcacct ctattgggca catgactgac

661 ctgatttatg cagagaaaga gctggtgcag tctctgaaag agtacatcct tgtggaggaa

721 gccaagcttt ccaagattaa gagctgggcc aacaaaatgg aagccttgac tagcaagtca

781 gctgctgatg ctgagggcta cctggctcac cctgtgaatg cctacaaact ggtgaagcgg

841 ctaaacacag actggcctgc gctggaggac cttgtcctgc aggactcagc tgcaggtttt

901 atcgccaacc tctctgtgca gcggcagttc ttccccactg atgaggacga gataggagct

961 gccaaagccc tgatgagact tcaggacaca tacaggctgg acccaggcac aatttccaga

1021 ggggaacttc caggaaccaa gtaccaggca atgctgagtg tggatgactg ctttgggatg

1081 ggccgctcgg cctacaatga aggggactat tatcatacgg tgttgtggat ggagcaggtg

1141 ctaaagcagc ttgatgccgg ggaggaggcc accacaacca agtcacaggt gctggactac

1201 ctcagctatg ctgtcttcca gttgggtgat ctgcaccgtg ccctggagct cacccgccgc

1261 ctgctctccc ttgacccaag ccacgaacga gctggaggga atctgcggta ctttgagcag

1321 ttattggagg aagagagaga aaaaacgtta acaaatcaga cagaagctga gctagcaacc

1381 ccagaaggca tctatgagag gcctgtggac tacctgcctg agagggatgt ttacgagagc

1441 ctctgtcgtg gggagggtgt caaactgaca ccccgtagac agaagaggct tttctgtagg

1501 taccaccatg gcaacagggc cccacagctg ctcattgccc ccttcaaaga ggaggacgag

1561 tgggacagcc cgcacatcgt caggtactac gatgtcatgt ctgatgagga aatcgagagg

1621 atcaaggaga tcgcaaaacc taaacttgca cgagccaccg ttcgtgatcc caagacagga

1681 gtcctcactg tcgccagcta ccgggtttcc aaaagctcct ggctagagga agatgatgac

1741 cctgttgtgg cccgagtaaa tcgtcggatg cagcatatca cagggttaac agtaaagact

1801 gcagaattgt tacaggttgc aaattatgga gtgggaggac agtatgaacc gcacttcgac

1861 ttctctagga atgatgagcg agatactttc aagcatttag ggacggggaa tcgtgtggct

1921 actttcttaa actacatgag tgatgtagaa gctggtggtg ccaccgtctt ccctgatctg

1981 ggggctgcaa tttggcctaa gaagggtaca gctgtgttct ggtacaacct cttgcggagc

2041 ggggaaggtg actaccgaac aagacatgct gcctgccctg tgcttgtggg ctgcaagtgg

2101 gtctccaata agtggttcca tgaacgagga caggagttct tgagaccttg tggatcaaca

2161 gaagttgact gacatccttt tctgtccttc cccttcctgg tccttcagcc catgtcaacg

2221 tgacagacac ctttgtatgt tcctttgtat gttcctatca ggctgatttt tggagaaatg

2281 aatgtttgtc tggagcagag ggagaccata ctagggcgac tcctgtgtga ctgaagtccc

2341 agcccttcca ttcagcctgt gccatccctg gccccaaggc taggatcaaa gtggctgcag

2401 cagagttagc tgtctagcgc ctagcaaggt gcctttgtac ctcaggtgtt ttaggtgtga

2461 gatgtttcag tgaaccaaag ttctgatacc ttgtttacat gtttgttttt atggcatttc

2521 tatctattgt ggctttacca aaaaataaaa tgtccctacc agaagcctta aaaaaaaaaa

2581 aaaaaaaa

Protein sequence (variant 1):

NCBI Reference Sequence: NP_004190.1

LOCUS NP_004190

ACCESSION NP_004190

1 mklwvsallm awfgvlscvq aefftsighm tdliyaekel vqslkeyilv eeaklskiks

61 wankmealts ksaadaegyl ahpvnayklv krlntdwpal edlvlqdsaa gfianlsvqr

121 qffptdedei gaakalmrlq dtyrldpgti srgelpgtky qamlsvddcf gmgrsayneg

181 dyyhtvlwme qvlkqldage eatttksqvl dylsyavfql gdlhralelt rrllsldpsh

241 eraggnlryf eqlleeerek tltnqteael atpegiyerp vdylperdvy eslcrgegvk

301 ltprrqkrlf cryhhgnrap qlliapfkee dewdsphivr yydvmsdeei erikeiakpk

361 laratvrdpk tgvltvasyr vsksswleed ddpvvarvnr rmqhitgltv ktaellqvan

421 ygvggqyeph fdfsrnderd tfkhlgtgnr vatflnymsd veaggatvfp dlgaaiwpkk

481 gtavfwynll rsgegdyrtr haacpvlvgc kwvsnkwfhe rgqeflrpcg stevd

Nucleotide sequence (variant 2):

NCBI Reference Sequence: NM_001017973.1

LOCUS NM_001017973

ACCESSION NM_001017973

1 agcgttgttt ttccttggca gctgcggaga cccgtgataa ttcgttaact aattcaacaa

61 acgggaccct tctgtgtgcc agaaaccgca agcagttgct aacccagtgg gacaggcgga

121 ttggaagagc gggaaggtcc tggcccagag cagtgtggtg agcgctgtgc tggaagggaa

181 tgcgggcagt gggtacttgg tagagcactg actgcctccg gccagaggac ttcccggagg

241 aggtgaccca tgagctggag tggtcagagg aaggctggca aaagggcatc gtggacagag

301 gaacagccta tgtgagtggg agcagagacc ttggccaatg ccattcctta tggccttgta

361 gtggaagcaa ggtgatgggg aaggaacact gtaggggata gctgtccacg gacgctgtct

421 acaagaccct ggagtgagat aacgtgcctg gtactgtgcc ctgcatgtgt aagatgccca

481 gttgaccttc gcagcaggag cctggatcag ggcacttcct gcctcaggta ttgctggaca