Title:
IDENTIFYING HIGH RISK CLINICALLY ISOLATED SYNDROME PATIENTS
Document Type and Number:
WIPO Patent Application WO/2010/011965
Kind Code:
A2
Abstract:
Disclosed herein are methods and kits for identifying clinically isolated syndrome (CIS) patients at high risk of developing multiple sclerosis (MS).

Inventors:
CORVOL JEAN-CHRISTOPHE (FR)
PELLETIER DANIEL (US)
HAUSER STEPHEN L (US)
OKSENBERG JORGE R (US)
BARANZINI SERGIO (US)
Application Number:
PCT/US2009/051750
Publication Date:
January 28, 2010
Filing Date:
July 24, 2009
Assignee:
UNIV CALIFORNIA (US)
CORVOL JEAN-CHRISTOPHE (FR)
PELLETIER DANIEL (US)
HAUSER STEPHEN L (US)
OKSENBERG JORGE R (US)
BARANZINI SERGIO (US)
International Classes:
G01N33/50; C12Q1/68
Domestic Patent References:
WO2005051988A2, 2005-06-09
Attorney, Agent or Firm:
JENKINS, Kenneth, E. et al. (Two Embarcadero Center, 8th Floor, San Francisco CA, US)
Claims:
WHAT IS CLAIMED IS:

1. A method of identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), said method comprising: detecting the level of expression of a marker gene within said patient, wherein said marker gene is a marker gene set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or said marker gene comprises a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021; and comparing the level of expression of said marker gene to a standard control whereby a differential expression of said marker gene relative to said standard control indicates that said patient is at high risk of developing multiple sclerosis.

2. The method of claim 1, wherein said marker gene is a marker gene set forth in Table 18.

3. The method of claim 1, wherein said marker gene is a marker gene set forth in Table 19.

4. The method of claim 1, wherein said patient at high risk of developing MS is a patient with CIS that will develop MS within two years of being initially diagnosed with CIS.

5. The method of claim 1, wherein said marker gene is ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2), SDAD1 (SEQ ID NO:116), USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) or BANF1 (SEQ ID NO:999).

6. The method of claim 1, wherein said marker gene is ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:997), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2) or SDAD1 (SEQ ID NO:116).

7. The method of claim 1, wherein said marker gene is USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) or BANF1 (SEQ ID NO:999).

8. The method of claim 1, wherein said marker gene is C17orf65 (SEQ ID NO:977), C4orf10 (SEQ ID NO:1005), FAM98A (SEQ ID NO:1020), TLE1 (SEQ ID NO:844), INHBC (SEQ ID NO:993), NAPA (SEQ ID NO:995), TKT (SEQ ID NO:994), TPT1 (SEQ ID NO:138), FLJ20054 (SEQ ID NO:11), KIAA0794 (SEQ ID NO:104), LOC134492 (SEQ ID NO:184), or MGC34648 (SEQ ID NO:348).

9. The method of claim 1, wherein said marker gene is CD1D (SEQ ID NO:376), CD44 (SEQ ID NO:275), CDC34 (SEQ ID NO:553), CDKN1C (SEQ ID NO:320), CD47 (SEQ ID NO:1015), GZMM (SEQ ID NO:617), or PPIA (SEQ ID NO:1010).

10. The method of claim 1, wherein said marker gene comprises a nucleic acid sequence at least 10 nucleotides in length having at least 90% identity with a contiguous portion of a nucleic acid having the sequence of one of SEQ ID NO:1 to SEQ ID NO:1021.

11. The method of claim 1, wherein said standard control is a detected level of expression of a standard control gene in said patient.

12. The method of claim 1, wherein said standard control gene is GAPDH, 18s ribosomal subunit, beta actin (ACTB), PPP1CA, beta 2 microglobulin (B2M), HPRT1, RPS13, RPL27, RPS20 or OAZ1.

13. The method of claim 1, wherein said standard control gene is GAPDH.

14. The method of claim 1, wherein the elevated level of expression of said marker gene or the lowered level of expression of said marker gene is determined by the ratio of the level of expression of said marker gene to the level of expression of said standard control gene, whereby said ratio being approximately equal to the corresponding ratio set forth in Table 1A or Table 2 predicts development of MS within two years of being initially diagnosed with CIS.

15. The method of claim 1, wherein the elevated level of expression of said marker gene or the lowered level of expression of said marker gene is determined by a threshold expression level resulting from a statistical model.

16. The method of claim 15, wherein said statistical model is obtained using a classifier algorithm selected from a compound covariate predictor, a diagonal linear discriminant analysis, and a support vector machine.

17. A kit for use in identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), said kit comprising: (i) a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more nucleic acids having SEQ ID NO:1 to SEQ ID NO:1021, or a nucleic acid complementary thereto; and (ii) an electronic device or computer software capable of comparing a marker gene expression level from said patient to a standard control thereby indicating whether said patient is at high risk of developing MS.

Description:
IDENTIFYING HIGH RISK CLINICALLY ISOLATED SYNDROME PATIENTS

CROSS-REFERENCES TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Application No. 61/083,505, filed July 24, 2008, U.S. Provisional Application No. 61/103,215, filed October 6, 2008, and U.S. Provisional Application No. 61/108,469, filed October 24, 2008, all of which are incorporated herein by reference in their entireties and for all purposes.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

[0002] The invention was supported, in whole or in part, by a grant from the National Institutes of Health (2R01NS026799). The Government has certain rights in the invention.

BACKGROUND OF THE INVENTION

[0003] Multiple sclerosis (MS) is a common disabling neurologic disease of young adults (1). Most patients with MS initially present with a clinically isolated syndrome (CIS) due to an inflammatory demyelinating insult in the central nervous system (CNS). Approximately one third of CIS patients will progress to clinically definite MS (CDMS) within 1 year after diagnosis and about half will do so after 2 years (2, 3). Although MRI assessment is routinely used to monitor and forecast conversion into MS, its specificity remains moderate (3). It is estimated that about 10% of CIS patients will remain free of further demyelinating attacks and neurological complications even in the presence of radiological evidence of white matter lesions (4). Although structural neuro-imaging studies are invaluable in the diagnosis and clinical surveillance of MS (3, 5), there is currently no biological marker that accurately predicts MS conversion in CIS patients. Individualized early prognosis and prediction of CDMS would be of substantial value because patients at high risk for rapid progression could be offered disease-modifying therapy, an approach shown to be beneficial in early MS (2).

[0004] The present invention meets these and other needs in the art.

BRIEF SUMMARY OF THE INVENTION

[0005] The present invention provides methods and kits for identifying clinically isolated syndrome (CIS) patients at high risk of developing multiple sclerosis (MS).

[0006] In one aspect, a method is provided for identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS). The method includes detecting the level of expression of a marker gene within the patient. The marker gene is a marker gene set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or the marker gene includes a nucleic acid sequence of at least 10 nucleotides in length and at least 90% (e.g., 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021. The level of expression of the marker gene is then compared to a standard control whereby a differential expression of the marker gene relative to the standard control indicates that the patient is at high risk of developing multiple sclerosis.

[0007] In another aspect, a method is provided for identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS). The method includes detecting the level of expression of a plurality (e.g. a panel or group) of marker genes within the patient. The plurality of marker genes are all or a portion of marker genes listed in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all or a portion of marker gene sequences listed in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13. The marker gene sequences listed in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13 are referenced as SEQ ID numbers. In some embodiments, the plurality of marker genes are all or a portion of marker genes listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all or a portion of marker gene sequences listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13. In other embodiments, the plurality of marker genes are all marker genes listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all marker gene sequences listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13. The level of expression of each marker gene is then compared to a standard control, whereby a differential expression of the marker gene relative to the standard control indicates that the patient is at high risk of developing multiple sclerosis.

[0008] In another aspect, a kit is provided for use in identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS). The kit includes (i) a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 or 20 nucleotide continuous region with one or more nucleic acids within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13, (ii) a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 or 20 nucleotide continuous region with a target sequence which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate, or (iii) a nucleic acid complementary to the nucleic acids set forth in (i) or (ii) above. In some embodiments, the kit also includes an electronic device or computer software capable of comparing a marker gene expression level from the patient to a standard control thereby indicating whether the patient is at high risk of developing multiple sclerosis.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] Figure 1. Molecular signature in CD4+ cells segregates CIS patients from controls. A. Three dimensional plot of the first 3 principal components computed from expression values of the 1,718 genes with the highest variance across all samples. B. Hierarchical clustering of expression values from the same genes and samples as in A. C. Gene ontology (GO) categories significantly enriched in CIS patients at baseline. D. A representative model for the prediction of the 4 CIS groups using the Integrated Bayesian Inference System (IBIS).

[0010] Figure 2. Clinical and radiological characteristics of the 4 CIS groups. A. Clinical and MRI characteristics of the 4 CIS groups defined by gene expression. B. Hierarchical clustering of differences (delta) between measurements at baseline and 12 months later of brain parenchyma (nBPV), grey matter (nGMV), white matter (nWMV), and CSF (nCSFV) volumes, and SIENA (PBVC), expressed as quartiles. C. Kaplan-Meier curve of high risk (red) and low risk (blue) groups of MS conversion as predicted by the expression of RNA gene products hybridizing to 28 probe sets set forth in Table 2. D. Hierarchical clustering of the mRNA gene products hybridizing to 108 probe sets set forth in Table 1A that characterize group #1 patients.

[0011] Figure 3. TOB1 abrogates T cell quiescence. A. Relative expression (fold change compared to controls) of TOB1 in CIS patients from group #1 and CIS patients from other groups assessed by RT-PCR. B. Relative expression of TOB1 (yellow bars), Interleukin-2 (green bars) and Interferon-gamma (red bars) assessed by RT-PCR in CD4+ T cells cultured for 6 or 24 hours in plates coated with 1 mg/ml anti-CD3 and 1 mg/ml anti-CD28 antibodies (n = 3). C. Immunostaining for TOB1 and CD4 in lymph nodes of mice injected with MOG35-55, CFA alone, or vehicle. D. Microarray-based TOB1 expression in lymph nodes and spinal cords from mice immunized with MOG35-55 + adjuvant (EAE) or adjuvant only (CFA). E. Relative expression (fold change compared to controls) of CD44 in CIS patients from group #1 and CIS patients from other groups assessed by RT-PCR. F. Plasma OPN concentration in group #1 patients, other CIS and controls measured by ELISA. G. Genomic map of TOB1 showing the relative position of the 5 SNPs used for association analysis. H. Schematic representation of gene expression signature in T cells from group #1 patients.

[0012] Figure 4. Gene expression still differentiates group #1 from other CIS patients a year later. A. Hierarchical clustering of the expression of the same 1,718 genes as in Figure 1 but obtained at 12 months. B. Number of genes differentially expressed in CIS compared to controls at baseline (orange circle), at 12 months (blue circle) or on both sets (intersection). C. An SVM predictive model was built using the expression of mRNA gene products hybridizing to 108 probe sets set forth in Table 1A that distinguished group #1 from other CIS patients.

DETAILED DESCRIPTION OF THE INVENTION

I. Definitions

[0013] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Generally, the nomenclature used herein and the laboratory procedures in cell culture, molecular genetics, and nucleic acid chemistry and hybridization described below are those well known and commonly employed in the art.

[0014] As used herein, "nucleic acid" means either DNA, RNA, single-stranded, double-stranded, or more highly aggregated hybridization motifs, and any chemical modifications thereof. Modifications include, but are not limited to, those which provide other chemical groups that incorporate additional charge, polarizability, hydrogen bonding, electrostatic interaction, and functionality to the nucleic acid ligand bases or to the nucleic acid ligand as a whole. Such modifications include, but are not limited to, peptide nucleic acids, phosphodiester group modifications (e.g., phosphorothioates, methylphosphonates), 2'-position sugar modifications, 5-position pyrimidine modifications, 8-position purine modifications, modifications at exocyclic amines, substitution of 4-thiouridine, substitution of 5-bromo or 5-iodo-uracil; backbone modifications, methylations, unusual base-pairing combinations such as the isobases isocytidine and isoguanidine and the like. Modifications can also include 3' and 5' modifications such as capping.

[0015] The term "nucleic acid" or "polynucleotide" refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, SNPs (haplotypes), and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Cassol et al. (1992); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is used interchangeably with gene, cDNA, and mRNA encoded by a gene.

[0016] The phrase "selectively (or specifically) hybridizes to" refers to the detectable binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent hybridization conditions when that sequence is present in a complex mixture (e.g., total cellular or library DNA or RNA).

[0017] The terms "identical" or percent "identity," in the context of two or more nucleic acids, refer to two or more sequences or subsequences that are the same or have a specified percentage of nucleotides that are the same (i.e., about 60% identity, preferably 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher identity over a specified region, when compared and aligned for maximum correspondence over a comparison window or designated region) as measured using a BLAST or BLAST 2.0 sequence comparison algorithm with default parameters described below, or by manual alignment and visual inspection (see, e.g., the NCBI web site or the like). Such sequences are then said to be "substantially identical." This definition also refers to, or may be applied to, the complement of a test sequence. The definition also includes sequences that have deletions and/or additions, as well as those that have substitutions. As described below, the preferred algorithms can account for gaps and the like. Preferably, identity exists over a region that is at least about 10, 11, 12, 13, 14, 15, 20, 25 amino acids or nucleotides in length, or over a region in the range 10-20, 10-25, 10-30, 10-40, 10-50, 10-60, 10-70, 10-80, 10-90, or even 10-100. In certain preferred embodiments, identity exists over a region that is 10-100 amino acids or nucleotides in length.

[0018] The phrase "stringent hybridization conditions" refers to conditions under which a first nucleic acid will hybridize to its target subsequence, typically in a complex mixture of nucleic acid, but not detectably to other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Probes, "Overview of principles of hybridization and the strategy of nucleic acid assays" (1993). Generally, stringent conditions are selected to be about 5-10°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal is at least two times background, optionally 10 times background hybridization. Exemplary stringent hybridization conditions can be as follows: 50% formamide, 5X SSC, and 1% SDS, incubating at 42°C, or 5X SSC, 1% SDS, incubating at 65°C, with wash in 0.2X SSC, and 0.1% SDS at 65°C. Such washes can be performed for 5, 15, 30, 60, 120, or more minutes.
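As an illustration of the percent identity criterion defined in paragraph [0017] (for example, at least 90% identity over a contiguous region of at least 10 nucleotides), the following is a minimal Python sketch. It is an assumption-laden simplification: it scores only ungapped, same-length windows and is not the BLAST or BLAST 2.0 algorithm named in the text. The function names and example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_window_identity(probe: str, reference: str) -> float:
    """Highest ungapped identity of `probe` against any contiguous
    window of `reference` that has the same length as the probe."""
    n = len(probe)
    if n > len(reference):
        raise ValueError("probe is longer than reference")
    return max(
        percent_identity(probe, reference[i:i + n])
        for i in range(len(reference) - n + 1)
    )


def meets_identity(probe: str, reference: str,
                   min_len: int = 10, min_identity: float = 90.0) -> bool:
    """True if the probe is at least `min_len` nucleotides long and reaches
    `min_identity` percent identity with some contiguous reference window."""
    return (len(probe) >= min_len
            and best_window_identity(probe, reference) >= min_identity)


# Hypothetical sequences, for illustration only.
reference = "TTACGTACGTACGG"
print(meets_identity("ACGTACGTAT", reference))  # one mismatch in 10 positions
```

Because gaps are not modelled, a sequence that would satisfy the criterion under a gapped alignment can fail this check; a real analysis would use an alignment tool.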

[0019] Exemplary "moderately stringent hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 1X SSC at 45°C. Such washes can be performed for 5, 15, 30, 60, 120, or more minutes. A positive hybridization is at least twice background. Those of ordinary skill will readily recognize that alternative hybridization and wash conditions can be utilized to provide conditions of similar stringency.

[0020] The terms "differential expression" or "differentially expressed" used in reference to the expression of a marker gene mean an elevated level of expression of the marker gene or a lowered level of expression of the marker gene relative to a standard control that is indicative of a high risk CIS patient, as set forth in the methods and results disclosed herein (e.g. Tables 1, 2, 10A-12C, and 15A-17C). As is customary in the art, marker genes described herein each have an associated name (e.g., C17orf65, C4orf10, FAM98A, and the like). Accordingly, reference to a marker gene name in turn refers to the marker gene itself. "Target sequence" refers to a region within a target gene (e.g., marker gene) which a probe will identify, as known in the art. The term "probe set identifier" refers to a set of nucleic acid probes capable of identifying a particular marker gene (e.g. target sequence). Probe set identifiers may be provided by Affymetrix (Santa Clara, CA), for example, as known in the art and disclosed herein. It is understood that one of skill in the art can, with only routine experimentation, design and use probes to identify specific marker genes as described herein. It is further understood that more than one probe, and more than one probe set identifier, may be designed to identify a specific gene, for example a marker gene described herein.

II. Methods and Kits

[0021] Provided herein are methods of determining whether a patient with clinically isolated syndrome (CIS) is at high risk of developing multiple sclerosis (MS). CIS patients at high risk of developing MS are typically those patients that develop MS within two years of being initially diagnosed with CIS or within two years of the onset of CIS. In some embodiments, high risk CIS patients are those that develop MS within 18 months, 12 months, or 9 months of being initially diagnosed with CIS or the onset of CIS. Thus, the methods provided herein are useful in identifying CIS patients that are likely to develop MS quickly relative to the average CIS patient.

[0022] It has been discovered that certain genes are markers of rapid development of MS in CIS patients. These marker genes were identified as genes that are differentially expressed relative to healthy individuals and/or CIS patients that do not develop MS quickly (i.e. those that are at low risk of rapid onset of MS). Thus, by detecting the level of expression of a marker gene within a CIS patient and comparing the level of expression of the marker gene to a standard control, high risk CIS patients may be identified. In some embodiments, the levels of expression of a plurality (e.g. a panel) of marker genes are detected and compared to a standard control to identify high risk CIS patients. Specific panels or groups of marker genes are discussed below. In some embodiments, the standard control may be approximately the average amount of expression of the marker gene(s) in humans, humans without CIS, or humans with CIS that are not at high risk of developing MS. In other embodiments, the standard control is a detected level of expression of a standard control gene in the CIS patient.
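Where the standard control is a standard control gene measured in the same patient, the comparison can be expressed as a ratio of marker level to control gene level, which is then checked against a reference ratio. The Python sketch below illustrates that idea only. The 20% relative tolerance used to decide "approximately equal" and all numeric values are assumptions made for illustration; they are not taken from the tables of this application.

```python
import math


def expression_ratio(marker_level: float, control_level: float) -> float:
    """Ratio of marker gene expression to control (housekeeping) gene
    expression, both measured in the same patient sample."""
    if control_level <= 0:
        raise ValueError("control gene level must be positive")
    return marker_level / control_level


def matches_reference_ratio(marker_level: float, control_level: float,
                            reference_ratio: float,
                            rel_tol: float = 0.2) -> bool:
    """True if the patient's ratio is approximately equal to the reference
    ratio. `rel_tol` is an assumed tolerance, not a value from the source."""
    ratio = expression_ratio(marker_level, control_level)
    return math.isclose(ratio, reference_ratio, rel_tol=rel_tol)


# Hypothetical signal intensities, for illustration only.
print(matches_reference_ratio(marker_level=50.0, control_level=1000.0,
                              reference_ratio=0.055))
```

Normalizing to a gene expressed at roughly constant levels removes sample-to-sample differences in total RNA input, which is why the ratio rather than the raw marker level is compared.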

[0023] In some embodiments, the marker gene is a gene set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13. In some embodiments, the marker gene is any one of the genes set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13. In some embodiments, a plurality (e.g. a panel or group) of marker genes are detected. Thus, in some embodiments all the marker genes set forth in one of the following tables are selected: Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 or Table 13. In another embodiment, at least 2-9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, 600, 700, or 800 of the marker genes set forth in one of the following tables are selected, as appropriate according to the number of genes set forth within the following tables: Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 or Table 13. Where a plurality (e.g. panel or group) of genes are detected, any combination of the marker genes disclosed in the relevant table(s) may be detected. By measuring the expression levels of one or more of these marker genes in a patient with CIS and comparing those levels with healthy individuals and/or CIS patients that do not develop MS quickly, the risk of the patient developing MS may be assessed. In some related embodiments, the marker gene is ZNF12, C17orf65, BAT1, ARHGDIA, NAPA, ATP5G2, DDX52, NDFIP1, SDAD1, USP7, MEF2A, AGER, RAB1B, GDI1 and/or BANF1. In other related embodiments, the marker gene is ZNF12, C17orf65, BAT1, ARHGDIA, NAPA, ATP5G2, DDX52, NDFIP1 and/or SDAD1. In still other related embodiments, the marker gene is USP7, MEF2A, AGER, RAB1B, GDI1 and/or BANF1. In other embodiments, the marker gene is TOB1. In other embodiments, the marker gene is not TOB1. In some embodiments, the marker gene is C17orf65, C4orf10, FAM98A, TLE1, INHBC, NAPA, TKT, TPT1, FLJ20054, KIAA0794, LOC134492, and/or MGC34648. In some embodiments, the marker gene is any one of C17orf65, C4orf10, FAM98A, TLE1, INHBC, NAPA, TKT, TPT1, FLJ20054, KIAA0794, LOC134492 or MGC34648. In some embodiments, the marker gene is included within a plurality of genes selected from C17orf65, C4orf10, FAM98A, TLE1, INHBC, NAPA, TKT, TPT1, FLJ20054, KIAA0794, LOC134492 and MGC34648. In some embodiments, the marker gene is CD1D, CD44, CDC34, CDKN1C, CD47, GZMM, and/or PPIA. In some embodiments, the marker gene is any one of CD1D, CD44, CDC34, CDKN1C, CD47, GZMM, or PPIA. In some embodiments, the marker gene is included within a plurality of genes selected from CD1D, CD44, CDC34, CDKN1C, CD47, GZMM, or PPIA.

[0024] In certain embodiments, the method described herein for detecting the level of expression of a marker gene is an in vitro method. In some embodiments, the marker gene is a gene set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13, and detection is conducted in vitro (e.g. on a biological sample derived from a CIS patient).

[0025] The expression levels of the marker genes may be measured using any appropriate method. In some embodiments, the amount of RNA expressed by the marker gene is measured. The amount of RNA expressed may be assessed, for example, using nucleic acid probes with marker gene coding sequences or using quantitative PCR techniques. For example, a nucleic acid array forming a probe set may be used to detect RNA expressed by the marker gene. The RNA expressed by the marker gene may be transcribed to cDNA (and in some cases to cRNA) and then queried with a gene chip array using methods known in the art. Thus, in some embodiments the marker gene may also be a gene including a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 or 20 nucleotide continuous region (i.e. sequence) within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13, or with a target sequence which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate. For example, the continuous region may be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides in length. In related embodiments, the marker gene includes a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity with the entire length of one or more nucleic acids within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13, or with a target sequence which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate. In other related embodiments, the marker gene includes a nucleic acid sequence having 100% identity with the entire length of one or more nucleic acids within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13, or with a target sequence which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate. In other related embodiments, "one or more" nucleic acids within a probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13 referred to above is the majority or all of the nucleic acids within the probe set.

[0026] Tables 1A, 2, 4, 8, 13, 18 and 19 provide probe set identifiers using Affymetrix probe set identifier numbers, as known in the art. The nucleic acid sequences contained within each probe set identifier number and the target sequence which the probe set is designed to interrogate are publicly available in a variety of sources, including the Affymetrix website and the National Cancer Institute website. The term "designed to interrogate" in the context of target genes, marker genes and probes refers to a probe having sufficient primary sequence complementarity to a target to detectably bind the target, as well known in the art.

[0027] In some embodiments, the marker gene includes a nucleic acid sequence within a marker gene identified in Table 1A. In other embodiments, the marker gene includes a nucleic acid sequence within a marker gene identified in Table 2. In other embodiments, the marker gene includes a nucleic acid sequence within a marker gene identified in Table 4. In other embodiments, the marker gene includes a nucleic acid sequence within a marker gene identified in Table 8. In other embodiments, the marker gene includes a nucleic acid sequence within a marker gene identified in Table 13. In some embodiments, the marker gene is a gene set forth in Table 18. In some embodiments, the marker gene is C17orf65 (SEQ ID NO:977), C4orf10 (SEQ ID NO:1005), FAM98A (SEQ ID NO:1020), TLE1 (SEQ ID NO:844), INHBC (SEQ ID NO:993), NAPA (SEQ ID NO:995), TKT (SEQ ID NO:994), TPT1 (SEQ ID NO:138), FLJ20054 (SEQ ID NO:11), KIAA0794 (SEQ ID NO:104), LOC134492 (SEQ ID NO:184), and/or MGC34648 (SEQ ID NO:348). In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and/or 13) of marker genes as disclosed in Table 18 are detected. In some embodiments, the marker gene is a gene set forth in Table 19. In some embodiments, the marker gene is CD1D (SEQ ID NO:376), CD44 (SEQ ID NO:275), CDC34 (SEQ ID NO:553), CDKN1C (SEQ ID NO:320), CD47 (SEQ ID NO:1015), GZMM (SEQ ID NO:617), and/or PPIA (SEQ ID NO:1010). In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6 or 7) of marker genes as disclosed in Table 19 are detected.

[0028] The comparison of the marker gene expression levels with a standard control may be accomplished by determining whether the marker gene is expressed in the CIS patient at an elevated level or a lowered level (i.e. detecting differential expression). The elevated or lowered levels are indicative of rapid development of multiple sclerosis (MS) (e.g. within two years of being initially diagnosed with CIS). Whether elevation or lowering of expression of a particular marker gene is indicative of rapid onset of MS in a CIS patient is clearly set forth in Tables 1A, 2, 10A-12C, and 15A-17C. For example, where the marker gene is TOB1, Table 1A clearly shows that lowered expression of TOB1 is indicative of rapid onset of MS in a CIS patient.

[0029] The standard control may be any appropriate standard known in the art. In some embodiments, the standard control is approximately the average amount of expression of the marker gene in humans, humans without CIS, or humans with CIS that are not at high risk of developing MS. Approximate average relative amounts of expression of marker genes are set forth in Tables 1A and 2 in a sample of humans without CIS, humans with CIS that are not at high risk of developing MS, and humans with CIS at high risk of developing MS. In addition, Table 4 provides approximate average amounts of expression of genes for humans with CIS and humans without CIS.

[0030] In other embodiments, the standard control is a detected level of expression of a standard control gene in the CIS patient. As used herein, a standard control gene is a human gene that is expressed at approximately constant levels, thereby providing a baseline reading of gene expression for an individual. The standard control gene may also be referred to herein and in the art as a housekeeping gene. In some embodiments, the standard control gene is GAPDH, 18s ribosomal subunit, beta actin (ACTB), PPP1CA, beta 2 microglobulin (B2M), HPRT1, RPS13, RPL27, RPS20 or OAZ1.

[0031] The elevated level of expression of the marker gene or the lowered level of expression of the marker gene may be determined by calculating the ratio of the level of expression of the marker gene to the level of expression of a standard control gene. For example, Table IB lists an average amount of GAPDH in the subjects studied according to the examples set forth below. The corresponding ratios of marker genes to GAPDH are set forth in Tables IA and 2. By using the calculated ratios provided in Tables IA and 2, the ratio of expression of a corresponding marker gene to GAPDH in a CIS patient may be calculated. Where the calculated marker to GAPDH ratio in the patient is approximately equal to the corresponding ratio provided in Tables IA and 2, the CIS patient is at high risk of rapidly developing MS.

[0032] In some embodiments, statistical models are established for determining whether expression of a marker gene is indicative of a CIS patient that is highly likely to develop MS, for example within 9 months of being initially diagnosed with CIS. Thus, in some related embodiments, the standard control is a threshold expression value obtained from a statistical model. Threshold expression values may be obtained optionally using a standard gene (e.g. GAPDH or ACTB) and a classifier algorithm (e.g. compound covariate predictor (CCP), diagonal linear discriminant analysis (DLDA), and/or support vector machines (SVM) classifiers) (see Example 9 and Tables 8 to 17C). In some embodiments, a composite predictor is used to establish a statistical model or threshold value, wherein the composite predictor employs a CCP, DLDA and SVM. Where the expression of a marker gene in a CIS subject is above the calculated threshold expression value, the patient with CIS is at high risk for developing MS.
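As an illustration of how a threshold could be derived from a classifier such as the compound covariate predictor named in paragraph [0032], the following is a minimal sketch under stated assumptions, not the procedure of Example 9. The function names, the use of a pooled-variance t-statistic as the gene weight, and the midpoint threshold are all assumptions made for illustration, and the expression values are invented toy data.

```python
from math import sqrt
from statistics import mean, variance


def t_stat(a, b):
    """Two-sample t-statistic with pooled variance (assumed weighting scheme)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def score(sample, weights):
    """Compound covariate: weighted sum of a sample's expression values."""
    return sum(w * x for w, x in zip(weights, sample))


def fit_ccp(high, low):
    """Fit weights and a midpoint threshold from two labelled training groups.

    high, low: lists of samples; each sample is a list of per-gene values.
    """
    n_genes = len(high[0])
    weights = [
        t_stat([s[g] for s in high], [s[g] for s in low]) for g in range(n_genes)
    ]
    high_mean = mean([score(s, weights) for s in high])
    low_mean = mean([score(s, weights) for s in low])
    return weights, (high_mean + low_mean) / 2


def predict_high_risk(sample, weights, threshold):
    """A sample scoring above the threshold is called high risk."""
    return score(sample, weights) > threshold


# Hypothetical toy data: two genes, three samples per group.
high_risk = [[2.0, 8.0], [2.5, 7.5], [1.8, 8.4]]
low_risk = [[5.0, 4.0], [5.5, 4.2], [4.8, 3.9]]
weights, threshold = fit_ccp(high_risk, low_risk)
```

A new sample is then classified with `predict_high_risk(sample, weights, threshold)`. DLDA and SVM classifiers would replace the weighting and threshold steps but follow the same fit-then-compare pattern.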

[0033] Using the teachings provided herein, one skilled in the art is enabled to use any known housekeeping gene to establish similar ratios and statistical models to identify CIS patients at high risk of rapidly developing MS using the disclosed methods.
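The ratio comparison described in paragraphs [0031] and [0033] can be sketched as follows. This is only an illustrative sketch: the function names, the 20% relative tolerance used to interpret "approximately equal", and the numeric values are assumptions and do not come from Tables IA, IB or 2.

```python
def marker_ratio(marker_level, control_level):
    """Ratio of marker gene expression to housekeeping gene expression."""
    if control_level <= 0:
        raise ValueError("control gene expression must be positive")
    return marker_level / control_level


def matches_reference(ratio, reference_ratio, rel_tol=0.2):
    """True when the patient ratio is within rel_tol of the reference ratio.

    rel_tol is a hypothetical tolerance; the source only says
    "approximately equal".
    """
    return abs(ratio - reference_ratio) <= rel_tol * reference_ratio


# Hypothetical readings for one marker gene and GAPDH in one patient.
patient_ratio = marker_ratio(marker_level=50.0, control_level=200.0)
```

Any housekeeping gene listed in paragraph [0030] could take the place of GAPDH as `control_level`, provided the reference ratio was established with the same gene.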

[0034] In another aspect, there is provided an in vitro method for determining whether a patient with clinically isolated syndrome (CIS) is at high risk of developing multiple sclerosis (MS). The method includes isolating mRNA from the patient, thereby providing an in vitro nucleic acid sample. Optionally, the method further includes subjecting the in vitro nucleic acid sample to polymerase chain reaction under conditions suitable to amplify nucleic acid within the in vitro nucleic acid sample. The in vitro nucleic acid sample is contacted with a microarray, the microarray having a plurality of probes designed to interrogate specific marker genes. The level of nucleic acid duplex formation is determined between the in vitro nucleic acid sample and the microarray, thereby providing the expression level of nucleic acid present in the in vitro nucleic acid sample. The expression level of nucleic acid is then compared to the expression level of a standard control. A differential expression of the marker gene relative to said standard control indicates that the patient is at high risk of developing multiple sclerosis. In some embodiments, the standard control may be approximately the average amount of expression of the marker gene in humans, humans without CIS, or humans with CIS that are not at high risk of developing MS. In other embodiments, the standard control is a detected level of expression of a standard control gene in the CIS patient. In some embodiments, the marker gene is a gene set forth in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8 and/or Table 13. In some embodiments, the marker gene is any one of the marker genes set forth in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8 and/or Table 13.
In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90 or 100) of marker genes as set forth in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8 and/or Table 13 are determined. In some embodiments, the marker gene is ZNF12, C17orf65, BAT1, ARHGDIA, NAPA, ATP5G2, DDX52, NDFIP1, SDAD1, USP7, MEF2A, AGER, RAB1B, GDI1 and/or BANF1. In other related embodiments, the marker gene is ZNF12, C17orf65, BAT1, ARHGDIA, NAPA, ATP5G2, DDX52, NDFIP1 and/or SDAD1. In still other related embodiments, the marker gene is USP7, MEF2A, AGER, RAB1B, GDI1 and/or BANF1. In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15) of marker genes selected from ZNF12, C17orf65, BAT1, ARHGDIA, NAPA, ATP5G2, DDX52, NDFIP1, SDAD1, USP7, MEF2A, AGER, RAB1B, GDI1 and BANF1 are detected. In some embodiments, the marker gene is a gene set forth in Table 18. In some embodiments, the marker gene is C17orf65 (SEQ ID NO:977), C4orf10 (SEQ ID NO:1005), FAM98A (SEQ ID NO:1020), TLE1 (SEQ ID NO:844), INHBC (SEQ ID NO:993), NAPA (SEQ ID NO:995), TKT (SEQ ID NO:994), TPT1 (SEQ ID NO:138), FLJ20054 (SEQ ID NO:11), KIAA0794 (SEQ ID NO:104), LOC134492 (SEQ ID NO:184), or MGC34648 (SEQ ID NO:348). In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13) of marker genes as disclosed in Table 18 are detected. In some embodiments, the marker gene is a gene set forth in Table 19. In some embodiments, the marker gene is CD1D (SEQ ID NO:376), CD44 (SEQ ID NO:275), CDC34 (SEQ ID NO:553), CDKN1C (SEQ ID NO:320), CD47 (SEQ ID NO:1015), GZMM (SEQ ID NO:617), or PPIA (SEQ ID NO:1010). In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6 or 7) of marker genes as disclosed in Table 19 are detected.
[0035] In another aspect, a kit is provided for use in identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS). The kit includes (i) a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20, or within the range 10-50, 10-40, 10-30, or 10-20) with one or more nucleic acids within a marker gene identified in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, and/or Table 13, (ii) a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20) nucleotide continuous region with a target sequence to which the probe set identified in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate, or (iii) a nucleic acid complementary to the nucleic acids set forth in (i) or (ii) above. In some embodiments, the kit also includes an electronic device or computer software capable of comparing a marker gene expression level from the patient to a standard control, thereby indicating whether the patient is at high risk of developing multiple sclerosis. In some embodiments, the kit contains a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20) of nucleic acid sequences having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20, or within the range 10-50, 10-40, 10-30, or 10-20) with a marker gene identified in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, and/or Table 13, or complement thereof.
In some embodiments, the plurality of marker genes are all or a portion of marker genes listed in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all or a portion of marker gene sequences listed in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, or Table 13. In some embodiments, the plurality of marker genes are all or a portion of marker genes listed in one of Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all or a portion of marker gene sequences listed in one of Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, or Table 13. In other embodiments, the plurality of marker genes are all marker genes listed in one of Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all marker gene sequences listed in one of Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, or Table 13.
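The "at least 90% identity over at least a 10 nucleotide continuous region" criterion can be illustrated with a simple ungapped sliding-window comparison. This is a hedged sketch only: the function name and the ungapped window approach are assumptions for illustration, and in practice sequence identity is typically assessed with an alignment tool rather than this brute-force scan. The sequences are invented.

```python
def best_window_identity(probe, target, window=10):
    """Highest ungapped identity between any window-length stretch of
    probe and any window-length stretch of target (0.0 to 1.0)."""
    probe, target = probe.upper(), target.upper()
    if len(probe) < window or len(target) < window:
        raise ValueError("both sequences must be at least `window` long")
    best = 0.0
    for i in range(len(probe) - window + 1):
        p = probe[i:i + window]
        for j in range(len(target) - window + 1):
            t = target[j:j + window]
            matches = sum(a == b for a, b in zip(p, t))
            best = max(best, matches / window)
    return best


# Hypothetical sequences: one mismatch in a 10-nucleotide stretch.
identity = best_window_identity("ACGTACGTAA", "ACGTACGTAC")
meets_90_percent = identity >= 0.90
```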

[0036] In some embodiments, the electronic device or computer software employs a statistical model. The electronic device or computer software may also utilize threshold expression values obtained optionally using a standard gene (e.g. GAPDH or ACTB) and a classifier algorithm (e.g. compound covariate predictor (CCP), diagonal linear discriminant analysis (DLDA), and/or support vector machines (SVM) classifiers) such as those set forth in Example 9 and Tables 8 to 17. One skilled in the art will immediately recognize that the electronic device or computer software may be used in the methods disclosed herein.

[0037] In some embodiments, the nucleic acid provided in the kit above may be a probe nucleic acid for use in a PCR technique, such as quantitative PCR, to assess the expression of a given marker gene. In some embodiments, the nucleic acid sequence has 100% identity with a continuous nucleic acid region (i.e. sequence) within a marker gene identified in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, and/or Table 13, or with a target sequence to which the probe set identified in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate, or is complementary thereto. In other embodiments, the nucleic acid has the same sequence as a nucleic acid contained within a marker gene identified in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, and/or Table 13 or the target sequence to which the probe set identified in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate, or is complementary thereto.

[0038] The nucleic acid provided in the kit may also hybridize under stringent conditions (or moderately stringent conditions) to a nucleic acid sequence within a marker gene identified in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8 and/or Table 13 or a target sequence to which the probe set identified in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate. The nucleic acid provided in the kit may also be perfectly complementary to a nucleic acid sequence within a marker gene identified in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8 and/or Table 13 or a target sequence to which the probe set identified in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate.
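Perfect complementarity, as referred to in paragraph [0038], can be checked computationally by testing whether the reverse complement of a probe occurs within the target. The sketch below is illustrative only; the function names and the sequences are hypothetical, and it handles only unambiguous DNA bases (A, C, G, T).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_perfectly_complementary(probe, target):
    """True when the probe can pair base-for-base with a stretch of target."""
    return reverse_complement(probe) in target.upper()


# Hypothetical probe and target sequences.
rc = reverse_complement("ATGC")
pairs = is_perfectly_complementary("GCAT", "TTATGCAA")
```

Hybridization under stringent or moderately stringent conditions tolerates some mismatches and depends on experimental conditions, so it is not captured by this exact-match check.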

[0039] The present invention also contains the subject matter of the following numbered embodiments:

Embodiment 1: A method of identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), said method comprising: detecting the level of expression of a marker gene within said patient, wherein said marker gene is a marker gene set forth in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, or Table 13, or said marker gene comprises a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO: 1021; and comparing the level of expression of said marker gene to a standard control whereby a differential expression of said marker gene relative to said standard control indicates that said patient is at high risk of developing multiple sclerosis.

Embodiment IA: A method of identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), said method comprising: detecting the level of expression of a plurality of marker genes within said patient, wherein said plurality of marker genes are all or a portion of marker genes listed in one of Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, or Table 13, or said plurality of marker genes comprises a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all or a portion of marker gene sequences listed in one of Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, or Table 13; and comparing the level of expression of said plurality of marker genes to a standard control whereby a differential expression of said plurality of marker genes relative to said standard control indicates that said patient is at high risk of developing multiple sclerosis.

Embodiment 2: The method of Embodiment 1, wherein said marker gene comprises a nucleic acid sequence at least 10 nucleotides in length having at least 90% identity with a contiguous portion of a nucleic acid having the sequence of one of SEQ ID NO: 1 to SEQ ID NO: 1021.

Embodiment 3: The method of Embodiment 1 or Embodiment 2, wherein said marker gene comprises a nucleic acid sequence at least 10 nucleotides in length having at least 95% identity with a contiguous portion of a nucleic acid having the sequence of one of SEQ ID NO:1 to SEQ ID NO:1021.

Embodiment 4: The method of any preceding Embodiment, wherein the method is an in vitro method and comprises detecting the level of expression of a marker gene in a sample previously isolated from said patient.

Embodiment 5: The method of Embodiment 4, which comprises contacting the sample with at least one nucleic acid of at least 10 nucleotides in length and having at least 90% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021, and optionally comprises contacting the sample with 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleic acids of at least 10 nucleotides in length and having at least 90% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021.

Embodiment 6: The method of Embodiment 4, which comprises contacting the sample with at least one nucleic acid of at least 10 nucleotides in length and at least 95% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021, and optionally comprises contacting the sample with 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleic acids of at least 10 nucleotides in length and having at least 95% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021.

Embodiment 7: The method of Embodiment 4, which comprises contacting the sample with at least one nucleic acid of at least 10 nucleotides in length and at least 99% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021, and optionally comprises contacting the sample with 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleic acids of at least 10 nucleotides in length and having at least 99% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021.

Embodiment 8: The method of any preceding Embodiment, wherein said marker gene is a marker gene set forth in Table 18.

Embodiment 9: The method of any of Embodiments 1 to 7, wherein said marker gene is a marker gene set forth in Table 19.

Embodiment 10: The method of any of Embodiments 1 to 7, wherein said marker gene is ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2), SDAD1 (SEQ ID NO:116), USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) or BANF1 (SEQ ID NO:999).

Embodiment 11: The method of any of Embodiments 1 to 7, wherein said marker gene is ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2) or SDAD1 (SEQ ID NO:116).

Embodiment 12: The method of any of Embodiments 1 to 7, wherein said marker gene is USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) or BANF1 (SEQ ID NO:999).

Embodiment 13: The method of any of Embodiments 1 to 7, wherein said marker gene is C17orf65 (SEQ ID NO:977), C4orf10 (SEQ ID NO:1005), FAM98A (SEQ ID NO:1020), TLE1 (SEQ ID NO:844), INHBC (SEQ ID NO:993), NAPA (SEQ ID NO:995), TKT (SEQ ID NO:994), TPT1 (SEQ ID NO:138), FLJ20054 (SEQ ID NO:11), KIAA0794 (SEQ ID NO:104), LOC134492 (SEQ ID NO:184), or MGC34648 (SEQ ID NO:348).

Embodiment 14: The method of any of Embodiments 1 to 7, wherein said marker gene is CD1D (SEQ ID NO:376), CD44 (SEQ ID NO:275), CDC34 (SEQ ID NO:553), CDKN1C (SEQ ID NO:320), CD47 (SEQ ID NO:1015), GZMM (SEQ ID NO:617), or PPIA (SEQ ID NO:1010).

Embodiment 15: The method of any preceding Embodiment, wherein said standard control is a detected level of expression of a standard control gene in said patient.

Embodiment 16: The method of Embodiment 15, wherein said standard control gene is GAPDH, 18s ribosomal subunit, beta actin (ACTB), PPP1CA, beta 2 microglobulin (B2M), HPRT1, RPS13, RPL27, RPS20 or OAZ1.

Embodiment 17: The method of Embodiment 16, wherein said standard control gene is GAPDH.

Embodiment 18: The method of any preceding Embodiment, wherein the elevated level of expression of said marker gene or the lowered level of expression of said marker gene is determined by the ratio of the level of expression of said marker gene to the level of expression of said standard control gene, whereby said ratio being approximately equal to the corresponding ratio set forth in Table IA or Table 2 predicts development of MS within two years of being initially diagnosed with CIS.

Embodiment 19: The method of any preceding Embodiment, wherein the elevated level of expression of said marker gene or the lowered level of expression of said marker gene is determined by a threshold expression level resulting from a statistical model.

Embodiment 20: The method of Embodiment 19, wherein said statistical model is obtained using a classifier algorithm selected from a compound covariate predictor, a diagonal linear discriminant analysis, and a support vector machine.

Embodiment 21: The method of any preceding Embodiment, wherein said patient at high risk of developing MS is a patient with CIS that will develop MS within two years of being initially diagnosed with CIS.

Embodiment 22: A kit for use in identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), said kit comprising:

(i) a nucleic acid comprising a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more nucleic acids having SEQ ID NO:1 to SEQ ID NO:1021, or a nucleic acid complementary thereto; and

(ii) an electronic device or computer software capable of comparing a marker gene expression level from said patient to a standard control thereby indicating whether said patient is at high risk of developing MS.

Embodiment 22A: A kit for use in identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), said kit comprising:

(i) a nucleic acid comprising a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more nucleic acids having SEQ ID NO:1 to SEQ ID NO:1021, or a nucleic acid complementary thereto.

Embodiment 23: The kit of Embodiment 22 or 22A, wherein the nucleic acid is at least 10 nucleotides in length.

Embodiment 24: The kit of Embodiment 22, 22A or Embodiment 23, which comprises a nucleic acid comprising a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more marker genes wherein said marker gene is selected from ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2), SDAD1 (SEQ ID NO:116), USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) and BANF1 (SEQ ID NO:999).

Embodiment 25: The kit of Embodiment 22, 22A or Embodiment 23, which comprises a nucleic acid comprising a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more marker genes wherein said marker gene is selected from ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2) and SDAD1 (SEQ ID NO:116).

Embodiment 26: The kit of Embodiment 22, 22A or Embodiment 23, which comprises a nucleic acid comprising a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more marker genes wherein said marker gene is selected from USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) and BANF1 (SEQ ID NO:999).

Embodiment 27: Use, in the identification of a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), of a microarray comprising a nucleic acid immobilised on a solid substrate, said nucleic acid having a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more marker genes wherein said marker gene is selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:1021.

Embodiment 28: The use of Embodiment 27, wherein said nucleic acid comprises a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more marker genes wherein said marker gene is selected from ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2), SDAD1 (SEQ ID NO:116), USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) and BANF1 (SEQ ID NO:999).

Embodiment 29: The use of Embodiment 27, wherein said marker gene is ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2) or SDAD1 (SEQ ID NO:116).

Embodiment 30: The use of Embodiment 27, wherein said marker gene is USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) or BANF1 (SEQ ID NO:999).

Embodiment 31: The use of Embodiment 27, wherein a plurality of nucleic acids are immobilised on said solid substrate, said plurality of nucleic acids having a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with all marker gene sequences listed in Table 18, Table 19, Table IA, Table 2, Table 4, Table 8, or Table 13.

III. Examples

1. Molecular signature in CD4+ cells segregates CIS patients from controls

[0040] Gene expression microarray analysis was performed in negatively isolated naïve CD4+ T-cells obtained from 37 CIS patients after initial clinical presentation (mean 4.5 +/- 2.6 months) and from 29 controls matched for age and gender. Four arrays failed to pass our quality control protocol and were thus excluded from further analysis. Demographic characteristics of the remaining patients (n = 34) and controls (n = 28) were similar (Table 3). Analysis was focused on the 1,718 probe sets that showed at least a 2-fold change from each gene's median value in more than 20% of the samples.
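The probe set filter described in paragraph [0040] can be sketched as below. This is a minimal sketch under assumptions: it takes linear-scale, positive expression values, counts a change in either direction (at least 2-fold above or below the median), and uses invented values. The function name and parameters are hypothetical, and the source does not state these implementation details.

```python
from statistics import median


def passes_filter(values, fold=2.0, min_fraction=0.2):
    """Keep a probe set when more than min_fraction of samples differ from
    the probe set's median by at least `fold` in either direction."""
    med = median(values)
    changed = sum(1 for v in values if v >= fold * med or v <= med / fold)
    return changed / len(values) > min_fraction


# Hypothetical expression values for two probe sets across ten samples.
variable_probe = [100, 100, 100, 100, 250, 40, 100, 100, 300, 100]
stable_probe = [100] * 9 + [250]
kept = [passes_filter(variable_probe), passes_filter(stable_probe)]
```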

[0041] Principal component analysis (PCA) using expression values from these 1,718 probe sets showed a clear segregation between controls and CIS samples (Figure Ia). Furthermore, a hierarchical clustering of all samples based on the expression of the same probe sets discriminated CIS from controls with high accuracy (only 3 samples were misclassified, 2 controls and 1 patient) (Figure Ib). The robustness index for this classification was 99.8% (see material and methods). Notably, PCA and hierarchical clustering were performed with the 1,718 probe sets with the highest variance across all samples, but without any prior statistical testing for differences between cases and controls. When subjected to a t-test, 975 probe sets were found to be differentially expressed between CIS and controls after correction for multiple comparisons (FDR < 0.1) (Table 4). Interestingly, most of the discriminating transcripts (70%) were under-expressed in CIS whereas the remaining 30% were over-expressed. This finding is in agreement with previous observations that downregulated genes greatly outnumber upregulated genes in T lymphocytes from MS patients when studied by gene expression microarrays (6, 7) or by FACS (8).

[0042] Gene ontology (GO) enrichment of these 975 differentially expressed genes revealed alteration of major molecular functions and biological processes (Figure Ic). Unexpectedly for an autoimmune disease, most genes involved in inflammatory responses were downregulated in CIS individuals, including pro-inflammatory cytokines, chemokines, integrins, and HLA class II molecules (Table 4). One possible explanation for overall reduced transcript abundance is cell death. However, there was no evidence found that this generalized downregulation was due to increased apoptosis. Quite the opposite, transcripts coding for the anti-apoptotic molecule BAX were increased by 2-fold whereas those coding for the pro-apoptotic molecules BCL-2 and cytochrome C (CYCS) were decreased by 2-fold. Over-expressed genes were mostly involved in protein metabolism (27% of over-expressed genes) and/or nucleotide binding (23% of over-expressed genes). Altogether, these results suggest a decreased inflammatory activity in CD4+ T cells from CIS patients.

[0043] On the basis of transcriptional activity, samples from CIS patients segregated into 4 groups (groups #1, 2, 3, 4) corresponding to the first 4 splits of the dendrogram (robustness index 99.4%) (Figure Ib). Furthermore, the likelihood that this segregation occurred by chance was investigated using IBIS, a supervised machine learning approach (9). In short, the accuracy of the 2-gene Bayesian models created was measured by using only a "training" set (70% of samples) to classify a left-out "test" set (30% of samples) into the 4 previously defined CIS groups (Figure Id). The mean accuracy after 10 randomized splits of the samples for the top 7 gene pairs was more than 80% (Table 5).
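Paragraph [0041] reports a false discovery rate cutoff (FDR < 0.1) without naming the correction procedure. As one common way such a cutoff might be applied, the sketch below implements the Benjamini-Hochberg adjustment; the choice of this procedure is an assumption, the function name is hypothetical, and the p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-probe-set t-test p-values.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q < 0.1 for q in q_values]
```

Probe sets whose adjusted value falls below 0.1 would be retained as differentially expressed under this assumed procedure.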

2. Clinical and radiological characteristics of the 4 CIS groups

[0044] Patients from each of the 4 CIS transcriptional groups did not differ significantly in age, gender, ethnic background, time from initial clinical event, or HLA-DRB1*1501 status (Figure 2a). In contrast, time to conversion into CDMS was significantly shorter in patients from group #1 (Figure 2a). The proportional Cox-regression hazard ratio for patients in group #1 was 3.5 (95% confidence interval [1.4-8.8], p = 0.008), indicating a much higher risk of MS conversion for these individuals. Gadolinium enhancement on brain MRI shortly after clinical presentation was significantly higher in patients from group #1 compared to those from other groups, also indicating a higher disease activity (Figure 2a). However, only 58% of the patients in group #1 showed gadolinium enhancement within 3 months of diagnosis.
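The hazard ratio, confidence interval and p-value reported in paragraph [0044] can be cross-checked against one another. The sketch below assumes the interval is a Wald interval that is symmetric on the log scale and that the p-value comes from a two-sided normal test; both are assumptions, since the source does not state how the interval was computed, and the function name is hypothetical.

```python
from math import erf, log, sqrt


def wald_from_ci(hazard_ratio, lower, upper, z_crit=1.96):
    """Recover the log-scale standard error, z statistic and two-sided
    p-value implied by a hazard ratio and its 95% confidence interval."""
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(hazard_ratio) / se
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return se, z, p_value


# Values reported in the text: HR 3.5, 95% CI 1.4 to 8.8.
se, z, p_value = wald_from_ci(3.5, 1.4, 8.8)
```

Under these assumptions the implied p-value lies close to the reported 0.008, so the three reported figures are mutually consistent.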

[0045] In order to evaluate the concordance of gene expression with neurodegeneration, it was investigated to what extent changes in brain volume differed across the four CIS groups. To avoid biases related to therapy, only the 15 patients who did not receive disease-modifying therapy during the first year after diagnosis were included in this analysis. Normalized brain parenchyma (nBPV), white matter (nWMV), grey matter (nGMV) and CSF (nCSFV) volumes were not significantly different among the 4 groups at baseline. However, hierarchical clustering of changes after 1 year in these quantitative MRI parameters segregated patients into two major groups (Figure 2b). One group displayed higher volume change (i.e. increase in CSF and decrease in brain volume), indicating a larger degree of neurodegeneration. The other group was characterized by relatively low change in CSF or brain volume. Interestingly, all group #1 patients were clustered into the latter group (Chi-square test, p = 0.01).

[0046] To further explore what information about conversion to MS is contained in gene expression at the CIS stage, a predictive model of survival (i.e. conversion to MS) was built based on supervised principal components (10). The resulting model contained mRNA gene products hybridizing to 28 probe sets set forth in Table 2 and allowed a segregation of CIS into high and low risk groups (Figure 2c). The separation remained significant even after adjustment for age, gender, and HLA-DRB1*1501 status (data not shown). Patients segregated into the high risk group based on their gene expression (red line) converted to MS by 9 months of follow-up. Remarkably, 11 out of the 12 patients from group #1 were clustered into the high-risk group, thus resulting in a sensitivity of 92% and a specificity of 86%. This confirms the previous observation that gene expression-based segregation of group #1 patients is associated with high risk of MS conversion.
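The sensitivity reported in paragraph [0046] follows directly from the stated counts (11 of 12 group #1 patients placed in the high-risk group). The specificity counts are not given in the text; the sketch below uses 19 of 22 as a purely hypothetical split that would be consistent with 86% if the remaining 22 of the 34 CIS patients were all scored. The function names are also illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of group #1 patients assigned to the high-risk group."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of other patients assigned to the low-risk group."""
    return true_neg / (true_neg + false_pos)


# 11 of 12 is stated in the text; 19 and 3 are assumed for illustration only.
sens_pct = round(100 * sensitivity(true_pos=11, false_neg=1))
spec_pct = round(100 * specificity(true_neg=19, false_pos=3))
```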

3. Gene expression still differentiates group #1 from other CIS patients a year later

[0047] Differential gene expression observed shortly after CIS diagnosis may either reflect an acute and transient biological response to the disease and/or a predisposing causative signature. To investigate this further and to confirm our observations, samples from the same individuals were collected and processed one year (mean 11 +/- 3 months) after diagnosis of CIS. For this follow-up, 31 CIS and 9 controls were available. Hierarchical clustering of these samples performed with the same 1,718 genes identified at baseline still discriminated CIS from controls (Figure 4a). Among them, differential expression in 461 transcripts was statistically significant, including 270 (59%) of the 975 genes originally found to be differentially expressed at baseline (Figure 4b). Remarkably, the first split of the samples' dendrogram segregated 100% of the previously identified group #1 patients from the rest of CIS patients and controls (yellow box, Figure 4a). In contrast, previously identified CIS groups #2, #3 and #4 were no longer detected. In order to measure the robustness of the group #1 signature over time, a support vector machine (SVM) classifier (with 10-fold leave-one-out cross-validation) with the 108 expression values obtained at baseline (training set) was built and used to predict the status of samples obtained at 12 months (test set). This model classified group #1 samples at baseline with 100% accuracy, positive predictive value (PPV), and negative predictive value (NPV) (Figure 4c). After 12 months, the same model was able to predict group #1 patients with an accuracy of 86%, a PPV of 78% and an NPV of 90% (Figure 4c). Altogether these results suggest that the molecular signature found in naïve CD4+ T cells of group #1 CIS patients is stable for at least 1 year.
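The accuracy, PPV and NPV quoted above are all derived from a 2x2 table of predicted versus true group membership. The Python sketch below shows those definitions. The class name `Confusion` and the cell counts are hypothetical: the source does not give the underlying table, and the counts were chosen only because they round to the reported percentages.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    """2x2 classification table (positive = predicted/true group #1)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def ppv(self):
        # True positives per total of true and false positives.
        return self.tp / (self.tp + self.fp)

    def npv(self):
        # True negatives per total of true and false negatives.
        return self.tn / (self.tn + self.fn)

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

# HYPOTHETICAL counts for illustration only (not taken from the source).
table = Confusion(tp=7, fp=2, tn=18, fn=2)
print(f"accuracy = {table.accuracy():.0%}, "
      f"PPV = {table.ppv():.0%}, NPV = {table.npv():.0%}")
```

The same definitions apply to the sensitivity reported in paragraph [0046], where 11 of 12 group #1 patients fall into the high-risk group (11/12, about 92%).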

4. TOB1 abrogates T cell quiescence

[0048] Because the group #1 signature persists over time, focus lay on this group, which also appears to constitute a relatively consistent biological entity. Among the RNA gene products hybridizing to the 975 probe sets set forth in Table 4 whose expression differentiates CIS from controls at baseline, RNA gene products hybridizing to 108 probe sets set forth in Table 1A were also differentially expressed between group #1 and the other CIS groups combined (Figure 2d, Table 6). With the exception of a ribosomal protein (RPL37A), the most under-expressed gene in group #1 patients was TOB1 (transducer of ERBB-2, 1), a finding confirmed by RT-PCR (Figure 3a). TOB1 is a member of the APRO (anti-proliferative) family and has been shown to repress T cell proliferation (11). A strong downregulation of TOB1 upon in-vitro activation of peripheral-blood CD4+ T cells from control individuals (n = 3, Figure 3b) was observed, which is in accordance with previous studies (12). To examine whether downregulation of TOB1 is associated with T cell proliferation in-vivo, C57/Bl6 mice were immunized with either MOG35-55 or CFA, and TOB1 protein levels in the lymph nodes were investigated by immunofluorescence. In agreement with the molecular and in-vitro data, TOB1 immunostaining was decreased in both groups 3 days after immunization, while higher levels of the protein were detected in the lymph nodes of naïve mice (Figure 3c). These results suggest that TOB1 downregulation can be detected as T cells proliferate in response to either a specific (MOG peptide) or unspecific (CFA) antigenic stimulus.

[0049] C57/Bl6 mice injected with MOG35-55 reproducibly develop experimental autoimmune encephalomyelitis (EAE), a widely employed laboratory model for MS. In contrast to the sustained inflammation endured by animals with EAE, the response to CFA is expected to be transient, and it was hypothesized that patterns of Tob1 expression should reflect this difference. To test this hypothesis, the database from two previous experiments was searched, in which high throughput gene expression in lymph nodes and spinal cords at the peak of EAE was measured (13, 14). As expected, Tob1 mRNA expression was decreased in both lymph nodes and spinal cords of EAE animals compared to CFA controls (Figure 3d).

[0050] In addition to intra-molecular mechanisms, engagement of transmembrane receptors may contribute to regulating T cell homeostasis. Among the 3 differentially expressed genes coding for transmembrane receptors (SIGLEC10, EMR1 and CD44), only CD44 was over-expressed in CIS patients (Table 6). This upregulation was confirmed by qRT-PCR (Figure 3e). Of interest, CD44 serves as a receptor for osteopontin (OPN or SPP1), a pleiotropic molecule that has been shown to be highly expressed in both MS plaques and EAE lesions (15). OPN acts as a key promoter of disease severity in EAE by directing the differentiation and survival of Th1 cells (16). Plasma OPN levels in CIS group #1 patients were significantly higher than in both other CIS patients and controls (Figure 3f).

[0051] Since CIS patients classified as group #1 converted to CDMS earlier than other CIS patients, it was hypothesized that TOB1 is also implicated in the progression of disease once established. A genetic effect would then be expected in CDMS patients showing extreme phenotypes (mild or severe). This hypothesis was tested by genotyping 5 SNPs located within or near the gene (Figure 4g) in individuals selected from a cohort of more than 1,200 RRMS patients that were clinically classified as either "mild" (EDSS<3 15 years after onset, n=62) or "severe" (EDSS>6 10 years after onset, n=74). Allelic frequencies were analyzed by logistic regression and case-control association. Differences in allelic frequencies for marker rs4626 (coding, synonymous) between mild and severe cases were statistically significant by logistic regression for both genotype and trend tests (marker rs7221352 also showed both effects, although without reaching statistical significance). The same two markers showed statistical significance in the allele case-control (mild vs. severe) analysis (Table 7). Haplotype analysis with these two markers also showed statistical significance when the associated, but not the neutral, alleles were considered (G-A, exact p-value=0.0115; A-G, exact p-value=0.0353). Altogether these data suggest that while TOB1 downregulation identifies CIS patients at higher risk of conversion to CDMS, there is also a genetic association between markers in this gene and the clinical progression of CDMS patients.

5. Patients and samples

[0052] The study cohort consisted of 37 untreated CIS patients and 29 healthy control subjects matched for age and sex, evaluated at the UCSF Multiple Sclerosis Center. CIS patients were identified as subjects presenting with a first well-defined neurological event persisting for more than 48 hours involving the optic nerve, brain parenchyma, brainstem, cerebellum, or spinal cord. All CIS patients demonstrated at least two abnormalities on brain MRI measuring greater than 3 mm². Patients were followed for an average of 20 (+/- 8) months. Time to conversion was defined as the delay between recruitment and next clinical event or the date of identified MRI changes fulfilling the McDonald criteria (5). Written informed consent was obtained from all study participants.

6. Magnetic resonance imaging

[0053] MRI scans for all subjects were acquired on a 1.5 T GE (GE) MRI scanner with a standard head coil. All CIS subjects were scanned every 3 months during the first year of follow-up and then every 6 months during the second year. T2 hyperintense lesions were identified on simultaneously viewed T2 and proton density-weighted dual echo (1 mm x 1 mm x 3 mm pixels, interleaved slices, 20 ms and 80 ms echo times) images with regions of interest drawn based on a semi-automated threshold with manual editing as described elsewhere (26). Annual percent brain volume change (PBVC) was calculated from high resolution 3D T1-weighted spoiled gradient recalled echo volumes (pixel size of 1 mm x 1 mm x 1.5 mm, 124 slices, flip angle 40°) using SIENA (27).

7. RNA preparation and hybridization

[0054] Blood samples were collected at the time of recruitment into the study (baseline) and after 12 months. Peripheral blood mononuclear cells (PBMC) were separated on a Ficoll gradient and frozen in liquid nitrogen until needed. Naïve CD4+ T cells were isolated by negative selection using Dynabeads® (Invitrogen). CD4+ T cell purity was assessed by FACS (>95%, data not shown). RNA was then extracted using the RNeasy® Mini kit (Qiagen), amplified with the MessageAmp™ II aRNA kit (Ambion) and labeled with Bio-11-UTP for subsequent hybridization onto Affymetrix® Human Genome U133 Plus 2.0 arrays (TGEN). Thus, the probe set identifier numbers set forth in the Tables below (including Tables 18, 19, 1A, 2, 4, 8 and 13) are in reference to Affymetrix® Human Genome U133 Plus 2.0 arrays.

8. Statistical analysis

[0055] Quality control (QC) analysis of the arrays was performed using the Bioconductor package, available at the bioconductor.org website. In order to pass QC, arrays had to have at least 40% of their probe sets called present and had to have similar RNA degradation slopes, GAPDH and beta-actin ratios, scaling factors, histograms and box plots of intensities. Arrays were normalized using RMA (28). Statistical analyses were carried out using BRB-ArrayTools (Biometrics Research Branch, NIH). For multiple comparison correction, genes were considered differentially expressed if the univariate p-value was less than 0.001 and the false discovery rate (FDR) was less than 0.1 (29). Genes predicting MS conversion were determined using the Survival Analysis Prediction Tool of BRB-ArrayTools. The 2 survival risk groups were built using PCA, with a p-value threshold of 0.001 for genes univariately correlated with survival, and leave-one-out cross-validation (10). Kaplan-Meier survival curves were used for cases with above- and below-average risk (50th percentile). Hierarchical clustering was performed using Genes@work® software (IBM Research). To gauge robustness in the classification, the dataset was perturbed by adding random (white) Gaussian noise using the median variance of the dataset, and the samples were re-clustered 100 times. The index of robustness is the mean percentage of times a pair of samples remained in the same cluster. To investigate the likelihood that segregation into 4 groups occurred by chance, the Integrated Bayesian Inference System (IBIS), a supervised machine learning approach, was used (9).
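The two-part selection rule described above (univariate p-value below 0.001 and FDR below 0.1) can be sketched with the Benjamini-Hochberg step-up adjustment of reference 29. This Python sketch is illustrative only: the function names `bh_adjust` and `select_genes` and the example p-values are hypothetical, and BRB-ArrayTools may compute its FDR estimate differently.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_genes(pvalues, p_cut=0.001, fdr_cut=0.1):
    """Indices of genes passing both the raw p-value and the FDR threshold."""
    adjusted = bh_adjust(pvalues)
    return [i for i, (p, q) in enumerate(zip(pvalues, adjusted))
            if p < p_cut and q < fdr_cut]

# HYPOTHETICAL per-gene p-values, for illustration only.
example = [0.0002, 0.04, 0.0009, 0.5, 0.03]
print(select_genes(example))
```

With so few hypothetical genes the raw p-value cutoff is the binding constraint; on a full array with tens of thousands of probe sets the FDR criterion can become the stricter of the two.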

9. Univariate and multivariate statistical models

[0056] In order to calculate the marker to GAPDH ratio in a patient, univariate and multivariate statistical models are used. In univariate statistical models, the ability of each individual gene to classify samples as high or low risk is determined. In multivariate models, the best possible combination of two or more genes that can maximize the positive predictive value (PPV) or negative predictive value (NPV) is established. The positive predictive value is defined as the number of true positives per total of true and false positives, whereas the negative predictive value describes the number of true negatives per total of true negatives and false negatives. Applying this statistical model provides methods to discriminate between high risk and low risk patients.

[0057] As discussed above, 108 probe sets were identified (Table 1A) by T-test analysis that hybridized to gene products that were differentially expressed between group #1 and other subjects. In addition, 28 probe sets were identified (Table 2) by principal-component-based survival analysis that hybridized to gene products that were differentially expressed by high risk CIS patients. The combined set of 136 probe sets (108 + 28) was used to search for classifiers that could discriminate between the two groups with a reduced number of genes. Using compound covariate predictor (CCP), diagonal linear discriminant analysis (DLDA), and support vector machine (SVM) classifiers (see below), 13 probe sets were identified (Table 8) that hybridized to gene products that were differentially expressed. The CCP, DLDA and SVM were run with default parameters within the BRB-ArrayTools application available from the National Cancer Institute. For each classifier a specific weight was assigned to each probe set as set forth in Table 9. The expression value of each probe set was normalized by that of two housekeeping (HK) genes, GAPDH and ACTB, with the results provided in Tables 10A to 12C, which detail the predictive value of the statistical model by providing the number of CIS patients that developed MS within nine months (MS) and the number that did not develop MS within nine months (No MS) and the corresponding prediction based on the threshold value.

[0058] An independent (network-based) search was conducted based on the hypothesis that groups of genes whose products interact physically are likely to define biologically functional modules, as described in Ideker, T., et al. (2002). "Discovering regulatory and signalling circuits in molecular interaction networks." Bioinformatics 18 Suppl 1: S233-40. Unlike with classical statistical analyses, identification of these modules allows for direct biological interpretation of the results. Briefly, we implemented a sub-network identification tool based on the algorithm previously described by Ideker et al. to identify groups of functionally related genes that could classify high versus low risk CIS patients. This algorithm consists of the following steps. First, a protein interaction database was downloaded locally. Second, starting from each node in the network, a sub-network was recursively grown by the addition of one neighboring node at a time. At each step, a scoring function was computed based on the mutual information between the weighted average of the expression values of all nodes considered at this step, and the vector of phenotypes (case versus control, high vs. low risk, etc.). Third, the sub-network continued to grow until addition of a new node did not increase the score significantly. Three classifiers were constructed using the CCP, DLDA, and SVM algorithms. The network-based search resulted in the identification of 6 probe sets (Table 13) that hybridized to gene products that were differentially expressed. For each classifier a specific weight was assigned to each probe set as set forth in Table 14. The expression value of each probe set was normalized by that of two housekeeping (HK) genes, GAPDH and ACTB. The predictive value and threshold values for the 6 probe sets were calculated, with the results provided in Tables 15A to 17C, which detail the predictive value of the statistical model by providing the number of CIS patients that developed MS within nine months (MS) and the number that did not develop MS within nine months (No MS) and the corresponding prediction based on the threshold value.
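The greedy sub-network growth outlined in the network-based search can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in the study: it uses an unweighted mean in place of the weighted average, equal-width binning to estimate mutual information, and a fixed `min_gain` threshold as a stand-in for "did not increase the score significantly". The toy network, gene names (A to D), expression values and all function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(values, labels, n_bins=3):
    """Mutual information (bits) between binned values and class labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant input
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    joint, pb, pl = Counter(zip(bins, labels)), Counter(bins), Counter(labels)
    return sum((c / n) * math.log2((c / n) / ((pb[b] / n) * (pl[l] / n)))
               for (b, l), c in joint.items())

def activity(nodes, expression):
    """Per-sample mean expression over the nodes (ASSUMED equal weights)."""
    n_samples = len(next(iter(expression.values())))
    return [sum(expression[g][s] for g in nodes) / len(nodes)
            for s in range(n_samples)]

def grow_subnetwork(seed, network, expression, labels, min_gain=0.05):
    """Greedily add the neighbor that most improves the score; stop when
    the best improvement does not exceed min_gain (ASSUMED stopping rule)."""
    subnet = {seed}
    score = mutual_information(activity(subnet, expression), labels)
    while True:
        frontier = set().union(*(network[g] for g in subnet)) - subnet
        if not frontier:
            break
        scored = [(mutual_information(activity(subnet | {g}, expression),
                                      labels), g) for g in sorted(frontier)]
        best_score, best = max(scored)
        if best_score - score <= min_gain:
            break
        subnet.add(best)
        score = best_score
    return subnet, score

# HYPOTHETICAL toy data: a 4-gene chain A-B-C-D and 6 samples
# (label 1 = high risk, label 0 = low risk).
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
expression = {"A": [3, 1, 3, 1, 1, 1], "B": [1, 3, 1, 1, 1, 1],
              "C": [0, 1, 0, 1, 0, 1], "D": [1, 0, 1, 0, 1, 0]}
labels = [1, 1, 1, 0, 0, 0]
print(grow_subnetwork("A", network, expression, labels))
```

In this toy example neither A nor B alone separates the classes, but their mean does, so the search started at A adds B and then stops because adding C cannot raise the score further.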

[0059] The compound covariate predictor (CCP) used in the above studies is a weighted linear combination of log-ratios (or log intensities for single-channel experiments) for genes that are univariately significant at the specified level. By specifying a more stringent significance level, fewer genes are included in the multivariate predictor. Genes in which larger values of the log-ratio pre-dispose to class 2 rather than class 1 have weights of one sign, whereas genes in which larger values of the log-ratios pre-dispose to class 1 rather than class 2 have weights of the opposite sign. The univariate t-statistics for comparing the classes are used as the weights. The CCP is described in further detail in Radmacher MD, McShane LM, and Simon R. A paradigm for class prediction using gene expression profiles. Journal of Computational Biology 9:505-511, 2002; and I Hedenfalk, D Duggan, Y Chen, M Radmacher, M Bittner, R Simon, P Meltzer, B Gusterson, M Esteller, M Raffeld, et al. Gene expression profiles of hereditary breast cancer, New England Journal of Medicine 344:539-548, 2001.

[0060] The Diagonal Linear Discriminant Analysis (DLDA) used in the above studies is similar to the Compound Covariate Predictor, but not identical. It is a version of linear discriminant analysis that ignores correlations among the genes in order to avoid over-fitting the data. Many complex methods have too many parameters for the amount of data available. Consequently they appear to fit the training data used to estimate the parameters of the model, but they have poor prediction performance for independent data. The DLDA is described in further detail in McLachlan GJ. Discriminant Analysis and Statistical Pattern Recognition. Wiley-Interscience; New Ed edition (August 4, 2004); and Dudoit S, Fridlyand J, Speed TP. Comparison of discrimination methods for the classification of tumors using gene expression data, Journal of the American Statistical Association 97:77-87, 2002.

[0061] The support vector machine (SVM) used in the above studies is a class prediction algorithm that has appeared effective in other contexts and is currently of great interest to the machine learning community. The SVM predictor can employ a variety of functions, as known in the art. In some embodiments, the SVM predictor is a linear function of the log-ratios or the log-intensities that best separates the data subject to penalty costs on the number of specimens misclassified. The SVM is described in further detail in Vapnik V. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
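The compound covariate predictor described in paragraph [0059] can be illustrated with a short Python sketch. The weights are the two-sample t-statistics, as stated above. The classification threshold (the midpoint between the mean scores of the two training classes) is an assumption following the general approach of Radmacher et al., and the function names and expression values are hypothetical. Gene pre-selection by significance level is omitted for brevity.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Pooled-variance two-sample t-statistic (class a minus class b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def ccp_score(sample, weights):
    """Compound covariate: weighted linear combination of log-expression."""
    return sum(w * x for w, x in zip(weights, sample))

def fit_ccp(class1, class2):
    """Fit weights (per-gene t-statistics) and a midpoint threshold.
    class1/class2 are lists of samples; each sample lists one value per gene."""
    n_genes = len(class1[0])
    weights = [t_statistic([s[g] for s in class1], [s[g] for s in class2])
               for g in range(n_genes)]
    # ASSUMED threshold: midpoint of the two class-mean scores.
    c1 = mean([ccp_score(s, weights) for s in class1])
    c2 = mean([ccp_score(s, weights) for s in class2])
    return weights, (c1 + c2) / 2

def predict_ccp(sample, weights, threshold):
    """Return 1 if the sample scores on the class 1 side, else 2."""
    return 1 if ccp_score(sample, weights) > threshold else 2

# HYPOTHETICAL log-expression values for two genes, for illustration only.
class1 = [[5.0, 1.0], [6.0, 2.0], [5.5, 1.5]]   # e.g. high-risk training samples
class2 = [[2.0, 4.0], [3.0, 5.0], [2.5, 4.5]]   # e.g. low-risk training samples
weights, threshold = fit_ccp(class1, class2)
print(predict_ccp([5.2, 1.8], weights, threshold),
      predict_ccp([2.2, 4.1], weights, threshold))
```

Because the t-statistic carries a sign, a gene that is higher in class 1 receives a positive weight and a gene that is higher in class 2 receives a negative weight, matching the sign convention described above.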

10. Quantitative RT-PCR

[0062] Master mix was prepared essentially as described previously (9), with the addition of 200 μM ROX (Sigma), and overlaid on top of each well of a freshly thawed 384-well plate containing 5 ng of RNA in each well. Reactions were performed in triplicate using an ABI 7900 Sequence Detection System (Applied Biosystems).

11. Immunofluorescence and ELISA

[0063] Draining lymph nodes from either naïve or injected (MOG35-55 or CFA alone) C57/Bl6 mice were removed, washed in PBS and then embedded in OCT and frozen. Sections were cut at 6 μm on a cryostat and stained for immunofluorescence examination using either a rabbit anti-TOB1 polyclonal antibody (H-70, Santa Cruz Biotechnology Inc. CA), or a purified rat anti-CD4 antibody (BD Pharmingen). Secondary antibodies were anti-rabbit Alexa 488 (Molecular Probes, Eugene OR) and anti-rat Alexa 594 (Molecular Probes). ELISAs for OPN were carried out using the Quantikine kit (R&D Systems) according to manufacturer's instructions.

12. Genotyping of TOB1 SNPs

[0064] Five single nucleotide polymorphisms (SNPs) located within or near TOB1 were selected for genotyping in 62 mild and 74 severe MS patients. Mild disease was defined as EDSS<3 after 15 years of onset while severe was defined as EDSS>6 after 10 years of onset. Genotyping assays were carried out in 384-well plates using TaqMan® Universal PCR Master Mix on an ABI GeneAmp PCR System 7900 (Applied Biosystems). Statistical tests were carried out in SAS and JMP Genomics suite (SAS). For haplotype analysis, exact p-values were calculated using the EM algorithm in a Monte Carlo approach with 10,000 permutations.

13. Tables

[0065] Tables 1-19 follow. Tables 1A, 1B and 2 provide differential gene expression analysis data. Table 3 provides data regarding subject characteristics at baseline. Table 4 provides a list of 975 genes differentially expressed between CIS and controls at baseline. Table 5 provides data regarding mean predictive accuracy of the top seven gene pairs. Table 6 provides data relating to the signature of group #1 patients. Table 7 provides genotyping data of 5 TOB1 SNPs in patients with mild (n=62) or severe (n=74) MS. Tables 9 to 19 present data resulting from the statistical model analysis as described herein. Terms used in the tables are as follows: The term "SD" in the context of statistical analysis refers to the standard deviation, as known in the art. The term "Ave." refers to the statistical average, as known in the art. The term "Grp1" refers to Group #1 as described herein.

IV. References

[0066] 1. Hauser SL & Goodin DS (2005) in Harrison's Principles of Internal Medicine, eds. Braunwald E, Fauci AD, Kasper DL, Hauser SL, Longo DL, & Jameson JL (McGraw-Hill, New York), pp. 2461-2471.

[0067] 2. Kappos L, et al. (2006) Treatment with interferon beta-1b delays conversion to clinically definite and McDonald MS in patients with clinically isolated syndromes. Neurology 67, 1242-1249.

[0068] 3. Korteweg T, et al. (2006) MRI criteria for dissemination in space in patients with clinically isolated syndromes: a multicentre follow-up study. Lancet Neurol 5, 221-227.

[0069] 4. Brex PA, et al. (2002) A longitudinal study of abnormalities on MRI and disability from multiple sclerosis. N Engl J Med 346, 158-164.

[0070] 5. McDonald WI, et al. (2001) Recommended diagnostic criteria for multiple sclerosis: guidelines from the International Panel on the diagnosis of multiple sclerosis. Ann Neurol 50, 121-127.

[0071] 6. Satoh J, et al. (2005) Microarray analysis identifies an aberrant expression of apoptosis and DNA damage-regulatory genes in multiple sclerosis. Neurobiol Dis 18, 537-550.

[0072] 7. Satoh J, et al. (2006) T cell gene expression profiling identifies distinct subgroups of Japanese multiple sclerosis patients. J Neuroimmunol 174, 108-118.

[0073] 8. Kantor AB, et al. (2007) Identification of short-term pharmacodynamic effects of interferon-beta-1a in multiple sclerosis subjects with broad-based phenotypic profiling. J Neuroimmunol 188, 103-116.

[0074] 9. Baranzini SE, et al. (2005) Transcription-based prediction of response to IFNbeta using supervised computational methods. PLoS Biol 3, e2.

[0075] 10. Bair E & Tibshirani R (2004) Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol 2, e108.

[0076] 11. Tzachanis D, et al. (2001) Tob is a negative regulator of activation that is expressed in anergic and quiescent T cells. Nat Immunol 2, 1174-1182.

[0077] 12. Yusuf I & Fruman DA (2003) Regulation of quiescence in lymphocytes. Trends Immunol 24, 380-386.

[0078] 13. Baranzini SE, Bernard CC, & Oksenberg JR (2005) Modular transcriptional activity characterizes the initiation and progression of autoimmune encephalomyelitis. J Immunol 174, 7412-7422.

[0079] 14. Otaegui D, et al. (2007) Increased transcriptional activity of milk-related genes following the active phase of experimental autoimmune encephalomyelitis and multiple sclerosis. J Immunol 179, 4074-4082.

[0080] 15. Chabas D, et al. (2001) The influence of the proinflammatory cytokine, osteopontin, on autoimmune demyelinating disease. Science 294, 1731-1735.

[0081] 16. Hur EM, et al. (2007) Osteopontin-induced relapse and progression of autoimmune brain disease through enhanced survival of activated T cells. Nat Immunol 8, 74-83.

[0082] 17. Tintore M, et al. (2001) Isolated demyelinating syndromes: comparison of CSF oligoclonal bands and different MR imaging criteria to predict conversion to CDMS. Multiple Sclerosis 7, 359-363.

[0083] 18. Berger T, et al. (2003) Antimyelin antibodies as a predictor of clinically definite multiple sclerosis after a first demyelinating event. N Engl J Med 349, 139-145.

[0084] 19. Kuhle J, et al. (2007) Lack of association between antimyelin antibodies and progression to multiple sclerosis. N Engl J Med 356, 371-378.

[0085] 20. Orban T, et al. (2007) Reduced CD4+ T-cell-specific gene expression in human type 1 diabetes mellitus. J Autoimmun 28, 177-187.

[0086] 21. Tzachanis D, Lafuente EM, Li L, & Boussiotis VA (2004) Intrinsic and extrinsic regulation of T lymphocyte quiescence. Leuk Lymphoma 45, 1959-1967.

[0087] 22. Matsuoka S, et al. (1995) p57KIP2, a structurally distinct member of the p21CIP1 Cdk inhibitor family, is a candidate tumor suppressor gene. Genes Dev 9, 650-662.

[0088] 23. Plon SE, Leppig KA, Do HN, & Groudine M (1993) Cloning of the human homolog of the CDC34 cell cycle gene by complementation in yeast. Proc Natl Acad Sci USA 90, 10484-10488.

[0089] 24. Ousman SS, et al. (2007) Protective and therapeutic role for alphaB-crystallin in autoimmune demyelination. Nature 448, 474-479.

[0090] 25. Vogt MH, Lopatinskaya L, Smits M, Polman CH, & Nagelkerken L (2003) Elevated osteopontin levels in active relapsing-remitting multiple sclerosis. Ann Neurol 53, 819-822.

[0091] 26. Blum D, et al. (2002) Dissociating perceptual and conceptual implicit memory in multiple sclerosis patients. Brain Cogn 50, 51-61.

[0092] 27. Smith SM, De Stefano N, Jenkinson M, & Matthews PM (2001) Normalized accurate measurement of longitudinal brain change. J Comput Assist Tomogr 25, 466-475.

[0093] 28. Irizarry RA, et al. (2003) Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31, e15.

[0094] 29. Benjamini Y & Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc, 289-300.

Table 4.