
Title:
NUCLEIC ACID SEQUENCES ASSOCIATED WITH CELL STATES
Document Type and Number:
WIPO Patent Application WO/2007/019499
Kind Code:
A3
Abstract:
The present invention is directed to nucleic acid sequences whose expression is associated with different cell states, including nucleic acid sequences whose expression is induced at least 100-fold, or alternatively upregulated, in cells exhibiting asymmetric self-renewal relative to other cells. The invention is also directed to nucleic acid sequences whose expression is induced at least 100-fold, or alternatively upregulated, in cells exhibiting symmetric self-renewal relative to other cells.

Inventors:
SHERLEY JAMES L (US)
NOH MIN-SOO (US)
Application Number:
PCT/US2006/030887
Publication Date:
June 28, 2007
Filing Date:
August 08, 2006
Assignee:
MASSACHUSETTS INSTITUTE OF TECH (US)
SHERLEY JAMES L (US)
NOH MIN-SOO (US)
International Classes:
C12Q1/68
Other References:
R LEWIS: "From parts list to architecture", THE SCIENTIST, vol. 18, no. 16, 30 August 2004 (2004-08-30), US, pages 24 - 24, XP002430531
FORTUNEL NICOLAS O ET AL: "Comment on " 'Stemness': transcriptional profiling of embryonic and adult stem cells" and "a stem cell molecular signature".", SCIENCE (NEW YORK, N.Y.) 17 OCT 2003, vol. 302, no. 5644, 17 October 2003 (2003-10-17), pages 393; author reply 393, XP002430532, ISSN: 1095-9203
BHAT KRISHNA MOORTHI ET AL: "Upregulation of Mitimere and Nubbin acts through cyclin E to confer self-renewing asymmetric division potential to neural precursor cells.", DEVELOPMENT (CAMBRIDGE, ENGLAND) MAR 2004, vol. 131, no. 5, March 2004 (2004-03-01), pages 1123 - 1134, XP002430533, ISSN: 0950-1991
RAMALHO-SANTOS MIGUEL ET AL: ""Stemness": transcriptional profiling of embryonic and adult stem cells.", SCIENCE (NEW YORK, N.Y.) 18 OCT 2002, vol. 298, no. 5593, 18 October 2002 (2002-10-18), pages 597 - 600, XP002430534, ISSN: 1095-9203
IVANOVA N B ET AL: "A stem cell molecular signature", SCIENCE, AMERICAN ASSOCIATION FOR THE ADVANCEMENT OF SCIENCE,, US, vol. 298, no. 5593, 18 October 2002 (2002-10-18), pages 601 - 604, XP002327803, ISSN: 0036-8075
Attorney, Agent or Firm:
EISENSTEIN, Ronald, I. et al. (100 Summer Street Boston, MA, US)
Claims:

We claim:

1. A method for identifying a cell exhibiting asymmetric self-renewal comprising detecting or measuring expression of five or more of the nucleic acid sequences selected from the group consisting of SEQ ID NOs: 1 - 141, wherein an at least 100 fold increase in expression level relative to isogenic cells not undergoing asymmetric replication of five or more of said nucleic acids is indicative of a cell exhibiting asymmetric self-renewal, and wherein said expression level is measured using a nucleic acid array.

2. The method of claim 1, wherein one measures expression of at least 10 of said nucleic acid sequences.

3. The method of claim 1, wherein the culture of cells comprises human cells and at least one of the nucleic acids is a human homolog from the group consisting of AF308602; AI264121; AU160041; AL136573; NM_017585; AF047004; AL136566; NM_005545; AF327066; U73531; BC016797; BE781857; NM_024660; NM_019099; AL133001; NM_024587; AI954412; AI393309; NM_030581; and NM_017585.

4. The method of claim 1, wherein the culture of cells comprises human cells and at least one of the nucleic acids is a human homolog selected from the group consisting of NM_008714; BB559706; AK005731; BB131106; BB196807; BI217574; BC024599; NM_012043; NM_008026; NM_030712; BF457736; BE981473; BB009770; BB049759; AU020235; BC019937; BC026495; AW259452; BB215355; and BB196807.

5. A method for identifying a cell exhibiting symmetric self-renewal comprising detecting or measuring expression of five or more of the nucleic acid sequences selected from the group consisting of SEQ ID NOs: 142 - 215, wherein an at least 100 fold increase in expression level relative to isogenic cells not undergoing asymmetric replication of five or more of said nucleic acids is indicative of a cell exhibiting symmetric self-renewal, and wherein said expression level is measured using a nucleic acid array.

6. The method of claim 5, wherein one measures expression of at least 10 of said nucleic acid sequences.

7. A method for identifying a cell exhibiting asymmetric self-renewal comprising detecting or measuring expression of five or more of the nucleic acid sequences selected from the group consisting of SEQ ID NOs: 216 - 418, wherein an at least 100 fold increase in expression level relative to isogenic cells not undergoing asymmetric replication of five or more of said nucleic acids is indicative of a cell exhibiting asymmetric self-renewal, and wherein said expression level is measured using a nucleic acid array.

8. The method of claim 7, wherein one measures expression of at least 10 of such nucleic acid sequences.

9. A method for identifying a cell exhibiting symmetric self-renewal comprising detecting or measuring expression of five or more of the nucleic acid sequences selected from the group consisting of SEQ ID NOs: 419 - 604, wherein an at least 100 fold increase in expression level relative to isogenic cells not undergoing asymmetric replication of five or more of said nucleic acids is indicative of a cell exhibiting symmetric self-renewal, and wherein said expression level is measured using a nucleic acid array.

10. The method of claim 9, wherein one measures expression of at least 10 of said nucleic acid sequences.

11. A method for identifying a cell exhibiting symmetric self-renewal comprising detecting or measuring expression of five or more of the nucleic acid sequences selected from the group consisting of SEQ ID NOs: 605-624, wherein an at least 100 fold change in expression level relative to isogenic cells not undergoing asymmetric replication of five or more of said nucleic acids is indicative of a cell exhibiting symmetric self-renewal, and wherein said expression level is measured using a nucleic acid array.

16. The method of claim 11, wherein one measures expression of at least 10 of said nucleic acid sequences.

17. The method of claim 11, wherein the change in expression level is an at least 100 fold increase in expression level.

Description:

NUCLEIC ACID SEQUENCES ASSOCIATED WITH CELL STATES

CROSS REFERENCE TO RELATED APPLICATIONS

[001] This Application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Application No. 60/706,366 filed August 08, 2005.

GOVERNMENT SUPPORT

[002] This invention was supported by N.I.H.-N.H.G.R.I. grant number PSO HG 003170-02, and N.I.H.-N.I.E.H.S. C.E.H.S. pilot grant, and the government of the United States has certain rights thereto.

FIELD OF THE INVENTION

[003] The present application is directed to our identification of certain groupings of nucleic acid sequences associated with different cell states, including asymmetric self-renewal associated genes and symmetric self-renewal associated genes. The invention provides methods of using such nucleic acid sequences, including methods to identify cells displaying asymmetric self-renewal (ASR), stem cells, stem cell specific markers, methods to identify and enumerate ASR cells, stem cells, as well as methods of using such nucleic acids.

BACKGROUND OF THE INVENTION

[004] Considerable attention has focused on stem cells such as embryonic stem cells and non-embryonic stem cells, and their uses in a range of therapies. The availability of stem cells from non-embryonic tissues can greatly contribute to cell replacement therapies such as bone marrow transplants, gene therapies, tissue engineering, and in vitro organogenesis. Production of autologous stem cells to replace injured tissue would also reduce the need for immune suppression interventions. Beyond their potential therapeutic applications, homogenous preparations of, for example, adult stem cells would have another important benefit, the ability to study their molecular and biochemical properties.

[005] The existence of stem cells in somatic tissues is well established by functional tissue cell transplantation assays (Reisner et al, 1978). However, their individual identification has been difficult to accomplish. Even though their numbers have been enriched by methods such as immuno-selection with specific antibodies, there are no known markers that uniquely identify stem cells in somatic tissues (Merok and Sherley, 2001). Secondly, adult stem cells are often present in only minute quantities, are difficult to isolate and purify, and their numbers may decrease with age.

[006] Mammalian adult stem cells replicate by asymmetric self-renewal to replenish cells in tissues that undergo cell turnover but maintain a constant cell mass (J. L. Sherley, Stem Cells 20, 561 (2002); M. Loeffler, C. S. Potten, in Stem Cells (ed, Potten, C. S.) 1-27 (Academic Press, London, 1997)). Each asymmetric adult stem cell division yields a new stem cell and a non-stem cell sister. The non-stem cell sister becomes the progenitor of the differentiated cells responsible for mature tissue functions (Loeffler, 1997; Sherley, 2002). In contrast, embryonic stem cells exhibit symmetric self-renewal (Stead E, et al., Oncogene 21(54):8320-33 (2002); Savatier P, et al., Oncogene (3):809-18 (1994)).

[007] Cells display a range of expression states at certain times or in response to environmental stimuli, e.g. from resting to replicating. Recently attention has focused on identifying gene patterns, including mRNA patterns and protein expression patterns, connected with such different states. This is sometimes referred to as gene profiling - where transcriptomes associated with a specific state are identified. Being able to identify certain genes (and/or associated proteins and/or transcripts) that are associated with a cell being in a specific state permits one to readily identify and screen for specific cells, even from a population of related cells.

[008] Thus, despite the need for methods to identify and isolate specific cells from an individual, it has not been possible to readily do so. Accordingly, it would be desirable to have a method to identify markers associated with different cells and/or different cell states in mammalian tissues.

SUMMARY OF THE INVENTION

[009] We have now discovered groupings of nucleic acid sequences and corresponding proteins whose expression is associated with different cell states.

[0010] One embodiment of the invention is directed to nucleic acid sequences whose expression is changed by at least 100-fold in cells exhibiting asymmetric self-renewal relative to isogenic cells not undergoing such replication, as measured using a nucleic acid array. In one embodiment, the change in expression is measured using Affymetrix™ nucleic acid technology. Preferably, the change is an induction; one can also look for suppression, i.e., a decrease in expression.

[0011] One embodiment provides a gene expression profile associated with asymmetric self-renewal comprising an at least 100 fold increase in expression level relative to isogenic cells not undergoing asymmetric replication of at least five nucleic acid sequences, preferably at least ten nucleic acid sequences, selected from the group of Table 1, SEQ ID NOs: 1 - 141. In one embodiment, the cells are human cells and at least one of the nucleic acid sequences is selected from the group consisting of AF308602; AI264121; AU160041; AL136573; NM_017585; AF047004; AL136566; NM_005545; AF327066; U73531; BC016797; BE781857; NM_024660; NM_019099; AL133001; NM_024587; AI954412; AI393309; NM_030581; and NM_017585. In one embodiment, the cells are murine cells and at least one of the nucleic acids is selected from the group consisting of NM_008714; BB559706; AK005731; BB131106; BB196807; BI217574; BC024599; NM_012043; NM_008026; NM_030712; BF457736; BE981473; BB009770; BB049759; AU020235; BC019937; BC026495; AW259452; BB215355; and BB196807.

[0012] One embodiment of the invention provides identifying nucleic acid sequences whose expression is induced by at least 100-fold in cells exhibiting symmetric self-renewal relative to other cells. One embodiment provides a gene expression profile associated with symmetric self-renewal comprising at least five nucleic acid sequences, preferably at least ten nucleic acid sequences, selected from the group of Table 2, SEQ ID NOs: 142 - 215.

[0013] One embodiment of the invention provides identifying nucleic acid sequences whose expression is upregulated in cells exhibiting asymmetric self-renewal relative to other cells. One embodiment provides a gene expression profile associated with asymmetric self-renewal comprising at least five nucleic acid sequences, preferably at least ten nucleic acid sequences, selected from the group of Table 3, SEQ ID NOs: 216 - 418.

[0014] One embodiment of the invention provides identifying nucleic acid sequences whose expression is upregulated in cells exhibiting symmetric self-renewal, as compared to cells exhibiting asymmetric self-renewal. One embodiment provides a gene expression profile associated with symmetric self-renewal comprising at least five nucleic acid sequences, preferably at least ten nucleic acid sequences, selected from the group of Table 4, SEQ ID NOs: 419 - 604.

[0015] The nucleic acid sequences of the invention may be used as markers for cells exhibiting different cell states. In one embodiment, expression of at least 5, preferably at least 10, of the nucleic acid sequences of Table 1, SEQ ID NOs: 1 - 141, is indicative of asymmetrically self-renewing cells.

[0016] One embodiment of the invention provides for identifying a cell exhibiting symmetric self-renewal comprising detecting or measuring expression of five or more of the nucleic acid sequences selected from the group consisting of SEQ ID NOs: 605-624, wherein an at least 100 fold change in expression level relative to isogenic cells not undergoing asymmetric replication of five or more of said nucleic acids is indicative of a cell exhibiting symmetric self-renewal, and wherein said expression level is measured using a nucleic acid array. In one embodiment, the change in expression level is an at least 100 fold increase in expression level. In one embodiment, one measures expression of at least 10 of said nucleic acid sequences.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] Figure 1 is a schematic that shows asymmetric self-renewal kinetics of adult stem cells.

[0018] Figure 2 is a schematic that shows cell culture model systems which conditionally exhibit asymmetric self-renewal or symmetric self-renewal. Essential features of the model cell lines for studying asymmetric self-renewal include 1) reversible regulation of self-renewal symmetry by p53 expression, and 2) non-random chromosome co-segregation. Four different models are shown.

[0019] Figure 3 is a schematic that shows the experimental design for the Affymetrix GeneChip™ analysis.

[0020] Figure 4 shows three graphs of expression of p53, IMPDH2, and p21 using two different probe sets to analyze three populations of cells: p53 null cells, which exhibit symmetric self-renewal; p53 induced cells, which exhibit asymmetric self-renewal; and p53 induced cells which also express IMPDH, which exhibit symmetric self-renewal.

[0021] Figure 5 shows representative results of genes exclusively expressed in cells with asymmetric self-renewal, exclusively expressed in cells with symmetric self-renewal, genes differentially expressed in cells with asymmetric self-renewal, and genes differentially expressed in cells with symmetric self-renewal.

[0022] Figure 6 shows a Western blot confirming the expression of several genes identified by evaluation of whole genome transcripts associated with different cell self-renewal states.

[0023] Figure 7 shows the expression of several proteins exclusively expressed in cells exhibiting asymmetric self-renewal.

[0024] Figure 8 shows change in the localization of survivin, an asymmetric self-renewal associated gene down-regulated during ASR, during the different stages of mitosis in asymmetrically self-renewing (non-random chromosome segregation) cells compared to symmetrically self-renewing cells (random chromosome segregation). The localization of survivin is normal in asymmetrically self-renewing cells (non-random chromosome segregation), except in telophase when it is often undetectable in centrosomes.

[0025] Figure 9 shows that survivin localization to the centrosome is reduced during non-random chromosome segregation. These data represent quantitative analysis of survivin localization during prophase, metaphase, anaphase, and telophase in asymmetrically self-renewing (non-random chromosome segregation) cells compared to symmetrically self-renewing cells (random chromosome segregation).

DETAILED DESCRIPTION OF THE INVENTION

[0026] We have now discovered groups of nucleic acid sequences associated with different cell states. Accordingly, the present invention is directed to gene groups and methods of using the gene groups to identify cells in different cell states, including asymmetric self-renewal and symmetric self-renewal.

[0027] Asymmetric self-renewal (ASR, sometimes referred to as asymmetric replication) is illustrated in Figure 1 (J. L. Sherley, Stem Cells 20, 561 (2002); M. Loeffler, C. S. Potten, in Stem Cells (ed, Potten, C. S.) 1-27 (Academic Press, London, 1997)). Mammalian adult stem cells display ASR and use ASR to replenish cells in tissues that undergo cell turnover but maintain a constant cell mass (Loeffler, 1997; Sherley, 2002). Each asymmetric adult stem cell division yields a new stem cell and a non-stem cell sister (i.e. a differentiated as opposed to pluripotent cell). The non-stem cell sister becomes the progenitor of the differentiated cells responsible for mature tissue functions (Loeffler, 1997; Sherley, 2002).

[0028] Symmetric self renewal is a general property of established cell lines in culture. Shifts from asymmetric self-renewal to symmetric self-renewal occur during adult maturation, wound repair, and in precancerous cells (see Figure 1). Additionally, embryonic stem cells exhibit symmetric self-renewal (Stead E, et al., Oncogene 21(54):8320-33 (2002); Savatier P, et al., Oncogene (3):809-18 (1994)).

[0029] Because asymmetric self-renewal is associated with non-embryonic stem cells, genes whose expression profiles are associated with asymmetric self-renewal are useful to identify such stem cells.

[0030] The present invention takes advantage of cell lines which model asymmetric and symmetric self-renewal, as illustrated in Figures 2 and 3. One regulator of asymmetric self-renewal is the p53 tumor suppressor protein. Several stable cultured murine cell lines have been derived that exhibit asymmetric self-renewal in response to controlled expression of the wild-type murine p53 (Figure 2). (Sherley, 1991; Sherley et al, 1995 A-B; Liu et al., 1998 A-B; Rambhatla et al., 2001).

Gene expression profiles

[0031] We have now discovered various nucleic acid sequences whose expression is associated with different cell states. These global changes in gene expression are also referred to as expression profiles. The expression profiles have been used to identify individual genes that are differentially expressed under one or more conditions. In addition, the present invention identifies groups of genes that are differentially expressed. As used herein, "gene groups" includes, but is not limited to, the specific genes identified by accession number herein, as well as related sequences, the mRNAs and associated proteins.

[0032] The present invention provides gene groups whose expression is associated either with cells expressing asymmetric self-renewal or symmetric self-renewal. The gene groups are further classified into genes expressed exclusively in cells exhibiting asymmetric self-renewal; genes whose expression is induced in cells exhibiting asymmetric self-renewal relative to other cells; genes expressed exclusively in cells exhibiting symmetric self-renewal; and genes whose expression is induced in cells exhibiting symmetric self-renewal relative to other cells. Thus, by looking at enhanced or reduced expression in genes relative to other cells or other replicating cells one can readily screen for and select cells from a population of similar cells that are undergoing ASR or symmetric self-renewal. The change in expression of genes relative to other cells can be at least 50-fold, at least 100-fold, at least 150-fold, at least 200-fold, or at least 250-fold.
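The fold-change comparison described in paragraph [0032] can be illustrated with a minimal Python sketch. The function names and the choice to treat a decrease symmetrically with an increase are assumptions made for illustration only; the disclosure does not specify how the ratio is computed from array signals.

```python
# Illustrative sketch only: function names and the ratio convention are hypothetical.
FOLD_THRESHOLDS = (50, 100, 150, 200, 250)  # tiers listed in paragraph [0032]


def fold_change_magnitude(test_signal, reference_signal):
    """Return the larger of test/reference and reference/test.

    Using the larger ratio lets a 100-fold suppression count the same
    as a 100-fold induction (an assumption for this sketch).
    """
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive to form a ratio")
    return max(test_signal / reference_signal, reference_signal / test_signal)


def thresholds_met(test_signal, reference_signal):
    """List every fold-change tier reached by the measured change."""
    magnitude = fold_change_magnitude(test_signal, reference_signal)
    return [tier for tier in FOLD_THRESHOLDS if magnitude >= tier]
```

For example, a hypothetical signal of 12000 against a reference of 100 is a 120-fold change and reaches the 50-fold and 100-fold tiers but not the 150-fold tier.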

[0033] One embodiment of the invention provides nucleic acid sequences whose expression is induced by at least 100-fold in cells exhibiting asymmetric self-renewal relative to other cells. One embodiment provides a gene expression profile associated with asymmetric self-renewal comprising at least five nucleic acid sequences selected from the group of Table 1, SEQ ID NOs: 1 - 141. Preferably, one looks for changes in at least ten genes from the group. As used herein, all combinations between 5 to all 141 members can be looked at, such as 15, 20, 25, 35, 50, 75, 100, 141, etc. Additionally, one can look at other indicators of gene expression such as mRNA or the expression of the encoded proteins. In one embodiment, the cells are human cells and at least one of the nucleic acid sequences is selected from the group consisting of AF308602; AI264121; AU160041; AL136573; NM_017585; AF047004; AL136566; NM_005545; AF327066; U73531; BC016797; BE781857; NM_024660; NM_019099; AL133001; NM_024587; AI954412; AI393309; NM_030581; and NM_017585 (see Table 6). In one embodiment, the cells are murine cells and at least one of the nucleic acids is selected from the group consisting of NM_008714; BB559706; AK005731; BB131106; BB196807; BI217574; BC024599; NM_012043; NM_008026; NM_030712; BF457736; BE981473; BB009770; BB049759; AU020235; BC019937; BC026495; AW259452; BB215355; and BB196807 (see Table 5).
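The panel rule of paragraph [0033] (at least five sequences from the group, each induced at least 100-fold relative to reference cells) can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the signal values are invented for the example, and only the accession numbers are taken from the murine group listed above.

```python
# Illustrative sketch only: signal values are invented; names are hypothetical.
def count_induced(test, reference, panel, min_fold=100.0):
    """Count panel members whose test/reference ratio is at least min_fold."""
    count = 0
    for marker in panel:
        observed = test.get(marker)
        baseline = reference.get(marker)
        if observed is None or baseline is None or baseline <= 0:
            continue  # skip markers not measured in both samples
        if observed / baseline >= min_fold:
            count += 1
    return count


def indicates_asr(test, reference, panel, min_markers=5, min_fold=100.0):
    """True when enough panel members reach the fold-increase threshold."""
    return count_induced(test, reference, panel, min_fold) >= min_markers


# Six accession numbers from the murine group; the signals below are made up.
EXAMPLE_PANEL = ["NM_008714", "BB559706", "AK005731",
                 "BB131106", "BB196807", "BI217574"]
example_reference = {marker: 10.0 for marker in EXAMPLE_PANEL}
example_test = {"NM_008714": 1500.0, "BB559706": 2200.0, "AK005731": 1000.0,
                "BB131106": 4000.0, "BB196807": 1200.0, "BI217574": 300.0}
```

With these invented values, five of the six markers reach a 100-fold increase, so `indicates_asr(example_test, example_reference, EXAMPLE_PANEL)` returns `True`; raising `min_markers` to 10 gives the preferred, stricter rule.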

[0034] One embodiment of the invention provides nucleic acid sequences whose expression is induced in cells exhibiting symmetric self-renewal relative to other cells by at least 100-fold. One embodiment provides a gene expression profile associated with symmetric self-renewal comprising at least five nucleic acid sequences selected from the group of Table 2, SEQ ID NOs: 142 - 215. Preferably, one looks for changes in at least ten genes from the group. As used herein, all combinations between 5 to all 74 members can be looked at, such as 15, 20, 25, 35, 50, 74, etc. Additionally, one can look at other indicators of gene expression such as mRNA or the expression of the encoded proteins.

[0035] One embodiment of the invention provides nucleic acid sequences whose expression is upregulated in cells exhibiting asymmetric self-renewal relative to other cells. One embodiment provides a gene expression profile associated with asymmetric self-renewal comprising at least five nucleic acid sequences selected from the group of Table 3, SEQ ID NOs: 216 - 418. Preferably, one looks for changes in at least ten genes from the group. As used herein, all combinations between 5 to all 203 members can be looked at, such as 15, 20, 25, 35, 50, 75, 100, 150, 203, etc. Additionally, one can look at other indicators of gene expression such as mRNA or the expression of the encoded proteins.

[0036] One embodiment of the invention provides nucleic acid sequences whose expression is upregulated in cells exhibiting symmetric self-renewal, as compared to cells exhibiting asymmetric self-renewal. (This can be looked at as having decreased expression in cells exhibiting ASR relative to symmetric replication.) One embodiment provides a gene expression profile associated with symmetric self-renewal comprising at least five nucleic acid sequences selected from the group of Table 4, SEQ ID NOs: 419 - 604. Preferably, one looks for changes in at least ten genes from the group. As used herein, all combinations between 5 to all 186 members can be looked at, such as 15, 20, 25, 35, 50, 75, 100, 150, 186, etc. Additionally, one can look at other indicators of gene expression such as mRNA or the expression of the encoded proteins.

[0037] One embodiment of the invention provides for identifying a cell exhibiting symmetric self-renewal comprising detecting or measuring expression of five or more of the nucleic acid sequences selected from the group consisting of SEQ ID NOs: 605-624, wherein an at least 100 fold change in expression level relative to isogenic cells not undergoing asymmetric replication of five or more of said nucleic acids is indicative of a cell exhibiting symmetric self-renewal, when said expression level is measured using a nucleic acid array. In one embodiment, the change in expression level is an at least 100 fold increase in expression level. In one embodiment, one measures expression of at least 10 of said nucleic acid sequences. As used herein, all combinations between 5 to all 20 members can be looked at, such as 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, and 20 members. Additionally, one can look at other indicators of gene expression such as mRNA or the expression of the encoded proteins and correlate the level of expression measured in such embodiment. In one embodiment, the combination measured does not include at least one of the sequences selected from the group consisting of SEQ ID NO: 605, SEQ ID NO: 606, SEQ ID NO: 607, and SEQ ID NO: 611.
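To give a sense of scale for "all combinations between 5 to all 20 members", the number of distinct marker subsets can be counted with a short Python sketch. The function names are hypothetical, and representing sequences by their integer SEQ ID NOs is an assumption for illustration.

```python
# Illustrative sketch only: names are hypothetical.
from math import comb

# SEQ ID NOs named in the last embodiment of paragraph [0037].
EXCLUDABLE_SEQ_IDS = frozenset({605, 606, 607, 611})


def count_panels(total_members, min_size=5):
    """Number of distinct subsets holding min_size..total_members members."""
    return sum(comb(total_members, k)
               for k in range(min_size, total_members + 1))


def omits_at_least_one(panel_ids, excludable=EXCLUDABLE_SEQ_IDS):
    """True when the panel leaves out at least one of the excludable IDs."""
    return any(seq_id not in panel_ids for seq_id in excludable)


print(count_panels(20))  # prints 1042380
```

The count for the 20-member group is 2^20 minus the 6196 subsets with fewer than five members, i.e. 1,042,380 possible combinations.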

[0038] The nucleic acid sequences and corresponding expressed proteins of the invention may be used as markers to identify cells exhibiting different cell states. For example, the nucleic acid sequences are useful for the development of cell state-specific molecular probes, as well as methods to identify desired cells in tissues and to isolate them directly from tissues. In one embodiment one can identify non-embryonic stem cells from a population of cells and isolate them by taking advantage of the correlation between cells exhibiting ASR and such stem cells.

[0039] In one embodiment, expression of any of the nucleic acid sequences of Table 1, SEQ ID NOs: 1 - 141, is indicative of asymmetrically self-renewing cells. Preferably, it is a grouping of at least five of those sequences. However, one can use any of five to all one hundred forty-one, such as 10, 15, 25, 50, 75, 90, 100, 141 and all combinations in between. In one embodiment one looks at the level of mRNAs. Alternatively, one looks at the expressed proteins. Expression of these nucleic acid sequences can be used to identify, detect, and quantify cells exhibiting asymmetric self-renewal, including non-embryonic stem cells.

[0040] One particularly preferred group of genes exclusively expressed in asymmetrically self-renewing cells is provided in Tables 5 and 6. For each Affy ID, determined as described in detail in the example below, Table 5 provides for the mouse genes the corresponding GenBank ID and gene name, as well as a description of the gene and the SEQ ID NO. used herein. Similarly, Table 6 provides for the human genes the corresponding GenBank ID and gene name, as well as a description of the gene and the SEQ ID NO. for the human gene.

[0041] In one embodiment, expression of any of the nucleic acid sequences of Table 2, SEQ ID NOs: 142 - 215, can be used to identify cells dividing with symmetric self-renewal. In one embodiment, these nucleic acid sequences are useful for discriminating between adult stem cells and their transient amplifying progeny. These nucleic acid sequences are also useful for identifying potential pre-cancerous and cancerous cells. These nucleic acid sequences are also useful as indicators of effective expansion of adult stem cells. Preferably, it is a grouping of at least five of those sequences. However, one can use any of five to all seventy-four, such as 10, 15, 25, 50, 74, and all combinations in between. In one embodiment one looks at the level of mRNAs. Alternatively, one looks at the expressed proteins.

[0042] In one embodiment, expression of any of the nucleic acid sequences of Table 3, SEQ ID NOs: 216 - 418, which are expressed in cells undergoing either asymmetric or symmetric self-renewal, but expressed at a higher level during asymmetric self-renewal, can be used to identify, detect, and quantify cells, including adult stem cells. Preferably, it is a grouping of at least five of those sequences. However, one can use any of five to all two hundred and three, such as 10, 15, 25, 50, 75, 90, 100, 150, 203, and all combinations in between. In one embodiment one looks at the level of mRNAs. Alternatively, one looks at the expressed proteins.

[0043] In one embodiment, expression of any of the nucleic acid sequences of Table 4, SEQ ID NOs: 419 - 604, can be used to identify cells dividing with symmetric self-renewal. In one embodiment, these nucleic acid sequences are useful for discriminating between adult stem cells and their transient amplifying progeny. These nucleic acid sequences are also useful for identifying potential pre-cancerous and cancerous cells. These nucleic acid sequences are also useful as indicators of effective expansion of adult stem cells. Preferably, it is a grouping of at least five of those sequences. However, one can use any of five to all one hundred eighty-six, such as 10, 15, 25, 50, 75, 90, 100, 150, 186, and all combinations in between. In one embodiment one looks at the level of mRNAs. Alternatively, one looks at the expressed proteins.

[0044] In one embodiment, the exemplary probes shown in the column "Affy ID" of Tables 1 - 6 can be used to detect expression of the nucleic acid sequences of the invention. The sequences of the individual probes of the Affymetrix GeneChip® 430 2.0 array are publicly available, including from Affymetrix, affymetrix.com/products/arrays/index.affx. Alternatively, any sequences which hybridize to those genes can be used. One can use chips from any commercial manufacturer to identify the expression levels.

Methods of detection

[0045] The expression profiles have been used to identify individual genes that are differentially expressed under one or more conditions. In addition, the present invention identifies families of genes that are differentially expressed. As used herein, "gene families" includes, but is not limited to, the specific genes identified by accession number herein, as well as related sequences. Related sequences may be, for example, sequences having a high degree of sequence identity with a specifically identified sequence either at the nucleotide level or at the level of amino acids of the encoded polypeptide. A high degree of sequence identity is seen to be at least about 65% sequence identity at the nucleotide level to said genes, preferably about 80 or 85% sequence identity or more preferably about 90 or 95% or more sequence identity to said genes. With regard to amino acid identity of encoded polypeptides, a high degree of identity is seen to be at least about 50% identity, more preferably about 75% identity and most preferably about 85% or more sequence identity. In particular, related sequences include homologous genes from different organisms. For example, if the specifically identified gene is from a non-human mammal, the gene family would encompass homologous genes from other mammals including humans. If the specifically identified gene is a human gene, gene family would encompass the homologous gene from different organisms. Those skilled in the art will appreciate that a homologous gene may be of different length and may comprise regions with differing amounts of sequence identity to a specifically identified sequence.
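The identity cutoffs of paragraph [0045] can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length with "-" marking gaps, ignores gap columns when computing identity, and uses the lower value of each stated pair (80 of "80 or 85", 90 of "90 or 95") as the tier boundary; all of these choices, and the function and label names, are assumptions for illustration rather than part of the disclosure.

```python
# Illustrative sketch only: gap handling, tier boundaries and names are assumptions.
NUCLEOTIDE_TIERS = ((90.0, "more preferred"),
                    (80.0, "preferred"),
                    (65.0, "high identity"))
AMINO_ACID_TIERS = ((85.0, "most preferred"),
                    (75.0, "more preferred"),
                    (50.0, "high identity"))


def percent_identity(aligned_a, aligned_b):
    """Percent of non-gap aligned columns where the two sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are not counted in this sketch
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def identity_tier(pct, tiers):
    """Return the label of the highest tier reached, or None below all tiers."""
    for cutoff, label in tiers:
        if pct >= cutoff:
            return label
    return None
```

For instance, two aligned 10-nucleotide sequences differing at one position are 90% identical, which falls in the highest nucleotide tier of this sketch.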

[0046] The genes and sequences identified as being differentially expressed in the various cell populations described herein, as well as related sequences, may be used in a variety of nucleic acid detection assays to detect or quantititate the expression level of a gene or multiple genes in a given sample. For example, traditional Northern blotting, nuclease protection, RT-PCR, QPCR (quantitative RT-PCR), Taqman.RTM. and differential display methods may be used for detecting gene expression levels. Those methods are useful for some embodiments of the invention. However, methods and assays of the invention are most efficiently designed with hybridization-based methods for detecting the expression of a large number of genes.

[0047] The genes which are assayed according to the present invention are typically in the form of mRNA or reverse transcribed mRNA. The genes may be cloned or not. The genes may be amplified or not. In certain embodiments, it may be preferable to use polyadenylated RNA as a source, as it can be used with fewer processing steps.

[0048] Tables 1-8 provide the Accession numbers and name for the sequences of the differentially expressed markers (SEQ ID NOs: 1-624). The sequences of the genes in GenBank are expressly incorporated herein.

[0049] Table 9 provides an example showing the sequences corresponding to the GenBank accession IDs listed in Table 6.

[0050] Probes based on the sequences of the genes described above may be prepared by any commonly available method. Oligonucleotide probes for interrogating the tissue or cell sample are preferably of sufficient length to specifically hybridize only to appropriate, complementary genes or transcripts. Typically the oligonucleotide probes will be at least 10, 12, 14, 16, 18, 20 or 25 nucleotides in length. In some cases longer probes of at least 30, 40 or 50 nucleotides will be desirable.

[0051] As used herein, oligonucleotide sequences that are complementary to one or more of the genes and/or gene families described in Tables 1-8, refer to oligonucleotides that are capable of hybridizing under stringent conditions to at least part of the nucleotide sequences of said genes. Such hybridizable oligonucleotides will typically exhibit at least about 75% sequence identity at the nucleotide level to said genes, preferably about 80 or 85% sequence identity or more preferably about 90 or 95% or more sequence identity to said genes.

[0052] "Bind(s) substantially" refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target polynucleotide sequence.

[0053] The terms "background" or "background signal intensity" refer to hybridization signals resulting from non-specific binding, or other interactions, between the labeled target nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes, control probes, the array substrate, etc.). Background signals may also be produced by intrinsic fluorescence of the array components themselves. A single background signal can be calculated for the entire array, or a different background signal may be calculated for each target nucleic acid. In a preferred embodiment, background is calculated as the average hybridization signal intensity for the lowest 5 to 10% of the probes in the array, or, where a different background signal is calculated for each target gene, for the lowest 5 to 10% of the probes for each gene. Of course, one of skill in the art will appreciate that where the probes to a particular gene hybridize well and thus appear to be specifically binding to a target sequence, they should not be used in a background signal calculation. Alternatively, background may be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the sample (e.g., probes directed to nucleic acids of the opposite sense or to genes not found in the sample such as bacterial genes where the sample is mammalian nucleic acids). Background can also be calculated as the average signal intensity produced by regions of the array that lack any probes at all.
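The first background calculation described above (the average signal of the lowest 5 to 10% of probes) may be sketched as follows. This is a minimal illustration only: the function name `background_signal`, the `fraction` parameter and the example intensities are assumptions introduced here and are not part of the specification.

```python
from statistics import mean


def background_signal(intensities, fraction=0.05):
    """Return the mean intensity of the lowest `fraction` of probe signals.

    `fraction` would be chosen between 0.05 and 0.10 to follow the
    "lowest 5 to 10%" rule; the name and default are illustrative.
    """
    if not intensities:
        raise ValueError("no probe intensities supplied")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(intensities)
    # Keep at least one probe so that the mean is always defined.
    n_lowest = max(1, int(len(ordered) * fraction))
    return mean(ordered[:n_lowest])


# Hypothetical raw intensities for a 20-probe array.
signals = [120.0, 95.0, 4000.0, 88.0, 2500.0, 101.0, 90.0, 3100.0, 97.0, 110.0,
           92.0, 1800.0, 99.0, 105.0, 93.0, 2700.0, 96.0, 94.0, 91.0, 89.0]

# With fraction=0.10 the two dimmest probes (88.0 and 89.0) are averaged.
background = background_signal(signals, fraction=0.10)
```

Probes that hybridize well to their target would be excluded from `intensities` before the call, consistent with the caution given above.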

[0054] The phrase "hybridizing specifically to" refers to the binding, duplexing, or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA).

[0055] Assays and methods of the invention may utilize available formats to simultaneously screen at least about 100, preferably about 1000, more preferably about 10,000 and most preferably about 100,000 different nucleic acid hybridizations.

[0056] The terms "mismatch control" or "mismatch probe" refer to a probe whose sequence is deliberately selected not to be perfectly complementary to a particular target sequence. For each mismatch (MM) control in a high-density array there typically exists a corresponding perfect match (PM) probe that is perfectly complementary to the same particular target sequence. The mismatch may comprise one or more bases.

[0057] While the mismatch(s) may be located anywhere in the mismatch probe, terminal mismatches are less desirable as a terminal mismatch is less likely to prevent hybridization of the target sequence. In a particularly preferred embodiment, the mismatch is located at or near the center of the probe such that the mismatch is most likely to destabilize the duplex with the target sequence under the test hybridization conditions.

[0058] The term "perfect match probe" refers to a probe that has a sequence that is perfectly complementary to a particular target sequence. The test probe is typically perfectly complementary to a portion (subsequence) of the target sequence. The perfect match (PM) probe can be a "test probe" or a "normalization control" probe, an expression level control probe and the like. A perfect match control or perfect match probe is, however, distinguished from a "mismatch control" or "mismatch probe" as defined herein.

[0059] As used herein a "probe" is defined as a nucleic acid, capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. As used herein, a probe may include natural (i.e., A, G, U, C or T) or modified bases (7-deazaguanosine, inosine, PNAs, etc.). In addition, the bases in probes may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization. Thus, probes may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.

[0060] The term "stringent conditions" refers to conditions under which a probe will hybridize to its target subsequence, but with only insubstantial hybridization to other sequences or to other sequences such that the difference may be identified. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5 °C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
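As a rough illustration of selecting a hybridization temperature about 5 °C below Tm, the following sketch estimates Tm with the Wallace rule (2 °C per A or T plus 4 °C per G or C). The Wallace rule is a well-known approximation for short oligonucleotides and is an assumption made here; the specification does not prescribe any particular Tm formula, and the function names are hypothetical.

```python
def wallace_tm(probe):
    """Approximate Tm in degrees C using the Wallace rule: 2*(A+T) + 4*(G+C).

    This is a coarse estimate for short DNA oligonucleotides only; it
    ignores ionic strength, pH and probe concentration.
    """
    seq = probe.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("probe must be a non-empty DNA sequence of A, C, G, T")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


def stringent_temperature(probe, offset=5):
    """Return a temperature `offset` degrees C below the estimated Tm."""
    return wallace_tm(probe) - offset


# Hypothetical 20-mer with ten A/T and ten G/C bases.
probe = "ATGCATGCATGCATGCATGC"
tm = wallace_tm(probe)                     # 2*10 + 4*10 = 60
hyb_temp = stringent_temperature(probe)    # 60 - 5 = 55
```

In practice Tm depends on the defined ionic strength and pH, so an empirical or nearest-neighbor estimate would normally replace the simple rule used in this sketch.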

[0061] Typically, stringent conditions will be those in which the salt concentration is at least about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30 °C for short probes (e.g., 10 to 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.

[0062] The "percentage of sequence identity" or "sequence identity" is determined by comparing two optimally aligned sequences or subsequences over a comparison window or span, wherein the portion of the polynucleotide sequence in the comparison window may optionally comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical residue (e.g., nucleic acid base or amino acid residue) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
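The calculation just described may be illustrated by the following minimal sketch. It assumes the two sequences have already been optimally aligned and padded with "-" at gap positions, so that both strings span the same comparison window; the function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over an aligned comparison window.

    Matched positions are those holding the identical residue in both
    sequences; the total is every position in the window, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matched / len(aligned_a)


# Hypothetical 10-position window: one gap and one mismatch, 8 matches.
identity = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
```

Under the thresholds recited elsewhere herein, a nucleotide-level result of this kind would be compared against values such as about 65%, 80%, 85%, 90% or 95%.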

[0063] Percentage sequence identity can be calculated by the local homology algorithm of Smith & Waterman, (1981) Adv. Appl. Math. 2:482-485; by the homology alignment algorithm of Needleman & Wunsch, (1970) J. Mol. Biol. 48:443-445; or by computerized implementations of these algorithms (GAP & BESTFIT in the GCG Wisconsin Software Package, Genetics Computer Group) or by manual alignment and visual inspection.
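For illustration, a global alignment in the style of Needleman & Wunsch can be sketched as below. This is not the GAP or BESTFIT implementation: the linear gap penalty and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity, and the function name is hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b with a linear gap penalty.

    Returns (aligned_a, aligned_b, score), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # The first row and column correspond to aligning a prefix against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the dynamic-programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


aligned_a, aligned_b, total = needleman_wunsch("ACGT", "ACT")
```

The two aligned strings returned here have equal length and could be passed to a percent identity calculation over the comparison window.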

[0064] Percentage sequence identity when calculated using the programs GAP or BESTFIT is calculated using default gap weights. The BESTFIT program has two alignment variables, the gap creation penalty and the gap extension penalty, which can be modified to alter the stringency of a nucleotide and/or amino acid alignment produced by the program. Parameter values used in the percent identity determination were default values previously established for version 8.0 of BESTFIT (see Dayhoff, (1979) Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, pp. 353-358).

[0065] As is apparent to one of ordinary skill in the art, nucleic acid samples, which may be DNA and/or RNA, used in the methods and assays of the invention may be prepared by any available method or process. Methods of isolating total mRNA are well known to those of skill in the art. For example, methods of isolation and purification of nucleic acids are described in detail in Chapter 3 of Tijssen, (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Elsevier Press. Such samples include RNA samples, but also include cDNA synthesized from an mRNA sample isolated from a cell or tissue of interest. Such samples also include DNA amplified from the cDNA, and RNA transcribed from the amplified DNA. One of skill in the art would appreciate that it is desirable to inhibit or destroy RNase present in homogenates before homogenates can be used.

[0066] Biological samples may be of any biological tissue or fluid or cells from any organism as well as cells raised in vitro, such as cell lines and tissue culture cells. Frequently, the sample will be a "clinical sample" which is a sample derived from a patient. Typical clinical samples include, but are not limited to, sputum, blood, blood cells (e.g., white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues, such as frozen sections or formalin fixed sections taken for histological purposes.

[0067] In certain embodiments, the term "individual", as used herein, preferably refers to a human. However, the methods are not limited to humans, and a skilled artisan can use the diagnostic/prognostic gene groupings of the present invention in, for example, laboratory test animals, including but not limited to rats, mice, dogs, sheep, pigs, guinea pigs, and other model animals.

[0068] The phrase "altered expression" as used herein, refers to either increased or decreased expression in a cell. The terms "upregulation" and "downregulation" refer to the amount of expression in a first cell or population of cells relative to the amount of expression in a second cell or population of cells.
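The relative comparison in this definition can be expressed as a simple ratio of expression in the first cell population to expression in the second. The sketch below is illustrative only; the function names, the example values and the use of a fold threshold of 100 (taken from the "at least 100-fold" language of the abstract) are assumptions for the example rather than a prescribed procedure.

```python
def fold_change(first, second):
    """Ratio of expression in a first population to that in a second."""
    if first <= 0 or second <= 0:
        raise ValueError("expression values must be positive")
    return first / second


def classify_expression(first, second, threshold=100.0):
    """Label expression in the first population relative to the second.

    `threshold` is an illustrative fold cutoff; 100.0 mirrors the
    "at least 100-fold" induction mentioned in the abstract.
    """
    ratio = fold_change(first, second)
    if ratio >= threshold:
        return "upregulated"
    if ratio <= 1.0 / threshold:
        return "downregulated"
    return "not altered at this threshold"


# Hypothetical background-corrected signals for one gene in two populations.
ratio = fold_change(800.0, 8.0)
label = classify_expression(800.0, 8.0)
```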

[0069] The analysis of the gene expression of one or more gene groups of the present invention can be performed using any gene expression method known to one skilled in the art. Such methods include, but are not limited to, expression analysis using nucleic acid chips (e.g., Affymetrix chips) and quantitative RT-PCR based methods using, for example, real-time detection of the transcripts. Analysis of transcript levels according to the present invention can be made using total or messenger RNA or proteins encoded by the genes identified in the diagnostic gene groups of the present invention as a starting material. In one embodiment the analysis is an immunohistochemical analysis with an antibody directed against proteins comprising at least 5 proteins encoded by the genes of the expression group being analyzed.

[0070] The methods of analyzing transcript levels of the gene groups in an individual include Northern-blot hybridization, ribonuclease protection assay, and reverse transcriptase polymerase chain reaction (RT-PCR) based methods. The different RT-PCR based techniques are the most suitable quantification method for certain applications of the present invention, because they are very sensitive and thus require only a small sample size which is desirable for a diagnostic test. A number of quantitative RT-PCR based methods have been described and are useful in measuring the amount of transcripts according to the present invention. These methods include RNA quantification using PCR and complementary DNA (cDNA) arrays (Shalon et al., Genome Research 6(7):639-45, 1996; Bernard et al., Nucleic Acids Research 24(8):1435-42, 1996), real competitive PCR using a MALDI-TOF mass spectrometry based approach (Ding et al., PNAS, 100:3059-64, 2003), solid-phase mini-sequencing technique, which is based upon a primer extension reaction (U.S. Patent No. 6,013,431, Suomalainen et al. Mol. Biotechnol. Jun;15(2):123-31, 2000), ion-pair high-performance liquid chromatography (Doris et al. J. Chromatogr. A May 8;806(1):47-60, 1998), and 5' nuclease assay or real-time RT-PCR (Holland et al. Proc Natl Acad Sci USA 88:7276-7280, 1991).

[0071] Methods using RT-PCR and internal standards that differ from the desired target sequence by length or restriction endonuclease site have also been developed. These methods allow comparison of the standard with the target using gel electrophoretic separation followed by densitometric quantification of the target, and can be used to detect the amount of the transcripts according to the present invention (see, e.g., U.S. Patent Nos. 5,876,978; 5,643,765; and 5,639,606).

[0072] The practice of the present invention may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and immunology, which are within the skill of the art. Such conventional techniques include polymer array synthesis, hybridization, ligation, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the example herein below. However, other equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), Using Antibodies: A Laboratory Manual, Cells: A Laboratory Manual, PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press), Stryer, L. (1995) Biochemistry (4th Ed.) Freeman, New York, Gait, "Oligonucleotide Synthesis: A Practical Approach" 1984, IRL Press, London, Nelson and Cox (2000), Lehninger, Principles of Biochemistry 3rd Ed., W.H. Freeman Pub., New York, NY and Berg et al. (2002) Biochemistry, 5th Ed., W.H. Freeman Pub., New York, NY, all of which are herein incorporated in their entirety by reference for all purposes.

[0073] The methods of the present invention can employ solid substrates, including arrays in some preferred embodiments. Methods and techniques applicable to polymer (including protein) array synthesis have been described in U.S.S.N. 09/536,841, WO 00/58516, U.S. Patents Nos. 5,143,854, 5,242,974, 5,252,743, 5,324,633, 5,384,261, 5,405,783, 5,424,186, 5,451,683, 5,482,867, 5,491,074, 5,527,681, 5,550,215, 5,571,639, 5,578,832, 5,593,839, 5,599,695, 5,624,711, 5,631,734, 5,795,716, 5,831,070, 5,837,832, 5,856,101, 5,858,659, 5,936,324, 5,968,740, 5,974,164, 5,981,185, 5,981,956, 6,025,601, 6,033,860, 6,040,193, 6,090,555, 6,136,269, 6,269,846 and 6,428,752, in PCT Applications Nos. PCT/US99/00730 (International Publication Number WO 99/36760) and PCT/US01/04285, which are all incorporated herein by reference in their entirety for all purposes.

[0074] Patents that describe synthesis techniques in specific embodiments include U.S. Patents Nos. 5,412,087, 6,147,205, 6,262,216, 6,310,189, 5,889,165, and 5,959,098. Nucleic acid arrays are described in many of the above patents, but the same techniques are applied to polypeptide and protein arrays.

[0075] Nucleic acid arrays that are useful in the present invention include, but are not limited to, those that are commercially available from Affymetrix (Santa Clara, CA) under the brand name GeneChip®. Example arrays are shown on the website at affymetrix.com.

[0076] One of skill in the art will appreciate that an enormous number of array designs are suitable for the practice of this invention. In some preferred embodiments, a high density array may be used. The high density array will typically include a number of probes that specifically hybridize to the sequences of interest (see WO 99/32660 for methods of producing probes for a given gene or genes). In addition, in a preferred embodiment, the array will include one or more control probes.

[0077] High density array chips of the invention include "test probes" as defined herein. Test probes could be oligonucleotides that range from about 5 to about 45 or 5 to about 500 nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably from about 15 to about 40 nucleotides in length. In other particularly preferred embodiments, the probes are 20 or 25 nucleotides in length. In another preferred embodiment, test probes are double or single strand nucleic acid sequences, preferably DNA sequences. Nucleic acid sequences may be isolated or cloned from natural sources or amplified from natural sources using native nucleic acid as templates. These probes have sequences complementary to particular subsequences of the genes whose expression they are designed to detect. Thus, the test probes are capable of specifically hybridizing to the target nucleic acid they are to detect.

[0078] In addition to test probes that bind the target nucleic acid(s) of interest, the high density array can contain a number of control probes. The control probes fall into three categories referred to herein as (1) normalization controls; (2) expression level controls; and (3) mismatch controls.

[0079] Normalization controls are oligonucleotide or other nucleic acid probes that are complementary to labeled reference oligonucleotides or other nucleic acid sequences that are added to the nucleic acid sample to be screened. The signals obtained from the normalization controls after hybridization provide a control for variations in hybridization conditions, label intensity, "reading" efficiency and other factors that may cause the signal of a perfect hybridization to vary between arrays. In a preferred embodiment, signals (e.g., fluorescence intensity) read from all other probes in the array are divided by the signal (e.g., fluorescence intensity) from the control probes thereby normalizing the measurements.
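The division described above may be sketched as follows. Averaging several normalization control probes into one control signal is an assumption made for this example, as are the function name and the probe labels.

```python
from statistics import mean


def normalize_signals(probe_signals, control_signals):
    """Divide every probe signal by the normalization control signal.

    `probe_signals` maps a probe label to its raw intensity;
    `control_signals` holds the intensities of the normalization
    control probes, which are averaged here (an illustrative choice).
    """
    if not control_signals:
        raise ValueError("at least one normalization control is required")
    control = mean(control_signals)
    if control <= 0:
        raise ValueError("control signal must be positive")
    return {label: value / control for label, value in probe_signals.items()}


# Hypothetical fluorescence intensities from one array.
raw = {"gene_A": 500.0, "gene_B": 250.0}
normalized = normalize_signals(raw, control_signals=[100.0, 150.0])
```

Because each array is divided by its own control signal, normalized values from different arrays can be compared even where hybridization conditions or label intensity varied between them.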

[0080] Virtually any probe may serve as a normalization control. However, it is recognized that hybridization efficiency varies with base composition and probe length. Preferred normalization probes are selected to reflect the average length of the other probes present in the array, however, they can be selected to cover a range of lengths. The normalization control(s) can also be selected to reflect the (average) base composition of the other probes in the array, however in a preferred embodiment, only one or a few probes are used and they are selected such that they hybridize well (i.e., no secondary structure) and do not match any target-specific probes.

[0081] Expression level controls are probes that hybridize specifically with constitutively expressed genes in the biological sample. Virtually any constitutively expressed gene provides a suitable target for expression level controls. Typically expression level control probes have sequences complementary to subsequences of constitutively expressed "housekeeping genes" including, but not limited to the actin gene, the transferrin receptor gene, the GAPDH gene, and the like.

[0082] Mismatch controls may also be provided for the probes to the target genes, for expression level controls or for normalization controls. Mismatch controls are oligonucleotide probes or other nucleic acid probes identical to their corresponding test or control probes except for the presence of one or more mismatched bases. A mismatched base is a base selected so that it is not complementary to the corresponding base in the target sequence to which the probe would otherwise specifically hybridize. One or more mismatches are selected such that under appropriate hybridization conditions (e.g., stringent conditions) the test or control probe would be expected to hybridize with its target sequence, but the mismatch probe would not hybridize (or would hybridize to a significantly lesser extent). Preferred mismatch probes contain a central mismatch. Thus, for example, where a probe is a twenty-mer, a corresponding mismatch probe will have the identical sequence except for a single base mismatch (e.g., substituting a G, C or T for an A) at any of positions six through fourteen (the central mismatch).
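A mismatch probe with a single central mismatch can be derived from its perfect match probe as sketched below. The default choices here (mismatch at the middle index, substitution by the complementary base) are assumptions made for the example; the text above permits any non-identical base at any central position.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(perfect_match, index=None, substitute=None):
    """Return a copy of `perfect_match` with one base replaced.

    `index` is 0-based and defaults to the middle of the probe.
    `substitute` defaults to the complement of the original base,
    which is an illustrative choice only.
    """
    seq = perfect_match.upper()
    if not seq or set(seq) - set(COMPLEMENT):
        raise ValueError("probe must be a non-empty DNA sequence of A, C, G, T")
    if index is None:
        index = len(seq) // 2
    if not 0 <= index < len(seq):
        raise ValueError("index lies outside the probe")
    original = seq[index]
    if substitute is None:
        substitute = COMPLEMENT[original]
    if substitute == original or substitute not in COMPLEMENT:
        raise ValueError("substitute must be a different DNA base")
    return seq[:index] + substitute + seq[index + 1:]


# Hypothetical twenty-mer; index 10 is position eleven, inside the
# central region of positions six through fourteen.
pm = "ACGTACGTACGTACGTACGT"
mm = mismatch_probe(pm)
```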

[0083] Mismatch probes thus provide a control for non-specific binding or cross hybridization to a nucleic acid in the sample other than the target to which the probe is directed. Mismatch probes also indicate whether a hybridization is specific or not.

[0084] For example, if the target is present the perfect match probes should be consistently brighter than the mismatch probes. In addition, if all central mismatches are present, the mismatch probes can be used to detect a mutation. The difference in intensity between the perfect match and the mismatch probe provides a good measure of the concentration of the hybridized material.
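The comparison of perfect match (PM) and mismatch (MM) intensities may be sketched as follows. Averaging the PM minus MM differences over all probe pairs for a gene is one simple summary and is an assumption of this example, as are the function names and intensity values.

```python
def average_difference(pm_signals, mm_signals):
    """Mean of (PM - MM) over the probe pairs interrogating one target."""
    if len(pm_signals) != len(mm_signals):
        raise ValueError("each PM probe needs a corresponding MM probe")
    if not pm_signals:
        raise ValueError("no probe pairs supplied")
    total = sum(pm - mm for pm, mm in zip(pm_signals, mm_signals))
    return total / len(pm_signals)


def fraction_pm_brighter(pm_signals, mm_signals):
    """Fraction of probe pairs in which the PM probe outshines the MM probe."""
    if len(pm_signals) != len(mm_signals) or not pm_signals:
        raise ValueError("PM and MM lists must be non-empty and equal in length")
    brighter = sum(1 for pm, mm in zip(pm_signals, mm_signals) if pm > mm)
    return brighter / len(pm_signals)


# Hypothetical intensities for three probe pairs directed to one gene.
pm = [1200.0, 900.0, 1500.0]
mm = [200.0, 100.0, 300.0]
avg_diff = average_difference(pm, mm)
consistency = fraction_pm_brighter(pm, mm)
```

A fraction near 1 is consistent with the target being present, while the average difference serves as the measure of the concentration of hybridized material.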

[0085] The present invention also contemplates many uses for polymers attached to solid substrates. These uses include gene expression monitoring, profiling, library screening, genotyping and diagnostics. Examples of gene expression monitoring and profiling methods are shown in U.S. Patents Nos. 5,800,992, 6,013,449, 6,020,135, 6,033,860, 6,040,138, 6,177,248 and 6,309,822. Examples of genotyping and uses therefor are shown in USSN 60/319,253, 10/013,598, and U.S. Patents Nos. 5,856,092, 6,300,063, 5,858,659, 6,284,460, 6,361,947, 6,368,799 and 6,333,179. Other examples of uses are embodied in U.S. Patents Nos. 5,871,928, 5,902,723, 6,045,996, 5,541,061, and 6,197,506.

[0086] The present invention also contemplates sample preparation methods in certain preferred embodiments. Prior to or concurrent with expression analysis, the nucleic acid sample may be amplified by a variety of mechanisms, some of which may employ PCR. See, e.g., PCR Technology: Principles and Applications for DNA Amplification (Ed. H. A. Erlich, Freeman Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and Applications (Eds. Innis, et al., Academic Press, San Diego, CA, 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (Eds. McPherson et al., IRL Press, Oxford); and U.S. Patent Nos. 4,683,202, 4,683,195, 4,800,159, 4,965,188, and 5,333,675, each of which is incorporated herein by reference in its entirety for all purposes. The sample may be amplified on the array. See, for example, U.S. Patent No. 6,300,070 and U.S. patent application 09/513,300, which are incorporated herein by reference.

[0087] Other suitable amplification methods include the ligase chain reaction (LCR) (e.g., Wu and Wallace, Genomics 4, 560 (1989), Landegren et al., Science 241, 1077 (1988) and Barringer et al. Gene 89:117 (1990)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173 (1989) and WO88/10315), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA, 87, 1874 (1990) and WO90/06995), selective amplification of target polynucleotide sequences (U.S. Patent No. 6,410,276), consensus sequence primed polymerase chain reaction (CP-PCR) (U.S. Patent No. 4,437,975), arbitrarily primed polymerase chain reaction (AP-PCR) (U.S. Patent Nos. 5,413,909 and 5,861,245) and nucleic acid based sequence amplification (NABSA) (see U.S. Patent Nos. 5,409,818, 5,554,517, and 6,063,603, each of which is incorporated herein by reference). Other amplification methods that may be used are described in U.S. Patent Nos. 5,242,794, 5,494,810, 4,988,617 and in USSN 09/854,317, each of which is incorporated herein by reference.

[0088] Additional methods of sample preparation and techniques for reducing the complexity of a nucleic acid sample are described, for example, in Dong et al., Genome Research 11, 1418 (2001), in U.S. Patent Nos. 6,361,947 and 6,391,592, and in U.S. Patent application Nos. 09/916,135, 09/920,491, 09/910,292, and 10/013,598.

[0089] Methods for conducting polynucleotide hybridization assays have been well developed in the art. Hybridization assay procedures and conditions will vary depending on the application and are selected in accordance with the general binding methods known, including those referred to in: Maniatis et al. Molecular Cloning: A Laboratory Manual (2nd Ed. Cold Spring Harbor, N.Y., 1989); Berger and Kimmel Methods in Enzymology, Vol. 152, Guide to Molecular Cloning Techniques (Academic Press, Inc., San Diego, CA, 1987); Young and Davis, P.N.A.S., 80: 1194 (1983). Methods and apparatus for carrying out repeated and controlled hybridization reactions have been described, for example, in U.S. Patent Nos. 5,871,928, 5,874,219, 6,045,996, 6,386,749 and 6,391,623, each of which is incorporated herein by reference.

[0090] The present invention also contemplates signal detection of hybridization between ligands in certain preferred embodiments. See, for example, U.S. Pat. Nos. 5,143,854, 5,578,832; 5,631,734; 5,834,758; 5,936,324; 5,981,956; 6,025,601; 6,141,096; 6,185,030; 6,201,639; 6,218,803; and 6,225,625, in provisional U.S. Patent application 60/364,731 and in PCT Application PCT/US99/06097 (published as WO99/47964), each of which also is hereby incorporated by reference in its entirety for all purposes.

[0091] Examples of methods and apparatus for signal detection and processing of intensity data are disclosed in, for example, U.S. Patents Numbers 5,143,854, 5,547,839, 5,578,832, 5,631,734, 5,800,992, 5,834,758; 5,856,092, 5,902,723, 5,936,324, 5,981,956, 6,025,601, 6,090,555, 6,141,096, 6,185,030, 6,201,639; 6,218,803; and 6,225,625, in U.S. Patent application 60/364,731 and in PCT Application PCT/US99/06097 (published as WO99/47964), each of which also is hereby incorporated by reference in its entirety for all purposes.

[0092] The practice of the present invention may also employ conventional biology methods, software and systems. Computer software products of the invention typically include a computer readable medium having computer-executable instructions for performing the logic steps of the method of the invention. Suitable computer readable media include floppy disk, CD-ROM/DVD/DVD-ROM, hard-disk drive, flash memory, ROM/RAM, magnetic tapes, etc. The computer executable instructions may be written in a suitable computer language or combination of several languages. Basic computational biology methods are described in, e.g., Setubal and Meidanis et al., Introduction to Computational Biology Methods (PWS Publishing Company, Boston, 1997); Salzberg, Searles, Kasif, (Ed.), Computational Methods in Molecular Biology, (Elsevier, Amsterdam, 1998); Rashidi and Buehler, Bioinformatics Basics: Application in Biological Science and Medicine (CRC Press, London, 2000) and Ouellette and Baxevanis, Bioinformatics: A Practical Guide for Analysis of Gene and Proteins (Wiley & Sons, Inc., 2nd ed., 2001).

[0093] The present invention also makes use of various computer program products and software for a variety of purposes, such as probe design, management of data, analysis, and instrument operation. See, for example, U.S. Patent Nos. 5,593,839, 5,795,716, 5,733,729, 5,974,164, 6,066,454, 6,090,555, 6,185,561, 6,188,783, 6,223,127, 6,229,911 and 6,308,170.

[0094] Additionally, the present invention may have preferred embodiments that include methods for providing genetic information over networks such as the Internet as shown in, for example, U.S. Patent applications 10/063,559, 60/349,546, 60/376,003, 60/394,574, 60/403,381.

[0095] Throughout this specification, various aspects of this invention are presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

[0096] The present invention has many preferred embodiments and relies on many patents, applications and other references for details known to those of the art. Therefore, when a patent, application, or other reference is cited or repeated throughout the specification, it should be understood that it is incorporated by reference in its entirety for all purposes as well as for the proposition that is recited.

[0097] In one preferred embodiment, the invention provides a prognostic and/or diagnostic immunohistochemical approach, such as a dip-stick analysis, to determine the presence of adult stem cells. Antibodies against proteins, or antigenic epitopes thereof, that are encoded by the group of genes of the present invention, are either commercially available or can be produced using methods well known to one skilled in the art. The invention contemplates either one dipstick capable of detecting all the diagnostically important gene products or alternatively, a series of dipsticks capable of detecting the amount of proteins of a smaller sub-group of diagnostic proteins of the present invention.

[0098] Antibodies can be prepared by means well known in the art. The term "antibodies" is meant to include monoclonal antibodies, polyclonal antibodies and antibodies prepared by recombinant nucleic acid techniques that are selectively reactive with a desired antigen. Antibodies against the proteins encoded by any of the genes in the diagnostic gene groups of the present invention are either known or can be easily produced using the methods well known in the art. Internet sites such as Biocompare at http://www.biocompare.com/abmatrix.asp?antibody=y provide a useful tool to anyone skilled in the art to locate existing antibodies against any of the proteins provided according to the present invention.

[0099] Antibodies against the proteins according to the present invention can be used in standard techniques such as Western blotting or immunohistochemistry to quantify the level of expression of the proteins corresponding to the gene group of interest. Immunohistochemical applications include assays, wherein increased presence of the protein can be assessed, for example, from a biological sample.

[00100] The immunohistochemical assays according to the present invention can be performed using methods utilizing solid supports. The solid support can be any phase used in performing immunoassays, including dipsticks, membranes, absorptive pads, beads, microtiter wells, test tubes, and the like. The preparation and use of such conventional test systems is well described in the patent, medical, and scientific literature. If a stick is used, the anti-protein antibody is bound to one end of the stick such that the end with the antibody can be dipped into the solutions as described below for the detection of the protein. Alternatively, the samples can be applied onto the antibody-coated dipstick or membrane by pipette or dropper or the like.

[00101] The antibody against proteins encoded by the genes of interest (the "protein") can be of any isotype, such as IgA, IgG or IgM, Fab fragments, or the like. The antibody may be monoclonal or polyclonal and produced by methods as generally described, for example, in Harlow and Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Laboratory, 1988, incorporated herein by reference. The antibody can be applied to the solid support by direct or indirect means. Indirect bonding allows maximum exposure of the protein binding sites to the assay solutions since the sites are not themselves used for binding to the support. Preferably, polyclonal antibodies are used since polyclonal antibodies can recognize different epitopes of the protein, thereby enhancing the sensitivity of the assay.

[00102] The solid support is preferably non-specifically blocked after binding the protein antibodies to the solid support. Non-specific blocking of surrounding areas can be with whole or derivatized bovine serum albumin, or albumin from other animals, whole animal serum, casein, non-fat milk, and the like.

[00103] The sample is applied onto the solid support with bound protein-specific antibody such that the protein will be bound to the solid support through said antibodies. Excess and unbound components of the sample are removed and the solid support is preferably washed so the antibody-antigen complexes are retained on the solid support. The solid support may be washed with a washing solution which may contain a detergent such as Tween-20, Tween-80 or sodium dodecyl sulfate.

[00104] After the protein has been allowed to bind to the solid support, a second antibody which reacts with the protein is applied. The second antibody may be labeled, preferably with a visible label. The labels may be soluble or particulate and may include dyed immunoglobulin binding substances, simple dyes or dye polymers, dyed latex beads, dye-containing liposomes, dyed cells or organisms, or metallic, organic, inorganic, or dye solids. The labels may be bound to the protein antibodies by a variety of means that are well known in the art. In some embodiments of the present invention, the labels may be enzymes that can be coupled to a signal producing system. Examples of visible labels include alkaline phosphatase, beta-galactosidase, horseradish peroxidase, and biotin. Many enzyme-chromogen or enzyme-substrate-chromogen combinations are known and used for enzyme-linked assays. Dye labels also encompass radioactive labels and fluorescent dyes.

[00105] Simultaneously with the sample, corresponding steps may be carried out with a known amount or amounts of the protein and such a step can be the standard for the assay. A sample from a healthy individual exposed to a similar air pollutant such as cigarette smoke, can be used to create a standard for any and all of the diagnostic gene group encoded proteins.

[00106] The solid support is washed again to remove unbound labeled antibody and the labeled antibody is visualized and quantified. The accumulation of label will generally be assessed visually. This visual detection may allow for detection of different colors, for example, red color, yellow color, brown color, or green color, depending on the label used. Accumulated label may also be detected by optical detection devices such as reflectance analyzers, video image analyzers and the like. The visible intensity of accumulated label could correlate with the concentration of protein in the sample. The correlation between the visible intensity of accumulated label and the amount of the protein may be made by comparison of the visible intensity to a set of reference standards. Preferably, the standards have been assayed in the same way as the unknown sample, and more preferably alongside the sample, either on the same or on a different solid support.
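The comparison of an unknown reading against a set of reference standards can be illustrated with a short sketch. This is only one possible approach (piecewise-linear interpolation), not a method stated in the specification; the function name `estimate_concentration`, the units, and all numeric readings are hypothetical.

```python
from bisect import bisect_left


def estimate_concentration(intensity, standards):
    """Interpolate a concentration from (concentration, intensity) standards.

    Assumes intensity increases with concentration across the standards.
    """
    # Sort the standards by intensity so the unknown reading can be bracketed.
    pts = sorted(standards, key=lambda p: p[1])
    intensities = [p[1] for p in pts]
    if not intensities[0] <= intensity <= intensities[-1]:
        raise ValueError("reading lies outside the range of the standards")
    i = bisect_left(intensities, intensity)
    if intensities[i] == intensity:
        return pts[i][0]
    # Linear interpolation between the two bracketing standards.
    (c0, i0), (c1, i1) = pts[i - 1], pts[i]
    return c0 + (c1 - c0) * (intensity - i0) / (i1 - i0)


# Hypothetical standards: (protein concentration in ng/ml, reflectance reading)
standards = [(0.0, 0.02), (10.0, 0.22), (50.0, 0.62), (100.0, 0.92)]
unknown = estimate_concentration(0.42, standards)
print(f"estimated concentration: {unknown:.1f} ng/ml")
```

Readings outside the range of the standards are rejected rather than extrapolated, which mirrors the preference for assaying standards alongside the unknown sample.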

[00107] The assay reagents, pipettes/dropper, and test tubes may be provided in the form of a kit. Accordingly, the invention further provides a test kit for visual detection of the proteins encoded by the various gene groups. The test kit comprises one or more solutions containing a known concentration of one or more proteins encoded by the gene group of interest (the "protein") to serve as a standard; a solution of an anti-protein antibody bound to an enzyme; a chromogen which changes color or shade by the action of the enzyme; and a solid support chosen from the group consisting of dip-stick and membrane carrying on the surface thereof an antibody to the protein. Instructions including the up or down regulation of each of the genes in the groups as provided by Tables 1-8 are included with the kit.

Somatic stem cells

[00108] As used herein, stem cells derived from or found in tissues other than from an embryo are sometimes referred to as non-embryonic stem cells, adult stem cells, somatic tissue stem cells, or somatic stem cells.

[00109] Any source of non-embryonic stem cells can be used in the methods of the present invention, including primary stem cells from an animal as well as model cell lines which exhibit asymmetric self-renewal.

[00110] The methods of the present invention can use these p53 model cell lines, as well as other cell lines which exhibit conditional asymmetric self-renewal.

[00111] Non-embryonic stem cells of the present invention include any stem cells isolated from adult tissue, including but not limited to bone marrow derived stem cells, adipose derived stem cells, mesenchymal stem cells, neural stem cells, liver stem cells, and pancreatic stem cells. Bone marrow derived stem cells refers to all stem cells derived from bone marrow; these include but are not limited to mesenchymal stem cells, bone marrow stromal cells, and hematopoietic stem cells. Bone marrow stem cells are also known as mesenchymal stem cells or bone marrow stromal stem cells, or simply stromal cells or stem cells.

[00112] The stem cells are pluripotent and act as precursor cells, which produce daughter cells that mature into differentiated cells. In some embodiments, non-embryonic stem cells can be isolated from fresh bone marrow or adipose tissue by fractionation using fluorescence activated cell sorting (FACS) with unique cell surface antigens to isolate specific subtypes of stem cells (such as bone marrow or adipose derived stem cells).

[00113] Bone marrow or adipose tissue derived stem cells may be obtained by removing bone marrow cells or fat cells, from a donor, either self or matched, and placing the cells in a sterile container. If the cells are adherent cells, the sterile container may include a plastic surface or other appropriate surface to which the cells adhere. For example, stromal cells will adhere to a plastic surface within 30 minutes to about 6 hours. After at least 30 minutes, preferably about four hours, the non-adhered cells may be removed and discarded. The adhered cells are stem cells, which are initially non-dividing. After about 2-4 days however the cells begin to proliferate.

[00114] Cells can be obtained from donor tissue by dissociation of individual cells from the connecting extracellular matrix of the tissue. Tissue is removed using a sterile procedure, and the cells are dissociated using any method known in the art including treatment with enzymes such as trypsin, collagenase, and the like, or by using physical methods of dissociation such as with a blunt instrument. Dissociation of cells can be carried out in any acceptable medium, including tissue culture medium. For example, a preferred medium for the dissociation of neural stem cells is low calcium artificial cerebrospinal fluid.

[00115] The dissociated stem cells or model cell lines can be cultured in any known culture medium capable of supporting cell growth, including HEM, DMEM, RPMI, F-12, and the like, containing supplements which are required for cellular metabolism such as glutamine and other amino acids, vitamins, minerals and useful proteins such as transferrin and the like. Medium may also contain antibiotics such as penicillin, streptomycin, gentamicin and the like to prevent contamination with yeast, bacteria and fungi. In some cases, the medium may contain serum derived from bovine, equine, chicken and the like. Serum can contain xanthine, hypoxanthine, or other compounds which enhance guanine nucleotide biosynthesis, although generally at levels below the effective concentration to suppress asymmetric cell kinetics. Thus, preferably a defined, serum-free culture medium is used, as serum contains unknown components (i.e., is undefined). Preferably, if serum is used, it has been dialyzed to remove guanine ribonucleotide precursors (rGNPrs). A defined culture medium is also preferred if the cells are to be used for transplantation purposes. A particularly preferable culture medium is a defined culture medium comprising a mixture of DMEM, F-12, and a defined hormone and salt mixture.

[00116] The culture medium can be supplemented with a proliferation-inducing growth factor(s). As used herein, the term "growth factor" refers to a protein, peptide or other molecule having a growth, proliferative, differentiative, or trophic effect on neural stem cells and/or neural stem cell progeny. Growth factors that may be used include any trophic factor that allows stem cells to proliferate, including any molecule that binds to a receptor on the surface of the cell to exert a trophic, or growth-inducing effect on the cell. Preferred proliferation-inducing growth factors include EGF, amphiregulin, acidic fibroblast growth factor (aFGF or FGF-1), basic fibroblast growth factor (bFGF or FGF-2), transforming growth factor alpha (TGFα), and combinations thereof. Growth factors are usually added to the culture medium at concentrations ranging between about 1 fg/ml and 1 mg/ml. Concentrations between about 1 and 100 ng/ml are usually sufficient. Simple titration experiments can be easily performed to determine the optimal concentration of a particular growth factor.
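As an illustration of how a titration experiment might be laid out, the following sketch generates a log-spaced concentration series across the range described as usually sufficient. The helper `titration_series` and the choice of five steps are assumptions made for illustration; the specification does not prescribe a particular dilution scheme.

```python
def titration_series(low_ng_ml, high_ng_ml, steps):
    """Return `steps` log-spaced concentrations from low to high, inclusive."""
    if steps < 2 or low_ng_ml <= 0 or high_ng_ml <= low_ng_ml:
        raise ValueError("need steps >= 2 and 0 < low < high")
    # A constant fold-change between neighbouring concentrations.
    ratio = (high_ng_ml / low_ng_ml) ** (1.0 / (steps - 1))
    return [low_ng_ml * ratio ** k for k in range(steps)]


# Hypothetical layout: five concentrations spanning 1 to 100 ng/ml.
series = titration_series(1.0, 100.0, 5)
for conc in series:
    print(f"{conc:8.2f} ng/ml")
```

Each concentration would then be tested in parallel cultures, and the lowest concentration giving the desired proliferative response taken as a working optimum.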

[00117] In addition to proliferation-inducing growth factors, other growth factors may be added to the culture medium that influence proliferation and differentiation of the cells, including NGF, platelet-derived growth factor (PDGF), thyrotropin releasing hormone (TRH), transforming growth factor betas (TGFβs), insulin-like growth factor (IGF-1) and the like.

[00118] Stem cells can be cultured in suspension or on a fixed substrate. One particularly preferred substrate is a hydrogel, such as a peptide hydrogel, as described below. However, certain substrates tend to induce differentiation of certain stem cells. Thus, suspension cultures are preferable for such stem cell populations. Cell suspensions can be seeded in any receptacle capable of sustaining cells, particularly culture flasks, culture plates, or roller bottles, more particularly in small culture flasks such as 25 cm² culture flasks. In one preferred embodiment, cells are cultured at high cell density to promote the suppression of asymmetric cell kinetics.

[00119] Conditions for culturing should be close to physiological conditions. The pH of the culture medium should be close to physiological pH, preferably between pH 6 and 8, more preferably between about pH 7 and 7.8, with pH 7.4 being most preferred. Physiological temperatures range between about 30°C and 40°C. Cells are preferably cultured at temperatures between about 32°C and about 38°C, and more preferably between about 35°C and about 37°C.

[00120] Cells are preferably cultured for 3-30 days, preferably at least about 7 days, more preferably at least 10 days, still more preferably at least about 14 days. Cells can be cultured substantially longer. They can also be frozen using known methods such as cryopreservation, and thawed and used as needed.

EXAMPLE

[00121] Specific markers for adult stem cells (ASCs; also referred to as non-embryonic stem cells) are essential for ASC research, tissue engineering, and biomedicine. The lack of molecular markers that are unique to ASCs has been a major barrier to the initial identification and pure isolation of ASCs. Recent efforts to understand ASC-specific gene expression profiles have provided limited information on specific markers for ASCs, partially due to the difficulty of obtaining pure ASCs. We approached this problem by targeting asymmetric self-renewal, which we have found is a defining property of ASCs.

[00122] Recently, global gene expression profiles have been reported for stem cells based on comparisons of genes expressed in embryonic stem cells (ESCs) to genes expressed in ASC-enriched preparations. These include hematopoietic stem cell (HSC)-enriched fractions, cultured neural stem cells (NSCs), and cultured retinal progenitor cells (RPCs) (1-3). These populations also contain a significant fraction of non-stem cell progenitors and differentiating progeny cells that limit their utility for identifying genes whose expression is unique to stem cells, i.e., stemness genes (1-4). In addition, gene expression profiles based on specific expression in both ESCs and ASC-enriched populations will exclude genes whose expression is specific to either of these distinctive stem cell classes. One essential difference is that ESCs propagate in culture by symmetric self-renewal, whereas ASCs are defined by asymmetric self-renewal (5, 6).

[00123] We applied a novel strategy to identify genes whose expression levels are related to ASC function based on targeting their unique asymmetric self-renewal. Mammalian ASCs self-renew asymmetrically to replenish cells in tissues that undergo cell turnover but maintain a constant cell mass (5, 6). Each asymmetric ASC division yields a new stem cell and a non-stem cell sister (Figure 1). The non-stem cell sister becomes the progenitor of the differentiated cells responsible for mature tissue functions (5, 6). Because asymmetric self-renewal is unique to ASCs, some genes whose expression profiles are associated with asymmetric self-renewal may specify adult stemness and also identify ASCs.

[00124] We were able to pursue this strategy because of the availability of cultured cell lines that express asymmetric self-renewal conditionally. Restoration of normal wild-type p53 protein expression induces these lines to undergo asymmetric self-renewal like ASCs (7-9). When p53 expression is reduced, the cells switch to symmetric self-renewal, resulting in exponential proliferation. In vivo, symmetric self-renewal by ASCs is regulated to increase tissue mass during normal adult maturation and to repair injured tissues (5). When controls that constrain ASCs to asymmetric self-renewal are disrupted (e.g., by p53 mutations), the risk of proliferative disorders like cancer increases (5, 7).

[00125] Previously, we derived cell lines with conditional self-renewal symmetry from non-tumorigenic, immortalized cells that originated from mouse mammary epithelium (MME) cells and mouse embryo fibroblasts (MEFs). The self-renewal symmetry of these cells can be reversibly switched between symmetric and asymmetric by varying either culture temperature or Zn concentration, as a consequence of controlling p53 expression with respectively responsive promoters (J- 10; see also Figure 2). These diverse properties allowed a microarray analysis to identify genes whose expression consistently showed the same pattern of change between asymmetric versus symmetric self-renewal.

[00126] Using cultured cells with experimentally controlled self-renewal symmetry, we performed an analysis of whole genome transcripts to identify genes whose expression is associated with asymmetric self-renewal using an Affymetrix mouse whole genome microarray.

[00127] As shown in Figure 3, the following three populations of cells were compared. Population 1: p53-null control MEFs (Con-3 cells) cultured in Zn-supplemented medium (9, 10). Population 2: Zn-responsive p53-inducible MEFs in Zn-supplemented medium. Population 3: a previously described derivative of the Zn-responsive p53-inducible MEFs which is stably transfected with a constitutively expressed inosine monophosphate dehydrogenase (IMPDH) gene (8). The purpose of the final population was to provide a comparison of asymmetric versus symmetric self-renewal that was not based on a difference in p53 expression. IMPDH is the rate-limiting enzyme for guanine nucleotide biosynthesis. Its down-regulation by p53 is required for asymmetric self-renewal (5). Therefore, even in Zn-supplemented medium, which induces normal p53 expression, cells derived with a stably expressed IMPDH transgene continue to undergo symmetric self-renewal (8, 9). This abrogation of p53 effects on cell division frequency occurs even though other p53-dependent responses remain intact (8, 10). Under the same conditions, control vector-only transfectants (tC-2 cells) continue to exhibit asymmetric self-renewal (8, 9). Thus, this final comparison was used to exclude genes whose change in expression was primarily due to changes in p53 expression and not specifically to transitions in self-renewal symmetry.
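The logic of the three-population comparison can be sketched as follows. A gene is retained only if its expression in the asymmetrically self-renewing population differs, in the same direction, from both symmetrically self-renewing populations; a gene that merely tracks p53 induction is dropped because it looks the same in Populations 2 and 3. The function name, the 2-fold threshold, and the expression values are illustrative assumptions, not parameters taken from the analysis described here.

```python
def self_renewal_associated(expr, min_fold=2.0):
    """Split genes by association with self-renewal symmetry.

    `expr` maps a gene name to (population 1, population 2, population 3)
    expression values, i.e. (p53-null, p53-induced, p53-induced + IMPDH).
    Populations 1 and 3 are symmetric; population 2 is asymmetric.
    """
    up_in_asymmetric, up_in_symmetric = set(), set()
    for gene, (p53_null, p53_on, p53_on_impdh) in expr.items():
        # Require agreement against BOTH symmetric populations, so that
        # changes caused only by p53 induction are excluded.
        if p53_on >= min_fold * p53_null and p53_on >= min_fold * p53_on_impdh:
            up_in_asymmetric.add(gene)
        elif p53_null >= min_fold * p53_on and p53_on_impdh >= min_fold * p53_on:
            up_in_symmetric.add(gene)
    return up_in_asymmetric, up_in_symmetric


# Hypothetical expression values for four made-up genes.
expr = {
    "geneA": (10.0, 80.0, 12.0),  # follows self-renewal symmetry
    "geneB": (10.0, 80.0, 75.0),  # follows p53 only, so it is excluded
    "geneC": (90.0, 20.0, 85.0),  # higher in both symmetric populations
    "geneD": (50.0, 55.0, 48.0),  # essentially unchanged
}
asym, sym = self_renewal_associated(expr)
print(sorted(asym), sorted(sym))
```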

[00128] We performed complementary microarray analyses with Affymetrix GeneChip® mouse whole genome arrays, analyzing 42,000 genes using a single color assay. The statistical power of this analysis allows PM/MM algorithms to be applied to each probe set representing a single gene, e.g., 11 oligonucleotide cells per probe set in a GeneChip® 430 2.0 array.
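For readers unfamiliar with PM/MM (perfect match / mismatch) probe pairs, the following sketch shows one simple way a probe set of 11 pairs could be summarized: the mean of the PM minus MM differences. This is a deliberately simplified average-difference summary for illustration only; it is not claimed to be the algorithm used in the analysis described here, and the intensity values are hypothetical.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of a single probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("PM and MM must be non-empty and of equal length")
    # Subtracting the mismatch signal removes an estimate of the
    # non-specific hybridization for each probe pair.
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for the 11 probe pairs of one probe set.
pm = [820, 790, 905, 760, 840, 815, 870, 795, 830, 810, 865]
mm = [120, 140, 110, 130, 125, 135, 115, 145, 120, 130, 140]
signal = average_difference(pm, mm)
print(f"probe-set signal: {signal:.1f}")
```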

[00129] The results of the microarray analyses are depicted in Tables 1-8. More specifically, the results from the microarray analysis were used to place the genes into four groups, based on the gene corresponding to the Affymetrix ID. Gene group 1 includes genes exclusively expressed in cells with asymmetric self-renewal; these genes are found in Table 1, SEQ ID NOs: 1 - 141. Gene group 2 includes genes exclusively expressed in cells with symmetric self-renewal; these genes are found in Table 2, SEQ ID NOs: 142 - 215. Gene group 3 includes genes which are expressed at higher levels in cells with asymmetric self-renewal as compared to cells with symmetric self-renewal; these genes are found in Table 3, SEQ ID NOs: 216 - 418. Gene group 4 includes genes which are expressed at higher levels in cells with symmetric self-renewal as compared to cells with asymmetric self-renewal; these genes are found in Table 4, SEQ ID NOs: 419 - 604.
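The four-way grouping can be expressed as a small decision rule. In the sketch below, "exclusively expressed" is modelled as being above a detection floor in one state and below it in the other, and "higher levels" as a minimum fold difference when the gene is detected in both states. The detection floor, the fold threshold, and the names `GeneGroup` and `assign_group` are assumptions for illustration; the criteria actually used to build Tables 1-4 are not restated here.

```python
from enum import Enum


class GeneGroup(Enum):
    EXCLUSIVE_ASYMMETRIC = 1  # Gene group 1 (Table 1)
    EXCLUSIVE_SYMMETRIC = 2   # Gene group 2 (Table 2)
    HIGHER_ASYMMETRIC = 3     # Gene group 3 (Table 3)
    HIGHER_SYMMETRIC = 4      # Gene group 4 (Table 4)


def assign_group(asym_level, sym_level, detection_floor=50.0, min_fold=2.0):
    """Return the gene group for one gene, or None if it fits no group."""
    asym_on = asym_level >= detection_floor
    sym_on = sym_level >= detection_floor
    if asym_on and not sym_on:
        return GeneGroup.EXCLUSIVE_ASYMMETRIC
    if sym_on and not asym_on:
        return GeneGroup.EXCLUSIVE_SYMMETRIC
    if asym_on and sym_on:
        # Detected in both states: compare the levels.
        if asym_level >= min_fold * sym_level:
            return GeneGroup.HIGHER_ASYMMETRIC
        if sym_level >= min_fold * asym_level:
            return GeneGroup.HIGHER_SYMMETRIC
    return None


# Hypothetical (asymmetric, symmetric) expression levels.
for levels in [(400.0, 10.0), (10.0, 400.0), (400.0, 100.0), (100.0, 400.0)]:
    print(levels, assign_group(*levels))
```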

[00130] Tables 1 - 4 each include the Affymetrix ID number for the probe, as well as the locus link information for that probe, and the corresponding GenBank ID for the mouse gene. The 141 probe sets of Gene group 1 (Table 1) represent 132 different genes. The 74 probe sets of Gene group 2 (Table 2) represent 69 different genes. The 203 probe sets of Gene group 3 (Table 3) represent 188 different genes. The 186 probe sets of Gene group 4 (Table 4) represent 170 different genes. Figure 5 shows examples of several genes representative of each gene group.
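The difference between the number of probe sets and the number of genes arises because several probe sets can interrogate the same gene. One way to collapse probe sets into genes, assumed here for illustration, is to group them by Locus Link identifier. The rows below are taken from Table 3 (Affymetrix IDs written with their underscores); the helper name `genes_from_probe_sets` is hypothetical.

```python
from collections import defaultdict


def genes_from_probe_sets(rows):
    """Group Affymetrix probe-set IDs by their Locus Link identifier."""
    by_locus = defaultdict(list)
    for affy_id, locus_link in rows:
        by_locus[locus_link].append(affy_id)
    return dict(by_locus)


# (Affy ID, Locus Link) pairs for cadherin 11, cadherin 13 and
# crystallin alpha B as listed in Table 3.
rows = [
    ("1450757_at", 12552),
    ("1454015_a_at", 12554),
    ("1434115_at", 12554),
    ("1416455_a_at", 12955),
    ("1434369_a_at", 12955),
]
genes = genes_from_probe_sets(rows)
print(f"{len(rows)} probe sets represent {len(genes)} genes")
```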

[00131] The genes of Gene group 1, those genes exclusively expressed in cells exhibiting asymmetric self-renewal, were further analyzed. Tables 5 - 6 represent particularly preferred genes for identification of cells expressing asymmetric self-renewal. Thirteen of these genes exhibit a high level of expression in the microarray and are predicted to encode membrane spanning proteins. Cell surface expressed proteins are particularly useful as markers for cell states, because they are excellent potential targets for the development of antibodies for use in detecting cells. Seven of these genes fall within 15 megabases of mouse chromosome 2, as indicated in Table 5. This region is also associated with the Philadelphia chromosome translocation, and is a candidate for a chromatin domain associated with asymmetric self-renewal. None of the genes associated with symmetric self-renewal are located in this region. Table 5 provides the gene name and GenBank ID for the mouse genes; Table 6 provides the gene name and GenBank ID for the corresponding human gene.

[00132] The genes of Gene group 1, those genes exclusively expressed in cells exhibiting asymmetric self-renewal, were compared to expression profiles reported for several stem cell populations. The genes in Table 7 were identified as members of Gene group 1 in the present analysis; these genes were also identified as associated with stem cells in one of five previous reports, as follows. A "+" in the column indicates that the Affymetrix ID was also identified as being expressed in a cell type previously reported in the named reference. "ES" indicates genes expressed in embryonic stem cells, "NS" refers to genes expressed in neural stem cells, "HS" refers to genes expressed in hematopoietic stem cells, and "RP" refers to genes expressed in retinal precursor cells. The columns labeled "Melton" refer to the results of Ramalho-Santos, M., et al., (2002). Stemness: Transcriptional profiling of embryonic and adult stem cells. Science 298, 597-600. The columns labeled "Lemischka" refer to the results of Ivanova, N.B., et al., (2002). A stem cell molecular signature. Science 298, 601-604. The columns labeled "Fortunel" refer to the results of Fortunel et al. (2003). Science 302, 393b. The Group 1 genes were also compared to the results of the following two papers; however, no overlapping genes were identified: Tumbar, T., et al., (2004). Defining the epithelial stem cell niche in skin. Science 303, 359-363; and Morris, R. J., et al., (2004). Capturing and profiling adult hair follicle stem cells. Nat. Biotech. 22, 411-417.
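The "+" marking described above amounts to a set-membership test of each Gene group 1 probe against each previously reported gene list. A minimal sketch follows; the probe identifiers and the contents of the reported lists are placeholders invented for illustration, and `overlap_marks` is a hypothetical helper, not part of the analysis as described.

```python
def overlap_marks(group1_ids, reports):
    """Build rows of '+' marks for Group 1 probes found in earlier reports.

    `reports` maps a column label (e.g. "Melton ES") to the set of probe
    IDs reported for that cell type. Probes found in no report are omitted.
    """
    table = {}
    for affy_id in sorted(group1_ids):
        marks = {label: "+" if affy_id in ids else ""
                 for label, ids in reports.items()}
        if any(marks.values()):
            table[affy_id] = marks
    return table


# Placeholder identifiers; these are not probes from Tables 7 or 8.
group1 = {"probe_001", "probe_002", "probe_003"}
reports = {
    "Melton ES": {"probe_001"},
    "Lemischka HS": {"probe_001", "probe_003"},
    "Fortunel RP": set(),
}
table = overlap_marks(group1, reports)
for affy_id, marks in table.items():
    print(affy_id, marks)
```

Probes that receive no mark in any column correspond to the complementary list of Group 1 genes not previously associated with stem cells.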

[00133] The genes in Table 8 were identified as members of Gene group 1 in the present analysis; these genes were not previously identified as associated with stem cells in one of five previously discussed reports of stem cell expression profiles (Ramalho-Santos et al., Ivanova et al., Fortunel et al., Tumbar et al., and Morris et al.).

[00134] Western blotting studies showed that proteins encoded by several asymmetric self-renewal associated genes changed in expression level as predicted by microarray studies. Figure 4 shows three graphs of expression of p53, IMPDH2, and p21 using two different probe sets to analyze three populations of cells: p53 null cells, which exhibit symmetric self-renewal; p53 induced cells, which exhibit asymmetric self-renewal; and p53 induced cells which also express IMPDH, which exhibit symmetric self-renewal. Figure 6 shows a Western blot confirming the expression of several genes identified by evaluation of whole genome transcripts associated with different cell self-renewal states. We have confirmed protein expression for several ASRA genes, including survivin, HMGB2, cyclin G, and proliferin. These ASRA proteins dynamically change their expression dependent on self-renewal symmetry states. Figure 7 shows the expression of several proteins exclusively expressed in cells exhibiting asymmetric self-renewal, including as they transition.

[00135] Figure 8 shows localization of survivin, an asymmetric self-renewal associated gene down-regulated during ASR, during the different stages of mitosis in asymmetrically self-renewing (non-random chromosome segregation) cells compared to symmetrically self-renewing cells (random chromosome segregation). The localization of survivin is normal in asymmetrically self-renewing cells (non-random chromosome segregation), except in telophase when it is often undetectable in centrosomes. Figure 9 shows quantitative analysis of survivin localization during prophase, metaphase, anaphase, and telophase in asymmetrically self-renewing (non-random chromosome segregation) cells compared to symmetrically self-renewing cells (random chromosome segregation).

[00136] The expression pattern of various ASRA proteins can be used to identify self-renewal symmetry state in culture. As more ASRA proteins are evaluated, the specificity and sensitivity of this phenotypic signature will increase. In concept, this set of ASRA proteins will also provide a proteomic signature that uniquely identifies ASCs.

[00137] When ASRA genes were compared with the sets of differentially expressed genes in ASC-enriched preparations, nearly all ASRA genes were included in sets of ASC-specific genes. However, association between ASRA genes and embryonic stem cell (ESC)-specific genes was not significant.

[00138] We have shown that genes whose expression is dependent on self-renewal symmetry states are highly represented among genes up-regulated in natural ASC-enriched cell populations.

REFERENCES

1. Sherley, J.L. (2002). Asymmetric cell kinetics genes: the key to expansion of adult stem cells in culture. Stem Cells, 20, 561-572.

2. Cairns, J. (2002). Somatic stem cells and the kinetics of mutagenesis and carcinogenesis. Proc. Natl. Acad. Sci. USA 99, 10567-10570.

3. Merok, J.R. and Sherley, J.L. (2001). Breaching the kinetic barrier to in vitro somatic stem cell propagation. J. Biomed. Biotech. 1, 25-27.

4. Merok, J. R., Lansita, J. A., Tunstead, J. R., and Sherley, J. L. (2002). Cosegregation of chromosomes containing immortal DNA strands in cells that cycle with asymmetric stem cell kinetics. Cancer Res. 62, 6791-6795.

5. Ramalho-Santos, M., Yoon, S., Matsuzaki, Y., Mulligan, R.C. and Melton, D.A. (2002). Stemness: Transcriptional profiling of embryonic and adult stem cells. Science 298, 597-600.

6. Ivanova, N.B., Dimos, J.T., Schaniel, C., Hackney, J. A., Moore, K. A., and Lemischka, I.R. (2002). A stem cell molecular signature. Science 298, 601-604.

7. Fortunel, N. O. et al. (2003). Comment on "'Stemness': transcriptional profiling of embryonic and adult stem cells" and "A stem cell molecular signature" (I). Science 302, 393b.

8. Sherley, J. L., Stadler, P. B., and Stadler, J. S. (1995). A quantitative method for the analysis of mammalian cell proliferation in culture in terms of dividing and non-dividing cells. Cell Prolif. 28, 137-144.

9. Sherley, J. L., Stadler, P. B., and Johnson, D. R. (1995). Expression of the wild-type p53 antioncogene induces guanine nucleotide-dependent stem cell division kinetics. Proc. Natl. Acad. Sci. USA 92, 136-140.

10. Liu, Y., Bohn, S. A., and Sherley, J. L. (1998). Inosine-5'-monophosphate dehydrogenase is a rate-limiting factor for p53-dependent growth regulation. Mol. Biol. Cell 9, 15-28.

11. Rambhatla L. et al. (2001). Cellular senescence: ex vivo p53-dependent asymmetric cell kinetics. J. Biomed. Biotech. 1, 28-37.

12. Altieri, D. C. (2003). Validating survivin as a cancer therapeutic target. Nature Rev. Cancer. 3, 46-54.

13. Tanaka, T.U., Rachidi, N., Janke, C., Pereira, G., Galova, M., Schiebel, E., Stark, M.J.R. and Nasmyth, K. (2002). Evidence that the Ipl1-Sli15 (Aurora Kinase-INCENP) Complex Promotes

All references described herein are incorporated herein by reference.

Table 3: 203 Genes of Gene Set 3: Upregulated in Asymmetric Self-Renewal

SEQ ID NO:  GenBank ID  Locus Link  Affy ID        Gene name
216         NM_007403   11501       1416871_at     a disintegrin and metalloprotease domain 8
217         NM_009636   11568       1450637_a_at   AE binding protein 1
218         NM_021515   11636       1422184_a_at   adenylate kinase 1
219         NM_013473   11752       1417732_at     annexin A8
220         NM_007494   11898       1416239_at     argininosuccinate synthetase 1
221         NM_007570   12227       1416250_at     B-cell translocation gene 2, antiproliferative
222         BB230296    12238       1454642_a_at   COMM domain containing 3
223         BB234940    12305       1456226_x_at   discoidin domain receptor family, member 1
224         BC010758    12409       1418509_at     carbonyl reductase 2
225         BQ175880    12444       1434745_at     cyclin D2
226         NM_009866   12552       1450757_at     cadherin 11
227         AK016527    12554       1454015_a_at   cadherin 13
228         BQ176681    12554       1434115_at     cadherin 13
229         AF059567    12579       1449152_at     cyclin-dependent kinase inhibitor 2B (p15, inhibits CDK4)
230         BG967663    12709       1455106_a_at   creatine kinase, brain
231         NM_018827   12931       1418476_at     cytokine receptor-like factor 1
232         NM_009964   12955       1416455_a_at   crystallin, alpha B
233         AV016515    12955       1434369_a_at   crystallin, alpha B
234         NM_007881   13498       1421149_a_at   dentatorubral pallidoluysian atrophy
235         AV346607    13655       1436329_at     early growth response 3
236         NM_007933   13808       1417951_at     enolase 3, beta muscle
237         NM_010145   13849       1422438_at     epoxide hydrolase 1, microsomal
238         NM_010161   14017       1450241_a_at   ecotropic viral integration site 2a
239         NM_010189   14132       1416978_at     Fc receptor, IgG, alpha chain transporter
240         M33760      14182       1424050_s_at   Fibroblast growth factor receptor 1
241         NM_010222   14231       1416803_at     FK506 binding protein 7
242         AV026617    14281       1423100_at     FBJ osteosarcoma oncogene
243         NM_008046   14313       1421365_at     follistatin
244         BB444134    14313       1434458_at     Follistatin
245         AB037596    14538       1425503_at     glucosaminyl (N-acetyl) transferase 2, I-branching enzyme
246         AF297615    14594       1418483_a_at   glycoprotein galactosyltransferase alpha 1, 3
247         BC003726    14789       1449531_at     leprecan-like 2
248         NM_010357   14860       1416368_at     glutathione S-transferase, alpha 4
249         AF117613    15199       1418172_at     heme binding protein 1
250         NM_010442   15368       1448239_at     heme oxygenase (decycling) 1
251         NM_010444   15370       1416505_at     nuclear receptor subfamily 4, group A, member 1
252         AK005016    15473       1428326_s_at   heat-responsive protein 12
253         U03561      15507       1425964_x_at   heat shock protein 1
254         NM_013560   15507       1422943_a_at   heat shock protein 1
255         NM_008393   16373       1418517_at     Iroquois related homeobox 3

Table 3: 203 Genes of Gene Set 3: Upregulated in Asymmetric Self-Renewal

SEQ ID NO:  GenBank ID  Locus Link  Affy ID        Gene name
290         BB228713    22232       1439433_a_at   solute carrier family 35 (UDP-galactose transporter), member 2
291         NM_011706   22368       1416935_at     transient receptor potential cation channel, subfamily V, member 2
292         NM_016873   22403       1419015_at     WNT1 inducible signaling pathway protein 2
293         BB479063    24131       1433783_at     LIM domain binding 3
294         AF114378    24131       1451999_at     LIM domain binding 3
295         AF188290    26903       1451891_a_at   dysferlin
296         BC008105    27015       1449483_at     polymerase (DNA directed), kappa
297         NM_013750   27280       1449002_at     pleckstrin homology-like domain, family A, member 3
298         NM_013759   27361       1418888_a_at   selenoprotein X 1
299         BB749092    28064       1444012_at     DNA segment, Chr 17, Wayne State University 94, expressed
300         BI739353    29858       1430780_a_at   phosphomannomutase 1
301         BC006809    29858       1424167_a_at   phosphomannomutase 1
302         NM_015772   50524       1416638_at     sal-like 2 (Drosophila)
303         NM_015776   50530       1418454_at     microfibrillar associated protein 5
304         BB533903    50708       1436994_a_at   histone 1, H1c
305         NM_015786   50708       1416101_a_at   histone 1, H1c
306         BB107412    52065       1429005_at     Malignant fibrous histiocytoma amplified sequence 1
307         AK003278    52466       1426714_at     DNA segment, Chr 11, ERATO Doi 18, expressed
308         AU014694    52666       1419978_s_at   DNA segment, Chr 10, ERATO Doi 610, expressed
309         NM_030598   53901       1421425_a_at   Down syndrome critical region gene 1-like 1
310         NM_133914   54153       1417333_at     RAS p21 protein activator 4
311         NM_019971   54635       1419123_a_at   platelet-derived growth factor, C polypeptide
312         AF282255    54720       1416601_a_at   Down syndrome critical region homolog 1 (human)
313         AF282255    54720       1416600_a_at   Down syndrome critical region homolog 1 (human)
314         AI326893    55927       1436050_x_at   hairy and enhancer of split 6 (Drosophila)
315         NM_019631   56277       1422587_at     transmembrane protein 45a
316         AV370848    56316       1423554_at     gamma-glutamyl carboxylase
317         NM_019790   56363       1419073_at     transmembrane protein with EGF-like and two follistatin-like domains 2
318         NM_019976   56742       1417323_at     RIKEN cDNA 5430413102 gene
319         BC005569    58809       1422603_at     ribonuclease, RNase A family 4
320         NM_022329   64164       1448958_at     interferon alpha responsive gene
321         BC010291    66141       1423754_at     interferon induced transmembrane protein 3
322         BG067878    66251       1426534_a_at   ADP-ribosylation factor GTPase activating protein 3

The following 7 murine genes are exclusively associated with asymmetric self renewal and are located on Chromosome 2: NM_008714; BB559706; AK005731; BB131106; BB196807; BI217574; and BC024599.

The following 13 murine genes are exclusively associated with asymmetric self renewal and are NOT located on Chromosome 2: NM_012043; NM_008026; NM_030712; BF457736; BE981473; BB009770; BB049759; AU020235; BC019937; BC026495; AW259452; BB215355; and BB196807.

The following 7 human genes are exclusively associated with asymmetric self renewal and their murine homologues are located on Chromosome 2: AF308602; AI264121; AU160041; AL136573; NM_017585; AF047004; and AL136566.

The following 13 human genes are exclusively associated with asymmetric self renewal and their murine homologues are NOT located on Chromosome 2: NM_005545; AF327066; U73531; BC016797; BE781857; NM_024660; NM_019099; AL133001; NM_024587; AI954412; AI393309; NM_030581; and NM_017585.

Docket No. 019028-57011-PCT Express Mail Label No.: EV 653005964 US

Table 7: Overlap between Gene Set 1 (Exclusive Asymmetric Self-Renewal) and Stem Cell Enriched Genes previously described

Key

1: Melton ES cells
2: Melton NS cells
3: Melton HS cells
4: Lemischka ES cells
5: Lemischka NS cells
6: Lemischka HS cells
7: Fortunel ES cells
8: Fortunel NS cells
9: Fortunel RP cells

N/A: No human orthologue target in AffyChip

SEQUENCES TABLE 9

In Tables 1-8 of the Application, the Applicants have given sequence identifier numbers (SEQ ID NOs) according to GenBank accession numbers and cross-referenced these numbers with Affymetrix ID numbers. For example, in Table 6, SEQ ID NO: 605 corresponds to GenBank accession number AF308602, which is the Homo sapiens NOTCH 1 (N1) mRNA, complete coding sequence. SEQ ID NO: 605 also corresponds to the Affymetrix ID number 1418633_at.
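
The cross-referencing described above can be pictured as a small lookup structure. The Python sketch below is illustrative only and is not part of the Application. The class name `SequenceEntry` and the helper `build_indexes` are hypothetical, and the only values taken from this document are those for SEQ ID NO: 605 and the accession and title for SEQ ID NO: 606. The Affymetrix ID for SEQ ID NO: 606 is not stated in this passage, so it is left as `None`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class SequenceEntry:
    """One sequence-listing entry (hypothetical structure for illustration)."""
    seq_id_no: int
    accession: str
    affymetrix_id: Optional[str]  # None when not given in the passage
    description: str


ENTRIES = [
    SequenceEntry(605, "AF308602", "1418633_at",
                  "Homo sapiens NOTCH 1 (N1) mRNA, complete coding sequence"),
    SequenceEntry(606, "AI264121", None,
                  "NCI_CGAP_Kid3 Homo sapiens cDNA clone, mRNA sequence"),
]


def build_indexes(entries: Iterable[SequenceEntry]) -> Tuple[
        Dict[int, SequenceEntry], Dict[str, SequenceEntry], Dict[str, SequenceEntry]]:
    """Index entries by SEQ ID NO, by accession, and by Affymetrix ID."""
    entries = list(entries)
    by_seq_id = {e.seq_id_no: e for e in entries}
    by_accession = {e.accession: e for e in entries}
    by_probe = {e.affymetrix_id: e for e in entries if e.affymetrix_id is not None}
    return by_seq_id, by_accession, by_probe


if __name__ == "__main__":
    by_seq_id, by_accession, by_probe = build_indexes(ENTRIES)
    print(by_accession["AF308602"].seq_id_no)
    print(by_probe["1418633_at"].accession)
```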

SEQ ID NO: 605 AF308602 Homo sapiens NOTCH 1 (N1) mRNA

1 atgccgccgc tcctggcgcc cctgctctgc ctggcgctgc tgcccgcgct cgccgcacga 61 ggcccgcgat gctcccagcc cggtgagacc tgcctgaatg gcgggaagtg tgaagcggcc 121 aatggcacgg aggcctgcgt ctgtggcggg gccttcgtgg gcccgcgatg ccaggacccc 181 aacccgtgcc tcagcacccc ctgcaagaac gccgggacat gccacgtggt ggaccgcaga 241 ggcgtggcag actatgcctg cagctgtgcc ctgggcttct ctgggcccct ctgcctgaca 301 cccctggaca acgcctgcct caccaacccc tgccgcaacg ggggcacctg cgacctgctc 361 acgctgacgg agtacaagtg ccgctgcccg cccggctggt cagggaaatc gtgccagcag 421 gctgacccgt gcgcctccaa cccctgcgcc aacggtggcc agtgcctgcc cttcgaggcc 481 tcctacatct gccactgccc acccagcttc catggcccca cctgccggca ggatgtcaac 541 gagtgtggcc agaagcccag gctttgccgc cacggaggca cctgccacaa cgaggtcggc 601 tcctaccgct gcgtctgccg cgccacccac actggcccca actgcgagcg gccctacgtg 661 ccctgcagcc cctcgccctg ccagaacggg ggcacctgcc gccccacggg cgacgtcacc 721 cacgagtgtg cctgcctgcc aggcttcacc ggccagaact gtgaggaaaa tatcgacgat 781 tgtccaggaa acaactgcaa gaacgggggt gcctgtgtgg acggcgtgaa cacctacaac

841 tgcccgtgcc cgccagagtg gacaggtcag tactgtaccg aggatgtgga cgagtgccag 901 ctgatgccaa atgcctgcca gaacggcggg acctgccaca acacccacgg tggctacaac 961 tgcgtgtgtg tcaacggctg gactggtgag gactgcagcg agaacattga tgactgtgcc 1021 agcgccgcct gcttccacgg cgccacctgc catgaccgtg tggcctcctt ttactgcgag 1081 tgtccccatg gccgcacagg tctgctgtgc cacctcaacg acgcatgcat cagcaacccc 1141 tgtaacgagg gctccaactg cgacaccaac cctgtcaatg gcaaggccat ctgcacctgc 1201 ccctcggggt acacgggccc ggcctgcagc caggacgtgg atgagtgctc gctgggtgcc 1261 aacccctgcg agcatgcggg caagtgcatc aacacgctgg gctccttcga gtgccagtgt 1321 ctgcagggct acacgggccc ccgatgcgag atcgacgtca acgagtgcgt ctcgaacccg 1381 tgccagaacg acgccacctg cctggaccag attggggagt tccagtgcat gtgcatgccc 1441 ggctacgagg gtgtgcactg cgaggtcaac acagacgagt gtgccagcag cccctgcctg 1501 cacaatggcc gctgcctgga caagatcaat gagttccagt gcgagtgccc cacgggcttc 1561 actgggcatc tgtgccagta cgatgtggac gagtgtgcca gcaccccctg caagaatggt 1621 gccaagtgcc tggacggacc caacacttac acctgtgtgt gcacggaagg gtacacgggg 1681 acgcactgcg aggtggacat cgatgagtgc gaccccgacc cctgccacta cggctcctgc 1741 aaggacggcg tcgccacctt cacctgcctc tgccgcccag gctacacggg ccaccactgc 1801 gagaccaaca tcaacgagtg ctccagccag ccctgccgcc tacggggcac ctgccaggac 1861 ccggacaacg cctacctctg cttctgcctg aaggggacca caggacccaa ctgcgagatc 1921 aacctggatg actgtgccag cagcccctgc gactcgggca cctgtctgga caagatcgat 1981 ggctacgagt gtgcctgtga gccgggctac acagggagca tgtgtaacag caacatcgat 2041 gagtgtgcgg gcaacccctg ccacaacggg ggcacctgcg aggacggcat caatggcttc 2101 acctgccgct gccccgaggg ctaccacgac cccacctgcc tgtctgaggt caatgagtgc 2161 aacagcaacc cctgcgtcca cggggcctgc cgggacagcc tcaacgggta caagtgcgac 2221 tgtgaccctg ggtggagtgg gaccaactgt gacatcaaca acaacgagtg tgaatccaac

2281 ccttgtgtca acggcggcac ctgcaaagac atgaccagtg gcatcgtgtg cacctgccgg 2341 gagggcttca gcggtcccaa ctgccagacc aacatcaacg agtgtgcgtc caacccatgt 2401 ctgaacaagg gcacgtgtat tgacgacgtt gccgggtaca agtgcaactg cctgctgccc 2461 tacacaggtg ccacgtgtga ggtggtgctg gccccgtgtg cccccagccc ctgcagaaac 2521 ggcggggagt gcaggcaatc cgaggactat gagagcttct cctgtgtctg ccccacggct 2581 ggggccaaag ggcagacctg tgaggtcgac atcaacgagt gcgttctgag cccgtgccgg 2641 cacggcgcat cctgccagaa cacccacggc gsstaccgct gccactgcca ggccggctac 2701 agtgggcgca actgcgagac cgacatcgac gactgccggc ccaacccgtg tcacaacggg 2761 ggctcctgca cagacggcat caacacggcc ttctgcgact gcctgcccgg cttccggggc 2821 actttctgtg aggaggacat caacgagtgt gccagtgacc cctgccgcaa cggggccaac 2881 tgcacggact gcgtggacag ctacacgtgc acctgccccg caggcttcag cgggatccac 2941 tgtgagaaca acacgcctga ctgcacagag agctcctgct tcaacggtgg cacctgcgtg 3001 gacggcatca actcgttcac ctgcctgtgt ccacccggct tcacgggcag ctactgccag 3061 cacgtagtca atgagtgcga ctcacgaccc tgcctgctag gcggcacctg tcaggacggt 3121 cgcggtctcc acaggtgcac ctgcccccag ggctacactg gccccaactg ccagaacctt 3181 gtgcactggt gtgactcctc gccctgcaag aacggcggca aatgctggca gacccacacc 3241 cagtaccgct gcgagtgccc cagcggctgg accggccttt actgcgacgt gcccagcgtg 3301 tcctgtgagg tggctgcgca gcgacaaggt gttgacgttg cccgcctgtg ccagcatgga 3361 gggctctgtg tggacgcggg caacacgcac cactgccgct gccaggcggg ctacacaggc 3421 agctactgtg aggacctggt ggacgagtgc tcacccagcc cctgccagaa cggggccacc 3481 tgcacggact acctgggcgg ctactcctgc aagtgcgtgg ccggctacca cggggtgaac 3541 tgctctgagg agatcgacga gtgcctctcc cacccctgcc agaacggggg cacctgcctc 3601 gacctcccca acacctacaa gtgctcctgc ccacggggca ctcagggtgt gcactgtgag 3661 atcaacgtgg acgactgcaa tccccccgtt gaccccgtgt cccggagccc caagtgcttt

3721 aacaacggca cctgcgtgga ccaggtgggc ggctacagct gcacctgccc gccgggcttc 3781 gtgggtgagc gctgtgaggg ggatgtcaac gagtgcctgt ccaatccctg cgacgcccgt 3841 ggcacccaga actgcgtgca gcgcgtcaat gacttccact gcgagtgccg tgctggtcac 3901 accgggcgcc gctgcgagtc cgtcatcaat ggctgcaaag gcaagccctg caagaatggg 3961 ggcacctgcg ccgtggcctc caacaccgcc cgcgggttca tctgcaagtg ccctgcgggc 4021 ttcgagggcg ccacgtgtga gaatgacgct cgtacctgcg gcagcctgcg ctgcctcaac 4081 ggcggcacat gcatctccgg cccgcgcagc cccacctgcc tgtgcctggg ccccttcacg 4141 ggccccgaat gccagttccc ggccagcagc ccctgcctgg gcggcaaccc ctgctacaac 4201 caggggacct gtgagcccac atccgagagc cccttctacc gttgcctgtg ccccgccaaa 4261 ttcaacgggc tcttgtgcca catcctggac tacagcttcg ggggtggggc cgggcgcgac 4321 atccccccgc cgctgatcga ggaggcgtgc gagctgcccg agtgccagga ggacgcgggc 4381 aacaaggtct gcagcctgca gtgcaacaac cacgcgtgcg gctgggacgg cggtgactgc 4441 tccctcaact tcaatgaccc ctggaagaac tgcacgcagt ctctgcagtg ctggaagtac 4501 ttcagtgacg gccactgtga cagccagtgc aactcagccg gctgcctctt cgacggcttt 4561 gactgccagc gtgcggaagg ccagtgcaac cccctgtacg accagtactg caaggaccac 4621 ttcagcgacg ggcactgcga ccagggctgc aacagcgcgg agtgcgagtg ggacgggctg 4681 gactgtgcgg agcatgtacc cgagaggctg gcggccggca cgctggtggt ggtggtgctg 4741 atgccgccgg agcagctgcg caacagctcc ttccacttcc tgcgggagct cagccgcgtg 4801 ctgcacacca acgtggtctt caagcgtgac gcacacggcc agcagatgat cttcccctac 4861 tacggccgcg aggaggagct gcgcaagcac cccatcaagc gtgccgccga gggctgggcc 4921 gcacctgacg ccctgctggg ccaggtgaag gcctcgctgc tccctggtgg cagcgagggt 4981 gggcggcggc ggagggagct ggaccccatg gacgtccgcg gctccatcgt ctacctggag 5041 attgacaacc ggcagtgtgt gcaggcctcc tcgcagtgct tccagagtgc caccgacgtg 5101 gccgcattcc tgggagcgct cgcctcgctg ggcagcctca acatccccta caagatcgag

5161 gccgtgcaga gtgagaccgt ggagccgccc ccgccggcgc agctgcactt catgtacgtg 5221 gcggcggccg cctttgtgct tctgttcttc gtgggctgcg gggtgctgct gtcccgcaag 5281 cgccggcggc agcatggcca gctctggttc cctgagggct tcaaagtgtc tgaggccagc 5341 aagaagaagc ggcgggagcc cctcggcgag gactccgtgg gcctcaagcc cctgaagaac 5401 gcttcagacg gtgccctcat ggacgacaac cagaatgagt ggggggacga ggacctggag 5461 accaagaagt tccggttcga ggagcccgtg gttctgcctg acctggacga ccagacagac 5521 caccggcagt ggactcagca gcacctggat gccgctgacc tgcgcatgtc tgccatggcc 5581 cccacaccgc cccagggtga ggttgacgcc gactgcatgg acgtcaatgt ccgcgggcct 5641 gatggcttca ccccgctcat gatcgcctcc tgcagcgggg gcggcctgga gacgggcaac 5701 agcgaggaag aggaggacgc gccggccgtc atctccgact tcatctacca gggcgccagc 5761 ctgcacaacc agacagaccg cacgggcgag accgccttgc acctggccgc ccgctactca 5821 cgctctgatg ccgccaagcg cctgctggag gccagcgcag atgccaacat ccaggacaac 5881 atgggccgca ccccgctgca tgcggctgtg tctgccgacg cacaaggtgt cttccagatc 5941 ctgatccgga accgagccac agacctggat gcccgcatgc atgatggcac gacgccactg 6001 atcctggctg cccgcctggc cgtggagggc atgctggagg acctcatcaa ctcacacgcc 6061 gacgtcaacg ccgtagatga cctgggcaag tccgccctgc actgggccgc cgccgtgaac 6121 aatgtggatg ccgcagttgt gctcctgaag aacggggcta acaaagatat gcagaacaac 6181 agggaggaga cacccctgtt tctggccgcc cgggagggca gctacgagac cgccaaggtg 6241 ctgctggacc actttgccaa ccgggacatc acggatcata tggaccgcct gccgcgcgac 6301 atcgcacagg agcgcatgca tcacgacatc gtgaggctgc tggacgagta caacctggtg 6361 cgcagcccgc agctgcacgg agccccgctg gggggcacgc ccaccctgtc gcccccgctc 6421 tgctcgccca acggctacct gggcagcctc aagcccggcg tgcagggcaa gaaggtccgc 6481 aagcccagca gcaaaggcct ggcctgtgga agcaaggagg ccaaggacct caaggcacgg 6541 aggaagaagt cccaggatgg caagggctgc ctgctggaca gctccggcat gctctcgccc

6601 gtggactccc tggagtcacc ccatggctac ctgtcagacg tggcctcgcc gccactgctg 6661 ccctccccgt tccagcagtc tccgtccgtg cccctcaacc acctgcctgg gatgcccgac 6721 acccacctgg gcatcgggca cctgaacgtg gcggccaagc ccgagatggc ggcgctgggt 6781 gggggcggcc ggctggcctt tgagactggc ccacctcgtc tctcccacct gcctgtggcc 6841 tctggcacca gcaccgtcct gggctccagc agcggagggg ccctgaattt cactgtgggc 6901 gggtccacca gtttgaatgg tcaatgcgag tggctgtccc ggctgcagag cggcatggtg 6961 ccgaaccaat acaaccctct gcgggggagt gtggcaccag gccccctgag cacacaggcc 7021 ccctccctgc agcatggcat ggtaggcccg ctgcacagta gccttgctgc cagcgccctg 7081 tcccagatga tgagctacca gggcctgccc agcacccggc tggccaccca gcctcacctg 7141 gtgcagaccc agcaggtgca gccacaaaac ttacagatgc agcagcagaa cctgcagcca 7201 gcaaacatcc agcagcagca aagcctgcag ccgccaccac caccaccaca gccgcacctt 7261 ggcgtgagct cagcagccag cggccacctg ggccggagct tcctgagtgg agagccgagc 7321 caggcagacg tgcagccact gggccccagc agcctggcgg tgcacactat tctgccccag 7381 gagagccccg ccctgcccac gtcgctgcca tcctcgctgg tcccacccgt gaccgcagcc 7441 cagttcctga cgcccccctc gcagcacagc tactcctcgc ctgtggacaa cacccccagc 7501 caccagctac aggtgcctga gcaccccttc ctgacccctt cgccggagtc gcccgaccaa 7561 tggtcgtcct cgtcgccgca ctctaatgtg tctgactggt ctgagggcgt gtcgtcgccc 7621 ccgacctcca tgcagtccca gatcgcgcgc atcccggagg cgttcaagta atagctcgag 7681 gtgccagcag ctc (SEQ ID NO: 605)
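
The listing above follows a GenBank-style layout: each row begins with the 1-based position of its first base, followed by blocks of ten bases. As a minimal sketch (not part of the Application), the hypothetical helper `join_numbered_sequence` below strips the position indices, joins the blocks, and checks that every index agrees with the number of bases read so far. It assumes lowercase IUPAC nucleotide codes and rejects any other character, which helps catch transcription damage.

```python
import re

# Lowercase IUPAC nucleotide codes, including ambiguity codes such as n and s.
_BASES = re.compile(r"[acgturykmswbdhvn]+")


def join_numbered_sequence(text: str) -> str:
    """Return the contiguous sequence from a position-numbered listing.

    Raises ValueError if a position index is inconsistent with the bases
    read so far, or if a block contains a non-nucleotide character.
    """
    parts = []
    length = 0
    for token in text.split():
        if token.isdigit():
            # A position index must equal the count of bases so far, plus one.
            if int(token) != length + 1:
                raise ValueError(
                    f"position {token} does not follow {length} bases")
        elif _BASES.fullmatch(token):
            parts.append(token)
            length += len(token)
        else:
            raise ValueError(f"unexpected token in sequence: {token!r}")
    return "".join(parts)


if __name__ == "__main__":
    rows = ("1 atgccgccgc tcctggcgcc cctgctctgc ctggcgctgc tgcccgcgct cgccgcacga "
            "61 ggcccgcgat gctcccagcc")
    seq = join_numbered_sequence(rows)
    print(len(seq), seq[:20])
```

The index check is deliberately strict: a dropped or duplicated block shifts every later position, so the first mismatching index points close to where the damage occurred.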

SEQ ID NO: 606 AI264121, NCI_CGAP_Kid3 Homo sapiens cDNA clone, mRNA sequence

1 cagcttcttt tttttttttt ttcatgaact aaagctttat tacgattcct tttttttgat 61 ccctttgcac ccctgcacct aagccaaaag cattataatc ttgtcatact tcagataagt

121 ccacgggaga tgttccgagt gaactataga tgacattcca ctagggaatt ctatgttcag 181 tgtaaatggt atcttgtata agttttagtt ttttgtctac cctttgtttc ctgggctgag 241 cttgtccaga aatcttgtct tcttcaggct acagcagctt agagcttgct tgtgtgtgtg 301 tttgtttgtt tgtcttaaag gtataggcaa aattttagtc ttaacacctg taaaccagta 361 ctggtgttgt tctgtcctag aaattttagc actgctctga tacaataaag ccttctttct 421 ctccaactgg ttcaacttca gcataggcag gatgtccaga gcctcttcta aacttcatcg 481 caggccatct gcttgggc (SEQ ID NO: 606)

SEQ ID NO: 607 AU160041 Y79AA1 Homo sapiens cDNA clone Y79AA1000969 3', mRNA sequence

1 caggatgtga caacgttttt aatgcaaagt caaccattag catctttccc atgtacttat 61 tagatgtgaa atggcaggac ttcacggccc cgtttgcata ttttcctact ccgcagacga 121 ataatatttt cagggaaggc agcgcantct gtgccgtcac aatcgggcga ctgtgggtga 181 tgagggatga tgattttcca ggaggccctg gggtcanagg actcctagag ggagtttcca 241 gcccctcaat cgcagatgga tggcctgttg atgttgtaac tggggtggaa gttganccgg 301 tcacaggagg tgatgcagtt atcggggcca gtcacgatgc ttttctccag gtaaacattg 361 agagtattgt tccggaacat tccacccgag gcatctcntg cacggtgggg gctctgctcc 421 cgtaagcctg gttactgggt cctgtcactg aaacagcctt ctgggtcctt gtaacccccg

481 aaccacccng ggttggntna accttgcccg gcanngtccg cgcttacgcc gnaagtna (SEQ ID NO: 607)

SEQ ID NO: 608 AL136573, Homo sapiens mRNA; cDNA DKFZp761J1523 (from clone DKFZp761J1523)

1 ataatactga tgaagcattt ttgttccagc tctgtctcgg aagacctagg ctgtagacgt 61 ggggatttca gtaggaaaca ttatggatct gtggagctgc ttatttccag tgatgctgat

121 ggagccatcc aaagggctgg aagattcaga gtggaaaatg gctcttcaga tgagaatgca 181 actgccctgc ctggtacttg gcgaagaaca gacgtgcact tagagaaccc agaataccac 241 accagatggt atttcaaata ttttttagga caagtccatc agaactacat tggaaacgat 301 gccgagaaga gccctttctt cttgtccgtg accctttctg accaaaacaa tcaacgtgtc 361 cctcaatacc gtgcaattct ttggagaaaa acaggtaccc agaaaatatg ccttccctac 421 agtcccacaa aaactctttc tgtgaagtcc atcttaagtg ccatgaatct ggacaaattt 481 gagaaaggcc ccagggaaat ttttcatcct gaaatacaaa aggacttgct ggttcttgaa 541 gaacaagagg gctctgtgaa tttcaagttt ggggttcttt ttgccaaaga tgggcagctc 601 actgatgatg agatgttcag caatgaaatt ggaagcgagc cttttcaaaa atttttaaat 661 cttctgggtg acacaatcac tctaaagggc tggacgggct accgtggcgg tctggatacc 721 aaaaatgata ccacagggat acattcagtt tatactgtgt accaagggca tgagatcatg 781 tttcatgttt ccaccatgtt gccatattcc aaagagaaca aacagcaggt ggaaaggaaa 841 cgccacattg gaaacgatat cgtcaccatt gtgttccaag aaggagagga atcttctcct 901 gcctttaagc cttccatgat ccgctcccac tttacacata tttttgcctt agtgagatac 961 aatcaacaaa atgacaatta caggctgaaa atattttcag aagagagcgt accactcttt 1021 ggccctccct tgccaactcc accagtgttt acagaccacc aggaattcag ggactttttg 1081 ctagtgaaat taattaatgg tgaaaaagcc actttggaaa ccccaacatt tgcccagaaa 1141 cgtcggcgta ccctggatat gttgattaga tctttacacc aggatttgat gccagatttg 1201 cataagaaca tgcttaatag acgatctttt agtgatgtct taccagagtc acccaagtca 1261 gcgcggaaga aagaggaggc ccgccaggcg gagtttgtta gaatagggca ggcactaaaa 1321 ctgaaatcca ttgtgagagg ggatgctcca tcaagcttgg cagcttcagg gatctgtaaa 1381 aaagagccgt gggagcccca gtgtttctgc agtaatttcc ctcatgaagc cgtgtgtgca 1441 gatccctggg gccaggcctt gctggtttcc actgatgctg gcgtcttgct agtggatgat 1501 gaccttccat cagtgcccgt gtttgacaga actctgccag tgaagcaaat gcatgtgctt

1561 gagaccctgg accttctggt tctcagagca gacaaaggaa aagatgctcg cctctttgtc 1621 ttcaggctaa gtgctctgca aaagggcctt gaggggaagc aggctgggaa gagcaggtct 1681 gactgcagag aaaacaagtt ggagaaaaca aaaggctgcc acctgtatgc tattaacact 1741 caccacagca gagagctgag gattgtggtt gcaattcgga ataaactgct tctgatcaca 1801 agaaaacaca acaagccaag cggggtcacc agcacctcat tgttatctcc cctgtctgag 1861 tcacctgttg aagaattcca gtacatcagg gagatctgtc tgtctgactc tcccatggtg 1921 atgaccttag tggatgggcc agctgaagag agtgacaatc tcatctgtgt ggcttatcga 1981 caccaatttg atgtggtgaa tgagagcaca ggagaagcct tcaggctgca ccacgtggag 2041 gccaacaggg ttaattttgt tgcagctatt gatgtgtacg aagatggaga agctggtttg 2101 ctgttgtgtt acaactacag ttgcatctat aaaaaggttt gcccctttaa tggtggctct 2161 tttttggttc aaccttctgc gtcagatttc cagttctgtt ggaaccaggc tccctatgca 2221 attgtctgtg ctttcccgta tctcctggcc ttcaccaccg actccatgga gatccgcctg 2281 gtggtgaacg ggaacctggt ccacactgca gtcgtgccgc agctgcagct ggtggcctcc 2341 agggtgaaat tcaatcaaaa aatctgtaca agattccact tagaaacctc gtgggcagaa 2401 gcatcgaacg acctctgaag tcacccttag tctccaaggt catcacccca cccactccca 2461 tcagtgtggg ccttgctgcc attccagtca cgcactcctt gtccctgtct cgcatggaga 2521 tcaaagaaat agcaagcagg acccgcaggg aactactggg cctctcggat gaaggtggac 2581 ccaagtcaga aggagcgcca aaggccaaat caaaaccccg gaagcggtta gaagaaagcc 2641 aaggaggccc caagccaggg gcagtgaggt catctagcag tgacaggatc ccatcaggct 2701 ccttggaaag tgcttctact tccgaagcca accctgaggg gcactcagcc agctctgacc 2761 aggaccctgt ggcagacaga gagggcagcc cggtctccgg cagcagcccc ttccagctca 2821 cggctttctc cgatgaagac attatagact tgaagtaaca gagttgaatc tcatttgcca 2881 tctttagttt tcttatggag gtttatactc tttaaacagt tctgatgtaa tttctcaaca 2941 aaatgtggct tttagcctgt cagtgatcta ttggaccaaa ccttctgcac actcggccag

3001 ttccctctcc aatgtccggt gccatctttc ctgacctttg tttctttctg ttcaggaacc 3061 atcagtcccc ttgtaataaa ggtggtagat ttcattgagg ttttagattg aaactttgaa 3121 taaatcaaaa atactcattc ttaaaaaaaa aaaaaaaaaa (SEQ ID NO: 608)

SEQ ID NO: 609 NM_017585 Homo sapiens solute carrier family 2 (facilitated glucose transporter), member 6 (SLC2A6), mRNA

1 ctgagcgccc tccgctcgcc ccgagagaga cccggccatg caggagccgc tgctgggagc 61 cgagggcccg gactacgaca ccttccccga gaagccgccc ccgtcgccag gggacagggc 121 gcgggtcggg accctgcaga acaaaagggt gttcctggcc accttcgccg cagtgctcgg 181 caatttcagc tttgggtatg ccctggtcta cacatcccct gtcatcccag ccctggagcg 241 ctccttggat cctgacctgc atctgaccaa atcccaggca tcctggtttg ggtccgtgtt 301 caccctggga gcagcggccg gaggcctgag tgccatgatc ctcaacgacc tcctgggccg 361 gaagctgagc atcatgttct cagctgtgcc gtcggcggcc ggctatgcgc tcatggcggg 421 tgcgcacggc ctctggatgc tgctgctcgg aaggacgctg acgggcttcg ccggggggct 481 cacagctgcc tgcatcccgg tgtacgtgtc tgagattgct cccccaggcg ttcgtggggc 541 tctgggggcc acaccccagc tcatggcagt gttcggatcc ctgtccctct acgcccttgg 601 cctcctgctg ccgtggcgct ggctggctgt ggccggggag gcgcctgtgc tcatcatgat 661 cctgctgctc agcttcatgc ccaactcgcc gcgcttcctg ctctctcggg gcagggacga 721 agaggccctg cgggcgctgg cctggctgcg tgggacggac gtcgatgtcc actgggagtt 781 cgagcagatc caggacaacg tccggagaca gagcagccga gtatcgtggg ctgaggcacg 841 ggccccacac gtgtgccggc ccatcaccgt ggccttgctg atgcgcctcc tgcagcagct 901 gacgggcatc acgcccatcc tggtctacct gcagtccatc ttcgacagca ccgctgtcct 961 gctgcccccc aaggacgacg cagccatcgt tggggccgtg cggctcctgt ccgtgctgat 1021 cgccgccctc accatggacc tcgcaggccg caaggtgctg ctcttcgtct cagcggccat 1081 catgtttgct gccaacctga ctctggggct gtacatccac tttggcccca ggcctctgag

1141 ccccaacagc actgcgggcc tggaaagcga gtcctggggg gacttggcgc agcccctggc 1201 agcacccgct ggctacctca ccctggtgcc cctgctggcc accatgctct tcatcatggg 1261 ctacgccgtg ggctggggtc ccatcacctg gctgctcatg tctgaggtcc tgcccctgcg 1321 tgcccgtggc gtggcctcag ggctctgcgt gctggccagc tggctcaccg ccttcgtcct 1381 caccaagtcc ttcctgccag tggtgagcac cttcggcctc caggtgcctt tcttcttctt 1441 cgcggccatc tgcttggtga gcctggtgtt cacaggctgc tgtgtgcccg agaccaaggg 1501 acggtccctg gagcagatcg agtccttctt ccgcatgggg agaaggtcct tcttgcgcta 1561 ggtcaaggtc cccgcctgga gggggccaaa cccccagtgg ctgggcctct gtgttggcta 1621 caaacctgca ccctgggacc aagaggcagc agtcatccct gccaccagcc agagcacagg 1681 aagagcagtg tgatggggcc tcagcagcgg gtgcccctgg ctcgggacag gtagcactgc 1741 tgtccagcca cagccccagc ccaggcagcc cacagtgctg cacgtagcca tgggccgcag 1801 gagtgcatac aaccctgcat ccagggacac ggccctgctg ggtgacctca ggcctagtcc 1861 ctttcccttg cgtgaaggac acgccccaca gaaggctacg gggaggactg agaggacagg 1921 gctggaggca gccaagtaac gtagtcatat catcgcgctc tgatctggtg gcatctggct 1981 gtgcaaggaa gacccggctt tgccctcaca agtcttatgg gcaccacagg gaacatcctg 2041 gacttaaaaa gccagggcag gccgggcaca gtggctcacg cctgtaatcc cagcactttg 2101 ggaggccaaa gcaggtggat tacccaaggc caggagttca agaccagcct ggccaacatg 2161 gtgaaacccc gtctctacta aaaaatacaa aaaagctggg tgtggtggca cacacccgta 2221 gttccagcta cttgggaggc tgaggcagca ttgcttgaac ccgggaggtg gaggctgcaa 2281 tgagctgaga tcatgccatt gcactccagc ctgggcaacg agagtgaaac tccgtcccca 2341 ccccctgcca aaaaaaaaaa aaaaaaagcc agggcaaagg acctggcgtg gccacttcct 2401 cctgccccag cccaacctct gggaacaggc agctcctatc tgcaaactgt gttcaccctt 2461 ttgtaaaaat aaaggaactg gacccgt (SEQ ID NO: 609)

SEQ ID NO: 610 AF047004 Homo sapiens dimethylglycine dehydrogenase-like protein isoform 1 mRNA, complete cds

1 cctggagttc cggccaggcc actgcttggg aagcaagaag gtgaaggcac ctctgctggg 61 ccaagcactc ttagggccga ggggcactgc agctgacaag agctccctgt tttgctgagg 121 cctggagccc ccatggcctc actgagccga gccctacgtg tggctgctgc ccaccctcgc 181 cagagcccta cccggggcat ggggccatgc aacctgtcca gcgcagctgg ccccacagcc 241 gagaagagtg tgccatatca gcggaccctg aaggagggac agggcacctc ggtggtggcc 301 caaggcccaa gccggcccct gcccagcacg gccaacgtgg tggtcattgg tggaggcagc 361 ttgggctgcc agaccctgta ccacctggcc aagctgggca tgagtggggc ggtgctgctg 421 gagcgggagc ggctgacctc cgggaccacc tggcacacgg caggcctgct gtggcagctg 481 cggcccagtg acgtggaggt ggagcttctg gcccacactc ggcgggtggt gagccgggag 541 ctggaggagg agacgggact acacacgggc tggatccaga atgggggcct cttcatcgcg 601 tccaaccggc agcgcctgga cgagtacaag aggctcatgt cgctgggcaa ggcgtatggt 661 gtggaatccc atgtgctgag cccggcagag accaagactc tgtacccgct gatgaatgtg 721 gacgacctct acgggaccct gtatgtgccg cacgacggta ccatggaccc cgctggcacc 781 tgtaccaccc tcgccagggc agcttctgcc cgaggagcac aggtcattga gaactgccca 841 gtgaccggca ttcgtgtgtg gacggatgat tttggggtgc ggcgggtcgc gggtgtggag 901 actcagcatg gttccatcca gacaccctgc gtggtcaatt gtgcaggagt gtgggcaagt 961 gctgtgggcc ggatggctgg agtcaaggtc ccgctggtgg ccatgcacca tgcctatgtc 1021 gtcaccgagc gcatcgaggg gattcagaac atgcccaatg tccgtgatca tgatgcctct 1081 gtctacctcc gcctccaagg ggatgccttg tctgtgggtg gctatgaggc caaccccatc 1141 ttttgggagg aggtgtcaga caagtttgcc ttcggcctct ttgacctgga ctgggaggtg 1201 ttcacccagc acattgaagg cgccatcaac agggtccccg tgctggagaa gacaggaatc 1261 aagtccacgg tctgcggccc tgaatccttc acgcccgacc acaagcccct gatgggggag 1321 gcacctgagc tccgagggtt cttcctgggc tgtggcttca acagcgcagg gaaggtccag

1381 acagtcctgc cactcctgtt taccgtcaac gtctatctgt atctgtaggt caggaggaca 1441 aacataggtc aataaatatg taatgttagt gaacg (SEQ ID NO: 610)

SEQ ID NO: 611 AL136566 Homo sapiens mRNA; cDNA DKFZp761J191 (from clone DKFZp761J191)

1 gccggagccc ggaccaggcg cctgtgcctc ctcctcgtcc ctcgccgcgt ccgcgaagcc 61 tggagccggc gggagccccg cgctcgccat gtcgggcgag ctcagcaaca ggttccaagg 121 agggaaggcg ttcggcttgc tcaaagcccg gcaggagagg aggctggccg agatcaaccg 181 ggagtttctg tgtgaccaga agtacagtga tgaagagaac cttccagaaa agctcacagc 241 cttcaaagag aagtacatgg agtttgacct gaacaatgaa ggcgagattg acctgatgtc 301 tttaaagagg atgatggaga agcttggtgt ccccaagacc cacctggaga tgaagaagat 361 gatctcagag gtgacaggag gggtcagtga cactatatcc taccgagact ttgtgaacat 421 gatgctgggg aaacggtcgg ctgtcctcaa gttagtcatg atgtttgaag gaaaagccaa 481 cgagagcagc cccaagccag ttggcccccc tccagagaga gacattgcta gcctgccctg 541 aggaccccgc ctggactccc cagccttccc accccatacc tccctcccga tcttgctgcc 601 cttcttgaca cactgtgatc tctctctctc tcatttgttt ggtcattgag ggtttgtttg 661 tgttttcatc aatgtctttg taaagcacaa attatctgcc ttaaaggggc tctgggtcgg 721 ggaatcctga gccttgggtc ccctccctct cttcttccct ccttccccgc tccctgtgca 781 gaagggctga tatcaaacca aaaactagag ggggcagggc cagggcaggg aggcttccag 841 cctgtgttcc cctcacttgg aggaaccagc actctccatc ctttcagaaa gtctccaagc 901 caagttcagg ctcactgacc tggctctgac gaggacccca ggccactctg agaagacctt 961 ggagtaggga caaggctgca gggcctcttt cgggtttcct tggacagtgc catggttcca 1021 gtgctctggt gtcacccagg acacagccac tcggggcccc gctgccccag ctgatcccca 1081 ctcattccac acctcttctc atcctcagtg atgtgaaggt gggaaggaaa ggagcttggc 1141 attgggagcc cttcaagaag gtaccagaag gaaccctcca gtcctgctct ctggccacac

1201 ctgtgcaggc agctgagagg cagcgtgcag ccctactgtc ccttactggg gcagcagagg 1261 gcttcggagg cagaagtgag gcctggggtt tggggggaaa ggtcagctca gtgctgttcc 1321 accttttagg gaggatactg aggggaccag gatgggagaa tgaggagtaa aatgctcacg 1381 gcaaagtcag cagcactggt aagccaagac tgagaaatac aaggttgctt gtctgacccc 1441 aatctgcttg aaacctgact ctgcttctct catttgtctt cctaccctac tcacataatt 1501 cactcattga ctcactcatt caccagatat ttattgacct gctattataa gctttacatc 1561 ctcccatgtt gtcctggcat gtgcagtata cacggtctaa ctcatctctc cccagatctc 1621 tcagaacctt gagcttggga attgaactgg ggtcacctgt gtcctttctt atggactcgc 1681 aggattttag aaccctaatg caccctggag ggtagctggg ccagacttct catttcacag 1741 gtgaggagac tggtgcccca cagggattaa gtgccttgcc caaggtcagg cttatctcca 1801 gagggaggtg ccctggactg gggcccagat gttcagggac cctgcctaca cctcatttcc 1861 agtgtgggct gccttagtta gttatgagaa cagggaaggg ctgggaagag acagcctcca 1921 aggtcaacac ttggagaggg tttcacttgc tctgaagacc ctggtccagg attcgccctc 1981 tcccatgcct tcaagtcagc atcaggctta gggcaaagac caggcctctg aagctgcctc 2041 ttgtaattca tgcaggaaga tgtcaaagtc agccccatct tggctgatca gggtgttcag 2101 ccttaacccc acctgtgttc tgaagtctct taccctacct gctcaggact gagacagtta 2161 ttcactgaac atatttatta agcacttgct gtaggccaac agttaagaat ccaataatga 2221 aatggacaga ttcatggaac ttagagtcca ataggaaagt gagacccaga caatgacaat 2281 gagataaatg ttaggaaggg ggaggtatgg ggtgacttcc ctgcagtcct gggggcctac 2341 atgggcccaa gactgggtga gagtcttggc agagcctttg caacacctta agtggacagg 2401 actgggaggt cttggtggtt ggagccaacg tgggttccct gcggctcctt agtcacctct 2461 gatagcagat tgagggagga aaacaggtaa ggcatgagga aatggccagg ttgggttaac 2521 ccactggttt caaccagttc aggaatgagg ttatttggcc atgactggct gatcttgagc 2581 tcaaggatct gcttcaaatg cacacaggcc tagttgaagt ttaaacccca gcaaaacatt

2641 cctccctgta aatggaaaat cctacttcta cccccaccct gccctgtttt ttgttttttt 2701 tttccccaag atcattagat gtcctcaccc ctcctcactg cctctcctct ctgggacagg 2761 ctgggacctt tgaggaagat aaagccttcc ttgactaccc atcatattca gtgtccctgt 2821 tcctcactca gagaggaagg cagaaccagt caggcttatt tcagtaagtt ccacagttct 2881 acaagactgc aggaattctc cttaagggag gagagcaagc aggtgtggcc ccagcttctg 2941 gaaatggcag aagagagggt tttctcattg aatgggggtg ggggctcgtg tgtcctggga 3001 aaccccatca gtcccttcat ttcttgagac tcaactcctg ggaggagagg gtctcaagag 3061 ttgtccctgg aaggagggcg ggggcagtct gcatctattt caggttgtgg ctcttggttc 3121 taggactctt acttctctgg ctaagggctc agcttcttgg gacttcaacc atcttctttc 3181 tgaaagacca aatctaatgt aaccagtaac gtgaggactg ccaagtatgg ctttgtccct 3241 atgactcaga ggagggtttg tcgggcaaat tcaggtggat gaagtatgtg tgtgcgtgtg 3301 catgggagtg tgcgtggact gggatatcat ctctacagcc tgcaaataaa ccagacaaac 3361 ttaaaaaaaa aaaaaaaaaa a (SEQ ID NO: 611)

SEQ ID NO: 612 NM_005545 Homo sapiens immunoglobulin superfamily containing leucine-rich repeat (ISLR), transcript variant 1, mRNA

1 aagcagttgt tttgctggaa ggagggagtg cgcgggctgc cccgggctcc tccctgccgc 61 ctcctctcag tggatggttc caggcaccct gtctggggca gggagggcac aggcctgcac 121 atcgaaggtg gggtgggacc aggctgcccc tcgccccagc atccaagtcc tcccttgggc 181 gcccgtggcc ctgcagactc tcagggctaa ggtcctctgt tgctttttgg ttccacctta 241 gaagaggctc cgcttgacta agagtagctt gaaggaggca ccatgcagga gctgcatctg 301 ctctggtggg cgcttctcct gggcctggct caggcctgcc ctgagccctg cgactgtggg 361 gaaaagtatg gcttccagat cgccgactgt gcctaccgcg acctagaatc cgtgccgcct 421 ggcttcccgg ccaatgtgac tacactgagc ctgtcagcca accggctgcc aggcttgccg 481 gagggtgcct tcagggaggt gcccctgctg cagtcgctgt ggctggcaca caatgagatc

541 cgcacggtgg ccgccggagc cctggcctct ctgagccatc tcaagagcct ggacctcagc 601 cacaatctca tctctgactt tgcctggagc gacctgcaca acctcagtgc cctccaattg 661 ctcaagatgg acagcaacga gctgaccttc atcccccgcg acgccttccg cagcctccgt 721 gctctgcgct cgctgcaact caaccacaac cgcttgcaca cattggccga gggcaccttc 781 accccgctca ccgcgctgtc ccacctgcag atcaacgaga accccttcga ctgcacctgc 841 ggcatcgtgt ggctcaagac atgggccctg accacggccg tgtccatccc ggagcaggac 901 aacatcgcct gcacctcacc ccatgtgctc aagggtacgc cgctgagccg cctgccgcca 961 ctgccatgct cggcgccctc agtgcagctc agctaccaac ccagccagga tggtgccgag 1021 ctgcggcctg gttttgtgct ggcactgcac tgtgatgtgg acgggcagcc ggcccctcag 1081 cttcactggc acatccagat acccagtggc attgtggaga tcaccagccc caacgtgggc 1141 actgatgggc gtgccctgcc tggcacccct gtggccagct cccagccgcg cttccaggcc 1201 tttgccaatg gcagcctgct tatccccgac tttggcaagc tggaggaagg cacctacagc 1261 tgcctggcca ccaatgagct gggcagtgct gagagctcag tggacgtggc actggccacg 1321 cccggtgagg gtggtgagga cacactgggg cgcaggttcc atggcaaagc ggttgaggga 1381 aagggctgct atacggttga caacgaggtg cagccatcag ggccggagga caatgtggtc 1441 atcatctacc tcagccgtgc tgggaaccct gaggctgcag tcgcagaagg ggtccctggg 1501 cagctgcccc caggcctgct cctgctgggc caaagcctcc tcctcttctt cttcctcacc 1561 tccttctagc cccacccagg gcttccctaa ctcctcccct tgcccctacc aatgcccctt 1621 taagtgctgc aggggtctgg ggttggcaac tcctgaggcc tgcatgggtg acttcacatt 1681 ttcctacctc tccttctaat ctcttctaga gcacctgcta tccccaactt ctagacctgc 1741 tccaaactag tgactaggat agaatttgat cccctaactc actgtctgcg gtgctcattg 1801 ctgctaacag cattgcctgt gctctcctct caggggcagc atgctaacgg ggcgacgtcc 1861 taatccaact gggagaagcc tcagtggtgg aattccaggc actgtgactg tcaagctggc 1921 aagggccagg attgggggaa tggagctggg gcttagctgg gaggtggtct gaagcagaca

1981 gggaatggga gaggaggatg ggaagtagac agtggctggt atggctctga ggctccctgg 2041 ggcctgctca agctcctcct gctccttgct gttttctgat gatttggggg cttgggagtc 2101 cctttgtcct catctgagac tgaaatgtgg ggatccagga tggccttcct tcctcttacc 2161 cttcctccct cagcctgcaa cctctatcct ggaacctgtc ctccctttct ccccaactat 2221 gcatctgttg tctgctcctc tgcaaaggcc agccagcttg ggagcagcag agaaataaac 2281 agcatttctg atgccaaaaa aaaaaaaaaa aa (SEQ ID NO: 612)

SEQ ID NO: 613 AF327066, Homo sapiens Ewings sarcoma EWS-Fli1 (type 1) oncogene mRNA, complete cds

1 atggcgtcca cggattacag tacctatagc caagctgcag cgcagcaggg ctacagtgct 61 tacaccgccc agcccactca aggatatgca cagaccaccc aggcatatgg gcaacaaagc 121 tatggaacct atggacagcc cactgatgtc agctataccc aggctcagac cactgcaacc 181 tatgggcaga ccgcctatgc aacttcttat ggacagcctc ccactggtta tactactcca 241 actgcccccc aggcatacag ccagcctgtc caggggtatg gcactggtgc ttatgatacc 301 accactgcta cagtcaccac cacccaggcc tcctatgcag ctcagtctgc atatggcact 361 cagcctgctt atccagccta tgggcagcag ccagcagcca ctgcacctac aagaccgcag 421 gatggaaaca agcccactga gactagtcaa cctcaatcta gcacaggggg ttacaaccag 481 cccagcctag gatatggaca gagtaactac agttatcccc aggtacctgg gagctacccc 541 atgcagccag tcactgcacc tccatcctac cctcctacca gctattcctc tacacagccg 601 actagttatg atcagagcag ttactctcag cagaacacct atgggcaacc gagcagctat 661 ggacagcaga gtagctatgg tcaacaaagc agctatgggc agcagcctcc cactagttac 721 ccaccccaaa ctggatccta cagccaagct ccaagtcaat atagccaaca gagcagcagc 781 tacgggcagc agagtcctcc ccttggaggg gcacaaacga tcagtaagaa tacagagcaa 841 cggccccagc cagatccgta tcagatcctg ggcccgacca gcagtcgcct agccaaccct 901 ggaagcgggc agatccagct gtggcaattc ctcctggagc tgctctccga cagcgccaac

961 gccagctgta tcacctggga ggggaccaac ggggagttca aaatgacgga ccccgatgag

1021 gtggccaggc gctgggggca gcggaaaagc aagcccaaca tgaattacga caagctgagc

1081 cgggccctcc gttattacta tgataaaaac attatgacca aagtgcacgg caaaagatat

1141 gcttacaaat ttgacttcca cggcattgcc caggctctgc agccacatcc gaccgagtcg

1201 tccatgtaca agtacccttc tgacatctcc tacatgcctt cctaccatgc ccaccagcag

1261 aaggtgaact ttgtccctcc ccatccatcc tccatgcctg tcacttcctc cagcttcttt

1321 ggagccgcat cacaatactg gacctccccc acggggggaa tctaccccaa ccccaacgtc

1381 ccccgccatc ctaacaccca cgtgccttca cacttaggca gctactacta g (SEQ ID NO: 613)

SEQ ID NO: 614 U73531 Human G protein-coupled receptor STRL33.3 (STRL33) mRNA, complete cds

1 atttttatta agcagtctta gcccaaaggc agcatccttc cttgctagag agaaagggca 61 ctttggtccc tggaaagaca gaggcaagca gcagcatcgg agacactgct cccagtcagg 121 actcaaagtc agcgacagaa gtgtttctga gtggattagg aaaggtaacc tcatcgttta 181 tatgcacttg tctggtcagg caatattttg actttgctgg cagagattct gtccaaacac 241 ctgctcttct tcatacatct tctagaggtg ctggccagac atggctccag gtcactggaa 301 atgagctgct gcatgttgag tatctgcagt cctgtagcaa gggcagactt ggcactcatg 361 ggctgatgtt gccgcagctg cccctgctcc cacaccacag gttacatgat cccttgtcct 421 gtccatggtc tttggcaggg tcacagggca gagggaaggg tcagagagaa gtgacatctt 481 gaagggctgg tgcctgggta agaaaggttg cccatctggc atcccatttc aattgggttt 541 tctgcttgtt aaatgaggcc cctaagtcct aacctgccaa tcacaggagc taaggcaagg 601 ttccgctttg gggaaatcta ccttttaaga gacttcttgt tcagaagtct tcaggaaatg 661 aggctctgat ggtagaatgc cataaactgt gttaactgat gaaggggaaa gtttagttgg 721 gaagtgagga gaaccaccca atgctttaac catgaagcca gctcagccaa agtgctgggc 781 agtcgtgggc ttttctatgc tttgtttccc cattagtagc ctttgaaaat ctatgcaatt

841 gaggggaagt aaaggcagga aggactacct acccaggcag agcagtcttg ccatccccaa 901 acacctgtgg tctccaggag tctccttgat aggagagccc cctggtaggg gcacttgctt 961 tagctttcac aatttattag gaaatggggc tcaggatggg tgggcaactg tggtgaggca 1021 gggggagatg aaaacaggca tgttccattg atgagctcat attatcagtg ggctcaacca 1081 tccatcatca gtgttgctct tccaaacagc actgtgccca cctggcagca aagcgacttt 1141 tggtttcaaa ataattgagc acaggatttt atggaatgtg cttaggggtc agttatgagt 1201 tgtctcccag atgggtgaga tcctgagaat tttcaggcta atggagagtc ctcatcctgt 1261 ctgagcaatt tcccctcaga attggttatc ttcaatatac tggactgtgc tgtttctaca 1321 catcccagtg ggtgggttta gaagatgact atttgccccc taaatgtggt caatgggata 1381 gcaggaagac aaagaatgcc atcctcagcc ccaaatataa ttcctgggtt ctgactcaca 1441 ggtgttcatc agaacagaca ccatggcaga gcatgattac catgaagact atgggttcag 1501 cagtttcaat gacagcagcc aggaggagca tcaagccttc ctgcagttca gcaaggtctt 1561 tctgccctgc atgtacctgg tggtgtttgt ctgtggtctg gtggggaact ctctggtgct 1621 ggtcatatcc atcttctacc ataagttgca gagcctgacg gatgtgttcc tggtgaacct 1681 acccctggct gacctggtgt ttgtctgcac tctgcccttc tgggcctatg caggcatcca 1741 tgaatgggtg tttggccagg tcatgtgcaa aagcctactg ggcatctaca ctattaactt 1801 ctacacgtcc atgctcatcc tcacctgcat cactgtggat cgtttcattg tagtggttaa 1861 ggccaccaag gcctacaacc agcaagccaa gaggatgacc tggggcaagg tcaccagctt 1921 gctcatctgg gtgatatccc tgctggtttc cttgccccaa attatctatg gcaatgtctt 1981 taatctcgac aagctcatat gtggttacca tgacgaggca atttccactg tggttcttgc 2041 cacccagatg acactggggt tcttcttgcc actgctcacc atgattgtct gctattcagt 2101 cataatcaaa acactgcttc atgctggagg cttccagaag cacagatctc taaagatcat 2161 cttcctggtg atggctgtgt tcctgctgac ccagatgccc ttcaacctca tgaagttcat 2221 ccgcagcaca cactgggaat actatgccat gaccagcttt cactacacca tcatggtgac

2281 agaggccatc gcatacctga gggcctgcct taaccctgtg ctctatgcct ttgtcagcct 2341 gaagtttcga aagaacttct ggaaacttgt gaaggacatt ggttgcctcc cttaccttgg 2401 ggtctcacat caatggaaat cttctgagga caattccaag actttttctg cctcccacaa 2461 tgtggaggcc accagcatgt tccagttata ggccttgcca gggtttcgaa aaactgctct 2521 ggaatttgca aggcatggct gtgccctctt gatgtggtga ggcaggcttt gtttatagct 2581 tgcgcattct catggagaag ttatcagaca ctctggctgg tttggaatgc ttcttctcag 2641 gcatgaacat gtactgttct cttcttgaac actcatgctg aaagcccaag tagggggtct 2701 aaaattttta aggactttcc ttcctccatc tccaagaatg ctgaaaccaa gggggatgac 2761 atgtgactcc tatgatctca ggttctcctt gattgggact gggg (SEQ ID NO: 614)

SEQ ID NO: 615 BC016797, Homo sapiens chromosome 7 open reading frame 19, mRNA (cDNA clone IMAGE:4070303), partial cds

1 ggggggcttc ttcatgctct gatcacatct ctcgtaaaag cttaagctct ctccggggtc 61 cgggttggcc gtgccgtgga attctgggtg gcctggctgg ggtctctgga aatgtggctg 121 cagcagagaa cagagaccct gacatgcagt tttccgtgct gaggggccct aggggagtca 181 caccaagggt ccccacgaga aagttgtggc atccccgggg gccggagaag agccccgtgt 241 cttctgagga gttcgtcctt tgtgtcccct gcagacattt gtctgcgacc tttgccctcc 301 agcatgtatg tactttcctg cagcctgtag aaacgcctct tacggtttaa tatgtgttcg 361 ctttgctaaa gaatatcaac atcggccagg cgaggtgggg cacgcctgtc atcccagcac 421 tttgggaggc tgaggtggga ggatcacttg ggcccagggg tgcaagacca gcctgggcaa 481 catagcgaga ccccatgtct aaaaaaatta ttttaaatta gccaggccgg gtgcaatggc 541 tcgcgcccgt aatcttagca ctctgggagg ccgaggcagg cagatcactt gagatcagga 601 ctttaagacc agcctcggca acaacatggt gaaaccatct ctagcaaaaa tacaaaaaat 661 tagccgggta tggtggcggg tacctgtaat cccagctact caggaggctg aggcaagaga 721 atcgcttgaa cgcaggaggc agaggttgca gtgagctgag atcgtgccac tgcactccag

781 cctggacaac agagcaaaac tctgtctcaa aaaataataa ataaaaataa attagctggg 841 cgtggtggtg catgcctgta gttccagcta cttgggaggc tgaggtggga ggattgcttg 901 agcctgggaa gtagaggctg cagtgaacta taactgtgct agtggccggg cgcagtggct 961 cacgcctata atcccagcac tttgggaggc caaagcaggt ggatcacttg aggtcaggag 1021 ttcgagacca gcctggccaa catggtgaaa ctctgtgtct actaaaaata caaaaaaaaa 1081 aaaaaaaaaa aaaaaaaaaa a (SEQ ID NO: 615)

SEQ ID NO: 616 BE781857, Homo sapiens cDNA clone IMAGE:3873282 5', mRNA sequence

1 tgtagccagc tcggctccct tccctgtgta tctgtgtcct gctaacagcc aagagatgtt 61 gcaagggagg aaaatgtgag agaccttgga acctgtcagg tttattgttt cgtttttaaa 121 ggcatgtttg aagtttagtt ctttaccctt ctcctaaaat ctttttttaa tcagcctcaa 181 ggttaaaata aggagtgact acagtatgta aaataaggaa aggaagcatt aatggtgtga 241 tgtgacctgc ctgttttttt gtaaacaaga gaataggaaa tgttttcaag gtagtttcac 301 atgtcttgca ccaagctcat gcctcttgct tttccttttt gactttatct ccctcagttt 361 ttcttctgct gtggccagaa agacagtcac tacagttgac tattgataca aaggtgcaac 421 agaaatatta tccctgcatt tttaaatata agaagtagac attaatcttt aaccatggtg 481 cctccctaat gtaagtgata tttcattggt ggtttcaaca aaggttaagc tcattacaga 541 cagaaatatt cgtctttatc ttccttttcc cctgcctcag tcgtgttatt cacccctatt 601 cttgatattt caaaggagga gaatcagtag cattttcctt atattataca catgtgtcta 661 tcccatttca ggtcaagtct tacacccaac tcatggcttc cagtaggaaa ataagacatt 721 ctgccttagt gttaaatgca agatagggct tctcttccgg atgaggactg gttgttctac 781 tctagtctgg gactaacatc cgactgggct acttaattaa ggacgacaga agtgctccaa 841 tttaaaacgt gtccaggata agagatcaca aaaggttggt cagaataggc ttttcacata 901 gacatcgagg tcccaacggg gggaattaaa cataggtatc tgatgttatc ataga (SEQ ID NO: 616)

SEQ ID NO: 617 NM_024660 Homo sapiens transmembrane protein 149 (TMEM149), mRNA

1 acacaacttc agctgaggaa cttggcacgg ccagcttggg acccaggacc ctaacgcaga 61 ggcgctgtgt ttggaagtcc cgctatcacg gccccccaga tggggcctgg acgatgcctc 121 ctgacggcct tgttgcttct ggccctggcg ccaccgccgg aagcctccca gtactgcggc 181 cgccttgaat actggaaccc agacaacaag tgctgcagca gctgcctgca acgcttcggg 241 ccgcccccct gcccggacta tgagttccgg gaaaactgcg gactcaatga ccacggcgat 301 ttcgtaacgc ccccgttccg aaagtgttct tctgggcagt gcaaccccga cggcgcggag 361 ctatgtagcc cctgcggcgg cggagccgtg acccctactc ccgccgcggg cgggggcaga 421 accccgtggc gctgcagaga gaggccggtc cctgccaagg ggcactgccc cctcacacct 481 ggaaacccag gcgcccctag ctcccaggag cgcagctcac cagcaagttc cattgcctgg 541 aggacccctg agcctgtccc tcagcaggcc tggccgaatt tccttccgct cgtggtgctg 601 gtcctgctcc tgaccttggc ggtgatagcg atcctcctgt ttattctgct ctggcatctc 661 tgctggccca aggagaaagc cgacccctat ccctatcctg gcttggtctg cggagtcccc 721 aacacccaca ccccttcctc ctcgcatctg tcctccccag gcgccctgga gacaggggac 781 acatggaagg aggcctcact acttccactc ctgagcaggg aactgtccag tctggcgtca 841 caacccctgt ctcgcctcct ggatgagctg gaggtgctgg aagagctgat tgtactgctg 901 gaccctgagc ctgggccagg tgggggtatg gcccatggca ctactcgaca cctggccgca 961 agatatgggc tgcctgctgc ctggtccacc tttgcctatt cgctgaggcc gagtcgctcg 1021 ccgctgcggg ctctgattga gatggtggtg gcaagggagc cctctgcctc cctgggccag 1081 cttggcacac acctcgccca gctagggcgg gcagatgcat tgcgggtgct gtccaagctt 1141 ggctcatctg gggtttgctg ggcttaacac ccaataaaga actttgctga ctactaaaaa 1201 aaaaaaaaaa aaaaaaaa (SEQ ID NO: 617)

SEQ ID NO: 618 NM_019099 Homo sapiens chromosome 1 open reading frame 183 (C1orf183), transcript variant 1, mRNA

1 gaagcgactc tgagtcccgg gctcggagcg caggctcagc tccgcgctgc gagcgctacg 61 ggcgcagggg cggggagccg gcccggagcg cagtttccag tggggccggg gtttcacccg 121 ggccctctct gtttgaaccg aacccgacaa atgggcgcat gacgatggag agcagggaaa 181 tggactgcta tctccgtcgc ctcaaacagg agctgatgtc catgaaggag gtgggtgatg 241 gcttacagga tcagatgaac tgcatgatgg gtgcactgca agaactgaag ctcctccagg 301 tgcagacagc actggaacag ctggagatct ctggaggggg tcctgtgcca ggcagccctg 361 aaggtcccag gacccagtgc gagcaccctt gttgggaggg tggcagaggt cctgccaggc 421 ccacagtctg ttccccctcc agtcaacctt ctcttggcag cagcaccaag tttccatccc 481 ataggagtgt ctgtggaagg gatttagccc ccttgcccag gacacagcca catcaaagct 541 gtgctcagca ggggccagag cgagtggaac cggatgactg gacctccacg ttgatgtccc 601 ggggccggaa tcgacagcct ctggtgttag gggacaacgt ttttgcagac ctggtgggca 661 attggctaga cttgccagaa ctggagaagg gtggggagaa gggtgagact gggggggcac 721 gtgaacccaa aggagagaaa ggccagcccc aggagctggg ccgcaggttc gccctgacag 781 caaacatctt taagaagttc ttgcgtagtg tgcggcctga ccgtgaccgg ctgctgaagg 841 agaagccagg ctgggtgaca cccatggtcc ctgagtcccg aaccggccgc tcacagaagg 901 tcaagaagcg gagcctttcc aagggctctg gacatttccc cttcccaggc accggggagc 961 acaggcgagg ggagaatccc cccacaagct gccccaaggc cctggagcac tcaccctcag 1021 gatttgatat taacacagct gtttgggtct gaatcctaga gacagaaagt tgactgagcc 1081 tgaaagggcc aggtcccagt gctgggcccc tggggaggag ggagggtggg cggtatggct 1141 ctcgaaagcc caactccaag ttcctttccc ccagaaagcg gggagaagcc agagttcttg 1201 gctcaggact gaagggaagg tggttgggag aggctgtctt gggggctagc tggtggagga 1261 ggtaagagta gctggagagt gagctgtgcg tgtgtgtgtg tgtgtgtgca tgtgtgtgtc

1321 tgtctggcat gcatgcactc actttggggc tggaggtgac agtaggtgag ggcagaggag

1381 gagatcagaa aatccctctg acatctccac tgcccccaaa gacctccgtt gaacattctg

1441 tatggaaaag agccctggag catcaggttc cccagatagg cccccaaata aagacctgtc

1501 tatggctctc ccaaccttct gtcagcttct ttggcaagac attgctccag gcacagggac

1561 tgaaccccag gcctcctggg actggagcag cagtgaggca aaacccgacc tgctagccct

1621 ttctgccttg gaggtttcag tccatacctg gactctgaga aaatgagctg aataaggagt

1681 acagtgtgta aggagcagcc agggaagccc tagacactcc ccgcgtctcc cccatgcaca

1741 ggggaaggat gttgacatag cactgggctg tttgaatgcc ttttcatctc catggtctca

1801 tttgaaagtg agcgaggcag gcaggcatga tcccattttc cagataagga aacaagccta

1861 gatatgctac atgtccagga acaactgcag ccaggaggca gaacagccta ggtctaactg

1921 cagagtagaa gctggaccct ggagttacca acactcctcc ccaacagttc ttagcgcccc

1981 gcaggctggg cgctgtggct cacgcctgta atcccagcac tttgggaggg caaggcaggc

2041 ggattacctg gggtcaggag ttcatgacca gcctggccaa catggtgaaa ccccgtctct

2101 actaaaaaaa tacgtaaaaa ttagccaggc gtggtggcac acgcctgtaa acccagctac

2161 tcgggaggct gaggcaggag aattgcttga gcccgggaga gggaggttgc agtgagccga

2221 gatcatgcca ctgcactcca gcctggctga cagagcaaga ctcccctgtc tc (SEQ ID NO: 618)

SEQ ID NO: 619 AL133001 Novel human gene on chromosome 20, similar to GLUCOSAMINE-6-SULFATASES

1 tacaaggcca gctatgtccg cagtcgctcc atccgctcag tggccatcga ggtggacggc 61 agggtgtacc acgtaggcct gggtgatgcc gcccagcccc gaaacctcac caagcggcac 121 tggccagggg cccctgagga ccaagatgac aaggatggtg gggacttcag tggcactgga 181 ggccttcccg actactcagc cgccaacccc attaaagtga cacatcggtg ctacatccta 241 gagaacgaca cagtccagtg tgacctggac ctgtacaagt ccctgcaggc ctggaaagac 301 cacaagctgc acatcgacca cgagattgaa accctgcaga acaaaattaa gaacctgagg 361 gaagtccgag gtcacctgaa gaaaaagcgg ccagaagaat gtgactgtca caaaatcagc

421 taccacaccc agcacaaagg ccgcctcaag cacagaggct ccagtctgca tcctttcagg 481 aagggcctgc aagagaagga caaggtgtgg ctgttgcggg agcagaagcg caagaagaaa 541 ctccgcaagc tgctcaagcg cctgcagaac aacgacacgt gcagcatgcc aggcctcacg 601 tgcttcaccc acgacaacca gcactggcag acggcgcctt tctggacact ggggcctttc 661 tgtgcctgca ccagcgccaa caataacacg tactggtgca tgaggaccat caatgagact 721 cacaatttcc tcttctgtga atttgcaact ggcttcctag agtactttga tctcaacaca 781 gacccctacc agctgatgaa tgcagtgaac acactggaca gggatgtcct caaccagcta 841 cacgtacagc tcatggagct gaggagctgc aagggttaca agcagtgtaa cccccggact 901 cgaaacatgg acctgggact taaagatgga ggaagctatg agcaatacag gcagtttcag 961 cgtcgaaagt ggccagaaat gaagagacct tcttccaaat cactgggaca actgtgggaa 1021 ggctgggaag gttaagaaac aacagaggtg gacctccaaa aacatagagg catcacctga 1081 ctgcacaggc aatgaaaaac catgtgggtg atttccagca gacctgtggt attggccagg 1141 aggcctgaga aagcaagcac gcactctcag tcaacatgac agattctgga ggataaccag 1201 caggagcaga gataacttca ggaagtccat ttttgcccct gcttttgctt tggattatac 1261 ctcaccagct gcacaaaatg cattttttcg tatcaaaaag tcaccactaa ccctccccca 1321 gaagctcaca aaggaaaacg gagagagcga gcgagagaga tttccttgga aatttctccc 1381 aagggcgaaa gtcattggaa tttttaaatc ataggggaaa agcagtcctg ttctaaatcc 1441 tcttattctt ttggtttgtc acaaagaagg aactaagaag caggacagag gcaacgtgga 1501 gaggctgaaa acagtgcaga gacgtttgac aatgagtcag tagcacaaaa gagatgacat 1561 ttacctagca ctataaaccc tggttgcctc tgaagaaact gccttcattg tatatatgtg 1621 actatttaca tgtaatcaac atgggaactt ttaggggaac ctaataagaa atcccaattt 1681 tcaggagtgg tggtgtcaat aaacgctctg tggccagtgt aaaagaaaa (SEQ ID NO: 619)

SEQ ID NO: 620 NM_024587 Homo sapiens transmembrane protein 53 (TMEM53), mRNA

1 ggctggagac ccgtgctctg ggccggcgcc ttcaccatgg cctcggcaga gctggactac

61 accatcgaga tcccggatca gccctgctgg agccagaaga acagccccag cccaggtggg 121 aaggaggcag aaactcggca gcctgtggtg attctcttgg gctggggtgg ctgcaaggac 181 aagaaccttg ccaagtacag tgccatctac cacaaaaggg gctgcatcgt aatccgatac 241 acagccccgt ggcacatggt cttcttctcc gagtcactgg gtatcccttc acttcgtgtt 301 ttggcccaga agctgctcga gctgctcttt gattatgaga ttgagaagga gcccctgctc 361 ttccatgtct tcagcaacgg tggcgtcatg ctgtaccgct acgtgctgga gctcctgcag 421 acccgtcgct tctgccgcct gcgtgtggtg ggcaccatct ttgacagcgc tcctggtgac 481 agcaacctgg taggggctct gcgggccctg gcagccatcc tggagcgccg ggccgccatg 541 ctgcgcctgt tgctgctggt ggcctttgcc ctggtggtcg tcctgttcca cgtcctgctt 601 gctcccatca cagccctctt ccacacccac ttctatgaca ggctacagga cgcgggctct 661 cgctggcccg agctctacct ctactcgagg gctgacgaag tagtcctggc cagagacata 721 gaacgcatgg tggaggcacg cctggcacgc cgggtcctgg cgcgttctgt ggatttcgtg 781 tcatctgcac acgtcagcca cctccgtgac taccctactt actacacaag cctctgtgtc 841 gacttcatgc gcaactgcgt ccgctgctga ggccattgct ccatctcacc tctgctccag 901 aaataaatgc ctgacacctc cccacaacct gcaatctgtc gggcactctt ctcgttcaac 961 tccctgtagc cctttgggac tttgcggtcc cctaagtaga aaattcctat gggcctgtct 1021 cctgggggcc tctgtctgct ggtggtctgc ttaccacaga atcctaaggg gcaggagtgc 1081 ctgggcatgt gtctgtggga gccttgcagt cagttgtgtt tggacaagtg caacagtcag 1141 gctgctgatt cctgtggcat gcaggctgta gaggttgaca aatggagggg ggtgttgagg 1201 gtgagcccta gttgattttt taaaatttaa actctggtaa gaacatttaa tatgagacct 1261 actctctttt tttctttact tatttattta tctatttatt tcaagacagg gtctcactct 1321 gtcacctagg ctggggtgca atggtgcaat catggctcac tgcagcctca acctcccagg 1381 ctcaagtgat cctcccacct cagcctccca aagtgctagg attacaggca tgagtcaccg 1441 cgcctggcca agatcaccta acaaaattgt aagtgtgtac gatacttaaa atttaagaga 1501 ttatgtgcac ggcagacctc tagaactgaa tagtcttgca tcttgcataa ttcagaactt

1561 catcatcttg cataactgaa actttgtgcc tgttaccaga aaaaaaaaaa aaaa (SEQ ID NO: 620)

SEQ ID NO: 621 AI954412 Homo sapiens cDNA clone IMAGE:2490992 3', mRNA sequence

1 tttttttttt tttttttttt tttttttttt ttacacactc attcaaacct ttattaagta 61 cctaccatat gtacaatact gttccaaata ttaagggaat acaaagatga atttttaaat 121 ggggccaaat cccaaggggt ttacaatata ataatagtaa aaagtaattt aacacgaact 181 gtgggaagaa aattacaagt aaacatttgc ccctgatgga gaaaaatgac cttattttta 241 aatttaaagc ataaattgcc agt (SEQ ID NO: 621)

SEQ ID NO: 622 AI393309 Homo sapiens cDNA clone IMAGE:2108789 3' similar to WP:ZK909.3 CE15477 GUANOSINE-3',5'-BIS(DIPHOSPHATE)-PYROPHOSPHOHYDROLASE LIKE ;, mRNA sequence

1 aaaccttaac ccagagttat ttttattttc cagaacgtgt taggaactag tacttaaata 61 atctcaagtc cctgaggggc cagagatccc accatgcaaa atagcaaaca gacccaagac 121 ttggggagag gcggtgagtg catcagaaat ggatgggtac atctgattcc caccacgcgg 181 ggctcagctt agttagcagg agaccttcag actgagaaaa aatgcaagtc tttttttggc 241 ctctaatatc tgggaaggat ggagggagct caggagacac agaaaagatg gcgtatgaat 301 cctgtccggc ctgaacgagg ctggagttgt gcctctggat agcttcaagc actgatcaga 361 ttgtcagccc ccgctgcttg aacagatgct ttagagcctc ttccagttgc cggtttgttc 421 cctgaagccc cttcaccacc tgcgctgccc actcgaagta ttcctggact cgatgttctg 481 accatccctc tggggtgcag cgattcaggt ccctcagatt gtacagcttg tctgccagct 541 tcaccagttt ggccccgggg ctactgtggn gcgcttggct cacctgcagc ctctntctct 601 ccagcttggg cagagtcttg tcatctggta cctnctncac caggcgccgc acttgtgccc 661 caaagtgtag cttcaccctc atccaggtgg tgtctgtgtc ctccaccgtg tcatggagca 721 gggc (SEQ ID NO: 622)

SEQ ID NO: 623 NM_030581 Homo sapiens WD repeat domain 59 (WDR59), mRNA

1 cggggctgat tctctggctg tgtggggcgc acggtcccgg gatactgggg acggcggggt 61 gggagggcgc cgtcctgggg ccgcggcggc cgggcggggg agatggcggc gcgatggagc 121 agcgaaaacg tggttgtaga gttccgtgac tcccaggcaa ctgcgatgtc tgtggactgt 181 cttgggcagc atgcagtgct ttctggccgc agattcttat acatcgtcaa tctagatgcc 241 cctttcgaag gtcaccgaaa gatctctcgc cagagcaaat gggacattgg agctgtgcag 301 tggaatcctc atgacagctt tgcacactat tttgcggctt cgagtaacca acgagtagac 361 ctttacaagt ggaaagacgg cagtggggaa gttggcacaa ccttacaagg ccacactcgt 421 gtcatcagcg acttggactg ggcggtgttt gagcctgacc tcctggttac cagctctgtg 481 gacacctaca tctacatttg ggatatcaaa gacacaagga aacctactgt tgcactgtct 541 gctgttgcgg gtgcctccca ggtcaaatgg aataaaaaaa atgctaactg ccttgccacc 601 agccatgacg gcgatgtgcg gatatgggat aagaggaaac ccagtacagc agtggaatat 661 ctagccgccc acctctccaa aatccatggc ctggactggc acccagacag cgagcacatt 721 cttgctacct ccagtcaaga caattctgtg aagttctggg attaccgcca gcctcggaaa 781 tacctcaata ttcttccttg ccaggtgcct gtctggaagg ccagatacac acctttcagc 841 aatggattgg tgactgtgat ggttccccag ctgcggaggg aaaacagcct tctcctgtgg 901 aatgtctttg acttgaacac cccagtccac accttcgtgg ggcatgatga tgtggtcctg 961 gagttccagt ggaggaagca gaaggaaggg tccaaggact atcaactggt gacgtggtcc 1021 cgggatcaga ccttgagaat gtggcgggtg gattcccaga tgcagaggct ttgtgcaaat 1081 gacatattag atggtgttga tgagttcatt gagagtattt cccttctgcc ggaacctgag 1141 aagaccctgc acactgaaga tacagatcac cagcacactg caagccatgg ggaggaagaa 1201 gccctaaaag aagatccccc tagaaatctc ctggaagaga ggaaatcaga tcaactgggg 1261 ctgcctcaga ccttgcagca ggaattctcc ctgatcaatg tgcaaatccg gaatgtcaat 1321 gtggagatgg atgcggcaga caggagctgc acagtgtctg tgcactgcag caaccatcgt 1381 gtcaagatgc tggtgaagtt ccctgcacag tacccaaaca acgccgcccc ttccttccag

1441 tttattaacc ccacaaccat cacatccacc atgaaagcta agctgctgaa gatcctgaag 1501 gacacagccc tgcagaaagt gaagcgtggc cagagctgcc tggagccctg cctgcgccag 1561 ctcgtctcct gccttgagtc ctttgtgaac caggaagaca gcgcttccag caacccgttt 1621 gcactcccca actctgtcac tcccccctta ccgacgtttg cgcgggtgac cacggcttac 1681 gggtcgtacc aggacgccaa cattcccttt cctaggactt ctggggccag gttctgcgga 1741 gcaggttacc tggtatattt cacaaggccc atgacaatgc atcgggcggt gtctcccaca 1801 gagcctactc cgagatctct ctcagccttg tctgcttatc acactggctt gatcgcgccc 1861 atgaagatcc gcacagaggc ccctgggaac cttcgtttat acagtgggag ccccactcgc 1921 agcgagaaag agcaggtctc catcagctcc ttctactaca aggagcggaa atcaagacga 1981 tggaaaagta agcgtgaggg atcagactct ggcaatcgac agatcaaggc tgctgggaaa 2041 gtcatcatcc aggatattgc ttgcctcctg cctgttcaca aatcgctggg agagctgtac 2101 atattgaatg tgaatgatat tcaggaaaca tgtcagaaga atgccgcctc tgccttgctc 2161 gttggaagaa aggatcttgt ccaggtttgg tcgctggcta cggtagctac agatctttgc 2221 cttggtccga aatctgaccc agatttggaa acaccctggg ctcgacatcc atttgggcgg 2281 cagctgctgg agtccctgtt ggctcactat tgccggctcc gggatgttca gacactggcg 2341 atgctctgta gcgtgtttga agcccagtct cggcctcagg ggctaccaaa cccctttggg 2401 ccttttccta accgttcttc taatcttgtg gtgtcccata gtcgatatcc tagctttacc 2461 tcttctggtt cctgctccag tatgtcagac ccagggctca acactggcgg ctggaacata 2521 gcgggaagag aggcagagca cttgtcctcc ccttggggag aatcctcacc agaagagctc 2581 cgctttggga gtctgaccta cagtgatccc cgtgagcgag aacgcgacca gcatgataaa 2641 aataaaaggc tcctggaccc cgccaatacc cagcaatttg atgactttaa gaaatgctat 2701 ggggaaatcc tctaccgttg gggtctgaga gagaagcgag ctgaagtgtt gaagtttgtc 2761 tcctgtcctc ctgaccctca caaagggatc gagttcggcg tgtactgcag ccactgccgg 2821 agtgaggtcc gtggcacgca gtgtgccatc tgcaaaggct tcacgttcca gtgtgccatc 2881 tgtcacgtgg ctgtgcgggg atcgtccaat ttctgcctga cctgtgggca cggtggccac

2941 accagccaca tgatggagtg gtttcggacc caggaggtgt gtcccaccgg gtgtgggtgc 3001 cactgcctgc ttgaaagcac tttctgaacc tacagaagtt gggtattgtc tgaaatccca 3061 gaggacccat aagtgccggt gacaagctgt ctgtcagggg agaggctcca gaacctgggt 3121 tcgtccccag tgagaccgga ggatgatccc ccaaggactg cgcagcatca gctcttggtg 3181 ggcctctgcc ttctcttctg tttggccacc tggtgtggat gtcactgtgt gaagataagg 3241 acagaagtgc agagctgcgc tttgtgtgtt gtctatgtcg gctgagctac caaggtggaa 3301 gttttcatgg agaaaagcac ctggctccag ggccagtgtt acagtgttac cctgtaaggt 3361 gttagcctta aaccaccgag cagcgttctc ttgatgccag tgcagagacc agagtcagat 3421 gcccgaggac agtgggtagg aatttcatca acaaatggac ctatggcatc atggctttag 3481 aagctggtac atttactgag ctgatggaca gtggccttct aaaatatgac acttaaattg 3541 taaatatgca ctgtacttaa ggattcttaa gatgtatttt tttgttattt ctcctccagc 3601 tgctatccct tggctaataa aattctagta atttgaaaaa aaaaaaaaag agagaaagtt 3661 aaaaaaaaaa aaaaaaaa (SEQ ID NO: 623)

SEQ ID NO: 624 NM_017585 Homo sapiens solute carrier family 2 (facilitated glucose transporter), member 6 (SLC2A6), mRNA

1 ctgagcgccc tccgctcgcc ccgagagaga cccggccatg caggagccgc tgctgggagc 61 cgagggcccg gactacgaca ccttccccga gaagccgccc ccgtcgccag gggacagggc 121 gcgggtcggg accctgcaga acaaaagggt gttcctggcc accttcgccg cagtgctcgg 181 caatttcagc tttgggtatg ccctggtcta cacatcccct gtcatcccag ccctggagcg 241 ctccttggat cctgacctgc atctgaccaa atcccaggca tcctggtttg ggtccgtgtt 301 caccctggga gcagcggccg gaggcctgag tgccatgatc ctcaacgacc tcctgggccg 361 gaagctgagc atcatgttct cagctgtgcc gtcggcggcc ggctatgcgc tcatggcggg 421 tgcgcacggc ctctggatgc tgctgctcgg aaggacgctg acgggcttcg ccggggggct 481 cacagctgcc tgcatcccgg tgtacgtgtc tgagattgct cccccaggcg ttcgtggggc 541 tctgggggcc acaccccagc tcatggcagt gttcggatcc ctgtccctct acgcccttgg

601 cctcctgctg ccgtggcgct ggctggctgt ggccggggag gcgcctgtgc tcatcatgat 661 cctgctgctc agcttcatgc ccaactcgcc gcgcttcctg ctctctcggg gcagggacga 721 agaggccctg cgggcgctgg cctggctgcg tgggacggac gtcgatgtcc actgggagtt 781 cgagcagatc caggacaacg tccggagaca gagcagccga gtatcgtggg ctgaggcacg 841 ggccccacac gtgtgccggc ccatcaccgt ggccttgctg atgcgcctcc tgcagcagct 901 gacgggcatc acgcccatcc tggtctacct gcagtccatc ttcgacagca ccgctgtcct 961 gctgcccccc aaggacgacg cagccatcgt tggggccgtg cggctcctgt ccgtgctgat 1021 cgccgccctc accatggacc tcgcaggccg caaggtgctg ctcttcgtct cagcggccat 1081 catgtttgct gccaacctga ctctggggct gtacatccac tttggcccca ggcctctgag 1141 ccccaacagc actgcgggcc tggaaagcga gtcctggggg gacttggcgc agcccctggc 1201 agcacccgct ggctacctca ccctggtgcc cctgctggcc accatgctct tcatcatggg 1261 ctacgccgtg ggctggggtc ccatcacctg gctgctcatg tctgaggtcc tgcccctgcg 1321 tgcccgtggc gtggcctcag ggctctgcgt gctggccagc tggctcaccg ccttcgtcct 1381 caccaagtcc ttcctgccag tggtgagcac cttcggcctc caggtgcctt tcttcttctt 1441 cgcggccatc tgcttggtga gcctggtgtt cacaggctgc tgtgtgcccg agaccaaggg 1501 acggtccctg gagcagatcg agtccttctt ccgcatgggg agaaggtcct tcttgcgcta 1561 ggtcaaggtc cccgcctgga gggggccaaa cccccagtgg ctgggcctct gtgttggcta 1621 caaacctgca ccctgggacc aagaggcagc agtcatccct gccaccagcc agagcacagg 1681 aagagcagtg tgatggggcc tcagcagcgg gtgcccctgg ctcgggacag gtagcactgc 1741 tgtccagcca cagccccagc ccaggcagcc cacagtgctg cacgtagcca tgggccgcag 1801 gagtgcatac aaccctgcat ccagggacac ggccctgctg ggtgacctca ggcctagtcc 1861 ctttcccttg cgtgaaggac acgccccaca gaaggctacg gggaggactg agaggacagg 1921 gctggaggca gccaagtaac gtagtcatat catcgcgctc tgatctggtg gcatctggct 1981 gtgcaaggaa gacccggctt tgccctcaca agtcttatgg gcaccacagg gaacatcctg 2041 gacttaaaaa gccagggcag gccgggcaca gtggctcacg cctgtaatcc cagcactttg

2101 ggaggccaaa gcaggtggat tacccaaggc caggagttca agaccagcct ggccaacatg 2161 gtgaaacccc gtctctacta aaaaatacaa aaaagctggg tgtggtggca cacacccgta 2221 gttccagcta cttgggaggc tgaggcagca ttgcttgaac ccgggaggtg gaggctgcaa 2281 tgagctgaga tcatgccatt gcactccagc ctgggcaacg agagtgaaac tccgtcccca 2341 ccccctgcca aaaaaaaaaa aaaaaaagcc agggcaaagg acctggcgtg gccacttcct 2401 cctgccccag cccaacctct gggaacaggc agctcctatc tgcaaactgt gttcaccctt 2461 ttgtaaaaat aaaggaactg gacccgt (SEQ ID NO: 624)