Title:
EQUAL-ABUNDANCE TRANSCRIPT COMPOSITION AND METHOD
Document Type and Number:
WIPO Patent Application WO/1988/007585
Kind Code:
A1
Abstract:
A transcript composition containing approximately equal molar abundance of all of the transcript species produced by a selected genomic structure, such as total genomic DNA or isolated chromosome or chromosome fragments. Also disclosed are methods of using the composition, or compositions derived from control and test cell types, to (a) identify mRNA transcripts which are unique to one of the two cell types, (b) identify mRNA transcripts which are produced in significantly different quantities in control and test cells, and (c) identify the protein products of the equal-abundance transcripts.

Inventors:
WEISSMAN SHERMAN M (US)
REYES GREGORY R (US)
Application Number:
PCT/US1988/001050
Publication Date:
October 06, 1988
Filing Date:
April 01, 1988
Assignee:
GENELABS INC (US)
International Classes:
C12N15/10; C12Q1/68; C12Q1/6809; (IPC1-7): C12Q1/68; C07H15/12; C12N15/00
Foreign References:
US4394443A 1983-07-19
US4675285A 1987-06-23
Other References:
MIAMI WINTER SYMPOSIUM, Vol. 16, issued 1979 (Philadelphia Pa. USA), (PERRY et al.), "The Processing of Messenger RNA and the Determination of its Abundance", pages 187-206.
PROC. NATL. ACAD. SCIENCE, Vol. 79, issued July 1982 (Washington, D.C.), (KRAUS et al.), "Purification of low-abundance messenger RNAs from rat liver by polysome immunoadsorption", pages 4015-4019.
GENE, Vol. 35, issued 1985, (Amsterdam, Holland), (KOWALSKI et al.), "Vectors for the direct selection of cDNA clones corresponding to mammalian cell mRNA of low abundance", pages 45-54.
BENJAMIN LEWIN, GENES, second edition, Published 1985 by John Wiley & Sons (New York USA), pages 304-311.
CHEMICAL ABSTRACTS, Vol. 102, No. 21, issued 27 May 1985, (Columbus, Ohio, USA), (HORWICH et al), "Strategies for the molecular cloning of low abundance messenger RNAs", see page 154, Col. 2, the abstract No. 180176t, Mol. Basis Lysosomal Storage Disord. 1984, 365-85 (Eng).
CHEMICAL ABSTRACTS, Vol. 105, No. 17, issued 27 October 1986, (Columbus, Ohio, USA), (RHYNER et al.), "An efficient approach for the selective isolation of specific transcripts from complex brain mRNA populations", page 358 Col. 2, the abstract No. 149268n, J. Neurosci. Res 1986 16(1) 167-81 (Eng).
CHEMICAL ABSTRACTS, Vol. 105, No. 25, issued 22 December 1986, (Columbus, Ohio, USA), (SHELLY et al.), "The use of complementary RNA and S1 nuclease for the detection and quantitation of low abundance mRNA transcripts", page 396 Col. 2, the abstract No. 222148e, BioTechniques 1986, 4(5), 434-8 (Eng).
CHEMICAL ABSTRACTS, Vol. 107, No. 17, issued 26 October 1987, (Columbus, Ohio, USA), (KIM et al.), "Characterization of cytoplasmic RNA containing mRNA encoding the 70,000-dalton heat stress protein", page 167, the abstract No. 148339r, Korean J. Genet. 1986, 8(4) 221-9 (Eng).
Claims:
IT IS CLAIMED:
1. A transcript composition derived from a cellular genomic structure containing a plurality of genes G which are each active in a defined cell type in producing messenger RNA transcripts, at various levels of messenger RNA abundance, said composition comprising a transcript species T for each gene G, and in substantially equal molar abundance.
2. The composition of claim 1, wherein the transcript species are derived from the entire genome of a given cell type.
3. The composition of claim 1, wherein the transcript species are derived from a selected chromosome or chromosome fragment of a given cell type in a defined state.
4. The composition of claim 1, wherein the transcript species are selected from the group consisting of messenger RNA transcripts, and single- and double-strand cDNA.
5. The composition of claim 4, wherein the transcript species are cDNAs which are cloned as inserts in a suitable cloning vector.
6. The composition of claim 1, wherein the transcript species are derived from genes G whose messenger RNA transcripts are all within a defined size range.
7. The composition of claim 1, wherein the transcript species are all substantially equal-size nucleic acid fragments derived from the 3'-end fragments of said messenger RNA transcripts.
8. The composition of claim 1, wherein the transcript species are substantially equal-size nucleic acid fragments derived from the 5'-end fragments of the messenger RNA transcripts.
9. The composition of claim 1, wherein the transcript species are full-length cDNAs derived from full-length messenger RNAs.
10. A method of preparing a composition of transcript species which are (a) each derived from one of a plurality of genes G in a cellular genomic structure, all of which genes are active in a defined cell type in producing cellular messenger RNA transcripts, at various levels of messenger RNA abundance, and (b) present in substantially equal molar abundance, said method comprising providing a collection of fragments of the cellular genomic structure, providing cellular transcript species derived from the different-abundance messenger RNA transcripts, mixing the genomic fragments with a molar excess of the cellular transcript species under conditions which promote hybridization between the fragments and homologous transcript species, isolating the fragment/transcript species hybridization products formed by said mixing, and recovering the transcript species from the hybridization products.
11. The method of claim 10, wherein the collection of genomic fragments provided has been substantially depleted of repeat-sequence genomic fragments, the genomic fragments are labeled with an affinity label which permits binding to a solid support, and said isolating includes binding the fragment/transcript species hybrids to the solid support.
12. The method of claim 10, for use in preparing a composition in which all of the transcript species are within a selected size range, wherein providing the cellular transcript species includes obtaining total cellular transcripts, and fractionating the total transcripts into the selected size range, prior to said mixing.
13. The method of claim 10, for use in preparing a composition in which all of the transcript species have substantially the same size, wherein providing the cellular transcript species includes obtaining total cellular transcripts, fragmenting the transcripts into a predetermined size range and isolating those transcript fragments containing only polyA regions, prior to said mixing.
14. The method of claim 10, for use in preparing a composition in which all of the transcript species have substantially the same size, wherein providing the cellular transcripts includes obtaining total cellular transcripts, fragmenting the transcripts into a predetermined size range and isolating those fragments containing only 5'-end regions.
15. The method of claim 10, for use in preparing a composition in which all of the transcript species have substantially the same size, wherein providing the cellular transcript species includes obtaining total cellular messenger RNAs, attaching each RNA to a poly-dT sticky end of a linearized vector, using the attached transcript to produce a duplex DNA copy of the transcript attached to the vector at its 3' end, circularizing the vector to attach the 5' end of the duplex DNA copy adjacent a selected marker and a known restriction site in the vector, fragmenting the vector into fragments smaller than about 1,000 base pairs, cutting the vector fragments at such known restriction site, and isolating those fragments containing the selected marker.
16. A method of detecting messenger RNA transcripts produced by one or more genes G in a selected genomic structure in a test cell type, but not by the genes contained in the corresponding genomic structure in a control cell type, where both the control and test cell genomic structures include at least about 10 genes which are active in producing messenger RNA transcripts, at different abundance levels, said method comprising (a) providing a composition of control cell transcript species derived from the control cell genomic structure genes G, and present in the composition in substantially equal abundance, (b) providing a composition of test cell transcript species derived from the test cell genomic structure genes G, and present in the test cell composition in substantially equal abundance, hybridizing the control cell transcript composition with the test cell transcript composition, and identifying those test cell transcript species which do not hybridize with control cell transcript species.
17. The method of claim 16, for use in detecting substantially all messenger RNA transcripts which are produced by the test cell, but not the control cell genomic structure, wherein the control cell and test cell transcript compositions each contain at least about one transcript species for each gene which is expressed by such control cell and test cell genomic structure, respectively, and in substantially equal molar abundance.
18. The method of claim 16, wherein the control cell transcript species are labeled with an affinity label which permits binding of the transcripts to a solid support, said hybridizing is carried out in the presence of excess molar concentration of control cell transcript species, and said identifying includes contacting the hybridized species with the solid support, and isolating hybridized species which do not bind to the support.
19. The method of claim 16, wherein the test cell transcript species are spotted individually on a filter, the control cell transcripts are radiolabeled, said hybridizing is carried out by hybridizing the radiolabeled species with the test cell species on the filter, and said identifying includes identifying those filter spots which do not contain radiolabel.
20. A method of detecting differences in the abundance of messenger RNA transcripts produced by one or more genes G in a selected genomic structure in a test cell type, with respect to the genes G contained in the corresponding genomic structure in a control cell type, where both the control and test cell genomic structures include at least about 10 genes which are active in producing messenger RNA transcripts, at different abundance levels, said method comprising (a) providing a composition of control cell transcript species derived from the control cell genomic structure genes G, and present in the control cell composition in substantially equal abundance, (b) transferring the composition onto two replica filters in which individual filter spots represent individual species of the composition, (c) obtaining radiolabeled transcript species from the control and test cells, respectively, (d) hybridizing the labeled control cell transcript species with one of the replica filters, and the labeled test cell transcript species with the other filter, (f) producing autoradiographs of each of the filters, and (g) comparing the density of radiolabel associated with each spot on the test cell filter with that on the control cell filter, to determine the relative amounts of messenger RNA transcript associated with a gene in the genomic structure.
21. A method of preparing a clonal library composition of transcript species which are (a) each derived from one of a plurality of genes G in a cellular genomic structure, all of which genes are active in a defined cell type in producing cellular messenger RNA transcripts, at various levels of messenger RNA abundance, and (b) present in substantially equal molar abundance, said method comprising providing cellular transcript species derived from the different-abundance messenger RNA transcripts, preparing polynucleotide species which are homologous to the cellular transcript species, and present in substantially the same molar abundance as the homologous transcript species, mixing the polynucleotide species with the cellular transcript species under conditions which promote hybridization between homologous polynucleotide and cellular transcript species, carrying out the hybridization until the transcript species which remain unhybridized all have substantially the same molar abundance, separating hybridized from nonhybridized species, and cloning the separated, nonhybridized species.
22. The method of claim 21, wherein the cellular transcript species which are provided and the polynucleotide species which are prepared are all within a defined size range less than about 1,000 nucleotides.
Description:
EQUAL-ABUNDANCE TRANSCRIPT COMPOSITION AND METHOD

1. Field of the Invention

The present invention relates to a composition of mRNA or DNA transcript species which are present in the composition in substantially equal molar abundance, and to methods of preparing and using the composition.

2. References

Anderson, M.L.M., and Young, B.D., in Nucleic Acid Hybridization: A Practical Approach, B.D. Hames and S.J. Higgins, eds., IRL Press, Oxford (1985), 73.

Bodmer, W., Cold Spring Harbor Symposia on Quantitative Biology (1986).

Britten, R.J., and Davidson, E.H., in Nucleic Acid Hybridization: A Practical Approach, B.D. Hames and S.J. Higgins, eds., IRL Press, Oxford (1985), 3.

Britten, R.J., and Kohne, D.E., Science (1968) 161:529.

Davidson, E., in Gene Activity in Early Development, Academic Press, N.Y. (1976).

Engleman, E., Human Hybridomas and Monoclonal Antibodies, E. Engleman, S.K.H. Foung, J. Larrick, and A. Raubitschek, eds., Plenum Press (1985).

Hames, B.D., and Higgins, S.J., eds., Nucleic Acid Hybridization: A Practical Approach, IRL Press, Oxford (1985).

Huynh, T.V., et al, in "DNA Cloning, Volume 1", ed. D.M. Glover, Washington, D.C.: IRL Press, 1985, Chapter 2.

Kaplan, M.P., et al, J Immunol Methods (1979) 5:131.

Kauffman, R.S., and Fields, B.N., in Fundamental Virology, B.N. Fields and D.M. Knipe, eds., Raven Press, New York (1986), 161.

Lewin, B., in Gene Expression 2, John Wiley and Sons, 2nd ed. (1980).

Maniatis, T., et al, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory (1982), 280.

Manley, J.L., et al, Proc Nat Acad Sci (USA) (1980) 77:3855.

Messing, J., in Methods in Enzymology (1983) 101:20-78.

Okayama, H., et al, Mol Cell Biol (1982) 2:161.

Stanbury, J.B., et al, eds., The Metabolic Basis of Inherited Disease, McGraw-Hill Book Company, New York (1978).

Van Beverow, C., and Verma, I.M., Curr Top Microbiol Immunol (1986) 123:73.

3. Background of the Invention

The ability to identify differences in low-abundance messenger RNAs (mRNAs) between similar or dissimilar cell types is an important area of study in human genetics. One major application is in understanding and predicting certain disease states. For example, an absent or altered mRNA coding for a specific protein in a particular cell type is often the direct cause of a hereditary disease, while the presence of an added mRNA species may signal the beginning of malignant transformation or the latent presence of an otherwise undetectable infectious agent. Although some hereditary diseases (such as sickle cell anemia, other hemoglobinopathies and the thalassemias) are due to changes in the nature or presence of high-abundance mRNAs, a large percentage of hereditary diseases have been shown to be or are likely to be caused by the absence of or alterations in specific proteins coded for by low-abundance mRNAs. These include Lesch-Nyhan Syndrome, Hunter's Syndrome, Hurler's Syndrome, Tay-Sachs Disease and adenosine deaminase deficiency, among others (Stanbury).

It is also known that several oncogenes exist whose aberrant activation leads to malignant transformation (Van Beverow), and the detection of changes in low-abundance mRNA species will have important applications in the early detection of such transformation.

Another important application of low-abundance mRNA analysis is in the diagnosis and study of low grade, slow and latent infections with viruses or other agents, especially in a tissue containing different cell types, in which only one cell type may be infected. In particular, the transcription of virus-specific mRNA(s) may be the first indication of reactivation of a latent viral infection (Kauffman).

Other important applications include, but are not limited to, the study of gene expression changes during cell activation, embryonic development or cell cycle progression, and between similar cell types of the same or closely related species.

The major problem in the detection and analysis of low-abundance mRNAs, or complementary DNAs (cDNAs) derived from such mRNA species, is interference caused by numerous high-abundance mRNAs present in the cell. In any given cell type, there may be 10,000 - 30,000 distinct mRNA species (Davidson), and these can range in concentration from several hundred thousand molecules per cell, for high-abundance species, to only a few molecules per cell, for low-abundance species.

A number of nucleic acid hybridization techniques aimed at isolating and analyzing mRNAs or their corresponding cDNAs have been developed heretofore. These techniques rely on the ability of denatured nucleic acids to reassociate in a sequence-specific manner by Watson-Crick base pairing interactions (Britten). The rate at which renaturation occurs is determined by the sequence complexity of the sample, the absolute and relative concentrations of various species, the length of DNA fragments and the conditions under which the reaction takes place (Hames).

In one common hybridization method, a selected gene probe cDNA is labeled and added in single-strand form to a saturating mixture of cellular transcripts. The presence of the probe-related transcript species can be assessed by the amount of labeled probe cDNA incorporated into double-strand material. This method is generally suitable for high- and moderate-abundance transcripts, but lacks the sensitivity required for the identification and analysis of low-abundance mRNAs due to high background levels. Another limitation of this method is the requirement for large quantities of total cellular mRNA, which may be difficult to provide, particularly in clinical specimens. Moreover, the technique requires the availability of mRNA-specific probes, which means it cannot be used to study the transcript-related basis of genetic defects or other cellular conditions for which the identity of the relevant mRNA species is unknown.

Filter hybridization to a conventional cDNA library may be used to identify differences in low-abundance mRNAs (Anderson). However, the one million or more library clones needed to ensure the presence of virtually all low-abundance mRNAs effectively eliminates the utility of this technique for screening programs.
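The scale of that screening burden can be illustrated with the standard sampling estimate for library size, N = ln(1 - P) / ln(1 - f), where f is the fraction of the mRNA population contributed by one species and P is the desired probability of recovering at least one clone of it. The formula, the function name `clones_needed` and every abundance figure below are illustrative assumptions and are not taken from this application.

```python
import math


def clones_needed(fraction, probability=0.99):
    """Independent clones needed so that a species making up `fraction` of
    the mRNA population appears at least once with the given probability.
    Assumes random, unbiased cloning (an idealization)."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    if not 0 < probability < 1:
        raise ValueError("probability must be between 0 and 1")
    # log1p(-x) computes ln(1 - x) accurately for very small x.
    return math.ceil(math.log(1 - probability) / math.log1p(-fraction))


# Hypothetical rare species: 2 copies per cell among roughly 500,000 mRNA
# molecules per cell (both numbers are assumed for illustration).
rare_fraction = 2 / 500_000
print("conventional library:", clones_needed(rare_fraction))

# Hypothetical equal-abundance library of 20,000 species (assumed count),
# in which every species contributes the same fraction.
print("equal-abundance library:", clones_needed(1 / 20_000))
```

Under these assumed numbers the conventional library needs on the order of a million clones, while the equalized one needs roughly a tenth of that, which is the practical motivation for equalizing abundance before screening.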

Nucleic acid subtraction techniques have also been proposed for use in studying differences in low-abundance mRNAs between related cell types (Maniatis). Here transcript preparations from two different cell types are hybridized to completion, with the unhybridized remainder being examined for the presence of species of interest. In practice, this method will also pick up transcript species whose concentrations in the two samples are different, and transcript species which are present because the hybridization reaction does not, in fact, go to completion. Therefore, the method results in a wide spectrum of transcript species with different abundance levels, and each of the species must be rechecked against the original mRNAs because of incomplete subtraction. Such subtraction techniques also lack the requisite sensitivity for the reliable detection of specific low-abundance mRNAs, and are too cumbersome and difficult for use in routine clinical applications.

4. Summary of the Invention

It is therefore a general object of the invention to provide a transcript composition which substantially overcomes above-discussed prior-art limitations in isolating and analyzing low-abundance transcripts in cells.

A more specific object of the invention is to provide a composition which can be used to detect changes in the abundance and/or presence of low-abundance mRNA transcripts which occur at different cellular states and/or differentiate one cell type, or group of cells, from another cell type or group.

Still another object of the invention is to provide methods for producing such a composition.

Another general object of the invention is to provide methods for analyzing cell transcript composition, abundance, changes and/or cell-specific differences, particularly as they relate to low-abundance transcript species.

A related object is to provide such methods which are suitable for clinical application, such as in detecting markers of genetic diseases in clinical specimens.

The composition of the invention is derived from a cellular genomic structure, such as an entire cellular genome, an isolated chromosome, or a portion of a chromosome, from a defined cell type or cell group. The genomic structure contains a plurality of genes G which are active in the defined cell type in producing messenger RNA (mRNA), at various levels of mRNA abundance. The composition includes a transcript species T for each gene G, and in substantially equal molar abundance. The transcript species may be an mRNA, or fragment thereof, or single- or double-strand cDNA, or homologous genomic DNA fragments, and the transcript species may be cloned in a suitable cloning vector.

In one embodiment, the transcript species are derived from mRNA transcripts which are within a selected size class of mRNAs, e.g., 500-2,000 base pairs. In other and preferred embodiments, the transcript species are all substantially equal sizes, typically about 300-800 base pairs, and are derived from either the 3' or 5' end regions of all of the mRNAs produced by the selected genomic structure.

The composition is prepared, according to a preferred method of the invention, by providing (a) sequences from fragments of the cellular genomic structure, in which highly repetitive genomic fragments have been substantially removed, and (b) the different-abundance cellular transcript species produced by the genomic structure. The genomic fragments are hybridized with a large molar excess of the transcript species, yielding fragment/transcript-species hybrids which can be isolated to yield the desired composition. The isolation preferably involves labeling the fragment with an affinity label, such as biotin, which permits binding to a solid support, such as an avidin-coated support, and selectively retaining hybridized transcript species by affinity chromatography.

Alternatively, an equal-abundance transcript composition can be obtained by direct hybridization of equal molar amounts of sense and anti-sense coding strands of total cellular transcript species. The opposite-strand species used in the hybridization reaction are preferably equal-size fragments derived either from the 3'- or 5'-ends of full-length transcripts. This approach is based on the more rapid (second-order) annealing rate of higher-concentration transcript species, which favors more rapid hybridization of the more abundant strands, and particularly so when all of the strands have the same length. At a selected annealing reaction time (C0t value), the abundance of each non-annealed species will be substantially equal. When this point is reached, the non-annealed species (which are present in roughly equal abundance) are separated from annealed duplex DNA by hydroxyapatite chromatography.
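The kinetic argument can be illustrated with the ideal second-order rate law for complementary strands present in equal molar amounts, under which the single-stranded concentration remaining after time t is C(t) = C0 / (1 + k·C0·t). The sketch below applies that textbook form; the rate constant, starting abundances and time points are arbitrary assumed values chosen only to show the trend, and real annealing reactions depart from this idealization.

```python
def remaining_single_strand(c0, k, t):
    """Unannealed concentration after time t under ideal second-order
    kinetics: C(t) = C0 / (1 + k * C0 * t)."""
    return c0 / (1.0 + k * c0 * t)


def fold_range(concentrations):
    """Ratio of the most abundant to the least abundant species."""
    return max(concentrations) / min(concentrations)


# Assumed starting abundances (arbitrary units) spanning five orders of
# magnitude, and one assumed rate constant shared by all equal-length strands.
start = [1.0, 10.0, 1_000.0, 100_000.0]
k = 1.0

for t in (0.0, 0.01, 1.0, 100.0):
    left = [remaining_single_strand(c, k, t) for c in start]
    print(f"t={t:g}  fold range of unannealed species = {fold_range(left):.2f}")
```

As k·C0·t grows, C(t) approaches 1/(k·t) whatever the starting value, so the unannealed fractions converge. The same model shows the cost: by the time the range has collapsed, most of each rare species has also annealed, so the reaction time trades equalization against yield.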

The invention also includes a method of detecting mRNA transcripts which are produced by the genes of a selected genomic structure in either a test or control cell type, but not by the corresponding genes contained in the other cell type. The method uses equal-abundance transcript compositions, of the type described above, derived from both the test and control cells. Equal-abundance transcripts from one of the two cell types are labeled, e.g., by photobiotinylation, and hybridized in molar excess with the equal-abundance species from the other cell type. Unique unlabeled transcript species are isolated by affinity chromatography.

Alternatively, unique species can be identified by transferring the library equal-abundance clones from the test (or control) cell onto a filter, and hybridizing with the radiolabeled equal-abundance species from the control (or test) cell. Those clones on the filter which do not show labeled (hybridized) DNA are identified as unique to the test (or control) cell.

The invention further includes a method of detecting differences in the abundance of mRNA transcripts produced by a test cell type with respect to a control cell type. Here a control cell transcript composition of the type described above is plated onto replica filters. One of these filters is hybridized with total radiolabeled control cell transcript species, and the other, with total radiolabeled test cell transcript species. Autoradiographs of the two filters allow the density of film spots at corresponding filter positions to be compared, providing a measure of the relative transcript abundance associated with each transcript species which is common to both the control and test cells.
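The spot-by-spot comparison of the two autoradiographs can be sketched as a simple ratio test. The function name `compare_replica_filters`, the spot identifiers, the density readings, the twofold threshold and the background handling below are all hypothetical; the application does not specify how film densities are quantified or what ratio counts as a difference.

```python
def compare_replica_filters(control_density, test_density, fold=2.0, background=0.0):
    """Label each library spot by its test/control film-density ratio.

    control_density and test_density map a spot id to the density read from
    the control-probed and test-probed replica filters, respectively.
    """
    calls = {}
    for spot, c_raw in control_density.items():
        # Subtract an assumed uniform background and clamp at zero.
        c = max(c_raw - background, 0.0)
        t = max(test_density.get(spot, 0.0) - background, 0.0)
        if c == 0.0 and t == 0.0:
            calls[spot] = "no signal"
        elif c == 0.0:
            calls[spot] = "test only"
        elif t == 0.0:
            calls[spot] = "control only"
        elif t / c >= fold:
            calls[spot] = "higher in test"
        elif t / c <= 1.0 / fold:
            calls[spot] = "lower in test"
        else:
            calls[spot] = "similar"
    return calls


# Made-up densities for four spots on the two replica filters.
control = {"A1": 120.0, "A2": 40.0, "A3": 15.0, "A4": 60.0}
test = {"A1": 115.0, "A2": 130.0, "A3": 0.0, "A4": 20.0}
for spot, call in compare_replica_filters(control, test).items():
    print(spot, call)
```

Because every spot carries roughly the same amount of filter-bound DNA in an equal-abundance library, a difference in density between the two films can be attributed to the labeled probe population, which is what makes a direct ratio meaningful here.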

These and other objects of the invention will be more fully apparent when the following detailed description of the invention is read in conjunction with the accompanying drawings.

Brief Description of the Drawings

Figure 1 is a flow diagram of the steps used in preparing the equal-abundance transcript composition of the invention;

Figures 2A-2C illustrate three vector constructs useful in practicing the invention;

Figure 3 illustrates methods for preparing equal-abundance libraries derived from selected-size classes of mRNAs, where the transcript inserts are carried (A) in an efficient transcription vector and (B), in a vector which can be made single-stranded;

Figure 4 illustrates methods for preparing the equal-abundance libraries of Figure 3, where the amount of transcript material used in preparing the libraries is first amplified by an initial cloning step;

Figure 5 outlines steps in preparing equal-abundance cDNA libraries of 3'-end fragment transcript species according to one method of the invention, where an intermediate cloning step is not (A) and is (B) required;

Figure 6 outlines steps for preparing equal-abundance cDNA libraries of either 5'-end (A) or 3'-end (B) fragments according to another method of the invention;

Figure 7 outlines steps for preparing equal-abundance cDNA libraries of 5'-end fragments according to yet another method of the invention;

Figure 8 shows steps in producing a full-length equal-abundance cDNA library, using one of the end-fragment equal-abundance libraries illustrated in Figures 5, 6, or 7;

Figure 9 illustrates one method for identifying transcript species which are produced by a test cell, but not a control cell;

Figure 10 illustrates a second method for identifying transcript species which are unique to a test cell;

Figure 11 outlines a method for determining the relative abundances of transcript species in control and test cells;

Figure 12 shows a method which uses a control cell equal-abundance cDNA composition prepared according to the invention, for identifying cell products which are unique to a test cell; and

Figure 13 illustrates the use of an equal-abundance cDNA composition formed from selected genomic chromosomes or chromosomal fragments, according to the invention, for determining the transcript products of the genomic structure.

Detailed Description of the Invention

I. Preparing an Equal-Abundance Transcript Composition

Figure 1 illustrates steps in preparing the equal-abundance composition of the invention, according to a preferred embodiment of the invention. Briefly, genomic DNA or a selected fraction thereof (genomic structure) from a control or test cell type is isolated, fragmented, and treated to remove multiple-copy fragments. The genomic fragments are preferably labeled with an affinity label, such as biotin, to allow binding to a solid support, such as avidin-coated support beads. These steps are indicated at the upper right in Figure 1, and detailed below (Section IA and Examples 1 and 2).

The fragments of the genomic structure are hybridized with cellular transcript species derived from the selected cell or cell group containing the genomic structure. As used herein, the term "transcript species" refers to an mRNA transcript type or kind T produced by each gene G in the genomic structure which is active in producing mRNA transcripts in vivo. The transcript species may be a full or partial-length mRNA transcript, or a single-strand or double-strand cDNA derived from the mRNA transcript or transcript fragment, and may be derived directly from cell mRNA isolates, or from cloned cDNAs.

The transcript preparation or mixture which is hybridized with the genomic fragments can be prepared in a variety of ways. The simplest transcript preparation is total full-length mRNA isolated directly from the cell, or the corresponding cDNA. This preparation typically contains a range of transcript sizes, from a few hundred to several thousand base pairs, and a wide range of abundances, from a few copies per cell, to a hundred thousand or more copies per cell. If a large molar excess of this preparation is mixed with the labeled genomic fragments under hybridization conditions, each transcript molecule will hybridize with approximately one genomic fragment. Assuming that the average size of the genomic fragments is about the same as that of the smallest transcript size, the total number of genomic fragments available for hybridizing to each transcript species will be roughly proportional to the full-length transcript size.

To illustrate, the practical size of genomic fragments, for purposes of removing multiple-copy genomic material, is about 300-800 base pairs. Therefore, a gene G whose transcript size is about 500 base pairs will be represented by about 1 genomic fragment, whereas a gene G whose transcript size is about 5,000 base pairs will be represented by about 10 genomic fragments. Since the transcripts are in large molar excess of the genomic fragments (e.g., 250-fold excess), about ten times more genomic fragments are available for binding to the 5,000 base pair transcript than to the 500 base pair transcript. The resulting transcript composition, consisting of all transcript species which hybridize with the genomic fragments, would therefore contain about one copy of the 500 base pair transcript species for every ten copies of the 5,000 base pair transcript species, and all other species would likewise be represented in abundances which are approximately related to transcript size. (The size dependence is actually more complicated, due to the representation of intron regions in the genomic fragments, but not the transcripts.) The resulting transcript composition may be thought of as having a size-equalized or size-normalized composition of transcript species.
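The arithmetic of this illustration can be sketched as follows, assuming that capture scales with the number of genomic fragments spanning a transcript and ignoring introns, which the text notes complicate the real picture. The function names, the gene labels and the 500 base pair fragment size (chosen from within the 300-800 base pair range given above) are assumptions made for the example.

```python
def fragments_per_transcript(transcript_bp, fragment_bp=500):
    """Approximate number of genomic fragments that can capture one
    transcript species, ignoring introns."""
    return max(1, round(transcript_bp / fragment_bp))


def size_normalized_abundance(transcript_sizes_bp, fragment_bp=500):
    """Relative molar representation of each species after capture with
    transcripts in large excess, scaled so the least represented is 1.0."""
    counts = {
        name: fragments_per_transcript(bp, fragment_bp)
        for name, bp in transcript_sizes_bp.items()
    }
    smallest = min(counts.values())
    return {name: n / smallest for name, n in counts.items()}


# The two transcript sizes used in the illustration above.
sizes = {"short_gene": 500, "long_gene": 5000}
print(size_normalized_abundance(sizes))
```

In this model the original cellular abundance of each transcript does not enter at all: only transcript length does, which is why the composition is described as size-normalized.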

Two strategies are used to reduce or eliminate the size dependence in the transcript compositions. In the first, described in Section C below, total cellular mRNAs are initially size fractionated, to produce selected size classes of transcripts, e.g., in the 500-2,000, 2,000-4,000, and over 4,000 base pair ranges. Each or a selected mRNA size class is then individually hybridized with the genomic fragments, yielding a transcript composition in which the molar amounts of the different transcript species vary within a fairly small range, e.g., one to fourfold, depending on the range of transcript sizes.
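The effect of size fractionation on the spread of molar amounts can be sketched by treating capture as proportional to transcript length, so that the worst-case ratio inside a class is simply its upper bound divided by its lower bound. The class boundaries are the example ranges given above; the function names and the use of an unbounded top class are assumptions for illustration.

```python
import math

# Example size classes from the text, in base pairs; the top class has no
# stated upper limit, so it is modeled here as unbounded.
SIZE_CLASSES_BP = [(500, 2000), (2000, 4000), (4000, math.inf)]


def size_class(transcript_bp, classes=SIZE_CLASSES_BP):
    """Return the (low, high) class containing the transcript, or None."""
    for low, high in classes:
        if low <= transcript_bp < high:
            return (low, high)
    return None


def worst_case_fold(low, high):
    """Largest molar ratio expected within one class if capture scales
    with transcript length."""
    return high / low


for low, high in SIZE_CLASSES_BP:
    print(f"{low}-{high} bp: up to {worst_case_fold(low, high)}-fold variation")
```

Under this assumption the 500-2,000 base pair class gives the fourfold figure quoted above, while the open-ended top class is only bounded if an upper size limit is imposed on it.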

The second, and preferred approach to reducing size dependence in the transcript composition is to equalize all of the transcript sizes prior to hybridization with the genomic fragments. The equalize transcript pieces may be derived either from 3 '-end regions of the transcripts, as described in Section D below, or from 5'-end pieces, as described in Section E. The equal-length transcript end pieces, when mixed with the genomic fragments under hybridization conditions, will hybridize with corresponding genomic end fragments only, and additional transcript hybrids related to total transcript length will not form. It can be appreciated that the resulting transcript composition will contain substantially equal molar amounts of each transcript species, where each transcript species is represented by a 3'- or 5'- end fragment. As defined herein, a transcript composition is also said to have substantially equal molar abundances of its transcript species if the composition of transcript species is size normalized, as defined above, and where the sizes of transcript species are within a defined size range.

Figure 1 illustrates a preferred method for carrying out the hybridization and transcript separation steps referred to above. The transcript species which are hybridized with biotinylated genomic fragments are preferably equal-size transcript species of the type indicated above, and preferably cloned 3'-end single-strand mRNA or cDNA species prepared as described in Section D below. A large molar excess of the transcript species is mixed with biotinylated genomic fragments, the mixture is denatured by heating, then cooled to allow slow annealing of the fragments with the transcript species.

After hybrid formation, the material is applied to a streptavidin column, which retains all of the biotinylated genomic fragments, including those hybridized with the transcript species. The column is washed thoroughly to remove transcript species not associated with the labeled fragments, and the transcript species are then released from the column by heat denaturation. The hybridization and transcript separation procedures just described are detailed in Example 3.
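The equalizing effect of selecting transcripts on a limiting amount of genomic fragments can be pictured with a toy saturation model. This is only a sketch under stated assumptions: each labeled fragment is taken to capture at most one transcript molecule, hybridization is taken to go to completion, and the species names, amounts, and the function `select_on_genomic_fragments` are all hypothetical.

```python
def select_on_genomic_fragments(transcripts, genomic_copies):
    """Toy model of hybrid selection on labeled genomic fragments.

    transcripts:    dict of species -> molar amount in the starting mixture
    genomic_copies: dict of species -> molar amount of matching labeled fragment
    The captured amount of each species is limited by whichever is scarcer.
    """
    return {
        species: min(amount, genomic_copies.get(species, 0.0))
        for species, amount in transcripts.items()
    }


# Hypothetical starting abundances spanning several orders of magnitude.
starting = {"abundant": 10000.0, "moderate": 300.0, "rare": 5.0}
# Single-copy genomic fragments: the same molar amount for every gene.
fragments = {species: 1.0 for species in starting}

selected = select_on_genomic_fragments(starting, fragments)
print(selected)
```

Because every transcript species is in excess over its genomic fragment, the captured amounts are set by the fragments and come out equal, regardless of the starting abundances.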

The equal-abundance transcripts are typically cloned into a suitable cloning vector to form a cDNA library. This library in turn can be used in preparing a full-length, equal-abundance cDNA library, as will be described in Section F.

A. Preparing Single-Copy Genomic DNA Fragments

As indicated above, the genomic DNA fragments used for hybridizing the transcript species are preferably single-copy fragments, and are derived from a selected cell or cell group whose transcript characteristics are to be examined. As defined herein, "genomic DNA structure" is intended to include total genomic DNA, or a fragment size class thereof, isolated from chromosomes, or fragments or selected regions thereof. The DNA structure includes a plurality of genes G which are active, in the selected cell, in producing mRNAs at various levels of mRNA abundance. The actual number of genes G in the structure may be relatively small (less than about 25), but the structure typically contains hundreds or thousands of genes G, and the transcript abundances produced by the genes in the cell range from a few copies per cell, up to 10 or more copies per cell.

The cell from which the DNA structure is derived may be a cell line capable of sustained growth in culture, such as a variety of fibroblast or lymphocytic lines, or newly immortalized B-lymphoblastoid cell lines. Sources of such cell lines, and methods for obtaining and culturing cell lines, and of immortalizing B-lymphocytes, are well known. A requirement of all such cultures, however, is that there not be any amplification or deletion of genetic material (G) which would lead to an altered abundance of T.

Alternatively, the cell source may be a cell type or group isolated from living tissue (or whole organs or entire organisms), and suspended in culture, such as peripheral blood lymphocytes, or primary cultures of human embryonic lung or foreskin fibroblast cells which can be maintained and grown for a limited time in culture. Cultures of this type would have the lowest probability of developing chromosomal aneuploidy.

To obtain the genomic DNA structure, the cell source is fractionated, according to conventional methods, to obtain total DNA, total nuclear DNA, and/or isolated chromosomes. The isolated DNA material or structure may be further fractionated to yield chromosomal regions of interest, for example defined-size restriction fragments from total genomic DNA or from one or more isolated chromosomes. Section IIC below, for example, describes a transcript composition formed from a NotI fragment of isolated human chromosome 7.

DNA from the selected cell source is isolated by standard procedures, which typically include successive phenol and phenol/chloroform extractions with ethanol precipitation (Maniatis, p. 280). Example 1A below describes the isolation of total DNA from human peripheral blood lymphocytes (PBLs) by the same procedures.

In the usual case, where the genomic material is derived from a eukaryotic cell or cell type, the DNA will consist of a relatively small percent of single-copy genes (the material of interest for purposes of the present invention) and a major portion of repeat-sequence DNA. A preferred method for removing repeated sequences is by conventional hybridization methods which exploit the greater rate of hybridization of multiple-copy gene sequences (Britten). Briefly, in carrying out this method, DNA material is fragmented, such as by sonication or high-pressure extrusion, yielding fragments which are preferably between about 300-800 base pairs, as discussed in Example 1B. These fragments are treated under salt and temperature conditions which cause dissociation, then reannealed slowly, by dropping the temperature below the measured melting temperature Tm, the midpoint temperature of transition between single and double strands of the DNA material. Since the rate of reassociation is greater for multiple-copy fragments, the last group of fragments to reanneal will be predominantly the single-copy gene fragment material.

The annealing reaction is allowed to proceed to a predetermined Cot (initial DNA concentration Co times annealing time t) value at which multiple-copy fragments are predominantly annealed and single-copy material remains single-stranded. The partially annealed reaction mixture is then separated over hydroxyapatite, which selectively binds double-strand (duplex) DNA. Additional separation between single-copy and multi-copy fragments can be achieved by repeating the above hybridization procedure one or more times. The method is illustrated in Example 1C, and detailed in the published literature (Britten).
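The choice of a Cot value can be explored with the ideal second-order reassociation relation, in which the fraction of DNA remaining single-stranded is 1 / (1 + Cot / Cot½). The sketch below is illustrative only: the Cot½ values for the three sequence classes are hypothetical placeholders rather than measured values, and real genomes deviate from the ideal curve.

```python
def fraction_single_stranded(cot, cot_half):
    """Ideal second-order reassociation: C/C0 = 1 / (1 + Cot / Cot_half)."""
    if cot < 0 or cot_half <= 0:
        raise ValueError("cot must be >= 0 and cot_half must be > 0")
    return 1.0 / (1.0 + cot / cot_half)


# Hypothetical Cot_half values (mol x s / L), chosen only for illustration.
COT_HALF = {
    "highly repetitive": 0.01,
    "moderately repetitive": 1.0,
    "single copy": 1000.0,
}

cot = 50.0  # hypothetical predetermined Cot value for the annealing step
for name, half in COT_HALF.items():
    remaining = fraction_single_stranded(cot, half)
    print(f"{name}: {remaining:.3f} of the DNA still single-stranded")
```

With these placeholder values, most repetitive DNA has annealed and would bind to hydroxyapatite, while most single-copy DNA is still single-stranded and passes through. Repeating the cycle, as the text describes, enriches the single-copy fraction further.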

Alternatively, initial and/or secondary removal of multiple-copy genes may be carried out by other known techniques including hybridization and selective subtraction with known repetitive sequences such as the AluI and KpnI repeat families (Lewin). In some cases it may also be desirable to perform additional subtraction with known moderately repetitive sequences, such as histone and immunoglobulin genes and various pseudogene families, such as nerve growth factor genes, and with known polymorphic sequences such as those coding for the antigens of the major histocompatibility complex (MHC).

The predominantly single-copy genomic fragments produced as above are now labeled in a manner which will permit them to be physically separated from unlabeled RNA or DNA species. The label is preferably an affinity label, such as biotin, which binds specifically and with high affinity to a surface-modified solid support, such as a solid support containing surface-bound avidin or streptavidin (Brigati, Herman). Example 2 details several methods for biotin-labeling one or both strands of duplex DNA. Streptavidin support materials are commercially available (Example 2).

As indicated above, the genomic fragments in the hybridization with transcript species serve to provide substantially equal molar quantities of each coding region of the genomic structure. Since most multiple-copy segments in a genome are non-coding, it will be appreciated that a fragment composition in which none or only a portion of the multiple-copy fragments has been removed will nonetheless provide substantially equal molar amounts of coding fragments. However, such a multiple-copy composition would have to be added to the transcript species at much higher concentration, and therefore the specificity of the hybridization reaction would be reduced, and the reaction time would be increased over a single-copy composition. In the case where the genomic fragments are not first treated to remove multiple-copy species, the multiple-copy fragments will largely hybridize with one another and, when added to an affinity column, become bound to the column material through both strands.

B. Modified Cloning Vectors

This section describes three modified vectors which are useful for cloning full-length or end fragment transcript species, as will be described below. Each of the three vectors is formed by introducing one or more rare restriction sites into or adjacent the normal "cloning sites" of a conventional, commercially available cloning vector. The rare cutting site(s) allow the cloning vector with its transcript insert to be cut adjacent one or both ends of the insert, with a substantially reduced risk of cutting the transcript insert itself at an internal cutting site, which would result in the loss of a portion of the insert.

Figure 2A illustrates the modification of a pGEM-3 plasmid to form a cloning vector, designated pGEM-3/NS, which is useful for producing single-strand transcripts of cloned inserts. The pGEM-3 plasmid is commercially available from Promega Biotech (Madison, WI). As shown, this vector contains, in a 5'-to-3' direction, an SP6 RNA polymerase promoter, a polylinker cloning site region bounded by HindIII and EcoRI sites, and a T7 RNA polymerase promoter adjacent the EcoRI site. The two promoters are oriented to promote RNA transcription of inserts cloned into the polylinker region. Thus, one strand of a DNA insert contained in the cloning site can be transcribed by an SP6 polymerase, and the other strand, by the T7 polymerase. In the usual transcription procedure, the vector with the DNA insert is cut at the insert end opposite the desired promoter, so that transcription from the desired promoter terminates at the end of the transcript.

The pGEM-3 modification is aimed at replacing the cloning site region of the plasmid with a segment containing a single NotI (NI) site adjacent the HindIII (HIII) site, and a single SfiI (SI) site adjacent the EcoRI (RI) site. As illustrated in Figure 2A, this is done by cutting the plasmid with HindIII and EcoRI, to remove the polylinker region, and inserting into the cut vector an oligonucleotide containing, in a 5'-to-3' direction, a HindIII cohesive end, a NotI recognition sequence, an SfiI recognition sequence and an EcoRI cohesive end. The following oligonucleotide, which contains contiguous NotI and SfiI sites, is exemplary:

5' AGCTTGCGGCCGCGGCCGGGGGGGCCG 3'
3'     ACGCCGGCGCCGGCCCCCCCGGCTTAA 5'
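The design of the exemplary oligonucleotide can be checked mechanically. The sketch below takes the two strands as printed above and confirms that the top strand carries a NotI site (GCGGCCGC) and an SfiI site (GGCCNNNNNGGCC), that the strands are complementary outside the four-base overhangs, and that the overhangs match HindIII (AGCT) and EcoRI (AATT) cohesive ends. The function name `check_linker` and the returned dictionary layout are hypothetical.

```python
import re

TOP = "AGCTTGCGGCCGCGGCCGGGGGGGCCG"     # written 5'->3', as in the text
BOTTOM = "ACGCCGGCGCCGGCCCCCCCGGCTTAA"  # written 3'->5', as in the text

NOTI = re.compile("GCGGCCGC")
SFII = re.compile("GGCC[ACGT]{5}GGCC")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def check_linker(top, bottom, overhang=4):
    """Report site positions, strand pairing, and cohesive ends of a linker."""
    duplex_top = top[overhang:]                       # top strand minus its 5' overhang
    duplex_bottom = bottom[:len(bottom) - overhang]   # bottom strand minus its 5' overhang
    noti = NOTI.search(top)
    sfii = SFII.search(top)
    return {
        "paired": duplex_top.translate(COMPLEMENT) == duplex_bottom,
        "noti_start": noti.start() if noti else None,
        "sfii_start": sfii.start() if sfii else None,
        "left_overhang": top[:overhang],
        # The bottom strand is written 3'->5', so reverse it to read 5'->3'.
        "right_overhang": bottom[-overhang:][::-1],
    }


result = check_linker(TOP, BOTTOM)
print(result)
```

The NotI site precedes the SfiI site on the top strand, consistent with the stated order of HindIII end, NotI, SfiI, and EcoRI end.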

Methods for preparing oligonucleotides synthetically are referenced in the Materials and Methods section below, and services which specialize in the synthesis of such molecules are available, e.g., Synthetic Genetics, Inc. (San Diego, CA) and Applied Biosystems (Foster City, CA).

After digesting the pGEM-3 plasmid with HindIII and EcoRI, the linearized plasmid is purified from the excised polylinker fragment by electroelution. The purified vector is mixed with an equimolar amount of the oligonucleotide under conditions which favor circularization, via the oligonucleotide, and the plasmid is ligated with T4 DNA ligase conventionally. The vector construction methods follow standard procedures, such as those referenced in the Materials and Methods section below. Successful recombinants are selected for ampicillin resistance on E. coli strain DH5, substantially as described in Example 4B. The ligation restores the HindIII and EcoRI plasmid sites, yielding a modified pGEM-3 plasmid, designated pGEM-3/NS. As seen in Figure 2A, the plasmid contains, in a 5'-to-3' direction, the SP6 promoter, HindIII, NotI, SfiI, and EcoRI sites, and the T7 promoter. A similar plasmid construction in which the positions of the NotI and SfiI sites are reversed can be prepared by insertion of an oligonucleotide like that above, but where the positions of the two rare sites with respect to the sticky ends are reversed.

The second vector construct, illustrated in Figure 2B, is designed for cloning applications in which cloned transcript inserts are hybridized with labeled single-strand DNA or RNA in a single-strand vector form. A preferred plasmid for modification is the Bluescribe M13+/- (M13+/-) plasmid shown in Figure 2B, which in fact represents a pair of plasmids, designated + or -, whose origin of replication is designed to produce phage packaging of either the + or - strand of the duplex insert, respectively. This plasmid (pair), which is commercially available from Stratagene, Inc. (San Diego, CA), contains a cloning site bounded by EcoRI and HindIII insertion sites, as shown, and an F1 origin of replication from the intergenic region of M13, which permits encapsidation of the plasmid in single-strand form in a bacterial host co-infected with the plasmid and helper phage, as has been described (Messing). The plasmid also contains a T7 polymerase promoter adjacent its HindIII site, and a T3 polymerase promoter adjacent its EcoRI site, as shown, for transcription of a sense strand from the T7 promoter and an anti-sense strand from the T3 promoter, analogous to the pGEM-3 vector.

The modification of the M13+/- vector is carried out substantially as above, by cutting the plasmid at its unique HindIII and EcoRI sites, ligating the oligonucleotide shown above into the linearized vector, and selecting for ampicillin resistance in transformed host E. coli strain JM101. The modified plasmid, designated M13+/-/NS, includes, in a 5'-to-3' direction, a T7 promoter, unique NotI and SfiI sites, and the T3 promoter.

The third modified vector, shown in Figure 2C, is used in generating a HindIII/SfiI/NotI/PstI oligo dG fragment, shown at the bottom in the figure, which in turn is used in forming a 5'-end cloning vector of the type described by Okayama and Berg (Okayama), for plasmid-primed first- and second-strand cDNA synthesis. The modified vector is derived from a pSV1932 plasmid which is available commercially from PL Biochemicals (Milwaukee, WI), and which has a HindIII/BclI segment containing an internal PstI site and a portion of the SV40 RNA Pol II promoter, as shown. The modification is aimed at introducing a single SfiI and single NotI site immediately adjacent the HindIII site (between the HindIII and PstI sites). This is done, according to standard procedures, by cutting the pSV1932 plasmid at the unique HindIII site, and mixing the linearized vector with the following oligonucleotide:

5' AGCTTGGCCGGGGGGGCCGCGGCCGC 3'
3'     ACCGGCCCCCCCGGCGCCGGCGTCGA 5'

which contains a cohesive HindIII 5' end followed by an SfiI site, a NotI site, and a 3' cohesive HindIII end. The oligonucleotide is prepared synthetically, as above, and is designed to regenerate only the 5' HindIII site, with a blocked 3'-end HindIII site. The oligonucleotide is inserted into the cut HindIII site of pSV1932, ligated, and successful recombinants are selected for ampicillin resistance on E. coli strain DH5. The selected vector is designated pSV/SN.
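The statement that only the 5' HindIII site is regenerated can be verified from the printed top strand. In the sketch below, a HindIII-cut vector is assumed to contribute a top-strand A immediately upstream of the insert and AGCTT immediately downstream, which follows from HindIII cutting A^AGCTT. The function name `junctions_after_ligation` is hypothetical.

```python
HINDIII = "AAGCTT"
TOP = "AGCTTGGCCGGGGGGGCCGCGGCCGC"  # top strand, 5'->3', as printed in the text


def junctions_after_ligation(insert_top):
    """Return the six-base top-strand sequences spanning each insert/vector junction.

    Models ligation into a HindIII-cut vector: the upstream vector arm ends in
    'A' and the downstream arm begins with 'AGCTT' on the top strand.
    """
    joined = "A" + insert_top + "AGCTT"
    return joined[:6], joined[-6:]


left, right = junctions_after_ligation(TOP)
print("5' junction:", left, "HindIII site" if left == HINDIII else "not a HindIII site")
print("3' junction:", right, "HindIII site" if right == HINDIII else "not a HindIII site")
```

The 5' junction reads AAGCTT and is cleavable again, while the 3' junction reads CAGCTT and is not, so a later HindIII digest cuts only on the 5' side of the SfiI and NotI sites.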

The pSV/SN plasmid is linearized by digestion with PstI, and treated with terminal transferase in the presence of GTP, substantially as has been reported (Okayama), to attach oligo dG tails to both ends of the vector. After removing the terminal transferase and GTP, the vector is cut with HindIII to release the desired HindIII/NotI/PstI oligo dG fragment, as shown. The use of this fragment in a procedure for cloning 5'-end transcript fragments will be described in Section E below.

C. Size-Class Equal-Abundance Libraries

The first equal-abundance transcript composition which will be considered is a size-class composition in which equalized transcript species are all derived from a selected size class of full-length mRNAs. One method of forming this composition is illustrated in Figure 3, and described generally in Example 3. This method is generally suitable when the total amount of size-class mRNA available from the tissue or cell line of interest is sufficient to allow the equal-abundance composition to be prepared by direct hybridization and transcript selection, using labeled genomic fragments.

It is often of interest to compare the transcript characteristics of one cell type with the same cell type in a different development or activation state, for example B-cells before and after activation with Epstein-Barr Virus (EBV). As will be seen, this application generally requires two transcript compositions, one derived from control (e.g., inactivated) cells, and the second, from test (e.g., activated) cells. Methods of maintaining a variety of cells in culture and of activating, stimulating, or otherwise altering the metabolic and biochemical behavior of cells in culture are well known for many cell systems.

Another source of mRNA is whole tissue, such as organ tissue obtained from a human or animal. The latter tissue source may be treated initially, by known techniques, to separate one cell type from other cell types present in the tissue source. Often the cell source will be from a diseased tissue, such as from a tumor tissue, or from the tissue of an individual with a known genetic disease. Methods for obtaining defined cell types or groups of cells from whole organ or tissue samples are well known.

Typically, the mRNA transcript material derived from at least about 10^8 cells, corresponding to a cell volume of about 0.5 mm^3 of packed cell material, is sufficient. More generally, however, it may be necessary to determine experimentally for each cell or tissue type, and depending on the yield of polyA RNA achievable, whether an intermediate cloning step is required. The starting material is polyA-selected mRNA derived from the control or test cell of interest, by conventional extraction and selection procedures (Example 3A).

The mRNAs may be sized into one of a convenient number of different size classes, such as 500-2,000; 2,000-4,000; 4,000-7,000; and 7,000 and greater base pair transcript ranges. Alternatively, the transcripts can be sized into a single desired size range, e.g., 1,000-1,500 base pair transcripts, with all other size transcripts being discarded. The preferred method for sizing transcripts is by formaldehyde agarose gel electrophoresis, which allows separation into well defined size classes. Here known molecular weight RNAs are used as molecular weight markers, to gauge migration distance on the gels as a function of transcript size. The transcripts of each selected size class can be removed from the gel and purified by standard electroelution and ethanol precipitation steps, as outlined generally in Example 3B.

With continued reference to Figure 3, the sized transcripts (or corresponding cDNAs) are hybridized with labeled, single-copy genomic fragments, as outlined above, and detailed in Example 3D, to produce an equal-abundance size-class transcript composition in which each transcript species T is represented in a molar amount substantially in proportion to its size (Example 3E). For example, assuming an average genomic fragment size of about 500 base pairs, and a transcript size class of between 500-2,000 base pairs, each of the largest transcript species would be represented by approximately four times the molar abundance of the smallest species. The size-equalized transcript species produced as above may be cloned into a suitable cloning vector by conventional methods.

One preferred cloning method, which gives directional cloning of full-copy cDNA species into an efficient transcription vector, and rare cutting sites at the transcript ends, is illustrated in Figure 3. Here the transcript species are treated conventionally to produce full-copy duplex cDNAs with a 5'-end hairpin loop. The duplex material is blunt-end repaired, and ligated with SfiI linkers, attaching the linkers to the duplex 3' ends. The duplex molecules are next treated with nuclease S1, to remove the 5'-end hairpins, blunt-end repaired, and ligated to NotI linkers. Digestion of the duplex molecules with NotI and SfiI yields molecules with 5'-end NotI sticky ends, and 3'-end SfiI sticky ends, as indicated. These duplex molecules are now inserted into either the pGEM-3/NS plasmid, or the M13+/-/NS vector, at the NotI and SfiI sites created in these vectors. The resulting library clones, shown at the bottom in Figures 3A and 3B, both contain the equal-abundance full-size transcript species oriented in a 5'-to-3' direction from the NotI to the SfiI sites. Preparation of the pGEM-3/NS and M13+/-/NS libraries is detailed in Examples 4A and 4B, respectively.

As indicated above, an important advantage of the pGEM-3/NS library is the ability to prepare either coding or non-coding mRNA strands efficiently. The advantage of the M13+/-/NS library is the ability to convert the plasmids to single-strand phage which are suitable for hybridization studies, as will be described below in Section II. Depending on the orientation of the F1 origin of replication, either the + or - strand of the cDNA transcript will be packaged.

The total number of recombinants which are selected in forming the transcript library is preferably large enough to include the expected number of active genes in the gene structure of interest, and more preferably several times the number of anticipated genes. The expected number of active genes in total genomic DNA from a selected mammalian cell type is at least about 10,000-30,000 (Davidson). Since there will be a random distribution of each transcript species in the selected library, selecting at least about three clones for each expected gene guarantees that substantially all species will be represented in the library. Thus, about 100,000 transcript clones would be selected for a full-genome library which would, with high probability, include at least one transcript species for each active gene in the genome. Since any limited size class of transcript species would contain substantially fewer transcript species than the total number of cellular transcript species, the equal-abundance libraries derived from size-class mRNAs can be made proportionately smaller than the 100,000 clone library indicated above.
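The library-size reasoning above can be put in numerical terms. The sketch below assumes an idealized equal-abundance library in which every clone is an independent, equally likely draw from the pool of species. This assumption and the function names `probability_represented` and `clones_needed` are introduced here for illustration and are not taken from the text.

```python
import math


def probability_represented(n_clones, n_species):
    """Chance that one given species appears at least once among n_clones.

    Assumes each clone is an independent draw from n_species species
    present in equal molar amounts.
    """
    return 1.0 - (1.0 - 1.0 / n_species) ** n_clones


def clones_needed(n_species, probability):
    """Smallest library size giving at least `probability` for any one species."""
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - 1.0 / n_species))


for genes in (10_000, 30_000):
    p = probability_represented(100_000, genes)
    print(f"{genes} genes, 100,000 clones: P(represented) = {p:.4f}")

print("clones for 95% with 10,000 genes:", clones_needed(10_000, 0.95))
```

Under this model, about three clones per expected gene gives each species roughly a 95% chance of appearing at least once, and a 100,000 clone library does better still at the lower end of the 10,000-30,000 gene estimate.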

Similarly, in forming equal-abundance libraries derived from isolated chromosomes, or other subpopulations of the total genomic material, the total number of equal-abundance library clones required to "span" the genomic structure of interest may also be reduced considerably, as the complexity of the DNA sequence is reduced. However, for purposes of illustration, it will be assumed that total library sizes of about 10^5 clones are desired in all cases. Here it is noted that the 20 or so plates needed to support a library of this size are readily screened, using screening or selection procedures described in Section II below.

In cases where the total amount of cellular mRNA obtainable from control or test cells is relatively small, e.g., where fewer than about 10^8 cells are available for RNA extraction, the preparation of an equal-abundance library further includes an intermediate cloning step to boost or amplify the total quantity of transcript material.

One intermediate cloning method is illustrated in Figure 4A. Here full-copy duplex cDNAs prepared from cellular mRNAs are equipped with 5'-end NotI and 3'-end SfiI sites, as described with reference to Figure 3, and the molecules are cloned in the pGEM-3/NS plasmid. The plasmids, after selection for successful recombinants and growth under conditions which favor plasmid production (selective amplification with chloramphenicol), are harvested, digested with SfiI, and transcribed with SP6 RNA polymerase. The resulting mRNA transcripts are coding-strand transcript species which are presumably full-length and present in approximately the same molar ratios as total cellular mRNAs, although in much greater quantity. The clone-derived transcripts are used to prepare an equal-abundance transcript composition, which may be used to form an equal-abundance library, substantially as described above. Details of the method are given in Example 5.

Figure 4B illustrates a second intermediate cloning method for producing a size-class equal-abundance library, according to the invention. In this method, full-length duplex cDNAs derived from the total cellular transcripts are equipped with 5'-end NotI and 3'-end SfiI sticky ends, and cloned into the M13+/-/NS plasmid as above. As indicated in Section IB above, this plasmid contains an F1 origin of replication which allows encapsidation of the single-strand form of the vector when a suitable bacterial host is coinfected with a helper phage, such as M13 (Messing). Thus, depending on whether the + or - vector is used, a single-strand + or - form of the cloned insert can be produced readily by infecting the bacterial host that harbors the recombinant plasmids with a helper phage, harvesting the encapsidated single-strand library phage, and isolating the DNA from the phage. Details of these methods are described or referenced in Example 6.

The isolated phage DNA, which carries single-strand cDNA inserts derived from the total cellular mRNA, now amplified and presumably in about the same molar abundance as in cellular mRNAs, is hybridized with labeled genomic fragments, and separated by affinity chromatography, as above, to yield an equal-abundance composition of the phage inserts. The equal-abundance phage, after release from the affinity column, are used to produce duplex library plasmids, by direct transformation of E. coli strain JM101 and selection for ampicillin resistance, substantially as described or referenced in Example 6.

D. 3'-End Fragment Equal-Abundance Libraries

This section describes an equal-abundance composition and library in which the equimolar transcript species are 3'-end transcript fragments, and preferably fragments whose size range is comparable to that of the labeled genomic fragments used in preparing the composition.

Figure 5 illustrates one method for preparing the 3'-end fragment composition. The discussion will follow first the pathway at the left (Figure 5A) which is applicable when the original quantity of cellular mRNA obtained from test or control cells is sufficient for direct preparation of the composition, i.e., where at least about 10^8 cells are available for mRNA isolation. Fragmentation by RNAse digestion or by alkaline hydrolysis is suitable. The fragmentation treatment may be monitored, to ensure optimum conditions, by agarose gel electrophoresis, as described generally in Example 7.

The 3'-end fragments from above are separated from the other mRNA fragments by oligo dT chromatography, and the isolated segments are transcribed, by standard oligo dT first-strand priming, to form single-strand cDNA. Alternatively, oligo dT priming and first-strand synthesis can be applied to the total RNA fragment mix, yielding cDNA fragments corresponding to the 3'-end, polyA-containing RNA pieces only. RNA fragments can be removed by alkaline hydrolysis or RNAse treatment, leaving the 3'-end cDNA fragments.

Total 3'-end, single-strand cDNAs from either of the two procedures above are now made equal-abundance by hybridization with the labeled genomic fragments, in the presence of a large molar excess of the cDNA material. Since all of the 3'-end transcript fragments are about the same length, and approximately equal in length to the genomic fragments, each transcript is expected to hybridize with one or at most two genomic fragments. Assuming each genomic fragment is a single-copy species, the resulting hybridized 3'-end transcript species will then be present in molar ratios of either 1 or 2 copies per genomic gene. The equal-abundance cDNA composition can then be transcribed to form double-stranded duplex 3'-end fragments, and these fragments equipped with 3'-end SfiI and 5'-end NotI sticky ends for cloning into pGEM-3/NS or M13+/-/NS cloning vectors, as above. Selecting successful recombinants and forming a library of sufficient size to encompass substantially all of the expected active genes is done according to the general procedures above.

The pathway at the right in the figure (Figure 5B) illustrates an intermediate 3'-end fragment cloning step which is required where the amount of originally obtained cellular mRNA is low, e.g., where the source cellular material contains fewer than about 10^8 cells. In this procedure, the 3'-end polyA mRNA or corresponding 3'-end poly dT single-strand cDNA is transcribed to form duplex cDNAs and these are equipped with 5'-end (hairpin end) NotI and 3'-end SfiI sticky ends, and cloned into either the pGEM-3/NS or M13+/-/NS cloning vectors, substantially as described with reference to Figure 3. The vectors, in turn, are used to produce equal-abundance libraries of 3'-end transcript species, substantially as described with reference to Figure 4.

Figure 6B illustrates a second general method for producing a 3'-end fragment equal-abundance composition and library. The method inherently involves an intermediate cloning step, and is thus especially suited to cell samples in which limited mRNA starting material is available.

As a first step, the full-copy mRNA starting material is transcribed to form full-length duplex cDNAs, and these molecules are end-repaired and ligated with SfiI linkers, which are added at the 3'-end of the duplex molecules only. The full-length cDNAs are now fragmented, such as by sonication, under conditions which produce duplex fragments predominantly in the 300-700 base pair size range. After repair of the fragment ends, NotI linkers are ligated onto the blunt ends. Digestion with SfiI and NotI yields fragments of the following types: (a) 3'-end fragments having 5'-end NotI and 3'-end SfiI sticky ends; (b) internal fragments whose ends are both NotI sticky ends; and (c) 5'-end fragments having 3'-end NotI sticky ends and 5'-end hairpins.

Selection of the 3'-end fragment types is made by cloning the fragments in a pGEM-3/NS or M13+/-/NS cloning vector cut at its NotI and SfiI sites, under conditions which favor circularization of the vector with the fragment inserts, and selecting for successful recombinants. Details of the method are described or referenced in Example 8.
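The selection logic can be summarized as a small classification sketch. It encodes the three fragment types listed above by the ends they carry after SfiI and NotI digestion, and treats a fragment as clonable only if it has one NotI end and one SfiI end to match the doubly cut vector. The end labels and function names are hypothetical, and the model ignores side reactions such as fragment concatenation.

```python
def classify_fragment(five_prime_end, three_prime_end):
    """Name the fragment type from the ends it carries after digestion."""
    ends = (five_prime_end, three_prime_end)
    if ends == ("NotI", "SfiI"):
        return "3'-end fragment"
    if ends == ("NotI", "NotI"):
        return "internal fragment"
    if ends == ("hairpin", "NotI"):
        return "5'-end fragment"
    return "unclassified"


def clonable_in_noti_sfii_vector(five_prime_end, three_prime_end):
    """A vector cut with NotI and SfiI circularizes only with a NotI/SfiI insert."""
    return {five_prime_end, three_prime_end} == {"NotI", "SfiI"}


fragments = [("NotI", "SfiI"), ("NotI", "NotI"), ("hairpin", "NotI")]
for ends in fragments:
    kind = classify_fragment(*ends)
    print(kind, "-> clonable" if clonable_in_noti_sfii_vector(*ends) else "-> not clonable")
```

Only the 3'-end fragments satisfy the two-site requirement, so selecting for recombinants enriches them without a separate purification step.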

The cloning vectors just described, carrying the 3'-end fragment inserts, are manipulated to provide single-strand transcript RNA or DNA, hybridized with labeled genomic DNA, selected for equal-abundance composition by affinity chromatography, and recloned in a suitable cloning vector, substantially as described with reference to Figure 3.

One advantage of the 3'-end fragments, for use in producing an equal-abundance composition, is that fragments with co-terminal 5'-ends, but different 3' termini (i.e., mRNA transcribed in the same direction with a common 5'-end) will not hybridize with one another, or with common genomic fragments. Therefore, the potential problem of "dilution" of such common co-terminal transcripts, by binding to common genomic fragments, is eliminated.

E. 5'-End Fragment Equal-Abundance Libraries

The equal-abundance compositions described in this section are prepared by hybridizing approximately equal-size 5'-end transcript fragments with the genomic DNA fragments. Preferred methods for preparing these 5'-end compositions employ an intermediate cloning step in which total full-length mRNAs are used to generate a library of total 5'-end fragments, similar to the second method described for preparing cloned 3'-end transcript fragments. The one additional step which is critical to the preparation of an equal-abundance 5'-end library is aimed at isolating full-length mRNA as the substrate for the cDNA reactions. This is accomplished by one of two methods, both of which are based on affinity binding to a unique 5' terminal structure or cap added post-transcriptionally to the mature message (Lewin). The first method involves affinity chromatography with phenyl borate columns, according to published methods. The second is based on affinity binding of the mature mRNAs to anti-cap antibody.

The cloned 5'-end fragments in turn provide a source of 5'-end pieces for hybridizing with labeled genomic fragments, as above, to generate the desired equal-abundance library. Since both methods involve an intermediate cloning step, both are suitable in cases where cellular mRNA starting material is limited.

One method for generating a 5'-end fragment equal-abundance composition is illustrated in Figure 6A. As seen, the method follows many of the same steps described above with respect to Figure 6B for generating the 3'-end fragments. Specifically, the duplex cDNA fragments produced by fragmenting full-copy duplex cDNA are end repaired and ligated to SfiI linkers, equipping all of the fragment ends except the 5'-end hairpins with SfiI sites. The fragments are then digested with S1 nuclease, to remove the 5'-end hairpins, end repaired, and ligated with NotI linkers. Cleavage of the fragments with NotI and SfiI enzymes produces two classes of fragments: (a) 5'-end fragments having 5'-end NotI and 3'-end SfiI sticky ends; and (b) fragments whose ends are both SfiI sticky ends. Selection of the 5'-end fragment types is made by cloning the fragments in a pGEM-3/NS or M13+/-/NS cloning vector cut at its NotI and SfiI sites, under conditions which favor circularization of the vector with the fragment inserts, and selecting for successful recombinants. Details of the method are described or referenced in Example 9. The cloning vectors just described, which carry the 5'-end fragment inserts, are manipulated to provide single-strand transcript RNA or DNA, hybridized with labeled genomic DNA, selected for equal-abundance composition by affinity chromatography, and recloned in a suitable cloning vector, substantially as described with reference to Figure 4.

Figure 7 illustrates a second general method for producing a 5'-end fragment equal-abundance composition and library. In this approach, full-length mRNAs are employed to generate a library of directionally oriented duplex cDNAs in the plasmid vector pSV1932, following the procedure of Okayama and Berg (Okayama). Briefly, pSV1932 is linearized by digestion with KpnI, and equipped with 5' and 3' strand oligo dT tails at opposite ends of the linearized vector, by treatment with terminal transferase. The 5' strand oligo dT end is removed from the vector by digestion with HpaI, and the vector annealed through its single oligo dT tail to the poly A region of full-length mRNA. The attached mRNA is now copied with reverse transcriptase, and the ends of the vector provided with 5' and 3' end oligo dC ends in the presence of terminal transferase. The resulting construct is illustrated at the top of Figure 7. The method to this point follows published methods (Okayama).

The vector construct at the top of Figure 7 is cut with HindIII to release the 5' oligo dC end, the released segment is replaced by the HindIII/SfiI/NotI/PstI-oligo dG segment from Figure 2C, and the vector is circularized and ligated to form the plasmid shown in the middle of the figure. As seen, the resulting library vectors each contain a full-copy transcript cDNA duplex insert, oriented in a 5'-to-3' direction between HindIII, SfiI, NotI, and PstI sites at the 5' end, and a PvuII site at the 3' end. These library vectors are fragmented, e.g., by sonication, down to about 300-800 base pair size fragments, and the sonicated ends are repaired (blunt-ended) and equipped with SfiI linkers. Cleavage of the fragments with NotI and SfiI enzymes produces three classes of fragments: (a) 5'-end cDNA fragments having 5'-end NotI and 3'-end SfiI sticky ends; (b) fragments whose ends are both SfiI sticky ends; and (c) a 5'-end oligonucleotide derived from the 5'-end second strand primer. Here it is noted that the combination of both SfiI and NotI sites adjacent to the 5'-end of the cDNA inserts is necessary for generating a NotI site adjacent to the 5'-end of the insert and a predicted NotI/SfiI oligonucleotide vector fragment derived from the original plasmid. Prior to cloning, a size selection by agarose electrophoresis and gel electroelution is performed to isolate appropriately sized 5'-end cDNAs and exclude the NotI/SfiI oligonucleotide fragments. Selection of the 5'-end fragment types is made by cloning the fragments in a pGEM-3/NS or M13+/-/NS cloning vector cut at its NotI and SfiI sites, under conditions which favor circularization of the vector with the fragment inserts, and selecting for successful recombinants. Details of the method are described or referenced in Example 10.
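The two-stage selection just described (end type, then size) can be pictured as a simple filter. The following Python sketch is illustrative only: the `Fragment` type, the end labels, and the `min_bp` cutoff standing in for the gel size selection are hypothetical and are not taken from the procedure itself. It shows why the size selection is needed, since the short NotI/SfiI oligonucleotide carries the same pair of ends as the desired 5'-end cDNA fragments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    ends: tuple      # the two sticky-end types, e.g. ("NotI", "SfiI")
    length_bp: int   # fragment length in base pairs

def clonable_5prime_fragments(fragments, min_bp=100):
    """Keep fragments that fit a NotI/SfiI-cut vector and pass size selection.

    min_bp is a hypothetical cutoff standing in for the gel size selection.
    """
    return [
        f for f in fragments
        if sorted(f.ends) == ["NotI", "SfiI"] and f.length_bp >= min_bp
    ]

digest = [
    Fragment(("NotI", "SfiI"), 450),  # class (a): 5'-end cDNA fragment
    Fragment(("SfiI", "SfiI"), 600),  # class (b): internal fragment
    Fragment(("SfiI", "NotI"), 20),   # class (c): short oligonucleotide
]
selected = clonable_5prime_fragments(digest)
```

In this sketch only the class (a) fragment survives both tests; class (b) fails the end-type test and class (c) fails only the size test.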

F. Preparing Full-Length Equal-Abundance Library

In some applications, it is convenient to have full-length copies of the equal-abundance transcript species, rather than 3'-end or 5'-end transcript fragment species prepared as above. One application of the equal-abundance compositions, for example, is in identifying mRNA species which are unique to a particular cell type or state (Example IIA). Here it is generally useful to be able to isolate the full-length cDNA species either directly or by hybridization, in a single step, with a unique 3'-end or 5'-end fragment species. Another application which requires full-copy transcripts involves in vitro protein synthesis of transcript species for purposes of analyzing transcript products (Section IIC). One further use is the expression directly in eukaryotic cells of the full-length species. Such an application may be applied to the full-length species cloned into the pSV/SN vector (Figure 7).

In preparing the full-copy composition and library, the cloned equal-abundance 5'-end or 3'-end fragments are used as probes for total, full-length cellular mRNA transcripts, or their corresponding single-strand cDNAs. The preferred method, however, involves combined polyA selection and use of the 5'-end equal-abundance library as probe to insure isolation of a full-length transcript. Full-length transcripts which hybridize with the probes are separated from non-hybridized materials. This is done, for example, by labeling the equal-abundance transcript end-fragments with an affinity label, such as biotin, using the probes to select full-length transcripts, and separating the probe-hybridized transcripts from non-hybridized material by affinity chromatography on a matrix containing avidin. The bound full-length transcripts are then released and cloned to form a full-length equal-abundance library by the above methods.

Figure 8 illustrates the method generally, as it is applied to 3'-end or 5'-end fragment equal-abundance libraries contained in pGEM-3/NS or M13+/-/NS libraries. The particular vectors which are shown at the top in the figure are 5'-end fragment libraries such as generated according to the methods of Figure 6A or 7. The strategy when using a pGEM-3/NS cloning vector is to generate non-coding fragment transcripts, i.e., species which are capable of hybridizing with cellular mRNAs. This can be done, as above, by cutting the pGEM-3/NS cloning vectors at the 5'-end NotI site, and transcribing the non-coding cDNA strand with T7 polymerase. The non-coding transcripts can be photobiotinylated by published methods. Alternatively, the cloning vector can be used to generate coding RNA transcripts, and these can then be used to select equal-abundance first strand cDNA for library construction.

To use an end-fragment M13+/-/NS cloning library, the library plasmids are made single-stranded, by infection of the plasmid-containing host with an M13 helper phage. The single-strand encapsidated phage are then isolated and treated to release the single-strand DNA, as above. The phage DNA is photobiotinylated, as above.

The biotinylated end-fragment single-strand phage DNA is hybridized with a large molar excess of full-length cellular mRNAs (using - strand phage), or the corresponding single-strand cDNAs (using + strand phage), and an equal-abundance full-length transcript composition is prepared as above. The composition may be cloned, according to procedures described generally in Figure 3. Additional details of the method are provided in Example 11.

G. Direct Hybridization Methods

The methods for preparing equal-abundance libraries described in Sections IA-IF are based on hybridization of total transcript species with genomic fragments, and preferably single-copy genomic fragments. This section describes a second general method for preparing an equal-abundance composition. The method involves direct hybridization of complementary strands of cDNA derived from the total cellular transcripts of the cell of interest. The hybridization is carried out under concentration and temperature conditions which allow the bulk of the high- and moderate-abundance species, but not the low-abundance species, to hybridize with one another. That is, the hybridization reaction is carried out to a Cot value at which transcript species which are present at a relatively low concentration (corresponding approximately to the concentration of the lowest-abundance species) are in a predominantly (50%) non-annealed form. The hybridized duplex species are then separated from non-annealed molecules, e.g., by hydroxyapatite chromatography. The non-annealed molecules are present in substantially equal molar amounts, corresponding approximately to the concentration of the lowest-abundance species.
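The equalizing effect described above can be illustrated with a minimal Python sketch. It assumes ideal second-order reassociation kinetics, in which each species anneals only with its own complement and both strands start at the same concentration; the rate constant `k` and the starting concentrations are hypothetical values in arbitrary units, chosen only to span a 1000-fold abundance range.

```python
def single_stranded_conc(c0, k, t):
    """Non-annealed concentration after time t under ideal second-order
    reassociation: dC/dt = -k*C**2, so C(t) = c0 / (1 + k*c0*t)."""
    return c0 / (1.0 + k * c0 * t)

k = 1.0  # hypothetical rate constant, arbitrary units
abundances = {"high": 1000.0, "moderate": 100.0, "low": 1.0}  # hypothetical

# Stop the reaction when the lowest-abundance species is half annealed.
t = 1.0 / (k * abundances["low"])

remaining = {name: single_stranded_conc(c0, k, t)
             for name, c0 in abundances.items()}
spread = max(remaining.values()) / min(remaining.values())
```

Under these assumptions the 1000-fold starting range collapses to less than a two-fold range in the non-annealed fraction, because the abundant species anneal almost completely while the rarest species loses only half of its molecules.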

Alternatively, one of the strand mixtures can be labeled, such as by biotinylation, and the non-annealed molecules of the other strand separated by affinity chromatography, e.g., with avidin column support material.

The complementary strands which are used in the hybridization are preferably equal-size 3'-end or 5'-end fragments generated as above. The advantage of equal-size fragments is that the rate of annealing of any transcript species, and therefore the relative concentration of that species with respect to all other species, is substantially independent of transcript size. The complementary, equal-size single-strand material which is annealed may be prepared by excising cloned duplex library fragments from one of the total 5'-end or 3'-end transcript libraries described in Sections ID and IE above. In this embodiment, the excised library inserts are denatured by heating, then slowly annealed to a Cot value at which the different species are substantially equalized in molar concentrations.

In a second embodiment, the complementary sense and anti-sense strands are individually generated from the pGEM-3/NS library vector of total equal-size transcripts, as described above. The two populations of transcript strands are then annealed as above, until the transcript species are substantially equalized in concentrations.

In yet another embodiment, biotinylated sense or anti-sense transcript from pGEM-3/NS may be combined with M13/-/NS or M13/+/NS libraries, respectively, for direct hybridization, and the resulting equal-abundance single-strand phage recovered by direct transformation of appropriate bacterial hosts.

The selected Cot value which maximizes the extent to which all species are represented in substantially equal molar amounts, with a minimum loss of any transcript species, can be calculated from the estimated concentration in the mixture of the lowest-abundance species. Specifically, the Cot value selected is that which "precedes" the hybridization of any of the lowest-abundance species, as determined from the initial concentration of such a lowest-abundance species. After the initial hybridization and separation steps, the isolated non-annealed material is preferably carried through one or more additional hybridization/separation cycles, to further equalize the concentration of all of the species. The final equalized composition can be cloned in a suitable vector, such as pGEM-3/NS, as above.
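The effect of repeated hybridization/separation cycles can be sketched numerically. The Python fragment below is a simplified model, not the procedure of the examples: it assumes ideal second-order kinetics with no cross-hybridization, assumes that each cycle is stopped when the rarest remaining species is half annealed, and uses hypothetical starting concentrations and a hypothetical rate constant `k`.

```python
def anneal(conc, k, t):
    """One hybridization/separation cycle under ideal second-order kinetics.

    Returns the non-annealed concentration of each species.
    """
    return {name: c / (1.0 + k * c * t) for name, c in conc.items()}

def equalize(conc, k=1.0, cycles=3):
    """Run repeated cycles, each stopped when the rarest species is 50% annealed.

    Returns the final concentrations and the max/min ratio after each cycle.
    """
    spread = []
    for _ in range(cycles):
        t = 1.0 / (k * min(conc.values()))
        conc = anneal(conc, k, t)
        spread.append(max(conc.values()) / min(conc.values()))
    return conc, spread

start = {"high": 1000.0, "moderate": 100.0, "low": 1.0}  # hypothetical values
final, spread = equalize(start)
```

In this model the ratio between the most and least concentrated species moves closer to one with each cycle, while the rarest species is halved each time, which reflects the trade-off noted above between equal representation and loss of transcript material.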

Since the direct hybridization method just described does not require single-copy genomic DNA fragments, it is somewhat easier to prepare than the composition formed by hybridization to genomic fragments. However, the method has two limitations not shared by the genomic-fragment approach. First, the equal-size end-fragment species are likely to contain sequences (either 5'-end common untranslated leader or 3'-end polyA sequences) which are common to defined subsets or many if not all of the species. This would, in practice, increase the difficulty in selecting a Cot value to maximize equal-abundance representation. Secondly, it is inherently more difficult to achieve true equal-abundance concentrations of the many transcript species in the direct hybridization method. This limitation reflects the differences in the range of concentrations of desired coding species which are present in genomic fragments versus total cellular transcripts. In genomic fragments, the desired single-copy species are all present at about one copy per cell, whereas repetitive genomic fragments are generally present at relatively high copy numbers. Thus, a fairly sharp separation between repetitive and single-copy coding sequences can be made in the hybridization step used to remove repetitive DNA. Accordingly, the equal-abundance composition, which is formed by hybridization to the genomic fragments, has essentially the same uniformity of concentration among the different transcript species. By contrast, the total transcript material used in forming the equalized composition in the second method contains a continuum of concentrations between lowest- and highest-abundance species. It is accordingly more difficult to achieve (or identify) a Cot value at which the concentration of non-annealed molecules is about the same for all species.

II. Utility

A. Identifying Unique mRNA Species

Information about the presence or absence of low-abundance mRNAs is of interest in understanding the etiology of disease processes as well as fundamental cellular events relating to induction, infection, differentiation, and the like. In particular, information about the induction or repression of unique species of mRNAs would aid in (a) understanding the basis of various disease states at the gene level, (b) developing new methods for detecting cancerous or precancerous conditions, (c) diagnosing, studying, and isolating latent virus infections, and (d) studying changes in gene expression which occur during cell induction, activation, cell cycle progression, or the like.

In many of these cases, it is desired to identify unique mRNA species which are present in a test, but not a control cell, or inversely, are present in the control, but not the test cell. As the term is used herein, control cell is intended to mean a reference cell against which changes in the transcript composition of the test cell are measured. For example, in studying changes in cell type which occur as a result of viral infection, embryogenic change, activation, or the like, the control cell and test cell are typically a common cell type or common cell group, before and after the cell event of interest.

One method of identifying unique mRNA species, according to the invention, is the subtraction method illustrated in Figure 9. The particular method illustrated is designed for identifying one or more unique transcript species which are present in test cell, but not control cell transcripts. It will be appreciated that transcript species which are unique to the control cell can be identified by a similar method in which the roles of the control cell and test cell equal-abundance libraries are reversed.

The method requires initially the preparation of an equal-abundance library for both the test and control cells. Where both libraries are formed from a common genomic structure, such as the total genomic material from a common cell type, the libraries can each be produced from a common labeled genomic fragment preparation, in combination with the cellular mRNA preparation from either control or test cells. The only requirement is that both equal-abundance libraries be prepared by substantially the same method, and in particular, that the cloned equal-abundance fragments from one library be capable of hybridizing with the corresponding fragments in the other library.

The libraries shown in Figure 9 are 5'-end fragment pGEM-3/NS libraries produced according to methods described above. One of the libraries, and preferably the control library, is manipulated to produce non-coding transcript species, by cutting the library plasmids with NotI and transcribing the fragment insert with T7 polymerase. The other library is similarly manipulated to produce the coding strand of the corresponding test cell fragment inserts, by cutting the plasmids with SfiI, and generating coding-strand transcripts in the presence of SP6 polymerase.

To identify unique test cell transcripts, the control cell non-coding strand RNA fragments (or the corresponding cDNAs) are biotinylated, as above, and annealed in large molar excess with the test cell RNA fragments (or the corresponding cDNAs). Those test cell transcripts which hybridize with the control cell transcripts, i.e., those transcripts that are common to both cells, are removed by affinity chromatography, yielding only those test cell species which are not present in the control cell. These may now be used as a hybridization probe (after end labeling) to identify the unique transcript species present in that cell. Probes generated by these procedures will have greater sensitivity than those isolated by standard subtraction procedures. Example 13 below illustrates the method, as it is applied to identifying and isolating transcripts which are unique to EBV-activated peripheral blood lymphocytes (PBLs).

Figure 10 illustrates a second method for identifying transcripts which are unique to a test cell. The method involves first blot-transferring the plated library vectors from a test cell equal-abundance library onto a nitrocellulose filter, culturing the blot transfer colonies to amplify plasmids within individual clones, fixing plasmid DNA to the filter, then hybridizing with radiolabeled mRNA or cDNA obtained from the control cell equal-abundance library vectors. Labeled mRNA transcripts, produced, for example, by transcribing from a pGEM-3/NS cloning vector in the presence of radiolabeled ribonucleoside triphosphates, are preferred, since non-hybridized RNA can be removed by hydrolysis. Development of the filters against X-ray film shows radiolabeling at all test cell clones corresponding to control cell transcripts. By comparing the pattern of radiolabel spots with the positions of the test cell clones, those test cell transcripts which do not hybridize with control cell species can be identified. The identified test cell clones, such as those indicated at A and B in Figure 10, are preferably recloned on fresh medium, reblotted on filter paper, and confirmed for non-hybridization with labeled control cell transcripts. The method is illustrated, for detecting unique transcript species related to EBV activation in peripheral blood lymphocytes, in Example 14.

Unique transcript(s) identified and isolated by the methods just described may be cloned, used in an in vitro protein synthesizing system to analyze unique protein products of the test cell, and/or radiolabeled and used for probing and isolating "unique" test cell genes, and their corresponding full-length cDNA clones from appropriate libraries.

It can be appreciated that the equal-abundance composition permits identification of low- to very low-abundance unique transcript species, due to a reduction of background levels by several orders of magnitude, compared with conventional subtraction methods. The lower background is due both to the lower relative concentration of control cell fragments which are required to hybridize with the test cell transcript species, and to the much greater relative concentration of low-abundance species present in the test cell composition.

In the filter-hybridization method, the lower background is due to the much lower levels of radiolabeled species needed to detect species common to both control and test cells. In addition, this method would be highly impractical without an equal-abundance library, due to the very large number of library transcripts which would have to be examined.

B. Measuring Transcript Abundance

In many cell systems, changes in transcript composition between control and test cells are expected to involve changes in the levels of existing transcripts, rather than the appearance of new species or the loss of existing transcript species. The method of the present section is designed for detecting such transcript-level changes. The method is based on differential levels of binding of total mRNAs from control and test cells to the DNAs of an equal-abundance library.

The method, as outlined in Figure 11, involves plating of the equal-abundance transcript library onto two filters, one which will function for control cell hybridization, and the other, for test cell hybridization, after the plasmid DNA material is fixed to the filters.

Total cellular mRNA is isolated as above, i.e., using oligo dT chromatography to isolate total polyA RNAs. These can be labeled by polynucleotide kinase after limited base hydrolysis. Alternatively, total cellular mRNAs isolated from the two cell types can be reverse transcribed in the presence of radiolabeled nucleotide triphosphates, to produce labeled cDNA. Each of the labeled mRNA (or corresponding cDNA) preparations is added to a filter under hybridization conditions, to identify the corresponding library clones, in proportion to the total number of copies of each species originally present in the mRNA preparation. The method assumes that the total number of copies of each equal-abundance library vector is equal to or greater than the total number of mRNA molecules in the highest-abundance cellular mRNA species. For this reason, the filters are preferably prepared by growth on agar plates, as above, and under growth conditions which favor large copy number of the library vectors in plated bacteria.

After hybridization, the filters are washed, and if necessary, treated with ribonuclease to remove non-specific background associated with the RNA probe. The filters are developed against an X-ray film. The two plates in Figure 11 illustrate typical autoradiograms which are observed. Here each circle represents an individual recombinant colony from the plated library, and the density of dot shading within each circle represents the relative numbers of labeled mRNA molecules which have hybridized with each colony. In actuality, each plate would typically contain up to 5,000 or more colonies. The method is illustrated generally in Example 15.

As seen, the number of cellular transcripts which binds to all but two of the library vectors is about the same in control and test filters. In one of the library vectors, indicated by arrow A, the number of transcripts is substantially increased in the test cell, and in the other, indicated by arrow B, it is substantially reduced.

The colonies corresponding to those which show a change in mRNA abundance, between control and test cell, can be picked from the original library plate, replated, and retested by hybridization with labeled cellular mRNAs, to confirm the correlation between specific library transcripts and changes in the abundance of cellular mRNA. The isolated recombinant itself might also be used as a hybridization probe against mRNAs from test and control cells to confirm its reduced or increased level of expression.
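The comparison of control and test filters can be pictured as a per-colony ratio test. The Python sketch below is illustrative only; the colony identifiers, the signal values, and the `fold` threshold used to decide what counts as a "substantial" change are all hypothetical, and the text does not prescribe any particular quantitation scheme.

```python
def changed_colonies(control, test, fold=3.0):
    """Return colonies whose test/control signal ratio differs by at least `fold`.

    control, test: dicts mapping colony id -> hybridization signal (arbitrary units).
    fold: hypothetical threshold for a "substantial" change.
    """
    flagged = {}
    for colony, c_signal in control.items():
        t_signal = test[colony]
        if c_signal <= 0 or t_signal <= 0:
            continue  # no usable signal on one filter; would need separate handling
        ratio = t_signal / c_signal
        if ratio >= fold:
            flagged[colony] = "increased"
        elif ratio <= 1.0 / fold:
            flagged[colony] = "reduced"
    return flagged

control = {"c1": 10.0, "c2": 12.0, "A": 5.0, "B": 40.0}   # hypothetical signals
test = {"c1": 11.0, "c2": 12.5, "A": 45.0, "B": 4.0}
result = changed_colonies(control, test)
```

Colonies flagged in this way would correspond to those picked for replating and retesting as described above.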

The feasibility of the present method is due to the relatively manageable number of recombinant clones making up the equal-abundance library which are examined for hybridization with cellular mRNAs. Where the control cell library represents the entire genome, the total number of distinct library recombinants needed to guarantee representation of nearly all cellular transcripts is about 100,000, as discussed above. Assuming a cell density of about 5,000 per plate, the entire screening procedure can be limited to only 20 sets of filters.
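The screening arithmetic above can be checked with a short Python sketch. The function name is hypothetical; the two input figures are those given in the text.

```python
import math

def filter_sets_needed(total_recombinants, colonies_per_plate):
    """Number of plates (each giving one control/test filter set) needed
    to screen every recombinant once."""
    return math.ceil(total_recombinants / colonies_per_plate)

sets = filter_sets_needed(100_000, 5_000)  # 20 sets of filters
```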

C. Identifying Transcript Products

Another general application of the invention is in identifying total or unique transcript products associated with a selected genomic structure, such as total genomic DNA, isolated chromosomes, or large genomic fragments.

In the simplest case, an equal-abundance library is used to generate full-length equal-abundance mRNAs from a selected cell. This may be done, as indicated above, by employing a 5'-end fragment or 3'-end fragment equal-abundance library to isolate by hybridization an equal-abundance composition of full-length cellular mRNAs. Alternatively, full-length coding transcripts can be generated from a full-length equal-abundance library, prepared as in Section IF. Since the full-length transcripts are present in substantially equal molar amounts, in vitro protein synthesis in a conventional protein synthesizing system produces substantially equal numbers of all transcript protein products. These, in turn, can be displayed by two-dimensional electrophoresis, yielding a pattern of protein bands representing virtually all the protein products of the cell. The pattern gives information not available from conventional transcript preparations in that (a) proteins normally present in high abundance do not mask nearby protein bands and (b) proteins normally present in small abundance are present in detectable amounts. The technique therefore is limited only by the number of proteins which co-migrate, due to similar size and charge.

The method can be extended to identifying cellular products which are unique to a test cell. This is done by comparing the pattern of bands in the 2-D gel electrophoresis patterns of control cell and test cell equal-abundance transcripts, as illustrated in Figure 1

Figure 13 illustrates methods for generating equal-abundance transcript products from defined genomic structures. The first structure is a fraction which is substantially enriched for human chromosome 7, according to well known methods. A DNA library containing such material may be purchased commercially from the American Type Culture Collection (Rockville, MD). The fragments prepared with the enriched-for chromosome, when hybridized with total cellular transcripts, yield an equal abundance of all of the transcripts which are actively expressed by chromosome 7. The equal-abundance library, in turn, is used to produce an equal-abundance composition of full-length transcripts which, in an in vitro protein synthesizing system, yields approximately equal molar amounts of all of the protein products produced by genes actively expressed in human chromosome 7.

The size of the genomic structure can be narrowed still further, for example, to study the gene products of a selected size fragment of an isolated chromosome, such as chromosome 7. Figure 13 shows a method for preparing a library of this type. Briefly, an equal-abundance library for chromosome 7 is prepared as above, and an equal-abundance library for an isolated, sized genomic fragment, such as a NotI segment (N.-N.) of chromosome 7, is similarly produced. These libraries may be used to select their respective equal-abundance transcripts, which, after in vitro translation, can be compared by 2-D gel electrophoresis. mRNA transcripts produced by in vitro transcription of non-coding and coding strands from these two separate respective libraries may also be hybridized, as above, and non-hybridized material, corresponding to non-overlapping transcripts, removed by affinity chromatography, if the selecting transcript segments have previously been labeled by biotin. The C7/N.N. equal-abundance transcripts can be used for identifying full-length cDNA clones for in vitro transcription and translation of the NotI segment specified proteins.

The following examples illustrate methods for producing size-class, 3'-end fragment, 5'-end fragment, and full-length equal-abundance libraries, according to the invention, and methods for using the libraries to analyze differences in transcript quantities and types between non-activated and EBV-activated peripheral blood lymphocytes. The examples are intended to illustrate preferred methods of preparation and particular uses of the composition of the invention, but are in no way intended to limit the scope of the methods or applications to other cell types or other lymphocyte states.

Materials and Methods

pGEM-3 is obtained from Promega Biotech (Madison, WI); Bluescribe M13+/- and helper M13 phage, from Stratagene (San Diego, CA); and E. coli strain DH5 and E. coli strain JM101, from Bethesda Research Labs (Bethesda, MD).

Terminal transferase (calf thymus), alkaline phosphatase (calf intestine), polynucleotide kinase, Klenow reagent, and S1 nuclease are all obtained from Boehringer Mannheim Biochemicals (Indianapolis, IN); SP6 and T7 polymerase, from Promega Biotech; and proteinase K, RNAse and DNAse, from Sigma (St. Louis, MO).

NotI, SfiI, HindIII, EcoRI, KpnI, PstI, HpaI, T4 DNA ligase and T4 DNA polymerase are obtained from New England Biolabs (Beverly, MA); oligo dT primer and oligo dA and oligo dT cellulose, from PL Biochemicals (Milwaukee, WI); Chelex-100, from Bio-Rad (Richmond, CA); Sephadex G-50, from Pharmacia (Piscataway, NJ); streptavidin agarose, from Bethesda Research Labs (Bethesda, MD); and photobiotin, from Clontech Labs (Palo Alto, CA).

Synthetic oligonucleotides for vector modifications to introduce NotI and SfiI linkers are prepared by either the phosphotriester method as described by Edge, et al., Nature (supra) and Duckworth, et al., Nucleic Acids Res (1981) :1691 or the phosphoramidite method as described by Beaucage, S.L., and Caruthers, M.H., Tet Letts (1981) 22:1859 and Matteucci, M.D., and Caruthers, M.H., J Am Chem Soc (1981) 103:3185, and can be prepared using commercially available automated oligonucleotide synthesizers. Alternatively, custom designed synthetic oligonucleotides may be purchased, for example, from Synthetic Genetics (San Diego, CA). Kinasing of single strands prior to annealing or for labeling is achieved using an excess, e.g., approximately 10 units of polynucleotide kinase to 1 nmole substrate in the presence of 50 mM Tris, pH 7.6, 10 mM MgCl2, 5 mM dithiothreitol, 1-2 mM ATP, 1.7 pmoles γ32P-ATP (2.9 mCi/mmole), 0.1 mM spermidine, 0.1 mM EDTA. Site specific DNA cleavage is performed by treating with the suitable restriction enzyme (or enzymes) under conditions which are generally understood in the art, and the particulars of which are specified by the manufacturer of these commercially available restriction enzymes. See, e.g., New England Biolabs, Product Catalog. In general, about 1 μg of plasmid or DNA sequence is cleaved by one unit of enzyme in about 20 μl of buffer solution; in the examples herein, typically, an excess of restriction enzyme is used to insure complete digestion of the DNA substrate.

Incubation times of about one hour to two hours at about 37°C are workable, although variations can be easily tolerated. After each incubation, protein is removed by extraction with phenol/chloroform, and may be followed by ether extraction, and the nucleic acid recovered from aqueous fractions by precipitation with ethanol (70%). If desired, size separation of the cleaved fragments may be performed by polyacrylamide gel or agarose gel electrophoresis using standard techniques. A general description of size separations is found in Methods in Enzymology (1980) 65:499-560.

Restriction cleaved fragments may be blunt ended by treating with the large fragment of E. coli DNA polymerase I (Klenow) in the presence of the four deoxynucleotide triphosphates (dNTPs) using incubation times of about 15 to 25 min at 20 to 25°C in 50 mM Tris pH 7.6, 50 mM NaCl, 6 mM MgCl2, 6 mM DTT and 0.1-1.0 mM dNTPs. The Klenow fragment fills in at 5' single-stranded overhangs but chews back protruding 3' single strands, even though the four dNTPs are present. If desired, selective repair can be performed by supplying only one of the, or selected, dNTPs within the limitations dictated by the nature of the overhang. After treatment with Klenow, the mixture is extracted with phenol/chloroform and ethanol precipitated. Treatment under appropriate conditions with S1 nuclease results in hydrolysis of any single-stranded portions of DNA. In particular, the nicking of 5' hairpins formed on synthesis of cDNA is achieved.

Ligations are performed in 15-50 μl volumes under the following standard conditions and temperatures: for example, 20 mM Tris-Cl pH 7.5, 10 mM MgCl2, 10 mM DTT, 33 μg/ml BSA, 10 mM-50 mM NaCl, and either 40 μM ATP, 0.01-0.02 (Weiss) units T4 DNA ligase at 14°C (for "sticky end" ligation) or 1 mM ATP, 0.3-0.6 (Weiss) units T4 DNA ligase at 14°C (for "blunt end" ligation). Intermolecular "sticky end" ligations are usually performed at 33-100 μg/ml total DNA concentrations (5-100 nM total end concentration). Intermolecular blunt end ligations are performed at 1 μM total ends concentration.
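The relation between a mass concentration of DNA and its molar concentration of ends can be sketched as follows. This is an illustrative calculation only: the average mass of 660 g/mol per base pair is a common approximation and an assumption here, and the function name and the 4,000 bp fragment length are hypothetical.

```python
# Assumption: average mass of one base pair of duplex DNA, in g/mol.
BP_G_PER_MOL = 660.0


def end_concentration_nM(dna_ug_per_ml, fragment_bp):
    """Convert a duplex DNA mass concentration to nM of fragment ends.

    Each linear duplex fragment contributes two ends.
    """
    grams_per_liter = dna_ug_per_ml * 1e-3  # 1 ug/ml equals 1e-3 g/L
    molecules_molar = grams_per_liter / (BP_G_PER_MOL * fragment_bp)
    return 2 * molecules_molar * 1e9


# Hypothetical example: 33 ug/ml of 4,000 bp linear fragments.
print(f"{end_concentration_nM(33.0, 4000):.1f} nM ends")
```

Under these assumptions, shorter fragments at the same mass concentration give proportionally more ends, which is why the stated range of end concentrations is wide.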

In vector construction employing "vector fragments", the vector fragment is commonly treated with bacterial alkaline phosphatase (BAP) or calf intestinal alkaline phosphatase (CIP) in order to remove the 5' phosphate and prevent self-ligation of the vector. Digestions are conducted at pH 8 in approximately 10 mM Tris-HCl, 1 mM EDTA using about 1 unit per μg of BAP at 60°C for one hour or 1 unit of CIP per μg of vector at 37°C for about one hour. In order to recover the nucleic acid fragments, the preparation is extracted with phenol/chloroform and ethanol precipitated. Alternatively, religation can be prevented in vectors which have been double digested by additional restriction enzyme digestion and separation of the unwanted fragments.

Example 1

Preparation of Single-Copy Genomic DNA

A. DNA Isolation

Peripheral blood lymphocytes (PBLs) are derived from normal individuals and T cells are removed by Ficoll-Hypaque gradient (Kaplan, M.E., & Clark, C., 1979, J. Immunol. Meth. 5, 131-135). The chromosomal DNA is isolated by proteinase K digestion in the presence of 1.5% sodium dodecyl sulfate (SDS), and 50 mM EDTA, pH 7.5, followed by successive phenol and phenol/chloroform (1:1) extractions, according to standard procedures (Maniatis). The DNA is redissolved in 0.15 M potassium phosphate buffer, pH 7.0 (PB) and passed over a Chelex 100 column to remove metal ions.

DNA concentration is determined by absorbance at 260 nm. The purity of the material is confirmed by spectrophotometric studies on melting (Britten). The material is precipitated with ethanol, and stored as a 70% ethanol precipitate at -70°C until used.

B. DNA Fragmentation

An appropriate amount of DNA from above is collected by centrifugation and dissolved in 10 ml of 0.06 M Na acetate to a DNA concentration of about 2 OD260 units/ml. The DNA solution is then diluted to 30 ml with glycerol, giving a final solution which is 0.02 M Na acetate and about 66% glycerol. This material is placed in a 50-ml high-speed blender, and cooled by immersing the sides of the blender in a dry-ice/ethanol bath. Blending is begun as the solution cools, before it becomes too viscous. The material is blended at 50,000 rpm for 30 minutes. Two volumes of cold ethanol are added to the blended solution, and the material is allowed to stand in the freezer for two hours. The DNA precipitate is collected by centrifugation at 10,000 g for 15 minutes.
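The dilution figures in the fragmentation step can be checked with a short calculation. The sketch below assumes that volumes are simply additive, which is only approximately true for glycerol and water; the function name is hypothetical.

```python
def dilute(stock_molar, stock_volume_ml, final_volume_ml):
    """Return (final molarity, fraction of final volume that is added diluent).

    Assumes volumes are additive, an approximation for glycerol/water mixtures.
    """
    final_molar = stock_molar * stock_volume_ml / final_volume_ml
    diluent_fraction = (final_volume_ml - stock_volume_ml) / final_volume_ml
    return final_molar, diluent_fraction


# 10 ml of 0.06 M Na acetate brought to 30 ml with glycerol.
molar, glycerol = dilute(0.06, 10.0, 30.0)
print(f"{molar:.3f} M Na acetate, {glycerol:.1%} glycerol by volume")
```

The result agrees with the stated final composition of 0.02 M Na acetate and roughly two-thirds glycerol.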

The range of DNA fragment size, which is preferably about 200-800 bases, is confirmed by agarose gel electrophoresis, according to standard procedures.

C. Removing Repetitive-Sequence DNA

The DNA-fragment sample is dissolved in 0.12 M PB, 0.2 mM EDTA. Repetitive-sequence DNA is removed by standard hybridization methods which are detailed in the literature (Britten). Briefly, the DNA is raised to about 10°C above the melting temperature (Tm), as determined for example by absorption at OD260. In the buffer used above, the Tm is between about 80-90°C.

The material is then cooled slowly to about 25°C below the Tm, and allowed to anneal to a Cot value (mole/liter x sec) of about 100, at which the repeat-sequence material is predominantly in reannealed form, and the non-repetitive fraction, in denatured form. This duplex material is separated from single-strand DNA by hydroxyapatite (HAP) chromatography, according to standard procedures (Britten). Briefly, HAP is suspended in 0.15 M PB, 2 mM EDTA, and poured into a water-jacketed column maintained at the reannealing temperature. After washing the column with several volumes of the reannealing buffer, the DNA material is loaded onto the column and the single-strand material eluted with several volumes of the buffer. This material is combined, and precipitated with cold ethanol, as above.

The precipitated single-strand material is redissolved in annealing buffer, and the entire separation procedure repeated, except that the reannealing is performed at a temperature about 10°C below the above Tm value.
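Because Cot is the product of nucleotide concentration (mole/liter) and time (seconds), the incubation time needed to reach a target Cot follows directly from the DNA concentration. The following is a minimal sketch, assuming an average nucleotide residue mass of 330 g/mol; the function names and the 500 μg/ml example concentration are hypothetical and are not taken from the procedure above.

```python
# Assumption: average mass of one nucleotide residue in DNA, in g/mol.
NUCLEOTIDE_G_PER_MOL = 330.0


def nucleotide_molarity(dna_ug_per_ml):
    """Moles of nucleotide per liter for a given DNA mass concentration."""
    grams_per_liter = dna_ug_per_ml * 1e-3  # 1 ug/ml equals 1e-3 g/L
    return grams_per_liter / NUCLEOTIDE_G_PER_MOL


def seconds_to_reach_cot(target_cot, dna_ug_per_ml):
    """Annealing time (s) required so that concentration x time = target_cot."""
    return target_cot / nucleotide_molarity(dna_ug_per_ml)


# Hypothetical example: time to reach Cot 100 at 500 ug/ml DNA.
hours = seconds_to_reach_cot(100, 500.0) / 3600
print(f"about {hours:.1f} hours")
```

Halving the DNA concentration doubles the required time, so concentrated samples are preferred when a high Cot value is needed.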

Example 2

Preparation of Biotinylated Genomic DNA

Double-stranded, single-copy genomic DNA from Example 1 is biotinylated according to one of five methods detailed below. The biotinylated nucleotides used are Bio-11-dUTP (Brigati) which has an 11-atom linker arm separating the biotin and the pyrimidine base, and Bio-19-SS-dUTP (Herman) which has a 19-atom linker containing a disulfide bond. 32P-labeled dNTPs are included when monitoring of the various steps of the method is desired.

A. Nick-Translation

A typical reaction, carried out in 60 μl final volume, contains 1 μg DNA in 50 mM Tris-Cl pH 7.5, 10 mM MgSO4, 0.1 mM DTT, 100 mM of each of the following nucleotides: dATP, dGTP, and Bio-11-dUTP or Bio-19-SS-dUTP, 5 μCi of [α-32P] dCTP (Amersham, specific activity 3,000 Ci/mmole), 30 U DNA polymerase I, and 27 pg/ml DNAse I. The reaction mixture is incubated at 14°C for one hour, stopped by addition of EDTA to 10 mM and heated at 68°C for 5 min. Labeled DNA is recovered by chromatography over Sephadex G50 equilibrated and eluted with 10 mM Tris-Cl, pH 7.5/1 mM EDTA (T.E.). When large amounts of DNA are required, two to three nick-translations are run in parallel and loaded onto one column to obtain a concentrated DNA solution.

B. Tailing by Terminal Transferase

This procedure is used only after the DNA is first treated to produce 3' protruding ends (Maniatis). The reaction mixture consists of 1 μg DNA in 100 mM potassium cacodylate (pH 7.2), 2 mM CoCl2, 0.2 mM DTT, 100 μM Bio-11-dUTP, 50 μCi [α-32P] dCTP, and 20 U terminal transferase, added last. After incubation at 37°C for 45 min, an additional 20 U of enzyme is added and the incubation repeated. The reaction is terminated by EDTA added to 10 mM, the DNA is recovered as described above, precipitated with ethanol, washed with 70% ethanol and resuspended in 50 μl of T.E.

C. Labeling by T4 DNA Polymerase Replacement Reaction

The reaction contains 1 μg of DNA in 33 mM Tris-OAc (pH 7.9), 66 mM NaOAc, 10 mM MgOAc, 0.5 mM DTT, 0.1 mg/ml BSA, and 0.5 U T4 DNA polymerase. After incubation at 37°C for 7 minutes, dATP, dGTP, and Bio-11-dUTP are added to a final concentration of 150 μM, dCTP is added to 10 μM, 50 μCi of [α-32P] dCTP (3000 Ci/mmole), and Tris-OAc, NaOAc, MgOAc, BSA, and DTT are added to maintain previous concentrations. This reaction is incubated at 37°C for 30 min, then dCTP is added to a concentration of 150 μM, and the reaction incubated for an extra 60 min at 37°C. The reaction is stopped by addition of EDTA to 10 μM, heated at 65°C for 10 min, chromatographed and processed as described before.

D. Klenow Fill-in Reaction

This is carried out following standard protocols (Maniatis); incubation is at room temperature for 15 min.

E. Labeling by Photobiotinylation

This is carried out by standard procedures, as outlined in the protocol supplied by the manufacturer (Clontech, Palo Alto, CA).

Example 3

Preparation of Equal-Abundance Size-Class Composition: Method 1

A. Isolation of B-Lymphocyte mRNA

RNA is isolated from B-cells prepared as in Example 1, according to standard procedures (Maniatis, p. 187), which use vanadyl ribonucleoside complexes to inhibit RNAse. The total RNA preparation is fractionated by oligo dT chromatography, also according to well-known procedures (Maniatis, p. 211), yielding a polyA mRNA fraction.

B. Size Fractionation of mRNA

The polyA mRNA preparation from part A is fractionated by electrophoresis on 1% agarose, using glyoxal and dimethylsulfoxide to denature RNA (Maniatis, p. 150). A standard plot of log RNA size as a function of migration distance is prepared using standard RNA size standards. Three size fractions of RNA are collected: 500-2,000; 2,000-4,000; and greater than 4,000 base pairs. The RNA is eluted from the three gel regions, by performing phenol extractions on the frozen gel slices, and then collected by ethanol precipitation.
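The standard plot of log RNA size against migration distance can be sketched as a least-squares line that is then used to assign a gel position to one of the three size classes. The marker sizes and migration distances below are hypothetical values chosen for illustration, as are the function names; real standards would be measured on the gel.

```python
import math


def fit_log_size(standards):
    """Least-squares fit of log10(size) against migration distance.

    standards: list of (distance_cm, size_in_bases) pairs.
    Returns (slope, intercept).
    """
    xs = [d for d, _ in standards]
    ys = [math.log10(s) for _, s in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance_cm, slope, intercept):
    """Interpolate RNA size from migration distance using the fitted line."""
    return 10 ** (slope * distance_cm + intercept)


def size_class(size):
    """Assign a size to one of the three collected fractions, or None."""
    if size < 500:
        return None
    if size <= 2000:
        return "500-2,000"
    if size <= 4000:
        return "2,000-4,000"
    return ">4,000"


# Hypothetical size standards: (migration distance in cm, size in bases).
markers = [(1.0, 8000), (2.0, 4000), (3.0, 2000), (4.0, 1000), (5.0, 500)]
slope, intercept = fit_log_size(markers)
print(size_class(estimate_size(2.5, slope, intercept)))
```

In practice the log-linear relation holds only over the middle of the gel, so the fit is best restricted to standards that bracket the region being cut.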

C. Preparation of Single-Strand cDNA

The polyA mRNA from part B, from the 2,000-4,000 size class, is used to obtain single-strand cDNA transcripts according to the method of Maniatis, et al. (supra). Briefly, a portion of the polyA RNA is treated under appropriate buffer conditions with reverse transcriptase in the presence of poly dT primer, and the four nucleotide triphosphates. The complex is treated with base to destroy the remaining mRNA, and the single-strand cDNA is isolated by ethanol precipitation.

D. cDNA Hybridization with Single-Copy Genomic DNA

The single-strand cDNA from part C is dissolved in 0.15 M PB, 2 mM EDTA, and mixed with a solution of the single-copy biotinylated genomic DNA from Example 2, at a relative concentration of about 250 OD260 units cDNA to 1 OD260 unit biotinylated genomic DNA, where the OD measurement for the genomic fraction is determined in the denatured state.

The combined fractions are heated to about 10°C above the Tm (Example 1), until the genomic DNA has been completely denatured, as determined by the hyperchromic effect at OD260. The material is then cooled to about 25°C below the Tm and the reannealing reaction is allowed to proceed to a Cot value of about 5,000, or until the reannealing process, as monitored by hyperchromic effects at OD260, stabilizes.

E. Separation of Size-Class Equal-Abundance Transcripts

A 1 ml silanized syringe plugged with silanized glass wool is packed with 0.3 ml streptavidin-agarose and washed with 0.15 M PB, 2 mM EDTA. The hybridization mixture from 3D is loaded onto the column which is then washed with several volumes of the hybridization buffer, to remove non-hybridized cDNA.

The column is then heated to about 10°C above the Tm value (Example 1) for about 10 minutes, then washed with heated buffer (at the same Tm + 10°C temperature) to elute the desired equal-abundance single-strand cDNA. The cDNA material which is eluted is cooled to 4°C and precipitated overnight with ethanol at -20°C.

Example 4

Preparation of Equal-Abundance Size Class Libraries

A. Equal-Abundance pGEM-3/NS Library

The precipitated equal-abundance first-strand cDNA from Example 3E is taken up in 10 mM Tris-HCl, pH 8.3 containing 0.15 M KCl and 10 mM MgCl2, and converted to duplex cDNA as above. The full-copy cDNAs are blunt ended and ligated at their free (3') ends with SfiI linkers. The duplex cDNA is cut with nuclease S1, to cleave the molecules at their 5' ends, repaired with Klenow reagent and ligated with NotI linkers. The duplex molecules are digested with NotI and SfiI, to remove redundant end linkers, yielding duplex molecules with 5' NotI and 3' SfiI sticky ends. The mix is heat treated to denature the restriction enzymes. pGEM3 plasmid is modified to contain internal NotI and SfiI cloning sites, substantially as described in Section IB. The modified plasmid, designated pGEM-3/NS, is digested with NotI and SfiI to open the vector at its unique NotI and adjacent SfiI sites. The linearized plasmid fragment is isolated by electroelution after agarose gel electrophoresis and then treated with alkaline phosphatase prior to mixing with the NotI/SfiI cDNA fragments from above, under conditions which promote circularization of single plasmid fragments, and ligated to form circularized plasmids. The plasmid includes, in a 5'-to-3' direction, an SP6 promoter, a unique NotI site, the full-copy cDNA insert, a unique SfiI site, and the T7 RNA polymerase promoter. The circularized plasmid is selected on E. coli strain DH5, and successful recombinants are selected for ampicillin resistance.

The cell density of the plating step is such as to yield about 5,000 colonies per plate, on a total of about 20 plates.

B. Equal-Abundance M13+/-/NS Library

The precipitated equal-abundance first-strand cDNA is treated as in Example 4A to produce double-strand equal-abundance cDNAs with 5'-end NotI and 3'-end SfiI sites. M13+/- is modified to contain internal NotI and SfiI cloning sites, substantially as described in Section IB. The modified plasmid, designated M13+/-/NS, is digested with NotI and SfiI to open the vector at its unique NotI and adjacent SfiI sites. The linearized plasmid fragment is isolated by electroelution after agarose gel electrophoresis and then treated with alkaline phosphatase prior to mixing with the NotI/SfiI cDNA fragments from above, under conditions which promote circularization of single plasmid fragments, and ligated to form circularized plasmids. The plasmid includes, in a 5'-to-3' direction, a T7 polymerase promoter, a unique NotI site, the full-copy cDNA insert, a unique SfiI site, and a T3 polymerase promoter. The circularized plasmid is selected on E. coli strain JM101, and successful recombinants are selected for ampicillin resistance. The cell density of the plating step is such as to yield about 5,000 colonies per plate, on a total of about 20 plates.

Example 5

Equal-Abundance Size-Class Composition: Method 2

This method is suitable for cell systems in which limited amounts of cellular mRNA are available, requiring an initial transcript cloning step to generate amplified amounts of total transcript species.

Total full-length mRNA from Example 3A is reverse-transcribed to form full-length, duplex cDNAs according to standard procedures, using oligo dT priming for first-strand synthesis. The full-copy cDNAs are equipped with 5'-end NotI and 3'-end SfiI sites, as in Example 4B, and these fragments are inserted into the NotI/SfiI site of pGEM-3/NS, also as in Example 4B.

Successful recombinant plasmids, selected for ampicillin resistance on E. coli strain DH5, are treated with SfiI, to open the plasmids at the 3' end of the inserts. The linearized vector is mixed with SP6 RNA polymerase in the presence of ribonucleosides, under conditions specified by the manufacturer (Promega Biotech, Bulletin # 001), giving mRNA transcription from the SP6 promoter of the 5' strand, and yielding coding RNA transcripts.

The now amplified coding RNA is transcribed to single-strand cDNA, as above, using oligo dT priming, and hybridized with labeled genomic fragments from Example 1. A 250 fold molar excess of the single-strand cDNA is hybridized with the biotinylated genomic DNA fragments from Examples 1 and 2, and separated from non-hybridized cDNA by affinity chromatography on a streptavidin column, as in Examples 3D and 3E. The hybridized cDNA released from the column, which is the desired equal-abundance composition, is made double stranded, equipped with 5'-end NotI and 3'-end SfiI sticky ends, and cloned into a pGEM-3/NS or M13+/-/NS cloning vector, as in Example 4B or 4C, respectively.

Example 6

Equal-Abundance Size-Class Composition: Method 3

This example describes an alternative method for preparing an equal-abundance transcript composition when limited amounts of cellular mRNA are available.

Total full-length mRNA from Example 3A is transcribed to form full-length, duplex cDNAs as above. The full-length cDNAs are equipped with 5'-end NotI and 3'-end SfiI sites, as in Example 4B, and these fragments are inserted into the NotI/SfiI site of M13+/-/NS, as in Example 4C.

E. coli strain JM101 harboring the recombinant M13+/-/NS plasmids is infected with the M13 helper phage (VCS-M13 helper phage supplied by the manufacturer), the latter permitting encapsidation of the M13+/-/NS single-strand DNA derived from the plasmids in the infected cells. Coinfection and culture conditions are performed according to published methods. The encapsidated recombinant phage are now isolated from the culture, and single-strand DNA material prepared by conventional procedures (Messing). The single-strand DNA is hybridized with labeled genomic fragments from Example 1. A 250 fold molar excess of the single-strand cDNA is hybridized with the biotinylated genomic DNA fragments from Examples 1 and 2, and then separated from non-hybridized cDNA by affinity chromatography on a streptavidin column, as in Examples 3D and 3E.

The hybridized phage material which is released from the column contains the desired equal-abundance transcript species. The single-strand phage is now converted back to its plasmid form by transforming E. coli strain JM101. Transformed colonies are then selected for ampicillin resistance provided by the M13+/-/NS plasmid. The density of cells is such as to give about 5,000 colonies, on a total of 20 plates.

Example 7

Preparation of an Equal-Abundance 3'-End cDNA Library: Method 1

A. Preparation of 3'-End cDNAs

PBL mRNA is isolated as in Example 3A, and suspended in 20 mM Tris-HCl, 1 mM EDTA, pH 7.0 at 4°C. To the RNA solution is added RNAse A (1 unit/ml), and the mixture is incubated at 10°C under conditions which produce RNA fragments predominantly in the 300-500 base pair regions. The reaction conditions may be established by digestion at 10°C for increasing time periods, and monitoring the size distribution of the RNA with agarose gel electrophoresis.

The fragmented RNA is extracted with phenol/chloroform, precipitated with ethanol, and redissolved in 20 mM Tris-HCl, 0.5 mM NaCl, 1 mM EDTA, 0.1% SDS, pH 7.6. The material is then fractionated by oligo dT column chromatography, by standard procedures (Maniatis, p. 197) to isolate 3'-end fragments containing polyA. The 3' fragments are primed with poly dT and copied, as in Example 3, to produce 3'-end single-strand cDNA.

B. Preparation of 3'-End Equal-Abundance Library

A 250 fold molar excess of the 3'-end cDNAs from above is mixed with the biotinylated genomic fragments from Examples 1 and 2, and the two DNA fractions are hybridized and bound on a streptavidin column, as described in Examples 3D and 3E. After washing the column to remove non-bound (abundant) cDNAs, the equal-abundance cDNA species are eluted by heating. The eluted material is made double stranded (Maniatis, p. 214), equipped with 5'-end NotI and 3'-end SfiI linkers, and inserted into pGEM-3/NS as in Example 4B. Successful recombinants are selected on E. coli strain DH5, at a cell density of about 5,000/plate, on a total of 20 plates.

Example 8

Preparation of a 3'-End Equal-Abundance Library: Method 2

Total mRNA from Example 3A is taken up in 10 mM Tris-HCl, pH 8.3 containing 0.15 M KCl and 10 mM MgCl2, and converted to duplex full-length cDNAs (containing a 5'-end hairpin), using oligo dT first-strand priming, as above. The 3' ends of the full-copy cDNAs are repaired with Klenow reagent, and ligated with SfiI linkers (Example 4B). The cDNAs are fragmented by sonication, until fragment sizes predominantly between about 300-700 base pairs in length are obtained. The size distribution of the fragments as a function of sonication time can be followed by gel electrophoresis. The staggered ends of the fragments are repaired with Klenow reagent, and the fragments are ligated with NotI linkers, as above. Digestion of the fragments with both NotI and SfiI yields 3'-end fragments with 3'-end SfiI and 5'-end NotI sticky ends. All of the other fragments contain either NotI sites at both ends or NotI/hairpin opposite ends (the 5'-end fragments).

The 3'-end fragments are inserted into the NotI and SfiI sites of pGEM-3/NS as in Example 4B, and successful recombinants are selected for ampicillin resistance. A total of about 10^5 clones are selected, as above.

Example 9

Preparation of a 5'-End Equal-Abundance Library: Method 1

Total cellular mRNA prepared as above is further selected for full-length intact mRNA molecules. This is accomplished by performing oligo dT selection, as performed above, to select for post-transcriptional 3'-end processing, followed by further isolating the intact mRNA species by a procedure which is specific for intact processed (capped) 5' ends (Lewin). The latter method uses chromatography on phenol-boronic agarose (Manley), or affinity chromatography based on RNA binding to anti-cap (processed 5' end) antibody.

The double-strand cDNA fragments produced from such doubly selected mRNAs are then sonicated prior to blunt-end repair by Klenow reagent and ligation with SfiI linkers, as above. The fragments, which include 5'-end fragments having hairpin ends, are treated with nuclease S1 and ligated with NotI linkers as in Example 4B. Digestion of the fragments with both NotI and SfiI yields 5'-end fragments with 5'-end NotI and 3'-end SfiI sticky ends. All of the other fragments contain SfiI sites at both ends.

The 5'-end fragments are inserted into the NotI and SfiI sites of pGEM-3/NS as in Example 4B, and successful recombinants are selected for ampicillin resistance. Successful recombinants are grown in liquid culture in the presence of chloramphenicol, to enhance plasmid replication, and the plasmids are isolated from the cells according to standard procedures.

The isolated pGEM-3/NS plasmids are digested with SfiI, and treated with SP6 polymerase, as in Example 5. The resulting RNA fragments are tailed with poly A (Maniatis). Oligo dT priming is now used to form single-strand cDNAs and these are hybridized with the labeled genomic fragments from Examples 1 and 2, and processed to form an equal-abundance cDNA composition as in Examples 3D and 3E. The equal-abundance cDNAs are made double-stranded, equipped with 5'-end NotI and 3'-end SfiI sticky ends, and cloned into pGEM-3/NS as in Example 4B to form the desired 5'-end fragment library.

Example 10

Preparation of 5'-End Total and Equal-Abundance Fragment Libraries: Method 2

A. Preparing a Full-Length cDNA Library

Total polyA RNA prepared as in Example 9 is used to obtain a full-length cDNA library according to the method of Okayama and Berg. This method differs from the usual cDNA cloning method in that (1) the plasmid vector DNA functions as the primer for the synthesis of the first cDNA strand, and (2) the second strand is prepared by priming with an oligo dG tailed fragment. Briefly, the pBR322 plasmid is opened at its unique KpnI site, with addition of oligo dT linkers to the cleaved ends. The vector is now cut at its HpaI site to remove the 5'-end oligo dT linker, and the large plasmid fragment isolated. The plasmid fragment, with a 3' oligo dT linker, is annealed to the polyA RNA, and the first strand cDNA is produced by copying the RNA in the presence of reverse transcriptase and the four deoxyribonucleotides, followed by addition of oligo dC tails to the single-strand cDNA using terminal transferase. The plasmid is now digested with HindIII to remove a HindIII/HpaI segment which has terminal dC base pairs.

The HindIII treated plasmid is separated from the HindIII/HpaI fragment and ligated with the HindIII/NotI/SfiI/PstI-oligo dG fragment produced substantially in accordance with Section IB, under conditions which favor circularization of the plasmid with the insert. Ligation yields a plasmid with a full-copy transcript insert bounded at its 5'-end by HindIII, NotI, SfiI, and PstI sites, and at its 3'-end by a PvuII site, as illustrated at the center in Figure 7. The plasmid, after ligation, is added to a liquid culture of E. coli DH5; the culture is grown for 1 hour, then switched to ampicillin. Addition of chloramphenicol is used to amplify plasmid growth in the bacteria.

Full-length cDNAs cloned into the pSV/SN vector may be transfected directly into eukaryotic cells (e.g., COS7) to promote expression of the insert utilizing the eukaryotic transcriptional elements provided by the SV40 sequences in the vector.

B. Preparing a Total 5'-End Fragment Library

Plasmids from the transformed bacteria from above are isolated by well-known methods (Maniatis) and the plasmid is fragmented by sonication into sizes of between about 300-500 base pairs, as in Example 8. The staggered ends of the plasmid fragments are blunt-end repaired, and ligated with SfiI linkers as in Example 4B, then cut with SfiI to remove redundant linkers and release plasmid-derived segments from the 5'-ends of the 5'-end insert fragments. The fragments are then treated with NotI to create a NotI sticky end at the 5'-ends of the 5' NotI/SfiI cDNA fragments, which are isolated from the SfiI/NotI linker segment by agarose gel electrophoresis prior to cloning into pGEM-3/NS as in Example 4B, with selection for ampicillin resistance. The pGEM-3/NS library vectors are used as in Example 9 to generate an equal-abundance 5'-end transcript library.

Example 11

Preparing a Full-Length Equal-Abundance Library

The 5'-end fragment equal-abundance library from Example 9 or 10 is digested with NotI and SfiI to release the 5'-end fragment inserts, and the total DNA is biotinylated by nick translation, as in Example 2A. Total full-length mRNA from Example 9 is used to prepare full-length single-strand cDNA, as in Example 3C. The full-length, single-strand DNA material is added in 250-fold molar excess to the biotinylated 5'-end equal-abundance fragments, and the mixture is hybridized, as in Example 3D. Single-strand material which hybridizes to the 5'-end fragments is separated by affinity chromatography. The isolated single-strand cDNA transcript material is made double stranded, and equipped with 5'-end NotI and 3'-end SfiI sticky ends as in Example 4B, and cloned into the NotI/SfiI site of a pGEM-1/NS vector, as in Example 4B, or into the NotI/SfiI site of a M13+/-/NS vector, as in Example 4C.

Example 12

Preparing an Equal-Abundance Library from EBV-Activated PBLs

PBLs from a normal individual are isolated as in Example 1. The isolated B cells are cultured at 10^7/ml for 1 hr at 37°C in IMDM medium containing 10% concentrated EBV supernatant from the marmoset line B95-8 (Engleman, p 454). The infected cells are washed twice and then transferred to a 75 cm2 tissue culture flask, at a density of about 5 x 10^4 cells/ml in IMDM medium with 10% fetal calf serum. Transformed colonies are evident at 1 week and the culture is maintained for 21 days at 37°C under 95% O2/5% CO2.

After culturing, the cells are washed two times with wash buffer (150 mM NaCl, 50 mM Tris, pH 8.3, 5 mM EDTA and 50 mM freshly added β-mercaptoethanol), with low-speed centrifugation to pellet the cells. The pelleted cells are extracted conventionally for mRNA (Maniatis). The total polyA RNA fraction from the EBV-activated cells is prepared as in Example 3A.

The total polyA RNA fraction from EBV-activated PBLs is used to make full-length, double-strand cDNAs, and these are used, in conjunction with the biotinylated genomic fragments from Examples 1 and 2, to produce a total 5'-end fragment library in pGEM-3/NS, following the general procedures in Example 9.

Example 13

Identification of RNA Sequences Unique to EBV-Activated B-Lymphocytes: Method 1

pGEM-3/NS 5'-end fragment equal representation libraries from B-lymphocytes not subjected to EBV transformation (normal-cell library) and from EBV-activated B-lymphocytes (activated-cell library) are prepared as described in Examples 9 or 10, and 12, respectively.

The normal-cell library vectors are linearized at the NotI site and incubated with T7 RNA polymerase, in the presence of all four ribonucleotide triphosphates, to generate the non-coding RNA strand of the library insert (Figure 8). The RNA is photobiotinylated according to published techniques (Example 2E).

The activated-cell library is linearized with SfiI, and similarly reacted with SP6 RNA polymerase and ribonucleotide triphosphates, to generate the coding RNAs transcribed from the 5'-end fragment library inserts. This RNA preparation is mixed with about a 10 fold molar excess of the biotinylated non-coding RNA from above, and the two RNA fractions are hybridized under slow annealing conditions. The reaction is carried to a Cot value of about 5,000 and/or may be monitored at OD260 to determine virtual reaction completion. The reaction mixture, which contains non-hybridized biotinylated RNA, biotinylated RNA/activated-cell RNA hybrids, and unhybridized unique activated-cell RNA, is fractionated by affinity chromatography, using a streptavidin column. The initial eluate (non-bound material) contains the desired coding mRNA which is unique to EBV-activated cells.

The RNA fragments from above (the transcript fragments which are unique to the activated cells) are tailed with poly A at the 3' ends and these are reverse transcribed to form duplex cDNAs by first-strand oligo dT priming. The duplex molecules are equipped with 5'-end NotI and 3'-end SfiI sticky ends, and cloned into the modified transcription vector pGEM-3/NS as in Example 4B.

Example 14

Identification of RNA Sequences Unique to EBV-Activated B-Lymphocytes: Method 2

The test cell 5'-end fragment equal-abundance library from Example 12 is plated, as described, on about 20 plates, at about 5,000 cells/plate. The colonies are replica plated onto nitrocellulose filters, and plasmid DNA fixed to the filters according to known procedures (Maniatis, p. 316).

Control cell 5'-end fragment equal-abundance library plasmids from Example 9 are treated with SfiI and transcribed with SP6 in the presence of all four ribonucleotide triphosphates, including 32P-labeled UTP, to form radiolabeled RNA fragments.

The radiolabeled control cell probe fragments are added to the nitrocellulose filter, under hybridization conditions (Maniatis, p. 326). The filters are then washed, and placed in contact with X-ray film, for radiolabel development. The filter spots which do not show radiolabeling represent colonies whose library plasmids are unique to EBV-activated cells. These colonies may be picked from the original plates for eventual confirmation of their unique expression in the test cell (EBV-activated B-lymphocytes) by dot blot hybridization against RNA isolated from activated and non-activated B-lymphocytes. Alternatively, the unique activated RNA transcripts from Example 13 may be subjected to limited alkaline hydrolysis and end-labeled with polynucleotide kinase for use as a hybridization probe against the test cell 5'-end fragment equal-abundance library.

Example 15

Determining the Relative Abundance of Test and Control Cell Transcripts

Each plate of the control cell equal representation library (total of 20 plates) from Example 9 is replica plated on each of two detergent free nitrocellulose filters, and the plasmid DNA is fixed to the filters as in Example 14. A total of 40 filters (2 per plate) are prepared.

Total mRNA is isolated from non-activated B-lymphocytes (control cells) and from EBV-activated B-cells (test cells) as in Examples 3A and 12, respectively, and each RNA fraction is selected for full-length message by oligo dT and phenyl borate chromatography, as in Example 9.

The respective control and test cell RNAs are subjected to limited alkaline hydrolysis and then end-labeled with polynucleotide kinase for use as hybridization probes against a replica filter set. All of the filters are developed against X-ray film for a period which is found to give good differential labeling among the spots on the filters. The extent of hybridization associated with each filter spot is estimated qualitatively, or can be quantified, for example, by a digital optical reader which is designed to output the coordinates of each spot, and the density of dark spots (radioactive decay) associated with each spot.

The density of spots on each control cell filter is compared with that from the corresponding test cell filter, to identify clones which have hybridized a substantially different amount of probe, and hence display a differential transcript abundance and level of expression. These clones are picked from the corresponding control cell plate and plated for rescreening to confirm their differential hybridization. The same experiment performed on an equal-representation library from test cells will identify expressed transcripts either unique to the activated (EBV-transformed) B-cell or exhibiting differential abundance.
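The filter-to-filter comparison could be automated along the following lines once the optical reader has produced a density for each spot. This is a sketch under stated assumptions: the three-fold cutoff, the background floor, the spot identifiers, and the density values are all hypothetical and are not specified in the procedure above.

```python
def differential_clones(control, test, min_fold=3.0, floor=1.0):
    """Flag clones whose test/control density ratio differs by >= min_fold.

    control, test: dicts mapping a spot identifier to its measured density.
    floor: assumed background level, used to avoid division by zero.
    Returns a dict of spot identifier -> fold change (test over control).
    """
    flagged = {}
    for clone in control.keys() & test.keys():
        c = max(control[clone], floor)
        t = max(test[clone], floor)
        fold = t / c
        if fold >= min_fold or fold <= 1.0 / min_fold:
            flagged[clone] = fold
    return flagged


# Hypothetical densities for three spots on one replica filter pair.
control_filter = {"p01-A3": 120.0, "p01-B7": 40.0, "p02-C1": 300.0}
test_filter = {"p01-A3": 110.0, "p01-B7": 400.0, "p02-C1": 20.0}

flagged = differential_clones(control_filter, test_filter)
for clone in sorted(flagged):
    print(clone, round(flagged[clone], 2))
```

Flagged clones would still be picked and rescreened as described, since a single film exposure cannot by itself confirm a difference in abundance.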

While preferred methods, uses and embodiments have been described herein, it will be apparent to those in the field that various changes and modifications can be made, and the invention applied to a variety of cell systems, without departing from the scope of the invention.