

Title:
METHOD AND SYSTEM FOR IDENTIFYING BIOLOGICAL ENTITIES IN BIOLOGICAL AND ENVIRONMENTAL SAMPLES
Document Type and Number:
WIPO Patent Application WO/2005/017488
Kind Code:
A2
Abstract:
The present invention pertains to the identification of unique genomic sequences and unique oligonucleotide sequences that can be utilized to identify biological entities in biological or environmental samples. The present invention includes the use of these unique genomic sequences to generate probes, targets or primers, for the purpose of identifying known, unknown and genetically engineered entities in samples. The present invention provides unique genomic sequences, inferred unique genomic sequences and unique oligonucleotide sequences that identify biological entities. The present invention permits detection and identification of a plurality of biological entities from a single sample, and enables identification of closely related strains and genetically engineered biological entities.

Inventors:
ELEY GREGORY DANIEL
VOCKLEY JOSEPH GEORGE
CAPUCO JUSTIN ANTHONY
ROBINSON DOREEN A
SCHAUDIES PAUL R
Application Number:
US2004/002000
Publication Date:
February 24, 2005
Filing Date:
January 23, 2004
Assignee:
SCIENCE APPLICATIONS INTERNATIONAL CORPORATION (10260 Campus Point Drive, San Diego, CA, 92121, US)
International Classes:
G05B15/00; G06F7/00; G06F19/22; G11C17/00; G06F19/20; C12N; G01N; (IPC1-7): G01N/
Other References:
DENG W. ET AL.: 'Genome Sequence of Yersinia pestis KIM' JOURNAL OF BACTERIOLOGY vol. 184, no. 16, August 2002, pages 4601 - 4611, XP003005391
PARKHILL J. ET AL.: 'Genome sequence of Yersinia pestis, the causative agent of plague' NATURE vol. 413, 04 October 2001, pages 523 - 527, XP002240957
Attorney, Agent or Firm:
MARCOU, George, T. et al. (Kilpatrick Stockton LLP, 607 Fourteenth Street N.W., Suite 90, Washington DC, 20005, US)
Claims:

10. The isolated unique genomic sequence of Claim 2, wherein the isolated unique genomic sequence is any one of SEQ ID NOs: 883 to 975 and the biological organism is Bacillus anthracis.
11. The isolated unique genomic sequence of Claim 2, wherein the isolated unique genomic sequence is any one of SEQ ID NOs: 976 to 1013 and the biological organism is Dengue virus.
12. The isolated unique genomic sequence of Claim 2, wherein the isolated unique genomic sequence is any one of SEQ ID NOs: 1014 to 1017 and the biological organism is Ebola virus.
13. The isolated unique genomic sequence of Claim 2, wherein the isolated unique genomic sequence is any one of SEQ ID NOs: 1018 to 1019 and the biological organism is Arbovirus.
14. The isolated unique genomic sequence of Claim 2, wherein the isolated unique genomic sequence is any one of SEQ ID NOs: 1020 to 1023 and the biological organism is Francisella tularensis.
15. An inferred unique genomic sequence comprising an isolated nucleic acid sequence of any one of SEQ ID NOs: 1024 to 1029 or any one of SEQ ID NOs: 2072 to 3241.
16. The inferred unique genomic sequence of Claim 15, wherein the isolated nucleic acid sequence is from a biological organism.
17. The inferred unique genomic sequence of Claim 16, wherein the isolated nucleic acid sequence is any one of SEQ ID NOs: 1024 to 1029 and the biological organism is Vaccinia.
18. A target comprising a unique oligonucleotide sequence of any one of SEQ ID NOs: 1030 to 2071.
19. The target of Claim 18, wherein the target is capable of hybridizing to a nucleic acid sequence from Bacillus anthracis, Dengue virus, Ebola virus, Arbovirus, Francisella tularensis, Clostridium perfringens, Escherichia coli, Vaccinia, Yersinia pestis or Brucella melitensis.

20. The target of Claim 19, wherein the target is capable of hybridizing to a nucleic acid sequence from Bacillus anthracis and the target is any one of SEQ ID NOs: 1609 to 1884.
21. The target of Claim 19, wherein the target is capable of hybridizing to a nucleic acid sequence from Dengue virus, and the target is any one of SEQ ID NOs: 2001 to 2010.
22. The target of Claim 19, wherein the target is capable of hybridizing to a nucleic acid sequence from Ebola virus and the target is any one of SEQ ID NOs: 1900 to 2000.
23. The target of Claim 19, wherein the target is capable of hybridizing to a nucleic acid sequence from Francisella tularensis and the target is any one of SEQ ID NOs: 1885 to 1899.
24. The target of Claim 19, wherein the target is capable of hybridizing to a nucleic acid sequence from Clostridium perfringens and the target is any one of SEQ ID NOs: 1345 to 1461.
25. The target of Claim 19, wherein the target is capable of hybridizing to a nucleic acid sequence from Escherichia coli and the target is any one of SEQ ID NOs: 1129 to 1344.
26. The target of Claim 25, wherein the Escherichia coli is Escherichia coli O157:H7 or Escherichia coli K12.
27. The target of Claim 19, wherein the target is capable of hybridizing to a nucleic acid sequence from Vaccinia and the target is any one of SEQ ID NOs: 1462 to 1608.
28. The target of Claim 19, wherein the target is capable of hybridizing to a nucleic acid sequence from Yersinia pestis and the target is any one of SEQ ID NOs: 1030 to 1103.
29. The target of Claim 19, wherein the target is capable of hybridizing to a nucleic acid sequence from Brucella melitensis and the target is any one of SEQ ID NOs: 1104 to 1128.
30. An array for detection of at least one biological entity comprising unique oligonucleotide sequences bound to the array in predetermined locations, wherein the unique oligonucleotide sequences can hybridize to unique genomic sequences from the at least one biological entity.
31. The array of Claim 30, wherein the unique oligonucleotide sequences are immobilized on a surface of the array.

32. The array of Claim 30, wherein the unique oligonucleotide sequences comprises at least one of any of SEQ ID NOs: 1030 to 2071.
33. The array of Claim 30, wherein the at least one biological entity is Bacillus anthracis, Dengue virus, Ebola virus, Arbovirus, Francisella tularensis, Clostridium perfringens, Escherichia coli, Vaccinia, Yersinia pestis, Brucella melitensis or a combination thereof.
34. A method of identifying a biological organism in a sample comprising: immobilizing unique oligonucleotide sequences in predetermined locations on an array, wherein the predetermined locations are associated with a known biological organism; applying a sample containing labeled nucleic acid sequences from the biological organism to the array; permitting the immobilized unique oligonucleotide sequences on the array to hybridize with complementary labeled nucleic acid sequences from the biological organism; and, detecting the labeled nucleic acid sequences hybridized to the unique oligonucleotide sequences in predetermined locations on the array, wherein the location of the label identifies the biological organism, and the labeled nucleic acid sequences hybridized to the unique oligonucleotide sequences in predetermined locations on the array are termed unique genomic sequences.
35. The method of Claim 34, wherein the unique genomic sequences are genomic fragments of DNA, coding sequences, non-coding sequences, restriction fragments of DNA, RNA, primers, targets, probes, or PCR products.
36. The method of Claim 34, wherein the unique genomic sequences comprise at least one of any of SEQ ID NOs: 1 to 1023.
37. The method of Claim 34, wherein the unique oligonucleotide sequences comprise at least one of any of SEQ ID NOs: 1030 to 2071.
38. The method of claim 34, wherein the sample is an environmental sample, a clinical sample, a biological sample, or a food sample.
39. The method of claim 34, wherein the sample comprises at least one biological entity.

40. The method of claim 39, wherein the at least one biological entity is selected from the group consisting of Acytota, prokaryotes, eukaryotes, Protista, Fungi, Plantae, Animalia and Monera.
41. The method of claim 39, wherein the biological entity is genetically engineered.
42. The method of claim 39, wherein the biological entity is a pathogen.
43. The method of claim 39, wherein the biological entity is Bacillus anthracis, Dengue virus, Ebola virus, Arbovirus, Francisella tularensis, Clostridium perfringens, Escherichia coli O157:H7, Escherichia coli K12, Vaccinia, Yersinia pestis, or Brucella melitensis.
44. The method of Claim 34, wherein the labeled nucleic acids are enzymatically detected.
45. The method of Claim 34, wherein the labeled nucleic acids are labeled with digoxigenin, biotin, a fluorescent label, or a radiolabel.
46. The method of Claim 34, wherein the unique oligonucleotide sequences are more than 30 nucleotides in length.
47. Use of a target comprising a unique oligonucleotide sequence of any one of SEQ ID NOs: 1030 to 2071 for identification of a biological organism.
48. Use of a unique genomic sequence of any one of SEQ ID NOs: 1 to 1023 for identification of a unique oligonucleotide sequence.
49. Use of an inferred unique genomic sequence of any one of SEQ ID NOs: 1024 to 1029 or of any one of SEQ ID NOs: 2072 to 3241 for identification of a unique oligonucleotide sequence.
Description:

METHOD AND SYSTEM FOR IDENTIFYING BIOLOGICAL ENTITIES IN BIOLOGICAL AND ENVIRONMENTAL SAMPLES

The U.S. Government has certain rights to this invention. The development of this invention was partially funded by the United States government under a grant from the United States Federal Bureau of Investigation. The sequence listing is herewith submitted on a compact disk containing the file named 36609-2825371fr.ST25.txt, 1,325,056 bytes in size, created January 22, 2004, and Table 3 is herewith submitted on a compact disk containing the file named Table_3.txt, 868,352 bytes in size, created January 23, 2004, and both compact disks are hereby incorporated by reference in their entirety.

FIELD OF THE INVENTION

Embodiments of the invention relate to the identification of unique genomic sequences that are informative of the biological characteristics (e.g., presence, abundance, virulence, genetic modification) of a sample, along with systems and methods of using such sequences for gathering information on one or more biological entities or sets of biological entities present in the sample. Specific embodiments relate to microbial organisms. More particularly, the present invention includes the use of the unique genomic sequences to generate probes, targets or primers for the purpose of identifying known, unknown and genetically engineered biological entities from complex samples. Embodiments of the present invention allow for the detection and identification of a plurality of naturally occurring and recombinant biological entities from a single sample, with the further ability to identify and differentiate closely related strains or genetically engineered biological entities.

BACKGROUND

Genes, natural units of hereditary material, are the physical basis for the transmission of the characteristics of biological entities from one generation to another. The basic genetic material is fundamentally the same in all biological entities. It consists of chain-like molecules of nucleic acids (deoxyribonucleic acid (DNA) in most organisms and ribonucleic acid (RNA) in certain viruses) and is usually associated in a linear or circular arrangement that, in part, constitutes chromosomes and extra-chromosomal elements, such as micro-chromosomal bodies.

The entire hereditary material in a cell is called the "genome." In addition to the DNA contained in the nucleus, an organism's cells contain DNA in other locations within those cells, e.g., bacteria also contain some DNA in plasmids, plants also contain some DNA in plastids, and animals also contain some DNA in mitochondria.

A set of biological entities, such as a species, has a genome, e.g., the complete sequence of genes characteristic of the set. Some portions of the genome are unique to the particular set, e.g., set-unique sequences. Example sets include strain, species, genus, family, group, clade, and other ad hoc sets.
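The notion of a set-unique sequence can be illustrated with a toy sketch. This is not the procedure described in this application; it is a simple k-mer set difference, and the function names, the toy sequences and the choice of k are assumptions made purely for illustration. It also ignores reverse complements and sequencing errors.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def set_unique_kmers(target_genomes, other_genomes, k):
    """k-mers found in every target genome and in none of the other genomes."""
    shared = set.intersection(*(kmers(g, k) for g in target_genomes))
    background = set().union(*(kmers(g, k) for g in other_genomes))
    return shared - background


# Toy sequences (hypothetical, not taken from any real genome).
strain_a = "ATGCGTACGTTAGC"
strain_b = "ATGCGTACGTTAGG"
outgroup = "ATGCGTTTTTTAGC"

# The "set" here is {strain_a, strain_b}; the outgroup is everything else.
unique = set_unique_kmers([strain_a, strain_b], [outgroup], k=5)
```

In this toy case the 5-mers shared by both strains but present in the outgroup (such as ATGCG) are excluded, leaving only subsequences that distinguish the set from the outgroup.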

Historically, the theory, principles, and process of classifying biological entities into sets (e.g., taxonomic classification) are based on the work of eighteenth-century biologist Carl Linnaeus. Linnaeus created the taxonomy system of kingdom, phylum, class, order, family, genus, and species. Known as the Linnaean system, this rank-based taxonomy is still in use today. Other bases for classifying organisms have been proposed, including some based on phylogeny, i.e., the evolutionary development of biological entities. For example, in contrast to the rank-based codes, the PhyloCode will provide rules for the express purpose of naming clades and species through explicit reference to phylogeny. See, e.g., http://www.ohiou.edu/phylocode/index.html, accessed January 14, 2004.

Bacterial and viral organisms exhibit significant regions of homology among their genomes. Standard methods of discriminating between individuals in human populations, such as single nucleotide polymorphism (SNP) analysis, are not applicable to the smaller bacterial and viral genomes. There is a need for a method of identifying regions of unique, species-specific sequence within a genome that can be used to discriminate between biological entities, species and strains. Approximately 300 microbial genomes have been completely or partially sequenced through 2003. In spite of this wealth of information, existing methods for the detection and characterization of microbes are limited by the availability of unique sequence information within the genomes of these biological entities. Frequently, only small fragments of genomic sequences are identified as unique and subsequently useful for the identification of an organism.

Current nucleotide-based methods of identifying microbiological entities rely on primer-requiring multiplex PCR methods or oligonucleotide microarrays that utilize the limited amount of unique nucleic acid sequence available from ribosomal genes (approximately 1% of the genome) or costly shotgun approaches aimed at entire genomes. As with most essential housekeeping genes, there is selective pressure to conserve ribosomal gene sequences across species to maintain functional regions. This conservation limits the size of unique regions that can be used for oligonucleotide design. Furthermore, microarrays that only contain ribosomal genes cannot detect the presence of virulence factors. Accordingly, what is needed is a method to identify substantially all unique genomic sequence within a genome in order to provide more unique genomic sequence from which to prepare unique oligonucleotide sequences.

The genomic composition of an organism, RNA or DNA, contains unique and conserved nucleic acid sequences. Nucleic acid sequences that are unique to an organism can be used to establish the identity of that organism at the species and strain level (Wilson KH, et al., Appl. Environ. Microbiol. 2002 May; 68(5): 2535-41; Small J, et al., Appl. Environ. Microbiol. 2001 Oct; 67(10): 4708-16; Al-Khaldi SF, et al., J. AOAC Int. 2002 Jul-Aug; 85(4): 906-10). Similarly, the identity of an organism can be established by identifying the presence of certain conserved sequences (Jansen R, et al., OMICS. 2002; 6(1): 23-33). Known methods for detecting an organism include the use of species-specific ribosomal deoxyribonucleic acid sequences to indicate the presence of a single organism (see, e.g., Matsuki T, et al., Appl. Environ. Microbiol. 2002 Nov; 68(11): 5445-51), as well as species-specific nucleic acid sequences to indicate the presence of a small plurality of biological entities (Wilson WJ, et al., Mol. Cell. Probes. 2002 Apr; 16(2): 119-27).

Unique genomic sequences in an organism's genome include both coding and non-coding sequences. Coding sequences are sequences that are further processed into proteins or polypeptides, typically performing a single function. These sequences are frequently conserved across genus and species (Sanchez-Contreras M, et al., Appl. Environ. Microbiol. 2000 Aug; 66(8): 3621-3). Conserved coding sequences can include genes that code for enzymatic elements, structural elements, virulence factors or developmental specific functions and processes. An example of conserved coding sequences includes the genomic sequences that encode for ribosomal genes in prokaryotic biological entities (Kuwahara T, et al., Microbiol. Immunol. 2001; 45(3): 191-9; Roth A, et al., J. Clin. Microbiol. 2000 Mar; 38(3): 1094-104). These sequences can be used to identify a particular species based on the ribosomal sequences they contain.

Non-coding sequences are sequences that are not further processed and do not appear to possess a known function at this time. These sequences may be contained in a portion of the genome that contains unique coding sequences as well as between conserved coding sequences. Since non-coding sequences do not provide a known function, they are frequently overlooked as unimportant genomic material. However, unique non-coding sequences can be used to identify an organism, just as unique coding sequences are used (Roth A, et al., J. Clin. Microbiol. 2000 Mar; 38(3): 1094-104). Informative sequences can reflect a variety of features, e.g., structural, functional, metabolic, virulence. See, e.g., Schoolnik et al., Microb. Physiol. Review 2002; 46: 1-45.

As noted by the National Center for Biotechnology Information (NCBI) at http://www.ncbi.nlm.nih.gov/BLAST/blastoverview.shtml (accessed January 5, 2004), BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore genetic sequence databases available through NCBI. The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity to distant sequence relationships. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses a heuristic algorithm that seeks local as opposed to global alignments and is therefore able to detect relationships among sequences that share only isolated regions of similarity.

The Expected Value ("E") as noted in BLAST search results is a parameter that describes the number of hits of the type shown that one can expect to see just by chance when searching a database of a particular size. It decreases exponentially with the Score ("S") that is assigned to a match between two sequences. E can be interpreted as the random background noise that exists for matches between sequences. For example, an E value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size, one might expect to see one match with a similar score simply by chance. This can be interpreted to mean that the lower the E-value, or the closer it is to "0", the more significant the match is.
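The relationship between E and S described above is commonly written in the Karlin-Altschul form E = K·m·n·e^(−λS), where m is the query length and n is the database length. The sketch below assumes that form; the function name is hypothetical, and the K and λ values are illustrative placeholders, since the real values depend on the scoring system used.

```python
import math


def expected_hits(score, query_len, db_len, lam, k):
    """Expected number of chance hits with at least this raw score.

    Assumes the Karlin-Altschul form E = K * m * n * exp(-lambda * S).
    lam and k depend on the scoring system; callers must supply them.
    """
    return k * query_len * db_len * math.exp(-lam * score)


# Placeholder parameters chosen only to show the trend (assumptions).
LAM, K = 1.0, 0.5

e_low_score = expected_hits(30, query_len=50, db_len=1_000_000, lam=LAM, k=K)
e_high_score = expected_hits(40, query_len=50, db_len=1_000_000, lam=LAM, k=K)
e_bigger_db = expected_hits(40, query_len=50, db_len=2_000_000, lam=LAM, k=K)
```

Two properties follow directly from the formula: a higher score gives an exponentially smaller E, and doubling the database size doubles E for the same score.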

Accordingly, what is needed is a method to identify unique genomic sequences, and a process to rapidly characterize biological entities that is not species-restricted, or even organism-restricted. What is also needed is a method to detect and identify numerous dissimilar or closely related biological entities from an individual sample. Also needed are unique genomic sequences that are useful in identifying unique oligonucleotide sequences. What is also needed are arrays containing these unique oligonucleotide sequences.

SUMMARY OF INVENTION

The present invention provides compositions comprising nucleotide sequences comprising isolated unique genomic sequences, inferred unique genomic sequences and unique oligonucleotide sequences. The present invention provides methods of using these isolated unique genomic sequences, inferred unique genomic sequences and unique oligonucleotide sequences to identify biological organisms and entities. This invention also provides arrays comprising unique oligonucleotide sequences wherein the arrays are useful for identifying nucleic acids associated with biological organisms and entities in samples. The present invention includes a method for the generation of isolated unique genomic sequences, inferred unique genomic sequences and unique oligonucleotide sequences useful for the identification of biological organisms and entities in samples, for example species and strains of bacteria, fungi, viruses, and the like.

The present invention provides compositions comprising nucleotide sequences comprising isolated unique genomic sequences as shown in SEQ ID NOs: 1 to 1023. These isolated unique genomic sequences are from biological organisms such as Bacillus anthracis, Dengue virus, Ebola virus, Arbovirus, Francisella tularensis, Clostridium perfringens, Escherichia coli (Escherichia coli O157:H7 and Escherichia coli K12), Vaccinia, Yersinia pestis and Brucella melitensis. Among the SEQ ID NOs: 1 to 1023 that represent the isolated unique genomic sequences provided by the present invention, the specific sequences associated with specific biological organisms are the following: SEQ ID NOs: 586 to 827 and Escherichia coli O157:H7; SEQ ID NOs: 828 to 882 and Escherichia coli K12; SEQ ID NOs: 1 to 15 and Yersinia pestis; SEQ ID NOs: 16 to 22 and Brucella melitensis; SEQ ID NOs: 23 to 30 and Vaccinia; SEQ ID NOs: 31 to 585 and Clostridium perfringens; SEQ ID NOs: 883 to 975 and Bacillus anthracis; SEQ ID NOs: 976 to 1013 and Dengue virus; SEQ ID NOs: 1014 to 1017 and Ebola virus; SEQ ID NOs: 1018 to 1019 and Arbovirus; and, SEQ ID NOs: 1020 to 1023 and Francisella tularensis.
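The assignment of SEQ ID ranges to organisms listed above can be expressed as a small lookup table. The following is a minimal sketch: the ranges are copied from the paragraph above, while the table name, function name and the use of a binary search are assumptions made for illustration.

```python
from bisect import bisect_right

# (first SEQ ID, last SEQ ID, organism), sorted by first SEQ ID.
# Ranges are those stated in the text for the isolated unique genomic sequences.
GENOMIC_SEQ_RANGES = [
    (1, 15, "Yersinia pestis"),
    (16, 22, "Brucella melitensis"),
    (23, 30, "Vaccinia"),
    (31, 585, "Clostridium perfringens"),
    (586, 827, "Escherichia coli O157:H7"),
    (828, 882, "Escherichia coli K12"),
    (883, 975, "Bacillus anthracis"),
    (976, 1013, "Dengue virus"),
    (1014, 1017, "Ebola virus"),
    (1018, 1019, "Arbovirus"),
    (1020, 1023, "Francisella tularensis"),
]
_STARTS = [start for start, _, _ in GENOMIC_SEQ_RANGES]


def organism_for_seq_id(seq_id):
    """Return the organism for a genomic SEQ ID, or None if it is out of range."""
    idx = bisect_right(_STARTS, seq_id) - 1
    if idx >= 0:
        start, end, organism = GENOMIC_SEQ_RANGES[idx]
        if start <= seq_id <= end:
            return organism
    return None
```

A SEQ ID outside 1 to 1023 (for example 1024, which belongs to the inferred unique genomic sequences) returns None rather than being assigned to a neighbouring range.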

The unique genomic sequences of the present invention are useful for identification of unique oligonucleotide sequences.

SEQ ID NOs: 1024 to 1029 and SEQ ID NOs: 2072 to 3241, which represent the inferred unique genomic sequences provided by the present invention, are also associated with specific organisms and are described in the specification. The inferred unique genomic sequences of the present invention are useful for identification of unique oligonucleotide sequences.

The present invention provides compositions comprising nucleotide sequences comprising unique oligonucleotide sequences as shown in SEQ ID NOs: 1030 to 2071 for identification of a biological organism or entity. These unique oligonucleotide sequences are useful as targets on arrays for hybridization with probes in samples containing nucleic acids in order to identify the organism or entity containing or providing the nucleic acids. These isolated unique oligonucleotide sequences can hybridize with nucleic acid sequences from biological organisms such as Bacillus anthracis, Dengue virus, Ebola virus, Arbovirus, Francisella tularensis, Clostridium perfringens, Escherichia coli (Escherichia coli O157:H7 and Escherichia coli K12), Vaccinia, Yersinia pestis and Brucella melitensis. Among the SEQ ID NOs: 1030 to 2071 that represent the unique oligonucleotide sequences provided by the present invention, the specific sequences associated with specific biological organisms are the following: SEQ ID NOs: 1129 to 1344 and Escherichia coli; SEQ ID NOs: 1200 to 1299 and Escherichia coli O157:H7; SEQ ID NOs: 1129 to 1199 and Escherichia coli K12; SEQ ID NOs: 1300 to 1330 and Escherichia coli Shiga gene; SEQ ID NOs: 1331 to 1344 and Escherichia coli rrnH gene; SEQ ID NOs: 1030 to 1103 and Yersinia pestis; SEQ ID NOs: 1104 to 1128 and Brucella melitensis; SEQ ID NOs: 1462 to 1608 and Vaccinia; SEQ ID NOs: 1345 to 1461 and Clostridium perfringens; SEQ ID NOs: 1609 to 1884 and Bacillus anthracis; SEQ ID NOs: 2001 to 2010 and Dengue virus; SEQ ID NOs: 1900 to 2000 and Ebola virus; SEQ ID NOs: 1018 to 1019 and Arbovirus; and, SEQ ID NOs: 1885 to 1899 and Francisella tularensis.

The present invention provides arrays comprising unique oligonucleotide sequences, also called targets, and their use to identify nucleic acids in samples. Any of SEQ ID NOs: 1030 to 2071 may be placed on arrays for identification of a biological organism or entity. The unique oligonucleotide sequences are bound to the array in predetermined locations, and the unique oligonucleotide sequences hybridize to unique genomic sequences from at least one biological entity. Some non-limiting examples of such biological entities are Bacillus anthracis, Dengue virus, Ebola virus, Arbovirus, Francisella tularensis, Clostridium perfringens, Escherichia coli, Vaccinia, Yersinia pestis, Brucella melitensis or a combination thereof.

The present invention also provides a method of identifying a biological organism in a sample comprising: immobilizing unique oligonucleotide sequences in predetermined locations on an array, wherein the predetermined locations are associated with a known biological organism or entity; applying a sample containing labeled nucleic acid sequences from the biological organism to the array; permitting the immobilized unique oligonucleotide sequences on the array to hybridize with complementary labeled nucleic acid sequences from the biological organism or entity; and, detecting the labeled nucleic acid sequences hybridized to the unique oligonucleotide sequences in predetermined locations on the array, wherein the location of the label identifies the biological organism or entity, and the labeled nucleic acid sequences hybridized to the unique oligonucleotide sequences in predetermined locations on the array are termed unique genomic sequences. These unique genomic sequences may be genomic fragments of DNA, coding sequences, non-coding sequences, restriction fragments of DNA, RNA, primers, targets, probes, or PCR products. These unique genomic sequences used in the method may comprise at least one of any of SEQ ID NOs: 1 to 1023. These unique oligonucleotide sequences used in the present method may comprise at least one of any of SEQ ID NOs: 1030 to 2071. The samples include but are not limited to an environmental sample, a clinical sample, a biological sample, or a food sample, and may comprise a biological entity. Such biological entities may be selected from the group consisting of Acytota, prokaryotes, eukaryotes, Protista, Fungi, Plantae, Animalia and Monera. In some embodiments, the biological entity is a pathogen or is genetically engineered. In some embodiments, the biological entity is Bacillus anthracis, Dengue virus, Ebola virus, Arbovirus, Francisella tularensis, Clostridium perfringens, Escherichia coli O157:H7, Escherichia coli K12, Vaccinia, Yersinia pestis, Brucella melitensis or a combination thereof.
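The final step of the method, in which the location of the label identifies the organism, can be sketched as a lookup from array locations to organisms. This is a minimal illustration under stated assumptions: the spot layout, the intensity values, the detection threshold and the rule that at least half of an organism's spots must be positive are all hypothetical and are not taken from this application.

```python
from collections import defaultdict


def identify_organisms(spot_layout, intensities, threshold, min_fraction=0.5):
    """Report organisms whose predetermined array locations carry label signal.

    spot_layout: {(row, col): organism} for each immobilized target.
    intensities: {(row, col): measured label signal} after hybridization.
    threshold and min_fraction are hypothetical calling parameters.
    """
    total = defaultdict(int)
    positive = defaultdict(int)
    for location, organism in spot_layout.items():
        total[organism] += 1
        if intensities.get(location, 0.0) >= threshold:
            positive[organism] += 1
    return sorted(
        org for org in total if positive[org] / total[org] >= min_fraction
    )


# Hypothetical 2 x 2 array: one row of targets per organism.
layout = {
    (0, 0): "Bacillus anthracis",
    (0, 1): "Bacillus anthracis",
    (1, 0): "Clostridium perfringens",
    (1, 1): "Clostridium perfringens",
}
signal = {(0, 0): 5200.0, (0, 1): 4100.0, (1, 0): 150.0, (1, 1): 90.0}

detected = identify_organisms(layout, signal, threshold=1000.0)
```

Because each location is associated with a known organism in advance, the readout needs no sequence comparison at detection time; only the position and strength of the label matter.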

In combination with current microarray technology, the compositions and methods of the present invention distinguish between different species of biological entities in a way that is not possible with other techniques. In fact, the present invention distinguishes between closely related strains of organisms, such as closely related microbes. In addition to being able to detect many different naturally occurring biological entities concurrently, the large number of highly specific, unique oligonucleotide sequences spotted onto a microarray permits the detection of genetic manipulation of a microbial genome and the presence of atypical virulence factors in an otherwise benign host genome. Embodiments of the present invention provide novel and efficient methods for the identification of biological entities in a complex sample, in part, through the use of unique genomic sequences. These unique genomic sequences may be generated from genomic (DNA and RNA) and extra-chromosomal sequences, and from subsets of these sequences (generated by restriction enzyme digestion, PCR, or other enzymatic manipulations of genomic material). The unique genomic sequences may or may not represent coding sequences and subsets of the unique genomic sequences may be represented as unique oligonucleotide sequences. The generation of multiple unique genomic sequences allows for the detection and identification of substantially all biological entities in a given sample.

Preferred embodiments of the present invention relate to the identification of one or more known or unknown biological entities in a complex sample. The invention provides a method for the rapid identification of unknown biological entities in a sample. This invention allows scientists, technicians and medical workers to rapidly characterize unknown biological entities, including pathogens, in a sample taken from any source, including a biological sample, a human individual, an animal, water, plants or foodstuffs, soil, air, or any other environmental or forensic sample.

Methods of the invention have particular application to situations on the battlefield or during outbreaks of disease that may be caused by an unknown biological pathogen, as well as forensic analysis, food and water monitoring to screen for indications of genetic manipulations in specific biological entities and environmental analysis and background characterizations. Using methods of the invention, unknown biological entities having or producing nucleic acids may be detected through the use of targets on an array that directly relate to organism(s) within a sample.

In addition, methods of the invention are useful for the detection of biological pathogens that affect plants or animals. These methods are particularly powerful for the characterization of novel biological entities, such as extremophile biological entities, which grow under harsh conditions. The potential threat of terrorism and battlefield use of biological weapons is growing around the world. On the battlefield, multiple biological weapons may be released at one time, thus creating a situation in which field doctors should have the capability of identifying unknown biological species in a single test. Prior to applicants' invention, however, no such method existed. In an urban setting, a single biological pathogen might be released over a broad area, or in a crowded location, with little or no warning as to the threat and event of this release, nor any statement as to the identity of the biological species that was released.

In the situations referred to above, or in the event of a natural or accidental occurrence or dissemination of a biological pathogen, the first indication of the infection of humans could be a cluster of individuals each displaying similar symptoms. However, as the initial symptoms of many biological pathogens are very similar to each other and to symptoms of the flu (e.g., headaches, fever, fatigue, aching muscles, coughing), the rapid identification of the actual biological species causing the symptoms would be a significant benefit such that medical professionals could implement prompt and proper treatment. In addition, the method according to the invention can be used to assess the status of the etiologic agent with respect to drug resistance, thereby affording more effective treatment, e.g., through the use of one or more antibiotics to which the pathogen is not resistant.

Examples of biological pathogens which may be used for production of biological weapons, or for use in terrorism in which event the goal of such terrorism may be to kill or debilitate individuals, animals or plants, include, without limitation, Bacillus anthracis (anthrax), Yersinia pestis (bubonic plague), Brucella suis (brucellosis), Brucella melitensis, Brucella abortus, Francisella tularensis (tularemia), Coxiella burnetii (Q-fever), Pseudomonas aeruginosa (pneumonia, meningitis), Vibrio cholerae (cholera), Variola virus (smallpox), Ebola virus (Ebola hemorrhagic fever), Dengue virus (Dengue hemorrhagic fever), Arboviral encephalitides, Alphaviruses (Eastern Equine Encephalitis), Flaviviruses (West Nile virus), Bunyaviruses (Crimean-Congo Hemorrhagic fever), SARS-CoV (severe acute respiratory syndrome-associated coronavirus), Botulinum toxin (botulism), Saxitoxin (respiratory paralysis), Ricinus communis (ricin), Salmonella typhimurium (salmonella gastroenteritis), Staphylococcus aureus (staphylococcal food poisoning), methicillin-resistant S. aureus (MRSA), Escherichia coli O157:H7, Clostridium perfringens (clostridium food poisoning), Clostridium botulinum, Bacillus subtilis (Bacillus food poisoning), aflatoxin and other fungal toxins, Shigella (dysentery), Yellow Fever Virus, various hemorrhagic fever viruses, encephalomyelitis viruses and various encephalitis viruses. There are also numerous animal specific biological entities that are important to the agricultural industry as well as biological entities that are important to the medical diagnostic community that may be of interest such as staphylococcus species, streptococcus species, pseudomonas species and numerous viruses.

These and other objects, features and advantages of the present invention will become apparent after a review of the following detailed description of the disclosed embodiments, the figures and the claims.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 is a flowchart describing, in conjunction with portions of the written description, methods of the present invention.

Figure 2 is a microarray hybridization of fluorescently labeled genomic DNA and unique oligonucleotide sequences demonstrating the hybridization pattern of two different species, C. perfringens and B. anthracis.

Figure 3 is a microarray hybridization of fluorescently labeled genomic DNA and unique oligonucleotide sequences demonstrating the hybridization pattern of two different strains, E. coli O157:H7 and E. coli K12.

Figure 4 is a scatter plot of the hybridization intensities for two different strains of E. coli that demonstrate strain-specific hybridization differentiation.

Figure 5 shows informative unique oligonucleotide sequences exhibiting strain-specific hybridization.

Figure 6 is a histogram reporting the levels of species-specific hybridization upon exposure of various species to unique oligonucleotide sequences.

Figure 7 demonstrates the sensitivity of the assay of the present invention.

Figure 8 is an oligonucleotide array probed with a specific C. perfringens amplicon amplified from PCR primers.

DETAILED DESCRIPTION OF THE INVENTION

As required, detailed embodiments of the present invention are disclosed herein. However, it is to be understood that the disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms. The figures are not necessarily to scale, and some features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present invention.

For purposes of the invention disclosure, the term "primer" means a short pre-existing polynucleotide chain to which new nucleotides can be added by DNA or RNA polymerase.

The term "randomly amplifying" means increasing the copy number of a fragment of a genomic sequence in vitro using random primers, each of which is preferably four to fifteen nucleotides in length.

"Amplicon" refers to DNA that has been manufactured utilizing a polymerase chain reaction (PCR) where a set of single stranded primers is used to direct the amplification of a single species of DNA.

"Biological entity" describes a biological element, cellular component, or organism that exists as a particular and discrete unit. This includes, but is not limited to, gene, transgene, oncogene, allele, protein, DNA, RNA, mitochondria, pathogenic trait, vector, plasmid, clone, Acytota, prokaryotes, eukaryotes, Protista, Fungi, Plantae, Animalia and Monera, or any mixture thereof. "Organism" is used interchangeably herein with "biological entity."

A "sample" may be from any source, and can be a gas, a fluid, a solid, a biological sample, an environmental sample, or any mixture thereof.

"Nucleic acids" means RNA and/or DNA, and may include unnatural or modified bases.

The terms "unique oligonucleotide sequence" and "target" are interchangeable in this disclosure to describe a nucleic acid sequence for which the sequence is known. In some embodiments, unique oligonucleotide sequences are at least 30 nucleotides in length.

The terms "unique genomic sequence" and "unique sequence" are interchangeable in the invention and refer to a sequence of nucleic acids that is specific to a set of organisms.

The term "set of biological organisms" refers to a set of organisms that contain characteristics that are common within the set, e.g., a species, in which regions of the genome contain unique genomic sequences or genes that are characteristic of the set. Example sets include strain, species, genus, family, group, clade, and other ad hoc sets.

The term "inferred unique genomic sequence" refers to one or more nucleic acid sequences that are initially identified during an initial similarity search of a query-length genomic sequence and that share only partial homology with the query-length genomic sequence. These inferred sequences are typically identified in separate species, strains or organisms. The inferred unique genomic sequences are re-routed as query-length genomic sequences to confirm the uniqueness of each sequence. Those sequences identified in this step as unique are from then on termed unique genomic sequences.

In the literature there exist at least two confusing nomenclature systems for referring to hybridization partners. Both use common terms: "probes" and "targets." For the purpose of this disclosure, a "target" is the unique oligonucleotide sequence (often set-unique), whereas a "probe" is the sample whose characteristic(s) (e.g., nucleic acid sequence, identity, abundance, virulence) is being detected. "Probe" includes any single stranded nucleic acid sequence, molecule, genomic sequence, or amplicon that may be labeled. Probes can hybridize to a target if sufficient complementarity exists. Note that labeling can be implemented at various stages in either the probe or target or both, as known to those skilled in the art.

The terms "microarray" and "array" are interchangeable as defined by this invention and include a set of miniaturized chemical or biological reaction areas that may also be used to test DNA, DNA fragments, RNA, antibodies, or proteins. Typically, in this disclosure, an "array" contains a plurality of unique oligonucleotide sequences (including nucleic acid sequences complementary to a biological entity to potentially be detected) tethered or immobilized to a surface in predetermined locations, in which the unique oligonucleotide sequences have a known spatial arrangement or relationship to each other. Typically, unique oligonucleotide sequences are chemically attached to a substrate, which can be a microchip, a glass slide or a microsphere-sized bead.

A "labeled" or "detectable" nucleic acid is a nucleic acid that can be detected. The term "detection" refers to a method where analysis or viewing of the detectable nucleic acid is possible visually or with the aid of a device, including, but not limited to, microscopes, fluorescence-activated cell sorter (FACS) devices, spectrophotometers, scintillation counters, densitometers, fluorometers, devices using mass spectrometry, and devices using or detecting radioisotopes.

"Hybridized" means having formed a sufficient number of base pairs to form a nucleic acid that is at least partly double-stranded under the conditions of detection. The term "hybridization" refers to the process by which two complementary strands of nucleic acids combine to form double-stranded molecules.

The term "complementarity" refers to a property conferred by the base sequence of a single strand of DNA or RNA that may form a hybrid or double stranded DNA:DNA, RNA:RNA or DNA:RNA through hydrogen bonding between base pairs on the respective strands. Adenine (A) usually complements thymine (T) or uracil (U), while guanine (G) usually complements cytosine (C).
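By way of illustration only, the base-pairing rule stated above can be sketched in a few lines of Python. The function names are hypothetical and do not appear in this disclosure, and the sketch handles DNA only and requires pairing along the full length of both strands, which is a simplifying assumption.

```python
# Watson-Crick pairing as described above: A pairs with T, G pairs with C.
# "N" (unknown base) is mapped to itself; this is an illustrative convention.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the strand that would pair with `seq`, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def is_complementary(probe: str, target: str) -> bool:
    """True when probe and target can pair along their full length."""
    return reverse_complement(probe) == target.upper()
```

For example, `reverse_complement("ATGC")` yields `"GCAT"`, so a probe `ATGC` would be fully complementary to a target `GCAT`.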

For the purpose of this disclosure, the terms unique genomic sequence, inferred unique genomic sequence and unique oligonucleotide sequence typically refer to a sequence of nucleic acids that is unique to a specific organism, or set of organisms, at the genomic or oligonucleotide level. In addition, "unique" or "uniqueness" as defined by this disclosure is a function of other thresholds, set by the user, regarding identity, homology, score, expected (E) value and the length of the unique sequence under consideration. The disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms and are therefore not to be construed as limiting.

The identification and characterization of microorganisms present in the environment has historically been accomplished by exploiting a variety of biological, immunological, biochemical, and genetic differences between organisms. Nucleic acid-based diagnostic methods have been developed that are specific for a single organism or small sets of organisms. PCR-based assays are typically performed by designing oligonucleotide primers that amplify organism-specific fragments of DNA. These fragments are subsequently detected by methods such as gel electrophoresis, real-time PCR, or hybridization to either a membrane or microarray.

A limitation of these existing assays is that although a positive result is informative for a specific organism or organism set, a negative result typically provides no information about the organism(s) under investigation.

Though it is possible to multiplex primers for numerous amplifications in a PCR for the concurrent identification of a variety of organisms, it is non-trivial to design compatible multiple primer pair sets that function in a single amplification reaction. Thus, the number of microorganisms that can be detected or otherwise characterized concurrently with this type of multiplex reaction is relatively small. Techniques such as real-time PCR and quantitative PCR are limited in the number of primer sets that can be used in a single amplification reaction and in the number of fluorescent molecules available for labeling DNA molecules and detection.

In a method for detecting and distinguishing between various species and strains of viruses, viral RNA is reverse transcribed from semi-random primers, amplified by specific primers and then labeled with fluorescent nucleotides in a non-amplifying reaction. The labeled nucleic acids are then hybridized to microarrays that have been spotted with virus- and strain-specific oligonucleotides that are representative of the genomes of these organisms. The resulting hybridization pattern discriminates between viruses represented on the array (Wang D, et al., Proc Natl Acad Sci USA. 2002; 99(24): 15687-92). In this approach, a critical factor of the method is how oligonucleotides are selected for inclusion on an array. Here, oligonucleotides derived from the entire genome are assessed, using a software system similar to OLIGO 6, as to whether or not potential oligonucleotide sequences will be good candidates for hybridization based on specific parameters selected by the user, for example GC content. Once the user has selected the parameters, only oligonucleotides that represented highly conserved sequences within each virus family were selected for representation on an array. This varies significantly from the present invention, in which a unique genomic sequence from the organism or set of organisms of interest is first identified, as described below. After the identification of a unique genomic sequence, this unique sequence is screened in a stepwise fashion for potential oligonucleotide sequences that demonstrate good hybridization parameters, such as GC content, secondary structure, lack of repeated elements, and the like. Once suitable unique oligonucleotide sequences are identified, these may be manufactured onto an array. In another important aspect, the approach adopted by Wang et al. is not directly translatable to fungi and bacteria. The relatively large size (3-5 million bases) and complexity of bacterial and fungal genomes, as compared to most viral genomes, represent an obstacle to identifying oligonucleotides that are species and strain specific. In addition, it is not feasible to synthesize and spot every possible oligonucleotide sequence to represent the entire genome for every microbial species onto a microarray.

Bioinformatic tools such as BLAST are intended to identify similarities between sequences. While similarities between the sequences of organisms are useful in some types of analysis, the differences between genomes can also be useful in the identification and characterization of organisms. Unfortunately, bacterial and fungal genomes are so vast that it is resource-intensive to subtract common sequences in order to identify unique sequences from all known genomes. Frequently only small fragments of genomic sequences have been identified as unique and are available for identification of an organism.

Current DNA amplification approaches to identify microorganisms are limited in terms of the number of sequences that can be identified concurrently. In vitro, two separate methods are used to multiplex, or identify multiple sequences concurrently. Both are limited by the challenge of generating specific primer pair sets that work well together in a single reaction mixture.

One method for assessing which amplicons are produced in a multiplex PCR reaction is to run the amplification product on a gel and to discriminate the various amplicons based on molecular size. The number of bands that can be resolved on the gel is a limiting factor for this approach. Real time PCR approaches use different fluorescent tags to identify specific amplicons in a multiplex PCR reaction. The number of amplicons that can be resolved using this approach is limited by the number of different fluorescent tags available for probes used in the reaction.

Thus, the limitations are two-fold. The first is a compatibility issue regarding the use of multiple sets of unique primer pairs, and the second is the resolution of the amplified products.

Unique genomic sequences as set-unique sequences

Unique genomic sequences in an organism, or set of organisms, may include both coding and non-coding sequences. Set-unique sequences can be coding or non-coding sequences. Set-unique sequences (coding or non-coding) can be inferred (see below) or identified by searching through fully sequenced genomes. Partially sequenced genomes typically focus on coding sequences. Unique genomic sequences are useful for identification of unique oligonucleotide sequences.

Using BLAST to identify unique genomic sequences.

Embodiments of the present invention include methods and systems for the identification of unique genomic sequences that are informative of the biological characteristics (e.g., nucleic acid sequence, presence of an entity or organism, abundance, virulence, genetic modification) of a sample. Referring to Figure 1, a method A00 of the present invention is shown.

Obtain

In the illustrated embodiment, a subset of the genomic data of the organism under investigation A05 is obtained. The subset A05 can be obtained from known genomic data sources 10 such as UniGene, GenBank, and the European Molecular Biology Laboratory (EMBL), among other sources. Genomic data can also be obtained as sequence information derived from in vitro experiments 20 such as PCR and enzymatic digestion. A preferred subset of genomic data is the entire genomic sequence of an organism.

Preprocess

In some embodiments, the obtained genomic data is preprocessed A10. Each aspect of preprocessing can be performed as needed or desired.

Convert

In preferred embodiments, if necessary, the genomic data subset is converted from its native format, e.g., standard GenBank annotated format, to a format compatible with subsequent steps. In some embodiments, where GenBank annotated format is used, the genomic data is converted to FASTA format to support a BLAST search.
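The conversion step can be illustrated with a minimal sketch, assuming a single-record GenBank flat file whose sequence follows the ORIGIN line and ends at the "//" terminator. The function name and the choice of the accession as the FASTA header are assumptions made for illustration; a production workflow would more likely rely on an established parser.

```python
def genbank_to_fasta(genbank_text: str, width: int = 60) -> str:
    """Convert one GenBank flat-file record to FASTA text (illustrative only)."""
    header = "unknown"
    bases = []
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("ACCESSION"):
            parts = line.split()
            if len(parts) > 1:
                header = parts[1]  # use the accession as the FASTA identifier
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Sequence lines look like "        1 gatcctccat atacaacggt ..."
            bases.append("".join(ch for ch in line if ch.isalpha()))
    sequence = "".join(bases).upper()
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(wrapped) + "\n"
```

Annotations are discarded by this sketch; they would need to be retained separately if they are to be transferred to query-length sequences later.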

Annotate

The query-length genomic sequences were realigned with the genome from which they were generated in order to determine the exact start and stop point of each query-length sequence within the genome. Any annotations within the genome in the region containing the query-length genomic sequence were transferred to the query-length genomic sequence. Annotated regions include sequences known to have a specific biological function such as protein coding regions, biologically active RNA encoding regions, promoter and regulatory elements, spacing elements within operons, protein binding sites, and the like.

Remove

Additional preprocessing involves removing or masking portions of the genomic data that are judged not to have biologically informative value. This can include sequences known to be conserved with respect to the organism set under investigation, repeats, inverted repeats, long terminal repeats, and sequences otherwise known to be unfavorable for hybridization.
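As a hedged illustration of masking, the sketch below replaces selected regions with "N". The homopolymer-run finder is only one simple stand-in for repeat detection, and the function names and the minimum run length of 6 are assumptions for this example, not parameters taken from the disclosure.

```python
import re


def find_homopolymer_runs(sequence: str, min_length: int = 6):
    """Return (start, end) 0-based half-open spans of single-base runs."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_length - 1))
    return [match.span() for match in pattern.finditer(sequence)]


def mask_regions(sequence: str, regions, mask_char: str = "N") -> str:
    """Replace every base inside the given spans with `mask_char`."""
    masked = list(sequence)
    for start, end in regions:
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)
```

Masking rather than deleting keeps the coordinates of the remaining sequence unchanged, which simplifies later alignment back to the genome.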

Divide

In some embodiments, genomic data is divided into query-length genomic sequences A15. In one embodiment, sequences of 1000 bases in length are utilized. It is to be understood that smaller query-length genomic sequences may be used until analysis of such smaller sequences reveals that the query-length genomic sequence is no longer unique to an organism or set of organisms. In a preferred embodiment, the query-length sequence A15 is the entire genome data.

Note that if an entire genome was obtained, and no preprocessing performed, the query-length sequence A15 is the entire genome of the organism under investigation. In some embodiments, all the genomic data available for the organism under investigation is obtained and all preprocessing steps are completed, resulting in annotated query-length sequences of 1000 bases that do not include conserved sequences, repeats of various types, or sequences having characteristics that otherwise make them unamenable to subsequent steps.
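The division into query-length sequences can be sketched as follows. The function name is hypothetical, the fragments are non-overlapping, and the final fragment is simply allowed to be shorter than the query length; the disclosure does not specify how a remainder is handled, so that choice is an assumption.

```python
def divide_genome(genome: str, query_length: int = 1000):
    """Yield (start, stop, fragment) using 1-based inclusive coordinates."""
    for offset in range(0, len(genome), query_length):
        fragment = genome[offset:offset + query_length]
        yield offset + 1, offset + len(fragment), fragment
```

Recording the start and stop of each fragment at this stage provides the coordinates needed to transfer annotations from the genome to the fragment.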

Query

In preferred embodiments, the query-length sequence (preprocessed or not) is used as a query to a similarity search program A20, e.g., BLAST. The query is directed to a selected database A25 of genome data. In some embodiments, the selected database is limited to organisms of the same type under investigation, in order to increase search efficiency over what it would be were the search directed to a full database containing a broader variety of organisms. For example, if only microbial organisms were under investigation, the selected database A25 would be a database of microbial genomic data; broader databases including, for example, mammalian genomic data, would be avoided at this stage. In these circumstances, a subsequent search against the broader database is preferred in order to confirm the uniqueness of these initial results.
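A query of this kind might be issued today with the NCBI BLAST+ `blastn` program, which postdates this disclosure; the sketch below only assembles the command line and does not run it. The file and database names are hypothetical placeholders, and the E-value setting is an arbitrary illustrative choice.

```python
def build_blastn_command(query_fasta, database, output_path, evalue=10.0):
    """Assemble a BLAST+ blastn invocation with tabular (format 6) output."""
    return [
        "blastn",
        "-query", query_fasta,    # FASTA file of query-length sequences
        "-db", database,          # the selected database, e.g. microbial genomes
        "-evalue", str(evalue),   # report hits up to this expectation value
        "-outfmt", "6",           # tab-separated, one hit per line
        "-out", output_path,
    ]


# Hypothetical file names, shown only to illustrate the call.
command = build_blastn_command("queries.fasta", "microbial_genomes", "hits.tsv")
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a system where BLAST+ and a formatted database are installed.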

In some embodiments, the query-length sequence is removed from the selected database, while in other embodiments, results showing homology to the query itself are either ignored, or taken as confirmation of the validity of the query with respect to the organism under investigation.

Parse

Preferred embodiments parse A30 the similarity search program output A25 to identify sequences lacking significant similarity with other organisms in the selected database, e.g., unique genomic sequences A32. This is counter to the typical use of such search programs. In some embodiments, lacking significant similarity, e.g., "unique," means no hits or hits with an E-value close to "0" (zero).

In practice, computational resources are finite, so the selected database may range from a database of all fully or partially known genomes to a narrower database such as known microbial genomes. Directing the initial query to a database of less than all available genomic data, while computationally economical, may make it advisable to BLAST the candidate sequences (e.g., in preferred embodiments, those genetic sequence segments found to be unique) against the broader databases, e.g., the NCBI nr database, to detect homology with other known genomes.
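The parsing step can be illustrated with a sketch that reads tab-separated search output (twelve columns, with the query identifier first, the subject identifier second and the E-value eleventh, as in BLAST tabular output). It adopts one possible reading of "unique": a query is retained when no subject outside the organism set under investigation produces a hit at or below a user-chosen E-value cutoff. The function name, the cutoff of 1e-5 and the handling of self hits are assumptions for illustration.

```python
def find_unique_queries(query_ids, blast_tabular, own_subject_ids, max_evalue=1e-5):
    """Return the query IDs with no significant hit outside their own organism set."""
    hit_elsewhere = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, sseqid, evalue = fields[0], fields[1], float(fields[10])
        if sseqid in own_subject_ids:
            continue  # homology to the query's own genome is ignored
        if evalue <= max_evalue:
            hit_elsewhere.add(qseqid)
    # Queries absent from the output (no hits at all) are also kept.
    return [q for q in query_ids if q not in hit_elsewhere]
```

Candidates returned by a search of a narrow database would then be searched again against a broader database, as described above, before being recorded as unique.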

At this point, the sequences (less than or equal to query-length) can be identified as unique genomic sequences to the organism or set of organisms for which they were searched. A list of unique genomic sequences identified from bacterial and viral genomic sequences of Bacillus anthracis, Dengue virus, Ebola virus, Arbovirus, Francisella tularensis, Clostridium perfringens, Escherichia coli O157:H7, Escherichia coli K12, Vaccinia, Yersinia pestis, and Brucella melitensis generated by the method described herein is provided in SEQ ID NOs: 1-1023. For each organism, multiple unique genomic sequences were identified using the method described herein; for example, B. anthracis was determined to contain 93 unique genomic sequences. Further analysis of each organism revealed the relative amount of unique genomic sequence per genome (see Table 1).

Table 1: Unique Sequences Identified in Microbial Genomes

Organism                                 Accession Number   Total Unique Sequences (bp)   Size of Genome (bp)   % Unique
Bacillus anthracis                       NC_003995          39,000                        5,093,554             0.77
Yersinia pestis                          NC_004088          15,000                        4,600,755             0.33
Brucella melitensis Chromosome 1         NC_003317          1,300                         2,117,144             0.06
Clostridium perfringens                  NC_003366          360,000                       3,031,430             11.88
Escherichia coli O157:H7                 NC_002695          130,000                       5,498,450             2.36
Escherichia coli K12                     NC_000913          27,000                        4,639,221             0.58
Vaccinia                                 NC_001559          1,400                         191,737               0.73
Ebola                                    NC_003549          4,000                         18,959                21.1
Eastern Equine Encephalitis Virus        NC_003899          2,000                         11,675                17.13
Francisella tularensis pOM1 Plasmid      NC_002109          2,400                         4,442                 54.03
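The "% Unique" column of Table 1 is consistent with the total unique sequence length divided by the genome size, expressed as a percentage. A brief sketch of that arithmetic, with a hypothetical function name, is:

```python
def percent_unique(unique_bp: int, genome_bp: int) -> float:
    """Share of the genome covered by unique sequence, as a percentage."""
    return round(100.0 * unique_bp / genome_bp, 2)


# Values taken from Table 1.
anthracis = percent_unique(39_000, 5_093_554)    # Bacillus anthracis
perfringens = percent_unique(360_000, 3_031_430)  # Clostridium perfringens
plasmid = percent_unique(2_400, 4_442)            # F. tularensis pOM1 plasmid
```

Applied to the rows above, this reproduces the tabulated 0.77, 11.88 and 54.03 percent, respectively.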

In this disclosure, unique genomic sequences generally ranged in size from twenty five nucleotides in length to several thousand nucleotides in length. These sequences, with optional annotation, can be saved to a database of unique sequences A32, or added to the growing knowledge base of the genome of the organism under investigation.

Inferred Sequences

The output of the similarity search program can also be used to identify further query-length sequences for organism(s) other than the original organism under investigation. For example, a first query-length sequence (SEQ ID NO: 27) may show high homology/identity against the particular strain it was derived from but also significant homology to a related strain(s) (SEQ ID NOs: 1024-1029). Such sequences can be referred to as inferred unique genomic sequences A34. The portion of the related strain where limited homology is detected can be searched A20 as a query-length genomic sequence A15 (by being searched against the selected database A25) to confirm its identity as a unique genomic sequence A32 for the related organism(s). Exemplary inferred sequences have sufficient homology to the first query-length genomic sequence to be indicated by a BLAST search, but not sufficient homology to cross-hybridize with oligonucleotides derived from the query-length genomic sequence. Inferred unique genomic sequences are useful for identification of unique oligonucleotide sequences.

Referring to Example 2, a search against the NCBI nt database, using as a query (SEQ ID NO: 27) a Vaccinia virus sequence found to be unique by a method of the present invention, identified candidate sequences SEQ ID NOs: 2072-2075 (regions of the Vaccinia virus genome) with 100% identity over the entire query sequence; Pox-virus related sequences (SEQ ID NOs: 1024-1028, 2076) with identity ranging from 92% to 96% over portions of the query sequence; and an Ectromelia virus (SEQ ID NOs: 1029, 2077) with 100% identity over small portions of the query sequence. The first group confirms that the query sequence is part of both the Vaccinia strain and complete genome. The second and third groups identify sets of organisms with significant homology to the Vaccinia unique genomic sequence. Preferred embodiments of the invention infer that the second and third groups of sequences come from unique regions of the genome of those organism sets. Such inferred sequences preferably undergo evaluation and validation as described herein. SEQ ID NOs: 1024-1029 list exemplary inferred unique genomic sequences (subsequently confirmed as unique genomic sequences) found using methods of the present invention.
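The three groups of hits described in this example can be separated with a simple rule on percent identity and alignment coverage. The sketch below is illustrative only: the function name, the labels and the 90% minimum identity are assumptions, since the disclosure leaves such thresholds to the user.

```python
def classify_hit(percent_identity, alignment_length, query_length, min_identity=90.0):
    """Sort a search hit into the groups discussed in the example above."""
    covers_query = alignment_length >= query_length
    if percent_identity == 100.0 and covers_query:
        # Full-length, perfect match: the query's own strain or genome.
        return "confirms source genome"
    if percent_identity >= min_identity:
        # Partial homology in a related organism: candidate for re-query.
        return "inferred unique candidate"
    return "below threshold"
```

Hits labeled as inferred candidates would then be re-routed as query-length sequences against the selected database to confirm their uniqueness for the related organism.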

Unique and inferred unique genomic sequences can be identified using the method described herein for a number of other biological entities including, but not limited to: Anthrax (Bacillus anthracis), Botulism (Clostridium botulinum toxin), Brucellosis (Brucella species), Burkholderia mallei (glanders), Burkholderia pseudomallei (melioidosis), Chlamydia psittaci (psittacosis), Cholera (Vibrio cholerae), Clostridium perfringens (Epsilon toxin), Coxiella burnetii (Q fever), E. coli O157:H7 (Escherichia coli), Emerging infectious diseases such as Nipah virus and hantavirus, Food safety threats (e.g., Salmonella species, Escherichia coli O157:H7, Shigella), Francisella tularensis (tularemia), Ricin toxin from Ricinus communis (castor beans), Rickettsia prowazekii (typhus fever), Salmonella Typhi (typhoid fever), Salmonellosis (Salmonella species), Smallpox (variola major), Staphylococcal enterotoxin B, Variola major (smallpox), Viral encephalitis (alphaviruses, e.g., Venezuelan equine encephalitis, eastern equine encephalitis, western equine encephalitis), Viral hemorrhagic fevers (filoviruses, e.g., Ebola, Marburg, and arenaviruses, e.g., Lassa, Machupo), and Yersinia pestis (plague).

It is to be understood that the list of unique and inferred unique genomic sequences presented here is not exhaustive. Indeed, one skilled in the art can readily adapt the method described herein to identify unique genomic sequences for any known or unknown biological entity, without departing from the spirit of the present invention.

Align

In some embodiments of the invention, the unique genomic sequences produced, if not already aligned, are realigned with the genome from which they were generated in order to determine the exact start and stop point of each unique genomic sequence within the genome. Any annotations within the genome in the region containing the unique genomic sequence were transferred to the unique genomic sequence. Annotated regions include sequences known to have a specific biological function such as protein coding regions, biologically active RNA encoding regions, promoter and regulatory elements, spacing elements within operons, protein binding sites, and the like.
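A minimal sketch of the alignment and annotation transfer follows. It assumes an exact, forward-strand match of the unique sequence within its source genome and represents annotations as hypothetical (start, stop, label) tuples with 1-based inclusive coordinates; a real implementation would also need to handle the reverse strand and inexact matches.

```python
def locate_and_annotate(genome: str, unique_seq: str, features):
    """Find a unique sequence in its genome and collect overlapping annotations."""
    index = genome.find(unique_seq)
    if index < 0:
        return None  # not found as an exact forward-strand match
    start, stop = index + 1, index + len(unique_seq)
    overlapping = [
        label
        for f_start, f_stop, label in features
        if f_start <= stop and f_stop >= start  # interval overlap test
    ]
    return start, stop, overlapping
```

For instance, a sequence located at positions 5 to 11 would inherit the label of a feature spanning positions 10 to 15 but not that of a feature spanning positions 1 to 4.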

Phylo/FIGURE

In some embodiments of the present invention, the process of obtaining genomic data, preprocessing the data, querying the selected database(s) and parsing results to identify candidate genomic sequences is implemented as a computer program product. In these embodiments, a plurality of organisms and sets of organisms can be investigated concurrently. Computer program products of this invention include the ability to indicate the organism(s)/set of organisms of interest, indicate the selected database, set thresholds for identifying inferred unique genomic sequences, direct the handling for inferred unique genomic sequences, set thresholds for identifying unique genomic sequences, direct the handling for unique genomic sequences, align and annotate unique genomic sequences, and output unique genomic sequences for oligonucleotide search. Intermediate and final results can be made available for user inspection.

Evaluate

Both unique genomic sequences A32 and inferred unique sequences A34 are evaluated A40 for subsets, e.g., favorably evaluated target-length oligonucleotides, that are amenable to hybridization. The evaluation is done in a target-length oligonucleotide window/range derived from the query-length genomic sequence, and preferably moved one base at a time through the query-length genomic sequence. Target-length oligonucleotides are evaluated for, among other characteristics, GC content, Tm, repetitive elements, availability of primer amplification sites, and avoiding secondary structures such as hairpins and duplexes.

In some embodiments this functionality is provided using a program such as OLIGO 6 (Molecular Biology Insights, Inc., Cascade, CO). In other embodiments, this functionality is incorporated into a computer program product of the invention.
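The window-based evaluation can be illustrated with a sketch that slides a target-length window one base at a time and keeps windows passing two of the listed checks, GC content and single-base repeats. The function names, the 40-60% GC range and the maximum run of 6 are assumptions for illustration; Tm, secondary structure and primer-site checks, which a program such as OLIGO 6 would perform, are omitted.

```python
def gc_fraction(oligo: str) -> float:
    """Fraction of bases in the oligonucleotide that are G or C."""
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def longest_run(oligo: str) -> int:
    """Length of the longest stretch of one repeated base."""
    longest = current = 1
    for prev, base in zip(oligo, oligo[1:]):
        current = current + 1 if base == prev else 1
        longest = max(longest, current)
    return longest


def candidate_oligos(sequence, length=50, gc_range=(0.40, 0.60), max_run=6):
    """Yield (1-based start, oligo) for each window passing both checks."""
    sequence = sequence.upper()
    for start in range(0, len(sequence) - length + 1):
        oligo = sequence[start:start + length]
        if gc_range[0] <= gc_fraction(oligo) <= gc_range[1] and longest_run(oligo) <= max_run:
            yield start + 1, oligo
```

Windows that pass would correspond to the favorably evaluated target-length oligonucleotides that are subsequently searched for uniqueness.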

OLIGO 6 is a multi-functional program that searches for and selects oligonucleotides from a sequence file for polymerase chain reaction (PCR), DNA sequencing, site-directed mutagenesis, and various hybridization applications. It calculates hybridization temperature and secondary structure of oligonucleotides based on the nearest neighbor thermodynamic values. It is also a good tool for construction of synthetic genes, finding an appropriate sequencing primer among those already synthesized, finding and multiplexing consensus primers and probes, and even finding potential restriction sites in a protein.

In some embodiments, unique oligonucleotide sequences produced as a result of the steps described above are approximately 25-100 bases in length. In preferred embodiments, the length range for unique oligonucleotide sequences is 50-70 nucleotides. Factors that assist in the determination of optimal unique oligonucleotide sequence length include the ability to synthesize the oligonucleotide, the desired hybridization temperature of the microarray, balancing the Tm of the various oligonucleotides against G/C content of the molecule and the possible chemical composition of the hybridization solution used on the microarray. In one embodiment, target-length oligonucleotides are chosen based on their melting temperature Tm of 90 °C, 3'-dimer ΔG of -8.0 kcal/mol, 3'-terminal stability range of -4.8 to 11.6 kcal/mol, GC clamp stability of -8.0 kcal/mol, minimal acceptable loop ΔG of -1.9 kcal/mol, maximum number of acceptable sequence repeats of 6 and a maximum length of acceptable dimer of 2 base pairs.

Search and Parse

In some embodiments, favorably evaluated target-length oligonucleotides A45, e.g., those found amenable to hybridization, are used as a query to a similarity search program A50, e.g., BLAST. The query is directed to a selected database A55 of genome data in order to determine whether the target-length oligonucleotide is unique to the organism or organism set under investigation. To this end, preferred embodiments parse A50 the similarity search program output to identify oligonucleotides lacking significant similarity with other organisms in the selected database, e.g., unique target-length oligonucleotides A52. This is counter to the typical use of such search programs. In some embodiments, lacking significant similarity, e.g., "unique," means no hits or hits with an E-value close to "0" (zero).

At this point, the favorably evaluated target-length oligonucleotides that were searched can be identified as unique to the organism or set of organisms to which they refer. SEQ ID NOs: 1030-2071 list exemplary unique oligonucleotide sequences identified by a method of this invention. Unique oligonucleotide sequences found using embodiments of the present invention include oligonucleotides generally ranging in size from 25 nucleotides to approximately 50 nucleotides in length. These unique oligonucleotide sequences, with optional annotation, can be saved to a database A38 of unique sequences, or added to the growing knowledge base of the genome of the organism under investigation.

Selection of targets.

As stated above, the unique and inferred unique genomic sequences SEQ ID NOs: 1-1029 were subsequently used to prepare unique oligonucleotide sequences; see SEQ ID NOs: 1030-2071. It is to be understood that the list of unique and inferred unique oligonucleotide sequences presented here is not exhaustive; indeed, in light of this disclosure, one skilled in the art can readily adapt the method described herein to identify unique oligonucleotide sequences that identify any known or unknown biological entity, without departing from the spirit of the present invention.

Table 2 is a non-limiting list of biological entities that can be identified and differentiated utilizing the unique oligonucleotide sequences obtained through the method described herein.

Table 2. Species and strains of organisms that can be identified through unique oligonucleotide sequences

Vaccinia related organisms
    Vaccinia Tian Tan
    Vaccinia Ankara
    Vaccinia Columbia
    Vaccinia Venezuela
    Vaccinia Lister
    Vaccinia Temple of Heaven
    Cowpox Brighton Red
    Cowpox GRI-90
    Buffalo pox
    Rabbit pox
    Monkey pox Benin-1978
    Monkey pox Zaire-1977
    Monkey pox Zaire-1979
    Monkey pox Nigeria-1971
    Monkey pox Sierra Leone-1970
    Monkey pox Liberia-1970
    Monkey pox Zaire 96-I-16
    Monkey pox Clone CV1
    Monkey pox Clone CW-N1
    Camel pox M-96 from Kazakhstan
    Camel pox Saudi-M3
    Camel pox Somalia-1977
    Camel pox Somalia-1978
    Camel pox CMS
    Variola
    Variola Garcia-1966
    Variola Major Bangladesh-1975
    Variola minor
    Ectromelia virus Moscow
    Taterapox virus

Brucella related organisms
    Brucella melitensis
    Brucella suis 1330
    Mesorhizobium loti
    Rhodopseudomonas palustris
    Agrobacterium tumefaciens C58
    Sinorhizobium meliloti

Y. pestis related organisms
    Yersinia pestis KIM
    Yersinia pestis CO92
    Yersinia pseudotuberculosis

Clearly, the present invention is not limited to the identification of bacterial or viral species but can be used to identify naturally occurring known, unknown and genetically engineered biological entities for which sequencing information exists or can be ascertained.

Unique oligonucleotide sequences are typically prepared on a DNA synthesizer from commercially available phosphoramidites using standard automated procedures. The unique oligonucleotide sequences were dried and rehydrated in 3X SSC, pH 7.0 (1X SSC is 15 mM sodium citrate, 150 mM sodium chloride), typically at a concentration of 150 ng/µl, and spotted onto prepared arrays by a microarray printing robot.

In some embodiments, the present invention identifies regions of species- and strain-specific unique genomic sequence from the genomes of biological entities. Species and strain unique genomic sequences can be derived from a variety of complex samples and from both single-cell and multi-cellular organisms. Unique genomic sequences are initially screened using a similarity software package for regions of homology against other biological entities to ultimately construct unique oligonucleotide sequences. These unique oligonucleotide sequences can be used as probes, targets or primers. In one embodiment, targets may be "spotted" onto microarrays for use in the identification and detection of biological entities. Because of the large amount of unique genomic sequence generated by this method, it is possible to track genetic manipulation of biological entities by identifying virulence and antibiotic resistance genes in an otherwise harmless genetic background. By selecting sequences that are genus, species and strain specific, it is possible to extend the identification of biological entities to the classification of previously undiscovered or genetically manipulated biological entities. The discovery of unique genomic sequences from these biological entities opens the possibility of developing methods that, through the use of highly specific, unique oligonucleotide sequences, expand the number of biological entities that can be detected in a single assay.

Amplification, DNA Sources and Labeled Probe Generation

Genomic DNA can be obtained from a variety of different commercial and non-commercial sources to generate probes for microarray hybridization. Fluorescent genomic probes were generated by randomly labeling 250 ng of genomic material with 3 µl of Cy3-dCTP in a standard Klenow reaction. Klenow labeling was performed either at 37°C for two hours or overnight at room temperature. Labeled products were purified over Microcon columns (Millipore, Billerica, MA), as per the manufacturer's instructions, prior to use in microarray hybridization.

Amplicons to unique genomic regions were generated by PCR amplification from primers that flank each unique region. The amplicons were Klenow labeled as described above to generate a probe that is highly specific for the unique oligonucleotide sequences that were identified within that region.

In one embodiment of the present invention, in conjunction with a method of random amplification, it is possible to identify and characterize substantially all biological entities in a sample for which sequence information is available. In one such embodiment, a method for detecting a biological entity in a sample comprises randomly amplifying all nucleic acids in the sample to produce probes; labeling the probes to produce labeled probes; hybridizing the labeled probes to an array containing unique oligonucleotide sequences; and detecting the labeled probes that hybridize to the array. Hybridization of labeled probes may result in the identification of that biological entity based on the pattern of hybridization to one or multiple unique oligonucleotide sequences located on the microarray in predetermined locations.

In an alternative embodiment, the amplification step comprises a polymerase chain reaction (PCR) or other method of generating multiple copies of the original genomic material, such as the rolling circle method. Generally, conventional PCR methodology (see, e.g., Molecular Biology Techniques Manual, Third Edition (1994), Coyne et al. Eds.) can be used for the amplification. PCR and (real-time) RT-PCR amplification can be used with most environmental, veterinary, human health-related and agricultural samples that have not been cultured. There are numerous whole genome amplification schemes such as rolling circle amplification, partially random primer amplification, and the like. These are used primarily in single cell amplification techniques for characterization of sperm or eggs.

In some embodiments of the present invention, it is possible to isolate and culture specific organisms directly from a sample. In that case, purification of genomic material and Klenow labeling are sufficient for identification.

The value of a unique oligonucleotide sequence (target) as a representative region of unique genomic sequence that can identify or characterize one or more biological entities is validated by the hybridization of labeled probes to the one or more organism-specific targets immobilized on the microarray. This method is useful for the detection of one or more organisms in the context of hospitals or physicians' offices, battlefield or trauma situations, emergency response, forensic analysis, food and water monitoring, screening for indications of genetic alterations in specific biological entities, environmental analysis and background characterizations.

Array

The unique oligonucleotide sequences immobilized on the microarray may include multiple sequences from one or more known biological entities or sets of known biological entities. Preferably, the array includes multiple sequences from each of numerous known biological entities, including conserved sequences, non-conserved sequences, or both. The array contains at least one and up to two hundred different sequences, preferably at least two and up to two hundred non-overlapping sequences, from each known organism possibly present in the sample. More preferably, the array contains at least five different, non-overlapping sequences from each known organism possibly present in the sample.

Most preferably, the array contains at least 20 different, non-overlapping sequences from each known organism possibly present in the sample. The array optionally includes both sense and nonsense nucleic acid sequences from all known biological entities anticipated in the sample.

Most preferably, the unique oligonucleotide sequences are at predetermined positions on the array.

In certain preferred embodiments, the unique oligonucleotide sequences immobilized on the array are 30 or more nucleotides in length. More preferably, the unique oligonucleotide sequences on the array are between 50 and 70 nucleotides in length, but they may be longer.

In preferred embodiments, the unique oligonucleotide sequences are immobilized on a surface. In certain preferred embodiments, the surface on which the unique oligonucleotide sequences are immobilized is an opaque membrane. Preferred opaque membrane materials include, without limitation, nitrocellulose and nylon. Opaque membranes are particularly preferred in rugged situations, such as battlefield or other field applications. In certain preferred embodiments, the surface is silica-based. "Silica-based" means containing silica or a silica derivative, and any commercially available silicate chip would be useful. Silica-based chips are particularly useful for hospital or laboratory settings and are preferably used in a fluorescent reader.

Arraying the unique oligonucleotide sequences at predetermined positions on an array allows for an array-based approach to the detection of biological organisms within a given sample. The array in some embodiments may contain hundreds or several thousand unique oligonucleotide sequences in a predetermined pattern. The unique oligonucleotide sequences are printed onto the microarray using computer-controlled, high-speed robotic devices that are often termed "spotters". A spotter can be utilized to produce substantially identical arrays of the unique oligonucleotide sequences. Because the location of each unique oligonucleotide sequence is known, hybridization, detection, localization and analysis of the array may lead to the conclusion that known or unknown biological entities are present in the original sample.

In one embodiment, the present invention is useful for phylogenetic analysis of unknown biological entities. In this embodiment, the unique oligonucleotide sequences immobilized on the array contain a continuum of highly conserved nucleic acids and highly specific nucleic acids from a known organism or a set of known biological entities. Because the location of each unique oligonucleotide sequence is known, hybridization, detection, localization and analysis of the array may lead one to establish the unknown organism's kingdom, phylum, class, order, family, genus, and/or species.

Hybridization

The presence of a particular organism within a given sample is determined by hybridizing the labeled probes from the sample to targets or an array. Hybridization is preferably conducted under high stringency hybridization conditions because, in preferred embodiments, the amplified products will be at least 30, preferably at least 50, nucleotides in length. Alternatively, hybridization at temperatures lower than those required under high stringency conditions may be employed. Most preferably, a proper means of detection is used to visualize each label incorporated in the probe in order to identify which amplified product hybridized to which target.

Forms of visualization may include, but are not limited to, microscopes, FACS devices, spectrophotometers, scintillation counters, fluorometers, densitometers, devices using mass spectrometry and devices using radioisotopes or detecting radioisotopes. As the array contains targets in a known pattern, the pattern of observed hybridization is compared to the known pattern of the array to identify biological entities within the sample.
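The comparison of an observed hybridization pattern with the known array layout can be sketched as follows. This is a minimal illustration, not the analysis software of the invention: the layout dictionary, the organism names, the target identifiers and the function name are all hypothetical, and a spot is simply treated as positive or negative.

```python
from collections import defaultdict

# Hypothetical layout: (row, column) -> (organism, target ID) at predetermined positions.
layout = {
    (0, 0): ("Organism A", "A-1"),
    (0, 1): ("Organism A", "A-2"),
    (1, 0): ("Organism B", "B-1"),
    (1, 1): ("Organism B", "B-2"),
}


def score_organisms(layout, positive_spots):
    """Return {organism: (hybridized targets, total targets)} for an observed pattern."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for position, (organism, _target_id) in layout.items():
        totals[organism] += 1
        if position in positive_spots:
            hits[organism] += 1
    return {organism: (hits[organism], totals[organism]) for organism in totals}


# Hypothetical observation: both Organism A spots and one Organism B spot gave signal.
observed = {(0, 0), (0, 1), (1, 1)}
print(score_organisms(layout, observed))
```

How many positive targets are needed before an organism is called present is left open here, since the text does not specify a decision rule.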

In some embodiments, hybridization of oligonucleotide arrays was performed for 2 hours at 37-50°C. Hybridization buffer comprising 3X SSC, 20 mM HEPES pH 7.0, 0.2X SDS with 1 µg yeast tRNA and 5 µl of Cy3 (green) labeled probe was prepared in a total volume of 23 µl.

Typically, post-hybridization washes consisted of 2X SSC, 2% SDS for 5 minutes; 1X SSC, 1% SDS for 5 minutes; 1X SSC for 5 minutes; and 0.01X SSC submersion to remove residual SDS.

All washes were performed at room temperature. Washed microarrays were subsequently visualized to confirm utility of the various oligonucleotides spotted.

Detection

The probes may be modified in such a way as to be detectable when hybridized to the targets on the microarray; however, it may be possible to detect hybridization without modification of the sample. The modification can be conducted before, after or during hybridization to the array.

Most preferably the modification occurs during the amplification step. The amplification products (probes) are modified so that they are detectable directly or indirectly. Directly detectable modifications are immediately detectable, whereas indirect modification requires that the probe, before or after hybridization to the array, be subject to a subsequent modification or reaction step. For example, the probe is made directly detectable by adding a detectable molecule, such as a labeled nucleotide, to the amplification reaction mixture during amplification. The probe is indirectly modified by incorporating a reactive molecule during the amplification step. For example, an enzyme substrate is incorporated into the probe. The modified probe is then reacted with a reagent, such as an enzyme, to produce a detectable signal. In embodiments in which the probes are enzymatically detected, preferred enzymes include, without limitation, alkaline phosphatase, horseradish peroxidase, P1 nuclease, S1 nuclease and any other enzyme that produces a colored product.

In a preferred embodiment, detectable nucleotides or nucleoside triphosphates are added to the amplification reaction mixture. Preferably, the detectable nucleotides or nucleoside triphosphates are fluorescently labeled or radiolabeled. In other preferred embodiments, the label is a hapten, including, but not limited to, digoxigenin, fluorescein and dinitrophenol.

Digoxigenin labeled probes are readily detected using commercially available immunological reagents.

In certain preferred embodiments, the probes are biotinylated. Biotinylated probes are readily identified through incubation with an avidin-linked colorimetric enzyme, for example, alkaline phosphatase or horseradish peroxidase. Biotin is particularly preferred in applications in which visualization is required in the absence of fluorescence-based systems.

Alternatively, the probes contain a substance that can be derivatized to subsequently allow for the attachment of labels, such as colloidal gold.

Recent advances in molecular biology have led to the development of new methods for labeling and detecting DNA and DNA fragments. Traditionally, radioisotopes have served as sensitive labels for DNA while, more recently, fluorescent, chemiluminescent and bioactive reporter groups have also been utilized. In one embodiment, fluorochromes may be used as a method of detection. Fluorescent and chemiluminescent labels function by the emission of light as a result of the absorption of radiation and chemical reactions, respectively. Kits and protocols for labeling probes are readily available in the published literature regarding PCR amplifications.

Such kits and protocols provide detailed instructions for the labeling of probes, which can be readily adapted for the purposes of the method of the present invention. After hybridization, arrays or membranes are often washed. There are two reasons for this. One reason is to remove excess hybridization solution from the array. This helps ensure that only labeled probe specifically bound to the target remains on the array and is thus representative of the organism(s) in a given sample. Another reason is to increase the stringency of the experiment by reducing cross-hybridization. This can be promoted by washing either in a low salt wash (0.1 SSC and 0.1 SDS) or in a high temperature wash. Typical automatic hybridization systems incorporate a washing cycle as part of their automated process.

Samples

Preferred embodiments of the present invention relate to the identification of one or more known or unknown biological entities in a complex sample. The invention provides a method for the rapid identification of unknown biological entities in a sample. This invention allows scientists, technicians and medical workers to rapidly characterize unknown biological entities, including pathogens, in a sample taken from any source, including a biological sample, a human individual, an animal, water, plants or foodstuffs, soil, air, or any other environmental or forensic sample.

Methods of the invention have particular application to situations on the battlefield or during outbreaks of disease that may be caused by an unknown biological pathogen, as well as to forensic analysis, food and water monitoring to screen for indications of genetic manipulations in specific biological entities, environmental analysis and background characterizations. Using methods of the invention, unknown biological entities having or producing nucleic acids may be detected through the use of targets on an array that directly relate to organism(s) within a sample.

In addition, methods of the invention are useful for the detection of biological pathogens that affect plants or animals. These methods are particularly powerful for the characterization of novel biological entities, such as extremophile biological entities, which grow under harsh conditions. The potential threat of terrorism and battlefield use of biological weapons is growing around the world. On the battlefield, multiple biological weapons may be released at one time, thus creating a situation in which field doctors should have the capability of identifying unknown biological species in a single test. Prior to applicants' invention, however, no such method existed. In an urban setting, a single biological pathogen might be released over a broad area, or in a crowded location, with little or no warning as to the threat and event of this release, and without any statement as to the identity of the biological species that was released.

In the situations referred to above, or in the event of a natural or accidental occurrence or dissemination of a biological pathogen, the first indication of the infection of humans could be a cluster of individuals each displaying similar symptoms. However, as the initial symptoms of many biological pathogens are very similar to each other and to symptoms of the flu (e.g., headaches, fever, fatigue, aching muscles, coughing), the rapid identification of the actual biological species causing the symptoms would be a significant benefit, such that medical professionals could implement prompt and proper treatment. In addition, the method according to the invention can be used to assess the status of the etiologic agent with respect to drug resistance, thereby affording more effective treatment, e.g., through the use of one or more antibiotics to which the pathogen is not resistant.

Examples of biological pathogens which may be used for production of biological weapons, or for use in terrorism in which event the goal of such terrorism may be to kill or debilitate individuals, animals or plants, include, without limitation, Bacillus anthracis (anthrax), Yersinia pestis (bubonic plague), Brucella suis (brucellosis), Brucella melitensis, Brucella abortus, Francisella tularensis (tularemia), Coxiella burnetii (Q-fever), Pseudomonas aeruginosa (pneumonia, meningitis), Vibrio cholerae (cholera), Variola virus (small pox), Ebola virus (Ebola hemorrhagic fever), Dengue virus (Dengue hemorrhagic fever), Arboviral encephalitides, Alphaviruses (Eastern Equine Encephalitis), Flaviviruses (West Nile virus), Bunyaviruses (Crimean-Congo Hemorrhagic fever), SARS-CoV (severe acute respiratory syndrome-associated coronavirus), Botulinum toxin (botulism), Saxitoxin (respiratory paralysis), Ricinus communis (ricin), Salmonella typhimurium (salmonella gastroenteritis), Staphylococcus aureus (staphylococcal food poisoning), methicillin-resistant S. aureus (MRSA), Escherichia coli O157:H7, Clostridium perfringens (clostridium food poisoning), Clostridium botulinum, Bacillus subtilis (Bacillus food poisoning), aflatoxin and other fungal toxins, Shigella (dysentery), Yellow Fever Virus, various hemorrhagic fever viruses, encephalomyelitis viruses and various encephalitis viruses. There are also numerous animal-specific biological entities that are important to the agricultural industry, as well as biological entities that are important to the medical diagnostic community, that may be of interest, such as staphylococcus species, streptococcus species, pseudomonas species and numerous viruses known to one of ordinary skill in the art.

In an embodiment of the method described herein, unique oligonucleotide sequences from one or more of the foregoing known biological entities are immobilized on the array as representative targets for known biological entities.

In an embodiment of the method described herein, unique oligonucleotide sequences from one or more of the foregoing known biological entities are immobilized on the array as representative targets for unknown biological entities.

In another embodiment, the unknown biological entity is a pathogen. Since the method of this invention is designed to amplify substantially all DNA within the sample, the unknown biological species will be amplified through a method described herein and be present in multiple copies.

In another preferred embodiment, the sample comprises multiple (more than one) biological entities. Depending upon the type of sample chosen and the size of the sample, an array comprised of hundreds or thousands of unique oligonucleotide sequences in a predetermined pattern is created. To increase the confidence in the conclusion that the biological sample contains a known organism, the microarray preferably includes positive and negative controls and redundancies, for example multiple copies of the same unique oligonucleotide sequences. The microarray is also useful for the partial characterization and identification of unknown biological entities and may provide broad as well as specific identification. For example, 16S ribosomal RNA is used to identify the unknown organism as a bacterium, conserved bacillus sequence is used to identify the unknown organism as a particular bacillus species, and specific DNA further classifies the bacillus species and assists in the identification of a new strain. Any desired genetic material, regardless of genus, family, species or strain, may be included on the array through reference to the published literature of DNA sequences, and then by either synthesis or cloning of such published sequences.
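The broad-to-specific identification described above can be illustrated with a short sketch. It assumes, purely for illustration, that the targets on the array are grouped into three ordered tiers and that each tier either yields a label or yields no signal; the tier names, labels and function name are hypothetical.

```python
# Tiers ordered from broadest to most specific, following the example in the text.
TIERS = ["16S rRNA", "conserved bacillus sequence", "strain-specific DNA"]


def resolve_identity(signals):
    """Walk the tiers from broad to specific and stop at the first tier
    that gave no hybridization signal.

    signals: dict mapping tier name -> label detected at that tier.
    Returns a list of (tier, label) pairs for the tiers that were resolved.
    """
    resolved = []
    for tier in TIERS:
        label = signals.get(tier)
        if label is None:
            break  # nothing more specific can be concluded
        resolved.append((tier, label))
    return resolved


# Hypothetical result: the broad and bacillus-level targets light up,
# but no strain-specific target does.
partial = resolve_identity({
    "16S rRNA": "bacterium",
    "conserved bacillus sequence": "Bacillus species",
})
print(partial)
```

A result that stops before the strain tier, as in this hypothetical case, corresponds to the partial characterization of an unknown entity mentioned in the text.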

Pre-screening Method

In one embodiment, the method seeks to minimize false positive test results by pre-screening the environmental, biological or food source from which a test sample is subsequently taken.

In accordance with this pre-screening method, a "background" environmental, biological or food sample of interest is obtained, and nucleic acid sequences in the sample are amplified and combined with a microarray as described above. If amplification products hybridize to any unique oligonucleotide sequences on the array, then the unique oligonucleotide sequences to which the background probes hybridized are either removed from the array, or any signals detected at those locations on the array are ignored in subsequent assays when samples suspected of containing the same probes are analyzed. Different arrays can then be tailored to particular predetermined environments, biological samples or foods to remove or ignore signals generated by the hybridization of background nucleic acids to the array. These methods are particularly suitable for customs, security and military applications. For example, customs officials at ports of entry including airports, harbors and country borders can utilize the pre-screening method described herein to screen food samples for commonly occurring pathogens such as E. coli, Salmonella typhi, Hepatitis A virus and the like. In pathogen-free samples the level of hybridization observed for known pathogens on the array is minimal; this information is then used as a "standard" or "acceptable" guidance level to subsequently identify contaminated shipments. In another example, security personnel at ports of entry such as airports can use the pre-screening method described herein as a guide to "background" levels of pathogens or biological entities amongst baggage, mail and other transit items. Samples that screen positive for known pathogens or biological weapons as compared to the background samples can be further investigated. In a military situation, troops are mobilized to remote locations, the environments of which are pre-screened using the pre-screening method to identify background biological entities. This information is then used to facilitate the subtraction of "background" from results using a new test sample. Thus, when samples are tested during combat or hostile situations for biological warfare agents, fewer false positive test results in the pre-screened environment will be observed.

For example, in a method for detecting a target organism such as B. anthracis, an environmental sample such as air, soil, water or vegetation is obtained and the nucleic acid sequences in the sample are amplified to produce probes. The probes are combined with an array containing immobilized unique oligonucleotide sequences specific for B. anthracis as described above. If the array contains twenty unique oligonucleotide sequences for an organism such as Bacillus anthracis and twenty unique oligonucleotide sequences for an organism such as Yersinia pestis, and the background sample binds to sequences 1, 3 and 6 of Bacillus anthracis and sequences 2 and 4 of Yersinia pestis even though the sample is free from both pathogens, the array is reconfigured to remove those five sequences, or the detection software is adjusted to ignore signals generated when a probe binds to those sequences, thereby reducing false positive results.
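The background subtraction in this example can be worked through in a few lines of Python. This is a sketch of the "ignore signals in software" option only; the function name and data structures are hypothetical, and the sequences are referred to simply by their numbers 1 to 20 as in the example.

```python
def mask_background(array_targets, background_hits):
    """Drop targets that hybridized to the pathogen-free background sample."""
    return {
        organism: [t for t in targets if t not in background_hits.get(organism, set())]
        for organism, targets in array_targets.items()
    }


# Twenty unique oligonucleotide sequences per organism, numbered 1 to 20.
array_targets = {
    "Bacillus anthracis": list(range(1, 21)),
    "Yersinia pestis": list(range(1, 21)),
}
# Sequences that bound the background sample in the example above.
background_hits = {
    "Bacillus anthracis": {1, 3, 6},
    "Yersinia pestis": {2, 4},
}

usable = mask_background(array_targets, background_hits)
print({organism: len(targets) for organism, targets in usable.items()})
# prints {'Bacillus anthracis': 17, 'Yersinia pestis': 18}
```

After masking, 17 Bacillus anthracis targets and 18 Yersinia pestis targets remain available for calling a test sample, so the five background-reactive sequences can no longer produce false positives.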

In a method of the present invention for detecting toxic bacteria such as Listeria monocytogenes in food, a sample is pre-screened for interfering bovine or avian unique oligonucleotide sequences from beef or chicken food products, respectively. A sample free of pathogenic L. monocytogenes is amplified and combined with an array containing twenty unique oligonucleotide sequences specific for L. monocytogenes and twenty unique oligonucleotide sequences specific for Salmonella enteritidis. If the background food sample contains a probe that binds to the L. monocytogenes sequence 1, then that unique oligonucleotide sequence is removed from the array or the software is adapted to ignore a signal generated at that location on the array, thereby reducing false positive results and the unnecessary recall of uncontaminated food products.

Embodiments of the present invention are also useful as a means of phylogenetic analysis. In such embodiments a continuum of highly conserved nucleic acid sequences and highly specific nucleic acid sequences is used to categorize a multiplicity of biological entities from a single sample based upon the hybridization pattern generated. Thus one can determine the presence or absence of specific biological entities in the sample, as well as establish the organism's place in a hierarchy, e.g., kingdom, phylum, class, order, genus and/or species.

In addition, the present invention enables users to survey numerous unique and conserved elements throughout the genome of a particular organism of interest, in particular those elements that are responsible in some way for causing disease or for allowing the organism to resist prophylactic or therapeutic measures to defeat it. Because the present invention can identify many unique genomic sequences of the genome, and the microarray may contain unique oligonucleotide sequences from those unique genomic sequences, including structural, biochemical, virulence and resistance elements, the probability of correctly determining that a particular organism is present is dramatically increased.

The present invention utilizes unique oligonucleotide sequences identified from one or more biological entities to act as targets for hybridization. Specific hybridization of genomic material to a target can be observed on a microarray at high resolution for a number of biological entities. Furthermore these biological entities may be present in complex environmental samples.

Microarrays may be used to detect the presence of a specific biological entity but may also be refined to include both highly conserved and highly unique oligonucleotide sequences to assist in the identification of precise strains or the presence of virulence factors, such as those often found in genetically modified organisms.

The power of this technique is the ability to design a large number of unique oligonucleotide sequences that are species and/or strain specific for use in the detection and characterization of biological entities, particularly by microarray analysis. The unique genomic sequences generated by this method are better than using ribosomal genes for the detection and characterization of microbes because there is much more sequence information from which to obtain unique oligonucleotide sequences (ribosomal gene analysis ignores greater than 99% of the genome). Identifying and spotting unique oligonucleotide sequences is more cost and time effective than spotting all possible oligonucleotides from every genome. The use of randomly labeled probes, generated from genomic material, to hybridize to numerous unique oligonucleotide sequences permits the simultaneous detection of numerous biological entities in a sample. Furthermore, this method permits the detection of genetic manipulation by independently assaying for species-specific unique genomic sequences as well as virulence factors that may be introduced into an otherwise harmless genetic background.

The present invention is further illustrated by the following examples, which are not to be construed in any way as imposing limitations upon the scope thereof. On the contrary, it is to be clearly understood that resort may be had to various other aspects, embodiments, modifications, and equivalents thereof which, after reading the description herein, may suggest themselves to one of ordinary skill in the art without departing from the spirit of the present invention or the scope of the appended claims.

EXAMPLE 1 Identification of unique genomic sequences

Embodiments of the invention exhibit the ability to identify organism-specific unique sequences, which encompass both unique genomic sequences and unique oligonucleotide sequences that may not have a defined function as described in the current literature, and to utilize such unique genomic sequences to detect naturally occurring and recombinant biological entities in complex environmental, food, forensic or biological samples.

SEQ ID NOs: 1-1023 are unique genomic sequences from a variety of bacterial and viral genomes produced using the methods described herein. The percentage of unique genomic sequences from genomic DNA of various biological entities analyzed ranged from 0.06% to 21.13% (Table 1). Since the complete genome of Francisella tularensis is not known at this time, the 54.03% unique sequence for this organism was generated from a plasmid. Generally, there was less than 1% unique DNA in bacterial genomes while there was an order of magnitude more unique sequence observed in the analyzed viral genomes.
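A percentage of this kind can be computed as shown below. The sketch assumes that the figure is the summed length of the unique regions divided by the genome length, which the text does not state explicitly; the function name and the region and genome lengths are hypothetical and are not taken from Table 1.

```python
def percent_unique(unique_region_lengths, genome_length):
    """Percentage of a genome covered by unique regions (all lengths in nucleotides)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return 100.0 * sum(unique_region_lengths) / genome_length


# Hypothetical bacterial genome of 4,000,000 nucleotides with three unique regions.
print(percent_unique([1200, 800, 500], 4_000_000))  # prints 0.0625
```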

This method of generating inferred unique sequences is demonstrated in Example 2, using a unique genomic Vaccinia sequence, SEQ ID NO: 27, with the resulting inferred unique genomic sequences reported as SEQ ID NOs: 1025-1029 and SEQ ID NOs: 2072-2078. These sequences are also unique, as determined by similarity searching these inferred unique sequences against the NCBI nr database. Those inferred genomic sequences that do not show significant homology to material in the database are then termed unique genomic sequences. As such, they too become significant material assets for the differential identification of the organism from which they are derived. The combination of these unique genomic sequences along with sequence data for organism-specific expressed genes can be utilized for the generation of unique oligonucleotide sequences (SEQ ID NOs: 1030-2071), and the differential identification of biological entities listed in Table 2.

EXAMPLE 2 BLAST search of unique Vaccinia virus sequence against the nr database of NCBI showing homology between Vaccinia virus and various other biological entities.

A unique region of the Vaccinia virus genome (SEQ ID NO: 27) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 19 BLAST "hits". The pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 1025-1029 and 2072-2078. Two of the "hits" had an extremely high probability score, six had intermediate scores and eleven had low scores. The two "hits" with high scores were correctly identified by the similarity search as Vaccinia virus, with 100% homology over the full 160-nucleotide query sequence (Identities = 160/160). Sequence dissimilarities within the group with intermediate scores identified sequences of related species that have significant homology to the query sequence but were from different biological entities. Since the query sequence originated from a unique region of Vaccinia virus, it is reasonable to infer that the sequences identified by the similarity search in other evolutionarily related biological entities are also from unique regions within their genomes. In the BLAST output below, differences within the intermediate group are outlined in boxes. These differences within related biological entities can be utilized to discriminate between two or more biological entities. In this way it is possible to generate multiple unique genomic sequences from one initial query genomic sequence. First, the single query sequence was derived from a unique region of Vaccinia virus (SEQ ID NO: 27). Second, the similarity search utilizing this query sequence identified six different biological entities/strains that shared intermediate levels of homology. At this point each of the BLAST intermediate-score sequences (SEQ ID NOs: 1024-1029) was termed an inferred unique genomic sequence (candidate unique genomic sequence). Finally, these inferred unique genomic sequences are useful to identify each of the six inferred biological entities/strains.
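The grouping of hits into high, intermediate and low scores can be sketched in a few lines of Python. This is only an illustration: the two cut-off values below are assumptions chosen so that the Example 2 scores fall into the groups described in the text, since the source gives no numeric tier boundaries, and the names `Hit`, `tier` and `inferred_candidates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    description: str
    bit_score: float
    e_value: float


# Assumed, user-chosen cut-offs (not stated in the source text).
HIGH_MIN_BITS = 300.0
LOW_MAX_BITS = 50.0


def tier(hit: Hit) -> str:
    """Assign a hit to the high, intermediate or low score group."""
    if hit.bit_score >= HIGH_MIN_BITS:
        return "high"
    if hit.bit_score <= LOW_MAX_BITS:
        return "low"
    return "intermediate"


def inferred_candidates(hits):
    """Intermediate-score hits are the candidate inferred unique sequences."""
    return [h for h in hits if tier(h) == "intermediate"]


# Scores and E values taken from the Example 2 hit list (one low-score hit shown).
hits = [
    Hit("Vaccinia virus strain Ankara", 317, 2e-84),
    Hit("Vaccinia virus, complete genome", 317, 2e-84),
    Hit("Cowpox virus strain GRI-90 (Y11842.1)", 270, 5e-70),
    Hit("Cowpox virus strain GRI-90 (Y15035.1)", 270, 5e-70),
    Hit("Cowpox virus strain Brighton Red", 252, 1e-64),
    Hit("Camelpox virus M-96", 228, 2e-57),
    Hit("Camelpox virus CMS", 220, 4e-55),
    Hit("Ectromelia virus strain Moscow", 80, 9e-13),
    Hit("Caenorhabditis elegans cosmid", 38, 3.2),
]

for h in inferred_candidates(hits):
    print(h.description, h.bit_score)
```

With these assumed cut-offs the six cowpox, camelpox and ectromelia hits are returned as candidates, which matches the six intermediate-score entities described in the text; different thresholds would give a different grouping.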

Through inference, unique sequences may be identified in a partially sequenced genome or even a single database entry that has not undergone the entire process of eliminating repetitive elements, fragmentation, reverse BLAST and so forth as outlined in the above-mentioned discovery process. Hits with low scores also presented 100% homology but over distances of less than 30 nucleotides. In the following examples, a series of user-dependent thresholds was established for the BLAST search output. It is to be understood that although thresholds such as identity and sequence length were utilized in the following examples, the disclosed embodiments are merely exemplary of the invention, which may be embodied in various and alternative forms as appreciated by one skilled in the art, and are therefore not to be construed as limiting. In the following examples, BLAST hits that contained homology over at least 25 nucleotides between the query sequence and the BLAST "hit" were included. For instance, SEQ ID NO: 2078 corresponds to a sequence demonstrating 25 nucleotides of homology derived from Human DNA clone RP11-318L16 of chromosome 1. As appreciated by one skilled in the art, more than one copy of a unique genomic sequence may exist in the genome of an individual organism. It is to be understood from this and the subsequent examples that the BLAST search output as described can be used to produce unique genomic sequences and inferred unique genomic sequences for both microbial and non-microbial species.
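The 25-nucleotide inclusion threshold can be illustrated with a small filter over the "Identities" field of a BLAST alignment record. This is a minimal sketch, assuming the threshold is applied to the count of identical positions; the function names and the `min_identical` parameter are hypothetical and are not taken from the source.

```python
import re
from typing import Tuple

IDENTITIES_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")


def parse_identities(line: str) -> Tuple[int, int]:
    """Return (identical positions, aligned length) from an Identities field."""
    match = IDENTITIES_RE.search(line)
    if match is None:
        raise ValueError("no Identities field found in: %r" % line)
    return int(match.group(1)), int(match.group(2))


def passes_length_threshold(line: str, min_identical: int = 25) -> bool:
    """Keep a hit only if it shares at least min_identical nucleotides."""
    identical, _aligned = parse_identities(line)
    return identical >= min_identical


# Fields as they appear in the Example 2 output.
print(passes_length_threshold("Identities = 25/27 (92%)"))   # human clone RP11-318L16
print(passes_length_threshold("Identities = 19/19 (100%)"))  # short low-score hit
```

Under this assumption the 25/27 hit is retained while the 19/19 hits are excluded, consistent with the one low-score hit that the text singles out.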

BLASTN 2.2.4 [Aug-26-2002]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

RID: 1036169670-05727-22152
Query= (160 letters)
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
1,430,422 sequences; 7,041,770,514 total letters

Distribution of 19 BLAST Hits on the Query Sequence

Sequences producing significant alignments: Score (bits), E Value

SEQ ID NOs: 2072-2073, 2080 gi|2772662|gb|U94848.1|U94848 Vaccinia virus strain Ankara, 317 2e-84
SEQ ID NOs: 2074-2075 gi|335317|gb|M35027.1|VACCG Vaccinia virus, complete genome 317 2e-84
SEQ ID NO: 1024 gi|3096962|emb|Y11842.1|CVGRI90 Cowpox virus strain GRI-90 270 5e-70
SEQ ID NO: 1025 gi|3097015|emb|Y15035.1|CVY15035 Cowpox virus strain GRI-90... 270 5e-70
SEQ ID NOs: 1026 and 2076 gi|20152989|gb|AF482758.1| Cowpox virus strain Brighton Red... 252 1e-64
SEQ ID NO: 1027 gi|18482913|gb|AF438165.1| Camelpox virus M-96 from Kazakhs 228 2e-57
SEQ ID NO: 1028 gi|19717929|gb|AY009089.1| Camelpox virus CMS, complete genome 220 4e-55
SEQ ID NOs: 1029 and 2077 gi|22123748|gb|AF012825.2| Ectromelia virus strain Moscow, 80 9e-13
gi|14574206|gb|U23449.2| Caenorhabditis elegans cosmid K06A... 38 3.2
gi|687828|gb|U21318.1| Caenorhabditis elegans cosmid K03H9,... 38 3.2
gi|12000447|gb|AC084754.14| Homo sapiens 12p BAC RP11-874G1 38 3.2
gi|17534934|ref|NM_062895.1| Cuticulin precursor 38 3.2
gi|18250549|emb|AL627429.8| Human DNA sequence from clone R... 38 3.2
SEQ ID NO: 2078 gi|16973060|emb|AL590101.9| Human DNA sequence from clone R... 38 3.2
gi|23337297|emb|AL732317.13| Mouse DNA sequence from clone 38 3.2

Alignments

SEQ ID NOs: 2072-2073, 2080 gi|2772662|gb|U94848.1|U94848 Vaccinia virus strain Ankara, complete genomic sequence
Length = 177923
Score = 317 bits (160), Expect = 2e-84
Identities = 160/160 (100%)
Strand = Plus/Plus
Query:

1 aaatgcgatacagacattaagattgttcgactgttactctctcgcggagtcgagagactt 60
Sbjct: 168675 aaatgcgatacagacattaagattgttcgactgttactctctcgcggagtcgagagactt 168734
Query: 61 tgtagaaacaacgaaggattaactccgctaggagcatacagtaagcatagatacgtaaaa 120
Sbjct: 168735 tgtagaaacaacgaaggattaactccgctaggagcatacagtaagcatagatacgtaaaa 168794
Query: 121 tctcagattgtgcatctactgatatccagctattcgaatt 160
Sbjct: 168795 tctcagattgtgcatctactgatatccagctattcgaatt 168834

Score = 317 bits (160), Expect = 2e-84
Identities = 160/160 (100%)
Strand = Plus/Minus
Query: 1 aaatgcgatacagacattaagattgttcgactgttactctctcgcggagtcgagagactt 60
Sbjct: 9414 aaatgcgatacagacattaagattgttcgactgttactctctcgcggagtcgagagactt 9355
Query: 61 tgtagaaacaacgaaggattaactccgctaggagcatacagtaagcatagatacgtaaaa 120
Sbjct: 9354 tgtagaaacaacgaaggattaactccgctaggagcatacagtaagcatagatacgtaaaa 9295
Query: 121 tctcagattgtgcatctactgatatccagctattcgaatt 160
Sbjct: 9294 tctcagattgtgcatctactgatatccagctattcgaatt 9255

SEQ ID NOs: 2074-2075 gi|335317|gb|M35027.1|VACCG Vaccinia virus, complete genome
Length = 191737
Score = 317 bits (160), Expect = 2e-84
Identities = 160/160 (100%)
Strand = Plus/Plus
Query: 1 aaatgcgatacagacattaagattgttcgactgttactctctcgcggagtcgagagactt 60
Sbjct: 182001 aaatgcgatacagacattaagattgttcgactgttactctctcgcggagtcgagagactt 182060
Query: 61 tgtagaaacaacgaaggattaactccgctaggagcatacagtaagcatagatacgtaaaa 120
Sbjct: 182061 tgtagaaacaacgaaggattaactccgctaggagcatacagtaagcatagatacgtaaaa 182120
Query: 121 tctcagattgtgcatctactgatatccagctattcgaatt 160
Sbjct: 182121 tctcagattgtgcatctactgatatccagctattcgaatt 182160

Score = 317 bits (160), Expect = 2e-84
Identities = 160/160 (100%)
Strand = Plus/Minus
Query: 1 aaatgcgatacagacattaagattgttcgactgttactctctcgcggagtcgagagactt 60
Sbjct: 9737 aaatgcgatacagacattaagattgttcgactgttactctctcgcggagtcgagagactt 9678
Query: 61 tgtagaaacaacgaaggattaactccgctaggagcatacagtaagcatagatacgtaaaa 120
Sbjct: 9677 tgtagaaacaacgaaggattaactccgctaggagcatacagtaagcatagatacgtaaaa 9618
Query: 121 tctcagattgtgcatctactgatatccagctattcgaatt 160
Sbjct: 9617 tctcagattgtgcatctactgatatccagctattcgaatt 9578

[For the hits below with less than 100% identity, the aligned residues and their boxed differences are not legible; only the coordinate ranges are listed.]

SEQ ID NO: 1024 gi|3096962|emb|Y11842.1|CVGRI90 Cowpox virus strain GRI-90 DNA (52 kb fragment)
Length = 52283
Score = 270 bits (136), Expect = 5e-70
Identities = 154/160 (96%)
Strand = Plus/Minus
Query: 1-60, Sbjct: 7254-7195
Query: 61-120, Sbjct: 7194-7135
Query: 121-160, Sbjct: 7134-7095

SEQ ID NO: 2106 gi|3097015|emb|Y15035.1|CVY15035 Cowpox virus strain GRI-90 DNA (49 kb fragment)
Length = 49649
Score = 270 bits (136), Expect = 5e-70
Identities = 154/160 (96%)
Strand = Plus/Plus
Query: 1-60, Sbjct: 42396-42455
Query: 61-120, Sbjct: 42456-42515
Query: 121-160, Sbjct: 42516-42555

SEQ ID NO: 2109 gi|20152989|gb|AF482758.1| Cowpox virus strain Brighton Red, complete genome
Length = 224501
Score = 252 bits (127), Expect = 1e-64
Identities = 148/155 (95%)
Strand = Plus/Plus
Query: 1-60, Sbjct: 215866-215925
Query: 61-120, Sbjct: 215926-215985
Query: 121-155, Sbjct: 215986-216020

Score = 252 bits (127), Expect = 1e-64
Identities = 148/155 (95%)
Strand = Plus/Minus
Query: 1-60, Sbjct: 8636-8577
Query: 61-120, Sbjct: 8576-8517
Query: 121-155, Sbjct: 8516-8482

SEQ ID NO: 2111 gi|18482913|gb|AF438165.1| Camelpox virus M-96 from Kazakhstan, complete genome
Length = 205719
Score = 228 bits (115), Expect = 2e-57
Identities = 145/155 (93%)
Strand = Plus/Minus
Query: 1-60, Sbjct: ends at 8152
Query: 61-120, Sbjct: ends at 8092
Query: 121-155, Sbjct: ends at 8057

SEQ ID NO: 2112 gi|19717929|gb|AY009089.1| Camelpox virus CMS, complete genome
Length = 202205
Score = 220 bits (111), Expect = 4e-55
Identities = 144/155 (92%)
Strand = Plus/Minus
Query: 1-60, Sbjct: 6531-6472
Query: 61-120, Sbjct: 6471-6412
Query: 121-155, Sbjct: 6411-6377

SEQ ID NO: 2123 gi|22123748|gb|AF012825.2| Ectromelia virus strain Moscow, complete genome
Length = 209771
Score = 79.8 bits (40), Expect = 9e-13
Identities = 46/48 (95%)
Strand = Plus/Minus
Query: 1-48, Sbjct: 4668-4621

Score = 79.8 bits (40), Expect = 9e-13
Identities = 46/48 (95%)
Strand = Plus/Plus
Query: 1-48, Sbjct: 205104-205151

SEQ ID NO: 2078 gi|14574206|gb|U23449.2| Caenorhabditis elegans cosmid K06A1, complete sequence
Length = 26449
Score = 38.2 bits (19), Expect = 3.2
Identities = 19/19 (100%)
Strand = Plus/Minus
Query: 58 ctttgtagaaacaacgaag 76
Sbjct: 761 ctttgtagaaacaacgaag 743

gi|687828|gb|U21318.1| Caenorhabditis elegans cosmid K03H9, complete sequence
Length = 31731
Score = 38.2 bits (19), Expect = 3.2
Identities = 19/19 (100%)
Strand = Plus/Minus
Query: 58 ctttgtagaaacaacgaag 76
Sbjct: 31592 ctttgtagaaacaacgaag 31574

gi|12000447|gb|AC084754.14| Homo sapiens 12p BAC RP11-874G11 (Roswell Park Cancer Institute Human BAC Library) complete sequence
Length = 176626
Score = 38.2 bits (19), Expect = 3.2
Identities = 19/19 (100%)
Strand = Plus/Plus
Query: 10 acagacattaagattgttc 28
Sbjct: 164270 acagacattaagattgttc 164288

gi|17534934|ref|NM_062895.1| Cuticulin precursor
Length = 2196
Score = 38.2 bits (19), Expect = 3.2
Identities = 19/19 (100%)
Strand = Plus/Minus
Query: 58 ctttgtagaaacaacgaag 76
Sbjct: 367 ctttgtagaaacaacgaag 349

gi|18250549|emb|AL627429.8| Human DNA sequence from clone RP11-361G9 on chromosome 10, complete sequence [Homo sapiens]
Length = 23418
Score = 38.2 bits (19), Expect = 3.2
Identities = 19/19 (100%)
Strand = Plus/Plus
Query: 53 agagactttgtagaaacaa 71
Sbjct: 19509 agagactttgtagaaacaa 19527

gi|16973060|emb|AL590101.9| Human DNA sequence from clone RP11-318L16 on chromosome 1, complete sequence [Homo sapiens]
Length = 180169
Score = 38.2 bits (19), Expect = 3.2
Identities = 25/27 (92%)
Strand = Plus/Plus
Query: 117-143, Sbjct: 139363-139389

gi|23337297|emb|AL732317.13| Mouse DNA sequence from clone RP23-139F8 on chromosome 2, complete sequence [Mus musculus]
Length = 222471
Score = 38.2 bits (19), Expect = 3.2
Identities = 19/19 (100%)
Strand = Plus/Minus
Query: 9 tacagacattaagattgtt 27
Sbjct: 56772 tacagacattaagattgtt 56754

EXAMPLE 3 BLAST search of unique Vaccinia virus sequence against the nr database of NCBI showing homology between Vaccinia virus and various other biological entities.

A unique region of the Vaccinia virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 155 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2079-2099. Four of the "hits" had an extremely high probability score, eight had intermediate scores and 143 had low scores. The four "hits" with high scores were identified correctly by the BLAST search as Vaccinia virus, with 100% homology to the query sequence over one hundred fifty nucleotides. Hits with intermediate scores also presented 100% homology but over a distance of less than one hundred twenty nucleotides. The hits with low scores generally contained 90% homology for distances of less than 40 nucleotides. Sequence dissimilarities within the group with intermediate scores identify sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence came from a unique region of Vaccinia virus, it is reasonable to infer that the sequences identified in other evolutionarily related biological entities are also from unique regions within their genomes.
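The sequence dissimilarities that distinguish related entities can be located by comparing an aligned query row with the corresponding subject row. The following is a minimal sketch that assumes an ungapped alignment of equal-length rows; the function name and the short sequences in the usage lines are hypothetical and are not sequences from this document.

```python
from typing import List, Tuple


def mismatch_positions(
    query: str, subject: str, query_start: int = 1
) -> List[Tuple[int, str, str]]:
    """List (query position, query base, subject base) for every mismatch.

    Assumes the two rows come from an ungapped alignment, so they must
    have the same length and are compared position by position.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return [
        (query_start + offset, q, s)
        for offset, (q, s) in enumerate(zip(query.lower(), subject.lower()))
        if q != s
    ]


# Illustrative rows only.
print(mismatch_positions("acgtacgt", "acgaacgt"))
print(mismatch_positions("acgtacgt", "acgaacgt", query_start=61))
```

Positions returned this way mark the bases at which a probe or primer could, in principle, discriminate one related entity from another.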

Distribution of 155 Blast Hits on the Query Sequence

Sequences producing significant alignments: Score (bits), E Value

SEQ ID NO: 2079 gi|6969640|gb|AF095689.1|AF095689 Vaccinia virus (strain Ti... 317 2e-84
SEQ ID NOs: 2072-2073, 2080 gi|2772662|gb|U94848.1|U94848 Vaccinia virus strain Ankara, 317 2e-84
SEQ ID NO: 2081 gi|335691|gb|M22812.1|VACLEND Vaccinia virus genome, left end 317 2e-84
SEQ ID NO: 2082 gi|335317|gb|M35027.1|VACCG Vaccinia virus, complete genome 317 2e-84
SEQ ID NO: 2083 gi|20152989|gb|AF482758.1| Cowpox virus strain Brighton Red... 214 2e-53
SEQ ID NOs: 2084-2085 gi|17529780|gb|AF380138.1|AF380138 Monkeypox virus strain Z... 167 5e-39
SEQ ID NOs: 2086-2087 gi|18482913|gb|AF438165.1| Camelpox virus M-96 from Kazakhs... 50 8e-04
SEQ ID NOs: 2088-2089 gi|19717929|gb|AY009089.1| Camelpox virus CMS, complete genome 50 8e-04
SEQ ID NO: 2090 gi|885724|gb|U18338.1|VVU18338 Variola virus Garcia-1966 le... 50 8e-04
SEQ ID NO: 2091 gi|5830555|emb|Y16780.1|VMVY16780 variola minor virus compl... 50 8e-04
SEQ ID NOs: 2092-2093 gi|1808597|emb|X94355.1|CV41KBPL Cowpox virus 41kbp fragmen... 50 8e-04
SEQ ID NOs: 2094-2095 gi|3096962|emb|Y11842.1|CVGRI90 Cowpox virus strain GRI-90... 50 8e-04
SEQ ID NO: 2096 gi|4827091|dbj|AP000192.1| Homo sapiens genomic DNA, chromo... 44 0.052
SEQ ID NO: 2097 gi|4835679|dbj|AP000310.1| Homo sapiens genomic DNA, chromo... 44 0.052
SEQ ID NO: 2098 gi|7768678|dbj|AP001717.1| Homo sapiens genomic DNA, chromo... 44 0.052
SEQ ID NO: 2099 gi|4730850|dbj|AP000116.1| Homo sapiens genomic DNA of 21g2... 44 0.052

EXAMPLE 4 BLAST search of unique Vaccinia virus sequence against the nr database of NCBI showing homology between Vaccinia virus and various other biological entities.

A unique region of the Vaccinia virus genome (SEQ ID NO: 24) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 24 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2100-2112. One "hit" had an extremely high probability score and twenty-three had intermediate scores. The high-score "hit" was correctly identified by the BLAST search as Vaccinia virus, with 100% homology to the query sequence over one hundred sixty nucleotides. Hits with intermediate scores presented at least 90% homology over a distance of less than one hundred sixty nucleotides. Sequence dissimilarities within the group with intermediate scores identify sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence came from a unique region of Vaccinia virus, it is reasonable to infer that the sequences identified in other evolutionarily related biological entities are also from unique regions within their genomes.
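Each entry in the hit lists of these examples pairs an NCBI identifier with a description, a bit score and an E value. As a rough sketch, such a summary line can be split into fields with a regular expression, assuming the pipe-delimited `gi|number|database|accession|locus` layout; the names `HitSummary` and `parse_hit_line` are hypothetical, and the pattern is an illustration rather than a complete BLAST parser.

```python
import re
from typing import NamedTuple, Optional


class HitSummary(NamedTuple):
    gi: str
    database: str
    accession: str
    description: str
    bit_score: float
    e_value: float


# Assumed layout: gi|<number>|<db>|<accession>|<locus and description> <bits> <E value>
HIT_LINE = re.compile(
    r"^gi\|(?P<gi>\d+)"
    r"\|(?P<db>[a-z]+)"
    r"\|(?P<acc>[A-Za-z0-9_.]+)"
    r"\|(?P<desc>.*?)"
    r"\s+(?P<bits>\d+(?:\.\d+)?)"
    r"\s+(?P<evalue>\d+(?:\.\d+)?(?:e[-+]?\d+)?)\s*$"
)


def parse_hit_line(line: str) -> Optional[HitSummary]:
    """Parse one summary line; return None if it does not fit the layout."""
    match = HIT_LINE.match(line.strip())
    if match is None:
        return None
    return HitSummary(
        gi=match.group("gi"),
        database=match.group("db"),
        accession=match.group("acc"),
        description=match.group("desc").strip(),
        bit_score=float(match.group("bits")),
        e_value=float(match.group("evalue")),
    )


hit = parse_hit_line(
    "gi|335317|gb|M35027.1|VACCG Vaccinia virus, complete genome 317 2e-84"
)
print(hit)
```

Parsed records of this kind can then be sorted or filtered by bit score or E value when applying user-dependent thresholds.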

Distribution of 24 Blast Hits on the Query Sequence

Sequences producing significant alignments: Score (bits), E Value

SEQ ID NO: 2100 gi|335317|gb|M35027.1|VACCG Vaccinia virus, complete genome 317 2e-84
SEQ ID NO: 2101 gi|222717|dbj|D11079.1|VACRHF Vaccinia virus genomic DNA, 4... 309 6e-82
SEQ ID NO: 2102 gi|335810|gb|M58054.1|VACSALF19R Vaccinia virus SALF19R and... 309 6e-82
SEQ ID NO: 2103 gi|16944723|emb|AJ416893.2|VVI416893 Vaccinia virus A53R ge... 301 1e-79
SEQ ID NO: 2104 gi|6969640|gb|AF095689.1|AF095689 Vaccinia virus (strain Ti... 301 1e-79
SEQ ID NO: 2105 gi|4678693|emb|Y17728.1|VVI17728 Vaccinia virus A53R gene, 293 3e-77
SEQ ID NO: 2106 gi|3097015|emb|Y15035.1|CVY15035 Cowpox virus strain GRI-90... 285 8e-75
SEQ ID NO: 2107 gi|22123748|gb|AF012825.2| Ectromelia virus strain Moscow, 238 2e-60
SEQ ID NO: 2108 gi|2738197|gb|U93910.1|U93910 Ectromelia virus tumor necros... 238 2e-60
SEQ ID NO: 2109 gi|20152989|gb|AF482758.1| Cowpox virus strain Brighton Red... 198 1e-48
SEQ ID NO: 2110 gi|4097321|gb|U55052.1|CVU55052 Cowpox virus soluble TNF re... 198 1e-48
SEQ ID NO: 2111 gi|18482913|gb|AF438165.1| Camelpox virus M-96 from Kazakhs... 182 9e-44
SEQ ID NO: 2112 gi|19717929|gb|AY009089.1| Camelpox virus CMS, complete genome 182 9e-44

EXAMPLE 5 BLAST search of unique Vaccinia virus sequence against the nr database of NCBI showing homology between Vaccinia virus and various other biological entities.

A unique region of the Vaccinia virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 154 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2113-2128. One of the "hits" had an extremely high probability score, twelve had intermediate scores and three had low scores. The high-score "hit" was correctly identified by the BLAST search as Vaccinia virus, with 100% homology to the query sequence over one hundred sixty nucleotides. Hits with intermediate scores generally presented 90% homology over a distance of less than one hundred sixty nucleotides. The hits with low scores generally contained 90% homology for distances of less than 40 nucleotides. Sequence dissimilarities within the group with intermediate scores identify sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence came from a unique region of Vaccinia virus, it is reasonable to infer that the sequences identified in other evolutionarily related biological entities are also from unique regions within their genomes.

Distribution of 154 Blast Hits on the Query Sequence

Sequences producing significant alignments: Score (bits), E Value

SEQ ID NO: 2113 gi|335317|gb|M35027.1|VACCG Vaccinia virus, complete genome 317 2e-84
SEQ ID NO: 2114 gi|3097015|emb|Y15035.1|CVY15035 Cowpox virus strain GRI-90... 293 3e-77
SEQ ID NO: 2115 gi|6969640|gb|AF095689.1|AF095689 Vaccinia virus (strain Ti... 293 3e-77
SEQ ID NO: 2116 gi|222717|dbj|D11079.1|VACRHF Vaccinia virus genomic DNA, 4... 283 3e-74
SEQ ID NO: 2117 gi|335810|gb|M58054.1|VACSALF19R Vaccinia virus SALF19R and... 283 3e-74
SEQ ID NO: 2118 gi|16944723|emb|AJ416893.2|VVI416893 Vaccinia virus A53R ge... 278 2e-72
SEQ ID NO: 2119 gi|18482913|gb|AF438165.1| Camelpox virus M-96 from Kazakhs... 234 3e-59
SEQ ID NO: 2120 gi|19717929|gb|AY009089.1| Camelpox virus CMS, complete genome 234 3e-59
SEQ ID NO: 2121 gi|20152989|gb|AF482758.1| Cowpox virus strain Brighton Red... 230 4e-58
SEQ ID NO: 2122 gi|4097321|gb|U55052.1|CVU55052 Cowpox virus soluble TNF re... 230 4e-58
SEQ ID NO: 2123 gi|22123748|gb|AF012825.2| Ectromelia virus strain Moscow, 218 2e-54
SEQ ID NO: 2124 gi|4678693|emb|Y17728.1|VVI17728 Vaccinia virus A53R gene, 76 1e-11
SEQ ID NO: 2125 gi|2738197|gb|U93910.1|U93910 Ectromelia virus tumor necros... 52 2e-04
SEQ ID NO: 2126 gi|15668150|gb|AC096670.1| Homo sapiens BAC clone RP11-438K... 44 0.052
SEQ ID NO: 2127 gi|8218054|emb|AL033520.16|HS349A12 Human DNA sequence from... 44 0.052
SEQ ID NO: 2128 gi|23496061|gb|AE014838.1| Plasmodium falciparum 3D7 chromo 42 0.21

EXAMPLE 6 BLAST search of unique Vaccinia virus sequence against the nr database of NCBI showing homology between Vaccinia virus and various other biological entities.

A unique region of the Vaccinia virus genome (SEQ ID NO: 26) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 39 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2129-2144. Four of the "hits" had an extremely high probability score, eight had intermediate scores and four had low scores. The four "hits" with high scores were identified correctly by the BLAST search as Vaccinia virus, with 100% homology to the query sequence over one hundred sixty nucleotides. Hits with intermediate scores generally presented at least 90% homology but over a distance of less than one hundred sixty nucleotides. The hits with low scores generally contained 90% homology for distances of less than 40 nucleotides. Sequence dissimilarities within the group with intermediate scores identify sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence came from a unique region of Vaccinia virus, it is reasonable to infer that the sequences identified in other evolutionarily related biological entities are also from unique regions within their genomes.

Distribution of 39 Blast Hits on the Query Sequence

Sequences producing significant alignments: Score (bits), E Value

SEQ ID NO: 2129 gi|6969640|gb|AF095689.1|AF095689 Vaccinia virus (strain Ti... 317 2e-84
SEQ ID NO: 2130 gi|222717|dbj|D11079.1|VACRHF Vaccinia virus genomic DNA, 4... 317 2e-84
SEQ ID NO: 2131 gi|335317|gb|M35027.1|VACCG Vaccinia virus, complete genome 317 2e-84
SEQ ID NO: 2132 gi|335311|gb|M58056.1|VACB7891R Vaccinia virus B7R 21.3K pr... 317 2e-84
SEQ ID NO: 2133 gi|3097015|emb|Y15035.1|CVY15035 Cowpox virus strain GRI-90... 301 1e-79
SEQ ID NO: 2134 gi|18482913|gb|AF438165.1| Camelpox virus M-96 from Kazakhs... 293 3e-77
SEQ ID NO: 2135 gi|19717929|gb|AY009089.1| Camelpox virus CMS, complete genome 293 3e-77
SEQ ID NO: 2136 gi|22123748|gb|AF012825.2| Ectromelia virus strain Moscow, 270 5e-70
SEQ ID NO: 2137 gi|20152989|gb|AF482758.1| Cowpox virus strain Brighton Red... 270 5e-70
SEQ ID NO: 2138 gi|2772662|gb|U94848.1|U94848 Vaccinia virus strain Ankara,... 262 1e-67
SEQ ID NO: 2139 gi|4530472|gb|AF120160.1|AF120160 Vaccinia virus 21.3K prot... 246 7e-63
SEQ ID NO: 2140 gi|17529780|gb|AF380138.1|AF380138 Monkeypox virus strain Z... 119 1e-24
SEQ ID NO: 2141 gi|21488|emb|X04753.1|STLS1G Potato light-inducible tissue-... 46 0.013
SEQ ID NO: 2142 gi|15722188|emb|AL603887.3| Human DNA sequence from clone R... 42 0.21
SEQ ID NO: 2143 gi|213857|gb|L12206.1|SMOTC1LT Salmo salar (clone TC-TSS1) 38 3.2
SEQ ID NO: 2144 gi|8052273|emb|AL034559.4|PFMAL3P7 Plasmodium falciparum MA... 38 3.2

EXAMPLE 7 BLAST search of unique Vaccinia virus sequence against the nr database of NCBI showing homology between Vaccinia virus and various other biological entities.

A unique region of the Vaccinia virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 36 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2145-2156. One of the "hits" had an extremely high probability score, eleven had intermediate scores and 24 had low scores. The high-score "hit" was identified correctly by the BLAST search as Vaccinia virus, with 100% homology to the query sequence over one hundred sixty nucleotides. Hits with intermediate scores generally presented 90% homology but over a distance of less than one hundred sixty nucleotides. Sequence dissimilarities within the group with intermediate scores identify sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence came from a unique region of Vaccinia virus, it is reasonable to infer that the sequences identified in other evolutionarily related biological entities are also from unique regions within their genomes.

Distribution of 36 Blast Hits on the Query Sequence

Sequences producing significant alignments: Score (bits), E Value

SEQ ID NO: 2145 gi|335317|gb|M35027.1|VACCG Vaccinia virus, complete genome 317 2e-84
SEQ ID NO: 2146 gi|17529780|gb|AF380138.1|AF380138 Monkeypox virus strain Z... 309 6e-82
SEQ ID NO: 2147 gi|2772662|gb|U94848.1|U94848 Vaccinia virus strain Ankara, 309 6e-82
SEQ ID NO: 2148 gi|6969640|gb|AF095689.1|AF095689 Vaccinia virus (strain Ti... 301 1e-79
SEQ ID NO: 2149 gi|18482913|gb|AF438165.1| Camelpox virus M-96 from Kazakhs... 293 3e-77
SEQ ID NO: 2150 gi|19717929|gb|AY009089.1| Camelpox virus CMS, complete genome 293 3e-77
SEQ ID NO: 2151 gi|222704|dbj|D00382.1|VACH3K Vaccinia virus genes for ORF... 293 3e-77
SEQ ID NO: 2152 gi|1808597|emb|X94355.1|CV41KBPL Cowpox virus 41kbp fragmen... 285 8e-75
SEQ ID NO: 2153 gi|2285915|emb|X83621.1|CVORF1L5L Cowpox virus ORFs 1L, 2L, 3... 285 8e-75
SEQ ID NO: 2154 gi|3096962|emb|Y11842.1|CVGRI90 Cowpox virus strain GRI-90... 285 8e-75
SEQ ID NO: 2155 gi|20152989|gb|AF482758.1| Cowpox virus strain Brighton Red... 254 3e-65
SEQ ID NO: 2156 gi|22123748|gb|AF012825.2| Ectromelia virus strain Moscow, 246 7e-63

EXAMPLE 8 BLAST search of unique Vaccinia virus sequence against the nr database of NCBI showing homology between Vaccinia virus and various other biological entities.

A unique region of the Vaccinia virus genome (SEQ ID NO: 29) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 47 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2157-2178. One of the "hits" had an extremely high probability score, six had intermediate scores and forty had low scores. The "hit" with the highest score was identified correctly by the BLAST search as Vaccinia virus, with 100% homology to the query sequence over one hundred sixty nucleotides. Hits with intermediate scores generally presented at least 90% homology but over a distance of less than one hundred sixty nucleotides. The hits with low scores generally contained 90% homology for distances of less than 40 nucleotides. Sequence dissimilarities within the group with intermediate scores identify sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence came from a unique region of Vaccinia virus, it is reasonable to infer that the sequences identified in other evolutionarily related biological entities are also from unique regions within their genomes.

Distribution of 47 Blast Hits on the Query Sequence

Sequences producing significant alignments: Score (bits), E Value

SEQ ID NOs: 2157-2158 gi|335317|gb|M35027.1|VACCG Vaccinia virus, complete genome 317 2e-84
SEQ ID NOs: 2159-2160 gi|2772662|gb|U94848.1|U94848 Vaccinia virus strain Ankara,... 274 3e-71
SEQ ID NOs: 2161-2164 gi|20152989|gb|AF482758.1| Cowpox virus strain Brighton Red... 143 8e-32
SEQ ID NOs: 2165-2168 gi|18482913|gb|AF438165.1| Camelpox virus M-96 from Kazakhs... 135 2e-29
SEQ ID NOs: 2169-2172 gi|19717929|gb|AY009089.1| Camelpox virus CMS, complete genome 135 2e-29
SEQ ID NOs: 2173-2174 gi|3097015|emb|Y15035.1|CVY15035 Cowpox virus strain GRI-90... 127 5e-27
SEQ ID NOs: 2175-2176 gi|3096962|emb|Y11842.1|CVGRI90 Cowpox virus strain GRI-90 ... 127 5e-27
SEQ ID NO: 2177 gi|22002713|emb|AL731788.8| Zebrafish DNA sequence from clo... 44 0.052
SEQ ID NO: 2178 gi|13560069|emb|AL033519.42|HS340B19 Human DNA sequence fro... 38 3.2

EXAMPLE 9 BLAST search of unique Vaccinia virus sequence against the nr database of NCBI showing homology between Vaccinia virus and various other biological entities.

A unique region of the Vaccinia virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 142 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2179-2272. Five of the "hits" had an extremely high probability score and forty-five had intermediate scores. The five "hits" with high scores were identified correctly by the BLAST search as Vaccinia virus, with 100% homology to the query sequence over one hundred sixty nucleotides. Hits with intermediate scores generally presented at least 90% homology but over a distance of less than one hundred sixty nucleotides. Sequence dissimilarities within the group with intermediate scores identify sequences of related pox virus species that have significant homology to the query sequence but are from different biological entities. Since the query sequence came from a unique region of Vaccinia virus, it is reasonable to infer that the sequences identified in other evolutionarily related biological entities are also from unique regions within their genomes.

Distribution of 142 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2179  gi|4678697|emb|Y17730.1|VVI17730 Vaccinia virus B28R/C22L g...  317  2e-84
SEQ ID NO: 2180  gi|4678695|emb|Y17729.1|VVI17729 Vaccinia virus B28R/C22L g...  317  2e-84
SEQ ID NO: 2181  gi|2738036|gb|U87584.1|VVU87584 Vaccinia virus strain Colum...  317  2e-84
SEQ ID NO: 2182  gi|2738020|gb|U86872.1|VVU86872 Vaccinia virus strain Venez...  317  2e-84
SEQ ID NO: 2183  gi|2738018|gb|U86871.1|VVU86871 Vaccinia virus strain Liste...  317  2e-84
SEQ ID NO: 2184  gi|2738030|gb|U87233.1|BVU87233 Buffalopox virus tumor necr...  309  6e-82
SEQ ID NO: 2185  gi|2738028|gb|U87232.1|BVU87232 Buffalopox virus tumor necr...  309  6e-82
SEQ ID NO: 2186  gi|2738014|gb|U86873.1|RVU86873 Rabbitpox virus tumor necro...  309  6e-82
SEQ ID NO: 2187  gi|2738038|gb|U87585.1|VVU87585 Vaccinia virus strain Templ...  293  3e-77
SEQ ID NOs: 2188-2189  gi|335317|gb|M35027.1|VACCG Vaccinia virus, complete genome  293  3e-77
gi|2738140|gb|U90232.1|CVU90232 Cowpox virus tumor necrosis...  248  2e-63
SEQ ID NOs: 2190-2191  gi|2738054|gb|U87838.1|CVU87838 Camelpox virus CP1 tumor ne...  218  2e-54
SEQ ID NOs: 2192-2193  gi|2738058|gb|U87840.1|CVU87840 Camelpox virus CP5 tumor ne...  218  2e-54
SEQ ID NOs: 2194-2195  gi|18482913|gb|AF438165.1| Camelpox virus M-96 from Kazakhs...  218  2e-54
SEQ ID NOs: 2196-2199  gi|19717929|gb|AY009089.1| Camelpox virus CMS, complete genome  218  2e-54
SEQ ID NOs: 2200-2203  gi|16944722|emb|AJ416892.1|VVI416892 Vaccinia virus disrupt...  218  2e-54
SEQ ID NOs: 2204-2205  gi|2738056|gb|U87839.1|CVU87839 Camelpox virus strain Saudi...  218  2e-54
SEQ ID NOs: 2206-2207  gi|2738052|gb|U87837.1|CVU87837 Camelpox virus strain Somal...  218  2e-54
SEQ ID NOs: 2208-2209  gi|2738128|gb|U90226.1|CVU90226 Cowpox virus tumor necrosis...  206  6e-51
SEQ ID NO: 2210  gi|885833|gb|U18341.1|VVU18341 Variola virus Somalia-1977 r...  204  2e-50
SEQ ID NOs: 2211-2212  gi|885766|gb|U18339.1|VVU18339 Variola virus Garcia-1966 ri...  204  2e-50
SEQ ID NOs: 2213-2214  gi|5830555|emb|Y16780.1|VMVY16780 variola minor virus compl...  204  2e-50
SEQ ID NOs: 2215-2216  gi|456758|emb|X69198.1|VVCGAA Variola virus DNA complete ge...  204  2e-50
SEQ ID NOs: 2217-2218  gi|516428|emb|X67117.1|VVXHOIFOH Variola virus (XhoI-F, O, H,...  204  2e-50
SEQ ID NOs: 2219-2220  gi|2738098|gb|U88150.1|VVU88150 Variola virus tumor necrosi...  204  2e-50
SEQ ID NOs: 2221-2222  gi|2738096|gb|U88149.1|VVU88149 Variola virus tumor necrosi...  204  2e-50
SEQ ID NOs: 2223-2224  gi|2738094|gb|U88148.1|VVU88148 Variola virus tumor necrosi...  204  2e-50
SEQ ID NOs: 2225-2226  gi|2738092|gb|U88147.1|VVU88147 Variola virus tumor necrosi...  204  2e-50
SEQ ID NOs: 2227-2228  gi|2738090|gb|U88146.1|VVU88146 Variola virus tumor necrosi...  204  2e-50
SEQ ID NOs: 2229-2230  gi|2738088|gb|U88145.1|VVU88145 Variola virus tumor necrosi...  204  2e-50
SEQ ID NOs: 2231-2232  gi|2738082|gb|U88142.1|MVU88142 Monkeypox virus tumor necro...  204  2e-50
SEQ ID NOs: 2233-2234  gi|2738070|gb|U87846.1|MVU87846 Monkeypox virus strain Beni...  204  2e-50
SEQ ID NOs: 2235-2236  gi|2738068|gb|U87845.1|MVU87845 Monkeypox virus strain vair  204  2e-50
SEQ ID NOs: 2237-2238  gi|2738066|gb|U87844.1|MVU87844 Monkeypox virus strain Nige...  204  2e-50
SEQ ID NOs: 2239-2240  gi|2738064|gb|U87843.1|MVU87843 Monkeypox virus strain Sier...  204  2e-50
SEQ ID NOs: 2241-2242  gi|2738062|gb|U87842.1|MVU87842 Monkeypox virus strain Libe...  204  2e-50
SEQ ID NOs: 2243-2244  gi|22266276|emb|X70841.1|VVORF Variola virus genes for ORF1...  204  2e-50
SEQ ID NOs: 2245-2246  gi|2738016|gb|U86874.1|TVU86874 Taterapox virus tumor necro...  200  4e-49
SEQ ID NOs: 2247-2248  gi|2738072|gb|U87847.1|MVU87847 Monkeypox virus strain Zair...  196  6e-48
SEQ ID NOs: 2249-2250  gi|17529780|gb|AF380138.1|AF380138 Monkeypox virus strain Z...  190  4e-46
SEQ ID NOs: 2251-2254  gi|22123748|gb|AF012825.2| Ectromelia virus strain Moscow,  188  1e-45
SEQ ID NOs: 2255-2256  gi|2738011|gb|U86381.1|EVU86381 Ectromelia virus tumor necr...  188  1e-45
SEQ ID NO: 2257  gi|2738010|gb|U86380.1|EVU86380 Ectromelia virus tumor necr...  188  1e-45
SEQ ID NO: 2258  gi|2738086|gb|U88144.1|MVU88144 Monkeypox virus tumor necro...  182  9e-44
SEQ ID NOs: 2259-2260  gi|2738084|gb|U88143.1|MVU88143 Monkeypox virus tumor necro...  182  9e-44
SEQ ID NOs: 2261-2262  gi|2738080|gb|U87995.1|MVU87995 Monkeypox virus clone CV1 t...  182  9e-44
SEQ ID NOs: 2263-2264  gi|2738078|gb|U87994.1|MVU87994 Monkeypox virus clone CW-N1...  182  9e-44
SEQ ID NOs: 2265-2266  gi|2738102|gb|U88152.1|VVU88152 Variola virus tumor necrosi...  176  5e-42
SEQ ID NOs: 2269-2270  gi|2738100|gb|U88151.1|VVU88151 Variola virus tumor necrosi...  176  5e-42
SEQ ID NOs: 2271-2272  gi|623595|gb|L22579.1|VARCG Variola major virus (strain Ban...  176  5e-42
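For readers who want to work with hit lists of this kind programmatically, the following is a minimal sketch of how one summary line might be split into its fields. It assumes the pipe-delimited `gi|number|database|accession|locus` identifier layout of legacy NCBI BLAST summary output; the `Hit` type, the `parse_hit` function and the regular expression are illustrative names introduced here and are not part of the described method.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    gi: str
    db: str
    accession: str
    description: str
    score_bits: float
    e_value: float


# Assumed layout: gi|<number>|<db>|<accession>|<optional locus> <description> <score> <E value>
_LINE = re.compile(
    r"^gi\|(?P<gi>\d+)\|(?P<db>[a-z]+)\|(?P<acc>[A-Z_]+\d+\.\d+)\|(?P<locus>\S*)\s+"
    r"(?P<desc>.+?)\s+(?P<score>[\d.]+(?:e[+-]?\d+)?)\s+(?P<evalue>\S+)$"
)


def parse_hit(line: str) -> Hit:
    """Split one BLAST summary line into identifier, description, score and E value."""
    m = _LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    e_text = m.group("evalue")
    # Summary output abbreviates values such as 1e-121 to "e-121".
    if e_text.startswith("e"):
        e_text = "1" + e_text
    return Hit(
        gi=m.group("gi"),
        db=m.group("db"),
        accession=m.group("acc"),
        description=m.group("desc"),
        score_bits=float(m.group("score")),
        e_value=float(e_text),
    )


hit = parse_hit("gi|4678697|emb|Y17730.1|VVI17730 Vaccinia virus B28R/C22L g... 317 2e-84")
print(hit.accession, hit.score_bits, hit.e_value)
```

The last two whitespace-separated tokens are taken as the score and the E value, so truncated descriptions that themselves end in a number (for example "section 360...") are still parsed correctly.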

EXAMPLE 10 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome (SEQ ID NO: 1) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 29 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2273-2279. Two of the "hits" had an extremely high probability score and two had intermediate scores. The two "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over one thousand nucleotides. Hits with intermediate scores generally presented at least 90% homology over a distance of several hundred nucleotides. Sequence dissimilarities within the group with intermediate scores identify sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence came from a unique region of Yersinia pestis, it is reasonable to infer that the sequences identified in other evolutionarily related biological entities are also from unique regions within their genomes.

Distribution of 29 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NOs: 2273-2274  gi|21960540|gb|AE013960.1| Yersinia pestis KIM section 360...  1982  0.0
SEQ ID NOs: 2275-2276  gi|15978563|emb|AJ414143.1| Yersinia pestis strain CO92 com...  1982  0.0
SEQ ID NOs: 2277-2278  gi|21958495|gb|AE013773.1| Yersinia pestis KIM section 173...  442  e-121
SEQ ID NOs: 2279-2280  gi|15980308|emb|AJ414152.1| Yersinia pestis strain CO92 com...  442  e-121

EXAMPLE 11 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome (SEQ ID NO: 2) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 8 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2280-2282. Three of the "hits" had an extremely high probability score. The three "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over one thousand nucleotides.

Distribution of 8 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2281  gi|21960532|gb|AE013959.1| Yersinia pestis KIM section 359...  1982  0.0
SEQ ID NO: 2282  gi|10945159|emb|AJ277629.1|YPE277629 Yersinia pestis yapF gene  1982  0.0
SEQ ID NO: 2283  gi|15978563|emb|AJ414143.1| Yersinia pestis strain CO92 com...  1982  0.0

EXAMPLE 12 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome (SEQ ID NO: 3) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 15 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2284-2285. Two of the "hits" had an extremely high probability score. The two "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over one thousand nucleotides.

Distribution of 15 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2284  gi|21960382|gb|AE013946.1| Yersinia pestis KIM section 346...  1982  0.0
SEQ ID NO: 2285  gi|15978734|emb|AJ414144.1| Yersinia pestis strain CO92 com...  1982  0.0

EXAMPLE 13 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome (SEQ ID NO: 4) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 13 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2286-2288. Two of the "hits" had an extremely high probability score and one had a low score. The two "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over one thousand nucleotides. The low-scoring hit presented 92% homology over a distance of twenty-six nucleotides. Sequence dissimilarities within the group with intermediate scores identify sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence came from a unique region of Yersinia pestis, it is reasonable to infer that the sequences identified in other evolutionarily related biological entities are also from unique regions within their genomes.

Distribution of 13 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2286  gi|21958827|gb|AE013803.1| Yersinia pestis KIM section 203...  1982  0.0
SEQ ID NO: 2287  gi|15980308|emb|AJ414152.1| Yersinia pestis strain CO92 com...  1982  0.0
SEQ ID NO: 2288  gi|1619271|emb|Z80904.1|CICOS1 Ciona intestinalis DNA seque...  40  5.7

EXAMPLE 14 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome (SEQ ID NO: 5) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 8 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2289-2291. Three of the "hits" had an extremely high probability score. The three "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over one thousand nucleotides.

Distribution of 8 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2289  gi|21958526|gb|AE013776.1| Yersinia pestis KIM section 176...  1982  0.0
SEQ ID NO: 2290  gi|15980308|emb|AJ414152.1| Yersinia pestis strain CO92 com...  1982  0.0
SEQ ID NO: 2291  gi|5162956|gb|AF079973.1|AF079973 Yersinia pseudotuberculos...  1830  0.0

EXAMPLE 15 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome (SEQ ID NO: 6) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 10 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2292-2295. Two of the "hits" had an extremely high probability score. The two "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over two hundred seventy nucleotides.

Distribution of 10 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NOs: 2292-2293  gi|21956857|gb|AE013619.1| Yersinia pestis KIM section 19 o...  541  e-151
SEQ ID NOs: 2294-2295  gi|15981524|emb|AJ414158.1| Yersinia pestis strain CO92 com...  541  e-151

EXAMPLE 16 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome (SEQ ID NO: 7) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 11 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2296-2297. Two of the "hits" had an extremely high probability score. The two "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over one thousand nucleotides.

Distribution of 11 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2296  gi|21957368|gb|AE013668.1| Yersinia pestis KIM section 68 o...  1982  0.0
SEQ ID NO: 2297  gi|15981328|emb|AJ414157.1| Yersinia pestis strain CO92 com...  1982  0.0

EXAMPLE 17 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome (SEQ ID NO: 8) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 111 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2298-2300. Two of the "hits" had an extremely high probability score and one had a low score. The two "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over nine hundred nucleotides. The low-scoring hit presented 93% homology over a distance of twenty-seven nucleotides.

Distribution of 111 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2298  gi|21959847|gb|AE013896.1| Yersinia pestis KIM section 296...  1941  0.0
SEQ ID NO: 2299  gi|15979242|emb|AJ414147.1| Yersinia pestis strain CO92 com...  1941  0.0
SEQ ID NO: 2300  gi|12721731|gb|AE006174.1|AE006174 Pasteurella multocida PM...  42  1.4

EXAMPLE 18 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 11 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2301-2302. Two of the "hits" had an extremely high probability score. The two "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over one thousand nucleotides.

Distribution of 11 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2301  gi|21959617|gb|AE013876.1| Yersinia pestis KIM section 276...  1982  0.0
SEQ ID NO: 2302  gi|15979410|emb|AJ414148.1| Yersinia pestis strain CO92 com...  1982  0.0

EXAMPLE 19 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 31 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2302-2305. Three of the "hits" had an extremely high probability score. The three "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over approximately one thousand nucleotides.

Distribution of 31 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2303  gi|21959227|gb|AE013840.1| Yersinia pestis KIM section 240...  1911  0.0
SEQ ID NO: 2304  gi|15979723|emb|AJ414150.1| Yersinia pestis strain CO92 com...  1911  0.0
SEQ ID NO: 2305  gi|4106567|emb|AL031866.1|YP102KB Yersinia pestis 102 kbase...  1911  0.0

EXAMPLE 20 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 12 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2306-2309. Three of the "hits" had an extremely high probability score and one had a low score. The three "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over approximately one thousand nucleotides. The low-scoring hit presented 93% homology over a distance of twenty-eight nucleotides.

Distribution of 12 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2306  gi|21959227|gb|AE013840.1| Yersinia pestis KIM section 240...  1966  0.0
SEQ ID NO: 2307  gi|15979723|emb|AJ414150.1| Yersinia pestis strain CO92 com...  1966  0.0
SEQ ID NO: 2308  gi|4106567|emb|AL031866.1|YP102KB Yersinia pestis 102 kbase...  1966  0.0
SEQ ID NO: 2309  gi|23123895|ref|NZ_AABC01000054.1| Nostoc punctiforme Npun_...  44  0.36

EXAMPLE 21 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 22 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2314-2320. Two of the "hits" had an extremely high probability score and seven had intermediate scores. The two "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over one thousand nucleotides. The hits with intermediate scores presented at least 96% homology over a distance of nine hundred nucleotides. Sequence dissimilarities within the group with intermediate scores identify sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence came from a unique region of Yersinia pestis, it is reasonable to infer that the sequences identified in other evolutionarily related biological entities are also from unique regions within their genomes.

Distribution of 22 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2314  gi|21957269|gb|AE013658.1| Yersinia pestis KIM section 58 o...  1982  0.0
SEQ ID NO: 2315  gi|15978388|emb|AJ414142.1| Yersinia pestis strain CO92 com...  1982  0.0
SEQ ID NO: 2316  gi|21960382|gb|AE013946.1| Yersinia pestis KIM section 346...  1832  0.0
SEQ ID NO: 2317  gi|15978734|emb|AJ414144.1| Yersinia pestis strain CO92 com...  1832  0.0
SEQ ID NO: 2318  gi|21960376|gb|AE013945.1| Yersinia pestis KIM section 345...  1776  0.0
SEQ ID NO: 2319  gi|21958640|gb|AE013786.1| Yersinia pestis KIM section 186...  1673  0.0
SEQ ID NO: 2320  gi|15979570|emb|AJ414149.1| Yersinia pestis strain CO92 com...  1665  0.0

EXAMPLE 22 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 10 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2321-2323. Two of the "hits" had an extremely high probability score and one had a low score. The two "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over one thousand nucleotides. The low-scoring hit presented 82% homology over a distance of sixty-six nucleotides.

Distribution of 10 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2321  gi|21959041|gb|AE013824.1| Yersinia pestis KIM section 224...  1982  0.0
SEQ ID NO: 2322  gi|15980007|emb|AJ414151.1| Yersinia pestis strain CO92 com...  1982  0.0
SEQ ID NO: 2323  gi|23471572|ref|NZ_AABH01000007.1| Pseudomonas syringae pv....  48  0.023
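The grouping of hits into high, intermediate and low scores used throughout these examples can be sketched as a simple rule. The text does not state numeric cut-offs, so the two thresholds below (`high_fraction` and `low_e`) are illustrative assumptions only, and the function name `tier` is hypothetical; suitable values would need to be chosen for each query.

```python
def tier(score: float, e_value: float, top_score: float,
         high_fraction: float = 0.99, low_e: float = 1e-3) -> str:
    """Assign a hit to a score tier.

    Assumed rule (not taken from the source text):
      - "high": bit score within `high_fraction` of the best score in the search
      - "low": E value above `low_e`, i.e. a match that could easily arise by chance
      - "intermediate": everything in between
    """
    if score >= high_fraction * top_score:
        return "high"
    if e_value > low_e:
        return "low"
    return "intermediate"


# (description, bit score, E value) as reported for the search above
hits = [
    ("Yersinia pestis KIM", 1982, 0.0),
    ("Yersinia pestis strain CO92", 1982, 0.0),
    ("Pseudomonas syringae", 48, 0.023),
]
top = max(score for _, score, _ in hits)
for name, score, e_value in hits:
    print(name, tier(score, e_value, top))
```

Measuring "high" relative to the best score in the same search, rather than by a fixed E value, keeps the rule usable for short queries whose exact matches do not reach an E value of 0.0.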

EXAMPLE 23 BLAST search of unique Yersinia pestis sequence against the nr database of NCBI showing homology between Yersinia pestis and various other biological entities.

A unique region of the Yersinia pestis genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 26 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2324-2326. Two of the "hits" had an extremely high probability score and one had a low score. The two "hits" with high scores were identified correctly by the BLAST search as Yersinia pestis, with 100% homology to the query sequence over approximately one thousand nucleotides. The low-scoring hit presented 90% homology over a distance of twenty-nine nucleotides.

Distribution of 26 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2324  gi|21960738|gb|AE013979.1| Yersinia pestis KIM section 379  1966  0.0
SEQ ID NO: 2325  gi|15978388|emb|AJ414142.1| Yersinia pestis strain CO92 com...  1966  0.0
SEQ ID NO: 2326  gi|23474677|ref|NZ_AABI01000005.1| Desulfovibrio desulfuric  40  5.7

EXAMPLE 24 BLAST search of unique Eastern equine encephalitis virus sequence against the nr database of NCBI showing homology between Eastern equine encephalitis virus and various other biological entities.

A unique region of the Eastern equine encephalitis virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 39 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2327-2344. Two of the "hits" had an extremely high probability score, and eleven had intermediate scores. The two "hits" with high scores were identified correctly by the BLAST search as Eastern equine encephalitis virus, with 100% homology to the query sequence over hundreds of nucleotides. Sequence dissimilarities within the group with intermediate scores identified sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence originated from a unique region of Eastern equine encephalitis virus, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes. The hits with intermediate scores presented approximately 92% homology over a distance of approximately 50 nucleotides.

Distribution of 39 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2327  gi|59185|emb|X63135.1|EEEVIRNA Eastern Equine Encephalomyel...  2577  0.0
SEQ ID NO: 2328  gi|393006|gb|U01034.1|U01034 Eastern equine encephalomyelit...  2426  0.0
SEQ ID NO: 2329  gi|22001302|gb|AF525498.1| Eastern equine encephalitis viru...  62  2e-06
SEQ ID NO: 2330  gi|22001298|gb|AF525496.1| Eastern equine encephalitis viru...  62  2e-06
SEQ ID NOs: 2331-2332  gi|398206|emb|X74892.1|WEEVNS Western Equine Encephalitis V...  58  4e-05
SEQ ID NOs: 2333-2334  gi|6760410|gb|AF214040.1|AF214040 Western equine encephalom...  58  4e-05
SEQ ID NOs: 2335-2336  gi|393033|gb|U01065.1|WEU01065 Western equine encephalomyel...  58  4e-05
SEQ ID NOs: 2337-2338  gi|4262314|gb|AF075256.1|AF075256 Venezuelan equine encepha...  50  0.009
SEQ ID NOs: 2339-2340  gi|323706|gb|L00930.1|EEVNSPECFA Venezuelan equine encephal...  46  0.14
SEQ ID NO: 2341  gi|663260|emb|Z48163.1|SFVRNAIS Semliki forest virus A7 RNA...  44  0.53
SEQ ID NO: 2342  gi|4262320|gb|AF075258.1|AF075258 Venezuelan equine encepha...  40  8.3
SEQ ID NO: 2343  gi|4262317|gb|AF075257.1|AF075257 Venezuelan equine encepha...  40  8.3
SEQ ID NO: 2344  gi|4262311|gb|AF075255.1|AF075255 Venezuelan equine encepha...  40  8.3
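The relationship between the two numeric columns can be checked with a short calculation. Under the standard BLAST statistics, the expected number of chance matches is approximately E = m·n·2^(−S), where S is the bit score and m·n is the search space, which is fixed within one search. The E value should therefore double for every one-bit drop in score. The sketch below applies that relation to the scores and E values reported above; the function name `scaled_e_value` is a hypothetical helper, and agreement is only expected to within rounding because the reported values carry one significant digit.

```python
def scaled_e_value(ref_score: float, ref_e: float, score: float) -> float:
    """E value expected for `score`, given one (score, E) pair from the same search.

    Uses E = m*n*2**(-S): within a single search m*n is constant, so
    E(score) = E(ref) * 2**(ref_score - score).
    """
    return ref_e * 2.0 ** (ref_score - score)


# Reference hit: score 62 bits, E = 2e-06 (SEQ ID NO: 2329 above)
reported = [(58, 4e-05), (50, 0.009), (46, 0.14), (44, 0.53), (40, 8.3)]
for score, e_reported in reported:
    e_predicted = scaled_e_value(62, 2e-06, score)
    print(f"score {score}: predicted {e_predicted:.2g}, reported {e_reported:g}")
```

This also shows why hits scoring near 40 bits carry little weight here: at that score, several unrelated matches are expected by chance alone in a database of this size.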

EXAMPLE 25 BLAST search of unique Eastern equine encephalitis virus sequence against the nr database of NCBI showing homology between Eastern equine encephalitis virus and various other biological entities.

A unique region of the Eastern equine encephalitis virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 9 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2345-2346. Two of the "hits" had an extremely high probability score. These two "hits" with high scores were identified by the BLAST search as Eastern equine encephalitis virus, with 100% homology to the query sequence over approximately one thousand nucleotides.

Distribution of 9 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2345  gi|59185|emb|X63135.1|EEEVIRNA Eastern Equine Encephalomyel...  1586  0.0
SEQ ID NO: 2346  gi|393006|gb|U01034.1|U01034 Eastern equine encephalomyelit...  1459  0.0

EXAMPLE 26 BLAST search of unique Ebola virus sequence against the nr database of NCBI showing homology between Ebola virus and various other biological entities.

A unique region of the Ebola virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 189 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2347-2482. Seventeen of the "hits" had an extremely high probability score, and twenty-nine had intermediate scores. The seventeen "hits" with high scores were identified correctly by the BLAST search as Ebola virus, with 100% homology to the query sequence over thousands of nucleotides. Sequence dissimilarities within the group with intermediate scores identified sequences of related species or strains that have significant homology to the query sequence but are from different biological entities. Since the query sequence originated from a unique region of Ebola virus, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes. The hits with intermediate scores presented approximately 92% homology over a distance of less than one thousand nucleotides.

Distribution of 189 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2347  gi|10141003|gb|AF086833.2| Zaire Ebola virus strain Mayinga...  3.742e+04  0.0
SEQ ID NO: 2348  gi|23630482|gb|AY142960.1| Zaire Ebola virus strain Mayinga...  3.741e+04  0.0
SEQ ID NO: 2349  gi|11761745|gb|AF272001.1| Zaire Ebola virus strain Mayinga...  3.735e+04  0.0
SEQ ID NO: 2350  gi|21702647|gb|AF499101.1| Zaire Ebola virus strain Mayinga...  3.731e+04  0.0
SEQ ID NO: 2351  gi|2522270|gb|L11365.1|EBORNA Zaire Ebola virus nucleoprote...  2.311e+04  0.0
SEQ ID NO: 2352  gi|2546940|emb|X67110.1|EBLPROTG Zaire Ebola virus L gene e...  1.419e+04  0.0
SEQ ID NO: 2353  gi|323686|gb|J04337.1|EBONP Zaire Ebola virus nucleoprotein...  5935  0.0
SEQ ID NO: 2354  gi|297395|emb|X61274.1|EVVP23 Zaire Ebola virus vp2 gene an...  5672  0.0
SEQ ID NO: 2355  gi|1041204|gb|U23187.1|EVU23187 Zaire Ebola virus Mayinga s...  4732  0.0
SEQ ID NO: 2356  gi|1141778|gb|U31033.1|EVU31033 Zaire Ebola virus envelope...  4728  0.0
SEQ ID NO: 2357  gi|1753170|gb|U81161.1|EVU81161 Zaire Ebola virus virion sp...  4708  0.0
SEQ ID NO: 2358  gi|2138276|gb|U77384.1|EVU77384 Zaire Ebola virus strain Ga...  4458  0.0
SEQ ID NO: 2359  gi|1695251|gb|U28077.1|EVU28077 Zaire Ebola virus strain Za...  4454  0.0
SEQ ID NO: 2360  gi|6006454|emb|Y09358.1|EVNUCLEOP Zaire Ebola virus N gene...  4177  0.0
SEQ ID NO: 2361  gi|3005674|gb|AF054908.1| Zaire Ebola virus nucleocapsid pr...  4163  0.0
SEQ ID NO: 2362  gi|16751327|gb|AY058898.1| Zaire Ebola virus spike glycopro...  3957  0.0
SEQ ID NO: 2363  gi|16751321|gb|AY058895.1| Zaire Ebola virus nucleoprotein...  3307  0.0
SEQ ID NO: 2364  gi|16751323|gb|AY058896.1| Zaire Ebola virus matrix protein...  1972  0.0
SEQ ID NO: 2365  gi|2138279|gb|U77385.1|EVU77385 Zaire Ebola virus strain Ga...  1635  0.0
SEQ ID NO: 2366  gi|16751325|gb|AY058897.1| Ebola virus membrane associated...  1628  0.0
SEQ ID NOs: 2367-2392  gi|15823608|dbj|AB050936.1| Reston Ebola virus genomic RNA,...  256  1e-63
SEQ ID NOs: 2393-2419  gi|22671623|gb|AF522874.1| Reston Ebola virus strain Pennsy...  248  3e-61
SEQ ID NOs: 2420-2422  gi|5762337|gb|AF173836.1| Sudan Ebola virus strain Boniface...  206  9e-49
SEQ ID NO: 2423  gi|323684|gb|M33062.1|EBOMAY Zaire Ebola virus 3'proximal...  167  8e-37
SEQ ID NOs: 2424-2426  gi|1041217|gb|U28006.1|EVU28006 Cote d'Ivoire Ebola virus v...  143  1e-29
SEQ ID NOs: 2427-2436  gi|1041213|gb|U23458.1|EVU23458 Sudan Ebola virus Maleo str...  129  2e-25
SEQ ID NO: 2437  gi|1041223|gb|U28134.1|EVU28134 Sudan Ebola virus strain Bo...  82  4e-11
SEQ ID NO: 2438  gi|1041198|gb|U23069.1|EVU23069 Sudan Ebola virus Maleo str...  82  4e-11
SEQ ID NOs: 2439-2445  gi|450908|emb|Z29337.1|MVVIRPR Marburg virus (Popp) NP, VP...  72  3e-08
SEQ ID NOs: 2446-2451  gi|296962|emb|X68494.1|MAVSPAB Marburg Virus genomic RNA of...  72  3e-08
SEQ ID NOs: 2452-2454  gi|1041201|gb|U23152.1|EVU23152 Reston Ebola virus glycopro...  70  1e-07
SEQ ID NOs: 2455-2456  gi|1041207|gb|U23416.1|EVU23416 Reston Ebola virus Philippi...  70  1e-07
SEQ ID NOs: 2457-2459  gi|3253214|gb|AF034645.1|AF034645 Ebola virus Reston (GP)...  70  1e-07
SEQ ID NOs: 2460-2463  gi|332178|gb|M92834.1|MRVMBGL Marburg virus L protein (mbgl...  64  9e-06
SEQ ID NOs: 2464-2468  gi|541780|emb|Z12132.1|MVREPCYC Marburg virus genes for vp3...  64  9e-06
SEQ ID NOs: 2469-2471  gi|1041210|gb|U23417.1|EVU23417 Reston Ebola virus Siena st...  62  3e-05
SEQ ID NO: 2472  gi|8570260|gb|AC013412.3|AC013412 Homo sapiens BAC clone RP...  52  0.032
SEQ ID NO: 2473  gi|5263178|dbj|D83729.1| Homo sapiens AMGY gene for ameloge...  52  0.032
SEQ ID NO: 2474  gi|23499701|gb|AC122207.2| Mus musculus chromosome 16 clone...  50  0.13
SEQ ID NO: 2475  gi|27802036|gb|AC068476.13| Homo sapiens chromosome 8, clon...  48  0.51
SEQ ID NO: 2476  gi|18151023|gb|AC093428.2| Homo sapiens chromosome 1 clone...  48  0.51
SEQ ID NO: 2477  gi|20330806|gb|AC106793.2| Homo sapiens chromosome 16 clone...  46  2.0
SEQ ID NO: 2478  gi|20196842|gb|AC002332.3| Arabidopsis thaliana chromosome...  44  7.9
SEQ ID NO: 2479  gi|18854986|gb|AC108121.2| Homo sapiens chromosome 5 clone...  44  7.9
SEQ ID NO: 2480  gi|18677374|gb|AC106771.2| Homo sapiens chromosome 5 clone...  44  7.9
SEQ ID NO: 2481  gi|12483715|gb|AF178425.1|AF178425 Lactococcus lactis TcsCo...  44  7.9
SEQ ID NO: 2482  gi|296964|emb|X68495.1|MAVSPAC Marburg Virus genomic RNA of...  44  7.9

EXAMPLE 27 BLAST search of unique Ebola virus sequence against the nr database of NCBI showing homology between Ebola virus and various other biological entities.

A unique region of the Ebola virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 137 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2483-2512. Nine of the "hits" had an extremely high probability score, and twelve had intermediate scores. The nine "hits" with high scores were identified correctly by the BLAST search as Ebola virus, with 100% homology to the query sequence over one thousand nucleotides. Sequence dissimilarities within the group with intermediate scores identified sequences of related species or strains that have significant homology to the query sequence but are from different biological entities. Since the query sequence originated from a unique region of Ebola virus, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes. The hits with intermediate scores presented approximately 92% homology over a distance of less than one thousand nucleotides.

Distribution of 137 Blast Hits on the Query Sequence

SEQ ID NO(s)  Sequences producing significant alignments  Score (bits)  E Value
SEQ ID NO: 2483  gi|23630482|gb|AY142960.1| Zaire Ebola virus strain Mayinga...  7929  0.0
SEQ ID NO: 2484  gi|10141003|gb|AF086833.2| Zaire Ebola virus strain Mayinga...  7929  0.0
SEQ ID NO: 2485  gi|21702647|gb|AF499101.1| Zaire Ebola virus strain Mayinga...  7906  0.0
SEQ ID NO: 2486  gi|11761745|gb|AF272001.1| Zaire Ebola virus strain Mayinga...  7906  0.0
SEQ ID NO: 2487  gi|2522270|gb|L11365.1|EBORNA Zaire Ebola virus nucleoprote...  7894  0.0
SEQ ID NO: 2488  gi|297395|emb|X61274.1|EVVP23 Zaire Ebola virus vp2 gene an...  5672  0.0
SEQ ID NO: 2489  gi|323686|gb|J04337.1|EBONP Zaire Ebola virus nucleoprotein...  2008  0.0
SEQ ID NO: 2490  gi|16751323|gb|AY058896.1| Zaire Ebola virus matrix protein...  1972  0.0
SEQ ID NO: 2491  gi|6006454|emb|Y09358.1|EVNUCLEOP Zaire Ebola virus N gene...  1281  0.0
SEQ ID NO: 2492  gi|3005674|gb|AF054908.1| Zaire Ebola virus nucleocapsid pr...  1271  0.0
SEQ ID NO: 2493  gi|16751321|gb|AY058895.1| Zaire Ebola virus nucleoprotein...  965  0.0
SEQ ID NO: 2494  gi|1041204|gb|U23187.1|EVU23187 Zaire Ebola virus Mayinga s...  204  8e-49
SEQ ID NO: 2495  gi|1753170|gb|U81161.1|EVU81161 Zaire Ebola virus virion sp...  204  8e-49
SEQ ID NO: 2496  gi|1141778|gb|U31033.1|EVU31033 Zaire Ebola virus envelope...  200  1e-47
SEQ ID NO: 2497  gi|1695251|gb|U28077.1|EVU28077 Zaire Ebola virus strain Za...  172  3e-39
SEQ ID NO: 2498  gi|2138276|gb|U77384.1|EVU77384 Zaire Ebola virus strain Ga...  157  2e-34
SEQ ID NOs: 2499-2504  gi|15823608|dbj|AB050936.1| Reston Ebola virus genomic RNA,...  100  3e-17
SEQ ID NOs: 2505-2510  gi|22671623|gb|AF522874.1| Reston Ebola virus strain Pennsy...  88  1e-13
SEQ ID NO: 2511  gi|23499701|gb|AC122207.2| Mus musculus chromosome 16 clone...  50  0.027
SEQ ID NO: 2512  gi|27802036|gb|AC068476.13| Homo sapiens chromosome 8, clon...  48  0.11

EXAMPLE 28 BLAST search of unique Ebola virus sequence against the nr database of NCBI showing homology between Ebola virus and various other biological entities.

A unique region of the Ebola virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 117 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2513-2591. Six of the "hits" had an extremely high probability score, and twenty-three had intermediate scores. The six "hits" with high scores were identified correctly by the BLAST search as Ebola virus with 100% homology to the query sequence over one thousand nucleotides. Sequence dissimilarities within the group with intermediate scores identified BLAST sequences of related species or strains that have significant homology to the query sequence but are from different biological entities. Since the query sequence originated from a unique region of Ebola virus, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes. The intermediate hit scores presented approximately 92% homology over a distance of less than one thousand nucleotides.
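By way of a non-limiting illustration, the one-line "hit" summaries reported in these examples can be read programmatically. The following Python sketch assumes that each summary has the form gi|number|database|accession|locus, followed by a description, a bit score and an E value, with "|" as the field delimiter. The names Hit and parse_hit are hypothetical and are not part of BLAST or of this disclosure.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST summary line (hypothetical container for illustration)."""
    gi: str
    accession: str
    description: str
    bits: float
    evalue: float


# Assumed layout: gi|<number>|<db>|<accession>|<optional locus> <description> <bits> <E value>
_LINE = re.compile(
    r"gi\|(?P<gi>\d+)\|(?P<db>\w+)\|(?P<acc>[\w.]+)\|(?P<locus>\S*)\s+"
    r"(?P<desc>.*?)\s+(?P<bits>[\d.]+(?:e[+-]?\d+)?)\s+(?P<e>\S+)\s*$"
)


def parse_hit(line: str) -> Hit:
    """Split a summary line into identifier, description, bit score and E value."""
    m = _LINE.search(line)
    if m is None:
        raise ValueError(f"not a recognizable hit summary: {line!r}")
    e = m["e"]
    # Very small E values are printed without a leading digit, e.g. "e-124".
    if e.startswith("e"):
        e = "1" + e
    return Hit(m["gi"], m["acc"], m["desc"], float(m["bits"]), float(e))
```

Text preceding the gi identifier, such as a SEQ ID NO label, is ignored by the pattern, so a summary line can be passed in as printed.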

Distribution of 117 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2513  gi|10141003|gb|AF086833.2| Zaire Ebola virus strain Mayinga...  2.259e+04  0.0
SEQ ID NO: 2514  gi|23630482|gb|AY142960.1| Zaire Ebola virus strain Mayinga...  2.258e+04  0.0
SEQ ID NO: 2515  gi|11761745|gb|AF272001.1| Zaire Ebola virus strain Mayinga...  2.255e+04  0.0
SEQ ID NO: 2516  gi|21702647|gb|AF499101.1| Zaire Ebola virus strain Mayinga...  2.253e+04  0.0
SEQ ID NO: 2517  gi|2546940|emb|X67110.1|EBLPROTG Zaire Ebola virus L gene e...  1.419e+04  0.0
SEQ ID NO: 2518  gi|2522270|gb|L11365.1|EBORNA Zaire Ebola virus nucleoprote...  8338  0.0
SEQ ID NO: 2519  gi|2138279|gb|U77385.1|EVU77385 Zaire Ebola virus strain Ga...  1635  0.0
SEQ ID NO: 2520  gi|16751325|gb|AY058897.1| Ebola virus membrane associated...  1628  0.0
SEQ ID NO: 2521  gi|1141778|gb|U31033.1|EVU31033 Zaire Ebola virus envelope...  1596  0.0
SEQ ID NO: 2522  gi|1041204|gb|U23187.1|EVU23187 Zaire Ebola virus Mayinga s...  1596  0.0
SEQ ID NO: 2523  gi|2138276|gb|U77384.1|EVU77384 Zaire Ebola virus strain Ga...  1592  0.0
SEQ ID NO: 2524  gi|1753170|gb|U81161.1|EVU81161 Zaire Ebola virus virion sp...  1580  0.0
SEQ ID NO: 2525  gi|1695251|gb|U28077.1|EVU28077 Zaire Ebola virus strain Za...  1524  0.0
SEQ ID NO: 2526  gi|16751327|gb|AY058898.1| Zaire Ebola virus spike glycopro...  1261  0.0
SEQ ID NOs: 2527-2539  gi|15823608|dbj|AB050936.1| Reston Ebola virus genomic RNA,...  256  7e-64
SEQ ID NOs: 2540-2553  gi|22671623|gb|AF522874.1| Reston Ebola virus strain Pennsy...  248  2e-61
SEQ ID NO: 2554  gi|1041217|gb|U28006.1|EVU28006 Cote d'Ivoire Ebola virus v...  143  7e-30
SEQ ID NOs: 2555-2564  gi|1041213|gb|U23458.1|EVU23458 Sudan Ebola virus Maleo str...  129  1e-25
SEQ ID NOs: 2565-2570  gi|450908|emb|Z29337.1|MVVIRPR Marburg virus (Popp) NP, VP...  72  2e-08
SEQ ID NOs: 2571-2576  gi|296962|emb|X68494.1|MAVSPAB Marburg Virus genomic RNA of...  72  2e-08
SEQ ID NOs: 2577-2580  gi|332178|gb|M92834.1|MRVMBGL Marburg virus L protein (mbgl...  64  5e-06
SEQ ID NOs: 2581-2584  gi|541780|emb|Z12132.1|MVREPCYC Marburg virus genes for vp3...  64  5e-06
SEQ ID NO: 2585  gi|8570260|gb|AC013412.3|AC013412 Homo sapiens BAC clone RP...  52  0.020
SEQ ID NO: 2586  gi|5263178|dbj|D83729.1| Homo sapiens AMGY gene for ameloge...  52  0.020
SEQ ID NO: 2587  gi|1041201|gb|U23152.1|EVU23152 Reston Ebola virus glycopro...  46  1.2
SEQ ID NO: 2588  gi|1041207|gb|U23416.1|EVU23416 Reston Ebola virus Philippi...  46  1.2
SEQ ID NO: 2589  gi|1041210|gb|U23417.1|EVU23417 Reston Ebola virus Siena st...  46  1.2
SEQ ID NO: 2590  gi|20330806|gb|AC106793.2| Homo sapiens chromosome 16 clone...  46  1.2
SEQ ID NO: 2591  gi|3253214|gb|AF034645.1|AF034645 Ebola virus Reston (GP)...  46  1.2

EXAMPLE 29

BLAST search of unique Ebola virus sequence against the nr database of NCBI showing homology between Ebola virus and various other biological entities.

A unique region of the Ebola virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 49 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2592-2608. Six of the "hits" had an extremely high probability score, and eight had intermediate scores. The six "hits" with high scores were identified correctly by the BLAST search as Ebola virus with 100% homology to the query sequence of approximately one thousand nucleotides. Sequence dissimilarities within the group with intermediate scores identified BLAST sequences of related species or strains that have significant homology to the query sequence but are from different biological entities. Since the query sequence originated from a unique region of Ebola virus, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes. The intermediate hit scores presented approximately 92% homology over a distance of less than one thousand nucleotides.

Distribution of 49 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2592  gi|23630482|gb|AY142960.1| Zaire Ebola virus strain Mayinga...  1982  0.0
SEQ ID NO: 2593  gi|2522270|gb|L11365.1|EBORNA Zaire Ebola virus nucleoprote...  1982  0.0
SEQ ID NO: 2594  gi|10141003|gb|AF086833.2| Zaire Ebola virus strain Mayinga...  1982  0.0
SEQ ID NO: 2595  gi|11761745|gb|AF272001.1| Zaire Ebola virus strain Mayinga...  1982  0.0
SEQ ID NO: 2596  gi|21702647|gb|AF499101.1| Zaire Ebola virus strain Mayinga...  1974  0.0
SEQ ID NO: 2597  gi|323686|gb|J04337.1|EBONP Zaire Ebola virus nucleoprotein...  1953  0.0
SEQ ID NO: 2598  gi|6006454|emb|Y09358.1|EVNUCLEOP Zaire Ebola virus N gene...  1025  0.0
SEQ ID NO: 2599  gi|3005674|gb|AF054908.1| Zaire Ebola virus nucleocapsid pr...  1013  0.0
SEQ ID NO: 2600  gi|16751321|gb|AY058895.1| Zaire Ebola virus nucleoprotein...  452  e-124
SEQ ID NO: 2601  gi|323684|gb|M33062.1|EBOMAY Zaire Ebola virus 3'proximal...  167  4e-38
SEQ ID NOs: 2602-2603  gi|22671623|gb|AF522874.1| Reston Ebola virus strain Pennsy...  84  5e-13
SEQ ID NOs: 2604-2605  gi|5762337|gb|AF173836.1| Sudan Ebola virus strain Boniface...  82  2e-12
SEQ ID NOs: 2606-2607  gi|15823608|dbj|AB050936.1| Reston Ebola virus genomic RNA,...  68  3e-08
SEQ ID NO: 2608  gi|12964273|emb|AL162378.16| Human DNA sequence from clone  44  0.41

EXAMPLE 30

BLAST search of unique Ebola virus sequence against the nr database of NCBI showing homology between Ebola virus and various other biological entities.

A unique region of the Ebola virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 102 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2609-2641. Five of the "hits" had an extremely high probability score, and nine had intermediate scores. The five "hits" with high scores were identified correctly by the BLAST search as Ebola virus with 100% homology to the query sequence of over one thousand nucleotides. Sequence dissimilarities within the group with intermediate scores identified BLAST sequences of related species or strains that have significant homology to the query sequence but are from different biological entities. Since the query sequence originated from a unique region of Ebola virus, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes. The intermediate hit scores presented approximately 92% homology over a distance of less than one thousand nucleotides.

Distribution of 102 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2609  gi|23630482|gb|AY142960.1| Zaire Ebola virus strain Mayinga...  9789  0.0
SEQ ID NO: 2610  gi|10141003|gb|AF086833.2| Zaire Ebola virus strain Mayinga...  9789  0.0
SEQ ID NO: 2611  gi|11761745|gb|AF272001.1| Zaire Ebola virus strain Mayinga...  9781  0.0
SEQ ID NO: 2612  gi|21702647|gb|AF499101.1| Zaire Ebola virus strain Mayinga...  9765  0.0
SEQ ID NO: 2613  gi|2546940|emb|X67110.1|EBLPROTG Zaire Ebola virus L gene e...  9307  0.0
SEQ ID NOs: 2614-2618  gi|1041213|gb|U23458.1|EVU23458 Sudan Ebola virus Maleo str...  113  3e-21
SEQ ID NOs: 2619-2626  gi|22671623|gb|AF522874.1| Reston Ebola virus strain Pennsy...  58  1e-04
SEQ ID NOs: 2627-2633  gi|15823608|dbj|AB050936.1| Reston Ebola virus genomic RNA,...  58  1e-04
SEQ ID NO: 2634  gi|8570260|gb|AC013412.3|AC013412 Homo sapiens BAC clone RP...  52  0.008
SEQ ID NO: 2635  gi|5263178|dbj|D83729.1| Homo sapiens AMGY gene for ameloge...  52  0.008
SEQ ID NOs: 2636-2637  gi|450908|emb|Z29337.1|MVVIRPR Marburg virus (Popp) NP, VP...  48  0.13
SEQ ID NOs: 2638-2639  gi|296962|emb|X68494.1|MAVSPAB Marburg Virus genomic RNA of...  48  0.13
SEQ ID NO: 2640  gi|15216072|emb|AL596208.3| Human DNA sequence from clone R...  44  2.1
SEQ ID NO: 2641  gi|20068695|emb|AL663113.7| Mouse DNA sequence from clone R...  44  2.1

EXAMPLE 31

BLAST search of unique Francisella tularensis sequence against the nr database of NCBI showing homology between Francisella tularensis and various other biological entities.

A unique region of the Francisella tularensis genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 152 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2642-2650. One of the "hits" had an extremely high probability score, and eight had low scores. The single "hit" with a high score was identified correctly by the BLAST search as Francisella tularensis with 100% homology to the query sequence of over one thousand nucleotides. The low hit scores presented approximately 96% homology over a distance of less than thirty nucleotides.
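By way of a non-limiting illustration, the grouping of "hits" into high, intermediate and low scores can be sketched in Python as follows. The cut-offs used here are assumptions chosen only for illustration and are not specified in this disclosure: a "hit" below 60 bits is treated as low, and a "hit" scoring at least half of the top bit score is treated as high. The function names tier and summarize are hypothetical.

```python
from collections import Counter
from typing import Iterable


def tier(bits: float, top_bits: float,
         high_frac: float = 0.5, low_bits: float = 60.0) -> str:
    """Assign one bit score to a tier (illustrative cut-offs, not from the disclosure)."""
    if bits < low_bits:
        return "low"
    if bits >= high_frac * top_bits:
        return "high"
    return "intermediate"


def summarize(scores: Iterable[float]) -> Counter:
    """Count how many hits fall into each tier, relative to the best hit."""
    scores = list(scores)
    top = max(scores)
    return Counter(tier(s, top) for s in scores)


# Bit scores of the nine reported hits for this Francisella tularensis query.
example_31 = [2266, 46, 44, 44, 44, 44, 44, 44, 44]
print(summarize(example_31))
```

Under these assumed cut-offs, the nine scores reported for this query fall into one high "hit" and eight low "hits", matching the description above. Different cut-offs would move the boundaries between groups.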

Distribution of 152 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2642  gi|148686|gb|M32059.1|FRNTUL4 Francisella tularensis 13-kDa...  2266  0.0
SEQ ID NO: 2643  gi|23337712|emb|AL844221.6| Mouse DNA sequence from clone R...  46  0.12
SEQ ID NO: 2644  gi|13899180|gb|AC061709.25|AC061709 Homo sapiens 12 BAC RP1...  44  0.46
SEQ ID NO: 2645  gi|9581959|gb|AC018677.3|AC018677 Homo sapiens BAC clone RP...  44  0.46
SEQ ID NO: 2646  gi|3695400|gb|AF096373.1|T9A4 Arabidopsis thaliana BAC T9A4  44  0.46
SEQ ID NO: 2647  gi|4538949|emb|AL049488.1|ATF24G24 Arabidopsis thaliana DNA...  44  0.46
SEQ ID NO: 2648  gi|7267723|emb|AL161517.2|ATCHRIV29 Arabidopsis thaliana DN...  44  0.46
SEQ ID NO: 2649  gi|23503541|dbj|AP004617.2| Oryza sativa (japonica cultivar...  44  0.46
SEQ ID NO: 2650  gi|22759507|emb|AL772149.4| Mouse DNA sequence from clone R...  44  0.46

EXAMPLE 32

BLAST search of unique Francisella tularensis sequence against the nr database of NCBI showing homology between Francisella tularensis and various other biological entities.

A unique region of the Francisella tularensis genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 122 BLAST "hits". The most pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2651-2678. Twenty-eight of the "hits" had a low probability score. The "hit" with a high score was identified correctly by the BLAST search as Francisella tularensis with 100% homology to the query sequence of over one thousand nucleotides. The low hit scores presented at least 90% homology over a distance of less than thirty-five nucleotides.

Distribution of 122 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2651  gi|24347213|gb|AE015592.1| Shewanella oneidensis MR-1 secti...  58  1e-05
SEQ ID NO: 2652  gi|22954051|ref|NZ_AAAY01000001.1| Nitrosomonas europaea Ne...  56  4e-05
SEQ ID NO: 2653  gi|10305228|gb|AC074317.5|AC074317 Staphylococcus aureus cl...  52  6e-04
SEQ ID NO: 2654  gi|21203693|dbj|AP004824.1| Staphylococcus aureus subsp. au...  52  6e-04
SEQ ID NO: 2655  gi|14349173|dbj|AP003131.2| Staphylococcus aureus subsp. au...  52  6e-04
SEQ ID NO: 2656  gi|14246388|dbj|AP003360.2| Staphylococcus aureus subsp. au...  52  6e-04
SEQ ID NO: 2657  gi|22987492|ref|NZ_AAAC01000283.1| Burkholderia fungorum Bc...  50  0.002
SEQ ID NO: 2658  gi|924614|gb|U20248.1|DNCVRL01 Dichelobacter nodosus C305 l...  48  0.010
SEQ ID NO: 2659  gi|3493323|gb|U20246.1|DNAVRL01 Dichelobacter nodosus strai...  48  0.010
SEQ ID NO: 2660  gi|2983975|gb|AE000749.1|AE000749 Aquifex aeolicus section...  48  0.010
SEQ ID NO: 2661  gi|23475151|ref|NZ_AABI01000008.1| Desulfovibrio desulfuric...  48  0.010
SEQ ID NO: 2662  gi|23028157|ref|NZ_AAAT01000010.1| Microbulbifer degradans ...  48  0.010
SEQ ID NO: 2663  gi|15622956|dbj|AP000988.1| Sulfolobus tokodaii genomic DNA...  46  0.039
SEQ ID NO: 2664  gi|30072|emb|X12784.1|HSCOL4A12 Human col4a1 and col4a2 gen...  46  0.039
SEQ ID NO: 2665  gi|14970659|emb|AL161773.21| Human DNA sequence from clone...  46  0.039
SEQ ID NO: 2666  gi|9664777|gb|AF269456.1|AF269456 Staphylococcus epidermidi...  44  0.15
SEQ ID NO: 2667  gi|9623773|gb|AF269873.1|AF269873 Staphylococcus epidermidi...  44  0.15
SEQ ID NO: 2668  gi|21646597|gb|AE012839.1| Chlorobium tepidum TLS section ...  42  0.60
SEQ ID NO: 2669  gi|24053061|gb|AE015283.1| Shigella flexneri 2a str. 301 se...  40  2.4
SEQ ID NO: 2670  gi|9946710|gb|AE004517.1| Pseudomonas aeruginosa PA01, sect...  40  2.4
SEQ ID NO: 2671  gi|16421268|gb|AE008824.1| Salmonella typhimurium LT2, sect...  40  2.4
SEQ ID NO: 2672  gi|16421239|gb|AE008823.1| Salmonella typhimurium LT2, sect...  40  2.4
SEQ ID NO: 2673  gi|12517034|gb|AE005491.1|AE005491 Escherichia coli O157:H7...  40  2.4
SEQ ID NO: 2674  gi|12721069|gb|AE006115.1|AE006115 Pasteurella multocida PM...  40  2.4
SEQ ID NO: 2675  gi|2367142|gb|AE000347.1|AE000347 Escherichia coli K12 MG16...  40  2.4
SEQ ID NO: 2676  gi|4981015|gb|AE001727.1|AE001727 Thermotoga maritima secti...  40  2.4
SEQ ID NO: 2677  gi|16505370|emb|AL627283.1| Salmonella enterica serovar Typ...  40  2.4
SEQ ID NO: 2678  gi|16503805|emb|AL627276.1| Salmonella enterica serovar Typ...  40  2.4

EXAMPLE 33

BLAST search of unique Brucella melitensis sequence against the nr database of NCBI showing homology between Brucella melitensis and various other biological entities.

A unique region of the Brucella melitensis genome (SEQ ID NO: 16) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 8 BLAST "hits". The pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2679-2680. Two of the "hits" had an extremely high probability score. The two "hits" with high scores were identified by the BLAST search as Brucella species with 100% homology to the query sequence over one hundred fifty nucleotides. Sequence dissimilarities within the two sequences identified BLAST sequences of related species that have significant homology to the query sequence but are from different Brucella strains. Since the query sequence originated from a unique region of Brucella melitensis, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes.

Distribution of 8 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2679  gi|17983790|gb|AE009610.1|AE009610 Brucella melitensis stra...  317  2e-84
SEQ ID NO: 2680  gi|23346950|gb|AE014331.1| Brucella suis 1330 chromosome I...  301  1e-79

EXAMPLE 34

BLAST search of unique Brucella melitensis sequence against the nr database of NCBI showing homology between Brucella melitensis and various other biological entities.

A unique region of the Brucella melitensis genome (SEQ ID NO: 19) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 12 BLAST "hits". The pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2681-2687. Two of the "hits" had an extremely high probability score, and five had low scores. The two "hits" with high scores were identified by the BLAST search as Brucella species with 100% homology to the query sequence over one hundred fifty nucleotides. The low hit scores presented approximately 87% homology over a distance of less than thirty nucleotides.

Distribution of 12 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2681  gi|23346822|gb|AE014320.1| Brucella suis 1330 chromosome I...  317  2e-84
SEQ ID NO: 2682  gi|17983926|gb|AE009622.1|AE009622 Brucella melitensis stra...  317  2e-84
SEQ ID NO: 2683  gi|23347355|gb|AE014365.1| Brucella suis 1330 chromosome I...  50  8e-04
SEQ ID NO: 2684  gi|17983364|gb|AE009575.1|AE009575 Brucella melitensis stra...  50  8e-04
SEQ ID NO: 2685  gi|14026998|dbj|AP003012.2| Mesorhizobium loti DNA, complet...  42  0.21
SEQ ID NO: 2686  gi|17743624|gb|AE008942.1|AE008942 Agrobacterium tumefacien...  40  0.82
SEQ ID NO: 2687  gi|15161950|gb|AE007890.1|AE007890 Agrobacterium tumefacien...  40  0.82

EXAMPLE 35

BLAST search of unique Brucella melitensis sequence against the nr database of NCBI showing homology between Brucella melitensis and various other biological entities.

A unique region of the Brucella melitensis genome (SEQ ID NO: 20) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 6 BLAST "hits". The pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2688-2689. Two of the "hits" had an extremely high probability score. The two "hits" with high scores were identified by the BLAST search as Brucella species with 100% homology to the query sequence over one hundred fifty nucleotides.

Distribution of 6 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2688  gi|23346813|gb|AE014319.1| Brucella suis 1330 chromosome I...  317  2e-84
SEQ ID NO: 2689  gi|17983926|gb|AE009622.1|AE009622 Brucella melitensis stra...  317  2e-84

EXAMPLE 36

BLAST search of unique Brucella melitensis sequence against the nr database of NCBI showing homology between Brucella melitensis and various other biological entities.

A unique region of the Brucella melitensis genome (SEQ ID NO: 21) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 11 BLAST "hits". The pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2690-2691. Two of the "hits" had an extremely high probability score. The two "hits" with high scores were identified by the BLAST search as Brucella species with 100% homology to the query sequence over one hundred fifty nucleotides.

Distribution of 11 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2690  gi|23346813|gb|AE014319.1| Brucella suis 1330 chromosome I...  317  2e-84
SEQ ID NO: 2691  gi|17983926|gb|AE009622.1|AE009622 Brucella melitensis stra...  317  2e-84

EXAMPLE 37

BLAST search of unique Brucella melitensis sequence against the nr database of NCBI showing homology between Brucella melitensis and various other biological entities.

A unique region of the Brucella melitensis genome (SEQ ID NO: 22) was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 5 BLAST "hits". The pertinent "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2692-2693. Two of the "hits" had an extremely high probability score. The two "hits" with high scores were identified by the BLAST search as Brucella species with 100% homology to the query sequence over one hundred fifty nucleotides.

Distribution of 5 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2692  gi|23346813|gb|AE014319.1| Brucella suis 1330 chromosome I...  317  2e-84
SEQ ID NO: 2693  gi|17983926|gb|AE009622.1|AE009622 Brucella melitensis stra...  317  2e-84

EXAMPLE 38

BLAST search of unique Clostridium perfringens sequence against the nr database of NCBI showing homology between Clostridium perfringens and various other biological entities.

A unique region of the Clostridium perfringens genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 130 BLAST "hits". The observed "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2694-2739. Two of the "hits" had an extremely high probability score, three had intermediate scores and nineteen had low scores. The two "hits" with high scores were identified correctly by the BLAST search as Clostridium species with 100% homology to the query sequence over one hundred sixty nucleotides. Sequence dissimilarities within the group with intermediate scores identified BLAST sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence originated from a unique region of Clostridium perfringens, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes. The low hit scores presented at least 81% homology over a distance of less than fifty nucleotides.

Distribution of 130 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2694  gi|18143657|dbj|AP003185.1| Clostridium perfringens str. 13...  991  0.0
SEQ ID NO: 2695  gi|15022817|gb|AE007513.1|AE007513 Clostridium acetobutylic...  82  8e-13
SEQ ID NOs: 2696-2697  gi|532282|dbj|D28808.1|MYCMTLAGYR Mycoplasma capricolum mtl...  68  1e-08
SEQ ID NO: 2698  gi|1790872|gb|U35453.1|CAU35453 Clostridium acetobutylicum ...  66  5e-08
SEQ ID NO: 2699  gi|22775678|dbj|AP004593.1| Oceanobacillus iheyensis genomi...  66  5e-08
SEQ ID NO: 2700  gi|23496883|gb|AE014850.1| Plasmodium falciparum 3D7 chromo...  62  8e-07
SEQ ID NO: 2701  gi|4099104|gb|U83664.1|MAU83664 Mycoplasma arthritidis gyra...  58  1e-05
SEQ ID NO: 2702  gi|3861033|emb|AJ235272.1|RPXX03 Rickettsia prowazekii stra...  56  5e-05
SEQ ID NO: 2703  gi|23121882|ref|NZ_AAAW01000001.1| Prochlorococcus marinus...  56  5e-05
SEQ ID NO: 2704  gi|21903714|gb|AF036961.2| Mycoplasma hominis glucose 1-pho...  54  2e-04
SEQ ID NO: 2705  gi|13654371|gb|AC025948.16|AC025948 Staphylococcus aureus c...  52  7e-04
SEQ ID NO: 2706  gi|9665188|gb|AC025950.9|AC025950 Staphylococcus aureus clo...  52  7e-04
SEQ ID NO: 2707  gi|296393|emb|X71437.1|SAGYRREC S. aureus genes gyrB, gyrA a...  52  7e-04
SEQ ID NO: 2708  gi|49345|emb|Z19108.1|SCORICA S. citri dnaA, dnaN, gyrA and...  52  7e-04
SEQ ID NO: 2709  gi|16412421|emb|AL596163.1| Listeria innocua Clip11262 comp...  52  7e-04
SEQ ID NO: 2710  gi|21203164|dbj|AP004822.1| Staphylococcus aureus subsp. au...  52  7e-04
SEQ ID NO: 2711  gi|14349167|dbj|AP003129.2| Staphylococcus aureus subsp. au...  52  7e-04
SEQ ID NO: 2712  gi|153083|gb|M86227.1|STARECF Staphylococcus aureus DNA gyr...  52  7e-04
SEQ ID NO: 2713  gi|14245767|dbj|AP003358.2| Staphylococcus aureus subsp. au...  52  7e-04
SEQ ID NO: 2714  gi|540540|dbj|D10489.1|STAGYRABA Staphylococcus aureus gene...  52  7e-04
SEQ ID NO: 2715  gi|14089942|emb|AL445565.1|MPULM03 Mycoplasma pulmonis (str...  50  0.003
SEQ ID NO: 2716  gi|24796729|gb|AC090937.2| Homo sapiens chromosome 3 clone...  48  0.012
SEQ ID NOs: 2717-2718  gi|22533630|gb|AE014219.1| Streptococcus agalactiae 2603V/R...  48  0.012
SEQ ID NO: 2719  gi|15150602|gb|AC093179.1| Homo sapiens chromosome 3 clone...  48  0.012
SEQ ID NOs: 2720-2721  gi|23094961|emb|AL766846.1|SAG766846 Streptococcus agalacti...  48  0.012
SEQ ID NO: 2722  gi|21328233|gb|AF084042.1| Listeria monocytogenes RecF (rec...  46  0.046
SEQ ID NO: 2723  gi|10881100|gb|AC017047.4|AC017047 Homo sapiens BAC clone R...  46  0.046
SEQ ID NO: 2724  gi|9623821|gb|AF269920.1|AF269920 Staphylococcus epidermidi...  46  0.046
SEQ ID NO: 2725  gi|16409359|emb|AL591973.1| Listeria monocytogenes strain E...  46  0.046
SEQ ID NO: 2726  gi|23002411|ref|NZ_AAAB01000003.1| Lactobacillus gasseri Lg...  46  0.046
SEQ ID NO: 2727  gi|10172612|dbj|AP001507.1| Bacillus halodurans genomic DNA...  46  0.046
SEQ ID NO: 2728  gi|2760176|dbj|AB010081.1| Bacillus sp. gene for B subunit...  46  0.046
SEQ ID NO: 2729  gi|5672640|dbj|AB013492.1| Bacillus halodurans C-125 genomi...  46  0.046
SEQ ID NO: 2730  gi|21622892|gb|AE014076.1| Buchnera aphidicola str. Sg (Sch...  44  0.18
SEQ ID NO: 2731  gi|4887144|gb|AF138873.1|AF138873 Mus musculus p73 gene, ex...  44  0.18
SEQ ID NO: 2732  gi|4099109|gb|U83665.1|MAU83665 Mycoplasma arthritidis parE...  44  0.18
SEQ ID NO: 2733  gi|2827005|gb|AF008210.1|AF008210 Buchnera aphidicola genom...  44  0.18
SEQ ID NO: 2734  gi|453417|emb|X77529.1|MHGYRBLIC M. hominis gyrB and licA genes  44  0.18
SEQ ID NO: 2735  gi|4138442|emb|AJ005956.1|SAY5956 Sesarma ayatum 16S rRNA g...  44  0.18
SEQ ID NO: 2736  gi|4138440|emb|AJ005951.1|SAY5951 Sesarma ayatum 16S rRNA g...  44  0.18
SEQ ID NO: 2737  gi|3687379|emb|AJ225891.1|SAJ225891 Sesarma sp. 16S rRNA ge...  44  0.18
SEQ ID NO: 2738  gi|23003070|ref|NZ_AAAB01000010.1| Lactobacillus gasseri Lg...  44  0.18
SEQ ID NO: 2739  gi|144144|gb|M80817.1|BUHRRDDG Buchnera aphidicola ribosoma...  44  0.18

EXAMPLE 39

BLAST search of unique Clostridium perfringens sequence against the nr database of NCBI showing homology between Clostridium perfringens and various other biological entities.

A unique region of the Clostridium perfringens genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 121 BLAST "hits". The observed "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2740-2784. Three of the "hits" had an extremely high probability score, five had intermediate scores and thirty-four had low scores. The two "hits" with high scores were identified correctly by the BLAST search as Clostridium perfringens with 100% homology to the query sequence over one hundred eighty nucleotides. Sequence dissimilarities within the group with intermediate scores identified BLAST sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence originated from a unique region of Clostridium perfringens, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes. The low hit scores presented at least 84% homology over a distance of less than fifty nucleotides.

Distribution of 121 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
SEQ ID NO: 2740  gi|18143657|dbj|AP003185.1| Clostridium perfringens str. 13...  601  e-169
SEQ ID NO: 2741  gi|15022817|gb|AE007513.1|AE007513 Clostridium acetobutylic...  86  3e-14
SEQ ID NO: 2742  gi|1790872|gb|U35453.1|CAU35453 Clostridium acetobutylicum...  70  2e-09
SEQ ID NOs: 2743-2744  gi|532282|dbj|D28808.1|MYCMTLAGYR Mycoplasma capricolum mtl...  68  7e-09
SEQ ID NO: 2745  gi|22775678|dbj|AP004593.1| Oceanobacillus iheyensis genomi...  66  3e-08
SEQ ID NO: 2746  gi|23496883|gb|AE014850.1| Plasmodium falciparum 3D7 chromo...  62  5e-07
SEQ ID NO: 2747  gi|4099104|gb|U83664.1|MAU83664 Mycoplasma arthritidis gyra...  58  7e-06
SEQ ID NO: 2748  gi|3861033|emb|AJ235272.1|RPXX03 Rickettsia prowazekii stra...  56  3e-05
SEQ ID NO: 2749  gi|23121882|ref|NZ_AAAW01000001.1| Prochlorococcus marinus...  56  3e-05
SEQ ID NO: 2750  gi|21903714|gb|AF036961.2| Mycoplasma hominis glucose 1-pho...  54  1e-04
SEQ ID NO: 2751  gi|13654371|gb|AC025948.16|AC025948 Staphylococcus aureus c...  52  4e-04
SEQ ID NO: 2752  gi|9665188|gb|AC025950.9|AC025950 Staphylococcus aureus clo...  52  4e-04
SEQ ID NO: 2753  gi|296393|emb|X71437.1|SAGYRREC S. aureus genes gyrB, gyrA a...  52  4e-04
SEQ ID NO: 2754  gi|49345|emb|Z19108.1|SCORICA S. citri dnaA, dnaN, gyrA and...  52  4e-04
SEQ ID NO: 2755  gi|16412421|emb|AL596163.1| Listeria innocua Clip11262 comp...  52  4e-04
SEQ ID NO: 2756  gi|21203164|dbj|AP004822.1| Staphylococcus aureus subsp. au...  52  4e-04
SEQ ID NO: 2757  gi|14349167|dbj|AP003129.2| Staphylococcus aureus subsp. au...  52  4e-04
SEQ ID NO: 2758  gi|153083|gb|M86227.1|STARECF Staphylococcus aureus DNA gyr...  52  4e-04
SEQ ID NO: 2759  gi|14245767|dbj|AP003358.2| Staphylococcus aureus subsp. au...  52  4e-04
SEQ ID NO: 2760  gi|540540|dbj|D10489.1|STAGYRABA Staphylococcus aureus gene...  52  4e-04
SEQ ID NO: 2761  gi|14089942|emb|AL445565.1|MPULM03 Mycoplasma pulmonis (str...  50  0.002
SEQ ID NOs: 2762-2763  gi|22533630|gb|AE014219.1| Streptococcus agalactiae 2603V/R...  48  0.007
SEQ ID NOs: 2764-2765  gi|23094961|emb|AL766846.1|SAG766846 Streptococcus agalacti...  48  0.007
SEQ ID NO: 2766  gi|21328233|gb|AF084042.1| Listeria monocytogenes RecF (rec...  46  0.027
SEQ ID NO: 2767  gi|9623821|gb|AF269920.1|AF269920 Staphylococcus epidermidi...  46  0.027
SEQ ID NO: 2768  gi|16409359|emb|AL591973.1| Listeria monocytogenes strain E...  46  0.027
SEQ ID NO: 2769  gi|23002411|ref|NZ_AAAB01000003.1| Lactobacillus gasseri Lg...  46  0.027
SEQ ID NO: 2770  gi|10172612|dbj|AP001507.1| Bacillus halodurans genomic DNA...  46  0.027
SEQ ID NO: 2771  gi|2760176|dbj|AB010081.1| Bacillus sp. gene for B subunit...  46  0.027
SEQ ID NO: 2772  gi|5672640|dbj|AB013492.1| Bacillus halodurans C-125 genomi...  46  0.027
SEQ ID NO: 2773  gi|21622892|gb|AE014076.1| Buchnera aphidicola str. Sg (Sch...  44  0.11
SEQ ID NO: 2774  gi|4887144|gb|AF138873.1|AF138873 Mus musculus p73 gene, ex...  44  0.11
SEQ ID NO: 2775  gi|4099109|gb|U83665.1|MAU83665 Mycoplasma arthritidis parE...  44  0.11
SEQ ID NO: 2776  gi|2827005|gb|AF008210.1|AF008210 Buchnera aphidicola genom...  44  0.11
SEQ ID NO: 2777  gi|453417|emb|X77529.1|MHGYRBLIC M. hominis gyrB and licA genes  44  0.11
SEQ ID NO: 2778  gi|23003070|ref|NZ_AAAB01000010.1| Lactobacillus gasseri Lg...  44  0.11
SEQ ID NO: 2779  gi|144144|gb|M80817.1|BUHRRDDG Buchnera aphidicola ribosoma...  44  0.11
SEQ ID NO: 2780  gi|19915454|gb|AE010829.1| Methanosarcina acetivorans str. ...  42  0.43
SEQ ID NO: 2781  gi|19713402|gb|AE010515.1| Fusobacterium nucleatum subsp. n...  42  0.43
SEQ ID NO: 2782  gi|13794423|gb|AF165818.4|AF165818 Guillardia theta nucleom...  42  0.43
SEQ ID NO: 2783  gi|15619977|gb|AE008642.1|AE008642 Rickettsia conorii Malis...  42  0.43
SEQ ID NO: 2784  gi|19071885|dbj|AB063521.1| Wigglesworthia brevipalpis DNA,...  42  0.43

EXAMPLE 40

BLAST search of unique Clostridium perfringens sequence against the nr database of NCBI showing homology between Clostridium perfringens and various other biological entities.

A unique region of the Clostridium perfringens genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 59 BLAST "hits". The observed "hits" are reported below with corresponding E values; these "hits" correspond to SEQ ID NOs: 2785-2813. One of the "hits" had an extremely high probability score, and twenty-eight had low scores. The single "hit" with the highest score was identified correctly by the BLAST search as Clostridium perfringens with 100% homology to the query sequence over one hundred twenty nucleotides. The low hit scores presented at least 84% homology over a distance of less than fifty nucleotides.

Distribution of 59 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value

SEQ ID NO: 2785  gi|18143657|dbj|AP003185.1| Clostridium perfringens str. 13...  254  2e-65
SEQ ID NO: 2786  gi|20799565|gb|AF494521.1| Tomato big bud phytoplasma DNA g...  64  4e-08
SEQ ID NO: 2787  gi|21309842|gb|AF263924.1| Peanut witches'-broom phytoplasm...  64  4e-08
SEQ ID NO: 2788  gi|21328233|gb|AF084042.1| Listeria monocytogenes RecF (rec...  60  7e-07
SEQ ID NO: 2789  gi|16409359|emb|AL591973.1| Listeria monocytogenes strain E...  60  7e-07
SEQ ID NO: 2790  gi|16412421|emb|AL596163.1| Listeria innocua Clip11262 comp...  58  3e-06
SEQ ID NO: 2791  gi|15982569|dbj|AB059406.1| Enterococcus faecalis parE and...  58  3e-06
SEQ ID NO: 2792  gi|21904188|gb|AE014146.1| Streptococcus pyogenes MGAS315, ...  52  2e-04
SEQ ID NO: 2793  gi|19747968|gb|AE010010.1| Streptococcus pyogenes strain MG...  52  2e-04
SEQ ID NO: 2794  gi|13621903|gb|AE006524.1|AE006524 Streptococcus pyogenes M...  52  2e-04
SEQ ID NO: 2795  gi|3859563|gb|AF098862.1|AF098862 Borrelia hermsii DNA gyra...  52  2e-04
SEQ ID NO: 2796  gi|1573548|gb|U32738.1|U32738 Haemophilus influenzae Rd sec...  52  2e-04
SEQ ID NO: 2797  gi|19713402|gb|AE010515.1| Fusobacterium nucleatum subsp. n...  50  7e-04
SEQ ID NO: 2798  gi|9664758|gb|AF269437.1|AF269437 Staphylococcus epidermidi...  50  7e-04
SEQ ID NO: 2799  gi|9664725|gb|AF269404.1|AF269404 Staphylococcus epidermidi...  50  7e-04
SEQ ID NO: 2800  gi|12723916|gb|AE006332.1|AE006332 Lactococcus lactis subsp...  50  7e-04
SEQ ID NO: 2801  gi|9623629|gb|AF269733.1|AF269733 Staphylococcus epidermidi...  50  7e-04
SEQ ID NO: 2802  gi|453417|emb|X77529.1|MHGYRBLIC M. hominis gyrB and licA genes  50  7e-04
SEQ ID NO: 2803  gi|16413677|emb|AL596168.1| Listeria innocua Clip11262 comp...  50  7e-04
SEQ ID NO: 2804  gi|23050007|ref|NZ_AAAR01001799.1| Methanosarcina barkeri M...  50  7e-04
SEQ ID NO: 2805  gi|15024584|gb|AE007672.1|AE007672 Clostridium acetobutylic...  48  0.003
SEQ ID NO: 2806  gi|14089695|emb|AL445564.1|MPULM02 Mycoplasma pulmonis (str...  48  0.003
SEQ ID NO: 2807  gi|20907007|gb|AE013485.1| Methanosarcina mazei strain Goe1...  44  0.041
SEQ ID NO: 2808  gi|4376541|gb|AE001612.1|AE001613 Chlamydia pneumoniae sect...  42  0.16
SEQ ID NO: 2809  gi|9654391|gb|AE004093.1|AE004093 Vibrio cholerae chromosom...  42  0.16
SEQ ID NO: 2810  gi|8163444|gb|AE002210.2|AE002210 Chlamydophila pneumoniae ...  42  0.16
SEQ ID NO: 2811  gi|23112424|ref|NZ_AABB01000197.1| Desulfitobacterium hafni...  42  0.16
SEQ ID NO: 2812  gi|10176692|dbj|AP002546.2| Chlamydophila pneumoniae J138 g...  42  0.16
SEQ ID NO: 2813  gi|18497179|gb|AC097455.3| Homo sapiens BAC clone RP11-2J13...  38  2.5

EXAMPLE 41 BLAST search of unique Clostridium perfringens sequence against the nr database of NCBI showing homology between Clostridium perfringens and various other biological entities.

A unique region of the Clostridium perfringens genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 27 BLAST "hits". The observed "hits" are reported below with their corresponding E values; these "hits" correspond to SEQ ID NOs: 2814-2822. Two of the "hits" had an extremely high probability score, two had intermediate scores, and three had low scores. The "hits" with the highest scores were correctly identified by the BLAST search as Clostridium perfringens, with 100% homology to the query sequence over three hundred nucleotides. Sequence dissimilarities within the group with intermediate scores identified BLAST sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence originated from a unique region of Clostridium perfringens, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes. The low-scoring hits showed at least 92% homology over a distance of less than fifty nucleotides.

Distribution of 27 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value

SEQ ID NOs: 2814-2815  gi|18143657|dbj|AP003185.1| Clostridium perfringens str. 13...  967  0.0
SEQ ID NOs: 2816-2817  gi|16904575|dbj|AB045282.1| Clostridium perfringens rrnA op  636  e-179
SEQ ID NO: 2818  gi|144702|gb|M69267.1|CL016SRRNA Clostridium perfringens rr...  74  3e-10
SEQ ID NO: 2819  gi|15022817|gb|AE007513.1|AE007513 Clostridium acetobutylic  58  2e-05
SEQ ID NO: 2820  gi|1790872|gb|U35453.1|CAU35453 Clostridium acetobutylicum  58  2e-05
SEQ ID NO: 2821  gi|153083|gb|M86227.1|STARECF Staphylococcus aureus DNA gyr...  40  4.5
SEQ ID NO: 2822  gi|11991393|emb|AL357514.19| Human DNA sequence from clone...  40  4.5
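The hit tables in these examples follow the plain-text BLAST summary layout: an identifier, a truncated description, a bit score and an E value. As a minimal sketch, a row of that kind could be parsed and binned by score as shown below. The regular expression, the `Hit` type, the function names and the cut-offs of 500 and 50 bits are assumptions made for illustration only; the text describes hits as having high, intermediate or low scores without stating numeric thresholds.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str
    description: str
    bit_score: float
    e_value: float


# Hypothetical pattern for one summary row, e.g.
# "gi|18143657|dbj|AP003185.1| Clostridium perfringens str. 13...  967  0.0"
ROW = re.compile(
    r"gi\|\d+\|(?:gb|emb|dbj|ref)\|(?P<acc>[\w.]+)\|\S*\s+"
    r"(?P<desc>.+?)\s+(?P<score>[\d.e+]+)\s+(?P<evalue>\S+)$"
)


def parse_row(line: str) -> Hit:
    """Parse one BLAST summary row into a Hit."""
    match = ROW.search(line.strip())
    if match is None:
        raise ValueError(f"not a BLAST summary row: {line!r}")
    evalue = match.group("evalue")
    if evalue.startswith("e"):
        # The tables print very small E values without a mantissa ("e-179").
        evalue = "1" + evalue
    return Hit(
        accession=match.group("acc"),
        description=match.group("desc"),
        bit_score=float(match.group("score")),
        e_value=float(evalue),
    )


def score_class(hit: Hit, high: float = 500.0, intermediate: float = 50.0) -> str:
    """Bin a hit by bit score; both cut-offs are illustrative assumptions."""
    if hit.bit_score >= high:
        return "high"
    if hit.bit_score >= intermediate:
        return "intermediate"
    return "low"


if __name__ == "__main__":
    row = "gi|16904575|dbj|AB045282.1| Clostridium perfringens rrnA op  636  e-179"
    hit = parse_row(row)
    print(hit.accession, score_class(hit))
```

Binning on bit score rather than E value is a design choice of this sketch; either column could be used, and the appropriate cut-offs would depend on query length and database size.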

EXAMPLE 42 BLAST search of unique Eastern equine encephalitis virus sequence against the nr database of NCBI showing homology between Eastern equine encephalitis virus and various other biological entities.

A unique region of the Eastern equine encephalitis virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 407 BLAST "hits". The most pertinent "hits" are reported below with their corresponding E values; these "hits" correspond to SEQ ID NOs: 2823-3142. Two of the "hits" had an extremely high probability score, and forty-eight had intermediate scores. The two "hits" with high scores were correctly identified by the BLAST search as Eastern equine encephalitis virus, with 100% homology to the query sequence over seven thousand nucleotides. Sequence dissimilarities within the group with intermediate scores identified BLAST sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence originated from a unique region of Eastern equine encephalitis virus, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes.

Distribution of 407 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value

SEQ ID NO: 2823  gi|59185|emb|X63135.1|EEEVIRNA Eastern Equine Encephalomyel...  1.399e+04  0.0
SEQ ID NO: 2824  gi|393006|gb|U01034.1|U01034 Eastern equine encephalomyelit...  1.309e+04  0.0
SEQ ID NO: 2825  gi|22001302|gb|AF525498.1| Eastern equine encephalitis viru...  967  0.0
SEQ ID NO: 2826  gi|22001298|gb|AF525496.1| Eastern equine encephalitis viru...  967  0.0
SEQ ID NO: 2827  gi|323702|gb|K00701.1|EEESA01 Eastern equine encephalomyeli...  194  1e-45
SEQ ID NOs: 2828-2842  gi|6760410|gb|AF214040.1|AF214040 Western equine encephalom...  192  5e-45
SEQ ID NOs: 2843-2852  gi|398206|emb|X74892.1|WEEVNS Western Equine Encephalitis V...  192  5e-45
SEQ ID NOs: 2853-2862  gi|393033|gb|U01065.1|WEU01065 Western equine encephalomyel...  192  5e-45
SEQ ID NOs: 2863-2871  gi|323706|gb|L00930.1|EEVNSPECFA Venezuelan equine encephal...  172  5e-39
SEQ ID NOs: 2872-2882  gi|4262314|gb|AF075256.1|AF075256 Venezuelan equine encepha...  151  2e-32
SEQ ID NOs: 2883-2892  gi|4262308|gb|AF075254.1|AF075254 Venezuelan equine encepha...  149  7e-32
SEQ ID NOs: 2893-2902  gi|4262317|gb|AF075257.1|AF075257 Venezuelan equine encepha...  145  1e-30
SEQ ID NOs: 2903-2912  gi|20800454|gb|U55350.2|VEU55350 Venezuelan equine encephal...  141  2e-29
SEQ ID NOs: 2913-2923  gi|20800451|gb|U55347.2|VEU55347 Venezuelan equine encephal...  141  2e-29
SEQ ID NOs: 2924-2934  gi|20800448|gb|U55345.2|VEU55345 Venezuelan equine encephal...  141  2e-29
SEQ ID NOs: 2935-2945  gi|18152933|gb|U55342.2|VEU55342 Venezuelan equine encephal...  141  2e-29
SEQ ID NOs: 2946-2956  gi|14549692|gb|AF375051.1|AF375051 Venezuelan equine enceph...  141  2e-29
SEQ ID NOs: 2957-2967  gi|290609|gb|L04653.1|EEVCOMGEN Venezuelan equine encephali...  141  2e-29
SEQ ID NOs: 2968-2973  gi|5442468|gb|U55360.2|VEU55360 Venezuelan equine encephali...  141  2e-29
SEQ ID NOs: 2974-2979  gi|5442471|gb|U55362.2|VEU55362 Venezuelan equine encephali...  141  2e-29
SEQ ID NOs: 2980-2986  gi|5442464|gb|AF004459.2|AF004459 Venezuelan equine encepha...  141  2e-29
SEQ ID NOs: 2987-2994  gi|5442458|gb|AF004458.2|AF004458 Venezuelan equine encepha...  141  2e-29
SEQ ID NOs: 2995-3002  gi|4689187|gb|AF100566.1|AF100566 Venezuelan equine encepha...  141  2e-29
SEQ ID NOs: 3003-3010  gi|5442461|gb|AF004472.2|AF004472 Venezuelan equine encepha...  135  1e-27
SEQ ID NOs: 3011-3016  gi|4887231|gb|L01442.2|EEVNSPEPA Venezuelan equine encephal...  135  1e-27
SEQ ID NOs: 3017-3025  gi|4262305|gb|AF075253.1|AF075253 Venezuelan equine encepha...  135  1e-27
SEQ ID NOs: 3026-3031  gi|3249013|gb|AF069903.1|AF069903 Venezuelan equine encepha...  135  1e-27
SEQ ID NOs: 3032-3038  gi|1144527|gb|U34999.1|VEU34999 Venezuelan equine encephali...  135  1e-27
SEQ ID NOs: 3039-3045  gi|323708|gb|J04332.1|EEVNSPENV Venezuelan equine encephali...  135  1e-27
SEQ ID NOs: 3046-3051  gi|323714|gb|L01443.1|EEVNSPEPB Venezuelan equine encephali...  135  1e-27
SEQ ID NOs: 3052-3059  gi|4262302|gb|AF075252.1|AF075252 Venezuelan equine encepha...  129  6e-26
SEQ ID NOs: 3060-3066  gi|17865002|gb|AF448538.1| Venezuelan equine encephalitis v...  127  3e-25
SEQ ID NOs: 3067-3073  gi|17864999|gb|AF448537.1| Venezuelan equine encephalitis v...  127  3e-25
SEQ ID NOs: 3074-3080  gi|17864996|gb|AF448536.1| Venezuelan equine encephalitis v...  127  3e-25
SEQ ID NOs: 3081-3087  gi|17864993|gb|AF448535.1| Venezuelan equine encephalitis v...  127  3e-25
SEQ ID NOs: 3088-3094  gi|4262311|gb|AF075255.1|AF075255 Venezuelan equine encepha...  125  1e-24
SEQ ID NOs: 3095-3100  gi|17865005|gb|AF448539.1| Venezuelan equine encephalitis v...  119  6e-23
SEQ ID NOs: 3101-3105  gi|4262299|gb|AF075251.1|AF075251 Venezuelan equine encepha...  111  2e-20
SEQ ID NOs: 3106-3114  gi|4262323|gb|AF075259.1|AF075259 Venezuelan equine encepha...  107  2e-19
SEQ ID NOs: 3115-3121  gi|4262320|gb|AF075258.1|AF075258 Venezuelan equine encepha...  92  1e-14
SEQ ID NO: 3122  gi|28193929|gb|AF339474.1| Buggy Creek virus strain 81V8122...  86  9e-13
SEQ ID NOs: 3123-3124  gi|331527|gb|J02246.1|MBVCP Middelburg virus nonstructural...  84  3e-12
SEQ ID NOs: 3125-3128  gi|18857922|gb|AF237947.1| Mayaro virus, complete genome  80  5e-11
SEQ ID NOs: 3129-3133  gi|20127133|gb|AF492770.1| Sindbis virus strain MRE16 5'UTR...  68  2e-07
SEQ ID NOs: 3134-3135  gi|3873294|gb|AF103734.1|AF103734 Sindbis-like virus YN8744...  64  3e-06
SEQ ID NOs: 3136-3137  gi|11225069|gb|U38305.1|ACU38305 Sindbis-like virus isolate...  64  3e-06
SEQ ID NO: 3138  gi|333921|gb|M20162.1|PRVNBCG Ross River virus (RRV) (strai...  62  1e-05
SEQ ID NO: 3139  gi|329481|gb|K00700.1|HJV01 highlands j virus rna 5'termin...  62  1e-05
SEQ ID NOs: 3140-3141  gi|16904821|gb|AF438162.1|AF438162 Chikungunya virus nonstr...  60  5e-05
SEQ ID NO: 3142  gi|2072049|gb|U94602.1|MVU94602 Mayaro virus nonstructural...  60  5e-05

EXAMPLE 43 BLAST search of unique Eastern equine encephalitis virus sequence against the nr database of NCBI showing homology between Eastern equine encephalitis virus and various other biological entities.

A unique region of the Eastern equine encephalitis virus genome was used as a query sequence in the BLAST search against the nr database. The BLAST search identified 115 BLAST "hits". The most pertinent "hits" are reported below with their corresponding E values; these "hits" correspond to SEQ ID NOs: 3143-3241. Two of the "hits" had an extremely high probability score, and eleven had intermediate scores. The two "hits" with high scores were correctly identified by the BLAST search as Eastern equine encephalitis virus, with 100% homology to the query sequence over three thousand nucleotides. Sequence dissimilarities within the group with intermediate scores identified BLAST sequences of related species that have significant homology to the query sequence but are from different biological entities. Since the query sequence originated from a unique region of Eastern equine encephalitis virus, it is reasonable to infer that the sequences identified by the BLAST search in other evolutionarily related biological entities are also from unique regions within their genomes. The intermediate-scoring hits showed at least 83% homology over a distance of less than 500 nucleotides.

Distribution of 115 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value

SEQ ID NO: 3143  gi|59185|emb|X63135.1|EEEVIRNA Eastern Equine Encephalomyel...  6916  0.0
SEQ ID NO: 3144  gi|393006|gb|U01034.1|U01034 Eastern equine encephalomyelit...  6441  0.0
SEQ ID NO: 3145  gi|22001302|gb|AF525498.1| Eastern equine encephalitis viru...  967  0.0
SEQ ID NO: 3146  gi|22001298|gb|AF525496.1| Eastern equine encephalitis viru...  967  0.0
SEQ ID NOs: 3147-3152  gi|6760410|gb|AF214040.1|AF214040 Western equine encephalom...  115  5e-22
SEQ ID NOs: 3153-3159  gi|398206|emb|X74892.1|WEEVNS Western Equine Encephalitis V...  100  3e-17
SEQ ID NOs: 3160-3166  gi|393033|gb|U01065.1|WEU01065 Western equine encephalomyel...  100  3e-17
SEQ ID NO: 3167  gi|4887231|gb|L01442.2|EEVNSPEPA Venezuelan equine encephal...  84  2e-12
SEQ ID NO: 3168  gi|3249013|gb|AF069903.1|AF069903 Venezuelan equine encepha...  84  2e-12
SEQ ID NO: 3169  gi|323708|gb|J04332.1|EEVNSPENV Venezuelan equine encephali...  84  2e-12
SEQ ID NO: 3170  gi|323714|gb|L01443.1|EEVNSPEPB Venezuelan equine encephali...  84  2e-12
SEQ ID NO: 3171  gi|4689187|gb|AF100566.1|AF100566 Venezuelan equine encepha...  76  4e-10
SEQ ID NOs: 3172-3173  gi|4262311|gb|AF075255.1|AF075255 Venezuelan equine encepha...  74  2e-09
SEQ ID NOs: 3174-3176  gi|4262305|gb|AF075253.1|AF075253 Venezuelan equine encepha...  74  2e-09
SEQ ID NOs: 3177-3180  gi|4262323|gb|AF075259.1|AF075259 Venezuelan equine encepha...  66  4e-07
SEQ ID NO: 3181  gi|5442468|gb|U55360.2|VEU55360 Venezuelan equine encephali...  60  2e-05
SEQ ID NO: 3182  gi|5442471|gb|U55362.2|VEU55362 Venezuelan equine encephali...  60  2e-05
SEQ ID NO: 3183  gi|5442464|gb|AF004459.2|AF004459 Venezuelan equine encepha...  60  2e-05
SEQ ID NO: 3184  gi|5442461|gb|AF004472.2|AF004472 Venezuelan equine encepha...  60  2e-05
SEQ ID NO: 3185  gi|5442458|gb|AF004458.2|AF004458 Venezuelan equine encepha...  60  2e-05
SEQ ID NOs: 3186-3188  gi|4262314|gb|AF075256.1|AF075256 Venezuelan equine encepha...  60  2e-05
SEQ ID NO: 3189  gi|20800454|gb|U55350.2|VEU55350 Venezuelan equine encephal...  58  1e-04
SEQ ID NO: 3190  gi|20800451|gb|U55347.2|VEU55347 Venezuelan equine encephal...  58  1e-04
SEQ ID NO: 3191  gi|20800448|gb|U55345.2|VEU55345 Venezuelan equine encephal...  58  1e-04
SEQ ID NO: 3192  gi|18152933|gb|U55342.2|VEU55342 Venezuelan equine encephal...  58  1e-04
SEQ ID NO: 3193  gi|14549692|gb|AF375051.1|AF375051 Venezuelan equine enceph...  58  1e-04
SEQ ID NO: 3194  gi|290609|gb|L04653.1|EEVCOMGEN Venezuelan equine encephali...  58  1e-04
SEQ ID NO: 3195  gi|4262299|gb|AF075251.1|AF075251 Venezuelan equine encepha...  58  1e-04
SEQ ID NOs: 3196-3197  gi|27734686|gb|AF369024.2| Chikungunya virus strain S27-Afr...  56  4e-04
SEQ ID NOs: 3198-3199  gi|23957839|gb|AF490259.2| Chikungunya virus strain Ross, c...  56  4e-04
SEQ ID NOs: 3200-3202  gi|17865005|gb|AF448539.1| Venezuelan equine encephalitis v...  56  4e-04
SEQ ID NOs: 3203-3205  gi|17865002|gb|AF448538.1| Venezuelan equine encephalitis v...  56  4e-04
SEQ ID NOs: 3206-3207  gi|17864999|gb|AF448537.1| Venezuelan equine encephalitis v...  56  4e-04
SEQ ID NOs: 3208-3210  gi|17864996|gb|AF448536.1| Venezuelan equine encephalitis v...  56  4e-04
SEQ ID NOs: 3211-3213  gi|17864993|gb|AF448535.1| Venezuelan equine encephalitis v...  56  4e-04
SEQ ID NOs: 3214-3217  gi|4262308|gb|AF075254.1|AF075254 Venezuelan equine encepha...  56  4e-04
SEQ ID NOs: 3218-3220  gi|1144527|gb|U34999.1|VEU34999 Venezuelan equine encephali...  56  4e-04
SEQ ID NOs: 3221-3224  gi|323706|gb|L00930.1|EEVNSPECFA Venezuelan equine encephal...  52  0.006
SEQ ID NOs: 3225-3227  gi|4262302|gb|AF075252.1|AF075252 Venezuelan equine encepha...  50  0.024
SEQ ID NOs: 3228-3230  gi|4262317|gb|AF075257.1|AF075257 Venezuelan equine encepha...  48  0.093
SEQ ID NO: 3231  gi|4240567|gb|AF126284.1|AF126284 Aura virus polyprotein 1...  48  0.093
SEQ ID NOs: 3232-3233  gi|1778358|gb|U73745.1|BFU73745 Barmah Forest virus, comple...  48  0.093
SEQ ID NO: 3234  gi|7288147|dbj|AB032553.1| Sagiyama virus genomic RNA, comp...  48  0.093
SEQ ID NO: 3235  gi|1144525|gb|U34978.1|VEU34978 Venezuelan equine encephali...  48  0.093
SEQ ID NO: 3236  gi|1125066|gb|U38304.1|ACU38304 Sindbis-like virus isolate...  44  1.5
SEQ ID NO: 3237  gi|334111|gb|M69205.1|SINOCK82 Ockelbo virus strain Edsbyn,...  42  5.8
SEQ ID NO: 3238  gi|3873294|gb|AF103734.1|AF103734 Sindbis-like virus YN8744...  42  5.8
SEQ ID NO: 3239  gi|4262320|gb|AF075258.1|AF075258 Venezuelan equine encepha...  42  5.8
SEQ ID NO: 3240  gi|333921|gb|M20162.1|RRVNBCG Ross River virus (RRV) (strai...  42  5.8
SEQ ID NO: 3241  gi|1125069|gb|U38305.1|ACU38305 Sindbis-like virus isolate...  42  5.8

EXAMPLE 44 Hybridization of unique genomic sequences Once a unique oligonucleotide sequence is generated and synthesized by the method described herein from the corresponding unique genomic sequence of a specific organism, the unique oligonucleotide sequence may be used as a target on a microarray. The arrangement of unique oligonucleotide sequences on an array allows for the specific identification of biological entities.

Figure 2 compares the hybridization patterns of genomic DNA from Clostridium perfringens and Bacillus anthracis, each Klenow-labeled with Cy3-labeled dCTP. The labeled probes were exposed to identical oligonucleotide microarrays. Each microarray contained control oligonucleotide sequences (see boxes within Figure 2). One form of control consists of genomic oligonucleotide sequences comprising salmon sperm DNA at 10 ng/µl. The other form consists of synthesized random 50-mer oligonucleotide sequences that demonstrate non-specific hybridization. These non-specific oligonucleotides are applied at different concentrations on the array. This permits the user to compensate for hybridization efficiencies and thus to calibrate hybridization intensities against the controls on the array.
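The calibration step is described only in general terms. One simple way to express the idea is sketched below, assuming that the random 50-mer control spots estimate the non-specific background and that the salmon sperm DNA spots serve as a reference signal. The function name, the arguments and the scaling formula are hypothetical and are not taken from the disclosure.

```python
from statistics import mean


def calibrate(raw, nonspecific_controls, genomic_controls):
    """Background-subtract and scale raw spot intensities.

    raw: mapping of spot name -> raw intensity
    nonspecific_controls: intensities of random 50-mer control spots
    genomic_controls: intensities of salmon sperm DNA control spots
    (All inputs and the formula are illustrative assumptions.)
    """
    background = mean(nonspecific_controls)
    reference = mean(genomic_controls) - background
    if reference <= 0:
        raise ValueError("reference controls do not exceed background")
    # Clamp at zero so spots below background do not go negative.
    return {
        spot: max(0.0, (value - background) / reference)
        for spot, value in raw.items()
    }


if __name__ == "__main__":
    print(calibrate({"oligo_1": 1100, "oligo_2": 100}, [100, 100], [1100, 1100]))
```

Because both arrays carry the same controls, intensities scaled this way could be compared across arrays hybridized under identical conditions.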

Labeled probes were investigated concurrently and were therefore subjected to identical hybridization and washing conditions. The hybridized arrays were then read with a laser scanner.

These data demonstrate the ability to discriminate between species of microbiological entities using the method described herein to generate unique genomic sequences and unique oligonucleotide sequences.

EXAMPLE 45 Discrimination of strain via hybridization In this example, unique genomic sequences were identified for E. coli K12 (SEQ ID NO: 849), E. coli O157:H7 (SEQ ID NO: 810) and the E. coli O157:H7 Shiga gene (SEQ ID NO: 3242) as described by the method herein. Each individual unique genomic sequence was BLAST searched against the nr database to confirm uniqueness (see Example 53). A plurality of unique oligonucleotides was generated from each unique genomic sequence. These oligonucleotide sequences were also BLAST searched against the nr database, using the method described herein, to confirm their uniqueness (SEQ ID NOs: 1176-1190 for E. coli K12, SEQ ID NOs: 1284-1297 for E. coli O157:H7 and SEQ ID NOs: 1300-1328 for the E. coli O157:H7 Shiga gene). These unique oligonucleotide sequences and the remaining E. coli general-genome unique oligonucleotide sequences were applied to an array. Genomic DNA from the two E. coli strains was isolated, labeled and hybridized to the array. Figure 3 compares the hybridization patterns of genomic DNA from E. coli K12 and E. coli O157:H7, each Klenow-labeled with Cy3-labeled dCTP. Probes were exposed to identical unique oligonucleotide microarrays. Each microarray contained control oligonucleotide sequences as described above. Labeled probes were investigated concurrently and were therefore subjected to identical hybridization and washing conditions. The hybridized arrays were then read with a laser scanner. The exact locations of the strain-specific unique oligonucleotide sequences for E. coli K12 and E. coli O157:H7 on the array are known, and through interpretation of the hybridization intensity values at these locations, the array is able to detect the presence or absence of microbiological entities.

EXAMPLE 46 Discrimination of species and strain via hybridization In this example, Figure 4 reports the unique oligonucleotide sequences identified in Example 3 for E. coli K12 and E. coli O157:H7 strains as hybridization intensities. The resulting mean intensity of hybridization for each unique oligonucleotide sequence was recorded and is presented as a point in the scatter plot. Those unique oligonucleotide sequences that fall along the line of slope 1, also referred to as the line of identity, or within two standard deviations of that line, are considered identical with respect to the ability to differentiate between the two organisms and are not considered informative. Those points located in the outlying quadrants represent unique oligonucleotide sequences that are particularly informative because they can distinguish between two strains or organisms on the basis of their hybridization intensity values. As genetic diversity between the two organisms increases, fewer points are observed along the line of identity. Thus, informative unique oligonucleotide sequences were particularly useful to include on an array. These data demonstrate the ability to discriminate between strains of closely related microbiological entities using the hybridization intensity of unique oligonucleotide sequences.
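The selection rule described for the scatter plot can be sketched in a few lines. The sketch below assumes that the deviation of a point from the line of identity is measured as the difference between its two mean intensities, and that the standard deviation is the root-mean-square of those differences over all spots. The text does not specify how the standard deviation was computed, so both choices, along with the function name and the sample values, are assumptions.

```python
import math


def informative_oligos(intensities, n_sd=2.0):
    """Return names of oligos lying more than n_sd standard deviations
    from the line of identity (y = x).

    intensities: mapping of oligo name -> (mean intensity with strain A,
                                           mean intensity with strain B)
    """
    if not intensities:
        return []
    # Signed deviation from y = x for every spot.
    diffs = {name: b - a for name, (a, b) in intensities.items()}
    # Root-mean-square deviation about the line of identity.
    sd = math.sqrt(sum(d * d for d in diffs.values()) / len(diffs))
    return sorted(name for name, d in diffs.items() if abs(d) > n_sd * sd)


if __name__ == "__main__":
    # Made-up intensities: eight shared spots near the line, two strain-specific.
    spots = {f"shared_{i}": (1000 + i, 1000 - i) for i in range(8)}
    spots["k12_1"] = (5000, 200)
    spots["o157_1"] = (150, 4800)
    print(informative_oligos(spots))
```

With many strongly strain-specific spots the pooled deviation grows and the rule becomes more conservative; estimating the spread from shared control spots only would be one alternative.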

EXAMPLE 47 Phylogenetic assignment Figure 5 relates to the further characterization of an E. coli sample using the informative unique oligonucleotide sequences identified in the outlying quadrants of the scatter plot from Example 4. In this example, informative unique oligonucleotide sequences of the E. coli genome were spotted onto a microarray. The sequences represented on the microarray included strain-specific and gene-specific informative unique oligonucleotide sequences as assessed in Example 4. Specifically, informative unique oligonucleotide sequences identified for E. coli K12, E. coli O157:H7 and the subset of E. coli O157:H7 that contains the Shiga gene were used. As such, the informative unique oligonucleotide sequences utilized on the array correspond to SEQ ID NOs: 1176-1190 for E. coli K12, SEQ ID NOs: 1284-1297 for E. coli O157:H7 and SEQ ID NOs: 1300-1328 for the E. coli O157:H7 Shiga gene. In this example, samples containing genomic E. coli DNA were amplified and labeled as described previously. After hybridization the array was washed and scanned. The intensity of hybridization for each informative unique oligonucleotide sequence was determined as a numerical value. The differentiation between informative unique oligonucleotide sequences upon exposure to different strains of E. coli was visualized graphically by comparing the mean hybridization intensity of each informative unique oligonucleotide sequence on the array; the results are presented in Figure 5. These data establish a method to produce unique oligonucleotide sequences that were useful in differentiating related organisms.

EXAMPLE 48 Discrimination of species by hybridization techniques utilizing unique oligonucleotide sequences Figure 6 shows the hybridization intensities of amplified, fluorescently labeled genomic DNA from various sources to a microarray containing a plurality of unique oligonucleotide sequences. The array contained unique oligonucleotide sequences of B. anthracis, Vaccinia, Y. pestis, B. melitensis, C. perfringens and F. tularensis, as labeled along the X axis.

In the top left panel, an array was exposed to a probe derived from B. anthracis. The array reported significant levels of hybridization that correspond to B. anthracis unique oligonucleotide sequences. In the top right panel, an array was exposed to a probe derived from B. melitensis. Again, the array reported significant levels of hybridization that are specific for B. melitensis unique oligonucleotide sequences on the array. These specific hybridization results are also confirmed for Vaccinia probes and Y. pestis probes, as observed in the middle panels of Figure 6. The lower left panel corresponds to the hybridization intensity of oligonucleotides that were randomly synthesized and unexpectedly found to hybridize specifically to probes derived from B. subtilis, and as such are unique oligonucleotides for this organism. The lower right panel reflects the hybridization intensities observed when a probe derived from Homo sapiens genomic DNA was exposed to the array. As anticipated when using the unique oligonucleotide sequences generated by the method described herein, no cross-hybridization is observed. This example demonstrates genomic DNA from a variety of origins hybridizing to the corresponding organism-specific unique oligonucleotide sequences. These results also demonstrate that an array containing these unique oligonucleotide sequences is useful in detecting and differentiating between numerous biological entities.

EXAMPLE 49 Level of detection Figure 7 shows an example of the level of detection for the assay described herein, in the case of C. perfringens. A known concentration of C. perfringens was added to a DNA-rich sample. The C. perfringens sample was subsequently diluted in a stepwise fashion. Prepared samples were examined using an array containing unique oligonucleotide sequences for C. perfringens. A significant level of detection for C. perfringens was observed at a dilution of 1:100,000. Hybridization of the C. perfringens sample to the array demonstrated that different microbial species were distinguished from one another and that a bacterial sequence was identified in the complex background of the human genome. This level of detection is particularly important in situations where analysis of trace contaminants or minute populations of pathogens is required.
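As a small worked illustration of a stepwise dilution series, the sketch below assumes ten-fold steps, under which five steps reach 1:100,000, and picks the greatest dilution whose signal still exceeds a chosen cut-off. The step size, the cut-off and every signal value in the example are made up for illustration and are not data from Figure 7.

```python
def serial_dilutions(steps, fold=10):
    """Dilution factors for a stepwise series: 1, fold, fold**2, ..."""
    return [fold ** i for i in range(steps + 1)]


def detection_limit(signals, threshold):
    """Greatest dilution factor whose signal exceeds the threshold.

    signals: mapping of dilution factor -> hybridization signal
    Returns None if nothing is detected.
    """
    detected = [factor for factor, signal in signals.items() if signal > threshold]
    return max(detected) if detected else None


if __name__ == "__main__":
    factors = serial_dilutions(6)
    # Hypothetical signal values, one per dilution factor.
    made_up = dict(zip(factors, [9000, 7000, 4000, 1500, 700, 400, 90]))
    print(detection_limit(made_up, threshold=200))
```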

EXAMPLE 50 Generation of gene-specific unique oligonucleotide sequences The present invention includes a method to identify organism-specific unique genomic sequences that may not have a defined function as described in the current literature. Unique genomic sequences were further analyzed using the methods described herein to produce unique oligonucleotide sequences that were utilized to detect naturally occurring biological entities in complex samples.

In one embodiment of the present method, unique genomic sequences were identified and re-aligned against the genomic sequence under investigation. The genomic sequence may be annotated before, during or after the generation of unique genomic sequences. Once the genomic sequence was annotated with specific markers for virulence, structural and ribosomal genes, it was possible to identify specific regions of the genome that are gene-specific. The unique genomic sequences that encode these annotated regions were further analyzed to produce unique oligonucleotide sequences that are also gene-specific. The ability to identify gene-specific regions and subsequently produce gene-specific unique oligonucleotide sequences may be particularly useful for gene expression and gene discovery studies. For example, it is known that the Clostridium perfringens 16S rRNA gene is encoded by unique genomic sequences as identified by the method of this application. The rRNA gene of the Clostridium perfringens genome was annotated, and unique genomic sequences identified in the 16S region were further assessed for possible sites of unique oligonucleotide sequences. Ten individual unique oligonucleotide sequences were identified as described by the method herein and are presented as SEQ ID NOs: 1345-1354. The presence of these unique oligonucleotide sequences in a microarray was used to indicate the presence of Clostridium perfringens in a complex sample.

By the same method, gene-specific unique oligonucleotide sequences were also produced for the E. coli rrnH gene. It is known that the E. coli rrnH gene is encoded by unique genomic sequences as identified by the method of this application. The E. coli genome was annotated, and unique genomic sequences within the annotated region were further investigated for possible unique oligonucleotide sequence sites. Twelve unique oligonucleotides were detected for the E. coli rrnH gene and are presented as SEQ ID NOs: 1331-1344. The presence of these unique oligonucleotide sequences in a microarray was used to indicate the presence of E. coli in a complex sample.
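The re-alignment and annotation step amounts to asking which unique genomic sequences overlap an annotated gene. A minimal sketch of that interval test follows, assuming both the unique regions and the gene annotations are available as half-open coordinate ranges on the same genome. The `Region` type, the function name and the coordinates in the example are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def gene_specific_regions(unique_regions, genes):
    """Map each annotated gene to the unique regions that overlap it."""
    result = {}
    for gene in genes:
        overlapping = [
            region.name
            for region in unique_regions
            # Two half-open intervals overlap when each starts before the other ends.
            if region.start < gene.end and gene.start < region.end
        ]
        if overlapping:
            result[gene.name] = overlapping
    return result


if __name__ == "__main__":
    genes = [Region("16S rRNA", 1000, 2500)]  # made-up coordinates
    unique = [Region("u1", 900, 1100), Region("u2", 2500, 2700), Region("u3", 1500, 1600)]
    print(gene_specific_regions(unique, genes))
```

Unique regions selected this way would then be passed to the oligonucleotide design step to yield gene-specific oligonucleotides.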

EXAMPLE 51 Detection of a pathogenic biological entity The present invention includes a method to identify organism-specific unique genomic sequences that may not have a defined function as described in the current literature. Unique genomic sequences were further analyzed using the methods described herein to produce unique oligonucleotide sequences that were utilized to detect naturally occurring and recombinant biological entities in complex environmental, food, forensic or biological samples.

As described in Example 50, unique genomic sequences can be re-aligned against the original genome under investigation to identify regions of the genome that are gene-specific. The ability to identify gene-specific regions and subsequently produce gene-specific unique oligonucleotide sequences is particularly useful for the identification of pathogenic biological entities in a given sample. For example, it is well documented that the E. coli Shiga gene is encoded in pathogenic strains of E. coli such as E. coli O157:H7. Using the method described herein, the Shiga gene within the E. coli genome was annotated and the corresponding unique genomic sequences were analyzed using the similarity search program to identify unique oligonucleotide sequences that would be specific for the E. coli Shiga gene. Twenty-nine individual unique oligonucleotide sequences were identified for the E. coli Shiga gene and are presented as SEQ ID NOs: 1300-1328. The presence of these twenty-nine unique oligonucleotide sequences in a microarray was used to indicate the presence of E. coli in a complex sample. Furthermore, the unique oligonucleotide sequences corresponding to the E. coli Shiga gene were also used to distinguish the harmless background associated with E. coli K12 strains from the pathogenic E. coli strain O157:H7.

Similarly, this gene-specific approach was used to identify unique oligonucleotide sequences in pathogenic Clostridium perfringens species that encode C. perfringens Enterotoxin M98037. Using the annotation approach described above, twenty unique oligonucleotide sequences that encoded the above enterotoxin were identified from unique genomic sequences of Clostridium perfringens (SEQ ID NOs: 1357-1376). The presence of these twenty unique oligonucleotide sequences was used to indicate the presence of Clostridium perfringens in a complex sample. Furthermore, the unique oligonucleotide sequences corresponding to the enterotoxin gene were also used to distinguish the harmless background associated with some Clostridium perfringens strains from pathogenic Clostridium perfringens strains.

EXAMPLE 52 PCR Primer Amplification In this example, unique genomic sequences were identified from the Clostridium perfringens genome. These sequences were BLAST searched against the nr database to confirm uniqueness. One unique genomic sequence, SEQ ID NO: 240, is used here for illustrative purposes. Fifteen unique oligonucleotide sequences (SEQ ID NOs: 1445-1459) were generated from the unique genomic sequence SEQ ID NO: 240 by the method described herein. Unique oligonucleotide sequences were BLAST searched to confirm uniqueness. Two amplification primers (SEQ ID NOs: 1460-1461) were also identified during this process of analysis and were subsequently utilized to amplify the unique genomic sequence SEQ ID NO: 240 from a sample containing C. perfringens. In addition, a number of known unique oligonucleotide sequences for Vaccinia, E. coli K12, E. coli O157:H7 and Clostridium perfringens were spotted onto an array.

Unique oligonucleotide sequences for the above organisms were spotted in triplicate in a "Vertical Linear format", with unique oligonucleotide sequences from a single region of the genome adjacent to each other. The two amplification primers (SEQ ID NOs: 1460-1461) were used to amplify the 1000 base pair unique genomic sequence SEQ ID NO: 240 from C. perfringens, and the resulting amplicon was purified and labeled with Cy3-dCTP. The labeled amplicon was hybridized to the array, and the array was washed. An image of the microarray after hybridization is presented in Figure 8. In the top right quadrant of the array, two Clostridium perfringens unique oligonucleotide sequences were placed on the first row. Only the first unique oligonucleotide sequence hybridized with the probe. The other, to the right of the single row of three "dots", did not hybridize. The second row of the array contained the thirteen remaining unique oligonucleotide sequences from the unique genomic sequence (SEQ ID NO: 240).

Again, one column of "dots" corresponding to a Clostridium perfringens unique oligonucleotide sequence is not visible in the middle of the row. This represented a second unique oligonucleotide sequence that did not hybridize to the probe. It is noted that in the top left quadrant of the array there appears to be some cross hybridization to one or two unique oligonucleotide sequences of Vaccinia, but overall this level of hybridization, as shown in the histogram below the array, is minimal. These results indicate that thirteen out of the fifteen unique oligonucleotide sequences identified for C. perfringens successfully hybridized to a sample containing C. perfringens, while two of these unique oligonucleotide sequences did not find their match in the labeled amplicon. It is speculated that these two unique oligonucleotide sequences may hybridize to the correct unique genomic sequence under different hybridization conditions, such as a lower or higher temperature, a longer hybridization reaction, and the like. These data demonstrate the beneficial use of PCR primers to generate a unique genomic sequence for the organism from which they were identified. As such, the primers provided in this disclosure can be used to generate unique genomic sequence, as required, to test the hybridization efficiencies of unique oligonucleotide sequences.
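By way of illustration only, the relationship between a primer pair and the amplicon it bounds can be sketched in a few lines of Python. The template and primer sequences below are short hypothetical strings, not SEQ ID NOs: 240, 1460 or 1461, and the function names are assumptions made for this sketch. Exact string matching stands in for primer annealing, which is a simplification of actual PCR.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def in_silico_pcr(template: str, forward: str, reverse: str):
    """Return the region of `template` bounded by the two primers, or None.

    The forward primer must match the template strand exactly, and the
    reverse primer must match the opposite strand downstream of it.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement is what appears on the template strand.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical toy sequences, for illustration only.
template = "TTGACCATGGCTAGCTAGGATCCGATCGATTACGCTAGCAAGCTTGGA"
forward_primer = "CCATGGCTAG"
reverse_primer = "CTTGCTAGCG"

amplicon = in_silico_pcr(template, forward_primer, reverse_primer)
print(amplicon, len(amplicon))
```

In this toy case the returned amplicon begins with the forward primer and ends with the reverse complement of the reverse primer, which is the same relationship the two amplification primers have to the amplified unique genomic sequence in this example.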

EXAMPLE 53 BLAST search of unique oligonucleotide sequences against the nr database of NCBI showing uniqueness of oligonucleotide sequences.

Three unique genomic sequences (SEQ ID NOs: 810, 849, 3242) that correspond to distinct regions of the E. coli genome were identified by the method described herein. SEQ ID NO: 810 is a unique genomic sequence from E. coli O157:H7, SEQ ID NO: 849 is a unique genomic sequence from E. coli K12, and SEQ ID NO: 3242 is a unique genomic sequence from E. coli O157:H7 that contains the Shiga gene.

Each unique genomic sequence was screened for potential oligonucleotide sequences as described herein. In total, 13 unique oligonucleotide sequences were identified for these 3 regions of the E. coli genome, 10 of which are presented here for illustrative purposes.

Unique genomic sequence SEQ ID NO: 810 identified 2 unique 50-mer oligonucleotide sequences for E. coli O157:H7, both of which (SEQ ID NOs: 1292, 1294) were BLAST searched against the nr database to confirm their uniqueness over the entire length of the unique oligonucleotide sequence. The BLAST search for each unique oligonucleotide sequence identified over 100 BLAST "hits". The most pertinent "hits" are reported below.

Unique oligonucleotide sequence: SEQ ID NO: 1292

RID: 1074620345-32204-105313520645.BLASTQ4
Query= (50 letters)
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
2,017,250 sequences; 9,771,119,756 total letters
Distribution of 110 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
gi|12519298|gb|AE005660.1|AE005660 Escherichia coli O157:H7...  100  3e-19
gi|13364704|dbj|AP002569.1| Escherichia coli O157:H7 DNA, c...  100  3e-19
gi|24430266|emb|AL928973.3| Mouse DNA sequence from clone R...  38  0.95

Unique oligonucleotide sequence: SEQ ID NO: 1294

RID: 1074620491-1989-43695076285.BLASTQ4
Query= (50 letters)
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
2,017,250 sequences; 9,771,119,756 total letters
Distribution of 102 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
gi|12519298|gb|AE005660.1|AE005660 Escherichia coli O157:H7...  100  3e-19
gi|13364704|dbj|AP002569.1| Escherichia coli O157:H7 DNA, c...  100  3e-19

Although each BLAST search of the 50-mer unique oligonucleotide sequences (SEQ ID NOs: 1292, 1294) produced over 100 "hits", it is noted that each unique oligonucleotide sequence only shares 100% homology and low E values (close to zero) over the entire length of the unique oligonucleotide sequence with E. coli O157:H7. These data demonstrate the uniqueness of the SEQ ID NOs: 1292 and 1294 oligonucleotide sequences, and the usefulness of these unique oligonucleotides to identify E. coli O157:H7.
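The reading of a hit list described above can be sketched, purely for illustration, as a small Python routine. The three hit lines are the ones reported for SEQ ID NO: 1292; the function names, the regular expression and the 100-bit threshold for a "full-length" match are assumptions of this sketch rather than part of the disclosed method.

```python
import re

# One summary line: identifier, truncated description, bit score, E value.
HIT_LINE = re.compile(
    r"^(?P<ident>gi\|\S+)\s+(?P<desc>.*?)\s+(?P<bits>\d+)\s+(?P<evalue>\S+)$"
)


def parse_hits(text):
    """Parse BLAST summary lines into dictionaries; skip lines that do not match."""
    hits = []
    for line in text.strip().splitlines():
        match = HIT_LINE.match(line.strip())
        if match:
            hits.append({
                "ident": match.group("ident"),
                "desc": match.group("desc"),
                "bits": int(match.group("bits")),
                "evalue": float(match.group("evalue")),
            })
    return hits


def full_length_hits_match(hits, organism, min_bits=100):
    """True if every hit scoring at least `min_bits` names `organism`.

    min_bits=100 is an assumed cutoff: in the listings above, a 50-mer that
    matches over its entire length is reported with a score of 100 bits.
    """
    top = [hit for hit in hits if hit["bits"] >= min_bits]
    return bool(top) and all(organism in hit["desc"] for hit in top)


# Hits reported above for SEQ ID NO: 1292.
listing = """
gi|12519298|gb|AE005660.1|AE005660 Escherichia coli O157:H7...  100  3e-19
gi|13364704|dbj|AP002569.1| Escherichia coli O157:H7 DNA, c...  100  3e-19
gi|24430266|emb|AL928973.3| Mouse DNA sequence from clone R...  38  0.95
"""

hits = parse_hits(listing)
print(full_length_hits_match(hits, "Escherichia coli O157:H7"))
```

Lowering the threshold far enough to admit the 38-bit mouse hit makes the check fail, which mirrors the distinction drawn above between the many partial "hits" and the few that span the entire 50-mer.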

Unique genomic sequence SEQ ID NO: 849 identified 6 unique 50-mer oligonucleotide sequences for E. coli K12, 4 of which (SEQ ID NOs: 1176, 1178, 1181, 1183) were BLAST searched against the nr database to confirm their uniqueness over the entire length of the unique oligonucleotide sequence. The BLAST search for each unique oligonucleotide sequence identified over 100 BLAST "hits". The most pertinent "hits" are reported below.

Unique oligonucleotide sequence: SEQ ID NO: 1176

RID: 1074619920-26314-170811579381.BLASTQ4
Query= (50 letters)
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
2,017,250 sequences; 9,771,119,756 total letters
Distribution of 100 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
gi|1787665|gb|AE000237.1|AE000237 Escherichia coli K12 MG16...  100  3e-19
gi|41829|emb|X62680.1|ECIS21S30 E. coli insertion sequences...  100  3e-19
gi|1742287|dbj|D90779.1| E. coli genomic DNA, Kohara clone #...  100  3e-19
gi|1742273|dbj|D90778.1| E. coli genomic DNA, Kohara clone #...  100  3e-19

Unique oligonucleotide sequence: SEQ ID NO: 1178

RID: 1074620067-28141-164569449161.BLASTQ4
Query= (50 letters)
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
2,017,250 sequences; 9,771,119,756 total letters
Distribution of 103 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
gi|1787665|gb|AE000237.1|AE000237 Escherichia coli K12 MG16...  100  3e-19
gi|1742273|dbj|D90778.1| E. coli genomic DNA, Kohara clone #...  100  3e-19
gi|26107941|gb|AE016760.1| Escherichia coli CFT073 section...  68  1e-09

Unique oligonucleotide sequence: SEQ ID NO: 1181

RID: 1074620165-29432-159877309617.BLASTQ4
Query= (50 letters)
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
2,017,250 sequences; 9,771,119,756 total letters
Distribution of 103 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
gi|1787665|gb|AE000237.1|AE000237 Escherichia coli K12 MG16...  100  3e-19
gi|41829|emb|X62680.1|ECIS21S30 E. coli insertion sequences...  100  3e-19
gi|1742287|dbj|D90779.1| E. coli genomic DNA, Kohara clone #...  100  3e-19
gi|1742273|dbj|D90778.1| E. coli genomic DNA, Kohara clone #...  100  3e-19

Unique oligonucleotide sequence: SEQ ID NO: 1183

RID: 1074620258-30724-15997048973.BLASTQ4
Query= (50 letters)
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
2,017,250 sequences; 9,771,119,756 total letters
Distribution of 114 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
gi|1787665|gb|AE000237.1|AE000237 Escherichia coli K12 MG16...  100  3e-19
gi|41829|emb|X62680.1|ECIS21S30 E. coli insertion sequences...  100  3e-19
gi|1742287|dbj|D90779.1| E. coli genomic DNA, Kohara clone #...  100  3e-19
gi|1742273|dbj|D90778.1| E. coli genomic DNA, Kohara clone #...  100  3e-19
gi|33238289|gb|AE017165.1| Prochlorococcus marinus subsp. m...  38  0.95

Although each BLAST search of the 50-mer unique oligonucleotide sequences (SEQ ID NOs: 1176, 1178, 1181, 1183) produced over 100 "hits", it is noted that each unique oligonucleotide sequence only shares 100% homology and low E values (close to zero) over the entire length of the unique oligonucleotide sequence with E. coli K12. These data demonstrate the uniqueness of the SEQ ID NOs: 1176, 1178, 1181 and 1183 oligonucleotide sequences, and the usefulness of these unique oligonucleotides to identify E. coli K12.
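To illustrate why the full-length hits carry E values close to zero while the short partial hits do not, the following Python sketch applies the standard relation between a bit score S' and the expected number of chance hits, E = m x n x 2^(-S'), where m is the query length and n is the total number of letters in the database. The query length (50) and database size (9,771,119,756 letters) are taken from the listings above. The function name is an assumption, and the sketch ignores the effective-length corrections that BLAST applies, so the values are expected to agree with the reported ones only to within roughly an order of magnitude.

```python
def expected_chance_hits(bit_score: float, query_len: int, db_letters: int) -> float:
    """Approximate E value from a bit score: E = m * n * 2**(-S').

    Uses raw lengths rather than BLAST's effective lengths, so the result
    is an approximation of the value BLAST itself would report.
    """
    return query_len * db_letters * 2.0 ** (-bit_score)


QUERY_LEN = 50                # 50-mer unique oligonucleotide sequence
DB_LETTERS = 9_771_119_756    # total letters reported in the listings

for bits in (100, 68, 38):
    print(bits, expected_chance_hits(bits, QUERY_LEN, DB_LETTERS))
```

Under these assumptions a 100-bit hit is expected by chance far less than once per search, whereas a 38-bit hit, such as the Prochlorococcus marinus entry above, is expected by chance about once per search and therefore says nothing about the uniqueness of the 50-mer.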

Unique genomic sequence SEQ ID NO: 3242 identified 5 unique 50-mer oligonucleotide sequences for E. coli O157:H7 containing the Shiga gene, 4 of which (SEQ ID NOs: 1301, 1302, 1327, 1328) were BLAST searched against the nr database to confirm their uniqueness over the entire length of the unique oligonucleotide sequence. The BLAST search for each unique oligonucleotide sequence identified over 100 BLAST "hits". The most pertinent "hits" are reported below.

Unique oligonucleotide sequence: SEQ ID NO: 1301

RID: 1074620702-4120-190001089681.BLASTQ4
Query= (50 letters)
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
2,017,250 sequences; 9,771,119,756 total letters
Distribution of 104 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
gi|21636532|gb|AF461172.1| Escherichia coli FD930 Shiga tox...  100  3e-19
gi|21636523|gb|AF461169.1| Escherichia coli EK921 Shiga tox...  100  3e-19
gi|21636520|gb|AF461168.1| Escherichia coli EK201 Shiga tox...  100  3e-19
gi|21636514|gb|AF461166.1| Escherichia coli C984 Shiga toxi...  100  3e-19
gi|7239813|gb|AF034975.3| Bacteriophage H-19B essential rec...  100  3e-19
gi|6759950|gb|AF153317.1|AF153317 Shigella dysenteriae SapF...  100  3e-19
gi|12516385|gb|AE005442.1|AE005442 Escherichia coli O157:H7...  100  3e-19
gi|32128012|dbj|AP005153.1| Stx1 converting bacteriophage D...  100  3e-19
gi|32400301|dbj|AB083044.1| Escherichia coli O157:H7 stx1 g...  100  3e-19
gi|32400298|dbj|AB083043.1| Escherichia coli O157:H7 stx1 g...  100  3e-19
gi|46946|emb|X07903.1|SDTOXAB Shigella dysenteriae gene for...  100  3e-19
gi|4454334|emb|AJ132761.1|SSO132761 Shigella sonnei stxA an...  100  3e-19
gi|9955818|emb|AJ271153.1|SDY271153 Shigella dysenteriae sh...  100  3e-19
gi|534987|emb|Z36899.1|ECSLTIABA E. coli (serotype O48:H21)...  100  3e-19
gi|9955605|emb|AJ251325.1|ECO251325 Escherichia coli q gene...  100  3e-19
gi|17977984|emb|AJ304858.1|ECO304858 Escherichia coli phage...  100  3e-19
gi|18147051|dbj|AB048232.1| Escherichia coli genes for Shig...  100  3e-19
gi|23343476|emb|AJ413275.1|ECO413275 Bacteriophage Lahn1 pr...  100  3e-19
gi|10799908|emb|AJ279086.1|SSO279086 Shigella sonnei bacter...  100  3e-19
gi|30910914|emb|AJ487680.1|ECO487680 Stx1-converting phage ...  100  3e-19
gi|152784|gb|M19437.1|SHFSHT S. dysenteriae type 1 Shiga tox...  100  3e-19
gi|215072|gb|M19473.1|J93SLTI Bacteriophage 933J (from E.co...  100  3e-19
gi|215043|gb|M16625.1|H19BSLT Bacteriophage H19B (from E.co...  100  3e-19
gi|215049|gb|M23980.1|H30SLT Bacteriophage H30 shiga-like t...  100  3e-19
gi|147832|gb|L04539.1|ECOSLTTI Escherichia coli Shiga-like ...  100  3e-19
gi|11875068|dbj|AP000400.1| Escherichia coli O157:H7 genomi...  100  3e-19
gi|13362333|dbj|AP002560.1| Escherichia coli O157:H7 DNA, c...  100  3e-19
gi|215046|gb|M17358.1|H19BSLTA Bacteriophage H19B shiga-lik...  100  3e-19
gi|12249025|dbj|AB030485.1| Escherichia coli stx1 genes for...  100  3e-19
gi|6527100|dbj|AB015056.1| Escherichia coli genes for shiga...  100  3e-19
gi|6468189|dbj|AB035142.1| Escherichia coli genes for Shiga...  100  3e-19
gi|152787|gb|M24352.1|SHFSHTA S. dysenteriae cytotoxin (SHT)...  100  3e-19
gi|535054|emb|Z36900.1|ECSLTIABB E. coli (serotype O111:H-)...  92  7e-17
gi|23266660|gb|AY135685.1| Escherichia coli O5:H- Stx1A (st...  90  3e-16
gi|28192582|gb|AY170851.1| Escherichia coli strain MH#813 s...  90  3e-16
gi|535088|emb|Z36901.1|ECSLTIABC E. coli (serotype OX3:H8) S...  90  3e-16
gi|16580701|emb|AJ312232.1|ECO312232 Escherichia coli stx1v...  90  3e-16
gi|15986379|emb|AJ314839.1|ECO314839 Escherichia coli stx1v...  90  3e-16
gi|15986376|emb|AJ314838.1|ECO314838 Escherichia coli stx1v...  90  3e-16
gi|18147066|dbj|AB048237.1| Escherichia coli genes for Shig...  90  3e-16
gi|18147060|dbj|AB048235.1| Escherichia coli genes for Shig...  90  3e-16
gi|18147057|dbj|AB048234.1| Escherichia coli genes for Shig...  90  3e-16
gi|18147048|dbj|AB048231.1| Escherichia coli genes for Shig...  90  3e-16
gi|22759888|dbj|AB071623.1| Escherichia coli stx1A gene for...  90  3e-16
gi|22759880|dbj|AB071619.1| Escherichia coli stx1A gene for...  90  3e-16
gi|15869230|emb|AJ413986.1|B62413986 Bacteriophage 6220 stx...  90  3e-16

Unique oligonucleotide sequence: SEQ ID NO: 1302

RID: 1074620819-5463-184396190665.BLASTQ4
Query= (50 letters)
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
2,017,250 sequences; 9,771,119,756 total letters
Distribution of 103 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
gi|21636532|gb|AF461172.1| Escherichia coli FD930 Shiga tox...  100  3e-19
gi|7239813|gb|AF034975.3| Bacteriophage H-19B essential rec...  92  7e-17
gi|12516385|gb|AE005442.1|AE005442 Escherichia coli O157:H7...  92  7e-17
gi|32128012|dbj|AP005153.1| Stx1 converting bacteriophage D...  92  7e-17
gi|9955605|emb|AJ251325.1|ECO251325 Escherichia coli q gene...  92  7e-17
gi|17977984|emb|AJ304858.1|ECO304858 Escherichia coli phage...  92  7e-17
gi|23343476|emb|AJ413275.1|ECO413275 Bacteriophage Lahn1 pr...  92  7e-17
gi|30910914|emb|AJ487680.1|ECO487680 Stx1-converting phage ...  92  7e-17
gi|147832|gb|L04539.1|ECOSLTTI Escherichia coli Shiga-like ...  92  7e-17
gi|11875068|dbj|AP000400.1| Escherichia coli O157:H7 genomi...  92  7e-17
gi|13362333|dbj|AP002560.1| Escherichia coli O157:H7 DNA, c...  92  7e-17
gi|10799908|emb|AJ279086.1|SSO279086 Shigella sonnei bacter...  90  3e-16

Unique oligonucleotide sequence: SEQ ID NO: 1327

RID: 1074620954-6852-146424325400.BLASTQ4
Query= (50 letters)
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
2,017,250 sequences; 9,771,119,756 total letters
Distribution of 116 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
gi|21636532|gb|AF461172.1| Escherichia coli FD930 Shiga tox...  100  3e-19
gi|21636523|gb|AF461169.1| Escherichia coli EK921 Shiga tox...  100  3e-19
gi|21636520|gb|AF461168.1| Escherichia coli EK201 Shiga tox...  100  3e-19
gi|21636514|gb|AF461166.1| Escherichia coli C984 Shiga toxi...  100  3e-19
gi|25986862|gb|AY123842.1| Escherichia coli isolate 2 shiga...  100  3e-19
gi|25986860|gb|AY123841.1| Escherichia coli isolate 1 shiga...  100  3e-19
gi|7239813|gb|AF034975.3| Bacteriophage H-19B essential rec...  100  3e-19
gi|6759950|gb|AF153317.1|AF153317 Shigella dysenteriae SapF...  100  3e-19
gi|37360968|dbj|AB119461.1| Escherichia coli stx1B gene for...  100  3e-19
gi|12516385|gb|AE005442.1|AE005442 Escherichia coli O157:H7...  100  3e-19
gi|32128012|dbj|AP005153.1| Stx1 converting bacteriophage D...  100  3e-19
gi|32400301|dbj|AB083044.1| Escherichia coli O157:H7 stx1 g...  100  3e-19
gi|32400298|dbj|AB083043.1| Escherichia coli O157:H7 stx1 g...  100  3e-19
gi|46946|emb|X07903.1|SDTOXAB Shigella dysenteriae gene for...  100  3e-19
gi|4454334|emb|AJ132761.1|SSO132761 Shigella sonnei stxA an...  100  3e-19
gi|9955818|emb|AJ271153.1|SDY271153 Shigella dysenteriae sh...  100  3e-19
gi|534987|emb|Z36899.1|ECSLTIABA E. coli (serotype O48:H21)...  100  3e-19
gi|535054|emb|Z36900.1|ECSLTIABB E. coli (serotype O111:H-)...  100  3e-19
gi|9955656|emb|AJ251754.1|ECO251754 Escherichia coli stx1B...  100  3e-19
gi|9955605|emb|AJ251325.1|ECO251325 Escherichia coli q gene...  100  3e-19
gi|17977984|emb|AJ304858.1|ECO304858 Escherichia coli phage...  100  3e-19
gi|18147051|dbj|AB048232.1| Escherichia coli genes for Shig...  100  3e-19
gi|22759882|dbj|AB071620.1| Escherichia coli stx1B gene for...  100  3e-19
gi|23343476|emb|AJ413275.1|ECO413275 Bacteriophage Lahn1 pr...  100  3e-19
gi|10799908|emb|AJ279086.1|SSO279086 Shigella sonnei bacter...  100  3e-19
gi|30910914|emb|AJ487680.1|ECO487680 Stx1-converting phage ...  100  3e-19
gi|152784|gb|M19437.1|SHFSHT S. dysenteriae type 1 Shiga tox...  100  3e-19
gi|215072|gb|M19473.1|J93SLTI Bacteriophage 933J (from E.co...  100  3e-19
gi|215043|gb|M16625.1|H19BSLT Bacteriophage H19B (from E.co...  100  3e-19
gi|215049|gb|M23980.1|H30SLT Bacteriophage H30 shiga-like t...  100  3e-19
gi|147832|gb|L04539.1|ECOSLTTI Escherichia coli Shiga-like ...  100  3e-19
gi|11875068|dbj|AP000400.1| Escherichia coli O157:H7 genomi...  100  3e-19
gi|13362333|dbj|AP002560.1| Escherichia coli O157:H7 DNA, c...  100  3e-19
gi|215046|gb|M17358.1|H19BSLTA Bacteriophage H19B shiga-lik...  100  3e-19
gi|12249025|dbj|AB030485.1| Escherichia coli stx1 genes for...  100  3e-19
gi|6527100|dbj|AB015056.1| Escherichia coli genes for shiga...  100  3e-19
gi|6468189|dbj|AB035142.1| Escherichia coli genes for Shiga...  100  3e-19
gi|152787|gb|M24352.1|SHFSHTA S. dysenteriae cytotoxin (SHT)...  100  3e-19
gi|23266660|gb|AY135685.1| Escherichia coli O5:H- Stx1A (st...  84  2e-14
gi|535088|emb|Z36901.1|ECSLTIABC E. coli (serotype OX3:H8) S...  84  2e-14
gi|16580701|emb|AJ312232.1|ECO312232 Escherichia coli stx1v...  84  2e-14
gi|15986379|emb|AJ314839.1|ECO314839 Escherichia coli stx1v...  84  2e-14
gi|15986376|emb|AJ314838.1|ECO314838 Escherichia coli stx1v...  84  2e-14
gi|18147060|dbj|AB048235.1| Escherichia coli genes for Shig...  84  2e-14
gi|18147057|dbj|AB048234.1| Escherichia coli genes for Shig...  84  2e-14
gi|18147048|dbj|AB048231.1| Escherichia coli genes for Shig...  84  2e-14
gi|22759890|dbj|AB071624.1| Escherichia coli stx1B gene for...  84  2e-14
gi|22759886|dbj|AB071622.1| Escherichia coli stx1B gene for...  84  2e-14
gi|15869230|emb|AJ413986.1|B62413986 Bacteriophage 6220 stx...  84  2e-14

Unique oligonucleotide sequence: SEQ ID NO: 1328

RID: 1074621060-25163-144531891131.BLASTQ4
Query= (50 letters)
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
2,017,250 sequences; 9,771,119,756 total letters
Distribution of 102 Blast Hits on the Query Sequence

Sequences producing significant alignments:  Score (bits)  E Value
gi|21636532|gb|AF461172.1| Escherichia coli FD930 Shiga tox...  100  3e-19
gi|21636520|gb|AF461168.1| Escherichia coli EK201 Shiga tox...  100  3e-19
gi|7239813|gb|AF034975.3| Bacteriophage H-19B essential rec...  100  3e-19
gi|6759950|gb|AF153317.1|AF153317 Shigella dysenteriae SapF...  100  3e-19
gi|4454334|emb|AJ132761.1|SSO132761 Shigella sonnei stxA an...  100  3e-19
gi|9955818|emb|AJ271153.1|SDY271153 Shigella dysenteriae sh...  100  3e-19
gi|534987|emb|Z36899.1|ECSLTIABA E. coli (serotype O48:H21)...  100  3e-19
gi|9955656|emb|AJ251754.1|ECO251754 Escherichia coli stx1B...  100  3e-19
gi|10799908|emb|AJ279086.1|SSO279086 Shigella sonnei bacter...  100  3e-19
gi|30910914|emb|AJ487680.1|ECO487680 Stx1-converting phage ...  100  3e-19
gi|152784|gb|M19437.1|SHFSHT S. dysenteriae type 1 Shiga tox...  100  3e-19
gi|215072|gb|M19473.1|J93SLTI Bacteriophage 933J (from E.co...  100  3e-19
gi|215043|gb|M16625.1|H19BSLT Bacteriophage H19B (from E.co...  100  3e-19
gi|215049|gb|M23980.1|H30SLT Bacteriophage H30 shiga-like t...  100  3e-19
gi|215046|gb|M17358.1|H19BSLTA Bacteriophage H19B shiga-lik...  100  3e-19
gi|152787|gb|M24352.1|SHFSHTA S. dysenteriae cytotoxin (SHT)...  100  3e-19
gi|21636523|gb|AF461169.1| Escherichia coli EK921 Shiga tox...  92  7e-17
gi|21636514|gb|AF461166.1| Escherichia coli C984 Shiga toxi...  92  7e-17
gi|12516385|gb|AE005442.1|AE005442 Escherichia coli O157:H7...  92  7e-17
gi|32128012|dbj|AP005153.1| Stx1 converting bacteriophage D...  92  7e-17
gi|535054|emb|Z36900.1|ECSLTIABB E. coli (serotype O111:H-)...  92  7e-17
gi|9955605|emb|AJ251325.1|ECO251325 Escherichia coli q gene...  92  7e-17
gi|17977984|emb|AJ304858.1|ECO304858 Escherichia coli phage...  92  7e-17
gi|147832|gb|L04539.1|ECOSLTTI Escherichia coli Shiga-like ...  92  7e-17
gi|11875068|dbj|AP000400.1| Escherichia coli O157:H7 genomi...  92  7e-17
gi|13362333|dbj|AP002560.1| Escherichia coli O157:H7 DNA, c...  92  7e-17
gi|6468189|dbj|AB035142.1| Escherichia coli genes for Shiga...  84  2e-14

Although each BLAST search of the 50-mer unique oligonucleotide sequences (SEQ ID NOs: 1301, 1302, 1327, 1328) produced over 100 "hits", it is noted that each unique oligonucleotide sequence shares 100% homology and low E values (close to zero) over the entire length of the unique oligonucleotide sequence with E. coli O157:H7 containing the Shiga gene. In addition, it is noted that Shigella species are also identified among the hits for SEQ ID NOs: 1301, 1327 and 1328. One skilled in the art will appreciate that, historically, the Shiga gene was identified initially in Shigella species and only later identified in the genome of E. coli O157:H7. Extensive genomic research has shown that the genomes of E. coli O157:H7 and Shigella are extremely similar, and thus the 50 nucleotides that comprise each unique oligonucleotide sequence derived from the E. coli O157:H7 Shiga gene are also likely to be present in the Shigella genome. Nevertheless, these data demonstrate the usefulness of these unique oligonucleotides to identify E. coli O157:H7 containing the Shiga gene.

All nucleotide sequences referred to in the present application are disclosed in the Sequence Listing submitted on a compact disk containing the file named 36609-2825371fr.ST25.txt, 1,325,056 bytes in size, created January 22, 2004, and Table 3 submitted on a compact disk containing the file named Table_3.txt, 868,352 bytes in size, created January 23, 2004, and are hereby incorporated by reference in their entirety.

All patents, publications and abstracts cited above are incorporated herein by reference in their entirety. It should be understood that the foregoing relates only to preferred embodiments of the present invention and that numerous modifications or alterations may be made therein without departing from the spirit and the scope of the present invention as defined in the following claims.

CLAIMS

We claim:

1. An isolated unique genomic sequence comprising an isolated nucleic acid sequence of any one of SEQ ID NOs: 1 to 1023.

2. The isolated unique genomic sequence of Claim 1, wherein the isolated unique genomic sequence is from a biological organism and the biological organism is Bacillus anthracis, Dengue virus, Ebola virus, Arbovirus, Francisella tularensis, Clostridium perfringens, Escherichia coli, Vaccinia, Yersinia pestis or Brucella melitensis.

3. The isolated unique genomic sequence of Claim 2, wherein the Escherichia coli is Escherichia coli O157:H7 or Escherichia coli K12.

4. The isolated unique genomic sequence of Claim 3, wherein the isolated unique genomic sequence is any one of SEQ ID NOs: 586 to 827 and the biological organism is Escherichia coli O157:H7.

5. The isolated unique genomic sequence of Claim 3, wherein the isolated unique genomic sequence is any one of SEQ ID NOs: 828 to 882 and the biological organism is Escherichia coli K12.

6. The isolated unique genomic sequence of Claim 2, wherein the isolated unique genomic sequence is any one of SEQ ID NOs: 1 to 15 and the biological organism is Yersinia pestis.

7. The isolated unique genomic sequence of Claim 2, wherein the isolated unique genomic sequence is any one of SEQ ID NOs: 16 to 22 and the biological organism is Brucella melitensis.

8. The isolated unique genomic sequence of Claim 2, wherein the isolated unique genomic sequence is any one of SEQ ID NOs: 23 to 30 and the biological organism is Vaccinia.

9. The isolated unique genomic sequence of Claim 2, wherein the isolated unique genomic sequence is any one of SEQ ID NOs: 31 to 585 and the biological organism is Clostridium perfringens.