Title:
ARRAY FOR DETECTING MICROBES
Document Type and Number:
WIPO Patent Application WO/2008/130394
Kind Code:
A2
Abstract:
The present embodiments relate to an array system for detecting and identifying biomolecules and organisms. More specifically, the present embodiments relate to an array system comprising a microarray configured to simultaneously detect a plurality of organisms in a sample at a high confidence level.

Inventors:
ANDERSEN GARY L (US)
DESANTIS TODD Z (US)
Application Number:
PCT/US2007/024720
Publication Date:
October 30, 2008
Filing Date:
November 29, 2007
Assignee:
UNIV CALIFORNIA (US)
ANDERSEN GARY L (US)
DESANTIS TODD Z (US)
International Classes:
C12Q1/68; G16B25/00
Foreign References:
US7108968B2 2006-09-19
US7115364B1 2006-10-03
Other References:
None
See also references of EP 2099935A4
Attorney, Agent or Firm:
FULLER, Michael L. (2040 Main Street, 14th Floor, Irvine, California, US)
Claims:

WHAT IS CLAIMED IS

1. An array system comprising: a microarray configured to simultaneously detect a plurality of organisms in a sample, wherein the microarray comprises fragments of 16s RNA unique to each organism and variants of said fragments comprising at least 1 nucleotide mismatch, wherein the level of confidence of species-specific detection derived from fragment matches is about 90% or higher.

2. The array system of Claim 1, wherein the plurality of organisms comprise bacteria or archaea.

3. The array system of Claim 1, wherein the fragments of 16s RNA are clustered and aligned into groups of similar sequence such that detection of an organism based on at least 11 fragment matches is possible.

4. The array system of Claim 1, wherein the level of confidence of species-specific detection derived from fragment matches is about 95% or higher.

5. The array system of Claim 1, wherein the level of confidence of species-specific detection derived from fragment matches is about 98% or higher.

6. The array system of Claim 1, wherein the majority of fragments of 16s RNA unique to each organism have at least 1 corresponding variant fragment comprising at least 1 nucleotide mismatch.

7. The array system of Claim 1, wherein every fragment of 16s RNA unique to each organism has at least 1 corresponding variant fragment comprising at least 1 nucleotide mismatch.

8. The array system of Claim 1, wherein the fragments are about 25 nucleotides long.

9. The array system of Claim 1, wherein the sample is an environmental sample.

10. The array system of Claim 9, wherein the environmental sample comprises at least one of soil, water or atmosphere.

11. The array system of Claim 1, wherein the sample is a clinical sample.

12. The array system of Claim 11, wherein the clinical sample comprises at least one of tissue, skin, bodily fluid or blood.

13. A method of detecting at least one organism comprising: applying a sample comprising a plurality of organisms to the array system of Claim 1; and identifying organisms in the sample.

14. The method of Claim 13, wherein the plurality of organisms comprise bacteria or archaea.

15. The method of Claim 13, wherein the majority of fragments of 16s RNA unique to each organism have at least 1 corresponding variant fragment comprising at least 1 nucleotide mismatch.

16. The method of Claim 13, wherein every fragment of 16s RNA unique to each organism has at least 1 corresponding variant fragment comprising at least 1 nucleotide mismatch.

17. The method of Claim 13, wherein the fragments are about 25 nucleotides long.

18. The method of Claim 13, wherein the at least one organism to be detected is the most metabolically active organism or organisms in the sample.

19. A method of fabricating an array system comprising: identifying 16s RNA sequences corresponding to a plurality of organisms of interest; selecting fragments of 16s RNA unique to each organism; creating variant RNA fragments corresponding to the fragments of 16s RNA unique to each organism which comprise at least 1 nucleotide mismatch; and fabricating said array system.

20. The method of Claim 19, wherein the plurality of organisms comprise bacteria or archaea.

21. The method of Claim 19, wherein the majority of fragments of 16s RNA unique to each organism have at least one corresponding variant fragment comprising at least 1 nucleotide mismatch.

22. The method of Claim 19, wherein every fragment of 16s RNA unique to each organism has at least 1 corresponding variant fragment comprising at least 1 nucleotide mismatch.

23. The method of Claim 19, wherein the fragments are about 25 nucleotides long.

Description:

ARRAY FOR DETECTING MICROBES

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 60/861,834 filed November 30, 2006, which is hereby incorporated by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED R&D

[0001] This invention was made with Government support under Grant No. DE-AC03-76SF00098 from the Department of Homeland Security and Contract No. DE-AC0-05CH11231 from the Department of Energy.

BACKGROUND OF THE INVENTION

Field of the Invention

[0002] The present embodiments relate to an array system for detecting and identifying biomolecules and organisms. More specifically, the present embodiments relate to an array system comprising a microarray configured to simultaneously detect a plurality of organisms in a sample at a high confidence level.

Description of the Related Art

[0003] In the fields of molecular biology and biochemistry, biopolymers such as nucleic acids and proteins from organisms are identified and/or fractionated in order to search for useful genes, diagnose diseases or identify organisms. A hybridization reaction is frequently used as a pretreatment for such process, where a target molecule in a sample is hybridized with a nucleic acid or a protein having a known sequence. For this purpose, microarrays, or DNA chips, are used on which probes such as DNAs, RNAs or proteins with known sequences are immobilized at predetermined positions.

[0004] A DNA microarray (also commonly known as gene or genome chip, DNA chip, or gene array) is a collection of microscopic DNA spots attached to a solid surface, such as a glass, plastic or silicon chip, forming an array. The affixed DNA segments are known as probes (although some sources will use different nomenclature), thousands of which can be used in a single DNA microarray. Measuring gene expression using microarrays is relevant to many areas of biology and medicine, such as studying treatments, disease, and developmental stages. For example, microarrays can be used to identify disease genes by comparing gene expression in diseased and normal cells.

[0005] Molecular approaches designed to describe organism diversity routinely rely upon classifying heterogeneous nucleic acids amplified by universal 16S RNA gene PCR (polymerase chain reaction). The resulting mixed amplicons can be quickly, but coarsely, typed into anonymous groups using T-/RFLP (Terminal Restriction Fragment Length Polymorphism), SSCP (single-strand conformation polymorphism) or T/DGGE (temperature/denaturing gradient gel electrophoresis). These groups may be classified through sequencing, but this requires additional labor to physically isolate each 16S RNA type, does not scale well for large comparative studies such as environmental monitoring, and is only suitable for low complexity environments. Also, the number of clones that would be required to adequately catalogue the majority of taxa in a sample is too large to be efficiently or economically handled. As such, an improved array and method are needed to efficiently analyze a plurality of organisms without the disadvantages of the above technologies.

SUMMARY OF THE INVENTION

[0006] Some embodiments relate to an array system including a microarray configured to simultaneously detect a plurality of organisms in a sample, wherein the microarray comprises fragments of 16s RNA unique to each organism and variants of said fragments comprising at least 1 nucleotide mismatch, wherein the level of confidence of species-specific detection derived from fragment matches is about 90% or higher.

[0007] In one aspect, the plurality of organisms comprise bacteria or archaea.

[0008] In another aspect, the fragments of 16s RNA are clustered and aligned into groups of similar sequence such that detection of an organism based on at least 11 fragment matches is possible.

[0009] In yet another aspect, the level of confidence of species-specific detection derived from fragment matches is about 95% or higher.

[0010] In still another aspect, the level of confidence of species-specific detection derived from fragment matches is about 98% or higher.

[0011] In some embodiments, the majority of fragments of 16s RNA unique to each organism have a corresponding variant fragment comprising at least 1 nucleotide mismatch.

[0012] In some aspects, every fragment of 16s RNA unique to each organism has a corresponding variant fragment comprising at least 1 nucleotide mismatch.

[0013] In other aspects, the fragments are about 25 nucleotides long.

[0014] In some aspects, the sample is an environmental sample.

[0015] In other aspects, the environmental sample comprises at least one of soil, water or atmosphere.

[0016] In yet other aspects, the sample is a clinical sample.

[0017] In still other aspects, the clinical sample comprises at least one of tissue, skin, bodily fluid or blood.

[0018] Some embodiments relate to a method of detecting an organism including applying a sample comprising a plurality of organisms to the array system which includes a microarray that comprises fragments of 16s RNA unique to each organism and variants of said fragments comprising at least 1 nucleotide mismatch, wherein the level of confidence of species-specific detection derived from fragment matches is about 90% or higher; and identifying organisms in the sample.

[0019] In some aspects, the plurality of organisms comprise bacteria or archaea.

[0020] In other aspects, the majority of fragments of 16s RNA unique to each organism have a corresponding variant fragment comprising at least 1 nucleotide mismatch.

[0021] In still other aspects, every fragment of 16s RNA unique to each organism has a corresponding variant fragment comprising at least 1 nucleotide mismatch.

[0022] In yet other aspects, the fragments are about 25 nucleotides long.

[0023] In some aspects, the organism to be detected is the most metabolically active organism in the sample.

[0024] Some embodiments relate to a method of fabricating an array system including identifying 16s RNA sequences corresponding to a plurality of organisms of interest; selecting fragments of 16s RNA unique to each organism; creating variant RNA fragments corresponding to the fragments of 16s RNA unique to each organism which comprise at least 1 nucleotide mismatch; and fabricating said array system.

[0025] In some aspects, the plurality of organisms comprise bacteria or archaea.

[0026] In other aspects, the majority of fragments of 16s RNA unique to each organism have a corresponding variant fragment comprising at least 1 nucleotide mismatch.

[0027] In still other aspects, every fragment of 16s RNA unique to each organism has a corresponding variant fragment comprising at least 1 nucleotide mismatch.

[0028] In yet other aspects, the fragments are about 25 nucleotides long.

DETAILED DESCRIPTION

[0029] The present embodiments are related to an array system for detecting and identifying biomolecules and organisms. More specifically, the present embodiments relate to an array system comprising a microarray configured to simultaneously detect a plurality of organisms in a sample at a high confidence level.

[0030] In some embodiments, the array system uses multiple probes for increasing confidence of identification of a particular organism using a 16S rRNA gene targeted high density microarray. The use of multiple probes can greatly increase the confidence level of a match to a particular organism. Also, in some embodiments, mismatch control probes corresponding to each perfect match probe can be used to further increase confidence of sequence-specific hybridization of a target to a probe. Probes with one or more mismatches can be used to indicate non-specific binding and a possible non-match. This has the advantage of reducing false positive results due to nonspecific hybridization, which is a significant problem with many current microarrays.

[0031] Some embodiments of the invention relate to a method of using an array to simultaneously identify multiple prokaryotic taxa with a relatively high confidence. A taxon is an individual microbial species or group of highly related species that share an average of about 97% 16S rRNA gene sequence identity. The array system of the current embodiments may use multiple confirmatory probes, each with from about 1 to about 20 corresponding mismatch control probes, to target the most unique regions within a 16S rRNA gene for about 9000 taxa. Preferably, each confirmatory probe has from about 1 to about 10 corresponding mismatch probes. More preferably, each confirmatory probe has from about 1 to about 5 corresponding mismatch probes. The aforementioned about 9000 taxa represent a majority of the taxa that are currently known through 16S rRNA clone sequence libraries. In some embodiments, multiple targets can be assayed through a high-density oligonucleotide array. The sum of all target hybridizations is used to identify specific prokaryotic taxa. The result is a much more efficient and less time-consuming way of identifying unknown organisms that, in addition to providing results that could not previously be achieved, can also provide results in hours that other methods would require days to achieve.

[0032] In some embodiments, the array system of the present embodiments can be fabricated using 16s rRNA sequences as follows. From about 1 to about 500 short probes can be designed for each taxonomic group. In some embodiments, the probes can be proteins, antibodies, tissue samples or oligonucleotide fragments. In certain examples, oligonucleotide fragments are used as probes. In some embodiments, from about 1 to about 500 short oligonucleotide probes, preferably from about 2 to about 200 short oligonucleotide probes, more preferably from about 5 to about 150 short oligonucleotide probes, even more preferably from about 8 to about 100 short oligonucleotide probes can be designed for each taxonomic grouping, allowing for the failure of one or more probes. In one example, at least about 11 short oligonucleotide probes are used for each taxonomic group. The oligonucleotide probes can each be from about 5 bp to about 100 bp, preferably from about 10 bp to about 50 bp, more preferably from about 15 bp to about 35 bp, even more preferably from about 20 bp to about 30 bp. In some embodiments, the probes may be 5-mers, 6-mers, 7-mers, 8-mers, 9-mers, 10-mers, 11-mers, 12-mers, 13-mers, 14-mers, 15-mers, 16-mers, 17-mers, 18-mers, 19-mers, 20-mers, 21-mers, 22-mers, 23-mers, 24-mers, 25-mers, 26-mers, 27-mers, 28-mers, 29-mers, 30-mers, 31-mers, 32-mers, 33-mers, 34-mers, 35-mers, 36-mers, 37-mers, 38-mers, 39-mers, 40-mers, 41-mers, 42-mers, 43-mers, 44-mers, 45-mers, 46-mers, 47-mers, 48-mers, 49-mers, 50-mers, 51-mers, 52-mers, 53-mers, 54-mers, 55-mers, 56-mers, 57-mers, 58-mers, 59-mers, 60-mers, 61-mers, 62-mers, 63-mers, 64-mers, 65-mers, 66-mers, 67-mers, 68-mers, 69-mers, 70-mers, 71-mers, 72-mers, 73-mers, 74-mers, 75-mers, 76-mers, 77-mers, 78-mers, 79-mers, 80-mers, 81-mers, 82-mers, 83-mers, 84-mers, 85-mers, 86-mers, 87-mers, 88-mers, 89-mers, 90-mers, 91-mers, 92-mers, 93-mers, 94-mers, 95-mers, 96-mers, 97-mers, 98-mers, 99-mers, 100-mers or combinations thereof.

[0033] Non-specific cross hybridization can be an issue when an abundant 16S rRNA gene shares sufficient sequence similarity to non-targeted probes, such that a weak but detectable signal is obtained. The use of sets of perfect match and mismatch probes (PM-MM) effectively minimizes the influence of cross-hybridization. In certain embodiments, each perfect match probe (PM) has one corresponding mismatch probe (MM) to form a pair that are useful for analyzing a particular 16S rRNA sequence. In other embodiments, each PM has more than one corresponding MM. Additionally, different PMs can have different numbers of corresponding MM probes. In some embodiments, each PM has from about 1 to about 20 MM, preferably, each PM has from about 1 to about 10 MM and more preferably, each PM has from about 1 to about 5 MM.

[0034] Any of the nucleotide bases can be replaced in the MM probe to result in a probe having a mismatch. In one example, the central nucleotide base sequence can be replaced with any of the three non-matching bases. In other examples, more than one nucleotide base in the MM is replaced with a non-matching base. In some examples, 10 nucleotides are replaced in the MM, in other examples, 5 nucleotides are replaced in the MM, in yet other examples 3 nucleotides are replaced in the MM, and in still other examples, 2 nucleotides are replaced in the MM. This is done so that the increased hybridization intensity signal of the PM over the one or more MM indicates a sequence-specific, positive hybridization. By requiring multiple PM-MM probes to have a confirmation interaction, the chance that the hybridization signal is due to a predicted target sequence is substantially increased.
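
As an illustration only, the central-base substitution described above can be sketched in Python. The function names `mismatch_probes` and `is_sequence_specific` are hypothetical and do not come from the application, and the simple "PM greater than every MM" comparison is an assumption standing in for whatever intensity test an actual implementation would use.

```python
from typing import List, Optional

BASES = "ACGT"


def mismatch_probes(pm: str, position: Optional[int] = None) -> List[str]:
    """Return the three single-base mismatch (MM) variants of a perfect match (PM) probe.

    By default the central base is substituted (index 12 of a 25-mer),
    following the "central nucleotide" example in the text.
    """
    pm = pm.upper()
    if position is None:
        position = len(pm) // 2
    original = pm[position]
    # Substitute each of the three non-matching bases at the chosen position.
    return [pm[:position] + b + pm[position + 1:] for b in BASES if b != original]


def is_sequence_specific(pm_signal: float, mm_signals: List[float]) -> bool:
    """Assumed rule: the PM intensity must exceed every partner MM intensity."""
    return all(pm_signal > mm for mm in mm_signals)


if __name__ == "__main__":
    pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-mer, not a real probe
    for mm in mismatch_probes(pm):
        print(mm)
```

Substituting several bases, as in the other examples of the paragraph, would amount to calling the same substitution at more than one position.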

[0035] In other embodiments, the 16S rRNA gene sequences can be grouped into distinct taxa such that a set of the short oligonucleotide probes that are specific to the taxon can be chosen. In some examples, the 16s rRNA gene sequences grouped into distinct taxa are from about 100 bp to about 1000 bp, preferably the gene sequences are from about 400 bp to about 900 bp, more preferably from about 500 bp to about 800 bp. The resulting about 9000 taxa represented on the array, each containing from about 1% to about 5% sequence divergence, preferably about 3% sequence divergence, can represent substantially all demarcated bacterial and archaeal orders.

[0036] In some embodiments, for a majority of the taxa represented on the array, probes can be designed from regions of gene sequences that have only been identified within a given taxon. In other embodiments, some taxa have no probe-level sequence that can be identified that is not shared with other groups of 16S rRNA gene sequences. For these taxonomic groupings, a set of from about 1 to about 500 short oligonucleotide probes, preferably from about 2 to about 200 short oligonucleotide probes, more preferably from about 5 to about 150 short oligonucleotide probes, even more preferably from about 8 to about 100 short oligonucleotide probes can be designed to a combination of regions on the 16S rRNA gene that taken together as a whole do not exist in any other taxa. For the remaining taxa, a set of probes can be selected to minimize the number of putative cross-reactive taxa. For all three probe set groupings, the advantage of the hybridization approach is that multiple taxa can be identified simultaneously by targeting unique regions or combinations of sequence.

[0037] In some embodiments, oligonucleotide probes can then be selected to obtain an effective set of probes capable of correctly identifying the sample of interest. In certain embodiments, the probes are chosen based on various taxonomic organizations useful in the identification of particular sets of organisms.

[0038] In some embodiments, the chosen oligonucleotide probes can then be synthesized by any available method in the art. Some examples of suitable methods include printing with fine-pointed pins onto glass slides, photolithography using pre-made masks, photolithography using dynamic micromirror devices, ink-jet printing or electrochemistry. In one example, a photolithographic method can be used to directly synthesize the chosen oligonucleotide probes onto a surface. Suitable examples for the surface include glass, plastic, silicon and any other surface available in the art. In certain examples, the oligonucleotide probes can be synthesized on a glass surface at an approximate density of from about 1,000 probes per μm² to about 100,000 probes per μm², preferably from about 2,000 probes per μm² to about 50,000 probes per μm², more preferably from about 5,000 probes per μm² to about 20,000 probes per μm². In one example, the density of the probes is about 10,000 probes per μm². The array can then be arranged in any configuration, such as, for example, a square grid of rows and columns. Some areas of the array can be oligonucleotide 16S rDNA PM or MM probes, and others can be used for image orientation, normalization controls or other analyses. In some embodiments, materials for fabricating the array can be obtained from Affymetrix, GE Healthcare (Little Chalfont, Buckinghamshire, United Kingdom) or Agilent Technologies (Palo Alto, California).

[0039] In some embodiments, the array system is configured to have controls. Some examples of such controls include 1) probes that target amplicons of prokaryotic metabolic genes spiked into the 16S rDNA amplicon mix in defined quantities just prior to fragmentation and 2) probes complementary to a pre-labeled oligonucleotide added into the hybridization mix. The first control collectively tests the fragmentation, biotinylation, hybridization, staining and scanning efficiency of the array system. It also allows the overall fluorescent intensity to be normalized across all the arrays in an experiment. The second control directly assays the hybridization, staining and scanning of the array system. However, the array system of the present embodiments is not limited to these particular examples of possible controls.

[0040] The accuracy of the array of some embodiments has been validated by comparing the results of some arrays with 16S rRNA gene sequences from approximately 700 clones in each of 3 samples. A specific taxon is identified as being present in a sample if a majority (from about 70% to about 100%, preferably from about 80% to about 100% and more preferably from about 90% to about 100%) of the probes on the array have a hybridization signal about 100 times, 200 times, 300 times, 400 times or 500 times greater than that of the background and the perfect match probe has a significantly greater hybridization signal than its one or more partner mismatch control probe or probes. This ensures a higher probability of a sequence specific hybridization to the probe. In some embodiments, the use of multiple probes, each independently indicating that the target sequence of the taxonomic group being identified is present, increases the probability of a correct identification of the organism of interest.
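
The presence call described above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `ProbePair`, `pair_is_positive` and `taxon_present` are hypothetical, the text does not define what "significantly greater" means for the PM over the MM signal, so the `pm_mm_ratio` value of 1.3 is an assumed placeholder, and the default fold-over-background and fraction values are simply picked from the ranges listed in the paragraph.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbePair:
    pm: float  # perfect match probe intensity
    mm: float  # partner mismatch probe intensity


def pair_is_positive(pair: ProbePair, background: float,
                     fold_over_background: float = 100.0,
                     pm_mm_ratio: float = 1.3) -> bool:
    """A pair counts as positive when the PM signal clears the background
    threshold and exceeds the MM signal by an assumed ratio."""
    above_background = pair.pm >= fold_over_background * background
    above_mismatch = pair.pm >= pm_mm_ratio * pair.mm
    return above_background and above_mismatch


def taxon_present(pairs: List[ProbePair], background: float,
                  min_fraction: float = 0.9) -> bool:
    """Call a taxon present when at least min_fraction of its pairs are positive."""
    if not pairs:
        return False
    positive = sum(pair_is_positive(p, background) for p in pairs)
    return positive / len(pairs) >= min_fraction


if __name__ == "__main__":
    # Eleven hypothetical probe pairs: ten positive, one failed probe.
    pairs = [ProbePair(pm=500.0, mm=100.0)] * 10 + [ProbePair(pm=50.0, mm=45.0)]
    print(taxon_present(pairs, background=1.0))
```

With eleven probe pairs per taxon, a 90% threshold tolerates the failure of one pair, which is in line with the stated design goal of allowing one or more probes to fail.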

[0041] Biomolecules, such as proteins, DNA, RNA, DNA from amplified products and native rRNA from the 16S rRNA gene, for example, can be probed by the array of the present embodiments. In some embodiments, probes are designed to be antisense to the native rRNA so that directly labeled rRNA from samples can be placed directly on the array to identify a majority of the actively metabolizing organisms in a sample with no bias from PCR amplification. Actively metabolizing organisms have significantly higher numbers of ribosomes used for the production of proteins; therefore, in some embodiments, the capacity to make proteins at a particular point in time of a certain organism can be measured. This is not possible in systems where only the 16S rRNA gene DNA is measured, which encodes only the potential to make proteins and is the same whether an organism is actively metabolizing or quiescent or dead. In this way, the array system of the present embodiments can directly identify the metabolizing organisms within diverse communities.

[0042] In some embodiments, the array system is able to measure the microbial diversity of complex communities without PCR amplification, and consequently, without all of the inherent biases associated with PCR amplification. Actively metabolizing cells typically have about 20,000 or more ribosomal copies within their cell for protein assembly compared to quiescent or dead cells that have few. In some embodiments, rRNA can be purified directly from environmental samples and processed with no amplification step, thereby avoiding any of the biases caused by the preferential amplification of some sequences over others. Thus, in some embodiments the signal from the array system can reflect the true number of rRNA molecules that are present in the samples, which can be expressed as the number of cells multiplied by the number of rRNA copies within each cell. The number of cells in a sample can then be inferred by several different methods, such as, for example, quantitative real-time PCR, or FISH (fluorescence in situ hybridization). Then the average number of ribosomes within each cell may be calculated.
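
The arithmetic implied above (rRNA molecules = cells × rRNA copies per cell) can be written out as a short calculation. The function name and the input numbers are hypothetical, and the sketch assumes that the array signal has already been converted to an absolute count of rRNA molecules, a calibration step the text does not describe.

```python
def mean_ribosomes_per_cell(rrna_molecules: float, cell_count: float) -> float:
    """Average rRNA copies per cell, from rRNA molecules = cells x copies per cell."""
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    return rrna_molecules / cell_count


if __name__ == "__main__":
    # Hypothetical values: 2e9 rRNA molecules estimated from the array signal,
    # 1e5 cells estimated independently (for example by quantitative PCR or FISH).
    print(mean_ribosomes_per_cell(2e9, 1e5))
```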

[0043] In some embodiments, the samples used can be environmental samples from any environmental source, for example, naturally occurring or artificial atmosphere, water systems, soil or any other sample of interest. In some embodiments, the environmental samples may be obtained from, for example, atmospheric pathogen collection systems, sub-surface sediments, groundwater, ancient water deep within the ground, plant root-soil interface of grassland, coastal water and sewage treatment plants. Because of the ability of the array system to simultaneously test for such a broad range of organisms based on almost all known 16s rRNA gene sequences, the array system of the present embodiments can be used in any environment, which also distinguishes it from other array systems which generally must be targeted to specific environments.

[0044] In other embodiments, the sample used with the array system can be any kind of clinical or medical sample. For example, samples from blood, the lungs or the gut of mammals may be assayed using the array system. Also, the array system of the present embodiments can be used to identify an infection in the blood of an animal. The array system of the present embodiments can also be used to assay medical samples that are directly or indirectly exposed to the outside of the body, such as the lungs, ear, nose, throat, the entirety of the digestive system or the skin of an animal. Hospitals currently lack the resources to identify the complex microbial communities that reside in these areas.

[0045] Another advantage of the present embodiments is that simultaneous detection of a majority of currently known organisms is possible with one sample. This allows for much more efficient study and determination of particular organisms within a particular sample. Current microarrays do not have this capability. Also, with the array system of the present embodiments, simultaneous detection of the top metabolizing organisms within a sample can be determined without bias from PCR amplification, greatly increasing the efficiency and accuracy of the detection process.

[0046] Some embodiments relate to methods of detecting an organism in a sample using the described array system. These methods include contacting a sample with one organism or a plurality of organisms to the array system of the present embodiments and detecting the organism or organisms. In some embodiments, the organism or organisms to be detected are bacteria or archaea. In some embodiments, the organism or organisms to be detected are the most metabolically active organism or organisms in the sample.

[0047] Some embodiments relate to a method of fabricating an array system including identifying 16s RNA sequences corresponding to a plurality of organisms of interest, selecting fragments of 16s RNA unique to each organism and creating variant RNA fragments corresponding to the fragments of 16s RNA unique to each organism which comprise at least 1 nucleotide mismatch and then fabricating the array system.

[0048] The following examples are provided for illustrative purposes only, and are in no way intended to limit the scope of the present invention.

EXAMPLE 1

[0049] An array system was fabricated using 16s rRNA sequences taken from a plurality of bacterial species. A minimum of 11 different, short oligonucleotide probes were designed for each taxonomic grouping, allowing one or more probes to not bind, but still give a positive signal in the assay. Non-specific cross hybridization is an issue when an abundant 16S rRNA gene shares sufficient sequence similarity to non-targeted probes, such that a weak but detectable signal is obtained. The use of a perfect match-mismatch (PM-MM) probe pair effectively minimized the influence of cross-hybridization. In this technique, the central nucleotide is replaced with any of the three non-matching bases so that the increased hybridization intensity signal of the PM over the paired MM indicates a sequence-specific, positive hybridization. By requiring multiple PM-MM probe-pairs to have a positive interaction, the chance that the hybridization signal is due to a predicted target sequence is substantially increased.

[0050] The known 16S rRNA gene sequences larger than 600 bp were grouped into distinct taxa such that a set of at least 11 probes that were specific to each taxon could be selected. The resulting 8,935 taxa (8,741 of which are represented on the array), each containing approximately 3% sequence divergence, represented all 121 demarcated bacterial and archaeal orders. For a majority of the taxa represented on the array (5,737, 65%), probes were designed from regions of 16S rRNA gene sequences that have only been identified within a given taxon. For 1,198 taxa (14%) no probe-level sequence could be identified that was not shared with other groups of 16S rRNA gene sequences, although the gene sequence as a whole was distinctive. For these taxonomic groupings, a set of at least 11 probes was designed to a combination of regions on the 16S rRNA gene that taken together as a whole did not exist in any other taxa. For the remaining 1,806 taxa (21%), a set of probes was selected to minimize the number of putative cross-reactive taxa. Although more than half of the probes in this group have a hybridization potential to one outside sequence, this sequence was typically from a phylogenetically similar taxon. For all three probe set groupings, the advantage of the hybridization approach is that multiple taxa can be identified simultaneously by targeting unique regions or combinations of sequence.

EXAMPLE 2

[0051] An array system was fabricated according to the following protocol. 16S rDNA sequences (Escherichia coli base pair positions 47 to 1473) were obtained from over 30,000 16S rDNA sequences that were at least 600 nucleotides in length in the 15 March 2002 release of the 16S rDNA database, "Greengenes." This region was selected because it is bounded on both ends by universally conserved segments that can be used as PCR priming sites to amplify bacterial or archaeal genomic material using only 2 to 4 primers. Putative chimeric sequences were filtered from the data set using computer software preventing them from being misconstrued as novel organisms. The filtered sequences are considered to be the set of putative 16S rDNA amplicons. Sequences were clustered to enable each sequence of a cluster to be complementary to a set of perfectly matching (PM) probes. Putative amplicons were placed in the same cluster as a result of common 17-mers found in the sequence.
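
The text says only that putative amplicons were clustered because of common 17-mers; it does not give the exact rule. The sketch below is therefore a toy illustration, not the procedure actually used: it assumes that two sequences are linked when they share at least `min_shared` 17-mers, and that clusters are the connected groups of linked sequences. The function names and the `min_shared` parameter are hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 17) -> Set[str]:
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmers(seqs: Dict[str, str], k: int = 17,
                            min_shared: int = 1) -> List[List[str]]:
    """Group sequence names into clusters linked by shared k-mers (single linkage)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(seqs)
    kmer_sets = {name: kmers(seqs[name], k) for name in names}
    # Pairwise comparison is quadratic; acceptable for a toy example only.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(kmer_sets[a] & kmer_sets[b]) >= min_shared:
                parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    toy = {"a": "A" * 20 + "C" * 5, "b": "G" * 3 + "A" * 20, "c": "T" * 25}
    print(cluster_by_shared_kmers(toy))
```

A production pipeline over tens of thousands of sequences would need an indexed k-mer lookup rather than all-pairs comparison, and would also have to enforce the sequence divergence limit within each cluster, which this sketch does not attempt.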

[0052] The resulting 8,988 clusters, each containing approximately 3% sequence divergence, were considered operational taxonomic units (OTUs) representing all 121 demarcated prokaryotic orders. The taxonomic family of each OTU was assigned according to the placement of its member organisms in Bergey's Taxonomic Outline. The taxonomic outline as maintained by Philip Hugenholtz was consulted for phylogenetic classes containing uncultured environmental organisms or unclassified families belonging to named higher taxa. The OTUs comprising each family were clustered into subfamilies by transitive sequence identity. Altogether, 842 sub-families were found. The taxonomic position of each OTU as well as the accompanying NCBI accession numbers of the sequences composing each OTU are recorded and publicly available.

[0053] The objective of the probe selection strategy was to obtain an effective set of probes capable of correctly categorizing mixed amplicons into their proper OTU. For each OTU, a set of 11 or more specific 25-mers (probes) was sought that were prevalent in members of a given OTU but dissimilar from sequences outside the given OTU. In the first step of probe selection for a particular OTU, each of the sequences in the OTU was separated into overlapping 25-mers, the potential targets. Then each potential target was matched to as many sequences of the OTU as possible. First, a text pattern search was used to match potential targets to sequences; however, since partial gene sequences were included in the reference set, additional methods were needed. Therefore, the multiple sequence alignment provided by Greengenes was used to provide a discrete measurement of group size at each potential probe site. For example, if an OTU containing seven sequences possessed a probe site where one member was missing data, then the site-specific OTU size was only six.

[0054] In ranking the possible targets, those having data for all members of that OTU were preferred over those found only in a fraction of the OTU members. In the second step, a subset of the prevalent targets was selected and reverse complemented into probe orientation, avoiding those capable of mis-hybridization to an unintended amplicon. Probes presumed to have the capacity to mis-hybridize were those 25-mers that contained a central 17-mer matching sequences in more than one OTU. Thus, probes that were unique to an OTU solely due to a distinctive base in one of the outer four bases were avoided. Also, probes with mis-hybridization potential to sequences having a common tree node near the root were favored over those with a common node near the terminal branch.
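
The central 17-mer screen described in paragraph [0054] can be sketched as follows. This is a minimal illustration only: it assumes amplicons are held as plain strings keyed by OTU, and the function names and toy sequences are hypothetical rather than part of the described system.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def central_17mer(target):
    """Return a 25-mer with the outer four bases on each side removed."""
    if len(target) != 25:
        raise ValueError("expected a 25-mer")
    return target[4:21]


def may_mishybridize(target, own_otu, amplicons_by_otu):
    """True if the central 17-mer of `target` occurs in another OTU."""
    core = central_17mer(target)
    return any(
        core in seq
        for otu, seqs in amplicons_by_otu.items()
        if otu != own_otu
        for seq in seqs
    )


# Toy data (invented): OTU "B" differs from the target only at its first base,
# so the target is unique to OTU "A" solely because of an outer base.
target = "ACGTTGCAAGCTTAGGCTAACGTCA"
amplicons_by_otu = {
    "A": ["GG" + target + "TT"],
    "B": ["GG" + "T" + target[1:] + "TT"],
}
flagged = may_mishybridize(target, "A", amplicons_by_otu)
probe = reverse_complement(target)  # target converted to probe orientation
```

In this toy case the target would be avoided, because its only distinguishing base lies among the outer four positions.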

[0055] Probes complementary to target sequences that were selected for fabrication were termed perfectly matching (PM) probes. As each PM probe was chosen, it was paired with a control 25-mer (mismatching probe, MM), identical in all positions except the thirteenth base. The MM probe did not contain a central 17-mer complementary to sequences in any OTU. The PM probe, which complements the target, and its MM probe constitute a probe pair and are analyzed together.
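
A PM/MM pair of the kind described above could be built as in the sketch below. The text does not say which base replaces the thirteenth position, so substituting the complementary base is purely an assumption made for illustration; the function name and the example probe are also hypothetical. The additional requirement that the MM probe's central 17-mer match no OTU is not checked here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mismatch_probe(pm_probe):
    """Return a control 25-mer that differs from the PM probe at base 13 only."""
    if len(pm_probe) != 25:
        raise ValueError("expected a 25-mer")
    middle = 12  # zero-based index of the thirteenth base
    # Assumption: substitute the complementary base; the text does not say which.
    return pm_probe[:middle] + COMPLEMENT[pm_probe[middle]] + pm_probe[middle + 1:]


pm = "TGACGTTAGCCTAAGCTTGCAACGT"  # invented example probe
mm = mismatch_probe(pm)
```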

[0056] The chosen oligonucleotides were synthesized by a photolithographic method at Affymetrix Inc. (Santa Clara, CA, USA) directly onto a 1.28 cm by 1.28 cm glass surface at an approximate density of 10,000 probes per μm². Each unique probe sequence on the array had a copy number of roughly 3.2 x 10⁶ (personal communication, Affymetrix). The entire array of 506,944 features was arranged as a square grid of 712 rows and columns. Of these features, 297,851 were oligonucleotide 16S rDNA PM or MM probes, and the remaining features were used for image orientation, normalization controls or other unrelated analyses. Each DNA chip had two kinds of controls on it: 1) probes that target amplicons of prokaryotic metabolic genes spiked into the 16S rDNA amplicon mix in defined quantities just prior to fragmentation and 2) probes complementary to a pre-labeled oligonucleotide added into the hybridization mix. The first control collectively tested the fragmentation, biotinylation, hybridization, staining and scanning efficiency. It also allowed the overall fluorescent intensity to be normalized across all the arrays in an experiment. The second control directly assayed the hybridization, staining and scanning.

EXAMPLE 3

[0057] A study was done on diverse and dynamic bacterial populations in urban aerosols utilizing an array system of certain embodiments. Air samples were collected using an air filtration collection system under vacuum located within six EPA air quality network sites in both San Antonio and Austin, Texas. Approximately 10 liters of air per minute were collected in a polyethylene terephthalate (Celanex), 1.0 μm filter (Hoechst Celanese). Samples were collected daily over a 24 h period. Sample filters were washed in 10 mL buffer (0.1 M sodium phosphate, 10 mM EDTA, pH 7.4, 0.01% Tween-20), and the suspension was stored frozen until extracted. Samples were collected from 4 May to 29 August 2003.

[0058] Sample dates were divided according to a 52-week calendar year starting January 1, 2003, with each Monday to Sunday cycle constituting a full week. Samples from four randomly chosen days within each sample week were extracted. Each date chosen for extraction consisted of 0.6 mL filter wash from each of the six sampling sites for that city (San Antonio or Austin) combined into a "day pool" before extraction. In total, for each week, 24 filters were sampled.

[0059] The "day pools" were centrifuged at 16,000 x g for 25 min and the pellets were resuspended in 400 μL sodium phosphate buffer (100 mM, pH 8). The resuspended pellets were transferred into 2 mL silica bead lysis tubes containing 0.9 g of silica/zirconia lysis bead mix (0.3 g of 0.5 mm zirconia/silica beads and 0.6 g of 0.1 mm zirconia/silica beads). For each lysis tube, 300 μL buffered sodium dodecyl sulfate (SDS) (100 mM sodium chloride, 500 mM Tris pH 8, 10% [w/v] SDS), and 300 μL phenol:chloroform:isoamyl alcohol (25:24:1) were added. Lysis tubes were inverted and flicked three times to mix buffers before bead mill homogenization with a Bio101 Fast Prep 120 machine (Qbiogene, Carlsbad, CA) at 6.5 m s⁻¹ for 45 s. Following centrifugation at 16,000 x g for 5 min, the aqueous supernatant was removed to a new 2 mL tube and kept at -20°C for 1 hour to overnight. An equal volume of chloroform was added to the thawed supernatant prior to vortexing for 5 s and centrifugation at 16,000 x g for 3 min. The supernatant was then combined with two volumes of a binding buffer "Solution 3" (UltraClean Soil DNA kit, MoBio Laboratories, Solana Beach, CA). Genomic DNA from the mixture was isolated on a MoBio spin column, washed with "Solution 4" and eluted in 60 μL of 1X Tris-EDTA according to the manufacturer's instructions. The DNA was further purified by passage through a Sephacryl S-200 HR spin column (Amersham, Piscataway, NJ, USA) and stored at 4°C prior to PCR amplification. DNA was quantified using a PicoGreen fluorescence assay according to the manufacturer's recommended protocol (Invitrogen, Carlsbad, CA).

[0060] The 16S rRNA gene was amplified from the DNA extract using universal primers 27F.1 (5' AGRGTTTGATCMTGGCTCAG) (SEQ ID NO: 1) and 1492R (5' GGTTACCTTGTTACGACTT) (SEQ ID NO: 2). Each PCR reaction mix contained 1X Ex Taq buffer (Takara Bio Inc, Japan), 0.8 mM dNTP mixture, 0.02 U/μL Ex Taq polymerase, 0.4 mg/mL bovine serum albumin (BSA), and 1.0 μM each primer. PCR conditions were 1 cycle of 3 min at 95°C, followed by 35 cycles of 30 sec at 95°C, 30 sec at 53°C, and 1 min at 72°C, and finishing with 7 min incubation at 72°C. When the total mass of PCR product for a sample week reached 2 μg (by gel quantification), all PCR reactions for that week were pooled and concentrated to a volume less than 40 μL with a Microcon YM100 spin filter (Millipore, Billerica, MA) for microarray analysis.

[0061] The pooled PCR product was spiked with known concentrations of synthetic 16S rRNA gene fragments and non-16S rRNA gene fragments according to Table S1. This mix was fragmented using DNAse I (0.02 U/μg DNA, Invitrogen, CA) and One-Phor-All buffer (Amersham, NJ) per Affymetrix's protocol, with incubation at 25°C for 10 min, followed by enzyme denaturation at 98°C for 10 min. Biotin labeling was performed using an Enzo® BioArray™ Terminal Labeling Kit (Enzo Life Sciences Inc., Farmingdale, NY) per the manufacturer's directions. The labeled DNA was then denatured (99°C for 5 min) and hybridized to the DNA microarray at 48°C overnight (> 16 hr). The microarrays were washed and stained per the Affymetrix protocol.

[0062] The array was scanned using a GeneArray Scanner (Affymetrix, Santa Clara, CA, USA). The scan was recorded as a pixel image and analyzed using standard Affymetrix software (Microarray Analysis Suite, version 5.1) that reduced the data to an individual signal value for each probe. Background probes were identified as those producing intensities in the lowest 2% of all intensities. The average intensity of the background probes was subtracted from the fluorescence intensity of all probes. The noise value (N) was the variation in pixel intensity signals observed by the scanner as it read the array surface. The standard deviation of the pixel intensities within each of the identified background cells was divided by the square root of the number of pixels comprising that cell. The average of the resulting quotients was used for N in the calculations described below.
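
The background subtraction and noise calculation of paragraph [0062] can be sketched as below. This is not the Affymetrix software; it is a minimal illustration that assumes probe intensities and per-cell pixel values are available as dictionaries, and that the sample standard deviation is intended (the text does not specify). All names and numbers are invented.

```python
import math
import statistics


def background_probe_ids(intensities, fraction=0.02):
    """Return ids of the probes in the lowest `fraction` of all intensities."""
    ranked = sorted(intensities, key=intensities.get)
    count = max(1, int(len(ranked) * fraction))
    return ranked[:count]


def subtract_background(intensities, background_ids):
    """Subtract the mean background intensity from every probe."""
    background = statistics.mean(intensities[i] for i in background_ids)
    return {probe: value - background for probe, value in intensities.items()}


def noise_value(pixels_by_cell, background_ids):
    """Average, over background cells, of pixel SD / sqrt(pixel count)."""
    quotients = [
        statistics.stdev(pixels_by_cell[i]) / math.sqrt(len(pixels_by_cell[i]))
        for i in background_ids
    ]
    return statistics.mean(quotients)


# Invented toy array: 100 probes with intensities 100..199.
intensities = {f"p{i}": 100.0 + i for i in range(100)}
pixels_by_cell = {"p0": [10, 12, 14, 16], "p1": [20, 20, 20, 20]}
background = background_probe_ids(intensities)
corrected = subtract_background(intensities, background)
N = noise_value(pixels_by_cell, background)
```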

[0063] Probe pairs scored as positive were those that met two criteria: (i) the intensity of fluorescence from the perfectly matched probe (PM) was greater than 1.3 times the intensity from the mismatched control (MM), and (ii) the difference in intensity, PM minus MM, was at least 130 times greater than the squared noise value (> 130 N²). These two criteria were chosen empirically to provide stringency while maintaining sensitivity to the amplicons known to be present from sequencing results of cloning the San Antonio week 29 sample. The positive fraction (PosFrac) was calculated for each probe set as the number of positive probe pairs divided by the total number of probe pairs in a probe set. A taxon was considered present in the sample when over 92% of the probe pairs in its corresponding probe set were positive (PosFrac > 0.92). This threshold was determined based on empirical data from clone library analyses. Hybridization intensity (hereafter referred to as intensity) was calculated in arbitrary units (a.u.) for each probe set as the trimmed average (maximum and minimum values removed before averaging) of the PM minus MM intensity differences across the probe pairs in a given probe set. All intensities < 1 were shifted to 1 to avoid errors in subsequent logarithmic transformations. When summarizing chip results to the sub-family, the probe set producing the highest intensity was used.
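
The scoring rules of paragraph [0063] translate into a few lines of code. The sketch below is an illustration under stated assumptions: each probe pair is a (PM, MM) tuple of background-corrected intensities, the function names are hypothetical, and the example values are invented.

```python
def is_positive(pm, mm, noise):
    """Apply both probe-pair criteria from the text."""
    return pm > 1.3 * mm and (pm - mm) >= 130 * noise ** 2


def positive_fraction(pairs, noise):
    """PosFrac: positive probe pairs divided by all probe pairs in the set."""
    return sum(is_positive(pm, mm, noise) for pm, mm in pairs) / len(pairs)


def taxon_present(pairs, noise, threshold=0.92):
    """A taxon is called present when PosFrac exceeds the threshold."""
    return positive_fraction(pairs, noise) > threshold


def hybridization_intensity(pairs):
    """Trimmed mean of PM - MM (min and max dropped), floored at 1."""
    differences = sorted(pm - mm for pm, mm in pairs)
    trimmed = differences[1:-1]
    return max(sum(trimmed) / len(trimmed), 1.0)


# Invented probe set: 11 clearly positive pairs and one negative pair.
noise = 1.0
pairs = [(1000.0, 200.0)] * 11 + [(300.0, 290.0)]
pos_frac = positive_fraction(pairs, noise)   # 11 of 12 pairs are positive
present = taxon_present(pairs, noise)
intensity = hybridization_intensity(pairs)
```

With 11 of 12 pairs positive, PosFrac is about 0.917, which falls just short of the 0.92 threshold, so this invented taxon would not be called present.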

[0064] To compare the diversity of bacteria detected with microarrays to a known standard, one sample week was chosen for cloning and sequencing and for replicate microarray analysis. One large pool of SSU amplicons (96 reactions, 50 μL/reaction) from San Antonio week 29 was made. One milliliter of the pooled PCR product was gel purified and 768 clones were sequenced at the DOE Joint Genome Institute (Walnut Creek, CA) by standard methods. An aliquot of this pooled PCR product was also hybridized to a microarray (three replicate arrays performed). Sub-families containing a taxon scored as present in all three array replicates were recorded. Individual cloned rRNA genes were sequenced from each terminus, assembled using Phred and Phrap (S9, S10, S11), and were required to pass quality tests of Phred 20 (base call error probability < 10⁻²) to be included in the comparison.

[0065] Sequences that appeared chimeric were removed using Bellerophon (S2). To be retained, a sequence had to meet two requirements: (1) the preference score must be less than 1.3 and (2) the divergence ratio must be less than 1.1. The divergence ratio is a new metric implemented to weight the likelihood of a sequence being chimeric according to the similarity of the parent sequences. The more distantly related the parent sequences are to each other relative to their divergence from the chimeric sequence, the greater the likelihood that the inferred chimera is real. This metric uses the average sequence identity between the two fragments of the candidate and their corresponding parent sequences as the numerator, and the sequence identity between the parent sequences as the denominator. All calculations are made using a 300 base pair window on either side of the most likely break point. A divergence ratio of 1.1 was empirically determined to be the threshold for classifying sequences as putatively chimeric.
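
One way to read the divergence ratio defined above is sketched here. The sketch assumes the candidate and both parent sequences are already aligned to equal length without gaps and that identity is the simple fraction of matching positions; neither detail is specified in the text, and the function names and toy sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def divergence_ratio(candidate, parent_left, parent_right, breakpoint, window=300):
    """Mean fragment-to-parent identity divided by parent-to-parent identity."""
    start = max(0, breakpoint - window)
    end = min(len(candidate), breakpoint + window)
    left = identity(candidate[start:breakpoint], parent_left[start:breakpoint])
    right = identity(candidate[breakpoint:end], parent_right[breakpoint:end])
    parents = identity(parent_left[start:end], parent_right[start:end])
    return ((left + right) / 2) / parents


# Invented 20-base example with a break point at position 10.
parent_left = "A" * 20
parent_right = "AAAAACCCCCAAAAACCCCC"
chimera = parent_left[:10] + parent_right[10:]
ratio = divergence_ratio(chimera, parent_left, parent_right, breakpoint=10, window=10)
```

Here each fragment matches its parent perfectly while the parents share only half their positions, giving a ratio of 2.0, well above the 1.1 threshold.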

[0066] Similarity of clones to array taxa was calculated with DNADIST (S12) using the DNAML-F84 option assuming a transition:transversion ratio of 2.0 and an A, C, G, T 16S rRNA gene base frequency of 0.2537, 0.2317, 0.3167, 0.1979, respectively. We calculated these parameters empirically from all records of the 'Greengenes' 16S rRNA multiple sequence alignment over 1,250 nucleotides in length. The Lane mask (S13) was used to restrict similarity observations to 1,287 conserved columns (lanes) of aligned characters. Cloned sequences from this study were rejected from further analysis when < 1,000 characters could be compared to a lane-masked reference sequence. Sequences were assigned to a taxonomic node using a sliding scale of similarity threshold (S14). Phylum, class, order, family, sub-family, or taxon placement was accepted when a clone surpassed similarity thresholds of 80%, 85%, 90%, 92%, 94%, or 97%, respectively. When similarity to nearest database sequence was <94%, the clone was considered to represent a novel sub-family. A full comparison between clone and array analysis is presented in Table S2.
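
The sliding scale of similarity thresholds can be expressed as a simple lookup. The sketch below uses the thresholds quoted in paragraph [0066]; the function names are hypothetical, and reading "surpassed" as a strict inequality is an assumption.

```python
# Thresholds from the text, ordered from the most specific rank downward.
THRESHOLDS = [
    ("taxon", 0.97),
    ("sub-family", 0.94),
    ("family", 0.92),
    ("order", 0.90),
    ("class", 0.85),
    ("phylum", 0.80),
]


def deepest_rank(similarity):
    """Return the most specific rank whose threshold the clone surpasses."""
    for rank, cutoff in THRESHOLDS:
        if similarity > cutoff:
            return rank
    return None  # below the phylum threshold


def is_novel_subfamily(similarity):
    """Clones under 94% similarity are treated as a novel sub-family."""
    return similarity < 0.94
```

For example, a clone with a maximum similarity of 0.947 would be placed at the sub-family level, while one at 0.93 would be placed only to family and counted as a novel sub-family.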

[0067] Primers targeting sequences within particular taxa/sub-families were generated by ARB's probe design feature (S15). Melting temperatures were constrained from 45°C to 65°C with G+C content between 40 and 70%. The probes were chosen to contain 3' bases non-complementary to sequences outside of the taxon/sub-family. Primers were matched using Primer3 (S16) to create primer pairs (Table S3). Sequences were generated using the Takara enzyme system as described above with the necessary adjustments in annealing temperatures. Amplicons were purified (PureLink PCR Purification Kit, Invitrogen) and sequenced directly or, if there were multiple unresolved sequences, cloned using a TOPO pCR2.1 cloning kit (Invitrogen, CA) according to the manufacturer's instructions. The M13 primer pair was used for clones to generate insert amplicons for sequencing at UC Berkeley's sequencing facility.

[0068] To determine whether changes in 16S rRNA gene concentration could be detected using the array, various quantities of distinct rRNA gene types were hybridized to the array in rotating combinations. We chose environmental organisms, organisms involved in bioremediation, and a pathogen of biodefense relevance. 16S rRNA genes were amplified from each of the organisms in Table S4. Then each of these nine distinct 16S rRNA gene standards was tested once in each concentration category spanning 5 orders of magnitude (0 molecules, 6 x 10⁷, 1.44 x 10⁸, 3.46 x 10⁸, 8.30 x 10⁸, 1.99 x 10⁹, 4.78 x 10⁹, 2.75 x 10¹⁰, 6.61 x 10¹⁰, 1.59 x 10¹¹) with concentrations of individual 16S rRNA gene types rotating between arrays such that each array contained the same total of 16S rRNA gene molecules. This is similar to a Latin Square design, although with a 9x11 format matrix.

[0069] A taxon (#9389) consisting only of two sequences of Pseudomonas oleovorans that correlated well with environmental variables was chosen for quantitative PCR confirmation of array-observed quantitative shifts. Primers for this taxon were designed using the ARB (S15) probe match function to determine unique priming sites based upon regions detected by array probes. These regions were then input into Primer3 (S16) in order to choose optimal oligonucleotide primers for PCR. Primer quality was further assessed using Beacon Designer v3.0 (Premier BioSoft, CA). Primers 9389F2 (CGACTACCTGGACTGACACT) (SEQ ID NO: 3) and 9389R2 (CACCGGCAGTCTCCTTAGAG) (SEQ ID NO: 4) were chosen to amplify a 436 bp fragment.

[0070] To test the specificity of this primer pair, we used a nested PCR approach. 16S rRNA genes were amplified using universal primers (27F, 1492R) from pooled aerosol genomic DNA extracts from both Austin and San Antonio, Texas. These products were purified and used as template in PCR reactions using primer set 9389F2-9389R2. Amplicons were then ligated to pCR2.1 and transformed into E. coli TOP10 cells as recommended by the manufacturer (Invitrogen, CA). Five clones were chosen at random for each of the two cities (10 clones total) and inserts were amplified using vector specific primers M13 forward and reverse. Standard Sanger sequencing was performed and sequences were tested for homology against existing database entries (NCBI GenBank, RDPII and Greengenes).

[0071] To assay P. oleovorans 16S rRNA gene copies in genomic DNA extracts, we performed real-time quantitative PCR (qPCR) using an iCycler iQ real-time detection system (BioRad, CA) with the iQ Sybr® Green Supermix (BioRad, CA) kit. Reaction mixtures (final volume, 25 μl) contained 1X iQ Sybr® Green Supermix, 7.5 pmol of each primer, 25 μg BSA, 0.5 μl DNA extract and DNase/RNase free water. Following enzyme activation (95°C, 3 min), up to 50 cycles of 95°C, 30 s; 61°C, 30 s; 85°C, 10 s and 72°C, 45 s were performed. The specific data acquisition step (85°C for 10 s) was set above the Tm of potential primer dimers and below the Tm of the product to minimize any non-amplicon Sybr Green fluorescence. Copy number of P. oleovorans 16S rRNA gene molecules was quantified by comparing cycle thresholds to a standard curve (in the range of 7.6 x 10⁰ to 7.6 x 10⁵ copies μl⁻¹), run in parallel, using cloned P. oleovorans 16S rRNA amplicons generated by PCR using primers M13 forward and reverse. Regression coefficients for the standard curves were typically greater than 0.99, and post amplification melt curve analyses displayed a single peak at 87.5°C, indicative of specific Pseudomonas oleovorans 16S rRNA gene amplification (data not shown).
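
Quantification against a standard curve, as described in paragraph [0071], amounts to a linear fit of cycle threshold against log10 copy number. The sketch below is a generic illustration, not the iCycler software; the cycle threshold values are invented, and only the copy-number range of the standards is taken from the text.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in cts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies per microliter."""
    return 10 ** ((ct - intercept) / slope)


# Standards spanning 7.6 x 10^0 to 7.6 x 10^5 copies per microliter.
standards = [7.6 * 10 ** k for k in range(6)]
standard_cts = [33.0 - 3.3 * k for k in range(6)]  # invented Ct values
slope, intercept, r_squared = fit_standard_curve(standards, standard_cts)
unknown = copies_from_ct(33.0 - 3.3 * 2.5, slope, intercept)
```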

[0072] To account for scanning intensity variation from array to array, internal standards were added to each experiment. The internal standards were a set of thirteen amplicons generated from yeast and bacterial metabolic genes and five synthetic 16S rRNA-like genes spiked into each aerosol amplicon pool prior to fragmentation. The known concentrations of the amplicons ranged from 4 pM to 605 pM in the final hybridization mix. The intensities resulting from the fifteen corresponding probe sets were natural log transformed. Adjustment factors for each array were calculated by fitting the linear model using the least-squares method. An array's adjustment factor was subtracted from each probe set's ln(intensity).

[0073] For each day of aerosol sampling, 15 factors including humidity, wind, temperature, precipitation, pressure, particulate matter, and week of year were recorded from the U.S. National Climatic Data Center (http://www.ncdc.noaa.gov) or the Texas Natural Resource Conservation Commission (http://www.tceq.state.tx.us). The weekly mean, minimum, maximum, and range of values were calculated for each factor from the collected data. The changes in ln(intensity) for each taxon considered present in the study were tested for correlation against the environmental conditions. The resulting p-values were adjusted using the step-up False Discovery Rate (FDR) controlling procedure (S18).
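
The p-value adjustment can be illustrated with the Benjamini-Hochberg step-up procedure. The text cites only "the step-up FDR controlling procedure (S18)", so treating it as Benjamini-Hochberg is an assumption; the function name and the example p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return step-up FDR-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # invented correlation p-values
adjusted = benjamini_hochberg(raw)
```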

[0074] Multivariate regression tree analysis (S19, S20) was carried out using the package 'mvpart' within the 'R' statistical programming environment. A Bray-Curtis-based distance matrix was created using the function 'gdist'. The Bray-Curtis measure of dissimilarity is generally regarded as a good measure of ecological distance when dealing with 'species' abundance as it allows for non-linear responses to environmental gradients (S19, S21).
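
For reference, the Bray-Curtis dissimilarity between two abundance profiles is the sum of absolute differences divided by the sum of all abundances. The sketch below is a plain Python illustration of that formula, not a reproduction of the R function 'gdist'; the profiles are invented.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance lists."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


def distance_matrix(samples):
    """Pairwise Bray-Curtis dissimilarities for a list of samples."""
    return [[bray_curtis(s, t) for t in samples] for s in samples]


week_a = [1.0, 2.0, 3.0]  # invented taxon abundances
week_b = [3.0, 2.0, 1.0]
matrix = distance_matrix([week_a, week_b])
```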

[0075] Prior to rarefaction analysis a distance matrix (DNAML homology) of clone sequences was created using an online tool at http://greengenes.lbl.gov/cgi-bin/nph-distance_matrix.cgi following alignment of the sequences using the NAST aligner (http://greengenes.lbl.gov/NAST) (S22). DOTUR (S23) was used to generate rarefaction curves, Chao1 and ACE richness predictions and rank-abundance curves. Nearest neighbor joining was used with 1000 iterations for bootstrapping.
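
As a point of reference for the richness predictions, the classic Chao1 estimator adds a correction based on singletons and doubletons to the observed OTU count. The sketch below shows that textbook form; how DOTUR implements it is not described here, so this should not be read as DOTUR's code, and the clone counts are invented.

```python
def chao1(otu_counts):
    """Classic Chao1 richness estimate from per-OTU clone counts."""
    observed = sum(1 for count in otu_counts if count > 0)
    singletons = sum(1 for count in otu_counts if count == 1)
    doubletons = sum(1 for count in otu_counts if count == 2)
    if doubletons > 0:
        return observed + singletons ** 2 / (2 * doubletons)
    # Bias-corrected form, used when no doubletons are observed.
    return observed + singletons * (singletons - 1) / 2


estimate = chao1([10, 5, 1, 1, 1, 2, 2])  # invented clone library
```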

[0076] DNA yields in the pooled weekly filter washes ranged from 0.522 ng to 154 ng. As only an aliquot of the filter washes was extracted, we extrapolate the range of DNA extractable from each daily filter to be between 150 ng and 4300 ng assuming 10% extraction efficiency. Using previous estimates of bacterial to fungal ratios in aerosols (49% bacterial, 44% fungal clones; S24) this range is equivalent to 1.2 x 10⁷ to 3.5 x 10⁸ bacterial cells per filter assuming a mean DNA content of a bacterial cell of 6 fg (S25).
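
The conversion from DNA mass to bacterial cells in paragraph [0076] can be checked with a short calculation. The sketch uses only the figures quoted above (49% bacterial fraction, 6 fg of DNA per cell); the function name is hypothetical.

```python
FG_PER_NG = 1e6  # femtograms per nanogram


def bacterial_cells(dna_ng, bacterial_fraction=0.49, fg_per_cell=6.0):
    """Estimate bacterial cells from total DNA mass on a filter."""
    return dna_ng * bacterial_fraction * FG_PER_NG / fg_per_cell


low = bacterial_cells(150)    # about 1.2 x 10^7 cells
high = bacterial_cells(4300)  # about 3.5 x 10^8 cells
```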

[0077] Table S1. Spike-in controls of functional genes and synthetic 16S rRNA-like genes used for internal array normalization.

Affymetrix control spikes    Molecules applied    Description
AFFX-BioB-5_at               5.83 x 10¹⁰          E. coli biotin synthetase
AFFX-BioB-M_at               5.43 x 10¹⁰          E. coli biotin synthetase
AFFX-BioC-5_at               2.26 x 10¹⁰          E. coli bioC protein
AFFX-BioC-3_at               1.26 x 10¹⁰          E. coli bioC protein
AFFX-BioDn-3_at              1.68 x 10¹⁰          E. coli dethiobiotin synthetase
AFFX-CreX-5_at               2.17 x 10⁹           Bacteriophage P1 cre recombinase protein
AFFX-DapX-5_at               9.03 x 10⁸           B. subtilis dapB, dihydrodipicolinate reductase
AFFX-DapX-M_at               3.03 x 10¹⁰          B. subtilis dapB, dihydrodipicolinate reductase
YFL039C                      5.02 x 10⁸           Saccharomyces, gene for actin (Act1p) protein
YER022W                      1.21 x 10⁹           Saccharomyces, RNA polymerase II mediator complex subunit (SRB4p)
YER148W                      2.91 x 10⁹           Saccharomyces, TATA-binding protein, general transcription factor (SPT15)
YEL002C                      7.00 x 10⁹           Saccharomyces, beta subunit of the oligosaccharyl transferase (OST) glycoprotein complex (WBP1)
YEL024W                      7.29 x 10¹⁰          Saccharomyces, ubiquinol-cytochrome-c reductase (RIP1)

Synthetic 16S rRNA control spikes
SYNM.neurolyt_st             6.74 x 10⁸           Synthetic derivative of Mycoplasma neurolyticum 16S rRNA gene
SYNLc.oenos_st               3.90 x 10⁹           Synthetic derivative of Leuconostoc oenos 16S rRNA gene
SYNCau.cres8_st              9.38 x 10⁹           Synthetic derivative of Caulobacter crescentus 16S rRNA gene
SYNFer.nodosm_st             4.05 x 10¹⁰          Synthetic derivative of Fervidobacterium nodosum 16S rRNA gene
SYNSap.grandis_st            1.62 x 10⁹           Synthetic derivative of Saprospira grandis 16S rRNA gene

[0078] Table S2. Comparison between clone and array results. (1) A sub-family must have at least one taxon present above the positive probe threshold of 0.92 (92%) in all three replicates to be considered present. (2) For a clone to be assigned to a sub-family its DNAML similarity must be above the 0.94 (94%) threshold defined for sub-families. (3) This is the maximum DNAML similarity measured. (4) Both maximum preference score and maximum divergence ratio must pass the criteria below for a clone to be considered non-chimeric. (5) Bellerophon preference score; a ratio of 1.3 or greater has been empirically shown to indicate a chimeric molecule. (6) Bellerophon divergence ratio. This is a new metric devised to aid chimera detection; a score greater than 1.1 indicates a potential chimera.

Columns, in order: Sub-family; Array detection, 3/3 replicates (pass=1, fail=0) (1); Clone detection, number of clones assigned to sub-family (2); DNAML similarity, maximum similarity (3); Chimera checking (4), maximum preference score (5) and maximum divergence ratio (6); Comparison, array only (pass=1, fail=0), array and cloning (pass=1, fail=0), cloning only (pass=1, fail=0).

Bacteria; AD3; Unclassified; Unclassified; Unclassified; sf_1    1 0 0
Bacteria; Acidobacteria; Acidobacteria-10; Unclassified; Unclassified; sf_1    1 0 0
Bacteria; Acidobacteria; Acidobacteria-4; Ellin6075/11-25; Unclassified; sf_1    1 0 0
Bacteria; Acidobacteria; Acidobacteria-6; Unclassified; Unclassified; sf_1    1 0 0
Bacteria; Acidobacteria; Acidobacteria; Acidobacteriales; Acidobacteriaceae; sf_14    0.973 1.16 1.06 0 1 0
Bacteria; Acidobacteria; Acidobacteria; Acidobacteriales; Acidobacteriaceae; sf_16    1 1 0 0
Bacteria; Acidobacteria; Solibacteres; Unclassified; Unclassified; sf_1    1 2 0.960 0.00 0.00 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Acidimicrobiaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Microthrixineae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Microthrixineae; sf_12    0 1 0.947 0.00 0.00 0 0 1
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Unclassified; sf_1    1 1 0.961 1.28 1.06 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Acidothermaceae; sf_1    1 1 0.947 0.00 0.00 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Actinomycetaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Actinosynnemataceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Brevibacteriaceae; sf_1    1 4 0.998 0.00 0.00 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Cellulomonadaceae; sf_1    1 2 0.981 1.20 1.08 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Corynebacteriaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Dermabacteraceae; sf_1    1 2 0.999 1.21 1.03 0 1 0

Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Dermatophilaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Dietziaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Frankiaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Geodermatophilaceae; sf_1    1 2 1.000 0.00 0.00 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Gordoniaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Intrasporangiaceae; sf_1    1 10 0.999 1.20 1.18 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Kineosporiaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Microbacteriaceae; sf_1    1 4 0.999 0.00 0.00 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Micrococcaceae; sf_1    1 2 0.985 1.26 1.15 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Micromonosporaceae; sf_1    1 3 1.000 1.27 1.20 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Mycobacteriaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Nocardiaceae; sf_1    1 1 0.999 0.00 0.00 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Nocardioidaceae; sf_1    1 4 0.994 1.16 1.07 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Nocardiopsaceae; sf_1    1 1 1.000 0.00 0.00 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Promicromonosporaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Propionibacteriaceae; sf_1    1 3 0.982 1.20 1.05 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Pseudonocardiaceae; sf_1    1 3 0.999 1.14 1.11 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Sporichthyaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_1    1 3 0.998 1.30 1.14 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_3    1 2 0.996 0.00 0.00 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Streptosporangiaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Thermomonosporaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Unclassified; sf_3    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Williamsiaceae; sf_1    0 1 0.987 1.18 1.12 0 0 1
Bacteria; Actinobacteria; Actinobacteria; Bifidobacteriales; Bifidobacteriaceae; sf_1    1 1 0 0
Bacteria; Actinobacteria; Actinobacteria; Rubrobacterales; Rubrobacteraceae; sf_1    1 13 0.990 1.56 1.05 0 1 0
Bacteria; Actinobacteria; Actinobacteria; Unclassified; Unclassified; sf_1    1 1 0 0

Bacteria; Aquificae; Aquificae; Aquificales; Hydrogenothermaceae; sf_1
Bacteria; BRC1; Unclassified; Unclassified; Unclassified; sf_2
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Porphyromonadaceae; sf_1
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Prevotellaceae; sf_1
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Rikenellaceae; sf_5
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Unclassified; sf_15
Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Blattabacteriaceae; sf_1    0.943 0.00 0.00
Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Flavobacteriaceae; sf_1
Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Unclassified; sf_3
Bacteria; Bacteroidetes; KSA1; Unclassified; Unclassified; sf_1
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Crenotrichaceae; sf_11    0.973 1.22 1.07
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Flammeovirgaceae; sf_5
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Flexibacteraceae; sf_19
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Sphingobacteriaceae; sf_1
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Unclassified; sf_3
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Unclassified; sf_6
Bacteria; Bacteroidetes; Unclassified; Unclassified; Unclassified; sf_4
Bacteria; Caldithrix; Unclassified; Caldithrales; Caldithraceae; sf_1
Bacteria; Caldithrix; Unclassified; Caldithrales; Caldithraceae; sf_2
Bacteria; Chlamydiae; Chlamydiae; Chlamydiales; Chlamydiaceae; sf_1
Bacteria; Chlorobi; Chlorobia; Chlorobiales; Chlorobiaceae; sf_1
Bacteria; Chlorobi; Unclassified; Unclassified; Unclassified; sf_1
Bacteria; Chlorobi; Unclassified; Unclassified; Unclassified; sf_6
Bacteria; Chlorobi; Unclassified; Unclassified; Unclassified; sf_9

Bacteria; Chloroflexi; Anaerolineae; Chloroflexi-1a; Unclassified; sf_1    0.992 0.00 0.00
Bacteria; Chloroflexi; Anaerolineae; Chloroflexi-1b; Unclassified; sf_2
Bacteria; Chloroflexi; Anaerolineae; Unclassified; Unclassified; sf_9
Bacteria; Chloroflexi; Chloroflexi-3; Roseiflexales; Unclassified; sf_5    1
Bacteria; Chloroflexi; Dehalococcoidetes; Unclassified; Unclassified; sf_1    1
Bacteria; Chloroflexi; Unclassified; Unclassified; Unclassified; sf_12    1
Bacteria; Coprothermobacteria; Unclassified; Unclassified; Unclassified; sf_1    1
Bacteria; Cyanobacteria; Cyanobacteria; Chloroplasts; Chloroplasts; sf_11    1
Bacteria; Cyanobacteria; Cyanobacteria; Chloroplasts; Chloroplasts; sf_5    1 0.995 0.00 0.00
Bacteria; Cyanobacteria; Cyanobacteria; Chroococcales; Unclassified; sf_1    1
Bacteria; Cyanobacteria; Cyanobacteria; Chroococcidiopsis; Unclassified; sf_1    0 0.954 1.09 1.12
Bacteria; Cyanobacteria; Cyanobacteria; Leptolyngbya; Unclassified; sf_1    1
Bacteria; Cyanobacteria; Cyanobacteria; Nostocales; Unclassified; sf_1    1
Bacteria; Cyanobacteria; Cyanobacteria; Oscillatoriales; Unclassified; sf_1    1
Bacteria; Cyanobacteria; Cyanobacteria; Phormidium; Unclassified; sf_1    1
Bacteria; Cyanobacteria; Cyanobacteria; Plectonema; Unclassified; sf_1    1
Bacteria; Cyanobacteria; Cyanobacteria; Prochlorales; Unclassified; sf_1    1

Bacteria; Cyanobacteria; Cyanobacteria; Pseudanabaena; Unclassified; sf_1    1 1 0 0

Bacteria; Cyanobacteria; Cyanobacteria; Spirulina; Unclassified; sf_1    1 1 0 0

Bacteria; Cyanobacteria; Unclassified; Unclassified; Unclassified; sf_5    1 1 0 0

Bacteria; Cyanobacteria; Unclassified; Unclassified; Unclassified; sf_8    1 1 0 0

Bacteria; Cyanobacteria; Unclassified; Unclassified; Unclassified; sf_9    1 1 0 0

Bacteria; DSS1; Unclassified; Unclassified; Unclassified; sf_2    1 1 0 0

Bacteria; Deinococcus-Thermus; Unclassified; Unclassified; Unclassified; sf_1    1 1 0 0

Bacteria; Deinococcus-Thermus; Unclassified; Unclassified; Unclassified; sf_3    0 1 0.993 1.19 1.05 0 0 1

Bacteria; Firmicutes; Bacilli; Bacillales; Alicyclobacillaceae; sf_1    1 2 0.963 1.14 1.15 0 1 0

Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; sf_1    1 151 1.000 1.37 1.23 0 1 0

Bacteria; Firmicutes; Bacilli; Bacillales; Halobacillaceae; sf_1    1 6 0.997 1.15 1.07 0 1 0

Bacteria; Firmicutes; Bacilli; Bacillales; Paenibacillaceae; sf_1    1 14 0.999 1.19 1.07 0 1 0

Bacteria; Firmicutes; Bacilli; Bacillales; Sporolactobacillaceae; sf_1    1 2 0.999 1.12 1.04 0 1 0

Bacteria; Firmicutes; Bacilli; Bacillales; Staphylococcaceae; sf_1    1 6 0.999 1.30 1.06 0 1 0

Bacteria; Firmicutes; Bacilli; Bacillales; Thermoactinomycetaceae; sf_1    1 6 0.999 1.15 1.09 0 1 0

Bacteria; Firmicutes; Bacilli; Exiguobacterium; Unclassified; sf_1    0 1 0.998 0.00 0.00 0 0 1

Bacteria; Firmicutes; Bacilli; Lactobacillales; Aerococcaceae; sf_1    1 6 0.998 1.23 1.26 0 1 0

Bacteria; Firmicutes; Bacilli; Lactobacillales; Carnobacteriaceae; sf_1    1 1 0 0

Bacteria; Firmicutes; Bacilli; Lactobacillales; Enterococcaceae; sf_1    1 3 0.999 1.32 1.08 0 1 0

Bacteria; Firmicutes; Bacilli; Lactobacillales; Lactobacillaceae; sf_1    1 1 0 0

Bacteria; Firmicutes; Bacilli; Lactobacillales; Leuconostocaceae; sf_1    1 1 0 0

Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; sf_1    1 1 0 0

Bacteria; Firmicutes; Bacilli; Lactobacillales; Unclassified; sf_1    1 1 0 0

Bacteria; Firmicutes; Catabacter; Unclassified; Unclassified; sf_1    1 1 0 0

Bacteria; Firmicutes; Catabacter; Unclassified; Unclassified; sf_4    1 1 0.954 0.00 0.00 0 1 0

Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridiaceae; sf_12    1 14 0.998 1.45 1.15 0 1 0

Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacteriaceae; sf_1    1 1 0 0

Bacteria; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae; sf_5    1 2 0.990 1.12 1.12 0 1 0

Bacteria; Firmicutes; Clostridia; Clostridiales; Peptococc/Acidaminococc; sf_11    1 4 0.980 1.12 1.16 0 1 0

Bacteria; Firmicutes; Clostridia; Clostridiales; Peptostreptococcaceae; sf_5    1 1 0.976 1.21 1.04 0 1 0

Bacteria; Firmicutes; Clostridia; Clostridiales; Syntrophomonadaceae; sf_5    1 1 0 0

Bacteria; Firmicutes; Clostridia; Clostridiales; Unclassified; sf_17    1 1 0 0

Bacteria; Firmicutes; Clostridia; Unclassified; Unclassified; sf_3    1 1 0 0

Bacteria; Firmicutes; Desulfotomaculum; Unclassified; Unclassified; sf_1    1 3 0.984 1.14 1.04 0 1 0

Bacteria; Firmicutes; Mollicutes; Acholeplasmatales; Acholeplasmataceae; sf_1    1 1 0 0

Bacteria; Firmicutes; Symbiobacteria; Symbiobacterales; Unclassified; sf_1    1 1 0 0

Bacteria; Firmicutes; Unclassified; Unclassified; Unclassified; sf_8    1 1 0 0

Bacteria; Firmicutes; gut clone group; Unclassified; Unclassified; sf_1    1 1 0 0

Bacteria; Gemmatimonadetes; Unclassified; Unclassified; Unclassified; sf_5    1 1 0 0

Bacteria; Natronoanaerobium; Unclassified; Unclassified; Unclassified; sf_1    1 1 0 0

Bacteria; Nitrospira; Nitrospira; Nitrospirales; Nitrospiraceae; sf_1

Bacteria; OD1; OP11-5; Unclassified; Unclassified; sf_1

Bacteria; OP8; Unclassified; Unclassified; Unclassified; sf_3

Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Anammoxales; sf_2

Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Anammoxales; sf_4

Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Pirellulae; sf_3

Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Planctomycetaceae; sf_3

Bacteria; Proteobacteria; Alphaproteobacteria; Acetobacterales; Acetobacteraceae; sf_1    0.943 0.00 0.00

Bacteria; Proteobacteria; Alphaproteobacteria; Acetobacterales; Roseococcaceae; sf_1    0.980 1.24 1.17

Bacteria; Proteobacteria; Alphaproteobacteria; Azospirillales; Azospirillaceae; sf_1    0.947 1.12 1.10

Bacteria; Proteobacteria; Alphaproteobacteria; Azospirillales; Magnetospirillaceae; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Azospirillales; Unclassified; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Beijerinck/Rhodoplan/Methylocyst; sf_3    0.951 1.13 1.08

Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Bradyrhizobiaceae; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Hyphomicrobiaceae; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Methylobacteriaceae; sf_1    0.999 0.00 0.00

Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Unclassified; sf_1    0.982 1.15 1.11

Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Xanthobacteraceae; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Caulobacterales; Caulobacteraceae; sf_1    0.968 0.00 0.00

Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Caedibacteraceae; sf_3    0.951 0.00 0.00

Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Caedibacteraceae; sf_4

Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Caedibacteraceae; sf_5

Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Unclassified; sf_4

Bacteria; Proteobacteria; Alphaproteobacteria; Devosia; Unclassified; sf_1    0.976 1.18 1.05

Bacteria; Proteobacteria; Alphaproteobacteria; Ellin314/wr0007; Unclassified; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Ellin329/Riz1046; Unclassified; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Fulvimarina; Unclassified; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Bartonellaceae; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Beijerinck/Rhodoplan/Methylocyst; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Bradyrhizobiaceae; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Brucellaceae; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Hyphomicrobiaceae; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Phyllobacteriaceae; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Rhizobiaceae; sf_1    0.981 1.27 1.26

Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Unclassified; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Hyphomonadaceae; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae; sf_1    0.985 1.13 1.11

Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Anaplasmataceae; sf_3

Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Rickettsiaceae; sf_1

Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Unclassified; sf_2

Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1    1 9 0.994 1.23 1.10 0 1 0

Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_15    1 6 0.990 1.13 1.06 0 1 0

Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Unclassified; sf_1    0 3 0.997 1.20 1.08 0 0 1

Bacteria; Proteobacteria; Alphaproteobacteria; Unclassified; Unclassified; sf_6    1 1 0.954 0.00 0.00 0 1 0

Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Alcaligenaceae; sf_1    1 3 1.000 1.35 1.07 0 1 0

Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Burkholderiaceae; sf_1    1 12 1.000 0.00 0.00 0 1 0

Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Comamonadaceae; sf_1    1 1 0 0

Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Oxalobacteraceae; sf_1    1 2 0.996 0.00 0.00 0 1 0

Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Ralstoniaceae; sf_1    1 1 0 0

Bacteria; Proteobacteria; Betaproteobacteria; MND1 clone group; Unclassified; sf_1    1 1 0 0

Bacteria; Proteobacteria; Betaproteobacteria; Methylophilales; Methylophilaceae; sf_1    1 1 0 0

Bacteria; Proteobacteria; Betaproteobacteria; Neisseriales; Unclassified; sf_1    1 1 0 0

Bacteria; Proteobacteria; Betaproteobacteria; Nitrosomonadales; Nitrosomonadaceae; sf_1    1 1 0 0

Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales; Rhodocyclaceae; sf_1

Bacteria; Proteobacteria; Betaproteobacteria; Unclassified; Unclassified; sf_3

Bacteria; Proteobacteria; Deltaproteobacteria; AMD clone group; Unclassified; sf_1

Bacteria; Proteobacteria; Deltaproteobacteria; Bdellovibrionales; Unclassified; sf_1

Bacteria; Proteobacteria; Deltaproteobacteria; Desulfobacterales; Desulfobulbaceae; sf_1

Bacteria; Proteobacteria; Deltaproteobacteria; Desulfobacterales; Nitrospinaceae; sf_2

Bacteria; Proteobacteria; Deltaproteobacteria; Desulfobacterales; Unclassified; sf_4

Bacteria; Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Desulfohalobiaceae; sf_1

Bacteria; Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Desulfovibrionaceae; sf_1

Bacteria; Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Unclassified; sf_1

Bacteria; Proteobacteria; Deltaproteobacteria; EB1021 group; Unclassified; sf_4

Bacteria; Proteobacteria; Deltaproteobacteria; Myxococcales; Myxococcaceae; sf_1    0.974 0.00 0.00

Bacteria; Proteobacteria; Deltaproteobacteria; Myxococcales; Polyangiaceae; sf_3

Bacteria; Proteobacteria; Deltaproteobacteria; Myxococcales; Unclassified; sf_1

Bacteria; Proteobacteria; Deltaproteobacteria; Syntrophobacterales; Syntrophobacteraceae; sf_1

Bacteria; Proteobacteria; Deltaproteobacteria; Unclassified; Unclassified; sf_9

Bacteria; Proteobacteria; Deltaproteobacteria; dechlorinating clone group; Unclassified; sf_1

Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales; Campylobacteraceae; sf_3

Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales; Helicobacteraceae; sf_3

Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales; Unclassified; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Aeromonadales; Aeromonadaceae; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Alteromonadales; Alteromonadaceae; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Alteromonadales; Pseudoalteromonadaceae; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Alteromonadales; Unclassified; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Chromatiales; Chromatiaceae; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Chromatiales; Ectothiorhodospiraceae; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Chromatiales; Unclassified; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Ellin307/WD2124; Unclassified; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; sf_1    0.995 1.12 1.04

Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; sf_6

Bacteria; Proteobacteria; Gammaproteobacteria; GAO cluster; Unclassified; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Legionellales; Coxiellaceae; sf_3

Bacteria; Proteobacteria; Gammaproteobacteria; Legionellales; Unclassified; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Legionellales; Unclassified; sf_3

Bacteria; Proteobacteria; Gammaproteobacteria; Methylococcales; Methylococcaceae; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Oceanospirillales; Alcanivoraceae; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Oceanospirillales; Halomonadaceae; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Oceanospirillales; Unclassified; sf_3

Bacteria; Proteobacteria; Gammaproteobacteria; Pasteurellales; Pasteurellaceae; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; sf_3    0.996 1.16 1.10

Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Pseudomonadaceae; sf_1    0.998 1.18 1.03

Bacteria; Proteobacteria; Gammaproteobacteria; SUP05; Unclassified; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Shewanella; Unclassified; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Symbionts; Unclassified; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Thiotrichales; Francisellaceae; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; Thiotrichales; Piscirickettsiaceae; sf_3

Bacteria; Proteobacteria; Gammaproteobacteria; Thiotrichales; Thiotrichaceae; sf_3

Bacteria; Proteobacteria; Gammaproteobacteria; Unclassified; Unclassified; sf_3

Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; Xanthomonadaceae; sf_3    0.997 0.00 0.00

Bacteria; Proteobacteria; Gammaproteobacteria; aquatic clone group; Unclassified; sf_1

Bacteria; Proteobacteria; Gammaproteobacteria; uranium waste clones; Unclassified; sf_1

Bacteria; Proteobacteria; Unclassified; Unclassified; Unclassified; sf_20

Bacteria; Spirochaetes; Spirochaetes; Spirochaetales; Leptospiraceae; sf_3

Bacteria; Spirochaetes; Spirochaetes; Spirochaetales; Spirochaetaceae; sf_1

Bacteria; Spirochaetes; Spirochaetes; Spirochaetales; Spirochaetaceae; sf_3

Bacteria; TM7; TM7-3; Unclassified; Unclassified; sf_1

Bacteria; TM7; Unclassified; Unclassified; Unclassified; sf_1

Bacteria; Verrucomicrobia; Unclassified; Unclassified; Unclassified; sf_4

Bacteria; Verrucomicrobia; Unclassified; Unclassified; Unclassified; sf_5

Bacteria; Verrucomicrobia; Verrucomicrobiae; Verrucomicrobiales; Unclassified; sf_3

Bacteria; Verrucomicrobia; Verrucomicrobiae; Verrucomicrobiales; Verrucomicrobia subdivision 5; sf_1

Bacteria; Verrucomicrobia; Verrucomicrobiae; Verrucomicrobiales; Verrucomicrobiaceae; sf_6

Bacteria; Verrucomicrobia; Verrucomicrobiae; Verrucomicrobiales; Verrucomicrobiaceae; sf_7

Bacteria; WS3; Unclassified; Unclassified; Unclassified; sf_1

Bacteria; marine group A; mgA-1; Unclassified; Unclassified; sf_1

Bacteria; marine group A; mgA-2; Unclassified; Unclassified; sf_1

Totals 238 67 178 60

Column headings: Array sub-families; Clone sub-families; Array only sub-families; Array and clone sub-families; Clone only sub-families

[0079] Table S3. Confirmation of array sub-family detections by taxon-specific PCR and sequencing. Tm = Melting temperature; Ta = Optimal annealing temperature used in PCR reaction.

Columns: GenBank accession number of retrieved sequence; Sub-family (sf) verified; Closest BLAST homolog, GenBank accession number (% identity); SEQ ID NO.; Primer Sequences (5' to 3'); Tm; Ta

DQ236248; Actinobacteria, Actinosynnemataceae, sf_1; Actinokineospora diospyrosa, AF114797 (94.3%)
For - ACCAAGGCTACGACGGGTA; Tm 60.5; Ta 67.0
Rev - ACACACCGCATGTCAAACC; Tm 60.4

DQ515230; Actinobacteria, Bifidobacteriaceae, sf_1; Bifidobacterium adolescentis, AF275881 (99.6%)
For - GGGTGGTAATGCCSGATG; Tm 60.0; Ta 62.0
Rev - CCRCCGTTACACCGGGAA; Tm 64.0

DQ236245; Actinobacteria, Kineosporiaceae, sf_1; Actinomycetaceae SR 11, X87617 (97.7%)
For - CAATGGACTCAAGCCTGATG; Tm 53.5; Ta 53.0
SEQ ID NO. 10; Rev - CTCTAGCCTGCCCGTTTC; Tm 53.9

DQ236250; Chloroflexi, Anaerolineae, sf_9; penguin droppings clone KD4-96, AY218649 (90%)
SEQ ID NO. 11; For - GAGAGGATGATCAGCCAG; Tm 54.0; Ta 61.7
SEQ ID NO. 12; Rev - TACGGYTACCTTGTTACGACTT; Tm 57.0

DQ236247; Cyanobacteria, Geitlerinema, sf_1; Geitlerinema sp. PCC 7105, AB039010 (89.3%)
SEQ ID NO. 13; For - TCCGTAGGTGGCTGTTCAAGTCTG; Tm 62.2; Ta 55.0
SEQ ID NO. 14; Rev - GCTTTCGTCCCTCAGTGTCAGTTG; Tm 61.7

DQ236246; Cyanobacteria, Thermosynechococcus, sf_1; Thermosynechococcus elongatus BP-1, BA000039 (96.0%)
SEQ ID NO. 15; For - TGTCGTGAGATGTTGGGTTAAGTC; Tm 58.7; Ta 55.0
SEQ ID NO. 16; Rev - TGAGCCGTGGTTTAAGAGATTAGC; Tm 58.8

DQ129654; Gammaproteobacteria, Pseudoalteromonadaceae, sf_1; Pseudoalteromonas sp. S511-1, AB029824 (99.1%)
SEQ ID NO. 17; For - GCCTCACGCCATAAGATTAG; Tm 53.1; Ta 50.0
SEQ ID NO. 18; Rev - GTGCTTTCTTCTGTAAGTAACG; Tm 53.0

DQ129656; Nitrospira, Nitrospiraceae, sf_1; Nitrospira moscoviensis, X82558 (98.5%)
SEQ ID NO. 19; For - TCGAAAAGCGTGGGG; Tm 57.6; Ta 47.0
SEQ ID NO. 20; Rev - CTTCCTCCCCCGTTC; Tm 54.4

DQ129666; Planctomycetes, Planctomycetaceae, sf_3; Planctomyces brasiliensis, AJ231190 (94%)
SEQ ID NO. 21; For - GAAACTGCCCAGACAC; Tm 50.0; Ta 60.0
SEQ ID NO. 22; Rev - AGTAACGTTCGCACAG; Tm 48.0

DQ515231; Proteobacteria, Campylobacteraceae, sf_3; Uncultured Arcobacter sp. clone DS017, DQ234101 (98%)
SEQ ID NO. 23; For - GGATGACACTTTTCGGAG; Tm 54.0; Ta 48.0
SEQ ID NO. 24; Rev - AATTCCATCTGCCTCTCC; Tm 55.0

DQ129662; Spirochaetes, Leptospiraceae, sf_3; Leptospira borgpetersenii, X17547 (90.9%)
SEQ ID NO. 25; For - GGCGGCGCGTTTTAAGC; Tm 57.0; Ta 58.7

DQ129661; Spirochaetes, Spirochaetaceae, sf_1; Spirochaeta asiatica, X93926 (90.0%)
SEQ ID NO. 26; Rev - ACTCGGGTGGTGTGACG; Tm 57.0

DQ129660; Spirochaetes, Spirochaetaceae, sf_3; Borrelia hermsii, M72398 (91.0%)

DQ236249; TM7, TM7-3, sf_1; oral clone EW096, AY349415 (88.8%)
SEQ ID NO. 27; For - AYTGGGCGTAAAGAGTTGC; Tm 58.0; Ta 66.3
SEQ ID NO. 28; Rev - TACGGYTACCTTGTTACGACTT; Tm 57.0
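Several primers in Table S3 contain degenerate bases (for example S, R and Y). As an illustration only, the following sketch expands a degenerate primer into the concrete sequences it represents and computes a reverse complement, using the standard IUPAC nucleotide codes. The function names `expand_primer` and `reverse_complement` are hypothetical helpers introduced here; they are not part of the described embodiments.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code (e.g. R = A/G pairs with Y = T/C).
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def expand_primer(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


def reverse_complement(primer):
    """Return the reverse complement, keeping degenerate positions degenerate."""
    return "".join(COMPLEMENT[base] for base in reversed(primer.upper()))


if __name__ == "__main__":
    # Forward primer listed for the Bifidobacteriaceae sub-family in Table S3.
    for seq in expand_primer("GGGTGGTAATGCCSGATG"):
        print(seq)
    # Reverse primer listed for the same sub-family.
    print(reverse_complement("CCRCCGTTACACCGGGAA"))
```

A primer with k two-fold degenerate positions expands to 2^k sequences, so the primers listed in Table S3, which each carry at most one such position, expand to at most two sequences each.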

[0080] Table S4. Bacteria and Archaea used for Latin square hybridization assays.

Organism; Phylum/Sub-phylum; ATCC

Arthrobacter oxydans; Actinobacteria; 14359 (a)

Bacillus anthracis AMES pXO1- pXO2-; Firmicutes; (b)

Caulobacter crescentus CB15; Alpha-proteobacteria; 19089

Dechloromonas agitata CKB; Beta-proteobacteria; 700666 (c)

Dehalococcoides ethenogenes 195; Chloroflexi; - (d)

Desulfovibrio vulgaris Hildenborough; Delta-proteobacteria; 29579 (e)

Francisella tularensis; Gamma-proteobacteria; 6223

Geobacter metallireducens GS-15; Delta-proteobacteria; 53774 (c)

Geothrix fermentans H-5; Acidobacteria; 700665 (c)

Sulfolobus solfataricus; Crenarchaeota; 35092

(a) Strain obtained from Hoi-Ying Holman, LBNL. (b) Strain obtained from Arthur Friedlander, USAMRIID. (c) Strain obtained from John Coates, UC Berkeley. (d) Strain obtained from Lisa Alvarez-Cohen, UC Berkeley. (e) Strain obtained from Terry Hazen, LBNL.

[0081] Table S5. Correlations between environmental/temporal parameters. Underlined font indicates a significant positive correlation, while italic font indicates a significant negative correlation at a 95% confidence interval.

[0082] Table S6. Sub-families detected in Austin or San Antonio correlating significantly with environmental parameters. (a) P-value is adjusted for multiple comparisons using a false discovery rate controlling procedure (S18). All of the taxa below are in the domain Bacteria.

Columns: Phylum; Class; Order; Family; Sub-family; taxon; representative organism name; Environ. factor; Correl. Coeff.; P value; BH adjusted P value (a)

Actinobacteria; Actinobacteria; Actinomycetales; Unclassified; sf_3; 1114; clone PENDANT-38; max TEMP; 0.64; 4.05E-05; 2.49E-02

Actinobacteria; Actinobacteria; Actinomycetales; Unclassified; sf_3; 1114; clone PENDANT-38; mean TEMP; 0.66; 2.16E-05; 2.01E-02

Actinobacteria; Actinobacteria; Actinomycetales; Unclassified; sf_3; 1114; clone PENDANT-38; week; 0.63; 6.73E-05; 3.18E-02

Actinobacteria; Actinobacteria; Actinomycetales; Gordoniaceae; sf_1; 1116; Gordona terrae; week; 0.61; 1.18E-04; 3.68E-02

Actinobacteria; Actinobacteria; Actinomycetales; Actinosynnemataceae; sf_1; 1125; Actinokineospora diospyrosa str. NRRL B-24047T; max TEMP; 0.6; 1.53E-04; 4.30E-02

Actinobacteria; Actinobacteria; Actinomycetales; Actinosynnemataceae; sf_1; 1125; Actinokineospora diospyrosa str. NRRL B-24047T; week; 0.63; 7.42E-05; 3.38E-02

Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_1; 1198; Streptomyces sp. str. YIM 80305; week; 0.7; 3.75E-06; 1.18E-02

Actinobacteria; Actinobacteria; Actinomycetales; Sporichthyaceae; sf_1; 1223; Sporichthya polymorpha; mean TEMP; 0.61; 1.42E-04; 4.21E-02

Actinobacteria; Actinobacteria; Actinomycetales; Sporichthyaceae; sf_1; 1223; Sporichthya polymorpha; min MINTEMP; 0.61; 1.50E-04; 4.27E-02

Actinobacteria; Actinobacteria; Actinomycetales; Sporichthyaceae; sf_1; 1223; Sporichthya polymorpha; week; 0.7; 4.39E-06; 1.18E-02

Actinobacteria; Actinobacteria; Actinomycetales; Microbacteriaceae; sf_1; [illegible]; Waste-gas biofilter clone BIhi33; mean TEMP; 0.61; 1.47E-04; 4.25E-02

Actinobacteria; Actinobacteria; Actinomycetales; Microbacteriaceae; sf_1; [illegible]; Waste-gas biofilter clone BIhi33; week; -0.69; 7.62E-06; 1.18E-02

Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_1; 1344; Streptomyces species; max TEMP; 0.64; 5.42E-05; 2.84E-02

Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_1; 1344; Streptomyces species; mean TEMP; 0.62; 9.56E-05; 3.63E-02

Actinobacteria; Actinobacteria; Actinomycetales; Thermomonosporaceae; sf_1; 1406; Actinomadura kijaniata; week; 0.65; 2.91E-05; 2.29E-02

Actinobacteria; Actinobacteria; Actinomycetales; Kineosporiaceae; sf_1; 1424; Actinomycetaceae SR 139; max VISIB; 0.6; 1.70E-04; 4.59E-02

Actinobacteria; Actinobacteria; Actinomycetales; Kineosporiaceae; sf_1; 1424; Actinomycetaceae SR 139; week; 0.62; 8.03E-05; 3.50E-02

Actinobacteria; Actinobacteria; Actinomycetales; Intrasporangiaceae; sf_1; 1445; Ornithinimicrobium humiphilum str. DSM 12362 HKI 124; week; 0.62; 9.46E-05; 3.63E-02

Actinobacteria; Actinobacteria; Actinomycetales; Unclassified; sf_3; [illegible]; uncultured human oral bacterium A11; week; 0.69; 7.08E-06; 1.18E-02

Actinobacteria; Actinobacteria; Actinomycetales; Pseudonocardiaceae; sf_1; 1530; Pseudonocardia thermophila str. IMSNU 20112T; max TEMP; 0.64; 5.10E-05; 2.79E-02

Actinobacteria; Actinobacteria; Actinomycetales; Pseudonocardiaceae; sf_1; 1530; Pseudonocardia thermophila str. IMSNU 20112T; mean TEMP; 0.66; 1.99E-05; 1.97E-02

Actinobacteria; Actinobacteria; Actinomycetales; Pseudonocardiaceae; sf_1; 1530; Pseudonocardia thermophila str. IMSNU 20112T; min MINTEMP; 0.61; 1.10E-04; 3.63E-02

Actinobacteria; Actinobacteria; Actinomycetales; Pseudonocardiaceae; sf_1; 1530; Pseudonocardia thermophila str. IMSNU 20112T; min TEMP; 0.6; 1.82E-04; 4.73E-02

Actinobacteria; Actinobacteria; Actinomycetales; Pseudonocardiaceae; sf_1; 1530; Pseudonocardia thermophila str. IMSNU 20112T; week; 0.73; 1.15E-06; 5.92E-03

Actinobacteria; Actinobacteria; Actinomycetales; Cellulomonadaceae; sf_1; [illegible]; Lake Bogoria isolate 69B4; week; 0.61; 1.15E-04; 3.63E-02

Actinobacteria; Actinobacteria; Actinomycetales; Corynebacteriaceae; sf_1; 1642; Corynebacterium otitidis; max TEMP; 0.62; 8.87E-05; 3.63E-02

Actinobacteria; Actinobacteria; Actinomycetales; Corynebacteriaceae; sf_1; 1642; Corynebacterium otitidis; mean TEMP; 0.64; 4.12E-05; 2.49E-02

Actinobacteria; Actinobacteria; Actinomycetales; Corynebacteriaceae; sf_1; 1642; Corynebacterium otitidis; min MINTEMP; 0.62; 1.07E-04; 3.63E-02

Actinobacteria; Actinobacteria; Actinomycetales; Corynebacteriaceae; sf_1; 1642; Corynebacterium otitidis; week; 0.63; 5.53E-05; 2.84E-02

Actinobacteria; Actinobacteria; Actinomycetales; Dermabacteraceae; sf_1; 1736; Brachybacterium rhamnosum LMG 19848T; max TEMP; 0.63; 6.17E-05; 3.09E-02

Actinobacteria; Actinobacteria; Actinomycetales; Dermabacteraceae; sf_1; 1736; Brachybacterium rhamnosum LMG 19848; mean TEMP; 0.6; 1.91E-04; 4.90E-02

Actinobacteria; Actinobacteria; Actinomycetales; Dermabacteraceae; sf_1; 1736; Brachybacterium rhamnosum LMG 19848; week; 0.64; 4.47E-05; 2.62E-02

Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_3; 1743; Streptomyces scabiei str. DNK-G01; week; 0.6; 1.60E-04; 4.38E-02

Actinobacteria; Actinobacteria; Actinomycetales; Nocardiaceae; sf_1; 1746; Nocardia corynebacteroides; week; 0.66; 2.48E-05; 2.21E-02

Actinobacteria; Actinobacteria; Actinomycetales; Unclassified; sf_3; 1806; French Polynesia: Tahiti clone 23; max TEMP; 0.65; 3.37E-05; 2.29E-02

Actinobacteria; Actinobacteria; Actinomycetales; Unclassified; sf_3; 1806; French Polynesia: Tahiti clone 23; mean TEMP; 0.66; 1.97E-05; 1.97E-02

Actinobacteria; Actinobacteria; Actinomycetales; Micromonosporaceae; sf_1; 1821; Catellatospora subsp. citrea str. IMSNU 22008T; max TEMP; 0.61; 1.10E-04; 3.63E-02

Actinobacteria; Actinobacteria; Actinomycetales; Micromonosporaceae; sf_1; 1821; Catellatospora subsp. citrea str. IMSNU 22008T; mean MINTEMP; 0.61; 1.22E-04; 3.72E-02

Actinobacteria; Actinobacteria; Actinomycetales; Micromonosporaceae; sf_1; 1821; Catellatospora subsp. citrea str. IMSNU 22008T; mean TEMP; 0.67; 1.76E-05; 1.97E-02

Actinobacteria; Actinobacteria; Actinomycetales; Micromonosporaceae; sf_1; 1821; Catellatospora subsp. citrea str. IMSNU 22008T; min MINTEMP; 0.7; 4.92E-06; 1.18E-02

Actinobacteria; Actinobacteria; Actinomycetales; Micromonosporaceae; sf_1; 1821; Catellatospora subsp. citrea str. IMSNU 22008T; min TEMP; 0.65; 2.68E-05; 2.29E-02

Actinobacteria; Actinobacteria; Actinomycetales; Micromonosporaceae; sf_1; 1821; Catellatospora subsp. citrea str. IMSNU 22008T; week; 0.62; 8.24E-05; 3.52E-02

Actinobacteria; Actinobacteria; Rubrobacterales; Rubrobacteraceae; sf_1; 1892; Sturt arid-zone soil clone 0319-7H2; min MINTEMP; 0.62; 8.51E-05; 3.56E-02

Actinobacteria; Actinobacteria; Rubrobacterales; Rubrobacteraceae; sf_1; 1892; Sturt arid-zone soil clone 0319-7H2; week; 0.68; 9.19E-06; 1.18E-02

Actinobacteria; Actinobacteria; Actinomycetales; Actinosynnemataceae; sf_1; 1984; Saccharothrix tangerinus str. MK27-91F2; max TEMP; 0.68; 8.01E-06; 1.18E-02

Actinobacteria; Actinobacteria; Actinomycetales; Actinosynnemataceae; sf_1; 1984; Saccharothrix tangerinus str. MK27-91F2; mean TEMP; 0.67; 1.64E-05; 1.97E-02

Actinobacteria; Actinobacteria; Actinomycetales; Actinosynnemataceae; sf_1; 1984; Saccharothrix tangerinus str. MK27-91F2; week; 0.7; 3.54E-06; 1.18E-02

Actinobacteria; Actinobacteria; Actinomycetales; Nocardiaceae; sf_1; 1999; Rhodococcus fascians str. DFA7; max TEMP; 0.61; 1.14E-04; 3.63E-02

Actinobacteria; Actinobacteria; Actinomycetales; Propionibacteriaceae; sf_1; 2023; Propionibacterium propionicum str. DSM 43307T; week; 0.62; 9.19E-05; 3.63E-02

Actinobacteria; Actinobacteria; Actinomycetales; Streptosporangiaceae; sf_1; 2037; Nonomuraea terrinata str. DSM 44505; week; 0.61; 1.13E-04; 3.63E-02

Firmicutes; Bacilli; Bacillales; Thermoactinomycetaceae; sf_1; 3619; Thermoactinomyces intermedius str. ATCC 33205T; range MINTEMP; 0.65; 3.41E-05; 2.29E-02

Cyanobacteria; Cyanobacteria; Symploca; Unclassified; sf_1; 5165; Symploca atlantica str. PCC 8002; week; 0.63; 6.84E-05; 3.18E-02

Bacteroidetes; Sphingobacteria; Sphingobacteriales; Crenotrichaceae; sf_11; 5491; Austria: Lake Gossenkoellesee clone GKS2-106; mean TEMP; 0.61; 1.15E-04; 3.63E-02

Bacteroidetes; Sphingobacteria; Sphingobacteriales; Crenotrichaceae; sf_11; 5491; Austria: Lake Gossenkoellesee clone GKS2-106; week; 0.63; 6.62E-05; 3.18E-02

Bacteroidetes; Sphingobacteria; Sphingobacteriales; Flexibacteraceae; sf_19; 5866; Taxeobacter ocellatus str. Myx2105; week; 0.62; 1.08E-04; 3.63E-02

Bacteroidetes; Bacteroidetes; Bacteroidales; Prevotellaceae; sf_1; 6047; deep marine sediment clone MB-A2-107; week; 0.62; 9.52E-05; 3.63E-02

Bacteroidetes; Sphingobacteria; Sphingobacteriales; Crenotrichaceae; sf_11; [illegible]; Bifissio spartinae str. AS1.1762; max PM2.5; -0.62; 9.95E-05; 3.63E-02

Bacteroidetes; Sphingobacteria; Sphingobacteriales; Crenotrichaceae; sf_11; [illegible]; Bifissio spartinae str. AS1.1762; max VISIB; 0.62; 1.09E-04; 3.63E-02

Bacteroidetes; Sphingobacteria; Sphingobacteriales; Crenotrichaceae; sf_11; [illegible]; Bifissio spartinae str. [illegible]; range [illegible]; [illegible]; 2.86E-05; 2.29E-02

Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1; [illegible]; PCB-polluted soil clone WD267; mean TEMP; 0.63; 7.73E-05; 3.45E-02

Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1; [illegible]; PCB-polluted soil clone WD267; week; 0.69; 5.54E-06; 1.18E-02

Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1; 7132; Sphingomonas sp. K101; min SLP; 0.64; 5.16E-05; 2.79E-02

Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1; 7132; Sphingomonas sp. K101; week; 0.75; 2.74E-07; 2.81E-03

Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Unclassified; sf_1; [illegible]; Pleomorphomonas oryzae str. B-32; max TEMP; 0.65; 3.57E-05; 2.29E-02

Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Unclassified; sf_1; [illegible]; Pleomorphomonas oryzae str. B-32; mean TEMP; 0.64; 4.62E-05; 2.63E-02

Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1; 7344; rhizosphere soil RS 1-21; week; 0.68; 8.96E-06; 1.18E-02

Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1; 7411; Sphingomonas adhaesiva; min SLP; 0.66; 2.01E-05; 1.97E-02

Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1; 7411; Sphingomonas adhaesiva; week; 0.74; 6.42E-07; 4.39E-03

Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae; sf_1; 7527; clone CTD56B; mean TEMP; 0.61; 1.44E-04; 4.23E-02

Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1; [intervening fields illegible]; 4.38E-02

Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Methylobacteriaceae; sf_1; [intervening fields illegible]; 2.29E-02

Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Methylobacteriaceae; sf_1; [intervening fields illegible]; 3.63E-02

Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Methylobacteriaceae; sf_1; [illegible]; [illegible] organophilum; [remaining fields illegible]; 1.18E-02

Proteobacteria; Alphaproteobacteria; Devosia; Unclassified; sf_1; 7626; Devosia neptuniae str. J1; week; 0.6; 1.80E-04; 4.73E-02

Proteobacteria; Betaproteobacteria; Burkholderiales; Comamonadaceae; sf_1; 7786; unidentified alpha proteobacterium; mean TEMP; 0.65; 3.45E-05; 2.29E-02

Proteobacteria; Betaproteobacteria; Burkholderiales; Burkholderiaceae; sf_1; 7899; Burkholderia andropogonis; week; 0.65; 3.43E-05; 2.29E-02

Proteobacteria; Gammaproteobacteria; Unclassified; Unclassified; sf_3; 8759; Agricultural soil SC-I-87; max [illegible]; 0.6; 1.74E-04; 4.63E-02

Proteobacteria; Gammaproteobacteria; Pseudomonadales; Pseudomonadaceae; sf_1; 9389; Pseudomonas oleovorans; min SLP; 0.68; 8.57E-06; 1.18E-02

Proteobacteria; Gammaproteobacteria; Pseudomonadales; Pseudomonadaceae; sf_1; 9389; Pseudomonas oleovorans; week; 0.83; 1.03E-09; 2.11E-05
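The last column of Table S6 reports P values adjusted for multiple comparisons with a false discovery rate controlling procedure, labeled "BH" in the column heading. As an illustration only, the following is a minimal sketch of the standard Benjamini-Hochberg step-up adjustment, assuming that is the procedure meant. The function name `bh_adjust` and the sample P values are hypothetical. The total number of tests performed is not stated here, so the sketch cannot reproduce the adjusted values in the table.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    For m tests, the P value with ascending rank r is scaled to p * m / r,
    and the results are made monotone by taking a running minimum from the
    largest P value down to the smallest.
    """
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value (rank m) to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw P values
    for p, q in zip(raw, bh_adjust(raw)):
        print(f"raw P = {p:.3g}  BH-adjusted P = {q:.3g}")
```

Under this procedure a correlation would be reported as significant at a 5% false discovery rate when its adjusted value is at most 0.05, which is consistent with every adjusted value listed in Table S6 being below 0.05.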

[0083] Table S7. Bacterial sub-families detected (92% or greater of probes in probe set positive) most frequently over the 17-week study. Italic text indicates sub-families not found in all 17 weeks. AU = Austin, SA = San Antonio.

Most frequently detected 16S rRNA gene sequences AU SA

Bacteria;Acidobacteria;Acidobacteria;Acidobacteriales;Acidobacteriaceae;sf_14 17 17

Bacteria;Acidobacteria;Acidobacteria-6;Unclassified;Unclassified;sf_1 16 17

Bacteria;Acidobacteria;Solibacteres;Unclassified;Unclassified;sf_1 17 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Cellulomonadaceae;sf_1 17 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Corynebacteriaceae;sf_1 16 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Gordoniaceae;sf_1 17 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Kineosporiaceae;sf_1 17 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Microbacteriaceae;sf_1 16 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Micrococcaceae;sf_1 17 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Micromonosporaceae;sf_1 17 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Mycobacteriaceae;sf_1 17 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Nocardiaceae;sf_1 17 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Promicromonosporaceae;sf_1 17 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Pseudonocardiaceae;sf_1 16 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Streptomycetaceae;sf_1 17 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Thermomonosporaceae;sf_1 16 17

Bacteria;Actinobacteria;Actinobacteria;Actinomycetales;Unclassified;sf_3 17 17

Bacteria;Actinobacteria;Actinobacteria;Rubrobacterales;Rubrobacteraceae;sf_1 16 17

Bacteria;Actinobacteria;Actinobacteria;Unclassified;Unclassified;sf_1 16 17

Bacteria;Actinobacteria;BD2-10 group;Unclassified;Unclassified;sf_2 17 16

Bacteria;Bacteroidetes;Sphingobacteria;Sphingobacteriales;Unclassified;sf_3 16 17

Bacteria;Chloroflexi;Anaerolineae;Chloroflexi-1a;Unclassified;sf_1 16 17

Bacteria;Chloroflexi;Anaerolineae;Unclassified;Unclassified;sf_9 16 17

Bacteria;Chloroflexi;Dehalococcoidetes;Unclassified;Unclassified;sf_1 16 17

Bacteria;Cyanobacteria;Cyanobacteria;Chloroplasts;Chloroplasts;sf_5 17 17

Bacteria;Cyanobacteria;Cyanobacteria;Plectonema;Unclassified;sf_1 16 17

Bacteria;Cyanobacteria;Unclassified;Unclassified;Unclassified;sf_5 16 17

Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;sf_1 17 17

Bacteria;Firmicutes;Bacilli;Bacillales;Halobacillaceae;sf_1 17 17

Bacteria;Firmicutes;Bacilli;Bacillales;Paenibacillaceae;sf_1 16 17

Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;sf_1 17 17

Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;sf_1 16 17

Bacteria;Firmicutes;Catabacter;Unclassified;Unclassified;sf_1 16 17

Bacteria;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;sf_12 17 17

Bacteria;Firmicutes;Clostridia;Clostridiales;Lachnospiraceae;sf_5 17 17

Bacteria;Firmicutes;Clostridia;Clostridiales;Peptococc/Acidaminococc;sf_11 17 17

Bacteria;Firmicutes;Clostridia;Clostridiales;Peptostreptococcaceae;sf_5 17 17

Bacteria;Firmicutes;Clostridia;Clostridiales;Unclassified;sf_17 16 17

Bacteria;Firmicutes;Unclassified;Unclassified;Unclassified;sf_8 16 17

Bacteria;Nitrospira;Nitrospira;Nitrospirales;Nitrospiraceae;sf_1 17 16

Bacteria;OP3;Unclassified;Unclassified;Unclassified;sf_4 16 17

Bacteria;Proteobacteria;Alphaproteobacteria;Acetobacterales;Acetobacteraceae;sf_1 17 16

Bacteria;Proteobacteria;Alphaproteobacteria;Azospirillales;Unclassified;sf_1 16 17

Bacteria;Proteobacteria;Alphaproteobacteria;Bradyrhizobiales;Beijerinck/Rhodoplan/Methylocyst;sf_3 17 17

Bacteria;Proteobacteria;Alphaproteobacteria;Bradyrhizobiales;Bradyrhizobiaceae;sf_1 17 17

Bacteria;Proteobacteria;Alphaproteobacteria;Bradyrhizobiales;Hyphomicrobiaceae;sf_1 17 17

Bacteria;Proteobacteria;Alphaproteobacteria;Bradyrhizobiales;Methylobacteriaceae;sf_1 16 17

Bacteria;Proteobacteria;Alphaproteobacteria;Ellin314/wr0007;Unclassified;sf_1 16 17

Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Bradyrhizobiaceae;sf_1 16 17

Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Phyllobacteriaceae;sf_1 17 17

Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Unclassified;sf_1 16 17

Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;sf_1 17 17

Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;Unclassified;sf_1 17 17

Bacteria;Proteobacteria;Alphaproteobacteria;Sphingomonadales;Sphingomonadaceae;sf_1 17 17

Bacteria;Proteobacteria;Alphaproteobacteria;Sphingomonadales;Sphingomonadaceae;sf_15 17 17

Bacteria;Proteobacteria;Alphaproteobacteria;Unclassified;Unclassified;sf_6 17 17

Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Alcaligenaceae;sf_1 16 17

Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Burkholderiaceae;sf_1 16 17

Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;sf_1 16 17

Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Oxalobacteraceae;sf_1 17 17

Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Ralstoniaceae;sf_1 16 17

Bacteria;Proteobacteria;Betaproteobacteria;Methylophilales;Methylophilaceae;sf_1 16 17

Bacteria;Proteobacteria;Betaproteobacteria;Rhodocyclales;Rhodocyclaceae;sf_1 16 17

Bacteria;Proteobacteria;Betaproteobacteria;Unclassified;Unclassified;sf_3 17 17

Bacteria;Proteobacteria;Deltaproteobacteria;Syntrophobacterales;Syntrophobacteraceae;sf_1 16 17

Bacteria;Proteobacteria;Epsilonproteobacteria;Campylobacterales;Campylobacteraceae;sf_3 17 17

Bacteria;Proteobacteria;Epsilonproteobacteria;Campylobacterales;Helicobacteraceae;sf_3 17 17

Bacteria;Proteobacteria;Epsilonproteobacteria;Campylobacterales;Unclassified;sf_1 17 17

Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Alteromonadaceae;sf_1 16 17

Bacteria;Proteobacteria;Gammaproteobacteria;Chromatiales;Chromatiaceae;sf_1 16 17

Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;sf_1 16 17

Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;sf_6 17 17

Bacteria;Proteobacteria;Gammaproteobacteria;Legionellales;Unclassified;sf_1 17 17

Bacteria;Proteobacteria;Gammaproteobacteria;Legionellales;Unclassified;sf_3 16 17

Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Moraxellaceae;sf_3 16 17

Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;sf_1 16 17

Bacteria;Proteobacteria;Gammaproteobacteria;Unclassified;Unclassified;sf_3 17 17

Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales;Xanthomonadaceae;sf_3 17 17

Bacteria;TM7;TM7-3;Unclassified;Unclassified;sf_1 16 17

Bacteria;Unclassified;Unclassified;Unclassified;Unclassified;sf_148 16 17

Bacteria;Unclassified;Unclassified;Unclassified;Unclassified;sf_160 17 17

Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Verrucomicrobiaceae;sf_7 17 17

Number of sub-families detected in all samples over 17 week period 43 80
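Rows of the form used in the preceding table can be tabulated programmatically. The following is a minimal sketch in Python, not part of the described embodiments. It assumes that each row is a semicolon-delimited lineage ending in a sub-family label, followed by two integer counts, and that a count of 17 means detection in every week of the monitoring period. The names `Row`, `parse_row` and `count_full` are hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple


class Row(NamedTuple):
    ranks: Tuple[str, ...]  # lineage, domain first, sub-family label last
    first: int              # first count column (assumed: weeks detected)
    second: int             # second count column (assumed: weeks detected)


def parse_row(line: str) -> Row:
    """Split the two trailing integers from the semicolon-delimited lineage."""
    lineage, first, second = line.rsplit(None, 2)
    ranks = tuple(part.strip() for part in lineage.split(";"))
    return Row(ranks, int(first), int(second))


def count_full(rows: Iterable[Row], total_weeks: int = 17) -> Tuple[int, int]:
    """Count, per column, the rows whose value equals total_weeks."""
    rows = list(rows)
    return (
        sum(r.first == total_weeks for r in rows),
        sum(r.second == total_weeks for r in rows),
    )


rows = [
    parse_row(s)
    for s in (
        "Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;sf_1 17 17",
        "Bacteria;Actinobacteria;BD2-10 group;Unclassified;Unclassified;sf_2 17 16",
        "Bacteria;OP3;Unclassified;Unclassified;Unclassified;sf_4 16 17",
    )
]
print(count_full(rows))  # (2, 2)
```

Splitting from the right keeps lineage names that contain spaces, such as "BD2-10 group", intact.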

[0084] Table S8. Bacterial sub-families containing pathogens of public health and bioterrorism significance and their relatives that were detected in aerosols over the 17 week monitoring period.

Pathogens and relatives | taxon # | Austin: weeks detected | Austin: % of weeks | San Antonio: weeks detected | San Antonio: % of weeks

Bacillus anthracis

Bacillus cohnii, B. psychrosaccharolyticus, B. benzoevorans | 3439 | 17 | 100.0 | 17 | 100.0

Bacillus megaterium | 3550 | 11 | 64.7 | 12 | 70.6

Bacillus horikoshii | 3904 | 9 | 52.9 | 14 | 82.4

Bacillus litoralis, B. macroides, B. psychrosaccharolyticus | 3337 | 5 | 29.4 | 8 | 47.1

Staphylococcus saprophyticus, S. xylosus, S. cohnii | 3659 | 7 | 41.2 | 15 | 88.2

Bacillus anthracis, cereus, thuringiensis, mycoides + others | 3262 | 0 | 0.0 | 1 | 5.9

Rickettsia prowazekii - rickettsii

Rickettsia australis, R. eschlimannii, R. typhi, R. tarasevichiae + others | 7556 | 2 | 11.8 | 5 | 29.4

Rickettsia prowazekii | 7114 | 0 | 0.0 | 0 | 0.0

Rickettsia rickettsii, R. japonica, R. honei + others | 6809 | 4 | 23.5 | 10 | 58.8

Burkholderia mallei - pseudomallei

Burkholderia pseudomallei, B. thailandensis | 7870 | 10 | 58.8 | 14 | 82.4

Burkholderia mallei | 7747 | 10 | 58.8 | 8 | 47.1

Burkholderia pseudomallei, Burkholderia cepacia, B. tropica, B. gladioli, B. stabilis, B. plantarii + others | 8097 | 13 | 76.5 | 15 | 88.2

Clostridium botulinum - perfringens

Clostridium butyricum, C. baratii, C. sardiniense + others | 4598 | 3 | 17.6 | 10 | 58.8

Clostridium botulinum type C | 4587 | 2 | 11.8 | 4 | 23.5

Clostridium perfringens | 4576 | 1 | 5.9 | 1 | 5.9

Clostridium botulinum type G | 4575 | 3 | 17.6 | 7 | 41.2

Clostridium botulinum types B and E | 4353 | 0 | 0.0 | 0 | 0.0

Francisella tularensis

Tilapia parasite | 9554 | 1 | 5.9 | 2 | 11.8

Francisella tularensis | 9180 | 0 | 0.0 | 0 | 0.0
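The "% of weeks" values in Table S8 are consistent with the number of weeks in which a taxon was detected divided by the 17 weeks of monitoring, rounded to one decimal place. The short Python sketch below illustrates that arithmetic for taxon 3904 (9 weeks in Austin, 14 weeks in San Antonio). The function name `percent_of_weeks` is hypothetical, and the rounding rule is an assumption inferred from the tabulated values.

```python
TOTAL_WEEKS = 17  # length of the monitoring period stated in the Table S8 caption


def percent_of_weeks(weeks_detected: int, total_weeks: int = TOTAL_WEEKS) -> float:
    """Return the percentage of monitored weeks in which a taxon was detected."""
    if not 0 <= weeks_detected <= total_weeks:
        raise ValueError("weeks_detected must lie between 0 and total_weeks")
    return round(100.0 * weeks_detected / total_weeks, 1)


print(percent_of_weeks(9))   # 52.9 (taxon 3904, Austin)
print(percent_of_weeks(14))  # 82.4 (taxon 3904, San Antonio)
```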

[0085] Table S9. Distribution of array taxa among Bacterial and Archaeal phyla.

Columns: Phyla; Numbers of taxa in phylum represented on array

Archaea

Crenarchaeota 79

Euryarchaeota 224

Korarchaeota 3

YNPFFA 1

Archaeal taxa subtotal 307

Bacteria

1959 group 1

Acidobacteria 98

Actinobacteria 810

AD3 1

Aquificae 19

Bacteroidetes 880

BRC1 3

Caldithrix 2

Chlamydiae 27

Chlorobi 21

Chloroflexi 117

Chrysiogenetes 1

Coprothermobacteria 3

Cyanobacteria 202

Deferribacteres 5

Deinococcus-Thermus 18

Dictyoglomi 5

DSS1 2

EM3 2

Fibrobacteres 4

Firmicutes 2012

Fusobacteria 29

Gemmatimonadetes 15

LD1PA group 1

Lentisphaerae 8

marine group A 5

Natronoanaerobium 7

NC10 4

Nitrospira 29

NKB19 2

OD1 4

OD2 6

OP1 5

OP10 12

OP11 20

OP3 5

OP5 3

OP8 8

OP9/JS1 12

OS-K 2

OS-L 1

Planctomycetes 182

Proteobacteria 3170

SPAM 2

Spirochaetes 150

SR1 4

Synergistes 19

Termite group 1 6

Thermodesulfobacteria 4

Thermotogae 15

TM6 5

TM7 45

Unclassified 329

Verrucomicrobia 78

WS1 2

WS3 7

WS5 1

WS6 4

Bacterial taxa subtotal 8434

Total taxa 8741
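The subtotals in Table S9 can be cross-checked with simple arithmetic. The Python sketch below, offered only as an illustration, sums the four archaeal phyla, confirms that the archaeal and bacterial subtotals add up to the total, and computes the share of array taxa for two entries. The helper name `share` is hypothetical.

```python
# Counts copied from Table S9.
archaea = {"Crenarchaeota": 79, "Euryarchaeota": 224, "Korarchaeota": 3, "YNPFFA": 1}
ARCHAEAL_SUBTOTAL = 307
BACTERIAL_SUBTOTAL = 8434
TOTAL_TAXA = 8741


def share(count: int, total: int = TOTAL_TAXA) -> float:
    """Percentage of all array taxa represented by count, to one decimal place."""
    return round(100.0 * count / total, 1)


# Consistency checks on the tabulated subtotals.
assert sum(archaea.values()) == ARCHAEAL_SUBTOTAL
assert ARCHAEAL_SUBTOTAL + BACTERIAL_SUBTOTAL == TOTAL_TAXA

print(share(ARCHAEAL_SUBTOTAL))  # 3.5
print(share(3170))               # 36.3 (Proteobacteria)
```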

Figure S1. Rank-abundance curve of phylotypes within the urban aerosol clone library obtained from San Antonio calendar week 29. Phylotypes were determined by clustering at 99% homology using nearest neighbor joining.

[Figure S1 axis label: phylotype rank]
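A rank-abundance curve of the kind shown in Figure S1 can be derived from per-clone phylotype assignments. The Python sketch below is illustrative only: it assumes the clones have already been assigned to phylotypes (the 99% homology clustering step is not shown), and the phylotype labels and the function name `rank_abundance` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_abundance(assignments: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (rank, clone count) pairs, with rank 1 the most abundant phylotype."""
    counts = Counter(assignments)
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))


# Hypothetical phylotype label for each sequenced clone.
clones = ["A", "A", "A", "B", "B", "C", "D", "A", "B", "E"]
print(rank_abundance(clones))  # [(1, 4), (2, 3), (3, 1), (4, 1), (5, 1)]
```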

Figure S2. Chao1 and ACE richness estimators are non-asymptotic, indicating an underestimation of predicted richness based on numbers of clones sequenced.

[Figure S2 x-axis: number of clones sampled, 100 to 400]
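For reference, the classic Chao1 estimator is S_obs + F1^2 / (2 * F2), where S_obs is the number of observed phylotypes, F1 the number of phylotypes seen exactly once and F2 the number seen exactly twice. The Python sketch below is a minimal illustration under stated assumptions: the text does not say which Chao1 variant was used, the bias-corrected form is applied here only when F2 is zero, the ACE estimator is not shown, and the clone counts are hypothetical.

```python
from typing import Iterable


def chao1(abundances: Iterable[int]) -> float:
    """Estimate richness from per-phylotype clone counts using Chao1."""
    counts = [n for n in abundances if n > 0]
    s_obs = len(counts)                         # observed phylotypes
    f1 = sum(1 for n in counts if n == 1)       # singletons
    f2 = sum(1 for n in counts if n == 2)       # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)     # classic form
    # Bias-corrected form F1 * (F1 - 1) / (2 * (F2 + 1)) with F2 = 0.
    return s_obs + f1 * (f1 - 1) / 2.0


print(chao1([5, 3, 2, 2, 1, 1, 1]))  # 9.25
print(chao1([4, 1, 1]))              # 4.0
```

An estimate that keeps rising as more clones are included, as described for Figure S2, suggests that the sampled library has not yet captured the full richness.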

Figure S3. Latin square assessment of 16S rRNA gene sequence quantitation by microarray.

[Figure S3 x-axis: copies of 16S molecules in hybridization, 10^7 to 10^12]

Figure S4. Comparison of real-time PCR and array monitoring of Pseudomonas oleovorans density in aerosol samples from San Antonio. Corrected Array Hybridization Score is the ln(intensity) normalized by internal spikes as described under Normalization.

[Figure S4 axis: Corrected Array Hybridization Score, 5.0 to 6.6]

EQUIVALENTS

[0057] The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the present embodiments. The foregoing description and Examples detail certain preferred embodiments and describe the best mode contemplated by the inventors. It will be appreciated, however, that no matter how detailed the foregoing may appear in text, the present embodiments may be practiced in many ways and the present embodiments should be construed in accordance with the appended claims and any equivalents thereof.

[0058] The term "comprising" is intended herein to be open-ended, including not only the recited elements, but further encompassing any additional elements.