Title:
METHODS FOR QUANTIFYING MICROORGANISMS AND CELLS AND USES THEREOF
Document Type and Number:
WIPO Patent Application WO/2024/010465
Kind Code:
A1
Abstract:
The present invention broadly relates to methods for determining the presence of and/or quantifying microorganisms present in a sample, for example in a sample obtained from a subject for diagnostic purposes. In particular, the invention relates to methods for quantifying and in certain examples identifying one or more microorganisms present in a sample by nucleic acid sequencing methods, such as shotgun sequencing. Methods for determining the presence of and/or quantifying prokaryotic and/or eukaryotic cells present in a sample are also provided.

Inventors:
WALLACE ANDREW (NZ)
HARLAND CHAD (NZ)
COULDREY CHRISTINE (NZ)
Application Number:
PCT/NZ2023/050066
Publication Date:
January 11, 2024
Filing Date:
July 04, 2023
Assignee:
LIVESTOCK IMPROVEMENT CORPORATION LTD (NZ)
International Classes:
C12Q1/06; C12Q1/6869; C12Q1/689
Other References:
PIWOSZ KASIA; SHABAROVA TANJA; TOMASCH JüRGEN; ŠIMEK KAREL; KOPEJTKA KAREL; KAHL SILKE; PIEPER DIETMAR H.; KOBLí: "Determining lineage-specific bacterial growth curves with a novel approach based on amplicon reads normalization using internal standard (ARNIS)", THE ISME JOURNAL, NATURE PUBLISHING GROUP UK, LONDON, vol. 12, no. 11, 6 July 2018 (2018-07-06), London, pages 2640 - 2654, XP036856810, ISSN: 1751-7362, DOI: 10.1038/s41396-018-0213-y
JI BRIAN W.; SHETH RAVI U.; DIXIT PURUSHOTTAM D.; HUANG YIMING; KAUFMAN ANDREW; WANG HARRIS H.; VITKUP DENNIS: "Quantifying spatiotemporal variability and noise in absolute microbiota abundances using replicate sampling", NATURE METHODS, NATURE PUBLISHING GROUP US, NEW YORK, vol. 16, no. 8, 15 July 2019 (2019-07-15), New York, pages 731 - 736, XP036847256, ISSN: 1548-7091, DOI: 10.1038/s41592-019-0467-y
ZARAMELA LIVIA S., TJUANTA MEGAN, MOYNE ORIANE, NEAL MAXWELL, ZENGLER KARSTEN: "synDNA—a Synthetic DNA Spike-in Method for Absolute Quantification of Shotgun Metagenomic Sequencing", MSYSTEMS, HIGHWIRE PRESS (FREE ACCESS), vol. 7, no. 6, 20 December 2022 (2022-12-20), XP093128670, ISSN: 2379-5077, DOI: 10.1128/msystems.00447-22
OLAUSSON JOSEFIN, BRUNET SOFIA, VRACAR DIANA, TIAN YARONG, ABRAHAMSSON SANNA, MEGHADRI SRI HARSHA, SIKORA PER, LIND KARLBERG MARIA: "Optimization of cerebrospinal fluid microbial DNA metagenomic sequencing diagnostics", SCIENTIFIC REPORTS, NATURE PUBLISHING GROUP, US, vol. 12, no. 1, 1 January 2022 (2022-01-01), US , pages 3378, XP093128677, ISSN: 2045-2322, DOI: 10.1038/s41598-022-07260-x
STEPHEN NAYFACH;KATHERINE S POLLARD: "Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome", GENOME BIOLOGY, BIOMED CENTRAL LTD., vol. 16, no. 1, 25 March 2015 (2015-03-25), pages 51, XP021215488, ISSN: 1465-6906, DOI: 10.1186/s13059-015-0611-7
FRANK JEREMY A., SØRENSEN SØREN J.: "Quantitative Metagenomic Analyses Based on Average Genome Size Normalization", APPLIED AND ENVIRONMENTAL MICROBIOLOGY, AMERICAN SOCIETY FOR MICROBIOLOGY, US, vol. 77, no. 7, 11 February 2011 (2011-02-11), US , pages 2513 - 2521, XP093128678, ISSN: 0099-2240, DOI: 10.1128/AEM.02167-10
HOQUE M. NAZMUL, ISTIAQ ARIF, CLEMENT REBECCA A., SULTANA MUNAWAR, CRANDALL KEITH A., SIDDIKI AMAM ZONAED, HOSSAIN M. ANWAR: "Metagenomic deep sequencing reveals association of microbiome signature with functional biases in bovine mastitis", SCIENTIFIC REPORTS, NATURE PUBLISHING GROUP, US, vol. 9, no. 1, US , pages 13536, XP093128683, ISSN: 2045-2322, DOI: 10.1038/s41598-019-49468-4
TONYA L WARD;SERGEY HOSID;ILYA IOSHIKHES;ILLIMAR ALTOSAAR: "Human milk metagenome: a functional capacity analysis", BMC MICROBIOLOGY, BIOMED CENTRAL LTD., GB, vol. 13, no. 1, 25 May 2013 (2013-05-25), GB , pages 116, XP021151998, ISSN: 1471-2180, DOI: 10.1186/1471-2180-13-116
MTSHALI KHETHIWE, KHUMALO ZAMANTUNGWA THOBEKA HAPPINESS, KWENDA STANFORD, ARSHAD ISMAIL, THEKISOE ORIEL MATLAHANE MOLIFI: "Exploration and comparison of bacterial communities present in bovine faeces, milk and blood using 16S rRNA metagenomic sequencing", PLOS ONE, PUBLIC LIBRARY OF SCIENCE, US, vol. 17, no. 8, 31 August 2022 (2022-08-31), US , pages e0273799, XP093128686, ISSN: 1932-6203, DOI: 10.1371/journal.pone.0273799
WILSON MICHAEL R., SAMPLE HANNAH A., ZORN KELSEY C., AREVALO SHAUN, YU GUIXIA, NEUHAUS JOHN, FEDERMAN SCOT, STRYKE DOUG, BRIGGS BE: "Clinical Metagenomic Sequencing for Diagnosis of Meningitis and Encephalitis", THE NEW ENGLAND JOURNAL OF MEDICINE, MASSACHUSETTS MEDICAL SOCIETY, US, vol. 380, no. 24, 13 June 2019 (2019-06-13), US , pages 2327 - 2340, XP093128690, ISSN: 0028-4793, DOI: 10.1056/NEJMoa1803396
ANONYMOUS: "A framework for human microbiome research", NATURE, vol. 486, no. 7402, 1 June 2012 (2012-06-01), pages 215 - 221, XP093128692, ISSN: 0028-0836, DOI: 10.1038/nature11209
Attorney, Agent or Firm:
CATALYST INTELLECTUAL PROPERTY LIMITED (NZ)
Claims:
CLAIMS

A method of determining the number and/or concentration of a plurality of microorganisms present in a biological sample, the method comprising:

(a) providing a sample comprising or suspected of comprising one or more microorganisms, wherein the sample comprises one or more populations of mammalian cells;

(b) sequencing nucleic acid from the sample to provide genomic sequence information from the one or more microorganisms;

(c) determining or providing data indicative of the number and/or concentration of mammalian cells present in the sample;

(d) determining from the sequencing the relative amounts of microbial and mammalian nucleic acid present in the sample; and

(e) determining one or more of

(i) the number or concentration of one or more microorganisms present in the sample from the amount of microbial nucleic acid present in the sample; and/or

(ii) the number of one or more microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the number of mammalian cells present in the sample; and/or

(iii) the number of one or more microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the concentration of mammalian cells present in the sample; and/or

(iv) the concentration of one or more microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the number of mammalian cells present in the sample; and/or

(v) the concentration of one or more microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the concentration of mammalian cells present in the sample.

A method of determining the amount and/or concentration of a plurality of microorganisms present in a sample, the method comprising:

(a) providing a sample comprising or suspected of comprising a plurality of microorganisms;

(b) contacting the sample with a known amount of reference nucleic acid and/or reference cells;

(c) sequencing nucleic acid from the sample to provide genomic sequence information from the plurality of microorganisms and sequence information from the known amount of reference nucleic acid and/or reference cells;

(d) determining using the genomic sequence information from the plurality of microorganisms the amount of at least one microorganism's genomic nucleic acid present in the sample; and

(e) determining the amount and/or concentration of the at least one microorganism present in the sample from the amount of the at least one microorganism's genomic nucleic acid present in the sample relative to the amount of reference nucleic acid and/or reference cells in the sample.

The method of claim 2 comprising:

(a) providing a sample comprising or suspected of comprising a plurality of microorganisms;

(b) contacting the sample with a known amount of reference nucleic acid and/or reference cells;

(c) sequencing nucleic acid from the sample to provide genomic sequence information from the plurality of microorganisms and sequence information from the known amount of reference nucleic acid and/or reference cells, such as reference microbial cells;

(d) identifying from the genomic sequence information the number of whole genomes of a plurality of microorganisms or microorganism strains;

(e) determining the frequency of the or each microorganism strain present in the sample by associating sequencing reads to the identified whole genomes;

(f) determining an absolute cell count for the or each microorganism strain in the sample from the total quantity of the or each microorganism strain and the frequency for the or each microorganism strain; wherein the determination is indicative of the number of the or each microorganism strain present in the sample relative to the known amount of reference nucleic acid and/or reference cells, such as reference microbial cells, present in the sample and/or the concentration and/or abundance of microorganisms present in the sample.

The method of any one of claims 1 to 3 wherein the genomic sequence information comprises sequence information from genes or genomes other than or in addition to sequence information from one or more genes encoding ribosomal RNA or a ribosomal protein.

The method of any one of claims 2 to 4, wherein the method comprises:

(a) providing a biological sample from a mammalian subject, wherein the sample comprises a plurality of microorganisms;

(b) contacting the sample with a known amount of reference nucleic acid and/or reference cells;

(c) sequencing nucleic acid from the sample to provide genomic sequence information from the plurality of microorganisms and sequence information from the known amount of reference nucleic acid and/or reference cells;

(d) determining using the genomic sequence information from the plurality of microorganisms the amount of each microorganism's genomic nucleic acid present in the sample; and

(e) determining the amount and/or concentration of one or more of the microorganisms present in the sample from the amount of each microorganism's genomic nucleic acid present in the sample relative to the amount of reference nucleic acid and/or reference cells in the sample.

The method of claim 1, the method comprising:

(a) providing a biological sample from a mammalian subject, wherein the sample comprises:

(i) one or more mammalian cells from the subject, and

(ii) one or more microorganisms;

(b) determining or providing data indicative of the number of mammalian cells comprising at least one of the populations of mammalian cells present in the sample;

(c) sequencing nucleic acid from the sample;

(d) determining from the sequencing the relative amounts of microbial and mammalian nucleic acid present in the sample; and

(e) determining the number of microbial cells present in the sample from the relative amounts of microbial and mammalian nucleic acid and the number of mammalian cells present in the sample.

The method of any one of claims 1 to 6, wherein the number of microbial cells of two or more strains or species of microorganisms is determined.

The method of any one of claims 1 to 7, wherein the method additionally comprises determining from the sequencing the presence and/or number of whole genomes of two or more of the one or more microorganisms.

The method according to any preceding claim, wherein the method comprises determining from the sequencing the identity of one or more of the microorganisms present.

The method according to any preceding claim, wherein the method additionally comprises determining from the sequencing the identity of each of the strains of microorganisms present.

The method according to any preceding claim, wherein the identity of the one or more microorganisms determined to be present is used to determine the size of the one or more microbial genomes present.

The method according to any preceding claim, wherein the identity of the one or more microorganisms determined to be present is used to establish a representative microbial genome size.

The method according to any one of claims 1 to 11, wherein when two or more strains, species, or populations of microorganisms are determined to be present in the sample, the number of microbial cells is determined using an average of the size of the genomes of the two or more strains, species, or populations of microorganisms determined to be present.

The method according to any preceding claim, wherein the number of microbial cells present in the sample is determined using the average size of the genomes of the microorganisms determined, predicted to be, or capable of being present in the sample.

The method according to any preceding claim, wherein the number of microbial cells present in the sample is determined using the average size of the genomes of substantially all of the microorganisms determined to be present in the sample.

The method according to any preceding claim, wherein the data indicative of the number of mammalian cells present in the sample comprises data derived from analysis of the sample or of an equivalent sample from the mammalian subject.

The method according to any preceding claim, additionally comprising determining the concentration of mammalian cells, such as the concentration of at least one population of mammalian cells, present in the sample.

The method according to any preceding claim, wherein when present, the data indicative of the number of mammalian cells, such as the number of mammalian cells comprising at least one of the populations of mammalian cells, present in the sample comprises data indicative of the concentration of mammalian cells present in the sample.

The method according to any preceding claim, wherein the number of mammalian cells present in the sample is determined by direct cell counting.

The method according to any preceding claim, wherein the number of mammalian cells present in the sample is determined by direct microscopy, by flow cytometry, by Coulter counting, by spectroscopy, by California Mastitis Test, or by a quantitative nucleic acid amplification method.

The method according to any preceding claim, wherein the sample comprises, consists essentially of, or consists of milk.

The method according to any preceding claim, wherein one of the populations of mammalian cells present in the sample are somatic cells.

The method according to any preceding claim, wherein the number of microorganisms present in the sample relative to the number of mammalian cells present in the sample is indicative of the concentration of the one or more microorganisms in the sample.

The method according to any preceding claim, wherein when the sequencing identifies one or more markers unique to and/or associated with a strain or species of microorganism, the presence of the one or more unique markers is indicative of the presence of the microorganism strain or species in the sample.

The method according to claim 24, wherein the absolute cell count for a microorganism strain or species is determined from the frequency of the one or more markers unique to and/or associated with the microorganism strain or species.

The method according to any preceding claim, wherein the sequencing comprises subjecting genomic DNA in an unpurified sample to sequencing.

The method according to any preceding claim, wherein the sequencing is high throughput sequencing.

A method of determining the identity and/or abundance of a plurality of microorganisms present in a biological sample, the method comprising:

(a) providing a biological sample from a mammalian subject, wherein the sample comprises:

(i) one or more populations of mammalian cells from the subject, and

(ii) one or more microorganisms;

(b) determining or providing data indicative of the number of mammalian cells comprising at least one of the populations of mammalian cells present in the sample;

(c) sequencing DNA from the sample to identify one or more whole genomes of the one or more microorganisms;

(d) determining the frequency of one or more genomes within the sample by mapping sequencing reads to the identified whole genomes;

(e) determining the abundance of one or more microorganisms in the sample from the frequency of whole genomes of the microorganisms present; and

(f) optionally determining the total number of microorganisms present in the sample by determining the number of different microorganism genomes in the sample and the relative frequency of each genome within the sample to provide a total number of microorganisms; wherein the determination is indicative of the abundance of microorganisms, such as the number of each of the strains, species, or populations of microorganisms, present in the sample relative to the number of mammalian cells present in the sample.

The method of claim 2 wherein the method comprises determining the total number of microorganisms present in the sample by determining the number of different microorganism genomes in the sample and the relative frequency of each genome within the sample to provide a total number of microorganisms.

The method of claim 2 wherein the sample is an environmental sample.

Description:
METHODS FOR QUANTIFYING MICROORGANISMS AND CELLS AND USES THEREOF

TECHNICAL FIELD

The present invention broadly relates to methods for determining the presence of one or more microorganisms, and more particularly methods for quantifying one or more microorganisms present in a sample, for example present in a sample from a subject for diagnostic purposes, or an environmental sample for assessment. The invention also relates to methods for determining the presence of one or more cells (whether prokaryotic or eukaryotic) in a sample, and more particularly methods for quantifying one or more cells present in a sample.

BACKGROUND OF THE INVENTION

The following includes information that may be useful in understanding the present inventions. It is not an admission that any of the information provided herein is prior art, or relevant, to the presently described or claimed inventions, or that any publication or document that is specifically or implicitly referenced is prior art. Any discussion of the prior art throughout the specification should in no way be considered as an admission that such prior art is widely known or forms part of the common general knowledge in the field.

There are many circumstances where determining the presence of a particular microorganism is of importance. For example, determining the presence of a particular microorganism will frequently be of interest in clinical settings, particularly when that microorganism is associated with or causative of a clinically relevant or commercially important disease or condition, such as a communicable disease relevant to public health, or when a microorganism is associated with or causative of one or more beneficial outcomes. In many circumstances, determining that a given microorganism is present is sufficient. Environmental sampling, such as sampling waste water and sewage processing, or soil or airborne sampling, is similarly conducted in numerous settings to determine the presence of microorganisms of interest.

However, in certain circumstances it is beneficial to know not only the identity of the particular microorganisms present, but also the amount, such as the absolute number, of the microorganisms present. For example, establishing that one or more microorganisms are present (such as in a biological sample taken from a subject) in an amount above a clinically relevant threshold can inform treatment and/or management decisions for that subject. Similarly, establishing that one or more microorganisms are present but in an amount that is below a clinically relevant threshold will likewise be of use in determining what actions, if any, should be taken. In other examples, determining both the presence and amount of one or more microorganisms in other types of samples, for example in an environmental sample, enables appropriate decision making to occur, such as implementing or continuing monitoring, determining appropriate mitigation approaches, communicating results to stakeholders, and the like.

Nucleic acid sequencing and related methods, including nucleic acid amplification methods, are routinely used to determine that a particular nucleic acid, and therefore a particular organism from which the nucleic acid is derived, is or has been present. However, sequence data does not in and of itself provide sufficient information to quantify absolute abundances of the organism(s) from which it is derived. There exists an ongoing challenge in quantifying the amount, such as the absolute abundance, of a microorganism present using nucleic acid sequencing techniques.
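This limitation can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that sequencing reads are drawn in proportion to each organism's share of the total DNA (cells multiplied by genome size). The function name, organism names, genome sizes, and cell numbers are hypothetical and are not taken from this specification.

```python
def expected_reads(cells, genome_sizes_bp, total_reads):
    """Expected reads per organism, assuming reads are drawn in proportion
    to each organism's share of total DNA (cells x genome size).
    This proportionality is an illustrative assumption."""
    dna = {name: cells[name] * genome_sizes_bp[name] for name in cells}
    total_dna = sum(dna.values())
    return {name: total_reads * dna[name] / total_dna for name in dna}


# Hypothetical genome sizes (base pairs) and cell numbers.
genome_sizes_bp = {"organism_x": 2_000_000, "organism_y": 5_000_000}
low_load = {"organism_x": 1_000, "organism_y": 4_000}
# Same composition, but a thousand times more cells of each organism.
high_load = {name: n * 1_000 for name, n in low_load.items()}

reads_low = expected_reads(low_load, genome_sizes_bp, total_reads=1_000_000)
reads_high = expected_reads(high_load, genome_sizes_bp, total_reads=1_000_000)

for name in reads_low:
    print(name, round(reads_low[name]), round(reads_high[name]))
```

Under these assumptions both samples give the same expected read counts even though their microbial loads differ a thousandfold, so an external anchor (for example a known number of cells in the sample) is needed to convert read proportions into absolute numbers.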

The present invention aims to go at least some way to overcoming this challenge, for example by providing a method of determining the number and/or concentration of one or more of a plurality of microorganisms, for example the absolute abundance of one or more microorganisms present in a sample such as a biological sample, or to at least provide the public with a useful choice. Other aims and advantages of different aspects of the present invention will become apparent from the following disclosure.

SUMMARY OF THE INVENTION

This invention relates to methods for quantifying the amount, such as the absolute abundance, of one or more microorganisms present in a sample, such as in a biological sample, via nucleic acid sequencing or nucleic acid sequence information. In one example the invention relates to the quantification of one or more of the microorganisms present in a biological sample from a mammalian subject. In another example the invention relates to the quantification of one or more of the microorganisms present in a sample, such as an environmental, industrial, or forensic sample.

This gives rise to numerous, and separate, aspects of the invention.

Accordingly, in a first aspect the invention relates to a method of determining the number and/or concentration of a plurality of microorganisms present in a biological sample, the method comprising:

(a) providing a biological sample from a mammalian subject, wherein the sample comprises:

(i) one or more populations of mammalian cells from the subject, and

(ii) one or more microorganisms;

(b) determining or providing data indicative of the number and/or concentration of mammalian cells present in the sample;

(c) sequencing nucleic acid from the sample;

(d) determining from the sequencing the relative amounts of microbial and mammalian nucleic acid present in the sample; and

(e) determining one or more of

(i) the number or concentration of microbial cells present in the sample from the amount of microbial nucleic acid present in the sample; and/or

(ii) the number of microbial cells present in the sample from the relative amounts of microbial and mammalian nucleic acid and the number of mammalian cells present in the sample; and/or

(iii) the number of microbial cells present in the sample from the relative amounts of microbial and mammalian nucleic acid and the concentration of mammalian cells present in the sample; and/or

(iv) the concentration of microbial cells present in the sample from the relative amounts of microbial and mammalian nucleic acid and the number of mammalian cells present in the sample; and/or

(v) the concentration of microbial cells present in the sample from the relative amounts of microbial and mammalian nucleic acid and the concentration of mammalian cells present in the sample.
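By way of non-limiting illustration, step (e)(ii) might be sketched in Python as follows. The sketch assumes that each microbial cell contributes one genome copy, that `mammalian_genome_bp` represents the DNA content of one mammalian cell, and that microbial and mammalian nucleic acid are recovered and sequenced with equal efficiency. These assumptions, the function and parameter names, and the example values are hypothetical and are not taken from this specification.

```python
def microbial_cells_from_host(microbial_bases, mammalian_bases,
                              microbial_genome_bp, mammalian_genome_bp,
                              mammalian_cells):
    """Estimate the number of microbial cells from the relative amounts of
    microbial and mammalian nucleic acid and a known mammalian cell number.

    Assumes one genome copy per microbial cell and equal recovery of
    microbial and mammalian nucleic acid (illustrative assumptions).
    """
    if mammalian_bases <= 0:
        raise ValueError("mammalian_bases must be positive")
    if microbial_genome_bp <= 0 or mammalian_genome_bp <= 0:
        raise ValueError("genome sizes must be positive")
    # Convert sequenced bases to genome-copy equivalents.
    microbial_copies = microbial_bases / microbial_genome_bp
    mammalian_copies = mammalian_bases / mammalian_genome_bp
    # Scale the copy ratio by the independently measured mammalian cells.
    return mammalian_cells * microbial_copies / mammalian_copies


# Hypothetical example values.
estimate = microbial_cells_from_host(
    microbial_bases=300_000,
    mammalian_bases=600_000_000,
    microbial_genome_bp=3_000_000,
    mammalian_genome_bp=6_000_000_000,
    mammalian_cells=200_000,
)
print(estimate)
```

Supplying a concentration of mammalian cells (for example cells per millilitre) in place of a number returns a concentration in the same units, which corresponds to step (e)(v).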

In one example, the sample comprises two or more populations of mammalian cells.

In one example, the sample comprises two or more strains or species of microorganism.

In one example, the method comprises:

(a) providing a biological sample from a mammalian subject, wherein the sample comprises:

(i) one or more mammalian cells from the subject, and

(ii) one or more microorganisms;

(b) determining or providing data indicative of the number of mammalian cells comprising at least one of the populations of mammalian cells present in the sample;

(c) sequencing nucleic acid from the sample;

(d) determining from the sequencing the relative amounts of microbial and mammalian nucleic acid present in the sample; and

(e) determining the number of one or more microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the number of mammalian cells present in the sample.

In another aspect the invention relates to a method of determining the amount and/or concentration of a plurality of microorganisms present in a biological sample, the method comprising:

(a) providing a biological sample from a mammalian subject, wherein the sample comprises:

(i) one or more populations of mammalian cells from the subject, and

(ii) one or more microorganisms;

(b) determining or providing data indicative of the number and/or concentration of mammalian cells present in the sample;

(c) sequencing nucleic acid from the sample;

(d) determining from the sequencing the relative amounts of microbial and mammalian nucleic acid present in the sample; and

(e) determining one or more of

(i) the amount and/or concentration of microorganisms present in the sample from the amount of microbial nucleic acid present in the sample; and/or

(ii) the amount and/or concentration of microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the number of mammalian cells present in the sample; and/or

(iii) the amount and/or concentration of microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the concentration of mammalian cells present in the sample.

In one example, the method comprises:

(a) providing a biological sample from a mammalian subject, wherein the sample comprises:

(i) one or more mammalian cells from the subject, and

(ii) one or more microorganisms;

(b) determining or providing data indicative of the number of mammalian cells comprising at least one of the populations of mammalian cells present in the sample;

(c) sequencing nucleic acid from the sample;

(d) determining from the sequencing the relative amounts of microbial and mammalian nucleic acid present in the sample; and

(e) determining the amount and/or concentration of microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the number of mammalian cells present in the sample.

Accordingly, in another aspect the invention relates to a method of determining the number and/or concentration of a plurality of microorganisms present in a sample, the method comprising:

(a) providing a sample comprising or suspected of comprising one or more microorganisms, wherein the sample comprises one or more populations of mammalian cells;

(b) sequencing nucleic acid from the sample;

(c) determining or providing data indicative of the number and/or concentration of mammalian cells present in the sample;

(d) determining from the sequencing the relative amounts of microbial and mammalian nucleic acid present in the sample; and

(e) determining one or more of

(i) the number or concentration of one or more microorganisms present in the sample from the amount of microbial nucleic acid present in the sample; and/or

(ii) the number of one or more microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the number of mammalian cells present in the sample; and/or

(iii) the number of one or more microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the concentration of mammalian cells present in the sample; and/or

(iv) the concentration of one or more microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the number of mammalian cells present in the sample; and/or

(v) the concentration of one or more microorganisms present in the sample from the relative amounts of microbial and mammalian nucleic acid and the concentration of mammalian cells present in the sample.

In a further aspect, the invention relates to a method of determining the amount and/or concentration of a plurality of microorganisms present in a sample, the method comprising:

(a) providing a sample comprising or suspected of comprising a plurality of microorganisms;

(b) contacting the sample with a known amount of reference nucleic acid and/or reference cells, such as a known amount of microbial cells;

(c) sequencing nucleic acid from the sample to provide genomic sequence information from the plurality of microorganisms and sequence information from the known amount of reference nucleic acid and/or reference cells, such as the known amount of reference microbial cells;

(d) determining using the genomic sequence information from the plurality of microorganisms the amount of at least one microorganism's genomic nucleic acid present in the sample; and

(e) determining the amount and/or concentration of one or more of the microorganisms present in the sample from the amount of at least one microorganism's genomic nucleic acid present in the sample relative to the amount of reference nucleic acid and/or reference cells in the sample.
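Steps (d) and (e) above might be sketched, under stated assumptions, as the following Python functions. The sketch assumes that the reference is added as a known number of cells, that each cell (reference or target) contributes one genome copy, and that reference and target nucleic acid are recovered and sequenced with equal efficiency. The function names, parameter names, and values are hypothetical and are not taken from this specification.

```python
def cells_from_reference(target_bases, reference_bases,
                         target_genome_bp, reference_genome_bp,
                         reference_cells_added):
    """Estimate the number of cells of one microorganism relative to a
    known number of reference cells added to the sample.

    Assumes one genome copy per cell and equal recovery of target and
    reference nucleic acid (illustrative assumptions).
    """
    if reference_bases <= 0:
        raise ValueError("no reference sequence detected")
    # Genome-copy equivalents for target and reference.
    target_copies = target_bases / target_genome_bp
    reference_copies = reference_bases / reference_genome_bp
    return reference_cells_added * target_copies / reference_copies


def concentration(cells, sample_volume_ml):
    """Convert a cell number into cells per millilitre of sample."""
    if sample_volume_ml <= 0:
        raise ValueError("sample_volume_ml must be positive")
    return cells / sample_volume_ml


# Hypothetical example values.
cells = cells_from_reference(
    target_bases=4_000_000,
    reference_bases=5_000_000,
    target_genome_bp=2_000_000,
    reference_genome_bp=5_000_000,
    reference_cells_added=10_000,
)
print(cells, concentration(cells, sample_volume_ml=2.0))
```

Normalising by genome size before taking the ratio matters because a microorganism with a larger genome yields more sequenced bases per cell than one with a smaller genome.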

In one example, the sample is a biological sample, such as a sample obtained from an animal subject.

In one example, the sample is an environmental sample, such as an effluent sample, a waste sample, or a water sample such as a water sample from a body of water or a water flow, such as a lake, pond, stream, river, sea, or the like, a soil sample, or an atmospheric sample.

In various examples, the reference cells are microbial cells, such as one or more microbial cells likely to respond similarly to cell lysis, nucleic acid extraction or processing, or sequencing as do one or more of the plurality of microorganisms present in the sample, or expected to be present in the sample.

In one example, determining the amount and/or concentration of one or more of the microorganisms present in the sample comprises determining the amount and/or concentration of each microorganism present in the sample.

In one example, the method of determining the amount and/or concentration of microorganisms present in a biological sample comprises:

(a) providing a biological sample comprising a plurality of microorganisms;

(b) contacting the sample with a known amount of reference nucleic acid and/or reference cells, such as reference microbial cells;

(c) sequencing nucleic acid from the sample to provide genomic sequence information from the plurality of microorganisms and sequence information from the known amount of reference nucleic acid and/or reference cells, such as reference microbial cells;

(d) determining using the genomic sequence information from the plurality of microorganisms the amount of each microorganism's genomic nucleic acid present in the sample; and

(e) determining the amount and/or concentration of one or more of the microorganisms present in the sample from the amount of each microorganism's genomic nucleic acid present in the sample relative to the amount of reference nucleic acid and/or reference cells, such as reference microbial cells, in the sample.

In one example, the method comprises:

(a) providing a sample comprising or suspected of comprising a plurality of microorganisms;

(b) contacting the sample with a known amount of reference nucleic acid and/or reference cells;

(c) sequencing nucleic acid from the sample to provide genomic sequence information from the plurality of microorganisms and sequence information from the known amount of reference nucleic acid and/or reference cells;

(d) determining using the genomic sequence information from the plurality of microorganisms the amount of at least one microorganism's genomic nucleic acid present in the sample; and

(e) determining the amount and/or concentration of the at least one microorganism present in the sample from the amount of the at least one microorganism's genomic nucleic acid present in the sample relative to the amount of reference nucleic acid and/or reference cells in the sample.

In one example, the method comprises:

(a) providing a biological sample from a mammalian subject, wherein the sample comprises a plurality of microorganisms;

(b) contacting the sample with a known amount of reference nucleic acid and/or reference cells, such as reference microbial cells;

(c) sequencing nucleic acid from the sample to provide genomic sequence information from the plurality of microorganisms and sequence information from the known amount of reference nucleic acid and/or reference cells, such as reference microbial cells;

(d) determining using the genomic sequence information from the plurality of microorganisms the amount of each microorganism's genomic nucleic acid present in the sample; and

(e) determining the amount and/or concentration of one or more of the microorganisms present in the sample from the amount of each microorganism's genomic nucleic acid present in the sample relative to the amount of reference nucleic acid and/or reference cells, such as reference microbial cells, in the sample.

In one example, the known amount of reference nucleic acid is in or from an organism, such as a microorganism strain or species, or a eukaryotic organism, that is known or predicted to not be already present in the sample.

In one example, the known amount of reference microbial cells comprises or consists of cells of a microorganism strain or species that is known or predicted to not be already present in the sample.

In one example, the known amount of reference nucleic acid is provided by contacting the sample with a known amount and/or concentration of microbial cells, such as a known amount and/or concentration of microbial cells of a species that is known or predicted to not be already present in the sample.

In one example, the known amount of reference nucleic acid is provided as isolated, purified, recombinant or synthetic nucleic acid.

In one example, the reference nucleic acid is provided by contacting the sample with a known amount and/or concentration of microbial cells from one or more of the species selected from the group consisting of: Listeria monocytogenes, Pseudomonas aeruginosa, Bacillaceae, Saccharomyces cerevisiae, Salmonella enterica, Escherichia coli, Limosilactobacillus fermentum, Enterococcus faecalis, Staphylococcus aureus, and Cryptococcus neoformans.

Accordingly, in one example the method comprises:

(a) providing a sample comprising or suspected of comprising a plurality of microorganisms;

(b) contacting the sample with a known amount and/or concentration of reference microbial cells, such as a known amount and/or concentration of reference microbial cells of a species that is known or predicted to not be already present in the sample;

(c) maintaining the sample under conditions suitable for cell lysis and/or

(d) sequencing nucleic acid from the sample to provide genomic sequence information from the plurality of microorganisms and sequence information from the known amount and/or concentration of reference microbial cells;

(e) determining using the genomic sequence information from the plurality of microorganisms the amount of each microorganism's genomic nucleic acid present in the sample; and

(f) determining the amount and/or concentration of microorganisms present in the sample from the amount of each microorganism's genomic nucleic acid present in the sample relative to the amount and/or concentration of reference microbial cells in the sample.

In one example, the sample comprises two or more strains or species of microorganism.

In one example, the genomic sequence information comprises sequence information from genes or genomes, such as bacterial genes or genomes, other than or in addition to sequence information from one or more genes encoding ribosomal RNA or a ribosomal protein. In one example, the genomic sequence information comprises sequence information from genes or genomes other than or in addition to sequence information from one or more genes encoding 5S rRNA, 16S rRNA, or 23S rRNA. In one example, the genomic sequence information comprises sequence information from genes or genomes other than or in addition to sequence information from a gene encoding 16S rRNA.

In one example, the genomic sequence information comprises sequence information from fungal or yeast genes or genomes other than or in addition to sequence information from one or more genes encoding ribosomal RNA or ribosomal proteins. In one example, the genomic sequence information comprises sequence information from genes or genomes other than or in addition to sequence information from one or more genes encoding 18S rRNA, or 25S rRNA.

In one example, the sequencing to provide genomic sequence information comprises one or more of:

• sequencing of microbial genes other than or in addition to genes encoding ribosomal RNA or ribosomal proteins;

• shotgun sequencing of microbial genes other than or in addition to genes encoding ribosomal RNA or ribosomal proteins;

• whole genome sequencing of microbial genes other than or in addition to genes encoding ribosomal RNA or ribosomal proteins;

• high-throughput sequencing of microbial genes other than or in addition to genes encoding ribosomal RNA or ribosomal proteins;

• target enrichment of microbial genes other than or in addition to genes encoding ribosomal RNA or ribosomal proteins.

In various examples, the sequencing to provide genomic sequence information comprises the use of one or more of the technologies selected from the group consisting of: Oxford Nanopore Technologies Plc (Oxford, UK), Pacific Biosciences of California Inc (Menlo Park, USA), MGI Tech Co Ltd (Shenzhen, China), and Illumina Inc (San Diego, USA).

In one example, the amount and/or concentration of microorganisms of two or more strains or species is determined.

In one example, the method additionally comprises determining from the genomic sequence information the presence and/or number of whole microbial genomes.

In one example, the method comprises determining from the genomic sequence information the identity of more than one of the microorganisms present.

In one example, the method additionally comprises determining from the genomic sequence information the identity of each of the strains of microorganism present.

In one example, the identity of one or more of the microorganisms determined to be present is used to determine the size of the one or more microbial genomes present. In one example, the identity of the one or more microorganisms determined to be present is used to establish a representative microbial genome size.

In one example, when two or more strains, species, or populations of microorganisms are determined to be present in the sample, the amount and/or concentration of microorganisms is determined using an average of the size of the genomes of the two or more strains, species, or populations of microorganisms determined to be present.

In one example, the amount and/or concentration of microorganisms present in the sample is determined using the average size of the genomes of the microorganisms determined, predicted to be, or capable of being present in the sample.

In one example, the amount and/or concentration of microorganisms present in the sample is determined using the average size of the genomes of substantially all of the microorganisms determined to be present in the sample.

In one example, the amount and/or concentration of microorganisms present in the sample is determined using the size of the genomes of substantially all of the microorganisms determined to be present in the sample.
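By way of illustration only, a representative genome size of the kind referred to above can be computed as a simple mean of the genome sizes of the microorganisms determined to be present. The optional weighting by relative abundance shown below is an assumption added for illustration; the text above refers only to an average. The names and genome sizes used are hypothetical.

```python
def representative_genome_size(genome_sizes_bp, weights=None):
    """Return an (optionally weighted) mean genome size in base pairs.

    genome_sizes_bp maps a strain or species name to its genome size.
    weights, if given, maps the same names to non-negative weights
    (for example relative abundances); this option is an assumption.
    """
    if not genome_sizes_bp:
        raise ValueError("at least one genome size is required")
    if weights is None:
        # Unweighted case: every genome contributes equally.
        weights = {name: 1.0 for name in genome_sizes_bp}
    total_weight = sum(weights[name] for name in genome_sizes_bp)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(size * weights[name]
                       for name, size in genome_sizes_bp.items())
    return weighted_sum / total_weight


# Hypothetical genome sizes; real values would come from the genomes identified.
sizes = {"species_A": 2_000_000, "species_B": 4_000_000}
simple_mean = representative_genome_size(sizes)
weighted_mean = representative_genome_size(sizes, {"species_A": 3, "species_B": 1})
print(simple_mean, weighted_mean)
```

The simple mean treats every identified genome equally, whereas the weighted variant shifts the representative size towards the genomes that dominate the sample.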

In one example, the amount and/or concentration of microorganisms present in the sample is determined by determining the number of sequencing reads comprising sequence information associated with the reference nucleic acid. For example, the amount and/or concentration of microorganisms present in the sample is determined by determining the number of sequencing reads comprising sequence associated with the reference nucleic acid and determining the number of sequencing reads comprising microbial genomic sequence.

In one example, the biological sample comprises, consists essentially of, or consists of milk.

In one example, the sample comprises or is suspected of comprising one or more microorganisms associated with or causative of mastitis.

In one example, the one or more microorganisms associated with or causative of mastitis are selected from the group consisting of Staphylococcus species including Staphylococcus aureus, coagulase negative Staphylococci; Streptococcal species including Streptococcus uberis, Streptococcus dysgalactiae, Streptococcus agalactiae, and Streptococcus bovis; Mycoplasma species; and coliform bacteria such as E. coli; Enterobacter species; Klebsiella species; Nocardia species including Nocardia asteroides, Nocardia neocaledoniensis, and Nocardia cyriacigeorgica; and Citrobacter.

In one example, the sequencing comprises subjecting genomic DNA in an unpurified sample to sequencing.

Any of the examples described herein can relate to any of the aspects presented herein.

In various examples, the sequencing of nucleic acid from the sample comprises lysing cells present in the sample.

In one example, the number of microbial cells of two or more strains or species of microorganisms is determined. In one example, the method additionally comprises determining from the sequencing the presence and/or number of whole genomes of the one or more microorganisms.

In one example, the method comprises determining from the sequencing the identity of one or more of the microorganisms present.

In one example, the method additionally comprises determining from the sequencing the identity of each of the strains of microorganisms present.

In one example, the identity of the one or more microorganisms determined to be present is used to determine the size of the one or more microbial genomes present.

In one example, the identity of the one or more microorganisms determined to be present is used to establish a representative microbial genome size.

In one example, when two or more strains, species, or populations of microorganisms are determined to be present in the sample, the number of microbial cells is determined using an average of the size of the genomes of the two or more strains, species, or populations of microorganisms determined to be present.

In one example, the number of microbial cells present in the sample is determined using the average size of the genomes of the microorganisms determined, predicted to be, or capable of being present in the sample.

In one example, the number of microbial cells present in the sample is determined using the average size of the genomes of substantially all of the microorganisms determined to be present in the sample.

In one example, the data indicative of the number of mammalian cells present in the sample comprises data derived from analysis of the sample or of an equivalent sample from the mammalian subject.

In one example, the step of determining or providing data indicative of the number and/or concentration of mammalian cells present in the sample comprises determining the concentration of mammalian cells, such as the concentration of at least one population of mammalian cells, present in the sample.

In one example, the data indicative of the number of mammalian cells, such as the number of mammalian cells comprising at least one of the populations of mammalian cells, present in the sample comprises data indicative of the concentration of mammalian cells present in the sample.

In one example, the number of mammalian cells present in the sample is determined by direct cell counting.

In one example, the number of mammalian cells present in the sample is determined by direct microscopy, by flow cytometry, by Coulter counting, by spectroscopy, by California Mastitis Test, or by a quantitative nucleic acid amplification method.

In one example, the biological sample comprises, consists essentially of, or consists of milk. In one example, one of the populations of mammalian cells present in the sample are somatic cells.

In one example, the sample comprises or is suspected of comprising one or more microorganisms associated with or causative of mastitis.

In one example, the one or more microorganisms associated with or causative of mastitis are selected from the group consisting of Staphylococcus species including Staphylococcus aureus, coagulase negative Staphylococci; Streptococcal species including Streptococcus uberis, Streptococcus dysgalactiae, Streptococcus agalactiae, and Streptococcus bovis; Mycoplasma species; and coliform bacteria such as E. coli; Enterobacter species; Klebsiella species; Nocardia species including Nocardia asteroides, Nocardia neocaledoniensis, and Nocardia cyriacigeorgica; and Citrobacter.

In one example, the step of determining or providing data indicative of the number and/or concentration of mammalian cells present in the sample comprises, consists essentially of, or consists of determining the number and/or concentration of cells present in the sample.

In one example, the step of determining or providing data indicative of the number and/or concentration of mammalian cells present in the sample comprises, consists essentially of, or consists of determining the number and/or concentration of somatic cells present in the sample.

In one example, the data provided in the step of determining or providing data indicative of the number and/or concentration of mammalian cells present in the sample comprises data indicative of the concentration of mammalian cells present in the sample.

In one example, the number of microorganisms present in the sample relative to the number of mammalian cells present in the sample is indicative of the concentration of the one or more microorganisms in the sample.
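By way of illustration only, the relationship between microbial and mammalian cell numbers can be sketched as below, using a somatic cell count as the known quantity. The sketch assumes that reads are obtained in proportion to the amount of DNA contributed by each cell type, that each microbial cell carries one genome copy, and that each mammalian cell is diploid. The function name, parameters, default ploidy, and values are hypothetical and are not taken from the Examples.

```python
def microbes_per_ml_from_scc(microbe_reads, microbe_genome_bp,
                             host_reads, host_haploid_genome_bp,
                             somatic_cells_per_ml, host_ploidy=2):
    """Estimate microbial cells per mL relative to a somatic cell count.

    Assumption (not from the specification): reads divided by the DNA
    content per cell approximates the relative number of cells.
    """
    if host_reads <= 0 or microbe_genome_bp <= 0 or host_haploid_genome_bp <= 0:
        raise ValueError("host reads and genome sizes must be positive")
    # DNA content of one mammalian cell, in base pairs.
    host_bp_per_cell = host_haploid_genome_bp * host_ploidy
    microbe_cell_equivalents = microbe_reads / microbe_genome_bp
    host_cell_equivalents = host_reads / host_bp_per_cell
    # Ratio of microbial to mammalian cells, scaled by the known cell count.
    return somatic_cells_per_ml * microbe_cell_equivalents / host_cell_equivalents


# Hypothetical values for illustration only.
estimate = microbes_per_ml_from_scc(
    microbe_reads=100, microbe_genome_bp=2_500_000,
    host_reads=1_000_000, host_haploid_genome_bp=2_500_000_000,
    somatic_cells_per_ml=200_000,
)
print(round(estimate))
```

With these hypothetical inputs there is one microbial cell for every five mammalian cells, giving an estimate of about 40,000 microbial cells per mL.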

In one example, when the sequencing identifies one or more markers unique to and/or associated with a strain or species of microorganism, the presence of the one or more unique markers is indicative of the presence of the microorganism strain or species in the sample.

In one example, the absolute cell count for a microorganism strain or species is determined from the frequency of the one or more markers unique to and/or associated with the microorganism strain or species.

In one example, the sequencing comprises subjecting genomic DNA in an unpurified sample to sequencing.

In one example, the sequencing is high throughput sequencing.

In another aspect, the invention relates to a method of determining the identity and/or abundance of a plurality of microorganisms present in a biological sample, the method comprising:

(a) providing a sample, such as a biological sample from a mammalian subject, wherein the sample comprises:

(i) one or more populations of mammalian cells, such as one or more cells from the subject, and

(ii) one or more microorganisms;

(b) determining or providing data indicative of the number of mammalian cells comprising at least one of the populations of mammalian cells present in the sample;

(c) sequencing DNA from the sample to identify one or more whole genomes of the one or more microorganisms;

(d) determining the frequency of one or more genomes within the sample by mapping sequencing reads to the identified whole genomes;

(e) determining the abundance of one or more microorganisms in the sample from the frequency of whole genomes of the microorganisms present; and

(f) optionally determining the total number of microorganisms present in the sample by determining the number of different microorganism genomes in the sample and the relative frequency of each genome within the sample to provide a total number of microorganisms; wherein the determination is indicative of the abundance of microorganisms, such as the number of each of the strains, species, or populations of microorganisms, present in the sample relative to the number of mammalian cells present in the sample.

In a further aspect, the invention relates to a method for determining the concentration and/or abundance of a plurality of microorganisms present in a biological sample, the method comprising:

(a) providing a biological sample from a mammalian subject, wherein the sample comprises:

(i) one or more populations of mammalian cells from the subject, and

(ii) one or more microorganisms;

(b) determining or providing data indicative of the number of mammalian cells comprising at least one of the populations of mammalian cells present in the sample;

(c) sequencing nucleic acid in the sample to identify the number of whole genomes of a plurality of microorganisms or microorganism strains;

(d) determining the frequency of the or each microorganism strain present in the sample by associating sequencing reads to the identified whole genomes;

(e) determining an absolute cell count for the or each microorganism strain in the sample from the total quantity of the or each microorganism strain and the frequency for the or each microorganism strain; wherein the determination is indicative of the number of the or each microorganism strain present in the sample relative to the number of mammalian cells present in the sample and/or the concentration and/or abundance of microorganisms present in the sample.
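By way of illustration only, steps (d) and (e) can be sketched as follows. The sketch assumes that the frequency of each strain is its share of genome equivalents (for example, mapped reads divided by genome length) and that the total quantity is an estimate of the total number of microbial cells in the sample. The function name, strain names, and values are hypothetical.

```python
def absolute_counts_by_strain(genome_equivalents, total_microbial_cells):
    """Apportion a total microbial cell count across strains.

    genome_equivalents maps each strain to a non-negative abundance
    measure (assumed here to be mapped reads per base pair of genome).
    """
    total = sum(genome_equivalents.values())
    if total <= 0:
        raise ValueError("at least one strain must have a positive abundance")
    # Frequency of each strain as a fraction of all genome equivalents.
    frequencies = {strain: value / total
                   for strain, value in genome_equivalents.items()}
    # Absolute count is the total quantity multiplied by the frequency.
    return {strain: total_microbial_cells * freq
            for strain, freq in frequencies.items()}


# Hypothetical values for illustration only.
counts = absolute_counts_by_strain({"strain_1": 30.0, "strain_2": 10.0}, 8_000)
print(counts)
```

In this hypothetical case strain_1 accounts for three quarters of the genome equivalents and is therefore assigned three quarters of the total cell count.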

In another aspect, the invention relates to a method of determining the identity and/or abundance of a plurality of microorganisms present in a biological sample, the method comprising:

(a) providing a sample, such as a biological sample from a mammalian subject, wherein the sample comprises a plurality of microorganisms;

(b) contacting the sample with a known amount of reference nucleic acid and/or reference cells, such as reference microbial cells;

(c) sequencing nucleic acid from the sample to provide genomic sequence information from the plurality of microorganisms and sequence information from the known amount of reference nucleic acid and/or reference cells, such as reference microbial cells;

(d) identifying one or more whole genomes of the one or more microorganisms from the genomic sequence information;

(e) determining the frequency of one or more genomes within the sample by mapping sequencing reads to the identified whole genomes;

(f) determining the abundance of one or more microorganisms in the sample from the frequency of whole genomes of the microorganisms present; and

(g) optionally determining the total number of microorganisms present in the sample by determining the number of different microorganism genomes in the sample and the relative frequency of each genome within the sample to provide a total number of microorganisms; wherein the determination is indicative of the abundance of microorganisms, such as the number of each of the strains, species, or populations of microorganisms, present in the sample relative to the known amount of reference nucleic acid and/or reference cells, such as reference microbial cells present in the sample.

In a further aspect, the invention relates to a method for determining the concentration and/or abundance of a plurality of microorganisms present in a sample, such as a biological sample, the method comprising:

(a) providing a sample, such as a biological sample from a mammalian subject, wherein the sample comprises a plurality of microorganisms;

(b) contacting the sample with a known amount of reference nucleic acid and/or reference cells, such as reference microbial cells;

(c) sequencing nucleic acid from the sample to provide genomic sequence information from the plurality of microorganisms and sequence information from the known amount of reference nucleic acid and/or reference cells, such as reference microbial cells;

(d) identifying from the genomic sequence information the number of whole genomes of a plurality of microorganisms or microorganism strains;

(e) determining the frequency of the or each microorganism strain present in the sample by associating sequencing reads to the identified whole genomes;

(f) determining an absolute cell count for the or each microorganism strain in the sample from the total quantity of the or each microorganism strain and the frequency for the or each microorganism strain; wherein the determination is indicative of the number of the or each microorganism strain present in the sample relative to the known amount of reference nucleic acid and/or reference cells, such as reference microbial cells, present in the sample and/or the concentration and/or abundance of microorganisms present in the sample.

In one example, the method comprises determining the total number of microorganisms present in the sample by determining the number of different microorganism genomes in the sample and the relative frequency of each genome within the sample to provide a total number of microorganisms.

In a further aspect, the invention relates to a method for identifying a subject or group of subjects who would benefit from antimicrobial treatment, the method comprising determining the abundance of one or more microorganisms in a sample from the subject or group of subjects by a method as described herein, wherein a determination of abundance of the microorganism in the sample above a threshold amount is indicative that the subject or group of subjects would benefit from antimicrobial treatment.
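By way of illustration only, the comparison of a determined abundance against a threshold amount can be sketched as follows. The threshold value, units, subject identifiers, and function name are hypothetical; no particular threshold is implied by this sketch.

```python
def flag_for_treatment(abundance_by_subject, threshold):
    """Return, in sorted order, the subjects whose abundance exceeds the threshold.

    abundance_by_subject maps a subject or group identifier to the
    determined abundance of the microorganism (any consistent unit).
    """
    return sorted(subject
                  for subject, abundance in abundance_by_subject.items()
                  if abundance > threshold)


# Hypothetical abundances (cells per mL) and threshold, for illustration only.
abundances = {"herd_1": 50_000, "herd_2": 2_000, "herd_3": 120_000}
flagged = flag_for_treatment(abundances, threshold=10_000)
print(flagged)
```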

In another aspect, the invention relates to a method for identifying a subject or group of subjects who would benefit from antimicrobial treatment, the method comprising determining the abundance of one or more microorganisms in a sample by a method as described herein, wherein the sample is not from the subject or group of subjects but is from a related subject or group of subjects, and wherein a determination of abundance of the microorganism in the sample above a threshold amount is indicative that the subject or group of subjects would benefit from antimicrobial treatment.

In one example, the relatedness of the subject or group of subjects who would benefit from antimicrobial treatment and the subject or group of subjects from which the sample is obtained is determined in a method as herein contemplated. For example, the relatedness is determined by a method such as those disclosed in the Examples. In one example, the relatedness is physical proximity.

In one example, the treatment is prophylactic treatment, including prophylactic treatment of a subject or group of subjects in which one or more symptoms normally associated with the microbial population are not evident.

In one example, the treatment is prophylactic treatment of a subject or group of subjects in which the microbial population is not yet present.

In another aspect, the invention relates to a method of treating a subject or group of subjects for a microbial infection, the method comprising administering to the subject or group of subjects an antimicrobial agent in an antimicrobially-effective amount, wherein the subject or group of subjects has been identified in a method as described herein, such as a subject or group of subjects who would benefit from antimicrobial treatment in a method as described herein. In various examples, the subject or group of subjects has or is suspected of having a microbial infection, said microbial infection identified by a method described herein.

In one example, the treatment is prophylactic treatment, including prophylactic treatment of a subject or group of subjects in which one or more symptoms normally associated with the microbial population are not evident. In one example, the treatment is prophylactic treatment of a subject or group of subjects in which the microbial population is not yet present.

In various examples, the method is a prophylactic treatment of a subject or group of subjects for a microbial infection, the method comprising administering to the subject or group of subjects an antimicrobial agent in an antimicrobially-effective amount, wherein the microorganism associated with or causative of the microbial infection is not present in the subject or group of subjects or a sample therefrom, but the abundance of said microorganism in a sample from a related subject or group of subjects is above a threshold.

In one example, the microbial infection is or is causative of mastitis. In one example, the sample is a bulk milk sample or herd test milk sample. In one example, the related subject or group of subjects is one or more bovine from an adjacent or proximal farm, or one or more farms within a geographical region. In various examples, the treatment is an anti-mastitis treatment, such as a prophylactic treatment for mastitis, including drying off, isolation, or treatment with antibiotics such as beta-lactams such as penicillin G or macrolides such as erythromycin.

In another aspect, the invention relates to a method of determining the amount and/or concentration of a plurality of eukaryotic cells present in a sample, the method comprising:

(a) providing a sample comprising or suspected of comprising a plurality of eukaryotic cells;

(b) contacting the sample with a known amount of reference nucleic acid and/or reference cells, such as a known amount of microbial cells;

(c) sequencing nucleic acid from the sample to provide genomic sequence information from the plurality of eukaryotic cells and sequence information from the known amount of reference nucleic acid and/or reference cells, such as the known amount of reference microbial cells;

(d) determining using the genomic sequence information from the plurality of eukaryotic cells the amount of at least one eukaryote's genomic nucleic acid present in the sample; and

(e) determining the amount and/or concentration of one or more of the eukaryotic cells present in the sample from the amount of at least one eukaryote's genomic nucleic acid present in the sample relative to the amount of reference nucleic acid and/or reference cells in the sample.

In various examples, the eukaryotic cells are or comprise one or more of the group consisting of plant cells, algal cells, archaeal cells, yeast cells, and fungal cells. In one example, the eukaryotic cells are or comprise animal cells, such as mammalian cells, avian cells, fish cells, reptilian cells, amphibian cells, insect cells, or arachnid cells.

In various examples, the sample comprising or suspected of comprising a plurality of eukaryotic cells is an environmental sample, such as an effluent sample, a waste sample, or a water sample such as a water sample from a body of water or a water flow, such as a lake, pond, stream, river, sea, or the like, a soil sample, or an atmospheric sample. In one example, the presence, concentration or amount of eukaryotic cells, for example in an environmental sample is used to establish the presence and/or identity of one or more eukaryotes (whether an individual eukaryotic organism or a population of eukaryotic organisms) in or adjacent to the environment from which the sample was obtained. For example, the method is used to determine the presence and/or identity of one or more eukaryotes, such as one or more plants or animals, in or adjacent to the environment from which the sample was obtained.

Representative examples of such methods are employed in, for example, environmental monitoring of agricultural runoff, such as monitoring of waterways on or adjacent to a farm. Detection of the presence of one or more eukaryotic cells of interest, or detection of an amount of eukaryotic cells of interest such as detection of an amount of eukaryotic cells of interest above a threshold, in an environmental sample from, for example, a waterway, will in certain examples be indicative of the presence of at least one eukaryotic organism, or material from such an organism, being or having been in contact with the environment from which the sample was obtained.

Accordingly, in one example the method is a method of determining the presence or identity of one or more eukaryotes, such as one or more plants or animals, in or adjacent to the environment from which an environmental sample is obtained.

In another example, the method is a method of determining the presence or identity of one or more substances or contaminants from a eukaryotic organism, such as one or more plants or animals, in or adjacent to the environment from which an environmental sample is obtained.

In one specifically contemplated example, the method is a method of determining the presence and/or identity of contamination, effluent, faecal matter, or biological matter in an environment, such as a waterway on or adjacent to an agricultural site, such as a farm, meat processing works, barn, poultry shed, or the like.

In other examples, the sample comprising or suspected of comprising a plurality of eukaryotic cells is a biological sample, such as a sample obtained from a mammalian subject. In one example, the sample is a faecal sample. In other examples, the sample is a tissue sample, a blood sample, a plasma sample, or the like. In certain examples, the sample is provided for monitoring purposes, for example to establish the provenance of a sample obtained from a subject. Examples of such uses are provided herein in the Examples, where faecal samples obtained from a mammalian subject are assessed to establish the identity and amount of plant cells, such as Lolium cells, present in the sample. This in turn allows the identity of one or more foods consumed by the subject to be determined - in the case set out in Example 3 herein, for example, it allows the principal food of the bovine subject, ryegrass, to be established.

In certain examples, the methods described herein are applicable to both environmental and biological samples. For example, the identification of the sources of contamination of an environment, such as a waterway, can be established using the methods described herein. It will be appreciated that the methods described herein are also useful for establishing whether or not one or more potential sources of contamination of an environment are in fact a or the source of such contamination. An example of such an analysis is provided herein in Example 3, where the prevalence of Lolium cells in faecal samples from bovine subjects is contrasted with the prevalence of these cells in effluent.

In another aspect, the invention relates to a diagnostic kit comprising:

(a) one or more sequencing reagents for sequencing nucleic acid from one or more mammalian cells and one or more microorganisms;

(b) one or more sample buffers or excipients;

(c) optionally one or more containers for sample collection and/or processing;

(d) and instructions for the use of the kit for determining the identity, amount, number, or abundance of a plurality of microorganisms present in a sample, such as a biological sample.

In another aspect, the invention relates to a diagnostic kit comprising:

(a) one or more sequencing reagents for sequencing nucleic acid from one or more microorganisms;

(b) a source of reference nucleic acid and/or reference cells, such as reference microbial cells;

(c) one or more sample buffers or excipients;

(d) optionally one or more containers for sample collection and/or processing;

(e) and instructions for the use of the kit for determining the identity, amount, number, or abundance of a plurality of microorganisms present in a sample.

It is intended that reference to a range of numbers disclosed herein (for example, 1 to 10) also incorporates reference to all rational numbers within that range (for example, 1, 1.1, 2, 3, 3.9, 4, 5, 6, 6.5, 7, 8, 9 and 10) and also any range of rational numbers within that range (for example, 2 to 8, 1.5 to 5.5 and 3.1 to 4.7). These are only examples of what is specifically intended and all possible combinations of numerical values between the lowest value and the highest value enumerated are to be considered to be expressly stated in this application in a similar manner.

Other aims, aspects, features and advantages of the present invention will become apparent from the following description. It should be understood, however, that the detailed description and the specific examples, while indicating preferred examples of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

The invention is exemplified in the following non-limiting examples and with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 presents a graph showing the correlation between the absolute number of bacteria present in milk vat samples calculated using a method as contemplated herein using somatic cell count (SCC) data and the absolute number of bacteria calculated from spike-in data, as described in Example 1 herein.

Figure 2 presents two graphs showing the amount of Streptococcus uberis (TOP) and the amount of Staphylococcus aureus (BOTTOM) in bulk milk samples from New Zealand farms, calculated using somatic cell count (SCC) data compared to that calculated from the ZymoBIOMICS spike-in data, as described in Example 1 herein.

Figure 3 is a graph presenting the amount of S. aureus in Herd Test samples calculated from shotgun sequence data of microbial genomes using somatic cell count (SCC) data compared to that calculated using a commercially available S. aureus qPCR test (LIC, New Zealand). The dotted line is the 1:1 line.

Figure 4 is a graph showing the abundance of 10 microbial species present in the ZymoBIOMICS Community Standard II (Log Distribution) reagent added to bulk milk samples, as determined in a representative method disclosed herein in Example 1 compared to the abundance of each species as reported by the manufacturer.

Figure 5 presents three graphs showing the distribution of concentration by species across the bulk milk samples analysed as described in Example 2, for Staphylococcus aureus (Figure 5A), Corynebacterium bovis (Figure 5B), and Bifidobacterium pseudolongum (Figure 5C).

Figure 6 presents two graphs showing the variation in the abundance of a single bacterial species, Staphylococcus aureus, in Herd Test milk samples obtained in different months (Figure 6A) and in different geographic regions (Figure 6B), as described in Example 2.

Figure 7 presents data showing the variation of microbiomes in samples obtained from different regions across multiple time periods, with outlier samples (boxed regions) being readily identified, as described in Example 2.

Figure 8 presents data showing microbiome variation to identify outliers on the basis of host cell count, with outlier samples shown in red.

Figure 9 is a graph presenting microbiome variation in conjunction with principal component analysis to show the variation in bacterial populations across samples obtained in different regions, as described in Example 2.

Figure 10 is a graph presenting microbiome variation in conjunction with principal component analysis to show the variation in bacterial populations across samples obtained at different time points, as described in Example 2.

Figure 11 is a graph showing the variation in bacterial populations across seasons, as described in Example 2.

Figure 12 is a graph showing some of the most commonly observed bacteria in faecal samples, as described in Example 3.

Figure 13 is a graph showing some of the most commonly observed bacteria in effluent samples, as described in Example 3.

Figure 14 is a graph showing the differences in distribution of Bifidobacterium pseudolongum in effluent samples across 4 different farms, as described in Example 3.

Figure 15 is a graph showing the differences in distribution of Bifidobacterium pseudolongum in faecal samples from a farm across 3 different years, as described in Example 3.

Figure 16 is a graph showing the differences in distribution of the Lolium genus between effluent and faecal samples, as described in Example 3.

DETAILED DESCRIPTION

The present invention in one aspect recognises for the first time that the amount, such as the absolute number, of microorganisms present in a sample, such as a biological sample taken from a subject, can be determined from nucleic acid sequence information by reference to the number of host cells determined to be present in the sample.

Broadly, the invention relates in one aspect to methods of determining the abundance of one or more microorganisms in a sample, such as a biological sample, using nucleic acid sequencing methods, wherein the sample comprises one or more non-microbial cells, such as one or more host cells such as one or more mammalian cells. In particularly contemplated examples, the number of host cells present in the sample is provided, for example by analysis of the sample to determine the number or concentration of host cells present in the sample. In another aspect, the invention relates to methods of determining the abundance and in certain examples the identity of one or more microorganisms in a sample, such as a biological sample, using nucleic acid sequencing methods to provide genomic sequence information from the microorganisms, such as genomic sequence information derived from whole genome sequencing or shotgun sequencing methods.

Selected definitions

As used herein, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a sample" includes a plurality of samples, including mixtures thereof.

The term "and/or" can mean "and" or "or".

Those skilled in the art will appreciate the meaning of various terms of degree used herein. For example, as used herein in the context of referring to an amount (e.g., "about 9%"), the term "about" represents an amount close to and including the stated amount that still performs a desired function or achieves a desired result, e.g. "about 9%" can include 9% and amounts close to 9% that still perform a desired function or achieve a desired result. For example, the term "about" can refer to an amount that is within less than 10% of, within less than 5% of, within less than 1% of, within less than 0.1% of, or within less than 0.01% of the stated amount. It is also intended that where the term "about" is used, for example with reference to a figure, concentration, amount, integer or value, the exact figure, concentration, amount, integer or value is also specifically contemplated.

The term "comprising" as used in this specification means "consisting at least in part of". When interpreting each statement in this specification that includes the term "comprising", features other than that or those prefaced by the term may also be present. Related terms such as "comprise" and "comprises", and the terms "including", "include" and "includes" are to be interpreted in the same manner.

The term "consisting essentially of" when used in this specification refers to the features stated and allows for the presence of other features that do not materially alter the basic characteristics of the features specified. The term "consisting of" as used herein means the specified materials or steps of the claimed invention, excluding any element, step, or ingredient not specified in the claim.

The terms "determining", "measuring", "evaluating", "assessing," "assaying," and "analyzing" can be used interchangeably herein to refer to any form of measurement and include determining if an element is present or not (e.g., detection). These terms can include both quantitative and/or qualitative determinations. Assessing may be relative or absolute. These terms can include use of the algorithms described herein. "Detecting the presence of" can include determining the amount of something present, as well as determining whether it is present or absent.

When used herein, a "gene" includes coding sequences encoding one or more products of the gene, non-coding sequences such as introns, as well as all nucleotide regions which regulate the production of the one or more gene products, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences. Accordingly, a gene includes, but is not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins, matrix attachment sites and locus control regions.

The term "genome" as used herein, refers to the entirety of an organism's hereditary information that is encoded in its primary DNA sequence. The genome includes both the genes and the non-coding sequences. For example, the genome may represent a microbial genome or a mammalian genome.

The terms "homology" and "homologous" when used herein in reference to nucleotide sequences refer to a degree of complementarity with other nucleotide sequences. There may be partial homology or complete homology (i.e., identity). A nucleotide sequence which is partially complementary, i.e., "substantially homologous," to a nucleic acid sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid sequence.

The terms "microbe", "microbial", "microbiota", "microorganism" and related terms as used herein refer to a virus, a phage, a single-celled organism such as a unicellular organism selected from the group comprising bacteria, archaea, protozoa, protists, fungi, yeasts, and algae, and a multicellular organism selected from the multicellular species or forms of fungi, algae, and protists.

The term "microbiome", as used herein, refers to the ecological community of commensal, symbiotic, or pathogenic microorganisms that inhabit a tissue, fluid, or body space on or in a subject.

The terms "nucleic acid sequence" and "nucleotide sequence" and related terms as used herein refer to an oligonucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin which may be single- or double-stranded, and represent the sense or antisense strand.

The term "polynucleotide(s)" when used herein, means a single or double-stranded deoxyribonucleotide or ribonucleotide polymer of any length, and includes as non-limiting examples, coding and non-coding sequences of a gene, sense and antisense sequences, exons, introns, genomic DNA, cDNA, pre-mRNA, mRNA, rRNA, siRNA, miRNA, tRNA, ribozymes, recombinant polynucleotides, isolated and purified naturally occurring DNA or RNA sequences, synthetic RNA and DNA sequences, nucleic acid probes, primers, fragments, genetic constructs, vectors and modified polynucleotides. Reference to nucleic acids, nucleic acid molecules, nucleotide sequences and polynucleotide sequences is to be similarly understood. It will be appreciated that a wide variety of synthetic and/or non-naturally occurring nucleotide analogues are available, such that polynucleotides comprising one or more of said synthetic or non-naturally occurring nucleotide analogues can be prepared. The use of such polynucleotides in the methods and compositions described herein is likewise contemplated.

The term "primer" refers to a short polynucleotide, usually having a free 3'OH group, that is hybridised to a template and used for priming polymerization of a polynucleotide complementary to the template. Such a primer is preferably at least 5, more preferably at least 6, more preferably at least 7, more preferably at least 9, more preferably at least 10, more preferably at least 11, more preferably at least 12, more preferably at least 13, more preferably at least 14, more preferably at least 15, more preferably at least 16, more preferably at least 17, more preferably at least 18, more preferably at least 19, more preferably at least 20 nucleotides in length.

The term "sequencing" as used herein refers to sequencing methods for determining the order of the nucleotide bases— adenine, guanine, cytosine, and thymine— in a nucleic acid molecule (e.g., a DNA nucleic acid molecule), or adenine, guanine, cytosine, and uracil in an RNA nucleic acid molecule.

A "subject" as used herein is an animal, usually a mammal, including a mammalian agricultural animal or a companion animal or a human, and includes a cell, nucleus, gamete, zygote, or embryo such as a cell, nucleus, gamete, zygote, or embryo of animal origin. Particularly contemplated subjects are non-human animals. Representative agricultural animals include caprine, ovine, bovine, cervine, and porcine. Representative companion animals include feline, equine, and canine.

Accordingly, the term "animal" is used herein primarily in reference to mammals. In one particular example, the mammal is a ruminant. In another example, the mammal is one within the Bovidae family. In particular examples, the animal is a bovine animal. More particularly the animal is Bos taurus or Bos indicus. In one particular example the animal is a beef or dairy breed. By way of further example, the animal may be chosen from the group of animals including, but not limited to, Jersey, Holstein-Friesian, Ayrshire, crossbred dairy cattle, Angus, Hereford, Simmental and crossbred beef cattle.

Various aspects of the invention are described in further detail in the following subsections. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which this invention belongs. In case of conflict, the present specification, including definitions, will control. Although methods and materials similar or equivalent to those described herein can be used in the practice of the invention, examples of suitable methods and materials are described below. The materials, methods, and examples described herein are illustrative only and are not intended to be limiting.

The methods contemplated herein in certain examples target the detection and/or quantification of microorganisms present in a biological sample, for example, for diagnostic or prognostic purposes, for monitoring animal or public health, for determining the need or desirability of a particular therapeutic approach, and the like. The methods contemplated herein in certain examples target the detection and/or quantification of microorganisms present in a sample comprising or suspected of comprising one or more microorganisms, such as an environmental, industrial, clinical, or forensic sample, for example for monitoring environmental or public health, assessing mitigation measures, informing planning and development, and the like.

Accordingly, the methods contemplated herein are applicable to the detection and/or quantification of any microorganism. Of particular interest in certain examples are microorganisms that are associated with or causative of a disease or condition, for example a disease or condition for which a prophylactic or therapeutic treatment exists, or a disease or condition to which part of the typical response comprises management, such as isolation, of the individual determined to be suffering from or predicted to be likely to suffer from the disease or condition.

In various particularly contemplated examples, the methods described herein are targeted to the detection and/or quantification of one or more microorganisms that cause mastitis. Mastitis is an inflammation of the mammary gland typically associated with intramammary infection. Mastitis is primarily caused by bacterial infection, although mastitis associated with mycoplasmal, fungal, and in some cases algal infections has been reported.

While mastitis is problematic in a number of mammalian species, including humans, the diagnosis and treatment of mastitis is of significant importance to the dairy industry. Indeed, mastitis is the most costly disease in dairy cattle, costing over $2 billion per year in the United States alone. The disease is estimated to affect 50 per cent of US dairy cows and a similar percentage of cows in dairy herds in other dairying nations, leading to increased animal management costs and burdens, unusable milk, decreased milk production, and, in cases of severe infection, the death of the animal. In large part this cost is associated with the need to discard milk from cows undergoing treatment for mastitis and those for which a withholding period in which milk cannot be collected for consumption pertains, together with reduced milk production, including amongst cows with preclinical or subclinical mastitis. The cost of veterinary care of infected cows, and labour costs, are also significant. Animal wellbeing is also a serious consideration.

Important pathogens associated with mastitis in cows include Staphylococcus aureus, coagulase negative Staphylococci, Streptococcal species such as Streptococcus uberis, Streptococcus dysgalactiae, Streptococcus agalactiae, and Streptococcus bovis, Mycoplasma spp., and coliform bacteria such as E. coli, Enterobacter spp., Klebsiella spp., and Citrobacter.

Staphylococcus aureus and Streptococcus agalactiae are the principal mastitis-causing bacteria in US dairy herds, and Staphylococcus aureus and Streptococcus uberis in New Zealand herds; mastitis is caused to a lesser degree by E. coli and other gram-negative bacteria, or combinations thereof. Most streptococcal infections have proven to be effectively treatable using conventional antibiotic therapy. Staphylococcal mastitis has, however, proven more difficult to cure.

In various particularly contemplated examples, the methods described herein are targeted to the detection and/or quantification of one or more microorganisms that cause disease in production animals. In certain examples, the methods contemplated herein utilise biological samples from fluids or tissues likely to contain one or more microorganisms of interest or one or more analytes therefrom. For example, faecal samples, blood samples, plasma samples, and the like are all amenable to use with the methods described herein.

However, the efficacy and/or sensitivity of the methods contemplated herein is such that samples from tissues or fluids not generally expected to contain a microorganism of interest will in certain circumstances nevertheless enable the detection and/or quantification of such microorganisms. For example, methods contemplated herein that employ the use of bovine milk are expected to be useful in the detection of systemic or localised pathogens including those not usually present in the tissues of the bovine udder, such as the protozoan pathogen Theileria (e.g., Theileria parva or Theileria annulata), or bacteria common in the bovine rumen.

In various examples, the methods target the diagnosis of diseases such as respiratory diseases in bovine or porcine animals, for example by detecting and/or quantifying microorganisms associated with or causative of respiratory diseases such as Pasteurella haemolytica, Pasteurella multocida, Mycoplasma hyopneumoniae, Haemophilus (Actinobacillus) pleuropneumoniae, Streptococcus suis, Salmonella choleraesuis, Bordetella bronchiseptica, pseudorabies virus, and swine influenza virus. Similarly, sepsis in animals, such as equine animals, is problematic, and rapid detection of gram-negative bacteria usually associated with sepsis is desirable. Infections of the gut can be readily detected through the use of faecal samples.

It will be appreciated that the early and/or ready detection of pathogens associated with diseases such as mastitis is of significant importance.

In other particularly contemplated examples, the methods disclosed herein are implemented to detect and/or quantify one or more microorganisms of interest outside of the diagnosis of a disease or condition in a particular individual from whom a sample has been obtained. In certain examples, the one or more microorganisms are selected from the group consisting of: a microorganism associated with or causative of an environmental impact, a microorganism of interest from a public health perspective (such as in food or beverage safety or in population health such as a microorganism causative of or associated with communicable diseases or for which epidemic or pandemic monitoring is of interest), and a microorganism of interest in industry, such as to monitor and/or avoid contamination of products such as pharmaceutical or cosmetic products.

Representative examples of such microorganisms include pathogenic bacteria causing communicable infections such as tetanus, typhoid fever, diphtheria, syphilis, cholera, food borne illness, leprosy, peptic ulcer disease, bacterial meningitis, and tuberculosis.

Representative examples of such microorganisms include viruses including human pathogens, animal pathogens and plant pathogens. Non-limiting examples of viruses include influenza virus, HIV, hepatitis A, B and C, Epstein-Barr virus, papillomaviruses, herpesvirus, adenovirus, Ebola, SARS, and SARS-CoV-2.

Further representative examples of such microorganisms include protozoa such as human parasites causing diseases including malaria, amoebiasis, giardiasis, toxoplasmosis, trichomoniasis, Chagas disease, leishmaniasis, sleeping sickness and dysentery.

The methods contemplated herein utilise nucleic acid sequencing and information derived from such sequencing. In principle, any method of sequencing is amenable to use in the methods contemplated here, but those skilled in the art will recognise that high throughput or 'next-gen' sequencing technologies, such as those used for whole genome sequencing and shotgun sequencing, are particularly suited and are exemplified herein.

A number of sequencing methods and platforms are particularly suited to large-scale implementation and are amenable to use in the methods contemplated herein. These include pyrosequencing methods, such as that utilised in the Genome Sequencer™ FLX pyrosequencing platform available from 454 Life Sciences (Branford, CT), which can generate ~400 million nucleotides of data in a 10 hour run with a single machine; solid-state sequencing methods, such as that utilised in the SOLiD™ sequencing platform (Applied Biosystems, Foster City, CA); second-generation synthetic sequencing technologies such as the TruSeq™ massively parallel terminator-based sequencing platform (Illumina, San Diego, CA); the PacBio RS real-time single molecule sequencing system (Pacific Biosciences, CA); the PostLight™ semiconductor-based sequencing platform (Ion Torrent, Guilford, CT); nanopore-based sequencing technologies including exonuclease-associated nanopore sequencing (Oxford Nanopore Technologies, Oxford, UK); and the tSMS™ single molecule sequencing flow cell-based platform (Helicos Bioscience Corporation, Cambridge, MA). Certain sequencing methods and platforms are particularly well suited to short-read (e.g. ~150 bp) sequencing methods including short-read shotgun sequencing, such as the Illumina Nextera DNA Flex and Illumina NovaSeq flowcell platform exemplified herein in the Examples.

Particularly contemplated, for example in those examples of the methods described herein utilising whole genome sequencing or shotgun sequencing, are technologies from Oxford Nanopore Technologies Plc (Oxford, UK, www.nanoporetech.com), Pacific Biosciences of California Inc (Menlo Park, USA, www.pacb.com), MGI Tech Co Ltd (Shenzhen, China, www.en.mgi-tech.com), and Illumina Inc (San Diego, USA, www.illumina.com).

Representative quantitative methods

A representative method for the quantification of the absolute number of a microbial species present in a sample is provided in the Examples herein. Broadly, the determination of the amount of a given microbial species can be done using whole genome sequencing or shotgun sequencing as follows:

The number of host cells present in a given sample is determined. In the following representative example, the host cell number (e.g., Somatic Cell Count (SCC)) of various milk samples is provided.

Estimating the prevalence of a given microbial species (S) of interest recognises that the DNA present from a species is equal to the number of cells from that species multiplied by its genome length and ploidy, i.e.:

(1) $\mathrm{DNA}_S = \mathrm{Cells}_S \times \mathrm{Ploidy}_S \times \mathrm{GenomeLength}_S$

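Applying this formula to both the bacterial species of interest and the host, and dividing one by the other, gives:

(2) $\frac{\mathrm{DNA}_S}{\mathrm{DNA}_{\mathrm{Host}}} = \frac{\mathrm{Cells}_S \times \mathrm{Ploidy}_S \times \mathrm{GenomeLength}_S}{\mathrm{Cells}_{\mathrm{Host}} \times \mathrm{Ploidy}_{\mathrm{Host}} \times \mathrm{GenomeLength}_{\mathrm{Host}}}$

Assuming that no significant bias exists in the sequencing, the sequence data should provide reads from all species in proportion to their amount of DNA in the sample. Thus:

(3) $\frac{\mathrm{Reads}_S}{\mathrm{Reads}_{\mathrm{Host}}} = \frac{\mathrm{DNA}_S}{\mathrm{DNA}_{\mathrm{Host}}}$

Combining 2 and 3 together and rearranging yields a formula that can be used to calculate the unknown value Cells s from the other known values:

(4) $\mathrm{Cells}_S = \mathrm{Cells}_{\mathrm{Host}} \times \frac{\mathrm{Reads}_S}{\mathrm{Reads}_{\mathrm{Host}}} \times \frac{\mathrm{Ploidy}_{\mathrm{Host}} \times \mathrm{GenomeLength}_{\mathrm{Host}}}{\mathrm{Ploidy}_S \times \mathrm{GenomeLength}_S}$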

Thus, the known concentration of host cells can be used in conjunction with the genome lengths and the sequencing reads obtained in the sample to estimate the concentration of cells of a species of interest, S, in the sample.
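The relationship in formula (1), applied to both the species of interest and the host and rearranged for the unknown cell count, can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the Examples: the function name, the parameter names, the default ploidy values and the genome lengths in the usage lines are all assumptions introduced here.

```python
def estimate_species_cells(host_cells_per_ml, reads_species, reads_host,
                           genome_length_species, genome_length_host,
                           ploidy_species=1, ploidy_host=2):
    """Estimate cells/ml of a species from a host cell count and read counts.

    Assumes reads are obtained in proportion to the DNA present, so that
    reads_species / reads_host approximates DNA_S / DNA_Host.
    The default ploidy values (haploid microbe, diploid host) are assumptions.
    """
    if reads_host <= 0 or genome_length_species <= 0 or ploidy_species <= 0:
        raise ValueError("host reads, species genome length and ploidy must be positive")
    read_ratio = reads_species / reads_host
    # DNA per host cell relative to DNA per cell of the species of interest.
    dna_per_cell_ratio = (ploidy_host * genome_length_host) / (
        ploidy_species * genome_length_species)
    return host_cells_per_ml * read_ratio * dna_per_cell_ratio


# Illustrative values only (not taken from the Examples): an SCC of
# 60,000 cells/ml, a ~2.7 Gb host genome and a ~2.8 Mb bacterial genome.
estimate = estimate_species_cells(
    host_cells_per_ml=60_000, reads_species=1_000, reads_host=1_000_000,
    genome_length_species=2.8e6, genome_length_host=2.7e9)
print(f"{estimate:.3g} cells/ml")
```

Because a host cell carries far more DNA than a bacterial cell, a small share of the reads can correspond to a bacterial cell count comparable to, or larger than, the host cell count.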

As a non-trivial proportion of a genome (of one species) may be shared across multiple species, sequencing reads on these shared parts of the genome cannot be uniquely resolved to a specific species. For each species of interest, the percentage of reads on the genome that can be successfully resolved back to that species can be calculated by simulating reads on the genome as follows:

(5) $\mathrm{ClassifiableReadFraction}_S = \frac{\mathrm{ClassifiedReads}_S}{\mathrm{SimulatedReads}_S}$

This formula can then be combined with (4) to give:

Where only part of a genome is assembled, the relevant genome length to be used is the length of the assembled genome and not the estimated length of the total genome of the organism. This is because reads on unassembled parts of the organism's genome will neither be able to be simulated nor classified.

To calculate the host-estimated amount of the species of interest, formula 6 is used in conjunction with the SCC values obtained for the samples.
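The classifiable read fraction of formula (5) and its use as a correction can be sketched as below. One plausible reading, assumed here, is that the observed read count for a species is divided by its classifiable fraction before being used in formula (4), so that reads lost to sequence shared with other species are accounted for. The function names and the numbers in the usage lines are hypothetical.

```python
def classifiable_read_fraction(classified_reads, simulated_reads):
    """Fraction of reads simulated from a genome that classify back to it."""
    if simulated_reads <= 0:
        raise ValueError("simulated_reads must be positive")
    return classified_reads / simulated_reads


def correct_read_count(observed_reads, fraction):
    """Scale observed reads up to the count expected if every read were classifiable.

    Assumption: unclassifiable reads are lost uniformly across the genome.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    return observed_reads / fraction


# Hypothetical simulation: 1,000,000 reads simulated from the assembled
# genome, of which 800,000 classify back to the species.
fraction = classifiable_read_fraction(800_000, 1_000_000)
corrected = correct_read_count(400, fraction)
print(fraction, corrected)
```

The corrected read count would then take the place of the raw species read count when the host-estimated amount is calculated.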

In certain examples, such as where one or a selection of microorganism species are of interest, certain of the terms in the formulae above can be combined in a single unknown, k. This single k value can then be empirically determined from a single sample where the quantities are known, due to a manufacturer-provided quantity, spike-ins, or some other absolute quantification method:
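A single-constant calibration of this kind might be sketched as below. The sketch assumes the estimate takes the form Cells_S = k × Cells_Host × Reads_S / Reads_Host, with k absorbing the genome length, ploidy and classifiable-fraction terms; that form, the function names and the example numbers are assumptions made for illustration.

```python
def calibrate_k(known_species_cells, host_cells, reads_species, reads_host):
    """Solve for k from one sample in which the species quantity is known."""
    if host_cells <= 0 or reads_species <= 0:
        raise ValueError("host_cells and reads_species must be positive")
    return known_species_cells * reads_host / (host_cells * reads_species)


def estimate_with_k(k, host_cells, reads_species, reads_host):
    """Apply a previously calibrated k to a new sample."""
    if reads_host <= 0:
        raise ValueError("reads_host must be positive")
    return k * host_cells * reads_species / reads_host


# Hypothetical calibration sample with a known spike-in quantity.
k = calibrate_k(known_species_cells=2_000, host_cells=100,
                reads_species=10, reads_host=1_000)
# Hypothetical test sample quantified with the same k.
print(estimate_with_k(k, host_cells=50, reads_species=20, reads_host=1_000))
```

A k calibrated this way is specific to the species and to the extraction and sequencing workflow used for the calibration sample.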

Alternatively, the equation can be written in logarithmic form and use made of the log(AB/C) = log(A) + log(B) - log(C) identity to separate the terms out:

As bacteria generally multiply exponentially, determining, analysing and/or visualising bacterial counts on a logarithmic scale (e.g. as in Figures 1-4) is useful. An advantage of the log-form of the equation is that when comparing results for a different species across multiple samples, any unknown terms, such as a term to accommodate biases in extraction, sequencing efficiency, or other known variables can be included as a constant offset value, so between-sample differences can readily be seen.
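The constant-offset property described above can be illustrated with a short sketch. It assumes the same single-constant form as before, written in base-10 logarithms; the function name, the sample values and the offset value are hypothetical.

```python
import math


def log10_species_cells(host_cells, reads_species, reads_host, log10_k=0.0):
    """Log-form estimate: log10(k) + log10(host) + log10(reads_S) - log10(reads_host)."""
    return (log10_k + math.log10(host_cells)
            + math.log10(reads_species) - math.log10(reads_host))


# Two hypothetical samples: (host cells/ml, species reads, host reads).
sample_a = (60_000, 500, 1_000_000)
sample_b = (120_000, 4_000, 2_000_000)

for log10_k in (0.0, 3.3):  # 3.3 is an arbitrary stand-in for an unknown offset
    a = log10_species_cells(*sample_a, log10_k=log10_k)
    b = log10_species_cells(*sample_b, log10_k=log10_k)
    # The between-sample difference does not depend on the chosen offset.
    print(f"log10_k={log10_k}: difference = {b - a:.3f}")
```

Since the offset cancels in the difference, samples can be ranked and compared for a given species even when the constant itself has not been determined.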

In certain examples, for estimating total bacteria, an average bacterial genome length, for example of 5 Mb, can be used, and all reads from bacterial genomes are assumed to be classifiable to the domain level.

It will be appreciated that certain factors will affect the accuracy of quantification, which may or may not be relevant to a particular circumstance. For example, where the determination is focused on a specific target bacterium, there is a high likelihood that the genome length is precisely known. Thus, the universal estimate of bacterial genome length (e.g., 5 Mb as above) can be substituted with the actual genome length.
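Because the estimated cell count is inversely proportional to the genome length assumed for the species, an estimate made with the universal 5 Mb value can be rescaled once the actual genome length is known. The following is a minimal sketch; the function name and the 2.5 Mb genome length in the usage line are hypothetical.

```python
def rescale_for_genome_length(estimate, assumed_length, actual_length):
    """Convert an estimate made with one genome length to another genome length.

    Cell count scales as 1 / genome length, so a genome shorter than the
    assumed length implies proportionally more cells for the same reads.
    """
    if assumed_length <= 0 or actual_length <= 0:
        raise ValueError("genome lengths must be positive")
    return estimate * assumed_length / actual_length


# An estimate of 1,000 cells/ml made with the 5 Mb average, rescaled for a
# hypothetical target species with a 2.5 Mb genome.
print(rescale_for_genome_length(1_000, assumed_length=5e6, actual_length=2.5e6))
```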

Similarly, the host genome length will usually be precisely known. Likewise, the number of host cells present can usually be very accurately determined, particularly in circumstances where host cell counts are routinely practised.

In some examples, the non-homogenous nature of the solutions being sequenced, extraction bias due to some species lysing more easily than others, GC bias in sequencing reads, randomness in the sequencer sampling, or sample handling inaccuracies and general experimental variation may each affect the number of reads of different species obtained in a particular iteration of the method. Those skilled in the art will be aware of steps that can be taken to mitigate any inaccuracies introduced by such variation, including the selection of appropriate sequencing reagents, equipment, or methodology to be used to accommodate varying GC content, the use of multiple rounds of extraction and sequencing, robust experimental technique, and the like.

Sample preparation and analysis

In certain examples, the methods contemplated herein will involve taking a sample from a subject, for example from an animal (or cell, gamete, embryo, nucleus, etc.), to be tested. The sample may be any appropriate tissue or body fluid sample. In one example, the sample is a milk, blood, muscle, bone, cell, saliva, faecal, or semen sample. It will be appreciated that samples of particular interest will comprise one or more host cells, and one or more microbial cells. Such samples can be taken from an animal using standard techniques known in the art, including cell scrapings or biopsy techniques, such as ear punch, or blood sampling. It should be appreciated that a sample may be taken from an animal at any stage of life, including prior to birth; by way of non-limiting example, from a zygote, an embryo, or a foetus. In specifically contemplated examples, such as those exemplified herein, the sample comprises milk, such as dairy milk.

Accordingly, the term "sample" as used herein with reference to biological samples should be taken to include in certain examples any biological material derived or obtained from a subject, and will usually include at least one host cell, gamete, or zygote, or may comprise host tissue or fluid, or material obtained from the subject, such as a faecal sample. In certain examples, the sample may also be taken after the death of an animal. The samples are analysed using techniques which allow for the observation or analysis as contemplated herein. Methods for storing and processing biological samples are well known in the art. For example, milk or tissue samples may be frozen until tested if required. In addition, one of skill in the art would realize that some test samples would be more readily analysed following a fractionation or purification procedure, for example, separation of whole blood into serum or plasma components. Nevertheless, in particularly contemplated methods including those exemplified herein, little or no sample processing prior to sequencing is required.

Indeed, certain particularly contemplated methods herein utilise samples routinely collected and processed by the dairy industry, usually as part of Herd Testing programmes such as those employed by purchasing companies to calculate the value of milk supplied or to ensure safety. For example, one of the tests most commonly performed on milk is a Somatic Cell Count (SCC), which indicates the number of bovine cells present in the milk (e.g. ~60,000 cells/ml), and methods to prepare samples for SCC analyses are well known.

Those skilled in the art will appreciate, having the benefit of this disclosure, that the routine production of SCC data for milk samples provides a readily available source of data indicative of the number and/or concentration of mammalian (host) cells present in the milk, and thus useful data for application to the methods disclosed herein. In examples where it is available and of sufficient quality, SCC data for milk samples thus avoids the need for sample manipulation such as introduction of known amounts of reference nucleic acid (such as are employed in methods reliant on spike-in of microbes such as the ZymoBIOMICS™ Spike-in Control I (High Microbial Load) reagent used in the Examples herein). Those skilled in the art will appreciate that SCC data or similar data representative of the number of mammalian or non-microbial cells present in a sample will not always be of sufficiently high quality to enable its use, and in these circumstances and in other situations where it is practicable to use an exogenous reference, the methods contemplated herein involving the contacting of the sample with a known amount of reference nucleic acid and/or a known amount of microbial cells can be conveniently used.

In certain examples, the methods contemplated herein will employ a normalisation to ensure that any differences in the efficiency of nucleic acid extraction from microbial cells compared to that of host cells are accommodated. For example, the relative efficiency of DNA extraction for a given target microorganism, such as a target bacterium of particular interest, compared to that of host cells known to be present in the sample, is determined empirically and factored into the calculation of genomic reads and the attribution of the number of a particular genome present.
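By way of illustration only, one way such a normalisation might be applied is sketched below in Python. The sketch assumes that the correction takes the form of dividing an uncorrected cell estimate by the extraction efficiency of the target microorganism relative to that of the host cells; the function name, parameter names and numerical values are hypothetical and are not taken from the Examples.

```python
def correct_for_extraction(cells_estimate, microbe_efficiency, host_efficiency):
    """Adjust a cell estimate for unequal DNA extraction efficiency.

    Assumption: efficiencies are fractions of DNA recovered (0 to 1],
    determined empirically for the target microorganism and the host cells.
    """
    if microbe_efficiency <= 0 or host_efficiency <= 0:
        raise ValueError("efficiencies must be positive")
    relative_efficiency = microbe_efficiency / host_efficiency
    # A microbe that lyses half as well as host cells is under-counted
    # two-fold, so the estimate is divided by the relative efficiency.
    return cells_estimate / relative_efficiency


# Hypothetical values: 10,000 cells/ml estimated, microbe extracts half as well.
corrected = correct_for_extraction(1.0e4, microbe_efficiency=0.4, host_efficiency=0.8)
```

Where the relative efficiency is 1, the estimate is returned unchanged.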

In certain examples, the methods disclosed herein relate to the analysis of samples from sources other than a subject, such as an environmental sample, including for example samples from waterways including streams, lakes, rivers, seas and oceans, samples from waste water, effluent, or sewage, samples comprising organic matter including samples comprising plant material, soil, atmospheric samples, and the like, an industrial sample such as a food or beverage sample, a pharmaceutical or cosmetic sample, a forensic sample, or samples relating to defense for example samples suspected of comprising one or more biological warfare agents.

In specifically contemplated examples, the methods disclosed herein relate to the analysis of samples from environmental sources, such as environmental sources associated with agriculture or the production of animal products. For example, certain methods contemplated herein are directed to the analysis of samples from environments such as waterways including streams, rivers, ponds, and lakes, on or adjacent to farms, stables, milking sheds, barns, or other facilities used to house or maintain agricultural animals. In certain examples, soil, effluent, or water samples are provided. In certain examples, such samples are provided in addition to one or more biological samples, such as one or more samples from one or more animals currently or previously present on, in or adjacent to the source from which the one or more environmental samples were obtained. It will be appreciated that the use of the methods described herein in monitoring efflux from facilities on or in which agricultural animals are maintained, and/or in establishing the source of any such efflux, is contemplated.

Accordingly, the term "sample" as used herein, for example with reference to environmental samples, should be taken to include in certain examples any material comprising or suspected of comprising one or more microorganisms. In other examples, the term "sample" as used herein contemplates any material comprising or suspected of comprising one or more eukaryotic cells, such as environmental samples such as effluent samples, or biological samples such as faecal samples from an animal subject, that comprise or are suspected of comprising one or more eukaryotic cells such as one or more plant cells, one or more algal cells, or the like.

Diagnostic kits

The invention further relates to diagnostic kits useful in determining the number and/or concentration of one or a plurality of microorganisms present in a sample such as a biological sample, for example a diagnostic kit for use in a method described herein.

Accordingly, in one example the invention relates to a diagnostic kit comprising one or more sequencing reagents, one or more sequencing primers, optionally one or more reagents for determining the number of host cells present in a sample, and instructions for the use of the kit for determining the abundance of one or a plurality of microorganism species present in the sample.

One exemplary kit comprises one or more sets of primers used for amplifying the genetic material present in the biological sample, and/or one or more reagents suitable for sequencing mammalian nucleic acid, and/or one or more reagents suitable for sequencing microbial nucleic acid. In various examples the kit includes instructions for use, for example in accordance with a method as contemplated herein.

The invention is further described with reference to the following examples. It will be appreciated that the invention as claimed is not intended to be limited in any way by these examples.

EXAMPLES

Example 1: Absolute quantification of microbiota in dairy milk samples

This example reports the development and assessment of methods to determine the abundance of microbiota of various types present in dairy milk samples using shotgun sequencing.

Materials and methods

Sampling

393 bulk milk samples were obtained from 276 commercial New Zealand dairy herds which undergo periodic milk recording (also known as herd testing). Samples from the bulk milk vat were collected on herd test days in 35 mL vials containing 0.1 mL of bronopol solution (10%) to prevent microbial replication.

For five of the farms from which bulk milk samples were collected, milk samples were collected from all individual cows on the farm and also used for DNA sequencing. These samples, referred to herein as the Herd Test samples, were heated to 30°C - 36°C and then shaken at 240 rpm for 5 minutes to mix the fat into the sample. An 800 µL aliquot was then subsampled into a 96-well plate for freezing and subsequent DNA extraction and sequencing.

Laboratory Analysis and DNA extraction

Somatic Cell Counts (SCCs) of all samples were measured using commercially available Bentley FTS Combi or Foss MilkoScan FT 6000 instruments. For the Herd Test samples this was performed as part of the routine commercial herd testing process. 15mL aliquots of the bulk milk samples were analysed in the same way. All samples were frozen at -20°C until time for DNA extraction.

Full 96-well plates (93 samples) of bulk milk samples were thawed until samples were at room temperature. Sample vials were shaken by hand, mixing fat into the milk before aliquoting 1.7 mL whole milk into a 96-well plate and spiking with 17 µL of 1:100 diluted ZymoBIOMICS™ Spike-in Control I (High Microbial Load).

Herd Test sample plates were allowed to equalize to room temperature for an hour to soften the fat layer before beginning the extraction process. The fat layer was then mixed in by pipetting the samples up and down 3 times. 400 µL of whole milk was taken from these herd test subsamples.

Samples were extracted using a custom BioSprint® 96 DNA Kit (Qiagen, Germany) and a Kingfisher (Thermofisher) machine, and eluted into 100 µL of elution buffer.

Three further samples were prepared in a similar way, but 325 µL of cow milk was taken from the Herd Test samples of three very low SCC animals (SCCs = 7k, 11k, 14k), and to these were added 75 µL preparations of ZymoBIOMICS™ Microbial Community Standard II (Log Distribution), and extraction proceeded as above. This ZymoBIOMICS product contains 8 bacterial and 2 fungal strains in known quantities in a log-distributed abundance for use as positive controls in testing microbiomics workflows.

Staphylococcus aureus PCR test

Herd test samples were tested for the presence of Staphylococcus aureus using a commercially available qPCR test (LIC, New Zealand). Two Staphylococcus aureus PCR positive controls and two nuclease-free water only PCR negative controls were used on each 96-well qPCR reaction plate.

DNA sequencing

Short-read (150bp paired) shotgun sequencing was performed on an Illumina NovaSeq using S1 and S4 flowcells, targeting 15 million reads per sample.

Taxonomic classification

For taxonomic classification, reads shorter than 130bp were discarded as were reads consisting of all Ns or all Gs (G being the null read in the Illumina chemistry). Reads were identified using Kraken2 taxonomic classifier (Wood et al., 2019) against a database built from microbiome, human, and bovine sequences downloaded from NCBI's RefSeq database. The Kraken2 'confidence' setting parameter of 0.5 was used to limit false positives in identification.
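The read-filtering rule described above can be sketched in Python as follows. This is a minimal illustration only and is not the pipeline used in this Example; the function name `keep_read` and the `min_length` parameter are hypothetical.

```python
def keep_read(sequence: str, min_length: int = 130) -> bool:
    """Return True if a read passes the length and composition filters."""
    seq = sequence.upper()
    if len(seq) < min_length:
        return False  # reads shorter than 130 bp are discarded
    if set(seq) <= {"N"} or set(seq) <= {"G"}:
        return False  # reads consisting of all Ns or all Gs are discarded
    return True


# Hypothetical reads: one informative read, two null reads, one short read.
reads = ["ACGT" * 40, "G" * 150, "N" * 150, "ACGT" * 10]
kept = [r for r in reads if keep_read(r)]
```

Only the first of the four hypothetical reads is retained for classification.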

Bacterial controls

The ZymoBIOMICS™ Spike-in Control I (High Microbial Load) (Zymo Research Corporation, California) multi-species bacterial control samples were used as controls in spike-in experiments, and the ZymoBIOMICS™ Microbial Community Standard II (Log Distribution) (Zymo Research Corporation, California) multi-species standard was used for assessing the accuracy of the methods described herein.

Calculating taxonomic classification percentages

The Kraken2 classifier does not attempt to resolve all shotgun reads to the species level, but rather assigns reads to the location in the taxonomic hierarchy they best match (Domain, family, genera, subspecies, strain etc), given the confidence setting and sequence database contents. The presence of many similar organisms in the sequence database will reduce how much of a given species' genome is unique to that species and how much is common to other organisms in the sequence database, thus altering the taxonomic classifications of reads from that organism. Therefore, simulated reads on genomes of interest were used to quantify what proportion of reads from those species would be classified to the species level or below.

Genome assemblies were downloaded from RefSeq for Staphylococcus aureus and Streptococcus uberis. The provided ZymoBIOMICS™ genomes were used for their spike-in bacteria, and the ARS-UCD1.2 bovine genome assembly was used for bovine genomes. RTG-Tools 3.10.1 was used to generate 100k simulated reads on the microbial genomes and 100m simulated reads on the bovine genome. Kraken2 was then used to classify the simulated samples, and the proportion of them classified at the species level or below for each organism was recorded. For the species where the proportion of reads classified at the species level was 5%, the proportion classified at the genus level was also recorded (data not shown).
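As a non-limiting sketch, the proportion of simulated reads classified at the species level or below might be computed in Python as shown. The sketch assumes that each simulated read has already been annotated with a taxonomic rank code in the style used in Kraken2 report files ("S" for species, "S1", "S2" and so on for ranks below species, "G" for genus, "U" for unclassified), and that reads reaching species level were assigned to the correct species. The input format, function name and counts are hypothetical.

```python
def classifiable_read_fraction(rank_codes):
    """Fraction of simulated reads classified at species level or below."""
    if not rank_codes:
        raise ValueError("no simulated reads supplied")
    # Species ("S") and sub-species ranks ("S1", "S2", ...) all start with "S".
    at_species_or_below = sum(1 for code in rank_codes if code.startswith("S"))
    return at_species_or_below / len(rank_codes)


# Hypothetical outcome for 100 simulated reads from one genome.
simulated = ["S"] * 60 + ["S1"] * 15 + ["G"] * 20 + ["U"] * 5
fraction = classifiable_read_fraction(simulated)
```

With these hypothetical counts, 75 of the 100 simulated reads resolve to species level or below, giving a fraction of 0.75.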

Absolute quantification of microbiota

When a spike-in species is used, estimating the prevalence of a given microbial species (S) of interest recognises that the DNA present from a species is equal to the number of cells from that species multiplied by its genome length and ploidy, i.e. :

(1)

$$\mathrm{DNA}_S = \mathrm{Cells}_S \times \mathrm{Ploidy}_S \times \mathrm{GenomeLength}_S$$

Applying this formula to both the species of interest and the spike-in, and dividing one by the other gives:

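(1b)

$$\frac{\mathrm{DNA}_S}{\mathrm{DNA}_{\mathrm{spike}}} = \frac{\mathrm{Cells}_S \times \mathrm{Ploidy}_S \times \mathrm{GenomeLength}_S}{\mathrm{Cells}_{\mathrm{spike}} \times \mathrm{Ploidy}_{\mathrm{spike}} \times \mathrm{GenomeLength}_{\mathrm{spike}}}$$

Assuming that no significant bias exists in the sequencing, the sequencer should provide reads from both species in proportion to their amount of DNA in the sequencing library. Thus:

(1c)

$$\frac{\mathrm{Reads}_S}{\mathrm{Reads}_{\mathrm{spike}}} \approx \frac{\mathrm{DNA}_S}{\mathrm{DNA}_{\mathrm{spike}}}$$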

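Combining (1b) and (1c) together and rearranging yields a formula that can be used to calculate the unknown value $\mathrm{Cells}_S$ from the other known values:

(2a)

$$\mathrm{Cells}_S \approx \frac{\mathrm{Reads}_S}{\mathrm{Reads}_{\mathrm{spike}}} \times \frac{\mathrm{Ploidy}_{\mathrm{spike}} \times \mathrm{GenomeLength}_{\mathrm{spike}}}{\mathrm{Ploidy}_S \times \mathrm{GenomeLength}_S} \times \mathrm{Cells}_{\mathrm{spike}}$$

Thus, the known concentration of spike-in cells can be used in conjunction with the genome lengths and the sequencing reads obtained in the sample to estimate the concentration of cells of a species of interest, S, in the sample. $\mathrm{Cells}_S$ and $\mathrm{Cells}_{\mathrm{spike}}$ can either be in absolute quantities of cells, or a concentration (e.g. cells/ml) as is convenient. Units of cells/ml are used below.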
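For illustration, the spike-in relationship can be sketched in Python. The sketch reads formula (2a) as the ratio of reads from the species of interest to reads from the spike-in, scaled by the inverse ratio of ploidy multiplied by genome length and by the known spike-in cell concentration. The function name, the default ploidy of 1 and all numerical values are hypothetical.

```python
def cells_from_spike(reads_s, reads_spike, cells_spike,
                     genome_len_s, genome_len_spike,
                     ploidy_s=1, ploidy_spike=1):
    """Estimate cells (or cells/ml) of species S from a spike-in reference."""
    read_ratio = reads_s / reads_spike
    # DNA per cell of the spike-in relative to DNA per cell of species S.
    dna_per_cell_ratio = (ploidy_spike * genome_len_spike) / (ploidy_s * genome_len_s)
    return read_ratio * dna_per_cell_ratio * cells_spike


# Hypothetical values: twice as many reads from S as from the spike-in,
# with a 2 Mbp genome for S and a 3 Mbp genome for the spike-in.
estimate = cells_from_spike(reads_s=2000, reads_spike=1000, cells_spike=1.0e5,
                            genome_len_s=2.0e6, genome_len_spike=3.0e6)
```

The result carries the same units as `cells_spike`, so supplying a concentration in cells/ml returns cells/ml.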

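This formula applies readily when host cells are being used as an internal standard for a sample, and the number of host cells (such as Somatic Cell Count (SCC)) is known. Thus:

(2b)

$$\mathrm{Cells}_S \approx \frac{\mathrm{Reads}_S}{\mathrm{Reads}_{\mathrm{host}}} \times \frac{\mathrm{Ploidy}_{\mathrm{host}} \times \mathrm{GenomeLength}_{\mathrm{host}}}{\mathrm{Ploidy}_S \times \mathrm{GenomeLength}_S} \times \mathrm{Cells}_{\mathrm{host}}$$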

The Kraken2 taxonomic classifier assigns only some reads at the level of species or below (e.g., subspecies, strain, etc.). This is because much of the genomes are common across more than one species, and sequencing reads on these parts cannot be uniquely resolved to a species of interest. For each species of interest, the percentage of reads on the genome that can be successfully resolved back to that species can be calculated by simulating reads on the genome and putting them through the classification software.

(3a)

$$\mathrm{ClassifiableReadFraction}_S = \frac{\mathrm{ClassifiedReads}_S}{\mathrm{Reads}_S}$$

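This definition can then be combined with (2b) to give:

(3b)

$$\mathrm{Cells}_S \approx \frac{\mathrm{ClassifiedReads}_S}{\mathrm{ClassifiedReads}_{\mathrm{host}}} \times \frac{\mathrm{ClassifiableReadFraction}_{\mathrm{host}}}{\mathrm{ClassifiableReadFraction}_S} \times \frac{\mathrm{Ploidy}_{\mathrm{host}} \times \mathrm{GenomeLength}_{\mathrm{host}}}{\mathrm{Ploidy}_S \times \mathrm{GenomeLength}_S} \times \mathrm{Cells}_{\mathrm{host}}$$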

Where only part of a genome is assembled, the relevant genome length used is the length of the assembled genome, not the estimated length of the total genome of the organism, as reads on unassembled parts of the organism's genome will neither be able to be simulated nor classified.

This formula is unaffected by the use of spike-ins being added to the samples, regardless of the volume of the spike-in, as the formula relies on comparing the ratio of host to microbial reads and the ratio of host to microbe cells independently of volume.
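A minimal Python sketch of the host-referenced calculation follows. It reads formula (3b) as the host-cell form of the calculation in which each classified read count is first divided by its classifiable read fraction to recover the total reads attributable to that genome. The function name, the default ploidies (haploid microbe, diploid host cell) and the round numerical values are assumptions made for illustration.

```python
def cells_from_host(classified_reads_s, classified_reads_host, cells_host,
                    fraction_s, fraction_host,
                    genome_len_s, genome_len_host,
                    ploidy_s=1, ploidy_host=2):
    """Estimate cells/ml of species S using host cells as the internal standard."""
    # Scale classified reads up to the total reads expected from each genome.
    reads_s = classified_reads_s / fraction_s
    reads_host = classified_reads_host / fraction_host
    dna_per_cell_ratio = (ploidy_host * genome_len_host) / (ploidy_s * genome_len_s)
    return (reads_s / reads_host) * dna_per_cell_ratio * cells_host


# Hypothetical sample: SCC of 60,000 cells/ml, a 2.7 Gbp host genome
# and a 2.7 Mbp bacterial genome.
estimate = cells_from_host(classified_reads_s=500, classified_reads_host=9.0e6,
                           cells_host=6.0e4, fraction_s=0.5, fraction_host=0.9,
                           genome_len_s=2.7e6, genome_len_host=2.7e9)
```

Because only ratios of reads enter the calculation, the sketch, like the formula, does not depend on the volume of any spike-in added to the sample.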

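Similarly, formula (2a) can be combined with (3a) to give:

(3c)

$$\mathrm{Cells}_S \approx \frac{\mathrm{ClassifiedReads}_S}{\mathrm{ClassifiedReads}_{\mathrm{spike}}} \times \frac{\mathrm{ClassifiableReadFraction}_{\mathrm{spike}}}{\mathrm{ClassifiableReadFraction}_S} \times \frac{\mathrm{Ploidy}_{\mathrm{spike}} \times \mathrm{GenomeLength}_{\mathrm{spike}}}{\mathrm{Ploidy}_S \times \mathrm{GenomeLength}_S} \times \mathrm{Cells}_{\mathrm{spike}}$$

The spike-in product used, ZymoBIOMICS™ Spike-in Control I (High Microbial Load), contains two bacterial strains, Imtechella halotolerans and Allobacillus halotolerans. For each of these two bacteria equation 3c was used to estimate the amount of the species of interest. The results of those two calculations were averaged to give the spike-in estimated amount of the species of interest for the samples.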

To calculate the host-estimated amount of the species of interest, formula 3b was used in conjunction with the SCC values obtained for the samples. The abundance of bacteria was determined as logarithmic data using the logarithmic form of formula 3b as follows:

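(4b)

$$\begin{aligned}
\log(\mathrm{Cells}_S) \approx{} & \log(\mathrm{ClassifiedReads}_S) - \log(\mathrm{ClassifiedReads}_{\mathrm{host}}) \\
& + \log(\mathrm{ClassifiableReadFraction}_{\mathrm{host}}) - \log(\mathrm{ClassifiableReadFraction}_S) \\
& + \log(\mathrm{Ploidy}_{\mathrm{host}}) - \log(\mathrm{Ploidy}_S) + \log(\mathrm{GenomeLength}_{\mathrm{host}}) \\
& - \log(\mathrm{GenomeLength}_S) + \log(\mathrm{Cells}_{\mathrm{host}})
\end{aligned}$$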
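The logarithmic form lends itself to a direct sum of terms, as sketched below in Python. Base-10 logarithms are assumed here (the base is not stated above), and the function name and input values are hypothetical.

```python
import math


def log10_cells_from_host(classified_reads_s, classified_reads_host,
                          fraction_s, fraction_host,
                          ploidy_s, ploidy_host,
                          genome_len_s, genome_len_host, cells_host):
    """Sum the terms of the logarithmic host-cell formula (base 10 assumed)."""
    log = math.log10
    return (log(classified_reads_s) - log(classified_reads_host)
            + log(fraction_host) - log(fraction_s)
            + log(ploidy_host) - log(ploidy_s)
            + log(genome_len_host) - log(genome_len_s)
            + log(cells_host))


# Hypothetical inputs: 500 classified reads for S, 9 million for the host.
log_estimate = log10_cells_from_host(500, 9.0e6, 0.5, 0.9, 1, 2,
                                     2.7e6, 2.7e9, 6.0e4)
```

A species with zero classified reads has no defined logarithm, so such cases would need to be handled separately, for example by reporting the species as not detected.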

For estimating total bacterial cells present, an average bacterial genome length of 5 million bp was assumed (Land et al., 2015), and it was assumed all reads from bacterial genomes would be classifiable to at least the Domain level due to how high this level is up the taxonomic tree.

All bacterial abundances are analysed and presented on a log-scale, as bacteria grow exponentially, and there are 5-8 orders of magnitude of abundance variation to accommodate. R² values of log-transformed data were calculated in Excel (data not shown).
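An equivalent calculation can be sketched in Python. The sketch assumes that R² denotes the squared Pearson correlation coefficient of the log10-transformed values; the function name and data are hypothetical.

```python
import math


def r_squared_log10(x_values, y_values):
    """Squared Pearson correlation of log10-transformed paired values."""
    xs = [math.log10(x) for x in x_values]
    ys = [math.log10(y) for y in y_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical paired estimates (cells/ml); the second set is exactly double.
spike = [1e3, 1e4, 1e5, 1e6]
host = [2e3, 2e4, 2e5, 2e6]
r2 = r_squared_log10(spike, host)
```

A constant multiplicative offset between two methods leaves R² unchanged on the log scale, so R² indicates correlation rather than absolute agreement.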

Results

The absolute number of bacteria present in milk samples as determined using the host cell data, in this case SCC, correlated strongly with the results obtained using the spike-in method.

An average of 14.8 million reads (s.d. 6.7 million) was obtained for the 393 bulk milk samples, with an average of 83% (s.d. 1.3) of reads being classified as Bos taurus. Seven samples were discarded due to sample degradation, and one sample with <100k reads was omitted from further analysis due to too few reads being obtained. The samples were classified by Kraken2 analysis (as described above) as containing an average of 110 species (s.d. 50) per sample, with 1955 species observed in total.

Figure 1 shows the estimated amounts of total bacteria in each sample using the spike-in formula and that obtained from the host-cell method. As can readily be seen, there is a strong correlation between these two results across 4 orders of magnitude, with R² = 0.98. In all cases the SCC-calculated bacterial load in each sample was within a factor of 3 of the spike-in calculated bacterial load, with average absolute error being +/-35% of the value of the spike-in calculated bacterial load.
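The two agreement measures referred to above might be computed as sketched below in Python. The sketch assumes that "within a factor of 3" refers to the larger of the two possible ratios between paired estimates, and that the average absolute error is the mean of the absolute differences expressed relative to the spike-in value. Function names and data are hypothetical.

```python
def fold_difference(a, b):
    """Symmetric fold difference between two positive values (always >= 1)."""
    return max(a / b, b / a)


def mean_absolute_relative_error(estimates, references):
    """Mean of |estimate - reference| / reference over paired values."""
    errors = [abs(e - r) / r for e, r in zip(estimates, references)]
    return sum(errors) / len(errors)


# Hypothetical bacterial loads (cells/ml) for three samples.
spike = [1.0e4, 2.0e5, 5.0e6]
host = [1.5e4, 1.6e5, 5.0e6]
worst_fold = max(fold_difference(h, s) for h, s in zip(host, spike))
mare = mean_absolute_relative_error(host, spike)
```

For these hypothetical values the largest fold difference is 1.5 and the mean absolute relative error is approximately 23%.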

The representative method using host cell data was effective in determining the number of commercially relevant target bacteria. The two most common dairy pathogens in New Zealand are Streptococcus uberis and Staphylococcus aureus. Figure 2 shows the amount of Streptococcus uberis (top graph) and Staphylococcus aureus (bottom graph) in the bulk milk samples, as calculated using the host cell count (SCC) as compared to using a bacterial spike-in. A strong correlation (R² > 0.97) between the two results was observed. The representative method using host cell data was capable of producing results for multiple species of bacteria in a cost-effective and timely manner compared to industry standard methods when implemented to detect multiple species of microorganism. In Herd Test samples for which the Staphylococcus aureus qPCR test returned a CT value, the CT value was converted to a cells/ml value using a cultured dilution series for calibration. Figure 3 compares those calculated values to the amount of Staphylococcus aureus in the Herd Test samples as calculated from the shotgun sequenced samples using equation (3b).
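The conversion of CT values to cells/ml is not detailed above. As a non-limiting sketch, and assuming a conventional log-linear standard curve fitted by least squares to the dilution series (an assumption, as are all names and values shown), the conversion might be implemented in Python as:

```python
def fit_standard_curve(log10_concentrations, ct_values):
    """Least-squares fit of CT = intercept + slope * log10(cells/ml)."""
    n = len(log10_concentrations)
    mean_x = sum(log10_concentrations) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_concentrations, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_concentrations)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ct_to_cells_per_ml(ct, slope, intercept):
    """Invert the standard curve to recover a concentration from a CT value."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series: log10(cells/ml) and the CT value observed.
series_log10 = [2.0, 3.0, 4.0, 5.0]
series_ct = [33.0, 29.7, 26.4, 23.1]
slope, intercept = fit_standard_curve(series_log10, series_ct)
cells = ct_to_cells_per_ml(28.05, slope, intercept)
```

In this hypothetical series each ten-fold increase in concentration lowers the CT value by 3.3 cycles, and a CT of 28.05 corresponds to approximately 3,160 cells/ml.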

The results obtained with bulk milk samples clearly established the ability of the representative host cell method to detect a substantial number of bacterial species present in commercially relevant samples. To verify this ability, three milk samples were spiked with the ZymoBIOMICS™ Microbial Community Standard II (Log Distribution) reagent comprising a known concentration of each of 10 microbial species.

After shotgun sequencing of these samples, the amounts of each of the 10 microbes were calculated using formula 3b. As can readily be seen in Figure 4, the amounts of microbes determined by the method disclosed herein track closely to the manufacturer-stated amounts, with an RMSRE on the log-scale data of 16%. For the species Bacillus subtilis, reads were classified at the Family taxonomic level (Bacillaceae). The results obtained with Cryptococcus neoformans had the largest outliers in relative log values, which without wishing to be bound by any theory is believed to result from variation in lysability of this yeast given its thick and resistant capsule accounting for over 70% of cellular volume. Strong concordance between the abundance of other species determined in the representative method and the abundance as stated by the manufacturers was observed.
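The RMSRE statistic is not defined above. The Python sketch below assumes it denotes the root mean square relative error, taken here as the square root of the mean squared difference between measured and stated log10 values, each expressed relative to the stated log10 value. This definition, the function name and the data are assumptions made for illustration.

```python
import math


def rmsre_log10(measured, stated):
    """Root mean square relative error computed on log10-transformed values."""
    relative_errors = []
    for m, s in zip(measured, stated):
        log_m, log_s = math.log10(m), math.log10(s)
        # Undefined when the stated amount is exactly 1 (log10 of 1 is 0).
        relative_errors.append((log_m - log_s) / log_s)
    return math.sqrt(sum(e * e for e in relative_errors) / len(relative_errors))


# Hypothetical stated and measured amounts (cells/ml) for three species.
stated = [1e2, 1e4, 1e6]
measured = [1e3, 1e4, 1e6]
value = rmsre_log10(measured, stated)
```

In this hypothetical case one species is over-estimated ten-fold, which is a 50% error on its log10 value and yields an RMSRE of approximately 29% across the three species.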

Discussion

The data presented above strongly supports the utility of the methods described herein in quantifying the absolute abundance of microorganisms present in a biological sample using nucleic acid sequencing techniques, and without the need for spiking or other sample manipulation performed so as to enable microbiota quantification. The ready implementation, ease of sample handling, and ready analysis enabled by such methods are expected to have a substantial benefit to monitoring individual subject and population health and wellbeing, to enable the rapid identification of clinically relevant microbial populations, and improve animal health and production outcomes.

Example 2: Quantification of microbiota in dairy milk samples

This example reports the assessment of methods to determine the absolute abundance of microbiota of various types present in dairy milk samples using shotgun sequencing, and the use of such methods to determine information regarding the samples.

Materials and methods

The bulk milk samples and Herd Test samples were obtained and processed as described in Example 1 above. The Herd Test samples comprised 5038 total samples from 1890 individual cows across 5 farms collected in the 2019-2022 period.

Results

The number of sequencing reads per sample and the methods for determining the absolute number of bacteria present in milk samples were as discussed in Example 1 above.

Figure 5 herein shows the distribution of concentration by species across the bulk milk samples analysed. The concentration of important representative bacterial species Staphylococcus aureus (Figure 5A), Corynebacterium bovis (Figure 5B), and Bifidobacterium pseudolongum (Figure 5C) is shown, establishing that accurate monitoring of the abundance of multiple species in a single assay capable of routine application has been achieved.

Figure 6 shows the variation in the abundance of a single bacterial species, Staphylococcus aureus, in Herd Test milk samples obtained in different months (Figure 6A) and in different geographic regions (Figure 6B). Thus, the methods contemplated herein enable period- and region-specific information to be determined, having application in animal wellbeing and production management.

The method employed in this Example enables the rapid identification of outlier samples on a range of considerations. For example, Figure 7 presents data showing the variation of microbiomes in samples obtained from different regions across multiple time periods. Using principal component analysis, outlier samples (boxed regions) can readily be identified for further analysis or follow-up. Similarly, Figure 8 presents a second analysis of microbiome variation to identify outliers on the basis of host cell count, with outlier samples shown in red. Again, the methods employed in this example enable the ready identification of samples, and here suppliers, for which further follow-up (such as, for example, increased monitoring, prophylactic treatment or management of identified bacterial infections, and the like) could be considered.

Analysis of the samples to determine microbiome variation in conjunction with principal component analysis enables the observation of variation in bacterial populations across samples obtained in different regions, as shown in Figure 9, and in samples obtained at different time points, as shown in Figure 10. These analyses enable the ready identification of bacterial populations becoming prominent, for example in different seasons (see Figure 11). Thus, trends in microbial populations over time and location can be observed, enabled by the accurate quantification of microbial loading in routinely collected milk samples.

Discussion

The data presented above support the utility of the methods described herein in determining the absolute abundance of microorganisms present in a biological sample, which, given the accuracy of the determinations enabled by the methods disclosed herein, enables various trends in microbial populations to be observed across different regions and over the time periods from which samples are obtained. This in turn enables the implementation of appropriate and responsive animal and farm management practices, for example the initiation of prophylactic treatment regimens when a problematic abundance or population of microorganisms is identified, for example, in a sample from a particular supplier, or when an increased likelihood of a microbial population of concern is identified on the basis of related information, such as the presence of such populations in other farms in the region.

Example 3: Quantification of microbiota in dairy milk samples

This example illustrates the use of methods to determine the absolute abundance of microbiota of various types present in faecal samples from dairy cows and effluent samples from dairy farms using sequencing.

Materials and methods

Faecal and Effluent Sampling

325 faecal samples were aseptically collected from individual cows on a Waikato dairy farm between 2020 and 2023. Defecation was induced by stimulating the cow's rectum with a gloved hand, and faeces were collected into a 35 mL vial. Faecal samples were placed on ice after collection, before being frozen at -20°C until time for DNA extraction.

65 effluent samples were collected from five locations on four New Zealand dairy farms, mostly in 2016. Locations included the effluent pond, sand trap inlet, sand trap outlet, effluent sump, and effluent stirrer. Samples were collected into 70 mL containers and placed on ice after collection, before being frozen at -20°C until time for DNA extraction.

Faecal and Effluent DNA extraction

Samples were defrosted overnight at 4°C. The samples were then mixed, before weighing out 100 mg into a 2 mL nuclease free tube (Bio-strategy, NZ). Then, 1 mL of 1X DNA/RNA Shield (Zymo Research) was added to the tube, and vortexed to completely re-suspend the sample before spiking with 20 µL of ZymoBIOMICS™ Spike-in Control I (High Microbial Load).

Samples were extracted using a custom BioSprint® 96 DNA Kit (Qiagen, Germany) and a Kingfisher (Thermofisher) machine, and eluted into 100 µL of elution buffer.

Quantification of absolute abundances took place as described in Example 1 above.

Results

An average of 18.9 million reads per sample was obtained.

The most commonly observed species among the faecal samples were Methanobrevibacter millerae, Rhodococcus coprophilus, and Streptococcus equinus. Figure 12 shows a plot of these species in the faecal samples.

In effluent samples the most commonly observed species were Acinetobacter pseudolwoffii, Corynebacterium xerosis, Corynebacterium marinum, Bifidobacterium pseudolongum, Jeotgalibaca porci, and Acinetobacter lwoffii. Figure 13 shows a plot of these species in the effluent samples.

The bacterium which showed the most differences in location and time across the effluent and faecal samples was Bifidobacterium pseudolongum. Figure 14 shows a plot of this species in the effluent samples across the 4 different farms, and Figure 15 shows a plot of this species in the faecal samples across the different years.

The primary food of New Zealand dairy cattle is typically ryegrass (Lolium genus). Figure 16 shows a plot quantifying the amount of Lolium in the effluent and faecal samples. It will be appreciated that the methods exemplified in this Example are useful in establishing and monitoring a number of considerations relevant to animal production, and animal and environmental health, wellbeing and status, including the incidence, prevalence, and distribution of microbial populations in both animal subjects and in the supporting environment, as well as in monitoring and validating the provenance of animal subjects, for example with reference to food consumption and wellbeing.

Publications

Wood, D.E., J. Lu, and B. Langmead. 2019. Improved metagenomic analysis with Kraken 2. Genome Biology 20:257. doi:10.1186/s13059-019-1891-0.

Land, M., L. Hauser, S.-R. Jun, I. Nookaew, M.R. Leuze, T.-H. Ahn, T. Karpinets, O. Lund, G. Kora, T. Wassenaar, S. Poudel, and D.W. Ussery. 2015. Insights from 20 years of bacterial genome sequencing. Funct Integr Genomics 15:141-161. doi:10.1007/s10142-015-0433-4.

The entire disclosures of all applications, patents and publications cited above and below, if any, are herein incorporated by reference.

Where in the foregoing description reference has been made to integers or components having known equivalents thereof, those integers are herein incorporated as if individually set forth.

It should be noted that various changes and modifications to the presently preferred examples described herein will be apparent to those skilled in the art. Such changes and modifications may be made without departing from the spirit and scope of the invention and without diminishing its attendant advantages. It is therefore intended that such changes and modifications be included within the present invention.

The invention may also be said broadly to consist in the parts, elements and features referred to or indicated in the specification of the application, individually or collectively, in any or all combinations of two or more of said parts, elements or features.

Aspects of the invention have been described by way of example only, and it should be appreciated that variations, modifications and additions may be made without departing from the scope of the invention, for example, when present, the invention as defined in the indicative claims. Furthermore, where known equivalents exist to specific features, such equivalents are incorporated as if specifically referred to in this specification.