

Title:
CELL-FREE RNA LIBRARY PREPARATIONS
Document Type and Number:
WIPO Patent Application WO/2020/092646
Kind Code:
A1
Abstract:
Diverse cDNA libraries derived from cell-free mRNA and methods of preparing the same are provided. The library may be prepared by extracting RNA from a bodily fluid such as serum or plasma, separating the RNA from contaminants, synthesizing cDNA with a reverse transcriptase enzyme, and enriching protein-coding nucleotide sequences. The library may include a multiplicity of transcripts from solid tissues. Cf-RNA can be measured by qPCR, sequencing, or other suitable methods.

Inventors:
SALATHIA NEERAJ (US)
NERENBERG MICHAEL (US)
IBARRA ARKAITZ (US)
ZHUANG JIALI (US)
Application Number:
PCT/US2019/058961
Publication Date:
May 07, 2020
Filing Date:
October 30, 2019
Assignee:
MOLECULAR STETHOSCOPE INC (US)
SALATHIA NEERAJ (US)
International Classes:
C40B40/08; C12Q1/6883; C40B40/10
Foreign References:
US20130012405A1 2013-01-10
US20160032396A1 2016-02-04
US20180126003A1 2018-05-10
Other References:
See also references of EP 3874079A4
Attorney, Agent or Firm:
RIRIE, Ted (US)
Claims:
CLAIMS

1. A method of preparing a cf-RNA sample, the method comprising:

(a) centrifuging a biological sample at from 1,600 g to 16,000 g; and

(b) isolating RNA from the biological sample;

wherein at least 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, or 1500 non-blood genes selected from the list in Table 1 or low stringency non-blood genes selected from Table 10 are present in the cf-RNA sample.

2. The method of claim 1, wherein the biological sample is cell-free.

3. The method of claim 1 or 2, wherein the biological sample is serum, plasma, saliva, urine, interstitial fluid, cerebrospinal fluid, semen, vaginal fluid, amniotic fluid, tears, synovial fluid, mucus, or lymphatic fluid.

4. The method of any one of claims 1-3, wherein the biological sample is serum or plasma.

5. The method of any one of claims 1-4, further comprising performing a size selection or immune selection in the biological sample prior to (b).

6. The method of claim 5, wherein performing the size selection comprises centrifugation of the biological sample.

7. The method of claim 6, wherein the centrifugation is performed for at least 1 minute, for at least 10 minutes, from 5 minutes to 20 minutes, from 10 minutes to 15 minutes, or for about 10 minutes.

8. The method of claim 5, wherein performing the size selection comprises filtering the sample.

9. The method of any one of claims 1-8, wherein the biological sample is centrifuged at from 10,000 g to 15,000 g.

10. The method of any one of claims 1-9, wherein the biological sample is centrifuged at about 12,000 g.

11. The method of any one of claims 1-10, wherein (b) comprises isolating an extracellular vesicle from the biological sample and isolating the RNA from the extracellular vesicle.

12. The method of claim 11, wherein the extracellular vesicle is an exosome.

13. The method of any one of claims 1-12, wherein (b) comprises isolating a nucleoprotein complex from the biological sample and isolating the RNA from the nucleoprotein complex.

14. The method of any one of claims 1-13, further comprising treating the RNA with a deoxyribonuclease.

15. The method of claim 14, wherein the deoxyribonuclease is TurboDNase I.

16. The method of claim 14 or 15, wherein the RNA is in solution when treated with the deoxyribonuclease.

17. The method of any one of claims 1-16, wherein (b) comprises contacting the RNA with at least one of an affinity column, a desalting column, or a silica membrane.

18. The method of claim 17, wherein (b) comprises contacting the RNA with an affinity column, a desalting column, and a silica membrane.

19. The method of any one of claims 1-18, further comprising enriching at least one protein-coding nucleotide sequence.

20. The method of any one of claims 1-19, comprising depleting ribosomal RNA sequences from the RNA.

21. A method of identifying a cf-RNA molecule, the method comprising:

(a) isolating RNA from a biological sample;

(b) preparing a cDNA library from the RNA;

(c) sequencing the cDNA library; and

(d) identifying at least one gene in the cDNA library, wherein the biological sample is substantially cell-free, and wherein at least 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, or 1500 non-blood genes selected from the list in Table 1 or low stringency non-blood genes selected from Table 10 are detected.

22. The method of claim 21, wherein the biological sample is cell free.

23. The method of claim 21 or 22, further comprising (e) aligning sequences from the cDNA library to a reference genome.

24. The method of any one of claims 21-23, wherein the biological sample is serum, plasma, saliva, urine, interstitial fluid, cerebrospinal fluid, semen, vaginal fluid, amniotic fluid, tears, synovial fluid, mucus, or lymphatic fluid.

25. The method of any one of claims 21-23, wherein the biological sample is serum or plasma.

26. The method of any one of claims 21-25, wherein at least 35, 50, 75, 100, 200, 300, 400, or 500 tissue-specific genes selected from Table 2 are identified.

27. The method of any one of claims 21-25, wherein at least 20, 30, 40, 50, 100, or 150 brain-specific genes selected from Table 6 are identified.

28. The method of any one of claims 21-25, wherein at least 3, 5, 10, 20, or 50 liver-specific genes selected from Table 7 or one liver-diagnostic genes selected from Table 8 are identified.

29. The method of any one of claims 21-25, wherein at least 3, 5, 10, or 15 pregnancy associated genes selected from Table 9 are identified.

30. The method of any one of claims 21-28, wherein a first gene is identified, and wherein the RNA comprises less than 500, 200, 150, 100, 50, 25, or 15 cf-mRNA polynucleotides that align to the first gene.

31. The method of any one of claims 21-30, wherein at least 2, 4, 6, 8, or 10 unique fragments are detected per 100 reads.

32. The method of any one of claims 21-31, wherein at least 2, 4, 6, 8, or 10 protein-coding genes are detected per 10,000 reads.

33. The method of any one of claims 21-32, further comprising performing a size selection or immune selection in the biological sample prior to (a).

34. The method of claim 33, wherein performing the size selection comprises centrifugation of the biological sample.

35. The method of claim 34, further comprising centrifuging the biological sample at from 1,600 g to 16,000 g.

36. The method of claim 35, wherein the centrifuging is performed for at least 1 minute, for at least 5 minutes, for at least 10 minutes, from 5 minutes to 20 minutes, from 10 minutes to 15 minutes, or for about 10 minutes.

37. The method of claim 35 or 36, wherein the biological sample is centrifuged at from 10,000 g to 15,000 g.

38. The method of claim 35 or 36, wherein the biological sample is centrifuged at about 12,000 g.

39. The method of claim 34, wherein performing the size selection comprises filtering the sample.

40. The method of any one of claims 21-39, wherein (a) comprises isolating an extracellular vesicle from the biological sample and isolating the RNA from the extracellular vesicle.

41. The method of claim 40, wherein the extracellular vesicle is an exosome.

42. The method of any one of claims 21-41, wherein (a) comprises isolating a nucleoprotein complex from the biological sample and isolating the RNA from the nucleoprotein complex.

43. The method of any one of claims 21-42, further comprising adding an exogenous RNA polynucleotide comprising a first nucleotide sequence to the biological sample and detecting a cDNA polynucleotide comprising the first nucleotide sequence, wherein the first nucleotide sequence of the cDNA polynucleotide comprises a thymine at each position where the first nucleotide sequence of the RNA polynucleotide comprises a uracil.

44. The method of any one of claims 21-43, further comprising treating the RNA with a deoxyribonuclease.

45. The method of claim 44, wherein the deoxyribonuclease is TurboDNase I.

46. The method of claim 44 or 45, wherein the RNA is in solution when treated with the deoxyribonuclease.

47. The method of any one of claims 21-46, wherein (a) comprises contacting the RNA with at least one of an affinity column, a desalting column, or a silica membrane.

48. The method of claim 47, wherein (a) comprises contacting the RNA with an affinity column, a desalting column, and a silica membrane.

49. The method of any one of claims 21-48, wherein (b) comprises contacting the RNA with a primer comprising a random sequence.

50. The method of claim 49, wherein the primer is a hexanucleotide.

51. The method of claim 50, wherein the concentration of the hexanucleotide is at least 60 µM, 70 µM, 80 µM, 90 µM, 100 µM, 150 µM, 200 µM, 300 µM, 400 µM, 500 µM, 600 µM, 700 µM, 800 µM, 900 µM, 1000 µM, 1100 µM, 1200 µM, 1300 µM, 1400 µM, or 1500 µM.

52. The method of any one of claims 21-51, wherein (b) comprises forming a single-stranded cDNA.

53. The method of claim 52, further comprising contacting the RNA with a reverse transcriptase to form the single-stranded cDNA.

54. The method of any one of claims 51-53, further comprising forming a double-stranded cDNA from the single-stranded cDNA.

55. The method of claim 54, further comprising contacting the single-stranded DNA with a NEBNext DNA polymerase to form the double-stranded cDNA.

56. The method of claim 55, further comprising ligating unique dual indexes to both ends of the double-stranded cDNA.

57. The method of any one of claims 21-56, further comprising enriching at least one protein-coding nucleotide sequence.

58. The method of claim 57, comprising depleting ribosomal RNA sequences from the RNA.

59. The method of claim 57, comprising depleting ribosomal RNA sequences from the cDNA library.

60. The method of any one of claims 57-59, comprising isolating the at least one protein coding sequence from the RNA.

61. The method of any one of claims 57-59, comprising isolating the at least one protein coding sequence from the cDNA.

62. The method of claim 61, comprising hybridizing whole exome baits to the cDNA.

63. The method of claim 62, wherein the whole exome baits comprise RNA polynucleotides.

64. The method of claim 62, wherein the whole exome baits comprise DNA polynucleotides.

65. A method of detecting at least 10, 20, 30, 50, or 100 non-blood cf-mRNAs genes in a biological sample, the method comprising:

(a) centrifuging a serum or plasma sample for at least 10 minutes at from 8,000 g to 16,000 g to form a supernatant;

(b) extracting RNA from the supernatant;

(c) contacting the RNA with a deoxyribonuclease;

(d) forming cDNA from the RNA;

(e) preparing a cDNA library from the cDNA;

(f) sequencing the cDNA library; and

(g) aligning the sequences to a reference genome to identify sequences arising from at least 10, 20, 30, 50, or 100 non-blood cf-mRNAs genes per biological sample.

66. The method of claim 65, further comprising (h) contacting the cDNA library with baits comprising polynucleotide fragments from at least 10, 20, 30, 50, or 100 genes of interest to enrich translated genes.

67. The method of claim 65, wherein (d) comprises contacting the RNA with a reverse transcriptase enzyme to form a single-stranded cDNA and contacting the single-stranded cDNA with a second strand synthesis enzyme to form double-stranded cDNA.

68. The method of claim 65, wherein (e) further comprises ligating unique dual indexes to the cDNA library to form an indexed cDNA library.

69. The method of claim 68, further comprising pooling up to ten indexed cDNA libraries.

70. The method of claim 69, wherein (f) comprises performing massively parallel sequencing on the pooled cDNA libraries.

71. The method of any one of claims 65-70, wherein at least 25, 50, 75, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 genes are detected in the biological sample.

72. The method of any one of claims 65-71, wherein the sequences are aligned to a reference genome to identify sequences arising from at least 25, 50, 75, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 non-blood cf-mRNAs genes per biological sample.

73. The method of any one of claims 65-70, further comprising contacting the single-stranded cDNA with a second-strand synthesis enzyme to form the double-stranded cDNA.

74. The method of any one of claims 65-70, wherein (c) is performed in solution.

75. A method of detecting at least 10, 20, 30, 50, or 100 non-blood cf-mRNAs genes in a biological sample, the method comprising:

(a) centrifuging or filtering a serum or plasma sample at from 1,900 g to 16,000 g;

(b) extracting an RNA sample from the supernatant;

(c) contacting the RNA sample with a deoxyribonuclease;

(d) contacting the RNA with a reverse transcriptase enzyme to form a single-stranded cDNA;

(e) contacting the single-stranded cDNA with a second-strand synthesis enzyme to form the double stranded cDNA;

(f) preparing a cDNA library from the double-stranded cDNA;

(g) contacting the indexed cDNA library with baits comprising polynucleotide fragments to enrich translated genes;

(h) sequencing the cDNA library; and

(i) aligning the sequences to a reference genome to identify sequences arising from at least 10, 20, 30, 50, or 100 non-blood cf-mRNAs genes per biological sample.

76. The method of claim 65, further comprising (j) adding unique dual indexes to the cDNA library to form an indexed cDNA library.

77. The method of claim 66, further comprising (k) pooling up to ten indexed cDNA libraries.

78. The method of claim 77, further comprising (l) performing massively parallel sequencing on the pooled cDNA libraries.

79. The method of any one of claims 75-78, wherein (c) is performed in solution.

80. A cf-mRNA sequencing library comprising cDNA molecules arising from at least 500, 600, 700, 800, 900, 1000, 1250, or 1500 non-blood genes selected from the list in Table 1 or low stringency non-blood genes selected from Table 10.

81. A cf-mRNA sequencing library comprising at least one cDNA molecule arising from at least 25, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 non-blood genes selected from the list in Table 1 or low stringency non-blood genes selected from Table 10 per 1,000,000 cDNA polynucleotides.

82. The cf-mRNA sequencing library of claim 80 or 81, comprising cDNA molecules arising from at least 35, 50, 100, 200, 300, 400, or 500 tissue-specific genes selected from Table 2.

83. The cf-mRNA sequencing library of claim 80 or 81, comprising cDNA molecules arising from at least 18, 20, 50, or 100 brain-specific genes selected from Table 6.

84. The cf-mRNA sequencing library of claim 80 or 81, comprising cDNA molecules arising from at least 3, 5, 10, 20, or 50 liver-specific genes selected from Table 7 or liver genes of interest selected from Table 8.

85. The cf-mRNA sequencing library of claim 80 or 81, comprising cDNA molecules arising from at least 3, 5, 10, or 15 pregnancy-associated genes selected from Table 9.

86. A cf-mRNA sequencing library comprising cDNA polynucleotides arising from at least 2000, 3000, 4000, 5000, or 6000 protein coding genes, wherein at least 8%, 15%, or 24% of the protein coding genes are non-blood genes.

Description:
CELL-FREE RNA LIBRARY PREPARATIONS

CROSS REFERENCE

[0001] This application claims the benefit of U.S. Provisional Application No. 62/752,533, filed October 30, 2018, which is entirely incorporated herein by reference.

BACKGROUND

[0002] A variety of markers are available for detecting various conditions. However, many of these conditions can affect different tissues. Detecting markers of these conditions in circulation, such as in a blood sample, is not always helpful in identifying which tissue is affected. For example, generic markers for inflammation can indicate an inflammatory response somewhere in the body, but it may not be known which tissue is suffering the response, such as the liver, kidney, lungs, or joints. Tissue-specific tests, such as biopsies, are often invasive, carry a risk of infection, and are typically not comprehensive of the entire organ or tissue. Imaging techniques, such as MRIs and CT scans, may be used to assess tissue health, but generally can only detect overt features and changes. Thus, these imaging techniques are generally not sensitive enough to pick up early onset of conditions or fairly recent developments of conditions.

INCORPORATION BY REFERENCE

[0003] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

SUMMARY

[0004] Cell-free mRNA provides a potential window into the health, phenotype, and developmental programs of a variety of tissues and organs. The present disclosure provides diverse cell-free mRNA libraries enriched in non-blood genes and methods of preparing the same.

[0005] In one aspect, provided herein is a method of preparing a cf-RNA sample comprising (a) centrifuging a biological sample at from 1,600 g to 16,000 g and (b) isolating RNA from the biological sample, wherein at least 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, or 1500 non-blood genes selected from the list in Table 1 or low stringency non-blood genes selected from Table 10 are present in the cf-RNA sample. The biological sample may be a cell-free biological sample; and may be serum, plasma, saliva, urine, interstitial fluid, cerebrospinal fluid, semen, vaginal fluid, amniotic fluid, tears, synovial fluid, mucus, or lymphatic fluid. In certain embodiments, the biological sample is serum or plasma.

[0006] The method of preparing a cf-RNA sample may comprise performing a size selection or immune selection in the biological sample prior to isolating RNA from the biological sample. In some embodiments, performing the size selection comprises centrifugation of the biological sample. Centrifugation may be performed for at least 1 minute, for at least 10 minutes, from 5 minutes to 20 minutes, from 10 minutes to 15 minutes, or for about 10 minutes. In some embodiments, the biological sample is centrifuged at from 10,000 g to 15,000 g. In some embodiments, the biological sample is centrifuged at about 12,000 g. In some embodiments, performing the size selection comprises filtering the sample.

[0007] In some embodiments, isolating RNA from the biological sample comprises isolating an extracellular vesicle, which may be an exosome, from the biological sample and isolating the RNA from the extracellular vesicle. In some embodiments, isolating RNA from the biological sample comprises isolating a nucleoprotein complex from the biological sample and isolating the RNA from the nucleoprotein complex.

[0008] The method of preparing a cf-RNA sample may further comprise treating the RNA with a deoxyribonuclease. In some aspects, the deoxyribonuclease is TurboDNase I. In some embodiments, the RNA is treated with the deoxyribonuclease in solution.

[0009] In some embodiments, isolating RNA from the biological sample comprises contacting the RNA with at least one of an affinity column, a desalting column, or a silica membrane. In additional embodiments, the RNA is contacted with an affinity column, a desalting column, and a silica membrane.

[0010] In some embodiments, the method of preparing a cf-RNA sample further comprises enriching at least one protein-coding nucleotide sequence. In other embodiments, the method of preparing a cf-RNA sample comprises depleting ribosomal RNA sequences from the RNA.

[0011] In another aspect, provided herein is a method of identifying a cf-RNA molecule comprising (a) isolating RNA from a biological sample; (b) preparing a cDNA library from the RNA; (c) sequencing the cDNA library; and (d) identifying at least one gene in the cDNA library, wherein the biological sample is substantially cell-free, and wherein at least 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, or 1500 non-blood genes selected from the list in Table 1 or low stringency non-blood genes selected from Table 10 are detected. In some embodiments, the method of identifying a cf-RNA molecule further comprises aligning sequences from the cDNA library to a reference genome.

[0012] In this method of identifying a cf-RNA molecule, in some aspects, the biological sample is cell free. In some embodiments, the biological sample is serum, plasma, saliva, urine, interstitial fluid, cerebrospinal fluid, semen, vaginal fluid, amniotic fluid, tears, synovial fluid, mucus, or lymphatic fluid. In other embodiments, the biological sample is serum or plasma.

[0013] In some embodiments, the method of identifying a cf-mRNA molecule identifies at least 1, 5, 10, 20, 50, 100, 200, 300, 400, or 500 tissue-specific genes selected from Table 2; at least 1, 5, 10, 20, 50, 100, or 150 brain-specific genes selected from Table 6; at least 1, 5, 10, 20, or 50 liver-specific genes selected from Table 7 or liver-diagnostic genes from Table 8; or any combination thereof.

[0014] In some aspects, the method of identifying a cf-RNA molecule provided herein comprises identifying a first gene, wherein the RNA comprises less than 500, 200, 150, 100, 50, 25, or 15 cf-mRNA polynucleotides that align to the first gene.

[0015] In some embodiments of the method of identifying a cf-mRNA molecule, at least 2, 4, 6, 8, or 10 unique fragments are detected per 100 reads. In some embodiments, at least 2, 4, 6, 8, or 10 protein-coding genes are detected per 10,000 reads.
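The two per-read rates named above can be computed from summary counts of a sequencing run. The following is a minimal sketch, assuming the total read count, the number of unique fragments, and the number of protein-coding genes detected are already available; the function name and the example counts are hypothetical and are not taken from this disclosure.

```python
def library_complexity(total_reads, unique_fragments, protein_coding_genes):
    """Return (unique fragments per 100 reads, protein-coding genes per 10,000 reads)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    fragments_per_100_reads = 100.0 * unique_fragments / total_reads
    genes_per_10k_reads = 10_000.0 * protein_coding_genes / total_reads
    return fragments_per_100_reads, genes_per_10k_reads


# Hypothetical counts, chosen only to illustrate the arithmetic.
frags_per_100, genes_per_10k = library_complexity(
    total_reads=1_000_000,
    unique_fragments=80_000,
    protein_coding_genes=600,
)
print(frags_per_100, genes_per_10k)
```

Each rate can then be compared against whichever of the recited thresholds (2, 4, 6, 8, or 10) is of interest.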

[0016] In some embodiments, the method of identifying a cf-mRNA molecule further comprises performing a size selection or immune selection in the biological sample prior to isolating RNA from a biological sample. In some aspects, the size selection comprises centrifugation of the biological sample. The biological sample may be centrifuged at from 1,600 g to 16,000 g; and may be centrifuged for at least 1 minute, for at least 5 minutes, for at least 10 minutes, from 5 minutes to 20 minutes, from 10 minutes to 15 minutes, or for about 10 minutes. In some embodiments, the biological sample is centrifuged at from 10,000 g to 15,000 g, or at about 12,000 g. In other embodiments, performing the size selection comprises filtering the sample.

[0017] In some embodiments of the method of identifying a cf-RNA molecule provided herein, isolating RNA from a biological sample comprises isolating an extracellular vesicle from the biological sample and isolating the RNA from the extracellular vesicle. In some embodiments, the extracellular vesicle is an exosome.

[0018] In some embodiments, isolating RNA from a biological sample comprises isolating a nucleoprotein complex from the biological sample and isolating the RNA from the nucleoprotein complex. In some embodiments, the method of identifying a cf-RNA molecule further comprises adding an exogenous RNA polynucleotide comprising a first nucleotide sequence to the biological sample and detecting a cDNA polynucleotide comprising the first nucleotide sequence, wherein the first nucleotide sequence of the cDNA polynucleotide comprises a thymine at each position where the first nucleotide sequence of the RNA polynucleotide comprises a uracil.
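The relationship between the exogenous RNA sequence and the cDNA sequence to be detected can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the example sequence is invented, and the search ignores reverse-complement reads and sequencing errors, which a real pipeline would handle through alignment.

```python
def expected_cdna_sequence(rna_seq):
    """Return the cDNA-sense sequence: a thymine at every position where the RNA has a uracil."""
    rna_seq = rna_seq.upper()
    invalid = set(rna_seq) - set("ACGU")
    if invalid:
        raise ValueError(f"unexpected RNA characters: {sorted(invalid)}")
    return rna_seq.replace("U", "T")


def spike_in_detected(rna_seq, reads):
    """True if any read contains the expected cDNA sequence (exact substring match only)."""
    target = expected_cdna_sequence(rna_seq)
    return any(target in read.upper() for read in reads)


# Invented spike-in sequence and reads, for illustration only.
print(expected_cdna_sequence("AUGGCUUAA"))
print(spike_in_detected("AUGGCU", ["ccATGGCTgg", "TTTTTT"]))
```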

[0019] In some embodiments, the method of identifying a cf-RNA molecule further comprises treating the RNA with a deoxyribonuclease. In some embodiments, the deoxyribonuclease is TurboDNase I. In some embodiments, the RNA is in solution when treated with the deoxyribonuclease.

[0020] In some embodiments, the isolating RNA from a biological sample step comprises contacting the RNA with at least one of an affinity column, a desalting column, or a silica membrane. In further embodiments, the RNA is contacted with an affinity column, a desalting column, and a silica membrane.

[0021] In some aspects, preparing a cDNA library from the RNA comprises contacting the RNA with a primer comprising a random sequence, which may be a random hexanucleotide. In some embodiments, the concentration of the random hexanucleotide is at least 60 µM, 70 µM, 80 µM, 90 µM, 100 µM, 150 µM, 200 µM, 300 µM, 400 µM, 500 µM, 600 µM, 700 µM, 800 µM, 900 µM, 1000 µM, 1100 µM, 1200 µM, 1300 µM, 1400 µM, or 1500 µM.

[0022] In some embodiments, the preparing a cDNA library from the RNA step of identifying a cf-mRNA molecule comprises forming a single-stranded cDNA. In some aspects, the method further comprises contacting the RNA with a reverse transcriptase to form the single-stranded cDNA. In additional aspects, a double-stranded cDNA is formed from the single-stranded cDNA. In yet further aspects, the single-stranded DNA is contacted with a NEBNext DNA polymerase to form the double-stranded cDNA. In some embodiments, the method further comprises ligating unique dual indexes to both ends of the double-stranded cDNA.

[0023] In some embodiments, the method of identifying a cf-RNA molecule further comprises enriching at least one protein-coding nucleotide sequence. The enrichment comprises depleting ribosomal RNA sequences from the RNA in some embodiments, and depleting ribosomal RNA sequences from the cDNA library in some embodiments. In some embodiments, enriching at least one protein-coding nucleotide sequence comprises isolating the at least one protein-coding sequence from the RNA, or from the cDNA. In some embodiments, enriching at least one protein-coding nucleotide sequence comprises hybridizing whole exome baits to the cDNA. The whole exome baits may be RNA polynucleotides or DNA polynucleotides.

[0024] Other aspects of the disclosure provided herein are cf-mRNA sequencing libraries comprising cDNA molecules arising from at least 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 non-blood genes selected from the list in Table 1 or low stringency non-blood genes selected from Table 10; at least 1, 10, 20, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 non-blood genes selected from the list in Table 1 or low stringency non-blood genes selected from Table 10 per 1,000,000 cDNA polynucleotides; at least 5, 10, 20, 50, 100, 200, 300, 400, or 500 tissue-specific genes selected from Table 2; at least 1, 5, 10, 20, 50, or 100 brain-specific genes selected from Table 6; or at least 1, 5, 10, 20, or 50 liver-specific genes selected from Table 7.

[0025] Yet another aspect of the disclosure is a cf-mRNA sequencing library comprising cDNA polynucleotides arising from at least 2000, 3000, 4000, 5000, or 6000 protein coding genes, wherein at least 8%, 15% or 24% of the protein coding genes are non-blood genes.
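The proportion of non-blood genes among the detected protein-coding genes reduces to a set intersection. The sketch below assumes that a list of detected protein-coding genes and a reference list of non-blood genes (for example, one drawn from Table 1 or Table 10) are available as gene identifiers; the function name and the placeholder identifiers are hypothetical.

```python
def non_blood_fraction(detected_genes, non_blood_reference):
    """Fraction of detected protein-coding genes that appear in the non-blood reference list."""
    detected = set(detected_genes)
    if not detected:
        return 0.0
    return len(detected & set(non_blood_reference)) / len(detected)


# Placeholder identifiers, not real entries from any table in this disclosure.
detected = ["GENE_1", "GENE_2", "GENE_3", "GENE_4"]
reference = ["GENE_2", "GENE_4", "GENE_9"]

fraction = non_blood_fraction(detected, reference)
print(fraction, fraction >= 0.24)
```

The result can be compared against the 8%, 15%, or 24% levels recited above, together with the count of detected protein-coding genes.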

BRIEF DESCRIPTION OF THE DRAWINGS

[0026] The novel features of the disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings.

[0027] FIG. 1 depicts a flowchart of a method of cf-mRNA analysis according to some embodiments of the present disclosure.

[0028] FIGS. 2A-2E are graphs showing that centrifugation of plasma at forces ranging from 1,900 g to 16,000 g, or filtration of plasma through membranes with pore sizes ranging from 0.2 µm to 0.8 µm, depletes cf-mRNA transcripts derived from red blood cells (FIG. 2A), platelets (FIG. 2B), neutrophils (FIG. 2C), liver (FIG. 2D), and brain (FIG. 2E).

[0029] FIGS. 3A-3B are graphs showing enrichment of non-blood cf-mRNA transcripts with increasing centrifugal force. Blood transcripts are depleted at lower speeds than non-blood transcripts. FIG. 3A shows copy number of blood and non-blood transcripts. The number of transcripts was normalized to 1.0 for the 1,900 g spin. FIG. 3B shows normalized number of blood and non-blood transcripts per million. For each group, the highest number of transcripts per million was normalized to 1.0.

[0030] FIG. 4 is a graph depicting number of non-blood genes detected from cf-mRNA transcripts after centrifugation with forces ranging from 1,900 g to 16,000 g.

[0031] FIG. 5 is a graph depicting cf-RNA yields for three RNA extraction kits as determined by qPCR for β-actin cf-mRNA.

[0032] FIGS. 6A-6B are graphs showing enhanced yield of shorter cf-RNA polynucleotide fragments with the QIAamp Circulating Nucleic Acid kit using an optimized miRNA extraction protocol (FIG. 6A) compared with the manufacturer’s standard nucleic acid extraction protocol as determined by capillary electrophoresis on a Bioanalyzer (FIG. 6B).

[0033] FIG. 7 is a graph depicting enhanced cf-RNA yield with the QIAamp Circulating Nucleic Acid kit using an optimized miRNA extraction protocol compared with the QIAamp ccfDNA/RNA kit.

[0034] FIG. 8 is a graph illustrating that treatment of the extracted cf-RNA with TurboDNase I in solution eliminated trace contamination with DNA polynucleotides as determined by capillary electrophoresis on a Bioanalyzer.

[0035] FIG. 9 is a graph showing that removing inhibitors from the sample increased the apparent yield of cf-RNA as determined by qPCR of 18S rRNA.

[0036] FIGS. 10A-10B are graphs showing that the OneStep PCR inhibitor removal column (Current desalting column) retains less RNA than another column, the Micro-Bio-Spin column (Desalting column 1), as determined by capillary electrophoresis on a Bioanalyzer.

[0037] FIG. 11 is a graph showing reduced cf-RNA extraction failures with the method described in Example 1 (Now) compared with an older method (Before).

[0038] FIG. 12 is a graph showing that recovery of cf-RNA with the method described in Example 1 is linear and consistent, and increases with increased plasma input, as determined by qPCR of β-actin cf-mRNA.

[0039] FIG. 13 is a graph quantifying RNA yields with different reverse transcriptase enzymes as determined by qPCR of 18S cDNA.

[0040] FIG. 14 is a graph depicting conversion of RNA into cDNA from varied amounts of input cf-RNA as determined by qPCR of 18S cDNA.

[0041] FIGS. 15A-15B are graphs depicting unique sequence fragments in cDNA libraries prepared using Accel-NGS 1S Plus (Swift 1) and Accel-NGS 2S Plus (Swift 2) (FIG. 15A) and standard and optimized Accel-NGS 2S Plus protocols (FIG. 15B).

[0042] FIGS. 16A-16B are graphs showing misassigned sequences using unique dual indexes (FIG. 16A) compared to standard indexes (FIG. 16B).

[0043] FIG. 17 depicts abundance of RNA forms in total cf-RNA (Ver1), in rRNA-depleted cf-RNA (Ver2), and upon whole exome capture of mRNA (Ver3) as determined by DNA sequencing.

[0044] FIGS. 18A-18C are graphs showing the sensitivity of cf-mRNA sequencing using the methods described in Example 1 (Swift 1S) compared to the SMARTer kit with rRNA depletion. Detection sensitivity for ERCC standards (FIG. 18A). Number of genes detected (FIG. 18B). Exemplary determinations of the detection sensitivity for ERCC standards (FIG. 18C). ERCC standards were spiked into samples from four patients (Pt 7171, Pt 7131, Pt 7139, and Pt 7155).

[0045] FIGS. 19A-19E are graphs illustrating comparison of cf-mRNA libraries prepared by the method of Example 1 with cf-mRNA libraries described by Pan et al. (FIG. 19A) number of sequencing reads; (FIG. 19B) number of unique fragments detected; (FIG. 19C) number of protein-coding genes detected; (FIG. 19D) number of genes with >80% coverage; and (FIG. 19E) number of liver genes detected.

DETAILED DESCRIPTION

[0046] Provided herein are methods that can employ upfront centrifugation to reduce contamination of unwanted “blood” transcripts from cf-mRNA sequencing data. The methods herein can reduce background noise arising from blood cell RNA (the “blood component”). Such noise can increase sequencing depth requirements and dilute signal from tissue-specific cf-mRNA.

[0047] Protocols, methods and kits disclosed herein can be consistent with a broad range of centrifugal forces, such as ranges spanning, lower than, or greater than from 1,500 g to 20,000 g, 1,900 g to 16,000 g, 4,000 g to 16,000 g, 8,000 g to 16,000 g, 10,000 g to 14,000 g, 11,000 g to 13,000 g, 11,500 g to 12,500 g, about 12,000 g, essentially 12,000 g, or substantially 12,000 g. Some ranges span about 12,000 g. Some ranges are within 100 g of 12,000 g. Some centrifugation protocols do not differ substantially from 12,000 g, such as centrifugations at 12,000 g. Some ranges are within 100 g of 16,000 g. Some centrifugation protocols do not differ substantially from 16,000 g, such as centrifugations at 16,000 g. Alternate ranges having a starting point at a low figure listed above or ending at a high figure listed above are also contemplated. Such centrifugation protocols can contribute to an improvement such as a 2.5x (for example, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, or greater than 4.0x) improvement in the diversity of an extracted cf-RNA sample for processing.

[0048] The rate of separation in a suspension of particles by way of gravitational force applied by centrifugation generally depends on the particle size and density. Particles of higher density or larger size generally travel at a faster rate and at some point can be separated from particles less dense or smaller. Alternative technologies for separating particles according to their size include, but are not limited to, gel filtration chromatography and filtration through size-selective membranes. All such technologies are within the scope of this disclosure.
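The centrifugal forces recited above are given in multiples of g, whereas many benchtop centrifuges are set in revolutions per minute. The following is a minimal sketch of the standard conversion between the two, RCF = 1.118 x 10^-5 x r x rpm^2 with the rotor radius r in centimeters. The rotor radius used here is a hypothetical value chosen only for illustration; it is not taken from this disclosure and should be replaced with the radius of the rotor actually used.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a target relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical rotor radius, for illustration only.
ROTOR_RADIUS_CM = 8.4

for target_g in (1_600, 12_000, 16_000):
    speed = rpm_from_rcf(target_g, ROTOR_RADIUS_CM)
    print(f"{target_g} g at r = {ROTOR_RADIUS_CM} cm requires about {speed:.0f} rpm")
```

Because the force scales with the square of the speed, a tenfold increase in g (for example, from 1,600 g to 16,000 g) requires only about a 3.2-fold increase in rpm on the same rotor.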

[0049] Some commercially available extraction protocols can exhibit high sample extraction failure rates, extract low amounts of cf-mRNA, and fail to eliminate many contaminants that cause downstream assay steps to underperform. Such kits and protocols may only extract subpopulations of either smaller or larger cf-mRNA fragments. As such, provided herein are methods for extracting cf-mRNA from blood, which aids in generating high quality sequencing data that can be rich in biological information. The methods herein may employ a kit for consistent extraction of cf-mRNA from blood with a low failure rate and an enhanced yield of cf-mRNA. Such a yield may retain both the smaller and larger cf-mRNA fragments to produce amplifiable cf-mRNA.

[0050] As disclosed herein, some approaches may improve sample extraction success or RNA library diversity through the retention of eluates of at least one extraction wash step, such that small RNA polynucleotides otherwise lost in a wash step eluate are retained so as to contribute to diversity of an RNA library for processing.

[0051] Low level DNA contamination can be a source of error in gene-expression quantification, whereas contaminants in blood may inhibit downstream assay biochemistries. Further, commercially available RNA extraction kits can ignore steps to remove DNA or recommend on-column DNAse treatments, which may be suboptimal for robust removal of DNA. For example, in low-yield cf-mRNA samples, low levels of contamination can contribute to significant data misrepresentation. As such, provided herein are methods and systems configured with cf-mRNA washing conditions to remove contaminating substances in blood. Further, such methods may eliminate sporadic genomic DNA contamination of cf-mRNA samples.

[0052] Alternately, or in combination, methods and systems disclosed herein may remove contaminating substances by adding an enzymatic DNAse step to remove DNA contamination and/or carry-over. A number of enzymatic and nonenzymatic DNA-removing treatments are consistent with the disclosure herein, often sharing an effect of a removal of DNA from a cf-RNA sample. The methods herein can provide a desalting cleanup column that enhances sample amplifiability (such as by removal of inhibitors) and cf-mRNA enrichment, diversity and yield.

[0053] Oligo-dT priming for cDNA synthesis may be suboptimal for fragmented and/or degraded mRNA. In particular, degraded samples may comprise fragments lacking poly-A tails, and incomplete reverse transcription may lead to reverse-transcription products lacking a 5’ region. Accordingly, some systems, methods and kits consistent with the disclosure herein may comprise a step of adding reagents for random priming of reverse transcription, such as using oligos comprising up to 4, 5, 6, 7, 8, 9, 10 or more than 10 bases, such as pentamers, hexamers, heptamers, octamers, nonamers, or decamers. In some embodiments, hexamers may be used to prime reverse transcription.

[0054] Further, cDNA production by some commercial enzymes may be inhibited by inhibitors carried over from previous steps, and some reverse-transcriptase enzymes may display poor cDNA quantification accuracy when known RNA inputs are used.

[0055] Provided herein are systems and methods that may improve RNA to cDNA conversion efficiency and quantification accuracy from cf-mRNA. The methods herein may employ relatively high concentrations (such as concentrations greater than those recommended in some commercially available kits) of oligos such as hexamers instead of oligo-dT priming for cDNA synthesis, whereas the best reverse-transcriptase enzyme was selected to produce the highest quantity and amount of cDNA from the RNA inputs. Oligos such as random hexamers or other length oligos can be used at a range of concentrations consistent with the disclosure herein. For example, concentrations of up to, at least, consistent with the range of, about or substantially 60 µM, 70 µM, 80 µM, 90 µM, 100 µM, 150 µM, 200 µM, 300 µM, 400 µM, 500 µM, 600 µM, 700 µM, 800 µM, 900 µM, 1000 µM, 1100 µM, 1200 µM, 1300 µM, 1400 µM, or 1500 µM, or greater than 1500 µM are contemplated. Concentrations within these ranges are also consistent with the disclosure herein, such as at least 60 µM, 70 µM, 80 µM, 90 µM, 100 µM, 150 µM, 200 µM, 300 µM, 400 µM, 500 µM, 600 µM, 700 µM, 800 µM, 900 µM, 1000 µM, 1100 µM, 1200 µM, 1300 µM, 1400 µM, or 1500 µM, or greater than 1500 µM. That is, in some cases, random oligos such as random hexamers can be used at about 200 µM. In some cases, random oligos such as random hexamers can be used at about 500 µM. In some cases, random oligos such as random hexamers can be used at about 1000 µM. In some cases, random oligos such as random hexamers can be used at about 1500 µM. In some cases, random oligos such as random hexamers can be used at about 2000 µM. Fractional concentrations are also contemplated.

[0056] Alternately, random oligos such as random hexamers or other length oligos may be used at a higher concentration relative to an amount recommended in a kit. Concentrations such as in a range of 2x, 3x, 4x, 5x, 6x, 7x, 8x, 9x, 10x, 11x, 12x, 13x, 14x, 15x, 16x, 17x, 18x, 19x, 20x, 21x, 22x, 23x, 24x, 25x, 26x, 27x, 28x, 29x, 30x, 31x, 32x, 33x, 34x, 35x, 36x, 37x, 38x, 39x, 40x, 41x, 42x, 43x, 44x, 45x, 46x, 47x, 48x, 49x, 50x or greater than 50x are contemplated for use with the methods, systems and kits herein. In some cases, concentrations ranging from 15x to 40x, 20x to 35x, 25x to 35x, 28x to 32x, or at least 25x, 26x, 27x, 28x, 29x, 30x, 31x, 32x, 33x, 34x, 35x, or greater than 35x are used, such as, for example, 30x.

[0057] The implementation of random hexamers at high quantities and a specific reverse-transcriptase enzyme can enable robust and accurate amounts of cDNA to go into library prep. The methods and systems herein may harness improved cDNA synthesis processes to identify an improved library preparation protocol to reduce the number of sample failures and to improve the richness and robustness of biological data and tissue-specific transcript identification. Such methods can reduce the amount of sequencing resources wasted on uninformative reads, such as ribosomal RNA, which can comprise >80% of the transcriptome. As such, the methods and systems herein can include whole-exome enrichment to capture only cf-mRNA. Improvements in assay sensitivity for RNA molecule detection can be shown. Further, the methods and systems herein may leverage enrichment protocols which are typically not used for RNA-preps and obtain custom probes to capture spike-in transcript cDNA.

[0058] In some embodiments, selected populations of cf-mRNA and/or cDNA derived from cf-mRNA can be enriched by hybridization to baits representative of certain organs or tissues such as brain, liver, lung, bladder, kidney, heart, breast, stomach, intestine, colon, gall bladder, pancreas, prostate, ovary, epithelial, connective, nervous, or muscular. In some embodiments, selected populations of cf-mRNA and/or cDNA derived from cf-mRNA can be enriched by hybridization to baits that distinguish between certain organs or tissues or are diagnostic or prognostic for a disease or condition.

[0059] In some instances, methods provided herein can improve the efficiency of converting RNA to a sequence-able cDNA library by, for example, choosing a DNA-seq library kit that exhibits improved efficiency. To facilitate application of a reverse-transcribed cf-RNA kit to a sequencing library protocol, in some cases, cDNA libraries can be treated using a second strand synthesis enzyme or protocol so as to generate a population of double-stranded cDNA molecules representative of the cf-RNA in the sample. Double-stranded DNA molecules so generated can then be subjected to analysis or sequencing library generation using protocols directed to DNA library generation rather than RNA or single-stranded DNA library generation. In some cases, library generation protocols directed to double-stranded DNA are observed to produce higher-quality libraries for downstream analysis, such as sequencing libraries, than are produced through protocols directed to library generation from cf-RNA or single-stranded reverse-transcription products. In certain embodiments, cf-RNA can be treated using a method comprising contacting an RNA sample to a reverse-transcriptase such as Superscript IV, prior to a second strand synthesis regimen comprising, for example, an NEBNext polymerase, prior to initiation of a sequencing library protocol directed to double-stranded DNA.

[0060] Also provided herein are methods, systems and kits that may mitigate misassignment of sequencing reads to the wrong sample. The methods and systems may minimize loss of cf-mRNA library from stringent cleanup conditions during the enrichment process. Stringent conditions may be required to prevent carry-over of indexed primers that can partake in subsequent PCR amplification of a cf-mRNA derived library. The method may comprise employing reagents from Integrated DNA Technologies (IDT) with unique dual indexes (UDI) to prevent misassignment of sequencing reads. When standard indexes were used, sequencing reads were misassigned to a negative control (NTC).
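The effect of unique dual indexes on read assignment can be illustrated with a minimal sketch. It assumes that each library carries one i7 and one i5 index and that neither index is shared with any other library in the pool, so a read is assigned only when both indexes match the same library. The index sequences, sample names and function names below are hypothetical and are not taken from any commercial index set.

```python
from collections import Counter

# Hypothetical sample sheet: every i7 and every i5 is used by exactly one library.
UDI_SAMPLE_SHEET = {
    ("ACGTACGT", "TTGCAAGC"): "sample_1",
    ("GGATCCAA", "CAGTTGAC"): "sample_2",
    ("TTCAGGCT", "AGCCTTGA"): "NTC",
}


def assign_read(i7, i5, sheet=UDI_SAMPLE_SHEET):
    """Return the library name for an exact (i7, i5) pair, or None if the pair is not expected."""
    return sheet.get((i7, i5))


def demultiplex(index_reads, sheet=UDI_SAMPLE_SHEET):
    """Count reads per library; reads with an unexpected index pair are discarded."""
    counts = Counter()
    for i7, i5 in index_reads:
        counts[assign_read(i7, i5, sheet) or "discarded"] += 1
    return counts


reads = [
    ("ACGTACGT", "TTGCAAGC"),  # expected pair for sample_1
    ("GGATCCAA", "CAGTTGAC"),  # expected pair for sample_2
    ("ACGTACGT", "CAGTTGAC"),  # i7 of sample_1 with i5 of sample_2: a swapped pair
]
print(demultiplex(reads))
```

In this sketch the swapped pair is discarded rather than counted toward any library, including the negative control. With index sets in which an individual i7 or i5 is reused across libraries, the same swapped pair could coincide with a valid combination and be assigned to the wrong sample.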

[0061] As the majority of transcripts found in blood may be derived from blood cells, provided herein is a list of “non-blood” genes, which can be detected in blood. The list was determined by merging sample processing (centrifugation speeds) and bioinformatics tools to identify “non-blood” and tissue-specific signatures. Non-blood versus blood transcripts were determined as a function of centrifugation speed. Centrifugation speeds ranging from 8,000 g to 16,000 g provided a balance between the number of transcripts and genes detected and signal to noise ratio.

[0062] A partial list of genes relevant to the identification of non-blood cf-RNA transcripts in blood includes the following gene IDs: SEMA3F; HSPB6; MEOX1; CX3CL1; CDKL3; SEMA3G; DCN; IGF1; WWTR1; PHLDB1; SNAI2; CPS1; RAI14; PREX2; KITLG; ELN; BCAR1; ITIH1; LIMCH1; WISP2; CALCRL; EML1; KIF26A; ACSM2B; ADGRF5; GAL; PTPN21; LMCD1; LNX1; FERMT2; CD5L; NTN4; NUAK1; RASAL2; CTTNBP2; RARB; FBLN1; MAP2; NEBL; HOXA9; RAPGEF3; RIMS1; PTPRH; CADPS2; COL16A1; MECOM; MMP2; PIR; EPB41L1; ARHGAP28; NOS1; FXYD3; RAPGEF4; TF; APOH; PITPNM3; ZFHX4; CCDC80; TGFB2; GABRP; FMO2; CRTAC1; PALMD; PALM; CARD10; RASL10A; RBFOX2; GALNT16; CCM2L; PLS3; ASB9; GABRE; FLT1; ZNF423; NDRG4; CD276; TJP1; PLAT; TUSC3; CLEC4M; NOVA2; SYDE1; RASIP1; ATP6V0A4; CAV1; MET; HOXA5; TSPAN12; SFRP4; MEOX2; RARRES2; GLI3; OGN; LHX6; PTGR1; AMBP; MPDZ; GLIS3; APBA1; ATRNL1; CXCL12; PALD1; CCL2; COL1A1; HLF; KIAA1211; SOD3; CRYAB; APOA4; APOC3; ART4; MGP; CDCA3; AICDA; TPD52L1; LAMA4; C7; FGF1; LIFR; DPYSL3; HRG; AMOTL2; RBP1; FGF12; EVA1A; EFEMP1; IGFBP5; EFHD1; TPO; SDC1; RND3; PARD3B; PRRX1; PRG4; PLA2G4A; NR5A2; ADGRL2; MFAP2; KIF17; HSD11B1; PROX1; APOA1; TTR; ELOVL4; FILIP1; PCDH17; ELOVL3; NKX2-3; TEK; KIAA1217; IQSEC3; TBX2; FABP3; TMEM54; HOXA7; DNAI1; RASSF8; IL13RA2; SLC12A5; PTGIS; POF1B; HIF3A; HIST1H1A; NRN1; SSUH2; MT1G; ID1; F10; RHOJ; AIF1L; MASP1; PTPRB; KDR; RFPL1; A4GALT; KRT17; CPA4; FLNC; MYO1B; CHN1; MYO5C; CGNL1; ISLR; RNASE1; SHC2; DOCK6; APOE; APOC1; USHBP1; UNC13A; PXDN; ASS1; GALNT15; PDLIM4; RAMP2; KHDRBS3; RAI2; NR0B2; RHPN2; PPARG; REEP2; HSPA12B; NES; ALDH3B2; BHMT2; STARD13; BEX1; PDZD2; SPINK5; LYVE1; MRO; MEIS2; CABLES1; APLNR; COL4A2; TBX3; AMHR2; HEY2; PKIB; STAB2; THSD1; EDNRB; RAPGEF5; ALPK3; GATA4; DAB2IP; ALDOB; NR5A1; IL33; CCL21; SLCO2B1; LRRC32; SULF1; YAP1; SMAD6; ARHGAP29; TACC2; RBP4; OIT3; AOX1; DUOXA1; GCSH; GATA6; CCDC40; FKBP10; MMEL1; PRDM16; FCN3; TINAGL1; RGS5; RGL1; MALL; RBMS3; IL17RD; SHROOM2; DENND2A; CXorf36; AWAT2; FAM13C; ADIRF; ROM1; OOSP2; CLEC1A; ADGRL3; CCDC102B; DOCK1; MAGI1; THRSP; AKR1C2; PTPN14; HSPB8; TMEM178A; SPARCL1; GJA1; PLOD2; FBXL2; SEMA3D; CABYR; ROBO4; ABI3BP; CEP112; UCHL1; ENAH; PDLIM3; JAM2; FGD5; GNA14; KCNMA1; NMNAT2; CCNB2; AFAP1L1; ERG; HPD; SHROOM4; LAD1; C1QC; CIART; FCN2; AZGP1; COX7A1; CYGB; MPP3; BCL6B; SHANK2; PLPP3; FBLIM1; ADGRL4; SNX7; VCAM1; DDR2; C1orf115; PIGR; RFTN2; FAM84A; NOSTRIN; FABP1; ALB; PRICKLE2; ADAMTS9; APBB2; TM4SF18; EMCN; SPINK1; MYOZ3; BMPER; ZNF704; COL1A2; SOX17; DEFB1; AQP7; KIAA1462; SMCO2; FBN1; LARP6; SPIC; CYYR1; TMEM100; MFAP4; NNMT; GPR182; IGF2; MYO5B; CDC42EP5; SEMA6B; GGT6; KLK4; ACER1; GSDMA; DNASE1L2; ACOX2; FAM107A; COL3A1; FAM178B; CPLX1; EFNA1; SHE; ANTXR1; ROBO1; CTNND2; TM4SF1; MYRIP; FABP4; GPRC5C; GSTA4; PRKCDBP; SOX7; TMEM37; KRT19; PDE7B; KRT20; MAP6; FGA; FGB; PAH; ARNT2; SYNPO2; AGXT; MUCL1; SNTG2; GXYLT2; SNCG; STOX2; C1QTNF1; CD34; PHLDA3; PODN; SLCO2A1; DES; LPL; NR2F1; HOXD8; NUPR1; CIDEA; CLEC14A; C8orf4; C8G; CASKIN2; PTRF; CALML3; PSAPL1; LGALS7B; WSCD1; PIPOX; CDH5; TMEM45A; OR6S1; C1S; BGN; CLEC4G; PYCR1; CTNNA3; FBXL7; FAM167B; MAATS1; DGAT2L6; ALDH1A3; TACSTD2; TCEAL2; WBP5; NR2F2; KRT79; RGS7BP; KRT14; KRTAP23-1; LYPD6; FAM9C; C11orf96; GJA4; NANOS3; PLA2G2A; C15orf52; S100A16; FSIP2; AADACL3; APOD; S100A13; KIF19; HRCT1; ADH1B; CLPSL2; SRGAP1; KIAA1671; FAM177B; HOXA4; MFAP5; PARVA; TEAD4; SULT1C4; ADH4; HMGN5; ZNF442; ARHGEF15; DMD; C1orf53; SMIM9; SOX18; AWAT1; IGFL2; ERICH4; MT1M; C2CD4B; FAM127C; KLHL23; EMP2; UBD; NEURL1B; C1QTNF5; APOC4-APOC2; CFLAR-AS1; PLCL2-AS1; LA16c-395F10.1; C14orf132; AC046143.3; PPP5D1; RP11-14N7.2; HLA-DQB2; PHGR1; RP1-67K17.3; APOC2; RP11-758P17.3; TDGF1; INMT; GSTA1; ETV5; RP11-148B6.1; ECSCR; RP11-548K23.11; IQCJ-SCHIP1; SHANK3; RP11-116G8.5; CTD-2135J3.4; RP11-923111.5; RP11-315D16.2; HOXB7; RP11-521L9.1; RP11-680G10.1; RP11-1260E13.4; GJA5; CTD-2350C19.2; AP000275.65; RP11-45215.2; APOC4; AC003002.4; AC007193.8; DOC2B; CCL14; PIK3R3; RP5-1042K10.14; MATR3; RP11-717K11.2; CDR1-AS; and AL365273.1.

[0063] Provided herein are methods that can preferentially deplete blood-cell cf-mRNA and thereby enhance the ability to detect organ derived cf-mRNA, which may be more informative for diagnostic purposes. Centrifugation speed and times may be optimized for each tissue to collect relevant organ-specific transcripts and separate cf-mRNA fractions from different organ-types. Such isolation of “non-blood” organ-specific cf-mRNA from blood can enable extraction of meaningful biological information.

[0064] Through an implementation of at least one of the approaches above, up to and including methods, systems and kits employing combinations of the above-mentioned approaches using a majority or all of the approaches herein, one can obtain an improvement, or a substantial improvement, in cf-RNA library preparation. Such improvement can be observed through at least one of an increase in library diversity such as a 2.5x (for example, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, or greater than 4.0x) improvement in RNA library diversity. Increased library diversity can allow the same number of unique genes to be detected with a smaller number of sequencing reads, or more unique transcripts to be observed in the same number of sequencing reads. Similarly, one can observe in various cases an increase, or a substantial increase, in non-blood transcripts sequenced in a cf-mRNA library, such as an increase of up to or at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, or greater than a 70% increase. Some systems, methods, or kits can exhibit an increase, for example, of about 50%. Such increases can be observed prior to or in combination with selective removal of sequences identified as being blood-related transcripts. Such increases can facilitate or can be observed in samples having a reduced volume, reduced sequencing depth or both a reduced volume and sequencing depth relative to some standard protocols. In some cases, sequencing depth can be decreased by up to or at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, or greater than 70% without a corresponding reduction in the number of unique transcripts detected. Some systems, methods, or kits can exhibit a decrease, for example, of about 50%, yet can still provide the improvements in diversity described above. In some cases, sample volume can be decreased by up to or at least 10%, 20%, 30%, 33%, 40%, 50%, 60%, 70%, or greater than 70%. Some systems, methods, or kits exhibit a decrease, for example, of about 33%, despite the improvements in diversity described above.
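The relationship between library diversity and required sequencing depth can be illustrated with a simple sampling model. The sketch below assumes that reads are drawn uniformly at random, with replacement, from a library of N distinct molecules, so that the expected number of distinct molecules observed after R reads is approximately N(1 - e^(-R/N)). This model, the library sizes and the target value are illustrative assumptions and are not measurements from this disclosure.

```python
import math


def expected_unique(library_size: float, reads: float) -> float:
    """Expected number of distinct molecules observed after `reads` uniform random draws."""
    return library_size * (1.0 - math.exp(-reads / library_size))


def reads_needed(library_size: float, target_unique: float) -> float:
    """Number of reads expected to reveal `target_unique` distinct molecules."""
    if not 0 <= target_unique < library_size:
        raise ValueError("target must be non-negative and smaller than the library size")
    return -library_size * math.log(1.0 - target_unique / library_size)


# Hypothetical libraries: a baseline and one that is 2.5x more diverse.
BASELINE_SIZE = 2_000_000
IMPROVED_SIZE = 2.5 * BASELINE_SIZE
TARGET_UNIQUE = 1_500_000

for label, size in (("baseline", BASELINE_SIZE), ("2.5x diversity", IMPROVED_SIZE)):
    print(f"{label}: about {reads_needed(size, TARGET_UNIQUE):,.0f} reads "
          f"to observe {TARGET_UNIQUE:,} unique molecules")
```

Under these assumptions the more diverse library reaches the same number of unique molecules with fewer reads, which is the qualitative behavior described above. Real libraries are not sampled uniformly, so the numbers should be read as an illustration rather than a prediction.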

[0065] In some instances, on an absolute scale, methods, systems and kits consistent with the disclosure herein can increase the resolution of low-abundance transcript sequences in a sequence read library resulting from analysis of a sample as provided herein. As measured by sample transcripts or by internal standard RNA molecules or externally processed molecules assayed concurrently or independently, one can observe inclusion in a final sequence dataset of transcripts present at a low range of molecules per initial sample, such as at least or no more than 900, 800, 700, 600, 500, 400, 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, or 10 molecules per sample. That is, in some cases one can observe inclusion in a final sequence data set of transcripts present at a total amount of, for example, 10-100 molecules per sample. This can represent an improvement over some other methods.

[0066] The biological sample for cf-mRNA production may be any biological fluid. Exemplary fluids include blood, saliva, urine, interstitial fluid, cerebrospinal fluid, semen, vaginal fluid, amniotic fluid, tears, synovial fluid, mucus, or lymphatic fluid. Cells may be removed from a biological fluid by centrifugation or other means including filtration. Within the blood, cf-RNA may associate with proteins, lipids, salts, or other components. Some cf-RNA is released from cells in extracellular vesicles such as exosomes. Exosomes may be isolated by methods such as, but not limited to, sedimentation in a centrifuge, size exclusion, filtration, equilibrium density centrifugation, immunoisolation, immunodepletion, and combinations thereof.

[0067] In some embodiments, the methods of the present disclosure can allow or permit detection of one or more extracellular RNA transcripts in a biological sample (e.g., a biofluid). The biological sample may be serum, plasma, saliva, urine, interstitial fluid, cerebrospinal fluid, semen, vaginal fluid, amniotic fluid, tears, synovial fluid, mucus, lymphatic fluid, or another suitable biological sample. In various embodiments, the methods can enable detection of one or more cell-free mRNA molecules derived from non-blood cells in a serum sample. The methods can enable detection of one or more cell-free mRNA molecules derived from non-blood cells in a serum sample in addition to hematopoietic transcripts.

[0068] The genes detected in cf-mRNA can be traced back to a tissue and/or organ of origin (e.g., tissue-specific genes; see Tables 2-7) or may be of particular interest for diagnosing a disease or condition (see Tables 8-9). Furthermore, the methods provided herein may be sensitive such that extracellular RNA molecules present at a copy number as low as 10, 15, 25, 50, 100, 150, 200 or 500 in the biological sample (e.g., biofluid) may be detected. The RNA molecules can be detected by sequencing, qPCR, ddPCR, microarray, or any other suitable method.

[0069] The methods provided herein may detect and/or measure extracellular RNA molecules present in a biological sample (e.g., circulating in a biofluid). In various embodiments, the methods may detect and/or measure cell-free mRNA transcripts derived from hematopoietic and/or non-hematopoietic cells (see, e.g., the non-blood genes of Table 1 or Table 10). The methods may generate purified cf-RNA samples, wherein 1, 5, 10, 20, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 or more non-blood genes from Table 1 or Table 10 can be detected and/or measured. The methods can measure or detect at least 1, 5, 10, 20, 50, 100, 200, 300, 400, or 500 tissue-specific, organ-specific or diagnostically important genes, e.g., from Tables 2-9, from cf-RNA extracted from a biological sample. The methods can generate cf-RNA samples from a biological sample, wherein RNA molecules present at a copy number of not more than 10, 15, 25, 50, 100, 150, 200 or 500, or less can be detected.

[0070] Methods of detecting at least 10, 20, 30, 50, or 100 non-blood cf-mRNA genes in a biological sample are also provided herein. The methods may include, but are not limited to, (a) centrifuging a serum or plasma sample for at least 10 minutes at from 8,000 g to 16,000 g (or other ranges as provided herein) to form a supernatant; (b) extracting RNA from the supernatant; (c) contacting the RNA with a deoxyribonuclease; (d) forming cDNA from the RNA; (f) preparing a cDNA library from the cDNA; (g) sequencing the cDNA library; and/or (h) aligning the sequences to a reference genome to identify sequences arising from at least 10, 20, 30, 50, or 100 non-blood cf-mRNA genes per biological sample.
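The final step above, identifying how many non-blood genes are represented among the aligned sequences, can be sketched as follows. The sketch assumes that alignment has already produced a per-gene read count table. The gene set is a small subset of the non-blood gene list provided herein, while the read counts, the minimum-read threshold and the function name are hypothetical and chosen only for illustration.

```python
# Small subset of the non-blood gene list provided herein, for illustration.
NON_BLOOD_GENES = {"ALB", "APOA1", "TTR", "FGA", "FGB", "APOH"}


def detected_non_blood_genes(gene_counts, non_blood=NON_BLOOD_GENES, min_reads=1):
    """Return the non-blood genes supported by at least `min_reads` aligned reads.

    `min_reads` is an illustrative detection threshold, not a value from the disclosure.
    """
    return {
        gene
        for gene, count in gene_counts.items()
        if gene in non_blood and count >= min_reads
    }


# Hypothetical per-gene read counts; HBB (hemoglobin) stands in for a blood-cell transcript.
example_counts = {"ALB": 120, "HBB": 50_000, "TTR": 3, "FGA": 0, "APOA1": 14}

detected = detected_non_blood_genes(example_counts)
print(f"{len(detected)} non-blood genes detected: {sorted(detected)}")
```

Raising `min_reads` makes the detection call more conservative, which may be appropriate when low-level misassigned reads are a concern.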

[0071] The methods may also include (i) contacting the cDNA library with baits comprising polynucleotide fragments from at least 10, 20, 30, 50, or 100 genes of interest to enrich translated genes. In some cases, step (d) may comprise contacting the RNA with a reverse transcriptase enzyme to form a single-stranded cDNA and contacting the single-stranded cDNA with a second strand synthesis enzyme to form double-stranded cDNA. The methods may also include (j) ligating unique dual indexes to the cDNA library to form an indexed cDNA library. In some embodiments, the methods may include (k) pooling up to 2, 3, 4, 5, 6, 7, 8, 9, 10 or more indexed cDNA libraries. The methods may also include (l) performing massively parallel sequencing on the pooled cDNA libraries.

[0072] In some embodiments, at least 25, 50, 75, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 genes are detected in the biological sample. In various embodiments, the sequences may be aligned to a reference genome to identify sequences arising from at least 25, 50, 75, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 non-blood cf-mRNA genes per biological sample. The method may further include contacting the single-stranded cDNA with a second-strand synthesis enzyme to form the double-stranded cDNA. In some cases, (c) may be performed in solution.

[0073] In certain embodiments, methods of detecting at least 10, 20, 30, 50, or 100 non-blood cf-mRNA genes in a biological sample may include, but are not limited to, (a) centrifuging or filtering a serum or plasma sample at from 1,900 g to 16,000 g (or other ranges as provided herein); (b) extracting an RNA sample from the supernatant; (c) contacting the RNA sample with a deoxyribonuclease; (d) contacting the RNA with a reverse transcriptase enzyme to form a single-stranded cDNA; (e) forming double-stranded cDNA from the RNA; (f) preparing a cDNA library from the double-stranded cDNA; (g) contacting the indexed cDNA library with baits comprising polynucleotide fragments to enrich translated genes; (h) sequencing the cDNA library; and/or (i) aligning the sequences to a reference genome to identify sequences arising from at least 10, 20, 30, 50, or 100 non-blood cf-mRNA genes per biological sample.

[0074] The methods may further include (j) adding unique dual indexes to the cDNA library to form an indexed cDNA library (e.g., via ligation, PCR, etc.). In some embodiments, the methods may comprise (k) pooling up to ten indexed cDNA libraries. The methods may further comprise (l) performing massively parallel sequencing on the pooled cDNA libraries. In some cases, the methods may further comprise contacting the single-stranded cDNA with a second-strand synthesis enzyme to form the double-stranded cDNA. In some embodiments, (c) may be performed in solution.

[0075] A polynucleotide sequence that “aligns” to a gene generally has about 100% identity to the sequence of part or all of the gene.

[0076] As used herein, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.

[0077] While preferred embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure.

EXAMPLES

[0078] The application may be better understood by reference to the following non-limiting examples, which are provided as exemplary embodiments of the application. The following examples are presented in order to more fully illustrate embodiments and should in no way be construed, however, as limiting the broad scope of the application.

Example 1: Methods

[0079] Processes used for cell-free mRNA (cf-mRNA) analysis include biological sample processing, cf-mRNA extraction, cf-mRNA purification, cDNA synthesis, library preparation, DNA sequencing, and bioinformatics (FIG. 1).

[0080] Blood was collected in an EDTA vacutainer (BD) for plasma processing or a red-top vacutainer (BD) for serum processing. For serum processing, the blood was incubated for at least 30 minutes at room temperature. After less than 2 hours of post-collection room temperature storage, the blood was centrifuged for 10 minutes at 1,600 g to yield plasma or serum (supernatant). Samples can be either processed or frozen for storage at -80 °C. To remove residual cells from frozen or fresh samples, the plasma/serum was centrifuged a second time for 10 minutes at 10,000 g to 16,000 g, depending on the application.

[0081] Cell-free RNA was extracted and purified using reagents from a QIAamp Circulating Nucleic Acid Kit (Qiagen cat. no. 55114). The following conditions were used for up to 1 ml of plasma or serum. The supernatant was transferred to a new tube, mixed with 130 µl Proteinase K, 1.1 ml Buffer ACL (without carrier), and 330 µl Buffer ATL, and incubated at 60 °C for 45 min. The products were mixed with 3 ml Buffer ACB, 1 µl diluted ERCC RNA Spike-In Mix (Life Technologies cat. no. 4456740), and 2.5 ml chilled isopropanol and incubated for 5 minutes on ice. The sample was loaded onto a QIAamp Mini column using a vacuum manifold, and then washed with 600 µl Buffer ACW1 followed by 750 µl Buffer ACW2, and 2X 750 µl EtOH. The column was dried at 56 °C for 10 min. RNA was then eluted twice with 50 µl of Buffer AVE by incubating for 3 minutes each at room temperature followed by 1 minute of centrifugation at 16,000 g. The eluate (~100 µl) was treated with 3 µl Turbo DNase (Life Technologies cat. no. AM1907) in 1X Turbo DNA buffer for 20 minutes at 37 °C. The reaction was stopped with 10 µl DNase Inactivation mix, incubated for 5 minutes at room temperature, and then centrifuged at 10,000 g for 90 seconds.

[0082] RNA in the supernatant was brought to 100 µl with water, if necessary, and cleaned using a OneStep PCR Inhibitor Removal Kit (Zymo cat. no. D6030). The Zymo spin column was prepared with 600 µl of Prep buffer, followed by 400 µl and 100 µl of water, all centrifuged at 8,000 g for 3 minutes. The sample was then passed through the column by centrifugation at 8,000 g for 3 minutes. The sample was then cleaned a second time using reagents from a RNeasy MinElute Cleanup kit (Qiagen cat. no. 74204). The RNA sample was mixed with 350 µl RLT buffer and 900 µl of EtOH and loaded on a RNeasy MinElute column by centrifugation at > 8,000 g. The column was washed with 500 µl of RPE Wash Buffer followed by 500 µl of 80% ethanol and then dried as recommended by the manufacturer. For elution, 15 µl of water was added to the column, incubated for 1 min at room temperature, and then collected in a microcentrifuge tube by centrifugation at 16,000 g for 1 minute. For quality control, 1 µl (of 15 µl) was analyzed on a Bioanalyzer using RNA 6000 Pico reagents (Agilent).

[0083] cDNA was synthesized using Superscript IV Reverse Transcriptase (Life Technologies cat. no. 18090050) followed by processing with a NEBNext® Second Strand Synthesis kit (New England BioLabs cat. no. E6111L). RNA (up to 10 µl) was mixed with 1.12 µl random hexamer primers (3 mg/ml) and 0.56 µl dNTPs (10 mM each) in a total volume of 14 µl, incubated for 5 minutes at 65 °C, and then chilled to 4 °C. The sample was then mixed with 0.43 µl water, 4 µl SSIV Buffer, 0.57 µl DTT (0.1 M), and 1 µl reverse transcriptase (200 U/µl), and incubated at 23 °C for 10 min, 50 °C for 50 min, 80 °C for 10 min, and then held at 4 °C. For second strand synthesis, the reaction was supplemented with 4 µl reaction buffer and 2 µl NEBNext Enzyme, brought to a total volume of 40 µl with water, and incubated for 1 hour at 16 °C. The dsDNA was cleaned with AMPure XP SPRI beads (Beckman Coulter Inc. cat. no. A63882). 40 µl dsDNA was mixed for 2 minutes with 40 µl Low EDTA TE (Swift Biosciences cat. no. 90296) and 144 µl SPRI beads followed by a 3-minute incubation at room temperature. The beads were collected using a magnetic rack, washed twice with 200 µl 80% ethanol, and air dried for 5 minutes.
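The reagent volumes given above for reverse transcription and second strand synthesis can be tallied with a short sketch. The volumes are taken from the preceding paragraph. The assumption that the 14 µl annealing mix is made up with water when less than the maximum RNA volume is used, and the function and key names, are illustrative and are not stated in the example.

```python
# Volumes in microliters, taken from the cDNA synthesis paragraph above.
ANNEALING_TOTAL_UL = 14.0
ANNEALING_REAGENTS_UL = {"random hexamers (3 mg/ml)": 1.12, "dNTPs (10 mM each)": 0.56}
RT_ADDITIONS_UL = {"water": 0.43, "SSIV Buffer": 4.0, "DTT (0.1 M)": 0.57, "reverse transcriptase": 1.0}
SECOND_STRAND_ADDITIONS_UL = {"reaction buffer": 4.0, "NEBNext Enzyme": 2.0}
SECOND_STRAND_TOTAL_UL = 40.0
HEXAMER_STOCK_MG_PER_ML = 3.0


def reaction_plan(rna_ul: float) -> dict:
    """Tally water additions and final volumes for a given RNA input volume."""
    if not 0 < rna_ul <= 10:
        raise ValueError("RNA volume must be greater than 0 and at most 10 µl")
    # Assumption: water makes up the difference to 14 µl in the annealing mix.
    water_annealing = ANNEALING_TOTAL_UL - rna_ul - sum(ANNEALING_REAGENTS_UL.values())
    rt_total = ANNEALING_TOTAL_UL + sum(RT_ADDITIONS_UL.values())
    water_second_strand = (
        SECOND_STRAND_TOTAL_UL - rt_total - sum(SECOND_STRAND_ADDITIONS_UL.values())
    )
    hexamer_final = HEXAMER_STOCK_MG_PER_ML * ANNEALING_REAGENTS_UL["random hexamers (3 mg/ml)"] / rt_total
    return {
        "water_in_annealing_ul": water_annealing,
        "rt_reaction_total_ul": rt_total,
        "water_in_second_strand_ul": water_second_strand,
        "hexamer_in_rt_mg_per_ml": hexamer_final,
    }


for key, value in reaction_plan(rna_ul=10.0).items():
    print(f"{key}: {value:.3f}")
```

The tally shows that the stated additions bring the reverse transcription reaction to 20 µl and that the second strand reaction needs a further 14 µl of water to reach 40 µl.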

[0084] A library was prepared with reagents from the Accel-NGS 2S Plus DNA Library Kit (Swift Biosciences cat. no. SP-2014-96) and Unique Dual Indexes (UDI) (Integrated DNA Technologies). The SPRI beads were suspended in 53 mΐ Low EDTA TE, 6 mΐ Buffer Wl, and 1 mΐ Enzyme W2 and incubated for 10 minutes and 37 °C. 108 mΐ PEGNaCl Solution (Swift Biosciences cat. no. 90196) was added. The beads were mixed for 2 minutes, incubated for 3 minutes at room temperature, and then collected for 5 minutes on a magnetic rack. After removing the supernatant, the beads were washed twice for 30 seconds with 180 mΐ 80% ethanol and then air dried. The beads were resuspended in 30 mΐ Low EDTA TE, 5 mΐ Buffer Gl, 13 mΐ Reagent G2, 1 mΐ Enzyme G3, and 1 mΐ Enzyme G4 and incubated at 20 °C for 20 minutes. 82.5 mΐ of PEGNaCl Solution was added, followed by 2 minutes of mixing, a 3 -minute room temperature incubation, and collection for 5 minutes on a magnetic rack.

After removing the supernatant, the beads were washed twice for 30 seconds with 180 µl 80% ethanol and then air dried for 1 minute. The beads were resuspended in 20 µl Low EDTA TE, 5 µl Reagent Y2, 3 µl Buffer Y1, and 2 µl Enzyme Y3 and incubated for 15 minutes at 25 °C. 49.5 µl of PEGNaCl Solution was added, followed by 2 minutes of mixing, a 3-minute room temperature incubation, and collection for 5 minutes on a magnetic rack. After removing the supernatant, the beads were washed twice for 30 seconds with 180 µl 80% ethanol and then air dried for 1 minute. The beads were resuspended in 30 µl Low EDTA TE, 5 µl Buffer B1, 2 µl Reagent B2, 9 µl Reagent B3, 1 µl Enzyme B4, 2 µl Enzyme B5, and 1 µl Enzyme B6, incubated at 40 °C for 10 minutes, and then returned to 25 °C. 70 µl of PEGNaCl Solution was added, followed by 2 minutes of mixing, a 3-minute room temperature incubation, and collection for 5 minutes on a magnetic rack. After removing the supernatant, the beads were washed twice for 30 seconds with 180 µl 80% ethanol and then air dried for 1 minute. The beads were resuspended in 21 µl Low EDTA TE by mixing for 2 minutes followed by a 2-minute incubation. The beads were collected on a magnetic rack, and the supernatant was transferred to a new plate and mixed with 5 µl of Illumina UDI Primer Mix (1-72) (Integrated DNA Technologies), 10 µl Low EDTA TE, 4 µl Reagent R2, 10 µl Buffer R3, and 1 µl Enzyme R4. The PCR reaction was heated to 98 °C for 30 seconds, cycled 16 times at 98 °C for 10 seconds, 60 °C for 30 seconds and 68 °C for 60 seconds, and then held at 4 °C. 70 µl SPRI beads were added. Beads and sample were mixed for 2 minutes, incubated for an additional 2 minutes, and collected on a magnetic rack for 5 minutes. After removing the supernatant, the beads were washed twice for 30 seconds with 180 µl 80% ethanol and then air dried for 1 minute. Nucleic acids were eluted in 21 µl water.

[0085] cDNA and ERCC DNA were enriched using SureSelect XT V6 whole exome+UTR capture probes and ERCC capture probes in connection with the SureSelect Custom Reagent kit (Agilent Technologies cat. no. 931170) to form cf-mRNA sequencing libraries. Up to ten indexed samples with a total cDNA library mass of 750-1000 ng were pooled. Vacuum centrifugation was used to reduce the volume to 3.4 µl. The sample was then mixed with 5.6 µl SureSelect XT2 Block Mix. 9 µl of the sample was transferred to a PCR strip tube, sealed, incubated at 95 °C for 5 minutes, and then held at 65 °C for at least 5 minutes. 1.5 µl water, 0.5 µl SureSelect RNase Block, 6.63 µl Hyb1, 0.27 µl Hyb2, 2.65 µl Hyb3, 3.45 µl Hyb4, 1 µl ERCC capture library (Agilent), and 5 µl Capture Library >3Mb (All Exon V6 60Mb) were added, and the samples were incubated overnight at 65 °C with a heated lid. MyOne streptavidin beads (50 µl) were prepared by washing four times with 200 µl SureSelect binding buffer. The pooled sample was added to the streptavidin beads and mixed for 30 minutes at 1800 rpm. The beads were collected with a magnet, washed for 15 minutes with 200 µl SureSelect Wash Buffer 1, and then washed three times for 10 minutes at 65 °C with 200 µl SureSelect Wash Buffer 2. Nucleic acids were eluted from the beads by incubating in 20 µl water for 5 min at 95 °C, transferred to a new tube, and mixed with 6 µl water, 25 µl 2X Herculase Master Mix, and 1 µl XT2 Primer Mix. The sample was incubated at 98 °C for 2 minutes, cycled 15 times at 98 °C for 30 seconds, 60 °C for 30 seconds, and 72 °C for 1 minute, extended at 72 °C for 10 minutes, and then held at 4 °C. The reaction was cleaned with 90 µl AMPure XP beads and eluted in 15 µl water. Products were analyzed by Kapa qPCR and capillary electrophoresis. For Kapa qPCR, dilutions were prepared in 10 mM Tris-HCl, pH 8. Capillary electrophoresis was performed on a Bioanalyzer.

[0086] After quantification, sequencing pools were denatured and diluted according to their size, following Illumina’s recommendations to obtain optimal clustering. PhiX control was added to the samples as a reference. Using a 1000 µL pipette, all of the diluted library was loaded into reservoir #10 according to the NextSeq 500 (Illumina) instructions. Illumina BaseSpace was used to set up the sequencing run according to the manufacturer’s instructions. Sequencing was paired-end with the read cycle set to 76, and NextSeq was selected as the sequencing machine. The sequencing run was started on the NextSeq 500 according to the manufacturer’s instructions.

[0087] Base-calling was performed on the BaseSpace platform (Illumina Inc.), using the FASTQ Generation Application. For sequencing data analysis, adaptor sequences were removed, and low-quality bases were trimmed using cutadapt (v1.11). Reads shorter than 15 base pairs after trimming were excluded from subsequent analysis. The remaining reads were aligned to the human reference genome GRCh38 using STAR (v2.5.2b) with GENCODE v24 gene models. Duplicated reads were removed using the samtools (v1.3.1) rmdup command.
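The trimming, alignment, and duplicate-removal steps described above can be sketched as a list of command lines. This is only an illustrative sketch: the helper name `build_pipeline_commands`, the file-naming scheme, the adaptor sequences, and the quality cutoff of 20 are assumptions that do not appear in the text, and the STAR index is assumed to have been built beforehand from GRCh38 with GENCODE v24 annotations. Only the 15-base minimum length and the tool names come from the description.

```python
def build_pipeline_commands(sample, adapter_r1, adapter_r2, star_index,
                            min_len=15, quality_cutoff=20):
    """Return cutadapt, STAR, and samtools command lines for one paired-end sample.

    Hypothetical helper: file names, adaptors, and quality_cutoff are assumptions.
    """
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    t1, t2 = f"{sample}_R1.trimmed.fastq.gz", f"{sample}_R2.trimmed.fastq.gz"

    # Remove adaptors, trim low-quality 3' bases, drop reads shorter than min_len.
    trim = ["cutadapt", "-a", adapter_r1, "-A", adapter_r2,
            "-q", str(quality_cutoff), "-m", str(min_len),
            "-o", t1, "-p", t2, r1, r2]

    # Align to a pre-built index (assumed: GRCh38 + GENCODE v24 gene models).
    align = ["STAR", "--genomeDir", star_index,
             "--readFilesIn", t1, t2,
             "--readFilesCommand", "zcat",
             "--outSAMtype", "BAM", "SortedByCoordinate",
             "--outFileNamePrefix", f"{sample}."]

    # STAR writes <prefix>Aligned.sortedByCoord.out.bam for sorted BAM output.
    aligned_bam = f"{sample}.Aligned.sortedByCoord.out.bam"
    dedup = ["samtools", "rmdup", aligned_bam, f"{sample}.dedup.bam"]

    return [trim, align, dedup]


# Placeholder adaptor sequences; substitute the ones matching the library kit.
commands = build_pipeline_commands("sample01", "AGATCGGAAGAGC", "AGATCGGAAGAGC",
                                   "/path/to/star_index")
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed; the sketch only prints the commands so that the parameters can be reviewed first.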

[0088] For cell type deconvolution, a normalization was implemented whereby the expression levels of each gene were divided by its maximum value across the samples. This step rescales expression levels among different genes to avoid domination of the decomposition process by a few highly expressed genes. The normalized expression matrix was then subjected to non-negative matrix factorization (NMF) decomposition using sklearn.decomposition.NMF within the Python library scikit-learn. NMF decomposition achieves a more parsimonious representation of the data by decomposing an expression matrix into the product of two matrices X = WH; wherein X is the expression matrix with n rows (n samples) and m columns (m genes); W is the coefficient matrix with n rows (n samples) and p columns (p components); and H is the loading matrix with p rows (p components) and m columns (m genes). W is in a sense a summarization of the original matrix X with a reduced number of dimensions. H contains information about how much each gene contributes to the components. Biological interpretation of the derived components was achieved by performing pathway analysis on the top genes that contribute the most to each component.
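A minimal sketch of the normalization and decomposition described above, using NumPy and scikit-learn. The random toy matrix, the number of components, the `nndsvda` initialization, and the helper names `max_normalize`, `decompose`, and `top_genes` are assumptions chosen for illustration; the description specifies only the per-gene maximum normalization and the use of `sklearn.decomposition.NMF`.

```python
import numpy as np
from sklearn.decomposition import NMF


def max_normalize(expr):
    """Divide each gene (column) by its maximum across samples (rows)."""
    expr = np.asarray(expr, dtype=float)
    gene_max = expr.max(axis=0)
    # Genes that are zero in every sample are left at zero (assumed handling).
    safe_max = np.where(gene_max > 0, gene_max, 1.0)
    return expr / safe_max


def decompose(expr, n_components, seed=0):
    """Return X (normalized, n x m), W (n x p) and H (p x m) with X ~ W @ H."""
    X = max_normalize(expr)
    model = NMF(n_components=n_components, init="nndsvda",
                random_state=seed, max_iter=1000)
    W = model.fit_transform(X)   # sample-by-component coefficients
    H = model.components_        # component-by-gene loadings
    return X, W, H


def top_genes(H, gene_names, k=3):
    """List the k genes with the largest loading in each component."""
    return [[gene_names[j] for j in np.argsort(row)[::-1][:k]] for row in H]


# Toy data only: 6 samples x 8 genes of made-up expression values.
rng = np.random.default_rng(0)
toy_tpm = rng.gamma(shape=2.0, scale=50.0, size=(6, 8))
gene_names = [f"GENE_{i}" for i in range(8)]

X, W, H = decompose(toy_tpm, n_components=2)
print(W.shape, H.shape)
print(top_genes(H, gene_names))
```

The gene lists returned by `top_genes` would be the input to a pathway analysis, which is outside the scope of this sketch.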

[0089] Whole blood and matched plasma samples were sequenced to identify “non-blood” genes. A gene is considered “non-blood” if its normalized expression (Transcripts Per Million, TPM) is three-fold higher in plasma compared to whole blood (containing blood cells). “Non-blood” genes are presumably derived from tissues and/or organs, not blood cells. A blood cell polynucleotide has a sequence that aligns to a blood cell gene and is not a non-blood polynucleotide with a sequence that aligns to a non-blood gene. Non-blood genes represented, on average, 18% of the TPMs in the library, with a range of from 11% to 24%. Non-blood genes represented 15% of all genes detected (a gene was counted as detected if TPM > 3), with a range of from 8% to 24%. A list of 2,855 non-blood genes detected in this study is presented in Table 1. A list of lower stringency non-blood genes detected in this study is presented in Table 10.
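The non-blood criterion can be illustrated with a short sketch. The gene labels and TPM values are made up, and two choices are assumptions rather than statements from the text: "three-fold higher" is read as plasma TPM of at least three times the whole-blood TPM, and the TPM > 3 detection threshold is applied to the plasma value before a gene is classified.

```python
def classify_non_blood(plasma_tpm, blood_tpm, fold=3.0, detect_tpm=3.0):
    """Return the sorted genes whose plasma TPM is >= fold x whole-blood TPM.

    Genes at or below detect_tpm in plasma are skipped (assumed filter).
    """
    non_blood = []
    for gene, plasma in plasma_tpm.items():
        if plasma <= detect_tpm:
            continue
        blood = blood_tpm.get(gene, 0.0)
        if plasma >= fold * blood:
            non_blood.append(gene)
    return sorted(non_blood)


def non_blood_tpm_fraction(plasma_tpm, non_blood_genes):
    """Fraction of total plasma TPM contributed by the non-blood genes."""
    total = sum(plasma_tpm.values())
    return sum(plasma_tpm[g] for g in non_blood_genes) / total


# Hypothetical values for illustration only.
plasma = {"GENE_A": 30.0, "GENE_B": 100.0, "GENE_C": 2.0, "GENE_D": 12.0}
blood = {"GENE_A": 5.0, "GENE_B": 400.0, "GENE_C": 0.1, "GENE_D": 6.0}

genes = classify_non_blood(plasma, blood)
fraction = non_blood_tpm_fraction(plasma, genes)
print(genes, round(fraction, 3))
```

With these toy values only GENE_A passes: GENE_B and GENE_D are less than three-fold enriched in plasma, and GENE_C falls below the detection threshold.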

Table 1: Non-blood genes detected in cell-free mRNA

Table 10: Low-stringency non-blood genes detected in cell-free mRNA

Example 2: Enrichment of tissue-derived cf-mRNA by size fractionation

[0090] Generally, most RNA in blood is found within blood cells. Methods for preparing serum and plasma generally involve a low-speed spin that removes most blood cells. However, residual blood cells are a significant source of noise that may interfere with an analysis of cf-RNA.

[0091] The GTEx tissue expression database was used to identify tissue-specific and organ-specific genes. A gene is considered tissue or organ specific if its expression is at least five-fold higher in one tissue or organ compared to all other tissues and organs. Tissue-specific and organ-specific genes detected in this study are presented in Tables 2-7.
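The five-fold specificity rule can be sketched as follows. The function name, the input layout (a dictionary of per-tissue expression values per gene), and the toy numbers are assumptions; the comparison of the top tissue against the second-highest tissue is one way of reading "at least five-fold higher compared to all other tissues".

```python
def tissue_specific_genes(expr, fold=5.0):
    """Map each specific gene to its tissue.

    expr: {gene: {tissue: expression}}. A gene is kept when its highest
    tissue value is at least `fold` times the second-highest value.
    """
    specific = {}
    for gene, by_tissue in expr.items():
        if len(by_tissue) < 2:
            continue  # specificity is undefined with a single tissue
        ranked = sorted(by_tissue.items(), key=lambda kv: kv[1], reverse=True)
        (top_tissue, top), (_, runner_up) = ranked[0], ranked[1]
        if top > 0 and top >= fold * runner_up:
            specific[gene] = top_tissue
    return specific


# Hypothetical expression values for illustration only.
toy_expr = {
    "GENE_X": {"liver": 100.0, "brain": 10.0, "blood": 5.0},
    "GENE_Y": {"liver": 40.0, "brain": 20.0, "blood": 10.0},
    "GENE_Z": {"liver": 0.0, "brain": 8.0, "blood": 0.0},
}
print(tissue_specific_genes(toy_expr))
```

Comparing against the runner-up tissue is sufficient because any tissue ranked lower has an equal or smaller value.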

Table 2: Tissue-specific genes (455) detected in cell-free mRNA

Table 3: Red blood cell-specific genes (23) detected in cell-free mRNA

Table 4: Platelet-specific genes (326) detected in cell-free mRNA

Table 5: Neutrophil-specific genes (239) detected in cell-free mRNA

Table 6: Brain-specific genes (163) detected in cell-free mRNA

Table 7: Liver-specific genes (63) detected in cell-free mRNA

[0092] Tables 8 and 9 present genes of interest for the diagnosis of liver-specific disease and pregnancy that do not meet the stringent criteria for non-blood genes.

Table 8: Additional liver-specific genes detected in cell-free mRNA

Table 9: Pregnancy associated genes detected in cell-free mRNA


[0094] Size selection performed on serum or plasma can increase the ratio of solid tissue-derived cf-mRNA compared to blood cell-derived cf-mRNA. To prepare serum or plasma, cells are sedimented in a 1,600 g spin. A second centrifugation step was implemented to enrich tissue-derived cf-mRNA. Plasma was centrifuged for 10 minutes at various speeds causing sedimentation forces ranging from 1,900 g to 16,000 g, followed by cf-RNA isolation, cDNA synthesis, library preparation, and sequencing. As the centrifugation speed increased, transcripts from blood cell components (platelet and neutrophil transcripts, representative of transcripts from blood cells) decreased more rapidly than tissue-specific transcripts such as transcripts from liver or brain, resulting in an increase in the ratio of non-blood cf-mRNA to blood cell-derived cf-mRNA (Figs. 2A-3B). This enrichment was counterbalanced by a decrease in the number of detectable tissue-derived genes (Fig. 4). The optimal speed for preparation of a low-noise but representative and diverse cf-mRNA library depends upon the application and often ranges from 10,000 g to 16,000 g. For example, higher centrifugation speeds are preferred for the analysis of liver cf-mRNA transcripts compared to brain cf-mRNA transcripts. 16,000 g was used for the results presented below.

[0095] Size selection was also performed by filtration through membranes with a size cutoff of 0.8 µm, 0.45 µm, or 0.2 µm (Fig. 2). As the pore size decreased, transcripts from blood cell components (platelet and neutrophil transcripts, representative of transcripts from blood cells) decreased more rapidly than tissue-specific transcripts such as transcripts from liver or brain, resulting in an increase in the ratio of non-blood cf-mRNA to blood cell-derived cf-mRNA (Fig. 2).

Example 3: Selection of cf-mRNA extraction methodologies

[0096] Various kits and methods were evaluated to optimize cf-mRNA extraction, including phenol-based extractions of total cf-RNA: TRIzol, miRNeasy (Qiagen), Direct-zol (Zymo Research), nucleoZOL (Macherey-Nagel), mirVana (Life Technologies); extracellular vesicle capture-based approaches followed by lysis (either phenol-based or not): exoRNeasy (Qiagen), ExoComplete (Hitachi); immunoselection or immunodepletion of vesicles followed by extraction of nucleic acids; lysis followed by total RNA/nucleic acid isolation: Plasma/Serum RNA Purification Mini Kit (Norgen), QIAamp Circulating Nucleic Acids kit (“CNA kit”), QIAamp ccfDNA/RNA kit (Qiagen); and others. The CNA kit was selected because it showed the best balance between efficiency, scalability, linearity and consistency of cf-RNA extraction (see Fig. 5). The CNA kit is a total cf-RNA extraction kit which is agnostic to whether circulating cf-RNA is traveling as free RNA or is protected by proteins, lipids, or vesicles.

[0097] cf-RNA tends to be degraded in vivo and is further fragmented during the extraction process. miRNAs are also typically shorter than mRNAs. Thus Qiagen’s “purification of miRNAs” protocol was selected instead of their standard “nucleic acid purification” protocol.

[0098] Multiple changes were made to the protocol supplied with the CNA kit to maximize the efficiency and the consistency of extraction. The protocol was adapted by (1) not adding carrier RNA to the lysis buffer because it interferes with sequencing results; (2) spiking in ERCC external reference RNAs during lysis; (3) pre-warming lysis buffers to lysis temperature (60 °C instead of 25 °C); (4) extending the lysis time from 30 minutes to a minimum of 45 minutes; (5) adding a second 100% ethanol washing step to better remove inhibitors from the sample; and (6) adding a second nucleic acid elution step to more completely remove RNA from the column. The size distribution of polynucleotides extracted with the modified method revealed an improved yield of fragmented cf-RNAs compared to the standard nucleic acid purification protocol (Figs. 6A-6B). The modified extraction method with the CNA kit yielded more cf-mRNA than the QIAamp ccfDNA/RNA kit and showed better linearity with increased or reduced plasma input (Fig. 7).

[0099] A dedicated enzymatic DNase step was incorporated into the protocol to remove DNA contamination and carry-over. Low-level DNA contamination can be a source of error in gene-expression quantification and can be relevant for cf-RNA isolation because the amount of cf-RNA in serum or plasma is extremely low. Some commercially available cf-RNA extraction kits either ignore steps to remove DNA (e.g., many phenol-based kits) or recommend on-column DNase I treatments, which can be suboptimal for complete removal of DNA. Indeed, DNase I can be sensitive to salts, which are abundant during RNA extractions, and can have low efficiency with low DNA concentrations, which can be the case when working with cell-free biofluids. Turbo DNase (Ambion), a mutated version of DNase I with increased affinity for DNA, was utilized because it is particularly effective in removing trace quantities of DNA and more resistant to inhibitors. DNase treatment was performed according to the Ambion protocol except for an increase in the amount of enzyme to 2.5-3 µl enzyme per sample. DNase I treatment eliminated a substantial amount of contaminating nucleic acids that would otherwise interfere with cf-RNA analysis (Fig. 8).

[0100] Titrating the input of extracted material into downstream reactions revealed variable traces of inhibitors from blood that lower the efficiency of biochemical reactions. Inhibitor removal and silica-membrane-based purification steps were therefore implemented after the DNase treatment. Inhibitor removal was performed with a OneStep PCR Inhibitor Removal kit (Zymo) with extra column washes to ensure complete removal of the prep buffer. Cleanup on this column increased the apparent yield of cf-RNA by removing contaminants that interfere with enzymes, and the OneStep column had a higher apparent yield than a Micro Bio-Spin column (Bio-Rad) (Fig. 9). Cleanup using the OneStep PCR Inhibitor Removal kit also preserved the recovery of fragmented cf-RNA polynucleotides (Figs. 10A-10B). Silica-membrane purification was performed with a MinElute PCR Purification kit (Qiagen) using a larger volume of ethanol to maximize recovery.

[0101] The final cf-RNA extraction and purification process dramatically reduced the frequency of assay failures due to suboptimal yields of less than 20 pg RNA and provided efficient and linear recovery of cell-free mRNA using a range of input volumes (100 µl to 3 ml) (Figs. 11-12).

Example 4: cDNA synthesis, library preparation, and whole exome capture

[0102] Commercially available low-input RNA sequencing kits generally include reagents for cDNA synthesis and removal of non-informative RNA species. The cDNA synthesis step in SMARTer (Takara) was found to be inefficient. A three-step strategy was therefore developed with a dedicated initial cDNA synthesis step, a second strand synthesis reaction, and a commercially available kit optimized for library preparation from low levels of dsDNA, followed by capture of the whole exome.

[0103] Superscript IV (Invitrogen) was selected as the enzyme for reverse transcription because it demonstrated increased enzyme efficiency, linearity and resistance to traces of inhibitors with cf-RNA input when compared to iScript (Bio-Rad), qScript (Quantabio), Superscript III (Invitrogen) and SMARTScribe (Takara). The conversion efficiency with Superscript IV was further optimized by (1) priming with random hexamers instead of oligo dT, (2) increasing the primer concentration by 30-fold to 3 mg/ml, and (3) extending the reaction time from 10 minutes to 50 minutes as a precaution. The optimized Superscript IV method yielded more cDNA than iScript (Fig. 13) and had better linearity than SMARTScribe (Fig. 14).

[0104] Double-stranded cDNA was generated in a second strand synthesis reaction using NEBNext Enzyme. No cleanup was performed between first and second strand synthesis. The second strand synthesis reaction was optimized by reducing the reagents used and total reaction volume by 50%.

[0105] Accel-NGS 2S Plus was found to be the most robust and scalable library preparation method to generate sequencing libraries from low amounts of input cDNA when compared with Accel-NGS 1S Plus (Swift Biosciences) and others such as NxSeq® UltraLow DNA Library Preparation Kit (Lucigen). In particular, the number of unique sequence fragments was approximately 30% higher using Accel-NGS 2S Plus compared to Accel-NGS 1S Plus (Fig. 15A). Reagents remaining from the NEBNext second-strand synthesis step were not compatible with the chemistry of the Repair I step of Accel-NGS 2S Plus. The cDNA was therefore cleaned before Repair I using 1.8X SPRI beads. To minimize sample loss, Repair I was performed with the SPRI beads in the solution, and polyethylene glycol (PEG) was added after Repair I to promote DNA binding to the beads. To maximize recovery while avoiding contaminants including aptamers and dimers, the amounts of beads were adjusted to 1.8X for Repair I, 1.65X for Repair II, 1.65X for Ligation I and 1.4X for Ligation II. These modifications to the Accel-NGS 2S Plus protocol increased the number of unique sequence fragments by approximately 20% (Fig. 15B).
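The bead ratios quoted above can be checked against the reagent volumes listed in Example 1. The sketch below assumes that each ratio is the volume of SPRI beads or PEGNaCl Solution added divided by the summed volume of the reaction components at that step; the step labels and the helper name `bead_ratio` are chosen here for illustration.

```python
# Volumes in microliters, taken from the Example 1 protocol:
# (reaction component volumes, volume of SPRI beads or PEGNaCl Solution added)
STEPS = {
    "cleanup before Repair I": ([40, 40], 144),          # dsDNA + Low EDTA TE
    "Repair I": ([53, 6, 1], 108),
    "Repair II": ([30, 5, 13, 1, 1], 82.5),
    "Ligation I": ([20, 5, 3, 2], 49.5),
    "Ligation II": ([30, 5, 2, 9, 1, 2, 1], 70),
}


def bead_ratio(component_volumes_ul, added_ul):
    """Ratio of added bead/PEG volume to reaction volume, rounded to 2 places."""
    return round(added_ul / sum(component_volumes_ul), 2)


ratios = {step: bead_ratio(vols, added) for step, (vols, added) in STEPS.items()}
for step, ratio in ratios.items():
    print(f"{step}: {ratio}X")
```

Under that assumption the listed volumes reproduce the stated ratios: 1.8X for the cleanup before Repair I and for Repair I, 1.65X for Repair II and Ligation I, and 1.4X for Ligation II.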

[0106] During PCR and before enrichment, each library prep was uniquely labeled with UDIs on both ends. UDIs were selected instead of standard indexes to minimize index hopping, which was especially important for cf-RNA libraries given the low number of copies of the input material. When standard indexes were used, sequencing reads were misassigned to a negative control (NTC). This contamination was alleviated by the use of UDIs (Figs. 16A-16B).

[0107] cf-mRNA was enriched from total cf-RNA by whole exome capture. This method was selected instead of rRNA depletion because mRNA constitutes less than 10% of the RNA molecules in circulation. Capture is performed with either RNA baits (Agilent) or DNA capture probes (IDT). RNA baits are preferred due to higher coverage of specific regions of interest. However, both can be used. For normalization and quality control, the whole exome probes were combined with another set of probes designed to capture the 35 ERCC standards covering a wide range of copy numbers and sizes that were spiked in during the extraction step. Capture was performed on pools of up to 10 cDNA samples according to a modified version of an Agilent protocol using XT2 blockers and reagents with XT probes.

The percentage of RNA polynucleotides arising from mRNA and other sources such as ribosomal RNA, mitochondrial RNA, non-coding RNA and other RNA species was determined by sequencing to compare different enrichment strategies. The mRNA fraction was substantially higher with whole exome capture compared to negative enrichment by rRNA depletion and the starting pool of total RNA (Fig. 17). In particular, approximately 80% of the sequence reads from the whole exome capture material were mRNA sequences, approximately 45% of the sequences after rRNA depletion were mRNA sequences, and less than 5% of sequences from total RNA were mRNA sequences.

[0108] The cDNA synthesis, library preparation, and whole exome capture methods presented in this section yielded sequencing reads more representative of the broad spectrum of cf-mRNAs compared to libraries constructed and depleted of rRNA using the SMARTer kit. The sensitivity for detecting low amounts of spiked-in ERCC standards was improved by a factor of 5 to 20 (Fig. 18A). This increased sensitivity facilitated detection of substantially more protein coding genes (Fig. 18B). Using spiked-in ERCC standards of known concentration, the sensitivity of detection was estimated to be approximately 14 copies (Fig. 18C).

Example 5: Comparison to other cf-mRNA libraries

[0109] Cell-free mRNA libraries prepared by the method of Example 1 were superior to the “Pan et al” cell-free mRNA libraries (Pan et al, Clin Chem. 2017 Nov;63(11):1695-1704). For this analysis, raw sequencing data from both studies was processed using the bioinformatics pipeline described in Example 1. The Pan libraries were prepared from the equivalent of 500 µl of serum per prep, whereas only 165 µl of serum per prep was used to prepare the Example 1 libraries. Despite approximately two-fold fewer sequencing reads per prep (Fig. 19A), the Example 1 protocol (modified with centrifugation at 16,000 g and enrichment with DNA capture probes) yielded approximately six-fold more unique fragments (Fig. 19B), approximately three-fold more protein coding genes (Fig. 19C), approximately four-fold more genes with >80% coverage (Fig. 19D), and approximately eight-fold more liver genes (Fig. 19E).

[0110] While preferred embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is intended that the following claims define the scope of the disclosure and that methods and structures within the scope of these claims and their equivalents be covered thereby.