


Title:
METHODS AND KITS FOR INTEGRATING GENOMIC SEQUENCES WITH IMMUNE MONITORING
Document Type and Number:
WIPO Patent Application WO/2014/011735
Kind Code:
A1
Abstract:
The present disclosure provides methods and kits for relating an immune repertoire to selected conditions to provide, e.g., prognosis, diagnosis, and/or monitoring or modulation of treatment. The immune repertoire may be linked, e.g., with a genomic locus measurement and/or an abnormality.

Inventors:
JOHNSON DAVID SCOTT (US)
LOEHR ANDREA (US)
HSU ANDRO (US)
MEYER EVERETT HURTEAU (US)
Application Number:
PCT/US2013/049872
Publication Date:
January 16, 2014
Filing Date:
July 10, 2013
Assignee:
GIGAGEN INC (US)
International Classes:
C12Q1/68; G16B20/40; C12N5/07; G01N33/53
Other References:
PORCELLI ET AL.: "Analysis of T cell antigen receptor (TCR) expression by human peripheral blood CD4-8- alpha/beta T cells demonstrates preferential use of several V beta genes and an invariant TCR alpha chain", THE JOURNAL OF EXPERIMENTAL MEDICINE, vol. 178, no. 1, 1993, pages 1 - 16
BOYD ET AL.: "Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing", SCIENCE TRANSLATIONAL MEDICINE, vol. 1, no. 12, 2009, pages 12RA23
WARREN ET AL.: "Exhaustive T-cell repertoire sequencing of human peripheral blood samples reveals signatures of antigen selection and a directly measured repertoire size of at least 1 million clonotypes", GENOME RESEARCH, vol. 21, no. 5, 2011, pages 790 - 797
HALL ET AL.: "Quantitative-trait loci on chromosomes 1, 2, 3, 4, 8, 9, 11, 12, and 18 control variation in levels of T and B lymphocyte subpopulations", THE AMERICAN JOURNAL OF HUMAN GENETICS, vol. 70, no. 5, 2002, pages 1172 - 1182
FREEMAN ET AL.: "Profiling the T-cell receptor beta-chain repertoire by massively parallel sequencing", GENOME RESEARCH, vol. 19, no. 10, 2009, pages 1817 - 1824
Attorney, Agent or Firm:
SHUSTER, Michael, J. et al. (801 California Street, Mountain View, CA, US)
Claims:
What is claimed is:

1. A method for associating an immune cell repertoire with a genomic locus measurement, comprising:

a. obtaining a first dataset comprising quantitative sequence information regarding a first immune cell repertoire of a first population of individuals, and obtaining a genotype of said first population of individuals at a target genomic locus;

b. obtaining a second dataset comprising quantitative sequence information regarding a second immune cell repertoire of a second population of individuals, and obtaining a genotype of said second population of individuals at said target genomic locus; and

c. analyzing, by a computer, said first and second datasets to associate a feature of said immune cell repertoire with a genotype of said target genomic locus.

2. A method for associating an immune cell repertoire with a genomic locus measurement, comprising:

a. obtaining a first dataset comprising quantitative sequence information regarding the immune cell repertoire of a first population of individuals, and obtaining a genotype of said first population of individuals at a target genomic locus;

b. obtaining a second dataset comprising quantitative sequence information regarding the immune cell repertoire of a second population of individuals and the genotype of said second population of individuals at said target genomic locus; and

c. analyzing, by a computer, said first and second datasets to associate a feature of an immune cell repertoire with a genotype of said target genomic locus.

3. The method of claim 1 or 2, wherein said genomic locus comprises a germline sequence.

4. The method of claim 1 or 2, wherein said genomic locus comprises a somatic cell mutation.

5. The method of claim 1 or 2, wherein said genomic locus comprises an epigenetically modified locus.

6. The method of claim 1 or 2, wherein said first population and said second population of individuals differ with respect to an immune-mediated function.

7. The method of claim 6, wherein said immune-mediated function is selected from the group consisting of: transplant rejection, incidence of auto-immune disorder, and immune-mediated treatment response.

8. The method of claim 1 or 2, wherein said analyzing step comprises use of a statistical model or machine learning algorithm.

9. The method of claim 8, wherein said statistical model or machine learning algorithm comprises use of linear regression, maximum likelihood, analysis of variance, general linear model, generalized linear model, multilevel model, or structural equation model.

10. The method of claim 1 or 2, wherein said feature of said immune cell repertoire is selected from the group consisting of: an immune cell clonal sequence and an immune cell clonal abundance.

11. The method of claim 1 or 2, wherein said quantitative sequence information comprises nucleic acid or polypeptide sequence information.

12. The method of claim 1 or 2, wherein said first and said second dataset comprises quantitative immune cell clone abundance for each of said populations of individuals.

13. The method of claim 1 or 2, wherein said immune cell clones comprise T cells or B cells.

14. The method of claim 1 or 2, wherein said immune cell clones comprise a cell selected from the group consisting of: splenocytes, spleen cells, skin cells, lymph node cells, colon cells, and peripheral blood cells.

15. The method of claim 1 or 2, wherein a sequence comparison of said target genomic locus of said first population and said second population indicates the sequences to be syngeneic, comprise an allogeneic minor mismatch, or comprise an allogeneic major mismatch.

16. The method of claim 1 or 2, wherein the dataset comprising information about the immune cell repertoire comprises quantitative sequence information from variable immunoglobulin or T cell receptor sequences.

17. The method of claim 1 or 2, wherein said quantitative sequence information is derived from B cell receptor or T cell receptor sequences.

18. The method of claim 17, wherein said quantitative sequence information comprises VDJ sequence information from clonal B cells or clonal T cells.

19. The method of claim 18, wherein said VDJ sequence information is from TCRβ.

20. The method of claim 1 or 2, wherein said first population is associated with an abnormality.

21. The method of claim 20, wherein said abnormality is selected from the group consisting of a cancer, an inflammatory condition, a cardiovascular disease, an endocrine disease, an eye disease, a genetic disorder, an infectious disease, an intestinal disease, and a neurological disorder.

22. The method of claim 1 or 2, wherein said second population is associated with a healthy state.

23. The method of claim 1 or 2, wherein said first population comprises an experimental group.

24. The method of claim 1 or 2, wherein the first population comprises an experimental group, the second population comprises a control group, and wherein said experimental group and said control group are differentiated by clinical factors selected from the group consisting of: age, weight, height, ethnicity, gender, and environmental exposures.

25. The method of claim 1 or 2, wherein said second population comprises a control group.

26. The method of claim 1 or 2, wherein said target genomic locus is a sequence variation or mutation in a gene selected from the group consisting of: interleukin-2 (IL-2), interleukin-4 (IL-4), interferon gamma (IFNγ), interleukin-10 (IL-10), interleukin-1 (IL-1), interleukin-13 (IL-13), interleukin-17 (IL-17), interleukin-18 (IL-18), tumor necrosis factor alpha (TNFα), tumor necrosis factor beta (TNFβ), T-box transcription factor 21 (TBX21), forkhead box P3 (FOXP3), cluster of differentiation 4 (CD4), cluster of differentiation 8 (CD8), cluster of differentiation 1d (CD1d), cluster of differentiation 161 (CD161), cluster of differentiation 3 (CD3), major histocompatibility complex (MHC), cluster of differentiation 19 (CD19), interleukin 7 receptor (IL-7 receptor), cluster of differentiation 10 (CD10), cluster of differentiation 20 (CD20), cluster of differentiation 22 (CD22), cluster of differentiation 34 (CD34), cluster of differentiation 27 (CD27), cluster of differentiation 5 (CD5), cluster of differentiation 45 (CD45), cluster of differentiation 38 (CD38), cluster of differentiation 78 (CD78), interleukin-6 receptor, interferon regulatory factor 4 (IRF4), and cluster of differentiation 138 (CD138).

27. The method of claim 20, wherein said abnormality is a cancer selected from the group consisting of: lung carcinoma, non-small cell lung cancer, small cell lung cancer, uterine cancer, thyroid cancer, breast carcinoma, prostate carcinoma, pancreas carcinoma, colon carcinoma, lymphoma, Burkitt lymphoma, Hodgkin lymphoma, myeloid leukemia, leukemia, sarcoma, blastoma, melanoma, seminoma, brain cancer, glioma, glioblastoma, cerebellar astrocytoma, cutaneous T-cell lymphoma, gastric cancer, liver cancer, ependymoma, laryngeal cancer, neck cancer, stomach cancer, kidney cancer, pancreatic cancer, bladder cancer, esophageal cancer, testicular cancer, medulloblastoma, vaginal cancer, ovarian cancer, cervical cancer, basal cell carcinoma, pituitary adenoma, rhabdomyosarcoma, and Kaposi sarcoma.

28. The method of claim 20, wherein said abnormality is an autoimmune disease, such as inflammatory bowel disease (IBD), multiple sclerosis (MS), or systemic lupus erythematosus (SLE).

29. The method of claim 20, wherein said abnormality is graft-versus-host-disease.

30. The method of claim 1 or 2, wherein said genotype comprises information regarding at least 1 target genomic locus.

31. The method of claim 30, wherein said genotype comprises information regarding at least 10 target genomic loci.

32. The method of claim 31, wherein said genotype comprises information regarding at least 100 target genomic loci.

33. The method of claim 32, wherein said genotype comprises information regarding at least 1000 target genomic loci.

34. The method of claim 10, wherein at least 1 immune cell clonal sequence is associated with a germline, genotype, somatic cell mutation, or epigenetic modification.

35. The method of claim 34, wherein at least 10 immune cell clonal sequences are associated with a germline, genotype, somatic cell mutation, or epigenetic modification.

36. The method of claim 35, wherein at least 100 immune cell clonal sequences are associated with a germline, genotype, somatic cell mutation, or epigenetic modification.

37. The method of claim 36, wherein at least 1000 immune cell clonal sequences are associated with a germline, genotype, somatic cell mutation, or epigenetic modification.

38. The method of claim 1 or 2, wherein said quantitative sequence information is derived from a single cell.

39. The method of claim 38, wherein said quantitative sequence information is derived from a sequence attached to a barcode.

40. The method of claim 39, wherein said barcode is unique to said single cell.

41. The method of claim 1 or 2, wherein said quantitative sequence information is derived from a single cell or cell population encapsulated in a droplet or reaction container.

42. The method of claim 41, wherein said quantitative sequence information is derived from a sequence attached to a barcode, wherein said barcode is unique to said droplet or reaction container.

43. A method for immune cell repertoire monitoring, comprising:

a. obtaining a first dataset comprising quantitative sequence information regarding a first immune cell repertoire of a first population of individuals and the genotype of said first population of individuals at a target genomic locus;

b. obtaining a second dataset comprising quantitative sequence information regarding a second immune cell repertoire of a second population of individuals and the genotype of said second population of individuals at said target genomic locus;

c. analyzing said first and second datasets, by a computer, to associate a feature of said immune cell repertoire with a genotype of said target genomic locus, wherein said first or said second population is associated with said abnormality; and

d. obtaining a genotype at said target locus from a sample and characterizing clinical status based on said determined association between the genotype at said target genomic locus and said associated feature of said immune cell repertoire.

44. A kit for the method of claim 43, comprising: reagents for assaying a sample with unknown disease status at said genetic locus and for said immune cell clone, and instructions for diagnosing immune abnormality based on the known association between the genotype at said target genomic locus and the quantity of said immune cell clone.

45. A method for identifying a feature of an immune cell repertoire, comprising:

a. obtaining an association between a feature of said immune cell repertoire and a genotype at a target genomic locus; and

b. obtaining a genotype at said target genomic locus from a sample and identifying, by a computer, said feature of the immune cell repertoire of said patient based on said determined association between the genotype at said target genomic locus and said associated feature of said immune cell repertoire.

46. The method of any one of the preceding claims, further comprising: modifying a treatment plan based on the associated feature of said immune cell repertoire with said genotype of said target genomic locus.

47. A method for associating an immune cell repertoire with a genomic locus measurement, comprising:

a. obtaining a first dataset comprising quantitative sequence information regarding a first immune cell repertoire of a first population of individuals, and obtaining a genotype of said first population of individuals at a target genomic locus;

b. obtaining a second dataset comprising quantitative sequence information regarding a second immune cell repertoire of a second population of individuals, and obtaining a genotype of said second population of individuals at said target genomic locus; and

c. inputting said first and second datasets into an interpretation function stored on a computer to generate a score indicative of an association of a feature of said immune cell repertoire with a genotype of said target genomic locus.

48. A method for associating an immune cell repertoire with a genomic locus measurement, comprising:

a. obtaining a first dataset comprising quantitative sequence information about the immune cell repertoire of a first population of individuals, and obtaining a genotype of said first population of individuals at a target genomic locus;

b. obtaining a second dataset comprising quantitative sequence information about the immune cell repertoire of a second population of individuals and the genotype of said second population of individuals at said target genomic locus; and

c. inputting said first and second datasets into an interpretation function stored on a computer to generate a score indicative of an association of a feature of an immune cell repertoire with a genotype of said target genomic locus.

49. A system for associating an immune cell repertoire with a genomic locus measurement, the system comprising:

a. a processor;

b. a storage memory comprising a first dataset and a second dataset, wherein said first dataset comprises quantitative sequence information about an immune cell repertoire of a first population of individuals and a genotype of said first population of individuals at a target genomic locus, said second dataset comprising quantitative sequence information about an immune cell repertoire of a second population of individuals and a genotype of said second population of individuals at said target genomic locus; and

c. an interpretation function engine, executed by the processor, adapted to determine a score that indicates a correlation between a genotype at said target genomic locus and a feature of an immune cell repertoire from said first dataset and said second dataset.

50. A computer program product for determining a score indicative of a correlation between a genotype at a target genomic locus and a feature of an immune cell repertoire, the computer program product stored on a non-transitory computer readable medium and including program code for, when loaded into memory and executed by a processor, carrying out the steps of:

a. accessing a first dataset and a second dataset, wherein said first dataset comprises quantitative sequence information about an immune cell repertoire of a first population of individuals and a genotype of said first population of individuals at a target genomic locus, said second dataset comprising quantitative sequence information about an immune cell repertoire of a second population of individuals and a genotype of said second population of individuals at said target genomic locus; and

b. determining a score with an interpretation function from said first dataset and said second dataset, wherein said score is indicative of a correlation between a genotype at said target genomic locus and a feature of an immune cell repertoire.

51. A system for associating an immune cell repertoire with a genomic locus measurement, the system comprising:

a. a processor;

b. a storage memory comprising a first dataset and a second dataset, wherein said first dataset comprises quantitative sequence information about an immune cell repertoire of a first population of individuals and a genotype of said first population of individuals at a target genomic locus, said second dataset comprising quantitative sequence information about an immune cell repertoire of a second population of individuals and a genotype of said second population of individuals at said target genomic locus; and

c. an interpretation function engine, executed by the processor, adapted to determine a score that indicates a correlation between a genotype at said target genomic locus and a feature of an immune cell repertoire from said first dataset and said second dataset.

52. A computer program for determining a score indicative of a correlation between a genotype at a target genomic locus and a feature of an immune cell repertoire, the computer program product stored on a non-transitory computer readable medium and including program code for, when loaded into memory and executed by a processor, carrying out the steps of:

a. accessing a first dataset and a second dataset, wherein said first dataset comprises quantitative sequence information about an immune cell repertoire of a first population of individuals and a genotype of said first population of individuals at a target genomic locus, said second dataset comprising quantitative sequence information about an immune cell repertoire of a second population of individuals and a genotype of said second population of individuals at said target genomic locus; and

b. determining a score with an interpretation function from said first dataset and said second dataset, wherein said score is indicative of a correlation between a genotype at said target genomic locus and a feature of an immune cell repertoire.

Description:
METHODS AND KITS FOR INTEGRATING GENOMIC SEQUENCES WITH IMMUNE MONITORING

Inventors

David Scott Johnson, PhD

Andrea Loehr, PhD

Andro Hsu, PhD

Everett Hurteau Meyer, MD, PhD

Cross-Reference to Related Applications

[0002] This application claims the benefit of U.S. Provisional Patent Application No. 61/669,820, filed July 10, 2012, the disclosure of which is incorporated herein by reference.

Field

[0003] The disclosure relates to the fields of molecular biology and molecular diagnostics to perform immune monitoring and more specifically to methods and kits for relating immune repertoire with genomic sequences.

Background

[0004] Millions of people worldwide with immune disorders might end up hospitalized tomorrow, without warning. Currently medicine lacks the tools to provide warning, which could help reduce the impact and severity of the disease flare. Therefore there is a critical need for specific, noninvasive, and objective methods for monitoring of immune status of the immune repertoire under selected conditions. These conditions include cancer, transplantation, autoimmunity, infectious disease, and a number of other immune-modulated diseases.

[0005] In one specific application, immune monitoring is used to assist in the prevention and treatment of acute graft-versus-host disease (aGVHD) after bone marrow transplantation (BMT). In another application, the immune system of the cancer patient responds to the genetic makeup of the tumor as well as germline variants. In autoimmune disease, germline HLA haplotypes and other immune-related genetic loci are associated with severity of disease, and T cell repertoires are patient-specific. What is needed, therefore, is a method to combine genomic or germline and T cell repertoire diagnostic data, which each have separate sensitivity and specificity, to increase the overall sensitivity and specificity of the diagnosis and prognosis.

Summary

[0006] The present invention includes a method for associating an immune cell repertoire with a genomic genotype. In one embodiment, the genomic genotype can be a germline variant, an epigenetically modified locus, or a somatic cell mutation. The method includes obtaining a first dataset comprising quantitative sequence information regarding a first immune cell repertoire of a first population of individuals, and obtaining a genotype of said first population of individuals at a target genomic locus. The method further includes obtaining a second dataset comprising quantitative sequence information regarding a second immune cell repertoire of a second population of individuals, and obtaining a genotype of said second population of individuals at said target genomic locus. The method further includes analyzing, by a computer, said first and second datasets to associate a feature of said immune cell repertoire with a genotype of said target genomic locus.
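As a purely illustrative sketch of the analyzing step, and not the method of the disclosure itself, the following Python code pools two datasets and averages the frequency of one immune cell clone within each genotype group. The record layout, the individual identifiers, the genotype labels, the frequency values, and the function name mean_abundance_by_genotype are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (individual id, genotype at the target locus,
# frequency of one immune cell clone in that individual's repertoire).
first_dataset = [
    ("P1", "AA", 0.12),
    ("P2", "AG", 0.08),
    ("P3", "AA", 0.10),
]
second_dataset = [
    ("C1", "GG", 0.01),
    ("C2", "AG", 0.04),
    ("C3", "GG", 0.03),
]


def mean_abundance_by_genotype(*datasets):
    """Pool the datasets and average the clone frequency within each genotype."""
    groups = defaultdict(list)
    for dataset in datasets:
        for _individual, genotype, frequency in dataset:
            groups[genotype].append(frequency)
    return {genotype: mean(values) for genotype, values in groups.items()}


summary = mean_abundance_by_genotype(first_dataset, second_dataset)
for genotype in sorted(summary):
    print(genotype, round(summary[genotype], 3))
```

A difference in the group means only suggests an association; a real analysis would apply a statistical model to far larger cohorts.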

[0007] In one embodiment, the genomic locus comprises a germline sequence. In another embodiment, the genomic locus comprises a somatic cell mutation. In still another embodiment, the genomic locus comprises an epigenetically modified locus.

[0008] In another aspect, the invention includes a method for associating an immune cell repertoire with a genomic genotype. The method includes obtaining a first dataset comprising quantitative sequence information regarding the immune cell repertoire of a first population of individuals, and obtaining a genotype of said first population of individuals at a target genomic locus. The method further includes obtaining a second dataset comprising quantitative sequence information regarding the immune cell repertoire of a second population of individuals and the genotype of said second population of individuals at said target genomic locus. The method further includes analyzing, by a computer, said first and second datasets to associate a feature of an immune cell repertoire with a genotype of said target genomic locus.

[0009] In one embodiment, the first population and said second population of individuals differ with respect to an immune-mediated function. In a further embodiment, the immune-mediated function is a transplant rejection, incidence of auto-immune disorder, or immune-mediated treatment response.

[0010] In one aspect, the analyzing step comprises use of a dimension reduction step. In a further aspect, the dimension reduction step comprises principal component analysis.
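A dimension reduction step of this kind might be sketched as follows. The code extracts only the leading principal component of a small clone abundance matrix by power iteration, which is a simplified stand-in for a full principal component analysis routine. The matrix values, the starting vector, the iteration count, and the function name first_principal_component are assumptions made for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (component, scores) for the leading principal component.

    rows holds one list per individual and one column per clone; at least
    two rows are required. Power iteration is applied to the sample
    covariance matrix of the column-centered data.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Asymmetric start vector (an arbitrary choice) so that it is unlikely
    # to be orthogonal to the leading eigenvector.
    start = [float(j + 1) for j in range(d)]
    start_norm = math.sqrt(sum(x * x for x in start))
    vec = [x / start_norm for x in start]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance in the data
            break
        vec = [x / norm for x in nxt]
    scores = [sum(r[j] * vec[j] for j in range(d)) for r in centered]
    return vec, scores


# Hypothetical clone frequencies: 4 individuals x 3 clones.
rows = [
    [0.30, 0.10, 0.05],
    [0.28, 0.12, 0.05],
    [0.05, 0.30, 0.05],
    [0.07, 0.28, 0.05],
]
component, scores = first_principal_component(rows)
print([round(x, 3) for x in component])
print([round(s, 3) for s in scores])
```

In this toy matrix the first two individuals and the last two fall on opposite sides of the component, so a single score per individual captures most of the variation in the three clone frequencies.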

[0011] In one embodiment, the feature of said immune cell repertoire is an immune cell clonal sequence or an immune cell clonal abundance. In another embodiment, the quantitative sequence information comprises nucleic acid or polypeptide sequence information. In still another embodiment, the first and said second dataset comprises quantitative immune cell clone abundance for each of said populations of individuals.
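To make the notion of immune cell clonal abundance concrete, the sketch below counts identical receptor sequences in a list of reads and converts the counts to fractions of the repertoire. The example sequences and the function name clone_frequencies are hypothetical, and treating identical sequences as one clone is a simplifying assumption.

```python
from collections import Counter


def clone_frequencies(receptor_reads):
    """Map each distinct receptor sequence to its fraction of all reads."""
    counts = Counter(receptor_reads)
    total = sum(counts.values())
    return {clone: count / total for clone, count in counts.items()}


# Hypothetical amino acid sequences standing in for sequenced receptor reads.
reads = [
    "CASSLGQETQYF",
    "CASSLGQETQYF",
    "CASSPDRGNTEAFF",
    "CASSLGQETQYF",
]
freqs = clone_frequencies(reads)
print(freqs)
```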

[0012] In one aspect, the immune cell clones comprise T cells or B cells. In another aspect, the immune cell clones are comprised of splenocytes, spleen cells, skin cells, lymph node cells, colon cells, or peripheral blood cells. In another aspect, the sequence comparison of said target genomic locus of said first population and said second population indicates the sequences to be syngeneic, comprise an allogeneic minor mismatch, or comprise an allogeneic major mismatch.

[0013] In some embodiments, the dataset comprising information about the immune cell repertoire comprises quantitative sequence information from variable immunoglobulin or T cell receptor sequences. In other embodiments, the quantitative sequence information is derived from B cell receptor or T cell receptor sequences. In yet another embodiment, the quantitative sequence information comprises VDJ sequence information from clonal B cells or clonal T cells. In still another embodiment, the VDJ sequence information is from TCRβ.

[0014] In one aspect, the first population is associated with an abnormality. The abnormality could be, e.g., a cancer, an inflammatory condition, a cardiovascular disease, an endocrine disease, an eye disease, a genetic disorder, an infectious disease, an intestinal disease, or a neurological disorder. In another aspect, the second population is associated with a healthy state.

In still another aspect, the first population comprises an experimental group. In a related aspect, the first population comprises an experimental group, the second population comprises a control group, and wherein said experimental group and said control group are differentiated by clinical factors selected from the group consisting of: age, weight, height, ethnicity, gender, and environmental exposure. In still another aspect, the second population comprises a control group.

[0015] In one embodiment, the target genomic locus is interleukin-2 (IL-2), interleukin-4 (IL-4), interferon gamma (IFNγ), interleukin-10 (IL-10), interleukin-1 (IL-1), interleukin-13 (IL-13), interleukin-17 (IL-17), interleukin-18 (IL-18), tumor necrosis factor alpha (TNFα), tumor necrosis factor beta (TNFβ), T-box transcription factor 21 (TBX21), forkhead box P3 (FOXP3), cluster of differentiation 4 (CD4), cluster of differentiation 8 (CD8), cluster of differentiation 1d (CD1d), cluster of differentiation 161 (CD161), cluster of differentiation 3 (CD3), major histocompatibility complex (MHC), cluster of differentiation 19 (CD19), interleukin 7 receptor (IL-7 receptor), cluster of differentiation 10 (CD10), cluster of differentiation 20 (CD20), cluster of differentiation 22 (CD22), cluster of differentiation 34 (CD34), cluster of differentiation 27 (CD27), cluster of differentiation 5 (CD5), cluster of differentiation 45 (CD45), cluster of differentiation 38 (CD38), cluster of differentiation 78 (CD78), interleukin-6 receptor, interferon regulatory factor 4 (IRF4), or cluster of differentiation 138 (CD138). The sequence information can comprise a sequence variation or mutation in one or more of the target loci.

[0016] In one aspect, the clinical abnormality is a cancer selected from the group consisting of: lung carcinoma, non-small cell lung cancer, small cell lung cancer, uterine cancer, thyroid cancer, breast carcinoma, prostate carcinoma, pancreas carcinoma, colon carcinoma, lymphoma, Burkitt lymphoma, Hodgkin lymphoma, myeloid leukemia, leukemia, sarcoma, blastoma, melanoma, seminoma, brain cancer, glioma, glioblastoma, cerebellar astrocytoma, cutaneous T-cell lymphoma, gastric cancer, liver cancer, ependymoma, laryngeal cancer, neck cancer, stomach cancer, kidney cancer, pancreatic cancer, bladder cancer, esophageal cancer, testicular cancer, medulloblastoma, vaginal cancer, ovarian cancer, cervical cancer, basal cell carcinoma, pituitary adenoma, rhabdomyosarcoma, and Kaposi sarcoma. In another aspect, the abnormality is an autoimmune disease, such as inflammatory bowel disease (IBD), multiple sclerosis (MS), or systemic lupus erythematosus (SLE). In still another aspect, the abnormality is graft-versus-host-disease.

[0017] In one embodiment, the genotype comprises information regarding at least 1, at least 10, at least 100, or at least 1000 genomic loci. In another embodiment, at least 1, at least 10, at least 100, or at least 1000 immune cell clonal sequences are associated with a germline, genotype, somatic cell mutation, or epigenetic modification.

[0018] In one aspect, the quantitative sequence information is derived from a single cell. In another aspect, the quantitative sequence information is derived from a sequence attached to a barcode. In a further embodiment, the barcode is unique to said single cell. In another embodiment, the quantitative sequence information is derived from a single cell or cell population encapsulated in a droplet or reaction container. In still another embodiment, the quantitative sequence information is derived from a sequence attached to a barcode, wherein said barcode is unique to said droplet or reaction container.
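One way a cell-specific barcode could feed into quantitative sequence information is sketched below: reads are grouped by barcode so that each cell contributes one count to its clone, however many reads it produced. The barcode labels, the sequences, the rule of taking the most frequent sequence per barcode, and the function name cells_per_clone are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical (barcode, receptor sequence) pairs, one pair per read.
barcoded_reads = [
    ("BC01", "CASSLGQETQYF"),
    ("BC01", "CASSLGQETQYF"),
    ("BC02", "CASSLGQETQYF"),
    ("BC03", "CASSPDRGNTEAFF"),
]


def cells_per_clone(reads):
    """Count cells (distinct barcodes) per clone rather than raw reads."""
    sequences_by_barcode = defaultdict(Counter)
    for barcode, sequence in reads:
        sequences_by_barcode[barcode][sequence] += 1
    clone_cells = Counter()
    for seq_counts in sequences_by_barcode.values():
        # Assumed rule: the most frequent sequence under a barcode is
        # taken as that cell's receptor sequence.
        dominant, _ = seq_counts.most_common(1)[0]
        clone_cells[dominant] += 1
    return clone_cells


clone_cells = cells_per_clone(barcoded_reads)
print(dict(clone_cells))
```

Counting barcodes instead of reads keeps a highly expressed or preferentially amplified cell from inflating the apparent abundance of its clone.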

[0019] Also provided herein is a method for immune cell repertoire monitoring. The method includes obtaining a first dataset comprising quantitative sequence information regarding a first immune cell repertoire of a first population of individuals and the genotype of said first population of individuals at a target genomic locus. The method also includes obtaining a second dataset comprising quantitative sequence information regarding a second immune cell repertoire of a second population of individuals and the genotype of said second population of individuals at said target genomic locus. The first and second datasets are analyzed, by a computer, to associate a feature of said immune cell repertoire with a genotype of said target genomic locus, wherein said first or said second population is associated with said abnormality. The method further includes obtaining a genotype at said target locus from a sample and characterizing clinical status based on said determined association between the genotype at said target genomic locus and said associated feature of said immune cell repertoire.

[0020] The invention further encompasses a kit for performing a method of the invention, the kit comprising reagents for assaying a sample with unknown disease status at said genetic locus and for said immune cell clone, and instructions for diagnosing immune abnormality based on the known association between the genotype at said target genomic locus and the quantity of said immune cell clone.

[0021] In one embodiment the invention provides a method for identifying a feature of an immune cell repertoire. The method includes obtaining an association between a feature of said immune cell repertoire and a genotype at a target genomic locus. The method also includes obtaining a genotype at said target genomic locus from a sample and identifying, by a computer, said feature of the immune cell repertoire of said patient based on said determined association between the genotype at said target genomic locus and said associated feature of said immune cell repertoire.
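In its simplest form, identifying a feature from a previously determined association could amount to a lookup keyed by the sample's genotype, as in the sketch below. The association table, its genotype labels, clone sequences and frequencies, and the function name expected_feature are hypothetical and serve only to illustrate the flow of information.

```python
# Hypothetical association obtained beforehand: genotype at the target
# locus -> (clone sequence, typical frequency of that clone).
association = {
    "AA": ("CASSLGQETQYF", 0.11),
    "AG": ("CASSLGQETQYF", 0.06),
    "GG": ("CASSPDRGNTEAFF", 0.02),
}


def expected_feature(sample_genotype, known_association):
    """Return the repertoire feature linked to a genotype, or None if unknown."""
    return known_association.get(sample_genotype)


print(expected_feature("AA", association))
print(expected_feature("TT", association))
```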

[0022] The invention also encompasses a method of modifying a treatment plan based on the associated feature of said immune cell repertoire with said genotype or epigenetic status of said target genomic locus.

[0023] In one embodiment, the invention encompasses a method for associating an immune cell repertoire with a genomic genotype. The method includes obtaining a first dataset comprising quantitative sequence information regarding a first immune cell repertoire of a first population of individuals, and obtaining a genotype of said first population of individuals at a target genomic locus. The method further includes obtaining a second dataset comprising quantitative sequence information regarding a second immune cell repertoire of a second population of individuals, and obtaining a genotype of said second population of individuals at said target genomic locus. The first and second datasets are input into an interpretation function stored on a computer to generate a score indicative of an association of a feature of said immune cell repertoire with a genotype of said target genomic locus.
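The disclosure does not fix a particular interpretation function at this point. As one hedged possibility, the sketch below scores the association as the Pearson correlation between allele dosage at the target locus (0, 1, or 2 copies of a variant allele) and the abundance of one clone, pooled over both datasets. The dosage encoding, the data values, and the function name association_score are assumptions introduced for the example.

```python
import math


def association_score(dosages, abundances):
    """Pearson correlation between allele dosage and clone abundance.

    Returns 0.0 when either input has no variance, since no association
    can be estimated in that case.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(abundances) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, abundances))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in abundances)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Hypothetical pooled individuals from the first and second datasets.
dosages = [2, 1, 2, 0, 1, 0]
abundances = [0.12, 0.08, 0.10, 0.01, 0.04, 0.03]
score = association_score(dosages, abundances)
print(round(score, 3))
```

A score near 1 or -1 indicates a strong linear relationship between genotype and clone abundance in the pooled data, and a score near 0 indicates none; other statistical models would yield different scores.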

[0024] In one aspect, the invention encompasses a method for associating an immune cell repertoire with a genomic genotype. The method includes obtaining a first dataset comprising quantitative sequence information about the immune cell repertoire of a first population of individuals, and obtaining a genotype of said first population of individuals at a target genomic locus. The method further includes obtaining a second dataset comprising quantitative sequence information about the immune cell repertoire of a second population of individuals and the genotype of said second population of individuals at said target genomic locus. The first and second datasets are input into an interpretation function stored on a computer to generate a score indicative of an association of a feature of an immune cell repertoire with a genotype of said target genomic locus.

[0025] The invention further provides a system for associating an immune cell repertoire with a genomic genotype, the system comprising: a processor; a storage memory comprising a first dataset and a second dataset, wherein said first dataset comprises quantitative sequence information about an immune cell repertoire of a first population of individuals and a genotype of said first population of individuals at a target genomic locus, said second dataset comprising quantitative sequence information about an immune cell repertoire of a second population of individuals and a genotype of said second population of individuals at said target genomic locus; and an interpretation function engine, executed by the processor, adapted to determine a score that indicates a correlation between a genotype at said target genomic locus and a feature of an immune cell repertoire from said first dataset and said second dataset.

[0026] In one embodiment, the invention includes a computer program product for determining a score indicative of a correlation between a genotype at a target genomic locus and a feature of an immune cell repertoire, the computer program product stored on a non-transitory computer readable medium and including program code for, when loaded into memory and executed by a processor, carrying out the steps of: accessing a first dataset and a second dataset, wherein said first dataset comprises quantitative sequence information about an immune cell repertoire of a first population of individuals and a genotype of said first population of individuals at a target genomic locus, said second dataset comprising quantitative sequence information about an immune cell repertoire of a second population of individuals and a genotype of said second population of individuals at said target genomic locus; and determining a score with an interpretation function from said first dataset and said second dataset, wherein said score is indicative of a correlation between a genotype at said target genomic locus and a feature of an immune cell repertoire.

[0027] The invention also provides a system for associating an immune cell repertoire with a genomic genotype, the system comprising: a processor; a storage memory comprising a first dataset and a second dataset, wherein said first dataset comprises quantitative sequence information about an immune cell repertoire of a first population of individuals and a genotype of said first population of individuals at a target genomic locus, said second dataset comprising quantitative sequence information about an immune cell repertoire of a second population of individuals and a genotype of said second population of individuals at said target genomic locus; and an interpretation function engine, executed by the processor, adapted to determine a score that indicates a correlation between a genotype at said target genomic locus and a feature of an immune cell repertoire from said first dataset and said second dataset.

[0028] In one embodiment, the invention provides a computer program for determining a score indicative of a correlation between a genotype at a target genomic locus and a feature of an immune cell repertoire, the computer program product stored on a non-transitory computer readable medium and including program code for, when loaded into memory and executed by a processor, carrying out the steps of: accessing a first dataset and a second dataset, wherein said first dataset comprises quantitative sequence information about an immune cell repertoire of a first population of individuals and a genotype of said first population of individuals at a target genomic locus, said second dataset comprising quantitative sequence information about an immune cell repertoire of a second population of individuals and a genotype of said second population of individuals at said target genomic locus; and determining a score with an interpretation function from said first dataset and said second dataset, wherein said score is indicative of a correlation between a genotype at said target genomic locus and a feature of an immune cell repertoire.

Brief Description of the Figures

[0029] These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, and accompanying drawings, where:

[0030] Figure 1 depicts a method of linkage of germline and immune cell repertoire of two samples to determine predictive information on immune cell repertoire from genomic genotype.

[0031] Figure 2 depicts a method of linkage of germline and immune cell repertoire in a host, e.g., a patient, and in a tissue donor, and the processing of this information to provide predictive information on aGVHD in a host after transplant from a donor.

[0032] Figure 3 depicts a method of analysis of a control sample and a sample with an immunity disorder to derive information for linking a sample's immune repertoire or genotype with an immunity disorder.

[0033] Figure 4 depicts a method of monitoring of immune cell repertoire of a patient having a transplant, immune disorder, or other status where a treatment affecting an immune cell repertoire is provided, and modifying treatment based on the results of monitoring the patient.

[0034] Figure 5 shows a diagrammatic representation of a machine in the exemplary form of a computer system within which a set of computer-implemented instructions, for causing the machine to perform any one of the methodologies discussed herein, may be executed.

[0035] Figure 6 shows a workflow for high-throughput generation of repertoire sequencing libraries. The first PCR uses a set of 30 primers to amplify the full T cell receptor β (TCRβ) repertoire and attaches universal priming regions. The second PCR amplifies the repertoire with universal primers and adds sequences for next-generation sequencing.

[0036] Figure 7 shows a table including a summary of protocol performance in repertoire sequencing optimization experiments. R² and slope are computed from a regression analysis of the observed vs. known input counts.

[0037] Figure 8 shows a) the flare risk score (range of 0-30) of patients with relapsing-recurring multiple sclerosis (x-axis) versus risk of flare at one year (y-axis) as determined by measuring oligoclonal T cells by repertoire sequencing first in lumbar puncture samples and then in peripheral blood and b) detection of MS-specific immune cells in both cerebrospinal fluid (CSF) and blood using repertoire sequencing. Each axis represents a repertoire sequencing sample and contains thousands of nodes corresponding to the clones identified in the sample by repertoire sequencing. The nodes are ranked along each axis from most abundant (distal) to least abundant (proximal). A node's thickness is proportional to the clonotype's frequency in the sample. Links between axes connect clonotypes present in both samples and thus indicate differences in rank and frequency. Dark gray nodes are < 1% frequency. The global difference between each sample pair is quantified by the Bhattacharyya coefficient, here annotated midway between the nodes. Such plotting methodology has been described previously (Krzywinski et al., 2011).

[0038] Figure 9 is a flowchart depicting steps in using repertoire sequencing to assay multiple sclerosis (MS).

Detailed Description

[0039] Briefly, and as described in more detail below, described herein are methods and systems for integrating genomic sequences with immune monitoring. The presence of monoclonal populations of immune cells is determined by repertoire sequencing to determine an immune status. This immune status may be linked, e.g., to genomic genotype or treatment to provide e.g. a prognosis or guidance to modify treatment. This immune status determined from repertoire sequencing may also be used to guide selection of matches for tissue implantation into a host, through monitoring immune status and/or linkage of immune status to genomic genotype. The immune status may also be used to monitor or modify treatments for suppressing an undesirable response (e.g., aGVHD) to a tissue transplant, such as bone marrow transplant (BMT). One embodiment of a method for obtaining information useful for predicting immune cell repertoire from genomic genotype from two samples is shown in Figure 1.

[0040] The invention may be practiced, for example, with the use of a method and/or system for massively parallel genetic analysis of single cells in emulsion droplets or reaction containers as described in co-owned WO/2012/083225, hereby incorporated by reference in its entirety into this application for all purposes.

Definitions

[0041] Terms used in the claims and specification are defined as set forth below unless otherwise specified.

[0042] The term "cell" refers to a functional basic unit of living organisms. A cell includes any kind of cell (prokaryotic or eukaryotic) from a living organism. Examples include, but are not limited to, mammalian mononuclear blood cells, yeast cells, or bacterial cells.

[0043] The term "germline" refers to the set of genomic sequences transferred from parents to their offspring. A "germline variant" is therefore a genomic sequence or variant that is inherited from the parents by their offspring.

[0044] The term "somatic cell mutation" refers to a genomic sequence in a particular individual that differs from that individual's inherited germline sequences. A somatic cell mutation typically arises during the course of an individual's lifetime due to errors during DNA replication or environmental exposure. Somatic cell mutations sometimes lead to growth of tumors.

[0045] The term "epigenetic modification" refers to any modifications to genomic DNA which confer genetic information but which are not nucleotide substitutions. For example, epigenetic modifications may result from methylation of CpG DNA sequences. Methylation in a promoter region of the genome can suppress gene expression patterns. Epigenetic modifications are often associated with cancer.

[0046] The term "polymerase chain reaction" or PCR refers to a molecular biology technique for amplifying a DNA sequence from a single copy to several orders of magnitude (thousands to millions of copies). PCR relies on thermal cycling, which requires cycles of repeated heating and cooling of the reaction for DNA melting and enzymatic replication of the DNA. Primers (short DNA fragments) containing sequences complementary to the target region of the DNA sequence and a DNA polymerase are key components to enable selective and repeated amplification. As PCR progresses, the DNA generated is itself used as a template for replication, setting in motion a chain reaction in which the DNA template is exponentially amplified. A heat-stable DNA polymerase, such as Taq polymerase, is used. The thermal cycling steps are necessary first to physically separate the two strands in a DNA double helix at a high temperature in a process called DNA melting. At a lower temperature, each strand is then used as the template in DNA synthesis by the DNA polymerase to selectively amplify the target DNA. The selectivity of PCR results from the use of primers that are complementary to the DNA region targeted for amplification under specific thermal cycling conditions.

[0047] The term "bulk sequencing" or "next generation sequencing" or "massively parallel sequencing" refers to any high throughput sequencing technology that parallelizes the DNA sequencing process. For example, bulk sequencing methods are typically capable of producing more than one million polynucleic acid amplicons in a single assay. The terms "bulk sequencing," "massively parallel sequencing," and "next generation sequencing" refer only to general methods, not necessarily to the acquisition of greater than 1 million sequence tags in a single run. Any bulk sequencing method can be implemented in the invention, such as reversible terminator chemistry (e.g., Illumina), pyrosequencing using polony emulsion droplets (e.g., Roche), ion semiconductor sequencing (IonTorrent), single molecule sequencing (e.g., Pacific Biosciences), massively parallel signature sequencing, etc.

[0048] The term "T cell" refers to a type of cell that plays a central role in cell-mediated immune response. T cells belong to a group of white blood cells known as lymphocytes and can be distinguished from other lymphocytes, such as B cells and natural killer T (NKT) cells, by the presence of a T cell receptor (TCR) on the cell surface. T cell responses are antigen-specific and are activated by foreign antigens. T cells are activated to proliferate and differentiate into effector cells when the foreign antigen is displayed on the surface of the antigen-presenting cells in peripheral lymphoid organs. T cells recognize fragments of protein antigens that have been partly degraded inside the antigen-presenting cell. There are two main classes of T cells: cytotoxic T cells and helper T cells. Effector cytotoxic T cells directly kill cells that are infected with a virus or some other intracellular pathogen. Effector helper T cells help to stimulate the responses of other cells, mainly macrophages, B cells and cytotoxic T cells.

[0049] The term "B cell" refers to a type of lymphocyte that plays a large role in the humoral immune response (as opposed to the cell-mediated immune response, which is governed by T cells). The principal functions of B cells are to make antibodies against antigens, perform the role of antigen-presenting cells (APCs) and eventually develop into memory B cells after activation by antigen interaction. B cells are an essential component of the adaptive immune system.

[0050] The term "immune cell repertoire" or "immune repertoire" is defined as the number of different sub-types an organism's immune system makes, of any of the 6 key types of protein, either immunoglobulin or T cell receptor. The 6 key types of proteins refer to heavy and light chain immunoglobulins, and alpha, beta, gamma, and delta T cell receptors. The sub-type of these proteins may be identified by the nucleotide sequence of a variable domain of these proteins, which can then be used to identify clonal populations of immune cells, e.g., T cells and/or B cells. An immune cell repertoire or immune repertoire may refer herein to information on the abundance and identity of each clonal population of immune cells, e.g., T cells and/or B cells. This information may be specific to the source of the sample used to obtain the immune cell repertoire. The source may be, e.g., splenocytes, peripheral blood, spleen, skin, mesenteric lymph nodes, and/or colon. A "feature" of an immune cell repertoire may include a quantity of a clonal population identified, e.g., by the sequence of a variable domain, e.g., a VDJ segment. This quantity could be zero, indicating that there is an absence of a known clonal population. This could indicate a negative association between, e.g., a genomic genotype and the presence of a clonal population in an immune cell repertoire. It may also indicate a negative association between an immune cell repertoire and a selected treatment, such as a pharmaceutical treatment, a biologic treatment, or a tissue implant. This association may be related to the genomic genotype.

[0051] The term "syngeneic" refers to individuals who are genetically identical or closely related, so as to allow tissue transplant, e.g., they are immunologically compatible (i.e., antigenically similar). This is the case, e.g., when referring to a bone marrow transplant from one identical twin to the other.

[0052] The term "allogeneic mismatch" refers to individuals who are genetically distinct and may have some immunological incompatibility. A "minor" allogeneic mismatch refers to the relationship between a genetically distinct donor and acceptor (e.g., a host) having the same human leukocyte antigen (HLA). A "major" allogeneic mismatch refers to the relationship between a genetically distinct donor and acceptor having distinct HLAs.

[0053] The term "Bhattacharyya coefficient" refers to an approximate measurement of the amount of overlap between two statistical samples. The coefficient may be used to determine the relative closeness of the two samples being considered. In one embodiment, calculating the Bhattacharyya coefficient involves a form of integration of the overlap of the two samples. The interval of the values of the two samples is split into a chosen number of partitions, and the number of members of each sample in each partition is used in the following formula:

$$BC(a, b) = \sum_{i=1}^{n} \sqrt{a_i \, b_i}$$

where, considering the samples a and b, n is the number of partitions, and a_i and b_i are the number of members of samples a and b in the i'th partition. This formula is larger with each partition that has members from both samples, and larger with each partition that has a large overlap of the two samples' members within it. The Bhattacharyya coefficient will be 0 if there is no overlap at all.
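As a minimal sketch of the partition-based calculation described above, the following Python function sums, over the partitions, the square root of the product of the two samples' member counts. The function names and the example counts are illustrative assumptions and are not part of the disclosure. The coefficient is also commonly computed on normalized frequencies rather than raw counts, in which case two identical distributions give a value of 1; the `normalize` helper is included for that variant.

```python
import math

def bhattacharyya_coefficient(a_counts, b_counts):
    """Sum of sqrt(a_i * b_i) over the n partitions (hypothetical helper)."""
    if len(a_counts) != len(b_counts):
        raise ValueError("both samples must be binned into the same partitions")
    return sum(math.sqrt(a * b) for a, b in zip(a_counts, b_counts))

def normalize(counts):
    """Convert raw partition counts to frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Illustrative partition counts (made up for this sketch).
no_overlap = bhattacharyya_coefficient([4, 0, 0], [0, 3, 5])
identical = bhattacharyya_coefficient(normalize([2, 2, 4]), normalize([2, 2, 4]))
print(no_overlap, identical)
```

Samples that share no partition give a coefficient of 0, consistent with the definition above.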

[0054] The term "immune-mediated function" as used herein refers to an immune status giving an immune response. This response may be the result of, e.g., a transplant rejection, an auto-immune disorder, or an immune-mediated treatment response.

[0055] The term "abnormality" refers to the presence of an activity or feature which differs from a normal activity or feature. An abnormality may refer to a disease or disorder in need of treatment.

[0056] The term "treating" or "treatment" as used herein refers to, in one embodiment, ameliorating a disease or disorder (i.e., arresting or reducing the development of the disease or at least one of the clinical symptoms thereof). In another embodiment "treating" or "treatment" refers to ameliorating at least one physical parameter, which may not be discernible by the patient. In yet another embodiment, "treating" or "treatment" refers to modulating the disease or disorder, either physically (e.g., stabilization of a discernible symptom), physiologically (e.g., stabilization of a physical parameter), or both. In yet another embodiment, "treating" or "treatment" refers to preventing or delaying the onset or development or progression of a disease or disorder.

[0057] It must be noted that, as used in the specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise.

[0058] Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or "analyzing" or "comparing" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

[0059] The present invention also relates to a system or apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus. An example diagram of such a device is represented in Figure 5.

[0060] The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method procedures. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.

Repertoire sequencing

[0061] There is a need for measurements to help personalize therapy and develop new therapeutic protocols. For more than 80% of diseases, 65-85% of disease risk cannot be explained by sequencing the entire genome. For example, only 35% of identical twin multiple sclerosis patients are both affected. Blood cell populations multiply and decline in the course of multiple sclerosis. These cells could be used as biomarkers to predict disease progression. Every person has millions of different immune blood cell genomes, which are reflected by the immune cell repertoire. Standard sequencing only produces an average value of these genomes. Repertoire sequencing provides a single-cell analysis method to provide the identity and abundance of the immune cells in the immune cell repertoire. Repertoire sequencing also finds many uses in other techniques, for example bone marrow transplantation prognosis and immune response treatment modulation, as described throughout this specification.

[0062] The disclosed repertoire sequencing methods and systems provide a standardized quantitative endpoint for monitoring immune cell repertoire. These can be used, e.g., for monitoring immunosuppressive therapy during acute graft-versus-host disease (aGVHD). One embodiment of determining predictive information on aGVHD in a host after transplant from a donor is shown in Figure 2. Repertoire sequencing is capable of detecting, e.g., T cell populations or B cell populations from selected cell populations, e.g., from peripheral blood samples. Characterization of the immune cell repertoire using repertoire sequencing can be used, e.g., to link an immune cell repertoire or immune status to a genotype of an individual or sample. This immune cell repertoire can be linked, e.g., to genomic genotype or treatment to provide, e.g., a prognosis or guidance to modify treatment. This immune status determined from repertoire sequencing can also be used to guide selection of matches for tissue implantation into a host, through monitoring immune status and/or linkage of immune status to genomic genotype. In one embodiment, the invention is used to provide predictive information on the presence of an immunity disorder from the genomic genotype as shown in Figure 3. The immune status can also be used to monitor or modify treatments for suppressing an undesirable response (e.g., aGVHD) to a tissue implant, such as bone marrow implant.

[0063] Information from repertoire sequencing can be used e.g., to help in the treatment of disorders by helping physicians decide when to taper, increase, or change drug therapy in response to an immune repertoire or immune status. In one aspect, the invention is used to monitor a treated sample at selected timepoints to modify or maintain treatment based on the results of the monitoring as shown in Figure 4. Repertoire sequencing can also be used for management of a variety of other related disorders, such as acute kidney graft rejection and chronic graft-versus-host disease.

[0064] To perform repertoire sequencing, primers are designed to amplify a variable region of a nucleic acid molecule encoding an immunoglobulin or T cell receptor protein to identify clonal subtypes of populations comprising the immune cell repertoire. Immunoglobulins include both heavy and light chain immunoglobulins. T cell receptors include alpha, beta, gamma, and delta T cell receptors. The variable region can include, for example, a VDJ segment. The variable region is sequenced to obtain subtype identity of a clonal population of an immune cell repertoire. The quantity of sequences matching the subtype can also be obtained to identify both subtype and quantity of subtype relative to the immune cell repertoire population. This information provides a characterization of the immune cell repertoire, which can be used to provide, e.g., a prognosis for a treatment, or to modulate or maintain a treatment. Such treatments include a tissue transplant (e.g., a bone marrow transplant), the administration of a pharmacological substance, or the administration of a biologic. Other such treatments include any treatment of a disease, abnormality, or other disorder, any treatment to alleviate symptoms of a disorder, or any treatment to alleviate symptoms of a related treatment of a disorder.
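The counting step described above, obtaining both the identity of each clonal subtype and its quantity relative to the whole repertoire, can be sketched in Python as follows. This is only an illustration under simplifying assumptions: each read is assumed to have already been reduced to its variable-region sequence, sequencing-error correction and V/D/J alignment are omitted, and the function name and the placeholder read labels are hypothetical.

```python
from collections import Counter

def clonotype_frequencies(variable_region_reads):
    """Map each distinct variable-region sequence to (read count, relative frequency).

    Hypothetical helper: identical sequences are treated as one clonotype.
    """
    counts = Counter(variable_region_reads)
    total = sum(counts.values())
    # most_common() orders clonotypes from most to least abundant.
    return {seq: (n, n / total) for seq, n in counts.most_common()}

# Placeholder labels stand in for real variable-region sequences.
reads = ["clone_A", "clone_A", "clone_A", "clone_B"]
for seq, (n, freq) in clonotype_frequencies(reads).items():
    print(seq, n, freq)
```

A clonotype that never appears in the reads simply has no entry, which corresponds to a quantity of zero for that clonal population.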

[0065] In one aspect, repertoire sequencing is used to identify the presence and quantity of cell populations from a sample, e.g., tissue biopsies of patients diagnosed with an immune disorder. In one embodiment, the present invention is used to compare compatibility of a host and an implant. This can be done, e.g., by obtaining the genotype of the host and the donor, and inserting them into a database to determine compatibility and possible adverse immune responses, such as aGVHD. The database comprises information about the relationship between an immunological repertoire of a host and its genotype. The database also can comprise information about the relationship between an immunological repertoire of a donor and its genotype. The database can also comprise, for example, information about the compatibility of a host and a donor based on information about the genotype of the host or the donor and, e.g., information about the immunological repertoire of the host or the donor.

[0066] The immunological repertoire of an organism includes information pertaining to the make-up and status of the immune cells in an organism. The make-up of the immunological repertoire is in part determined by the genomic genotype of an organism. Characterization of the immunological repertoire can include information such as the types of clonal populations of cells, e.g., B cells and T cells, from various sources and tissues throughout an organism, e.g., spleen, liver, lymph nodes, etc. The immunological repertoire also can contain quantitative information relating to the quantity (absolute or relative) of the clonal populations in various sources or tissues throughout an organism. This information about the immunological repertoire of an organism helps to characterize the immune status of an organism. The immune status can be useful for characterizing and predicting a response of an organism to various treatments, including pharmaceutical compounds, biologics, and tissue implants (artificial or from another organism).

[0067] Tissue implants are useful for a wide range of treatments. Transplant of a tissue can result in adverse effects, such as aGVHD. These adverse effects can be due to an undesirable immune status. This immune status can be characterized by information about the immunological repertoire of an organism, which can be linked to the organism's genomic genotype. Implants into a host organism may be from another organism, or implants may be artificial. A relationship between the genomic genotype of a host organism and a donor organism plays a role in the compatibility of donor tissue from the donor organism to the host organism. These relationships are characterized as, e.g., syngeneic, allogeneic major mismatch, or allogeneic minor mismatch.

[0068] Repertoire sequencing is used to detect greater than 50%, 60%, 70%, 75%, 80%, or 90% of colonoscopy-associated T cell clonotypes (e.g., T cells retrieved from a colon using an endoscope) in repertoire sequencing profiles in 50%, 60%, 70%, 80%, or 90% of confirmed aGVHD patients. In one embodiment, these repertoire sequencing profiles are from matched peripheral blood. In a preferred embodiment, repertoire sequencing will be used to detect >75% of colonoscopy-associated T cell clonotypes in repertoire sequencing profiles from matched peripheral blood, for 80% of confirmed aGVHD patients (power=0.8; α=0.05).

[0069] In another embodiment, repertoire sequencing is used to show that colonoscopy-associated T cell clonotypes in peripheral blood are significantly more abundant in subjects diagnosed with aGVHD than in subjects who underwent endoscopy but did not have confirmed aGVHD. In still another embodiment, repertoire sequencing is used to provide information to show that aGVHD-associated T cell clones in peripheral blood during and after tapering of immunosuppressive therapy are absent or greatly reduced compared with the peripheral blood of patients with uncontrolled or relapsed GVHD.

[0070] In one embodiment, repertoire sequencing is applied to any type of transplantation, chronic GVHD, and for more generalized monitoring of drug response and autoimmune disease. In another embodiment, repertoire sequencing is used for characterization of GVHD, e.g., in studies that aim to understand why some patients develop self-limited disease with an accompanying graft-versus-leukemia effect whereas others develop fatal aGVHD.

[0071] In other aspects, repertoire sequencing is used to identify, monitor, or characterize a disorder, e.g., an immune disorder, an autoimmune disorder (e.g., multiple sclerosis, inflammatory bowel disease, or rheumatoid arthritis) or a cancer (e.g., lung cancer, colorectal cancer, or prostate cancer). Repertoire sequencing is used for giving a prognosis or a diagnosis. Repertoire sequencing is also used for monitoring immune response to a treatment.

Statistical Analysis

[0072] In one embodiment of the invention, the risk of acute GVHD can be predicted by identifying T cell clones that were not apparent prior to but which are dominant after bone marrow transplantation (BMT). After quantifying clones from patient samples before and after BMT we can determine T cell count normalized clonotype frequencies for each sample. From an asymptomatic repertoire we characterize the "background" repertoire. Each repertoire from a subsequent blood draw can then be compared to the asymptomatic background repertoire. We monitor the gradient in clonotype frequency, with special focus on large positive gradients and the Bhattacharyya coefficient. The subject's risk can then be classified as low or high using a dynamically determined threshold for the clonotype frequency gradient and Bhattacharyya coefficient.
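The threshold-based classification described above can be sketched in Python, using the illustrative numbers from the simplified example in the paragraph that follows. Several points are assumptions made only for this sketch: the "gradient" is computed as the fold change of a clonotype's normalized frequency relative to the asymptomatic baseline (which agrees with the example, where 5% versus 0.5% meets a threshold of 10), the thresholds are fixed constants rather than dynamically determined, and the Bhattacharyya coefficient values are taken as reported in the example rather than recomputed. The function name is hypothetical.

```python
GRADIENT_THRESHOLD = 10    # gradient threshold from the simplified example
BC_THRESHOLD = 0.0015      # Bhattacharyya coefficient threshold from the example

def classify_risk(baseline_pct, followup_pct, bc):
    """Return 'high' only when both thresholds are met or exceeded."""
    gradient = followup_pct / baseline_pct  # assumption: gradient is a fold change
    if gradient >= GRADIENT_THRESHOLD and bc >= BC_THRESHOLD:
        return "high"
    return "low"

baseline_pct = 0.5                              # Sample P, time 0
followup_pcts = [3.0, 4.0, 5.0, 6.0]            # Samples Q1 to Q4
bc_values = [0.0011, 0.0014, 0.0016, 0.0055]    # as reported, not recomputed

risks = [classify_risk(baseline_pct, f, bc) for f, bc in zip(followup_pcts, bc_values)]
print(risks)
```

Under these assumptions the first time point classified as high is time 3, matching the example.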

[0073] Assume that the gradient threshold is 10 and the Bhattacharyya coefficient threshold is 0.0015. In a simplified example, a patient sample shows one clonotype at a normalized frequency of 0.5% in the blood draw before BMT (Sample P, time 0), and at 3% at time 1 (Sample Q1), 4% at time 2 (Sample Q2), 5% at time 3 (Sample Q3), and 6% at time 4 (Sample Q4). The frequency gradient between time 0 and time 3 is then greater than or equal to the gradient threshold. The Bhattacharyya coefficient at time 1, time 2, time 3, and time 4 is

BC = sqrt(p(x) * q(x))

where p(x) and q(x) are the clone frequencies in a combination of sample P and one sample Qi, yielding values of 0.0011, 0.0014, 0.0016, and 0.0055. Both the gradient threshold and the Bhattacharyya coefficient threshold have been met or exceeded at time 3, at which point the risk of acute GVHD is determined to be actionably high.

[0074] In one embodiment of the invention, T cell repertoire sequencing alone has a sensitivity of ~90% (Sens_1) and a specificity of ~5% (Spec_1) for predicting acute GVHD outcomes. In this embodiment, a germline genetic diagnostic, such as HLA type mismatch, is assumed to have 73% sensitivity (Sens_2) and 20% specificity (Spec_2). Combining T cell repertoire sequencing and the HLA type mismatch in parallel or combination (Fundamentals of Clinical Research for Radiologists, 2005, 184 (1), Susan Weinstein, Nancy A. Obuchowski, Michael L. Lieber), we can increase the sensitivity (Sens_tot = Sens_1 + Sens_2 - (Sens_1 * Sens_2)) to over 97% while maintaining a reasonable specificity (Spec_tot = Spec_1 * Spec_2) of 10%. In this way, germline genetic measurements are integrated with T cell repertoire sequencing to increase the overall sensitivity and specificity of the clinical test.
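The parallel-combination formulas can be written out as a short sketch. It assumes the usual reading of parallel testing: the combined test is positive if either component test is positive, and the two tests err independently. The function names are hypothetical. Only the sensitivity figures quoted in the text are used as inputs.

```python
def parallel_sensitivity(sens_1, sens_2):
    """Sensitivity when a positive on either test counts as positive.

    Assumes the two tests miss cases independently, so the combined
    miss rate is (1 - sens_1) * (1 - sens_2).
    """
    return sens_1 + sens_2 - sens_1 * sens_2


def parallel_specificity(spec_1, spec_2):
    """Specificity when both tests must be negative to call a negative.

    Assumes independent false positives, so specificities multiply.
    """
    return spec_1 * spec_2


# Sensitivities quoted in the text: ~90% (repertoire) and 73% (HLA mismatch).
sens_tot = parallel_sensitivity(0.90, 0.73)
print(f"{sens_tot:.1%}")  # 97.3%
```

The same rule shows the trade-off inherent in parallel testing: combined sensitivity can only rise, while combined specificity can only fall or stay equal.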

[0075] In other embodiments of the invention, measurements of tumor somatic cell mutations or epigenetic changes are associated with cancer phenotypes. In parallel, T or B cell repertoire sequencing data are associated with cancer phenotypes. The repertoire data and the somatic cell mutation or epigenetic change data are then combined using statistical methods such as those described above to produce a more sensitive and specific test for predicting cancer outcomes.

Examples

Example 1: Repertoire Sequencing (REP-SEQ):

[0076] We have developed an optimized protocol for repertoire sequencing, e.g., for TCRβ repertoire sequencing, identified as GigaMune™ REP-SEQ. This repertoire sequencing (REP-SEQ) technology is more quantitative than currently available methods. In one aspect, REP-SEQ is used to identify specific T cell clones dominant in cell populations from a sample, e.g., tissue biopsies of patients diagnosed with acute graft-versus-host disease (aGVHD). These T cell clones are then quantified, e.g., in peripheral blood over time. This protocol is used, e.g., for monitoring aGVHD-associated T cell activity during treatment of aGVHD, thus providing a quantitative, objective, and personalized surrogate endpoint highly predictive of clinical outcomes.

[0077] REP-SEQ provides a standardized quantitative endpoint for monitoring immunosuppressive therapy during, e.g., aGVHD. REP-SEQ detects oligoclonal T cell populations in endoscopy-derived patient samples and then follows these specific T cell clones in peripheral blood. This information helps doctors decide when to taper, increase, or change drug therapy. The technology is also applied to related clinical problems, such as acute kidney graft rejection and chronic GVHD.

Example 2: REP-SEQ Library Construction

[0078] We validated protocols for library construction, sequencing, and analysis. To this end, we tested our ability to accurately measure clone ratios in a defined template pool. We have also used REP-SEQ to detect disease-associated clones in both colon and peripheral blood in mouse models for T cell transplantation and in human pilot studies.

[0079] We have implemented a high-throughput protocol for human T cell receptor β (TCRβ) repertoire bulk sequencing, using an Illumina next-generation sequencer. We first perform multiplex PCR on the VDJ segment of the T cell receptor using a set of 20 primers to amplify across all 50 V segments and 10 primers to amplify across all 13 J segments (Figure 6). The primers also have tails with the same sequence as a portion of a next-generation sequencing (NGS) library adapter (Figure 6). The 30 primers are pooled in a single 400 μl PCR reaction, which contains genomic DNA from at least 5 x 10^5 cells. The reactions are then thermocycled for no more than 25 cycles. Next, we perform a second round of PCR, using an aliquot of the first-round analyte and a set of universal primers. The second PCR produces products that have the full Illumina sequencing adapter sequence fused to a library of TCRβ sequences (Figure 6). These products can then be sequenced directly by an Illumina next-generation sequencer (i.e., "bulk sequencing"). We have built and sequenced several hundred libraries using this protocol. One person can build as many as 96 libraries in only a few days. We have also implemented similar protocols for human immunoglobulin heavy chain variable regions (CDR3).

REP-SEQ Protocol Optimization Using 48-plex Pool of TCR Plasmid Clones

[0080] For clinical use, the REP-SEQ protocol must be highly quantitative and reproducible. Therefore, we performed a number of tests to document the protocol's performance. The true content of any particular endogenous TCRβ repertoire is not known, so a blood sample cannot serve as a gold standard for protocol optimization. Therefore, we have designed a 48-plex pool of TCRβ plasmid clones to act as template for protocol optimization. First, we performed multiplexed amplification of the TCRβ repertoire as described above, using a normal genomic DNA control as template. We subcloned the PCR products using the TOPO-TA vector (Life Technologies), transformed the ligation, and sequenced 48 clones. The resulting plasmids were then mixed in a single tube across three orders of magnitude of concentration and with six replicates at each concentration.

[0081] We then used this 48-plex mixture to optimize the TCRβ REP-SEQ protocol. We focused on purification after the first and second PCR steps, the number of cycles in the first PCR, and the annealing temperature in the first PCR. We made 68 REP-SEQ libraries, each with different protocol parameters. We sequenced the libraries on our next-generation sequencing (NGS) machine (Illumina) to obtain >500k paired-end 80 bp sequence tags for each library. To analyze the sequencing data, we first aligned each sequence tag to the sequences of the 48 known plasmid clones. Then, we tallied the number of NGS tags aligned to each plasmid for each library and correlated these results with the expected ratios of the input plasmid clones. We then performed a linear regression analysis to fit each data set (Figure 7). This optimization demonstrated that the best protocol uses 15 cycles of amplification for the first PCR, an annealing temperature of 61°C, PCR column purification after the first PCR, and gel purification following the second PCR. At an R^2 of 0.72 and a slope of 0.71, the protocol is more quantitative than any published method.
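The tally-and-regress step can be sketched as follows. This is an illustration under stated assumptions, not the analysis used to produce Figure 7: it assumes that observed and expected clone fractions are compared on a log10 scale (a natural choice when inputs span three orders of magnitude) and that every plasmid has at least one aligned tag. The function names and the example counts are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def observed_vs_expected(tag_counts, expected_fraction):
    """Regress log10 observed tag fraction on log10 expected input fraction.

    tag_counts        : plasmid name -> number of NGS tags aligned to it
    expected_fraction : plasmid name -> fraction of that plasmid in the input pool
    """
    total = sum(tag_counts.values())
    names = list(tag_counts)
    x = [math.log10(expected_fraction[name]) for name in names]
    y = [math.log10(tag_counts[name] / total) for name in names]
    return fit_line(x, y)


# Hypothetical four-plasmid pool in which tag counts track the input exactly.
expected = {"p1": 0.001, "p2": 0.01, "p3": 0.1, "p4": 0.889}
counts = {"p1": 1000, "p2": 10000, "p3": 100000, "p4": 889000}
slope, intercept, r_squared = observed_vs_expected(counts, expected)
```

A perfectly quantitative protocol would give a slope and R^2 close to 1; amplification bias between templates pulls both values down.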

Example 3: Computational Pipeline for TCRB REP-SEQ Data Analysis.

[0082] Because the TCRβ repertoire contains as many as 5 x 10^6 clonotypes, and CDR3 sequences often differ by only a few nucleotides, a sophisticated custom analysis platform is necessary to identify the clones in the library. We have built an algorithm that is faster than any current method by almost an order of magnitude. We start with a table of 4-8 nucleotide "words" that uniquely identify the mouse or human V and J segments within the amplified region. We then test the validity of each match by identifying the distance to and the sequence of the second conserved cysteine. The match is accepted as correct only if both distance and sequence confirm the match. Using data from our TCRβ repertoire sequencing experiments, we typically identify ~99.98% of V-Jβ combinations unambiguously. The remaining reads are discarded.
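The word-lookup idea can be illustrated with a toy sketch. The word table below is entirely hypothetical: the words, segment labels, and offsets are placeholders, not real V segment signatures. The validation step is also simplified to a single check that a cysteine codon (TGT or TGC) sits at the expected distance from the matched word.

```python
# Hypothetical table: word -> (segment label, offset in bases from the start
# of the word to the first base of the conserved cysteine codon).
V_WORDS = {
    "GATTC": ("V_demo_1", 9),
    "CCTGA": ("V_demo_2", 6),
}
CYS_CODONS = {"TGT", "TGC"}


def identify_v_segment(read):
    """Return the segment label if exactly one word match is confirmed, else None."""
    confirmed = []
    for word, (segment, offset) in V_WORDS.items():
        pos = read.find(word)
        if pos == -1:
            continue  # word absent from this read
        codon = read[pos + offset: pos + offset + 3]
        if codon in CYS_CODONS:
            confirmed.append(segment)
    # Ambiguous or unconfirmed reads are discarded.
    return confirmed[0] if len(confirmed) == 1 else None


print(identify_v_segment("AAGATTCAGGCTGTGCCAGC"))  # V_demo_1
print(identify_v_segment("AAGATTCAGGCAAAGCCAGC"))  # None
```

Exact-match lookup of short words avoids a full alignment for every read, which is one plausible reason such an approach can be fast. The cysteine check then guards against chance word matches.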

[0083] There are three further quality control steps: (i) we require that the CDR3 region must not contain any sequencing errors in the form of uncalled bases; (ii) we require that the CDR3 region is in frame as defined by the second conserved cysteine; and (iii) we require the absence of in-frame stop codons in the CDR3 region. If all quality tests are passed, the method identifies the protein coding sequence of the CDR3 region within the known reading frame for that particular gene. This algorithm ensures speed, accuracy, and the lowest error rates.
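The quality control checks listed above can be sketched as a single filter. The sketch assumes that the input nucleotide string starts at the first base of the conserved cysteine codon and ends at the last base of the CDR3 region, so that "in frame" reduces to a length divisible by three. The function name is hypothetical, and the standard genetic code is used for translation.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def cdr3_protein(cdr3_nt):
    """Return the CDR3 amino acid sequence, or None if any check fails."""
    seq = cdr3_nt.upper()
    # (i) no uncalled or otherwise ambiguous bases
    if any(base not in "ACGT" for base in seq):
        return None
    # (ii) in frame relative to the conserved cysteine (assumed to be codon 1)
    if len(seq) == 0 or len(seq) % 3 != 0:
        return None
    protein = "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))
    # (iii) no in-frame stop codon
    if "*" in protein:
        return None
    return protein


print(cdr3_protein("TGTGCCAGCAGC"))  # CASS
print(cdr3_protein("TGTGCNAGCAGC"))  # None
```

Returning None for any failed check keeps the filter easy to chain with clone counting, since rejected reads simply drop out.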

[0084] We have implemented similar analyses for immunoglobulin heavy chain variable regions. In principle, this method can be used for any type of variable region and in any species with an adaptive immune system.

Example 4: Multiple Sclerosis Prognosis

[0085] Annually, patients with MS suffer an average of one debilitating disease flare that necessitates inpatient hospital care (Naci et al., 2010) (Figure 8A). Patients would benefit substantially from a molecular test that could help predict disease progression. MS is already associated with germline variants, particularly at the HLA-DRA locus, the interleukin-7 receptor alpha gene, and the interleukin-2 receptor alpha gene (Hafler et al., 2007), and separate data have pointed to the importance of peripheral blood T cells in disease progression (Muraro et al., 2003). Therefore, we use our biostatistical methods to combine prognostic factors in MS. We first set out to determine whether particular immune cell repertoire patterns could be associated with MS. From a commercial biorepository, we obtained cerebrospinal fluid (CSF) cell pellets and matched peripheral blood mononuclear cells (PBMCs) from patients diagnosed with primary progressive or secondary progressive multiple sclerosis. Genomic DNA was extracted and purified from each sample, and REP-SEQ was performed on the samples, as described above. Our data show that we are able to use CSF to discover MS-related clones in PBMCs (Figure 8B). In one embodiment of the invention, these MS-related clones are then associated with germline factors such as HLA-DRA, interleukin-7 receptor alpha, interleukin-2 receptor alpha, or others, and the integrated data sets are used to formulate a high-specificity, high-sensitivity prognosis for the patient. Based on this prognosis, the attending doctor alters the course of therapy, which improves patient outcomes (Figure 9).

[0086] While the invention has been particularly shown and described with reference to preferred and various alternate embodiments, it will be understood by persons skilled in the relevant art that various changes in form and details can be made therein without departing from the spirit and scope of the invention.

[0087] All references, issued patents and patent applications cited within the body of the instant specification are hereby incorporated by reference in their entirety, for all purposes.

References

Hafler, D.A., Compston, A., Sawcer, S., Lander, S., Daly, M.J., DeJager, P.L., de Bakker, P.I.W., Gabriel, S.B., Mirel, D.B., Ivinson, A.J., Pericak-Vance, M.A., Gregory, S.G., Rioux, J.D., McCauley, J.L., Haines, J.L., Barcellos, L.F., Cree, B., Oksenberg, J.R., Hauser, S.L. Risk alleles for multiple sclerosis identified by a genomewide study. N Engl J Med 30; 357(9):851-62, 2007.

Krzywinski, M., Birol, I., Jones, S. J. & Marra, M. A. Hive plots—rational approach to visualizing networks. Briefings in Bioinformatics (2011). doi: 10.1093/bib/bbr069

Muraro PA, Wandinger K-P, Bielekova B, Gran B, Marques A, Utz U, McFarland HF, Jacobson S, Martin R. (2003). Molecular tracking of antigen-specific T cell clones in neurological immune-mediated disorders.

Naci H, Fleurence R, Birt J, Duhig A. (2010). Economic burden of multiple sclerosis. Pharmacoeconomics 28: 363-379.