

Title:
METHODS AND SYSTEMS FOR MACHINE LEARNING ANALYSIS OF LUPUS NEPHRITIS
Document Type and Number:
WIPO Patent Application WO/2024/015621
Kind Code:
A1
Abstract:
A method for assessing a lupus nephritis disease state of a patient, the method comprising: analyzing a data set comprising or derived from gene expression measurement data of at least 2 genes or human orthologs thereof selected from the genes listed in Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22 in a biological sample from the patient, to classify the lupus nephritis disease state of the patient.

Inventors:
ALLISON KATHRYN K (US)
SHROTRI SNEHA (US)
DAAMEN ANDREA (US)
GRAMMER AMRIE C (US)
LIPSKY PETER E (US)
Application Number:
PCT/US2023/027847
Publication Date:
January 18, 2024
Filing Date:
July 14, 2023
Assignee:
AMPEL BIOSOLUTIONS LLC (US)
International Classes:
C12Q1/6883; C12Q1/6881; G16B25/10; G16H50/20; G16H50/50
Foreign References:
US20140135225A1 2014-05-15
Other References:
FAIRFAX, BP ET AL.: "Genetics of gene expression in primary immune cells identifies cell type-specific master regulators and roles of HLA alleles", NATURE GENETICS, vol. 44, 25 March 2012 (2012-03-25), pages 502 - 510, XP055262254, DOI: 10.1038/ng.2205
CLANCY ROBERT M., MARION MIRANDA C., KAUFMAN KENNETH M., RAMOS PAULA S., ADLER ADAM, HARLEY JOHN B., LANGEFELD CARL D., BUYON JILL: "Identification of candidate loci at 6p21 and 21q22 in a genome‐wide association study of cardiac manifestations of neonatal lupus", ARTHRITIS & RHEUMATISM, WILEY INTERSCIENCE, US, vol. 62, no. 11, 1 November 2010 (2010-11-01), pages 3415 - 3424, XP093132359, ISSN: 0004-3591, DOI: 10.1002/art.27658
DAAMEN ANDREA R., WANG HONGYANG, BACHALI PRATHYUSHA, SHEN NAN, KINGSMORE KATHRYN M., ROBL ROBERT D., GRAMMER AMRIE C., FU SHU MAN: "Molecular mechanisms governing the progression of nephritis in lupus prone mice and human lupus patients", FRONTIERS IN IMMUNOLOGY, FRONTIERS MEDIA, LAUSANNE, CH, vol. 14, 1 March 2023 (2023-03-01), pages 01 - 17, XP093132360, ISSN: 1664-3224, DOI: 10.3389/fimmu.2023.1147526
Attorney, Agent or Firm:
CHANG, ARDITH (US)
Claims:
CLAIMS

WHAT IS CLAIMED IS:

1. A method for assessing a lupus nephritis disease state of a patient, the method comprising: analyzing a data set comprising or derived from gene expression measurement data of at least 2 genes or human orthologs thereof selected from the genes listed in Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22 in a biological sample from the patient, to classify the lupus nephritis disease state of the patient.

2. The method of claim 1, wherein the lupus nephritis disease state of the patient is classified as acute lupus nephritis, transitional lupus nephritis, chronic lupus nephritis, or absence of lupus nephritis.

3. The method of claim 1 or claim 2, wherein the data set comprises or is derived from gene expression measurement data of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 450, 500, 550, 600, 650, 700, 750, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1700, 1800, 1900, or 2000 genes selected from the genes listed in Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22 in the biological sample from the patient.

4. The method of any one of claims 1 to 3, wherein the genes or human orthologs thereof are selected from the genes listed in Tables 19-1 to 19-36.

5. The method of any one of claims 1 to 3, wherein the genes or human orthologs thereof are selected from the genes listed in Table 20.

6. The method of any one of claims 1 to 3, wherein the genes or human orthologs thereof are selected from the genes listed in Table 21.

7. The method of any one of claims 1 to 3, wherein the genes or human orthologs thereof are selected from the genes listed in Table 22.

8. The method of any one of claims 1 to 3, wherein the genes are selected from the genes listed in Tables 23-1 to 23-28.

9. The method of any one of claims 1 to 3, wherein the genes are selected from the genes listed in Tables 25-1 to 25-32.

10. The method of any one of claims 1 to 3, wherein the genes are selected from the genes listed in Tables 26-1 to 26-60.

11. The method of any one of claims 1 to 3, wherein the genes are selected from the genes listed in Tables 27-1 to 27-48.

12. The method of any one of claims 1 to 3, wherein the genes are selected from the genes listed in Tables 28-1 to 28-22.

13. The method of any one of claims 1 to 12, wherein the data set comprises or is derived from gene expression measurement data of at least 2 to all, or any value or range therebetween, genes or human orthologs thereof selected from the genes listed in each of one or more Tables selected from Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22 in the biological sample from the patient, wherein a different or identical number of genes are selected from the genes listed in each selected table.

14. The method of any one of claims 1 to 4 and 13, wherein the one or more Tables are selected from Tables 19-1 to 19-36.

15. The method of claim 14, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables selected from Tables 19-1 to 19-36.

16. The method of claim 14 or 15, wherein the selected Tables are Tables 19-1 to 19-36.

17. The method of any one of claims 1 to 3, 8 and 13, wherein the one or more Tables are selected from Tables 23-1 to 23-28.

18. The method of claim 17, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, or 28 Tables selected from Tables 23-1 to 23-28.

19. The method of claim 17 or 18, wherein the selected Tables are Tables 23-1 to 23-28.

20. The method of any one of claims 1 to 3, 9 and 13, wherein the one or more Tables are selected from Tables 25-1 to 25-32.

21. The method of claim 20, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, or 32 Tables selected from Tables 25-1 to 25-32.

22. The method of claim 20 or 21, wherein the selected Tables are Tables 25-1 to 25-32.

23. The method of any one of claims 1 to 3, 10 and 13, wherein the one or more Tables are selected from Tables 26-1 to 26-60.

24. The method of claim 23, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 Tables selected from Tables 26-1 to 26-60.

25. The method of claim 23 or 24, wherein the selected Tables are Tables 26-1 to 26-60.

26. The method of any one of claims 1 to 3, 11 and 13, wherein the one or more Tables are selected from Tables 27-1 to 27-48.

27. The method of claim 26, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, or 48 Tables selected from Tables 27-1 to 27-48.

28. The method of claim 26 or 27, wherein the selected Tables are Tables 27-1 to 27-48.

29. The method of any one of claims 1 to 3, 12 and 13, wherein the one or more Tables are selected from Tables 28-1 to 28-22.

30. The method of claim 29, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 Tables selected from Tables 28-1 to 28-22.

31. The method of claim 29 or 30, wherein the selected Tables are Tables 28-1 to 28-22.

32. The method of any one of claims 1 to 31, wherein the lupus nephritis disease state of the patient is classified with an accuracy of at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.

33. The method of any one of claims 1 to 32, wherein the lupus nephritis disease state of the patient is classified with a sensitivity of at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.

34. The method of any one of claims 1 to 33, wherein the lupus nephritis disease state of the patient is classified with a specificity of at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.

35. The method of any one of claims 1 to 34, wherein the lupus nephritis disease state of the patient is classified with a positive predictive value of at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.

36. The method of any one of claims 1 to 35, wherein the lupus nephritis disease state of the patient is classified with a negative predictive value of at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.

37. The method of any one of claims 1 to 36, wherein the lupus nephritis disease state of the patient is classified with a receiver operating characteristic (ROC) curve having an area under the curve (AUC) of at least about 0.80, at least about 0.85, at least about 0.90, at least about 0.91, at least about 0.92, at least about 0.93, at least about 0.94, at least about 0.95, at least about 0.96, at least about 0.97, at least about 0.98, at least about 0.99, or more than about 0.99.

38. The method of any one of claims 1 to 37, wherein the data set is derived from the gene expression measurement data using gene set variation analysis (GSVA), gene set enrichment analysis (GSEA), enrichment algorithm, multiscale embedded gene co-expression network analysis (MEGENA), weighted gene co-expression network analysis (WGCNA), differential expression analysis, Z-score, log2 expression analysis, or any combination thereof.

39. The method of any one of claims 1 to 38, wherein the data set is derived from the gene expression measurement data using GSVA.

40. The method of claim 39, wherein the data set comprises one or more GSVA scores of the patient, wherein the one or more GSVA scores are generated based on one or more Tables selected from Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22, wherein for each selected Table, at least one GSVA score of the patient is generated based on enrichment of expression of at least 2 genes or human orthologs thereof listed in the selected Table, and wherein the one or more GSVA scores comprise each generated GSVA score.

41. The method of claim 40, wherein the one or more Tables are selected from Tables 19-1 to 19-36.

42. The method of claim 40 or claim 41, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables selected from Tables 19-1 to 19-36.

43. The method of any one of claims 40 to 42, wherein the selected tables comprise Tables 19-1 to 19-36.

44. The method of claim 40, wherein the one or more Tables are selected from Tables 23-1 to 23-28.

45. The method of claim 40 or claim 44, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, or 28 Tables selected from Tables 23-1 to 23-28.

46. The method of claim 40, 44, or 45, wherein the selected tables comprise Tables 23-1 to 23-28.

47. The method of claim 40, wherein the one or more Tables are selected from Tables 25-1 to 25-32.

48. The method of claim 40 or claim 47, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, or 32 Tables selected from Tables 25-1 to 25-32.

49. The method of claim 40, 47, or 48, wherein the selected tables comprise Tables 25-1 to 25-32.

50. The method of claim 40, wherein the one or more Tables are selected from Tables 26-1 to 26-60.

51. The method of claim 40 or claim 50, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 Tables selected from Tables 26-1 to 26-60.

52. The method of claim 40, 50, or 51, wherein the selected tables comprise Tables 26-1 to 26-60.

53. The method of claim 40, wherein the one or more Tables are selected from Tables 27-1 to 27-48.

54. The method of claim 40 or claim 53, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, or 48 Tables selected from Tables 27-1 to 27-48.

55. The method of claim 40, 53, or 54, wherein the selected tables comprise Tables 27-1 to 27-48.

56. The method of claim 40, wherein the one or more Tables are selected from Tables 28-1 to 28-22.

57. The method of claim 40 or claim 56, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 Tables selected from Tables 28-1 to 28-22.

58. The method of claim 40, 56, or 57, wherein the selected tables comprise Tables 28-1 to 28-22.

59. The method of any one of claims 40 to 58, wherein independently for each selected Table, the at least one GSVA score of the patient is generated based on enrichment of expression of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, or 295 or all genes selected from the genes listed in the respective Table.

60. The method of any one of claims 1 to 59, wherein the analyzing the data set comprises providing the data set as an input to a trained machine-learning model to classify the lupus nephritis disease state of the patient, wherein the trained machine-learning model generates an inference indicative of the lupus nephritis disease state of the patient based at least on the data set.

61. The method of claim 60, wherein the data set comprises the one or more GSVA scores of the patient, and the trained machine-learning model generates the inference based at least on the one or more GSVA scores.

62. The method of claim 60 or claim 61, wherein the method further comprises receiving, as an output of the trained machine-learning model, the inference; and/or electronically outputting a report indicating the lupus nephritis disease state of the patient.

63. The method of any one of claims 60 to 62, wherein the machine-learning model is trained using linear regression, logistic regression, Ridge regression, Lasso regression, elastic net (EN) regression, support vector machine (SVM), gradient boosted machine (GBM), k-nearest neighbors (kNN), generalized linear model (GLM), naive Bayes (NB) classifier, neural network, Random Forest (RF), deep learning algorithm, linear discriminant analysis (LDA), decision tree learning (DTREE), adaptive boosting (ADB), Classification and Regression Tree (CART), hierarchical clustering, or any combination thereof.

64. The method of any one of claims 1 to 63, wherein the lupus nephritis disease state of the patient is classified based on a lupus nephritis disease risk score generated from the data set.

65. The method of claim 64, wherein the lupus nephritis disease risk score is generated based on the one or more GSVA scores of the patient.

66. The method of any one of claims 1 to 65, wherein the patient is at elevated risk of having lupus.

67. The method of any one of claims 1 to 66, wherein the patient is suspected of having lupus.

68. The method of any one of claims 1 to 67, wherein the patient is asymptomatic for lupus.

69. The method of any one of claims 1 to 68, wherein the patient has lupus.

70. The method of any one of claims 1 to 69, wherein the patient is at elevated risk of having lupus nephritis.

71. The method of any one of claims 1 to 70, wherein the patient is suspected of having lupus nephritis.

72. The method of any one of claims 1 to 71, wherein the patient is asymptomatic for lupus nephritis.

73. The method of any one of claims 1 to 72, wherein the patient has lupus nephritis.

74. The method of any one of claims 1 to 73, further comprising identifying, selecting, recommending and/or administering a treatment to the patient based at least in part on the classification of the lupus nephritis disease state of the patient.

75. The method of claim 74, wherein the treatment is configured to treat lupus nephritis.

76. The method of claim 74, wherein the treatment is configured to reduce a severity of lupus nephritis.

77. The method of claim 74, wherein the treatment is configured to reduce a risk of having lupus nephritis.

78. The method of any one of claims 74 to 77, wherein the treatment comprises a pharmaceutical composition.

79. The method of any one of claims 1 to 78, wherein the biological sample comprises a kidney biopsy sample, a blood sample, isolated peripheral blood mononuclear cells (PBMCs), or any derivative thereof.

80. A method for validating a mouse model useful for identifying and/or characterizing a human disease, the method comprising: a) providing a gene set capable of classifying a mouse as having an endotype selected from two or more endotypes of the disease; b) determining human orthologs of the gene set; c) classifying a human patient as having an endotype selected from the two or more endotypes of the disease using the human orthologs; and d) using the human orthologs to classify the mouse model as having an endotype selected from the two or more endotypes of the disease, wherein the endotype of a validated mouse model classified using the human orthologs corresponds to the human endotype of step (c) identified using the human orthologs.

81. The method of claim 80, wherein the disease is lupus, or lupus nephritis.
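Claims 32 to 37 characterize the classification by accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and ROC AUC. The claims do not state how these are computed for a multi-class call (acute, transitional, chronic, or no LN), so the Python sketch below assumes a binary one-versus-rest comparison and uses the standard confusion-matrix definitions. The AUC is computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The labels, scores, and 0.5 threshold are hypothetical.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for labels coded 1 (positive) and 0 (negative).

    Assumes both classes are present in y_true and y_pred; otherwise a
    denominator can be zero and ZeroDivisionError is raised.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as P(positive score > negative score), ties counted as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical one-versus-rest labels (1 = LN state of interest) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical decision threshold

print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

With these made-up values one positive falls below the threshold and one negative falls above it, so sensitivity and PPV are both 3/4 while the AUC, which does not depend on the threshold, stays high.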

Description:
METHODS AND SYSTEMS FOR MACHINE LEARNING ANALYSIS OF LUPUS NEPHRITIS

[0001] This application claims priority to U.S. Provisional Patent Applications No. 63/389,804, filed 07/15/2022; No. 63/424,096, filed 11/09/2022; and No. 63/448,628, filed 02/27/2023, all of which are incorporated in full herein by reference.

BACKGROUND

[0002] Systemic lupus erythematosus (SLE) is an autoimmune disorder that can affect a variety of tissues, including the kidney. Lupus nephritis (LN) is one of the most severe organ manifestations of SLE and affects approximately 40% of adult lupus patients, with 10-20% of patients developing end-stage renal disease (ESRD). The immune mechanisms of LN disease progression and the risk factors for end-organ damage are poorly understood. There is a need to understand the molecular pathways involved in LN disease progression to allow identification and optimization of therapies.

SUMMARY

[0003] An aspect of the current disclosure is directed to a method for assessing a lupus nephritis (LN) disease state of a patient. Based on transcriptomic analysis of lupus-prone mice, the inventors have identified molecular pathways and risk factors for the development of end-stage renal disease in human lupus patients. Using a gene expression-based clustering approach, sets of curated gene signatures were identified that can be used to classify disease stages of murine glomerulonephritis into molecular endotypes that translate effectively to human LN patients. A newly recognized intermediate stage (e.g., endotype) of LN, referred to herein as “transitional LN”, occurring between the acute and chronic LN disease states, was identified. Based on an understanding of the molecular mechanisms of LN disease state progression from the acute to the transitional LN disease state, and from the transitional to the chronic LN disease state, and on gene expression analysis of the molecular endotypes (e.g., acute LN, transitional LN, and chronic LN), targeted therapy was developed to stop, slow, and/or reverse LN disease progression in a patient. The method for assessing the LN disease state of the patient can include analyzing a data set comprising or derived from gene expression measurement data of at least 2 genes or human orthologs thereof, from a biological sample from the patient, to classify the LN disease state of the patient. In certain embodiments, the at least 2 genes are selected from the genes listed in Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22. As an illustrative example, “genes listed in Table X and Y” includes x + y genes, where Table X contains x genes and Table Y contains y genes, assuming no overlap exists between the x and y genes. In the event of overlap, duplicate copies can be excluded from analysis.
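The counting rule above (x + y genes for two non-overlapping tables, with duplicate copies excluded when tables overlap) can be illustrated with a short Python sketch. The table names and gene symbols are hypothetical placeholders, not the contents of the disclosed Tables, and keeping the first occurrence of each gene is an assumed convention.

```python
from typing import Dict, Iterable, List


def pool_genes(tables: Dict[str, List[str]], selected: Iterable[str]) -> List[str]:
    """Return the genes listed in the selected tables, each gene counted once.

    Two non-overlapping tables of x and y genes yield x + y genes; genes that
    appear in more than one selected table are kept only at first occurrence.
    """
    seen = set()
    pooled: List[str] = []
    for name in selected:
        for gene in tables[name]:
            if gene not in seen:  # exclude duplicate copies of a shared gene
                seen.add(gene)
                pooled.append(gene)
    return pooled


# Hypothetical tables with placeholder gene symbols.
tables = {
    "Table X": ["GENE1", "GENE2", "GENE3"],
    "Table Y": ["GENE4", "GENE5"],
    "Table Z": ["GENE3", "GENE6"],  # shares GENE3 with Table X
}

print(len(pool_genes(tables, ["Table X", "Table Y"])))  # 3 + 2 = 5
print(len(pool_genes(tables, ["Table X", "Table Z"])))  # GENE3 counted once: 4
```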
[0004] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 450, 500, 550, 600, 650, 700, 750, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1700, 1800, 1900, 2000, or all, or any range or value therebetween, genes (or human orthologs thereof) selected from the genes listed in Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22, from the biological sample from the patient. In certain embodiments, the at least two genes are selected from the genes listed in Tables 19-1 to 19-36. In certain embodiments, the at least two genes are selected from the genes listed in Tables 19A-1 to 19A-36. In certain embodiments, the at least two genes are selected from the genes listed in Table 20. In certain embodiments, the at least two genes are selected from the genes listed in Table 21. In certain embodiments, the at least two genes are selected from the genes listed in Table 22. In certain embodiments, the at least two genes are selected from the genes listed in Tables 23-1 to 23-28. In certain embodiments, the at least two genes are selected from the genes listed in Tables 25-1 to 25-32. In certain embodiments, the at least two genes are selected from the genes listed in Tables 26-1 to 26-60. In certain embodiments, the at least two genes are selected from the genes listed in Tables 27-1 to 27-48. In certain embodiments, the at least two genes are selected from the genes listed in Tables 28-1 to 28-22. The at least 2 genes may or may not include gene(s) that are not listed in Tables 19-1 to 19-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and/or Tables 28-1 to 28-22. In certain embodiments, the at least 2 genes do not include any gene that is not listed in Tables 19-1 to 19-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and/or Tables 28-1 to 28-22.
In certain embodiments, the at least 2 genes do not include any gene that is not listed in Tables 19-1 to 19-36. In certain embodiments, the at least 2 genes do not include any gene that is not listed in Tables 23-1 to 23-28. In certain embodiments, the at least 2 genes do not include any gene that is not listed in Tables 25-1 to 25-32. In certain embodiments, the at least 2 genes do not include any gene that is not listed in Tables 26-1 to 26-60. In certain embodiments, the at least 2 genes do not include any gene that is not listed in Tables 27-1 to 27-48. In certain embodiments, the at least 2 genes do not include any gene that is not listed in Tables 28-1 to 28-22. In certain embodiments, the data set comprises or is derived from gene expression measurement data of one or more human orthologs of the genes selected from Tables 19-1 to 19-36, Table 20, Table 21, Table 22, Tables 28-1 to 28-22, Tables 26-1 to 26-60, and Tables 27-1 to 27-48.
Gene sets listed in each of these Tables can be used as effective biomarkers for classifying the LN disease state of patients. In certain embodiments, the data set comprises or is derived from gene expression measurement data of one or more human orthologs of the genes selected from Tables 19-1 to 19-36. In certain embodiments, the data set comprises or is derived from gene expression measurement data of one or more human orthologs of the genes selected from Table 20. In certain embodiments, the data set comprises or is derived from gene expression measurement data of one or more human orthologs of the genes selected from Table 21. In certain embodiments, the data set comprises or is derived from gene expression measurement data of one or more human orthologs of the genes selected from Table 22. A human ortholog of a non-human gene (such as a mouse gene) can be identified using a method as described in U.S. Pat. App. Pub. No. 2021/0104321 (“Machine Learning Disease Prediction and Treatment Prioritization”), incorporated herein by reference in its entirety, as described in the Examples therein, and/or by any method published and/or known to one of skill in the art. As a non-limiting example, human orthologs of the mouse gene sets can be identified on a gene-by-gene basis using publicly available online databases, including but not limited to GeneCards, the Mouse Genome Informatics (MGI) database, and UniProtKB, as well as literature mining. Through this process, genes with similar tissue expression, cellular localization, and functions between mouse and human can be retained in the human gene sets. One or more human orthologs of a non-human gene may be identified. The data set may comprise gene expression measurement data of any of the one or more identified human orthologs of a given non-human gene.
It is understood that, where no human ortholog exists for a given non-human gene, the data set cannot comprise expression measurement data of a human ortholog of that gene.
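Once a mouse-to-human lookup has been assembled (for example from MGI, GeneCards, or UniProtKB, as described above), translating a mouse gene set might look like the Python sketch below. The lookup is a hypothetical hand-written dictionary with placeholder symbols, not a curated resource. The sketch allows one mouse gene to map to several human orthologs and reports mouse genes without a human ortholog separately, so they contribute nothing to the human data set.

```python
from typing import Dict, List, Tuple


def to_human_orthologs(
    mouse_genes: List[str], ortholog_map: Dict[str, List[str]]
) -> Tuple[List[str], List[str]]:
    """Translate mouse genes to human orthologs.

    Returns (human_genes, unmapped). A mouse gene may map to several human
    orthologs; mouse genes with no ortholog in the map are listed in
    `unmapped` and add no human gene.
    """
    human: List[str] = []
    unmapped: List[str] = []
    for gene in mouse_genes:
        orthologs = ortholog_map.get(gene, [])
        if not orthologs:
            unmapped.append(gene)
            continue
        for h in orthologs:
            if h not in human:  # skip repeats when mouse genes share an ortholog
                human.append(h)
    return human, unmapped


# Hypothetical lookup table with placeholder symbols.
ortholog_map = {
    "Abc1": ["ABC1"],
    "Def2": ["DEF2A", "DEF2B"],  # one-to-many mapping
    "Ghi3": [],                  # no human ortholog identified
}

human, unmapped = to_human_orthologs(["Abc1", "Def2", "Ghi3", "Jkl4"], ortholog_map)
print(human)     # ['ABC1', 'DEF2A', 'DEF2B']
print(unmapped)  # ['Ghi3', 'Jkl4']
```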

[0005] In certain embodiments, the data set comprises or is derived from gene expression measurement data of human orthologs of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 450, 500, 550, 600, 650, 700, 750, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or 1291, or all, or any range or value therebetween, genes selected from the genes listed in Tables 19-1 to 19-36, from the biological sample from the patient.

[0006] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 450, 500, 550, 600, 650, 700, 750, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or 1291, or all, or any range or value therebetween, genes selected from the genes listed in Tables 19A-1 to 19A-36, from the biological sample from the patient.

[0007] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 450, 500, 550, 600, 650, 700, 750, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1500, 2000, or all, or any range or value therebetween, genes selected from the genes listed in Tables 26-1 to 26-60, from the biological sample from the patient.

[0008] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 450, 500, 550, 600, 650, 700, 750, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1500, 2000, or all, or any range or value therebetween, genes selected from the genes listed in Tables 27-1 to 27-48, from the biological sample from the patient.

[0009] In certain embodiments, the data set comprises or is derived from gene expression measurement data of human orthologs of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 450, 500, 550, 600, 650, 700, 727, or all, or any range or value therebetween, genes selected from the genes listed in Tables 28-1 to 28-22, from the biological sample from the patient.

[0010] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 450, 500, 550, 600, 650, 700, 750, 850, 900, 950, 960, 968, or all, or any range or value therebetween, genes selected from the genes listed in Tables 23-1 to 23-28, from the biological sample from the patient.

[0011] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 450, 500, 550, 600, 650, 700, 750, 850, 900, 950, 960, 968, 1000, or all, or any range or value therebetween, genes selected from the genes listed in Tables 25-1 to 25-32, from the biological sample from the patient.

[0012] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 450, 500, 550, 600, 650, 700, 727, or all, or any range or value therebetween, genes selected from the genes listed in Tables 28-1 to 28-22, from the biological sample from the patient.

[0013] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, or 203, or all, or any range or value therebetween, genes or human orthologs thereof selected from the genes listed in each of one or more Tables selected from Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22, from the biological sample from the patient, wherein the number of genes selected from the genes listed in each selected table may be different or the same (e.g., a different or identical number of genes can be selected from the genes listed in each selected table; in an illustrative example, Tables 25-1, 25-2, and 25-3 are selected, and 4 genes from Table 25-1, 2 genes from Table 25-2, and 7 genes from Table 25-3 are selected).
In a non-limiting example, the data set comprises or is derived from gene expression measurement data of at least 2 genes selected from the genes listed in each of 28 Tables selected from Tables 23-1 to 23-28 (i.e., the one or more selected Tables comprise 28 Tables), from the biological sample from the patient; that is, all 28 of Tables 23-1 to 23-28 are selected, and at least 2 genes are selected from the genes listed in each of the selected Tables. The data set thereby comprises or is derived from gene expression measurement data of at least 2 genes selected from the genes listed in Table 23-1, at least 2 genes selected from the genes listed in Table 23-2, at least 2 genes selected from the genes listed in Table 23-3, at least 2 genes selected from the genes listed in Table 23-4, at least 2 genes selected from the genes listed in Table 23-5, at least 2 genes selected from the genes listed in Table 23-6, at least 2 genes selected from the genes listed in Table 23-7, at least 2 genes selected from the genes listed in Table 23-8, at least 2 genes selected from the genes listed in Table 23-9, at least 2 genes selected from the genes listed in Table 23-10, at least 2 genes selected from the genes listed in Table 23-11, at least 2 genes selected from the genes listed in Table 23-12, at least 2 genes selected from the genes listed in Table 23-13, at least 2 genes selected from the genes listed in Table 23-14, at least 2 genes selected from the genes listed in Table 23-15, at least 2 genes selected from the genes listed in Table 23-16, at least 2 genes selected from the genes listed in Table 23-17, at least 2 genes selected from the genes listed in Table 23-18, at least 2 genes selected from the genes listed in Table 23-19, at least 2 genes selected from the genes listed in Table 23-20, at least 2 genes selected from the genes listed in Table 23-21, at least 2 genes selected from the genes listed in Table 23-22, at least 2 genes selected from the genes listed in Table 23-23, at least 2 genes selected from the genes listed in Table 23-24, at least 2 genes selected from the genes listed in Table 23-25, at least 2 genes selected from the genes listed in Table 23-26, at least 2 genes selected from the genes listed in Table 23-27, and at least 2 genes selected from the genes listed in Table 23-28, from the biological sample from the patient. Genes selected from each selected Table of the one or more Tables can be used as effective biomarkers for classifying the LN disease state of the patient.
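The per-Table selection rule above (a possibly different number of genes from each selected Table, with a floor of at least 2 per Table) can be pictured as a mapping from Table identifiers to chosen genes. The sketch below is illustrative only: the gene symbols are hypothetical placeholders rather than the genes actually listed in Tables 25-1 to 25-3, the helper name `build_selection` is introduced here, and taking the first n genes of each Table is an arbitrary choice made for brevity.

```python
from typing import Dict, List

# Hypothetical placeholder contents; the real gene lists are those of the
# Tables referenced in this description and are NOT reproduced here.
TABLES: Dict[str, List[str]] = {
    "25-1": ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"],
    "25-2": ["GENE_F", "GENE_G", "GENE_H"],
    "25-3": ["GENE_I", "GENE_J", "GENE_K", "GENE_L", "GENE_M", "GENE_N", "GENE_O"],
}


def build_selection(counts: Dict[str, int], min_per_table: int = 2) -> Dict[str, List[str]]:
    """Take counts[t] genes from each selected Table t.

    Counts may differ between Tables, but every selected Table must
    contribute at least `min_per_table` genes.
    """
    selection: Dict[str, List[str]] = {}
    for table_id, n in counts.items():
        genes = TABLES[table_id]
        if n < min_per_table:
            raise ValueError(f"Table {table_id}: need >= {min_per_table} genes, got {n}")
        if n > len(genes):
            raise ValueError(f"Table {table_id} lists only {len(genes)} genes")
        selection[table_id] = genes[:n]  # first n genes, purely for illustration
    return selection


# The illustrative example: 4, 2 and 7 genes from Tables 25-1, 25-2 and 25-3.
selection = build_selection({"25-1": 4, "25-2": 2, "25-3": 7})
# A set avoids double counting should a gene appear in more than one Table.
distinct_genes = {g for genes in selection.values() for g in genes}
print({t: len(g) for t, g in selection.items()}, len(distinct_genes))
# → {'25-1': 4, '25-2': 2, '25-3': 7} 13
```

For the 28-Table example, the same structure would hold 28 entries with at least 2 genes each, i.e. at least 56 gene selections in total.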

[0014] In certain embodiments, the data set comprises or is derived from gene expression measurement data of one or more human orthologs of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, or 191, or all, or any range or value therebetween, genes selected from the genes listed in each of one or more Tables selected from Tables 19-1 to 19-36, from the biological sample from the patient, wherein the number of genes selected from the genes listed in each selected table may be different or the same (e.g., a different or identical number of genes can be selected from the genes listed in each selected table). In certain embodiments, the data set comprises or is derived from gene expression measurement data of one or more human orthologs of an effective number of genes selected from the genes listed in each of the one or more Tables selected from Tables 19-1 to 19-36, from the biological sample from the patient, wherein a different or identical number of genes can be selected from the genes listed in each selected table. In certain embodiments, the data set comprises or is derived from gene expression measurement data of one or more human orthologs of the genes listed in each of one or more Tables selected from Tables 19-1 to 19-36, from the biological sample from the patient. The one or more Tables selected from Tables 19-1 to 19-36 can include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables, or any range therebetween. In certain embodiments, all of Tables 19-1 to 19-36 (i.e., 36 Tables) are selected.

[0016] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, or 191, or all, or any range or value therebetween, genes selected from the genes listed in each of one or more Tables selected from Tables 19A-1 to 19A-36, from the biological sample from the patient, wherein the number of genes selected from the genes listed in each selected table may be different or the same (e.g., a different or identical number of genes can be selected from the genes listed in each selected table). In certain embodiments, the data set comprises or is derived from gene expression measurement data of an effective number of genes selected from the genes listed in each of the one or more Tables selected from Tables 19A-1 to 19A-36, from the biological sample from the patient, wherein a different or identical number of genes can be selected from the genes listed in each selected table. In certain embodiments, the data set comprises or is derived from gene expression measurement data of the genes listed in each of one or more Tables selected from Tables 19A-1 to 19A-36, from the biological sample from the patient. The one or more Tables selected from Tables 19A-1 to 19A-36 can include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables, or any range therebetween. In certain embodiments, all of Tables 19A-1 to 19A-36 (i.e., 36 Tables) are selected.

[0017] In certain embodiments, the data set comprises or is derived from gene expression measurement data of one or more human orthologs of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, or all, or any range or value therebetween, genes selected from the genes listed in each of one or more Tables selected from Tables 28-1 to 28-22, from the biological sample from the patient, wherein the number of genes selected from the genes listed in each selected table may be different or the same (e.g., a different or identical number of genes can be selected from the genes listed in each selected table). In certain embodiments, the data set comprises or is derived from gene expression measurement data of one or more human orthologs of an effective number of genes selected from the genes listed in each of the one or more Tables selected from Tables 28-1 to 28-22, from the biological sample from the patient, wherein a different or identical number of genes can be selected from the genes listed in each selected table. In certain embodiments, the data set comprises or is derived from gene expression measurement data of one or more human orthologs of the genes listed in each of the one or more Tables selected from Tables 28-1 to 28-22, from the biological sample from the patient. The one or more Tables selected from Tables 28-1 to 28-22 can include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 Tables, or any range therebetween. In certain embodiments, all of Tables 28-1 to 28-22 (i.e., 22 Tables) are selected. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-5, 28-6, 28-7, 28-8, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-19, 28-20, 28-21, and 28-22 are selected. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-6, 28-7, 28-8, 28-9, 28-10, 28-11, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-20, 28-21, and 28-22 are selected. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-5, 28-6, 28-7, 28-8, 28-9, 28-10, 28-11, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-20, 28-21, and 28-22 are selected.
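Each of the three partial selections of Tables 28-1 to 28-22 recited above differs from the full range only by the few Tables it omits. As a reading aid, and not as part of the disclosed method, they can be written as complements. The helper name `tables_excluding` is hypothetical.

```python
from typing import List

# Tables 28-1 to 28-22, labelled "28-<k>".
ALL_28: List[str] = [f"28-{k}" for k in range(1, 23)]


def tables_excluding(*omitted: int) -> List[str]:
    """Tables 28-1 to 28-22 in order, minus the given table indices."""
    return [t for t in ALL_28 if int(t.split("-")[1]) not in omitted]


option_1 = tables_excluding(9, 10, 11)  # 28-1 to 28-8 and 28-12 to 28-22
option_2 = tables_excluding(5, 19)      # omits 28-5 and 28-19
option_3 = tables_excluding(19)         # omits 28-19 only
print(len(option_1), len(option_2), len(option_3))  # → 19 20 21
```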

[0018] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, or 191, or all, or any range or value therebetween, genes selected from the genes listed in each of one or more Tables selected from Tables 26-1 to 26-60, from the biological sample from the patient, wherein the number of genes selected from the genes listed in each selected table may be different or the same (e.g., a different or identical number of genes can be selected from the genes listed in each selected table). In certain embodiments, the data set comprises or is derived from gene expression measurement data of an effective number of genes selected from the genes listed in each of the one or more Tables selected from Tables 26-1 to 26-60, from the biological sample from the patient, wherein a different or identical number of genes can be selected from the genes listed in each selected table. In certain embodiments, the data set comprises or is derived from gene expression measurement data of the genes listed in each of the one or more Tables selected from Tables 26-1 to 26-60, from the biological sample from the patient. The one or more Tables selected from Tables 26-1 to 26-60 can include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 Tables, or any range therebetween. In certain embodiments, all of Tables 26-1 to 26-60 (i.e., 60 Tables) are selected.

[0019] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, or 191, or all, or any range or value therebetween, genes selected from the genes listed in each of one or more Tables selected from Tables 27-1 to 27-48, from the biological sample from the patient, wherein the number of genes selected from the genes listed in each selected table may be different or the same (e.g., a different or identical number of genes can be selected from the genes listed in each selected table). In certain embodiments, the data set comprises or is derived from gene expression measurement data of an effective number of genes selected from the genes listed in each of the one or more Tables selected from Tables 27-1 to 27-48, from the biological sample from the patient, wherein a different or identical number of genes can be selected from the genes listed in each selected table. In certain embodiments, the data set comprises or is derived from gene expression measurement data of the genes listed in each of the one or more Tables selected from Tables 27-1 to 27-48, from the biological sample from the patient. The one or more Tables selected from Tables 27-1 to 27-48 can include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, or 48 Tables, or any range therebetween. In certain embodiments, all of Tables 27-1 to 27-48 (i.e., 48 Tables) are selected.

[0020] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, or 203, or all, or any range or value therebetween, genes selected from the genes listed in each of one or more Tables selected from Tables 23-1 to 23-28, from the biological sample from the patient, wherein the number of genes selected from the genes listed in each selected table may be different or the same (e.g., a different or identical number of genes can be selected from the genes listed in each selected table). In certain embodiments, the data set comprises or is derived from gene expression measurement data of an effective number of genes selected from the genes listed in each of the one or more Tables selected from Tables 23-1 to 23-28, from the biological sample from the patient, wherein a different or identical number of genes can be selected from the genes listed in each selected table. In certain embodiments, the data set comprises or is derived from gene expression measurement data of the genes listed in each of the one or more Tables selected from Tables 23-1 to 23-28, from the biological sample from the patient. The one or more Tables selected from Tables 23-1 to 23-28 can include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, or 28 Tables, or any range therebetween. In certain embodiments, all of Tables 23-1 to 23-28 (i.e., 28 Tables) are selected.

[0021] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, or all, or any range or value therebetween, genes selected from the genes listed in each of one or more Tables selected from Tables 25-1 to 25-32, from the biological sample from the patient, wherein the number of genes selected from the genes listed in each selected table may be different or the same (e.g., a different or identical number of genes can be selected from the genes listed in each selected table). In certain embodiments, the data set comprises or is derived from gene expression measurement data of an effective number of genes selected from the genes listed in each of the one or more Tables selected from Tables 25-1 to 25-32, from the biological sample from the patient, wherein a different or identical number of genes can be selected from the genes listed in each selected table. In certain embodiments, the data set comprises or is derived from gene expression measurement data of the genes listed in each of the one or more Tables selected from Tables 25-1 to 25-32, from the biological sample from the patient. The one or more Tables selected from Tables 25-1 to 25-32 can include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, or 32 Tables, or any range therebetween. In certain embodiments, Table 25-8 is selected. In certain embodiments, Table 25-31 is selected. In certain embodiments, Tables 25-8 and 25-31 are selected. In certain embodiments, all of Tables 25-1 to 25-32 (i.e., 32 Tables) are selected.

[0022] In certain embodiments, the data set comprises or is derived from gene expression measurement data of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, or all, or any range or value therebetween, genes selected from the genes listed in each of one or more Tables selected from Tables 28-1 to 28-22, from the biological sample from the patient, wherein the number of genes selected from the genes listed in each selected table may be different or the same (e.g., a different or identical number of genes can be selected from the genes listed in each selected table). In certain embodiments, the data set comprises or is derived from gene expression measurement data of an effective number of genes selected from the genes listed in each of the one or more Tables selected from Tables 28-1 to 28-22, from the biological sample from the patient, wherein a different or identical number of genes can be selected from the genes listed in each selected table. In certain embodiments, the data set comprises or is derived from gene expression measurement data of the genes listed in each of the one or more Tables selected from Tables 28-1 to 28-22, from the biological sample from the patient. The one or more Tables selected from Tables 28-1 to 28-22 can include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 Tables, or any range therebetween. In certain embodiments, all of Tables 28-1 to 28-22 (i.e., 22 Tables) are selected. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-5, 28-6, 28-7, 28-8, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-19, 28-20, 28-21, and 28-22 are selected. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-6, 28-7, 28-8, 28-9, 28-10, 28-11, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-20, 28-21, and 28-22 are selected. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-5, 28-6, 28-7, 28-8, 28-9, 28-10, 28-11, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-20, 28-21, and 28-22 are selected.

[0023] Genes selected from each of the selected Tables can be used as effective biomarkers for classifying the LN disease state of the patient.

[0024] Selecting an effective number of genes from a selected Table can include selecting at least the minimum number of genes from the Table needed to obtain a desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value in classifying the LN disease state of the patient. The desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value can be any accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value described herein. In certain embodiments, the desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value is at least 80%. In certain embodiments, the desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value is at least 85%. In certain embodiments, the desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value is at least 90%. In certain embodiments, the desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value is at least 95%. In certain embodiments, the effective number of genes for a Table can be determined using an adjusted Rand index (ARI) method. The ARI method can include performing k-means clustering on randomly selected gene subsets whose sizes are chosen at standard intervals based on the total number of genes in the Table. The similarity between two clusterings can be measured by the ARI. As a non-limiting example, the ARI can be calculated between the k-means cluster memberships obtained from a randomly selected gene subset and the cluster memberships obtained using all of the genes in the Table. The higher the ARI, the more similar the cluster memberships; the lower the ARI, the weaker the agreement, suggesting that more genes may be required. The ARI can thus be used to determine the effective number of genes for a Table.
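The ARI procedure described above can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation: the helper names (`adjusted_rand_index`, `kmeans`, `ari_by_subset_size`, `effective_number`), the deterministic farthest-first k-means initialization, the number of clusters `k`, the number of random repeats per subset size, and the ARI threshold of 0.9 used to call a subset size "effective" are all hypothetical choices that the text does not specify.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence

Matrix = List[List[float]]  # rows = samples (patients), columns = genes of one Table


def adjusted_rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Adjusted Rand index between two clusterings of the same samples."""
    index = sum(math.comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(a).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / math.comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (index - expected) / (max_index - expected)


def _sqdist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(data: Matrix, k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns one cluster label per sample (row)."""
    # Farthest-first initialisation: an assumption made for reproducibility.
    centroids = [list(data[0])]
    while len(centroids) < k:
        far = max(data, key=lambda row: min(_sqdist(row, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: _sqdist(row, centroids[j])) for row in data]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def ari_by_subset_size(data: Matrix, k: int, step: int, repeats: int = 5, seed: int = 0) -> Dict[int, float]:
    """Mean ARI of k-means on random gene subsets versus k-means on all genes."""
    n_genes = len(data[0])
    reference = kmeans(data, k)
    rng = random.Random(seed)
    sizes = list(range(step, n_genes, step)) + [n_genes]  # standard intervals, then all genes
    result: Dict[int, float] = {}
    for size in sizes:
        scores = []
        for _ in range(repeats):
            cols = rng.sample(range(n_genes), size)
            subset = [[row[c] for c in cols] for row in data]
            scores.append(adjusted_rand_index(reference, kmeans(subset, k)))
        result[size] = sum(scores) / repeats
    return result


def effective_number(ari: Dict[int, float], threshold: float = 0.9) -> int:
    """Smallest subset size whose mean ARI reaches the (assumed) threshold."""
    return min((s for s, v in ari.items() if v >= threshold), default=max(ari))


# Synthetic toy data: 10 samples x 20 genes in two well-separated groups.
data = [[(0.0 if i < 5 else 10.0) + 0.1 * ((7 * i + 3 * j) % 5) for j in range(20)] for i in range(10)]
scores = ari_by_subset_size(data, k=2, step=5)
print(scores, effective_number(scores))
# → {5: 1.0, 10: 1.0, 15: 1.0, 20: 1.0} 5
```

Because the toy groups are far apart, every subset size reproduces the all-gene clustering and the smallest size tested is returned; on real expression data the ARI could instead be expected to rise with subset size before leveling off.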
In certain embodiments, selecting an effective number of genes from a selected Table can include selecting at least 60%, 70%, 80%, or 90%, or all, of the genes listed in the selected Table. In certain embodiments, selecting an effective number of genes from a selected Table can include selecting at least 60% of the genes listed in the selected Table. In certain embodiments, selecting an effective number of genes from a selected Table can include selecting at least 70% of the genes listed in the selected Table. In certain embodiments, selecting an effective number of genes from a selected Table can include selecting at least 80% of the genes listed in the selected Table. In certain embodiments, selecting an effective number of genes from a selected Table can include selecting at least 90% of the genes listed in the selected Table. In certain embodiments, selecting an effective number of genes from a selected Table can include selecting all of the genes listed in the selected Table. In certain embodiments, selecting an effective number of genes from a selected Table can include selecting at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the genes in the Table. In certain embodiments, selecting an effective number of genes from a selected Table can include selecting at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the genes in the Table, where the Table contains 100 or more genes. In certain embodiments, selecting an effective number of genes from a selected Table can include selecting at least 70% of the genes in the Table, where the Table contains 100 or more genes. In certain embodiments, selecting an effective number of genes from a selected Table can include selecting at least about 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the genes in the Table, where the Table contains fewer than 100 genes.
In certain embodiments, selecting an effective number of genes from a selected Table can include selecting all genes from the Table, where the Table contains fewer than 100 genes. In certain embodiments, collinear genes (such as genes with r > 0.9, r > 0.8, r > 0.7, or r > 0.6) are removed from the gene set forming the effective number of genes. In some embodiments, an effective number of genes in a Table disclosed herein comprises about 60 percent to about 100 percent of the genes in the Table. In some embodiments, an effective number of genes in a Table disclosed herein comprises about 60 percent to about 65 percent, about 60 percent to about 70 percent, about 60 percent to about 75 percent, about 60 percent to about 80 percent, about 60 percent to about 85 percent, about 60 percent to about 90 percent, about 60 percent to about 95 percent, about 60 percent to about 97 percent, about 60 percent to about 98 percent, about 60 percent to about 99 percent, about 60 percent to about 100 percent, about 65 percent to about 70 percent, about 65 percent to about 75 percent, about 65 percent to about 80 percent, about 65 percent to about 85 percent, about 65 percent to about 90 percent, about 65 percent to about 95 percent, about 65 percent to about 97 percent, about 65 percent to about 98 percent, about 65 percent to about 99 percent, about 65 percent to about 100 percent, about 70 percent to about 75 percent, about 70 percent to about 80 percent, about 70 percent to about 85 percent, about 70 percent to about 90 percent, about 70 percent to about 95 percent, about 70 percent to about 97 percent, about 70 percent to about 98 percent, about 70 percent to about 99 percent, about 70 percent to about 100 percent, about 75 percent to about 80 percent, about 75 percent to about 85 percent, about 75 percent to about 90 percent, about 75 percent to about 95 percent, about 75 percent to about 97 percent, about 75 percent to about 98 percent, about 75 percent to about 99 percent, about 75 percent to about 100 percent, about 80 percent to about 85 percent, about 80 percent to about 90 percent, about 80 percent to about 95 percent, about 80 percent to about 97 percent, about 80 percent to about 98 percent, about 80 percent to about 99 percent, about 80 percent to about 100 percent, about 85 percent to about 90 percent, about 85 percent to about 95 percent, about 85 percent to about 97 percent, about 85 percent to about 98 percent, about 85 percent to about 99 percent, about 85 percent to about 100 percent, about 90 percent to about 95 percent, about 90 percent to about 97 percent, about 90 percent to about 98 percent, about 90 percent to about 99 percent, about 90 percent to about 100 percent, about 95 percent to about 97 percent, about 95 percent to about 98 percent, about 95 percent to about 99 percent, about 95 percent to about 100 percent, about 97 percent to about 98 percent, about 97 percent to about 99 percent, about 97 percent to about 100 percent, about 98 percent to about 99 percent, about 98 percent to about 100 percent, or about 99 percent to about 100 percent of the genes in the Table. In some embodiments, an effective number of genes in a Table disclosed herein comprises about 60 percent, about 65 percent, about 70 percent, about 75 percent, about 80 percent, about 85 percent, about 90 percent, about 95 percent, about 97 percent, about 98 percent, about 99 percent, or about 100 percent of the genes in the Table. In some embodiments, an effective number of genes in a Table disclosed herein comprises at least about 60 percent, about 65 percent, about 70 percent, about 75 percent, about 80 percent, about 85 percent, about 90 percent, about 95 percent, about 97 percent, about 98 percent, or about 99 percent of the genes in the Table.
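Two of the rules in this paragraph lend themselves to a short sketch: the size-dependent floor (at least 70% of a Table listing 100 or more genes, at least 80% of a smaller Table, one of several alternatives recited above) and the removal of collinear genes. The helper names are hypothetical. Two further assumptions are made here that the text does not state: collinearity is judged on the absolute Pearson correlation |r|, and genes are scanned greedily in input order, keeping the first member of each collinear pair.

```python
import math
from typing import Dict, List, Sequence


def min_effective_count(n_genes: int) -> int:
    """At least 70% of a Table with >= 100 genes, at least 80% of a smaller Table."""
    pct = 70 if n_genes >= 100 else 80
    return -(-n_genes * pct // 100)  # integer ceiling, avoids float rounding


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; a constant profile is treated as uncorrelated (0.0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def drop_collinear(expr: Dict[str, List[float]], r_max: float = 0.9) -> List[str]:
    """Greedily keep each gene unless |r| with an already-kept gene exceeds r_max."""
    kept: List[str] = []
    for gene, profile in expr.items():
        if all(abs(pearson(profile, expr[k])) <= r_max for k in kept):
            kept.append(gene)
    return kept


print([min_effective_count(n) for n in (45, 100, 150)])  # → [36, 70, 105]
expr = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8.1], "G3": [4, 1, 3, 2]}
print(drop_collinear(expr))  # → ['G1', 'G3']
```

Because the greedy scan is order-dependent, a different input order could keep G2 instead of G1; any tie-breaking rule would be a design choice beyond what the text specifies.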

[0025] In certain embodiments, a minimum number of Tables is selected (e.g., from Tables 23-1 to 23-28, or from Tables 25-1 to 25-32, or from Tables 26-1 to 26-60, or from Tables 27-1 to 27-48, or from Tables 28-1 to 28-22) such that the method can classify/identify all four endotypes of the LN disease state (acute LN, transitional LN, chronic group I LN, and chronic group II LN). In certain embodiments, a minimum number of Tables is selected (e.g., from Tables 23-1 to 23-28, or from Tables 25-1 to 25-32, or from Tables 26-1 to 26-60, or from Tables 27-1 to 27-48, or from Tables 28-1 to 28-22) such that the method can classify the LN disease state of the patient with a desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value. The desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value can be any accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value described herein. In certain embodiments, the desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value is at least 80%. In certain embodiments, the desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value is at least 85%. In certain embodiments, the desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value is at least 90%. In certain embodiments, the desired accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value is at least 95%.
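The performance criteria in [0025] reduce to ratios of confusion-matrix counts. The Python sketch below computes them for one endotype at a time; treating the four-endotype problem as one-vs-rest, and the label strings in the usage, are our assumptions rather than the application's evaluation protocol. Because the text says "and/or", a caller would pass `meets_target` only the metrics that matter for a given embodiment.

```python
def _ratio(num, den):
    """Safe division: NaN when the denominator is zero (NaN never meets a target)."""
    return num / den if den else float("nan")


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, specificity, PPV and NPV for one endotype vs. all others.

    The one-vs-rest framing is an assumption of this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def meets_target(metrics, target=0.80):
    """True only if every supplied metric reaches the target."""
    return all(value >= target for value in metrics.values())


# Hypothetical labels for five patients.
y_true = ["acute", "acute", "chronic_I", "transitional", "acute"]
y_pred = ["acute", "chronic_I", "chronic_I", "transitional", "acute"]
m = one_vs_rest_metrics(y_true, y_pred, positive="acute")
print(m, meets_target(m, 0.80))  # sensitivity 2/3, so the 80% target is not met
```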

[0026] The data set can be generated from the biological sample from the patient. For example, nucleic acid molecules of the patient in the biological sample can be assessed to obtain the data set. In certain embodiments, the gene expression measurements of the at least 2 genes in the biological sample can be performed using any suitable method known to those of skill in the art, including but not limited to DNA sequencing, RNA sequencing (e.g., RNA-Seq), microarray, qPCR, northern blotting, fluorescence in situ hybridization, serial analysis of gene expression, tiling arrays, or any combination thereof, to obtain the data set. In certain embodiments, the gene expression measurements of the at least 2 genes in the biological sample can be performed using RNA-Seq. RNA-Seq can include single-cell RNA-Seq and/or bulk RNA-Seq. In certain embodiments, the gene expression measurements of the at least 2 genes in the biological sample can be performed using microarray analysis. In certain embodiments, the data set is derived from the gene expression measurement data from the biological sample, wherein the gene expression measurement data is analyzed using a suitable data analysis tool, including but not limited to a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring™ analysis tool, gene set variation analysis (GSVA), gene set enrichment analysis (GSEA), an enrichment algorithm, Z-score, multiscale embedded gene co-expression network analysis (MEGENA), weighted gene co-expression network analysis (WGCNA), differential expression analysis, log2 expression analysis, or any combination thereof, to obtain the data set.
In certain embodiments, the data set is derived from the gene expression measurement data using gene set variation analysis (GSVA), gene set enrichment analysis (GSEA), an enrichment algorithm, multiscale embedded gene co-expression network analysis (MEGENA), weighted gene co-expression network analysis (WGCNA), differential expression analysis, Z-score, log2 expression analysis, or any combination thereof. In certain embodiments, the data set is derived from the gene expression measurement data using gene set variation analysis (GSVA). In certain embodiments, the method comprises obtaining and/or deriving the biological sample from the patient. In certain embodiments, the method comprises analyzing the biological sample to obtain the gene expression measurement data from the biological sample. In certain embodiments, the method comprises analyzing the gene expression measurement data to obtain the data set. In certain embodiments, the method comprises obtaining and/or deriving the biological sample from the patient, and/or analyzing the biological sample to obtain the gene expression measurement data from the biological sample. In certain embodiments, the method comprises obtaining and/or deriving the biological sample from the patient, analyzing the biological sample to obtain the gene expression measurement data from the biological sample, and/or analyzing the gene expression measurement data to obtain the data set.
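Two of the simpler derivations listed in [0026], log2 expression and Z-score, can be sketched in a few lines of Python. The choices here are assumptions for illustration only: a pseudocount of 1 so that zero counts stay finite, the population standard deviation, and the set of samples passed in serving as the reference against which each gene is centered. Gene names are placeholders.

```python
import math
from statistics import mean, pstdev


def log2_expression(counts, pseudocount=1.0):
    """log2(value + pseudocount) for every gene; the pseudocount (an assumption) keeps zeros finite."""
    return {g: [math.log2(v + pseudocount) for v in vals] for g, vals in counts.items()}


def zscore_per_gene(expr):
    """Center and scale each gene across samples (population SD); constant genes map to 0."""
    out = {}
    for gene, vals in expr.items():
        mu, sd = mean(vals), pstdev(vals)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in vals]
    return out


counts = {"GENE_X": [0, 1, 3], "GENE_Y": [5, 5, 5]}  # placeholder genes, three samples
logged = log2_expression(counts)  # GENE_X -> [0.0, 1.0, 2.0]
print(zscore_per_gene(logged))
```

Because each z-score is relative to the samples supplied, the same patient can receive different values in different cohorts; a fixed reference cohort would avoid that.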

[0027] In certain embodiments, the data set is derived from the gene expression measurement data, and the data set comprises one or more enrichment scores of the patient. The one or more enrichment scores of the patient can be generated based on one or more Tables selected from Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22, wherein, for each selected Table, at least one enrichment score of the patient is generated based on enrichment of expression, in the biological sample, of the at least 2 genes (or one or more human orthologs thereof) selected from the genes listed in the selected Table. The one or more enrichment scores can contain the at least one enrichment score generated from each selected Table. The at least 2 genes selected from a respective selected Table can form the input gene set for generating the at least one enrichment score from that Table. The at least 2 genes of the data set can comprise the at least 2 genes selected from each selected Table. In certain embodiments, the data set can be derived from the gene expression measurements of the genes selected from the selected Tables using GSVA, and the data set comprises one or more enrichment scores of the patient.
In certain embodiments, for each selected Table, the at least one enrichment score of the patient is generated based on enrichment of expression, in the biological sample, of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, or 203 genes (or one or more human orthologs thereof) selected from the genes listed in the respective Table, or of all such genes, or of any number or range of genes therebetween, wherein the number of genes selected from different selected Tables can be the same or different. In certain embodiments, for each selected Table, the at least one enrichment score of the patient is generated based on enrichment of expression, in the biological sample, of an effective number of genes (or one or more human orthologs thereof) selected from the genes listed in the selected Table, wherein a different or identical number of genes can be selected from each selected Table. In certain embodiments, for each selected Table, the at least one enrichment score of the patient is generated based on enrichment of expression, in the biological sample, of all the genes (or one or more human orthologs thereof) listed in the selected Table. The genes (or one or more human orthologs thereof) selected from a respective selected Table can form the input gene set for generating the at least one enrichment score of the patient based on that Table. The at least one enrichment score based on a selected Table can be generated based on enrichment, in the biological sample, of the input gene set (e.g., containing the genes selected from the selected Table, such as at least 2 genes, an effective number of genes, or all the genes of the selected Table).
Enrichment can be determined with respect to a reference data set, as described herein. In a non-limiting example, the selected Tables comprise Tables 23-1 and 23-2, an effective number of genes is selected from the genes listed in each selected Table, and the data set comprises the one or more enrichment scores of the patient, such that the one or more enrichment scores of the patient comprise at least one enrichment score generated based on Table 23-1 and at least one enrichment score generated based on Table 23-2, wherein the at least one enrichment score generated based on Table 23-1 is generated based on enrichment, in the biological sample, of the input gene set (e.g., containing the effective number of genes selected from the genes listed in Table 23-1), and the at least one enrichment score generated based on Table 23-2 is generated based on enrichment, in the biological sample, of the input gene set (e.g., containing the effective number of genes selected from the genes listed in Table 23-2). In certain embodiments, one enrichment score is generated from each of the selected Tables. In certain embodiments, the data set comprises the one or more enrichment scores of the patient, and analyzing the data set comprises analyzing the one or more enrichment scores of the patient to classify the LN disease state of the patient. In certain embodiments, the one or more enrichment scores of the patient are analyzed to classify the LN disease state of the patient. The enrichment score can be generated using any suitable method, including but not limited to GSEA and GSVA. In certain embodiments, the enrichment scores are generated based on GSVA, and the enrichment scores are GSVA scores.
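The "one enrichment score per selected Table" structure of [0027] can be illustrated with a deliberately simple scoring method. The Python sketch below swaps in the combined z-score of Lee et al. (2008), which the GSVA Bioconductor package also offers, instead of GSVA itself: each gene is z-scored across the cohort, and the gene set's score in a sample is the sum of its genes' z-scores divided by the square root of the set size. Using the cohort passed in as the reference, the Table-to-gene mapping, and all gene names are assumptions for illustration; the real Table contents are not reproduced here.

```python
import math
from statistics import mean, pstdev


def combined_z(expr, gene_set):
    """Per-sample combined z-score of one gene set: sum of gene z-scores / sqrt(set size)."""
    genes = [g for g in gene_set if g in expr]  # genes absent from the data are skipped
    if len(genes) < 2:
        raise ValueError("at least 2 measured genes are required per Table")
    n_samples = len(expr[genes[0]])
    z = {}
    for g in genes:
        # The cohort of samples supplied acts as the reference (an assumption).
        mu, sd = mean(expr[g]), pstdev(expr[g])
        z[g] = [(v - mu) / sd if sd > 0 else 0.0 for v in expr[g]]
    return [sum(z[g][j] for g in genes) / math.sqrt(len(genes)) for j in range(n_samples)]


def table_scores(expr, tables, sample):
    """One enrichment score per selected Table for the sample at index `sample`."""
    return {name: combined_z(expr, genes)[sample] for name, genes in tables.items()}


# Placeholder expression (3 samples) and placeholder Table contents.
expr = {"G1": [1.0, 2.0, 3.0], "G2": [2.0, 4.0, 6.0], "G3": [3.0, 2.0, 1.0]}
tables = {"Table 23-1": ["G1", "G2"], "Table 23-2": ["G1", "G3"]}
print(table_scores(expr, tables, sample=2))  # Table 23-1 near sqrt(3); Table 23-2 near 0
```

The resulting dictionary is the kind of per-patient feature vector that a downstream classifier could consume.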

[0028] In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 19-1 to 19-36. In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 19-1 to 19-36, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables, or any range therebetween. In certain embodiments, Tables 19-1 to 19-36 (i.e., all 36 Tables) are selected.

[0029] In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 19A-1 to 19A-36. In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 19A-1 to 19A-36, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables, or any range therebetween. In certain embodiments, Tables 19A-1 to 19A-36 (i.e., all 36 Tables) are selected.

[0030] In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 26-1 to 26-60. In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 26-1 to 26-60, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 Tables, or any range therebetween. In certain embodiments, Tables 26-1 to 26-60 (i.e., all 60 Tables) are selected.

[0031] In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 27-1 to 27-48. In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 27-1 to 27-48, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, or 48 Tables, or any range therebetween. In certain embodiments, Tables 27-1 to 27-48 (i.e., all 48 Tables) are selected.

[0032] In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 23-1 to 23-28. In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 23-1 to 23-28, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, or 28 Tables, or any range therebetween. In certain embodiments, Tables 23-1 to 23-28 (i.e., all 28 Tables) are selected.

[0033] In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 25-1 to 25-32. In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 25-1 to 25-32, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, or 32 Tables, or any range therebetween. In certain embodiments, Tables 25-1 to 25-32 (i.e., all 32 Tables) are selected.

[0034] In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 28-1 to 28-22. In certain embodiments, the one or more enrichment scores of the patient are generated based on one or more Tables selected from Tables 28-1 to 28-22, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 Tables, or any range therebetween. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-5, 28-6, 28-7, 28-8, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-19, 28-20, 28-21 and 28-22 are selected. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-6, 28-7, 28-8, 28-9, 28-10, 28-11, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-20, 28-21 and 28-22 are selected. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-5, 28-6, 28-7, 28-8, 28-9, 28-10, 28-11, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-20, 28-21 and 28-22 are selected. In certain embodiments, Tables 28-1 to 28-22 (i.e., all 22 Tables) are selected.
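The three partial selections of Tables 28-1 to 28-22 listed in [0034] are easier to compare by what they leave out. The short Python helper below, whose names are ours, transcribes those lists as inclusive spans and computes the set difference against the full series.

```python
def table_range(*spans, prefix="28-"):
    """Expand inclusive (first, last) spans into Table labels such as '28-1'."""
    return [f"{prefix}{i}" for first, last in spans for i in range(first, last + 1)]


def omitted(selected, universe):
    """Tables in `universe` that a selection leaves out, in the universe's order."""
    chosen = set(selected)
    return [t for t in universe if t not in chosen]


ALL_28 = table_range((1, 22))
# Selections transcribed from paragraph [0034], in the order they are listed there.
SELECTIONS = {
    "first listed subset": table_range((1, 8), (12, 22)),
    "second listed subset": table_range((1, 4), (6, 18), (20, 22)),
    "third listed subset": table_range((1, 18), (20, 22)),
}
for name, selection in SELECTIONS.items():
    print(f"{name}: {len(selection)} Tables, omits {omitted(selection, ALL_28)}")
# first listed subset: 19 Tables, omits ['28-9', '28-10', '28-11']
# second listed subset: 20 Tables, omits ['28-5', '28-19']
# third listed subset: 21 Tables, omits ['28-19']
```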

[0035] In certain embodiments, the data set is derived from the gene expression measurement data using GSVA. In certain embodiments, the data set is derived from the gene expression measurement data using GSVA, and the data set comprises one or more GSVA scores of the patient. The one or more GSVA scores of the patient can be generated based on one or more Tables selected from Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22, wherein, for each selected Table, at least one GSVA score of the patient is generated based on enrichment of expression, in the biological sample, of the at least 2 genes (or one or more human orthologs thereof) selected from the genes listed in the selected Table. The one or more GSVA scores can contain the at least one GSVA score generated from each selected Table. The at least 2 genes (or one or more human orthologs thereof) selected from a respective selected Table can form the input gene set for generating, using GSVA, the at least one GSVA score from that Table. The at least 2 genes of the data set can comprise the at least 2 genes selected from each selected Table.
In certain embodiments, for each selected Table, the at least one GSVA score of the patient is generated based on enrichment of expression, in the biological sample, of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, or 203 genes (or one or more human orthologs thereof) selected from the genes listed in the respective Table, or of all such genes, or of any number or range of genes therebetween, wherein the number of genes selected from different selected Tables can be the same or different. In certain embodiments, for each selected Table, the at least one GSVA score of the patient is generated based on enrichment of expression, in the biological sample, of an effective number of genes (or one or more human orthologs thereof) selected from the genes listed in the selected Table, wherein a different or identical number of genes can be selected from each selected Table. In certain embodiments, for each selected Table, the at least one GSVA score of the patient is generated based on enrichment of expression, in the biological sample, of all the genes (or one or more human orthologs thereof) listed in the selected Table. The genes (or one or more human orthologs thereof) selected from a respective selected Table can form the input gene set for generating, using GSVA, the at least one GSVA score of the patient based on that Table. The at least one GSVA score based on a selected Table can be generated based on enrichment, in the biological sample, of the input gene set (e.g., containing the genes selected from the selected Table, such as at least 2 genes, an effective number of genes, or all the genes of the selected Table).
Enrichment can be determined with respect to a reference data set, as described herein. In a non-limiting example, the selected Tables comprise Tables 23-1 and 23-2, an effective number of genes is selected from the genes listed in each selected Table, and the data set comprises the one or more GSVA scores of the patient, such that the one or more GSVA scores of the patient comprise at least one GSVA score generated based on Table 23-1 and at least one GSVA score generated based on Table 23-2, wherein the at least one GSVA score generated based on Table 23-1 is generated based on enrichment, in the biological sample, of the input gene set (e.g., containing the effective number of genes selected from the genes listed in Table 23-1), and the at least one GSVA score generated based on Table 23-2 is generated based on enrichment, in the biological sample, of the input gene set (e.g., containing the effective number of genes selected from the genes listed in Table 23-2). In certain embodiments, one GSVA score is generated from each of the selected Tables. In certain embodiments, the data set comprises the one or more GSVA scores of the patient, and analyzing the data set comprises analyzing the one or more GSVA scores of the patient to classify the LN disease state of the patient. In certain embodiments, the one or more GSVA scores of the patient are analyzed to classify the LN disease state of the patient.

[0036] In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 19-1 to 19-36. In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 19-1 to 19-36, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables, or any range therebetween. In certain embodiments, Tables 19-1 to 19-36 (i.e., all 36 Tables) are selected.
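To make the GSVA scoring of [0035] concrete, the Python sketch below re-implements the main steps of GSVA as we read the published description (Hänzelmann et al., 2013); it is a simplified illustration, not the Bioconductor GSVA package and not necessarily the applicant's configuration. Step 1 estimates each gene's cumulative distribution across the cohort with a Gaussian kernel (the bandwidth of one quarter of the gene's standard deviation is our assumption). Step 2 ranks genes within each sample and weights each rank by its distance from the middle. Step 3 runs a Kolmogorov-Smirnov-like random walk from the most to the least expressed gene. Step 4 reports the largest positive deviation minus the magnitude of the largest negative deviation. Because step 1 is estimated over the samples supplied, that cohort plays the role of the reference data set. Gene names are placeholders.

```python
import math
from statistics import stdev


def _norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _kernel_cdf(values):
    """Gaussian-kernel estimate of one gene's CDF, evaluated at each sample's value."""
    sd = stdev(values)
    if sd == 0:
        return [0.5] * len(values)  # a constant gene carries no ranking information
    h = sd / 4.0  # assumed bandwidth
    return [sum(_norm_cdf((x - y) / h) for y in values) / len(values) for x in values]


def gsva_like_scores(expr, gene_set, tau=1.0):
    """Simplified GSVA-style score of one gene set for every sample in the cohort."""
    genes = list(expr)
    p = len(genes)
    in_set = [g in gene_set for g in genes]
    n_in = sum(in_set)
    if n_in < 2 or n_in == p:
        raise ValueError("need >= 2 set genes and >= 1 gene outside the set")
    cdf = [_kernel_cdf(expr[g]) for g in genes]  # step 1: per-gene CDF over the cohort
    scores = []
    for j in range(len(cdf[0])):
        column = [cdf[i][j] for i in range(p)]
        ascending = sorted(range(p), key=column.__getitem__)
        weight = [0.0] * p
        for rank, i in enumerate(ascending, start=1):
            weight[i] = abs(p / 2 - rank) ** tau  # step 2: symmetric rank weight
        total_in = sum(weight[i] for i in range(p) if in_set[i])
        cum_in = cum_out = 0.0
        walk = []
        for i in reversed(ascending):  # step 3: walk from most to least expressed
            if in_set[i]:
                cum_in += weight[i]
            else:
                cum_out += 1.0
            walk.append(cum_in / total_in - cum_out / (p - n_in))
        # step 4: largest positive deviation minus magnitude of largest negative deviation
        scores.append(max(0.0, max(walk)) - abs(min(0.0, min(walk))))
    return scores


# Placeholder cohort: 6 genes x 4 samples; set genes A and B peak in sample 0.
expr = {
    "A": [10, 1, 2, 3], "B": [9, 2, 1, 3], "C": [1, 5, 6, 4],
    "D": [2, 6, 4, 5], "E": [3, 4, 5, 6], "F": [1, 3, 6, 5],
}
scores = gsva_like_scores(expr, {"A", "B"})
print(scores)  # sample 0, where both set genes rank highest, gets the maximal score 1.0
```

Scores lie between -1 and 1; a production analysis would more likely call the GSVA package directly than rely on this approximation.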

[0037] In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 19A-1 to 19A-36. In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 19A-1 to 19A-36, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables, or any range therebetween. In certain embodiments, Tables 19A-1 to 19A-36 (i.e., all 36 Tables) are selected.

[0038] In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 26-1 to 26-60. In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 26-1 to 26-60, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 Tables, or any range therebetween. In certain embodiments, Tables 26-1 to 26-60 (i.e., all 60 Tables) are selected.

[0039] In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 27-1 to 27-48. In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 27-1 to 27-48, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, or 48 Tables, or any range therebetween. In certain embodiments, Tables 27-1 to 27-48 (i.e., all 48 Tables) are selected.

[0040] In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 23-1 to 23-28. In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 23-1 to 23-28, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, or 28 Tables, or any range therebetween. In certain embodiments, Tables 23-1 to 23-28 (i.e., all 28 Tables) are selected.

[0041] In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 25-1 to 25-32. In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 25-1 to 25-32, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, or 32 Tables, or any range therebetween. In certain embodiments, Tables 25-1 to 25-32 (i.e., all 32 Tables) are selected.

[0042] In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 28-1 to 28-22. In certain embodiments, the one or more GSVA scores of the patient are generated based on one or more Tables selected from Tables 28-1 to 28-22, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 Tables, or any range therebetween. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-5, 28-6, 28-7, 28-8, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-19, 28-20, 28-21 and 28-22 are selected. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-6, 28-7, 28-8, 28-9, 28-10, 28-11, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-20, 28-21 and 28-22 are selected. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-5, 28-6, 28-7, 28-8, 28-9, 28-10, 28-11, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-20, 28-21 and 28-22 are selected. In certain embodiments, Tables 28-1 to 28-22 (i.e., all 22 Tables) are selected.

[0043] In certain embodiments, analyzing the data set comprises analyzing gene expression of one or more gene sets formed based on one or more Tables selected from Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22, wherein the genes (or one or more human orthologs thereof) selected from each selected Table can form a gene set of the one or more gene sets. Genes (or one or more human orthologs thereof) selected from different selected Tables can form different gene sets of the one or more gene sets. The data set can comprise the gene expression measurement data of the one or more gene sets. The at least 2 genes (or one or more human orthologs thereof) of the data set can comprise the genes within the one or more gene sets. The selected Tables (e.g., based on which the one or more gene sets are formed) can comprise the selected Tables as described above or elsewhere herein. For a selected Table, the genes selected from the selected Table can comprise the selected genes as described above or elsewhere herein, such as at least 2 genes, an effective number of genes, and/or all genes from the selected Table. In certain embodiments, for each selected Table, the genes selected (e.g., that form the gene set based on the selected Table) comprise at least 2 genes (or one or more human orthologs thereof) selected from the genes listed in the selected Table, wherein the number of genes selected from different selected Tables can be the same or different.
In certain embodiments, for each selected Table, the genes selected (e.g., that form the gene set based on the selected Table) comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, or 150 genes, or all genes (or one or more human orthologs thereof), selected from the genes listed in the selected Table, wherein the number of genes selected from different selected Tables can be the same or different. In certain embodiments, for each selected Table, the genes selected (e.g., that form the gene set based on the selected Table) comprise an effective number of genes (or one or more human orthologs thereof) selected from the genes listed in the selected Table, wherein the number of genes selected from different selected Tables can be the same or different. In certain embodiments, for each selected Table, the genes selected (e.g., that form the gene set based on the selected Table) comprise all the genes (or one or more human orthologs thereof) listed in the selected Table. Each of the one or more gene sets can be generated based on one of the selected Tables, wherein, for each selected Table, the genes selected from that Table (e.g., at least 2 genes, an effective number of genes, and/or all genes, or one or more human orthologs thereof) form a gene set of the one or more gene sets.
In a non-limiting example, the selected Tables comprise Tables 23-1, 23-2 and 23-3, an effective number of genes is selected from each selected Table, and the data set comprises gene expression measurement data of one or more gene sets formed based on the selected Tables, such that the one or more gene sets comprise a gene set formed based on Table 23-1, a gene set formed based on Table 23-2, and a gene set formed based on Table 23-3, wherein the gene set formed based on Table 23-1 comprises an effective number of genes selected from the genes listed in Table 23-1, the gene set formed based on Table 23-2 comprises an effective number of genes selected from the genes listed in Table 23-2, and the gene set formed based on Table 23-3 comprises an effective number of genes selected from the genes listed in Table 23-3. In certain embodiments, analyzing gene expression (e.g., in the biological sample) of a gene set (e.g., of the one or more gene sets) can include analyzing module eigengenes (MEs) of the gene set (e.g., forming a module). In certain embodiments, the data set comprises the gene expression measurement data of the one or more gene sets, and analyzing the data set comprises analyzing gene expression of the one or more gene sets to classify the LN disease state of the patient. In certain embodiments, the gene expression (e.g., in the biological sample) of the one or more gene sets can be analyzed to classify the LN disease state of the patient. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 19-1 to 19-36. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 19-1 to 19-36, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables, or any range therebetween.
In certain embodiments, Tables 19-1 to 19-36 (i.e., all 36 Tables) are selected. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 19A-1 to 19A-36. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 19A-1 to 19A-36, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables, or any range therebetween. In certain embodiments, Tables 19A-1 to 19A-36 (i.e., all 36 Tables) are selected. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 26-1 to 26-60. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 26-1 to 26-60, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 Tables, or any range therebetween. In certain embodiments, Tables 26-1 to 26-60 (i.e., all 60 Tables) are selected. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 27-1 to 27-48. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 27-1 to 27-48, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, or 48 Tables, or any range therebetween. In certain embodiments, Tables 27-1 to 27-48 (i.e., all 48 Tables) are selected. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 23-1 to 23-28. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 23-1 to 23-28, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, or 28 Tables, or any range therebetween. In certain embodiments, Tables 23-1 to 23-28 (i.e., all 28 Tables) are selected. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 25-1 to 25-32. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 25-1 to 25-32, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, or 32 Tables, or any range therebetween. In certain embodiments, Tables 25-1 to 25-32 (i.e., all 32 Tables) are selected. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 28-1 to 28-22. In certain embodiments, the one or more gene sets are generated based on one or more Tables selected from Tables 28-1 to 28-22, and the one or more Tables comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 Tables, or any range therebetween. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-5, 28-6, 28-7, 28-8, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-19, 28-20, 28-21 and 28-22 are selected.
In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-6, 28-7, 28-8, 28-9, 28-10, 28-11, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-20, 28-21 and 28-22 are selected. In certain embodiments, Tables 28-1, 28-2, 28-3, 28-4, 28-5, 28-6, 28-7, 28-8, 28-9, 28-10, 28-11, 28-12, 28-13, 28-14, 28-15, 28-16, 28-17, 28-18, 28-20, 28-21 and 28-22 are selected. In certain embodiments, Tables 28-1 to 28-22 (i.e., all 22 Tables) are selected.
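As an illustration of how gene sets might be formed from the selected Tables, the sketch below assumes each Table is available as a plain list of gene symbols. For each selected Table it keeps only the genes actually measured in the sample(s) and discards any Table that leaves fewer than the 2 genes required above. The Table contents and gene names are hypothetical placeholders, not genes from Tables 19 to 28, and the helper names are illustrative only.

```python
def table_range(prefix, start, end):
    """Return Table identifiers such as '28-1' ... '28-22' (inclusive)."""
    return [f"{prefix}-{i}" for i in range(start, end + 1)]


def build_gene_sets(tables, selected, measured_genes, min_genes=2):
    """Form one gene set per selected Table.

    tables: dict mapping a Table identifier to its listed gene symbols.
    selected: Table identifiers chosen for this embodiment.
    measured_genes: genes with expression data in the sample(s).
    Tables leaving fewer than `min_genes` measured genes are skipped.
    """
    gene_sets = {}
    for table_id in selected:
        genes = [g for g in tables.get(table_id, []) if g in measured_genes]
        if len(genes) >= min_genes:
            gene_sets[table_id] = genes
    return gene_sets


# Hypothetical placeholder contents; the real Tables are defined elsewhere in the document.
tables = {"28-1": ["GENE_A", "GENE_B", "GENE_C"], "28-2": ["GENE_D"], "28-3": ["GENE_E", "GENE_F"]}
measured = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}
print(build_gene_sets(tables, table_range("28", 1, 3), measured))
# {'28-1': ['GENE_A', 'GENE_B']}
```

Selecting, for example, all of Tables 28-1 to 28-22 would correspond to passing `table_range("28", 1, 22)` as `selected`.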

[0044] In certain embodiments, analyzing the data set comprises providing the data set as an input to a machine-learning model to classify the LN disease state of the patient. The machine-learning model can generate an inference indicative of the LN disease state of the patient based at least on the data set, and the method can classify the LN disease state of the patient based on the inference. In certain embodiments, the data set comprises the one or more enrichment scores of the patient, and the machine-learning model generates the inference based at least on the one or more enrichment scores. In certain embodiments, the data set comprises the one or more GSVA scores of the patient, and the machine-learning model generates the inference based at least on the one or more GSVA scores. In certain embodiments, the data set comprises gene expression measurement data (such as MEs) of the one or more gene sets, and the machine-learning model generates the inference based at least on the gene expression (such as MEs) of the one or more gene sets. In certain embodiments, the method further comprises receiving the inference as an output of the machine-learning model, and/or electronically outputting a report indicating the LN disease state of the patient based on the inference. The machine-learning model can be a trained machine-learning model.
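A minimal sketch of the inference-and-report flow described above, assuming the data set has already been reduced to one enrichment (e.g., GSVA) score per gene set. A nearest-centroid classifier stands in for the trained machine-learning model purely for illustration; the class labels, score values, patient identifier, and function names are all hypothetical.

```python
from collections import defaultdict
from math import dist


class NearestCentroidModel:
    """Hypothetical stand-in for a trained model: one mean score vector per LN state."""

    def __init__(self, centroids):
        self.centroids = centroids

    @classmethod
    def fit(cls, X, y):
        # "Training" here is just averaging the reference score vectors of each state.
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return cls(centroids)

    def infer(self, scores):
        """Inference: Euclidean distance from the patient's scores to each state's centroid."""
        return {label: dist(scores, c) for label, c in self.centroids.items()}


def classify_and_report(model, patient_id, scores):
    inference = model.infer(scores)
    state = min(inference, key=inference.get)  # the closest centroid gives the predicted state
    lines = [f"Patient: {patient_id}", f"Predicted LN disease state: {state}"]
    for label, d in sorted(inference.items(), key=lambda kv: kv[1]):
        lines.append(f"  distance to {label}: {d:.3f}")
    return state, "\n".join(lines)


# Toy reference scores (two gene sets per sample); values are illustrative only.
X_ref = [[1.0, 0.0], [0.8, 0.2], [-1.0, 0.5], [-0.8, 0.7], [0.0, -1.0], [0.1, -0.9]]
y_ref = ["acute", "acute", "chronic", "chronic", "no LN", "no LN"]
model = NearestCentroidModel.fit(X_ref, y_ref)
state, report = classify_and_report(model, "P-001", [0.7, 0.1])
print(report)
```

Any of the classifiers listed in paragraph [0048] could take the place of the nearest-centroid stand-in; the surrounding flow (scores in, inference out, report generated) stays the same.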

[0045] The trained machine learning model can generate the inference based at least on comparing the data set to a reference data set. The reference data set can comprise and/or be derived from gene expression measurements, from reference biological samples, of at least 2 genes (or human orthologs thereof) selected from the genes listed in Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22. In certain embodiments, the at least 2 genes whose expression measurements make up and/or are used to derive the reference data set are selected from the genes listed in Tables 23-1 to 23-28. In certain embodiments, the at least 2 genes whose expression measurements make up and/or are used to derive the reference data set are selected from the genes listed in Tables 25-1 to 25-32. In certain embodiments, the at least 2 genes, the expression measurements of one or more human orthologs of which make up and/or are used to derive the reference data set, are selected from the genes listed in Tables 19-1 to 19-36. In certain embodiments, the at least 2 genes whose expression measurements make up and/or are used to derive the reference data set are selected from the genes listed in Tables 19A-1 to 19A-36. In certain embodiments, the at least 2 genes whose expression measurements make up and/or are used to derive the reference data set are selected from the genes listed in Tables 26-1 to 26-60. In certain embodiments, the at least 2 genes whose expression measurements make up and/or are used to derive the reference data set are selected from the genes listed in Tables 27-1 to 27-48. In certain embodiments, the at least 2 genes whose expression measurements make up and/or are used to derive the reference data set are selected from the genes listed in Tables 28-1 to 28-22.
The at least 2 genes whose expression measurements make up and/or are used to derive the reference data set, and the at least 2 genes whose expression measurements make up and/or are used to derive the data set, can at least partially overlap (e.g., one or more genes can be the same). In certain embodiments, the selected genes whose gene expression measurements (or those of one or more human orthologs thereof) are comprised by the data set and the selected genes whose gene expression measurements are comprised by the reference data set are the same. In certain embodiments, the selected genes of the data set and the selected genes of the reference data set are the same. In certain embodiments, the selected genes of the data set and the selected genes of the reference data set are the same, and can be any selected set of genes, e.g., of the data set, as described above or elsewhere herein. The Tables selected, and the genes selected from a selected Table, for the data set and the reference data set can be the same, and can be as described herein (e.g., for the data set). In certain embodiments, the reference data set contains gene expression (such as MEs), from the reference biological samples, of the one or more gene sets formed based on the selected Tables, wherein the one or more gene sets of the reference data set can be the same as the one or more gene sets of the data set described above (e.g., formed based on the same selected Tables and containing the same genes selected from those Tables). In certain embodiments, the machine learning model is trained based on gene expression (such as MEs) of the one or more gene sets from the reference biological samples, and analyzing the data set includes providing the gene expression (such as MEs) of the one or more gene sets from the biological sample to the trained machine learning model. The reference biological samples can be obtained or derived from a plurality of reference subjects.
In certain embodiments, the reference biological samples comprise a first plurality of reference biological samples obtained or derived from reference subjects having LN, and a second plurality of reference biological samples obtained or derived from reference subjects not having LN. In certain embodiments, the reference biological samples comprise a first plurality of reference biological samples obtained or derived from reference subjects having acute LN, a second plurality of reference biological samples obtained or derived from reference subjects having transitional LN, a third plurality of reference biological samples obtained or derived from reference subjects having chronic LN, and/or a fourth plurality of reference biological samples obtained or derived from reference subjects not having LN. In certain embodiments, the reference biological samples comprise a first plurality of reference biological samples obtained or derived from reference subjects having acute LN, a second plurality of reference biological samples obtained or derived from reference subjects having transitional LN, a third plurality of reference biological samples obtained or derived from reference subjects having chronic LN, and a fourth plurality of reference biological samples obtained or derived from reference subjects not having LN. In certain embodiments, the reference biological samples comprise a first plurality of reference biological samples obtained or derived from reference subjects having acute LN, a second plurality of reference biological samples obtained or derived from reference subjects having transitional LN, and a third plurality of reference biological samples obtained or derived from reference subjects having chronic LN. 
In certain embodiments, the reference biological samples comprise a first plurality of reference biological samples obtained or derived from reference subjects having acute LN, a second plurality of reference biological samples obtained or derived from reference subjects having transitional LN, a third plurality of reference biological samples obtained or derived from reference subjects having chronic group I LN, a fourth plurality of reference biological samples obtained or derived from reference subjects having chronic group II LN, and/or a fifth plurality of reference biological samples obtained or derived from reference subjects not having LN. In certain embodiments, the reference biological samples comprise a first plurality of reference biological samples obtained or derived from reference subjects having acute LN, a second plurality of reference biological samples obtained or derived from reference subjects having transitional LN, a third plurality of reference biological samples obtained or derived from reference subjects having chronic group I LN, and a fourth plurality of reference biological samples obtained or derived from reference subjects having chronic group II LN. In certain embodiments, the reference biological samples comprise a first plurality of reference biological samples obtained or derived from reference subjects having acute LN, a second plurality of reference biological samples obtained or derived from reference subjects having transitional LN, a third plurality of reference biological samples obtained or derived from reference subjects having chronic group I LN, a fourth plurality of reference biological samples obtained or derived from reference subjects having chronic group II LN, and a fifth plurality of reference biological samples obtained or derived from reference subjects not having LN. The trained machine learning model can be trained (e.g., obtained by training) using the reference data set. 
A first portion of the reference data set can be used as a training data set, and a second portion of the reference data set can be used as a validation data set. One-vs.-one and one-vs.-rest multi-class classification with leave-one-out cross-validation can be employed to assign a reference subject's LN disease state to one of the five groups, namely acute, transitional, chronic group I, or chronic group II LN disease state, or not having LN. In certain embodiments, 0- to 25-fold cross-validation, such as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25-fold cross-validation, is used. In certain embodiments, 6-fold cross-validation is used. In certain embodiments, 10-fold cross-validation is used. In certain embodiments, an oversampling or undersampling correction is made during training of the machine learning model. The Synthetic Minority Oversampling Technique (SMOTE) can be applied to the training data to handle class imbalances. In certain embodiments, low-intensity genes (e.g., with IQR < 0) are filtered out of the reference data set when training the machine learning model, and out of the data set when analyzing it with the trained machine learning model. The trained machine learning model can be trained to generate an inference indicative of the LN disease state of a reference subject, based at least on an individual data set comprising and/or derived from gene expression measurement data of the at least 2 genes (e.g., of the reference data set) from a reference biological sample from the reference subject. In certain embodiments, the machine learning model can be trained using a method and/or reference data set as described in the Examples.
In certain embodiments, the reference data set can be derived from the gene expression measurement data of the reference biological samples, wherein the gene expression measurement data is analyzed using a suitable data analysis tool, including, but not limited to, a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring™ analysis tool, gene set variation analysis (GSVA), gene set enrichment analysis (GSEA), an enrichment algorithm, Z score, multiscale embedded gene coexpression network analysis (MEGENA), weighted gene co-expression network analysis (WGCNA), differential expression analysis, log2 expression analysis, or any combination thereof, to obtain the reference data set. In certain embodiments, the gene expression measurement data of the reference biological samples can be analyzed using GSVA to obtain the reference data set.
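The training safeguards described in this paragraph (IQR-based removal of low-intensity genes, SMOTE for class imbalance, and leave-one-out cross-validation with one-vs.-rest classification) can be sketched in plain Python under several assumptions. Expression data are plain lists, and the IQR threshold is a caller-supplied parameter because the paragraph does not give a usable value. The SMOTE function is a simplified re-implementation rather than the imbalanced-learn library. A nearest-centroid score stands in for each binary "class vs. rest" classifier, and one-vs.-one voting is not shown. SMOTE is applied inside each leave-one-out fold, to the training portion only, so synthetic samples never draw on the held-out sample. All names and data are hypothetical.

```python
import random
from math import dist
from statistics import mean, quantiles


def drop_low_iqr_genes(expr, threshold):
    """Keep genes whose interquartile range across reference samples exceeds `threshold`."""
    kept = {}
    for gene, values in expr.items():
        q1, _, q3 = quantiles(values, n=4)
        if q3 - q1 > threshold:
            kept[gene] = values
    return kept


def smote(minority, n_new, rng, k=3):
    """Simplified SMOTE: interpolate between a random row and one of its k nearest neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: dist(minority[i], row))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], nb)])
    return synthetic


def balance(X, y, rng):
    """Oversample every class that has >= 2 rows up to the size of the largest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if 1 < len(rows) < target:
            new_rows = smote(rows, target - len(rows), rng)
            X_out += new_rows
            y_out += [label] * len(new_rows)
    return X_out, y_out


def centroid(rows):
    return [mean(col) for col in zip(*rows)]


def one_vs_rest_predict(X_train, y_train, x):
    """Score each class against all other classes pooled; return the highest-scoring class."""
    scores = {}
    for label in sorted(set(y_train)):
        pos = [r for r, lab in zip(X_train, y_train) if lab == label]
        neg = [r for r, lab in zip(X_train, y_train) if lab != label]
        # Positive when x lies closer to this class than to the pooled remaining classes.
        scores[label] = dist(x, centroid(neg)) - dist(x, centroid(pos))
    return max(scores, key=scores.get)


def loocv_accuracy(X, y, seed=0):
    """Leave-one-out cross-validation; SMOTE is applied to each training fold only."""
    rng = random.Random(seed)
    correct = 0
    for i in range(len(X)):
        X_train, y_train = balance(X[:i] + X[i + 1:], y[:i] + y[i + 1:], rng)
        if one_vs_rest_predict(X_train, y_train, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy, imbalanced score vectors for three hypothetical groups.
X = [[2.0, 0.0], [2.2, 0.1], [1.9, -0.1], [2.1, 0.2],
     [-2.0, 0.0], [-2.1, 0.1], [-1.9, -0.2],
     [0.0, 3.0], [0.1, 3.1]]
y = ["acute"] * 4 + ["chronic"] * 3 + ["no LN"] * 2
print(loocv_accuracy(X, y))
```

Applying the oversampling inside the cross-validation loop, rather than once to the whole reference data set, keeps the validation estimate honest.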

[0046] In certain embodiments, the reference data set comprises one or more enrichment scores of the reference biological samples, wherein, for a respective reference biological sample, one or more enrichment scores are generated based on one or more Tables selected from Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22, and wherein, for each selected Table, at least one enrichment score of the respective reference biological sample is generated based on enrichment of expression, in that reference biological sample, of at least 2 genes (or one or more human orthologs thereof) selected from the genes listed in the selected Table. In certain embodiments, the one or more enrichment scores of a reference biological sample can be generated using the same method as that used for the patient (test) sample (e.g., using the same selected Tables and the same genes selected from those Tables). The at least 2 genes, an effective number of genes, or all genes (or one or more human orthologs thereof) selected from the genes listed in a respective selected Table can form the input gene set for generating the at least one enrichment score based on that Table. Enrichment of the input gene set formed based on a selected Table can be measured in a reference biological sample to generate the at least one enrichment score of that reference biological sample based on the selected Table. In certain embodiments, the one or more Tables are selected from Tables 19-1 to 19-36. In certain embodiments, the one or more Tables are selected from Tables 19A-1 to 19A-36. In certain embodiments, the one or more Tables are selected from Tables 26-1 to 26-60. In certain embodiments, the one or more Tables are selected from Tables 27-1 to 27-48.
In certain embodiments, the one or more Tables are selected from Tables 23-1 to 23-28. In certain embodiments, the one or more Tables are selected from Tables 25-1 to 25-32. In certain embodiments, the one or more Tables are selected from Tables 28-1 to 28-22. The one or more Tables selected, and the genes selected from them, for generating the one or more enrichment scores of the reference biological samples can be the same as the one or more Tables selected, and the genes selected from them, for generating the one or more enrichment scores of the patient, and can be any of the selected Tables and selected genes described herein. The one or more enrichment scores can comprise the at least one enrichment score from each selected Table. The at least 2 genes of the reference data set can include the at least 2 genes from each selected Table. In certain embodiments, the selected Tables of the data set (e.g., those on which the one or more enrichment scores of the patient are based) and the selected Tables of the reference data set (e.g., those on which the one or more enrichment scores of the reference biological samples are based) can at least partially overlap (e.g., one or more selected Tables can be the same). In certain embodiments, the selected Tables of the data set and the selected Tables of the reference data set are the same. In certain embodiments, the selected Tables, and the genes selected from them, are the same for the data set and the reference data set.
Enrichment of expression of the selected genes (or one or more human orthologs thereof) in a respective reference biological sample, e.g., for calculating the one or more enrichment scores of that reference biological sample, can be measured by comparing the gene expression of that reference biological sample with that of the cohort (e.g., the reference biological samples). In certain embodiments, the one or more enrichment scores of the patient are generated based on comparing the data set with a reference data set, which can be any reference data set described herein. In certain embodiments, the one or more enrichment scores of the patient are generated based on comparing the data set with the reference data set, and the enrichment of expression of the selected genes in the biological sample from the patient (e.g., for calculating the one or more enrichment scores of the patient) can be calculated by comparing the gene expression measurement data of the biological sample with the gene expression measurement data of the reference biological samples. In certain embodiments, the machine learning model is trained based on the one or more enrichment scores of the reference biological samples, and analyzing the data set includes providing the one or more enrichment scores of the patient to the trained machine learning model. The reference data set used for generating the one or more enrichment scores of the patient can be the same as or different from the reference data set used for training the machine learning model. In certain embodiments, the reference data set used for generating the one or more enrichment scores of the patient is the same as the reference data set used for training the machine learning model. The enrichment score can be generated using any suitable method, including, but not limited to, GSEA and GSVA.
In certain embodiments, the enrichment scores are generated based on GSVA, and the enrichment scores are GSVA scores.
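One simple way to measure enrichment by comparing a sample's expression with that of the cohort is a mean z-score. Each selected gene is standardized against the reference samples, and the z-scores are averaged over the gene set. This is an illustrative assumption, a simpler stand-in for GSEA or GSVA rather than either method. The gene names, values, and function name are hypothetical.

```python
from statistics import mean, stdev


def zscore_enrichment(cohort, gene_set, sample, min_genes=2):
    """Mean z-score of `sample` over `gene_set`, relative to a reference cohort.

    cohort: dict gene -> list of expression values across reference samples.
    sample: dict gene -> expression value for the sample being scored.
    Genes missing from either input, or constant in the cohort, are skipped.
    """
    z_values = []
    for gene in gene_set:
        if gene in cohort and gene in sample:
            ref = cohort[gene]
            sd = stdev(ref)
            if sd > 0:
                z_values.append((sample[gene] - mean(ref)) / sd)
    if len(z_values) < min_genes:
        raise ValueError("fewer than min_genes informative genes in the gene set")
    return mean(z_values)


# Hypothetical gene names and values, for illustration only.
cohort = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [10.0, 20.0, 30.0], "GENE_C": [5.0, 5.0, 5.0]}
patient = {"GENE_A": 3.0, "GENE_B": 30.0, "GENE_C": 9.0}
print(zscore_enrichment(cohort, {"GENE_A", "GENE_B", "GENE_C"}, patient))  # 1.0
```

The same function scores a reference sample by passing that sample's values as `sample`, so patient and reference scores are computed in the same way against the same cohort.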

[0047] In certain embodiments, the reference data set is obtained using GSVA, wherein the reference data set comprises one or more GSVA scores of the reference biological samples, wherein, for a respective reference biological sample, one or more GSVA scores are generated based on one or more Tables selected from Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22, and wherein, for each selected Table, at least one GSVA score of the respective reference biological sample is generated based on enrichment of expression, in that reference biological sample, of at least 2 genes selected from the genes listed in the selected Table. In certain embodiments, the one or more GSVA scores of a reference biological sample can be generated using the same method as that used for the patient (e.g., using the same selected Tables and the same genes selected from those Tables). The at least 2 genes, an effective number of genes, or all genes (or one or more human orthologs thereof) selected from the genes listed in a respective selected Table can form the input gene set for generating, using GSVA, the at least one GSVA score based on that Table. Enrichment of the input gene set formed based on a selected Table can be measured in a reference biological sample to generate the at least one GSVA score of that reference biological sample based on the selected Table. In certain embodiments, the one or more Tables are selected from Tables 19-1 to 19-36. In certain embodiments, the one or more Tables are selected from Tables 19A-1 to 19A-36. In certain embodiments, the one or more Tables are selected from Tables 26-1 to 26-60. In certain embodiments, the one or more Tables are selected from Tables 27-1 to 27-48.
In certain embodiments, the one or more Tables are selected from Tables 23-1 to 23-28. In certain embodiments, the one or more Tables are selected from Tables 25-1 to 25-32. In certain embodiments, the one or more Tables are selected from Tables 28-1 to 28-22. The one or more Tables selected, and the genes selected from them, for generating the one or more GSVA scores of the reference biological samples can be the same as the one or more Tables selected, and the genes selected from them, for generating the one or more GSVA scores of the patient, and can be any of the selected Tables and selected genes described herein. The one or more GSVA scores can comprise the at least one GSVA score from each selected Table. The at least 2 genes of the reference data set can include the at least 2 genes from each selected Table. In certain embodiments, the selected Tables of the data set (e.g., those on which the one or more GSVA scores of the patient are based) and the selected Tables of the reference data set (e.g., those on which the one or more GSVA scores of the reference biological samples are based) can at least partially overlap (e.g., one or more selected Tables can be the same). In certain embodiments, the selected Tables of the data set and the selected Tables of the reference data set are the same. In certain embodiments, the selected Tables, and the genes selected from them, are the same for the data set and the reference data set. Enrichment of expression of the selected genes in a respective reference biological sample, e.g., for calculating the one or more GSVA scores of that reference biological sample, can be measured by comparing the gene expression of that reference biological sample with that of the cohort (e.g., the reference biological samples).
In certain embodiments, the one or more GSVA scores of the patient are generated based on comparing the data set with a reference data set, which can be any reference data set described herein. In certain embodiments, the one or more GSVA scores of the patient are generated based on comparing the data set with the reference data set, and the enrichment of expression of the selected genes in the biological sample from the patient (e.g., for calculating the one or more GSVA scores of the patient) can be calculated by comparing the gene expression measurement data of the biological sample with the gene expression measurement data of the reference biological samples. In certain embodiments, the machine learning model is trained based on the one or more GSVA scores of the reference biological samples, and analyzing the data set includes providing the one or more GSVA scores of the patient to the trained machine learning model. The reference data set used for generating the one or more GSVA scores of the patient can be the same as or different from the reference data set used for training the machine learning model. In certain embodiments, the reference data set used for generating the one or more GSVA scores of the patient is the same as the reference data set used for training the machine learning model. In certain embodiments, the reference data set can be a data set described in the Examples. The reference subjects can be human. The patient can be a human patient.
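The GSVA steps can be sketched in plain Python, following the published description of the method. That description has four steps: a Gaussian-kernel estimate of each gene's cumulative distribution across samples, ranking of genes within each sample, a weighted Kolmogorov–Smirnov-like random walk over the ranked genes, and the difference between the largest positive and negative deviations. This is a simplified, unoptimized sketch, not the Bioconductor GSVA package. The kernel bandwidth of one quarter of each gene's standard deviation and the default τ = 1 are assumptions taken from that description, and the function name is hypothetical.

```python
from math import erf, log, sqrt
from statistics import stdev


def _gaussian_cdf(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def gsva_sketch(expr, gene_sets, tau=1.0):
    """Simplified GSVA.

    expr: dict gene -> list of expression values, same sample order for every gene.
    gene_sets: dict set name -> set of genes (each needs >= 2 measured genes).
    Returns dict set name -> list with one score per sample.
    """
    genes = list(expr)
    p = len(genes)
    n = len(expr[genes[0]])
    # Step 1: kernel CDF of each gene across samples, mapped to a log-odds statistic.
    z = {}
    for g in genes:
        x = expr[g]
        h = stdev(x) / 4.0
        row = []
        for xj in x:
            f = sum(_gaussian_cdf((xj - xk) / h) for xk in x) / n if h > 0 else 0.5
            f = min(max(f, 1e-12), 1.0 - 1e-12)
            row.append(log(f / (1.0 - f)))
        z[g] = row
    scores = {name: [] for name in gene_sets}
    for j in range(n):
        # Step 2: rank genes within sample j (highest first); symmetric rank score |p/2 - rank|.
        order = sorted(genes, key=lambda g: z[g][j], reverse=True)
        r = {g: abs(p / 2.0 - rank) for rank, g in enumerate(order, start=1)}
        for name, members in gene_sets.items():
            inside = [g for g in order if g in members]
            if len(inside) < 2 or len(inside) == p:
                raise ValueError(f"gene set {name!r} needs >= 2 measured genes and not all genes")
            weight_in = sum(r[g] ** tau for g in inside)
            step_out = 1.0 / (p - len(inside))
            # Step 3: random walk over the ranked genes, tracking the extreme deviations.
            walk = hi = lo = 0.0
            for g in order:
                walk += (r[g] ** tau) / weight_in if g in members else -step_out
                hi, lo = max(hi, walk), min(lo, walk)
            # Step 4: largest positive deviation minus the size of the largest negative one.
            scores[name].append(hi + lo)
    return scores
```

To score a patient relative to the reference cohort, the patient's measurement for each gene can be appended to that gene's list of reference values and the last score of each gene set read off. Because each gene's distribution is estimated over the pooled samples, the patient's score then reflects a comparison with the reference data set.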

[0048] The trained machine-learning model can be trained (e.g., obtained by training) using linear regression, logistic regression, Ridge regression, Lasso regression, elastic net (EN) regression, support vector machine (SVM), gradient boosted machine (GBM), k nearest neighbors (kNN), generalized linear model (GLM), naive Bayes (NB) classifier, neural network, Random Forest (RF), deep learning algorithm, linear discriminant analysis (LDA), decision tree learning (DTREE), adaptive boosting (ADB), Classification and Regression Tree (CART), hierarchical clustering, or any combination thereof. The algorithm of the trained machine learning model can be a machine learning classifier, e.g., one mentioned in this paragraph. The machine learning classifier (e.g., linear regression, logistic regression, Ridge regression, Lasso regression, EN regression, SVM, GBM, kNN, GLM, NB classifier, neural network, RF, deep learning algorithm, LDA, DTREE, ADB, CART, and/or hierarchical clustering) can be trained to obtain the trained machine learning model. In some embodiments, the trained machine learning model is trained using a supervised machine learning algorithm or an unsupervised machine learning algorithm, e.g., the classifier can be a supervised machine learning algorithm or an unsupervised machine learning algorithm. In certain embodiments, the trained machine-learning model is trained using linear regression. In certain embodiments, the trained machine-learning model is trained using logistic regression. In certain embodiments, the trained machine-learning model is trained using Lasso regression. In certain embodiments, the trained machine-learning model is trained using EN regression. In certain embodiments, the trained machine-learning model is trained using SVM. In certain embodiments, the trained machine-learning model is trained using GBM. In certain embodiments, the trained machine-learning model is trained using kNN.
In certain embodiments, the trained machine-learning model is trained using GLM. In certain embodiments, the trained machine-learning model is trained using an NB classifier. In certain embodiments, the trained machine-learning model is trained using a neural network. In certain embodiments, the trained machine-learning model is trained using RF. In certain embodiments, the trained machine-learning model is trained using a deep learning algorithm. In certain embodiments, the trained machine-learning model is trained using LDA. In certain embodiments, the trained machine-learning model is trained using DTREE. In certain embodiments, the trained machine-learning model is trained using ADB. In certain embodiments, the trained machine-learning model is trained using CART. In certain embodiments, the trained machine-learning model is trained using hierarchical clustering.
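To illustrate one of the listed algorithms, the sketch below trains a binary logistic-regression classifier (for example, LN versus not having LN) by batch gradient descent. It includes an optional L2 penalty on the weights in the spirit of Ridge regression. The learning rate, epoch count, and toy one-feature data are arbitrary assumptions, and the function names are hypothetical. In practice a tested library implementation would normally be used.

```python
from math import exp


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + exp(-t))
    e = exp(t)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000, l2=0.0):
    """Batch gradient descent on mean log-loss plus (l2 / 2) * ||w||^2; y holds 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The penalty applies to the weights only, not to the bias term.
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Probability of the positive class (e.g., having LN) for one feature vector."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy one-feature example (e.g., a single gene-set score); labels: 1 = LN, 0 = no LN.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]
model = train_logistic(X, y, l2=0.01)
print(round(predict_proba(model, [2.5]), 3), round(predict_proba(model, [-2.5]), 3))
```

For the multi-class LN states, a binary model of this kind can be trained per state in a one-vs.-rest arrangement, as described in paragraph [0045].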

[0049] The LN disease state of the patient can be classified with an accuracy of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The LN disease state of the patient can be classified with a sensitivity of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The LN disease state of the patient can be classified with a specificity of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The LN disease state of the patient can be classified with a positive predictive value of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The LN disease state of the patient can be classified with a negative predictive value of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. 
The LN disease state of the patient can be classified with a receiver operating characteristic (ROC) curve having an area under the curve (AUC) of at least about 0.70, at least about 0.75, at least about 0.80, at least about 0.85, at least about 0.90, at least about 0.91, at least about 0.92, at least about 0.93, at least about 0.94, at least about 0.95, at least about 0.96, at least about 0.97, at least about 0.98, at least about 0.99, or more than about 0.99. The trained machine learning model can have an ROC curve with an AUC of at least about 0.70, at least about 0.75, at least about 0.80, at least about 0.85, at least about 0.90, at least about 0.91, at least about 0.92, at least about 0.93, at least about 0.94, at least about 0.95, at least about 0.96, at least about 0.97, at least about 0.98, at least about 0.99, or more than about 0.99 for classifying LN disease states.
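The performance measures above follow standard definitions and can be computed from a confusion matrix for a binary call, such as one LN state versus the rest. AUC is computed here as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, which equals the area under the ROC curve. The function names and toy labels are illustrative only.

```python
def _ratio(num, den):
    return num / den if den else float("nan")


def binary_metrics(y_true, y_pred):
    """y_true, y_pred: sequences of truthy/falsy values (truthy = has the LN state evaluated)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),   # positive predictive value
        "npv": _ratio(tn, tn + fn),   # negative predictive value
    }


def roc_auc(y_true, scores):
    """Mann-Whitney estimate of ROC AUC; a tied positive/negative pair counts as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return _ratio(wins, len(pos) * len(neg))


m = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
print(m)  # every metric is 0.75 for this toy example
print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1]))  # 0.75
```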

[0050] In some embodiments, the method classifies the LN disease state of the patient with an accuracy of 70 % to 100 %. In some embodiments, the method classifies the LN disease state of the patient with an accuracy of 70 % to 75 %, 70 % to 80 %, 70 % to 85 %, 70 % to 90 %, 70 % to 92 %, 70 % to 95 %, 70 % to 96 %, 70 % to 97 %, 70 % to 98 %, 70 % to 99 %, 70 % to 100 %, 75 % to 80 %, 75 % to 85 %, 75 % to 90 %, 75 % to 92 %, 75 % to 95 %, 75 % to 96 %, 75 % to 97 %, 75 % to 98 %, 75 % to 99 %, 75 % to 100 %, 80 % to 85 %, 80 % to 90 %, 80 % to 92 %, 80 % to 95 %, 80 % to 96 %, 80 % to 97 %, 80 % to 98 %, 80 % to 99 %, 80 % to 100 %, 85 % to 90 %, 85 % to 92 %, 85 % to 95 %, 85 % to 96 %, 85 % to 97 %, 85 % to 98 %, 85 % to 99 %, 85 % to 100 %, 90 % to 92 %, 90 % to 95 %, 90 % to 96 %, 90 % to 97 %, 90 % to 98 %, 90 % to 99 %, 90 % to 100 %, 92 % to 95 %, 92 % to 96 %, 92 % to 97 %, 92 % to 98 %, 92 % to 99 %, 92 % to 100 %, 95 % to 96 %, 95 % to 97 %, 95 % to 98 %, 95 % to 99 %, 95 % to 100 %, 96 % to 97 %, 96 % to 98 %, 96 % to 99 %, 96 % to 100 %, 97 % to 98 %, 97 % to 99 %, 97 % to 100 %, 98 % to 99 %, 98 % to 100 %, or 99 % to 100 %. In some embodiments, the method classifies the LN disease state of the patient with an accuracy of 70 %, 75 %, 80 %, 85 %, 90 %, 92 %, 95 %, 96 %, 97 %, 98 %, 99 %, or 100 %. In some embodiments, the method classifies the LN disease state of the patient with an accuracy of at least 70 %, 75 %, 80 %, 85 %, 90 %, 92 %, 95 %, 96 %, 97 %, 98 %, or 99 %. In some embodiments, the method classifies the LN disease state of the patient with a sensitivity of 70 % to 100 %. 
In some embodiments, the method classifies the LN disease state of the patient with a sensitivity of 70 % to 75 %, 70 % to 80 %, 70 % to 85 %, 70 % to 90 %, 70 % to 92 %, 70 % to 95 %, 70 % to 96 %, 70 % to 97 %, 70 % to 98 %, 70 % to 99 %, 70 % to 100 %, 75 % to 80 %, 75 % to 85 %, 75 % to 90 %, 75 % to 92 %, 75 % to 95 %, 75 % to 96 %, 75 % to 97 %, 75 % to 98 %, 75 % to 99 %, 75 % to 100 %, 80 % to 85 %, 80 % to 90 %, 80 % to 92 %, 80 % to 95 %, 80 % to 96 %, 80 % to 97 %, 80 % to 98 %, 80 % to 99 %, 80 % to 100 %, 85 % to 90 %, 85 % to 92 %, 85 % to 95 %, 85 % to 96 %, 85 % to 97 %, 85 % to 98 %, 85 % to 99 %, 85 % to 100 %, 90 % to 92 %, 90 % to 95 %, 90 % to 96 %, 90 % to 97 %, 90 % to 98 %, 90 % to 99 %, 90 % to 100 %, 92 % to 95 %, 92 % to 96 %, 92 % to 97 %, 92 % to 98 %, 92 % to 99 %, 92 % to 100 %, 95 % to 96 %, 95 % to 97 %, 95 % to 98 %, 95 % to 99 %, 95 % to 100 %, 96 % to 97 %, 96 % to 98 %, 96 % to 99 %, 96 % to 100 %, 97 % to 98 %, 97 % to 99 %, 97 % to 100 %, 98 % to 99 %, 98 % to 100 %, or 99 % to 100 %. In some embodiments, the method classifies the LN disease state of the patient with a sensitivity of 70 %, 75 %, 80 %, 85 %, 90 %, 92 %, 95 %, 96 %, 97 %, 98 %, 99 %, or 100 %. In some embodiments, the method classifies the LN disease state of the patient with a sensitivity of at least 70 %, 75 %, 80 %, 85 %, 90 %, 92 %, 95 %, 96 %, 97 %, 98 %, or 99 %. In some embodiments, the method classifies the LN disease state of the patient with a specificity of 70 % to 100 %. 
In some embodiments, the method classifies the LN disease state of the patient with a specificity of 70 % to 75 %, 70 % to 80 %, 70 % to 85 %, 70 % to 90 %, 70 % to 92 %, 70 % to 95 %, 70 % to 96 %, 70 % to 97 %, 70 % to 98 %, 70 % to 99 %, 70 % to 100 %, 75 % to 80 %, 75 % to 85 %, 75 % to 90 %, 75 % to 92 %, 75 % to 95 %, 75 % to 96 %, 75 % to 97 %, 75 % to 98 %, 75 % to 99 %, 75 % to 100 %, 80 % to 85 %, 80 % to 90 %, 80 % to 92 %, 80 % to 95 %, 80 % to 96 %, 80 % to 97 %, 80 % to 98 %, 80 % to 99 %, 80 % to 100 %, 85 % to 90 %, 85 % to 92 %, 85 % to 95 %, 85 % to 96 %, 85 % to 97 %, 85 % to 98 %, 85 % to 99 %, 85 % to 100 %, 90 % to 92 %, 90 % to 95 %, 90 % to 96 %, 90 % to 97 %, 90 % to 98 %, 90 % to 99 %, 90 % to 100 %, 92 % to 95 %, 92 % to 96 %, 92 % to 97 %, 92 % to 98 %, 92 % to 99 %, 92 % to 100 %, 95 % to 96 %, 95 % to 97 %, 95 % to 98 %, 95 % to 99 %, 95 % to 100 %, 96 % to 97 %, 96 % to 98 %, 96 % to 99 %, 96 % to 100 %, 97 % to 98 %, 97 % to 99 %, 97 % to 100 %, 98 % to 99 %, 98 % to 100 %, or 99 % to 100 %. In some embodiments, the method classifies the LN disease state of the patient with a specificity of 70 %, 75 %, 80 %, 85 %, 90 %, 92 %, 95 %, 96 %, 97 %, 98 %, 99 %, or 100 %. In some embodiments, the method classifies the LN disease state of the patient with a specificity of at least 70 %, 75 %, 80 %, 85 %, 90 %, 92 %, 95 %, 96 %, 97 %, 98 %, or 99 %. In some embodiments, the method classifies the LN disease state of the patient with a positive predictive value of 70 % to 100 %. 
In some embodiments, the method classifies the LN disease state of the patient with a positive predictive value of 70 % to 75 %, 70 % to 80 %, 70 % to 85 %, 70 % to 90 %, 70 % to 92 %, 70 % to 95 %, 70 % to 96 %, 70 % to 97 %, 70 % to 98 %, 70 % to 99 %, 70 % to 100 %, 75 % to 80 %, 75 % to 85 %, 75 % to 90 %, 75 % to 92 %, 75 % to 95 %, 75 % to 96 %, 75 % to 97 %, 75 % to 98 %, 75 % to 99 %, 75 % to 100 %, 80 % to 85 %, 80 % to 90 %, 80 % to 92 %, 80 % to 95 %, 80 % to 96 %, 80 % to 97 %, 80 % to 98 %, 80 % to 99 %, 80 % to 100 %, 85 % to 90 %, 85 % to 92 %, 85 % to 95 %, 85 % to 96 %, 85 % to 97 %, 85 % to 98 %, 85 % to 99 %, 85 % to 100 %, 90 % to 92 %, 90 % to 95 %, 90 % to 96 %, 90 % to 97 %, 90 % to 98 %, 90 % to 99 %, 90 % to 100 %, 92 % to 95 %, 92 % to 96 %, 92 % to 97 %, 92 % to 98 %, 92 % to 99 %, 92 % to 100 %, 95 % to 96 %, 95 % to 97 %, 95 % to 98 %, 95 % to 99 %, 95 % to 100 %, 96 % to 97 %, 96 % to 98 %, 96 % to 99 %, 96 % to 100 %, 97 % to 98 %, 97 % to 99 %, 97 % to 100 %, 98 % to 99 %, 98 % to 100 %, or 99 % to 100 %. In some embodiments, the method classifies the LN disease state of the patient with a positive predictive value of 70 %, 75 %, 80 %, 85 %, 90 %, 92 %, 95 %, 96 %, 97 %, 98 %, 99 %, or 100 %. In some embodiments, the method classifies the LN disease state of the patient with a positive predictive value of at least 70 %, 75 %, 80 %, 85 %, 90 %, 92 %, 95 %, 96 %, 97 %, 98 %, or 99 %. In some embodiments, the method classifies the LN disease state of the patient with a negative predictive value of 70 % to 100 %. 
In some embodiments, the method classifies the LN disease state of the patient with a negative predictive value of 70 % to 75 %, 70 % to 80 %, 70 % to 85 %, 70 % to 90 %, 70 % to 92 %, 70 % to 95 %, 70 % to 96 %, 70 % to 97 %, 70 % to 98 %, 70 % to 99 %, 70 % to 100 %, 75 % to 80 %, 75 % to 85 %, 75 % to 90 %, 75 % to 92 %, 75 % to 95 %, 75 % to 96 %, 75 % to 97 %, 75 % to 98 %, 75 % to 99 %, 75 % to 100 %, 80 % to 85 %, 80 % to 90 %, 80 % to 92 %, 80 % to 95 %, 80 % to 96 %, 80 % to 97 %, 80 % to 98 %, 80 % to 99 %, 80 % to 100 %, 85 % to 90 %, 85 % to 92 %, 85 % to 95 %, 85 % to 96 %, 85 % to 97 %, 85 % to 98 %, 85 % to 99 %, 85 % to 100 %, 90 % to 92 %, 90 % to 95 %, 90 % to 96 %, 90 % to 97 %, 90 % to 98 %, 90 % to 99 %, 90 % to 100 %, 92 % to 95 %, 92 % to 96 %, 92 % to 97 %, 92 % to 98 %, 92 % to 99 %, 92 % to 100 %, 95 % to 96 %, 95 % to 97 %, 95 % to 98 %, 95 % to 99 %, 95 % to 100 %, 96 % to 97 %, 96 % to 98 %, 96 % to 99 %, 96 % to 100 %, 97 % to 98 %, 97 % to 99 %, 97 % to 100 %, 98 % to 99 %, 98 % to 100 %, or 99 % to 100 %. In some embodiments, the method classifies the LN disease state of the patient with a negative predictive value of 70 %, 75 %, 80 %, 85 %, 90 %, 92 %, 95 %, 96 %, 97 %, 98 %, 99 %, or 100 %. In some embodiments, the method classifies the LN disease state of the patient with a negative predictive value of at least 70 %, 75 %, 80 %, 85 %, 90 %, 92 %, 95 %, 96 %, 97 %, 98 %, or 99 %. In some embodiments, the AUC of the ROC curve of the trained machine learning model is 0.7 to 1, for classifying LN disease states. 
In some embodiments, the AUC of the ROC curve of the trained machine learning model is 0.7 to 0.75, 0.7 to 0.8, 0.7 to 0.85, 0.7 to 0.9, 0.7 to 0.92, 0.7 to 0.95, 0.7 to 0.96, 0.7 to 0.97, 0.7 to 0.98, 0.7 to 0.99, 0.7 to 1, 0.75 to 0.8, 0.75 to 0.85, 0.75 to 0.9, 0.75 to 0.92, 0.75 to 0.95, 0.75 to 0.96, 0.75 to 0.97, 0.75 to 0.98, 0.75 to 0.99, 0.75 to 1, 0.8 to 0.85, 0.8 to 0.9, 0.8 to 0.92, 0.8 to 0.95, 0.8 to 0.96, 0.8 to 0.97, 0.8 to 0.98, 0.8 to 0.99, 0.8 to 1, 0.85 to 0.9, 0.85 to 0.92, 0.85 to 0.95, 0.85 to 0.96, 0.85 to 0.97, 0.85 to 0.98, 0.85 to 0.99, 0.85 to 1, 0.9 to 0.92, 0.9 to 0.95, 0.9 to 0.96, 0.9 to 0.97, 0.9 to 0.98, 0.9 to 0.99, 0.9 to 1, 0.92 to 0.95, 0.92 to 0.96, 0.92 to 0.97, 0.92 to 0.98, 0.92 to 0.99, 0.92 to 1, 0.95 to 0.96, 0.95 to 0.97, 0.95 to 0.98, 0.95 to 0.99, 0.95 to 1, 0.96 to 0.97, 0.96 to 0.98, 0.96 to 0.99, 0.96 to 1, 0.97 to 0.98, 0.97 to 0.99, 0.97 to 1, 0.98 to 0.99, 0.98 to 1, or 0.99 to 1, for classifying LN disease states. In some embodiments, the AUC of the ROC curve of the trained machine learning model is 0.7, 0.75, 0.8, 0.85, 0.9, 0.92, 0.95, 0.96, 0.97, 0.98, 0.99, or 1, for classifying LN disease states. In some embodiments, the AUC of the ROC curve of the trained machine learning model is at least 0.7, 0.75, 0.8, 0.85, 0.9, 0.92, 0.95, 0.96, 0.97, 0.98, or 0.99, for classifying LN disease states.
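The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and ROC AUC recited above are standard binary-classification metrics. As a minimal sketch, not the applicant's implementation, they can be computed from labeled reference subjects as follows, assuming labels are coded 1 for the LN disease state of interest and 0 otherwise, and that hard calls come from a hypothetical threshold on the model's confidence value. The AUC uses the rank (Mann-Whitney) formulation, with ties counted as one half.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: int) -> float:
    """Return num / den, or NaN when the denominator is zero (metric undefined)."""
    return num / den if den else float("nan")


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV and NPV from 0/1 labels and 0/1 calls."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Each positive/negative pair contributes 1 (correct order), 0.5 (tie) or 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))
```

For example, with labels `[1, 1, 1, 0, 0, 0]` and scores `[0.9, 0.8, 0.4, 0.6, 0.2, 0.1]` thresholded at 0.5, sensitivity, specificity, PPV, and NPV are each 2/3, while the AUC, which does not depend on the threshold, is 8/9.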

[0051] The trained machine-learning model can have the accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and ROC-AUC value, described above, and the accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and ROC-AUC value of the method for classifying the LN disease state of the patient can be based on the classification parameters (e.g., accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and ROC-AUC respectively) of the trained machine-learning model for classifying the LN disease state of patients, as described herein and/or as understood by one of skill in the art. In certain embodiments, the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for classifying LN disease state of the patient can be the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value respectively for classifying whether the patient has acute LN, transitional LN, or chronic LN, or does not have LN. In certain embodiments, the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for classifying LN disease state of the patient can be the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value respectively for classifying whether the patient has acute LN, transitional LN, or chronic LN. In certain embodiments, the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for classifying LN disease state of the patient can be the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value respectively for classifying whether the patient has acute LN, transitional LN, chronic group I LN, or chronic group II LN, or does not have LN. 
In certain embodiments, the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for classifying LN disease state of the patient can be the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value respectively for classifying whether the patient has acute LN, transitional LN, chronic group I LN, or chronic group II LN. In certain embodiments, the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for classifying LN disease state of the patient can be the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value respectively for classifying whether the patient has LN. The accuracy, sensitivity, specificity, positive predictive value, and/or negative predictive value can be calculated from the ROC analysis of the trained machine learning model (e.g., at a selected classification threshold on the ROC curve from which the AUC is determined) for classifying the LN disease states.
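When the LN disease state has more than two classes (for example acute LN, transitional LN, chronic group I LN, chronic group II LN, or no LN), per-state sensitivity, specificity, PPV, and NPV are commonly obtained one-vs-rest: each state in turn is treated as the positive class and all other states as negative. The sketch below assumes that convention; the state labels are illustrative strings, and the source does not specify how its multi-class metrics were computed or aggregated.

```python
from typing import Dict, Hashable, Sequence


def one_vs_rest_metrics(
    labels: Sequence[Hashable], calls: Sequence[Hashable]
) -> Dict[Hashable, Dict[str, float]]:
    """Per-state sensitivity, specificity, PPV and NPV, treating each state as positive."""
    if len(labels) != len(calls):
        raise ValueError("labels and calls must have the same length")
    nan = float("nan")
    pairs = list(zip(labels, calls))
    results: Dict[Hashable, Dict[str, float]] = {}
    for state in sorted(set(labels) | set(calls), key=str):
        tp = sum(1 for t, p in pairs if t == state and p == state)
        fn = sum(1 for t, p in pairs if t == state and p != state)
        fp = sum(1 for t, p in pairs if t != state and p == state)
        tn = len(pairs) - tp - fn - fp  # neither labeled nor called as this state
        results[state] = {
            "sensitivity": tp / (tp + fn) if tp + fn else nan,
            "specificity": tn / (tn + fp) if tn + fp else nan,
            "ppv": tp / (tp + fp) if tp + fp else nan,
            "npv": tn / (tn + fn) if tn + fn else nan,
        }
    return results
```

Per-state values can then be reported individually or averaged; the choice of averaging scheme is a design decision the source leaves open.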

[0052] In certain embodiments, the LN disease state of the patient is classified based on a LN disease risk score. The LN disease risk score can be generated from the data set. In certain embodiments, the LN disease risk score is generated based on the one or more GSVA scores of the patient. In certain embodiments, the LN disease state of the patient is classified based on comparing the LN disease risk score of the patient to one or more reference values. In certain embodiments, generating the LN disease risk score of the patient comprises developing one or more weighted GSVA scores of the patient from the one or more GSVA scores, and summing the one or more weighted GSVA scores to obtain the disease risk score of the patient. For a respective GSVA score of the one or more GSVA scores, the corresponding weighted GSVA score is obtained by multiplying the respective GSVA score by its corresponding weight factor, wherein the corresponding weight factor is determined based on the contribution, to the classification of the LN disease state of the patient, of the set of genes from which the respective GSVA score is generated. The set of genes from which the respective GSVA score is generated is the set of genes whose enrichment of expression in the biological sample the respective GSVA score measures. In certain particular embodiments, the one or more GSVA scores of the patient are binarized, and the binarized GSVA scores are multiplied by the corresponding weight factors to obtain the weighted GSVA scores. In certain embodiments, binarizing the one or more GSVA scores includes replacing all GSVA scores (e.g., of the one or more GSVA scores) above a threshold value with a first value, and replacing all GSVA scores (e.g., of the one or more GSVA scores) equal to or below the threshold value with a second value. In certain particular embodiments, the threshold value is 0, the first value is 1, and the second value is 0. 
The one or more GSVA scores can be generated using a method as described herein. In certain embodiments, the weight factors are calculated based on training a machine learning model, wherein the trained machine learning model can generate an inference indicating the LN disease state of a reference subject based on the one or more GSVA scores of the reference subject. The trained machine learning model can be a trained machine learning model as described herein, and/or can be trained according to a method and a reference data set as described herein. The gene sets based on which the one or more GSVA scores are generated can be features of the machine learning model. The GSVA scores can be the feature values. For a respective reference subject, the GSVA score generated based on a gene set can be the feature value of that gene set for the respective reference subject. The feature coefficients of the features can be the weight factors. The corresponding weight factor for a respective GSVA score is the feature coefficient of the gene set (e.g., a feature) based on which the GSVA score is generated. The feature coefficient can be the average of the feature coefficients from the iterations run during training of the model. In certain embodiments, the machine learning model is trained using the one or more GSVA scores of the reference subjects (e.g., of a reference dataset described herein) and a ridge regression algorithm with a penalty, to obtain the weight factors. In certain embodiments, the machine learning model is trained using the one or more GSVA scores of the reference subjects (e.g., of a reference dataset described herein) having acute LN disease state and of the reference subjects having group II chronic LN disease state, and a ridge regression algorithm with a penalty, to obtain the weight factors. 
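A minimal sketch of the weighted, binarized risk score in [0052], under stated assumptions. The weight factors are fitted here with a closed-form linear (least-squares) ridge regression on reference subjects' GSVA scores, with labels hypothetically coded 1 for acute LN and 0 for chronic group II LN, and the "iterations" whose coefficients are averaged are taken to be hypothetical cross-validation folds. The source does not state whether a linear or logistic ridge model, which penalty value, or which resampling scheme was used, so `lam`, `n_folds`, and all function names are illustrative. The patient's GSVA scores are binarized at a threshold of 0 (1 above, 0 at or below), as in the particular embodiment described above.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_coefficients(x: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> List[float]:
    """Ridge slopes on centred data: (Xc'Xc + lam*I)^-1 Xc'yc; the intercept is not penalised.

    lam > 0 keeps the system nonsingular; lam = 0 gives ordinary least squares.
    """
    n, p = len(x), len(x[0])
    x_mean = [sum(row[j] for row in x) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in x]
    yc = [v - y_mean for v in y]
    gram = [[sum(r[j] * r[k] for r in xc) + (lam if j == k else 0.0) for k in range(p)]
            for j in range(p)]
    rhs = [sum(r[j] * v for r, v in zip(xc, yc)) for j in range(p)]
    return _solve(gram, rhs)


def averaged_weight_factors(
    x: Sequence[Sequence[float]], y: Sequence[float], lam: float, n_folds: int
) -> List[float]:
    """Average ridge coefficients over fits that each leave out one fold (hypothetical scheme)."""
    n, p = len(x), len(x[0])
    totals = [0.0] * p
    for k in range(n_folds):
        keep = [i for i in range(n) if i % n_folds != k]  # drop every n_folds-th subject
        coefs = ridge_coefficients([x[i] for i in keep], [y[i] for i in keep], lam)
        totals = [t + c for t, c in zip(totals, coefs)]
    return [t / n_folds for t in totals]


def ln_risk_score(gsva: Sequence[float], weights: Sequence[float], threshold: float = 0.0) -> float:
    """Sum of weight factors times GSVA scores binarized to 1 (> threshold) or 0 (otherwise)."""
    if len(gsva) != len(weights):
        raise ValueError("one weight factor is needed per GSVA score")
    return sum(w * (1.0 if s > threshold else 0.0) for s, w in zip(gsva, weights))
```

In use, one would call `averaged_weight_factors` on the reference subjects' GSVA matrix and labels, pass the result with a patient's GSVA scores to `ln_risk_score`, and compare the score with the reference value(s); the direction and value of that comparison are not specified in the source.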
[0053] In certain embodiments, classifying the LN disease state of the patient includes classifying whether the patient has acute LN disease state, transitional LN disease state, or chronic LN disease state, or does not have LN, e.g., the LN disease state of the patient is classified as acute lupus nephritis, transitional lupus nephritis, chronic lupus nephritis, or absence of lupus nephritis. In certain embodiments, classifying the LN disease state of the patient includes classifying whether the patient has acute LN disease state, transitional LN disease state, or chronic LN disease state. In certain embodiments, classifying the LN disease state of the patient includes classifying whether the patient has acute LN disease state, transitional LN disease state, chronic group I LN disease state, or chronic group II LN disease state, or does not have LN. In certain embodiments, classifying the LN disease state of the patient includes classifying whether the patient has acute LN disease state, transitional LN disease state, chronic group I LN disease state, or chronic group II LN disease state. In certain embodiments, classifying the LN disease state of the patient includes classifying whether the patient has LN. In certain embodiments, the LN disease state of the patient is classified as acute LN disease state, transitional LN disease state, chronic LN disease state, or absence of LN. In certain embodiments, the LN disease state of the patient is classified as acute LN disease state, transitional LN disease state, or chronic LN disease state. In certain embodiments, the LN disease state of the patient is classified as acute LN disease state, transitional LN disease state, chronic group I LN disease state, chronic group II LN disease state, or absence of LN. In certain embodiments, the LN disease state of the patient is classified as acute LN disease state, transitional LN disease state, chronic group I LN disease state, or chronic group II LN disease state. 
In certain embodiments, the LN disease state of the patient is classified as presence of LN, or absence of LN. The chronic LN disease state can be chronic group I LN disease state or chronic group II LN disease state.

[0054] The inference of the trained machine learning model can include a confidence value between 0 and 1. In certain embodiments, the confidence value of the inference of the trained machine learning model is between 0 and 1, such as 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1, or any value or range therebetween, that the patient has acute LN. In certain embodiments, the confidence value of the inference of the trained machine learning model is between 0 and 1, such as 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1, or any value or range therebetween, that the patient has transitional LN. In certain embodiments, the confidence value of the inference of the trained machine learning model is between 0 and 1, such as 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1, or any value or range therebetween, that the patient has chronic LN. In certain embodiments, the confidence value of the inference of the trained machine learning model is between 0 and 1, such as 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1, or any value or range therebetween, that the patient has chronic group I LN disease state. In certain embodiments, the confidence value of the inference of the trained machine learning model is between 0 and 1, such as 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1, or any value or range therebetween, that the patient has chronic group II LN disease state.
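The per-state confidence values above can be turned into a single classification in several ways. One simple option, assumed here for illustration only, is to report the state with the highest confidence and to withhold a call when that confidence falls below a hypothetical minimum (`min_confidence`); the state names, the cutoff, and the function name are not taken from the source.

```python
from typing import Mapping, Optional


def call_ln_state(confidences: Mapping[str, float], min_confidence: float = 0.5) -> Optional[str]:
    """Return the highest-confidence LN state, or None if no state reaches min_confidence."""
    for state, value in confidences.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"confidence for {state!r} must lie between 0 and 1")
    if not confidences:
        return None
    best = max(confidences, key=confidences.get)
    # Withhold the call when even the best state is not confident enough (assumed rule).
    return best if confidences[best] >= min_confidence else None
```

For instance, confidences of 0.7 (acute), 0.2 (transitional), and 0.05 for each chronic group yield an acute LN call, whereas 0.4 versus 0.35 yields no call at the default cutoff.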

[0055] In certain embodiments, the patient has lupus. In certain embodiments, the patient has LN. In certain embodiments, the patient is at elevated risk of having lupus. In certain embodiments, the patient is at elevated risk of having LN. In certain embodiments, the patient is suspected of having lupus. In certain embodiments, the patient is suspected of having LN. In certain embodiments, the patient is asymptomatic for lupus. In certain embodiments, the patient is asymptomatic for LN.

[0056] In certain embodiments, the method further includes recommending, selecting and/or administering a treatment to the patient based at least in part on the classification of the LN disease state of the patient. In certain embodiments, the method further includes administering a treatment to the patient based at least in part on the classification of the LN disease state of the patient. In certain embodiments, the treatment is configured to treat LN. In certain embodiments, the treatment is configured to reduce a severity of LN. In certain embodiments, the treatment is configured to reduce a risk of having LN. In certain embodiments, the treatment administered is configured to prevent, reverse and/or slow disease progression from non-disease state to acute LN, acute LN to transitional LN, and/or transitional LN to chronic LN. In certain embodiments, the treatment administered is configured to prevent, reverse and/or slow disease progression from non-disease state to acute LN. In certain embodiments, the treatment administered is configured to prevent, reverse and/or slow disease progression from acute LN to transitional LN. In certain embodiments, the treatment administered is configured to prevent, reverse and/or slow disease progression from transitional LN to chronic LN. In certain embodiments, the treatment administered is configured to prevent, reverse and/or slow disease progression from acute LN to transitional LN, and/or transitional LN to chronic LN. In certain embodiments, the treatment configured to prevent, reverse and/or slow disease progression from acute LN to transitional LN may be configured to target inflammatory macrophages and/or the GC/Tfh cell response. In certain embodiments, the treatment configured to prevent, reverse and/or slow disease progression from transitional LN to chronic LN may be configured to protect kidney tubules. 
In certain embodiments, the treatment configured to protect kidney tubules may target mitochondrial dysfunction in the kidney tubulointerstitial tissue. In certain embodiments, a patient determined to have acute LN is administered the treatment configured to prevent, reverse and/or slow disease progression from acute LN to transitional LN. In certain embodiments, a patient determined to have transitional LN is administered the treatment configured to prevent, reverse and/or slow disease progression from transitional LN to chronic LN, and/or the treatment configured to prevent, reverse and/or slow disease progression from acute LN to transitional LN. The treatment can include one or more treatments of lupus nephritis. The treatment can comprise a pharmaceutical composition. As shown in Example 3, increased enrichment of inflammatory cells/pathways at the transitional LN stage appears to indicate whether mice will progress to the chronic LN stage and/or renal failure. Relatively early detection of unique gene signatures indicative of transitional LN can be beneficial in treating LN and in stopping disease progression to the chronic LN stage and/or renal failure. Specific features of this stage can include an enrichment of gene signatures for IFN and Th17 cells, and/or increased enrichment of macrophages, and the treatment administered can include one or more pharmaceutical compositions targeting one or more molecular pathways associated with these gene signatures. In certain embodiments, the one or more pharmaceutical compositions targeting one or more molecular pathways associated with enrichment of gene signatures for IFN, Th17 cells, and/or increased enrichment of macrophages can include one or more of anifrolumab, secukinumab, ibrutinib, or the like. 
In certain embodiments, the treatment administered can include the one or more pharmaceutical compositions targeting one or more molecular pathways associated with enrichment of gene signatures for IFN, Th17 cells, and/or increased enrichment of macrophages. In certain embodiments, the treatment administered can include one or more of anifrolumab, secukinumab, ibrutinib, or the like. In certain embodiments, a patient determined to have transitional LN is administered the treatment comprising the one or more pharmaceutical compositions targeting one or more molecular pathways associated with enrichment of gene signatures for IFN, Th17 cells, and/or increased enrichment of macrophages. In certain embodiments, the treatment comprises a neutrophil function inhibitor, a TNF inhibitor, an IL1 inhibitor, a plasma cell inhibitor, an NK cell inhibitor, a B cell inhibitor, an IFN inhibitor, or any combination thereof. Non-limiting examples of an IFN inhibitor include Anifrolumab. Non-limiting examples of a plasma cell inhibitor include Mycophenolate, Bortezomib, Carfilzomib, Ixazomib, Daratumumab, Isatuximab, and Elotuzumab. Mycophenolate can be Mycophenolate Mofetil. Non-limiting examples of an IL1 inhibitor include Anakinra and Canakinumab. Non-limiting examples of a TNF inhibitor include Adalimumab, Certolizumab pegol, Etanercept, Golimumab, and Infliximab. Non-limiting examples of a neutrophil function inhibitor include Dasatinib, Apremilast, and Roflumilast. Non-limiting examples of an NK cell inhibitor include Azathioprine. Non-limiting examples of a B cell inhibitor include Belimumab, Rituximab, Obinutuzumab, Ocrelizumab, Ofatumumab, and Inebilizumab. 
In certain embodiments, the treatment comprises Anifrolumab, Mycophenolate, Bortezomib, Carfilzomib, Ixazomib, Daratumumab, Isatuximab, Elotuzumab, Anakinra, Canakinumab, Adalimumab, Certolizumab pegol, Etanercept, Golimumab, Infliximab, Dasatinib, Apremilast, Roflumilast, Azathioprine, Belimumab, Rituximab, Obinutuzumab, Ocrelizumab, Ofatumumab, Inebilizumab, or any combination thereof. In certain embodiments, the treatment for acute LN disease state comprises a neutrophil function inhibitor, a TNF inhibitor, an IL1 inhibitor, a plasma cell inhibitor, an NK cell inhibitor, a B cell inhibitor, an IFN inhibitor, or any combination thereof. In certain embodiments, the treatment for acute LN disease state comprises Anifrolumab, Mycophenolate, Bortezomib, Carfilzomib, Ixazomib, Daratumumab, Isatuximab, Elotuzumab, Anakinra, Canakinumab, Adalimumab, Certolizumab pegol, Etanercept, Golimumab, Infliximab, Dasatinib, Apremilast, Roflumilast, Azathioprine, Belimumab, Rituximab, Obinutuzumab, Ocrelizumab, Ofatumumab, Inebilizumab, or any combination thereof. In certain embodiments, a patient determined to have transitional LN is administered the treatment comprising an IFN inhibitor. In certain embodiments, a patient determined to have transitional LN is administered the treatment comprising an IFN inhibitor, a Th17 cell inhibitor, a B cell inhibitor, or any combination thereof. In certain embodiments, the treatment for transitional LN disease state comprises an IFN inhibitor, a TNF inhibitor, a Th17 cell inhibitor, a B cell inhibitor, or any combination thereof. In certain embodiments, a patient determined to have transitional LN is administered the treatment comprising anifrolumab, secukinumab, ibrutinib, or the like. In certain embodiments, the treatment for transitional LN disease state comprises anifrolumab, secukinumab, ibrutinib, or the like. 
In certain embodiments, the treatment for transitional LN disease state comprises anifrolumab, secukinumab, ibrutinib, belimumab, Rituximab, Obinutuzumab, Ocrelizumab, Ofatumumab, Inebilizumab, or any combination thereof. In certain embodiments, the treatment for transitional LN disease state comprises anifrolumab, secukinumab, ibrutinib, belimumab, Rituximab, Obinutuzumab, Ocrelizumab, Ofatumumab, Inebilizumab, Adalimumab, Certolizumab pegol, Etanercept, Golimumab, Infliximab, or any combination thereof. In certain embodiments, the treatment for chronic LN disease state comprises kidney transplantation. The association between mitochondrial dysfunction and loss of kidney tubule cells indicates that attempts to target immunometabolism by treatment with anti-diabetic drugs such as metformin may exacerbate metabolic defects and contribute to kidney damage in lupus nephritis patients. In certain embodiments, the treatment administered does not target immunometabolism by treatment with anti-diabetic drugs such as metformin. In certain embodiments, the treatment administered does not include an anti-diabetic drug, such as metformin, that targets immunometabolism. In certain embodiments, the treatment administered to a patient determined to have transitional LN does not include an anti-diabetic drug, such as metformin, that targets immunometabolism.

[0057] The biological sample can comprise a kidney biopsy sample, a blood sample, isolated peripheral blood mononuclear cells (PBMCs), a urine sample, or any derivative thereof. In certain embodiments, the biological sample comprises a kidney biopsy sample or any derivative thereof. In certain embodiments, the biological sample comprises a blood sample or any derivative thereof. In certain embodiments, the biological sample comprises PBMCs or any derivative thereof. In certain embodiments, the biological sample comprises a urine sample or any derivative thereof. The reference biological samples can comprise kidney biopsy samples, blood samples, isolated peripheral blood mononuclear cells (PBMCs), urine samples, or any derivative thereof. In certain embodiments, the reference biological samples comprise kidney biopsy samples or any derivative thereof. In certain embodiments, the reference biological samples comprise blood samples or any derivative thereof. In certain embodiments, the reference biological samples comprise PBMCs or any derivative thereof. In certain embodiments, the reference biological samples comprise urine samples or any derivative thereof. The biological sample and the reference biological samples can be of a similar type. In certain embodiments, the biological sample comprises PBMCs or any derivative thereof, and the reference biological samples comprise PBMCs or any derivative thereof. In certain embodiments, the biological sample comprises a blood sample or any derivative thereof, and the reference biological samples comprise blood samples or any derivative thereof. In certain embodiments, the biological sample comprises a kidney biopsy sample or any derivative thereof, and the reference biological samples comprise kidney biopsy samples or any derivative thereof. The blood sample can be a whole blood sample or any derivative thereof. In certain embodiments, the kidney biopsy sample contains renal cortex. 
In certain embodiments, the kidney biopsy sample contains non-microdissected renal cortex. In certain embodiments, the kidney biopsy sample contains glomerular tissue. In certain embodiments, the kidney biopsy sample contains tubulointerstitial tissue. In certain embodiments, the kidney biopsy sample contains glomerular and tubulointerstitial tissue. In certain embodiments, the biological sample comprises a blood sample, or any derivative thereof, and the data set comprises and/or is derived from gene expression measurement of genes selected from the genes listed in Tables 25-1 to 25-32, e.g., the one or more Tables are selected from Tables 25-1 to 25-32. In certain embodiments, the biological sample comprises PBMCs, or any derivative thereof, and the data set comprises and/or is derived from gene expression measurement of genes selected from the genes listed in Tables 25-1 to 25-32, e.g., the one or more Tables are selected from Tables 25-1 to 25-32. In certain embodiments, the biological sample comprises a kidney biopsy sample, or any derivative thereof, and the data set comprises and/or is derived from gene expression measurement of genes selected from the genes listed in Tables 23-1 to 23-28, e.g., the one or more Tables are selected from Tables 23-1 to 23-28. In certain embodiments, the biological sample comprises a kidney biopsy sample, or any derivative thereof, and the data set comprises and/or is derived from gene expression measurement of genes selected from the genes listed in Tables 28-1 to 28-22, e.g., the one or more Tables are selected from Tables 28-1 to 28-22. In certain embodiments, the biological sample comprises a kidney biopsy sample, or any derivative thereof, wherein the kidney biopsy sample contains glomerular tissue, and the data set comprises and/or is derived from gene expression measurement of genes selected from the genes listed in Tables 28-1 to 28-8 and 28-12 to 28-22, e.g., the one or more Tables selected comprise Tables 28-1 to 28-8 and 28-12 to 28-22. 
In certain embodiments, the biological sample comprises a kidney biopsy sample, or any derivative thereof, wherein the kidney biopsy sample contains tubulointerstitial tissue, and the data set comprises and/or is derived from gene expression measurement of genes selected from the genes listed in Tables 28-1 to 28-4, 28-6 to 28-18, and 28-20 to 28-22, e.g., the one or more Tables selected comprise Tables 28-1 to 28-4, 28-6 to 28-18, and 28-20 to 28-22. In certain embodiments, the biological sample comprises a kidney biopsy sample, or any derivative thereof, wherein the kidney biopsy sample contains tubulointerstitial tissue, and the data set comprises and/or is derived from gene expression measurement of genes selected from the genes listed in Tables 28-1 to 28-18 and 28-20 to 28-22, e.g., the one or more Tables selected comprise Tables 28-1 to 28-18 and 28-20 to 28-22. The patient can be a human patient.
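The sample-type-specific gene tables recited in this paragraph can be captured as a lookup when configuring an analysis pipeline. The sketch below only expands the table ranges stated above into explicit table identifiers; the dictionary keys and the helper name are hypothetical, and the alternative kidney-biopsy and tubulointerstitial embodiments are stored under separate keys rather than merged.

```python
from typing import Dict, List, Tuple


def expand_tables(series: int, *spans: Tuple[int, int]) -> List[str]:
    """Expand inclusive spans such as (1, 8) into table IDs like '28-1' ... '28-8'."""
    return [f"{series}-{i}" for start, end in spans for i in range(start, end + 1)]


# Hypothetical keys; the table ranges are those recited for each sample type.
TABLES_BY_SAMPLE: Dict[str, List[str]] = {
    "blood": expand_tables(25, (1, 32)),
    "pbmc": expand_tables(25, (1, 32)),
    "kidney_biopsy": expand_tables(23, (1, 28)),
    "kidney_biopsy_alt": expand_tables(28, (1, 22)),
    "glomerular": expand_tables(28, (1, 8), (12, 22)),
    "tubulointerstitial": expand_tables(28, (1, 4), (6, 18), (20, 22)),
    "tubulointerstitial_alt": expand_tables(28, (1, 18), (20, 22)),
}
```

Keeping the ranges as spans rather than hand-typed lists makes the exclusions visible at a glance, for example Tables 28-9 to 28-11 for glomerular tissue.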

[0058] To obtain a kidney biopsy sample, various techniques may be used. The kidney biopsy sample can include kidney samples removed from the body. Kidney biopsy can be performed using any suitable technique known to those of skill in the art. In certain embodiments, kidney biopsy can be performed using percutaneous (through the skin) biopsy, open biopsy, or any combination thereof. Percutaneous biopsy can include isolating the kidney biopsy sample by inserting a needle through the skin that lies above the kidney. Open biopsy can include isolating the kidney biopsy sample during surgery. The area, size, and amount of the kidney biopsy sample may vary depending upon the condition being analyzed.

[0059] In certain embodiments, the method further comprises monitoring the LN disease state of the patient, wherein the monitoring comprises assessing the LN disease state of the patient at a plurality of different time points. A difference in the assessment of the LN disease state of the patient among the plurality of time points can be indicative of one or more clinical indications selected from the group consisting of: (i) a diagnosis of the LN disease state of the patient, (ii) a prognosis of the LN disease state of the patient, and (iii) an efficacy or non-efficacy of a course of treatment for treating the LN disease state of the patient. In certain embodiments, the patient has been administered a treatment, and the method can assess an efficacy or non-efficacy of the treatment for treating the LN disease state of the patient.
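Monitoring over several time points amounts to comparing the assessed LN disease state between successive assessments. As an illustrative sketch only, the code below orders the states along the progression described in [0056] (no LN, acute LN, transitional LN, chronic LN) and reports each change between consecutive time points as a progression or a regression. The ordering, the state labels, and the function name are assumptions, and how a given change maps to a diagnosis, a prognosis, or treatment efficacy is not modeled here.

```python
from typing import List, Sequence, Tuple

# Assumed progression order, following non-disease -> acute -> transitional -> chronic.
PROGRESSION = ("no LN", "acute LN", "transitional LN", "chronic LN")


def state_changes(assessments: Sequence[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """List (from_time, to_time, 'progression' or 'regression') for each change between visits.

    `assessments` holds (time_point, state) pairs already sorted by time.
    """
    rank = {state: i for i, state in enumerate(PROGRESSION)}
    changes = []
    for (t0, s0), (t1, s1) in zip(assessments, assessments[1:]):
        if rank[s1] > rank[s0]:
            changes.append((t0, t1, "progression"))
        elif rank[s1] < rank[s0]:
            changes.append((t0, t1, "regression"))
        # Unchanged states produce no entry.
    return changes
```

For example, assessments of acute LN at baseline, transitional LN at months 6 and 12, and acute LN at month 18 yield one progression (baseline to month 6) and one regression (month 12 to month 18).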

[0060] Acute, transitional, and chronic lupus nephritis disease states can be characterized by gene enrichment analysis corresponding to the coral group (acute LN), yellow group (transitional LN), purple group (chronic group I LN), and black group (chronic group II LN), as described in Example 4 and FIGs. 74A and 74B (in a kidney/renal biopsy sample), and/or in FIG. 77 (in a blood sample).

[0061] Patients having acute lupus nephritis disease state can have transcriptomic characteristics corresponding to (e.g., falling within) the coral group, as described in Example 4, FIGs. 74A and 74B (in a kidney/renal biopsy sample), and/or FIG. 77 (in a blood sample). In certain embodiments, patients having acute lupus nephritis disease state have i) unchanged expression of one or more immune/inflammatory cell signature gene modules, ii) minimally decreased expression of one or more kidney cell signature gene modules, iii) unchanged expression of one or more metabolic signature gene modules, iv) unchanged expression of one or more endothelial cell signature gene modules, and/or v) unchanged expression of one or more fibroblast signature gene modules, in a kidney biopsy sample compared to a control. The control can be subjects without LN. In certain embodiments, patients with acute lupus nephritis disease state have i) a mean GSVA score of 0.33 ± 0.1 of kidney cell signature gene modules, ii) a mean GSVA score of -0.12 ± 0.1 of endothelial cell signature gene modules, iii) a mean GSVA score of -0.15 ± 0.1 of fibroblast signature gene modules, iv) a mean GSVA score of -0.15 ± 0.1 of mesangial cell signature gene modules, v) a mean GSVA score of 0.24 ± 0.1 of metabolic signature gene modules, vi) a mean GSVA score of -0.26 ± 0.1 of immune/inflammatory cell signature gene modules, or any combination thereof, where the GSVA scores are determined with the method, and with respect to the data set, described in Example 4. In certain embodiments, disease pathology of patients having acute LN is substantially confined to glomeruli, with no or minimal damage to kidney cells and kidney tubules. In certain embodiments, glomeruli of patients having acute LN are increased in size compared to a control. In certain embodiments, glomeruli of patients having acute LN have immune complex deposition. 
In certain embodiments, glomeruli of patients having acute LN are increased in size with immune cell infiltration and/or immune complex deposition, such as IgG, C3, and/or anti-nuclear antibody (ANA) deposits. Patients having acute lupus nephritis disease state can have any one or more characteristics selected from those described in this paragraph.

[0062] Patients having transitional LN disease state have more severe disease than patients with acute LN, but less severe disease than patients with chronic LN. In certain embodiments, patients having transitional LN disease state have transcriptomic characteristics corresponding to (e.g., falling within) the yellow group, as described in Example 4, FIGs. 74A and 74B (in a kidney/renal biopsy sample), and/or FIG. 77 (in a blood sample). In certain embodiments, patients having transitional LN disease state have i) minimally increased expression of one or more immune/inflammatory cell signature gene modules, ii) minimally decreased expression of one or more kidney cell signature gene modules, iii) minimally decreased expression of one or more metabolic signature gene modules, iv) minimally increased expression of one or more endothelial cell signature gene modules, and/or v) minimally increased expression of one or more fibroblast signature gene modules, in a kidney biopsy sample compared to a control. The control can be subjects without LN. In certain embodiments, patients having transitional lupus nephritis disease state have i) a mean GSVA score of 0.18 ± 0.1 of kidney cell signature gene modules, ii) a mean GSVA score of 0.02 ± 0.1 of endothelial cell signature gene modules, iii) a mean GSVA score of -0.02 ± 0.1 of fibroblast signature gene modules, iv) a mean GSVA score of -0.06 ± 0.1 of mesangial cell signature gene modules, v) a mean GSVA score of 0.17 ± 0.1 of metabolic signature gene modules, vi) a mean GSVA score of 0.05 ± 0.1 of immune/inflammatory cell signature gene modules, or any combination thereof, where the GSVA scores are determined with the method, and with respect to the data set, described in Example 4. In certain embodiments, patients having transitional LN have inflammatory disease, with no or minimal kidney cell damage and/or no or minimal metabolic dysfunction. 
In certain embodiments, glomeruli of patients having transitional LN have immune cell infiltration with IgG and/or C3 deposition. For patients having transitional LN, IgG and/or C3 deposition in glomeruli, and serum levels of anti-DNA antibodies, can be higher than in the acute LN stage. The interstitium of patients having transitional LN can have more inflammatory cells than in the acute LN stage, and tubular cells may show some dilation and atrophy. Patients having transitional LN may have minimal to no tubule damage. Patients having transitional lupus nephritis disease state can have any one or more characteristics selected from those described in this paragraph.

[0063] Patients having chronic LN disease state have more severe disease than patients with transitional LN. In certain embodiments, patients having chronic LN disease state have transcriptomic characteristics corresponding to (e.g., falling within) the purple group (chronic group I LN) or black group (chronic group II LN), as described in Example 4, FIGs. 74A and 74B (in a kidney/renal biopsy sample), and/or FIG. 77 (in a blood sample). In certain embodiments, patients having chronic group I LN disease state have i) increased expression of one or more immune/inflammatory cell signature gene modules, ii) decreased expression of one or more kidney cell signature gene modules, iii) decreased expression of one or more metabolic signature gene modules, iv) unchanged expression of one or more endothelial cell signature gene modules, and/or v) increased expression of one or more fibroblast signature gene modules, in a kidney biopsy sample compared to a control. The control can be subjects without LN. In certain embodiments, patients having chronic group I LN disease state have i) a mean GSVA score of -0.24 ± 0.1 of kidney cell signature gene modules, ii) a mean GSVA score of -0.07 ± 0.1 of endothelial cell signature gene modules, iii) a mean GSVA score of 0.08 ± 0.1 of fibroblast signature gene modules, iv) a mean GSVA score of 0.09 ± 0.1 of mesangial cell signature gene modules, v) a mean GSVA score of -0.19 ± 0.1 of metabolic signature gene modules, vi) a mean GSVA score of 0.31 ± 0.1 of immune/inflammatory cell signature gene modules, or any combination thereof, where the GSVA scores are determined with the method, and with respect to the data set, described in Example 4. In certain embodiments, patients having chronic group I LN disease state have inflammatory disease with kidney cell damage and/or metabolic dysfunction. 
In certain embodiments, patients having chronic group II LN disease state have i) unchanged expression of one or more immune/inflammatory cell signature gene modules, ii) decreased expression of one or more kidney cell signature gene modules, iii) decreased expression of one or more metabolic signature gene modules, iv) increased expression of one or more endothelial cell signature gene modules, and/or v) increased expression of one or more fibroblast signature gene modules, in a kidney biopsy sample compared to a control. The control can be subjects without LN. In certain embodiments, patients having chronic group II LN disease state have i) a mean GSVA score of -0.21 ± 0.1 of kidney cell signature gene modules, ii) a mean GSVA score of 0.25 ± 0.1 of endothelial cell signature gene modules, iii) a mean GSVA score of 0.07 ± 0.1 of fibroblast signature gene modules, iv) a mean GSVA score of 0.1 ± 0.1 of mesangial cell signature gene modules, v) a mean GSVA score of -0.15 ± 0.1 of metabolic signature gene modules, vi) a mean GSVA score of -0.25 ± 0.1 of immune/inflammatory cell signature gene modules, or any combination thereof, where the GSVA scores are determined with the method, and with respect to the data set, described in Example 4. In certain embodiments, patients having chronic group II LN disease state have kidney cell damage and/or metabolic dysfunction, with no or minimal inflammation. In certain embodiments, patients having chronic LN exhibit glomerular sclerosis, fibrosis with interstitial inflammation, and elevated levels of immune complex deposition. Immune complex deposition in patients having chronic LN can be higher than in the acute and transitional disease stages. In certain embodiments, 80% or more of the tubular cells of patients with chronic LN show tubular dilation and/or evidence of atrophy and tubular casts. Patients having chronic lupus nephritis disease state can have any one or more characteristics selected from those described in this paragraph.
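The mean GSVA profiles in paragraphs [0061] to [0063] can be read as reference centroids for the four disease states. The following is a minimal Python sketch, assuming a patient's mean GSVA score per signature category has already been computed with the method and data set of Example 4 (which this sketch does not reproduce). The category keys, state labels, and function names are hypothetical. It reports which states match within the stated ± 0.1 tolerance for every category supplied (the "any combination thereof" reading) and, separately, the nearest state by Euclidean distance.

```python
import math

# Mean GSVA scores per signature category, copied from paragraphs [0061]-[0063].
# Keys are hypothetical labels chosen for this sketch.
REFERENCE_PROFILES = {
    "acute": {"kidney": 0.33, "endothelial": -0.12, "fibroblast": -0.15,
              "mesangial": -0.15, "metabolic": 0.24, "immune": -0.26},
    "transitional": {"kidney": 0.18, "endothelial": 0.02, "fibroblast": -0.02,
                     "mesangial": -0.06, "metabolic": 0.17, "immune": 0.05},
    "chronic_I": {"kidney": -0.24, "endothelial": -0.07, "fibroblast": 0.08,
                  "mesangial": 0.09, "metabolic": -0.19, "immune": 0.31},
    "chronic_II": {"kidney": -0.21, "endothelial": 0.25, "fibroblast": 0.07,
                   "mesangial": 0.1, "metabolic": -0.15, "immune": -0.25},
}


def states_within_tolerance(profile, tol=0.1):
    """Return every state whose reference score lies within +/- tol of the
    patient's score for each category present in `profile`."""
    return [state for state, ref in REFERENCE_PROFILES.items()
            if all(abs(profile[c] - ref[c]) <= tol for c in profile)]


def nearest_state(profile):
    """Return the state with the smallest Euclidean distance to `profile`,
    computed only over the categories the patient profile contains."""
    def distance(ref):
        return math.sqrt(sum((profile[c] - ref[c]) ** 2 for c in profile))
    return min(REFERENCE_PROFILES, key=lambda s: distance(REFERENCE_PROFILES[s]))
```

For example, a patient whose only available value is an immune/inflammatory mean of 0.3 falls within tolerance of chronic group I alone. The nearest-centroid fallback is a design choice of this sketch for profiles that match no state within ± 0.1.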

[0064] The gene modules are disclosed in Tables 23-1 to 23-28 and Tables 28-1 to 28-22. Immune/inflammatory cell signature gene modules can include Anergic/Activated T Cell signature module, B Cell signature module, Dendritic Cell signature module, GC B Cell signature module, Granulocyte signature module, LDG signature module, Monocyte/Myeloid Cell signature module, NK Cell signature module, PDC signature module, Plasma Cell signature module, Platelet signature module, Antigen presenting cell signature module, CD8 T cell signature module, IG Chain cell signature module, Monocyte-Macrophage signature module, Myeloid Cell signature module, Tfh Cell signature module, Th17 Cell signature module, and T Cell signature module. Metabolic signature gene modules can include Amino Acid Metabolism signature module, Fatty Acid Oxidation signature module, Fatty Acid Alpha Oxidation signature module, Fatty Acid Beta Oxidation signature module, Glycolysis signature module, Oxidative Phosphorylation signature module, Pentose Phosphate signature module, general mitochondria signature module, and TCA cycle signature module. Kidney cell signature gene modules can include Collecting Duct signature module, Distal Tubule signature module, Kidney Cell signature module, Loop of Henle signature module, Mesangial Cell signature module, Podocyte signature module, and Proximal Tubule signature module.

[0065] Unchanged expression of a gene module in a patient compared to a control can refer to a GSVA score change (e.g., the GSVA score of the module for the patient versus the GSVA score of the module for the control) of less than 0.1 in magnitude (i.e., between -0.1 and +0.1). Minimally increased expression of a gene module compared to a control can refer to a GSVA score change of 0 to less than 0.2. Minimally decreased expression of a gene module compared to a control can refer to a GSVA score change of greater than -0.2 to 0. Increased expression of a gene module compared to a control can refer to a GSVA score change of greater than 0.2. Decreased expression of a gene module compared to a control can refer to a GSVA score change more negative than -0.2. The control can be subjects/patients without lupus nephritis. The GSVA scores discussed in this paragraph can be determined with the method, and with respect to the data set, described herein, e.g., in Example 4.
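The thresholds in paragraph [0065] can be expressed as a small decision function. This is a minimal Python sketch, assuming the change is computed as the patient's module GSVA score minus the control's; the function name and labels are hypothetical. Because the stated ranges overlap (a change of 0.05 is both "unchanged" and within "0 to less than 0.2"), the sketch checks the narrower unchanged band first, and it returns a separate label for changes of exactly ±0.2, which the text does not assign to any category.

```python
def expression_change_category(patient_score, control_score):
    """Classify a module's GSVA score change (patient minus control)
    using the bands described in paragraph [0065]."""
    delta = patient_score - control_score
    if abs(delta) < 0.1:          # checked first: overlaps the "minimal" bands
        return "unchanged"
    if 0 < delta < 0.2:
        return "minimally increased"
    if -0.2 < delta < 0:
        return "minimally decreased"
    if delta > 0.2:
        return "increased"
    if delta < -0.2:
        return "decreased"
    return "boundary"             # exactly +/-0.2: not assigned by the text
```

For instance, a patient score of 0.35 against a control score of 0.2 gives a change of about 0.15, which this sketch labels "minimally increased".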

[0066] Certain aspects are directed to use of a data set described herein.

[0067] Certain aspects of the present disclosure are directed to a method for validating a mouse model useful for identifying and/or characterizing a human disease. The method can include (a) providing a gene set capable of classifying a mouse as having an endotype selected from two or more endotypes of the disease; (b) determining human orthologs of the gene set; (c) classifying a human patient as having an endotype selected from the two or more endotypes of the disease using the human orthologs; and/or (d) using the human orthologs to classify the mouse model as having an endotype selected from the two or more endotypes of the disease. The endotype of a validated mouse model classified using the human orthologs can correspond to the human endotype of step (c) identified using the human orthologs. The disease can be lupus or lupus nephritis. In certain embodiments, the disease is lupus nephritis. In certain embodiments, the disease is lupus nephritis, and the endotypes of lupus nephritis are acute LN, transitional LN, and chronic LN.
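Steps (b) to (d) of paragraph [0067] can be sketched as follows. This is an illustration only, under assumptions the text does not state: each endotype is assigned by the highest mean expression over that endotype's gene set, the mouse model's expression values are already keyed by human ortholog symbols, and any ortholog table or gene names passed in are hypothetical placeholders rather than the gene sets of this disclosure.

```python
def to_human_orthologs(mouse_genes, ortholog_table):
    """Step (b): map mouse gene symbols to human orthologs.
    Genes without an entry in the (hypothetical) table are dropped."""
    return {ortholog_table[g] for g in mouse_genes if g in ortholog_table}


def classify_endotype(expression, endotype_gene_sets):
    """Assign the endotype whose gene set has the highest mean expression
    (a simple scoring rule assumed for this sketch)."""
    def score(genes):
        values = [expression[g] for g in genes if g in expression]
        return sum(values) / len(values) if values else float("-inf")
    return max(endotype_gene_sets, key=lambda e: score(endotype_gene_sets[e]))


def mouse_model_is_validated(mouse_gene_sets, ortholog_table,
                             human_expression, mouse_expression):
    """Steps (b)-(d): translate each endotype's mouse gene set, classify the
    human patient and the mouse model with the same ortholog sets, and report
    whether the two endotypes agree. `mouse_expression` is assumed to be keyed
    by human ortholog symbols already."""
    human_sets = {e: to_human_orthologs(genes, ortholog_table)
                  for e, genes in mouse_gene_sets.items()}
    return (classify_endotype(human_expression, human_sets)
            == classify_endotype(mouse_expression, human_sets))
```

Using the same ortholog-derived gene sets for both species is what makes the two classifications comparable in this sketch.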

[0068] In another aspect, the present disclosure provides a method of identifying one or more records having a specific phenotype, the method comprising: receiving a plurality of first records, wherein each first record is associated with one or more of a plurality of phenotypes; receiving a plurality of second records, wherein each second record is associated with one or more of the plurality of phenotypes, and wherein the plurality of second records and the plurality of first records are non-overlapping; applying a machine learning algorithm to at least one first record and at least one second record to determine a classifier; receiving a plurality of third records, wherein the third records are distinct from the plurality of first records and the plurality of second records; and applying the classifier to the plurality of third records to identify one or more third records associated with the specific phenotype.
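One way to picture the method of paragraph [0068] is the short Python sketch below. It substitutes a nearest-centroid classifier purely for brevity; the classifiers named in this disclosure are elastic generalized linear models, k-nearest neighbors, and random forests. The `Record` type, its fields, and the function names are hypothetical, each record is assumed to carry one phenotype label, and feature vectors are assumed to be numeric and of equal length.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    features: tuple        # e.g., per-gene or per-module scores (assumed numeric)
    phenotype: str = ""    # left empty for unlabeled third records


def train_centroids(records):
    """Fit a nearest-centroid classifier: one mean feature vector per phenotype."""
    sums, counts = {}, {}
    for r in records:
        acc = sums.setdefault(r.phenotype, [0.0] * len(r.features))
        for i, x in enumerate(r.features):
            acc[i] += x
        counts[r.phenotype] = counts.get(r.phenotype, 0) + 1
    return {p: [v / counts[p] for v in acc] for p, acc in sums.items()}


def predict(centroids, features):
    """Return the phenotype whose centroid is closest to `features`."""
    return min(centroids, key=lambda p: math.dist(centroids[p], features))


def identify_records(first, second, third, target):
    """Train on first + second records, then return IDs of third records
    predicted to have the `target` phenotype."""
    ids_first = {r.record_id for r in first}
    ids_second = {r.record_id for r in second}
    if ids_first & ids_second:
        raise ValueError("first and second records must be non-overlapping")
    if {r.record_id for r in third} & (ids_first | ids_second):
        raise ValueError("third records must be distinct from training records")
    centroids = train_centroids(list(first) + list(second))
    return [r.record_id for r in third if predict(centroids, r.features) == target]
```

The two ID checks mirror the "non-overlapping" and "distinct" conditions in the paragraph; swapping `train_centroids`/`predict` for another classifier would leave `identify_records` unchanged.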

[0069] In some embodiments, the first records and the second records comprise nucleic acid sequencing data, transcriptome data, genome data, epigenome data, proteome data, metabolome data, virome data, methylome data, lipidomic data, lineage-ome data, nucleosomal occupancy data, a genetic variant, a gene fusion, an insertion or deletion (indel), or any combination thereof. In some embodiments, the first records and the second records are in different formats. In some embodiments, the first records and the second records are from different sources, different studies, or both. In some embodiments, the phenotype comprises a disease state, an organ involvement, a medication response, or any combination thereof. In some embodiments, the classifier comprises an elastic generalized linear model classifier, a k-nearest neighbors classifier, a random forest classifier, or any combination thereof.

[0070] In some embodiments, the elastic generalized linear model classifier employs an elastic penalty of about 0.8 to about 1. In some embodiments, the elastic generalized linear model classifier employs an elastic penalty of at least about 0.8, about 0.825, about 0.85, about 0.875, about 0.9, about 0.925, about 0.95, about 0.975, or about 1. In some embodiments, the elastic generalized linear model classifier employs an elastic penalty of at most about 0.8, about 0.825, about 0.85, about 0.875, about 0.9, about 0.925, about 0.95, about 0.975, or about 1. In some embodiments, the elastic generalized linear model classifier employs an elastic penalty of about 0.8 to about 0.825, about 0.8 to about 0.85, about 0.8 to about 0.875, about 0.8 to about 0.9, about 0.8 to about 0.925, about 0.8 to about 0.95, about 0.8 to about 0.975, about 0.8 to about 1, about 0.825 to about 0.85, about 0.825 to about 0.875, about 0.825 to about 0.9, about 0.825 to about 0.925, about 0.825 to about 0.95, about 0.825 to about 0.975, about 0.825 to about 1, about 0.85 to about 0.875, about 0.85 to about 0.9, about 0.85 to about 0.925, about 0.85 to about 0.95, about 0.85 to about 0.975, about 0.85 to about 1, about 0.875 to about 0.9, about 0.875 to about 0.925, about 0.875 to about 0.95, about 0.875 to about 0.975, about 0.875 to about 1, about 0.9 to about 0.925, about 0.9 to about 0.95, about 0.9 to about 0.975, about 0.9 to about 1, about 0.925 to about 0.95, about 0.925 to about 0.975, about 0.925 to about 1, about 0.95 to about 0.975, about 0.95 to about 1, or about 0.975 to about 1. In some embodiments, the elastic generalized linear model classifier employs an elastic penalty of about 0.8, about 0.825, about 0.85, about 0.875, about 0.9, about 0.925, about 0.95, about 0.975, or about 1.
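Assuming the "elastic penalty" here is the elastic-net mixing parameter α as used in glmnet-style penalized regression (an interpretation; the paragraph does not define it), values from about 0.8 to about 1 weight the penalty mostly or entirely toward the lasso (L1) term. A minimal sketch of that penalty for a coefficient vector β and overall strength λ, with hypothetical parameter names `lam` and `alpha`:

```python
def elastic_net_penalty(beta, lam, alpha):
    """glmnet-style elastic-net penalty (assumed interpretation):
    lam * ((1 - alpha) / 2 * sum(b**2) + alpha * sum(|b|)).
    alpha = 1 gives the lasso penalty; alpha = 0 gives ridge."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = sum(abs(b) for b in beta)   # lasso (L1) term
    l2 = sum(b * b for b in beta)    # ridge (squared L2) term
    return lam * ((1.0 - alpha) / 2.0 * l2 + alpha * l1)
```

With β = (1, -2) and λ = 1, α = 0.9 (the value cited in paragraph [0081]) gives a penalty of about 2.95, of which 2.7 comes from the L1 term, so the sparsity-inducing lasso behavior dominates.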

[0071] In some embodiments, the k-nearest neighbors classifier employs a K value of the size of the plurality of distinct first data sets, wherein k is about 1 to about 20. In some embodiments, the k-nearest neighbors classifier employs a K value of the size of the plurality of distinct first data sets, wherein k is at least about 1, about 2, about 3, about 4, about 5, about 6, about 8, about 10, about 12, about 14, about 16, or about 20. In some embodiments, the k-nearest neighbors classifier employs a K value of the size of the plurality of distinct first data sets, wherein k is at most about 1, about 2, about 3, about 4, about 5, about 6, about 8, about 10, about 12, about 14, about 16, or about 20. In some embodiments, the k-nearest neighbors classifier employs a K value of the size of the plurality of distinct first data sets, wherein k is about 1 to about 2, about 1 to about 3, about 1 to about 4, about 1 to about 5, about 1 to about 6, about 1 to about 8, about 1 to about 10, about 1 to about 12, about 1 to about 14, about 1 to about 16, about 1 to about 20, about 2 to about 3, about 2 to about 4, about 2 to about 5, about 2 to about 6, about 2 to about 8, about 2 to about 10, about 2 to about 12, about 2 to about 14, about 2 to about 16, about 2 to about 20, about 3 to about 4, about 3 to about 5, about 3 to about 6, about 3 to about 8, about 3 to about 10, about 3 to about 12, about 3 to about 14, about 3 to about 16, about 3 to about 20, about 4 to about 5, about 4 to about 6, about 4 to about 8, about 4 to about 10, about 4 to about 12, about 4 to about 14, about 4 to about 16, about 4 to about 20, about 5 to about 6, about 5 to about 8, about 5 to about 10, about 5 to about 12, about 5 to about 14, about 5 to about 16, about 5 to about 20, about 6 to about 8, about 6 to about 10, about 6 to about 12, about 6 to about 14, about 6 to about 16, about 6 to about 20, about 8 to about 10, about 8 to about 12, about 8 to about 14, about 8 to about 16, about 8 to about 20, about 10 to about 12, about 10 to about 14, about 10 to about 16, about 10 to about 20, about 12 to about 14, about 12 to about 16, about 12 to about 20, about 14 to about 16, about 14 to about 20, or about 16 to about 20. In some embodiments, the k-nearest neighbors classifier employs a K value of the size of the plurality of distinct first data sets, wherein k is about 1, about 2, about 3, about 4, about 5, about 6, about 8, about 10, about 12, about 14, about 16, or about 20.

[0072] In some embodiments, the K-value of the random forest classifier is incremented by 1 if the k-value is an even number. In some embodiments, applying a machine learning algorithm to the third data set comprises applying a machine learning algorithm to a plurality of unique third data sets.
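Paragraph [0081] describes a K-value of about 5% of the size of the plurality of distinct first data sets, and paragraph [0072] an increment of 1 when K is even. The text attaches the odd-K rule to the random forest classifier; the sketch below instead applies it to a k-nearest neighbors majority vote, where an odd K is a common way to avoid ties between two classes. Treating the "size" as the number of training records is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def choose_k(n_training, fraction=0.05):
    """K as roughly `fraction` of the training-set size (at least 1),
    bumped to the next odd number when it comes out even."""
    k = max(1, round(fraction * n_training))
    if k % 2 == 0:
        k += 1
    return k


def knn_predict(train, query, k):
    """train: list of (features, label) pairs.
    Majority vote among the k training points nearest to `query`."""
    neighbours = sorted(train, key=lambda fl: math.dist(fl[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With 100 training records this yields K = 5; with 80 records, 5% gives 4, which the odd-K rule raises to 5.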

[0073] In some embodiments, the classifier identifies said one or more third records associated with the specific phenotype with an accuracy of about 70% to about 100%. In some embodiments, the classifier identifies said one or more third records associated with the specific phenotype with an accuracy of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 100%. In some embodiments, the classifier identifies said one or more third records associated with the specific phenotype with an accuracy of at most about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 100%. In some embodiments, the classifier identifies said one or more third records associated with the specific phenotype with an accuracy of about 70% to about 75%, about 70% to about 80%, about 70% to about 85%, about 70% to about 90%, about 70% to about 95%, about 70% to about 100%, about 75% to about 80%, about 75% to about 85%, about 75% to about 90%, about 75% to about 95%, about 75% to about 100%, about 80% to about 85%, about 80% to about 90%, about 80% to about 95%, about 80% to about 100%, about 85% to about 90%, about 85% to about 95%, about 85% to about 100%, about 90% to about 95%, about 90% to about 100%, or about 95% to about 100%. In some embodiments, the classifier identifies said one or more third records associated with the specific phenotype with an accuracy of about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 100%.


[0075] In some embodiments, the classifier herein enables a specific phenotype association sensitivity of about 70% to about 100%. In some embodiments, the classifier herein enables a specific phenotype association sensitivity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 100%. In some embodiments, the classifier herein enables a specific phenotype association sensitivity of at most about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 100%. In some embodiments, the classifier herein enables a specific phenotype association sensitivity of about 70% to about 75%, about 70% to about 80%, about 70% to about 85%, about 70% to about 90%, about 70% to about 95%, about 70% to about 100%, about 75% to about 80%, about 75% to about 85%, about 75% to about 90%, about 75% to about 95%, about 75% to about 100%, about 80% to about 85%, about 80% to about 90%, about 80% to about 95%, about 80% to about 100%, about 85% to about 90%, about 85% to about 95%, about 85% to about 100%, about 90% to about 95%, about 90% to about 100%, or about 95% to about 100%. In some embodiments, the classifier herein enables a specific phenotype association sensitivity of about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 100%.

[0076] In some embodiments, the classifier herein enables a specific phenotype association specificity of about 70% to about 100%. In some embodiments, the classifier herein enables a specific phenotype association specificity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 100%. In some embodiments, the classifier herein enables a specific phenotype association specificity of at most about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 100%. In some embodiments, the classifier herein enables a specific phenotype association specificity of about 70% to about 75%, about 70% to about 80%, about 70% to about 85%, about 70% to about 90%, about 70% to about 95%, about 70% to about 100%, about 75% to about 80%, about 75% to about 85%, about 75% to about 90%, about 75% to about 95%, about 75% to about 100%, about 80% to about 85%, about 80% to about 90%, about 80% to about 95%, about 80% to about 100%, about 85% to about 90%, about 85% to about 95%, about 85% to about 100%, about 90% to about 95%, about 90% to about 100%, or about 95% to about 100%. In some embodiments, the classifier herein enables a specific phenotype association specificity of about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 100%.
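The accuracy, sensitivity, and specificity figures in paragraphs [0073] to [0076] follow the standard confusion-matrix definitions, sketched below in Python. The function name and the use of a single "positive" phenotype label are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity (true-positive rate), and specificity
    (true-negative rate) for one phenotype treated as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

For example, with 4 true LN records of which 3 are called LN, and 6 control records of which 5 are called control, accuracy is 8/10, sensitivity 3/4, and specificity 5/6.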

[0077] In some embodiments, the method further comprises filtering the first records, the second records, or both. In some embodiments, the filtering comprises removing outliers, removing background noise, removing data without annotation data, normalizing, scaling, variance correcting, Weighted Gene Co-expression Network Analysis, enrichment analysis, dimensionality reduction, or any combination thereof. In some embodiments, the normalizing is performed by Robust Multi-Array Analysis (RMA), Guanine Cytosine Robust Multi-Array Analysis (GCRMA), Linear Models for Microarray Data, variance stabilizing transformation (VST), normal-exponential quantile correction (NEQC), or any combination thereof. In some embodiments, the variance correction comprises employing a local empirical Bayesian shrinkage, adjusting the p-values for multiple hypothesis testing using the Benjamini-Hochberg correction, and removing all data with a set false discovery rate.

[0078] In some embodiments, the false discovery rate is about 0.000001 to about 0.2. In some embodiments, the false discovery rate is at least about 0.000001. In some embodiments, the false discovery rate is at most about 0.2. In some embodiments, the false discovery rate is about 0.000001 to about 0.00005, about 0.000001 to about 0.00001, about 0.000001 to about 0.0005, about 0.000001 to about 0.0001, about 0.000001 to about 0.005, about 0.000001 to about 0.001, about 0.000001 to about 0.05, about 0.000001 to about 0.01, about 0.000001 to about 0.2, about 0.00005 to about 0.00001, about 0.00005 to about 0.0005, about 0.00005 to about 0.0001, about 0.00005 to about 0.005, about 0.00005 to about 0.001, about 0.00005 to about 0.05, about 0.00005 to about 0.01, about 0.00005 to about 0.2, about 0.00001 to about 0.0005, about 0.00001 to about 0.0001, about 0.00001 to about 0.005, about 0.00001 to about 0.001, about 0.00001 to about 0.05, about 0.00001 to about 0.01, about 0.00001 to about 0.2, about 0.0005 to about 0.0001, about 0.0005 to about 0.005, about 0.0005 to about 0.001, about 0.0005 to about 0.05, about 0.0005 to about 0.01, about 0.0005 to about 0.2, about 0.0001 to about 0.005, about 0.0001 to about 0.001, about 0.0001 to about 0.05, about 0.0001 to about 0.01, about 0.0001 to about 0.2, about 0.005 to about 0.001, about 0.005 to about 0.05, about 0.005 to about 0.01, about 0.005 to about 0.2, about 0.001 to about 0.05, about 0.001 to about 0.01, about 0.001 to about 0.2, about 0.05 to about 0.01, about 0.05 to about 0.2, or about 0.01 to about 0.2. In some embodiments, the false discovery rate is about 0.000001, about 0.00005, about 0.00001, about 0.0005, about 0.0001, about 0.005, about 0.001, about 0.05, about 0.01, or about 0.2.
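A minimal sketch of the Benjamini-Hochberg step from paragraph [0077], applied against a false discovery rate threshold such as those listed above. It assumes features are kept when their adjusted p-value falls below the threshold, which is the usual convention; the empirical Bayes shrinkage that would produce the raw p-values is not shown, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def keep_below_fdr(features, p_values, fdr=0.05):
    """Keep features whose adjusted p-value is below the FDR threshold."""
    q = benjamini_hochberg(p_values)
    return [f for f, qv in zip(features, q) if qv < fdr]
```

For raw p-values 0.01, 0.04, 0.03, and 0.005, the adjusted values are 0.02, 0.04, 0.04, and 0.02, so a threshold of 0.03 keeps the first and fourth features.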

[0079] In some embodiments, the Weighted Gene Co-expression Network Analysis comprises calculating a topology matrix, clustering the data based on the topology matrix, and correlating module eigenvalues with traits: for traits on a linear scale by Pearson correlation, for nonparametric traits by Spearman correlation, and for dichotomous traits by point-biserial correlation or t-test. The Pearson correlation, or product-moment correlation coefficient (PMCC), is a number between -1 and 1 that indicates the extent to which two variables are linearly related. The Spearman correlation is a nonparametric measure of rank correlation, i.e., of the statistical dependence between the rankings of two variables.
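The trait-correlation step of paragraph [0079] can be sketched in plain Python. The sketch takes one module summary value per sample (the text says "module eigenvalues"; WGCNA conventionally summarizes a module by its eigengene, the first principal component of the module's expression) and correlates it with a trait. Spearman is computed as Pearson on average ranks, and point-biserial as Pearson against a 0/1-coded trait; the t-test alternative and the network construction are not shown. Function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def point_biserial(x, group):
    """Correlation of a continuous variable with a dichotomous trait."""
    return pearson(x, [1.0 if g else 0.0 for g in group])
```

Because Spearman works on ranks, any strictly increasing relationship (for example, a module value rising with the square of a trait) scores 1 even though the Pearson coefficient would be below 1.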

[0080] In another aspect, the present disclosure provides a non-transitory computer-readable storage medium encoded with a computer program including instructions executable by a processor to create an application for identifying one or more records having a specific phenotype, the application comprising: a first receiving module receiving a plurality of first records, wherein each first record is associated with one or more of a plurality of phenotypes; a second receiving module receiving a plurality of second records, wherein each second record is associated with one or more of the plurality of phenotypes, and wherein the plurality of second records and the plurality of first records are non-overlapping; a machine learning module applying a machine learning algorithm to at least one first record and at least one second record to determine a classifier; a third receiving module receiving a plurality of third records, wherein the third records are distinct from the plurality of first records and the plurality of second records; and a classifying module applying the classifier to the plurality of third records to identify one or more third records associated with the specific phenotype.

[0081] In some embodiments, the first records and the second records comprise nucleic acid sequencing data, transcriptome data, genome data, epigenome data, proteome data, metabolome data, virome data, methylome data, lipidomic data, lineage-ome data, nucleosomal occupancy data, a genetic variant, a gene fusion, an insertion or deletion (indel), or any combination thereof. In some embodiments, the first records and the second records are in different formats. In some embodiments, the first records and the second records are from different sources, different studies, or both. In some embodiments, the phenotype comprises a disease state, an organ involvement, a medication response, or any combination thereof. In some embodiments, the classifier comprises an elastic generalized linear model classifier, a k-nearest neighbors classifier, a random forest classifier, or any combination thereof. In some embodiments, the elastic generalized linear model classifier employs an elastic penalty of about 0.9. In some embodiments, the k-nearest neighbors classifier employs a K-value of about 5% of the size of the plurality of distinct first data sets. In some embodiments, the K-value of the random forest classifier is incremented by 1 if the k-value is an even number. In some embodiments, applying a machine learning algorithm to the third data set comprises applying a machine learning algorithm to a plurality of unique third data sets. In some embodiments, said classifier identifies said one or more third records associated with the specific phenotype with an accuracy of at least about 70%. In some embodiments, the method further comprises filtering the first records, the second records, or both. In some embodiments, the filtering comprises removing outliers, removing background noise, removing data without annotation data, normalizing, scaling, variance correcting, Weighted Gene Co-expression Network Analysis, enrichment analysis, dimensionality reduction, or any combination thereof. In some embodiments, the normalizing is performed by Robust Multi-Array Analysis (RMA), Guanine Cytosine Robust Multi-Array Analysis (GCRMA), Linear Models for Microarray Data, variance stabilizing transformation (VST), normal-exponential quantile correction (NEQC), or any combination thereof. In some embodiments, the variance correction comprises employing a local empirical Bayesian shrinkage, adjusting the p-values for multiple hypothesis testing using the Benjamini-Hochberg correction, and removing all data with a false discovery rate of less than 0.2. In some embodiments, the Weighted Gene Co-expression Network Analysis comprises calculating a topology matrix, clustering the data based on the topology matrix, and correlating module eigenvalues for traits on a linear scale by Pearson correlation, for nonparametric traits by Spearman correlation, and for dichotomous traits by point-biserial correlation or t-test.

[0082] In some embodiments, the plurality of quantitative measures comprises gene expression measurements. In some embodiments, the immunological state comprises an active or inactive state of each of one or more of the plurality of genomic loci. In some embodiments, the plurality of genomic loci comprises one or more genes selected from the group consisting of: RAB4B, ADAR, MRPL44, CDCA5, MYD88, SNN, BRD3, C7orf43, CDC20, SP1, POFUT1, SAMD4B, ATP6V1B2, TSPAN9, SP140, STK26, IRF4, LCP1, LMO2, SF3B4, HIST2H2AA3, CITED4, ADAM8, TICAM1, and HSD17B7.

Biological Data Analysis

[0083] In another aspect, the present disclosure provides a computer-implemented method for assessing a condition of a subject, comprising: (a) receiving a dataset of a biological sample of the subject; (b) selecting one or more data analysis tools, wherein the one or more data analysis tools comprise an analysis tool selected from the group consisting of: a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring™ analysis tool, a Gene Set Variation Analysis (GSVA) tool (e.g., P-Scope), a CoLTs® (Combined Lupus Treatment Scoring) analysis tool, and a Target Scoring analysis tool, or a combination thereof; (c) processing the dataset using the one or more data analysis tools to generate a data signature of the biological sample of the subject; and (d) based at least in part on the data signature generated in (c), assessing the condition of the subject. For use in the context of the methods set forth in the present disclosure, any tools and methods known to those of skill in the art may be applied, e.g., as described in "Machine Learning Disease Prediction and Treatment Prioritization," published as U.S. Pat. App. Pub. No. 2021/0104321 (and WO 2020/102043), incorporated herein by reference in its entirety.

[0084] In some embodiments, the dataset comprises mRNA gene expression or transcriptome data, DNA genomic data, proteomic data, metabolomic data, or a combination thereof. In some embodiments, the biological sample comprises a whole blood (WB) sample, a PBMC sample, a tissue sample, a cell sample, or any derivative thereof. In some embodiments, assessing the condition of the subject comprises identifying a disease or disorder of the subject.

[0085] In some embodiments, the method further comprises identifying a disease or disorder of the subject at a sensitivity or specificity of at least about 70%. In some embodiments, the method further comprises determining a likelihood of the identification of the disease or disorder of the subject. In some embodiments, the method further comprises providing a therapeutic intervention for the disease or disorder of the subject. In some embodiments, the method further comprises monitoring the disease or disorder of the subject, wherein the monitoring comprises assessing the disease or disorder of the subject at a plurality of time points, wherein the assessing is based at least on the disease or disorder identified at each of the plurality of time points.
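Sensitivity and specificity are standard confusion-matrix ratios. A minimal sketch, assuming binary labels with 1 for disease and 0 for control; the function names and the 0.70 default cutoff (taken from the "about 70%" figure above) are illustrative.

```python
from typing import Sequence, Tuple

def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); labels are 1 = disease, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, called diseased
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # control, called control
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # control, falsely called diseased
    return tp / (tp + fn), tn / (tn + fp)

def meets_cutoff(y_true: Sequence[int], y_pred: Sequence[int],
                 cutoff: float = 0.70) -> Tuple[bool, bool]:
    """Report separately whether sensitivity and specificity each reach the cutoff."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return sens >= cutoff, spec >= cutoff
```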

[0086] In some embodiments, selecting the one or more data analysis tools comprises receiving a user selection of the one or more data analysis tools. In some embodiments, selecting the one or more data analysis tools is automatically performed by the computer without receiving a user selection of the one or more data analysis tools.

[0087] In another aspect, the present disclosure provides a computer system for assessing a condition of a subject, comprising: a database that is configured to store a dataset of a biological sample of the subject; and one or more computer processors operatively coupled to the database, wherein the one or more computer processors are individually or collectively programmed to: (i) select one or more data analysis tools, wherein the one or more data analysis tools comprise an analysis tool selected from the group consisting of: a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring ™ analysis tool, a Gene Set Variation Analysis (GSVA) tool (e.g., P-Scope), a CoLTs® (Combined Lupus Treatment Scoring) analysis tool, and a Target Scoring analysis tool; (ii) process the dataset using the one or more data analysis tools to generate a data signature of the biological sample of the subject; and (iii) based at least in part on the data signature generated in (ii), assess the condition of the subject.

[0088] In another aspect, the present disclosure provides a non-transitory computer readable medium comprising machine-executable code that, upon execution by one or more computer processors, implements a method for assessing a condition of a subject, the method comprising: (a) receiving a dataset of a biological sample of the subject; (b) selecting one or more data analysis tools, wherein the one or more data analysis tools comprise an analysis tool selected from the group consisting of: a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring ™ analysis tool, a Gene Set Variation Analysis (GSVA) tool (e.g., P-Scope), a CoLTs® (Combined Lupus Treatment Scoring) analysis tool, and a Target Scoring analysis tool; (c) processing the dataset using the one or more data analysis tools to generate a data signature of the biological sample of the subject; and (d) based at least in part on the data signature generated in (c), assessing the condition of the subject. In any embodiment described herein, the one or more data analysis tools may be a plurality of data analysis tools each independently selected from a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring ™ analysis tool, a Gene Set Variation Analysis (GSVA) tool (e.g., P-Scope), a CoLTs® (Combined Lupus Treatment Scoring) analysis tool, and a Target Scoring analysis tool.

[0089] In an aspect, the present disclosure provides systems and methods for using bioinformatics approaches to deconvolute bulk mRNA for the various cells and processes involved in lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) organ pathology, including inflammatory cells, endothelial cells, and tissue cells.
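One common way to deconvolute bulk expression, shown here only as an illustration, is to model each bulk sample as a non-negative mixture of reference cell-type expression profiles and solve a non-negative least-squares (NNLS) problem. The text does not state that NNLS is the deconvolution approach used here (the figures rely on GSVA of cell signatures), so this is a swapped-in technique. The sketch uses cyclic coordinate descent instead of a library solver; the reference matrix and mixing weights in the test are invented.

```python
from typing import List

def nnls_coordinate_descent(A: List[List[float]], b: List[float],
                            iters: int = 2000) -> List[float]:
    """Minimise ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent.

    A: genes x cell types reference matrix (expression of each gene in each cell type).
    b: bulk expression of the same genes in one sample.
    Returns non-negative cell-type weights x.
    """
    n_genes, n_types = len(A), len(A[0])
    x = [0.0] * n_types
    r = list(b)  # residual b - A x, with x = 0 initially
    col_sq = [sum(A[g][j] ** 2 for g in range(n_genes)) for j in range(n_types)]
    for _ in range(iters):
        for j in range(n_types):
            if col_sq[j] == 0:
                continue
            # exact minimiser along coordinate j, then projected onto x_j >= 0
            step = sum(A[g][j] * r[g] for g in range(n_genes)) / col_sq[j]
            new_xj = max(0.0, x[j] + step)
            delta = new_xj - x[j]
            if delta != 0.0:
                for g in range(n_genes):
                    r[g] -= A[g][j] * delta  # keep the residual in sync with x
                x[j] = new_xj
    return x
```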

[0090] In an aspect, the present disclosure provides systems and methods for the delineation of the altered metabolism of cells by using gene expression analysis.

[0091] In an aspect, the present disclosure provides systems and methods for using various regression models (e.g., classification and regression trees, linear regression, step-wise regression) to dissect the specific metabolic alterations in individual cell types.
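Stepwise regression of a metabolic signature score on candidate cell-signature scores can be sketched as forward selection. This dependency-free illustration rests on several assumptions, since the disclosure specifies neither the selection criterion nor the direction: ordinary least squares with an intercept, AIC computed as n * ln(RSS / n) + 2k, and greedy forward-only steps. The feature names in the test are hypothetical.

```python
import math
from typing import Dict, List

def ols_rss(predictors: List[List[float]], y: List[float]) -> float:
    """Residual sum of squares for y ~ intercept + predictors (ordinary least squares)."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    k = len(cols)
    # Augmented normal equations [X^T X | X^T y]; assumes X has full column rank.
    M = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
         + [sum(cols[i][t] * y[t] for t in range(n))] for i in range(k)]
    for p in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(p, k), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        for r in range(k):
            if r != p:
                f = M[r][p] / M[p][p]
                M[r] = [a - f * c for a, c in zip(M[r], M[p])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(beta[j] * cols[j][t] for j in range(k)) for t in range(n)]
    return sum((y[t] - fitted[t]) ** 2 for t in range(n))

def forward_stepwise(features: Dict[str, List[float]], y: List[float]) -> List[str]:
    """Greedily add the feature that most lowers AIC; stop when no addition helps."""
    n = len(y)

    def aic(names: List[str]) -> float:
        rss = ols_rss([features[f] for f in names], y)
        return n * math.log(max(rss, 1e-12) / n) + 2 * (len(names) + 1)

    chosen: List[str] = []
    best = aic(chosen)
    while True:
        candidates = [f for f in features if f not in chosen]
        if not candidates:
            return chosen
        scores = {f: aic(chosen + [f]) for f in candidates}
        f_best = min(scores, key=scores.get)
        if scores[f_best] >= best:
            return chosen
        chosen.append(f_best)
        best = scores[f_best]
```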

[0092] In an aspect, the present disclosure provides systems and methods for using animal models and the ability to translate mouse gene expression into the human equivalent to confirm the results in humans and also analyze the effects of treatment.
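In its simplest form, translating mouse expression into the human equivalent means re-keying mouse genes by their human orthologs. The three-entry table below is a hypothetical, hand-written example built from genes named elsewhere in this document (Havcr1, Lcn2, Hif1a). A real analysis would load a complete orthology map from a curated resource. In this sketch, genes without a mapped ortholog are dropped, and one-to-many or many-to-one orthologs are not handled.

```python
from typing import Dict

# Hypothetical, hand-curated orthology table (mouse symbol -> human symbol).
MOUSE_TO_HUMAN: Dict[str, str] = {
    "Havcr1": "HAVCR1",
    "Lcn2": "LCN2",
    "Hif1a": "HIF1A",
}

def to_human(mouse_expr: Dict[str, float],
             table: Dict[str, str] = MOUSE_TO_HUMAN) -> Dict[str, float]:
    """Re-key mouse expression values by human ortholog; unmapped genes are dropped."""
    return {table[g]: v for g, v in mouse_expr.items() if g in table}
```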

[0093] In an aspect, the present disclosure provides systems and methods for the delineation of the role of specific cells (myeloid cells) and processes (interferon, mitochondrial dysfunction) in lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) tissue pathology.

[0094] In an aspect, the present disclosure provides systems and methods for using non-lymphocyte populations in skin and kidney toward diagnostic and/or prognostic biopsy tests.

[0095] In an aspect, the present disclosure provides systems and methods for defining gene signatures in individual cell types in a mixed population such as blood or tissue (e.g., skin, kidney).

[0096] In an aspect, the present disclosure provides systems and methods for analyzing sets of metabolism genes and their relationship to function and cell type, including subsets of myeloid cells.

[0097] Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.

[0098] Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.

[0099] Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.

INCORPORATION BY REFERENCE

[0100] All publications, including any supplementary materials, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

[0101] The patent application file contains at least one drawing executed in color. Copies of this patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

[0102] The novel features of the disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings of which:

[0103] FIGs. 1A-1I show that dysregulation of metabolic gene signatures is common among lupus-affected tissues. FIG. 1A: Comparison of DEGs among DLE, class III/IV LN GL, and class III/IV LN TI (lupus nephritis tubulointerstitial inflammation). FIG. 1B: MCODE protein-protein interactions of common UP and DOWN DEGs were generated with Cytoscape using the STRING and ClusterMaker2 plugins and annotated with BIG-C functional categories (odds ratio (OR) > 1, p < 0.05) in Adobe Illustrator. Overlap p-value was calculated using Fisher’s exact test. GSVA of signatures for glycolysis (FIG. 1C), the PPP (FIG. 1D), the TCA cycle (FIG. 1E), OXPHOS (FIG. 1F), FAAO (FIG. 1G), FABO (FIG. 1H), and AA metabolism (FIG. 1I) in lupus tissues and controls (CTLs; unshaded plot on left in each comparison).
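For two DEG lists, the overlap p-value can be computed from the hypergeometric distribution, which is what a one-sided Fisher's exact test on the 2x2 overlap table reduces to. A minimal sketch using Python's `math.comb`. The one-sided (enrichment) direction and the size of the gene universe are assumptions, since the text states neither.

```python
from math import comb

def overlap_p_value(universe: int, size_a: int, size_b: int, overlap: int) -> float:
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    Probability that two random gene sets of sizes size_a and size_b, drawn from
    `universe` genes, share at least `overlap` genes.
    """
    upper = min(size_a, size_b)
    total = comb(universe, size_b)
    hits = sum(comb(size_a, k) * comb(universe - size_a, size_b - k)
               for k in range(overlap, upper + 1))
    return hits / total
```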

[0104] FIGs. 2A-2C show that increased myeloid cell signatures and decreased non-hematopoietic cell signatures characterize the majority of lupus patients. FIG. 2A: Hedges’ g effect sizes of immune and non-hematopoietic cell signatures in DLE, class III/IV LN GL, and class III/IV LN TI as compared to tissue CTLs. Significant p-values reflect significant differences in enrichment of the immune cell signatures or non-hematopoietic cell signatures in lupus tissues as compared to CTL as determined by Welch’s t-test with Bonferroni correction (FIG. 9). FIG. 2B: R² values derived from linear regression of the monocyte-derived macrophage or the tissue-resident macrophage markers with the monocyte/MC GSVA scores in individual patients and CTLs from lupus-affected tissues (FIGs. 11A-11B). Significant p-values reflect significantly non-zero slopes. FIG. 2C: Pearson correlation coefficients between tissue-resident macrophage markers in LN.
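Hedges' g is Cohen's d computed with the pooled standard deviation, multiplied by the small-sample correction J = 1 - 3/(4(n1 + n2) - 9). The following minimal sketch computes g, the Welch t statistic, and Bonferroni-adjusted p-values. Converting the Welch statistic into a p-value requires a t-distribution CDF (for example, `scipy.stats.ttest_ind(..., equal_var=False)`); that step is left out so the sketch stays dependency-free. The example numbers are invented.

```python
import math
from statistics import mean, variance
from typing import List, Sequence

def hedges_g(case: Sequence[float], ctl: Sequence[float]) -> float:
    """Standardised mean difference (case - control) with small-sample correction."""
    n1, n2 = len(case), len(ctl)
    pooled_sd = math.sqrt(((n1 - 1) * variance(case) + (n2 - 1) * variance(ctl))
                          / (n1 + n2 - 2))
    d = (mean(case) - mean(ctl)) / pooled_sd
    return d * (1 - 3 / (4 * (n1 + n2) - 9))

def welch_t(case: Sequence[float], ctl: Sequence[float]) -> float:
    """Welch's t statistic (unequal variances); a p-value additionally needs a t CDF."""
    return (mean(case) - mean(ctl)) / math.sqrt(variance(case) / len(case)
                                                + variance(ctl) / len(ctl))

def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```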

[0105] FIGs. 3A-3U show that metabolic and cellular signature changes in class II LN GL are similar to those seen in class III/IV. GSVA of metabolic pathway signatures (FIGs. 3A-3G) and cell signatures (FIGs. 3H-3T) in all classes of LN GL. Each point represents an individual sample. Significant differences in enrichment of the metabolic signatures, immune cell signatures, or non-hematopoietic cell signatures between class II LN GL and CTL, class III/IV LN GL and CTL, and class II LN GL and class III/IV LN GL were determined by Welch’s t-test with Bonferroni correction. FIG. 3U: Hierarchical clustering (k = 4) of all glomerular samples.
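Hierarchical clustering into k = 4 groups can be sketched as agglomerative clustering that repeatedly merges the closest pair of clusters until k remain. Average linkage and Euclidean distance are assumptions here, because the text specifies only k. The points in the test stand in for per-sample GSVA score profiles and are invented.

```python
import math
from itertools import combinations
from typing import List, Sequence

def average_linkage(samples: Sequence[Sequence[float]], k: int) -> List[int]:
    """Agglomerate samples (e.g. vectors of GSVA scores) until k clusters remain."""
    clusters: List[List[int]] = [[i] for i in range(len(samples))]

    def linkage(a: List[int], b: List[int]) -> float:
        # average Euclidean distance between all cross-cluster pairs
        return sum(math.dist(samples[i], samples[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[x].extend(clusters[y])  # x < y, so deleting y leaves x in place
        del clusters[y]
    labels = [0] * len(samples)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```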

[0106] FIGs. 4A-4H show that metabolic gene expression changes in LN GL are associated with changes in the EC, kidney cell, and fibroblast gene signatures. Stepwise regression coefficients (FIG. 4A) and CART analysis (FIGs. 4B-4H) for metabolic pathway signatures in all glomerular LN samples and CTLs.

[0107] FIGs. 5A-5O show that mitochondrial and peroxisomal signature changes and local hypoxia contribute to changes in metabolic gene expression in specific cells. GSVA of signatures for mitochondria- (FIGs. 5A-5F) or peroxisome-related gene signatures (FIGs. 5G-5H) in lupus tissues and CTLs. Each point represents an individual sample. FIGs. 5I-5K: Stepwise regression coefficients for mitochondrial and peroxisomal signatures in all tissues and CTLs. FIG. 5L: GSVA of HIF1A in lupus tissues and CTLs. Each point represents an individual sample. FIGs. 5M-5O: Stepwise regression coefficients for metabolic pathway signatures with the addition of HIF1A in all tissues and CTLs.

[0108] FIGs. 6A-6H show that metabolic gene expression changes occur independent of acute IFN stimulation in murine LN. FIG. 6A: GSVA of the IGS in the kidney of IFNα-accelerated NZB/W mice (GSE86423). FIGs. 6B-6H: GSVA of metabolic signatures and linear regression between the IGS and metabolic signature GSVA scores.
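The linear regression between IGS and metabolic-signature GSVA scores (and the R² values reported for FIG. 2B) comes from an ordinary least-squares line. A dependency-free sketch follows. Testing whether the slope differs significantly from zero would also require a t distribution, which is omitted here; the inputs are invented.

```python
from statistics import mean
from typing import Sequence, Tuple

def simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)
```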

[0109] FIGs. 7A-7E show that metabolic gene expression changes in murine LN are corrected with immunosuppressive treatment. GSVA of metabolism signatures in the kidney of NZM2410 (GSE32583, GSE49898) (FIG. 7A), NZB/W (GSE32583, GSE49898) (FIG. 7B), IFNα-accelerated NZB/W (GSE72410) (FIG. 7C), MRL/lpr (GSE153021) (FIG. 7D), and NZW/BXSB (GSE32583, GSE49898) mice (FIG. 7E) with and without treatment.

[0110] FIGs. 8A-8F show that cellular and metabolic gene expression changes correlate with expression of genes indicating tubular damage in human and murine LN. Log2 expression of HAVCR1/Havcr1 (FIG. 8A) and LCN2/Lcn2 (FIG. 8B) in human LN TI and the kidneys of NZM2410 (GSE32583, GSE49898), NZB/W (GSE32583, GSE49898), IFNα-accelerated NZB/W (GSE86423), IFNα-accelerated NZB/W (GSE72410), and MRL/lpr (GSE153021) mice.

[0111] FIGs. 9A-9O show that increased myeloid cell signatures and decreased tissue cell signatures characterize the majority of lupus patients. GSVA of signatures for granulocytes (FIG. 9A), pDCs (FIG. 9B), dendritic cells (FIG. 9C), monocyte/MCs (FIG. 9D), T cells (FIG. 9E), B cells (FIG. 9F), plasma cells (FIG. 9G), platelets (FIG. 9H), immune cells (FIG. 9I) with expression found only in DLE, endothelial cells (FIG. 9J), fibroblasts (FIG. 9K), skin cells (FIG. 9L), kidney cells (FIG. 9M), glomerular cells (FIG. 9N), and tubule cells (FIG. 9O) in lupus tissues and CTLs.

[0112] FIG. 10 shows that anergic/activated T cell marker genes show no change in expression in LN class III/IV. Log2 expression of CD160, CD244, CTLA4, ICOS, KLRG1, LAG3, and PDCD1 in lupus tissues and CTLs.

[0113] FIGs. 11A-11B show that monocyte/MC gene signatures reflect both monocyte-derived macrophage and tissue-resident macrophage populations. Linear regression between the monocyte/MC GSVA score and FCN1 expression (FIG. 11A) or TRM marker expression (FIG. 11B) in lupus-affected tissues.

[0114] FIG. 12 shows that metabolic and cellular gene expression changes in class II LN GL are similar to those seen in class III/IV. Hierarchical clustering (k = 4) of class II LN GL samples (n = 8).

[0115] FIGs. 13A-13U show that metabolic and cellular gene expression changes in class II LN TI are less robust than those seen in class III/IV. GSVA of metabolic pathway signatures (FIGs. 13A-13G) and cell signatures (FIGs. 13H-13T) in all classes of LN TI. Each point represents an individual sample. Significant differences in enrichment of the metabolic signatures, immune cell signatures, or non-hematopoietic cell signatures between class II LN TI and CTL, class III/IV LN TI and CTL, and class II LN TI and class III/IV LN TI were determined by Welch’s t-test with Bonferroni correction. FIG. 13U: Hierarchical clustering (k = 4) of all tubulointerstitial samples.

[0116] FIG. 14 shows that metabolic and cellular gene expression changes in some class II LN TI patients are similar to those seen in class III/IV patients. Hierarchical clustering (k = 4) of class II LN TI samples (n = 8). Each of FIGs. 13A-13T shows, from left to right: CTL points; class II LN TI points; and class III/IV LN TI points.

[0117] FIGs. 15A-15B show that numerous cellular gene signatures contribute to the observed metabolic changes in DLE. FIG. 15A: Stepwise regression coefficients for metabolic pathway GSVA scores in all samples for DLE and CTLs. For stepwise regression, the pDC, skin-specific DC, monocyte/MC, T cell, anergic/activated T cell, B cell, and plasma cell signatures were combined into the “inflammatory cell” signature because of collinearity. FIG. 15B: Hierarchical clustering (k = 2) of all skin samples.

[0118] FIGs. 16A-16H show that metabolic gene expression changes in LN TI are associated with changes in the kidney cell, proximal tubule, and monocyte/MC gene signatures. Stepwise regression coefficients (FIG. 16A) and CART (FIGs. 16B-16H) analysis for metabolic pathway signatures in all tubulointerstitial LN samples and CTLs.

[0119] FIG. 17 shows that metabolic genes are altered in scRNA-seq data from LN biopsies. DEGs related to metabolism in scRNA-seq clusters (CM2, left panel: tissue-resident macrophages; CT0a, center panel: effector memory CD4+ T cells; and CE0, right panel: epithelial cells) that were present in both LN patient and CTL samples from Arazi et al. (Ref. 30).

[0120] FIGs. 18A-18Q show that cellular gene expression changes in NZM2410 kidneys may be corrected with immunosuppressive treatment. GSVA of immune (FIGs. 18A-18H) and non-hematopoietic (FIGs. 18I-18Q) cell signatures in the kidneys of NZM2410 mice (GSE32583, GSE49898) with and without treatment.

[0121] FIGs. 19A-19R show that cellular gene expression changes in NZB/W kidneys may be corrected with immunosuppressive treatment. GSVA of immune (FIGs. 19A-19I) and non-hematopoietic (FIGs. 19J-19R) cell signatures in the kidneys of NZB/W mice (GSE32583, GSE49898) with and without treatment. From left to right in each graph: plot 1 -

[0122] FIGs. 20A-20S show that immune/inflammatory cell gene expression is increased and proximal tubule cell gene expression is decreased in IFNα-accelerated NZB/W kidneys. GSVA of immune (FIGs. 20A-20J) and non-hematopoietic (FIGs. 20K-20S) cell signatures in the kidneys of IFNα-accelerated NZB/W mice (GSE86423).

[0123] FIGs. 21A-21S show that cellular gene expression changes in IFNα-accelerated NZB/W kidneys may be corrected with immunosuppressive treatment. GSVA of immune (FIGs. 21A-21J) and non-hematopoietic (FIGs. 21K-21S) cell signatures in the kidneys of IFNα-accelerated NZB/W mice (GSE72410) with and without treatment.

[0124] FIGs. 22A-22R show that cellular gene expression in the MRL/lpr kidney is not significantly altered. GSVA of immune (FIGs. 22A-22I) and non-hematopoietic (FIGs. 22J-22R) cell signatures in the kidneys of MRL/lpr mice (GSE153021) with and without treatment.

[0125] FIGs. 23A-23Q show that immune/inflammatory cell gene expression is increased and kidney cell and proximal tubule cell gene expression is decreased in NZW/BXSB kidneys. GSVA of immune (FIGs. 23A-23H) and non-hematopoietic (FIGs. 23I-23Q) cell signatures in the kidneys of NZW/BXSB mice (GSE32583, GSE49898).

[0126] FIGs. 24A-24F show that cellular gene expression changes in murine LN correlate with metabolic gene signatures. Pearson correlation coefficients for all metabolic pathway and cellular GSVA scores in all samples of each murine LN model: NZM2410 (GSE32583, GSE49898) (FIG. 24A), NZB/W (GSE32583, GSE49898) (FIG. 24B), IFNα-accelerated NZB/W (GSE86423) (FIG. 24C), IFNα-accelerated NZB/W (GSE72410) (FIG. 24D), MRL/lpr (GSE153021) (FIG. 24E), and NZW/BXSB (GSE32583, GSE49898) (FIG. 24F).

[0127] FIG. 25 shows that cellular and metabolic gene expression changes correlate with expression of genes indicating tubular damage in murine LN. Correlation between Havcr1 or Lcn2 gene expression and GSVA scores for kidney cell, proximal tubule, and TCA cycle in all samples from the kidneys of NZM2410 (GSE32583, GSE49898), NZB/W (GSE32583, GSE49898), IFNα-accelerated NZB/W (GSE86423), IFNα-accelerated NZB/W (GSE72410), MRL/lpr (GSE153021), and NZW/BXSB (GSE32583) mice.

[0128] FIGs. 26A-26F show alteration/dysregulation of metabolic gene signatures in lupus-, psoriasis-, atopic dermatitis-, and scleroderma-affected tissues. Each graph shows comparison of DEGs among class III/IV LN GL (violin plot 2), class III/IV LN TI (violin plot 4), DLE (violin plot 6), PSO (violin plot 8), AD (violin plot 10), and SSc (violin plot 12), and respective controls (unshaded violin plots 1, 3, 5, 7, 9, and 11 in each panel). The graphs show GSVA of signatures for glycolysis (FIG. 26A), the PPP (FIG. 26B), the TCA cycle (FIG. 26C), OXPHOS (FIG. 26D), FABO (FIG. 26E), and AA metabolism (FIG. 26F) in lupus tissues and controls (CTLs). Each point represents an individual sample. Numbers below each tissue indicate the number of lupus patients with enrichment scores 1 SD less than (< 1SD) or greater than (> 1SD) the CTL mean. Significant p-values reflect significant differences in GSVA enrichment of the metabolic or cellular signatures in each lupus tissue as compared to CTL, as determined by Welch’s t-test with Bonferroni correction. **, p < 0.01; ***, p < 0.001; ****, p < 0.0001. See methods described in relation to FIGs. 1A-1I, Example 1.
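A short function can reproduce the per-tissue counts of patients whose enrichment scores fall more than 1 SD below or above the CTL mean. The sketch assumes the sample (n - 1) standard deviation of the control scores and strict inequalities; the scores in the test are invented.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

def count_beyond_one_sd(patients: Sequence[float], controls: Sequence[float]) -> Tuple[int, int]:
    """Return (n below CTL mean - 1 SD, n above CTL mean + 1 SD) for one signature."""
    mu, sd = mean(controls), stdev(controls)
    below = sum(1 for s in patients if s < mu - sd)
    above = sum(1 for s in patients if s > mu + sd)
    return below, above
```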

[0129] FIGs. 27A and 27B show that increased immune cell signatures and decreased non-hematopoietic cell signatures characterize the majority of lupus patients. FIG. 27A: Hedges’ g effect sizes of immune cell signatures in class III/IV LN GL, class III/IV LN TI, DLE, PSO, AD, and SSc as compared to tissue CTLs. FIG. 27B: Hedges’ g effect sizes of non-hematopoietic cell signatures in class III/IV LN GL, class III/IV LN TI, DLE, PSO, AD, and SSc as compared to tissue CTLs. Significant p-values reflect significant differences in GSVA enrichment of the metabolic or cellular signatures in each lupus tissue as compared to CTL, as determined by Welch’s t-test with Bonferroni correction. **, p < 0.01; ***, p < 0.001; ****, p < 0.0001. See methods described in relation to FIG. 2A, Example 1.

[0130] FIGs. 28A-28C show that metabolic and cellular gene signatures are concurrently altered in the tissues of inflammatory skin diseases, with different metabolic changes reflecting different cellular signatures. Stepwise regression coefficients are shown for the glycolysis (FIG. 28A), TCA cycle (FIG. 28B), and FABO (FIG. 28C) signatures in class III/IV LN GL, class III/IV LN TI, DLE, PSO, AD, SSc, and tissue CTLs. Significant p-values reflect significant coefficients in the stepwise regression model. *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001. See methods described in relation to FIGs. 15A and 16A, Example 1.

[0131] FIGs. 29A-29B show that DLE is characterized by enrichment of inflammatory cell and cytokine signatures, including the IFN, IL-12, and TNF signatures. FIG. 29A: Hierarchical clustering (k = 4 clusters) of DLE and healthy control samples from five lupus datasets using GSVA enrichment scores of cellular and pathway gene signatures. FIG. 29B: Hedges’ g effect sizes of cellular (left) and pathway (right) gene signatures for DLE compared to healthy control samples in five lupus datasets. Heatmap visualization uses red (enriched signature, >0) and blue (decreased signature, <0). Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001.

[0132] FIGs. 30A-30K show that enrichment of myeloid, lymphoid, IFN, IL-12, IL-23, and TNF signatures is shared among DLE, PSO, AD, and SSc. FIG. 30A: Hedges’ g effect sizes of cellular (left) and pathway (right) gene signatures for disease samples compared to their respective control samples in five DLE, three PSO, two AD, and three SSc datasets. Heatmap visualization uses red (enriched signature, >0) and blue (decreased signature, <0). Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001. CART analysis for disease or control classification using GSVA enrichment scores in lesional DLE (FIG. 30B), lesional PSO (FIG. 30C), lesional AD (FIG. 30D), and lesional SSc (FIG. 30E), and in non-lesional (NL) DLE (FIG. 30F), NL PSO (FIG. 30G), and NL AD (FIG. 30H). FIGs. 30I-30K: CART of non-lesional skin that was pooled without z-score normalization, for NL DLE (FIG. 30I), NL PSO (FIG. 30J), and NL AD (FIG. 30K). Sample numbers below the bottom leaves represent the number of samples of each group classified into that leaf.
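CART grows a tree by repeatedly choosing the feature and threshold whose split most reduces Gini impurity. The sketch below finds only the first (root) split, using midpoints between adjacent observed values as candidate thresholds. It is not a full CART implementation: there is no recursion, pruning, or stopping rule. The feature names and scores in the test are hypothetical.

```python
from typing import Sequence, Tuple

def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = disease, 0 = control)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of disease samples
    return 1.0 - p ** 2 - (1 - p) ** 2

def best_split(X: Sequence[Sequence[float]], y: Sequence[int],
               names: Sequence[str]) -> Tuple[str, float, float]:
    """Return (feature, threshold, weighted Gini) of the best single binary split."""
    n = len(y)
    best = (names[0], float("inf"), float("inf"))
    for j, name in enumerate(names):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if impurity < best[2]:
                best = (name, t, impurity)
    return best
```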

[0133] FIGs. 31A-31B show that analysis of cellular and molecular pathway signatures in lesional DLE shows increased expression of inflammatory pathways regulated by, e.g., monocytes, B cells, T cells, and plasmacytoid dendritic cells (pDC). GSVA enrichment scores (y-axis) of (FIG. 31A) cellular gene signatures and (FIG. 31B) pathway gene signatures in five datasets including DLE samples and control samples. The number of DLE samples per dataset that lie more than 1 standard deviation below the average of the control samples is denoted on the first subtext line. The number of DLE samples per dataset that lie more than 1 standard deviation above the average of the control samples is denoted on the second subtext line. Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001, as indicated where observed for a given pair of violin plots by a bracket and corresponding number of asterisks above the plots. Plots for DLE samples are shown in dark gray (the right plot of each pair of violin plots). Plots for control samples (CTL) are shown in light gray (the left plot of each pair of violin plots). In each panel, each pair of violin plots corresponds to analysis of an NCBI Gene Expression Omnibus dataset, from left to right, GSE52471, GSE72535, GSE81071, GSE81071(2), and GSE109248. The dotted horizontal line indicates a GSVA enrichment score of 0, with positive scores above and negative scores below. In FIG. 31A the panels show GSVA scores for cell types, in each row from left to right: row 1 (top row) - granulocyte, neutrophil, LDG, skin-specific DC, Langerhans; row 2 - pDC, monocyte, monocyte/myeloid, NK cell, T cell; row 3 - B cell, GC B cell, plasma cell, platelet, erythrocyte; row 4 - endothelial cell, fibroblast, keratinocyte, melanocyte. In FIG. 31B the panels show GSVA scores for pathways, in each row from left to right: row 1 (top row) - IFN, IL-1 cytokines, IL-12 complex, T cell IL-12 signature, IL-12, IL-17 complex; row 2 - IL-21 complex, IL-23 complex, T cell IL-23 signature, TGFB fibroblast, TNF, Th17; row 3 - anti-inflammation, complement proteins, inflammasome, ROS production, apoptosis, cell cycle; row 4 - immunoproteasome, proteasome, unfolded protein, glycolysis, pentose phosphate, TCA cycle; row 5 - OXPHOS, FAAO, FABO, AA metabolism, peroxisome.

[0134] FIGs. 32A-32B show that analysis of cellular and molecular pathway signatures in lesional PSO shows increased expression of keratinocyte cell signatures as well as TNF and Th17 pathway gene signatures. GSVA enrichment scores (y-axis) of (FIG. 32A) cellular gene signatures and (FIG. 32B) pathway gene signatures in three datasets including PSO samples and control samples. The number of PSO samples per dataset that lie more than 1 standard deviation below the average of the control samples is denoted on the first subtext line. The number of PSO samples per dataset that lie more than 1 standard deviation above the average of the control samples is denoted on the second subtext line. Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001, as indicated where observed for a given pair of violin plots by a bracket and corresponding number of asterisks above the plots. Plots for PSO samples are the right plot of each pair of violin plots. Plots for control samples (CTL) are the left plot of each pair of violin plots. In each panel, each pair of violin plots corresponds to analysis of an NCBI Gene Expression Omnibus dataset, from left to right, GSE52471, GSE109248, and GSE121212. The dotted horizontal line indicates a GSVA enrichment score of 0, with positive scores above and negative scores below. In FIG. 32A the panels show GSVA scores for cell types, in each row from left to right: row 1 (top row) - granulocyte, neutrophil, LDG, skin-specific DC, Langerhans; row 2 - pDC, monocyte, monocyte/myeloid, NK cell, T cell; row 3 - B cell, GC B cell, plasma cell, platelet, erythrocyte; row 4 - endothelial cell, fibroblast, keratinocyte, melanocyte. In FIG. 32B the panels show GSVA scores for pathways, in each row from left to right: row 1 (top row) - IFN, IL-1 cytokines, IL-12 complex, T cell IL-12 signature, IL-12, IL-17 complex; row 2 - IL-21 complex, IL-23 complex, T cell IL-23 signature, TGFB fibroblast, TNF, Th17; row 3 - anti-inflammation, complement proteins, inflammasome, ROS production, apoptosis, cell cycle; row 4 - immunoproteasome, proteasome, unfolded protein, glycolysis, pentose phosphate, TCA cycle; row 5 - OXPHOS, FAAO, FABO, AA metabolism, peroxisome.

[0135] FIGs. 33A-33B show that analysis of cellular and molecular pathway signatures in lesional AD shows increased expression of skin-specific dendritic cell, B cell, and IL-12 inflammatory pathway gene signatures. GSVA enrichment scores of (FIG. 33A) cellular gene signatures and (FIG. 33B) pathway gene signatures in two datasets including AD samples and control samples. The number of AD samples per dataset that lie more than 1 standard deviation below the average of the control samples is denoted on the first subtext line. The number of AD samples per dataset that lie more than 1 standard deviation above the average of the control samples is denoted on the second subtext line. Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001, as indicated where observed for a given pair of violin plots by a bracket and corresponding number of asterisks above the plots. Plots for AD samples are the right plot of each pair of violin plots. Plots for control samples (CTL) are the left plot of each pair of violin plots. In each panel, each pair of violin plots corresponds to analysis of an NCBI Gene Expression Omnibus dataset, from left to right, GSE130588 and GSE121212. The dotted horizontal line indicates a GSVA enrichment score of 0, with positive scores above and negative scores below. In FIG. 33A the panels show GSVA scores for cell types, in each row from left to right: row 1 (top row) - granulocyte, neutrophil, LDG, skin-specific DC, Langerhans; row 2 - pDC, monocyte, monocyte/myeloid, NK cell, T cell; row 3 - B cell, GC B cell, plasma cell, platelet, erythrocyte; row 4 - endothelial cell, fibroblast, keratinocyte, melanocyte. In FIG. 33B the panels show GSVA scores for pathways, in each row from left to right: row 1 (top row) - IFN, IL-1 cytokines, IL-12 complex, T cell IL-12 signature, IL-12, IL-17 complex; row 2 - IL-21 complex, IL-23 complex, T cell IL-23 signature, TGFB fibroblast, TNF, Th17; row 3 - anti-inflammation, complement proteins, inflammasome, ROS production, apoptosis, cell cycle; row 4 - immunoproteasome, proteasome, unfolded protein, glycolysis, pentose phosphate, TCA cycle; row 5 - OXPHOS, FAAO, FABO, AA metabolism, peroxisome.

[0136] FIGs. 34A-34B show that analysis of cellular and molecular pathway signatures in lesional SSc samples shows increased expression of myeloid-specific cell and TGFβ fibroblast gene signatures. GSVA enrichment scores of (FIG. 34A) cellular gene signatures and (FIG. 34B) pathway gene signatures in three datasets including SSc samples and control samples. The number of SSc samples per dataset that lie more than 1 standard deviation below the average of the control samples is denoted on the first subtext line. The number of SSc samples per dataset that lie more than 1 standard deviation above the average of the control samples is denoted on the second subtext line. Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001, as indicated where observed for a given pair of violin plots by a bracket and corresponding number of asterisks above the plots. Plots for SSc samples are shown in dark gray (the right plot of each pair of violin plots). Plots for control samples (CTL) are shown in light gray (the left plot of each pair of violin plots). In each panel, each pair of violin plots corresponds to analysis of an NCBI Gene Expression Omnibus dataset, from left to right, GSE58095, GSE95065, and GSE130955. The dotted horizontal line indicates a GSVA enrichment score of 0, with positive scores above and negative scores below. In FIG. 34A the panels show GSVA scores for cell types, in each row from left to right: row 1 (top row) - granulocyte, neutrophil, LDG, skin-specific DC, Langerhans; row 2 - pDC, monocyte, monocyte/myeloid, NK cell, T cell; row 3 - B cell, GC B cell, plasma cell, platelet, erythrocyte; row 4 - endothelial cell, fibroblast, keratinocyte, melanocyte. In FIG. 34B the panels show GSVA scores for pathways, in each row from left to right: row 1 (top row) - IFN, IL-1 cytokines, IL-12 complex, T cell IL-12 signature, IL-12, IL-17 complex; row 2 - IL-21 complex, IL-23 complex, T cell IL-23 signature, TGFB fibroblast, TNF, Th17; row 3 - anti-inflammation, complement proteins, inflammasome, ROS production, apoptosis, cell cycle; row 4 - immunoproteasome, proteasome, unfolded protein, glycolysis, pentose phosphate, TCA cycle; row 5 - OXPHOS, FAAO, FABO, AA metabolism, peroxisome.

[0137] FIGs. 35A-35H show that ML effectively classifies lesional skin samples from DLE, PSO, AD, and SSc. ROC curve (FIG. 35A) and PR curve (FIG. 35B) of lesional DLE, lesional PSO, lesional AD, and lesional SSc samples compared to pooled control samples using all cellular and pathway gene signatures. Top 15 features important in classifying (FIG. 35C) lesional DLE, (FIG. 35D) lesional PSO, (FIG. 35E) lesional AD, and (FIG. 35F) lesional SSc from pooled control samples using Gini feature importance. (FIG. 35G) Comparison of the top 15 features for classifying each lesional disease compared to control using Gini feature importance. (FIG. 35H, Table 7) Classification metrics to properly separate DLE, PSO, AD, or SSc and control samples using all 48 (top) or the top 15 (bottom) cellular and pathway gene signatures. Refer to Tables 5A-B for ML details. Collinear features were removed (FIG. 37). The AUC values of the ROC curves (FIG. 35A) for lesional DLE vs. control, lesional PSO vs. control, lesional AD vs. control, and lesional SSc vs. control classification are 0.977, 0.977, 0.963, and 0.965, respectively. The AUC values of the PR curves (FIG. 35B) for lesional DLE vs. control, lesional PSO vs. control, lesional AD vs. control, and lesional SSc vs. control are 0.972, 0.982, 0.970, and 0.968, respectively. Top 15 features important in classifying lesional DLE vs. control (FIG. 35C) are (in order of Gini index, highest to lowest) IFN, TNF, IL-23 Complex, Plasma Cell, T Cell IL-12 signature, IL-12 Complex, Monocyte, Inflammasome, Unfolded Protein, B Cell, T Cell, pDC, Anti-inflammation, Immunoproteasome, and T Cell IL-23 signature. Top 15 features important in classifying lesional PSO vs. control (FIG. 35D) are (in order of Gini index, highest to lowest) Cell Cycle, TNF, IL-12 Complex, Inflammasome, IFN, IL-23 complex, Apoptosis, Keratinocyte, Anti-inflammation, T Cell IL-23 signature, Proteasome, Unfolded Protein, Neutrophil, Pentose Phosphate, and Plasma Cell.
Top 15 features important in classifying lesional AD vs. control (FIG. 35E) are (in order of Gini index, highest to lowest) IL-12 Complex, TNF, IFN, T Cell IL-12 signature, Anti-inflammation, Inflammasome, Plasma Cell, IL-23 Complex, IL-21 Complex, T Cell IL-23 signature, Glycolysis, Immunoproteasome, Monocyte/Myeloid Cell, Cell Cycle and Apoptosis. Top 15 features important in classifying lesional SSc vs. control (FIG. 35F) are (in order of Gini index, highest to lowest) Plasma Cell, IFN, TNF, ROS production, Unfolded Protein, IL-12 Complex, Anti-inflammation, Apoptosis, TGFB Fibroblast, IL-23 Complex, Skin-specific DC, Granulocyte, pDC, IL-17 Complex, and T Cell IL-23 Signature. From FIG. 35G, the features shared by lesional DLE, PSO, AD and SSc are IFN, TNF, IL-23 Complex, Plasma Cell, IL-12 Complex, Anti-inflammation, and T Cell IL-23 Signature; features unique to lesional DLE are Monocyte, B Cell and T Cell; features unique to lesional AD are IL-21 Complex, Glycolysis, and Monocyte/Myeloid Cell; features unique to lesional PSO are Keratinocyte, Proteasome, Neutrophil, and Pentose Phosphate; and features unique to lesional SSc are ROS production, TGFB Fibroblast, Skin-specific DC, Granulocyte and IL-17 Complex.
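Gini feature importance, as commonly computed for random forests (mean decrease in impurity), accumulates for each feature the weighted drop in Gini impurity at every split on that feature, averaged over the trees. The minimal Python sketch below shows only the per-split quantity and the final top-15 ranking step; it is an illustration under that standard definition, not the implementation used for these figures, and the function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(parent, left, right):
    """Weighted decrease in Gini impurity when `parent` is split into `left` and `right`.

    Summing this quantity over every split on a given feature (and averaging over
    trees) gives that feature's mean-decrease-in-impurity importance.
    """
    n = len(parent)
    return (gini_impurity(parent)
            - len(left) / n * gini_impurity(left)
            - len(right) / n * gini_impurity(right))


def top_k_features(importances, k=15):
    """Return the k feature names with the highest importance, highest first."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]
```

For example, with hypothetical importance values, `top_k_features({"IFN": 0.12, "TNF": 0.09, "B Cell": 0.03}, k=2)` returns `["IFN", "TNF"]`. In scikit-learn, per-feature values of this kind are exposed as `RandomForestClassifier.feature_importances_`.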

[0138] FIGs. 36A - 36E show that ML accurately classifies lesional skin and control skin samples. ROC curve and PR curve of all ML algorithms to separate lesional samples from healthy control samples using all cellular and pathway gene signatures/features. ML classifiers include: logistic regression (LR, blue), K-nearest neighbors (KNN, orange), random forest (RF, green), naive Bayes (NB, red), support-vector machine (SVM, purple) and gradient boosting (GB, brown). (FIG. 36A) DLE versus control; (FIG. 36B) PSO versus control; (FIG. 36C) AD versus control; and (FIG. 36D) SSc versus control. (FIG. 36E, Table 8) Classification metrics including sensitivity, specificity, Cohen’s Kappa score, precision, F1 score and accuracy to properly separate lesional disease samples (DLE, PSO, AD or SSc) from healthy control samples with each ML classifier. Refer to Tables 5A-B for details about ML. Collinear features were removed (FIG. 37). The AUC values of the ROC curves (FIG. 36A) for lesional DLE vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.959, 0.975, 0.977, 0.974, 0.959, and 0.949 respectively. The AUC values of the PR curves (FIG. 36A) for lesional DLE vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.954, 0.963, 0.972, 0.971, 0.962, and 0.944 respectively. The AUC values of the ROC curves (FIG. 36B) for lesional PSO vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.986, 0.972, 0.977, 0.983, 0.984, and 0.978 respectively. The AUC values of the PR curves (FIG. 36B) for lesional PSO vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.988, 0.980, 0.982, 0.986, 0.986, and 0.982 respectively. The AUC values of the ROC curves (FIG. 36C) for lesional AD vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.955, 0.936, 0.963, 0.945, 0.945, and 0.968 respectively. The AUC values of the PR curves (FIG. 36C) for lesional AD vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.962, 0.961, 0.970, 0.959, 0.966, and 0.973 respectively. The AUC values of the ROC curves (FIG. 36D) for lesional SSc vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.964, 0.967, 0.965, 0.952, 0.980, and 0.946 respectively. The AUC values of the PR curves (FIG. 36D) for lesional SSc vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.967, 0.972, 0.968, 0.956, 0.983, and 0.955 respectively.
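An ROC AUC value can be read as the probability that a randomly chosen disease sample receives a higher classifier score than a randomly chosen control sample. The standard-library Python sketch below computes the AUC through that pairwise (Mann-Whitney) formulation; the function name and the 0/1 label encoding (1 = lesional disease, 0 = control) are illustrative assumptions, not details taken from the source.

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic.

    scores: classifier scores, higher meaning "more likely disease".
    labels: 1 for disease samples, 0 for control samples.
    A tie between a disease score and a control score counts as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one disease and one control sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])` returns `0.75`, since three of the four disease/control pairs are ordered correctly. The pairwise loop is quadratic in cohort size, which is acceptable for datasets of this scale; a rank-based formula gives the same value faster.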

[0139] FIGs. 37A - 37D show correlated features from cellular and pathway signatures used to identify collinear features for lesional ML binary classifications. Correlation plots of GSVA enrichment scores of pooled control samples and pooled lesional (FIG. 37A) DLE, (FIG. 37B) PSO, (FIG. 37C) AD and (FIG. 37D) SSc samples. Black boxes indicate collinear features (Pearson correlation coefficient greater than 0.8); for each such pair, the feature with the lower correlation was removed using a greedy elimination approach.
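The greedy collinearity filter can be sketched as follows. The 0.8 Pearson threshold comes from the text; everything else is an assumption made for illustration: that the absolute coefficient is compared, that pairs are handled from most to least correlated, and that "the feature with the lower correlation" means the one less correlated with the class label (1 = disease, 0 = control). The function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient (raises ZeroDivisionError for a constant vector)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(features, labels, threshold=0.8):
    """Greedy removal of collinear features.

    features: dict of feature name -> GSVA scores (one value per sample).
    labels: 0/1 class labels aligned with the samples.
    Returns the retained feature names in sorted order.
    """
    names = sorted(features)
    pairs = []
    for a, b in combinations(names, 2):
        r = abs(pearson(features[a], features[b]))
        if r > threshold:
            pairs.append((r, a, b))
    pairs.sort(reverse=True)  # most strongly correlated pairs first
    # Assumption: "lower correlation" = lower |r| with the class label.
    label_r = {n: abs(pearson(features[n], labels)) for n in names}
    dropped = set()
    for _, a, b in pairs:
        if a in dropped or b in dropped:
            continue  # pair already resolved by an earlier removal
        dropped.add(a if label_r[a] < label_r[b] else b)
    return [n for n in names if n not in dropped]
```

With two nearly identical signatures and one unrelated signature, the filter keeps whichever of the near-duplicates tracks the class label more closely and leaves the unrelated signature untouched.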

[0140] FIGs. 38A - 38B show that direct comparison of DLE and PSO samples using GSVA reveals key differences in enrichment of inflammatory cell and pathway signatures. (FIG. 38A) Hierarchical clustering (k=4 clusters) of GSVA enrichment scores of cellular and pathway gene signatures in two datasets including DLE, PSO and healthy control samples. (FIG. 38B) Heatmap of GSVA enrichment scores of cellular (left) and pathway (right) gene signatures in DLE compared to PSO samples in two datasets. Heatmap visualization uses red (enriched signature, >0) and blue (decreased signature, <0). Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001.
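Welch's t-test compares two group means without assuming equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom with the Python standard library, and maps a p-value to the asterisk thresholds used in the legends. Turning t and df into a p-value requires the Student t distribution, which the standard library does not provide; `scipy.stats.ttest_ind(x, y, equal_var=False)` returns both the statistic and the p-value. The function names here are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def significance_stars(p):
    """Asterisk notation: * p<0.05, ** p<0.01, *** p<0.001, **** p<0.0001."""
    for cutoff, stars in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < cutoff:
            return stars
    return ""  # not significant at the 0.05 level
```

For example, `significance_stars(0.005)` returns `"**"`, and a p-value of 0.2 yields an empty string (no bracket drawn).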

[0141] FIGs. 39A - 39F show that ML classification of DLE versus PSO, AD, and SSc confirms distinct disease-specific gene signatures. (FIG. 39A) ROC curve and (FIG. 39B) PR curve of lesional DLE samples compared to lesional PSO samples (purple) and lesional DLE samples compared to lesional AD samples (orange) using all cellular and pathway gene signatures. Top 15 features important in classifying (FIG. 39C) lesional DLE and lesional PSO, (FIG. 39D) lesional DLE and lesional AD, and (FIG. 39E) lesional DLE and lesional SSc using Gini feature importance. (FIG. 39F, Table 9) Classification metrics to properly separate lesional DLE samples from lesional PSO or lesional AD samples using all 48 (top) or the top 15 (bottom) cellular and pathway gene signatures. Refer to Tables 5A-B for ML details. Collinear features were removed (FIG. 41). The AUC values of the ROC curves (FIG. 39A) for lesional DLE vs. PSO, lesional DLE vs. AD, and lesional DLE vs. SSc classification are 0.902, 0.816, and 0.774 respectively. The AUC values of the PR curves (FIG. 39B) for lesional DLE vs. PSO, lesional DLE vs. AD, and lesional DLE vs. SSc classification are 0.845, 0.754, and 0.776 respectively. Top 15 features important in classifying lesional DLE vs. PSO (FIG. 39C) are (in order of Gini index, highest to lowest) Amino Acid Metabolism, Fibroblast, Keratinocyte, NK Cell, Granulocyte, Cell Cycle, Proteasome, Plasma Cell, pDC, Pentose Phosphate, IL-12, Monocyte, OXPHOS, Fatty Acid Alpha Oxidation and Glycolysis. Top 15 features important in classifying lesional DLE vs. AD (FIG. 39D) are (in order of Gini index, highest to lowest) Glycolysis, TGFB Fibroblast, Langerhans Cell, Low Density Granulocyte, Cell Cycle, Melanocyte, Fibroblast, Complement Proteins, Amino Acid Metabolism, pDC, IFN, Monocyte, IL-21 Complex, Platelet and IL-12 complex. Top 15 features important in classifying lesional DLE vs. SSc (FIG. 39E) are (in order of Gini index, highest to lowest) Th17, TGFB Fibroblast, IL-12, IFN, Fibroblast, T Cell, Low Density Granulocyte, Proteasome, Inflammasome, Glycolysis, ROS production, T Cell IL-23 Signature, pDC, IL-21 Complex, and Langerhans Cell.

[0142] FIGs. 40A - 40D show that ML accurately distinguishes lesional DLE from lesional PSO, AD and SSc. ROC curve and PR curve of all ML algorithms to separate lesional DLE from other inflammatory skin diseases using all cellular and pathway gene signatures/features. ML classifiers include: logistic regression (LR, blue), K-nearest neighbors (KNN, orange), random forest (RF, green), naive Bayes (NB, red), support-vector machine (SVM, purple) and gradient boosting (GB, brown). (FIG. 40A) DLE versus PSO; (FIG. 40B) DLE versus AD; and (FIG. 40C) DLE versus SSc. (FIG. 40D, Table 10) Classification metrics including sensitivity, specificity, Cohen’s Kappa score, precision, F1 score and accuracy to properly separate lesional DLE samples from lesional PSO, AD, and SSc samples with each ML classifier. Refer to Tables 5A-B for details about ML. Collinear features were removed (FIG. 41). The AUC values of the ROC curves (FIG. 40A) for lesional DLE vs. PSO classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.909, 0.919, 0.902, 0.853, 0.936, and 0.890 respectively. The AUC values of the PR curves (FIG. 40A) for lesional DLE vs. PSO classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.851, 0.901, 0.845, 0.805, 0.907, and 0.849 respectively. The AUC values of the ROC curves (FIG. 40B) for lesional DLE vs. AD classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.715, 0.911, 0.816, 0.780, 0.879, and 0.837 respectively. The AUC values of the PR curves (FIG. 40B) for lesional DLE vs. AD classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.693, 0.880, 0.754, 0.755, 0.864, and 0.793 respectively. The AUC values of the ROC curves (FIG. 40C) for lesional DLE vs. SSc classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.720, 0.816, 0.774, 0.689, 0.805, and 0.784 respectively. The AUC values of the PR curves (FIG. 40C) for lesional DLE vs. SSc classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.790, 0.838, 0.776, 0.745, 0.846, and 0.802 respectively.
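The classification metrics listed in the tables all derive from the binary confusion matrix of true and predicted labels. The sketch below computes them with the usual textbook definitions; it assumes 0/1 labels with 1 as the positive (disease) class and is not taken from the source's own code. It raises `ZeroDivisionError` if a class is never observed or never predicted.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision, F1, accuracy and Cohen's kappa (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / n  # observed agreement
    # Agreement expected by chance from the row and column totals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy, "kappa": kappa}
```

With 4 disease and 4 control samples and one error in each class, every rate equals 0.75, while Cohen's kappa is 0.5 because half of the agreement would be expected by chance.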

[0143] FIGs. 41A - 41C show correlated features from cellular and pathway signatures used to identify collinear features for lesional ML binary classifications compared to DLE. Correlation plots of GSVA enrichment scores of lesional DLE and lesional (FIG. 41A) PSO, (FIG. 41B) AD and (FIG. 41C) SSc samples. Correlations outlined in black were reduced to include only one feature. Black boxes indicate collinear features (Pearson correlation coefficient greater than 0.8); for each such pair, the feature with the lower correlation was removed using a greedy elimination approach.

[0144] FIGs. 42A - 42B show GSVA enrichment of lesional skin compared to nonlesional skin. Hedges’ g effect sizes of GSVA enrichment scores for paired lesional and nonlesional samples, including two DLE, four AD and three PSO datasets, using (FIG. 42A) cellular gene signatures and (FIG. 42B) pathway gene signatures. Lesional samples were compared to their respective paired nonlesional samples in DLE, AD and PSO. Heatmap visualization uses red (enriched signature, >0) and blue (decreased signature, <0). Paired t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001.
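Because lesional and nonlesional biopsies come from the same patients, a paired t-test works on the within-patient differences rather than on two independent groups. The standard-library sketch below computes the paired t statistic and its degrees of freedom for one signature; the function name is hypothetical, and a p-value would additionally need the t distribution (for example `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev


def paired_t(lesional, nonlesional):
    """Paired t statistic and df for matched lesional/nonlesional GSVA scores.

    Element i of each list must come from the same patient.
    """
    if len(lesional) != len(nonlesional):
        raise ValueError("samples must be paired one-to-one")
    diffs = [a - b for a, b in zip(lesional, nonlesional)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

For three patients whose lesional-minus-nonlesional differences are 4, 3 and 5, the mean difference is 4 with a standard deviation of 1, giving t = 4√3 on 2 degrees of freedom.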

[0145] FIGs. 43A - 43G show that ML classification reveals that nonlesional skin of DLE, PSO, and AD is distinct from control skin. (FIG. 43A) ROC curve and (FIG. 43B) PR curve of nonlesional DLE, nonlesional PSO, and nonlesional AD samples compared to pooled control samples using all cellular and pathway gene signatures. Top 15 features important in classifying (FIG. 43C) nonlesional DLE, (FIG. 43D) nonlesional PSO and (FIG. 43E) nonlesional AD from control samples using Gini feature importance. (FIG. 43F) Comparison of the top 15 features for classifying each nonlesional disease compared to control using Gini feature importance. (FIG. 43G, Table 11) Classification metrics to properly separate nonlesional DLE and control samples, nonlesional PSO and control samples, as well as nonlesional AD and control samples using all 48 (top) or the top 15 (bottom) cellular and pathway gene signatures. Refer to Tables 5A-B for ML details. Collinear features were removed (FIG. 46). The AUC values of the ROC curves (FIG. 43A) for non-lesional DLE vs. control, non-lesional PSO vs. control, and non-lesional AD vs. control classification are 0.996, 0.859, and 0.922 respectively. The AUC values of the PR curves (FIG. 43B) for non-lesional DLE vs. control, non-lesional PSO vs. control, and non-lesional AD vs. control are 0.997, 0.902, and 0.941 respectively. Top 15 features important in classifying non-lesional DLE vs. control (FIG. 43C) are (in order of Gini index, highest to lowest) Unfolded Protein, Langerhans Cell, NK Cell, Plasma Cell, IL-12, B Cell, Fatty Acid Beta Oxidation, Melanocyte, IL-12 Complex, Inflammasome, Apoptosis, Peroxisome, IL-21 Complex, Amino Acid Metabolism, and TNF. Top 15 features important in classifying non-lesional PSO vs. control (FIG. 43D) are (in order of Gini index, highest to lowest) Amino Acid Metabolism, Cell Cycle, IL-17 Complex, NK Cell, Th17, OXPHOS, Proteasome, TGFB Fibroblast, Low Density Granulocyte, pDC, Skin-specific DC, Neutrophil, Unfolded Protein, Apoptosis, and GC B Cell. Top 15 features important in classifying non-lesional AD vs. control (FIG. 43E) are (in order of Gini index, highest to lowest) OXPHOS, Anti-inflammation, Granulocyte, Keratinocyte, Apoptosis, Proteasome, Low-density Granulocyte, Pentose Phosphate, Monocyte/Myeloid Cell, Plasma Cell, Neutrophil, T Cell IL-23 Signature, IL-1 Cytokines, Erythrocyte and Melanocyte. From FIG. 43F, the only feature shared by non-lesional DLE, PSO, and AD is Apoptosis; features unique to non-lesional DLE are Langerhans Cell, IL-12, B Cell, Fatty Acid Beta Oxidation, IL-12 Complex, Inflammasome, Peroxisome, IL-21 Complex, and TNF; features unique to non-lesional AD are Anti-inflammation, Granulocyte, Keratinocyte, Pentose Phosphate, Monocyte/Myeloid Cell, T Cell IL-23 Signature, IL-1 Cytokines, and Erythrocyte; and features unique to non-lesional PSO are Cell Cycle, IL-17 Complex, Th17, TGFB Fibroblast, pDC, Skin-specific DC, and GC B Cell.

[0146] FIGs. 44A - 44D show that ML accurately separates nonlesional skin from control skin. ROC curve and PR curve of all machine learning classification algorithms to separate nonlesional samples from healthy control samples using all cellular and pathway gene signatures/features. ML classifiers include: logistic regression (LR, blue), K-nearest neighbors (KNN, orange), random forest (RF, green), naive Bayes (NB, red), support-vector machine (SVM, purple) and gradient boosting (GB, brown). (FIG. 44A) DLE versus control; (FIG. 44B) PSO versus control; and (FIG. 44C) AD versus control. (FIG. 44D, Table 12) Classification metrics including sensitivity, specificity, Cohen’s Kappa score, precision, F1 score and accuracy to properly separate nonlesional disease samples (DLE, PSO or AD) from healthy control samples with each ML classifier. Refer to Tables 5A-B for details about ML. Collinear features were removed (FIG. 46). The AUC values of the ROC curves (FIG. 44A) for non-lesional DLE vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.934, 0.958, 0.996, 0.942, 0.994, and 0.983 respectively. The AUC values of the PR curves (FIG. 44A) for non-lesional DLE vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.912, 0.963, 0.997, 0.968, 0.995, and 0.987 respectively. The AUC values of the ROC curves (FIG. 44B) for non-lesional PSO vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.840, 0.889, 0.859, 0.822, 0.883, and 0.832 respectively. The AUC values of the PR curves (FIG. 44B) for non-lesional PSO vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.885, 0.930, 0.902, 0.856, 0.925, and 0.886 respectively. The AUC values of the ROC curves (FIG. 44C) for non-lesional AD vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.813, 0.836, 0.922, 0.771, 0.940, and 0.894 respectively. The AUC values of the PR curves (FIG. 44C) for non-lesional AD vs. control classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.805, 0.842, 0.941, 0.793, 0.931, and 0.904 respectively.
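The source does not state how the area under the PR curve was estimated. One common estimator is average precision: the mean of the precision values at each rank where a positive sample is retrieved (this is the quantity scikit-learn's `average_precision_score` reports). The sketch below implements that estimator as an assumption; trapezoidal interpolation of the PR curve would give somewhat different numbers. The function name is hypothetical, and tied scores are ranked in input order rather than grouped.

```python
def average_precision(scores, labels):
    """Average precision: mean of precision@k over ranks k holding a positive sample.

    scores: classifier scores, higher meaning "more likely positive".
    labels: 1 for positive (disease) samples, 0 for negatives.
    Ties are not grouped; tied samples keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at this cut-off
    return total / n_pos
```

For example, ranking positives at positions 1 and 3 out of 4 gives (1/1 + 2/3) / 2 ≈ 0.833, while a perfect ranking gives 1.0.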

[0147] FIGs. 45A - 45E show that nonlesional DLE is distinct from nonlesional PSO and AD. (FIG. 45A) ROC curve and (FIG. 45B) PR curve of nonlesional DLE samples compared to nonlesional PSO samples (purple) and nonlesional DLE samples compared to nonlesional AD samples (orange) using all cellular and pathway gene signatures. Top 15 features important in classifying (FIG. 45C) nonlesional DLE and nonlesional PSO and (FIG. 45D) nonlesional DLE and nonlesional AD using Gini feature importance. (FIG. 45E, Table 13) Classification metrics to properly separate nonlesional DLE samples from nonlesional PSO or AD samples using all 48 (top) or the top 15 (bottom) cellular and pathway gene signatures. Refer to Tables 5A-B for ML details. Collinear features were removed (FIG. 50). The AUC values of the ROC curves (FIG. 45A) for non-lesional DLE vs. PSO and non-lesional DLE vs. AD classification are 1 and 0.990 respectively. The AUC values of the PR curves (FIG. 45B) for non-lesional DLE vs. PSO and non-lesional DLE vs. AD classification are 1 and 0.989 respectively. Top 15 features important in classifying non-lesional DLE vs. PSO (FIG. 45C) are (in order of Gini index, highest to lowest) NK Cell, Amino Acid Metabolism, Plasma Cell, pDC, Inflammasome, Monocyte/Myeloid Cell, Langerhans Cell, B Cell, TNF, Unfolded Protein, TCA Cycle, T Cell IL-12 Signature, Keratinocyte, IL-12 Complex, and Melanocyte. Top 15 features important in classifying non-lesional DLE vs. AD (FIG. 45D) are (in order of Gini index, highest to lowest) Inflammasome, NK Cell, Unfolded Protein, B Cell, pDC, IL-12 Complex, TNF, Langerhans Cell, Plasma Cell, Anti-inflammation, Amino Acid Metabolism, Melanocyte, Monocyte/Myeloid Cell, IL-21 Complex, and Immunoproteasome.

[0148] FIGs. 46A - 46C show correlated features from cellular and pathway signatures used to identify collinear features for nonlesional ML binary classification. Correlation plots of GSVA enrichment scores of control samples and nonlesional (FIG. 46A) DLE, (FIG. 46B) PSO and (FIG. 46C) AD samples. Correlations outlined in black were reduced to include only one feature. Black boxes indicate collinear features (Pearson correlation coefficient greater than 0.8); for each such pair, the feature with the lower correlation was removed using a greedy elimination approach.

[0149] FIGs. 47A - 47C show that ML distinguishes nonlesional DLE from nonlesional PSO and nonlesional AD. ROC curve and PR curve of all machine learning classification algorithms to separate nonlesional DLE from other inflammatory skin diseases using all cellular and pathway gene signatures/features. ML classifiers include: logistic regression (LR, blue), K-nearest neighbors (KNN, orange), random forest (RF, green), naive Bayes (NB, red), support-vector machine (SVM, purple) and gradient boosting (GB, brown). (FIG. 47A) DLE versus PSO and (FIG. 47B) DLE versus AD. (FIG. 47C, Table 14) Classification metrics including sensitivity, specificity, Cohen’s Kappa score, precision, F1 score and accuracy to properly separate nonlesional DLE samples from nonlesional PSO and nonlesional AD samples with each ML classifier. Refer to Tables 5A-B for details about ML. Collinear features were removed (FIG. 50). The AUC values of the ROC curves (FIG. 47A) for non-lesional DLE vs. PSO classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.974, 0.953, 1, 0.982, 1, and 0.971 respectively. The AUC values of the PR curves (FIG. 47A) for non-lesional DLE vs. PSO classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.944, 0.947, 1, 0.986, 1, and 0.963 respectively. The AUC values of the ROC curves (FIG. 47B) for non-lesional DLE vs. AD classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.983, 0.953, 0.990, 0.961, 0.997, and 0.974 respectively. The AUC values of the PR curves (FIG. 47B) for non-lesional DLE vs. AD classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.983, 0.946, 0.989, 0.969, 0.997, and 0.975 respectively.

[0150] FIGs. 48A - 48D show ML classification of nonlesional PSO and AD. (FIG. 48A) ROC curve and PR curve of all ML classification algorithms to separate nonlesional PSO from nonlesional AD samples using all cellular and pathway gene signatures/features. ML classifiers include: logistic regression (LR, blue), K-nearest neighbors (KNN, orange), random forest (RF, green), naive Bayes (NB, red), support-vector machine (SVM, purple) and gradient boosting (GB, brown). (FIG. 48B) Top 15 features important in classifying nonlesional PSO from nonlesional AD using Gini feature importance. (FIG. 48C, Table 15) Classification metrics including sensitivity, specificity, Cohen’s Kappa score, precision, F1 score and accuracy to properly separate nonlesional PSO samples from nonlesional AD samples with each ML classifier. (FIG. 48D) Correlation plots of GSVA enrichment scores of nonlesional PSO and nonlesional AD samples. Black boxes indicate collinear features (Pearson correlation coefficient greater than 0.8); for each such pair, the feature with the lower correlation was removed using a greedy elimination approach. The AUC values of the ROC curves (FIG. 48A) for non-lesional AD vs. PSO classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.684, 0.830, 0.801, 0.807, 0.739, and 0.824 respectively. The AUC values of the PR curves (FIG. 48A) for non-lesional AD vs. PSO classification, for ML classifiers LR, KNN, RF, NB, SVM, and GB are 0.682, 0.867, 0.841, 0.854, 0.767, and 0.851 respectively. Top 15 features important in classifying non-lesional AD vs. PSO (FIG. 48B) are (in order of Gini index, highest to lowest) Amino Acid Metabolism, IL-23 Complex, Cell Cycle, Glycolysis, OXPHOS, Low-density Granulocyte, IL-17 Complex, Fibroblast, IL-12 Complex, NK Cell, Proteasome, T Cell IL-12 Signature, Inflammasome, IL-21 Complex, and Monocyte.

[0151] FIGs. 49A - 49B show that nonlesional skin is characterized by upregulation of unique cellular and pathway signatures. (FIG. 49A) Hedges’ g effect sizes of cellular (left) and pathway (right) gene signatures for pooled nonlesional disease samples compared to pooled control samples in DLE, PSO and AD datasets. Heatmap visualization uses red (enriched signature, >0) and blue (decreased signature, <0). Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001. (FIG. 49B) Comparison of the most important features determined by ML that are also statistically significant by Z-score GSVA of nonlesional skin versus controls for nonlesional DLE (left), nonlesional PSO (middle) and nonlesional AD (right). Only the 40 features used in the nonlesional Z-score GSVA were included in the comparison with nonlesional ML.
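Hedges' g is Cohen's d (difference in group means divided by the pooled standard deviation) multiplied by a small-sample correction factor. The sketch below uses the widely used approximation J = 1 - 3 / (4(n1 + n2) - 9) for that factor; the source does not say which form of the correction was applied, so treat this as an assumed, standard variant for two independent groups, with a hypothetical function name.

```python
import math
from statistics import mean, variance


def hedges_g(disease, control):
    """Hedges' g for two independent groups (positive when disease > control)."""
    n1, n2 = len(disease), len(control)
    pooled_var = ((n1 - 1) * variance(disease) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    d = (mean(disease) - mean(control)) / math.sqrt(pooled_var)  # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # approximate small-sample bias correction
    return d * j
```

For three disease and three control scores with equal unit variance and a mean difference of 2, Cohen's d is 2 and the correction shrinks it to g = 1.6, illustrating how strongly the correction matters for small cohorts.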

[0152] FIGs. 50A - 50B show correlated features from cellular and pathway signatures used to identify collinear features for nonlesional ML binary classification compared to DLE. Correlation plots of GSVA enrichment scores of nonlesional DLE and (FIG. 50A) nonlesional PSO and (FIG. 50B) nonlesional AD samples. Black boxes indicate collinear features (Pearson correlation coefficient greater than 0.8); for each such pair, the feature with the lower correlation was removed using a greedy elimination approach.

[0153] FIGs. 51A - 51B show that analysis of cellular and molecular pathway signatures in nonlesional DLE (NL DLE) reveals upregulation of B cell, plasma cell and fatty acid metabolism gene signatures. GSVA enrichment scores using Z-scores of (FIG. 51A) cellular gene signatures and (FIG. 51B) pathway gene signatures in nonlesional DLE and control samples. The number of nonlesional DLE samples per dataset that lie more than 1 standard deviation below the average of the control samples is denoted on the first subtext line. The number of nonlesional DLE samples per dataset that lie more than 1 standard deviation above the average of the control samples is denoted on the second subtext line. Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001, as indicated where observed for a given pair of violin plots by a bracket and corresponding number of asterisks above the plots. Plots for NL DLE samples are shown in dark gray (the right plot of each pair of violin plots). Plots for control samples (CTL) are shown in light gray (the left plot of each pair of violin plots). Dotted horizontal line indicates GSVA enrichment score of 0, with positive scores above and negative scores below. In FIG. 51A the panels show GSVA scores for cell types, in each row from left to right: row 1 (top row) - granulocyte, neutrophil, LDG, skin-specific DC, Langerhans, monocyte, monocyte/myeloid, NK cell, T cell, B cell, GC B cell; row 2 - plasma cell, platelet, endothelial cell, fibroblast, keratinocyte, melanocyte. In FIG. 51B the panels show GSVA scores for pathways, in each row from left to right: row 1 (top row) - IFN, T cell IL-12 signature, IL-12, IL-17 complex, T cell IL-23 signature, TGFB fibroblast, TNF, Th17, anti-inflammation, complement proteins; row 2 - inflammasome, ROS production, apoptosis, cell cycle, proteasome, unfolded protein, glycolysis, pentose phosphate, TCA cycle, OXPHOS; row 3 - FAAO, FABO, AA metabolism, peroxisome.
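The subtext counts can be reproduced by standardizing each disease sample's GSVA score against the control group and counting samples beyond ±1 standard deviation. The sketch below assumes the control mean and sample standard deviation are the reference and that the bounds are strict; both are interpretations of the legend rather than stated details, and the function name is hypothetical.

```python
from statistics import mean, stdev


def count_beyond_one_sd(disease_scores, control_scores):
    """Return (n_below, n_above): disease samples more than 1 SD below / above the control mean."""
    mu, sd = mean(control_scores), stdev(control_scores)
    z = [(s - mu) / sd for s in disease_scores]  # Z-score relative to controls
    below = sum(1 for v in z if v < -1)
    above = sum(1 for v in z if v > 1)
    return below, above
```

For example, with control scores of -1, 0 and 1 (mean 0, SD 1), disease scores of -2, 0.5, 1.5 and 3 yield one sample below and two above, so the two subtext lines would read 1 and 2.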

[0154] FIGs. 52A - 52B show that analysis of cellular and molecular pathway signatures in nonlesional PSO (NL PSO) reveals upregulation of innate immune cell and IL-17 gene signatures. GSVA enrichment scores using Z-scores of (FIG. 52A) cellular gene signatures and (FIG. 52B) pathway gene signatures in nonlesional PSO and control samples. The number of nonlesional PSO samples per dataset that lie more than 1 standard deviation below the average of the control samples is denoted on the first subtext line. The number of nonlesional PSO samples per dataset that lie more than 1 standard deviation above the average of the control samples is denoted on the second subtext line. Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001, as indicated where observed for a given pair of violin plots by a bracket and corresponding number of asterisks above the plots. Plots for NL PSO samples are shown as the right plot of each pair of violin plots. Plots for control samples (CTL) are shown as the left plot of each pair of violin plots. Dotted horizontal line indicates GSVA enrichment score of 0, with positive scores above and negative scores below. In FIG. 52A the panels show GSVA scores for cell types, in each row from left to right: row 1 (top row) - granulocyte, neutrophil, LDG, skin-specific DC, Langerhans, monocyte, monocyte/myeloid, NK cell, T cell, B cell, GC B cell; row 2 - plasma cell, platelet, endothelial cell, fibroblast, keratinocyte, melanocyte. In FIG. 52B the panels show GSVA scores for pathways, in each row from left to right: row 1 (top row) - IFN, T cell IL-12 signature, IL-12, IL-17 complex, T cell IL-23 signature, TGFB fibroblast, TNF, Th17, anti-inflammation, complement proteins; row 2 - inflammasome, ROS production, apoptosis, cell cycle, proteasome, unfolded protein, glycolysis, pentose phosphate, TCA cycle, OXPHOS; row 3 - FAAO, FABO, AA metabolism, peroxisome.

[0155] FIGs. 53A - 53B show that analysis of cellular and molecular pathway signatures in nonlesional AD (NL AD) reveals upregulation of anti-inflammation, neutrophil, NK cell and Th17 gene signatures. GSVA enrichment scores using Z-scores of (FIG. 53A) cellular gene signatures and (FIG. 53B) pathway gene signatures in nonlesional AD and control samples. The number of nonlesional AD samples per dataset that lie more than 1 standard deviation below the average of the control samples is denoted on the first subtext line. The number of nonlesional AD samples per dataset that lie more than 1 standard deviation above the average of the control samples is denoted on the second subtext line. Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001, as indicated where observed for a given pair of violin plots by a bracket and corresponding number of asterisks above the plots. Plots for NL AD samples are shown as the right plot of each pair of violin plots. Plots for control samples (CTL) are shown as the left plot of each pair of violin plots. Dotted horizontal line indicates GSVA enrichment score of 0, with positive scores above and negative scores below. In FIG. 53A the panels show GSVA scores for cell types, in each row from left to right: row 1 (top row) - granulocyte, neutrophil, LDG, skin-specific DC, Langerhans, monocyte, monocyte/myeloid, NK cell, T cell, B cell, GC B cell; row 2 - plasma cell, platelet, endothelial cell, fibroblast, keratinocyte, melanocyte. In FIG. 53B the panels show GSVA scores for pathways, in each row from left to right: row 1 (top row) - IFN, T cell IL-12 signature, IL-12, IL-17 complex, T cell IL-23 signature, TGFB fibroblast, TNF, Th17, anti-inflammation, complement proteins; row 2 - inflammasome, ROS production, apoptosis, cell cycle, proteasome, unfolded protein, glycolysis, pentose phosphate, TCA cycle, OXPHOS; row 3 - FAAO, FABO, AA metabolism, peroxisome.

[0156] FIGs. 54A - 54B show analysis of cellular and molecular pathway signatures in nonlesional DLE using the mean of Z-scores. Box plots of the mean of Z-scores of genes for each sample and gene category for (FIG. 54A) cellular gene signatures and (FIG. 54B) pathway gene signatures in nonlesional DLE and control samples. Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001, as indicated where observed for a given pair of box plots by a bracket and corresponding number of asterisks above the plots. Plots for NL DLE samples are shown as the right plot of each pair of box plots. Plots for control samples (CTL) are shown as the left plot of each pair of box plots. Dotted horizontal line indicates a mean Z-score of 0, with positive scores above and negative scores below. In FIG. 54A the panels show the mean of Z-scores for cell types, in each row from left to right: row 1 (top row) - granulocyte, neutrophil, skin-specific DC, Langerhans, monocyte, monocyte/myeloid, NK cell, T cell, B cell; row 2 - plasma cell, platelet, endothelial cell, fibroblast, keratinocyte, melanocyte. In FIG. 54B the panels show the mean of Z-scores for pathways, in each row from left to right: row 1 (top row) - IFN, T cell IL-12 signature, IL-12, IL-17 complex, T cell IL-23 signature, TGFB fibroblast, TNF, Th17, anti-inflammation, complement proteins; row 2 - inflammasome, ROS production, apoptosis, cell cycle, proteasome, unfolded protein, glycolysis, pentose phosphate, TCA cycle, OXPHOS; row 3 - FAAO, FABO, AA metabolism, peroxisome.

[0157] FIGs. 55A - 55B show analysis of cellular and molecular pathway signatures in nonlesional PSO using the mean of Z-scores. Box plots of the mean of Z-scores of genes for each sample and gene category for (FIG. 55A) cellular gene signatures and (FIG. 55B) pathway gene signatures in nonlesional PSO and control samples. Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001, as indicated where observed for a given pair of box plots by a bracket and corresponding number of asterisks above the plots. Plots for NL PSO samples are shown as the right plot of each pair of box plots. Plots for control samples (CTL) are shown as the left plot of each pair of box plots. Dotted horizontal line indicates a mean Z-score of 0, with positive scores above and negative scores below. In FIG. 55A the panels show the mean of Z-scores for cell types, in each row from left to right: row 1 (top row) - granulocyte, neutrophil, skin-specific DC, Langerhans, monocyte, monocyte/myeloid, NK cell, T cell, B cell; row 2 - plasma cell, platelet, endothelial cell, fibroblast, keratinocyte, melanocyte. In FIG. 55B the panels show the mean of Z-scores for pathways, in each row from left to right: row 1 (top row) - IFN, T cell IL-12 signature, IL-12, IL-17 complex, T cell IL-23 signature, TGFB fibroblast, TNF, Th17, anti-inflammation, complement proteins; row 2 - inflammasome, ROS production, apoptosis, cell cycle, proteasome, unfolded protein, glycolysis, pentose phosphate, TCA cycle, OXPHOS; row 3 - FAAO, FABO, AA metabolism, peroxisome.

[0158] FIGs. 56A - 56B show analysis of cellular and molecular pathway signatures in nonlesional AD using the mean of Z-scores. Box plots of the mean of Z-scores of genes for each sample and gene category for (FIG. 56A) cellular gene signatures and (FIG. 56B) pathway gene signatures in nonlesional AD (light yellow) and control samples (grey). Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001, as indicated where observed for a given pair of box plots by a bracket and corresponding number of asterisks above the plots. Plots for NL AD samples are shown as the right plot of each pair of box plots. Plots for control samples (CTL) are shown as the left plot of each pair of box plots. Dotted horizontal line indicates a mean Z-score of 0, with positive scores above and negative scores below. In FIG. 56A the panels show the mean of Z-scores for cell types, in each row from left to right: row 1 (top row) - granulocyte, neutrophil, skin-specific DC, Langerhans, monocyte, monocyte/myeloid, NK cell, T cell, B cell; row 2 - plasma cell, platelet, endothelial cell, fibroblast, keratinocyte, melanocyte. In FIG. 56B the panels show the mean of Z-scores for pathways, in each row from left to right: row 1 (top row) - IFN, T cell IL-12 signature, IL-12, IL-17 complex, T cell IL-23 signature, TGFB fibroblast, TNF, Th17, anti-inflammation, complement proteins; row 2 - inflammasome, ROS production, apoptosis, cell cycle, proteasome, unfolded protein, glycolysis, pentose phosphate, TCA cycle, OXPHOS; row 3 - FAAO, FABO, AA metabolism, peroxisome.
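The per-sample "mean of Z-scores" for a gene category can be sketched as: standardize each gene across samples, then average the standardized values of the category's member genes within each sample. Standardizing each gene across all samples in the dataset is an assumption here (the legends do not state the reference population), as are the function name and data layout. Genes missing from the dataset are skipped, and a category with no measured genes raises `statistics.StatisticsError`.

```python
from statistics import mean, stdev


def mean_z_by_category(expr, categories):
    """Per-sample mean Z-score for each gene category.

    expr: dict of gene -> expression values, one per sample, same sample order for all genes.
    categories: dict of category name -> list of member genes.
    Returns dict of category name -> list of per-sample mean Z-scores.
    """
    # Assumption: each gene is standardized across all samples in the dataset.
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), stdev(values)
        z[gene] = [(v - mu) / sd for v in values]
    n_samples = len(next(iter(expr.values())))
    result = {}
    for category, genes in categories.items():
        members = [g for g in genes if g in z]  # skip genes not measured in this dataset
        result[category] = [mean(z[g][i] for g in members) for i in range(n_samples)]
    return result
```

Averaging Z-scores rather than raw expression keeps highly expressed genes from dominating a category score, so each member gene contributes on the same scale.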

[0159] FIGs. 57A - 57D show that cellular and pathway enrichment in SCLE is quantitatively similar to the enrichment observed in DLE. Hedges’ g effect sizes of GSVA enrichment scores for (FIG. 57A) cellular gene signatures and (FIG. 57B) pathway gene signatures in lesional SCLE and control samples in three datasets. Hedges’ g effect sizes of GSVA enrichment scores for (FIG. 57C) cellular gene signatures and (FIG. 57D) pathway gene signatures in lesional DLE and SCLE samples in three datasets. Heatmap visualization uses red (enriched signature, >0) and blue (decreased signature, <0). Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001.
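Hedges' g, the effect size shown in these heatmaps, is Cohen's d with a small-sample bias correction. A minimal sketch follows, assuming GSVA scores for one signature are already available as two lists (disease and control). The helper names are hypothetical, and the sign convention (positive, i.e. red, when the signature is higher in disease) is inferred from the legend.

```python
import math
from statistics import mean, stdev


def hedges_g(disease, control):
    """Hedges' g for disease vs. control (positive = higher in disease).

    Uses the pooled sample SD and the common small-sample correction
    J = 1 - 3 / (4 * (n1 + n2) - 9), an approximation to the exact gamma form.
    """
    n1, n2 = len(disease), len(control)
    pooled = math.sqrt(((n1 - 1) * stdev(disease) ** 2 + (n2 - 1) * stdev(control) ** 2)
                       / (n1 + n2 - 2))
    d = (mean(disease) - mean(control)) / pooled
    return (1 - 3 / (4 * (n1 + n2) - 9)) * d


def stars(p):
    """Map a p-value to the asterisk convention used in the figure legends."""
    for cutoff, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < cutoff:
            return label
    return ""
```

For example, disease scores `[4, 5, 6]` against control scores `[1, 2, 3]` give d = 3 and, after the correction factor 0.8, g = 2.4.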

[0160] FIGs. 58A - 58F show that DLE and SCLE can be transcriptionally classified using ML. (FIG. 58A) Hierarchical clustering (k=4) of DLE and SCLE samples from three lupus datasets based on GSVA scores of cellular and pathway gene signatures in control. (FIG. 58B) Correlation plot of GSVA enrichment scores of lesional DLE and lesional SCLE samples. (FIG. 58C) ROC curve and (FIG. 58D) PR curve separating DLE and SCLE using ML classifiers, including logistic regression (LR, blue), random forest (RF, orange), support-vector machine (SVM, green) and gradient boosting (GB, red). Random oversampling was used to adjust for class imbalance. (FIG. 58E) Top 15 features important in classifying DLE from SCLE using Gini feature importance. (FIG. 58F, Table 16) Classification metrics, including sensitivity, specificity, Cohen’s kappa score, precision, F1 score and accuracy, for separating DLE and SCLE. Refer to Table 5A-B for details about ML. The AUC values of the ROC curves (FIG. 58C) for DLE vs. SCLE classification for ML classifiers LR, RF, SVM, and GB are 0.828, 0.910, 0.924, and 0.901, respectively. The AUC values of the PR curves (FIG. 58D) for DLE vs. SCLE classification for ML classifiers LR, RF, SVM, and GB are 0.838, 0.885, 0.914, and 0.874, respectively. The top 15 features important in classifying DLE vs. SCLE (FIG. 58E) are (in order of Gini index, highest to lowest) Plasma Cell, Unfolded Protein, TNF, Apoptosis, T Cell IL-12 Signature, IL-23 Complex, Neutrophil, pDC, Complement Proteins, IL-1 Cytokines, Melanocyte, Monocyte/Myeloid Cell, Fatty Acid Beta Oxidation, Amino Acid Metabolism and GC B Cell.
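Three of the evaluation steps named here (random oversampling of the minority class, ROC AUC, and Cohen's kappa) can be sketched with the standard library. This is an illustration under stated assumptions, not the classifiers or code behind Table 16: labels are assumed to be 0/1 (for example DLE = 0, SCLE = 1), and all function names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive (1) scores higher
    than a random negative (0); ties count as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(y_true, y_pred):
    """Agreement between true and predicted labels, corrected for chance."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    p_exp = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (p_obs - p_exp) / (1 - p_exp)


X_bal, y_bal = random_oversample([[0.1], [0.4], [0.9]], [0, 0, 1])
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
kappa = cohen_kappa([1, 0, 1, 1], [1, 0, 0, 1])
```

Oversampling is generally applied only to the training folds of a cross-validation split, so that duplicated samples never appear in the evaluation data; the text does not say how this was handled here.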

[0161] FIG. 59 shows that stimulated keratinocyte signatures are highly enriched in skin inflammatory diseases. Hedges’ g effect sizes of GSVA enrichment scores for disease samples compared to their respective healthy control samples in five DLE, three PSO, two AD and three SSc datasets using curated gene signatures of keratinocytes treated with various types of cytokines and immune molecules. Heatmap visualization uses red (enriched signature, >0) and blue (decreased signature, <0). Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001.

[0162] FIGs. 60A - 60D show an overabundance of correlated features among keratinocyte gene signatures. Correlation plots of GSVA enrichment scores identifying keratinocyte gene signatures that are correlated with each other in (FIG. 60A) DLE and control samples; (FIG. 60B) PSO and control samples; (FIG. 60C) AD and control samples; and (FIG. 60D) SSc and control samples.
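Pairwise Pearson correlation between GSVA score vectors is one way to find the correlated signatures displayed in these plots, and a greedy filter built on it is one way to drop collinear features before ML (as noted for FIGs. 63A-63E). The sketch assumes Pearson correlation and a hypothetical cutoff of |r| ≥ 0.8; the text specifies neither, and the signature names below are placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(scores, cutoff=0.8):
    """Keep signatures in order, skipping any whose |r| with a kept one >= cutoff.

    `scores` maps signature name -> list of GSVA scores (one per sample).
    """
    kept = []
    for name, values in scores.items():
        if all(abs(pearson(values, scores[k])) < cutoff for k in kept):
            kept.append(name)
    return kept


gsva = {
    "sig_a": [1.0, 2.0, 3.0, 4.0],
    "sig_b": [2.0, 4.0, 6.0, 8.1],  # nearly a multiple of sig_a
    "sig_c": [4.0, 1.0, 3.0, 2.0],
}
kept = drop_collinear(gsva)
```

Because the filter is greedy, the order of the signatures decides which member of a correlated pair survives.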

[0163] FIGs. 61A - 61E show that T cell subtype signatures are highly enriched in skin inflammatory diseases. GSVA enrichment scores for (FIG. 61A) T cell cellular signatures in disease samples compared to their respective healthy control samples in five DLE, three PSO, two AD and three SSc datasets. Heatmap visualization uses red (enriched signature, >0) and blue (decreased signature, <0). Welch’s t-test: * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001. Correlation plots of GSVA enrichment scores identifying T cell gene signatures that are correlated with each other in (FIG. 61B) DLE and control samples; (FIG. 61C) PSO and control samples; (FIG. 61D) AD and control samples; and (FIG. 61E) SSc and control samples. In each of FIGs. 61B-61E, top to bottom and left to right: Dermal Aner/Act T cell, Dermal CD8 T cell, Dermal Tfh, Dermal Th1, Dermal Th17, Dermal Th2, Dermal Treg, label.

[0164] FIGs. 62A-62B show that nonlesional skin from patients with inflammatory skin diseases manifests a specific set of pre-clinical, molecular abnormalities that predispose to the development of both shared and unique clinical features in lesional DLE, PSO, AD and SSc after encountering an environmental trigger. FIG. 62A shows a summary graphic detailing features determined by ML and upregulated in nonlesional skin or lesional skin of DLE, PSO, AD and SSc versus control as determined by GSVA. Some features are upregulated in both nonlesional and lesional skin. The bottom box shows important ML features upregulated by GSVA in lesional skin and shared among all four inflammatory skin diseases. Refer to Table 6 for details about the comparison between GSVA and Z-score methods. FIG. 62B shows a summary of possible therapies for the lesional skin diseases analyzed (left) and possible therapies for both lesional and nonlesional regions of each disease (right) based on molecular characterization. An asterisk (*) denotes drugs in development.

[0165] FIGs. 63A - 63E show that ML classification of DLE versus PSO, AD, and SSc confirms distinct disease-specific gene signatures. Top 15 features important in classifying lesional DLE versus lesional PSO (FIG. 63A), lesional DLE versus lesional AD (FIG. 63B), lesional DLE versus lesional SSc (FIG. 63C), nonlesional DLE versus nonlesional PSO (FIG. 63D), and nonlesional DLE versus nonlesional AD (FIG. 63E) using SHAP values. Collinear features were removed.

[0166] FIG. 64 shows derivation of the inflammatory skin disease risk score used to calculate the activity of cellular and immune pathways in lesional skin diseases. Shown are the coefficients resulting from a logistic regression model with a ridge penalty on 48 cellular and pathway signatures, run with 500 iterations.
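A risk score of this kind is typically the linear predictor of the fitted model: each signature's GSVA score multiplied by its coefficient and summed. The sketch below fits an L2 (ridge) penalized logistic regression by batch gradient descent in plain Python. It reads "500 iterations" as 500 gradient steps, which is an assumption (it could equally mean 500 resampling runs), and the penalty strength `lam`, learning rate `lr`, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, iterations=500):
    """Batch gradient descent on mean log-loss + (lam / 2n) * ||w||^2.

    X: rows = samples, columns = signature GSVA scores; y: 0 = control, 1 = disease.
    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw + lam * wj) / n for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def risk_score(w, gsva_scores):
    """Weighted sum of a patient's GSVA scores using the fitted coefficients.

    The intercept is omitted because it shifts every patient equally.
    """
    return sum(wj * xj for wj, xj in zip(w, gsva_scores))


# Toy data with a single signature that is higher in disease samples.
w, b = fit_ridge_logistic([[0.0], [0.2], [0.8], [1.0]], [0, 0, 1, 1])
```

With 48 signatures on different scales, standardizing each column before fitting keeps the ridge penalty from favoring some coefficients over others.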

[0167] FIGs. 65A-C show that K-means clustering of CLE and SSc skin reveals molecular endotypes. K-means clustering of (FIG. 65A) DLE (GSE184989) and (FIG. 65B) SSc (GSE58095) using GSVA enrichment scores of cellular and pathway gene signatures. (FIG. 65C) Cosine similarity analysis comparing the molecular profiles of the endotypes derived from DLE with those of SSc. In FIGs. 65A and 65B, the modules listed from top to bottom (left vertical axis) are OXPHOS, TCA cycle, FABO, IL-12, TNF, Inflammasome, Proteasome, Unfolded protein, Apoptosis, pDC, T Cell IL-12 Signature, T Cell, Skin-specific DC, Keratinocyte, Plasma Cell, Endothelial Cell, Cell cycle, Peroxisome, Complement Proteins, Monocyte, Pentose Phosphate, TGFB Fibroblast, AA metabolism, Fibroblast, Glycolysis, Monocyte/Myeloid Cell, IL-17 Complex, IL-1 cytokines, Anti-inflammation, IL-21 Complex, NK Cell, IFN, Immunoproteasome, ROS Production, Langerhans Cell, IL-23 Complex, IL-12 Complex, FAAO, GC B Cell, Melanocyte, Granulocyte, Neutrophil, Th17, B Cell, LDG, Platelet, T Cell IL-23 Signature, and Erythrocyte.
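The workflow in this figure (cluster samples on their GSVA profiles, then compare endotype profiles between diseases) can be sketched as Lloyd's K-means followed by cosine similarity of cluster centroids. This is an assumption-laden illustration: the random initialization, the use of centroids as endotype profiles, and all names are hypothetical rather than taken from the source.

```python
import math
import random


def _sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's K-means on rows of GSVA scores; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def assign(cents):
        # Each sample goes to its nearest centroid (squared Euclidean distance).
        return [min(range(k), key=lambda c: _sqdist(p, cents[c])) for p in points]

    for _ in range(iterations):
        labels = assign(centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return assign(centroids), centroids


def cosine(u, v):
    """Cosine similarity between two endotype profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy example: 4 samples x 2 signatures, two obvious groups.
labels, cents = kmeans([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], k=2)
```

In the figure's setting one would cluster DLE and SSc samples separately and then compute `cosine` between every DLE centroid and every SSc centroid; values near 1 indicate endotypes with similar signature profiles.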

[0168] FIGs. 66A-D show transcriptional analysis of immune populations in NZM2328 mice with acute GN (AGN). Individual sample gene expression from the glomeruli (FIGs. 66A&C) and tubulointerstitial tissue (FIGs. 66B&D) of CTL and AGN mice was analyzed by GSVA for enrichment of immune cells/inflammatory pathways (FIGs. 66A-B) and kidney tissue cells (FIGs. 66C-D). Enrichment scores are shown as violin plots. *p<0.05, **p<0.01, ***p<0.001.

[0169] FIGs. 67A-F show histologic and transcriptomic analysis of LN disease stages in the glomeruli of NZM2328 mice. FIGs. 67A-D show H&E staining of kidneys from normal/CTL (FIG. 67A) NZM2328 females and from mice with acute (FIG. 67B), transitional (FIG. 67C), and chronic (FIG. 67D) stage GN. (FIG. 67E) Heatmap of GSVA scores for enrichment of immune cell and pathway gene signatures in the glomeruli of CTL (control), AGN (acute stage glomerulonephritis), TGN (transitional glomerulonephritis), and CGN (chronic stage glomerulonephritis) mice. Asterisks (in black or white) indicate significant comparisons with CTL mice. (FIG. 67F) GSVA enrichment of podocyte gene signatures in the cohorts shown in FIG. 67E. *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001.

[0170] FIGs. 68A-D show immune profiling and kidney tissue analysis of LN disease stages in the TI of NZM2328 mice. (FIG. 68A) Heatmap of GSVA scores for enrichment of immune cell and pathway gene signatures in the TI of CTL, AGN, TGN, and CGN mice. Asterisks indicate significant comparisons with CTL mice. (FIG. 68B) GSVA enrichment of kidney tissue cell gene signatures in cohorts from FIG. 68A. (FIG. 68C) Log2 expression values of kidney tubule damage-associated genes for cohorts from FIG. 68A. (FIG. 68D) Linear regression between log2 expression of kidney tubule damage genes and GSVA scores of kidney tubule cells. *p<0.05, **p<0.01, ***p<0.001.
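Each regression panel relates, per mouse, a gene's log2 expression (x) to a kidney tubule cell GSVA score (y). An ordinary least-squares fit with R² is sketched below; the source does not state how the fits were tested for significance, so only the fit itself is shown, and `linear_fit` is a hypothetical name.

```python
def linear_fit(x, y):
    """Least-squares slope, intercept and R^2 for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Toy values: log2 expression (x) vs. GSVA score (y) lying exactly on y = 2x + 1.
slope, intercept, r2 = linear_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

If SciPy is available, `scipy.stats.linregress(x, y)` returns the slope, intercept, correlation coefficient and a two-sided p-value for the slope in one call.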

[0171] FIGs. 69A-C show that male NZM2328 mice lack the inflammatory signature enrichment associated with progression to chronic GN. (FIG. 69A) Heatmap of GSVA scores for enrichment of immune cell and pathway gene signatures in the glomeruli of male CTL and AGN mice. Asterisks indicate significant comparisons with CTL mice. (FIG. 69B) GSVA enrichment of podocyte gene signatures in the cohorts shown in FIG. 69A. (FIG. 69C) GSVA enrichment of signatures for estrogen-regulated and androgen-regulated genes in the glomeruli of female and male AGN mice. *p<0.05, **p<0.01.

[0172] FIGs. 70A-C show that inflammatory gene signatures in the glomeruli of R27 mice differ from those of NZM2328 mice. (FIG. 70A) Bubble plot depicting the overlap of DEGs up-regulated in the glomeruli of NZM2328 and R27 AGN mice with immunologic gene signatures. Bubble size indicates odds ratio and color indicates p-value of the comparison with CTL mice. Asterisks indicate statistically significant comparisons (shown in the lower left of all cells except R27 pattern recognition receptor). (FIG. 70B) Heatmap of GSVA scores for enrichment of immune cell and pathway gene signatures in the glomeruli of R27 CTL and AGN mice. Asterisks (in black or white) indicate significant comparisons with CTL mice. (FIG. 70C) GSVA enrichment of podocyte gene signatures in the cohorts shown in FIG. 70B. *p<0.05, **p<0.01, ***p<0.001.
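Overlap between an up-regulated DEG list and a curated signature is commonly scored with a 2x2 table (DEG or not, in the signature or not) against a background gene universe, reporting the odds ratio and a one-sided Fisher's exact (hypergeometric) p-value. The text does not name the test behind the bubble plots, so this is a sketch of that common approach; the function name, the gene symbols, and the choice of universe are assumptions.

```python
from math import comb


def overlap_stats(degs, signature, universe):
    """Odds ratio and one-sided Fisher's exact p-value for DEG/signature overlap.

    All three arguments are collections of gene symbols; `universe` is the
    background (e.g. all genes measured on the platform).
    """
    universe = set(universe)
    degs, signature = set(degs) & universe, set(signature) & universe
    a = len(degs & signature)        # DEG and in signature
    b = len(degs - signature)        # DEG only
    c = len(signature - degs)        # signature only
    d = len(universe) - a - b - c    # neither
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    # P(overlap >= a) under the hypergeometric null of random DEG selection.
    N, K, n = len(universe), len(signature), len(degs)
    p = sum(comb(K, i) * comb(N - K, n - i) for i in range(a, min(K, n) + 1)) / comb(N, n)
    return odds_ratio, p


or_, p = overlap_stats({"G0", "G1", "G2"}, {"G0", "G1", "G3"}, [f"G{i}" for i in range(10)])
```

With SciPy, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` gives the same one-sided p-value for that table.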

[0173] FIGs. 71A-E show that gene expression analysis of the TI of R27 mice indicates resistance to kidney tubule damage. (FIG. 71A) Bubble plot depicting the overlap of DEGs up-regulated in the TI of NZM2328 and R27 AGN mice with immunologic gene signatures. Bubble size indicates odds ratio and color indicates p-value of the comparison with CTL mice. Asterisks indicate statistically significant comparisons. (FIG. 71B) Heatmap of GSVA scores for enrichment of immune cell and pathway gene signatures in the TI of R27 CTL and AGN mice. Asterisks indicate significant comparisons with CTL mice. (FIG. 71C) GSVA enrichment of kidney tissue cell signatures in the cohorts shown in FIG. 71B. (FIG. 71D) Log2 expression values of kidney tubule damage-associated genes for cohorts from FIG. 71B. (FIG. 71E) Linear regression between GSVA scores of kidney tubule cell and metabolic pathway gene signatures. *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001.

[0174] FIGs. 72A-E show that expression of chronic risk locus genes is associated with disease severity and kidney tubule resistance in NZM2328 and R27 AGN mice. (FIGs. 72A&B) Log2 expression values of immune receptor genes in the Cgnz1 risk locus from the glomeruli (FIG. 72A) and TI (FIG. 72B) of NZM2328 CTL, AGN, TGN, and CGN mice. (FIG. 72C) Linear regression between log2 expression of Cgnz1 locus genes (x-axis) and GSVA scores (y-axis) of kidney tubule cells from the TI of R27 mice. All statistically significant correlations are shown. (FIG. 72D) Log2 expression values of the Cgnz1 locus genes from FIG. 72C in the TI of NZM2328 CTL, AGN, TGN, and CGN mice. (FIG. 72E) Linear regression between log2 expression of the Cgnz1 locus genes from FIG. 72C and GSVA scores of kidney tubule cells from the TI of NZM2328 mice. All statistically significant correlations are shown. *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001.

[0175] FIG. 73 is a schematic showing progression to CGN in NZM2328 AGN mice and resistance to CGN in NZM2328.R27 AGN mice.

[0176] FIGs. 74A-B show that clustering of GSVA enrichment scores in lupus kidneys of 76 patients with LN (BH11201) reveals four distinct endotypes of patients with LN. (FIG. 74A) Row and column hierarchical clustering of 76 patients with LN into four groups based upon gene expression of cellular and pathway gene modules. (FIG. 74B) Reordered clustering of LN patients in order of molecular disease severity from least to greatest. The columns represent individual patients that are grouped into four clusters (from left to right: black, coral, yellow, and purple). The rows represent gene modules indicative of immune/inflammatory cells, non-hematopoietic cells, and cellular metabolism. In FIGs. 74A-74B, the GSVA sets are (right vertical axis, top to bottom) Amino Acid Metabolism, Fatty Acid Beta Oxidation, Kidney Proximal Convoluted Tubule, Fatty Acid Alpha Oxidation, Podocyte, TCA cycle, Kidney Cell, Kidney Distal Tubule, Oxidative Phosphorylation, Granulocyte, LDG, Platelet, NK Cell, Endothelial Cell, Kidney Loop of Henle Cell, Kidney Tubule Collecting Duct Cell, Pentose Phosphate, Glycolysis, pDC, Fibroblast, Mesangial Cell, Dendritic Cell, Anergic or Activated T Cell, GC B Cell, Plasma Cell, B Cell, Monocyte/Myeloid Cell, and T Cell.
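Hierarchical clustering of patients on their module GSVA scores can be sketched as agglomerative clustering cut at four groups. The linkage method and distance used for these heatmaps are not stated; average linkage on Euclidean distance is assumed here, and `average_linkage` is a hypothetical helper. Clustering the modules (rows) works the same way on the transposed matrix.

```python
import math
from itertools import combinations


def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(points, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    Returns a list of clusters, each a list of sample indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(pair):
        # Mean distance over all cross-cluster sample pairs.
        ca, cb = clusters[pair[0]], clusters[pair[1]]
        total = sum(euclid(points[a], points[b]) for a in ca for b in cb)
        return total / (len(ca) * len(cb))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[i].extend(clusters[j])  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters


groups = average_linkage([[0.0], [1.0], [10.0], [11.0]], k=2)
```

This naive version recomputes every pairwise linkage at each merge, which is acceptable for roughly 76 patients; `scipy.cluster.hierarchy.linkage` followed by `fcluster(..., criterion="maxclust")` is the usual library route.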

[0177] FIGs. 75A-H show that comparison of molecular endotypes with clinical features reveals some correlation between gene expression and histology. Distribution of (FIG. 75A) ISN/RPS (International Society of Nephrology/Renal Pathology Society; see, e.g., Markowitz and Agati, 2007, “The ISN/RPS 2003 classification of lupus nephritis: An assessment at 3 years,” Kidney International 71: 491-495, incorporated herein by reference in its entirety) classes in 46 patients with LN (three bars each for coral, yellow, purple and black, from left to right in each group: mesangial, proliferative, membranous), (FIG. 75B) positive or negative IgA deposition in 44 patients with LN (two bars each for coral, yellow, purple and black, from left to right in each group: negative, positive), (FIG. 75C) inactive or active SLEDAI in 32 patients with LN (two bars each for coral, yellow, purple and black, from left to right in each group: inactive (SLEDAI ≤ 6), active (SLEDAI > 6)), (FIG. 75D) renal activity index in 49 patients with LN, and (FIG. 75E) renal chronicity index in 48 patients with LN among the LN endotypes. FIG. 75F shows proteinuria values (g/24h) in 24 patients with LN. FIG. 75G shows the percentage of 41 patients with LN having negative (“0”; left bar of each pair) or positive (“>0”; right bar of each pair) IgG deposition. FIG. 75H shows the percentage of 42 patients with LN having negative (“0”; left bar of each pair) or positive (“>0”; right bar of each pair) IgM deposition. In FIGs. 75A-C, significant differences in expected and observed frequencies between coral, the “least abnormal” LN endotype, and all other clusters (denoted with an asterisk above the bars) for (FIG. 75A) proliferative LN, (FIG. 75B) positive IgA deposition, and (FIG. 75C) active SLEDAI were identified by Chi Square Test. The likelihood of having proliferative LN in the coral cluster was not significantly different than in the other clusters. The likelihood (odds ratio) of having positive IgA deposition in the coral cluster is 0.43 (p < 0.0001) as compared to the other three clusters. The likelihood (odds ratio) of having active SLE (SLEDAI > 6) in the coral cluster is 0.06 (p < 0.01) as compared to the other three clusters. In FIGs. 75A-75C, significant associations between the categorical variables and all clusters (denoted with asterisks on the y-axis) were identified using the Chi Square Test of Independence. In FIG. 75A, only the yellow group had mesangial patients. In FIGs. 75D-E, significant differences in the mean renal activity or renal chronicity index between the coral cluster and each other cluster were assessed by Brown-Forsythe and Welch ANOVA with Dunnett’s T3 multiple comparisons. **, p < 0.01, ****, p < 0.0001.
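For the coral-versus-other comparisons, the counts collapse to a 2x2 table (coral or other cluster, feature present or absent), from which a Pearson chi-square statistic and an odds ratio follow. The sketch uses hypothetical counts and names, not data from the study; it omits the Yates continuity correction (the text does not say whether one was applied) and computes the p-value in closed form only for 1 degree of freedom.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and its df."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def odds_ratio(table):
    """Odds ratio for [[coral_pos, coral_neg], [other_pos, other_neg]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical counts, not data from the study.
stat, df = chi_square([[10, 20], [30, 40]])
```

For the 4-cluster test of independence (a 4x2 table, df = 3), the p-value needs the chi-square survival function, for example `scipy.stats.chi2.sf(stat, df)`.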

[0178] FIGs. 76A-I show comparison of the molecular endotypes, as shown in FIGs. 74A and B, for GSVA enrichment of signatures for TCA cycle (FIG. 76A), Oxidative Phosphorylation (FIG. 76B), Fatty Acid Beta Oxidation (FIG. 76C), Kidney Cell (FIG. 76D), Podocyte (FIG. 76E), Proximal Tubule (FIG. 76F), Monocyte/Myeloid Cell (FIG. 76G), T cell (FIG. 76H), and B cell (FIG. 76I). Significant differences in mean GSVA enrichment score between Coral, the “least abnormal” LN endotype, and each other cluster were assessed by Brown-Forsythe and Welch ANOVA with Dunnett’s T3 multiple comparisons. *, p < 0.05, ***, p < 0.001, ****, p < 0.0001.
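The Brown-Forsythe and Welch ANOVA tests compare cluster means without assuming equal variances. Their F statistics and degrees of freedom are sketched below from the standard formulas; p-values come from the F distribution (for example `scipy.stats.f.sf(F, df1, df2)`), and the follow-up Dunnett's T3 comparisons against the coral cluster are not shown. The function names are hypothetical.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's ANOVA: returns (F, df1, df2) for a list of samples."""
    k = len(groups)
    w = [len(g) / variance(g) for g in groups]  # precision weights n_i / s_i^2
    W = sum(w)
    grand = sum(wi * mean(g) for wi, g in zip(w, groups)) / W
    a = sum(wi * (mean(g) - grand) ** 2 for wi, g in zip(w, groups)) / (k - 1)
    tmp = sum((1 - wi / W) ** 2 / (len(g) - 1) for wi, g in zip(w, groups))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp
    return a / b, k - 1, (k ** 2 - 1) / (3 * tmp)


def brown_forsythe_anova(groups):
    """Brown-Forsythe F* test for equal means: returns (F*, df1, df2)."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    num = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    parts = [(1 - len(g) / n_total) * variance(g) for g in groups]
    den = sum(parts)
    # Satterthwaite-type approximation for the denominator degrees of freedom.
    df2 = 1 / sum((p / den) ** 2 / (len(g) - 1) for p, g in zip(parts, groups))
    return num / den, len(groups) - 1, df2


clusters = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
F, df1, df2 = welch_anova(clusters)
```

With equal group sizes and variances, as in this toy input, the Brown-Forsythe F* equals the classical one-way ANOVA F (27 here), while Welch's F is 162/7 on (2, 4) degrees of freedom.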

[0179] FIG. 77 shows clustering of GSVA enrichment scores into the four kidney-derived molecular clusters, using informative cellular and pathway signatures (Tables 25-1 to 25-32) in paired blood from patients with LN (BH11201). The columns represent individual patients that are grouped into four clusters (Coral, Yellow, Purple, Black). The rows represent gene modules indicative of immune/inflammatory cells and cellular pathways/processes. For FIG. 77, the molecular features (e.g., modules) listed from top to bottom (on the left vertical axis) are IFN, Immunoproteasome, Plasma Cell, IG Chains, Cell Cycle, SNOR Low UP, IL1 Pathway, Inflammasome, Inhibitory Macrophage, Inflammatory Cytokines, Anti-inflammation, TNF, Monocyte, Neutrophil, Granulocyte, LDG, Dendritic Cell, pDC, TCRD, NK Cell, MHCII, B Cell, gd T Cell, Anergic/activated T Cell, Oxidative Phosphorylation, Unfolded Protein, TCRAJ, T Cell, TCRA, TCRB, IL23 Complex and Treg.

[0180] FIGs. 78A-L show that analysis of paired blood of patients with LN demonstrates cluster-specific enrichment of LDG, T cell, dendritic cell, and glucocorticoid signatures. GSVA enrichment of (FIG. 78A) LDG, (FIG. 78B) T cell, (FIG. 78C) TCRA, (FIG. 78D) TCRAJ, (FIG. 78E) TCRB, (FIG. 78F) anergic/activated T cell, (FIG. 78G) dendritic cell, (FIG. 78H) glucocorticoid, (FIG. 78I) interferon (IFN), (FIG. 78J) monocyte, (FIG. 78K) B cell, and (FIG. 78L) plasma cell signatures in the blood of 71 patients with LN (BH11201) is shown. X-axis clusters denote the cluster to which each sample belongs based upon analysis of paired kidney gene expression. Significant differences in enrichment of gene signatures between each cluster and Coral were assessed by Brown-Forsythe and Welch ANOVA with Dunnett’s T3 multiple comparisons. *, p < 0.05, **, p < 0.01, ***, p < 0.001. The glucocorticoid signature is derived from Northcott et al. (2).

[0181] FIGs. 79A-I show that the LDG and T cell signatures are consistently correlated with the glucocorticoid signature in the blood of patients with LN, whereas the dendritic cell signature is not. Linear regression of the glucocorticoid signature with the LDG, T cell, and dendritic cell signatures in the blood of patients with lupus nephritis for (FIGs. 79A-C) BH11201 (n = 71), (FIGs. 79D-F) GSE49454 (n = 19), and (FIGs. 79G-I) GSE99967 (n = 28) is shown. The glucocorticoid signature is derived from Northcott et al. (2). In each of FIGs. 79A-79I, the glucocorticoid signature is shown on the x-axis.

[0182] FIGs. 80A-B show that neither the expression of erythropoietin (EPO) nor a recombinant human erythropoietin (rHuEPO) signature in the blood of patients with LN is associated with the molecular endotypes of LN. (FIG. 80A) Log2 expression of EPO in the paired blood of 71 patients with LN. (FIG. 80B) GSVA enrichment of the rHuEPO signature in the paired blood of 71 patients with LN. The rHuEPO signature was derived from Wang et al. (3), where differentially expressed genes were measured after administration of rHuEPO; the nine genes that were consistently expressed after rHuEPO administration comprised the signature.

[0183] FIGs. 81A-B show that unsupervised gene co-expression network analysis defines molecular profiles of NZM2328 mice correlated with disease severity. K-means clustering (k=4) of NZM2328 CTL, AGN, TGN, and CGN mouse glomeruli (FIG. 81A) and TI (FIG. 81B) based on GSVA enrichment scores of MEGENA modules. The optimal number of module clusters was defined by the silhouette method and annotated by gene overlap with curated immunologic signatures and GO terms. Heatmap visualizations depict positive to negative GSVA scores on a red to blue gradient and positive to negative correlations between GSVA scores and disease classification on a gold to blue gradient. For FIGs. 81A-B, clusters (vertical) shown from left to right are coral, maroon, green and blue.
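The silhouette method picks the number of clusters whose assignments best separate samples: for each sample, a is its mean distance to its own cluster and b its mean distance to the nearest other cluster, giving s = (b - a) / max(a, b). A minimal sketch follows, assuming Euclidean distance on GSVA scores (the distance is not stated); one would cluster for several values of k and keep the k with the highest mean silhouette. The function name is hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width; samples in singleton clusters score 0."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    widths = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(other, []).append(dist(p, q))
        own = by_cluster.pop(lab, [])
        if not own or not by_cluster:
            # Singleton cluster, or only one cluster overall: width set to 0.
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        widths.append((b - a) / (max(a, b) or 1.0))
    return sum(widths) / len(widths)


pts = [[0.0], [1.0], [10.0], [11.0]]
good = silhouette(pts, [0, 0, 1, 1])
bad = silhouette(pts, [0, 1, 0, 1])
```

If scikit-learn is available, `sklearn.metrics.silhouette_score(X, labels)` computes the same mean silhouette width.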

[0184] FIGs. 82A-E show that gene signature-based clustering of GN stages in NZM2328 mice translates to human LN patients. (FIGs. 82A-B) K-means clustering (k=4) of NZM2328 CTL, AGN, TGN, and CGN mouse glomeruli (FIG. 82A) and TI (FIG. 82B) based on GSVA enrichment scores of selected immune cell, kidney cell, and metabolic pathway gene sets. (FIGs. 82C-E) K-means clustering (k=4) of microdissected glomeruli (FIG. 82C), TI (FIG. 82D), and whole kidney (FIG. 82E) from human LN patients based on GSVA scores from human orthologs of the mouse gene sets used in FIGs. 82A&B. Heatmap visualizations depict positive to negative GSVA scores on a red to blue gradient and positive to negative correlations between GSVA scores and disease classification on a gold to blue gradient. For FIGs. 82A-E, clusters (vertical) shown from left to right are coral, maroon, green and blue.

[0185] FIG. 83 shows expression of Cgnz1 locus genes in the TI of NZM2328 and R27 AGN mice. Log2 expression values of genes in the Cgnz1 risk locus from the TI of NZM2328 and R27 CTL and AGN mice. Statistical significance was evaluated separately for the NZM2328 CTL vs. AGN and R27 CTL vs. AGN comparisons. For each gene, the bars from left to right show expression in NZM2328 control, NZM2328 AGN, R27 control and R27 AGN mice.

[0186] FIG. 84 shows gene signature-based clustering of IFNa-NZB mouse kidneys. K-means clustering (k=3) of IFNa-NZB mice over time after IFNa treatment based on GSVA enrichment scores of selected immune cell, kidney cell, and metabolic pathway gene sets. Heatmap visualizations depict positive to negative GSVA scores on a red to blue gradient and positive to negative correlations between GSVA scores and disease classification on a gold to blue gradient. For FIG. 84, clusters (vertical) shown from left to right are coral, maroon, green and blue.

[0187] FIGs. 85A-B show NZM2328 mouse MEGENA module-based clustering of human LN kidneys. K-means clustering (k=4) of whole kidney samples from human LN patients based on GSVA scores from human orthologs of the MEGENA modules from NZM2328 mouse microdissected glomeruli (FIG. 85A) and TI (FIG. 85B). The optimal number of module clusters was defined by the silhouette method and annotated by gene overlap with curated immunologic signatures and GO terms. Heatmap visualizations depict positive to negative GSVA scores on a red to blue gradient and positive to negative correlations between GSVA scores and disease classification on a gold to blue gradient. For FIGs. 85A-B, clusters (vertical) shown from left to right are coral, maroon, green and blue.

INCLUDED EMBODIMENTS

1. A method for assessing a lupus nephritis disease state of a patient, the method comprising: analyzing a data set comprising or derived from gene expression measurement data of at least 2 genes or human orthologs thereof selected from the genes listed in Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22 in a biological sample from the patient, to classify the lupus nephritis disease state of the patient.

2. The method of embodiment 1, wherein the lupus nephritis disease state of the patient is classified as acute lupus nephritis, transitional lupus nephritis, chronic lupus nephritis, or absence of lupus nephritis.

3. The method of embodiment 1 or 2, wherein the data set comprises or is derived from gene expression measurement data of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 450, 500, 550, 600, 650, 700, 750, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1700, 1800, 1900, or 2000 genes, selected from the genes listed in Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22 in the biological sample from the patient.

4. The method of any one of embodiments 1 to 3, wherein the genes or human orthologs thereof are selected from the genes listed in Tables 19-1 to 19-36.

5. The method of any one of embodiments 1 to 3, wherein the genes or human orthologs thereof are selected from the genes listed in Table 20.

6. The method of any one of embodiments 1 to 3, wherein the genes or human orthologs thereof are selected from the genes listed in Table 21.

7. The method of any one of embodiments 1 to 3, wherein the genes or human orthologs thereof are selected from the genes listed in Table 22.

8. The method of any one of embodiments 1 to 3, wherein the genes are selected from the genes listed in Tables 23-1 to 23-28.

9. The method of any one of embodiments 1 to 3, wherein the genes are selected from the genes listed in Tables 25-1 to 25-32.

10. The method of any one of embodiments 1 to 3, wherein the genes are selected from the genes listed in Tables 26-1 to 26-60.

11. The method of any one of embodiments 1 to 3, wherein the genes are selected from the genes listed in Tables 27-1 to 27-48.

12. The method of any one of embodiments 1 to 3, wherein the genes are selected from the genes listed in Tables 28-1 to 28-22.

13. The method of any one of embodiments 1 to 12, wherein the data set comprises or is derived from gene expression measurement data of at least 2 to all, or any value or range therebetween, genes or human orthologs thereof selected from the genes listed in each of one or more Tables selected from Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Table 20, Table 21, Table 22, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22 in the biological sample from the patient, wherein a different or identical number of genes are selected from the genes listed in each selected table.

14. The method of any one of embodiments 1 to 4 and 13, wherein the one or more Tables are selected from Tables 19-1 to 19-36.

15. The method of embodiment 14, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables selected from Tables 19-1 to 19-36.

16. The method of embodiment 14 or 15, wherein the selected Tables are Tables 19-1 to 19-36.

17. The method of any one of embodiments 1 to 3, 8 and 13, wherein the one or more Tables are selected from Tables 23-1 to 23-28.

18. The method of embodiment 17, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, or 28 Tables selected from Tables 23-1 to 23-28.

19. The method of embodiment 17 or 18, wherein the selected Tables are Tables 23-1 to 23-28.

20. The method of any one of embodiments 1 to 3, 9 and 13, wherein the one or more Tables are selected from Tables 25-1 to 25-32.

21. The method of embodiment 20, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, or 28 Tables selected from Tables 25-1 to 25-32.

22. The method of embodiment 20 or 21, wherein the selected Tables are Tables 25-1 to 25-32.

23. The method of any one of embodiments 1 to 3, 10 and 13, wherein the one or more Tables are selected from Tables 26-1 to 26-60.

24. The method of embodiment 23, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 Tables selected from Tables 26-1 to 26-60.

25. The method of embodiment 23 or 24, wherein the selected Tables are Tables 26-1 to 26-60.

26. The method of any one of embodiments 1 to 3, 11 and 13, wherein the one or more Tables are selected from Tables 27-1 to 27-48.

27. The method of embodiment 26, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, or 48 Tables selected from Tables 27-1 to 27-48.

28. The method of embodiment 26 or 27, wherein the selected Tables are Tables 27-1 to 27-48.

29. The method of any one of embodiments 1 to 3, 12 and 13, wherein the one or more Tables are selected from Tables 28-1 to 28-22.

30. The method of embodiment 29, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 22 Tables selected from Tables 28-1 to 28-22.

31. The method of embodiment 29 or 30, wherein the selected Tables are Tables 28-1 to 28-22.

32. The method of any one of embodiments 1 to 31, wherein the lupus nephritis disease state of the patient is classified with an accuracy of at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.

33. The method of any one of embodiments 1 to 32, wherein the lupus nephritis disease state of the patient is classified with a sensitivity of at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.
34. The method of any one of embodiments 1 to 33, wherein the lupus nephritis disease state of the patient is classified with a specificity of at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.

35. The method of any one of embodiments 1 to 34, wherein the lupus nephritis disease state of the patient is classified with a positive predictive value of at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.

36. The method of any one of embodiments 1 to 35, wherein the lupus nephritis disease state of the patient is classified with a negative predictive value of at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.

37. The method of any one of embodiments 1 to 36, wherein the lupus nephritis disease state of the patient is classified with a receiver operating characteristic (ROC) curve having an area under the curve (AUC) of at least about 0.80, at least about 0.85, at least about 0.90, at least about 0.91, at least about 0.92, at least about 0.93, at least about 0.94, at least about 0.95, at least about 0.96, at least about 0.97, at least about 0.98, at least about 0.99, or more than about 0.99.
The method of any one of embodiments 1 to 37, wherein the data set is derived from the gene expression measurement data using gene set variation analysis (GSVA), gene set enrichment analysis (GSEA), enrichment algorithm, multiscale embedded gene coexpression network analysis (MEGENA), weighted gene co-expression network analysis (WGCNA), differential expression analysis, Z-score, log2 expression analysis, or any combination thereof. The method of any one of embodiments 1 to 38, wherein the data set is derived from the gene expression measurement data using GSVA. The method of embodiment 39, wherein the data set comprises one or more GSVA scores of the patient, wherein the one or more GSVA scores are generated based on one or more Tables selected from Tables 19-1 to 19-36, Tables 19A-1 to 19A-36, Tables 23-1 to 23-28, Tables 25-1 to 25-32, Tables 26-1 to 26-60, Tables 27-1 to 27-48, and Tables 28-1 to 28-22, wherein for each selected Table, at least one GSVA score of the patient is generated based on enrichment of expression of at least 2 genes or human orthologs thereof listed in the selected Table, and wherein the one or more GSVA scores comprise each generated GSVA score. The method of embodiment 40, wherein the one or more Tables are selected from Tables 19-1 to 19-36. The method of embodiments 40 or 41, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 Tables selected from Tables 19-1 to 19-36. The method of any one of embodiments 40 to 42, wherein the selected tables comprise Tables 19-1 to 19-36. The method of embodiment 40, wherein the one or more Tables are selected from Tables 23-1 to 23-28. The method of embodiment 40 or 44, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, or 28 Tables selected from Tables 23-1 to 23-28.
The method of embodiment 40, 44, or 45, wherein the selected tables comprise Tables 23-1 to 23-28. The method of embodiment 40, wherein the one or more Tables are selected from Tables 25-1 to 25-32. The method of embodiment 40 or 47, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, or 32 Tables selected from Tables 25-1 to 25-32. The method of embodiment 40, 47, or 48, wherein the selected tables comprise Tables 25-1 to 25-32. The method of embodiment 40, wherein the one or more Tables are selected from Tables 26-1 to 26-60. The method of embodiment 40 or 50, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 Tables selected from Tables 26-1 to 26-60. The method of embodiment 40, 50, or 51, wherein the selected tables comprise Tables 26-1 to 26-60. The method of embodiment 40, wherein the one or more Tables are selected from Tables 27-1 to 27-48. The method of embodiment 40 or 53, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, or 48 Tables selected from Tables 27-1 to 27-48. The method of embodiment 40, 53, or 54, wherein the selected tables comprise Tables 27-1 to 27-48. The method of embodiment 40, wherein the one or more Tables are selected from Tables 28-1 to 28-22. The method of embodiment 40 or 56, wherein the one or more Tables comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 Tables selected from Tables 28-1 to 28-22. The method of embodiment 40, 56, or 57, wherein the selected tables comprise Tables 28-1 to 28-22. The method of any one of embodiments 40 to 58, wherein independently for each selected Table, the at least one GSVA score of the patient is generated based on enrichment of expression of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, or all genes selected from the genes listed in the respective Table. The method of any one of embodiments 1 to 59, wherein the analyzing the data set comprises providing the data set as an input to a trained machine-learning model to classify the lupus nephritis disease state of the patient, wherein the trained machine-learning model generates an inference indicative of the lupus nephritis disease state of the patient based at least on the data set. The method of embodiment 60, wherein the data set comprises the one or more GSVA scores of the patient, and the trained machine-learning model generates the inference based at least on the one or more GSVA scores. The method of embodiment 60 or 61, wherein the method further comprises receiving, as an output of the trained machine-learning model, the inference; and/or electronically outputting a report indicating the lupus nephritis disease state of the patient.
The method of any one of embodiments 60 to 62, wherein the machine-learning model is trained using linear regression, logistic regression, Ridge regression, Lasso regression, elastic net (EN) regression, support vector machine (SVM), gradient boosted machine (GBM), k nearest neighbors (kNN), generalized linear model (GLM), naive Bayes (NB) classifier, neural network, Random Forest (RF), deep learning algorithm, linear discriminant analysis (LDA), decision tree learning (DTREE), adaptive boosting (ADB), Classification and Regression Tree (CART), hierarchical clustering, or any combination thereof. The method of any one of embodiments 1 to 63, wherein the lupus nephritis disease state of the patient is classified based on a lupus nephritis disease risk score generated from the data set. The method of embodiment 64, wherein the lupus nephritis disease risk score is generated based on the one or more GSVA scores of the patient. The method of any one of embodiments 1 to 65, wherein the patient is at elevated risk of having lupus. The method of any one of embodiments 1 to 66, wherein the patient is suspected of having lupus. The method of any one of embodiments 1 to 67, wherein the patient is asymptomatic for lupus. The method of any one of embodiments 1 to 68, wherein the patient has lupus. The method of any one of embodiments 1 to 69, wherein the patient is at elevated risk of having lupus nephritis. The method of any one of embodiments 1 to 70, wherein the patient is suspected of having lupus nephritis. The method of any one of embodiments 1 to 71, wherein the patient is asymptomatic for lupus nephritis. The method of any one of embodiments 1 to 72, wherein the patient has lupus nephritis. The method of any one of embodiments 1 to 73, further comprising identifying, selecting, recommending and/or administering a treatment to the patient based at least in part on the classification of the lupus nephritis disease state of the patient. 
The method of embodiment 74, wherein the treatment is configured to treat lupus nephritis. The method of embodiment 74, wherein the treatment is configured to reduce a severity of lupus nephritis. The method of embodiment 74, wherein the treatment is configured to reduce a risk of having lupus nephritis. The method of any one of embodiments 74 to 77, wherein the treatment comprises a pharmaceutical composition. The method of any one of embodiments 1 to 78, wherein the biological sample comprises a kidney biopsy sample, a blood sample, isolated peripheral blood mononuclear cells (PBMCs), or any derivative thereof. A method for validating a mouse model useful for identifying and/or characterizing a human disease, the method comprising: a) providing a gene set capable of classifying a mouse as having an endotype selected from two or more endotypes of the disease; b) determining human orthologs of the gene set; c) classifying a human patient as having an endotype selected from the two or more endotypes of the disease using the human orthologs; and d) using the human orthologs to classify the mouse model as having an endotype selected from the two or more endotypes of the disease, wherein the endotype of a validated mouse model classified using the human orthologs corresponds to the human endotype of step (c) identified using the human orthologs.

DETAILED DESCRIPTION

[0188] Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

[0189] As used herein, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.

[0190] As used herein, the term “about” refers to an amount that is near the stated amount by 10%, 5%, or 1%, including increments therein.

[0191] As used herein, the phrases “at least one”, “one or more”, and “and/or” are open-ended expressions that are both conjunctive and disjunctive in operation. For example, each of the expressions “at least one of A, B and C”, “at least one of A, B, or C”, “one or more of A, B, and C”, “one or more of A, B, or C” and “A, B, and/or C” means A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.

[0192] As used herein, the term “Gini impurity” refers to a measure of how often a randomly chosen element from the set may be incorrectly labeled if it is randomly labeled according to the distribution of labels in the subset.
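As an illustration of this definition (not taken from the specification), the Gini impurity of a set of labels equals 1 minus the sum of squared label proportions. The sketch below assumes labels are given as a plain Python list; the label names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Probability that a randomly drawn element is mislabeled when it is
    labeled at random according to the label distribution of `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # 1 - sum_k p_k^2, which equals sum_k p_k * (1 - p_k)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


print(gini_impurity(["active"] * 5))              # pure set: 0.0
print(gini_impurity(["active", "inactive"] * 2))  # 50/50 set: 0.5
```

A pure set has impurity 0, and a two-class set split evenly reaches the two-class maximum of 0.5.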

[0193] As used herein, the term “lesion” refers to a potential disease lesion, e.g., a skin lesion potentially associated with and/or potentially directly resulting from lupus, psoriasis, atopic dermatitis, systemic sclerosis (scleroderma), or a combination thereof, as determined by one of skill in the art. In some embodiments, the lesion does not include a traumatic injury, e.g., a cut, scrape, scratch, burn, etc., and/or a skin affliction of any known origin not associated with a disease state indicated by the skin classification, e.g., contact dermatitis, a food allergy, and/or a drug reaction. In some embodiments, the skin lesion does not include a lesion that is not potentially associated with and/or potentially directly resulting from lupus, psoriasis, atopic dermatitis, systemic sclerosis (scleroderma), or a combination thereof.

[0194] Reference in the specification to “embodiments,” “certain embodiments,” “preferred embodiments,” “specific embodiments,” “some embodiments,” “an embodiment,” “one embodiment” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the present disclosure.

[0195] Many complex, multisystem diseases and conditions currently pose major diagnostic and therapeutic challenges. Despite the wealth of records, for example genetic, epigenetic, and gene expression data, that has emerged in the past few years, physicians often still rely on clinical evaluation and laboratory tests, including measurement of autoantibodies and complement levels.

[0196] Attempts have been made to relate records (e.g., gene expression records) to the activity of a specific disease phenotype, including efforts to identify individual genes that predict subsequent flares and to determine discrete groups of differentially expressed (DE) genes that may be found in a particular record. Despite these advances, however, no such approach has sufficient predictive value to be used in evaluation and treatment.

[0197] As such, there is a need for a predictive tool that evaluates patients at both the chemical and cellular levels to advance personalized treatment. Data analytical techniques such as machine learning can help correlate genetic records with phenotypes.

[0198] The machine learning models tested here may provide a basis for personalized medicine. Integrating the methods herein with emerging high-throughput record sampling technologies may make it possible to develop a simple blood test that predicts phenotypic activity. The disclosures herein may be generalized to predict other manifestations, such as organ involvement. A better understanding of the cellular processes that drive pathogenesis may eventually lead to customized therapeutic strategies based on each record’s unique pattern of cellular activation.

Method of Identifying One or More Records Having a Specific Phenotype

[0199] One aspect disclosed herein is a method of identifying one or more records (e.g., raw gene expression data, whole gene expression data, blood gene expression data, or informative gene modules) associated with a specific phenotype. The method may comprise receiving a plurality of first records, receiving a plurality of second records, receiving a plurality of third records, applying a machine learning algorithm to at least one first record and at least one second record to determine a classifier (e.g., a machine learning classifier), and applying the classifier to the plurality of third records. Applying the classifier to the plurality of third records may identify one or more third records associated with the specific phenotype. In some embodiments, applying a machine learning algorithm to the third data set comprises applying a machine learning algorithm to a plurality of unique third data sets.
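The train-then-apply flow described above can be sketched as follows. This is a minimal illustration, not the disclosed classifier: it swaps in a simple nearest-centroid rule, and it assumes that the first records carry the specific phenotype, the second records do not, and every record is an equal-length numeric feature vector (for example, module enrichment scores). All function names are hypothetical.

```python
import math
from statistics import fmean


def centroid(records):
    """Per-feature mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*records)]


def train_classifier(first_records, second_records):
    """Fit a nearest-centroid model: label 1 = first records (phenotype
    present), label 0 = second records (phenotype absent)."""
    return {1: centroid(first_records), 0: centroid(second_records)}


def classify(model, record):
    # Assign the label of the closest class centroid (Euclidean distance).
    return min(model, key=lambda label: math.dist(record, model[label]))


def identify_phenotype_records(first_records, second_records, third_records):
    """Return indices of third records predicted to have the phenotype."""
    model = train_classifier(first_records, second_records)
    return [i for i, rec in enumerate(third_records) if classify(model, rec) == 1]


first = [[2.0, 1.9], [2.2, 2.1]]
second = [[-1.0, -0.8], [-1.2, -1.1]]
third = [[1.8, 2.0], [-0.9, -1.0], [2.5, 1.7]]
print(identify_phenotype_records(first, second, third))  # [0, 2]
```

Any of the classifiers named later in this description (GLM, KNN, RF) could take the place of `train_classifier` and `classify` without changing the surrounding flow.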

Records

[0200] The records may comprise, for example, raw gene expression data, whole gene expression data, blood gene expression data, informative gene modules, or any combination thereof. The records may be generated by Weighted Gene Co-expression Network Analysis (WGCNA). In some embodiments, at least one of the first records and the second records comprises nucleic acid sequencing data, transcriptome data, genome data, epigenome data, proteome data, metabolome data, virome data, methylome data, lipidomic data, lineage-ome data, nucleosomal occupancy data, a genetic variant, a gene fusion, an insertion or deletion (indel), or any combination thereof. In some embodiments, the first records and the second records are in different formats. In some embodiments, the first records and the second records are from different sources, different studies, or both.

[0201] In some embodiments, each record is associated with a specific phenotype (e.g., a disease state, an organ involvement, or a medication response). Each first record may be associated with one or more of a plurality of phenotypes. The plurality of second records and the plurality of first records may be non-overlapping. The third records may be distinct from the plurality of first records, the plurality of second records, or both. The third records may comprise a plurality of unique third data sets.

[0202] The records may be received from the Gene Expression Omnibus (GEO, publicly available from the National Center for Biotechnology Information, e.g., on the website operated by the National Library of Medicine, National Institutes of Health). The records may be associated with purified cell populations, whole blood gene expression, or both. A data set may comprise records comprising microarray, next-generation sequencing, and any other form of high-throughput functional genomic data known to those of skill in the art. The records received from a Gene Expression Omnibus source may comprise GSE10325, GSE26975, GSE38351, GSE39088, GSE45291, GSE49454, GSE72535, GSE52471, GSE81071, GSE109248, GSE100093, GSE120809, GSE117239, GSE117468, GSE130588, GSE58095, GSE95065, GSE121212, GSE137430, GSE157194, GSE130955, or any combination thereof. The records received from a Gene Expression Omnibus source may comprise GSE32583, GSE49898, GSE72410, GSE153021, GSE32591, GSE86423, GSE8642, or any combination thereof.

[0203] For example, because the most important genes may be involved in a number of functions other than interferon signaling, such as RNA processing, ubiquitylation, and mitochondrial processes, these pathways may play important roles in directing, or at least be indicative of, phenotypic activity. CD4 T cells may initially contribute the most important modules. However, when the modules are deduplicated, CD14 monocyte-derived modules prove important, as unique genes expressed by CD14 monocytes in tandem with interferon genes may be informative in the study of cell-specific mechanisms of pathogenesis.

Phenotypes

[0204] In some embodiments, the phenotype comprises a disease state, an organ involvement, a medication response, or any combination thereof. The disease state may comprise an active disease state or an inactive disease state. At least one of the active disease state and the inactive disease state may be characterized by standard clinical composite outcome measures. The active disease state may comprise a Disease Activity Index of 6 or greater.

[0205] The disease may comprise an acute disease, a chronic disease, a clinical disease, a flare-up disease, a progressive disease, a refractory disease, a subclinical disease, or a terminal disease. The disease may comprise a localized disease, a disseminated disease, or a systemic disease. The disease may comprise an immune disease, a cancer, a genetic disease, a metabolic disease, an endocrine disease, a neurological disease, a musculoskeletal disease, or a psychiatric disease. The active disease state may comprise a Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) of 6 or greater.

[0206] The organ involvement may comprise a possibly involved organ. The possibly involved organ may comprise bone, skin, hematopoietic system, spleen, liver, lung, mucosa, eye, ear, pituitary, or any combination thereof. The medication response may comprise an ultra-rapid metabolizer response, an extensive metabolizer response, an intermediate metabolizer response, or a poor metabolizer response. The ultra-rapid metabolizer response may refer to a record with substantially increased metabolic activity. The extensive metabolizer response may refer to a record with normal metabolic activity. The intermediate metabolizer response may refer to a record with reduced metabolic activity. The poor metabolizer response may refer to a record with little to no functional metabolic activity.

Machine Learning and Classifiers

[0207] The classifiers described herein may be used in machine learning algorithms. A variety of machine learning classifiers exist, and each classifier produces a distinct machine learning process and/or output. The machine learning algorithms may comprise a biased algorithm or an unbiased algorithm. The biased algorithm may comprise Gene Set Variation Analysis (GSVA) enrichment of phenotype-associated cell-specific modules. The unbiased algorithm may employ all available phenotypic data. The machine learning algorithm may comprise an elastic generalized linear model (GLM), a k-nearest neighbors (KNN) classifier, a random forest (RF) classifier, or any combination thereof. GLM, KNN, and RF machine learning algorithms may be performed using the glmnet, caret, and randomForest R packages, respectively.

[0208] The random forest classifier may be able to sort through the inherent heterogeneity of the plurality of records to identify one or more third records associated with the specific phenotype. In some embodiments, the classifier identifies said one or more third records associated with the specific phenotype with an accuracy of at least about 70%. The implementation of the random forest classifier herein achieved a sensitivity of 85% and a specificity of 83% for association with the specific phenotype. Further classifier optimization may yield improved results.

[0209] KNN may classify unknown samples based on their proximity to a set number K of known samples. K may be 5% of the size of the pluralities of first, second, and third records. Alternatively, K may be 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, or any increment therein of that size. A larger K value may yield smoother predictions that are less sensitive to noise. Alternatively, K may be determined through cross-validation, using an independent set of records to validate the chosen value. If the initial value of K is even, 1 may be added to avoid ties. RF may generate 500 decision trees that vote on the class of each sample. The Gini impurity index, a standard measure of misclassification error, may be used to rank variable importance: variables whose splits produce larger decreases in Gini impurity are more important. In addition, pooled predictions may be assigned based on the average class probabilities across the three classifiers.
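The K-selection rule and the pooled prediction described above can be sketched in plain Python. This is an illustrative sketch, not the caret implementation: it assumes K is computed from the training-set size, that records are numeric vectors compared by Euclidean distance, and that each classifier already reports a probability for the same class. The function names are hypothetical.

```python
import math
from collections import Counter


def choose_k(n_training, fraction=0.05):
    """K as a fraction of the training-set size, made odd to avoid ties."""
    k = max(1, round(n_training * fraction))
    if k % 2 == 0:
        k += 1  # an even initial K is bumped by 1
    return k


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training records nearest to `query`."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def pooled_probability(probabilities):
    """Average one class probability across classifiers (e.g. GLM, KNN, RF)."""
    return sum(probabilities) / len(probabilities)


print(choose_k(156))  # 5% of 156 rounds to 8, which is even, so K = 9
```

With two classes, an odd K guarantees a strict majority in the vote, which is the reason for the "add 1 if even" rule.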

[0210] The GLM algorithm may carry out logistic regression with a tunable elastic-net penalty term that balances an L1 (LASSO) penalty and an L2 (ridge) penalty; the L1 component facilitates variable selection and thus generates sparse solutions. Least Absolute Shrinkage and Selection Operator (LASSO) is a regularization and feature selection technique that reduces overfitting in regression problems. Ridge regression employs a penalty on the squared coefficients to shrink coefficient values toward zero. In some embodiments, the elastic generalized linear model classifier employs an elastic penalty of about 0.9, wherein the penalty is 90% LASSO and 10% ridge. The elastic penalty may be 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or any increments therein.
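For reference, the glmnet package documents its binomial (logistic) objective with an elastic-net penalty in the form below, where α is the mixing parameter and λ the overall penalty strength. Under this parameterization, the 90% LASSO / 10% ridge setting described above corresponds to α = 0.9. The symbols follow glmnet's notation, not definitions given in this specification.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
+ \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big],
\qquad y_i \in \{0,1\}
```

With α = 0.9 the penalty becomes λ(0.05‖β‖₂² + 0.9‖β‖₁). Setting α = 1 gives pure LASSO, and setting α = 0 gives pure ridge regression.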

[0211] Records may be classified as active or inactive using two different methodologies: (1) a leave-one-study-out cross-validation approach or (2) a 10-fold cross-validation approach. GLM, KNN, and RF classifiers may be tasked with identifying active and inactive state records based on whole blood (WB) gene expression data and module enrichment data.
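The two cross-validation schemes can be sketched as fold-assignment helpers. This is a minimal illustration under stated assumptions: records are indexed 0..n-1, each record carries a study identifier, and the random 10-fold split uses a fixed seed chosen only for reproducibility. The function names and study labels are hypothetical.

```python
import random


def ten_fold_assignment(n_records, n_folds=10, seed=0):
    """Randomly assign each record index to one of n_folds groups of
    near-equal size."""
    rng = random.Random(seed)
    order = list(range(n_records))
    rng.shuffle(order)
    folds = [0] * n_records
    for position, idx in enumerate(order):
        folds[idx] = position % n_folds
    return folds


def leave_one_study_out_splits(study_ids):
    """Yield (held-out study, train indices, test indices), one split per study."""
    for study in sorted(set(study_ids)):
        test = [i for i, s in enumerate(study_ids) if s == study]
        train = [i for i, s in enumerate(study_ids) if s != study]
        yield study, train, test


for study, train, test in leave_one_study_out_splits(["A", "A", "B", "C", "C"]):
    print(study, train, test)
```

In the leave-one-study-out scheme the test records never share a study with the training records, so systematic technical differences between studies are fully exposed; the random 10-fold scheme mixes studies across training and test sets.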

[0212] Supervised classification approaches using elastic generalized linear modeling, k-nearest neighbors, and random forest classifiers may be implemented. The trends in performance when cross-validating by holding out one plurality of records at a time (leave-one-study-out) or cross-validating 10-fold display the potential advantages and disadvantages of diagnostic tests incorporating gene expression data or module enrichment. Leave-one-study-out cross-validation may model a suboptimal scenario, whereas 10-fold cross-validation may model a more favorable one. Although classifying active and inactive records drawn from different pluralities of records with leave-one-study-out cross-validation may be suboptimal, module enrichment may be employed to smooth out much of the technical variation between data sets. Ten-fold cross-validation may enable a more standardized diagnostic test. Although the plurality of second records and the plurality of first records are nonoverlapping, the test set employs overlapping records to facilitate proper classification.

[0213] Furthermore, modules that may be negatively associated with phenotypic activity may be just as important in classification as positively associated modules. Further study of underrepresented categories of transcripts may enhance understanding and correlation of phenotypic activity.

[0214] Reduction of technical noise may improve classification. For example, RNA-Seq platforms, which produce transcript count records rather than probe intensity values, may display less technical variation across records if all samples are processed in the same way.

[0215] The strong performance of the random forest classifier indicates that nonlinear, decision tree-based methods of classification may be well suited to this task, because decision trees ask questions about new records sequentially and adaptively. Random forest does not apply a one-size-fits-all approach to the different types of records, which allows it to classify records whose expression patterns make them a minority within their phenotype. As such, active records that do not resemble the majority of active records still have a strong chance of being properly classified by random forest. By contrast, other methods may consider all variables from new records at once.

Filtering

[0216] In some embodiments, the method further comprises filtering the first records, the second records, or both. In some embodiments, the filtering comprises normalizing, variance correction, removing outliers, removing background noise, removing data without annotation data, scaling, Weighted Gene Co-expression Network Analysis, enrichment analysis, dimensionality reduction, or any combination thereof.

[0217] In some embodiments, the normalizing is performed by Robust Multi-Array Analysis (RMA), Guanine Cytosine Robust Multi-Array Analysis (GCRMA), Linear Models for Microarray Data, variance stabilizing transformation (VST), normal-exponential quantile correction (NEQC), or any combination thereof. RMA may summarize the perfect-match probe intensities through a median polish algorithm, quantile normalization, or both. Variance-stabilizing transformation may simplify considerations in graphical exploratory data analysis, allow the application of simple regression-based or analysis of variance techniques, or both. Normalized expression values may be variance corrected using local empirical Bayesian shrinkage, and DE may be assessed using the Linear Models for Microarray Data (LIMMA) package. Resulting p-values may be adjusted for multiple hypothesis testing using the Benjamini-Hochberg correction, which yields a false discovery rate (FDR) for each gene. Significant genes within each study may be filtered to retain DE genes with an FDR < 0.2, which may be considered statistically significant. The FDR threshold may be selected a priori to reduce the number of genes excluded as false negatives.
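The quantile normalization step mentioned above forces every array to share the same distribution of values. The sketch below is a simplified, pure-Python illustration, not the RMA implementation in Bioconductor: it assumes each array is given as a list of probe values in a common probe order and breaks ties arbitrarily rather than averaging them.

```python
def quantile_normalize(columns):
    """Quantile-normalize arrays given as columns (one list of probe
    values per array, all in the same probe order)."""
    n_probes = len(columns[0])
    # Rank order of probes within each array (ties broken arbitrarily).
    orders = [sorted(range(n_probes), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(col[order[k]] for col, order in zip(columns, orders)) / len(columns)
        for k in range(n_probes)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for k, probe in enumerate(order):
            out[probe] = reference[k]  # k-th smallest probe gets k-th reference value
        normalized.append(out)
    return normalized


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain exactly the values 1.5, 3.5, and 5.5, so differences between arrays are carried only by probe rank.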

[0218] In some embodiments, the variance correction comprises employing a local empirical Bayesian shrinkage, adjusting the p-values for multiple hypothesis testing using the Benjamini-Hochberg correction, removing all data with a false discovery rate of 0.2 or greater, or any combination thereof. The Benjamini-Hochberg procedure controls the false discovery rate, i.e., the expected proportion of rejected null hypotheses that are actually true, when many p-values are tested at once.
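The Benjamini-Hochberg adjustment and the FDR < 0.2 filter can be sketched as follows. This plain-Python version is meant to follow the same step-up rule as R's `p.adjust(method = "BH")`. The gene names in the example are placeholders, and `retain_de_genes` is a hypothetical helper, not part of LIMMA.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def retain_de_genes(genes, p_values, fdr_threshold=0.2):
    """Keep genes whose BH-adjusted p-value is below the FDR threshold."""
    fdr = benjamini_hochberg(p_values)
    return [g for g, q in zip(genes, fdr) if q < fdr_threshold]


print(retain_de_genes(["G1", "G2", "G3", "G4"], [0.01, 0.04, 0.03, 0.5]))
# ['G1', 'G2', 'G3']
```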

[0219] In some embodiments, the Weighted Gene Co-expression Network Analysis comprises calculating a topology matrix, clustering the data based on the topology matrix, correlating module eigengenes with traits (traits on a linear scale by Pearson correlation, nonparametric traits by Spearman correlation, and dichotomous traits by point-biserial correlation or t-test), or any combination thereof. A topology matrix may specify the connections between vertices in a directed multigraph.

[0220] Log2-normalized microarray expression values from purified CD4, CD14, CD19, CD33, and low-density granulocyte (LDG) populations may be used as input to WGCNA to conduct an unsupervised clustering analysis, resulting in co-expression “modules,” or groups of densely interconnected genes, which may correspond to comparably regulated biologic pathways. For each experiment, a topological overlap matrix (TOM) of an approximately scale-free network may first be calculated to encode the network strength between probes. Probes may be clustered into WGCNA modules based on TOM distances. Resultant dendrograms of correlation networks may be trimmed to isolate individual modular groups of probes by partitioning around medoids, and the groups may be labeled using color assignments based on module size. Expression profiles of genes within modules may be summarized by a module eigengene (ME), which may be analogous to the module’s first principal component. MEs act as characteristic expression values for their respective modules and may be correlated with sample traits such as SLEDAI or cell type by Pearson correlation for continuous or semi-continuous traits and by point-biserial correlation for dichotomous traits.
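The topological overlap step can be illustrated with the unsigned WGCNA formulation. In this formulation, adjacency is a_ij = |cor(x_i, x_j)|^β and TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij), with l_ij = Σ_u a_iu·a_uj and connectivity k_i = Σ_u a_iu. The following is a minimal pure-Python sketch, not the WGCNA package. It assumes an unsigned network and a soft-threshold power β = 6, which is a common choice but is not stated in this specification. Rows of `expr` are genes and columns are samples, and no gene may have zero variance.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nonzero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency |cor|^beta with a zero diagonal."""
    n = len(expr)
    return [[0.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta
             for j in range(n)] for i in range(n)]


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij); diagonal set to 1."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                l_ij = sum(adj[i][u] * adj[u][j] for u in range(n))
                tom[i][j] = (l_ij + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom
```

The dissimilarity 1 − TOM would then serve as the "TOM distance" used for hierarchical clustering of probes into modules.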

[0221] WGCNA modules from CD4, CD14, CD19, and CD33 cells may be tested for correlation to SLEDAI. Plasma cell modules may be generated by differential expression analysis rather than WGCNA, but may be included because of the established importance of plasma cells in SLE pathogenesis.

[0222] Outliers may be removed by statistical analysis using R and relevant Bioconductor packages. Non-normalized arrays may be inspected for visual artifacts or poor hybridization using Affy QC plots. Principal Component Analysis (PCA) plots may be used to inspect the raw data files for outliers. Data sets culled of outliers may be cleaned of background noise and normalized using RMA, GCRMA, or NEQC where appropriate. Data sets may then be filtered to remove probes with low intensity values and probes without gene annotation data. WB gene expression data sets may be filtered to include only genes that passed quality control in all data sets. Differential expression (DE) analysis and WGCNA may then be carried out on the data sets. WB gene expression data sets may then be further processed before machine learning analysis: WB gene expression values may be centered and scaled to have zero mean and unit variance within each data set, and the standardized expression values from each data set may be joined for classification.
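The per-data-set centering and scaling followed by joining can be sketched as shown below. This is a minimal sketch under stated assumptions. Each data set is a dict mapping gene to a list of values across that data set's samples, and "passed quality control in all data sets" is approximated by keeping genes present in every data set. The sample standard deviation (n − 1) is used, as R's `scale()` does, and a gene with constant expression would raise a division error. The function names are hypothetical.

```python
from statistics import fmean, stdev


def standardize_dataset(dataset):
    """Center and scale each gene to zero mean and unit variance within one
    data set. `dataset` maps gene -> list of expression values."""
    out = {}
    for gene, values in dataset.items():
        mu, sd = fmean(values), stdev(values)  # sample SD; sd must be nonzero
        out[gene] = [(v - mu) / sd for v in values]
    return out


def join_datasets(datasets):
    """Standardize each data set separately, then concatenate samples for the
    genes shared by all data sets."""
    shared = set.intersection(*(set(d) for d in datasets))
    standardized = [standardize_dataset(d) for d in datasets]
    return {g: [v for d in standardized for v in d[g]] for g in sorted(shared)}


d1 = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [5.0, 5.5, 6.0]}
d2 = {"GENE_A": [10.0, 20.0, 30.0], "GENE_B": [1.0, 2.0, 3.0], "GENE_C": [1.0, 2.0, 4.0]}
print(join_datasets([d1, d2])["GENE_A"])  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

Scaling within each data set before joining removes data-set-level offsets such as the tenfold difference in GENE_A above.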

[0223] The GSVA R package may be used as a non-parametric method for estimating the variation of pre-defined gene sets in WB gene expression data sets. Standardized expression values from WB data sets may be used to test for enrichment of cell-specific WGCNA gene modules using the single-sample Gene Set Enrichment Analysis (ssGSEA) method, which scores each sample in isolation and may thus be shielded from technical variation within and among data sets. Statistical analysis of GSVA enrichment scores may be performed by Spearman correlation or Welch’s unequal variances t-test, where appropriate. GSVA may be performed on three WB data sets using 25 WGCNA modules made from purified cells with a correlation or published relationship to SLEDAI.
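The single-sample scoring idea behind ssGSEA can be sketched as follows. This is a simplified illustration of the running-sum statistic, not the GSVA package implementation; in particular, it omits GSVA's final score normalization. It assumes genes are ranked within one sample from highest to lowest expression, that in-set genes are weighted by rank raised to α = 0.25, and that the score sums the difference between the in-set and out-of-set cumulative distributions. Gene names are placeholders.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample.

    expression: dict gene -> expression value in this sample.
    gene_set: iterable of gene names (e.g. one WGCNA module).
    """
    gene_set = set(gene_set)
    # Highest expression first; rank N for the highest, 1 for the lowest.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    rank = {g: n - pos for pos, g in enumerate(ordered)}
    in_set = [g for g in ordered if g in gene_set]
    n_out = n - len(in_set)
    if not in_set or n_out == 0:
        raise ValueError("gene set must overlap some, but not all, genes")
    total_weight = sum(rank[g] ** alpha for g in in_set)
    p_in = p_out = score = 0.0
    for g in ordered:
        if g in gene_set:
            p_in += rank[g] ** alpha / total_weight
        else:
            p_out += 1.0 / n_out
        # Accumulate the gap between the two running cumulative distributions.
        score += p_in - p_out
    return score


sample = {"G1": 4.0, "G2": 3.0, "G3": 2.0, "G4": 1.0}
print(ssgsea_score(sample, {"G1", "G2"}) > 0)  # highly expressed set: True
print(ssgsea_score(sample, {"G3", "G4"}) < 0)  # lowly expressed set: True
```

Because the score depends only on the ranking of genes within one sample, it does not require the other samples in the data set, which is the property the paragraph above relies on.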

[0224] Patterns of enrichment of WGCNA modules that are derived from isolated WB cell populations and correlated to the phenotype may be more useful than gene expression across the pluralities of records for identifying active versus inactive state records. To characterize the relationships between gene signatures from various records and phenotypic activity, WGCNA may be used to generate co-expression gene modules from purified populations of cells from records with an active disease state. Such modules may subsequently be tested for enrichment in the whole blood of other records. WGCNA analysis of leukocyte subsets may result in several gene modules with significant Pearson correlations to SLEDAI (all |r| > 0.47, p < 0.05): CD4, CD14, CD19, and CD33 cells yielded 3, 6, 8, and 4 significant modules, respectively. Two low-density granulocyte (LDG) modules may be created by performing WGCNA analysis of LDGs along with either neutrophils or HC neutrophils and merging the modules most strongly expressed by LDGs. Two plasma cell (PC) modules may be created by using the most increased and decreased transcripts of isolated plasma cells compared to naive and memory B cells.

[0225] Gene Ontology (GO) analysis of the genes within each module indicates that some processes, such as those related to interferon signaling, RNA transcription, and protein translation, may be shared among cell types, whereas other processes may be unique to certain cell types and may be used to improve classification of records.

[0226] GSVA enrichment may be performed using the 25 cell-specific gene modules in WB from 156 records (82 active, 74 inactive). Of the 25 cell-specific modules, 12 had enrichment scores with significant Spearman correlations to SLEDAI (p < 0.05), and 14 had enrichment scores with significant differences between active and inactive state records by Welch’s unequal variances t-test (p < 0.05). Notably, each cell type produced at least one module with a significant correlation to SLEDAI in WB and at least one module with a significant difference in enrichment scores between active and inactive records, demonstrating a relationship between phenotypic activity in specific cellular subsets and overall phenotypic activity in WB. However, as the Spearman’s rho values ranged only from -0.40 to +0.36, no single module may have substantial predictive value. Furthermore, the effect sizes as measured by Cohen’s d when comparing active versus inactive enrichment scores ranged from -0.85 to +0.79. The CD4 Floralwhite and Orangered4 modules, which had the largest positive and negative effect sizes, respectively, showed a high degree of overlap in the enrichment scores of active and inactive records, where error bars indicate mean ± standard deviation. Enrichment of any single module in WB may therefore be unable to fully separate active records from inactive records.
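The two statistics used above to compare active and inactive enrichment scores can be sketched with the standard library. This sketch returns Welch's t statistic and its Welch-Satterthwaite degrees of freedom. A p-value would additionally require the t distribution, for example from SciPy, which is not used here. For Cohen's d, it assumes the common pooled-standard-deviation variant, since the specification does not say which variant was used. The example scores are made up for illustration.

```python
import math
from statistics import fmean, variance


def welch_t(a, b):
    """Welch's unequal-variances t statistic and Welch-Satterthwaite df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (fmean(a) - fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / math.sqrt(pooled)


active = [1.0, 2.0, 3.0, 4.0, 5.0]
inactive = [2.0, 3.0, 4.0, 5.0, 6.0]
print(welch_t(active, inactive))   # t = -1.0, df = 8.0
print(cohens_d(active, inactive))  # about -0.632
```

A |d| below roughly 0.8, as reported above for these modules, still leaves substantial overlap between the two groups' score distributions.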

[0227] Analysis of individual gene modules from peripheral cellular subsets associated with phenotypic activity may not be sufficient to predict phenotypic activity in unrelated WB data sets, since no single module from any cell type may be able to separate active from inactive state records. Although no single module had a sufficiently high predictive value, many cell-specific gene modules may be combined and optimized to predict phenotypes of active records. Moreover, the results emphasize the need for more advanced analytical methods to predict phenotypic activity from gene expression data.

Performance and Accuracy

[0228] When training and testing sets are formed by holding out entire data sets, machine learning algorithms using raw gene expression data had an average classification accuracy of only 53 percent. However, converting this gene expression data to module enrichment improved classification accuracy to 71 percent. When training and testing sets are formed by mixing records from the three data sets, module enrichment remained at a 70 percent classification accuracy, whereas classification accuracy using raw gene expression increased to a mean of 79 percent. The best overall performance came from the random forest classifier, which had a predictive accuracy of 84 percent.

[0229] The performance of each machine learning algorithm may be determined by evaluating two different forms of cross-validation. A random 10-fold cross-validation may randomly assign each record to one of 10 groups. A leave-one-study-out cross-validation may be used to determine the effects of systematic technical differences among data sets on classification performance. For each pass of cross-validation, one fold or study may be held out as a test set, and the classifiers are trained on the remaining data. Accuracy may be assessed as the proportion of records correctly classified across all testing folds. Performance metrics such as sensitivity and specificity may be assessed after cross-validation by agglomerating class probabilities and assignments from each fold or study. Receiver Operating Characteristic (ROC) curves may be generated using the pROC R package.
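The two cross-validation schemes differ only in how records are assigned to folds. The minimal sketch below builds both assignments and pools the held-out predictions into a single accuracy, as described above. The majority-class classifier is a hypothetical stand-in so the example runs without dependencies. It is not the random forest, GLM, or KNN classifier used in the study, and any classifier with the same `fit`/`predict` shape could be substituted.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence

def random_kfold(n: int, k: int = 10, seed: int = 0) -> List[int]:
    """Randomly assign each of n records to one of k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for position, record in enumerate(order):
        folds[record] = position % k
    return folds

def leave_one_study_out(studies: Sequence[Hashable]) -> List[Hashable]:
    """Use each record's study label as its fold, so a whole data set is held out per pass."""
    return list(studies)

def cross_validated_accuracy(X, y, folds, fit: Callable, predict: Callable) -> float:
    """Hold out one fold at a time, train on the rest, and pool the held-out predictions.

    Accuracy is the proportion of records correctly classified across all testing folds.
    """
    predictions = [None] * len(y)
    for held_out in sorted(set(folds), key=str):
        train = [i for i, f in enumerate(folds) if f != held_out]
        test = [i for i, f in enumerate(folds) if f == held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i, label in zip(test, predict(model, [X[i] for i in test])):
            predictions[i] = label
    return sum(p == t for p, t in zip(predictions, y)) / len(y)

# Hypothetical stand-in classifier: always predicts the majority class of the training set.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, X_test):
    return [model] * len(X_test)
```

Because leave-one-study-out never trains on the held-out study, any study-specific batch effect is absent from training. This is the setting in which raw expression values performed worst.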

[0230] In almost all cases, the random forest classifier outperformed the GLM and KNN classifiers, although the differences were not statistically significant when assessed by a test for equality of proportions (p > 0.05). Pooling predictions based on the class probabilities from the three classifiers may not improve overall performance.

[0231] When cross-validating by study, the use of expression values may achieve an accuracy of only 53 percent, which is consistent with the finding that gene expression values may provide less value for classifying unfamiliar records. When the training records and test records are highly heterogeneous, the patterns learned by the classifiers may be less helpful for classifying test records. Remarkably, the use of module enrichment scores improved accuracy to approximately 70 percent.
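A test for equality of two proportions, such as two classifiers' accuracies, can be sketched as a pooled two-proportion z-test. Its squared statistic equals the 2x2 chi-squared statistic without continuity correction. The text does not specify the exact variant used. R's `prop.test`, for example, applies a continuity correction by default, which this sketch omits. Any counts passed in are hypothetical.

```python
import math
from typing import Tuple

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Two-sided pooled z-test of H0: p1 == p2 (no continuity correction).

    x1/n1 and x2/n2 are, e.g., correctly classified records out of all records
    for two classifiers. Returns (z, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                         # proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))             # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

The standard error is undefined when the pooled proportion is exactly 0 or 1, for example when both classifiers are perfect, so a real implementation would guard that case.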

[0232] The 10-fold cross-validation with raw gene expression values may result in better performance than the leave-one-study-out cross-validation. This increase in performance may be attributed to the presence of records from each of the pluralities of first, second, and third records in both the training and test sets, so that the classifiers may learn patterns inherent to each set of records. In this circumstance, the random forest classifier may be the strongest performer, with 84% accuracy (85% sensitivity, 83% specificity), and its ROC curve demonstrates an excellent tradeoff between recall (sensitivity) and fall-out (false positive rate). The performance of module enrichment, however, may not be substantially different between 10-fold cross-validation and leave-one-study-out cross-validation.
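Sensitivity, specificity, and the ROC curve are all derived from pooled predictions. The minimal sketch below computes them, with "active" treated as the positive class. The study used the pROC R package. This standard-library version is a swapped-in illustration of the same quantities and not a port of pROC. It traces (fall-out, recall) points by sweeping the threshold over every distinct predicted probability and integrates them with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple

def sensitivity_specificity(y_true: Sequence[str], y_pred: Sequence[str],
                            positive: str = "active") -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

def roc_points(y_true: Sequence[str], scores: Sequence[float],
               positive: str = "active") -> List[Tuple[float, float]]:
    """(fall-out, recall) pairs from sweeping the threshold over every distinct score."""
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(s >= threshold and t == positive for s, t in zip(scores, y_true))
        fp = sum(s >= threshold and t != positive for s, t in zip(scores, y_true))
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
```

The area computed this way equals the probability that a randomly chosen active record receives a higher score than a randomly chosen inactive one, with ties counted as one half.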

[0233] Overall, in a study-by-study approach (leave-one-study-out cross-validation), module enrichment may be more successful than raw gene expression. By contrast, when using the 10-fold cross-validation approach, raw gene expression may outperform module enrichment. Thus, phenotypic activity classification based on raw gene expression may be sensitive to technical variability, whereas classification based on module enrichment may cope better with variation among data sets.

[0234] Random forest variable importance provides insight into the drivers of phenotypic activity identification. To this end, random forest classifiers may be trained on all records from each of the pluralities of records in order to identify the most important genes and modules, as determined by the mean decrease in Gini impurity, a measure of misclassification error.
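Mean decrease in Gini impurity credits each feature with the impurity reduction achieved at every split that uses it. The minimal sketch below computes this from a list of split records. The input format of (feature, parent labels, left labels, right labels) is a hypothetical simplification; real forests store trees differently. Weighting each split by node size and averaging over trees is one common convention, and normalization details differ between implementations.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Sequence, Tuple

def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k**2: the chance a randomly drawn record is mislabeled
    when labels are assigned at random according to the node's class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(parent: Sequence, left: Sequence, right: Sequence) -> float:
    """Impurity drop from one split, with child impurities weighted by child size."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)

def mean_decrease_gini(splits: Iterable[Tuple[str, Sequence, Sequence, Sequence]],
                       n_trees: int) -> Dict[str, float]:
    """Sum node-size-weighted decreases per feature over all splits, averaged per tree.

    `splits` holds (feature, parent_labels, left_labels, right_labels) for every
    split node in the forest (hypothetical input format for illustration).
    """
    totals: Dict[str, float] = defaultdict(float)
    for feature, parent, left, right in splits:
        totals[feature] += len(parent) * gini_decrease(parent, left, right)
    return {f: v / n_trees for f, v in totals.items()}
```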

[0235] The most important genes and modules spanned a wide array of cell types and biological functions. The most important genes encompass such diverse functions as interferon signaling, pattern recognition receptor signaling, and control of survival and proliferation. Notably, the most influential modules may be skewed away from B cell-derived modules and towards T cell- and myeloid cell-derived modules. As some of these modules had overlapping genes, the variable importance experiment may be repeated with modules that were first scrubbed of any genes appearing in more than one module before GSVA enrichment scoring. The relative variable importance scores of the de-duplicated modules correlated strongly with those of the original modules (Spearman’s rho = 0.73, p = 5.18E-5), indicating that module behavior may be partly driven by the overlapping genes but strongly driven by unique genes.

Variable importance of top 25 individual genes. LDG: low-density granulocyte; PC: plasma cell.

[0236] CD4_Floralwhite and CD14_Yellow, two interferon-related modules that maintained high importance after deduplication, may be further analyzed to study the effect of unique genes on module importance. Gene lists may be tested for statistical overrepresentation of Gene Ontology biological process terms with FDR correction on pantherdb.org. CD4_Floralwhite did not show any significant enrichment, but CD14_Yellow, which had the highest importance after deduplication, was highly enriched for genes with the “Immune Effector Process” designation (26/77 genes, FDR = 9.38E-11 by Fisher’s exact test). This suggests that CD14+ monocytes express unique genes that may play important roles in the initiation of phenotypic activity.
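The deduplication step, which removes every gene that appears in more than one module before enrichment scoring, reduces to a membership count across modules. The following is a minimal sketch. The module names and gene identifiers in the usage example are hypothetical placeholders, not actual module contents.

```python
from collections import Counter
from typing import Dict, Set

def deduplicate_modules(modules: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return copies of the modules keeping only genes that occur in exactly one module."""
    counts = Counter(g for genes in modules.values() for g in set(genes))
    return {name: {g for g in genes if counts[g] == 1} for name, genes in modules.items()}

# Hypothetical modules: G3 is shared, so it is removed from both M1 and M2.
modules = {"M1": {"G1", "G2", "G3"}, "M2": {"G3", "G4"}, "M3": {"G5"}}
print(deduplicate_modules(modules))
```

A module can become empty if all of its genes are shared, so a downstream scorer would need to skip or flag such modules.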

[0237] Several important findings on the topic of gene expression heterogeneity within and across data sets have been elucidated by this study. First, DE analysis of active vs inactive records may be insufficient for proper classification of phenotypic activity, as systematic differences between data sets render conventional bioinformatics techniques largely non-generalizable.

[0238] Further, WGCNA modules created from the cellular components of WB and correlated to SLEDAI phenotypic activity may improve classification of phenotypic activity in records. The use of cell-specific gene modules based on a priori knowledge about their relevance to disease fared slightly better than raw gene expression, as it generated informative enrichment patterns, and many of the modules maintained significant correlations with SLEDAI in WB. However, these enrichment scores failed to completely separate active records from inactive records by hierarchical clustering.

Method Characterization

[0239] Conventional bioinformatics approaches do not satisfactorily identify one or more records having a specific phenotype. DE analysis of active and inactive disease state records within each of a plurality of first records, a plurality of second records, and a plurality of third records revealed major differences and heterogeneity among the pluralities of records. First, the 100 most significant DE genes by FDR in each of the pluralities of first, second, and third records may be used to carry out hierarchical clustering of active and inactive disease state records. Active disease state records may be clearly separated from inactive records in some pluralities of records but only partially separated in others.
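Selecting "the 100 most significant DE genes by FDR" requires multiple-testing-adjusted p-values. The text does not state which FDR procedure was used. The sketch below assumes the Benjamini-Hochberg adjustment, a common choice. It breaks ties in adjusted values by raw p-value. The gene names and p-values in the test are hypothetical.

```python
from typing import Dict, List

def benjamini_hochberg(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Benjamini-Hochberg adjusted p-values (FDR) for a gene -> raw p-value mapping.

    q_(i) = min over j >= i of p_(j) * m / j, capped at 1, where p_(1) <= ... <= p_(m).
    """
    items = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(items)
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        gene, p = items[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[gene] = running_min
    return adjusted

def top_by_fdr(pvalues: Dict[str, float], n: int = 100) -> List[str]:
    """The n genes with the smallest FDR (ties broken by raw p-value)."""
    fdr = benjamini_hochberg(pvalues)
    return sorted(fdr, key=lambda g: (fdr[g], pvalues[g]))[:n]
```

The running minimum enforces monotonicity, so a gene never receives a smaller adjusted value than a gene with a smaller raw p-value.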

[0240] Out of 6,640 unique DE genes from the three pluralities of records, 5,170 genes are unique to one of the pluralities of records, 1,234 are shared by two of the pluralities of records, and 36 are shared by all three. There is minimal overlap of the 100 most significant genes by FDR in each of the pluralities of records. The only overlaps among the top 100 DE genes in each study by FDR are TWY3 and EHBP1, shared between the plurality of first records and the plurality of third records, and LZIC, shared between the plurality of first records and the plurality of second records. Furthermore, the fold change distributions of the 100 most significant DE genes in each of the pluralities of records varied considerably. In the plurality of first records, 94 of the 100 most significant genes are downregulated in active disease state records; in the plurality of second records, all of the top 100 genes are upregulated in active disease state records; and in the plurality of third records, the top 100 genes are more evenly distributed (41 up, 59 down). Orange bars denote active state records, whereas black bars denote inactive state records.
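The overlap counts above are simple set operations. The sketch below tallies how many genes are DE in exactly k of the record sets and lists the pairwise intersections of the sets. The set names and gene identifiers used in the test are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Set, Tuple

def overlap_summary(de_sets: Dict[str, Set[str]]) -> Dict[int, int]:
    """Map k -> number of genes that are DE in exactly k of the record sets."""
    membership = Counter(g for genes in de_sets.values() for g in set(genes))
    return dict(Counter(membership.values()))

def pairwise_overlaps(gene_sets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Intersection for every unordered pair of named gene sets."""
    names = sorted(gene_sets)
    return {(a, b): set(gene_sets[a]) & set(gene_sets[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```

Applied to the three top-100 lists, the union `set().union(*lists)` would contain 300 - 3 = 297 genes. This is because the three overlapping genes (TWY3, EHBP1, and LZIC) are each shared by exactly two lists, which matches the 297 unique genes used for clustering in [0242].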

[0241] The plurality of first, second, and third records may represent different populations and may be collected on different microarray platforms. The lack of commonality among the genes most descriptive of active state records and inactive state records in each of the pluralities of records casts doubt on whether active and inactive states from the different pluralities of records may be easily determined using conventional techniques.

[0242] Records from the pluralities of first, second, and third records may then be joined to evaluate whether unsupervised techniques can separate active state records from inactive state records. The top 100 DE genes by FDR from each of the pluralities of records comprise a combined total of 297 unique genes. Hierarchical clustering of all records on these 297 genes, shown as a heat map, revealed considerable heterogeneity, and active and inactive records did not consistently separate. As such, conventional techniques failed to identify active records, highlighting the need for more advanced algorithms.

Digital Processing Device

[0243] In some embodiments, the platforms, systems, media, and methods described herein include a digital processing device, or use of the same. In further embodiments, the digital processing device includes one or more hardware central processing units (CPUs) or general purpose graphics processing units (GPGPUs) that carry out the device’s functions. In still further embodiments, the digital processing device further comprises an operating system configured to perform executable instructions. In some embodiments, the digital processing device is optionally connected to a computer network. In further embodiments, the digital processing device is optionally connected to the Internet such that it accesses the World Wide Web. In still further embodiments, the digital processing device is optionally connected to a cloud computing infrastructure. In other embodiments, the digital processing device is optionally connected to an intranet. In other embodiments, the digital processing device is optionally connected to a data storage device.

[0244] In accordance with the description herein, suitable digital processing devices include, by way of non-limiting examples, server computers, desktop computers, laptop computers, notebook computers, sub-notebook computers, netbook computers, netpad computers, set-top computers, media streaming devices, handheld computers, Internet appliances, mobile smartphones, tablet computers, personal digital assistants, video game consoles, and vehicles. Those of skill in the art will recognize that many smartphones are suitable for use in the system described herein. Those of skill in the art will also recognize that select televisions, video players, and digital music players with optional computer network connectivity are suitable for use in the system described herein.

Suitable tablet computers include those with booklet, slate, and convertible configurations, known to those of skill in the art.

[0245] In some embodiments, the digital processing device includes an operating system configured to perform executable instructions. The operating system is, for example, software, including programs and data, which manages the device’s hardware and provides services for execution of applications. Those of skill in the art will recognize that suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®. Those of skill in the art will recognize that suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®. In some embodiments, the operating system is provided by cloud computing. Those of skill in the art will also recognize that suitable mobile smart phone operating systems include, by way of non-limiting examples, Nokia® Symbian® OS, Apple® iOS®, Research In Motion® BlackBerry OS®, Google® Android®, Microsoft® Windows Phone® OS, Microsoft® Windows Mobile® OS, Linux®, and Palm® WebOS®. Those of skill in the art will also recognize that suitable media streaming device operating systems include, by way of non-limiting examples, Apple TV®, Roku®, Boxee®, Google TV®, Google Chromecast®, Amazon Fire®, and Samsung® HomeSync®. Those of skill in the art will also recognize that suitable video game console operating systems include, by way of non-limiting examples, Sony® PS3®, Sony® PS4®, Microsoft® Xbox 360®, Microsoft Xbox One, Nintendo® Wii®, Nintendo® Wii U®, and Ouya®.

[0246] In some embodiments, the device includes a storage and/or memory device. The storage and/or memory device is one or more physical apparatuses used to store data or programs on a temporary or permanent basis. In some embodiments, the device is volatile memory and requires power to maintain stored information. In some embodiments, the device is non-volatile memory and retains stored information when the digital processing device is not powered. In further embodiments, the non-volatile memory comprises flash memory. In some embodiments, the volatile memory comprises dynamic random-access memory (DRAM). In some embodiments, the non-volatile memory comprises ferroelectric random access memory (FRAM). In some embodiments, the non-volatile memory comprises phase-change random access memory (PRAM). In other embodiments, the device is a storage device including, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, magnetic disk drives, magnetic tape drives, optical disk drives, and cloud computing-based storage. In further embodiments, the storage and/or memory device is a combination of devices such as those disclosed herein.

[0247] In some embodiments, the digital processing device includes a display to send visual information to a user. In some embodiments, the display is a liquid crystal display (LCD). In further embodiments, the display is a thin film transistor liquid crystal display (TFT-LCD). In some embodiments, the display is an organic light emitting diode (OLED) display. In various further embodiments, an OLED display is a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display. In some embodiments, the display is a plasma display. In other embodiments, the display is a video projector. In yet other embodiments, the display is a head-mounted display in communication with the digital processing device, such as a VR headset. In further embodiments, suitable VR headsets include, by way of non-limiting examples, HTC Vive, Oculus Rift, Samsung Gear VR, Microsoft HoloLens, Razer OSVR, FOVE VR, Zeiss VR One, Avegant Glyph, Freefly VR headset, and the like. In still further embodiments, the display is a combination of devices such as those disclosed herein.

[0248] In some embodiments, the digital processing device includes an input device to receive information from a user. In some embodiments, the input device is a keyboard. In some embodiments, the input device is a pointing device including, by way of non-limiting examples, a mouse, trackball, track pad, joystick, game controller, or stylus. In some embodiments, the input device is a touch screen or a multi-touch screen. In other embodiments, the input device is a microphone to capture voice or other sound input. In other embodiments, the input device is a video camera or other sensor to capture motion or visual input. In further embodiments, the input device is a Kinect, Leap Motion, or the like. In still further embodiments, the input device is a combination of devices such as those disclosed herein.

Non-transitory computer readable storage medium

[0249] In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked digital processing device. In further embodiments, a computer readable storage medium is a tangible component of a digital processing device. In still further embodiments, a computer readable storage medium is optionally removable from a digital processing device. In some embodiments, a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, cloud computing systems and services, and the like. In some cases, the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.

Computer Program

[0250] In some embodiments, the platforms, systems, media, and methods disclosed herein include at least one computer program, or use of the same. A computer program includes a sequence of instructions, executable in the digital processing device’s CPU, written to perform a specified task. Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, and the like, that perform particular tasks or implement particular abstract data types. In light of the disclosure provided herein, those of skill in the art will recognize that a computer program may be written in various versions of various languages.

[0251] The functionality of the computer readable instructions may be combined or distributed as desired in various environments. In some embodiments, a computer program comprises one sequence of instructions. In some embodiments, a computer program comprises a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules. In various embodiments, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.

Web application

[0252] In some embodiments, a computer program includes a web application. In light of the disclosure provided herein, those of skill in the art will recognize that a web application, in various embodiments, utilizes one or more software frameworks and one or more database systems. In some embodiments, a web application is created upon a software framework such as Microsoft® .NET or Ruby on Rails (RoR). In some embodiments, a web application utilizes one or more database systems including, by way of non-limiting examples, relational, non-relational, object oriented, associative, and XML database systems. In further embodiments, suitable relational database systems include, by way of non-limiting examples, Microsoft® SQL Server, MySQL™, and Oracle®. Those of skill in the art will also recognize that a web application, in various embodiments, is written in one or more versions of one or more languages. A web application may be written in one or more markup languages, presentation definition languages, client-side scripting languages, server-side coding languages, database query languages, or combinations thereof. In some embodiments, a web application is written to some extent in a markup language such as Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or Extensible Markup Language (XML). In some embodiments, a web application is written to some extent in a presentation definition language such as Cascading Style Sheets (CSS). In some embodiments, a web application is written to some extent in a client-side scripting language such as Asynchronous JavaScript and XML (AJAX), Flash® ActionScript, JavaScript, or Silverlight®. In some embodiments, a web application is written to some extent in a server-side coding language such as Active Server Pages (ASP), ColdFusion®, Perl, Java™, JavaServer Pages (JSP), Hypertext Preprocessor (PHP), Python™, Ruby, Tcl, Smalltalk, WebDNA®, or Groovy. In some embodiments, a web application is written to some extent in a database query language such as Structured Query Language (SQL). In some embodiments, a web application integrates enterprise server products such as IBM® Lotus Domino®. In some embodiments, a web application includes a media player element. In various further embodiments, a media player element utilizes one or more of many suitable multimedia technologies including, by way of non-limiting examples, Adobe® Flash®, HTML 5, Apple® QuickTime®, Microsoft® Silverlight®, Java™, and Unity®.

Standalone Application

[0253] In some embodiments, a computer program includes a standalone application, which is a program that is run as an independent computer process, not an add-on to an existing process, e.g., not a plug-in. Those of skill in the art will recognize that standalone applications are often compiled. A compiler is a computer program that transforms source code written in a programming language into binary object code such as assembly language or machine code. Suitable compiled programming languages include, by way of non-limiting examples, C, C++, Objective-C, COBOL, Delphi, Eiffel, Java™, Lisp, Python™, Visual Basic, and VB .NET, or combinations thereof. Compilation is often performed, at least in part, to create an executable program. In some embodiments, a computer program includes one or more executable compiled applications.

Web Browser Plug-in

[0254] In some embodiments, the computer program includes a web browser plug-in (e.g., extension, etc.). In computing, a plug-in is one or more software components that add specific functionality to a larger software application. Makers of software applications support plug-ins to enable third-party developers to create abilities which extend an application, to support easily adding new features, and to reduce the size of an application. When supported, plug-ins enable customizing the functionality of a software application. For example, plug-ins are commonly used in web browsers to play video, generate interactivity, scan for viruses, and display particular file types.

Those of skill in the art will be familiar with several web browser plug-ins, including Adobe® Flash® Player, Microsoft® Silverlight®, and Apple® QuickTime®.

[0255] In view of the disclosure provided herein, those of skill in the art will recognize that several plug-in frameworks are available that enable development of plug-ins in various programming languages, including, by way of non-limiting examples, C++, Delphi, Java™, PHP, Python™, and VB .NET, or combinations thereof.

[0256] Web browsers (also called Internet browsers) are software applications, designed for use with network-connected digital processing devices, for retrieving, presenting, and traversing information resources on the World Wide Web. Suitable web browsers include, by way of non-limiting examples, Microsoft® Internet Explorer®, Mozilla® Firefox®, Google® Chrome, Apple® Safari®, Opera Software® Opera®, and KDE Konqueror. In some embodiments, the web browser is a mobile web browser. Mobile web browsers (also called microbrowsers, mini-browsers, and wireless browsers) are designed for use on mobile digital processing devices including, by way of non-limiting examples, handheld computers, tablet computers, netbook computers, subnotebook computers, smartphones, music players, personal digital assistants (PDAs), and handheld video game systems. Suitable mobile web browsers include, by way of non-limiting examples, Google® Android® browser, RIM BlackBerry® Browser, Apple® Safari®, Palm® Blazer, Palm® WebOS® Browser, Mozilla® Firefox® for mobile, Microsoft® Internet Explorer® Mobile, Amazon® Kindle® Basic Web, Nokia® Browser, Opera Software® Opera® Mobile, and Sony® PSP™ browser.

Software Modules

[0257] In some embodiments, the platforms, systems, media, and methods disclosed herein include software, server, and/or database modules, or use of the same. In view of the disclosure provided herein, software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art. The software modules disclosed herein are implemented in a multitude of ways. In various embodiments, a software module comprises a file, a section of code, a programming object, a programming structure, or combinations thereof. In further various embodiments, a software module comprises a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, or combinations thereof. In various embodiments, the one or more software modules comprise, by way of non-limiting examples, a web application, a mobile application, and a standalone application. In some embodiments, software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In further embodiments, software modules are hosted on cloud computing platforms. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.

Databases

[0258] In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more databases, or use of the same. In view of the disclosure provided herein, those of skill in the art will recognize that many databases are suitable for identifying one or more records having a specific phenotype. In various embodiments, suitable databases include, by way of non-limiting examples, relational databases, non-relational databases, object oriented databases, object databases, entity-relationship model databases, associative databases, and XML databases. Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, and Sybase. In some embodiments, a database is internet-based. In further embodiments, a database is web-based. In still further embodiments, a database is cloud computing-based. In other embodiments, a database is based on one or more local computer storage devices.

Biological Data Analysis

[0259] The present disclosure provides systems and methods to perform data analysis using drug or target scoring algorithms and/or big data analysis tools. In various aspects, such drug or target scoring algorithms and/or big data analysis tools may be used to perform analysis of data sets including, for example, mRNA gene expression or transcriptome data, DNA genomic data, proteomic data, metabolomic data, other types of “-omic” data, or a combination thereof.

[0260] In an aspect, the present disclosure provides a computer-implemented method for assessing a condition of a subject, comprising: (a) receiving a dataset of a biological sample of the subject; (b) selecting one or more data analysis tools, wherein the one or more data analysis tools comprise an analysis tool selected from the group consisting of: a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring™ analysis tool, a Gene Set Variation Analysis (GSVA) tool (e.g., P-Scope), a CoLTs® (Combined Lupus Treatment Scoring) analysis tool, and a Target Scoring analysis tool; (c) processing the dataset using the one or more data analysis tools to generate a data signature of the biological sample of the subject; and (d) based at least in part on the data signature generated in (c), assessing the condition of the subject.
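The four steps (a) through (d) can be sketched as a small pipeline. This is purely illustrative and rests on stated assumptions. The proprietary tools named above (BIG-C™, I-Scope™, and so on) are not modeled. The tool registry, the `gene_set_mean_tool` stand-in, and the threshold rule in step (d) are all hypothetical and only show how selection, processing into a data signature, and assessment could be wired together.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

Sample = Dict[str, float]          # (a) hypothetical dataset: gene -> expression for one sample
Tool = Callable[[Sample], float]   # a tool maps the sample to one signature component

def gene_set_mean_tool(genes: Iterable[str]) -> Tool:
    """Hypothetical stand-in tool: mean expression over a gene set (absent genes skipped)."""
    gene_list = list(genes)
    def tool(sample: Sample) -> float:
        values = [sample[g] for g in gene_list if g in sample]
        return mean(values) if values else 0.0
    return tool

def assess_condition(sample: Sample, registry: Dict[str, Tool],
                     selected: List[str], threshold: float) -> Tuple[Dict[str, float], bool]:
    # (b) selection: keep only the tools named by the user or by an automatic rule
    tools = {name: registry[name] for name in selected}
    # (c) processing: the data signature holds one value per selected tool
    signature = {name: tool(sample) for name, tool in tools.items()}
    # (d) assessment: hypothetical rule, flag if the mean component exceeds the threshold
    flagged = mean(signature.values()) > threshold
    return signature, flagged
```

Keeping tools behind a common callable interface lets selection in step (b) be either user-driven or automatic, as both options are contemplated in [0263].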

[0261] In some embodiments, the dataset comprises mRNA gene expression or transcriptome data, DNA genomic data, proteomic data, metabolomic data, or a combination thereof. In some embodiments, the biological sample comprises a whole blood (WB) sample, a PBMC sample, a tissue sample, a cell sample, or any derivative thereof. In some embodiments, assessing the condition of the subject comprises identifying a disease or disorder of the subject.

[0262] In some embodiments, the method further comprises identifying a disease or disorder of the subject at a sensitivity or specificity of at least about 70%. In some embodiments, the method further comprises determining a likelihood of the identification of the disease or disorder of the subject. In some embodiments, the method further comprises providing a therapeutic intervention for the disease or disorder of the subject. In some embodiments, the method further comprises monitoring the disease or disorder of the subject, wherein the monitoring comprises assessing the disease or disorder of the subject at a plurality of time points, wherein the assessing is based at least on the disease or disorder identified at each of the plurality of time points.

[0263] In some embodiments, selecting the one or more data analysis tools comprises receiving a user selection of the one or more data analysis tools. In some embodiments, selecting the one or more data analysis tools is automatically performed by the computer without receiving a user selection of the one or more data analysis tools.

[0264] In another aspect, the present disclosure provides a computer system for assessing a condition of a subject, comprising: a database that is configured to store a dataset of a biological sample of the subject; and one or more computer processors operatively coupled to the database, wherein the one or more computer processors are individually or collectively programmed to: (i) select one or more data analysis tools comprising: a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring™ analysis tool, a Gene Set Variation Analysis (GSVA) tool (e.g., P-Scope), a CoLTs® (Combined Lupus Treatment Scoring) analysis tool, a Target Scoring analysis tool, or a combination thereof; (ii) process the dataset using the one or more data analysis tools to generate a data signature of the biological sample of the subject; and (iii) based at least in part on the data signature generated in (ii), assess the condition of the subject.

[0265] In another aspect, the present disclosure provides a non-transitory computer readable medium comprising machine-executable code that, upon execution by one or more computer processors, implements a method for assessing a condition of a subject, the method comprising: (a) receiving a dataset of a biological sample of the subject; (b) selecting one or more data analysis tools, wherein the one or more data analysis tools comprise an analysis tool selected from the group consisting of: a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring™ analysis tool, a Gene Set Variation Analysis (GSVA) tool (e.g., P-Scope), a CoLTs® (Combined Lupus Treatment Scoring) analysis tool, and a Target Scoring analysis tool; (c) processing the dataset using the one or more data analysis tools to generate a data signature of the biological sample of the subject; and (d) based at least in part on the data signature generated in (c), assessing the condition of the subject. In any embodiment described herein, the one or more data analysis tools may be a plurality of data analysis tools each independently selected from a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring™ analysis tool, a Gene Set Variation Analysis (GSVA) tool (e.g., P-Scope), a CoLTs® (Combined Lupus Treatment Scoring) analysis tool, and a Target Scoring analysis tool.

[0266] To obtain a blood sample, various techniques may be used, e.g., a syringe or other vacuum suction device. A blood sample may be optionally pre-treated or processed prior to use. A sample, such as a blood sample, may be analyzed under any of the methods and systems herein within 4 weeks, 2 weeks, 1 week, 6 days, 5 days, 4 days, 3 days, 2 days, 1 day, 12 hr, 6 hr, 3 hr, 2 hr, or 1 hr from the time the sample is obtained, or longer if frozen. When obtaining a sample from a subject (e.g., blood sample), the amount may vary depending upon subject size and the condition being screened. In some embodiments, at least 10 mL, 5 mL, 1 mL, 0.5 mL, 250, 200, 150, 100, 50, 40, 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 µL of a sample is obtained. In some embodiments, 1-50, 2-40, 3-30, or 4-20 µL of sample is obtained. In some embodiments, more than 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 µL of a sample is obtained.

[0267] The sample may be taken before and/or after treatment of a subject with a disease or disorder. Samples may be obtained from a subject during a treatment or a treatment regime. Multiple samples may be obtained from a subject to monitor the effects of the treatment over time. The sample may be taken from a subject known or suspected of having a disease or disorder for which a definitive positive or negative diagnosis is not available via clinical tests. The sample may be taken from a subject suspected of having a disease or disorder. The sample may be taken from a subject experiencing unexplained symptoms, such as fatigue, nausea, weight loss, aches and pains, weakness, or bleeding. The sample may be taken from a subject having explained symptoms. The sample may be taken from a subject at risk of developing a disease or disorder due to factors such as familial history, age, hypertension or pre-hypertension, diabetes or pre-diabetes, overweight or obesity, environmental exposure, lifestyle risk factors (e.g., smoking, alcohol consumption, or drug use), or presence of other risk factors.

[0268] In some embodiments, a sample may be taken at a first time point and assayed, and then another sample may be taken at a subsequent time point and assayed. Such methods may be used, for example, for longitudinal monitoring purposes to track the development or progression of a disease. In some embodiments, the progression of a disease may be tracked before treatment, after treatment, or during the course of treatment, to determine the treatment’s effectiveness. For example, a method as described herein may be performed on a subject prior to, and after, treatment with a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition therapy to measure the disease’s progression or regression in response to the lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition therapy.

[0269] After obtaining a sample from the subject, the sample may be processed to generate datasets indicative of a disease or disorder of the subject. For example, a presence, absence, or quantitative assessment of nucleic acid molecules of the sample from a panel of condition-associated genomic loci may be indicative of a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition of the subject. Processing the sample obtained from the subject may comprise (i) subjecting the sample to conditions that are sufficient to isolate, enrich, or extract a plurality of nucleic acid molecules, and (ii) assaying the plurality of nucleic acid molecules to generate the dataset (e.g., microarray data, nucleic acid sequences, or quantitative polymerase chain reaction (qPCR) data). Methods of assaying may include any assay known in the art or described in the literature, for example, a microarray assay, a sequencing assay (e.g., DNA sequencing, RNA sequencing, or RNA-Seq), or a quantitative polymerase chain reaction (qPCR) assay.

[0270] In some embodiments, a plurality of nucleic acid molecules is extracted from the sample and subjected to sequencing to generate a plurality of sequencing reads. The nucleic acid molecules may comprise ribonucleic acid (RNA) or deoxyribonucleic acid (DNA). The extraction method may extract all RNA or DNA molecules from a sample. Alternatively, the extraction method may selectively extract a portion of RNA or DNA molecules from a sample. Extracted RNA molecules from a sample may be converted to cDNA molecules by reverse transcription (RT).

[0271] The sample may be processed without any nucleic acid extraction. For example, the disease or disorder may be identified or monitored in the subject by using probes configured to selectively enrich nucleic acid (e.g., RNA or DNA) molecules corresponding to a panel of condition-associated genomic loci. The probes may be nucleic acid primers. The probes may have sequence complementarity with nucleic acid sequences from one or more of the panel of condition-associated genomic loci. The panel of condition-associated genomic loci may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 55, at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, at least about 90, at least about 95, at least about 100, or more condition-associated genomic loci.

[0272] The probes may be nucleic acid molecules (e.g., RNA or DNA) having sequence complementarity with nucleic acid sequences (e.g., RNA or DNA) of one or more genomic loci (e.g., condition-associated genomic loci). These nucleic acid molecules may be primers or enrichment sequences. The assaying of the sample using probes that are selective for the one or more genomic loci (e.g., condition-associated genomic loci) may comprise use of array hybridization, polymerase chain reaction (PCR), or nucleic acid sequencing (e.g., RNA sequencing or DNA sequencing, such as RNA-Seq).

[0273] The assay readouts may be quantified at one or more genomic loci (e.g., condition-associated genomic loci) to generate the data indicative of the disease or disorder. For example, quantification of array hybridization or polymerase chain reaction (PCR) corresponding to a plurality of genomic loci (e.g., condition-associated genomic loci) may generate data indicative of the disease or disorder. Assay readouts may comprise quantitative PCR (qPCR) values, digital PCR (dPCR) values, digital droplet PCR (ddPCR) values, fluorescence values, etc., or normalized values thereof.
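The disclosure does not specify how qPCR readouts are normalized. One widely used option, shown here only as an illustrative sketch, is relative quantification by the 2^-ΔΔCt method, which assumes roughly equal, near-100% amplification efficiency for the target and reference genes. The function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_calibrator: float, ct_reference_calibrator: float) -> float:
    """Fold change of a target gene by the 2^-ddCt method.

    ct_reference is a reference (e.g., housekeeping) gene measured in the same sample;
    the *_calibrator values come from a calibrator sample such as a healthy control.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: relative to the reference gene, the target crosses
    # threshold 3 cycles earlier in the patient sample than in the control.
    fold = relative_expression(22.0, 18.0, 25.0, 18.0)
    print(f"fold change: {fold:.1f}")  # fold change: 8.0
```

Because each cycle roughly doubles the product, a ΔΔCt of -3 corresponds to about an 8-fold increase; departures from ideal efficiency would call for an efficiency-corrected model instead.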

Big data analysis tools and drug/target scoring algorithms

[0274] The present disclosure provides systems and methods to perform data analysis using drug or target scoring algorithms and/or big data analysis tools. In various aspects, such drug or target scoring algorithms and/or big data analysis tools may be used to perform analysis of data sets including, for example, mRNA gene expression or transcriptome data, DNA genomic data, proteomic data, metabolomic data, other types of “-omic” data, or a combination thereof. Systems and methods of the present disclosure may use one or more of the following: a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring™ analysis tool, a Gene Set Variation Analysis (GSVA) tool (e.g., P-Scope), a CoLTs® (Combined Lupus Treatment Scoring) analysis tool, and a Target Scoring analysis tool.

[0275] A non-limiting example of a workflow of a method to assess a condition of a subject using one or more data analysis tools and/or algorithms may comprise receiving a dataset of a biological sample of a subject. Next, the method may comprise selecting one or more data analysis tools and/or algorithms. For example, the data analysis tools and/or algorithms may comprise a BIG-C™ big data analysis tool, an I-Scope™ big data analysis tool, a T-Scope™ big data analysis tool, a CellScan big data analysis tool, an MS (Molecular Signature) Scoring™ analysis tool, a Gene Set Variation Analysis (GSVA) tool (e.g., P-Scope), a CoLTs® (Combined Lupus Treatment Scoring) analysis tool, a Target Scoring analysis tool, or a combination thereof. Next, the method may comprise processing the dataset using the selected data analysis tools and/or algorithms to generate a data signature of the biological sample of the subject. Next, the method may comprise assessing the condition of the subject based on the data signature.
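The receive, select, process, and assess workflow can be sketched as a registry of interchangeable analysis functions whose outputs are merged into one data signature. The tool names, the gene-to-fold-change input format, the example genes, and the threshold rule below are hypothetical placeholders; they do not represent the actual BIG-C™, I-Scope™, or other named tools.

```python
from typing import Callable, Dict, List

Dataset = Dict[str, float]    # assumed input: gene symbol -> log2 fold change vs. control
Signature = Dict[str, float]  # named features produced by the selected tools


def interferon_module_mean(data: Dataset) -> Signature:
    """Toy tool: mean log2 fold change over a small, illustrative interferon gene module."""
    module = ["IFI27", "IFI44L", "RSAD2"]
    values = [data[g] for g in module if g in data]
    return {"ifn_module_mean": sum(values) / len(values) if values else 0.0}


def upregulated_count(data: Dataset) -> Signature:
    """Toy tool: number of genes with log2 fold change above 1."""
    return {"n_upregulated": float(sum(1 for v in data.values() if v > 1.0))}


# Hypothetical registry; a real system would register the disclosed tools here.
TOOLS: Dict[str, Callable[[Dataset], Signature]] = {
    "ifn_module": interferon_module_mean,
    "up_count": upregulated_count,
}


def assess(data: Dataset, selected: List[str], threshold: float = 1.0) -> str:
    """Run each selected tool, merge outputs into one signature, and apply a toy rule."""
    signature: Signature = {}
    for name in selected:  # selection may come from a user or be made automatically
        signature.update(TOOLS[name](data))
    # Placeholder decision rule; the disclosure does not define assessment thresholds.
    return "elevated" if signature.get("ifn_module_mean", 0.0) > threshold else "not elevated"


if __name__ == "__main__":
    sample = {"IFI27": 3.0, "IFI44L": 2.0, "RSAD2": 1.0, "ACTB": 0.1}
    print(assess(sample, ["ifn_module", "up_count"]))  # elevated
```

Keeping each tool behind the same function signature is one way to let the selection step (user-driven or automatic) vary without changing the processing and assessment steps.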

[0276] The BIG-C (Biologically Informed Gene Clustering) tool may be configured to sort large groups of genes into a set of functional groups (e.g., 53 functional groups). The functional groups are created utilizing publicly available information from online tools and databases including UniProtKB/Swiss-Prot, GO Terms, KEGG pathways, NCBI PubMed, and the Interactome. The functional groups may include one or more of: active RNA, anti-apoptosis, anti-proliferation, autophagy, chromatin remodeling, cytoplasm and biochemistry, cytoskeleton, DNA repair, endocytosis, endoplasmic reticulum, endosome and vesicles, fatty acid biosynthesis, cell surface, transcription, glycolysis and gluconeogenesis, Golgi, immune cell surface, immune secreted, immune signaling, integrin pathway, interferon stimulated genes, intracellular signaling, lysosome, melanosome, MHC class I, MHC class II, microRNA processing, microRNA, mitochondrial transcription, mitochondria, mitochondria oxidative phosphorylation, mitochondrial TCA cycle, mRNA processing, mRNA splicing, non-coding RNA, nuclear receptor, nucleus and nucleolus, palmitoylation, pattern recognition receptors, peroxisomes, pro-apoptosis, pro-cell cycle, proteasome, pseudogenes, RAS superfamily, reactive oxygen species protection, secreted and extracellular matrix, transcription factors, transporters, transposon control, ubiquitylation and sumoylation, unfolded protein and stress, and unknown. Enrichment scores for each group are calculated based on an overlap p-value to determine the functional groups that are over- or under-expressed in the gene expression dataset. The BIG-C may be configured such that each gene is sorted into only one of the 53 functional groups, allowing for a quick and relatively simple understanding of the types of genes enriched and co-expressed in a big dataset.
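The overlap p-value can be illustrated with a one-sided hypergeometric test, a common choice for over-representation analysis. The disclosure does not state which overlap test BIG-C uses, so the test choice, the function names, and the toy gene-to-group map below are assumptions. Because each gene belongs to exactly one group, the group lookup is a plain dictionary.

```python
from math import comb
from typing import Dict, Iterable, Tuple


def overlap_p_value(universe: int, group_size: int, list_size: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(universe, group_size, list_size)."""
    upper = min(group_size, list_size)
    tail = sum(comb(group_size, k) * comb(universe - group_size, list_size - k)
               for k in range(overlap, upper + 1))
    return tail / comb(universe, list_size)


def group_enrichment(gene_to_group: Dict[str, str],
                     query_genes: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Return {group: (overlap count, over-representation p-value)} for a query gene list."""
    hits = {g for g in query_genes if g in gene_to_group}
    results = {}
    for group in set(gene_to_group.values()):
        members = {g for g, grp in gene_to_group.items() if grp == group}
        k = len(hits & members)
        results[group] = (k, overlap_p_value(len(gene_to_group), len(members), len(hits), k))
    return results


if __name__ == "__main__":
    toy_map = {"IFI27": "interferon stimulated genes", "RSAD2": "interferon stimulated genes",
               "MX1": "interferon stimulated genes", "GOLGA2": "Golgi",
               "GOLGB1": "Golgi", "GORASP1": "Golgi"}
    for group, (k, p) in sorted(group_enrichment(toy_map, ["IFI27", "RSAD2", "MX1"]).items()):
        print(group, k, round(p, 3))
```

Running the same function separately on up-regulated and down-regulated gene lists is one way to distinguish over-expressed from under-expressed groups.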

[0277] The I-Scope™ tool may be configured to identify immune infiltrates. Hematopoietic cells are unique in that they move throughout the body patrolling for threats to the host, and may infiltrate tissue sites not normally home to immune cells. I-Scope™ may be configured to identify hematopoietic cells through an iterative search of more than 17,000 genes identified in more than 50 microarray datasets. From this search, 1226 candidate genes are identified and researched for restriction in hematopoietic cells as determined by the HPA, GTEx and FANTOM5 datasets (e.g., available at proteinatlas.org). 926 genes meet the criteria for being mainly restricted to hematopoietic lineages (brain and reproductive organ exclusions were permitted). These genes are researched for immune cell specific expression in 27 hematopoietic sub-categories: alpha beta T cell, T cell, regulatory T cell, activated T cell, anergic T cell, gamma delta T cells, CD8 T, NK/NKT cell, NK cell, T & B cells, B cells, germinal center B cells, B cell and plasmacytoid dendritic cell, T & B & myeloid, B & myeloid, T & myeloid, MHC Class II expressing cell, monocyte, dendritic cell, plasmacytoid dendritic cells, myeloid cell, plasma cell, erythrocyte, neutrophil, low density granulocyte, granulocyte, and platelet. Transcripts are entered into I-Scope™ and the number of transcripts in each category is determined. Odds ratios with confidence intervals are calculated using Fisher's exact test in R.
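Per-category enrichment of this kind can be sketched with a 2x2 table for each hematopoietic category (query transcripts in or out of the category versus background genes in or out of it). This sketch, written in Python rather than R, uses a one-sided Fisher's exact p-value together with the sample odds ratio and a Woolf (log-scale) confidence interval. R's `fisher.test` instead reports a conditional maximum-likelihood odds ratio with an exact interval, so the odds-ratio numbers are not expected to match it exactly. The table layout and function names are assumptions.

```python
from math import comb, exp, log, sqrt
from typing import Tuple


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher's exact p-value for the table [[a, b], [c, d]].

    a = query transcripts in the category, b = query transcripts outside it,
    c = background genes in the category not in the query, d = all remaining genes.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def odds_ratio_woolf(a: float, b: float, c: float, d: float,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Sample odds ratio with an approximate 95% Woolf CI (0.5 added to all cells if any is 0)."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


if __name__ == "__main__":
    # Hypothetical counts for one category, e.g. "plasma cell".
    table = (12, 188, 30, 16770)
    print("p =", fisher_greater(*table))
    print("OR, lower, upper =", odds_ratio_woolf(*table))
```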

[0278] The T-Scope™ tool may be configured to help identify types of non-hematopoietic cells in gene expression datasets. T-Scope™ may be configured by downloading approximately 10,000 tissue enriched and 8,000 cell line enriched genes from the Human Protein Atlas along with their tissue or cell line designation (e.g., available at proteinatlas.org). Genes found in more than four tissues are eliminated. Housekeeping genes described in the gene expression study by She et al. are also removed (e.g., as described by She et al., “Definition, conservation and epigenetics of housekeeping and tissue-enriched genes,” BMC Genomics 2009, 10:269, which is incorporated herein by reference in its entirety). This list is further curated by removing genes differentially expressed in 34 hematopoietic cell gene expression datasets and adding kidney specific genes from datasets downloaded from the GEO repository and processed by Ampel BioSolutions. The resulting categories of genes represent genes enriched in the following 42 tissue/cell specific categories: adrenal gland, breast, cartilage, cerebral cortex, uterine cervix, chondrocyte, colon, duodenum, endometrium, epididymis, esophagus fallopian tube, esophagus, fibroblast, heart muscle, keratinocyte, kidney, liver, lung, melanocyte, ovary, pancreas, parathyroid gland, placenta, podocyte, prostate, rectum, salivary gland, seminal vesicle, skeletal muscle, skin, small intestine, smooth muscle, stomach, synoviocyte, testis, kidney loop of Henle, kidney proximal tubule, kidney distal tubule, and kidney collecting duct.

[0279] The CellScan tool may be a combination of I-Scope™ and T-Scope™, and may be configured to analyze tissues with suspected immune infiltrations that may also have tissue specific genes. CellScan may potentially be more stringent than either I-Scope™ or T-Scope™ because it may be used to distinguish resident tissue cells from non-resident hematopoietic cells.
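The T-Scope™ curation steps (drop genes found in more than four tissues, remove housekeeping genes and genes differentially expressed in hematopoietic datasets, then add kidney compartment genes) can be sketched as a set-filtering function. The input formats and the toy gene lists are hypothetical; in practice the (gene, tissue) pairs would come from Human Protein Atlas downloads and the exclusion sets from the cited study and the GEO datasets.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_tissue_categories(tissue_enriched: Iterable[Tuple[str, str]],
                            housekeeping: Set[str],
                            hematopoietic_de: Set[str],
                            kidney_specific: Iterable[Tuple[str, str]],
                            max_tissues: int = 4) -> Dict[str, Set[str]]:
    """Return {tissue/cell category: genes} after the filtering steps described above."""
    tissues_per_gene: Dict[str, Set[str]] = defaultdict(set)
    for gene, tissue in tissue_enriched:
        tissues_per_gene[gene].add(tissue)

    categories: Dict[str, Set[str]] = defaultdict(set)
    for gene, tissues in tissues_per_gene.items():
        if len(tissues) > max_tissues:        # found in more than four tissues
            continue
        if gene in housekeeping or gene in hematopoietic_de:
            continue                          # housekeeping or immune-cell DE gene
        for tissue in tissues:
            categories[tissue].add(gene)

    for gene, compartment in kidney_specific:  # curated kidney compartment genes are added
        categories[compartment].add(gene)
    return dict(categories)


if __name__ == "__main__":
    pairs = [("NPHS1", "podocyte"), ("ALB", "liver"), ("ACTB", "liver")]
    print(build_tissue_categories(pairs, {"ACTB"}, set(), [("UMOD", "kidney loop of Henle")]))
```

Returning sets rather than lists keeps later overlap counts against a query transcript list to a simple set intersection.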

[0280] The MS (Molecular Signature) Scoring tool may be configured to assess specific pathways in a disease state. Information on genes that encode proteins that participate in a specific signaling pathway, and on whether each gene product promotes or inhibits the pathway, is compiled and curated through literature mining. Curated pathways presented by the company include CD40-CD40 ligand, IL-6, IL-12/23, TNF, IL-17, IL-21, S1P1, IL-13 and PDE4, but this method may be used for any known signaling pathway with available data. To determine if a signaling pathway is over- or under-expressed in a microarray dataset, the gene list for each signaling pathway may be queried against the limma differentially expressed genes from a disease state compared to healthy controls, and the differentially expressed genes in the signaling pathway may be identified for each set. The fold changes for genes that promote the pathway may be added together, and the fold changes for genes that inhibit the pathway may be subtracted from the score. This total score may be normalized based on the number of genes that may be detected on the specific microarray platform used for the experiment. Activation scores of -100 to +100 may be determined using this method, with negative scores indicating an inhibition of the specific pathway in the disease state and positive scores indicating an up-regulation of a specific pathway in the disease state. Fisher's exact test may be performed to determine if there is sufficient overlap of genes between the experimental differentially expressed genes and the genes in the signaling pathway.
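The additive score in this paragraph can be sketched as follows, assuming signed log2 fold changes (so down-regulation is negative) and a pathway definition that tags each gene as promoting (+1) or inhibiting (-1). The text does not fully specify how the normalized total is mapped onto the -100 to +100 range, so this sketch stops at the normalized total. All names are hypothetical, and the IL-6 gene roles are illustrative rather than a curated pathway.

```python
from typing import Dict, Set


def ms_fold_change_score(pathway: Dict[str, int],
                         de_log2fc: Dict[str, float],
                         platform_genes: Set[str]) -> float:
    """Signed sum of DE fold changes over pathway genes, divided by pathway genes on the platform.

    pathway: {gene: +1 if its product promotes the pathway, -1 if it inhibits it}
    de_log2fc: differentially expressed genes (e.g., from limma) -> log2 fold change
    platform_genes: genes detectable on the array used for the experiment
    """
    detectable = [g for g in pathway if g in platform_genes]
    if not detectable:
        return 0.0
    # Promoter fold changes are added; inhibitor fold changes are subtracted.
    total = sum(pathway[g] * de_log2fc[g] for g in detectable if g in de_log2fc)
    return total / len(detectable)


if __name__ == "__main__":
    il6_pathway = {"IL6": 1, "IL6R": 1, "IL6ST": 1, "SOCS3": -1}  # illustrative subset only
    de = {"IL6": 2.0, "SOCS3": -1.0}
    print(ms_fold_change_score(il6_pathway, de, {"IL6", "IL6R", "SOCS3"}))
```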

[0281] Gene Set Variation Analysis (GSVA) may be performed (for example, as described in Catalina et al., 2019, Communications Biology, “Gene expression analysis delineates the potential roles of multiple interferons in systemic lupus erythematosus,” which is incorporated herein by reference in its entirety) to determine enrichment of signaling pathways in individual patient samples. Gene set variation analysis may be performed using an open-source software package for the R programming language, available from Bioconductor (bioconductor.org), e.g., as described by Hänzelmann et al. (“GSVA: gene set variation analysis for microarray and RNA-Seq data,” BMC Bioinformatics, 2013, which is incorporated herein by reference in its entirety). Modules of genes may be developed to interrogate the datasets. Modules of genes determined to represent a specific signaling pathway or process may be identified (e.g., using publicly available datasets). For example, the IFNB1 signaling pathway module is taken from a publicly available gene expression dataset of peripheral blood cells treated with IFNB1 in vitro. Genes co-expressed in this dataset (genes either all increased or all decreased compared to control-treated peripheral blood) are used to create modules of genes representing the IFNB1 signaling pathway, and GSVA is used to determine the enrichment of this set of genes, and hence of the IFNB1 signaling pathway, in individual patient and control samples.
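Full GSVA (kernel estimation of each gene's cumulative distribution, within-sample ranking, and a Kolmogorov-Smirnov-like random-walk statistic) is best run with the Bioconductor package itself. To show the idea of a per-sample module score, the sketch below substitutes the simpler combined z-score, which the GSVA package also offers as an alternative method. It is not the GSVA statistic, and the function name and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def combined_z_scores(expr: Dict[str, Sequence[float]], module: List[str]) -> List[float]:
    """Per-sample module score: sum of per-gene z-scores divided by sqrt(module size).

    expr maps each gene to its expression values across the same ordered samples.
    """
    genes = [g for g in module if g in expr]
    if not genes:
        raise ValueError("no module genes found in the expression data")
    n_samples = len(expr[genes[0]])
    scores = [0.0] * n_samples
    for g in genes:
        values = expr[g]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a constant gene adds nothing to the between-sample contrast
        for j, v in enumerate(values):
            scores[j] += (v - mu) / sd
    return [s / sqrt(len(genes)) for s in scores]


if __name__ == "__main__":
    # Hypothetical log-expression for two patients followed by two controls.
    expr = {"IFI27": [5, 5, 1, 1], "RSAD2": [6, 6, 2, 2], "ACTB": [3, 4, 3, 4]}
    print(combined_z_scores(expr, ["IFI27", "RSAD2"]))
```

As with GSVA, the output is one score per sample, so module activity can be compared across individual patient and control samples rather than only between group means.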

[0282] The CoLTs®, or Combined Lupus Treatment Scoring, may be configured to rank identified drugs or therapies by a number of essential characteristics, including scientific rationale, experience in lupus mice/human cells (preclinical), previous clinical experience in autoimmunity, drug properties, and safety profile, including adverse events. Face and test validities may be established by scoring standard-of-care (SOC) medications and confirming the scores with a panel of lupus clinicians. The final result may be the CoLTs® score. A CoLTs® algorithm may also be configured for drugs in development (DID), which typically do not have drug metabolism and adverse event information available.
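An additive form of this ranking can be sketched as below. The criterion keys follow the characteristics listed above, but splitting drug metabolism out as its own criterion, weighting all criteria equally, and the numeric values are assumptions; the disclosure does not give CoLTs® point scales or weights.

```python
from typing import Dict, List

# Hypothetical criterion keys derived from the characteristics listed in the text.
CRITERIA: List[str] = [
    "scientific_rationale",
    "preclinical_lupus_evidence",      # lupus mice / human cells
    "clinical_autoimmunity_experience",
    "drug_properties",
    "drug_metabolism",
    "adverse_events",
]
# Drugs in development (DID) typically lack metabolism and adverse-event data.
DID_EXCLUDED = {"drug_metabolism", "adverse_events"}


def colts_like_score(scores: Dict[str, float], in_development: bool = False) -> float:
    """Sum the applicable criterion scores; raise if any applicable criterion is missing."""
    applicable = [c for c in CRITERIA if not (in_development and c in DID_EXCLUDED)]
    missing = [c for c in applicable if c not in scores]
    if missing:
        raise ValueError(f"missing criterion scores: {missing}")
    return sum(scores[c] for c in applicable)


if __name__ == "__main__":
    marketed = {c: 1.0 for c in CRITERIA}
    print(colts_like_score(marketed))                       # all six criteria
    print(colts_like_score(marketed, in_development=True))  # four criteria
```

Raising on a missing score, rather than silently treating it as zero, keeps a drug with incomplete data from being ranked as if it had been fully evaluated.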

[0283] The target scoring algorithm may be configured to prioritize a specific gene or protein that is potentially a good choice to target with a drug in lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) patients. It may be utilized even if no drug is currently available against the target gene or protein. The algorithm may be based on the sum of 18 data-based determinations plus the overall scientific rationale, and generates scores from -13 (not a good target in SLE) to 27 (very promising target in SLE).
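The additive structure and the stated range can be captured in a short validation helper. How each of the 18 determinations and the scientific-rationale component is scored is not described here, so the inputs are hypothetical pre-scored numbers; the sketch only sums them and checks the total against the stated -13 to 27 range.

```python
from typing import Sequence

TARGET_SCORE_MIN, TARGET_SCORE_MAX = -13, 27  # range stated for the target scoring algorithm
N_DETERMINATIONS = 18


def target_score(determinations: Sequence[float], scientific_rationale: float) -> float:
    """Sum of 18 data-based determination scores plus the scientific-rationale score."""
    if len(determinations) != N_DETERMINATIONS:
        raise ValueError(f"expected {N_DETERMINATIONS} determinations, got {len(determinations)}")
    total = sum(determinations) + scientific_rationale
    if not TARGET_SCORE_MIN <= total <= TARGET_SCORE_MAX:
        raise ValueError(f"score {total} is outside the documented range")
    return total


if __name__ == "__main__":
    hypothetical = [1, 0, 1, 1, -1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1]
    print(target_score(hypothetical, scientific_rationale=2))
```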

BIG-C™ big data analysis tool

[0284] BIG-C® is a fast and efficient cloud-based tool to functionally categorize gene products. With coverage of over 80% of the genome, BIG-C® leverages publicly available databases such as UniProtKB/Swiss-Prot, GO terms, KEGG pathways, NCBI PubMed and Interactome to place genes into 53 functional categories. The sorting into only one of 53 functional groups allows for a quick and relatively simple understanding of types of genes enriched and co-expressed in a big dataset. This assists in deriving further insights from genes expressed for a given disease state in human or pre-clinical mouse models.

[0285] BIG-C® may be used to functionally categorize immunological genes that are not covered in cancer databases such as GO and KEGG (e.g., as described by Grammer et al. 2016, “Drug repositioning in SLE: crowd-sourcing, literature-mining and Big Data analysis,” Lupus, 25(10), 1150-1170, which is incorporated herein by reference in its entirety). Using a knowledge base of over 5000 patients with systemic lupus erythematosus (SLE), over 16432 genes are each placed into one of 53 BIG-C® functional categories, and statistical analysis is performed to identify enriched categories. BIG-C® categories are cross-examined with the GO and KEGG terms to obtain additional information and insights.

[0286] A sample BIG-C® workflow may comprise the following steps. First, SLE genomic datasets are derived from whole blood, peripheral blood mononuclear cells, affected tissues, and purified immune cells. Second, datasets are analyzed using differential expression (DE) analysis (as shown by a differential expression heatmap) or Weighted Gene Coexpression Network Analysis (WGCNA) (as shown by a gene coexpression plot). Third, expressed genes are annotated using publicly available databases (e.g., UniProtKB/Swiss-Prot database, Human Immunodeficiencies database, Mouse MGI database, Entrez Molecular Sequence database, PubMed, and the Human Tissue Atlas). Fourth, signatures are cross-referenced with purified single-cell microarray datasets and RNAseq experiments. Fifth, BIG-C® is leveraged to separate the individual annotated genes into one of 53 functional categories (e.g., as described by Labonte et al. 2018, “Identification of alterations in macrophage activation associated with disease activity in systemic lupus erythematosus,” PLoS ONE, 13(12), e0208132, which is incorporated herein by reference in its entirety). Sixth, chi-squared analysis is used to determine enriched categories of interest from overlap p-values. Seventh, enriched categories are cross-examined with GO and KEGG terms to derive key insights for further analysis.
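The sixth step can be illustrated with a Pearson chi-squared test on a 2x2 table per category (signature genes in or out of the category versus background genes in or out of it). This is a sketch under stated assumptions: the table layout, the absence of a continuity correction, and the 0.05 cutoff are choices made here, and the chi-squared approximation is unreliable for small expected counts, where an exact test is preferable. The 1-degree-of-freedom p-value uses the identity P(chi2 > x) = erfc(sqrt(x / 2)).

```python
from math import erfc, sqrt
from typing import Dict, Iterable, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared statistic (no continuity correction) and 1-df p-value for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no test possible
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, erfc(sqrt(stat / 2))


def enriched_categories(gene_to_category: Dict[str, str], signature: Iterable[str],
                        background: Iterable[str], alpha: float = 0.05) -> Dict[str, float]:
    """Return {category: p-value} for categories over-represented in the signature genes."""
    bg = set(background)
    sig = set(signature) & bg
    out = {}
    for category in set(gene_to_category.values()):
        members = {g for g, cat in gene_to_category.items() if cat == category} & bg
        a, b = len(sig & members), len(sig - members)
        c, d = len(members - sig), len(bg - sig - members)
        _, p = chi2_2x2(a, b, c, d)
        if p < alpha and a * d > b * c:  # keep over-representation only
            out[category] = p
    return out
```

The `a * d > b * c` check matters because the chi-squared statistic is symmetric: a strongly depleted category would otherwise be reported alongside the enriched ones.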

I-Scope™ big data analysis tool

[0287] I-Scope™ may be a tool configured for cross-examining the presence and activity of varying types of immune cell infiltrates with observed gene expression patterns. It may take annotated gene expression data and analyze it for hematopoietic cell lineage. I-Scope™ may be used downstream of the BIG-C® (Biologically Informed Gene-Clustering) tool in that it helps to provide even more insight into the nature of the genes being expressed after categorization.

[0288] I-Scope™ addresses the need to understand the involvement of specific cells for a given disease state. While it is helpful to understand the relative up-regulation and down-regulation at the gene expression level, it is even more informative to understand specifically in which cells this is occurring. I-Scope™ may be configured to identify hematopoietic cells through an iterative search of more than 17,000 genes identified in more than 50 microarray datasets (e.g., as described by Hubbard et al., “Analysis of Lupus Synovitis Gene Expression Reveals Dysregulation of Pathogenic Pathways Activated within Infiltrating Immune Cells,” Arthritis Rheumatol, 2018; 70 (suppl 10), which is incorporated herein by reference in its entirety). I-Scope™ may function by restricting the analysis to genes of hematopoietic cell heritage and allow for cross-checking against purified single-cell experiments or datasets. The cross-check confirms and categorizes specific transcript signatures to the 28 hematopoietic cell sub-categories, ultimately allowing for cellular activity analysis across multiple samples and disease states. When combined with BIG-C® categories, the cellular activity may be correlated to specific functions within a given cell type.

[0289] A sample I-Scope™ workflow may comprise the following steps. First, candidate genes potentially associated with immune cell expression are identified from SLE (systemic lupus erythematosus) datasets. Second, using HPA, GTEx, and FANTOM5 datasets, expression signatures associated with hematopoietic cell lineage are identified. Third, signatures are cross-referenced with purified single-cell microarray datasets and RNAseq experiments. Fourth, transcripts are categorized into 28 hematopoietic cell sub-categories, and cellular expression is assessed across different samples and disease states. Odds ratios with confidence intervals are calculated using Fisher's exact test in R. An I-Scope™ signature analysis for a given sample may be extended to an I-Scope™ signature analysis across multiple samples and disease states.

T-Scope™ big data analysis tool

[0290] The T-Scope™ tool may be configured for cross-examining gene expression signatures of a given sample with a database of non-hematopoietic cell types (e.g., as described by Hubbard et al., “Analysis of Gene Expression from Systemic Lupus Erythematosus Synovium Reveals Unique Pathogenic Mechanisms” [Abstract], Annual Meeting of the American College of Rheumatology; June 2019; Chicago, IL, which is incorporated herein by reference in its entirety). T-Scope™ may comprise a database of 704 transcripts allocated to 45 independent categories. Transcripts detected in the sample are matched to one of the cellular categories within the T-Scope™ tool to derive further insights on tissue cell activity. T-Scope™ may be used downstream of the BIG-C® (Biologically Informed Gene-Clustering) tool to understand which tissue cell types are present. In conjunction with I-Scope™ (which provides information related to immune cells), T-Scope™ may be performed to provide a complete view of all possible cell activity in a given sample.

[0291] T-Scope™ addresses the need to understand the involvement of specific tissue cells for a given disease state. While it is helpful to understand the relative up-regulation and down-regulation at the gene expression level, it is even more informative to understand specifically in which cells this is occurring. T-Scope™ may be configured by downloading a set of approximately 10,000 tissue enriched and 8,000 cell line enriched genes from the Human Protein Atlas along with their tissue or cell line designation. Genes differentially expressed in hematopoietic cell datasets are removed and kidney specific genes are added from the GEO repository. T-Scope™ may function by restricting the analysis to genes of known tissue cell heritage and allow for cross-checking against purified single-cell experiments or datasets. The cross-check confirms and categorizes specific transcript signatures to the 45 tissue cell sub-categories, ultimately allowing for cellular activity analysis across multiple samples and disease states. When combined with BIG-C® categories, the cellular activity may be correlated to specific functions within a given tissue cell type.

[0292] A sample T-Scope™ workflow may comprise the following steps. First, candidate genes are identified from SLE (systemic lupus erythematosus) differential expression datasets potentially associated with tissue cell expression. Second, using publicly available databases, expression signatures associated with potential tissue cell activity are identified. Third, signatures are cross-referenced with microarray, scRNAseq or RNAseq experiments. Fourth, transcripts are categorized into 45 tissue cell sub-categories and cellular expression is assessed across different samples and disease states. Results may be obtained using T-Scope™ in combination with I-Scope™ for identification of cells post-DE-analysis.
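Using T-Scope™ together with I-Scope™ after differential expression analysis amounts to looking up each DE transcript in both reference catalogs while keeping the immune and tissue compartments apart, which is the distinction CellScan is described as drawing. The sketch below assumes each catalog is a mapping from category name to a gene set; the function name and the toy marker genes are illustrative.

```python
from typing import Dict, Iterable, Set, Tuple


def combined_cell_counts(de_transcripts: Iterable[str],
                         immune_catalog: Dict[str, Set[str]],
                         tissue_catalog: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count DE transcripts per category, keyed by (compartment, category)."""
    de = set(de_transcripts)
    counts: Dict[Tuple[str, str], int] = {}
    for compartment, catalog in (("immune", immune_catalog), ("tissue", tissue_catalog)):
        for category, genes in catalog.items():
            n = len(de & genes)
            if n:
                counts[(compartment, category)] = n
    return counts


if __name__ == "__main__":
    immune = {"B cells": {"CD19", "MS4A1"}, "T cell": {"CD3E"}}
    tissue = {"podocyte": {"NPHS1", "NPHS2"}}
    print(combined_cell_counts(["CD19", "MS4A1", "NPHS1", "ALB"], immune, tissue))
```

Per-category counts of this kind are the inputs an enrichment test, such as the Fisher's exact test described for I-Scope™, would then evaluate.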

CellScan big data analysis tool

[0293] A cloud-based genomic platform may be configured to provide users with access to CellScan™, which comprises a suite of tools for the identification, analysis, and prioritization of targets for drug development and/or repositioning. This platform is powered by a database containing the genomic information gathered from 5000+ autoimmune patients. The cloud-based genomic platform may leverage results from RNAseq and microarray experiments in conjunction with clinical information, such as medication and lab tests, to provide undiscovered insights.

[0294] CellScan™ may go beyond typical ‘omics analysis by performing one or more of the following: functionally categorizing genes and their products (e.g., using BIG-C®); deconvolving gene expression data to identify unique immunological cell types from blood or biopsy samples (e.g., using I-Scope™); identifying tissue-specific cells from biopsy samples (e.g., using T-Scope™); identifying receptor-ligand interactions and subsequent signaling pathways (e.g., using MS-Scoring™); ranking genes and their products for targeting by drugs and miRNA mimetics (e.g., using Target-Scoring™); and prioritizing FDA-approved drugs and drugs-in-development for treatment in patients or pre-clinical models (e.g., using CoLTs®).

[0295] CellScan™ applications may include one or more of: Biomarker Discovery, Disease Mechanisms, Drug Mechanism of Action, Drug Mechanism of Toxicity, and Target Identification and Validation. Experimental approaches supported by CellScan™ may include one or more of: lncRNA, Metabolomics, Microarray, miRNA, mRNA, qPCR, Proteomics, and RNAseq.

[0296] Data analysis and interpretation with CellScan™ may build on comprehensive, manually curated content of a knowledge base. Powerful, quick, and efficient tools may be used to perform deep analysis of NGS and miRNA data to identify gene function, immunological and tissue cell types, pathways, and targets/drugs appropriate for a specific disease state.

[0297] CellScan™ features may be configured to optimize or maximize the impact of information that surfaces in an analysis so that interpretation of a dataset is comprehensive and elucidates actionable insights. These features may include one or more of: NGS RNAseq data analysis, biomarker scoring, and prioritizing targets and drugs for human clinical trials and/or pre-clinical models. The NGS RNAseq data analysis may comprise interrogating RNA and miRNA data for function, cell type (immunological or tissue) and pathways. The biomarker scoring may comprise using a knowledge base and gene expression data to assess and prioritize biomarkers associated with a target disease or phenotype. The target/drug prioritization may comprise leveraging objective scoring of targets and drugs based on parameters such as scientific rationale, evidence in mouse/human cells, prior clinical data, overall drug properties, and the risk of adverse events.

[0298] The knowledge base may be a repository created from millions of individual pieces of information gathered about genes, cells, tissues, drugs, and diseases, manually reviewed for accuracy, and including rich contextual details and links to original publications. The knowledge base may enable access to relevant and substantiated knowledge from primary literature as well as public and private databases for comprehensive interpretation of NGS/RNAseq data, elucidating function/pathways and prioritizing targets/drugs for given disease states. The content in CellScan™ may draw on a list of reference databases, with both human and mouse species-specific identifiers supported.

MS (Molecular Signature) Scoring™ analysis tool

[0299] MS-Scoring™ may be configured to identify receptor-ligand interactions and predict ongoing signaling pathways. In addition, MS-Scoring™ may be used to validate molecular pathways as potential targets for new or repurposed drug therapies. The specificity of next-generation drug therapies requires a way to understand the potential of a given therapy to act on the intended biochemical target. Moreover, a potential application of this is the repositioning of drug therapies that may have the correct biochemical targeting to address multiple clinical needs beyond the initial intended therapeutic value.

[0300] MS-Scoring™ may be specifically developed to address gaps in the QIAGEN IPA® (Ingenuity Pathway Analysis) tool, which does not contain many immunologically relevant pathways. Similar to IPA®, MS-Scoring™ 1 may use log-fold change information to score the target and its signaling pathway to verify the viability of the targets. If the genes of a signaling pathway appear to be upregulated or its inhibitors appear to be downregulated, MS-Scoring™ 1 may provide a score of +1. Conversely, if the genes of a signaling pathway appear downregulated or the inhibitors upregulated, MS-Scoring™ 1 may provide a score of -1. A score of zero may be provided if no fold-change is observed. The scores may then be summed and normalized across the entire pathway to yield a final % score between -100 (inhibition) and +100 (up-regulation). Higher absolute magnitude scores, i.e., scores close to -100 or +100, may indicate a high potential for therapeutic targeting. Fisher's exact test may be performed to determine if there is sufficient overlap of genes between the experimental differentially expressed genes and the genes in the signaling pathway.
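The +1/-1/0 scoring rule can be sketched as below. Assumptions not fixed by the text: fold changes are signed log2 values, a gene missing from the results counts as unchanged (score 0), the dead-band `eps` defines "no fold-change", and normalization divides by the number of genes in the pathway so the result spans -100 to +100. The gene roles in the usage example are illustrative, not a curated pathway.

```python
from typing import Dict


def ms_score_1(pathway_roles: Dict[str, str], log2fc: Dict[str, float],
               eps: float = 0.0) -> float:
    """% pathway score in [-100, +100] from per-gene +1/-1/0 calls.

    pathway_roles: {gene: "activator" or "inhibitor"}
    log2fc: {gene: signed log2 fold change, disease vs. control}
    """
    if not pathway_roles:
        return 0.0
    total = 0
    for gene, role in pathway_roles.items():
        lfc = log2fc.get(gene, 0.0)
        call = 1 if lfc > eps else (-1 if lfc < -eps else 0)
        if role == "inhibitor":
            call = -call  # a down-regulated inhibitor supports pathway activation
        total += call
    return 100.0 * total / len(pathway_roles)


if __name__ == "__main__":
    roles = {"IL12A": "activator", "IL12B": "activator", "IL23A": "activator",
             "SOCS3": "inhibitor"}
    print(ms_score_1(roles, {"IL12A": 1.2, "IL12B": 0.8, "SOCS3": -0.5}))
```

Because every gene contributes at most one unit, a pathway scores +100 only when all of its activators are up and all of its inhibitors are down, which matches the reading that magnitudes near 100 indicate a consistently perturbed pathway.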

[0301] A sample MS-Scoring™ 1 workflow may comprise the following steps. First, potential drugs and pathways are identified by LINCS (Library of Integrated Network-Based Cellular Signatures) as candidates for therapeutic intervention. Second, MS-Scoring™ 1 is used to evaluate individual transcript elements of the target pathway. Third, signatures are cross-referenced with purified single-cell microarray datasets and RNAseq experiments. Fourth, scores are compiled and normalized to provide an overall % score for the pathway, where higher absolute scores indicate a higher potential for therapeutic targeting.

[0302] MS-Scoring™ 1 may be performed on IL-12- and IL-23-related pathways to assess targeting with ustekinumab for SLE (systemic lupus erythematosus) drug repositioning (e.g., as described by Grammer et al., 2016, “Drug repositioning in SLE: crowd-sourcing, literature-mining and Big Data analysis,” Lupus, 25(10), 1150-1170, which is incorporated herein by reference in its entirety).

[0303] MS-Scoring™ 2 may utilize custom-defined gene modules that represent a signaling pathway or process, and may be particularly useful for gene expression datasets from microarray or RNAseq. The MS-Scoring™ 2 tool may be configured to take a deeper look at signaling pathways analyzed using MS-Scoring™ 1. The tool may analyze raw gene expression data and assess enrichment by Gene Set Variation Analysis (GSVA, as described herein), which assigns each co-expressed pathway an indexed score between -1 and +1, with negative values indicating downregulation and positive values indicating upregulation.

[0304] A sample MS-Scoring™ 2 workflow may comprise the following steps. First, a signaling pathway of interest is selected from the MS-Scoring™ 2 menu. Second, raw gene expression data are input into the MS-Scoring™ 2 tool. Third, enrichment of the signaling pathway(s) is assessed on a patient-by-patient basis. Fourth, the results may be used to drive insight into the target signaling pathways in individual patient samples.
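
MS-Scoring™ 2 is described here only in terms of GSVA, so the following is a simplified, GSVA-style sketch of per-patient pathway enrichment rather than the tool itself or the published GSVA algorithm. It substitutes a plain per-gene z-score for GSVA's kernel-estimated cumulative distribution, ranks genes within each sample, weights genes at both extremes of the ranking most heavily, and runs a Kolmogorov-Smirnov-like random walk whose largest positive and largest negative deviations are added, giving one score between -1 and +1 per sample. Function names and the toy data are hypothetical; for real analyses, the Bioconductor GSVA package is the reference implementation.

```python
from statistics import mean, pstdev


def standardize_genes(expr: dict[str, list[float]]) -> dict[str, list[float]]:
    """Z-score each gene across samples (a stand-in for GSVA's kernel CDF step)."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def sample_enrichment(z: dict[str, float], gene_set: set[str]) -> float:
    """Random-walk enrichment score in [-1, 1] for one sample."""
    ranked = sorted(z, key=z.get, reverse=True)  # most up-regulated gene first
    p = len(ranked)
    # Symmetric rank weights: genes at either extreme of the ranking count most.
    weight = {g: abs(p / 2 - (i + 1)) for i, g in enumerate(ranked)}
    hits = {g: weight[g] for g in ranked if g in gene_set}
    n_miss = p - len(hits)
    if not hits or n_miss == 0:
        return 0.0
    hit_total = sum(hits.values())
    if hit_total == 0:  # every hit sits exactly mid-list; fall back to equal weights
        hits = {g: 1.0 for g in hits}
        hit_total = float(len(hits))
    running = top = bottom = 0.0
    for g in ranked:
        running += hits[g] / hit_total if g in hits else -1.0 / n_miss
        top, bottom = max(top, running), min(bottom, running)
    # Largest positive deviation plus largest negative deviation ("max diff" style).
    return top + bottom


def ms_score_2(expr: dict[str, list[float]], gene_set: set[str]) -> list[float]:
    """Per-patient pathway enrichment for every sample (column) in `expr`."""
    z = standardize_genes(expr)
    n_samples = len(next(iter(expr.values())))
    return [sample_enrichment({g: z[g][j] for g in z}, gene_set) for j in range(n_samples)]


# Toy data: pathway genes P1 and P2 are high in patient 0 and low in patient 1.
expr = {"P1": [5.0, 1.0], "P2": [6.0, 1.0], "O1": [1.0, 5.0],
        "O2": [1.0, 6.0], "O3": [2.0, 2.0], "O4": [3.0, 3.0]}
print(ms_score_2(expr, {"P1", "P2"}))
```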

[0305] GSVA may be applied to SLE (systemic lupus erythematosus) signaling pathways, e.g., as described by Hanzelmann et al., “GSVA: Gene Set Variation Analysis for Microarray and RNA-Seq Data,” BMC Bioinformatics, vol. 14, no. 1, 2013, p. 7., which is incorporated herein by reference in its entirety.

CoLTs® (Combined Lupus Treatment Scoring) analysis tool

[0306] A scoring method called CoLTs®, or Combined Lupus Treatment Scoring, may be configured to assess and prioritize the repositioning potential of drug therapies. CoLTs® may rank identified drugs/therapies by a number of essential characteristics, including scientific rationale, experience in lupus mice/human cells (preclinical), previous clinical experience in autoimmunity, drug properties, and safety profile, including adverse events. Face and test validity may be established by scoring standard of care (SOC) medications and confirming the scores with a panel of lupus clinicians. The final result may be the CoLTs® score. A CoLTs® algorithm may also be adapted for drugs in development (DID), since drug metabolism and adverse event information is typically not available for them.

[0307] CoLTs® may be configured to perform objective scoring of drug molecules based on a hypothesis-based literature search of publicly available databases. The tool can rank drug molecules from both FDA-approved and non-approved classes based upon parameters such as scientific rationale, evidence in mouse/human cells, prior clinical data, overall drug properties, and the risk of adverse events. The parameters are used within five independent drug therapy categories: small molecules, biologics, complementary and alternative therapies, and drugs in development.

[0308] CoLTs® may address the need for a systematic and objective way to evaluate the potential of drug therapies to be repositioned for treatment of autoimmune diseases, initially SLE (systemic lupus erythematosus). The composite score may embody all the accessible information in literature databases, inclusive of efficacy and adverse reactions, to assist in prioritizing drug development. The composite score takes many aspects of a drug into account but may weigh the risk of adverse events heavily, and it ranges from -16 to +11. CoLT Scoring® may be validated through repeated scoring of 215 potential therapies using a total of over 5000 reference data points, as well as by clinicians specializing in rheumatology. Specifically, CoLTs®’ prediction of Stelara/Ustekinumab as a top-priority biologic for lupus drug repositioning was validated by a successful Phase 2 clinical trial (e.g., as described by van Vollenhoven et al., “Efficacy and Safety of Ustekinumab, an IL-12 and IL-23 Inhibitor, in Patients with Active Systemic Lupus Erythematosus: Results of a Multicentre, Double-Blind, Phase 2, Randomised, Controlled Study.” The Lancet, vol. 392, no. 10155, 2018, pp. 1330-1339, which is incorporated herein by reference in its entirety). CoLTs® may be calibrated on SOC (standard of care) therapies for the individual autoimmune disease being assessed.

[0309] Within the ten major categories, rationale ranges from 0 to +3, mouse/human in vitro experience ranges from -1 to +1, clinical properties range from -3 to +3, the adverse effect of inducing lupus ranges from -1 to 0, metabolic properties range from -2 to 0, and adverse events (such as toxicity, infection, or carcinogenicity) range from -5 to 0 (e.g., as described by Grammer et al., 2016, “Drug repositioning in SLE: crowd-sourcing, literature-mining and Big Data analysis,” Lupus, 25(10), 1150-1170, which is incorporated herein by reference in its entirety). For example, CoLT Scoring® may be performed on SOC therapies in lupus, such as belimumab, hydroxychloroquine (HCQ), and rituximab.
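
The category ranges in [0309] lend themselves to a simple validate-and-sum sketch. Only six categories are itemized, so the sketch below scores just those six, using hypothetical key names. Together they span -12 to +7, which is narrower than the -16 to +11 composite range in [0308], because the remaining categories and any weighting are not specified here. The code illustrates range-checked additive scoring and drug ranking under those assumptions; it is not the CoLTs® algorithm.

```python
# Score ranges stated for six CoLTs® categories; the key names are hypothetical.
COLTS_RANGES = {
    "rationale": (0, 3),
    "in_vitro_experience": (-1, 1),
    "clinical_properties": (-3, 3),
    "induces_lupus": (-1, 0),
    "metabolic_properties": (-2, 0),
    "adverse_events": (-5, 0),
}


def colts_partial_score(scores: dict[str, int]) -> int:
    """Check every category score against its stated range, then sum them."""
    missing = COLTS_RANGES.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    for name, value in scores.items():
        if name not in COLTS_RANGES:
            raise ValueError(f"unknown category: {name}")
        lo, hi = COLTS_RANGES[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")
    return sum(scores.values())


def rank_drugs(candidates: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Order candidate drugs from highest to lowest partial score."""
    scored = [(drug, colts_partial_score(s)) for drug, s in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```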

Target Scoring analysis tool

[0310] The Target Scoring algorithm may be configured to prioritize a specific gene or protein that could be a good candidate for drug targeting in lupus patients. It may be used even if no drug is currently available for the target gene or protein. The algorithm may be based on the sum of 18 data-based determinations plus the overall scientific rationale, and may generate scores from -13 (not a good target in SLE) to +27 (very promising target in SLE).

[0311] Target-Scoring™ may be configured to assess and prioritize the potential of molecular targets for further development of drug therapies. The Target-Scoring™ tool is similar to CoLTs®, except that it approaches the need for new SLE therapies from a different angle. Target-Scoring™ may be configured to perform an objective assessment of molecular targets for the development of new or repurposed drug therapies. Like CoLTs®, it derives data from a hypothesis-based literature search and generates a composite score based on the publicly available information. Leveraging the composite score, researchers may better prioritize the development of novel drug therapies addressing the assessed targets of interest.

[0312] Target-Scoring™ may utilize 19 different scoring categories to derive a composite score that ranges from -13 to +27 for the suitability of a gene target for SLE therapy development. Target-Scoring™ may be validated through repeated scoring of potential therapies as well as by clinicians (e.g., clinicians specializing in the field of immunology).

Classifiers

Assessment of Conditions

[0313] A non-limiting example of a method to assess a condition of a subject, e.g., an SLE, DLE, PSO, AD, or SSc condition, may comprise one or more of the following operations. A dataset of a biological sample of a subject is received. The dataset may comprise quantitative measures of gene expression from each of a plurality of lupus-associated genomic loci.

[0314] A patient sample may be harvested using any method known to those of skill in the art. To obtain a blood sample, various techniques may be used, e.g., a syringe or other vacuum suction device. A blood sample may be optionally pre-treated or processed prior to use. A sample, such as a blood sample, may be analyzed under any of the methods and systems herein within 4 weeks, 2 weeks, 1 week, 6 days, 5 days, 4 days, 3 days, 2 days, 1 day, 12 hr, 6 hr, 3 hr, 2 hr, or 1 hr from the time the sample is obtained, or longer if frozen. When obtaining a sample from a subject (e.g., blood sample), the amount may vary depending upon subject size and the condition being screened. In some embodiments, at least 10 mL, 5 mL, 1 mL, 0.5 mL, 250, 200, 150, 100, 50, 40, 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 µL of a sample is obtained. In some embodiments, 1-50, 2-40, 3-30, or 4-20 µL of sample is obtained. In some embodiments, more than 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 µL of a sample is obtained.

[0315] To obtain a skin biopsy sample, various techniques may be used. The skin biopsy sample can include skin samples removed from the body of the subject. In certain embodiments, the skin biopsy sample includes cells and/or tissues from the cutaneous, intradermal, or subcutaneous layer, or from any abnormal tissue in these layers. In certain embodiments, the skin biopsy sample includes cutaneous tissues. In certain embodiments, the skin biopsy sample includes subcutaneous tissues. Skin biopsy can be performed using any suitable technique known to those of skill in the art. In certain embodiments, skin biopsy can be performed using shave biopsy, punch biopsy, excisional biopsy, or any combination thereof. In certain embodiments, the shave biopsy procedure includes removing a small section of the top layers of skin (the epidermis and a portion of the dermis). In certain embodiments, the punch biopsy procedure includes using a tool, such as a circular tool, to remove a small core of skin, including deeper layers (epidermis, dermis, and superficial fat). In certain embodiments, the excisional biopsy procedure includes removing a lump, lesion, and/or area of abnormal skin. In certain particular embodiments, the entire or effectively entire lump, lesion, and/or area of abnormal skin is removed. In certain particular embodiments, the lump, lesion, and/or area of abnormal skin is removed through the fatty layer of the skin. The area, size, and amount of the skin biopsy sample may vary depending upon the condition being analyzed. In some embodiments, the skin sample is obtained via one, two, three, four, five, or more shave biopsies, punch biopsies, incisional or excisional biopsies, or a combination of the above. In some embodiments, the skin sample comprises, for example, elastin and/or collagen.
The skin sample may be obtained from any desired anatomical location on the body, including one or more of the scalp, face, neck, chest, arms, legs, hands, back, buttocks, upper or lower extremities, or genitalia for example. The skin sample may have any appropriate depth. In some embodiments the skin sample has a depth of about 2 mm to about 25 mm. In some embodiments the skin sample has a depth of about 2 mm to about 3 mm, about 2 mm to about 4 mm, about 2 mm to about 5 mm, about 2 mm to about 6 mm, about 2 mm to about 7 mm, about 2 mm to about 8 mm, about 2 mm to about 9 mm, about 2 mm to about 10 mm, about 2 mm to about 15 mm, about 2 mm to about 20 mm, about 2 mm to about 25 mm, about 3 mm to about 4 mm, about 3 mm to about 5 mm, about 3 mm to about 6 mm, about 3 mm to about 7 mm, about 3 mm to about 8 mm, about 3 mm to about 9 mm, about 3 mm to about 10 mm, about 3 mm to about 15 mm, about 3 mm to about 20 mm, about 3 mm to about 25 mm, about 4 mm to about 5 mm, about 4 mm to about 6 mm, about 4 mm to about 7 mm, about 4 mm to about 8 mm, about 4 mm to about 9 mm, about 4 mm to about 10 mm, about 4 mm to about 15 mm, about 4 mm to about 20 mm, about 4 mm to about 25 mm, about 5 mm to about 6 mm, about 5 mm to about 7 mm, about 5 mm to about 8 mm, about 5 mm to about 9 mm, about 5 mm to about 10 mm, about 5 mm to about 15 mm, about 5 mm to about 20 mm, about 5 mm to about 25 mm, about 6 mm to about 7 mm, about 6 mm to about 8 mm, about 6 mm to about 9 mm, about 6 mm to about 10 mm, about 6 mm to about 15 mm, about 6 mm to about 20 mm, about 6 mm to about 25 mm, about 7 mm to about 8 mm, about 7 mm to about 9 mm, about 7 mm to about 10 mm, about 7 mm to about 15 mm, about 7 mm to about 20 mm, about 7 mm to about 25 mm, about 8 mm to about 9 mm, about 8 mm to about 10 mm, about 8 mm to about 15 mm, about 8 mm to about 20 mm, about 8 mm to about 25 mm, about 9 mm to about 10 mm, about 9 mm to about 15 mm, about 9 mm to about 20 mm, about 9 mm to about 25 mm, 
about 10 mm to about 15 mm, about 10 mm to about 20 mm, about 10 mm to about 25 mm, about 15 mm to about 20 mm, about 15 mm to about 25 mm, or about 20 mm to about 25 mm. In some embodiments the skin sample has a depth of about 2 mm, about 3 mm, about 4 mm, about 5 mm, about 6 mm, about 7 mm, about 8 mm, about 9 mm, about 10 mm, about 15 mm, about 20 mm, or about 25 mm. In some embodiments the skin sample has a depth of at least about 2 mm, about 3 mm, about 4 mm, about 5 mm, about 6 mm, about 7 mm, about 8 mm, about 9 mm, about 10 mm, about 15 mm, or about 20 mm. In some embodiments the skin sample has a depth of at most about 3 mm, about 4 mm, about 5 mm, about 6 mm, about 7 mm, about 8 mm, about 9 mm, about 10 mm, about 15 mm, about 20 mm, or about 25 mm. The skin sample can include one or more layers of the epidermis, dermis, and hypodermis. Skin sampling techniques for disease analysis are widely described in the literature, e.g., by the Mayo Clinic on their website (mayoclinic.org), available under “Skin Biopsy” (Mayo Clinic, Rochester, MN), incorporated herein by reference in its entirety.

[0316] The sample may be taken before and/or after treatment of a subject with a disease or disorder. Samples may be obtained from a subject during a treatment or a treatment regime. Multiple samples may be obtained from a subject to monitor the effects of the treatment over time. The sample may be taken from a subject known or suspected of having a disease or disorder for which a definitive positive or negative diagnosis is not available via clinical tests. The sample may be taken from a subject suspected of having a disease or disorder. The sample may be taken from a subject experiencing unexplained symptoms, such as fatigue, nausea, weight loss, aches and pains, weakness, or bleeding. The sample may be taken from a subject having explained symptoms. The sample may be taken from a subject at risk of developing a disease or disorder due to factors such as familial history, age, hypertension or pre-hypertension, diabetes or pre-diabetes, overweight or obesity, environmental exposure, lifestyle risk factors (e.g., smoking, alcohol consumption, or drug use), or presence of other risk factors.

[0317] In some embodiments, a sample may be taken at a first time point and assayed, and then another sample may be taken at a subsequent time point and assayed. Such methods may be used, for example, for longitudinal monitoring to track the development or progression of a disease or disorder (e.g., an SLE condition). In some embodiments, the progression of a disease may be tracked before treatment, after treatment, or during the course of treatment, to determine the treatment’s effectiveness. For example, a method as described herein may be performed on a subject prior to, and after, treatment with an SLE therapy to measure the disease’s progression or regression in response to the SLE therapy.

[0318] After obtaining a sample from the subject, the sample may be processed to generate datasets indicative of a condition (e.g., an SLE condition) of the subject. For example, the presence, absence, or quantitative assessment of nucleic acid molecules of the sample from a panel of condition-associated (e.g., SLE-associated) genomic loci may be indicative of a condition (e.g., an SLE condition) of the subject. Processing the sample obtained from the subject may comprise (i) subjecting the sample to conditions that are sufficient to isolate, enrich, or extract a plurality of nucleic acid molecules, and (ii) assaying the plurality of nucleic acid molecules to generate the dataset (e.g., microarray data, nucleic acid sequences, or quantitative polymerase chain reaction (qPCR) data). Methods of assaying may include any assay known in the art or described in the literature, for example, a microarray assay, a sequencing assay (e.g., DNA sequencing, RNA sequencing, or RNA-Seq), or a quantitative polymerase chain reaction (qPCR) assay.

[0319] In some embodiments, a plurality of nucleic acid molecules is extracted from the sample and subjected to sequencing to generate a plurality of sequencing reads. The nucleic acid molecules may comprise ribonucleic acid (RNA) or deoxyribonucleic acid (DNA). The extraction method may extract all RNA or DNA molecules from a sample. Alternatively, the extraction method may selectively extract a portion of RNA or DNA molecules from a sample. Extracted RNA molecules from a sample may be converted to cDNA molecules by reverse transcription (RT).

[0320] The sample may be processed without any nucleic acid extraction. For example, the disease or disorder may be identified or monitored in the subject by using probes configured to selectively enrich nucleic acid (e.g., RNA or DNA) molecules corresponding to a panel of SLE-associated genomic loci. The probes may be nucleic acid primers. The probes may have sequence complementarity with nucleic acid sequences from one or more of the panel of condition-associated (e.g., SLE-associated) genomic loci. The panel of condition-associated genomic loci may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 55, at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, at least about 90, at least about 95, at least about 100, or more condition-associated genomic loci.

[0321] The probes may be nucleic acid molecules (e.g., RNA or DNA) having sequence complementarity with nucleic acid sequences (e.g., RNA or DNA) of one or more genomic loci (e.g., condition-associated genomic loci). These nucleic acid molecules may be primers or enrichment sequences. The assaying of the sample using probes that are selective for the one or more genomic loci (e.g., condition-associated genomic loci) may comprise use of array hybridization, polymerase chain reaction (PCR), or nucleic acid sequencing (e.g., RNA sequencing or DNA sequencing, such as RNA-Seq).

[0322] The assay readouts may be quantified at one or more genomic loci (e.g., condition-associated genomic loci) to generate the data indicative of the disease or disorder. For example, quantification of array hybridization or polymerase chain reaction (PCR) corresponding to a plurality of genomic loci (e.g., condition-associated genomic loci) may generate data indicative of the disease or disorder. Assay readouts may comprise quantitative PCR (qPCR) values, digital PCR (dPCR) values, digital droplet PCR (ddPCR) values, fluorescence values, etc., or normalized values thereof.
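
The text does not say how qPCR readouts are normalized. One widely used option, offered here only as an assumption, is the comparative Ct (2^-ΔΔCt) method. In that method, each locus's cycle threshold (Ct) is first referenced to a housekeeping gene and then to a calibrator sample, such as a healthy control, which yields a fold change. Locus names, Ct values, and function names below are hypothetical.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_cal: float, ct_ref_cal: float) -> float:
    """Fold change of a target locus versus a calibrator sample (2^-ddCt method).

    Assumes roughly 100% and equal amplification efficiency for target and reference.
    """
    delta_sample = ct_target - ct_ref  # normalize the sample to the reference gene
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


def normalize_panel(ct_sample: dict[str, float], ct_calibrator: dict[str, float],
                    reference: str) -> dict[str, float]:
    """Apply relative_expression to every panel locus except the reference gene."""
    return {
        locus: relative_expression(ct_sample[locus], ct_sample[reference],
                                   ct_calibrator[locus], ct_calibrator[reference])
        for locus in ct_sample
        if locus != reference
    }


# Hypothetical Ct values; "REF" stands in for a housekeeping gene.
sample = {"LOCUS_A": 22.0, "LOCUS_B": 25.0, "REF": 18.0}
calibrator = {"LOCUS_A": 25.0, "LOCUS_B": 25.0, "REF": 18.0}
print(normalize_panel(sample, calibrator, "REF"))  # {'LOCUS_A': 8.0, 'LOCUS_B': 1.0}
```

A lower Ct means more starting template, so LOCUS_A, which crosses threshold three cycles earlier relative to the calibrator, comes out roughly 8-fold higher.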

Classifiers

[0323] In some embodiments, the present disclosure provides a system, method, or kit having data analysis realized in a software application, computing hardware, or both. In various embodiments, the analysis application or system includes at least a data receiving module, a data pre-processing module, a data analysis module, a data interpretation module, or a data visualization module. In one embodiment, the data receiving module may comprise computer systems that connect laboratory hardware or instrumentation with computer systems that process laboratory data. In one embodiment, the data pre-processing module may comprise hardware systems or computer software that performs operations on the data in preparation for analysis. Examples of operations that may be applied to the data in the pre-processing module include affine transformations, denoising operations, data cleaning, reformatting, or subsampling. A data analysis module, which may be specialized for analyzing genomic data from one or more genomic materials, can, for example, take assembled genomic sequences and perform probabilistic and statistical analysis to identify abnormal patterns related to a disease, pathology, state, risk, condition, or phenotype. A data interpretation module may use analysis methods, for example, drawn from statistics, mathematics, or biology, to support understanding of the relation between the identified abnormal patterns and health conditions, functional states, prognoses, or risks. A data visualization module may use methods of mathematical modeling, computer graphics, or rendering to create visual representations of data that may facilitate the understanding or interpretation of results.
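
The module chain described above can be sketched as five small functions, one per module. Everything concrete here is an assumption for illustration: the CSV layout, a log2(x + 1) transform standing in for pre-processing, a case-minus-control mean difference standing in for statistical analysis, a fixed threshold standing in for interpretation, and a text bar chart standing in for visualization. Function names and data are hypothetical.

```python
import csv
import io
import math
from statistics import mean


def receive(raw_csv: str) -> dict[str, list[float]]:
    """Data receiving: parse a gene-by-sample CSV (assumed: header row, then gene,value,...)."""
    reader = csv.reader(io.StringIO(raw_csv))
    next(reader)  # skip the header of sample IDs
    return {row[0]: [float(v) for v in row[1:]] for row in reader if row}


def preprocess(expr: dict[str, list[float]]) -> dict[str, list[float]]:
    """Pre-processing: log2(x + 1) transform, one example of a transformation step."""
    return {g: [math.log2(v + 1.0) for v in vals] for g, vals in expr.items()}


def analyze(expr: dict[str, list[float]], cases: list[int], controls: list[int]) -> dict[str, float]:
    """Analysis: per-gene difference in mean log expression, cases minus controls."""
    return {
        g: mean(vals[i] for i in cases) - mean(vals[i] for i in controls)
        for g, vals in expr.items()
    }


def interpret(diffs: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Interpretation: flag genes whose absolute difference meets an assumed threshold."""
    return [g for g, d in diffs.items() if abs(d) >= threshold]


def visualize(diffs: dict[str, float]) -> str:
    """Visualization: a plain-text bar chart of the per-gene differences."""
    return "\n".join(f"{g:<8}{'#' * round(abs(d) * 4):<20}{d:+.2f}" for g, d in diffs.items())


raw = "gene,s1,s2,s3,s4\nGENE_A,1,3,15,15\nGENE_B,0,0,0,0\n"
diffs = analyze(preprocess(receive(raw)), cases=[2, 3], controls=[0, 1])
print(interpret(diffs))  # ['GENE_A']
print(visualize(diffs))
```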

[0324] Feature sets may be generated from datasets obtained using one or more assays of a biological sample obtained or derived from a subject, and a trained algorithm may be used to process one or more of the feature sets to identify or assess a condition (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) of a subject. For example, the trained algorithm may be used to apply a machine learning classifier to a plurality of condition-associated genomic loci that are associated with two or more classes of individuals inputted into a machine learning model, in order to classify a subject into one of the two or more classes of individuals. For example, the trained algorithm may be used to apply a machine learning classifier to a plurality of condition-associated genomic loci that are associated with individuals with known conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) and individuals not having the condition (e.g., healthy individuals, or individuals who do not have a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition), in order to classify a subject as having the condition (e.g., positive test outcome) or not having the condition (e.g., negative test outcome).
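
The trained algorithm is left open in the text. As one hedged illustration of classifying a subject into one of two classes, the sketch below fits a logistic regression (one of the classifier types named in this section) by plain batch gradient descent on per-locus expression features, then calls a new subject positive at a 50% probability cutoff. The training values, learning rate, and epoch count are made up, and a real model would use many more samples, held-out validation, and a well-tested library.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Probability that a sample belongs to the condition class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X: list[list[float]], y: list[int],
                   lr: float = 0.1, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit) for one sample
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Made-up expression values for two loci; 1 = condition, 0 = healthy control.
X_train = [[2.0, 0.1], [1.8, 0.2], [2.2, 0.0], [0.1, 0.2], [0.0, 0.1], [0.2, 0.3]]
y_train = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X_train, y_train)
p_new = predict_proba(w, b, [1.9, 0.15])
print("positive" if p_new >= 0.5 else "negative")  # positive
```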

[0325] The trained algorithm may be configured to identify the presence (e.g., positive test result) or absence (e.g., negative test result) of one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) with an accuracy of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than 99%. This accuracy may be achieved for a set of at least about 25, at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, at least about 500, at least about 1,000, or more than about 1,000 independent samples.
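
Accuracy here means the fraction of independent samples whose predicted class matches their known class. A minimal sketch of that computation follows. Sensitivity and specificity are included because they are commonly reported alongside accuracy for diagnostic classifiers, although the text does not name them; the labels and the function name are illustrative.

```python
def evaluate(predicted: list[int], truth: list[int]) -> dict[str, float]:
    """Accuracy, sensitivity, and specificity of binary calls (1 = condition present)."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need equal-length, non-empty label lists")
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


metrics = evaluate(predicted=[1, 1, 0, 0, 1], truth=[1, 0, 0, 0, 1])
print(metrics["accuracy"] >= 0.75)  # True (accuracy is 0.8 here)
```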

[0326] The trained algorithm may comprise a machine learning algorithm, such as a supervised machine learning algorithm. The supervised machine learning algorithm may comprise, for example, a Random Forest, a support vector machine (SVM), a neural network, or a deep learning algorithm. The trained algorithm may comprise a classification and regression tree (CART) algorithm. The trained algorithm may comprise an unsupervised machine learning algorithm.

[0327] The trained algorithm may comprise a classifier configured to accept as input a plurality of input variables or features (e.g., condition-associated genomic loci) and to produce or output one or more output values based on the plurality of input variables or features (e.g., condition-associated genomic loci). The plurality of input variables or features may comprise one or more datasets indicative of the presence (e.g., positive test result) or absence (e.g., negative test result) of one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition). For example, an input variable or feature may comprise a number of sequences corresponding to or aligning to each of the plurality of condition-associated genomic loci.

[0328] The plurality of input variables or features may also include clinical information of a subject, such as health data. For example, the health data of a subject may comprise one or more of: a diagnosis of one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition), a prognosis of one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition), a risk of having one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition), a treatment history of one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition), a history of previous treatment for one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition), a history of prescribed medications, a history of prescribed medical devices, age, height, weight, sex, smoking status, and one or more symptoms of the subject.

[0329] For example, the disease or disorder may comprise one or more of: systemic lupus erythematosus (SLE), discoid lupus erythematosus (DLE), lupus nephritis (LN), psoriasis (PSO), atopic dermatitis (AD), or systemic sclerosis (scleroderma, SSc). As another example, the symptoms may include one or more of: alopecia, anti-dsDNA seropositivity, arthritis, fever, hematuria, leukopenia, low serum complement, mucosal ulcer, myositis, pericarditis, pleurisy, proteinuria, pyuria, rash, thrombocytopenia, urinary cast, vasculitis, visual disturbance, or a combination thereof. As another example, the prescribed medications or drugs may include one or more of: antimalarials, corticosteroids, immunosuppressants, and nonsteroidal anti-inflammatory drugs (NSAIDs).
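
The input features described in [0327] to [0329] can be assembled into a single numeric vector. The sketch below assumes that each aligned read has already been assigned to a locus name. It counts reads per panel locus and appends a few clinical covariates. The sex and symptom encodings, the symptom subset, and all function names are hypothetical. In practice, read counts would usually also be normalized for library size before modeling.

```python
from collections import Counter

# Illustrative subset of the symptoms listed above; the encoding is an assumption.
SYMPTOM_VOCAB = ("arthritis", "hematuria", "proteinuria", "rash")


def locus_counts(aligned_loci: list[str], panel: list[str]) -> list[float]:
    """Reads aligned to each panel locus, in panel order; off-panel reads are ignored."""
    counts = Counter(aligned_loci)
    return [float(counts[locus]) for locus in panel]


def clinical_features(age: float, sex: str, smoker: bool, symptoms: set[str]) -> list[float]:
    """Numeric encoding of a few covariates: age, sex (F = 1), smoking, symptom flags."""
    return [float(age), 1.0 if sex == "F" else 0.0, 1.0 if smoker else 0.0] + [
        1.0 if s in symptoms else 0.0 for s in SYMPTOM_VOCAB
    ]


def feature_vector(aligned_loci: list[str], panel: list[str], age: float,
                   sex: str, smoker: bool, symptoms: set[str]) -> list[float]:
    """Concatenate locus read counts and clinical covariates into one model input."""
    return locus_counts(aligned_loci, panel) + clinical_features(age, sex, smoker, symptoms)


reads = ["LOCUS_1", "LOCUS_2", "LOCUS_1"]  # one locus name per aligned read (made up)
print(feature_vector(reads, ["LOCUS_1", "LOCUS_2"], 34, "F", False, {"proteinuria"}))
# [2.0, 1.0, 34.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```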

[0330] The trained algorithm may comprise a classifier, such that each of the one or more output values comprises one of a fixed number of possible values (e.g., a linear classifier, a logistic regression classifier, etc.) indicating a classification of the sample by the classifier. The trained algorithm may comprise a binary classifier, such that each of the one or more output values comprises one of two values (e.g., {0, 1}, {positive, negative}, or {high-risk, low-risk}) indicating a classification of the sample by the classifier. The trained algorithm may be another type of classifier, such that each of the one or more output values comprises one of more than two values (e.g., {0, 1, 2}, {positive, negative, or indeterminate}, or {high-risk, intermediate-risk, or low-risk}) indicating a classification of the sample by the classifier.

[0331] The classifier may be configured to classify samples by assigning output values, which may comprise descriptive labels, numerical values, or a combination thereof. Some of the output values may comprise descriptive labels. Such descriptive labels may provide an identification or indication of the presence (e.g., positive test result) or absence (e.g., negative test result) of one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) of the subject, and may comprise, for example, positive, negative, high-risk, intermediate-risk, low-risk, or indeterminate. Such descriptive labels may provide an identification of a treatment for the one or more conditions of the subject, and may comprise, for example, a therapeutic intervention, a duration of the therapeutic intervention, and/or a dosage of the therapeutic intervention suitable to treat the one or more conditions of the subject. Such descriptive labels may provide an identification of secondary clinical tests that may be appropriate to perform on the subject, and may comprise, for example, an imaging test, a blood test, a computed tomography (CT) scan, a magnetic resonance imaging (MRI) scan, an ultrasound scan, a chest X-ray, a positron emission tomography (PET) scan, a PET-CT scan, or any combination thereof. For example, such descriptive labels may provide a prognosis of the one or more conditions of the subject. As another example, such descriptive labels may provide a relative assessment of the one or more conditions of the subject. Some descriptive labels may be mapped to numerical values, for example, by mapping “positive” to 1 and “negative” to 0.

[0332] The classifier may be configured to classify samples by assigning output values that comprise numerical values, such as binary, integer, or continuous values. Such binary output values may comprise, for example, {0, 1}, {positive, negative}, or {high-risk, low-risk}. Such integer output values may comprise, for example, {0, 1, 2}. Such continuous output values may comprise, for example, a probability value of at least 0 and no more than 1. Such continuous output values may comprise, for example, an un-normalized probability value of at least 0. Such continuous output values may indicate a prognosis of the one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) of the subject. Some numerical values may be mapped to descriptive labels, for example, by mapping 1 to “positive” and 0 to “negative.”

[0333] The classifier may be configured to classify samples by assigning output values based on one or more cutoff values. For example, a binary classification of samples may assign an output value of “positive” or 1 if the sample indicates that the subject has at least a 50% probability of having one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition), thereby assigning the subject to a class of individuals receiving a positive test result. As another example, a binary classification of samples may assign an output value of “negative” or 0 if the sample indicates that the subject has less than a 50% probability of having one or more conditions (e.g., a disease or disorder), thereby assigning the subject to a class of individuals receiving a negative test result. In this case, a single cutoff value of 50% is used to classify samples into one of the two possible binary output values or classes of individuals (e.g., those receiving a positive test result and those receiving a negative test result). Examples of single cutoff values may include about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, and about 99%.
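
A single-cutoff binary call can be sketched as follows. It uses the boundary convention from this paragraph (a probability at or above the cutoff is positive, below it is negative) and the label-to-number mapping from [0331] and [0332]. The function name is hypothetical.

```python
def binary_call(probability: float, cutoff: float = 0.5) -> tuple[int, str]:
    """Map a predicted probability to (1, "positive") or (0, "negative") with one cutoff.

    Probabilities at or above the cutoff are positive; those below it are negative.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return (1, "positive") if probability >= cutoff else (0, "negative")


print(binary_call(0.62))              # (1, 'positive')
print(binary_call(0.62, cutoff=0.9))  # (0, 'negative')
```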

[0334] As another example, the classifier may be configured to classify samples by assigning an output value of “positive” or 1 if the sample indicates that the subject has a probability of having one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more. The classification of samples may assign an output value of “positive” or 1 if the sample indicates that the subject has a probability of having one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) of more than about 50%, more than about 55%, more than about 60%, more than about 65%, more than about 70%, more than about 75%, more than about 80%, more than about 85%, more than about 90%, more than about 91%, more than about 92%, more than about 93%, more than about 94%, more than about 95%, more than about 96%, more than about 97%, more than about 98%, or more than about 99%.

[0335] The classifier may be configured to classify samples by assigning an output value of “negative” or 0 if the sample indicates that the subject has a probability of having one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) of less than about 50%, less than about 45%, less than about 40%, less than about 35%, less than about 30%, less than about 25%, less than about 20%, less than about 15%, less than about 10%, less than about 9%, less than about 8%, less than about 7%, less than about 6%, less than about 5%, less than about 4%, less than about 3%, less than about 2%, or less than about 1%. The classification of samples may assign an output value of “negative” or 0 if the sample indicates that the subject has a probability of having one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) of no more than about 50%, no more than about 45%, no more than about 40%, no more than about 35%, no more than about 30%, no more than about 25%, no more than about 20%, no more than about 15%, no more than about 10%, no more than about 9%, no more than about 8%, no more than about 7%, no more than about 6%, no more than about 5%, no more than about 4%, no more than about 3%, no more than about 2%, or no more than about 1%.

[0336] The classifier may be configured to classify samples by assigning an output value of “indeterminate” or 2 if the sample is not classified as “positive”, “negative”, 1, or 0. In this case, a set of two cutoff values is used to classify samples into one of the three possible output values or classes of individuals (e.g., corresponding to outcome groups of individuals having “low risk,” “intermediate risk,” and “high risk” of having one or more conditions, such as a disease or disorder). Examples of sets of cutoff values may include {1%, 99%}, {2%, 98%}, {5%, 95%}, {10%, 90%}, {15%, 85%}, {20%, 80%}, {25%, 75%}, {30%, 70%}, {35%, 65%}, {40%, 60%}, and {45%, 55%}. Similarly, sets of n cutoff values may be used to classify samples into one of n+1 possible output values or classes of individuals, where n is any positive integer.
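The two-cutoff (and, more generally, n-cutoff) scheme above amounts to locating a probability among sorted cutoff values. The sketch below assumes ascending cutoffs expressed as fractions and uses Python's standard `bisect` module; `classify_with_cutoffs` and `three_way_label` are hypothetical helpers. The returned index is an ordered risk group (0 = lowest risk), which differs from the output codes in the text (where “indeterminate” is 2), so the second helper maps indices to labels explicitly.

```python
from bisect import bisect_right
from typing import Sequence


def classify_with_cutoffs(probability: float, cutoffs: Sequence[float]) -> int:
    """Return an ordered group index in 0..len(cutoffs).

    n ascending cutoffs define n + 1 groups; a probability equal to a cutoff
    falls into the higher group.
    """
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("cutoffs must be in ascending order")
    return bisect_right(cutoffs, probability)


def three_way_label(probability: float, low: float = 0.10, high: float = 0.90) -> str:
    """Hypothetical {low, high} rule, e.g. the {10%, 90%} cutoff set."""
    labels = ("negative", "indeterminate", "positive")  # low, intermediate, high risk
    return labels[classify_with_cutoffs(probability, (low, high))]


print(three_way_label(0.05))  # negative
print(three_way_label(0.50))  # indeterminate
print(three_way_label(0.95))  # positive
```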

[0337] The trained algorithm may be trained with a plurality of independent training samples. Each of the independent training samples may comprise a sample from a subject, associated datasets obtained by assaying the sample (as described elsewhere herein), and one or more known output values or classes of individuals corresponding to the sample (e.g., a clinical diagnosis, prognosis, absence, or treatment efficacy of a condition of the subject). Independent training samples may comprise samples and associated datasets and outputs obtained or derived from a plurality of different subjects. Independent training samples may comprise samples and associated datasets and outputs obtained at a plurality of different time points from the same subject (e.g., on a regular basis such as weekly, biweekly, or monthly), as part of a longitudinal monitoring of a subject before, during, and after a course of treatment for one or more conditions of the subject. Independent training samples may be associated with presence of the condition (e.g., training samples comprising samples and associated datasets and outputs obtained or derived from a plurality of subjects known to have the condition). Independent training samples may be associated with absence of the condition (e.g., training samples comprising samples and associated datasets and outputs obtained or derived from a plurality of subjects who are known to not have a previous diagnosis of the condition or who have received a negative test result for the condition).

[0338] The trained algorithm may be trained with at least about 5, at least about 10, at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, or at least about 500 independent training samples. The independent training samples may comprise samples associated with presence of the condition and/or samples associated with absence of the condition. The trained algorithm may be trained with no more than about 500, no more than about 450, no more than about 400, no more than about 350, no more than about 300, no more than about 250, no more than about 200, no more than about 150, no more than about 100, or no more than about 50 independent training samples associated with presence of the condition (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition). The trained algorithm may be trained with no more than about 500, no more than about 450, no more than about 400, no more than about 350, no more than about 300, no more than about 250, no more than about 200, no more than about 150, no more than about 100, or no more than about 50 independent training samples associated with absence of the condition (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition). In some embodiments, the sample is independent of samples used to train the trained algorithm.

[0339] The trained algorithm may be trained with a first number of independent training samples associated with a presence of the condition (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) and a second number of independent training samples associated with an absence of the condition (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition). The first number of independent training samples associated with presence of the condition (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) may be no more than the second number of independent training samples associated with absence of the condition (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition). The first number of independent training samples associated with a presence of the condition (e.g., a disease or disorder) may be equal to the second number of independent training samples associated with an absence of the condition (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition). The first number of independent training samples associated with a presence of the condition (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) may be greater than the second number of independent training samples associated with an absence of the condition (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition).

[0340] The trained algorithm may comprise a classifier configured to identify the presence (e.g., positive test result) or absence (e.g., negative test result) of one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) at an accuracy of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more; for at least about 5, at least about 10, at least about 15, at least about 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, or at least about 500 independent training samples. The accuracy of identifying the presence (e.g., positive test result) or absence (e.g., negative test result) of the one or more conditions by the trained algorithm may be calculated as the percentage of independent test samples (e.g., subjects known to have the condition or subjects with negative clinical test results for the condition) that are correctly identified or classified as having or not having the condition.
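The accuracy definition above can be computed directly from paired known and predicted labels. This is a minimal sketch, assuming 1 encodes “has the condition” and 0 encodes “does not”; the function name `accuracy_percent` is hypothetical.

```python
from typing import Sequence


def accuracy_percent(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage of samples whose predicted label matches the known label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


# Four of five test samples are classified correctly.
print(accuracy_percent([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))  # 80.0
```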

[0341] The trained algorithm may comprise a classifier configured to identify one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) with a positive predictive value (PPV) of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more. The PPV of identifying the condition using the trained algorithm may be calculated as the percentage of samples identified or classified as having the condition that correspond to subjects that truly have the condition.

[0342] The trained algorithm may comprise a classifier configured to identify one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) with a negative predictive value (NPV) of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more. The NPV of identifying the condition using the trained algorithm may be calculated as the percentage of samples identified or classified as not having the condition that correspond to subjects that truly do not have the condition.

[0343] The trained algorithm may comprise a classifier configured to identify one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) with a clinical sensitivity of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 99.1%, at least about 99.2%, at least about 99.3%, at least about 99.4%, at least about 99.5%, at least about 99.6%, at least about 99.7%, at least about 99.8%, at least about 99.9%, at least about 99.99%, at least about 99.999%, or more. The clinical sensitivity of identifying the condition using the trained algorithm may be calculated as the percentage of independent test samples associated with presence of the condition (e.g., subjects known to have the condition) that are correctly identified or classified as having the condition.

[0344] The trained algorithm may comprise a classifier configured to identify one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) with a clinical specificity of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 99.1%, at least about 99.2%, at least about 99.3%, at least about 99.4%, at least about 99.5%, at least about 99.6%, at least about 99.7%, at least about 99.8%, at least about 99.9%, at least about 99.99%, at least about 99.999%, or more. The clinical specificity of identifying the condition using the trained algorithm may be calculated as the percentage of independent test samples associated with absence of the condition (e.g., subjects with negative clinical test results for the condition) that are correctly identified or classified as not having the condition.
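PPV, NPV, clinical sensitivity, and clinical specificity, as defined in paragraphs [0341] to [0344], all follow from the four cells of a confusion matrix. The sketch below assumes binary labels (1 = condition present, 0 = absent) and returns percentages. The function names are hypothetical, and an undefined ratio (zero denominator) is returned as NaN rather than raising an error.

```python
import math
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["tp" if t == 1 else "fp"] += 1
        else:
            counts["fn" if t == 1 else "tn"] += 1
    return counts


def _pct(numerator: int, denominator: int) -> float:
    return 100.0 * numerator / denominator if denominator else math.nan


def performance_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    return {
        # share of positive calls that are truly positive
        "ppv": _pct(c["tp"], c["tp"] + c["fp"]),
        # share of negative calls that are truly negative
        "npv": _pct(c["tn"], c["tn"] + c["fn"]),
        # share of subjects with the condition that are called positive
        "sensitivity": _pct(c["tp"], c["tp"] + c["fn"]),
        # share of subjects without the condition that are called negative
        "specificity": _pct(c["tn"], c["tn"] + c["fp"]),
    }


metrics = performance_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print({k: round(v, 1) for k, v in metrics.items()})
# {'ppv': 66.7, 'npv': 80.0, 'sensitivity': 66.7, 'specificity': 80.0}
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the proportion of subjects with the condition in the tested population.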

[0345] The trained algorithm may comprise a classifier configured to identify the presence (e.g., positive test result) or absence (e.g., negative test result) of one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) with an Area-Under-Curve (AUC) of at least about 0.50, at least about 0.55, at least about 0.60, at least about 0.65, at least about 0.70, at least about 0.75, at least about 0.80, at least about 0.81, at least about 0.82, at least about 0.83, at least about 0.84, at least about 0.85, at least about 0.86, at least about 0.87, at least about 0.88, at least about 0.89, at least about 0.90, at least about 0.91, at least about 0.92, at least about 0.93, at least about 0.94, at least about 0.95, at least about 0.96, at least about 0.97, at least about 0.98, at least about 0.99, or more. The AUC may be calculated as an integral of the Receiver Operating Characteristic (ROC) curve (e.g., the area under the ROC curve) associated with the trained algorithm in classifying samples as having or not having the condition.
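The area under the empirical ROC curve can be computed without tracing the curve explicitly. It equals the probability that a randomly chosen sample with the condition receives a higher score than a randomly chosen sample without it, with ties counted as one half (the Mann-Whitney formulation). The sketch below relies on that equivalence. It is quadratic in sample size and meant only for illustration, and `roc_auc` is a hypothetical name.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Empirical ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a concordant pair
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0 (perfect separation)
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.5]))  # 0.5
```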

[0346] Classifiers of the trained algorithm may be adjusted or tuned to improve or optimize one or more performance metrics, such as accuracy, PPV, NPV, clinical sensitivity, clinical specificity, AUC, or a combination thereof (e.g., a performance index incorporating a plurality of such performance metrics, such as by calculating a weighted sum therefrom), of identifying the presence (e.g., positive test result) or absence (e.g., negative test result) of the condition. The classifiers may be adjusted or tuned by adjusting parameters of the classifiers (e.g., a set of cutoff values used to classify a sample as described elsewhere herein, or weights of a neural network) to improve or optimize the performance metrics. The one or more classifiers may be adjusted or tuned so as to reduce an overall classification error (e.g., an “out-of-bag” or oob error rate for a Random Forest classifier). The one or more classifiers may be adjusted or tuned continuously during the training process (e.g., as sample datasets are added to the training set) or after the training process has completed.

[0347] The trained algorithm may comprise a plurality of classifiers (e.g., an ensemble) such that the plurality of classifications or outcome values of the plurality of classifiers may be combined to produce a single classification or outcome value for the sample. For example, a sum or a weighted sum of the plurality of classifications or outcome values of the plurality of classifiers may be calculated to produce a single classification or outcome value for the sample. As another example, a majority vote of the plurality of classifications or outcome values of the plurality of classifiers may be identified to produce a single classification or outcome value for the sample. In this manner, a single classification or outcome value may be produced for the sample having greater confidence or statistical significance than the individual classifications or outcome values produced by each of the plurality of classifiers.
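The two ensemble combination rules in paragraph [0347], majority vote and weighted sum, can be sketched as follows. This assumes each member classifier emits either a binary label or a probability. Reporting a tied vote as “indeterminate” (2) and normalizing the weighted sum by the total weight are illustrative choices, not requirements of the disclosure, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Combine binary labels (0/1); a tie is reported as 2 ("indeterminate")."""
    counts = Counter(labels)
    if counts[1] > counts[0]:
        return 1
    if counts[0] > counts[1]:
        return 0
    return 2


def weighted_score(probabilities: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of member probabilities (weights need not sum to 1)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(p * w for p, w in zip(probabilities, weights)) / total


print(majority_vote([1, 0, 1]))                    # 1
print(majority_vote([1, 0]))                       # 2
print(weighted_score([0.9, 0.2, 0.6], [2, 1, 1]))  # 0.65
```

The combined probability from `weighted_score` can then be passed through the same cutoff rule used for a single classifier.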

[0348] After the trained algorithm is initially trained, a subset of the inputs may be identified as most influential or most important to be included for making high-quality classifications (e.g., having the highest permutation feature importance). For example, a subset of the panel of condition-associated genomic loci may be identified as most influential or most important to be included for making high-quality classifications or identifications of conditions (or sub-types of conditions). The panel of condition-associated genomic loci, or a subset thereof, may be ranked based on classification metrics indicative of the influence or importance of each individual condition-associated genomic locus toward making high-quality classifications or identifications of conditions (or sub-types of conditions). Such metrics may be used to reduce, in some cases significantly, the number of input variables (e.g., predictor variables) that may be used to train the one or more classifiers of the trained algorithm to a desired performance level (e.g., based on a desired minimum accuracy, PPV, NPV, clinical sensitivity, clinical specificity, AUC, or a combination thereof).
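Permutation feature importance, named above as one ranking metric, measures how much a performance metric drops when the values of a single input are shuffled across samples, which breaks that input's link to the outcome. The following is a minimal, dependency-free sketch. It assumes a fitted model exposed as a `predict(row)` callable and uses accuracy as the metric. The toy model, data, and function names are hypothetical; scikit-learn offers a maintained implementation in `sklearn.inspection.permutation_importance`.

```python
import random
from typing import Callable, List, Sequence


def accuracy_fraction(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def permutation_importance(
    predict: Callable[[List[float]], int],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy_fraction(y, [predict(list(row)) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy_fraction(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that only looks at feature 0 (e.g., one gene's expression level).
def toy_predict(row: List[float]) -> int:
    return 1 if row[0] > 0.5 else 0


X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.3], [0.7, 0.2], [0.3, 0.8]]
y = [1, 1, 0, 0, 1, 0]
print(permutation_importance(toy_predict, X, y))  # second entry (feature 1) is 0.0
```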

[0349] For example, if training a classifier of the trained algorithm with a plurality comprising several dozen or hundreds of input variables to the classifier results in an accuracy of classification of more than 99%, then training the classifier of the trained algorithm instead with only a selected subset of no more than about 5, no more than about 10, no more than about 15, no more than about 20, no more than about 25, no more than about 30, no more than about 35, no more than about 40, no more than about 45, no more than about 50, or no more than about 100 such most influential or most important input variables among the plurality may yield decreased but still acceptable accuracy of classification (e.g., at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%).

[0350] As another example, if training a classifier of the trained algorithm with a plurality comprising several dozen or hundreds of input variables to the classifier results in a sensitivity or specificity of classification of more than 99%, then training the classifier of the trained algorithm instead with only a selected subset of no more than about 5, no more than about 10, no more than about 15, no more than about 20, no more than about 25, no more than about 30, no more than about 35, no more than about 40, no more than about 45, no more than about 50, or no more than about 100 such most influential or most important input variables among the plurality may yield decreased but still acceptable sensitivity or specificity of classification (e.g., at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%).

[0351] The subset of the plurality of input variables (e.g., the panel of condition-associated genomic loci) to the classifier of the trained algorithm may be selected by rank-ordering the entire plurality of input variables and selecting a predetermined number (e.g., no more than about 5, no more than about 10, no more than about 15, no more than about 20, no more than about 25, no more than about 30, no more than about 35, no more than about 40, no more than about 45, no more than about 50, or no more than about 100) of input variables with the best classification metrics (e.g., permutation feature importance).
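Rank-ordering the inputs and keeping a predetermined number of them can be sketched as below, assuming an importance score (for example, a permutation importance) has already been computed for each input. The gene identifiers and scores are placeholders, not values from this disclosure, and `select_top_features` is a hypothetical helper.

```python
from typing import Dict, List


def select_top_features(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k inputs with the highest importance, best first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


scores = {"GENE_A": 0.02, "GENE_B": 0.30, "GENE_C": 0.11, "GENE_D": 0.00}
print(select_top_features(scores, k=2))  # ['GENE_B', 'GENE_C']
```

A reduced classifier would then be retrained on only the selected inputs and its performance rechecked against the desired minimum metric.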

[0352] Upon identifying the subject as having one or more conditions (e.g., a disease or disorder, such as a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition), the subject may be optionally provided with a therapeutic intervention (e.g., prescribing an appropriate course of treatment to treat the one or more conditions of the subject). The therapeutic intervention may comprise a prescription of an effective dose of a drug, a further testing or evaluation of the condition, a further monitoring of the condition, or a combination thereof. If the subject is currently being treated for the condition with a course of treatment, the therapeutic intervention may comprise a subsequent different course of treatment (e.g., to increase treatment efficacy due to non-efficacy of the current course of treatment).

[0353] The therapeutic intervention may include prescribed medications or drugs, which may include one or more of: antimalarials, corticosteroids, immunosuppressants, and nonsteroidal anti-inflammatory drugs (NSAIDs). The therapeutic intervention may be effective to alleviate or decrease one or more symptoms, which may include one or more of: alopecia, anti-dsDNA seropositivity, arthritis, fever, hematuria, leukopenia, low serum complement, mucosal ulcer, myositis, pericarditis, pleurisy, proteinuria, pyuria, rash, thrombocytopenia, urinary cast, vasculitis, visual disturbance, or a combination thereof.

[0354] The therapeutic intervention may comprise recommending the subject for a secondary clinical test to confirm a diagnosis of the condition. This secondary clinical test may comprise an imaging test, a blood test, a computed tomography (CT) scan, a magnetic resonance imaging (MRI) scan, an ultrasound scan, a chest X-ray, a positron emission tomography (PET) scan, a PET-CT scan, or any combination thereof.

[0355] The feature sets (e.g., comprising quantitative measures of a panel of condition-associated genomic loci) may be analyzed and assessed (e.g., using a trained algorithm comprising one or more classifiers) over a duration of time to monitor a patient (e.g., subject who has a condition or who is being treated for a condition). In such cases, the feature sets of the patient may change during the course of treatment. For example, the quantitative measures of the feature sets of a patient with decreasing risk of the condition due to an effective treatment may shift toward the profile or distribution of a healthy subject (e.g., a subject without the condition). Conversely, for example, the quantitative measures of the feature sets of a patient with increasing risk of the condition due to an ineffective treatment may shift toward the profile or distribution of a subject with higher risk of the condition or a more advanced stage or severity of the condition.

[0356] The condition of the subject may be monitored by monitoring a course of treatment for treating the condition of the subject. The monitoring may comprise assessing the condition of the subject at two or more time points. The assessing may be based at least on the feature sets (e.g., quantitative measures of a panel of condition-associated genomic loci) determined at each of the two or more time points. The therapeutic intervention may include prescribed medications or drugs, which may include one or more of: antimalarials, corticosteroids, immunosuppressants, and nonsteroidal anti-inflammatory drugs (NSAIDs). The therapeutic intervention may be effective to alleviate or decrease one or more symptoms, which may include one or more of: alopecia, anti-dsDNA seropositivity, arthritis, fever, hematuria, leukopenia, low serum complement, mucosal ulcer, myositis, pericarditis, pleurisy, proteinuria, pyuria, rash, thrombocytopenia, urinary cast, vasculitis, visual disturbance, or a combination thereof. The assessing may be based at least on the presence, absence, or severity of one or more symptoms, such as alopecia, anti-dsDNA seropositivity, arthritis, fever, hematuria, leukopenia, low serum complement, mucosal ulcer, myositis, pericarditis, pleurisy, proteinuria, pyuria, rash, thrombocytopenia, urinary cast, vasculitis, visual disturbance, or a combination thereof.

[0357] In some embodiments, a difference in the feature sets (e.g., quantitative measures of a panel of condition-associated genomic loci) determined between the two or more time points may be indicative of one or more clinical indications, such as (i) a diagnosis of the condition of the subject, (ii) a prognosis of the condition of the subject, (iii) an increased risk of the condition of the subject, (iv) a decreased risk of the condition of the subject, (v) an efficacy of the course of treatment for treating the condition of the subject, and (vi) a non-efficacy of the course of treatment for treating the condition of the subject.

[0358] In some embodiments, a difference in the feature sets (e.g., quantitative measures of a panel of condition-associated genomic loci) determined between the two or more time points may be indicative of a diagnosis of the condition of the subject. For example, if the condition was not detected in the subject at an earlier time point but was detected in the subject at a later time point, then the difference is indicative of a diagnosis of the condition of the subject. A clinical action or decision may be made based on this indication of diagnosis of the condition of the subject, such as, for example, prescribing a new therapeutic intervention for the subject. The clinical action or decision may comprise recommending the subject for a secondary clinical test to confirm the diagnosis of the condition. This secondary clinical test may comprise an imaging test, a blood test, a computed tomography (CT) scan, a magnetic resonance imaging (MRI) scan, an ultrasound scan, a chest X-ray, a positron emission tomography (PET) scan, a PET-CT scan, or any combination thereof.

[0359] In some embodiments, a difference in the feature sets (e.g., quantitative measures of a panel of condition-associated genomic loci) determined between the two or more time points may be indicative of a prognosis of the condition of the subject.

[0360] In some embodiments, a difference in the feature sets (e.g., quantitative measures of a panel of condition-associated genomic loci) determined between the two or more time points may be indicative of the subject having an increased risk of the condition. For example, if the condition was detected in the subject both at an earlier time point and at a later time point, and if the difference is a negative difference (e.g., the quantitative measures of a panel of condition-associated genomic loci increased from the earlier time point to the later time point), then the difference may be indicative of the subject having an increased risk of the condition. A clinical action or decision may be made based on this indication of the increased risk of the condition, e.g., prescribing a new therapeutic intervention or switching therapeutic interventions (e.g., ending a current treatment and prescribing a new treatment) for the subject. The clinical action or decision may comprise recommending the subject for a secondary clinical test to confirm the increased risk of the condition. This secondary clinical test may comprise an imaging test, a blood test, a computed tomography (CT) scan, a magnetic resonance imaging (MRI) scan, an ultrasound scan, a chest X-ray, a positron emission tomography (PET) scan, a PET-CT scan, or any combination thereof.

[0361] In some embodiments, a difference in the feature sets (e.g., quantitative measures of a panel of condition-associated genomic loci) determined between the two or more time points may be indicative of the subject having a decreased risk of the condition. For example, if the condition was detected in the subject both at an earlier time point and at a later time point, and if the difference is a positive difference (e.g., the quantitative measures of a panel of condition-associated genomic loci decreased from the earlier time point to the later time point), then the difference may be indicative of the subject having a decreased risk of the condition. A clinical action or decision may be made based on this indication of the decreased risk of the condition (e.g., continuing or ending a current therapeutic intervention) for the subject. The clinical action or decision may comprise recommending the subject for a secondary clinical test to confirm the decreased risk of the condition. This secondary clinical test may comprise an imaging test, a blood test, a computed tomography (CT) scan, a magnetic resonance imaging (MRI) scan, an ultrasound scan, a chest X-ray, a positron emission tomography (PET) scan, a PET-CT scan, or any combination thereof.

[0362] In some embodiments, a difference in the feature sets (e.g., quantitative measures of a panel of condition-associated genomic loci) determined between the two or more time points may be indicative of an efficacy of the course of treatment for treating the condition of the subject. For example, if the condition was detected in the subject at an earlier time point but was not detected in the subject at a later time point, then the difference may be indicative of an efficacy of the course of treatment for treating the condition of the subject. A clinical action or decision may be made based on this indication of the efficacy of the course of treatment for treating the condition of the subject, e.g., continuing or ending a current therapeutic intervention for the subject. The clinical action or decision may comprise recommending the subject for a secondary clinical test to confirm the efficacy of the course of treatment for treating the condition. This secondary clinical test may comprise an imaging test, a blood test, a computed tomography (CT) scan, a magnetic resonance imaging (MRI) scan, an ultrasound scan, a chest X-ray, a positron emission tomography (PET) scan, a PET-CT scan, or any combination thereof.

[0363] In some embodiments, a difference in the feature sets (e.g., quantitative measures of a panel of condition-associated genomic loci) determined between the two or more time points may be indicative of a non-efficacy of the course of treatment for treating the condition of the subject. For example, if the condition was detected in the subject both at an earlier time point and at a later time point, and if the difference is a negative or zero difference (e.g., the quantitative measures of a panel of condition-associated genomic loci increased or remained at a constant level from the earlier time point to the later time point), and if an efficacious treatment was indicated at an earlier time point, then the difference may be indicative of a non-efficacy of the course of treatment for treating the condition of the subject. A clinical action or decision may be made based on this indication of the non-efficacy of the course of treatment for treating the condition of the subject, e.g., ending a current therapeutic intervention and/or switching to (e.g., prescribing) a different new therapeutic intervention for the subject. The clinical action or decision may comprise recommending the subject for a secondary clinical test to confirm the non-efficacy of the course of treatment for treating the condition. This secondary clinical test may comprise an imaging test, a blood test, a computed tomography (CT) scan, a magnetic resonance imaging (MRI) scan, an ultrasound scan, a chest X-ray, a positron emission tomography (PET) scan, a PET-CT scan, or any combination thereof.
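The two-time-point logic of paragraphs [0358] and [0360] to [0363] can be summarized in a small function. This sketch makes assumptions that the text leaves open. It reduces the panel's quantitative measures to a single hypothetical score per time point. It takes the difference as earlier minus later, which matches the text's convention that a negative difference means the measures increased. The `treatment_indicated` flag stands in for “an efficacious treatment was indicated at an earlier time point”. The returned strings are descriptive labels, not clinical recommendations.

```python
def interpret_change(
    detected_earlier: bool,
    detected_later: bool,
    score_earlier: float,
    score_later: float,
    treatment_indicated: bool = False,
) -> str:
    """Map two time points to an indication from [0358] and [0360]-[0363]."""
    difference = score_earlier - score_later  # negative when the measures increased
    if not detected_earlier and detected_later:
        return "diagnosis"
    if detected_earlier and not detected_later:
        return "efficacy of treatment"
    if detected_earlier and detected_later:
        if treatment_indicated and difference <= 0:
            return "non-efficacy of treatment"
        if difference < 0:
            return "increased risk"
        if difference > 0:
            return "decreased risk"
        return "no change"
    return "condition not detected"


print(interpret_change(False, True, 1.0, 2.0))       # diagnosis
print(interpret_change(True, True, 2.0, 1.5))        # decreased risk
print(interpret_change(True, True, 1.5, 2.0))        # increased risk
print(interpret_change(True, True, 2.0, 2.0, True))  # non-efficacy of treatment
```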

In various embodiments, machine learning methods are applied to distinguish samples in a population of samples. In one embodiment, machine learning methods are applied to distinguish samples between healthy and diseased (e.g., a lupus condition such as SLE or DLE, psoriasis, atopic dermatitis, or systemic sclerosis (scleroderma)) samples.

Kits

[0364] The present disclosure provides kits for identifying or monitoring a disease or disorder (e.g., a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) of a subject. A kit may comprise probes for identifying a quantitative measure (e.g., indicative of a presence, absence, or relative amount) of sequences at each of a panel of condition-associated genomic loci in a sample of the subject. A quantitative measure (e.g., indicative of a presence, absence, or relative amount) of sequences at each of a panel of condition-associated genomic loci in the sample may be indicative of the disease or disorder (e.g., a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition) of the subject. The probes may be selective for the sequences at the panel of condition-associated genomic loci in the sample. A kit may comprise instructions for using the probes to process the sample to generate datasets indicative of a quantitative measure (e.g., indicative of a presence, absence, or relative amount) of sequences at each of the panel of condition-associated genomic loci in a sample of the subject.

[0365] The probes in the kit may be selective for the sequences at the panel of condition-associated genomic loci in the sample. The probes in the kit may be configured to selectively enrich nucleic acid (e.g., RNA or DNA) molecules corresponding to the panel of condition-associated genomic loci. The probes in the kit may be nucleic acid primers. The probes in the kit may have sequence complementarity with nucleic acid sequences from one or more of the panel of condition-associated genomic loci. The panel of condition-associated genomic loci or genomic regions may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, or more distinct condition-associated genomic loci.

[0366] The instructions in the kit may comprise instructions to assay the sample using the probes that are selective for the sequences at the panel of condition-associated genomic loci in the cell-free biological sample. These probes may be nucleic acid molecules (e.g., RNA or DNA) having sequence complementarity with nucleic acid sequences (e.g., RNA or DNA) from one or more of the panel of condition-associated genomic loci. These nucleic acid molecules may be primers or enrichment sequences. The instructions to assay the cell-free biological sample may comprise instructions to perform array hybridization, polymerase chain reaction (PCR), or nucleic acid sequencing (e.g., DNA sequencing or RNA sequencing) to process the sample to generate datasets indicative of a quantitative measure (e.g., indicative of a presence, absence, or relative amount) of sequences at each of the panel of condition-associated genomic loci in the sample. A quantitative measure (e.g., indicative of a presence, absence, or relative amount) of sequences at each of a panel of condition-associated genomic loci in the sample may be indicative of a disease or disorder (e.g., a lupus, psoriasis, atopic dermatitis, and/or systemic sclerosis (scleroderma) condition).

[0367] The instructions in the kit may comprise instructions to measure and interpret assay readouts, which may be quantified at one or more of the panel of condition-associated genomic loci to generate the datasets indicative of a quantitative measure (e.g., indicative of a presence, absence, or relative amount) of sequences at each of the panel of condition-associated genomic loci in the sample. For example, quantification of array hybridization or polymerase chain reaction (PCR) corresponding to the panel of condition-associated genomic loci may generate the datasets indicative of a quantitative measure (e.g., indicative of a presence, absence, or relative amount) of sequences at each of the panel of condition-associated genomic loci in the sample. Assay readouts may comprise quantitative PCR (qPCR) values, digital PCR (dPCR) values, digital droplet PCR (ddPCR) values, fluorescence values, etc., or normalized values thereof. Various systems, methods, classifiers and kits are described in WO 2020/102043, which is entirely incorporated herein by reference.
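The disclosure does not specify how assay readouts are normalized. As one common option for qPCR values, the comparative 2^-ΔΔCt method normalizes a target locus to a reference gene and then to a calibrator sample. The sketch below assumes approximately 100% amplification efficiency for both assays; the function and argument names are hypothetical.

```python
def relative_quantity(ct_target: float, ct_reference: float,
                      ct_target_cal: float, ct_reference_cal: float) -> float:
    """Relative expression by the comparative 2^-ddCt method (assumed normalization).

    ct_target / ct_reference: cycle thresholds in the test sample.
    ct_target_cal / ct_reference_cal: cycle thresholds in a calibrator sample.
    Assumes ~100% PCR efficiency, i.e., one cycle corresponds to a 2-fold change.
    """
    delta_ct_sample = ct_target - ct_reference              # normalize to reference gene
    delta_ct_calibrator = ct_target_cal - ct_reference_cal  # same for the calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)
```

For example, a target that crosses threshold two cycles earlier (after reference-gene normalization) than in the calibrator yields a relative quantity of 4.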

[0368] In some embodiments, the dataset comprises RNA gene expression or transcriptome data, DNA genomic data, or a combination thereof. In some embodiments, the biological sample comprises a whole blood (WB) sample, a PBMC sample, a tissue sample, a cell sample or any derivative thereof. In some embodiments, assessing the SLE condition of the subject comprises determining a diagnosis of the SLE condition, a prognosis of the SLE condition, a susceptibility of the SLE condition, a treatment for the SLE condition, or an efficacy or non-efficacy of a treatment for the SLE condition.

[0369] In some embodiments, the method further comprises determining a diagnosis of the SLE condition with a sensitivity of at least about 70%. In some embodiments, the method further comprises determining a diagnosis of the SLE condition with a specificity of at least about 70%. In some embodiments, the method further comprises determining a diagnosis of the SLE condition with a positive predictive value of at least about 70%. In some embodiments, the method further comprises determining a diagnosis of the SLE condition with a negative predictive value of at least about 70%. In some embodiments, the method further comprises determining a diagnosis of the SLE condition with an Area Under Curve (AUC) of at least about 70%. In some embodiments, the method further comprises determining a likelihood of the diagnosis of the SLE condition of the subject.

[0370] In some embodiments, the method further comprises generating a plurality of drug candidates for the SLE condition of the subject. In some embodiments, the method further comprises evaluating or predicting a relative efficacy of the plurality of drug candidates for the SLE condition of the subject. In some embodiments, the method further comprises providing a therapeutic intervention comprising one or more of the plurality of drug candidates for the SLE condition of the subject.
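The performance measures listed in [0369] can be computed from a labeled validation set as sketched below. The 0.5 decision threshold and the function name are illustrative assumptions. AUC is computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties counted as one half), which equals the area under the ROC curve; it is returned as a fraction, so 0.70 corresponds to the "70%" wording above.

```python
import numpy as np


def diagnostic_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, PPV, NPV, and AUC for a binary classifier (sketch)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = y_score >= threshold  # assumed threshold on the classifier score

    # Confusion-matrix counts.
    tp = np.sum(y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)

    # Rank-based (Mann-Whitney) AUC over all positive/negative pairs.
    pos, neg = y_score[y_true], y_score[~y_true]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    auc = (greater + 0.5 * ties) / (pos.size * neg.size)

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "auc": auc,
    }
```

Sensitivity, specificity, PPV, and NPV depend on the chosen threshold, whereas AUC summarizes ranking performance across all thresholds.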

[0371] In some embodiments, the method further comprises monitoring the SLE condition of the subject, wherein the monitoring comprises assessing the SLE condition of the subject at each of a plurality of time points, and processing the plurality of assessments of the SLE condition of the subject at each of the plurality of time points.
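One way to process assessments collected at several time points, assuming each assessment yields a numeric disease score, is to summarize the trajectory by its least-squares slope. The sketch uses `numpy.polyfit`; the function name and the choice of a linear trend are assumptions, not a method stated in the disclosure.

```python
import numpy as np


def score_trend(times, scores):
    """Least-squares slope of disease scores per unit time (hypothetical summary)."""
    # polyfit with degree 1 returns [slope, intercept], highest degree first.
    slope, _intercept = np.polyfit(np.asarray(times, dtype=float),
                                   np.asarray(scores, dtype=float), 1)
    return slope
```

A negative slope would indicate a falling score across visits; how such a trend maps to a clinical decision is outside the scope of this sketch.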

[0372]

EXAMPLES

[0373] The following illustrative examples are representative of embodiments of the software applications, systems, and methods described herein and are not meant to be limiting in any way.

Example 1: Altered expression of genes controlling metabolism characterizes the tissue response to immune injury in lupus

[0374] In an aspect, the present disclosure provides systems and methods for using bioinformatics approaches to deconvolute bulk mRNA for various cells and processes involved in lupus organ pathology, including inflammatory cells, endothelial cells, and tissue cells.

[0375] In an aspect, the present disclosure provides systems and methods for the delineation of the altered metabolism of cells by using gene expression analysis.

[0376] In an aspect, the present disclosure provides systems and methods for using various regression models (e.g., classification and regression trees, linear regression, step-wise regression) to dissect the specific metabolic alterations in individual cell types.

[0377] In an aspect, the present disclosure provides systems and methods for using animal models and the ability to translate mouse gene expression into the human equivalent to confirm the results in humans and also analyze the effects of treatment.

[0378] In an aspect, the present disclosure provides systems and methods for the delineation of the role of specific cells (myeloid cells) and processes (interferon, mitochondrial dysfunction) in lupus tissue pathology.

[0379] In an aspect, the present disclosure provides systems and methods for using non-lymphocyte populations in skin and kidney toward diagnostic and/or prognostic biopsy tests.

[0380] In an aspect, the present disclosure provides systems and methods for defining gene signatures in individual cell types in a mixed population such as blood or tissue (e.g., skin, kidney).

[0381] In an aspect, the present disclosure provides systems and methods for analyzing sets of metabolism genes and their relationship to function and cell type, including subsets of myeloid cells (e.g., new subsets of myeloid cells).

[0382] To compare lupus pathogenesis in disparate tissues, we analyzed gene expression profiles of human discoid lupus erythematosus (DLE) and lupus nephritis (LN). We found common increases in myeloid cell-defining gene sets and decreases in genes controlling glucose and lipid metabolism in lupus-affected skin and kidney. Regression models in DLE indicated that increased glycolysis was correlated with keratinocyte, endothelial, and inflammatory cell transcripts, and that decreased tricarboxylic acid (TCA) cycle genes were correlated with the keratinocyte signature. In LN, regression models demonstrated that decreased glycolysis and TCA cycle genes were correlated with increased endothelial or decreased kidney cell transcripts, respectively. Less severe glomerular LN exhibited similar alterations in metabolism and tissue cell transcripts before monocyte/myeloid cell infiltration in some patients. Additionally, changes to mitochondrial and peroxisomal transcripts were associated with specific cells rather than global signal changes. Examination of murine LN gene expression demonstrated that metabolic changes were not driven by acute exposure to type I interferon and may be restored after immunosuppression. Finally, expression of HAVCR1, a tubule damage marker, was negatively correlated with the TCA cycle signature in LN models. These results indicate that metabolic dysfunction is a common, reversible change in lupus-affected tissues and appears to reflect damage downstream of immunologic processes.

[0383] Systemic lupus erythematosus (SLE) is a complex autoimmune disease that affects multiple tissues within the body, including the skin and kidneys [1,2]. Although the primary mechanisms of SLE pathogenesis involve hyperactivity of both the innate and adaptive immune systems, evidence surrounding the involvement of perturbed metabolic activity has recently emerged [3-5]. Whereas systemic metabolic dysregulation has been associated with lupus-related morbidities, such as atherosclerosis [6], the contribution of cellular metabolic abnormalities in human lupus-affected tissues has yet to be fully explored. Metabolic derangements in tissues and their contributions to disease pathology have been investigated in a number of inflammatory and rheumatic diseases. For example, abnormalities in mitochondrial functions contribute to immune/inflammatory skin diseases [7], and skin cells have been shown to upregulate the pentose phosphate pathway (PPP) under oxidative stress [8]. Moreover, rheumatoid arthritis synoviocytes shift their metabolism to glycolysis because of local hypoxia [9], and osteoarthritis exhibited increased synovial tricarboxylic acid (TCA) cycle intermediates [10]. Similarly, metabolic impairment has been found in many forms of kidney disease [11,12], including defects in fatty acid oxidation (FAO) that have been correlated with fibrosis progression in the kidney tubulointerstitium [11].

[0384] At the cellular level, metabolic abnormalities in inflammatory diseases may be related to the nature of the immune cell infiltrate and its activation status. For example, macrophage polarization is regulated by glycolysis and FAO [13], and T cells reallocate glucose and upregulate glycolysis following activation [14]. Moreover, macrophage markers have been associated with increased PPP activity in kidney diseases, including lupus nephritis (LN) [15]. Additionally, exhausted T cells with altered mitochondrial function have been found in murine LN [16], and T cells isolated from SLE patients exhibit alterations in lipid composition [17]. The idea of targeting T cell metabolism for lupus therapy has been advocated [18,19] based on the finding that CD4 T cells in lupus-prone mice had elevated glycolysis and oxidative metabolism that may be normalized with metformin and 2-deoxy-D-glucose (2DG) treatment, resulting in disease improvement [4].

[0385] Taken together, these studies suggest metabolic abnormalities in either infiltrating inflammatory cells or tissue cells contribute to and/or reflect tissue damage. To examine this in greater detail, we analyzed gene expression profiles in human and murine lupus tissues to discern the nature of abnormal metabolic pathways, elucidate cellular origins of metabolic abnormalities, and determine how inflammatory cells contribute to physiologic processes involved in tissue inflammation and damage.

[0386] Dysregulation of metabolic gene signatures was found to be common among lupus-affected tissues. Despite thousands of differentially expressed genes (DEGs) in discoid lupus erythematosus (DLE), World Health Organization (WHO) class III/IV LN glomerulus (GL), and WHO class III/IV LN tubulointerstitium (TI), there were only 559 increased and 324 decreased transcripts in common (FIG. 1A). Protein-protein interaction analysis showed the common upregulated genes were related to interferon (IFN) and immune signaling pathways (FIG. 1B). The common downregulated genes were related to mitochondrial processes, including the TCA cycle and oxidative phosphorylation (OXPHOS).
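The shared-DEG counts in [0386] come from intersecting the per-tissue DEG lists, done separately for increased and for decreased transcripts. A minimal sketch; the function name is hypothetical and the gene identifiers used with it are placeholders, not data from this study.

```python
def common_degs(*deg_sets):
    """Genes present in every tissue's DEG set (e.g., DLE, LN GL, LN TI)."""
    if not deg_sets:
        return set()
    common = set(deg_sets[0])
    for genes in deg_sets[1:]:
        common &= set(genes)  # keep only genes also found in this tissue
    return common
```

Calling it once with the upregulated lists and once with the downregulated lists gives the two shared sets, whose sizes correspond to the "increased" and "decreased" counts reported above.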

[0387] As there is considerable transcriptomic heterogeneity among lupus patients [20,21], we sought to examine expression of genes controlling metabolism at the individual patient level, and, therefore, employed gene set variation analysis (GSVA) [22] (Table 2). Generally, lupus tissues exhibited lower GSVA scores indicative of metabolic pathways, whereas controls had higher metabolism GSVA scores (FIGs. 1C-1I). The glycolysis and OXPHOS signatures were decreased in LN TI, whereas the TCA cycle and fatty acid beta oxidation (FABO) gene signatures were decreased in all tissues. Decreases in genes signifying fatty acid alpha oxidation (FAAO) and amino acid (AA) metabolism were similarly detected in the majority of lupus patients.
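GSVA turns a genes-by-samples expression matrix into one enrichment score per gene set per sample. The sketch below follows the published GSVA steps (the method cited as [22]): a per-gene Gaussian-kernel estimate of the cumulative distribution across samples with bandwidth equal to the gene's standard deviation divided by 4, a logit transform, symmetric rank weights within each sample, and a random-walk statistic scored as the maximum positive plus maximum negative deviation. It is a simplified re-implementation for illustration, not the Bioconductor GSVA package, and it omits that package's filtering and parallelization options; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def gsva_like_scores(expr, gene_idx, tau=1.0):
    """Simplified GSVA-style scores for one gene set.

    expr: (genes x samples) array of continuous expression values.
    gene_idx: indices of the genes (rows) belonging to the gene set.
    Returns one score per sample, bounded in [-1, 1].
    """
    p, n = expr.shape
    # Step 1: per-gene kernel CDF across samples, Gaussian kernel, bandwidth sd/4.
    h = expr.std(axis=1, ddof=1, keepdims=True) / 4.0
    diffs = (expr[:, :, None] - expr[:, None, :]) / h[:, :, None]
    cdf = norm.cdf(diffs).mean(axis=2)          # (genes x samples)
    z = np.log(cdf / (1.0 - cdf))               # logit of the estimated CDF

    in_set = np.zeros(p, dtype=bool)
    in_set[gene_idx] = True
    ranks = np.arange(1, p + 1)
    weight = np.abs(p / 2.0 - ranks) ** tau     # symmetric rank weights

    scores = np.empty(n)
    for j in range(n):
        order = np.argsort(-z[:, j], kind="stable")  # genes by decreasing z
        hit = in_set[order]
        # Random walk: weighted steps up for set genes, uniform steps down otherwise.
        hit_cum = np.cumsum(np.where(hit, weight, 0.0)) / weight[hit].sum()
        miss_cum = np.cumsum(~hit) / (p - hit.sum())
        walk = hit_cum - miss_cum
        # Max-difference statistic: largest positive plus largest negative deviation.
        scores[j] = walk.max() + walk.min()
    return scores
```

Because each gene is first compared against its own distribution across samples, the score reflects whether a set's genes are relatively high or low in a given sample compared with the rest of the cohort, which is what allows the per-patient comparisons described above.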

[0388] It was found that increased myeloid cell signatures and decreased tissue cell signatures characterize the majority of lupus patients. To determine whether cellular changes accounted for the decreased metabolic signatures, we examined enrichment of immune and non-hematopoietic cell signatures in lupus tissues. Increased tissue enrichment of immune cell signatures was variable, with DLE and LN GL demonstrating more enrichment of inflammatory cell signatures as compared to LN TI (FIGs. 2A, FIGs. 9A-9O, FIG. 10). There was evidence of increased plasmacytoid dendritic cell (pDC), dendritic cell (DC), or monocyte/myeloid cell (monocyte/MC) signatures in some patients from each tissue. Non-hematopoietic cell gene expression was additionally altered. The endothelial cell (EC) signature was increased in LN GL. Keratinocyte transcripts were increased in 4/9 DLE patients, whereas melanocyte transcripts were decreased in 5/9 patients. General kidney cell transcripts were decreased in the majority of LN patients. Genes indicative of podocytes were decreased, whereas genes indicative of mesangial cells were increased in LN GL. Finally, the proximal tubule signature was significantly decreased and the collecting duct signature was significantly increased in LN TI.

[0389] Although the monocyte/MC signature was consistently increased among lupus-affected tissues, its nature in each tissue varied. Linear regression of the monocyte/MC signature and FCN1 expression indicated that DLE and LN GL contained inflammatory monocyte-derived macrophages [23], but the correlation in LN TI was weak (FIG. 2B, Table 2). Tissue-resident macrophage populations were evaluated in DLE by expression of F13A1, FOLR2, SEPP1, and TXNIP [24] and in LN by co-expression of CD74 and CD81, which is characteristic of renal tissue-resident macrophages [25]. The tissue-resident macrophage signature was present in LN TI, as evidenced by monocyte/MC correlation with both markers and co-expression of CD74 and CD81 (FIGs. 2B-2C, FIG. 11B). However, the tissue-resident macrophage signature was not identified in DLE, as shown by lack of correlation with F13A1, SEPP1, and TXNIP. Similarly, the tissue-resident macrophage signature was not found in LN GL, as shown by lack of correlation with CD81 and no co-expression between CD74 and CD81. This suggests monocyte/MCs in DLE and LN GL are predominantly infiltrating inflammatory monocyte-derived macrophages, whereas LN TI is populated by tissue-resident macrophages.

[0390] It was found that class II LN GL is molecularly similar to class III/IV LN GL. To elucidate whether the same metabolic signature changes were present in less severe LN, we expanded our analysis to incorporate WHO class II LN samples, in which fewer immune cells have been observed histologically [26]. Unexpectedly, we found that genes controlling glycolysis, the TCA cycle, FAO, and AA metabolism were decreased in class II LN GL (FIGs. 3A-3G), and gene expression of pDCs and B cells was increased in class II (FIGs. 3H-3O). It is notable that gene expression of non-hematopoietic cell populations was altered in class II LN GL, including a decrease in kidney cell genes and an increase in EC genes (FIGs. 3P-3T). These abnormalities in expression of metabolic genes and non-hematopoietic cell genes were noted even in patients in whom an increase in monocyte/MC genes was not detected (FIG. 12). Altogether, even though inflammatory cell gene signatures were less pronounced in some patients relative to class III/IV, class II LN GL was molecularly similar to class III/IV LN GL (FIG. 3U).

[0391] As in the glomerulus, class II LN TI was not statistically different from class III/IV LN TI (FIGs. 13A-13U); however, class II LN TI did not display decreased metabolic signatures. Class II LN TI showed less evidence of tissue cell damage and inflammatory infiltrate than class II LN GL (FIG. 14).

[0392] It was found that cellular signatures are associated with metabolic gene signature dysregulation in lupus-affected tissues. Stepwise regression, which identifies the independent variables that best explain the dependent variable [27], was employed to analyze cellular signature associations with metabolism gene signature changes. To improve precision, highly collinear cellular signatures in DLE were combined for stepwise analysis. In DLE, the inflammatory cell, EC, and keratinocyte signatures were positively correlated with the glycolysis signature, as indicated by a positive regression coefficient (FIGs. 15A-15B). Conversely, the GC B cell and fibroblast signatures were negatively correlated with other metabolic signatures. The keratinocyte signature was also negatively correlated with the TCA cycle signature.
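Stepwise regression can be implemented in several ways, and the entry criterion used here is not stated. The sketch below is a forward-selection variant that, at each step, adds the predictor (for example, a cellular GSVA signature) that most lowers the Akaike information criterion (AIC) of an ordinary least-squares fit, and stops when no addition lowers it further. The AIC criterion and the function names are assumptions.

```python
import numpy as np


def _aic(y, X):
    """Gaussian AIC of an OLS fit; X must already contain an intercept column.

    Returns (aic, coefficients).
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k, beta


def forward_stepwise(y, predictors):
    """Forward selection by AIC.

    y: 1-D response (e.g., a metabolic signature score per sample).
    predictors: dict mapping a name to a 1-D array (e.g., cellular signatures).
    Returns the selected names (in entry order) and the final coefficients.
    """
    X = np.ones((y.size, 1))               # start from the intercept-only model
    best_aic, beta = _aic(y, X)
    selected, remaining = [], dict(predictors)
    while remaining:
        trials = {name: _aic(y, np.column_stack([X, col]))
                  for name, col in remaining.items()}
        name = min(trials, key=lambda nm: trials[nm][0])
        if trials[name][0] >= best_aic:    # no candidate improves the AIC
            break
        best_aic, beta = trials[name]
        X = np.column_stack([X, remaining.pop(name)])
        selected.append(name)
    return selected, dict(zip(["intercept"] + selected, beta))
```

The sign of each returned coefficient plays the role of the positive or negative regression coefficients discussed above; combining highly collinear predictors beforehand, as was done for DLE, keeps these coefficients stable.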

[0393] We then implemented both stepwise regression and classification and regression tree (CART) analysis, which partitions data by independent variables [28], to determine the cellular signatures that were most associated with the metabolic signature changes in all classes of LN and controls. In LN GL, all metabolic signatures except for the PPP signature exhibited some dependence on the kidney cell signature by either stepwise regression or CART (FIGs. 4A-4H). The stepwise regression coefficients for the kidney cell signature were positive and of the largest magnitude for the TCA cycle, FAAO, FABO, and AA metabolism signatures; a negative regression coefficient was noted for the EC signature and glycolysis. Moreover, CART identified the EC signature as the strongest contributor to the glycolysis signature, with a positive EC GSVA score predicting a lower glycolysis GSVA score. Importantly, the EC signature classifier alone separated 23 of 30 LN GL samples from controls. It is notable that by CART analysis upregulation of the monocyte/MC signature tended to mitigate the decreased glycolysis signature related to ECs, suggesting that monocyte/MC genes might contribute to glycolysis in opposite directions from EC genes. The monocyte/MC signature exhibited a negative stepwise regression coefficient with the OXPHOS signature, which was also positively associated with the kidney cell signature by CART. Together, the stepwise regression and CART models indicate decreased OXPHOS is reflective of decreased kidney cell and increased monocyte/MC signatures, although increased pDC genes may also contribute.
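CART grows a tree by repeatedly choosing the feature and threshold that best partition the samples. The sketch below finds only the first (root) split of a regression tree using the usual squared-error criterion, to show how a single signature such as EC could emerge as the top split for a glycolysis score; it does not reproduce the analysis. A full tree applies the same search recursively to each partition, and in practice a library implementation such as scikit-learn's `DecisionTreeRegressor` would typically be used. The function name and inputs are hypothetical.

```python
import numpy as np


def best_regression_split(X, y, names):
    """Root split of a CART regression tree (sum-of-squared-errors criterion).

    X: (samples x features) array, e.g., cellular GSVA scores.
    y: 1-D response, e.g., a metabolic signature score.
    names: list of feature names matching the columns of X.
    Returns (feature name, threshold, mean y at/below threshold, mean y above).
    """
    best = (np.inf, None, None)
    for j, name in enumerate(names):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive observed values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, name, t)
    _, name, t = best
    mask = X[:, names.index(name)] <= t
    return name, t, y[mask].mean(), y[~mask].mean()
```

A returned split in which samples above the EC threshold have the lower mean response would correspond to the pattern described above, where a positive EC GSVA score predicts a lower glycolysis GSVA score.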

[0394] The contribution of kidney-specific cell signatures to most metabolic changes was also observed in the TI (FIGs. 16A-16H). In contrast to LN GL, EC genes did not contribute to the decreased glycolysis signature in LN TI, but instead the kidney cell signature was predicted by stepwise regression and CART. Numerous cell types were found to contribute to the overall TCA cycle signature. Stepwise regression demonstrated a positive contribution from proximal tubule, kidney cell, and granulocyte transcripts, and a negative contribution from fibroblast and pDC cell signatures. CART identified the proximal tubule signature as the strongest contributor, with a positive correlation to the TCA cycle signature, and confirmed the involvement of the fibroblast signature as well. Monocyte/MC transcripts contributed most to the OXPHOS signature, although the regression coefficient was negative, implying that the presence of monocyte/MC contributed to a decreased OXPHOS signature. The FABO signature, which may be defective in affected TI of multiple etiologies [11,12,29], FAAO, and AA metabolism signatures also demonstrated a positive relationship with expression of kidney cell and/or proximal tubule genes.

[0395] We sought to confirm these findings by referencing data from single-cell RNA sequencing (scRNA-seq) of LN biopsies [30]. In LN, tissue-resident macrophages exhibited decreased OXPHOS genes (FIG. 17), which aligns with the negative relationship that stepwise regression and CART predicted between the monocyte/MC and OXPHOS signatures in LN TI (FIGs. 16A and 16E). Conversely, CD4 T cells had both increased and decreased expression of glycolysis genes, making their metabolism difficult to interpret. The kidney epithelial cell cluster, which comprised general kidney cell as well as tubule genes, exhibited significant negative differential expression of two glycolysis genes (G6PC, GAPDH) and one TCA cycle gene (PDK4), but four OXPHOS genes were significantly over-expressed (MT-CO2, MT-CO3, MT-ND5, NDUFS3). This supports the idea that kidney cells have high oxidative metabolism, and that their absence or dysfunction may contribute to decreased OXPHOS signatures. However, too few of the relevant genes were detected by scRNA-seq to confirm the current findings determined with bulk RNA analysis.

[0396] It was found that mitochondrial and peroxisomal signature changes and local hypoxia contribute to changes in metabolic gene expression in specific cells. As mitochondria and peroxisomes are the primary organelles responsible for metabolic processes such as OXPHOS and FAO [31], we sought to examine whether there were detectable changes to organelle-specific gene expression that may explain the altered metabolic state. There was no evidence of global mitochondrial gene expression changes, although mitochondrial genes were decreased in approximately half of DLE patients, and mitochondrial transcription was increased in 15/30 LN GL patients (FIGs. 5A-5E). Conversely, the apoptotic mitochondrial changes signature was increased in both DLE and LN GL (FIG. 5F). Genes associated with peroxisome biogenesis were decreased in some lupus patients in each tissue; in contrast, genes associated with peroxisomal fission were increased in some class III/IV LN TI patients (FIGs. 5G-5H). Stepwise regression analysis revealed a positive association between the apoptotic mitochondrial signature and the granulocyte and inflammatory cell signatures in DLE, and positive associations between the apoptotic mitochondrial signature and MC signatures in LN (FIGs. 5I-5K). Positive associations were observed between the peroxisome biogenesis signature and the kidney cell and proximal tubule signatures in LN GL and LN TI, respectively.

[0397] Since hypoxia has been cited as a driver of kidney disease in the TI [32], and a hypoxic microenvironment may result in decreased oxidative metabolism, we examined the contribution of HIF1A to metabolic gene signatures. Some lupus patients had increased expression of HIF1A by GSVA (FIG. 5L). Although HIF1A expression did not significantly affect metabolic signatures in DLE, in LN TI the HIF1A gene signature had a positive correlation with the glycolysis signature, whereas in LN GL the HIF1A gene signature was positively associated with the PPP signature (FIGs. 5M-5O). This suggests hypoxia contributes to specific metabolic alterations in the kidney.

[0398] It was found that metabolic gene expression changes occur independent of acute IFN stimulation. The IFN gene signature (IGS), a known hallmark of lupus [33,34] and lupus-affected tissues [35], has been implicated in metabolic alteration of MCs [36-39]. To determine the functional relationship between IFN stimulation and metabolic alteration, we examined metabolic gene expression longitudinally in the IFNα-accelerated NZB/W murine model of LN, in which the IGS increases at early timepoints following injection of IFNα adenovirus and then increases again when kidney disease develops (FIG. 6A). To determine whether metabolic signatures changed with the elevated IGS, we examined metabolic signatures after IFNα administration. No significant changes in metabolic signatures were observed at week 1 (W1) after IFNα administration (FIGs. 6B-6H), the peak of early IGS expression. In contrast, the TCA cycle, OXPHOS, FAO, and AA metabolism gene signatures were decreased at W7 post-IFNα administration, when LN develops [40]. There were negative correlations between the IGS and metabolic signatures for the TCA cycle, OXPHOS, FAO, and AA metabolism, suggesting increased IFN may contribute to decreased metabolism. However, there is a clear separation of the W1 and W7/W9 datapoints in the correlation analysis for the TCA cycle, OXPHOS, and AA metabolism signatures, in which W1 datapoints show positive GSVA scores for both the IGS and metabolic signatures. Thus, based upon the lack of transcriptional metabolic changes following early IFN exposure in the mouse, acute exposure to type I IFN does not appear to explain the metabolic changes in lupus tissues: transcriptional changes to metabolism occur when disease develops and the IFN signature recurs, but not at W1, W2, or W3.

[0399] It was found that metabolic and cellular gene expression changes in murine LN are corrected by immunosuppressive treatment. To determine the robustness of the observed metabolic and cellular gene expression changes and determine whether dysregulation may be reversed by immunosuppressive therapy, we analyzed gene expression in pre- and post-treatment kidneys of lupus mice. Although some metabolic changes had been identified in the kidneys of untreated and treated NZB/W and NZM2410 mice [41], no analysis of the nature of the affected cells was carried out.

[0400] Metabolic gene expression was significantly altered in NZM2410, NZB/W, IFNα-accelerated NZB/W (GSE72410), and NZW/BXSB mice. Treatment of NZM2410 mice with BAFF-R-Ig and of proteinuric NZB/W mice with a combination of cyclophosphamide (CTX)+CTLA4-Ig+anti-CD154 restored TCA cycle, FAAO, FABO, and AA metabolism gene expression (FIGs. 7A-7B). Notably, in the NZB/W kidneys, when combination therapy was discontinued and LN relapsed, some metabolic abnormalities recurred. In IFNα-accelerated NZB/W mice, both the glycolysis and TCA cycle signatures were significantly decreased with the onset of LN (IFN W7), and CTX treatment significantly restored TCA cycle gene expression to baseline pre-disease levels (FIG. 7C). In MRL/lpr mice, the same trends were observed, although the data did not achieve statistical significance (FIG. 7D). NZW/BXSB mice had decreased metabolic signatures for all processes except glycolysis, the PPP, and OXPHOS (FIG. 7E).

[0401] GSVA of cellular changes in murine LN models demonstrated similar results to human LN, although inflammatory infiltrate and changes in the EC and podocyte signatures were less robust (FIGs. 18A-18Q, FIGs. 19A-19R, FIGs. 20A-20S, FIGs. 21A-21S, FIGs. 22A-22R, FIGs. 23A-23Q). BAFF-R-Ig, combination therapy, and CTX treatment restored kidney cell and proximal tubule gene signatures, and combination therapy decreased inflammatory cell gene signatures. Correlation analysis in all murine LN models showed strong positive correlations between metabolism and proximal tubule transcripts, whereas negative correlations were seen between metabolism and inflammatory cell signatures (FIGs. 24A-24F). This suggests that treatment may allow for functional metabolic recovery of non-hematopoietic cells in the kidney in the presence or absence of changes to the inflammatory cell signal.

[0402] It was found that metabolic changes correlate with expression of genes indicating tubular damage. Finally, we examined whether changes in genes controlling metabolism occurred synchronously with changes in expression of HAVCR1 (KIM1) and LCN2, which are known markers of tubular damage [42,43]. Although HAVCR1 expression was increased in class III/IV human LN TI patients, there were no significant changes to LCN2 in any class of LN TI (FIGs. 8A-8B). In murine LN models, Havcr1 and Lcn2 expression increased with active disease and returned toward normal with treatment. Both markers demonstrated significant inverse correlations with the kidney cell, proximal tubule, and TCA cycle signatures in both human and murine LN, although there were model-dependent differences in magnitude (FIGs. 8C-8F, FIG. 25).

[0403] Multi-pronged bioinformatic analyses of gene expression data from human lupus tissues revealed that, despite intra-tissue heterogeneity, metabolic dysfunction was present in all tissues. Immune effector cells have high metabolic needs [13,14,44,45], and, therefore, we initially hypothesized that immune infiltration may be responsible for the observed lupus tissue-wide metabolic dysregulation. Although kidney-infiltrating CD8 T cells in murine LN are functionally exhausted with defective mitochondria [16], anergic/activated T cell markers were not found in these human LN samples, and regression models indicated T cells contributed minimally to changes in renal metabolic gene expression. Similarly, monocyte/MCs, which were increased in some patients from all tissues, might be expected to contribute to enhanced glucose metabolism through either glycolysis (M1 macrophages) or OXPHOS (M2 macrophages) [44]. Indeed, monocyte/MC signatures were inversely correlated with OXPHOS in both LN tissues, and positively correlated with glycolysis in LN GL, suggesting they are likely M1 in nature and may contribute to the altered metabolic landscape of intact tissues. Although gene expression revealed differing origins of the renal MC populations, as they reflect monocyte-derived macrophages in LN GL and tissue-resident macrophages in LN TI, the consistent MC presence aligns with their prominent role in tissue damage [46]. Observed increases in the monocyte/MC signature and strong inverse correlations between MCs and metabolism may reflect the role of MCs in tissue damage, even when T and B cells are not yet abundant.

[0404] To examine whether metabolic abnormalities represented primarily tissue cell defects as opposed to changes in inflammatory cell metabolism, we analyzed gene expression in class II LN, in which less inflammatory infiltrate is evident histologically [26]. Even though class II LN samples had evidence of increased immune/inflammatory cell signatures that coincided with changes to metabolic signatures, some samples showed changes in metabolic signatures in the absence of a monocyte/MC signature, suggesting that alterations in metabolic signatures may be initiated immediately following immune complex (IC) deposition and complement activation. Subsequent monocyte/MC and other inflammatory cell activation/infiltration may then contribute to further damage of tissue cells. Indeed, changes in kidney gene expression may occur following early IC deposition but before microscopic detection of inflammation [47], consistent with our transcriptomic results in class II LN.

[0405] Our findings of altered metabolism in lupus tissues align with changes seen in other forms of tissue pathology. We observed a positive association between the keratinocyte and glycolysis signatures, and upregulation of glycolysis has been observed in keratinocytes during cutaneous infection [48]. Increased glycolysis [49,50] and decreased PPAR signaling, TCA cycle, and OXPHOS have been reported in dermal fibroblasts subjected to radiation [50], indicating that dermal fibroblasts have the potential to contribute to inflammation-induced alterations in metabolism. However, in the current study, stepwise regression indicated that the fibroblast signature was negatively associated with the glycolysis, PPP, and OXPHOS signatures, whereas associations with FAO did not achieve statistical significance. The negative correlation between the fibroblast and PPP signatures contrasts with the observed upregulation of the PPP in cultured fibroblasts and keratinocytes exposed to UV-induced oxidative stress [8]. This suggests that fibroblasts in vivo are altered by signals other than those mediated by UV light, as expected, since fibroblasts lie deep in the dermis and are shielded from such ambient exposure.

[0406] Metabolic dysregulation is also common in kidney disease [12,15,29,51,52]. In non-diabetic chronic kidney disease, TCA cycle abnormalities measured in urine metabolites coincided with changes to kidney gene expression [51], supporting our conclusion that metabolic dysregulation primarily reflects altered renal cell function as opposed to changes in immune cell metabolism. Moreover, defects in FAO have been correlated with fibrosis progression in TI disease [11]. Both human samples and murine models with TI fibrosis exhibited decreased expression of FAO enzymes and resultant increases in lipid deposition, which were reversed by correcting the metabolic abnormalities [11]. We similarly found that FAO signatures are substantially decreased in the TI; however, regression models indicated that altered FAO transcripts were most associated with decreased kidney cell signatures. Although the fibroblast signature was not selected as a predictor by the models, fibrosis or fibroblast enrichment may contribute to decreased kidney cell and proximal tubule transcripts in the TI.

[0407] In the glomerulus, ECs appear to play an additional role in disease. Endothelial activation in LN has been suggested [53], and EC transcripts were increased in 83% of all LN GL samples. Increased EC transcripts may reflect altered EC function, potentially resulting from cytokine/growth factor stimulation and/or hypoxia-induced cellular damage. Abnormal angiogenesis resulting in leaky vessels has been reported in diabetic nephropathy [54], but the function of glomerular ECs in LN is less well defined. Indeed, glomerular ECs have been found to depend upon podocyte stimulation for differentiation [55], whereas other studies suggest that EC damage precedes podocyte injury [56]. Our findings from class II LN support the latter, as EC signature changes occurred in the absence of podocyte changes in some patients, suggesting that ECs are early participants in LN.

[0408] The relationship between the EC signature and metabolic gene expression changes implied an alteration in EC physiology in LN. Although healthy ECs are highly glycolytic [57], all regression techniques in LN GL indicated an inverse correlation between the EC and glycolysis signatures. Consistent with our findings, stalled glycolytic flux has been observed in ECs in diabetic nephropathy [57]. These data suggest that glomerular ECs are metabolically altered, perhaps because of IC and complement stimulation and/or cytokine exposure, making them less capable of maintaining normal function. Indeed, in the GL, the EC signature had a positive regression coefficient for the FABO signature, and quiescent ECs have been reported to upregulate FABO [58], supporting the conclusion that ECs are functionally deranged in LN GL.

[0409] Analysis of metabolism-associated genes in cell clusters derived from scRNA-seq of LN biopsies [30] further supports our finding that changes in metabolism are most closely related to kidney cell gene expression, with minor contributions from resident or infiltrating immune cells. Tissue-resident macrophages exhibited decreased expression of OXPHOS genes, whereas CD4 T cell metabolism was unclear. Notably, the kidney epithelial cell cluster expressed many metabolism-associated genes, suggesting decreased glycolysis and TCA cycle activity but increased OXPHOS. Proximal tubules, which have the most mitochondria of any kidney epithelial cell, are known to depend on oxidative metabolism [59], further supporting the idea that diminished OXPHOS in bulk RNA partly reflects decreased kidney epithelial cell transcripts. Additionally, because scRNA-seq measures expression in individual cells rather than in the bulk tissue environment, the detected kidney epithelial cells are likely the residual, functionally normal ones. However, technical issues, including cell yield and read depth, make it difficult to determine the metabolic status of individual cells from the scRNA-seq data. Altogether, scRNA-seq appeared to be less effective than deconvolution of bulk RNA for detecting important but subtle changes in cellular metabolism.

[0410] To determine whether defective organelles were responsible for metabolic alterations, we examined gene expression specific to both mitochondrial and peroxisomal function. We observed changes to the mitochondrial and peroxisomal gene signatures in some patients, and correlation analysis suggested these changes were associated with specific cell types. Notably, the peroxisome biogenesis signature was positively associated with signatures for ECs, kidney cells, and proximal tubules. Moreover, as the kidney is highly susceptible to hypoxia [60], we investigated the propensity for hypoxia to contribute to altered metabolism. Although GSVA demonstrated no significant increases in HIF1A expression, HIF1A was associated with the PPP signature in LN GL and with the glycolysis signature in LN TI, suggesting that hypoxia may have specific effects on metabolism and may contribute to some of the metabolic changes observed in the tissues.

[0411] The relationship between the IGS and metabolic signature changes in IFNα-accelerated LN mice indicated that acute type I IFN exposure may not explain the observed metabolic changes. The mouse studies were particularly informative because IFNα exposure was controlled. Although type I IFN stimulation has been shown to increase OXPHOS and FAO in DCs and MCs [36,38], to inhibit isocitrate dehydrogenase (part of the TCA cycle) in macrophages [39], and to alter oxidative metabolism in other cells [61], IFNα overexpression did not change metabolic signatures at early timepoints. However, there was an inverse relationship between the IGS and metabolic signatures after LN onset, when the IGS recurred. These results support the conclusion that downregulation of metabolic pathways is unlikely to be explained by the known actions of type I IFN alone; rather, during LN progression, decreased metabolic signatures may be parallel reflections of continued IGS exposure and inflammation.

[0412] Metabolic gene expression was altered in four murine LN models, and immunosuppressive treatment, which is not known to directly affect cellular metabolism, restored metabolic gene expression. Although combination therapy diminished inflammatory cell abundance in the kidneys of NZB/W mice, we also observed restoration of metabolic and kidney cell gene expression after treatment in models with little inflammatory infiltrate or in which inflammatory cells were not significantly decreased by treatment. This suggests that although inflammatory cells play a critical role in mediating tissue cell damage and metabolic dysfunction, damage is not related only to local inflammatory cells, as intensive therapy with CTX restores tissue cell defects without significant changes to inflammatory cells. Importantly, these results demonstrate that metabolic abnormalities in tissue cells are reversible with immunosuppressive therapy, and restored metabolic gene expression might be considered a goal of effective lupus treatment. Moreover, monitoring tissue metabolism may be especially important when anti-metabolic drugs, such as metformin or 2-deoxy-D-glucose (2DG), are employed. Whereas these drugs may be promising for correcting individual immune cell defects, they may have adverse consequences for tissue cells that are already metabolically deranged.

[0413] Additionally, these studies reveal subtle differences between glomerular and tubulointerstitial pathology in LN. In both kidney regions, we observed decreased resident non-hematopoietic cell signatures and increased monocyte/MC signatures. However, monocyte/MCs in the glomerulus were likely monocyte-derived, whereas those in the tubulointerstitium appeared more like tissue-resident macrophages. Moreover, tubulointerstitial disease in class III/IV LN was characterized by less inflammatory infiltrate than was observed in the glomerulus, and although metabolic signatures were comparably decreased in all classes of glomerular LN, metabolic signature changes in class II tubulointerstitial LN were less consistent. These results align with studies showing that tubulointerstitial damage occurs later in LN and predicts end-stage renal disease [62]. Poorer outcomes in class III/IV LN may be related to persistence of abnormalities or inhibition of repair mechanisms that might contribute to progressive renal disease. Nonetheless, it is noteworthy that even the more modest immunologic damage in class II LN was associated with marked changes in metabolic signatures.

[0414] The detectable changes to immune cell, EC, kidney cell, and metabolic gene signatures in all classes of LN GL are notable. We found that gene expression may detect cellular changes with greater sensitivity than immunohistochemistry, even when little or no inflammatory infiltrate is observed histologically. Gene expression may provide an advance over current classification or diagnostic techniques, as gene expression changes are detectable before discernible immunohistochemical changes, and transcriptomic analysis of metabolism may elucidate functional, rather than merely histopathologic, changes.

[0415] Our study is not without limitations. We performed post-hoc analysis of bulk gene expression in three lupus-affected tissues with limited numbers of lupus patients. Moreover, 37.5% of LN patients were being treated with immunosuppressives [53], which may have affected gene signatures. Additionally, the gene signature we identified as reflecting general kidney cells may be more specific for tubule cells, despite its strong representation in the glomerulus. Furthermore, regression analyses provided an estimate of the cellular variables most associated with each metabolic signature, but accuracy may be limited by sample size, and there is a risk of overfitting. Future work with larger cohorts, which are not currently available, will be necessary to validate these results. Additionally, direct assessment of functional metabolism may be necessary to determine how metabolic changes at the gene expression level reflect changes in protein content and cellular function.

[0416] In conclusion, prominent alterations in cellular metabolism signatures are characteristic of lupus tissue pathology. Systems bioinformatics and regression modeling revealed that the monocyte/MC signature, including both monocyte-derived macrophages and tissue-resident macrophages, was increased in many lupus patients, kidney cell signatures were decreased in LN, the EC signature was increased in LN GL, and these cell signature changes were associated with altered metabolism signatures. Moreover, apoptotic mitochondrial gene changes were associated with MC genes in DLE and LN GL. In murine LN, metabolic dysregulation correlated with tubular damage marker expression, and metabolic gene changes were reverted to normal by immunosuppressive therapy. Altogether, altered metabolism may serve as a promising biomarker or therapeutic target for lupus tissue disease, especially as metabolic gene expression changes precede expression of the renal damage biomarker LCN2 in human LN and coincide with changes to HAVCR1. Indeed, urinalysis has been used to measure metabolite biomarkers of kidney disease [51,52], and metabolism transcripts may be used to estimate the degree of kidney cell damage and to assess treatment efficacy. Although treatment strategies aimed at metabolic restoration are not straightforward, the current findings support the conclusion that immunosuppressive therapy may restore metabolic function and thereby ameliorate damage in specific lupus-affected tissues.

[0417] Human and mouse gene expression datasets were analyzed as follows. Raw data from publicly available human and murine lupus datasets were derived from the Gene Expression Omnibus (GEO) repository.

[0418] GSE72535 comprises microarray analysis of lesional skin biopsies from human patients with DLE without systemic involvement and with a Cutaneous Lupus Erythematosus Disease Area and Severity Index (CLASI) > 2 [63]. Some DLE patients were treated with various therapies, including corticosteroids, immunomodulators, and hydroxychloroquine [63].

[0419] GSE32591 comprises microarray analysis of human renal biopsies originally derived from the European Renal cDNA Bank (ERCB) [53]. LN patients from the ERCB (n = 32) had an average age of 35.1 ± 2.4 years, average proteinuria of 2.9 ± 0.6 g/day, and an eGFR (MDRD) of 63.7 ± 5.4 ml/min/1.73 m². LN patients were treated with various therapies, including steroids and immunosuppressants [53]. Two patients from this group were excluded from our analyses because they had non-inflammatory class V LN.

[0420] GSE86423 comprises samples from the IFNα-accelerated NZB/W LN model. Female NZB/W mice were injected with an adenovirus vector expressing recombinant murine IFNα (5 × 10⁹ particles) at 9 weeks [40]. Kidney gene expression was measured in mice at 0, 1, 2, 3, 4, 5, 7, and 9 weeks post IFNα injection [40].

[0421] GSE32583 and GSE49898 comprise samples from three murine lupus models. Kidney gene expression was measured in NZM2410 mice, including 6-8 week pre-disease mice, 22-30 week diseased mice, and treated mice in remission (Tx + 15w) [41,53]. NZM2410 mice in the Tx + 15w group were treated with adenovirus expressing BAFF-R-Ig at 22 weeks and then sacrificed at 30-35 weeks or 55 weeks [41]. Kidney gene expression in NZB/W mice was measured at 16w, 23w, and 36w, and after treatment [41,53]. Both the 23w and 36w mice shown in the main figures had confirmed proteinuria. Some NZB/W mice with proteinuria (>300 mg/dl) at two timepoints were treated with combination therapy: one dose of 50 mg/kg CTX, six doses of 100 µg CTLA-4-Ig, and one dose of 250 µg anti-CD40L [41]. Mice were determined to be in remission if they achieved proteinuria of <30 mg/dl at two timepoints [41]. One group was sacrificed 3-4 weeks after remission (Tx Rem. + 3-4w), and another was sacrificed >5-14 weeks after remission (Tx Rem. + >5w) [41]. The latter group had confirmed histologic relapse [41]. Kidney gene expression in NZW/BXSB mice was measured at 17w (prenephritic mice) or at 18-21w in mice with confirmed proteinuria [53].

[0422] GSE72410 comprises samples from the IFNα-accelerated NZB/W LN model. NZB/W mice were treated with adenovirus expressing murine IFNα at 14-15w (1.2 × 10⁸ IFNα rAd5-CMV) [64]. Kidney gene expression was measured in 17w naive NZB/W mice (naive), as well as in IFNα-accelerated NZB/W mice: 17w mice 21 days post IFNα injection (IFN W3), and mice treated with vehicle or CTX for four weeks beginning three weeks post IFNα injection (IFN W7 + Veh or IFN W7 + CTX) [64].

[0423] GSE153021 comprises samples from the MRL/lpr LN model. MRL/lpr mice were treated with vehicle, prednisone, mycophenolate mofetil (MMF), FK506, or all three (Multi-target) for eight weeks beginning at week 8; age-matched wildtype MRL/MpJ mice were treated with vehicle [65].

[0424] Quality control and data normalization were performed as follows. Microarray data were processed [66] using free, open-source programs (GEOquery, affy, affycoretools, limma, and simpleaffy). Unnormalized arrays were inspected for visual artifacts or poor RNA hybridization using QC plots. Datasets were annotated using their native chip definition files (CDFs). Probes missing gene annotation data were discarded. Raw data (CEL files) from the Affymetrix platform were background corrected and normalized using the guanine cytosine robust multiarray average (GCRMA) or robust multichip average (RMA) algorithms, whereas raw data files from Illumina chips were read and normalized using neqc (limma R package).
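RMA and GCRMA both include a quantile normalization step that places every array on a common intensity distribution, alongside background correction and median-polish summarization of probe sets (not shown). The following minimal Python sketch illustrates only the quantile normalization step on plain lists. It is not the affy or gcrma implementation, the function name is hypothetical, and ties are simply broken by position, which may differ from how the R packages treat tied values.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list of probe values per array).

    Each value is replaced by the mean, across arrays, of the values sharing
    its rank, so every array ends up with an identical distribution.
    Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered from smallest to largest value in this array.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

For example, `quantile_normalize([[5, 2, 3], [4, 1, 6]])` maps both arrays onto the shared reference values 1.5, 3.5, and 5.5 while preserving each array's internal ranking.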

[0425] RNA-seq data (GSE72410 and GSE153021) were processed from FASTQ files as described by Daamen [67] and as described below. SRA files were downloaded and converted into FASTQ format. Read ends and adapters were trimmed with Trimmomatic (v0.38) using the SLIDINGWINDOW, ILLUMINACLIP, and HEADCROP filters. The reads were head-cropped at 6 bp, and adapters were removed before read alignment. Reads were mapped to the mouse reference genome M17 using STAR, and the .sam files were converted to sorted .bam files using Sambamba. The mouse reference genome was downloaded from GENCODE. Read counts were summarized using the featureCounts function of the Subread package (v1.61). The DESeq2 workflow was used to filter out genes with low expression (i.e., genes with very few reads). The filtered raw counts were normalized using the DESeq method and then log2 transformed.
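The DESeq normalization named above divides each sample's counts by a size factor estimated with the median-of-ratios method: each count is compared with that gene's geometric mean across samples, and the size factor is the median of those ratios. The plain-Python sketch below illustrates that calculation followed by the log2 transform. The low-count cutoff (`min_total=10`) and the pseudocount of 1 added before taking log2 are assumptions, since the source states neither, and the sketch is not the DESeq2 code itself.

```python
import math
from statistics import median


def filter_low_counts(counts, min_total=10):
    """Drop genes whose total count across samples is below min_total.

    min_total=10 is a hypothetical cutoff; the source does not state one.
    """
    return {g: row for g, row in counts.items() if sum(row) >= min_total}


def size_factors(counts):
    """Median-of-ratios size factors; counts maps gene -> list of per-sample counts."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            continue  # geometric mean would be zero; DESeq2 also skips such genes
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize_log2(counts, pseudocount=1.0):
    """Divide by size factors, then log2-transform (pseudocount is an assumption)."""
    sf = size_factors(counts)
    return {g: [math.log2(c / s + pseudocount) for c, s in zip(row, sf)]
            for g, row in counts.items()}
```

If a second sample is sequenced exactly twice as deeply as the first, its size factor comes out twice as large, and the normalized values for the two samples coincide.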

[0426] Principal component analysis was used to inspect the raw data files from each dataset for outliers. All log2-transformed data were formatted into R expression set objects (E-sets).
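The source states only that principal component analysis was used to inspect each dataset for outliers. One hedged way to make that inspection concrete is to project samples onto the first few principal components of the centered log2 expression matrix and rank them by distance from the centroid. Any cutoff for calling a sample an outlier would be a judgment call and is not shown. The sketch assumes a samples × genes numpy array, and the function names are hypothetical.

```python
import numpy as np


def pc_scores(expr, n_components=2):
    """Scores of each sample on the first principal components.

    expr: array of shape (n_samples, n_genes) of log2 expression values.
    """
    centered = expr - expr.mean(axis=0)  # center each gene across samples
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def pc_distances(expr, n_components=2):
    """Distance of each sample from the centroid in PC space.

    The scores are already centered, so the centroid is the origin.
    """
    return np.linalg.norm(pc_scores(expr, n_components), axis=1)
```

Samples with the largest distances would then be examined on a PCA plot before deciding whether to exclude them.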

[0427] DEG analysis was performed as follows. For human dataset DEG analysis, Affymetrix probes were additionally annotated with custom BrainArray (BA) chip definition files (CDFs) [66,68]. Any probes with different Affymetrix and BA gene annotations were excluded. GCRMA-normalized expression values were variance corrected using local empirical Bayesian shrinkage before calculation of DEGs using the ebayes function in the BioConductor LIMMA package. P-values were adjusted for multiple hypothesis testing using the Benjamini-Hochberg correction, yielding a false discovery rate (FDR) for each probe. Significant Affymetrix and BA probes within each study were merged and filtered to retain probes with a pre-set FDR < 0.2, which were considered statistically significant. This relatively permissive FDR was employed to avoid falsely excluding genes of interest. This list was further filtered to retain only the most significant probe per gene in order to remove duplicate genes.
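The Benjamini-Hochberg adjustment and the two filtering steps described above can be sketched in plain Python. `benjamini_hochberg` follows the step-up procedure of R's `p.adjust(method = "BH")`, and `significant_best_probes` then applies the FDR < 0.2 cutoff and keeps the single most significant probe per gene. The (probe, gene, p-value) tuple format and both function names are assumptions for illustration, and the sketch ignores the separate merging of Affymetrix and BrainArray annotations.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (FDR), same step-up rule as R's p.adjust(method="BH")."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank_from_top, idx in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_best_probes(probes, fdr_cut=0.2):
    """probes: list of (probe_id, gene, p_value) tuples.

    Keeps probes with FDR < fdr_cut, then the lowest-FDR probe for each gene.
    Returns a dict gene -> (probe_id, fdr).
    """
    fdr = benjamini_hochberg([p for _, _, p in probes])
    best = {}
    for (probe_id, gene, _), q in zip(probes, fdr):
        if q >= fdr_cut:
            continue
        if gene not in best or q < best[gene][1]:
            best[gene] = (probe_id, q)
    return best
```

For example, raw p-values of 0.01, 0.04, 0.03, and 0.2 adjust to 0.04, about 0.053, about 0.053, and 0.2, so only the first three would pass an FDR < 0.2 cutoff.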

[0428] Network analysis and visualization were performed as follows. Cytoscape (v3.8.0) [69] with the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) (v1.5.1) and ClusterMaker2 (v1.3.1) [70] plugins was used to create and visualize protein-protein interactions among the 883 common DEGs. Clusters were generated with the Molecular Complex Detection (MCODE) clustering algorithm within ClusterMaker2, with a node score cutoff of 0.2, a k-core of 2, and a maximum depth of 100.

[0429] Functional enrichment analysis was performed as follows. Functional enrichment of Cytoscape-derived MCODE clusters was performed using BIG-C, a clustering tool developed to categorize the biologic function of large lists of genes [66]. The top three significant BIG-C categories (p < 0.05, OR > 1) were reported.
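Overlap p-values and ORs are described elsewhere in the Methods as two-sided Fisher's exact tests. For a BIG-C category, the 2 × 2 table is assumed here to be [[genes in cluster and category, genes in cluster only], [genes in category only, remaining background genes]]; this layout and the function names are assumptions for illustration. The two-sided p-value sums the hypergeometric probabilities of every table with the same margins that is no more probable than the observed table, using the small relative tolerance that R's `fisher.test` applies. Note that `fisher.test` reports a conditional maximum-likelihood odds ratio, whereas this sketch returns the simpler sample odds ratio ad/bc, so the two can differ.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (c + d)), min(row1, col1)  # feasible values of the top-left cell
    denom = comb(n, col1)

    # Hypergeometric probability of each feasible table with these margins.
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    p = sum(v for v in probs.values() if v <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def sample_odds_ratio(a, b, c, d):
    """Unconditional sample odds ratio ad/bc (infinite when b*c == 0)."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)
```

As a check, the classic table [[3, 1], [1, 3]] gives a two-sided p-value of 34/70 (about 0.486) and a sample odds ratio of 9.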

[0430] Gene Set Variation Analysis (GSVA) was performed as follows. GSVA [22] for R/Bioconductor was used as a non-parametric, unsupervised method for estimating the variation of pre-defined gene sets across dataset samples. For each dataset, only one CDF (Affymetrix or Illumina) was used for each probe. For genes with multiple Affymetrix probe identifiers, only the probe with the highest inter-quartile range (IQR) of expression [71] was retained. Genes with IQR = 0 were removed. GSVA enrichment scores were calculated non-parametrically using a Kolmogorov-Smirnov (KS)-like rank statistic [22]; a negative score for a particular sample and gene set indicates that the gene set has lower expression in that sample than in samples with a positive score. The same GSVA probes used in calculations for class III/IV LN GL and TI were specified for calculations in all classes of LN.
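The probe-selection rule above (for each gene, keep the probe with the largest IQR, and drop genes whose IQR is zero) can be sketched as follows. `statistics.quantiles` with `method="inclusive"` uses the same linear interpolation as R's default `quantile()` (type 7), so these IQRs should match R's `IQR()`. The function names and dict-based inputs are hypothetical.

```python
from statistics import quantiles


def iqr(values):
    """Inter-quartile range; 'inclusive' matches R's default quantile type 7."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def select_probes_by_iqr(probe_expr, probe_to_gene):
    """Keep the highest-IQR probe per gene; genes whose best IQR is 0 are dropped.

    probe_expr: dict probe_id -> expression values across samples.
    probe_to_gene: dict probe_id -> gene symbol.
    Returns a dict gene -> selected probe_id.
    """
    best = {}
    for probe, values in probe_expr.items():
        spread = iqr(values)
        if spread == 0:
            continue  # a zero-IQR probe carries no between-sample variation
        gene = probe_to_gene[probe]
        if gene not in best or spread > best[gene][1]:
            best[gene] = (probe, spread)
    return {gene: probe for gene, (probe, _) in best.items()}
```

Skipping zero-IQR probes before choosing the maximum is equivalent to choosing first and then removing genes with IQR = 0, because a gene's best IQR is zero only when all of its probes are constant.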

[0431] GSVA gene sets were defined as follows. Metabolism gene sets were created from literature mining. Hematopoietic gene sets were derived from Immune/Inflammatory-Scope (I-Scope), a tool developed to identify immune cell-specific genes in big data [72]. Non-hematopoietic cellular gene sets were derived from T-Scope [72] or literature mining. Many gene sets in T-Scope were derived from The Human Protein Atlas (www.proteinatlas.org) [73,74]. The keratinocyte signature was derived from the keratinocyte-specific genes of Gazel et al. [75]. Kidney-specific lists generated from the Human Protein Atlas and single-cell data were additionally modified to incorporate genes found by both transcriptomics and immunohistochemistry [76]. The mesangial cell signature was derived from PanglaoDB [77]. In all tissues, the hematopoietic, EC, and fibroblast gene sets were evaluated. For non-hematopoietic gene signatures, those relevant to each tissue were reported (DLE: keratinocyte, melanocyte; LN GL: kidney cell, podocyte, mesangial cell; LN TI: kidney cell, proximal tubule, Loop of Henle (LoH) cell, distal tubule, collecting duct cell). Mitochondrial and peroxisomal gene sets were derived from literature mining and BIG-C and were also compared to signatures in the MSigDB database. The Apoptotic Mitochondrial Changes signature (M7482) was accessed on the MSigDB database [78,79] and is derived from GO:0008637. The GO Mitochondrial Fission (M12786) and GOBP Peroxisome Fission (M22828) signatures were accessed on the MSigDB database and modified slightly. The IGS is the type I IFN core signature [35].

[0432] All human metabolism, non-hematopoietic cell, and IFN gene sets were converted to murine gene sets using the human2mouse function of the homologene R package. Genes that were not converted programmatically were converted manually using GeneCards and Mouse Genome Informatics orthologs. Murine hematopoietic cell gene sets were curated by literature mining. For murine datasets with expression of two or more anergic/activated T cell markers, the anergic/activated T cell signature was combined with the T cell signature for GSVA analysis.

[0433] Although the same gene sets were input for each category in each tissue (Table 2), the genes used in calculation of the GSVA enrichment score for each tissue differ slightly based upon the gene measurement platform and expression within that sample. All reported GSVA enrichment scores, except for the HIF1A gene signature, were calculated based upon a minimum of three genes. The HIF1A gene signature includes only HIF1A because no additional genes were determined to be specific to hypoxia. Although two genes were well co-expressed, the DC signature in LN TI did not meet the minimum three-gene requirement for GSVA, nor did the LDG signature in LN GL and LN TI.

[0434] Hierarchical clustering was performed as follows. Human lupus tissue samples were hierarchically clustered by the Euclidean distance of their GSVA enrichment scores into two (k = 2) or four (k = 4) clusters using the heatmap.2 function in R.
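By default, heatmap.2 clusters with R's `dist()` and `hclust()`, that is, Euclidean distance and complete linkage, assuming those defaults were not overridden. The sketch below re-creates that procedure in plain Python. It repeatedly merges the two closest clusters (complete linkage: the distance between their farthest members) until k clusters remain, which for complete linkage corresponds to cutting the dendrogram into k groups, apart from tie handling. The function name and the dictionary input format are hypothetical.

```python
import math
from itertools import combinations


def complete_linkage_clusters(samples, k):
    """Agglomerative clustering (Euclidean distance, complete linkage) into k groups.

    samples: dict sample_id -> list of GSVA scores in a fixed gene-set order.
    Returns a list of clusters, each a list of sample ids.
    """
    clusters = [[sid] for sid in samples]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(samples[a], samples[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        # combinations() yields i < j, so deleting index j leaves index i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

This brute-force version recomputes linkages at every step, which is adequate for a few dozen samples; hclust uses a more efficient algorithm.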

[0435] Regression models were analyzed as follows. For all linear models, GSVA scores for cellular signatures in all tissue samples were input as independent variables, and the pathway GSVA score (metabolism signature, mitochondrial signature, or peroxisomal signature) was input as the dependent variable. As GSVA scales the expression of a signature from -1 to 1, the value for each input cellular or metabolic signature in each sample is relative to the same signature in other samples and to the other signatures in the same sample. To ensure that collinearity between immune cell signatures did not confound results of the stepwise regression analyses in DLE, we combined the pDC, skin-specific DC, monocyte/MC, T cell, anergic/activated T cell, B cell, and plasma cell signatures into a single “inflammatory cell” signature, because the genes were highly co-expressed. The list of all genes used as the “inflammatory cell” signature may be found in Table 2. This reduced the number of input variables for the DLE stepwise analysis to ten cell signatures.
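The regression workflow (R's lm followed by stepAIC, with a VIF < 10 check, as stated in the statistics paragraph) can be approximated in Python. The sketch below is a simplified, hypothetical illustration using numpy. It scores each ordinary least-squares model with the AIC that R's `extractAIC()` uses for linear models, n·log(RSS/n) + 2p, where p counts the intercept and slopes, and it performs pure backward elimination. stepAIC's default direction ("both") can also re-add dropped terms, so results may differ. The VIF of each predictor is computed as 1/(1 - R²) from regressing that predictor on the others.

```python
import numpy as np


def ols_rss(X, y):
    """Residual sum of squares of y regressed on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def lm_aic(X, y):
    """AIC as R's extractAIC() computes it for lm: n*log(RSS/n) + 2*p."""
    n = len(y)
    p = X.shape[1] + 1  # slopes plus intercept
    return n * np.log(ols_rss(X, y) / n) + 2 * p


def backward_stepwise(X, y, names):
    """Drop one predictor at a time while doing so lowers the AIC."""
    keep = list(range(X.shape[1]))
    current = lm_aic(X[:, np.array(keep, dtype=int)], y)
    while keep:
        trials = []
        for drop in keep:
            cols = np.array([c for c in keep if c != drop], dtype=int)
            trials.append((lm_aic(X[:, cols], y), drop))
        best_aic, best_drop = min(trials)
        if best_aic >= current:
            break  # no single deletion improves the model
        keep.remove(best_drop)
        current = best_aic
    return [names[c] for c in keep]


def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2_j) = TSS_j / RSS_j."""
    out = []
    for j in range(X.shape[1]):
        xj = X[:, j]
        others = np.delete(X, j, axis=1)
        tss = float(((xj - xj.mean()) ** 2).sum())
        out.append(tss / ols_rss(others, xj))
    return out
```

In the setting described above, the columns of `X` would be the cellular-signature GSVA scores and `y` a single pathway signature.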

[0436] Visualization was performed as follows. Final figures were generated in GraphPad Prism or Adobe Illustrator.

[0437] Overlap p-values and ORs for functional enrichment of DEGs were calculated in R using a two-sided fisher.test with confidence level = 0.95. Because of the known heterogeneity of lupus patients, the number of lupus patients in each tissue who fell above or below the control mean ± 1 standard deviation (SD) was then reported in order to determine whether individual patients exhibited an increased or decreased signature when the population did not achieve statistical significance. The mean and SD of the control samples for each GSVA score in each tissue were calculated in Microsoft Excel. All analyses in GraphPad Prism were carried out with version 8.2.0 (435) or later versions. Control and lupus sample populations of GSVA scores for each gene set were assessed for normality using the D'Agostino-Pearson test in GraphPad Prism, and the distributions for 75% or more of the gene sets in each population were determined to be normal. Welch’s t-test with Bonferroni correction for GSVA enrichment in human samples was performed in GraphPad Prism. Bonferroni correction was performed separately for metabolic signatures, immune cell signatures, non-hematopoietic signatures, and mitochondrial/peroxisomal signatures. The Mann-Whitney U test for GSVA enrichment in murine samples was performed in GraphPad Prism. Univariate (simple) linear regression and Pearson correlation analyses were carried out in GraphPad Prism. Hedges’ g effect sizes were calculated in R using the cohen.d function with Hedges’ correction from the “effsize” package. Stepwise regression was performed in R using the lm function followed by the stepAIC function. A variance inflation factor (VIF) < 10 was confirmed for each independent variable. Although some data points were determined to be influential to the stepwise equation using Cook’s D, no samples were removed from the models, in an effort to capture the heterogeneity present in lupus.
CART analysis was performed in R using the rpart function with the “anova” method. Each resulting decision tree was pruned once except for the glycolysis and PPP signatures in LN TI. GSVA scores for all individual samples (patient or control) are presented as individual data points in either dot or violin plots. The number of samples for each group may be found in the figure legends. Information regarding the statistical comparisons made and level of significance is mentioned in the figure legends.
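Several of the statistics above can be sketched in Python under stated assumptions. The functions below compute (i) Hedges' g as Cohen's d with a pooled SD, multiplied by the common small-sample correction 1 - 3/(4(n1 + n2) - 9), which may differ in later decimals from the correction the effsize package applies; (ii) the number of lupus samples below or above the control mean ± 1 SD, assuming the sample SD was used; (iii) a Bonferroni adjustment within one signature family; and (iv) one level of the rpart "anova" splitting rule, i.e., the threshold on one predictor that most reduces the within-node sum of squares. rpart additionally enforces minimum node sizes, evaluates every predictor, grows the tree recursively, and supports cost-complexity pruning, none of which is shown. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def hedges_g(lupus, control):
    """Cohen's d (pooled SD) times the approximate Hedges' small-sample correction."""
    n1, n2 = len(lupus), len(control)
    pooled_sd = sqrt(((n1 - 1) * stdev(lupus) ** 2 + (n2 - 1) * stdev(control) ** 2)
                     / (n1 + n2 - 2))
    d = (mean(lupus) - mean(control)) / pooled_sd
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


def count_outside_control_band(lupus, control):
    """(number below, number above) the control mean -/+ 1 sample SD."""
    m, sd = mean(control), stdev(control)
    below = sum(x < m - sd for x in lupus)
    above = sum(x > m + sd for x in lupus)
    return below, above


def bonferroni(pvalues):
    """Bonferroni-adjusted p-values within one family of signatures."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def best_anova_split(x, y):
    """Threshold on x maximizing the drop in sum of squared deviations of y.

    Returns (threshold, sse_reduction), or None if x has no distinct values.
    """
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    pairs = sorted(zip(x, y))
    total = sse([v for _, v in pairs])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        gain = total - sse(left) - sse(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint, as rpart reports
        if best is None or gain > best[1]:
            best = (threshold, gain)
    return best
```

For example, with controls [0, 1, 2] and lupus samples [2, 3, 4], d equals 2 and the corrected g equals 1.6.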

[0438] Data availability: All microarray datasets in this publication are available on NCBI’s GEO database. All bioinformatic software used in this publication is open source and freely available for R. Example code used here (LIMMA, GSVA, stepwise regression, and CART) is available at figshare, www.figshare.com, as “AMPEL BioSolutions LIMMA Differential Expression Analysis Code,” “AMPEL BioSolutions GSVA AFFY nonzeroIQR Code,” and “AMPEL BioSolutions Stepwise and CART Code,” respectively.

Figure Legends

[0439] FIGs. 1A-1I show that dysregulation of metabolic gene signatures is common among lupus-affected tissues. FIG. 1A: Comparison of DEGs among DLE, class III/IV LN GL, and class III/IV LN TI. FIG. 1B: MCODE protein-protein interactions of common UP and DOWN DEGs were generated with Cytoscape using the STRING and ClusterMaker2 plugins and annotated with BIG-C functional categories (odds ratio (OR) > 1, p < 0.05) in Adobe Illustrator. Overlap p-value was calculated using Fisher’s exact test. GSVA of signatures for glycolysis (FIG. 1C), the PPP (FIG. 1D), the TCA cycle (FIG. 1E), OXPHOS (FIG. 1F), FAAO (FIG. 1G), FABO (FIG. 1H), and AA metabolism (FIG. 1I) in lupus tissues and controls (CTLs). Each point represents an individual sample. Significant differences in enrichment of the metabolic signatures in each lupus tissue as compared to CTL were determined by Welch’s t-test with Bonferroni correction. Numbers below each tissue indicate the number of lupus patients with enrichment scores 1 SD less than (< 1SD) or greater than (> 1SD) the CTL mean. For all calculations, the following sample numbers were used: DLE [CTL (n = 8), DLE (n = 9)], LN GL [CTL (n = 14), Cl III/IV (n = 22)], and LN TI [CTL (n = 15), Cl III/IV (n = 22)]. **, p < 0.01; ***, p<0.001; ****, p<0.0001.

[0440] FIGs. 2A-2C show that increased myeloid cell signatures and decreased non-hematopoietic cell signatures characterize the majority of lupus patients. FIG. 2A: Hedges’ g effect sizes of immune and non-hematopoietic cell signatures in DLE, class III/IV LN GL, and class III/IV LN TI as compared to tissue CTLs. Significant p-values reflect significant differences in enrichment of the immune cell signatures or non-hematopoietic cell signatures in lupus tissues as compared to CTL as determined by Welch’s t-test with Bonferroni correction (FIG. 9). FIG. 2B: R² values derived from linear regression of the monocyte-derived macrophage or the tissue-resident macrophage markers with the monocyte/MC GSVA scores in individual patients and CTLs from lupus-affected tissues (FIGs. 11A-11B). Significant p-values reflect significantly non-zero slopes. FIG. 2C: Pearson correlation coefficients between tissue-resident macrophage markers in LN. For all calculations, the following sample numbers were used: DLE [CTL (n = 8), DLE (n = 9)], LN GL [CTL (n = 14), Cl III/IV (n = 22)], and LN TI [CTL (n = 15), Cl III/IV (n = 22)]. *, p < 0.05; **, p < 0.01; ***, p<0.001; ****, p<0.0001.

[0441] FIGs. 3A-3U show that metabolic and cellular signature changes in class II LN GL are similar to those seen in class III/IV. GSVA of metabolic pathway signatures (FIGs. 3A-3G) and cell signatures (FIGs. 3H-3T) in all classes of LN GL. Each point represents an individual sample. Significant differences in enrichment of the metabolic signatures, immune cell signatures, or non-hematopoietic cell signatures between class II LN GL and CTL, class III/IV LN GL and CTL, and class II LN GL and class III/IV LN GL were assessed by Welch’s t-test with Bonferroni correction. FIG. 3U: Hierarchical clustering (k = 4) of all glomerular samples. For all calculations, the following sample numbers were used: LN GL [CTL (n = 14), Cl II (n = 8), Cl III/IV (n = 22)]. **, p < 0.01; ***, p<0.001; ****, p<0.0001.

[0442] FIGs. 4A-4H show that metabolic gene expression changes in LN GL are associated with changes in the EC, kidney cell, and fibroblast gene signatures. Stepwise regression coefficients (FIG. 4A) and CART analysis (FIGs. 4B-4H) for metabolic pathway signatures in all glomerular LN samples and CTLs. Values in the final CART leaves for each process represent the average GSVA score of the samples assigned to that leaf. Each resulting CART decision tree was pruned once. For all calculations, the following sample numbers were used: LN GL [CTL (n = 14), Cl II (n = 8), Cl III/IV (n = 22)]. Significant p-values in FIG. 4A reflect significant coefficients in the stepwise regression model. *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.
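
The statement that each CART leaf holds the average GSVA score of its samples follows from how a regression tree chooses splits: it minimizes the summed squared error, whose optimal leaf value is the mean. The sketch below shows a single split (a "stump") in plain Python; the signature names are hypothetical and pruning is not shown. In scikit-learn, `DecisionTreeRegressor(ccp_alpha=...)` offers cost-complexity pruning, though whether that corresponds to "pruned once" here is an assumption.

```python
from statistics import mean


def _sse(values):
    """Sum of squared errors around the mean (CART regression impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_stump(predictors, response):
    """One CART regression split: test every predictor and midpoint threshold,
    keep the split with the lowest total SSE. Leaf values are leaf means."""
    best = None
    for name, x in predictors.items():
        pairs = sorted(zip(x, response))
        for i in range(1, len(pairs)):
            if pairs[i - 1][0] == pairs[i][0]:
                continue  # no threshold separates tied predictor values
            left = [y for _, y in pairs[:i]]
            right = [y for _, y in pairs[i:]]
            sse = _sse(left) + _sse(right)
            if best is None or sse < best["sse"]:
                best = {"predictor": name,
                        "threshold": (pairs[i - 1][0] + pairs[i][0]) / 2,
                        "left_mean": mean(left), "right_mean": mean(right),
                        "sse": sse}
    return best


# Hypothetical cell-signature scores and a toy TCA-cycle GSVA response
cells = {"kidney_cell": [1, 2, 3, 10, 11, 12], "fibroblast": [5, 1, 4, 2, 6, 3]}
tca = [0.5, 0.5, 0.5, -0.5, -0.5, -0.5]
print(best_stump(cells, tca))
```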

[0443] FIGs. 5A-5O show that mitochondrial and peroxisomal signature changes and local hypoxia contribute to changes in metabolic gene expression in specific cells. GSVA of mitochondria-related (FIGs. 5A-5F) and peroxisome-related (FIGs. 5G-5H) gene signatures in lupus tissues and CTLs. Each point represents an individual sample. FIGs. 5I-5K: Stepwise regression coefficients for mitochondrial and peroxisomal signatures in all tissues and CTLs. FIG. 5L: GSVA of HIF1A in lupus tissues and CTLs. Each point represents an individual sample. FIGs. 5M-5O: Stepwise regression coefficients for metabolic pathway signatures with the addition of HIF1A in all tissues and CTLs. Significant differences in enrichment of the mitochondrial/peroxisomal signatures or HIF1A between DLE and CTL, Cl II LN GL and CTL, Cl III/IV LN GL and CTL, Cl II LN TI and CTL, and Cl III/IV LN TI and CTL were determined by Welch’s t-test with Bonferroni correction. Numbers below each tissue indicate the number of lupus patients with enrichment scores 1 SD less than (< 1 SD) or greater than (> 1 SD) the CTL mean. For all calculations, the following sample numbers were used: DLE [CTL (n = 8), DLE (n = 9)], LN GL [CTL (n = 14), Cl II (n = 8), Cl III/IV (n = 22)], and LN TI [CTL (n = 15), Cl II (n = 8), Cl III/IV (n = 22)]. Significant p-values in FIGs. 5I-5K and FIGs. 5M-5O reflect significant coefficients in the stepwise regression model. *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.
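
Stepwise regression, as used for FIGs. 5I-5K and 5M-5O, greedily builds a linear model from candidate predictor signatures. The sketch below is plain forward selection by AIC with NumPy least squares; it is a swapped-in, simplified technique and not the robust stepwise procedure of Ref. 27. The signature names and data are hypothetical. Per-coefficient p-values, as reported in the figures, would need an additional step (for example an OLS fit in statsmodels).

```python
import numpy as np


def _fit(X, y):
    """Ordinary least squares; returns (AIC up to a constant, coefficients)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k, beta


def forward_stepwise(predictors, y):
    """Add the predictor that lowers AIC the most; stop when none does."""
    y = np.asarray(y, dtype=float)
    X = np.ones((len(y), 1))  # start from the intercept-only model
    current_aic, _ = _fit(X, y)
    selected, remaining = [], list(predictors)
    while remaining:
        trials = [(_fit(np.column_stack([X, predictors[name]]), y)[0], name)
                  for name in remaining]
        aic, name = min(trials)
        if aic >= current_aic:
            break  # no candidate improves the model
        current_aic = aic
        selected.append(name)
        remaining.remove(name)
        X = np.column_stack([X, predictors[name]])
    _, beta = _fit(X, y)
    return dict(zip(["intercept"] + selected, beta.tolist()))


# Hypothetical predictors; the toy response depends only on "endothelial"
rng = np.random.default_rng(0)
signatures = {"endothelial": rng.normal(size=40), "fibroblast": rng.normal(size=40)}
glycolysis = 1.0 + 2.0 * signatures["endothelial"] + 0.1 * rng.normal(size=40)
print(forward_stepwise(signatures, glycolysis))
```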

[0444] FIGs. 6A-6H show that metabolic gene expression changes occur independently of acute IFN stimulation in murine LN. FIG. 6A: GSVA of the IGS in the kidneys of IFNα-accelerated NZB/W mice (GSE86423). FIGs. 6B-6H: GSVA of metabolic signatures and linear regression between the IGS and metabolic signature GSVA scores. Each point represents an individual mouse. For each signature, significant differences in enrichment at each timepoint compared to baseline were evaluated with the Mann-Whitney U test. For all calculations, the following sample numbers were used: Baseline (n = 3), W1 (n = 5), W2 (n = 5), W3 (n = 5), W5 (n = 5), W7 (n = 5), and W9 (n = 5). Significant p-values in the regression plots in FIGs. 6B-6H reflect significantly non-zero slopes. *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.
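
Comparing each timepoint with baseline by the Mann-Whitney U test can be sketched as below. The group labels mirror the legend, but the scores are toy values, and the two-sided alternative is an assumption since the legend does not state sidedness.

```python
from scipy.stats import mannwhitneyu


def compare_to_baseline(scores_by_group, baseline="Baseline"):
    """Two-sided Mann-Whitney U test of every group against the baseline group.
    Returns {group: p-value}."""
    base = scores_by_group[baseline]
    return {
        group: float(mannwhitneyu(values, base, alternative="two-sided").pvalue)
        for group, values in scores_by_group.items()
        if group != baseline
    }


# Toy IGS GSVA scores (illustrative only): Baseline n = 3, timepoints n = 5
igs = {"Baseline": [0.0, 0.1, 0.2],
       "W1": [0.05, 0.15, -0.05, 0.12, 0.08],
       "W3": [1.0, 1.1, 1.2, 1.3, 1.4]}
print(compare_to_baseline(igs))
```

With n = 3 versus n = 5 and no ties, the exact two-sided test cannot return a p-value below 2/C(8,3) = 2/56 ≈ 0.036, even when the groups do not overlap at all, so small group sizes bound the attainable significance of any single comparison.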

[0445] FIGs. 7A-7E show that metabolic gene expression changes in murine LN are corrected with immunosuppressive treatment. GSVA of metabolism signatures in the kidney of NZM2410 (GSE32583, GSE49898) (FIG. 7A), NZB/W (GSE32583, GSE49898) (FIG. 7B), IFNα-accelerated NZB/W (GSE72410) (FIG. 7C), MRL/lpr (GSE153021) (FIG. 7D), and NZW/BXSB (GSE32583, GSE49898) mice (FIG. 7E) with and without treatment. Each point represents an individual mouse. For each signature, significant differences in enrichment at each timepoint compared to baseline and each treatment timepoint compared to disease were evaluated with the Mann-Whitney U test. For all calculations the following sample numbers were used: NZM2410 [6w (n = 5), 21-30w (n = 5), Tx+15w (n = 6)], NZB/W [16w (n = 8), 23w (n = 6), 36w (n = 10), Tx Remission +3-4w (n = 8), Tx Remission + >5w (n = 6)], IFN-accelerated NZB/W [Naive (n = 5), IFN W3 (n = 5), IFN W7 + Veh (n = 5), IFN W7 + CTX (n = 5)], MRL/lpr [Wildtype (n = 3), Vehicle (n = 3), Prednisone (n = 3), MMF (n = 3), FK506 (n = 3), Multi-target (n = 3)], and NZW/BXSB [17w (n = 6), 18-21w +P (n = 6)]. *, p < 0.05; **, p < 0.01; ***, p<0.001; ****, p<0.0001.

[0446] FIGs. 8A-8F show that cellular and metabolic gene expression changes correlate with expression of genes indicating tubular damage in human and murine LN. Log2 expression of HAVCR1/Havcr1 (FIG. 8A) and LCN2/Lcn2 (FIG. 8B) in human LN TI and in the kidneys of NZM2410 (GSE32583, GSE49898), NZB/W (GSE32583, GSE49898), IFNα-accelerated NZB/W (GSE86423), IFNα-accelerated NZB/W (GSE72410), and MRL/lpr (GSE153021) mice. Each point represents an individual human or mouse. Significant differences in gene expression at each timepoint compared to disease were evaluated in the human TI with Welch’s t-test and in murine models with the Mann-Whitney U test. Linear regression between HAVCR1/Havcr1 and LCN2/Lcn2 expression and kidney cell, proximal tubule, and TCA cycle GSVA scores in all samples for human (FIGs. 8C-8E) and murine (FIG. 8F) LN. For all calculations, the following sample numbers were used: LN TI [CTL (n = 15), Cl II (n = 8), Cl III/IV (n = 22)], NZM2410 [6w (n = 5), 21-30w (n = 5), Tx+15w (n = 6)], NZB/W [16w (n = 8), 23w (n = 6), 36w (n = 10), Tx Remission +3-4w (n = 8), Tx Remission + >5w (n = 6)], IFN-accelerated NZB/W (GSE86423) [Baseline (n = 3), W1 (n = 5), W2 (n = 5), W3 (n = 5), W5 (n = 5), W7 (n = 5), and W9 (n = 5)], IFN-accelerated NZB/W (GSE72410) [Naive (n = 5), IFN W3 (n = 5), IFN W7 + Veh (n = 5), IFN W7 + CTX (n = 5)], and MRL/lpr [Wildtype (n = 3), Vehicle (n = 3), Prednisone (n = 3), MMF (n = 3), FK506 (n = 3), Multi-target (n = 3)]. Significant p-values in FIGs. 8C-8F reflect significantly non-zero slopes determined by linear regression or correlation analysis. *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.

[0447] FIGs. 9A-9O show that increased myeloid cell signatures and decreased non-hematopoietic cell signatures characterize the majority of lupus patients. GSVA of signatures for granulocytes (FIG. 9A), pDCs (FIG. 9B), dendritic cells (FIG. 9C), monocyte/MCs (FIG. 9D), T cells (FIG. 9E), B cells (FIG. 9F), plasma cells (FIG. 9G), platelets (FIG. 9H), immune cells with expression found only in DLE (FIG. 9I), endothelial cells (FIG. 9J), fibroblasts (FIG. 9K), skin cells (FIG. 9L), kidney cells (FIG. 9M), glomerular cells (FIG. 9N), and tubule cells (FIG. 9O) in lupus tissues and CTLs. Each point represents an individual sample. Significant differences in enrichment of the immune cell signatures or non-hematopoietic cell signatures in each lupus tissue as compared to CTL were determined by Welch’s t-test with Bonferroni correction. Numbers below each tissue indicate the number of lupus patients with enrichment scores 1 SD less than (< 1 SD) or greater than (> 1 SD) the CTL mean. For all calculations, the following sample numbers were used: DLE [CTL (n = 8), DLE (n = 9)], LN GL [CTL (n = 14), Cl III/IV (n = 22)], and LN TI [CTL (n = 15), Cl III/IV (n = 22)]. **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.

[0448] FIG. 10 shows that expression of anergic/activated T cell marker genes is unchanged in class III/IV LN. Log2 expression of CD160, CD244, CTLA4, ICOS, KLRG1, LAG3, and PDCD1 in lupus tissues and CTLs. Each point represents an individual sample. Significant differences in expression of each anergic/activated T cell marker gene in each lupus tissue as compared to CTL were determined by Welch’s t-test. Values beneath the x-axis denote the interquartile range (IQR) of expression, calculated in R. For all calculations, the following sample numbers were used: LN GL [CTL (n = 14), Cl III/IV (n = 22)] and LN TI [CTL (n = 15), Cl III/IV (n = 22)]. *, p < 0.05.
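
The IQR values in FIG. 10 were calculated in R. If the same summary is reproduced in Python, NumPy’s default percentile interpolation (`"linear"`) corresponds to R’s default quantile type 7, which R’s `IQR()` also uses by default, so the two should agree. The input below is a toy vector, not study data.

```python
import numpy as np


def iqr(values):
    """Interquartile range Q3 - Q1 using NumPy's default linear interpolation,
    which matches R's default quantile type 7."""
    q1, q3 = np.percentile(np.asarray(values, dtype=float), [25, 75])
    return float(q3 - q1)


print(iqr([1, 2, 3, 4]))  # Q1 = 1.75, Q3 = 3.25, so the IQR is 1.5
```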

[0449] FIGs. 11A-11B show that monocyte/MC gene signatures reflect both monocyte-derived macrophage and tissue-resident macrophage populations. Linear regression between the monocyte/MC GSVA score and FCN1 expression (FIG. 11A) or TRM marker expression (FIG. 11B) in lupus-affected tissues. Each point represents an individual sample. For all calculations, the following sample numbers were used: DLE [CTL (n = 8), DLE (n = 9)], LN GL [CTL (n = 14), Cl III/IV (n = 22)], and LN TI [CTL (n = 15), Cl III/IV (n = 22)]. Significant p-values reflect significantly non-zero slopes.
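
The regressions in FIGs. 11A-11B (and the R² values summarized in FIG. 2B) can be sketched with `scipy.stats.linregress`, whose `pvalue` tests the null hypothesis that the slope is zero, matching "significantly non-zero slopes". The x and y values below are toy numbers standing in for, e.g., FCN1 log2 expression and monocyte/MC GSVA scores.

```python
from scipy.stats import linregress


def regression_summary(x, y):
    """Least-squares fit of y on x: slope, R^2, and the p-value for slope = 0."""
    fit = linregress(x, y)
    return {"slope": fit.slope, "r_squared": fit.rvalue ** 2, "p_slope": fit.pvalue}


summary = regression_summary([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(summary)
```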

[0450] FIG. 12 shows that metabolic and cellular gene expression changes in class II LN GL are similar to those seen in class III/IV. Hierarchical clustering (k = 4) of class II LN GL samples (n = 8). Red stars represent patients with a GSVA score greater than the control mean + 1 SD. Black stars represent patients with a GSVA score less than the control mean - 1 SD.
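
The red and black stars, like the "< 1 SD" and "> 1 SD" counts in other legends, flag patients relative to the control distribution. A minimal sketch; whether the original analysis used the sample SD (n - 1) or the population SD is an assumption, and the scores are toy values.

```python
from statistics import mean, stdev


def flag_vs_control(patient_scores, control_scores):
    """Return indices of patients above control mean + 1 SD and below
    control mean - 1 SD. Uses the sample SD (an assumption)."""
    mu, sd = mean(control_scores), stdev(control_scores)
    above = [i for i, s in enumerate(patient_scores) if s > mu + sd]  # red star
    below = [i for i, s in enumerate(patient_scores) if s < mu - sd]  # black star
    return above, below


above, below = flag_vs_control([2.0, 0.5, -1.5, -0.2], [-1.0, 0.0, 1.0])
print(above, below)  # [0] [2]
```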

[0451] FIGs. 13A-13U show that metabolic and cellular gene expression changes in class II LN TI are less robust than those seen in class III/IV. GSVA of metabolic pathway signatures (FIGs. 13A-13G) and cell signatures (FIGs. 13H-13T) in all classes of LN TI. Each point represents an individual sample. Significant differences in enrichment of the metabolic signatures, immune cell signatures, or non-hematopoietic cell signatures between class II LN TI and CTL, class III/IV LN TI and CTL, and class II LN TI and class III/IV LN TI were determined by Welch’s t-test with Bonferroni correction. FIG. 13U: Hierarchical clustering (k = 4) of all tubulointerstitial samples. For all calculations, the following sample numbers were used: LN TI [CTL (n = 15), Cl II (n = 8), Cl III/IV (n = 22)]. **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.

[0452] FIG. 14 shows that metabolic and cellular gene expression changes in some class II LN TI patients are similar to those seen in class III/IV patients. Hierarchical clustering (k = 4) of class II LN TI samples (n = 8). Red stars represent patients with a GSVA score greater than the control mean + 1 SD. Black stars represent patients with a GSVA score less than the control mean - 1 SD.

[0453] FIGs. 15A-15B show that numerous cellular gene signatures contribute to the observed metabolic changes in DLE. FIG. 15A: Stepwise regression coefficients for metabolic pathway GSVA scores in all DLE and CTL samples. For stepwise regression, the pDC, skin-specific DC, monocyte/MC, T cell, anergic/activated T cell, B cell, and plasma cell signatures were combined into a single “inflammatory cell” signature because of collinearity. FIG. 15B: Hierarchical clustering (k = 2) of all skin samples. For all calculations, the following sample numbers were used: DLE [CTL (n = 8), DLE (n = 9)]. Significant p-values reflect significant coefficients in the stepwise regression model. *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.
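
The legend states that several inflammatory-cell signatures were merged because of collinearity but does not say how collinearity was measured. One common diagnostic is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing signature j on all the others; values above roughly 10 are a frequent rule-of-thumb flag. The sketch below uses that swapped-in diagnostic with hypothetical signature names and toy data.

```python
import numpy as np


def variance_inflation_factors(signatures):
    """VIF for each signature: regress it on all others (with intercept)."""
    names = list(signatures)
    data = np.column_stack([np.asarray(signatures[n], dtype=float) for n in names])
    n = data.shape[0]
    vifs = {}
    for j, name in enumerate(names):
        y = data[:, j]
        X = np.column_stack([np.ones(n), np.delete(data, j, axis=1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[name] = float(1 / (1 - r2))
    return vifs


rng = np.random.default_rng(1)
t_cell = rng.normal(size=30)
sigs = {"T_cell": t_cell,
        "B_cell": t_cell + 0.05 * rng.normal(size=30),  # nearly collinear with T_cell
        "endothelial": rng.normal(size=30)}
print(variance_inflation_factors(sigs))
```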

[0454] FIGs. 16A-16H show that metabolic gene expression changes in LN TI are associated with changes in the kidney cell, proximal tubule, and monocyte/MC gene signatures. Stepwise regression coefficients (FIG. 16A) and CART analysis (FIGs. 16B-16H) for metabolic pathway signatures in all tubulointerstitial LN samples and CTLs. Values in the final CART leaves for each process represent the average GSVA score of the samples assigned to that leaf. Each resulting CART decision tree except those for glycolysis and the PPP was pruned once. For all calculations, the following sample numbers were used: LN TI [CTL (n = 15), Cl II (n = 8), Cl III/IV (n = 22)]. Significant p-values in FIG. 16A reflect significant coefficients in the stepwise regression model. *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.

[0455] FIG. 17 shows that metabolic genes are altered in scRNA-seq data from LN biopsies. DEGs related to metabolism in scRNA-seq clusters (CM2: tissue-resident macrophages; CT0a: effector memory CD4+ T cells; CE0: epithelial cells) that were present in both LN patient and CTL samples from Arazi et al. (Ref. 30). Reported metabolic genes are those with corrected p-values < 0.05.

[0456] FIGs. 18A-18Q show that cellular gene expression changes in NZM2410 kidneys may be corrected with immunosuppressive treatment. GSVA of immune (FIGs. 18A-18H) and non-hematopoietic (FIGs. 18I-18Q) cell signatures in the kidneys of NZM2410 mice (GSE32583, GSE49898) with and without treatment. Each point represents an individual mouse. For each signature, significant differences in enrichment at each timepoint compared to baseline and after treatment compared to disease were evaluated with the Mann-Whitney U test. For all calculations, the following sample numbers were used: 6w (n = 5), 21-30w (n = 5), and Tx+15w (n = 6). *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.

[0457] FIGs. 19A-19R show that cellular gene expression changes in NZB/W kidneys may be corrected with immunosuppressive treatment. GSVA of immune (FIGs. 19A-19I) and non-hematopoietic (FIGs. 19J-19R) cell signatures in the kidneys of NZB/W mice (GSE32583, GSE49898) with and without treatment. Each point represents an individual mouse. For each signature, significant differences in enrichment at each timepoint compared to baseline and after treatment compared to disease were evaluated with the Mann-Whitney U test. For all calculations, the following sample numbers were used: 16w (n = 8), 23w (n = 6), 36w (n = 10), Tx Remission +3-4w (n = 8), and Tx Remission + >5w (n = 6). *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.

[0458] FIGs. 20A-20S show that immune/inflammatory cell gene expression is increased and proximal tubule cell gene expression is decreased in IFNα-accelerated NZB/W kidneys. GSVA of immune (FIGs. 20A-20J) and non-hematopoietic (FIGs. 20K-20S) cell signatures in the kidneys of IFNα-accelerated NZB/W mice (GSE86423). Each point represents an individual mouse. For each signature, significant differences in enrichment at each timepoint compared to baseline were evaluated with the Mann-Whitney U test. For all calculations, the following sample numbers were used: Baseline (n = 3), W1 (n = 5), W2 (n = 5), W3 (n = 5), W5 (n = 5), W7 (n = 5), and W9 (n = 5). *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.

[0459] FIGs. 21A-21S show that cellular gene expression changes in IFNα-accelerated NZB/W kidneys may be corrected with immunosuppressive treatment. GSVA of immune (FIGs. 21A-21J) and non-hematopoietic (FIGs. 21K-21S) cell signatures in the kidneys of IFNα-accelerated NZB/W mice (GSE72410) with and without treatment. Each point represents an individual mouse. For each signature, significant differences in enrichment at each timepoint compared to baseline and after treatment compared to disease were evaluated with the Mann-Whitney U test. For all calculations, the following sample numbers were used: Naive (n = 5), IFN W3 (n = 5), IFN W7 + Veh (n = 5), and IFN W7 + CTX (n = 5). *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.

[0460] FIGs. 22A-22R show that cellular gene expression in the MRL/lpr kidney is not significantly altered. GSVA of immune (FIGs. 22A-22I) and non-hematopoietic (FIGs. 22J-22R) cell signatures in the kidneys of MRL/lpr mice (GSE153021) with and without treatment. Each point represents an individual mouse. For each signature, significant differences in enrichment at each timepoint compared to baseline and at each treatment timepoint compared to disease were evaluated with the Mann-Whitney U test. For all calculations, the following sample numbers were used: Wildtype (n = 3), Vehicle (n = 3), Prednisone (n = 3), MMF (n = 3), FK506 (n = 3), and Multi-target (n = 3).

[0461] FIGs. 23A-23Q show that immune/inflammatory cell gene expression is increased and kidney cell and proximal tubule cell gene expression is decreased in NZW/BXSB kidneys. GSVA of immune (FIGs. 23A-23H) and non-hematopoietic (FIGs. 23I-23Q) cell signatures in the kidneys of NZW/BXSB mice (GSE32583, GSE49898). Each point represents an individual mouse. For each signature, significant differences in enrichment at each timepoint compared to baseline were evaluated with the Mann-Whitney U test. For all calculations, the following sample numbers were used: 17w (n = 6) and 18-21w + P (n = 6). *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.

[0462] FIGs. 24A-24F show that cellular gene expression changes in murine LN correlate with metabolic gene signatures. Pearson correlation coefficients for all metabolic pathway and cellular GSVA scores in all samples of each murine LN model: NZM2410 (GSE32583, GSE49898) (FIG. 24A), NZB/W (GSE32583, GSE49898) (FIG. 24B), IFNα-accelerated NZB/W (GSE86423) (FIG. 24C), IFNα-accelerated NZB/W (GSE72410) (FIG. 24D), MRL/lpr (GSE153021) (FIG. 24E), and NZW/BXSB (GSE32583, GSE49898) (FIG. 24F). For all calculations, the following sample numbers were used: NZM2410 [6w (n = 5), 21-30w (n = 5), Tx+15w (n = 6)], NZB/W [16w (n = 8), 23w (n = 6), 36w (n = 10), Tx Remission +3-4w (n = 8), Tx Remission + >5w (n = 6)], IFN-accelerated NZB/W (GSE86423) [Baseline (n = 3), W1 (n = 5), W2 (n = 5), W3 (n = 5), W5 (n = 5), W7 (n = 5), and W9 (n = 5)], IFN-accelerated NZB/W (GSE72410) [Naive (n = 5), IFN W3 (n = 5), IFN W7 + Veh (n = 5), IFN W7 + CTX (n = 5)], MRL/lpr [Wildtype (n = 3), Vehicle (n = 3), Prednisone (n = 3), MMF (n = 3), FK506 (n = 3), Multi-target (n = 3)], and NZW/BXSB [17w (n = 6), 18-21w +P (n = 6)]. Significant p-values represent significantly non-zero slopes. *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.
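
A correlation matrix like those in FIGs. 24A-24F can be computed with `numpy.corrcoef`, treating each signature’s GSVA scores across samples as one row. The signature names and scores below are toy values. If per-pair p-values are also needed, `scipy.stats.pearsonr(x, y)` returns the correlation and its two-sided p-value.

```python
import numpy as np


def correlation_matrix(gsva_by_signature):
    """Pearson correlation between every pair of signatures across samples.
    gsva_by_signature maps signature name -> scores in a shared sample order."""
    names = list(gsva_by_signature)
    matrix = np.corrcoef(np.vstack([gsva_by_signature[n] for n in names]))
    return names, matrix


names, r = correlation_matrix({
    "TCA_cycle": [1.0, 2.0, 3.0, 4.0],
    "proximal_tubule": [2.0, 4.0, 6.0, 8.0],
    "monocyte_MC": [4.0, 3.0, 2.0, 1.0],
})
print(names)
print(np.round(r, 3))
```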

[0463] FIG. 25 shows that cellular and metabolic gene expression changes correlate with expression of genes indicating tubular damage in murine LN. Correlation between Havcr1 or Lcn2 gene expression and GSVA scores for the kidney cell, proximal tubule, and TCA cycle signatures in all samples from the kidneys of NZM2410 (GSE32583, GSE49898), NZB/W (GSE32583, GSE49898), IFNα-accelerated NZB/W (GSE86423), IFNα-accelerated NZB/W (GSE72410), MRL/lpr (GSE153021), and NZW/BXSB (GSE32583) mice. For all calculations, the following sample numbers were used: NZM2410 [6w (n = 5), 21-30w (n = 5), Tx+15w (n = 6)], NZB/W [16w (n = 8), 23w (n = 6), 36w (n = 10), Tx Remission +3-4w (n = 8), Tx Remission + >5w (n = 6)], IFN-accelerated NZB/W (GSE86423) [Baseline (n = 3), W1 (n = 5), W2 (n = 5), W3 (n = 5), W5 (n = 5), W7 (n = 5), and W9 (n = 5)], IFN-accelerated NZB/W (GSE72410) [Naive (n = 5), IFN W3 (n = 5), IFN W7 + Veh (n = 5), IFN W7 + CTX (n = 5)], MRL/lpr [Wildtype (n = 3), Vehicle (n = 3), Prednisone (n = 3), MMF (n = 3), FK506 (n = 3), Multi-target (n = 3)], and NZW/BXSB [17w (n = 6), 18-21w +P (n = 6)]. Significant p-values reflect significantly non-zero slopes.

[0464] FIGs. 26A-26F show alteration/dysregulation of metabolic gene signatures in lupus-, psoriasis-, atopic dermatitis-, and scleroderma-affected tissues. Each graph compares class III/IV LN GL (violin plot 2), class III/IV LN TI (violin plot 4), DLE (violin plot 6), PSO (violin plot 8), AD (violin plot 10), and SSc (violin plot 12) with their respective controls (unshaded violin plots 1, 3, 5, 7, 9, and 11 in each panel). The graphs show GSVA of signatures for glycolysis (FIG. 26A), the PPP (FIG. 26B), the TCA cycle (FIG. 26C), OXPHOS (FIG. 26D), FABO (FIG. 26E), and AA metabolism (FIG. 26F) in diseased tissues and controls (CTLs). Each point represents an individual sample. Numbers below each tissue indicate the number of patients with enrichment scores 1 SD less than (< 1 SD) or greater than (> 1 SD) the CTL mean. Significant p-values reflect significant differences in GSVA enrichment of the metabolic signatures in each diseased tissue as compared to CTL, as determined by Welch’s t-test with Bonferroni correction. **, p < 0.01; ***, p < 0.001; ****, p < 0.0001. See methods described in relation to FIGs. 1A-1I, Example 1.

[0465] FIGs. 27A and 27B show that increased immune cell signatures and decreased non-hematopoietic cell signatures characterize the majority of lupus patients. FIG. 27A: Hedges’ g effect sizes of immune cell signatures in class III/IV LN GL, class III/IV LN TI, DLE, PSO, AD, and SSc as compared to tissue CTLs. FIG. 27B: Hedges’ g effect sizes of non-hematopoietic cell signatures in class III/IV LN GL, class III/IV LN TI, DLE, PSO, AD, and SSc as compared to tissue CTLs. Significant p-values reflect significant differences in GSVA enrichment of the cellular signatures in each diseased tissue as compared to CTL, as determined by Welch’s t-test with Bonferroni correction. **, p < 0.01; ***, p < 0.001; ****, p < 0.0001. See methods described in relation to FIG. 2A, Example 1.

[0466] FIGs. 28A-28C show that metabolic and cellular gene signatures are concurrently altered in the tissues of inflammatory diseases, with different metabolic changes reflecting different cellular signatures. Stepwise regression coefficients are shown for the glycolysis (FIG. 28A), TCA cycle (FIG. 28B), and FABO (FIG. 28C) signatures in class II-IV LN GL, class II-IV LN TI, DLE, PSO, AD, SSc, and tissue CTLs. Significant p-values reflect significant coefficients in the stepwise regression model. *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001. See methods described in relation to FIGs. 15A and 16A, Example 1.

[0467] Table 1: 883 common DEGs among human lupus tissues

[0468] Table 2: GSVA gene sets of human or mouse genes

[0469] References (each incorporated by reference herein in its entirety)

[0470] 1. Deng, G.-M. & Tsokos, G. C. Pathogenesis and targeted treatment of skin injury in SLE. Nat. Rev. Rheumatol. 11, 663-669 (2015).

[0471] 2. Bagavant, H. & Fu, S. M. Pathogenesis of kidney disease in systemic lupus erythematosus. Curr. Opin. Rheumatol. 21, 489-494 (2009).

[0472] 3. Morel, L. Immunometabolism in systemic lupus erythematosus. Nat. Rev. Rheumatol. 13, 280-290 (2017).

[0473] 4. Yin, Y. et al. Normalization of CD4+ T cell metabolism reverses lupus. Sci. Transl. Med. 7, 274ra18 (2015).

[0474] 5. Li, W., Sivakumar, R., Titov, A. A., Choi, S.-C. & Morel, L. Metabolic Factors that Contribute to Lupus Pathogenesis. Crit. Rev. Immunol. 36, 75 (2016).

[0475] 6. Reiss, A. B. Effects of inflammation on cholesterol metabolism: Impact on systemic lupus erythematosus. Curr. Rheumatol. Rep. 11, 255-260 (2009).

[0476] 7. Feichtinger, R. G., Sperl, W., Bauer, J. W. & Kofler, B. Mitochondrial dysfunction: A neglected component of skin diseases. Exp. Dermatol. 23, 607-614 (2014).

[0477] 8. Kuehne, A. et al. Acute Activation of Oxidative Pentose Phosphate Pathway as First-Line Response to Oxidative Stress in Human Skin Cells. Mol. Cell 59, 359-371 (2015).

[0478] 9. Biniecka, M. et al. Dysregulated bioenergetics: a key regulator of joint inflammation. Ann. Rheum. Dis. 75, 2192-2200 (2016).

[0479] 10. Adams, S. B. et al. Global metabolic profiling of human osteoarthritic synovium. Osteoarthr. Cartil. 20, 64-7 (2012).

[0480] 11. Kang, H. M. et al. Defective fatty acid oxidation in renal tubular epithelial cells has a key role in kidney fibrosis development. Nat. Med. 21, 37-46 (2015).

[0481] 12. Simon, N. & Hertig, A. Alteration of fatty acid oxidation in tubular epithelial cells: From acute kidney injury to renal fibrogenesis. Front. Med. 2, (2015).

[0482] 13. Kelly, B. & O’Neill, L. A. Metabolic reprogramming in macrophages and dendritic cells in innate immunity. Cell Res. 25, 771-784 (2015).

[0483] 14. MacIver, N. J., Michalek, R. D. & Rathmell, J. C. Metabolic Regulation of T Lymphocytes. Annu. Rev. Immunol. 31, 259-283 (2013).

[0484] 15. Grayson, P. C. et al. Metabolic pathways and immunometabolism in rare kidney diseases. Ann. Rheum. Dis. 77, 1227-1234 (2018).

[0485] 16. Tilstra, J. S. et al. Kidney-infiltrating T cells in murine lupus nephritis are metabolically and functionally exhausted. J. Clin. Invest. 128, 4884-4897 (2018).

[0486] 17. Kidani, Y. & Bensinger, S. J. Lipids rule: resetting lipid metabolism restores T cell function in systemic lupus erythematosus. J. Clin. Invest. 124, 482-485 (2014).

[0487] 18. Sharabi, A. & Tsokos, G. C. T cell metabolism: new insights in systemic lupus erythematosus pathogenesis and therapy. Nat. Rev. Rheumatol. 16, 100-112 (2020).

[0488] 19. Mehta, M. M. & Chandel, N. S. Targeting metabolism for lupus therapy. Sci. Transl. Med. 7, 274fs5 (2015).

[0489] 20. Banchereau, R. et al. Personalized Immunomonitoring Uncovers Molecular Networks that Stratify Lupus Patients. Cell 165, 551-565 (2016).

[0490] 21. Catalina, M. D. et al. Patient ancestry significantly contributes to molecular heterogeneity of systemic lupus erythematosus. JCI Insight 5, (2020).

[0491] 22. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics 14, 7 (2013).

[0492] 23. Frankenberger, M., Schwaeble, W. & Ziegler-Heitbrock, L. Expression of M-Ficolin in human monocytes and macrophages. Mol. Immunol. 45, 1424-1430 (2008).

[0493] 24. Xue, D., Tabib, T., Morse, C. & Lafyatis, R. Transcriptome landscape of myeloid cells in human skin reveals diversity, rare populations and putative DC progenitors. J. Dermatol. Sci. 97, 41-49 (2020).

[0494] 25. Zimmerman, K. A. et al. Single-cell RNA sequencing identifies candidate renal resident macrophage gene expression signatures across species. J. Am. Soc. Nephrol. 30, 767-781 (2019).

[0495] 26. Weening, J. J. et al. The classification of glomerulonephritis in systemic lupus erythematosus revisited. Kidney Int. 65, 521-530 (2004).

[0496] 27. Agostinelli, C. Robust stepwise regression. J. Appl. Stat. 29, 825-840 (2002).

[0497] 28. Krzywinski, M. & Altman, N. Classification and regression trees. Nat. Methods 14, 757-758 (2017).

[0498] 29. Afshinnia, F. et al. Impaired β-oxidation and altered complex lipid fatty acid partitioning with advancing CKD. J. Am. Soc. Nephrol. 29, 295-306 (2018).

[0499] 30. Arazi, A. et al. The immune cell landscape in kidneys of patients with lupus nephritis. Nat. Immunol. 20, 902-914 (2019).

[0500] 31. Wanders, R. J. A., Waterham, H. R. & Ferdinandusse, S. Metabolic interplay between peroxisomes and other subcellular organelles including mitochondria and the endoplasmic reticulum. Front. Cell Dev. Biol. 3, 83 (2016).

[0501] 32. Shu, S. et al. Hypoxia and Hypoxia-Inducible Factors in Kidney Injury and Repair. Cells 8, 207 (2019).

[0502] 33. Baechler, E. C. et al. Interferon-inducible gene expression signature in peripheral blood cells of patients with severe lupus. Proc. Natl. Acad. Sci. 100, 2610-2615 (2003).

[0503] 34. Bennett, L. et al. Interferon and granulopoiesis signatures in systemic lupus erythematosus blood. J. Exp. Med. 197, 711-23 (2003).

[0504] 35. Catalina, M. D., Bachali, P., Geraci, N. S., Grammer, A. C. & Lipsky, P. E. Gene expression analysis delineates the potential roles of multiple interferons in systemic lupus erythematosus. Commun. Biol. 2, 140 (2019).

[0505] 36. Ahmed, D. et al. Transcriptional Profiling Suggests Extensive Metabolic Rewiring of Human and Mouse Macrophages during Early Interferon Alpha Responses. Mediators Inflamm. 2018, (2018).

[0506] 37. Pantel, A. et al. Direct Type I IFN but Not MDA5/TLR3 Activation of Dendritic Cells Is Required for Maturation and Metabolic Shift to Glycolysis after Poly IC Stimulation. PLoS Biol. 12, (2014).

[0507] 38. Wu, D. et al. Type 1 Interferons Induce Changes in Core Metabolism that Are Critical for Immune Function. Immunity 44, 1325-1336 (2016).

[0508] 39. De Souza, D. P. et al. Autocrine IFN-I inhibits isocitrate dehydrogenase in the TCA cycle of LPS-stimulated macrophages. J. Clin. Invest. 129, 4239-4244 (2019).

[0509] 40. Gardet, A. et al. Pristane-Accelerated Autoimmune Disease in (SWR x NZB) F1 Mice Leads to Prominent Tubulointerstitial Inflammation and Human Lupus Nephritis-Like Fibrosis. PLoS One 11, e0164423 (2016).

[0510] 41. Bethunaickan, R. et al. Identification of stage-specific genes associated with lupus nephritis and response to remission induction in (NZB x NZW)F1 and NZM2410 mice. Arthritis Rheumatol. 66, 2246-2258 (2014).

[0511] 42. Castillo-Rodriguez, E. et al. Kidney Injury Marker 1 and Neutrophil Gelatinase-Associated Lipocalin in Chronic Kidney Disease. Nephron 136, 263-267 (2017).

[0512] 43. Viau, A. et al. Lipocalin 2 is essential for chronic kidney disease progression in mice and humans. J. Clin. Invest. 120, 4065-4076 (2010).

[0513] 44. Viola, A., Munari, F., Sanchez-Rodriguez, R., Scolaro, T. & Castegna, A. The metabolic signature of macrophage responses. Front. Immunol. 10, 1462 (2019).

[0514] 45. Waters, L. R., Ahsan, F. M., Wolf, D. M., Shirihai, O. & Teitell, M. A. Initial B Cell Activation Induces Metabolic Reprogramming and Mitochondrial Remodeling. iScience 5, 99-109 (2018).

[0515] 46. Davidson, A. What is damaging the kidney in lupus nephritis? Nat. Rev. Rheumatol. 12, 143-153 (2016).

[0516] 47. Schiffer, L. et al. Activated Renal Macrophages Are Markers of Disease Onset and Disease Remission in Lupus Nephritis. J. Immunol. 180, 1938-1947 (2008).

[0517] 48. Wickersham, M., Wachtel, S., Fok, T. W., Richardson, A. & Parker, D. Metabolic Stress Drives Keratinocyte Defenses against Staphylococcus aureus Infection. Cell Rep. 18, 2742-2751 (2017).

[0518] 49. Wu, S.-B. & Wei, Y.-H. AMPK-mediated increase of glycolysis as an adaptive response to oxidative stress in human cells: Implication of the cell survival in mitochondrial diseases. Biochim. Biophys. Acta - Mol. Basis Dis. 1822, 233-247 (2012).

[0519] 50. Zhao, X. et al. Metabolic regulation of dermal fibroblasts contributes to skin extracellular matrix homeostasis and fibrosis. Nat. Metab. 1, 147-157 (2019).

[0520] 51. Hallan, S. et al. Metabolomics and Gene Expression Analysis Reveal Downregulation of the Citric Acid (TCA) Cycle in Non-diabetic CKD Patients. EBioMedicine 26, 68-77 (2017).

[0521] 52. Sharma, K. et al. Metabolomics Reveals Signature of Mitochondrial Dysfunction in Diabetic Kidney Disease. J. Am. Soc. Nephrol. 24, 1901-1912 (2013).

[0522] 53. Berthier, C. C. et al. Cross-Species Transcriptional Network Analysis Defines Shared Inflammatory Responses in Murine and Human Lupus Nephritis. J. Immunol. 189, 988-1001 (2012).

[0523] 54. Nakagawa, T., Kosugi, T., Haneda, M., Rivard, C. J. & Long, D. A. Abnormal angiogenesis in diabetic nephropathy. Diabetes 58, 1471-1478 (2009).

[0524] 55. Ballermann, B. J. Glomerular endothelial cell differentiation. Kidney Int. 67, 1668-1671 (2005).

[0525] 56. Sun, Y. B. Y. et al. Glomerular Endothelial Cell Injury and Damage Precedes That of Podocytes in Adriamycin-Induced Nephropathy. PLoS One 8, e55027 (2013).

[0526] 57. Eelen, G., De Zeeuw, P., Simons, M. & Carmeliet, P. Endothelial cell metabolism in normal and diseased vasculature. Circ. Res. 116, 1231-1244 (2015).

[0527] 58. Kalucka, J. et al. Quiescent Endothelial Cells Upregulate Fatty Acid β-Oxidation for Vasculoprotection via Redox Homeostasis. Cell Metab. 28, 881-894.e13 (2018).

[0528] 59. Bhargava, P. & Schnellmann, R. G. Mitochondrial energetics in the kidney. Nat. Rev. Nephrol. 13, 629-646 (2017).

[0529] 60. Eckardt, K. U. et al. Role of hypoxia in the pathogenesis of renal disease. Kidney Int. 68, S46-S51 (2005).

[0530] 61. Fritsch, S. D. & Weichhart, T. Effects of interferons and viruses on metabolism. Front. Immunol. 7, 630 (2016).

[0531] 62. Broder, A. et al. Tubulointerstitial damage predicts end stage renal disease in lupus nephritis with preserved to moderately impaired renal function: A retrospective cohort study. Semin. Arthritis Rheum. 47, 545-551 (2018).

[0532] 63. Chong, B. F. et al. A subset of CD163+ macrophages displays mixed polarizations in discoid lupus skin. Arthritis Res. Ther. 17, 1-10 (2015).

[0533] 64. Katewa, A. et al. Btk-specific inhibition blocks pathogenic plasma cell signatures and myeloid cell-associated damage in IFNα-driven lupus nephritis. JCI Insight 2, e90111 (2017).

[0534] 65. Fu, J. et al. Transcriptomic analysis uncovers novel synergistic mechanisms in combination therapy for lupus nephritis. Kidney Int. 93, 416-429 (2018).

[0535] 66. Labonte, A. C. et al. Identification of alterations in macrophage activation associated with disease activity in systemic lupus erythematosus. PLoS One 13, e0208132 (2018).

[0536] 67. Daamen, A. R. et al. Comprehensive transcriptomic analysis of COVID-19 blood, lung, and airway. Sci. Rep. 11, 7052 (2021).

[0537] 68. Dai, M. et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 33, (2005).

[0538] 69. Shannon, P. et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498-2504 (2003).

[0539] 70. Morris, J. H. et al. ClusterMaker: A multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12, 436 (2011).

[0540] 71. Chockalingam, S., Aluru, M. & Aluru, S. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories. Microarrays 5, 23 (2016).

[0541] 72. Hubbard, E. L. et al. Analysis of gene expression from systemic lupus erythematosus synovium reveals myeloid cell-driven pathogenesis of lupus arthritis. Sci. Rep. 10, 17361 (2020).

[0542] 73. Fagerberg, L. et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell. Proteomics 13, 397-406 (2014).

[0543] 74. Uhlen, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).

[0544] 75. Gazel, A. et al. Transcriptional Profiling of Epidermal Keratinocytes: Comparison of Genes Expressed in Skin, Cultured Keratinocytes, and Reconstituted Epidermis, Using Large DNA Microarrays. J. Invest. Dermatol. 121, 1459-1468 (2003).

[0545] 76. Habuka, M. et al. The kidney transcriptome and proteome defined by transcriptomics and antibody-based profiling. PLoS One 9, (2014).

[0546] 77. Franzén, O., Gan, L.-M. & Björkegren, J. L. M. PanglaoDB - A Single Cell Sequencing Resource For Gene Expression Data, panglaodb.se/index.html (2019).

[0547] 78. Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739-1740 (2011).

[0548] 79. Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102, 15545-15550 (2005).

[0549] 80. Kingsmore, K. M. et al. Transcriptomic Meta-analysis of Lupus Affected Tissues Reveals Shared Immune, Metabolic, and Biochemical Dysregulation [abstract]. Arthritis Rheumatol 71, (2019).

Example 2: Machine learning reveals distinct gene signature profiles in lesional and nonlesional regions of inflammatory skin diseases.

[0550] Inflammatory skin diseases have unique clinical features but may have both selective and overlapping responses to targeted therapies. To determine the unique and shared molecular features of inflammatory skin diseases, we carried out a comprehensive analysis of gene expression from cutaneous lupus erythematosus (CLE) and compared it to that of psoriasis, atopic dermatitis, and systemic sclerosis. Using gene set variation analysis (GSVA), we found that lesional samples from each condition had unique features, but all four diseases displayed common enrichment in multiple inflammatory cell and pathway gene signatures, including the interferon, tumor necrosis factor, and IL-23 gene signatures. These findings were confirmed by both classification and regression tree (CART) analysis and machine learning (ML) models. ML also showed that nonlesional samples from each disease differed from normal samples and from each other. Notably, the features used in classification of nonlesional disease compared to control were more distinct than their lesional counterparts, and GSVA confirmed unique features of nonlesional disease. These data show that lesional and nonlesional skin samples from CLE and other inflammatory skin diseases have unique profiles of gene expression abnormalities, especially in nonlesional skin. The results suggest a model in which disease-specific abnormalities in “pre-lesional” skin may permit environmental stimuli to trigger inflammatory responses leading to both the unique and shared manifestations of each disease. Dissection of molecular pathways enriched in both clinically involved and uninvolved skin can advance the understanding of the pathogenesis of these conditions and identify novel therapies.

[0551] Autoimmune and inflammatory diseases, such as systemic lupus erythematosus (SLE), can affect many organs, including the skin. Indeed, skin manifestations of lupus, known as cutaneous lupus erythematosus (CLE), are common and occur in 70-85% of lupus patients (1-3). CLE has historically been classified into three subtypes based on clinical and serological features: acute CLE (ACLE), subacute CLE (SCLE), and chronic CLE (CCLE) (4). The heterogeneity of CLE makes it difficult to identify particular cytokines or inflammatory pathways to target therapeutically, and, as a result, no therapies are specifically approved for CLE (5). Both an innate immune response coordinated through Toll-like receptor activation and multiple adaptive immune responses have been reported in the initiation and propagation of CLE (2). Targeting B cells with belimumab (6) and type I interferon (IFN) with anifrolumab (7) has shown some benefit in decreasing cutaneous manifestations of SLE. In contrast, other inflammatory skin diseases, such as psoriasis (PSO), have numerous approved therapies (8), and dupilumab, an inhibitor of IL-4 receptor signaling, is an effective therapy for both atopic dermatitis (AD) and PSO (9). This overlap of central, nonredundant pathways between PSO and AD illustrates that diseases with markedly different clinical phenotypes may have similar immunopathogenic underpinnings.

[0552] Although independent transcriptomic analyses have provided insight into the molecular landscape of CLE, a complete molecular characterization of the disease has been limited by small patient cohorts (10-17). Previous bulk gene expression studies focused on specific aspects of lupus skin disease, such as the presence of T helper (Th) 17 cells (10) or specific macrophage populations (11), the correlation between inflammatory cell populations and fibroblast marker expression (12), cytokine expression (13), inflammasome signaling (14), or IFN signaling (15-17). However, the interplay of inflammatory cells, non-hematopoietic cells, and pathway perturbations still needs to be examined to understand the molecular events in CLE pathogenesis in further detail.

[0553] Whereas the differences in clinical manifestations between CLE and other inflammatory skin diseases have been well documented, the molecular differences between them have been less thoroughly studied. For instance, keratinocytes, one of the predominant non-hematopoietic cell populations in the skin, have been implicated in PSO pathogenesis (18) and shown to be hypersensitive to IFN signaling in CLE (15), yet understanding of their role in CLE remains limited. Moreover, systemic sclerosis (SSc), another inflammatory skin disease, is characterized by fibrosis and vascular damage due to excessive deposition of extracellular matrix and differentiation of fibroblasts into myofibroblasts (19), but little is known about the role of fibrosis in the pathogenesis of CLE. Finally, AD is characterized by an allergic reaction owing to a loss of skin barrier function, fibrosis, and Th2 cell signaling (20), but these functions have not been explored in CLE. A detailed comparison of the molecular signatures of CLE, PSO, AD, and SSc could improve understanding of the primary pathogenic mechanisms and point to new therapeutic avenues in these conditions.

[0554] In this study, we compared the gene expression signatures of four inflammatory skin diseases: CLE, PSO, AD, and SSc. In order to achieve a read depth sufficient to maintain the in vivo proportions of cellular signals in the biopsies without technical distortion and to capture the majority of molecular pathways, we analyzed bulk RNA. In addition, we employed analytic tools to deconvolute transcriptomic data and determine cellular and pathway signals enriched across heterogeneous cohorts of patients from each of the diseases. Using gene set variation analysis (GSVA), we determined that lesional skin of the four skin diseases expressed both shared and unique molecular signatures. Machine learning (ML) demonstrated that both lesional and nonlesional samples of each disease could be classified as distinct from control samples as well as from each other. Notably, nonlesional skin of each disease was more distinct than lesional skin, as there were more common features in ML classification of lesional skin among the four diseases. GSVA confirmed the molecular differences between uninvolved skin of the various conditions. These results suggest a model in which the nonlesional skin of patients with inflammatory skin diseases harbors unique abnormalities that potentially make the skin differentially sensitive to specific environmental stimuli. Inciting stimuli appear to induce responses with many overlapping inflammatory molecular features shared by the diseases. Altogether, this suggests that therapeutics employed in the treatment of one inflammatory skin disease may be useful in the treatment of additional diseases and confirms the utility of gene expression analysis in understanding the immunopathogenesis of clinical and pre-clinical disease.

Results

Comprehensive gene expression analysis of DLE reveals similarities and differences with other inflammatory skin diseases.

[0555] We carried out a comprehensive transcriptomic analysis of five independent datasets of skin biopsies from patients with DLE, the most frequent subset of CLE (4), and from healthy controls (Table 3). To examine cellular and pathway signaling at the level of individual patients, we carried out GSVA using a total of 48 informative gene signatures (Tables 4A-1 to 4A-20 and 4B-1 to 4B-28). Hierarchical clustering of GSVA enrichment scores demonstrated that DLE was molecularly separable from healthy skin (FIG. 29A). Despite some interpatient heterogeneity in each dataset, signatures for plasmacytoid dendritic cells (pDCs), monocytes, monocyte/myeloid cells, natural killer (NK) cells, T cells, B cells and plasma cells were consistently enriched in patients with DLE as compared to healthy skin (left, FIG. 29B). Conversely, signatures representative of granulocytes, Langerhans cells and melanocytes were consistently downregulated in lupus-affected skin. As previously shown, the interferon gene signature (IGS) was increased in all DLE datasets, as were the IL-12 and TNF gene signatures (14, 17) (right, FIG. 29B). Gene signatures of other inflammatory pathways, such as IL-21, IL-23, complement and the immunoproteasome, were also enriched. Additionally, signatures of most metabolic processes were decreased in DLE samples, whereas signatures representing glycolysis and the pentose phosphate pathway remained mostly unchanged.

[0556] To understand the molecular landscape of cutaneous lupus in the context of other inflammatory skin diseases, we examined gene expression data derived from skin biopsies of patients with PSO, AD or SSc. Overall, most myeloid- and lymphoid-derived cell signatures were enriched across all four diseases as compared to control, whereas enrichment of the skin-specific dendritic cell (DC) signature differed among the diseases (left, FIG. 30A, FIGS. 31-34). Monocyte gene signatures were consistently enriched in DLE, AD and SSc. In contrast, the endothelial cell gene signature was enriched in SSc only. Inflammatory cytokine gene signatures were largely enriched in all four diseases, especially those of the IFN, IL-12, IL-23 and TNF pathways. DLE and SSc exhibited enrichment of the complement signature, whereas PSO and AD did not (right, FIG. 30A, FIGS. 31-34). We noted heterogeneity among datasets of each disease; for example, the Th17 signature was upregulated in two of three PSO datasets, and the peroxisome signature was downregulated in four of five DLE datasets.

[0557] Next, we employed classification and regression tree (CART) analysis using GSVA enrichment scores of 48 cellular and pathway gene signatures to discern the gene expression variables that best classified the inflammatory skin diseases (FIG. 30B-H). In the CART analyses, each of the four lesional (L) diseases (DLE, PSO, AD, and SSc) was compared to controls, whereas classification of nonlesional (NL) samples compared to controls depended on different features in each nonlesional disease. The IGS, unfolded protein and granulocyte gene signatures were the most important signatures for classifying control or DLE (FIG. 30B), whereas the IGS and glycolysis signatures together classified 83% of PSO samples (FIG. 30C). In contrast, the IL-12 complex, IGS and apoptosis signatures classified AD or control (FIG. 30D), and the IGS, TGFB fibroblast, IL-12 complex, amino acid metabolism and IL-1 cytokine signatures were the most important features in classifying SSc or control (FIG. 30E). These data show that both common and disease-specific molecular pathway signatures classify the involved skin of the different inflammatory skin diseases.
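CART grows a tree by repeatedly choosing the feature and threshold that most reduce class impurity. The source does not specify the CART implementation or its settings, so the following is only a minimal pure-Python sketch of one split step, assuming binary labels (1 = disease, 0 = control), Gini impurity as the split criterion, and hypothetical GSVA scores. A full tree would apply the same search recursively to each child node.

```python
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(
    scores: Dict[str, List[float]], labels: List[int]
) -> Optional[Tuple[str, float, float]]:
    """Return (feature, threshold, weighted child impurity) of the best single split.

    Samples with score <= threshold go to the left child. Candidate thresholds
    are midpoints between consecutive distinct observed values of each feature.
    """
    n = len(labels)
    best: Optional[Tuple[str, float, float]] = None
    for feature, values in scores.items():
        distinct = sorted(set(values))
        for lo, hi in zip(distinct, distinct[1:]):
            threshold = (lo + hi) / 2
            left = [y for v, y in zip(values, labels) if v <= threshold]
            right = [y for v, y in zip(values, labels) if v > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[2]:
                best = (feature, threshold, impurity)
    return best
```

With hypothetical scores in which the IGS perfectly separates two controls from two disease samples, `best_split` picks the IGS at the midpoint between the groups with zero impurity.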

ML classification of lesional inflammatory skin samples confirms unique and common molecular pathways

[0558] To distinguish inflammatory skin diseases more precisely and confirm the major transcriptomic contributors, we employed several ML algorithms. First, we performed separate binary classifications of pooled lesional DLE, PSO, AD, and SSc samples versus pooled control samples using random forest (RF), an ensemble of decision trees, with the 48 cellular and pathway signature GSVA scores as input features. The areas under the receiver operating characteristic curves (AuROC) and the precision-recall curves (AuPR) for each binary classification were greater than 0.96 in all cases, indicating excellent performance in classifying each disease against control samples (FIG. 35A, B). For each binary comparison, we determined the 15 features most important in separating disease from control samples using RF Gini feature importance. To classify DLE from control samples, the IGS, TNF and IL-23 complex signatures were the most important features (FIG. 35C), whereas to classify PSO from control samples, the cell cycle, TNF and IL-12 complex signatures were the most important features (FIG. 35D). To classify AD from control samples, the IL-12 complex and TNF signatures as well as the IGS were the most important features (FIG. 35E), and the plasma cell signature, IGS and TNF signature were the most important features to classify SSc (FIG. 35F). ML analysis identified seven features shared among the 15 most important features for classifying each inflammatory skin disease from control: the IGS and the TNF, IL-23 complex, plasma cell, IL-12 complex, anti-inflammation and T cell IL-23 signatures (FIG. 35G). Notably, the 15 most important features performed comparably to the full set of 48 features in binary classification of inflammatory skin disease samples (FIG. 35H). Additionally, a number of other ML algorithms were similarly effective at binary classification of these samples (FIG. 36).
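AuROC and AuPR summarize how well a classifier's scores rank disease samples above control samples. As a hedged illustration, not the inventors' code, the sketch below computes AuROC as the probability that a randomly chosen positive sample outscores a randomly chosen negative one (ties count one half), and uses average precision as one common estimator of AuPR; the source does not say which AuPR estimator was used. The labels and scores in the check are hypothetical.

```python
from typing import List


def auroc(labels: List[int], scores: List[float]) -> float:
    """AuROC as P(score of a random positive > score of a random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Mean of the precision at the rank of each positive, ranking by descending score.

    Tied scores keep their input order (Python's sort is stable).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this cutoff
    return total / n_pos
```

A perfectly ranked sample set scores 1.0 on both measures; values above 0.96, as reported above, mean that nearly every disease sample is ranked above every control sample.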

[0559] Next, we directly compared gene expression signatures of DLE samples with those of other inflammatory skin diseases. Hierarchical clustering of GSVA scores in two datasets containing samples from both DLE and PSO revealed distinctions in cellular and pathway signature enrichment between these conditions (FIG. 38). Notably, transcriptomic signatures that differed between DLE and PSO included the IGS as well as the Langerhans cell and T cell IL-12 gene signatures. Since we lacked datasets in which DLE samples were analyzed concurrently with AD or SSc samples, we employed ML algorithms to classify samples from these conditions using the 48 GSVA enrichment scores as features. Performance characteristics and AuROC and AuPR curves demonstrated less effective classification of DLE from other diseases than of each disease versus control (FIG. 39A, B). In the binary classification of DLE and PSO samples, the most important features included the amino acid metabolism, fibroblast and keratinocyte gene signatures (FIG. 39C), whereas classification of DLE and AD samples involved the glycolysis, TGFβ fibroblast and Langerhans cell gene signatures (FIG. 39D). The Th17, TGFβ fibroblast and IL-12 signatures were most important in separating DLE and SSc samples (FIG. 39E).

Classification using only the top 15 most important features was as effective as when all 48 features were employed (FIG. 39F, Table 9). Finally, other ML classifiers performed similarly to RF (FIG. 40).
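The reduced-feature analyses rank features by importance, keep the top 15, and look for features shared across comparisons. A minimal sketch, assuming the importances are already available as a name-to-score mapping (for example, Gini importances from a fitted RF); the helper names and example values are hypothetical:

```python
from typing import Dict, List, Set


def top_k(importances: Dict[str, float], k: int = 15) -> List[str]:
    """Feature names sorted by descending importance (ties broken by name), truncated to k."""
    return sorted(importances, key=lambda name: (-importances[name], name))[:k]


def shared_top_features(
    per_comparison: Dict[str, Dict[str, float]], k: int = 15
) -> Set[str]:
    """Features that appear in the top-k list of every comparison."""
    top_sets = [set(top_k(imp, k)) for imp in per_comparison.values()]
    return set.intersection(*top_sets) if top_sets else set()
```

The classifier would then be refit using only the columns named by `top_k` to check that performance is preserved.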

Transcriptomic profiles of nonlesional skin samples distinguish inflammatory skin diseases from each other

[0560] Although the molecular characteristics of lesional skin in each disease have been well studied, less is known about the transcriptomic profiles of uninvolved skin. To determine whether there are underlying immunological abnormalities contributing to disease, we examined gene expression profiles in nonlesional skin samples from patients with DLE, PSO and AD to assess the extent to which they differed from either control or lesional skin; nonlesional SSc data were not available for analysis. Analysis of GSVA enrichment scores demonstrated that nonlesional samples were transcriptionally different from lesional samples (FIG. 42). In fact, the upregulation of inflammatory pathways, including the IFN, IL-12, IL-23 and TNF gene signatures, in lesional versus nonlesional skin mirrored the upregulation of these same pathways in lesional versus control skin in DLE, PSO and AD (FIG. 30A). We next carried out ML classification using GSVA scores to determine whether nonlesional samples were different from control. To overcome class imbalance, we employed undersampling and the synthetic minority oversampling technique (SMOTE) to balance the number of samples in each class for each binary classification. ML reliably classified nonlesional disease and healthy control samples (FIG. 43A, B). Classification of nonlesional DLE and control showed that the unfolded protein, Langerhans cell and NK cell signatures were the most important features (FIG. 43C), whereas the amino acid metabolism, cell cycle and IL-17 complex signatures were the most important features to classify nonlesional PSO and control (FIG. 43D). By contrast, the oxidative phosphorylation, anti-inflammation and granulocyte signatures were the most important features in classifying nonlesional AD from control (FIG. 43E).
Notably, comparison of the top 15 features of each binary classification showed that there was minimal overlap of important features among nonlesional skin diseases, with only one feature, apoptosis, shared among nonlesional DLE, PSO and AD (FIG. 43F). The binary classifications performed accurately when only the top 15 features of each binary classification were employed (FIG. 43G, Table 11). Moreover, other ML classifiers performed similarly to RF (FIG. 44). These data indicate that nonlesional skin in each of the three diseases evaluated is uniquely different from control skin.
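Class balancing combined undersampling of the larger class with SMOTE for the smaller class. The sketch below is a simplified pure-Python illustration, not the inventors' pipeline and not the API of any particular library: random undersampling without replacement, and SMOTE, which creates each synthetic sample at a random point on the segment between a minority sample and one of its k nearest minority-class neighbors (Euclidean distance). The value of k, the target class sizes and all inputs are assumptions.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def undersample(majority: Sequence[Vector], n: int, rng: random.Random) -> List[Vector]:
    """Randomly keep n majority-class samples, without replacement."""
    return rng.sample(list(majority), n)


def smote(minority: Sequence[Vector], n_new: int, k: int, rng: random.Random) -> List[Vector]:
    """Generate n_new synthetic minority-class samples (SMOTE)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbors of the chosen sample (Euclidean).
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbors)]
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Because every synthetic point lies between two real minority samples, SMOTE enlarges the minority class without leaving the region its real samples occupy.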

[0561] Given that nonlesional skin of each disease was distinct from control skin and appeared to be distinct from that of the other diseases, we next performed binary classification of nonlesional DLE versus nonlesional PSO or nonlesional AD using the balancing strategies described above (FIG. 45A, B). We found that nonlesional DLE and nonlesional PSO were easily separable, with the NK cell, amino acid metabolism and plasma cell signatures being the top features used in classification (FIG. 45C). In the binary classification of nonlesional DLE compared to nonlesional AD, the top features included the inflammasome, NK cell and unfolded protein signatures (FIG. 45D). Comparable classification was observed with the top 15 features only (FIG. 45E, Table 13), and other ML classifiers gave similar results (FIG. 47). Finally, binary classification of nonlesional PSO and nonlesional AD exhibited less effective classifier performance, but classification was achieved based on signatures including amino acid metabolism, IL-23 complex and cell cycle (FIG. 48). These results indicate that transcriptomic profiles of nonlesional skin are sufficiently unique to distinguish different inflammatory skin diseases. Moreover, the differences among nonlesional skin of the three diseases appear to be greater than the differences among lesional skin of the same three diseases.

[0562] The data demonstrated that ML employed specific gene signatures to classify nonlesional samples from patients with inflammatory skin diseases. To probe the differences in nonlesional skin in greater detail, we carried out an additional analysis using GSVA. For this analysis, we pooled the control and nonlesional samples from all datasets and employed Z-score normalization to scale data from samples obtained from different datasets. Compared to control samples, nonlesional DLE samples showed upregulation of B cell, melanocyte and complement protein gene signatures (FIG. 49A, FIG. 51). Nonlesional PSO samples showed upregulation of T cell and Th17 gene signatures, whereas nonlesional AD samples showed upregulation of skin-specific DC, IL-12 and anti-inflammation gene signatures as compared to control samples (FIG. 49A, FIG. 52 and FIG. 53).

[0563] To confirm these GSVA results, we also employed a mean of Z-scores calculation for enrichment of signatures and found similar results (FIGS. 54, 55, 56, Table 6). Notably, the significantly enriched signatures determined by the Z-score GSVA approach and the most important features for ML classification of nonlesional skin overlapped considerably (FIG. 49B). Of the 40 features used in both approaches, the majority of discriminatory features were similar. For example, in nonlesional DLE there were 11 shared features between the ML and Z-score GSVA methods, including the plasma cell, TNF, Langerhans cell and B cell signatures. In nonlesional PSO, there were 12 shared features between the two methods, including the skin-specific DC, IL-17 complex and Th17 signatures. In nonlesional AD, there were eight shared features between the two methods, including the neutrophil, plasma cell and anti-inflammation signatures.
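The Z-score approach rescales each dataset before pooling and then scores a signature as the mean Z-score of its genes. The source does not define the reference population for the Z-scores, so this sketch assumes that each gene is standardized across all samples within its own dataset (population standard deviation) and that signature genes absent from a dataset are skipped. The function names, gene names and values are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def zscore_within_dataset(expr: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize each gene across the samples of one dataset.

    expr maps gene -> expression values (one per sample, same sample order).
    Genes with zero variance are mapped to all zeros.
    """
    out: Dict[str, List[float]] = {}
    for gene, values in expr.items():
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        out[gene] = [0.0 if sd == 0 else (v - mean) / sd for v in values]
    return out


def mean_z_signature(z: Dict[str, List[float]], signature: Sequence[str]) -> List[float]:
    """Per-sample mean Z-score over the signature genes measured in this dataset."""
    genes = [g for g in signature if g in z]
    if not genes:
        raise ValueError("no signature genes measured in this dataset")
    n_samples = len(z[genes[0]])
    return [statistics.fmean(z[g][s] for g in genes) for s in range(n_samples)]
```

After each dataset is standardized this way, its samples are on a common, unitless scale and can be pooled with samples from other datasets.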

ML successfully classifies different subtypes of CLE

[0564] Because the ML pipeline was able to identify specific signatures that separated related diseases, we sought to determine whether subtypes of CLE could be distinguished by analyzing gene expression profiles. We previously observed a robust upregulation of immune cell gene signatures, including the pDC, monocyte, T cell and B cell signatures, by GSVA comparison of DLE and control samples (refer to FIG. 29B). In patients with SCLE, the monocyte, T cell and B cell signatures were upregulated and the Langerhans cell signature was downregulated as compared to healthy skin, in a pattern similar to DLE (FIG. 57A, B). In contrast, direct comparison of DLE and SCLE revealed few signatures that differed significantly between the two CLE subtypes (FIG. 57C, D). Indeed, by hierarchical clustering, samples from patients with SCLE did not form distinct clusters but rather were interspersed with DLE samples, suggesting that these CLE subtypes are not readily separable molecularly (FIG. 58). However, when we applied ML to determine whether DLE could be distinguished from SCLE (FIG. 58A, B), classification with good performance characteristics was achieved (FIG. 58C, D). The top 15 features important in distinguishing DLE from SCLE included the plasma cell, unfolded protein and TNF signatures (FIG. 58E). Notably, the pattern of gene signature enrichment was similar in DLE and SCLE, but the magnitude of enrichment was greater in DLE. These quantitative differences are sufficient for ML to classify the two subtypes of CLE effectively (FIG. 58F).

Cytokine-stimulated keratinocyte signatures and T cell signatures differ among inflammatory skin diseases

[0565] Keratinocyte and T cell signatures are often upregulated in inflammatory skin diseases. Because the ML analyses to this point had focused on gene signatures previously implicated in lupus (17, 21, 22), with less emphasis on those implicated in other inflammatory skin diseases (23, 24), we examined whether previously published PSO- or AD-specific gene signatures were also differentially enriched in the four lesional inflammatory skin diseases as compared to healthy controls. To accomplish this, we first evaluated gene sets derived from keratinocytes stimulated with various cytokines (Table 4C). We found that many of the keratinocyte gene signatures were highly enriched in all diseases (FIG. 59). DLE was highly enriched for the IFN-stimulated keratinocyte gene signatures, whereas PSO was highly enriched for the IL-1- and IL-17-stimulated keratinocyte gene signatures. In AD, the keratinocyte gene signatures, including the IFN- and IL-17-stimulated keratinocyte gene signatures, were upregulated to a similar extent. SSc showed more modest increases in the keratinocyte gene signatures, with the IL-1-, IL-17- and TNF-stimulated keratinocyte signatures decreased in some datasets. These data indicate a possible role for cytokine-stimulated keratinocytes in the pathogenesis of each disease, with a less prominent effect in SSc. Collinearity analysis revealed that the keratinocyte signatures were highly correlated and therefore not appropriate for use in ML (FIG. 60). Second, we examined more finely resolved T cell populations. For example, Th17 cells and IL-17 are targets of successful therapeutic intervention in PSO (25, 26); additionally, the Th2 T cell subset is implicated in AD pathogenesis (20). GSVA of the T cell gene signatures demonstrated heterogeneity across the PSO, AD and SSc datasets. T follicular helper (Tfh) cell, Th1 cell and Th17 cell signatures were enriched in most DLE datasets; PSO had variable enrichment of Th1 cell, Th2 cell and Th17 signatures, as did AD, while SSc exhibited less robust T cell enrichment (FIG. 61).
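Collinear signatures were excluded before ML. The source does not give the correlation measure or the cutoff, so this sketch assumes Pearson correlation and a hypothetical |r| threshold of 0.8, keeping features greedily in input order. The function names and feature values are illustrative only.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant inputs; treated as uncorrelated here
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(features: Dict[str, List[float]], threshold: float = 0.8) -> List[str]:
    """Keep features (in input order) whose |r| with every kept feature is below threshold."""
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Greedy filtering keeps whichever member of a correlated group comes first, so the input order determines which representative survives.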

[0566] FIG. 63 shows that ML classification of DLE versus PSO, AD, and SSc confirms distinct disease-specific gene signatures. FIG. 63A, B, and C: Top 15 features important in classifying lesional DLE versus lesional PSO (FIG. 63A), lesional DLE versus lesional AD (FIG. 63B), and lesional DLE versus lesional SSc (FIG. 63C). FIG. 63D and E: Top 15 features important in classifying nonlesional DLE versus nonlesional PSO (FIG. 63D) and nonlesional DLE versus nonlesional AD (FIG. 63E), using SHAP values. Collinear features were removed. For the classification of DLE versus each disease, for both lesional and nonlesional samples, the data set was split into training (70%) and test (30%) sets. The trained weights of the random forest (RF) classifier from each binary classification model (lesional DLE vs lesional AD, lesional DLE vs lesional PSO, lesional DLE vs lesional SSc, nonlesional DLE vs nonlesional AD, and nonlesional DLE vs nonlesional PSO) were imported. SHAP was applied to each trained random forest model, and summary plots (FIGS. 63A to E) of the training data in each disease-to-disease binary classification were generated. The dots in the summary plots represent individual samples. The y-axis of each plot lists the features in rank order of their feature importance scores (i.e., most important at the top), whereas the x-axis refers to the mean SHAP values of each feature, that is, the magnitude by which the feature impacts the model outcome. Positive SHAP values indicate that the feature (a high or low GSVA score) contributes positively to classification as the target label (DLE = 1), and negative SHAP values indicate that the feature contributes negatively to that label (in this case, DLE).
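SHAP values are Shapley values that attribute a model's output to individual features. Libraries such as SHAP use efficient tree-specific algorithms for random forests. Purely to illustrate the quantity being plotted, the sketch below computes exact Shapley values by brute force, under the simplifying assumption that a feature left out of a coalition takes the value of a single baseline point. Its cost grows exponentially with the number of features, so it suits only toy models such as the hypothetical linear model in the check; the 70/30 split and the RF itself are not shown.

```python
import itertools
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to a single baseline point.

    A feature absent from a coalition takes its baseline value.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(point)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

The values sum to model(x) minus model(baseline), so a positive value pushes the prediction toward the target label (DLE = 1 here) and a negative value pushes it away.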

[0567] An inflammatory skin disease risk score was calculated to understand the activity of cellular and immune pathways in lesional skin diseases. GSVA of 48 gene signatures representing cells (immune and non-hematopoietic) (Tables 4A-1 to 4A-20) and pathways (Tables 4B-1 to 4B-28) was run independently on 16 datasets including samples from DLE, AD, PSO, and SSc. When pooled, the datasets comprised 90 lesional DLE, 183 lesional PSO, 132 lesional AD, 97 lesional SSc and 164 normal skin control samples. To derive a model by which the inflammatory skin disease risk score could be generated, the GSVA scores in each sample were binarized: GSVA scores > 0 became 1, and GSVA scores < 0 became 0. Logistic regression with a ridge penalty was then run on each iteration using 41 samples drawn at random with replacement from each disease (41 lesional DLE, 41 lesional PSO, 41 lesional AD, and 41 lesional SSc), totaling 164 lesional skin disease samples, together with 164 non-disease skin controls, with the 48 binarized GSVA scores in each sample serving as features. Coefficients were calculated for each iteration, and final coefficients were obtained by averaging across all iterations. FIG. 64 shows the resulting coefficients for classification of lesional skin from control after the ridge regression model was run for 500 iterations. To calculate the inflammatory skin disease risk score in each sample, the coefficient for each category from the logistic regression model is multiplied by the binarized GSVA score for that category, and the products for all categories are summed to generate a final score. The resulting coefficients shown in FIG. 64 are as follows: IFN: 0.84; IL 12 complex: 0.75; Anti inflammation: 0.51; Cell cycle: 0.51; IL 23 complex: 0.43; Plasma Cell: 0.4; Apoptosis: 0.4; B Cell: 0.37; Monocyte/myeloid Cell: 0.34; Pentose Phosphate: 0.32; ROS production: 0.32; T Cell IL 23 Signature: 0.3; pDC: 0.29; Glycolysis: 0.28; Unfolded Protein: 0.27; Complement Protein: 0.27; AA Metabolism: 0.26; Monocyte: 0.25; Keratinocyte: 0.25; IL 21 Complex: 0.21; T Cell IL 12 signature: 0.2; TGFB Fibroblast: 0.2; Inflammasome: 0.19; Endothelial Cell: 0.17; Proteasome: 0.17; IL 17 Complex: 0.16; Immunoproteasome: 0.16; TCA cycle: 0.15; TNF: 0.13; Neutrophil: 0.12; Skin-specific DC: 0.12; GC B cell: 0.1; Th17: 0.07; IL 12: 0.06; IL1 cytokines: 0.06; T Cell: 0.05; Langerhans Cell: 0.03; Melanocyte: -0.02; Fibroblast: -0.04; NK Cell: -0.11; OXPHOS: -0.16; Peroxisome: -0.2; Erythrocyte: -0.22; Platelet: -0.22; FABO: -0.23; FAAO: -0.32; Granulocyte: -0.38; and LDG: -0.41.
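The risk score is a weighted sum of binarized GSVA scores. The sketch below transcribes the FIG. 64 coefficients listed above; the function names are hypothetical, and a GSVA score of exactly 0 is mapped to 0 as an assumption, because the source specifies only the > 0 and < 0 cases.

```python
from typing import Dict

# Coefficients transcribed from FIG. 64 (ridge-penalized logistic regression,
# lesional skin vs. control, averaged over 500 iterations).
COEFFICIENTS: Dict[str, float] = {
    "IFN": 0.84, "IL 12 complex": 0.75, "Anti inflammation": 0.51, "Cell cycle": 0.51,
    "IL 23 complex": 0.43, "Plasma Cell": 0.4, "Apoptosis": 0.4, "B Cell": 0.37,
    "Monocyte/myeloid Cell": 0.34, "Pentose Phosphate": 0.32, "ROS production": 0.32,
    "T Cell IL 23 Signature": 0.3,
    "pDC": 0.29, "Glycolysis": 0.28, "Unfolded Protein": 0.27, "Complement Protein": 0.27,
    "AA Metabolism": 0.26, "Monocyte": 0.25, "Keratinocyte": 0.25, "IL 21 Complex": 0.21,
    "T Cell IL 12 signature": 0.2, "TGFB Fibroblast": 0.2, "Inflammasome": 0.19,
    "Endothelial Cell": 0.17,
    "Proteasome": 0.17, "IL 17 Complex": 0.16, "Immunoproteasome": 0.16, "TCA cycle": 0.15,
    "TNF": 0.13, "Neutrophil": 0.12, "Skin-specific DC": 0.12, "GC B cell": 0.1,
    "Th17": 0.07, "IL 12": 0.06, "IL1 cytokines": 0.06, "T Cell": 0.05,
    "Langerhans Cell": 0.03, "Melanocyte": -0.02, "Fibroblast": -0.04, "NK Cell": -0.11,
    "OXPHOS": -0.16, "Peroxisome": -0.2, "Erythrocyte": -0.22, "Platelet": -0.22,
    "FABO": -0.23, "FAAO": -0.32, "Granulocyte": -0.38, "LDG": -0.41,
}


def binarize(gsva_score: float) -> int:
    """1 if the GSVA score is > 0, else 0 (a score of exactly 0 is assumed to give 0)."""
    return 1 if gsva_score > 0 else 0


def risk_score(gsva: Dict[str, float]) -> float:
    """Sum of coefficient x binarized GSVA score over all 48 signatures.

    Raises KeyError if any signature in COEFFICIENTS is missing from gsva.
    """
    return sum(coef * binarize(gsva[name]) for name, coef in COEFFICIENTS.items())
```

For example, a sample whose only positive GSVA scores are IFN and LDG would score 0.84 + (-0.41) = 0.43.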

[0568] Analysis of the Transcriptomic Profiles of Rheumatic Skin Diseases Reveals Disease-specific Endotypes. Patients with rheumatic skin diseases such as cutaneous lupus erythematosus (CLE) and systemic sclerosis (SSc) can be distinguished from individuals with healthy skin by the enrichment of specific molecular signatures. However, as patient heterogeneity is a well-known feature of these diseases, it is important to ascertain whether there are distinct patient molecular phenotypes (endotypes). Moreover, identifying specific disease endotypes from analysis of peripheral blood might increase the capacity to recognize patient endotypes.

[0569] Publicly available gene expression data derived from lesional skin biopsies of patients with CLE, specifically discoid lupus erythematosus (DLE), the most severe CLE subtype, and of patients with SSc were analyzed using gene set variation analysis (GSVA) of informative gene modules. Paired blood samples were also analyzed when available. K-means clustering was then applied to the GSVA enrichment scores to identify molecular endotypes in both skin diseases.
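A minimal pure-Python sketch of Lloyd's k-means on per-sample GSVA score vectors. The source does not state the initialization, the distance, or how the number of clusters (three for DLE, four for SSc) was chosen; this sketch assumes random initialization from the data points, Euclidean distance, and a caller-supplied k.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def kmeans(
    points: Sequence[Vector], k: int, rng: random.Random, max_iter: int = 100
) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means; returns (cluster index per point, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Initialize centroids with k distinct samples chosen at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member samples.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

Each resulting cluster of samples would correspond to a candidate endotype. Because k-means can converge to different local optima, it is commonly restarted from several initializations, keeping the run with the lowest within-cluster sum of squares.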

[0570] In DLE, k-means clustering revealed three subsets with distinct features (FIG. 65A). The least abnormal DLE cluster (blue, Group 1) was characterized by minimal changes, including higher B cell GSVA scores, whereas the other two clusters (green, Group 2, and yellow, Group 3) had markedly increased inflammatory cell and pathway gene expression, with most modules enriched to a greater magnitude in the yellow cluster (Group 3). In SSc, four endotypes emerged from k-means clustering (FIG. 65B). The first cluster (gold, Group 1) was the least abnormal, with unchanged cell and metabolic signatures. The second cluster (orange, Group 2) was characterized by increased melanocyte transcripts and altered metabolic signature expression. The third cluster (pink, Group 3) exhibited increased myeloid, NK, T, and plasma cell signatures as well as increased glycolysis, pentose phosphate, and apoptosis signatures. The most severe SSc endotype (purple, Group 4) comprised increased myeloid cell, T cell, and fibroblast signatures and altered metabolic signature expression. Both the pink and purple clusters were enriched in cytokine signaling modules such as IL12 and TNF. Comparison of the three DLE endotypes with the four SSc endotypes revealed some overlap between the least severe clusters (blue and gold) of each disease (FIG. 65C).

[0571] Both DLE and SSc skin exhibit distinct subsets based upon their molecular profile (endotype). In addition, the most severe SSc skin subset can be identified from blood gene expression. Identifying specific molecular endotypes of patients with inflammatory skin disease may facilitate matching individual patients with effective therapies.

Discussion

[0572] Here, we employed a comprehensive analysis of gene expression profiles to characterize the molecular features of four inflammatory skin diseases. Although considerable inter- and intra-dataset heterogeneity was observed, we documented molecular gene signatures that define both lesional and nonlesional skin of the various conditions. Notable among the findings were the shared and unique features of lesional skin among the four diseases and the unique features of nonlesional (uninvolved) skin. Altogether, this analysis demonstrates the informative power of transcriptomics to determine pathological characteristics of specific stages of each disease.

[0573] Our analyses involved multiple bioinformatic and statistical approaches that allowed us to understand the molecular pathways underlying the pre-clinical and clinical stages of inflammatory skin diseases. First and foremost, we assessed numerous datasets for each disease so that we could capture the transcriptional landscape of each condition and overcome the heterogeneity among patients and datasets. Second, each dataset was independently evaluated by GSVA using informative gene signatures we previously employed in the analysis of lupus (17, 21, 22, 27), gene signatures derived from interrogation of other inflammatory skin diseases (23, 24), as well as additional signatures we generated because of their relevance to skin pathogenesis. This analysis allowed us to observe unique patterns in the enrichment of inflammatory pathway signatures among and between the diseases and to document that the diseases were molecularly separable. We employed ML models, including CART and random forest, to determine that effective classification between disease and control or between diseases was achievable and to identify the most important features labeling the conditions.
The ML models not only permitted the effective classification of samples, but also allowed for dimensionality reduction, reducing the original 48 input gene signatures to the 15 features most important in each classification.

[0574] Despite previously noted heterogeneity (28, 29), our analysis revealed that the molecular landscape of DLE was relatively homogeneous: datasets comprising patients from different centers were sufficiently similar to permit accurate classification. Similarly, datasets including patients with SSc demonstrated consistent gene expression patterns. In contrast, we and others found datasets comprising patients with PSO and AD to be more molecularly heterogeneous (30). Nevertheless, we identified defining transcriptional elements for each of the various conditions that included both shared and specific molecular perturbations. Comparison of the lesional DLE, PSO, AD and SSc transcriptomes using GSVA demonstrated that these four inflammatory skin diseases have numerous inflammatory pathways in common. IFN, TNF, IL-12 complex, IL-23 complex, T cell IL-23, anti-inflammation and unfolded protein gene signatures were commonly upregulated among lesional biopsies from the four inflammatory diseases. Indeed, CART analysis, which was used as a first-pass algorithm to detect important discriminators within the data, demonstrated that the IFN and IL-12 complex gene signatures were the two most important features in distinguishing lesional DLE, PSO, AD and SSc from their respective control samples. Moreover, ML algorithms documented that, of the 15 features necessary for accurate classification of each disease from control, seven were common to all four diseases: the IGS, IL-12 complex, IL-23 complex, TNF, plasma cell, T cell IL-23 and anti-inflammation gene signatures. Altogether, six features were both shared and upregulated in the GSVA and ML analyses, suggesting that despite different genetic predispositions and disease manifestations, lesional DLE, PSO, AD and SSc have a common inflammatory microenvironment that differentiates them from control skin.
This was further supported by the overlapping enrichment of numerous signatures among at least two of the four diseases. For example, GSVA demonstrated that the neutrophil signature was upregulated in the majority of PSO and SSc patients, whereas the pDC, monocyte, monocyte/myeloid cell and B cell signatures were increased in the majority of DLE and SSc patients; the T cell, IL-21 complex, inflammasome and cell cycle signatures were increased in all diseases except SSc.

[0575] Despite the similarities among the diseases, however, we detected unique characteristics of each inflammatory skin disease. Indeed, we observed clear molecular distinctions between lesional samples from patients with DLE, PSO, AD, and SSc. The GSVA analysis revealed that the NK cell signature was only upregulated in lesional DLE compared to control samples, whereas the IL-1 cytokine signature was uniquely upregulated in lesional PSO compared to controls. Notably, however, neither signature proved to be of particular importance in ML classification of either disease from controls. Similarly, although the proteasome and Langerhans cell signatures were uniquely enriched in AD compared to control samples, and the endothelial cell and fibroblast signatures were uniquely enriched in lesional SSc compared to controls, these signatures were not of particular importance in ML classification of either disease. Despite this complexity, ML was able to delineate the most important features for classification of each condition from normal. For example, although increased in some patients from all diseases, the monocyte, T cell and B cell signatures were more important in the classification of DLE. Moreover, the keratinocyte and neutrophil signatures were most important in classifying PSO, and not the other diseases, a finding that is consistent with the role of keratinocyte proliferation and neutrophil infiltration in PSO (18, 31-33). In addition, the IL-21 complex signature was upregulated in all diseases except SSc, but was unique to ML classification of AD, consistent with the role of IL-21 in allergic skin diseases (34, 35). Finally, the TGFB fibroblast signature was important in classification of SSc, which aligns with the central role of fibrosis in this disease (23, 36).
Furthermore, ML demonstrated that the pDC, fibroblast, and glycolysis signatures are important in classifying lesional DLE from the other lesional diseases, illustrating that ML can identify molecular changes as effective classification features among samples. These findings strongly imply there are unique molecular features in lesional biopsies of inflammatory skin disease, along with a panoply of shared features.

[0576] Although there are numerous reports of gene expression abnormalities in lesional skin, less is known about the architecture of clinically uninvolved skin as compared to healthy skin. Examination of nonlesional skin in DLE, PSO, and AD provided new insights into the molecular processes operating in uninvolved skin and suggested a unique pre-clinical set of abnormalities in each condition. Application of both ML and Z-score-based approaches as orthogonal analytic tools to assess the differences between nonlesional and normal skin revealed unique patterns of abnormalities in each inflammatory skin condition. Notably, the apoptosis signature was the only feature among the top 15 ML features shared by the classifications of nonlesional DLE, PSO, and AD versus pooled controls (FIG. 62A). This suggests that dysregulated apoptosis may be a key feature in the initiation of each of these three diseases; indeed, apoptosis has been implicated in the pathogenesis of skin diseases including CLE, PSO and AD, and enhanced apoptosis is a well-recognized systemic feature of SLE (37-41). In general, unique molecular features characterize each condition, such as the IL-21 pathway in CLE and IL-17 in PSO. Genetic polymorphisms may contribute to the abnormalities noted in nonlesional skin; for example, susceptibility to lupus is in part associated with polymorphisms in IL-21 (42) and the IL-21 receptor (43), and polymorphisms in IL-17 are correlated with PSO treatment response (44). FIG. 62B shows a summary of possible therapies for the lesional skin diseases analyzed (left) and possible therapies for both lesional and nonlesional regions of each disease (right) based on molecular characterization.

[0577] Of note, unlike in lesional disease, we did not observe a prominent role for the IGS in nonlesional skin from DLE, PSO, or AD. This contrasts with some previous studies suggesting that nonlesional skin from patients with SLE or DLE is influenced by type I IFN (45-47). However, this contention was based largely on single-cell RNA-seq analysis of nonlesional keratinocytes and their expression of the IGS (45, 46), whereas our studies evaluated expression of the IGS by deconvolution of bulk tissue gene expression. Our data revealed an increased IGS in a few DLE samples, which may align with the increase in IFN activity in only select cell clusters in single-cell RNA-seq analyses, but not in the majority of samples. Altogether, this suggests that IFN is not a dominant factor in nonlesional disease in either CLE or PSO, and that an elevated IGS, where present, may instead reflect concurrent exposure to UV light or the presence of specific autoantibodies, both of which are associated with upregulation of the IGS (48-50).

[0578] Taken together, the data suggest a model in which patients with inflammatory skin disease manifest a specific set of pre-clinical molecular abnormalities that could predispose them to the development of typical clinical features, perhaps after encountering an environmental trigger (such as UV light, bacterial products or allergens). Upon development of cutaneous inflammation, common molecular features are upregulated, although the lesional disease maintains a unique gene expression profile (FIG. 62). This model is consistent with reports that nonlesional skin of patients with inflammatory skin disease is in a pre-inflammatory or “primed” state, and that some of the same molecular processes may contribute to maintaining both the pre-inflammatory and inflammatory components of the skin conditions (30). Herein, we observe not only an overlap of gene signatures upregulated in lesional skin between skin conditions, but also some overlap between nonlesional and lesional skin within each inflammatory skin disease.

[0579] Previous reports were not able to separate molecular features of DLE from those of SCLE despite the dramatic differences in clinical phenotype (51). Both DLE and SCLE are characterized by interface dermatitis, but the differences in clinical manifestations suggest different molecular underpinnings. By GSVA, the gene expression profiles of these two entities were similar; in fact, the same gene signatures were significantly enriched in each subtype compared to control. However, the effect sizes of the significantly enriched modules were greater in DLE than in SCLE. These quantitative differences were sufficient for ML to classify DLE from SCLE, predominantly using the plasma cell, neutrophil, pDC, melanocyte and GC B cell features as well as the TNF, IL-12 and IL-1 cytokine inflammatory features.

[0580] The results of this analysis lend insight into future treatment strategies for CLE, PSO, AD, and SSc based on the observed common and distinct molecular characteristics. For example, IL-17 is a well-known target for PSO treatment (52) and has been explored in therapy for lupus (53) and AD (54); however, we did not observe consistent upregulation of the IL-17 complex signature among the lesional manifestations of DLE, AD, and SSc, suggesting that IL-17 neutralizing therapy may be best suited for lesional PSO alone. Nevertheless, we observed upregulation of the IL-17 complex and Th17 gene signatures in nonlesional PSO and AD samples, as well as upregulation of the IL-17 complex signature in DLE as compared to control samples, suggesting that IL-17 targeting might be appropriate to prevent the emergence of typical skin lesions in all three diseases as well as to treat established plaques in lesional PSO. Of note, two of five lesional DLE datasets demonstrated significant upregulation of the IL-17 complex and Th17 signatures, suggesting that a subset of DLE patients might be responsive to IL-17 neutralization; a study investigating secukinumab, a monoclonal antibody to IL-17A, in DLE is ongoing (55). In addition, the consistent upregulation of the TNF signature in each lesional inflammatory skin disease supports the possibility that TNF neutralizing agents may ameliorate inflammation in all four conditions. To date, TNF neutralizing agents (etanercept (56), infliximab (57), adalimumab (58), and certolizumab (59)) are effective in treating PSO (26), while others report their possible efficacy in SLE (60) and AD (61, 62).
Notably, a recent phase II trial found that intradermal injection of the TNF neutralizing agent etanercept, as opposed to traditional systemic injection, induced remission in DLE (63, 64), supporting the conclusion that the local presence of TNF in the skin lesion is pathogenic in DLE. The IL-12 signature was important in classifying all four lesional skin diseases, suggesting potential efficacy for the IL-12/23 inhibitor ustekinumab, which is approved for treating PSO (65). Recent phase III trials in lupus were unsuccessful (66, 67), but improvement of skin and mucocutaneous lesions was noted in phase II trials (68). Finally, consistent enrichment of the IGS was noted in lesional skin of all four diseases, suggesting potential efficacy of interferon inhibitors such as anifrolumab. Indeed, anifrolumab, which was recently approved for SLE, caused a significant reduction in skin involvement in CLE compared with placebo (7).

[0581] Some of the individual datasets had small sample numbers; therefore, it was necessary to pool lesional samples from each disease, nonlesional samples from each disease, and controls to achieve sufficient sample numbers for ML. In addition, some datasets had few or no controls; therefore, nonlesional skin could not be compared to control samples by GSVA without pooling samples and employing normalization steps. Despite this, we found a number of changes in nonlesional DLE similar to those previously reported by other techniques, for example the decrease in Langerhans cells observed via immunohistochemistry (69). Moreover, because many of the widely used keratinocyte gene signatures were highly correlated with each other, ML analysis of cutaneous gene signatures previously reported in PSO and AD was not possible (30, 70, 71). Despite these caveats and the intra- and inter-dataset heterogeneity, we identified gene signatures both similar and distinct in lesional and nonlesional inflammatory skin diseases.

[0582] In summary, this transcriptomic analysis is one of the first comprehensive studies to evaluate four inflammatory skin diseases concurrently and to introduce comparative analyses of both lesional and nonlesional samples with control samples. We elucidated similarities and differences among both lesional and nonlesional DLE, PSO, AD, and SSc. Overall, our combined GSVA/ML analysis demonstrated that although seven features are shared in classifying lesional DLE, PSO, AD, and SSc from pooled controls, nonlesional skin samples are molecularly more distinct from one another than lesional samples. This reveals that nonlesional skin samples are highly informative about the underlying disease process and could be used in a subset of patients for future clinical trials (72). Indeed, nonlesional skin may be more useful in identifying the driving features underlying pathogenesis, since during chronic lesional disease the inflammatory milieu among diseases becomes more similar. In addition, although enrichment analysis of all cell types and pathways is important for the overall definition of disease pathology and for guiding treatment, specific features may be more important in molecular diagnostics for distinguishing one disease from another.

Materials and Methods:

[0583] Experimental Design: Fifteen publicly available gene expression datasets (accessed from the Gene Expression Omnibus (GEO)) were analyzed (Table 3), including 11 Affymetrix/Illumina microarray datasets (GSE52471 (10), GSE72535 (11), GSE81071 (14, 15, 68, 69), GSE109248 (13), GSE100093 (16), GSE120809 (12), GSE117239 (75), GSE117468 (52), GSE130588 (76), GSE58095 (77), and GSE95065) and 4 RNA-seq datasets (GSE121212 (30), GSE137430 (54), GSE157194 (71), and GSE130955 (78)). GSE81071 was split into two parts based on the submission date on GEO (the 2017 submission is referred to in the text as GSE81071_A and the 2019 submission as GSE81071_B). All datasets comprise gene expression derived from skin biopsies of lesional or nonlesional skin from patients with an inflammatory skin disease, including PSO, AD, SSc, and the CLE subtypes DLE, SCLE, and ACLE, or from skin biopsies of healthy control subjects. For GSE117239 (75), GSE117468 (52), GSE130588 (76), GSE137430 (54), and GSE157194 (71), only lesional and nonlesional samples at baseline without drug treatment were included in the analysis.

[0584] Statistical Analysis: Statistical differences between cohorts were evaluated in GraphPad PRISM using unpaired t-tests with Welch’s correction for GSVA enrichment scores of lesional and nonlesional samples and for mean Z-scores of nonlesional samples versus control, and paired t-tests for lesional versus nonlesional comparisons. The mean and standard deviation (SD) of each GSVA score in each tissue were calculated in Microsoft Excel. The number of samples in each dataset is detailed in Table 5B. Further statistical details are provided below.

Raw data processing:

[0585] Microarray data. Microarray data were normalized using GeneChip Robust Multiarray Average (GCRMA), Robust Multiarray Average (RMA), or normexp background correction (NEQC), depending on the microarray platform. Outliers and batch effects were identified using principal component analysis (PCA) plots. For the dataset with known batch effects, GSE81071, raw gene expression values were normalized using 11 housekeeping genes that have been shown not to vary significantly across datasets (79). These 11 housekeeping genes were: chromosome 1 open reading frame 43 (C1orf43), charged multivesicular body protein 2A (CHMP2A), ER membrane protein complex subunit 7 (EMC7), glucose-6-phosphate isomerase (GPI), proteasome subunit beta type 2 (PSMB2), proteasome subunit beta type 4 (PSMB4), RAB7A, member RAS oncogene family (RAB7A), receptor accessory protein 5 (REEP5), small nuclear ribonucleoprotein D3 (SNRPD3), valosin containing protein (VCP), and vacuolar protein sorting 29 homolog (VPS29).
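The source does not give the exact formula used for housekeeping-gene normalization of GSE81071. One common approach, assumed here purely for illustration, is to subtract each sample's mean log2 expression of the housekeeping genes and add back the grand housekeeping mean, which removes per-sample (for example, per-batch) shifts while keeping values on the log2 scale. The function name `housekeeping_normalize` and the toy matrix are hypothetical.

```python
import numpy as np

# The 11 housekeeping genes listed in [0585].
HOUSEKEEPING = ["C1orf43", "CHMP2A", "EMC7", "GPI", "PSMB2", "PSMB4",
                "RAB7A", "REEP5", "SNRPD3", "VCP", "VPS29"]


def housekeeping_normalize(log2_expr, genes, housekeeping=HOUSEKEEPING):
    """Shift each sample so that its housekeeping genes share a common mean.

    log2_expr: genes x samples array of log2 expression; genes: row names.
    Assumed scheme: subtract each sample's mean housekeeping expression,
    then add back the grand housekeeping mean.
    """
    hk = set(housekeeping)
    idx = [i for i, g in enumerate(genes) if g in hk]
    if not idx:
        raise ValueError("no housekeeping genes found in the matrix")
    hk_means = log2_expr[idx, :].mean(axis=0)  # one value per sample
    return log2_expr - hk_means[None, :] + hk_means.mean()


# Toy example: two hypothetical non-housekeeping genes plus the 11 above,
# four samples, with a simulated batch shift of +1.5 in samples 2 and 3.
genes = HOUSEKEEPING + ["IFI27", "CXCL10"]
expr = np.tile(np.arange(len(genes), dtype=float)[:, None], (1, 4))
expr[:, 2:] += 1.5
normalized = housekeeping_normalize(expr, genes)
```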

[0586] RNAseq data. The SRA Toolkit (NCBI Sequence Read Archive, Version 2.10) was used to fetch .sra files from GEO and convert them to .fastq files. The quality of the FASTQ files was checked using FastQC software (Babraham Institute Bioinformatics, Version 0.11.9). Adapters were removed using Trimmomatic software (80) (Version 0.4) with appropriate head crop parameters. Trimmed reads were aligned to the human reference genome (hg38) using the STAR aligner (81) (Version 2.7). STAR output .sam files were converted to .bam files using sambamba (82) (Version 0.8). Read summarization was performed using the featureCounts (83) function of the Subread (84) (Version 2.0) package. Count normalization and regularized log transformation were carried out using the rlog function in the DESeq2 (85) (Version 1.32) R package.

[0587] Gene Set Variation Analysis: Gene Set Variation Analysis (GSVA) (86) is a nonparametric, unsupervised method for estimating variation in gene set enrichment across the samples of an expression dataset. The GSVA algorithm was implemented using the open-source R Bioconductor package GSVA (version 1.40). GSVA was carried out in one of the following ways:

[0588] When individual datasets were analyzed, the preprocessed log2 gene expression matrix of each dataset was used as the GSVA input, and GSVA was run on each dataset separately. Before running GSVA, input genes were filtered, and only those with an interquartile range (IQR) of expression > 0 across all samples were retained for analysis. All analyses in FIGS. 29, 30, 35, 39, 43 and 45 are the result of this GSVA process. A minimum of 2 genes was required for each signature.

[0589] For the analysis of pooled nonlesional and control samples, log2 gene expression values generated from independent preprocessing of all 16 datasets were concatenated to create a matrix whose rows consisted of the 8,425 genes detected across all datasets and whose columns consisted of the 1,065 samples comprising DLE, nonlesional (NL) DLE, ACLE, SCLE, PSO, NL PSO, AD, NL AD, SSc, and CTL samples. Log2 values were then transformed to Z-scores using the scale() function in R. Z-score transformation converts each sample to have expression values with a mean of 0 and unit variance (87, 88). This transformation permitted direct comparison of nonlesional disease samples to controls. GSVA was then run on the following three inputs: 1) 21 pooled nonlesional DLE and 168 pooled control samples, 2) 132 pooled nonlesional AD and 168 pooled control samples, and 3) 163 pooled nonlesional PSO and 168 pooled control samples. The data presented in FIG. 49A are derived from this GSVA process. A minimum of 2 genes was required for each signature.
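The per-sample Z-score transformation in [0589] can be sketched in Python as below. It mimics R's scale() applied to a genes × samples matrix, centering and scaling each column with the n − 1 standard deviation. The function name `zscore_samples` and the two toy datasets are hypothetical; the example shows how a constant offset between separately preprocessed datasets disappears after scaling.

```python
import numpy as np


def zscore_samples(log2_expr):
    """Standardize each sample (column) to mean 0 and SD 1, like R's scale()."""
    mu = log2_expr.mean(axis=0, keepdims=True)
    sd = log2_expr.std(axis=0, ddof=1, keepdims=True)  # n - 1 denominator
    return (log2_expr - mu) / sd


# Two hypothetical datasets, already restricted to the same three genes
# (rows) and preprocessed separately; the second sits on a lower scale.
ds1 = np.array([[8.0, 9.0], [6.0, 7.0], [10.0, 11.0]])
ds2 = np.array([[4.0], [2.0], [6.0]])
combined = np.hstack([ds1, ds2])  # genes x samples, concatenated column-wise
z = zscore_samples(combined)
```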

[0590] GSVA Gene Sets: The gene sets used for GSVA are listed in Tables 4A-4C. Cellular/pathway signatures: Gene sets employed in our GSVA analysis included 48 annotated and novel cellular and pathway signatures that have been implicated in lupus (4, 5, 6) or inflammatory skin diseases (7, 8). Immune cell gene sets were previously evaluated (21, 27) or amended slightly based upon data from the Human Protein Atlas (89, 90). Non-hematopoietic cell signatures were derived from the Human Protein Atlas (89, 90), previously published gene sets (91), and literature mining as previously described and employed. Pathway gene signatures were previously evaluated in lupus (21, 27), previously published (23, 24), or newly adopted by literature mining (92, 93). The output GSVA scores of each signature were used as features for training and validating ML classifiers. Of the 48 cellular and pathway gene signatures, 40 were used in the GSVA analysis of pooled nonlesional and control samples. The following signatures were excluded from this analysis because of insufficient gene numbers (≤ 2) among the 8,425 genes used: LDG, GC B cell, erythrocyte, IL1 cytokines, IL12 complex, IL21 complex, IL23 complex, and the immunoproteasome.

[0591] Keratinocyte signatures: 30 gene sets specific to keratinocytes treated with individual cytokines were created from previously published studies (18, 94-100, 70, 101-107). Only genes upregulated in keratinocytes upon treatment with the various cytokines were included in these sets.

[0592] T cell signatures: Gene sets for T cells were created from literature mining (108-111) and the Human Protein Atlas (89, 90) to distinguish seven different T cell subsets that have been implicated in inflammatory skin disease.
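To make the GSVA procedure of [0587]-[0588] concrete for gene sets like those above, the following is a simplified, self-contained re-implementation written for illustration only; it is not the Bioconductor GSVA package and omits its Poisson kernel for count data and its other options. As we understand the published method, it (1) estimates each gene's cumulative distribution across samples with a Gaussian kernel (bandwidth SD/4) on the logit scale, (2) ranks genes within each sample and converts ranks to symmetric ranks so that both tails weigh heavily, and (3) scores each gene set with a Kolmogorov-Smirnov-like random walk, reporting the difference between the largest positive and negative deviations. The IQR > 0 filter and the two-gene minimum follow [0588]; the function names and toy data are hypothetical.

```python
import math
import numpy as np

# Standard normal CDF, vectorized over arrays (avoids a SciPy dependency).
_phi = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))


def filter_iqr(expr, genes):
    """Keep genes whose interquartile range across samples is > 0."""
    q75, q25 = np.percentile(expr, [75, 25], axis=1)
    keep = (q75 - q25) > 0
    return expr[keep], [g for g, k in zip(genes, keep) if k]


def gsva(expr, genes, gene_sets, min_size=2, tau=1.0):
    """Simplified GSVA on a genes x samples log2 matrix.

    Returns {signature name: array of per-sample enrichment scores}.
    """
    expr, genes = filter_iqr(expr, genes)
    n_genes, n_samples = expr.shape
    # (1) Kernel CDF of each gene's value in each sample, then logit.
    h = expr.std(axis=1, ddof=1)[:, None, None] / 4.0
    diff = expr[:, :, None] - expr[:, None, :]
    cdf = _phi(diff / h).mean(axis=2)
    z = np.log(cdf / (1.0 - cdf))
    # (2) Symmetric-rank weights for positions 1..n of a ranked gene list.
    weight = np.abs(n_genes / 2.0 - np.arange(1, n_genes + 1)) ** tau
    index = {g: i for i, g in enumerate(genes)}
    scores = {}
    for name, members in gene_sets.items():
        idx = [index[g] for g in members if g in index]
        if len(idx) < min_size:  # too few detected genes: skip signature
            continue
        in_set = np.zeros(n_genes, dtype=bool)
        in_set[idx] = True
        es = np.empty(n_samples)
        for j in range(n_samples):
            order = np.argsort(-z[:, j], kind="stable")  # highest z first
            hit = in_set[order]
            # (3) Random walk: weighted hits minus uniform misses.
            walk = (np.cumsum(np.where(hit, weight, 0.0)) / weight[hit].sum()
                    - np.cumsum(~hit) / (n_genes - hit.sum()))
            es[j] = max(walk.max(), 0.0) + min(walk.min(), 0.0)
        scores[name] = es
    return scores


# Toy data: 50 genes x 10 samples; genes G0-G4 are raised in sample 0 and
# lowered in sample 1. "too_small" has only one detected gene.
rng = np.random.default_rng(0)
genes = [f"G{i}" for i in range(50)]
expr = rng.normal(size=(50, 10))
expr[:5, 0] += 5
expr[:5, 1] -= 5
res = gsva(expr, genes, {"up_in_s0": genes[:5], "too_small": ["G7", "MISSING"]})
```

Scores produced this way fall between -1 and +1, consistent with the range of GSVA enrichment scores noted in [0594].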

[0593] Classification and Regression Tree (CART): The R library rpart (Version 4.1) was used to implement the CART algorithm for classification as described previously (112, 113), and the library rpart.plot (Version 3.1) was used to visualize classification trees. GSVA enrichment scores of cellular and pathway signatures were used as independent variables, and the specific disease (DLE, PSO, AD, SSc, or CTL) was used as the dependent variable. Classification trees were built independently for each disease.
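The CART step in [0593] used R's rpart. As a rough Python analog (a swapped-in implementation, not the authors' code), scikit-learn's DecisionTreeClassifier with the Gini criterion builds the same kind of binary classification tree on GSVA scores. The three feature names, the depth limit and the simulated scores are assumptions, and rpart's default complexity-parameter pruning is not reproduced.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical GSVA features; only "IFN" is simulated to differ by class.
feature_names = ["IFN", "IL12_complex", "Fibroblast"]
rng = np.random.default_rng(0)
n = 40
X = rng.normal(scale=0.1, size=(2 * n, 3))
X[:n, 0] += 0.5  # simulated disease samples: elevated IFN signature
y = np.array(["DLE"] * n + ["CTL"] * n)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)
# Text rendering of the fitted tree, playing the role of rpart.plot.
print(export_text(tree, feature_names=feature_names))
```

The root split identifies the single most discriminating signature, which is how CART was used as a first-pass screen for important features in [0574].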

ML Analysis:

[0594] Creating input for ML. The input for ML was created by pooling GSVA enrichment scores of cellular and pathway gene signatures from multiple skin datasets based on the classification of skin disease or sample (Table 5B). GSVA enrichment scores, which range from -1 to +1, were concatenated across datasets, providing a sufficiently large cohort to train and validate various ML algorithms. Fourteen input data frames were created for 14 separate binary ML classifications (Table 5A). Seven of the 14 binary classifications compare control samples (164 CTL) with either lesional samples (DLE, PSO, AD or SSc) or nonlesional samples (DLE, PSO or AD) of inflammatory skin diseases (Table 5A A-D and I-K), whereas six others compare lesional DLE samples with lesional samples of other diseases (PSO, AD or SSc) (Table 5A E-H) and nonlesional DLE samples with nonlesional samples of other diseases (Table 5A L-M). The remaining binary classification compares nonlesional PSO with nonlesional AD (Table 5A). For lesional skin classification, pooled samples were 90 DLE, 132 AD, 97 SSc, and 183 PSO. For nonlesional skin classification, pooled samples were 21 DLE, 163 PSO, and 132 AD, and for healthy skin, pooled samples were 164 CTL (Table 5B).
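The pooling in [0594] amounts to transposing each dataset's signatures × samples GSVA matrix and stacking the results with a class label per sample. A minimal pandas sketch follows; keeping only signatures shared by all datasets, the function name `pool_gsva_scores`, and the sample identifiers are assumptions for illustration.

```python
import pandas as pd


def pool_gsva_scores(datasets):
    """Concatenate per-dataset GSVA scores into one ML input table.

    datasets: list of (scores, labels) pairs, where scores is a DataFrame of
    signatures (rows) x samples (columns) and labels maps sample -> class.
    Returns a samples x (signatures + "label") DataFrame.
    """
    shared = sorted(set.intersection(*(set(s.index) for s, _ in datasets)))
    frames = []
    for scores, labels in datasets:
        frame = scores.loc[shared].T.copy()  # samples x signatures
        frame["label"] = [labels[s] for s in frame.index]
        frames.append(frame)
    return pd.concat(frames, axis=0)


# Two hypothetical datasets; the second has an extra "NK" signature.
a = pd.DataFrame({"s1": [0.4, 0.1], "s2": [-0.2, 0.3]}, index=["IFN", "TNF"])
b = pd.DataFrame({"s3": [0.5, 0.0, 0.2]}, index=["IFN", "TNF", "NK"])
pooled = pool_gsva_scores([(a, {"s1": "DLE", "s2": "CTL"}), (b, {"s3": "DLE"})])
```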

[0595] Class balance strategies: Four class balance strategies were used for classifications with class imbalance: random undersampling (Table 5A C), random oversampling (Table 5A E, K), removing samples from an entire dataset (Table 5A F), and the Synthetic Minority Oversampling Technique (SMOTE) (114) (Table 5A I, L, M). The random undersampling strategy involves randomly selecting samples from the majority class, whereas the random oversampling strategy involves randomly duplicating examples from the minority class. SMOTE functions by randomly selecting a sample from the minority class, finding its k nearest neighbors, randomly selecting one of those neighbors, and generating a synthetic sample at a randomly selected point between the two samples in feature space. As previously noted, we used random undersampling to trim the number of examples in the majority class and then used SMOTE to oversample the minority class to balance the class distribution. The purpose of all class balancing strategies was to achieve balanced representation of both classes for ML. The dataset was split into 70% training and 30% validation sets, and class balancing strategies were applied to the training set. ML algorithms were then implemented, and evaluation metrics were recorded. Receiver Operating Characteristic (ROC) curves and Precision-Recall (PR) curves were plotted using the matplotlib (Version 3.3.4) library of Python. A ROC curve is a graphical way to visualize the trade-off between sensitivity and specificity; a high area under the curve represents a low false-positive rate and a high true-positive rate. A PR curve is a measure of classification performance when classes are imbalanced. A high area under the PR curve represents both high recall and high precision, where high precision relates to a low false-positive rate and high recall relates to a low false-negative rate.
For our analysis, we were interested in the features that contributed most to the separation of classes. Overlaps between the top 15 features of multiple classifications were visualized in Venn diagrams.
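The order of operations in [0595] (split first, then balance only the training data by undersampling the majority class and applying SMOTE to the minority class) can be sketched with NumPy as follows. The stratified split, `n_keep=40`, `k=5`, the function names and the simulated data are illustrative assumptions, and `smote` is a compact from-scratch version of the SMOTE idea rather than the implementation the authors used.

```python
import numpy as np


def stratified_split(y, frac, rng):
    """Return (train, test) indices with `frac` of each class in train."""
    train = []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        train.extend(idx[: int(round(frac * len(idx)))])
    train = np.sort(np.array(train))
    return train, np.setdiff1d(np.arange(len(y)), train)


def random_undersample(X, y, majority, n_keep, rng):
    """Randomly keep n_keep samples of the majority class (plus all others)."""
    maj = np.flatnonzero(y == majority)
    keep = np.concatenate([rng.choice(maj, n_keep, replace=False),
                           np.flatnonzero(y != majority)])
    return X[keep], y[keep]


def smote(X_min, n_new, k, rng):
    """Create n_new synthetic minority samples by neighbor interpolation."""
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # random point on the segment base -> nb
    return X_min[base] + gap * (X_min[nb] - X_min[base])


rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = np.array([0] * 100 + [1] * 20)  # 0 = majority class, 1 = minority class
train, test = stratified_split(y, 0.7, rng)  # validation set is left untouched
X_tr, y_tr = random_undersample(X[train], y[train], majority=0, n_keep=40, rng=rng)
X_min = X_tr[y_tr == 1]
synth = smote(X_min, n_new=40 - len(X_min), k=5, rng=rng)
X_bal = np.vstack([X_tr, synth])
y_bal = np.concatenate([y_tr, np.ones(len(synth), dtype=y.dtype)])
```

Balancing only after the split keeps synthetic or duplicated samples out of the validation set, so validation metrics are computed on real samples only.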

[0596] Binary ML classification: 14 separate binary ML classifications were carried out using the scikit-learn (Version 0.24.1) library in Python (Version 3.8.2). For each binary classification, the performance of several ML algorithms, including logistic regression (LR), K-nearest neighbors (KNN), naive Bayes (NB), support vector machines (SVM), random forest (RF), and gradient boosting (GB), was evaluated based on sensitivity, specificity, Cohen’s kappa score, F1 score, and accuracy. RF was chosen as the primary ML classifier because it provides impurity-based feature importance. The top 15 features, ranked by mean decrease in Gini impurity, from each classification were summarized in a bar graph using the ggplot2 (Version 3.3.5) library in R. The capability of the top 15 features alone to separate the two respective classes was tested by repeating the 14 binary ML classifications using only the top 15 features.
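A sketch of the random-forest workflow in [0596] using scikit-learn: fit RF on all features, rank features by impurity-based (mean decrease in Gini) importance, keep the top 15, refit on those, and report the metrics listed above. The 48 simulated features, `n_estimators=500`, the simple 140/60 split and the function names are assumptions for illustration, not the settings behind Tables 7 and 8.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, confusion_matrix, f1_score


def evaluate(y_true, y_pred):
    """Sensitivity, specificity, Cohen's kappa, F1 and accuracy (0/1 labels)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
            "kappa": cohen_kappa_score(y_true, y_pred),
            "f1": f1_score(y_true, y_pred), "accuracy": (tp + tn) / len(y_true)}


def top_features_rf(X_tr, y_tr, names, n_top=15, seed=0):
    """Fit RF and return the n_top features by impurity-based importance."""
    rf = RandomForestClassifier(n_estimators=500, random_state=seed).fit(X_tr, y_tr)
    order = np.argsort(rf.feature_importances_)[::-1][:n_top]
    return [names[i] for i in order], order


# Simulated data: 48 hypothetical signatures, only the first 3 informative.
rng = np.random.default_rng(0)
names = [f"sig{i}" for i in range(48)]
X = rng.normal(size=(200, 48))
y = (X[:, 0] + X[:, 1] + X[:, 2] > 0).astype(int)
X_tr, X_te, y_tr, y_te = X[:140], X[140:], y[:140], y[140:]

top, cols = top_features_rf(X_tr, y_tr, names)
# Repeat the classification using only the top 15 features.
rf_top = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr[:, cols], y_tr)
metrics = evaluate(y_te, rf_top.predict(X_te[:, cols]))
```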

[0597] Feature correlation: Before carrying out binary ML classification, feature selection was necessary to remove noninformative or redundant features. We assessed feature redundancy by calculating the Pearson correlation between each feature and every other feature, using the cor function in R. The corrplot library in R was used to plot 22 Pearson correlation plots (FIGS. 38, 42, 48, 51, 60). In 13 of these correlation plots, there was a pair of highly correlated features (correlation coefficient > 0.8), and the feature with the lower correlation was removed using a greedy elimination approach; doing this allowed us to retain the most informative features for ML (Table 5A). Pearson correlation plots were also plotted for keratinocyte gene signatures and T cell signatures (FIGS. 37, 46). High correlation between the keratinocyte gene signatures made them unsuitable for ML analysis (FIG. 36).
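The greedy redundancy filter in [0597] can be sketched as below: repeatedly find the most correlated pair of remaining features and, if |r| > 0.8, drop one member. The text does not say which correlation "the feature with the lower correlation" refers to; this sketch assumes it means the pair member less correlated with the class label, so the more informative feature is retained. The function name, feature names and data are hypothetical.

```python
import numpy as np


def drop_correlated(X, y, names, threshold=0.8):
    """Greedily remove one feature from each pair with |Pearson r| > threshold."""
    keep = list(range(X.shape[1]))
    # Absolute correlation of each feature with the 0/1 class label.
    label_r = np.abs([np.corrcoef(X[:, k], y)[0, 1] for k in range(X.shape[1])])
    while len(keep) > 1:
        r = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        np.fill_diagonal(r, 0.0)
        i, j = np.unravel_index(np.argmax(r), r.shape)
        if r[i, j] <= threshold:
            break  # no remaining pair is highly correlated
        # Assumption: drop the member less correlated with the label.
        drop = i if label_r[keep[i]] < label_r[keep[j]] else j
        del keep[drop]
    return [names[k] for k in keep]


rng = np.random.default_rng(0)
x = rng.normal(size=200)
labels = (x > 0).astype(int)
X = np.column_stack([x, x + 0.1 * rng.normal(size=200), rng.normal(size=200)])
kept = drop_correlated(X, labels, ["IFN_A", "IFN_B", "Fibroblast"])
```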

[0598] Statistical Analysis: Statistical differences between cohorts were evaluated using Welch’s t-test for lesional disease versus control GSVA scores from a single dataset, for nonlesional versus control GSVA scores from the combined dataset, and for mean Z-scores of nonlesional samples versus mean Z-scores of control samples for a single gene signature, and using a paired t-test for lesional versus nonlesional comparisons. The magnitude of each difference (the effect size) was estimated using Hedges’ g (115), calculated between cohort 1 and cohort 2 as defined below.

[0599] Cohort 1 and cohort 2 could be disease samples and their respective control samples from a single dataset, nonlesional samples and control samples from the combined dataset, mean Z-scores of nonlesional samples and mean Z-scores of control samples for a single gene signature, or lesional samples and their paired nonlesional samples from a single dataset. All statistical analyses were carried out using the effectsize (version 0.8.1) and stats (version 3.6.2) libraries in R.
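For reference, Hedges’ g in its usual form is g = J · (x̄₁ − x̄₂) / s_p, where s_p = √(((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)) is the pooled standard deviation and J ≈ 1 − 3 / (4(n₁ + n₂) − 9) is the small-sample correction. The sketch below computes this, along with Welch’s t statistic and its Welch-Satterthwaite degrees of freedom, using only the Python standard library. The effectsize R package may apply an exact rather than approximate correction factor, so its values can differ slightly for very small cohorts; the function names are hypothetical.

```python
import math
from statistics import mean, variance  # variance uses the n - 1 denominator


def hedges_g(cohort1, cohort2):
    """Hedges' g: Cohen's d with pooled SD, times the approximate correction J."""
    n1, n2 = len(cohort1), len(cohort2)
    s_pooled = math.sqrt(((n1 - 1) * variance(cohort1) + (n2 - 1) * variance(cohort2))
                         / (n1 + n2 - 2))
    d = (mean(cohort1) - mean(cohort2)) / s_pooled
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return j * d


def welch_t(cohort1, cohort2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = variance(cohort1) / len(cohort1)
    v2 = variance(cohort2) / len(cohort2)
    t = (mean(cohort1) - mean(cohort2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(cohort1) - 1) + v2 ** 2 / (len(cohort2) - 1))
    return t, df


# Example: two small hypothetical cohorts of GSVA scores (arbitrary units).
g = hedges_g([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
t, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

Here the means differ by 1 and both variances are 2.5, so d ≈ −0.632, J = 28/31, and g ≈ −0.571; Welch’s test gives t = −1 with 8 degrees of freedom.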

[0600] Data Visualization: Heatmaps of GSVA Hedges’ g effect sizes and violin plots of GSVA enrichment scores were visualized using GraphPad PRISM (Version 9.2.0). GSVA enrichment scores of gene signatures were visualized using violin plots in Prism or, for hierarchical clustering, the ComplexHeatmap (116) package (Version 2.8) in R. Figures were made using Adobe Illustrator Creative Cloud (Version 25.3.1).

[0601] Data and materials availability: All transcriptomic data are previously published and available in the NCBI Gene Expression Omnibus (GSE52471, GSE72535, GSE81071, GSE109248, GSE100093, GSE120809, GSE117239, GSE117468, GSE130588, GSE137430, GSE157194, GSE121212, GSE95065, GSE58095, GSE130955), as listed in Table 3. All data are available in the main text or the supplementary materials. All bioinformatic software used in this publication is open source and freely available for R and Python. Additionally, example code used in this paper for GSVA, CART, and ML is available at figshare (www.figshare.com). File names are “AMPEL BioSolutions GSVA Code AFFY nonzeroIQR Code”, “AMPEL BioSolutions Stepwise and CART Code”, and “AMPEL BioSolutions ML Binary”.

Table 3: Publicly available datasets used

Table 3: Continued

Table 4A: Genes within cellular signature

Table 4A-1: B Cell signature genes

Table 4A-2: Endothelial Cell signature genes

Table 4A-3: Erythrocyte signature genes

Table 4A-4: Fibroblast signature genes

Table 4A-5: GC B Cell signature genes

Table 4A-6: Granulocyte signature genes

CLC, HSH2D, MS4A2, PGLYRP1, PRG2, SYNE1

Table 4A-7: Keratinocyte signature genes

Table 4A-8: Langerhans Cell signature genes

Table 4A-9: LDG signature genes

Table 4A-10: Melanocyte signature genes

Table 4A-11: Monocyte signature genes

Table 4A-12: Monocyte/Myeloid Cell signature genes

Table 4A-13: Neutrophil signature genes

Table 4B-5: FAAO signature genes

Table 4B-14: IL23 Complex signature genes

IL12B, IL12RB1, IL23A, IL23R

Table 4B-15: Immunoproteasome signature genes

Table 4B-16: Inflammasome signature genes

Table 4B-17: OXPHOS signature genes

Table 4B-18: Pentose Phosphate signature genes

Table 4B-19: Peroxisome signature genes

Table 4B-20: Proteasome signature genes

Table 4B-21: ROS Production signature genes

Table 4B-22: T Cell IL12 signature genes

Table 4B-23: T Cell IL23 signature genes

Table 4B-24: TCA cycle signature genes

Table 4B-25: TNF signature genes

Table 4B-26: Unfolded Protein signature genes

Table 4B-27: TGFB Fibroblast signature genes

Table 4B-28: Anti-inflammation signature genes

Table 4C: Genes within T cell signatures.

Table 4D: Genes within keratinocyte signatures

Table 5A: Class balance strategy used for machine learning classification.

Table 5B: Number of samples pooled from each skin dataset to create input for machine learning.

Table 5B (Continued)
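The class-balance strategy itself is given in Table 5A, whose contents are not reproduced here; the abbreviations list names the synthetic minority oversampling technique (SMOTE). The following is a minimal standard-library sketch of SMOTE's interpolation step, assuming numeric feature vectors per sample (for example, gene-signature scores) and Euclidean nearest neighbours. The function name and parameters are hypothetical; a real analysis would more likely use a library implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn.

```python
# Illustrative SMOTE-style sketch; names and defaults are hypothetical.
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducibility
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        partner = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Toy feature vectors (two signature scores per sample), not study data.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
balanced = minority + smote(minority, n_new=4, k=2)
print(len(balanced))
```

Because synthetic points are interpolated rather than copied, oversampling in this way adds variation to the minority class instead of duplicating samples; it should be applied only to training folds so that synthetic samples do not leak into evaluation.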

Table 6: Comparison between Z-score GSVA and mean of Z-score.

Table 6: Continued
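Table 6 compares Z-score GSVA with the mean of Z-scores, and [0598] tests mean Z-scores of a single gene signature. A minimal sketch of a per-sample mean Z-score follows. It assumes, since this section does not specify the normalization, that each gene is standardized against the mean and standard deviation of the control samples before Z-scores are averaged over the signature genes present in the data. The function name and the gene-to-values input layout are hypothetical; the gene names are taken from the IL23 Complex signature (Table 4B-14).

```python
# Illustrative sketch; the control-based normalization is an assumption.
from statistics import mean, stdev


def signature_mean_z(expr, signature, control_idx):
    """Per-sample mean Z-score of one gene signature.

    expr maps gene -> list of expression values (one per sample, in the same
    sample order for every gene); control_idx lists control-sample positions.
    """
    z_rows = []
    for gene in signature:
        if gene not in expr:
            continue  # signature gene not measured on this platform
        values = expr[gene]
        controls = [values[i] for i in control_idx]
        mu, sd = mean(controls), stdev(controls)
        if sd == 0:
            continue  # constant in controls, so its Z-score is undefined
        z_rows.append([(v - mu) / sd for v in values])
    if not z_rows:
        raise ValueError("no usable signature genes in the expression data")
    # Average each column over genes: one score per sample.
    return [mean(column) for column in zip(*z_rows)]


# Toy data: samples 0-2 are controls, sample 3 is a disease sample.
expr = {
    "IL12B": [1.0, 2.0, 3.0, 10.0],
    "IL23A": [2.0, 4.0, 6.0, 20.0],
}
il23_complex = ["IL12B", "IL12RB1", "IL23A", "IL23R"]
print(signature_mean_z(expr, il23_complex, control_idx=[0, 1, 2]))
# [-1.0, 0.0, 1.0, 8.0]
```

Unlike GSVA, which ranks genes within each sample against a kernel-estimated background, this score is a simple average, so it is easy to interpret but sensitive to a few strongly deviating genes.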

Table 7: Classification metrics to properly separate DLE, PSO, AD or SSc samples from control samples using all 48 (top) or the top 15 (bottom) cellular and pathway gene signatures.

Table 8: Classification metrics, including sensitivity, specificity, Cohen’s kappa, precision, F1 score and accuracy, to properly separate lesional disease samples (DLE, PSO, AD or SSc) from healthy control samples with each ML classifier.

Table 9: Classification metrics to properly separate lesional DLE samples from lesional PSO, lesional AD or lesional SSc samples using all 48 (top) or the top 15 (bottom) cellular and pathway gene signatures.

Table 10: Classification metrics, including sensitivity, specificity, Cohen’s kappa, precision, F1 score and accuracy, to properly separate lesional DLE samples from lesional PSO, AD, and SSc samples with each ML classifier.

Table 11: Classification metrics to properly separate nonlesional DLE, nonlesional PSO and nonlesional AD samples, each from control samples, using all 48 (top) or the top 15 (bottom) cellular and pathway gene signatures.

Table 12: Classification metrics, including sensitivity, specificity, Cohen’s kappa, precision, F1 score and accuracy, to properly separate nonlesional disease samples (DLE, PSO or AD) from healthy control samples with each ML classifier.

Table 13: Classification metrics to properly separate DLE samples from PSO or AD samples using all 48 (top) or the top 15 (bottom) cellular and pathway gene signatures.

Table 14: Classification metrics, including sensitivity, specificity, Cohen’s kappa, precision, F1 score and accuracy, to properly separate nonlesional DLE samples from nonlesional PSO and nonlesional AD samples with each ML classifier.

Table 15: Classification metrics, including sensitivity, specificity, Cohen’s kappa, precision, F1 score and accuracy, to properly separate nonlesional PSO samples from nonlesional AD samples with each ML classifier.

Table 16: Classification metrics, including sensitivity, specificity, Cohen’s kappa, precision, F1 score and accuracy, to properly separate DLE and SCLE.
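The metrics listed for Tables 8, 10, 12, 14, 15 and 16 all derive from a two-class confusion matrix. A minimal sketch, assuming one class is designated positive (for example, lesional disease) and everything else is negative; the function name is hypothetical and this is not the authors' implementation. Cohen's kappa compares the observed agreement $p_o$ (the accuracy) with the agreement expected by chance from the confusion-matrix marginals, $p_e$, as $\kappa = (p_o - p_e)/(1 - p_e)$. Library functions such as scikit-learn's `sklearn.metrics.cohen_kappa_score` and `sklearn.metrics.f1_score` compute the same quantities.

```python
# Illustrative sketch; the function name is hypothetical.
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a two-class problem (positive vs the rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    n = tp + fn + fp + tn

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when den is 0

    sensitivity = ratio(tp, tp + fn)  # recall, true-positive rate
    specificity = ratio(tn, tn + fp)  # true-negative rate
    precision = ratio(tp, tp + fp)  # positive predictive value
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, n)  # observed agreement p_o
    # Chance agreement p_e from predicted and actual class totals.
    p_e = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = ratio(accuracy - p_e, 1 - p_e)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Toy labels: 1 = disease, 0 = control (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
```

Kappa is reported alongside accuracy because, with imbalanced classes, a classifier can reach high accuracy by favouring the majority class while its kappa stays near zero.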

[0602] Abbreviations:

[0603] Systemic lupus erythematosus (SLE)

[0604] Cutaneous lupus erythematosus (CLE)

[0605] Acute cutaneous lupus erythematosus (ACLE)

[0606] Subacute cutaneous lupus erythematosus (SCLE)

[0607] Chronic cutaneous lupus erythematosus (CCLE)

[0608] Chronic discoid lupus erythematosus (CDLE) or (DLE)

[0609] Atopic dermatitis (AD) or atopic eczema

[0610] Psoriasis (PSO)

[0611] Systemic sclerosis (SSc) or scleroderma

[0612] Gene set variation analysis (GSVA)

[0613] Machine learning (ML)

[0614] Random forest (RF)

[0615] Area under the receiver operating characteristic curve (AUROC curve)

[0616] Area under the precision-recall curve (AUPR curve)

[0617] Synthetic minority oversampling technique (SMOTE)

[0618] Low-density granulocyte (LDG)

[0619] Skin-specific dendritic cell (Skin-specific DC)

[0620] Plasmacytoid dendritic cell (pDC)

[0621] Natural killer (NK) cell

[0622] Germinal center (GC) B cell

[0623] Interferon (IFN)

[0624] Interferon gene signature (IGS)

[0625] Interleukin (IL)

[0626] Transforming growth factor beta (TGFβ)

[0627] Tumor necrosis factor (TNF)

[0628] Reactive Oxygen Species production (ROS production)

[0629] Tricarboxylic acid cycle (TCA)

[0630] Oxidative phosphorylation (OXPHOS)

[0631] Fatty acid alpha oxidation (FAAO)

[0632] Fatty acid beta oxidation (FABO)

[0633] Amino acid metabolism (AA metabolism)

[0634] T helper (Th)

[0635] T follicular helper (Tfh)

[0636] References (each incorporated herein by reference in its entirety):

[0637] 1. B. Tebbe, C. Orfanos, Epidemiology and socioeconomic impact of skin disease in lupus erythematosus, Lupus 6, 96-104 (1997).

[0638] 2. M. P. Maz, J. Michelle Kahlenberg, Cutaneous and systemic connections in lupus, Curr. Opin. Rheumatol. 32 (2020), doi: 10.1097/BOR.0000000000000739.

[0639] 3. L. Uva, D. Miguel, C. Pinheiro, J. P. Freitas, M. Marques Gomes, P. Filipe, Cutaneous manifestations of systemic lupus erythematosus, Autoimmune Dis. 1 (2012), doi:10.1155/2012/834291.

[0640] 4. J. Wenzel, Cutaneous lupus erythematosus: new insights into pathogenesis and therapeutic strategies, Nat. Rev. Rheumatol. 15, 519-532 (2019).

[0641] 5. S. Ribero, S. Sciascia, L. Borradori, D. Lipsker, The Cutaneous Spectrum of Lupus Erythematosus, Clin. Rev. Allergy Immunol. 53, 291-305 (2017).

[0642] 6. P. Vashisht, K. Borghoff, J. R. O’Dell, M. Hearth-Holmes, Belimumab for the treatment of recalcitrant cutaneous lupus, Lupus 26, 857-864 (2017).

[0643] 7. E. F. Morand, R. Furie, Y. Tanaka, I. N. Bruce, A. D. Askanase, C. Richez, S.-C. Bae, P. Z. Brohawn, L. Pineda, A. Berglind, R. Tummala, Trial of Anifrolumab in Active Systemic Lupus Erythematosus, N. Engl. J. Med. 382, 211-221 (2020).

[0644] 8. A. Menter, B. E. Strober, D. H. Kaplan, D. Kivelevitch, E. F. Prater, B. Stoff, A. W. Armstrong, C. Connor, K. M. Cordoro, D. M. R. Davis, B. E. Elewski, J. M. Gelfand, K. B. Gordon, A. B. Gottlieb, A. Kavanaugh, M. Kiselica, N. J. Korman, D. Kroshinsky, M. Lebwohl, C. L. Leonardi, J. Lichten, H. W. Lim, N. N. Mehta, A. S. Paller, S. L. Parra, A. L. Pathy, R. N. Rupani, M. Siegel, E. B. Wong, J. J. Wu, V. Hariharan, C. A. Elmets, Joint AAD-NPF guidelines of care for the management and treatment of psoriasis with biologics, J. Am. Acad. Dermatol. 80, 1029-1072 (2019).

[0645] 9. D. Deleanu, I. Nedelea, Biological therapies for atopic dermatitis: An update (review), Exp. Ther. Med. 17, 1061-1067 (2019).

[0646] 10. A. Jabbari, M. Suarez-Farinas, J. Fuentes-Duculan, J. Gonzalez, I. Cueto, A. G. Franks, J. G. Krueger, Dominant Th1 and minimal Th17 skewing in discoid lupus revealed by transcriptomic comparison with psoriasis, J. Invest. Dermatol. 134, 87-95 (2014).

[0647] 11. B. F. Chong, L. chiang Tseng, G. A. Hosier, N. M. Teske, S. Zhang, D. R. Karp, N. J. Olsen, C. Mohan, A subset of CD163+ macrophages displays mixed polarizations in discoid lupus skin, Arthritis Res. Ther. 17, 324 (2015).

[0648] 12. A. M. S. Barron, J. C. Mantero, J. D. Ho, B. Nazari, K. L. Horback, J. Bhawan, R. Lafyatis, C. Lam, J. L. Browning, Perivascular Adventitial Fibroblast Specialization Accompanies T Cell Retention in the Inflamed Human Dermis, J. Immunol. 202, 56-68 (2019).

[0649] 13. P. Mande, B. Zirak, W. C. Ko, K. Taravati, K. L. Bride, T. Y. Brodeur, A. Deng, K. Dresser, Z. Jiang, R. Ettinger, K. A. Fitzgerald, M. D. Rosenblum, J. E. Harris, A. Marshak-Rothstein, Fas ligand promotes an inducible TLR-dependent model of cutaneous lupus-like inflammation, J. Clin. Invest. 128, 2966-2978 (2018).

[0650] 14. J. Liu, C. C. Berthier, J. M. Kahlenberg, Enhanced Inflammasome Activity in Systemic Lupus Erythematosus Is Mediated via Type I Interferon-Induced Up-Regulation of Interferon Regulatory Factor 1, Arthritis Rheumatol. 69, 1840-1849 (2017).

[0651] 15. L. C. Tsoi, G. A. Hile, C. C. Berthier, M. K. Sarkar, T. J. Reed, J. Liu, R. Uppala, M. Patrick, K. Raja, X. Xing, E. Xing, K. He, J. E. Gudjonsson, J. M. Kahlenberg, Hypersensitive IFN Responses in Lupus Keratinocytes Reveal Key Mechanistic Determinants in Cutaneous Lupus, J. Immunol. 202, 2121-2130 (2019).

[0652] 16. V. P. Werth, D. Fiorentino, B. A. Sullivan, M. J. Boedigheimer, K. Chiu, C. Wang, G. E. Arnold, M. A. Damore, J. Bigler, A. A. Welcher, C. B. Russell, D. A. Martin, J. B. Chung, Brief Report: Pharmacodynamics, Safety, and Clinical Efficacy of AMG 811, a Human Anti-Interferon-γ Antibody, in Patients With Discoid Lupus Erythematosus, Arthritis Rheumatol. 69, 1028-1034 (2017).

[0653] 17. M. D. Catalina, P. Bachali, N. S. Geraci, A. C. Grammer, P. E. Lipsky, Gene expression analysis delineates the potential roles of multiple interferons in systemic lupus erythematosus, Commun. Biol. 2 (2019), doi: 10.1038/s42003-019-0382-x.

[0654] 18. K. E. Nograles, L. C. Zaba, E. Guttman-Yassky, J. Fuentes-Duculan, M. Suarez-Farinas, I. Cardinale, A. Khatcherian, J. Gonzalez, K. C. Pierson, T. R. White, C. Pensabene, I. Coats, I. Novitskaya, M. A. Lowes, J. G. Krueger, Th17 cytokines interleukin (IL)-17 and IL-22 modulate distinct inflammatory and keratinocyte-response pathways, Br. J. Dermatol. 159, 1092-1102 (2008).

[0655] 19. Y. Asano, Systemic sclerosis, J. Dermatol. 45, 128-138 (2018).

[0656] 20. P. M. Brunner, E. Guttman-Yassky, D. Y. M. Leung, The immunology of atopic dermatitis and its reversibility with broad-spectrum and targeted therapies, J. Allergy Clin. Immunol. 139, S65-S76 (2017).

[0657] 21. K. M. Kingsmore, P. Bachali, M. D. Catalina, A. R. Daamen, S. E. Heuer, R. D. Robl, A. C. Grammer, P. E. Lipsky, Altered expression of genes controlling metabolism characterizes the tissue response to immune injury in lupus, Sci. Rep. 11 (2021), doi: 10.1038/s41598-021-93034-w.

[0658] 22. S. J. Waddell, S. J. Popper, K. H. Rubins, M. J. Griffiths, P. O. Brown, M. Levin, D. A. Relman, Dissecting Interferon-Induced Transcriptional Programs in Human Peripheral Blood Cells, PLoS One 5 (2010), doi:10.1371/journal.pone.0009753.

[0659] 23. J. L. Sargent, A. Milano, S. Bhattacharyya, J. Varga, M. K. Connolly, H. Y. Chang, M. L. Whitfield, A TGFβ-responsive gene signature is associated with a subset of diffuse scleroderma with increased disease severity, J. Invest. Dermatol. 130, 694-705 (2010).

[0660] 24. C. L. Langrish, Y. Chen, W. M. Blumenschein, J. Mattson, B. Basham, J. D. Sedgwick, T. McClanahan, R. A. Kastelein, D. J. Cua, IL-23 drives a pathogenic T cell population that induces autoimmune inflammation, J. Exp. Med. 201, 233-240 (2005).

[0661] 25. A. Blauvelt, A. Chiricozzi, The Immunologic Role of IL-17 in Psoriasis and Psoriatic Arthritis Pathogenesis, Clin. Rev. Allergy Immunol. 55, 379-390 (2018).

[0662] 26. J. E. Hawkes, B. Y. Yan, T. C. Chan, J. G. Krueger, Discovery of the IL-23/IL-17 Signaling Pathway and the Treatment of Psoriasis, J. Immunol. 201, 1605-1613 (2018).

[0663] 27. M. D. Catalina, P. Bachali, A. E. Yeo, N. S. Geraci, M. A. Petri, A. C. Grammer, P. E. Lipsky, Patient ancestry significantly contributes to molecular heterogeneity of systemic lupus erythematosus, JCI Insight 5 (2020), doi: 10.1172/jci.insight.140380.

[0664] 28. T. Vazquez, R. Feng, K. J. Williams, V. P. Werth, Immunological and clinical heterogeneity in cutaneous lupus erythematosus, Br. J. Dermatol. 185, 481-483 (2021).

[0665] 29. J. L. Zhu, L. T. Tran, M. Smith, F. Zheng, L. Cai, J. A. James, J. M. Guthridge, B. F. Chong, Modular gene analysis reveals distinct molecular signatures for subsets of patients with cutaneous lupus erythematosus, Br. J. Dermatol. (2021), doi:10.1111/bjd.19800.

[0666] 30. L. C. Tsoi, E. Rodriguez, F. Degenhardt, H. Baurecht, U. Wehkamp, N. Volks, S. Szymczak, W. R. Swindell, M. K. Sarkar, K. Raja, S. Shao, M. Patrick, Y. Gao, R. Uppala, B. E. Perez White, S. Getsios, P. W. Harms, E. Maverakis, J. T. Elder, A. Franke, J. E. Gudjonsson, S. Weidinger, Atopic Dermatitis Is an IL-13-Dominant Disease with Greater Molecular Heterogeneity Compared to Psoriasis, J. Invest. Dermatol. 139, 1480-1489 (2019).

[0667] 31. H. Valdimarsson, J. E. Gudjonsson, A. Johnston, H. Sigmundsdottir, H. Valdimarsson, Immunopathogenic mechanisms in psoriasis, Clin Exp Immunol 135, 1-8 (2004).

[0668] 32. L. Pasquali, A. Srivastava, F. Meisgen, K. Das Mahapatra, P. Xia, N. Xu Landen, A. Pivarcsi, E. Sonkoly, The keratinocyte transcriptome in psoriasis: Pathways related to immune responses, cell cycle and keratinization, Acta Derm. Venereol. 99, 196-205 (2019).

[0669] 33. M. B. Abdel-Naser, A. I. Liakou, R. Elewa, S. Hippe, J. Knolle, C. C. Zouboulis, Increased Activity and Number of Epidermal Melanocytes in Lesional Psoriatic Skin, Dermatology 232, 425-430 (2016).

[0670] 34. H. Jin, M. K. Oyoshi, Y. Le, T. Bianchi, S. Koduru, C. B. Mathias, L. Kumar, S. Le Bras, D. Young, M. Collins, M. J. Grusby, J. Wenzel, T. Bieber, M. Boes, L. E. Silberstein, H. C. Oettgen, R. S. Geha, IL-21R is essential for epicutaneous sensitization and allergic skin inflammation in humans and mice, J. Clin. Invest. 119, 47-60 (2009).

[0671] 35. F. Gong, Q. Su, Y. H. Pan, X. Huang, W. H. Shen, The emerging role of interleukin-21 in allergic diseases (Review), Biomed. Reports 1, 837-839 (2013).

[0672] 36. A. P. Sappino, I. Masouye, J. H. Saurat, G. Gabbiani, Smooth muscle differentiation in scleroderma fibroblastic cells, Am. J. Pathol. 137, 585-591 (1990).

[0673] 37. S. Domingo, C. Sole, T. Moline, B. Ferrer, J. Cortes-Hernandez, MicroRNAs in Several Cutaneous Autoimmune Diseases: Psoriasis, Cutaneous Lupus Erythematosus and Atopic Dermatitis, Cells 9 (2020), doi:10.3390/cells9122656.

[0674] 38. J. D’Orazio, S. Jarrett, A. Amaro-Ortiz, T. Scott, UV radiation and the skin, Int. J. Mol. Sci. 14, 12222-12248 (2013).

[0675] 39. M. Laporte, P. Galand, D. Fokan, C. De Graef, M. Heenen, Apoptosis in established and healing psoriasis, Dermatology 200, 314-316 (2000).

[0676] 40. A. Trautmann, M. Akdis, S. Klunker, K. Blaser, C. A. Akdis, Role of apoptosis in atopic dermatitis, Int. Arch. Allergy Immunol. 124, 230-232 (2001).

[0677] 41. B. Franz, B. Fritzsching, A. Riehl, N. Oberle, C. D. Klemke, J. Sykora, S. Quick, C. Stumpf, M. Hartmann, A. Enk, T. Ruzicka, P. H. Krammer, E. Suri-Payer, A. Kuhn, Low number of regulatory T cells in skin lesions of patients with cutaneous lupus erythematosus, Arthritis Rheum. 56, 1910-1920 (2007).

[0678] 42. A. H. Sawalha, K. M. Kaufman, J. A. Kelly, A. J. Adler, T. Aberle, J. Kilpatrick, E. K. Wakeland, Q. Z. Li, A. E. Wandstrat, D. R. Karp, J. A. James, J. T. Merrill, P. Lipsky, J. B. Harley, Genetic association of interleukin-21 polymorphisms with systemic lupus erythematosus, Ann. Rheum. Dis. 67, 458-461 (2008).

[0679] 43. R. Webb, J. T. Merrill, J. A. Kelly, A. Sestak, K. M. Kaufman, C. D. Langefeld, J. Ziegler, P. Robert, J. C. Edberg, R. Ramsey-goldman, M. Petri, J. D. Reveille, G. S. Alarcon, L. M. Vila, M. E. Alarcon-Riquelme, J. A. James, G. S. Gilkeson, C. O. Jacob, K. L. Moser, P. M. Gaffney, T. J. Vyse, S. K. Nath, P. Lipsky, J. B. Harley, A. H. Sawalha, A polymorphism within interleukin-21 receptor (IL21R) confers risk for systemic lupus erythematosus, Arthritis Rheumatol. 60, 2402-2407 (2009).

[0680] 44. A. Puscas, A. Catana, C. Puscas, I. Roman, C. Vomicescu, M. Somlea, R. Orasan, Psoriasis: Association of interleukin-17 gene polymorphisms with severity and response to treatment (Review), Exp. Ther. Med., 875-880 (2019).

[0681] 45. A. C. Billi, F. Ma, O. Plazyo, M. Gharaee-Kermani, R. Wasikowski, G. A. Hile, X. Xing, C. M. Yee, S. M. Rizvi, M. P. Maz, F. Wen, L. C. Tsoi, M. Pellegrini, R. L. Modlin, J. E. Gudjonsson, J. Michelle Kahlenberg, Non-lesional and Lesional Lupus Skin Share Inflammatory Phenotypes that Drive Activation of CD16+ Dendritic Cells, BioRxiv (2021), doi: 10.1101/2021.09.17.460124.

[0682] 46. E. Der, H. Suryawanshi, P. Morozov, M. Kustagi, B. Goilav, S. Ranabathou, P. Izmirly, R. Clancy, H. M. Belmont, M. Koenigsberg, M. Mokrzycki, H. Rominieki, J. A. Graham, J. P. Rocca, N. Bomkamp, N. Jordan, E. Schulte, M. Wu, J. Pullman, K. Slowikowski, S. Raychaudhuri, J. Guthridge, J. James, J. Buyon, T. Tuschl, C. Putterman, J. Anolik, W. Apruzzese, A. Arazi, C. Berthier, M. Brenner, J. Buyon, R. Clancy, S. Connery, M. Cunningham, M. Dall’Era, A. Davidson, E. Der, A. Fava, C. Fonseka, R. Furie, D. Goldman, R. Gupta, J. Guthridge, N. Hacohen, D. Hildeman, P. Hoover, R. Hsu, J. James, R. Kado, K. Kalunian, D. Kamen, M. Kretzler, H. Maecker, E. Massarotti, W. McCune, M. McMahon, M. Park, F. Payan-Schober, W. Pendergraft, M. Petri, M. Pichavant, C. Putterman, D. Rao, S. Raychaudhuri, K. Slowikowski, H. Suryawanshi, T. Tuschl, P. Utz, D. Waguespack, D. Wofsy, F. Zhang, Tubular cell and keratinocyte single-cell transcriptomics applied to lupus nephritis reveal type I IFN and fibrosis relevant pathways, Nat. Immunol. (2019), doi: 10.1038/s41590-019-0386-1.

[0683] 47. T. M. Li, K. R. Veiga, N. Schwartz, Y. Chinenov, D. J. Oliver, J. Lora, A. Jabbari, Y. Liu, W. D. Shipman, M. J. Sandoval, I. F. Sollohub, W. G. Ambler, M. Rashighi, J. G. Krueger, N. Anandasabapathy, C. P. Blobel, T. T. Lu, Type I interferon modulates Langerhans cell ADAM17 to promote photosensitivity in lupus, BioRxiv (2021), doi: 10.1101/2021.08.18.456792.

[0684] 48. K. A. Kirou, C. Lee, S. George, K. Louca, M. G. E. Peterson, M. K. Crow, Activation of the interferon-α pathway identifies a subgroup of systemic lupus erythematosus patients with distinct serologic features and active disease, Arthritis Rheum. 52, 1491-1503 (2005).

[0685] 49. Q. Z. Li, J. Zhou, Y. Lian, B. Zhang, V. K. Branch, F. Carr-Johnson, D. R. Karp, C. Mohan, E. K. Wakeland, N. J. Olsen, Interferon signature gene expression is correlated with autoantibody profiles in patients with incomplete lupus syndromes, Clin. Exp. Immunol. 159, 281-291 (2010).

[0686] 50. T. Iwamoto, J. Dorschner, M. Jolly, X. Huang, T. B. Niewold, Associations between type I interferon and antiphospholipid antibody status differ between ancestral backgrounds, Lupus Sci. Med. 5, 1-4 (2018).

[0687] 51. C. C. Berthier, L. C. Tsoi, T. J. Reed, J. N. Stannard, E. M. Myers, R. Namas, X. Xing, S. Lazar, L. Lowe, M. Kretzler, J. E. Gudjonsson, J. M. Kahlenberg, Molecular Profiling of Cutaneous Lupus Lesions Identifies Subgroups Distinct from Clinical Phenotypes, J. Clin. Med. 8, 1244 (2019).

[0688] 52. L. E. Tomalin, C. B. Russell, S. Garcet, D. A. Ewald, P. Klekotka, A. Nirula, H. Norsgaard, M. Suarez-Farinas, J. G. Krueger, Short-term transcriptional response to IL-17 receptor-A antagonism in the treatment of psoriasis, J. Allergy Clin. Immunol. 145, 922-932 (2020).

[0689] 53. M. Robert, P. Miossec, Interleukin-17 and lupus: enough to be a target? For which patients?, Lupus 29, 6-14 (2020).

[0690] 54. B. Ungar, A. B. Pavel, R. Li, G. Kimmel, J. Nia, P. Hashim, H. J. Kim, M. Chima, A. S. Vekaria, Y. Estrada, H. Xu, X. Peng, G. K. Singer, D. Baum, Y. Mansouri, M. Taliercio, E. Guttman-Yassky, Phase 2 randomized, double-blind study of IL-17 targeting with secukinumab in atopic dermatitis, J. Allergy Clin. Immunol. 147, 394-397 (2021).

[0691] 55. A Study to Assess the Safety and Efficacy of Secukinumab in Alleviating Symptoms of Discoid Lupus Erythematosus, U.S. Natl. Libr. Med. Clin. Trials (2021), doi: 10.31525/ct1-nct03866317.

[0692] 56. S. Tyring, A. Gottlieb, K. Papp, K. Gordon, C. Leonardi, A. Wang, D. Lalla, M. Woolley, A. Jahreis, R. Zitnik, D. Cella, R. Krishnan, Etanercept and clinical outcomes, fatigue, and depression in psoriasis: double-blind placebo-controlled randomised phase III trial, Lancet 367, 29-35 (2006).

[0693] 57. K. Reich, F. O. Nestle, K. Papp, J. P. Ortonne, R. Evans, C. Guzzo, S. Li, L. T. Dooley, C. E. M. Griffiths, Infliximab induction and maintenance therapy for moderate-to-severe psoriasis: A phase III, multicentre, double-blind trial, Lancet 366, 1367-1374 (2005).

[0694] 58. A. Menter, S. K. Tyring, K. Gordon, A. B. Kimball, C. L. Leonardi, R. G. Langley, B. E. Strober, M. Kaul, Y. Gu, M. Okun, K. Papp, Adalimumab therapy for moderate to severe psoriasis: A randomized, controlled phase III trial, J. Am. Acad. Dermatol. 58, 106-115 (2008).

[0695] 59. A. Blauvelt, K. Reich, M. Lebwohl, D. Burge, C. Arendt, L. Peterson, J. Drew, R. Rolleri, A. B. Gottlieb, Certolizumab pegol for the treatment of patients with moderate-to-severe chronic plaque psoriasis: pooled analysis of week 16 data from three randomized controlled trials, J. Eur. Acad. Dermatology Venereol. 33, 546-552 (2019).

[0696] 60. A. Lorenzo-Vizcaya, D. A. Isenberg, The use of anti-TNF-alpha therapies for patients with systemic lupus erythematosus. Where are we now?, Expert Opin. Biol. Ther. 21, 639-647 (2021).

[0697] 61. A. Jacobi, C. Antoni, B. Manger, G. Schuler, M. Hertl, Infliximab in the treatment of moderate to severe atopic dermatitis, J. Am. Acad. Dermatol. 52, 522-526 (2005).

[0698] 62. N. Cassano, F. Loconsole, C. Coviello, G. A. Vena, Case report: Infliximab in recalcitrant severe atopic eczema associated with contact allergy (2006).

[0699] 63. M. Yuzaiful, M. Yusof, M. Wittmann, C. Fernandez, D. Wilson, S. Edward, G. Abignano, A. Alase, P. Laws, M. Goodfield, P. Emery, E. Vita, Targeted therapy using intradermal injection of etanercept for remission induction in discoid lupus erythematosus (TARGET-DLE): results from a proof-of-concept phase II trial, Lupus Sci. Med. 6, A1-A227 (2019).

[0700] 64. Targeted Therapy Using Intradermal Injection of Etanercept for Remission Induction in Discoid Lupus Erythematosus (TARGET-DLE), U.S. Natl. Libr. Med. Clin. Trials (2019).

[0701] 65. A. B. Gottlieb, A. M. Goldminz, Ustekinumab for psoriasis and psoriatic arthritis, J. Rheumatol. 39, 86-89 (2012).

[0702] 66. A Study of Ustekinumab in Participants With Active Systemic Lupus Erythematosus (2021; https://clinicaltrials.gov/ct2/show/NCT03517722?term=ustekinumab&cond=lupus&draw=2&rank=3).

[0703] 67. Janssen Pharmaceuticals, Janssen announces discontinuation of Phase 3 LOTUS study evaluating Ustekinumab in systemic lupus erythematosus, Janssen Pharm. (2020) (available at https://www.jnj.com/janssen-announces-discontinuation-of-phase-3-lotus-study-evaluating-ustekinumab-in-systemic-lupus-erythematosus).

[0704] 68. R. F. van Vollenhoven, B. H. Hahn, G. C. Tsokos, C. L. Wagner, P. Lipsky, Z. Touma, V. P. Werth, R. M. Gordon, B. Zhou, B. Hsu, M. Chevrier, M. Triebel, J. L. Jordan, S. Rose, Efficacy and safety of ustekinumab, an IL-12 and IL-23 inhibitor, in patients with active systemic lupus erythematosus: results of a multicentre, double-blind, phase 2, randomised, controlled study, Lancet 392, 1330-1339 (2018).

[0705] 69. W. D. Shipman, S. Chyou, A. Ramanathan, P. M. Izmirly, S. Sharma, T. Pannellini, D. C. Dasoveanu, X. Qing, C. M. Magro, R. D. Granstein, M. A. Lowes, E. G. Pamer, D. H. Kaplan, J. E. Salmon, B. J. Mehrara, J. W. Young, R. M. Clancy, C. P. Blobel, T. T. Lu, A protective Langerhans cell keratinocyte axis that is dysfunctional in photosensitivity, Sci. Transl. Med. 10 (2018), doi: 10.1126/scitranslmed.aap9527.

[0706] 70. S. Tian, J. G. Krueger, K. Li, A. Jabbari, C. Brodmerkel, M. A. Lowes, M. Suarez-Farinas, Meta-Analysis Derived (MAD) Transcriptome of Psoriasis Defines the “Core” Pathogenesis of Disease, PLoS One 7 (2012), doi:10.1371/journal.pone.0044274.

[0707] 71. L. Mobus, E. Rodriguez, I. Harder, D. Stolzl, N. Boraczynski, S. Gerdes, A. Kleinheinz, S. Abraham, A. Heratizadeh, C. Handrick, E. Haufe, T. Werfel, J. Schmitt, S. Weidinger, Atopic dermatitis displays stable and dynamic skin transcriptome signatures, J. Allergy Clin. Immunol. 147, 213-223 (2021).

[0708] 72. L. C. Tsoi, M. T. Patrick, S. Shuai, M. K. Sarkar, S. Chi, B. Ruffino, A. C. Billi, X. Xing, R. Uppala, C. Zang, J. Fullmer, Z. He, E. Maverakis, N. N. Mehta, B. E. Perez White, S. Getsios, Y. Helfrich, J. J. Voorhees, J. M. Kahlenberg, S. Weidinger, J. E. Gudjonsson, Cytokine responses in nonlesional psoriatic skin as clinical predictor to anti-TNF agents, J. Allergy Clin. Immunol. 75, 15-18 (2021).

[0709] 73. M. K. Sarkar, G. A. Hile, L. C. Tsoi, X. Xing, J. Liu, Y. Liang, C. C. Berthier, W. R. Swindell, M. T. Patrick, S. Shao, P. S. Tsou, R. Uppala, M. A. Beamer, A. Srivastava, S. L. Bielas, P. W. Harms, S. Getsios, J. T. Elder, J. J. Voorhees, J. E. Gudjonsson, J. M. Kahlenberg, Photosensitivity and type I IFN responses in cutaneous lupus are driven by epidermal-derived interferon kappa, Ann. Rheum. Dis. 77, 1653-1664 (2018).

[0710] 74. A. C. Billi, M. Gharaee-Kermani, J. Fullmer, L. C. Tsoi, B. D. Hill, D. Gruszka, J. Ludwig, X. Xing, S. Estadt, S. J. Wolf, S. M. Rizvi, C. C. Berthier, J. B. Hodgin, M. A. Beamer, M. K. Sarkar, Y. Liang, R. Uppala, S. Shao, C. Zeng, P. W. Harms, M. E. Verhaegen, J. J. Voorhees, F. Wen, N. L. Ward, A. A. Dlugosz, J. M. Kahlenberg, J. E. Gudjonsson, The female-biased factor VGLL3 drives cutaneous and systemic autoimmunity, JCI Insight 4 (2019), doi: 10.1172/jci.insight.127291.

[0711] 75. C. Brodmerkel, K. Li, S. Garcet, K. Hayden, A. Chiricozzi, I. Novitskaya, J. Fuentes-Duculan, M. Suarez-Farinas, K. Campbell, J. G. Krueger, Modulation of inflammatory gene transcripts in psoriasis vulgaris: Differences between ustekinumab and etanercept, J. Allergy Clin. Immunol. 143, 1962-1965 (2019).

[0712] 76. E. Guttman-Yassky, R. Bissonnette, B. Ungar, M. Suarez-Farinas, M. Ardeleanu, H. Esaki, M. Suprun, Y. Estrada, H. Xu, X. Peng, J. I. Silverberg, A. Menter, J. G. Krueger, R. Zhang, U. Chaudhry, B. Swanson, N. M. H. Graham, G. Pirozzi, G. D. Yancopoulos, J. D. Jennifer, Dupilumab progressively improves systemic and cutaneous abnormalities in patients with atopic dermatitis, J. Allergy Clin. Immunol. 143, 155-172 (2019).

[0713] 77. S. Assassi, W. R. Swindell, M. Wu, F. D. Tan, D. Khanna, D. E. Furst, D. P. Tashkin, R. R. Jahan-Tigh, M. D. Mayes, J. E. Gudjonsson, J. T. Chang, Dissecting the heterogeneity of skin gene expression patterns in systemic sclerosis, Arthritis Rheumatol. 67, 3016-3026 (2015).

[0714] 78. B. Skaug, D. Khanna, W. R. Swindell, M. E. Hinchcliff, T. M. Frech, V. D. Steen, F. N. Hant, J. K. Gordon, A. A. Shah, L. Zhu, J. Zheng, J. L. Browning, A. M. S. S. Barron, M. Wu, S. Visvanathan, P. Baum, J. M. Franks, M. L. Whitfield, V. K. Shanmugam, R. T. Domsic, F. V. Castelino, E. J. Bernstein, N. Wareing, M. A. Lyons, J. Ying, J. Charles, M. D. Mayes, S. Assassi, Global skin gene expression analysis of early diffuse cutaneous systemic sclerosis shows a prominent innate and adaptive inflammatory profile, Ann. Rheum. Dis. 79, 379-386 (2020).

[0715] 79. E. Eisenberg, E. Y. Levanon, Human housekeeping genes, revisited, Trends Genet. 29, 569-574 (2013).

[0716] 80. A. M. Bolger, M. Lohse, B. Usadel, Trimmomatic: A flexible trimmer for Illumina sequence data, Bioinformatics 30, 2114-2120 (2014).

[0717] 81. A. Dobin, C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, T. R. Gingeras, STAR: Ultrafast universal RNA-seq aligner, Bioinformatics 29, 15-21 (2013).

[0718] 82. A. Tarasov, A. J. Vilella, E. Cuppen, I. J. Nijman, P. Prins, Sambamba: fast processing of NGS alignment formats, Bioinformatics 31, 2032-2034 (2015).

[0719] 83. Y. Liao, G. K. Smyth, W. Shi, FeatureCounts: An efficient general purpose program for assigning sequence reads to genomic features, Bioinformatics 30, 923-930 (2014).

[0720] 84. Y. Liao, G. K. Smyth, W. Shi, The Subread aligner: Fast, accurate and scalable read mapping by seed-and-vote, Nucleic Acids Res. 41 (2013), doi: 10.1093/nar/gkt214.

[0721] 85. M. I. Love, W. Huber, S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biol. 15, 1-21 (2014).

[0722] 86. S. Hanzelmann, R. Castelo, J. Guinney, GSVA: gene set variation analysis for microarray and RNA-Seq data, BMC Bioinformatics 14, 7 (2013) (http://www.biomedcentral.com/1471-2105/14/7).

[0723] 87. C. Cheadle, M. P. Vawter, W. J. Freed, K. G. Becker, Analysis of Microarray Data Using Z Score Transformation (http://www.grc.nia.nih.gov/).

[0724] 88. J. Menche, E. Guney, A. Sharma, P. J. Branigan, M. J. Loza, F. Baribaud, R. Dobrin, A. L. Barabasi, Integrating personalized gene expression profiles into predictive disease-associated gene pools, npj Syst. Biol. Appl. 3 (2017), doi: 10.1038/s41540-017-0009-0.

[0725] 89. L. Fagerberg, B. M. Hallstrom, P. Oksvold, C. Kampf, D. Djureinovic, J. Odeberg, M. Habuka, S. Tahmasebpoor, A. Danielsson, K. Edlund, A. Asplund, E. Sjostedt, E. Lundberg, C. A. K. Szigyarto, M. Skogs, J. Ottosson Takanen, H. Berling, H. Tegel, J. Mulder, P. Nilsson, J. M. Schwenk, C. Lindskog, F. Danielsson, A. Mardinoglu, A. Sivertsson, K. Von Feilitzen, M. Forsberg, M. Zwahlen, I. Olsson, S. Navani, M. Huss, J. Nielsen, F. Ponten, M. Uhlen, Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics, Mol. Cell. Proteomics 13, 397-406 (2014).

[0726] 90. M. Uhlen, L. Fagerberg, B. M. Hallstrom, C. Lindskog, P. Oksvold, A. Mardinoglu, A. Sivertsson, C. Kampf, E. Sjostedt, A. Asplund, I. M. Olsson, K. Edlund, E. Lundberg, S. Navani, C. A. K. Szigyarto, J. Odeberg, D. Djureinovic, J. O. Takanen, S. Hober, T. Alm, P. H. Edqvist, H. Berling, H. Tegel, J. Mulder, J. Rockberg, P. Nilsson, J. M. Schwenk, M. Hamsten, K. Von Feilitzen, M. Forsberg, L. Persson, F. Johansson, M. Zwahlen, G. Von Heijne, J. Nielsen, F. Ponten, Tissue-based map of the human proteome, Science 347, 394 (2015).

[0727] 91. A. Gazel, P. Ramphal, M. Rosdy, B. De Wever, C. Tomier, N. Hosein, B. Lee, M. Tomic-Canic, M. Blumenberg, Transcriptional Profiling of Epidermal Keratinocytes: Comparison of Genes Expressed in Skin, Cultured Keratinocytes, and Reconstituted Epidermis, Using Large DNA Microarrays, J. Invest. Dermatol. 121, 1459-1468 (2003).

[0728] 92. Q. Xu, P. K. Majumder, K. Ross, Y. Shim, T. R. Golub, M. Loda, W. R. Sellers, Identification of prostate cancer modifier pathways using parental strain expression mapping, Proc. Natl. Acad. Sci. U. S. A. 104, 17771-17776 (2007).

[0729] 93. A. T. J. Wierenga, A. Cunningham, A. Erdem, N. V. Lopera, A. Z. Brouwers-Vos, M. Pruis, A. B. Mulder, U. L. Gunther, J. H. A. Martens, E. Vellenga, J. J. Schuringa, EHFl/2-exerted control over glycolytic gene expression is not functionally relevant for glycolysis in human leukemic stem/progenitor cells, Cancer Metab. 7, 1-17 (2019).

[0730] 94. R. Mansourian, D. M. Mutch, N. Antille, J. Aubert, P. Fogel, J. M. Le Goff, J. Moulin, A. Petrov, A. Rytz, J. J. Voegel, M. A. Roberts, The Global Error Assessment (GEA) model for the selection of differentially expressed genes in microarray data, Bioinformatics 20, 2726-2737 (2004).

[0731] 95. T. Banno, A. Gazel, M. Blumenberg, Pathway-specific profiling identifies the NF-κB-dependent tumor necrosis factor α-regulated genes in epidermal keratinocytes, J. Biol. Chem. 280, 18973-18980 (2005).

[0732] 96. K. Wolk, E. Witte, E. Wallace, W. D. Döcke, S. Kunz, K. Asadullah, H. D. Volk, W. Sterry, R. Sabat, IL-22 regulates the expression of genes responsible for antimicrobial defense, cellular differentiation, and mobility in keratinocytes: A potential role in psoriasis, Eur. J. Immunol. 36, 1309-1323 (2006).

[0733] 97. C. F. Cheng, J. Fan, B. Bandyopadhyay, D. Mock, S. Guan, M. Chen, D. T. Woodley, W. Li, Profiling motility signal-specific genes in primary human keratinocytes, J. Invest. Dermatol. 128, 1981-1990 (2008).

[0734] 98. F. Shen, S. L. Gaffen, Structure-function relationships in the IL-17 receptor: Implications for signal transduction and therapy, Cytokine 41, 92-104 (2008).

[0735] 99. S. Yano, T. Banno, R. Walsh, M. Blumenberg, Transcriptional responses of human epidermal keratinocytes to cytokine interleukin-1, J. Cell. Physiol. 214, 1-13 (2008).

[0736] 100. Y. Yao, L. Richman, C. Morehouse, M. de los Reyes, B. W. Higgs, A. Boutrin, B. White, A. Coyle, J. Krueger, P. A. Kiener, B. Jallal, Type I interferon: Potential therapeutic target for psoriasis?, PLoS One 3 (2008), doi: 10.1371/journal.pone.0002737.

[0737] 101. M. Suarez-Farinas, M. A. Lowes, L. C. Zaba, J. G. Krueger, Evaluation of the Psoriasis Transcriptome across Different Studies by Gene Set Enrichment Analysis (GSEA), PLoS One 5 (2010), doi: 10.1371/journal.pone.0010247.

[0738] 102. A. Chiricozzi, E. Guttman-Yassky, M. Suarez-Farinas, K. E. Nograles, S. Tian, I. Cardinale, S. Chimenti, J. G. Krueger, Integrative responses to IL-17 and TNF-α in human keratinocytes account for key inflammatory pathogenic circuits in psoriasis, J. Invest. Dermatol. 131, 677-687 (2011).

[0739] 103. H. Fujita, The role of IL-22 and Th22 cells in human skin diseases, J. Dermatol. Sci. 72, 3-8 (2013).

[0740] 104. S. Hirakawa, R. Saito, H. Ohara, R. Okuyama, S. Aiba, Dual Oxidase 1 Induced by Th2 Cytokines Promotes STAT6 Phosphorylation via Oxidative Inactivation of Protein Tyrosine Phosphatase 1B in Human Epidermal Keratinocytes, J. Immunol. 186, 4762-4770 (2011).

[0741] 105. V. Ramirez-Carrozzi, A. Sambandam, E. Luis, Z. Lin, S. Jeet, J. Lesch, J. Hackney, J. Kim, M. Zhou, J. Lai, Z. Modrusan, T. Sai, W. Lee, M. Xu, P. Caplazi, L. Diehl, J. De Voss, M. Balazs, L. Gonzalez, H. Singh, W. Ouyang, R. Pappu, IL-17C regulates the innate immune function of epithelial cells in an autocrine manner, Nat. Immunol. 12, 1159-1166 (2011).

[0742] 106. W. R. Swindell, A. Johnston, J. J. Voorhees, J. T. Elder, J. E. Gudjonsson, Dissecting the psoriasis transcriptome: Inflammatory- and cytokine-driven gene expression in lesions from 163 patients, BMC Genomics 14 (2013), doi:10.1186/1471-2164-14-527.

[0743] 107. B. Li, L. C. Tsoi, W. R. Swindell, J. E. Gudjonsson, T. Tejasvi, A. Johnston, J. Ding, P. E. Stuart, X. Xing, J. J. Kochkodan, J. J. Voorhees, H. M. Kang, R. P. Nair, G. R. Abecasis, J. T. Elder, Transcriptome analysis of psoriasis in a large case-control sample: RNA-seq provides insights into disease mechanisms, J. Invest. Dermatol. 134, 1828-1838 (2014).

[0744] 108. T. Duhen, C. Ni, D. Campbell, Identification of a specific gene signature in human Th1/17 cells (BA13P.126), J. Immunol. 192, 177.12 (2014).

[0745] 109. B. Hollbacher, T. Duhen, S. Motley, M. M. Klicznik, I. K. Gratz, D. J. Campbell, Transcriptomic profiling of human effector and regulatory T cell subsets identifies predictive population signatures, Immunohorizons 4, 585-596 (2021).

[0746] 110. J. B. Wing, Y. Kitagawa, M. Locci, H. Hume, C. Tay, T. Morita, Y. Kidani, K. Matsuda, T. Inoue, T. Kurosaki, S. Crotty, C. Coban, N. Ohkura, S. Sakaguchi, A distinct subpopulation of CD25- T-follicular regulatory cells localizes in the germinal centers, Proc. Natl. Acad. Sci. U. S. A. 114, E6400-E6409 (2017).

[0747] 111. N. Kutukculer, E. Azarsiz, G. Aksu, N. E. Karaca, CD4+CD25+Foxp3+ T regulatory cells, Th1 (CCR5, IL-2, IFN-γ) and Th2 (CCR4, IL-4, IL-13) type chemokine receptors and intracellular cytokines in children with common variable immunodeficiency, Int. J. Immunopathol. Pharmacol. 29, 241-251 (2016).

[0748] 112. L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone, Classification and Regression Trees (1984).

[0749] 113. L. Breiman, Random forests, Mach. Learn. 45, 5-32 (2001).

[0750] 114. R. Blagus, L. Lusa, SMOTE for high-dimensional class-imbalanced data, BMC Bioinformatics 14 (2013), doi: 10.1186/1471-2105-14-106.

[0751] 115. L. V. Hedges, Distribution Theory for Glass’s Estimator of Effect Size and Related Estimators, J. Educ. Stat. 6, 107-128 (1981).

[0752] 116. Z. Gu, R. Eils, M. Schlesner, Complex heatmaps reveal patterns and correlations in multidimensional genomic data, Bioinformatics 32, 2847-2849 (2016).

[0753] 117. B. A. Martinez, S. Shrotri, K. Kingsmore, P. Bachali, A. Grammer, P. Lipsky, Machine learning reveals distinct gene signature profiles in lesional and nonlesional regions of inflammatory skin diseases, Science Advances 8, eabn4776 (29 April 2022), including supplementary materials, incorporated by reference herein in its entirety.

[0754] 118. K. M. Kingsmore, P. Bachali, M. D. Catalina, A. R. Daamen, S. E. Heuer, R. D. Robl, A. C. Grammer, P. E. Lipsky, Altered expression of genes controlling metabolism characterizes the tissue response to immune injury in lupus, Scientific Reports 11, 14789 (20 July 2021), including supplementary materials, incorporated by reference herein in its entirety.

[0755] 119. Kingsmore et al., Metabolic dysregulation characterizes the tissue response to immune injury in systemic lupus erythematosus and inflammatory skin diseases, Lupus 21st Century 2021, incorporated by reference herein in its entirety.

[0756] 120. Martinez et al., Comparative analysis of inflammatory skin diseases reveals shared and distinct gene signature profiles in lesional and nonlesional regions, #507, Society for Investigative Dermatology annual meeting, May 18-21, 2022, incorporated by reference herein in its entirety.

Example 3: The transcriptomic landscape of nephritic kidneys reveals mechanisms for end organ resistance to damage in lupus-prone mice

[0757] Pathologic inflammation is a major driver of kidney damage in lupus nephritis (LN), but the immune mechanisms of disease progression and the risk factors for end organ damage are poorly understood. To characterize molecular profiles through the development of LN, we carried out gene expression analysis of micro-dissected kidneys from lupus-prone NZM2328 mice. We identified a continuum of inflammatory processes associated with the progression from acute inflammatory to chronic destructive disease, which initiated in the glomeruli and progressed to the tubulointerstitium. As a means to define pathogenic processes, we examined male mice and the congenic NZM2328.R27 strain, both of which are resistant to the development of chronic nephritis and end organ damage. Male mice exhibited minimal immune infiltration in the glomeruli, resulting in non-progressive renal pathology. Immune infiltrates in the glomeruli of R27 mice expressed a regulatory gene signature and, in particular, a dominance of M2-like macrophages. Moreover, R27 mice manifested an enhanced kidney tubule signature, with evidence of increased mitochondrial activity, glycolysis, and lipid metabolism, consistent with a functional resistance to cellular damage. The robust tubule signature was associated with the absence of an immune/inflammatory gene signature. Numerous genes in the R27 genetic region were upregulated in NZM2328 nephritic kidneys and could contribute to the protective effect of this interval on the evolution of LN.

[0758] Systemic lupus erythematosus (SLE) is an autoimmune disorder that can affect a variety of tissues, including the kidney (1). Lupus nephritis (LN) is one of the most severe organ manifestations of SLE; it affects approximately 40% of adult lupus patients, and 10-20% of patients develop end-stage renal disease (ESRD) (2). Disease is thought to initiate in the kidney glomeruli with immune complex (IC) deposition and complement activation, leading to the release of damage-associated molecular patterns (DAMPs), cytokine production, and the infiltration of inflammatory cells that amplify and sustain inflammation (3, 4). Damage to the kidney glomeruli promotes ischemic damage and chronic hypoxia, compromising the downstream blood supply to the tubulointerstitium (TI) and reducing tubule cell viability; these tubulointerstitial changes serve as prognostic markers for the development of ESRD (5-7).

[0759] In current practice, the severity of lupus nephritis is determined by histological classification, which is used to guide therapeutic decisions and to assess the potential for terminal kidney damage (8-10). However, despite advances in classification, the factors controlling the conversion of acute to chronic nephritis remain unclear, and there are no proven treatments to prevent ESRD (11-13). Therefore, there remains a need to understand the risk factors for chronic disease and the stages of inflammation leading to ESRD in greater detail.

[0760] Previous studies established the NZM2328 lupus-prone mouse strain as a model for human proliferative glomerulonephritis (GN) with severe IC-mediated nephritis and early mortality predominantly affecting female mice (14-16). These studies determined that disease in female NZM2328 mice presents in two stages: acute GN (AGN), with pathology largely confined to the glomerulus, and chronic GN (CGN), in which inflammation and tissue damage are also found among and between the tubules (15, 16). Each stage was associated with a genetic locus on chromosome 1, and the Agnz1 and Cgnz1 regions were found to be critical for the development of AGN and CGN, respectively. In addition, the NZM2328.Lc1R27 (R27) recombinant strain was generated by replacing the Cgnz1 region of NZM2328 with that from the C57BL/J strain, such that female R27 mice develop AGN but do not progress to CGN. Similarly, male NZM2328 mice develop a milder, acute form of nephritis but do not exhibit severe proteinuria or progress to chronic disease (17, 18). Despite these clear clinical distinctions, the nature of the immune/inflammatory response driving AGN, and the contribution of tissue and inflammatory cells and processes to the development of CGN resulting in renal failure and early death, have not been described.

[0761] The heterogeneity in disease presentation among LN patients and the difficulty in predicting therapeutic responses from histology alone have highlighted the utility of molecular profiling to improve classification of lupus kidney pathology (13, 19). Here, to understand the pathogenesis of lupus nephritis, and especially the relationship between acute and chronic GN, in greater detail, we utilized transcriptome analysis to define the stages of GN in NZM2328 mice and to identify pathologic immune populations and processes associated with disease progression. In addition, we identified distinct mechanisms of resistance to chronic disease based on differences in sex and genetics, with implications for elucidating risk factors for the development of ESRD in human lupus patients.

Results

[0762] Transcriptional profiling uncovers immune populations present at the onset of GN in NZM2328 mice. To establish the inflammatory environment in the kidney at disease onset, we analyzed the transcriptomes of micro-dissected glomeruli and tubulointerstitial tissue from the kidneys of female NZM2328 mice. Tissues from 8-9-week-old mice (CTL) were used as controls for 36-week-old mice with AGN, as characterized in our previous studies (14-16). We carried out Gene Set Variation Analysis (GSVA) (20) using a set of curated gene sets to analyze gene expression from CTL and AGN mice and to identify immunologic gene signature enrichment at the acute stage (FIGs. 66A-D, Tables 19-1 to 19-36). The glomeruli of AGN mice were enriched for gene signatures of a number of pathologic immune cell types, including myeloid cells, M1 macrophages (MΦs), antigen presenting cells (APCs), CD8 T cells, and T follicular helper (Tfh) cells (FIG. 66A). In addition, genes encoding immune cell receptors, including pattern recognition receptors (PRRs) as well as major histocompatibility complex (MHC) class I and II molecules, were significantly elevated in AGN glomeruli. The tubulointerstitium (TI) of AGN mice was enriched for many of the same immune signatures, including myeloid cells, M1 MΦs, APCs, and MHC class II, as well as the IG chain signature indicative of the presence of a plasma cell infiltrate (FIG. 66B). However, despite the presence of signatures indicative of immune cell infiltrates, gene signatures of podocytes in the glomeruli (FIG. 66C) and of tubule cells in the TI (FIG. 66D) were not significantly different from CTL. Thus, the kidneys of mice with AGN are enriched for pro-inflammatory and predominantly innate immune cell gene signatures, with no evidence of damage to the kidney cells.
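
GSVA estimates each gene's expression distribution across samples with a kernel and then computes a Kolmogorov-Smirnov-like random-walk statistic for every sample, which is more than a short example can show faithfully. To illustrate only the underlying idea (turning a genes-by-samples matrix into one score per sample for each curated gene set), the Python sketch below uses a simpler mean z-score stand-in. It is not GSVA and is not the method used in this example; the function name, input layout, and toy data are hypothetical.

```python
from statistics import mean, pstdev


def gene_set_scores(expr, gene_set):
    """Score each sample for one gene set as the mean per-gene z-score.

    Simplified stand-in for GSVA (NOT GSVA itself).
    expr: dict mapping gene name -> list of log2 expression values, one per
          sample (all lists assumed to share the same length and sample order).
    gene_set: iterable of gene names; genes absent from expr are ignored.
    """
    members = [g for g in gene_set if g in expr]
    if not members:
        raise ValueError("none of the gene set members are in the expression data")
    n_samples = len(next(iter(expr.values())))
    z_rows = []
    for gene in members:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        # A gene with no variance across samples carries no information: z = 0.
        z_rows.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    # Average the member genes' z-scores within each sample (column).
    return [mean(row[j] for row in z_rows) for j in range(n_samples)]


# Toy data (hypothetical): both set members rise in samples 3-4; geneC is not in the set.
expr = {
    "geneA": [1, 1, 5, 5],
    "geneB": [2, 2, 6, 6],
    "geneC": [8, 8, 8, 8],
}
print(gene_set_scores(expr, {"geneA", "geneB"}))  # [-1.0, -1.0, 1.0, 1.0]
```

In practice, GSVA itself is distributed as the Bioconductor package GSVA for R.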

[0763] Renal disease of NZM2328 mice is characterized by escalating stages of inflammation.

To identify different stages of GN in the kidneys of female NZM2328 mice, we carried out histological studies at regular intervals throughout disease progression (FIGs. 67A-D; Table 17). Tissues from young mice, before disease development and without evidence of kidney pathology, were used as controls for diseased mice (FIG. 67A). At the AGN stage, glomeruli were increased in size, with evidence of mesangial expansion, immune cell infiltration, and immune complex deposition, including IgG, C3, and anti-nuclear antibody (ANA) deposits (FIG. 67B). Tubule cells of AGN mice were unchanged, and the interstitium exhibited mild immune cell infiltration. We identified an intermediate stage of disease progression, termed transitional GN (TGN), at which glomeruli exhibited mesangial expansion and immune cell infiltration as at the AGN stage, but levels of IgG and C3 deposition, as well as serum levels of anti-DNA antibodies, were elevated over those in AGN mice (FIG. 67C). The interstitium of TGN mice had more inflammatory cells than at the AGN stage, and tubular cells showed some dilation and atrophy; however, tubule damage was not evident. At the CGN stage, mice exhibited glomerular sclerosis, fibrosis with interstitial inflammation, and the highest level of immune complex deposition compared with earlier disease stages (FIG. 67D). In CGN-stage mice, more than 80% of tubular cells exhibited dilation, with increased evidence of atrophy and tubular casts compared with the TGN stage.

[0764] Transcriptomic analysis reveals distinct immune profiles of acute, transitional, and chronic GN in NZM2328 mice. To establish immune profiles of the stages of LN, we compared the transcriptomes of micro-dissected glomeruli and tubulointerstitial tissue from pre-disease CTL mice with those from mice with progressively more severe stages of disease (AGN: 36 weeks old; TGN: 36-37 weeks old; CGN: 39 weeks old). We carried out GSVA using a battery of informative gene sets to analyze gene expression from each cohort and to determine the immune cells and inflammatory pathways enriched through the development of CGN (FIGs. 67E-F & 68A-B). In the glomerulus, the AGN stage was characterized by enrichment of Tfh cell, APC, and CD8 T cell signatures in conjunction with PRR, MHC class II, and apoptosis gene signatures (FIG. 67E). No other immune/inflammatory pathways were significantly enriched when compared with later-stage cohorts. The immune profile of the glomerulus of TGN mice reflected an intermediate stage of renal disease and the peak of inflammatory signature enrichment. We found increased enrichment of gene signatures of germinal center (GC) B cells, myeloid cells, and MΦs, including both M1 and M2 subsets, as well as signatures of interferon (IFN)-stimulated genes, MHC class I, the cell cycle, and the Hif1a signaling pathway (FIG. 67E). In addition, the inflammatory signatures enriched in AGN mice were increased further at the transitional stage. The glomeruli of CGN mice were enriched for gene signatures of platelets and WNT signaling and de-enriched for gene signatures of mitochondrial function and amino acid metabolism (FIG. 67E). Along with the enrichment of gene signatures of inflammatory cells and pathways, we also found evidence of kidney damage in TGN and CGN mice, with de-enrichment of the podocyte gene signature (FIG. 67F).
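
This passage reports which signatures were significantly enriched at each stage but does not state which statistic was used to compare GSVA scores between cohorts. One common way to express the size of such a difference is a standardized mean difference with Hedges' small-sample correction. The Python sketch below computes Hedges' g for one signature's scores in a disease cohort versus CTL; the function name and toy scores are hypothetical, and the sketch does not reproduce the authors' analysis.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Hedges' g: standardized mean difference (a minus b) with small-sample correction."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two samples")
    # Pooled variance from the two sample variances (n - 1 denominators).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("both groups have zero variance")
    d = (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
    # Hedges' approximate bias-correction factor J = 1 - 3 / (4(n_a + n_b) - 9).
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return correction * d


# Toy GSVA scores (hypothetical) for one signature: disease cohort vs CTL.
print(round(hedges_g([1, 2, 3], [0, 1, 2]), 3))  # 0.8
```

A positive g indicates higher scores in the first group; a significance test (for example, a t test or a moderated test across many signatures) would be reported alongside it.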

[0765] Gene expression results from the TI of NZM2328 kidneys showed a progressive pattern of inflammation and kidney cell loss. The TI of AGN mice showed no abnormalities except for an increase in IG chains, indicative of plasma cells, and no changes in tissue cell gene expression (FIGs. 68A&B). In contrast, the TI regions of TGN kidneys were enriched for numerous immune and inflammatory signatures, including Th17 cells, PRRs, MHC class I and II, myeloid cells, and M1 and M2 MΦs (FIG. 68A). The TI of chronic-stage mice exhibited enrichment of IFN-stimulated genes, the cell cycle, and WNT signaling, as well as decreases in gene signatures for mitochondria, amino acid metabolism, and oxidative phosphorylation (FIG. 68A). We found further indicators of damage to the kidney tubules in chronic-stage mice, with decreases in kidney tubule cell gene signatures (FIG. 68B) that correlated with significant increases in the expression of the kidney tubule damage-associated genes Havcr1 and Lcn2 (FIGs. 68C&D). Overall, these results suggest that renal disease in female NZM2328 mice progresses from the glomerulus to the tubules and that inflammation established in the acute and transitional stages promotes a decrease in kidney cell signatures, indicative of cell damage.

[0766] Lack of inflammatory signatures in the glomerulus of NZM2328 male mice is associated with lack of progression to chronic renal disease. SLE in human patients exhibits a sex bias and affects females nine times more frequently than males (21). A similar bias occurs in the renal disease of NZM2328 mice: males develop acute GN with immune complex deposition in the kidneys but rarely develop severe proteinuria and do not progress to chronic disease (17, 18). Thus, to gain insight into the nature of disease in male NZM2328 mice, we evaluated the transcriptomes of glomeruli from 10-month-old male mice with AGN compared with those from pre-disease, 8-9-week-old mice (FIGs. 69A-C). Even though male mice were selected because they had evident immune complex deposition, male AGN mice, in contrast to females, were not enriched for gene signatures indicative of a robust adaptive immune response, myeloid cell infiltration, or increased inflammation in the kidneys (FIG. 69A). Instead, males exhibited enrichment for signatures of mRNA splicing and transcription factors and de-enrichment of metabolic pathways, including glycolysis, oxidative phosphorylation, and fatty acid oxidation. In addition, there was no difference in expression of kidney tissue signatures in male mice with acute-stage disease compared with normal controls (FIG. 69B).

[0767] The increased rate of SLE in females over males implicates sex hormones and downstream cellular processes as potential contributors to disease (22, 23). To assess the role of sex hormones in renal disease in NZM2328 mice, we developed signatures of estrogen- and androgen-regulated genes and compared their enrichment in female and male mice with AGN with that in normal controls (FIG. 69C). Female AGN mice exhibited no differences in enrichment of hormone-regulated gene signatures. However, in males, androgen-regulated genes were decreased in AGN mice. Furthermore, the majority of androgen-regulated genes contributing to the de-enrichment in male mice were related to mitochondrial function and metabolic pathways, including Akap1, Cox6b1, Iapp, Mrps6, Mybbp1a, Ndufa1, Phkg2, Prelid1, Sard, and Tmem86a. This result suggested that a decreased male hormone response in AGN mice may contribute to resistance to disease progression by regulating metabolism and dampening inflammation.

[0768] Inflammatory gene signatures in the glomeruli of R27 mice differ from those in NZM2328 mice. In addition to the differences in severity between males and females, there is also a genetic component to the risk for development of chronic renal disease in female NZM2328 mice. Previous work identified the CGN-associated locus, Cgnz1, and generated the R27 congenic mouse strain by replacing the Cgnz1 locus with that from the non-lupus-prone C57BL/J mouse (15, 16). Female R27 mice develop AGN with kidney pathology similar to that of NZM2328 mice but do not progress to severe proteinuria and ESRD, suggesting that they are resistant to kidney damage. To examine the basis of non-progressive GN in R27 mice, we analyzed gene expression data from the glomeruli and tubulointerstitial tissue of normal control R27 mice (8-9 weeks) and R27 mice with AGN (12 months). R27 mice with AGN were identified by the presence of proteinuria and glomerular deposits of immunoglobulin detected by immunofluorescence.

[0769] First, to compare the inflammatory environment of R27 mice directly with that of the parent strain that develops CGN, we assessed differentially expressed genes (DEGs) from the glomeruli of NZM2328 (Table 21) and R27 mice (Table 22) relative to their respective CTLs (FIG. 70A). DEGs from the glomeruli of both NZM2328 and R27 AGN mice exhibited significant overlap with APC, myeloid cell, MΦ, and MHC signatures, indicating increased expression of these signatures. Notably, the myeloid cell and MΦ signatures exhibited more significant overlap with DEGs from NZM2328 mice. In addition, DEGs from NZM2328 but not R27 AGN mice were associated with PRRs and IFN-stimulated genes, suggesting a more severe inflammatory environment than in the glomeruli of congenic R27 mice with acute disease.
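
The text does not specify here how the significance of overlap between a DEG list and a gene signature was assessed. A standard choice for this kind of question is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched below in Python under the assumption that the DEG list and the signature have both been restricted to a common universe of measured genes. The function name and toy gene sets are hypothetical.

```python
from math import comb


def overlap_p_value(degs, signature, universe_size):
    """One-sided hypergeometric P(overlap >= observed overlap).

    degs, signature: sets of gene symbols, both assumed to be subsets of the
    same universe of universe_size measured genes.
    """
    observed = len(degs & signature)
    n_deg, n_sig = len(degs), len(signature)
    total = comb(universe_size, n_deg)  # ways to draw n_deg genes from the universe
    # Sum the probabilities of every overlap at least as large as the observed one;
    # comb(n, k) returns 0 when k > n, so impossible overlaps contribute nothing.
    tail = sum(
        comb(n_sig, i) * comb(universe_size - n_sig, n_deg - i)
        for i in range(observed, min(n_deg, n_sig) + 1)
    )
    return tail / total


# Toy example (hypothetical): 20 measured genes, 5 DEGs, a 5-gene signature, 4 shared.
degs = {"g0", "g1", "g2", "g3", "g10"}
signature = {"g0", "g1", "g2", "g3", "g4"}
print(round(overlap_p_value(degs, signature, 20), 4))  # 0.0049
```

When many signatures are tested against the same DEG list, the resulting P values would normally be adjusted for multiple testing (for example, by the Benjamini-Hochberg procedure).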

[0770] We next carried out GSVA to determine the relative enrichment of gene signatures for immune and inflammatory cells and pathways in R27 AGN mice compared with R27 CTL mice (FIGs. 70B & 71B). The glomeruli of R27 AGN mice were enriched for gene signatures indicative of inflammation, including APCs, IG chains, MΦs, and MHC class I and II (FIG. 70B). However, separate analysis of MΦ subsets revealed that a gene signature of anti-inflammatory M2, but not pro-inflammatory M1, MΦs was enriched in R27 mice. Notably, there was no difference in IFN-stimulated genes, PRRs, or Hif1a signaling between R27 control and AGN mice. Furthermore, there was no evidence of kidney damage, as kidney cell-specific gene signatures were also unchanged in R27 AGN mice (FIG. 70C). Overall, despite the increase in inflammatory cell and pathway gene signatures in R27 AGN mice, differences between the nature of this inflammation and that in NZM2328 female mice may contribute to resistance to progression to chronic disease.

[0771] NZM2328.R27 mice exhibit resistance to kidney tubule damage. To examine the status of inflammation in the TI of R27 AGN mice relative to NZM2328 mice with acute disease, we compared the overlap of DEGs from each cohort with gene signatures of immune cells and inflammatory processes (FIG. 71A). DEGs from the TI of NZM2328 AGN mice were significantly associated with signatures of APCs, myeloid cells, MΦs, and MHC class II, indicating the presence of some immune infiltrates but not to the same extent as the inflammation in the glomeruli at this early point in disease (Table 21). In contrast, DEGs from R27 AGN mice only exhibited significant overlap with APC and MHC class II and were not associated with other inflammatory signatures (Table 22). Thus, differential expression analysis from R27 AGN mice indicated a lack of disease progression from the glomeruli to the kidney tubules, manifested by decreased inflammatory signatures in the TI.

[0772] GSVA of the TI of R27 AGN mice confirmed the enrichment of APC, IG chain, and myeloid cell signatures, but not MΦ signatures, indicating less extensive infiltration of immune/inflammatory cells in R27 mice than in NZM2328 mice (FIG. 71B). Previous studies using the R27 strain found evidence that resistance to end organ damage might contribute to their decreased development of CGN (16). In support of this, we found that kidney tissue cell signatures, and in particular kidney tubule cell signatures, were significantly increased in R27 AGN mice (FIG. 71C), whereas the kidney damage-associated genes Havcr1 and Lcn2 were unchanged (FIG. 71D). We also found that gene signatures related to mitochondria, glycolysis, and lipid metabolism were increased (FIG. 71B) and were significantly correlated with the kidney tubule cell gene signature (FIG. 71E), suggesting that robust mitochondrial function may contribute to the kidney tubule cell enrichment observed in R27 AGN mice.

[0773] Kidney cell signatures enriched in NZM2328.R27 mice correlate with expression of chronic GN risk locus genes. The risk for progression to CGN in NZM2328 mice was associated with a 1.34 Mb region of chromosome 1 (Cgnz1) containing 45 genes (Table 20) (16). We analyzed differential expression of these CGN susceptibility genes in the glomerulus and TI of female NZM2328 AGN, TGN, and CGN mice and R27 AGN mice compared with normal controls to determine their contribution to renal disease progression (FIGs. 72A-E). In the glomerulus, we found that genes encoding receptors expressed on immune cells and associated with inflammation, including Cd244, Fcer1g, Fcgr3, Fcgr4, and Slamf7, were significantly increased in NZM2328 AGN mice, whereas none was overexpressed in R27 kidneys, suggesting that their expression is associated with increased inflammation and disease progression (FIG. 72A). Expression of these genes, as well as of additional immune-associated Cgnz1 locus genes, was further increased at the TGN stage, the height of inflammatory cell and pathway gene signature enrichment, and was either maintained or decreased at the CGN stage in the glomerulus of NZM2328 mice.

[0774] In the kidney TI, there were no significant differences in expression of Cgnz1 locus genes in NZM2328 or R27 female AGN mice compared with normal controls (FIG. 83). This result reflected the lack of pro-inflammatory signature enrichment in acute-stage mice. However, at the TGN stage, expression of immune-associated Cgnz1 locus genes increased significantly over that in normal control mice, providing further evidence for the critical role of the CGN risk locus in progression to chronic disease (FIG. 72B).

[0775] The increased enrichment of kidney tubule cell signatures in R27 mice suggested that the genetic background of these mice, in which the Cgnz1 chronic risk locus was replaced with that from the non-lupus-prone C57BL/J strain, rendered R27 kidneys resistant to tubule damage and prevented the progression from acute to chronic disease. To investigate the relationship between CGN risk locus gene expression and kidney tubule cell enrichment in R27 female AGN mice, we carried out linear regression analysis (FIG. 72C) and found that log2 expression values for 7 of the 45 genes composing the Cgnz1 locus (Apoa2, Fcer1g, Ncstn, Ndufs2, Nit1, Pex19, Sdhc) were significantly correlated with GSVA scores for kidney distal tubule cells and thus could play a role in promoting resistance to kidney damage. Notably, the proteins encoded by these genes are involved in mitochondrial respiration (Ndufs2, Sdhc), metabolite processing (Apoa2, Ncstn, Nit1, Pex19), and immune signaling (Fcer1g), suggesting a relationship to the increased enrichment of mitochondrial signatures in R27 AGN mice. Furthermore, log2 expression values for 4 of these genes (Apoa2, Ndufs2, Nit1, and Sdhc) were significantly correlated with kidney tubule cell GSVA scores and significantly decreased in the TI of NZM2328 CGN mice compared with normal controls (FIGs. 72D&E). To delve further into the functional pathways involving these kidney cell-associated genes, we identified upstream regulators (UPRs) using Ingenuity Pathway Analysis (IPA) (24) (Table 18). Notable UPRs predicted to drive expression of the 7 CGN risk locus genes included Rb1, Rictor, Wnt3a, Ctnnb1, and Hif1a, reflecting the involvement of cell growth regulation, WNT signaling, and stress response pathways in the kidneys of R27 AGN mice. These results suggest that the cellular functions associated with expression of some of the Cgnz1 risk locus genes in R27 mice contribute to robust mitochondrial function and promote resistance to kidney tissue damage in the context of acute nephritis. Table 20 lists the 45 Cgnz1 locus genes. Table 21 lists the DEGs (in glomeruli and TI) from AGN mice. Table 22 lists the DEGs (in glomeruli and TI) from R27 mice.

[0776] Gene co-expression network analysis identifies molecular profiles correlating with disease progression in NZM2328 mice. As an orthogonal approach to identify molecular patterns reflective of disease stage in NZM2328 mice in an unsupervised manner, we generated a network of co-expressed gene modules using multiscale embedded gene co-expression network analysis (MEGENA) and correlated individual gene modules with mouse GN stages (FIGs. 81A and B). MEGENA of gene expression results from NZM2328 mice generated 60 co-expressed gene modules for the glomerulus and 48 modules for the TI, which were divided into three megaclusters and annotated based on gene overlap with curated gene signatures as well as gene ontology (GO) terms (Tables 26-1 to 26-60, and 27-1 to 27-48). Overall, the MEGENA-derived gene modules were representative of the major cell types and processes we had previously associated with GN using curated gene signatures, including inflammatory myeloid cells, kidney tissue cells, and metabolic processes. Furthermore, k-means clustering based on the MEGENA modules successfully separated mice into cohorts based on disease severity. In the glomerulus (FIG. 81A), the coral cluster (leftmost) of CTL and AGN mice was positively correlated with gene modules associated with kidney cells and metabolic processes and negatively correlated with gene modules related to the immune/inflammatory response. Two clusters, maroon (second from left) and green (third from left), contained a combination of TGN and CGN mice and were positively correlated with immune response modules and negatively correlated with kidney cell and metabolism modules. The final cluster of CGN mice, blue (rightmost), had a negative correlation with immune response and kidney/metabolic modules but retained a high positive correlation with secreted immune factors. MEGENA results and correlations with disease stage in the TI (FIG. 81B) were similar to those in the glomerulus, but the resulting gene modules were more heavily skewed toward mitochondrial metabolism, and the blue cluster of CGN mice was still positively correlated with the immune response-associated modules. In summary, this unsupervised approach employing co-expressed gene modules yielded results that closely resemble our previously identified molecular profiles of disease progression in NZM2328 mice.

[0777] Identification of gene signatures characterizing GN stages in NZM2328 mice. We next sought to assemble a panel of curated gene signatures that would characterize the inflammatory environment in different stages of murine GN and to determine whether similar immune profiles could be identified in human LN kidneys. To accomplish this, a core set of 22 GSVA gene signatures was selected based on significant enrichment in AGN, TGN, or CGN NZM2328 mice (Tables 28-1 to 28-22, Table 2, FIGs. 67E and 68A). GSVA scores were then used as input for k-means clustering to form 4 clusters of mice from the glomerulus and TI gene expression datasets (FIGs. 82A&B). In the glomerulus, AGN mice in the maroon cluster were characterized by slightly increased enrichment of inflammatory immune cells compared with CTL mice but retained enrichment of kidney tissue cell and metabolism gene signatures (FIG. 82A). TGN mice in the green cluster exhibited the highest enrichment of all inflammatory gene signatures, accompanied by a decrease in metabolic and kidney cell signatures. CGN mice were divided among multiple clusters, reflecting heterogeneity in immune profiles among mice at this stage of disease. Two CGN mice were placed in the coral cluster (leftmost) with CTL mice, reflecting waning inflammation and retention of metabolic and kidney cell signatures. Another group of CGN mice with continued evidence of inflammatory gene signatures was found in the green cluster (third from left) with TGN mice. Finally, the blue cluster (rightmost) of CGN mice exhibited a relative de-enrichment of immune cells, kidney cells, and metabolic pathways, indicative of a post-inflammatory state with evidence of end organ damage.

[0778] Gene expression-based clustering of the TI yielded results similar to those for the glomerulus, with increasing inflammation and decreasing metabolism and kidney tubule cell gene signatures marking progression in disease severity (FIG. 82B). However, in the TI, the AGN mice clustered with CTLs, and more of the CGN mice appeared to retain immune cell enrichment, reflecting persistent immune cell infiltration.

[0779] Validation of NZM2328 gene expression patterns in an unrelated dataset

[0780] To validate the findings in NZM2328 mice in another lupus-prone strain, we applied the same approach to publicly available gene expression data from whole kidney tissue of the IFNa-accelerated NZB/W model (IFNa-NZB, GSE86423). Notably, the 22 curated gene expression signatures used to separate disease stages in NZM2328 mice followed a similar enrichment pattern over a 9-week time course in IFNa-NZB mice, indicating that this result was not unique to the gene expression dataset we generated from the NZM2328 strain (FIG. 84).

[0781] Gene signatures characterizing GN stages in NZM2328 mice identify analogous subsets of human LN patients

[0782] To determine whether immune profiles of NZM2328 mice with different stages of GN would translate to human lupus patients, we analyzed a publicly available gene expression dataset of microdissected glomeruli and TI from kidneys of patients with International Society of Nephrology (ISN) class II-IV LN as determined by histological classification (GSE32591) (27). We carried out GSVA using human orthologs of the 22 curated mouse gene signatures and identified 4 molecular endotypes by k-means clustering based on the pattern of enriched gene signatures in each individual patient (Table 17, FIGs. 82C-E). GSVA results of glomeruli and TI from kidneys of LN patients formed 4 patient clusters that exhibited gene set enrichment profiles similar to those of the nephritic kidneys of NZM2328 mice (FIGs. 82C and 82D). In both the glomerulus and TI, we observed a clear progression in which increased enrichment of inflammatory cells corresponded with de-enrichment of kidney tissue cell and metabolic pathway signatures. In addition, correlations across the cohort between LN classification and gene signature enrichment revealed increased correlations with pro-inflammatory cells in proliferative nephritis, particularly in the glomerulus, and corresponding negative correlations with kidney tissue cell signatures. However, whereas in the mouse kidneys we observed a post-inflammatory cluster of CGN stage mice, human LN samples with the greatest de-enrichment in metabolic and kidney cell signatures also retained a relatively high enrichment of immune/inflammatory cell signatures.

[0783] To confirm these results, we generated and analyzed a second gene expression dataset from whole kidneys of human LN patients in a similar manner (FIG. 82E). Notably, the gene signature enrichment profiles of the human whole kidney subsets more closely resembled clusters from mouse GN, including a cluster of samples that exhibited de-enrichment of both inflammatory and metabolic signatures. GSVA of human whole kidney gene expression using the unsupervised MEGENA modules generated for NZM2328 mouse kidneys (FIGs. 81A and 81B) also yielded similar patterns of gene expression enrichment across LN patient clusters (FIGs. 85A and 85B). As an additional approach to establish similarities between mouse and human kidney gene expression profiles, we carried out MEGENA using the human LN whole kidney dataset. The MEGENA modules generated for the NZM2328 mouse (FIGs. 81A and 81B) were then used as a reference to determine the preservation of gene module assignment between the mouse and human kidney gene co-expression networks (Tables 26-1 to 26-60 and 27-1 to 27-48).
The results indicated that 22 MEGENA modules from the mouse glomerulus and 31 MEGENA modules from the mouse TI had a significant module preservation score (z-score > 2) with human kidney modules, indicating a high degree of overlap in their gene expression profiles. Overall, these results demonstrate that gene expression analysis can be used to classify stages of GN in lupus-prone mice and that mouse kidney endotypes can be translated to human LN patients.

[0784] FIGs. 85A-B show NZM2328 mouse MEGENA module-based clustering of human LN kidneys. K-means clustering (k=4) of whole kidney samples from human LN patients was performed on GSVA scores computed from human orthologs of the MEGENA modules from NZM2328 mouse microdissected glomeruli (A) and TI (B). The optimal number of module clusters was defined by the silhouette method, and module clusters were annotated by gene overlap with curated immunologic signatures and GO terms. Heatmaps depict positive to negative GSVA scores on a red-to-blue gradient and positive to negative correlations between GSVA scores and disease classification on a gold-to-blue gradient.
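The silhouette method mentioned above selects the number of clusters that maximizes the average silhouette width. The following is a minimal Python sketch, assuming scikit-learn's `KMeans` and `silhouette_score` and a synthetic matrix whose rows stand in for the objects being clustered (modules or samples). The candidate range of k, the `n_init` value, and the data are illustrative assumptions, not settings taken from this study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def choose_k_by_silhouette(scores, k_range=range(2, 7), seed=0):
    """Return (best_k, {k: mean silhouette width}) for k-means over the rows of `scores`."""
    widths = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scores)
        widths[k] = silhouette_score(scores, labels)
    best_k = max(widths, key=widths.get)  # k with the largest average silhouette
    return best_k, widths


# Synthetic example: four well-separated groups of 10 rows, 22 GSVA-like columns each
rng = np.random.default_rng(1)
centers = rng.uniform(-0.8, 0.8, size=(4, 22))
scores = np.vstack([c + rng.normal(0, 0.05, size=(10, 22)) for c in centers])
best_k, widths = choose_k_by_silhouette(scores)
```

Because silhouette width rewards tight, well-separated groups, splitting a true group (k too large) or merging two groups (k too small) both lower the average score.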

Discussion

[0785] The challenge of classifying disease pathology in heterogeneous presentations of LN has highlighted the need for a better understanding of disease progression in the kidneys of lupus patients and of the risk factors for ESRD. To begin to address this, we used gene expression analysis to characterize the stages of autoimmune inflammation leading up to the development of chronic disease in an established murine model of LN. Mice were classified into disease stages by histological comparison, matching mice by level of disease pathology and amount of IC deposition. This analysis revealed distinct immune profiles for acute disease, following initial IC deposition in the kidney glomerulus; transitional disease, in which inflammatory cell and pathway enrichment is at its peak; and chronic disease, in which the accumulated insults result in irreversible damage to the kidney tissue.

[0786] We found evidence of selective immune cell infiltration in the glomeruli of AGN mice, including enrichment for Monocyte/MΦ, APC, MHC Class II, and Tfh cell gene signatures. This result indicates a limited set of immune/inflammatory cells present in the tissues at the initiation of AGN and likely reflects the cellular response to IC deposition (1, 15, 25). Enrichment of apoptosis and PRR gene signatures at the AGN stage may reflect the widening innate immune response triggered by early DAMP release from the kidney tissue. At the AGN stage, the glomeruli of NZM2328 mice were also enriched for CD8 T cells, which have been found to be elevated in both human and mouse GN and have been linked with disease severity (26-28). However, it is possible that in the context of AGN, these CD8 T cells act in a regulatory rather than an effector capacity, as previously suggested (29).

[0787] Our classification of the progression of GN in NZM2328 mice uncovered a newly recognized intermediate stage, between acute and chronic disease, during which we observed the greatest level of immune activity. As GN in NZM2328 mice progressed to the transitional stage, we observed a striking increase in innate immune response pathways and evidence of significant myeloid cell infiltration in the kidney tissue. Previous molecular studies of lupus nephritis report a robust IFN response as the key feature distinguishing kidneys of lupus patients from those of healthy individuals, and the TGN stage is when we first observed significant enrichment of an IFN signature in the glomeruli of diseased mice (30-32). In line with this result, we also found significant enrichment of MΦ populations in TGN mice and, in particular, those with a pro-inflammatory M1 rather than an alternatively activated M2 gene signature. MΦs with both M1 and M2 phenotypes have been described in mouse models of lupus nephritis and associated with disease pathogenesis (33-36). However, despite the production of anti-inflammatory molecules by M2 MΦs, the amplification of inflammatory cytokine production by immune and kidney tissue cells was found to overwhelm any regulatory response and promote disease progression. Kidney-infiltrating MΦs are also important mediators of damage to the kidney tissue, and we found that increases in MΦ signatures in TGN mice were accompanied by decreases in kidney cell signatures, in particular podocytes. Podocytes are frequent targets of immune infiltration in the glomerulus, and podocyte injury has been associated with proteinuria in lupus patients and is regarded as a precursor to end organ renal damage (37-39). There is debate about whether podocyte injury is a primary or a more downstream event in the progression of LN. The current data indicate that in this murine model, detectable podocyte injury is downstream of the initial events in AGN.
[0788] It has been reported that the low oxygen tension environment in the kidney becomes more hypoxic in lupus nephritis, correlates with disease severity, and is associated with mitochondrial dysfunction in lupus mouse models (40, 41). In addition, several studies supporting the “chronic hypoxia hypothesis” have identified hypoxia-induced damage in the TI as the final critical pathway leading to ESRD in human patients (3, 5, 42). Our results align with these studies, as we observed enrichment of the hypoxia response pathway through Hif1a in the glomeruli of TGN mice. Furthermore, heightened severity of disease pathology in CGN mice was accompanied by evidence of further damage to the kidney tissue, as well as a loss of mitochondrial and metabolic gene signatures suggestive of mitochondrial dysfunction. Therefore, our results support previous assertions that targeting the hypoxia response and mitochondrial dysfunction may be beneficial in the treatment of lupus patients (41, 43).

[0789] Glomeruli serve as the first connection points of kidney nephrons with the vasculature and thus are also the initial targets of ICs and infiltrating immune cells before disease progresses downstream to the kidney tubules. As a result, damage to the kidney tubules is regarded as a diagnostic marker for progression to ESRD (7, 44). In line with this, enrichment of inflammatory cell and pathway gene signatures was delayed in the TI as compared to the glomeruli of nephritic mice and resulted in de-enrichment of kidney tubule cell gene signatures in the TI of CGN mice. We also found that the expression of the kidney damage-associated genes Havcr1 and Lcn2 (45-48) was significantly elevated in the TI of CGN mice. In addition, de-enrichment of metabolic gene signatures indicative of mitochondrial dysfunction was more prevalent in the TI of CGN mice, suggesting that mitochondrial stress contributed to kidney tubule damage in late-stage disease.

[0790] Here, we also examined the mechanism(s) of resistance to chronic disease based on differences in sex and genetic background of lupus-prone NZM2328 mice. The increased prevalence of SLE in females over males in both human lupus patients and certain lupus mouse models implicates sex hormones in the pathogenesis of lupus nephritis (21, 23, 49, 50). Our analysis by both histology and gene expression-based approaches confirmed that male NZM2328 lupus-prone mice develop a milder form of AGN than female mice that does not progress to CGN (17). In addition, critical metabolic signatures, including glycolysis and oxidative phosphorylation, were decreased in male AGN mice, suggestive of a dampened inflammatory response. Analysis of sex hormone-regulated gene signatures in the kidney did not reveal a difference between female and male mice in the estrogen response, which has been associated with lupus pathogenesis in both humans and mouse models (51-54). However, in many cases the effects of estrogen regulation have been on immune cell populations, and therefore we cannot discount an influence of estrogen regulation on circulating immune cells outside of the kidney tissue. In contrast to estrogens, androgens have been implicated in immunosuppression, with decreased levels found in autoimmunity (55, 56). In line with this, androgen-regulated genes were de-enriched in male NZM2328 AGN mice, and the genes contributing to this decrease were involved in cellular metabolism, suggesting a mechanism of androgen-regulated immunosuppression through targeting of metabolic pathways that is decreased in NZM nephritic mice.

[0791] We investigated the genetically based resistance to chronic disease using female mice of the congenic strain NZM2328.R27 (16). Interestingly, the glomeruli of R27 mice exhibited evidence of MΦ infiltration, but the phenotype of these MΦs was restricted to an anti-inflammatory M2 phenotype, and there was no enrichment of the pro-inflammatory M1 gene signature observed in the base strain. This result suggests that the altered nature of the inflammatory response in R27 AGN mice contributes to end organ resistance to disease. Furthermore, the tubules of R27 AGN mice exhibited enrichment of gene signatures indicating resistance to the damaging pathologic processes stemming from inflamed glomeruli, including increased kidney tubule cell signatures in conjunction with increased mitochondrial and metabolic gene signatures.

[0792] Since the R27 strain was derived by replacing the chronic disease risk locus Cgnz1 of NZM2328, we examined the potential contribution of the 45 genes within this locus to resistance to CGN. We uncovered several pro-inflammatory genes with elevated expression in NZM2328 female mice that would promote the activation of pathogenic immune populations such as M1 MΦs and have been implicated in GN (57, 58). Furthermore, 7 risk locus genes that significantly correlated with kidney tubule cell signature enrichment in R27 AGN mice were involved in cell growth, metabolism, and WNT signaling. Genes involved in boosting mitochondrial function could counteract the risk of mitochondrial stress and loss of function that were present in late-stage NZM2328 female mice. In addition, WNT signaling has been shown to play a positive role in resolving acute kidney injury, whereas it may promote maladaptive responses during chronic disease (59).

[0793] We have identified multiple mechanisms by which lupus-prone mice acquire resistance to chronic nephritis, with implications for identifying risk factors for ESRD in human lupus patients. Interestingly, these mechanisms appear to be independent of the amount of IC deposition, as all AGN mice (NZM2328 female, NZM2328 male, and R27 female) were matched by the level of pathology before monitoring disease progression. Resistance to chronic disease in male NZM2328 mice may have occurred at the initial point of IC deposition in the glomerulus, which failed to elicit a potent inflammatory response, possibly related to androgen-dependent suppression of energy-producing metabolic pathways. Resistance to chronic disease in R27 mice was associated with an altered composition of immune cells in the glomeruli that resulted in a lack of immune pathology downstream in the tubules. Moreover, the tubules in the R27 mice appear to be resistant to damage, as manifested by enhanced metabolic signatures. The resistance of tubules to damage related to immune activity in the glomerulus and/or hypoxia could play a pivotal role in preventing the typical inflammatory infiltrate in the TI of CGN mice. Thus, the absence of tubular dysfunction may have limited the inflammatory infiltrate in the TI and ultimately prevented additional damage to the kidney tissue.

[0794] Using a gene expression-based clustering approach, we have identified a core set of curated gene signatures able to classify disease stages of murine GN into molecular endotypes that effectively translate to human LN patients. Notably, human orthologs of the murine GN gene signatures identified a similar pattern in two independent cohorts of human LN patients, consisting of increased enrichment of inflammatory cells and corresponding de-enrichment of metabolic pathways and kidney tissue cells associated with more advanced stages of kidney pathology. In current practice, the severity of LN pathogenesis is determined by histological classification, which is used to drive therapeutic decisions and assess the potential for terminal kidney damage (27, 65, 66). We found only a modest correlation between ISN histological classification of renal pathology in human LN patients and molecular classification by gene expression profiling, and the gene signature correlations that were identified were inconsistent across patients with the same ISN class and between datasets. This result emphasizes the subjectivity of histological assessment of renal pathology and suggests that molecular classification may be a more robust and reproducible approach to classification of human LN.

[0795] An orthogonal, unsupervised approach to generating co-expressed gene modules (MEGENA) also identified similar molecular patterns that effectively classified mouse GN stages and human LN patients and were highly conserved between species. This unsupervised approach provides further validation of the gene expression-based profiles derived from curated gene signatures, as well as of the utility of lupus mice for recapitulating human LN at the molecular level. In summary, this work provides a comprehensive examination of the immune processes involved in progression of murine GN to chronic disease resulting in renal failure. In addition, this work presents a foundation for improved classification of LN based on molecular endotypes and illustrates the applicability of murine models to better understand the stages of human disease.

[0796] In summary, this work provides a comprehensive examination of the immune processes involved in progression of murine GN to chronic disease resulting in renal failure and provides new insights that could be applicable in understanding human LN.

Methods

[0797] Mice. NZM2328 and NZM2328.R27 congenic mice were obtained or generated as previously described (14, 16). All mice were housed at the University of Virginia (UVA) Center of Comparative Medicine under pathogen-free conditions.

[0798] Histological characterization. Kidneys of NZM2328 and R27 mice were harvested, frozen sections were mounted on slides, and the stage of GN was confirmed by histological classification as previously described (14, 16). Laser capture microdissection (LCM) was then used to isolate glomeruli and tubulointerstitial tissue from each kidney sample.

[0799] Gene expression analysis by microarray. RNA preparation and array hybridization were carried out according to standard Affymetrix protocols by the UT Southwestern Microarray Core facility for the Affymetrix Clariom D arrays of NZM2328 female and R27 mice, and by the UVA Genome Analysis and Technology Core for the GeneChip Mouse 430 v2.0 arrays of NZM2328 female and male mice.

[0800] Raw microarray data were processed with the R/Bioconductor package oligo. Affymetrix CEL files were background corrected and normalized using the robust multi-array average (RMA) method. Normalized data were transformed into log2 intensity values and formatted as R expression set objects (E-sets). Principal component analysis (PCA) was used to inspect the datasets for outliers. E-sets were annotated using chip definition files (CDFs) corresponding to Affymetrix Clariom D (NZM2328 female and R27 mice) or Mouse 430 v2.0 (NZM2328 male mice) arrays. Low-intensity probes were filtered by visually selecting thresholds at the dip in histograms of binned log2-transformed probe intensities. Variance correction was carried out using the eBayes function in the R/Bioconductor limma package. The resulting p-values were adjusted for multiple comparisons using the Benjamini-Hochberg correction, which produced a false discovery rate (FDR) for each comparison. Probes were distilled down to differentially expressed (DE) probes with FDR < 0.2, which were considered statistically significant.
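The multiple-testing step can be illustrated with a short NumPy sketch of the Benjamini-Hochberg adjustment followed by the FDR < 0.2 cutoff described above. It assumes per-probe p-values are already available (for example from a moderated t-test); limma's empirical Bayes variance moderation itself is not reproduced, and the probe IDs are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the original input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downward, then cap at 1
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    fdr = np.empty(n)
    fdr[order] = np.minimum(ranked, 1.0)
    return fdr


def select_de_probes(probe_ids, pvalues, fdr_cutoff=0.2):
    """Keep probes whose BH-adjusted FDR falls below the cutoff (0.2 in the text)."""
    fdr = benjamini_hochberg(pvalues)
    return [pid for pid, q in zip(probe_ids, fdr) if q < fdr_cutoff]


# Hypothetical probes and raw p-values
de_probes = select_de_probes(["p1", "p2", "p3", "p4"], [0.01, 0.04, 0.03, 0.5])
```

This reproduces the step-up procedure used by R's `p.adjust(method = "BH")`; the lenient 0.2 threshold trades more false positives for sensitivity.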

[0801] Gene set variation analysis (GSVA). The R/Bioconductor package GSVA (20) (v1.36.3) was used as a non-parametric, unsupervised method to estimate the variation in enrichment of predefined gene sets in microarray data from NZM2328 mice. The input for GSVA was a matrix of log2 expression values for all samples and a collection of gene signatures for immune cell types and functional pathways. For genes with multiple Affymetrix identifiers, the probe with the highest interquartile range (IQR) was selected, and probes with IQR = 0 were filtered out. GSVA enrichment scores were calculated using a Kolmogorov-Smirnov (KS)-like random walk statistic and were normalized across all samples to values between -1 and +1, indicating negative and positive enrichment, respectively.
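The probe-collapsing rule and the idea of a KS-like random walk can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not the GSVA package: GSVA estimates a kernel-based cumulative distribution per gene and uses a rank-weighted walk, whereas this sketch substitutes a per-gene z-score and an unweighted walk. The function names and toy data are hypothetical.

```python
import numpy as np


def collapse_probes_by_iqr(expr, probe_genes):
    """Keep, per gene, the probe with the highest IQR; drop probes with IQR == 0.

    expr: probes x samples array of log2 values; probe_genes: gene symbol per probe.
    """
    q75, q25 = np.percentile(expr, [75, 25], axis=1)
    iqr = q75 - q25
    best = {}
    for i, gene in enumerate(probe_genes):
        if iqr[i] == 0:
            continue
        if gene not in best or iqr[i] > iqr[best[gene]]:
            best[gene] = i
    genes = sorted(best)
    return expr[[best[g] for g in genes], :], genes


def ks_walk_score(ranked_genes, gene_set):
    """Max positive plus max negative deviation of an unweighted KS-like walk, in [-1, 1]."""
    hits = np.array([g in gene_set for g in ranked_genes])
    n_hit = hits.sum()
    n_miss = hits.size - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    walk = np.cumsum(np.where(hits, 1.0 / n_hit, -1.0 / n_miss))
    return float(max(walk.max(), 0.0) + min(walk.min(), 0.0))


def simplified_gsva(expr, genes, gene_sets):
    """Sets x samples enrichment scores (simplification: z-scores stand in for kernel CDFs)."""
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    scores = np.zeros((len(gene_sets), expr.shape[1]))
    for j in range(expr.shape[1]):
        # Rank genes within sample j from most to least relatively expressed
        ranked = [genes[i] for i in np.argsort(-z[:, j], kind="stable")]
        for s, members in enumerate(gene_sets.values()):
            scores[s, j] = ks_walk_score(ranked, set(members))
    return scores


# Toy data: two probes for gene A, and a constant probe for E that is filtered out
expr = np.array([[10.0, 1.0], [5.0, 4.0], [9.0, 2.0], [1.0, 10.0], [2.0, 9.0], [6.0, 6.0]])
collapsed, genes = collapse_probes_by_iqr(expr, ["A", "A", "B", "C", "D", "E"])
scores = simplified_gsva(collapsed, genes, {"up": ["A", "B"]})
```

A set whose genes sit at the top of a sample's ranking pushes the walk to +1, and one at the bottom pushes it to -1, which mirrors the sign convention described in the text.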

[0802] GSVA gene set generation. Gene sets used as input for GSVA are listed in Tables 19-1 to 19-36, grouped according to cell type or pathway signature; for example, Table 19-1 lists myeloid cell signature genes. Cell type and pathway gene signatures were generated based on literature mining, Mouse Genome Informatics (MGI) (60) gene ontology (GO) terms, and immune cell-specific expression derived from the Immunological Genome Project Consortium (ImmGen) (61). The glycolysis, oxidative phosphorylation, amino acid metabolism, and fatty acid oxidation gene signatures have been previously described (62). The cell type gene signatures were derived from Mouse Cell Scan, a tool for identification of cellular origin from mouse gene expression datasets. The pathway gene signatures were derived from the Mouse Biologically Informed Gene Clustering (BIG-C) tool for categorization of biological functions in mouse gene expression datasets. The human ortholog tables (Tables 19A-1 to 19A-36) list certain human orthologs corresponding to the genes in the mouse gene tables (Tables 19-1 to 19-36), as indicated by the names in the table headers.

[0803] Linear regression analysis. Linear regression analysis between GSVA enrichment scores and log2 gene expression values was carried out using GraphPad Prism software (v9.3.1). The goodness of fit is reported as the R² value, and the p-value indicates the significance of the slope of the regression line.
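A minimal Python equivalent of this regression, assuming SciPy's `stats.linregress` (the study itself used GraphPad Prism). The text does not state which variable is the predictor; this sketch assumes log2 expression as x and the GSVA score as y. The data values are made up for illustration.

```python
from scipy import stats


def regress_score_on_expression(log2_expr, gsva_scores):
    """Least-squares fit of GSVA score on log2 expression.

    Returns (slope, R^2, two-sided p-value for H0: slope == 0).
    """
    fit = stats.linregress(log2_expr, gsva_scores)
    return fit.slope, fit.rvalue ** 2, fit.pvalue


# Made-up example: one gene's log2 expression vs. one signature's GSVA score
log2_expr = [6.0, 6.5, 7.0, 7.5, 8.0]
gsva_scores = [-0.42, -0.21, 0.03, 0.18, 0.44]
slope, r_squared, p_value = regress_score_on_expression(log2_expr, gsva_scores)
```

R² measures how much of the variance in the score the gene explains, while the slope p-value tests whether the association differs from zero; the two can diverge with few samples.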

[0804] Ingenuity pathway analysis (IPA). Molecules upstream of selected Cgnz1 locus genes were identified using IPA upstream regulator (UPR) analysis (Qiagen) (24). UPRs with an overlap p-value < 0.01 were considered significant.

[0805] Multiscale embedded gene co-expression network analysis (MEGENA). The MEGENA R package (23) was used to generate gene co-expression networks for NZM2328 mouse glomerulus and TI by inputting the 5,000 genes with the highest row variance from the respective gene expression matrices. A planar filtered network (PFN) was formed using a false discovery rate (FDR) of 0.2. MEGENA multi-scale clustering analysis (MCA) used the PFN to form lineages of gene modules, which were assigned “lineage” names based on their descent from the root MEGENA module. Modules were functionally annotated by overlapping their gene symbols with curated mouse-specific functional, immune cell, and kidney tissue cell signatures, as well as with the top GO terms (24) exhibiting the greatest coverage for each module. Annotations of MEGENA modules were considered significant if there were at least 3 overlapping gene symbols between the module and the annotation signature and the Fisher’s exact test p-value for the overlap was < 0.2. A module eigengene (ME), equivalent to the first principal component of the module’s gene expression, was calculated for each module. Correlations among sample traits (intracorrelations) were calculated for brief inspection. MEs were correlated with all sample traits, and correlations with a p-value >= 0.2 were set to zero. All second-generation (gen2) MEGENA modules were retained for subsequent analysis. A gene expression set from human whole kidney biopsies was subjected to MEGENA analysis in a similar manner. Gen2 MEGENA modules from NZM2328 glomerulus and TI were examined for preservation in the MEGENA human kidney modules using an algorithm that generates z.summ composite scores from 20 preservation metrics (25).

[0806] K-means clustering.
GSVA enrichment scores of gen2 MEGENA modules (Tables 26-1 to 26-60 and 27-1 to 27-48) or of the 22 curated immune cell, kidney cell, and metabolic pathway gene signatures (Tables 28-1 to 28-22) were used as input for k-means clustering, performed with 1000 iterations to identify the most stable clusters for each dataset. Clustering results were visualized using the R package ComplexHeatmap (v2.12) (26).
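A minimal Python sketch of the k-means step on a samples x signatures GSVA score matrix, assuming scikit-learn's `KMeans`. How the reported 1000 iterations map onto an implementation is an assumption here: the value is passed as `max_iter=1000`, with several random restarts (`n_init`) to favor a stable solution. The input matrix is synthetic, and the heatmap step (ComplexHeatmap in R) is represented only by a cluster-sorted row order.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_gsva_scores(scores, k=4, seed=0):
    """k-means over the rows (samples) of a samples x signatures GSVA score matrix.

    Assumption: "1000 iterations" is interpreted as max_iter=1000; n_init restarts
    keep the lowest-inertia solution.
    """
    km = KMeans(n_clusters=k, max_iter=1000, n_init=25, random_state=seed)
    labels = km.fit_predict(scores)
    # Row order that groups samples by cluster, e.g. for drawing a heatmap
    order = np.argsort(labels, kind="stable")
    return labels, order


# Synthetic example: 5 "inflamed" and 5 "quiescent" samples, 22 signatures
rng = np.random.default_rng(0)
inflamed = 0.5 + rng.normal(0, 0.05, size=(5, 22))
quiescent = -0.5 + rng.normal(0, 0.05, size=(5, 22))
scores = np.vstack([inflamed, quiescent])
labels, order = cluster_gsva_scores(scores, k=2)
```

Cluster numbering from k-means is arbitrary, so any biological labels (acute, transitional, chronic) have to be assigned afterward by inspecting the signature profile of each cluster.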

[0807] Human orthologs of mouse genes. In order to facilitate direct comparison of cell type and functional gene signature enrichment profiles between mouse and human gene expression datasets, mouse genes were translated into human orthologs. Orthologs are defined as genes in different species that evolved from a common ancestral gene and typically retain the same function. However, there are exceptions in that not all genes have direct mouse to human orthologs and not all predicted orthologs are “true” orthologs that retain the same function and expression pattern across species. To account for this, “true” human orthologs of the mouse gene sets were identified on a gene-by-gene basis using publicly available online databases (GeneCards, the Mouse Genome Informatics (MGI), and UniProtKB) as well as literature mining. Through this process, only genes with similar tissue expression, cellular localization, and functions between mouse and human were retained in the human gene sets.
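The rule of retaining only curated "true" orthologs can be expressed as a lookup that drops, rather than guesses, unmapped genes. This Python sketch is illustrative only: the mapping is a hypothetical hand-curated table covering the podocyte genes of Table 19-29, and `MouseOnlyGene` is a placeholder symbol, not a real gene.

```python
# Hypothetical curated mouse -> human ortholog table. In practice, each entry would be
# checked against resources such as GeneCards, MGI, and UniProtKB before inclusion.
CURATED_ORTHOLOGS = {
    "Nphs1": "NPHS1",
    "Nphs2": "NPHS2",
    "Podxl": "PODXL",
    "Synpo": "SYNPO",
    "Wt1": "WT1",
}


def translate_gene_set(mouse_genes, orthologs):
    """Map mouse symbols to curated human orthologs.

    Genes without a curated ortholog are reported in `dropped` instead of being guessed.
    """
    human, dropped = [], []
    for gene in mouse_genes:
        ortholog = orthologs.get(gene)
        if ortholog is None:
            dropped.append(gene)
        elif ortholog not in human:  # avoid duplicates when several genes map to one
            human.append(ortholog)
    return human, dropped


podocyte_mouse = ["Nphs1", "Nphs2", "Podxl", "Synpo", "Wt1", "MouseOnlyGene"]
podocyte_human, unmapped = translate_gene_set(podocyte_mouse, CURATED_ORTHOLOGS)
```

Returning the unmapped genes explicitly makes it easy to audit how much of each mouse signature survives translation before GSVA is run on human data.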

[0808] Statistical analysis. P-values and odds ratios (ORs) for the overlap of DEGs with inflammatory cell types and pathways were calculated with a two-sided Fisher’s exact test in R with a confidence level of 0.95. All other statistical tests were carried out with GraphPad Prism (v9.3.1). Comparisons of two groups (CTL, AGN) were calculated using an unpaired, two-sided Welch’s t-test. Comparisons of more than two groups (CTL, AGN, TGN, CGN) were calculated using Brown-Forsythe and Welch ANOVA followed by Dunnett’s T3 multiple comparisons test.
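Two of the tests named above can be sketched in Python with SciPy's `stats.fisher_exact` and `stats.ttest_ind` (with `equal_var=False` for Welch's test). The 2x2 table layout, the gene universe, and the toy gene lists are illustrative assumptions; the annotation rule (at least 3 shared genes and p < 0.2) is taken from the MEGENA paragraph. Brown-Forsythe and Welch ANOVA with Dunnett's T3 test are not shown.

```python
from scipy import stats


def overlap_test(query, signature, universe):
    """Two-sided Fisher's exact test for the overlap of a gene list with a signature.

    Returns (number of shared genes, sample odds ratio, p-value).
    """
    universe = set(universe)
    query = set(query) & universe
    signature = set(signature) & universe
    both = len(query & signature)
    query_only = len(query - signature)
    signature_only = len(signature - query)
    neither = len(universe) - both - query_only - signature_only
    table = [[both, query_only], [signature_only, neither]]
    odds_ratio, p_value = stats.fisher_exact(table, alternative="two-sided")
    return both, odds_ratio, p_value


def is_significant_annotation(n_shared, p_value, min_shared=3, p_cutoff=0.2):
    """MEGENA annotation rule from the text: >= 3 shared genes and p < 0.2."""
    return n_shared >= min_shared and p_value < p_cutoff


def welch_t_test(group_a, group_b):
    """Unpaired, two-sided Welch's t-test (unequal variances)."""
    return stats.ttest_ind(group_a, group_b, equal_var=False)


# Toy example with a hypothetical universe of 100 gene symbols
universe = [f"G{i}" for i in range(100)]
signature = universe[:10]
query = universe[:8] + ["G50", "G51"]
n_shared, odds_ratio, p_value = overlap_test(query, signature, universe)
```

The choice of universe (all measured genes versus the filtered set) directly changes the "neither" cell and therefore the odds ratio, so it should match the genes that could have been detected.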

Table 17. NZM2328 Kidney Histology, listed by: Mouse Strain | Mouse ID | Stage | Age | Proteinuria | Glomerular histology | Tubular histology | Interstitium histology | IgG deposition in Glom | C3 deposition in Glom | ANA

Table 18. IPA Upstream Regulator Analysis

Tables 19-1 to 19-36: GSVA gene sets. In Tables 19-1 to 19-36, the genes are listed by: Gene symbol | Entrez ID.

Table 19-1: Myeloid Cell (also see Table 19A-25 for human orthologs)

Table 19-2: Monocyte-Macrophage (also see Table 19A-23 for human orthologs)

Table 19-3: M1 Macrophage (also see Table 19A-18 for human orthologs)

Table 19-4: M2 Macrophage (also see Table 19A-19 for human orthologs)

Table 19-5: Antigen Presenting Cell (also see Table 19A-3 for human orthologs)

Table 19-6: CD8 T cell (also see Table 19A-4 for human orthologs)

Table 19-7: Th17 Cell (also see Table 19A-33 for human orthologs)

Table 19-8: Tfh Cell (also see Table 19A-32 for human orthologs)

Table 19-9: Treg (also see Table 19A-35 for human orthologs)

Table 19-10: GC B Cell (also see Table 19A-7 for human orthologs)

Table 19-11: IG Chain Cell (also see Table 19A-11 for human orthologs)

Table 19-12: Platelet (also see Table 19A-28 for human orthologs)

Table 19-13: Pattern Recognition Receptors (also see Table 19A-27 for human orthologs)

Table 19-14: MHC Class One (also see Table 19A-20 for human orthologs)

Table 19-15: MHC Class Two (also see Table 19A-21 for human orthologs)

Table 19-16: IFN Stimulated Genes (also see Table 19A-10 for human orthologs)

Table 19-17: Pro-Cell-Cycle (also see Table 19A-31 for human orthologs)

Table 19-18: Pro-Apoptosis (also see Table 19A-30 for human orthologs)

Table 19-19: mRNA Splicing (also see Table 19A-24 for human orthologs)

Alkbh5 | 268420; Ctnnbl1 | 66642; Khdrbs3 | 13992; Mbnl3 | 171170; Rbm10 | 236732; Rbm11 | 224344; Rbm20 | 73713; Rbm24 | 666794; Rsrc1 | 66880; Rtca | 66368

Table 19-20: Transcription Factors (also see Table 19A-34 for human orthologs)

Table 19-21: Hif1a Signaling Pathway (also see Table 19A-9 for human orthologs)

Cybb | 13058; Rwdd3 | 66568; Hif1a | 15251; Mir210 | 387206; Pdk1 | 228026; Pdk3 | 236900

Table 19-22: WNT Signaling (also see Table 19A-36 for human orthologs)

Table 19-23: General Mitochondria (also see Table 19A-22 for human orthologs)

Table 19-24: Glycolysis (also see Table 19A-8 for human orthologs)

Table 19-25: Oxidative Phosphorylation (also see Table 19A-26 for human orthologs)

Table 19-26: Amino Acid Metabolism (also see Table 19A-1 for human orthologs)

Table 19-27: Fatty Acid Oxidation (also see Table 19A-6 for human orthologs)

Table 19-28: Lipid Metabolism (also see Table 19A-17 for human orthologs)

Table 19-29: Podocyte (also see Table 19A-29 for human orthologs)

Nphs1 | 54631; Nphs2 | 170484; Podxl | 27205; Synpo | 104027; Wt1 | 22431

Table 19-30: Kidney Cell (also see Table 19A-12 for human orthologs)

Table 19-31: Kidney Distal Tubule (also see Table 19A-13 for human orthologs)

Table 19-32: Kidney Proximal Tubule (also see Table 19A-15 for human orthologs)

Table 19-33: Kidney Loop of Henle (also see Table 19A-14 for human orthologs)

Clcnka | 12733; Clcnkb | 56365; Kng1 | 16644

Table 19-34: Kidney Tubule Collecting Duct (also see Table 19A-16 for human orthologs)

Table 19-35: Androgen Regulated Genes (also see Table 19A-2 for human orthologs)

Table 19-36: Estrogen Regulated Genes (also see Table 19A-5 for human orthologs)

11858; Scnn1g | 20278; Slc10a3 | 214601; Srebf1 | 20787; Stat5a | 20850; Tbl3 | 213773; Tcam1 | 75870; Tmem86a | 67893; Twnk | 226153; Zfp804a | 241514

Tables 19A-1 to 19A-36: Human orthologs of the mouse genes listed in Tables 19-1 to 19-36. The genes are listed by: Gene symbol | Entrez ID.

The genes in each of Tables 19A-1 to 19A-36 can be used as effective biomarkers for classifying the LN disease state of the patient.

Table 20: Cgnz1 Locus Genes

The human orthologs of the genes in Table 20 can be used as effective biomarkers for classifying the LN disease state of the patient.

Table 21: DEGs (in glomeruli and TI) from NZM2328 AGN mice

The human orthologs of the genes in Table 21 can be used as effective biomarkers for classifying the LN disease state of the patient.

Table 22: DEGs (in glomeruli and TI) from R27 mice

The human orthologs of the genes in Table 22 can be used as effective biomarkers for classifying the LN disease state of the patient.

Tables 26-1 to 26-60: MEGENA gen2 Modules: NZM2328 Glomerulus

The genes in each of Tables 26-1 to 26-60 can be used as effective biomarkers for classifying the LN disease state of the patient.

Table 26-1: Module 2.1

Table 26-2: Module 2.11

Table 26-3: Module 3.12

Table 26-4: Module 3.13

Table 26-5: Module 3.14

Table 26-6: Module 3.15

Table 26-7: Module 3.17

Table 26-8: Module 4.18

Table 26-9: Module 4.19

Table 26-10: Module 4.2

Table 26-11: Module 4.21

Table 26-12: Module 4.22

Table 26-13: Module 4.23

Table 26-14: Module 4.24

Table 26-15: Module 4.25

Table 26-16: Module 4.26

Table 26-17: Module 4.27

Table 26-18: Module 4.28

Table 26-19: Module 4.29

Table 26-20: Module 4.3

Table 26-21: Module 4.31

Table 26-22: Module 4.32

Table 26-23: Module 4.33

Table 26-24: Module 4.34

Table 26-25: Module 4.36

Table 26-26: Module 4.38

Table 26-27: Module 4.39

Table 26-28: Module 4.41

Table 26-29: Module 4.44

Table 26-30: Module 5.53

Table 26-31: Module 5.54

Table 26-32: Module 5.55

Table 26-33: Module 5.56

Table 26-34: Module 5.57

Table 26-35: Module 5.58

Table 26-36: Module 5.59

Table 26-37: Module 5.61

Table 26-38: Module 5.63

Table 26-39: Module 6.79

Table 26-40: Module 6.8

Table 26-41: Module 7.84

Table 26-42: Module 7.85

Table 26-43: Module 7.86

Table 26-44: Module 7.87

Table 26-45: Module 7.88

Table 26-46: Module 7.89

Table 26-47: Module 7.9

Table 26-48: Module 7.91

NSMCE1, TAF1B, UQCC2, 1700013G24RIK, TMEM256, TMEM29, RPS4X, CMC1, CHCHD1, GM14151, RNF223, LRRC26,

Table 26-57: Module 8.99

Table 26-58: Module 9.104

Table 26-59: Module 9.105

Table 26-60: Module 9.106

Tables 27-1 to 27-48: MEGENA gen2 Modules: NZM2328 TI

Table 27-1 to 27-48: MEGENA gen2 Modules: NZM2328 TI

The genes in each of Tables 27-1 to 27-48 can be used as effective biomarkers for classifying the LN disease state of the patient.

Table 27-1: Module 2.33

Table 27-2: Module 3.35

Table 27-3: Module 3.36

Table 27-4: Module 4.44

Table 27-5: Module 5.47

Table 27-6: Module 5.48

Table 27-7: Module 5.49

Table 27-8: Module 5.5

Table 27-9: Module 9.62

Table 27-10: Module 9.63

Table 27-11: Module 9.64

Table 27-12: Module 10.66

Table 27-13: Module 10.68

Table 27-14: Module 10.7

Table 27-15: Module 10.71

Table 27-16: Module 10.72

Table 27-17: Module 10.73

Table 27-18: Module 12.76

Table 27-19: Module 13.77

CRLF3, MRPS28, FAM217A, WDR18, SYT17, SLC35F3, ALS2CR12, 9230112D BRIK, CCDC122, 2300009A05RIK,

Table 27-20: Module 15.83

Table 27-21: Module 15.84

Table 27-22: Module 17.95

Table 27-23: Module 17.96

Table 27-24: Module 17.98

Table 27-25: Module 18.1

Table 27-26: Module 18.101

Table 27-27: Module 19.102

Table 27-28: Module 19.103

Table 27-29: Module 19.104

Table 27-30: Module 19.105

Table 27-31: Module 20.108

Table 27-32: Module 21.109

Table 27-33: Module 21.11

Table 27-34: Module 21.111

Table 27-35: Module 21.112

Table 27-36: Module 21.113

Table 27-37: Module 21.114

Table 27-38: Module 22.117

AU019823, GATA6, ACOT6, HIBCH, RNF186, RHBDF2, ARHGAP31, TRP53I11, GM684, CTXN3, OLFR558, GM6377, IFT43, MGAT5, CAV3,

Table 27-39: Module 23.12

Table 27-40: Module 24.122

Table 27-41: Module 24.123

Table 27-42: Module 24.124

Table 27-43: Module 24.126

Table 27-44: Module 25.127

Table 27-45: Module 26.13

Table 27-46: Module 27.132

Table 27-47: Module 29.139

Table 27-48: Module 29.14

Table 28: Gene expression signatures to separate LN disease stages

The genes in each of Tables 28-1 to 28-22 can be used as effective biomarkers for classifying the LN disease state of the patient.

Table 28-1: Amino Acid Metabolism

Table 28-2: Antigen Presenting Cell

Table 28-3: CD8 T cell

Table 28-4: Fatty Acid Oxidation

Table 28-5: GC B Cell

Table 28-6: General Mitochondria

Table 28-7: IG Chains

Table 28-8: Kidney Cell

Table 28-9: Kidney Distal Tubule

Table 28-10: Kidney Loop of Henle

CLCNKA, CLCNKB, CLCNKB, KNG1

Table 28-11: Kidney Prox Convoluted Tubule

Table 28-12: LDG

CTSG, ELANE, LCN2, MPO, OSM

Table 28-13: Monocyte-Macrophage

IL31RA, LGALS12, LYVE1, MARCO, MEGF10, MERTK, MFGE8, MRC1, MS4A4A, MSR1, PLA2G5, SCARF2, SIGLEC1,

Table 28-14: Myeloid Cell

Table 28-15: NK Cell

Table 28-16: Oxidative Phosphorylation

Table 28-17: pDC

Table 28-18: Platelet

Table 28-19: Podocyte

Table 28-20: TCA cycle

Table 28-21: Tfh Cell

Table 28-22: Th17 Cell

References

[0809] 1. Davidson A. What is damaging the kidney in lupus nephritis? Nat. Rev. Rheumatol. 2016;12(3):143-153.

[0810] 2. Maria NI, Davidson A. Protecting the kidney in systemic lupus erythematosus: from diagnosis to therapy. Nat. Rev. Rheumatol. 2020;16(5):255-267.

[0811] 3. Mimura I, Nangaku M. The suffocating kidney: Tubulointerstitial hypoxia in end-stage renal disease. Nat. Rev. Nephrol. 2010;6(11):667-678.

[0812] 4. Suarez-Fueyo A, et al. T cells and autoimmune kidney disease. Nat. Rev. Nephrol. 2017;13(6):329-343.

[0813] 5. Nangaku M. Chronic hypoxia and tubulointerstitial injury: a final common pathway to end-stage renal failure. J. Am. Soc. Nephrol. 2006;17(1):17-25.

[0814] 6. Leatherwood C, et al. Clinical characteristics and renal prognosis associated with interstitial fibrosis and tubular atrophy (IFTA) and vascular injury in lupus nephritis biopsies. Semin. Arthritis Rheum. 2019;49(3):396-404.

[0815] 7. Liu B-C, et al. Renal tubule injury: a driving force toward chronic kidney disease. Kidney Int. 2018;93(3):568-579.

[0816] 8. Weening JJ, et al. The Classification of Glomerulonephritis in Systemic Lupus Erythematosus Revisited. J. Am. Soc. Nephrol. 2004;15(2):241-250.

[0817] 9. Ortega LM, et al. Lupus nephritis: Pathologic features, epidemiology and a guide to therapeutic decisions. Lupus 2010;19(5):557-574.

[0818] 10. Hsieh C, et al. Predicting outcomes of lupus nephritis with tubulointerstitial inflammation and scarring. Arthritis Care Res. (Hoboken). 2011;63(6):865-874.

[0819] 11. Schwartz MM, et al. Irreproducibility of the activity and chronicity indices limits their utility in the management of lupus nephritis. Lupus Nephritis Collaborative Study Group. Am. J. Kidney Dis. Off. J. Natl. Kidney Found. 1993;21(4):374-377.

[0820] 12. Mubarak M, Nasri H. ISN/RPS 2003 classification of lupus nephritis: time to take a look on the achievements and limitations of the schema. J. Nephropathol. 2014;3(3):87-90.

[0821] 13. Almaani S, et al. Rethinking Lupus Nephritis Classification on a Molecular Level. J. Clin. Med. 2019;8(10). doi: 10.3390/jcm8101524

[0822] 14. Waters ST, et al. NZM2328: a new mouse model of systemic lupus erythematosus with unique genetic susceptibility loci. Clin. Immunol. 2001;100(3):372-383.

[0823] 15. Waters ST, et al. Breaking Tolerance to Double Stranded DNA, Nucleosome, and Other Nuclear Antigens Is Not Required for the Pathogenesis of Lupus Glomerulonephritis. J. Exp. Med. 2004;199(2):255-264.

[0824] 16. Ge Y, et al. Cgnz1 allele confers kidney resistance to damage preventing progression of immune complex-mediated acute lupus glomerulonephritis. J. Exp. Med. 2013;210(11):2387-2401.

[0825] 17. Bagavant H, et al. Role for Nephritogenic T Cells in Lupus Glomerulonephritis: Progression to Renal Failure Is Accompanied by T Cell Activation and Expansion in Regional Lymph Nodes. J. Immunol. 2006;177(11):8258-8265.

[0826] 18. Fu SM, et al. Pathogenesis of proliferative lupus nephritis from a historical and personal perspective. Clin. Immunol. 2017;185:51-58.

[0827] 19. Peterson KS, et al. Characterization of heterogeneity in the molecular pathogenesis of lupus nephritis from transcriptional profiles of laser-captured glomeruli. J. Clin. Invest. 2004;113(12):1722-1733.

[0828] 20. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics 2013;14(1):7.

[0829] 21. Schwartzman-Morris J, Putterman C. Gender Differences in the Pathogenesis and Outcome of Lupus and of Lupus Nephritis. Clin. Dev. Immunol. 2012;2012:604892.

[0830] 22. Lahita RG. The role of sex hormones in systemic lupus erythematosus. Curr. Opin. Rheumatol. 1999;11(5):352-356.

[0831] 23. Moulton VR. Sex Hormones in Acquired Immunity and Autoimmune Disease. Front. Immunol. 2018;9. doi: 10.3389/fimmu.2018.02279

[0832] 24. Kramer A, et al. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics 2014;30(4):523-530.

[0833] 25. Mannik M, et al. Multiple autoantibodies form the glomerular immune deposits in patients with systemic lupus erythematosus. J. Rheumatol. 2003;30(7):1495-1504.

[0834] 26. Reynolds J, et al. Anti-CD8 monoclonal antibody therapy is effective in the prevention and treatment of experimental autoimmune glomerulonephritis. J. Am. Soc. Nephrol. 2002;13(2):359-369.

[0835] 27. Chen A, et al. Role of CD8+ T cells in crescentic glomerulonephritis. Nephrol. Dial. Transplant. Off. Publ. Eur. Dial. Transpl. Assoc. - Eur. Ren. Assoc. 2020;35(4):564-572.

[0836] 28. Couzi L, et al. Predominance of CD8+ T lymphocytes among periglomerular infiltrating cells and link to the prognosis of class III and class IV lupus nephritis. Arthritis Rheum. 2007;56(7):2362-2370.

[0837] 29. Kim HJ, et al. CD8+ T regulatory cells express the Ly49 class I MHC receptor and are defective in autoimmune prone B6-Yaa mice. Proc. Natl. Acad. Sci. U. S. A. 2011;108(5):2010-2015.

[0838] 30. Arazi A, et al. The immune cell landscape in kidneys of patients with lupus nephritis. Nat. Immunol. 2019;20(7):902-914.

[0839] 31. Toro-Dominguez D, et al. Stratification of Systemic Lupus Erythematosus Patients Into Three Groups of Disease Activity Progression According to Longitudinal Gene Expression. Arthritis Rheumatol. 2018;70(12):2025-2035.

[0840] 32. Der E, et al. Tubular cell and keratinocyte single-cell transcriptomics applied to lupus nephritis reveal type I IFN and fibrosis relevant pathways. Nat. Immunol. 2019;20(7):915-927.

[0841] 33. Sung SJ, et al. Dependence of Glomerulonephritis Induction on Novel Intraglomerular Alternatively Activated Bone Marrow-Derived Macrophages and Mac-1 and PD-L1 in Lupus-Prone NZM2328 Mice. J. Immunol. 2017;198(7):2589-2601.

[0842] 34. Sung S-SJ, Fu SM. Interactions among glomerulus infiltrating macrophages and intrinsic cells via cytokines in chronic lupus glomerulonephritis. J. Autoimmun. 2020;106:102331.

[0843] 35. Kuriakose J, et al. Patrolling monocytes promote the pathogenesis of early lupus-like glomerulonephritis. J. Clin. Invest. 2019;129(6):2251-2265.

[0844] 36. Schiffer L, et al. Activated renal macrophages are markers of disease onset and disease remission in lupus nephritis. J. Immunol. 2008;180(3):1938-1947.

[0845] 37. Ma R, et al. Intrarenal macrophage infiltration induced by T cells is associated with podocyte injury in lupus nephritis patients. Lupus 2016;25(14):1577-1586.

[0846] 38. Sakhi H, et al. Podocyte Injury in Lupus Nephritis. J. Clin. Med. 2019;8(9). doi: 10.3390/jcm8091340

[0847] 39. Tian Y, et al. Nestin protects podocyte from injury in lupus nephritis by mitophagy and oxidative stress. Cell Death Dis. 2020;11(5):319.

[0848] 40. Deng W, et al. Hypoxia inducible factor-1 alpha promotes mesangial cell proliferation in lupus nephritis. Am. J. Nephrol. 2014;40(6):507-515.

[0849] 41. Chen PM, et al. Kidney tissue hypoxia dictates T cell-mediated injury in murine lupus nephritis. Sci. Transl. Med. 2020;12(538). doi: 10.1126/scitranslmed.aay1620

[0850] 42. Fine LG, Bandyopadhay D, Norman JT. Is there a common mechanism for the progression of different types of renal diseases other than proteinuria? Towards the unifying theme of chronic hypoxia. Kidney Int. Suppl. 2000;75:S22-6.

[0851] 43. Fortner KA, et al. Targeting mitochondrial oxidative stress with MitoQ reduces NET formation and kidney disease in lupus-prone MRL-lpr mice. Lupus Sci. Med. 2020;7(1). doi: 10.1136/lupus-2020-000387

[0852] 44. Hong S, Healy H, Kassianos AJ. The Emerging Role of Renal Tubular Epithelial Cells in the Immunological Pathophysiology of Lupus Nephritis. Front. Immunol. 2020;11(September):1-8.

[0853] 45. Bonventre JV. Kidney injury molecule-1 (KIM-1): A urinary biomarker and much more. Nephrol. Dial. Transplant. 2009;24(11):3265-3268.

[0854] 46. Zhou Y, et al. Comparison of kidney injury molecule-1 and other nephrotoxicity biomarkers in urine and kidney following acute exposure to gentamicin, mercury, and chromium. Toxicol. Sci. 2008;101(1):159-170.

[0855] 47. Castillo-Rodriguez E, et al. Kidney Injury Marker 1 and Neutrophil Gelatinase-Associated Lipocalin in Chronic Kidney Disease. Nephron 2017;136(4):263-267.

[0856] 48. Moschen AR, et al. Lipocalin-2: A Master Mediator of Intestinal and Metabolic Inflammation. Trends Endocrinol. Metab. 2017;28(5):388-397.

[0857] 49. Ansar Ahmed S, Penhale WJ, Talal N. Sex hormones, immune responses, and autoimmune diseases. Mechanisms of sex hormone action. Am. J. Pathol. 1985;121(3):531-551.

[0858] 50. Cutolo M, Wilder RL. Different roles for androgens and estrogens in the susceptibility to autoimmune rheumatic diseases. Rheum. Dis. Clin. North Am. 2000;26(4):825-839.

[0859] 51. Shim G-J, et al. Autoimmune glomerulonephritis with spontaneous formation of splenic germinal centers in mice lacking the estrogen receptor alpha gene. Proc. Natl. Acad. Sci. 2004;101(6):1720-1724.

[0860] 52. Rider V, et al. Molecular mechanisms involved in the estrogen-dependent regulation of calcineurin in systemic lupus erythematosus T cells. Clin. Immunol. 2000;95(2):124-134.

[0861] 53. Lang TJ, et al. Increased severity of murine lupus in female mice is due to enhanced expansion of pathogenic T cells. J. Immunol. 2003;171(11):5795-5801.

[0862] 54. Graham JH, Yoachim SD, Gould KA. Estrogen Receptor Alpha Signaling Is Responsible for the Female Sex Bias in the Loss of Tolerance and Immune Cell Activation Induced by the Lupus Susceptibility Locus Sle1b. Front. Immunol. 2020;11. doi: 10.3389/fimmu.2020.582214

[0863] 55. Cutolo M, et al. Androgens and estrogens modulate the immune and inflammatory responses in rheumatoid arthritis. Ann. N. Y. Acad. Sci. 2002;966:131-142.

[0864] 56. Gubbels Bupp MR, Jorgensen TN. Androgen-Induced Immunosuppression. Front. Immunol. 2018;9:794.

[0865] 57. Aitman TJ, et al. Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature 2006;439(7078):851-855.

[0866] 58. Stratigou V, et al. Altered expression of signalling lymphocyte activation molecule receptors in T-cells from lupus nephritis patients: a potential biomarker of disease activity. Rheumatol. (United Kingdom) 2017;56(7):1206-1216.

[0867] 59. Zhou D, et al. Wnt/β-catenin signaling in kidney injury and repair: A double-edged sword. Lab. Investig. 2016;96(2):156-167.

[0868] 60. Bult CJ, et al. Mouse Genome Database (MGD) 2019. Nucleic Acids Res. 2019;47(D1):D801-D806.

[0869] 61. Heng TSP, et al. The Immunological Genome Project: networks of gene expression in immune cells. Nat. Immunol. 2008;9(10):1091-1094.

[0870] 62. Kingsmore KM, et al. Altered expression of genes controlling metabolism characterizes the tissue response to immune injury in lupus. Sci. Rep. 2021;11(1):14789.

Example 4: Transcriptomic analysis of lupus nephritis kidneys identifies molecular endotypes

[0871] Predicting the course of lupus nephritis (LN), or its response to treatment, from standard renal biopsies is problematic. We analyzed bulk RNA from renal biopsies and paired blood samples to define gene expression-based endotypes of lupus nephritis and to identify molecular biomarkers of LN.

[0872] Methods: Gene expression data were collected from the whole kidney and blood of patients with LN, and additional datasets were identified in the Gene Expression Omnibus (GEO) (Table 24). Enrichment of informative modules of co-expressed genes in individual samples was analyzed using Gene Set Variation Analysis (GSVA). Gene modules identifying immune/inflammatory cells, resident kidney cells, and metabolic and inflammatory processes, as previously described (1), were employed where appropriate. Kidney biopsy histology and immunofluorescence were scored by a blinded clinical pathologist. Gene sets used as input for GSVA for FIGs. 74 and 76 are listed in Tables 23-1 to 23-28; for example, Table 23-1 lists the Amino Acid Metabolism signature genes. Gene sets used as input for GSVA for FIGs. 77, 78 and 80 are listed in Tables 25-1 to 25-32. The genes are grouped according to cell type or pathway signature.
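GSVA (Hänzelmann et al., reference 20 above) is distributed as an R/Bioconductor package. To make the per-sample enrichment step more concrete, the Python below is a simplified sketch that is assumed to follow the published outline: a Gaussian-kernel CDF for each gene across samples (bandwidth assumed to be one quarter of the gene's standard deviation), a log-odds transform, symmetric rank weights within each sample, and a random-walk score taken as the largest positive minus the largest negative deviation. The function name `gsva_like_scores` is hypothetical, and the sketch omits the Poisson kernel for count data and other options of the real package, so its scores should not be expected to match GSVA output exactly.

```python
import math

import numpy as np


def gsva_like_scores(expr, gene_sets, genes):
    """Simplified GSVA-style enrichment scores (illustrative sketch only).

    expr: genes x samples array of log-scale expression values.
    gene_sets: dict mapping a set name to a list of gene symbols.
    genes: gene symbols in the same order as the rows of expr.
    Returns (set_names, scores), scores having shape (n_sets, n_samples).
    """
    expr = np.asarray(expr, dtype=float)
    p, n = expr.shape
    erf = np.vectorize(math.erf)

    # Step 1: per-gene Gaussian-kernel CDF across samples, bandwidth = sd / 4
    # (assumed), then a log-odds transform so every gene is on a common scale.
    sd = expr.std(axis=1, ddof=1)
    h = np.where(sd > 0, sd / 4.0, 1.0)[:, None, None]
    diffs = (expr[:, :, None] - expr[:, None, :]) / h  # genes x samples x samples
    cdf = 0.5 * (1.0 + erf(diffs / math.sqrt(2.0)))
    F = np.clip(cdf.mean(axis=2), 1e-12, 1 - 1e-12)
    z = np.log(F / (1.0 - F))

    index = {g: i for i, g in enumerate(genes)}
    names = list(gene_sets)
    scores = np.zeros((len(names), n))
    # Symmetric rank weights |p/2 - position|, top-ranked gene at position 1.
    weight = np.abs(p / 2.0 - np.arange(1, p + 1))
    for j in range(n):
        # Step 2: order genes by decreasing z within sample j.
        order = np.argsort(-z[:, j], kind="stable")
        for s, name in enumerate(names):
            members = [index[g] for g in gene_sets[name] if g in index]
            in_set = np.isin(order, members)
            if not in_set.any() or in_set.all():
                continue  # score undefined; left at 0
            w_in = np.where(in_set, weight, 0.0)
            if w_in.sum() == 0:
                w_in = in_set.astype(float)
            # Step 3: weighted steps up for set genes, uniform steps down
            # for all other genes (a Kolmogorov-Smirnov-like random walk).
            walk = np.cumsum(w_in / w_in.sum() - (~in_set) / (~in_set).sum())
            # Step 4: largest positive minus largest negative deviation.
            scores[s, j] = max(walk.max(), 0.0) - abs(min(walk.min(), 0.0))
    return names, scores
```

The genes x samples x samples array in step 1 keeps the sketch short but grows quickly with cohort size; a production implementation would process genes in chunks.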

[0873] Results: FIGs. 74A-B and 76A-I show that clustering of metabolic and cellular GSVA enrichment scores in LN kidneys reveals four distinct endotypes of patients with LN. GSVA of kidney biopsy samples from 76 patients with LN (BH11201) revealed four distinct endotypes of LN (FIG. 74A), which could be ordered from least to most severe based on the profile of abnormal features (FIG. 74B). The least severe cluster (“coral”) exhibited minimal immune cell infiltrates and few changes in kidney cell or metabolic signature expression. The second cluster (“yellow”) had increased expression of immune/inflammatory cell signatures, with minimal changes to kidney cell or metabolic signatures. The third cluster (“purple”) exhibited both increased expression of immune cell signatures and decreased expression of kidney cell and metabolic signatures. The final cluster (“black”) exhibited minimal immune cell gene expression, but decreased metabolism signatures and increased endothelial cell, fibroblast, and mesangial cell signatures. The columns represent individual patients grouped into four clusters (Coral, Yellow, Purple, and Black), and the rows represent gene modules indicative of immune/inflammatory cells, non-hematopoietic cells, and cellular metabolism.
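The text does not state which clustering algorithm produced the four endotypes. Purely to illustrate how patients can be grouped by their GSVA profiles, the sketch below assumes k-means with k = 4 on a patients x modules matrix (the transpose of the modules x patients layout described above) and uses a deterministic farthest-point initialization; `kmeans` is a hypothetical helper, and hierarchical clustering would be an equally plausible choice.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Plain k-means on the rows of X (patients x GSVA modules).

    Uses a deterministic farthest-point initialization so results are
    reproducible; returns (labels, centers).
    """
    X = np.asarray(X, dtype=float)
    # Start from the first patient, then repeatedly add the patient farthest
    # (squared Euclidean distance) from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each patient to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its patients (keep it if empty).
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Cluster indices returned by such a routine are arbitrary; naming and ordering clusters (for example, from least to most abnormal) would still require inspecting the module profiles of each cluster.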

[0874] FIGs. 76A-I compare the molecular endotypes shown in FIGs. 74A and 74B with respect to GSVA enrichment of signatures for specific (A-C) metabolic pathways, (D-F) non-hematopoietic cells, and (G-I) immune cells. Significant differences in mean GSVA enrichment score between Coral, the “least abnormal” LN endotype, and each other cluster were assessed by Brown-Forsythe and Welch ANOVA with Dunnett’s T3 multiple comparisons. *, p < 0.05; ***, p < 0.001; ****, p < 0.0001.
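The Brown-Forsythe and Welch ANOVA tests named above compare group means without assuming equal variances across clusters. A minimal NumPy sketch of both F statistics and their approximate denominator degrees of freedom follows; the helper names are hypothetical, the sketch does not implement Dunnett's T3 post-hoc comparisons, and a p-value would still have to be read from the F distribution, for example with `scipy.stats.f.sf(F, df1, df2)`.

```python
import numpy as np


def _summaries(groups):
    """Per-group sizes, means and unbiased variances."""
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    var = np.array([np.var(g, ddof=1) for g in groups])
    return n, means, var


def welch_anova(groups):
    """Welch's heteroscedastic one-way ANOVA: returns (F, df1, df2)."""
    n, means, var = _summaries(groups)
    k = len(groups)
    w = n / var                                  # precision weights n_i / s_i^2
    W = w.sum()
    grand = (w * means).sum() / W                # precision-weighted grand mean
    a = (w * (means - grand) ** 2).sum() / (k - 1)
    lam = ((1 - w / W) ** 2 / (n - 1)).sum()
    b = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    return a / b, k - 1.0, (k ** 2 - 1) / (3 * lam)


def brown_forsythe_anova(groups):
    """Brown-Forsythe F* test for equality of means: returns (F, df1, df2)."""
    n, means, var = _summaries(groups)
    N = n.sum()
    grand = (n * means).sum() / N                # ordinary grand mean
    terms = (1 - n / N) * var
    f_star = (n * (means - grand) ** 2).sum() / terms.sum()
    c = terms / terms.sum()
    df2 = 1.0 / (c ** 2 / (n - 1)).sum()         # Satterthwaite approximation
    return f_star, len(groups) - 1.0, df2
```

With only two groups, both statistics reduce to the square of Welch's t statistic, with Welch-Satterthwaite degrees of freedom, which is a convenient sanity check. Groups with zero variance would make the Welch weights undefined and need special handling.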

[0875] FIGs. 75A-H show that systemic disease activity correlates with the molecular endotypes of LN, whereas histologic metrics do not. FIG. 75A shows ISN/RPS classes in 46 patients with LN. FIG. 75B shows positive or negative IgA deposition in 44 patients with LN. FIG. 75C shows the distribution of active or inactive disease by SLEDAI in 32 patients with LN. FIG. 75D shows the renal activity index in 49 patients with LN. FIG. 75E shows the renal chronicity index in 49 patients with LN. FIG. 75F shows proteinuria values (g/24h) in 24 patients with LN. FIG. 75G shows positive or negative IgG deposition in 41 patients with LN. FIG. 75H shows positive or negative IgM deposition in 42 patients with LN. Of the LN samples with paired histology, 24/33 samples with proliferative LN were found in clusters with increased immune/inflammatory cell signatures (15/33 in the purple cluster, 9/33 in the yellow cluster) (FIG. 75A). Notably, the percentage of patients with IgA deposition on the glomerular basement membrane was highest in the purple cluster and significantly lower in the coral cluster (FIG. 75B). Moreover, the percentage of patients with active disease as determined by SLEDAI was lowest in the coral cluster (FIG. 75C). Mean renal activity and chronicity indices were not significantly different between clusters (FIGs. 75D-E). In FIGs. 75A, B, C, G and H, significant differences in expected and observed frequencies between Coral, the “least abnormal” LN endotype, and all other clusters were evaluated by Chi Square Test. The likelihood (odds ratio) of having active disease in the Coral cluster was 0.06 (p=0.006) as compared to the three other clusters. The likelihood (odds ratio) of having proliferative nephritis, or positive IgA, IgG, or IgM deposition, in the Coral cluster as compared to the other three clusters was not statistically significant. In FIGs. 75A, B, C, G and H, significant associations between the categorical variables and all clusters (denoted with asterisks on the y-axis) were evaluated using the Chi Square Test of Independence. In FIGs. 75D, E, and F, significant differences in enrichment of the clinical variable between each cluster and Coral were assessed by Brown-Forsythe and Welch ANOVA with Dunnett’s T3 multiple comparisons. All data are for patients in dataset BH11201.
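The chi-square test of independence and the odds ratio reported above both start from a contingency table of cluster membership against a categorical clinical variable. The sketch below assumes a 2x2 layout with rows for Coral versus all other clusters and columns for active versus inactive disease; the helper names are hypothetical. It applies no continuity correction, whereas `scipy.stats.chi2_contingency`, for example, applies Yates' correction to 2x2 tables by default, so the two can give different statistics.

```python
import numpy as np


def chi_square_independence(table):
    """Pearson chi-square test of independence without continuity correction.

    Returns (statistic, degrees_of_freedom, expected_counts).
    """
    obs = np.asarray(table, dtype=float)
    # Expected counts under independence: row total * column total / N.
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
    stat = ((obs - expected) ** 2 / expected).sum()
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return stat, dof, expected


def odds_ratio(table):
    """Sample odds ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]].

    With rows = (Coral, other clusters) and columns = (active, inactive),
    this is the odds of active disease in Coral relative to the others.
    """
    (a, b), (c, d) = np.asarray(table, dtype=float)
    return (a * d) / (b * c)
```

For an illustrative table [[10, 20], [30, 40]] (not study data), the statistic is 50/63, about 0.794, with one degree of freedom, and the odds ratio is 2/3. An odds ratio well below 1, such as the 0.06 reported above, indicates much lower odds of the outcome in the first row's group.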

[0876] FIG. 77 shows clustering of GSVA enrichment scores into the four kidney-derived molecular clusters, using informative cellular and pathway signatures (Tables 25-1 to 25-32) in paired blood from patients with LN (BH11201). The columns represent individual patients that are grouped into four clusters (Coral, Yellow, Purple, Black). The rows represent gene modules indicative of immune/inflammatory cells and cellular pathways/processes.

[0877] FIGs. 78A-L show that analysis of paired blood of patients with LN demonstrates cluster-specific enrichment of LDG, T cell, dendritic cell, and glucocorticoid signatures. GSVA enrichment of (FIG. 78A) LDG, (FIG. 78B) T cell, (FIG. 78C) TCRA, (FIG. 78D) TCRAJ, (FIG. 78E) TCRB, (FIG. 78F) anergic/activated T cell, (FIG. 78G) dendritic cell, (FIG. 78H) glucocorticoid, (FIG. 78I) interferon (IFN), (FIG. 78J) monocyte, (FIG. 78K) B cell, and (FIG. 78L) plasma cell signatures in the blood of 71 patients with LN (BH11201) is shown. X-axis clusters denote the cluster to which each sample belongs based on analysis of paired kidney gene expression. Significant differences in enrichment of gene signatures between each cluster and Coral were assessed by Brown-Forsythe and Welch ANOVA with Dunnett’s T3 multiple comparisons. *, p < 0.05; **, p < 0.01; ***, p < 0.001. The glucocorticoid signature is derived from Northcott et al. (2).

[0878] FIGs. 79A-I show that the LDG and T cell signatures are consistently correlated with the glucocorticoid signature in the blood of patients with LN, whereas the dendritic cell signature is not. Linear regression of the glucocorticoid signature with the LDG, T cell, and dendritic cell signatures in the blood of patients with lupus nephritis is shown for BH11201 (n = 71) (FIGs. 79A-C), GSE49454 (n = 19) (FIGs. 79D-F), and GSE99967 (n = 28) (FIGs. 79G-I). The glucocorticoid signature is derived from Northcott et al. (2). In each of FIGs. 79A-I, the glucocorticoid signature is shown on the x-axis.
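Each panel of FIGs. 79A-I relates one cell-type signature to the glucocorticoid signature across patients. Assuming one GSVA score per patient for each signature, with the glucocorticoid score as x, an ordinary least-squares fit and its coefficient of determination can be sketched as follows (`linear_fit` is a hypothetical helper).

```python
import numpy as np


def linear_fit(x, y):
    """Ordinary least-squares line y ~ slope * x + intercept, plus R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 polynomial fit
    r = np.corrcoef(x, y)[0, 1]             # Pearson correlation coefficient
    return slope, intercept, r ** 2         # for simple regression, R^2 = r^2
```

A consistently positive slope with appreciable R^2 across the three datasets would correspond to the "consistently correlated" pattern described for the LDG and T cell signatures.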

[0879] FIGs. 80A-B show that the expression of erythropoietin (EPO) or of a recombinant human erythropoietin (rHuEPO) signature in the blood of patients with LN is not associated with the molecular endotypes of LN. (FIG. 80A) Log2 expression of EPO in the paired blood of 71 patients with LN. (FIG. 80B) GSVA enrichment of the rHuEPO signature in the paired blood of 71 patients with LN. The rHuEPO signature was derived from Wang et al. (3), in which differentially expressed genes were measured after administration of rHuEPO; nine of the genes that were consistently expressed after rHuEPO administration comprised the signature.

[0880] Conclusion: Transcriptomic analyses suggest the existence of LN endotypes that progress from acute inflammatory disease to chronic kidney disease with little inflammation and marked kidney damage. These endotypes include: 1) minimal disease (coral cluster); 2) inflammatory disease without kidney cell damage or metabolic dysfunction (yellow cluster); 3) inflammatory disease with kidney cell and metabolic dysfunction (purple cluster); and 4) markedly decreased kidney cell and metabolic function with little inflammation (black cluster). There is an association between disease activity and the molecular endotypes of LN, whereas there is minimal to no association between molecular endotypes and histologic phenotype. Analysis of the blood of patients with LN suggests that patients in the two most severe molecular endotypes of LN have blood profiles that differ from those of patients with minimal disease. Although the LDG signature can reflect glucocorticoid treatment (4), T cell lymphopenia is a marker of severe lupus in general (5), and both signatures were correlated with the glucocorticoid signature, their combination should raise suspicion of progression to more severe LN. The dendritic cell signature was less likely to be associated with the glucocorticoid signature, suggesting that it may be a promising blood biomarker of more severe LN. Although production of erythropoietin is known to decrease in chronic kidney disease (6), EPO expression in the blood was unchanged across subsets. The molecular endotypes of LN may be an important way to stage LN and may provide useful information to guide therapy.

Tables 23-1 to 23-28 show the GSVA gene sets.

The genes in each of Tables 23-1 to 23-28 can be used as effective biomarkers for classifying the LN disease state of the patient.

Table 23-1: Amino Acid Metabolism

Table 23-10: Dendritic Cell

Table 23-11: GC B Cell

Table 23-12: Granulocyte

Table 23-13: LDG

Table 23-14: Monocyte/Myeloid Cell

Table 23-15: NK Cell

Table 23-16: pDC

Table 23-17: Plasma Cell

Table 23-18: Platelet

Table 23-19: T Cell

Table 23-20: Endothelial Cell

Table 23-21: Fibroblast

Table 23-22: Kidney Cell

Table 23-23: Kidney Distal Tubule

Table 23-24: Kidney Loop of Henle Cell

CD9, CLCNKB, KCNJ16, KNG1, S100A6, SPP1

Table 23-25: Kidney Proximal Convoluted Tubule

SLC16A12, SLC22A12, SLC22A13, SLC22A2, SLC22A8, SLC34A1, SLC36A2, SLC5A2, SLC6A18, TMEM174, TMEM52B, UGT2B7

Table 23-26: Kidney Tubule Collecting Duct Cell

Table 23-27: Mesangial Cell

Table 23-28: Podocyte

NPHS1, NPHS2, PODXL, SYNPO, WT1

Table 24: Kidney and blood datasets for patients with LN, SLE, or controls used for the analysis.

Tables 25-1 to 25-32: 32 gene modules listing 989 genes. Genes are listed by gene symbol and Entrez Gene ID.

The genes in each of Tables 25-1 to 25-32 can be used as effective biomarkers for classifying the LN disease state of the patient.

Table 25-1:

Table 25-2:

Table 25-3:

Table 25-4:

Table 25-5:

Table 25-6:

Table 25-7:

Table 25-8:

Table 25-9:

Table 25-10:

Table 25-11:

Table 25-12:

Table 25-13:

Table 25-14:

Table 25-15:

Table 25-16:

Table 25-17:

Table 25-18:

Table 25-19:

Table 25-20:

Table 25-21:

Table 25-22:

Table 25-23:

Table 25-24:

Table 25-25:

Table 25-26:

Table 25-27:

Table 25-28:

Table 25-29:

Table 25-30:

Table 25-31:

Table 25-32:

References

(1) Kingsmore KM, Bachali P, Catalina MD, et al. Altered expression of genes controlling metabolism characterizes the tissue response to immune injury in lupus. Sci Rep. 2021;11(1):14789.

(2) Northcott M, Gearing LJ, Nim HT, et al. Glucocorticoid gene signatures in systemic lupus erythematosus and the effects of type I interferon: a cross-sectional and in-vitro study. The Lancet Rheumatology. 2021;3(5):e357-e370.

(3) Wang G, Durussel J, Shurlock J, et al. Validation of whole-blood transcriptome signature during microdose recombinant human erythropoietin (rHuEpo) administration. BMC Genomics. 2017;18(8):817.

(4) Dale DC, Fauci AS, Guerry D IV, Wolff SM. Comparison of agents producing a neutrophilic leukocytosis in man. Hydrocortisone, prednisone, endotoxin, and etiocholanolone. J Clin Invest. 1975;56(4):808-813.

(5) Martin M, Guffroy A, Argemi X, Martin T. [Systemic lupus erythematosus and lymphopenia: Clinical and pathophysiological features]. Rev Med Interne. 2017;38(9):603-613.

(6) Portoles J, Martin L, Broseta JJ, Cases A. Anemia in chronic kidney disease: from pathophysiology and current treatments, to future agents. Front Med. 2021;8:642296.

[0881] While preferred embodiments have been shown and described herein, such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the scope of the disclosure. It may be understood that various alternatives to the embodiments described herein may be employed in practice.

Numerous different combinations of embodiments described herein are possible, and such combinations are considered part of the present disclosure. In addition, all features discussed in connection with any one embodiment herein may be readily adapted for use in other embodiments herein. It is intended that the following claims define the scope of the disclosure and that methods and structures within the scope of these claims and their equivalents be covered thereby.