

Title:
AI METHOD FOR NMR SPECTRA ANALYSIS
Document Type and Number:
WIPO Patent Application WO/2024/049537
Kind Code:
A1
Abstract:
A method and system for NMR analysis are disclosed, including peak picking, fitting, and reconstruction. The method is demonstrated for complex 1D and 2D NMR spectra and also for spectral regions with multiple strong overlaps and a large dynamic range whose analysis is challenging for current computational methods. The disclosed method utilizes trained machine learning and/or artificial intelligence models, or a model derived therefrom, together with subsequent spectra fitting and analysis. A system is disclosed to carry out the method and is further disclosed as a computer system.

Inventors:
LI DAWEI (US)
BRUSCHWEILER RAFAEL (US)
Application Number:
PCT/US2023/026601
Publication Date:
March 07, 2024
Filing Date:
June 29, 2023
Assignee:
OHIO STATE INNOVATION FOUNDATION (US)
LI DAWEI (US)
BRUSCHWEILER RAFAEL (US)
International Classes:
G01N24/08; G01N15/00; G01N35/00; G01R33/46; G06N20/00; G01J3/44; G06N5/00
Domestic Patent References:
WO2020239884A1 2020-12-03
Foreign References:
US20190226947A1 2019-07-25
US20090024360A1 2009-01-22
US20210041329A1 2021-02-11
US20100322864A1 2010-12-23
Attorney, Agent or Firm:
STAUFFER, Shannon K. et al. (US)
Claims:
Claims

What is claimed is:

1. A method to detect peaks in a spectral graph, the method comprising: receiving, by a processor, a set of spectral graph data; determining, by the processor, at least one peak location or value, in the set of spectral graph data using one or more trained, machine learning and/or artificial intelligence models, or a model derived therefrom; and causing, by the processor, the at least one peak location or value, to be displayed or employed in subsequent analysis.

2. The method of claim 1, wherein the one or more trained, machine learning and/or artificial intelligence model was trained using training data comprising synthetic spectra, wherein the training data consists of unambiguously identifiable peaks.

3. The method of claim 2, wherein the training data further comprises labeled peak data and at least one neighboring peak data.

4. The method of claim 1 or 2, wherein the one or more trained, machine learning and/or artificial intelligence models comprise one or more neural network models configured to identify at least one peak location or value within the set of spectral graph data, wherein the one or more neural network models identify a point and its two nearest neighbors as a peak.

5. The method of claim 4, wherein a neural network model comprises a plurality of hidden convolutional layers and at least one max pooling layer, wherein the at least one max pooling layer is a final neural network layer.

6. The method of any one of claims 4-5, wherein one of the one or more neural network models further comprise a convolutional layer with an activation function configured to classify the at least one peak location or value as a peak, a shoulder peak, or non-peak.

7. The method of any one of claims 4-6, where one of the one or more neural network models further comprises an output regression layer configured to determine a line shape centered around the at least one peak location or value.

8. The method of any one of claims 1-7, wherein the one or more trained, machine learning and/or artificial intelligence models are applied to the spectral graph data in a sliding window domain.

9. The method of any one of claims 1-8, wherein a low peak amplitude cutoff is applied to the spectral graph data.

10. The method of any one of claims 1-9, wherein the set of spectral graph data is solution NMR, solid-state NMR, EPR, or ESR graph data.

11. The method of any one of claims 1-10, further comprising: determining a spectral line over the at least one peak location or value.

12. The method of claim 11, wherein the spectral line employs a Lorentzian or Gaussian profile.

13. The method of claim 12, wherein the spectral line employs a Voigt profile.

14. The method of any one of claims 1-13, wherein the subsequent analysis is one or both of querying peaks against a known spectral database, and quantifying concentrations from the one or more peak location or values.

15. The method of any one of claims 1-14, wherein the set of spectral graph data includes one-dimensional NMR spectra.

16. The method of any one of claims 1-14, wherein the set of spectral graph data includes two-dimensional NMR spectra.

17. The method of any one of claims 1-16, wherein the set of spectral graph data includes solid-state NMR data.

18. The method of any one of claims 1-16, wherein the set of spectral graph data includes solution NMR.

19. The method of any one of claims 1-18 further comprising: receiving, by a processor, the set of spectral graph data via a web-server; providing the received set of spectral graph data to an analysis engine to determine at least one peak location or value.

20. The method of claim 19, wherein the analysis engine is configured to perform automated peak picking to determine the at least one peak location or value.

21. The method of claim 19 or 20, wherein the analysis engine is configured to quantify the one peak location or value, and provide the quantification to a user device in a report or via display.

22. The method of any one of claims 19-21, wherein the analysis engine is configured to match the one peak location or value to a set of spectra in a database for metabolite identification and provide search output to the user device in the report or via the display.

23. The method of any one of claims 19-22, wherein the analysis engine is configured to perform data normalization via ratio analysis.

24. The method of any one of claims 19-23, wherein the analysis engine is configured to perform peak- and compound-based uni- and multi-variate statistical analyses.

25. A system comprising: a processor; and a memory having instructions stored thereon, wherein execution of the instructions by the processor causes the processor to perform any one of the methods of claims 1-13.

26. The system of claim 25, wherein the system is configured as an NMR instrument.

27. The system of claim 26, wherein the system is configured as an MRI imaging system.

28. The system of claim 25, wherein the system is configured as a server in a remote/external or cloud infrastructure.

29. The system of claim 28, wherein the server is configured to receive the set of spectral graph data over a network.

30. A non-transitory computer readable medium having instructions stored thereon, wherein the instructions when executed by the processor causes the processor to perform any one of the methods of claims 1-24 or any one of the system of claims 25-29.

Description:
AI Method for NMR Spectra Analysis

Related Application

[0001] This patent application claims priority to, and the benefit of, U.S. provisional application, US63/483,387, filed on February 6, 2023, entitled “AI Method for NMR Spectra Analysis,” which is hereby incorporated by reference herein in its entirety. This patent application claims priority to, and the benefit of, U.S. provisional application, US63/402,940, filed on August 31, 2022, entitled “AI Method for NMR Spectra Analysis”, which is hereby incorporated by reference herein in its entirety.

Government License Rights

[0002] This invention was made with government support under grant number 2103637 awarded by the National Science Foundation and grant number GM139482 awarded by the National Institutes of Health. The government has certain rights in the invention.

Background

[0003] A critical step in the analysis of complex NMR spectra of proteins, RNA, DNA and molecular mixtures is the identification of individual peaks in 1D and cross-peaks in 2D and higher dimensional NMR spectra. Peak identification, often referred to as peak-picking, is a prerequisite for all subsequent steps in spectral analysis and interpretation, including resonance assignment and peak quantitation for biomolecular interaction and dynamics studies or the unambiguous elucidation of the composition of complex mixtures.

[0004] Computer-assisted and automated 2D NMR peak picking has a long history starting in the 1980s, shortly after the introduction of 2D NMR. However, the development of peak-picking algorithms that can deal with strong spectral overlap and the presence of spectral artifacts has proven challenging to this day. Traditional peak pickers examine spectra in terms of geometric properties such as local maxima, local symmetry, and contour line features. Curvature and related features analysis have been employed, for example, using matrix factorization singular value decomposition and multivariate Gaussian densities. They can provide reliable identification of individual peaks, but only if they are sufficiently separated from each other. Other methods directly combine time-domain fitting and peak picking.

[0005] There is a benefit to improving the analysis of complex NMR spectra.

Summary

[0006] An example AI method and system are disclosed for the analysis of nuclear magnetic resonance (NMR) spectra for the comprehensive and unambiguous identification and characterization of peaks, e.g., in NMR analyses of complex biological molecular systems. In one example, a deep neural network (DNN)-based method (also referred to as “Deep Picker” or “DP”) is employed for peak picking and spectral deconvolution that can be semi-automatically (or automatically) performed for analysis of one-dimensional, two-dimensional, or three-dimensional NMR spectra data. The exemplary method, in some embodiments, includes a plurality of hidden convolutional layers and was trained on a large number of synthetic spectra of known composition with variable degrees of crowdedness.

[0007] The term “NMR,” as used herein, also includes magnetic-spin-associated measurements such as magnetic resonance, as well as other nuclei (Nuclear) specific spectroscopy measurements, such as EPR, ESR, solution NMR, solid-state NMR, and others described herein.

[0008] The quantitative deconvolution of 1D NMR spectra into individual resonances or peaks is a key step in many modern NMR workflows as it critically affects downstream analysis and interpretation. Depending on the complexity of the NMR spectrum, spectral deconvolution can be notably challenging. Based on the recent deep neural network, DEEP Picker, and Voigt Fitter for 2D NMR spectral deconvolution, a fully automated solution for 1D NMR spectral analysis is disclosed, including peak picking, fitting, and reconstruction. The method is demonstrated for complex 1D solution NMR spectra showing excellent performance and also for spectral regions with multiple strong overlaps and a large dynamic range whose analysis is challenging for current computational methods.

[0009] In some aspects, the techniques described herein relate to a method to detect peaks in a spectral graph, the method including: receiving, by a processor, a set of spectral graph data (e.g., possible application to EPR, ESR, solution NMR, solid-state NMR, etc.); determining, by the processor, at least one peak location or value, in the set of spectral graph data using one or more trained, machine learning and/or artificial intelligence models, or a model derived therefrom; and causing, by the processor, the at least one peak location or value, to be displayed or employed in subsequent analysis.

[0010] In some aspects, the techniques described herein relate to a method, wherein the one or more trained, machine learning and/or artificial intelligence model was trained using training data including synthetic spectra, wherein the training data consists of unambiguously identifiable peaks.

[0011] In some aspects, the techniques described herein relate to a method, wherein the training data further includes labeled peak data and at least one neighboring peak data.

[0012] In some aspects, the techniques described herein relate to a method, wherein the one or more trained, machine learning and/or artificial intelligence models include one or more neural network models configured to identify at least one peak location or value within the set of spectral graph data, wherein the one or more neural network models identify a point and its two nearest neighbors as a peak.

[0013] In some aspects, the techniques described herein relate to a method, wherein a neural network model includes a plurality of hidden convolutional layers and at least one max pooling layer, wherein the at least one max pooling layer is a final neural network layer.

[0014] In some aspects, the techniques described herein relate to a method, wherein one of the one or more neural network models further include a convolutional layer with an activation function (e.g. classification) configured to classify the at least one peak location or value as a peak, a shoulder peak, or non-peak.

[0015] In some aspects, the techniques described herein relate to a method, where one of the one or more neural network models further includes an output regression layer configured to determine a line shape centered around the at least one peak location or value.

[0016] In some aspects, the techniques described herein relate to a method, wherein the one or more trained, machine learning and/or artificial intelligence models are applied to the spectral graph data in a sliding window domain.

[0017] In some aspects, the techniques described herein relate to a method, wherein a low peak amplitude cutoff is applied to the spectral graph data.

[0018] In some aspects, the techniques described herein relate to a method, wherein the set of spectral graph data is solution NMR, solid-state NMR, EPR, or ESR graph data.

[0019] In some aspects, the techniques described herein relate to a method, further including: determining a spectral line over the at least one peak location or value.

[0020] In some aspects, the techniques described herein relate to a method, wherein the spectral line employs a Lorentzian or Gaussian profile.

[0021] In some aspects, the techniques described herein relate to a method, wherein the spectral line employs a Voigt profile.

[0022] In some aspects, the techniques described herein relate to a method, wherein the subsequent analysis is one or both of querying peaks against a known spectral database, and quantifying concentrations from the one or more peak location or values.

[0023] In some aspects, the techniques described herein relate to a system including: a processor; and a memory having instructions stored thereon, for the execution of the exemplary method.

[0024] In some aspects, the techniques described herein relate to a system, wherein the system is configured as an NMR instrument.

[0025] In some aspects, the techniques described herein relate to a system, wherein the system is configured as an MRI imaging system.

[0026] In some aspects, the techniques described herein relate to a system, wherein the system is configured as a server (e.g., in a remote/external or cloud infrastructure).

[0027] In some aspects, the techniques described herein relate to a system, wherein the server is configured to receive the set of spectral graph data over a network.

Brief Description of the Drawings

[0028] Fig. 1 shows an example method to detect peaks in a spectral graph.

[0029] Fig. 2 shows the architecture of the deep neural network peak picker (DEEP Picker), which is composed of seven 1D convolutional layers with rectified linear unit (ReLU) activation functions (C1-C7), one max-pooling layer (P1), one convolutional layer with a SoftMax activation function to classify every data point, and one convolutional layer with linear activation function to predict the peak position at the sub-pixel resolution, peak amplitude, peak width, and the Lorentzian fraction of its peak shape.

[0030] Figs. 3A-3F show examples of 1D NMR training sets of convoluted NMR spectra (outer spectral line) and their deconvolutions (inner spectral lines). Sum spectrum (outer spectral line) that can be unambiguously deconvoluted into two individual overlapping peaks (inner spectral lines) (Fig. 3A, 3B, 3C). Sum spectrum (outer spectral line) generated from three distinct peaks (inner spectral lines, filled circles), but can also be accurately explained by only two peaks (inner spectral lines, open circles) (Fig. 3D). Sum spectrum (outer spectral line) generated from four distinct peaks (inner spectral lines, filled circles), but can also be accurately explained by only three peaks (inner spectral lines, open circles) (Fig. 3E). Sum spectrum (outer spectral line) can be deconvoluted equally well into two distinct peak pairs (crosses and circles) (Fig. 3F).

[0031] Figs. 4A, 4B show peak predictions by DEEP Picker for K-Ras 15N-1H HSQC for part of a cross-section along the direct 1H dimension. Prediction score of Class 2 peaks (red), Class 1 peaks (magenta), and Class 0 non-peaks (black) from the output classifier layer after a 3-point moving average. The class with the highest score is the class assigned to a given data point after non-maximal suppression (Fig. 4A). Input spectrum (blue) together with reconstructed individual Class 2 peaks (red) and Class 1 peaks (magenta) (Fig. 4B).
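
The post-processing described for Fig. 4A can be sketched in Python as follows. This is a minimal illustration, assuming the per-point class scores from the classifier layer are already available as plain lists. The function names, the handling of the first and last points, and the rule of keeping one point per contiguous run of peak-labeled points are assumptions made for the sketch, not the disclosed implementation.

```python
def moving_average_3(values):
    """3-point moving average; edge points average over the available neighbors."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def assign_classes(scores):
    """scores: list of per-point [class0, class1, class2] scores.
    Returns the smoothed scores and the highest-scoring class for every point."""
    n_classes = len(scores[0])
    smoothed_cols = [moving_average_3([s[c] for s in scores]) for c in range(n_classes)]
    smoothed = [[col[i] for col in smoothed_cols] for i in range(len(scores))]
    labels = [max(range(n_classes), key=lambda c: row[c]) for row in smoothed]
    return smoothed, labels


def non_maximal_suppression(smoothed, labels):
    """Keep one point per contiguous run of peak-labeled points (class 1 or 2):
    the point whose smoothed score for that class is largest (assumed rule)."""
    peaks = []
    i = 0
    while i < len(labels):
        if labels[i] == 0:          # class 0 is the non-peak class
            i += 1
            continue
        j = i
        while j + 1 < len(labels) and labels[j + 1] == labels[i]:
            j += 1
        best = max(range(i, j + 1), key=lambda k: smoothed[k][labels[i]])
        peaks.append((best, labels[i]))
        i = j + 1
    return peaks
```

Each returned tuple holds the index of the retained data point and its assigned class, so a run of three adjacent Class 2 points collapses to the single point with the highest smoothed Class 2 score.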

[0032] Figs. 5A, 5B show contour plots of a synthetic 2D spectrum representing two overlapping cross-peaks (circles). First, DEEP Picker predicts 1D peak positions for each column (horizontal line) and row (vertical line). Next, the 2D peak-picking algorithm used intersections of horizontal and vertical lines to define 2D cross-peaks, while removing false positive peaks (crosses) using the approach described in the text (Fig. 5A). If both horizontal and vertical lines deviate from perfect horizontal and vertical lines, respectively, the 2D peak-picking algorithm replaced the intersection peak with two cross-peaks (filled circles) near the true positions (open circles) using the approach described in the text (Fig. 5B).
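
The intersection step can be illustrated with a short Python sketch. It assumes the 1D picker has already been run along every row and every column and that its picks are stored as integer grid positions. The dictionary layout, the function name, and the `tolerance` parameter are hypothetical. The false-positive removal and the splitting of an intersection into two cross-peaks are not reproduced here.

```python
def intersect_row_column_peaks(row_peaks, column_peaks, tolerance=0):
    """Candidate 2D cross-peaks from 1D picks along rows and columns.

    row_peaks:    {row_index: [column positions picked along that row]}
    column_peaks: {column_index: [row positions picked along that column]}
    Returns sorted (row, column) grid points where a row-wise pick and a
    column-wise pick coincide within `tolerance` points (assumed criterion).
    """
    candidates = set()
    for row, cols in row_peaks.items():
        for col in cols:
            # Look for a column-wise pick close to this (row, col) grid point.
            for near_col in range(col - tolerance, col + tolerance + 1):
                for picked_row in column_peaks.get(near_col, []):
                    if abs(picked_row - row) <= tolerance:
                        candidates.add((row, col))
    return sorted(candidates)
```

With `tolerance=0` only exact agreement between the two directions produces a candidate. A larger tolerance returns clusters of neighboring candidates that would still need to be merged.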

[0033] Figs. 6A-6F show the performance of DEEP Picker for selected regions of the 2D 15N-1H HSQC spectrum of α-synuclein. Figs. 6A-6C show the HSQC spectrum processed with original resolution, and Figs. 6D-6F show it with reduced resolution along the indirect dimension. Three pairs of figures, (6A, 6D), (6B, 6E), and (6C, 6F), show the same 2D regions for comparison. Picked cross-peaks are indicated as circles and color-coded according to their amplitude on a logarithmic scale, whereas the contour line spacings are linear. Despite the lower spectral resolution in Figs. 6D-6F, DEEP Picker correctly picked the peaks, including all strongly overlapped cross-peaks. Note that the spectra of Figs. 6D-6F have reduced sensitivity since they used only half of the time-domain data.

[0034] Figs. 7A-7D show the peak-picking performance of DEEP Picker for four different proteins. Selected regions of the 2D 15N-1H HSQC spectrum of the four different proteins Gankyrin (Fig. 7A), PLA2 (Fig. 7B), ARID (Fig. 7C), and Rop (Fig. 7D). Picked cross-peaks by DEEP Picker are indicated as circles and color-coded according to their amplitude on a logarithmic scale, whereas the contour line spacings are linear. Experimental information and enlarged plots of each spectrum are given in the Supporting Information. Some of the weakest cross-peaks (small number of contours) were not picked because they are below the noise cutoff used by DEEP Picker.

[0035] Figs. 8A-8D compare peak-picking results of DEEP Picker with those of the commonly used NMR peak pickers NMRPipe, Sparky, and NMRView. Selected regions of 15N-1H HSQC spectra of the proteins Rop (Fig. 8A), Gankyrin (Fig. 8B), aSyn (Fig. 8C), and ARID (Fig. 8D) are shown. Contour lines are plotted using a logarithmic scale. Only DEEP Picker identified all shoulder peaks, including strongly overlapped ones, such as the one in Fig. 8A at (8.48, 121.6) ppm.

[0036] Figs. 9A-9D show the performance of DEEP Picker for 2D 13C-1H HSQC of mouse urine. Selected spectral regions are depicted in Figs. 9A-9D, which include the highly crowded carbohydrate region. DEEP Picker was able to identify and distinguish between cross-peaks that strongly overlap, which poses a significant challenge for their analysis by traditional peak pickers. Picked cross-peaks are indicated as circles and color-coded according to their amplitude on a logarithmic scale with logarithmic contour line spacings.

[0037] Figs. 10A-10D show the application of DEEP Picker to 2D NOESY and TOCSY spectra. Selected regions of 2D 1H-1H NOESY of Im7 (Figs. 10A, 10B) and 2D 1H-1H TOCSY of mouse urine (Figs. 10C, 10D) with picked cross-peaks indicated as circles that are color-coded according to their amplitude (logarithmic scale, see sidebar). DEEP Picker identified strong and weak cross-peaks, including ones that severely overlap or show multiplet structures due to J-splittings, whose analysis is often challenging for traditional peak pickers.

[0038] Figs. 11A-11D show the performance of DEEP Picker for selected regions of the 2D NOESY spectrum of protein Im7 (Figs. 11A, 11B), and Figs. 11C, 11D show the same regions, but picked peaks have different color coding. Contour lines are plotted using a logarithmic scale. In Figs. 11A and 11C, picked peaks are color-coded according to the cross-peak amplitude (on a logarithmic scale, see sidebar), whereas in Figs. 11B and 11D, picked peaks are color-coded according to the predicted confidence level score (on a linear scale, see sidebar).

[0039] Figs. 12A-12F show a comparison of the performance of DEEP Picker for the same spectra with different signal-to-noise (S/N). Selected regions of 15N-1H HSQC spectra of K-Ras protein at a very low concentration of 130 µM recorded at 850 MHz proton frequency with either 4 scans (Figs. 12A, 12B, 12C) or 108 scans per t1-increment (Figs. 12D, 12E, 12F). Estimated signal-to-noise ratios are 25 (Figs. 12A, 12B, 12C) and 125 (Figs. 12D, 12E, 12F). Contour lines are plotted using a linear scale, and the cross-peaks picked by DEEP Picker are indicated by open circles that are color-coded according to the cross-peak amplitudes (on a logarithmic scale, see color sidebar). In the case of the low S/N spectrum, DEEP Picker sometimes picks multiple peaks on the top of cross-peaks since the peaks have uneven shapes caused by the presence of noise. On the other hand, DEEP Picker sometimes misses low amplitude cross-peaks when they are too close to the noise floor (Figs. 12C vs. 12F). It should be noted that the examples above were chosen specifically to illustrate potential challenges for spectra with low S/N ratios. At the same time, many cross-peaks are picked by DEEP Picker without difficulties even in the low S/N spectrum (Figs. 12A and 12B).

[0040] Fig. 13 shows an example of a synthetic spectrum used for the training of the DNN peak picker. The outer spectrum represents the superposition of three individual overlapping Voigt peaks (inner peaks). The blue sum spectrum, together with the three individual Voigt peaks (ground truth), serves as input for the training of the DNN peak picker.

[0041] Figs. 14A, 14B show a crowded region of the 2D 15N-1H HSQC spectrum of α-synuclein using uniformly sampled time-domain data (Fig. 14A) and non-uniformly sampled (25%) time-domain data (Fig. 14B) with spectral reconstruction using SMILE. Contour lines are plotted using a linear scale, and the cross-peaks picked by the DNN (DEEP Picker) are indicated by open circles that are color-coded according to the cross-peak amplitudes (logarithmic scale, see color sidebar). The uniformly sampled spectrum was collected with 256x1024 complex points and zero-filled to 2 K (N1) x 8 K (N2) points.

[0042] Figs. 15A, 15B show the performance of a DNN peak picker for a crowded region of a 2D NOESY spectrum of Im7 protein (Fig. 15A) and a 2D TOCSY spectrum of a mouse urine metabolomics sample (Fig. 15B) (contours are plotted on a logarithmic scale) together with cross-peak positions returned by DEEP Picker as open circles where the colors reflect the peak amplitudes (see color sidebar). Peaks with amplitudes below the low limit of the color sidebar are depicted as gray crosses.

[0043] Fig. 16 shows the workflow for the semi-automated quantitative analysis of HSQC spectra of metabolite mixtures by the COLMARq web server, e.g., as an analysis server or employing an analysis engine. COLMARq, via its analytical software components, allows for upload of cohorts of HSQC and TOCSY spectra, automated peak picking, peak fitting for quantification, peak matching between spectra, data normalization via ratio analysis, database query for metabolite identification, and peak- and compound-based uni- and multi-variate statistical analyses.

[0044] Figs. 17A, 17B show a selected region of the 13C-1H HSQC spectrum of the biofilm (Fig. 17A) and the reconstructed spectrum by COLMARq (Fig. 17B) from the fitted peaks. Contour lines are plotted using a logarithmic scale and the fitted cross-peaks are indicated by plus symbols that are color-coded according to the cross-peak amplitudes (logarithmic scale, see color sidebar). Residual fitting errors are plotted in both panels (Fig. 17A and 17B) as red (positive) and blue (negative) contour lines using the same scale.

[0045] Fig. 18 shows an example of two matched (consensus) cross-peaks across the HSQC spectra of four different P. aeruginosa samples. A high-sensitivity doublet is labeled by ellipses in the upper half of the spectra, whereas a low-sensitivity multiplet is labeled by ellipses in the lower half of the spectra containing either three or four individual cross-peaks across the different spectra. Individual peaks that belong to these two consensus peaks are labeled as open circles. The other peaks that were part of other consensus peaks are labeled as small filled circles, and nonconsensus peaks that appear only in one spectrum are labeled as small open circles.

[0046] Figs. 19A, 19B show performance analytics of COLMARq. For normalization, peak volumes are divided by matched peaks of a user-selected reference spectrum and the log10(ratios) are rank ordered and plotted versus the number of peaks (Fig. 19A). The average ratio of the flat central part, calculated as the median of the 35th-65th percentile ratios, determines the normalization factor for each spectrum and all peaks are divided by this factor. After peak-based statistical analysis, COLMARq displays the p-value histogram (Fig. 19B) showing the distribution of p-values from t-tests. In this example, a high number of significant differences between cohorts reflects the inherent metabolic heterogeneity of the P. aeruginosa planktonic and biofilm cultures.
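
The ratio-based normalization described for Fig. 19A can be sketched in Python as follows. This is an illustrative reading of the text rather than COLMARq code. The function names and the way the 35th and 65th percentiles are mapped to list indices are assumptions, and all peak volumes are assumed to be positive. Because the logarithm is monotonic, rank ordering the ratios gives the same order as rank ordering their log10 values.

```python
import math
import statistics


def normalization_factor(sample_volumes, reference_volumes, lower=0.35, upper=0.65):
    """Median of the central (35th-65th percentile) volume ratios between
    matched peaks of a sample spectrum and the reference spectrum."""
    ratios = sorted(s / r for s, r in zip(sample_volumes, reference_volumes))
    n = len(ratios)
    start = int(math.floor(lower * n))             # index convention is an assumption
    stop = max(start + 1, int(math.ceil(upper * n)))
    return statistics.median(ratios[start:stop])


def normalize(sample_volumes, factor):
    """Divide every peak volume of the spectrum by its normalization factor."""
    return [v / factor for v in sample_volumes]
```

Taking the median of the central ratios keeps the factor insensitive to the few peaks whose volumes genuinely differ between samples, which sit at the two ends of the rank-ordered curve.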

[0047] Figs. 20A, 20B show an exemplary COLMARq user interface. The user interface allows for visual inspection of the metabolite matches after a database query for metabolite identification. The user can click through each metabolite match for visual inspection of the specific spectral regions of both the HSQC (left) and TOCSY (right) spectrum of each sample for judging the match (Fig. 20A). The spectra of two representative samples of each cohort matched to aspartate for Cohort 1 in the top row and Cohort 2 in the bottom row (Fig. 20A). In the HSQC spectra, the blue circles represent the expected database peaks for a metabolite, the red circles indicate the consensus peak position from user spectra for each expected metabolite peak, and the green circles mark the peaks the user selects for quantification, which can be manually edited. In the TOCSY spectra, the pink circles mark expected cross-peaks for a metabolite match. With its high peak matching ratio in the HSQC and the presence of the expected TOCSY cross-peaks, aspartate is a good match. The chart shown (Fig. 20B) reports the quantitative information for each peak of the metabolite match, denoted by its unique peak index, including mean values of peak volumes with their standard deviations for each cohort along with 6 scores and p-values from t-tests.

[0048] Fig. 21 shows an example of COLMARq user interface for visual inspection of a peak match after a database query of four different samples where each panel represents a different sample.

[0049] Figs. 22A-22F demonstrate DEEP Picker and Voigt Fitter 1D for selected regions of the 1D 1H spectrum of glucose. (A), (C), (E): Experimental and reconstructed spectra are depicted in black lines and red dots, respectively. Deconvoluted individual peaks are depicted as blue lines. (B), (D), (F): Simulated spectra including strong coupling effects based on chemical shift and scalar-coupling spin Hamiltonian with parameters taken from the GISSMO website at the same B0 field strength (850 MHz 1H frequency) as in the experiments. Transverse R2 relaxation rates were uniformly set to a low value of 0.6 s⁻¹ to obtain a very high-resolution spectrum for better comparison with DEEP Picker. Pairs of panels (A, B), (C, D), (E, F) show the same 1D spectral regions. DEEP Picker and Voigt Fitter 1D correctly deconvoluted the experimental spectra for both simple regions (A) and more complex regions (C and E).

[0050] Fig. 23 shows the application of DEEP Picker and Voigt Fitter 1D to selected regions of 1D spectra, which were generated by adding two selected traces along the direct 1H dimension from an experimental 2D 13C-1H HSQC of a mouse urine sample. (A), (C), (E): Experimental and reconstructed spectra are depicted as black lines and red dots, respectively. Deconvoluted individual peaks are depicted as blue lines. (B), (D), (F): The two HSQC traces and their sum are depicted as purple, cyan, and black lines, respectively. Pairs of panels (A, B), (C, D), and (E, F) show the same 1D spectral regions for comparison. The deconvolution by DEEP Picker 1D was performed with model 1 with a PPP of 12.

[0051] Fig. 24 shows the application of DEEP Picker and Voigt Fitter 1D to a spectral region of the 1D 1H spectrum of mouse urine. Experimental and reconstructed spectra are depicted as black lines and red dots, respectively. Deconvoluted individual peaks are depicted as blue lines. The deconvolution by DEEP Picker 1D was performed with model 2 with a PPP of 8.

[0052] Figs. 25A, 25B show an exemplary quantitation analysis of an artificial serum sample using the DP1D and FV1d analysis methods (Fig. 25A) from a solution NMR spectrum (Fig. 25B).

[0053] Figs. 26A, 26B show reconstructed NMR spectra using the DP1D and FV1d analysis methods from high-field NMR spectra (Fig. 26A) and low-field NMR spectra (Fig. 26B). High-field NMR spectra were taken using an 850 MHz field, and low-field NMR spectra were taken using an 80 MHz field.

[0054] Figs. 27A, 27B, 27C show reconstructed NMR spectra of a DMEM sample using the DP1D and FV1d analysis methods from high-field NMR spectra (Fig. 27A) and low-field NMR spectra (Fig. 27B). High-field NMR spectra were taken using an 850 MHz field, and low-field NMR spectra were taken using an 80 MHz field. Quantitation predictions of the DMEM sample metabolites from the NMR spectra using the DP1D and FV1d analysis methods (Fig. 27C).

[0055] Fig. 28 shows an example computing device.

Detailed Specification

[0056] To facilitate an understanding of the principles and features of various embodiments of the present invention, they are explained hereinafter with reference to their implementation in illustrative embodiments.

[0057] An exemplary method for detecting peaks in spectral graph data is shown in Fig. 1. The method may be applied to spectral graph data from solution NMR, solid state NMR, EPR, ESR, MRI, or other spectral data that can be decomposed and fitted to known peak shapes, such as Gaussian, Lorentzian, or a combination thereof, such as Voigt. The exemplary method may comprise receiving the spectral graph data 110, determining at least one peak location from the spectral data using a trained machine learning and/or artificial intelligence model, or a model derived therefrom 120, and outputting the at least one peak location or value to be displayed or employed in subsequent analysis 130. The method may further comprise determining a spectral line over the at least one peak location or value. The spectral line may be approximated based on a combination of Gaussian and Lorentzian shape profiles, in particular by a Voigt profile. The subsequent analysis of the identified peaks may include one or both of querying the identified peaks against a known spectral database and quantifying concentrations from one or more peak locations or values.
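
A combined Gaussian and Lorentzian line shape can be illustrated with the following Python sketch. A true Voigt profile is the convolution of a Gaussian with a Lorentzian. For brevity the sketch uses the pseudo-Voigt approximation instead, a weighted sum of the two shapes controlled by a Lorentzian fraction. The function names, the parameterization by full width at half maximum, and the shared width for both components are assumptions made for illustration.

```python
import math


def lorentzian(x, center, fwhm):
    """Lorentzian line with unit height at `center` and full width at half maximum `fwhm`."""
    half = fwhm / 2.0
    return half * half / ((x - center) ** 2 + half * half)


def gaussian(x, center, fwhm):
    """Gaussian line with unit height at `center` and full width at half maximum `fwhm`."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-((x - center) ** 2) / (2.0 * sigma * sigma))


def pseudo_voigt(x, center, amplitude, fwhm, lorentzian_fraction):
    """Weighted sum of the two shapes; an approximation to a Voigt profile."""
    eta = lorentzian_fraction
    return amplitude * (eta * lorentzian(x, center, fwhm)
                        + (1.0 - eta) * gaussian(x, center, fwhm))
```

Setting `lorentzian_fraction` to 1 gives a pure Lorentzian and 0 gives a pure Gaussian, so a single function covers the peak shapes named above.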

[0058] In some examples, the trained machine learning and/or artificial intelligence models comprise one or more neural network models configured to identify at least one peak location or value within the spectral graph. The neural network models may identify a point and neighboring points, such as 2, 3, or 4 neighboring points. The spectral data may be preprocessed by apodization using a 2π-Kaiser or cosine-squared window function along each dimension without resolution enhancement, or by reprocessing the spectrum with adequate zero-filling. Phase correction may be applied along all dimensions so that the maximal phase error does not exceed 3°, and standard baseline correction may be applied along all dimensions; both are available in common NMR spectral processing software. A low peak amplitude cutoff (LPAC) may also be applied to the spectral graph data. For protein 15N-1H HSQC applications, the default LPAC was 9 times the noise level, whereas, for metabolomics, the default cutoff was 5.5 times the noise level. LPAC was adjusted by the user, or the returned peak list was edited according to amplitude as well as other criteria specified by the user during post-processing. The optimal LPAC also depended on the signal-to-noise ratio of the spectrum, the dynamic range of cross-peaks of interest for downstream analysis, and the presence of sample impurities, chemically modified or aggregated proteins. It is contemplated that a set of rules can be established for an optimal choice of the LPAC for various types of applications, which may be accomplished by analyzing spectra of many different biomolecular systems with DEEP Picker. The machine learning and/or artificial intelligence models are applied to the spectral graph data in a sliding window fashion.

[0059] In some implementations, the trained machine learning and/or artificial intelligence models are trained on spectral graph data comprising synthetic spectra, wherein the synthetic spectra are convolutions of known peak shapes. The synthetic spectra are chosen such that the known peak shapes can be deconvoluted and unambiguously identified. The training data may further include experimentally derived and manually labeled peak data and at least one neighboring peak data.

[0060] In some implementations, the trained machine learning and/or artificial intelligence models comprise one or more neural networks. In one implementation, one neural network comprises a plurality of hidden convolutional layers. The number of layers may be determined by one or more optimization protocols for a system. In one example, seven hidden convolutional layers are provided in the neural network. Following the hidden convolutional layers, a hidden max pooling layer is the penultimate layer of the neural network. The final layers may be run in parallel: a convolutional layer with an activation function configured to classify the peak location or value as a peak, a shoulder peak, or non-peak, and an output regression layer configured to determine the line shape centered around the at least one peak location or value.

[0061] The method may further comprise determining a spectral line over the peak location or value. The spectral line may define the shape of the peak, which may be approximated as a Lorentzian or Gaussian profile or a combination thereof. In one example, the line shape is approximated by a Voigt profile.

[0062] The at least one peak location or value and accompanying spectral line may be employed in subsequent analysis to further quantify the components that gave rise to the spectra. In one example, the subsequent analysis comprises querying peaks against a known spectral database. In another example, the subsequent analysis comprises quantifying concentrations from the peak locations or values. In yet other examples, both querying peaks against a known spectral database and quantifying concentrations from the peak locations or values may be employed.

[0063] An exemplary system is described herein that is configured to perform the exemplary methods for detecting peaks in spectral graph data. The system may comprise a processor and a memory having instructions stored thereon such that the execution of the instructions by the processor causes the processor to perform a set of method steps for detecting peaks in spectral graph data. The system may be configured on a benchtop instrument, such as an NMR instrument, it may be configured as an MRI imaging system, or configured as a server in a remote, external, or cloud infrastructure. In the system, the server may be configured to receive the set of spectral graph data over a network.

[0064] An exemplary non-transitory computer-readable medium is also described herein, having instructions stored thereon for detecting peaks in spectral graph data, wherein the instructions, when executed by a processor, cause the processor to perform any one of the methods, or to operate as any one of the systems, for detecting peaks in spectral graph data.

[0065] Other AI/ML algorithms can be employed in addition to those disclosed herein.

[0066] Machine Learning. The term “artificial intelligence” (e.g., as used in the context of AI systems) can include any technique that enables one or more computing devices or computing systems (i.e., a machine) to mimic human intelligence. Artificial intelligence (AI) includes but is not limited to knowledge bases, machine learning, representation learning, and deep learning. The term “machine learning” is defined herein to be a subset of AI that enables a machine to acquire knowledge by extracting patterns from raw data. Machine learning techniques include, but are not limited to, logistic regression, support vector machines (SVMs), decision trees, Naive Bayes classifiers, and artificial neural networks. The term “representation learning” is defined herein to be a subset of machine learning that enables a machine to automatically discover representations needed for feature detection, prediction, or classification from raw data. Representation learning techniques include, but are not limited to, autoencoders. The term “deep learning” is defined herein to be a subset of machine learning that enables a machine to automatically discover representations needed for feature detection, prediction, classification, etc., using layers of processing. Deep learning techniques include but are not limited to artificial neural networks or multilayer perceptron (MLP).

[0067] Machine learning models include supervised, semi-supervised, and unsupervised learning models. In a supervised learning model, the model learns a function that maps an input (also known as a feature or features) to an output (also known as a target or targets) during training with a labeled data set (or dataset). In an unsupervised learning model, the model discovers patterns in unlabeled data. In a semi-supervised model, the model learns a function that maps an input (also known as a feature or features) to an output (also known as a target) during training with both labeled and unlabeled data.

[0068] Neural Networks. An artificial neural network (ANN) is a computing system including a plurality of interconnected neurons (e.g., also referred to as “nodes”). This disclosure contemplates that the nodes can be implemented using a computing device (e.g., a processing unit and memory as described herein). The nodes can be arranged in a plurality of layers, such as an input layer, an output layer, and optionally one or more hidden layers. An ANN having hidden layers can be referred to as a deep neural network or multilayer perceptron (MLP). Each node is connected to one or more other nodes in the ANN. For example, each layer is made of a plurality of nodes, where each node is connected to all nodes in the previous layer. The nodes in a given layer are not interconnected with one another, i.e., the nodes in a given layer function independently of one another. As used herein, nodes in the input layer receive data from outside of the ANN, nodes in the hidden layer(s) modify the data between the input and output layers, and nodes in the output layer provide the results. Each node is configured to receive an input, implement an activation function (e.g., binary step, linear, sigmoid, tanh, or rectified linear unit (ReLU) function), and provide an output in accordance with the activation function. Additionally, each node is associated with a respective weight. ANNs are trained with a dataset to maximize or minimize an objective function. In some implementations, the objective function is a cost function, which is a measure of the ANN’s performance (e.g., error such as L1 or L2 loss) during training, and the training algorithm tunes the node weights and/or bias to minimize the cost function. This disclosure contemplates that any algorithm that finds the maximum or minimum of the objective function can be used for training the ANN. Training algorithms for ANNs include but are not limited to backpropagation. It should be understood that an artificial neural network is provided only as an example machine learning model. This disclosure contemplates that the machine learning model can be any supervised learning model, semi-supervised learning model, or unsupervised learning model. Optionally, the machine learning model is a deep learning model. Machine learning models are known in the art and are therefore not described in further detail herein.

[0069] A convolutional neural network (CNN) is a type of deep neural network that has been applied, for example, to image analysis applications. Unlike traditional neural networks, each layer in a CNN has a plurality of nodes arranged in three dimensions (width, height, and depth). CNNs can include different types of layers, e.g., convolutional, pooling, and fully-connected (also referred to herein as “dense”) layers. A convolutional layer includes a set of filters and performs the bulk of the computations. A pooling layer is optionally inserted between convolutional layers to reduce the computational cost and/or control overfitting (e.g., by downsampling). A fully-connected layer includes neurons, where each neuron is connected to all of the neurons in the previous layer. The layers are stacked similarly to traditional neural networks. Graph convolutional neural networks (GCNNs) are CNNs that have been adapted to work on structured datasets such as graphs.

[0070] Other Supervised Learning Models. A logistic regression (LR) classifier is a supervised classification model that uses the logistic function to predict the probability of a target, which can be used for classification. LR classifiers are trained with a data set (also referred to herein as a “dataset”) to maximize or minimize an objective function, for example, a measure of the LR classifier’s performance (e.g., an error such as L1 or L2 loss), during training. This disclosure contemplates that any algorithm that finds the minimum of a cost function can be used. LR classifiers are known in the art and are therefore not described in further detail herein.

[0071] A Naive Bayes (NB) classifier is a supervised classification model that is based on Bayes’ Theorem, which assumes independence among features (i.e., the presence of one feature in a class is unrelated to the presence of any other features). NB classifiers are trained with a data set by computing the conditional probability distribution of each feature given a label and applying Bayes’ Theorem to compute the conditional probability distribution of a label given an observation. NB classifiers are known in the art and are therefore not described in further detail herein.

[0072] A k-NN classifier is a supervised classification model that classifies new data points based on similarity measures (e.g., distance functions). The k-NN classifiers are trained with a data set (also referred to herein as a “dataset”) to maximize or minimize a measure of the k-NN classifier’s performance during training. The k-NN classifiers are known in the art and are therefore not described in further detail herein.

[0073] A majority voting ensemble is a meta-classifier that combines a plurality of machine learning classifiers for classification via majority voting. In other words, the majority voting ensemble’s final prediction (e.g., class label) is the one predicted most frequently by the member classification models. The majority voting ensembles are known in the art and are therefore not described in further detail herein.
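As a minimal sketch of the majority voting rule described above, the following example returns the label predicted most often by the member classifiers. The function name and the example labels are hypothetical, and the tie-breaking behavior (the label seen first wins) is an assumption of this sketch rather than something specified in this disclosure.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class label predicted most frequently by the ensemble members.

    ASSUMPTION: on a tie, the label that appears first in `predictions` wins,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    counts = Counter(predictions)
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical votes from three member classifiers for one data point.
votes = ["peak", "non-peak", "peak"]
final_label = majority_vote(votes)
```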

Experimental Results and Additional Examples

[0074] In one example, a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra was developed (Example 1). A machine learning model for the peak picking of biomolecular NMR spectra is described in Example 2, and an example web server for 2D NMR peak picking and quantitative comparative analysis is described in Example 3. An example of a DEEP Picker1D and Voigt Fitter1D method is described in Example 4.

[0075] Example 1

[0076] Generation of the training set. Since, for every deep learning project, the size and quality of database information are essential components, the exponential growth of data repositories over the past decade has been one of the major drivers for the rapid progress in deep learning. Such data can be either obtained from the real world (e.g., image libraries) or be synthetically generated entirely in silico, such as in data augmentation techniques [30]. A critical part of the successful training of a neural network is the availability of a large amount of high-quality training data that comprehensively mirror the envisioned applications. This includes an optimal class balance of the training data by ensuring that the distribution of training data among all classes is unbiased; otherwise, DNN accuracy for the identification of members of the underrepresented class(es) will be reduced [31], [32]. As an example, for NMR peak picking, if the peaks in the training set have dominantly Gaussian lineshape, the resulting DNN is more likely to fail when applied to Lorentzian peaks and vice versa.

[0077] In order to ensure a large database of class-balanced, high-quality training data, a database consisting of synthetic 1D NMR spectra with different peak widths, shapes, and peak overlaps was developed. As discussed below, the 1D NMR-trained DNN was also deployed to higher dimensional spectra.

[0078] In using synthetic spectra, the parameters of the individual peaks were accurately defined; their use as a training set had benefits over an experimental training set in that the “ground truth” was, by definition, known. A synthetic database has the additional advantage over an experimental database in that it allows almost unlimited coverage by sampling many more different peak shapes and overlap scenarios without requiring a human expert’s input for peak classification. The shapes of all synthesized NMR peaks generated here follow a Voigt profile [33], which corresponds to the convolution of a Lorentzian and a Gaussian peak shape with the two shapes present in different amounts. The rationale for the Voigt profile is that in solution NMR, the natural lineshapes are in good approximation Lorentzian, but after apodization using commonly used window functions, such as the 2π-Kaiser window function, the peaks acquire a Voigt profile with some Gaussian component with improved spectral resolution. The amount of the Lorentzian component in the final peak shape depends on the natural linewidth, i.e., 1/(πT2), where T2 is the transverse relaxation time.
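To illustrate the statement that a Voigt profile is the convolution of a Lorentzian and a Gaussian, the following sketch samples both shapes on an integer grid and convolves them numerically. It is a rough illustration only: the function names, the grid span, and the two widths are hypothetical choices, the discrete sum on a truncated grid only approximates the continuous convolution, and nothing here is claimed to be how the synthetic training database was actually generated.

```python
import math


def gaussian(x, fwhh):
    """Unit-height Gaussian with the given full width at half height (FWHH)."""
    sigma = fwhh / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * (x / sigma) ** 2)


def lorentzian(x, fwhh):
    """Unit-height Lorentzian with the given full width at half height (FWHH)."""
    gamma = fwhh / 2.0
    return gamma * gamma / (x * x + gamma * gamma)


def voigt_profile(half_span, gauss_fwhh, lorentz_fwhh):
    """Approximate Voigt peak on the grid -half_span..half_span (in data points).

    The profile is the discrete convolution of the sampled Gaussian and
    Lorentzian, rescaled so that its maximum equals 1.
    """
    offsets = list(range(-half_span, half_span + 1))
    g = [gaussian(k, gauss_fwhh) for k in offsets]
    lor = [lorentzian(k, lorentz_fwhh) for k in offsets]
    profile = []
    for x in offsets:
        total = 0.0
        for k in offsets:
            m = x - k  # Lorentzian argument for this term of the convolution sum
            if -half_span <= m <= half_span:
                total += g[k + half_span] * lor[m + half_span]
        profile.append(total)
    top = max(profile)
    return [value / top for value in profile]


# Hypothetical widths, in data points, for one synthetic peak.
peak = voigt_profile(half_span=50, gauss_fwhh=6.0, lorentz_fwhh=4.0)
```

The resulting peak is broader than either component alone, which is the expected behavior when two lineshapes are convolved.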

[0079] In the training set, the number of points per peak (PPP), which is given by the number of data points that sample the peak’s full width at half height (FWHH) (i.e., FWHH/(digital resolution)), was allowed to vary either from 6 to 20 points or from 4 to 12 points, whereby the former was typical for protein and the latter for metabolomics spectra. The Lorentzian component can vary from 0%, corresponding to a pure Gaussian lineshape, to 100%, corresponding to a pure Lorentzian lineshape. It is worth mentioning that for a given spectrum, both synthetic and experimental, PPP can be easily adjusted to meet the above criterion by adjusting the amount of zero-filling prior to Fourier transformation.
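The relation PPP = FWHH/(digital resolution) and the effect of zero-filling can be worked through numerically, as in the sketch below. It assumes that the digital resolution equals the spectral width divided by the number of points in the transformed spectrum, and that zero-filling proceeds by doubling the size; the function names and the example numbers (20 Hz linewidth, 12 kHz spectral width, 2048 points) are hypothetical.

```python
def points_per_peak(fwhh_hz, spectral_width_hz, n_points):
    """PPP = FWHH / digital resolution, with resolution = spectral width / size."""
    digital_resolution = spectral_width_hz / n_points
    return fwhh_hz / digital_resolution


def zero_filled_size(fwhh_hz, spectral_width_hz, n_points, min_ppp):
    """Smallest size, reached by repeated doubling, whose PPP is at least min_ppp.

    ASSUMPTION: zero-filling is applied in factors of two.
    """
    size = n_points
    while points_per_peak(fwhh_hz, spectral_width_hz, size) < min_ppp:
        size *= 2
    return size


# Hypothetical spectrum: a 20 Hz wide peak, 12 kHz spectral width, 2048 points.
ppp_before = points_per_peak(20.0, 12000.0, 2048)      # about 3.4 points
new_size = zero_filled_size(20.0, 12000.0, 2048, 6.0)  # one doubling suffices
ppp_after = points_per_peak(20.0, 12000.0, new_size)   # about 6.8 points
```

In this example a single round of zero-filling moves the peak from below the 6-point lower bound quoted for protein spectra to inside the 6 to 20 point range.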

[0080] To generate synthetic 1D spectra for the purpose of the training and validation of the DEEP Picker, a database of peak pairs with random separation and amplitude was generated first.

[0081] In this synthetic database, like in experimental spectra, a subset of peaks overlap so strongly that they are impossible to distinguish, e.g., the spectrum resulting from two overlapped peaks was impossible to uniquely deconvolute and, in the presence of some noise, can be equally well represented by a single peak. For the training of the neural network, it was important that a spectrum was assigned to deconvoluted peaks in a way that the neural network had a realistic chance to identify them; otherwise, it was likely to fail in practice when encountering such types of ambiguous situations. Previous work [29] relied on peak lists generated by expert NMR spectroscopists to make this distinction, which implicitly makes human judgment an important part of the neural network training process. Here, the exemplary system employed a strategy that uses a well-defined mathematical criterion instead in the form of nonlinear peak fitting as the gold standard.

[0082] Specifically, nonlinear peak fitting was applied, assuming a single peak on all potential peak pairs. When the maximal difference in amplitude for all points of the original and the fitted peak was less than 3% of the peak amplitude, a synthetic peak pair was excluded from the training set. Figures 3A and 3B show examples of two and three overlapped peaks that can be uniquely identified, and Fig. 3C shows a case where the superposition of the two black peaks can also be explained by a single peak with minimal error. In experimental spectra, most peaks will not have a perfect Voigt shape because of noise, apodization, baseline distortion, and small phase errors, which cause asymmetric peak shapes. To prevent the trained neural network from picking only perfectly Voigt-shaped peaks, a selection of strongly overlapped peak pairs was labeled as a single peak if the maximal absolute error was <2% (after peak fitting with a single peak) and the peak widths of the two peaks differed by less than a factor of 1.5. Figure 3D shows an entry of the synthetic peak database for which the profile was generated from three peaks but assigned to only two distinct peaks (red peaks).

[0083] Next, more complex synthetic spectra were generated representing 3-5 peaks by randomly adding peaks and peak pairs from the original database. After this process, nonlinear peak fitting was performed to determine whether the generated spectrum can be explained within 3% maximal error in a robust manner with a smaller number of peaks than the number used to generate it. If this was possible, the spectrum was removed from the database. Figure 3E shows an example of such a spectrum, which was added to the training set, generated from four overlapped peaks (black peaks) but assigned only to three well-defined, distinct peaks (red peaks). These 1D spectra were then combined to form spectra with 300 data points with 3-9 peaks. Our final training and validation sets consist of 5000 and 500 of these 1D spectra, respectively.

[0084] Accurate identification of shoulder peaks is one of the most challenging tasks for any peak-picking algorithm. Unlike main peaks, shoulder peaks often do not belong to local maxima of the full spectrum, which makes their identification, along with the accurate determination of their positions and amplitudes, significantly harder. Figure 3F depicts the profile of two overlapped peaks along with two distinct peak deconvolutions, which both achieve <0.1% maximal error. The position of the main peak (left peak) was well-defined, but the shoulder peak had a large positional uncertainty. Such situations were taken into account in the design of DEEP Picker, as is discussed below. It was found that it was not necessary to include spectral artifacts in the training sets. Certain potential artifacts were best addressed during spectral processing, such as baseline correction and apodization. Proper phase correction (0th and 1st order) was also important, although phase errors of a few degrees were tolerated by the model, which was realistically achievable in practical applications. Residual water signals and t1-noise can be identified before or after peak picking, as discussed below.

[0085] Design and training of DEEP Picker. Inspired by widely used image recognition and image labeling neural networks [34], DEEP Picker determined a point-by-point prediction of the input spectrum on top of a sliding window using stacked convolutional layers [35]. DEEP Picker assessed every data point along the spectrum as either a peak or a non-peak, and if assessed as a peak, it predicted peak shape and amplitude. To boost the performance of the above algorithm for NMR spectral analysis, two changes were made. First, to accommodate the peak positional uncertainty, instead of labeling just the data point closest to the predicted peak position, in the training set, the three closest points to the predicted peak position were defined as “peak” and all other data points as “non-peak.” This permitted accurate peak identification even when the predicted peak position is less well defined, e.g., in the presence of a strong overlap. At the same time, the score for the successful peak prediction is increased.
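The three-point labeling rule can be sketched as follows. The function name and the input format are hypothetical. The sketch uses numeric labels 2, 1, and 0 for main peaks, shoulder peaks, and non-peaks, following the three-class scheme of this example, and it assumes that where the labeled windows of two peaks collide the higher class number wins, which is a choice made only for this illustration.

```python
import math


def label_spectrum(n_points, peaks):
    """Build per-point training labels from (position, peak_class) pairs.

    Positions are fractional, in units of data points. The nearest grid point
    and its two direct neighbours are the three points closest to the peak.
    ASSUMPTION: if two labelled windows overlap, the higher class is kept.
    """
    labels = [0] * n_points  # 0 = non-peak everywhere by default
    for position, peak_class in peaks:
        nearest = int(math.floor(position + 0.5))
        for index in (nearest - 1, nearest, nearest + 1):
            if 0 <= index < n_points:
                labels[index] = max(labels[index], peak_class)
    return labels


# Hypothetical 12-point spectrum with a main peak at 3.4 and a shoulder at 8.0.
labels = label_spectrum(12, [(3.4, 2), (8.0, 1)])
```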

[0086] In this way, the neural network was prioritized to predict other peaks accurately instead of trying to provide exact locations of peaks that have intrinsically elevated positional uncertainties. Second, the prediction of peak parameters of standalone peaks or overlapping peaks whose amplitudes substantially exceed those of their overlapping partner peaks was accurately achieved, whereas peak parameter prediction of peaks, such as shoulder peaks, that were weaker than their overlapping neighbors is naturally much harder.

[0087] It was observed that it is beneficial to predict the peak parameters of these two different types of peaks by using separate neural network components in the output regressor layer, which can be achieved by using different labels for the two types of peaks.

[0088] Three different classes of output have been implemented in the training and prediction framework for each point of a spectrum, though other classes may be employed. The classes include “Class 2 peaks”, which are spectral features that can be explained by single peaks or peaks that dominate their overlapping neighbor peaks, “Class 1 peaks”, which are peaks that have overlap and are dominated by their overlapping neighbor peak(s) in terms of peak amplitude and volume and are usually manifested as shoulder peaks, and “Class 0 non-peaks”, which are spectral points that do not correspond to a peak center.

[0089] The neural network was implemented and trained using TensorFlow v1.3 [36], taking 1D spectra as input. The architecture of DEEP Picker is illustrated in Fig. 2.

[0090] In Fig. 2, the deep neural network peak picker included seven 1D convolutional layers with rectified linear unit (ReLU) activation functions (C1-C7), one max-pooling layer (P1), one convolutional layer with a SoftMax activation function to classify every data point, and one convolutional layer with linear activation function to predict the peak position at the subpixel resolution, peak amplitude, peak width, and the Lorentzian fraction of its peak shape. The input of DEEP Picker is an N x 1 tensor (column vector), where N is the number of data points of the 1D input spectrum. Hidden layers and output layers all had the same dimension N as the input. The depths of the 8 hidden layers (from C1 to C7, P1) were 40, 20, 10, 20, 10, 30, 18, and 18. Their kernel sizes were 11, 1, 11, 1, 1, 11, 1, and 3, and the kernel size is 1 for both the classifier and regressor. As is common in machine learning, all kernels are applied N times in a sliding window fashion, with each layer having multiple kernels (note that in the figure for each layer, only a single kernel is indicated for a given position). The output classifier layer (top right) yields the prediction of whether a data point is a Class 2 peak, Class 1 peak, or Class 0 non-peak. The output regressor layer (bottom right) yields the predicted peak parameters for all peaks (Class 2 and 1).
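As a consistency check on the layer depths and kernel sizes listed above, the following sketch counts the weights and biases of each 1D convolutional layer. Two points are assumptions of this sketch rather than statements of the disclosure: the classifier is taken to have 3 output channels (one per class), and the regressor is taken to have 8 output channels, i.e., the four peak parameters predicted separately for Class 2 and Class 1 peaks. Under these assumptions the count comes to 8037, which agrees with the number of trainable parameters reported for DEEP Picker; with a single 4-channel regressor it would instead be 7961.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 1D convolutional layer."""
    return kernel_size * in_channels * out_channels + out_channels


# Depths and kernel sizes of C1..C7 as listed in the text.
depths = [40, 20, 10, 20, 10, 30, 18]
kernels = [11, 1, 11, 1, 1, 11, 1]

hidden_total = 0
in_channels = 1  # the input spectrum is an N x 1 tensor
for out_channels, kernel in zip(depths, kernels):
    hidden_total += conv1d_params(in_channels, out_channels, kernel)
    in_channels = out_channels

# The max-pooling layer P1 has no trainable weights and keeps the depth at 18.
classifier_params = conv1d_params(in_channels, 3, 1)  # ASSUMPTION: 3 classes
regressor_params = conv1d_params(in_channels, 8, 1)   # ASSUMPTION: 4 params x 2 peak classes

total_params = hidden_total + classifier_params + regressor_params
```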

[0091] After hyper-parameter tuning, DEEP Picker contains 7 hidden convolutional layers, 1 hidden max-pooling layer, and two parallel output layers with a total of 8037 trainable parameters. In typical convolutional neural networks for image classification and object detection, max pooling is combined with convolutional layers to achieve location invariance of features. By contrast, location invariance generally does not apply in NMR spectra since a shift of one peak (or feature) will generally affect the interpretation of nearby peaks (or features). Therefore, the max-pooling layer was used as the penultimate layer. A convolutional layer with SoftMax activation [37], called the output classifier layer, was utilized to classify every data point, which assigns an individual score for all three peak classes (2, 1, or 0), which were then normalized for each data point so that their sum was 1. The class with the maximal score was then chosen as the predicted class, with the numerical score as a quantitative measure of confidence of the predicted class for each data point of the input spectrum. For any data point predicted to be a peak (Class 2 or 1), DEEP Picker predicted the subpixel peak position relative to the on-grid points, peak amplitude, peak width, and the Lorentzian vs. Gaussian components to the Voigt shape using another convolutional layer, called the output regressor layer. It is worth mentioning that all kernels were applied multiple times, i.e., across the full input spectrum in a sliding window fashion, and each convolutional layer has multiple kernels, although in Fig. 2, only one kernel operating at an arbitrarily chosen position is illustrated for each layer. The loss function was the mean squared error (MSE) for the regressor and cross-entropy for the classifier. The loss value (training target) was the weighted average of the cross-entropies of the three classes of data points and the MSE of the two classes of peaks. DEEP Picker was trained using the Adam optimizer with a learning rate of 0.002 for 4000 epochs [37].
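The per-point normalization and class selection performed by the output classifier layer can be sketched as below. The raw score values are made up for illustration, and the function names and the ordering of the scores (Class 0, Class 1, Class 2) are assumptions of this sketch.

```python
import math


def softmax(raw_scores):
    """Normalise raw scores so that they are positive and sum to 1."""
    shift = max(raw_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - shift) for score in raw_scores]
    norm = sum(exps)
    return [value / norm for value in exps]


def classify_point(raw_scores):
    """Return (predicted class, its normalised score) for one data point.

    ASSUMPTION: raw_scores is ordered as [Class 0, Class 1, Class 2].
    """
    scores = softmax(raw_scores)
    predicted = max(range(len(scores)), key=lambda c: scores[c])
    return predicted, scores[predicted]


# Hypothetical raw classifier outputs for a single data point.
predicted_class, confidence = classify_point([0.2, 1.1, 3.0])
```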

[0092] All data were used simultaneously in a single batch. The performance of the validation set was monitored in this process to prevent potential overfitting. The small size of the neural network compared to the size of the training set generally avoids overfitting. The output classifier layer (top-right) assigned every spectral data point to one of three outputs: a Class 2 peak, a Class 1 peak, or a Class 0 non-peak. The output regressor layer (bottom-right) predicted peak parameters (amplitude, linewidth, etc.) for any Class 2 or 1 peak. Because three data points were labeled to be a peak for each true peak in the training set, DEEP Picker predicted three consecutive data points as the peak for well-defined peaks. However, for a peak with large positional uncertainty, such as a strongly overlapped peak, DEEP Picker assigned peaks to regions with fewer or more than three data points. In either case, the application of a non-maximum suppression algorithm [34], [38] for post-processing kept only a (single) data point that has the highest score for each region, as further explained below.
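A minimal sketch of this post-processing step is given below. It assumes that a "region" is a contiguous run of data points predicted as Class 2 or Class 1, which is one plausible reading and not a definition taken from this disclosure; the function name and the example values are hypothetical.

```python
def non_maximum_suppression(classes, scores):
    """Keep, for each contiguous run of predicted peak points, the best index.

    classes: predicted class per data point (0 = non-peak, 1 or 2 = peak).
    scores:  confidence score per data point.
    ASSUMPTION: a region is a contiguous run of points with class > 0.
    """
    kept = []
    start = None
    # A trailing 0 acts as a sentinel that closes a run ending at the last point.
    for index, cls in enumerate(list(classes) + [0]):
        if cls > 0 and start is None:
            start = index
        elif cls == 0 and start is not None:
            best = max(range(start, index), key=lambda j: scores[j])
            kept.append(best)
            start = None
    return kept


# Hypothetical predictions: a three-point main peak and a two-point shoulder.
classes = [0, 2, 2, 2, 0, 0, 1, 1, 0]
scores = [0.9, 0.6, 1.0, 0.6, 0.9, 0.9, 0.5, 0.7, 0.9]
kept_indices = non_maximum_suppression(classes, scores)
```

Here only the middle point of the three-point run and the better-scoring point of the two-point run survive.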

[0093] The 1D DEEP Picker was then tested on a 1D 1H cross-section (along ω2) of an experimental 2D 15N-1H HSQC spectrum of KRas, which is a globular protein with 169 residues.

[0094] Figure 4A shows the point-by-point prediction of the output classifier layer where the red, magenta, and black lines are scores for the Class 2 peaks, Class 1 peaks, and Class 0 non-peaks, respectively. For each data point, the class with the highest score was taken as the predicted class. The sum of scores for Class 2 peaks and for Class 1 peaks was taken as a confidence level score of the picked peaks. This helps focus subsequent visual inspection on low-scoring peaks for their potential removal from further analysis. For example, Class 2 has the highest score for three consecutive data points, around 8.59 ppm (indicated by a red arrow), and hence, all three data points were predicted to be Class 2 peaks. Application of the non-maximum suppression algorithm suppressed low-confidence predicted peaks that were direct neighbors of predicted peaks, and only the middle data point with a score around 1.0 was kept since the scores of the two neighboring data points were around only 0.6. Once this middle data point was identified as a peak, the deconvoluted peak was generated at sub-pixel position resolution along with its peak amplitude, peak width, and the fraction of Lorentzian vs. Gaussian components obtained from the output regressor layer. In Fig. 5B, red lines correspond to reconstructed individual Class 2 peaks from the prediction, including the peaks at 8.59 and 8.65 ppm. Magenta lines correspond to reconstructed individual Class 1 peaks (shoulder peaks) by the same method, and the sum of the red and magenta spectra corresponds to the input spectrum.

[0095] Because DEEP Picker was a local feature-based predictor, it also assigned Class 2 or 1 peaks to noise features that are close to the baseline in regions without signal. Such noise peaks are subsequently removed if they are below a peak amplitude cutoff based on an automated global noise level estimator.

[0096] The noise level of an NMR spectrum was defined as 1.485 times the median absolute deviation (MAD) of the full spectrum. It was computed in two iterations, with signals that exceeded the threshold removed after the first iteration. This measure is notably robust as long as there are sizeable empty regions in the spectrum.
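The two-pass noise estimate can be sketched as follows. The factor 1.485 is taken from the text; the removal threshold is not specified there, so the sketch assumes that points deviating from the median by more than three times the first-pass estimate are discarded. The function names, that threshold factor, and the toy spectrum are all hypothetical.

```python
import math
import statistics


def mad_noise(values):
    """1.485 times the median absolute deviation of the given intensities."""
    center = statistics.median(values)
    deviations = [abs(value - center) for value in values]
    return 1.485 * statistics.median(deviations)


def estimate_noise(spectrum, threshold_factor=3.0):
    """Two-pass noise estimate.

    ASSUMPTION: after the first pass, points deviating from the median by more
    than threshold_factor times the first estimate are treated as signal and
    removed before the second pass.
    """
    first = mad_noise(spectrum)
    center = statistics.median(spectrum)
    quiet = [v for v in spectrum if abs(v - center) <= threshold_factor * first]
    return mad_noise(quiet)


# Toy spectrum: a small deterministic baseline ripple with one strong signal.
baseline = [0.01 * math.sin(1.7 * i) for i in range(200)]
spectrum = baseline[:100] + [1.0, 2.0, 1.0] + baseline[100:]
noise_level = estimate_noise(spectrum)
```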

[0097] DEEP Picker used a “low peak amplitude cutoff” (LPAC) to consider an identified feature as a peak, which depends on the molecular system and the intended use of the spectrum. For protein 15N-1H HSQC applications, the default LPAC was 9 times the noise level, whereas for metabolomics, the default cutoff was 5.5 times the noise level. LPAC was adjusted by the user, or the returned peak list was edited according to amplitude as well as other criteria specified by the user during post-processing. The optimal LPAC also depended on the signal-to-noise ratio of the spectrum, the dynamic range of cross-peaks of interest for downstream analysis, and the presence of sample impurities, chemically modified or aggregated proteins. It is contemplated that a set of rules can be established for an optimal choice of the LPAC for various types of applications, which may be accomplished by analyzing the spectra of many different biomolecular systems with DEEP Picker.
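Applying the LPAC amounts to a simple amplitude filter, sketched below with the two default factors quoted above. The dictionary keys, the peak record format, and the choice of keeping peaks whose amplitude equals the cutoff are assumptions made for illustration.

```python
# Default LPAC factors quoted in the text; the key names are hypothetical.
DEFAULT_LPAC_FACTOR = {"protein_hsqc": 9.0, "metabolomics": 5.5}


def apply_lpac(peaks, noise_level, application="protein_hsqc", factor=None):
    """Keep peaks whose amplitude reaches factor times the noise level.

    A user-supplied factor overrides the application default.
    ASSUMPTION: each peak is a dict with an "amplitude" entry.
    """
    if factor is None:
        factor = DEFAULT_LPAC_FACTOR[application]
    cutoff = factor * noise_level
    return [peak for peak in peaks if peak["amplitude"] >= cutoff]


# Hypothetical picked peaks and noise level.
picked = [{"amplitude": 0.50}, {"amplitude": 0.08}, {"amplitude": 0.06}]
protein_peaks = apply_lpac(picked, noise_level=0.01)
metabolite_peaks = apply_lpac(picked, noise_level=0.01, application="metabolomics")
```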

[0098] Similarly, it is contemplated that DEEP Picker may also predict a point with a small deviation from an otherwise smooth profile to be a separate peak. Because the predicted amplitude of such a peak will be small, it can be filtered out using the same type of peak amplitude cutoff.

[0099] Generalization to 2D spectra. Because 2D NMR cross-peaks can have a much larger number of different peak shapes and overlap patterns than 1D spectra, a 2D training set would need to be extremely large to achieve a robust neural network-based peak picker. Instead, the 1D DEEP Picker was applied separately to all rows and columns of a 2D spectrum, and the scores were combined for 2D cross-peak identification. In order for a 2D data point to be identified as a cross-peak, the data point was assigned by DEEP Picker to a 1D peak in both its cross-sections along ω1 (column) and ω2 (row) (exceptions will be discussed below). Peak width, sub-pixel position resolution, and percentage of Lorentzian vs. Gaussian components to the Voigt profile along the two dimensions were taken directly from the corresponding 1D prediction, whereas the peak amplitude was obtained as the average of the two 1D predictions and the peak confidence level score was calculated as the lower of the two 1D confidence level scores.

[0100] 2D spectral peak picking is illustrated in Figs. 5A, 5B using a synthetic spectrum consisting of two overlapping cross-peaks where the true peak positions were indicated by circles. Figure 5A shows the two cross-peaks at locations (40,40) and (48,48) were picked correctly. In addition, the row-based 1D peaks (bold horizontal lines) identified by DEEP Picker also intersect with the column-based 1D peaks (bold vertical lines) at the locations (40,48) and (48,40), which would cause the prediction of these two additional cross-peaks (crosses) that are however false. The 2D peak-picking algorithm was able to identify and remove these types of false-positive cross-peaks based on the fact that along both their rows and columns, they behave as 1D shoulder peaks (Class 1 peaks). Figure 6B illustrates another instructive case where two true cross-peaks (solid circles) were close along both the direct and indirect dimensions, and only one 1D peak was predicted for any column (bold horizontal line) and any row (bold vertical line), even though the cross-peak shape suggested the presence of two strongly overlapping peaks. In this case, the horizontal and vertical lines reflecting column-based and row-based 1D peaks, respectively, were tilted, deviating significantly from straight vertical and horizontal directions. The 2D peak-picking algorithm searched for this type of pattern by calculating the angle between the most tilted segment of the horizontal and vertical lines. If both angles were larger than a cutoff of 14°, the peak at the intersection was replaced by two new peaks, whose positions are defined by the midpoints of the end positions of the two segments. The most tilted segments of the horizontal and vertical lines are plotted as dotted lines in Fig. 6B, and the new cross-peaks as open circles.
The predicted peaks slightly deviate from the exact locations of the true peaks (filled circles), which shows that in the case of such strong peak overlap, the extracted 2D cross-peaks showed some small positional errors.

[0101] Indeed, the exemplary method can be applied to other types of NMR spectra. It is contemplated that the exemplary method can be, for example, employed on 3D NMR spectra by analyzing 1D cross-sections along all three dimensions in a manner that is analogous to the extension of DEEP Picker from 1D to 2D. Since DEEP Picker was specifically trained on 1D spectra that are representative of cross-sections of 2D spectra in terms of the number of points per peak, lineshapes, etc., adaptation to 3D spectra, which tend to have much lower digital resolution while suffering from fewer cross-peak overlaps due to the 3rd dimension, poses new challenges. The exemplary system may be employed for 3D (and possibly even higher dimensional) spectra by training a neural network that uses fewer points per peak.

[0102] Application to ¹⁵N-¹H HSQC spectra of proteins. After training on synthetic data, DEEP Picker was applied to experimental 2D ¹⁵N-¹H HSQC spectra of proteins, whereby all NMR spectra were processed using NMRPipe [39] with manual phase correction and automatic polynomial baseline removal. 2D HSQC spectra belong to the most widely used spectra in biomolecular NMR, for example, for fingerprinting, chemical shift perturbation in titration studies, or pseudo-3D NMR experiments for quantitative dynamics studies (R1, R2, R1ρ, CPMG, etc.) [1]. Hence, the accurate computer-assisted analysis of HSQC spectra, including strongly overlapped regions, is important for many different types of NMR applications.

[0103] DEEP Picker was first applied to α-synuclein, which is an intrinsically disordered 140-residue protein. The ¹⁵N-¹H HSQC spectrum was originally measured with 1024 complex data points along the direct dimension and 256 complex data points along the indirect dimension. In order to assess DEEP Picker’s power to recover accurate cross-peak information at high resolution from lower resolution data, the time-domain data was reprocessed by artificially reducing the spectral resolution along the indirect dimension: by removing the t1 increments 129-256, the spectral resolution was reduced by a factor of two. DEEP Picker was then applied to both the original high-resolution and the reduced-resolution spectra for comparison.

[0104] The results for selected regions are shown for the original spectrum in Fig. 6A-6C (left panels) and for the reduced resolution spectrum in Fig. 6D-6F (right panels). DEEP Picker successfully identified all cross-peaks, including those belonging to strongly overlapped regions, with the exception of a very low-intensity peak (Fig. 6D) approaching the noise level in the spectrum that used only half of the experimental data. This demonstrates that the peak picker was able to deconvolute accurately such a complex spectrum, even if the resolution is limited, provided that the signal-to-noise of the signals of interest is sufficiently high.

[0105] DEEP Picker also performs well for globular proteins, as is demonstrated in Fig. 8 for ¹⁵N-¹H HSQC spectra of four different proteins, namely Gankyrin [40] (24.4 kDa), PLA2 [41] (13.8 kDa), ARID [42] (10.9 kDa), and Rop [43] (14.2 kDa). All four spectral regions depicted have significant amounts of cross-peak overlap. Figs. 8A-8D show a comparison of the peak picking results of NMRPipe, Sparky, NMRViewJ, and DEEP Picker for challenging regions of protein ¹⁵N-¹H HSQC spectra, whereby only DEEP Picker successfully identified all shoulder peaks.

[0106] Application to ¹³C-¹H HSQC of metabolomics sample. NMR spectra of metabolomics samples represent another important class of samples where strong peak overlaps can occur in some regions of 2D ¹³C-¹H HSQC spectra, which are usually measured at ¹³C natural abundance because of the often large number of different metabolites present in such samples. In contrast to protein NMR spectra, the large dynamic range of peak amplitudes due to large differences in metabolite concentrations poses an additional challenge. An objective of metabolomics studies was “fingerprinting”, which is the unique identification and analysis of as many cross-peaks as possible, even for ones that barely exceed the noise level, toward a comprehensive and quantitative analysis of these types of biological samples. Because of their small size compared to proteins, metabolites undergo rapid overall tumbling leading to long transverse relaxation times and sharp cross-peaks with small linewidths, but the number of cross-peaks can be very large depending on the complexity of the sample. The application of DEEP Picker to a 2D ¹³C-¹H HSQC spectrum of mouse urine, which may contain hundreds of different metabolites with various concentrations, is demonstrated herein.

[0107] Selected spectral regions of the spectrum, together with the picked cross-peaks, are shown in Figs. 9A-9D. The aliphatic regions shown belong to some of the most crowded regions of urine spectra that include numerous carbohydrates. Because a dominant fraction of the cross-peaks of mouse urine belongs to unknown metabolites, the ground truth is largely unknown. Hence, Figs. 9A-9D primarily serve as an illustration of the performance of DEEP Picker. Nonetheless, visual inspection shows how DEEP Picker was able to identify and distinguish between strongly overlapping cross-peaks that pose significant challenges for their analysis from standard 2D ¹³C-¹H HSQC experiments [44]. More accurate spectral analysis directly benefits the identification of metabolites in urine and other complex metabolomics mixtures, which is a key step toward their quantitative profiling [2].

[0108] Application to NOESY and TOCSY spectra. When applied to other common 2D NMR experiments, such as NOESY and TOCSY, which tend to possess a larger dynamic range along with larger numbers of challenging peaks than HSQC spectra, DEEP Picker exhibits high confidence in peak picking. This is demonstrated in Figs. 10A-10D, which show regions of a NOESY spectrum of protein Im7 and a TOCSY spectrum of urine. DEEP Picker was able to identify individual multiplet components due to J-splittings, which can be challenging for traditional peak pickers. DEEP Picker generally has a higher confidence score for major cross-peaks and lower confidence in low-amplitude cross-peaks or multiplet components.

[0109] Selected regions of the NOESY spectrum with picked peaks that are color-coded according to their confidence level score are shown in Figs. 11A-11D. Similar to Fig. 9, since the ground truth of NOESY and TOCSY spectra with their very large number of cross-peaks is only partially known, Fig. 10 serves primarily as an illustration of complex spectra that can be supplied to DEEP Picker for peak picking.

[0110] Quantitative performance and effect of noise and other artifacts. A quantitative and objective assessment of a peak picker is desirable. However, unlike other common machine learning applications, there is not a large, carefully curated NMR spectral test database available for an objective assessment of NMR peak picking performance. Here, previously determined or published cross-peak assignments that were obtained with the help of complete sets of 3D assignment experiments were used. The picked ¹⁵N-¹H HSQC cross-peaks were assessed in terms of the number of false negatives and false positives, whereby “false” positives were visually inspected as they may correspond to true cross-peaks belonging to impurities or to chemically modified or aggregated proteins.

Table 1. Quantitative assessment of DEEP Picker performance for selected 2D ¹⁵N-¹H protein NMR spectra

[0111] In Table 1, quantitative statistics and performance metrics of DEEP Picker were compiled for two of the most challenging proteins. The results suggested that the accuracy of DEEP Picker is very high, with the only false negative peaks corresponding either to peaks that almost perfectly overlap with other peaks in the 2D ¹⁵N-¹H HSQC and could only be identified with the help of additional 3D triple-resonance (¹H, ¹³C, ¹⁵N) NMR experiments, or to weak peaks falling well below a given amplitude cutoff.

[0112] Five peaks with high amplitudes were identified by DEEP Picker in both protein spectra that, upon visual inspection, looked like real peaks but had not been assigned. A large number of weak peaks were identified by DEEP Picker with amplitudes <10% of the major cross-peaks that had been previously assigned. Visual inspection, based on contour plots with the lowest contours drawn at a very low level, revealed that these cross-peaks are, in all likelihood, true peaks. Their unambiguous annotation as main peaks requires spin connectivity information from additional multi-dimensional NMR experiments, but due to their low amplitudes, sensitivity could be a significant challenge. It is possible that, in addition, certain noise artifacts or peaks with small phase errors are computationally indistinguishable from true peaks. Since their amplitudes are usually only a fraction of the major peaks that are of primary interest for the vast majority of NMR applications, they can be effectively filtered out during post-analysis using cutoff criteria based on amplitude.

[0113] Although the HSQC spectra used in this work stem from “real world” applications with signal-to-noise ratios (S/N) that are typical for samples measured at a shared NMR facility, additional measurements were taken of HSQC spectra on a K-Ras sample with a concentration of only 130 μM. A 30m ¹⁵N-¹H HSQC spectrum was collected with only four scans per increment and, for comparison, also with 108 scans per t1-increment, improving S/N by over a factor of 5. As shown in Fig. 12, the application of DEEP Picker revealed that even for the low-sensitivity spectrum with S/N = 25:1, DEEP Picker picked all isolated peaks correctly and was able to identify the vast majority of shoulder peaks. Sometimes, however, multiple peaks were picked around a peak maximum because of the uneven peak shapes displayed by the noisy spectrum, and some low-amplitude peaks close to the noise floor were lost.
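The stated gain of "over a factor of 5" is consistent with ordinary signal averaging, in which S/N grows with the square root of the number of scans. The short check below assumes uncorrelated noise between scans and otherwise identical acquisition settings; the function name `snr_gain` is introduced here for illustration only.

```python
import math


def snr_gain(scans_new, scans_ref):
    """Expected S/N improvement from signal averaging (assumes uncorrelated noise)."""
    return math.sqrt(scans_new / scans_ref)


# 108 scans versus 4 scans per increment, as in the K-Ras comparison.
gain = snr_gain(108, 4)
print(f"expected S/N gain: {gain:.2f}")
```

Under the same assumption, a dataset that uses one quarter of the data points would be expected to have about half the sensitivity.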

[0114] Because DEEP Picker used local information only, artifacts or noise that shared the same local features with true peaks were not easily recognized. Such artifacts were identified and removed in a column-by-column post-analysis, including residual water signals that had a well-defined ¹H chemical shift or t1-noise forming vertical signal streaks along the indirect dimension, and they were not counted as false positives. Similarly, phase errors of the input spectrum were identified by inspection of the entire spectrum rather than based on individual peaks. They were manifested as minor peaks associated with main peaks in a systematic uniform (0th order) or frequency-dependent (1st order) manner, and they were removed by reprocessing the original spectrum. On the other hand, if only selected peaks possessed such features, they were determined to correspond to true overlapped peaks and were kept in the final peak list. A residual smooth baseline artifact, a small phase error (<2°), or slightly imperfect peak shapes caused, e.g., by temperature fluctuations or shimming issues, were generally tolerated by DEEP Picker, since in the training set, the study intentionally annotated some very closely overlapped peak pairs as a single peak.

[0115] Example 2

[0116] A new era for NMR spectral analysis by machine learning. The basic premise of a machine-learning based NMR peak picker is that it is possible to train an algorithm with a database of NMR spectra of known NMR peak composition so that it then can be applied for the accurate and comprehensive identification of NMR signals that were not used for training. In fact, early attempts using artificial neural networks (ANNs) for pattern recognition in 1D and 2D NMR spectra date back to the late 1980s and early 1990s [30’], [6’]. However, over the ensuing decades, ANNs could not establish themselves as standard tools against more traditional spectral analysis methods. The task of peak recognition in NMR spectra loosely resembles the object recognition problem in photographs and images, a burgeoning field enabled by gigantic image databases generated in part through social media. Much of the recent progress in image recognition stems from major developments in the theory and practical implementation of Deep Neural Networks (DNNs), especially for the subclass of convolutional neural networks, which greatly benefit from vast improvements in computer power available today. Machine-learning approaches based on DNNs are used in the methods described in the following example and may be extended to other machine-learning methods. It should be noted that the DNN returns peak positions with peak shapes and amplitudes whose accuracy depends, among other factors, on the degree of overlap.

[0117] General considerations for experimental vs. synthetic spectral training databases. In any machine-learning project, the importance of the quality and size of the database used cannot be overstated. The ideal database should cover a sufficiently large number of spectra that contain NMR peaks of different amplitudes and shapes resembling those expected in target applications. Moreover, the database spectra should encompass the fullest possible range of peak configuration scenarios ranging from peaks that are well isolated to strongly overlapping peak clusters along with a large dynamic range of peak amplitudes and peak widths. Such a database will then allow the training of a DNN for the picking of individual peaks of both well-resolved and crowded spectral regions. The underlying assumption is that a more comprehensive representation of peak properties and configurations in the training database will improve the performance of the DNN peak picker when applied to spectra in real-world applications.

[0118] For the development of a machine-learning-based peak picker for 2D or higher dimensional NMR spectra, the training set ideally consists of a vast amount of experimental 2D NMR spectra, e.g., ¹⁵N-¹H HSQC, NOESY, and TOCSY spectra, of many different protein systems that were recorded with similar acquisition parameters and processed in the same way. These data sets will then allow the training for the identification of isolated peaks and peaks that show a variable degree of overlap using assignments made by human experts. Such a strategy was chosen in the original application of neural networks to peak picking [6’] and also in the recent NMRNet approach [13’] for the automated analysis of NOESY-type spectra. A related approach has been proposed to filter out noise peaks to facilitate the automated analysis of multidimensional datasets [14’]. The challenges are that such a large collection of experimental spectra needs to be carefully curated, including uniform experimental parameters and processing, and that even a large number of experimental spectra used in this way will only sample a subset of possible peak overlap scenarios, especially for peak clusters that involve three or more cross-peaks. This carries the risk that the neural network may be inadequate for the deconvolution of crowded regions consisting of sets of peaks that mutually overlap or overlapping peaks with large amplitude or linewidth differences since they may be too different from the scenarios covered by the training database. Such deficiencies and gaps in experimental training sets are difficult to spot and may only be revealed during the application phase of a DNN. Another potential complication with an experimental training database is that the correct decomposition of a spectral region into the sum of true individual resonances may be unknown or only partially known.
It should also be kept in mind that a feature annotated as a "true peak" by a human expert based on visual inspection does not always guarantee that it is a real peak.

[0119] The ground truth describes the reality one wants the model to accurately predict. Part of the ground truth information about experimental NMR peaks, such as peak positions of 2D ¹⁵N-¹H or ¹³C-¹H HSQC peaks, can often be obtained from higher dimensional NMR spectra of isotopically labeled proteins that exhibit minimal peak overlap, whereas the true width and volume of each individual peak may not be easily obtainable by experiments for strongly overlapped peaks. By contrast, in the case of NOESY-type experiments, the reality of a peak, especially when it is weak, is often hard to prove as it would require the highly accurate knowledge of a biomolecule's conformational ensemble in solution and its dynamics time scales, which are rarely available.

[0120] Many of these issues can be avoided by choosing an entirely synthetic training database. In this case, the ground truth of each database entry is the set of all individually simulated peaks prior to their summation, for which all peak parameters (position, width, volume) are precisely known. The advantage of synthetic databases is that the ground truth is always known without ambiguities. Synthetic databases also allow the generation of densely sampled spectral sets that adequately represent the many possible complex peak overlap situations, including ones that may be missed in experimental databases.

[0121] Fig. 13 shows an example of a synthetic 1D spectrum consisting of three individual overlapping peaks, which represent the ground truth and are suitable to be part of the training of the DNN peak picker. Before its use for training, however, a synthetic spectral database was curated by removing entries whose peak overlaps were too strong and that were virtually impossible to unambiguously deconvolute in the presence of noise. Removal of such entries prevents feeding the neural network with ambiguous training data, which would result in poor performance. Alternatively, a hybrid spectral database for DNN training could be used, one that contains both automatically annotated synthetic spectra and manually annotated experimental spectra, whereby the relative weights of the two types of databases can be optimized during training.

[0122] In some examples, a synthetic database for training and validation was directly computed in the frequency domain by the coaddition of generated peaks with randomly chosen positions, lineshapes, and volumes, as done for DEEP Picker. In another example, peaks were simulated in the time domain, followed by zero-filling, apodization, and Fourier transformation. The latter method allows the simulation of peak shapes that closely resemble those found in actual NMR experiments. Examples additionally included adding Gaussian random noise to the time-domain data with a standard deviation set, e.g., as a constant fraction of the volume of the largest peak. In yet other examples, systematic artifacts were introduced similar to those encountered in practice, such as small phase errors and baseline offsets. However, with the inclusion of more "degrees of freedom," the database was enlarged to adequately represent the combined presence of these effects in the database spectra. In the following example of DEEP Picker, training spectra directly simulated in the frequency domain without the explicit inclusion of the above-mentioned artifacts (noise, phase, baseline) were used.
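The frequency-domain route can be illustrated with a small generator that co-adds randomly parameterized peaks and keeps the individual peak parameters as the ground truth. This is a sketch under stated assumptions rather than the database generator used for DEEP Picker: it uses a pseudo-Voigt lineshape (a weighted sum of a Lorentzian and a Gaussian of equal width) for simplicity, and the function names, grid size, and parameter ranges are chosen here for illustration only.

```python
import numpy as np


def pseudo_voigt(x, center, fwhh, volume, eta):
    """Pseudo-Voigt peak: eta = 1 is pure Lorentzian, eta = 0 is pure Gaussian.

    Both components are normalized to unit area, so `volume` is the peak integral.
    """
    gamma = fwhh / 2.0                                    # Lorentzian half width
    lorentz = (gamma / np.pi) / ((x - center) ** 2 + gamma ** 2)
    sigma = fwhh / (2.0 * np.sqrt(2.0 * np.log(2.0)))     # Gaussian std. deviation
    gauss = np.exp(-0.5 * ((x - center) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return volume * (eta * lorentz + (1.0 - eta) * gauss)


def make_entry(rng, n_points=300, max_peaks=3):
    """Return one synthetic 1D spectrum and its ground truth (list of peak parameters)."""
    x = np.arange(n_points, dtype=float)
    spectrum = np.zeros(n_points)
    truth = []
    for _ in range(rng.integers(1, max_peaks + 1)):
        params = {
            "center": rng.uniform(100.0, 200.0),  # position in grid points (illustrative)
            "fwhh": rng.uniform(6.0, 20.0),       # width in points, i.e. points per peak
            "volume": rng.uniform(0.1, 1.0),
            "eta": rng.uniform(0.0, 1.0),         # Lorentzian fraction
        }
        truth.append(params)
        spectrum += pseudo_voigt(x, **params)     # co-addition of individual peaks
    return spectrum, truth


rng = np.random.default_rng(0)
spectrum, truth = make_entry(rng)
```

Because every peak is generated individually before summation, the annotation of each database entry is known exactly, which is the property the synthetic approach relies on.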

[0123] When generating the synthetic spectral database, information about peak shapes to be expected in subsequent applications is important, allowing a more targeted training of the DNN to improve peak-picking accuracy. In solution NMR spectra, peak shapes are, to a good approximation, a convolution of Lorentzian and Gaussian lineshapes, which is a consequence of the exponentially decaying free induction decay (FID) subject to apodization with commonly used window functions, such as a shifted sine-bell, cosine-squared, or 2π-Kaiser window, prior to zero-filling and Fourier transformation. The convolution of Lorentzian and Gaussian lineshapes is known as the Voigt profile V(ω), which can be expressed as the real part of the complex Fourier transformation of functions of the type exp(iω0t − R2t − bt²):

[0124] where A is the peak volume, ω0 is the resonance frequency defining the peak position, R2 = 1/T2 is the transverse relaxation rate, and b defines the Gaussian contribution to the lineshape (the lineshape is Lorentzian in the limit of b = 0 and Gaussian in the limit of R2 = 0). In practice, it can be advantageous to use pseudo-Voigt functions (Zaghloul and Ali 2011) for the efficient generation of synthetic spectra. The Voigt lineshape was used for the training of DEEP Picker. Experimental spectra with peak shapes that significantly and systematically deviate from Voigt profiles may be picked less accurately and, hence, they will require training of a peak picker using a different training database with spectra that accurately represent the target peak shapes.
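The time-domain description of the Voigt profile can be checked numerically by Fourier transforming a decaying signal of the form exp(iω0t − R2t − bt²) and taking the real part. The sketch below is illustrative: the spectral width, number of points, and amplitude scaling are assumptions made here, and the overall normalization of the peak volume is not meant to reproduce the source equation. In the Lorentzian limit (b = 0) the full width at half height in Hz should equal R2/π, which the last lines verify approximately.

```python
import numpy as np


def voigt_spectrum(volume, nu0_hz, r2, b, sw_hz=2000.0, n_acq=4096, n_fft=16384):
    """Real part of the FT of volume * exp(i*2*pi*nu0*t - r2*t - b*t**2)."""
    dt = 1.0 / sw_hz                       # dwell time for spectral width sw_hz
    t = np.arange(n_acq) * dt
    fid = volume * np.exp(2j * np.pi * nu0_hz * t - r2 * t - b * t ** 2)
    fid[0] *= 0.5                          # first-point scaling avoids a baseline offset
    spec = np.fft.fftshift(np.fft.fft(fid, n=n_fft)) * dt   # zero-filled to n_fft
    freq = np.fft.fftshift(np.fft.fftfreq(n_fft, d=dt))
    return freq, spec.real


def fwhh_hz(freq, spec):
    """Full width at half height of a single peak, resolved to one grid spacing."""
    above = spec >= spec.max() / 2.0
    return above.sum() * (freq[1] - freq[0])


# Lorentzian limit: b = 0, so the width is set by R2 alone.
freq, lorentzian = voigt_spectrum(1.0, 100.0, 20.0, 0.0)
width_lorentzian = fwhh_hz(freq, lorentzian)

# Adding a Gaussian term (b > 0) broadens the line.
_, voigt = voigt_spectrum(1.0, 100.0, 20.0, 200.0)
width_voigt = fwhh_hz(freq, voigt)
```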

[0125] One of the training parameters was the digital resolution of the spectrum compared to the linewidth of a typical resonance. It can be expressed as the number of spectral data points per peak (PPP), which was the typical number of spectral data points across a peak's full width at half height (FWHH). In the case of DEEP Picker, peaks were represented at fairly high density, i.e., PPP fell into the range from 6 to 20 points or from 4 to 12 points, whereby the former was typical for proteins and the latter for metabolomics spectra. By guaranteeing that an input spectrum has a PPP in the above range, the performance accuracy of the DNN was improved while keeping the size of the spectral training set reasonably small. It is contemplated that for real-world applications, the input spectra may have a PPP in the same range as the training datasets. When needed, the PPP was adjusted for a spectrum even after it had been recorded simply by the application of the appropriate amount of zero-filling. It was, therefore, not necessary to increase the actual spectral resolution via the collection of additional data points at longer t1-evolution or t2-acquisition times.
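The relationship between PPP and zero-filling can be made concrete with a short calculation. The sketch assumes that the digital resolution is the spectral width divided by the number of spectral points, and that zero-filling is applied by repeated doubling, which is a common convention but is not prescribed by the text; the function names and example values are hypothetical.

```python
def points_per_peak(fwhh_hz, sw_hz, n_points):
    """Number of spectral points across the FWHH for a given spectral size."""
    return fwhh_hz * n_points / sw_hz


def zero_fill_size(fwhh_hz, sw_hz, n_points, ppp_min=6.0, ppp_max=20.0):
    """Smallest doubling of n_points that brings PPP to at least ppp_min."""
    n = n_points
    while points_per_peak(fwhh_hz, sw_hz, n) < ppp_min:
        n *= 2
    if points_per_peak(fwhh_hz, sw_hz, n) > ppp_max:
        raise ValueError("PPP exceeds the upper end of the training range")
    return n


# Hypothetical example: 20 Hz linewidth, 8000 Hz spectral width, 1024 points.
ppp_before = points_per_peak(20.0, 8000.0, 1024)   # 2.56, below the protein range
n_final = zero_fill_size(20.0, 8000.0, 1024)       # doubles twice, to 4096 points
ppp_after = points_per_peak(20.0, 8000.0, n_final)
```

In this example two rounds of doubling raise the PPP from about 2.6 to about 10.2, inside the 6 to 20 range quoted for protein spectra, without acquiring any additional data.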

[0126] DNN peak picker: Training, testing, validation. Peak picking was trained for 1D, 2D, or higher dimensional spectra. Because in 1D the relative peak positions are confined to a single axis, there are fewer possibilities of relative peak arrangements compared to 2D. This allowed one to work with a relatively small 1D spectral database to represent relevant configurations defined by the number of peaks, relative peak amplitudes, peak widths, and degree of overlap. Since 2D NMR spectra are defined on a 2D Cartesian frequency grid, there was no rotational symmetry that could be utilized to reduce the database size, although mirror images about the ω1 and ω2 axes were exploited. Hence, for a 2D spectral database, many more possible relative orientations of two or more peaks together with their linewidths and peak shapes along the 2nd dimension needed to be explicitly represented, which required a substantially larger database. It was, therefore, attractive to train a 1D peak picker and subsequently adapt it for peak picking of 2D (or higher dimensional) spectra. The latter was achieved by separately applying the peak picker to each row and each column and then reconciling the results to obtain the peak positions in 2D or higher dimensions. This strategy was further refined to avoid false positives for the accurate identification of shoulder peaks as implemented in DEEP Picker [20’]. The performance of DEEP Picker was exemplified in Fig. 14 for a ¹⁵N-¹H HSQC spectrum obtained both by standard Fourier transformation and non-uniform sampling (NUS) reconstruction using SMILE [34’]. The NUS dataset used 25% of the data points of the conventional dataset with the same t1,max. Therefore, the NUS spectrum had around two times lower sensitivity than the 2D Fourier transform (FT) spectrum.
Comparing the cross-peak positions (excluding the side-chain NH2 groups) of the NUS spectrum by DEEP Picker before and after Voigt fitting with the peak positions of the conventional spectrum (after DEEP Picker and Voigt fitting) yields chemical shift RMSDs of (¹H: 1.2 ppb, ¹⁵N: 17 ppb) and (¹H: 0.84 ppb, ¹⁵N: 15 ppb), respectively. For comparison, the chemical shift RMSD of the conventional spectrum before and after Voigt fitting is (¹H: 0.94 ppb, ¹⁵N: 4.7 ppb). Therefore, the cross-peak accuracy and precision of the NUS and the conventional spectrum achieved by DEEP Picker with or without Voigt Fitter are identical for all practical purposes. It is contemplated that the DNN and Voigt Fitter can be extended to 3D and even higher dimensional NMR data.

[0127] It is contemplated that similar peak-picking performance may be achieved by applying the exemplified DNN method using training sets and network architectures suitably chosen for target systems. For example, if the described model includes more degrees of freedom, such as 2D vs. 1D, significantly larger training databases may be needed to prevent overfitting during training. Additionally, the network architecture, including the number of network layers, may need to be adjusted accordingly for higher dimensions.

[0128] Following training, careful validation, i.e. the application of the DNN to datasets that are of the same nature as the training datasets but that were not used for training and whose ground truth is known, was critical to assess the performance of the example DNN and identify possible overfitting. After successful validation, the DNN underwent testing on a test dataset that was completely independent of both the training and validation datasets to obtain benchmarks of performance.

[0129] Once the testing was found satisfactory, the DNN performance was ready to be assessed using real-world experimental data. Such an assessment was important to gain confidence in the capability of the DNN in practice. In the case of proteins, suitable test data sets were 2D ¹⁵N-¹H HSQC spectra that had been assigned with the help of a standard suite of higher dimensional spectra (such as 3D HNCA, 3D HNCO, etc.). For 2D ¹⁵N-¹H HSQC and similar types of spectra, a quantitative performance score was obtained by counting the number of peaks that were correctly identified (true positives), those that were missed by the peak picker (false negatives), and peaks that were identified but may be artifacts (false positives). These numbers were then converted into normalized standard statistical quantities, such as precision = true positives/(true positives + false positives) and recall = true positives/(true positives + false negatives) (Ting 2010), to facilitate the comparison of the performance of different peak picker software and the monitoring of progress during development.
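The two performance scores follow directly from the three counts. The snippet below is a minimal illustration; the counts are hypothetical and are not taken from Table 1, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from peak counts; returns 0.0 when a denominator is zero."""
    picked = true_pos + false_pos      # all peaks returned by the peak picker
    expected = true_pos + false_neg    # all peaks in the reference assignment
    precision = true_pos / picked if picked else 0.0
    recall = true_pos / expected if expected else 0.0
    return precision, recall


# Hypothetical counts: 95 correct peaks, 5 extra peaks, 3 missed peaks.
precision, recall = precision_recall(95, 5, 3)
```

With these counts the precision is 0.95 and the recall is 95/98, so the two scores separate the effect of extra peaks from the effect of missed peaks.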

[0130] For the determination of true positives, it was possible that two cross-peaks perfectly overlapped and the DNN returned only a single peak, which was then counted as one true positive and one false negative. If hydrogen exchange and chemical exchange were not too severe for both underlying cross-peaks, the peak amplitude of this "double peak" was about twice the average peak amplitude, which an experienced NMR spectroscopist may correctly interpret as a "double peak". By contrast, the false negative count can be avoided only if the DNN is instructed to interpret peak amplitude information in this way. Hence, even "flawless" spectral analysis of such a peak by the DNN can lead to an imperfect score.

[0131] The situation with false positives is even more complicated as there are several possible reasons why the DNN might identify extra peaks. First, even after purification, NMR samples of proteins are never perfectly homogeneous as there can be residual low-concentration impurities in the sample that give rise to additional cross-peaks. Moreover, protein samples can undergo partial aggregation or degradation that will also cause the presence of low-amplitude cross-peaks. If these additional peaks have amplitudes that exceed the noise floor, there is a good chance that the DNN will pick them, leading to a potentially large number of false positives. It should be noted that this is not the fault of the DNN, as it is precisely doing what it has been trained for. Because these extra peaks are often relatively weak, it is useful to define a "low peak amplitude cutoff" (LPAC), which is an amplitude threshold below which a peak returned by the DNN is discarded. In fact, some kind of cutoff is also used by traditional peak pickers. The LPAC can be defined as a fraction of the amplitude of average true positive peaks or as a multiple of the mean noise amplitude σ_noise of the noise floor in a peak-free region of the spectrum, which can be automatically obtained by a robust global noise estimator [20’]. The optimal LPAC may vary from protein to protein or even from sample to sample of the same protein as the amounts of impurities and degraded protein may vary and depend on the sample condition, including age as well as storage conditions, and measurement temperature. Automatic setting of the LPAC is possible, e.g., by determining the mean amplitude A_mean of the N largest peaks in a ¹⁵N-¹H HSQC spectrum, where N is the number of non-proline residues of the protein sequence, so that

LPAC = max(A_mean/X, Y·σ_noise)    (2)

[0132] where X and Y are scaling factors with a typical range between 5 and 50 depending on the application and technical preferences. Y·σ_noise is a lower limit for the LPAC, which ensures that the DNN does not pick peaks that are noise features belonging to the noise floor. This definition of the LPAC can be directly applied to proteins with different sample concentrations, where at high sample concentration LPAC = A_mean/X and at low concentration LPAC = Y·σ_noise. For the latter case, it is naturally possible that some true positive peaks disappear in the noise floor, leading to false negative counts. In the event of substantial line broadening due to rapid HN-hydrogen exchange with the water solvent or conformational exchange, certain peaks can be considerably weakened, falling below A_mean/X, thereby also leading to an uptick of the false negative count. Any a priori information available for a given sample and the type of experiment will help one to properly set the LPAC. Fortunately, for resonance assignment experiments of many proteins, the performance score of the DNN is not particularly sensitive to the exact LPAC value.
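The automatic cutoff of Eq. (2) can be sketched in a few lines. The function names, the default scaling factors, and the toy amplitudes below are assumptions chosen for illustration; the noise level is taken as given rather than estimated from the spectrum.

```python
import numpy as np


def lpac(amplitudes, n_residues, sigma_noise, x=10.0, y=10.0):
    """LPAC = max(A_mean / X, Y * sigma_noise), per Eq. (2).

    A_mean is the mean amplitude of the n_residues largest picked peaks.
    """
    largest = np.sort(np.asarray(amplitudes, dtype=float))[::-1][:n_residues]
    a_mean = largest.mean()
    return max(a_mean / x, y * sigma_noise)


def apply_lpac(amplitudes, cutoff):
    """Discard peaks whose amplitude falls below the cutoff."""
    return [a for a in amplitudes if a >= cutoff]


peaks = [100.0, 90.0, 80.0, 12.0, 3.0]   # toy amplitudes, arbitrary units

# High concentration: the A_mean / X term dominates (90 / 10 = 9 > 10 * 0.5).
cutoff_high = lpac(peaks, n_residues=3, sigma_noise=0.5)
kept_high = apply_lpac(peaks, cutoff_high)

# Low concentration (noisier spectrum): the Y * sigma_noise term dominates (10 * 2 = 20).
cutoff_low = lpac(peaks, n_residues=3, sigma_noise=2.0)
kept_low = apply_lpac(peaks, cutoff_low)
```

In the noisier case the weak peak at amplitude 12 is dropped, illustrating how true but weak peaks can turn into false negatives when the noise term sets the cutoff.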

[0133] In other examples, DNN peak picking is applied to other types of NMR spectra, such as 2D NOESY, 2D TOCSY, or 2D 13C-1H HSQC spectra of metabolomics samples (serum, urine, etc.). The quantitative evaluation of peak-picking performance is, however, challenging for such spectra since the ground truth is generally unknown or incomplete. This is due to unassigned protein resonances in NOESY and TOCSY or the presence of unknown metabolites in metabolomics samples. Therefore, peak-picker performance may be best evaluated by human experts visually assessing whether spectral features, including overlaps, are properly deconvoluted by the DNN. Examples of DEEP Picker performance are given for NOESY and TOCSY spectra of a protein and a mouse urine sample in Fig. 15, demonstrating how a DNN is able to handle even highly congested spectral regions. Both NOESY spectra of proteins and metabolomics spectra are well-known for their large dynamic range of cross-peak amplitudes. In the case of a 2D NOESY, highly informative long-range distance cross-peaks have the lowest amplitude due to their r^-6 internuclear distance dependence, and in metabolomics samples, different metabolites can have vastly different concentrations. When setting X in Eq. (2) to a large value, the LPAC is solely determined by the noise floor, i.e., LPAC = Yσ_noise, where Y can typically take a value in the range of 5-20. This ensures that as many low-amplitude peaks as possible are picked, with some carrying important information while others may be discarded during downstream analysis. The lack of ground truth makes such peaks unsuitable for analysis performance scoring. Nonetheless, in certain situations, it could be useful to report the number of such picked peaks even if they fall below the LPAC and thus are not considered for downstream analysis.

[0134] For a given LPAC value, most peak pickers (including DEEP Picker) work in a fully automated mode. Depending on the results of downstream spectral analysis, such as resonance assignment or the number of unique metabolites identified, the LPAC value can in principle be iteratively adjusted until the highest level of consistency has been achieved.

[0135] Peak picking in the presence of spectral artifacts. Relevant regions of 2D 15N-1H HSQC spectra of proteins were generally "clean" with minimal artifacts, such as t1-noise. Also, the residual water signal was up-field shifted with respect to all amide proton frequencies and was therefore easily excluded before or after analysis by the DNN. By contrast, in 2D TOCSY and NOESY-type spectra, t1-noise appeared for some resonances as narrow streaks along the indirect dimension, which may be interpreted by the DNN as a large number of peaks if the LPAC is defined by the global noise level. Such false peaks can be filtered out by replacing the global LPAC with one that is increased for regions with pronounced t1-noise, although this carries the risk that true but relatively weak cross-peaks in these regions will be missed too.

[0136] Measure of confidence of picked peaks. A quantitative measure of confidence in the output of the DNN for each picked peak directed users to individual peaks or entire peak regions whose analysis by the DNN was potentially ambiguous and challenging. In the case of DEEP Picker, it returned for each spectral point a score for being a major peak, an overlapping minor peak, or not a peak. Peaks that had a low score for being not a peak were assigned a high confidence level, whereas peaks that had a high score for being not a peak were assigned a low confidence level. The latter were also the peaks that were typically hard to deconvolute due to significant overlap or low signal-to-noise (<10 × σ_noise). Confidence level information also helped users visually inspect the results by focusing on potentially ambiguous peaks that were earmarked by the software. Traditional peak pickers generally do not return a confidence level for picked peaks. Additional reliability criteria were included, such as those implemented in iPick [27’], assessing the volume, linewidth, and signal-to-noise of each peak.

[0137] Application to non-uniformly sampled NMR spectra. For samples and experiments for which spectral resolution is the limiting factor rather than sensitivity, non-uniform sampling (NUS) has become a popular alternative to multidimensional FT NMR [12’]. Software packages are available for the spectral reconstruction of NUS data that use different reconstruction algorithms. NUS spectra can suffer from artifacts, including some that are different from those encountered in Fourier transform NMR [36’], which can potentially cause distortions in spectral lineshapes and the appearance of extra peaks. Figure 14B shows the same 15N-1H HSQC spectral region used for the demonstration of DNN peak picking but processed by NUS with a 25% Poisson gap sampling rate using the SMILE software [34’]. The performance of DEEP Picker for this NUS spectrum closely matched that of the fully sampled 2D FT spectrum: when LPAC/σ_noise was set to 30 for both the NUS and uniformly sampled spectrum, only three peaks were missing from the NUS reconstruction compared to the fully sampled spectrum due to the inherently lower sensitivity of the NUS spectrum, which uses only a fourth of the time-domain data of the reference spectrum of Fig. 14A. For LPAC/σ_noise set to this number was reduced to only one missing peak. These results suggest that DEEP Picker can be readily deployed also to NUS-SMILE spectra without requiring new DNN training. Since different NUS reconstruction software packages return different results, a systematic analysis for different types of spectra, NUS schedules, and reconstruction algorithms is needed to better understand the potential limitations of a DNN peak picker applied to NUS spectra.

[0138] The overall workflow of peak picking by a DNN is generally very similar to that of a traditional peak picker. However, in preferred implementations, DEEP Picker is configured to apodize the input spectrum, e.g., using a 2π-Kaiser or cosine-squared window function along each dimension without resolution enhancement. The appearance of sinc-wiggles should be avoided. DEEP Picker also allows for the PPP to fall in the range of 4-12 for metabolomics spectra and 6-20 for protein spectra along each spectral dimension. If the PPP of a given spectrum is too low, it can be readily increased by reprocessing the spectrum with adequate zero-filling. DEEP Picker also applies proper phase correction along all dimensions so that the maximal phase error does not exceed 3°. This is usually easily achievable for spectra acquired on modern NMR spectrometers. Standard baseline correction along all dimensions, as implemented in common NMR spectral processing software, is also advised. The outlined pre-processing of input spectra enables the best possible peak-picking performance of the described method.
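A quick pre-processing check for the PPP ranges quoted above might look as follows. This sketch assumes that PPP denotes the number of digital points spanning a peak's full width at half height (the definition is not restated in this paragraph), that zero-filling is applied in powers of two, and that the names `points_per_peak` and `zero_fill_factor` are purely illustrative.

```python
import math

# Recommended PPP ranges per spectral dimension, as stated in the text.
PPP_RANGES = {"metabolomics": (4.0, 12.0), "protein": (6.0, 20.0)}


def points_per_peak(fwhm_hz, spectral_width_hz, n_points):
    """Assumed definition: linewidth (Hz) divided by digital resolution (Hz/point)."""
    return fwhm_hz * n_points / spectral_width_hz


def zero_fill_factor(ppp, sample_type):
    """Smallest power-of-two zero-filling factor that lifts PPP to the lower bound."""
    low, _high = PPP_RANGES[sample_type]
    if ppp >= low:
        return 1
    return 2 ** math.ceil(math.log2(low / ppp))
```

For instance, a 10 Hz wide peak in a 10,000 Hz spectrum with 2048 points has a PPP of about 2, so under these assumptions one round of twofold zero-filling would suffice for a metabolomics spectrum and fourfold for a protein spectrum.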

[0139] With rapid progress in DNNs and other machine learning methods in many areas of science, automated NMR spectra analysis can benefit from these developments when carefully taking into consideration the unique features of NMR spectra. As with traditional peak pickers, the more prior information is available about the NMR sample, be it a biomacromolecular sample or complex mixture, the more accurate and useful the output of the DNN peak picker will be. This includes information about potential artifacts, such as phasing errors, t1-noise, baseline distortions, or the presence of intrinsically weak cross-peaks, such as Asn and Gln 15NH2 side-chain cross-peaks in 15N-1H HSQC spectra. However, even with minimal knowledge about a system for which the NMR spectrum was collected, such as the number of residues of a protein or whether the spectrum stems from a metabolomics sample, the described DNNs can identify NMR peaks even for spectral regions characterized by extreme crowding and a large dynamic range, as was found for DEEP Picker. When paired with automated peak fitting specifically adapted to the DNN output, machine-learning-based analysis of NMR spectra offers an advanced degree of automation that helps speed up analysis and improve accuracy. It will make complex spectra amenable to comprehensive analysis, including regions that were previously inaccessible by traditional methods. While the described method is capable of fully automated peak picking, it may be complemented with visual inspection, at least for part of the spectrum. The use of the disclosed method may form part of a workflow to identify such "regions of interest" so that human inspection can focus on a relatively small subset of critical cases, thereby minimizing the effort required for obtaining high-quality results in a dependable and reproducible manner. The strategies described here work well for 1D and 2D NMR datasets and can be generalized to higher dimensional spectra collected both with uniform and non-uniform sampling schedules. Although the above examples focus on solution NMR spectra, similar results are attainable also for solid-state NMR spectra.

[0140] Continuous rapid progress in machine-learning and other AI methodologies is likely to soon enable largely automated workflows for NMR data analysis, starting from shimming and NMR pulse calibration all the way to the extraction of high-resolution, site-specific structural and dynamic data. As magnetic resonance will soon approach its centennial anniversary, this is expected to be yet another milestone in its ever-continuing evolution. It will not only improve both the quality of the output and overall throughput but also help further broaden the appeal of NMR as one of the most versatile analytical and biophysical techniques, making it more accessible to novices and non-experts. Therefore, the proper interpretation of all these data in terms of the underlying chemistry, biophysics, biology, and medicine will remain vitally important and eventually determine the impact of these lines of research on their corresponding subfields and beyond.

[0141] Example 3

[0142] Metabolomics is the comprehensive identification and quantification of the small molecules involved in metabolic pathways in a biological system, known as metabolites [1”], [2”]. Metabolites are the substrates and products of many biological processes; therefore, measuring the metabolic profile captures a snapshot of cellular activity. Metabolomics is also the most downstream omics strategy; therefore, it is influenced by upstream genetic and protein changes or environmental factors, making it uniquely reflective of the phenotype [3”]. For these reasons, metabolomics approaches have proven valuable for diagnostics and monitoring of the treatment of a multitude of conditions and diseases, the characterization of regulatory biochemical processes, or applications in food science and nutrition [4”]-[6”].

[0143] Intrinsic to the majority of successful metabolomics studies is the ability to accurately detect and quantify metabolites from a cohort of samples in a highly reproducible manner. Nuclear magnetic resonance (NMR) spectroscopy is a useful and powerful tool due to its inherent high reproducibility, resolution, and quantitative capabilities [7”]-[11”]. NMR is also nondestructive to the sample and does not require additional sample derivatization or separation steps, such as chromatography. NMR is uniquely adept for quantitative untargeted metabolomics because it can produce quantitative data for all reasonably abundant known and unknown metabolites present in a complex mixture in a single measurement [8”], [12”], [13”].

[0144] 1D 1H NMR is often utilized due to its short measurement time and quantitative nature. Several automated tools for 1D 1H quantitative analysis have been developed, such as MetaboLab [14”], BATMAN [15”], Bayesil [16”], AQuA [17”], ASICS [18”], and rDolphin [19”]. However, complex mixtures containing metabolites with similar chemical motifs will cause peak overlap and crowded spectral regions, making metabolite identification ambiguous and quantification inaccurate [10”]. These issues are largely resolved by collecting 2D NMR spectra, which add resolution by correlating protons with neighboring nuclei such as 13C or other protons [13”]. Although 2D NMR spectra are not absolutely quantitative due to their dependence on J-couplings and differential spin relaxation times, peaks belonging to the same compound in spectra collected under the same parameters can be directly compared to determine relative concentrations for the quantitative determination of fold changes and statistical analysis between cohorts of samples [10”]. If needed, absolute quantitation of spectra can be achieved with the collection of reference spectra, spiking experiments, or specialty techniques like HSQC0 [20”].

[0145] The 2D 13C-1H HSQC offers significant resolution enhancement compared to 1D 1H, ameliorating peak overlap. In addition, 2D 1H-1H TOCSY spectra aid metabolite identification by providing connectivity information within the spin systems of metabolites. This combined 2D HSQC and TOCSY approach, as implemented in our previously described COLMARm web server with its database of over 750 reference spectra, affords comprehensive, accurate, and efficient metabolite identification [21”]. Still, extracting high-quality quantitative information from spectra remains a major challenge in NMR-based metabolomics [22”]. The additional steps necessary for metabolite quantification, including peak picking, fitting, and matching, during the course of the analysis of cohorts of samples containing hundreds to thousands of peaks per spectrum, can be ambiguous, time-consuming, and tedious [10”]. A few tools have begun to take advantage of the increased resolution offered by 2D NMR to improve the quantitative analysis of 1D 1H spectra. Dolphin combines 1D 1H and 2D J-resolved spectra to enhance the reliability and accuracy of metabolite matching to reference spectra. In this method, 2D J-resolved spectra are used to identify targeted metabolites, followed by quantification by lineshape fitting of the corresponding peaks in the 1D 1H spectra. The user also has options for referencing and normalization, but quantification by this method is still limited by the extent of the 1D peak overlap [22”]. In the R package specmine, 2D spectra are represented as a matrix, the dimensionality is reduced to a 1D specmine dataset to reduce the computational cost, and then spectra can be plotted for visualization, peak detection, and measurement of peak intensities [23”]. However, specmine requires coding experience (in R) and does not perform metabolite identification. Beyond these recent methods, there are no automated tools available for the identification and quantification of metabolites in 2D spectra and subsequent analysis [13”].

[0146] An example implementation of the described system is presented as a public web server, COLMARq, which facilitates the semi-automated, quantitative analysis of cohorts of 2D NMR spectra in an accurate and efficient manner. The COLMARq workflow (Fig. 16) involves uploading of cohorts of 2D HSQC and 2D TOCSY spectra, peak picking, peak fitting, peak matching between samples, data normalization, database query, peak- and metabolite-based statistical analysis, and data export of the results. This gives the user an easy way to input NMR spectra and to efficiently obtain quantitatively interpretable results, such as p-values for metabolite concentration differences between groups or multivariate analysis. The COLMARq server can be implemented, e.g., as an analysis server or as a system employing an analysis engine. COLMARq, via its analytical software components, allows for the upload of cohorts of HSQC and TOCSY spectra, automated peak picking, peak fitting for quantification, peak matching between spectra, data normalization via ratio analysis, database query for metabolite identification, and peak- and compound-based uni- and multivariate statistical analyses. Examples of such operations, as performed in a workflow/analytical pipeline, are described below.

[0147] These tasks are performed in an automated manner while allowing for user input and manual correction as needed. The example further demonstrates a comparative, quantitative analysis of cohorts of P. aeruginosa bacterial cultures in biofilm versus planktonic growth modes.

[0148] Sample Preparation. P. aeruginosa strain PAO1 [24”] cultures were grown overnight in lysogeny broth (LB) (Sigma Aldrich) and diluted to OD600 = 0.1. Then, cultures were scaled and grown planktonically in 50 mL of LB at 220 rpm at 37 °C for 24 h and as a biofilm on LB plates (28.4 cm2) containing 1.5% (w/v) agar, statically, at 37 °C in 5% CO2 for 48 hours (n = 9) for metabolomics experiments.

[0149] Planktonic cultures were harvested by centrifugation at 4,300 × g for 20 minutes at 4 °C and washed with 1 mL of phosphate-buffered saline (PBS). Biofilm cultures were harvested by scraping with a sterile loop. Samples were immediately resuspended in 600 µL of cold 1:1 methanol (Fisher)/double distilled H2O (ddH2O) for quenching. Stainless-steel beads (SSB14B) (300 µL) (1.4 mm) were added, and cells were lysed using a Bullet Blender (24 Gold BB24-AU by Next Advance) at a speed of 8 for 9 minutes at 4 °C [25”]. An additional 500 µL of 1:1 methanol/ddH2O was added, and the sample was centrifuged at 14,000 × g for 10 minutes at 4 °C to remove solid debris. Methanol/ddH2O/chloroform (Fisher) (1:1:1) was added for a total volume of 24 mL [26”], [27”]. The sample was vortexed and centrifuged at 4,300 × g for 20 minutes at 4 °C for phase separation. The aqueous phase was collected, and the methanol content was reduced using rotary evaporation, followed by lyophilization overnight. For NMR measurements, the samples were resuspended in 200 µL of NMR buffer (50 mM sodium phosphate buffer in D2O at pH 7.2 with 0.1 mM DSS (4,4-dimethyl-4-silapentane-1-sulfonic acid) for referencing) and centrifuged at 20,000 × g for 15 minutes at 4 °C for removal of any residual protein content. The pellet was washed with 100 µL of NMR buffer, and the supernatants were combined and transferred to a 3 mm NMR tube with a Teflon cap and sealed with parafilm.

[0150] NMR Experiments. NMR spectra were collected at 298 K on a Bruker AVANCE III HD 850 MHz solution-state spectrometer equipped with a cryogenically cooled TCI probe. 2D 1H-1H TOCSY spectra were collected (Bruker pulse program "dipsi2ggpphpr") with 256 complex t1 and 2048 complex t2 points for a measurement time of 4 hours. The spectral widths along the indirect and direct dimensions were 10,202.0 and 10,204.1 Hz, and the number of scans per t1 increment was 14. 2D 13C-1H HSQC spectra (Bruker pulse program "hsqcetgpsisp2.2") were collected with 512 complex t1 and 2048 complex t2 points for a measurement time of 16 hours. The spectral widths along the indirect and direct dimensions were 34,206.2 and 9375.0 Hz, and the number of scans per t1 increment was 32. The transmitter frequency offset values were 75 ppm in the 13C dimension and 4.7 ppm in the 1H dimension for all experiments. NMR data were zero-filled fourfold in both dimensions, apodized using a cosine-squared window function, Fourier transformed, and phase corrected using NMRPipe [28”].

[0151] Results. The individual steps are listed in the flowchart of COLMARq (Fig. 16), and they are explained in more detail in the following. Since most metabolomics studies start out with metabolite identification, COLMARq works directly with the results of previous COLMARm session(s) used for metabolite identification. Hence, for each sample, the processed 2D HSQC and optionally TOCSY NMR spectra in the frequency domain are first uploaded to the COLMARm web server 1610, followed by peak deconvolution and spectral referencing (if necessary). The server accepts the spectral data formats of Bruker Topspin (ASCII), Mnova, NMRPipe, and Sparky. If the user has prior knowledge of the metabolite composition of the samples and is familiar with the functions of COLMARm, the spectral files can also be directly uploaded to COLMARq in batch mode 1610. First, all cross-peaks are identified by automated peak picking 1620, which is critical for all subsequent steps. COLMARq and COLMARm support two types of peak pickers: the default method is the deep neural network DEEP Picker, which has proven highly effective for crowded 2D spectra of proteins and metabolomics samples [29”]. As an alternative, a traditional peak picker can be selected, which is based on a Laplacian spectral filter amplifying shoulder peaks at the cost of increased noise and some false positive peak identification in highly crowded regions. The traditional peak picker is similar to existing peak pickers implemented in Mnova and other tools [30”], [31”].

[0152] Next, each identified cross-peak is quantified for the purpose of determining the relative concentration of the metabolite 1630. This is accomplished by the numerical fitting of the cross-peaks using the exemplary method of Voigt fitting. After appropriate apodization using a cosine-squared or 2π-Kaiser window function, NMR lineshapes follow in good approximation Voigt profiles, which are hybrids between Lorentzian and Gaussian profiles, along both frequency dimensions. Each 2D HSQC cross-peak is characterized by seven parameters: the peak position along each dimension (which can be off the underlying digital spectral grid), the peak amplitude or volume, and the peak shape, whereby the peak shape is determined by its two Voigt parameters along each dimension. For many 13C-1H HSQC spectra in metabolomics, the cross-peaks have in good approximation a Gaussian shape and thus can be fitted with only five parameters. Using the output of the peak picker as initial values for fitting, the Voigt Fitter performs a non-linear least-squares fit to simultaneously optimize the peak parameters of all peaks to reproduce the original spectrum. While nonoverlapping peaks can be fitted individually quite efficiently, fitting of large overlapping peak clusters requires the simultaneous fitting of the N cross-peaks identified in the cluster. Most non-linear least-squares fitting algorithms, such as the Levenberg-Marquardt algorithm and its derivatives, involve the iterative diagonalization of a 5N × 5N square matrix, which computationally scales with O(N^3). As a consequence, for sizable N, the fitting process can become very slow, even on modern computer workstations. To address this issue, we implemented a Gaussian mixture-type model algorithm into the Voigt fitting method [33”], which scales linearly with N, allowing the rapid fitting of complex spectra with an essentially unlimited number of both overlapping and nonoverlapping cross-peaks as typically encountered in metabolomics spectra. The Gaussian mixture-type model algorithm solves the problem iteratively, where each iteration includes the following steps: (1) calculate the theoretical spectrum of each peak using its current peak parameters; (2) aggregate the theoretical spectra of all peaks to obtain the total theoretical spectrum; (3) calculate for each individual peak the ratio of its (theoretical) spectrum and the total (theoretical) spectrum; (4) deconvolute the experimental spectrum into the spectra of individual peaks in such a way that the ratio of individual spectral peaks and the total experimental spectrum is the same as the ratio obtained in step 3; and (5) fit each peak using the deconvoluted spectrum as a starting point and update peak parameters. The algorithm returns to step (1) until the change of the peak parameters falls below a predefined cutoff. In step (5) of the algorithm, the non-linear least-squares fit is performed sequentially for each individual cross-peak in a five-dimensional parameter space (in the case of Gaussian peak shapes), rather than simultaneously for all N cross-peaks in a 5N-dimensional parameter space. Hence, the computational effort of the algorithm scales linearly with N, i.e., O(N), allowing a dramatic speed-up in the fitting of spectra with large numbers of peaks as typically encountered in metabolomics applications. In contrast to other fitting software, Voigt Fitter does not require the selection of spectral subregions for efficient fitting, as it can autonomously handle entire spectra with several thousand cross-peaks. Voigt Fitter also does not require the manual addition or elimination of peaks for improved fitting, as DEEP Picker reliably produces a high-quality set of cross-peaks, including their positions, lineshapes, and amplitudes, as a starting point for Voigt Fitter. As a benchmark, the fitting of the 1772 cross-peaks of a 2D 13C-1H HSQC spectrum of the P. aeruginosa biofilm, where the largest peak cluster contains 142 peaks, takes only about 20 seconds. By contrast, due to its unfavorable scaling property, a traditional non-linear least-squares fitting approach takes many hours or even days. An illustration of a complex spectral region of the biofilm spectrum and its fitted counterpart is shown in Fig. 17, demonstrating the high accuracy of the Voigt Fitter even for highly overlapped cross-peak clusters.
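The five-step iteration described above can be illustrated with a deliberately simplified sketch. It is not the Voigt Fitter itself: it assumes a 1D spectrum on a uniform grid, purely Gaussian peaks, and noise-free non-negative intensities, and in step (5) it replaces the per-peak non-linear least-squares fit with a closed-form moment estimate (area, centroid, and variance of each peak's share of the spectrum). The function names and the `(amplitude, position, sigma)` parameterization are illustrative choices.

```python
import math


def gaussian(x, amp, pos, sigma):
    """Gaussian lineshape with peak height amp."""
    return amp * math.exp(-0.5 * ((x - pos) / sigma) ** 2)


def mixture_fit(xs, ys, peaks, tol=1e-9, max_iter=2000):
    """Refine (amp, pos, sigma) guesses, e.g. taken from a peak picker."""
    peaks = [tuple(p) for p in peaks]
    dx = xs[1] - xs[0]  # uniform grid spacing assumed
    for _ in range(max_iter):
        # Steps 1-2: theoretical spectrum of each peak and their sum.
        theo = [[gaussian(x, *p) for x in xs] for p in peaks]
        total = [sum(col) for col in zip(*theo)]
        new_peaks = []
        for t in theo:
            # Steps 3-4: this peak's share of the experimental spectrum.
            part = [y * ti / tot if tot > 0 else 0.0
                    for y, ti, tot in zip(ys, t, total)]
            # Step 5 (simplified): refit this peak alone from its moments.
            area = sum(part)
            pos = sum(x * w for x, w in zip(xs, part)) / area
            var = sum((x - pos) ** 2 * w for x, w in zip(xs, part)) / area
            sigma = math.sqrt(var)
            amp = area * dx / (sigma * math.sqrt(2.0 * math.pi))
            new_peaks.append((amp, pos, sigma))
        change = max(abs(a - b)
                     for p, q in zip(peaks, new_peaks) for a, b in zip(p, q))
        peaks = new_peaks
        if change < tol:  # stop once the parameters no longer move
            break
    return peaks
```

Each pass over the peaks costs a fixed amount of work per peak, which is the source of the linear scaling with N noted in the text; a production fitter would additionally handle Voigt shapes, two dimensions, and noisy data.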

[0153] The next step in the COLMARq workflow is to match peaks 1640 stemming from the resonance signal of a certain spin of the same metabolite across the entire batch of spectra. The peak matching algorithm takes into account peak positions (chemical shifts), peak heights, possible peak multiplets due to scalar J-couplings, and peak picking consistency among different spectra. In metabolomics samples, the vast majority of cross-peaks have well-defined positions that remain essentially unchanged from sample to sample. However, a small number of cross-peaks can move by as much as 0.02 or 0.2 ppm along the proton or carbon dimension, respectively. This can be caused by slight variations of sample conditions among replicates, such as alterations in pH. Besides chemical shift information, the peak matching algorithm also takes into account peak amplitudes. Specifically, peaks whose amplitudes are within a factor of 10 of each other in different samples are preferred for matching. If this is not possible within the chemical shift cutoff, the peak matching algorithm will then try to match peaks with amplitude ratios exceeding 10.
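The chemical-shift and amplitude criteria above can be sketched as a small matching helper. This is a hypothetical simplification that ignores multiplet handling and cross-spectrum consistency: peaks are `(h_ppm, c_ppm, amplitude)` tuples with positive amplitudes, the cutoffs default to the 0.02 and 0.2 ppm shifts mentioned in the text, and ties are broken by a scaled shift distance that is an assumption of this example.

```python
def match_peak(ref, candidates, h_cut=0.02, c_cut=0.2, max_ratio=10.0):
    """Return the index of the best candidate for ref, or None if none is in range."""

    def scaled_dist(p):
        # Largest shift difference in units of the per-dimension cutoff.
        return max(abs(p[0] - ref[0]) / h_cut, abs(p[1] - ref[1]) / c_cut)

    def amp_ratio(p):
        # Amplitudes are assumed to be strictly positive.
        return max(p[2], ref[2]) / min(p[2], ref[2])

    in_range = [i for i, p in enumerate(candidates) if scaled_dist(p) <= 1.0]
    # Prefer peaks within a factor of max_ratio in amplitude; otherwise fall
    # back to any peak inside the chemical-shift cutoff.
    preferred = [i for i in in_range if amp_ratio(candidates[i]) <= max_ratio]
    pool = preferred or in_range
    if not pool:
        return None
    return min(pool, key=lambda i: scaled_dist(candidates[i]))
```

The fallback to `in_range` mirrors the statement that peaks with amplitude ratios exceeding 10 are only considered when no better-matched candidate exists within the shift cutoff.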

[0154] Spectral multiplets observed in metabolomics HSQC spectra should be matched as a group against the same kind of multiplets in other samples. An example of matched doublets is shown in Fig. 18 (blue cluster). For low sensitivity multiplets (with amplitudes smaller than 10 times the noise level) or multiplets that strongly overlap with other peaks, the DEEP picker (and Voigt fitter) may interpret the same feature as a multiplet in some samples and as a single peak in others. An example of such a case is also shown in Fig. 18 (red cluster), where the consensus peak was identified as three peaks in Samples #0, #2, and #3 and four peaks in Sample #1. While the peak matching algorithm will assign a lower confidence score to these types of imperfect matching results, they can still be useful for downstream analysis. Because of the sometimes difficult and ambiguous nature of peak matching, it is recommended that the user check the peak matching results using the visualization plots on the web server to ensure the most accurate downstream quantitative analysis. The web server was designed with a high level of flexibility, allowing users to interactively make manual adjustments to the peak matching result. Based on the user’s assessment of confidence in the matching results of individual peaks, they can be adjusted or discarded during a later stage of the analysis.

[0155] The next step is the normalization of spectra 1650, which can correct for variations in the total sample amount or overall sample concentration between replicates or cohorts, which may occur during sample collection, sample preparation, or data acquisition [34”]-[36”]. For solution NMR-based studies in which the total volume of each sample can be controlled during sample preparation, the potential global dilution factor for each sample should be accounted for during data analysis. For this purpose, COLMARq supports the widely used median fold change method [34”], which works well when many metabolites have a similar concentration across all samples. This method determines the median fold change between samples as a robust estimate of the dilution factors between samples. Specifically, the COLMARq normalization tool estimates the normalization factors between a reference sample specified by the user and all other samples. For each pair of samples, the tool calculates the fold-change ratio of all matched peaks, rank orders the ratios, and then uses the mean of the median 30% fold-change ratios as the normalization factor. The accuracy of this approach depends on the quality of peak matching in the previous step. As mentioned above, the COLMARq server gives users the option to manually adjust peak matching and exclude matched peaks that have a low confidence score.
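The normalization factor described above can be sketched as follows. The function name and the exact way the central 30% of the rank-ordered ratios is indexed are assumptions of this example; the inputs are the volumes of matched peaks, given in the same order for the sample and the reference.

```python
from statistics import mean


def normalization_factor(sample_volumes, reference_volumes, central_fraction=0.30):
    """Mean of the central fraction of rank-ordered fold-change ratios."""
    ratios = sorted(s / r for s, r in zip(sample_volumes, reference_volumes))
    n = len(ratios)
    k = max(1, round(n * central_fraction))  # number of ratios to average
    start = (n - k) // 2                     # centre the window on the median
    return mean(ratios[start:start + k])
```

Because only the middle of the distribution is averaged, a handful of metabolites with strongly changed concentrations (the tails of the rank-ordered ratios) have no influence on the estimated dilution factor.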

[0156] As an example, cohorts of nine spectra were uploaded from P. aeruginosa planktonic and biofilm cultures to COLMARq for statistical analysis. A screenshot of the normalization plot of the web server is displayed in Fig. 19A. Peak volumes of Sample #3 were divided by the corresponding peak volumes of Sample #2, which was chosen as the reference spectrum, and the resulting peak volume ratios were rank ordered. Fig. 19A shows the logarithm of the ratios vs. peak number, giving rise to a characteristic rotated sigmoidal curve. The tails on both ends show the peaks that differ most between samples (smallest ratios on the left and largest ratios on the right), while the relatively flat center reflects that the dilution factor between the samples is minimal. Averaging the median 30% of ratios results in a normalization factor of 1.005 for this sample, indicating that any dilution effect for Sample #3 vs. Sample #2 is minimal. This type of normalization plot can be generated for each sample to obtain a visual impression of potential dilution effects and determine whether the underlying assumption of this method is valid, namely, that the majority of ratios between metabolites are, in good approximation, constant as manifested in a flat middle range of the rotated sigmoidal curve. The peak volumes of each spectrum are then divided by the normalization factor to make them quantitatively comparable to the reference spectrum and to each other for subsequent statistical analysis 1670.

[0157] Once cross-peaks are quantified and matched across all samples, statistical analysis can be performed in a standard manner. Although statistical analysis tools are widely available, the COLMARq server also provides limited univariate and multivariate statistical analysis 1670 capabilities to readily give users information about cross-peaks or metabolites that have statistically significant concentration differences between cohorts. The user can sort the uploaded samples into two groups with the option to selectively exclude samples from statistical analysis. At this time, COLMARq provides peak-based p-value analysis (t-test), including a histogram of all p-values. For the p-value calculation, the two cohorts are assumed to be both normally distributed with equal variance. COLMARq also allows users to perform a peak-based principal component analysis (PCA) as an unsupervised multivariate statistical analysis method commonly used in metabolomics for the visual clustering of samples in a score plot based on the covariation of cross-peak volumes to assess separation between cohorts.
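Under the stated assumption of normally distributed cohorts with equal variance, the per-peak test statistic is the pooled-variance two-sample t statistic. The sketch below computes only the statistic and its degrees of freedom using the Python standard library; converting it into a p-value requires the Student's t distribution, which is left to a statistics package. The function name is illustrative.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled (equal) variance, plus degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    dof = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / dof
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, dof
```

Applied to the normalized volumes of one matched cross-peak in the two cohorts, this yields one statistic per peak, from which the histogram of p-values can be built.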

[0158] For metabolites and peaks that are observable in some samples and unobservable in others, it is generally useful to set the missing amplitudes to either 1/2 or 1/3 (default) of the detection limit of the experiment rather than setting them to zero. In COLMARq, the peak amplitude detection limit can be defined by the user as a fixed multiple of the noise level automatically determined for each spectrum. For this purpose, from all observable peaks, an empirical relationship between peak volumes and peak amplitudes is established, which is then used to estimate the peak volume of peaks with amplitudes at 1/3 of the peak height detection limit.
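One way to picture this imputation is the sketch below. The form of the empirical volume-amplitude relationship is not specified in the text, so a simple proportionality fitted by least squares through the origin is assumed here; the function name and the default `limit_multiple` are likewise hypothetical.

```python
def imputed_volume(amplitudes, volumes, sigma_noise,
                   limit_multiple=5.0, fraction=1 / 3):
    """Volume assigned to an unobserved peak.

    Assumes volume = k * amplitude, with k fitted from all observable peaks.
    limit_multiple (detection limit in units of the noise level) is a
    user-defined value; the default here is only a placeholder.
    """
    k = (sum(a * v for a, v in zip(amplitudes, volumes))
         / sum(a * a for a in amplitudes))
    detection_limit = limit_multiple * sigma_noise
    return k * fraction * detection_limit
```

Setting `fraction=1 / 2` gives the alternative choice mentioned above; either way the missing peak receives a small, non-zero volume so that fold changes and t-tests remain well defined.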

[0159] In the demonstration with P. aeruginosa, a total of 1302 distinct cross-peaks were picked in each spectrum, with 782 peaks showing a significant difference between cohorts with p < 0.05. A screenshot of the p-value histogram from the web server (Fig. 19B), including only peaks present in all 18 spectra, shows a substantial number of cross-peaks whose volumes systematically differ between cohorts (histogram bar on the far left), reflecting the inherent metabolic heterogeneity of the P. aeruginosa planktonic and biofilm cultures. Of the significantly different peaks, 493 do not match any known metabolites in the database, highlighting the potential of peak-based statistical analysis for the characterization of unknown metabolites as well.

[0160] COLMARq also provides metabolite database query 1660 capabilities directly adopted from COLMARm. If an experimental consensus peak is within the predefined frequency cutoff of a database peak, it is classified as a "matched peak." The "matching ratio" is then defined as the ratio of the number of matched peaks to the total number of peaks of the database compound. The default cutoff parameters for 1H and 13C chemical shift differences are set at 0.04 and 0.4 ppm, respectively, and the lowest accepted matching ratio is set to 0.6. Users can alter these three parameters on the web server interface and repeat the database query to see how they affect the returned matched metabolite list. If needed, users also have the option to interactively edit the cross-peaks matched to each metabolite database peak by drag and drop.
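The matching-ratio definition can be sketched as follows. The default cutoffs are the values stated above; the function name and the peak lists are hypothetical and serve only to illustrate the calculation, not the server's actual query code.

```python
def matching_ratio(db_peaks, exp_peaks, h_cutoff=0.04, c_cutoff=0.4):
    """Fraction of database peaks that have an experimental peak within both cutoffs.

    Each peak is a (1H shift in ppm, 13C shift in ppm) tuple.
    """
    matched = 0
    for db_h, db_c in db_peaks:
        if any(
            abs(db_h - h) <= h_cutoff and abs(db_c - c) <= c_cutoff
            for h, c in exp_peaks
        ):
            matched += 1
    return matched / len(db_peaks)


# Hypothetical database compound with three HSQC cross-peaks.
db_peaks = [(3.20, 55.0), (4.10, 70.0), (1.50, 20.0)]
# Hypothetical experimental consensus peaks.
exp_peaks = [(3.21, 55.1), (4.30, 70.0), (1.48, 20.3), (7.00, 130.0)]

ratio = matching_ratio(db_peaks, exp_peaks)
accepted = ratio >= 0.6  # lowest accepted matching ratio from the text
```

In this toy case two of the three database peaks are matched, so the compound passes the default threshold of 0.6.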

[0161] COLMARq may be used to detect all possible metabolite matches, whereby user visualization plays an important part in narrowing the results. For P. aeruginosa, using cutoff parameters of 0.3 ppm for 13C and 0.03 ppm for 1H with a peak matching ratio of 0.6, a total of 169 metabolites were matched to the spectra. After manual editing, 66 metabolites were determined to be highly confident hits marked as good or fair and quantified. The total matched compound list included 68 tentative hits that were matched due to a peak overlap between similar metabolites but did not contain unique peaks. This can occur in highly crowded spectral regions pertaining to highly similar compounds, such as carbohydrates and nucleotides, which comprise 47 of the 68 tentative hits. An additional 22 compounds were matched, but because they were present at low abundance with weak and missing peaks in many spectra, they were not quantified. If desired, the user can set stricter cutoff parameters to reduce the number of incomplete matches. Fig. 20A shows a screenshot from the web server as an example of four interactive HSQC and TOCSY plots zoomed in on a metabolite match. The blue circles mark the expected cross-peak positions for this metabolite from the database, and the pink circles in the TOCSY mark the expected TOCSY cross-peaks. As previously mentioned, the user can drag and drop the blue circles to select which experimental peaks are the best match for this metabolite. Another example of metabolite matching with more samples is shown in Fig. 21.

[0162] In addition to the cross-peak-based p-value analysis of Fig. 19B, users can also perform compound-based p-value calculations. In this case, the relative concentration of a compound is calculated from the weighted average peak volume over all its cross-peaks. By default, all cross-peaks have the same weight, but users have the option to adjust the weights. For example, users can assign lower or even zero weight to weak peaks so that the relative concentration is dominated by the strongest and, hence, most quantitative peaks of a metabolite. Using a weight of zero to exclude peaks is useful when one or more peaks belonging to a metabolite overlap with a peak from another metabolite, since it allows only unique peaks to be included for accurate quantification. For P. aeruginosa, of the total 66 matched metabolites, 52 display a significant concentration difference between cohorts (p < 0.05). Fig. 20B shows a chart with statistical information for an example metabolite match. The chart includes for each peak the mean and standard deviation for each cohort, and the t-score and p-value between cohorts.
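A compound-based comparison of this kind can be sketched as below. The sketch computes the weighted average peak volume per sample and an equal-variance (pooled) two-sample t-score; the conversion of the t-score to a p-value is omitted because it requires the Student t distribution, which is available in common statistics packages. All names and volumes are hypothetical.

```python
import math
from statistics import mean, variance


def relative_concentration(volumes, weights):
    """Weighted average peak volume over all cross-peaks of one compound."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one peak needs a non-zero weight")
    return sum(w * v for w, v in zip(weights, volumes)) / total


def pooled_t_score(a, b):
    """Two-sample t-score assuming equal variances; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    dof = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / dof
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, dof


# Hypothetical compound with three cross-peaks; the third peak overlaps with
# another metabolite and is excluded with a weight of zero.
weights = [1.0, 1.0, 0.0]
cohort_a = [[10.0, 12.0, 50.0], [11.0, 13.0, 10.0], [12.0, 14.0, 90.0]]
cohort_b = [[14.0, 16.0, 5.0], [15.0, 17.0, 70.0], [16.0, 18.0, 30.0]]

conc_a = [relative_concentration(v, weights) for v in cohort_a]
conc_b = [relative_concentration(v, weights) for v in cohort_b]
t_score, dof = pooled_t_score(conc_a, conc_b)
```

Setting the third weight to zero removes the strongly varying overlapped peak from the concentration estimate, so the comparison is driven by the unique peaks only.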

[0163] The COLMARq server provides several flexible options for the user to download both intermediate and final results for subsequent use. For example, users have the option to download the matched peak list with peak volumes in text format so that they can be used as input for further statistical analysis using the user's preferred software. Users can also download numerical peak-based or compound-based p-value results.

[0164] Discussion. The high complementarity of NMR to mass spectrometry makes NMR a powerful method for the targeted and untargeted quantitative analysis of metabolomics samples. 8 Due to NMR’s unique versatility, it is not a surprise that there exist a variety of different NMR approaches, each with its own pros and cons. High-throughput applications involving large cohorts of samples typically rely on 1D 1H experiments, as they require measurement times of only around 15 minutes per sample. However, the ability to uniquely identify a large number of metabolites from 1D spectra alone is limited due to crowded spectral regions that are difficult to deconvolute. In addition, strong peak overlap and background signals can compromise the quantitation of individual peaks. It is, therefore, common to assist 1D NMR-based metabolomics studies with a very small number of 2D NMR experiments of selected samples for the verification of metabolite assignments [37”]. 2D NMR spectra, such as 13C-1H HSQC, 1H-1H TOCSY, or 1H-1H COSY, provide vast resolution enhancement over 1D. However, the collection of 2D NMR spectra for samples that are limited by sensitivity, rather than the sampling of the indirect time domain, is typically associated with significantly prolonged measurement times. This applies in particular to 13C-1H HSQC spectra at 13C natural abundance. At the same time, the first-rate resolution properties make them particularly well suited for semi-automated, quantitative analysis. For other 2D NMR spectra, nonuniform sampling along the indirect dimension or ultrafast 2D NMR can provide a significant speed-up over the traditional 2D NMR acquisition method [38”]. Compared to 1D NMR, 2D NMR-based metabolomics is more involved during the NMR data acquisition stage. On the other hand, the 2D method offers a substantial time gain together with higher accuracy during the analysis part of a project.

[0165] Despite their widespread use for resonance assignment and metabolite identification purposes, it is still very uncommon to use 2D NMR spectra for fully quantitative metabolomics analysis. A number of standalone software programs have been introduced for the quantitative analysis of 2D NMR cross-peaks by peak fitting, including NMRPipe [28”], FMLR [39”], PINT [40”], INFOS [41”], and FitNMR [42”], using a range of different models for the peak shapes, from Gaussian to lineshapes directly mirroring the apodization function used. These software programs have not been designed for the typical metabolomics workflow involving cohorts of complex spectra from different samples that require peak matching, which may explain their lack of routine usage in metabolomics. COLMARq offers a convenient integration by directly using metabolite assignments (from COLMARm) for the quantification of cohorts of spectra, peak matching, normalization, and statistical analysis. COLMARq is the first publicly available web server to facilitate metabolite identification and fully quantitative analysis of 2D NMR spectra for metabolomics.

[0166] 13C-1H HSQC spectra have a very clean baseline void of a background signal in most regions, which makes them particularly suitable for the highly quantitative analysis of a large number of peaks. COLMARq is best used in combination with COLMARm, where for each sample, a COLMARm analysis for metabolite query is performed first. This is followed by the simultaneous uploading of all COLMARm sessions into COLMARq for quantification. For experienced users, the COLMARm upload step can be circumvented, and the spectra corresponding to all samples can be uploaded in batch mode to COLMARq for analysis. Normalization of peak volumes from different spectra is known to be important. The median ratio method implemented in COLMARq assumes that the concentration of a majority of metabolites remains unchanged, giving rise to the flipped sigmoidal profile of the rank-ordered ratios with an extended flat part in the middle percentile range.

[0167] The graphical representation of this relationship by COLMARq (Fig. 19A) allows the user a quick assessment of whether this assumption is fulfilled and whether the normalization procedure is appropriate for a particular study. The analysis of a large number of cross-peaks across a cohort of samples afforded by 2D NMR-based metabolomics also allows a meaningful analysis of the p-value distribution in the form of a histogram (Fig. 19B). For two sample cohorts that are statistically indistinguishable, the p-value histogram should be flat, i.e., each p-value from 0 to 1 has the same probability. Therefore, the p-value histogram provides a straightforward visual assessment of whether the two cohorts are inherently different in their metabolomic makeup. This is particularly useful for pilot studies based on a relatively small number of samples to decide whether a larger scale study, for example, for the characterization of putative biomarkers, is warranted.

[0168] It was observed that 2D NMR-based metabolomics can be used equally well for targeted and untargeted studies, including biological samples that are not commonly studied, involving potentially large numbers of cross-peaks belonging to both known and unknown metabolites. Such high-quality information is harder to obtain from 1D NMR-based metabolomics unless the metabolite composition, for example, for human serum, is mostly known.

[0169] COLMARq is largely automated by taking advantage of the very accurate peak identification performance by DEEP Picker as input for Voigt Fitter for quantification. In addition, manual editing is made possible, which is useful for peak matching between multiple spectra in strongly overlapped regions that show variations in peak positions between samples or for weak peaks that only show up in subsets of spectra. As a demonstration, we used COLMARq for the efficient, semi-automated analysis of metabolite extracts from cohorts of nine P. aeruginosa planktonic and biofilm cultures each. With over 32,000 spectral cross-peaks to analyze across all 18 2D HSQC spectra, manual analysis is tedious and can take months. Batch uploading of the 18 sets of 2D HSQC and TOCSY spectra to COLMARq (~30 minutes), automatic peak picking, fitting, and matching between spectra (~2.5 hours), metabolite query against the database (few seconds), and normalization and statistical analysis (few seconds) were completed with the COLMARq server in only about 3 hours. Manual adjustment of the automated peak matching between spectra to ensure accurate selection of peaks within multiplets and between samples is the most time-consuming step when working with a larger volume of samples. COLMARq provides visualization of all matched peaks and metabolites for a user-friendly approach to inspection and judgment of matches. The highly interactive nature of the web server facilitates simple adjustments during the course of all analysis steps. The user can go from the collected NMR spectra to a list of metabolites with their fold-changes, p-values, and a PCA plot within hours to a few days, depending on the number of samples and the amount of manual adjustment required.

[0170] In the P. aeruginosa samples, 66 metabolites were judged as good or fair database matches, and 52 of these metabolites showed a significant difference between cohorts (p < 0.05). For a recent study, these results were exported, and the metabolites were mapped to metabolic pathways to provide information about the differential metabolism of P. aeruginosa in the two growth modes [44”]. COLMARq is not limited by sample type and, therefore, should be useful for the analysis of a wide variety of metabolomics applications.

[0171] The COLMARq web server provides users with a simple, intuitive, versatile, and publicly accessible peak picking, fitting, and matching tool for the widest possible range of NMR-based metabolomics studies. The quantification, matching, and assignment of all peaks from the sample cohorts represent a comprehensive and fully quantitative approach for the downstream analysis in both targeted and untargeted metabolomics studies. COLMARq allows users to take full advantage of the resolution and quantitative power of 2D NMR-based metabolomics measurements, considerably facilitating the accurate, semi-automated, and efficient analysis of metabolomics data.

[0172] Example 4

[0173] DEEP Picker1D and Voigt Fitter1D methods are disclosed in the following example.

[0174] The 1D peak picking and Voigt fitting methods were further tested and optimized, demonstrating that they work rather well for 1D NMR even when the spectra are complex. Notably, DEEP Picker for 2D NMR spectra also employed a deep neural network trained on 1D data.

[0175] 1D NMR likely covers over 90% of all NMR applications worldwide, and 2D, 3D, 4D, etc., the rest. Many 1D NMR applications deal with relatively simple spectra (few resonances, little overlap), and they would likely benefit from DEEP Picker1D (DP1D) + Voigt Fitter1D (VF1D) only to a limited extent compared to existing NMR methods using commercial software (MNova, ACD, etc.). However, emerging applications in metabolomics based on 1D NMR of complex mixtures (urine, serum, and many other biological systems as well as food) show very complex, overlapped spectra (see Fig. 24) and would benefit from DP1D + VF1D since commercial software cannot handle such situations. To make these applications more user-friendly, new databases and related tools for the accurate and automated analysis of such samples by 1D NMR alone were developed. Therefore, DP1D + VF1D, as demonstrated herein, represents a step toward a host of applications, including ones that will use a more targeted approach by going after a set of specific metabolites that are likely present in complex biological mixtures (human samples, food, biomarker detection, etc.).

[0176] 1D DEEP Picker (DP1D) and 1D Voigt Fitter (VF1D) follow a similar strategy as that of 2D DEEP Picker (DP2D) and 2D Voigt Fitter (VF2D). The artificial neural networks for DP1D and DP2D were trained on the same one-dimensional spectral training set. However, the applications are rather distinct: DP1D + VF1D analyzes 1D NMR spectra, whereas DP2D + VF2D analyzes 2D NMR spectra only. Therefore, the 1D methodology and process are distinct from 2D.

[0177] Although the 2D NMR approach with COLMARq is more accurate and is able to identify more metabolites in a given sample (compared to 1D), it typically takes several hours or longer of NMR measurement time per sample, whereas the 1D NMR approach takes about 10 minutes per sample. Therefore, these approaches are very complementary. The system may employ protocols that combine the benefits of the 1D approach with the 2D approach, especially for large cohorts of samples, so that the results are accurate but are obtained in a short period of time. This should be useful for a wide range of applications, including pre-clinical screening, diagnostics, quality control of food, etc.

[0178] The application of DP1D and VF1D to the analysis of complex spectra in metabolomics involves two steps: (1) a 1D spectrum of a complex mixture is deconvoluted with DP1D and VF1D into individual signals using the methodology of (i). (2) In a second step, the results are matched against a library of 1D NMR spectra of common metabolites, allowing both the identification of the metabolites present in the mixture and their quantification. This is useful for biomarker identification and biochemical pathway analysis in complex biological mixtures.

[0179] In some implementations, the second step may include querying the DP1D peak picks against an NMR metabolite database. In this implementation, the database would include chemical shifts and J-coupling multiplets. The metabolite spectra of the database can be stored as the spin Hamiltonian, which allows the user to compare spectral peaks at various field strengths, B0. An example metabolite database entry of spin Hamiltonians is provided in Table 2.

Table 2. An example metabolite spectra database entry, where peak data is given as the spin Hamiltonian.
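Why storing chemical shifts and J-couplings allows comparison at different field strengths can be illustrated with a minimal first-order sketch: chemical shifts are constant in ppm, whereas J-couplings are constant in Hz, so a multiplet splitting expressed in ppm grows as the field decreases. The sketch below handles only a weakly coupled doublet and is not a full spin Hamiltonian simulation; the function name and the J value of 3.8 Hz are assumptions chosen for illustration.

```python
def doublet_positions_ppm(shift_ppm, j_hz, larmor_mhz):
    """Line positions (ppm) of a first-order doublet at a given 1H Larmor frequency."""
    half_split_ppm = (j_hz / 2.0) / larmor_mhz  # Hz divided by MHz gives ppm
    return (shift_ppm - half_split_ppm, shift_ppm + half_split_ppm)


# Doublet centered at 5.223 ppm with an assumed coupling of 3.8 Hz.
high_field = doublet_positions_ppm(5.223, 3.8, larmor_mhz=850.0)
low_field = doublet_positions_ppm(5.223, 3.8, larmor_mhz=80.0)

split_high = high_field[1] - high_field[0]
split_low = low_field[1] - low_field[0]
```

At 80 MHz the same coupling spans a ppm range more than ten times wider than at 850 MHz, which is one reason low-field spectra of mixtures are more crowded. Strong coupling effects, which require the full Hamiltonian, are ignored in this sketch.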

[0180] The application of DP1D and VF1D to the analysis of complex spectra in metabolomics may involve a third step: quantifying metabolite concentrations from peak volumes. In one example, shown in Fig. 25, metabolite concentrations were predicted using the DP1D and VF1D analysis method as described, and the predictions were compared to known concentrations.

[0181] In another example of metabolite concentration prediction from NMR peak picking analysis using the DP1D and VF1D analysis method, a sample of DMEM (Dulbecco's Modified Eagle Medium), which is a nutrition-rich medium that is widely used in laboratories for supporting the growth of many different mammalian cells, was analyzed. The DMEM sample contained 15 amino acids, glucose, and pyruvate at different concentrations and several vitamins at low concentrations. Using the DP1D and VF1D analysis method to analyze solution NMR spectra at high (850 MHz) and low field (80 MHz), the NMR spectra were reconstructed (Fig. 26) and the metabolite concentrations were predicted (Fig. 27). This example shows that the DP1D and VF1D analysis method allows quantitative analysis of moderately complex mixtures at variable B0 field, even at low field.

Materials and Methods

[0182] Sample Preparation. Glucose sample. 2 mM glucose (from Sigma-Aldrich) was prepared in D2O before 600 µL was transferred to a 5 mm NMR tube for NMR data collection.

[0183] Mouse urine sample. A frozen mouse urine sample was thawed on ice. An aliquot of 178 µL mouse urine was mixed with 20 µL sodium phosphate buffer (500 mM) in D2O and 2 µL DSS (4,4-dimethyl-4-silapentane-1-sulfonic acid from 10 mM stock solution prepared in D2O) with a final pH of 7.4. 200 µL of the final sample was transferred to a 3 mm NMR tube for NMR data collection.
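The final concentrations implied by the mouse urine recipe can be checked with a short calculation. This is a sketch that assumes the listed volumes are in microliters and mix additively; the function and variable names are illustrative.

```python
def final_concentration_mm(stock_mm, stock_volume_ul, total_volume_ul):
    """Concentration (mM) after diluting a stock into the total sample volume."""
    return stock_mm * stock_volume_ul / total_volume_ul


urine_ul, buffer_ul, dss_ul = 178.0, 20.0, 2.0
total_ul = urine_ul + buffer_ul + dss_ul  # total sample volume

phosphate_mm = final_concentration_mm(500.0, buffer_ul, total_ul)  # buffer stock
dss_mm = final_concentration_mm(10.0, dss_ul, total_ul)            # DSS stock
urine_fraction = urine_ul / total_ul                               # dilution of urine
```

Under these assumptions the sample contains 50 mM phosphate and 0.1 mM DSS, and the urine is diluted to 89% of its original concentration.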

[0184] NMR Experiments and Processing. All NMR spectra were collected at 298 K on Bruker AVANCE III HD 850 MHz spectrometers equipped with a cryogenically cooled TCI probe. A 1D 1H NOESY glucose spectrum was recorded with a total of 32768 complex data points and 64 scans. The relaxation delay between consecutive scans was 12 seconds, the spectral width was 13 ppm, and the transmitter frequency offset was set to 4.7 ppm. NMR data was zero-filled four-fold, apodized using a cosine squared window function, Fourier-transformed, and phase-corrected using Bruker Topspin 4 software.

[0185] A 1D 1H mouse urine spectrum was recorded with the Bruker standard pulse sequence “zgesgppe,” which is a 1H perfect-echo 1D experiment with excitation-sculpting water suppression, with a total of 53190 complex data points and 64 scans. The relaxation delay between consecutive scans was 4 seconds. The spectral width was 25 ppm with the transmitter frequency offset set to 4.7 ppm. The NMR free induction decay was zero-filled two-fold, apodized using a 2π-Kaiser window function, Fourier-transformed, and phase-corrected using NMRPipe [5”’].

[0186] A 2D 13C-1H high-resolution HSQC spectrum of mouse urine was recorded with Bruker pulse program “hsqcetgpsisp2.2”, which is a sensitivity-enhanced 13C-1H HSQC with bilevel adiabatic decoupling. 3072 total complex data points in the 1H ti dimension and 512 total complex points in the 13C dimension were recorded. For each increment, 16 scans were recorded, and the relaxation delay between consecutive scans was set to 1.5 s. The spectral widths along the 1H and 13C dimensions were 18 ppm and 185 ppm, respectively. The transmitter frequency offsets were 4.7 ppm and 82.5 ppm, respectively. NMR data was zero-filled eight-fold in both dimensions, apodized using a 2π-Kaiser window function, Fourier-transformed, and phase-corrected using NMRPipe [5”’].

[0187] Deep neural network DEEP Picker1D and Voigt Fitter. DEEP Picker1D is a deep neural network that was trained on a library of 5000 synthetic 1D NMR spectra containing between 3 and 9 peaks with Voigt lineshape and variable amounts of overlaps (Li et al., 2021). In the original work, DEEP Picker was specifically adapted for the analysis of 2D NMR spectra and subsequently combined with the Voigt Fitter software for the quantitative analysis of 2D NMR metabolomics spectra either as standalone software or incorporated in the public web server COLMARq [15”’]. Briefly, DEEP Picker1D is a convolutional neural network, which was trained using TensorFlow v1.3 [1’”], taking a 1D spectrum as input. It contains 7 hidden convolutional layers, 1 hidden max-pooling layer, and two parallel output layers with a total of 8037 trainable parameters. A convolutional output classifier layer with SoftMax activation classifies every input data point by assigning an individual score for three peak classes (main peaks = class 2, shoulder peaks = class 1, no peak = class 0). The class with the maximal score is then chosen as the predicted class, with the numerical score as a quantitative measure of confidence of the predicted class for each data point of the input spectrum. For any data point predicted to be a peak (class 2 or 1), DEEP Picker1D also predicts the sub-pixel peak position relative to the on-grid points, peak amplitude, peak width, and the Lorentzian vs. Gaussian components to the Voigt shape using a convolutional output regressor layer. Although DEEP Picker1D is a rather accurate predictor of peak parameters in its own right, these values can be further refined by the Voigt Fitter1D software by performing a non-linear least squares fit of the original input 1D spectrum in terms of Voigt peak shapes using the DEEP Picker1D output peak parameters as input. Voigt Fitter1D is a 1D version of the 2D Voigt Fitter software published previously (Li et al., 2022a). DEEP Picker1D paired with Voigt Fitter1D results in a fully quantitative representation of the input 1D NMR spectrum in terms of a finite set of 1D Voigt-shaped peaks.

[0188] The input spectrum for DEEP Picker1D needs to be pre-processed in a standard fashion, including phase correction, baseline correction, zero filling, apodization, and Fourier transformation. DEEP Picker1D contains two models whereby model 1 (model 2) has optimal performance when the digital resolution is sufficiently high, around 12 (8) points per peak (PPP). DEEP Picker1D performs best for peaks with a moderate to high signal-to-noise ratio (S/N > 10, where the noise level is defined as the standard deviation of the spectrum in a peak-free region) and lineshapes that closely follow Voigt profiles. In the presence of significant amounts of noise or non-negligible lineshape distortions, such as those caused by temperature fluctuations or suboptimal shimming during data collection, DEEP Picker1D may pick some false peaks, for example, by interpreting lineshape distortions as shoulder peaks. Voigt Fitter1D has built-in tools to remove spectral features from its peak list when one of the following situations occurs: (i) a fitted peak is too wide, i.e., the peak width is larger than the fitting region, or it becomes too narrow, i.e., the peak width is less than 1 point; (ii) a fitted peak strongly overlaps with another peak so that merging of the two peaks into a single peak causes a minimal change of the fitting error. DEEP Picker1D and Voigt Fitter1D together provide a self-sufficient spectral analysis tool set for the complete deconvolution of 1D spectra into individual peaks. Peak parameters, such as peak position, peak height, and peak volume, can then be directly used for downstream analysis, such as compound identification and quantitative NMR applications, when incorporated into a quantitative NMR (qNMR) workflow. Because error estimation is an important part of any quantitative data analysis, Monte Carlo-based error propagation is implemented in Voigt Fitter1D as an option. It performs repetitive fitting of the reconstructed spectrum after adding random noise with the same standard deviation as that of the experimental input spectrum for each round of fitting. The output from this error estimation procedure contains the fitting parameters from each round, from which the uncertainty of each peak parameter is obtained.
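The Monte Carlo error propagation idea can be sketched in a simplified setting. As a stand-in for the full non-linear Voigt fit, the sketch below fits only the amplitude of a single Lorentzian peak with known position and width, which reduces to a linear least-squares problem; this simplification, all function names, and the toy parameters are assumptions and do not reflect the Voigt Fitter1D code.

```python
import random
from statistics import mean, stdev


def lorentzian(x, center, width):
    """Unit-height Lorentzian with full width at half maximum `width`."""
    half = width / 2.0
    return half * half / ((x - center) ** 2 + half * half)


def fit_amplitude(xs, ys, center, width):
    """Linear least-squares amplitude for a peak of known position and width."""
    shape = [lorentzian(x, center, width) for x in xs]
    return sum(y * s for y, s in zip(ys, shape)) / sum(s * s for s in shape)


def monte_carlo_amplitude_error(xs, fitted_amplitude, center, width,
                                noise_sd, rounds=200, seed=0):
    """Refit the reconstructed spectrum after adding fresh noise in each round."""
    rng = random.Random(seed)
    reconstructed = [fitted_amplitude * lorentzian(x, center, width) for x in xs]
    estimates = []
    for _ in range(rounds):
        noisy = [y + rng.gauss(0.0, noise_sd) for y in reconstructed]
        estimates.append(fit_amplitude(xs, noisy, center, width))
    # Spread of the refitted values approximates the parameter uncertainty.
    return mean(estimates), stdev(estimates)


xs = [i * 0.1 for i in range(101)]  # frequency axis in arbitrary units
mc_mean, mc_sd = monte_carlo_amplitude_error(
    xs, fitted_amplitude=100.0, center=5.0, width=1.0, noise_sd=1.0
)
```

The standard deviation of the refitted amplitudes serves as the uncertainty estimate. In a full implementation the same loop would wrap the non-linear fit of all peak parameters.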

[0189] Results. DEEP Picker1D and Voigt Fitter1D performance is first demonstrated for glucose in D2O (Fig. 22).

[0190] Because glucose populates two non-equivalent isomers, α-glucose and β-glucose, with different relative populations that interconvert on a slow time scale, and displays strong coupling effects even at commonly available magnetic fields, the deconvolution of its 1D 1H NMR spectrum is a challenge. Figs. 22A-22F show selected regions of the 1D 1H NMR glucose spectrum with variable amounts of peak overlap. The experimental spectra (black) along with the deconvolution results (blue) are shown in the left column (Figs. 22A, 22C, 22E). The right column (Figs. 22B, 22D, 22F) shows the corresponding spectral regions derived from quantum-mechanical spin simulations using chemical shifts and scalar J-couplings obtained from the GISSMO library [4”’]. An artificially slow, uniform transverse R2 relaxation rate of 0.6 s⁻¹ was applied to the simulated free induction decays (FID) so that after Fourier transformation, the resulting spectrum has sharp lines for easy recognition of the individual peaks and for the comparison with the automated deconvolution results. Fig. 22A starts out with a symmetric doublet centered at 5.223 ppm, which is accurately picked and fitted by DEEP Picker1D and Voigt Fitter1D in agreement with the simulation results in Fig. 22B. Figs. 22C and 22D show a triplet centered around 3.705 ppm whereby the strong central peak overlaps with two much smaller peaks on each side, which are correctly picked and fitted. According to the simulation, there is another small peak around 3.718 ppm, which strongly overlaps with a much stronger peak at 3.716 ppm and could not be identified by DEEP Picker1D. This small peak also cannot be discerned by visual inspection (note that the small J-splitting of the small peak in the simulated spectrum of Fig. 22D is not resolved in the experimental spectrum of Fig. 22C). The most complex region of the glucose spectrum (3.81 - 3.85 ppm) is depicted in Fig. 22E along with its deconvolution, which is in very good agreement with the simulated peaks (Fig. 22F). The neural network does a remarkable job in identifying the small peak at 3.838 ppm, which only gives rise to a very faint shoulder peak of its down-field shifted larger neighbor. The broad, somewhat oddly shaped spectral feature from 3.82 to 3.83 ppm in the experiment is deconvoluted into four individual peaks, whereby the small peak found in the simulation at 3.823 ppm was not deconvoluted by DEEP Picker1D because it overlaps too closely with the main peak at 3.824 ppm. This is consistent with the general rule that two peaks whose positions differ by less than their line widths are hard to deconvolute, especially when their amplitudes significantly differ from each other.

[0191] To assess the deconvolution accuracy of our tool, we constructed experimental spectra with overlaps from resolved spectra by co-adding traces of a 13C-1H HSQC spectrum of mouse urine along the direct 1H detection dimension at a fixed 13C chemical shift. Selected examples of overlapping peaks, both in isolation and as a superposition, are shown in Fig. 23. The left column (Figs. 23A, 23C, 23E) shows the experimental superpositions together with their deconvolution and the full spectral reconstruction, which can be directly compared with the individual traces in the right column (Figs. 23B, 23D, 23F). Figs. 23A and 23B show two strongly overlapped peaks of different amplitude, giving rise to a sum peak with a noticeable protrusion on its right flank, which are accurately deconvoluted and fitted by DEEP Picker1D and Voigt Fitter1D. Figs. 23C and 23D show a similar scenario, except that the amplitude ratio of the two peaks is around 35:1, which is much larger than in Figs. 23A, 23B. Again, deconvolution was achieved with high accuracy. Figs. 23E and 23F demonstrate the deconvolution capacity for a challenging case of four moderately to strongly overlapped peaks. Although the peak at 3.139 ppm is wedged between two stronger peaks, it is successfully extracted by the peak picking and fitting algorithms. The final example (Fig. 24) shows a region of the mouse urine spectrum along with the deconvolution and reconstruction result. The algorithm deconvolutes the spectrum by identifying with confidence not only the main peaks, but also all the minor peaks, including the peak at 7.809 ppm, demonstrating the potential of the proposed deconvolution method in practice when encountering spectra with highly overlapped regions, such as those routinely collected for urine and other complex biofluids in the context of metabolomics.

[0192] Discussion and Conclusion. In the vast majority of modern NMR applications, one of the most critical steps in NMR spectral analysis is the identification of individual peaks along with their quantitative parametrization by lineshape fitting. The result of this procedure often dictates the usefulness, and ultimately the success, of the collected experiment. Traditional peak-picking methods rely on clearly defined mathematical criteria, such as the properties of the 1st and 2nd derivatives of the spectrum, to identify individual peaks. These criteria are often too rigid to deal with spectral overlap scenarios encountered in practice. After proper training, a deep neural network like DEEP Picker1D, on the other hand, has a stunning ability to track major and minor spectral features, surpassing the capacity of most human NMR practitioners. Through the combination of advanced machine learning by the convolutional deep neural network DEEP Picker1D and a peak fitting routine Voigt Fitter1D, it was demonstrated how 1D NMR spectral features of variable complexity can be deconvoluted into individual resonances in a reliable and accurate manner. The success rate of the method depends on the quality of spectra, which can be affected by sample preparation, NMR data acquisition, and pre-processing. This concerns the elimination or suppression of the solvent signal or of a prominent background caused, for example, by the presence of a macromolecular matrix in the sample. Although apodization, zero-filling, phase, and baseline correction are standard steps during data processing, they need to be applied judiciously to prevent suboptimal performance of spectral deconvolution and fitting. Phase errors of up to about 3° can be tolerated, but for larger phase distortions, DEEP Picker1D may interpret asymmetries in the peak shapes as shoulder peaks. Similarly, poor shimming of higher order shims, especially z² and z⁴, can lead to systematic peak asymmetries across the spectrum, which DEEP Picker1D may interpret as shoulder peaks. In order to accurately recognize peak shapes, DEEP Picker1D requires an adequate digital resolution, which is around 8 or 12 points across a single peak, depending also on the chosen DEEP Picker1D model. If needed, lower-resolution spectra can be easily subjected to appropriate zero-filling to meet this criterion. Peak shapes should follow Voigt profiles in good approximation, which can be achieved by the application of common window functions such as those described for the processing of the spectra in this work (cosine-squared and 2π-Kaiser window functions). The computational time of Voigt Fitter1D scales linearly with the number of peaks, allowing the rapid fitting of complex 1D spectra with even thousands of peaks. The fitting of the 1D mouse urine spectrum with a total of 4500 Voigt-shaped peaks took about 1 minute on a standard desktop computer. Like all nonlinear optimization software, Voigt Fitter1D cannot guarantee that the final solution is the global χ² minimum. Therefore, a nearly complete list of high-quality initial peaks returned by DEEP Picker1D that match the ground truth as closely as possible is key for the success of Voigt Fitter1D.
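The digital resolution criterion can be turned into a quick estimate of the required zero-filling. The sketch below assumes that the number of spectral points equals the number of complex data points times the zero-filling factor, that a peak spans roughly its full width at half maximum, and that the linewidth is 1 Hz; these are illustrative assumptions, and the function names are hypothetical.

```python
def points_per_peak(linewidth_hz, spectral_width_ppm, larmor_mhz,
                    n_points, zero_fill=1):
    """Approximate number of spectral points across one linewidth."""
    spectral_width_hz = spectral_width_ppm * larmor_mhz
    hz_per_point = spectral_width_hz / (n_points * zero_fill)
    return linewidth_hz / hz_per_point


def zero_fill_needed(linewidth_hz, spectral_width_ppm, larmor_mhz,
                     n_points, target_ppp):
    """Smallest power-of-two zero-filling factor that reaches the target PPP."""
    factor = 1
    while points_per_peak(linewidth_hz, spectral_width_ppm, larmor_mhz,
                          n_points, factor) < target_ppp:
        factor *= 2
    return factor


# Acquisition parameters of the glucose spectrum (13 ppm, 850 MHz, 32768 points)
# combined with an assumed linewidth of 1 Hz.
ppp_four_fold = points_per_peak(1.0, 13.0, 850.0, 32768, zero_fill=4)
factor_for_8 = zero_fill_needed(1.0, 13.0, 850.0, 32768, target_ppp=8.0)
factor_for_12 = zero_fill_needed(1.0, 13.0, 850.0, 32768, target_ppp=12.0)
```

With these assumptions, four-fold zero-filling yields close to 12 points per peak, which is in the range quoted for the two DEEP Picker1D models. Narrower lines would call for more zero-filling.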

[0193] A surge in metabolomics research over recent years has spurred the development of advanced quantitative tools for the analysis of complex NMR spectra, both for 1D and 2D spectra. Some metabolomics software packages [9’”], [20’”] are specifically geared toward the quantification of specific metabolites with known reference spectra, limiting their application to specific samples only, such as serum. In the case of DEEP Picker1D and Voigt Fitter1D, the analysis is performed in a fully untargeted manner, i.e., without any molecular spectral templates, allowing its application to essentially any NMR spectrum that consists of resonances with Voigt lineshapes. The deconvolution results can then be further analyzed, for example, by querying against a spectral database or for quantitation of mixture component concentrations. In the case of a cohort of samples, the DEEP Picker1D and Voigt Fitter1D results can be used for univariate or multivariate statistical analysis for the assessment of statistically significant differences between cohorts. Metabolomics query capabilities for the analysis of the output of DEEP Picker1D and Voigt Fitter1D, which will also take into account peak shifts caused, for example, by pH differences between samples, are currently under development. The DEEP Picker1D and Voigt Fitter1D software can also be applied to a pseudo-2D series of 1D spectra for the extraction of longitudinal R1 and transverse R2 relaxation parameters or of translational diffusion constants by diffusion-ordered NMR [11’”]. The unique strength of the combination of DEEP Picker1D with Voigt Fitter1D is their ability to accurately deconvolute and reconstruct NMR spectra of generic origin ranging from well-resolved to highly crowded, which should fulfill the growing needs in a wide range of contemporary NMR applications.

[0194] Discussion

[0195] One of the major strengths of nuclear magnetic resonance (NMR) spectroscopy is its broad applicability to a vast range of molecular systems in solution or in the solid state. Because the nuclei of many atoms in molecules are NMR-active, such as hydrogen atoms, the information content of NMR spectra is uniquely rich allowing studies of molecular composition, interactions, structure and dynamics at atomic detail. Due to its quantitative nature, NMR is also highly suitable for the analysis of molecular mixtures for component identification and quantification with application in metabolomics [17’”] and for monitoring of industrial chemical and biochemical processes [24’”].

[0196] Despite enormous methodological progress made over many decades of NMR research that has resulted in a vast collection of different NMR experiments, in many NMR facilities the most popular choice remains the standard one-dimensional (1D) 1H proton NMR experiment. This is the result of several factors, such as good sensitivity, short measurement time (potentially associated with a low user fee), straightforward processing, and easy and dependable implementation on different types of NMR spectrometers. However, due to the richness of the resulting 1H NMR spectrum in many samples, it is prone to varying degrees of spectral overlap, that is, the overlap of two or more resonances, rendering the identification and quantification of the underlying resonances challenging [7’”].

[0197] Because the first step of the analysis of almost every NMR spectrum consists of the identification of the individual resonances, spectral crowding often makes the process incomplete, ambiguous or even impossible. For many years, spectral analysis has been routinely assisted by computer software that performs useful tasks like peak-picking and peak integration, thereby speeding up the analysis process by supporting human experts [3’”], [18’”], [19’”]. A number of commercial general-purpose software packages are available for the analysis of 1D 1H NMR spectra, such as the ACD/NMR workbook suite, the AMIX software, the Chenomx NMR suite, and MNova NMR. Recent developments in NMR-based metabolomics, which oftentimes involve highly complex 1H NMR spectra, have led to a proliferation of academic software for the (semi-)automated analysis of such spectra, including MetaboLab [16’”], BATMAN [9’”], Bayesil [20’”], AQuA [21’”], ASICS [12’”], rDolphin [2’”] and MetaboDecon1D [8’”]. Some of these programs are suitable for untargeted compound identification whereas others only map those spectral features that are contained in a pre-defined metabolite spectral database.

[0198] For a fully quantitative spectral analysis, numerical lineshape fitting has become the method of choice using a parametric representation of each resonance in the spectrum [10’”], [22’”], [23’”]. Commonly used lineshapes are Lorentzian, Gaussian, and Voigt profiles that may explicitly include truncation or apodization effects, such as sinc wiggles [6’”]. Because essentially all fitting software relies on a local non-linear least squares minimization between the model and the experimental spectrum, such as a Levenberg-Marquardt minimizer, an accurate line position and linewidth for each resonance as input parameters are of paramount importance. Because such information is hard to obtain by automated computational approaches alone, lineshape fitting often requires significant interactive intervention by a human expert. This applies in particular to spectral regions with significant peak overlap manifested, for example, by one or several shoulder peaks and a large dynamic range. Although sophisticated mathematical peak picking algorithms have been developed that identify realistic peak positions [3’”], they work best for well-resolved peaks or peaks with moderate overlap, but tend to fail in the case of strong overlaps and overlaps involving three or more peaks.
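
To make the role of the initial parameters in a local least-squares fit more concrete, the following Python sketch fits a single Lorentzian line with a small damped Gauss-Newton (Levenberg-Marquardt style) loop. It is an illustration under stated assumptions only: it handles one peak with a Lorentzian rather than a Voigt profile, uses noise-free synthetic data, and every function name and starting value is hypothetical. It is not the fitting routine of the disclosure.

```python
def lorentzian(x, amp, x0, hwhm):
    """Lorentzian line with peak height amp, center x0, half width at half maximum hwhm."""
    return amp * hwhm**2 / ((x - x0) ** 2 + hwhm**2)


def jacobian_row(x, amp, x0, hwhm):
    """Partial derivatives of the Lorentzian with respect to amp, x0 and hwhm."""
    d = x - x0
    q = d * d + hwhm * hwhm
    return [hwhm**2 / q, 2 * amp * hwhm**2 * d / q**2, 2 * amp * hwhm * d * d / q**2]


def chi2(xs, ys, p):
    """Sum of squared residuals between the data and the model."""
    return sum((y - lorentzian(x, *p)) ** 2 for x, y in zip(xs, ys))


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_lm(xs, ys, p0, n_iter=100, lam=1e-3):
    """Local minimization of chi2 starting from p0; returns (parameters, chi2)."""
    p, best = list(p0), chi2(xs, ys, p0)
    for _ in range(n_iter):
        rows = [jacobian_row(x, *p) for x in xs]
        res = [y - lorentzian(x, *p) for x, y in zip(xs, ys)]
        jtj = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        jtr = [sum(r[i] * e for r, e in zip(rows, res)) for i in range(3)]
        for i in range(3):
            jtj[i][i] *= 1.0 + lam  # damping on the diagonal
        trial = [pi + si for pi, si in zip(p, solve(jtj, jtr))]
        c = chi2(xs, ys, trial)
        if c < best:   # accept the step and trust the linearization more
            p, best, lam = trial, c, lam / 10
        else:          # reject the step and increase the damping
            lam *= 10
    return p, best


# Synthetic peak (height 1.0, center 5.0, half width 0.5) and a nearby starting guess.
xs = [i * 0.05 for i in range(201)]
ys = [lorentzian(x, 1.0, 5.0, 0.5) for x in xs]
params, final_chi2 = fit_lm(xs, ys, (0.8, 5.2, 0.7))
print(params, final_chi2)
```

Because the search is local, the starting position and width decide which minimum is reached, which is why a reliable peak list is needed as input in crowded regions.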

[0199] Recent applications of machine-learning methods, in particular of deep neural networks (DNN), have shown qualitative progress in the ability to deconvolute complex multidimensional NMR spectra [15’”]. In the case of “DEEP Picker”, training was exclusively based on a library containing 5000 synthetic 1D test spectra consisting of 3 to 9 individual Voigt-shaped peaks with random amplitudes and positions amounting to a collection of training spectra with a wide range of spectral overlap [13’”]. The algorithm was then generalized to two-dimensional (2D) NMR spectra as encountered in many protein NMR and metabolomics applications.
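
The kind of synthetic library described above can be sketched in a few lines of Python. The sketch below is only an approximation under stated assumptions: it uses a pseudo-Voigt lineshape (a weighted sum of a Lorentzian and a Gaussian of equal width) as a simple stand-in for a true Voigt profile, and the spectrum length, width range, amplitude range, and function names are all hypothetical rather than the values used to train DEEP Picker.

```python
import math
import random


def pseudo_voigt(x, amp, x0, fwhm, eta):
    """Pseudo-Voigt: Lorentzian (weight eta) plus Gaussian (weight 1 - eta), same FWHM."""
    d = x - x0
    lor = 1.0 / (1.0 + 4.0 * d * d / fwhm**2)
    gau = math.exp(-4.0 * math.log(2.0) * d * d / fwhm**2)
    return amp * (eta * lor + (1.0 - eta) * gau)


def make_spectrum(rng, n_points=300):
    """One synthetic spectrum with 3 to 9 peaks; returns (intensities, peak positions)."""
    n_peaks = rng.randint(3, 9)  # inclusive on both ends
    peaks = [
        (
            rng.uniform(0.05, 1.0),           # random amplitude (assumed range)
            rng.uniform(50, n_points - 50),   # random position, away from the edges
            rng.uniform(6.0, 20.0),           # width in points (assumed range)
            rng.uniform(0.0, 1.0),            # Lorentzian fraction
        )
        for _ in range(n_peaks)
    ]
    spectrum = [sum(pseudo_voigt(i, *p) for p in peaks) for i in range(n_points)]
    labels = sorted(p[1] for p in peaks)  # ground-truth positions for training labels
    return spectrum, labels


def make_library(n_spectra, seed=0):
    """Generate a reproducible list of (spectrum, labels) pairs."""
    rng = random.Random(seed)
    return [make_spectrum(rng) for _ in range(n_spectra)]


library = make_library(10)
print(len(library), [len(labels) for _, labels in library])
```

Because positions are drawn independently, some spectra contain isolated peaks and others contain strongly overlapping ones, which is the property the text attributes to the training set.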

[0200] Example Computing Device

[0201] It should be appreciated that the logical operations described above and in the appendix can be implemented in some embodiments (1) as a sequence of computer-implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation may be a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as state operations, acts, or modules. These operations, acts, and/or modules can be implemented in software, in firmware, in special purpose digital logic, in hardware, and any combination thereof. It should also be appreciated that more or fewer operations can be performed than shown in the figures and described herein. These operations can also be performed in a different order than those described herein.

[0202] In addition to the various systems discussed herein, Fig. 28 shows an illustrative computer architecture for a computer system 2800 capable of executing the software components that can use the output of the exemplary method described herein. The computer architecture shown in Fig. 28 illustrates an example computer system configuration, and the computer 2800 can be utilized to execute any aspects of the components and/or modules presented herein described as executing on the analysis system or any components in communication therewith, including providing support of TEE as described herein as well as trusted Time, GPS, and Monotonic Counter as noted above.

[0203] In an embodiment, the computing device 2800 may comprise two or more computers in communication with each other that collaborate to perform a task. For example, but not by way of limitation, an application may be partitioned in such a way as to permit concurrent and/or parallel processing of the instructions of the application. Alternatively, the data processed by the application may be partitioned in such a way as to permit concurrent and/or parallel processing of different portions of a data set by the two or more computers. In an embodiment, virtualization software may be employed by the computing device 2800 to provide the functionality of a number of servers that is not directly bound to the number of computers in the computing device 2800. For example, virtualization software may provide twenty virtual servers on four physical computers. In an embodiment, the functionality disclosed above may be provided by executing the application and/or applications in a cloud computing environment. Cloud computing may comprise providing computing services via a network connection using dynamically scalable computing resources. Cloud computing may be supported, at least in part, by virtualization software. A cloud computing environment may be established by an enterprise and/or may be hired on an as-needed basis from a third-party provider. Some cloud computing environments may comprise cloud computing resources owned and operated by the enterprise as well as cloud computing resources hired and/or leased from a third-party provider.
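
As one possible reading of the data-partitioning arrangement described above, the following Python sketch splits a set of spectra into contiguous portions and processes each portion concurrently. It is a hedged illustration only: the per-portion task is a placeholder, the function names are hypothetical, and a thread pool within one machine stands in for the two or more computers mentioned in the text. In CPython, threads do not speed up CPU-bound work, so separate processes or machines would be the more realistic choice for heavy analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def max_intensity(spectra):
    """Placeholder per-portion task: the largest intensity in each spectrum."""
    return [max(s) for s in spectra]


def process_concurrently(spectra, n_workers=4):
    """Process each portion of the data set in its own worker, keeping input order."""
    chunks = partition(spectra, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(max_intensity, chunks))  # map preserves chunk order
    return [value for chunk in results for value in chunk]


# Ten tiny hypothetical "spectra" of three points each.
spectra = [[i, i + 1, i + 2] for i in range(10)]
print(process_concurrently(spectra))
```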

[0204] In its most basic configuration, computing device 2800 typically includes at least one processing unit 2820 and system memory 2830. Depending on the exact configuration and type of computing device, system memory 2830 may be volatile (such as random-access memory (RAM)), non-volatile (such as read-only memory (ROM), flash memory, etc.), or some combination of the two.

[0205] This most basic configuration is illustrated in Fig. 28 by dashed line 2810. The processing unit 2820 may be a standard programmable processor that performs arithmetic and logic operations necessary for the operation of the computing device 2800. While only one processing unit 2820 is shown, multiple processors may be present. As used herein, processing unit and processor refer to a physical hardware device that executes encoded instructions for performing functions on inputs and creating outputs, including, for example, but not limited to, microprocessors (MCUs), microcontrollers, graphical processing units (GPUs), and application-specific circuits (ASICs). Thus, while instructions may be discussed as executed by a processor, the instructions may be executed simultaneously, serially, or otherwise executed by one or multiple processors. The computing device 2800 may also include a bus or other communication mechanism for communicating information among various components of the computing device 2800.

[0206] Computing device 2800 may have additional features/functionality. For example, computing device 2800 may include additional storage such as removable storage 2840 and non-removable storage 2850 including, but not limited to, magnetic or optical disks or tapes. Computing device 2800 may also contain network connection(s) 2880 that allow the device to communicate with other devices such as over the communication pathways described herein. The network connection(s) 2880 may take the form of modems, modem banks, Ethernet cards, universal serial bus (USB) interface cards, serial interfaces, token ring cards, fiber distributed data interface (FDDI) cards, wireless local area network (WLAN) cards, radio transceiver cards such as code division multiple access (CDMA), global system for mobile communications (GSM), long-term evolution (LTE), worldwide interoperability for microwave access (WiMAX), and/or other air interface protocol radio transceiver cards, and other well-known network devices. Computing device 2800 may also have input device(s) 2870 such as keyboards, keypads, switches, dials, mice, trackballs, touch screens, voice recognizers, card readers, paper tape readers, or other well-known input devices. Output device(s) 2860 such as printers, video monitors, liquid crystal displays (LCDs), touch screen displays, displays, speakers, etc. may also be included. The additional devices may be connected to the bus in order to facilitate the communication of data among the components of the computing device 2800. All these devices are well known in the art and need not be discussed at length here.

[0207] The processing unit 2820 may be configured to execute program code encoded in tangible, computer-readable media. Tangible, computer-readable media refers to any media that is capable of providing data that causes the computing device 2800 (i.e., a machine) to operate in a particular fashion. Various computer-readable media may be utilized to provide instructions to the processing unit 2820 for execution. Example tangible, computer-readable media may include, but are not limited to, volatile media, non-volatile media, removable media, and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. System memory 2830, removable storage 2840, and non-removable storage 2850 are all examples of tangible, computer storage media. Example tangible, computer-readable recording media include, but are not limited to, an integrated circuit (e.g., field-programmable gate array or application-specific IC), a hard disk, an optical disk, a magneto-optical disk, a floppy disk, a magnetic tape, a holographic storage medium, a solid-state device, RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices.

[0208] In light of the above, it should be appreciated that many types of physical transformations take place in the computer architecture 2800 in order to store and execute the software components presented herein. It also should be appreciated that the computer architecture 2800 may include other types of computing devices, including hand-held computers, embedded computer systems, personal digital assistants, and other types of computing devices known to those skilled in the art. It is also contemplated that the computer architecture 2800 may not include all of the components shown in Fig. 28, may include other components that are not explicitly shown in Fig. 28, or may utilize an architecture different than that shown in Fig. 28.

[0209] In an example implementation, the processing unit 2820 may execute program code stored in the system memory 2830. For example, the bus may carry data to the system memory 2830, from which the processing unit 2820 receives and executes instructions. The data received by the system memory 2830 may optionally be stored on the removable storage 2840 or the non-removable storage 2850 before or after execution by the processing unit 2820.

[0210] It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination thereof. Thus, the methods and apparatuses of the presently disclosed subject matter, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computing device, the machine becomes an apparatus for practicing the presently disclosed subject matter. In the case of program code execution on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs may implement or utilize the processes described in connection with the presently disclosed subject matter, e.g., through the use of an application programming interface (API), reusable controls, or the like. Such programs may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and it may be combined with hardware implementations.

[0211] Moreover, the various components may be in communication via wireless and/or hardwire or other desirable and available communication means, systems and hardware. Moreover, various components and modules may be substituted with other modules or components that provide similar functions.

[0212] Some references, which may include various patents, patent applications, and publications, are cited in a reference list and discussed in the disclosure provided herein. The citation and/or discussion of such references is provided merely to clarify the description of the present disclosure and is not an admission that any such reference is “prior art” to any aspects of the present disclosure described herein. In terms of notation, “[n]” corresponds to the nth reference in the list. All references cited and discussed in this specification are incorporated herein by reference in their entireties and to the same extent as if each reference was individually incorporated by reference.

[0213] Although example embodiments of the present disclosure are explained in some instances in detail herein, it is to be understood that other embodiments are contemplated. Accordingly, it is not intended that the present disclosure be limited in its scope to the details of construction and arrangement of components set forth in the following description or illustrated in the drawings. The present disclosure is capable of other embodiments and of being practiced or carried out in various ways.

[0214] It must also be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. Ranges may be expressed herein as from “about” or “approximately” one particular value and/or to “about” or “approximately” another particular value. When such a range is expressed, other exemplary embodiments include from the one particular value and/or to the other particular value.

[0215] By “comprising” or “containing” or “including” is meant that at least the named compound, element, particle, or method step is present in the composition or article or method, but does not exclude the presence of other compounds, materials, particles, method steps, even if the other such compounds, materials, particles, method steps have the same function as what is named.

[0216] In describing example embodiments, terminology will be resorted to for the sake of clarity. It is intended that each term contemplates its broadest meaning as understood by those skilled in the art and includes all technical equivalents that operate in a similar manner to accomplish a similar purpose. It is also to be understood that the mention of one or more steps of a method does not preclude the presence of additional method steps or intervening method steps between those steps expressly identified. Steps of a method may be performed in a different order than those described herein without departing from the scope of the present disclosure. Similarly, it is also to be understood that the mention of one or more components in a device or system does not preclude the presence of additional components or intervening components between those components expressly identified.

[0217] As discussed herein, a “subject” may be any applicable human, animal, or other organism, living or dead, or other biological or molecular structure or chemical environment, and may relate to particular components of the subject, for instance, specific tissues or fluids of a subject (e.g., human tissue in a particular area of the body of a living subject), which may be in a particular location of the subject, referred to herein as an “area of interest” or a “region of interest.”

[0218] The term “about,” as used herein, means approximately, in the region of, roughly, or around. When the term “about” is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term “about” is used herein to modify a numerical value above and below the stated value by a variance of 10%. In one aspect, the term “about” means plus or minus 10% of the numerical value of the number with which it is being used. Therefore, about 50% means in the range of 45%-55%. Numerical ranges recited herein by endpoints include all numbers and fractions subsumed within that range (e.g., 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.90, 4, 4.24, and 5).

[0219] Similarly, numerical ranges recited herein by endpoints include subranges subsumed within that range (e.g., 1 to 5 includes 1-1.5, 1.5-2, 2-2.75, 2.75-3, 3-3.90, 3.90-4, 4- 4.24, 4.24-5, 2-5, 3-5, 1-4, and 2-4). It is also to be understood that all numbers and fractions thereof are presumed to be modified by the term “about.”

[0220] The following patents, applications, and publications, as listed below and throughout this document, are hereby incorporated by reference in their entirety herein.

Exemplary Aspects

[0221] Exemplary aspect 1. A method to detect peaks in an NMR spectra comprising: receiving, by a processor, an NMR data set comprising two or more convolved NMR spectra; determining, by the processor, at least one peak location or value, or a deconvolved NMR spectra, in the two or more NMR spectra using one or more trained machine learned and/or artificial intelligence models, or a model derived therefrom; and causing, by the processor, the at least one peak location or value, or the deconvolved NMR spectra, to be displayed or employed in subsequent analysis.

[0222] Exemplary aspect 2. The method of exemplary aspect 1, wherein the trained machine learned and/or artificial intelligence model was trained using labeled peak data and at least one neighborhood peak data.

[0223] Exemplary aspect 3. The method of exemplary aspect 1, wherein the one or more trained machine learned and/or artificial intelligence models include a first model and a second model, wherein the first model is configured to detect a first type of peak, and wherein the second model is configured to detect a second type of peak.

[0224] Exemplary aspect 4. The method of exemplary aspect 1, wherein the one or more trained machine learned and/or artificial intelligence models are trained by a training data set, the training data set comprising a first label data, a second label data, and a third label data, wherein the first label data consists of a single peak or a given peak window, wherein the second label data consists of multiple peaks for a given peak window, and wherein the third label data consists of a local peak for a given peak window.

[0225] Exemplary aspect 5. The method of exemplary aspect 1, wherein the one or more trained machine learned and/or artificial intelligence models includes a neural network model.

[0226] Exemplary aspect 6. The method of any one of exemplary aspects 1-5, further comprising: determining a spectral line over the at least one peak location or value.

[0227] Exemplary aspect 7. The method of exemplary aspect 6, wherein the spectral line employs a Lorentzian or Gaussian profile.

[0228] Exemplary aspect 8. The method of exemplary aspect 7, wherein the spectral line employs a Voigt profile.

[0229] Exemplary aspect 9. A system comprising: a processor; and a memory having instructions stored thereon, wherein execution of the instructions by the processor causes the processor to perform any one of the methods of exemplary aspects 1-5.

[0230] Exemplary aspect 10. The system of exemplary aspect 9, wherein the system is configured as an NMR instrument.

[0231] Exemplary aspect 11. The system of exemplary aspect 9, wherein the system is configured as an MRI imaging system.

[0232] Exemplary aspect 12. The system of exemplary aspect 9, wherein the system is configured as a server (e.g., in a remote/external or cloud infrastructure).

[0233] Exemplary aspect 13. The system of exemplary aspect 12, wherein the server is configured to receive the NMR data set over a network.

[0234] Exemplary aspect 14. A non-transitory computer readable medium having instructions stored thereon, wherein the instructions when executed by the processor causes the processor to perform any one of the methods of exemplary aspects 1-8 or any one of the system of exemplary aspects 9-10.

[0235] Exemplary aspect 15. A method to detect peaks in a spectral graph, the method comprising: receiving, by a processor, a set of spectral graph data; determining, by the processor, at least one peak location or value, in the set of spectral graph data using one or more trained, machine learning and/or artificial intelligence models, or a model derived therefrom; and causing, by the processor, the at least one peak location or value, to be displayed or employed in subsequent analysis.

[0236] Exemplary aspect 16. The method of exemplary aspect 15, wherein the one or more trained, machine learning and/or artificial intelligence model was trained using training data comprising synthetic spectra, wherein the training data consists of unambiguously identifiable peaks.

[0237] Exemplary aspect 17. The method of exemplary aspect 16, wherein the training data further comprises labeled peak data and at least one neighboring peak data.

[0238] Exemplary aspect 18. The method of exemplary aspect 15 or 16, wherein the one or more trained, machine learning and/or artificial intelligence models comprise one or more neural network models configured to identify at least one peak location or value within the set of spectral graph data, wherein the one or more neural network models identify a point and its two nearest neighbors as a peak.

[0239] Exemplary aspect 19. The method of exemplary aspect 18, wherein a neural network model comprises a plurality of hidden convolutional layers and at least one max pooling layer, wherein the at least one max pooling layer is a final neural network layer.

[0240] Exemplary aspect 20. The method of any one of exemplary aspects 18-19, wherein one of the one or more neural network models further comprise a convolutional layer with an activation function configured to classify the at least one peak location or value as a peak, a shoulder peak, or non-peak.

[0241] Exemplary aspect 21. The method of any one of exemplary aspects 18-20, where one of the one or more neural network models further comprises an output regression layer configured to determine a line shape centered around the at least one peak location or value.

[0242] Exemplary aspect 22. The method of any one of exemplary aspects 15-21, wherein the one or more trained, machine learning and/or artificial intelligence models are applied to the spectral graph data in a sliding window domain.

[0243] Exemplary aspect 23. The method of any one of exemplary aspects 15-22, wherein a low peak amplitude cutoff is applied to the spectral graph data.

[0244] Exemplary aspect 24. The method of any one of exemplary aspects 15-23, wherein the set of spectral graph data is solution NMR, solid-state NMR, EPR, or ESR graph data.

[0245] Exemplary aspect 25. The method of any one of exemplary aspects 15-24, further comprising: determining a spectral line over the at least one peak location or value.

[0246] Exemplary aspect 26. The method of exemplary aspect 25, wherein the spectral line employs a Lorentzian or Gaussian profile.

[0247] Exemplary aspect 27. The method of exemplary aspect 26, wherein the spectral line employs a Voigt profile.

[0248] Exemplary aspect 28. The method of any one of exemplary aspects 15-27, wherein the subsequent analysis is one or both of querying peaks against a known spectral database, and quantifying concentrations from the one or more peak location or values.

[0249] Exemplary aspect 29. The method of any one of exemplary aspects 15-28, wherein the set of spectral graph data includes one-dimensional NMR spectra.

[0250] Exemplary aspect 30. The method of any one of exemplary aspects 15-28, wherein the set of spectral graph data includes two-dimensional NMR spectra.

[0251] Exemplary aspect 31. The method of any one of exemplary aspects 1-30, wherein the set of spectral graph data includes solid-state NMR data.

[0252] Exemplary aspect 32. The method of any one of exemplary aspects 15-30, wherein the set of spectral graph data includes solution NMR.

[0253] Exemplary aspect 33. The method of any one of Exemplary aspects 15-32 further comprising: receiving, by a processor, the set of spectral graph data via a web-server; providing the received set of spectral graph data to an analysis engine to determine at least one peak location or value.

[0254] Exemplary aspect 34. The method of Exemplary aspect 33, wherein the analysis engine is configured to perform automated peak picking to determine the at least one peak location or value.

[0255] Exemplary aspect 35. The method of Exemplary aspect 33 or 34, wherein the analysis engine is configured to quantify the one peak location or value, and provide the quantification to a user device in a report or via display.

[0256] Exemplary aspect 36. The method of any one of Exemplary aspects 33-35, wherein the analysis engine is configured to match the one peak location or value to a set of spectra in a database for metabolite identification and provide search output to the user device in the report or via the display.

[0257] Exemplary aspect 37. The method of any one of Exemplary aspects 33-36, wherein the analysis engine is configured to perform data normalization via ratio analysis.

[0258] Exemplary aspect 38. The method of any one of Exemplary aspects 33-37, wherein the analysis engine is configured to perform peak- and compound-based uni- and multivariate statistical analyses.

[0259] Exemplary aspect 39. A system comprising: a processor; and a memory having instructions stored thereon, wherein execution of the instructions by the processor causes the processor to perform any one of the methods of Exemplary aspects 15-38.

[0260] Exemplary aspect 40. The system of Exemplary aspect 39, wherein the system is configured as an NMR instrument.

[0261] Exemplary aspect 41. The system of Exemplary aspect 40, wherein the system is configured as an MRI imaging system.

[0262] Exemplary aspect 42. The system of Exemplary aspect 41, wherein the system is configured as a server in a remote/external or cloud infrastructure.

[0263] Exemplary aspect 43. The system of Exemplary aspect 42, wherein the server is configured to receive the set of spectral graph data over a network.

[0264] Exemplary aspect 44. A non-transitory computer readable medium having instructions stored thereon, wherein the instructions when executed by the processor causes the processor to perform any one of the methods of Exemplary aspects 15-38 or any one of the systems of Exemplary aspects 39-43.

References

Reference List #1

[1] Kovermann, M., Rogne, P. & Wolf-Watz, M. Protein dynamics and function from solution-state NMR spectroscopy. Q. Rev. Biophys. 49, e6 (2016).

[2] Markley, J. L. et al. The future of NMR-based metabolomics. Curr. Opin. Biotechnol. 43, 34-40 (2017).

[3] Pfandler, P., Bodenhausen, G., Meier, B. U. & Ernst, R. R. Toward automated assignment of nuclear magnetic-resonance spectra — pattern-recognition in two-dimensional correlation spectra. Anal. Chem. 57, 2510-2516 (1985).

[4] Meier, B. U., Madi, Z. L. & Ernst, R. R. Computer analysis of nuclear spin systems based on local symmetry in 2D spectra. J. Magn. Reson. 74, 565-573 (1987).

[5] Bartels, C., Xia, T. H., Billeter, M., Guntert, P. & Wuthrich, K. The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J. Biomol. NMR 6, 1-10 (1995).

[6] Koradi, R., Billeter, M., Engeli, M., Guntert, P. & Wuthrich, K. Automated peak picking and peak integration in macromolecular NMR spectra using AUTOPSY. J. Magn. Reson. 135, 288-297 (1998).

[7] Johnson, B. A. Using NMRView to visualize and analyze the NMR spectra of macromolecules. Methods Mol. Biol. 278, 313-352 (2004).

[8] Garrett, D. S., Powers, R., Gronenborn, A. M. & Clore, G. M. A common sense approach to peak picking in two-, three-, and four-dimensional spectra using automatic computer analysis of contour diagrams. J. Magn. Reson. 95, 214-220 (1991).

[9] Liu, Z., Abbas, A., Jing, B. Y. & Gao, X. WaVPeak: picking NMR peaks through wavelet-based smoothing and volume-based filtering. Bioinformatics 28, 914-920 (2012).

[10] Skinner, S. P. et al. CcpNmr AnalysisAssign: a flexible platform for integrated NMR analysis. J. Biomol. NMR 66, 111-124 (2016).

[11] Wurz, J. M. & Guntert, P. Peak picking multidimensional NMR spectra with the contour geometry based algorithm CYPICK. J. Biomol. NMR 67, 63-76 (2017).

[12] Korzhneva, D. M., Ibraghimov, I. V., Billeter, M. & Orekhov, V. Y. MUNIN: application of three-way decomposition to the analysis of heteronuclear NMR relaxation data. J. Biomol. NMR 21, 263-268 (2001).

[13] Orekhov, V. Y., Ibraghimov, I. V. & Billeter, M. MUNIN: a new approach to multidimensional NMR spectra interpretation. J. Biomol. NMR 20, 49-60 (2001).

[14] Tikole, S., Jaravine, V., Rogov, V., Dotsch, V. & Guntert, P. Peak picking NMR spectral data using non-negative matrix factorization. BMC Bioinforma. 15, 46 (2014).

[15] Alipanahi, B., Gao, X., Karakoc, E., Donaldson, L. & Li, M. PICKY: a novel SVD-based NMR spectra peak picking method. Bioinformatics 25, i268-i275 (2009).

[16] Antz, C., Neidig, K. P. & Kalbitzer, H. R. A general Bayesian method for an automated signal class recognition in 2D NMR spectra combined with a multivariate discriminant analysis. J. Biomol. NMR 5, 287-296 (1995).

[17] Rouh, A., Louis-Joseph, A. & Lallemand, J. Y. Bayesian signal extraction from noisy FT NMR spectra. J. Biomol. NMR 4, 505-518 (1994).

[18] Cheng, Y., Gao, X. & Liang, F. Bayesian peak picking for NMR spectra. Genom. Proteom. Bioinf. 12, 39-47 (2014).

[19] LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436-444 (2015).

[20] Baraniuk, R., Donoho, D. & Gavish, M. The science of deep learning. Proc. Natl Acad. Sci. USA 117, 30029-30032 (2020).

[21] Chen, D., Wang, Z., Guo, D., Orekhov, V. & Qu, X. Review and prospect: deep learning in nuclear magnetic resonance spectroscopy. Chemistry 26, 10391-10401 (2020).

[22] Hansen, D. F. Using deep neural networks to reconstruct non-uniformly sampled NMR spectra. J. Biomol. NMR 73, 577-585 (2019).

[23] Qu, X. et al. Accelerated nuclear magnetic resonance spectroscopy with deep learning. Angew. Chem. Int. Ed. Engl. 59, 10297-10300 (2020).

[24] Lee, H. H. & Kim, H. Intact metabolite spectrum mining by deep learning in proton magnetic resonance spectroscopy of the brain. Magn. Reson. Med. 82, 33-48 (2019).

[25] Shen, Y. & Bax, A. SPARTA+: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network. J. Biomol. NMR 48, 13-22 (2010).

[26] Han, B., Liu, Y., Ginzinger, S. W. & Wishart, D. S. SHIFTX2: significantly improved protein chemical shift prediction. J. Biomol. NMR 50, 43-57 (2011).

[27] Li, D. & Bruschweiler, R. PPM One: a static protein structure based chemical shift predictor. J. Biomol. NMR 62, 403-409 (2015).

[28] Liu, S. et al. Multiresolution 3D-DenseNet for chemical shift prediction in NMR crystallography. J. Phys. Chem. Lett. 10, 4558-4565 (2019).

[29] Klukowski, P. et al. NMRNet: a deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 34, 2590-2597 (2018).

[30] Zhang, Y. D. et al. Image based fruit category classification by 13-layer deep convolutional neural network and data augmentation. Multimed. Tools Appl. 78, 3613-3632 (2019).

[31] Wei, Q. & Dunbrack, R. L. Jr. The role of balanced training and testing data sets for binary classifiers in bioinformatics. PLoS ONE 8, e67863 (2013).

[32] Larrazabal, A. J., Nieto, N., Peterson, V., Milone, D. H. & Ferrante, E. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proc. Natl Acad. Sci. USA 117, 12592-12594 (2020).

[33] Olivier, J., Kilani, S. & Poirier, R. Determination in low-energy electron loss spectroscopy of the Gaussian and Lorentzian content of experimental lineshapes. Appl. Surf. Sci. 8, 353-358 (1981).

[34] Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: unified, real-time object detection. In Proc. CVPR IEEE 779-788 (2016).

[35] LeCun, Y. et al. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1, 541-551 (1989).

[36] Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In Proc. OSDI'16: 12th Usenix Symposium on Operating Systems Design and Implementation, 265-283 (2016).

[37] Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (The MIT Press, 2016).

[38] Hosang, J., Benenson, R. & Schiele, B. Learning non-maximum suppression. In Proc. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 6469-6477 (2017).

[39] Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277-293 (1995).

[40] Yuan, C. et al. Solution structure of the human oncogenic protein gankyrin containing seven ankyrin repeats and analysis of its structure-function relationship. Biochemistry 43, 12152-12161 (2004).

[41] Yuan, C., Byeon, I. J., Li, Y. & Tsai, M. D. Structural analysis of phospholipase A2 from functional perspective. 1. Functionally relevant solution structure and roles of the hydrogen-bonding network. Biochemistry 38, 2909-2918 (1999).

[42] Tu, S. et al. The ARID domain of the H3K4 demethylase RBP2 binds to a DNA CCGCCC motif. Nat. Struct. Mol. Biol. 15, 419-421 (2008).

[43] Bowles, D. P. et al. Resonance assignments of wild-type and two cysteine-free variants of the four-helix bundle protein, Rop. Biomol. NMR Assign. 12, 345-350 (2018).

[44] Timari, I. et al. Real-time pure shift HSQC NMR for untargeted metabolomics. Anal. Chem. 91, 2304-2311 (2019).

[45] Bovik, A. C. Handbook of Image and Video Processing. 2nd edn (Elsevier Academic Press, 2005).

[46] MestreNova V. 14.0 (2020).

[47] Cobas, C., Aboutanios, E. & Sykora, S. Fast two-dimensional nuclear magnetic resonance resolution enhancement by use of a Laplacian estimator. Spectrosc. Lett. 53, 529-535 (2020).

[48] Dhillon, A. & Verma, G. K. Convolutional neural network: a review of models, methodologies and applications to object detection. Prog. Artif. Intell. 9, 85-112 (2020).

[49] Bingol, K., Li, D. W., Zhang, B. & Bruschweiler, R. Comprehensive metabolite identification strategy using multiple two-dimensional NMR spectra of a complex mixture implemented in the COLMARm Web Server. Anal. Chem. 88, 12411-12418 (2016).

[50] Li, D.-W., Hansen, A. L., Yuan, C., Bruschweiler-Li, L. & Bruschweiler, R. 2D NMR HSQC Spectra of Proteins and Mouse Urine with Peaks Picked by DEEP Picker https://doi.org/10.5281/zenodo.5155575 (2021).

[51] Li, D.-W., Hansen, A. L., Yuan, C., Bruschweiler-Li, L. & Bruschweiler, R. DEEP Picker is a Deep Neural Network for Accurate Deconvolution of Complex Two-dimensional NMR Spectra https://doi.org/10.5281/zenodo.5142740 (2021).

Reference List #2

[1’] Abadi, M. et al. (2016) TensorFlow: A system for large-scale machine learning. In: Proceedings of OSDI'16: 12th USENIX symposium on operating systems design and implementation. 265-283.

[2’] Alipanahi B, Gao X, Karakoc E, Donaldson L, Li M (2009) PICKY: a novel SVD-based NMR spectra peak picking method. Bioinformatics 25:i268-275. https://doi.org/10.1093/bioinformatics/btp225

[3’] Alpaydin E (2020) Introduction to machine learning, Fourth edn. The MIT Press, Cambridge

[4’] Antz C, Neidig KP, Kalbitzer HR (1995) A general Bayesian method for an automated signal class recognition in 2D NMR spectra combined with a multivariate discriminant analysis. J Biomol NMR 5:287-296. https://doi.org/10.1007/BF00211755

[5’] Bartels C, Xia TH, Billeter M, Guntert P, Wuthrich K (1995) The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J Biomol NMR 6:1-10. https://doi.org/10.1007/BF00417486

[6’] Carrara EA, Pagliari F, Nicolini C (1993) Neural networks for nuclear magnetic resonance spectroscopy. In: Proceedings of 1993 international conference on neural networks (IJCNN-93-Nagoya, Japan). 983-986 vol.981.

[7’] Cheng Y, Gao X, Liang F (2014) Bayesian peak picking for NMR spectra. Genom Proteomics Bioinform 12:39-47. https://doi.org/10.1016/j.gpb.2013.07.003

[8’] Garrett DS, Powers R, Gronenborn AM, Clore GM (2011) A common sense approach to peak picking in two-, three-, and four-dimensional spectra using automatic computer analysis of contour diagrams. 1991. J Magn Reson 213:357-363. https://doi.org/10.1016/j.jmr.2011.09.007

[9’] Hansen AL, Bruschweiler R (2016) Absolute minimal sampling in high-dimensional NMR spectroscopy. Angew Chem Int Ed Engl 55:14169-14172. https://doi.org/10.1002/anie.201608048

[10’] Hansen AL, Li D, Wang C, Bruschweiler R (2017) Absolute minimal sampling of homonuclear 2D NMR TOCSY spectra for high-throughput applications of complex mixtures. Angew Chem Int Ed Engl 56:8149-8152. https://doi.org/10.1002/anie.201703587

[11’] Johnson BA (2004) Using NMRView to visualize and analyze the NMR spectra of macromolecules. Methods Mol Biol 278:313-352. https://doi.org/10.1385/1-59259-809-9:313

[12’] Kazimierczuk K, Orekhov V (2015) Non-uniform sampling: post-Fourier era of NMR data collection and processing. Magn Reson Chem 53:921-926. https://doi.org/10.1002/mrc.4284

[13’] Klukowski P et al (2018) NMRNet: a deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 34:2590-2597. https://doi.org/10.1093/bioinformatics/bty134

[14’] Kobayashi N et al (2018) Noise peak filtering in multi-dimensional NMR spectra using convolutional neural networks. Bioinformatics 34:4300-4301. https://doi.org/10.1093/bioinformatics/bty581

[15’] Koradi R, Billeter M, Engeli M, Guntert P, Wuthrich K (1998) Automated peak picking and peak integration in macromolecular NMR spectra using AUTOPSY. J Magn Reson 135:288-297. https://doi.org/10.1006/jmre.1998.1570

[16’] Korzhneva DM, Ibraghimov IV, Billeter M, Orekhov VY (2001) MUNIN: application of three-way decomposition to the analysis of heteronuclear NMR relaxation data. J Biomol NMR 21:263-268. https://doi.org/10.1023/a:1012982830367

[17’] Krishnamurthy K (2013) CRAFT (complete reduction to amplitude frequency table)-robust and time-efficient Bayesian approach for quantitative mixture analysis by NMR. Magn Reson Chem 51:821-829. https://doi.org/10.1002/mrc.4022

[18’] Krishnamurthy K, Sefler AM, Russell DJ (2017) Application of CRAFT in two-dimensional NMR data processing. Magn Reson Chem 55:224-232. https://doi.org/10.1002/mrc.4449

[19’] Li D, Hansen AL, Bruschweiler-Li L, Bruschweiler R (2018) Nonuniform and absolute minimal sampling for high-throughput multidimensional NMR applications. Chemistry 24:11535-11544. https://doi.org/10.1002/chem.201800954

[20’] Li D, Hansen AL, Yuan C, Bruschweiler-Li L, Bruschweiler R (2021) DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat Commun 12:5229. https://doi.org/10.1038/s41467-021-25496-5

[21’] Liu Z, Abbas A, Jing BY, Gao X (2012) WaVPeak: picking NMR peaks through wavelet-based smoothing and volume-based filtering. Bioinformatics 28:914-920. https://doi.org/10.1093/bioinformatics/bts078

[22’] Meier BU, Madi ZL, Ernst RR (1987) Computer-analysis of nuclear spin systems based on local symmetry in 2D spectra. J Magn Reson 74:565-573. https://doi.org/10.1016/0022-2364(87)90278-2

[23’] Neidig KP, Bodenmueller H, Kalbitzer HR (1984) Computer aided evaluation of two-dimensional NMR spectra of proteins. Biochem Biophys Res Commun 125:1143-1150. https://doi.org/10.1016/0006-291x(84)91403-7

[24’] Orekhov VY, Ibraghimov IV, Billeter M (2001) MUNIN: a new approach to multidimensional NMR spectra interpretation. J Biomol NMR 20:49-60. https://doi.org/10.1023/a:1011234126930

[25’] Paszke, A. et al. (2017) Automatic differentiation in PyTorch. In: 31st Conference on neural information processing systems.

[26’] Pfandler P, Bodenhausen G, Meier BU, Ernst RR (1985) Toward automated assignment of nuclear magnetic-resonance spectra: pattern-recognition in two-dimensional correlation spectra. Anal Chem 57:2510-2516. https://doi.org/10.1021/ac00290a018

[27’] Rahimi M, Lee Y, Markley JL, Lee W (2021) iPick: multiprocessing software for integrated NMR signal detection and validation. J Magn Reson 328:106995. https://doi.org/10.1016/j.jmr.2021.106995

[28’] Rouh A, Louis-Joseph A, Lallemand JY (1994) Bayesian signal extraction from noisy FT NMR spectra. J Biomol NMR 4:505-518. https://doi.org/10.1007/BF00156617

[29’] Skinner SP et al (2016) CcpNmr AnalysisAssign: a flexible platform for integrated NMR analysis. J Biomol NMR 66:111-124. https://doi.org/10.1007/s10858-016-0060-y

[30’] Thomsen JU, Meyer B (1989) Pattern recognition of the 1H NMR spectra of sugar alditols using a neural network. J Magn Reson 84:212-217. https://doi.org/10.1016/0022-2364(89)90021-8

[31’] Tikole S, Jaravine V, Rogov V, Dotsch V, Guntert P (2014) Peak picking NMR spectral data using non-negative matrix factorization. BMC Bioinformatics 15:46. https://doi.org/10.1186/1471-2105-15-46

[32’] Ting KM (2011) Encyclopedia of machine learning. In: Sammut C, Webb GI (eds), Springer, Boston, MA, pp 781. https://doi.org/10.1007/978-0-387-30164-8_752

[33’] Wurz JM, Guntert P (2017) Peak picking multidimensional NMR spectra with the contour geometry based algorithm CYPICK. J Biomol NMR 67:63-76. https://doi.org/10.1007/s10858-016-0084-3

[34’] Ying J, Delaglio F, Torchia DA, Bax A (2017) Sparse multidimensional iterative lineshape-enhanced (SMILE) reconstruction of both nonuniformly sampled and conventional NMR data. J Biomol NMR 68:101-118. https://doi.org/10.1007/s10858-016-0072-7

[35’] Zaghloul MR, Ali AN (2011) Algorithm 916: computing the Faddeyeva and Voigt functions. ACM Trans Math Softw. https://doi.org/10.1145/2049673.2049679

[36’] Zambrello MA, Maciejewski MW, Schuyler AD, Weatherby G, Hoch JC (2017) Robust and transferable quantification of NMR spectral quality using IROC analysis. J Magn Reson 285:37-46. https://doi.org/10.1016/j.jmr.2017.10.005

Reference List #3

[1”] Lindon, J. C.; Holmes, E.; Bollard, M. E.; Stanley, E. G.; Nicholson, J. K. Biomarkers 2004, 9, 1-31.

[2”] Klassen, A.; Faccio, A. T.; Canuto, G. A. B.; da Cruz, P. L. R.; Ribeiro, H. C.; Tavares, M. F. M.; Sussulini, A., Metabolomics: Definitions and Significance in Systems Biology. In Metabolomics: From Fundamentals to Clinical Applications, Sussulini, A., Ed. Springer International Publishing: Cham, 2017; pp. 3-17, DOI: 10.1007/978-3-319-47656-8_1.

[3”] Fiehn, O. Plant Mol. Biol. 2002, 48, 155-171.

[4”] Holmes, E.; Wilson, I. D.; Nicholson, J. K. Cell 2008, 134, 714-717.

[5”] Wishart, D. S. Nat. Rev. Drug Discovery 2016, 15, 473-484.

[6”] LeVatte, M.; Keshteli, A. H.; Zarei, P.; Wishart, D. S. Lifestyle Genomics 2022, 15, 1-9.

[7”] Nagana Gowda, G. A.; Raftery, D. J. Magn. Reson. 2015, 260, 144-160.

[8”] Markley, J. L.; Brüschweiler, R.; Edison, A. S.; Eghbalnia, H. R.; Powers, R.; Raftery, D.; Wishart, D. S. Curr. Opin. Biotechnol. 2017, 43, 34-40.

[9”] Bingol, K.; Bruschweiler-Li, L.; Li, D.; Zhang, B.; Xie, M.; Brüschweiler, R. Bioanalysis 2016, 8, 557-573.

[10”] Crook, A. A.; Powers, R. Molecules 2020, 25, 5128.

[11”] Edison, A. S.; Colonna, M.; Gouveia, G. J.; Holderman, N. R.; Judge, M. T.; Shen, X.; Zhang, S. Anal. Chem. 2021, 93, 478-499.

[12”] Bingol, K.; Brüschweiler, R. Anal. Chem. 2014, 86, 47-57.

[13”] Emwas, A. H.; Roy, R.; McKay, R. T.; Tenori, L.; Saccenti, E.; Gowda, G. A. N.; Raftery, D.; Alahmari, F.; Jaremko, L.; Jaremko, M.; Wishart, D. S. Metabolites 2019, 9, 123.

[14”] Ludwig, C.; Gunther, U. L. BMC Bioinf. 2011, 12, 366.

[15”] Hao, J.; Liebeke, M.; Astle, W.; De Iorio, M.; Bundy, J. G.; Ebbels, T. M. Nat. Protoc. 2014, 9, 1416-1427.

[16”] Ravanbakhsh, S.; Liu, P.; Bjorndahl, T. C.; Mandal, R.; Grant, J. R.; Wilson, M.; Eisner, R.; Sinelnikov, I.; Hu, X.; Luchinat, C.; Greiner, R.; Wishart, D. S. PLoS One 2015, 10, No. e0124219.

[17”] Rohnisch, H. E.; Eriksson, J.; Mullner, E.; Agback, P.; Sandstrom, C.; Moazzami, A. A. Anal. Chem. 2018, 90, 2095-2102.

[18”] Lefort, G.; Liaubet, L.; Canlet, C.; Tardivel, P.; Pere, M. C.; Quesnel, H.; Paris, A.; Iannuccelli, N.; Vialaneix, N.; Servien, R. Bioinformatics 2019, 35, 4356-4363.

[19”] Canueto, D.; Gomez, J.; Salek, R. M.; Correig, X.; Canellas, N. Metabolomics 2018, 14, 24.

[20”] Hu, K.; Westler, W. M.; Markley, J. L. J. Am. Chem. Soc. 2011, 133, 1662-1665.

[21”] Bingol, K.; Li, D. W.; Zhang, B.; Brüschweiler, R. Anal. Chem. 2016, 88, 12411-12418.

Reference List #4

[1”']Abadi, M., Barham, P., Chen, J. M., Chen, Z. F., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., and Zheng, X. Q.: TensorFlow: A system for large-scale machine learning, Proceedings of OSDI'16: 12th USENIX Symposium on Operating Systems Design and Implementation, WOS:000569062400017, 2016.

[2”']Canueto, D., Gomez, J., Salek, R. M., Correig, X., and Canellas, N.: rDolphin: a GUI R package for proficient automatic profiling of 1D (1)H-NMR spectra of study datasets, Metabolomics, 14, 24, 10.1007/s11306-018-1319-y, 2018.

[3”']Cobas, C., Seoane, F., Vaz, E., Bernstein, M. A., Dominguez, S., Perez, M., and Sykora, S.: Automatic assignment of 1H-NMR spectra of small molecules, Magn Reson Chem, 51, 649-654, 10.1002/mrc.3995, 2013.

[4”']Dashti, H., Wedell, J. R., Westler, W. M., Tonelli, M., Aceti, D., Amarasinghe, G. K., Markley, J. L., and Eghbalnia, H. R.: Applications of Parametrized NMR Spin Systems of Small Molecules, Anal Chem, 90, 10646-10649, 10.1021/acs.analchem.8b02660, 2018.

[5”']Delaglio, F., Grzesiek, S., Vuister, G. W., Zhu, G., Pfeifer, J., and Bax, A.: NMRPipe: a multidimensional spectral processing system based on UNIX pipes, J. Biomol. NMR, 6, 277-293, 10.1007/BF00197809, 1995.

[6”']Dudley, J. A., Park, S., MacDonald, M. E., Fetene, E., and Smith, C. A.: Resolving overlapped signals with automated FitNMR analytical peak modeling, J Magn Reson, 318, 106773, 10.1016/j.jmr.2020.106773, 2020.

[7”']Giraudeau, P.: Challenges and perspectives in quantitative NMR, Magn Reson Chem, 55, 61-69, 10.1002/mrc.4475, 2017.

[8”']Hackl, M., Tauber, P., Schweda, F., Zacharias, H. U., Altenbuchinger, M., Oefner, P. J., and Gronwald, W.: An R-Package for the Deconvolution and Integration of 1D NMR Data: MetaboDecon1D, Metabolites, 11, 10.3390/metabo11070452, 2021.

[9”']Hao, J., Liebeke, M., Astle, W., De Iorio, M., Bundy, J. G., and Ebbels, T. M.: Bayesian deconvolution and quantification of metabolites in complex 1D NMR spectra using BATMAN, Nat Protoc, 9, 1416-1427, 10.1038/nprot.2014.090, 2014.

[10”']Higinbotham, J. and Marshall, I.: NMR lineshapes and lineshape fitting procedures, in: Annual Reports on NMR Spectroscopy, Vol 43, edited by: Webb, G. A., Annual Reports on NMR Spectroscopy, 59-120, 10.1016/s0066-4103(01)43009-2, 2001.

[11”']Johnson, C. S.: Diffusion ordered nuclear magnetic resonance spectroscopy: principles and applications, Progress in Nuclear Magnetic Resonance Spectroscopy, 34, 203-256, 10.1016/s0079-6565(99)00003-5, 1999.

[12”']Lefort, G., Liaubet, L., Canlet, C., Tardivel, P., Pere, M. C., Quesnel, H., Paris, A., Iannuccelli, N., Vialaneix, N., and Servien, R.: ASICS: an R package for a whole analysis workflow of 1D 1H NMR spectra, Bioinformatics, 35, 4356-4363, 10.1093/bioinformatics/btz248, 2019.

[13”']Li, D., Hansen, A. L., Yuan, C., Bruschweiler-Li, L., and Brüschweiler, R.: DEEP Picker is a Deep Neural Network for Accurate Deconvolution of Complex Two-Dimensional NMR Spectra, Nat. Commun., DOI: 10.5281/zenodo.5142740, 2021.

[14”']Li, D. W., Leggett, A., Bruschweiler-Li, L., and Bruschweiler, R.: COLMARq: A Web Server for 2D NMR Peak Picking and Quantitative Comparative Analysis of Cohorts of Metabolomics Samples, Anal Chem, 94, 8674-8682, 10.1021/acs.analchem.2c00891, 2022a.

[15”']Li, D. W., Hansen, A. L., Bruschweiler-Li, L., Yuan, C., and Bruschweiler, R.: Fundamental and practical aspects of machine learning for the peak picking of biomolecular NMR spectra, J Biomol NMR, 76, 49-57, 10.1007/s10858-022-00393-1, 2022b.

[16”']Ludwig, C. and Gunther, U. L.: MetaboLab - advanced NMR data processing and analysis for metabolomics, BMC Bioinformatics, 12, 366, 10.1186/1471-2105-12-366, 2011.

[17”']Markley, J. L., Bruschweiler, R., Edison, A. S., Eghbalnia, H. R., Powers, R., Raftery, D., and Wishart, D. S.: The future of NMR-based metabolomics, Curr. Opin. Biotechnol., 43, 34-40, 10.1016/j.copbio.2016.08.001, 2017.

[18”']Martin, Y. L.: A Global Approach to Accurate and Automatic Quantitative Analysis of NMR Spectra by Complex Least-Squares Curve Fitting, J Magn Reson Series A, 111, 1-10, 10.1006/jmra.1994.1218, 1994.

[19”']Nelson, S. J. and Brown, T. R.: The accuracy of quantification from 1D NMR spectra using the PIQABLE algorithm, J Magn Reson, 84, 95-109, 10.1016/0022-2364(89)90008-5, 1989.

[20”']Ravanbakhsh, S., Liu, P., Bjorndahl, T. C., Mandal, R., Grant, J. R., Wilson, M., Eisner, R., Sinelnikov, I., Hu, X., Luchinat, C., Greiner, R., and Wishart, D. S.: Accurate, fully-automated NMR spectral profiling for metabolomics, PLoS One, 10, e0124219, 10.1371/journal.pone.0124219, 2015.

[21”']Rohnisch, H. E., Eriksson, J., Mullner, E., Agback, P., Sandstrom, C., and Moazzami, A. A.: AQuA: An Automated Quantification Algorithm for High-Throughput NMR-Based Metabolomics and Its Application in Human Plasma, Anal Chem, 90, 2095-2102, 10.1021/acs.analchem.7b04324, 2018.

[22”']Smith, A. A.: INFOS: spectrum fitting software for NMR analysis, J Biomol NMR, 67, 77-94, 10.1007/s10858-016-0085-2, 2017.

[23”']Sokolenko, S., Jezequel, T., Hajjar, G., Farjon, J., Akoka, S., and Giraudeau, P.: Robust 1D NMR lineshape fitting using real and imaginary data in the frequency domain, J Magn Reson, 298, 91-100, 10.1016/j.jmr.2018.11.004, 2019.

[24”']Wang, R. C. C., Campbell, D. A., Green, J. R., and Cuperlovic-Culf, M.: Automatic 1D (1)H NMR Metabolite Quantification for Bioreactor Monitoring, Metabolites, 11, 10.3390/metabo11030157, 2021.