


Title:
MUSICAL SCORE PERFORMANCE ALIGNMENT FOR AUTOMATED PERFORMANCE EVALUATION
Document Type and Number:
WIPO Patent Application WO/2024/107949
Kind Code:
A1
Abstract:
Techniques are described for musical score performance alignment for automated performance evaluation. Embodiments receive performed note data defining pitches and onset times of performed notes as a user performs a musical score, and the musical score is computationally represented by score note data defining pitches and onset times of score notes. The performed notes are automatically aligned to respective score notes by computing a highest likelihood sequence of complex hand states for a sequence of time steps, each corresponding to the onset timing of a respective one of the score notes. Evaluation feedback, including qualitative evaluation feedback, is automatically generated by pattern-matching the note-wise alignment to evaluation models. Several embodiments provide additional features, such as performing automated alignment, evaluation, and feedback while the user is performing the musical score, in context of two-handed performances, etc.

Inventors:
GRACHTEN MAARTEN (US)
CHOI SUNNY SUNGEUN (US)
VAN LAARHOVEN FRITS (US)
Application Number:
PCT/US2023/080002
Publication Date:
May 23, 2024
Filing Date:
November 16, 2023
Assignee:
MUSIC APP INC (US)
International Classes:
G09B15/00; G10G1/00; G10H1/00
Foreign References:
US8629342B22014-01-14
US20220172640A12022-06-02
US9865241B22018-01-09
CN111723938A2020-09-29
US7705231B22010-04-27
Other References:
SHULEI JI ET AL: "A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 13 November 2020 (2020-11-13), XP081813231
Attorney, Agent or Firm:
SHERWINTER, Daniel, J. et al. (US)
Claims:
Attorney Docket No.: 108913-1412527 (000200PC)

WHAT IS CLAIMED IS:

1. A method for musical score performance alignment for automated performance evaluation comprising: receiving, by a processor-based evaluation engine, performed note data defining a sequence of performed notes representing a performance of a musical score by a user in accordance with a score time basis, each of the sequence of performed notes defined by at least a respective performed pitch and performed onset time; receiving, by the processor-based evaluation engine, score note data defining a sequence of score notes representing the musical score, each of the sequence of score notes defined by at least a respective score pitch and a score onset time referenced to the score time basis; computing, by the processor-based evaluation engine, a note-wise alignment between the performed note data and the score note data by computing a highest likelihood sequence of complex hand states for a sequence of time steps in the score time basis given the sequence of score notes and the sequence of performance notes, wherein each time step of the sequence of time steps is defined as corresponding to the score onset time of a respective one of the sequence of score notes; and automatically generating qualitative evaluation feedback by pattern-matching the note-wise alignment to a library of evaluation models.

2. The method of claim 1, wherein the computing the note-wise alignment and the automatically generating the qualitative evaluation feedback are at least partially performed for at least a portion of the sequence of time steps during a corresponding portion of the performance by the user, and further comprising displaying at least a portion of the qualitative evaluation feedback during the corresponding portion of the performance by the user.

3.
The method of claim 1, wherein the automatically generating the qualitative evaluation feedback is at least partially performed upon completion of the performance by the user, and further comprising displaying at least a portion of the qualitative evaluation feedback after completion of the performance by the user.

4. The method of claim 1, wherein each complex hand state of the sequence of complex hand states is a joint left-and-right-hand state computed as an element of a cartesian product of a computed left-hand state space for the time step and a computed right-hand state space for the time step.

5. The method of claim 1, wherein the computing the note-wise alignment comprises: applying time step discretization and windowing to the performed note data and the score note data to generate a sequence of regions, each associated with a respective time step of the sequence of time steps in the score time basis.

6. The method of claim 5, wherein the computing the note-wise alignment further comprises: computing region data for each region of the sequence of regions based on the computing the highest likelihood sequence of complex hand states, the respective region data for each region indicating either a not-playing observation or a playing observation, the playing observation further indicating associated pitch delta information and associated timing delta information for the region.

7.
The method of claim 6, further comprising, for each region, computing the associated pitch delta information and the associated timing delta information by: defining a positive observation for the region as either the not-playing observation or the playing observation; computing a kernel density estimation (KDE) of the positive observation, such that the KDE of the positive observation contains modes corresponding to alignments between the score note associated with the region and any performance notes in the region; defining a corresponding negative observation for the region; computing a KDE of the negative observation, such that the KDE of the negative observation contains modes corresponding to misalignments between the score note associated with the region and any performance notes in the region; and computing a non-normalized density for the region by subtracting the KDE of the negative observation from the KDE of the positive observation, such that the non-normalized density has modes corresponding to highest-likelihood candidates for the associated pitch delta information and the associated timing delta information for the region.

8. The method of claim 6, wherein, for each region, the computing the region data comprises: computing the associated pitch delta information, associated timing delta information, and associated weighted kernel density estimations for the region; discretizing the hand state space for the region based on the associated pitch delta information, the associated timing delta information, and the associated kernel density estimations; and computing a Viterbi path through the discretized hand state space.

9.
The method of claim 6, wherein the computing the note-wise alignment further comprises: generating a one-to-one mapping between the performance notes and the score notes by, for each region of the sequence of regions, querying the respective region data to obtain respective region feedback, the respective region feedback comprising either a single feedback indication of the not-playing observation, or one or more feedback indications representing the playing observation as one or more note-specific errors relative to one or more particular score notes.

10. The method of claim 1, wherein: each complex hand state of the highest likelihood sequence of complex hand states represents a hypothesis of whether the user is playing a predicted note at the time step and, if so, a hypothesis of a delta between the predicted note and the one of the sequence of score-referenced note events, wherein the delta comprises a pitch delta and a time delta, and at least one of the pitch delta or the time delta is represented by a continuous variable.

11. The method of claim 10, wherein the computing comprises, at each time step: applying a revised hidden Markov model (HMM) to generate a respective plurality of hypothesized complex hand states for the time step; and ranking the respective plurality of hypothesized complex hand states to determine a respective highest likelihood complex hand state for the time step, the highest likelihood sequence of complex hand states being the respective highest likelihood complex hand state for each of the sequence of time steps.

12. The method of claim 11, further comprising: applying a Viterbi algorithm to compute transition probabilities between the respective plurality of hypothesized complex states at each time step, wherein the ranking is determined by combining the observation likelihood of each hypothesis with the transition probabilities between consecutive hypotheses.

13.
The method of claim 1, wherein the receiving performed note data comprises: receiving a raw audio stream via an audio interface; and processing the raw audio stream by a transcription engine of a note processor to generate the performed note data, the note processor being coupled with the processor-based evaluation engine.

14. The method of claim 1, wherein the receiving performed note data comprises: receiving a musical instrument digital interface (MIDI) stream via a MIDI interface; and processing the MIDI stream by a note processor to generate the performed note data, the note processor being coupled with the processor-based evaluation engine.

15. The method of claim 1, wherein each of the sequence of performed notes is defined further by a respective performed offset time and/or performed duration.

16. The method of claim 1, wherein each of the sequence of score notes is defined further by a respective score offset time and/or score duration.

17. An automated musical performance evaluation system comprising: a musical score data store having, stored thereon, score note data defining a sequence of score notes representing a musical score, each of the sequence of score notes defined by at least a respective score pitch and a score onset time referenced to a score time basis; an audio interface to receive an audio stream during performance of the musical score by a user in accordance with the score time basis; a note processor coupled with the audio interface to generate, from the audio stream, performed note data defining a sequence of performed notes representing the performance of the musical score by the user, each of the sequence of performed notes defined by at least a respective performed pitch and performed onset time; and a processor-based evaluation engine coupled with the note processor and the musical score data store and configured to: receive the performed note data from the note processor; receive the
score note data from the musical score data store; compute a note-wise alignment between the performed note data and the score note data by computing a highest likelihood sequence of complex hand states for a sequence of time steps in the score time basis given the sequence of score notes and the sequence of performance notes, wherein each time step of the sequence of time steps is defined as corresponding to the score onset time of a respective one of the sequence of score notes; and automatically generate qualitative evaluation feedback by pattern-matching the note-wise alignment to a library of evaluation models.

18. The system of claim 17, further comprising: a display processor to: direct display of the musical score data on a display device as a visual representation of the musical score during the performance by the user; and direct display of the qualitative evaluation feedback on the display device.

19. The system of claim 18, wherein: the processor-based evaluation engine is configured to compute the note-wise alignment and automatically generate the qualitative evaluation feedback for at least a portion of the sequence of time steps during a corresponding portion of the performance by the user; and the display processor is configured to direct display of at least a portion of the qualitative evaluation feedback during the corresponding portion of the performance by the user.

20. The system of claim 17, further comprising: one or more processors coupled with the audio interface; and a non-transitory processor-readable memory having, stored thereon, instructions which, when executed, cause the one or more processors to implement the note processor and the processor-based evaluation engine.
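A minimal sketch of the kernel-density step recited in claim 7, assuming Gaussian kernels over a two-dimensional (timing delta, pitch delta) plane. The function names, bandwidth, and grid evaluation are illustrative assumptions, not part of the claims:

```python
import numpy as np

def gaussian_kde_grid(deltas, grid, bandwidth=0.5):
    """Evaluate a Gaussian KDE of 2-D (timing, pitch) delta observations
    at each point of a grid of candidate deltas."""
    density = np.zeros(len(grid))
    for d in deltas:
        diff = (grid - d) / bandwidth
        density += np.exp(-0.5 * np.sum(diff**2, axis=1))
    return density / max(len(deltas), 1)

def non_normalized_density(pos_deltas, neg_deltas, grid, bandwidth=0.5):
    """Subtract the negative-observation KDE from the positive-observation KDE,
    leaving modes at the highest-likelihood (timing, pitch) alignment candidates."""
    return (gaussian_kde_grid(pos_deltas, grid, bandwidth)
            - gaussian_kde_grid(neg_deltas, grid, bandwidth))
```

Deltas near well-aligned performance notes reinforce modes in the positive density, while the subtraction suppresses candidate deltas that correspond to misalignments.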
Description:
PATENT APPLICATION

MUSICAL SCORE PERFORMANCE ALIGNMENT FOR AUTOMATED PERFORMANCE EVALUATION

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of and priority to U.S. Provisional Application No. 63/425,715, filed on November 16, 2022, which is hereby incorporated by reference.

FIELD

[0002] Embodiments relate generally to musical performance feedback applications; and, more particularly, to musical score performance alignment for automated performance evaluation.

BACKGROUND

[0003] A primary value provided by any teacher of a skill, including a music teacher, is the ability to qualitatively evaluate a student’s performance and to clearly communicate useful feedback. Existing computer-based applications for musical training, such as for learning to play piano, are highly deficient in this regard.

SUMMARY

[0004] Embodiments of the present invention relate to musical score performance alignment for automated performance evaluation. For example, embodiments receive performed note data defining pitches and onset times of performed notes as a user performs a musical score, and the musical score is computationally represented by score note data defining pitches and onset times of score notes. The performed notes are automatically aligned to respective score notes by computing a highest likelihood sequence of complex hand states for a sequence of time steps, each corresponding to the onset timing of a respective one of the score notes. Evaluation feedback, including qualitative evaluation feedback, is automatically generated by pattern-matching the note-wise alignment to evaluation models. Several embodiments provide additional features, such as performing automated alignment, evaluation, and feedback while the user is performing the musical score, in context of two-handed performances, etc.
[0005] This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings, and each claim.

[0006] The foregoing, together with other features and embodiments, will become more apparent upon referring to the following specification, claims, and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] The present disclosure is described in conjunction with the appended figures:

[0008] In FIGS.1A and 1B, an illustrative system is shown to support various embodiments described herein.

[0009] FIG.2 shows an example of a set of score notes and performed notes in a “pianoroll” layout, along with a table of related time and pitch delta values.

[0010] FIG.3 shows a plot of representative pitch and time deltas for several performed notes plotted on a two-dimensional plane relative to the pitch and timing of a score note.

[0011] FIGS.4A and 4B show illustrative mode-selection/discretization for example data.

[0012] FIGS.5A and 5B show an illustrative process flow for aligning performance notes to score notes, according to embodiments described herein.

[0013] FIG.6 shows a pianoroll format representation of the given time window, including a sequence of score notes (dark thin horizontal lines) and performed notes (lighter thicker horizontal lines).

[0014] FIGS.7A and 7B show kernel density plots over a timing-pitch delta hypothesis space for left-hand and right-hand events, respectively, corresponding to a time step in the given time window of FIG.6.

[0015] FIG.8 provides a schematic illustration of one embodiment of a computer system that can implement various system components and/or perform various steps of methods provided by various embodiments.
[0016] FIGS.9A – 9C show several plots representing an illustrative performance that is largely correct, except for a silence between approximately 5 and 7 seconds.

[0017] FIGS.10A – 10C show several plots representing an illustrative performance that has several issues, including that the user played one octave too high from 27 seconds onward, and that the user played several wrong notes across the piece.

[0018] FIG.11 shows a flow diagram of an illustrative method for musical score performance alignment for automated performance evaluation, according to embodiments described herein.

[0019] In the appended figures, similar components and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label by a second label (e.g., a lower-case letter) that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.

DETAILED DESCRIPTION

[0020] Embodiments of the disclosed technology will become clearer when reviewed in connection with the description of the figures herein below. In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, one having ordinary skill in the art should recognize that the invention may be practiced without these specific details. In some instances, circuits, structures, and techniques have not been shown in detail to avoid obscuring the present invention.

[0021] When learning an instrument, such as piano, it is valuable to be able to receive feedback on performances. Several computer-based applications provide an environment in which users can perform a piece, and the application will provide feedback.
However, to date, these conventional applications have been deficient in the amount and type of feedback provided. For example, some conventional applications provide the user with a sequence of notes (score notes) to play. During and/or after the user’s performance of the score notes, the conventional application decomposes the performance into a sequence of performed notes, each having at least an associated time and pitch. The conventional application looks at each of the score notes to determine whether it was correctly played by determining whether there is a performed note with the same pitch and sufficiently close timing. The conventional application may then provide note-by-note feedback (e.g., by showing correctly played score notes in green and incorrectly played score notes in red) and/or a quantitative measure of correctness (e.g., a percentage of correctly played notes).

[0022] While such note-by-note or quantitative feedback can be helpful, it does not provide the user with any qualitative analysis. For example, the user may not be able to readily determine if she is having trouble with rhythm or with pitch, or whether there is some pattern to the user’s errors. As one example, a beginning user may play an entire passage perfectly, except that all notes were shifted an octave away from their correct positions. As another example, suppose a user played every note with the correct pitch, but the rhythm was consistently incorrect. In both cases, a conventional application may simplistically and unhelpfully mark all of the notes as incorrect, and the user would have no clear guidance for improvement.

[0023] Additional deficiencies arise in environments where an application uses microphone input to “listen” to the user’s performance.
Such applications conventionally perform signal processing on a received audio signal by looking for changes in primary frequency components and/or amplitude envelopes to decompose the signal into discrete note events with pitch and timing. This process can be prone to transcription error. For example, noise and/or other artifacts in the received audio signal can cause an application to recognize multiple performed notes as a single note event, to recognize one performed note as several note events, to fail to recognize note events, to recognize phantom note events (e.g., by misinterpreting a background noise as a note), etc.

[0024] Techniques described herein use predictive techniques to infer a highest likelihood sequence of complex states given a sequence of scored note events (in a presented musical score) and a sequence of performed note events (received as an audio signal). Each complex state in the highest likelihood sequence represents an inferred determination, at a corresponding time step (e.g., in a score-referenced time basis), of whether or not the user is playing at that time step, and, if the user is playing, a delta between what the user is playing at that time step and what is represented in the musical score at that time step. The predictive techniques can include applying a revised hidden Markov model (HMM) to generate and rank complex state hypotheses at each time step. Each complex state hypothesis can include a complex set of variables, including at least a playing/not-playing variable and one or more continuous variables to represent pitch, timing, and/or other logical note characteristics (e.g., dynamics). In some embodiments, a Viterbi algorithm (or an alternative method, like particle filtering) is applied to compute a most likely sequence of complex states given a score and a sequence of performed notes.
In some such embodiments, the HMM results in a very large number of complex state hypotheses at each time step (e.g., theoretically infinite because of the continuous variables), and one or more clustering/mode-finding algorithms (e.g., mean-shift, gradient-based hill climbing, or the like) is applied to reduce the number to a discrete set of N complex state hypotheses. In some embodiments, N is a predetermined number (e.g., ten); in other embodiments, N varies per time step, but is limited by heuristics (e.g., by taking the 25th percentile of largest modes of the emission probability distribution at each time step). In such embodiments, the Viterbi algorithm is applied to the discrete and finite set of complex state hypotheses at each time step. In some embodiments, the inferred highest likelihood sequence of complex states is used to generate qualitative feedback for the user’s performance. For example, the information contained in the inferred highest likelihood sequence of complex states directly conveys regions where the user did not play, where the user played consistently early or consistently late, or where they played with a systematic pitch offset (e.g., when they play an entire sequence of notes in the wrong octave). The inferred highest likelihood sequence of complex states also serves as the basis for the further inference of more localized performance errors, such as individual wrong notes, early or late note onsets, etc. In addition to highlighting errors, the inferred highest likelihood sequence of complex states can also be used to formulate positive feedback (e.g., highlighting regions that the user played flawlessly).
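As a rough illustration of the Viterbi step over a per-time-step discrete hypothesis set, the following sketch assumes generic emission and transition log-probability functions supplied by the caller; the state representation and the scoring functions are hypothetical placeholders, not the embodiment’s actual model:

```python
def viterbi(hypotheses, emission_logp, transition_logp):
    """Highest-likelihood path through per-time-step sets of state hypotheses.

    hypotheses: list (one entry per time step) of lists of candidate states.
    emission_logp(state): log-likelihood of the observation given the state.
    transition_logp(prev, cur): log-probability of moving between two states.
    """
    # Initialize scores with the emission likelihoods of the first step.
    scores = [emission_logp(h) for h in hypotheses[0]]
    backptrs = []
    for t in range(1, len(hypotheses)):
        step_scores, step_ptrs = [], []
        for cur in hypotheses[t]:
            # Pick the best-scoring predecessor for this candidate state.
            candidates = [scores[i] + transition_logp(prev, cur)
                          for i, prev in enumerate(hypotheses[t - 1])]
            best = max(range(len(candidates)), key=lambda i: candidates[i])
            step_scores.append(candidates[best] + emission_logp(cur))
            step_ptrs.append(best)
        scores, backptrs = step_scores, backptrs + [step_ptrs]
    # Backtrack from the highest-scoring final state.
    idx = max(range(len(scores)), key=lambda i: scores[i])
    path = [idx]
    for ptrs in reversed(backptrs):
        idx = ptrs[idx]
        path.append(idx)
    path.reverse()
    return [hypotheses[t][i] for t, i in enumerate(path)]
```

Because the mode-finding step bounds the number of hypotheses per time step, this dynamic program stays tractable even though the underlying delta variables are continuous.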
[0025] As known to anyone of skill in musical performance, a “musical score” uses a form of musical notation to represent a musical composition (i.e., a piece of music) as a sequence of score-notated events. It is assumed herein that the musical score provides a user with a score-referenced time basis. For example, the musical score may include a “4/4” time signature, indicating that each measure has four beats, and each beat is referenced to a quarter note. Thus, performing the piece at a metronome setting of 60 beats per minute involves performing the piece at a rate of 60 quarter notes per minute, or 15 measures per minute. Sounding events are notated in a musical score in the form of notes: These events are typically pitched, and are temporally delimited by an onset and offset time (the position in time at which the pitched sound begins and ends, respectively). Each score note can include an associated onset time and an offset time. As used herein, “onset time” is the time at which a score note begins, and “offset time” is the time at which a score note ends. Relative horizontal placement of a score note on a staff can indicate the onset time, and standard notation symbology designates the duration of the score note, thereby indicating the offset time. Standard musical notation includes a large symbology for designating score notes and their durations. For example, various symbols are used to indicate that a note or rest consumes the duration of a whole note, half note, quarter note, eighth note, dotted quarter note, double-dotted quarter note, triplet eighth note, a quarter note tied to a sixteenth note, etc. Vertical placement of a score note on a staff indicates its pitch. For example, a score note placed across the highest line of a staff marked with a treble clef sign has the pitch ‘F’ in the fifth octave (F5).
Note events as notated in a musical score are referred to herein as “score notes.”

[0026] Proper performance of score notes involves applying the notated rhythmic timing to the score-referenced time basis. For example, with a “4/4” time signature and a metronome setting of 60 beats per minute, precise rhythmic performance of the example eighth note or rest above begins (has a note onset at) seven seconds after the start of the piece and continues for a half-second (i.e., has a note offset at 7.5 seconds). In some cases, the played duration is determined by additional and/or alternative factors, such as musical genre, time signature, and additional score markings (e.g., indicating to play a particular passage more quickly, to gradually slow down over a passage, etc.). By recognizing and understanding all of the musical notation, a musician can know when to play each score note and for how long to hold each score note, thereby translating the notated music into played music.

[0027] The ability to translate score notes into performed notes, particularly in real time, can involve extensive training and practice. Even seasoned musicians can struggle to play complex and/or unfamiliar sequences of score notes with consistently accurate timing and pitch. Some conventional approaches exist for automatically evaluating a user’s performance, but those approaches tend to be limited in several ways. One limitation is that such conventional performance evaluation systems tend to focus either on an overall qualification score (good, bad), or to provide note-by-note correct/incorrect feedback indications. Another limitation is that such conventional systems typically assume that the performance is a roughly accurate rendering of the score (e.g., without entire passages being transposed or skipped). As such, these conventional systems tend not to be able to provide useful qualitative feedback to the performer.
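The tempo arithmetic in the preceding paragraphs can be sketched as a simple fixed-tempo conversion; zero-based beat numbering is an assumption made here for illustration:

```python
def beats_to_seconds(beats, bpm):
    """Convert a score-referenced position in beats to real time in seconds."""
    return beats * 60.0 / bpm

# At 60 bpm in 4/4 time, a beat lasts one second and a measure four seconds,
# i.e., 15 measures per minute. An eighth note (half a beat) whose onset falls
# on beat 7 starts seven seconds in and ends half a second later:
onset_s = beats_to_seconds(7, 60)          # 7.0
offset_s = beats_to_seconds(7 + 0.5, 60)   # 7.5
```

The same conversion, run in reverse, maps performed onset times back onto the score-referenced time basis for alignment.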
[0028] Embodiments herein provide novel techniques for using predictive inferences of complex state sequences to automatically generate an accurate transcription of a user’s performance of a sequence of score notes and/or to automatically generate a qualitative evaluation of the user’s performance. For example, embodiments herein assume that, at any given time during a user’s performance, the state of the performance can be characterized by either: complete silence (e.g., a lack of performance notes in both hands), partial silence in one or both hands, full or partial pitch-transposition in one or both hands, or full or partial time-shifting in one or both hands (within a relatively small window, due to the in-time nature of the performance). Detection of these states can be used to provide qualitative feedback about where the user should focus their attention and/or practice time to improve their playing. This can include marking regions where the user did not play, where pitch transpositions occurred, where temporal (rhythmic) inaccuracies occurred, where there were individual missed or wrong notes, etc.

[0029] Turning first to FIGS.1A and 1B, an illustrative system 100 is shown to support various embodiments described herein. Embodiments of the system 100 use predictive techniques to infer a highest likelihood complex state sequence from a user’s performance (e.g., by playing a piano or other instrument) of a sequence of score-notated events as notated on a musical score. Some embodiments of the system 100 are tailored to generate ex post feedback by waiting until after a user’s performance has completed before processing an entire sequence of performance notes and deriving qualitative feedback from the sequence. Other embodiments of the system 100 are tailored to generate dynamic feedback by processing and evaluating performance notes substantially in real time in order to provide qualitative feedback concurrent with the user’s performance.
The system 100 can be referred to as an automated performance evaluation system, and such a system is understood also to perform automated musical score performance alignment and other features described herein.

[0030] The version of the system 100a shown in FIG.1A performs processing of the audio stream of the user’s performance within the device I/O subsystem 105, and sends performance notes to the evaluator subsystem 150 for evaluation. The version of the system 100b shown in FIG.1B performs processing of the audio stream of the user’s performance within the evaluator subsystem 150. Similar components of the two versions of the system 100 are labeled with similar reference labels and operate in the same manner, except as otherwise noted. As such, features of the two versions of the environment can be combined in any suitable manner. An alternative embodiment can include both versions of the environment, which can be selected automatically and/or by the user, as appropriate. For example, such a system can provide certain types of qualitative feedback while the user is performing (e.g., to help the user make dynamic corrections), and the system can provide additional and/or different types of qualitative feedback after the performance is complete. In some embodiments, system 100a implements the note processor 135 for ex post feedback to generate a sequence of performance notes representing the user’s entire performance prior to sending the sequence of performance notes to the evaluation engine 155. In some embodiments, system 100b implements the note processor 135 for dynamic feedback to dynamically process the user’s performance and to send one or more performance notes at a time to the evaluation engine 155 to support dynamic presentation of qualitative feedback.

[0031] The system 100 includes a device input/output (I/O) subsystem 105, an evaluator subsystem 150, and a musical score (MS) data store 140.
Embodiments of the device I/O subsystem 105 include a display interface 120 with a display processor 125 and an audio interface 130 with a note processor 135. Embodiments of the system 100 are implemented on a computational device. In some embodiments, the computational device is a portable electronic device, such as a laptop computer, a tablet computer, a smartphone, etc. In other embodiments, the computational device is an appliance, such as a desktop computer. In other embodiments, the computational device is integrated with a musical instrument, such as in an electric piano.

[0032] Details of the musical score, including the sequence of score notes, are stored by the MS data store 140. As illustrated, the MS data store 140 stores at least MS visual data 145 and MS logical data 143. In some embodiments, the MS visual data 145 includes all information used by the display processor 125 to generate graphical score representations, including graphical representations of score notes, for visual output by the display interface 120. For example, the MS visual data 145 includes a stored graphical representation of the score, including notes and rests, and including any additional graphical score elements (e.g., staff lines, measure lines, clefs, accidentals, time signatures, key signatures, dynamic markings, tempo markings, expressive markings, titles, lyrics, etc.). In other embodiments, the MS visual data 145 includes only a portion of the information used by the display processor 125 to generate visual output of the musical score by the display interface 120. For example, the MS visual data 145 includes score elements other than notes and rests, and the display processor 125 generates visual representations of the notes and rests based on MS logical data 143 for the score notes.
[0033] The MS logical data 143 logically defines at least a sequence of score notes, indicating at least an associated pitch, onset time, and offset time (and/or duration, note type, etc.) for each score note. In some implementations, the MS logical data 143 includes other information relating to a particular score note, such as whether the score note is part of a triplet or other grouping, whether the score note is associated with a particular hand or finger, etc. In some embodiments, the MS logical data 143 is score-referenced, such that onset time and/or offset time (and/or duration) is defined in relation to notation on the underlying musical score. The relationship to the score can be based on the type of note or rest (e.g., “quarter note”), a consumed fraction of a measure or measures (e.g., half of a measure), a number of metronome beats (e.g., three beats), etc. In an example score note, an onset time is indicated as occurring at the second beat of the fourth measure, and the offset time is indicated as occurring at the fourth beat of the fourth measure. Additional MS logical data 143 can be used to convert the offset time to a duration, if needed. For example, MS logical data 143 indicates that the musical score is in “4/4” time, from which it can be derived that the example score note is a half note. Additionally or alternatively, the offset time can be indicated in the score-referenced MS logical data 143 as a duration. Referring to the same example score note, the MS logical data 143 can indicate that the score note has a duration of two beats, or that the score note is a half note. In such cases, the score note’s offset timing can be derived, if desired, by adding the explicitly indicated duration to the onset time. In each case, a tempo can be used to convert the score-referenced MS logical data 143 into real-time-referenced timing information.
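The score-referenced-to-real-time conversion described above can be sketched as follows. The helper names are hypothetical (they are not part of the specification); the arithmetic is just beats multiplied by seconds per beat:

```python
def beats_to_seconds(beats, tempo_bpm):
    """Convert a score-referenced duration in beats to real time in seconds."""
    return beats * 60.0 / tempo_bpm

def score_note_to_real_time(onset_beats, duration_beats, tempo_bpm):
    """Return (onset_s, offset_s) in seconds for a score note defined in beats."""
    onset_s = beats_to_seconds(onset_beats, tempo_bpm)
    offset_s = onset_s + beats_to_seconds(duration_beats, tempo_bpm)
    return onset_s, offset_s
```

At a 120 beats-per-minute tempo, the example half note (two beats) consumes exactly one second of real time, and changing the tempo argument rescales every note's real-time duration automatically.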
For example, additional MS logical data 143 indicates a default tempo of 120 beats per minute, such that a score-referenced half note would consume approximately one second of real time at the default tempo. Score-referenced definition of MS logical data 143 provides various features. For example, the real-time durations of all score notes can automatically be adjusted along with changes in tempo.

[0034] Alternatively, MS logical data 143 can be real-time-referenced. In one implementation, each score note is defined as beginning (i.e., onset time) some number of milliseconds from the beginning of the composition and as ending (i.e., offset time) some number of milliseconds from the beginning of the composition. In another implementation, each score note is defined as beginning (i.e., onset time) some number of milliseconds from the beginning of the composition and as ending (i.e., offset time) some number of milliseconds after the onset time. Regardless of whether the MS logical data 143 is score-referenced or real-time-referenced, offset times can either be defined as an associated ending time for a score note or as a duration following an associated onset time for the score note.

[0035] In some embodiments, the display processor 125 generates graphical representations of musical scores and score notes based only on MS visual data 145, without any processing of MS logical data 143. In other embodiments, the display processor 125 partially generates graphical representations of score notes based on MS logical data 143 (e.g., such as determining horizontal placement of notes and rests corresponding to the score notes based on onset times, determining note or rest types based on offset times, etc.), while remaining graphical representations of musical score and/or score note elements are generated from MS visual data 145.
Some embodiments of the display processor 125 include a notation rules engine (not shown) that can derive additional information from the MS visual data 145 and/or from the MS logical data 143 that can be used to generate graphical representations of musical score and/or score note elements. For example, the notation rules engine may determine whether a note has a flag, whether notes are tied together across a measure boundary, whether a note stem is pointing up or down, whether an accidental is shown, etc.

[0036] Embodiments of the display interface 120 and the display processor 125 are implemented in any suitable manner that supports visual portions of the audiovisual output features described herein. The terms “visual” and “graphical” are used interchangeably herein. The display interface 120 can be implemented by any suitable display components, such as by a tablet computer display, a smartphone display, a computer monitor, etc. The display processor 125 can be implemented as any processor, portion of a processor, or group of processors capable of generating graphical outputs via the display interface 120, as described herein. For example, the display processor 125 can be implemented by a central processing unit (CPU), a graphics processing unit (GPU), etc. As described above, the display processor 125 can use the MS visual data 145 (e.g., and the MS logical data 143) to generate a graphical representation of a musical score, including graphical representations of the sequence of score notes to be performed by a user.

[0037] Embodiments of the display processor 125 provide additional features. Some such features involve generating display outputs to graphically represent real-time progress through the musical score, such as by graphically indicating a current playback position. Other such features involve generating display outputs to graphically represent evaluation feedback.
These and other features of the display interface 120 and the display processor 125 are described further below.

[0038] Embodiments of the audio interface 130 and the note processor 135 are implemented in any suitable manner that supports aural portions of the audiovisual output features described herein. The terms “aural” and “audio” are used interchangeably herein. The audio interface 130 can be implemented by any suitable audio components, such as by one or more audio transducers, speakers, headphones, etc. The note processor 135 can be implemented as any processor, portion of a processor, or group of processors capable of generating audio outputs via the audio interface 130, as described herein. In some embodiments, the audio representations of the score notes are dynamically generated from the MS logical data 143. In some such embodiments, audio representations of the sequence of score notes indicate only the onset times, such as by buzzing, vibrating, or clicking at each onset time. In other such embodiments, audio representations further indicate offset times for the score notes (e.g., the audio interface 130 buzzes or vibrates for the duration between the onset and the offset of each score note). In other such embodiments, where the MS logical data 143 includes additional audio information, the audio representations of the score notes indicate the further audio information, such as pitch, dynamics, etc. In other embodiments, the audio representations of the score notes are generated from recorded audio. For example, the sequence of score notes is played back by playing back a digital audio file from the MS data store 140 that stores a recording of the musical composition.

[0039] Embodiments of the note processor 135 provide additional features. Some such features involve generating additional audio during a user performance, such as outputting a metronome and/or a backing track to indicate tempo.
When the user performs the sequence of score notes in tempo, the score notes and the user’s performed notes can easily be registered to a common time basis (e.g., a score-referenced time basis). Other such features involve generating audio outputs to play back portions of a user’s performance, such as for comparison with playback of an ideal version of the composition, a recorded version of the composition, a past performance by the user, etc.

[0040] As a user plays through a composition (a sequence of score notes represented on a musical score), embodiments receive audio of the user’s performance via the audio interface 130 and process the audio by the note processor 135. For example, a microphone is used to convert the user’s performance into an audio signal. Additionally or alternatively, the user can perform on a musical instrument configured to generate output in musical instrument digital interface (MIDI) format, and the device I/O subsystem 105 can include a MIDI input for receiving the performed notes as digital event data. In such cases, the audio interface 130 can include a MIDI interface, and the note processor 135 can include MIDI processing features. In the case of MIDI, or the like, the digital data itself may indicate the sequence of musical events (e.g., note events) performed by the user. In the absence of MIDI, the raw audio information is processed to extract the sequence of musical events corresponding to the user’s performance. This extraction may happen either in the note processor 135 of the device I/O subsystem 105 or in the evaluation engine 155, as part of the evaluator subsystem 150. As such, the information sent from the note processor 135 to the evaluation engine 155 is a representation of the user’s performance, either in the form of raw/processed audio or in the form of a sequence of discrete musical events.
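As a rough sketch of the MIDI-style event processing described above, note-on and note-off events can be paired into discrete performed notes. The simple event-tuple format here is an illustrative assumption, not the actual interface of the MIDI processor 137:

```python
def midi_events_to_notes(events):
    """Pair note-on/note-off events into performed notes.

    `events` is a time-ordered list of (time_s, kind, pitch) tuples, where
    kind is "on" or "off". Returns a list of (pitch, onset_s, offset_s)
    tuples, one per completed note.
    """
    open_notes = {}  # pitch -> onset time of the currently sounding note
    notes = []
    for time_s, kind, pitch in events:
        if kind == "on":
            open_notes[pitch] = time_s
        elif kind == "off" and pitch in open_notes:
            notes.append((pitch, open_notes.pop(pitch), time_s))
    return notes
```

A raw-audio path would produce the same (pitch, onset, offset) tuples, only via a transcription model instead of event pairing.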
[0041] As described herein, embodiments seek to find a most likely set of performed notes based on the known sequence of score notes from a musical score and an audio signal representation of a user’s performance. The sequence of score notes is received from the MS data store 140, and the sequence of performed notes is received from the device I/O subsystem 105 (e.g., from the note processor 135). A score-referenced time basis is used to define a series of time steps. For example, the user plays along with a metronome or rhythmic backing track, which establishes a shared time basis between the sequence of score notes and the user’s performance.

[0042] Embodiments can generate the sequence of performed notes in one or more ways, depending on the type of input received via the audio interface 130. The performed notes are received as a real-time audio stream, which can be a stream of raw analog audio information (e.g., received via a microphone or audio cable), a stream of raw digital audio information (e.g., an analog stream converted to a digital data stream), or a MIDI data stream received via a MIDI interface. References herein to “MIDI,” “MIDI stream,” “MIDI interface,” or the like can be generally extended to include any type of special audio encoding format that encodes at least the pitch, onset time, and offset time or duration for each note event. In comparison, references to a “stream of raw analog audio information” or a “stream of raw digital audio information” generally include any stream of analog or digital information that represents a user’s audio performance but does not encode individual note events.

[0043] In the case of a MIDI stream, a MIDI processor 137 of the note processor 135 takes the MIDI stream as input and generates an output representing a stream of performed notes (each with at least a corresponding pitch, onset time, and offset time or duration).
In the case of a raw analog or digital audio stream, the received stream can be segmented by the note processor 135 for real-time processing. For example, the note processor 135 can split the received audio stream into overlapping windows of fixed duration (e.g., 0.3 seconds), and each overlapping window can be fed into a transcription engine 139. For each overlapping window, the output of the transcription engine 139 represents a set of notes that started and a set of notes that ended during that window, if any. The note processor 135 can assemble the sets of notes from the transcription engine 139 into the stream of performed notes (each with at least a corresponding pitch, onset time, and offset time or duration). Thus, regardless of the form of the received real-time audio stream, components of the note processor 135 can generate an output representing a stream of performed notes.

[0044] In ex post feedback embodiments, performance evaluation occurs after the performance is complete. In such cases, embodiments of the note processor 135 can assemble the streams of performed notes into a sequence of performed notes representing the entire performance. The sequence of performed notes can then be evaluated in segments and/or in its entirety by the evaluation engine 155.

[0045] In dynamic feedback embodiments, performance evaluation is dynamic, based on real-time processing and evaluation of the performed notes. As used herein, “real-time” and “dynamic” are intended to mean concurrent with the user’s performance, as opposed to after completion of the user’s performance. In some implementations, the note processor 135 generates performance notes and sends them to the evaluation engine 155 substantially continuously. For example, as each set of one or a small number of performance notes is generated, the set is sent to the evaluation engine 155.
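The fixed-duration overlapping-window segmentation described for the transcription engine 139 can be sketched as follows. The hop size (window overlap) is an illustrative assumption; the specification only gives the example 0.3-second window:

```python
def overlapping_windows(samples, sample_rate, window_s=0.3, hop_s=0.15):
    """Split a raw audio sample buffer into overlapping fixed-duration windows.

    Yields (start_time_s, window_samples) pairs; each window could then be fed
    to a transcription model that reports which notes started/ended within it.
    """
    win = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    for start in range(0, max(len(samples) - win + 1, 1), hop):
        yield start / sample_rate, samples[start:start + win]
```

In a streaming setting the same windowing would be applied incrementally to an audio ring buffer rather than to a complete array.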
In some such implementations, performance notes are sent based on a moving window, such as the same moving window used by the transcription engine 139 to generate the performance notes. In other implementations, the note processor 135 sends sequences of performance notes to the evaluation engine 155 based on one or more sending trigger events, such as periodically (e.g., according to a fixed frequency, such as once per second), each time a certain number of performance notes is reached, each time a certain portion of the score has elapsed (e.g., once per measure), each time a buffer of performance note data is full, each time the evaluation engine 155 completes processing a previous batch of performance notes, etc.

[0046] Embodiments of the evaluation engine 155 can process the performance notes as they are received from the note processor 135. In some implementations, the evaluation engine 155 evaluates performance notes based on a moving evaluation window. The evaluation window can be longer than, shorter than, or the same duration as the moving window used for transcription and/or for sending performance notes. For example, performance notes may be sent by the note processor 135 to the evaluation engine 155 multiple times per second, but the evaluation engine 155 may only evaluate the received performance notes once every 5 seconds. In other implementations, the evaluation engine 155 performs evaluations of sequences of performance notes based on one or more evaluation trigger events, such as periodically, each time a certain number of performance notes is received, each time a certain portion of the score has elapsed, each time a buffer of performance note data is full, etc. The evaluation trigger event(s) used by the evaluation engine 155 can be the same as or different from those used by the note processor 135.

[0047] Further, different moving window durations and/or evaluation triggers can be used for different types of evaluations.
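One minimal way to realize a sending trigger such as "each time a buffer of performance note data is full" is a small batching buffer. The class and parameter names here are hypothetical, and the same shape works for per-measure or timer-based triggers:

```python
class NoteBatcher:
    """Buffer performed notes and flush them to an evaluator on a trigger.

    The (illustrative) trigger here is a fixed batch size, standing in for
    the buffer-full sending trigger described above.
    """
    def __init__(self, evaluate, batch_size=8):
        self.evaluate = evaluate      # callback, e.g. evaluation engine input
        self.batch_size = batch_size
        self.buffer = []

    def add(self, note):
        self.buffer.append(note)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send any buffered notes (also called at end of performance)."""
        if self.buffer:
            self.evaluate(list(self.buffer))
            self.buffer.clear()
```

A final `flush()` at the end of the performance delivers any partial batch, which matters for ex post evaluation of the last few notes.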
As one example, it may be determined that evaluating a moving window of three sequential performance notes is useful for determining whether the user appears to be playing in the wrong octave, while a moving window of ten performance notes is useful for determining whether the user is tending to play ahead of or behind the beat. As another example, certain evaluations may only be made after a certain number of occurrences of a particular opportunity, such as a certain number of times the score indicates an accidental, a certain number of times the score indicates a mordent, etc. Also, certain types of evaluations may only be performed at certain times and/or under certain conditions. As one example, embodiments can evaluate whether the user failed to begin playing at all, but such an evaluation is only relevant at the start of the performance. As another example, certain evaluations may only be relevant when the user is playing with two hands. As another example, certain evaluations may only be relevant when the score indicates a change in tempo, a change in key signature, a change in dynamics, an accidental, etc.

[0048] Embodiments of the evaluation engine 155 generate a highest likelihood sequence of complex states. Each complex state in the sequence is generated based on a set of hypotheses. At each time step, embodiments generate a playing hypothesis indicating a hypothesis of whether the user is playing or not. If it is hypothesized that the user is playing, embodiments further generate a delta hypothesis indicating a hypothesis of a delta between what is being played and a corresponding score note. The delta hypothesis can indicate a hypothesis of a timing delta, a pitch delta, and/or some other musically relevant delta (e.g., a dynamics delta).
For example, a non-zero pitch delta indicates a hypothesis of the user playing at the wrong pitch, and a non-zero timing delta indicates a hypothesis of the user playing behind or ahead of the score-referenced time basis (e.g., corresponding to a metronome). One or more performance errors are determined based on the delta hypothesis, and one or more qualitative evaluations are made based on the performance errors. For example, determining a consistent pitch offset may qualitatively indicate that the user performed a passage in the wrong octave, that the user always incorrectly performed a particular accidental, etc.

[0049] In some cases, a performance involves multiple concurrent streams of aural information, such as two hands playing together. In some such cases, separate hypotheses can be generated for each track. For example, the highest likelihood sequence of complex states is generated for the sequence of time steps for each hand. In other cases, a joint hypothesis is computed for the sequences of complex states of both hands. Although computing the highest likelihood sequence of the joint hand states involves more computation, it is likely to give better results by avoiding inconsistent hypotheses between hands (e.g., both hands playing the same notes).

[0050] Embodiments operate in the context of a user performing along with temporal guidance. For example, in the performance evaluation application, playback of a musical composition is accompanied by a metronome and/or an indication (e.g., a moving vertical line superimposed on the score) indicating current score-referenced timing. As such, embodiments can formulate hypotheses based on an assumption that the time axis of the user’s performance is shared with the score-referenced time axis (i.e., the user is playing “in time” with the music).
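For instance, a consistent pitch offset of a whole number of octaves across a run of notes can be mapped to "wrong octave" feedback. The following sketch uses illustrative thresholds and labels; it is not the evaluation model of the specification:

```python
def qualitative_pitch_feedback(pitch_deltas, min_run=3):
    """Map a run of per-note pitch deltas (in semitones) to a coarse label.

    A sustained delta that is a non-zero multiple of 12 suggests the passage
    was played in the wrong octave; a sustained 0 suggests correct pitches.
    """
    if len(pitch_deltas) < min_run:
        return "insufficient data"
    if all(d == 0 for d in pitch_deltas):
        return "pitches correct"
    if all(d == pitch_deltas[0] for d in pitch_deltas) and pitch_deltas[0] % 12 == 0:
        return "passage played in the wrong octave"
    return "scattered wrong notes"
```

Analogous rules could flag a consistently mishandled accidental (a sustained +/-1 semitone delta on the affected pitch) or timing tendencies from time deltas.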
Of course, features of the performance evaluation system can allow the user to perform the piece at different tempo settings (i.e., at faster and slower speeds), but the score-referenced time axis will also be adjusted accordingly.

[0051] At each point in time in a performance, embodiments hypothesize whether a note is being played. In some embodiments, the hypothesis is whether a note is being played by each hand. When it is hypothesized that a note is being played, embodiments further hypothesize a specific pitch delta and time delta between the hypothesized note and a corresponding score note. A non-zero pitch delta indicates that the user is playing at the wrong pitch, and a non-zero time delta indicates that the user is either behind or ahead of the score-referenced time axis (e.g., ahead of or behind a metronome). Further identification of note-level errors of the performance is determined based on the playing/not-playing and pitch/time delta hypotheses, as described herein.

[0052] Implementation of embodiments herein involves finding an automated technical approach to address the technical problem of finding a most likely state sequence given the two sequences of notes—a sequence of score notes and a sequence of performance notes—where a state represents whether a user is playing a part/hand at a given time, and if so, the pitch/time delta at which the user is playing. It can be reasonably assumed that the observation probabilities at a given timestep (i.e., the performed notes observed around this timestep) are independent of states at other timesteps, given the state at the current timestep (i.e., whether/how the user is playing at this timestep). Hence, embodiments can formalize the technical problem as a hidden Markov model (HMM), in which the states are unknown, the sequence of score notes is given as background information, and the performance notes are the raw observations from which it is desired to infer the most likely sequence of states.
[0053] The approach realized by embodiments herein can provide several novel features. One novel feature is that embodiments combine the raw observations (the performance notes) with the background information (the score notes) to form the observations (sets of time and pitch deltas) that are accounted for by the states. Another novel feature is that the state space is not constrained to being fully discrete or fully real-valued; rather, it can be a hybrid between the two. Another novel feature is that the state space can be discretized dynamically at each time step to allow for tractable inference using the Viterbi algorithm.

[0054] As described herein, it is desirable to provide qualitative feedback to a user during and/or after a performance, and embodiments herein provide such feedback based on aligning the performed note sequence (as performed by the user) with the score note sequence (as represented on the musical composition score). The alignment is performed by formulating a technical model (e.g., the HMM) that represents the problem, and formulating the model can begin with a technical definition of a note and with discretization of the time axis that is shared by the score and performed note sequences.

[0055] For the purpose of such alignment, both score notes and performed notes can be regarded as pairs of onset time and pitch. For example, onset time can be represented in seconds, and pitch can be represented as a range of pitch identifiers from A0 to C8, as MIDI note numbers between 21 and 108, and/or in any other suitable manner. Implementations can ignore note duration and/or other (e.g., score-specific) attributes, such as enharmonic spelling or articulation markings.
Thus we define a score S and performance P as sequences of (onset, pitch) pairs, as follows:

S = (s_1, …, s_M), where s_j = (onset(s_j), pitch(s_j))    (1)
P = (p_1, …, p_K), where p_k = (onset(p_k), pitch(p_k))    (2)

[0056] Given a score S, the discrete timesteps i of the Markov process {H_i} can be defined, so that the HMM describes the sequence of unique note onset times of the score notes as:

t_1 < t_2 < … < t_N, where {t_i} is the set of unique onset times onset(s_j) of the score notes    (3)

As such, t_i is the time associated with timestep i.

[0057] Rather than treating the performed notes directly as observations, embodiments compute deltas of performed notes with respect to score notes in terms of onset time and pitch. More specifically, for a score note s and a performed note p, the time and pitch deltas are respectively defined as:

δ^(t)_{s,p} = onset(p) − onset(s)    (4)
δ^(π)_{s,p} = pitch(p) − pitch(s)    (5)

[0058] Since it is not known at this stage which score note is associated with each performed note (if any at all), time and pitch deltas are computed between a score note and all performed notes that lie within predefined time and pitch delta ranges. This can be represented as follows:

−Δ^(t) ≤ δ^(t)_{s,p} ≤ Δ^(t)    (6)
−Δ^(π) ≤ δ^(π)_{s,p} ≤ Δ^(π)    (7)

[0059] For example, the delta ranges can be set so that Δ^(π) = 24 semitones (i.e., two octaves). A time and pitch delta pair can be referred to as a single item according to the following simplifying notation:

δ_{s,p} = (δ^(t)_{s,p}, δ^(π)_{s,p}) ∈ ℝ²    (8)

[0060] FIG. 2 shows an example of a set of score notes and performed notes in a “pianoroll” layout 210, along with a table 220 of related time and pitch delta values. In the pianoroll layout, horizontal placement represents time, and vertical placement represents pitch. Pitch and time deltas can be computed between each score note and any performance note that lies within the predefined time and pitch delta ranges, as described above. As illustrated, for a particular score note (s_0), three performance notes (p_1, p_2, and p_3) lie within the predefined time and pitch delta ranges. As one illustrated example, between score note s_0 and performed note p_1, there is a computed pitch delta (dp_01) and a computed time delta (dt_01).
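The per-score-note delta computation of Equations (4)–(8) can be sketched as follows. The range bounds passed as defaults are illustrative placeholders for Δ^(t) and Δ^(π):

```python
def candidate_deltas(score_note, performed_notes, max_dt=1.0, max_dp=24):
    """Compute (time delta, pitch delta) pairs between one score note and
    every performed note falling within the predefined delta ranges.

    Notes are (onset_seconds, pitch) pairs; pitch is a MIDI note number.
    """
    s_onset, s_pitch = score_note
    deltas = []
    for p_onset, p_pitch in performed_notes:
        dt = p_onset - s_onset          # time delta, Equation (4)
        dp = p_pitch - s_pitch          # pitch delta, Equation (5)
        if -max_dt <= dt <= max_dt and -max_dp <= dp <= max_dp:
            deltas.append((dt, dp))     # delta pair, Equation (8)
    return deltas
```

A score note can thus yield zero, one, or several candidate deltas, depending on how many performed notes fall nearby in time and pitch.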
As another illustrated example, between score note s_0 and performed note p_3, there is a computed pitch delta (dp_03) and a computed time delta (dt_03).

[0061] The computed pitch and time deltas can be plotted on a two-dimensional plane. For example, FIG. 3 shows a plot 300 of representative pitch and time deltas for several performed notes plotted on a two-dimensional plane relative to the pitch and timing of a score note. The pitch and timing of the score note corresponds with the location (0, 0) on the plot 300. Such a plot can be used automatically to determine the region of performance notes falling within the predefined time and pitch delta ranges as candidate performance notes to align with the score notes.

[0062] Embodiments produce a variable number of deltas for each score note, possibly zero (e.g., when there are no performed notes in the vicinity). Even if the Markov process itself allows for states to change at each timestep, embodiments account for the practical nature of the activity being modeled. In particular, it is recognized that the hand playing states evolve at a slower rate than the timesteps of the Markov process. As such, for example, it is helpful to regard individual missed notes in an otherwise correct performance as mishaps during a “playing” state rather than as a rapid transition from a “playing” state to a “not playing” state and back. For this reason, instead of defining the observations at timestep i as only the deltas computed from notes that start at t_i, deltas from notes surrounding t_i are aggregated. More specifically, a parameter k (an integer value) can be defined, and for each timestep i and hand ∈ {L, R}, the k score notes whose onsets are closest to t_i are selected. This subset of score notes is referred to as the window W_i.
Because the relevance of a score note for the inference of state H_i is expected to diminish with its distance from t_i, each score note s in the window W_i can be assigned a weight as follows:

w_s = exp(−d_s / τ)    (9)

where d_s = |t_i − onset(s)| and τ is a temperature parameter that controls how quickly the weight drops as notes are further away from the center of the window.

[0063] The observation at timestep i can be defined as a set of (delta, weight) pairs, as follows:

O_i = { (δ_{s,p}, w_s) : s ∈ W_i, −Δ^(t) ≤ δ^(t)_{s,p} ≤ Δ^(t), −Δ^(π) ≤ δ^(π)_{s,p} ≤ Δ^(π) }    (10)

[0064] Note that the weight associated with a pair of time/pitch delta values is that of the score note from which the deltas were computed (e.g., as noted in table 220 of FIG. 2). For example, the table 220 of FIG. 2 illustrates the observations as an array, and the plot 300 of FIG. 3 illustrates the observations as projected into a 2D plane. The full observation at timestep i is the pair of hand-specific observations:

O_i = (O_i^(L), O_i^(R))    (11)

[0065] A state can now be defined. A state represents the state of both hands at a given time. A hand can either be in a “not playing” state (represented by the empty set ∅) or in a “playing” state, with a time and pitch delta that lies within the specified range. The state space ℋ for the left hand and right hand separately can thus be defined as:

h ∈ ℋ = {∅} ∪ D, where D = [−Δ^(t), Δ^(t)] × [−Δ^(π), Δ^(π)]    (single hand state)    (12)

[0066] Although pitch delta values δ^(π) can only take integer values, the space of possible pitch deltas is defined as a subset of the real numbers in order to facilitate treating time and pitch deltas in a uniform way, which is convenient in the inference process. A joint left- and right-hand state H can then be represented as an element of the cartesian product of the single hand state spaces:

H = (h^(L), h^(R)) ∈ ℋ × ℋ    (joint hand state)    (13)

[0067] Toward the goal of achieving alignment between the score note sequence and the performance note sequence, embodiments recognize that such alignment is consistent with a best explanation of the observations represented by the sequence of hand states.
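Putting the window selection and temperature weighting together, the observation at one timestep can be assembled as in the following sketch (parameter values and function names are illustrative):

```python
import math

def observation(score_notes, performed_notes, t_i, k=5, tau=0.5,
                max_dt=1.0, max_dp=24):
    """Build the observation at timestep i as a list of (delta, weight) pairs.

    The k score notes with onsets closest to t_i form the window; each window
    note contributes deltas to nearby performed notes, weighted by
    w = exp(-d / tau), where d is the note's onset distance from t_i.
    Notes are (onset_seconds, pitch) pairs.
    """
    window = sorted(score_notes, key=lambda s: abs(t_i - s[0]))[:k]
    obs = []
    for s_onset, s_pitch in window:
        w = math.exp(-abs(t_i - s_onset) / tau)  # temperature weighting
        for p_onset, p_pitch in performed_notes:
            dt, dp = p_onset - s_onset, p_pitch - s_pitch
            if -max_dt <= dt <= max_dt and -max_dp <= dp <= max_dp:
                obs.append(((dt, dp), w))
    return obs
```

In a two-handed setting this would be computed once per hand, yielding the hand-specific pair of observations described above.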
In other words, embodiments are interested in the sequence of hand states H* that best explains the observations O. Under the HMM formulation, this sequence is characterized as follows:

H* = argmax_H log P(O | H) = argmax_H Σ_{1≤i≤N} [ log P(O_i | H_i) + log P(H_i | H_{i−1}) ]    (14)

where O = (O_1, …, O_N) is the sequence of observations, H = (H_1, …, H_N) is the sequence of states, and H_0 = (∅, ∅) is the initial joint hand state.

[0068] Because the state space is not a finite and discrete set, the Viterbi algorithm cannot be used directly to find H*, and finding H* is not tractable in general. One possible approach is to use so-called “particle filtering” to track a set of H* candidates over time, even if particle filter approaches are typically defined for purely real-valued state spaces. Alternatively, to support application of the Viterbi algorithm, some embodiments dynamically discretize the state space at each timestep. Such dynamic discretization upholds the benefit of the continuous state space (that delta values can be modeled precisely at each time step, without quantization), while at the same time allowing efficient inference using the Viterbi algorithm. The terms “Viterbi path” and “Viterbi alignment” generally refer herein to H*.

[0069] The state space as defined in Equations (12) and (13) contains a discrete item—the “not-playing” value ∅—and the continuous time/pitch delta space D. A naive approach to discretizing the time/pitch delta space would be to use a fixed quantization grid defined in advance. A disadvantage of this approach is that there is an inherent tradeoff between the accuracy of the state representation (the more coarse the quantization, the larger the quantization error) and the computational burden of inference (finer quantization leads to larger state spaces).
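Once the state space has been discretized into a small candidate set per timestep, the maximization of Equation (14) reduces to standard Viterbi decoding. A generic sketch, in which the probability functions are placeholders to be supplied by the model:

```python
def viterbi(candidates, log_obs, log_trans, initial_state):
    """Return the highest-likelihood state sequence (Equation 14).

    candidates[i]  -> list of states allowed at timestep i
    log_obs(h, i)  -> log P(O_i | H_i = h)
    log_trans(a,b) -> log P(H_i = b | H_{i-1} = a)
    """
    # best[h] = (log-score of best path ending in state h, that path)
    best = {initial_state: (0.0, [])}
    for i, states in enumerate(candidates):
        new_best = {}
        for h in states:
            # best predecessor for state h at this timestep
            score, path = max(
                ((s + log_trans(prev, h), p) for prev, (s, p) in best.items()),
                key=lambda x: x[0])
            new_best[h] = (score + log_obs(h, i), path + [h])
        best = new_best
    return max(best.values(), key=lambda x: x[0])[1]
```

Because each timestep carries only a handful of candidate states (plus the "not playing" state), the per-step cost stays small even though the underlying delta space is continuous.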
[0070] Since, for any given time step i and hand, there is typically only a small number of likely candidate states in the time/pitch delta space, a viable approach is to identify the most promising candidate states and select those as the discrete values (along with the "not-playing" state) that the individual hand states H_i^(L) and H_i^(R) can take. The joint hand state space grows quadratically with the size of the individual hand state space, but the computational impact can be limited by keeping the number of candidates low (e.g., fewer than five).

[0071] Embodiments identify the candidate states from the time/pitch delta space at step i using weighted Kernel Density Estimation (KDE) on the observations O_i. The KDE of O_i is a non-negative function KDE_{O_i}(x) that returns the data density of O_i at point x, where h is the bandwidth of the (Gaussian) kernel.

[0072] Since the deltas are computed for many pairs of nearby score and performed notes, the delta values will generally be scattered around the space. However, if a user plays with a certain time and pitch delta with respect to the score, the deltas between score notes and their corresponding performed notes will accumulate around that point in the delta space, forming a peak/mode in the KDE.

[0073] A complicating factor is that repetitive patterns in the score (e.g., sequences of notes with the same duration and alternating pitches) tend to produce modes caused by deltas between non-corresponding score and performed notes, potentially leading to a misalignment between score and performance. To counter this phenomenon, embodiments can define a set of "negative" observations Ō_i = (Ō_i^(L), Ō_i^(R)) computed from such non-corresponding note pairs.

[0074] The density KDE_{Ō_i}(x) contains modes corresponding to misalignments (when time and pitch delta values are both 0).
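The weighted KDE of paragraph [0071] can be sketched as follows; the isotropic two-dimensional Gaussian kernel and the normalization constant are illustrative assumptions, since the patent's own equation image is not reproduced here:

```python
import math

def weighted_kde(observations, x, h=0.5):
    """Weighted Gaussian KDE of (dt, dp, weight) observations at point x = (dt, dp).

    Each observation contributes a 2-D isotropic Gaussian bump of bandwidth h,
    scaled by the weight of the score note the deltas were computed from.
    """
    density = 0.0
    for dt, dp, w in observations:
        sq = (x[0] - dt) ** 2 + (x[1] - dp) ** 2
        density += w * math.exp(-sq / (2 * h * h)) / (2 * math.pi * h * h)
    return density
```

Evaluating this function near a cluster of coinciding deltas returns a large density (a mode), while points far from all observations return densities near zero.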
Therefore, by subtracting KDE_{Ō_i}(x) from KDE_{O_i}(x), a (non-normalized) density can be obtained having modes that are likely candidates for the time/pitch delta at timestep i.

[0075] There are no analytical methods to find the modes of KDEs (even when using Gaussian kernels), but there are iterative methods, such as the so-called "mean shift algorithm." Embodiments use the mean shift algorithm, seeded with the observations O_i. For increased computational efficiency, the gradient of the density (which can be computed analytically, since the kernel is Gaussian) is used as the shift vector of a candidate, rather than computing the shift vector from the neighborhood of the candidate. Furthermore, since embodiments are only interested in the largest modes (rather than all modes), embodiments can discard any candidates that are below a threshold (e.g., 0.5 times the density value of the current best candidate). For normal kernel bandwidth values, such an approach typically yields between one and five candidates in fewer than ten mean shift iterations.

[0076] FIGS. 4A and 4B show illustrative mode selection/discretization for example data. An illustrative set of eight iterations (labeled "Iteration 1" through "Iteration 8") demonstrates iterative mode finding on the Gaussian kernel density estimation (KDE) of observed pitch/time deltas. FIG. 4A shows iterations 1 – 4, and FIG. 4B shows iterations 5 – 8. Each iteration is illustrated as a plot of pitch delta (in semitones) versus time delta (in seconds). In each plot, one or more "X" indicators represent candidate pitch/time deltas for the iteration, and heat maps around those pitch/time deltas represent the KDEs.

[0077] Prior to iteration 1, all observed pitch/time deltas may be candidates. At iteration 1, candidates at low densities are filtered out.
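A minimal version of the mode finding in paragraph [0075] is sketched below. It uses the standard mean-shift update (computed from the analytic gradient of the Gaussian KDE, normalized by the density) and prunes candidates below a fraction of the best density; the iteration count, merge tolerance, and normalized step form are illustrative assumptions:

```python
import math

def find_modes(observations, h=0.5, steps=30, keep_ratio=0.5, merge_tol=0.25):
    """Seed candidates at the observations, move each toward a local density
    maximum with mean-shift steps, prune low-density candidates, and cluster
    near-duplicates into distinct modes."""
    def density_and_grad(x):
        d, gx, gy = 0.0, 0.0, 0.0
        for dt, dp, w in observations:
            k = w * math.exp(-((x[0] - dt) ** 2 + (x[1] - dp) ** 2) / (2 * h * h))
            d += k
            gx += k * (dt - x[0]) / (h * h)  # analytic gradient of the Gaussian kernel
            gy += k * (dp - x[1]) / (h * h)
        return d, gx, gy

    candidates = [(dt, dp) for dt, dp, _ in observations]
    for _ in range(steps):
        moved = []
        for (x, y) in candidates:
            d, gx, gy = density_and_grad((x, y))
            # mean-shift step: equivalent to moving to the kernel-weighted mean
            moved.append((x + h * h * gx / d, y + h * h * gy / d))
        candidates = moved

    # discard candidates below keep_ratio times the best candidate's density
    scored = [(density_and_grad(c)[0], c) for c in candidates]
    best = max(s for s, _ in scored)
    survivors = [c for s, c in scored if s >= keep_ratio * best]

    # cluster near-duplicate survivors into distinct modes
    modes = []
    for c in survivors:
        if all(math.hypot(c[0] - m[0], c[1] - m[1]) > merge_tol for m in modes):
            modes.append(c)
    return modes
```

With a tight cluster of deltas plus one low-weight outlier, the cluster candidates converge to a single mode and the outlier is pruned by the density threshold.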
At subsequent iterations, candidates are alternatingly moved towards a local maximum along the gradient of the KDE and are clustered. The iterations illustrated in FIGS. 4A and 4B can be an example of the Gaussian KDE p(H|O) of the state H = (δt, δp) given a list O of observed time/pitch deltas between note pairs. Embodiments use the modes (or a subset of the modes) of this distribution p(H|O) as the most likely hypotheses, yielding a typically low number of hypotheses, for each of which the likelihood p(O|H) is computed and used in the formulas provided below (Equations (18) and (19)).

[0078] As described herein, embodiments hypothesize each complex state based on at least one continuous delta variable. As such, hypothesis ranking, including transition probability estimation, can include local smoothing and/or normalization. For example, suppose a score note corresponds to a D-flat in octave 4 of the piano; a first hypothesis is that the user performed a D-natural in octave 4, and a second hypothesis is that the user performed a D-natural in octave 5. Suppose further that the preceding few notes were all notated in octave 4, and the user has played them all in octave 5. Without any local smoothing, the first hypothesis would have a computed pitch offset of +1, and the second hypothesis would have a computed pitch offset of +13. Because of the continuous nature of the pitch delta variable, local smoothing can effectively treat the second hypothesis as having a pitch offset of +1 relative to a localized transposition of +12 (i.e., an octave).

[0079] Some embodiments can define observation likelihoods P(O_i|H_i) to express the likelihood of the left- and right-hand observations given the left- and right-hand states.
Simplifying assumptions of independence can be made, such that P(O_i|H_i) factors into per-hand terms P(O_i^(L)|H_i^(L)) and P(O_i^(R)|H_i^(R)).

[0080] The per-hand terms are Gaussian observation likelihood functions centered at the hand state H^(hand), with a variance parameter, and with a separate parameter specifying the likelihood of the "not-playing" state.

[0081] The other component of the HMM to be defined in order to compute the Viterbi path is the set of transition probabilities P(H_i|H_{i−1}). Here, simplifying assumptions of independence similar to those for the observation likelihoods can be made, so that the joint transition probability factors into per-hand transition probabilities.

[0082] The transition probability per hand is Gaussian, with a variance parameter that controls how much the time/pitch deltas are expected to change between consecutive timesteps.

[0083] The parameters used by embodiments herein can be optimized using Bayesian black-box optimization based on the expected improvement criterion. The evaluation function evaluates a set of parameter values by computing alignments on a set of score/performance pairs, and comparing the computed alignments against annotated ground-truth alignments. The quantity to be minimized is the number of timesteps at which the predicted hand states deviate from the ground truth.

[0084] The result of the Viterbi algorithm is the sequence of hand states H* that describes the time/pitch deltas at which each hand is playing, and whether the hand is playing at all. This result can be used as the basis for a note-wise alignment that builds a mapping between individual performed notes and score notes. To construct the note-wise alignment, embodiments apply a greedy search for the closest transformed score note for each performed note.
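The Viterbi computation over the dynamically discretized state space (paragraphs [0068]–[0082]) can be sketched generically as below. The scoring callbacks `log_obs` and `log_trans` stand in for the Gaussian observation likelihoods and transition probabilities; their exact forms, and the `NOT_PLAYING` encoding, are illustrative assumptions:

```python
NOT_PLAYING = None  # stands in for the "not playing" state (the empty set in the patent)

def viterbi_dynamic(candidates_per_step, log_obs, log_trans):
    """Viterbi over a dynamically discretized state space: each timestep has its
    own small candidate list (e.g., KDE modes plus the "not playing" state).
    Returns the highest-scoring state sequence."""
    trellis = []  # trellis[i][j] = (best log score ending in state j, backpointer)
    for i, states in enumerate(candidates_per_step):
        col = []
        for s in states:
            if i == 0:
                col.append((log_obs(i, s), -1))
            else:
                prev = candidates_per_step[i - 1]
                j = max(range(len(prev)),
                        key=lambda k: trellis[i - 1][k][0] + log_trans(prev[k], s))
                score = trellis[i - 1][j][0] + log_trans(prev[j], s) + log_obs(i, s)
                col.append((score, j))
        trellis.append(col)
    # backtrack from the best final state
    path = []
    j = max(range(len(trellis[-1])), key=lambda k: trellis[-1][k][0])
    for i in range(len(trellis) - 1, -1, -1):
        path.append(candidates_per_step[i][j])
        j = trellis[i][j][1]
    return path[::-1]
```

Because each column holds only a handful of candidate states, the quadratic cost of the transition step stays small, which is the point of the dynamic discretization.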
For example, embodiments can implement an algorithm for note-wise alignment from P to S based on the Viterbi path H*, such as described by the following pseudocode:

procedure NOTEWISEALIGNMENT(S, P, H*)
    P′ ← P
    S′ ← APPLYINFERREDDELTAS(S, H*)
    R ← ∅
    while P′ ≠ ∅ do
        p ← an element of P′
        d ← min_{s′ ∈ S′} ‖p − s′‖
        s ← argmin_{s′ ∈ S′} ‖p − s′‖
        if d < ε then
            R ← R ∪ {(s, p)}
            S′ ← S′ \ {s}
        else
            R ← R ∪ {(∅, p)}
        end if
        P′ ← P′ \ {p}
    end while
    return R
end procedure

where ε is a distance threshold.

[0085] The above algorithm includes another embedded algorithm for transformation of the score notes by applying the inferred time/pitch deltas H* to the score notes S. For example, the embedded algorithm can be described by the following pseudocode:

procedure APPLYINFERREDDELTAS(S, H*)
    S′ ← ∅
    for all hand ∈ {L, R} do
        for all s ∈ S^(hand) do
            i ← timestep of onset(s)
            if H*_i^(hand) ≠ ∅ then
                s′ ← s + H*_i^(hand)
                S′ ← S′ ∪ {s′}
            end if
        end for
    end for
    return S′
end procedure

[0086] Although the greedy note-wise alignment could in principle be performed without H* (that is, using the untransformed score notes S rather than S′), it would fail to establish the correct mapping when the time/pitch deltas are too large.

[0087] FIGS. 5A and 5B show an illustrative process flow 500 for aligning performance notes to score notes, according to embodiments described herein. The process is represented as six stages. The first three stages (510, 520, and 530) are shown in the portion of the flow 500a shown in FIG. 5A, and the remaining three stages (540, 550, and 560) are shown in the portion of the flow 500b shown in FIG. 5B. The flow 500 can be implemented by the evaluation engine 155 of FIG. 1A or 1B.

[0088] Embodiments of the flow 500 begin at a first stage 510 by receiving performance notes and score notes falling within an evaluation window.
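An executable rendering of the two pseudocode procedures in paragraphs [0084]–[0085] might look as follows in Python. The dict-based note encoding, the `eps` threshold value, and the mixed seconds/semitones Euclidean distance are illustrative assumptions:

```python
import math

def apply_inferred_deltas(score, hstar):
    """Transform score notes by the inferred per-timestep (dt, dp) deltas for each
    hand; notes whose hand state is "not playing" (None) are dropped."""
    transformed = []
    for hand in ("L", "R"):
        for s in score[hand]:
            h = hstar[hand][s["step"]]   # inferred hand state at this note's timestep
            if h is not None:
                dt, dp = h
                transformed.append({"onset": s["onset"] + dt,
                                    "pitch": s["pitch"] + dp,
                                    "ref": s})
    return transformed

def notewise_alignment(score, performed, hstar, eps=1.0):
    """Greedily pair each performed note with the closest transformed score note;
    performed notes farther than eps from any score note become (None, p) pairs."""
    remaining = apply_inferred_deltas(score, hstar)
    result = []
    for p in performed:
        if remaining:
            s = min(remaining, key=lambda s: math.hypot(p["onset"] - s["onset"],
                                                        p["pitch"] - s["pitch"]))
            d = math.hypot(p["onset"] - s["onset"], p["pitch"] - s["pitch"])
        else:
            s, d = None, float("inf")
        if d < eps:
            result.append((s["ref"], p))
            remaining.remove(s)  # each score note may be matched at most once
        else:
            result.append((None, p))
    return result
```

As in the pseudocode, matching against the delta-transformed notes (rather than the raw score notes) keeps the greedy search correct even under large, sustained time or pitch offsets.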
For example, all performance and score notes can be received for ex post feedback, or a portion of performance and score notes corresponding to a range of timestamps can be received for dynamic processing. In a second stage 520, time step discretization and windowing are performed to effectively create processing chunks of score notes and potential candidate performance notes for alignment thereto. In a third stage 530, pitch and time deltas are computed, and KDE (e.g., Gaussian KDE) is performed based on the processing chunks generated in the second stage 520.

[0089] Turning to the remaining portion of the flow 500b in FIG. 5B, at a fourth stage 540, the state space for the time and pitch deltas from the third stage 530 is discretized. At a fifth stage 550, the Viterbi path (indicated by the bold line connecting hypotheses at consecutive time steps) is computed through the discretized state space from the fourth stage 540. At a sixth stage 560, the Viterbi path computed in the fifth stage 550 is used to identify the set of hypotheses that best explains the observations, thereby generating a most likely note-wise alignment between the score and performance notes.

[0090] For added clarity, FIGS. 6, 7A, and 7B show illustrative state space representations for a given time window of a user activity. FIG. 6 shows a pianoroll-format representation 600 of the given time window, including a sequence of score notes (dark thin horizontal lines) and performed notes (lighter, thicker horizontal lines) for two-handed music. Dimmed line segments are not part of the given time window. FIGS. 7A and 7B show kernel density plots 700 over a timing-pitch delta hypothesis space for left-hand and right-hand events, respectively, corresponding to a time step (t = 5.094 seconds) in the given time window of FIG. 6.
The circles denote the modes selected through the state space discretization, and a square indicates a mode that lies on the Viterbi path. In FIG. 7A, it can be seen that one of the modes was chosen to be part of the Viterbi path. In FIG. 7B, the absence of a square indicates that the Viterbi path included a "not playing" hypothesis for the right hand in this time step.

[0091] Having applied the Viterbi algorithm, embodiments can move from score performance alignment to performance evaluation. In effect, the alignment of the performance notes to the score notes can be used as a most likely representation of a sequence of performance errors as between the performed notes and the score notes, and the sequence of performance errors can be used to generate evaluation feedback. At least some of the evaluation feedback can be qualitative. For example, feedback can indicate that an entire passage seems not to have been played; that an entire passage seems to have been played in the wrong octave or key; that a particular type of score note seems to be consistently incorrect (e.g., possibly the user does not understand how to play a dotted quarter note, an accidental, a triplet, etc.; or the user needs to work on releasing a note during a rest; etc.); that the user is lagging or rushing in one or more sections; etc. Finer-grained issues, such as one-off missed notes, wrong pitches, or early/late note onsets, can also follow directly from the note-wise alignment. Thus, feedback can be provided on different bases, such as based on time, based on pitch or timing or other characteristics (e.g., dynamics), based on score note type, at the individual event level, etc. Further, multiple types of feedback can be provided concurrently. For example, in a single passage, feedback may indicate that the user played the entire passage an octave too high, also rushed a set of eighth notes in the middle of the passage, and also missed one of the notes (i.e., even accounting for the octave shift).
[0092] Returning to FIG. 1, the evaluator subsystem 150 includes an evaluation engine 155, an evaluation store 156, and a feedback engine 157. To support the novel evaluation approach described herein, embodiments of the evaluation engine 155 are implemented with two stages: an alignment-based evaluation sub-engine, and a note-wise evaluation sub-engine. The alignment-based evaluation sub-engine performs the alignment between the performance notes and the score notes, such as according to the flow 500 of FIGS. 5A and 5B. The output of the alignment-based evaluation sub-engine can be a list of regions inside the score; each region corresponds to a time when the user is either not playing, or the user is playing with some time and pitch offset (e.g., which may be zero, if the user is playing with the correct pitch and timing). The note-wise evaluation sub-engine generates a mapping (e.g., a one-to-one mapping) of performance notes to score notes. In some implementations, the note-wise evaluation sub-engine queries each region from the alignment-based evaluation sub-engine, and each region returns respective region feedback. For example, if the user did not play anything during a particular region, the feedback may be a single indication that nothing was played there; if the user did play something during a particular region, the feedback may be multiple indications forming a list of note-specific errors relative to particular score notes.

[0093] Some embodiments store the highest likelihood sequence of complex states (e.g., as a sequence of performance errors) in the evaluation store 156. The evaluation store 156 can also store evaluation models that are usable by the evaluation engine 155. For example, the evaluation engine 155 can use the evaluation models to generate automated evaluations of the performance.
For example, the evaluation models can provide a definition by which the evaluation engine 155 can recognize when a sequence of performance errors suggests that the user was playing in the wrong key, etc. In some cases, an evaluation model may be defined algorithmically. For example, the evaluation model may indicate that if more than a threshold number of the sequence of performance errors matches a particular criterion, then a particular evaluation is generated. In other cases, an evaluation model may be defined as a pattern, mask, or the like; and the evaluation engine 155 uses more complex statistical, pattern matching, machine learning, and/or other algorithmic approaches to determine whether to generate a particular evaluation. In some cases, one or more evaluation models are generated by an artificial intelligence/machine learning (AI/ML) approach. For example, a large number of examples of a particular type of performance deficiency is assembled as training data, and an AI/ML engine is used to generate an appropriate evaluation model for subsequent identification of the deficiency. While descriptions herein refer to a sequence of performance errors and identification of performance deficiencies, the same techniques can be used to recognize performance improvements, successes, etc. For example, one or more evaluation models can be used by the evaluation engine 155 to identify instances in which the user did a great job (e.g., performed a passage better than expected, or better than they performed it at an earlier attempt).

[0094] The evaluation engine 155 generates the one or more types of performance evaluations (e.g., including qualitative evaluations) of the user's performance and sends corresponding information to the feedback engine 157. The evaluation data is used by the feedback engine 157 to generate feedback data 159. The feedback data 159 can include micro-level feedback and/or macro-level feedback.
The feedback data 159 can be used by the display processor 125 to generate graphical performance feedback to be output to the user via the display interface 120. In some embodiments, the feedback engine 157 generates the feedback data 159 to be further usable by the note processor 135 to generate audible performance feedback to be output to the user via the audio interface 130.

[0095] In some embodiments, the feedback data 159 is micro-level feedback that indicates event-wise accuracy as a graphical overlay to the musical score (e.g., at a score note and/or a performed note level). The term "graphical overlay" is used herein to generally include any type of concurrent graphical presentation that provides visual juxtaposition between the graphical feedback elements and the graphical score elements to which the feedback applies. For example, such a graphical overlay can include displaying a graphical feedback element in a semi-transparent manner over a graphical score element, recoloring (e.g., or changing line weight, etc.) a graphical score element to imply a graphical feedback element, adding text or images representing a graphical feedback element alongside a graphical score element, etc.

[0096] In some implementations, score notes are displayed on the musical score in a first color to indicate a skipped event; in a second one or more colors to indicate a well-played (or improved) event; and in a third one or more colors to indicate a poorly-played (or worsened) event. In another implementation, a first color is used to highlight portions of the musical score during which a score note should be played (e.g., from the onset time to the offset time for a particular score note), and a second color is used to highlight portions of the musical score during which a matching performed note was played (and/or a color is used to indicate whether the performed note was played well or poorly).
In other implementations, generated feedback can include one or more numeric scores, colors, or other graphical elements used to overlay (or provide access to) past performance data from the user or from other performers, etc.

[0097] In other embodiments, the feedback data 159 is macro-level feedback that indicates feedback relating to more than a single event at a time. Some macro-level feedback applies to the entire performance. Some macro-level feedback applies to a section (e.g., a page, a measure, a line, a hand, etc.). Some macro-level feedback applies to a passage (e.g., a musically relevant phrase, a grouped set of sixteenth notes, etc.). Some macro-level feedback applies to a category (e.g., a particular type of note or rest, all rests, triplets, dotted notes, staccato notes, etc.). As described herein, at least some of the macro-level feedback can be qualitative, such as based on an identified pattern indicating an area for improvement, an area showing present improvement, etc.

[0098] The macro-level feedback can be presented as a graphical overlay to the musical score, in a separate portion of the display, or in any suitable manner. In some such embodiments, the macro-level feedback indicates an overall performance score (e.g., in textual form and/or any other suitable form). For example, the feedback data 159 indicates that the performance received an overall numerical score of 82 percent. In other such embodiments, the evaluation engine 155 and/or the feedback engine 157 performs one or more statistical analyses on the evaluation data to look for patterns or trends in the highest likelihood sequence of complex states (e.g., in the sequence of performance errors), and the feedback data 159 indicates the results of those analyses. One such analysis indicates patterns of performance across passages and/or sections of the performance.
For example, the feedback data 159 shows that the user performed well (i.e., had a high rhythmic correspondence) in measures 1 – 10, very poorly in measures 11 – 13, and moderately well for the remainder of the performance. Another such analysis matches performance patterns to predefined categories of performance. As one example, the feedback data 159 indicates that there was poor performance on rests, but not on notes (e.g., the analysis compares rhythmic correspondence for rest score notes versus non-rest score notes). As another example, the feedback data 159 indicates that there was an overall tendency to play ahead of the beat or behind the beat (e.g., the analysis finds a statistical tendency for performance event onset times to be ahead of, or delayed with respect to, the corresponding score onset times). As another example, the feedback data 159 indicates that there was a misunderstanding of certain musical notations (e.g., the analysis finds that the user performed two notes where two score notes are tied together, indicating a misunderstanding of tying; that the user performed all score-note triplets as having the duration and spacing of eighth notes, indicating a misunderstanding of triplets; etc.). As another example, the feedback data 159 indicates that the performer appears not to have attempted some of the musical score (e.g., the analysis finds that performed notes are missing for all of one of the hands in a two-handed piece, or for an entire section of the piece). As another example, the feedback data 159 indicates that the performer appears to struggle on certain types of passages (e.g., the analysis finds that rhythmic correspondence tends to be lower during faster or slower sections of the piece).
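As a concrete sketch, the ahead-of/behind-the-beat analysis can be reduced to a statistic over aligned onset pairs; the threshold value and the function name are illustrative assumptions:

```python
def timing_tendency(aligned_pairs, threshold=0.02):
    """Classify an overall rushing/dragging tendency from (score_onset,
    performed_onset) pairs: a negative mean delta means ahead of the beat."""
    deltas = [p - s for s, p in aligned_pairs]
    if not deltas:
        return "no data"
    mean = sum(deltas) / len(deltas)
    if mean < -threshold:
        return "rushing"
    if mean > threshold:
        return "dragging"
    return "steady"
```

In practice, such a statistic would be computed per passage or section so that, as described above, feedback can localize the tendency rather than averaging it over the entire performance.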
In any of the above examples, embodiments of the feedback engine 157 can generate feedback data 159 as a textual and/or graphical overlay in the same or a separate portion of an interface to indicate the macro-level feedback to the user. [0099] In embodiments that provide dynamic feedback, the evaluation engine 155 can generate the one or more types of performance evaluations (e.g., including qualitative evaluations) of the user’s performance and can send corresponding information to the feedback engine 157 concurrent with the user’s performance. The evaluation data is used by the feedback engine 157 to generate feedback data 159 (micro-level feedback and/or macro-level feedback) also concurrent with the user’s performance. As described above, the feedback data 159 can be used by the display processor 125 to generate graphical performance feedback (e.g., and/or audible feedback) to be output to the user via the display interface 120. Some or all of the feedback data 159 can be used to generate graphical and/or audible performance feedback concurrent with the user’s performance. [0100] As one example, dynamic performance evaluation determines that the user has not started playing the composition after some time. In response, dynamic qualitative feedback can be displayed (e.g., in a pop-up window, or the like) to ask whether the user wishes to cancel or continue playback and/or performance of the composition. In some implementations, such feedback also includes pausing the playback. Other feedback (e.g., an audible bell, a change in display color, etc.) can also be used to make the user aware of the detected condition. As another example, while the user is playing, dynamic performance evaluation determines that the user has been playing in the wrong octave for some number of sequential notes. In response, dynamic qualitative feedback can be displayed (e.g., in a pop-up window, on the displayed score, or the like) to guide the user to change octaves. 
In some implementations, the dynamic feedback can change based on whether this is the first time the user has made this error, the playing level of the user, the duration over which the same error is being detected, or any other suitable factor.

[0101] While the system 100 of FIG. 1 is illustrated as distinct subsystems, the environment can be implemented in any suitable computational environment according to any suitable architecture. In some embodiments, the device I/O subsystem 105, the evaluator subsystem 150, and the MS data store 140 are all implemented in a unitary computational environment. In other embodiments, the device I/O subsystem 105, the evaluator subsystem 150, and the MS data store 140 are implemented in multiple computational environments communicatively coupled together by any suitable wired and/or wireless communications.

[0102] In some embodiments, the system 100 is implemented in a cloud-based environment. For example, one or more user devices are in communication with one or more remote servers via one or more communication networks. Each user device can implement a respective instance of the device I/O subsystem 105, a thin client, and a network interface. The one or more remote servers implement the evaluator subsystem 150 and a remote storage subsystem. The remote storage subsystem can include the MS data store 140 and/or the evaluation data store 156. For example, the remote storage subsystem can be used to maintain records of historical performance data. During operation, a user uses the thin client (e.g., an app) of the user device to access an application that provides features of the evaluator subsystem 150 by communicating over the network(s). Operation of the thin client can involve communication of various types of data.
For example, the MS visual data 145 and MS logical data 143 are received by the user device from the remote storage subsystem of the remote server(s), performance data is communicated from the user device back to the evaluator subsystem 150 of the remote server(s), and feedback data 159 is received by the user device from the evaluator subsystem 150 of the remote server(s). The network(s) can include any suitable wired or wireless communication links with any one or more public and/or private networks, local and/or remote networks, etc.

[0103] Embodiments of automated performance alignment and performance evaluation systems (e.g., the automated performance evaluation system 100 of FIGS. 1A and/or 1B), or components thereof, can be implemented on, and/or can incorporate, one or more computer systems, as illustrated in FIG. 8. FIG. 8 provides a schematic illustration of one embodiment of a computer system 800 that can implement various system components and/or perform various steps of methods provided by various embodiments. It should be noted that FIG. 8 is meant only to provide a generalized illustration of various components, any or all of which may be utilized as appropriate. FIG. 8, therefore, broadly illustrates how individual system elements may be implemented in a relatively separated or relatively more integrated manner.

[0104] The computer system 800 is shown including hardware elements that can be electrically coupled via a bus 805 (or may otherwise be in communication, as appropriate). The hardware elements may include one or more processors 810, including, without limitation, one or more general-purpose processors and/or one or more special-purpose processors (such as digital signal processing chips, graphics acceleration processors, video decoders, and/or the like).
As illustrated, some embodiments include a device I/O subsystem 105, which can include one or more I/O devices 815 and one or more I/O processors 817. The I/O devices 815 can include a display interface 120, an audio interface 130, and/or any other suitable interface. The I/O processors 817 can include a display processor 125, a note processor 135, and/or any other suitable processors. Additionally, input devices can include, without limitation, buttons, knobs, switches, keypads, touchscreens, remote controls, microphones, MIDI devices, and/or the like; and output devices can include, without limitation, displays, speakers, indicators, gauges, and/or the like. Some embodiments of the computer system 800 interface with additional computers, peripheral devices, etc., such that the device I/O subsystem 105 can include various physical and/or logical interfaces (e.g., ports, etc.) to facilitate component-to-component interaction and control.

[0105] The computer system 800 may further include (and/or be in communication with) one or more non-transitory storage devices 825, which can comprise, without limitation, local and/or network accessible storage, and/or can include, without limitation, a disk drive, a drive array, an optical storage device, or a solid-state storage device, such as a random access memory ("RAM") and/or a read-only memory ("ROM"), which can be programmable, flash-updateable, and/or the like. Such storage devices may be configured to implement any appropriate data stores, including, without limitation, various file systems, database structures, and/or the like. In some embodiments, the storage devices 825 include non-transitory memory. In some embodiments, the storage devices 825 can include the MS data store 140, the evaluation data store 156, and/or any other suitable data storage.
The storage devices 825 can also include buffers and/or other temporary storage for use by the transcription engine 139, the note processor 135, the evaluation engine 155, and/or any other components.

[0106] The computer system 800 can also include a communications subsystem 830, which can include, without limitation, any suitable antennas, transceivers, modems, network cards (wireless or wired), infrared communication devices, wireless communication devices, chipsets (such as a Bluetooth® device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, etc.), and/or other communication components. As illustrated, the communications subsystem 830 can also include the network interface 220 for facilitating communications between user devices and remote servers via communication networks. The communications subsystem 830 can further facilitate communications with other computational systems.

[0107] In many embodiments, the computer system 800 will further include a working memory 835, which can include a RAM or ROM device, as described herein. The computer system 800 also can include software elements, shown as currently being located within the working memory 835, including an operating system 840, device drivers, executable libraries, and/or other code, such as one or more application programs 845, which may include computer programs provided by various embodiments, and/or may be designed to implement methods, and/or configure systems, provided by other embodiments, as described herein.
Merely by way of example, one or more procedures described with respect to the method(s) discussed herein can be implemented as code and/or instructions executable by a computer (and/or a processor within a computer); in an aspect, then, such code and/or instructions can be used to configure and/or adapt a general-purpose computer (or other device) to perform one or more operations in accordance with the described methods. In some embodiments, the operating system 840 and the working memory 835 are used in conjunction with the one or more processors 810 to implement features of musical score performance alignment and/or automated performance evaluation, as described herein.

[0108] A set of these instructions and/or codes can be stored on a non-transitory computer-readable storage medium, such as the non-transitory storage device(s) 825 described above. In some cases, the storage medium can be incorporated within a computer system, such as the computer system 800. In other embodiments, the storage medium can be separate from a computer system (e.g., a removable medium, such as a compact disc), and/or provided in an installation package, such that the storage medium can be used to program, configure, and/or adapt a general-purpose computer with the instructions/code stored thereon. These instructions can take the form of executable code, which is executable by the computer system 800, and/or can take the form of source and/or installable code, which, upon compilation and/or installation on the computer system 800 (e.g., using any of a variety of generally available compilers, installation programs, compression/decompression utilities, etc.), then takes the form of executable code.

[0109] It will be apparent to those skilled in the art that substantial variations may be made in accordance with specific requirements.
For example, customized hardware can also be used, and/or particular elements can be implemented in hardware, software (including portable software, such as applets, etc.), or both. Further, connection to other computing devices, such as network input/output devices, may be employed.

[0110] As mentioned above, in one aspect, some embodiments may employ a computer system (such as the computer system 800) to perform methods in accordance with various embodiments of the invention. According to a set of embodiments, some or all of the procedures of such methods are performed by the computer system 800 in response to the processor 810 executing one or more sequences of one or more instructions (which can be incorporated into the operating system 840 and/or other code, such as an application program 845) contained in the working memory 835. Such instructions may be read into the working memory 835 from another computer-readable medium, such as one or more of the non-transitory storage device(s) 825. Merely by way of example, execution of the sequences of instructions contained in the working memory 835 can cause the processor(s) 810 to perform one or more procedures of the methods described herein.

[0111] The terms “machine-readable medium,” “computer-readable storage medium,” and “computer-readable medium,” as used herein, refer to any medium that participates in providing data that causes a machine to operate in a specific fashion. These media may be non-transitory. In an embodiment implemented using the computer system 800, various computer-readable media can be involved in providing instructions/code to the processor(s) 810 for execution and/or can be used to store and/or carry such instructions/code. In many implementations, a computer-readable medium is a physical and/or tangible storage medium. Such a medium may take the form of non-volatile media or volatile media.
Non-volatile media include, for example, optical and/or magnetic disks, such as the non-transitory storage device(s) 825. Volatile media include, without limitation, dynamic memory, such as the working memory 835. Common forms of physical and/or tangible computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, any other physical medium with patterns of marks, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read instructions and/or code. Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to the processor(s) 810 for execution. Merely by way of example, the instructions may initially be carried on a magnetic disk and/or optical disc of a remote computer. A remote computer can load the instructions into its dynamic memory and send the instructions as signals over a transmission medium to be received and/or executed by the computer system 800. The communications subsystem 830 (and/or components thereof) generally will receive the signals, and the bus 805 then can carry the signals (and/or the data, instructions, etc., carried by the signals) to the working memory 835, from which the processor(s) 810 retrieves and executes the instructions. The instructions received by the working memory 835 may optionally be stored on a non-transitory storage device 825 either before or after execution by the processor(s) 810.

[0112] It should further be understood that the components of computer system 800 can be distributed across a network. For example, some processing may be performed in one location using a first processor while other processing may be performed by another processor remote from the first processor. Other components of computer system 800 may be similarly distributed.
As such, computer system 800 may be interpreted as a distributed computing system that performs processing in multiple locations. In some instances, computer system 800 may be interpreted as a single computing device, such as a distinct laptop, desktop computer, or the like, depending on the context.

[0113] FIGS. 9A–9C show several plots representing an illustrative performance that is largely correct, except for a silence between approximately 5 and 7 seconds. In the illustrative performance, only the right hand is being played by the user, and the left-hand accompaniment is an automated backing track. A pair of horizontal dashed lines is shown in all of FIGS. 9A–9C to represent the period of silence. FIG. 9A shows a pianoroll representation 900 of a relevant portion of the performance. Similar to the representation 600 in FIG. 6, the representation 900 shows score notes as thin lines and shows performed notes as thicker shaded regions (overlaid on the score notes, where there is overlap). FIG. 9B shows plots 910 of right-hand time and pitch deltas, respectively, as inferred in the Viterbi path. Black line segments represent the average time/pitch delta in a region, and the underlying grey line represents the temporal variation of the deltas. Interruptions of the line segments correspond to H_i = ∅, where either the user is not playing anything at all, or notes cannot meaningfully be aligned to the score. FIG. 9C shows a plot 920 representing inferred qualitative feedback, localized on the time line.

[0114] FIGS. 10A–10C show several plots representing an illustrative performance that has several issues, including that the user played one octave too high from 27 seconds onward, and that the user played several wrong notes across the piece. In the illustrative performance, only the right hand is being played by the user, and the left-hand accompaniment is an automated backing track.
FIG. 10A shows a pianoroll representation 1000 of a relevant portion of the performance. Similar to the representations in FIGS. 6 and 9A, the representation 1000 shows score notes as thin lines and shows performed notes as thicker shaded regions (overlaid on the score notes, where there is overlap). FIG. 10B shows plots 1010 of right-hand time and pitch deltas, respectively, as inferred in the Viterbi path. Black line segments represent the average time/pitch delta in a region, and the underlying grey line represents the temporal variation of the deltas. Interruptions of the line segments correspond to H_i = ∅, where either the user is not playing anything at all, or notes cannot meaningfully be aligned to the score. FIG. 10C shows a plot 1020 representing inferred qualitative feedback, localized on the time line. It can be seen in FIG. 10C that the inferred qualitative feedback includes several wrong notes played by the user, a region where the user is not playing the score in a recognizable way, a missed accidental, and an interval error.

[0115] FIG. 11 shows a flow diagram of an illustrative method 1100 for musical score performance alignment for automated performance evaluation, according to embodiments described herein. Embodiments begin at stage 1104 by receiving (e.g., by a processor-based evaluation engine) performed note data defining a sequence of performed notes representing performance of a musical score by a user in accordance with a score time basis. Each of the sequence of performed notes is defined by at least a respective performed pitch and performed onset time. In some implementations, each performed note is further defined by a respective performance offset time and/or performance duration.
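The note records described at stages 1104 and 1108 (a pitch and an onset time, with optional offset time and/or duration) can be sketched as a simple data structure. The Python types and field names below are illustrative only, not part of the disclosure:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NoteEvent:
    """One note event, per stages 1104/1108: a pitch and an onset time,
    with optional offset/duration (field names are illustrative)."""
    pitch: int                      # MIDI pitch number, e.g., 60 = middle C
    onset: float                    # onset time in the score time basis (seconds)
    offset: Optional[float] = None  # optional offset time
    duration: Optional[float] = None

# A performance and a score are each a sequence of such events.
performed = [NoteEvent(60, 0.02), NoteEvent(64, 0.51), NoteEvent(67, 1.03)]
score = [NoteEvent(60, 0.0), NoteEvent(64, 0.5), NoteEvent(67, 1.0)]
```

The same record type can serve for both performed notes and score notes, since both are defined by at least a pitch and an onset time referenced to the score time basis.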
[0116] In some embodiments, the receiving performed note data in stage 1104 includes receiving a raw audio stream via an audio interface and processing the raw audio stream by a transcription engine of a note processor to generate the performed note data. In other embodiments, the receiving performed note data in stage 1104 includes receiving a musical instrument digital interface (MIDI) stream via a MIDI interface and processing the MIDI stream by a note processor to generate the performed note data. The note processor can be coupled with the processor-based evaluation engine.

[0117] At stage 1108, embodiments can receive (e.g., by the processor-based evaluation engine) score note data defining a sequence of score notes representing the musical score. Each of the sequence of score notes is defined by at least a respective score pitch and a score onset time referenced to the score time basis. In some implementations, each score note is further defined by a respective score offset time and/or score duration.

[0118] At stage 1112, embodiments can compute (e.g., by the processor-based evaluation engine) a note-wise alignment between the performed note data and the score note data by computing the highest likelihood sequence of complex hand states for a sequence of time steps in the score time basis given the sequence of score notes and the sequence of performance notes. Each time step of the sequence of time steps is defined as corresponding to the score onset time of a respective one of the sequence of score notes. In some embodiments, each complex hand state of the sequence of complex hand states is a joint left-and-right-hand state computed as an element of a Cartesian product of a computed left-hand state space for the time step and a computed right-hand state space for the time step.
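The joint hand state of paragraph [0118] can be sketched as a Cartesian product of per-hand state spaces. The particular state values below (None for "not playing," otherwise a hypothesized pitch/time delta pair) are illustrative assumptions, not the disclosed state encoding:

```python
from itertools import product

# Hypothetical per-hand states: None means "not playing"; otherwise a
# (pitch_delta, time_delta) hypothesis. Values are illustrative.
left_states = [None, (0, 0.0), (12, 0.1)]
right_states = [None, (0, 0.0), (-1, -0.05)]

# The complex hand state space for a time step is the Cartesian product
# of the left-hand and right-hand state spaces, per paragraph [0118].
joint_states = list(product(left_states, right_states))
# 3 left states x 3 right states = 9 joint left-and-right-hand states
```

One consequence of the product construction is that the joint space grows multiplicatively with the per-hand spaces, which is why the per-time-step state spaces are computed (and, per [0119], discretized) rather than enumerated globally.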
[0119] In some embodiments, the computing the note-wise alignment at stage 1112 can include applying time step discretization and windowing to the performed note data and the score note data to generate a sequence of regions, each associated with a respective time step of the sequence of time steps in the score time basis. Some such embodiments can include computing region data for each region of the sequence of regions based on the computing the highest likelihood sequence of complex hand states, the respective region data for each region indicating either a not-playing observation or a playing observation, the playing observation further indicating associated pitch delta information and associated timing delta information for the region. In some embodiments, computing the region data includes: computing the associated pitch delta information, associated timing delta information, and associated weighted kernel density estimations for the region; discretizing the hand state space for the region based on the associated pitch delta information, the associated timing delta information, and the associated kernel density estimations; and computing a Viterbi path through the discretized hand state space. In some embodiments, computing the note-wise alignment at stage 1112 further includes generating a one-to-one mapping between the performance notes and the score notes by, for each region of the sequence of regions, querying the respective region data to obtain respective region feedback, the respective region feedback comprising either a single feedback indication of the not-playing observation, or one or more feedback indications representing the playing observation as one or more note-specific errors relative to one or more particular score notes.
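The time-step discretization and windowing of paragraph [0119] can be sketched as grouping performed notes into a window around each score onset. The window half-width is an assumed parameter, and the function name is hypothetical; the disclosure does not specify either:

```python
def window_regions(score_onsets, performed, half_width=0.5):
    """Group performed (pitch, onset) pairs into one region per score
    onset time step, keeping notes within +/- half_width seconds.
    Windows may overlap, so a performed note can fall in adjacent
    regions; half_width is an illustrative assumption."""
    regions = []
    for t in score_onsets:
        notes = [(p, o) for (p, o) in performed if abs(o - t) <= half_width]
        regions.append({"time_step": t, "notes": notes})
    return regions

score_onsets = [0.0, 0.5, 1.0]          # one time step per score onset
performed = [(60, 0.02), (64, 0.55), (67, 1.40)]
regions = window_regions(score_onsets, performed)
```

A region with an empty `notes` list corresponds to a not-playing observation; a non-empty region yields a playing observation from which the pitch and timing deltas are then estimated.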
[0120] Some such embodiments further include, for each region, computing the associated pitch delta information and the associated timing delta information by: defining a positive observation for the region as either the not-playing observation or the playing observation; computing a kernel density estimation (KDE) of the positive observation, such that the KDE of the positive observation contains modes corresponding to alignments between the score note associated with the region and any performance notes in the region; defining a corresponding negative observation for the region; computing a KDE of the negative observation, such that the KDE of the negative observation contains modes corresponding to misalignments between the score note associated with the region and any performance notes in the region; and computing a non-normalized density for the region by subtracting the KDE of the negative observation from the KDE of the positive observation, such that the non-normalized density has modes corresponding to highest-likelihood candidates for the associated pitch delta information and the associated timing delta information for the region.

[0121] In some embodiments, each complex hand state of the highest likelihood sequence of complex hand states represents a hypothesis of whether the user is playing a predicted note at the time step and, if so, a hypothesis of a delta between the predicted note and the respective one of the sequence of score-referenced note events. The delta can include a pitch delta and a time delta, and either or both of the pitch delta and the time delta can be represented by a continuous variable.
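The density computation of paragraph [0120] can be sketched in one dimension over pitch deltas: a Gaussian KDE of the positive observation minus a Gaussian KDE of the negative observation yields a non-normalized density whose mode is the highest-likelihood delta candidate. The bandwidth, grid, and sample values below are all illustrative assumptions:

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth=0.5):
    """Plain Gaussian kernel density estimate evaluated on a grid.
    Bandwidth is an assumed parameter, not from the disclosure."""
    samples = np.asarray(samples, dtype=float)
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2)
    return kernels.sum(axis=1) / (len(samples) * bandwidth * np.sqrt(2 * np.pi))

grid = np.linspace(-6, 6, 241)          # candidate pitch deltas, in semitones
positive = [0.0, 0.1, -0.1, 0.05]       # deltas consistent with alignment
negative = [3.0, -4.0]                  # deltas consistent with misalignment

# Non-normalized density: positive KDE minus negative KDE, per [0120].
density = gaussian_kde(positive, grid) - gaussian_kde(negative, grid)
best_delta = grid[np.argmax(density)]   # mode = highest-likelihood pitch delta
```

With these toy samples the mode lands near a zero pitch delta, i.e., the region is most consistent with the user having played the expected pitch. The same construction applies to timing deltas, and the subtraction can push misalignment-dominated areas of the density below zero, discouraging those candidates.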
In some such embodiments, computing the note-wise alignment at stage 1112 includes, at each time step: applying a revised hidden Markov model (HMM) to generate a respective plurality of hypothesized complex hand states for the time step; and ranking the respective plurality of hypothesized complex hand states to determine a respective highest likelihood complex hand state for the time step, the highest likelihood sequence of complex hand states being the respective highest likelihood complex hand state for each of the sequence of time steps. Some such embodiments further include applying a Viterbi algorithm to compute transition probabilities between the respective plurality of hypothesized complex states at each time step, wherein the ranking is based on the transition probabilities.

[0122] At stage 1116, embodiments can automatically generate qualitative evaluation feedback by pattern-matching the note-wise alignment to a library of evaluation models. In some embodiments, the computing in stage 1112 and the automatically generating in stage 1116 are at least partially performed for at least a portion of the sequence of time steps during a corresponding portion of the performance by the user (i.e., concurrently with the user performing the piece). In such embodiments, the method can continue at stage 1120 by displaying at least a portion of the qualitative evaluation feedback during the corresponding portion of the performance by the user. In other embodiments, the automatically generating at stage 1116 is at least partially performed upon completion of the performance by the user. In such embodiments, the method can continue at stage 1120 by displaying at least a portion of the qualitative evaluation feedback after completion of the performance by the user.

[0123] The methods, systems, and devices discussed above are examples.
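The HMM-based ranking of paragraph [0121] can be illustrated with a textbook Viterbi decoder over a toy two-state space. All probabilities below are illustrative, and the sketch is generic dynamic programming rather than the revised HMM of the disclosure:

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Highest-likelihood state sequence for an HMM (textbook Viterbi).
    log_init:  (S,)   log initial state probabilities
    log_trans: (S, S) log transition probabilities
    log_obs:   (T, S) per-time-step log observation likelihoods"""
    T, S = log_obs.shape
    score = np.empty((T, S))
    back = np.zeros((T, S), dtype=int)
    score[0] = log_init + log_obs[0]
    for t in range(1, T):
        # cand[i, j]: score of being in state i at t-1, then moving to j
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_obs[t]
    # Backtrack from the best final state to recover the full path.
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example: state 0 = "not playing", state 1 = "playing".
log_init = np.log([0.5, 0.5])
log_trans = np.log([[0.7, 0.3], [0.3, 0.7]])   # sticky transitions
log_obs = np.log([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])
print(viterbi(log_init, log_trans, log_obs))    # → [0, 1, 1]
```

In the disclosed setting, the states at each time step would instead be the hypothesized complex hand states for that step, so that the returned path is the highest likelihood sequence of complex hand states used for the note-wise alignment.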
Various configurations may omit, substitute, or add various procedures or components as appropriate. For instance, in alternative configurations, the methods may be performed in an order different from that described, and/or various stages may be added, omitted, and/or combined. Also, features described with respect to certain configurations may be combined in various other configurations. Different aspects and elements of the configurations may be combined in a similar manner. Also, technology evolves and, thus, many of the elements are examples and do not limit the scope of the disclosure or claims.

[0124] Specific details are given in the description to provide a thorough understanding of example configurations (including implementations). However, configurations may be practiced without these specific details. For example, well-known circuits, processes, algorithms, structures, and techniques have been shown without unnecessary detail in order to avoid obscuring the configurations. This description provides example configurations only, and does not limit the scope, applicability, or configurations of the claims. Rather, the preceding description of the configurations will provide those skilled in the art with an enabling description for implementing described techniques. Various changes may be made in the function and arrangement of elements without departing from the spirit or scope of the disclosure.

[0125] Also, configurations may be described as a process which is depicted as a flow diagram or block diagram. Although each may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may have additional steps not included in the figure.
Furthermore, examples of the methods may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks may be stored in a non-transitory computer-readable medium, such as a storage medium. Processors may perform the described tasks.

[0126] Having described several example configurations, various modifications, alternative constructions, and equivalents may be used without departing from the spirit of the disclosure. For example, the above elements may be components of a larger system, wherein other rules may take precedence over or otherwise modify the application of the invention. Also, a number of steps may be undertaken before, during, or after the above elements are considered.