Title:
DETECTION OF BRUTE FORCE ATTACKS
Document Type and Number:
WIPO Patent Application WO/2021/119396
Kind Code:
A1
Abstract:
The disclosed embodiments determine a plurality of anomaly indications for a plurality of corresponding time series. A multi-modal model is defined for each time series. A first distribution is compared against a time series when the time series values fall within a first range, and a second distribution is compared against the time series when the time series values fall within a second range. Based on the comparison, an indication of anomaly is generated for the time series. The indications of anomaly for each time series are then combined using Fisher's method in some embodiments. The resulting combined anomaly indication is used to determine whether a network is experiencing a brute force attack.

Inventors:
SODJA COLE (US)
CARROLL JUSTIN (US)
Application Number:
PCT/US2020/064455
Publication Date:
June 17, 2021
Filing Date:
December 11, 2020
Assignee:
MICROSOFT TECHNOLOGY LICENSING LLC (US)
International Classes:
H04L29/06; G06F21/55
Foreign References:
US20160004580A12016-01-07
Other References:
ASWANI KRISHNA ET AL: "Topic modeling of SSH logs using latent dirichlet allocation for the application in cyber security", 2015 SYSTEMS AND INFORMATION ENGINEERING DESIGN SYMPOSIUM, IEEE, 24 April 2015 (2015-04-24), pages 75 - 79, XP032783151, DOI: 10.1109/SIEDS.2015.7117015
LEE JAE-KOOK ET AL: "Heavy-tailed distribution of the SSH Brute-force attack duration in a multi-user environment", JOURNAL OF THE KOREAN PHYSICAL SOCIETY, KOREAN PHYSICAL SOCIETY, KR, vol. 69, no. 2, 27 July 2016 (2016-07-27), pages 253 - 258, XP036014378, ISSN: 0374-4884, [retrieved on 20160727], DOI: 10.3938/JKPS.69.253
Attorney, Agent or Firm:
CHOI, Daniel et al. (US)
Claims:
1. A method performed by hardware processing circuitry, comprising:
obtaining a first time series of operational parameter values of a device attached to a network;
comparing the values of the first time series to a first parameter value range and a second parameter value range;
determining, based on the comparing, that the values of the first time series are within the first parameter value range;
based on the determining, selecting, from a plurality of distributions, a first distribution;
comparing the first time series to the selected first distribution;
determining, based on the comparing, a first probability at which values in the first time series occur in the selected distribution;
determining, based on the first probability, a likelihood of a brute force attack on the network;
based on the first time series, adjusting a boundary between the first parameter value range and the second parameter value range; and
determining, based on the adjusted boundary, a second likelihood of a brute force attack.

2. The method of claim 1, further comprising performing, based on the likelihood, a mitigating action, the mitigating action including changing an access control policy of the network.

3. The method of claim 1, wherein the adjusting of the boundary comprises updating a threshold value τ_p defining the boundary between the first range and the second range, τ_p defined according to:

τ_p = E[Y | Y > Q_{p−1}]

where:

E[·] is an expected value function,

Y is the first time series, τ_p is a threshold value between the first range and the second range, and

Q_{p−1} is a qth quantile of a negative binomial distribution.

4. The method of claim 1, wherein the adjusting of the boundary further comprises updating parameters defining the first and second distributions based on the first time series.

5. The method of claim 1, wherein at least one parameter of the distribution is updated according to:

φ_{t+ε} = g_φ^{-1}[(1 − w)·η_t + w·g_φ(M_φ[y_{t+ε}])]

where:

φ_{t+ε} is the updated parameter,

g_φ is a link function for a parameter φ,

M_φ is a central moment corresponding to the parameter φ,

w is a smoothing weight,

η_t is g_φ[φ_t], and

y_{t+ε} is a sample value included in the first time series.

6. The method of claim 1, further comprising:
determining a second time series for second operational parameter values of the device;
selecting, based on the second time series, a second distribution;
second comparing the second time series to the selected second distribution;
determining, based on the second comparing, a second probability at which second values in the second time series occur in the selected second distribution;
applying Fisher's method to the first probability and the second probability; and
based on the applying, generating a combined indicator of anomaly, wherein the determining of the likelihood of the brute force attack is further based on the combined indicator.

7. A system, comprising:
hardware processing circuitry;
one or more hardware memories storing instructions that when executed configure the hardware processing circuitry to perform operations comprising:
obtaining a first time series of operational parameter values of a device attached to a network;
comparing the values of the first time series to a first parameter value range and a second parameter value range;
determining, based on the comparing, that the values of the first time series are within the first parameter value range;
based on the determining, selecting, from a plurality of distributions, a first distribution;
comparing the first time series to the selected first distribution;
determining, based on the comparing, a first probability at which values in the first time series occur in the selected first distribution;
determining, based on the first probability, a likelihood of a brute force attack on the network;
based on the first time series, adjusting a boundary between the first parameter value range and the second parameter value range; and
determining, based on the adjusted boundary, a second likelihood of a brute force attack.

8. The system of claim 7, the operations further comprising performing, based on the likelihood, a mitigating action.

9. The system of claim 8, wherein the mitigating action includes changing an access control policy of the network.

10. The system of claim 7, wherein the adjusting of the boundary comprises updating a threshold value τ_p defining the boundary between the first range and the second range, τ_p defined according to:

τ_p = E[Y | Y > Q_{p−1}]

where:

E[·] is an expected value function,

Y is the first time series, τ_p is a threshold value between the first range and the second range, and

Q_{p−1} is a qth quantile of a negative binomial distribution.

11. The system of claim 7, wherein the adjusting of the boundary further comprises updating parameters defining the first and second distributions based on the first time series.

12. The system of claim 7, wherein the parameters are updated via exponential smoothing and a grid of smoothing weights.

13. The system of claim 12, wherein at least one parameter of the distribution is updated according to:

φ_{t+ε} = g_φ^{-1}[(1 − w)·η_t + w·g_φ(M_φ[y_{t+ε}])]

where:

φ_{t+ε} is the updated parameter,

g_φ is a link function for a parameter φ,

M_φ is a central moment corresponding to the parameter φ,

w is a smoothing weight,

η_t is g_φ[φ_t], and

y_{t+ε} is a sample value included in the first time series.

14. The system of claim 7, the operations further comprising modeling a distribution of the first time series as a finite mixture of distributions where each parameter Θ_p is a stochastic process.

15. A computer readable storage medium comprising instructions that when executed configure hardware processing circuitry to perform operations comprising:
obtaining a first time series of operational parameter values of a device attached to a network;
comparing the values of the first time series to a first parameter value range and a second parameter value range;
determining, based on the comparing, that the values of the first time series are within the first parameter value range;
based on the determining, selecting, from a plurality of distributions, a first distribution;
comparing the first time series to the selected first distribution;
determining, based on the comparing, a first probability at which values in the first time series occur in the selected distribution;
determining, based on the first probability, a likelihood of a brute force attack on the network;
based on the first time series, adjusting a boundary between the first parameter value range and the second parameter value range; and
determining, based on the adjusted boundary, a second likelihood of a brute force attack.

Description:
DETECTION OF BRUTE FORCE ATTACKS

BACKGROUND

[0001] Computers exposed to the internet are an attractive target for adversaries because they present a simple and effective way to gain access to a network. Performing a brute force attack does not necessarily require a high level of expertise or a sophisticated use of exploits. Instead, in some scenarios, attackers can utilize off-the-shelf tools to scan the Internet for potential victims and leverage similar such tools for conducting the brute force attack.

[0002] Attackers target servers having relatively weak passwords and lacking multi-factor authentication, virtual private networks (VPNs), and other security protections. Through brute force attacks, threat actor groups have gained access to target machines and have conducted many follow-on activities like ransomware and coin mining operations.

BRIEF DESCRIPTION OF THE DRAWINGS

[0003] FIG. 1 is an overview diagram of an example embodiment of a network.

[0004] FIG. 2A shows an empirical distribution of the number of days per machine over which one or more brute force attacks were observed.

[0005] FIG. 2B shows an example count of daily and maximum hourly network failed logins for a local machine under brute force attack.

[0006] FIG. 3 shows example daily and cumulative failed network logins.

[0007] FIGs. 4A-B show two histograms including example daily counts of remote desktop protocol (RDP) public connections per machine in an example enterprise with known brute force attacks.

[0008] FIG. 5 is a bubble chart of example IP abuse score versus counts of machines with inbound RDP.

[0009] FIG. 6 shows an example daily count of network failed logins for a machine with no brute force attack.

[0010] FIG. 7 shows example ranking of detected RDP inbound brute force attacks.

[0011] FIG. 8 is an example data flow diagram of one or more of the disclosed embodiments.

[0012] FIG. 9 is an example data flow diagram of one or more of the disclosed embodiments.

[0013] FIG. 10 is a flowchart of an example method for determining whether a network is experiencing a brute force attack.

[0014] FIG. 11 is a flowchart of an example process for estimating parameters of an unknown distribution.

[0015] FIG. 12 illustrates a block diagram of an example machine 1200 upon which any one or more of the techniques (e.g., methodologies) discussed herein may perform.

DETAILED DESCRIPTION

[0016] Brute force attacks represent a technical problem in that they can be difficult to detect while generating a manageable number of false positives. In a brute force attack, adversaries can attempt to sign into an account by effectively using one or more trial-and-error methods. In some instances of brute force attacks, failed logins are associated with these attacks. These failed logins can occur over a very short time frequency, typically minutes or even seconds. A brute force attack might also involve an attempt to access one or multiple accounts using valid usernames. The valid usernames may be obtained by the attacker via traditional credential theft techniques or via use of common usernames such as "administrator." The same holds for password combinations. In detecting brute force attacks, some of the disclosed embodiments focus on a source Internet Protocol (IP) address and username, as in at least some cases, password data is not available.

[0017] In some operating environments, login failures are logged to a log file or other logging facility. These log entries may include an event code indicating the login failure, along with associated information, such as a username used in the login attempt. For example, in the Microsoft Windows operating system, whenever an attempted login fails for a local machine, Event Tracing for Windows (ETW) registers Event ID 4625 with the associated username.

[0018] Moreover, source IP addresses associated with the failed login attempt are also recorded in at least some environments. This information can be useful in assessing whether a machine is under brute force attack. This information, in combination with indications of login successes (e.g. Event ID 4624) for non-server machines, is useful to understand which login sessions were successfully created. Based on this understanding, indications of a compromised machine are generated.

[0019] Disclosed are embodiments that utilize these and/or other signals which have proven valuable in detecting and, in some embodiments, mitigating brute force attacks. In some embodiments, this capability may be deployed via Microsoft Threat Experts, a managed threat hunting service in Microsoft Defender Advanced Threat Protection. Other embodiments are deployed in non-Microsoft environments or environments including solutions delivered by a mix of different vendors.

[0020] Some of the disclosed embodiments generate one or more alerts. The alerts are directed to, in various embodiments, one or more of system administrators, security vendors, end users, or other contacts. In some embodiments, the detection of anomalous activity indicating a brute force attack causes programmatic reconfiguration of network security devices such as firewalls, proxies, routers, switches, or other network devices to mitigate the brute force attack. For example, in some embodiments, an access control policy of one or more network security devices is updated based on a likelihood of a brute force attack. The programmatic reconfiguration is performed, in some embodiments, without human intervention. For example, a firewall is reconfigured in some embodiments to prevent access to a network by a device identified as initiating at least a portion of an identified brute force attack. In some embodiments, more restrictive access policies are progressively deployed to network security devices until the brute force attack is successfully mitigated (e.g. a level of anomalous activity drops below a predetermined threshold). These access policies are progressively deployed, in some embodiments, by programmatically configuring one or more firewalls deployed to control access to the network.
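For illustration, the following Python sketch shows one way the progressive policy deployment described above could be structured. It is a minimal sketch, not the claimed implementation: the firewall interface, the get_attack_likelihood callback, the policy tiers, and the threshold value are all hypothetical assumptions.

import time

# Hypothetical policy tiers, ordered from least to most restrictive.
POLICY_TIERS = [
    {"name": "rate-limit-rdp", "max_conn_per_min": 30},
    {"name": "block-new-source-ips", "max_conn_per_min": 10},
    {"name": "deny-external-rdp", "max_conn_per_min": 0},
]

ANOMALY_THRESHOLD = 0.99  # assumed likelihood above which mitigation engages


def mitigate(firewall, get_attack_likelihood, poll_seconds=60):
    """Progressively tighten access policies until anomalous activity subsides.

    firewall and get_attack_likelihood are assumed interfaces: the first
    applies a policy dict to a network firewall, and the second returns the
    current combined anomaly indication in [0, 1].
    """
    tier = 0
    while get_attack_likelihood() > ANOMALY_THRESHOLD:
        if tier < len(POLICY_TIERS):
            firewall.apply_policy(POLICY_TIERS[tier])  # programmatic reconfiguration
            tier += 1
        time.sleep(poll_seconds)  # allow the new policy to take effect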

[0021] Some of the disclosed embodiments recognize that observation of a sudden, relatively large count of particular types of events (e.g. Event ID 4625) associated with network connections (e.g. RDP), while infrequent in some environments, does not necessarily imply that a machine is under attack. For example, a script that performs the following actions would appear to be suspicious when evaluating a time series of counts of failed logins. However, such an example is most likely not malicious:

• uses an expired password

• retries login attempts every N-minutes with different usernames

• over a public IP address within a range owned by the enterprise

[0022] In contrast, behavior that includes the following events can be indicative of an attack:

• extreme counts of failed login from many unknown usernames

• never previously successfully authenticated

• from multiple different connections (e.g. multiple different RDP connections)

• from new inbound IP addresses, especially with history of abuse

[0023] The above example demonstrates that understanding the context of failed logins and inbound connections can be useful in discriminating between true positive (TP) and false positive (FP) brute force attacks. Such understanding provides for improved precision in detecting brute force attacks.

[0024] In developing the disclosed embodiments, historical data from several previous months was obtained. This historical data was later analyzed to provide insights into the types of brute force attacks occurring across the measured environments during those previous months. Out of approximately 45,000 machines that had both public IP connections (e.g., RDP) and at least one (1) network failed login, on average, several hundred machines per day had a high probability of undergoing one or more brute force attack attempts. Of the subpopulation of machines with detected brute force attacks, the attacks lasted 2-3 days on average, with about 90% of cases lasting for 1 week or less, and less than 5% for 2 or more weeks. Based at least partly on insights gained via the analysis of the historical data, the presently disclosed embodiments were then developed.

[0025] To detect brute force attacks, some of the disclosed embodiments implement a technical solution to the technical problem of brute force attacks that models a distribution of operational parameter values as a finite mixture of distributions. The use of a mixture model in detection of a brute force attack recognizes the multi-modal nature of operational parameter values experienced by a computer network. By comparing operational parameter values to a distribution appropriate for a network environment during a given time period, the disclosed embodiments adapt their detection of brute force attacks as needed to reduce a number of false positives while still providing accurate detection of real brute force attacks.

[0026] Each distribution in the mixture model is defined by one or more parameters. The parameters and thus the distributions themselves are dynamically updated based on additional operational parameter values as those operational parameter values are observed. The disclosed embodiments compare observed operational parameter values to particular distributions in the mixture model based on dynamically determined thresholds that determine which distribution is used for the comparison. In some embodiments, multiple time series are each modeled as separate mixtures of distributions, with each mixture model defined by its own distribution parameters and comparison thresholds. By recognizing that one or more time series of operational parameter values may be multimodal, and thus are best compared using a mixture model of distributions, the disclosed embodiments are able to more accurately detect anomalous events, while reducing the probability of false positives.

[0027] FIG. 1 is an overview diagram 100 of a network that implements one or more of the disclosed embodiments. FIG. 1 shows a network 101. The network 101 includes a plurality of network components 102A-F. FIG. 1 also shows devices 104A-C external to the network 101. Network components 102A-F within the network 101 and external devices 104A-C outside the network 101 communicate with each other. Examples of this communication are illustrated in FIG. 1 as arrows 106A-G (not all arrows are labeled to preserve figure clarity).

[0028] Access to the network 101 is controlled by a firewall 108. FIG. 1 also shows a network security system 110. In some embodiments, the network security system 110 collects information on the communication between network components 102A-F and/or external devices 104A-C represented by the arrows 106A-G. The network security system 110 also collects, in some embodiments, operational parameter values of one or more of the network components 102A-F. As discussed in more detail below, based on information indicating the communications, the disclosed embodiments provide for improved methods of detecting a brute force attack on the network 101. In some embodiments, the network security system 110 sends one or more control signals 112 to the firewall 108. The control signals 112 configure one or more network access policies for one or more of the devices 102A-F.

[0029] The network security system 110 receives operational parameter values from one or more of the network components 102A-F. The operational parameter values are received by the network security system 110 via messages transmitted from the network components (directly or indirectly) to the network security system 110. These messages are shown as messages 122A-C. The operational parameter values provided by the one or more network components are provided, in at least some embodiments, as a time series of values. In other words, the operational parameter values are measured periodically, or at multiple times at regular or irregular intervals, and then communicated to the network security system 110.

[0030] The network security system 110 processes the operational parameter values according to the methods discussed herein in order to determine a probability that the network 101 is experiencing a brute force attack. For example, the analysis performed by the network security system 110 determines, in some cases, that one or more of the external devices 104A-C is initiating a brute force attack on the network 101.

[0031] FIG. 2A shows an example empirical distribution of the number of days over which devices of a network experienced a brute force attack. Large counts of failed logins are often associated with brute force attacks. In the example of FIG. 2A, 90% of brute force attacks exhibit greater than ten attempts, with a median larger than 60. In addition, unusual daily counts have a high positive correlation with large counts in shorter time windows (see FIG. 2B). The extreme numbers of failed logins per day typically accrued within two hours, with about 40% occurring in under thirty minutes.

[0032] FIG. 2B shows example counts of daily and maximum hourly network failed logins for a local machine under brute force attack. While detection logic based on thresholding a count of failed logins during a daily or finer-grain time window can detect many brute force attacks, it can produce too many false positives. Relying on such a strategy will also result in false negatives, resulting in missed compromises of a network. We identified several instances of brute force attacks that generated fewer than five to ten failed attempts daily, but often persisted for many days. This attack pattern avoids extreme counts at any point in time. For such a brute force attack, thresholding the cumulative number of unsuccessful logins across time demonstrated efficacy, as discussed below with respect to FIG. 3.

[0033] FIG. 3 shows daily and cumulative failed network logins. Looking at counts of network failed logins provides a useful but incomplete picture of RDP brute force attacks. This picture can be further augmented with additional information about the failed login, such as the failure reason, time of day and day of week, as well as the username itself. An especially strong signal is the source IP of the inbound RDP connection. Knowing if the external IP has a high reputation of abuse, as can be looked up on sites like https://www.abuseipdb.com/, can directly confirm if an IP is a part of an active brute force attack.

[0034] Unfortunately, not all IP addresses have a history of abuse, and it can be expensive to retrieve information about many external IP addresses on demand. Maintaining a list of suspicious IPs is an option, but relying on this can result in false negatives as, inevitably, new IPs continually occur, particularly with the adoption of cloud computing and the ease of spinning up virtual machines. A generic signal that can augment failed login and user information is counting distinct RDP connections from external IP addresses. Again, extreme values occurring at a given time or cumulated over time can be an indicator of attack.
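To make the contrast between point thresholding and cumulative thresholding concrete, a brief Python sketch follows. The counts and both thresholds are illustrative assumptions, not values from the disclosure; the point is that the low-and-slow pattern trips only the cumulative rule.

# Hypothetical daily failed-login counts per machine; in practice these would
# be aggregated from Event ID 4625 records.
daily_failed = {
    "machine-a": [3, 4, 2, 5, 3, 4, 6],   # low-and-slow attack pattern
    "machine-b": [0, 1, 0, 0, 2, 0, 1],   # ordinary background noise
}

DAILY_THRESHOLD = 50       # catches bursty attacks
CUMULATIVE_THRESHOLD = 20  # catches persistent low-rate attacks


def flag_machines(counts_by_machine):
    """Flag machines whose daily OR cumulative failed logins exceed a threshold."""
    flagged = []
    for machine, counts in counts_by_machine.items():
        cumulative = 0
        for day, count in enumerate(counts):
            cumulative += count
            if count > DAILY_THRESHOLD or cumulative > CUMULATIVE_THRESHOLD:
                flagged.append((machine, day))
                break
    return flagged


print(flag_machines(daily_failed))  # [('machine-a', 5)]: the daily rule alone misses it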

[0035] FIGs. 4A and 4B show example histograms (i.e., counts put into discrete bins) of daily counts of RDP public connections per machine that occurred for an example enterprise with known brute force attacks. It is evident that normal machines have a lower probability of large counts compared to attacked machines.

[0036] Given that some enterprises have machines under brute force attack daily, the priority may be to focus on machines that have been compromised, defined by a first successful login following unsuccessful attempts from suspicious source IP addresses or unusual usernames. In embodiments directed to Microsoft Windows operating systems, the Windows log Event ID 4624 can be leveraged to measure successful login events for a local machine in combination with unsuccessful logins (e.g. Event ID 4625).

[0037] Out of the hundreds of machines with RDP brute force attacks detected in our analysis, we found that about .08% were compromised. Furthermore, across all enterprises analyzed over several months, on average about one machine was detected with high probability of being compromised resulting from an RDP brute force attack every 3-4 days.

[0038] FIG. 5 shows a bubble chart of example average abuse scores of external IPs associated with RDP brute force attacks that successfully compromised machines. The size of the bubbles is determined by the count of distinct machines across the analyzed enterprises having a network connection from each IP. While the source IPs originate from diverse regions, the Netherlands, Russia, and the United Kingdom have a larger concentration of inbound RDP connections from high-abuse IPs.

[0039] A takeaway from our analysis is that successful brute force attempts are not uncommon; therefore, it’s critical to monitor at least the suspicious connections and unusual unsuccessful logins that result in authenticated login events. In the following sections we describe a methodology to do this.

[0040] Reliance on thresholding of operational parameter values such as a number of unsuccessful attempts per machine for detecting brute force attacks can be noisy and may result in many false positives. For example, such an approach can generate false positives in a situation when a script has been preconfigured with an outdated password and continuously attempts to login using the invalid password.

[0041] To avoid problems associated with basic thresholding, at least some of the disclosed embodiments utilize a plurality of contextually relevant signals, such as:

• a timing, type, and count of unsuccessful login attempts

• username history

• a type and frequency of network connections

• first-time username from a new source machine with a successful login [0042] These examples can be extended to include indicators associated with a brute force attack, such as counts of detected port scanning.

[0043] At least some of the disclosed embodiments analyze one or more of the following per machine (or device) signals when determining whether an RDP inbound brute force attack is in progress or has previously occurred (a sketch assembling several of these signals from raw login events follows this list):

• unsuccessful login and RDP connections: hour of day and day of week.

This data is represented, in various embodiments, via a time series indicating counts of unsuccessful logins, and/or a separate time series indicating counts of successful logins. These time series are correlated with time to identify counts of unsuccessful or failed logins per hour and/or day as needed. These time series are generated, in various embodiments, on a per device and/or a per user/source IP basis. In some embodiments, these time series may be aggregated across time periods. For example, a time series representing weekly effects may be generated (e.g. activity Monday through Wednesday may be greater, in some cases, than end-of-week or weekend activity). Some embodiments generate time series that accumulate counts across weeks.

• Time delay between one or more failed or unsuccessful logins and a successful logon. For example, a time series indicating user account information such as one or more of a user account name, an IP address, a time of a last login failure, and a time of a last login success is generated in some embodiments.

• Event ID 4625 logon type (filtered to network and Remote Interactive). This time series provides login type information for each login attempt and/or successful login. In some embodiments, the login attempts are remote desktop protocol (RDP) login attempts. In some embodiments, the time series is an ordinal time series. In some embodiments, login type information, such as a login type, is represented using standard login type codes, such as code = 3 for network logins, code = 10 for remote interactive logins, and code = 2 for local console-initiated logins.

• Event ID 4625 failure reason (filtered to %%2308, %%2312, %%2313). Some embodiments generate a categorical time series indicating a code for a login failure reason. For example, %%2313 is a code for “unknown username or bad password.”

• A cumulative count of distinct or unique usernames that failed to login without a successful login (or without a subsequent successful login). Some embodiments include one or more time series relating to a single device or spanning multiple devices. For example, a first time series communicates, in some embodiments, a count C of distinct usernames that attempted to login and failed at a particular device. A second time series communicates that a second number k out of the C distinct users also failed to login to other devices.

• A count (and cumulative count) of failed or unsuccessful logins. As discussed above.

• A count (and cumulative count) of RDP inbound external IP addresses. In some embodiments, a time series provides counts of distinct, external IP addresses per device over time, per device per IP address, and/or per IP address (which may include IP addresses across enterprise networks). Some embodiments include an additional time series of counts of a particular IP address across devices over time. In various embodiments, these time series are per machine/device and/or across devices. For example, user and IP address statistics are examined, in some embodiments, at both per machine and across machines. Counts of failed/success/network connections are examined, in some embodiments on a per machine basis. Some embodiments include a time series of cumulative counts, which provide counts of similar data as described above, except that the counts are provided for longer time periods. For example, in some embodiments, a first time series provides a count for a period of time, wherein the period of time is specified via a configurable parameter (e.g. hourly). These embodiments may also provide a second time series including a cumulative count of the parameter for a longer time period (e.g. twelve hours, 24 hours, one day, two days, three days, one week, or any other longer period of time).

• A count of other machines having RDP inbound connections from a common device or at least devices having equivalent IP addresses. This time series correlates, in some embodiments, counts of IP addresses that have attempted logins on multiple devices. Described another way, the time series indicates counts of at least two devices that have RDP inbound connections from a common device. In at least some embodiments, a time series of these counts is generated for each device included within a monitored network. Thus, in these embodiments, the count represents, for a particular device, a count of devices having an inbound connection with a source IP address that is equivalent to an inbound connection source address on the particular device.
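As referenced above, the following pandas sketch shows one way a few of these per-device signals could be assembled from raw login events. The event records, column names, and hourly granularity are illustrative assumptions.

import pandas as pd

# Hypothetical raw login events; real deployments would read these from the
# operating system's security log (e.g. Event IDs 4624/4625).
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-08-04 01:03", "2020-08-04 01:07", "2020-08-04 01:09",
        "2020-08-04 02:15", "2020-08-05 01:02",
    ]),
    "device": ["m1", "m1", "m1", "m1", "m1"],
    "event_id": [4625, 4625, 4625, 4624, 4625],
    "source_ip": ["198.51.100.7", "198.51.100.7", "203.0.113.9",
                  "10.0.0.5", "203.0.113.9"],
})

failed = events[events["event_id"] == 4625]

# Per-device hourly counts of failed logins (a lowest-granularity series y_t).
hourly_failed = (failed.set_index("timestamp")
                       .groupby("device")
                       .resample("1h")["event_id"].count())

# Per-device hourly counts of distinct source IP addresses.
hourly_ips = (failed.set_index("timestamp")
                    .groupby("device")
                    .resample("1h")["source_ip"].nunique())

# Running cumulative counts per device (a longer-window series y_T).
cumulative_failed = hourly_failed.groupby(level="device").cumsum()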

[0044] In some embodiments, each time series or signal is scored for its indication of a brute force attack. In some embodiments, hourly time windows are the lowest time granularity. In other embodiments, other periods of time (a minute, five minutes, etc.) are used as the shortest time window for a time series. One or more additional time series can be generated by accumulating the lowest time granularity time series. For example, in some embodiments, some time series aggregate multiple hour time series within a day. In some embodiments, some time series aggregate granular time series across multiple days accounting for specific daily and weekly periodicity effects. Additionally, some time series are aggregated across multiple devices, and across multiple networks in some embodiments.

[0045] Let t be a time window, and let T represent an interval spanning multiple time windows, at a lower temporal frequency (coarser granularity) than t. The discussion below defines y_t to be a lowest-level univariate time series per each signal type and per single or multiple devices, and y_T to be an aggregation of y_t over the time frequency T.

[0046] A first signal is a categorical time series of logon type (e.g. Event ID 4625) per device per t. For each device in a network, y_t = [("2", count_t), ("3", count_t), ..., ("11", count_t)], where "2", "3" and "11" are examples of login types. For example, "2" is a sign-on at a keyboard, "3" is a connection to a shared folder on the device from elsewhere in the network, etc. Of particular interest for monitoring inbound brute force attacks are logon types "3" and "10". The count_t represents a total number of failed logins at time t of the specified login type. In some embodiments, the count is determined across usernames and IP addresses.

[0047] A second signal is a categorical time series of failure reason (e.g. Event ID 4625) per device per t per logon type. For each device in a network, y_t = [("2:%%2309", count_t), ("3:%%2313", count_t), ..., ("11:%%2304", count_t)], where each entry of the time series y_t includes the logon type (e.g. "2:", "3:", "11:"), a failure reason code (e.g. "%%2309"), and a count. For example, the failure reason %%2308 describes that the users attempting to login have not been granted access on the device. The count_t term represents a total number of failed logins at time t for the specified logon type and failure reason. In some embodiments, the total number of failed logins is counted across usernames. Some embodiments generate a time series of count_t per IP address. In some embodiments, a time series is generated that aggregates count_t values for a group of IP addresses having an associated reputation or anomaly score that meets a predetermined criterion (e.g., reputation or score is less than a predetermined threshold).

[0048] A third signal in some embodiments is a time series of aggregated failed login counts across multiple login types and/or multiple login failure reasons. Thus, these embodiments contemplate a third signal comprised of one or more combinations of possible login types and one or more combinations of possible login failure reasons. In some embodiments, this signal is generated on a per device basis. For each device in a network and set of logon type and failure reason time series, y_t = a count of the total number of failed logins at time t across usernames and IP addresses. For example, y_t can be counts summed across possible login types 3 and 10, and login failure reasons in (%%2308, %%2312, %%2311, %%2313).

[0049] A fourth signal in some embodiments is a cumulative count of failed logins per device per T. The cumulative count is filtered by one or more logon types and login failure reasons. For each device in a network and the filtered logon type(s) and failure reason(s), the time series y_T = ∑_{t∈T} y_t, where y_t is a count of a failed login signal.

[0050] A fifth signal in some embodiments is a cumulative count of failed logins per device per T across a subset of one or more logon types and failure reasons. For each device in a network and set of logon type and failure reason time series, y_T = ∑_{t∈T} y_t, where y_t is the count of a failed login signal. Thus, a difference between the fourth and fifth signals is that the fifth signal includes a subset of logon types relative to the fourth signal. For example, in some embodiments, the fifth signal indicates login failures having reasons associated with new (or unknown) usernames, and a logon type indicating a network connection (e.g. the login attempt was initiated from outside the monitored network or at least not via a console of the machine).

[0051] A sixth signal in some embodiments is a total count of remote desktop protocol (RDP) inbound public source addresses per device per time t. In other words, the sixth signal represents, in some embodiments, how many different source addresses are attempting RDP connections on a device during the time period t. Thus, the sixth signal provides, for each device in a network, RDP source addresses that are public. Thus, the sixth signal can be defined, in some embodiments, as y_t = a count of total distinct RDP source IP addresses per t.

[0052] A seventh signal in some embodiments is a cumulative count of RDP inbound public source addresses per device per T. For each device in a network and RDP network connection, y_T = ∑_{t∈T} y_t, where y_t is the count of RDP inbound connections. The seventh signal accumulates the count for a longer period of time than the sixth signal. For example, whereas the sixth signal accumulates a count for a period of one hour in some embodiments, the seventh signal accumulates the count for a period of one, two, or three days, or even longer.

[0053] An eighth signal in some embodiments is a cumulative failed and successful sign-in rate per username per time period t. In some embodiments, this eighth signal is generated per device and per each username used in an attempted login. Thus, in some embodiments, let y_T = (y_{T,fail}, y_{T,success}) be the bivariate cumulative number of failed and successful logins cumulated across each time window up to T, across all devices in the network. Let U_T be the count of distinct usernames on a device where y_{T,success} = 0, that is, where the user has never successfully authenticated to any device in the network. Thus, in some embodiments, the eighth signal represents a time series of a cumulative count of usernames attempting to login to a device but never successfully doing so.

[0054] A ninth signal in some embodiments is a username's last successful sign-in time per device per T. Per each device and per each username, let y_T = (T_1, ..., T_k) be the k most recent distinct times with at least one successful login. If no such time exists, this time series is set to empty.

[0055] A tenth signal in some embodiments characterizes IP address abuse per T. Per each IP address, if available, let y_{T,d} be the reputation score of the IP address associated with suspicious activity (brute force, scanning, etc.) over the time period T and the last d days. Otherwise, y_{T,d} is a multivariate time series consisting of the following:

• Region associated with the source IP address. Some embodiments assign a reputation score to geographic regions. A reputation of a source IP address is determined, in some embodiments, based at least in part on a reputation associated with its region of origin; in these embodiments, a second component of the reputation is based on the source IP address itself. The region of a source IP address is weighted more heavily, in some embodiments, when the history associated with the source IP address is less robust. In other words, if reputation information associated with the source IP address meets a first criterion, a first weight is assigned to a reputation of the source IP address region, and a second weight is assigned to the reputation of the source IP address itself; an overall reputation is then determined based on the first weight and second weight. If the source IP address reputation meets a second criterion, a third weight is assigned to the reputation of the source IP address region, and a fourth weight is assigned to the reputation of the source IP address itself. The third weight is lower than the first weight in some embodiments, and the fourth weight is higher than the second weight in some embodiments. A sketch of this weighting appears after this list.

• A count of devices with multiple inbound RDP connections and failed logins.

• A count of enterprises where at least one device had multiple inbound RDP connection and failed logins.

• A count of distinct IP addresses that also occurred during an RDP connection with a failed login over the time window on a given device.
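The following Python sketch illustrates the region/IP reputation weighting described in the first bullet of this list. The weight values and the history cutoff are illustrative assumptions; the disclosure states only that the region is weighted more heavily when the IP's own history is less robust.

def blended_reputation(ip_score, region_score, ip_history_events,
                       robust_history=50):
    """Blend an IP's own reputation with its region's reputation.

    The weights shift toward the region when little history exists for the
    IP itself. The cutoff and weights below are assumed values.
    """
    if ip_history_events < robust_history:  # sparse history: first criterion
        w_region, w_ip = 0.7, 0.3
    else:                                   # robust history: second criterion
        w_region, w_ip = 0.2, 0.8           # lower region weight, higher IP weight
    return w_region * region_score + w_ip * ip_score


# Example: a rarely seen IP from a high-abuse region scores worse (higher)
# than a well-known IP with the same intrinsic score.
print(blended_reputation(0.2, 0.9, ip_history_events=3))    # 0.69
print(blended_reputation(0.2, 0.9, ip_history_events=500))  # 0.34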

[0056] For many cybersecurity problems, including detecting brute force attacks, previously labeled data is not usually available. Thus, training a supervised learning model is not feasible. This is where unsupervised learning is helpful, enabling one to discover and quantify unknown behaviors when examples are too sparse. Given that several of the signals we consider for modeling RDP brute force attacks are inherently dependent on values observed over time (for example, daily counts of failed logins and counts of inbound connections), time series models are particularly beneficial. Specifically, time series anomaly detection naturally provides a logical framework to quantify uncertainty in modeling temporal changes in data and produce probabilities that then can be ranked and compared to a predetermined threshold to control a desirable false positive rate.

[0057] Time series anomaly detection captures the temporal dynamics of signals and accurately quantifies the probability of observing values at any point in time under normal operating conditions. More formally, if we introduce the notation Y(t) to denote the signals taking on values at time t, then we build a model to compute reliable estimates of the probability of Y(t) exceeding observed values given all known and relevant information, represented by P[y(t)], sometimes called an anomaly score. Given a false positive tolerance rate r (e.g., 0.1%, or 1 out of 10,000 per time unit), for each time t, values y*(t) satisfying P[y*(t)] < r would be detected as anomalous. Assuming the right signals reflecting the relevant behaviors of the type of attacks are chosen, the idea is simple: the lowest anomaly scores occurring per time will likely be associated with the highest likelihood of real threats.
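A minimal sketch of this thresholding idea follows. The Poisson history is a stand-in for samples drawn from the fitted models described later, and all numbers are illustrative.

import numpy as np


def right_tail_p_value(history, observed):
    """Empirical P[Y >= observed], estimated from historical or sampled values."""
    return (np.asarray(history) >= observed).mean()


rng = np.random.default_rng(0)
history = rng.poisson(3, size=10_000)  # stand-in for model samples of y(t)

r = 0.001      # false positive tolerance rate (0.1%)
observed = 25  # today's failed-login count
if right_tail_p_value(history, observed) < r:
    print("anomalous: candidate brute force activity")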

[0058] For example, with respect to FIG. 2B discussed above, the time series of the daily count of failed logins occurring on the brute force attack day 8/4/2019 had extreme values that would be associated with an empirical probability of about 0.03% out of all machines and days with at least 1 failed network login for the enterprise.

[0059] As discussed earlier, applying anomaly detection to a single or a few signals to detect real attacks can yield too many false positives. To mitigate this, we combined anomaly scores across the eight signals we selected to model RDP brute force attack patterns. The details of our solution are included in the appendix, but in summary, our methodology involves:

1) updating statistical discrete time series models sequentially for each signal, capturing time of day, day of week, and both point and cumulative effects

2) combining anomaly scores using an approach that yields accurate probability estimates, and

3) ranking the top N anomalies per day to control a desired number of false positives.

Our approach to time series anomaly detection is computationally efficient and automatically learns how to update probabilities and adapt to changes in data.

[0060] FIG. 6 shows a daily count of network failed logins for a machine with no brute force attack. Parametric discrete location/scale distributions do not generate well-calibrated p-values for rare time series as seen in FIG. 6, and thus if used to detect anomalies can result in too many false positives when looking across many machines at high time frequencies. To overcome this challenge of sparse time series of counts of failed login attempts and RDP inbound public connections, some embodiments utilize a mixture model. In some of these embodiments, a zero-inflated two-component negative binomial distribution was utilized.

[0061] This formulation is based on thresholding values which are used to select a distribution to compare to the time series. An example of selecting a distribution is provided via Equation 1 below. The comparison of multiple distributions to a single time series based on thresholds captures the multi-modal and heavy-tailed behavior of operational parameter values in a computationally efficient way. This represents an improvement over other approaches, such as those that utilize expectation maximization.

[0062] In some embodiments, hierarchical priors are given from empirical estimates of the sample moments across machines using about one month of data.

Equation 1: Zero-Adjusted Negative Binomial Threshold Model

Let y_t be a univariate time series corresponding to one of the signals of brute force.

Negative Binomial Distribution (NB):

[0063] This formulation does not yield a conjugate prior, and thus probabilities are not computed directly from the posterior predictive density. Instead, anomaly scores are generated by drawing samples from the distributions and then computing the empirical right-tail p-value.
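A brief sketch of that sampling-based scoring follows, using a single negative binomial component for brevity; the disclosed models are zero-adjusted two-component mixtures. The (mu, alpha) mean/dispersion parameterization and its mapping to scipy's (n, p) are assumptions.

import numpy as np
from scipy import stats


def nb_anomaly_score(y_observed, mu, alpha, n_samples=100_000, seed=0):
    """Empirical right-tail p-value of y_observed under NB(mu, alpha)."""
    rng = np.random.default_rng(seed)
    n = 1.0 / alpha   # scipy's size parameter
    p = n / (n + mu)  # scipy's success probability
    samples = stats.nbinom.rvs(n, p, size=n_samples, random_state=rng)
    return (samples >= y_observed).mean()


# An extreme count relative to a mean of 4 yields a tiny anomaly score.
print(nb_anomaly_score(y_observed=40, mu=4.0, alpha=0.5))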

[0064] Updating parameters is done by applying exponential smoothing, a sequential and computationally efficient method capable of weighting more recent data. To avoid skewing of estimates by outlier data, such as machines under brute force or other attacks, trimming is applied to samples drawn from the distribution at a specified false positive rate (e.g. 0.1%). Algorithm 1 below demonstrates this.

[0065] The smoothing parameters were learned based on maximum likelihood estimation and then fixed during each new sequential update. To induce further uncertainty, bootstrapping across machines is done to produce a histogram of smoothing weights, and samples are drawn in accordance with their frequency. We found that weights concentrated away from 0 vary between 0.06% and 8% for over 90% of machines, thus leading to slow changes in the parameters. An extension using adaptive forgetting factors will be considered in future work to automatically learn how to correct smoothing in real time.

[0066] To update model parameters, some embodiments utilize Algorithm 1, discussed below. In Algorithm 1, for each model parameter θ_t, distribution of smoothing weights W, and new value y_{t+ε}, the following steps are performed:

1) Apply an appropriate link function g_θ to produce a continuous unbounded variable η_t := g_θ[θ_t];

2) Sample a smoothing weight w from W;

3) Update the additive parameter using exponential smoothing: η_{t+ε} := (1 − w)·η_t + w·g_θ(M_θ[y_{t+ε}]), where M_θ is the central moment corresponding to θ;

4) Update the parameter θ_{t+ε} = g_θ^{-1}[η_{t+ε}].
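The following Python sketch illustrates one pass of this update for a positive mean parameter. It is a sketch under stated assumptions: the log link is one common choice (the algorithm leaves g_θ generic), the central moment M_θ is taken as the identity for a mean parameter, and step 2's sampling of w from the bootstrap histogram is replaced by a fixed weight.

import numpy as np


def update_parameter(theta_t, y_new, w, link=np.log, inv_link=np.exp):
    """One Algorithm-1-style update of a positive distribution parameter.

    Assumes y_new > 0 under the log link; zero counts would need an offset
    or a different link function.
    """
    eta_t = link(theta_t)                        # 1) map to an unbounded scale
    eta_new = (1 - w) * eta_t + w * link(y_new)  # 3) exponential smoothing
    return inv_link(eta_new)                     # 4) map back to the parameter scale


# Slowly adapt a mean parameter with a small smoothing weight, mirroring the
# reported 0.06% to 8% weight range that leads to slow parameter changes.
mu = 3.0
for y in [4, 2, 5, 3]:
    mu = update_parameter(mu, y, w=0.02)
print(round(mu, 3))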

[0067] FIG. 7 shows example ranking of detected RDP inbound brute force attacks. For each machine detected with a probable brute force attack, each instance is assigned TP, FP, or unknown, and each TP is assigned a priority based on the severity of the attack. For a high-priority TP, a targeted attack notification is sent, in some embodiments, to the associated enterprise with details and recommendations regarding the active brute force attack; otherwise the machine is closely monitored until more information is available.

[0068] Some of the disclosed anomaly detection embodiments provide the extra capability of sending targeted attack notifications to organizations when attacks are detected. In many cases, these notifications are provided before the brute force attack succeeds or the actor is able to conduct further malicious behavior. Experimental results indicate that the average precision per day, that is, true positive rate, was approximately 93.7% at a conservative false positive rate of 1%.

[0069] Via selection of signals found to be highly associated with RDP brute force attacks, time series anomaly detection is very accurate in identifying real threats. Monitoring suspicious activity in failed logins and network connections should be taken seriously, and real time anomaly detection capable of self-updating with the changing dynamics in a network can indeed provide a sustainable solution.

[0070] Some embodiments employ hierarchical zero adjusted negative binomial dynamic threshold models to capture the characteristics of the highly discrete count time series. Specifically, as shown in FIG. 2B, many examples of brute force attacks do not demonstrate failed logins for valid credentials on a local machine, and hence, there are excess zeros that would not be explained by standard probability distributions such as the negative binomial. Also, the variance of non-zero counts can be much larger than the mean, where for example, valid scripts connecting via RDP can generate counts in the twenties or more over several minutes because of an outdated password. Moreover, given a combination of multiple users or scripts connecting to shared machines at the same time, this can generate more extreme counts at higher quantiles resulting in heavier tails as seen in FIG. 6.
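To make the zero adjustment concrete, the sketch below evaluates a hurdle-style zero-adjusted negative binomial PMF: a separate probability mass at zero, with the negative binomial renormalized over the positive counts. This is one standard construction and an assumption; the disclosed models are additionally hierarchical and dynamically updated.

from scipy import stats


def zero_adjusted_nb_pmf(y, pi0, n, p):
    """PMF with mass pi0 at zero and an NB(n, p) renormalized over y >= 1."""
    if y == 0:
        return pi0
    nb_positive = stats.nbinom.pmf(y, n, p) / (1.0 - stats.nbinom.pmf(0, n, p))
    return (1.0 - pi0) * nb_positive


# Excess zeros plus an overdispersed positive part (variance > mean).
print(zero_adjusted_nb_pmf(0, pi0=0.6, n=2.0, p=1 / 3))  # 0.6
print(zero_adjusted_nb_pmf(5, pi0=0.6, n=2.0, p=1 / 3))  # ~0.04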

[0071] FIG. 8 is a diagram showing data flow occurring in one or more of the disclosed embodiments. Data flow 800 of FIG. 8 shows operation of two separate mixture models operating on two separate time series 802A and 802B. Each model includes a plurality of distributions (e.g. 804A-C in mixture model 1, and 804D-F in mixture model 2). Each mixture model also includes threshold values defining which of the plurality of distributions is applied to time series values (shown as value ranges 805A-C and 805D-F). Each mixture model also includes a threshold mixture component which compares current time series values to the selected distribution. Each of the mixture models generates an indication of anomaly, shown in FIG. 8 as 812A and 812B. These indications of anomaly are then combined by a combiner 814 to generate a single indication of anomaly or anomaly score 816.

[0072] In particular, data flow 800 includes a plurality of time series 802A and 802B. Each of the time series 802A and 802B indicates operational parameter values. Each of the time series 802A and 802B is provided to a selection component 803A and 803B respectively. The selection components 803A-B select between a plurality of distributions based on operational parameter values of the respective time series and a range associated with each distribution in the plurality of distributions. For example, FIG. 8 shows value ranges 805A-C associated with distributions 804A-C respectively. FIG. 8 also shows ranges 805D-F associated with distributions 804D-F respectively. The selection component 803A selects one of distributions 804A-C for time series 802A based on value ranges 805A-C. Selection component 803B selects one of distributions 804D-F for time series 802B based on ranges 805D-F. The value ranges 805A-C and 805D-F are dynamically updated by some embodiments of this disclosure. For example, some embodiments update the value ranges 805A-C and 805D-F based on values of the time series 802A and 802B respectively. This dynamic update process is discussed further below.

[0073] The selected distribution (809A for time series 802A, and distribution 809B for time series 802B) is provided to a respective threshold mixture model 808A-B. Each distribution P[Y | Θ] is modeled, in some embodiments, as a finite mixture of distributions. The distributions of the mixture are obtained from a list of distributions P_1[Y | Θ_1], ..., P_m[Y | Θ_m], where each parameter Θ_p is a stochastic process.

[0074] Each threshold mixture model 808A-B compares the respective time series to its selected distribution (809A for time series 802A and selected distribution 809B for time series 802B), and outputs an indication of anomaly, shown as 812A-B respectively. The indications of anomaly 812A-B are combined by a combiner 814 into a single indication of anomaly or anomaly score 816. In some embodiments, the combiner 814 employs Fisher's method, as discussed further below, to combine the indications of anomaly 812A-B. In some embodiments, the single indication of anomaly or anomaly score 816 indicates whether a network is experiencing a brute force attack.
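As one way to realize the combiner 814, Fisher's method can be applied directly to the per-signal right-tail p-values. A minimal sketch follows; the example p-values are illustrative, and the chi-squared combination assumes the per-signal scores are independent and uniform under the null.

import numpy as np
from scipy import stats


def fisher_combine(p_values):
    """Combine per-signal p-values: X = -2 * sum(ln p_i) ~ chi-squared(2k)."""
    p = np.clip(np.asarray(p_values, dtype=float), 1e-300, 1.0)  # guard log(0)
    x = -2.0 * np.sum(np.log(p))
    return stats.chi2.sf(x, df=2 * len(p))


# Per-signal anomaly indications for one machine (illustrative values).
print(fisher_combine([0.004, 0.03, 0.6, 0.02]))  # combined anomaly score ~7e-4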

[0075] While FIG. 8 shows two time series, two sets of distributions (e.g. 804A-C and 804D-F), two selectors (803A-B), and two threshold mixture models (e.g. 808A-B), some of the disclosed embodiments rely on more than two of each of these components. For example, as discussed above, any number of time series (e.g. input signals), such as three (3), four (4), five (5), ten (10), fifteen (15), or twenty (20), may be processed, compared, and combined (e.g. via the combiner 814) in the contemplated embodiments.

[0076] FIG. 8 shows two groups of distributions as 804A-C and 804D-F. FIG. 8 demonstrates a selection of one of distributions 804A-C to compare to time series 802A, and a second selection of one of distributions 804D-F to compare to time series 802B. In some embodiments, the two groups of distributions 804A-C and 804D-F are equivalent. In some embodiments, the two groups of distributions 804A-C and 804D-F overlap. In some embodiments, the two groups of distributions 804A-C and 804D-F overlap, but are not equivalent. This can also be the case for more than two groups of distributions, such as three (3), four (4), five (5), ten (10), or twenty (20) distribution groups.

[0077] FIG. 8 also shows an updater 851. The updater 851 monitors operational parameter values in one or more of the time series 802A or the time series 802B (this is not shown in FIG. 8). As illustrated, the updater 851 is shown dynamically updating the mixture model for the time series 802A based on the monitored operational parameter values. In particular, the updater 851 is shown generating new parameter values Θ for each of the distributions 804A-C included in a particular mixture model. These parameter values are represented in FIG. 8 as data 826 via the symbol Θ. The updater 851 also updates threshold values that define boundaries between distributions of the mixture model. For example, the threshold values define boundaries between distributions 804A-C. These boundaries are represented in FIG. 8 by the different value ranges 805A-C. FIG. 8 shows updated values for τ via data flow 824 defining the value ranges 805A-C. The updater 851 also updates two-state homogeneous hidden Markov models, represented by the symbol "C" in FIG. 8, via data flow 822. While FIG. 8 shows the updater 851 being applied only to the top mixture model that is processing time series 802A, in at least some embodiments, the updater is similarly applied to two or more mixture models utilized to detect a brute force attack. For example, the updater 851 could also be applied to update the second mixture model that processes the time series 802B; however, this has been omitted from FIG. 8 to preserve figure clarity. Specific methods utilized by the updater 851 are discussed in more detail below.

[0078] FIG. 9 shows an example data flow implemented in one or more of the disclosed embodiments. The data flow 850 of FIG. 9 illustrates how the updater 851 processes data in one or more of the disclosed embodiments. FIG. 9 shows a time series of operational parameter values 852 flowing into a historical database 854. Historical time series data 857 is then provided to the updater 851. Based on the historical time series data, the updater generates data values 858 (τ), 860 (C), and 862 (Θ). Data value 862 is a parameter at least partially defining a distribution of the historical time series data 857. In some embodiments, multiple parameter values are generated. Data value 860 is a two-state homogeneous hidden Markov model, defined by a probability that the time series data (e.g. 857) is greater than the value τ of data value 858. Referring back to FIG. 8, data values 858, 860, and 862 generated by the updater 851 are shown flowing to control the value ranges 805A-C, the distributions 804A-C, and the threshold mixture models 808A-B.

[0079] FIG. 9 shows the τ values in data value 858, two-state Markov model values in data value 860, and parameter values in data value 862 being stored in a mixture model parameters data store 864. The mixture model parameters data store 864 is referenced, in some embodiments, by a scoring process 866. The scoring process 866 implements at least portions of data flow 800, discussed above with respect to FIG. 8.

[0080] The updater 851 operates, in at least some embodiments, on multiple time series obtained by the disclosed embodiments and multiple corresponding mixture models. One implementation of the updater 851 is discussed below with respect to FIG. 11.

[0081] FIG. 10 is a flowchart of a process for determining whether a network is experiencing a brute force attack. In some embodiments, one or more of the functions discussed below with respect to FIG. 10 are performed by hardware processing circuitry (e.g. 1202). In some embodiments, instructions (e.g. 1224) stored in a memory (e.g. 1204, 1206) configure the hardware processing circuitry to perform one or more of the functions discussed below. In some embodiments, process 1000 discussed below is performed by a network management system or a network security system (e.g. 110).

[0082] After start operation 1005, process 1000 moves to operation 1010, where a first time series of operational parameter values is received. The first time series of operational parameter values relates to operation of a device attached to a network. For example, as discussed above with respect to FIG. 1, operational parameter values of one or more of the network components 102A-F are provided to the network security system 110. In some embodiments, operation 1010 includes receiving one or more messages indicating the operational parameter values (e.g. any one or more of messages 122A-C). In various embodiments, multiple different time series of different operational parameter values are received. The multiple different time series are received from a single device and/or multiple devices in some embodiments. For example, as discussed above, one or more network component devices provide or transmit time series indicating operational parameter values to a network management system or network security system (e.g. 110). The received time series of operational parameter values indicate one or more of failed login and RDP connections per hour of day and day of week, time delays between one or more failed logins and a successful logon, logon type (e.g. Event ID 4625), failure reason (e.g. Event ID 4625), a cumulative count of each distinct username that failed to log in without a subsequent success, a count (and cumulative count) of failed logins, and a count (and cumulative count) of RDP external IP addresses (e.g. for inbound RDP connections). These time series of operational parameter values are discussed in more detail above.
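For illustration only, the following Python sketch shows one way such time series could be assembled from raw logon events. The event tuple layout, the field names, and the hourly bucketing are assumptions made for this sketch and are not part of the disclosure.

```python
# Illustrative sketch only: derive hourly failed-login and distinct-username
# time series from raw logon events. The event layout here is hypothetical.
from collections import defaultdict
from datetime import datetime

events = [
    # (timestamp, username, succeeded)
    (datetime(2020, 12, 11, 3, 14), "admin", False),
    (datetime(2020, 12, 11, 3, 15), "admin", False),
    (datetime(2020, 12, 11, 3, 59), "guest", False),
    (datetime(2020, 12, 11, 4, 2), "alice", True),
]

failed_per_hour = defaultdict(int)   # count of failed logins per hour bucket
users_per_hour = defaultdict(set)    # distinct failing usernames per hour bucket
for ts, user, ok in events:
    if not ok:
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        failed_per_hour[bucket] += 1
        users_per_hour[bucket].add(user)

# Hourly time series y_t consumed by the mixture-model scoring described herein.
y = [failed_per_hour[b] for b in sorted(failed_per_hour)]
```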

[0083] In operation 1020, the operational parameter values in the first time series are compared to a first parameter value range and a second parameter value range. For example, as discussed above, some of the disclosed embodiments employ a multi-modal distribution (mixture model) approach to analyzing operational parameter values.

[0084] For example, a plurality of time series of operational parameter values received in operation 1010 can be represented via Y_1, ..., Y_k. Each time series is a subset of time series from a given data generating process P[Y_i|θ_i], where each P[·|θ_i] can be a different probability distribution family. Each distribution is modeled as a finite mixture of distributions specified from a list of distributions P_1[Y|θ_1], ..., P_m[Y|θ_m], where each parameter θ_p is a stochastic process.

[0085] Some of the disclosed embodiments utilize a switching mechanism that is based on learned threshold values of Y exceeding specified quantiles. The quantiles and thresholds are defined, in at least some embodiments, as follows:

Let q_p ∈ (0, 1) be a chosen q-th quantile Q_p of P_p[Y|θ_p], and define τ_p := E[Y | Y > Q_{p-1}], such that τ_p < τ_{p+1}.

Let C_p be a two-state homogeneous hidden Markov model, defined by the probability that Y ≥ τ_p.
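As a non-authoritative illustration of the threshold definition above, the following Python sketch computes each τ_p as the mean of the observations exceeding the chosen quantile Q_{p-1}. The quantile levels and sample data are assumed inputs, not values from the disclosure.

```python
# Minimal sketch: tau_p is the conditional mean of observations exceeding
# the (p-1)-th chosen quantile, i.e. tau_p = E[Y | Y > Q_{p-1}].
import numpy as np

def thresholds(y, q_levels):
    """Return tau_1..tau_m for quantile levels q_1..q_m in (0, 1)."""
    y = np.asarray(y, dtype=float)
    taus = []
    for q in q_levels:
        Q = np.quantile(y, q)                  # Q_{p-1}: the chosen quantile
        tail = y[y > Q]                        # observations with Y > Q_{p-1}
        taus.append(tail.mean() if tail.size else Q)
    return taus

taus = thresholds([2, 3, 1, 40, 4, 2, 55, 3], q_levels=[0.5, 0.9])
```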

[0086] The dynamic threshold mixture density is defined in some embodiments as follows:

P_thresh[Y | Φ = (Θ, τ, C)] := P_1[Y|θ_1] · P[C_1]   if Y ≤ τ_1;
                               P_2[Y|θ_2] · P[C_2]   if τ_1 < Y ≤ τ_2;   (1)
                               ...
                               P_m[Y|θ_m] · P[C_m]   otherwise.
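A minimal sketch of Equation (1) follows, assuming placeholder component families (Poisson and negative binomial), placeholder state probabilities P[C_p], and placeholder thresholds. It illustrates only the piecewise selection logic, not any particular claimed implementation.

```python
# Sketch of the dynamic threshold mixture density: pick the component whose
# value range contains y and weight its mass by the hidden-state probability.
from scipy import stats

components = [stats.poisson(3), stats.nbinom(5, 0.3), stats.nbinom(2, 0.05)]
state_prob = [0.90, 0.08, 0.02]   # P[C_p] from the two-state HMMs (assumed)
taus = [10, 60]                   # tau_1 < tau_2 from the updater (assumed)

def p_thresh(y):
    """P_thresh[y]: mass of the component selected by y's value range."""
    if y <= taus[0]:
        p = 0
    elif y <= taus[1]:
        p = 1
    else:
        p = 2
    return components[p].pmf(y) * state_prob[p]

print(p_thresh(4), p_thresh(25), p_thresh(90))
```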

[0087] Thus, in operation 1030, from a plurality of possible distributions to apply to the time series, one distribution is selected based on a parameter value range of the time series. The parameter value ranges for each of the possible distributions are dynamically adjusted in some embodiments, as discussed further below (e.g. with respect to FIG. 11). Thus, in operation 1030, a first distribution from the plurality of possible distributions is selected based on the comparisons. For example, if the time series values are generally within the first parameter value range, a first distribution is selected, whereas if the time series values are generally within the second parameter value range, a second distribution is selected, at least in some embodiments. If the first distribution is selected, the selected first distribution is then used to determine whether the time series represents anomalous activity, or a degree of anomalous activity present in the time series.

[0088] In some embodiments, time series values are generally within the first parameter value range if the values fall within the first parameter value range after outliers or exceptional values are eliminated or not considered. For example, some embodiments determine a percentage of time series values that fall within each of the possible parameter value ranges, with the range including the highest percentage of values used to select the appropriate distribution.
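One possible reading of the "generally within" test above is sketched below in Python; the majority-count rule is an assumption consistent with the percentage-based description, not the only realization contemplated.

```python
# Sketch: count the share of samples landing in each value range and select
# the range (and hence the distribution) holding the largest share.
import numpy as np

def select_range(y, taus):
    """Return the index of the value range containing most of y."""
    y = np.asarray(y)
    edges = [-np.inf] + list(taus) + [np.inf]
    counts = [np.sum((y > lo) & (y <= hi))
              for lo, hi in zip(edges[:-1], edges[1:])]
    return int(np.argmax(counts))   # range with the highest percentage of values

idx = select_range([2, 3, 1, 4, 2, 90], taus=[10, 60])   # -> 0 (first range)
```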

[0089] In operation 1040, the first time series is compared to the selected distribution. For example, in some embodiments, a percentage of operational parameter values that conform with the selected distribution is determined. In embodiments that process a plurality of time series, values of each time series are compared to a corresponding distribution selected based on the respective time series (e.g. selected distribution 809A is compared to time series 802A and selected distribution 809B is compared to time series 802B in FIG. 8).

[0090] As discussed above, process 1000 compares values of one or more time series (e.g. time series 802A and time series 802B) to a corresponding set of parameter value ranges (e.g. value ranges 805A-C and value ranges 805D-F respectively), or a first parameter value range and second parameter value range for that particular time series. Distributions specific to each time series are then selected based on the range within which the values of the respective time series fell (e.g., as illustrated in FIG. 8, a plurality of distributions 804A-C are considered for selection based on corresponding value ranges 805A-C and values of the time series 802A).

[0091] In operation 1050, an indication of anomaly is determined based on the comparison of the first time series with the selected distribution (e.g., as performed when the threshold mixture model 808A compares time series 802A to selected distribution 809 A). Some embodiments determine a percentage of operational parameter values of the first time series that fall outside the selected distribution. The percentage is the indication of anomaly. In other embodiments, a probability of occurrence of each value in the time series is determined based on the selected distribution. These probabilities are then aggregated to determine the indication of anomaly. In some embodiments, the aggregation is performed by multiplying the probability of occurrence of each of the values. The resulting aggregated probability is then normalized in some embodiments.
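The probability-aggregation variant of operation 1050 can be sketched as follows. The use of tail probabilities, log-space multiplication, and geometric-mean normalization are assumptions made here for numerical stability and illustration; the disclosure leaves the exact normalization open.

```python
# Sketch: score each observation under the selected distribution, multiply
# the probabilities (in log space), and normalize by the series length.
import numpy as np
from scipy import stats

def anomaly_indication(y, dist):
    """Aggregate per-sample tail probabilities into one anomaly score."""
    y = np.asarray(y)
    tail_p = dist.sf(y - 1)   # for discrete dist: sf(y-1) = P[Y >= y]
    log_joint = np.sum(np.log(np.clip(tail_p, 1e-12, 1.0)))
    return float(np.exp(log_joint / len(y)))   # geometric-mean normalization

score = anomaly_indication([40, 52, 61], stats.poisson(5))  # near 0: anomalous
```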

[0092] Note that some embodiments of process 1000 operate on multiple different time series of operational parameter values, and multiple mixture models. For example, as described above with respect to FIG. 8, some embodiments combine a plurality of anomaly indications (e.g. 812A and 812B) generated by a corresponding plurality of mixture models via a combination process (e.g. combiner 814) to generate a combined indicator of anomaly, or in other words, a single indication of anomaly or anomaly score (e.g. 816). Some embodiments utilize Fisher’s method to combine the plurality of indications of anomaly into a single anomaly indication. For example, some embodiments operate as follows: Let α_{T,1}, ..., α_{T,N} be the collection of available scores. These embodiments then compute χ_T = -2 Σ_{i=1..N} ln(α_{T,i}), where χ_T is the combined indicator of anomaly for purposes of the discussion of process 1000. Fisher showed that if each α_{T,i} is independent with a uniform distribution between 0 and 1, then χ_T has a chi-square distribution with 2N degrees of freedom. The use of Fisher’s test applied to anomaly scores produces a scalable solution that yields interpretable probabilities that can thus be controlled to achieve a desired false positive rate.
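A minimal sketch of this Fisher's-method combination: the statistic χ_T = -2 Σ ln(α_{T,i}) is referred to a chi-square distribution with 2N degrees of freedom to obtain an interpretable combined probability.

```python
# Sketch of Fisher's method: combine per-series anomaly scores (p-values)
# into a single combined p-value via the chi-square survival function.
import numpy as np
from scipy.stats import chi2

def fisher_combine(scores):
    """Return the combined indicator for independent uniform(0,1) scores."""
    scores = np.clip(np.asarray(scores, dtype=float), 1e-12, 1.0)
    chi_t = -2.0 * np.sum(np.log(scores))       # Fisher's statistic chi_T
    return chi2.sf(chi_t, df=2 * len(scores))   # combined p-value, 2N dof

combined = fisher_combine([0.01, 0.04, 0.20])   # small value => likely attack
```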

[0093] In operation 1060, based on the indication of anomaly, a likelihood of a brute force attack on the network is determined. For example, in some embodiments, if the indication of anomaly is above a predetermined threshold, process 1000 determines that the network is experiencing a brute force attack. Some embodiments maintain a moving average of indications of anomaly, and determine the network is experiencing a brute force attack if the moving average of anomaly indications transgresses a predetermined threshold. [0094] Upon determination that a brute force attack is likely occurring, additional forensics processing is performed in some embodiments to identify one or more offending computers that are performing the brute force attack. These identified computers are located outside the network (e.g. external devices 104A-C), or in some embodiments, may be internal to the network (e.g. in some cases, malware installed on a computer within the network may provide a “proxy” for a malicious actor to perform a brute force attack on the network).
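The moving-average variant of paragraph [0093] might be sketched as follows, assuming an anomaly indication scaled so that larger values indicate greater anomaly (e.g. one minus a combined p-value). The window size and threshold are assumptions for illustration.

```python
# Sketch: keep a rolling window of anomaly indications and flag an attack
# when the window mean transgresses a predetermined threshold.
from collections import deque

class MovingAverageDetector:
    def __init__(self, window=12, threshold=0.95):
        self.buf = deque(maxlen=window)   # most recent anomaly indications
        self.threshold = threshold

    def update(self, anomaly_indication):
        """Return True when the averaged indication signals a brute force attack."""
        self.buf.append(anomaly_indication)
        return sum(self.buf) / len(self.buf) > self.threshold

detector = MovingAverageDetector()
alarm = detector.update(0.99)
```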

[0095] In operation 1070, the brute force attack is mitigated based on the likelihood. Some embodiments mitigate a brute force attack by determining a mitigation action based on the likelihood. A mitigating action can include controlling network access in some embodiments. For example, in some embodiments, if the likelihood is higher than a predetermined threshold, one or more access control policies of a firewall (e.g. firewall 108) or multiple firewalls controlling access to a network (e.g. 101) may be adjusted. For example, more restrictive firewall policies may be applied during a brute force attack (e.g. initiated by the network security system 110). In some embodiments, packets received from devices identified as participating in the brute force attack (e.g. any one or more of external devices 104A-C) are dropped at an access point to the network (e.g. one or more firewalls, such as firewall 108). This is accomplished, in some embodiments, by modifying access policies of the firewalls to restrict access to the network by the identified devices. In some embodiments, one or more alerts are generated based on the likelihood. The alerts may be generated, in various embodiments, via text, email, or other messaging technology.
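A purely illustrative mitigation hook for operation 1070 is sketched below. The Firewall class and its block() method are hypothetical stand-ins for whatever policy interface a deployment exposes; nothing here names a real product API.

```python
# Sketch only: the Firewall interface below is hypothetical, not a real API.
class Firewall:
    def block(self, ip: str) -> None:
        # In a real deployment this would modify an access control policy.
        print(f"access policy updated: dropping packets from {ip}")

def mitigate(likelihood, offending_ips, firewall, threshold=0.99):
    """Apply a more restrictive policy when the attack likelihood is high."""
    if likelihood >= threshold:
        for ip in offending_ips:
            firewall.block(ip)   # drop traffic at the network access point

mitigate(0.995, ["203.0.113.7"], Firewall())
```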

[0096] In operation 1080, the first parameter value range and the second parameter value range (e.g. value range 805A and value range 805B) are updated based on the operational parameter values of the first time series (e.g. 802A). Updating the first parameter value range and the second parameter value range adjusts at least a boundary between the first parameter value range and second parameter value range. To the extent that a mixture model includes more than two distributions, multiple boundaries are adjusted in operation 1080.

[0097] For example, given samples y from P[Y|φ] (e.g. the first time series), estimation of the parameters φ proceeds in two stages, at least in some embodiments. In a first stage, parameters are initialized. In some embodiments, Gibbs sampling is used to initialize the parameters. This computes the conditional likelihood for each parameter under a specified choice of hierarchical priors. The first stage is performed before operation 1010 is performed in some embodiments. In a second stage, each parameter is updated using generalized exponential smoothing with a selected grid of smoothing weights α_φ for each parameter φ with a link function g_φ. Thus, for each parameter φ, the link function g_φ is applied to yield an additive term η_t = g_φ[φ_t]. A smoothing weight α_φ is then sampled from the grid. The additive parameter is updated according to: η_{t+1} = α_φ · M_φ[y_{t+1} | η_t] + (1 - α_φ) · η_t, where M_φ is the central moment corresponding to the parameter φ.

[0098] In some embodiments, the updated parameter is set to φ_{t+1} = g_φ^{-1}[η_{t+1}]. After the parameters of the distribution are updated, this yields an updated posterior distribution φ_{t+1}, and for each new sample y_{t+1}, a right-tail p-value is computed. In some embodiments, the right-tail p-value is computed according to:

π_{t+1} = ∫_{y ≥ y_{t+1}} P_thresh[y | φ_{t+1}] dμ(y)

where:

π_{t+1} is the right-tail p-value for the updated distribution,

μ is a probability measure (for discrete data, this is the counting measure; otherwise it is the Lebesgue measure),

φ_{t+1} is an updated posterior distribution,

y_{t+1} is a new sample value, and

P_thresh is the dynamic threshold mixture density defined in Equation (1), in which C_p is a two-state homogeneous hidden Markov model defined by the probability that Y ≥ τ_p, q_p ∈ (0, 1) is a chosen q-th quantile Q_p of P_p[Y|θ_p], and E[] is an expected value function. [0099] One embodiment of operation 1080 is described below with respect to FIG. 11.

After the operations discussed above complete, process 1000 moves to end operation 1085.
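Before turning to FIG. 11, the right-tail p-value of paragraph [0098] can be sketched for a single discrete component. The negative binomial family below is an assumption, consistent with the quantile definition in Example 5, and the survival function supplies the tail mass.

```python
# Sketch: right-tail p-value pi_{t+1} = P[Y >= y_{t+1}] under an updated
# discrete distribution, here an assumed negative binomial component.
from scipy.stats import nbinom

def right_tail_pvalue(y_new, n, p):
    """Return P[Y >= y_new]; sf(k) = P[Y > k], so sf(y_new - 1) = P[Y >= y_new]."""
    return nbinom.sf(y_new - 1, n, p)

pi = right_tail_pvalue(42, n=5, p=0.3)   # small value => anomalous new sample
```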

[00100] FIG. 11 is a flowchart of a process for estimating parameters of an unknown distribution. In some embodiments of the present disclosure, the unknown distribution is a distribution of events occurring on a monitored network. This unknown distribution is dynamically estimated by estimating parameters of the distribution. Based on this estimation process, the disclosed embodiments are able to determine a probability of individual events occurring within the distribution. A brute force attack is then determined, in at least some embodiments, based on the determined probability.

[00101] The unknown distribution is described by a plurality of parameters. Each parameter describes a characteristic of the unknown distribution. For example, the plurality of parameters describe, in various embodiments, two or more of a standard deviation, a mean, or a median of the distribution. In at least some embodiments, the distribution is modeled as a finite mixture of distributions specified from among a list of distributions P_1[Y|θ_1], ..., P_m[Y|θ_m], where each parameter θ_p is a stochastic process. [00102] Process 1100 discussed below with respect to FIG. 11 determines updated distribution parameters, including ranges for selecting a distribution as discussed above.

In some embodiments, process 1100 discussed below is performed by the updater 851 discussed above with respect to FIG. 9.

[00103] In some embodiments, process 1100 is performed for multiple time series obtained by the disclosed embodiments. For example, with respect to FIG. 8 discussed above, process 1100 is performed, in some embodiments, to estimate parameters of distributions for time series 802A, and separately performed to estimate parameters of distributions for time series 802B. A collection of time series is denoted as Y_1, ..., Y_k. Thus, each Y_i = { y_ij | j = 1..n_i } is a subset of time series from a given data generating process P[Y_i|θ_i], where each P[·|θ_i] can be a different probability distribution family. [00104] In some embodiments, one or more of the functions discussed below with respect to FIG. 11 are performed by hardware processing circuitry (e.g. 1202). In some embodiments, instructions (e.g. 1224) stored in a memory (e.g. 1204, 1206) configure the hardware processing circuitry to perform one or more of the functions discussed below. In some embodiments, process 1100 discussed below is performed by a network management system or a network security system (e.g. 110). In some embodiments, one or more of the functions discussed below with respect to FIG. 11 are performed as part of operation 1080, discussed above with respect to FIG. 10.

[00105] After start operation 1102, process 1100 moves to operation 1104, which initializes parameters of the distribution. In some embodiments, the parameters are estimated using Gibbs sampling. For example, some embodiments compute a conditional likelihood for each parameter under a selected set of hierarchical priors for the parameter. The parameter is then initialized based on the conditional likelihood.

[00106] In operation 1106, a parameter of the plurality of parameters is selected. In operation 1110, an additive parameter corresponding to the selected parameter is obtained via a link function. For example, in some embodiments, the additive parameter η_t is obtained via: η_t = g_φ[φ_t], where g_φ is a link function of the parameter φ.

[00107] In operation 1120, a smoothing weight is sampled. In operation 1130, the additive parameter is updated based on the smoothing weight and a central moment corresponding to the parameter. For example, in some embodiments, the additive parameter is updated via: η_{t+ε} = α_φ · M_φ[y_{t+ε} | η_t] + (1 - α_φ) · η_t, where α_φ is the smoothing weight, y_{t+ε} is the time series at time t+ε, and M_φ is a central moment corresponding to the parameter φ.

[00108] In operation 1140, the selected parameter is updated based on an inverse link function of the updated additive parameter. In some embodiments, the selected parameter is updated according to: φ_{t+ε} = g_φ^{-1}[η_{t+ε}], where φ_{t+ε} is the updated parameter, g_φ^{-1} is the inverse of the link function, and η_{t+ε} is the updated additive parameter.
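Operations 1110 through 1140 can be sketched end to end for a single positive mean parameter. The log link, the weight grid, and the use of the link-scale sample value as the moment term M_φ are assumptions for illustration only.

```python
# Sketch of one pass of the generalized exponential smoothing update,
# assuming a log link g and its inverse exp for a positive mean parameter.
import math
import random

def update_parameter(phi_t, y_next, weight_grid=(0.05, 0.1, 0.2)):
    g = math.log                         # link function g_phi (assumed)
    g_inv = math.exp                     # inverse link g_phi^{-1}
    eta_t = g(phi_t)                     # operation 1110: additive parameter
    alpha = random.choice(weight_grid)   # operation 1120: sample smoothing weight
    moment = g(max(y_next, 1e-9))        # assumed M_phi[y_{t+e} | eta_t] on link scale
    eta_next = alpha * moment + (1 - alpha) * eta_t   # operation 1130: update
    return g_inv(eta_next)               # operation 1140: back-transform

phi_updated = update_parameter(phi_t=5.0, y_next=8)
```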

[00109] Operation 1170 determines if there are additional parameters of the distribution. If so, processing returns to operation 1106. Otherwise, processing moves to end block 1185. [00110] Note that at least some embodiments of FIG. 11 function to define or otherwise update value ranges used to select a distribution in a multi-modal model. For example, as discussed above, boundaries delineating a first parameter value range and a second parameter value range are adjusted in some embodiments of process 1100 based on values of a time series. A likelihood of a brute force attack is then determined based on the adjusted boundary. As illustrated above with respect to FIG. 8, adjusting the boundary between value ranges controls which distribution is compared to the time series at any particular time. The comparison of the time series with the distribution determines how likely the values included in the time series are to occur within the selected distribution. This likelihood may be combined, in some embodiments, with other likelihoods of values from other time series (e.g. combined via Fisher’s method).

[00111] FIG. 12 illustrates a block diagram of an example machine 1200 upon which any one or more of the techniques (e.g., methodologies) discussed herein may perform. In alternative embodiments, the machine 1200 may operate as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine 1200 may operate in the capacity of a server machine, a client machine, or both in server-client network environments. In an example, the machine 1200 may act as a peer machine in a peer-to-peer (P2P) (or other distributed) network environment. The machine 1200 may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile telephone, a smart phone, a web appliance, a network router, switch or bridge, a server computer, a database, conference room equipment, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine. In various embodiments, machine 1200 may perform one or more of the processes described above with respect to FIGs. 1-11. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein, such as cloud computing, software as a service (SaaS), or other computer cluster configurations.

[00112] Examples, as described herein, may include, or may operate on, logic or a number of components, modules, or mechanisms (all referred to hereinafter as “modules”). Modules are tangible entities (e.g., hardware) capable of performing specified operations and may be configured or arranged in a certain manner. In an example, circuits may be arranged (e.g., internally or with respect to external entities such as other circuits) in a specified manner as a module. In an example, the whole or part of one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware processors may be configured by firmware or software (e.g., instructions, an application portion, or an application) as a module that operates to perform specified operations. In an example, the software may reside on a machine readable medium. In an example, the software, when executed by the underlying hardware of the module, causes the hardware to perform the specified operations.

[00113] Accordingly, the term “module” is understood to encompass a tangible entity, be that an entity that is physically constructed, specifically configured (e.g., hardwired), or temporarily (e.g., transitorily) configured (e.g., programmed) to operate in a specified manner or to perform part or all of any operation described herein. Considering examples in which modules are temporarily configured, each of the modules need not be instantiated at any one moment in time. For example, where the modules comprise a general-purpose hardware processor configured using software, the general-purpose hardware processor may be configured as respective different modules at different times. Software may accordingly configure a hardware processor, for example, to constitute a particular module at one instance of time and to constitute a different module at a different instance of time. [00114] Machine (e.g., computer system) 1200 may include a hardware processor 1202 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a main memory 1204 and a static memory 1206, some or all of which may communicate with each other via an interlink (e.g., bus) 1208. The machine 1200 may further include a display unit 1210, an alphanumeric input device 1212 (e.g., a keyboard), and a user interface (UI) navigation device 1214 (e.g., a mouse). In an example, the display unit 1210, input device 1212 and UI navigation device 1214 may be a touch screen display. The machine 1200 may additionally include a storage device (e.g., drive unit) 1216, a signal generation device 1218 (e.g., a speaker), a network interface device 1220, and one or more sensors 1221, such as a global positioning system (GPS) sensor, compass, accelerometer, or another sensor. The machine 1200 may include an output controller 1228, such as a serial (e.g., universal serial bus (USB)), parallel, or other wired or wireless (e.g., infrared (IR), near field communication (NFC), etc.) connection to communicate with or control one or more peripheral devices (e.g., a printer, card reader, etc.).

[00115] The storage device 1216 may include a machine readable medium 1222 on which is stored one or more sets of data structures or instructions 1224 (e.g., software) embodying or utilized by any one or more of the techniques or functions described herein. The instructions 1224 may also reside, completely or at least partially, within the main memory 1204, within static memory 1206, or within the hardware processor 1202 during execution thereof by the machine 1200. In an example, one or any combination of the hardware processor 1202, the main memory 1204, the static memory 1206, or the storage device 1216 may constitute machine readable media.

[00116] While the machine readable medium 1222 is illustrated as a single medium, the term “machine readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) configured to store the one or more instructions 1224. [00117] The term “machine readable medium” may include any medium that is capable of storing, encoding, or carrying instructions for execution by the machine 1200 and that cause the machine 1200 to perform any one or more of the techniques of the present disclosure, or that is capable of storing, encoding or carrying data structures used by or associated with such instructions. Non-limiting machine readable medium examples may include solid-state memories, and optical and magnetic media. Specific examples of machine readable media may include: non-volatile memory, such as semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; Random Access Memory (RAM); Solid State Drives (SSD); and CD-ROM and DVD-ROM disks. In some examples, machine readable media may include non-transitory machine readable media. In some examples, machine readable media may include machine readable media that is not a transitory propagating signal.

[00118] The instructions 1224 may further be transmitted or received over a communications network 1226 using a transmission medium via the network interface device 1220. The machine 1200 may communicate with one or more other machines utilizing any one of a number of transfer protocols (e.g., frame relay, internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), hypertext transfer protocol (HTTP), etc.). Example communication networks may include a local area network (LAN), a wide area network (WAN), a packet data network (e.g., the internet), mobile telephone networks (e.g., cellular networks), Plain Old Telephone (POTS) networks, and wireless data networks (e.g., Institute of Electrical and Electronics Engineers (IEEE) 802.11 family of standards known as Wi-Fi®, IEEE 802.16 family of standards known as WiMax®), IEEE 802.15.4 family of standards, a Long Term Evolution (LTE) family of standards, a Universal Mobile Telecommunications System (UMTS) family of standards, peer-to-peer (P2P) networks, among others. In an example, the network interface device 1220 may include one or more physical jacks (e.g., Ethernet, coaxial, or phone jacks) or one or more antennas to connect to the communications network 1226. In an example, the network interface device 1220 may include a plurality of antennas to wirelessly communicate using at least one of single-input multiple-output (SIMO), multiple-input multiple-output (MIMO), or multiple-input single-output (MISO) techniques. In some examples, the network interface device 1220 may wirelessly communicate using Multiple User MIMO techniques. [00119] Example 1 is a method performed by hardware processing circuitry, comprising: obtaining a first time series of operational parameter values of a device attached to a network; comparing the values of the first time series to a first parameter value range and a second parameter value range; determining, based on the comparing, that the values of the first time series are within the first parameter value range; based on the determining, selecting, from a plurality of distributions, a first distribution; comparing the first time series to the selected first distribution; determining, based on the comparing, a first probability at which values in the first time series occur in the selected distribution; determining, based on the first probability, a likelihood of a brute force attack on the network; based on the first time series, adjusting a boundary between the first parameter value range and the second parameter value range; and determining, based on the adjusted boundary, a second likelihood of a brute force attack.

[00120] In Example 2, the subject matter of Example 1 optionally includes performing, based on the likelihood, a mitigating action.

[00121] In Example 3, the subject matter of Example 2 optionally includes wherein the mitigating action includes changing an access control policy of the network.

[00122] In Example 4, the subject matter of Example 3 optionally includes wherein the changing of the access control policy comprises programmatically configuring a firewall of the network.

[00123] In Example 5, the subject matter of any one or more of Examples 1-4 optionally include wherein the adjusting of the boundary comprises updating a threshold value τ_p defining the boundary between the first range and the second range, τ_p defined according to: τ_p := E[Y | Y > Q_{p-1}], where: E[] is an expected value function, Y is the first time series, τ_p is a threshold value between the first range and the second range, and Q_p is a q-th quantile of a negative binomial distribution. [00124] In Example 6, the subject matter of any one or more of Examples 1-5 optionally include wherein the adjusting of the boundary further comprises updating parameters defining the first and second distributions based on the first time series.

[00125] In Example 7, the subject matter of any one or more of Examples 1-6 optionally include wherein the parameters are updated via exponential smoothing and a grid of smoothing weights.

[00126] In Example 8, the subject matter of Example 7 optionally includes wherein at least one parameter of the distribution is updated according to: φ_{t+ε} = g_φ^{-1}[η_{t+ε}], where: φ_{t+ε} is the updated parameter, g_φ is a link function for a parameter φ, and η_{t+ε} = α_φ · M_φ[y_{t+ε} | η_t] + (1 - α_φ) · η_t, where: M_φ is a central moment corresponding to the parameter φ, α_φ is a smoothing weight, η_t is g_φ[φ_t], and y_{t+ε} is a sample value included in the first time series.

[00127] In Example 9, the subject matter of Example undefined optionally includes wherein the first distribution and the second distribution are included in the finite mixture of distributions.

[00128] In Example 10, the subject matter of any one or more of Examples 1-9 optionally include determining a second time series for second operational parameter values of the device; selecting, based on the second time series, a second distribution; second comparing the second time series to the selected second distribution; determining, based on the second comparing, a second probability at which second values in the second time series occur in the selected second distribution; applying Fisher’s method to the first probability and the second probability; and based on the applying, generating a combined indicator of anomaly, wherein the determining of the likelihood of the brute force attack is further based on the combined indicator. [00129] In Example 11, the subject matter of Example 10 optionally includes wherein the first time series indicates a cumulative number of logins over time that are successful, and the second time series indicates a cumulative number of logins over time that are unsuccessful.

[00130] In Example 12, the subject matter of Example 11 optionally includes generating a third time series indicating user account information associated with each of the logins indicated by the first and second time series, and second applying Fisher’s method to the third time series, wherein the generating of the combined indicator is further based on the second applying.

[00131] In Example 13, the subject matter of any one or more of Examples 11-12 optionally include wherein the logins are remote desktop protocol (RDP) login attempts. [00132] In Example 14, the subject matter of any one or more of Examples 1-13 optionally include wherein the first time series indicates one or more of hour of day and day of week of failed login and RDP connections, timing of successful logins following failed attempts, login type information, login failure reasons, a cumulative count of unique usernames that generated an unsuccessful login without a subsequent successful login, a count of failed login attempts, a count of RDP external IP addresses per device, and counts of two devices having RDP inbound connections from a common device.

[00133] Example 15 is a non-transitory computer readable medium comprising instructions that when executed configure hardware processing circuitry to perform operations comprising: obtaining a first time series of operational parameter values of a device attached to a network; comparing the values of the first time series to a first parameter value range and a second parameter value range; determining, based on the comparing, that the values of the first time series are within the first parameter value range; based on the determining, selecting, from a plurality of distributions, a first distribution; comparing the first time series to the selected first distribution; determining, based on the comparing, a first probability at which values in the first time series occur in the selected distribution; determining, based on the first probability, a likelihood of a brute force attack on the network; based on the first time series, adjusting a boundary between the first parameter value range and the second parameter value range; and determining, based on the adjusted boundary, a second likelihood of a brute force attack.

[00134] In Example 16, the subject matter of Example 15 optionally includes the operations further comprising performing, based on the likelihood, a mitigating action. [00135] In Example 17, the subject matter of Example 16 optionally includes wherein the mitigating action includes changing an access control policy of the network.

[00136] In Example 18, the subject matter of Example 17 optionally includes wherein the changing of the access control policy comprises programmatically configuring a firewall of the network.

[00137] In Example 19, the subject matter of any one or more of Examples 15-18 optionally include wherein the adjusting of the boundary comprises updating a threshold value τ_p defining the boundary between the first range and the second range, τ_p defined according to: τ_p := E[Y | Y > Q_{p-1}], where: E[] is an expected value function, Y is the first time series, τ_p is a threshold value between the first range and the second range, and Q_p is a q-th quantile of a negative binomial distribution. [00138] In Example 20, the subject matter of any one or more of Examples 15-19 optionally include wherein the adjusting of the boundary further comprises updating parameters defining the first and second distributions based on the first time series. [00139] In Example 21, the subject matter of any one or more of Examples 15-20 optionally include wherein the parameters are updated via exponential smoothing and a grid of smoothing weights.

[00140] In Example 22, the subject matter of Example 21 optionally includes wherein at least one parameter of the distribution is updated according to: φ_{t+ε} = g_φ^{-1}[η_{t+ε}], where: φ_{t+ε} is the updated parameter, g_φ is a link function for a parameter φ, and η_{t+ε} = α_φ · M_φ[y_{t+ε} | η_t] + (1 - α_φ) · η_t, where: M_φ is a central moment corresponding to the parameter φ, α_φ is a smoothing weight, η_t is g_φ[φ_t], and y_{t+ε} is a sample value included in the first time series.

[00141] In Example 23, the subject matter of Example undefined optionally includes wherein the first distribution and the second distribution are included in the finite mixture of distributions.

[00142] In Example 24, the subject matter of any one or more of Examples 15-23 optionally include the operations further comprising: determining a second time series for second operational parameter values of the device; selecting, based on the second time series, a second distribution; second comparing the second time series to the selected second distribution; determining, based on the second comparing, a second probability at which second values in the second time series occur in the selected second distribution; applying Fisher’s method to the first probability and the second probability; and based on the applying, generating a combined indicator of anomaly, wherein the determining of the likelihood of the brute force attack is further based on the combined indicator. [00143] In Example 25, the subject matter of Example 24 optionally includes wherein the first time series indicates a cumulative number of logins over time that are successful, and the second time series indicates a cumulative number of logins over time that are unsuccessful.

[00144] In Example 26, the subject matter of Example 25 optionally includes the operations further comprising generating a third time series indicating user account information associated with each of the logins indicated by the first and second time series, and second applying Fisher’s method to the third time series, wherein the generating of the combined indicator is further based on the second applying.

[00145] In Example 27, the subject matter of Example 26 optionally includes wherein the logins are remote desktop protocol (RDP) login attempts.

[00146] In Example 28, the subject matter of any one or more of Examples 15-27 optionally include wherein the first time series indicates one or more of hour of day and day of week of failed login and RDP connections, timing of successful logins following failed attempts, login type information, login failure reasons, a cumulative count of unique usernames that generated an unsuccessful login without a subsequent successful login, a count of failed login attempts, a count of RDP external IP addresses per device, and counts of two devices having RDP inbound connections from a common device.

[00147] Example 29 is a system, comprising: hardware processing circuitry; one or more hardware memories storing instructions that when executed configure the hardware processing circuitry to perform operations comprising: obtaining a first time series of operational parameter values of a device attached to a network; comparing the values of the first time series to a first parameter value range and a second parameter value range; determining, based on the comparing, that the values of the first time series are within the first parameter value range; based on the determining, selecting, from a plurality of distributions, a first distribution; comparing the first time series to the selected first distribution; determining, based on the comparing, a first probability at which values in the first time series occur in the selected distribution; determining, based on the first probability, a likelihood of a brute force attack on the network; based on the first time series, adjusting a boundary between the first parameter value range and the second parameter value range; and determining, based on the adjusted boundary, a second likelihood of a brute force attack.

[00148] In Example 30, the subject matter of Example 29 optionally includes the operations further comprising performing, based on the likelihood, a mitigating action. [00149] In Example 31, the subject matter of Example 30 optionally includes wherein the mitigating action includes changing an access control policy of the network.

[00150] In Example 32, the subject matter of Example 31 optionally includes wherein the changing of the access control policy comprises programmatically configuring a firewall of the network. [00151] In Example 33, the subject matter of any one or more of Examples 29-32 optionally include wherein the adjusting of the boundary comprises updating a threshold value τ_p defining the boundary between the first range and the second range, τ_p defined according to: τ_p := E[Y | Y > Q_{p-1}], where: E[] is an expected value function, Y is the first time series, τ_p is a threshold value between the first range and the second range, and Q_p is a q-th quantile of a negative binomial distribution.

[00152] In Example 34, the subject matter of any one or more of Examples 29-33 optionally include wherein the adjusting of the boundary further comprises updating parameters defining the first and second distributions based on the first time series. [00153] In Example 35, the subject matter of any one or more of Examples 29-34 optionally include wherein the parameters are updated via exponential smoothing and a grid of smoothing weights.

[00154] In Example 36, the subject matter of Example 35 optionally includes wherein at least one parameter of the distribution is updated according to: φ_{t+ε} = g_φ^{-1}[η_{t+ε}], where: φ_{t+ε} is the updated parameter, g_φ is a link function for a parameter φ, and η_{t+ε} = α_φ · M_φ[y_{t+ε} | η_t] + (1 - α_φ) · η_t, where: M_φ is a central moment corresponding to the parameter φ, α_φ is a smoothing weight, η_t is g_φ[φ_t], and y_{t+ε} is a sample value included in the first time series.

[00155] In Example 37, the subject matter of Example undefined optionally includes wherein the first distribution and the second distribution are included in the finite mixture of distributions.

[00156] In Example 38, the subject matter of any one or more of Examples 29-37 optionally include the operations further comprising: determining a second time series for second operational parameter values of the device; selecting, based on the second time series, a second distribution; second comparing the second time series to the selected second distribution; determining, based on the second comparing, a second probability at which second values in the second time series occur in the selected second distribution; applying Fisher’s method to the first probability and the second probability; and based on the applying, generating a combined indicator of anomaly, wherein the determining of the likelihood of the brute force attack is further based on the combined indicator.

[00157] In Example 39, the subject matter of Example 38 optionally includes wherein the first time series indicates a cumulative number of logins over time that are successful, and the second time series indicates a cumulative number of logins over time that are unsuccessful. [00158] In Example 40, the subject matter of Example 39 optionally includes the operations further comprising generating a third time series indicating user account information associated with each of the logins indicated by the first and second time series, and second applying Fisher's method to the third time series, wherein the generating of the combined indicator is further based on the second applying. [00159] In Example 41, the subject matter of Example 40 optionally includes wherein the logins are remote desktop protocol (RDP) login attempts.

[00160] In Example 42, the subject matter of any one or more of Examples 29-41 optionally include wherein the first time series indicates one or more of hour of day and day of week of failed login and RDP connections, timing of successful logins following failed attempts, login type information, login failure reasons, a cumulative count of unique usernames that generated an unsuccessful login without a subsequent successful login, a count of failed login attempts, a count of RDP external IP addresses per device, and counts of two devices having RDP inbound connections from a common device.