Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
MARKING NODES FOR ANALYSIS BASED ON DOMAIN NAME SYSTEM RESOLUTION
Document Type and Number:
WIPO Patent Application WO/2016/118153
Kind Code:
A1
Abstract:
Example embodiments disclosed herein relate to marking or associating nodes. A domain name system resolution graph is constructed. The graph includes client nodes and domain nodes. A subset of the domain name resolution graph (e.g., biclique) is determined. At least one client node in the subset is marked based on a label of at least one domain node.

Inventors:
HORNE WILLIAM G (US)
RAO PRASAD V (US)
Application Number:
PCT/US2015/012654
Publication Date:
July 28, 2016
Filing Date:
January 23, 2015
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
HEWLETT PACKARD ENTPR DEV LP (US)
International Classes:
H04L12/24
Foreign References:
US20130179974A12013-07-11
US8260914B12012-09-04
US8341745B12012-12-25
US8402543B12013-03-19
US20130179567A12013-07-11
Attorney, Agent or Firm:
PATEL, Milin, N. et al. (3404 E. Harmony RoadMail Stop 7, Fort Collins CO, US)
Download PDF:
Claims:
CLAIMS

What is claimed is:

1 . A computing device comprising:

a construction engine to construct a domain name system (DNS) resolution graph as a bipartite graph,

the construction engine further to include a plurality of client nodes representing clients and a plurality of domain nodes representing domains in the bipartite graph, and

the construction engine to include respective edges connected between respective client nodes and respective domain nodes if a client DNS query resolves to the respective domain associated with the respective domain nodes;

a sub graph engine to determine at least one biclique of the bipartite graph, wherein the respective client nodes in a biclique map to at least the same domain nodes; and

an analysis engine to mark the client nodes of the biclique based on a label of at least one domain node of the biclique.

2. The computing device of claim 1 , wherein the analysis engine marks the client nodes for further analysis based on the label.

3. The computing device of claim 2, wherein the analysis engine marks previously unlabeled domain nodes associated with the at least one biclique for further analysis.

4. The computing device of claim 1 , wherein the label indicates that the at least one domain node is associated with at least one of a blacklist, a domain generation algorithm, and a non-existent domain.

5. The computing device of claim 1 , wherein the clients are associated with a network, and wherein the constructed bipartite graph does not include at least one of: whitelisted domains and domains resolving within the internal network.

6. The computing device of claim 1 , further comprising: a threshold engine to determine that a size of the biclique is greater than a threshold size,

wherein the analysis engine marks the client nodes is based on a determination that the size of the biclique is greater than the threshold size.

7. A method comprising:

constructing, by at least one processor, a bipartite graph that is a domain name system (DNS) resolution graph, by including in the bipartite graph a plurality of client nodes representing clients in an internal network and a plurality of domain nodes representing domains, and wherein the at least one processor connects an edge from respective client nodes to respective domain nodes if a client DNS query resolves to the respective domain associated with the respective domain nodes;

determining, by the at least one processor, a subset of the bipartite graph, wherein the respective client nodes in the subset map to at least the same domain nodes; and

marking, by the at least one processor, the client nodes of the subset based on a label of at least one domain node of the subset.

8. The method of claim 7, wherein the subset is a biclique.

9. The method of claim 7, further comprising:

determining that a size of the biclique is greater than a threshold size, wherein the marking is based on the determination that the size of the biclique is greater than the threshold size.

10. The method of claim 7, wherein the constructed bipartite graph does not include at least one of: whitelisted domains and domains resolving within the internal network.

1 1 . The method of claim 7, wherein the label indicates that the at least one domain node is associated with at least one of a blacklist, a domain generation algorithm, and a non-existent domain.

12. A non-transitory machine-readable storage medium storing instructions that, if executed by at least one processor of a computing device, cause the computing device to:

construct a domain name system (DNS) resolution graph as a bipartite graph, by including in the bipartite graph a set of client nodes representing a plurality of clients, a set of domain nodes, wherein each domain node represents a domain, and a mapping of the client nodes to domain nodes,

wherein the clients are part of an internal network;

determine at least one biclique of the bipartite graph, wherein the respective client nodes in the biclique map to each of the domain nodes; and mark the client nodes of the biclique for further analysis based on a label of at least one domain node of the biclique.

13. The non-transitory machine-readable storage medium of claim 12, wherein the label indicates that one of the at least one domain node is associated with at least one of a blacklist, a domain generation algorithm, and a non-existent domain.

14. The non-transitory machine-readable storage medium of claim 12, wherein the constructed bipartite graph does not include at least one of: whitelisted domains and domains resolving within the internal network.

15. The non-transitory machine-readable storage medium of claim 12, further comprising instructions that, if executed by the at least one processor, cause the computing device to:

determine that a size of the biclique is greater than a threshold size, wherein marking is based on the determination that the size of the biclique is greater than the threshold size.

Description:
MARKING NODES FOR ANALYSIS BASED ON DOMAIN NAME SYSTEM RESOLUTION

BACKGROUND

[0001 ] Service providers and manufacturers are challenged to deliver quality and value to consumers, for example by providing secure networks. Many private networks (e.g., enterprise networks, home networks, business networks, etc.) have clients that may be infected with malware. It is advantageous to protect these networks from malware.

BRIEF DESCRIPTION OF THE DRAWINGS

[0002] The following detailed description references the drawings, wherein:

[0003] FIG. 1 is a block diagram of a computing device capable of marking nodes of a network as suspicious, according to one example;

[0004] FIG. 2 is a diagram of a system capable of marking nodes of a network as suspicious, according to one example;

[0005] FIG. 3 is a diagram of a bipartite graph for resolution of a domain name system, according to one example;

[0006] FIG. 4 is a diagram that shows a subset of a bipartite graph for resolution of a domain name system, according to one example;

[0007] FIG. 5 is a diagram that shows a biclique of a bipartite graph for resolution of a domain name system, according to one example;

[0008] FIG. 6 is a flowchart of a method for marking nodes of a domain name system resolution graph, according to one example; and [0009] FIG. 7 is a block diagram of a computing device capable of marking nodes of a network for further analysis, according to one example.

DETAILED DESCRIPTION

[0010] Malware infections within a private networks, such as enterprise networks, home local area networks (LANs), office networks, etc. Domains accessed by a network host can be indicative of whether the host is infected by malware. Identifying malicious domains and benign domains accessed by the network's hosts or clients (e.g., users, user devices, other computing devices, etc.) is critical for these private networks, particularly for enterprise networks. Some malicious and benign domains can be identified by comparing accessed domains to a domain blacklist and a domain whitelist.

[001 1 ] However, blacklists and whitelists may not cover all domains. Some legitimate traffic may be directed towards this gray area. However, some of the gray areas may be malicious. Malicious activity by one host in a network may spread the infection to other hosts or hosts may be infected by the same malware from other means. Common behavior seen across multiple hosts in a network could indicate infection by the same malware or use of a same or similar executable. Detecting infections as a group allows for group alerts and actions resulting from them.

[0012] Accordingly, various embodiments described herein detect common behavior across multiple host nodes to facilitate security. Domain Name System (DNS) information can be used to detect this common behavior of host nodes or client nodes. In certain examples, a client node is a computing device that requests at least one resolution of a domain name to an Internet Protocol (IP) address. The request can go to at least one DNS server, which can respond. The DNS information can be stored in a log. The information can be provided from the DNS server, a DNS snooper collecting information about DNS traffic, or the like. In certain examples, the log can include information about what client requested the query, the domain name, what the domain name was resolved to (e.g., the IP address, a non-existent domain, etc.), time information, a number of times the query was made, or the like. Further, log information can be kept for a time period and/or used for a time period. Moreover, in some examples, logs over a time period (e.g., 3 to 4 hours, a day, etc.) can be processed according to the processes described herein.

[0013] A DNS resolution graph can be constructed that takes into account the DNS information. The graph can include client nodes and domain nodes. As used herein, a client node is a representation of a device on a network that is being examined that provides requests to the DNS server. The DNS server can respond to the DNS query. Further, as used herein, a domain node is a node representing a domain name used in a query to resolve to an IP address. The domain node can be represented in the form of a domain name in the request. When a client c requests a DNS resolution to domain d, an edge <c,d> is added to the graph.

[0014] The DNS resolution graph can be a bipartite graph further described in relation to FIGs. 3 - 5. A bipartite graph is a graph whose vertices can be divided into two disjoint sets (one for client nodes and another for domain nodes) such that each edge of the graph connects from a vertex of the client nodes to a vertex of the domain nodes. In certain examples, the DNS resolution graph can become a bipartite graph if edges from clients to other clients are either not constructed or are filtered out. This can be implemented, for example, by ignoring DNS resolutions within internal domains of the network the clients are on.

[0015] Bicliques of the constructed graph can be determined. In the context used herein, a bidique is a special type of bipartite graph where each vertex of the client node set is connected to each vertex of the domain node set. A number of the client nodes of the bidique can indicate how many internal clients exhibit similar behavior. A number of the domain nodes of the bidique indicates how similar the behavior is.

[0016] If DNS resolutions were independent, bicliques would be rare in non- dense graphs. If many bicliques occur, the DNS resolutions are not independent. To exhibit this type of behavior, an inference can be made that the client nodes share executable content (e.g., malware infection, a script snippet from the same web page, etc.) in order to make the same set of DNS resolutions.

[0017] Each domain resolution graph can include large bicliques that represent commonly occurring benign activity. In recognition, the approach can save computation time by not including domains that are likely to be benign by "whitelisting" the domain nodes and thus not including them into the domain resolution graph and/or the bicliques. The whitelist can be a list kept for domains and/or hosts that are either known to be or assumed to be good (e.g., the company website for the company network, a large search engine, a web encyclopedia, etc.). The use of the whitelist results in smaller graphs to process.

[0018] To construct a biclique, the process can start with a node. The process finds its neighbors N. Then, the process finds M, which is the intersection of the set of neighbors of each node in N. By doing this the approach produces a pair of sets M and N such that each node in M is connected to every node in N.

[0019] When bicliques are examined, if the process finds signs that lead to a determination that some domain nodes on the domain node side of the biclique are suspicious (e.g., malicious, blacklisted, non-existent, etc.), the process can flag domains on the domain node side for further examination (e.g., by labeling as suspicious) and flag the client nodes of the biclique as suspicious (e.g., infected, possibly infected, etc.) and mark them for remediation. Such signs include membership in known black lists, presence of syntactic features that indicate an algorithmically generated domain name, the domain name resolving to a non-existent domain, etc.

[0020] FIG. 1 is a block diagram of a computing device capable of marking nodes of a network as suspicious, according to one example. FIG. 2 is a diagram of a system capable of marking nodes of a network as suspicious, according to one example.

[0021 ] According to FIG. 1 , computing device 100 can include a construction engine 1 10, a sub graph engine 1 12, and an analysis engine 1 14. Further, according to FIG. 2, the computing device 100 can further include a threshold engine 1 16, a blacklist 1 18, a whitelist 120, communication engine 122, a processor 130, memory 132, and/or input/output interfaces 134. Memory 132 can include a machine-readable storage medium such. Moreover, according to FIG. 2, system 200 can include the computing device 100, clients 220a - 220n that can communicate with DNS server(s) 240 to request and receive domain name information, a DNS log 250, and devices 230a - 230n.

[0022] In certain examples, the clients 220, the DNS log 250, and the DNS server(s) 240 can be part of an internal network (e.g., a home LAN, an office LAN, an enterprise LAN or network, etc.), while devices 230a - 230n can be external to the internal network (e.g., as available through the Internet). In certain examples, an internal network can be considered a network that uses a private IP address space. A local network can connect to the Internet via a one or more devices, such as gateways, modems, etc.

[0023] The clients 220, DNS log 250, DNS server 240, and devices 230 can connect to each other via network 260. The respective devices 220, 230, 240 may be a notebook computer, a desktop computer, a server, a workstation, or any other computing device capable of performing the recited functionality.

[0024] The construction engine 1 10 can construct a DNS resolution graph. As noted above, the DNS resolution graph can be a bipartite graph. The bipartite graph can include multiple client nodes representing clients 220 of DNS server 240. Further, the bipartite graph can include multiple domain nodes representing domains of what is queried by the DNS server 240. Moreover, the bipartite graph can include edges connected from the respective client nodes to the respective domain nodes. The edges connect the nodes if a client DNS query resolves to the respective domain associated with the respective domain nodes. Illustrations of the graphs are described further in reference to FIGs. 3 - 5. The construction engine 1 10 can construct the DNS resolution graph (e.g., a bipartite graph) by creating a new data structure and populating the client nodes, domain nodes, and edges based on DNS information of the DNS server 240 (e.g., information from the DNS log 250). As noted, client nodes and domain nodes can be populated to sets based on the DNS information. Further, an edge between a client node and a domain node can be populated if a client node DNS query resolves to the associated domain of the domain node.

[0025] As noted above, in one example, the constructed bipartite graph does not include whitelisted domains, domains resolving within the internal network, or a combination thereof. In certain examples, the whitelist may include the domains resolving within the internal network. Further, multiple whitelists may be used for filtering. Moreover, a whitelist may include certain hosts in the internal network (e.g., the DNS server 240). In one example, the whitelisted domains and/or domains resolving within the internal network can be filtered before the graph is constructed. As such, the source DNS log 250 does not include this information. In other examples, the construction engine 1 10 can filter whitelisted domains and/or domains resolving within the internal network after the graph is created. Not including domains resolving within the internal network can ensure a bipartite graph because the queries of client nodes will not resolve to another client node. Not including whitelisted domains can decrease size of the graph as well as increase processing speed.

[0026] Though some DNS servers have some capacity to log information regarding DNS packets, these servers may incur a performance penalty that increases as the amount of logging increases. Consequently, most DNS servers disable logging. Thus, to avoid delaying traffic, a device may be placed in between a DNS server 240 and clients 220 (e.g., computers) in communication with the DNS server 240. The device may copy DNS packets from a packet stream between the DNS server and the clients to an appliance specifically designed to facilitate out of band logging of the normal DNS packet stream so the packet stream is not slowed down. In one example, the DNS log 250 can be based on such a device. In another example, the logged information can be filtered. For example, DNS requests to whitelisted services or domains can be filtered from the DNS log 250. Further, in some examples, the whitelisted domains can include domains that are internal to the network 260. In other examples, the whitelisted domains can include domains that are external to the network 260. This can reduce the amount of information stored. [0027] Comparing packets to the whitelist may allow the appliance to avoid logging packets associated with known or assumed benign entities. These entities may be, for example, domains, IP addresses, applications, clients, and so forth. By way of illustration, for some large companies, internal DNS traffic may make up a substantial portion of DNS traffic processed by a DNS server. Domains associated with external websites may also be whitelisted based on additional criteria. By way of illustration a small number of websites drive a substantial amount of web traffic, and many of these domains are managed by reputable companies that are unlikely to be associated with a malware. They may be culled from a list of high traffic websites or generated by examining traffic over time and automatically or manually whitelisting commonly accessed domains that are unlikely to be associated with malware or other security issue.

[0028] Similarly, the DNS log 250 can include information about blacklisted domains. The list of blacklisted domains can come from a service and can be used to label domain nodes as blacklisted. Moreover, the appliance can determine whether domains that are resolved to are generated by a domain generation algorithm or resolve to non-existent domains. The DNS log 250 can include such information as labels (e.g., tags, metadata, etc.). When the construction engine 1 10 constructs the bipartite graph, the construction engine can label domain nodes (e.g., as blacklisted, whitelisted, otherwise suspicious (e.g., resolving to a non-existent domain, having characteristics of an algorithmically generated domain, etc.), etc.).

[0029] In one example, if the domain is listed on a blacklist 1 18, the label that the construction engine 1 10 associates can include that it is a blacklisted domain. In another example, if the DNS server 240 resolves the domain name of the query to a non-existent domain, the construction engine 1 10 can assign a label of non-existent domain with the domain node. In yet another example, if the domain name looks to be generated by a domain generation algorithm, the construction engine 1 10 can assign a domain generation algorithm label. In some examples, a domain generation algorithm is software that can be implemented using a processor to generate domain names that can be used as rendezvous points to malware controllers. Domains generated through such a manner may meet particular criteria (e.g., not be associated with real words, include random characters or numbers, other syntactic features, etc.). In other examples, a more generic label of suspicious can be associated. The construction engine 1 10 can associate suspicion based on one or more lists (e.g., a whitelist, a blacklist, etc.) and/or analysis of the domain name and/or responses from the DNS server 240.

[0030] The sub graph engine 1 12 determines at least one biclique of the bipartite graph. In the biclique, the client nodes of the biclique map to the same domain nodes. The sub graph engine 1 12 can use various processes for finding bicliques. For example, a matrix factorization technique may be used, the Bron- Kerbosch technique to find maximal bicliques, a greedy process that builds a biclique cover by identifying and including one biclique at a time in the cover until all edges are covered, etc. In one example, given a node n, the set of its neighbors can be considered A n . Consider the set B n = Π A m where m is an element of A n . This can be used to show that BiCliquen = <A„, B n > is a clique. Further, find a node x that is not yet assigned to a biclique, with the largest number of neighbors not yet assigned to bicliques. Compute BiClique x . Repeat until each of the nodes are assigned to bicliques. This is one approach that may be used to determine bicliques from bipartite graphs.

[0031 ] The output of the biclique detection approach used is a set of bicliques. Domains in a biclique may have a high likelihood of being related (e.g., being infected with the same malware or use a same executable). This can be used to analyze the graphs to determine additional suspicious or infected client nodes and/or suspicious and/or malware associated domain nodes. In certain examples, a client can be labeled as infected if it can be associated with malware (e.g., if a threshold number of DNS queries are to blacklisted domains, access to particular blacklisted domains associated with particular malware, etc.). In certain examples, a client can be labeled as suspicious if malware has not yet been confirmed. In one example, the labels are based on a rule where the client is labeled suspicious if a threshold number of blacklisted domains or otherwise suspicious domains (e.g., non-existent domains) are queried. In the example, clients are labeled as infected of a higher threshold number of blacklisted domains are queried.

[0032] The analysis engine 1 14 can mark client nodes of the biclique based on a label of at least one domain node of the biclique. Labels can be based on information known about the respective domain nodes and can be added by the construction engine 1 10 based on DNS information. Further, in some examples, the analysis engine 1 14 can mark the client nodes for further analysis based on the label.

[0033] In some examples, a threshold engine 1 16 can be used to determine a size of the biclique. The size can be based on a number of the client nodes, a number of the domain nodes, or a combination thereof. In some examples, the size of the number of domain nodes being over a threshold increases the likelihood of clients in the bicliques are similar enough to draw an inference.

[0034] In some examples, the threshold engine 1 16 can determine a size of a biclique is greater than a threshold on both the domain node side and the client node side. The biclique may not include a node with a label associated with suspicious activity. However, because the threshold size is met, which may indicate that a large number of client nodes have this similar activity (e.g., each of the client nodes have an edge to the same domain nodes), the nodes can be marked for further analysis to understand why a large number of clients 220 are displaying identical behavior.

[0035] The analysis engine 1 14 can, in certain examples, further mark suspicious client nodes as to be blacklisted to a service. In this scenario, an indication of to be blacklisted means that the nodes should be included on a blacklist at the service. In one example, the service can include an Intrusion Prevention System (IPS) protecting the network 260. In another example, the service can be a blacklist or reputation service.

[0036] Moreover, client nodes in a biclique can be grouped as having related malware by the analysis engine 1 14. An IPS or other service can use the information in remedying the malware. In some examples, the IPS or other service can limit or disable communications from an infected or suspicious client node and/or domain node. In some examples, when a remedy is found, the group having related malware can be treated. This can be facilitated by a communication engine 122 providing information about the group (e.g., via an email or log) to the service. In other examples, information determined can be forwarded to a security information and event management system for further analysis.

[0037] The engines, modules, and parts described herein can be distributed between one or more devices. The engines 1 10, 1 12, 1 14, 1 16, 122 include hardware and/or combinations of hardware and programming to perform functions provided herein. Moreover, modules can include programing functions and/or combinations of programming functions to be executed by hardware as provided herein. When discussing the engines, modules, and parts, it is noted that functionality attributed to an engine can also be attributed to a corresponding module and vice versa. Moreover, functionality attributed to a particular module and/or engine may also be implemented using another module and/or engine.

[0038] A processor 130, such as a central processing unit (CPU) or a microprocessor suitable for retrieval and execution of instructions, and/or electronic circuits can be configured to perform the functionality of any of the engines and/or modules described herein. In certain scenarios, instructions and/or other information, such as rules, can be included in memory 132. In some examples, input/output interfaces 134 may additionally be provided by the devices. For example, input devices, such as a keyboard, a sensor, a touch interface, a mouse, a microphone, etc. can be utilized to receive input from an environment surrounding the devices. Further, an output device, such as a display, can be utilized to present information to users. Examples of output devices include speakers, display devices, amplifiers, etc. Moreover, in certain embodiments, some components can be utilized to implement functionality of other components described herein. Input/output devices such as communication devices like network communication devices or wireless devices can also be considered devices capable of using the input/output interfaces. [0039] The network 260 can use wired communications, wireless communications, or combinations thereof. Further, the network 260 can include multiple sub communication networks such as data networks, wireless networks, telephony networks, etc. Such networks can include, for example, a public data network such as the Internet, local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), cable networks, fiber optic networks, combinations thereof, or the like. In certain examples, wireless networks may include cellular networks, satellite communications, wireless LANs, etc. Further, the network 260 can be in the form of a direct network link between devices. Various communications structures and infrastructure can be utilized to implement the communication network(s).

[0040] By way of example, the clients 220, devices 230, and DNS server 240 communicate with each other and/or other components with access to the network 260 via a communication protocol or multiple protocols. A protocol can be a set of rules that defines how nodes of the network 260 interact with other nodes. Further, communications between network nodes can be implemented by exchanging discrete packets of data or sending messages. Packets can include header information associated with a protocol (e.g., information on the location of the network node(s) to contact) as well as payload information. As noted, clients 220a - 220n, DNS server 240, and/or the DNS log 250 may be in a separate network than devices 230a - 230n. A firewall, edge device, or other device may be implemented to separate network 260 from devices 230a - 230n.

[0041 ] FIG. 3 is a diagram of a bipartite graph for resolution of a domain name system, according to one example. This is an example DNS resolution Graph 300 that is also a bipartite graph. As such, edges connect client nodes 302, 304, 306, 308, 310, 312, 314 to respective domain nodes 320, 322, 324, 326, 332, 334, 340, 342, 344. In this example, domain nodes 320, 322, 324, 326 can be whitelisted, while domain nodes 332, 334 are blacklisted or otherwise labeled as suspicious (e.g., relating to a non-existent domain, relating to a domain generation algorithm based on syntax, etc.), and domain nodes 340, 342, 344 can be unlabeled nodes (e.g., not yet associated with a whitelist, a blacklist, or other type of label). DNS resolution graph 300 is simplified for illustration. DNS resolution graphs 300 constructed by the construction engine 1 10 as used herein can be much larger.

[0042] FIG. 4 is a diagram that shows a subset of a bipartite graph for resolution of a domain name system, according to one example. Bipartite graph 400 can be a subset or sub graph of another bipartite graph. The bipartite graph 400 can be used at to identify potentially infected clients. In one example, client node 410 can be considered potentially infected because it is related to a threshold number of domain nodes 420, 422, 424 that are labeled as suspicious. For illustrative purposes, in this example, the threshold number is three. In practice the threshold can be larger. The threshold can be based on various factors and may be customizable for particular internal networks. For example, a large enterprise network may use larger thresholds to limit the amount of follow-up on suspicious clients or domains. There is a correlation of a client node 410 connecting to a large number of suspicious domains and the client node 410 being infected. This can be different than a single connection or a small number of connections, which can be attributed to clicking on a link to a blacklisted site rather than to an infection. This type of a sub graph can be used in determination of bicliques.

[0043] FIG. 5 is a diagram that shows a biclique of a bipartite graph for resolution of a domain name system, according to one example. In this example of a biclique 500, the two client nodes 510, 512 map to domain nodes 520, 522, 524, 526. Domain nodes 520, 522, 524 are labeled as suspicious. For illustrative purposes, a threshold of three domain nodes in a biclique as being considered suspicious is used to signify that a client node 510, 512 is potentially infected or infected. As such, client nodes 510, 512 can be labeled as suspicious, potentially infected, infected, etc. Because domain node 526, which has not yet been labeled is in the biclique with suspicious client nodes 510, 512, domain node 526 can be labeled for further analysis (e.g., labeled as suspicious). Further, this label can be used for analysis of other bicliques to label other client nodes and/or domain nodes.

[0044] FIG. 6 is a flowchart of a method for marking nodes of a domain name system resolution graph, according to one example. FIG. 7 is a block diagram of a computing device capable of marking nodes of a network for further analysis, according to one example. Although execution of method 600 is described below with reference to computing device 700, other suitable components for execution of method 600 can be utilized (e.g., computing device 100). Additionally, the components for executing the method 600 may be spread among multiple devices. Method 600 may be implemented in the form of executable instructions stored on a machine-readable storage medium, such as storage medium 720, and/or in the form of electronic circuitry.

[0045] The computing device 700 includes, for example, a processor 710, and a machine-readable storage medium 720 including instructions 722, 724, 726 for marking nodes of a network for analysis. Computing device 700 may be, for example, a notebook computer, a desktop computer, a workstation, a server, or any other computing device capable of performing the functionality described herein.

[0046] Processor 710 may include at least one central processing unit (CPU), at least one semiconductor-based microprocessor, at least one graphics processing unit (GPU), other hardware devices suitable for retrieval and execution of instructions stored in machine-readable storage medium 720, or combinations thereof. For example, the processor 710 may include multiple cores on a chip, include multiple cores across multiple chips, multiple cores across multiple devices (e.g., if the computing device 700 includes multiple node devices), or combinations thereof. Processor 710 may fetch, decode, and execute instructions 722, 724, 726 to implement method 600. As an alternative or in addition to retrieving and executing instructions, processor 710 may include at least one integrated circuit (IC), other control logic, other electronic circuits, or combinations thereof that include a number of electronic components for performing the functionality of instructions 722, 724, 726.

[0047] Machine-readable storage medium 720 may be any electronic, magnetic, optical, or other physical storage device that contains or stores executable instructions. Thus, machine-readable storage medium may be, for example, Random Access Memory (RAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage drive, a Compact Disc Read Only Memory (CD-ROM), and the like. As such, the machine- readable storage medium can be non-transitory. As described in detail herein, machine-readable storage medium 720 may be encoded with a series of executable instructions for associating nodes with labels. Further, in some examples, the various instructions 722, 724, 726 can be stored on different media.

[0048] At 602, construction instructions 722 can be executed by processor 710 to construct a DNS resolution graph as a data structure. In one example, the DNS resolution graph is a bipartite graph. In the data structure, a set of client nodes representing clients of an internal network that can each map to a set of domain nodes, where each domain node represents a domain. The processor 710 can map information about the clients and domains to the nodes in the graph based on DNS information. For the mapping, the processor 710 connects edges between the respective client nodes and respective domain nodes if a client DNS query resolves to the respective domain associated with the respective domain nodes. In one example, the data structure can include a list of domain nodes, a list of client nodes and a list of edges from the client nodes to the domain nodes for each of the client nodes.

[0049] As noted above, in some examples, whitelisted domains are not included as domain nodes in the bipartite graph. Further, in some examples, domains resolving to within the internal network are not included as domain nodes in the bipartite graph.

[0050] At 604, sub graph instructions 724 can be executed by the processor 710 to determine a subset of the bipartite graph. In this subset, the client nodes map to at least the same domain nodes. In certain examples, the subset is a biclique where each client node has an edge to each domain node of the subset.

[0051 ] At 606, label instructions 726 can be executed by processor 710 to mark client nodes of the subset based on a label of at least one domain node of the subset. In one example, the label of the domain node(s) is associated with a blacklist, a domain generation algorithm, and/or a non-existent domain. [0052] The marking of the client nodes can also be based on a size of particular data of the sub graph or biclique. For example, the computing device 700 can determine a size of the biclique. In one example, the size is based on a count of the domain nodes. In another example, the size can be based on a count of the domain nodes as well as client nodes. Further, in one example, a count of the number of domain nodes that are labeled as suspicious can be used to determine the size. If the size is greater than a threshold number, the client nodes are marked. This is because a correlation can be drawn that a client node may be infected if it attempts to connect to a threshold number of suspicious domains. The threshold can be based on various factors and may be customizable for particular internal networks. For example, a large enterprise network may use larger thresholds to limit the amount of follow-up on suspicious clients or domains.

[0053] Further, as noted above, unlabeled domain nodes can be marked as suspicious based on the biclique. This can come from the correlation that if a set of client nodes are considered suspicious because the nodes attempt to access suspicious domain nodes, other nodes that they may attempt contact with may also be suspicious. As such, these nodes can be labeled as suspicious and marked for further analysis. Moreover, these labels can be fed back for future construction of DNS resolution graphs.