PROCESSING TIME-VARYING DATA USING AN ADJACENCY LIST REPRESENTATION OF A TIME-VARYING GRAPH

Title:

PROCESSING TIME-VARYING DATA USING AN ADJACENCY LIST REPRESENTATION OF A TIME-VARYING GRAPH

Document Type and Number:

WIPO Patent Application WO/2017/131795

Kind Code:

Abstract:

Examples of processing time-varying data using an adjacency list representation of a time-varying graph is used. In this case, vertices in the adjacency list representation correspond to defined entities in the time-varying data and each vertex has a corresponding neighbor list. Neighbor pointers are provided indicating a relationship between elements in the set of neighbor arrays. A distinct chain representation is also used for each neighbor list, wherein each distinct chain representation indicating distinct elements of a corresponding neighbor list that are ordered based on a relationship time from the time-varying data. A first part of the distinct chain representation is stored in a first cell having a fixed size and a second part of the distinct chain representation is stored in a second cell having variable size.

Inventors:

NOR IGOR (IL)
SCHEIN SAGI (IL)
BARKOL OMER (IL)

Application Number:

PCT/US2016/015864

Publication Date:

August 03, 2017

Filing Date:

January 31, 2016

Export Citation:

Click for automatic bibliography generation Help

Assignee:

HEWLETT PACKARD ENTPR DEV COMPANY LP (US)

International Classes:

G06F17/30

Domestic Patent References:

WO2013149381A1

2013-10-10

Foreign References:

US20070239677A1	2007-10-11
US20080162552A1	2008-07-03
US20090303239A1	2009-12-10
US20130013534A1	2013-01-10

Attorney, Agent or Firm:

DE MERLIER, Mallary K. et al. (US)

Download PDF:

View/Download PDF PDF Help

Claims:

CLAIMS

What is claimed is:

1 . A method for processing time-varying data, comprising:

obtaining, by a processor, time-varying data indicating relationships between defined entities that occur at given times; and

processing, by the processor, the time-varying data to configure and store an adjacency list representation of a time-varying graph based on the time-varying data, the time-varying graph having a plurality of vertices,

wherein the plurality of vertices in the adjacency list representation respectively correspond to the defined entities in the time-varying data and each vertex has a corresponding neighbor list,

wherein the processing comprises for each relationship between two of the defined entities:

storing, in association with the adjacency list representation, at least one timestamp indicating a respective time of the relationship;

storing a neighbor pointer associated with the defined entities in the at least one relationship in respective neighbor lists; and

storing a distinct chain representation for each element in each neighbor list, each distinct chain representation indicating distinct elements of a corresponding neighbor list that are ordered based on a relationship time from the time-varying data,

wherein storing the distinct chain representation comprises:

storing a first part of the distinct chain representation in a first cell, wherein the first cell forms part of a distinct chain array having a plurality of fixed-size cells, and

storing a second part of the distinct chain representation in a second cell, wherein the second cell forms part of an additional distinct chain array having a plurality of variable-size cells.

2. The method of claim 1 , wherein the distinct chain representation comprises a pointer associated with a distinct element of the corresponding neighbor list and referencing a further distinct element of the corresponding neighbor list.

3. The method of claim 1 , wherein the distinct chain representation comprises a hash table indicating distinct elements of the corresponding neighbor list.

4. The method of claim 1 , comprising:

obtaining, by the processor, a time interval associated with a temporal portion of the time-varying graph;

loading, by the processor, the adjacency list representation of the time- varying graph, including:

searching vertices of the adjacency list representation to locate a vertex with an associated relationship time within the time interval;

for the located vertex, determining a last element within the neighbor list for the located vertex that has an associated relationship time within the time interval; and

retrieving elements in the adjacency list representation starting from the determined last element, said retrieving comprising traversing, based on the neighbor pointers, elements within the neighbor lists that have an associated relationship time within the time interval and that form part of the distinct chain representation for the determined last element.

5. The method of claim 4, wherein said searching vertices and determining a last element within the neighbor list for the located vertex comprises a binary search and said retrieving elements comprises a breadth-first search.

6. The method of claim 1 , wherein, for each new element to be added to a neighbor list, storing a distinct chain representation for said new element of the neighbor list comprises:

adding the new element to an end of the distinct chain representation; and removing any element having a same value as the new element from the distinct chain representation.

7. The method of claim 1 , comprising storing, by the processor, in association with the adjacency list representation, at least one count indicating a number of relationships between the at least two defined entities occurring in a given time interval.

8. The method of claim 1 , wherein the distinct chain representation is stored in non-volatile memory.

9. A system comprising:

non-volatile memory storing data representative of a time-varying graph; a graph data insertion interface for receiving data for inclusion in the time- varying graph; and

a graph query interface for receiving a query relating to the time-varying graph,

wherein the data representative of a vertex of the time-varying graph comprises:

a temporal adjacency array having a plurality of temporal adjacency array cells, each temporal array adjacency cell corresponding to a connection to another vertex;

a distinct chain array having a plurality of distinct chain array cells, each distinct array cell indicating for a corresponding temporal adjacency array cell up to a fixed number of distinct connections of the temporal adjacency array that are ordered based on a relationship time from the time-varying data; and

an additional distinct chain array having a plurality of additional distinct chain array cells, each additional distinct chain array cell indicating for a corresponding distinct chain array cell additional distinct connections.

10. The apparatus according to claim 9, wherein each of the plurality of temporal adjacency array cells has the same size, and each of the distinct chain array cells has the same size.

1 1 . The apparatus of claim 9, wherein the additional distinct chain array cells are non-uniform in size.

12. The apparatus of claim 9, wherein the plurality of additional distinct chain array cells is different in number to the plurality of distinct chain array cells.

13. A non-transitory computer-readable medium storing data representative of a dynamic graph, the data comprising:

a temporal adjacency array having a plurality of temporal adjacency array cells, each temporal array adjacency cell corresponding to a connection to another vertex;

14. The non-transitory computer-readable medium of claim 13, wherein the non-transitory computer-readable medium is non-volatile memory.

15. The non-transitory computer-readable medium of claim 13, wherein the wherein the additional distinct chain array cells are non-uniform in size.

Description:

PROCESSING TIME-VARYING DATA USING AN ADJACENCY LIST REPRESENTATION OF A TIME-VARYING GRAPH

BACKGROUND

[0001] Many situations arise in which time-varying data is produced. For example, social media feeds, system logs, telecommunications systems and network monitoring applications may all generate records of events that occur over time.

BRIEF DESCRIPTION OF THE DRAWINGS

[0002] Various features of the present disclosure will be apparent from the detailed description which follows, taken in conjunction with the accompanying drawings, which together illustrate, features of certain examples, and wherein:

[0003] Figure 1 is a flow diagram showing a method for obtaining and processing time-varying data according to an example;

[0004] Figure 2A-2C are schematic illustrations showing an adjacency list representation of a time-varying graph according to an example;

[0005] Figure 3 is a schematic illustration of a distributed system for graph analytics according to an example;

[0006] Figure 4 is a schematic illustration showing a temporal adjacency array, a distinct chain array and an additional distinct chain array used to store part of a time-varying graph according to an example; and

[0007] Figure 5 is a schematic illustration showing an adjacency list representation of a time varying graph, stored on a non-transitory computer readable medium according to an example.

DETAILED DESCRIPTION

[0008] In a social media feed a user may add or remove other users. In a telecommunications system, connectivity between a mobile device and a base station may vary over time. Similarly, repositories of electronic documents, such as the HyperText Markup Language (HTML) documents forming the World Wide Web, change over time as new documents get added and old documents are removed or modified. It is frequently desirable to quickly and efficiently store and/or retrieve such data.

[0009] It is frequently desirable to analyze time-varying data, which may be received as streaming data (i.e. a "data stream"). For example, the data may be indicative of connections at given times between entities in a network such as a computer network or a telecommunications system. As another example, the data may be indicative of e-commerce transactions wherein connections between users represent transactions between those users. In certain examples described herein this time-varying or "dynamic" data is represented as a graph in a graph data structure. This then allows the data to be analyzed using graph processing techniques.

[0010] When analyzing time-varying data, it may be desirable to analyze an up- to-date representation of the data and/or a representation of the data at a given time, or over a given time range, in the past. This time-based querying presents challenges for data storage and/or retrieval. Processing an incoming or new event in a data stream may take a given amount of time and computing resources and this may vary depending on the processing methods that are being used. In one case, a complete copy of a graph data structure could be stored every time a change in the data stream occurs. This would take considerable time and resources at the time of storage but would aid rapid retrieval of data. In certain cases it may not be possible to appropriately handle changes if they are received too frequently and/or the data structure becomes too large. In another case, changes may be stored in a log file. This would enable rapid storage of data. However, in this case, a new graph data structure would need to be constructed and processed with each data retrieval operation. As such, time-based queries may take a considerable amount of time to perform and/or fail to complete for large datasets and constrained resources. Given this, there is a desire for a time and resourceefficient methods for storing and loading time-varying data, in particular where that data may be represented as graph data.

[0011] Certain examples described herein allow for useful processing of time- varying data. Certain examples obtain time-varying data, said data indicating at least one relationship between two defined entities that occurs at a given time. This may, for example, represent a network connection or transaction between two computing devices, or a recorded grouping or coupling between user accounts. The time- varying data is then processed to configure and store an adjacency list representation of a time-varying graph based on the time-varying data. Vertices in the adjacency list representation correspond to defined entities in the time-varying data and each vertex has a corresponding neighbor list. The processing comprises storing, in association with the adjacency list representation, at least one timestamp indicating a respective time of the at least one relationship. A neighbor pointer is also stored, indicating elements associated with the defined entities in the at least one relationship in respective neighbor lists. For each entry in the neighbor list, a distinct chain representation is stored, each distinct chain representation indicating distinct elements of the corresponding neighbor list that are ordered based on a relationship time from the time-varying data.

[0012] Storing time-varying data as described in examples herein takes less time and uses fewer resources than storing, for each change (such as a new connection), a complete representation of the data at a given time. However, it also allows for fast and efficient loading of the data, for example to return a representation of the data at a request point or interval in time, as compared to a log-based approach.

[0013] In certain examples, the time-varying data is stored in non-volatile memory which provides a persistent data structure providing access much faster than a hard disk device. The time-varying data for a vertex comprises three arrays. A temporal adjacency array has a plurality of temporal adjacency array cells, with each temporal adjacency array cell storing data associated with a neighboring vertex, i.e. a vertex with which an edge exists, together with temporal data associated with the timing of the connection. A distinct chain array has a plurality of distinct chain array cells with each distinct chain array cell indicating up to a fixed number of distinct neighbor vertices with which a connection has existed, such that the distinct chain array has a plurality of fixed-size cells. An additional distinct chain array is also provided storing additional distinct neighbor vertices should the number of distinct neighbor vertices exceed the number which can be stored within the fixed sized of a distinct chain array cell. The additional distinct chain array has a plurality of variable-size cells. By storing the time-varying data in arrays, and to the extent possible using arrays having fixed cell sizes, makes efficient use of the non-volatile memory with regard to loading and retrieval of the time-varying data.

[0014] To retrieve or load time-varying data, in certain examples described herein, a time interval is obtained associated with a temporal portion of a time- varying graph, e.g. the graph as described above. The adjacency list representation of the time-varying graph may then be loaded. This may comprise searching vertices of the adjacency list representation to locate a vertex with an associated relationship time within the requested time interval. For the located vertex, a last element is then determined within the neighbor list for the located vertex that has an associated relationship time within the time interval. This may represent the first or last connection of the vertex in question within the time interval.

[0015] Elements in the adjacency list representation may then be retrieved, starting from the determined last element. In an example, this retrieving comprises traversing, based on at the neighbor pointers, elements within the neighbor lists that have an associated relationship time within the time interval and that form part of one of the distinct chain representations. The searching of vertices and determining a last element within the neighbor list for the located vertex may comprise a binary search, and the retrieving of elements may comprise a breadth-first search. In this manner, a representation of the graph data corresponding to the time interval may be retrieved faster and more efficiently than in a system in which the stored data comprises merely a log of each change.

[0016] As noted above, certain examples described herein provide for processing of graph data so that it may be both stored and subsequently retrieved more efficiently, in terms of time and/or computing resources, than with comparative techniques.

[0017] Figure 1 illustrates a flowchart showing an example of a method 100 for processing time-varying data according to an example. Although execution of the method 1 00 is described below with reference to a distributed processing system, the method 100 may be implemented by a single processor. The method 100 may be implemented in the form of executable instructions stored on a machine-readable storage medium, and/or in the form of electronic circuitry.

[0018] In this example, the time-varying data, obtained at block 1 10, indicates at least one relationship between two defined entities that occurs at a given time. Time-varying data may relate to an event stream, e.g. a series of events wherein each event involves two or more defined entities and occurs at a defined time. The defined entities may be computers in a network, and the at least one relationship may indicate a connection between these computers, e.g. as extracted by a packet sniffing tool. As another example, the defined entities may comprise elements of a telecommunications network, such as mobile devices and base stations or core network elements, with the at least one relationship similarly representing a connection between these elements. As a further example, the defined entities may represent users in a computing application, e.g. as identified by a user account having an appropriate user identifier. The relationship in this case may comprise adding a particular second user to a user list of a first user. In another case, the relationship may comprise a hyperlink and the entities may comprise HTML documents on the World Wide Web. It will be clear to one skilled in the art that the presently described method is applicable to other forms of data indicating relationships between defined entities.

[0019] At block 120, the time-varying data is processed to configure and store an adjacency list representation of a time-varying graph, based on the time-varying data. Vertices in the adjacency list representation correspond to defined entities in the time varying data and each vertex has a corresponding neighbor list. The neighbor list indicates vertices with which connections exist to the vertex in question. In certain cases, the adjacency list representation may comprise a plurality of array data structures. For example, a first array data structure may store a list of vertices and each entry in the list of vertices may have a corresponding linked array data structure implementing a neighbor list. These array structures may be dynamic, e.g. the number of elements in the neighbor list may change in size over time.

[0020] In Figure 1 , the processing at block 120 includes a number of sub- blocks. At sub-block 130, at least one timestamp indicating a respective time of the at least one relationship is stored. For example, the timestamp may indicate a time of connection between two computers in a network.

[0021] At sub-block 140, the processing comprises storing a neighbor pointer indicating elements associated with the defined entities in the at least one relationship in respective neighbor lists. For example, a connection may exist between vertices A and B. B would then be in the neighbor list of A, and A would be in the neighbor list of B. In such a situation, a neighbor pointer associated with the representation of A in the neighbor list of B would point to the representation of B in the neighbor list of A. For example, the neighbor pointer may comprise a tuple data structure that stores the indices of A and B in their respective neighbor lists, in certain cases together with a timestamp indicating the time of connection. In one case, the timestamp is stored in association with the neighbor pointer.

[0022] At sub-block 150, a distinct chain representation is stored for each neighbor list. The distinct chain representation indicates distinct elements of a corresponding neighbor list. As such, if a given element is repeated in a neighbor list, the distinct chain representation would indicate one instance of that element. A first part of the distinct chain representation is stored in a first cell which forms part of a distinct chain array having a plurality of fixed-size cells, and a second part of the distinct chain representation is stored in a second cell which forms part of an additional distinct chain array having a plurality of variable-size cells.

[0023] Figures 2A-2C are schematic illustrations that show an adjacency list representation of a time-varying graph according to an example. In Figure 2A, an array or vertex list 200, which may be a dynamic array, is stored in which elements 210-1 ... 21 0-n represent vertices of the time-varying graph Vi ... Vn, including vertices Vi (210-i) and Vj (210-j).

[0024] For each vertex a corresponding array is stored representing a corresponding neighbor list (220-1 ... 220-n). These neighbor lists 220 may also comprise dynamic arrays. Each neighbor list 220 comprises a list of vertex identifiers, each identifier indicating a connection between the identified vertex in the neighbor list and the corresponding vertex in the vertex list 200. For example, Figure 2A shows that a connection exists in the time-varying graph between vertex Vi and vertex Vj. As such, the neighbor list 220-i of vertex Vi includes an element 230-j corresponding to vertex Vj, and the neighbor list 220-j of vertex Vj will include an element 230-i corresponding to vertex Vi. The elements of the neighbor lists 220-1 ... 220-n may be stored in an order indicating the order in which the connections occurred. For example, in Figure 2A elements to the right-hand side of the Figure may represent newer connections, wherein a far right element represents the most recent connection or graph edge.

[0025] Figure 2A also indicates schematically a neighbor pointer 240 that is stored. The neighbor pointer 240 indicates a connection between the element 230-j corresponding to vertex Vj in the neighbor list 220-i of vertex Vi, and the element 230-i corresponding to vertex Vi in the neighbor list 220-j of vertex Vj. The neighbor pointer 240 may comprise references to memory locations storing identifiers for vertices Vi and Vj. Given a known element in one of the neighbor lists 220-i and 220- j, the neighbor point 240 enables the corresponding element in the other neighbor list to be located.

[0026] Certain examples described above enable events from a data stream (time-varying data) to be efficiently stored. If data is stored as described above, efficient loading or retrieval of data is enabled. An example of retrieving a portion of data corresponding to a particular time interval will now be described.

[0027] Given an adjacency list representation, as schematically shown in Figure 2A, a retrieval operation begins by searching vertices of the vertex list 200 to locate any vertex with an associated relationship time within the time interval. This may for example be a binary search. A second search, which may also be a binary search, is then performed within the neighbor list 220 of the located vertex, to find a vertex with an associated relationship time within the time interval. This may for example correspond to the first or last relationship in the neighbor list within the time interval. Elements of the adjacency list representation may then be retrieved by starting at the determined element and traversing, based on the neighbor pointers, elements within neighbor lists that have an associated relationship time within the time interval. This may for example be a breadth-first search. This retrieval operation retrieves vertices and edges or connections from the graph data structure that are within the queried time interval. [0028] Figure 2B shows a representation of a neighbor list 220-B for a given vertex in Figure 2A. The neighbor list 220-B comprises elements corresponding to connections from the given vertex to other vertices in the graph data structure, the other vertices being represented by the characters a, b, c, d, e (e.g. said characters comprise identifiers for the vertices). Multiple connections occur between the given vertex and some of the other vertices, and as such the neighbor list 220-B may comprise multiple elements corresponding to a single other vertex. For example, the repeated inclusion of the identifier for vertex a, may represent multiple connections over time between the given vertex and vertex a.

[0029] In the present described example, when retrieving graph data for a requested time or time interval, a distinct chain representation is used to determine all vertices that have connections that occur within that time interval. In this case, it is deemed sufficient to know that at least one connection occurred between two vertices in a requested time interval, without returning further information regarding the nature of any connections within the time interval. By returning information regarding entities active within a time period, but reducing the amount of information regarding specific connections within that time period, graph retrieval operations may be increased in speed. In this case, a distinct chain representation 250 is stored for each neighbor list. The distinct chain representation indicates distinct elements of its corresponding neighbor list. That is to say, it skips repeated instances of the elements corresponding to connections with the same vertex. In Figure 2B, the distinct chain representation indicates the first occurrence of a connection to a particular other vertex. In other embodiments, it may for example indicate the last occurrence of a connection to a particular other vertex.

[0030] As illustrated in Figure 2B, in one example, the distinct chain representation may comprise at least one pointer, each said pointer being associated with a distinct element of the corresponding neighbor list and referencing a further distinct element of the corresponding neighbor list. In this manner, a set of pointers may link the distinct elements of the neighbor list. The distinct chain representation may comprise at least one timestamp associated with an element of the distinct chain representation. For example, where the distinct chain representation comprises at least one pointer associated with a distinct element of the corresponding neighbor list, a timestamp may be associated with each of these pointers indicating the time at which that pointer was created. Figure 2C shows the array structure of Figure 2A, wherein a distinct chain representation 250-1 ... 250-n is stored for each neighbor list 220-I ... 220-j.

[0031 ] In certain examples, the use of a distinct chain representation may be extended to include additional information regarding connections within a requested time or time period. In this case, at least one count indicating a number of relationships between at least two defined entities occurring in a given time interval may be stored in association with the adjacency list representation.

[0032] In examples, the distinct chain representation may comprise a hash table indicating distinct elements of the corresponding neighbor list.

[0033] Returning to Figure 2C, when an indication of a new connection between vertices Vi and Vj is obtained, for example as part of streaming data, an element corresponding to Vi is stored at one end of the neighbor list 220-j corresponding to Vj, and an element corresponding to Vj is stored at one end of the neighbor list 220- i corresponding to Vi. A neighbor pointer is then stored indicating the connection between these elements. The distinct chain representation is also updated. For example, as described above, the distinct chain representation may identify the most recent connection between a vertex and each relevant other vertex. On obtaining a new connection between vertices Vi and Vj, The distinct chain representation is updated to identify that connection instead of any previous connection between vertices Vi and Vj. A timestamp may also be stored, indicative of the time of updating the distinct chain representation. As such, for each new element to be added to a neighbor list, storing a distinct chain representation for said neighbor list comprises adding the new element to an end of the distinct chain representation and removing any element having a same value as the new element from the distinct chain representation. For example, starting from Figure 2B, a new connection to element c would result in the links from b to c and c to d being removed and replaced with a new link from b to d (in addition to a new link from c to a). This process takes a constant period of time (O(1 )) for each new connection and so is independent of the size of the dataset. It is thus straightforward to assign computing resources for the storage of new connections. [0034] In some embodiments, representations of relationships may be removed from the adjacency list representation. For example, relationships may be removed after a given amount of time has elapsed. Removing a relationship between vertices Vi and Vj comprises removing elements corresponding to the relationship from the neighbor list 220-i of vertex Vi and from the neighbor list 220-j of vertex Vj. A timestamp may be stored indicating the time at which the removal occurred. As the elements of each neighbor list are time-ordered, removal of the earliest stored relationship comprises removing the first element of the corresponding neighbor lists.

[0035] Certain examples described above enable events from a data stream, (time-varying data) to be efficiently stored. In the following passages examples of a corresponding loading or data retrieval operation are described. These examples enable portions of a time-varying graph data structure to be retrieved in response to a query having a particular time or time period.

[0036] In an example data retrieval operation, a time interval associated with a temporal portion of a time-varying graph is first obtained. A representation of the time-varying graph data may then be loaded that corresponds to that time interval. In this example, vertices of the above-described adjacency list representation are searched to locate a vertex with an associated relationship time within the time interval. This may be, for example, a binary search. For the located vertex, a search, which again may be a binary search, is performed within its neighbor list to determine an element that has an associated relationship time within the time interval. This may correspond to the first relationship in the neighbor list within the time interval, in examples in which the distinct chain representation indicates the first occurrence of each vertex, such that the element is represented in the distinct chain representation. Similarly, in examples in which the distinct chain representation indicates the last occurrence of each vertex, this may advantageously correspond to the last relationship in the neighbor list within the time interval.

[0037] Elements in the adjacency list representation may then be retrieved by starting at the determined element and traversing, based on the neighbor pointers, elements within neighbor lists that have an associated relationship time within the time interval and that form part of one of the distinct chain representations. This may for example be a breadth-first search. Upon following a neighbor pointer, this may comprise advancing through the distinct chain until the last element of the time interval, then travelling backward until reaching the end of the time interval. In this manner, a representation of the relationships associated with the time period may be efficiently loaded.

[0038] If a binary searches are used these may take 0(log(E)) time to perform, where E is the number of edges or connections present in the graph data structure. Without a distinct chain representation, a breadth first search may take 0(Vi + Ei) time to perform, where Vi represents the number of vertices or entities within the requested time interval and Ei represents the number of edges or connections within the requested time interval. In this case, a complexity of the loading operation is 0(log(E)+ Vi + Ei). When a distinct chain representation is used, a breadth first search may take 0(Vi* + Ei*) time to perform, where Vi* represents the number of unique or distinct vertices or entities within the requested time interval and Ei* represents the number of unique or distinct edges or connections within the requested time interval. In this case, a complexity of the loading operation is 0(log(E)+ Vi* + Ei*). In both cases, the time required may be modelled and appropriate computing resources may be assigned to perform the operation.

[0039] Figure 3 schematically shows a distributed system for graph analytics according to an example. In this example, the distributed system includes a master node 305 and a plurality of worker nodes 31 0a to 31 On, hereafter referred to as worker nodes 31 0, interconnected by a message passing system 31 5. In one example, each of the master node 305 and the worker nodes 31 0 is formed by a system on a chip. As shown in Figure 4, in this example the plurality of worker nodes 31 0 have respective core processors 320a to 320n, hereafter referred to as core processors 320.

[0040] The message passing system 31 5 allows for communication of messages between the master node 305 and the worker nodes 31 0, and also allows for communication of messages directly between the worker nodes 310. Although schematically shown in Figure 3 as being outside of the master node 305 and the worker nodes 31 0 for ease of illustration, it will be appreciated that parts of the message passing interface 31 5 may be provided in the systems on a chip for the master node 305 and the worker nodes 310. In an example, the message passing system 315 comprises the Message Passing Interface.

[0041] In an example, the worker nodes 310 can access time-varying graph data stored by a shared memory 325 via a memory management module 430. The shared memory 325 distributes the graph data between a plurality of non-volatile memories 335a to 335m, hereafter collectively referred to as NVMs 335. The memory management module 330 provides the worker nodes 310 with shared access to the shared memory 325. Although the memory management module 330 is schematically shown as being outside the worker nodes 310 and the distributed memory 325 for ease of illustration, it will be appreciated that at least part of the memory management module 330 may be located in the same hardware devices as the worker nodes 310 and the NVMs 335. In examples, the memory management module may be implemented in software, hardware or a combination of software and hardware.

[0042] The NVMs 335 may be any form of non-volatile memory. In an example, the NVMs may be formed using memristor technology.

[0043] In an example, as schematically shown in Figure 3, the master node 305 has program code for a graph data insertion interface 340, a graph query interface 345, a task dispatcher 350, a graph composer 355 and a task monitor 360.

[0044] The graph data insertion interface 340 receives data from a data source 365, for example a telecommunications network or a social network, and inserts the data as new vertices or edges in the time-varying graph stored by the distributed memory 325. In an example, this is achieved by the graph data insertion interface 340 allocating tasks for achieving the data insertion to the plurality of worker nodes 310. This allocation may be made on the basis that each worker node 31 0 is responsible for a respective portion of the time-varying graph, for example the portion of the time varying graph stored on one of the NVMs 335. The tasks are then sent by the task dispatcher 350, via the message passing system 315, to their respective worker nodes 310.

[0045] The graph query interface 345 receives and processes a query from a query application 370, which may be for example a web application. An example of a query that may be received is a "k-hop" query for a prescribed time interval, which requests details of all paths from a query vertex of the time-varying graph having a depth of up to k edges over the prescribed time interval. The graph query interface 345 allocates tasks for implementing the query to the plurality of worker nodes 310, and then the allocated tasks are sent by the task dispatcher 350 to their respective worker nodes 310 via the message passing system 315. Reply data for the query is then received from the worker nodes 31 0 via the message passing interface 315, and collated in the graph composer 355. The collated data is then processed by the graph query interface 345 to generate further tasks or answer data for sending to the query application 370.

[0046] Figure 4 shows in more detail the array structure used to store the neighbor cell list and distinct chain representation for certain examples. As shown, the neighbor list is stored in a temporal adjacency array having a plurality of temporal adjacency array cells. Each temporal adjacency array cell corresponds to an element in the neighbor list, i.e. a connection to a neighbor vertex, and accordingly the number of temporal adjacency array cells matches the number of connections. Each temporal adjacency array cell has the same size, and stored data such as a pointer to the neighbor vertex and a timestamp. Once a temporal adjacency array cell has been filled then the filled temporal adjacency array cell can be transferred to non-volatile memory for storage.

[0047] The distinct chain array has a plurality of distinct chain array cells, with each distinct chain array cell corresponding to a respective temporal adjacency array cell such that the number of distinct chain array cells matches the number of connections. In an example, the size of each distinct cell array cell is fixed. However, the number of distinct connections to a vertex will vary over time and from vertex to vertex. Accordingly, a distinct chain array cell is not necessarily filled, with the number of incomplete cells being at most the number of different neighbors for a vertex. Further, the size of a distinct chain array cell may not be large enough to store data indicative of all distinct connections to the vertex. To address this, data indicative of distinct connections to the vertex that cannot fit within the fixed size of the distinct chain array cell is put in an additional distinct chains array, which has a plurality of additional distinct chain array cells. [0048] The additional distinct chain array has a plurality of additional distinct chain array cells which will necessarily match the number of temporal adjacency array cells, given that for some temporal adjacency array cells the number of distinct connections may be equal to or less than the number that can be indicated in a distinct chain array cell. Further, the size of each additional distinct chain array cell is not necessarily the same. In other words, the additional distinct chain array cell are non-uniform in size. Each additional distinct chain array cell will be completed synchronously with the corresponding distinct chain array cell.

[0049] Figure 5 depicts a non-transitory computer-readable medium 510 storing data 520 representative of a dynamic graph. The computer-readable medium 51 0 may be connected to a processor 530. The computer-readable medium 510 may comprise any machine-readable storage media, e.g. such as a memory and/or a storage device. In certain examples, the computer-readable medium comprises nonvolatile memory. In one case, the processor 530 may be arranged to store data 520 in memory such as RAM during active storing and/or loading of graph data. In another example, processor 530 may be arranged to store data 520 in a persistent storage device such as a hard disk or solid state device as part of a storing and/or loading operation.

[0050] The data 520 comprises an adjacency data structure 540, the adjacency data structure 540 comprising an array of vertices 545 and a set of temporal adjacency arrays 548 corresponding to said vertices, for example as described above. The data 520 further comprises a set of distinct chain arrays 550 and a set of additional distinct chain arrays 560, for example as described above.

[0051] Hence, the data 520 may comprise the data that is generated as part of the store operation described above. It may also comprise data that is interrogated to load a portion of a time-varying graph as described above.

[0052] The non-transitory computer-readable medium 510 also stores graph data insertion instructions executable by the processor 530 to insert new data into the adjacency data structure 540 and modify the set of distinct chain arrays 550 and the set of additional distinct chain arrays 560 accordingly as set out above. Further, the non-transitory computer-readable medium 51 0 also stores graph data query instructions 580 executable by the processor 530 to perform query operations on the time-varying graph data.

[0053] Certain examples described above allow for efficient storing and subsequent loading of time-varying graph data, for example received as streaming data. As explained above, storing a log file of received events allows for rapid storage and requires relatively low disk space. However, retrieving data involves constructing a new graph data structure for each retrieval operation, which may take a considerable amount of time to perform and/or fail to complete for large datasets and constrained resources. Conversely, storing a complete copy of a graph data structure for each received event allows for efficient retrieval but requires significant time and resources at the time of storage. Certain examples described herein allow for such data to be efficiently received, stored, and subsequently loaded. This allows for such processing and storage to be implemented with a reduced time and computing resource burdens.

[0054] The preceding description has been presented to illustrate and describe examples of the principles described. This description is not intended to be exhaustive or to limit these principles to any precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is to be understood that any feature described in relation to any one example may be used alone, or in combination with other features described, and may also be used in combination with any features of any other of the examples, or any combination of any other of the examples.

Previous Patent: GRAPH ANALYTICS SYSTEM

Next Patent: NURSE CALL SYSTEM WITH MOBILE DEVICE INTERFACE