Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
A DEVICE FOR AND A METHOD OF EVALUATING A NETWORK RELATED RELEVANCE OF A NETWORK ELEMENT LINKED TO ONE OR MORE FURTHER NETWORK ELEMENTS IN A NETWORK, A PROGRAM ELEMENT AND A COMPUTER-READABLE MEDIUM
Document Type and Number:
WIPO Patent Application WO/2007/074172
Kind Code:
A1
Abstract:
A device (100) for evaluating a network related relevance of a network element (201) 5 linked to one or more further network elements (202 to 206) in a network (200), the device (100) comprising a determination unit (103) adapted for determining the network related relevance of the network element (201) based on a quantitative occurrence of the network element (201) in hits of a number of predetermined network element patterns (207), at least a part of the number of predetermined 10 network element patterns (207) being indicative of a link between at least two network elements (208, 209) representing a part of the network (201).

Inventors:
SCHREIBER FALK (DE)
SCHWOEBBERMEYER HENNING (DE)
KOSCHUETZKI DIRK (DE)
Application Number:
PCT/EP2006/070262
Publication Date:
July 05, 2007
Filing Date:
December 28, 2006
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
LEIBNIZ INST FUER PFLANZENGENE (DE)
SCHREIBER FALK (DE)
SCHWOEBBERMEYER HENNING (DE)
KOSCHUETZKI DIRK (DE)
International Classes:
H04L12/24; G06F17/30
Foreign References:
US20050278325A12005-12-15
Other References:
KLEINBERG J M: "Authoritative sources in a hyperlinked environment", JOURNAL OF THE ACM, ACM, NEW YORK, NY, US, vol. 46, no. 5, September 1999 (1999-09-01), pages 604 - 632, XP002226183, ISSN: 0004-5411
NOLKER R D ET AL: "Social Computing and Weighting to Identify Member Roles in Online Communities", WEB INTELLIGENCE, 2005. PROCEEDINGS. THE 2005 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON COMPIEGNE, FRANCE 19-22 SEPT. 2005, PISCATAWAY, NJ, USA,IEEE, 19 September 2005 (2005-09-19), pages 87 - 93, XP010841907, ISBN: 0-7695-2415-X
SCHREIBER, SCHÖBBERMEYER: "MAVisto: a tool for the exploration of network motifs", BIOINFORMATICS, vol. 21, no. 17, 1 September 2005 (2005-09-01), Germany, pages 3572 - 35724, XP008077898
Attorney, Agent or Firm:
NEUEFEIND, Regina (Elisenhof Elisenstrasse 3, Munich, DE)
Download PDF:
Claims:

Claims:

1. A device (100) for evaluating a network related relevance of a network element (201) linked to one or more further network elements (202 to 206) in a network

(200), the device (100) comprising a determination unit (103) adapted for determining the network related relevance of the network element (201) based on a quantitative occurrence of the network element (201) in hits of a number of predetermined network element patterns (207), at least a part of the number of predetermined network element patterns (207) being indicative of a link between at least two network elements (208, 209) representing a part of the network (201).

2. The device (100) according to claim 1, wherein the number of predetermined network element patterns (207) is one.

3. The device (100) according to claim 1, wherein the number of predetermined network element patterns (207) is larger than one.

4. The device (100) according to any one of claims 1 to 3, wherein at least a part of the number of predetermined network element patterns (207) disregards a link with at least one further network element representing another part of the network (200).

5. The device (100) according to any one of claims 1 to 4, wherein at least a part of the number of predetermined network element patterns (207) represents a subset of the network (200).

6. The device (100) according to any one of claims 1 to 5,

wherein at least a part of the number of predetermined network element patterns (207) includes characteristics concerning a coupling strength between the at least two network elements (208, 209) representing the part of the network (200).

7. The device (100) according to any one of claims 1 to 6, wherein at least a part of the number of predetermined network element patterns (207) includes characteristics concerning a coupling direction between the at least two network elements (208, 209) representing the part of the network (200).

8. The device (100) according to any one of claims 1 to 7, wherein at least a part of the number of predetermined network element patterns (207) includes characteristics concerning a spatial structure and/or concerning a logical structure formed by the at least two network elements (208, 209) representing the part of the network (200).

9. The device (100) according to any one of claims 1 to 8, wherein at least a part of the number of predetermined network element patterns (207) is indicative of a direct link of a network element to at least one directly coupled network element and is indicative of an indirect link of the network element to at least one indirectly coupled network element.

10. The device (100) according to any one of claims 1 to 9, wherein each of the number of predetermined network element patterns (207) is indicative of a link between the at least two network elements (208, 209) representing the part of the network (200).

11. The device (100) according to any one of claims 1 to 10,

wherein the determination unit (103) is adapted for determining the network related relevance of the network element (201) as a parameter value being proportional to a number of occurrences of the network element (201) in hits of the number of predetermined network element patterns (207).

12. The device (100) according to any one of claims 1 to 11, wherein the determination unit (103) is adapted for determining the network related relevance of the network element (201) based on a position of the network element (201) within hits of one of the number of predetermined network element patterns (207).

13. The device (100) according to claim 12, wherein the position of the network element (201) within the network element pattern (207) is indicative of a functional role of the network element (201) within the network element pattern (207).

14. The device (100) according to any one of claims 1 to 13, comprising a pattern identification unit (102) adapted for identifying the number of predetermined network element patterns (207).

15. The device (100) according to claim 14, wherein the pattern identification unit (102) is adapted for identifying the number of predetermined network element patterns (207) based on an analysis of the network (200).

16. The device according to claim 14 or 15, wherein the pattern identification unit (102) is adapted for identifying the number of predetermined network element patterns (207) based on a network pattern database

(104) including data indicative of patterns being characteristic for networks or for network types.

17. The device (100) according to any one of claims 1 to 16, comprising a ranking unit (105) adapted for ranking at least a part of the network elements (201 to 206) of the network (200) based on the determined network related relevance of the network elements (201 to 206).

18. The device (100) according to claim 17, comprising a network function regulation unit (106) adapted for regulating a function of the network (200) based on the ranking.

19. The device (100) according to any one of claims 1 to 18, adapted for evaluating the network related relevance of network elements (201 to 206) in a network (200) selected from the group consisting of a biological network, a bioinformatics network, a database comprising linked data items, a neural network, a computer network comprising linked computers, a document network comprising linked documents, an electrical circuit comprising interconnected electrical circuit members, a telecommunication network, a biochemical network, a social network, and a finance data analysis network.

20. A method of evaluating a network related relevance of a network element (201) linked to one or more further network elements (202 to 206) in a network (200), the method comprising determining the network related relevance of the network element (201) based on a quantitative occurrence of the network element (201) in hits of a number of predetermined network element patterns (207), at least a part of the number of

predetermined network element patterns (207) being indicative of a link between at least two network elements (208, 209) representing a part of the network (201).

21. A program element of evaluating a network related relevance of a network element (201) linked to one or more further network elements (202 to 206) in a network (200), which program element, when being executed by a processor (100), is adapted to control or carry out a method comprising: determining the network related relevance of the network element (201) based on a quantitative occurrence of the network element (201) in hits of a number of predetermined network element patterns (207), at least a part of the number of predetermined network element patterns (207) being indicative of a link between at least two network elements (208, 209) representing a part of the network (201).

22. A computer-readable medium, in which a computer program of evaluating a network related relevance of a network element (201) linked to one or more further network elements (202 to 206) in a network (200) is stored, which computer program, when being executed by a processor (100), is adapted to control or carry out a method comprising: determining the network related relevance of the network element (201) based on a quantitative occurrence of the network element (201) in hits of a number of predetermined network element patterns (207), at least a part of the number of predetermined network element patterns (207) being indicative of a link between at least two network elements (208, 209) representing a part of the network (201).

Description:

A device for and a method of evaluating a network related relevance of a network element linked to one or more further network elements in a network, a program element and a computer-readable medium

This application claims the benefit of the filing date of US Provisional Patent Application No. 60/754,933 filed December 29, 2005, the disclosure of which is hereby incorporated herein by reference. The invention relates to a device for evaluating a network related relevance of a network element linked to one or more further network elements in a network.

The invention further relates to a method of evaluating a network related relevance of a network element linked to one or more further network elements in a network. Moreover, the invention relates to a program element.

Further, the invention relates to a computer-readable medium.

A network may be considered to comprise components and connections therebetween. An example for a network is a neural network being an interconnected group of artificial or biological neurons. It is possible to differentiate between two groups of neural networks: On the one hand biological neural networks, for example the human brain or parts thereof. On the one hand artificial neural networks which refer to electrical, mechanical or computational simulations or models of biological neural networks. There exist hybrids, incorporating biological neurons as part of electronic circuits.

It may be of interest to distinguish between more relevant and less relevant network elements within a network.

An example is the PageRank method used by Google, Inc. (US 6,285,999). This method ranks the web pages with the help of their positions within the network (web) on the basis of the references (links) between the (HTML) documents.

US 6,285,999 discloses a method which assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database. The rank assigned to a document is calculated from the ranks of documents citing it. In addition, the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document. The method is particularly useful in enhancing the performance of search engine results for hypermedia databases, such as the world wide web, whose documents have a large variation in quality.

It is an object of the invention to efficiently evaluate the relevance of a network element in a network.

In order to achieve the object defined above, a device for and a method of evaluating a network related relevance of a network element linked to one or more further network elements in a network, a program element and a computer- readable medium according to the independent claims are provided.

According to an exemplary embodiment of the invention, a device for evaluating a network related relevance of a network element linked to one or more further network elements in a network is provided, the device comprising a determination unit adapted for determining the network related relevance of the network element based on a quantitative occurrence of the network element in hits of a number of predetermined network element patterns, at least a part of the predetermined network element patterns being indicative of a link between at least two network elements representing a part of the network.

According to a further exemplary embodiment of the invention, a method of evaluating a network related relevance of a network element linked to one or more further network elements in a network is provided, the method comprising determining the network related relevance of the network element based on a quantitative occurrence of the network element in hits of a number of predetermined

network element patterns, at least a part of the predetermined network element patterns being indicative of a link between at least two network elements representing a part of the network.

According to still another exemplary embodiment of the invention, a program element is provided, which, when being executed by a processor, is adapted to control or carry out a method having the above-mentioned features.

According to yet another exemplary embodiment of the invention, a computer-readable medium (e.g. a CD, a DVD, a USB stick, a floppy disk or a harddisk) is provided, in or on which a computer program is stored which, when being executed by a processor, is adapted to control or carry out a method having the above-mentioned features.

The system according to embodiments of the invention can be realized by a computer program, that is by software, or by using one or more special electronic optimization circuits, that is in hardware (for instance including one or more microprocessors), or in hybrid form, that is using software components and hardware components.

According to an exemplary embodiment of the invention, a method for a pattern-based ranking or rating of network elements within a network is provided comprising the determination of each incidence of a pattern or of a set of patterns in the network to be analyzed. Each incident may be denoted as a score or a hit. The entirety of incidences forms a set of hits. On the basis of such a pattern recognition result, it may be estimated for each network element (for instance for each node of the network) how often this element occurs in the various hits of the hit set. The number of incidents may then be taken as a basis for evaluating the importance of this network element.

With such an algorithm, network- related data (for instance data related to a linked database, to a biological network, to a search engine for the Worldwide Web, to a finance analysis system) may be ranked efficiently and logically so that a

meaningful statement may be made which network elements might be of higher relevance compared to other network elements.

According to an exemplary embodiment, not only the local environment of a single network element (for instance the number of couplings of this network elements) may be taken into account for estimating the relevance of this particular network element. Furthermore, a system according to an exemplary embodiment of the invention does not only exploit global network information, like an estimation of all network elements of a network to determine information concerning the relevance of individual nodes. In contrast to this, an approach between these two extremes may be taken. According to such an approach, patterns in the form of spatial or logical structures in the order of magnitude between a single network element restricted approach and an approach considering all network elements may be taken. This may, on the one hand, provide more significant and more reliable relevance information as compared to a single node approach. On the other hand, the computational burden for mapping virtual patterns to a physical network may be lower as compared to an analysis which takes into account all network elements, particularly in a scenario in which the analysis of a part of the network is sufficient. Thus, embodiments of the invention take a reasonable middle course.

The number of hits with regard to patterns to which a particular network element under consideration contributes may be taken as a measure for the importance of this network element within the entire network system. After an optional pattern recognition procedure for recognizing appropriate patterns within the network, such pattern information may be used for an initial evaluation or for an updated evaluation of network elements. For instance, components of a gene regulation network may be evaluated using such relevance information. Genes, more particularly parts of genes, may encode biological information for protein synthesis. In a gene regulation network, individual genes may switch on other genes and/or may switch off other genes. In

such a gene regulation network, patterns may represent the structural and/or functional coupling of, for instance, three to four gene portions as network elements and may represent the connections therebetween. Embodiments of the invention may allow, for such a biological network, pattern recognition and pattern evaluation. Thus, genes or gene portions may be detected which occur relatively often in such patterns. Taking this measure, more important gene parts may be distinguished from less important gene parts. Therefore, regulatory networks for plants may be analyzed. According to another exemplary embodiment, a (text) document database of a plurality of linked (text) documents may be analyzed. By analyzing the manner according to which different documents are linked to one another, by analyzing which linkage patterns occur in such a system, and by analyzing how often individual documents occur in such linked structures, it may be possible to distinguish automatically between more important documents and less important documents. A number of pattern recognition methods are known to the person skilled in the art. However, particular reference is made to Schreiber, F. and Schwδbbermeyer, H., "MAVisto: a tool for the exploration of network motifs", Bioinformatics, 21, pages 3572-3574, 2005. The entire disclosure of this document is herewith incorporated by reference in the disclosure of this application. Particularly, explicit reference is made to those portions of the cited document which are related to automatic pattern recognition in networks. These methods may be particularly appropriate for use with a system according to an exemplary embodiment of the invention.

MAVisto, a software for the identification and examination of patterns in networks may allow for an evaluation of individual nodes of the network based on identified pattern distributions. In this context, various frequency concepts may be applied according to which the evaluations may be calculated. The corresponding algorithms and processes have already been implemented on a computer and have

been tested for networks of up to several hundred components. These experiments have proven the reliability of the results obtained using methods according to exemplary embodiments of the invention.

According to an exemplary embodiment, a method for evaluating network elements based on their contribution to defined link patterns may be provided. A methodology according to an exemplary embodiment includes, in a first step, a determination and identification of any desired pattern in a given network, followed by the (qualitative or quantitative) evaluation of each individual network element in accordance with the number of independent or overlapping patterns the respective network element is part of.

Such a method may be used for regulating an analysis of networks in various technical fields like bioinformatics, process technology or information technology. According to an exemplary embodiment, the concern of network elements (like neurons in a neural network) is estimated based on the frequency with which an individual network element occurs in one or more defined linkage patterns. Thus, structures of a network element under examination linked with next neighbours and/or second next neighbours, etc., may be taken as a criteria how important the role of this individual network element might be in the entire network. If desired, in addition to this priority scheme, further criteria may be included optionally to determine the relevance of an individual network element. For instance, the estimation of node degree centralities, like for instance local maxima, the sum of the shortest paths, a position of the network element in the network or a purely numerical analysis of linear links of individual network elements to their partners may be taken into account. In other words, apart from the spatial structure of a network element in a pattern image, local and/or global network characteristics may be considered additionally for further refining the relevance ranking of individual network elements.

Exemplary embodiments of the invention are based on the recognition, that, for instance in biological systems, particular patterns of linkages between various elements can be found which cannot be illustrated by established correlations or centralities. However, such links may be of significance for the determination and identification of patterns in given networks, as well as for the evaluation of each individual network elements in accordance with the number of independent or overlapping patterns the respective network element is part of.

Based on this recognition, reliable network evaluation rules may be derived. Such information may be taken as a basis for regulating complex networks, for instance in process technology, in database systems, in the Internet, in the analysis of bioinformatics data, and also in academic fields like economic science. The fast identification of key nodes of a network may be of importance for the secure and accurate transmission or routing of information or signals, for instance in the field of communication technology. According to an exemplary embodiment, a method for a pattern-based evaluation of network elements is provided. Such a method for evaluating elements of a network may be applied, for instance, to a biological network, an electronic circuit or a database comprising documents which are cross- linked to one another (for instance hypertext documents in the Worldwide Web). In contrast of purely local or purely global structures within a network, embodiments of the invention use patterns, that is spatial and/or logic sub -networks within the network for estimating the relevance of individual network elements. Particularly, the utilization of pre-known patterns in networks (for instance patterns like a "feed- forward loop") may be of significance for a reliable ranking of the network elements. Thus, information concerning pattern shapes in networks may be implemented in a system for estimating of the importance of individual network elements.

Exemplary technical fields in which embodiments of the invention may be applied are all fields in which complex networks occur. Products which may be developed, operated or manufactured using embodiments of the invention may be systems for the analysis of network based data (like web search engines), bioinformatics tools, programs for finance data analysis, or the like.

Next, further exemplary embodiments of the invention will be explained.

In the following, further exemplary embodiments of a device for evaluating a network related relevance of a network element linked to one or more further network elements in a network will be explained. However, these embodiments also apply for the method of evaluating a network related relevance of a network element linked to one or more further network elements in a network, for the computer- readable medium and for the program element.

The number of predetermined network element patterns may be exactly one. The use of only a single pattern and the analysis of the occurrence of this pattern within a given network may already provide significant relevance information and may allow to rank network elements with low computational burden. Thus, the evaluation of a network can be performed very fast so as to rapidly provide a user with fundamental information concerning possib Iy relevant parts or items of the network.

Alternatively, the number of predetermined network element patterns may be larger than one. By using a plurality of predetermined network element patterns for estimating the relevance of network elements in the network, a more refined approach may be taken allowing a deeper understanding and an even more reliable identification of key network elements within the network.

At least a part of the predetermined network elements may disregard a link with at least one further network element representing another part of the network. In other words, the pattern-based analysis of the network may include patterns which

represent only a real subset of the network. In contrast to global approaches, patterns do not consider links between all network elements but may consider only a part, preferably a relatively small part of the network elements. This may allow for a simple and fast search in the network for portions of the network which fulfil the pattern requirements.

At least a part of the predetermined network element patterns may represent a subset (particularly a proper subset or a proper subclass) of the network. For instance, the number of network elements being part of a pattern may be n, wherein the number of network elements in the entire network may be N. Then, n is preferably smaller than N so that the pattern forms a real subset of the entire network.

At least a part of the predetermined network element patterns may be indicative of a coupling strength between the at least two network elements representing the part of the network. A network may comprise a plurality of network nodes (for instance neurons of a neural network), wherein individual network elements may be coupled to one another via connection lines, for instance so-called synapses. These couplings may have different strength values so that more significant or frequently used connections may be described with a higher strength value than less relevant network element connections. A pattern may take such information concerning individual couplings into account. One rule characterizing a pattern may be that a portion of a network may be considered to map to a pattern only if the strength values of the individual linkages between the network elements fulfil certain conditions, for instance concerning the ratios of the individual strengths.

Additionally or alternatively, at least a part of the predetermined network element patterns may be indicative of a coupling direction between the at least two network elements representing the part of the network. Embodiments of the invention may be applied to a directed network or to a non- directed network. A non- directed network comprises network elements coupled without a certain coupling direction between two individual network elements. A directed network comprises

network elements coupled with a certain coupling direction between two individual network elements. A coupling direction from a first network element to a second network element may encode the fact that the source network element of this connection serves as some kind of control network element and the destination network element of such a coupling pair is a controlled instance. It is possible that the direction of a connection between two network elements is defined to extend from one network element to the other, but it is also possible that bi-directional couplings are provided, so that a signal transfer in both directions between the two network elements is possible. At least a part of the predetermined network element patterns may be indicative of a spatial structure and/or of a logical structure formed by the at least two network elements representing the part of the network. In other words, a pattern may include some kind of structural or functional or logical coupling scheme of a coupling between a plurality of network element nodes. At least a part of the predetermined network element patterns may be indicative of a direct link of the network element to at least one directly coupled network element and may be indicative of an indirect link of the network element to at least one indirectly coupled network element. A "direct" link between two network elements may be defined as a link which allows a signal transfer between one network element to another network element without any intermediate instance. An "indirect" link of one network element to another network element may be present in a case in which at least one intermediate network element is located in between two network elements when the signal is exchanged between these two indirectly coupled network elements. Not only a part, but each of the predetermined network element patterns may be indicative of a link between the at least two network elements representing the part of the network. Thus, not only a part of, but all considered patterns may

represent such a network element correlation having a characteristic in between a purely local correlation and a purely global correlation.

The determination unit may be adapted for determining the network related relevance of the network element as a parameter value (for instance a number) being proportional to a number of occurrences of the network element in the number of predetermined network element patterns. Thus, as a result of the analysis, one, a part of, or all network elements may be assigned with a numerical value being a measure for the relevance of the respective network element within the network. This number may be also normalized (for instance to fall in a range between "0" and "1") so as to provide a value representing the relative importance of an individual network element within the network.

The determination unit may be adapted for determining the network related relevance of the network element based on a position of the network element within one of the number of predetermined network element patterns. Thus, not only the frequency of occurrences of a network element in the pattern(s) may be taken as a basis or a criteria for a decision whether this particular network element is more or less relevant for the entire network, but a particular (spatial or logical) position of a network element within such a pattern may be used to weight the importance of the individual network element within a pattern structure. Particularly, the position of the network element within the network element pattern may be indicative of a functional role of the network element within the network element pattern. For example, in a feed- forward loop pattern, which will be described below in more detail, the position and the functional role of an individual network element within such a structure may be different for individual network elements of this pattern. Considering a position or a function of a network element with a pattern and assigning "relevance credits" in dependence of this position or function may allow to further refine the relevance evaluation of embodiments of the invention, since the

functional role within the network may assist in distinguishing between more important and less important network elements within a pattern.

The device may comprise a pattern identification unit adapted for automatically identifying the number of predetermined network element patterns. Particularly appropriate network element pattern recognition methods are disclosed in Schreiber, F. and Schwδbbermeyer, H., "MAVisto: a tool for the exploration of network motifs", Bioinformatics, 21, pages 3572-3574, 2005, the entire disclosure of which is herewith incorporated by reference in the disclosure of this application.

The pattern identification unit may be adapted for identifying the number of predetermined network element patterns based on an analysis of the network. Thus, characteristics of a special network under examination may be taken to individually derive characteristic pattern structures from this particular network.

Additionally or alternatively, the pattern identification unit may be adapted for identifying the number of predetermined network element patterns based on content stored in a network pattern database including data indicative of patterns being characteristic for networks or for network types. Particularly, empiric data, experimental knowledge and/or expert rules may be applied to analyze a network with regard to the occurrence of patterns. For instance, a feed- forward loop or other typical functional coupling schemes between different network elements may be taken from the database which may be of relevance to occur in networks. The usage of such a database may be refined by distinguishing between different network types, for instance directed and non- directed networks, or biological or artificial networks. The device may further comprise a ranking unit adapted for ranking at least a part of the network elements of the network based on the determined network related relevance of the network elements. As an output of such a ranking unit, an ordered table of the network elements may be provided, wherein each network element may be assigned to a corresponding relevance value. The ranking unit may provide an ordered table of the network elements so as to allow a fast and easy overview over

more and less relevant network elements. It is also possible that the network elements are grouped in various importance groups.

The device may further comprise a network function regulation unit adapted for regulating a function of the network based on the ranking. When a network is controlled, for instance a search engine of the public Internet, the order according to which documents are output as a result of a search query may be adjus ted in accordance with the relevance of the network elements. This may ensure, for instance in the case of such a search engine, that a user is provided predominantly with more relevant network elements (for instance documents within the network) as compared to less relevant networks.

The device may be adapted for evaluating the network related relevance of network elements in a network selected from the group consisting of a biological network, a bioinformatics network, a database comprising linked data items, a neural network, a computer network comprising linked computers, a document network comprising linked documents, an electrical circuit, and a finance data analysis network.

In the following, a method for pattern-based ranking of network elements according to an exemplary embodiment of the invention will be explained.

The ranking of elements of a network has many technical applications such as the identification of key documents in a document database, the detection of important pages in the world wide web, the recognition of key players in the network of financial dependencies between companies and the finding of important genes in a biological regulatory network.

Centrality measures are based on a limited set of concepts and usually consider either the local topology (i.e., the direct neighborhood of an element, for example, by counting all links to neighbors) or the global topology (i.e., the overall structure of the network, for example, by building the sum of the shortest paths to all other elements and considering elements with a low sum as central). According to an

exemplary embodiment of the invention, a method for the ranking of network elements is provided which uses patterns of local interconnections (local structural properties) within the network. It uses information derived from the occurrences of such patterns to rank the network elements. Usually these patterns are small and therefore this concept considers information placed between the local and global topology.

Such a method may be particularly useful for certain biological networks, but can be used for different kinds of networks such as linked documents. In biology, the application of high- throughput methods (transcriptomics, proteomics, metabolomics) delivers huge amounts of data about biological elements and their interactions (e.g., protein-protein interactions, correlation networks, gene regulation). These experiments aim on gaining new scientific insight, for instance, finding of new target genes for drug development. A specific prediction of important elements which serve as potential candidate genes for a certain question can serve to focus the necessary experiments stronger and thereby reduce time and costs of experiments. One possibility for such predictions is based on the topology of the biological network. Based on the topology, the single elements of the network can be ranked and clustered depending on their importance.

One aspect of the method for the pattern-based ranking of network elements may consider different aspects of the context (of the local structural properties), so that the evaluation or ranking of network elements can be customized to specific questions.

A pattern may be formed by a plurality of consecutively connected network elements. However, a pattern may also be formed by a plurality of parts which are not connected to one another.

The aspects defined above and further aspects of the invention are apparent from the examples of embodiment to be described hereinafter and are explained with reference to these examples of embodiment.

The invention will be described in more detail hereinafter with reference to examples of embodiment but to which the invention is not limited.

Fig. 1 shows a device for evaluating a network related relevance of a network element linked to further network elements in a network according to an exemplary embodiment of the invention.

Fig. 2A shows a network and Fig. 2B shows a network pattern occurring in the network of Fig. 2A.

Fig. 3A shows another network and Fig. 3B shows another network pattern occurring in the network of Fig. 3A.

Fig. 4A shows a pattern, Fig. 4B shows a network and Fig. 4C shows a match of the pattern which is highlighted in the target network, wherein only the structure (topology), but not the current layout (geometry) of the pattern and the network are of interest. Fig. 5A and Fig. 5B show two important patterns in biological networks, namely a bi-fan (Fig. 5A) and a feed- forward loop (Fig. 5B).

Fig. 6A shows a pattern (feed- forward loop), Fig. 6B shows a network and Fig. 6C to Fig. 6E show all matches of the pattern of Fig. 6A in the network of Fig. 6B. Fig. 7A shows a pattern P 1 , Fig. 7B shows a network showing the pattern- based ranking of network elements for this pattern, Fig. 7C shows a pattern p 2 , and Fig. 7D shows the same network as in Fig. 7B, now with the pattern-based ranking computed using pattern p 2 .

Fig. 8 A shows a pattern with roles and Fig. 8B shows a small network where three matches of the pattern exist.

Fig. 9 shows a table illustrating a correlation between network elements and a pattern-based network ranking value.

Fig. 10 shows a table illustrating a correlation between network elements and a role-based pattern-based network ranking value.

Fig. 11 shows a table illustrating a correlation between network elements and a pattern-based network ranking value for the gene regulatory network of E. coli. Fig. 12 shows a table illustrating a correlation between network elements and a role-based pattern-based network ranking value for the gene regulatory network of E. coli.

Fig. 13 shows the gene regulatory network of E. coli.

Fig. 14A to Fig. 14C illustrate a graph, a motif and a match for a system according to an exemplary embodiment of the invention.

Fig. 15A and Fig. 15B illustrate a feed-forward loop (FFL) motif and a target graph for a system according to an exemplary embodiment of the invention.

Fig. 16 illustrates motif based centralities for the graph shown in Fig. 15B according to the FFL motif. Fig. 17A and Fig. 17B illustrate the FFL motif with roles A, B and C and the example graph from Fig. 15B.

Fig. 18 illustrates an extended motif based centrality based on the FFL motif with roles (Fig. 17B) in the example graph (Fig. 17A).

Fig. 19A and Fig. 19B show a motif with three vertices and a target graph, wherein the use of two different roles (B and C) for the vertices at the bottom of the motif is not allowed.

Fig. 20 shows a sketch of the chain motif class, wherein for centrality analysis based on this motif class the roles A or B are of interest.

Fig. 21 shows an example graph to illustrate the concept of the motif class centrality for a system according to an exemplary embodiment of the invention.

Fig. 22 shows a table illustrating centralities for the vertices of the example graph in Fig. 21 based on the chain motif class of Fig. 20, wherein the length of the

chain is defined as the number of vertices in the chain, and the centrality values for role A (top vertex of each chain) for the motif class chains are shown.

Fig. 23 shows a table of top ranking genes of the E. coli GRN according to the motif based centrality based on the FFL motif. Fig. 24 shows a table showing top ranking genes of E. coli GRN according to the sum of the extended motif based centrality values based on the FFL motif, wherein reference is made to Fig. 17A for the roles, and the centrality values of the genes are shown for each role.

Fig. 25 shows top ranking genes of E. coli GRN according to the extended motif based centrality values based on the FFL motif shown for each role, wherein numbers in brackets are the centrality values.

Fig. 26 shows a table of top ranking genes of the E. coli GRN according to the motif class centrality based on the chain motif class for the role A (see Fig. 20), wherein there are no chains with a length greater than 7, wherein a chain of length 2 gives the number of directly regulated genes.

The illustration in the drawing is schematically. In different drawings, similar or identical elements may be provided with the same reference signs. In the following, referring to Fig. 1, a device 100 for evaluating a network related relevance of a network element linked to further network elements in a network according to an exemplary embodiment of the invention will be described.

The device 100 for evaluating a network related relevance of a network element linked to a plurality of further network elements comprises a network input unit 101. The network input unit 101 may be denoted as a source or an origin of an individual network and describes or defines the coupling of individual network elements of a given network. For instance, the network input unit 101 may include

(links to) all documents of the Worldwide Web reachable via a search engine which documents are linked to one another.

The network characteristics may be supplied to a determination unit 103 which is adapted for determining a network related relevance value for each of the network elements of the network defined by the network input unit 101 based on a quantitative occurrence of the network elements in hits of a number of predetermined network elements patterns. The network element patterns are indicative of a link between at least two network elements representing a part of the network. The determination unit 103 may estimate the importance of one, a part or all network elements of the network defined by the network input unit 101 based on network pattern identification information provided by a network pattern identification unit 102.

This network pattern identification unit 102 may recognize patterns from the particular network provided by the network input unit 101 and/or may use network pattern database information provided by a network pattern database 104.

In such a network pattern database 104, characteristic, typical or frequently occurring patterns in networks, if desired separately for individual network types, may be stored. This information may be used by the determination unit 103 so as to evaluate the importance of the individual network elements of the network defined by the network input unit 101.

At an output of the determination unit 103, individual network elements of the network are assigned to corresponding relevance values which are supplied to a ranking unit 105. The ranking unit 105 may order the network elements concerning their relevance and may store the result in a look-up table. The ranking performed by the ranking unit 105 may serve as a basis for the operation of a network control unit 106 controlling or regulating the network input unit 101. In other words, the handling of the network may be regulated or updated based on the ranking information processed by the network control unit 106.

Coming back to the example of the number of documents linked to one another in the Worldwide Web, when a user operates a system 100 and performs a keyword search in the Worldwide Web, the resulting documents matching with the user- defined search criteria may be ordered in accordance with the relevance ranking of the network elements (that is to say of the documents) forming the network.

Fig. 2A shows a network 200 to be analyzed according to an exemplary embodiment of the invention.

The network 200 includes a plurality of network nodes 201 to 206 which are interconnected to one another using connections 210. A coupling strength may or may not be assigned to each of the connections 210.

For analyzing the network 200, a network pattern 207 shown in Fig. 2B may be utilized. This virtual network pattern 207 consists of two network elements 208, 209 coupled via a connection 211.

When the network 200 is analyzed with regard to the occurrence of the network pattern 207, each of the network elements 201 to 206 is assigned with a number illustrated within the respective rectangular symbols of the network elements 201 to 206. The central network element 201, for instance, has assigned a relevance value of "4", since it occurs four times in hits of the pattern 207, namely in combination with each of the network elements 202, 203, 204 and 205. The network element 205 occurs two times in hits of the pattern 207, namely in combination with the network element 201 and in combination with the network element 206. Each of the remaining network elements 202 to 204, 206 only occurs a single time in hits of the pattern 207 so that the corresponding values of relevance are "1" in all four cases.

Thus, Fig. 2A and Fig. 2B show an example for a pattern-based evaluation in an undirected network. The numbers within the nodes 201 to 206 of the network 200 show the results of the evaluation in accordance with the pattern 207.

Fig. 3A shows a network 300 and Fig. 3B shows a network pattern 310 occurring in the network 300 of Fig. 3 A.

Fig. 3A shows a plurality of interconnected network elements 301 to 307 which are coupled by directed connections 315. The network pattern 310 represents a coupling between three network elements 311 to 313 interconnected via connections 316. The network pattern 310 may be mapped onto the network 300 so as to provide relevance values for each of the physical network elements 301 to 307. In contrast to this, the network elements 311 to 313 may be denoted as artificial or virtual network elements, since they occur in the pattern 310 and not directly in the network 300.

Fig. 3 A and Fig. 3B is an example for a pattern-based evaluation in a directed network. The numbers within the nodes 301 to 307 of the network 300 are indicative of the evaluation of the individual network nodes 301 to 307 in accordance with the pattern 310.

Node 301, for instance, has a relevance value of "2", since node 301 forms part of the pattern 310 in combination with nodes 305 and 307, and with nodes 306 and 307. In the following, definitions and descriptions for a method for pattern-based ranking of network elements according to an exemplary embodiment of the invention will be given.

Next, a network, a pattern, and a match will be explained

A network G = (V, E) consists of a finite set of vertices V = {vi,...,v n } and a finite set of edges E = {ei,...,e m } where each edge connects two vertices v h V j . A vertex may also be denoted as a network node. Edges can be either undirected or directed. Two networks Gi = (Vi, Ei) and G2 = (V 2,Ei) are isomorphic, if

(1) there exists a bijective mapping between the vertices in Vi and V 2 ,

(2) there is an edge between two vertices of one network if and only if there is an edge between the two corresponding vertices in the other network.

A network Go = (Vo, Eo) is a sub -network of a network G = (V,E) if Vo is a subset of V and Eo is a subset of E\(VoX Vo).

A pattern as used here may be a small, network. A match of a pattern within a

target network G is a network Go, which is

(1) a sub -network of G and

(2) isomorphic to the pattern.

Fig. 4Ato Fig. 4C show a pattern 400 formed of virtual network elements 401 interconnected via couplings 402, show a network 410 formed of real or actual network elements 411 interconnected via couplings 412, and show a match of the pattern 400 in the network 410.

Next, a pattern search will be explained.

The analysis of patterns of local interconnections within networks may be of interest to many technical research areas. Network pattern analysis may be particularly important for the functional analysis of biological networks.

An example of a pattern 500 (which may also be denoted as a motif) is the bi- fan (see Fig. 5A).

Another example of an important pattern 510 (also called motif) is the feed- forward loop (see Fig. 5B) which has been shown to perform information processing tasks.

As the finding of matches of a pattern in a network may be important, several methods for pattern- finding do exist. These methods can be used in a method for pattern-based ranking of network elements according to an exemplary embodiment of the invention.

The result of such an algorithm is a list of all matches of a particular pattern, see Fig. 6 A to Fig. 6E

Fig. 6A shows a pattern 510 (feed- forward loop), Fig. 6B shows a network 600 and Fig. 6C to Fig. 6E show all matches 610, 620, 630 of the pattern 510 in the network 600.

Next, a method description will be given.

Prerequisite:

- A network consisting of network elements (vertices) v and connections

(edges) e (if necessary the network has to be derived from the data)

- All network elements v have a centrality value of 0 (c(v) = 0) Method:

1. Choose a pattern p of interest 2. Compute all matches m of the pattern p in the network

3. For each network element v:

3. a. For each match m of the pattern p :

3.aa. If the element v is part of the match m then increase the centrality value of v by 1 (c(v) = c(v)+l) 4. Rank all network elements depending on the centrality value of the elements

For two examples, see Fig. 7A to Fig. 7D.

Fig. 7A shows a pattern 700 /?;, Fig. 7B shows a network 710 showing the pattern-based ranking of network elements for this pattern 700. Fig. 7C shows a pattern 720 p 2 , and Fig. 7D shows the same network 710 as in Fig. 7B, now with the pattern-based ranking computed using pattern 720 p 2 .

The result of the method applied to a biological network (gene -regulatory network) is presented below.

As higher the number of a vertex is, as more important is this vertex. Next, a role based determination of centrality values will be explained.

For some applications the position of a vertex within a pattern is of interest. This enriched problem can be solved with an extended method. The extension uses roles, a way to classify vertices in a pattern. As an example, a regulatory pattern will be considered. Fig. 8A shows a pattern 800 with roles A, B, C and Fig. 8B shows a small network 810 where three matches of the pattern 800 exist.

Within this pattern, one vertex (vertex A) is the main regulator, one (vertex B) is an intermediate regulator and the third vertex (vertex C) is the regulated entity,

see Fig. 8 A, Fig. 8B.

In Fig. 8 A, the pattern 800 is shown where the vertices are marked with the roles A, B and C. In Fig. 8B, the small network 810 is shown. In this network the pattern matches exactly three times. Without the interpretation of roles the vertices marked vl to v5 are assigned the centrality values shown in Fig. 9.

By considering roles for the calculation of the pattern-based ranking, the results shown in Fig. 10 are obtained.

This extended concept considers different aspects of the context (of the local structural properties) of a vertex, so that the evaluation or ranking of network elements can be customized to specific question. The assignment of the same role to several vertices within the pattern is possible. This extended method, as the basic method, is applicable for directed and undirected networks.

Formally, the addition of roles to the patterns may be accomplished by the notation of a labelled network. A labelled network is a network G = (V, E) with an additional label function 1: Vl R where R is the set of labels (roles) available. The assignment of labels to the vertices does not have to be unique, therefore several vertices may receive the same label.

In the following, an extension of the above method will be described.

The computation of the pattern-based ranking including roles follows the same line as the computation of the centrality for the case without roles. Only in step 4 the method has to be modified.

Prerequisite:

- A network consisting of network elements (vertices) v and connections (edges) e (if necessary the network has to be derived from the data) - All network elements v have a centrality value of 0 under all roles r (c(v,r) =

0)

Extended method:

1. Choose a pattern p with roles of interest

2. Compute all matches m of the pattern p in the network without consideration of the roles

3. For each network element v:

3. a. For each match m of the pattern p : 3.aa. If the element v is part of the match m at role r then increase the centrality value of v for role r by 1 (c(v,r) = c(v,r)+\)

4. Rank all network elements according to their centrality value depending on the role of interest

In the following, an application of ranking to network elements in biological networks according to an exemplary embodiment of the invention will be explained. Network analysis for biological networks covers for example protein- protein interaction networks, brain networks representing neuronal connectivity and gene regulatory networks. In gene regulatory networks vertices correspond to the DNA sequences (genes) and edges represent interactions between genes (i.e., if the corresponding product of one gene interacts with the promoter of the regulated gene). To understand the complex mechanisms of gene regulation, the identification of key regulators is important as they control the structure and function of cells and organisms.

Here the known gene regulatory network of E. coli which comprises 418 vertices and 519 edges is analysed, see Fig. 13. Based on the feed- forward loop pattern, the pattern-based ranking of network elements calculates a high rank for a few transcription factors, see the table of Fig. 11. These factors are known to be global regulators which control the transcription of many genes under particular conditions. Therefore the method is able to identify these key elements from the network structure, see Fig. 11.

The vertices of the feed- forward loop pattern can be distinguished into three different functional roles: a main regulator, an intermediate regulator and a regulated entity (see Fig. 8A, Fig. 8B). The importance of genes in the transcriptional

regulatory network of E. coli can be further studied by considering the covering of functional roles for the pattern. This analysis gives an even better understanding of the different network elements as it shows that the vertices have only one role in different matches and adopt another role at most in one other match. The results (see Fig. 12) indicate that the ranking of transcription factors based on the network-based ranking for the feed-forward loop pattern can distinguish between transcriptional regulators and regulated entities.

Next, ranking of gene regulatory network elements based on functional substructures will be explained in more detail. Network centrality analysis is a valuable method for the structural analysis of biological networks. It may be used to identify key elements within networks and to rank network elements such that experiments can be tailored to interesting candidates. Several centrality measures are conventionally used, in particular for gene regulatory, metabolic and protein interaction networks. However, the conventional centralities either consider only the direct neighbourhood of a network element or use a global analysis of the complete network structure to compute the importance of single elements. In particular, they ignore functional building blocks within biological networks and therefore do not consider specific network substructures of interest. According to an exemplary embodiment of the invention, it is possible to incorporate functional substructures (motifs) into network centrality analysis and to implement a new architecture to rank vertices of networks. A method for motif based centrality analysis and two extensions are presented below. These extensions broaden the idea of motif based centrality to specific functions within motifs and to the consideration of classes of similar networks. The method is applied to the gene regulatory network of E. coli, where it yields interesting results about key regulators.

Systems biology includes the analysis of biological networks, for example, protein interaction, metabolic and gene regulatory networks. Such networks comprise

vertices and edges, wherein the vertices represent, for example, proteins, metabolites or genes, and edges represent functional relations between these elements. Of particularly interest is the ranking or ordering of such network elements. This may be useful for studying biological networks such as the lethality analysis and protein interaction networks and the examination of metabolic pathways and protein domain networks.

Conventional centrality measures consider either only the local or the global network structure for the computation of the ranking of network elements. However, to analyze functional interactions between network elements, the neighbourhood of these elements should be incorporated. For this task, the concept of network motifs may be introduced. A network motif is a pattern (sub -network) of local interconnections with particular statistical or functional properties. The analysis of network motifs is particularly important for the study of functional properties of biological networks such as gene regulation or protein interaction. In the following, an approach will be presented that combines the strength of motif analysis with the concept of centralities. This leads to, in total, three new centrality measures.

In the following, graphs, centralities and motifs will be introduced to present the concepts of motif based centralities and algorithms for computing them. Next, motif based centralities will be discussed. Motif based centralities will be defined and algorithms for computing them will be given.

Fig. 14A shows a graph G 1400, Fig. 14B illustrates a motif M 1410, and Fig. 14C shows a match G M 1420 of M in G.

A directed graph G=(V, E) consists of a set of vertices (or nodes) V and a set of edges (or arcs) E c: (VxV) . V(G) and E(G) denote the vertex set and the edge set of the graph G, respectively. A graph G'=(V, E') is a sub-graph of the graph G=(VE) (written as G'c G ) if VcV and £'c E n (VxV) . A graph G 1 =(VuE 1 ) is isomorphic to a graph G 2 =(V 2 ,E 2 ) (written as G 1 ^G 2 ) if a bijective mapping

φ : V 1 → V 2 exits with Vv a , v b e V 1 : (v a , vj e E 1 <=> (φ (v a ),φ (v fc )) e E 2 . Such a mapping is called an isomorphism and, if G 1 = G 2 , it is called an automorphism.

A centrality c is a function c : V — > 9ϊ that assigns every vertex v of a set of vertices V a real number^ . A vertex v a is said to be more important (more central) than a vertex v & ϊ£ c(v a )>c(v b ). Based on centrality values, vertices can be ordered or ranked.

Small recurring sub- graphs within a given graph may be called motifs. A motif M is a directed graph according to the definition of graphs above. A match G M of a motif M in a target graph G is a sub- graph of G(G M czG) which is isomorphic to the motif M (G M -M) .

The motif match set G M = {G M \ G M C G λ G M -A!} of a motif M is the set of all matches of M in the graph G. It is a set of sub- graphs of G, wherein algorithms are existent for such a computation which may be used according to exemplary embodiments of the invention (see F. Schreiber, H. Schwδbbermeyer, "Frequency concepts and pattern detection for the analysis of motifs in networks", Transactions on Computational Systems Biology, 3, LNBI 3737, pp. 89 to 104, 2005; S. Shen-Orr, R. MiIo, S. Mangan, U. Alon, "Network motifs in the transcriptional regulation network of Escherichia coli", Nature Genetics, 31(1), pp. 64 to 68, 2002; O. Sporns, R. Kδtter, "Motifs in brain networks", PIoS Biology, 2(l l):e369, 2004) . In the following, motif based centrality will be explained.

Given a graph G, a motif M and the corresponding motif match set G M a new centrality can be defined. The motif based centrality c mc :V-> 9ϊ assigns to every vertex ve V(G) the number of matches (given by the motif match set G M ) the vertex v occurs in. It is defined as c mc (v) =1 {G M I G M e G M λ V e V{G M )} I . Based on the centrality c cm , the vertices of G can be ordered: A vertex v a that occurs in more matches than vertex V b is more central or more important.

In the following, an algorithm for motif based centrality will be explained.

Input of such an algorithm is a graph G and the motif M. Output of the algorithm are the centrality values c mc (v) for the vertices ve V(G) .

The following steps are included in the algorithm: 1: //initialize result vector

2: For all v e V(G) do 3: c mc [v]^0

4: //compute motif match set with algorithm 5: GM^ ComputeMotifMatchSet (G,M) 6: //compute motif based centrality values

7: For all G M e G M do 8: For all V G V(G M ) do

9: c mc [v]<rc mc [v]+l

In step 5, the computation may be performed, for instance, with one of the methods disclosed in F. Schreiber, H. Schwδbbermeyer, "Frequency concepts and pattern detection for the analysis of motifs in networks", Transactions on Computational Systems Biology, 3, LNBI 3737, pp. 89 to 104, 2005, S. Shen-Orr, R. MiIo, S. Mangan, U. Alon, "Network motifs in the transcriptional regulation network of Escherichia coir , Nature Genetics, 31(1), pp. 64 to 68, 2002, or O. Sporns, R. Kδtter, "Motifs in brain networks", PIoS Biology, 2(11):e369, 2004.

As an example, it is possible to consider the motif feed- forward loop (FFL) 1500 as shown in Fig. 15 A which matches three times in the target graph 1510 shown in Fig. 15B.

Here, the motif match set consists of three sub -graphs with the following vertex set: Iv 15 V 25 V 3 J 5 Iv 25 V 35 V 4 J 5 Iv 25 V 4 5 V 5 ) .

Fig. 16 illustrates a table 1600 which shows the resulting centrality value for all vertices of this graph 1510. Vertex V 2 is the most important vertex as it participates in all three matches of the motif 1500.

The complexity of the algorithm is as follows, where IXI denotes the cardinality of set X. The running times are: for step 2-3 O (W(G)I), for step 5 O(coπψ GM J, the upper bound for computing the motif match set, and for steps 7-9 0(\ G M \XW(M)\ . AS the dominating factor of the running time is the computation of the motif match set the complexity of the above described algorithm is bounded by O(coπψ GM )- To compute this set, I G M I isomorphic sub-graphs of size W(M)I have to be found in the graph G, and even deciding if there exists one such sub -graph is a well-known NP complete problem. The computation of motif based centralities is therefore feasible for the same size of graphs and motifs that are currently analyzed with existing motif analysis methods.

Next, role based motif based centrality will be explained.

Vertices of motifs may represent different functions. For example, in the gene regulatory network context considered herein, three different functions of the vertices of the FFL motif 1500 as shown in Fig. 15A can be identified: 1. The vertex at the top is the master regulator, this vertex regulates the other two vertices.

2. The vertex on the right side is the intermediate regulator. It is regulated by the master regulator and itself regulates together with the master regulator the vertex at the bottom. 3. The vertex at the bottom of Fig. 15A is regulated by both other vertices and is therefore called the regulated vertex. Such different functions of vertices within motifs may be called roles. It is possible to assign three roles to the vertices of the FFL motif 1500.

Let R be a set of roles, G be a graph, M a motif and G M the corresponding motif match set. It is possible to define a function role: which assigns a role to every vertex of G under a specific match. The role based motif based centrality (short extended motif based centrality) c emc (v,r):(VxR)^> 9ϊ assigns to every vertex ve V(G) the number of matches the vertex v occurs in, and it has the

role r. It is defined as c emc (v, r) =1 { G M I G M e s λ V e V (G M ) λ role (v, G M ) = r} I .

Using a particular role r, the centrality can be used to rank or order the vertices.

In the following, an algorithm for the extended motif based centrality based on the above -described algorithm is presented: An input of such an extended motif based centrality algorithm is a graph G, and a motif M with roles R. An output are the centralities c emc (v,r) for the vertices ve V(G) and roles r e R .

1: //initialize result table 2: For all ve V do 3: For all r e R do

4: c emc [v,r] ^ 0

5: //compute motif match set with algorithm 6: GM <- ComputeMotifMatchSet (G,M) 7: //compute extended motif based centralities 8: For all G M e G M do

9: For all ve V(G M ) do

10: r<r GetRoleOfMatching Vertex (v,G M )

11: c emc [v,r] ^ c emc [v, r] + l

In step 5, the computation may be performed, for instance, with one of the methods disclosed in F. Schreiber, H. Schwδbbermeyer, "Frequency concepts and pattern detection for the analysis of motifs in networks", Transactions on Computational Systems Biology, 3, LNBI 3737, pp. 89 to 104, 2005, S. Shen-Orr, R. MiIo, S. Mangan, U. Alon, "Network motifs in the transcriptional regulation network of Escherichia coir , Nature Genetics, 31(1), pp. 64 to 68, 2002, or O. Sporns, R. Kδtter, "Motifs in brain networks", PIoS Biology, 2(11):e369, 2004.

The function GetRoleOfMatching Vertex returns the role of the vertex v within a motif M based on the match G M - The result of the extended algorithm is not

a single centrality vector for the vertices, as in the first described algorithm, but a matrix consisting of rows and columns where the rows denote the vertices of the graph, the columns denote the roles and the entries are the centrality values.

The complexity of the described second algorithm is in the same class as the first algorithm, as steps 2-4 have a running time of O(\V (G)\x\R\), the function GetRoleOfMatching Vertex (step 10) can be implemented as an 0(1) operation as it is assumed that during the computation of the motif match set G M , the bijective mapping between the motif M and the match G M is stored inside the match G M , and all other operations are unmodified compared to the above described first algorithm. For the FFL motif 1500 with roles shown in Fig. 17A and for the graph 1510 shown in Fig. 17B, a table 1800 shown in Fig. 18 is obtained being the result of the extended algorithm for the FFL motif.

The column named A contains the number of matches for the vertices v; to vs for the role master regulator. Vertex V 2 is the most important vertex according to this role, followed by vertex v;. A comparison with table 1600 shows that, based on the FFL motif, the vertex v; is a more important master regulator than the vertices vj to V 5 . This is not obvious from the ranking based on the motif based centralities without roles.

There may exist several isomorphism between a motif and a specific match, see for example Fig. 19A showing a motif 1900 and Fig. 19B showing a target graph 1910.

In this example, two matches between the motif 1900 in Fig. 19A and the sub- graph of the target graph 1910 in Fig. 19B given by the vertices vi, V 2 and vj exists. In the first match, Cp 1 (A) = V 15 Cf) 1 (B) = V 2 , and Cp 1 (C) = V 3 . In a second match, ψ 2 (α) = V 1 2 (B) = v 3 , and φ 2 (C) = V 2 holds. Obviously, it is not clear which role should be assigned to the vertices V 2 and vj as B and C are both candidates. Therefore, restrictions for the assignment of roles to vertices are appropriate. If an automorphism φ in the motif M exists with φ(v a ) = v b for any two vertices

v a , v b e V (M ) , then the roles of v a and v & have to be identical. In the motif 1900 shown in Fig. 19A, there exists such an automorphism which maps the vertices B and C onto each other. The use of three different roles is therefore not allowed for this motif and identical roles have to be assigned to the two vertices at the bottom of Fig. 19A.

Next, role based motif based centrality for motif classes will be explained. Using the previously introduced concepts, it is possible to extend the method further. By assigning the same role to similar vertices of a group of similar motifs, it is possible to establish a centrality based on a class (or a group) of motifs. Consider, for example, a group of chains as illustrated in a sketch 2000 of Fig. 20, where all vertices at a start of such chains have a similar characteristic (no incoming edges) and all vertices at the end have another similar characteristic (no outgoing edges). For gene regulatory networks, several motif classes are known. For example, the regulatory chain motif class, as in the example above, consists of a set of chains of three or more regulators in which one regulator regulates another regulator, which in turn regulates a third one and so forth. In the motif class single input motif (SIM), a set of vertices is exclusively regulated by a single vertex.

The role based motif based centrality for motif classes (short motif class centrality) may be computed by using the second above described algorithm for the extended motif based centrality. For each member of the motif class of interest, the matrix containing centrality values for vertices and roles may be computed. If the motif class consists of / different motifs, then / matrices c ; emo c 2 emc ..., c 1'1 emCι c l emc with centrality values are obtained. The centrality values for a vertex ve V(G) and a role r e R for a motif class may be defined as the sum of the centrality values over

all 1 matrices, [v, r] . The complexity of this method depends on the size of the motif class and can be approximated as / times the running time of the second above described algorithm.

The described method is demonstrated on the example graph 2100 shown in Fig. 21 and the chain motif class 2000 of Fig. 20.

The centrality value for vertices at the top of chains (role A in Fig. 20) may be of interest. The size of the chain motif class is given by the number of chains of different lengths in the target graph. The centrality values for the vertices of the example graph in Fig. 21 are given in a table 2200 of Fig. 22.

The vertex V 1 receives the highest centrality value, as it is the only vertex from which all other vertices are reachable (role A).

Next, centralities for the gene regulatory network of E. coli will be explained. First, the gene regulatory network of E. coli will be explained, as well as global regulators and feed-forward loop motifs.

Thus, in the following, centralities within the gene regulatory network (GRN) of Escherichia coli will be analyzed. The network is based on the data of transcriptional regulatory interactions of genes from RegulonDB (version 5.0). Genes are represented by vertices and transcriptional regulatory interactions between genes may be modelled as edges. The interactions between genes represent transcriptional control of transcription factors on the transcription of regulated genes. There are a few cases where transcription factors are formed by sub-units of different gene products. They are here replaced by a common identifier which corresponds to the transcription factor, for instance ihfA or ihfB result in ihfAB. The regulatory interactions of such different sub-units are assigned to this new identifier, and parallel edges which occur due to the previous operations are replaced by a single edge. Self loops representing autoregulation of genes are deleted since they do not contribute to functional interactions among different vertices. The resulting network consists of 1250 vertices and 2431 edges.

In gene regulatory networks, genes at a high level within the hierarchy of regulatory control are of particular interest due to their far reaching influence on other genes within the network. These genes may be called global regulators.

Possible criteria for the characterization of global regulators are the number of regulated genes, the number and type of co-regulators, the number of other regulators they control, the size of their evolutionary family, and the variety of conditions where they exert their control. As a motif for the motif based centrality analysis, the feed- forward loop

(FFL) motif as described above may be used. Depending on the type of interactions (activating or repressing) the FFL motifs acts as an accelerator or delay element in the process of gene expression and therefore has particular properties that control the expression of target genes. Next, motif based centrality for E. coli GRN will be explained.

The motif based centrality c mc based on the FFL motif is computed for the

GRN of E. coli, and the resulting centrality values are used to rank the genes, see table 2300 illustrated in Fig. 23.

The genes at the top position of this ranking have important functions in the regulation of cellular processes according to Ecocyc: crp is the major global regulator of catabolite- sensitive operons which monitors the energy status of cells by cAMP concentration and is capable of regulating the expression of more than 200 genes, fnr and narL are transcriptional regulators for fermentation and anaerobic respiration, arc A is a transcriptional regulator of aerobic respiration control, and fis and ihfAB are involved in the process of DNA replication. These genes have also been characterized as global regulators.

Genes which have been characterized as global regulators are assigned on top positions in the ranking. However, these genes can be further differentiated by consideration of the functional roles that they adopt within the FFL motif. This approach may allow an even more detailed view of the functions that a particular gene occupies within the network and is discussed in the following.

Thus, extended motif based centrality for the E. coli GRN will be explained next.

The extended motif based centrality c emc based on the FFL motif with roles will be calculated. This leads to three different rankings of the genes depending on the role, see table 2400 shown in Fig. 24 and table 2500 shown in Fig. 25.

It allows a differentiation of the genes into global regulators which mainly act as master regulators, local regulators which are intermediate regulators and target genes which are controlled by at least two regulators as part of a functional feedforward loop motif. The three rankings have obvious differences. The genes ranked at top positions for role A and role B can be clustered into three groups: some genes (crp, ihfAB, soxS) nearly exclusively adopt role A and therefore mainly act as global regulators without being controlled by many other genes. Some genes (narL, fur, hyfR, gadX) nearly exclusively adapt role B and therefore mainly act as local regulators, which are controlled by other genes. For the third group of genes (fnr, arcA, fis, hns) both roles are important and occur with considerable frequency within the analyzed network. These genes therefore act selectively as global and as local regulators. The genes in top positions according to the ranking based on role C are not present in the top positions of the rankings for role A and role B as their centrality values are lower.

By consideration of roles of vertices, extended motif based centrality allows for a detailed analysis of the functional properties of the elements within the network and the extended motif based centrality analysis allows a more precise differentiation of the important regulators by identifying groups of different functional properties based on the well studied FFL motif.

In the following, motif class centrality for the E. coli GRN will be explained.

Motifs which share similar structural properties can be combined into motif classes, as described above. In the following, regulatory chains will be analyzed that belong to the chain motif class, see Fig. 20.

Starting from a regulator at the top of the chain, the regulation is directed to its bottom via intermediate regulators. Generally, due to the hierarchical structure of

gene regulatory networks, regulators increase their range of control by indirectly affecting genes via intermediate regulators. Regulators at the top of regulatory chains are of particular interest as they start regulatory cascades.

To study the regulators with the highest influence on other genes within the E. coli GRN, the motif class centrality for the chain motif class is computed and only the role of the vertex at the top of the chain is considered (see Fig. 20). Comparing the ranking for the motif class centrality based on the chain class and the motif based centrality based on the FFL motif (see above) shows that crp, fnr, arcA, fis and ihfAB are in both cases among the top six positions, see table 2300 and a table 2600 shown in Fig. 26.

The gene narL which is ranked at position 6 for the motif based centrality holds only position 20 for the motif class centrality, because it regulates only a few genes indirectly although the number of directly controlled genes is relatively high. Furthermore, the composition of centrality values for individual motif chains show some interesting characteristics. There are some genes among the top 20 positions that have a very low centrality value for motif chains of size 2 (evgA, ydeO, soxR, gadW, cspE and cspA) and therefore a low range of direct control. For all these genes, the values for motif bases centrality based on the FFL motif are also very low (between 0 and 3). However, all these genes indirectly control a large number of other genes. Analyzing, for example, the genes evgA and ydeO (positions 6 and 7 in table 2600), these genes both regulate gadE and are part of the same regulatory chain via gadE. The gene evgA additionally regulates three other genes which themselves do not regulate any genes. Although the genes evgA and yde O have a low out- degree and therefore the range of direct control is weak, they have a high range of control via indirect regulation. According to Ecocyc, evgA and ydeO are involved in drug and acid resistance and gadE is involved in resistance to low pH. Therefore motif based centrality is able to identify additional interesting genes.

Thus, an approach to rank vertices of networks based on network motifs has been presented and discussed in the context of three particular methods. The first method (motif based centrality) ranks vertices according to the number of motif matches such that the match contains the vertex of interest. The other two methods (extended motif based centrality and motif class centrality) are based on this method. Extended motif based centrality considers roles and allows a more detailed analysis of the network of interest based on functions assigned to the vertices of the motif. Motif class centrality uses a whole group of similar motifs for the ranking and therefore takes related functional network substructures into consideration. Approaches presented herein deal with structural information on a level between the local and the global network structure.

The application of motif based centrality methods based on the FFL motif to the gene regulatory network of E. coli provides interesting results. The motif based centrality c mc shows at the top position genes that have been already identified as global regulators and can be therefore us ed for an analysis of key regulators in other organisms. A more detailed analysis based on the extended motif based centrality c emc which in this example considers the roles (functions) of vertices within the FFL motif allows a differentiation of the genes into three groups: genes that mainly act as global regulators, genes that mainly act as local regulators, and genes that selectively act as global and as local regulators. The extended motif based centrality is further refined by the use of a group of similar motifs to establish the motif class centrality C mcc - The genes at the top of the chain motif class have been studied as this allows the analysis of genes which have the highest ranges of control within the E. coli GRN. This method again ranks genes that have already been identified as global regulators at top positions. Interestingly, genes with a low range of direct control (out degree) are also among the top 20 positions for the centrality. These regulators amplify their range of control via intermediate regulators and indirectly affect a large number of

genes. Genes with these kinds of regulatory properties are not detected by other approaches which, for example, are based on the number of directly regulated genes.

It should be noted that the term "comprising" does not exclude other elements or steps and the "a" or "an" does not exclude a plurality. Also elements described in association with different embodiments may be combined.

It should also be noted that reference signs in the claims shall not be construed as limiting the scope of the claims.

Implementation of the invention is not limited to the preferred embodiments shown in the figures and described above. Instead, a multiplicity of variants are possible which use the solutions shown and the principle according to the invention even in the case of fundamentally different embodiments.