


Title:
NEURAL RELATION EXTRACTION WITHIN AND ACROSS SENTENCE BOUNDARIES
Document Type and Number:
WIPO Patent Application WO/2021/013466
Kind Code:
A1
Abstract:
The invention provides a computer-implemented method for intersententially determining a semantic relationship between a first entity and a second entity in a natural language document, comprising at least the steps of: generating (S10) a first dependency parse tree, DPT, for a first origin sentence of the document which comprises the first entity, wherein each DPT comprises at least a root node; generating (S20) a second DPT for a second origin sentence of the document which mentions the second entity; linking (S30) the root nodes of the first DPT and the second DPT so as to create a chain of words, COW; determining (S40) for each word in the COW a subtree; generating (S50) for each word in the COW a subtree embedding vector (SEV) cw which is based at least on word embedding vectors (WEV) xw of the words of the subtree; generating (S60) a representation vector pw for each word in the COW; and classifying (S80), using a recurrent neural network, the semantic relationship between the first entity (e1) and the second entity (e2), based on the input representation vectors pw.

Inventors:
GUPTA PANKAJ (DE)
RAJARAM SUBBURAM (DE)
ANDRASSY BERNT (DE)
RUNKLER THOMAS (DE)
Application Number:
PCT/EP2020/067835
Publication Date:
January 28, 2021
Filing Date:
June 25, 2020
Assignee:
SIEMENS AG (DE)
International Classes:
G06F40/279; G06F40/30; G06N3/02
Foreign References:
US20170351749A12017-12-07
Other References:
PANKAJ GUPTA ET AL: "Neural Relation Extraction Within and Across Sentence Boundaries", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 11 October 2018 (2018-10-11), XP081064578
GUPTA ET AL.: "Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers", 2016, article "Table filling multi-task recurrent neural network for joint entity and relation extraction", pages: 2537 - 2547
VU ET AL.: "Proceedings of the Acoustics, Speech and Signal Processing (ICASSP", 2016, IEEE, article "Bi-directional recurrent neural network with ranking loss for spoken language understanding", pages: 6060 - 6064
CHEN ET AL.: "A Fast and Accurate Dependency Parser Using Neural Networks", PROCEEDINGS OF EMNLP, 2014
BUNESCU ET AL.: "A Shortest Path Dependency Kernel for Relation extraction", PROCEEDINGS OF THE HUMAN LANGUAGE TECHNOLOGY CONFERENCE AND CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, VANCOUVER, B.C., October 2005 (2005-10-01), pages 724 - 731, XP055547726, DOI: 10.3115/1220575.1220666
LIU ET AL.: "A Dependency-Based Neural Network for Relation Classification", PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (SHORT PAPERS, 26 July 2015 (2015-07-26), pages 285 - 290, XP055553330, DOI: 10.3115/v1/P15-2047
PENNINGTON ET AL.: "Global vectors for word representation", PROCEEDINGS OF THE 2014 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP, 2014, pages 1532 - 1543, XP055368288, DOI: 10.3115/v1/D14-1162
FINKEL ET AL.: "Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics", 2005, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, article "Incorporating non-local information into information extraction systems by gibbs sampling", pages: 363 - 370
Claims:
Patent Claims

1. A computer-implemented method for inter-sententially determining a semantic relationship between a first entity (e1) and a second entity (e2) in a natural language document, comprising at least the steps of:

generating (S10) a first dependency parse tree (DPT1), DPT, for a first origin sentence of the document which comprises the first entity (e1), wherein each DPT comprises at least a root node;

generating (S20) a second DPT (DPT2) for a second origin sentence of the document which mentions the second entity (e2);

linking (S30) the root nodes of the first DPT and the second DPT so as to create a chain of words, COW, from the first entity (e1) to the second entity (e2);

determining (S40) for each word in the COW a subtree comprising the word itself as well as, if applicable, all words depending as nodes from said word as a root;

generating (S50) for each word in the COW a subtree embedding vector (SEV) cw which, in case the word in the COW has a subtree, is based at least on word embedding vectors (WEV) xw of the words of the subtree;

generating (S60) a representation vector pw for each word in the COW, wherein the representation vector pw for each word comprises a word embedding vector (WEV) xw for said word itself as well as its subtree embedding vector (SEV) cw;

inputting (S70) the representation vectors pw for each word in the COW into a recurrent neural network; and

classifying (S80), using the recurrent neural network, the semantic relationship between the first entity (e1) and the second entity (e2), based on the input representation vectors pw.

2. The method of claim 1,

wherein the root nodes of the first DPT (DPT1) of the first origin sentence and of the second DPT (DPT2) of the second origin sentence are linked directly.

3. The method of claim 1 or claim 2,

comprising selecting the first origin sentence and the second origin sentence from the sentences in the document such that the distance in sentences or the distance in words between the first origin sentence and the second origin sentence in the document is the smallest out of any pair of sentences of the document which comprise the first entity (e1) and the second entity (e2), respectively.

4. The method of claim 1 or claim 2,

comprising selecting the first origin sentence and the second origin sentence from the sentences in the document such that the distance in sentences or the distance in words between the first origin sentence and the second origin sentence within the COW is the smallest out of any pair of sentences of the document which comprise the first entity (e1) and the second entity (e2), respectively.

5. The method of any of claims 1 to 4,

wherein generating the subtree embedding vector (SEV) cw for any word of the COW comprises, for each word of its subtree, recurrently applying a recursive neural network to generate a subtree embedding vector (SEV) cw for each word based on the word embedding vectors (WEV) xw and the subtree embedding vectors (SEV) cw of its directly depending words within the subtree, starting with the leaf nodes of the subtree towards the root node which represents the word of the COW.

6. The method of claim 5,

wherein a transformation matrix W selected and used in the recursive neural network for generating a subtree embedding vector (SEV) cw for a first word based on a directly depending second word is dependent on a type of relationship between the first word and the second word according to the first DPT (DPT1) or according to the second DPT (DPT2).

7. The method of claim 5 or claim 6,

wherein the representation vector pw for each word of a subtree is generated as a concatenation of a word embedding vector (WEV) xw for said word and a subtree embedding vector (SEV) cw for said word.

8. The method of any of claims 5 to 7,

wherein the subtree embedding vector (SEV) cw for all leaf nodes is set to a predetermined leaf node representation vector cLEAF.

9. The method of any of claims 1 to 8,

wherein the recurrent neural network for classifying the semantic relationship between the first entity (e1) and the second entity (e2) is a bidirectional recurrent neural network.

10. The method of any of claims 1 to 9,

wherein the word embedding vectors (WEV) xw of the words are taken from a pre-developed list of word embedding vectors (WEV).

11. A method for training a recurrent neural network for use in the method according to any of claims 1 to 10.

12. A method for training a recursive neural network for use in the method according to any of claims 5 to 8.

13. An apparatus (100) configured to perform the method according to any of claims 1 to 10.

14. A computer program product (300) comprising executable program code (350) configured to, when executed, perform the method according to any of claims 1 to 10.

15. A non-transitory computer-readable data storage medium (400) comprising executable program code (450) configured to, when executed, perform the method according to any of claims 1 to 10.

Description:
Description

Neural Relation Extraction Within and Across Sentence Boundaries

Field of the Invention

Relation extraction is an important task in automated document processing. The task of relation extraction (RE) aims to identify semantic relations between a pair of entities (or: "nominals") e1 and e2 in a given sentence S. The semantic relationships may be identified out of a provided list of pre-defined relation classes, for example: content-container, cause-effect, instrument-agency, but also "works for", "is related by blood to" and so on. Relation extraction is usually defined as a combination of entity recognition and relation classification, but may also refer to the classification of a relation between two given entities.

Due to a rapid growth in information, relation extraction plays a vital role in knowledge extraction from unstructured texts and serves as an intermediate step in a variety of natural language processing (NLP) applications, for example in the newswire, web and high-valued biomedicine domains. Consequently, there has been increasing interest in relation extraction, particularly for augmenting existing knowledge bases.

Particular difficulties arise when it is desired to know the relation between an entity pair whose two entities are not comprised in one and the same sentence.

Background of the invention

Past work in relation extraction mostly focuses on binary relations between entity pairs within single sentences (intra-sentential relationships), for example, Gupta et al., "Table filling multi-task recurrent neural network for joint entity and relation extraction", 2016, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, 2537-2547.

Other works focus on high-performance artificial neural networks for natural language processing, NLP. For example, Vu et al., "Bi-directional recurrent neural network with ranking loss for spoken language understanding", in "Proceedings of the Acoustics, Speech and Signal Processing (ICASSP)", 6060-6064, Shanghai, China: IEEE, 2016 (hereafter cited as "Vu et al."), describes a so-called connectionist bi-directional recurrent neural network which combines the forward and backward pass by adding their hidden layers at each time step and also adds a weighted connection to the previous combined hidden layer to include all intermediate hidden layers into the final decision.

In Chen et al., "A Fast and Accurate Dependency Parser Using Neural Networks", Proceedings of EMNLP 2014 (hereafter cited as "Chen et al."), a transition-based parser is described which produces typed dependency parses of natural language sentences. The results of the dependency parsing can then be arranged in the form of dependency parse trees, DPT, for example by using NetworkX, see Hagberg et al., Exploring network structure, dynamics, and function using networkx, Technical report, Los Alamos National Laboratory (hereafter cited as "Hagberg et al.").

In Bunescu et al., "A Shortest Path Dependency Kernel for Relation extraction", Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, B.C., pages 724-731, October 2005 (hereafter cited as "Bunescu et al."), it is described how shortest paths between two entities within a sentence may capture essential information about their relationship.

Relationships or dependencies between words can be expressed for example using "Universal Dependencies", or UD, which is a framework for cross-linguistically consistent grammatical annotation in currently over 70 languages. Whenever UD symbols are used herein, version 2 of the UD, currently available under the URL https://universaldependencies.org/, is referred to, although future versions of UD, or other similar frameworks, may be similarly applied.

Based thereon, the use of dependency trees (or: dependency parse trees) for relation classification is described, for example, in Liu et al., "A Dependency-Based Neural Network for Relation Classification", Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers), pages 285-290, Beijing, China, July 26-31, 2015 (hereafter cited as "Liu et al."). In Liu et al., so-called Dependency-based Neural Networks, or DepNNs, are introduced. DepNNs are used for modelling augmented dependency paths, ADP.

ADPs comprise shortest dependency paths between two entities within the same sentence, wherein each word of this shortest dependency path (as in Bunescu et al.) is augmented by its attached dependency subtree (if there is one).

Entity pairs spanning multiple sentences (inter-sentential relationships) are ignored in the approaches described above. Evidently, this can result in a lot of information in a document being lost when relation extraction is performed.

Summary of the Invention

Accordingly, it is one of the objectives of the present invention to provide a method and system for inter-sententially determining semantic relationships between entities in natural language documents.

Accordingly, a computer-implemented method for inter-sententially determining a semantic relationship between a first entity and a second entity in a natural language document is provided, the method comprising at least the steps of:

generating a first dependency parse tree, DPT, for a first origin sentence of the document which comprises the first entity, wherein each DPT comprises at least a root node;

generating a second DPT for a second origin sentence of the document which mentions the second entity;

linking the root nodes of the first DPT and the second DPT so as to create a chain of words, COW, from the first entity to the second entity;

determining for each word in the COW a subtree comprising the word itself as well as, if applicable, all words depending as nodes from said word as a root;

generating for each word in the COW a subtree embedding vector which, in case the word in the COW has a subtree, is based at least on word embedding vectors of the words of the subtree;

generating a representation vector for each word in the COW, wherein the representation vector for each word comprises a word embedding vector for said word itself as well as its subtree embedding vector;

inputting the representation vectors for each word in the chain of words into a recurrent neural network; and

classifying, using the recurrent neural network, the semantic relationship between the first entity and the second entity, based on the input representation vectors.

The term "origin sentence" is used to differentiate the specific first and second origin sentences, which may be selected as part of the method, from other sentences. In the terms "first origin sentence" and "second origin sentence", the words "first" and "second" refer to the "first entity" and the "second entity", respectively, and not to a particular order of appearance or priority. Thus, the first origin sentence may appear later in the natural language document than the second origin sentence, or vice versa.

The dependency parse trees, DPT, may be generated by parsing sentences as described e.g. in Chen et al. and then by arranging the parsed sentences in DPTs by using e.g. NetworkX (see Hagberg et al.). Each token (word) is realized as a node of the tree, and the dependency relation is realized as a link between the nodes. Nodes that do not have any dependent nodes are sometimes also designated as "leaf nodes". The root node is dependent from no other node. If a word does not have a subtree (or, expressed in another way, if the word is identical to its subtree, i.e. a subtree of one single word), then said word is both a root node and a leaf node. The linking of the root nodes may in particular be performed by connecting the roots by NEXTS. NEXTS is a weight or learnable parameter that connects the roots of two trees.
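The following is a minimal sketch of such a tree construction, assuming spaCy as a stand-in for the parser of Chen et al. and NetworkX as the tree data structure; the model name and all function names are illustrative only:

```python
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")   # illustrative stand-in for the parser of Chen et al.

def build_dpt(sentence: str) -> nx.DiGraph:
    """Parse one origin sentence and arrange the result as a dependency parse tree (DPT)."""
    doc = nlp(sentence)
    tree = nx.DiGraph()
    for token in doc:
        tree.add_node(token.i, word=token.text)
        if token.head.i == token.i:              # spaCy marks the root as its own head
            tree.graph["root"] = token.i
        else:                                    # edge: head (parent node) -> dependent
            tree.add_edge(token.head.i, token.i, rel=token.dep_)
    return tree
```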

The chain of words, COW, may also be designated as the shortest dependency path, SDP, as it is the shortest connection between the two entities along a dependency parse tree, DPT. The subtree embedding vectors may indicate the semantic relationships expressed by, or in, each subtree. The word embedding vectors may indicate the semantic proximity of each word to any or all of a list of other words.

A recurrent (artificial) neural network comprises, for example, a network of neural units (or "nodes" or "neurons") which are connected by a directional connection or by a forward and a backward connection. As such, an activation or signal can propagate in both directions between two connected neurons of the recurrent neural network. Weights and biases of the neural network are adjusted during training of the neural network in order to improve the performance of the network ("learning").

In some advantageous embodiments, variants, or refinements of embodiments, the root nodes of the DPT of the first origin sentence and of the DPT of the second origin sentence are linked directly, preferably via NEXTS.

In some advantageous embodiments, variants, or refinements of embodiments, the method comprises selecting the first origin sentence and the second origin sentence from the sentences in the document such that the distance in sentences or the distance in words between the first sentence and the second sentence in the document is the smallest out of any pair of sentences of the document which comprise the first entity and the second entity, respectively. This follows the notion that the contents of the first and second origin sentences are more likely to refer to one another when the sentences appear closer together within the natural language document, and that therefore the performance of the method will increase.

In some advantageous embodiments, variants, or refinements of embodiments, the method comprises selecting the first origin sentence and the second origin sentence from the sentences in the document such that the distance in sentences or the distance in words between the first origin sentence and the second origin sentence within the COW is the smallest out of any pair of sentences of the document which comprise the first entity and the second entity, respectively. In this variant, a plurality of first and second DPTs may be produced for each of a plurality of candidate first and second origin sentences, the linking may be performed, and the size of the chain of words (i.e. of the shortest dependency path) may be calculated in order to then choose the final first and second origin sentences.

In some advantageous embodiments, variants, or refinements of embodiments, generating the subtree embedding vector for any word of the COW comprises, for each word of its subtree, recurrently applying a recursive or recurring neural network to generate a subtree embedding vector for each word based on the word embedding vectors and the subtree embedding vectors of its directly depending words within the subtree, starting with the leaf nodes of the subtree towards the root node which represents the word of the COW. In this way, the complete semantic information of the subtree is encapsulated in a final subtree embedding vector of the root node.

In some advantageous embodiments, variants, or refinements of embodiments, a transformation matrix which is selected and used in the recursive or recurring neural network for generating a subtree embedding vector for a first word based on a directly depending second word is dependent on a type of relationship between the first word and the second word according to the first DPT or according to the second DPT, respectively. The respective dependency parse tree which comprises each word also comprises information about the relation between its words which can thus be advantageously exploited.

In some advantageous embodiments, variants, or refinements of embodiments, the representation vector for each word of a subtree is generated as a concatenation of a word embedding vector for said word and a subtree embedding vector for said word.

In some advantageous embodiments, variants, or refinements of embodiments, the subtree embedding vector for all leaf nodes is set to a predetermined leaf node representation vector cLEAF.

In some advantageous embodiments, variants, or refinements of embodiments, the recurrent neural network for classifying the semantic relationship between the first entity and the second entity is a bidirectional recurrent neural network.

In some advantageous embodiments, variants, or refinements of embodiments, the word embedding vectors of the words are taken from a pre-developed or pre-trained list of word embedding vectors. For example, the word embedding vectors may be taken from GloVe (Pennington et al., "Global vectors for word representation", Proceedings of the 2014 conference on empirical methods in natural language processing, EMNLP, 1532-1543, 2014) or from Moen et al., "Distributional semantics resources for biomedical text processing", 2013.
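Purely as an illustration of such a pre-developed list, a plain-text embedding file in the GloVe format could be read into a simple lookup table as sketched below; the file name and dimensionality are assumptions:

```python
import numpy as np

def load_word_embeddings(path: str, dim: int = 100) -> dict:
    """Read a plain-text embedding file (one word plus its vector per line) into a dict."""
    wev = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vector = parts[0], np.asarray(parts[1:], dtype=np.float32)
            if vector.shape[0] == dim:            # skip malformed lines
                wev[word] = vector
    return wev

# e.g.: wev = load_word_embeddings("glove.6B.100d.txt"); x_w = wev.get("president")
```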

Further advantageous embodiments, variants, options and modifications are apparent from the dependent claims for each of the aspects of the invention as well as from the description in combination with the figures.

The invention also provides, according to a second aspect, an apparatus configured to perform the method according to any embodiment of the first aspect of the present invention. The apparatus may in particular comprise an input interface, a computing device and an output interface.

The computing device may be realized in hardware, such as a circuit or a printed circuit board and/or comprising transistors, logic gates and other circuitry. Additionally, the computing device may be at least partially realized in terms of software. Accordingly, the computing device may comprise, or be operatively coupled to, a processor (one or more CPUs and/or one or more GPUs and/or one or more ASICs and/or one or more FPGAs), a working memory and a non-transitory memory storing a software or a firmware that is executed by the processor to perform the functions of the computing device. Signals may be received by the input interface and signals that the processor of the computing device creates may be outputted by the output interface. The computing device may be implemented, at least partially, as a microcontroller, an ASIC, an FPGA and so on.

The invention further provides, according to a third aspect, a non-transitory computer-readable data storage medium comprising executable program code configured to, when executed by a computing device, perform the method according to any embodiment of the first aspect.

The invention also provides, according to a fourth aspect, a computer program product comprising executable program code configured to, when executed by a computing device, perform the method according to any embodiment of the first aspect.

The invention also provides, according to a fifth aspect, a data stream comprising, or configured to generate, executable program code configured to, when executed by a computing device, perform the method according to any embodiment of the first aspect.

According to a seventh aspect, the invention provides a method for training a recurrent neural network for use in the method according to any embodiment of the first aspect of the present invention and/or for training a recursive neural network for use in the method according to any embodiment of the first aspect of the present invention.

Brief description of the drawings

The invention will be explained in yet greater detail with reference to exemplary embodiments depicted in the drawings as appended.

The accompanying drawings are included to provide a further understanding of the present invention and are incorporated in and constitute a part of the specification. The drawings illustrate the embodiments of the present invention and together with the description serve to illustrate the principles of the invention. Other embodiments of the present invention and many of the intended advantages of the present invention will be readily appreciated as they become better understood by reference to the following detailed description. Like reference numerals designate corresponding similar parts.

The numbering of method steps is intended to facilitate understanding and should not be construed, unless explicitly stated otherwise, or implicitly clear, to mean that the designated steps have to be performed according to the numbering of their reference signs. In particular, several or even all of the method steps may be performed simultaneously, in an overlapping way or sequentially.

Fig. 1 shows a schematic flow diagram illustrating a computer-implemented method according to the first aspect of the present invention;

Fig. 2, Fig. 3 and Fig. 4

schematically illustrate further details of the method according to Fig. 1;

Fig. 5 shows a block diagram schematically illustrating an apparatus according to an embodiment of the second aspect of the present invention;

Fig. 6 shows a block diagram schematically illustrating a computer program product according to an embodiment of the third aspect of the present invention; and

Fig. 7 shows a block diagram schematically illustrating a data storage medium according to an embodiment of the fourth aspect of the present invention.

Detailed Description of the invention

Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present invention. Generally, this application is intended to cover any adaptations or variations of the specific embodiments discussed herein.

Fig. 1 shows a schematic flow diagram illustrating a computer-implemented method according to the first aspect of the present invention, i.e. a computer-implemented method for inter-sententially determining a semantic relationship between a first entity and a second entity in a natural language document. The method may evidently also be configured for intra-sententially determining said semantic relationship in addition. In the following, for the most part the more difficult case of inter-sentential relationship extraction will be explained. The method will be described using the English language for examples. However, it will be evident that the same method may be applied to any other language as well, for example to Chinese, Hindustani, Spanish, Arabic, Malay, Russian, Bengali, Portuguese, French, German, Japanese, and so on.

In a step S01, a natural language document is provided, for example uploaded into a memory, or retrieved from a database or the like.

In a step S02, a pair of entities, consisting of a first entity and a second entity, is provided, the relationship between which is desired to be known. This may be done, for example, by a user sending a query to a system, indicating the two entities.

In a step S03, a first origin sentence which comprises the first entity and a second origin sentence which comprises the second entity are automatically selected from the natural language document. The first and the second origin sentences may be chosen such that the distance in sentences or the distance in words between the first sentence and the second sentence in the document is the smallest out of any pair of sentences of the document which comprise the first entity and the second entity, respectively.
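A possible realization of this selection in step S03 is sketched below, assuming the document has already been split into sentences and that plain substring matching suffices to find the entity mentions; both assumptions are for illustration only:

```python
def select_origin_sentences(sentences, e1, e2):
    """Pick the pair (first origin sentence, second origin sentence) containing e1 and e2
    with the smallest distance in sentences within the document."""
    best_pair, best_dist = None, None
    for i, s1 in enumerate(sentences):
        if e1 not in s1:
            continue
        for j, s2 in enumerate(sentences):
            if e2 not in s2:
                continue
            if best_dist is None or abs(i - j) < best_dist:
                best_pair, best_dist = (s1, s2), abs(i - j)
    return best_pair
```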

In a step S10, a first dependency parse tree, DPT, is generated for the first origin sentence of the document which comprises the first entity, wherein each DPT comprises at least a root node. Since sentences in most languages comprise more than one word, usually the DPT will comprise further nodes in addition to the root node. The further nodes may be leaf nodes (i.e. have no nodes dependent from them) or may themselves have nodes dependent from them.

In a step S20, a second dependency parse tree, DPT, is generated for the second origin sentence of the document which comprises the second entity, wherein each DPT comprises at least a root node, but usually also additional nodes.

The dependency parse trees, DPT, may be generated by parsing sentences as described e.g. in Chen et al. and then by arranging the parsed sentences in DPTs by using e.g. NetworkX (see Hagberg et al.). Each token (word) is realized as a node of the tree, and the dependency relation is realized as a link between the nodes.

Nodes that do not have any dependent nodes are sometimes also designated as "leaf nodes". The root node is dependent from no other node. If a word does not have a subtree (or, expressed in another way, if the word is identical to its subtree, i.e. a subtree of one single word), then said word is both a root node and a leaf node.

In the following, an example with two sentences in a document in the English language will be used for illustration of the method. Assume that a natural language text document comprises the following two sentences in the English language, wherein for the further purpose the words in these two sentences have been numbered from Y1 to Z14:

Table 1: first origin sentence (Sentence #1)

Table 2: second origin sentence (Sentence #2)

Assume further that it is desired to know the relationship between, as a first entity e1, the name "Vern Raburn", and, as a second entity e2, the company "Paul Allen Group".

To a human reader it is a simple task to infer, from reading the two sentences above, that the two entities e1, e2 are connected in the relationship "Person-Organisation" (or: Per-Org for short). However, to a natural language processing, NLP, system this task is more difficult, as the first entity e1, "Vern Raburn", is comprised in a first origin sentence (Sentence #1) and the second entity e2, "Paul Allen Group", is comprised in a second origin sentence (Sentence #2) of a natural language text document.

Fig. 2 shows in its upper half a dependency parse tree DPT1 for the first origin sentence, and in its lower half a dependency parse tree DPT2 for the second origin sentence. In the first origin sentence, "started" has been determined to be the root node, whereas in the second origin sentence "based" has been determined to be the root node. The symbols accompanying the arrows in Fig. 2 denote the semantic and grammatical relationship between the words (or nodes) in the English language. For example, the symbol "compound" between words Y1 ("Paul") and Y2 ("Allen") indicates that these two words Y1 and Y2 form a compound noun, and the symbol "nsubj" indicates that the compound "Paul Allen" (Y1+Y2) is the subject for the word Y3 ("started") which is the predicate of the first origin sentence.

The symbols in Fig. 2 have been taken from the Universal Dependencies (UD) framework, version 2.

In a step S30, the root node Y3 of the first DPT and the root node Z11 of the second DPT are linked, preferably by NEXTS, so as to create a chain of words, COW, from (and including) the first entity e1 to (and including) the second entity e2. Table 3 shows that chain of words, COW, which may also be designated as, or interpreted as indicating, the shortest dependency path, SDP:

Table 3: Chain of words, COW, of the shortest dependency path, SDP
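The content of Table 3 is not reproduced here. In graph terms, step S30 and the extraction of the COW may be sketched as follows, reusing the build_dpt sketch from above; the NEXTS link is represented simply as an additional labelled edge, and e1_node and e2_node denote the (illustrative) head-word node indices of the two entities:

```python
import networkx as nx

def link_and_extract_cow(dpt1, dpt2, e1_node, e2_node):
    """Join the two DPTs with a NEXTS edge between their roots and return the COW/SDP."""
    offset = max(dpt1.nodes) + 1                        # avoid node-id collisions
    dpt2s = nx.relabel_nodes(dpt2, {n: n + offset for n in dpt2.nodes})
    joint = nx.union(dpt1, dpt2s)
    joint.add_edge(dpt1.graph["root"], dpt2.graph["root"] + offset, rel="NEXTS")
    # shortest path from the head of e1 to the head of e2, ignoring edge direction
    path = nx.shortest_path(joint.to_undirected(), e1_node, e2_node + offset)
    cow = [joint.nodes[n]["word"] for n in path]
    return cow, path, joint
```

The returned path then contains the node indices of the chain of words, from the first entity to the second entity, crossing the NEXTS edge between the two root nodes.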

In a step S40, for each word in the COW (or, in other words, in the SDP), a subtree comprising the word itself as well as, if applicable, all words depending as nodes from said word as a root, is determined (or generated). Thus, the COW or SDP is made into an inter-sentential Dependency Path, iSDP.
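A sketch of this subtree determination on the joint graph from the previous sketch is given below; it assumes, consistent with the example discussed further down, that the subtrees hang off the chain of words and do not run along the chain or across the NEXTS link:

```python
import networkx as nx

def subtree_of(joint, cow_node, cow_nodes):
    """Words hanging off `cow_node`: the node itself plus all of its descendants,
    after cutting the edges (including NEXTS) that run along the chain of words."""
    pruned = joint.copy()
    pruned.remove_edges_from(
        [(u, v) for u, v in joint.edges() if u in cow_nodes and v in cow_nodes]
    )
    return {cow_node} | nx.descendants(pruned, cow_node)

# e.g.: subtrees = {n: subtree_of(joint, n, set(path)) for n in path}
```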

Fig. 3 schematically illustrates this iSDP, wherein the SDP has been highlighted.

In a step S50, for each word in the COW (or in the SDP), a subtree embedding vector cw is generated which, in case the word in the COW has a subtree, is based at least on word embeddings of the words of the subtree. If the word in the COW does not have a subtree (i.e. if it is a leaf node as well as a root node of its own subtree), then - as for any leaf nodes - its subtree embedding vector cw is set to a predefined leaf node subtree embedding vector cLEAF.

In a step S60, for each word in the COW a representation vector pw is generated, wherein the representation vector pw for each word comprises a word embedding vector xw for said word itself as well as its subtree embedding vector cw. Preferably, the representation vector pw comprises, or consists of, a concatenation of the word embedding vector xw and the subtree embedding vector cw.

The word embedding vectors xw are preferably taken from a pre-trained word embeddings database suitable for the type of natural language document. For example, for general text documents, GloVe (see Pennington et al.) may be used, and for biomedical natural language text documents the word embeddings database from Moen et al. may be used.

Fig. 4 schematically illustrates some steps of the computer-implemented method according to Fig. 1.

In the line of Fig. 4 second from the top, the COW (or SDP) is shown. Below that, the representation vectors pw for each word Y8, Y9, Y7, Y3, Z11, Z5, Z8, Z6 and Z7 in the COW are shown. Each representation vector pw ∈ R^(d+d') comprises a word embedding vector WEV xw ∈ R^d for the word of the COW itself and a subtree embedding vector SEV cw ∈ R^d' for the subtree of each word of the COW. Herein, d is the dimension of the word embedding vectors WEV, determined by their source database, and d' is the dimension of the subtree embedding vectors SEV, which is a hyperparameter.

For words in the COW which do not have a subtree (or, in the other interpretation, which are root nodes as well as leaf nodes of their own subtree), cw = cLEAF. In the present example, this applies to the words Y8, Y9, Z8, Z6 and Z7. For instance, the representation vector pY8 for the word Y8 ("Vern") would be [xY8, cLEAF], wherein [,] represents concatenation, and xY8 the word embedding vector WEV for "Vern".

For the other words which do have subtrees (Y7, Y3, Z11, Z5), in step S50 the subtree embedding vectors SEV cw are generated by recursive transformations of the representation vectors pw of their children words, preferably as follows: A recursive artificial neural network, RecNN, is used to construct the subtree embedding vectors SEV cw, traversing bottom-up from the leaf nodes of the subtree to the root node of the subtree. The individual dependency between each word w and each of its children words (designated as "children(w)") is given a dependency relation r = R(w, q) based on the dependency parse, for example r = R(w, q) = dobj when w and q have the "dobj" relation according to Universal Dependencies, UD. For each r, a transformation matrix Wr ∈ R^(d' × (d+d')) is learned during training.

Thus, the subtree embedding vectors SEV cw may be computed as:

c_w = f( Σ_{q ∈ children(w)} W_R(w,q) · p_q + b )     (1)

with an activation function f, with a learnable bias b ∈ R^d' and with the representation vector pq for the children q of word w given by the concatenation pq = [xq, cq], wherein cq is either cLEAF, or calculated again according to formula (1) above.
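A minimal numpy sketch of this bottom-up computation according to formula (1) follows; the relation-specific matrices Wr, the bias b and cLEAF would in practice be learned parameters, tanh merely stands in for the unspecified activation function f, the dimensions are illustrative, and the tree argument is the (pruned) dependency graph from the earlier sketches with wev as the embedding lookup table:

```python
import numpy as np

d, d_prime = 100, 100                          # WEV and SEV dimensions (illustrative)
rng = np.random.default_rng(0)
W_r = {}                                       # one transformation matrix per relation r
b = np.zeros(d_prime)                          # learnable bias (untrained in this sketch)
c_leaf = rng.normal(scale=0.1, size=d_prime)   # predetermined leaf representation cLEAF

def subtree_embedding(node, tree, wev, cache):
    """Compute c_w per formula (1) for `node`, recursing bottom-up over its subtree."""
    children = list(tree.successors(node))
    if not children:                           # leaf node: fixed representation
        cache[node] = c_leaf
        return c_leaf
    total = np.zeros(d_prime)
    for q in children:
        c_q = subtree_embedding(q, tree, wev, cache)
        x_q = wev.get(tree.nodes[q]["word"], np.zeros(d))
        p_q = np.concatenate([x_q, c_q])       # p_q = [x_q, c_q]
        r = tree.edges[node, q]["rel"]         # dependency relation R(w, q)
        if r not in W_r:                       # lazily create an (untrained) W_r for the demo
            W_r[r] = rng.normal(scale=0.1, size=(d_prime, d + d_prime))
        total += W_r[r] @ p_q
    c_w = np.tanh(total + b)                   # tanh stands in for the activation function f
    cache[node] = c_w
    return c_w
```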

For example, for the word Y7 ("named") in the COW or SDP, first for the word Y10 ("its") cY10 would be determined to be cLEAF. Then, a word embedding vector WEV xY10 from a word embeddings database (e.g. GloVe) would be retrieved and concatenated with cY10 to form pY10 = [xY10, cLEAF].

Then, for the word Y11 ("President"), a word embedding vector WEV xY11 would be retrieved, for the word "President". The subtree embedding vector SEV cY11 would be calculated using formula (1) based on pY10 and using the trained transformation matrix Wposs, as the dependency between word Y11 ("President") and word Y10 ("its") in the first origin sentence is "poss" according to the Universal Dependencies, UD, framework. The representation vector pY11 would then be provided as pY11 = [xY11, cY11].

Then, for the word Y7 ("named"), a word embedding vector WEV xY7 would be retrieved, for the word "named". The subtree embedding vector SEV cY7 would be calculated using formula (1) based on pY11 and using the trained transformation matrix Wdobj, as the dependency between word Y7 ("named") and word Y11 ("President") in the first origin sentence is "dobj" according to the Universal Dependencies, UD, framework.

Since formula (1) uses a sum over all children q of word w, branches such as the word Y3 ("started") having three children - words Y2, Y5 and Y6 - can be easily resolved, wherein again for each child a different trained weight matrix W_R(w,q) may be applied according to the dependency R(w, q) between the child, q, and the word, w, here Y3 ("started"). For example, as illustrated in Fig. 4, the weight matrices Wnsubj, Wcc and Wdobj are used for calculating cY3 and thus pY3.

In the described way, each word in the COW is enriched with additional information about how this word functions in a specific sentence, to form a more precise structure for classifying relationships within and across sentences. For inter-sentential relationship extraction this is especially beneficial, as the COW is by definition composed from parts of two different origin sentences, which will make its meaning more cryptic. The dependency subtrees, each of which stems exclusively from a single one of the origin sentences, thus help to provide useful additional information.

In a step S70, the representation vectors pw for the words w in the COW are input into a recurrent artificial neural network, in particular into a bi-directional recurrent neural network, biRNN. The layers of this biRNN are shown in the first row of Fig. 4, wherein the bi-directionality is indicated by bi-directional arrows between the bi-directional hidden layer vectors, and wherein each layer receives as further input also a representation vector pw of a word of the COW.

In a step S80, using the biRNN, the semantic relationship between the first entity e1 and the second entity e2 is classified, based on the input representation vectors pw. For example, as illustrated in Fig. 4, a final softmax layer SML may be used to indicate probabilities that the relationship between the first entity e1 and the second entity e2 is of a specific type.

In a step S90, an output signal indicating the semantic relationship between the first entity e1 and the second entity e2 may be output, e.g. in the form of an electronic signal, as a control signal for a display device for displaying the semantic relationship visually, as a control signal for an audio device for indicating the determined semantic relationship as audio, and/or the like.

In some variants, the method may first be applied with a first trained biRNN to determine whether or not the first and second entities e1, e2 have any of the pre-specified relationship types, such that the output may be a single number indicating the likelihood that this is the case. Then, the method may be applied a second time with a differently trained second biRNN to determine which of the pre-specified relations exists between the first and second entities e1, e2.
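This two-stage variant can be summarised by the following illustrative pseudo-driver, in which detector_birnn and classifier_birnn are hypothetical trained models:

```python
def predict_relation(p_vectors, detector_birnn, classifier_birnn, threshold=0.5):
    """Two-stage variant: first decide whether any pre-specified relation holds at all,
    then classify which one it is (both models are hypothetical trained callables)."""
    if detector_birnn(p_vectors) < threshold:    # single score: likelihood of any relation
        return None                              # no pre-specified relation detected
    return classifier_birnn(p_vectors)           # distribution over the relation types
```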

For example, for the MUC6 (Grishman and Sundheim, 1996) dataset, which contains information about management succession events from newswire, three types of relations may be pre-specified:

- Person-Organisation (Per-Org);

- Person-Position (Per-Post); and

- Position-Organisation (Post-Org).

Evidently, for other types of natural language documents or datasets, other pre-specified relation types may be suitable, such as content-container, cause-effect, instrument-agency, "works for", "is related by blood to", "owes money to" and so on.

Training data for the MUC6 dataset can be provided e.g. by tagging the entities "Person" (Per) and "Organization" (Org) using the Stanford NER tagger (see e.g. Finkel et al., "Incorporating non-local information into information extraction systems by gibbs sampling", Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 363-370, Association for Computational Linguistics, 2005). The entity type "Position" (Post) can be annotated based on the templates. MUC6 is the standard annotated dataset, provided by the LDC. However, annotations are missing for relationships spanning sentence boundaries. Therefore, on top of the available intra-sentential relation annotations, inter-sentential relation annotations can be performed in addition.

As is also sketched in Fig. 4, the biRNNs receiving a shortest dependency path COW for inter-sentential relation extraction may be designated as "Inter-sentential Dependency-Based Neural Networks", iDepNN-SDP, whereas the complete structure may be designated as the "Inter-sentential Augmented Dependency Path" method, iDepNN-ADP.

The output of the softmax layer SML may be a probability distribution y over L relation labels, as y = softmax(U · hN + by), where U ∈ R^(L×H) is the weight matrix connecting a hidden vector of dimension H to an output of dimension L, by ∈ R^L is a bias parameter, and hN is the last hidden vector of the biRNN. In the example of Fig. 4, the first hidden vector h1 (or: hY8) receives the representation vector pY8 of word Y8 ("Vern") and the last hidden vector hN (or h9, since N=9 because the COW consists of 9 words, or hZ8) receives the representation vector pZ8 of word Z8 ("Group").

To compute the hidden representations ht for each word, in particular a Connectionist biRNN may be used (see Vu et al., which is incorporated herein by reference), which combines the forward and backward pass by adding their hidden layers (h_f,t for the forward direction and h_b,t for the backward direction) at each time step t and also adds a weighted connection to the previous combined hidden layer h_t-1 to include all intermediate hidden layers into the final decision:

h_f,t = f(V · i_t + W · h_f,t-1)     (2)

h_b,t = f(V · i_N-t+1 + W · h_b,t+1)     (3)

h_t = f(h_f,t + h_b,t + W · h_t-1)     (4)

wherein V is the weight matrix connecting the input vectors to the hidden layer of dimension H, N is the total number of words in the COW or SDP, and i_t is the input vector at time t, defined by:

iDepNN-SDP: i_t = [x_t, L_t]

iDepNN-ADP: i_t = [p_t, L_t],

wherein L_t are lexical level features (e.g., part-of-speech tags, position indicators, entity types) for each word w at time point t. In order to minimize the parameters, advantageously the same weight matrix W may be used in all three of the formulae (2)-(4) above, i.e. during the forward pass in formula (2), during the backward pass in formula (3) and during the combination in formula (4). The parameters of the matrix W are learned using backpropagation. Issues with loopy backpropagation as may be present in other approaches are avoided with the present method.
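A compact numpy sketch of formulae (2) to (4) together with the softmax output layer y = softmax(U · hN + by) is given below; all matrices are assumed to be given (e.g. randomly initialised before training), the dimensions are illustrative, and the shared weight matrix W is used in the forward pass, the backward pass and the combination, as described above:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                              # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def connectionist_birnn(inputs, V, W, U, b_y, H):
    """Formulae (2)-(4) plus the softmax output; `inputs` is the list [i_1, ..., i_N]."""
    N = len(inputs)
    h_f = [np.zeros(H) for _ in range(N + 1)]    # forward hidden states h_f,0 .. h_f,N
    h_b = [np.zeros(H) for _ in range(N + 2)]    # backward hidden states h_b,1 .. h_b,N+1
    h = [np.zeros(H) for _ in range(N + 1)]      # combined hidden states h_0 .. h_N
    for t in range(1, N + 1):                    # forward pass, formula (2)
        h_f[t] = np.tanh(V @ inputs[t - 1] + W @ h_f[t - 1])
    for t in range(N, 0, -1):                    # backward pass, formula (3): uses i_(N-t+1)
        h_b[t] = np.tanh(V @ inputs[N - t] + W @ h_b[t + 1])
    for t in range(1, N + 1):                    # combination, formula (4), shared W
        h[t] = np.tanh(h_f[t] + h_b[t] + W @ h[t - 1])
    return softmax(U @ h[N] + b_y)               # probability distribution over L labels

# illustrative shapes: i_t of size D, V: (H, D), W: (H, H), U: (L, H), b_y: (L,)
```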

For training, negative samples may be generated by randomly sampling co-occurring entities without known interactions. The same number of negatives as positives may then be sampled to obtain a balanced dataset during training and validation for different sentence ranges. The train/dev/test split may e.g. be 60/20/20, as is common.
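For illustration only, such a balanced sampling and split could be realised as follows; the data structures are assumptions:

```python
import random

def build_balanced_dataset(positives, cooccurring_pairs, seed=42):
    """Sample as many negatives (co-occurring pairs without a known relation) as positives
    and apply a 60/20/20 train/dev/test split."""
    random.seed(seed)
    known = {(e1, e2) for (e1, e2, _rel) in positives}
    candidates = [pair for pair in cooccurring_pairs if pair not in known]
    negatives = random.sample(candidates, k=min(len(positives), len(candidates)))
    data = list(positives) + [(e1, e2, None) for (e1, e2) in negatives]
    random.shuffle(data)
    n = len(data)
    return data[: int(0.6 * n)], data[int(0.6 * n): int(0.8 * n)], data[int(0.8 * n):]
```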

Fig. 5 shows an apparatus 100 according to an embodiment of the second aspect of the present invention, i.e. an apparatus for inter-sententially determining a semantic relationship between a first entity and a second entity in a natural language document. In particular, the apparatus 100 is configured to perform the method according to any embodiment of the first aspect of the present invention, in particular the method as described in the foregoing with respect to Fig. 1 through Fig. 4.

The apparatus 100 comprises an input interface 110 for receiving an input signal 71, such as a natural language document and a pair of entities e1, e2, wherein the task is to determine the semantic relationship between e1 and e2. The input interface 110 may be realized in hard- and/or software and may utilize wireless or wire-bound communication. For example, the input interface 110 may comprise an Ethernet adapter, an antenna, a glass fiber cable, a radio transceiver and/or the like. The input interface 110 may in particular be used to perform the steps S01 and S02 as they have been described in the foregoing.

The apparatus 100 further comprises a computing device 120 configured to perform the steps S03 and S10 through S80. The computing device 120 may in particular comprise one or more central processing units, CPUs, one or more graphics processing units, GPUs, one or more field-programmable gate arrays, FPGAs, one or more application-specific integrated circuits, ASICs, and/or the like for executing program code. The computing device 120 may also comprise a non-transitory data storage unit for storing program code and/or inputs and/or outputs as well as a working memory, e.g. RAM, and interfaces between its different components and modules.

The apparatus may further comprise an output interface 140 configured to output an output signal 72, for example as has been described with respect to step S90 in the foregoing. The output signal 72 may have the form of an electronic signal, a control signal for a display device 200 for displaying the semantic relationship visually, a control signal for an audio device for indicating the determined semantic relationship as audio, and/or the like. Such a display device 200, audio device or any other output device may also be integrated into the apparatus 100 itself.

Fig. 6 shows a schematic block diagram illustrating a computer program product 300 according to the third aspect of the present invention, i.e. a computer program product 300 comprising executable program code 350 configured to, when executed (e.g. by the apparatus 100), perform the method according to the first aspect of the present invention, in particular the method as has been described with respect to Fig. 1 through Fig. 4 in the foregoing.

Fig. 7 shows a schematic block diagram illustrating a non-transitory computer-readable data storage medium 400 according to the fourth aspect of the present invention, i.e. a data storage medium 400 comprising executable program code 450 configured to, when executed (e.g. by the apparatus 100), perform the method according to the first aspect of the present invention, in particular the method as has been described with respect to Fig. 1 through Fig. 4 in the foregoing.

In the foregoing detailed description, various features are grouped together in the examples with the purpose of streamlining the disclosure. It is to be understood that the above description is intended to be illustrative and not restrictive. It is intended to cover all alternatives, modifications and equivalents. Many other examples will be apparent to one skilled in the art upon reviewing the above specification, taking into account the various variations, modifications and options as described or suggested in the foregoing.