Title:
SYSTEM AND METHOD FOR NATURAL LANGUAGE GENERATION OF A NEWS STORY
Document Type and Number:
WIPO Patent Application WO/2022/157766
Kind Code:
A1
Abstract:
A computer-based system for natural language generation of a story. The system collects data from data sources, determines facts from the data according to specific rules for predetermined aspects of a specific subject area, characterizes the data types of said facts, and calculates significance scores for these facts. The system uses the facts and a story tree to generate a story or article heading and outline, and fills the outline with facts based on the specific subjects. The story is then populated with sentences created from the previously retrieved and scored facts.

Inventors:
ERELL AMIR (IL)
NEVO UDI (IL)
Application Number:
PCT/IL2022/050079
Publication Date:
July 28, 2022
Filing Date:
January 19, 2022
Assignee:
HOOPSAI TECH LTD (IL)
International Classes:
G06F40/56; G06F40/30; G06N5/02; G06N20/00
Foreign References:
US20200401770A1 2020-12-24
US20200019592A1 2020-01-16
US20200293617A1 2020-09-17
US20160232152A1 2016-08-11
Other References:
NEIL MCINTYRE; MIRELLA LAPATA: "Learning to tell tales", NATURAL LANGUAGE PROCESSING OF THE AFNLP: VOLUME 1, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, N. EIGHT STREET, STROUDSBURG, PA, 18360 07960-1961 USA, 2 August 2009 (2009-08-02) - 7 August 2009 (2009-08-07), pages 217 - 225, XP058111650, ISBN: 978-1-932432-45-9
Attorney, Agent or Firm:
BRESSLER, Eyal et al. (IL)
Claims:
CLAIMS

1. A computer-based system for natural language generation of a story, comprising
  a. one or more data point amalgamators 112, each configured to
    i. access one or more data sources 105 comprising data points 110 concerning a subject area;
    ii. establish one or more facts 115 from the data sources 105, each fact 115 determined from a combination of one or more of the data points 110, the data points 110 selected according to selection rules for a predetermined aspect of the subject area;
    iii. for each extracted fact 115, characterize a data type of each data point 110 as one or more of
      a) an indicator 110I, characterizing the fact 115; and
      b) a variable 110V, comprising a textual contribution of the fact 115 to a story;
    iv. calculate significance scores of each fact 115;
  b. a story tree library 117, comprising one or more story trees 120 associated with one or more of the data point amalgamators 112, each story tree 120 comprising
    i. section nodes 125, each section node 125 corresponding to an outline heading of an article or a behavior of child nodes of the section node 125;
    ii. leaf nodes 130 corresponding to article contents; each leaf node 130 is characterized by a minimum and a maximum number of scored facts to be populated in the leaf node; and
  c. a story planning module 135, configured to
    i. receive the scored facts 115 from the data point amalgamator 112;
    ii. receive the story trees 120 that are associated with the data point amalgamator 112 from the story tree library 117;
    iii. populate each of the leaf nodes 130 with scored facts 115, according to the minimum and maximum number of scored facts and to the behaviors of section nodes 125 of the story tree 120;
    iv. for each story tree 120, sum the topic significance scores of facts 115 placed in the leaf nodes; and
    v. select a story plan 120' comprising the populated story tree 120 with the highest summed significance score;
  d. a message template library comprising one or more message templates 145, each message template 145 associated with a data point amalgamator 112;
  e. a story scripting module configured to obtain the message templates 145 associated with the data point amalgamator 112 or with a fact 115 and, for each fact 115 in each leaf 130 of the story plan 120'
    i. find the best-matching message template 145 for the fact 115; and
    ii. inject the textual contribution of the variables of the data points 110 into the selected message.

2. The system of claim 1, wherein said subject area is a sport or finance.

3. The system of claim 1, wherein said data sources comprise an AI analyzer of a live video stream, a live textual news feed, a historical archive, a news site, a social media site, or any combination thereof.

4. The system of claim 1, wherein said significance score is computed as a function of fact uniqueness, user bias, or any combination thereof.

5. The system of claim 1, wherein said behaviors of child nodes comprise placing facts in chronological order, separating different classifications of facts, or any combination thereof.

6. The system of claim 1, wherein the summed score of a story tree is penalized if said minimum number of a leaf node is not met.

7. The system of claim 1, wherein said story planning module is further configured to populate the story trees with facts established by two or more said data point amalgamators.

8. The system of claim 1, wherein the story scripting module is further configured to insert a non-textual media item into the story, said media item serving as the basis for one or more said data points in a said fact.

9. The system of claim 1, further configured to perform orthographic realization to fix punctuation and casing.

10. The system of claim 1, wherein the scripting module is configured to find the best-matching template by
  d. eliminating message templates containing names of indicators or variables not found among the data points;
  e. testing the remaining message templates for how many indicators in the fact have matching values in the message template; and
  f. selecting message templates with the most matching indicator values.

11. The system of claim 10, wherein the scripting module is further configured to randomly select message templates that are tied for the most matching indicator values.

12. A method for natural language generation of a story, comprising
a story planning stage comprising steps of
  a. providing the system of claim 1;
  b. providing said system with access to one or more data sources comprising data points concerning a subject area 205;
  c. establishing one or more facts from the data sources, each said fact comprising a combination of one or more said data points, the data points selected according to selection rules for a predetermined aspect of the subject area 210;
  d. for each said fact, characterizing a data type of each data point 215 as one or more of
    i. an indicator, characterizing the data point; and
    ii. a variable, comprising a textual contribution of the data point;
  e. calculating significance scores of each said fact 220;
  f. obtaining one or more story trees associated with the aspect 225, each said story tree comprising
    i. section nodes, each said section node corresponding to an outline heading of an article or a behavior of child nodes of the section node;
    ii. leaf nodes corresponding to article contents; each said leaf node is characterized by a minimum and a maximum number of scored facts to be populated in the leaf node;
  g. for each said story tree, populating each said leaf node with the scored facts, according to the minimum and maximum number of scored facts and a story ruleset of the story tree 230;
  h. for each said story tree, summing the significance scores of facts placed in the leaf nodes 235; and
  i. selecting a said populated story tree with a highest said summed significance score as a story plan, thereby selecting the best story for the aspect 240;
a realization stage for expression of said story, comprising, for each leaf node in the selected tree, steps of
  j. obtaining message templates associated with the data point amalgamator 245;
  k. for each fact in each leaf of the selected tree,
    i. finding the best-matching message template 250; and
    ii. injecting the textual contribution of the variables of the data points into the selected message 255.

13. The method of claim 12, wherein said subject area is a sport or finance.

14. The method of claim 12, wherein said data sources comprise an AI analyzer of a live video stream, a historical archive, a news site, a social media site, or any combination thereof.

15. The method of claim 12, wherein said step of computing a significance score is made as a function of fact uniqueness, user bias, or any combination thereof.

16. The method of claim 12, wherein said behaviors of child nodes comprise placing facts in chronological order, separating different classifications of facts, or any combination thereof.

17. The method of claim 12, further comprising a step of penalizing the summed score of a story tree if said minimum number of a leaf node is not met.

18. The method of claim 12, further comprising steps of populating the story trees with facts established by two or more said data point amalgamators.

19. The method of claim 12, further comprising a step of inserting a non-textual media item into the story, said media item serving as the basis for one or more said data points in a said fact.

20. The method of claim 12, further comprising a step of performing orthographic realization to fix punctuation and casing.

21. The method of claim 12, wherein said finding the best-matching template comprises steps of
  d. eliminating message templates containing names of indicators or variables not found among the data points;
  e. testing the remaining message templates for how many indicators in the fact have matching values in the message template; and
  f. selecting message templates with the most matching indicator values.

22. The method of claim 21, wherein the scripting module is further configured to randomly select message templates that are tied for the most matching indicator values.

Description:
SYSTEM AND METHOD FOR NATURAL LANGUAGE GENERATION OF A NEWS STORY

FIELD OF THE INVENTION

The invention is in the field of natural language generation, and in particular relates to generating a news story from data acquired from sources such as live videos, news feeds, and historical archives.

BACKGROUND TO THE INVENTION

Computer-based methods and systems for generating written matter from raw data are previously disclosed:

US patent 9,720,884 B2 discloses a system and method for automatically generating a narrative story that receives data and information pertaining to a domain event. The received data and information and/or one or more derived features are then used to identify a plurality of angles for the narrative story. The plurality of angles is then filtered, for example through use of parameters that specify a focus for the narrative story, length of the narrative story, etc. Points associated with the filtered plurality of angles are then assembled and the narrative story is rendered using the filtered plurality of angles and the assembled points.

China patent application 110309320A discloses a method for automatically generating NBA basketball news in combination with an NBA competition knowledge graph. The method comprises the steps of: preprocessing live text broadcast data of NBA games crawled from the network, by removing the crawler webpage labels, removing the stop words in the text, and representing the data as quintuples; performing data segmentation on the preprocessed live text broadcast data, according to a proposed segmentation algorithm, to obtain a competition development trend; performing special event extraction according to a proposed definition of a basketball competition special event; defining a basketball news description template; combining the data segmentation result, the special event extraction result, and the corresponding news description template to generate a first draft of the news; and generating competition background information in combination with the knowledge graph to obtain a final draft of the news. Automatic generation of NBA competition news is thereby realized, the quality of the generated news is improved, and the generated news content can be better controlled.

US patent 9,721,207 B2 discloses a method for generating written content in an application. In accordance with an embodiment, the method includes: receiving a query from a user; importing data from at least one data source in response to the query; ranking the imported data based on a plurality of ranking factors to determine a relevance of the imported data; automatically generating written content using at least a portion of the imported data based on the determined relevance of the imported data; and automatically customizing the written content based on a file format of the application.

The present invention advances the technology for natural language generation of written content, as further described below.

SUMMARY

Existing NLG story-writing systems typically begin with a well-defined topic or story structure and attempt to find information that best fits the defined topic or structure. In contrast, the present invention begins with a general aspect of a particular subject area, and then follows the information itself, wherever it may lead, as a guide to the topic and story structure yielding the most significant (e.g., most important or most interesting) story.

In an exemplary embodiment of the invention, a data point amalgamator searches one or more data sources for data points informing on a particular aspect of a subject area. Data points are amalgamated into facts and the facts are scored for their significance. A fact's significance score can be based on, for example, uniqueness of the fact or on the fact's appeal to biases of a user for whom a story is being prepared.

A story planning module selects story trees from a story tree library. Each story tree defines a story outline and how facts are assembled therein to form a story. The story planning module populates each one of the selected story trees with as many of the scored facts as possible given constraints of the story tree. The story planning module sums the scores of the inserted facts. The populated story tree with the highest summed score is selected as containing the most noteworthy story.

To realize the final story, a story scripting module matches data points in each leaf node of the selected populated story tree to a best-fitting message template. The message templates may comprise either textual components of the story or metadata of non-textual media files, such as images, sound, and video.

It is therefore within the scope of the present invention to provide a computer-based system for natural language generation of a story, comprising
  a. one or more data point amalgamators, each configured to
    i. access one or more data sources comprising data points concerning a subject area;
    ii. establish one or more facts from the data sources, each fact determined from a combination of one or more of the data points, the data points selected according to selection rules for a predetermined aspect of the subject area;
    iii. for each extracted fact, characterize a data type of each data point as one or more of
      a) an indicator, characterizing the fact; and
      b) a variable, comprising a textual contribution of the fact to a story;
    iv. calculate significance scores of each fact;
  b. a story tree library, comprising one or more story trees associated with one or more of the data point amalgamators, each story tree comprising
    i. section nodes, each section node corresponding to an outline heading of an article or a behavior of child nodes of the section node;
    ii. leaf nodes corresponding to article contents; each leaf node is characterized by a minimum and a maximum number of scored facts to be populated in the leaf node; and
  c. a story planning module, configured to
    i. receive the scored facts from the data point amalgamator;
    ii. receive the story trees that are associated with the data point amalgamator from the story tree library;
    iii. populate each leaf node with scored facts, according to the minimum and maximum number of scored facts and to the behaviors of section nodes of the story tree;
    iv. for each story tree, sum the topic significance scores of facts placed in the leaf nodes; and
    v. select the populated story tree with the highest summed significance score, the selected populated story tree constituting a story plan;
  d. a message template library comprising one or more message templates, each message template associated with a data point amalgamator;
  e. a story scripting module configured to obtain the message templates associated with the data point amalgamator or with a fact and, for each fact in each leaf of the story plan,
    i. find the best-matching message template for the fact; and
    ii. inject the textual contribution of the variables of the data points into the selected message.

It is further within the scope of the invention to provide the abovementioned system, wherein the subject area is a sport or finance.

It is further within the scope of the invention to provide the abovementioned system, wherein the data sources comprise an AI analyzer of a live video stream, a live textual news feed, a historical archive, a news site, a social media site, or any combination thereof.

It is further within the scope of the invention to provide the abovementioned system, wherein the significance score is computed as a function of fact uniqueness, user bias, or any combination thereof.

It is further within the scope of the invention to provide the abovementioned system, wherein the behaviors of child nodes comprise placing facts in chronological order, separating different classifications of facts, or any combination thereof.

It is further within the scope of the invention to provide the abovementioned system, wherein the summed score of a story tree is penalized if the minimum number of a leaf node is not met.

It is further within the scope of the invention to provide the abovementioned system, wherein the story planning module is further configured to populate the story trees with facts established by two or more of the data point amalgamators.

It is further within the scope of the invention to provide the abovementioned system, wherein the story scripting module is further configured to insert a non-textual media item into the story, the media item serving as the basis for one or more of the data points in a fact.

It is further within the scope of the invention to provide the abovementioned system, further configured to perform orthographic realization to fix punctuation and casing.

It is further within the scope of the invention to provide the abovementioned system, wherein the scripting module is configured to find the best-matching template by a. eliminating message templates containing names of indicators or variables not found among the data points; b. testing the remaining message templates for how many indicators in the fact have matching values in the message template; and c. selecting message templates with the most matching indicator values.

It is further within the scope of the invention to provide the abovementioned system, wherein the scripting module is further configured to randomly select message templates that are tied for the most matching indicator values.

It is further within the scope of the invention to provide a method for natural language generation of a story, comprising
a story planning stage comprising steps of
  a. providing the system for natural language generation of a story;
  b. providing the system with access to one or more data sources comprising data points concerning a subject area;
  c. establishing one or more facts from the data sources, each fact comprising a combination of one or more of the data points, the data points selected according to selection rules for a predetermined aspect of the subject area;
  d. for each fact, characterizing a data type of each data point as one or more of
    i. an indicator, characterizing the data point; and
    ii. a variable, comprising a textual contribution of the data point;
  e. calculating significance scores of each fact;
  f. obtaining one or more story trees associated with the aspect, each story tree comprising
    i. section nodes, each section node corresponding to an outline heading of an article or a behavior of child nodes of the section node;
    ii. leaf nodes corresponding to article contents; each leaf node is characterized by a minimum and a maximum number of scored facts to be populated in the leaf node;
  g. for each story tree, populating each leaf node with the scored facts, according to the minimum and maximum number of scored facts and a story ruleset of the story tree;
  h. for each story tree, summing the significance scores of facts placed in the leaf nodes; and
  i. selecting the populated story tree with the highest summed significance score as a story plan, thereby selecting the best story for the aspect;
a realization stage for expression of the story, comprising, for each leaf node in the selected tree, steps of
  j. obtaining message templates associated with the data point amalgamator;
  k. for each fact in each leaf of the selected tree,
    i. finding the best-matching message template; and
    ii. injecting the textual contribution of the variables of the data points into the selected message.

It is further within the scope of the invention to provide the abovementioned method, wherein the subject area is a sport or finance.

It is further within the scope of the invention to provide the abovementioned method, wherein the data sources comprise an AI analyzer of a live video stream, a historical archive, a news site, a social media site, or any combination thereof.

It is further within the scope of the invention to provide the abovementioned method, wherein the step of computing a significance score is made as a function of fact uniqueness, user bias, or any combination thereof.

It is further within the scope of the invention to provide the abovementioned method, wherein the behaviors of child nodes comprise placing facts in chronological order, separating different classifications of facts, or any combination thereof.

It is further within the scope of the invention to provide the abovementioned method, further comprising a step of penalizing the summed score of a story tree if the minimum number of a leaf node is not met.

It is further within the scope of the invention to provide the abovementioned method, further comprising steps of populating the story trees with facts established by two or more data point amalgamators.

It is further within the scope of the invention to provide the abovementioned method, further comprising a step of inserting a non-textual media item into the story, the media item serving as the basis for one or more data points in a fact.

It is further within the scope of the invention to provide the abovementioned method, further comprising a step of performing orthographic realization to fix punctuation and casing.

It is further within the scope of the invention to provide the abovementioned method, wherein finding the best-matching template comprises steps of a. eliminating message templates containing names of indicators or variables not found among the data points; b. testing the remaining message templates for how many indicators in the fact have matching values in the message template; and c. selecting message templates with the most matching indicator values.

It is further within the scope of the invention to provide the abovementioned method, wherein the scripting module is further configured to randomly select message templates that are tied for the most matching indicator values.
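The template-matching steps recited above (eliminate, count matching indicator values, select, break ties at random) can be illustrated with a minimal Python sketch. The sketch is not taken from the disclosure: the names MessageTemplate, best_templates, and realize, the representation of a fact as a dictionary of indicators and variables, and the "$name" placeholder syntax are all assumptions made for illustration.

```python
import random
import re
from dataclasses import dataclass

# Hypothetical placeholder syntax: "$name" marks a variable slot in the skeletal text.
PLACEHOLDER = re.compile(r"\$(\w+)")


@dataclass
class MessageTemplate:
    indicators: dict  # indicator name -> value the template expects
    text: str         # skeletal text containing $name placeholders


def best_templates(fact, templates):
    """Return the templates tied for the most matching indicator values.

    `fact` is assumed to be {"indicators": {...}, "variables": {...}}.
    """
    # Step a: eliminate templates naming indicators or variables absent from the fact.
    usable = [
        t for t in templates
        if set(t.indicators) <= set(fact["indicators"])
        and set(PLACEHOLDER.findall(t.text)) <= set(fact["variables"])
    ]
    if not usable:
        return []

    # Step b: count how many of the fact's indicators have matching values.
    def matches(t):
        return sum(1 for name, value in fact["indicators"].items()
                   if t.indicators.get(name) == value)

    # Step c: keep the templates with the highest count.
    top = max(matches(t) for t in usable)
    return [t for t in usable if matches(t) == top]


def realize(fact, templates, rng=random):
    """Pick one best template (random tie-break) and inject the variable values."""
    tied = best_templates(fact, templates)
    if not tied:
        return None
    chosen = rng.choice(tied)
    return PLACEHOLDER.sub(lambda m: str(fact["variables"][m.group(1)]), chosen.text)
```

In this sketch, a template that mentions a placeholder the fact cannot fill is discarded before any counting takes place, so a realized sentence never contains an unfilled slot.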

BRIEF DESCRIPTION OF THE DRAWINGS

Figs. 1A and 1B show a computer-based system for natural language generation of a story, according to some embodiments of the invention.

Fig. 2 shows steps of a computer-based method for natural language generation of a story, according to some embodiments of the invention.

DETAILED DESCRIPTION

Reference is now made to Figs. 1A and 1B, showing a computer-based system 100 for natural language generation of a story.

Generation of a story entails two stages: story planning and realization. Fig. 1A shows modules involved in the story planning stage, for which the system 100 comprises one or more data point amalgamators 112, a story tree library 117, and a story planning module 135. Fig. 1B shows modules involved in the story realization stage, for which the system 100 comprises a message template library 140 and a story scripting module 150.

The modules of the system 100, described herein, are implemented by one or more processors and one or more non-transitory computer-readable media (CRMs). The CRMs store instructions to the processors for executing the module functions. The system 100 may be made available to users by any means; for example, as licensed software, software-as-a-service (SaaS), etc. Computer configuration details, such as the type(s) of computer(s), storage media, display devices, operating system(s), etc., are not specified in this disclosure. A person skilled in the art, given this disclosure, would know one or more configurations for implementing the system 100.

While the present disclosure is directed towards writing news stories about basketball or finance, it is understood that the teachings of the invention can be applied to any subject area whose news is sourced from objective data.

Story Planning Stage

Reference is now made to Fig. 1A. A data point amalgamator 112 has access to one or more data sources 105 containing data points 110 concerning a subject area. A subject area can be, for example, a particular sport (e.g., basketball) or finance. A data source 105 may be publicly available, subscribed, or proprietary. The data source 105 can be, for example, an AI analyzer of a basketball game's live video stream and/or a live statistical feed from the game. The AI analyzer computes each team's probability of winning in real time, after each play. Another possible data source 105 is an archive of historical data concerning the subject area, such as an archive of team and player statistics for present and/or previous seasons. Yet other possible data sources 105, while not real-time and not historical, contain current newsworthy data; examples of such data sources 105 include news sites and social media sites. Live, current, and historical data points 110 can be combined, enabling comparing and contrasting of outcomes over different time scales.

Possible data points 110 include a player with the ball, an expected outcome of a game or single play, a win probability of either team, an outcome (e.g., a basket, a final score), active and inactive injured players, the teams' league standings, past years' performances, etc.

Each data point amalgamator 112 is characterized by a particular aspect of the subject area. A data point amalgamator 112 harvests data source(s) 105 for one or more predetermined data points 110 needed in order to establish a fact 115 pertaining to the particular aspect of the data point amalgamator 112. For example, a data point amalgamator 112 may be dedicated to seeking players who made notable performances during a game. The data point amalgamator 112 expresses data points 110 of players' individual contributions to the team's win probability throughout the game. The data point amalgamator 112 combines the data points to establish one or more facts 115. For example, one fact 115 can be a comparison between a player's performance in a play with a set of expectations based, for example, on the player's historical performance and league averages. The data point amalgamator 112 may further establish facts 115 of a rank of each performance according to the likelihood of such a performance to take place.

The data point amalgamator 112 predeterminedly allocates a data type to each data point 110. The data type of a data point 110 reflects the content of the data point 110 and determines how the data point 110 shall be processed later, as further described in connection with the realization stage. The data types comprise 1) an indicator 110I, used to match facts 115 with message templates 145 (further described herein); 2) a variable 110V, providing a variable string value to a message template 145; and 3) a hybrid 110H, comprising both an indicator 110I and a variable 110V component. Each data point 110 is characterized by a name and a value.

The data point amalgamator 112 calculates a significance score for each established fact 115. Computation of a fact's 115 significance score may be a function of uniqueness of the fact 115; for example, an injury to an important player, a sudden change in a team's winning probability, an unexpected win, etc. A significance score may also be based, for example, on a user bias. If a particular user requesting the news story is known to be interested in a particular team or player, then an event that involves the team or player of interest receives a higher significance score than an event that does not. A data point amalgamator 112 may employ one or more statistical methods to compute significance scores of facts 115. For example, after each play (of which there are about 400 per basketball game), the data point amalgamator 112 may employ a neural network module (not shown) to predict, in combination with other data points 110, the win probability of each team. One example is the win probability method disclosed in [Ganguly, Sujoy, and Nathan Frank; “The Problem with Win Probability”; 2018 MIT Sloan Sports Analytics Conference; 2018], incorporated herein by reference. The neural network module may also simulate a contrary play scenario (for example, a missed basket instead of a basket) and compute the win probabilities after the contrary play. The impact of the play on the game outcome, and the significance score of the play's fact 115, are computed in correlation to the difference between the win-probability prediction for the real play and the contrary play.

A data point amalgamator 112 may similarly score a player's game performance by summing the impact on the predicted game outcome for each play the player participates in. For example, the data point amalgamator 112 may tally the impact sum of the player's points, assists, and rebounds during the game, and assign a significance score to the play's fact 115 in correlation.
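The contrary-play comparison and the per-player tally described above can be sketched in Python as follows. The sketch is illustrative only: toy_win_prob is a hand-written logistic function standing in for the neural network module, and its form, its constants, and the names play_impact and player_impact are assumptions, not part of the disclosure.

```python
import math


def toy_win_prob(margin, seconds_left):
    """Hypothetical stand-in for the neural network module.

    Logistic in the score margin; a given margin counts for more
    as the time remaining shrinks. 2880 s is a 48-minute game.
    """
    scale = 0.1 * math.sqrt(2880.0 / (seconds_left + 1.0))
    return 1.0 / (1.0 + math.exp(-scale * margin))


def play_impact(margin_before, seconds_left, actual_points, contrary_points=0):
    """Win-probability difference between the real play and the contrary play."""
    p_actual = toy_win_prob(margin_before + actual_points, seconds_left)
    p_contrary = toy_win_prob(margin_before + contrary_points, seconds_left)
    return p_actual - p_contrary


def player_impact(plays):
    """Sum the impacts of all plays a player took part in.

    `plays` is an iterable of (margin_before, seconds_left, actual_points) tuples.
    """
    return sum(play_impact(m, s, pts) for m, s, pts in plays)
```

With this toy model, a two-point basket in a tied game with ten seconds left has a far larger impact than the same basket early in the game, which is the behavior a significance score of this kind is meant to capture.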

In a finance application, a data point amalgamator 112 may assign a score to an asset's predicted daily returns. For each asset, the data point amalgamator 112 continuously calculates data points 110 comprising a distribution of the historical daily returns using a decay model, such as a Johnson's SU-distribution. The data point amalgamator 112 then calculates the likelihood of achieving the actual daily return given the distribution fact 115. A significance score is calculated using the found likelihood and, typically, other contributing factors.
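As a rough illustration of scoring a daily return against a fitted distribution, the following Python sketch evaluates the Johnson SU cumulative distribution function, F(x) = Φ(γ + δ·asinh((x − ξ)/λ)), and converts the two-sided tail probability into a score. The disclosure does not specify how the likelihood is turned into a significance score; the use of a two-sided tail probability, the negative base-10 logarithm, and the function names are assumptions. Fitting the parameters (including any decay weighting of historical returns) is outside the scope of the sketch, so the parameters are taken as given.

```python
import math


def johnson_su_cdf(x, gamma, delta, xi, lam):
    """Johnson SU CDF: standard normal CDF of gamma + delta * asinh((x - xi) / lam)."""
    z = gamma + delta * math.asinh((x - xi) / lam)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def return_surprise(actual_return, gamma, delta, xi, lam):
    """Assumed scoring rule: -log10 of the two-sided tail probability.

    Returns 0 for a return at the median and grows as the return
    becomes less likely under the fitted distribution.
    """
    cdf = johnson_su_cdf(actual_return, gamma, delta, xi, lam)
    tail = 2.0 * min(cdf, 1.0 - cdf)
    return -math.log10(max(tail, 1e-300))  # floor avoids log10(0)
```

For example, with hypothetical parameters gamma=0, delta=1.5, xi=0, lam=0.01, a 5% daily move scores much higher than a 1% move, and a return exactly at the median scores zero.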

The story tree library 117 is a database containing a selection of a type of template called a story tree 120. Some or all of the story trees 120 may be associated with a particular data point amalgamator 112. A story tree is comprised of leaf nodes 130 and section nodes 125. Leaf nodes 130 are designated for placement therein of facts 115 scored by the data point amalgamator 112; such placement is further described herein in connection with the story planning module 135. Leaf nodes 130 can be characterized by a minimum and maximum number of facts 115, designating a range of number of facts 115 that may populate each leaf node 130. A section node 125 can serve two purposes: 1) correspondence to an outline sectional heading of a news story; and/or 2) behavior of child nodes of the section node. For example, a section node 125 may dictate that all facts 115 placed in children of the section node 125 be placed in chronological order; or, for example, facts 115 comprising “good news” (e.g., gaining possession of the basketball in the opponent's side of the court) be separated from facts 115 comprising “bad news” (e.g. a 3-point basket by the opposing team).

The story planning module 135 receives a set of scored facts from the data point amalgamator 112. The story planning module 135 selects appropriate story trees 120 from the story tree library 117. For one of the selected story trees 120, the story planning module 135 populates the leaf nodes 130 with the scored facts 115. The story planning module 135 observes sectional and behavioral placement rules dictated by section nodes 125, as a fact 115 is cascaded from the top of the story tree 120 down to a leaf node 130. The story planning module 135 further observes the maximum number of facts 115 of each leaf node 130, and will stop further populating of a leaf node 130 whose maximum has been reached. The story planning module 135 adds the significance scores of facts 115 populating the leaf nodes 130, producing an overall score of the story tree 120. In some embodiments, the story tree's 120 overall score is penalized (reduced by a predetermined number of points) if the minimum number of facts 115 in a leaf node 130 is not met.

The story planning module 135 repeats the process of populating the leaf nodes 130 with the scored facts 115 for each of the selected story trees 120. The story planning module 135 selects a story plan 120', the populated story tree 120 with the highest overall score. The story plan 120' is deemed to contain the outline and content of the best story for the subject area aspect of the data point amalgamator 112.
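The planning stage described above may be sketched as follows, under stated assumptions. The greedy placement order (highest score first), the routing of facts to leaves by an accepted indicator value, the penalty value, and all class, field, and heading names are hypothetical; the specification does not prescribe a particular placement algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    name: str
    min_facts: int
    max_facts: int
    accepts: frozenset            # indicator values routed here (assumed rule)
    facts: list = field(default_factory=list)


@dataclass
class Section:
    heading: str
    children: list                # Section or Leaf nodes
    chronological: bool = False   # example behavior imposed on child leaves


def leaves(node, chronological=False):
    """Yield (leaf, chronological_flag) pairs in outline order."""
    if isinstance(node, Leaf):
        yield node, chronological
    else:
        for child in node.children:
            yield from leaves(child, chronological or node.chronological)


MIN_PENALTY = 5.0  # assumed; the text says only "a predetermined number of points"


def populate(tree, facts):
    """Fill the tree's leaves with facts and return the tree's overall score."""
    slots = list(leaves(tree))
    for leaf, _ in slots:
        leaf.facts = []
    # Offer the most significant facts first so they claim the limited slots.
    for fact in sorted(facts, key=lambda f: f["score"], reverse=True):
        for leaf, _ in slots:
            if fact["indicator"] in leaf.accepts and len(leaf.facts) < leaf.max_facts:
                leaf.facts.append(fact)
                break
    total = 0.0
    for leaf, chrono in slots:
        if chrono:
            leaf.facts.sort(key=lambda f: f["time"])
        total += sum(f["score"] for f in leaf.facts)
        if len(leaf.facts) < leaf.min_facts:
            total -= MIN_PENALTY  # leaf minimum not met
    return total


def select_story_plan(trees, facts):
    """Populate every candidate tree and return (best_tree, best_score)."""
    scored = [(populate(tree, facts), tree) for tree in trees]
    best_score, best_tree = max(scored, key=lambda pair: pair[0])
    return best_tree, best_score


facts = [
    {"indicator": "good", "score": 8.0, "time": 2},
    {"indicator": "good", "score": 6.0, "time": 1},
    {"indicator": "bad", "score": 3.0, "time": 3},
]
recap = Section("Recap", [
    Leaf("highlights", 1, 2, frozenset({"good"})),
    Leaf("setbacks", 1, 1, frozenset({"bad"})),
], chronological=True)
brief = Section("Brief", [
    Leaf("top", 1, 1, frozenset({"good"})),
    Leaf("injuries", 1, 1, frozenset({"injury"})),
])

best_tree, best_score = select_story_plan([recap, brief], facts)
```

With this illustrative data the "Recap" tree places all three facts for an overall score of 17.0, while the "Brief" tree places one fact and is penalized for an unfilled leaf, scoring 3.0, so "Recap" is selected as the story plan.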

Story Realization Stage

Reference is now made to Fig. 1B. The message template library 140 contains message templates 145. A message template 145 comprises indicators 110I to be best matched with indicators 110I within the data points 110 of a fact 115. Additionally, a message template 145 contains a skeletal text message (e.g., one or more sentences or phrases) with variable text to be filled in by the textual contribution of variables 110V in the fact 115.

For example, in a finance application, a message template could be,

Indicators: up, daily; Skeletal text: $asset_name is trading around $close_value after starting the day at $previous_close_value (up $percentage_change %)

The matching indicators up and daily specify that the message template 145 is a best match for a combination of data points 110 with indicators up and daily (showing that the combination of data points relates to an asset whose daily value has risen). The unboldfaced text is to be completed by textual contributions of the boldfaced variables 110V in the matched combination.
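As an illustrative sketch only, the filling of such a skeletal text can be reproduced with Python's standard `string.Template`, whose `$name` placeholders resemble the notation of the example. The placeholder names follow the example as read here, and the variable values are hypothetical.

```python
from string import Template

skeleton = Template(
    "$asset_name is trading around $close_value after starting the day "
    "at $previous_close_value (up $percentage_change %)"
)

# Hypothetical variable values for one fact; none come from the specification.
variables = {
    "asset_name": "ACME",
    "close_value": "102.50",
    "previous_close_value": "100.00",
    "percentage_change": "2.5",
}

# substitute() raises KeyError if any placeholder has no value.
message = skeleton.substitute(variables)
```

Here `message` reads "ACME is trading around 102.50 after starting the day at 100.00 (up 2.5 %)". Because `substitute` fails when a placeholder is missing, a template naming a variable absent from the fact cannot be filled.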

Typically, each message template 145 is associated with one or more data point amalgamators 112 or a fact 115 generated thereby.

The story scripting module 150 receives the story plan 120' from the story planning module 135, and receives the message templates 145 associated with the data point amalgamator 112 from the message template library 140. For each fact 115 in the story plan 120', the scripting module 150 finds the best-matching message template 145; for example, as follows: 1) the story scripting module 150 eliminates message templates 145 containing names of indicators 110I or variables 110V not found among the data points 110 constituting the fact 115; 2) the remaining message templates 145 are tested to see how many indicators constituting the fact 115 have matching values in the message template 145; and 3) the message template 145 with the most matching indicator values is selected. If two or more message templates 145 are tied for the most matching indicator values, a message template 145 may be selected at random among them.
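The three-step selection described above may be sketched as follows. The dictionary layout of facts and templates, the indicator names, and the template identifiers are assumptions made for illustration and are not taken from the specification or its tables.

```python
import random


def best_template(fact, templates, rng=random):
    """Pick the message template that best matches a fact.

    fact:     {"indicators": {name: value}, "variables": {name: value}}
    template: {"id": str, "indicators": {name: value}, "variables": [names]}
    """
    # Step 1: drop templates naming an indicator or variable the fact lacks.
    candidates = [
        t for t in templates
        if set(t["indicators"]) <= set(fact["indicators"])
        and set(t["variables"]) <= set(fact["variables"])
    ]
    if not candidates:
        return None

    # Step 2: count the fact's indicators whose values the template matches.
    def matches(t):
        return sum(
            1 for name, value in fact["indicators"].items()
            if name in t["indicators"] and t["indicators"][name] == value
        )

    # Step 3: keep the highest count; break ties at random.
    top = max(matches(t) for t in candidates)
    return rng.choice([t for t in candidates if matches(t) == top])


# Hypothetical fact and templates.
fact = {
    "indicators": {"period": "daily", "direction": "flat"},
    "variables": {"asset_name": "ACME", "close_value": "102.50"},
}
templates = [
    {"id": "both", "indicators": {"period": "daily", "direction": "flat"},
     "variables": ["asset_name", "close_value"]},
    {"id": "extra_indicator", "indicators": {"period": "daily", "sector": "tech"},
     "variables": ["asset_name"]},
    {"id": "extra_variable", "indicators": {"period": "daily"},
     "variables": ["asset_name", "open_value"]},
    {"id": "period_only", "indicators": {"period": "daily"},
     "variables": ["asset_name", "close_value"]},
]

chosen = best_template(fact, templates)
```

In this illustration two templates are eliminated in step 1, and of the two survivors the one matching both indicator values ("both") is chosen over the one matching a single value.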

As an example, a fact 115 contains the data points 110 from a finance data source 105 listed in Table 1:

Table 1

Four message templates 145 are available for this fact 115, message templates 1-4, listed in Tables 2-5:

Table 2 - Message Template 1

Table 3 - Message Template 2

Table 4 - Message Template 3

Table 5 - Message Template 4

Message template 2 is eliminated because the indicator 110I named 'weekly' in message template 2 is not among the indicators 110I in the fact 115.

Message template 3 is eliminated because the variable 110V named 'open value' in message template 3 is not among the variables 110V in the fact 115.

The indicators 110I in the fact 115 (of Table 1) are compared with the indicators 110I in the remaining message templates, 1 and 4. Message template 1 contains two of the indicators 110I in the fact 115, 'daily' and 'flat.' Message template 4 contains one of the indicators 110I in the fact 115, 'daily.' Therefore, message template 1, with the most indicators 110I matching the indicators 110I in the fact 115 (among non-eliminated message templates 145), is chosen as the message template 145 for the fact 115.

The story scripting module 150 injects the textual contribution from values of variables 110V in the fact 115 into the variable text of the selected message template 145. In our example, the scripting module generates the text, “A mostly flat day for the euro, now at 1.1847 to the dollar.”

The story scripting module 150 iteratively writes the story 155 as the story scripting module 150 traverses data points 110 in the leaf nodes 130 of the story plan 120'. The story scripting module 150 may employ orthographic realization to fix punctuation and casing.
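A minimal sketch of one possible orthographic clean-up pass is given below. The specific rules (collapsing whitespace, removing a space before punctuation, capitalizing the first letter, and ensuring terminal punctuation) are assumptions chosen for illustration; the specification does not enumerate the rules applied.

```python
import re


def orthographic_fix(text):
    """Normalize spacing, capitalize the first letter, and ensure the
    sentence ends with terminal punctuation."""
    text = re.sub(r"\s+", " ", text).strip()       # collapse runs of whitespace
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)   # no space before punctuation
    if text:
        text = text[0].upper() + text[1:]
        if text[-1] not in ".!?":
            text += "."
    return text


fixed = orthographic_fix("a mostly flat  day for the euro , now at 1.1847 to the dollar")
```

Applied to the lower-cased, loosely spaced input above, the function yields "A mostly flat day for the euro, now at 1.1847 to the dollar."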

The selected story plan 120' may alternatively or additionally be matched to nontextual media serving as the basis for data points 110 of a fact 115. The media may be, for example, an image, an audio clip, a video clip, or any combination thereof. The story scripting module 150 may insert the media file into the story, in lieu of or in addition to the textual message. The textual message may serve as a caption for the media file content.

Reference is now made to Fig. 2, showing steps of a computer-based method 200 for natural language generation of a story, according to some embodiments of the invention. The method 200 comprises a story planning stage comprising steps of
a. providing a system for natural language generation of a story 202;
b. providing the system with access to one or more data sources comprising data points concerning a subject area 205;
c. establishing one or more facts from the data sources, each fact comprising a combination of one or more of the data points, the data points selected according to selection rules for a predetermined aspect of the subject area 210;
d. for each fact, characterizing a data type of each data point 215 as one or more of
   i. an indicator, characterizing the data point; and
   ii. a variable, comprising a textual contribution of the data point;
e. calculating significance scores of each fact 220;
f. obtaining one or more story trees associated with the aspect 225, each story tree comprising
   i. section nodes, each section node corresponding to an outline heading of an article or a behavior of child nodes of the section node;
   ii. leaf nodes corresponding to article contents; each leaf node is characterized by a minimum and a maximum number of scored facts to be populated in the leaf node;
g. for each story tree, populating its leaf nodes with the scored facts, according to the minimum and maximum number of scored facts and a story ruleset of the story tree 230;
h. for each story tree, summing the significance scores of facts placed in the leaf nodes 235; and
i. selecting the populated story tree with the highest summed significance score as a story plan, thereby selecting the best story for the aspect 240;
and a realization stage for expression of the story, comprising, for each leaf node in the selected tree, steps of
j. obtaining message templates associated with the data point amalgamator 245;
k. for each fact in each leaf of the selected tree,
   i. finding the best-matching message template 250; and
   ii. injecting the textual contribution of the variables of the data points into the selected message template 255.