


Title:
HIERARCHICAL LANGUAGE GENERATION WITH RECURRENT AGGREGATION
Document Type and Number:
WIPO Patent Application WO/2023/030637
Kind Code:
A1
Abstract:
A language generator for generating a natural language output in dependence on a conceptual input structure, the language generator comprising one or more processors configured to: split the conceptual input structure into a plurality of sub-units, each sub-unit representing a conceptual subpart of the conceptual input structure; input each sub-unit to a trained weighted processing network capable of generating a natural language representation of its inputs so as to form a plurality of natural language segments; and combine the natural language segments to form the natural language output. Also provided is a method for adjusting the weighting of a weighted processing network for use in a language generator. Hierarchical Recurrent Aggregative Generation provides a three-layered architecture specifically designed to maximize transfer learning from large pretrained language models for different subtasks of language generation.

Inventors:
ZHOU GIULIO (DE)
LAMPOURAS GERASIMOS (DE)
Application Number:
PCT/EP2021/074320
Publication Date:
March 09, 2023
Filing Date:
September 03, 2021
Assignee:
HUAWEI TECH CO LTD (CN)
ZHOU GIULIO (DE)
International Classes:
G06F40/35; G06F40/56
Other References:
SHANG-YU SU ET AL: "Natural Language Generation by Hierarchical Decoding with Linguistic Patterns", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 8 August 2018 (2018-08-08), XP081096851
ZDENĚK KASNER ET AL: "Data-to-Text Generation with Iterative Text Editing", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 28 January 2021 (2021-01-28), XP081869194
MIHIR KALE ET AL: "Few-Shot Natural Language Generation by Rewriting Templates", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 30 April 2020 (2020-04-30), XP081655902
THIAGO CASTRO FERREIRA ET AL: "Neural data-to-text generation: A comparison between pipeline and end-to-end architectures", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 23 August 2019 (2019-08-23), XP081469047
ANONYMOUS: "Hierarchical Recurrent Aggregative Generation for Few-Shot NLG Anonymous ACL submission", 16 September 2021 (2021-09-16), XP055922091, Retrieved from the Internet [retrieved on 20220517]
WEN ET AL.: "Semantically conditioned LSTM-based natural language generation for spoken dialogue systems", PROCEEDINGS OF THE 2015 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2015, pages 1711 - 1721, XP055400405, DOI: 10.18653/v1/D15-1199
DEVLIN ET AL.: "In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies", vol. 1, 2019, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, article "BERT: Pre-training of deep bidirectional transformers for language understanding", pages: 4171 - 4186
RADFORD ET AL., LANGUAGE MODELS ARE UNSUPERVISED MULTITASK LEARNERS, 2018
RAFFEL ET AL.: "Exploring the limits of transfer learning with a unified text-to-text transformer", JOURNAL OF MACHINE LEARNING RESEARCH, vol. 21, no. 140, 2020, pages 1 - 67
PENG ET AL.: "Few-shot natural language generation for task-oriented dialog", IN FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: EMNLP, 2020, pages 172 - 182
AGARWAL ET AL.: "In Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)", 2020, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, article "Machine Translation Aided Bilingual Data-to-Text Generation and Semantic Parsing", pages: 125 - 130
WEN ET AL.: "In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies", 2016, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, article "Multi-domain neural network language generation for spoken dialogue systems", pages: 120 - 129
TRAN; NGUYEN: "In Proceedings of the 27th International Conference on Computational Linguistics", 2018, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, article "Adversarial domain adaptation for variational neural language generation in dialogue systems", pages: 1205 - 1217
MI ET AL.: "Meta-learning for low-resource natural language generation in task-oriented dialogue systems", IN PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, vol. 19, 2019, pages 3151 - 3157
ZHOU; LAMPOURAS: "In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing", vol. 1, 2021, LONG PAPERS, article "Generalising multilingual concept-to-text NLG with language agnostic delexicalization", pages: 114 - 127
Attorney, Agent or Firm:
KREUZ, Georg (DE)
Claims:
CLAIMS

1. A language generator for generating a natural language output in dependence on a conceptual input structure, the language generator comprising one or more processors configured to: split the conceptual input structure into a plurality of sub-units, each sub-unit representing a conceptual subpart of the conceptual input structure; input each sub-unit to a trained weighted processing network capable of generating a natural language representation of its inputs so as to form a plurality of natural language segments; and combine the natural language segments to form the natural language output.

2. The language generator according to claim 1, wherein the conceptual input structure comprises a plurality of attribute-value pairs.

3. The language generator according to claim 1 or 2, wherein each of the sub-units of the conceptual input structure includes at least one attribute-value pair.

4. The language generator according to claim 2, wherein each attribute-value pair includes an attribute identifier information and at least one corresponding unit of attribute information.

5. The language generator according to claim 4, wherein the one or more processors are configured to order the plurality of sub-units prior to being input to the trained weighted processing network.

6. The language generator according to any preceding claim, wherein each of the natural language segments includes attribute identifier information included within the sub-unit input to the trained weighted processing network.

7. The language generator of any one of the preceding claims, wherein the one or more processors are further configured, when combining the natural language segments to form the natural language output, to: in a first iteration, input two of the natural language segments to a trained weighted processing network capable of structuring its inputs into a natural language phrase so as to form a first intermediate phrase; and iteratively process each other one of the natural language segments by inputting it together with an intermediate phrase generated in an immediately preceding iteration to the trained weighted processing network so as to form a further intermediate phrase.

8. A language generator for generating a natural language output in dependence on three or more natural language segments, the language generator comprising one or more processors configured to: in a first iteration, input two of the natural language segments to a trained weighted processing network capable of structuring its inputs into a natural language phrase so as to form a first intermediate phrase; and iteratively process each other one of the natural language segments by inputting it together with an intermediate phrase generated in an immediately preceding iteration to the trained weighted processing network so as to form a further intermediate phrase.

9. The language generator according to claim 7 or 8, wherein the one or more processors are configured to also input into the trained weighted processing network the attribute-value pairs corresponding to each natural language segment, each attribute-value pair comprising an attribute identifier information and at least one corresponding unit of attribute information.

10. The language generator according to claim 9, wherein the one or more processors are configured to, at each iterative step after the first iteration, combine using the trained weighted processing network the intermediate phrase, generated in an immediately preceding iteration, and the natural language segment input in the current iterative step, based on all of the natural language segments and the corresponding attribute-value pairs input in previous iterations so as to form the further intermediate phrase.

11. The language generator according to any one of claims 7 to 10, wherein the one or more processors are configured to, in an iterative step, combine the intermediate phrase generated in the immediately preceding stage and the natural language segment in that iterative step by one of the following mechanisms:

(i) concatenating the intermediate phrase generated in an immediately preceding iteration and the natural language segment input in the current iterative step, or

(ii) reframing the intermediate phrase generated in an immediately preceding iteration and the natural language segment input in the current iterative step.

12. The language generator according to claim 9, wherein the one or more processors are configured to, in the first iteration, concatenate the two natural language segments in dependence on the corresponding attribute-value pairs to form the first intermediate phrase.

13. The language generator according to any preceding claim, wherein the one or more processors are configured to, in each iteration, form a plurality of candidate intermediate phrases, calculate for each candidate intermediate phrase a slot error value, and select as the intermediate phrase to be output by that iteration the candidate intermediate phrase having the fewest slot errors.

14. The language generator according to claim 13, wherein the slot error for a candidate intermediate phrase is dependent on the number of attribute-value pairs corresponding to the natural language segments that are present in the inputs to the respective iteration but absent from the candidate intermediate phrase.

15. The language generator according to claim 14, wherein the processor is further configured to select, based on which has the lowest slot error, a natural language output from one of the intermediate phrases formed in the final iteration and the output of a further iteration wherein the intermediate phrase formed in the final iteration is corrected using machine-learnt lexicalisation.

16. The language generator according to any preceding claim, wherein the language generator further comprises the trained weighted processing network.

17. A method for adjusting the weighting of a weighted processing network for use in a language generator according to claim 16, the method comprising forming a training signal by determining matches between a target natural language text and target natural language elements based on target information units corresponding to each natural language segment and thereby forming one or more weights of the network; and inputting the training signal into a weighted processing network.

18. A method according to claim 17, the method comprising determining a match between a target natural language element and a target information unit when a similarity metric exceeds a pre-determined threshold.

19. A method according to claim 17 or 18, wherein the target natural language segments are ordered and the method comprises forming natural language segments from natural language elements that do not produce a match immediately preceding and following the matched target natural language element.

20. A method according to claim 17, the method comprising forming the natural language segments from the matched natural language elements aggregated in the same order as the target attributes occur in the target natural language text.


Description:
HIERARCHICAL LANGUAGE GENERATION WITH RECURRENT AGGREGATION

FIELD OF THE INVENTION

This invention relates to natural language generation. One example may involve hierarchical language generation with recurrent aggregation. Additionally, there is a method of forming a training signal for adjusting the weighting of a weighted processing network.

BACKGROUND

Large pretrained language models (LPLM) are typically trained on vast amounts of raw data that has not undergone annotation or correction by human supervisors. Such models can be intended to model general language knowledge but not necessarily the particular vocabulary or structure required for a specific task. Once an LPLM of this type can model language knowledge to an acceptable level it can be further optimized to achieve high levels of performance on specific tasks. Due to their extensive training on large general language data, LPLMs are especially useful for domain adaptation and transfer learning for few-shot and zero-shot training settings. “Few-shot” and “zero-shot” describe training settings where very few or zero training data specific to a particular task are available.

Natural language generation (NLG) is a family of tasks with the goal of generating a natural language text from a specified input. Examples of suitable input include a machine-readable meaning representation (MR), a graph, a set of database entries, or another natural language text. Figure 1 shows a basic input and output for a concept-to-text NLG task as part of a dialogue system. Neural models, and machine learning models in general, are popular for language generation tasks, but tend to require large amounts of domain-specific data that has been annotated or corrected by humans. The lack of availability of domain-specific human-annotated data is a substantial bottleneck to the adoption of machine learnable language generation in current dialogue systems. To reduce the requirement for such heavily curated data, some NLG models exploit transfer learning from LPLMs.

Prior work that attempts to improve NLGs includes pipelined and end-to-end approaches. Both types of approaches can be addressed by either rule-based or machine learning-based solutions, but end-to-end approaches are predominantly addressed with machine learning.

Pipelined approaches treat language generation as a sequence of tasks and subtasks that gradually transform the initial input into the final output. The main tasks (Reiter, E., & Dale, R. (2000). Building Natural Language Generation Systems. Studies in Natural Language Processing. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511519857) include content selection, document planning, and surface realization. Subtasks of document planning include lexicalization and aggregation. Lexicalization is the process that determines the basic vocabulary and lexical structures that will be used to realize the input into text. Aggregation is the process through which smaller lexical structures are combined to form larger sentences.

These pipelined approaches perform well but suffer from the problem of error propagation from one subtask to the next and have since been outperformed by end-to-end approaches. The reduction of required data has not been addressed in machine-learning approaches to pipelined models.

Figure 2 shows the architecture of an end-to-end approach. Such approaches have become widely adopted in recent years, since neural models are especially well suited to them and exhibit higher performance and versatility. In end-to-end approaches the tasks and subtasks of the more traditional pipeline can be performed latently by the model. Examples of end-to-end neural models for NLG include recurrent models such as SC-LSTM (Wen et al. 2015. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1711-1721, Lisbon, Portugal. Association for Computational Linguistics) and transformer-based models. One drawback of typical neural systems is a requirement for a large amount of high-quality human-annotated data for the training process. This is usually not readily available and/or is costly to obtain.

One strategy employed by previous work to reduce the amount of domain-specific human-annotated data needed is to employ transfer learning from LPLMs. LPLMs have been pretrained on a large amount of non-parallel non-annotated data such as BERT (Devlin et al. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171-4186, Minneapolis, Minnesota. Association for Computational Linguistics) or GPT-2 (Radford et al. 2018. Language Models are Unsupervised Multitask Learners), or pretrained on a variety of tasks like T5 (Raffel et al. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1-67). NLG systems can take advantage of the generation ability of LPLMs by directly fine-tuning them with in-domain data (Peng et al. 2020 Few-shot natural language generation for task-oriented dialog. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 172-182, Online. Association for Computational Linguistics, Agarwal et al. 2020 Machine Translation Aided Bilingual Data-to-Text Generation and Semantic Parsing. In Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), pages 125-130, Dublin, Ireland (Virtual), Association for Computational Linguistics, Kasner and Dusek 2020 Train Hard, Finetune Easy: Multilingual Denoising for RDF-to-Text Generation. In Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), pages 125-130, Dublin, Ireland (Virtual), Association for Computational Linguistics).

Such approaches can suffer from exposure bias stemming from the fact that LPLMs have not previously been exposed to the structured input (such as a meaning representation, a graph or a set of database entries) of certain NLG tasks.

Peng et al. 2020 (Few-shot natural language generation for task-oriented dialog. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 172-182, Online. Association for Computational Linguistics) propose SC-GPT: a GPT-2 LPLM that is further pretrained with a large human-annotated out-of-domain dialogue dataset with structured data as inputs. A problem of this approach can be that such systems underperform when the amount of human-annotated training data is reduced, resulting in fluent but inadequate output. Such systems can produce output exhibiting omissions and/or hallucinations of input information. Figure 3 shows an example of the structure of an end-to-end approach, in particular an SC-GPT neural language model. As shown in figure 3, the meaning representation is converted into a language output.

Other approaches to reduce domain-specific human-annotated data have been proposed. Wen et al. 2016 (Multi-domain neural network language generation for spoken dialogue systems. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 120-129, San Diego, California. Association for Computational Linguistics) addressed the scarcity of target in-domain data by augmenting it with synthetic data. Tran and Nguyen 2018 (Adversarial domain adaptation for variational neural language generation in dialogue systems. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1205-1217, Santa Fe, New Mexico, USA. Association for Computational Linguistics) used variational autoencoders in conjunction with text similarity and domain critics to better guide the fine-tuning process. Mi et al. 2019 (Meta-learning for low-resource natural language generation in task-oriented dialogue systems. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 3151-3157. International Joint Conferences on Artificial Intelligence Organization) tackled the problem by defining domain adaptation as an optimization meta-learning task.

It would be desirable to have an improved language generator. Such a language generator might have reduced requirements for human-annotated data and/or might generate more flexible or accurate natural language.

Different generation subtasks might exhibit different capacities for transfer learning. Systems to be described below can separate lexicalization from aggregation. Typically, lexicalisation is more domain-dependent than aggregation. To further reduce training requirements, the aggregation model can be treated as domain-independent and provided with different domain-specific lexicalization models.

SUMMARY

According to one aspect there is provided a language generator for generating a natural language output in dependence on a conceptual input structure, the language generator comprising one or more processors configured to: split the conceptual input structure into a plurality of sub-units, each sub-unit representing a conceptual subpart of the conceptual input structure; input each sub-unit to a trained weighted processing network capable of generating a natural language representation of its inputs so as to form a plurality of natural language segments; and combine the natural language segments to form the natural language output.

In the language generator the conceptual input structure may comprise a plurality of attribute-value pairs. This may make it easier to control the nature of the desired output.

Each of the sub-units of the conceptual input structure may also include at least one attribute-value pair. This may make it easier to control the nature of the desired output.

In the language generator each attribute-value pair may include an attribute identifier information and at least one corresponding unit of attribute information. This may allow the attribute-value pairs to readily designate information useful in the language generation process.

The one or more processors present in the language generator may be configured to order the plurality of sub-units prior to them being input to the trained weighted processing network. This can allow the language generator to adapt the sub-units to fit a natural language pattern.

In the language generator each of the natural language segments may include attribute identifier information included within the sub-unit input to the trained weighted processing network. The attribute identifier information may assist the system in forming the natural language output.

In the language generator as discussed above the one or more processors may, when combining the natural language segments to form the natural language output, be configured to: in a first iteration, input two of the natural language segments to a trained weighted processing network capable of structuring its inputs into a natural language phrase so as to form a first intermediate phrase. The processor(s) may then iteratively process each other one of the natural language segments by inputting it together with an intermediate phrase generated in an immediately preceding iteration to the trained weighted processing network so as to form a further intermediate phrase. This architecture can allow the ultimately output phrase to be built up from successive segments. This can simplify the task of forming the output by splitting it up in an effective way.

In another aspect, there is provided a language generator for generating a natural language output in dependence on three or more natural language segments. The language generator may comprise one or more processors that may be configured to: in a first iteration, input two of the natural language segments to a trained weighted processing network capable of structuring its inputs into a natural language phrase so as to form a first intermediate phrase. The processors may be further configured to iteratively process each other one of the natural language segments by inputting it together with an intermediate phrase generated in an immediately preceding iteration to the trained weighted processing network so as to form a further intermediate phrase. This architecture can allow the ultimately output phrase to be built up from successive segments. This can simplify the task of forming the output by splitting it up in an effective way.

The one or more processors may be configured to also input into the trained weighted processing network the attribute-value pairs corresponding to each natural language segment, each attribute-value pair comprising an attribute identifier information and at least one corresponding unit of attribute information. This may make it easier to control the nature of the desired output.

The one or more processors may be configured to, at each iterative step after the first iteration, combine using the trained weighted processing network the intermediate phrase, generated in an immediately preceding iteration, and the natural language segment input in the current iterative step, based on all of the natural language segments and the corresponding attribute-value pairs input in previous iterations so as to form the further intermediate phrase. This may make it easier to control the nature of the desired output.

The one or more processors may be configured to, in an iterative step, combine the intermediate phrase generated in the immediately preceding stage and the natural language segment in that iterative step by one of the following mechanisms: (i) concatenating the intermediate phrase generated in an immediately preceding iteration and the natural language segment input in the current iterative step, or (ii) reframing the intermediate phrase generated in an immediately preceding iteration and the natural language segment input in the current iterative step. In this way a range of measures can be available to the system to generate a suitable natural language output that is reflective of a sense imparted collectively by the intermediate phrase and the segment.

The one or more processors may be configured to, in the first iteration, concatenate the two natural language segments in dependence on the corresponding attribute-value pairs to form the first intermediate phrase. This can provide a convenient way to form the desired output from the segments.

In addition, the one or more processors may be configured to, in each iteration, form a plurality of candidate intermediate phrases, calculate for each candidate intermediate phrase a slot error value, and select as the intermediate phrase to be output by that iteration the candidate intermediate phrase having the fewest slot errors. This can provide an efficient way to determine a preferred output.

In the language generator the slot error for a candidate intermediate phrase may be calculated in such a way as to be dependent on the number of attribute-value pairs corresponding to the natural language segments that are present in the inputs to the respective iteration but absent from the candidate intermediate phrase. This can provide an effective metric for determining a preferred output.

The processor may be further configured to select, based on which has the lowest slot error, a natural language output from one of the intermediate phrases formed in the final iteration and the output of a further iteration. The intermediate phrase formed in the final iteration may be corrected using machine-learnt lexicalisation. This can provide an efficient way to determine a preferred output. The language generator may also include a trained weighted processing network as set out above.

In another aspect there is a method for adjusting the weighting of a weighted processing network for use in a language generator. The method may comprise forming a training signal by determining matches between a target natural language text and target natural language elements based on target information units corresponding to each natural language segment and thereby forming one or more weights of the network; and inputting the training signal into a weighted processing network.

The method may further comprise determining a match between a target natural language element and a target information unit when a similarity metric exceeds a pre-determined threshold. This can be an effective mechanism for accurately training the system.

The target natural language segments may be ordered and the method may comprise forming natural language segments from natural language elements that do not produce a match immediately preceding and following the matched target natural language element. This can assist in reflecting suitable re-orderings of the segments.

The method may include forming the natural language segments from the matched natural language elements aggregated in the same order as the target attributes occur in the target natural language text. This may assist in reflecting suitable pre-existing orderings of the segments.

BRIEF DESCRIPTION OF THE FIGURES

The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:

Figure 1 shows an example of inputs and outputs for a concept-to-text natural language generation task that is part of a dialogue system.

Figure 2 illustrates an example of the inputs and outputs used in an end-to-end approach in a concept-to-text natural language generation task.

Figure 3 illustrates an SC-GPT end-to-end approach neural language model.

Figure 4 illustrates an example of separate lexicalization and aggregation when performing a concept-to-text natural language generation task.

Figure 5 shows an overview of phases undertaken when performing Hierarchical Recurrent Aggregative Generation (HRAG).

Figure 6 shows an example of sub-phrase inference that is used to train the weighted processing network.

DETAILED DESCRIPTION OF THE INVENTION

Embodiments of the system to be described below reduce the required amount of domain-specific human-annotated data for language generation by exploiting transfer learning from LPLMs.

Since different language generation decisions exhibit different capacities for transfer learning, the present disclosure describes an architecture that separates lexical and low-level structural decisions (lexicalisation, i.e. selecting the vocabulary to form small noun and verb phrases) from high-level structural decisions (aggregation, i.e. how to combine smaller sentence units/sub-units to form full sentences, phrases and texts).

The present disclosure describes Hierarchical Recurrent Aggregative Generation (HRAG), which may be formed of a three-layered architecture specifically designed to maximize transfer learning from large pretrained language models for different subtasks of language generation. While task-oriented dialogue is ideal for this architecture, it is applicable to any generation task where the input can be compartmentalized into distinct facts, e.g. summarization. Since the current configuration can separate the conceptual input into distinct sub-units, e.g. attribute-value pairs, and attends to the resulting sub-phrases, omissions and hallucinations of input information in the output can be reduced.

The first module may be in charge of independently lexicalising a unit of information in the conceptual input structure as a sentence sub-phrase (e.g. a phrase expressing that unit of information alone), the second module may be responsible for aggregating these sub-phrases into larger sub-phrases (natural language segments), and the third module fixes any errors and generates a coherent and fluent final output. These three modules are jointly trained with a loss that combines their individual objectives. To finetune the separate lexicalization and aggregation LPLM models (trained weighted processing networks), explicitly annotated data would ideally be used; however, this is not readily available. To address this, a further aspect of the present disclosure relates to a method for adjusting the weighting of a weighted processing network, in which an appropriate training signal is inferred from limited domain-specific human-annotated end-to-end data. The present disclosure describes an approach that is orthogonal to known specific LPLM models or other known language generation models used for lexicalization and aggregation. This approach reduces the domain-specific human-annotated data needed to finetune the models.

Figure 4 shows an example of a language generator in which the lexicalization and aggregation subtasks performed in order to form a natural language output are handled non-latently and separately.

The language generator shown in figure 4 is given an input. This input may be a specific structured input; in figure 4 it is an example of a conceptual input structure consisting of a series of attribute categories and values corresponding to those categories. When given an input such as that in the bottom box shown in figure 4, e.g. the meaning representation OFFER (stylist name = Atelier Salon Willow Glen ; city = San Jose) INFORM (count = 10), the first step of the language generator is to separate the input into distinct sub-units. Each of the sub-units represents a sub-part of the overall conceptual input structure. In figure 4, each sub-part and sub-unit of the input includes an attribute-value pair, and the example of figure 4 includes three sub-units. Although in the example of figure 4 the sub-units include attribute-value pairs, the sub-units may instead be formed of different information, for example abstract meaning representations such as a vector of numbers that have an associated meaning. Alternatively, the sub-units may be formed of any structured form allowing meaning to be extracted from the inputs.

Returning to the example shown in figure 4, using a finetuned LPLM, or other known language generator, the present language generator can transform each attribute-value pair that forms a sub-unit into a short sub-phrase (i.e. lexicalization is performed). This sub-phrase may be created by inputting a sub-unit into the finetuned LPLM, which is an example of a trained weighted processing network. The finetuned LPLM will generate a natural language representation of the input to form a natural language segment. This may be repeated for each of the sub-units corresponding to a part of the conceptual input structure. Once a series of natural language segments have been generated, the next stage in the language generation process is that the natural language segments (sub-phrases) are input into another finetuned LPLM model (trained weighted processing network) in order to aggregate them. This trained weighted processing network may be separate from the one used to form the plurality of natural language segments. In this processing step the trained weighted processing network can take as input these generated natural language segments and combine them to form a natural language output that may take the form of a final text (i.e. perform aggregation). The restructuring or reframing of these natural language segments may be performed latently by this model as well. The restructuring or reframing may take the form of reordering or concatenating the natural language segments when integrating them together.
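Purely as an illustrative sketch (and not code taken from the disclosure), the flow just described could be organised as follows. The SubUnit triple representation, the lexicalise and aggregate callables, and the toy stand-ins are assumptions made for illustration; in practice the two callables would wrap trained weighted processing networks such as finetuned LPLMs.

```python
from typing import Callable, List, Tuple

# A sub-unit is modelled here as a (predicate, attribute, value) triple; this
# representation is an assumption made purely for illustration.
SubUnit = Tuple[str, str, str]

def generate(sub_units: List[SubUnit],
             lexicalise: Callable[[SubUnit], str],
             aggregate: Callable[[List[str]], str]) -> str:
    """Sketch of the two stages described above: lexicalise each sub-unit
    into a natural language segment, then combine the segments into the
    natural language output."""
    segments = [lexicalise(unit) for unit in sub_units]  # one segment per sub-unit
    return aggregate(segments)                           # combine into the final text

# Toy stand-ins for the trained weighted processing networks, for demonstration only.
toy_lexicalise = lambda u: f"{u[1]} is {u[2]}"
toy_aggregate = lambda segs: ", and ".join(segs) + "."

print(generate([("OFFER", "city", "San Jose"), ("INFORM", "count", "10")],
               toy_lexicalise, toy_aggregate))
# -> "city is San Jose, and count is 10."
```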

Turning to figure 5, for this implementation example, the description will focus on a concept-to-text natural language generator, where the input is a machine-readable meaning representation (MR) and the output is an utterance expressing the input in natural language.

There is no standard for how MRs are represented or even what information they need to contain. The present architecture is independent from the particulars of the MR. For the description of figure 5 the example in which the conceptual input MR consists of one or more units of information (predicates) will be used. Each predicate may have a set of attributes and corresponding values, e.g. attribute-value pairs.

Each attribute-value pair dictates the content of the conceptual input structure. For example, the MR OFFER (stylist name = Atelier Salon Willow Glen ; city = San Jose) INFORM (count = 10) denotes that the output of the NLG model should inform the user that it found “10” options and suggest a stylist named “Atelier Salon Willow Glen” located in “San Jose”.
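The disclosure does not prescribe a particular MR syntax, so the following sketch assumes, for illustration only, an MR string written in the format of the example above and splits it into per-attribute sub-units; the split_mr function and its regular expression are assumptions, not part of the claimed method.

```python
import re
from typing import List, Tuple

def split_mr(mr: str) -> List[Tuple[str, str, str]]:
    """Split an MR string of the assumed form
    'PRED (attr = value ; attr = value) PRED (attr = value)'
    into (predicate, attribute, value) sub-units."""
    sub_units = []
    for predicate, body in re.findall(r"(\w+)\s*\(([^)]*)\)", mr):
        for pair in body.split(";"):
            attribute, _, value = pair.partition("=")
            sub_units.append((predicate, attribute.strip(), value.strip()))
    return sub_units

mr = ("OFFER (stylist name = Atelier Salon Willow Glen ; city = San Jose) "
      "INFORM (count = 10)")
print(split_mr(mr))
# [('OFFER', 'stylist name', 'Atelier Salon Willow Glen'),
#  ('OFFER', 'city', 'San Jose'), ('INFORM', 'count', '10')]
```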

Instead of relying on a single monolithic NLG model (e.g. a neural encoder-decoder architecture) to lexicalise the meaning representation as in Figure 2, the HRAG architecture of this disclosure may employ three modules, in charge of lexicalisation, aggregation and post-editing respectively, as shown in figure 5. Each module may be agnostic to the specifics of the underlying natural language generation model. Specifically, any architecture or model capable of generating text can be employed by HRAG, with a preference for LPLM encoder-decoder models such as T5.

As seen in figure 5 the lexicalisation module assumes that each attribute-value pair corresponds to one distinct fact or unit of information that has an associated attribute identifier information. For example, “city” is an example of an attribute identifier information and “San Jose” is an example of a “value” or unit of attribute information. In the lexicalisation module, each unit of attribute information in turn is expressed as a sub-phrase of the final output, e.g. [OFFER (city = San Jose)] loosely corresponds to the natural language segment “in San Jose”. Thus, the conceptual input structure is first divided into distinct attribute-value pairs s_x v_x (for the present purposes predicates can be considered as part of the attribute, e.g. “OFFER-city” is considered a distinct attribute), then the lexicalisation module independently generates a corresponding natural language segment w_1^x ... w_{len_x}^x for each unit of information by inputting each sub-unit that may be comprised of at least one attribute-value pair into a trained weighted processing network.
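How a sub-unit is presented to the lexicalisation network is left open above. One hedged possibility, assuming a text-to-text model such as T5, is to linearise the attribute-value pair (with the predicate folded into the attribute, e.g. "OFFER-city") into a short source string; the template below is an illustrative assumption.

```python
from typing import Tuple

def linearise_sub_unit(sub_unit: Tuple[str, str, str]) -> str:
    """Flatten a (predicate, attribute, value) sub-unit into a source string
    for a text-to-text lexicalisation model. The template is an assumption;
    the predicate is folded into the attribute, e.g. "OFFER-city"."""
    predicate, attribute, value = sub_unit
    return f"lexicalise: {predicate}-{attribute} = {value}"

print(linearise_sub_unit(("OFFER", "city", "San Jose")))
# -> "lexicalise: OFFER-city = San Jose"
# A finetuned LPLM would be expected to map this to a segment such as "in San Jose".
```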

As seen in the language generator of figure 5, the generated sub-phrases of the lexicalization module may be ordered in a consistent manner (e.g. alphabetically by the corresponding attribute names) and input into an aggregation module one at a time in a recurrent fashion to form an iterative process. In a first iteration, the first two natural language segments generated in the lexicalisation module are combined. At this first step the first two sub-phrases w_1^1 ... w_{len_1}^1 and w_1^2 ... w_{len_2}^2, and the corresponding attribute-value pairs s_1 v_1 and s_2 v_2, are input into a trained weighted processing network that may be within the aggregation layer so as to produce the combined sub-phrase, henceforth known as a first intermediate phrase, w_1^{[1,2]} ... w_{len_{[1,2]}}^{[1,2]}.

Next the language generator, specifically the aggregation layer, iteratively processes each of the other natural language segments to combine them with the intermediate phrase generated in the previous iteration. This combination may be a concatenation of the intermediate phrase with the next natural language segment to be processed, or it may be a restructuring of all the component sub-units of the intermediate phrase and the next natural language segment. At each subsequent iterative step r the input of the aggregation module consists of the concatenation of the previously aggregated intermediate phrase w_1^{[1,r-1]} ... w_{len_{[1,r-1]}}^{[1,r-1]} and the current natural language segment w_1^r ... w_{len_r}^r, to produce a further intermediate phrase w_1^{[1,r]} ... w_{len_{[1,r]}}^{[1,r]}. The attribute-value pairs s_1 v_1, s_2 v_2, ..., s_r v_r that correspond to the intermediate phrase and the natural language segment may also be input into the trained weighted processing network iteratively. This process is continued iteratively such that the aggregation module is called recurrently until all the natural language segments generated by the lexicalisation module are combined into a single output in the form of a further intermediate phrase w_1^{[1,n]} ... w_{len_{[1,n]}}^{[1,n]}.
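A sketch of this recurrent aggregation loop, under the assumption that the aggregation network can be wrapped as a callable taking the running intermediate phrase, the current segment and the attribute-value pairs input so far; the interface and the toy aggregator are illustrative assumptions, not the disclosed implementation.

```python
from typing import Callable, Sequence, Tuple

SubUnit = Tuple[str, str, str]  # (predicate, attribute, value)
# Stand-in interface for the trained aggregation network:
# (previous intermediate phrase, current segment, attribute-value pairs so far) -> phrase.
Aggregator = Callable[[str, str, Sequence[SubUnit]], str]

def aggregate_segments(segments: Sequence[str],
                       sub_units: Sequence[SubUnit],
                       aggregator: Aggregator) -> str:
    """Recurrently combine the ordered segments into one phrase: the first
    iteration combines two segments, and each later iteration folds one
    further segment into the running intermediate phrase."""
    phrase = aggregator(segments[0], segments[1], sub_units[:2])   # first iteration
    for r in range(2, len(segments)):                              # subsequent iterations
        phrase = aggregator(phrase, segments[r], sub_units[:r + 1])
    return phrase

# Toy stand-in for demonstration only: simple concatenation of its two text inputs.
toy_aggregator = lambda phrase, segment, pairs: f"{phrase} {segment}"
print(aggregate_segments(
    ["I found 10 options", "in San Jose", "called Atelier Salon Willow Glen"],
    [("INFORM", "count", "10"), ("OFFER", "city", "San Jose"),
     ("OFFER", "stylist name", "Atelier Salon Willow Glen")],
    toy_aggregator))
```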

The third and final module that may form part of the natural language generator is the post-edit module shown in figure 5. The post-edit module may be configured to take the fully aggregated further intermediate phrase w_1^{[1,n]} ... w_{len_{[1,n]}}^{[1,n]} and produce the natural language output w'_1 ... w'_l. This natural language output may be produced by inputting the further intermediate phrase into a further trained weighted processing network. In the aggregation step, the network used to form the intermediate phrases and further intermediate phrase may be trained to combine the natural language segments into larger intermediate phrases in a manner that does not necessarily produce a fluent and coherent text, complete with appropriate punctuation and devoid of errors. Therefore, it is the purpose of the post-edit layer to rewrite the aggregated sub-phrases, fix any errors and finalize the text.

To minimise the error propagated between layers, for example between each module, each module may generate multiple possible outputs from the trained weighted processing network as multiple hypotheses per input and can select the output with the fewest slot errors to be used at the next step of the process or in the next iteration. Taking the generation of the first iteration in the aggregation module as an example, the trained weighted processing network may output a number of first intermediate phrases and then the one with the fewest slot errors may be selected to be the first intermediate phrase used in the next iteration.

The slot error may be calculated as missed slots divided by total slots, wherein the missed slots are the number of input attribute-value pairs that do not appear verbatim in the output, i.e. information that was input but has not been used in the output. Total slots are the number of pieces of information (values from the attribute-value pairs) that have been input in every iteration so far, regardless of whether they were used in the output. The accuracy of the language generated may be further improved by a selection made in the post-edit layer. Due to frequent imperfections in the post-edit layer’s output, the final output of the natural language generator may be selected between the output of the last aggregation iteration, i.e. the further intermediate phrase output after all the natural language segments input to the aggregation module have been processed, and the output of the post-edit module that has corrected the further intermediate phrase output from the aggregation module using machine-learnt lexicalisation. This selection can be made according to which one has the lowest slot error.
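The slot error metric and the selection based on it could be computed as in the sketch below; the case-insensitive verbatim check and the function names are illustrative assumptions rather than the disclosed implementation.

```python
from typing import Sequence, Tuple

SubUnit = Tuple[str, str, str]  # (predicate, attribute, value)

def slot_error(candidate: str, sub_units: Sequence[SubUnit]) -> float:
    """Missed slots divided by total slots. A slot counts as missed when its
    value does not appear verbatim in the candidate text (the case-insensitive
    comparison here is an illustrative choice)."""
    if not sub_units:
        return 0.0
    missed = sum(1 for _, _, value in sub_units
                 if value.lower() not in candidate.lower())
    return missed / len(sub_units)

def select_lowest_slot_error(candidates: Sequence[str],
                             sub_units: Sequence[SubUnit]) -> str:
    """Pick the candidate with the fewest slot errors, e.g. among multiple
    hypotheses of one iteration, or between the final aggregation output and
    the post-edited output."""
    return min(candidates, key=lambda c: slot_error(c, sub_units))

pairs = [("OFFER", "city", "San Jose"), ("INFORM", "count", "10")]
print(slot_error("I found 10 salons in San Jose.", pairs))  # 0.0
print(slot_error("There is a salon in San Jose.", pairs))   # 0.5
```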

Some examples of comparisons between slot error calculations performed using a T5-based finetuned LPLM and those of the present language generator can be seen below. The “naive” model shown below is another name for the T5-based finetuned LPLM described in the first two tables.

The above tables demonstrate that, for examples of domain-specific scenarios, the slot error that occurs during language processing using the language generator of the present disclosure is less than that which occurs when using the known T5-based finetuned LPLM. The tables were calculated using the known BLEU metric.

Each module of the natural language generator may have a separate trained weighted processing network that may form part of the natural language generator, or they may be separate but usable by the natural language generator. The natural language generator may be implemented on a computer and/or using processors configured to perform the natural language processing.

The method of adjusting the weighting of a weighted processing network for use in a natural language generator will be described below in relation to figure 6.

Due to its multi-layer architecture, the hierarchical natural language generation with recurrent aggregation requires module-specific target sub-phrases as training signal. Specifically, HRAG requires parallel data between units of information (i.e. attribute-value pairs) and corresponding natural language segments for the lexicalisation module, and similar parallel data for the partially aggregated natural language segments or intermediate phrases for the aggregation module. Large sets of this data are ordinarily not available. As such this disclosure includes a distant supervision approach to automatically extract target natural language elements from a full target text and use this to output a training signal to be used to adjust the weighting of a weighted processing network. As seen in figure 6, given some inputs of attribute-value pairs (i.e. a structured meaning representation), matched values are extracted from a sample text by determining all values in the full target text via language-agnostic delexicalization as described in Zhou and Lampouras 2021 (Generalising multilingual concept-to-text NLG with language agnostic delexicalization. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 114-127, Online. Association for Computational Linguistics). This may be achieved by comparing semantic embeddings of the input attribute-value pairs to the target natural language text. The attribute-value pairs may have associated target natural language elements (values) and associated target information units (e.g. categories of attributes like “city”). A match between a target natural language element and a target information unit in the full target text is achieved when the similarity between the two exceeds a pre-defined threshold. The threshold may be a pre-determined numeric value that may denote a degree of similarity between a target natural language element and text in the target natural language text. The threshold may also be set in other known ways.

These matched values are used to form sub-phrases comprised of all of the words of the matched values and some surrounding text elements from the target natural language text. This may include all words preceding and following the matched value until either a punctuation mark or another matched value is reached. This can be seen in the three boxes below the target natural language text box in figure 6.
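A hedged sketch of this sub-phrase extraction: given the character spans of the matched values in the target text (the matching itself, described above as embedding similarity against a threshold, is abstracted away here), each sub-phrase is grown outwards from its matched value until a punctuation mark or another matched value is reached. The character-level boundary handling is an illustrative assumption.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a matched value

def extract_sub_phrases(target: str, match_spans: List[Span]) -> List[str]:
    """For each matched value, keep its words plus the surrounding words of
    the target text up to the nearest punctuation mark or other matched value."""
    punctuation = [m.start() for m in re.finditer(r"[.,;!?]", target)]
    phrases = []
    for start, end in match_spans:
        others = [s for s in match_spans if s != (start, end)]
        left = max([p + 1 for p in punctuation if p < start]
                   + [e for _, e in others if e <= start] + [0])
        right = min([p for p in punctuation if p >= end]
                    + [s for s, _ in others if s >= end] + [len(target)])
        phrases.append(target[left:right].strip())
    return phrases

target = ("I found 10 salons you may like. There is a nice salon in San Jose "
          "called Atelier Salon Willow Glen.")
values = ["10", "San Jose", "Atelier Salon Willow Glen"]
spans = [(target.find(v), target.find(v) + len(v)) for v in values]
print(extract_sub_phrases(target, spans))
# ['I found 10 salons you may like',
#  'There is a nice salon in San Jose called',
#  'called Atelier Salon Willow Glen']
```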

While this process results in some overlaps in the target texts, it helps the finetuned models be more robust. If a value is not matched, the process is repeated by comparing the corresponding attribute names against the full target text via language-agnostic delexicalization. If a match is again not found, that attribute-value pair from the input is ignored during training.

The matched sub-phrases are then used as a training signal with existing training methods to adjust the weighting of the network.

To facilitate the distant supervision inference of the training signal for aggregation, the ordering of the generated sub-phrases of the lexicalization module differs between training and inference. As mentioned above, during inference the generated sub-phrases of the lexicalization module may be ordered in a consistent manner (e.g. alphabetically by the corresponding attribute names). During training, however, the sub-phrases that are generated from matches may be ordered based on the order of appearance of the corresponding matched values in the full target sentence. For example, in figure 5 the order would be: INFORM (count = 10) > OFFER (city = San Jose) > OFFER (stylist name = Atelier Salon Willow Glen). The training signal for the aggregation steps is then inferred as follows: for every step of the aggregation layer, the target sub-phrase/natural language element may consist of the words from the start of the full target text, up to and including the words of the last matched value that is included in the aggregation group, and continuing until either a punctuation mark or another matched value is reached after that point. Following the example of figure 6, the resulting training signal for the steps of the aggregation layer is shown below, followed by an illustrative sketch of this inference:

Aggregation group: INFORM (count = 10) + OFFER (city = San Jose)

Target sub-phrase: I found 10 salons you may like. There is a nice salon in San Jose called

Aggregation group: INFORM (count = 10) + OFFER (city = San Jose) + OFFER (stylist name = Atelier Salon Willow Glen)

Target sub-phrase: I found 10 salons you may like. There is a nice salon in San Jose called Atelier Salon Willow Glen.

As can be seen above, the sub-phrases are formed of the matched value and some surrounding words. These target sub-phrases can then be used to adjust the weighting of a weighted processing network such as those used in the natural language generator of the present disclosure in order to provide a more accurate natural language output based on a series of conceptual input structures.
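As a hedged illustration of how the aggregation training signal described above might be inferred, the sketch below takes the target text from its start up to the last matched value in the aggregation group and extends it to the next punctuation mark or other matched value; the function, span representation and boundary handling are assumptions for illustration only.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a matched value

def aggregation_target(target: str,
                       group_spans: List[Span],
                       all_spans: List[Span]) -> str:
    """Target sub-phrase for one aggregation step: the target text from its
    start, up to and including the last matched value of the aggregation
    group, extended until the next punctuation mark (kept) or the next other
    matched value (excluded) is reached."""
    last_end = max(end for _, end in group_spans)
    punct_cuts = [m.end() for m in re.finditer(r"[.,;!?]", target)
                  if m.end() >= last_end]
    other_cuts = [s for s, e in all_spans
                  if (s, e) not in group_spans and s >= last_end]
    return target[:min(punct_cuts + other_cuts + [len(target)])].strip()

target = ("I found 10 salons you may like. There is a nice salon in San Jose "
          "called Atelier Salon Willow Glen.")
values = ["10", "San Jose", "Atelier Salon Willow Glen"]
spans = [(target.find(v), target.find(v) + len(v)) for v in values]
print(aggregation_target(target, spans[:2], spans))
# "I found 10 salons you may like. There is a nice salon in San Jose called"
print(aggregation_target(target, spans, spans))
# "I found 10 salons you may like. There is a nice salon in San Jose called Atelier Salon Willow Glen."
```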

Below are some examples of comparisons between the HRAG language generator of the present disclosure and a known T5-based finetuned LPLM.

Meaning Representation: inform (street address = 15 fiesta lane, average rating = 4.3) offer (stylist name = 18|8 fine men 's salons - lafayette, appointment date = march 6th, appointment time = 6 pm)

HRAG: there is a nice salon at 15 fiesta lane with a rating of 4.3. the hairstylist is 18|8 fine men 's salons - lafayette, the appointment is on march 6th at 6 pm.

Base T5: the name of the hairstylist is 18 fine men 's salons - lafayette. the appointment is march 6th at 6 pm.

Meaning Representation: offer (playmovie)

HRAG: do you want to watch the movie?

Base T5: do you want to watch penguin highway with subtitles?

Meaning Representation: offer (therapist name = culberson couple counselling, type = psychologist, city = mountain view)

HRAG: there is a psychologist called culberson couple counseling in mountain view.

Base T5: culberson couple counseling is a psychologist, in mountain view is culberson couple counselling.

As can be seen in the above examples and as has been previously discussed the language generator of the present disclosure provides a more natural and accurate natural language output with fewer hallucinations and errors carried through the generation process.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.