Title:
A COMPUTER IMPLEMENTED METHOD AND A COMPUTER SYSTEM FOR DETERMINING A SET OF CITATIONS RELATED TO AN ELECTRONIC DOCUMENT EDITED BY A USER ON A COMPUTING DEVICE
Document Type and Number:
WIPO Patent Application WO/2019/222787
Kind Code:
A1
Abstract:
A computer-implemented method (200) for determining a set of citations related to an electronic document edited by a user on a computing device. The method comprises obtaining (210) the electronic document from the computing device, the electronic document including a set of paragraphs and an editing indication, wherein the editing indication indicates an editing paragraph being one of the set of paragraphs is edited by the user; determining (220) the editing paragraph from the electronic document based on the editing indication; determining (230) a first set of phrases based on the editing paragraph; generating (240) a first query string including the first set of phrases; invoking (250) a search facility with the first query string to identify a first set of citations that matches the first query string, the first set of citations being represented by a first set of citation indications; receiving (260) from the search facility the first set of citation indications; determining (270) a set of citation indications based on the first set of citation indications, the set of citation indications representing the set of citations; and sending (280) the set of citation indications to the computing device in order for the set of citation indications to be presented on the computing device for selection by the user.

Inventors:
CLARK GRAEME (AU)
Application Number:
PCT/AU2018/050484
Publication Date:
November 28, 2019
Filing Date:
May 21, 2018
Assignee:
CITEHERO PTY LTD (AU)
International Classes:
G06F40/20
Domestic Patent References:
WO2010014403A1  2010-02-04
Foreign References:
US20150039639A1  2015-02-05
US20090187567A1  2009-07-23
US20110060761A1  2011-03-10
US20140006424A1  2014-01-02
US9703763B1  2017-07-11
Attorney, Agent or Firm:
BAXTER PATENT ATTORNEYS PTY LTD (AU)
Claims:
CLAIMS:

1. A computer-implemented method for determining a set of citations related to an electronic document edited by a user on a computing device, the method comprising:

obtaining the electronic document from the computing device, the electronic document including a set of paragraphs and an editing indication, wherein the editing indication indicates an editing paragraph being one of the set of paragraphs is edited by the user;

determining the editing paragraph from the electronic document based on the editing indication;

determining a first set of phrases based on the editing paragraph;

generating a first query string including the first set of phrases;

invoking a search facility with the first query string to identify a first set of citations that matches the first query string, the first set of citations being represented by a first set of citation indications;

receiving from the search facility the first set of citation indications;

determining a set of citation indications based on the first set of citation indications, the set of citation indications representing the set of citations; and

sending the set of citation indications to the computing device in order for the set of citation indications to be presented on the computing device for selection by the user.

2. The computer-implemented method of claim 1, wherein each of the set of paragraphs includes a set of sentences and the editing indication further indicates an editing sentence being one of the set of sentences included in the editing paragraph is edited by the user, the method further comprising:

determining the editing sentence in the editing paragraph based on the editing indication;

determining a second set of phrases based on the editing sentence;

generating a second query string including the second set of phrases;

invoking the search facility with the second query string to identify a second set of citations that match the second query string, the second set of citations being represented by a second set of citation indications;

receiving from the search facility the second set of citation indications;

determining one or more citation indications that are included in both the first set of citation indications and the second set of citation indications; and

determining the one or more citation indications to be the set of citation indications.

3. The computer-implemented method of claim 2, wherein determining the first set of phrases based on the editing paragraph comprises:

generating a set of word occurrence count indications indicating one or more words in the electronic document and respective occurrence counts of the one or more words;

determining a set of initial document phrases in the electronic document based on the set of word occurrence count indications;

determining a set of document phrase synonyms, the set of document phrase synonyms including one or more synonyms of the set of initial document phrases;

determining a set of document phrases from the set of document phrase synonyms based on a first set of criteria, the set of document phrases meeting the first set of criteria;

determining a set of paragraph phrases from the set of document phrases, wherein the set of paragraph phrases include one or more of the set of document phrases that are included in the editing paragraph;

determining the first set of phrases including the set of paragraph phrases, the synonyms of the set of paragraph phrases that are included in the set of document phrase synonyms, a subset of the set of document phrases, and the synonyms of the subset of the set of document phrases.

4. The computer-implemented method of claim 2, wherein determining the second set of phrases based on the editing sentence comprises:

determining a set of initial sentence phrases from the editing sentence based on a language processing tool;

determining a set of sentence phrase synonyms, the set of sentence phrase synonyms including one or more synonyms of the set of initial sentence phrases;

determining a set of sentence phrases from the set of sentence phrase synonyms based on a second set of criteria, the set of sentence phrases meeting the second set of criteria; and

determining the second set of phrases including the set of sentence phrases, the synonyms of the set of sentence phrases that are included in the set of sentence phrase synonyms, the set of paragraph phrases, and the synonyms of the set of paragraph phrases.

5. The computer-implemented method of claim 4, wherein the language processing tool comprises the Stanford CoreNLP language processing tool.

6. The computer-implemented method of claim 5, further comprising:

receiving one or more background phrases from the computing device, the one or more background phrases defining background of the electronic document; and

adding the one or more background phrases to the first set of phrases or the second set of phrases.

7. The computer-implemented method of claim 5, further comprising:

receiving a search limitation indication from the computing device, the search limitation indication including one or more search limitations; and

incorporating the one or more search limitations into the first query string or the second query string in order for the search facility to identify the first set of citations or the second set of citations subject to the one or more search limitations.

8. The computer-implemented method of claim 5, wherein the search facility comprises one or more of a group of databases comprising a journal article database, a law database, a medical research database, a historical archives database, a patent database, and an online library.

9. The computer-implemented method of claim 5, wherein the search facility comprises one or more of a group of Internet search engines comprising Google, Bing, Yahoo, Ask, AOL, Baidu, Wolframalpha, and DuckDuckGo.

10. A computer-implemented method for generating a citation reference in an electronic document edited by a user on a computing device, the method comprising:

receiving at the computing device a set of citation indications as determined by any one of claims 1 to 8, the set of citation indications representing a set of citations related to the electronic document;

presenting the set of citations on the computing device for the user to select one of the set of citations;

receiving a selection indication from the computing device indicating the one of the set of citations is selected by the user; and

generating the citation reference based on the selection indication in the electronic document to associate the one of the set of citations with the electronic document.

11. The computer-implemented method of claim 10, wherein generating the citation reference comprises:

determining a sentence edited by the user in the electronic document; and

generating the citation reference in a first format in the sentence edited by the user.

12. The computer-implemented method of claim 11, further comprising generating a hyperlink as the citation reference in the sentence, the hyperlink being associated with the one of the set of citations.

13. The computer-implemented method of claim 10, wherein generating the citation reference comprises:

generating the citation reference in a second format; and

inserting the citation reference at a position in the electronic document designated by the user.

14. A computer system for determining a set of citations related to an electronic document edited by a user on a computing device, the computer system comprising:

a memory device to store machine-readable instructions;

a communication interface; and

a processor operably connected to the memory device and the communication interface, the processor obtaining the machine-readable instructions from the memory device, wherein the processor is configured by the machine-readable instructions to

obtain the electronic document from the computing device via the communication interface, the electronic document including a set of paragraphs and an editing indication, wherein the editing indication indicates an editing paragraph being one of the set of paragraphs is edited by the user;

determine the editing paragraph from the electronic document based on the editing indication;

determine a first set of phrases based on the editing paragraph;

generate a first query string including the first set of phrases;

invoke a search facility with the first query string to identify a first set of citations that match the first query string, the first set of citations being represented by a first set of citation indications;

receive via the communication interface from the search facility the first set of citation indications;

determine a set of citation indications based on the first set of citation indications, the set of citation indications representing the set of citations; and

send via the communication interface the set of citation indications to the computing device in order for the set of citation indications to be presented on the computing device for selection by the user.

15. The computer system of claim 14, wherein each of the set of paragraphs includes a set of sentences and the editing indication further indicates an editing sentence being one of the set of sentences included in the editing paragraph is edited by the user, the processor is further configured to

determine the editing sentence in the editing paragraph based on the editing indication;

determine a second set of phrases based on the editing sentence;

generate a second query string including the second set of phrases;

invoke the search facility with the second query string to identify a second set of citations that match the second query string, the second set of citations being represented by a second set of citation indications;

receive via the communication interface from the search facility the second set of citation indications;

determine one or more citation indications that are included in both the first set of citation indications and the second set of citation indications; and

determine the one or more citation indications to be the set of citation indications.

16. A computing device for generating a citation reference in an electronic document edited by a user, the computing device comprising:

a memory device to store machine-readable instructions;

a communication interface;

a user interface; and

a processor operably connected to the memory device, the communication interface and the user interface, the processor obtaining the machine-readable instructions from the memory device, wherein the processor is configured by the machine-readable instructions to

receive via the communication interface a set of citation indications as determined by any one of claims 1 to 8, 13 and 14, the set of citation indications representing a set of citations related to the electronic document;

present on the user interface the set of citations for the user to select one of the set of citations;

receive a selection indication from the user interface indicating the one of the set of citations is selected by the user; and

generate the citation reference based on the selection indication in the electronic document to associate the one of the set of citations with the electronic document.

Description:
A COMPUTER IMPLEMENTED METHOD AND A COMPUTER SYSTEM FOR DETERMINING A SET OF CITATIONS RELATED TO AN ELECTRONIC DOCUMENT EDITED BY A USER ON A COMPUTING DEVICE

Technical Field

[1] The present invention generally relates to facilitating document editing and in particular to a computer implemented method and a computer system for determining a set of citations related to an electronic document edited by a user on a computing device and to a computer implemented method and a computer system for generating a citation reference in an electronic document edited by a user on a computing device.

Background

[2] It is quite common that, while editing an electronic document such as a book, research paper or thesis, the author needs to cite in that document a citation related to text within the document. Generally, the author has to identify the citation in a database (for example, the IEEE journal databases) by keyword search and then cite it in the format desired in the electronic document. Searching the database for the relevant citation and typing the citation in the desired format is time consuming and distracting for the author. As a result, the efficiency of editing the electronic document is reduced.

[3] It is therefore desirable to have a computer implemented method and a computer system to alleviate the above-mentioned deficiencies.

[4] Throughout this specification, unless the context requires otherwise, the words 'comprise', 'comprises' and 'comprising' will be understood to imply the inclusion of a stated step or element or group of steps or elements but not the exclusion of any other step or element or group of steps or elements.

[5] Throughout this specification, unless the context requires otherwise, the words 'edit', 'edited' and 'editing' will be understood to refer to any authoring of a document, including writing original text and/or modifying existing text.

[6] Any one of the terms 'including', 'which includes' or 'that includes' as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others.

[7] Any discussion of the background art throughout the specification should in no way be considered as an admission that such background art is prior art nor that such background art is widely known or forms part of the common general knowledge in the field in Australia or worldwide.

Summary

[8] There is provided a computer-implemented method for determining a set of citations related to an electronic document edited by a user on a computing device. The method comprises obtaining the electronic document from the computing device, the electronic document including a set of paragraphs and an editing indication, wherein the editing indication indicates an editing paragraph being one of the set of paragraphs is edited by the user; determining the editing paragraph from the electronic document based on the editing indication; determining a first set of phrases based on the editing paragraph; generating a first query string including the first set of phrases; invoking a search facility with the first query string to identify a first set of citations that matches the first query string, the first set of citations being represented by a first set of citation indications; receiving from the search facility the first set of citation indications; determining a set of citation indications based on the first set of citation indications, the set of citation indications representing the set of citations; and sending the set of citation indications to the computing device in order for the set of citation indications to be presented on the computing device for selection by the user.

[9] It is an advantage of the present invention that the invention analyses the content of the editing paragraph of the electronic document being edited by the user and determines the first set of phrases (i.e., search key words) based on the editing paragraph. As the first query string including the first set of phrases is generated with reference to the editing paragraph, the search results (i.e., the first set of citations) are more relevant to the editing paragraph. Also, the first set of phrases is automatically determined based on the content of the editing paragraph while the user is editing the electronic document; the user does not need to manually designate the search key words in order to identify the citations that are related to the electronic document being edited. Therefore, the efficiency in editing the electronic document is dramatically improved.

[10] Each of the set of paragraphs includes a set of sentences and the editing indication further indicates an editing sentence being one of the set of sentences included in the editing paragraph is edited by the user.

[11] The computer-implemented method may further comprise determining the editing sentence in the editing paragraph based on the editing indication; determining a second set of phrases based on the editing sentence; generating a second query string including the second set of phrases; invoking the search facility with the second query string to identify a second set of citations that matches the second query string, the second set of citations being represented by a second set of citation indications; receiving from the search facility the second set of citation indications; determining one or more citation indications that are included in both the first set of citation indications and the second set of citation indications; and determining the one or more citation indications to be the set of citation indications.
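
The intersection step of paragraph [11] can be sketched as follows. This is a minimal illustration only and not the applicant's implementation: it assumes each citation indication is a hashable identifier such as a DOI string, which the specification does not state, and the function name `common_citation_indications` and the sample identifiers are hypothetical.

```python
def common_citation_indications(first, second):
    """Return the indications found in both lists, in the order of `first`."""
    second_lookup = set(second)
    seen = set()
    result = []
    for indication in first:
        # Keep an indication once, and only if the sentence-level search also returned it.
        if indication in second_lookup and indication not in seen:
            seen.add(indication)
            result.append(indication)
    return result


# Made-up identifiers standing in for citation indications.
first_set = ["doi:10.1/a", "doi:10.1/b", "doi:10.1/c"]   # from the paragraph-level query
second_set = ["doi:10.1/c", "doi:10.1/a", "doi:10.1/d"]  # from the sentence-level query
selected = common_citation_indications(first_set, second_set)
```

Preserving the order of the first list is a design choice of this sketch; the specification does not say how the resulting set is ordered.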

[12] The step of determining the first set of phrases based on the editing paragraph may further comprise generating a set of word occurrence count indications indicating one or more words in the electronic document and respective occurrence counts of the one or more words; determining a set of initial document phrases in the electronic document based on the set of word occurrence count indications; determining a set of document phrase synonyms, the set of document phrase synonyms including one or more synonyms of the set of initial document phrases; determining a set of document phrases from the set of document phrase synonyms based on a first set of criteria, the set of document phrases meeting the first set of criteria; determining a set of paragraph phrases from the set of document phrases wherein the set of paragraph phrases includes one or more of the set of document phrases that is included in the editing paragraph; and determining the first set of phrases including the set of paragraph phrases, the synonyms of the set of paragraph phrases that are included in the set of document phrase synonyms, a subset of the set of document phrases and the synonyms of the subset of the set of document phrases.
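
One possible reading of the phrase-determination steps in paragraph [12] is sketched below. Several details are assumptions that the specification leaves open: phrases are reduced to single words, the initial document phrases are simply the most frequent non-stop words, the first set of criteria is a minimum length of four characters applied to the initial phrases, the subset of document phrases is the first `extra` phrases not found in the editing paragraph, and `SYNONYMS` is a toy table standing in for a real thesaurus. The sample text is invented for illustration.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "on"}  # assumed list
SYNONYMS = {"penguins": ["penguin"], "birds": ["fowl"]}  # toy stand-in for a thesaurus


def tokens(text):
    """Lower-case words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def first_set_of_phrases(document, editing_paragraph, top_n=4, extra=1):
    counts = Counter(tokens(document))                    # word occurrence counts
    initial = [w for w, _ in counts.most_common(top_n)]   # initial document phrases
    synonyms = {p: SYNONYMS.get(p, []) for p in initial}  # document phrase synonyms
    # Assumed first criterion: keep phrases of at least four characters.
    document_phrases = [p for p in initial if len(p) >= 4]
    in_paragraph = set(tokens(editing_paragraph))
    paragraph_phrases = [p for p in document_phrases if p in in_paragraph]
    # Assumed subset rule: the first `extra` document phrases absent from the paragraph.
    subset = [p for p in document_phrases if p not in in_paragraph][:extra]
    result = []
    for phrase in paragraph_phrases + subset:
        for candidate in [phrase] + synonyms[phrase]:
            if candidate not in result:
                result.append(candidate)
    return result


document = (
    "Penguins are aquatic flightless birds. Penguins live in the Southern Hemisphere.\n\n"
    "Most penguins feed on krill and fish. Birds of this kind swim well."
)
editing_paragraph = "Most penguins feed on krill and fish. Birds of this kind swim well."
phrases = first_set_of_phrases(document, editing_paragraph)
```

Here the paragraph phrases and their synonyms come first, followed by one document-level phrase that supplies context from outside the editing paragraph.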

[13] The step of determining the second set of phrases based on the editing sentence may further comprise determining a set of initial sentence phrases from the editing sentence based on a language processing tool; determining a set of sentence phrase synonyms, the set of sentence phrase synonyms including one or more synonyms of the set of initial sentence phrases; determining a set of sentence phrases from the set of initial document phrases based on a second set of criteria, the set of sentence phrases meeting the second set of criteria; and determining the second set of phrases including the set of sentence phrases, the synonyms of the set of sentence phrases that are included in the set of sentence phrase synonyms, the set of paragraph phrases, and the synonyms of the set of paragraph phrases.

[14] The language processing tool may comprise the Stanford CoreNLP language processing tool.

[15] The computer-implemented method may further comprise receiving one or more background phrases from the computing device, the one or more background phrases defining background of the electronic document; and adding the one or more background phrases to the first set of phrases or the second set of phrases.

[16] The computer-implemented method may further comprise receiving a search limitation indication from the computing device, the search limitation indication including one or more search limitations; and incorporating the one or more search limitations into the first query string or the second query string in order for the search facility to identify the first set of citations or the second set of citations subject to the one or more search limitations.
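
As a rough illustration of paragraph [16], the sketch below builds a query string from a set of phrases and appends search limitations to it. The query syntax (quoted phrases joined with OR, limitations appended as `field:"value"` terms joined with AND), the function name `build_query_string` and the limitation fields `year` and `language` are all assumptions; the specification does not define a query grammar, and a real search facility would dictate its own.

```python
def build_query_string(phrases, limitations=None):
    """Join quoted phrases with OR, then AND each limitation as field:"value"."""
    quoted = " OR ".join('"{}"'.format(p) for p in phrases)
    query = "({})".format(quoted)
    for field, value in (limitations or {}).items():
        # Each limitation narrows the results returned for the phrase group.
        query += ' AND {}:"{}"'.format(field, value)
    return query


query = build_query_string(
    ["penguins", "flightless birds"],
    {"year": "2015-2018", "language": "en"},  # hypothetical search limitations
)
```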

[17] The search facility may comprise one or more of a group of databases comprising a journal article database, a law database, a medical research database, a historical archives database, a patent database, and an online library.

[18] The search facility may comprise one or more of a group of Internet search engines comprising Google, Bing, Yahoo, Ask, AOL, Baidu, Wolframalpha, and DuckDuckGo.

[19] There is provided a computer-implemented method for generating a citation reference in an electronic document edited by a user on a computing device. The method comprises receiving at the computing device a set of citation indications as determined above, the set of citation indications representing a set of citations related to the electronic document; presenting the set of citations on the computing device for the user to select one of the set of citations; receiving a selection indication from the computing device indicating the one of the set of citations is selected by the user; and generating the citation reference based on the selection indication in the electronic document to associate the one of the set of citations with the electronic document.

[20] The step of generating the citation reference may further comprise determining a sentence edited by the user in the electronic document and generating the citation reference in a first format in the sentence edited by the user.

[21] The computer-implemented method may further comprise generating a hyperlink as the citation reference in the sentence, the hyperlink being associated with the one of the set of citations.

[22] The step of generating the citation reference may further comprise generating the citation reference in a second format and inserting the citation reference at a position in the electronic document designated by the user.

[23] There is provided a computer system for determining a set of citations related to an electronic document edited by a user on a computing device. The computer system comprises a memory device to store machine-readable instructions; a communication interface; and a processor operably connected to the memory device and the communication interface. The processor obtains the machine-readable instructions from the memory device. The processor is configured by the machine-readable instructions to obtain the electronic document from the computing device via the communication interface, the electronic document including a set of paragraphs and an editing indication, wherein the editing indication indicates an editing paragraph being one of the set of paragraphs is edited by the user; determine the editing paragraph from the electronic document based on the editing indication; determine a first set of phrases based on the editing paragraph; generate a first query string including the first set of phrases; invoke a search facility with the first query string to identify a first set of citations that matches the first query string, the first set of citations being represented by a first set of citation indications; receive via the communication interface from the search facility the first set of citation indications; determine a set of citation indications based on the first set of citation indications, the set of citation indications representing the set of citations; and send via the communication interface the set of citation indications to the computing device in order for the set of citation indications to be presented on the computing device for selection by the user.

[24] Each of the set of paragraphs includes a set of sentences and the editing indication further indicates an editing sentence being one of the set of sentences included in the editing paragraph is edited by the user.

[25] The processor may be further configured to determine the editing sentence in the editing paragraph based on the editing indication; determine a second set of phrases based on the editing sentence; generate a second query string including the second set of phrases; invoke the search facility with the second query string to identify a second set of citations that matches the second query string, the second set of citations being represented by a second set of citation indications; receive via the communication interface from the search facility the second set of citation indications; determine one or more citation indications that are included in both the first set of citation indications and the second set of citation indications; and determine the one or more citation indications to be the set of citation indications.

[26] There is provided a computer system for generating a citation reference in an electronic document edited by a user. The computer system comprises a memory device to store machine-readable instructions; a communication interface; a user interface; and a processor operably connected to the memory device, the communication interface and the user interface. The processor obtains the machine-readable instructions from the memory device. The processor is configured by the machine-readable instructions to receive via the communication interface a set of citation indications as determined above, the set of citation indications representing a set of citations related to the electronic document; present on the user interface the set of citations for the user to select one of the set of citations; receive a selection indication from the user interface indicating the one of the set of citations is selected by the user; and generate the citation reference based on the selection indication in the electronic document to associate the one of the set of citations with the electronic document.

Brief Description of Drawings

[27] At least one example of the invention will be described with reference to the accompanying drawings, in which:

Figure 1 illustrates an exemplary system in which various embodiments are implemented in accordance with the present invention;

Figure 2 illustrates a computer-implemented method for determining a set of citations related to an electronic document edited by a user on a computing device in accordance with an embodiment of the present invention;

Figure 3 illustrates an exemplary computing device for generating a citation reference in an electronic document edited by a user in accordance with an embodiment of the present invention;

Figure 4 illustrates a computer-implemented method for generating a citation reference in an electronic document edited by a user on the computing device in accordance with an embodiment of the present invention;

Figure 5 illustrates a computer graphical user interface where the present invention is implemented as a sidebar plugin of Microsoft Word;

Figure 6 illustrates a computer graphical user interface where the present invention is implemented as an online word editor; and

Figure 7 illustrates an exemplary system in which the various embodiments are implemented in accordance with the present invention.

[28] It should be noted that the same numeral represents the same or similar elements throughout the drawings.

Description of Embodiments

[29] Figure 1 illustrates an exemplary system 100 in which various embodiments of the present invention are implemented. The system 100 includes a computing device 102 associated with a user. The user uses the computing device 102 to edit an electronic document and generate citation references in the electronic document. The computing device 102 may be selected from a group of computing devices comprising mobile handheld devices (such as mobile phones, PDAs and tablet PCs), desktop PCs and notebooks. Alternatively, the computing device 102 may be a purpose-built dedicated device capable of reading data and retrieving information in different formats through a communication network 104. The communication network 104 may be, but is not limited to, a Local Area Network (LAN) or a Wide Area Network (WAN). The communication network 104 may operate using a number of protocols, such as but not limited to TCP/IP, 3GPP, 3GPP2, LTE and IEEE 802.x.

[30] Further connected to the communication network 104 is a computer system 106. The computer system 106 includes a memory device 1062 configured to store machine-readable instructions. The machine-readable instructions may be loaded into the memory device 1062 from a non-transitory machine-readable medium, such as but not limited to CD-ROMs, DVD-ROMs and Flash Drives. The machine-readable instructions may be loaded into the memory device 1062 in a form of a computer software program. The memory device 1062 may be selected from a group of memory devices comprising Random Access Memory (RAM), Read Only Memory (ROM), Erasable Programmable Read-only Memory (EPROM), Electrically Erasable Programmable Read-only Memory (EEPROM) and Flash memory.

[31] The computer system 106 includes a communication interface 1066 for transmission of data between the computer system 106 and any other external device connected with the computer system 106 via the communication network 104. In that manner, the communication interface 1066 may include one or more of a group of ports comprising a wired or wireless network connection interface, a Universal Serial Bus (USB) port, an Ethernet port, an e-SATA port and a SCSI port. The computer system 106 also includes a processor 1064 operably connected with the memory device 1062 and the communication interface 1066. In various embodiments, the processor 1064 is one of, but not limited to, a general-purpose processor, an Application Specific Integrated Circuit (ASIC) and a Field-Programmable Gate Array (FPGA).

[32] Further connected to the communication network 104 is a search facility 110. The search facility 110 may include one or more of a group of Internet search engines 1101 comprising Google, Bing, Yahoo, Ask, AOL, Baidu, Wolframalpha, and DuckDuckGo. The search facility 110 may also include one or more of a group of databases 1102 comprising a journal article database, a law database, a medical research database, a historical archives database, a patent database, and an online library. The database 1102 may be maintained on a cloud-based storage or a local storage. In some embodiments, the database 1102 may include multiple databases. The database 1102 may also be a distributed database, distributed along nodes providing peer to peer functionalities. In any manner, the database 1102 is capable of providing the data to devices connected with the communication network 104 when the data is queried appropriately using applicable security and other data transfer protocols.

[33] As an example, a user is editing an electronic document titled 'Penguin' using the computing device 102. In this example, the electronic document is extracted from Wikipedia (https://en.wikipedia.org/Wiki/Penguin) for description purposes only. While the user is editing the electronic document, the computing device 102 sends the electronic document to the computer system 106 via the communication network 104. The electronic document is shown in Table 1 below.

Table 1

[34] Upon receipt of the electronic document at the computer system 106, the processor 1064 of the computer system 106 analyses the content of the electronic document and generates a search query. The processor 1064 communicates the search query to the search facility 110 via the communication network 104. The search facility 110 performs a search for a set of citations in the database 1102 that matches the search query. The search facility 110 can also invoke the search engine(s) 1101 to perform the search for the set of citations. The determined set of citations is communicated via the communication network 104 to the computing device 102 of the user. The set of citations is presented on the computing device 102 for the user to select a citation from the set of citations.

[35] Figure 2 illustrates a computer-implemented method 200 for determining a set of citations related to an electronic document edited by a user on the computing device 102 in accordance with an embodiment of the present invention. Although the method 200 is described as being performed by the computer system 106, it can also be performed by the computing device 102 without departing from the scope of the present invention. As described above, the computing device 102 sends the electronic document to the computer system 106 via the communication network 104. As a result, the processor 1064 of the computer system 106 obtains 210 the electronic document from the computing device 102 being used by the user. For example, the user is using a word processor installed on the computing device 102 and the processor 1064 of the computer system 106 obtains the electronic document via the communication network 104. Taking the electronic document shown in Table 1 as an example, the electronic document includes a set of paragraphs, and the set of paragraphs includes one or more paragraphs. In this example, a paragraph can be defined as being a chunk of text that is separated by 2 or more computer-readable control characters such as 'NEWLINE' or 'RETURN' (at the end of each paragraph). Each of the one or more paragraphs includes a set of sentences including one or more sentences. In this example, a sentence can be defined as being a chunk of text that is separated by 2 adjacent computer-readable punctuation marks, for example, a full stop ('.'). In other examples, a sentence can also be defined as per a language processing tool, for example, Stanford CoreNLP.

[36] The electronic document also includes an editing indication. In this example, the editing indication shows the current editing position in the electronic document, indicating where the user is about to type the next character in the electronic document. The editing indication can be understood as a 'cursor' in the electronic document, represented as, for example, [CURSOR] in the electronic document shown in Table 1. The paragraph in which the editing indication resides is the paragraph being edited by the user, referred to as the editing paragraph hereinafter. Clearly, the editing paragraph is one of the set of paragraphs in the electronic document. In the electronic document shown in Table 1, the editing paragraph is the last paragraph in the electronic document. Further, the sentence in which the editing indication resides is the sentence being edited by the user, referred to as the editing sentence hereinafter. Clearly, the editing sentence is one of the set of sentences included in the editing paragraph. In the electronic document shown in Table 1, the editing sentence is the second sentence in the editing paragraph, i.e., 'The smallest penguin species is the little blue penguin (Eudyptula minor), also known as the fairy penguin, which stands around 40 cm (16 in) tall and weighs 1 kg (2.2 lb).' At step 220, the processor 1064 determines the editing paragraph based on the editing indication. For example, the processor 1064 identifies the editing indication, i.e., [CURSOR], and determines the paragraph that includes the editing indication as the editing paragraph.
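By way of illustration only, step 220 may be sketched in Python as follows. The sketch assumes the editing indication appears in the text as the literal marker [CURSOR], that paragraphs are separated by 2 or more newlines, and that lines holding only spaces or tabs count as empty. The function names split_paragraphs and find_editing_paragraph are hypothetical and do not appear in the specification.

```python
import re

CURSOR = "[CURSOR]"  # assumed textual form of the editing indication


def split_paragraphs(text):
    """Split text into paragraphs separated by 2 or more newlines."""
    # Blank out lines holding only spaces/tabs so they act as boundaries.
    normalised = re.sub(r"(?m)^[ \t]+$", "", text.replace("\r\n", "\n"))
    parts = re.split(r"\n{2,}", normalised)
    # Replace remaining newlines and runs of spaces with a single space.
    return [re.sub(r"\s+", " ", p).strip() for p in parts if p.strip()]


def find_editing_paragraph(text):
    """Return (1-based paragraph number, paragraph text) holding the cursor."""
    for number, paragraph in enumerate(split_paragraphs(text), start=1):
        if CURSOR in paragraph:
            return number, paragraph.replace(CURSOR, "")
    return None  # no editing indication present


doc = "Penguins are birds.\n \t\nThe smallest species[CURSOR] is small.\nIt weighs 1 kg."
print(find_editing_paragraph(doc))
```

Here the whitespace-only second line is treated as a paragraph boundary, so the cursor is reported in paragraph 2.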

[37] For description purposes, some variables are introduced in the present specification:

the_doc: this variable represents the entire text content of the electronic document, with all 'NEWLINE's replaced by spaces and any additional adjacent spaces replaced by a single space.

the_paragraph: this variable represents the text content of the editing paragraph, which is the paragraph that the cursor currently resides in, with all 'NEWLINE's replaced by spaces and any additional adjacent spaces replaced by a single space. As described above, a paragraph is identified as being a chunk of text that is separated by 2 or more adjacent 'NEWLINE's. If a new line has only space characters (spaces or tabs), then the spaces and tabs are ignored. This allows new lines that show only space to be considered as paragraph boundaries.

the_sentence: this variable represents the text content of the editing sentence, which is the sentence, as per Stanford CoreNLP, that includes the cursor.

the_doc_map: this variable is an object and has the following properties:

paragraph_count: representing the number of paragraphs;

paragraph_num: representing the number of the paragraph that includes the_cursor. Paragraphs are numbered starting from 1;

doc_tokens: representing an array of tokens of the_doc. The tokens of the_doc can be determined using the_tokenizer (see https://gjthul coi7yarbox/tokeni zer for details). Note that the tokens of the_doc can also be determined in a different way;

doc_tokens_paragraph_bounds: an object that stores the index of the first word and last word in each paragraph with respect to the entire document;

paragraph_first_sentence: an object representing the start index and end index of doc_tokens represented by the first sentence in the_paragraph.

[38] Below is an example document to explain the above variables:

This is the first paragraph.

And another paragraph here.

[39] The above example document can be represented by the variable the_doc. The variable doc_tokens for the_doc_map would be:

[This, is, the, first, paragraph, ., And, another, paragraph, here, .]

[40] doc_tokens_paragraph_bounds for this document would be:

[
{start_index: 0, end_index: 5},
{start_index: 6, end_index: 10}
]
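As a minimal sketch, the token array and paragraph bounds for the two-paragraph example document above may be computed as follows. The regular-expression tokenizer is an assumption standing in for the tokenizer referenced in the specification, and build_doc_map is a hypothetical helper name.

```python
import re


def tokenize(text):
    """Toy tokenizer (assumption): words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def build_doc_map(paragraphs, cursor_paragraph_num=1):
    """Build a simplified the_doc_map from a list of paragraph strings."""
    doc_tokens, bounds = [], []
    for paragraph in paragraphs:
        start = len(doc_tokens)              # index of the paragraph's first token
        doc_tokens.extend(tokenize(paragraph))
        bounds.append({"start_index": start, "end_index": len(doc_tokens) - 1})
    return {
        "paragraph_count": len(paragraphs),
        "paragraph_num": cursor_paragraph_num,  # paragraphs are numbered from 1
        "doc_tokens": doc_tokens,
        "doc_tokens_paragraph_bounds": bounds,
    }


doc_map = build_doc_map(["This is the first paragraph.", "And another paragraph here."])
print(doc_map["doc_tokens"])
print(doc_map["doc_tokens_paragraph_bounds"])
```

With this tokenizer, each full stop is its own token, which is why the first paragraph spans indices 0 to 5 rather than 0 to 4.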

[41] As another example, the_doc is as follows:

This is the first paragraph.

Two sentences residing in this paragraph. And another sentence<the_cursor> here.

[42] doc_tokens_paragraph_bounds for the_doc_map would be:

[
{start_index: 0, end_index: 5},
{start_index: 6, end_index: 17}
]

[43] As the cursor resides in the second paragraph, the variable paragraph_first_sentence would be:

{start_index: 6, end_index: 12}

[44] The processor 1064 of the computer system 106 cleanses the electronic document. For example, the processor 1064 of the computer system 106 joins any words broken by a dash (for example, one of:

[45] The processor 1064 of the computer system 106 determines the above variables from the electronic document, particularly the_doc, the_paragraph, the_sentence, the_doc_map and the properties of the_doc_map. At step 230, the processor 1064 determines a first set of phrases based on the editing paragraph. In order to save computing capacity at the computer system 106, step 230 can be performed only when it is determined that the_paragraph has changed, which means the user is editing a new paragraph.

[46] Specifically, the processor 1064 of the computer system 106 generates a set of word occurrence count indications indicating one or more words in the electronic document and respective occurrence counts of the one or more words. The set of word occurrence count indications is also referred to as a Document Terms Matrix (DTM). The DTM is an object that includes each word used in the electronic document together with the occurrence count of that word within the electronic document, sorted by occurrence count in descending order. In order to reduce the size of the DTM, words that meet the following criteria may be removed from the DTM:

1. occurrence count <= 1; or

2. occurrence count < 2 if the word count of the_doc < 500; or

3. occurrence count < average occurrence count of the DTM words if the word count of the_doc >= 500
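The construction and pruning of the DTM may be sketched as below. This is a sketch under stated assumptions rather than the implementation of the specification: words are taken to be lower-cased alphabetic runs, and the average occurrence count is computed over all distinct words before pruning. The name build_dtm is hypothetical.

```python
import re
from collections import Counter


def build_dtm(text):
    """Return {word: count}, pruned and sorted by count in descending order."""
    words = re.findall(r"[a-z]+", text.lower())   # assumed notion of a "word"
    counts = Counter(words)
    if len(words) < 500:
        minimum = 2                                # criteria 1 and 2
    else:
        average = sum(counts.values()) / len(counts)
        minimum = max(2, average)                  # criteria 1 and 3
    kept = {w: c for w, c in counts.items() if c >= minimum}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


dtm = build_dtm("the penguin and the penguin saw the sea")
print(dtm)
```

Note that for documents under 500 words criterion 2 removes the same words as criterion 1, so a single minimum of 2 covers both.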

[47] For the Penguin document example, the DTM of the Penguin document is as follows:

{"the": 17, "in": 11, "penguin": 10, "and": 8, "species": 8, "of": 6, "are": 6, "penguins": 6, "as": 5, "on": 4, "found": 4, "a": 4, "regions": 3, "tall": 3, "equator": 3, "one": 3, "only": 3, "they": 3, "or": 2, "around": 2, "also": 2, "lb": 2, "kg": 2, "is": 2, "temperate": 2, "south": 2, "climates": 2, "not": 2, "to": 2, "lives": 2, "half": 2, "about": 2, "while": 2, "their": 2, "have": 2, "life": 2, "hemisphere": 2, "southern": 2, "almost": 2, "live": 2}

[48] The processor 1064 of the computer system 106 determines a set of initial document phrases in the electronic document based on the set of word occurrence count indications, e.g., the DTM. For example, the processor 1064 of the computer system 106 starts from phrase_length (i.e., the number of words in a phrase) = 10, working backwards to phrase_length = 1, and repeats the following: iterating through each token in doc_tokens and, from that token, building a phrase with the tokens adjacently positioned to its right in the document, obtaining a phrase where the number of tokens equals phrase_length. Tokens (in this example, words) can only be joined together to form a phrase if the phrase meets these conditions:

1. The first token must:

a) exist in the DTM; and

b) not be in a smart_stoplist; and

c) have a length > 1

2. Remaining tokens can be added to the phrase if:

a) the token is in the smart_stoplist but the prior token was not in the smart_stoplist; note that in this situation, if the current token is in the smart_stoplist and the phrase_length has been reached, the phrase is not added to the set of initial document phrases because the last token of a phrase cannot be in the smart_stoplist

b) the token is in the DTM and the token.length > 1

[49] The smart_stoplist is a list of common stop words such as articles, conjunctions and adverbs, e.g., the, to, a. An example of the smart_stoplist can be found at https://github.com/aneesha/RAKE/blob/master/SmartStoplist.txt.

[50] As soon as phrase_length has been reached, the phrase is added to the set of initial document phrases. As a result, the set of initial document phrases of varying lengths from 1 to 10 is determined. For the Penguin example document, the set of initial document phrases is shown in Table 2 below. A phrase in the set of initial document phrases can be identified by an index for the phrase. For example, the phrase 'live' in the set of initial document phrases can be identified as [1, 2]. This means that the phrase 'live' is positioned at the second position in all the phrases that have a length of 1.

Table 2
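The phrase-building loop described above may be sketched as follows. The sketch adopts one reading of the conditions: a stop word is only admitted through condition 2(a), and every other token must satisfy condition 2(b). Tokens are assumed to be lower-cased already, and build_initial_phrases is a hypothetical name.

```python
def build_initial_phrases(doc_tokens, dtm, stoplist, max_length=10):
    """Return {phrase_length: [phrase, ...]} for lengths max_length down to 1."""

    def content_ok(token):
        return token in dtm and len(token) > 1

    phrases = {}
    for phrase_length in range(max_length, 0, -1):
        found = []
        for start in range(len(doc_tokens) - phrase_length + 1):
            window = doc_tokens[start:start + phrase_length]
            first = window[0]
            # Condition 1: first token in the DTM, not a stop word, length > 1.
            if not content_ok(first) or first in stoplist:
                continue
            valid = True
            for prev, token in zip(window, window[1:]):
                if token in stoplist:
                    if prev in stoplist:      # 2(a): no two stop words in a row
                        valid = False
                        break
                elif not content_ok(token):   # 2(b): in the DTM and length > 1
                    valid = False
                    break
            # The last token of a phrase cannot be a stop word.
            if valid and window[-1] not in stoplist:
                found.append(" ".join(window))
        phrases[phrase_length] = found
    return phrases


tokens = "species of penguin live in the southern hemisphere".split()
dtm = {t: 2 for t in tokens}
phrases = build_initial_phrases(tokens, dtm, stoplist={"of", "in", "the"})
print(phrases[3], phrases[2])
```

In this toy input, "penguin live in" is rejected because it ends in a stop word, and "live in the" is rejected because two stop words are adjacent.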

[51] The processor 1064 of the computer system 106 then determines a set of document phrase synonyms. The set of document phrase synonyms includes one or more synonyms of the set of initial document phrases. For example, the processor 1064 of the computer system 106 aggregates the phrases in the set of initial document phrases by frequency to obtain a list of phrases and a frequency count for each phrase. For each phrase in the set of initial document phrases, the processor 1064 of the computer system 106 determines its one or more relevant synonyms based on the phrase's context in the electronic document. It is, however, possible that a phrase has no synonym associated therewith.

[52] The processor 1064 of the computer system 106 combines the relevant synonym(s) with the set of initial document phrases to form the set of document phrase synonyms. The frequency counts of these phrases may be summed when combined.

[53] For the Penguin example document, the set of document phrase synonyms is shown in Table 3 below. In the set of document phrase synonyms, for example, the phrase 'live' has appeared twice and has one synonym, 'living'; the phrase 'kg' has appeared twice and has multiple synonyms: kilo, kilos, kilogram, kilograms; while the phrase "species of penguin" has appeared once and has no synonym associated.

Table 3

[54] The processor 1064 of the computer system 106 also determines a set of document phrases from the set of document phrase synonyms based on a first set of criteria. The set of document phrases meets the first set of criteria. For example, the processor 1064 of the computer system 106, starting from phrase_length = 10 and working backwards to phrase_length = 1, repeats:

1. If the current phrase frequency > a threshold, then keep the phrase and delete any other phrase that has tokens that overlap with the current phrase's tokens. The threshold may be determined as follows:

if the word count of the_doc > 500, then the threshold is the average phrase frequency; else the threshold is 2.

2. Decrement the frequency count of each deleted phrase accordingly.

[55] From the above loop, a list of phrases from the_doc that does not have overlapping tokens is determined, and so is the frequency count of each of these phrases.

[56] The processor 1064 of the computer system 106 also removes any phrase(s) that do not meet quality criteria from the set of document phrase synonyms. For example, the processor 1064 of the computer system 106 deletes any of these phrases where:

1. the phrase is in a doc_phrase_stoplist; or

2. the phrase consists of words that are only in the smart_stoplist; or

3. the length of characters in the phrase is <= 1; or

4. the phrase is identical to another phrase except that one or more words in one phrase are the pluralized form of the other phrase's words, i.e., the phrases would be identical if the pluralized forms of words were singularized; or

5. the phrase is completely incorporated by another larger phrase, i.e., the smaller phrase is a subset of a larger phrase.
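A minimal sketch of these quality criteria is given below. The singularize helper is a deliberately crude assumption (it only strips a trailing "s"), the choice to keep whichever plural or singular variant is seen first is also an assumption, and all function names are hypothetical.

```python
def singularize(word):
    """Crude stand-in for a real singularizer (assumption)."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def is_contained(small, big):
    """True if the words of `small` appear contiguously inside longer `big`."""
    s, b = small.split(), big.split()
    return len(s) < len(b) and any(
        b[i:i + len(s)] == s for i in range(len(b) - len(s) + 1)
    )


def filter_phrases(phrases, smart_stoplist, doc_phrase_stoplist):
    kept, seen = [], set()
    for phrase in phrases:
        words = phrase.split()
        if phrase in doc_phrase_stoplist:             # phrase-level stop list
            continue
        if all(w in smart_stoplist for w in words):   # only stop words
            continue
        if len(phrase) <= 1:                          # too short in characters
            continue
        key = tuple(singularize(w) for w in words)
        if key in seen:                               # plural/singular duplicate
            continue
        seen.add(key)
        kept.append(phrase)
    # Drop phrases wholly contained in a longer kept phrase.
    return [p for p in kept if not any(is_contained(p, q) for q in kept)]


result = filter_phrases(
    ["species of penguin", "penguin", "penguins", "of", "southern hemisphere", "x", "click here"],
    smart_stoplist={"of", "the"},
    doc_phrase_stoplist={"click here"},
)
print(result)
```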

[57] The doc_phrase_stoplist is a list of phrases that is used to filter the phrases as shown in Table 3. Finally, the number of phrases that are kept depends on a dtm_phrase_limit variable, which can be dynamically set by the user. Therefore, the processor 1064 keeps only as many phrases as permitted by the dtm_phrase_limit variable. The rules for what phrases to keep are (in priority order):

1. phrases that have the highest frequency of occurrence in the_doc

2. phrases that have the largest number of non-smart_stoplist words

3. phrases that appear closest to the beginning of the document

4. phrases that have the rarest words within the top 20,000 words most used in the English language.
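One way to apply such a priority order is a composite sort key, sketched below. The word_rank mapping (word to its rank among the most used English words, with 1 as the most common) is a hypothetical input, and measuring rarity by the highest rank among a phrase's words is an assumption. The name keep_top_phrases is also hypothetical.

```python
def keep_top_phrases(phrases, doc_tokens, stoplist, word_rank, limit):
    """Keep at most `limit` phrases, ordered by the four priority rules."""

    def positions(words):
        n = len(words)
        return [i for i in range(len(doc_tokens) - n + 1) if doc_tokens[i:i + n] == words]

    def sort_key(phrase):
        words = phrase.split()
        found = positions(words)
        frequency = len(found)                                   # rule 1
        non_stop = sum(1 for w in words if w not in stoplist)    # rule 2
        first_seen = found[0] if found else len(doc_tokens)      # rule 3
        rarity = max((word_rank.get(w, 0) for w in words), default=0)  # rule 4
        # sorted() is ascending, so negate the "higher is better" fields.
        return (-frequency, -non_stop, first_seen, -rarity)

    return sorted(phrases, key=sort_key)[:limit]


doc_tokens = "penguin species live in the southern hemisphere and penguin species thrive".split()
top = keep_top_phrases(
    ["southern hemisphere", "penguin species", "live", "hemisphere"],
    doc_tokens,
    stoplist={"in", "the", "and"},
    word_rank={"penguin": 15000, "species": 3000, "southern": 4000, "hemisphere": 12000, "live": 500},
    limit=2,
)
print(top)
```

Because the key is a tuple, a later rule only matters when all earlier rules tie, which matches the stated priority order.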

[58] The remaining phrases are now determined as the set of document phrases, referred to as doc_phrases. For the Penguin example document, the set of document phrases and their synonym(s) are shown in Table 4 below. For the purpose of understanding the present invention, the deleted phrases are also listed in Table 4.

Table 4

[59] The processor 1064 further determines a set of paragraph phrases from the set of document phrases. For example, the processor 1064 identifies one or more of the set of document phrases that are included in the editing paragraph of the electronic document and determines the one or more phrases in the set of document phrases to be the set of paragraph phrases, referred to as paragraph_phrases. This can be done with some restrictions; for example, the processor 1064 identifies the paragraph phrases only if the_doc paragraph_count > 1 and the_doc word_count > 500.

[60] The processor 1064 further determines the first set of phrases by combining the set of paragraph phrases, the synonyms of the set of paragraph phrases, a subset of the set of document phrases (for example, the top 5 document phrases), and the synonyms of the subset of the set of document phrases.

[61] At step 240, the processor 1064 generates a first query string that is used to invoke the search facility 110. The first query string includes one or more search elements that comprise the first set of phrases as determined above. The first query string is referred to as query_1 in the present specification. For example, if the first set of phrases includes two phrases, ['This is phrase one', 'This is phrase two'], the search elements include these two phrases. The search elements also include individual words from the two phrases where the word is not in the smart_stoplist, which are 'phrase', 'one', 'phrase', 'two', assuming that 'this' and 'is' are in the smart_stoplist for this example. Further, each search element in query_1 is considered connected by a Boolean operator 'OR'. Therefore, query_1 determined by the processor 1064 may look like:

{'This is phrase one' OR 'This is phrase two' OR phrase OR one OR phrase OR two}
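The assembly of such a query string may be sketched as follows. The sketch quotes multi-word phrases with double quotes and omits the enclosing braces. Both choices, and the name build_query, are assumptions made for illustration. Duplicate words are kept, as in the example above.

```python
def build_query(phrases, stoplist):
    """Join phrases and their non-stop-list words with the Boolean operator OR."""
    # Each phrase is a search element; multi-word phrases are quoted.
    elements = [f'"{p}"' if " " in p else p for p in phrases]
    # Individual words of multi-word phrases are added unless stop-listed.
    for phrase in phrases:
        words = phrase.split()
        if len(words) > 1:
            elements.extend(w for w in words if w.lower() not in stoplist)
    return " OR ".join(elements)


query_1 = build_query(["This is phrase one", "This is phrase two"], {"this", "is"})
print(query_1)
```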

[62] For the Penguin example document, an example of query_1 (any relevant synonyms are included in the first set of phrases) is as follows:

{"southern hemisphere" OR equator OR lb s: [pound, pounds] OR penguin OR southern OR hemisphere OR species OR temperate s: [warm, mild]}

[63] At step 250, the processor 1064 invokes the search facility 110 with query_1 to identify a first set of citations that matches query_1. For example, the processor 1064 communicates query_1 to the search facility 110 via the communication interface 1066.

[64] query_1 may include one or more search limitations in order for the search facility 110 to identify the first set of citations subject to these search limitations. With the search limitation(s), for example, the search can be conducted only with reference to a certain field (for example, the title, the abstract, the publication date, etc.) of a citation. Specifically, the user inputs the one or more search limitations via the graphical user interface provided on the computing device 102. For example, the user ticks check boxes 'Title' and 'Abstract' displayed on the display of the computing device 102, or inputs a range of publication dates, to indicate the search limitation(s). The computing device 102 generates a search limitation indication including the search limitation(s) and sends the search limitation indication to the computer system 106. The processor 1064 of the computer system 106 receives the search limitation indication and incorporates the search limitations included in the search limitation indication into query_1. For example, query_1 with a search limitation 'Title' is:

@(title) \"southern hemisphere\"

[65] This means the search facility 110 only searches for citations with titles that contain "southern hemisphere".

[66] As another example, query_1 with a search limitation 'Full text' is:

@(fulltext) \"equator lb penguin southern hemisphere species temperate\"/1

[67] This means the search facility 110 only searches for citations whose full text contains at least one of the words "equator lb penguin southern hemisphere species temperate", with better matches preferred.

[68] As a further example, query_1 is even more complex in that it includes synonyms and more search limitations including fields and Boolean operators:

((@(title) \"southern hemisphere\") | (@(title) \"eudyptula genus\") | (@(title) \"equator lb s: [pound, pounds] penguin southern hemisphere species temperate s: [warm mild]\"/1) | (@(keywords) \"southern hemisphere\") | (@(keywords) \"eudyptula genus\") | (@(keywords) \"equator lb s: [pound, pounds] penguin southern hemisphere species temperate s: [warm mild]\"/1) | (@(fulltext) \"southern hemisphere\") | (@(fulltext) \"eudyptula genus\") | (@(fulltext) \"equator lb s: [pound, pounds] penguin southern hemisphere species temperate s: [warm mild]\"/1))
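A field-limited query of this shape may be assembled as sketched below. The sketch mirrors the field and "/1" notation of the examples but omits the backslash escaping and the synonym annotations. The helper names field_query and any_of are hypothetical, and the exact syntax accepted by the search facility 110 is not asserted here.

```python
def field_query(field, text, min_words=None):
    """Limit `text` to one field; `min_words` adds the '/n' suffix seen above."""
    query = f'@({field}) "{text}"'
    return query + (f"/{min_words}" if min_words else "")


def any_of(parts):
    """Combine sub-queries so that matching any one of them suffices."""
    return "(" + " | ".join(f"({p})" for p in parts) + ")"


words = "equator lb penguin southern hemisphere species temperate"
parts = []
for field in ("title", "keywords", "fulltext"):
    parts.append(field_query(field, "southern hemisphere"))
    parts.append(field_query(field, words, min_words=1))
print(any_of(parts))
```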

[69] The search facility 110 is invoked by the computer system 106 to execute query_1 by searching the database 1102 or invoking the search engine(s) 1101. As a result, the first set of citations that matches query_1 is identified. The first set of citations is represented by a first set of citation indications. The first set of citation indications may include, for example, article IDs to identify the first set of citations:

[95761729, 95761740, 95764122, 95764166, 95765129, 95765147, 95765470, 95766001, 95766158, 95766490, 95761332, 95762157, 95762815, 95766069, 95760686, 95760792, 95760907, 95760995, 95761115, 95761182].

[70] The first set of citation indications also includes information in relation to the citations, for example, author, article title, journal name, page number, issue number, volume, publication date, etc. The search facility 110 sends the first set of citation indications to the computer system 106.

[71] At step 260, the processor 1064 of the computer system 106 receives the first set of citation indications via the communication interface 1066. As the first set of citation indications may include a large number of citation indications, to present more relevant citations to the user that is editing the electronic document, the processor 1064 further at step 270 determines a set of citation indications based on the first set of citation indications. For example, the processor 1064 determines the top 10 citation indications included in the first set of citation indications to be the set of citation indications. The set of citation indications represents the set of citations that is more relevant to the electronic document. As another example, the processor 1064 invokes the search facility 110 with a further query string that is generated particularly with reference to the editing sentence and receives a second set of citation indications. The processor 1064 then determines the set of citation indications based on the first set of citation indications and the second set of citation indications, as described in detail below.

[72] The processor 1064 of the computer system 106 then at step 280 sends the set of citation indications to the computing device 102 being used by the user. The computing device 102 presents the set of citation indications for selection by the user. The set of citation indications may be presented in a computer graphical user interface on the computing device 102.

[73] To present more relevant citations to the user that is editing the electronic document, the processor 1064 is configured to determine a second set of phrases based on the editing sentence. Specifically, the processor 1064 identifies the editing sentence that is being edited by the user, i.e., determines the variable the_sentence. As described above, the editing sentence is the sentence in which the editing indication (e.g., the cursor) resides, which is one of the set of sentences included in the editing paragraph. The processor 1064 determines the editing sentence by identifying the cursor. For example, in the electronic document shown in Table 1, the editing sentence is the second sentence in the editing paragraph.

[74] The processor 1064 then determines a set of initial sentence phrases from the editing sentence based on a language processing tool, see https:ffstanfordnip.gi for details. For example, the language processing tool includes the Stanford CoreNLP language processing tools. Using Stanford CoreNLP part-of-speech (pos) tagging, the processor 1064 retrieves the pos tags for the_sentence. The processor 1064 then identifies any phrases within the_sentence where the phrase is structured in a certain pattern: 1 or more nouns (NN, NNS, NNP, NNPS, FW), followed by a single filler word (CC, IN, DT) or smart_stoplist word, followed by 1 or more nouns, where the nouns are not in a sentence_phrase_stoplist. The sentence_phrase_stoplist is a list of noun stop words that are considered too common to be used as nouns that would be searched on their own. Instead, these nouns are only used if joined with other adjectives/nouns to form search phrases.

[75] The processor 1064 also identifies any phrases that are found within the sentence where the phrase does not overlap with any extracted phrases (in terms of index of the words extracted in the overall sentence) already identified for the current sentence above, where the phrase is structured: 0, 1 or 2 adjectives (JJ, JJR, JJS), followed by 1 or more nouns (NN, NNS, NNP, NNPS, FW) (where the nouns are not in the sentence_phrase_stoplist). Therefore, the set of initial sentence phrases includes one or more phrases identified above.
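The two tag patterns may be sketched as below, operating on a sentence that has already been tagged, so no language processing library is called. This is a simplified sketch: the sentence_phrase_stoplist check is omitted, the tag sequence is encoded as a string of single letters so that ordinary regular expressions can be applied, and extract_sentence_phrases is a hypothetical name.

```python
import re

NOUNS = {"NN", "NNS", "NNP", "NNPS", "FW"}
ADJECTIVES = {"JJ", "JJR", "JJS"}
FILLERS = {"CC", "IN", "DT"}


def extract_sentence_phrases(tagged, smart_stoplist=frozenset()):
    """tagged: list of (word, pos_tag) pairs for the editing sentence."""

    def code(word, tag):
        if tag in NOUNS:
            return "N"
        if tag in ADJECTIVES:
            return "J"
        if tag in FILLERS or word.lower() in smart_stoplist:
            return "F"
        return "O"  # anything else breaks a phrase

    codes = "".join(code(w, t) for w, t in tagged)
    phrases, used = [], set()
    # Pattern 1: nouns, one filler, nouns. Pattern 2: up to 2 adjectives, nouns.
    for pattern in (r"N+FN+", r"J{0,2}N+"):
        for match in re.finditer(pattern, codes):
            span = range(match.start(), match.end())
            if used.isdisjoint(span):   # skip words already in a phrase
                used.update(span)
                phrases.append(" ".join(w for w, _ in tagged[match.start():match.end()]))
    return phrases


tagged = [("The", "DT"), ("smallest", "JJS"), ("penguin", "NN"), ("species", "NNS"),
          ("is", "VBZ"), ("the", "DT"), ("little", "JJ"), ("blue", "JJ"), ("penguin", "NN")]
print(extract_sentence_phrases(tagged))
```

The pos tags in the usage example are supplied by hand for illustration; in practice they would come from the tagger.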

[76] For each phrase in the set of initial sentence phrases, the processor 1064 determines its synonym(s) in the phrase's context in the electronic document. Each phrase potentially has 0 or more synonyms associated.

[77] The processor 1064 further combines the set of initial sentence phrases with their synonym(s) to determine a set of sentence phrase synonyms. The set of sentence phrase synonyms includes the set of initial sentence phrases and their synonym(s). The frequency counts of these phrases may be summed when combined. For the Penguin document example, the set of sentence phrase synonyms is shown below in Table 5:

Table 5

[78] The processor 1064 may further identify overlapping synonyms and remove the overlapping synonyms from the set of sentence phrase synonyms. Therefore, the set of sentence phrase synonyms includes synonyms that do not overlap.

[79] The above set of sentence phrase synonyms may be further processed by the processor 1064 based on a second set of criteria to determine a set of sentence phrases. The set of sentence phrases meets the second set of criteria. For example, the processor 1064 identifies and deletes any phrase in the set of sentence phrase synonyms where:

1. the phrase is in the sentence_phrase_stoplist; or

2. the phrase consists of words that are only in the smart_stoplist; or

3. the length of the phrase in characters is <= 1; or

4. the phrase contains more than 3 single-character words; or

5. the phrase contains only words that are non-language characters; or

6. the phrase is identical to another phrase except that one or more words in one phrase are the pluralized form of the other phrase's words, i.e., the phrases would be identical if the pluralized forms of words were singularized; or

7. the phrase is completely incorporated by another larger phrase, i.e., the smaller phrase is a subset of a larger phrase.

[80] To limit the size of the set of sentence phrases, the number of sentence phrases that are kept in the set of sentence phrases may be limited to a sentence_phrase_limit variable, which can be dynamically set. Therefore, the processor 1064 removes sentence phrases such that only sentence_phrase_limit phrases remain. As an example, the processor 1064 follows the rules below to identify the phrases to be kept (in priority order):

1. phrases that have the highest frequency of occurrence in the_doc.

2. phrases that have the highest number of non-smart_stoplist words.

3. phrases that appear closest to the beginning of the document.

4. phrases that have the rarest words in the top 20,000 words most used in the English language.

[81] Then, from the remaining sentence phrases, the processor 1064 identifies any of these phrases that have 2 or more adjacent nouns, followed by a joiner word, followed by 2 or more adjacent nouns. From each of these phrases, the processor 1064 extracts the adjacent nouns before and after the filler word (for example, a word in one of a set of word classifications (CC, IN, DT) as determined by the Stanford CoreNLP part-of-speech (pos) tagging), and adds these as additional phrases to the list of sentence phrases. As a result, the remaining phrase(s) in the set of sentence phrase synonyms are determined to be the set of sentence phrases, referred to as sentence_phrases.

[82] As a result of the above processing steps (or some of the above processing steps), for the Penguin example document, the set of sentence phrases and their synonyms are shown below in Table 6:

Table 6

[83] In order to save computing capacity at the computer system 106, the above steps can be performed only when it is determined that the_sentence has changed, which means the user is editing a new sentence.

[84] The processor 1064 further combines the set of sentence phrases, the synonyms of the set of sentence phrases that are included in the set of sentence phrase synonyms, the set of paragraph phrases and the synonyms of the set of paragraph phrases to determine the second set of phrases. For the Penguin example document, the second set of phrases may be as follows:

"smallest penguin species" "little blue penguin" s: [eudyptula minor, little penguin, blue penguin, fairy penguin] lb s: [pound, pounds] smallest penguin species little blue penguin eudyptula minor little penguin fairy penguin

[85] Thereafter, similar to the way of generating the first query string query_1, the processor 1064 generates a second query string including the second set of phrases, referred to as query_2, for example:

"smallest penguin species" OR "little blue penguin" s: [eudyptula minor, little penguin, blue penguin, fairy penguin] OR lb s: [pound, pounds] OR smallest OR penguin OR species OR little OR blue OR penguin OR eudyptula OR minor OR little OR penguin OR fairy OR penguin

[86] It should be noted that the search limitation indication described above with reference to generating query_1 is also applicable to generating query_2 in order for the search facility 110 to identify the second set of citations subject to the one or more search limitations. The computer system 106 invokes the search facility with query_2. For example, the computer system 106 communicates the second query string, i.e., query_2, to the search facility 110. The search facility 110 executes query_2 to identify a second set of citations that matches query_2 by searching the database 1102 or invoking the search engine(s) 1101. Similar to the first set of citations, the second set of citations is represented by a second set of citation indications.

[87] The second set of citation indications may include, for example, article IDs to identify the second set of citations, for example,

[95761729, 95761740, 95764122, 95764166, 95764867, 95765129, 95765147, 95765470, 95766001, 95766158, 95766490, 95767289, 95761332, 95762157, 95762192, 95762679, 95762815, 95764103, 95767607, 95762692, 95766069, 95767120, 95762435, 95763149, 95760686, 95760792, 95760907, 95760995, 95761115, 95761182, 95761455, 95761553, 95761720, 95761772, 95761801, 95761921, 95761988, 95762029, 95762233, 95762264, 95762346, 95762375, 95762446, 95762541, 95762713, 95762725, 95762958, 95762983, 95763014, 95763054]

[88] The second set of citation indications also includes information in relation to the citations, for example, author, article title, journal name, page number, issue number, volume, publication date, etc.

[89] The search facility 110 sends the second set of citation indications to the computer system 106. The processor 1064 of the computer system 106 receives the second set of citation indications via the communication interface 1066. To present more relevant citations to the user, the processor 1064 further determines one or more citation indications that are included in both the first set of citation indications and the second set of citation indications. The processor 1064 determines the one or more citation indications to be the set of citation indications. For the Penguin document example, Article IDs of the set of citation indications may be as follows:

[95761729, 95761740, 95764122, 95764166, 95765129]
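The overlap step described in paragraph [89] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the specification: the function name `common_citations` is hypothetical, the contents of `first_ids` are invented for the example (the first set is not listed in this passage), and keeping the order of the second set is an assumption.

```python
def common_citations(first_ids, second_ids):
    """Return the IDs found in both lists, in the order of second_ids."""
    first = set(first_ids)  # set membership keeps the lookup cheap
    return [cid for cid in second_ids if cid in first]


# Hypothetical first set: invented for illustration only.
first_ids = [95765129, 95761729, 95760001, 95764166, 95761740, 95764122]
# Leading entries of the second set listed in paragraph [87].
second_ids = [95761729, 95761740, 95764122, 95764166,
              95764867, 95765129, 95765147]

shared = common_citations(first_ids, second_ids)
print(shared)
```

With these inputs the result is the five-ID list shown above; a different first set would of course give a different overlap.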

[90] In order to provide more citations related to the editing sentence, the processor 1064 may add more citation indications from the second set of citation indications into the set of citation indications. Therefore, Article IDs of the resulting set of citation indications may be as follows:

[95761729, 95761740, 95764122, 95764166, 95765129, 95764867, 95765147, 95765470, 95766001, 95766158]
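A minimal sketch of the top-up step in paragraph [90], assuming that the additional IDs are taken from the second set in the order returned and that the list is capped at a fixed size. The name `top_up` and the cap of 10 are assumptions; the specification does not state how many citation indications are added or how they are chosen.

```python
def top_up(shared_ids, second_ids, target_size):
    """Extend shared_ids with unseen IDs from second_ids up to target_size."""
    result = list(shared_ids)
    seen = set(result)
    for cid in second_ids:
        if len(result) >= target_size:
            break
        if cid not in seen:
            result.append(cid)
            seen.add(cid)
    return result


# IDs present in both sets, as listed in paragraph [89].
shared_ids = [95761729, 95761740, 95764122, 95764166, 95765129]
# Leading entries of the second set listed in paragraph [87].
second_ids = [95761729, 95761740, 95764122, 95764166, 95764867, 95765129,
              95765147, 95765470, 95766001, 95766158, 95766490, 95767289]

result = top_up(shared_ids, second_ids, target_size=10)  # cap of 10 is assumed
print(result)
```

Under these assumptions the sketch reproduces the ten-ID list given above for the Penguin document.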

[91] The processor 1064 of the computer system 106 sends the set of citation indications to the computing device 102 being used by the user. The computing device 102 presents the set of citation indications for selection by the user.

[92] In another embodiment, the processor 1064 is further configured to receive one or more background phrases from the computing device 102. For example, the user enters one or more background phrases into the computing device 102 describing the background of the electronic document. For the Penguin document, the background phrase can be "Eudyptula genus". The background phrase is sent from the computing device 102 to the computer system 106. The processor 1064 of the computer system 106 receives the background phrase from the computing device 102 and adds it to the first set of phrases or the second set of phrases so that the background phrase is included in query_1 or query_2 and is also searched by the search facility 110 in determining the set of citation indications.

[93] Figure 3 illustrates an exemplary computing device 102 for generating a citation reference in an electronic document edited by the user in accordance with an embodiment of the present invention. As shown in Figure 3, the computing device 102 includes a memory device 301 configured to store machine-readable instructions. The machine-readable instructions may be loaded into the memory device 301 from a non-transitory machine-readable medium, such as but not limited to CD-ROMs, DVD-ROMs and Flash Drives. The machine-readable instructions may be loaded into the memory device 301 in a form of a computer software program. The memory device 301 may be selected from a group of memory devices comprising Random Access Memory (RAM), Read Only Memory (ROM), Erasable Programmable Read-only Memory (EPROM), Electrically Erasable Programmable Read-only Memory (EEPROM) and Flash memory.

[94] The computing device 102 also includes a user interface 303 for the user to interact with the computing device 102. For example, the user interface 303 includes one or more of a group of user interface devices comprising a display 3031, a keyboard 3032, a computer mouse 3033, and a touch screen 3034, etc.

[95] The computing device 102 also includes a communication interface 304 for transmission of data between the computing device 102 and any other external device connected with the computing device 102 via the communication network 104. In that manner, the communication interface 304 may include several ports such as a wired or wireless connection interface, a Universal Serial Bus (USB) port, an Ethernet port, e-SATA and SCSI, etc.

[96] Further, the computing device 102 includes a processor 302 operably connected with the memory device 301, the communication interface 304 and the user interface 303. In various embodiments, the processor 302 is one of, but not limited to, a general-purpose processor, an Application Specific Integrated Circuit (ASIC) and a Field-Programmable Gate Array (FPGA). The processor 302 obtains the machine-readable instructions from the memory device 301. The processor 302 is configured by the machine-readable instructions to perform the method steps described with reference to Figure 4 to generate a citation reference in an electronic document.

[97] Figure 4 illustrates an example method 400 for generating a citation reference in an electronic document in accordance with the present specification. As described above with reference to Figures 1 and 2, the computer system 106 sends the set of citation indications determined to the computing device 102 via the communication network 104. The processor 302 of the computing device 102 receives 410 the set of citation indications from the computer system 106. The processor 302 presents 420 the set of citation indications in the form of a list of citations on the user interface 303, for example, the display 3031, of the computing device 102 while the user is editing the electronic document. Figure 5 illustrates a computer graphical user interface 500 where the present invention is implemented as a sidebar plugin of Microsoft Word. Figure 6 illustrates a computer graphical user interface 600 where the present invention is implemented as an online word editor. In Figures 5 and 6, the list of citations is presented next to the Penguin document for selection by the user.

[98] The user uses the user interface 303, for example, the computer mouse 3033 or the touch screen 3034, to select the one of the listed citations that the user wants to cite in the document. The selected citation may then be incorporated in the electronic document in a desired format at the desired location. For example, the user may consider the citation as shown in the box as a desired citation, then the citation is selected by the user using the user interface 303, particularly, using the computer mouse 3033 to click on the citation or in any other preferred way. Further, each citation included in the list may be hyperlinked and the user can check the content of the citation by clicking on the hyperlink before selecting the same as the desired citation.

[99] Once the desired citation is selected by the user using the user interface 303, the processor 302 receives at step 430 a selection indication from the user interface 303 indicating that one of the set of citations is selected by the user.

[100] The processor 302 then at step 440 generates the citation reference based on the selection indication in the electronic document to associate the selected citation with the electronic document. The selected citation is incorporated in the electronic document at a desired position in a desired format. For example, the processor 302 determines a sentence being edited by the user in the electronic document by identifying the cursor and generates the citation reference in a first format in the sentence being edited by the user, referred to as an in-line citation. For an in-line citation reference, the first format may be a short form including only the author's name and year of the citation, as shown in Figures 5 and 6. The processor 302 may also generate a hyperlink in the sentence for the in-line citation reference. The hyperlink is associated with the selected citation and takes the user to a webpage that includes, for example, the full content of the citation if the user or a reader clicks on the hyperlink. As another example, the processor 302 generates the citation reference in a second format. Further, the processor 302 inserts the citation reference in the second format at a position in the electronic document designated by the user, for example, at the end of the electronic document as shown in Figures 5 and 6. The formatting style of the in-line citation and the second format may be designated by the user, for example, Chicago format, APA format, MLA format, etc.
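The two reference formats described in paragraph [100] might be sketched as below. This is an illustrative assumption only: the `Citation` fields, the function names and the exact punctuation are hypothetical, the sample values are placeholders rather than a real publication, and the output does not claim to follow the Chicago, APA or MLA rules.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    # Hypothetical record; field names are not taken from the specification.
    author: str
    year: int
    title: str
    journal: str


def inline_reference(citation: Citation) -> str:
    """First format: short form with author name and year only."""
    return f"({citation.author}, {citation.year})"


def full_reference(citation: Citation) -> str:
    """Second format: longer entry, e.g. for a list at the end of the document."""
    return f"{citation.author} ({citation.year}). {citation.title}. {citation.journal}."


# Placeholder values, not a real publication.
selected = Citation(author="Author", year=2000,
                    title="Placeholder title", journal="Placeholder Journal")

print(inline_reference(selected))
print(full_reference(selected))
```

Keeping the two formatters separate mirrors the text, where the in-line form goes into the sentence being edited and the second form goes to a position designated by the user.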

[101] Figure 7 illustrates an exemplary system 700 in which the various embodiments are implemented in accordance with the present invention. In Figure 7, the computing device 102 used by the user to edit the electronic document and the communication network 104 are the same as those in the system 100 shown in Figure 1. The system 700 differs from the system 100 in that the system 700 includes a backend platform 720 to perform the method steps performed by the computer system 106 in Figure 1, particularly when many (for example, thousands or even hundreds of thousands of) users are using the system 700 simultaneously. The backend platform 720 includes a load balancer 701 to route communications between the computing device 102 and the backend platform 720. The load balancer 701 balances the communications between application servers 702 to avoid overload or underload on some of the application servers 702. Each of the application servers 702 provides services for a certain group of users. For example, an application server 702 serves users No. 1 to 500, while another application server 702 serves users No. 501 to 1000. Each of the application servers 702 performs one or more of the method steps performed by the computer system 106. For example, the application servers 702 may simply obtain the electronic documents from the computing devices 102. In order to reduce the computing burden of the application servers 702, some of the method steps described with reference to Figure 2 are distributed by the application servers 702 to other servers 110, 706, 708. In order to distribute the execution of these method steps, each of the application servers 702 is connected to load balancers 703, 705, 707 to process different services.

[102] The load balancer 703 balances search queries (for example, query_1 or query_2) between multiple search facilities 110. For example, if a search query determined at the application server 702 needs to be performed, the application server 702 sends the search query to the load balancer 703. The load balancer 703 determines one of the search facilities 110 associated therewith to perform the search query. The search results are then sent from the search facility 110 that executes the search query back to the application server 702.
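The routing described in paragraphs [101] and [102] could be sketched as follows. The group size of 500 users comes from the example in the text; everything else is an assumption made for illustration, including the function and class names, the backend labels and the round-robin policy, since the specification does not say how a load balancer chooses a server.

```python
from itertools import cycle

USERS_PER_SERVER = 500  # group size taken from the example in the text


def application_server_for(user_no: int) -> int:
    """Map a 1-based user number to a 0-based application server index."""
    if user_no < 1:
        raise ValueError("user numbers start at 1")
    return (user_no - 1) // USERS_PER_SERVER


class RoundRobinBalancer:
    """Hands out backends in turn; round robin is an assumed policy."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = cycle(backends)

    def pick(self):
        return next(self._backends)


# Hypothetical labels standing in for two search facilities 110.
balancer_703 = RoundRobinBalancer(["search-facility-a", "search-facility-b"])
picks = [balancer_703.pick() for _ in range(3)]
print(application_server_for(500), application_server_for(501), picks)
```

The same balancer class could stand in for load balancers 705 and 707 by passing NLP servers or contextual synonym engine servers as the backends.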

[103] The load balancer 705 balances NLP requests between multiple NLP servers 706. For example, when the application server 702 needs to determine a set of initial sentence phrases from the editing sentence, the application server 702 sends an NLP request to the load balancer 705. The load balancer 705 determines one of the NLP servers 706 to respond to the NLP request. The NLP server 706 determined by the load balancer 705 performs the Stanford CoreNLP language processing to determine the set of initial sentence phrases and sends the set of initial sentence phrases back to the application server 702.

[104] The load balancer 707 balances contextual synonym requests between multiple contextual synonym engine servers 708. For example, when the application server 702 needs to determine a set of document phrase synonyms, the application server 702 sends a contextual synonym request to the load balancer 707.

[105] The contextual synonym request includes:

a phrase for which a synonym is to be determined (or an index identifying the phrase, for example, the index for the set of initial document phrases as shown in Table 2 to identify the phrase),

a sentence in the electronic document which contains the phrase,

a sentence prior to the sentence containing the phrase (if available), and

a sentence after the sentence containing the phrase (if available).
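The request fields listed in paragraph [105] map naturally onto a small record type. The sketch below is an assumption about how such a request might be assembled: the class name, field names and `build_request` helper are hypothetical, and the sample sentences are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ContextualSynonymRequest:
    phrase: str                              # phrase needing synonyms
    sentence: str                            # sentence containing the phrase
    previous_sentence: Optional[str] = None  # omitted when not available
    next_sentence: Optional[str] = None      # omitted when not available


def build_request(phrase: str, sentences: Sequence[str],
                  index: int) -> ContextualSynonymRequest:
    """Build a request for the phrase found in sentences[index]."""
    previous = sentences[index - 1] if index > 0 else None
    following = sentences[index + 1] if index + 1 < len(sentences) else None
    return ContextualSynonymRequest(phrase, sentences[index], previous, following)


# Invented sample text, used only to exercise the helper.
sentences = [
    "Penguins are flightless birds.",
    "The little blue penguin is the smallest penguin species.",
]
request = build_request("little blue penguin", sentences, 1)
print(request)
```

Here the phrase sits in the last sentence, so `next_sentence` stays `None`, matching the "if available" wording of the list.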

[106] The load balancer 707 receives the contextual synonym request and determines one of the contextual synonym engine servers 708 to respond to the contextual synonym request. The load balancer 707 sends the contextual synonym request to the contextual synonym engine server 708 determined.

[107] The contextual synonym engine server 708 applies machine learning/natural language processing modelling to the full-text articles in a document database (for example, database 1102) and content within the "Natural Language Processing Corpora" to build a model to determine the synonym(s) of the phrase based on contextual information included in the contextual synonym request. The contextual synonym engine server 708 generates a list of synonyms for the phrase within the context. The contextual synonym engine server 708 sends the list of synonyms (it is possible that there is no synonym for the phrase) to the application server 702. The "Natural Language Processing Corpora" may include the Wikipedia database. These corpora may be extended to include other natural language content sources without departing from the scope of the invention.

[108] As described above, doc_phrase_stoplist and sentence_phrase_stoplist variables are used to filter the set of document phrase synonyms shown in Table 3 and the set of sentence phrase synonyms as shown in Table 5, respectively. In the present invention, sentence_phrase_stoplist and doc_phrase_stoplist variables can be updated at a batch auto search server 109 in the backend platform 720.

[109] A table called aggregate_doc_phrases is maintained at the batch auto search server 109. The batch auto search server 109 stores in the aggregate_doc_phrases table the running count of the phrases in the set of document phrase synonyms and their frequencies across all full-text articles in document databases, for example, database 1102. The aggregate_doc_phrases table therefore provides an aggregate view of the phrases across many articles in the database 1102 in order to determine the most common phrases on a document level, which are added to the doc_phrase_stoplist.

[110] Similarly, a table called aggregate_sentence_phrases is maintained at the batch auto search server 109 as well. The batch auto search server 109 stores in the aggregate_sentence_phrases table the running count of the phrases in the set of sentence phrase synonyms and their frequencies across all full-text articles in the database 1102. This table therefore provides an aggregate view of the phrases across many articles in order to determine the most common phrases on a sentence level, which may be added to the sentence_phrase_stoplist.

[111] The process is as follows:

1. The auto search server 109, for each article in the database 1102, determines the set of document phrase synonyms for the article using the method as described above, and adds the phrases in the set of document phrase synonyms to the aggregate_doc_phrases table. For each phrase, the auto search server 109 adds the frequency count to the aggregate_doc_phrases table for that phrase.

2. The auto search server 109, for each sentence in the article, determines the nouns and the set of sentence phrase synonyms using the method described above. The auto search server 109 adds the phrases in the set of sentence phrase synonyms to the aggregate_sentence_phrases table. For each phrase, the auto search server 109 adds the frequency count to the aggregate_sentence_phrases table for that phrase.
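The running counts kept in the two aggregate tables can be sketched with Python's `collections.Counter`. This is a minimal illustration that assumes the per-article phrase frequencies have already been computed by the phrase extraction described earlier; the helper name `add_article` and the sample phrases and counts are invented.

```python
from collections import Counter

# In-memory stand-in for the aggregate_doc_phrases table.
aggregate_doc_phrases = Counter()


def add_article(table: Counter, phrase_counts: dict) -> None:
    """Add one article's phrase frequencies to the running totals."""
    table.update(phrase_counts)  # Counter.update adds to existing counts


# Invented per-article frequencies for two articles.
add_article(aggregate_doc_phrases, {"little penguin": 3, "results": 5})
add_article(aggregate_doc_phrases, {"results": 4, "fairy penguin": 1})

print(aggregate_doc_phrases.most_common(1))
```

The same helper would serve an aggregate_sentence_phrases counter fed with per-sentence frequencies.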

[112] After many articles are analyzed, the aggregate_doc_phrases and aggregate_sentence_phrases tables are constructed for the most common phrases on a document level and on a sentence level, and those phrases that are deemed "too common" or "generic" or "process oriented" are added to the doc_phrase_stoplist and sentence_phrase_stoplist, respectively.
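One way to turn an aggregate table into a stoplist is a simple count threshold, sketched below. The threshold rule is an assumption: the specification does not say how a phrase is deemed too common, generic or process oriented, and the function name, sample counts and cut-off value are invented for illustration.

```python
from collections import Counter


def build_stoplist(aggregate: Counter, min_count: int) -> set:
    """Return the phrases whose aggregate count reaches min_count."""
    return {phrase for phrase, count in aggregate.items() if count >= min_count}


# Invented aggregate counts standing in for aggregate_doc_phrases.
aggregate = Counter({"results": 9000, "method": 7500,
                     "little penguin": 12, "fairy penguin": 4})

doc_phrase_stoplist = build_stoplist(aggregate, min_count=1000)  # assumed cut-off
print(sorted(doc_phrase_stoplist))
```

A judgement such as "process oriented" is unlikely to follow from counts alone, so in practice the thresholded list might only be a starting point for review.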

[113] As described above, the automatic generation of the list of citations eliminates the need for manual intervention by the user in searching for relevant citations to be incorporated in the electronic document. Further, the flexibility of importing the citation in the electronic document at a desired location in a desired format enables a user to follow specific rules related to citation without hassle. Therefore, the efficiency in editing the electronic document is dramatically improved.

[114] It should be understood that the techniques of the present disclosure might be implemented using a variety of technologies. For example, the methods described herein may be implemented by a series of computer executable instructions residing on a suitable computer readable medium. Suitable computer readable media may include volatile (e.g. RAM) and/or non-volatile (e.g. ROM, disk) memory, carrier waves and transmission media. Exemplary carrier waves may take the form of electrical, electromagnetic or optical signals conveying digital data streams along a local network or a publicly accessible network such as the Internet.

[115] It should also be understood that, unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "controlling" or "obtaining" or "computing" or "storing" or "receiving" or "determining" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that processes and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.