Title:
AUTOMATIC QUERY
Document Type and Number:
WIPO Patent Application WO/2017/048277
Kind Code:
A1
Abstract:
An example method is provided in accordance with one implementation of the present disclosure. The method comprises analyzing, by a processor, a text corpus having terms. The method also comprises automatically organizing, by the processor, selected terms into a query including a disjunction or a conjunction of a plurality of disjunctions by: identifying a new term, automatically determining whether to place the new term into a new disjunction of the query or to add it to an existing disjunction of the query, and automatically determining into which existing disjunction to place the new term when the new term is placed into an existing disjunction.

Inventors:
FORMAN GEORGE (US)
SHAIN OLGA (IL)
Application Number:
PCT/US2015/050937
Publication Date:
March 23, 2017
Filing Date:
September 18, 2015
Assignee:
HEWLETT PACKARD ENTPR DEV LP (US)
International Classes:
G06F17/30; G06F17/27
Foreign References:
US20090150354A1 2009-06-11
US20090319498A1 2009-12-24
US6405190B1 2002-06-11
KR20110020117A 2011-03-02
US20110066620A1 2011-03-17
Attorney, Agent or Firm:
KIRCHEV, Ivan T. et al. (US)
Claims:
What is claimed is:

1. A method comprising:

analyzing, by a processor, a text corpus having terms; and

automatically organizing, by the processor, selected terms into a query including a disjunction or a conjunction of a plurality of disjunctions by:

identifying a new term,

automatically determining whether to place the new term into a new disjunction of the query or to add it to an existing disjunction of the query, and

automatically determining into which existing disjunction to place the new term when the new term is placed into an existing disjunction.

2. The method of claim 1, further comprising, by the processor:

determining whether the new term appears in the query or it does not appear in the corpus of terms;

ignoring the new term if the term appears in the query or it does not appear in the corpus of terms;

determining whether the query is empty or no query exists; and

automatically generating a query with a single disjunction that includes the new term when the query is empty.

3. The method of claim 1, further comprising, by the processor:

building a proximity model for the new term based on the corpus, wherein the proximity model is based on terms in the corpus that are in proximity to the new term;

determining a similarity score for each disjunction in the query, the similarity score indicating the similarity of the new term to the disjunction; and

identifying the disjunction having a maximum similarity to the new term.

4. The method of claim 3, wherein determining a similarity score for each disjunction in the query comprises, by the processor:

computing a similarity measure of the proximity model of the new term and a proximity model of each term already included in the disjunction;

identifying the highest similarity measure computed across all terms of the disjunction; and

using the highest similarity measure as the similarity score for the disjunction.

5. The method of claim 3, wherein determining a similarity score for each disjunction in the query comprises, by the processor:

building a proximity model for the entire disjunction by merging all the terms in the disjunction together;

computing a similarity measure by comparing the proximity model of the new term and the proximity model of the entire disjunction to determine the similarity score for the disjunction.

6. The method of claim 3, further comprising, by the processor:

comparing the similarity score of the disjunction having the maximum similarity with a threshold;

placing the new term into a new disjunction when the similarity score of the disjunction having the maximum similarity to the new term is less than the threshold; and

adding the new term to the disjunction having the maximum similarity to the new term when the similarity score is greater than the threshold.

7. The method of claim 6, further comprising, by the processor:

receiving, via a graphical user interface (GUI), an indication that the disjunction location of the new term is improper; and

automatically determining whether to place the new term into a new disjunction of the query or whether to add the new term to an existing disjunction while excluding the improper disjunction location of the new term.

8. The method of claim 7, further comprising, by the processor:

disabling a disjunction of the query via the GUI; and

enabling a disjunction of the query via the GUI.

9. A system comprising:

an analysis engine to analyze a text corpus having terms;

a query engine to automatically organize selected terms into a query including a disjunction or a conjunction of a plurality of disjunctions, wherein the query engine is to:

identify a new term,

automatically determine whether to place the new term into a new disjunction of the query or to add it to an existing disjunction of the query,

automatically determine into which existing disjunction to place the new term when the new term is placed into an existing disjunction, and

automatically generate a query with a single disjunction that includes the new term when the query is empty.

10. The system of claim 9, wherein the query engine is further to:

build a proximity model for the new term based on the corpus, wherein the proximity model is based on terms in the corpus that are in proximity to the new term;

determine a similarity score for each disjunction in the query, the similarity score indicating the similarity of the new term to the disjunction; and

identify the disjunction having a maximum similarity to the new term.

11. The system of claim 10, wherein the query engine is further to:

compare the similarity score of the disjunction having the maximum similarity with a threshold;

place the new term into a new disjunction when the similarity score of the disjunction having the maximum similarity to the new term is less than the threshold; and

add the new term to the disjunction having the maximum similarity to the new term when the similarity score is greater than the threshold.

12. The system of claim 11, wherein the query engine is further to:

determine, based on an input via a graphical user interface, that the disjunction location of the new term is improper; and

automatically determine whether to place the new term into a new disjunction of the query or whether to add it to an existing disjunction while excluding the improper disjunction location of the new term.

13. A non-transitory machine-readable storage medium encoded with instructions executable by a processor, the machine-readable storage medium comprising instructions to:

analyze a text corpus having terms; and

automatically organize selected terms into a query including a disjunction or a conjunction of a plurality of disjunctions, the instructions to

identify a new term,

ignore the new term if the term appears in the query or it does not appear in the corpus of terms,

automatically determine whether to place the new term into a new disjunction of the query or to add it to an existing disjunction of the query, and

automatically determine into which existing disjunction to place the new term when the new term is placed into an existing disjunction.

14. The non-transitory machine-readable storage medium of claim 13, further comprising instructions to:

build a proximity model for the new term based on the corpus, wherein the proximity model is based on terms in the corpus that are in proximity to the new term;

determine a similarity score for each disjunction in the query, the similarity score indicating the similarity of the new term to the disjunction; and

identify the disjunction having a maximum similarity to the new term.

15. The non-transitory machine-readable storage medium of claim 14, further comprising instructions to:

compare the similarity score of the disjunction having the maximum similarity with a threshold;

place the new term into a new disjunction when the similarity score of the disjunction having the maximum similarity to the new term is less than the threshold; and

add the new term to the disjunction having the maximum similarity to the new term when the similarity score is greater than the threshold.

Description:
AUTOMATIC QUERY

[0001] A variety of analytic tasks may be performed on data, and the results may be provided to a user. The analytics tasks may include creating and running queries, clustering, pattern detection, classification, and others.

BRIEF DESCRIPTION OF THE DRAWINGS

[0002] Figure 1 is a schematic illustration of an example system for automatically organizing selected terms from a text corpus into a query that includes a disjunction or a conjunction of a plurality of disjunctions in accordance with an implementation of the present disclosure.

[0003] Figure 2 illustrates a flowchart showing an example of a method for automatically organizing selected terms from a text corpus into a query that includes a disjunction or a conjunction of a plurality of disjunctions in accordance with an implementation of the present disclosure.

[0004] Figures 3 and 4 illustrate a flowchart showing an example of a method for automatically determining whether to place a new selected term into a new disjunction of the query or to add it to a specific existing disjunction in accordance with an implementation of the present disclosure.

[0005] Figure 5 is an example table including a plurality of input sequences of terms and the corresponding queries that are automatically generated based on the inputted terms in accordance with an implementation of the present disclosure.

[0006] Figure 6 is an example block diagram illustrating a computer-readable medium in accordance with an implementation of the present disclosure.

DETAILED DESCRIPTION OF SPECIFIC EXAMPLES

[0007] Many entities (e.g., enterprises, organizations) utilize databases for storage of data relating to the entities. For example, a business may maintain a database of customer information, and the customer information may be accessed by querying the database. Further, entities may generally store vast amounts of data originating from their business, including operations data, customer feedback data, financial data, Human Resource data, and so forth. Data stored in these databases may be accessed and updated for various purposes.

[0008] As described above, data stored in a database may be accessed and updated for various purposes. Quite often, a relatively large volume of data is searched for purposes of identifying and retrieving the closest matches to a search query. For example, the data may be searched for patterns or for various other purposes, such as classification, pattern detection, modeling and anomaly detection, as examples.

[0009] There is a huge potential in mining such data for insights. At present, exploring this potential typically requires lengthy and expensive analytics projects, involving subject matter experts (SMEs), data scientists, IT personnel and developers. Generally, enormous time and resources are spent for the SMEs, data scientists and developers to communicate, transfer knowledge, identify relevant problems, and design appropriate solutions. Even creating a search query (e.g., a Boolean query) generally involves SME involvement and manual selection.

[0010] For example, in order to create a query, a user may utilize a Graphical User Interface ("GUI") web analytics tool. The most common GUI tool simply allows a user to type terms for the query and to connect the terms into the desired query. Another GUI tool may show a collection of potentially interesting terms and mode selector buttons (e.g., "and," "or"). In that GUI tool, selecting multiple terms may create a conjunction among the terms. Additionally, if the user clicks the mode selector button so that it toggles to the "OR" mode, the multiple selected terms may form a disjunction. In any event, the user is required to make the analytical decisions regarding the terms to be included in the query and to create the query manually. As used herein, "term" refers to an individual word, number, phrase, or words with special symbols that may be used in a query. As used herein, the term "disjunction" refers to an "OR" query among a set of terms and the term "conjunction" refers to an "AND" query among its sub-parts (e.g., a set of terms, set of disjunctions) or a single term/disjunction.

[0011] For a user, it can be confusing and annoying to select multiple terms for a query and have the wrong mode applied to the actual query. For example, if the user selects the terms line, lines, linea, and lineas, although he or she may be thinking of them as synonyms, they will likely receive zero search results if the mode selector happens to be in "AND" mode. Similarly, if the user selects the terms crack and screen, the user may be confused about why seemingly unrelated cases appear (e.g. "network problem causes blue screen" or "user victim of password crack attempt") until they figure out that the system unintentionally performed an OR-mode query when it should have done an AND-mode query. Furthermore, existing GUI tools may require substantial user involvement and analysis to produce a mixture of a conjunction of a plurality of disjunctions.

[0012] Therefore, existing tools and techniques for creating queries may not be sufficient to automatically create a query by determining whether to place a new term into a new disjunction of the query or to add it to a specific existing disjunction. In this regard, according to examples, computer-implemented systems and techniques for automatically organizing selected terms from a text corpus into a query that includes a disjunction or a conjunction of a plurality of disjunctions are disclosed herein. Such query representation is known in the art as "conjunctive normal form." A conjunctive normal form query also includes the case of a simple conjunction of terms: it is a conjunction of disjunctions, where each disjunction involves just a single term. Further, a disjunction of a single term is just the term itself (i.e., the records selected by that term). In one example, the proposed techniques may identify a new term (e.g., selected by a user via a GUI) from a text corpus and may automatically determine whether to place the new term into a new disjunction of the query or to add it to a particular existing disjunction.
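As an illustration only, a conjunctive normal form query of the kind described above can be modeled as a list of disjunctions, each of which is a list of terms. The following Python sketch is not part of the disclosure; the `Query` layout and the `render_query` helper are assumptions introduced here to show how the single-term and multi-term cases read as Boolean strings.

```python
from typing import List

# Hypothetical representation: a query is a list of disjunctions,
# and each disjunction is a list of terms.
Query = List[List[str]]


def render_query(query: Query) -> str:
    """Render a conjunctive normal form query as a Boolean string."""
    parts = []
    for disjunction in query:
        if len(disjunction) == 1:
            # A disjunction of a single term is just the term itself.
            parts.append(disjunction[0])
        else:
            parts.append("(" + " OR ".join(disjunction) + ")")
    return " AND ".join(parts)


# One disjunction of synonyms.
synonyms = render_query([["line", "lines", "linea", "lineas"]])
# A simple conjunction: every disjunction holds a single term.
simple_conjunction = render_query([["crack"], ["screen"]])
# A conjunction of two multi-term disjunctions.
mixed = render_query([["screen", "display"], ["crack", "cracked"]])
```

Under this layout, a simple conjunction of terms needs no special handling, because it is the case in which every inner list has length one.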

[0013] Further, while existing query tools and techniques require extensive analytical and physical participation from a user, the proposed techniques are directed towards an automated system and a method for creating queries. For example, with the GUI web analytics tool described above the user can select either AND-mode for all words or OR-mode for all words. Thus, it is very easy for a user to make an error when using interfaces that offer a modality option between AND/OR. This requires an extra step to set the tool correctly, and it can be confusing and irritating when the tool is incorrectly set. In many cases, users don't think about the modes and they often jump straight from selection of words to the results with a different mode in their head than the one that was executed. In contrast, the proposed techniques allow a user to select multiple terms together in any order to produce a query in conjunctive normal form.

[0014] In one example, a processor may analyze a text corpus having terms and may automatically organize selected terms into a query including a disjunction or a conjunction of a plurality of disjunctions. The processor may organize selected terms into a query by identifying a new term, automatically determining whether to place the new term into a new disjunction of the query or to add it to an existing disjunction of the query, and automatically determining into which existing disjunction to place the new term when the term is placed into an existing disjunction.

[0015] In the following detailed description, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific examples in which the disclosed subject matter may be practiced. It is to be understood that other examples may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of "including," "comprising" or "having" and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. Furthermore, the term "based on," as used herein, means "based at least in part on." It should also be noted that a plurality of hardware and software based devices, as well as a plurality of different structural components may be used to implement the disclosed methods and devices.

[0016] Referring now to the figures, Figure 1 is a schematic illustration of an example system 10 for automatically organizing selected terms from a text corpus into a query that includes a disjunction or a conjunction of a plurality of disjunctions. The illustrated system 10 is capable of carrying out the techniques described below. As shown in Figure 1, the system 10 is depicted as including at least one computing device 100. In the embodiment of Figure 1, computing device 100 includes a processor 102, an interface 106, and a machine-readable storage medium 110. Although only computing device 100 is described in detail below, the techniques described herein may be performed with several computing devices or by engines distributed on different devices.

[0017] In one example, the computing device 100 (or another computing device) may communicate with a text corpus 150 and with an interactive user interface 160 (e.g., graphical user interface). The text corpus 150 may include different types of data (i.e., plurality of terms) organized in documents, files, etc. The data in the text corpus 150 may include text-like data, categorical data, numerical data, structured data, unstructured data, or any other type of data.

[0018] The computing device 100 may be any type of a computing device and may include at least engines 120-130. In one implementation, the computing device 100 may be an independent computing device. Engines 120-130 may or may not be part of the machine-readable storage medium 110. In another alternative example, engines 120-130 may be distributed between the computing device 100 and other computing devices. The computing device 100 may include additional components and some of the components depicted therein may be removed and/or modified without departing from a scope of the system that allows for carrying out the functionality described herein. It is to be understood that the operations described as being performed by the engines 120-130 of the computing device 100 that are related to this description may, in some implementations, be performed by external engines (not shown) or distributed between the engines of the computing device 100 and other electronic/computing devices.

[0019] Processor 102 may be central processing unit(s) (CPUs), microprocessor(s), and/or other hardware device(s) suitable for retrieval and execution of instructions (not shown) stored in machine-readable storage medium 110. Processor 102 may fetch, decode, and execute instructions to identify different groups in a dataset. As an alternative or in addition to retrieving and executing instructions, processor 102 may include electronic circuits comprising a number of electronic components for performing the functionality of instructions.

[0020] Interface 106 may include a number of electronic components for communicating with various devices. For example, interface 106 may be an Ethernet interface, a Universal Serial Bus (USB) interface, an IEEE 1394 (Firewire) interface, an external Serial Advanced Technology Attachment (eSATA) interface, or any other physical connection interface suitable for communication with the computing device. Alternatively, interface 106 may be a wireless interface, such as a wireless local area network (WLAN) interface or a near-field communication (NFC) interface that is used to connect with other devices/systems and/or to a network. The user interface 160 and the computing device 100 may be connected via a network. In one example, the network may be a mesh sensor network (not shown). The network may include any suitable type or configuration of network to allow for communication between the computing device 100, the user interface 160, and any other devices/systems (e.g., other computing devices, displays), for example, to send and receive data to and from a corresponding interface of another device.

[0021] Each of the engines 120-130 may include, for example, a hardware device including electronic circuitry for implementing the functionality described below, such as control logic and/or memory. In addition or as an alternative, the engines 120-130 may be implemented as any combination of hardware and software to implement the functionalities of the engines. For example, the hardware may be a processor and the software may be a series of instructions or microcode encoded on a machine-readable storage medium and executable by the processor. Therefore, as used herein, an engine may include program code (e.g., computer executable instructions), hardware, firmware, and/or logic, or combination thereof to perform particular actions, tasks, and functions described in more detail herein in reference to Figures 2-6.

[0022] In one example, the analysis engine 120 may analyze a text corpus (e.g., corpus 150) having a plurality of terms. As noted above, the text corpus may include terms organized in documents, files, etc. Various techniques may be used to analyze the text corpus.

[0023] The query engine 130 may automatically organize selected terms into a query including a disjunction or a conjunction of a plurality of disjunctions. For example, the query engine 130 may identify a new term. Various techniques may be used to identify a new term - detecting user's input (e.g., typing a term, clicking on a term, highlighting a term), detecting an automatic selection of a term, etc. In addition, the query engine 130 may automatically determine whether to place the new term into a new disjunction of the query or to add it to an existing disjunction of the query. In other words, the query engine 130 evaluates the new term and determines the best placement for that new term. The engine 130 may automatically determine into which existing disjunction to place the new term when the new term is placed into an existing disjunction. In one example, the query engine 130 may automatically generate a query with a single disjunction that includes the new term when the query is empty. In other words, when the engine 130 determines that the query does not include any existing terms (i.e., is empty), the engine places the new term into a new disjunction that now forms the query.

[0024] Figure 2 illustrates a flowchart showing an example of a method 200 for automatically organizing selected terms from a text corpus into a query that includes a disjunction or a conjunction of a plurality of disjunctions. Although execution of the method 200 is described below with reference to the system 10, the components for executing the method 200 may be spread among multiple devices/systems. The method 200 may be implemented in the form of executable instructions stored on a machine-readable storage medium, and/or in the form of electronic circuitry.

[0025] In one example, the method 200 can be executed by at least one processor (e.g., processor 102 of device 100). In other examples, the method may be executed by another processor in communication with the system 10. Various elements or blocks described herein with respect to the method 200 are capable of being executed simultaneously, in parallel, or in an order that differs from the illustrated serial manner of execution. The method 200 is also capable of being executed using additional or fewer elements than are shown in the illustrated examples.

[0026] The method 200 begins at 210, where a processor may analyze a text corpus (e.g., corpus 150) having terms. Various techniques may be used to analyze the text corpus.

[0027] At 220, the processor may automatically organize selected terms into a query that includes a disjunction or a conjunction of a plurality of disjunctions (i.e., a conjunctive normal form query). Therefore, the proposed method does not require extensive participation from a user but simply allows a user to select multiple terms together in any order to produce a query including a conjunction of a plurality of disjunctions.

[0028] Automatically organizing the selected terms into a conjunctive normal form query may involve the following steps. At 230, the processor may identify a new term. In some examples, the new term may be introduced by a user and the processor may detect the user's input (e.g., typing, clicking, highlighting). In another example, the new term may be introduced by the system.

[0029] At 240, the processor may automatically determine whether to place the new term into a new disjunction of the query or to add it to an existing disjunction of the query. In other words, the processor evaluates the new term and determines the most appropriate place in the query for that new term. At 250, the processor may automatically determine into which existing disjunction to place the new term when the new term is placed into an existing disjunction. Thus, the proposed techniques may create a very specific and useful query based on the text corpus. The techniques used to determine whether to place the new term into a new disjunction of the query or to add it to a specific existing disjunction are described in additional detail below in relation to Figures 3-5.

[0030] Figures 3 and 4 illustrate a flowchart showing an example of a method 300 for automatically determining whether to place a new selected term into a new disjunction of the query or to add it to a specific existing disjunction. The method shown in Figures 3-4 will be described with reference to Figure 5, which shows an example table 500 including a plurality of input sequences 520 of terms and the corresponding queries 530 that are automatically generated based on the inputted terms.

[0031] Although execution of the method 300 is described below with reference to the system 10, the components for executing the method 300 may be spread among multiple devices/systems. The method 300 may be implemented in the form of executable instructions stored on a machine-readable storage medium, and/or in the form of electronic circuitry. In one example, the method 300 can be executed by at least one processor of a computing device (e.g., processor 102 of device 100).

[0032] The method 300 begins at 310, where a processor may determine whether the new term appears in the query or it does not appear in the corpus of terms. In other words, a processor may evaluate the incoming term in view of the existing query (if any). If the term appears in the query or it does not appear in the corpus of terms, the processor may ignore the new term (at 320). Thus, if the new term already appears in the query or if the term is not found in the corpus of terms, the processor may discard the term, rather than placing it into a query.

[0033] Alternatively, when the new term does not appear in the query and it appears in the corpus of terms, the processor may determine whether the existing evaluated query is empty or whether a query exists at all (at 330). If the processor determines that the query is empty or no query exists, the processor may automatically generate a query with a single disjunction that includes the new term (at 340). In one example, the processor may initially receive an initial conjunctive normal form query (which at first is an empty query), prior to identifying any selected terms to be included in the query. For example, as shown in row 540 of the table in Figure 5, when the new term is "line" and there are no other terms in the query, the return query includes one disjunction with the term "line" in it. If, on the other hand, the processor determines that there is an existing query that is not empty (e.g., the query includes at least one existing term), the processor takes additional steps to evaluate the new term in relation to the query and the corpus of terms to determine where exactly to place the new term in the query.
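The checks at 310 through 340 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the disclosed implementation: the function name `handle_trivial_cases`, the list-of-lists query layout, and the representation of the corpus as a set of terms are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical layout: a query is a list of disjunctions (lists of terms).
Query = List[List[str]]


def handle_trivial_cases(
    term: str, query: Optional[Query], corpus_terms: Set[str]
) -> Tuple[Query, bool]:
    """Apply blocks 310-340.

    Returns (query, handled). When handled is False, the query is
    non-empty and the term still needs to be placed (block 350 onward).
    """
    current: Query = query if query is not None else []  # no query == empty query
    already_in_query = any(term in disjunction for disjunction in current)
    if already_in_query or term not in corpus_terms:
        return current, True  # block 320: ignore the term
    if not current:
        return [[term]], True  # block 340: single disjunction with the new term
    return current, False  # non-empty query: further analysis required


corpus = {"line", "lines", "screen"}
first, first_handled = handle_trivial_cases("line", None, corpus)
repeat, repeat_handled = handle_trivial_cases("line", first, corpus)
pending, pending_handled = handle_trivial_cases("lines", first, corpus)
```

In this sketch the first call corresponds to row 540 of Figure 5: the query is empty, so the result is a single disjunction containing "line".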

[0034] With continued reference to Figures 3-5, the processor may build a proximity model for the new term based on the corpus, where the proximity model is based on terms in the corpus that are in proximity to the new term (at 350). The goal of the proximity model is to help the processor to decide whether the new term is similar to one of the existing terms in the query (at this point, the processor knows that the query is not empty). In one example, the processor may build a term count histogram of the terms in the corpus that appear near the new term (e.g., terms within a window of predetermined distance (within 2 terms, 3 terms, etc.), terms within the same sentence/paragraph/document, etc.). Thus, the processor may build/train a model on the corpus to determine the usual terms that appear around the new term. In one implementation, the proximity model may be reduced to a normalized form (e.g., by dividing the count for each term by the Euclidean vector length). Thus, each proximity model can return a number for each proximal term and compare it with the corresponding number for the same term from another model. In some examples, the proximity model may be scaled (e.g. by Inverse Document Frequency or a similar technique).
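One possible reading of the proximity model described above is sketched below in Python. The sketch assumes the corpus is already tokenized into lists of terms, uses a fixed window measured in terms, and normalizes by the Euclidean vector length; the function name `build_proximity_model` and its parameters are hypothetical, and the optional Inverse Document Frequency scaling is omitted.

```python
import math
from collections import Counter
from typing import Dict, List


def build_proximity_model(
    term: str, documents: List[List[str]], window: int = 2
) -> Dict[str, float]:
    """Count the terms within `window` positions of each occurrence of
    `term`, then divide each count by the Euclidean length of the
    resulting count vector."""
    counts: Counter = Counter()
    for tokens in documents:
        for i, token in enumerate(tokens):
            if token != term:
                continue
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:  # skip the occurrence itself
                    counts[tokens[j]] += 1
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0.0:
        return {}  # the term never occurs, or has no neighbors
    return {t: c / norm for t, c in counts.items()}


# Tiny illustrative corpus (made up for this sketch).
docs = [
    "the screen is cracked".split(),
    "my screen is broken".split(),
]
model = build_proximity_model("screen", docs, window=1)
```

With a window of one term, the neighbors of "screen" in this toy corpus are "the", "is" (twice), and "my", so "is" receives the largest normalized weight.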

[0035] At 360, the processor may determine a similarity score for each disjunction in the query, where the similarity score indicates the similarity of the new term to the disjunction ("each" may be just one if the query currently includes only a single disjunction). Various techniques may be used to determine the similarity score for a single disjunction. In one example, the processor may: a) compute a similarity measure of the proximity model of the new term and a proximity model of each term already included in the disjunction (i.e., the processor compares the similarity of their proximity models); b) identify the highest similarity measure computed across all terms of the disjunction; c) use the highest similarity measure as the similarity score for the disjunction. In another implementation, the processor may compute the similarity of the new term with a proximity model for the entire disjunction. For example, the processor may build a proximity model for the entire disjunction by merging all the terms in the disjunction together (e.g., the processor may treat occurrences of "screen" and "display" as identical and may include terms in proximity to either one). Then, the processor may compute a similarity measure comparing the proximity model of the new term and the proximity model of the entire disjunction to determine the similarity score for the disjunction.

[0036] When comparing two proximity models, various techniques may be used to compute their similarity score. In some examples, the commonplace cosine-similarity measure may be used. In other examples, other forms of similarity between distributions may also be used. Therefore, based on the comparison between the proximity model of the new term and the proximity models of the existing terms in a disjunction, the processor determines the similarity score for each disjunction (e.g., based on the similarity between a term in the disjunction and the new term).
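The two scoring approaches, using cosine similarity as the measure, might be sketched as follows. This Python illustration makes several assumptions that are not in the disclosure: proximity models are plain dictionaries of raw term counts, the function names are hypothetical, and the merged model is approximated by summing the members' counts (which can double-count a neighbor that lies within the window of two member terms).

```python
import math
from typing import Dict, List

# Hypothetical layout: a proximity model maps a nearby term to its count.
Model = Dict[str, float]


def cosine_similarity(a: Model, b: Model) -> float:
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_by_best_member(new_model: Model, member_models: List[Model]) -> float:
    """Steps a) to c): the highest similarity to any term in the disjunction."""
    return max(
        (cosine_similarity(new_model, m) for m in member_models), default=0.0
    )


def score_by_merged_model(new_model: Model, member_models: List[Model]) -> float:
    """Alternative: merge the members into one model and compare once."""
    merged: Model = {}
    for m in member_models:
        for t, w in m.items():
            merged[t] = merged.get(t, 0.0) + w
    return cosine_similarity(new_model, merged)


# Made-up counts for a new term and the two members of one disjunction.
new_term_model = {"cracked": 1.0, "is": 1.0}
members = [{"is": 2.0, "the": 1.0}, {"cracked": 1.0, "is": 1.0}]
best = score_by_best_member(new_term_model, members)
merged_score = score_by_merged_model(new_term_model, members)
```

The two approaches can disagree: the best-member score rewards a close match to any single term, while the merged score reflects the disjunction as a whole.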

[0037] At 370, the processor may identify a disjunction having a maximum similarity to the new term. In one example, the processor may select any disjunction that has the maximum similarity score as the disjunction being most similar to the new term (i.e. the disjunction having a maximum similarity to the new term).

[0038] With continued reference to Figures 3-5, the processor may compare the similarity score of the disjunction having the maximum similarity to the new term with a threshold (at 385). The threshold may be preset, computed by the processor, set by a user, etc. At 390, the processor may place the new term into a new disjunction when the similarity score of the disjunction having the maximum similarity to the new term is less than the threshold. Alternatively, the processor may add the new term to the disjunction having the maximum similarity to the new term when the similarity score is greater than the threshold (at 395). That way, the processor automatically determines into which existing disjunction to place the new term when the new term is placed into an existing disjunction. As shown in row 550 of the table in Figure 5, when the input sequence of terms includes "screen, crack, led, cracked, display, damage" (the sequence of input may be random or an ordered sequence as selected by the user) the query outputted by the processor using the proposed technique may be: (screen OR led OR display) AND (crack OR cracked OR damaged).
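The threshold comparison at 385-395 can be sketched as follows, assuming the query is held as a list of disjunctions (each a list of terms) and that one similarity score per disjunction has already been computed. The names place_term and render, the threshold of 0.5, and the scores are hypothetical. The description does not state what happens when the score equals the threshold; this sketch assumes a new disjunction is created in that case.

```python
def place_term(query, new_term, scores, threshold):
    """Add `new_term` to the most similar disjunction or to a new one.

    `query` is a list of disjunctions (lists of terms); `scores[i]` is the
    similarity score of the new term to `query[i]`.
    """
    if not query:
        query.append([new_term])  # empty query: the term starts the first disjunction
        return query
    # Ties are resolved by taking the first disjunction with the maximum score.
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] > threshold:
        query[best].append(new_term)
    else:
        query.append([new_term])  # assumed to cover the equal-to-threshold case
    return query

def render(query):
    """Format the query as a conjunction of disjunctions."""
    return " AND ".join("(" + " OR ".join(d) + ")" for d in query)

query = [["screen"], ["crack"]]
place_term(query, "display", [0.8, 0.1], threshold=0.5)
place_term(query, "battery", [0.2, 0.1], threshold=0.5)
print(render(query))
```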

[0039] In some implementations, a term may be placed in an incorrect disjunction of the query and a user may prefer to correct that. For example, the processor may receive an indication that the disjunction location of a new term is improper (i.e., the term should not be included in that disjunction). The indication received by the processor may be initiated by a user action (e.g., clicking on the term, highlighting the term). In one example, a term and an improper disjunction location may be identified via a GUI (e.g., interface 160). For example, the user may click a single "fix-it" button which identifies the most recently placed term and its location as being improper. Next (after the processor was notified about an improper disjunction location of the new term), the processor may automatically determine whether to place the new term into a new disjunction of the query or whether to add the new term to an existing disjunction while excluding the improper disjunction location of the new term.

[0040] Thus, the processor may repeat the techniques described in method 300 but may not include the specified improper disjunction in the new selection process. That correction process may be repeated multiple times (i.e., based on multiple user interactions) for a single new term and several disjunctions may be excluded from the selection process (e.g., until the user is satisfied with the disjunction placement of that term). For example, if the technique initially places a new term into a new disjunction and that disjunction is identified as an improper disjunction location, the processor may not place the new term into another new disjunction but may place it only into one of the existing disjunctions.
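The correction process can be sketched as a selection that skips every location already marked improper. The sentinel NEW, the function choose_location, and the scores below are assumptions made for illustration. When a new disjunction has itself been marked improper, the sketch falls back to the best remaining existing disjunction even if its score is below the threshold.

```python
NEW = "new"  # sentinel standing for "place the term into a new disjunction"

def choose_location(scores, threshold, excluded=frozenset()):
    """Return the index of the chosen disjunction, NEW, or None.

    `excluded` holds disjunction indices (and possibly NEW) that the user has
    marked as improper locations for this term.
    """
    candidates = {i: s for i, s in enumerate(scores) if i not in excluded}
    if not candidates:
        return None if NEW in excluded else NEW
    best = max(candidates, key=candidates.get)
    # With a new disjunction ruled out, only existing disjunctions remain.
    if candidates[best] > threshold or NEW in excluded:
        return best
    return NEW

scores = [0.7, 0.6, 0.1]  # hypothetical similarity scores per disjunction
first = choose_location(scores, 0.5)               # initial placement
second = choose_location(scores, 0.5, {0})         # disjunction 0 marked improper
third = choose_location(scores, 0.5, {0, 1})       # only a weak match is left
fourth = choose_location(scores, 0.5, {0, 1, NEW}) # a new disjunction is also ruled out
```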

[0041] In another implementation, a user may wish to "disable" or "enable" an entire disjunction of the query from receiving a new term. For example, the processor may disable a disjunction of the query via the GUI and may enable a disjunction of the query via the GUI (e.g., based on an action by the user, such as clicking on the disjunction or highlighting the disjunction). In addition, the GUI may provide a way to remove terms from the query (e.g., by deselecting a term by clicking a term that is currently selected, or by clicking on a small X icon that appears next to the term in the query). When this happens, the processor may remove the deselected term(s) from the query. Further, the GUI may provide a way for the user to drag a term from place to place in the query (e.g., to fix a mistake or to put the term in a more intuitive order). The user may also utilize the GUI to drag a term to a holding area for future use.

[0042] Figure 6 illustrates a computer 601 and a non-transitory machine-readable medium 605 according to an example. In one example, the computer 601 may be similar to the computing device 100 of the system 10 or may include a plurality of computers. For example, the computer may be a server computer, a workstation computer, a desktop computer, a laptop, a mobile device, or the like, and may be part of a distributed system. The computer may include one or more processors and one or more machine-readable storage media. In one example, the computer may include a user interface (e.g., touch interface, mouse, keyboard, or gesture input device).

[0043] Computer 601 may perform methods 200-300 and variations thereof. Additionally, the functionality implemented by computer 601 may be part of a larger software platform, system, application, or the like. Computer 601 may be connected to a database (not shown) via a network. The network may be any type of communications network, including, but not limited to, wire-based networks (e.g., cable), wireless networks (e.g., cellular, satellite), cellular telecommunications network(s), and IP-based telecommunications network(s) (e.g., Voice over Internet Protocol networks). The network may also include traditional landline or a public switched telephone network (PSTN), or combinations of the foregoing.

[0044] The computer 601 may include a processor 603 and non-transitory machine-readable storage medium 605. The processor 603 (e.g., a central processing unit, a group of distributed processors, a microprocessor, a microcontroller, an application-specific integrated circuit (ASIC), a graphics processor, a multiprocessor, a virtual processor, a cloud processing system, or another suitable controller or programmable device) and the storage medium 605 may be operatively coupled to a bus. Processor 603 can include single or multiple cores on a chip, multiple cores across multiple chips, multiple cores across multiple devices, or combinations thereof.

[0045] The storage medium 605 may include any suitable type, number, and configuration of volatile or non-volatile machine-readable storage media to store instructions and data. Examples of machine-readable storage media include read-only memory ("ROM"), random access memory ("RAM") (e.g., dynamic RAM ["DRAM"], synchronous DRAM ["SDRAM"]), electrically erasable programmable read-only memory ("EEPROM"), magnetoresistive random access memory (MRAM), memristor, flash memory, SD card, floppy disk, compact disc read only memory (CD-ROM), digital video disc read only memory (DVD-ROM), and other suitable magnetic, optical, physical, or electronic memory on which software may be stored.

[0046] Software stored on the non-transitory machine-readable storage media 605 and executed by the processor 603 includes, for example, firmware, applications, program data, filters, rules, program modules, and other executable instructions. The processor 603 retrieves from the machine-readable storage media 605 and executes, among other things, instructions related to the control processes and methods described herein.

[0047] The processor 603 may fetch, decode, and execute instructions 607-611, among others, to implement various processing. As an alternative or in addition to retrieving and executing instructions, processor 603 may include at least one integrated circuit (IC), other control logic, other electronic circuits, or combinations thereof that include a number of electronic components for performing the functionality of instructions 607-611. Accordingly, processor 603 may be implemented across multiple processing units and instructions 607-611 may be implemented by different processing units in different areas of computer 601.

[0048] The instructions 607-611 when executed by processor 603 (e.g., via one processing element or multiple processing elements of the processor) can cause processor 603 to perform processes, for example, methods 200-300, and/or variations and portions thereof. In other examples, the execution of these and other methods may be distributed between the processor 603 and other processors in communication with the processor 603.

[0049] For example, analysis instructions 607 may cause processor 603 to analyze a text corpus having terms. These instructions may function similarly to the techniques described in block 210 of method 200.

[0050] Query instructions 611 may cause the processor 603 to automatically organize selected terms into a query including a disjunction or a conjunction of a plurality of disjunctions. These instructions may function similarly to the techniques described in block 220 of method 200. Further, query instructions 611 may cause the processor 603 to identify a new term (e.g., based on a user's input as explained above), ignore the new term if the term already appears in the query or does not appear in the corpus of terms, automatically determine whether to place the new term into a new disjunction of the query or to add it to an existing disjunction of the query, and automatically determine into which existing disjunction to place the new term when the new term is placed into an existing disjunction.

[0051] Query instructions 611 may also cause the processor 603 to: build a proximity model for the new term based on the corpus, where the proximity model is based on terms in the corpus that are in proximity to the new term; determine a similarity score for each disjunction in the query, the similarity score indicating the similarity of the new term to the disjunction; and identify the disjunction having a maximum similarity to the new term. Query instructions may further cause the processor 603 to: compare the similarity score of the disjunction having the maximum similarity with a threshold; place the new term into a new disjunction when the similarity score of the disjunction having the maximum similarity to the new term is less than the threshold; and add the new term to the disjunction having the maximum similarity to the new term when the similarity score is greater than the threshold. These instructions may function similarly to the techniques described in blocks 230-250 of method 200 and method 300.

[0052] In the foregoing description, numerous details are set forth to provide an understanding of the subject matter disclosed herein. However, implementations may be practiced without some or all of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.