SCATTER-GATHER APPROACH TO FLAT CLUSTERING FOR FREQUENT TEXT BASED SEARCH QUERIES

Document Type and Number:

WIPO Patent Application WO/2020/115530

Kind Code:

A1

Abstract:

This invention applies the scatter-gather approach to flat clustering for frequently used text-based search queries. For a given set of documents, we compute the term-document (or document-term) matrix, which records the frequency of each term in each document of the collection. From this matrix we derive a Term Frequency-Inverse Document Frequency (TF-IDF) matrix, which is used to compute document similarity and to create a flat set of clusters of related documents. In the scatter-gather step, for a particular frequently used text-based search query from a user, we group the clusters of documents that contain relevant information and cluster the resulting set again. This process is repeated until a cluster of interest is found.

Inventors:

SHARMA PRATIK (IN)

Application Number:

PCT/IB2018/059693

Publication Date:

June 11, 2020

Filing Date:

December 06, 2018

Assignee:

SHARMA PRATIK (IN)

International Classes:

G06F16/00

Foreign References:

US6038561A

2000-03-14

Claims:

The following is the claim for this invention:

1. Flat clustering creates a flat set of clusters without any explicit structure that would relate the clusters to each other. This invention applies the scatter-gather approach to flat clustering for frequently used text-based search queries. For a given set of documents, we compute the term-document (or document-term) matrix, which records the frequency of each term in each document of the collection. Frequently occurring but insignificant terms such as "the", "of", and "for" are excluded. Term Frequency-Inverse Document Frequency (TF-IDF) is a numerical statistic intended to reflect how important a word is to a document in a collection of documents: its value increases in proportion to the number of times the word appears in the document and is offset by the number of documents in the corpus that contain the word. From the term-document matrix we derive a TF-IDF matrix, which is used to compute document similarity and to create a flat set of clusters of related documents. In the scatter-gather step, for a particular frequently used text-based search query from a user, we group the clusters of documents that contain relevant information and cluster the resulting set again. This process is repeated until a cluster of interest is found. This novel technique of applying the scatter-gather approach to flat clustering for frequently used text-based search queries is the claim for this invention.
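The term-document matrix and TF-IDF weighting described in the claim can be illustrated with a minimal sketch. The source does not fix a particular TF-IDF formula, so the variant below (raw count multiplied by log(N / df)) is an assumption, as are the stop-word list, the function names, and the sample documents.

```python
import math
from collections import Counter

# Assumed stop-word list; the source only names "the, of, for, etc."
STOP_WORDS = {"the", "of", "for", "a", "an", "and", "in", "to", "is"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def document_term_matrix(docs):
    """Return (vocab, rows), where rows[d][t] counts term vocab[t] in docs[d]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[t] for t in vocab] for c in counts]
    return vocab, rows

def tfidf_matrix(rows):
    """Weight each count by log(N / df), one common TF-IDF variant (assumed)."""
    n_docs = len(rows)
    n_terms = len(rows[0])
    # df[t] = number of documents that contain term t
    df = [sum(1 for r in rows if r[t] > 0) for t in range(n_terms)]
    return [[r[t] * math.log(n_docs / df[t]) for t in range(n_terms)]
            for r in rows]

# Hypothetical sample documents, not taken from the source.
docs = [
    "the price of steel",
    "steel products for export",
    "search queries for steel price",
]
vocab, rows = document_term_matrix(docs)
weights = tfidf_matrix(rows)
```

In this sample, "steel" occurs in every document, so its weight is log(3/3) = 0, while a term confined to one document, such as "export", receives the largest weight. This matches the offsetting behaviour the claim describes.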

Description:

Scatter-Gather Approach To Flat Clustering For Frequent Text Based Search Queries

Flat clustering creates a flat set of clusters without any explicit structure that would relate the clusters to each other. This invention applies the scatter-gather approach to flat clustering for frequently used text-based search queries. For a given set of documents, we compute the term-document (or document-term) matrix, which records the frequency of each term in each document of the collection. Frequently occurring but insignificant terms such as "the", "of", and "for" are excluded. Term Frequency-Inverse Document Frequency (TF-IDF) is a numerical statistic intended to reflect how important a word is to a document in a collection of documents: its value increases in proportion to the number of times the word appears in the document and is offset by the number of documents in the corpus that contain the word. From the term-document matrix we derive a TF-IDF matrix, which is used to compute document similarity and to create a flat set of clusters of related documents. In the scatter-gather step, for a particular frequently used text-based search query from a user, we group the clusters of documents that contain relevant information and cluster the resulting set again. This process is repeated until a cluster of interest is found.
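The scatter-gather loop described above might be sketched as follows. The source does not specify the clustering algorithm, how relevance to the query is measured, or when a cluster counts as "of interest", so this sketch makes three assumptions: a simple k-means-style procedure with cosine similarity for the flat clustering, ranking of clusters by the cosine similarity between the query and the cluster centroid, and a size threshold (`target_size`) as the stopping rule. All function and parameter names, and the sample documents, are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(token_lists):
    """Sparse TF-IDF vectors (term -> weight), using count * log(N / df)."""
    n_docs = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return [{t: c * math.log(n_docs / df[t]) for t, c in Counter(tokens).items()}
            for tokens in token_lists]

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def flat_cluster(vectors, k, iterations=10):
    """Scatter: k-means-style flat clustering; returns lists of positions."""
    k = min(k, len(vectors))
    # Deterministic seeding with evenly spaced documents (an assumption).
    centroids = [vectors[i * len(vectors) // k] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for pos, v in enumerate(vectors):
            best = max(range(len(centroids)),
                       key=lambda c: cosine(v, centroids[c]))
            clusters[best].append(pos)
        clusters = [c for c in clusters if c]  # drop empty clusters
        centroids = [centroid([vectors[p] for p in c]) for c in clusters]
    return clusters

def scatter_gather(vectors, query, k=3, keep=1, target_size=2):
    """Repeat scatter (cluster) and gather (keep query-relevant clusters)."""
    selected = list(range(len(vectors)))
    while len(selected) > target_size:
        local = flat_cluster([vectors[i] for i in selected], k)
        clusters = [[selected[p] for p in c] for c in local]
        ranked = sorted(
            clusters,
            key=lambda c: cosine(centroid([vectors[i] for i in c]), query),
            reverse=True)
        gathered = sorted(i for c in ranked[:keep] for i in c)
        if len(gathered) == len(selected):
            break  # clustering no longer narrows the set
        selected = gathered
    return selected

# Hypothetical sample corpus and query, not taken from the source.
docs = [
    "steel price forecast", "steel export price",
    "blockchain transfer pricing", "blockchain ledger transfer",
    "text search query clustering", "search query documents",
]
vectors = tfidf_vectors([d.split() for d in docs])
query = Counter("search query".split())
pair = scatter_gather(vectors, query, k=3, keep=1, target_size=2)
single = scatter_gather(vectors, query, k=3, keep=1, target_size=1)
```

With this corpus, the first round scatters the six documents into three clusters and gathers the one about search queries (documents 4 and 5). Lowering `target_size` to 1 triggers a second round that re-clusters that pair and keeps document 5. In an interactive system the gather step could instead be driven by the user's own choice of clusters; the query-based ranking here is only one way to read the source's description.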
