METHOD AND SYSTEM OF INTELLIGENT SEMTIMENT AND EMOTION SENSING WITH ADAPTIVE LEARNING

Title:

METHOD AND SYSTEM OF INTELLIGENT SEMTIMENT AND EMOTION SENSING WITH ADAPTIVE LEARNING

Document Type and Number:

WIPO Patent Application WO/2018/182501

Abstract:

A system for classifying text messages in accordance with sentiment and/or emotion expressed by the text messages, a method for classifying text messages in accordance with sentiment and/or emotion expressed by the text messages, and a method for handling ambivalence or hidden sarcasm in text messages are provided. The method for classifying text messages in accordance with sentiment and/or emotion expressed by the text messages includes decomposing each of the text messages into one or more portions and performing learning based scoring of each portion of each of the text messages while concurrently performing non-learning based scoring of each portion of each of the text messages to process each portion of each of the text messages in a single pass. The method further includes classifying each of the text messages in accordance with the sentiment and/or emotion expressed by the text message in response to an aggregation of a combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message.

Inventors:

WANG ZHAOXIA (SG)
TONG JOO CHUAN VICTOR (SG)
HO SENG-BENG (SG)

Application Number:

PCT/SG2017/050172

Publication Date:

October 04, 2018

Filing Date:

March 30, 2017

Assignee:

AGENCY SCIENCE TECH & RES (SG)

International Classes:

G06N5/04; G06F17/30; G06F40/20

Foreign References:

US20170011029A1	2017-01-12
US20160189057A1	2016-06-30
CN102682130A	2012-09-19
US20080249764A1	2008-10-09
US20160351187A1	2016-12-01
TW200506657A	2005-02-16

Other References:

SIDDIQUA U.A. ET AL.: "Combining a Rule-based Classifier with Ensemble of Feature Sets and Machine Learning Techniques for Sentiment Analysis on Microblog", 19TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 18 December 2016 (2016-12-18), pages 304 - 309, XP033068358, [retrieved on 20170509]

Attorney, Agent or Firm:

SPRUSON & FERGUSON (ASIA) PTE LTD (SG)

Claims:

CLAIMS

What is claimed is:

1. A system for classifying text messages in accordance with sentiment and/or emotion expressed by the text messages, the system comprising:

a text decomposing module for decomposing each of the text messages into one or more portions;

a first learning based module for learning based scoring of each portion of each of the text messages;

a non-learning based module for non-learning based scoring of each portion of each of the text messages, wherein the first learning based module and the non-learning based module score each portion of each of the text messages concurrently as each portion of each of the text messages is processed in a single pass;

an aggregation module coupled to the first learning based module and the non-learning based module for classifying each of the text messages in accordance with the sentiment and/or emotion expressed by the text message in response to an aggregation of a combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message; and

a second learning based module coupled to the aggregation module to learn and update knowledge for the non-learning based module and the first learning based module in response to the aggregation of the combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message.

2. The system in accordance with Claim 1 wherein the aggregation module combines the learning based scoring and the non-learning based scoring of each portion of the text message in accordance with a user selectable weighting of the learning based scoring and the non-learning based scoring to obtain a combined scoring of each of the one or more portions of the text message.

3. The system in accordance with Claim 2 wherein the aggregation module aggregates the combined scoring of each of the one or more portions of the text message to classify each of the text messages into the sentiment and/or emotion expressed by the text message in accordance with a user selectable aggregation operation.

4. The system in accordance with Claim 1 wherein the non-learning based module includes a module for ambivalence and hidden sarcasm detection, the non-learning based scoring of each portion of each of the text messages determined at least partially in response to the module detecting hidden sarcasm in the portion of the text message.

5. The system in accordance with Claim 4 wherein the non-learning based scoring of the one or more portions of each of the text messages by the non-learning based module is determined at least partially in response to the module for ambivalence and hidden sarcasm detection not detecting hidden sarcasm but detecting ambivalence in the portion of the text message.

6. The system in accordance with Claim 1 further comprising a negation and question portion handling module coupled to the text decomposing module for receiving the one or more portions of each of the text messages, the negation and question portion handling module indicating whether a portion of a text message comprises a question and identifying a location of a negation within a portion of a text message, and wherein the negation and question portion handling module provides all portions of each of the text messages not comprising a question and identified as not having a negation located therein to the first learning based module for learning based scoring thereof.

7. The system in accordance with Claim 6 wherein the negation and question portion handling module further provides any portion of a text message identified as having a negation located therein and all text portions including text portions with questions and text portions with negations therein to the non-learning based module for non-learning based scoring of the portion of the text message.

8. The system in accordance with Claim 6 further comprising:

a portion sensing further analysis module receives each classified portion of each of the text messages from the non-learning based module and the associated learning based scoring from the first learning based module and provides them to the aggregation module; and

an automatic knowledge learning and updating unit for handling user inputs of ground truth labelled data with negation and question marks before feeding the data to the negation and question portion handling module, wherein the portion sensing further analysis module further receives classified portions of each of the text messages from the non-learning based module and aggregates such portions with the portions of each of the text messages received from the first learning based module for providing to the automatic knowledge learning and updating unit for knowledge extraction for automatic updating of the automatic knowledge learning and updating unit in response to the processed portions of the text messages.

9. The system in accordance with Claim 8 wherein the second learning based module receives the aggregated portions from the non-learning based module and the first learning based module, the second learning based module performing learning based knowledge extraction of the aggregated portions.

10. The system in accordance with Claim 1 wherein the second learning based module is coupled to the non-learning based module and the first learning based module for receiving an aggregation of each classified portion of each of the text messages from the non-learning based module and each portion of each of the text messages from the first learning based module, the second learning based module performing learning based knowledge extraction on the aggregation.

11. A method for classifying text messages in accordance with sentiment and/or emotion expressed by the text messages, the method comprising:

decomposing each of the text messages into one or more portions;

performing learning based scoring of each portion of each of the text messages while concurrently performing non-learning based scoring of each portion of each of the text messages to process each portion of each of the text messages in a single pass; and

classifying each of the text messages in accordance with the sentiment and/or emotion expressed by the text message in response to an aggregation of a combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message.

12. The method in accordance with Claim 11 wherein the classifying step combines the learning based scoring and the non-learning based scoring of each portion of the text message in accordance with a user selectable weighting of the learning based scoring and the non-learning based scoring to obtain a combined scoring of each of the one or more portions of the text message.

13. The method in accordance with Claim 12 wherein the classifying step aggregates the combined scoring of each of the one or more portions of the text message to classify each of the text messages into the sentiment and/or emotion expressed by the text message in accordance with a user selectable aggregation operation.

14. The method in accordance with Claim 11 wherein performing the non-learning based scoring of each portion of each of the text messages is determined at least partially in response to detecting hidden sarcasm in the portion of the text message.

15. The method in accordance with Claim 11 wherein performing the non-learning based scoring of the one or more portions of each of the text messages is determined at least partially in response to not detecting hidden sarcasm while detecting ambivalence in the portion of the text message.

16. The method in accordance with Claim 11 further comprising:

indicating whether a portion of a text message comprises a question and/or identifying a location of a negation within a portion of a text message; and

providing all portions of each of the text messages not comprising a question and identified as not having a negation located therein for learning based scoring thereof.

17. The method in accordance with Claim 16 further comprising providing any portion of a text message identified as having a negation located therein and all other text portions including text portions with questions therein for non-learning based scoring of the portion of the text message.

18. The method in accordance with Claim 11 further comprising performing learning based knowledge extraction on the aggregation of the combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message.

19. Computer readable storage media having stored thereon computer program code for performing, when running on a computing device, a method for classifying text messages in accordance with sentiment and/or emotion expressed by the text messages, the method configured to:

decompose each of the text messages into one or more portions;

perform learning based scoring of each portion of each of the text messages while concurrently performing non-learning based scoring of each portion of each of the text messages to process each portion of each of the text messages in a single pass; and

classify each of the text messages in accordance with the sentiment and/or emotion expressed by the text message in response to an aggregation of a combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message.

Description:

METHOD AND SYSTEM OF INTELLIGENT SEMTIMENT AND EMOTION SENSING WITH ADAPTIVE LEARNING

TECHNICAL FIELD

[0001] The present invention generally relates to text data analytics, such as social media analytics, and more particularly relates to a method and system for sentiment classification of text (e.g., social media text).

BACKGROUND OF THE DISCLOSURE

[0002] Social media has a vast amount of publicly available user-generated content, which offers merchants and organizations a larger, richer, closer-to-real-time data source of consumer insights than conventional means. Many customer-facing merchants and organizations are exploring the real business values of social media by seeking answers to important questions asked by marketing, product innovation, research and development (R&D), customer relations, public relations (PR) and branding practitioners. For example, sales and marketing managers need to make forecasts on the sales of new products. Product innovation and R&D directors need to understand consumer attitudes and preferences towards their products and services. Customer relationship managers and PR professionals need to detect any potential critical product/brand or service crisis early to devise risk-management strategies or capitalize on positive sentiments towards their brands.

[0003] Social media can be valuable in a number of application domains, but the adoption of only one sentiment classification method without an assurance of a sufficient level of accuracy may limit or bias prediction results. Therefore, despite the significant potential in harnessing consumer insights from social media, technical challenges still exist in finding an accurate yet cost-effective sentiment classification that is applicable to real-world multi-domain contexts.

[0004] Conventional learning-based methods cannot capture the complex structure of human language, especially in the context of social media data, while existing non-learning-based methods, such as knowledge-based methods, lack learning capability, especially adaptive intelligent learning capability. Even though existing fine-grained sentiment analysis algorithms are capable of outputting fine-grained emotion categories and outperform other existing methods, they still lack intelligent sensing ability and adaptive learning capability. This limits the application of existing sensing technologies, which lack the intelligent analysis capability and adaptive learning capability needed for current industry applications.

[0005] Thus, what is needed is an efficient and accurate method and system for sentiment classification of text, such as social media data, utilizing intelligent features and learning capabilities. Furthermore, other desirable features and characteristics will become apparent from the subsequent detailed description and the appended claims, taken in conjunction with the accompanying drawings and this background of the disclosure.

SUMMARY

[0006] According to at least one embodiment of the present invention, a system for classifying text messages in accordance with sentiment and/or emotion expressed by the text messages is provided. The system includes a text decomposing module, a first learning based module, a non-learning based module, an aggregation module and a second learning based module. The text decomposing module decomposes each of the text messages into one or more portions. The first learning based module scores each portion of each of the text messages and the non-learning based module scores each portion of each of the text messages, wherein the learning based module and the non-learning based module score each portion of each of the text messages concurrently as each portion of each of the text messages is processed in a single pass. The aggregation module is coupled to the learning based module and the non-learning based module for classifying each of the text messages in accordance with the sentiment and/or emotion expressed by the text message in response to an aggregation of a combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message. The second learning based module is coupled to the aggregation module to learn and update knowledge for the non-learning based module and the first learning based module in response to the aggregation of the combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message.

[0007] According to another embodiment of the present invention, a method for classifying text messages in accordance with sentiment and/or emotion expressed by the text messages is provided. The method includes decomposing each of the text messages into one or more portions and performing learning based scoring of each portion of each of the text messages while concurrently performing non-learning based scoring of each portion of each of the text messages to process each portion of each of the text messages in a single pass. The method further includes classifying each of the text messages in accordance with the sentiment and/or emotion expressed by the text message in response to an aggregation of a combination of the learning based scoring and the non-learning based scoring of each of the one or more portions of the text message.

[0008] According to a further embodiment of the present invention, a system for handling ambivalence or hidden sarcasm in text messages is provided. The system includes a text decomposing module, a module for detecting ambivalence and/or hidden sarcasm in a portion of a text message, and an output module for combining the one or more portions of the text message. The text decomposing module decomposes each of the text messages into one or more portions and the output module handles a portion of the text message in response to the module for detecting ambivalence and/or hidden sarcasm detecting hidden sarcasm in the portion of the text message.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views and which together with the detailed description below are incorporated in and form part of the specification, serve to illustrate various embodiments and to explain various principles and advantages in accordance with a present embodiment.

[0010] FIG. 1 depicts a block diagram of a data processing and analysis system in accordance with a present embodiment.

[0011] FIG. 2 depicts a block diagram of an intelligent sensing system of the data processing and analysis system of FIG. 1 in accordance with the present embodiment.

[0012] FIG. 3 depicts a block diagram of an obvious sarcasm detection module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.

[0013] FIG. 4 depicts a block diagram of an ambivalence handler with hidden sarcasm detection sub-module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.

[0014] FIG. 5 depicts a block diagram of a negation identification and question portion handling module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.

[0015] FIG. 6 depicts a block diagram of a first artificial intelligence (AI) learning-based module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.

[0016] FIG. 7 depicts a block diagram of a portion sensing further analysis sub-module of an aggregation module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.

[0017] FIG. 8 depicts a block diagram of an aggregation assembling sub-module of the aggregation module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.

[0018] And FIG. 9 depicts a block diagram of a second AI learning-based module of the intelligent sensing system of FIG. 2 in accordance with the present embodiment.

[0019] Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been depicted to scale.

DETAILED DESCRIPTION

[0020] The following detailed description is merely exemplary in nature and is not intended to limit the invention or the application and uses of the invention. Furthermore, there is no intention to be bound by any theory presented in the preceding background of the invention or the following detailed description. It is the intent of the present embodiment to present a system and method for drilling deep into text and automatically classifying text messages into finer-grained domain specific sentiment categories such as positive, negative, neutral, and ambivalent categories (including mixed-positive, mixed-negative, and mixed-neutral) and for identifying their prevailing emotion categories such as happiness, excitement, anger, sadness, and anxiety based on the context of the text, thereby providing a better solution for business insight and enhancement from comments, reviews and other words in social media and other written text.

[0021] Besides the basic feature of the present embodiments, namely identifying fine-grained sentiment as well as emotion patterns in textual data, the system and method of the present embodiments are designed with the following seven novel key technical features and capabilities:

a) the capability for adaptive learning;

b) the ability for ambivalence handling and sarcasm detection;

c) a unified classifier that combines non-learning scoring, such as knowledge-based and feature-based scoring, and learning-based scoring in one pass through the data;

d) the capability to classify both coarse-grained sentiments and fine-grained sentiments and emotions of a given text in accordance with requests of users;

e) the capability of implementing an ambivalence and sarcasm handling module as a stand-alone module or integrated with other sentiment analysis tools;

f) the capability of strong hybrid integration where components or processes can be integrated with components and/or processes into a strongly coupled system; and

g) an automatic knowledge learning and updating unit for handling user inputs of ground truth labelled data with negation and question marks before feeding the data into the learning process.

These features and capabilities are discussed in accordance with a present embodiment presented in the description of figures herein.

[0022] The capability for adaptive learning is introduced in accordance with the present embodiment wherein the artificial intelligence (AI) learning-based modules are used not only as part of an adaptive classification mechanism, but also to adaptively and continuously enhance the lexicon database and knowledge base of the system of the present embodiment. Thus, the adaptive learning not only can learn from a labelled data set as other learning-based systems do, but can also, in accordance with the present embodiment, learn from the sensing process without a ground truth training data set.

[0023] The ambivalence handling and sarcasm detection of systems and methods in accordance with the present embodiments can not only handle obvious sarcasm by detecting it through a sarcasm indicator (such as "#sarcasm" in the tweet "I love going to work and then being done 3 hours later #sarcasm"), but can also detect hidden sarcasm even though the text does not include any obvious sarcasm indicators. For example, while there is no sarcasm indicator in the text data "Your phone is broken down again, you are so lucky!", the ambivalence handling and sarcasm detection in accordance with the present embodiments can detect the sarcasm in the text data.

[0024] The unified classifier combines non-learning scoring, such as knowledge-based and feature-based scoring, and learning-based scoring in one pass of the data. In addition, a non-learning process in accordance with the present embodiment performs linguistic pattern recognition through linguistic analysis with feature relationship analysis. Therefore, the system and methods in accordance with the present embodiments have not only all the advantages of learning-based methods, but also all the advantages of non-learning methods, including novel, robust non-learning methods having advantages over conventional non-learning methods.
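As a rough illustration of the one-pass idea, the following Python sketch scores each portion with a learning-based scorer and a non-learning scorer in the same loop iteration, combines the two scores with a user-selectable weight, and aggregates the combined scores with a user-selectable operation. Everything named here (`LEXICON`, `learning_score`, `non_learning_score`, `classify_message`, the weight of 0.5, and the sign-based labels) is a hypothetical stand-in and not the disclosed implementation; in particular, the learning-based scorer is a fixed stub rather than a trained model, and the two scorers run back to back rather than truly concurrently.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical toy lexicon; the actual knowledge base is not given in the text.
LEXICON: Dict[str, float] = {"love": 1.0, "good": 0.5, "dislike": -1.0, "broken": -1.0}


def non_learning_score(portion: str) -> float:
    """Knowledge-based stand-in: average lexicon polarity of the known words."""
    hits = [LEXICON[w] for w in portion.lower().split() if w in LEXICON]
    return mean(hits) if hits else 0.0


def learning_score(portion: str) -> float:
    """Stub standing in for a trained model so that the sketch is runnable."""
    text = portion.lower()
    if "good" in text:
        return 0.5
    if "broken" in text:
        return -0.5
    return 0.0


def classify_message(
    portions: List[str],
    weight: float = 0.5,
    aggregate: Callable[[List[float]], float] = mean,
) -> str:
    """Score every portion once with both scorers, then aggregate."""
    combined: List[float] = []
    for portion in portions:  # single pass over the portions
        learned = learning_score(portion)
        rule_based = non_learning_score(portion)
        # User-selectable weighting of the two scores (assumed linear here).
        combined.append(weight * learned + (1.0 - weight) * rule_based)
    total = aggregate(combined)  # user-selectable aggregation operation
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(classify_message(["the screen is good", "but the battery is broken"]))
```

Passing `aggregate=max` instead of the default mean lets the most positive portion decide the label, which shows how the choice of aggregation operation can change the outcome for the same portion scores.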

[0025] The capability to classify both coarse-grained as well as fine-grained sentiments and emotions of a given text according to the request of the users allows users to select output types in accordance with their needs. And the strong hybrid method provides a system that has advantages of non-learning-based methods (e.g. knowledge-based methods) and learning-based methods which is able to overcome the respective limitations of each of the methods thereby providing a strongly coupled system.

[0026] Referring to FIG. 1, a block diagram 100 depicts a data processing and analysis system in accordance with the present embodiment. The end-to-end text analysis system 102 advantageously demonstrates a real-world implementation of the intelligent sensing system 104 in accordance with the present embodiment. The system 102 receives social media or other textual data 106 as input by a user or listener 108 advantageously providing useful information for marketing research personnel, product suppliers, service providers and system integrators as the listener 108. The listener 108 input could be from various social media sources or other text data sources 106, including but not limited to sources from the Internet, such as Internet forums (e.g., HardwareZone and reddit), social networking websites (e.g., Twitter and Facebook), and weblogs (e.g., Blogger, Tumblr and WordPress).

[0027] The text is filtered 109 by a filter 110 to identify the text (i.e., filtered data 112) to be reviewed by the system 102. The filter 110 removes irrelevant text received from the listener 108 to provide the filtered data 112 for processing. Examples of irrelevant text could include advertisements, contents which do not include any comments on a product or a service, and other irrelevant content-specific textual data. In addition to the filter 110, one or more smart filters 114 could be designed according to requirements of the listener 108 and optionally provided to further filter the filtered data 112.

[0028] The filtered textual data is then provided to the intelligent sensing system 104 for processing in accordance with the present embodiment. After processing, a profiler 116 organizes the data for reviewing by a result viewer module 118. The result viewer module 118 could be based on a monitoring model to provide a consumer preference analysis, based on an alert model for detection of an anomaly or identification of a needed action, or based on a predictive model for time-series analysis for sales or other forecasting. Those skilled in the art will realize that the result viewer could be based on other models adapted to a customer's needs or requirements.

[0029] Both learning-based methods and non-learning-based methods (e.g., knowledge-based methods) have their weaknesses and advantages. Learning-based methods typically require large training datasets to be effective, while non-learning-based methods have no learning capability. Conventional hybrid methods inherit the limitations of the learning-based methods such as requiring training datasets.

[0030] The intelligent sensing system 104 overcomes the limitations of the learning-based sensing methods as well as the limitations of non-learning-based methods by utilizing a novel and robust strong/tight hybrid method in which components or processes of different methods are integrated into components or processes of one another, resulting in a strongly/tightly coupled system.

[0031] Referring to FIG. 2, a block diagram 200 depicts the intelligent sensing system 104 in accordance with the present embodiment. The intelligent sensing system 104 includes an intelligent sentiment and emotion sensing section 202 and an automatic knowledge learning and updating section 204. The intelligent sentiment and emotion sensing section 202 includes a text clean-up module 206 which receives the filtered data 112 input and provides the cleaned-up text to an obvious sarcasm detection module 208.

[0032] The obvious sarcasm detection module 208 is shown in more detail in a block diagram 300 depicted in FIG. 3. The obvious sarcasm detection module 208 includes an expert knowledge-based system 302 coupled to a human expert input module 304 which allows users to input new rules or knowledge base to improve the performance of an obvious sarcasm detector 306. When the obvious sarcasm detector 306 detects no obvious sarcasm, the text is output to a text decompose and portion cleaning module 210 (FIG. 2). On the other hand, when the obvious sarcasm detector 306 detects obvious sarcasm, the text is output to a direct sensing analysis processing module 212. The obvious sarcasm detection module 208 is different from a hidden sarcasm detection sub-module in an ambivalence handler with hidden sarcasm detection module 214 as discussed later. In accordance with the present embodiment, the obvious sarcasm detection module 208 is an optional module designed to detect obvious sarcasm such as "I love that Arequipa just shuts the water off for a day #sarcasm", where there is an obvious sarcasm word indicator (i.e., #sarcasm).
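The routing described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `route_text`, the destination strings, the `enabled` switch and the hashtag regular expression are all hypothetical, and the real detector 306 draws on an expert knowledge base whose rules are not given in the text.

```python
import re
from typing import Set, Tuple

# Only "#sarcasm" is taken from the text; users may supply further indicators.
DEFAULT_INDICATORS: Set[str] = {"#sarcasm"}


def route_text(
    text: str,
    indicators: Set[str] = DEFAULT_INDICATORS,
    enabled: bool = True,
) -> Tuple[bool, str]:
    """Return (obvious_sarcasm_found, name of the next module)."""
    # Collect hashtags case-insensitively so "#Sarcasm" also matches.
    hashtags = {tag.lower() for tag in re.findall(r"#\w+", text)}
    wanted = {indicator.lower() for indicator in indicators}
    found = enabled and bool(hashtags & wanted)
    # Obvious sarcasm goes to direct sensing analysis (212); everything
    # else goes on to text decomposition and portion cleaning (210).
    destination = "direct_sensing_analysis_212" if found else "text_decompose_210"
    return found, destination


print(route_text("I love that Arequipa just shuts the water off for a day #sarcasm"))
```

Keeping the indicator set as a plain parameter mirrors the idea that a human expert can add rules for a new data source without touching the detector itself.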

[0033] It is well-known that typical communication relies mainly on body language and tone of voice. However, in social media, things become different from real life as communication relies mainly on text. For instance, the conversational style of sarcasm on Twitter uses hashtags like #sarcasm to avoid misunderstandings in social media interaction. In other words, the sarcastic tweet itself might not be understood by humans to be sarcasm if the hashtag is not present. For example, "I like this park. #sarcasm" and "mmmm… I love Applebee’s. #sarcasm" would be understood as positive comments if the hashtags #sarcasm are not there. The tweets' owner intends these tweets to be negative. Other examples are:

i) Thanks to whoever just threw the bag of watermelon in the back of our truck. #YouAreAppreciated #Sarcasm #NotCool

ii) Thank God my SGS3 has a removable battery... So I can take it out when the phone freezes. #sarcasm #android #Samsung #SGS3 #MyPhoneSucks

iii) Yeah cheers for wishing me luck! Nice to know you care! #sarcasm

iv) Thanks #Samsung for the #kitkat update. It's not like it was only several months wait. #sarcasm http://t.co/zHV6KvU4c2

v) Wow mom and dad, happy u show interest in what i say #sarcasm

vi) Oh thanks iPhone, for programming this holiday into my calender so I wouldn't forget it. 'ppreciate that! #sarcasm

vii) Yes cutting people off in traffic will get you to where you need to be so much faster #sarcasm

viii) Shoutout to the calendar on iPhones reminding me that today is Flag Day. #clutch #sarcasm

The obvious sarcasm detection module 208 is designed to handle such obvious sarcasm text to avoid misclassifying the opinion in such text posts. This module is optional and the user can enable or disable it by using the human expert input module 304 through the expert knowledge base unit 308. In addition, the user can input new sarcasm indicators according to different data sources (e.g., "#not" for tweets).

[0034] Online users may write a post which is sarcastic but without the label #sarcasm, such as "Her phone is broken down again, she is so lucky!" where the sarcasm is hidden in the tone and the ambivalence of the text. To detect such hidden sarcasm properly, one would have to figure out that the text owner meant the opposite of what he/she inputted. In accordance with the present embodiment, the ambivalence handler with hidden sarcasm detection module 214 identifies hidden sarcasm by determining whether the text owner meant the opposite of what he/she inputted.

[0035] A linguistic processing module 216 in the direct sensing analysis processor 212 performs linguistic analysis, feature selection and relationship analysis of the parsed separate text portions of each text message according to language structure, grammar and a knowledge base to derive the sentiment and emotion categories. The sentiment output of each separate portion of the message can be one of four categories, i.e., positive, negative, neutral and ambivalent (mixed) as shown in Table 1 below.

Table 1 - Details of the Definition for the Four Categories

[0036] The ambivalence handler with hidden sarcasm detection module 214 is designed to handle the "ambivalence" category in each text portion and sarcasm often appears as a type of ambivalent text. Even though each text portion can be classified into one of the four categories of sentiments (positive, negative, neutral and ambivalent), often the intended sentiment of text classified in the category of ambivalence can still be positive or negative as shown in Table 1. Often what seems to be a mixed sentiment of positivity and negativity in a sentence is actually primarily a positive or negative sentiment (such as in the example in Table 1: The sentence "The design of this brand is ok, but I dislike the colour and the price of it" is seemingly ambivalent, but it is actually negative). Therefore, the ambivalent sentence is converted to either a positive or a negative sentiment in response to the intended sentiment of the owner of the post. As a result, the four categories (positive, negative, neutral and ambivalent) are collapsed back into three categories (positive, negative and neutral).

[0037] In addition, some seemingly ambivalent texts are actually hidden sarcasm, such as "Her phone of this brand is broken again, she is so lucky." In this example, there is no conjunction word or phrase between the positive and negative parts of the sentence, although positioning a conjunction there is the most common structure for expressing sarcasm. Instead, a comma is placed between the two parts of the sentence with contrasting sentiments, without a conjunction. This is a common structure used for designing rules for hidden sarcasm detection in accordance with the present embodiment. Sarcasm is different from irony in that while irony may be positive, sarcasm is usually negative.

[0038] For the handling of ambivalent categories without sarcasm, conjunctions such as "but," "however," "although," and "even though," are used to classify the intended positivity or negativity as shown in Table 1. Another common feature of sarcasm is that the sarcasm text creates a positive and negative contrast between a sentiment and a situation. Specifically, sarcastic text often expresses a positive sentiment or positive emotion in reference to a negative activity, state or a situation (e.g., "I love such an ugly brand!").
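By way of illustration only, the comma-without-conjunction structure described above may be sketched in Python as follows. The word lists `POSITIVE` and `NEGATIVE` and the function names are hypothetical stand-ins for the expert knowledge base and are not taken from the embodiment; the sketch covers only the two-clause comma structure and does not attempt the single-clause contrast of "I love such an ugly brand!".

```python
import re

# Hypothetical toy lexicons; the actual knowledge base is not given in the text.
POSITIVE = {"lucky", "love", "great", "nice"}
NEGATIVE = {"broken", "ugly", "freezes", "sucks"}
# Conjunctions named in the description for non-sarcastic ambivalence.
CONJUNCTIONS = ("but", "however", "although", "even though")


def polarity(clause: str) -> int:
    """Return +1, -1 or 0 for a clause by counting toy-lexicon hits."""
    words = set(re.findall(r"[a-z']+", clause.lower()))
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    return (pos > neg) - (neg > pos)


def has_conjunction(text: str) -> bool:
    """True if any of the listed conjunctions occurs as a whole word or phrase."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(c)}\b", lowered) for c in CONJUNCTIONS)


def looks_like_hidden_sarcasm(sentence: str) -> bool:
    """Comma-separated clauses of contrasting polarity with no conjunction."""
    clauses = [c for c in sentence.split(",") if c.strip()]
    if len(clauses) < 2 or has_conjunction(sentence):
        return False
    signs = {polarity(c) for c in clauses}
    return 1 in signs and -1 in signs
```

Under these assumptions, the sentence "Her phone of this brand is broken again, she is so lucky." is flagged, whereas the Table 1 example containing "but" is left for non-sarcastic ambivalence handling.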

[0039] The hidden sarcasm detection rules are designed based on the above structures of hidden sarcasm. Referring to FIG. 4, a block diagram 400 depicts the ambivalence handler with hidden sarcasm detection sub-module 214 in accordance with the present embodiment. According to sarcasm detection rules in an expert knowledge base 402 coupled to a human expert input module 404 which allows users to input new rules or knowledge base to improve the performance of a hidden sarcasm detection module 406, the hidden sarcasm detection module 406 analyses all the input information obtained from the linguistic analysis, feature selection and relationship analysis modules and other linguistic analysis modules 216 as shown in FIG. 2 to classify the sarcasm text.

[0040] Users can adjust and add new rules into the expert knowledge base unit 402 through the human expert input module 404 to further enhance the hidden sarcasm detection. The human expert input module 404 allows the user to select the mode of output to include either three categories (i.e., positive, negative, and neutral) or four categories (i.e., positive, negative, neutral and ambivalent) for each of the text portions.

[0041] The hidden sarcasm detection module 406 includes three submodules: a hidden sarcasm detector 410 (called "a hidden sarcasm detector" in order to differentiate it from the obvious sarcasm detector 306), a hidden sarcasm handler 412 and a non-sarcastic ambivalence handler 414. The hidden sarcasm detector 410, the hidden sarcasm handler 412 and the non-sarcastic ambivalence handler 414 are seamlessly connected with the expert knowledge base unit 402 which provides the sarcasm detection rules and ambivalence handler rules for use by the hidden sarcasm detector 410 and the non-sarcastic ambivalence handler 414, respectively. If sarcasm is detected by the hidden sarcasm detector 410, sarcasm handling is performed by the hidden sarcasm handler 412 in accordance with the sarcasm detection rules of the expert knowledge base unit 402. Otherwise, if no sarcasm is detected by the hidden sarcasm detector 410, the non-sarcastic ambivalence handler 414 will perform ambivalence handling in accordance with the ambivalence handler rules of the expert knowledge base unit 402.

[0042] Social media changes quickly every day. The sarcasm used in real-world social media may change and new social media platforms may emerge. Users of the system in accordance with the present embodiment can tune the knowledge base as well as the rules through the human expert input module 404.

[0043] Referring back to FIG. 2, text from the text decompose and portion cleaning module 210 is forwarded to a negation identification and question portion handling module 218 designed to prepare inputs for both the direct sensing analysis processing module 212 (which yields only one classified sentiment analysis result) as well as a learning-based AI classification module 220 (which yields M classified sentiment analysis results, where M = the number of classifiers used, each classifier outputting one sentiment analysis result). Referring to FIG. 5, a block diagram 500 depicts the negation identification and question portion handling module 218 in accordance with the present embodiment. The negation identification and question portion handling module 218 includes five submodules: a question portion detection sub-module 502, a negation detection sub-module 504, a question portion recording sub-module 506, a negation location recording sub-module 508 and an output decision sub-module 510. After the negation detection sub-module 504 finds all the negation words or items (such as "not", "don't", "shouldn't") through matching with a knowledge base 512, the negation location recording sub-module 508 records the location of all the negation words or items and a data structure is created to hold the location indices of all found negation words or items for later processing. For example, the locations of the negation words (e.g., "not") are different in the two sentences: "I do not like it..." and "I like it not only for the design...." and the sentiments associated with these sentences are different. Similarly, the question portions are detected by the question portion detection sub-module 502 and recorded by the question portion recording sub-module 506.
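As a minimal sketch of the negation location recording described above, the following Python fragment returns the token indices of negation items in a text portion. The set `NEGATION_ITEMS` is a hypothetical stand-in for the knowledge base 512, and the simple regular-expression tokenizer is an assumption made for illustration.

```python
import re
from typing import List

# Hypothetical stand-in for the negation entries of the knowledge base 512.
NEGATION_ITEMS = {"not", "don't", "shouldn't", "never", "no"}


def negation_locations(portion: str) -> List[int]:
    """Return the zero-based token indices of all negation items in a portion."""
    tokens = re.findall(r"[a-z']+", portion.lower())
    return [i for i, tok in enumerate(tokens) if tok in NEGATION_ITEMS]
```

For the two example sentences, the recorded indices differ (index 2 for "I do not like it..." and index 3 for "I like it not only for the design...."), which is the information later processing can use to tell the two constructions apart.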

[0044] The output decision sub-module 510 decides whether the portions from both the question portion recording sub-module 506 and the negation location recording sub-module 508 should be channelled to the direct sensing analysis processor 212 or the learning-based AI classification processor 220 in response to whether the text includes negation portion(s) and/or question portion(s). All text portions including text portions with questions and text portions with negation are channelled to the direct sensing analysis processor 212. All non-negation, non-question text portions are channelled to the learning-based AI classification processor 220.
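One possible reading of the output decision is that every portion reaches the direct sensing analysis processor 212, while only portions without negation and without a question additionally reach the learning-based AI classification processor 220. The sketch below encodes that reading; the function name and the target labels are hypothetical.

```python
from typing import List


def route_portion(has_negation: bool, has_question: bool) -> List[str]:
    """Return the processors a portion is channelled to, under the stated reading."""
    targets = ["direct_sensing_212"]  # every portion goes to the non-learning path
    if not has_negation and not has_question:
        targets.append("ai_classification_220")  # plain portions also go to the classifiers
    return targets
```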

[0045] The AI learning based module 220 is one of two AI learning based modules and performs functions of an AI classifier module. A second AI learning based module 222 works with the AI learning based module 220 to enable the intelligent sensing ability and adaptive learning ability. The novel and innovative intelligent sensing method in accordance with the present embodiment combines learning-based scoring by the two AI learning based modules 220, 222 with non-learning scoring, such as knowledge-based and feature-based scoring, in one pass of the data. The combined processing with both a learning process and a non-learning process starts at the negation identification and question portion handling module 218 (FIG. 2). The non-learning process is provided by the direct sensing analysis processing module 212 where, as indicated by a normal arrow emanating from the module 218, the textual data is provided for processing by the linguistic analysis and feature relationship analysis module 216 and then the ambivalence handler with hidden sarcasm detection sub-module 214. The AI learning based process proceeds simultaneously along another path emanating as a double arrow from the module 218 which goes through the AI learning-based module 220 to a portion sensing further analysis sub-module 224 of an aggregation module 223. Therefore, the system and methods in accordance with the present embodiments not only have all the advantages of learning based methods, but also have all the advantages of non-learning methods, including novel, robust non-learning methods having advantages over conventional non-learning methods.

[0046] The AI learning-based module 220 works together with the non-learning based sensing process to produce new ground truth data and feeds the obtained ground truth data to the automatic knowledge learning and updating section 204 through the portion sensing further analysis sub-module 224. In addition, in accordance with the present embodiment the AI learning-based module 220 supports the direct sensing analysis processing module 212 (i.e., the non-learning process) to produce the final valence of each portion of the text data by aggregating in the aggregation module 223 the learning based processed portion of the text data 232 and the non-learning based processed portion of the corresponding text data 230 to produce an output 234 of the intelligent sentiment and emotion sensing unit 202. The learning based processed portion of the text data 232 and the non-learning based processed portion of the corresponding text data 230 are processed through the portion sensing further analysis sub-module 224 in the aggregation module 223, and the non-learning based processed portion of the corresponding text data from the direct sensing analysis processing module 212 is further processed by a portion relationship analysis with topic and object identification module 228. An aggregation assembling sub-module 226 in the aggregation module 223 receives the inputs from both the portion relationship analysis with topic and object identification module 228 and the portion sensing further analysis sub-module 224 of the aggregation module 223, and performs aggregation assembling analysis.

[0047] Referring to FIG. 6, a block diagram 600 depicts the AI learning-based module 220 in accordance with the present embodiment. The AI learning-based module 220 includes two submodules or process steps: a training classifiers sub-module 602 and a sensing analysis sub-module 604. The training classifiers sub-module 602 trains different learning based classifiers using ground truth data 606 and an existing knowledge base 608. Third party or open source training data sets 610 can also be used, if available. The sensing analysis sub-module 604 performs sensing analysis using the different trained learning based classifiers (the learning-based classifiers can be Naive Bayes classifiers, Maximum Entropy classifiers, Support Vector Machine classifiers, or similar conventional or proprietary classifiers).

[0048] The output of the AI learning-based module 220 consists of the classified text portions from the various classifiers (e.g., suppose there are five classifiers used and each may classify the text portion differently, then the output will consist of the five classified sentiments for that text portion such as positive, positive, negative, negative, and positive).

[0049] As shown in FIG. 2, the portion sensing further analysis sub-module 224 in the aggregation module 223 receives a first input 230 from the non-learning sensing analysis process (one classified sentiment) and a second input 232 from the learning based AI classification process (as many sentiments as there are AI classifiers used; for example, if M classifiers are used in the AI learning based module 220, then there are M classified outputs for each text portion). Referring to FIG. 7, a block diagram 700 depicts the portion sensing further analysis sub-module 224 in accordance with the present embodiment. The portion sensing further analysis sub-module 224 includes a consensus analysis for learning sub-module 702 and a consensus analysis for sensing sub-module 704. The consensus analysis for learning sub-module 702 aggregates the results from the non-learning analysis (1 classified sentiment) by the direct sensing analysis processing module 212 and the results of the classifiers from the AI learning based module 220 (M classified sentiments) and provides them to the AI learning based module 222 in the automatic knowledge learning and updating unit 204. The consensus analysis for sensing sub-module 704 aggregates only the results of the classifiers from the AI learning based module 220 and outputs the results to the aggregation assembling sub-module 226.

[0050] The aggregation process of the results from the learning based AI classifiers and the results from both the non-learning and learning based classifiers proceeds by considering a majority of agreement of the classified sentiments. In accordance with the present embodiment, a ratio of agreement is defined as the number of majority agreements divided by the total number of classified sentiments. For example, if there are five learning based AI classifiers used, the consensus analysis for learning sub-module 702 will receive the results 232 of the five learning based AI classifiers as well as the one result 230 from the non-learning based analysis. Of these six results, the number of majority agreements divided by the total number of classified sentiments determines the ratio of agreement. For example, if the pattern is positive, negative, positive, positive, negative, positive, then the majority agreement is four positive and the ratio of agreement for positive is four divided by six.
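The ratio of agreement defined above can be illustrated with a short Python sketch. The function name is hypothetical, and the handling of ties (the label seen first among equally frequent labels is returned) is an assumption, since the description does not address ties.

```python
from collections import Counter
from fractions import Fraction
from typing import Sequence, Tuple


def ratio_of_agreement(labels: Sequence[str]) -> Tuple[str, Fraction]:
    """Return the majority label and (majority count) / (total classified sentiments)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, Fraction(count, len(labels))


# The six results of the worked example: five classifiers plus one non-learning result.
example = ["positive", "negative", "positive", "positive", "negative", "positive"]
majority, ratio = ratio_of_agreement(example)  # "positive", 4/6 (stored reduced as 2/3)
```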

[0051] The agreement ratio from the consensus analysis for learning sub-module 702 is compared with a threshold (where the default value is 1 or 100%) before the result is sent to the AI learning based module 222 for processing by a negation and question mark handler thereof. The threshold above which the majority agreed sentiment is sent to the AI learning-based module 222 as the new obtained ground truth for learning is user selectable. By default, the threshold is 1 (i.e., all the results from the preceding modules must agree). This threshold can be tuned/selected by the user and the number of the learning based AI classifiers, M, can also be user adjusted depending on the number of suitable existing classifiers and available computational resources.
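The threshold comparison may be sketched as follows. The function name is hypothetical, and the use of "greater than or equal to" is an assumption made so that the default threshold of 1 accepts a unanimous result.

```python
from collections import Counter
from typing import Optional, Sequence


def new_ground_truth(labels: Sequence[str], threshold: float = 1.0) -> Optional[str]:
    """Return the majority label if its agreement ratio meets the threshold, else None."""
    label, count = Counter(labels).most_common(1)[0]
    ratio = count / len(labels)
    # Assumption: a ratio equal to the threshold is accepted (needed for the default of 1).
    return label if ratio >= threshold else None
```

With the default threshold, only unanimous results are forwarded as new ground truth; lowering the threshold admits more data at the cost of noisier labels.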

[0052] The ratio computed in the consensus analysis for sensing module 704 is not compared with the threshold and, instead, is sent directly to the aggregation assembling sub-module 226 in the aggregation module 223 for further processing.

[0053] Referring to FIG. 8, a block diagram 800 depicts the aggregation assembling sub-module 226 in the aggregation module 223 in accordance with the present embodiment. The aggregation assembling sub-module 226 receives input from both the portion relationship analysis with topic and object identification module 228 as well as the portion sensing further analysis sub-module 224. The aggregation assembling sub-module 226 includes a portion valence computing sub-module 802, an aggregation sum operation sub-module 804, a multi-portion ambivalence handler 806 and a sentiment and emotion further analysis sub-module 808, all of which are seamlessly connected with a human expert input module 810 through an expert knowledge base unit 812. The human expert input module 810 allows users to select desired output modes such as outputting two main sentiment categories (positive and negative only), outputting three main sentiment categories (positive, negative and neutral), outputting four main sentiment categories (positive, negative, neutral and mixed) or outputting six main sentiment categories (positive, negative, neutral, mixed positive, mixed negative and mixed neutral) via the output 234 (FIG. 2) of the intelligent sentiment and emotion sensing unit 202. When the user selects outputting two main sentiment categories or outputting three main sentiment categories, the multi-portion ambivalence handler 806 will be engaged. Otherwise, the multi-portion ambivalence handler 806 will be disabled.

[0054] The aggregation assembling sub-module 226 also allows a user to select the weightage of non-learning based processing (y%) and learning based AI processing ((100-y)%) for aggregating the classified sentiments from both these sources to compute the valence in the portion valence computing module 802. The portion valence computing module 802 will use y%=100% as a default setting in the absence of any user selection.
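The description does not give the formula by which the weightage y% is applied. Purely as an assumed illustration, the sketch below maps each classified sentiment to a numeric score and forms a weighted average of the non-learning result and the mean of the learning based results. The score mapping `SCORE`, the averaging of the classifier outputs and the function name are all hypothetical, and ambivalent labels are assumed to have been resolved upstream.

```python
from typing import Sequence

# Hypothetical numeric mapping; not specified in the description.
SCORE = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}


def portion_valence(non_learning: str, learning: Sequence[str], y: float = 100.0) -> float:
    """Weight the non-learning result by y% and the learning results by (100 - y)%."""
    weight = y / 100.0
    learned = sum(SCORE[label] for label in learning) / len(learning) if learning else 0.0
    return weight * SCORE[non_learning] + (1.0 - weight) * learned
```

With the default y% = 100%, the result depends only on the non-learning classification, which matches the default setting stated above.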

[0055] For a piece of text data item including multiple portions N ($N \geq 1$), the aggregation sum operation sub-module 804 performs a first summing operation to calculate the positive sentiment as shown in Equation (1):

$$V^{+} = \sum_{i=1}^{N^{+}} v_i^{+} \qquad (1)$$

and a second summing operation to calculate the negative sentiment as shown in Equation (2):

$$V^{-} = \sum_{j=1}^{N^{-}} v_j^{-} \qquad (2)$$

where $V^{+}$ is the total positive value obtained from all the positive portions; $V^{-}$ is the total negative value obtained from all the negative portions; $v_i^{+}$ is the positive value of portion $i$; $v_j^{-}$ is the negative value of portion $j$; $N^{+}$ is the total number of the positive portions of the text data; $N^{-}$ is the total number of the negative portions of the text data; and $N^{+} + N^{-} + N^{0} = N$, where $N^{0}$ is the number of portions for which the valence values are neither positive nor negative.

[0056] The multi-portion ambivalence handler 806 is different from the non-sarcastic ambivalence handler submodule 414 in the ambivalence handler with hidden sarcasm detection module 214. The ambivalence handler with hidden sarcasm detection module 214 handles the ambivalence sentiment in one portion, whereas the multi-portion ambivalence handler 806 is designed to handle the ambivalence sentiment of a whole text data item, which can contain multiple text portions. The final output depends on the values of $V^{+}$ and $V^{-}$ as set forth in Equations (3) to (8):

If $V^{+} > V^{-}$ and $V^{-} = 0$, the output is positive (3)

If $V^{+} > V^{-}$ and $V^{-} \neq 0$, the output is mixed-positive (4)

If $V^{-} > V^{+}$ and $V^{+} = 0$, the output is negative (5)

If $V^{-} > V^{+}$ and $V^{+} \neq 0$, the output is mixed-negative (6)

If $V^{-} = V^{+} \neq 0$, the output is mixed-neutral (7)

If $V^{-} = V^{+} = 0$, the output is neutral (8)
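The two summing operations and the decision rules of Equations (3) to (8) can be sketched together in Python. The sketch assumes that each portion carries a signed valence (positive above zero, negative below zero) and that the total negative value is held as a magnitude, as the comparisons in the rules imply; the function name is hypothetical, and the category labels follow the six-category output mode.

```python
from typing import Sequence


def aggregate_sentiment(valences: Sequence[float]) -> str:
    """Classify a whole text item from its per-portion signed valences."""
    v_pos = sum(v for v in valences if v > 0)    # Equation (1): total positive value
    v_neg = sum(-v for v in valences if v < 0)   # Equation (2): total negative value (magnitude)
    if v_pos > v_neg:
        return "positive" if v_neg == 0 else "mixed-positive"   # Equations (3), (4)
    if v_neg > v_pos:
        return "negative" if v_pos == 0 else "mixed-negative"   # Equations (5), (6)
    return "neutral" if v_pos == 0 else "mixed-neutral"         # Equations (8), (7)
```

Portions with a valence of zero contribute to neither sum, which corresponds to the portions that are neither positive nor negative.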

[0057] The aggregation assembling sub-module 226 will output via the output 234 the final sentiment analysis result according to users' requirements as shown in the Table 2 below:

Table 2 Outputs from the Aggregation Assembling Module

*Users can make their own decision about this Mixed-Neutral category.

[0058] If users want only two outputs, positive and negative, they can treat neutral as a faulty classification or force the neutral category into either positive or negative according to their analysis. The default setting treats neutral as a faulty classification when the output mode is set to outputting only two main sentiment categories (positive and negative).

[0059] Considering that sentiment and emotion are closely related, a text portion with positive sentiment can never express negative emotions of anger and sadness. Therefore, the sentiment and emotion further analysis sub-module 808 is designed to perform further analysis on sentiment and on emotion before outputting final results via the output 234.

[0060] The greater the data volume, the greater the challenge will be for manual knowledge learning (manually inputting knowledge into the knowledge base). Thus, automatic analysis of social media content for automatic knowledge learning and updating becomes an urgently needed technology. In accordance with the present embodiment, an innovative approach which realizes automatic knowledge learning and updating is proposed in the automatic knowledge learning and updating unit 204. The automatic knowledge learning and updating unit 204 extracts useful knowledge 238 from a set of user inputted ground truth labelled data 236 or other existing data, such as domain specific labelled data, and automatically updates that knowledge in a knowledge base 240. At the same time, the user inputted ground truth labelled data 236 is used to update a ground truth test database 242. The generated knowledge is updated online to support the main sentiment analysis engine, the intelligent sentiment and emotion sensing unit 202.

[0061] Referring to FIG. 9, a block diagram 900 depicts the AI learning based module 222 in accordance with the present embodiment. The AI learning based module 222 includes a text pre-processing and different parts of speech (POS) analysis sub-module 902, a special handler for negation and question mark handling 904, a knowledge extraction through learning sub-module 906 and a knowledge selection and updating sub-module 908. The text pre-processing and different parts of speech (POS) analysis sub-module 902 aims to filter out noise from the input text. For example, in the sub-module 902, tags, URLs and other unrecognized entities are removed from the source to minimize the impact on later analysis. Moreover, for languages like English in which a word may appear in different forms, the word is reduced to its original form so that the analysis load can be decreased. Part of speech (POS) tagging is a necessary step when performing sentiment analysis, as the part of speech of a word has a great impact on the word's sentiment polarity. In accordance with the present embodiment, different parts of speech are assigned different sentiment weights. For example, adjectives convey stronger sentiment information than verbs or nouns, so adjectives are assigned larger sentiment weights. Verbs and nouns may also convey sentiment information from time to time. For example, the verb "love" and the noun "congratulations" are often associated with positive sentiment. However, to express the sentiment, adjectives play a significantly more dominant role than verbs and nouns. In light of the above, smaller sentiment weights are assigned to verbs and nouns.
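The noise filtering step of the sub-module 902 may be illustrated, under assumptions, by the following Python sketch. It treats "tags" as hashtags and user mentions and removes them whole; that interpretation, the regular expressions and the function name are hypothetical. Reduction of words to their original form and POS weighting are not shown.

```python
import re


def strip_noise(text: str) -> str:
    """Remove URLs, hashtags and mentions, then collapse runs of whitespace."""
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"[#@]\w+", " ", text)       # hashtags and user mentions (assumed meaning of "tags")
    return re.sub(r"\s+", " ", text).strip()
```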

[0062] For the automatic knowledge learning and updating unit 204 and the AI learning based module 222, inputs 244, 246 are received from both the intelligent sentiment and emotion sensing unit 202 (the input 244 from the output of the portion sensing further analysis sub-module 224) and the users' input ground truth labelled data 236 (the input 246).

[0063] The special handler for negation and question mark handling 904 is designed to handle the text data from the users' input ground truth labelled data 236 received on the input 246 with negations or question marks before feeding them into the learning process. This special handler 904 addresses the issue that when a negation appears in a text, it works together with positive or negative words or phrases to produce a final sensing output of positive or negative sentiment opposite to the original sentiment. Similarly, texts with question marks pose problems for conventional knowledge learning. For example, in texts such as "Is this the best one?" and "Which one is a good brand?", people are asking questions rather than giving opinions. Therefore, the special handler for negation and question mark handling 904 filters away the data with question marks.

[0064] For the input 246 from the users' input ground truth labelled data 236, a learning-based method is selected to compute a polarity score for each word or phrase item occurring in the text. Before performing knowledge extraction by the learning method, the users' input ground truth labelled data 236 is tested by the system and, if the testing accuracy is higher than a threshold accuracy, the users' input ground truth labelled data 236 is used as ground truth data for the knowledge extraction through the knowledge extraction through learning sub-module 906, and the user can select the knowledge extraction mode and view the knowledge extracted before performing the knowledge updating by the knowledge selection and updating module 908 receiving input from a human expert input module 910. If the testing results are not good but the user would like to use such data to extract the knowledge 238, the extracted knowledge will be saved in a new domain to form a new user domain knowledge base 240.

[0065] For the input 244 from the portion sensing further analysis sub-module 224 in the intelligent sentiment and emotion sensing unit 202, since the data has been processed by both the direct sensing analysis processing module 212 and the AI learning based module 220, there is no need for such testing, and the knowledge extraction through learning sub-module 906 performs knowledge extraction directly. For the knowledge extraction through learning sub-module 906, any open sourced learning-based method which can extract knowledge from labelled datasets can be used.

[0066] Through the above learning process, a polarity value for each word and phrase item with different parts of speech that occurs in the text is computed. A sorting process based on the value in the ascending order is then carried out. The first few words at the top of the list are the most negative words for this domain and the last few words are the most positive ones. Users may handcraft their own strategies to select a word or phrase based on their own need. For example, a user may select the top and bottom k percent for their sentiment lexicon. Alternatively, the user may set a threshold where any word having a polarity value greater than the threshold is included in the sentiment dictionary for that particular domain.
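The ascending sort and the top and bottom k percent selection strategy can be sketched as follows. The function name and the rule of keeping at least one item at each end are assumptions; the polarity values in the usage lines are invented solely for illustration.

```python
from typing import Dict, List, Tuple


def select_lexicon(polarity: Dict[str, float], k_percent: float) -> Tuple[List[str], List[str]]:
    """Return (most negative items, most positive items) taken from the two ends of the sorted list."""
    ranked = sorted(polarity, key=lambda item: polarity[item])  # ascending: most negative first
    if not ranked:
        return [], []
    n = max(1, int(len(ranked) * k_percent / 100))  # assumption: keep at least one item per end
    return ranked[:n], ranked[-n:]


# Invented polarity values, for illustration only.
scores = {"awful": -0.9, "bad": -0.5, "ok": 0.0, "good": 0.6, "great": 0.95}
negative_items, positive_items = select_lexicon(scores, k_percent=20)
```

For large values of k the two ends can overlap; a user strategy would normally keep k well below 50.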

[0067] The proposed method can utilize a learning-based method through the AI learning based module 222 (AI learning) to extract new lexicons and n-grams from existing ground truth data. In addition, the AI learning based module 222 is also capable of continually identifying and collecting new ground truth data every time the system processes new test data through the AI learning based module 220 (AI classifiers). Thus, the system in accordance with the present embodiment has double the growth factor of a conventional system that merely relies on user supplied ground truth data to build its knowledge base. In the context of sentiment analysis, this provides a crucial advantage due to the unique difficulty in obtaining pre-labelled training data.

[0068] The social sentiment platform in accordance with the present embodiment is designed for use by a layman. The intelligent sentiment and emotion sensing unit 202 with the learning method embedded within the platform has the capabilities to collect, filter, classify, analyse and display a descriptive and predictive analytic dashboard for a given concept. The system with the above unique advantageous features enables users to understand the public voice over the Internet more effectively and accurately, answering an urgent need in the industry.

[0069] Testing on publicly available social comments (which were downloaded from the website http://www.glassdoor.com/) was performed to validate the intelligent sensing system 104 in accordance with the present embodiment. More than five thousand entries were collected for each of the two sentiment categories - positive (Pros) and negative (Cons). The validation results show that the method can achieve better performance than current systems including the famous Stanford CoreNLP tool. A summary of the validation results is shown in Table 3.

Table 3. Preliminary results for analyzing publicly available social comments

[0070] Thus, it can be seen that the present embodiment provides an efficient and accurate method and system for sentiment classification of text, such as social media data, utilizing intelligent features and learning capabilities.

[0071] While exemplary embodiments have been presented in the foregoing detailed description of the invention, it should be appreciated that a vast number of variations exist. It should further be appreciated that the exemplary embodiments are only examples, and are not intended to limit the scope, applicability, operation, or configuration of the invention in any way. Rather, the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing an exemplary embodiment of the invention, it being understood that various changes may be made in the function and arrangement of steps and method of operation described in the exemplary embodiment without departing from the scope of the invention as set forth in the appended claims.
