Title:
SYSTEM AND METHOD FOR AUTOMATIC VALIDATION
Document Type and Number:
WIPO Patent Application WO/2017/091825
Kind Code:
A1
Abstract:
A system and method for validating a transaction represented by an electronic document. The method includes: identifying, in the electronic document, at least one key field and at least one value; creating, based on the electronic document, a dataset, wherein the created dataset includes the at least one key field and the at least one value; analyzing the created dataset to determine at least one transaction parameter; creating a template for the transaction, wherein the template is a structured dataset including the determined at least one transaction parameter; and applying, in real-time, at least one rule to the created template to determine whether the transaction is validated.

Inventors:
GUZMAN NOAM (IL)
SAFT ISAAC (IL)
Application Number:
PCT/US2016/063828
Publication Date:
June 01, 2017
Filing Date:
November 28, 2016
Assignee:
VATBOX LTD (IL)
M&B IP ANALYSTS LLC (US)
International Classes:
G06F17/30; G06V30/224
Foreign References:
US7299408B12007-11-20
US20140079294A12014-03-20
US20150332283A12015-11-19
US5606609A1997-02-25
US20140344576A12014-11-20
Other References:
See also references of EP 3380958A4
Attorney, Agent or Firm:
BEN-SHIMON, Michael (US)
Claims:
CLAIMS

What is claimed is:

1. A method for validating a transaction represented by an electronic document, comprising:

identifying, in the electronic document, at least one key field and at least one value; creating, based on the electronic document, a dataset, wherein the created dataset includes the at least one key field and the at least one value;

analyzing the created dataset to determine at least one transaction parameter; creating a template for the transaction, wherein the template is a structured dataset including the determined at least one transaction parameter; and

applying, in real-time, at least one rule to the created template to determine whether the transaction is validated.

2. The method of claim 1, further comprising:

determining, based on the created dataset, at least one entity identifier; and verifying an entity of the transaction based on the determined at least one entity identifier.

3. The method of claim 2, wherein the electronic document is a first electronic document, wherein the created dataset is a first dataset, further comprising:

obtaining, based on the first electronic document, a second electronic document; creating, based on the obtained second electronic document, a second dataset; and

comparing the first dataset to the second dataset, wherein the verification is based further on the comparison.

4. The method of claim 2, wherein the transaction is a reverse charge mechanism transaction, wherein the entity is a merchant enterprise entity, wherein verifying the entity further comprises: determining, based on the at least one entity identifier, whether the merchant enterprise entity is registered for reverse charge mechanism transactions.

5. The method of claim 1, further comprising:

determining, based on at least one predetermined constraint, whether the created dataset is eligible for validation.

6. The method of claim 1, further comprising:

generating a notification, wherein the notification indicates whether the transaction is validated.

7. The method of claim 1, wherein identifying the at least one key field and the at least one value further comprises:

analyzing the electronic document to determine data in the electronic document; and

extracting, based on a predetermined list of key fields, at least a portion of the determined data, wherein the at least a portion of the determined data matches at least one key field of the predetermined list of key fields.

8. The method of claim 7, wherein analyzing the electronic document further comprises:

performing optical character recognition on the electronic document.

9. The method of claim 7, further comprising:

performing a cleaning process on the extracted at least a portion of the determined data.

10. The method of claim 7, further comprising:

checking if each piece of data of the extracted at least a portion of the determined data is completed; and for each piece of data that is not completed, performing at least one of: querying at least one external source, and correlating the determined data with data of at least one previously analyzed electronic document.

11. A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to perform a process, the process comprising:

identifying, in the electronic document, at least one key field and at least one value; creating, based on the electronic document, a dataset, wherein the created dataset includes the at least one key field and the at least one value;

analyzing the created dataset to determine at least one transaction parameter; creating a template for the transaction, wherein the template is a structured dataset including the determined at least one transaction parameter; and

applying, in real-time, at least one rule to the created template to determine whether the transaction is validated.

12. A system for validating a transaction represented by an electronic document, comprising:

a processing circuitry; and

a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to:

identify, in the electronic document, at least one key field and at least one value; create, based on the electronic document, a dataset, wherein the created dataset includes the at least one key field and the at least one value;

analyze the created dataset to determine at least one transaction parameter; create a template for the transaction, wherein the template is a structured dataset including the determined at least one transaction parameter; and

apply, in real-time, at least one rule to the created template to determine whether the transaction is validated.

13. The system of claim 12, wherein the system is further configured to:

determine, based on the created dataset, at least one entity identifier; and verify an entity of the transaction based on the determined at least one entity identifier.

14. The system of claim 13, wherein the electronic document is a first electronic document, wherein the created dataset is a first dataset, wherein the system is further configured to:

obtain, based on the first electronic document, a second electronic document; create, based on the obtained second electronic document, a second dataset; and compare the first dataset to the second dataset, wherein the verification is based further on the comparison.

15. The system of claim 13, wherein the transaction is a reverse charge mechanism transaction, wherein the entity is a merchant enterprise entity, wherein the system is further configured to:

determine, based on the at least one entity identifier, whether the merchant enterprise entity is registered for reverse charge mechanism transactions.

16. The system of claim 12, wherein the system is further configured to:

determine, based on at least one predetermined constraint, whether the created dataset is eligible for validation.

17. The system of claim 12, wherein the system is further configured to:

generate a notification, wherein the notification indicates whether the transaction is validated.

18. The system of claim 12, wherein the system is further configured to:

analyze the electronic document to determine data in the electronic document; and extract, based on a predetermined list of key fields, at least a portion of the determined data, wherein the at least a portion of the determined data matches at least one key field of the predetermined list of key fields.

19. The system of claim 18, wherein the system is further configured to: perform optical character recognition on the electronic document.

20. The system of claim 18, wherein the system is further configured to:

perform a cleaning process on the extracted at least a portion of the determined data.

21. The system of claim 18, wherein the system is further configured to:

check if each piece of data of the extracted at least a portion of the determined data is completed; and

for each piece of data that is not completed, perform at least one of: querying at least one external source, and correlating the determined data with data of at least one previously analyzed electronic document.

Description:
SYSTEM AND METHOD FOR AUTOMATIC VALIDATION

CROSS-REFERENCE TO RELATED APPLICATIONS

[001] This application claims the benefit of U.S. Provisional Application No. 62/260,553 filed on November 29, 2015, and of U.S. Provisional Application No. 62/261,355 filed on December 1, 2015, the contents of which are hereby incorporated by reference.

TECHNICAL FIELD

[002] The present disclosure relates generally to data analysis, and more particularly to validation of data.

BACKGROUND

[003] As businesses increasingly rely on technology to manage data related to operations, suitable systems for properly managing and validating data have become crucial to success. Particularly for large businesses, the amount of data utilized daily by businesses can be overwhelming. Accordingly, manual review and validation of such data is impractical, at best. In addition to normal sales data, businesses in countries where value-added taxes (VATs) are applied collect and utilize even more data, thereby raising additional potential points of failure.

[004] The challenges facing customers seeking a refund and, in particular, seeking VAT refunds, may result in customers becoming discouraged and failing to follow through on obtaining their refunds. This issue is further compounded when the customer is an employee of an enterprise because the customer is not directly benefiting from the refund. Moreover, employees may submit irrelevant or duplicate information that is unnecessary for seeking refunds. Filtering through such unnecessary information may be time-consuming, costly, and subject to a large degree of human error.

[005] Moreover, business-to-business (B2B) transactions typically raise further tax issues, especially when VATs are involved. Previously, businesses engaging in B2B transactions with businesses in other countries needed to seek separate registrations and payments for VATs. To address these challenges, the reverse charge mechanism (RCM) was developed by the European Union. The RCM allows a first party of a transaction to charge the other party only for the actual cost of the transaction, i.e., excluding VAT charges. However, to utilize the RCM, businesses must collect and report various data. Manual collection and use of the data is resource-intensive and error-prone. Further, validation of RCM data is inefficient.

[006] Additionally, many existing solutions for validating based on data face challenges in obtaining the data required for validation. Specifically, existing solutions typically require either structured data or data that otherwise adheres to particular format requirements (e.g., a required size of a scanned image, markings indicating validation-related information, etc.) in order to identify and utilize the data. Such existing solutions face challenges when appropriately formatted data cannot be obtained. In particular, such existing solutions may be unable to complete validation or may return inaccurate results (e.g., false positives and false negatives) when required data cannot be properly identified. Such issues may occur when, for example, data is unstructured or semi-structured, or when data is at least partially structured but in an unrecognizable format so as to effectively render the data unstructured for analysis purposes. These issues are compounded when multiple sets of data requirements (e.g., requirements for VAT refunds and for RCM validation) must be met. Manual checking to account for these issues is inefficient and introduces new potential sources of error.

[007] It would therefore be advantageous to provide a solution that would overcome the deficiencies of the prior art.

SUMMARY

[008] A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to provide a basic understanding of such embodiments and does not wholly define the breadth of the disclosure. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodiments nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later. For convenience, the term "some embodiments" may be used herein to refer to a single embodiment or multiple embodiments of the disclosure.

[009] Certain embodiments disclosed herein include a method for validating a transaction represented by an electronic document. The method comprises: identifying, in the electronic document, at least one key field and at least one value; creating, based on the electronic document, a dataset, wherein the created dataset includes the at least one key field and the at least one value; analyzing the created dataset to determine at least one transaction parameter; creating a template for the transaction, wherein the template is a structured dataset including the determined at least one transaction parameter; and applying, in real-time, at least one rule to the created template to determine whether the transaction is validated.

[0010] Certain embodiments disclosed herein also include a non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to perform a process, the process comprising: identifying, in the electronic document, at least one key field and at least one value; creating, based on the electronic document, a dataset, wherein the created dataset includes the at least one key field and the at least one value; analyzing the created dataset to determine at least one transaction parameter; creating a template for the transaction, wherein the template is a structured dataset including the determined at least one transaction parameter; and applying, in real-time, at least one rule to the created template to determine whether the transaction is validated.

[0011] Certain embodiments disclosed herein also include a system for validating a transaction represented by an electronic document. The system comprises: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: identify, in the electronic document, at least one key field and at least one value; create, based on the electronic document, a dataset, wherein the created dataset includes the at least one key field and the at least one value; analyze the created dataset to determine at least one transaction parameter; create a template for the transaction, wherein the template is a structured dataset including the determined at least one transaction parameter; and apply, in real-time, at least one rule to the created template to determine whether the transaction is validated.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the disclosed embodiments will be apparent from the following detailed description taken in conjunction with the accompanying drawings.

[0013] Figure 1 is a network diagram utilized to describe the various disclosed embodiments.

[0014] Figure 2 is a schematic diagram of a validation system according to an embodiment.

[0015] Figure 3 is a flowchart illustrating a method for validating data according to an embodiment.

[0016] Figure 4 is a flowchart illustrating a method for creating a dataset based on at least one electronic document according to an embodiment.

[0017] Figure 5 is a flowchart illustrating a method for verifying an entity of a transaction according to an embodiment.

[0018] Figure 6 is a flowchart illustrating a method for validating a transaction based on a first dataset and a second dataset according to an embodiment.

DETAILED DESCRIPTION

[0019] It is important to note that the embodiments disclosed herein are only examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed embodiments. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in plural and vice versa with no loss of generality. In the drawings, like numerals refer to like parts through several views.

[0020] The various disclosed embodiments include a method and system for automatically validating transactions. In an embodiment, a dataset is created based on at least one electronic document. In an optional embodiment, a consumer enterprise indicated in the dataset is verified. The dataset is analyzed to determine if a transaction indicated in the dataset is eligible for validation and, if so, a template of transaction attributes is created. At least one rule is applied to the created template to determine if requirements for validation are met. A notification indicating whether the transaction has been validated may be generated.

[0021] Fig. 1 shows an example network diagram 100 utilized to describe the various disclosed embodiments. In the example network diagram 100, a validation system 120, a consumer enterprise system 130, a plurality of databases 140-1 through 140-N (hereinafter referred to individually as a database 140 and collectively as databases 140, merely for simplicity purposes), and a merchant enterprise system 150, are communicatively connected via a network 110. The network 110 may be, but is not limited to, a wireless, cellular or wired network, a local area network (LAN), a wide area network (WAN), a metro area network (MAN), the Internet, the worldwide web (WWW), similar networks, and any combination thereof.

[0022] The consumer enterprise system 130 is associated with a consumer entity, and may store data related to purchases made by the consumer entity as well as data related to the consumer entity itself. The consumer entity may be, but is not limited to, a business that purchases goods or services in a business-to-business (B2B) transaction. The merchant enterprise system 150 is associated with a merchant entity, and may store data related to sales made by the merchant entity as well as data related to the merchant entity itself. The merchant entity may be, but is not limited to, a business that sold goods or services in a B2B transaction. Each of the consumer enterprise system 130 and the merchant enterprise system 150 may be, but is not limited to, a server, a database, an enterprise resource planning system, a customer relationship management system, or any other system storing relevant data.

[0023] The data stored by each of the consumer enterprise system 130 and the merchant enterprise system 150 may include, but is not limited to, electronic documents (e.g., an image file showing, for example, a scan of an invoice, a text file, a spreadsheet file, etc.). Data included in the electronic document may be structured, semi-structured, unstructured, or a combination thereof. The structured or semi-structured data may be in a format that is not recognized by the validation system 120 and, therefore, may be treated like unstructured data.

[0024] The databases 140 store at least validation data utilized for validating transactions. The data may include, but is not limited to, requirements for validating data, data related to transactions, and the like. As a non-limiting example, the databases 140 may store regulatory requirements related to B2B transactions (e.g., requirements for validating reverse charge mechanism (RCM) transactions). The requirements may be stored in the form of, for example, rules.

[0025] In an embodiment, the validation system 120 is configured to create datasets based on electronic documents including data at least partially lacking a known structure (e.g., unstructured data, semi-structured data, or structured data having an unknown structure). To this end, the validation system 120 may be further configured to utilize optical character recognition (OCR) or other image processing to determine data in the electronic document.

[0026] In an embodiment, the validation system 120 is configured to analyze the created datasets to identify transaction parameters related to transactions related to the electronic documents. In some embodiments, the validation system 120 may be configured to verify an entity indicated in the created datasets. In another embodiment, the validation system 120 may be configured to determine whether the created datasets are eligible for validation based on, e.g., whether the dataset meets at least one predetermined constraint.

[0027] In an embodiment, the validation system 120 is configured to create a template based on the created datasets. The template is a structured dataset including the identified transaction parameters. In a further embodiment, the validation system 120 is configured to apply, in real-time, at least one rule to the created template to determine if requirements for validation are met by the transaction.

[0028] In another example implementation, the validation system 120 may be further configured to compare information from a first electronic document with information from a second electronic document to determine whether the information in the first electronic document is accurate. As a non-limiting example, information from an invoice for a business transaction and information from a waybill associated with an item sold during the business transaction may be compared. The validation system 120 may be further configured to utilize optical character recognition (OCR) to determine at least a portion of the information to be compared. In yet another example implementation, determining whether the data is validated may include determining whether the first electronic document is complete.
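As a non-limiting illustration, the comparison of a first dataset (e.g., from an invoice) with a second dataset (e.g., from a waybill) may be sketched as follows. The function name compare_datasets, the field names, and the normalization applied before comparing are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, Optional, Tuple


def _normalize(value: Optional[str]) -> Optional[str]:
    """Trim whitespace and lowercase so trivial OCR differences are ignored."""
    return None if value is None else str(value).strip().lower()


def compare_datasets(
    first: Dict[str, str], second: Dict[str, str], keys: Tuple[str, ...]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return the key fields whose values differ between the two datasets.

    An empty result means the second document is consistent with the first
    for every compared key field.
    """
    mismatches = {}
    for key in keys:
        a, b = first.get(key), second.get(key)
        if _normalize(a) != _normalize(b):
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical invoice and waybill datasets.
invoice = {"invoice_number": "INV-7 ", "price": "100"}
waybill = {"invoice_number": "inv-7", "price": "120"}
result = compare_datasets(invoice, waybill, ("invoice_number", "price"))
```

In this sketch only the price field is reported, since the invoice numbers differ solely in case and trailing whitespace.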

[0029] The validation system 120 may be configured to generate a notification based on the validation determination. The validation system 120 may be further configured to send the notification to, for example, a user device 160 communicatively connected to the network 110. The user device (UD) 160 may be, but is not limited to, a personal computer, a laptop, a tablet computer, a smartphone, a wearable computing device, or any other device capable of receiving and displaying notifications. The user device 160 may be, for example, a user device of a business engaged in a transaction that wishes to know whether the transaction is validated.

[0030] It should be noted that the embodiments described herein above with respect to Fig. 1 are described with respect to one consumer enterprise system 130 and one merchant enterprise system 150 merely for simplicity purposes and without limitation on the disclosed embodiments. Multiple consumer enterprise systems, multiple merchant enterprise systems, or both, may be equally utilized without departing from the scope of the disclosure.

[0031] Fig. 2 is an example schematic diagram of the validation system 120 according to an embodiment. The validation system 120 includes a processing circuitry 210 coupled to a memory 215, a storage 220, and a network interface 240. In an optional embodiment, the validation system 120 may include an optical character recognition (OCR) processor 230. In another embodiment, the components of the validation system 120 may be communicatively connected via a bus 250.

[0032] The processing circuitry 210 may be realized as one or more hardware logic components and circuits. For example, and without limitation, illustrative types of hardware logic components that can be used include field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), general-purpose microprocessors, microcontrollers, digital signal processors (DSPs), and the like, or any other hardware logic components that can perform calculations or other manipulations of information.

[0033] The memory 215 may be volatile (e.g., RAM, etc.), non-volatile (e.g., ROM, flash memory, etc.), or a combination thereof. In one configuration, computer readable instructions to implement one or more embodiments disclosed herein may be stored in the storage 220.

[0034] In another embodiment, the memory 215 is configured to store software. Software shall be construed broadly to mean any type of instructions, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Instructions may include code (e.g., in source code format, binary code format, executable code format, or any other suitable format of code). The instructions, when executed by the one or more processors, cause the processing circuitry 210 to perform the various processes described herein. Specifically, the instructions, when executed, cause the processing circuitry 210 to perform validation of transactions, as discussed herein.

[0035] The storage 220 may be magnetic storage, optical storage, and the like, and may be realized, for example, as flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVDs), or any other medium which can be used to store the desired information.

[0036] The OCR processor 230 may include, but is not limited to, a feature and/or pattern recognition unit (RU) 235 configured to identify patterns, features, or both, in unstructured data sets. Specifically, in an embodiment, the OCR processor 230 is configured to identify at least characters in the unstructured data. The identified characters may be utilized to create a validation dataset including data required for validation of a transaction.

[0037] The network interface 240 allows the validation system 120 to communicate with the consumer enterprise system 130, the databases 140, the merchant enterprise system 150, or a combination thereof, for the purpose of, for example, collecting metadata, retrieving data, and the like. Additionally, the network interface 240 allows the validation system 120 to communicate with the user device 160 in order to send notifications regarding validation of data.

[0038] It should be understood that the embodiments described herein are not limited to the specific architecture illustrated in Fig. 2, and other architectures may be equally used without departing from the scope of the disclosed embodiments.

[0039] Fig. 3 is an example flowchart 300 illustrating a method for validating data according to an embodiment. In an embodiment, the method may be performed by a validation system (e.g., the validation system 120). In a further embodiment, the validation may be validation of a transaction (e.g., a transaction involving the reverse charge mechanism) between a consumer entity (e.g., a buyer business) and a merchant entity (e.g., a seller business).

[0040] At S310, a dataset is created based on at least one electronic document including information related to a transaction. Each of the at least one electronic document may include, but is not limited to, unstructured data, semi-structured data, structured data with structure that is unanticipated or unannounced, or a combination thereof. In an embodiment, S310 may further include analyzing the electronic document using optical character recognition (OCR) to determine data in the electronic document, identifying key fields in the data, identifying values in the data, or a combination thereof. Creating datasets based on electronic documents is described further herein below with respect to Fig. 4.

[0041] At S320, the created dataset is analyzed. In an embodiment, analyzing the dataset may include, but is not limited to, determining transaction parameters such as, but not limited to, at least one entity identifier (e.g., a consumer enterprise identifier, a merchant enterprise identifier, or both), information related to the transaction (e.g., a date, a time, a price, a type of good or service sold, etc.), or both. In a further embodiment, analyzing the dataset may also include identifying the transaction based on the dataset.

[0042] At optional S330, any entity indicated in the created dataset may be verified. In an example, the entity to be verified may be a merchant enterprise of the transaction. In an embodiment, the verification is based on the determined at least one entity identifier. In a further embodiment, S330 may include searching in at least one database based on the at least one identifier. Verifying entities of transactions based on datasets created using electronic documents is described further herein below with respect to Fig. 5.

[0043] At optional S340, it is determined, based on the analysis, whether the created dataset is eligible for validation and, if so, execution continues with S350; otherwise, execution terminates. In an embodiment, S340 may include determining whether the created dataset meets at least one predetermined constraint. A dataset may be eligible for validation if, e.g., the dataset meets the at least one predetermined constraint. The at least one predetermined constraint may include, but is not limited to, requirements on types of information needed for validation, accuracy requirements, or a combination thereof. For example, if an electronic document does not include a country for the merchant enterprise in a transaction or a price of the transaction, validation may not be possible. Determining whether the transaction is eligible for validation may reduce use of computing resources by only validating datasets meeting minimum requirements.
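As a non-limiting illustration, the eligibility determination of S340 may be sketched as a check that every required type of information is present in the created dataset. The constraint shown (a merchant country and a price must be present) follows the example given above; the names REQUIRED_FIELDS and is_eligible, and the field names, are hypothetical.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical predetermined constraint: types of information needed
# before validation is attempted.
REQUIRED_FIELDS = ("merchant_country", "price")


def is_eligible(
    dataset: Mapping[str, Optional[str]],
    required: Iterable[str] = REQUIRED_FIELDS,
) -> bool:
    """Return True only if every required field is present and non-empty."""
    for field in required:
        value = dataset.get(field)
        if value is None or str(value).strip() == "":
            return False
    return True


complete = {"merchant_country": "DE", "price": "100.00"}
missing_country = {"price": "100.00"}
```

Under these assumptions, is_eligible(complete) holds and is_eligible(missing_country) does not, so only the first dataset would proceed to S350.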

[0044] In another embodiment, S340 may further include determining at least one constraint based on the created dataset. In a further embodiment, determining the at least one constraint may include searching in at least one database based on the created dataset (e.g., using a location of the merchant enterprise indicated in the created dataset). In yet a further embodiment, S340 may also include analyzing at least one reporting requirements electronic document (e.g., a VAT reclaim form) to determine the at least one constraint. The analysis may further include performing OCR or other image processing on each reporting requirements electronic document.

[0045] In another embodiment, when it is determined that the transaction is not eligible for validation, replacement information may be retrieved from at least one data source and included in the created dataset. In a further embodiment, upon retrieving the replacement information, execution continues with S350. In another embodiment, upon retrieving the replacement information, it is determined whether the dataset with the replacement information is eligible for validation and, if so, execution continues with S350; otherwise, execution terminates.

[0046] At S350, a template of the transaction is created. The template may be, but is not limited to, a data structure including a plurality of fields. The fields may include the identified transaction parameters. The fields may be predefined.
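As a non-limiting illustration, a template having predefined fields may be sketched as a fixed record type that is populated from the identified transaction parameters. The class name TransactionTemplate, the function create_template, and the particular fields are hypothetical; the disclosure does not prescribe a specific data structure.

```python
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class TransactionTemplate:
    """Hypothetical structured dataset with predefined fields."""

    merchant_id: Optional[str] = None
    consumer_id: Optional[str] = None
    date: Optional[str] = None
    price: Optional[str] = None
    currency: Optional[str] = None


def create_template(parameters: Mapping[str, str]) -> TransactionTemplate:
    """Copy known transaction parameters into the template.

    Parameters that do not correspond to a predefined field are dropped;
    predefined fields with no matching parameter remain None.
    """
    known = {f.name for f in fields(TransactionTemplate)}
    selected = {k: v for k, v in parameters.items() if k in known}
    return TransactionTemplate(**selected)


template = create_template(
    {"merchant_id": "M1", "price": "10.00", "logo": "ignored"}
)
```

A fixed set of fields is what allows later rule application to address each parameter by name without re-examining the source document.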

[0047] Creating templates from electronic documents allows for faster processing due to the structured nature of the created templates. For example, query and manipulation operations may be performed more efficiently on structured datasets than on datasets lacking such structure. Further, by organizing information from electronic documents into structured datasets, the amount of storage required for saving information contained in electronic documents may be significantly reduced. Electronic documents are often images that require more storage space than datasets containing the same information. For example, datasets representing data from 100,000 image electronic documents can be saved as data records in a text file. A size of such a text file would be significantly less than the size of the 100,000 images.

[0048] At S360, at least one rule is applied, in real-time, to the created template to determine if requirements for validation are met. In an embodiment, S360 includes retrieving the at least one rule from at least one database (e.g., a database of a regulatory authority that establishes requirements for reverse charge mechanism transactions or other transactions). In a further embodiment, the at least one rule may be retrieved based on at least a portion of the transaction parameters. Applying the at least one rule in real-time allows for automatic application of current rules for transaction validation, thereby increasing accuracy.
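As a non-limiting illustration, the rule application of S360 may be sketched by modeling each rule as a predicate over the template and retrieving the rules at the time of validation, so that the rules applied are the ones current at that moment. The function fetch_current_rules is a hypothetical stand-in for a database query, and the two rules shown are invented for illustration; they are not actual regulatory requirements.

```python
from typing import Callable, List, Mapping

# A rule is modeled as a predicate over the template's fields.
Rule = Callable[[Mapping[str, object]], bool]


def fetch_current_rules() -> List[Rule]:
    """Hypothetical stand-in for retrieving rules from a database.

    In this sketch the rules are hard-coded; they are illustrative only.
    """
    return [
        # A merchant identifier must be present.
        lambda t: bool(t.get("merchant_id")),
        # The invoice must carry no VAT charge.
        lambda t: t.get("vat_amount") in (0, "0", "0.00"),
    ]


def validate(
    template: Mapping[str, object],
    rule_source: Callable[[], List[Rule]] = fetch_current_rules,
) -> bool:
    """Fetch the rules at call time and require that all of them pass."""
    return all(rule(template) for rule in rule_source())


valid = validate({"merchant_id": "M1", "vat_amount": 0})
invalid = validate({"merchant_id": "", "vat_amount": 0})
```

Passing the rule source as a callable, rather than a list fixed at start-up, is one way to reflect the real-time retrieval described above.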

[0049] At optional S370, a notification may be generated. The notification may indicate the results of the application of the at least one rule (i.e., whether the transaction is validated).

[0050] Fig. 4 is an example flowchart S310 illustrating a method for creating a dataset based on at least one electronic document according to an embodiment.

[0051] At S410, the at least one electronic document is obtained. Obtaining each electronic document may include, but is not limited to, receiving the electronic document (e.g., receiving a scanned image) or retrieving the electronic document (e.g., retrieving the electronic document from a consumer enterprise system, a merchant enterprise system, or a database).

[0052] At S420, the electronic document is analyzed. The analysis may include, but is not limited to, using optical character recognition (OCR) to determine characters in the electronic document.

[0053] At S430, based on the analysis, key fields and values in the electronic document are identified. The key fields may include, but are not limited to, merchant's name and address, date, currency, good or service sold, a transaction identifier, an invoice number, and so on. An electronic document may include unnecessary details that would not be considered to be key values. As an example, a logo of the merchant may not be required and, thus, is not a key value. In an embodiment, a list of key fields may be predefined, and pieces of data that may match the key fields are extracted. Then, a cleaning process is performed to ensure that the information is accurately presented. For example, if the OCR results in data presented as "121 1212005", the cleaning process will convert this data to 12/12/2005. As another example, if a name is presented as "Mo$den", the cleaning process will change it to "Mosden". The cleaning process may be performed using external information resources, such as dictionaries, calendars, and the like.
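The two cleaning examples above can be sketched in Python as follows. This is only one possible heuristic: it assumes that the OCR misread each "/" separator as the digit "1", assumes a day/month/year order, and uses the standard `datetime` module in place of an external calendar resource. The `SYMBOL_FIXES` table and both function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed OCR confusions: a symbol mapped to the letter it likely stands for.
SYMBOL_FIXES = {"$": "s", "@": "a"}


def clean_name(raw: str) -> str:
    """Replace symbols that are unlikely to appear in a name."""
    return "".join(SYMBOL_FIXES.get(ch, ch) for ch in raw)


def clean_date(raw: str) -> str:
    """Repair a date whose '/' separators were read as '1'.

    Returns the input unchanged when the pattern does not match or when
    the repaired string is not a real calendar date.
    """
    match = re.fullmatch(r"(\d{2})1\s*(\d{2})1(\d{4})", raw.strip())
    if not match:
        return raw
    candidate = "/".join(match.groups())
    try:
        datetime.strptime(candidate, "%d/%m/%Y")  # calendar check
    except ValueError:
        return raw
    return candidate


print(clean_date("121 1212005"))  # 12/12/2005
print(clean_name("Mo$den"))       # Mosden
```

Returning the raw value on failure keeps the original OCR output available for a later completion or review step.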

[0054] In a further embodiment, it is checked whether the extracted pieces of data are complete. For example, if the merchant name can be identified but its address is missing, then the key field for the merchant address is incomplete. An attempt to complete the missing key field values is performed. This attempt may include querying external systems and databases, correlation with previously analyzed invoices, or a combination thereof. Examples for external systems and databases may include business directories, Universal Product Code (UPC) databases, parcel delivery and tracking systems, and so on. In an embodiment, S430 results in a complete set of the predefined key fields and their respective values.
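A minimal Python sketch of the completeness check and completion attempt follows. The required field names, the in-memory `DIRECTORY`, and the lookup function are hypothetical stand-ins for the predefined key fields and for an external business directory.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical predefined key fields.
REQUIRED_FIELDS = ("merchant_name", "merchant_address", "date", "currency")

# A lookup receives the data gathered so far and the name of a missing field.
Lookup = Callable[[Dict[str, str], str], Optional[str]]


def missing_fields(extracted: Dict[str, str]) -> List[str]:
    """List the predefined key fields that have no value yet."""
    return [f for f in REQUIRED_FIELDS if not extracted.get(f)]


def complete(extracted: Dict[str, str], lookups: Sequence[Lookup]) -> Dict[str, str]:
    """Try each external source in turn for every missing key field."""
    result = dict(extracted)
    for field in missing_fields(result):
        for lookup in lookups:
            value = lookup(result, field)
            if value:
                result[field] = value
                break
    return result


# Stand-in for a business directory keyed by merchant name.
DIRECTORY = {"Mosden": "1 Example Street"}


def directory_lookup(data: Dict[str, str], field: str) -> Optional[str]:
    if field == "merchant_address":
        return DIRECTORY.get(data.get("merchant_name", ""))
    return None
```

For instance, calling `complete` on a record that has a merchant name but no address fills the address from the directory, while fields that no source can supply remain listed by `missing_fields`.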

[0055] At S440, a structured dataset is generated. The generated dataset includes the identified key fields and values.

[0056] Fig. 5 is an example flowchart S330 illustrating a method for verifying an entity of a transaction based on a dataset created using an electronic document according to an embodiment. In an example, the entity to be verified may be a merchant enterprise of a transaction. In such an example, the verification may be utilized to check that the merchant enterprise is registered for reverse charge mechanism transactions (e.g., if the merchant enterprise is VAT registered in a particular country, if the merchant enterprise is European Union certified, etc.). If the merchant enterprise is not registered for reverse charge mechanism transactions, then any reverse charge mechanism transaction including the merchant enterprise will be determined to not be valid regardless of other requirements for validation.

[0057] At S510, at least one entity identifier in the dataset is determined. Each entity identifier indicates a respective entity. The at least one entity identifier may be, but is not limited to, a name of the entity, a location of the entity (e.g., an address), or both. In an embodiment, the at least one entity is any entity required in order to identify the transaction (e.g., a buying entity, a selling entity, a shipping entity, etc.). The at least one entity may be predetermined. As a non-limiting example, for a reverse charge mechanism (RCM) transaction, the at least one entity may include a consumer enterprise system and a merchant enterprise system.

[0058] At S520, at least one database is searched using the at least one entity identifier. The at least one database may include, but is not limited to, databases storing entity identifiers of registered entities. In an embodiment, S520 may include determining the at least one database to be searched based on a type of the transaction to be validated.

[0059] At S530, based on the search, it is determined whether the entity is verified. In an embodiment, the entity is verified if the entity identifier was found in any of the at least one database.
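The search and verification of S520 and S530 can be sketched in Python as shown below. The in-memory `REGISTRIES` mapping is a hypothetical stand-in for external databases of registered entities, selected by an assumed transaction type; the normalization step is an added assumption so that differences in case and spacing do not prevent a match.

```python
from typing import Dict, Iterable, List, Set


def normalize(identifier: str) -> str:
    """Lower-case and collapse whitespace before comparing identifiers."""
    return " ".join(identifier.lower().split())


# Hypothetical registries, grouped by the transaction type they apply to.
REGISTRIES: Dict[str, List[Set[str]]] = {
    "reverse_charge": [
        {"Mosden Ltd", "Acme GmbH"},  # e.g., a VAT registration list
        {"Example SARL"},             # e.g., a second registry
    ],
}


def is_verified(entity_identifiers: Iterable[str], transaction_type: str) -> bool:
    """True if any identifier is found in any database for this transaction type."""
    wanted = {normalize(i) for i in entity_identifiers}
    databases = REGISTRIES.get(transaction_type, [])
    return any(wanted & {normalize(entry) for entry in db} for db in databases)
```

A transaction type with no associated database yields `False`, which matches the conservative reading that an entity is verified only when its identifier is actually found.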

[0060] Fig. 6 is an example flowchart 600 illustrating a method for validating a transaction based on a first dataset and a second dataset according to an embodiment. In an example implementation, the method includes validating based on a first electronic document including a scanned image of an invoice for a transaction and a second electronic document including a scanned image of a waybill related to the transaction. The data from the second electronic document may be utilized to verify that the data from the first electronic document is accurate.

[0061] At S610, a first dataset is created based on a first electronic document related to a transaction. The first electronic document may include, but is not limited to, unstructured data, semi-structured data, structured data with structure that is unanticipated or unannounced, or a combination thereof. In an embodiment, S610 may further include analyzing the first electronic document using optical character recognition (OCR), identifying key fields in the data, identifying values in the data, or a combination thereof. Creating datasets based on electronic documents is described further herein above with respect to Fig. 4.

[0062] At optional S620, an entity indicated in the first electronic document may be verified. In an example, the entity to be verified may be a consumer enterprise of the transaction. In an embodiment, the verification is based on the determined at least one identifier. In a further embodiment, S620 may include searching in at least one database based on the at least one identifier. Verifying entities of transactions based on datasets created using electronic documents is described further herein above with respect to Fig. 5.

[0063] At S630, based on the first electronic document, a second electronic document is obtained. In an embodiment, S630 may include searching, based on the first dataset, in at least one database for an electronic document that is related to the first electronic document. The searching may be further based on metadata of the second electronic document, data in the second electronic document, or both. In a further embodiment, the second electronic document may be related to the first electronic document if the metadata and data of the second electronic document match the first dataset above a predetermined threshold.

[0064] At S640, a second dataset is created based on the obtained second electronic document. In an embodiment, S640 may further include analyzing the second electronic document using optical character recognition (OCR), identifying key fields in the data, identifying values in the data, or a combination thereof. Creating datasets based on electronic documents is described further herein above with respect to Fig. 4.

[0065] At optional S650, it is determined whether the first dataset is eligible for validation and, if so, execution continues with S660; otherwise, execution terminates. In an embodiment, S650 includes comparing the first dataset to the second dataset. In a further embodiment, the first dataset is determined to be accurate if the first dataset matches the second dataset above a predetermined threshold. In yet a further embodiment, S650 may include analyzing the first dataset to determine at least one transaction parameter, where the comparison is based on the at least one transaction parameter.
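One simple way to express the threshold comparison of S650 is the fraction of compared fields whose values agree. The Python sketch below assumes this scoring, a hypothetical list of compared fields, and a hypothetical threshold of 0.7; the disclosure does not specify how the match is computed.

```python
from typing import Dict, Sequence

COMPARED_FIELDS = ("merchant_name", "consumer_name", "date", "goods")  # hypothetical
MATCH_THRESHOLD = 0.7  # hypothetical predetermined threshold


def match_score(first: Dict[str, str], second: Dict[str, str],
                compared: Sequence[str] = COMPARED_FIELDS) -> float:
    """Fraction of compared fields with equal values (case-insensitive).

    A field missing from either dataset counts as a mismatch.
    """
    if not compared:
        return 0.0
    same = sum(
        1 for f in compared
        if f in first and f in second
        and str(first[f]).strip().lower() == str(second[f]).strip().lower()
    )
    return same / len(compared)


def is_accurate(first: Dict[str, str], second: Dict[str, str]) -> bool:
    """The first dataset is treated as accurate if the score exceeds the threshold."""
    return match_score(first, second) > MATCH_THRESHOLD


invoice = {"merchant_name": "Mosden", "consumer_name": "Acme", "date": "12/12/2005", "goods": "paper"}
waybill = {"merchant_name": "mosden", "consumer_name": "Acme", "date": "12/12/2005", "goods": "toner"}
```

With these sample datasets, three of the four compared fields agree, so the score is 0.75 and the invoice dataset passes the assumed threshold.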

[0066] In an embodiment, S650 may further include determining whether the first dataset meets at least one predetermined constraint. A dataset may be eligible for validation if, e.g., the dataset meets the at least one predetermined constraint. The at least one predetermined constraint may include, but is not limited to, requirements on types of information needed for validation, accuracy requirements, or a combination thereof. For example, if an electronic document does not include a country for the consumer enterprise in a transaction or a price of the transaction, validation may not be possible. Determining whether the transaction is eligible for validation may reduce use of computing resources by only validating datasets meeting minimum requirements.
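The eligibility determination can be sketched in Python as a check against a small set of constraints. The required fields follow the example above (a consumer country and a price); the `ocr_confidence` key and the 0.8 minimum are hypothetical illustrations of an accuracy requirement.

```python
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("consumer_country", "price")  # types of information needed
MIN_OCR_CONFIDENCE = 0.8                         # hypothetical accuracy requirement


def check_eligibility(dataset: Dict) -> Tuple[bool, List[str]]:
    """Return (eligible, list of unmet constraints) for a dataset."""
    problems = [
        f"missing {field}"
        for field in REQUIRED_FIELDS
        if dataset.get(field) in (None, "")
    ]
    # Datasets without a recorded confidence are assumed to be accurate enough.
    if dataset.get("ocr_confidence", 1.0) < MIN_OCR_CONFIDENCE:
        problems.append("ocr confidence too low")
    return (not problems, problems)
```

Running this inexpensive check before template creation and rule application is what allows ineligible datasets to be skipped, or routed to retrieval of replacement information, without spending further computing resources on them.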

[0067] In another embodiment, when it is determined that the transaction is not eligible for validation, replacement information may be retrieved from at least one data source and included in the first dataset. In a further embodiment, upon retrieving the replacement information, execution continues with S660. In another embodiment, upon retrieving the replacement information, it is determined whether the first dataset with the replacement information is eligible for validation and, if so, execution continues with S660; otherwise, execution terminates.

[0068] At S660, a template of the transaction is created. The template may be, but is not limited to, a structured dataset including the identified transaction parameters.

[0069] At S670, at least one rule is applied, in real-time, to the created template to determine if requirements for validation are met. In an embodiment, S670 includes retrieving the at least one rule from at least one database (e.g., a database of a regulatory authority that establishes requirements for reverse charge mechanism transactions or other transactions). Applying the at least one rule in real-time allows for automatic application of current rules for transaction validation, thereby increasing accuracy.

[0070] At optional S680, a notification may be generated. The notification may indicate the results of the application of the at least one rule (i.e., whether the transaction is validated).

[0071] As used herein, the phrase "at least one of" followed by a listing of items means that any of the listed items can be utilized individually, or any combination of two or more of the listed items can be utilized. For example, if a system is described as including "at least one of A, B, and C," the system can include A alone; B alone; C alone; A and B in combination; B and C in combination; A and C in combination; or A, B, and C in combination.

[0072] The various embodiments disclosed herein can be implemented as hardware, firmware, software, or any combination thereof. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit or computer readable medium consisting of parts, or of certain devices and/or a combination of devices. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units ("CPUs"), a memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such a computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit. Furthermore, a non-transitory computer readable medium is any computer readable medium except for a transitory propagating signal.

[0073] All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the disclosed embodiment and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosed embodiments, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.