CURIC ADRIAN (SG)
MEIJIA GABRIEL (SG)
HUANG JUAN (SG)
CN109977287B | 2021-02-02 | |||
US10776888B1 | 2020-09-15 | |||
US20190156442A1 | 2019-05-23 | |||
US20180081975A1 | 2018-03-22 |
Claim 1. A system for generating deduplicated property listing from a plurality of property listings, the system comprising: a processor, and a memory in communication to the processor for storing instructions executable by the processor, wherein the processor is configured to: retrieve the plurality of property listings, identify a plurality of matching property listings from the plurality of property listings that matches each other based on at least one criterion to form a plurality of matched property listings, group the plurality of matched property listings to form a consolidated property record, and generate a deduplicated property listing based on the consolidated property record.
2. The system according to claim 1, wherein the at least one criterion comprises location data, wherein the plurality of matching property listings are matched based on the location data.
3. The system according to claim 1 or 2, wherein the at least one criterion further comprises a property configuration data, wherein the plurality of matching property listings are matched based on the property configuration data.
4. The system according to any one of claims 1 to 3, wherein the at least one criterion further comprises a property size data, wherein the plurality of matching property listings are matched based on the property size data.
5. The system according to claim 4, wherein the processor is configured to receive a threshold size tolerance, wherein the plurality of matching property listings are matched if the difference between the property size data of two matching property listings is within the threshold size tolerance.
6. The system according to any one of claims 1 to 5, wherein the at least one criterion further comprises a property value data, wherein the plurality of matching property listings are matched based on the property value data.
7. The system according to claim 6, wherein the processor is configured to receive a threshold value, wherein the plurality of property listings are matched if property value data of two matching property listings are the same or the difference between the property value data of two matching property listings is within the threshold value.
8. A method of generating a deduplicated property listing from a plurality of property listings, the method comprising: retrieving the plurality of property listings, identifying a plurality of matching property listings from the plurality of property listings that matches each other based on at least one criterion to form a plurality of matched property listings, grouping the plurality of matched property listings to form a consolidated property record, and generating a deduplicated property listing based on the consolidated property record.
9. The method according to claim 8, wherein the at least one criterion comprises location data, wherein the plurality of matching property listings are matched based on the location data.
10. The method according to claim 8 or 9, wherein the at least one criterion further comprises a property configuration data, wherein the plurality of matching property listings are matched based on the property configuration data.
11. The method according to any one of claims 8 to 10, wherein the at least one criterion further comprises a property size data, wherein the plurality of matching property listings are matched based on the property size data.
12. The method according to any one of claims 8 to 11, further comprising receiving a threshold size tolerance, wherein the plurality of matching property listings are matched if the difference between the property size data of two matching property listings is within the threshold size tolerance.
13. The method according to any one of claims 8 to 12, wherein the at least one criterion further comprises a property value data, wherein the plurality of matching property listings are matched based on the property value data.
14. The method according to claim 13, further comprising receiving a threshold value, wherein the plurality of property listings are matched if property value data of two matching property listings are the same or the difference between the property value data of two matching property listings is within the threshold value.
15. A non-transitory computer readable storage medium comprising instructions, such that the instructions, when executed by a processor in a system, cause the system to: retrieve the plurality of property listings, identify a plurality of matching property listings from the plurality of property listings that matches each other based on at least one criterion to form a plurality of matched property listings, group the plurality of matched property listings to form a consolidated property record, and generate a deduplicated property listing based on the consolidated property record.
Cross-Reference to Related Applications
[0001] The present application claims the benefit of Singapore Patent Application No. 10202104040W filed April 20, 2021, which is incorporated by reference herein.
Technical Field
[0002] The present invention relates to a system for generating a deduplicated property listing from a plurality of property listings and a method thereof.
Background
[0003] When the owner of a real estate property wishes to sell or lease the property, he or she may use property listing systems or portals to inform potential buyers or tenants that the property is available for sale or lease, provide contact information, and share information about the property such as asking price, description and video clips. These systems may be used directly by the owner, or by a third party who represents the owner such as a real estate agent. In most countries, many different systems and portals exist in which such property listings may be published. However, there is often no coordination between these systems as they are independent from each other. Additionally, more than one listing may be published for a same property, either within the same listing system by one or multiple parties, such as multiple agents competing to market the same property, or in different systems, such as one or multiple agents using several systems simultaneously to market the property to a wider audience.
[0004] As a result of the multiple systems and the possibility of multiple parties creating listings, the real availability of properties for sale or rent within a real estate market is difficult to determine, as a person wishing to know a complete or exhaustive list of properties listed for sale or rent would have to access multiple systems. Even so, the person would be faced with duplicate listings for the same property in one or multiple systems. The issue is made harder as several listing systems do not allow users to identify properties, either because the full address and unit numbers are hidden or because they are not recorded. Hence, the person would not be able to identify when listings are promoting the same property.
[0005] There is also a lack of coordination between different listing systems and systems that record rental and/or sale transactions. When a sale or rental transaction occurs for a property that was listed on one or on multiple listing systems, the transaction information is not linked to the corresponding listings in each system. As a result, listings may be published for properties that are no longer available, which may lead interested potential clients to waste time enquiring about such properties.
Summary
[0006] According to various embodiments, a system for generating a deduplicated property listing from a plurality of property listings is provided. The system includes a processor, and a memory in communication with the processor for storing instructions executable by the processor, such that the processor is configured to retrieve the plurality of property listings, identify a plurality of matching property listings from the plurality of property listings that match each other based on at least one criterion to form a plurality of matched property listings, group the plurality of matched property listings to form a consolidated property record, and generate a deduplicated property listing based on the consolidated property record.
[0007] According to various embodiments, the at least one criterion may include location data, wherein the plurality of matching property listings are matched based on the location data.
[0008] According to various embodiments, the at least one criterion may further include a property configuration data, such that the plurality of matching property listings are matched based on the property configuration data.
[0009] According to various embodiments, the at least one criterion may further include a property size data, such that the plurality of matching property listings are matched based on the property size data.
[0010] According to various embodiments, the processor may be configured to receive a threshold size tolerance, such that the plurality of matching property listings are matched if the difference between the property size data of two matching property listings is within the threshold size tolerance.
[0011] According to various embodiments, the at least one criterion may further include a property value data, such that the plurality of matching property listings are matched based on the property value data.
[0012] According to various embodiments, the processor may be configured to receive a threshold value, such that the plurality of property listings are matched if property value data of two matching property listings are the same or the difference between the property value data of two matching property listings is within the threshold value.
[0013] According to various embodiments, a method of generating a deduplicated property listing from a plurality of property listings is provided. The method includes retrieving the plurality of property listings, identifying a plurality of matching property listings from the plurality of property listings that match each other based on at least one criterion to form a plurality of matched property listings, grouping the plurality of matched property listings to form a consolidated property record, and generating a deduplicated property listing based on the consolidated property record.
[0014] According to various embodiments, the at least one criterion may include location data, such that the plurality of matching property listings are matched based on the location data.
[0015] According to various embodiments, the at least one criterion may further include a property configuration data, such that the plurality of matching property listings are matched based on the property configuration data.
[0016] According to various embodiments, the at least one criterion may further include a property size data, such that the plurality of matching property listings are matched based on the property size data.
[0017] According to various embodiments, the method may further include receiving a threshold size tolerance, such that the plurality of matching property listings are matched if the difference between the property size data of two matching property listings is within the threshold size tolerance.
[0018] According to various embodiments, the at least one criterion may further include a property value data, such that the plurality of matching property listings are matched based on the property value data.
[0019] According to various embodiments, the method may further include receiving a threshold value, such that the plurality of property listings are matched if property value data of two matching property listings are the same or the difference between the property value data of two matching property listings is within the threshold value.
Brief Description of Drawings
[0020] Fig. 1 shows an exemplary embodiment of a system for generating a deduplicated property listing from a plurality of property listings.
[0021] Fig. 2 shows a flow diagram of an exemplary method of generating a deduplicated property listing from a plurality of property listings.
[0022] Fig. 3 shows an exemplary embodiment of the property data resources accessed by the system to generate the deduplicated property listing.
[0023] Fig. 3A shows an exemplary embodiment of the raw listing database with data fields.
[0024] Fig. 3B shows an exemplary embodiment of the deduplicated listing database with data fields.
[0025] Fig. 3C shows an exemplary embodiment of the transaction database with data fields.
[0026] Fig. 3D shows an exemplary embodiment of the property database with data fields.
[0027] Fig. 3E shows an exemplary embodiment of the agent database with data fields.
[0028] Fig. 3F shows an exemplary embodiment of a listing record with data fields.
[0029] Fig. 3G shows a schematic diagram of the relationship between the databases.
[0030] Fig. 4 shows a flow diagram of an exemplary method of normalizing and cleaning data across the databases.
[0031] Fig. 5 shows a flow diagram of an exemplary method of generating a deduplicated listing from a plurality of property listings.
[0032] Fig. 6 shows a flow diagram of an exemplary method of matching a deduplicated property listing to a transaction record.
[0033] Fig. 7 shows a flow diagram of an exemplary method of matching unmatched deduplicated property listing to a transaction record.
[0034] Fig. 8 shows a flow diagram of an exemplary method of obtaining historical asking-closing price spreads for single properties and the relationships between the databases.
[0035] Fig. 9 shows a flow diagram of an exemplary method of predicting asking-closing price spreads for properties and clusters of properties.
[0036] Fig. 10 shows a flow diagram of an exemplary method of determining property availability for clusters of properties.
[0037] Fig. 11 shows a flow diagram of an exemplary method of determining the asking-closing price spread for property listings.
[0038] Fig. 12 shows a flow diagram of an exemplary method of determining property absorption rate.
[0039] Fig. 13 shows a flow diagram of an exemplary method of determining a period of time of a property listing on the market.
Detailed Description
[0040] Fig. 1 shows an exemplary embodiment of a system 100 for generating a deduplicated property listing from a plurality of property listings. System 100 may include a processor 110 and a memory 120 in communication with the processor 110 for storing instructions executable by the processor 110. Processor 110 is configured to retrieve the plurality of property listings, identify a plurality of matching property listings from the plurality of property listings that match each other based on at least one criterion to form a plurality of matched property listings, group the plurality of matched property listings to form a consolidated property record, and generate a deduplicated property listing based on the consolidated property record. System 100 may further include an I/O interface 130, a display 140, a power source 150, and a communication module 160. System 100 may be in communication with user devices, e.g. computer devices, via a network. System 100 may be in communication with property data resources 180, which include databases for storing property data of the properties.
[0041] Fig. 2 shows a flow diagram of an exemplary method 200 of generating a deduplicated property listing from a plurality of property listings. Method 200 includes retrieving the plurality of property listings in block 210, identifying a plurality of matching property listings from the plurality of property listings that match each other based on at least one criterion to form a plurality of matched property listings in block 220, grouping the plurality of matched property listings to form a consolidated property record in block 230, and generating a deduplicated property listing based on the consolidated property record in block 240.
[0042] Fig. 3 shows an exemplary embodiment of the property data resources accessed by the system 100 to generate the deduplicated property listing. Property data resources may include at least the databases as shown in Fig. 3. Data in the databases may go through a series of cleaning and normalization processes before being used, as shown in Fig. 4 below.
[0043] Fig. 3A shows an exemplary embodiment of the raw listing database 180A with data fields. Raw listing database 180A may include at least the data fields as shown. Raw listing database 180A may include raw property listings retrieved from single sources or multiple sources, e.g. multiple property listing platforms. Plurality of property listings may include the raw property listings. Raw property listings may include property records in which an agent advertises or promotes a property for sale or for rent through the property listing platforms, and which contain the information as it is published in the original listing, i.e., with the same tags and content. Raw listing database 180A incorporates data from different property listing platforms and records the source of each raw property listing. Raw property listings may include active raw property listings, i.e., listings that are accessible and promoted at the current time, and non-active raw property listings, i.e., listings that were published in the past but are not currently being promoted or are not accessible to users of the listing portals. In this stage, the exact identification of the property, such as a full address and unit/lot number, may not be made available in the raw property listing. Further, the existence of other listings for the same property, in the same property listing platform or through other property listing platforms, is typically not known or disclosed in this stage. Raw listing database 180A may include records of historical asking-closing price spreads after the deduplicated property listings are matched with transaction records.
[0044] Referring to Fig. 3, the normalized listing database 180B may include normalized property listings, which includes records of property listings in which the data fields are normalized to a standard format applicable to listings from different platforms. Raw property listings are normalized by the system 100 and stored in the normalized listing database 180B. Plurality of property listings may include the normalized property listings.
[0045] Fig. 3B shows an exemplary embodiment of the deduplicated listing database 180C with data fields. Deduplicated listing database 180C may include at least the data fields as shown. Deduplicated listing database 180C may include deduplicated property listings, which include records of properties and listings in which all of the active and non-active property listings that have been collected for the same property are identified, matched to the corresponding property, and grouped together, resulting in a consolidated property record that includes property listings from multiple sources. Consolidated property record may include the historical record of active and non-active raw listings attached to the property, date ranges for which each individual property listing was found to be active, price drops and price increases of the property, etc. Although the exact identification of the property, such as a full address and unit/lot number may not be made available in some property listings, the system 100 is configured to group together property listings that are most likely referring to the same property. Deduplicated listing database 180C may include deduplicated listing ID, etc.
[0046] Fig. 3C shows an exemplary embodiment of the transaction database 180D with data fields. Transaction database 180D may include at least the data fields as shown. Transaction database 180D may include property transaction data, which include records of actual transacted sales or rentals, transaction ID, etc.
[0047] Fig. 3D shows an exemplary embodiment of the property database 180E with data fields. Property database 180E may include at least the data fields as shown. Property database 180E may include property data, e.g. address, size of the property, no. of rooms, etc., and property ID.
[0048] Fig. 3E shows an exemplary embodiment of the agent database 180F with data fields. Agent database 180F may include at least the data fields as shown. Agent database 180F may include agent details, e.g. agent ID, name, etc.
[0049] Referring to Fig. 3, the building database 180G may include information of buildings, e.g. height, no. of floors, no. of elevators, construction material, etc. Referring to Fig. 3, the project database 180H may include information about specific projects/developments, new developments being built, etc. Referring to Fig. 3, the property cluster database 180J may include property cluster data. A property cluster is a group of properties that is grouped using a specific set of clustering criteria, such as location of properties, e.g. districts or neighbourhoods, or property details, e.g. property types. Property cluster database 180J may include property cluster ID to identify property clusters. Property cluster database 180J may include time-specific statistical computations on the asking-closing price spread, such as current asking-closing price spreads for each property cluster, time-specific statistical computations on the property availability, time-specific statistical computations on the property absorption, time-specific statistical computations on days-on-market for listings in each property cluster, etc. Referring to Fig. 3, the address database 180K may include information about an address and various points of interest. Address database 180K may include latitude/longitude coordinates of the property.
[0050] Fig. 3F shows an exemplary embodiment of a listing record 300 with data fields related to the property listing. Listing record may include at least the data fields as shown.
[0051] Fig. 3G shows a schematic diagram 310 of the relationship between the databases. As shown in Fig. 3G, each database is linked to another database via a link between the database IDs. For example, the raw listing database 180A is linked to the agent database 180F via the agent ID and to the deduplicated listing database 180C via the deduplicated listing ID. Transaction database 180D is linked to the deduplicated listing database 180C via deduplicated listing ID and transaction ID and to the property database 180E via the property ID.
[0052] Fig. 4 shows a flow diagram of an exemplary method 400 of normalizing and cleaning data across the databases. System 100 may include a normalizing and cleaning module configured to normalize and clean data stored in the databases. Method 400 may include retrieving new raw property listings from the raw listing database 180A in block 410. Method 400 may further include normalizing and cleaning the raw property listings in block 420. In this step, word encoding may be used for pre-processing. Misspelled parts may be identified based on Levenshtein distance and bag-of-n-grams encoding. NER (named entity recognition) may be used for address part identification and meaning assignment. The distance between two addresses may be based on cosine similarity, and the confidence score uses the TF-IDF weights and the total entropy of the address components present. Method 400 may include storing the normalized property listings in the normalized listing database 180B in block 430. Method 400 may further include categorizing the normalized property listings into property types in block 440. This step will be further elaborated below. Once categorized, the method 400 may include storing the categorized normalized property listings in the normalized listing database 180B in block 450. As a result, the data in the raw property listings is normalized or transformed into a format that can be compared and analysed regardless of the source of the data, e.g. property listing platforms.
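The misspelling check mentioned for block 420 can be illustrated with a minimal sketch. It covers only the Levenshtein distance part; the n-gram encoding, NER and TF-IDF steps are not shown. The function names, the vocabulary of known address words and the maximum distance of 2 are assumptions made for illustration and are not taken from the disclosure.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def correct_token(token: str, vocabulary: list[str], max_distance: int = 2) -> str:
    """Return the closest known word within max_distance, else the token unchanged.

    `vocabulary` and `max_distance` are hypothetical parameters for this sketch.
    """
    if not vocabulary:
        return token
    best = min(vocabulary, key=lambda word: levenshtein(token.lower(), word.lower()))
    if levenshtein(token.lower(), best.lower()) <= max_distance:
        return best
    return token
```

For instance, `correct_token("Avenu", ["Avenue", "Street", "Road"])` would map the misspelled token to "Avenue", while a token far from every known word is left untouched.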
[0053] System 100 may include a categorizing module configured to categorize the raw property listings into the most likely types of property listings. Categorizing module may be configured to correct inaccuracies. Categorizing module may be configured to compare property data of a normalized property listing with reference data of the types of properties, i.e. data obtained from historical records of types of properties, and identify the type of the normalized property listing based on the reference data. For example, the categorizing module may be configured to calculate a price per unit area value and compare the value to reference values of the types of properties, i.e. reference values obtained from historical records of types of properties. Using a single room listing as an example, a reference value may be input by taking the 95th percentile of the listing prices that are tagged as “single room” for each corresponding property type group, or by other methods. If this comparison results in the listed price being equal to or lower than the reference value of single-room price of the same category of properties, then the categorizing module categorizes the listing as a single-room rental listing. The reference value used in the comparison to categorize the normalized property listing as a single room may be adjusted. If, on the other hand, the categorizing module determines that the price per unit area value is higher than the reference value, then the categorizing module categorizes the normalized property listing as an entire property rental listing. The property type used to generate a reference value in the categorization process may refer to properties of the same form, e.g., apartments/condominiums, landed houses, government subsidized properties, or any other, to properties of the same project, to properties in a nearby location, or to any other convention that can be used to form a comparable property type.
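As a rough sketch of the comparison described above, the reference value can be taken as the 95th percentile of prices tagged as single room within one property type group, and a listing is categorized by comparing its price metric to that value. The nearest-rank percentile method, the function names and the label strings are assumptions for illustration; the disclosure does not specify how the percentile is computed.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list, with pct in 1..100 (assumed method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def single_room_reference(single_room_prices: list[float]) -> float:
    """Reference value: 95th percentile of prices tagged as single room in one group."""
    return percentile(single_room_prices, 95)


def categorize_rental(price_metric: float, reference: float) -> str:
    """Equal to or below the reference: single room; above it: entire property."""
    if price_metric <= reference:
        return "single_room"
    return "entire_property"
```

The same price metric (listed price or price per unit area) should be used for both the reference value and the listing being categorized, and the reference should be recomputed per property type group.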
[0054] Raw property listings may be advertising the interest of the property owner to conduct different types of transactions with the property, such as a sale of the property, a rental or lease of the entire property, or a rental or lease of a part of the unit, e.g. rental of a single bedroom within a property that has more than one bedroom. A common problem found among rental listings across multiple property listing platforms is that they do not specify whether the listing is referring to a rental of the entire property or of a single room within the property. Another problem is that some listings that in reality are referring to single-room rentals are inaccurately labelled as entire property rentals. Hence, the system 100 is able to categorize the property listing according to the most likely types of property listings and correct inaccuracies.
[0055] Fig. 5 shows a flow diagram of an exemplary method 500 of generating a deduplicated listing from a plurality of property listings, e.g. normalized property listings. System 100 may include a deduplicating module configured to generate the deduplicated listing. Plurality of property listings may include one or more property listings of one property. Plurality of property listings may be from a property listing platform or a plurality of property listing platforms. Method 500 may include retrieving the plurality of property listings, e.g. the normalized property listings from the normalized listing database 180B, in block 510. Method 500 may include a filtering process to determine which normalized property listings in the normalized listing database 180B are more likely to be referring to the same property. Method 500 may include matching the plurality of property listings in block 520. Method 500 includes identifying a plurality of matching property listings from the plurality of property listings that match each other based on at least one criterion to form a plurality of matched property listings. System 100 may include a matching module configured to match the plurality of property listings to at least one criterion. The at least one criterion may be location data of the property. Location data may be location-based information, and may include a combination of coordinates, addresses, and data fields that have been processed using Natural Language Processing techniques. For example, the plurality of matched property listings may refer to the same project, or to properties located within the same development, building, condominium, cluster house project, or gated community. If the location data includes exact property addresses, i.e. full address including the unit number or house number, then matching is done directly with this criterion. If the exact property address is not available, then the matching continues as described below. Once matched, the method 500 may include grouping the plurality of matched property listings in block 540.
[0056] System 100 may include an extraction module configured to extract components of the normalized property listings from the one or multiple property listing platforms. Extraction module may be configured to utilize Natural Language Processing techniques and Computer Vision techniques to extract key components of the property listings.
[0057] System 100 may be configured to determine if there is a sufficient match between the plurality of matching property listings to allow the system 100 to determine that they refer to the same property. If so, the system 100 may be configured to update the consolidated property records and normalized listing database 180B. System 100 may continuously carry out this step to update the consolidated property records and normalized listing database 180B. A consolidated property record combines records of properties and listings in which all of the active and non-active property listings that have been collected for the same property are identified, matched to the corresponding property, and grouped together, resulting in the consolidated property record that includes property listings from multiple sources. System 100 may generate the deduplicated property listing based on the consolidated property record.
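One way to picture the grouping of matched listings into consolidated property records is a union-find pass over pairwise matches. This is only a sketch under an assumption that is not stated in the source: matches are treated as transitive, so if listing A matches B and B matches C, all three fall into one record. The function name and the listing IDs are hypothetical.

```python
def group_matches(listing_ids: list[str], matched_pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Group listing IDs into consolidated records from pairwise matches (union-find)."""
    parent = {lid: lid for lid in listing_ids}

    def find(x: str) -> str:
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups: dict[str, list[str]] = {}
    for lid in listing_ids:
        groups.setdefault(find(lid), []).append(lid)
    return list(groups.values())
```

Each returned group would correspond to one consolidated property record, from which a single deduplicated property listing can be produced. A listing with no match forms a group of its own.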
[0058] The at least one criterion may further include a property configuration data such that the plurality of property listings may be matched based on the property configuration data in block 550. Plurality of property listings may be further grouped into those that refer to properties that have the same configuration data, e.g. number of bedrooms, number of bathrooms. Additionally, the at least one criterion may further include a property size data such that the plurality of property listings may be matched based on the property size data. Plurality of property listings may be grouped into those that refer to properties that have the same size, e.g. built size, constructed size, house size and/or land size. Method 500 may include receiving a threshold size tolerance, such that the plurality of property listings are matched if the difference between the property size data of two property records is within the threshold size tolerance. Plurality of property listings may be further grouped into those that are within a determined threshold size tolerance with respect to one another. Threshold size tolerance may be adjustable and fine-tuned. The at least one criterion may further include a property value data such that the plurality of property listings may be matched based on the property value data. Method 500 may include receiving a threshold value, such that the plurality of property listings are matched if property value data of two property listings are the same or the difference between the property value data of two property listings is within the threshold value. Plurality of property listings may be further grouped into those that have a listing price that have the same value or that are within a determined threshold price with respect to one another. Threshold price may be adjustable and fine-tuned. [0059] Method 500 may include matching the text description of the plurality of property listings in block 560. Text description, e.g. 
the property descriptions, data fields and multimedia files of the plurality of matching property listings that are grouped in the abovementioned manner may be further compared using Natural Language Processing and Computer Vision techniques, in order to determine if they are referring to the same property, to a comparable but different property, or if there is not sufficient data to determine whether or not they are the same property at the time at which the analysis is made. The analysis is revised continuously, including after the plurality of matched property listings are no longer active, i.e. no longer available or no longer published in the listing platform and marked as non-active in the normalized listing database 180B, and after the system 100 detects property transactions that are likely to correspond to a property listing. The data fields that are further compared with Natural Language Processing include those that are used by the agent who created the listing to describe the property in their own words or any other descriptive messages that may be used to market or promote the property, such as descriptions of the interior layout, the view, the contents of the property, etc.
[0060] Determining the similarity between pairs of data fields of two matching property listings, in order to decide whether or not they refer to the same property, involves converting the texts into vectors using encoding libraries and then computing cosine similarity values between the vectors, where a cosine similarity value of 1 indicates exact similarity and 0 indicates no similarity. A pair of matching property listings is recorded as belonging to the same property once a threshold value of the cosine similarity of the property descriptions is met. For example, a threshold value can be used such that if the cosine similarity of the property descriptions of listings of the same location, configuration and size is above 95%, the property listings are recorded as belonging to the same property.
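By way of illustration only, the comparison described in paragraph [0060] may be sketched as follows. The description does not name a particular encoding library, so the word-count encoder `encode` below is a hypothetical stand-in, and the function names and the `SIMILARITY_THRESHOLD` constant are assumptions made for this sketch. Only the cosine similarity computation and the 95% example threshold are taken from the text.

```python
import math
import re
from collections import Counter

# Example threshold taken from the description (95%).
SIMILARITY_THRESHOLD = 0.95


def encode(text: str) -> Counter:
    """Hypothetical stand-in encoder: lower-cased word counts as a sparse vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_property(desc_a: str, desc_b: str,
                  threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """True when the description similarity is above the threshold."""
    return cosine_similarity(encode(desc_a), encode(desc_b)) > threshold
```

In practice a sentence-embedding model would likely replace the word-count encoder; the thresholding step would remain the same.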
[0061] Method 500 may include matching the graphic files, e.g. pictures and floorplans of the properties, of the plurality of matching property listings in block 570. Matching module may utilise Computer Vision to classify these files between those that are representations of the interior of the property being promoted, i.e. pictures of a property’s interior or a floorplan of a property, and those that are representations of external spaces such as facades, project common areas, or the vicinity, i.e. a picture of a building facade, or a map of a neighbourhood. By comparing the interior graphic files of two or more listings, the system 100 is able to determine if the matching property listings are using the same interior pictures and/or floorplans. The abovementioned steps in blocks 550, 560 and 570 need not be carried out in the sequence mentioned or, in other words, may be carried out in any sequence.
[0062] If the plurality of property listings are matched based on one or more of the abovementioned criteria, the method 500 may include grouping the plurality of matched property listings to form a consolidated property record in block 540. System 100 may include a grouping module configured to group the plurality of matched property listings to form a consolidated property record. System 100 may finalize the matching process by combining the results of the Computer Vision comparisons between the plurality of matched property listings that have been determined to belong to the same location or within a determined proximity threshold, the same configuration, the same size or within the determined threshold size, and the same description or within the determined description cosine similarity value threshold. Method 500 may include generating a deduplicated property listing based on the consolidated property record. Deduplicating module may be configured to store the deduplicated property listing in the deduplicated listing database 180C in block 590. As such, the system 100 is configured to record the groupings of matched property listings across platforms that are referring to the same property. System 100 may also be configured to record the plurality of property listings that have not been matched, i.e. the unmatched plurality of property listings, in the deduplicated listing database 180C in block 590.
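As a non-limiting sketch, the matching criteria and the grouping step described above may be expressed as a pairwise test followed by a union-find grouping. The `Listing` fields, the relative tolerances (`size_tol`, `price_tol`) and the decision to group listings transitively are assumptions made for illustration; the description leaves the form of the thresholds and the grouping strategy open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    listing_id: str
    location: str    # assumed: normalized project name or address
    bedrooms: int
    bathrooms: int
    size: float      # assumed: built size in square metres
    price: float


def is_match(a: Listing, b: Listing,
             size_tol: float = 0.05, price_tol: float = 0.10) -> bool:
    """Same location and configuration; size and price within relative tolerances."""
    if a.location != b.location:
        return False
    if (a.bedrooms, a.bathrooms) != (b.bedrooms, b.bathrooms):
        return False
    if abs(a.size - b.size) > size_tol * max(a.size, b.size):
        return False
    return abs(a.price - b.price) <= price_tol * max(a.price, b.price)


def group_listings(listings, size_tol=0.05, price_tol=0.10):
    """Group listing IDs so that listings linked by a chain of matches share a group."""
    parent = list(range(len(listings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(listings)):
        for j in range(i + 1, len(listings)):
            if is_match(listings[i], listings[j], size_tol, price_tol):
                parent[find(i)] = find(j)

    groups = {}
    for i, listing in enumerate(listings):
        groups.setdefault(find(i), []).append(listing.listing_id)
    return sorted(groups.values())
```

Each returned group corresponds to one consolidated property record; a group with a single listing ID corresponds to an unmatched listing.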
[0063] With the deduplicated listing database 180C of deduplicated listings, the system 100 enables a property owner and potential buyer to avoid accessing multiple listing platforms to find the property listings of a property, as they are consolidated within a deduplicated listing for the same property. In this way, the property owner and potential buyer are able to save time and obtain up-to-date and accurate information on the property.
[0064] System 100 may be configured to match a plurality of deduplicated property listings to a plurality of transaction records of transacted properties. Property listing platforms would typically not track or publish if and when a listed property transacts, nor the price or value of the transaction. Therefore, the system 100 is configured to match the deduplicated property listings in the deduplicated listing database 180C with the transaction records in the transaction database 180D so as to enable the complete tracking of property market activities, from the moment in which a property is listed on any property listing platform to the moment it is transacted. However, the matching of transaction records and the deduplicated property listings is made especially difficult because the exact property identification may not be available in the property listing platforms, as in most cases the agents wish to keep the unit numbers or exact property details confidential, or may not be available in transaction records, as is the case in which transaction databases and publications show only approximate property addresses. System 100 may be configured to overcome the above issues.
[0065] System 100 may be configured to match a plurality of deduplicated property listings to a plurality of transaction records of transacted properties. System 100 may be configured to retrieve the plurality of deduplicated property listings, retrieve the transaction records, match the plurality of deduplicated property listings to the plurality of transacted records based on at least one criterion to identify a plurality of matching deduplicated property listings that matches a plurality of matching transacted records, and match a plurality of unmatched transacted records to a plurality of unmatched deduplicated property listings based on at least one criterion to identify a plurality of matching unmatched deduplicated property listings to a plurality of unmatched matching transacted records. For example, the system 100 may be configured to match 50 transaction records to 100 deduplicated property listings based on at least one criterion. Assuming that only some of the 50 transaction records match the deduplicated property listings, there will be some unmatched transaction records and deduplicated property listings. System 100 may be configured to match the unmatched deduplicated property listings to the unmatched transaction records based on the at least one criterion. When matching the unmatched deduplicated property listings to the unmatched transaction records, the system 100 may be configured to gather the transacted price of the unmatched transaction records to the unmatched deduplicated property listings so that the system 100 is able to predict a transaction price for the unmatched deduplicated property listings.
[0066] Fig. 6 shows a flow diagram of an exemplary method 600 of matching a deduplicated property listing to a transaction record. Matching module may be configured to match the deduplicated property listing to the transaction record. Once a transaction record is captured, it is stored in the transaction database 180D. System 100 may be configured to retrieve a new transaction record from the transaction database 180D in block 610. System 100 may be configured to normalize the transaction data in the transaction record into a format and data structure that facilitates comparison between the deduplicated property listing and the transaction record in block 620. Method 600 may include matching the transaction record to a deduplicated property listing in block 630. Matching module may be configured to match the transaction record to a deduplicated property listing and assign the transaction record to a deduplicated property listing or a group of deduplicated property listings. Once matched, the method 600 may further include recording the matched transaction record in the transaction database 180D in block 640 and the matched deduplicated property listing in the deduplicated listing database 180C in block 650. By recording the matched transaction record to the deduplicated property listing, the system 100 creates a deduplicated listing ID in the transaction database 180D and a transaction ID in the deduplicated listing database 180C and links the records in the databases.
[0067] Each of the plurality of deduplicated property listings may include a property address data, and each of the plurality of transaction records may include a transacted property address data. Matching module may be configured to match the deduplicated property listings and the transaction records based on at least one criterion, i.e. the address data, e.g. the property address data and the transacted property address data. Matching module may be configured to match the property address data to the transacted property address data, such that a deduplicated property listing matches a transaction record when the property address data matches the transacted property address data. In block 630, the matching module may carry out a first matching iteration to determine if there are deduplicated property listings in the deduplicated listing database 180C that match the transaction records in the transaction database 180D based on location data or location-based information, e.g. project name, coordinates, and/or address. Once this is done and if there is a positive result, the matching module may verify if the matching location data corresponds to an exact and unambiguous property. For example, the matching module is configured to verify if both the deduplicated property listing and the transaction record show complete property addresses, i.e., including the unit number. Thereafter, the matching module may be configured to record the deduplicated property listing and the transaction record in the deduplicated listing database 180C and the transaction database 180D in block 640 and block 650 respectively. [0068] When the property address data and the transacted property address data are not matched, the method 600 may include matching the deduplicated property listing to the transaction record based on other criteria in block 660. The other criteria may include property configuration data, property size data and property value data.
Each of the plurality of deduplicated property listings may include at least one of property configuration data, property size data and property value data, and each of the plurality of transaction records may include at least one of transacted property configuration data, transacted property size data and transacted property value data. Matching module may be configured to match the property configuration data to the transacted property configuration data, verify the difference between the property size data and the transacted property size data is within a threshold size tolerance, and verify the difference between the property value data and the transacted property value data is within a threshold value. If the above is verified and matched, the matching module is configured to verify that the location of the listed property based on the property address data is within the location of the transacted property based on the transacted property address data. If it is verified, the matching module is configured to match the deduplicated property listing to the transaction record. In other words, the matching module is configured to check if there are deduplicated property listings within the verified location of the transaction record that match the transaction record in terms of configuration, i.e. number of bedrooms, number of bathrooms, property size that is within a determined threshold size tolerance, i.e. built size, constructed size, house size and/or land size, and that have a value, i.e. listing price for the deduplicated property listings and transacted value for the transacted property, that is within a predetermined threshold value from one another. If the transaction value is within the threshold value, the matching module then proceeds to match and record the deduplicated property listing in the deduplicated listing database 180C in block 640 and the transaction record in the transaction database 180D in block 650.
Matching module may be configured to record the matched transaction ID in the deduplicated listing database 180C and the matched deduplicated listing ID in the transaction database 180D. If the matching module determines that the transaction record cannot be matched to any of the plurality of deduplicated property listings in the deduplicated listing database 180C, the transaction record in the transaction database 180D remains unchanged in block 670. Deduplicated property listings that are not matched with any transaction records would remain unchanged. [0069] Fig. 7 shows a flow diagram of an exemplary method 700 of matching an unmatched deduplicated property listing to a transaction record. Method 700 may include retrieving a plurality of unmatched deduplicated property listings from the deduplicated listing database 180C in block 710. Matching module may be configured to retrieve the plurality of unmatched deduplicated property listings. Matching module may be configured to match the unmatched deduplicated property listings and the transaction records based on at least one criterion, e.g. the property address data and the transacted property address data, in block 720. Matching module may be configured to match the property address data to the transacted property address data, such that an unmatched deduplicated property listing matches a transaction record when the property address data matches the transacted property address data. Matching module may be configured to determine if there are transaction records in transaction database 180D that have the same location as the unmatched deduplicated property listing based on project name, coordinates, and/or address. Once this is done and if there is a positive result, the matching module may verify if the matching location data corresponds to an exact and unambiguous property.
For example, the matching module is configured to verify if both the unmatched deduplicated property listing and the transaction record show complete property addresses, i.e., including the unit number. Thereafter, the matching module may be configured to record the unmatched deduplicated property listing and the transaction record in the deduplicated listing database 180C and the transaction database 180D in block 730 and block 740 respectively.
[0070] When the property address data and the transacted property address data are not matched, the method 700 may include matching the unmatched deduplicated property listing to the transaction record based on other criteria in block 750. The other criteria may include property configuration data, property size data and property value data. Each of the plurality of unmatched deduplicated property listings may include at least one of property configuration data, property size data and property value data, and each of the plurality of transaction records may include at least one of transacted property configuration data, transacted property size data and transacted property value data. Matching module may be configured to match the property configuration data to the transacted property configuration data, verify the difference between the property size data and the transacted property size data is within a threshold size tolerance, and verify the difference between the property value data and the transacted property value data is within a threshold value. If the above is verified and matched, the matching module may be configured to verify that the location of the listed property based on the property address data is within the location of the transacted property based on the transacted property address data. If it is verified, the matching module is configured to match the unmatched deduplicated property listing to the transaction record. In other words, the matching module is configured to check if there are unmatched deduplicated property listings within the verified location of the transaction record that match the transaction record in terms of configuration, i.e. number of bedrooms, number of bathrooms, property size that is within a determined threshold size tolerance, i.e. built size, constructed size, house size and/or land size, and that have a value, i.e. listing price for the unmatched deduplicated property listings and transacted value for the transacted property, that is within a predetermined threshold value from one another. If the transaction value is within the threshold value, the matching module then proceeds to match and record the unmatched deduplicated property listing in the deduplicated listing database 180C in block 730 and the transaction record in the transaction database 180D in block 740. Matching module may be configured to record the matched transaction ID in the deduplicated listing database 180C and the matched deduplicated listing ID in the transaction database 180D. If the matching module determines that the transaction record cannot be matched to any of the plurality of unmatched deduplicated property listings in the deduplicated listing database 180C, the transaction record in the transaction database 180D remains unchanged in block 760. Unmatched deduplicated property listings that are not matched with any transaction records would remain unchanged.
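The two-stage matching of methods 600 and 700 may be illustrated with the following minimal sketch: an exact address match is attempted first, and only if that fails are configuration, size and value compared within the same approximate location. The `Record` fields, the relative tolerances and the rule of returning the first qualifying listing are assumptions made for this example and are not prescribed by the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    record_id: str
    area: str             # assumed: approximate location, e.g. project name
    unit: Optional[str]   # unit number; None when withheld or unavailable
    bedrooms: int
    size: float
    value: float          # listing price or transacted value


def match_transaction(txn: Record, listings, size_tol=0.05, value_tol=0.10):
    """Return the ID of the listing matched to the transaction, or None."""
    # Stage 1: exact and unambiguous address (same area, unit known on both sides).
    for lst in listings:
        if txn.unit is not None and lst.unit == txn.unit and lst.area == txn.area:
            return lst.record_id
    # Stage 2: same area, same configuration, size and value within tolerance.
    for lst in listings:
        unit_conflict = (lst.unit is not None and txn.unit is not None
                         and lst.unit != txn.unit)
        if (not unit_conflict
                and lst.area == txn.area
                and lst.bedrooms == txn.bedrooms
                and abs(lst.size - txn.size) <= size_tol * txn.size
                and abs(lst.value - txn.value) <= value_tol * txn.value):
            return lst.record_id
    # No match: both records are left unchanged by the caller.
    return None
```

A returned listing ID would then be written to the transaction record, and the transaction ID to the listing record, to link the two databases. Cases in which several listings qualify equally are not resolved by this sketch.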
[0071] Below is an example of matching a deduplicated property listing to a transaction record. It is assumed that the listing date of the deduplicated property listing is earlier than the transaction date of the transaction record and that the listing price and transacted value do not differ significantly. The permitted difference between the price and the value may be pre-defined in the system 100.
[0072] Method of generating an average price gap between the listing price and transaction value without matching:
Within each project/building, and for each house area size or number of bedrooms, for example, retrieve the 25th, 50th and 75th percentile listing prices and transaction prices for the last n months (e.g. n=1).
Derive the % difference of the listing price as compared to the transaction price at the same percentile, and take the maximum % value from the percentiles. This % difference may be used as a guide to the maximum price gap between listing price and transaction value in that project/building.
When there is no transaction in the project/building, the matching module identifies projects/buildings which are similar to the current one and within a distance radius (e.g. within 0.5 km/1 km) to obtain the maximum price gap as the price gap for the current project/building. Similar projects/buildings may be clustered into the same project cluster by using k-prototype clustering based on completion year, geographical location, project sizes and other project attributes.
For each listing, identify the latest transaction (that occurs after the listing date) that fulfils the following matching criteria:
Similar area (e.g. +/- 5%)
A gap between the listing price and transaction value that is within the price gap generated above
For remaining transaction records that are not matched, identify the nearest listing (that occurred before the transaction date) that fulfils the following matching criteria:
Similar area (e.g. +/- 5%)
A gap between the listing price and transaction value that is within the price gap generated above
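The price-gap derivation and the first matching pass above may be sketched as follows. The function names, the dictionary keys (`date`, `area`, `price`) and the use of the absolute value of the % difference are assumptions for illustration. Dates are assumed to be ISO-8601 strings, which sort chronologically. The quartiles are computed with `statistics.quantiles` from the Python standard library, which needs at least two data points per series on older Python versions.

```python
from statistics import quantiles


def max_price_gap(listing_prices, transaction_prices):
    """Largest % difference between listing and transaction prices
    at the 25th, 50th and 75th percentiles."""
    listing_q = quantiles(listing_prices, n=4)       # three quartile cut points
    transaction_q = quantiles(transaction_prices, n=4)
    return max(abs(l - t) / t for l, t in zip(listing_q, transaction_q))


def latest_matching_transaction(listing, transactions, gap, area_tol=0.05):
    """Latest transaction after the listing date with a similar area and a
    price difference within the derived gap; None if there is no candidate."""
    candidates = [
        t for t in transactions
        if t["date"] > listing["date"]
        and abs(t["area"] - listing["area"]) <= area_tol * listing["area"]
        and abs(listing["price"] - t["price"]) <= gap * t["price"]
    ]
    return max(candidates, key=lambda t: t["date"], default=None)
```

The second pass, for transaction records that remain unmatched, would mirror `latest_matching_transaction` with the roles of listing and transaction reversed.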
[0073] Matching module may be configured to assign transaction records to one or more deduplicated property listings in cases for which the matching method 700 does not result in a clear match, or for cases in which there is more than one deduplicated property listing that has an equal chance of being the corresponding match for a given transaction.
[0074] The above process results in the interconnection of the agent database 180F, the listing database 180A, the deduplicated listing database 180C, the transaction database 180D, and the property database 180E, as each transaction ID is included in the corresponding deduplicated property listing record, each raw listing is included in the corresponding raw property listing record, and agents’ ID are included in the corresponding raw property listing record, in a continuous process that follows the steps in Fig. 7.
[0075] By generating deduplicated property listings and matching transaction records to the listings, the system 100 enables analysis of property listing information from multiple sources as well as property transaction information from multiple sources, and consolidation of that information into unified, consistent and interlinked datasets in which each property’s historical listings and transactions are accessible and the real availability of properties for rent and for sale is revealed. This allows a user to access listings from multiple different listing platforms under the same system, identify which listings correspond to the same property, and determine the real number of properties available. System 100 also enables users to interact with each other across multiple platforms, as a result of the unification, consolidation and coordination of listing and transaction information.
[0076] Fig. 8 shows a flow diagram of an exemplary method 800 of obtaining historical asking-closing price spreads for single properties and the relationships between the databases. As shown in Fig. 8, the transaction database 180D may be interlinked to the deduplicated listing database 180C and to the property database 180E by the incorporation of a matched transaction ID and a property ID in the databases.
[0077] Property listings have a published price that is determined by the person who creates the listing, such as the owner of the property or the real estate agent who creates the listing. This value is equivalent to an asking price and may be different from the actual transaction value that will occur once an agreement has been made between buyer and seller or between landlord and tenant.
[0078] By effectively linking property listings to transaction records, it is possible to compute the spread between listing prices and transaction values for individual properties and transactions, consolidated groups of properties on predetermined geographic regions, and for clusters of properties and/or transactions that are grouped according to different clustering techniques and criteria.
[0079] Method 800 may include retrieving the listing price from the deduplicated listing database 180C and the transaction value from the transaction database 180D in block 810. Method 800 may include calculating the asking-closing price spreads in block 820. For each raw property listing for which there are records of matched deduplicated property listings and transaction records, two new records are created which record the absolute spread between that listing price and each transacted value, i.e. listed price minus transaction value, and the percentage spread between these values, i.e. listed price minus transaction value, divided by transaction value. Together with the transaction date, the raw listing database 180A effectively records all asking-closing spreads observed historically. Method 800 may include updating the asking-closing price spreads in property database 180E, deduplicated listing database 180C and raw listing database 180A in block 830.
[0080] For each property in the property database 180E for which deduplicated property listings and transaction records are recorded, the system 100 may obtain records of asking-closing spreads via the relationship established between the deduplicated listing database 180C and raw listing database 180A. As the property database 180E contains identifiers of the property such as geographic location, development name or building name, property type, etc., the system 100 may be configured to obtain the results of historical asking-closing price spreads for any cluster of properties for which selection criteria are specified.
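The two spread records described in paragraph [0079] follow directly from the stated definitions and may be computed as in the minimal sketch below. The function name and the guard against non-positive transaction values are assumptions added for illustration.

```python
def asking_closing_spread(listing_price: float, transaction_value: float):
    """Return (absolute spread, percentage spread) for one listing/transaction pair.

    absolute   = listed price - transaction value
    percentage = (listed price - transaction value) / transaction value
    """
    if transaction_value <= 0:
        raise ValueError("transaction value must be positive")
    absolute = listing_price - transaction_value
    percentage = absolute / transaction_value
    return absolute, percentage
```

For instance, a property listed at 1,050,000 and transacted at 1,000,000 has an absolute spread of 50,000 and a percentage spread of 5%.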
[0081] Fig. 9 shows a flow diagram of an exemplary method 900 of predicting asking-closing price spreads for properties and clusters of properties. The asking-closing price spreads constitute historical observations for specific properties which may be a result of market conditions of a specific point in time. Since the overall trend of asking-closing price spreads may change over time as market conditions evolve, a process is implemented that models the trends in asking-closing price spreads for clusters of properties over time. System 100 is configured to generate probabilistic models to predict the behaviour of asking-closing price spreads of any property listing that is observed in the market based on the attributes of the property and the listing, by modelling the behaviour of this parameter in listings that have been assigned to corresponding clusters.
[0082] Method 900 may include retrieving property data from property database 180E in block 910. Method 900 may include clustering the properties in the property database 180E into appropriate clusters based on property parameters such as location (e.g. city, subdivisions within a city such as districts or neighbourhoods, project name, etc.), and property details (e.g. property types like apartments, condominiums, landed houses, public housing, etc.) in block 920. System 100 may be configured to receive user input of the clustering criteria for properties to be clustered, including specifying the clustering variables (e.g. location and/or property details). Based on the clustering criteria, the system 100 is configured to cluster the properties to form a property cluster and assign a property cluster ID to the property cluster. Method 900 may include storing the cluster data, including the property cluster ID, in the property cluster database 180J in block 930.
[0083] Method 900 may further include retrieving the deduplicated property listings from the deduplicated listing database 180C in block 940 and assigning each of the deduplicated property listings to one or several of the property clusters that apply in block 950. As property clusters may refer to different ways of grouping properties together, a deduplicated property listing can be assigned to several clusters concurrently. Method 900 may further include training a regression model using the cluster model for various listing and property features, e.g. asking price, property location, property type, property size, etc. in block 960. The regression model may include hedonic, neural network and random decision forest models. After the model is trained, the method 900 includes predicting the asking-closing price spreads of individual listings and of property clusters, based on the observed characteristics of the properties and listings. Method 900 may include predicting the most likely asking-closing price spread, as well as the probability distribution of asking-closing price spreads, in block 970. Method 900 may include analysing the property clusters in terms of historical asking-closing spreads for every time period, e.g. week, month, quarter, year, etc., and recording each time-specific observation under the corresponding property cluster in the property cluster database 180J.
[0084] Fig. 10 shows a flow diagram of an exemplary method 1000 of determining property availability for clusters of properties. Property availability refers to the number of properties that are listed for sale or for rent as a percentage of the total number of properties that exist in a given property cluster, which can be defined by location-based or property attributes. This is important information for real estate decision makers but is rarely available due to the lack of a consolidated and deduplicated listing database 180C combined with a comprehensive property database 180E.
[0085] Method 1000 may include retrieving property data from the property database 180E in block 1010. Method 1000 may include retrieving property cluster data from the property cluster database 180J in block 1020. Method 1000 may include quantifying the total number of properties that exist in each property cluster. Method 1000 may further include quantifying the total number of active deduplicated property listings on sale in each property cluster for each time period from the deduplicated listing database 180C in block 1030. Method 1000 may include generating the relative availability in block 1040 by dividing the number of deduplicated property listings by the total number of properties and recording the result at defined time intervals, e.g. every day, week, month, year, etc., in the property cluster database 180J. Separate records may be recorded for each type of listing, e.g. rental, sale, in the property cluster database 180J.
[0086] System 100 may be configured to generate sale availability for each property cluster. For example, for each historical time period, the system 100 is configured to generate the number of properties listed for sale as a percentage of total properties in the cluster. System 100 may be configured to generate rent availability for each property cluster. For example, for each historical time period, the system 100 is configured to generate the number of properties listed for rent as a percentage of total properties in the cluster.
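The relative availability computation of block 1040 may be sketched as below. The record layout (dictionaries with `cluster_id` and `type` keys) and the function name are assumptions for illustration; one call is taken to cover the active deduplicated listings of a single time period.

```python
from collections import Counter


def relative_availability(active_listings, total_properties):
    """Fraction of properties listed, per (cluster ID, listing type).

    active_listings:  iterable of dicts with 'cluster_id' and 'type' keys
                      (assumed layout), one per active deduplicated listing.
    total_properties: dict mapping cluster ID to the number of properties.
    """
    counts = Counter((l["cluster_id"], l["type"]) for l in active_listings)
    return {
        (cluster_id, listing_type): n / total_properties[cluster_id]
        for (cluster_id, listing_type), n in counts.items()
        if total_properties.get(cluster_id)  # skip unknown or empty clusters
    }
```

Keeping the listing type in the key yields the separate sale and rental records mentioned above.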
[0087] Fig. 11 shows a flow diagram of an exemplary method 1100 of determining the asking-closing price spread for property listings. The asking-closing price spreads refer to the initial price that a property is listed for versus the actual transacted price. The spread may be negative, i.e. where the asking price is higher than the closing price, or positive, i.e. where the asking price is lower than the closing price. This is important information for real estate decision makers but is rarely available due to the lack of a consolidated and deduplicated listing database 180C combined with a comprehensive transaction database 180D.
[0088] Method 1100 may include retrieving property data from property database 180E in block 1110. Method 1100 may include clustering the properties in the property database 180E into appropriate clusters based on property parameters such as location, e.g. city, subdivisions within a city such as districts or neighbourhoods, project name, etc., and property details, e.g. property types like apartments, condominiums, landed houses, public housing, etc., in block 1120. System 100 may be configured to receive user input of the clustering criteria for properties to be clustered, including specifying the clustering variables, e.g. location and/or property details. Based on the clustering criteria, the system 100 may be configured to cluster the properties to form a property cluster and assign a property cluster ID to the property cluster. Method 1100 may include storing the cluster data, including the cluster ID, in the property cluster database 180J in block 1130.
[0089] Method 1100 may further include retrieving the raw property listings from the raw listing database 180A and assigning each of the raw property listings to one or several of the property clusters that apply in block 1140. Method 1100 may further include calculating the statistical outputs on asking-closing price spreads for the property clusters in block 1150. Method 1100 may include storing the outputs in the property cluster database 180J in block 1160.
[0090] Fig. 12 shows a flow diagram of an exemplary method 1200 of determining property absorption rate. Property absorption rate refers to the number of properties that transact, either as sale or rental, as a percentage of the total number of properties that were listed in a given property cluster, which can be defined by location-based or property attributes. This is important information for real estate decision makers but is rarely available due to the lack of a consolidated and deduplicated listings database 180C combined with a comprehensive transaction database 180D.
[0091] Method 1200 may include retrieving transaction data and property cluster data from the transaction database 180D and property cluster database 180J respectively in block 1210. Method 1200 may include quantifying or calculating the number of property transactions that correspond to each property cluster for each time period from the transaction database 180D in block 1220. Method 1200 may include retrieving deduplicated property listing and property cluster data from the deduplicated listing database 180C and the property cluster database 180J respectively in block 1230. Method 1200 may further include quantifying and calculating the total number of active deduplicated listings in each property cluster for each time period from the deduplicated listings database 180C in block 1240. Thereafter, the method 1200 may include generating the absorption rate in block 1250 by dividing the number of transactions by the total number of deduplicated listings in the same property cluster and time period, recording the result at defined time intervals, e.g. day, week, month, year, etc., and storing the absorption rate in the property cluster database 180J. Separate records may be recorded for each type of transaction and listing, e.g. rental, sale, in the property cluster database 180J. Method 1200 may include updating the property cluster database 180J in block 1260.
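The division described for block 1250 may be illustrated by the following sketch, which assumes that transactions and active deduplicated listings have each been reduced to (cluster ID, time period) pairs. The function name and the period format are hypothetical.

```python
from collections import Counter


def absorption_rates(transactions, active_listings):
    """Compute the absorption rate per (cluster_id, period) as a percentage.

    transactions: iterable of (cluster_id, period) pairs, one per transaction.
    active_listings: iterable of (cluster_id, period) pairs, one per active
        deduplicated listing.
    Keys with no active listings are skipped to avoid division by zero.
    """
    transacted = Counter(transactions)
    listed = Counter(active_listings)
    return {
        key: 100.0 * transacted.get(key, 0) / total
        for key, total in listed.items()
    }


transactions = [("C0001", "2020-01"), ("C0001", "2020-01")]
active_listings = [("C0001", "2020-01")] * 8 + [("C0002", "2020-01")] * 4
rates = absorption_rates(transactions, active_listings)
```

Separate sale and rental rates could be obtained under the same assumptions by calling the function once per transaction type with the correspondingly filtered inputs.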
[0092] System 100 may be configured to generate a sale absorption rate for each property cluster. For example, for each historical time period, the system 100 is configured to generate the number of sale transactions as a percentage of total deduplicated property listings in the property cluster. System 100 may be configured to generate a rent absorption rate for each property cluster. For example, for each historical time period, the system 100 is configured to generate the number of rent transactions as a percentage of total deduplicated rent listings in the property cluster.
[0093] Fig. 13 shows a flow diagram of an exemplary method 1300 of determining a period of time, e.g. number of days, of a property listing on the market. Days on market refers to the number of days that a property has active listings, either as sale or rental, before the property transacts. This is important information for real estate decision makers but is rarely available due to the lack of a consolidated and deduplicated listings database 180C combined with a comprehensive transaction database 180D. System 100 enables computation of days on market for individual properties and clusters of properties.
[0094] Method 1300 may include retrieving deduplicated property listing from the deduplicated listing database 180C in block 1310. Method 1300 may include quantifying or calculating the total number of days for which each property in the deduplicated property listing was listed before it transacted in block 1320. Method 1300 may include retrieving the property cluster data from the property cluster database 180J in block 1330. Method 1300 may further include inputting these values in the same property cluster and time period and recording the result at defined time intervals, e.g. day, week, month, year, etc., in the property cluster database 180J. Method 1300 may include linking the deduplicated property listing to the results in block 1340. Separate records are recorded for each type of transaction and listing, e.g. rental, sale. Finally, the method 1300 may include computing, for each property cluster and time period, the mean, median, maximum and minimum days on market of the deduplicated property listings in each property cluster that transacted in the same period in block 1350. These values would be recorded and updated in the property cluster database 180J in block 1360.
[0095] System 100 may be configured to generate the number of days on market for each transacted property in a property cluster. System 100 may be configured to generate mean, median, maximum and minimum days on market (sale) of the properties in each property cluster. System 100 may also be configured to generate mean, median, maximum and minimum days on market (rental) of the properties in each property cluster.
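A minimal sketch of the days-on-market computation is given below. It assumes that days on market is the span between the date a property was first listed and the date it transacted; periods in which a listing was withdrawn and later relisted are not handled. The record layout is hypothetical.

```python
from datetime import date
from statistics import mean, median


def days_on_market(first_listed: date, transacted: date) -> int:
    """Return the number of days between first listing and transaction."""
    return (transacted - first_listed).days


def days_on_market_summary(records):
    """Summarise days on market for each property cluster.

    records: iterable of (cluster_id, first_listed, transacted) tuples for
        properties that transacted in the period of interest.
    """
    by_cluster = {}
    for cluster_id, first_listed, transacted in records:
        by_cluster.setdefault(cluster_id, []).append(
            days_on_market(first_listed, transacted)
        )
    return {
        cluster_id: {
            "mean": mean(values),
            "median": median(values),
            "max": max(values),
            "min": min(values),
        }
        for cluster_id, values in by_cluster.items()
    }


records = [
    ("C0001", date(2020, 1, 1), date(2020, 1, 31)),
    ("C0001", date(2020, 2, 1), date(2020, 2, 11)),
    ("C0001", date(2020, 3, 1), date(2020, 3, 21)),
]
summary = days_on_market_summary(records)
```

Sale and rental summaries could be produced separately under the same assumptions by passing only the records of the relevant type.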
[0096] System 100 may include a server, a laptop, a computer, etc. Processor 110 typically controls overall operations of the system 100, such as the operations associated with display and data communications. Processor 110 may include one or more processors to execute instructions in the above-described modules to perform all or part of the steps in the above-described methods. Moreover, the processor 110 may include one or more modules which facilitate the interaction between the processor 110 and other modules. System 100 may be in communication with the user devices via the network.
[0097] Memory 120 may be configured to store various types of data, including the modules, to support the operation of the system 100. For example, the data may include instructions for any applications or above methods operated on the system 100, programmes, applications, modules, etc. Memory 120 may be implemented using any type of volatile or non-volatile memory devices, or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, or a magnetic or optical disk.
[0098] Power source 150 provides power to various modules of the system 100. Power source 150 may include a power management system, one or more power sources, and any other modules associated with the generation, management, and distribution of power in the system 100.
[0099] Display 140 may include a screen providing an output interface for the system 100 and the user. In some embodiments, the screen may include a liquid crystal display (LCD), organic light-emitting diode (OLED), a touch panel, etc. If the screen includes the touch panel, the screen may be implemented as a touch screen to receive input signals from the user. Touch panel may include one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense a boundary of a touch or swipe action, but also sense a period of time and a pressure associated with the touch or swipe action.
[0100] I/O interface 130 provides an interface between the processor 110 and peripheral interface modules, such as a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, a volume button, a starting button, and a locking button.
[0101] Communication module 160 may be configured to facilitate communication, wired or wireless, between the system 100 and other devices. System 100 may access a wireless network based on a communication standard, such as WiFi, 2G, 3G, LTE, or 4G cellular technologies, or a combination thereof. In one exemplary embodiment, the communication module may receive a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication module may further include a near field communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on a radio frequency identification (RFID) technology, an infrared data association (IrDA) technology, an ultra-wideband (UWB) technology, a Bluetooth (BT) technology, and other technologies.
[0102] In exemplary embodiments, the system 100 may be implemented with one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic modules, for performing the above-described methods.
[0103] A skilled person would appreciate that the features described in one example may not be restricted to that example and may be combined with any one of the other examples.
[0104] The present invention relates to a system for generating deduplicated property listing from a plurality of property listings and a method thereof generally as herein described, with reference to and/or illustrated in the accompanying drawings.