METHOD, SYSTEM AND COMPUTER PROGRAMS FOR PROVIDING A LOCATION AWARE USER MODEL PRESERVING THE USER'S PRIVACY

Document Type and Number:

WIPO Patent Application WO/2020/079075

Abstract:

Method, system and computer programs for providing a location aware user model preserving the user's privacy. The method comprises a) collecting, by a sensor capture module, sensor data from a plurality of sensors installed on a mobile computing device of a user; b) processing, by a computer, said collected sensor data in an anonymous way by grouping the collected sensor data into different heatspots, said heatspots including different areas of different significance for said user, and each of the heatspots comprising a given radius; c) labeling, by the computer, each of the heatspots with a unique identifier corresponding to a predetermined area; and d) generating, by the computer, a location aware user model based on said identifiers, so that said generated location aware user model is suitable for providing recommendations to the user via the mobile computing device, performing studies and/or providing an input to other user models.

Inventors:

LANTZ JOHAN (ES)
MATIC ALEKSANDAR (ES)

Application Number:

PCT/EP2019/078075

Publication Date:

April 23, 2020

Filing Date:

October 16, 2019

Assignee:

TELEFONICA INNOVACION ALPHA S L (ES)

International Classes:

H04W4/021; H04L29/08; H04W12/02

Domestic Patent References:

WO2015082003A1

2015-06-11

Foreign References:

US20150219466A1

2015-08-06

Other References:

"Image Analysis and Recognition : 11th International Conference, ICIAR 2014, Vilamoura, Portugal, October 22-24, 2014, Proceedings, Part I; IN: Lecture notes in computer science , ISSN 1611-3349 ; Vol. 8814", vol. 6696, 1 January 2011, SPRINGER, Berlin, Heidelberg, ISBN: 978-3-642-17318-9, article SIBREN ISAACMAN ET AL: "Identifying Important Places in People's Lives from Cellular Network Data", pages: 133 - 151, XP055322487, 032548, DOI: 10.1007/978-3-642-21726-5_9

Attorney, Agent or Firm:

DE CARLOS HERNANDO, Borja (ES)

Claims:

1. Method for providing a location aware user model preserving the user's privacy, the method comprising:

a) collecting, by a sensor capture module, sensor data from a plurality of sensors installed on a mobile computing device of a user;

b) processing, by a computer, said collected sensor data in an anonymous way by grouping the collected sensor data into different heatspots,

said heatspots including different areas of different significance for said user, and each of the heatspots comprising a given radius;

c) labeling, by the computer, each of the heatspots with a unique identifier corresponding to a predetermined area; and

d) generating, by the computer, a location aware user model based on said identifiers,

so that said generated location aware user model being suitable for providing recommendations to the user via the mobile computing device, perform studies and/or provide an input to other user models.

2. The method of claim 1, wherein said step b) further comprises providing at least one timestamp to each heatspot indicating the moment of time the user reached the heatspot.

3. The method of previous claims, wherein each unique identifier is encrypted based at least on a part of the location coordinates of the predetermined area.

4. The method of previous claims, wherein the radius of the different heatspots are equal or different among them.

5. The method of previous claims, wherein the radius of the different heatspots range from a few meters to several kilometers.

6. The method of claim 1, further comprising: performing steps a)-d) for a plurality of different users active in the same heatspots, such that a location aware user model is generated for each one of the plurality of different users; and calculating, by the computer, behavioral patterns between different users by correlating the generated location aware user models of the different users.

7. The method of claim 1, further comprising: performing steps a)-d) for a plurality of different users active in the same heatspots, such that a location aware user model is generated for each one of the plurality of different users; computing, by the computer, a seed and using said computed seed for automatically creating and encrypting a random salt key; determining, by the computer, a hashing technique to be used to obfuscate the different heatspots, and transmitting to the mobile computing device of each user of said plurality of different users the encrypted random salt key and the determined hashing technique; and applying, by each mobile computing device, the hashing technique with the salt key to every heatspot and transmitting a hash to the computer.

8. The method of previous claims, wherein said collected sensor data includes one or more of the following: accelerometer data; activity data; data about installed applications in the computing device; data about a battery level of the computing device; data about Bluetooth beacons in the heatspot; call logs; data about the computing device including model and/or brand name; data indicating whether a headset is plugged in or not; Internet logs and/or surf history; current lux level; location data; whether music is playing or not; ambient noise level; pedometer data; network data about the computing device including roaming, operator, cell tower, data TX/RX, mobile/WiFi, airplane mode and/or country; data about places or type of establishments nearby the heatspot; data indicating whether a screen of the computing device is on/off; SMS logs; data indicating activity transitions of the user; and/or data indicating walking dynamics of the user.

9. System for providing a location aware user model preserving the user's privacy, comprising: a mobile computing device of a user; a sensor capture module installed on said mobile computing device to collect sensor data from a plurality of sensors installed on the mobile computing device; a computer including one or more processors and at least one memory, wherein the one or more processor being configured to:

process said collected sensor data in an anonymous way by grouping the collected sensor data into different heatspots,

said heatspots including different areas of different significance for said user, and each of the heatspots comprising a given radius;

label each heatspot with a unique identifier corresponding to a predetermined area; and

generate a location aware user model based on said identifiers, so that said generated location aware user model being suitable for providing recommendations to the user via the mobile computing device, perform studies and/or provide an input to other user models.

10. A non-transitory computer readable medium, comprising code instructions that when executed by a processor of a computer implement a method for providing a location aware user model preserving the user’s privacy by:

processing sensor data collected from a plurality of sensors installed on a mobile computing device of a user in an anonymous way by grouping the collected sensor data into different heatspots, said heatspots including different areas of different significance for said user, and each of the heatspots comprising a given radius;

labeling each of the heatspots with a unique identifier corresponding to a predetermined area; and

generating a location aware user model based on said identifiers,

11. The non-transitory computer readable medium of claim 10, wherein the processor when processing the collected sensor data further provides at least one timestamp to each heatspot indicating the moment of time the user reached the heatspot.

12. The non-transitory computer readable medium of claim 10 or 11, wherein each unique identifier is encrypted based at least on a part of the location coordinates of the predetermined area.

Description:

Method, system and computer programs for providing a location aware user model preserving the user's privacy

Technical Field

This invention relates to a method, and corresponding system and computer programs, for ensuring user privacy for sensor data collected from a mobile computing device such as a smartphone.

Background of the Invention

Collecting large amounts of data from personal computing devices, and moreover acquiring rich information about individuals, naturally comes with the risk of invading a person’s privacy. Even when the user gives consent to terms that explain in detail the data being collected and its intended use, GDPR strongly encourages data minimization and, more importantly, prohibits the collection of data that is not required to deliver the service. Being able to obtain the same results and/or modelling accuracy with less data is hugely beneficial for any data-dependent service, as it decreases the risk of exposing personal information while boosting the user’s trust and perception of control.

Some known APPS take advantage of phone sensors to deliver or improve their services, and thus often rely on privacy-sensitive data. One common feature is geofencing, where APPS can interact with the physical world to improve engagement and the timeliness of interaction with a user.

New techniques/solutions are therefore needed to process personal information in a more anonymous way, so that the information can be shared with backend services capable of building advanced user models and applying machine learning algorithms, but without the risk of exposing information that could uniquely identify the user(s).

Description of the Invention

To that end, embodiments of the present invention provide according to a first aspect a method for providing a location aware user model preserving the user's privacy, comprising collecting, by a sensor capture module, sensor data from a plurality (or a set) of sensors installed on a mobile computing device, for example a smartphone, of a user. Then, a computer processes the collected sensor data in an anonymous way by grouping the collected sensor data into different heatspots; labels each of the heatspots with a unique identifier corresponding to a predetermined area; and generates a location aware user model based on said identifiers. Thus, the generated location aware user model can be used for providing recommendations to the user via the mobile computing device, perform studies and/or provide an input to other user models.

According to the proposed method, the heatspots include different areas of different significance for the user. The heatspots have a given radius, equal or different among them, that can range from a few meters to several kilometers.

Moreover, according to the proposed invention, the collected sensor data includes one or more of the following: accelerometer data; activity data; data about installed applications in the computing device; data about a battery level of the computing device; data about Bluetooth beacons in the heatspot; call logs; data about the computing device including model and/or brand name; data indicating whether a headset is plugged in or not; Internet logs and/or surf history; current lux level; location data; whether music is playing or not; ambient noise level; pedometer data; network data about the computing device including roaming, operator, cell tower, data TX/RX, mobile/WiFi, airplane mode and/or country; data about places or type of establishments nearby the heatspot; data indicating whether a screen of the computing device is on/off; SMS logs; data indicating activity transitions of the user; and/or data indicating walking dynamics of the user.

The sensor capture module may reside in the platform layer of the application, meaning that there is a separate version for iOS™ and Android™. Nonetheless, the concept is not limited to a specific platform and similar features could be available on other mobile platforms, embedded systems (IoT) or even web browsers.

In an embodiment, the processing of the collected sensor data further comprises providing at least one timestamp to each heatspot indicating the moment of time the user reached the heatspot.

In an embodiment, each unique identifier is encrypted based at least on a part of the location coordinates of the predetermined area.

In another embodiment, the proposed method is performed for a plurality of different users active in the same heatspots, such that a location aware user model is generated for each one of the plurality of different users. In this case, the computer can calculate behavioral patterns between different users by correlating the generated location aware user models of the different users.

Complementarily or alternatively, in yet another embodiment, the computer may further compute a seed and use said computed seed for automatically creating and encrypting a random salt key. Then the computer can determine a hashing technique (e.g. SHA-256) to be used to obfuscate the different heatspots, and transmit to the mobile computing device of each user of the plurality of different users the encrypted random salt key and the determined hashing technique. Upon reception, each mobile computing device can apply the hashing technique with the salt key to every heatspot and further transmit a hash to the computer.

Other embodiments of the invention that are disclosed herein also include a system and software programs to perform the method embodiment steps and operations summarized above and disclosed in detail below. More particularly, a computer program product is one embodiment that has a computer-readable medium including computer program instructions encoded thereon that when executed on at least one processor in a computer system causes the processor to perform the operations indicated herein as embodiments of the invention.

Therefore, the present invention allows for reaching an optimal trade-off between the user modeling power and the level of data sensitivity. Moreover, the present invention increases user trust and decreases risk in case of data breaches. In addition, higher compliance with data regulations is achieved.

Brief Description of the Drawings

The previous and other advantages and features will be more fully understood from the following detailed description of embodiments, with reference to the attached figures, which must be considered in an illustrative and non-limiting manner, in which:

Fig. 1 graphically depicts a simple heatspot model used by the proposed invention.

Fig. 2 is a simplified visualization of how different heatspots are connected to each other. The transition from 2-4 indicates a missed location sample in a regular interval.

Fig. 3 graphically depicts an example in which both User 1 and User 2 spend a significant amount of time in anonymized heatspot #56aa34532.

Fig. 4 is a flow chart illustrating the general flow from device detection to recommendation via analysis.

Fig. 5 is an illustration of how the same device generates two different identifiers when reported to the computer/server.

Fig. 6 is an illustration in which User A and User B report User C to the server but only the manufacturer identifier is preserved.

Fig. 7 is an illustration of how User A and User B would both report the same, anonymized identifier for User C.

Fig. 8 illustrates how User B’s privacy settings eliminate User A from the devices reported for analysis since it is outside of the predefined range.

Detailed Description of Preferred Embodiments

The present invention focuses on privacy preservation while still allowing for sensor data collection and user modelling. The descriptions below focus on sensor data that can potentially expose private information and on how that data can be anonymized without losing the ability to process it in a personalized way.

The aim of the present invention is to build a good user model that can be 100% anonymous, using data that is anonymized while still being equally, or close to equally, relevant as its non-anonymized counterpart.

Continuously uploading location information for a user exposes a lot of personal information; on the other hand, it provides rich insight into the user’s daily activities. In order to reduce the exposure risk and preserve user privacy, the present invention proposes a concept called the ‘heatspot’ (i.e. areas of different significance for the user). The heatspot works as follows: each location obtained from the user’s mobile computing device is compared to a list of locally cached areas within a certain radius, and if there is a previous match the number of “hits” in that area is increased. The benefit of this is that it does not require continuous monitoring; on the contrary, obtaining a location at regular or fairly regular intervals improves the reliability of the heatspot importance.

As an example, User A spends most of his time at home or at work and has a 30 minute commute in between. A simplified day in User A’s life looks like this:
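The matching step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the names Heatspot, haversine_m and record_location are hypothetical, the great-circle (haversine) distance is one possible way of testing the radius, and creating a new heatspot when no cached area matches is an assumption that the description does not spell out.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass
class Heatspot:
    spot_id: int      # local, per-user identifier (hypothetical field name)
    lat: float        # center latitude in degrees
    lon: float        # center longitude in degrees
    radius_m: float   # heatspot radius in meters
    hits: int = 0     # number of location samples that fell inside


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def record_location(cache, lat, lon, radius_m=100.0):
    """Increase the hit count of the first cached heatspot containing the
    sample; otherwise (assumption) open a new heatspot centered on it."""
    for spot in cache:
        if haversine_m(lat, lon, spot.lat, spot.lon) <= spot.radius_m:
            spot.hits += 1
            return spot
    spot = Heatspot(len(cache), lat, lon, radius_m, hits=1)
    cache.append(spot)
    return spot
```

Only the heatspot identifier and its hit count would need to leave the device in such a sketch; the center coordinates stay in the local cache.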

07:00 Wake up

08:00 Leave for work

08:30 Arrive at work

17:00 Leave work

17:30 Arrive home

23:00 Go to bed

With an approach that continuously monitors User A’s exact location, his exact whereabouts over time would be tracked and persisted in the backend. However, if the sensor capture module uses a heatspot approach before uploading the data for processing, User A’s activities would be grouped into areas of different significance (see Figs. 1 and/or 2).

Now a machine learning algorithm can quite easily detect a pattern from this simplified case, labeling Heatspot 1 as User A’s home, Heatspot 2 as User A’s workplace and Heatspots 3-5 as intermediate points such as commuting.

Moreover, in an embodiment, if the heatspot identifier is reported in combination with a timestamp the granularity is further improved since it will allow transition monitoring between heatspots allowing user flows to be simulated without exposing location details.
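As a rough illustration of the transition monitoring mentioned above, the following sketch counts how often the user moves from one heatspot to another, given only (timestamp, heatspot identifier) pairs. The function name count_transitions and the input layout are assumptions made for this example.

```python
from collections import Counter


def count_transitions(visits):
    """Count moves between consecutive, different heatspots.

    visits: iterable of (timestamp, heatspot_id) pairs; timestamps only
    need to be mutually comparable (e.g. epoch seconds or datetime objects).
    """
    ordered = sorted(visits, key=lambda visit: visit[0])
    transitions = Counter()
    for (_, prev_id), (_, next_id) in zip(ordered, ordered[1:]):
        if prev_id != next_id:  # staying in the same heatspot is not a transition
            transitions[(prev_id, next_id)] += 1
    return transitions
```

No coordinates are involved: the resulting counts describe the user flow purely in terms of heatspot identifiers.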

Depending on the embodiment more or less precision might be required/desired and with the heatspot model the user can be in control of this. This is accomplished by exposing an option on application or user level that controls the size of the heatspot.

If fine precision is needed, for instance for a mental wellness APP that needs to know whether the user is leaving home at all, the heatspot radius must be relatively small to be able to determine whether the user is at home or in another heatspot. For a more generic purpose, a larger heatspot radius might be sufficient: for instance, if it needs to be detected whether the user is travelling for work or spending weekends away, without exposing where, then a heatspot the size of a city would be more than sufficient. In both cases the exact location is never compromised, but having the option to tune the granularity might offer more peace of mind for the user.

In an embodiment, the heatspot is simply labeled or identified with an identifier that is specific for each user, i.e. users A, B and C will all have Heatspots 0, 1, 2 and so on.

In another embodiment, the identifier is further encrypted based at least on a part of the location coordinates of the predetermined area. In this case, the computer would be able to correlate behaviors, movements etc. between users active in the same heatspots. It could also be used to conclude if users spending a lot of time in similar areas also share similar behaviors, problems etc.
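One possible way of correlating users through shared identifiers, in the spirit of Fig. 3, is sketched below. It assumes each user model is reduced to a mapping from anonymized heatspot identifier to hit count; the function name shared_significant_heatspots and the min_share threshold are hypothetical choices, not part of the described method.

```python
def shared_significant_heatspots(models, min_share=0.2):
    """Return {heatspot_id: set of users} for heatspots in which more than
    one user spends at least min_share of their recorded hits.

    models: {user: {heatspot_id: hits}}
    """
    significant = {}
    for user, hits in models.items():
        total = sum(hits.values())
        if total == 0:
            continue  # no samples for this user, nothing to compare
        for spot_id, count in hits.items():
            if count / total >= min_share:
                significant.setdefault(spot_id, set()).add(user)
    return {spot: users for spot, users in significant.items() if len(users) > 1}
```

The computer only ever sees the anonymized identifiers, so the overlap is established without knowing where the shared heatspot is.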

For many services, developing behavioral models is highly dependent on establishing statistical relationships among different users, which therefore requires mapping between their collected data points, such as location. However, this is impossible if different users have different heatspot annotations. In order to allow for that while still fully preserving the users’ privacy, in an embodiment, the computer randomly creates a seed for creating a salt key. Then, the computer automatically creates the random salt key (with a pre-defined number of characters), encrypts it and stores it for the future use. The computer also decides on a hashing technique to be used to obfuscate the locations, e.g. SHA-256 (note that the computer can change a hashing technique over time to use the latest one), and communicates to the mobile computing device of the user the hashing technique and the encrypted salt key (this transfer can be performed in the same way that server and client side exchange the password, without any of the sides storing the raw value). Finally, the mobile computing device applies the hashing technique with the salt key to every location and sends only a hash to the computer.
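The device-side hashing step can be sketched as follows, using the standard hashlib and secrets modules. Several details are assumptions made only for illustration: the salt is generated directly rather than derived from a seed, its encrypted transfer is omitted, and a location is mapped to a heatspot by snapping coordinates to a grid cell (heatspot_cell), which is just one way to make two users in the same area produce the same input.

```python
import hashlib
import math
import secrets


def create_salt(num_chars=32):
    """Server side: random salt key with a pre-defined number of hex characters."""
    return secrets.token_hex(num_chars // 2)  # token_hex(n) yields 2*n characters


def heatspot_cell(lat, lon, decimals=3):
    """Assumption: snap coordinates to a grid cell so nearby samples coincide."""
    scale = 10 ** decimals
    return f"{math.floor(lat * scale)}:{math.floor(lon * scale)}"


def hash_heatspot(lat, lon, salt, algorithm="sha256", decimals=3):
    """Device side: apply the hashing technique chosen by the server, together
    with the salt key, and return only the resulting hash."""
    digest = hashlib.new(algorithm)
    digest.update(salt.encode("utf-8"))
    digest.update(heatspot_cell(lat, lon, decimals).encode("utf-8"))
    return digest.hexdigest()
```

Two devices holding the same salt report the same hash for the same cell, which allows mapping between users, while a computer with a different salt obtains unrelated values.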

Each computer will have its own salt key; therefore, even if the same hashing function is accidentally used and two computers communicate with each other, they cannot map their users. This is extremely important because crossing two different data sets can endanger user privacy in unpredictable ways, and location, if uniquely hashed, can serve as a key to identify users.

Therapy application:

In an embodiment, in particular for a company developing a therapy application, the application comprises several parts: an interactive therapy program designed to address the symptoms, a chat with the therapist or an anonymous support group, and other features. While the user may follow the program at a certain pace or interact with the therapist or support group on random occasions, these are all user-initiated actions. There is also a need for preventive measures and for detecting anomalies in the user’s movement patterns as a good indicator that something might be wrong.

In the case of User A having a condition that makes it incredibly hard to leave home, due to for example anxiety, depression or bad self-image, it would be very valuable for the treatment application to proactively detect behavior that could be potentially harmful. However, tracking the user’s location and actions on a detailed level would be extremely privacy invasive and would pose great challenges for the security of the backend storage (or computer’s storage). On the other hand, if the user is tracked based on anonymous heatspots and the algorithms in the backend (i.e. in the computer) have learned which heatspot is the home heatspot, then the backend can easily detect whether the user has not left that comfort zone for X days and in this case notify either the GP or the support group.

As a first step the application can query the user about the current perceived health state, then recommend that the user take a walk, and finally “alert” the peers about a potentially unhealthy situation. In no case would this expose the user’s exact whereabouts.

As an example, the APP can provide a service for detecting early signs that a user is going to experience a mental health crisis (such as depression, mania, or similar). The literature has shown that mobility patterns are important predictors of upcoming crises. However, using raw locations is considered to be extremely privacy invasive, and patients in particular do not feel comfortable sharing them. From the service side, storing raw locations poses additional requirements; for instance, GDPR imposes “high” security measures that are extremely challenging to comply with, particularly for smaller companies (such as physical security, logging not only electronic access to the server but also authenticating people who are in the physical vicinity of the server and granting special permissions, etc.). Storing heatspots instead of raw locations eliminates these data security requirements, while still allowing the models to incorporate the analysis of mobility patterns (e.g. knowing that a sequence of very specific locations is a predictor of the crisis, the algorithm can have the same accuracy with heatspots as in the case of raw locations).

Geofencing services:
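The "has not left home for X days" check can be illustrated with the sketch below. It assumes the backend holds timestamped heatspot identifiers and has already learned which identifier is the home heatspot; the function names and the choice to count from the first sample when the user was never seen elsewhere are assumptions.

```python
from datetime import datetime


def days_at_home(visits, home_id, now):
    """Whole days elapsed since the user was last seen outside the home heatspot.

    visits: list of (datetime, heatspot_id) pairs.
    If the user was never seen outside, count from the earliest sample.
    """
    past = [(ts, spot) for ts, spot in visits if ts <= now]
    if not past:
        return 0
    outside = [ts for ts, spot in past if spot != home_id]
    reference = max(outside) if outside else min(ts for ts, _ in past)
    return (now - reference).days


def should_notify(visits, home_id, now, max_days):
    """True when the user has stayed in the home heatspot for max_days or more."""
    return days_at_home(visits, home_id, now) >= max_days
```

For instance, with the last non-home sample six days before now, should_notify(visits, home_id, now, max_days=5) would be true, and the application could start with the mildest step, such as querying the user.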

In another embodiment, if an APP delivers notifications to its users, the right timing is crucial for engagement. Knowing in which heatspots its users are more responsive during specific time periods, the “right time” algorithm can therefore work without the need to store real locations. In the same way, if some features of the APP rely on the proximity of its users (e.g. buying/selling items in the neighborhood), this function can work without the raw location data. Moreover, the concept of heatspots will support the case in which users set different granularity of location obfuscation (e.g. 100m versus 1 km), while indicating the precision in the interface.

Browser logs:

Having access to the Internet browsing logs of a user provides deep insight into not only Internet browsing habits but also the type of content consumed and the user’s preferences and tastes. As expected, many studies have shown that locations and Internet history are the data categories with the highest privacy concerns. Thus, in an embodiment, the very same concept of heatspots applies to the obfuscation of Internet logs, which represent online whereabouts as opposed to locations, which represent whereabouts in real life. In order to apply the invention in the same way as for locations, the granularity is defined as follows (note that in the location use case the granularity was defined based on distances). First, the following visibility levels of the Internet logs are defined:

1) Timestamps of http(s) access (i.e. no information about the requested domain).

2) Hashing only the domain name and sharing it with the server (e.g. cnn.com shared as “ah13f;323f239tu2foiewewf”, uniquely for the same service).

3) Hashing the address up to the second slash “/” and sharing the hash with the server (e.g. cnn.com/sport/ shared as “24otih3094tfe2fij42”, uniquely for the same service).

4) Hashing the address at the page level and sharing it with the server (e.g. “en.wikipedia.org/wiki/Josip_Broz_Tito” shared as “fuh8742hjas94ht2’[g”, uniquely for the same service).

5) Hashing the name of the first level category that the visited website or a service belongs to (e.g. the first level category Alexa defines as: Adult, Arts, Business, Computers, Games, Health, Home, Kids and Teens, News, Recreation, Reference, Regional, Science, Shopping, Society, Sports, World).

6) Hashing the name of the second level category that the visited website or a service belongs to (e.g. for Science Alexa defines 29 second level categories including Academic Departments, Agriculture, Anomalies & Alternative Science, Astronomy, Biology, etc.).

7) Hashing the name of the third, fourth .... level category that the visited website or a service belongs to (the number of the category levels is related to the used dictionary).

8) Sharing a non-hashed name of the first category level that the visited website or a service belongs to.

9) Sharing a non-hashed name of the second category level that the visited website or a service belongs to.

10) Sharing a non-hashed name of the third category level that the visited website or a service belongs to ... etc.

11) Sharing a non-hashed domain name.

12) Sharing a non-hashed domain name up to the second slash “/”.

Each successive visibility level differs from the previous one by one degree of granularity.

As an illustration, the above list is ordered from the lowest to the highest granularity with respect to the heatspot concept. However, variations in the above categories are allowed as long as they provide different levels of the URL visibility with the related partial or full obfuscation.
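Visibility levels 1) to 4) of the list above can be sketched as follows. This is a hedged illustration only: the function name obfuscate_url, the use of SHA-256 and the per-service salt are assumptions, and the category-based levels are left out because they depend on an external dictionary.

```python
import hashlib
from urllib.parse import urlsplit


def _salted_hash(text, salt):
    """Hash that is unique to one service through its salt (assumption)."""
    return hashlib.sha256((salt + text).encode("utf-8")).hexdigest()


def obfuscate_url(url, level, salt):
    """Return what would be shared with the server for visibility levels 1-4."""
    parts = urlsplit(url if "://" in url else "https://" + url)
    domain = parts.netloc.lower()
    segments = [segment for segment in parts.path.split("/") if segment]
    if level == 1:
        return None  # only the access timestamp is shared, nothing about the URL
    if level == 2:
        return _salted_hash(domain, salt)  # domain name only
    if level == 3:
        # address up to the second slash, e.g. "cnn.com/sport/"
        first = segments[0] + "/" if segments else ""
        return _salted_hash(domain + "/" + first, salt)
    if level == 4:
        return _salted_hash(domain + parts.path, salt)  # page level
    raise ValueError("only levels 1-4 are covered by this sketch")
```

At level 3, for example, every page under cnn.com/sport/ maps to the same hash, so the server can see repeated visits to one section without learning which section it is.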

As demonstrated in https://arxiv.org/pdf/1710.00069.pdf, different URL visibility levels indeed provide different user modelling predictive power (even the timestamps alone can be sufficient for accurate user models).

Bluetooth data:

The Bluetooth sensor is responsible for scanning the surroundings for Bluetooth or Bluetooth LE devices. This provides a way to detect which beacons are normally available in the surroundings of the user. The most obvious example is a Bluetooth smartphone, which would identify another individual, but other devices such as smart speakers, TVs etc. could indicate income level and other interesting parameters that are valuable for the user modelling.

Collecting this data, however, may come with serious privacy concerns. For instance, there are adult items that have Bluetooth, and the Bluetooth identifier easily reveals the manufacturer. Moreover, having the raw Bluetooth identifier can indirectly reveal extremely privacy-sensitive information, e.g. exactly which device a user is in the vicinity of at 2am during the weekends. It could however still be valuable for the model to know that this device is frequently or repeatedly present in the surroundings of the user. If used in a raw format, it is possible to reverse engineer whether the identifier corresponds to a mobile phone (and therefore a person) or to a specific device such as a TV, headphones, laptop, etc.

Therefore, to protect user privacy, the exact Bluetooth address should not be shared with the backend for analysis unless protected. The general flow from device detection to recommendation via analysis is shown in Fig. 4.

Strong local protection:

Each APP can generate a unique and persistent identifier id. This id will be used to hash/encrypt the remote Bluetooth device address. For example, Bluetooth address: AABBCCDDEEFF11 would be 45fe12aa673423. This means that even if User A and User B see the same device, they will report different identifiers to the computer/server. Recognition can only be done for the same reporting device (seeing the same beacon twice will generate the same result).

Fig. 5 illustrates an example of how the same device generates two different identifiers when reported to the server.

Strong local partial protection:

The first 3 bytes of a Bluetooth address identify the manufacturer. By lowering the requirements slightly, the manufacturer could still be allowed to be identified while not exposing the device specific part of the address. For example, Bluetooth address: AABBCCDDEEFF11 would be AABBCCaa673423 (3 first bytes are preserved).

This allows detection of devices of the same brand and could potentially be tied to a position depending on other privacy settings, but the actual unique device identifier is not exposed, so there is no way to know whether User A and User B actually detected the same device when they saw User C.

Fig. 6 illustrates an example in which User A and User B report User C to the server but only the manufacturer identifier is preserved.
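The two local protection variants can be sketched together as shown below. The sketch assumes SHA-256 as the hash, a random UUID as the persistent per-APP identifier, and truncation of the digest to the length of the original address so that the output resembles the examples in the text; none of these choices is prescribed by the description, and the names new_app_id and protect_address are hypothetical.

```python
import hashlib
import uuid


def new_app_id():
    """Generate the per-APP identifier once; the APP is assumed to persist it."""
    return uuid.uuid4().hex


def protect_address(address, app_id, keep_manufacturer=False):
    """Obfuscate a remote Bluetooth address with the local APP identifier.

    keep_manufacturer=False: strong local protection (whole address hidden).
    keep_manufacturer=True:  partial protection (first 3 bytes preserved).
    """
    raw = address.replace(":", "").upper()
    digest = hashlib.sha256((app_id + raw).encode("utf-8")).hexdigest()
    if keep_manufacturer:
        prefix = raw[:6]  # 3 bytes = 6 hex characters identify the manufacturer
        return prefix + digest[: len(raw) - 6]
    return digest[: len(raw)]
```

Because app_id differs per installation, two users observing the same device report different values, while repeated sightings by the same APP remain recognizable.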

Distributed protection:

The examples above work but are only valid in the scope of one user. The alternative option is to privatize the personal information with a shared key or hash so that the result is always the same for the same device, regardless of which user encrypts it. This would allow for modelling of interactions between users and stationary beacons for different users of the same APP. For example, User B has Bluetooth address: AABBCCDDEEFF11. When User A sees User B he will report AABBCCaa673423 to the backend. When User C sees User B he will also report AABBCCaa673423. This way it can be deduced that User A and User C both interact with User B even though the exact details of User B’s address are not shared.

Fig. 7 is an illustration of how User A and User B would both report the same, anonymized identifier for User C.

Range restrictions:

The maximum Bluetooth range (for v5.0) is around 120m. For users concerned about being associated with a remote device, privacy can be increased by limiting the reported devices to those within a restricted range. This is controlled by verifying that the RSSI value measured from the remote beacon is higher than a certain threshold corresponding to a privacy level setting chosen by the user.

Fig. 8 illustrates how User B’s privacy settings eliminate User A from the devices reported for analysis since it is outside of the predefined range.

Over time, the reports received by the server, in any of the described embodiments, will allow computing a model of how the user interacts with other peers and devices. It will also allow the system to learn about random encounters versus repeated ones, and about devices that are part of the home scenario versus the work scenario. In an extension, it can also be used to anonymously map circles of users to each other if they are all using the same platform. In contrast to other commercial and ad focused services, this is all done while preserving the privacy of both the user and the detected peers.
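A shared-key variant could look like the sketch below. Here a keyed hash (HMAC-SHA256) stands in for the "shared key or hash" of the text, which is a substitution chosen for this example; how the key is distributed to the APP instances is not covered, and the names shared_identifier and common_contacts are hypothetical.

```python
import hashlib
import hmac


def shared_identifier(address, shared_key):
    """Same output for the same device regardless of who reports it.

    The first 3 bytes (manufacturer) are preserved; the device-specific
    part is replaced by a truncated keyed hash of the whole address.
    """
    raw = address.replace(":", "").upper()
    digest = hmac.new(shared_key, raw.encode("utf-8"), hashlib.sha256).hexdigest()
    return raw[:6] + digest[: len(raw) - 6]


def common_contacts(reports):
    """reports: {reporter: set of anonymized identifiers}.
    Returns {identifier: set of reporters} for identifiers seen by more than one."""
    seen = {}
    for reporter, identifiers in reports.items():
        for identifier in identifiers:
            seen.setdefault(identifier, set()).add(reporter)
    return {ident: who for ident, who in seen.items() if len(who) > 1}
```

With this sketch the backend can deduce that two reporters encountered the same device, yet it never receives the device-specific part of the address.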
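The range restriction described above can be sketched as a simple filter on scan results. The mapping from privacy level to RSSI threshold is purely hypothetical (the dBm values are placeholders, not calibrated distances), as are the names PRIVACY_THRESHOLDS_DBM and filter_by_privacy_level.

```python
# Hypothetical mapping from the user's privacy level to a minimum RSSI in dBm.
# A higher (less negative) threshold keeps only closer devices.
PRIVACY_THRESHOLDS_DBM = {"low": -90, "medium": -75, "high": -60}


def filter_by_privacy_level(scan_results, privacy_level):
    """Keep only devices whose measured RSSI is higher than the threshold.

    scan_results: list of (anonymized_identifier, rssi_dbm) pairs.
    """
    threshold = PRIVACY_THRESHOLDS_DBM[privacy_level]
    return [(ident, rssi) for ident, rssi in scan_results if rssi > threshold]
```

Devices filtered out this way would simply never be reported for analysis, which corresponds to the situation illustrated in Fig. 8.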

The embodiments described above are to be understood as a few illustrative examples of the present invention. It will be understood by those skilled in the art that various modifications, combinations and changes may be made to the embodiments without departing from the scope of the present invention. In particular, different part solutions in the different embodiments can be combined in other configurations, where technically possible.

The scope of the present invention is defined in the following set of claims.
