
Personal Data in the cloud

Update 2020-08-07:
Lawyers and privacy professionals on LinkedIn are discussing these topics at the moment. There are many good points here, here, here and here, among many others. Some of them hint at a broad definition of “transfer abroad” (no pun intended). Storing documents in Microsoft 365 and sending and receiving emails, among other things, can be interpreted as a “transfer abroad” even if the data is stored in European datacenters. I have also seen people recommending against using U.S. cloud providers altogether. Looking forward to seeing how this plays out in the near future.

TL;DR - EARN IT and LAED are bad news. Some Azure AD data is stored in the U.S. Encryption is probably not the complete answer.

So I wanted to write this post because lately there has been much talk and writing about personal data and “the cloud”. I just need to sort my thoughts, and try to make sense of this.

There are several challenges in this domain, but I will not dive into sensitive, confidential or otherwise classified information in this post. I just want this to be about personal data in the cloud (mostly Azure because this is what I know), GDPR, and some thoughts on legislation.

Does storing and processing personal data in European cloud datacenters provide sufficient protection? My initial interpretation was that the data needed to be geographically located in the U.S. for it to be covered by their legislation. I now realise that the issue might be more nuanced.

At the moment, there are valid concerns about proposed legislation LAED and EARN IT in the U.S.:

…the bill (LAED) is an actual, overt, make-no-mistake, crystal-clear ban on providers from offering end-to-end encryption in online services, from offering encrypted devices that cannot be unlocked for law enforcement, and indeed from offering any encryption that does not build in a means of decrypting data for law enforcement. - source

…but this does not only relate to cloud providers. It is just bad news for privacy in general. Spread the word on LAED!

Collaboration tools

When using collaboration tools like email, lists, chats, etc., we are most likely storing personal data in the cloud. Basic definition of personal data:

Personal data is any information that relates to an identified or identifiable living individual. - Source

We can safely assume that our mailboxes, chats, online file storage, web sites, etc. contain such information. Emails can contain references to other individuals, phone numbers, and addresses. Maybe some users are using their corporate email account for private correspondence? Cloud file storage can contain private pictures, pictures of IDs, passports, and similar documents. Are we sufficiently protected when this is stored in Europe, even though non-European cloud providers are managing the datacenters?

If you were relying solely on Privacy Shield to protect you while storing personal data overseas, it has now been invalidated by the EU Court of Justice. SCCs (Standard Contractual Clauses), if you have them, are not yet invalidated.

Most of the articles I have read about this claim that data must be stored in, transferred to, or processed in the U.S. for this to be a pressing concern. If this is the case, cloud data stored in a European region should be sufficiently protected by GDPR and other EU regulations.

I can’t be sure how this affects personal data located inside the EU, in datacenters controlled by the cloud provider. Leave a comment if you know and can document it.

Azure AD

For Azure (which is the cloud I know), this can be an issue in Azure AD. Azure AD is the identity component in Microsoft 365.

For customers who provided an address in Europe, Azure AD keeps most of the identity data within European datacenters.

Notable exceptions (stored in U.S. regions):

  • Azure AD B2B (which may include email address, among other things).
  • Multi-Factor Authentication (which most likely contains phone numbers, IP addresses, email addresses and/or mobile device information)
  • What MS calls “other considerations”:
    • “Services and applications that integrate with Azure AD have access to identity data.”
    • Like Facebook, SuperOffice, or any other third party application that requests consent.

The “other considerations” point in this list may be a hidden risk. Many tenants allow regular users to grant access to third party applications, which may or may not store, process or transfer identity data from your Azure AD.
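One way to get a feel for this risk is to list which applications individual users have consented to. The snippet below is a minimal sketch, assuming the consent grants have already been exported as dictionaries shaped like Microsoft Graph oauth2PermissionGrant objects (fields clientId, consentType, scope). The sample data, IDs and the function name are made up for illustration and are not taken from any real tenant.

```python
from collections import defaultdict

# Hypothetical sample data, shaped like Microsoft Graph
# oauth2PermissionGrant objects. All IDs and scopes are invented.
SAMPLE_GRANTS = [
    {"clientId": "app-1", "consentType": "AllPrincipals",
     "principalId": None, "scope": "User.Read"},
    {"clientId": "app-2", "consentType": "Principal",
     "principalId": "user-a", "scope": "User.Read Mail.Read"},
    {"clientId": "app-2", "consentType": "Principal",
     "principalId": "user-b", "scope": " User.Read Contacts.Read"},
]


def user_consented_scopes(grants):
    """Map each app (clientId) to the scopes granted by individual users."""
    result = defaultdict(set)
    for grant in grants:
        # "Principal" is consent given by a single user;
        # "AllPrincipals" is admin consent on behalf of the whole tenant.
        if grant.get("consentType") == "Principal":
            # The scope field is a space-separated string.
            result[grant["clientId"]].update(grant.get("scope", "").split())
    return {app: sorted(scopes) for app, scopes in result.items()}


if __name__ == "__main__":
    for app, scopes in user_consented_scopes(SAMPLE_GRANTS).items():
        print(app, scopes)
```

Apps that show up here with scopes such as Mail.Read or Contacts.Read are the ones worth a closer look, since they can read personal data that users approved without an administrator being involved.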

Some Azure services can also have data stored outside the preferred region. See this page for details. Look under “Additional information” or in the appendix below.

I am guessing this also applies to other cloud providers in some form.

Encryption

Is encrypting your data enough? How much benefit will you get from the cloud if all data must be encrypted before upload? Encrypting before upload means that the cloud provider has access to neither the raw data nor the decryption key. For most workloads this is most likely impossible, or at least very difficult, to achieve.

When using encryption with any cloud provider, the key is most likely stored in some sort of hosted secure storage resource. This opens up the possibility of the cloud provider being forced to hand over the encryption key to U.S. authorities if they request access. I am not saying this is guaranteed to happen, but it is a theoretical possibility.

If data is going to be processed in the cloud, it needs to be decrypted in the cloud, and this means that the cloud provider must have access to the decryption key in some form.

Besides, if LAED is passed into law, we are in practice no longer protected by encryption. The act will invalidate cloud provider encryption, because providers will need to build backdoors or hand over a copy of the keys. We are back to manually encrypting data before uploading, and keeping the keys stored in our own datacenters. This does not mix well with SaaS and PaaS services.
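To make "encrypt before upload, keep the key at home" concrete, here is a minimal sketch using Fernet from the third-party Python cryptography package. It is an assumption on my part that such a library fits your setup, and the function names are hypothetical. The actual upload step is left out on purpose, since it depends on the storage service.

```python
from cryptography.fernet import Fernet


def encrypt_for_upload(plaintext: bytes, key: bytes) -> bytes:
    """Encrypt locally; only the returned token should leave the premises."""
    return Fernet(key).encrypt(plaintext)


def decrypt_after_download(token: bytes, key: bytes) -> bytes:
    """Decrypt locally with the key that never left our own datacenter."""
    return Fernet(key).decrypt(token)


if __name__ == "__main__":
    # Assumption: the key is generated and stored on-premises,
    # not in a cloud-hosted key store.
    key = Fernet.generate_key()
    blob = encrypt_for_upload(b"scan of a passport", key)
    # `blob` is what would be uploaded; the provider never sees the key.
    print(decrypt_after_download(blob, key))
```

The catch is exactly the one described above: the cloud only ever sees an opaque blob, so search, indexing, previews and any other server-side processing stop working.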

Wrapping up

I see now that storing personal data in the cloud can be a minefield, and I will be on the lookout for any new information about this. We will certainly face challenges because of personal data in the cloud, but I hope that this will mostly be relevant if you are using non-regional services or services which explicitly store and process data outside the EU/EEA.

There are examples of EU data being collected by the U.S. government (PRISM collected Skype data in 2009, before it was owned by MS), but this seems to be data collected in transit.

After investigating this, I find I am more curious than outright concerned. I am not saying cloud providers can’t be trusted with our data or our keys, I am just saying that there is an argument to be made here for theoretical access. As far as I can tell, none of the cloud providers have proven untrustworthy, and I am willing to give them the benefit of the doubt.

Comments are, as always, welcome. 

Disclaimer

Always assess and document the risks. I am not a lawyer, nor do I have any legal background, so these are just my opinions and thoughts. They should not be interpreted as legal advice, but treated more like a primer for further investigation.

Appendix

Some Azure geo exceptions

Microsoft will not store customer data outside the customer-specified Geo except for the following regional services:

  • Cloud Services, which back up web- and worker-role software deployment packages to the United States regardless of the deployment region.
  • Language Understanding may store active learning data in the United States, Europe, or Australia based on the authoring regions which the customer uses. See here for additional details.
  • Azure Machine Learning service may store free form text that the customer provides (e.g. names for workspaces, resource groups, experiments, files, and images) and experiment parameters in the United States.
  • Azure Sentinel, which generates new security data such as incidents, alert rules, bookmarks, etc., that themselves may contain customer data from the customer’s instances of Azure Log Analytics. Such security data generated by Azure Sentinel will be stored at rest in Europe (for security data generated from the customer’s Log Analytics workspaces located in Europe), Australia (for security data generated from the customer’s Log Analytics workspaces located in Australia), or in the United States (for security data generated from the customer’s Log Analytics workspaces located elsewhere).
  • Preview, beta, or other prerelease services, which typically store customer data in the United States but may store it globally.

Non-regional exceptions:

  • Content Delivery Network (CDN), which provides a global caching service and stores customer data at edge locations around the world.
  • Azure Active Directory, which may store Active Directory data globally. This does not apply to Active Directory deployments in the United States (where Active Directory data is stored solely in the United States) and in Europe (where Active Directory data is stored in Europe or the United States). See here for additional details.
  • Azure Multi-Factor Authentication, which stores authentication data in the United States. See here for additional details.
  • Azure Security Center, which may store a copy of security-related customer data, collected from or associated with a customer resource (e.g. virtual machine or Azure Active Directory tenant): (a) in the same Geo as that resource, except in those Geos where Microsoft has yet to deploy Azure Security Center, in which case a copy of such data will be stored in the United States; and (b) where Azure Security Center uses another Microsoft Online Service to process such data, it may store such data in accordance with the geolocation rules of that other Online Service.
  • Services that provide global routing functions and do not themselves process or store customer data. This includes Traffic Manager, which provides load balancing between different regions, and Azure DNS, which provides domain name services that route to different regions.
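The exceptions above can be turned into a simple checklist. This is only a sketch for a tenant whose chosen Geo is Europe: the locations are transcribed from the two lists above, the dictionary keys are informal labels, and treating every service not listed as "stays in Europe" is my own simplifying assumption.

```python
# Storage locations transcribed from the exception lists above,
# for a tenant whose chosen Geo is Europe. Keys are informal labels.
EXCEPTIONS_FOR_EUROPE = {
    "Cloud Services": {"United States"},
    "Azure Machine Learning": {"United States"},
    "Azure Multi-Factor Authentication": {"United States"},
    "Azure Active Directory": {"Europe", "United States"},
    "Content Delivery Network": {"Global"},
    "Preview services": {"United States", "Global"},
}


def services_leaving_europe(services_in_use):
    """Return the in-use services that may store data outside Europe."""
    flagged = {}
    for name in services_in_use:
        # Assumption: a service that is not listed keeps data in Europe.
        locations = EXCEPTIONS_FOR_EUROPE.get(name, {"Europe"})
        outside = locations - {"Europe"}
        if outside:
            flagged[name] = sorted(outside)
    return flagged


if __name__ == "__main__":
    in_use = ["Azure Active Directory", "Content Delivery Network",
              "Some regional service"]  # last entry is a placeholder
    print(services_leaving_europe(in_use))
```

A list like this goes stale quickly, so it is best treated as a starting point for a risk assessment and checked against the current Microsoft documentation.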

FISA and Section 702

FISA (the Foreign Intelligence Surveillance Act) was used as a legal basis for U.S. surveillance programs such as PRISM.

According to FISA Section 702, the U.S. government is allowed to collect and use information on non-U.S. persons located outside the United States for foreign intelligence purposes, and this data is, among other methods, collected via PRISM.

I assume Microsoft complies with at least some of the NSA requirements, but I am not 100% sure that this is the case. Anyway, to be on the safe side, we can assume that any data transferred to servers in the U.S. may have been collected by PRISM. We can also assume that they are working hard on trying to decrypt HTTPS communication. They certainly have the resources for it.

Interesting external sources