Feature

GDPR-defined personal data can be hard to find—here's where to look

The General Data Protection Regulation (GDPR) puts the onus on organizations to better manage and protect personal data. But do they know where to find it? We list the areas most likely to be overlooked.

Doug Drinkwater May 01st 2018

The EU’s General Data Protection Regulation (GDPR) is a big change from how many firms have approached data protection in the past, from how responsive their security teams need to be to how clearly and quickly they can tell where personal data resides. It’s on the issue of personal data that companies are starting to sweat the most.

With the May 25 deadline looming, it’s quite likely organizations still hold copious amounts of personally identifiable information (PII) — anything from cookie data to device identifiers to IP addresses — across disparate systems located on-premises and in the cloud. That’s before you get into the murky world of identifying whether your business is a data controller or processor.

What is PII and how can it be used?

Under GDPR, the definition of processing personal data is broader than under previous local data protection legislation. Article 2 of the GDPR states that the regulation applies to “the processing of personal data wholly or partly by automated means and to the processing other than by automated means of personal data which form part of a filing system or are intended to form part of a filing system.”

So, how do you define personal data?

Under article 4, personal data means “any information relating to an identified or identifiable natural person (data subject); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”

This can include IP addresses and cookie data. With GDPR introducing newer concepts like subject access requests (SARs), the right to be forgotten/right to deletion, and data portability, EU citizens now have a right to know what data is collected on them. That’s a concern for businesses when PII can be everywhere from email and social platforms to HR, HCM, and CRM systems. (For a deeper dive on PII, see “What is personally identifiable information (PII)? How to protect it under GDPR.”)

First step is a scoping exercise

The lack of awareness around where data resides has been troubling for organizations large and small. Take UK pub chain Wetherspoons, for example, which apparently deleted its 500,000-plus email marketing database and started again, presumably under the belief that it couldn’t readily get renewed consent, nor properly manage and protect that personal data.

“We felt, on balance, that we would rather not hold even email addresses for customers. The less customer information we have, which now is almost none, then the less risk associated with data,” the firm said in a statement to Wired at the time.

Nick Ioannou, head of IT at UK-based architecture firm Ratcliffe Groves, says organizations need to first identify if they are a data processor or controller, as well as what data they already hold. “The first step is to identify who has access to the PII data and whether they are a controller or processor. This is also tied in to where your data is, for instance a cloud-based email system. Next is looking at the risks and security around the data, together with identifying any automated processing. Understanding the laws that affect your business that override GDPR is also important to meet your GDPR obligations correctly.”

The steps to finding unexpected PII

GDPR lists the six lawful reasons for processing personal data: consent, contract, legal obligation, vital interests, public task, and legitimate interests. The reason this is important is that once you have identified the PII you have and where it is, you need to identify the lawful basis for having it or change your processes, so you stop asking for PII you do not need.
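One way to keep track of this is a simple register that pairs each data store with the lawful basis claimed for it. The sketch below is illustrative only: the `LawfulBasis` enum mirrors the six bases named above, while `DataHolding`, `missing_basis`, and the sample entries are hypothetical names invented here, not part of any GDPR tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class LawfulBasis(Enum):
    """The six lawful bases for processing named in the article."""
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal obligation"
    VITAL_INTERESTS = "vital interests"
    PUBLIC_TASK = "public task"
    LEGITIMATE_INTERESTS = "legitimate interests"


@dataclass
class DataHolding:
    """One entry in a hypothetical personal-data register."""
    system: str                   # where the data lives
    data_type: str                # what kind of personal data it is
    basis: Optional[LawfulBasis]  # None means no basis has been identified yet


def missing_basis(register: List[DataHolding]) -> List[DataHolding]:
    """Return the holdings that have no lawful basis recorded."""
    return [h for h in register if h.basis is None]


# Sample entries, invented for illustration.
register = [
    DataHolding("payroll provider", "bank details", LawfulBasis.CONTRACT),
    DataHolding("marketing list", "email addresses", None),
]

for holding in missing_basis(register):
    print(f"Review or stop collecting: {holding.data_type} in {holding.system}")
```

Anything the register flags either needs a documented basis or is a candidate for the process change described above: stop collecting it.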

First, how do you find it? Here are just a few examples of where PII might reside outside core operational systems:

  • Cloud apps, including those not approved by the organization
  • Online file-sharing services
  • Removable media
  • Physical storage (file cabinets)
  • Third-party/supply chain providers
  • Temporary files
  • Sandbox/test systems
  • Backup systems
  • Employee devices

Ioannou says: “General Data Protection Regulation is really (analog and digital) data protection regulation, so the first thing to do is take a step back and look everywhere stuff is written down, printed, scanned or created, and stored as digital content. Shadow IT could contain lots of personal data that may not be expected to be there, as well as removable USB memory sticks and drives, as well as backups.”

Stewart says you need to be looking everywhere, quite literally. “Well everywhere…filing cabinets, third-party storage, file servers. The first step is to be clear what personal data is — information classification is a prerequisite so that you know personal data when you see it. I’ve heard of a number of organizations that effectively had to start their search for personal data again because they hadn’t formalized what they were looking for.”
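Formalizing what you are looking for can be as simple as writing the classification down as named patterns before any search begins. The following is a minimal sketch under that assumption: the `PATTERNS` table and `find_identifiers` function are hypothetical, and the two regular expressions are deliberately crude examples rather than a complete definition of personal data.

```python
import re
from typing import Dict, List

# Hypothetical classification: each label maps to a pattern for one kind of identifier.
# These patterns are intentionally simple and will miss or over-match real-world data.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_identifiers(text: str) -> Dict[str, List[str]]:
    """Return the matches for each pattern label that appears in the text."""
    hits: Dict[str, List[str]] = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


# Sample text, invented for illustration.
sample = "Contact jane.doe@example.com, last login from 192.0.2.10."
print(find_identifiers(sample))
```

Pattern matching only catches well-structured identifiers. Names, free-text notes, and paper records in filing cabinets need other approaches, so a scan like this is a starting point for discovery, not a substitute for it.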

Perhaps with the Cambridge Analytica scandal in mind, he suggests the supply chain will soon feel the effects too: “Supply chain is definitely an important place to look. I would also suggest backup and archive resources. Also bear in mind that GDPR is landing in the middle of the largest migration of human knowledge in history.”

Stewart is referring to the rapid move from onsite storage to the cloud. “While this isn’t a bad thing per se, the drivers are usually reduced storage costs or move before a disk runs out of space. Therefore, most organizations are doing lift and shift wholesale migrations of content that they don’t fully understand. There will be all sorts of sensitive personal data that has been moved to cloud without realizing it.”

Nic Miller, now a consulting virtual CISO but previously CISO at European hedge fund management company Brevan Howard, also sees too much real data being used in test systems. He believes unstructured data is going to be the blind spot for many organizations. “Shared folders, scratch/temp drives etc., and there’s no simple way to search for personal data...remembering that personal data is a wider net than the better understood PII.”

“A lot of companies will use third-party services for a number of staff services, payroll, pensions, insurance etc. All these companies will hold large amounts of sensitive data on the majority of your staff,” says Miller. “Don’t just look at this through a due diligence lens, though. Consider how that information is being shared. If it’s being emailed backwards and forwards with encrypted attachments that’s both an accident waiting to happen when it is emailed to an incorrect address, and it’s causing you greater problems with the proliferation of this data internally through email archiving, etc.”

How do organizations move forward?

Miller adds that process is important: “GDPR talks about implementing ‘appropriate technical and organizational measures to ensure a level of security appropriate to the risk’ so in order to prove we have the appropriate measures, our IT infrastructure and processes need to be documented and the risks assessed. Shadow IT, retention, rights, sharing and access control need to be looked at, together with just about every business process and where the GDPR obligations impact these processes.”

“Two key activities should be priorities,” stresses Stewart. “First, put in place a process to manage project risk and implement ‘secure by design’. Second, define personal data, run a discovery process to find it in BAU and then perform a high-level risk assessment using a triage process of a core of key controls that you think deliver 80 percent of the safety needed for personal data. For example, access control reviews, logging and monitoring, vulnerability management etc. might be part of a ‘top ten’.”
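The triage Stewart describes could be approximated as a checklist of key controls checked against each system that holds personal data. The sketch below is an illustration built on assumptions: the three controls come from his examples, while `KEY_CONTROLS`, `triage`, and the sample systems are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Controls taken from the examples in the quote; a real 'top ten' would be longer.
KEY_CONTROLS = [
    "access control review",
    "logging and monitoring",
    "vulnerability management",
]


def triage(systems: Dict[str, Set[str]]) -> List[Tuple[str, List[str]]]:
    """Rank systems by how many key controls they lack, worst first."""
    gaps = []
    for name, controls_in_place in systems.items():
        missing = [c for c in KEY_CONTROLS if c not in controls_in_place]
        gaps.append((name, missing))
    # Systems with the most missing controls come first.
    return sorted(gaps, key=lambda item: len(item[1]), reverse=True)


# Sample systems, invented for illustration.
systems = {
    "HR database": set(KEY_CONTROLS),
    "test sandbox": {"vulnerability management"},
    "shared drive": set(),
}

for name, missing in triage(systems):
    print(f"{name}: {len(missing)} of {len(KEY_CONTROLS)} key controls missing")
```

Counting missing controls is a crude proxy for risk. It is meant only to show how a short, fixed list of controls can give a first ordering of where to look, before any detailed assessment.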

Stewart goes on to say that while encryption and pseudonymization technologies are “great...they are fairly high fruit on the tree.” Whitfield agrees, adding that encryption is good for protecting data but, if “not used correctly can leave you with a false sense of protection.”

“Many organizations are still struggling to do the basics,” says Stewart, citing an example of one international bank which doesn’t know “how many servers they have and what they’re being used for. I find it hard to believe claims that they’re ‘on top of’ GDPR compliance. I’d suggest a landscape view of personal data and risk should be a priority over any particular technical control. The ICO has very much signaled a risk-based approach. To do that credibly you need the landscape view.”

Nik Whitfield, who also cautions about encryption, warns against being sucked in by vendor solutions: “Be wary of letting technology vendors guide your strategy. Products that offer ‘GDPR compliance’ are only giving you solutions to very specific parts of the problem.”