The General Data Protection Regulation (GDPR) has made it necessary to protect user privacy. According to the GDPR, you should remove any personally identifiable information before transferring user data to any US-owned tool. This step became necessary due to Privacy Shield invalidation.
In this article, I will describe how to automatically remove user data using the Stape Anonymizer power-up and how to manually redact user data via web and server GTM. This is an extension of the article published on our blog, which covers why you need a proxy server to use Google Analytics in a GDPR-compliant way.
In several EU countries (Italy, France, Austria, and Denmark), people contacted local data protection authorities to check whether using Google Analytics on a website complies with GDPR. The answer in all cases was that using Google Analytics is not GDPR-compliant.
The main reason is that US companies (including Google) do not provide enough security measures to protect the personal data of EU users. That is why sharing PII with US companies violates GDPR. You can find more information about it in our earlier blog post.
The good news is that there is a way to use Google Analytics and still be GDPR-compliant. CNIL (the French data protection authority) said that to use GA in a GDPR-compliant way, you should implement two main things: an EU proxy server and pseudonymization of user data before the export.
A proxy server ensures no direct contact between the website and the US analytics tool. The easiest way to implement such a proxy server is by using a server Google Tag Manager (sGTM) container. Proxy servers must meet a range of criteria. The main ones: the company that provides the proxy server must be registered in the EU, and the servers that host your sGTM container must be physically located in the EU. For these two reasons, you can’t use Google Cloud (GCP) for sGTM. It’s the same problem as with Google Analytics: Google, a US company, owns it.
More good news is that Stape has got you covered. We have a specific product, Stape Europe, that meets all requirements for the EU proxy server. Stape Europe is registered in the EU (Estonia) and uses the EU cloud servers provided by Scaleway to run your sGTM container.
In this article, I want to focus more on the second part of the law, which is the pseudonymization of user data. At Stape, we are implementing a list of features that will help you remove user data automatically. That is why I will divide the article into two parts: removing user data automatically with the Stape Anonymizer power-up, and redacting user data manually via web and server GTM.
The list of user data that should be pseudonymized is quite vague.
For now, we're designing the Stape Anonymizer power-up only for GA4. However, it will be adapted and made available with UA's anonymization feature in future updates.
It’s essential to understand that the list of parameters that GA4 sends can change. We will keep this article updated, but ensure you test user data anonymization before publishing it to production.
The best tool I’ve found that helps keep track and identify GA4 parameters is this one.
The process of user data pseudonymization takes place inside the GA4 tags in the web and server GTM container. If you have not set up server GA4 yet, follow these steps.
We do not have strict guidelines on what data must be removed. It’s up to you how you want your company to be secure. For example, you can remove the user’s IP or redact the last few digits. Another big question is about parameters like country, language, browser, etc. Each parameter individually does not give enough user identification information, but a set of parameters can provide it.
There is no question about whether you should remove parameters like client id or URL queries. Each of these parameters on its own can lead to user identification because it can carry a unique ID that Google recognizes.
Let’s say it may be essential for you to analyze mobile vs. desktop traffic or conversions in different browsers. Should you remove all data that can be used for fingerprinting and user identification or remove only some? Can you leave the browser and device if you remove all other parameters?
Ensure you discuss these questions with your lawyers or DPO so that you are well protected if the regulator comes to you. I believe that removing all user identifiers that can be used for fingerprinting and re-identification is the better way to keep your company secure.
This article does not pretend to be an instruction. It just shares our experience with removing or pseudonymizing data and shows how Stape does it automatically. You can choose not to use our anonymization power-up and manually anonymize each parameter instead.
We’ve recently released an Anonymizer power-up. It’s available for all Stape users. The main goal of the anonymizer is to either remove or anonymize user data in Google Analytics 4 and Universal Analytics.
To enable the Anonymizer, open the sGTM container in Stape, click Power-ups, and open Anonymizer.
This product includes GeoLite2 data created by MaxMind, available from https://www.maxmind.com
You will have to select which parameters to leave as is, remove, or anonymize. Once the parameters are configured, update the tagging server URL for Google Analytics 4 and Universal Analytics. If you previously used the tagging server URL https://sgtm.example.com, then with Anonymizer enabled the updated tagging server URL will be https://sgtm.example.com/anonymize. We proxy your requests to sGTM through the /anonymize path and remove the specified data.
When GA requests go through the tagging server URL that includes /anonymize, we automatically remove or anonymize selected parameters.
After enabling and configuring Anonymizer, ensure you've changed the GA4/UA transport URL in the Web GTM config tag to the one that ends with /anonymize.
Below is a list of all parameters that Anonymizer can either remove or anonymize. When creating Anonymizer, our goal was to give our clients the ability to remove every parameter that could be considered personal user data. You can select which parameters you want to remove. Talk with your DPO or lawyers to specify which parameters need to be removed.
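As a quick illustration of the URL change described above, here is a minimal JavaScript sketch that derives the anonymizer URL from an existing tagging server URL. The helper name `toAnonymizerUrl` is hypothetical and only exists for this example; the only thing taken from the Anonymizer itself is the /anonymize path.

```javascript
// Hypothetical helper (not part of any Stape or Google API):
// appends the /anonymize path to a tagging server URL.
function toAnonymizerUrl(taggingServerUrl) {
  const url = new URL(taggingServerUrl);
  // Strip trailing slashes so we never produce "//anonymize".
  const basePath = url.pathname.replace(/\/+$/, '');
  // Leave the path alone if it already ends with /anonymize.
  url.pathname = basePath.endsWith('/anonymize')
    ? basePath
    : `${basePath}/anonymize`;
  return url.toString();
}

console.log(toAnonymizerUrl('https://sgtm.example.com'));
// https://sgtm.example.com/anonymize
```

The resulting value is what you would paste into the transport URL field of the config tag.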
You will have two options for most parameters: leave as is or remove. For two parameters (IP and Client ID), you will see options to Anonymize and Anonymize Strictly.
IP
Anonymize - removes the last octet.
Anonymize Strictly - removes the last two octets
Client ID. Works only if you use JavaScript Managed client identification.
Anonymize - uses a hash of IP+UserAgent and adds year+month.
Anonymize Strictly - uses a hash of IP+UserAgent and adds a timestamp: crc32_hash(IP+UA).timestamp
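To make the IP and Client ID modes more concrete, here is a minimal JavaScript sketch of what they could look like. It is not Stape's actual code: the function names, replacing octets with 0 instead of dropping them, using CRC-32 for both Client ID modes, and the exact `hash.suffix` format are my assumptions based on the descriptions above.

```javascript
// Sketch only: names and output formats are assumptions, not Stape's code.

// Replace the last `octets` octets of an IPv4 address with 0.
function anonymizeIp(ip, octets = 1) {
  const parts = ip.split('.');
  if (parts.length !== 4) return ''; // not IPv4: drop the value entirely
  for (let i = 4 - octets; i < 4; i++) parts[i] = '0';
  return parts.join('.');
}

// Standard bitwise CRC-32 (reflected polynomial 0xEDB88320).
// Each char code is treated as one byte, which is fine for ASCII input.
function crc32(text) {
  let crc = 0xffffffff;
  for (let i = 0; i < text.length; i++) {
    crc ^= text.charCodeAt(i) & 0xff;
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Build a pseudonymous client ID from IP + User-Agent.
// strict = false: hash plus year+month (stable within a calendar month).
// strict = true:  hash plus a Unix timestamp in seconds.
function anonymizeClientId(ip, userAgent, now = new Date(), strict = false) {
  const hash = crc32(ip + userAgent);
  if (strict) return `${hash}.${Math.floor(now.getTime() / 1000)}`;
  const month = String(now.getUTCMonth() + 1).padStart(2, '0');
  return `${hash}.${now.getUTCFullYear()}${month}`;
}

console.log(anonymizeIp('203.0.113.42'));    // 203.0.113.0
console.log(anonymizeIp('203.0.113.42', 2)); // 203.0.0.0
```

The design trade-off is visible in the suffix: with year+month the same visitor keeps one ID for the month, while the timestamp variant changes far more often and therefore breaks more of GA's returning-visitor logic.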
Parameter name | Description | GA4 Parameter | Anonymize |
IP | User IP | IP Address | Anonymize - removes the last octet. Anonymize Strictly - removes the last two octets |
Client ID | Google Analytics Client ID, _ga, _ga_*, FPLC, FPID cookies | cid, _ga, _ga_*, FPLC, FPID | Anonymize - use a hash of IP+UserAgent and add year+month. Anonymize Strictly - use a hash of IP+UserAgent and add a timestamp, crc32_hash(IP+UA).timestamp |
User ID | User ID, Google Developer ID, Firebase ID | uid, gdid, _fid | - |
Session ID | Session ID, New Session ID | sid, _nsi | - |
Query parameters | Remove query parameters from Document Location | dl | - |
Referer | Document Referrer Header, Document Referrer Parameter | referer header, dr | - |
User Agent | Document User-Agent header, Sec-Ch-Ua header, Sec-Ch-Ua-Platform header, Sec-Ch-Ua-Mobile header, User-Agent Parameter | user-agent header, sec-ch-ua header, sec-ch-ua-platform header, sec-ch-ua-mobile header, ua | - |
User Country | Geographical ID, Current country for the user | geoid, _uc | - |
Browser plugins | Java Enabled, Flash Version | je, fl | - |
Screen Info | Browser screen resolution, Viewport size | sr, vp | - |
Screen Colors | Specifies the screen color depth | sd | - |
User Language | Browser active locale | ul | - |
User Agent Architecture | | uaa | - |
User Agent Bitness | | uab | - |
User Agent Full Version List | | uafvl | - |
User Agent Mobile | | uamb | - |
User Agent Model | | uam | - |
User Agent Platform | | uap | - |
User Agent Platform Version | | uapv | - |
User Agent WOW64 | | uaw | - |
Campaign Medium | | cm | - |
Campaign Source | | cs | - |
Campaign Name | | cn | - |
Campaign Content | | cc | - |
Campaign ID | | ci | - |
Campaign Term | | ck | - |
Campaign Creative Format | | ccf | - |
Campaign Marketing Tactic | | cmt | - |
Google Ads ID | | gclid | - |
Google Display Ads ID | | dclid | - |
Parameters that Google Analytics 4 collects change from time to time. So you need to check your GA4 requests to ensure all user data is removed.
After you have configured the parameters in Anonymizer and changed the GA4 transport URL to the one that ends with /anonymize, we will remove or anonymize the specified parameters.
After enabling Anonymizer and updating GA4 transport URL, please use web/sGTM debuggers, console, and GA4 debugger to test if all required parameters were removed.
Removing the user IP is relatively easy to implement but has some controversy. Google has a built-in feature to remove the last byte of the IP address. With the last byte cut, the chance that Google can identify a user from the IP alone is 1 in 256. In combination with other parameters, however, the IP can quickly identify a specific person.
Some people think that cutting the last octet is enough. Others believe that you need to remove user IP altogether. My opinion is that it’s better to override the user IP completely. You never know if/how Google reuses IP.
To remove the user IP, I’ve used the server GA4 tag and set ip_override to a random IP.
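For illustration, a random IPv4 value for the `ip_override` field could be generated along these lines. This is plain JavaScript rather than GTM sandboxed template code, and the function name `randomIpv4` is hypothetical. Keep in mind that a fully random address also makes any IP-based geolocation in your reports random.

```javascript
// Hypothetical helper: returns a random dotted-quad IPv4 string.
// It does not avoid reserved or private ranges.
function randomIpv4() {
  const octet = () => Math.floor(Math.random() * 256); // integer in 0..255
  return [octet(), octet(), octet(), octet()].join('.');
}

console.log(randomIpv4());
```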
Google assigns a unique client ID to each browser-device pair and uses it to recognize when the same user revisits your site. This parameter must be removed or pseudonymized before sending data to GA4.
There are numerous approaches to anonymizing client IDs; it is all up to your imagination and the suite of tools you use. But make sure that the client id is unique and that you add a time-varying component.
You can use a hash of user agent, IP, GTM random number variable, etc. Unlike User IP, we did not find a way to redact the client id on the server side, so we did it on the client.
Once you've anonymized the Google Analytics Client ID, you may want to override GA4 cookies with the new values to ensure that GA4 does not set any user identifiers. To do so, I've used the Cookie Monster tag template for the server GTM container. All you need to do is add cookie names and values. Once done, do not forget to use the console and check the cookies GA sets.
Redacting the client id will significantly impact GA4 reporting. Since the client id will be unique, GA won’t be able to determine new vs. returning visitors. Multi-channel attribution and events like session start and first visit will be affected as well.
An external referrer is designed to determine how a user landed on your site: was it organic, paid, or maybe social traffic?
To remove, you should rewrite page_referrer.
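What exactly you write into page_referrer is a judgment call. As a minimal sketch, and purely as my assumption of one possible middle ground, the function below either drops the referrer completely or keeps only its origin, so the traffic source stays visible while the path and query string are gone. The name `redactReferrer` is hypothetical.

```javascript
// Hypothetical helper: returns '' to drop the referrer,
// or just its origin (scheme + host) when keepOrigin is true.
function redactReferrer(referrer, keepOrigin = true) {
  if (!referrer || !keepOrigin) return '';
  try {
    return new URL(referrer).origin + '/';
  } catch (err) {
    return ''; // unparsable referrer: safest to drop it
  }
}

console.log(redactReferrer('https://www.google.com/search?q=shoes'));
// https://www.google.com/
```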
The primary purpose of parameters in the URL is to determine the origin of advertising campaigns. URL parameters can be utm_source, utm_medium, different click ID types, etc. Besides that, some platforms automatically insert user data into the URL.
To remove URL parameters, you must rewrite the page URL. Several variables in the web GTM template gallery can help you with this. I’ve used Trim Query. You just need to specify a blocklist or allowlist of query parameters, which will do all the magic for you.
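The allowlist idea can be sketched in a few lines of plain JavaScript. This is not the code of the Trim Query template; it is an independent illustration, and the function name `stripQueryParams` and the example URL are hypothetical.

```javascript
// Hypothetical helper: keeps only allowlisted query parameters in a URL.
function stripQueryParams(pageUrl, allowlist = []) {
  const url = new URL(pageUrl);
  const allowed = new Set(allowlist);
  // Copy the keys first: deleting while iterating searchParams skips entries.
  for (const key of [...url.searchParams.keys()]) {
    if (!allowed.has(key)) url.searchParams.delete(key);
  }
  return url.toString();
}

console.log(
  stripQueryParams(
    'https://example.com/page?utm_source=news&email=a%40b.com&gclid=123',
    ['utm_source']
  )
);
// https://example.com/page?utm_source=news
```

An allowlist is usually the safer default here: a parameter you did not anticipate is removed rather than passed through.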
Information that can be used for fingerprinting includes user agent, device, browser, screen resolution, language, operating system, etc. Make sure you’ve redacted all of it.
Ensure you do not use cross-site identifiers like a user or CRM ID.
This part is a bit hard to understand, but I suggest checking the request that your sGTM container sends to GA and ensuring there are no parameters that can be used for user re-identification.
There are several ways to check if all necessary data was removed or pseudonymized. You first want to go to the server GTM debugger and see outgoing GA4 requests. Ensure that you test different scenarios when there are user parameters vs. no user parameters, URL parameters, various events, referrers, etc.
The second way is to use Google Analytics 4 debugger and see what data GA4 processes.
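When checking outgoing requests by hand gets tedious, a small script can flag parameters that slipped through. The sketch below takes a request URL copied from the debugger and lists which of the query parameters named in the table above are still present. The function name, the parameter list selection, and the example URL are my assumptions; it only inspects query parameters, not headers or the request body.

```javascript
// Query parameter names taken from the table in this article; extend as needed.
const IDENTIFYING_PARAMS = [
  'cid', 'uid', 'gdid', '_fid', 'sid', '_nsi', 'dr', 'ua',
  'geoid', '_uc', 'je', 'fl', 'sr', 'vp', 'sd', 'ul', 'gclid', 'dclid',
];

// Hypothetical helper: returns the blocked parameters that are still
// present with a non-empty value in the given request URL.
function findLeftoverParams(requestUrl, blocked = IDENTIFYING_PARAMS) {
  const params = new URL(requestUrl).searchParams;
  return blocked.filter((name) => params.has(name) && params.get(name) !== '');
}

const leftovers = findLeftoverParams(
  'https://example.com/g/collect?v=2&cid=123.456&sr=1920x1080&en=page_view'
);
console.log(leftovers); // [ 'cid', 'sr' ]
```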
It's not just Google that collects EU user data and transfers it to the US, which violates GDPR. Multiple companies have collected personal data from Europeans for years, and now it seems their practices will be restricted across the board in response to the Privacy Shield deactivation and ruling that data transfer of EU users to the US is illegal under GDPR.
If you are a website owner in the European Union, it’s time to start changing what data you share with US companies, or you may be at risk of being fined by regulatory enforcement.
1. How can I use proxy-server for GA when implemented through gtag.js?
If you use gtag.js on your website to send events to your server container, you can add the transport_url parameter to your existing tag:
gtag('config', 'TARGET-ID', {
  'transport_url': 'https://analytics.example.com',
  'first_party_collection': true,
});
You can also use an anonymizer URL to anonymize user data in GA when it is implemented via gtag.js. Let's say you use Stape Anonymizer and your anonymizer URL is https://sgtm.site.com/anonymize. You just need to set https://sgtm.site.com/anonymize as the transport URL in the gtag config.
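Put together, the config call with the anonymizer URL might look like the sketch below. 'TARGET-ID' is the same placeholder as above, and the first lines are the usual gtag bootstrap, written with globalThis so the snippet also runs outside a browser for testing; on a real page the Google tag snippet already defines gtag for you.

```javascript
// Usual gtag bootstrap (normally provided by the Google tag snippet).
globalThis.dataLayer = globalThis.dataLayer || [];
function gtag() {
  dataLayer.push(arguments);
}

// Route hits through the anonymizer path instead of the plain sGTM URL.
gtag('config', 'TARGET-ID', {
  transport_url: 'https://sgtm.site.com/anonymize',
  first_party_collection: true,
});
```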