How to anonymize user data in Google Analytics 4

Author
Stape
Published
Sep 21, 2022

The General Data Protection Regulation (GDPR) has made it necessary to protect user privacy. According to the GDPR, you should remove any personally identifiable information before transferring user data to any US-owned tool. This step became necessary due to Privacy Shield invalidation.

In this article, I will describe how to automatically remove user data using the Stape Anonymizer power-up and how to manually redact user data via web and server GTM. This is an extension of the article published on our blog, which covers why you need a proxy server to use Google Analytics in a GDPR-compliant way.

Why should you remove PII from Google Analytics 4

There were a few cases in EU countries (Italy, France, Austria, and Denmark) where people asked local data protection authorities to verify whether using Google Analytics on a website complies with GDPR. The answer in every case was that using Google Analytics is not GDPR-compliant.

The main reason is that US companies (including Google) do not provide sufficient safeguards to protect the personal data of EU users. That is why sharing PII with US companies conflicts with GDPR. You can find more information about it in our earlier blog post.

The good news is there is a solution to use Google Analytics and still be GDPR-compliant. CNIL (French data protection authority) said that to use GA in a GDPR-compliant way, you should implement two main things: EU proxy-server and pseudonymization of user data before the export.

A proxy server ensures there is no direct contact between the website and the US analytics tool. The easiest way to implement such a proxy server is to use a server Google Tag Manager (sGTM) container. The proxy server must meet a range of criteria. The main ones: the company that provides the proxy server must be registered in the EU, and the servers that host your sGTM container must be physically located in the EU. For these two reasons, you can’t use Google Cloud (GCP) for sGTM. It’s the same issue as with Google Analytics: Google, a US company, owns it.

More good news: Stape has you covered. We have a dedicated product, Stape Europe, that meets all the requirements for an EU proxy server. Stape Europe is registered in the EU (Estonia) and uses EU cloud servers provided by Scaleway to run your sGTM container.

In this article, I want to focus on the second requirement: the pseudonymization of user data. At Stape, we are implementing a set of features that will help you remove user data automatically. That is why I will divide the article into two parts:

  • How to automatically remove or pseudonymize user data using the Stape Anonymizer power-up.
  • How to manually remove user data using web and server GTM.

The list of user data that should be pseudonymized is quite vague. 

  • IP address.
  • User identifiers (like the Google client ID).
  • External referrer.
  • URL parameter.
  • Any data that can be used for fingerprinting.
  • Cross-site identifier.
  • Any data that could be used for user identification.

For now, we're designing the Stape Anonymizer power-up only for GA4. However, it will be adapted and made available with UA's anonymization feature in future updates.

It’s essential to understand that the list of parameters that GA4 sends can change. We will keep this article updated, but ensure you test user data anonymization before publishing it to production. 

The best tool I’ve found that helps keep track and identify GA4 parameters is this one

How to approach user data anonymization

The process of user data pseudonymization takes place inside the GA4 tags in the web and server GTM container. If you have not set up server GA4 yet, follow these steps.

We do not have strict guidelines on what data must be removed. It’s up to you how you want your company to be secure. For example, you can remove the user’s IP or redact the last few digits. Another big question is about parameters like country, language, browser, etc. Each parameter individually does not give enough user identification information, but a set of parameters can provide it. 

There is no question about parameters like the client ID or URL queries: they should be removed. Each of them on its own can lead to user identification, because Google ties it to a unique ID.

Let’s say it may be essential for you to analyze mobile vs. desktop traffic or conversions in different browsers. Should you remove all data that can be used for fingerprinting and user identification or remove only some? Can you leave the browser and device if you remove all other parameters? 

Ensure you discuss these questions with your lawyers or DPO to have good protection if the regulator comes to you. I believe that removing all user identifiers that can be used for fingerprinting and re-identifying is better to keep your company secure. 

This article is not meant as a set of instructions. It simply shares our experience with removing or pseudonymizing data and shows how Stape does it automatically. You can choose not to use our Anonymizer power-up and anonymize each parameter manually instead.

Remove user data from Google Analytics with the help of Anonymizer

We’ve recently released a beta version of the Anonymizer power-up. It’s available for all Stape Europe users. The main goal of the anonymizer is to either remove or anonymize user data in Google Analytics 4 and Universal Analytics. 

To enable the anonymizer, open the sGTM container in https://app.eu.stape.io/, click power-up and open the anonymizer. 


Anonymizer power-up is still in beta, as we are adding new features and testing for uncommon use cases. 

You will have to select which parameters you want to leave as is, remove, or anonymize. Once the parameters are configured, update the tagging server URL for Google Analytics 4 and Universal Analytics. If you previously used the tagging server URL https://sgtm.example.com, the updated URL with the Anonymizer enabled will be https://sgtm.example.com/anonymize. We proxy your requests to sGTM through the /anonymize path and remove the specified data.

When GA requests go through the tagging server URL that includes /anonymize, we automatically remove or anonymize selected parameters. 

After enabling and configuring Anonymizer, ensure you've changed the GA4/UA transport URL in the Web GTM config tag to the one that ends with /anonymize.

Below is a list of all parameters that Anonymizer can either remove or anonymize. When creating Anonymizer, our goal was to give our clients the ability to remove every parameter that could be considered personal user data. You can select which parameters you want to remove. Talk with your DPO or lawyers to decide which parameters need to be removed.

General Info

You will have two options for most parameters: leave as is or remove. For two parameters (IP and Client ID), you will see options to Anonymize and Anonymize Strictly. 

IP

Anonymize - removes the last octet.

Anonymize Strictly - removes the last two octets

Client ID

Works only if you use JavaScript Managed client identification.

Anonymize - use a hash of IP+UserAgent and add year+month.

Anonymize Strictly - use a hash of IP+UserAgent and add a timestamp, crc32_hash(IP+UA).timestamp
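The two Client ID modes can be pictured with a minimal sketch. It assumes a standard CRC-32 over the concatenated IP and User-Agent strings, a YYYYMM suffix for Anonymize, and a Unix timestamp in seconds for Anonymize Strictly. The function names `crc32` and `pseudonymousClientId` and the exact suffix formats are hypothetical; the power-up's actual implementation may differ.

```javascript
// Standard CRC-32 (reflected polynomial 0xEDB88320) over the UTF-8 bytes of a string.
function crc32(text) {
  const bytes = new TextEncoder().encode(text);
  let crc = 0xffffffff;
  for (const byte of bytes) {
    crc ^= byte;
    for (let bit = 0; bit < 8; bit++) {
      // Shift right; apply the polynomial only when the lowest bit was set.
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Hypothetical helper: builds "<hash>.<time component>".
// strict = false: the time component changes once a month (assumed format YYYYMM).
// strict = true: the time component is the current Unix timestamp in seconds.
function pseudonymousClientId(ip, userAgent, now = new Date(), strict = false) {
  const hash = crc32(ip + userAgent);
  const timePart = strict
    ? Math.floor(now.getTime() / 1000)
    : now.getUTCFullYear() * 100 + (now.getUTCMonth() + 1);
  return `${hash}.${timePart}`;
}

// Example with a documentation-range IP and a shortened User-Agent string.
console.log(pseudonymousClientId('203.0.113.7', 'Mozilla/5.0'));
console.log(pseudonymousClientId('203.0.113.7', 'Mozilla/5.0', new Date(), true));
```

In this sketch, the same visitor keeps one ID for the whole calendar month in the first mode, while the strict mode produces a new ID whenever the timestamp changes.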

| Parameter name | Description | GA4 parameter | Anonymize |
| --- | --- | --- | --- |
| IP | User IP | IP Address | Anonymize - removes the last octet. Anonymize Strictly - removes the last two octets |
| Client ID | Google Analytics Client ID, _ga, _ga_*, FPLC, FPID cookies | cid, _ga, _ga_*, FPLC, FPID | Anonymize - use a hash of IP+UserAgent and add year+month. Anonymize Strictly - use a hash of IP+UserAgent and add a timestamp, crc32_hash(IP+UA).timestamp |
| User ID | User ID, Google Developer ID, Firebase ID | uid, gdid, _fid | - |
| Session ID | Session ID, New Session ID | sid, _nsi | - |
| Query parameters | Remove query parameters from Document Location | dl | - |
| Referer | Document Referrer Header, Document Referrer Parameter | referer header, dr | - |

System info

| Parameter name | Description | GA4 parameter | Anonymize |
| --- | --- | --- | --- |
| User Agent | Document User-Agent header, Sec-Ch-Ua header, Sec-Ch-Ua-Platform header, Sec-Ch-Ua-Mobile header, User-Agent Parameter | user-agent header, sec-ch-ua header, sec-ch-ua-platform header, sec-ch-ua-mobile header, ua | - |
| User Country | Geographical ID, Current country for the user | geoid, _uc | - |
| Browser plugins | Java Enabled, Flash Version | je, fl | - |
| Screen Info | Browser screen resolution, Viewport size | sr, vp | - |
| Screen Colors | Specifies the screen color depth | sd | - |
| User Language | Browser active locale | ul | - |

User Agent Parsed

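| Parameter name | GA4 parameter | Anonymize |
| --- | --- | --- |
| User Agent Architecture | uaa | - |
| User Agent Bitness | uab | - |
| User Agent Full Version List | uafvl | - |
| User Agent Mobile | uamb | - |
| User Agent Model | uam | - |
| User Agent Platform | uap | - |
| User Agent Platform Version | uapv | - |
| User Agent WOW64 | uaw | - |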

Ads Campaign Attribution

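| Parameter name | GA4 parameter | Anonymize |
| --- | --- | --- |
| Campaign Medium | cm | - |
| Campaign Source | cs | - |
| Campaign Name | cn | - |
| Campaign Content | cc | - |
| Campaign ID | ci | - |
| Campaign Term | ck | - |
| Campaign Creative Format | ccf | - |
| Campaign Marketing Tactic | cmt | - |
| Google Ads ID | gclid | - |
| Google Display Ads ID | dclid | - |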

Parameters that Google Analytics 4 collects change from time to time. So you need to check your GA4 requests to ensure all user data is removed. 

After you configure the parameters in Anonymizer and change the GA4 transport URL to the one that ends with /anonymize, we will remove or anonymize the specified parameters.

After enabling Anonymizer and updating the GA4 transport URL, please use the web/sGTM debuggers, the console, and the GA4 debugger to test whether all required parameters were removed.

Manually remove PII from GA4 using GTM

1. IP address

This one is relatively easy to implement but is somewhat controversial. Google has a built-in feature that removes the last byte of the IP address. With the last byte cut, the truncated address could belong to any of 256 addresses, so the chance that Google can identify a user from the IP alone is 1 in 256. In combination with other parameters, however, the IP can still quickly identify a specific person.
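To make the arithmetic concrete, here is a minimal sketch of octet truncation for IPv4 addresses. It assumes that "removing" an octet means setting it to 0; the helper names `truncateIpv4` and `candidateCount` are illustrative and not part of any Google or Stape API.

```javascript
// Zero out the trailing octets of an IPv4 address.
// octets = 1 removes the last octet; octets = 2 removes the last two.
function truncateIpv4(ip, octets = 1) {
  const parts = ip.split('.');
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`Not an IPv4 address: ${ip}`);
  }
  if (!Number.isInteger(octets) || octets < 0 || octets > 4) {
    throw new RangeError('octets must be an integer between 0 and 4');
  }
  for (let i = 4 - octets; i < 4; i++) {
    parts[i] = '0';
  }
  return parts.join('.');
}

// Number of original addresses that collapse into one truncated value.
function candidateCount(octets) {
  return 256 ** octets;
}

console.log(truncateIpv4('203.0.113.42'));    // 203.0.113.0
console.log(truncateIpv4('203.0.113.42', 2)); // 203.0.0.0
console.log(candidateCount(1));               // 256
console.log(candidateCount(2));               // 65536
```

Removing two octets widens the pool from 256 to 65,536 possible addresses, which is why the stricter option reveals less about an individual visitor.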

Some people think that cutting the last octet is enough. Others believe that you need to remove user IP altogether. My opinion is that it’s better to override the user IP completely. You never know if/how Google reuses IP.

“It should be noted that online identifiers, such as IP addresses or information stored in cookies can commonly be used to identify a user, particularly when combined with other similar types of information. This is illustrated by Recital 30 GDPR, according to which the assignment of online identifiers such as IP addresses and cookie identifiers to natural persons or their devices may "leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them." 

This is what CNIL (the French regulator) says about IP addresses.


To remove the user IP, I used the server GA4 tag and set ip_override to a random IP.


2. User identifiers

Google assigns a unique client ID to each browser-device pair and uses it to recognize when the same user revisits your site. This parameter must be removed or pseudonymized before it is sent to GA4.

“To ensure effective pseudonymization, the algorithm performing the replacement should ensure a sufficient level of collision (i.e., a sufficient probability that two different identifiers will give an identical result after a hash) and include a time-varying component (adding a value to the hashed data that evolves over time so that the hash result is not always the same for the same identifier).”

This is what CNIL says about client ID.


There are numerous approaches to anonymizing client IDs, depending on your imagination and the tools you use. Whichever you choose, make sure that the client ID is unique and that you add a time-varying component.

You can use a hash of the user agent, the IP, a GTM random number variable, etc. Unlike the user IP, we did not find a way to redact the client ID on the server side, so we did it on the client.


Once you've anonymized the Google Analytics client ID, you may want to override the GA4 cookies with the new values to ensure that GA4 does not set any user identifiers. To do so, I used the Cookie Monster tag template for the server GTM container. All you need to do is add the cookie names and values. Once done, do not forget to open the browser console and check the cookies GA sets.


Redacting the client ID significantly impacts GA4 reporting. Since every client ID will be unique, GA won’t be able to distinguish new from returning visitors. Multi-channel attribution and events like session start and first visit are affected as well.

3. External referrer

An external referrer shows how a user landed on your site: was it organic, paid, or maybe social traffic?

To remove, you should rewrite page_referrer. 
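One way to picture the rewrite is a small helper that blanks out any referrer coming from another host and keeps same-site referrers. This is a standalone sketch, not a GTM template: the name `redactExternalReferrer` and the choice to return an empty string are assumptions, and the result would still need to be mapped to page_referrer in your own tag setup.

```javascript
// Return an empty string for referrers from other hosts; keep same-site referrers.
function redactExternalReferrer(referrer, siteHostname) {
  if (!referrer) {
    return '';
  }
  let host;
  try {
    host = new URL(referrer).hostname;
  } catch {
    // Unparsable value: drop it rather than forward it.
    return '';
  }
  return host === siteHostname ? referrer : '';
}

console.log(redactExternalReferrer('https://www.google.com/search?q=shoes', 'example.com')); // (empty string)
console.log(redactExternalReferrer('https://example.com/pricing', 'example.com')); // https://example.com/pricing
```

Keep in mind that blanking the referrer also removes the source information GA4 would otherwise use to classify the traffic.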


4. Parameters contained in the collected URLs

The primary purpose of URL parameters is to determine the origin of advertising campaigns. They include utm_source, utm_medium, different click ID types, etc. Besides that, some platforms automatically insert user data into the URL.

To remove URL parameters, you must rewrite the page URL. Several variables in the web GTM template gallery can help you with this. I used Trim Query: you specify a blocklist or allowlist of query parameters, and the variable does the rest.
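The blocklist idea can be sketched in plain JavaScript with the standard URL API. This is not the code of the Trim Query template; the function name `stripQueryParams` and the sample URL are made up for illustration.

```javascript
// Remove every blocklisted query parameter from a page URL.
function stripQueryParams(pageUrl, blocklist) {
  const url = new URL(pageUrl);
  for (const name of blocklist) {
    url.searchParams.delete(name);
  }
  return url.toString();
}

const cleanedUrl = stripQueryParams(
  'https://example.com/pricing?utm_source=news&gclid=abc123&page=2',
  ['utm_source', 'utm_medium', 'gclid']
);
console.log(cleanedUrl); // https://example.com/pricing?page=2
```

An allowlist works the other way around: it keeps only the listed parameters, which is the safer choice when you cannot predict which parameters third-party platforms will append.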


5. Information that can be used to generate a fingerprint

Such information can be user agent, device, browser, screen resolution, language, operating system, etc. Make sure you’ve redacted all information that can be used for fingerprinting. 


6. Any lasting or cross-site identifiers

Ensure you do not use cross-site identifiers like a user or CRM ID. 

7. Any other data that can lead to re-identification

This category is loosely defined, but I suggest checking the requests that your sGTM container sends to GA and ensuring they contain no parameters that can be used for user re-identification.

How to test anonymization

There are several ways to check if all necessary data was removed or pseudonymized. You first want to go to the server GTM debugger and see outgoing GA4 requests. Ensure that you test different scenarios when there are user parameters vs. no user parameters, URL parameters, various events, referrers, etc. 
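When you copy an outgoing request from the debugger, a small script can flag parameters that should have been removed. The sketch below assumes the request is available as a full URL; the helper name `findLeftoverParams`, the sample request, and the list of forbidden parameters are illustrative and should be replaced with your own configuration.

```javascript
// Return the names of forbidden parameters that are still present in a request URL.
function findLeftoverParams(requestUrl, forbidden) {
  const params = new URL(requestUrl).searchParams;
  return forbidden.filter((name) => params.has(name));
}

// Made-up GA4 request: cid and ul were not removed.
const sampleRequest =
  'https://www.google-analytics.com/g/collect?v=2&tid=G-XXXXXXX&cid=123.456&ul=en-us&en=page_view';

const leftovers = findLeftoverParams(sampleRequest, ['cid', 'uid', 'sid', 'dr', 'ul', 'sr']);
console.log(leftovers); // lists cid and ul
```

An empty result only means that none of the listed names were found, so keep the list in sync with the parameters GA4 currently sends.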


The second way is to use Google Analytics 4 debugger and see what data GA4 processes. 


Conclusion

It's not just Google that collects EU user data and transfers it to the US in violation of GDPR. Multiple companies have collected personal data from Europeans for years, and it now seems their practices will be restricted across the board following the Privacy Shield invalidation and the rulings that transferring EU users' data to the US is illegal under GDPR.

If you are a website owner in the European Union, it’s time to start changing what data you share with US companies, or you may be at risk of being fined by regulatory enforcement.

Frequently asked questions

1. How can I use proxy-server for GA when implemented through gtag.js?

If you use gtag.js on your website to send events to your server container, you can add the transport_url parameter to your existing tag:

gtag('config', 'TARGET-ID', {
  'transport_url': 'https://analytics.example.com',
  'first_party_collection': true,
});

You can also use an anonymizer URL to anonymize user data in GA when it is implemented via gtag.js. Let's say you use the Stape Anonymizer and your anonymizer URL is https://sgtm.site.com/anonymize. You just need to set https://sgtm.site.com/anonymize as the transport URL in the gtag config.
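With the example anonymizer URL, the config call could look like the sketch below. The first lines are a minimal stand-in for the gtag bootstrap so the snippet runs on its own; on a real page, gtag is already defined by the Google tag snippet, and 'TARGET-ID' is a placeholder for your own ID.

```javascript
// Stand-in for the usual gtag bootstrap (on a real page this already exists).
const dataLayer = [];
function gtag() {
  dataLayer.push(arguments);
}

gtag('config', 'TARGET-ID', {
  // Example anonymizer URL: the tagging server URL with the /anonymize path.
  transport_url: 'https://sgtm.site.com/anonymize',
  first_party_collection: true,
});
```

After publishing, check in the browser's network tab that GA requests go to the /anonymize path.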
