Privacy and Google Analytics

The problem

After reading a post comparing analytics solutions, I realised most people overlook that Google Analytics can be fine-tuned to comply with stricter privacy policies. In this post, I'll share the solutions I implemented to keep using Google Analytics without compromising my visitors' (a.k.a. your) privacy.

Please note that the code only relates to the latest analytics.js API. Refer to the support pages for older versions of Google Analytics.

The solutions

Disable data sharing

By default, the data collected from Google Analytics are shared across several services like:

Though this data sharing should be harmless most of the time, you can disable it in your account settings. You can read more about it on the data sharing settings support page.

Anonymise IP address

Google Analytics has a built-in feature to anonymise IP addresses (see this help page for details). When enabled, the last part of the visitor's IP address is set to 0: the last octet for IPv4, the last 80 bits for IPv6. In theory, Google can then no longer map an anonymised IP address to a specific user. Also, according to their documentation, the complete IP address is never logged.

However, I read the relevant documentation carefully, and it's not clear to me whether Google logs the IP address when the analytics.js file is requested from their server. If you're a bit paranoid, you may decide not to trust Google here. In that case, you'll want to proxy the file on your own domain and refresh it as often as possible. I haven't tried it myself, but if you do, let me know as I'd love to read more.

Obviously, the downside of this setting is that you get a less accurate geographic report, as it is based on the IP address. But who needs granularity down to the level of a city?
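
As a minimal sketch, assuming the standard analytics.js command queue, anonymisation is switched on by setting the anonymizeIp field before the first hit is sent. The first few lines are only a stand-in for the usual loader snippet so that the example runs on its own, and the tracking ID is a placeholder:

```javascript
// Stand-in for the analytics.js loader snippet: commands are queued until
// the library is available. `root` is `window` in a browser.
var root = typeof window !== 'undefined' ? window : globalThis;
root.ga = root.ga || function () {
  (root.ga.q = root.ga.q || []).push(arguments);
};

root.ga('create', 'UA-XXXXXX-Y', 'auto'); // Use your own ID!
root.ga('set', 'anonymizeIp', true);      // Applies to every hit sent afterwards.
root.ga('send', 'pageview');
```

Setting the field with 'set' rather than on a single hit means every later pageview or event sent by the tracker is anonymised too.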

Implement an opt-out mechanism

You can easily give your users the ability to deactivate tracking. Just create a setting in your app that sets the following property to true:

window['ga-disable-UA-XXXXXX-Y'] = true; // Replace UA-XXXXXX-Y with your Google Analytics ID

You can persist this setting in your user's account, so that her preference is kept whatever device she uses to access your website. But for performance reasons, I recommend not loading Google Analytics at all when your user has opted out. You can read more about user opt-out in the help pages.
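
Here is a rough sketch of how that could look. It stores the preference in the browser rather than in a user account, which is a simplification on my part, and the gaOptOut key and the three function names are made up for the example:

```javascript
var GA_ID = 'UA-XXXXXX-Y';    // Use your own ID!
var OPT_OUT_KEY = 'gaOptOut'; // Hypothetical storage key.

// Remember the visitor's choice. `storage` is localStorage in a browser.
function setOptOut(storage, optedOut) {
  storage.setItem(OPT_OUT_KEY, optedOut ? '1' : '0');
}

function hasOptedOut(storage) {
  return storage.getItem(OPT_OUT_KEY) === '1';
}

// Set the ga-disable property on `root` (window in a browser) and return
// true when Google Analytics should be loaded at all.
function applyOptOut(root, storage) {
  var optedOut = hasOptedOut(storage);
  root['ga-disable-' + GA_ID] = optedOut;
  return !optedOut;
}

// Browser usage, commented out so that the sketch runs outside a page too:
// if (applyOptOut(window, localStorage)) {
//   // Load Google Analytics and track user.
// }
```

The property is set even when the script is skipped, so that any analytics code loaded by mistake still stays silent.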

Don't use cookies

Though not directly related to privacy, not setting cookies is likely to increase your visitors' confidence while removing the need to display a message related to the EU cookie directive.

Instead of cookies, use local storage to store the visitor identifier used by Google Analytics, the so-called clientId parameter. This technique has a beneficial effect on performance as well: the browser no longer sends the cookie details with every request for an asset located on the same domain as the page.

Here is the code used in this blog:

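ga('create', 'UA-XXXXXX-Y', { // Use your own ID!
  'storage': 'none', // Don't let analytics.js set cookies.
  'clientId': localStorage.getItem('gaClientId') // null on a first visit, so a new ID is generated.
});
ga(function(tracker) {
  // Persist the client ID so that the next visit reuses it.
  localStorage.setItem('gaClientId', tracker.get('clientId'));
});
ga('send', 'pageview');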
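
One caveat: local storage can throw, for instance when the browser blocks site data. The following is a more defensive sketch of the same idea, not what runs on this blog; the helper names readClientId and saveClientId are made up for the example:

```javascript
var CLIENT_ID_KEY = 'gaClientId'; // Same key as in the snippet above.

// `getStorage` is a function returning the storage object, so that even the
// access to `localStorage` itself happens inside the try block.
function readClientId(getStorage) {
  try {
    return getStorage().getItem(CLIENT_ID_KEY);
  } catch (e) {
    return null; // Same as a first visit: a new ID gets generated.
  }
}

function saveClientId(getStorage, clientId) {
  try {
    getStorage().setItem(CLIENT_ID_KEY, clientId);
    return true;
  } catch (e) {
    return false; // The ID simply won't survive this visit.
  }
}

// Browser usage, commented out so that the sketch runs outside a page too:
// var getLocalStorage = function () { return localStorage; };
// ga('create', 'UA-XXXXXX-Y', {
//   'storage': 'none',
//   'clientId': readClientId(getLocalStorage)
// });
// ga(function (tracker) {
//   saveClientId(getLocalStorage, tracker.get('clientId'));
// });
// ga('send', 'pageview');
```

When storage is unavailable, each page view is counted as a new visitor, which seems an acceptable trade-off compared with a script error.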

Comply with Do Not Track

The DNT preference is not only sent to the server as an HTTP header but also exposed to the client as a property on the navigator object: navigator.doNotTrack. This returns '1' when the visitor has enabled Do Not Track.

You can wrap the code responsible for loading Google Analytics in an if statement like I do in this blog (examine the source for yourself if you don't believe me!):

if (navigator.doNotTrack !== '1') {
  // Load Google Analytics and track user.
}
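
Some older browsers are known to expose the preference elsewhere (window.doNotTrack, navigator.msDoNotTrack) or to report 'yes' instead of '1'. If you want to cover those too, a broader check could look like the sketch below. The function name is made up, and the list of variants is based on my understanding of past browser behaviour rather than on a specification:

```javascript
// `nav` and `win` are `navigator` and `window` in a browser.
function isDoNotTrackEnabled(nav, win) {
  // Take the first place where the browser exposes a value.
  var value = nav.doNotTrack || win.doNotTrack || nav.msDoNotTrack;
  return value === '1' || value === 'yes';
}

// Browser usage, commented out so that the sketch runs outside a page too:
// if (!isDoNotTrackEnabled(navigator, window)) {
//   // Load Google Analytics and track user.
// }
```

Any other value, including '0', 'unspecified' and a missing property, is treated as "tracking allowed", which matches the behaviour of the simpler check.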

Force SSL

This blog is hosted on GitHub Pages. Unfortunately, HTTPS is not (yet?) supported. The least I could do was to force the analytics.js file to be received, and the beacon to be sent, over an encrypted connection. More info about the forceSSL setting is available.
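
A minimal sketch of that setting, assuming the standard analytics.js command queue (the first few lines stand in for the usual loader snippet so that the example runs on its own). To receive the file itself encrypted, the loader would reference https://www.google-analytics.com/analytics.js explicitly instead of a protocol-relative URL:

```javascript
// Stand-in for the analytics.js loader snippet: commands are queued until
// the library is available. `root` is `window` in a browser.
var root = typeof window !== 'undefined' ? window : globalThis;
root.ga = root.ga || function () {
  (root.ga.q = root.ga.q || []).push(arguments);
};

root.ga('create', 'UA-XXXXXX-Y', 'auto'); // Use your own ID!
root.ga('set', 'forceSSL', true);         // Send beacons over HTTPS, even from an HTTP page.
root.ga('send', 'pageview');
```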

The challenge

Some of the solutions presented here depend on your level of trust. I decided to trust Google even though I can never be certain that they really do what they say they do. Your level of trust may be different, and you can decide whether to apply the solutions mentioned above or turn to other analytics services, like Piwik.

Also, I would have loved to see Google make some of these options easier. For example, support for DNT could have been an option in the account settings. Also, we can't create a custom IP masking function to push anonymisation even further.

Google Analytics offers some very basic privacy-aware options, but it should offer more!

Experimenting with serverless apps

Servers are inherent to the web. The recent unhosted movement tends to reduce their importance to give the power back to the client applications.

But is a completely serverless web really achievable?

Let's keep things very basic for now and leave aside issues like security or version updates.

(Note: the content of this post was inspired and developed during an event and a hack day I attended recently.)

A dystopian example

Imagine you are demonstrating against the government in place in your country. They are powerful and control communications. To slow you down and confuse you, they cut off all network access. Without the internet, you can't communicate with your peers and can't organise the movement.

You want to be able to send and receive information. How can you do that when all you have is a web-enabled device and no network?

Distribute serverless apps

Let's see if it's possible to distribute apps from device to device, using web technologies only but without servers.

file:// URI scheme

The most obvious and direct approach is to create a file on your computer and open it in your browser using the file:// protocol. It works for simple documents but comes with a huge list of limitations:

Data URI

Data URIs let you package all of your app's logic and resources in a string that your browser can recognise and execute. Rather than using a URL as a reference to content located somewhere, the URL contains everything. The address IS the content.

A very simple example is the following: data:text/html;charset=utf-8,<b>Hello, world!</b>

Opening this URI in a browser displays the text "Hello, world!" in bold. Everything is self-contained.

There is a limitation, however, in the number of characters such a URI can contain: no more than 65535. Here is the function I use to convert HTML code to data URI:

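function dataUriEncode(string) {
  // Percent-encode the HTML so that it can safely follow the data URI header.
  var encoded = 'data:text/html;charset=utf-8,' + encodeURIComponent(string);
  // Warn (but still return the URI) when it exceeds the 65535-character limit.
  if (encoded.length > 65535) {
    console.error('The string generated is too long.');
  }
  return encoded;
}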
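
Going the other way is handy when you receive such a URI and want to read the code before running it. Here is a small sketch of the reverse operation; dataUriDecode is a name I made up, and it only handles percent-encoded URIs like the ones produced by dataUriEncode, not base64 ones:

```javascript
function dataUriDecode(uri) {
  var comma = uri.indexOf(',');
  if (uri.slice(0, 5) !== 'data:' || comma === -1) {
    throw new Error('Not a data URI.');
  }
  // Everything between "data:" and the first comma is the header.
  var header = uri.slice(5, comma);
  if (/;base64$/i.test(header)) {
    throw new Error('Base64 data URIs are not handled by this sketch.');
  }
  // Everything after the first comma is the percent-encoded content.
  return decodeURIComponent(uri.slice(comma + 1));
}
```

Only the first comma matters: any comma that is part of the content comes after it and is left untouched.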

Abusing data URIs

Let's now abuse this simple mechanism to create web pages and distribute them without a server.

For the sake of an example, I shamelessly stole WOLF1K, a clone of Wolfenstein in JavaScript that weighs only 1K. Here is the complete source code of the app that I'll try to distribute:

data:text/html;utf-8,<title>WOLF1K</title><canvas id=c></canvas><script>E="A=document.body.children.c;B=A.getContext('2d'~=1$&31$&992(E&19)?1;    n=@nB#[A#)onkeyup=0};D=[setInterval(@=innerWidth-30;A.heigh@/2;    n=@n=[t,n,S,8+$*S&8)|(X+y+t*s+t*c8&7]){w=X=x=;v=y=;z=2]z+n/@ a=s=yG=F,r=u;a=c=X    ;$~)F<G?(F,F1/uS=c/u):(G,G1/rS=32*s/r)}    n=@i<15;i++){2];j=38;i?1X=(+,;Y=(,+;ji?random(8-42]=(b-t/16+9.42%6.28)-3.14;z-atan2(Y-=v,X-=w~i&&>.5?=[sqrt(X*X+Y*Y),@/2-@*b,i,0,++n]    D.sort(+x[-y[}~n)    a=[,a/@ F=@/2//[,c=8,u=v=+1=[3]=c=1,@,0%)'):a!ca)    '+v+'%)'atob('CBF+/p6f9AC9bsP/w/dqvdvb2NvD29sb'),y=8,@/4,b!yb)d>>y&1,v),9)]";"@A.widthMath.cos(b)B[i][function(x,y){B.fillStyle='hsl('+D[n][2]+'1,99%,%=1;u=a<0?-a:a,F=(a<0?d:1-d)/u;onkeydownMath.sin(b).charCodeAt(x++%73)    for(?B.fillRect(a,b,u:A[j]-A[j+2];D[n]A[x.which]=--;*t/8;x|=y<<5;,i=0;Math.+=0]1.1,return)*b=,d=:0}F/41]t= -.5,!-=F;#[n]=$(x~);,x".replace(/.([%-}]+)/g,function(x,y){E=E.split(x[0]).join(y)});eval(E)</script>

I will now run this web page and distribute it to other devices without using a server. The tests below are based on the following browser/OS configuration:

Feel free to test and send me the results on other OS configurations.

Distribution strategies

Link

This is the easiest way of sharing a data URI app and it works everywhere! Just click the link below to start the app: WOLF1K

Obviously a link means that a web page is required and possibly a server to host it.

Emails

I haven't been able to send or receive a data URI link by email, for security reasons. What works, though, is to paste the source code of the app into the message and ask the recipient to copy and paste it into her browser's address bar.

SMS

Likewise, copy/pasting content from a text message is the only way to use SMS to send such an app.

Bluetooth and NFC

These methods were a major disappointment to me. I was really hoping to be able to send a data URI app from one device to another. But it doesn't work, due to "Can't open media type" (Bluetooth) and "Unknown tag type" (NFC) errors. That said, I suspect it can be fixed on Firefox OS.

QR Code

1084 characters is too much for a QR code and makes it impossible to decode.

This is sad because this prevents us from using printed material to distribute apps.

If you don't believe me, try to scan that: WOLF1K

Wi-Fi Direct

This is a protocol that lets nearby devices discover each other and communicate directly over Wi-Fi, even if there is no Internet connection. Though it is currently implemented in Firefox OS and Android, this needs to be tested to see if it's possible to send a data URI web app via this channel.

Bookmarking and adding to homescreen

Once the received app is running in your browser, you want to bookmark it so that you can use it later.

Desktop browsers allow bookmarking data URI apps, but not adding them to a home screen.

On mobile browsers, things are slightly more complex:

So what now?

Ironically, the only methods that work are the server-based ones (link, email, text messages). Distributing web documents encoded as data URIs without a server is a bit tricky at the moment. OS UIs don't always allow creating, sharing or opening such apps. I understand this is an edge case, but if data URI apps were supported like URL-based apps, things would be easier.

Now I'd love to see a discussion about what the web needs to achieve this. Are there any specifications currently being standardised that go in this direction?

Let's move things forward and get a truly serverless web!
