What is Device Fingerprinting?
Identifiability on the Internet: it’s not just the identifiers you know about
Device fingerprinting sounds like a way to unlock your phone or laptop with your finger, but that’s not what we are talking about. If you know enough about the web to worry about being tracked across websites, you probably already know that you are.
You’re also at least aware of web cookies and other kinds of tags, since sites now display language about cookies thanks to various regulations. You may even know that your IP address can itself identify you (and far more effectively than you might think) because we wrote about it earlier this year.
As a user, you have ways to protect yourself from some of this: refusing cookies, using a browser that restricts the information given to websites, or even using the web through an incognito browser or anonymizing proxy. Just recently, though, Google was hit with a $5 billion lawsuit over tracking users in Incognito mode, so users may not have that many protections at all.
As a website owner, you also have some control over your users’ privacy. You can make sure you’re not asking for information you don’t need. You can also make sure you’re not sharing it with anyone you don’t have to, though there’s still complexity.
Something new to worry about: Device Fingerprinting
However, those aren’t the only ways to track users. To explain this, we need to understand a concept called device fingerprinting.
Current computers are complex in terms of both hardware and software. There are a lot of pieces that work together to create the whole. You can look up current versions and settings of many of them if you know how.
Taken together, all the information that describes an individual device is a big collection of data points. Even hardware gets updated in production. My MacBook Air may not have the exact same model of SSD as one purchased two months later. The variability can be even higher with Windows machines. Providing a lot of this information is necessary to allow websites and scripts to perform basic functions, but it also poses a risk.
Assessing the privacy risk
When you add in the software, the variations are endless. You have regular updates (not all of which a given user takes), people install different software, packages you download install extra stuff, there are libraries, and on and on.
A web browser has settings, add-ons, and versions of different tools. All this variability means that there’s an excellent chance that the data signature of your computer is very different from that of a co-worker who started 6 months later with the same hardware and software installed.
That is because anyone who looks at all the versions, settings, and similar details can tell your device configuration apart from your co-worker’s. The differences between you, your co-workers, and your neighbors are easy to distinguish.
Connecting the bits
A party who can get access to this data about an individual machine can use it to create what is called a device fingerprint. This data signature of sorts uniquely identifies the machine in question without making use of any direct identifiers like IP address, IMEI, etc.
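As a rough illustration of the idea, the sketch below folds a handful of attributes into a single identifier by hashing them. The attribute names and values are made up for the example, and an exact hash is only one possible approach; real fingerprinting scripts collect many more signals and may match them more loosely.

```python
import hashlib
import json


def device_fingerprint(attributes: dict) -> str:
    """Hash a dictionary of device attributes into one hex string."""
    # Serialize with sorted keys so the same attributes always produce
    # the same string, regardless of the order they were collected in.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values, for illustration only.
my_laptop = {
    "os_version": "12.3.1",
    "browser_version": "98.0.2",
    "screen": "1440x900",
    "timezone": "America/Detroit",
    "fonts_installed": 204,
}
# Same hardware and software, but one small difference in configuration.
coworker_laptop = dict(my_laptop, fonts_installed=211)

print(device_fingerprint(my_laptop) == device_fingerprint(coworker_laptop))  # False
```

Note that no direct identifier appears anywhere in the input, yet a single differing value is enough to separate the two machines.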
If the information is collected solely from the browser (browsers are really chatty), this is called a browser fingerprint. To see how unique your browser fingerprint is, check out AmIUnique.org. A good canvas fingerprinting test is also available on Privacy.net. As they point out on these sites, though, your fingerprint may become more or less unique over time as configurations change.
If you’re then able to associate a machine that has a known device fingerprint with a user account somewhere, you’ve just linked it to the identifiable person who owns that account.
None of this is perfect. It’s a probable link: “machine XXXXXXX is probably Joseph Saul, firstname.lastname@example.org,” but it’s good enough for stuff like targeted marketing. It is also very valuable information and quite good enough for data brokers to create and sell fingerprint databases derived in this way.
Fingerprinting your flock
You have probably heard by now that Google is planning to abandon cookies for tracking purposes by making Chrome disable the use of any third-party cookies. Google is instead planning to use a new technology called Federated Learning of Cohorts, or “FLoC”.
FLoC is Google’s new tracking system that makes you part of a “cohort” group rather than identifying you as an individual. Privacy advocacy groups like the Electronic Frontier Foundation are raising plenty of legitimate concerns about it. The question remains: is the pending flock-based “cookiepocalypse” a meaningful improvement to privacy?
According to the Verge, the issue is complex. While it is still early, it looks as if this new technology won’t necessarily make individual privacy any more secure. Instead, it looks as if it will enable only the largest players in the AdTech industry to discern more comprehensive individual data signatures by parsing through the data. In effect, it moves ad giants off cookie tracking and commits them to a fingerprinting practice of some sort.
The lone bird in the flock
So far, no other browser provider has publicly announced any intention to support Google’s FLoC technology. The larger competitors to Chrome are simply planning to block third-party cookies and let the chips fall where they may.
This leaves it up to users and website operators to address privacy fingerprinting through other means.
What goes into a device/browser fingerprint?
Essentially, it’s a lot of stuff. The more data elements you can get and the more diverse their possible values are, the more likely the fingerprint is unique.
To give you an idea of the breadth of the information, here are some examples. Software- and browser-based examples include:
- versions of various software the browser can tell you about
- what browser extensions are installed
- browser history if available
- audio fingerprinting
- Canvas and WebGL information
- geolocation info (this often requires permission)
Hardware-based examples include:
- hardware properties and versioning
- hardware benchmarking, e.g. processing speed
To be useful for individual identification, device fingerprints must be both diverse and stable.
Diversity means that there must be enough possible values in the space of all values for each machine to be unique within it. If there aren’t enough possible values, too many machines will have the same fingerprint, making them useless for individual identification.
For example, if all they could get was a model of a computer and a version of a web browser, most “fingerprints” would include thousands of machines.
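To make the diversity point concrete, here is a minimal sketch that measures what fraction of a small, invented population of machines ends up with a signature shared by no one else. The function name, attribute names, and values are all assumptions for the example.

```python
from collections import Counter


def unique_fraction(machines: list, keys: list) -> float:
    """Fraction of machines whose values for `keys` match no other machine."""
    signatures = [tuple(machine[key] for key in keys) for machine in machines]
    counts = Counter(signatures)
    unique = sum(1 for signature in signatures if counts[signature] == 1)
    return unique / len(signatures)


# A made-up population of four machines.
machines = [
    {"model": "A", "browser": "98", "timezone": "UTC-5", "fonts": 204},
    {"model": "A", "browser": "98", "timezone": "UTC-5", "fonts": 211},
    {"model": "A", "browser": "98", "timezone": "UTC-8", "fonts": 204},
    {"model": "B", "browser": "97", "timezone": "UTC-5", "fonts": 198},
]

# Model and browser version alone: three machines share one signature.
print(unique_fraction(machines, ["model", "browser"]))  # 0.25
# Adding two more attributes separates every machine.
print(unique_fraction(machines, ["model", "browser", "timezone", "fonts"]))  # 1.0
```

With only two coarse attributes, most of the population is indistinguishable; each extra attribute with varied values splits the groups further.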
Stability means that each machine needs to keep the same set of values across different websites, ideally for long enough to link the fingerprint to something identifying, like a Facebook account.
If the fingerprint changed from website to website, you obviously wouldn’t be able to tell it was the same person. If you can’t link it to a real identity, it’s significantly less valuable.
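The stability requirement can be sketched as a comparison between two observations of a device. The example below assumes a simple rule, the share of attributes that match exactly, and an arbitrary threshold of 0.8; both are illustrative choices rather than a description of how any real tracker works.

```python
def similarity(a: dict, b: dict) -> float:
    """Share of attributes (across both observations) with identical values."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    matches = sum(1 for key in keys if key in a and key in b and a[key] == b[key])
    return matches / len(keys)


def probably_same_device(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """Treat two observations as the same device if enough attributes match."""
    return similarity(a, b) >= threshold


# Hypothetical observations of one machine on two different websites.
seen_on_site_a = {"os": "12.3.1", "browser": "98.0.2", "screen": "1440x900",
                  "timezone": "America/Detroit", "fonts": 204}
# The browser updated between visits; everything else is unchanged.
seen_on_site_b = dict(seen_on_site_a, browser="98.0.3")
# A machine whose values differ on every attribute.
unrelated = {"os": "11", "browser": "97.0", "screen": "1920x1080",
             "timezone": "Europe/Berlin", "fonts": 311}

print(probably_same_device(seen_on_site_a, seen_on_site_b))  # True
print(probably_same_device(seen_on_site_a, unrelated))       # False
```

A stable signature survives small changes such as a browser update; one that changes on every site never reaches the threshold, so it cannot be linked.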
Therefore, the best way to counter device fingerprinting is to attack the diversity or stability of the machine’s signature.
Individual users have several tools at their disposal for this.
To get something out of the way first, though, using “incognito mode” won’t completely protect you, because a lot of the data used in fingerprinting isn’t blocked. What can help is a browser that gives a simplified fingerprint, reducing diversity.
Firefox will do some of this for you (and even blocks some by default); the Tor browser is designed to protect you from fingerprinting. There are also software packages you can install on your computer to protect against fingerprinting.
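Conceptually, reducing diversity means reporting coarser values so that many machines look alike. The sketch below shows that idea on invented attribute names; it is not how Firefox, the Tor browser, or any particular tool implements its protections.

```python
def simplify(attributes: dict) -> dict:
    """Return a coarser copy of the attributes that more machines will share."""
    simplified = dict(attributes)
    # Keep only the major browser version, e.g. "98.0.2" becomes "98".
    if "browser_version" in simplified:
        simplified["browser_version"] = str(simplified["browser_version"]).split(".")[0]
    # Report one fixed time zone instead of the real one.
    if "timezone" in simplified:
        simplified["timezone"] = "UTC"
    # Round screen dimensions down to the nearest 100 pixels.
    for key in ("screen_width", "screen_height"):
        if key in simplified:
            simplified[key] = (simplified[key] // 100) * 100
    # Drop a highly variable value entirely.
    simplified.pop("fonts_installed", None)
    return simplified


# Two hypothetical machines that differ on every attribute.
machine_a = {"browser_version": "98.0.2", "timezone": "America/Detroit",
             "screen_width": 1440, "screen_height": 900, "fonts_installed": 204}
machine_b = {"browser_version": "98.1.0", "timezone": "Europe/Berlin",
             "screen_width": 1470, "screen_height": 956, "fonts_installed": 311}

print(simplify(machine_a) == simplify(machine_b))  # True
```

After simplification the two machines are indistinguishable, which is exactly the loss of diversity a fingerprinter does not want.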
In all of these cases, though, blocking this data may break certain website functions. And there are practical reasons for some scripts to read this data in order to work properly.
As a website owner, you have a slightly different problem. Your site needs some of the data that goes into a fingerprint to work properly, but you should be concerned if your website is leaking information to third parties about your users. This includes leaking information that your users have visited your site via device fingerprinting.
To remedy this, you’ll need to inventory the third parties on your site, figure out which ones may be accessing this data, and then decide what to do about them. If you block them outright, you lose the functionality you added them for in the first place.
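A first pass at such an inventory can be sketched with Python's standard library: parse a page's HTML and list the hosts of external scripts that are not served from the site itself. The function and class names and the sample URLs are hypothetical. This only sees script tags present in the static HTML of one page; scripts that other scripts load at runtime will not show up, and subdomains of your own site are counted as third parties.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ScriptCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def third_party_script_hosts(html: str, page_url: str) -> list:
    """Return sorted hosts of external scripts not served from the page's host."""
    collector = ScriptCollector()
    collector.feed(html)
    own_host = urlparse(page_url).hostname
    hosts = set()
    for src in collector.sources:
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(page_url, src)).hostname
        if host and host != own_host:
            hosts.add(host)
    return sorted(hosts)


# Hypothetical page markup, for illustration only.
sample_html = (
    '<script src="/js/app.js"></script>'
    '<script src="https://cdn.example.net/analytics.js"></script>'
    '<script src="//ads.example.org/tag.js"></script>'
    '<script>inline()</script>'
)
print(third_party_script_hosts(sample_html, "https://www.example.com/"))
# ['ads.example.org', 'cdn.example.net']
```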
This third-party client-side protection is an under-addressed gap in the privacy tech world. And this is precisely where Lokker Privacy Automation comes to the rescue.
Privacy automation can help
First, privacy automation platforms like Lokker can give you a comprehensive inventory of the third-party (and fourth-party, and fifth-party, and so on) scripts on your site, which is critical. Then, Lokker shows you where data is flowing, and what specific user data is being exposed.
Finally, Lokker can allow you to decide whether to completely block data from going to a script, or whether to replace it with anonymous data. This could itself be a fingerprint, but one that won’t link up to an existing fingerprint anywhere.
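One way to picture a replacement fingerprint that cannot be linked elsewhere is a value derived from a random per-session secret rather than from the device. The sketch below is only an illustration of that idea under assumed names; it does not describe how Lokker or any other product works.

```python
import hashlib
import hmac
import secrets


def make_session_key() -> bytes:
    """Generate a fresh random secret for one browsing session."""
    return secrets.token_bytes(32)


def substitute_fingerprint(session_key: bytes, third_party_host: str) -> str:
    """Derive a stand-in identifier from the session secret and the recipient.

    No real device attribute is used, so the value cannot match a
    fingerprint the third party has collected anywhere else.
    """
    return hmac.new(session_key, third_party_host.encode("utf-8"),
                    hashlib.sha256).hexdigest()


# Hypothetical third-party hosts, for illustration only.
session = make_session_key()
for_ads = substitute_fingerprint(session, "ads.example.org")
for_cdn = substitute_fingerprint(session, "cdn.example.net")
next_session = substitute_fingerprint(make_session_key(), "ads.example.org")

print(for_ads == substitute_fingerprint(session, "ads.example.org"))  # True
print(for_ads == for_cdn)       # False
print(for_ads == next_session)  # False
```

The value stays consistent for one recipient within a session, so a script that expects a stable identifier keeps working, while different recipients and later sessions see unrelated values.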
The good news in all this is that the tools you need to dramatically enhance your privacy standards are here. As privacy takes on a larger role in your company’s daily operations, there are those of us on the front line who have been focused on protective technologies from the beginning. If you are concerned, we are here to help.