Lokker’s Privacy Page Scan tool (referred to as “Lokker”) is a real-time website privacy inspector. It emulates how a user might be surveilled while browsing the web. Users type a URL into Lokker, and it visits the requested website, scans for known types of potential privacy violations, and returns a privacy analysis of the inspected site.
Purpose: Our web page privacy scanner is educational in nature. Lokker privacy scans are designed to reveal privacy and security information that is otherwise difficult to see. Our goal is to deliver transparency, not to make privacy, cybersecurity, or risk assessments. Users should use our scan results for informational purposes only.
Lokker works by visiting the requested website with a headless browser, running custom software built by Lokker. This software monitors which scripts on that website are potentially surveilling the user by performing a variety of tests, each investigating a specific, known method of surveillance, to help users identify:
- Third-party cookies
- Key logging
- Session recording
- Canvas fingerprinting
- Potentially Risky Domains
- Volatile Scripts
- Ad trackers
- Foreign Domains
When a user enters a URL into Lokker, the tool opens a headless web browser with a fresh profile and visits this URL several times using Lokker’s third-party detection and analysis system. While the browser is visiting the website, it runs custom software in the background that monitors scripts and network requests to observe when and how user data is being collected. To monitor scripts, Lokker modifies various fingerprint properties of the browser’s Window API. This allows Lokker to log which script made a particular function call, using the Stacktrace-js package. The network requests are collected using a monitoring tool included in Puppeteer’s API.
Lokker uses the script data and network requests to run the tests identified above. Afterward, it closes the browser and generates a report for the user. It records a list of all the URLs that the end-user browser requests while viewing the page. In addition, it makes a list of all domains and subdomains that were requested. Lokker can also scan multiple pages on a website, based on a pre-built URL list, a sitemap, or a site crawl of limited depth, and present aggregated results to the user.
Lokker defines domain names using the Public Suffix + 1 method. It defines a first-party domain as any domain that matches the website visited, including subdomains. It defines a third-party domain as any domain that does not match the website visited. The tool compares the list of third-party domains from the website requests with DuckDuckGo’s Tracker Radar dataset. This data merge allows Lokker to add the following information about the third-party domains found on the inspected site:
- Name of the domain’s owner.
- Categories assigned by DuckDuckGo to each domain that attempt to describe its purpose or intent.
This additional information about third parties is provided to users as context for Lokker’s test results. Among other things, this information is used to count the number of advertising-related trackers present on a given website.
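The first-party versus third-party comparison described above can be sketched as follows. This is a minimal illustration, not Lokker’s implementation: the function names are hypothetical, and `PUBLIC_SUFFIXES` is a tiny hand-picked subset standing in for the full Public Suffix List that a real scanner would load.

```python
from urllib.parse import urlsplit

# Tiny illustrative subset of the Public Suffix List (an assumption for this
# sketch); a real scanner would load the full, current list.
PUBLIC_SUFFIXES = {"com", "org", "net", "uk", "co.uk"}


def registrable_domain(hostname):
    """Return the public suffix plus one label, e.g. 'example.co.uk'."""
    labels = hostname.lower().rstrip(".").split(".")
    # Scan from the left so the longest matching suffix is found first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            # Keep one extra label to the left of the suffix, if there is one.
            return ".".join(labels[max(i - 1, 0):])
    # Unknown suffix: fall back to the last two labels.
    return ".".join(labels[-2:])


def is_third_party(request_url, page_url):
    """True when the request's registrable domain differs from the page's."""
    request_host = urlsplit(request_url).hostname or ""
    page_host = urlsplit(page_url).hostname or ""
    return registrable_domain(request_host) != registrable_domain(page_host)
```

Under this sketch, a request to `static.example.com` from a page on `www.example.com` counts as first-party because both reduce to `example.com`, while a request to `tracker.example.net` counts as third-party.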
Lokker will also tell users whether the report’s results are high, low, or about average compared with what Lokker found on the 100,000 most popular websites as ranked by the Tranco List. This is described in more detail below.
How Lokker Analyzes Each Type of Tracking
Third-party cookies are small pieces of data that tracking companies store in your web browser when you visit a website. Each cookie holds a bit of text, usually a unique number or string of characters, that identifies you when you visit other websites that contain tracking code from the same company. Third-party cookies are used by hundreds of companies to build dossiers about users and deliver customized ads based on their behavior. Popular web browsers Edge, Brave, Firefox, and Safari all block third-party tracking cookies by default, and Chrome has announced that it will phase them out.
Key logging is when a first or third party monitors the text that you type into a webpage before you hit the submit button. This technique has been used for a variety of purposes, including identifying anonymous web users by matching them to postal addresses and real names. There are other reasons for key logging, such as providing autocomplete functionality. Lokker cannot determine the intent behind the inspected website’s use of this technique.
To test whether this is happening on a given website, Lokker types predetermined text in all input fields but never clicks on a submit button. It monitors network requests to see if the data that was entered was sent to any servers.
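The check described above can be sketched as a search of captured requests for the text that was typed. This is a minimal illustration under stated assumptions: the function names and the sample value are hypothetical, and the set of encodings searched (plain, percent-encoded, base64) is an assumption rather than a description of what Lokker actually looks for.

```python
import base64
from urllib.parse import quote


def encoded_variants(text):
    """Forms in which typed text might appear in a request (assumed set)."""
    raw = text.encode("utf-8")
    return {
        text,                             # sent as-is
        quote(text, safe=""),             # percent-encoded
        base64.b64encode(raw).decode(),   # base64-encoded
    }


def find_key_logging(requests, typed_values):
    """Return (url, field) pairs for requests that carry typed text.

    `requests` is a list of (url, body) tuples captured before any submit
    button was clicked; `typed_values` maps field names to the text typed.
    """
    hits = []
    for url, body in requests:
        haystack = url + "\n" + (body or "")
        for field, value in typed_values.items():
            if any(variant in haystack for variant in encoded_variants(value)):
                hits.append((url, field))
    return hits
```

A request whose URL or body contains the typed value in any of these forms is reported together with the field it came from. The sketch says nothing about why the data was sent.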
Session recording is a technology that allows a third party to monitor and record a user’s behavior on a webpage, including mouse movements, clicks, scrolling, and anything the user types into a form, even without clicking submit.
Lokker monitors the network requests for specific URL substrings that appear only when session recording is taking place, according to a list created by researchers at Princeton University in 2017. Sometimes key logging is used as part of session recording. In those cases, Lokker would correctly report the session recorder as both key logging and session recording because it observed both, even though both tests are identifying the same script. Lokker accurately detects when a website loads these scripts—but companies typically record only a sample of website visits, so not every user is being recorded on every visit.
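The substring matching described above can be sketched as follows. The entries in `SESSION_RECORDING_SUBSTRINGS` are placeholders invented for illustration; the actual list compiled by the Princeton researchers is not reproduced here, and the function name is hypothetical.

```python
# Placeholder substrings for illustration only; the real list is not
# reproduced in this sketch.
SESSION_RECORDING_SUBSTRINGS = [
    "recorder.example/rec.js",
    "replay.example/capture",
]


def find_session_recorders(request_urls, substrings=SESSION_RECORDING_SUBSTRINGS):
    """Map each matching substring to the request URLs that contain it."""
    matches = {}
    for url in request_urls:
        for needle in substrings:
            if needle in url:
                matches.setdefault(needle, []).append(url)
    return matches
```

An empty result means no listed recorder script was requested during the visit. A non-empty result shows only that the script loaded, not that this particular visit was recorded.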
Fingerprinting describes a group of techniques that try to identify your browser without setting a cookie. They can identify you even if you block all cookies. Canvas fingerprinting is a type of fingerprinting that identifies users by drawing shapes and text on a user’s webpage and noting the minor differences in the way they are rendered. These differences in font rendering, smoothing, anti-aliasing, and other features are used by marketers and others to identify individual devices. All of the major internet browsers except Chrome try to counter canvas fingerprinting, either by not fulfilling data requests for scripts known to have engaged in the practice or by trying to standardize users’ fingerprints.
Lokker follows the methodology developed by Princeton University researchers to identify when the HTML canvas element is used for tracking purposes.
The parameters are:
- The canvas element’s height and width properties must not be set below 16px.
- The text must be written to the canvas with at least 10 distinct characters.
- The script should not call the save, restore or addEventListener methods of the rendering context.
- The script extracts an image with toDataURL or with a single call to getImageData that specifies an area with a minimum size of 16px × 16px.
It is possible that Lokker could falsely label a legitimate use of the canvas that matches these heuristics.
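The four parameters above can be expressed as a single predicate over a per-script summary of canvas activity. This is a minimal sketch: the `CanvasActivity` record and its field names are hypothetical, and counting every character (including spaces) toward the 10 distinct characters is an assumption.

```python
from dataclasses import dataclass, field

MIN_SIZE_PX = 16
MIN_DISTINCT_CHARS = 10
EXCLUDED_CALLS = {"save", "restore", "addEventListener"}


@dataclass
class CanvasActivity:
    """Hypothetical summary of what one script did with one canvas."""
    width: int
    height: int
    texts_written: list = field(default_factory=list)
    methods_called: set = field(default_factory=set)
    # One (width, height) entry per getImageData call.
    get_image_data_areas: list = field(default_factory=list)


def looks_like_canvas_fingerprinting(activity):
    """Apply the four heuristics; all of them must hold."""
    big_enough = activity.width >= MIN_SIZE_PX and activity.height >= MIN_SIZE_PX
    distinct_chars = len(set("".join(activity.texts_written)))
    enough_text = distinct_chars >= MIN_DISTINCT_CHARS
    no_excluded_calls = not (activity.methods_called & EXCLUDED_CALLS)
    areas = activity.get_image_data_areas
    extracts_image = "toDataURL" in activity.methods_called or (
        len(areas) == 1
        and areas[0][0] >= MIN_SIZE_PX
        and areas[0][1] >= MIN_SIZE_PX
    )
    return big_enough and enough_text and no_excluded_calls and extracts_image
```

A script that draws a long test string and reads it back with `toDataURL` satisfies all four conditions, while a charting script that calls `save` and `restore` does not. A legitimate script that happens to satisfy all four would still be flagged, which is the false-positive case noted above.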
Malware domains are domains that are known to generate spam, host botnets, create DDoS attacks, and generally contain or distribute malware. Many malware domain lists are freely available on the internet. Lokker uses data from maravento/blackweb to identify and categorize domains that are listed as malware domains.
Lokker checks all network requests against the EasyPrivacy list, which contains URLs and URL substrings that are known to be used for tracking. Lokker monitors the network activity for requests being made to these URLs and substrings. Lokker only records requests being made to third-party domains. It ignores any URL patterns in the EasyPrivacy list that match the first-party domain. Lokker looks up these third-party domains in DuckDuckGo’s Tracker Radar data set to find out who owns them, how prevalent they are, and what kinds of services they provide. Lokker only includes third-party domains that belong to the “Ad Motivated Tracking” categories defined in the Tracker Radar data set. Lokker also uses data from ghostery/whotracks.me for the categorization of ad trackers listed in scan results.
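The filtering steps above can be sketched as a small pipeline: drop first-party requests, keep those matching a tracking pattern, then keep only domains in the wanted category. This is a simplified illustration, not Lokker’s implementation. It treats list entries as plain substrings (the real EasyPrivacy list uses a richer filter syntax), and `domain_categories` is a hypothetical stand-in for a lookup built from the Tracker Radar data.

```python
from urllib.parse import urlsplit


def lookup_categories(host, domain_categories):
    """Return (domain, categories) for the host or its closest listed parent."""
    labels = host.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in domain_categories:
            return candidate, domain_categories[candidate]
    return None, set()


def find_ad_trackers(request_urls, page_domain, tracking_substrings,
                     domain_categories, wanted_category="Ad Motivated Tracking"):
    """Return third-party domains that match a tracking pattern and category."""
    found = set()
    for url in request_urls:
        host = (urlsplit(url).hostname or "").lower()
        # Ignore first-party requests, including subdomains of the page.
        if host == page_domain or host.endswith("." + page_domain):
            continue
        # Keep only requests that match a known tracking substring.
        if not any(pattern in url for pattern in tracking_substrings):
            continue
        domain, categories = lookup_categories(host, domain_categories)
        if domain and wanted_category in categories:
            found.add(domain)
    return sorted(found)
```

The number of domains returned corresponds to the ad tracker count for the page under these assumptions.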
The Facebook pixel is a piece of Facebook code that allows other websites to target their visitors later with ads on Facebook. Common actions that can be tracked by the pixel include viewing a page or specific content, adding payment information, or making a purchase.
Lokker looks for network requests from the site going to Facebook and looks in the URL query parameters for data that matches the schema of what is described in the documentation for Facebook’s pixel. It looks for three different types of data: “standard events,” “custom events” and “advanced matching.”
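The query-parameter inspection described above can be sketched as follows. Several details here are assumptions rather than statements about Lokker’s implementation: that the event name travels in an `ev` parameter, that advanced-matching fields appear as `ud[...]` parameters, and that `STANDARD_EVENTS` (a partial list) is enough to separate standard events from custom ones. The function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

# Partial list of standard event names (assumption for this sketch).
STANDARD_EVENTS = {"PageView", "ViewContent", "AddToCart", "AddPaymentInfo", "Purchase"}


def classify_pixel_request(url):
    """Classify pixel data in a request URL, or return None if not Facebook."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not (host == "facebook.com" or host.endswith(".facebook.com")):
        return None
    result = {"standard_events": [], "custom_events": [], "advanced_matching": []}
    # parse_qsl percent-decodes names, so 'ud%5Bem%5D' becomes 'ud[em]'.
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "ev":
            bucket = "standard_events" if value in STANDARD_EVENTS else "custom_events"
            result[bucket].append(value)
        elif key.startswith("ud["):
            result["advanced_matching"].append(key)
    return result
```

Only the parameter names are collected for advanced matching in this sketch; the values are not inspected or stored.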
Google Analytics’ “Remarketing Audiences”
Google Analytics is the most popular website analytics platform in use today. While most of the functionality of this service is to provide developers and website owners with information on how their audience is engaging with their website, the tool also allows the website to make custom audience lists based on user behavior and then target ads to those visitors across the internet using Google Ads and Display & Video 360.
Lokker examines inspected sites for the presence of the tool, not how it is used. Lokker looks for network requests from the inspected site going to a URL beginning with “stats.g.doubleclick” that also contains the UA Google account identifier prefix. This is described in more detail in Google Analytics developer documentation.
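The presence check described above can be sketched as a scan of request URLs. This is a minimal illustration with two assumptions: “beginning with” is read as the request hostname starting with `stats.g.doubleclick`, and the account identifier is matched with the common `UA-<digits>-<digits>` shape. The function name is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumed shape of a UA-prefixed account identifier, e.g. UA-12345-1.
UA_ID_PATTERN = re.compile(r"UA-\d+-\d+")


def ga_remarketing_ids(request_urls):
    """Return UA identifiers seen in requests to stats.g.doubleclick hosts."""
    ids = set()
    for url in request_urls:
        host = (urlsplit(url).hostname or "").lower()
        if host.startswith("stats.g.doubleclick"):
            ids.update(UA_ID_PATTERN.findall(url))
    return sorted(ids)
```

A non-empty result indicates only that the feature is present on the page. It says nothing about how the site uses the resulting audience lists.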
Lokker’s web page privacy scans are intended to provide transparency and high-level summary insights. We intend to reveal how potentially personally identifiable user data is being accessed and shared via third-party applications through web browsers. We hope companies will learn from these scans and take appropriate steps to identify and correct any unintended data leaks our scans may indicate.
Lokker’s analysis is limited by four main factors:
- It is a simulation of user behavior, not actual user behavior, and could thus trigger different surveillance responses. For instance, an automated request might trigger more fraud detection but fewer ads.
- The inspected website could be surveilling user activities for benign purposes. For instance, canvas fingerprinting is used for fraud prevention because it can identify a device. And key logging can be used to provide autocomplete functionality. Lokker does not attempt to identify the intent of any tracking technology it finds. Nor can Lokker determine exactly how a website uses the data it collects on a user when loading session recording scripts and monitoring user behavior, such as mouse movements and keystrokes.
- False positives (possible with canvas fingerprinting): Occasionally, legitimate uses of the HTML canvas match the heuristics Lokker uses to identify canvas fingerprinting.
- False negatives: The stack tracing technique used by Lokker might incorrectly attribute a call to a monitored window API method to a library that a script includes, rather than to the script that triggered the call.
Given the dynamic nature of web-based technology, it is also possible that some of these tests will become out-of-date over time, and new legitimate-use cases for the techniques Lokker flags could emerge that would not be listed in the tool’s caveats. For this reason, Lokker’s results should not be taken as the final word on potential privacy violations by a given website. Rather, they should be treated as an initial automated inspection that requires further investigation before a definitive claim can be made.