Tweasel update #1: Building libraries and automating setup

With the tweasel project, we want to build a web app that detects privacy violations in mobile apps on Android and iOS. Users can select an app from the app stores and we will analyze its network traffic and consent dialogs. We will show a report to the user and offer to generate a complaint under the GDPR and ePrivacy Directive with the collected evidence. Lorenz and I are working on this thanks to NLnet funding.

To keep you up to date on everything we’re doing, we’ll start doing biweekly update posts, where we go into the progress we’ve made and features we’ve added to our tools and libraries, but also any interesting technical challenges we’ve solved. This first one is going to be a bit longer, since we have some catching up to do. Strap in.

Appstraction

Appstraction is an abstraction layer for common instrumentation functions on Android and iOS. It allows you to install, uninstall, start, stop apps and configure their permissions, as well as manage device settings like emulator snapshots, clipboard, proxy, and certificates. Appstraction can also be used for purposes other than mobile privacy.

We released the first version of appstraction at the end of March. This initial version was based on the platform layer I wrote for my master’s thesis, but we already added quite a lot of useful improvements, in addition to thorough documentation:
- We have a capability system that allows users to disable functionality that requires special capabilities like root rights or Frida if they don’t want or need those features.
- You can granularly choose which permissions to grant or deny.
- I added support for managing proxies and certificate authorities on Android and iOS.
  
  On Android, everything is completely automated and works on both emulators and physical devices. System CAs are stored in /system/etc/security/cacerts, but since Android 10, /system is mounted read-only and cannot be written to even with root rights. To circumvent that, we’re using a clever workaround by HTTP Toolkit: While writing to /system/etc/security/cacerts/ directly is not possible, you can mount a tmpfs over that directory, which you can then write to.
  Instead of a global system proxy, you can use WireGuard, which regular apps can’t get around and which allows you to precisely filter which apps to tunnel. WireGuard is even automatically installed and configured on the device if enabled.
  
  Meanwhile on iOS, the method for programmatically configuring CAs through an SQLite database I reverse-engineered is incomplete and requires a one-time manual action by the user. Lorenz is currently investigating a different method using configuration profiles.
  We currently also only support setting a system proxy on iOS but no WireGuard (yet?).
- I also added functions to configure an app’s battery optimization settings (currently Android-only), (force-)stop apps (Android and iOS), and to check whether an app is already installed on the device (Android and iOS).
- Split APKs often come with splits for various architectures. Trying to install these on a device that doesn’t support some of these architectures will fail. To make it easier for the user, we automatically filter the splits and only install the compatible ones.
- There is barely any device setup necessary on Android as we’re automatically installing Frida if necessary.
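
To illustrate the split filtering mentioned in the list above, here is a minimal sketch. It is not appstraction's actual implementation: the helper names are made up for this example, and it assumes that architecture splits follow the common `split_config.<abi>.apk` naming and that the device's ABI list comes from something like `getprop ro.product.cpu.abilist`.

```typescript
// Hypothetical helpers for illustration only, not appstraction's real API.

/** ABI names as they commonly appear in split file names (`-` replaced by `_`). */
const KNOWN_ABIS = ['armeabi', 'armeabi_v7a', 'arm64_v8a', 'x86', 'x86_64', 'mips', 'mips64'];

/** Returns the ABI a split targets, or undefined if it is not an architecture split. */
const abiOfSplit = (fileName: string): string | undefined => {
    // Assumption: config splits are named like `split_config.arm64_v8a.apk`.
    const match = /config\.([a-z0-9_]+)\.apk$/i.exec(fileName);
    if (!match) return undefined;
    const candidate = match[1].toLowerCase();
    return KNOWN_ABIS.includes(candidate) ? candidate : undefined;
};

/**
 * Keeps the base APK, all non-architecture splits (language, screen density, ...)
 * and only those architecture splits that the device supports.
 */
const filterCompatibleSplits = (fileNames: string[], deviceAbis: string[]): string[] => {
    // Device ABIs use dashes (`arm64-v8a`), split names use underscores.
    const supported = new Set(deviceAbis.map((abi) => abi.toLowerCase().replace(/-/g, '_')));
    return fileNames.filter((fileName) => {
        const abi = abiOfSplit(fileName);
        return abi === undefined || supported.has(abi);
    });
};
```

Splits that don't look like architecture splits are deliberately kept, so an unknown naming scheme degrades to "install everything" rather than silently dropping files.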
Version 0.2.0 brought support for Windows. This didn’t require any major changes and was mostly just annoying (different line endings, no grep, ideviceinstaller has a different CLI). Also, we are now using NodeSSH, which removes sshpass as a dependency.
Since then, we’ve made a bunch of changes that will be released soon:
- Lorenz added support for installing apps from .xapk, .apkm, and .apks files, which are used by the common APK download portals APKPure and APKMirror. With that, we also support installing .obb files alongside .apks. While deprecated, some older apps (especially games) still need those to work correctly. I added support for parsing the metadata of these custom bundle formats.
- We are trying to get rid of as many setup steps as possible. To that end, I automated the installation of the Android developer tools. For that, I created a separate library called andromatic that can also be used by others. With it, you can just call tools like adb from Node.js (including requesting a specific version) and andromatic will make sure it is installed.
- We also did the same thing for our Python dependencies. Lorenz first implemented a postinstall script for this in cyanoacrylate, but we also needed the functionality in appstraction. So I created autopy, a library for depending on Python packages from JavaScript that will automatically manage a venv and download Python and pip dependencies. It uses the very handy static Python builds provided by python-build-standalone.
- Still on the subject of fewer dependencies, I got rid of OpenSSL. We only needed that to calculate the subject_hash_old of the CAs we want to install on Android. With the help of Bing Chat, I was able to implement that in JS.
- Lorenz is investigating switching from libimobiledevice to pymobiledevice3 or go-ios. These have more features (most crucially, the ability to install configuration profiles) and we could install them automatically (my quick attempt at doing that with libimobiledevice was not successful).
- Lorenz has also started work on automating the setup of iOS devices, installing all the tweaks we need using apt through an SSH session.
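
For readers curious about the subject_hash_old calculation mentioned in the list above: to my understanding, OpenSSL computes it as the MD5 digest of the DER-encoded subject name, with the first four digest bytes read as a little-endian 32-bit integer. The sketch below assumes exactly that and is not the code we ship. The function names are made up, and extracting the subject's DER bytes from the certificate (which requires ASN.1 parsing) is left out.

```typescript
import { createHash } from 'node:crypto';

/**
 * Sketch of OpenSSL's legacy subject hash (assumption: MD5 over the DER-encoded
 * subject name, first four bytes interpreted as a little-endian uint32).
 * `subjectDer` must be the raw DER bytes of the certificate's subject field.
 */
const subjectHashOld = (subjectDer: Uint8Array): string => {
    const digest = createHash('md5').update(subjectDer).digest();
    // Print as eight lowercase hex digits, like `openssl x509 -subject_hash_old` does.
    return digest.readUInt32LE(0).toString(16).padStart(8, '0');
};

/** Android expects system CA files to be named `<subject_hash_old>.0`. */
const androidCaFileName = (subjectDer: Uint8Array): string => `${subjectHashOld(subjectDer)}.0`;
```

The hash itself is trivial once you have the subject bytes. Getting those bytes out of a PEM or DER certificate is the part that takes actual parsing work.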

Cyanoacrylate

Cyanoacrylate is a toolkit for large-scale automated traffic analysis of mobile apps on Android and iOS. It uses mitmproxy to capture the HTTP(S) traffic of apps in HAR format and appstraction to instrument physical devices, or emulators for Android. Cyanoacrylate handles the management of certificate authorities and WireGuard mitmproxy setup automatically. It is designed to analyze the tracking behavior of mobile apps.

The first version of cyanoacrylate was released at the end of March. It featured a fully automatic mitmproxy setup, Android emulator control and Python environment installation. We are using the har_dump.py script to export the traffic from mitmproxy as a .har file and Lorenz wrote a mitmproxy script to communicate its events to JavaScript. This version only supported Android.
In version 0.2.0, we make use of WireGuard’s feature to only tunnel traffic of specific apps and allow you to configure the WireGuard app filtering in the options. By default, if you do a traffic collection on an app analysis, we only collect that app’s traffic. That way, you don’t have to worry about filtering out background traffic anymore.

I implemented this by manipulating the internal config files of the WireGuard app on Android.
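
To give an idea of what such a config manipulation can look like, here is a rough sketch. It assumes the WireGuard Android app's `IncludedApplications` and `ExcludedApplications` keys in the `[Interface]` section of a tunnel config. The function name is hypothetical and this is not necessarily how cyanoacrylate does it internally.

```typescript
/**
 * Returns a copy of a WireGuard tunnel config in which only the given apps are tunneled.
 * Hypothetical helper for illustration. Assumes the `IncludedApplications` key as
 * understood by the WireGuard Android app.
 */
const withIncludedApplications = (config: string, appIds: string[]): string => {
    const result: string[] = [];
    let inInterface = false;

    for (const line of config.split('\n')) {
        const trimmed = line.trim();

        if (trimmed.startsWith('[')) {
            // A new section starts, remember whether it is the [Interface] section.
            inInterface = trimmed.toLowerCase() === '[interface]';
            result.push(line);
            if (inInterface && appIds.length > 0) result.push(`IncludedApplications = ${appIds.join(', ')}`);
            continue;
        }

        // Drop any previous app filter so that the new one is the only one in effect.
        if (inInterface && /^(included|excluded)applications\s*=/i.test(trimmed)) continue;

        result.push(line);
    }

    return result.join('\n');
};
```

Calling it with the ID of the app under analysis, e.g. `withIncludedApplications(config, ['com.example.app'])`, yields a config that tunnels only that app's traffic.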
In version 0.3.0, Lorenz implemented support for traffic collection on iOS devices. This currently uses an HTTP(S) proxy (unlike on Android, where we use WireGuard) and cannot filter the traffic of individual apps. Instead, we currently always record the entire system’s traffic.
Finally, with version 0.4.0, we added Windows support for cyanoacrylate and simplified the setup a little.

TrackHAR and trackers.tweasel.org

TrackHAR is a library for detecting tracking data transmissions from traffic in HAR format. It uses custom adapters to handle different tracking endpoints and extract the transmitted data. TrackHAR also aims to produce outputs that can be used to generate human-readable documentation of the tracking data. This documentation is hosted at trackers.tweasel.org, a wiki that explains how TrackHAR recognizes and decodes the requests, and provides some sample information from research data.

TrackHAR had its first release in April. With that, we have laid down the design and schema for the adapters and implemented the basic functionality. Most of the adapters from my master’s thesis are ported over but have received only limited additional testing and checking so far. Also, the documentation for the containedDataPaths is still lagging behind what we are aiming for.
The adapter-based matching approach TrackHAR primarily uses necessarily means that a significant portion of requests will be unprocessed (as we can’t write an adapter for every possible endpoint, especially developer-/app-specific ones). To alleviate that somewhat, I implemented indicator matching as an (optional) fallback. With indicator matching, the user can provide an object that maps data types to honey data like this:
```
{
    localIp: ['10.0.0.2', 'fd31:4159::a2a1'],
    idfa: '6a1c1487-a0af-4223-b142-a0f4621d0311'
}
```
TrackHAR then searches for these values in the requests. In addition to string matching in plain text, we also support searching in base64- and URL-encoded text. Support for additional encodings and hashes is planned.
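
Conceptually, the matching boils down to something like the following sketch. It is a simplified illustration and not TrackHAR's actual code: the type and function names are made up, and the base64 search only finds values that were encoded on their own, not values embedded at an arbitrary offset inside a larger encoded blob.

```typescript
// Hypothetical types and names for illustration only, not TrackHAR's real API.
type Indicators = Record<string, string | string[]>;
type Encoding = 'plain' | 'base64' | 'url';
type IndicatorMatch = { dataType: string; value: string; encoding: Encoding };

/** Searches the text of a request (URL, headers, body, ...) for the given honey data. */
const findIndicators = (requestText: string, indicators: Indicators): IndicatorMatch[] => {
    const matches: IndicatorMatch[] = [];

    for (const [dataType, valueOrValues] of Object.entries(indicators)) {
        const values = Array.isArray(valueOrValues) ? valueOrValues : [valueOrValues];

        for (const value of values) {
            const needles: [Encoding, string][] = [
                ['plain', value],
                // Padding is stripped, as it may be omitted by the sender.
                ['base64', Buffer.from(value, 'utf-8').toString('base64').replace(/=+$/, '')],
                ['url', encodeURIComponent(value)],
            ];

            for (const [encoding, needle] of needles) {
                // Skip encodings that don't change the value to avoid duplicate matches.
                if (encoding !== 'plain' && needle === value) continue;
                if (requestText.includes(needle)) matches.push({ dataType, value, encoding });
            }
        }
    }

    return matches;
};
```

With the honey data from the example above, a request containing the URL-encoded IPv6 address would yield a `localIp` match with `encoding: 'url'`.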
In April, we also launched Lorenz' initial implementation of trackers.tweasel.org. This documentation is generated completely automatically from the adapters in TrackHAR. We are even creating a human-readable description of the decoding steps. I also included static example values of the actual data transmitted to the tracking endpoints based on the data from my master’s thesis. Ultimately, we want to have a constantly-updated public database of tracking requests and dynamically list examples of observed values for each data path.
We hope that this will become a valuable resource for people who want to dig deeper into tracking.

CLI

Tweasel CLI is a command-line tool that allows you to instrument and analyze mobile apps and their traffic using the tweasel project libraries. You can record the traffic of an Android or iOS app in HAR format (based on cyanoacrylate), and detect tracking data transmissions from the traffic (based on TrackHAR). Tweasel CLI provides a convenient wrapper around these libraries for common use cases, so you don’t have to write any code.

In April, we also released the first version of our CLI (the implementation of which was more painful than it should have been…). This initial release supports two commands:

With record-traffic, you can record the traffic of an Android or iOS app in HAR format. Through command line arguments, you can configure various aspects of the traffic collection like a timeout and whether to record only the traffic of one app or the entire system.

With detect-tracking, you can then detect tracking data transmissions from traffic in HAR format (whether recorded with a tweasel tool or otherwise). The traffic in the specified HAR file will be analyzed using TrackHAR. The detected tracking data can be output as JSON or as a human-readable table.
Since then, I made two more changes that are not released yet (both requested by Malte):
- I’ve implemented an “interactive timeout”. If the user doesn’t provide an explicit --timeout flag, we wait until they manually stop the traffic recording. I think the CLI is more likely to be used for manual analysis, so this makes more sense as a default.
  
  I also added support for multiple traffic collections. With a new --multiple-collections flag, after each time the user stops an interactive timeout, we ask them to enter a name to start a new traffic collection or leave it empty to stop. This is really useful for analyzing apps with consent dialogs. This way, you can easily do a manual analysis and record the traffic from before and after an interaction with the consent dialog separately.
- The CLI now also displays the “setting up” steps more granularly.

Everything else

Just at the end of last year, we gave a talk at the FireShonks year-end event. We talked about how mobile apps track us and what data they send to third parties. We showed how we analyzed thousands of apps automatically and what we found out. We also explained the legal framework of tracking in the EU and why most apps and consent dialogs don’t comply with it. The talk was recorded (it was in German but there is an English live dub available).
We also have a parse-tunes library for fetching select data on iOS apps from the Apple App Store via undocumented internal iTunes APIs. I wrote a Mastodon thread on that already back in January.
Our explicit goal is to make our libraries and tools not just for us, but also for other NGOs, data protection authorities, researchers, etc. In April, we gave a presentation before the tech advisory board of the European Data Protection Board (EDPB) about our results and the tools we developed for mobile app tracking research. The meeting was not recorded but our slides are of course available.
We’ll also be giving a training course on how to use our tools for a German authority next month. And we’re already in contact with two other organizations fighting against tracking to work together on this issue. If you’re also interested in collaborating, please reach out! We are more than happy to help you use our tools, implement feature requests, etc.

written by Benjamin Altpeter
on 2023-05-31 at 13:37
licensed under: Creative Commons Attribution 4.0 International License
