Open Source Analytics for Web & Mobile
2024-11-26
Our Requirements & Priorities
Google Analytics has dominated because it is very simple to set up and gives site owners access to a huge amount of data, all at the expense of handing your users' data over to Big Tech. Not only does this harm your users' privacy, it also encourages a "collect everything" mentality that goes against GDPR's data minimisation principle, and transferring user data around the world for the purposes of surveillance capitalism does appear to be illegal.
So the question is: what do we, as system builders, really need to collect to get the business insights required to build our products without invading our users' right to privacy? (Assuming you're not in the business of building tracking tools, of course.)
For our projects, the key features required are:
Data sovereignty
Collect data locally within our own systems, and process it only for the purpose of product improvement.
Allow input from various sources
Working across the web, mobile, and backend systems, we want to ensure we have the data needed for trend analysis & decision-making in a central location.
There is little value in just reviewing server logs; we also need rich information from client interactions to understand how customers use our products.
Open source
Using open source software helps ensure long-term stability for our projects. Even if the vendor is acquired, changes direction, or changes its pricing model after we have built on its platform, we can keep running the existing code ourselves.
Privacy respecting
In our opinion, GDPR and the EU Digital Services Act are both great protections for online users, but they do create a few extra headaches for developers implementing systems. The best part about respecting your users' privacy by default is that there is no reason to include those really annoying cookie banners, which is great, because we hate cookie banners!
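Cookie-free tools still need some way to count unique visitors. Plausible's data policy describes a daily identifier: it hashes a salt that rotates every 24 hours together with the website domain, the visitor's IP address and the User-Agent, then deletes the old salts, so visitors cannot be linked across days. The TypeScript sketch below illustrates that idea. It is not Plausible's code. The choice of SHA-256, the `|` separator, and the names `DailySalt` and `visitorId` are our own assumptions.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical salt store: one random salt per UTC day. When the day changes
// the previous salt is overwritten, so old IDs can no longer be recomputed.
class DailySalt {
  private day = "";
  private salt = "";

  current(now: Date = new Date()): string {
    const today = now.toISOString().slice(0, 10); // e.g. "2024-11-26"
    if (today !== this.day) {
      this.day = today;
      this.salt = randomBytes(16).toString("hex");
    }
    return this.salt;
  }
}

// Derives an anonymous visitor ID that is only stable within one day.
// No cookie is set, and the raw IP / User-Agent never need to be stored.
function visitorId(salt: string, domain: string, ip: string, userAgent: string): string {
  return createHash("sha256")
    .update([salt, domain, ip, userAgent].join("|")) // separator avoids ambiguous concatenations
    .digest("hex");
}
```

Two requests from the same IP and browser on the same day map to the same ID. On the next day, the fresh salt produces an unrelated one.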
Our Analytics Platform Top Picks
Plausible CE
https://github.com/plausible/community-edition/
Modern, GDPR-compliant, No Cookies
License: AGPL (copyleft)
Plausible primarily push their managed SaaS offering, but they also have a self-hosted version called Community Edition. It uses ClickHouse under the hood, an extremely robust DBMS that has been open source since 2016. These are seriously strong foundations for the project and instil confidence that it can scale to very impressive numbers. To get started, though, you just need 2 GB of RAM and Docker.
They have an Android SDK but no iOS SDK, although there is documentation for posting events directly to the API endpoints.
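To post an event directly from an iOS app or a backend service, you send a JSON body to `/api/event`. The body carries `domain`, `name` and `url`, plus optional `referrer` and `props`. The request also needs a `User-Agent` header and, when sent from a server, an `X-Forwarded-For` header carrying the visitor's IP. The TypeScript sketch below builds that request. The instance URL, the names `buildPlausibleRequest` and `sendPlausibleEvent`, and the example values are hypothetical. The same request would translate directly to Swift's `URLSession`.

```typescript
interface PlausibleEvent {
  domain: string; // site as configured in Plausible, e.g. "example.com"
  name: string; // "pageview" or a custom event name
  url: string; // full URL of the page or screen being recorded
  referrer?: string;
  props?: Record<string, string | number | boolean>;
}

interface EventRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the POST request for a (self-hosted) Plausible instance.
// Meant for server-side or native code: browsers won't let scripts set User-Agent.
function buildPlausibleRequest(
  baseUrl: string,
  event: PlausibleEvent,
  userAgent: string,
  clientIp?: string,
): EventRequest {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    "User-Agent": userAgent,
  };
  // Forward the real visitor IP when relaying from a backend.
  if (clientIp) headers["X-Forwarded-For"] = clientIp;
  return {
    url: new URL("/api/event", baseUrl).toString(),
    init: { method: "POST", headers, body: JSON.stringify(event) },
  };
}

// Sends the request with the global fetch (Node 18+ or any modern runtime).
async function sendPlausibleEvent(req: EventRequest): Promise<void> {
  const res = await fetch(req.url, req.init);
  if (!res.ok) throw new Error(`Plausible returned HTTP ${res.status}`);
}
```

Keeping request construction separate from sending makes the payload easy to unit test without a running Plausible instance.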
Plausible is very easy to install. If you can let the Docker Compose setup take over ports 80 and 443 on the host (i.e. the host isn't already running your web server), then the default CE setup even provisions TLS certificates for you!
Plausible has a very simple and easy setup process for self-hosting
Setting up the tracking script is a breeze. The initial JavaScript is only 300 bytes and everything else is deferred, so there is almost no impact on page load.
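Because the script loads deferred, page code may try to record a custom event before the script has arrived. Plausible's docs handle this with a one-line stub. The stub defines `window.plausible` as a function that pushes each call onto `window.plausible.q`, and the real script replays that queue when it loads. The sketch below renders the same pattern in TypeScript against a plain `host` object instead of `window`, so it also runs outside a browser. `installPlausibleStub`, the `Signup` event and its `plan` prop are hypothetical.

```typescript
type Props = Record<string, string | number | boolean>;
type PlausibleFn = ((event: string, options?: { props?: Props }) => void) & {
  q?: Array<[string, { props?: Props } | undefined]>;
};

// Stand-in for `window`; in a browser you would pass `window` itself.
interface Host {
  plausible?: PlausibleFn;
}

// Installs a queueing stub unless the real script (or a stub) is already there.
function installPlausibleStub(host: Host): PlausibleFn {
  const existing = host.plausible;
  if (existing) return existing;
  const stub: PlausibleFn = (event, options) => {
    // Record the call; the deferred script drains `q` once it has loaded.
    (stub.q = stub.q ?? []).push([event, options]);
  };
  host.plausible = stub;
  return stub;
}
```

Usage might look like `installPlausibleStub(window)("Signup", { props: { plan: "pro" } })`, which is safe to call whether or not the tracking script has finished loading.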
Plausible has a nice modern UI, and shows all the data you would expect
Overall, Plausible is an extremely well implemented platform that should meet most use cases. 10/10 from us!
Countly Lite
Modern, Tons of SDKs, Detailed tracking and crash reports
License: AGPL v3 (copyleft)
Countly is a much more feature-complete, real-time event tracking platform. It strays further into user tracking and is less about collecting simple metrics. Luckily, the Lite version, which is the one we are interested in because it's self-hosted and AGPL-licensed, does not include most of the user tracking features, as their comparison chart shows.
Countly can be installed in various ways, but we are sticking to the Countly Docker installation instructions so that it's directly comparable to the Plausible setup we are testing.
Lightweight instructions are available for getting started with Docker Compose
Their Docker Compose file is initially far more complicated than the one Plausible provides, but if you are familiar with the Compose file format it's easy to follow and shouldn't require any changes for a typical installation.
On first start, the drawbacks of a Node & MongoDB based system become clear: Node runs loadCitiesInDb.js on a single thread, and in my case it took a very long time before the server came up. At last check, it had been running for 24 minutes! Another downside is that, unlike Plausible, no TLS certificates are configured by default. So while this config is easier for more technical users to modify, it's harder for someone to just spin it up and get started.
Because the Countly server does not serve the client-side script, you also run into the problem of the JavaScript being blocked by uBlock Origin. You can of course get around this by serving the script yourself through your reverse proxy, but that's another element to configure, which again suggests that Countly is better suited to larger projects or cases where specific metrics are required.
Once it's up and running, it works as expected and provides all the information you would need, and more.
Countly has a nice modern UI, and shows all the data you would expect
GoatCounter
Simple, Easy to set up, Free and open source, Community hosted!
License: EUPL (copyleft, attribution)
On the "Why" page, Martin, GoatCounter's author, describes the situation very accurately: either you let Google have all your data and aggressively track all of your visitors, or you pay for business-level analytics platforms.
GoatCounter is not only free and open source, it's also available as a free hosted service funded by donations, so you can be up and running in literally two minutes.
Sign up, copy and paste the JavaScript into your HTML, and you are done!
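GoatCounter's documentation also describes a no-JavaScript option. You request its `/count` endpoint as an image and pass the page path in a `p` query parameter. A small helper for building that URL might look like the sketch below. The site code `mysite` and the function name are hypothetical, and the endpoint's other optional parameters are left out, so check the GoatCounter integration docs before relying on it.

```typescript
// Builds a GoatCounter tracking-pixel URL for pages that can't run JavaScript.
// `code` is your GoatCounter site code; `path` is the page path to record.
function goatCounterPixelUrl(code: string, path: string): string {
  const url = new URL(`https://${code}.goatcounter.com/count`);
  url.searchParams.set("p", path); // URL-encodes the path for the query string
  return url.toString();
}
```

The resulting URL would go in the `src` of an `<img>` tag, typically inside `<noscript>`. If the domain is on a blocklist, the pixel is blocked just like the script.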
One downside of using the hosted service is that, because its domain is identified as a tracker, it may not get accurate results when visitors use ad blockers. In my testing, the domain was blocked by uBlock Origin.
GoatCounter was unable to load the visitor tracking script because it's blocked by uBlock Origin
This is another strong tick in the box for self-hosting your own analytics, which GoatCounter also supports.
Overall, I would say this is a great platform, and Martin is providing a real community service by running it on a donation basis. It's just unfortunate that it gets lumped in with the horrible tracking practices of Big Tech, which makes it hard to get true insights without self-hosting. Perhaps this will matter less now that Chrome, used by most internet users, is phasing out the Manifest V2 extension API that uBlock Origin depends on. It may mostly depend on your target audience.
GoatCounter showing all the information you would need for a small blog, or lightweight website
Other contenders
PostHog FOSS
https://github.com/PostHog/posthog-foss
PostHog also runs on ClickHouse and might be worth considering if you want more detailed user tracking. For us, right now, this is a level of user tracking we don't need. As we discussed at the top of the article, you should strive to collect only the data you need to keep improving your products.
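One practical way to enforce "collect only the data you need" in your own code is an explicit allowlist applied before any event leaves the client or backend. The sketch below is tool-agnostic. The allowed property names and the `minimiseProps` helper are hypothetical examples.

```typescript
type Props = Record<string, string | number | boolean>;

// Hypothetical allowlist: the only event properties we have decided we need.
const ALLOWED_PROPS = new Set(["plan", "feature", "appVersion"]);

// Drops every property that isn't explicitly allowed (emails, user IDs, ...),
// so new fields can't start flowing into analytics by accident.
function minimiseProps(props: Props): Props {
  const out: Props = {};
  for (const [key, value] of Object.entries(props)) {
    if (ALLOWED_PROPS.has(key)) out[key] = value;
  }
  return out;
}
```

An allowlist fails safe. A property someone adds later is silently dropped until the team decides it is actually needed.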
Open Web Analytics
https://www.openwebanalytics.com/
OWA is PHP-based and seems mainly aimed at WordPress. If you are using WordPress, it is probably a good option, but I discounted it because the options above better suit my taste and requirements.
Using a proxy to avoid blockers
Plausible have written a nice overview of why and how ad blockers block analytics. A simple solution is to run a proxy on your own domain that forwards requests to the tracker, instead of loading everything from the tracker's domain. The only downside of this approach is that you still have to host and manage the proxy. So while it is certainly an option for all the providers listed, it seems to somewhat defeat the point, as at that stage you could perhaps just self-host the platform anyway.
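To show how small such a proxy can be, here is a minimal Node.js sketch in TypeScript. It exposes first-party paths and forwards them to the tracker's script and event endpoints, which for Plausible are `/js/script.js` and `/api/event`. The upstream origin, the `/stats/...` paths, the port and the function names are all assumptions. In production you would more likely do this in your existing reverse proxy.

```typescript
import { createServer, IncomingMessage, ServerResponse } from "node:http";

// Assumption: Plausible's hosted service as upstream; swap in your tracker's origin.
const UPSTREAM = "https://plausible.io";

// Hypothetical first-party paths, mapped to the tracker's real paths.
const ROUTES = new Map<string, string>([
  ["/stats/js/script.js", "/js/script.js"],
  ["/stats/api/event", "/api/event"],
]);

// Returns the upstream URL for a first-party path, or null if it isn't proxied.
function upstreamFor(pathname: string): string | null {
  const target = ROUTES.get(pathname);
  return target === undefined ? null : new URL(target, UPSTREAM).toString();
}

// Forwards one request to the tracker and sends the reply back to the browser.
async function handle(req: IncomingMessage, res: ServerResponse): Promise<void> {
  const { pathname } = new URL(req.url ?? "/", "http://localhost");
  const target = upstreamFor(pathname);
  if (target === null) {
    res.writeHead(404).end();
    return;
  }

  // Event payloads are small JSON documents, so buffering them is fine.
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);
  const body = Buffer.concat(chunks).toString("utf8");

  const upstream = await fetch(target, {
    method: req.method ?? "GET",
    headers: {
      "Content-Type": req.headers["content-type"] ?? "application/json",
      "User-Agent": req.headers["user-agent"] ?? "",
      // Pass the visitor's IP on; otherwise every visitor looks like the proxy.
      // If this proxy itself sits behind another one, forward that proxy's header instead.
      "X-Forwarded-For": req.socket.remoteAddress ?? "",
    },
    body: req.method === "POST" ? body : undefined,
  });

  res.writeHead(upstream.status, {
    "Content-Type": upstream.headers.get("content-type") ?? "application/octet-stream",
  });
  res.end(Buffer.from(await upstream.arrayBuffer()));
}

// Starts the proxy; call this from your entry point (nothing runs on import).
function startProxy(port = 8080): void {
  createServer((req, res) => {
    handle(req, res).catch(() => {
      if (!res.headersSent) res.writeHead(502);
      res.end();
    });
  }).listen(port);
}
```

With this running, the page would load the script from `/stats/js/script.js`. Plausible's proxy documentation describes a `data-api` attribute on the script tag for pointing events at a custom path such as `/stats/api/event`.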