GoAccess and real website stats

Does Google Analytics show the same stats as the logs from your server? Of course it doesn’t. A lot of people (including me) use various privacy tools and ad or JavaScript blockers. As the admin of a website, I would like to know how many visitors I really have. It gets even harder when your website also has an onion address: the connection is anonymized, and Tor Browser can block scripts entirely, depending on its security level. On top of that, using external analytics tools feeds large corporations with data on user behavior. I’m not saying they are bad, but they don’t show you everything.

weblogs

So it doesn’t matter whether I use Google Analytics or an open source alternative like Matomo (previously Piwik), Open Web Analytics or any other: I have to rely on my own logs. I read an article recently, and someone in the comments reminded me about GoAccess. I had played with it in the past and completely forgotten about it. Time to turn it back on and check for discrepancies.

I am on Debian, and I decided to use the official GoAccess repository (the version in the Debian repo is outdated).

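# Import the signing key into a dedicated keyring (apt-key is deprecated).
wget -O - https://deb.goaccess.io/gnugpg.key | gpg --dearmor | sudo tee /usr/share/keyrings/goaccess.gpg >/dev/null
# Register the repository for the current Debian release.
echo "deb [signed-by=/usr/share/keyrings/goaccess.gpg arch=$(dpkg --print-architecture)] https://deb.goaccess.io/ $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/goaccess.list
sudo apt-get update
sudo apt-get install goaccess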

On the GitHub page you can find other methods to install GoAccess on various systems. The official documentation is also quite nice.

One simple command to check the logs:

sudo goaccess /var/log/nginx/access.log /var/log/nginx/access.log.1 --log-format=COMBINED

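Just provide the path to the access.log defined in your web server configuration (Apache, Nginx, etc.). The terminal view already updates in real time. Add the -c parameter if you would rather have GoAccess prompt for the log format at startup instead of passing --log-format.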

GoAccess Console Stats
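For reference, COMBINED is the combined log format that both Apache and Nginx can write. Here is a minimal sketch of what a single line holds, split into fields with awk. The log line is made up for illustration (IP, URL and user agent are invented), and parse_combined is just a name I picked.

```shell
# A made-up access log line in combined format (not from a real server).
line='203.0.113.7 - - [12/Mar/2021:10:15:32 +0000] "GET /blog/ HTTP/1.1" 200 5123 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"'

# Splitting on double quotes gives:
#   $1 = ip, identity, user, timestamp   $2 = request line
#   $3 = status and size                 $4 = referrer      $6 = user agent
parse_combined() {
  awk -F'"' '{
    split($1, head, " ")    # head[1] is the client IP
    split($3, tail, " ")    # tail[1] is the status, tail[2] the size in bytes
    printf "ip=%s status=%s bytes=%s\nrequest=%s\nreferrer=%s\nagent=%s\n",
           head[1], tail[1], tail[2], $2, $4, $6
  }'
}

printf '%s\n' "$line" | parse_combined
```

If your server writes a custom format, the field positions will differ, and GoAccess needs a matching --log-format string.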

To generate an HTML report use:

sudo goaccess access.log -a -o report.html --log-format=COMBINED

To see the HTML report in real time on your website:

sudo goaccess access.log -o /usr/share/nginx/html/your_site/report.html --log-format=COMBINED --real-time-html

GoAccess HTML Report

Results

This part is probably going to be boring for you, but I am excited to see and compare the results. The time range is 16.03.2021 to 12.03.2021.

Of course, this whole comparison should include a lot more, like network scanners, bots etc., but I don’t really need it, and someone inspired by this article might go a step further. Analyses and comparisons done by others typically show a difference of 40% to 60% between server logs and Google Analytics results. I also have an onion address, so in my case it looks like 0ut3r.space is more popular on Tor than on the standard Internet. It is the same knowledge split across two domains, but the one served as a hidden service is apparently more interesting. For the clearnet domain I experimented with artificially generated traffic, free points, and SEO coupons for positioning phrases in search engines like Google and Bing. I have not done a stroke of work when it comes to the onion domain :)

It looks like everything that is uploaded to the deep web is twice as interesting, useful, forbidden and hacker-friendly as on the regular Internet. The domain https://0ut3r.space is just another stupid and boring blog from some amateur. On the other hand, the domain http://reycdxyc24gf7jrnwutzdn3smmweizedy7uojsa7ols6sflwu25ijoyd.onion is definitely a hacker, cracker, pyromaniac and éminence grise, who definitely uses drugs and takes baths in a bathtub full of Bitcoins.

Visitors and Hits

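Just one word about naming. What GoAccess calls Visitors and Hits roughly corresponds to Users and Pageviews in Google Analytics. The match is not exact: a hit is any request in the log, including images and stylesheets, while a pageview is one page loaded with the tracking script running.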
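How GoAccess tells the two apart can be sketched with awk. According to its documentation, requests sharing the same IP, date and user agent count as one unique visitor, while every request is a hit. The sample log lines and the function names below are made up for illustration, and this is only a rough approximation of what GoAccess does internally.

```shell
# Made-up sample in combined log format (IPs, paths and agents are invented).
sample_log='203.0.113.7 - - [12/Mar/2021:10:15:32 +0000] "GET / HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (X11; Linux x86_64)"
203.0.113.7 - - [12/Mar/2021:10:15:33 +0000] "GET /css/style.css HTTP/1.1" 200 901 "-" "Mozilla/5.0 (X11; Linux x86_64)"
203.0.113.7 - - [13/Mar/2021:08:00:01 +0000] "GET / HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (X11; Linux x86_64)"
198.51.100.4 - - [13/Mar/2021:09:30:00 +0000] "GET /about/ HTTP/1.1" 200 2210 "-" "curl/7.74.0"'

# Hits: every request line on stdin counts.
count_hits() {
  awk 'END { print NR }'
}

# Visitors: distinct (IP, date, user agent) combinations on stdin.
count_visitors() {
  awk -F'"' '{
    split($1, head, " ")
    ip = head[1]
    day = substr(head[4], 2, 11)    # "[12/Mar/2021:10:15:32" -> "12/Mar/2021"
    seen[ip SUBSEP day SUBSEP $6] = 1
  }
  END { n = 0; for (k in seen) n++; print n }'
}

echo "hits:     $(printf '%s\n' "$sample_log" | count_hits)"
echo "visitors: $(printf '%s\n' "$sample_log" | count_visitors)"
```

On a real server the same functions can read the log directly, for example count_visitors < /var/log/nginx/access.log (reading the log may need sudo).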

Analytics Users and Pageviews

966 Users and 1994 Pageviews

VS

GoAccess Visitors and Hits

12929 Visitors and 186691 Hits. Narf!

This screen explains a lot when it comes to visitors from the Tor onion address, but the numbers are still quite a bit different, right? This is why I like to analyze logs directly from the server.

GoAccess IPS
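To put the two pairs of numbers side by side, here is a quick bit of arithmetic on the figures quoted above. The variable and function names are mine, picked only for illustration.

```shell
# Figures quoted above for the same time range.
ga_users=966
ga_pageviews=1994
log_visitors=12929
log_hits=186691

# Print $1 / $2 rounded to one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

echo "GoAccess visitors per Analytics user:  $(ratio "$log_visitors" "$ga_users")"
echo "GoAccess hits per Analytics pageview:  $(ratio "$log_hits" "$ga_pageviews")"
echo "Hits per visitor (GoAccess):           $(ratio "$log_hits" "$log_visitors")"
echo "Pageviews per user (Analytics):        $(ratio "$ga_pageviews" "$ga_users")"
```

On these figures the logs show roughly 13 times more visitors than Google Analytics shows users, and over 90 times more hits than pageviews. Part of the hits gap is expected, because a hit is any request in the log and not only a page load.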

Operating system

Analytics OS

VS

GoAccess OS

Browsers

Analytics Browsers

VS

GoAccess Browsers

Summary

The differences are visible. Is that wrong? It depends on what you want to achieve. Google Analytics shows what it can, and since the scripts analyzing traffic are often blocked, it will not count everything. The server logs do not lie, but they only show what they can, and they are not enriched with what Google knows as a giant. They hold nothing for the analyst who wants to see what profit the website will bring or what the estimated age of the users is. If you are interested in real traffic and the number of users, check the logs. If you are interested in earnings, ads, users’ age, shoe size and which model of smartphone was used, rely on Google Analytics or its alternatives. Google gives you a lot that you cannot read from the server logs alone.

For example, I definitely can’t find out from the logs what search query users typed into Google Search (or any other fancy stats available), or how many of them visited my page after such a search.

Analytics Search Query
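What the logs do keep is the referrer field, so I can at least see which sites send visitors, even if not what they searched for. A rough sketch, assuming the combined log format and using invented sample lines (top_referrer_hosts is a hypothetical helper name):

```shell
# Made-up sample in combined log format (IPs, paths and referrers are invented).
sample_log='203.0.113.7 - - [12/Mar/2021:10:15:32 +0000] "GET / HTTP/1.1" 200 5123 "https://www.google.com/" "Mozilla/5.0"
203.0.113.8 - - [12/Mar/2021:10:16:02 +0000] "GET /about/ HTTP/1.1" 200 2210 "https://www.google.com/" "Mozilla/5.0"
203.0.113.9 - - [12/Mar/2021:10:17:40 +0000] "GET / HTTP/1.1" 200 5123 "https://duckduckgo.com/" "Mozilla/5.0"
198.51.100.4 - - [12/Mar/2021:10:18:00 +0000] "GET / HTTP/1.1" 200 5123 "-" "curl/7.74.0"'

# Count requests per referring host, most frequent first.
# Requests with no referrer ("-") are skipped.
top_referrer_hosts() {
  awk -F'"' '$4 != "-" && $4 != "" {
    split($4, parts, "/")    # "https://www.google.com/" -> parts[3] is the host
    print parts[3]
  }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

printf '%s\n' "$sample_log" | top_referrer_hosts
```

The host tells me that a visit came from a search engine, but the search phrase itself is not in the referrer, which is exactly the part only Google can show.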

On the other hand, the stats from “Not Found URLs” tell a lot. If a link points to something that never existed, it usually means that some vulnerability scanner is looking for something. Maybe it is Googlebot? Maybe a hacker looking for weak points in the configuration, or for sensitive data? Who knows ;)

GoAccess 404
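The same kind of list can be pulled straight from the log without GoAccess. This is a minimal sketch, assuming the combined log format. The sample lines are invented and top_not_found is a name I made up.

```shell
# Made-up sample in combined log format (IPs, paths and agents are invented).
sample_log='198.51.100.4 - - [12/Mar/2021:03:10:00 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "Mozilla/5.0"
198.51.100.9 - - [12/Mar/2021:03:10:05 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "python-requests/2.25.1"
198.51.100.9 - - [12/Mar/2021:03:10:06 +0000] "GET /.env HTTP/1.1" 404 153 "-" "python-requests/2.25.1"
203.0.113.7 - - [12/Mar/2021:10:15:32 +0000] "GET / HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (X11; Linux x86_64)"'

# List requested paths that returned 404, most frequent first.
top_not_found() {
  awk -F'"' '{
    split($3, tail, " ")    # tail[1] is the HTTP status
    split($2, req, " ")     # req[2] is the requested path
    if (tail[1] == "404") print req[2]
  }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

printf '%s\n' "$sample_log" | top_not_found
```

Paths like /wp-login.php or /.env on a site that never had them are a typical sign of automated scanning. On a real server, feed it the log with top_not_found < /var/log/nginx/access.log | head (reading the log may need sudo).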

Good luck with the numbers and let me know how it looks on your side. If you are not a website admin, just a visitor, I definitely need to know what your shoe size is. I miss this data in Google Analytics.