So it doesn’t matter if I am using Google Analytics or some open source alternative like Matomo (previously Piwik), Open Web Analytics or any other. I have to rely on my own logs. I read an article recently, and one guy in the comments reminded me about GoAccess. I’ve played with it in the past and forgot about it completely. Time to turn it back on and check for discrepancies.
I am on Debian, so I decided to use the official GoAccess repository (the version in the Debian repo is outdated).
echo "deb https://deb.goaccess.io/ $(lsb_release -cs) main" | sudo tee -a /etc/apt/sources.list.d/goaccess.list
One simple command to check logs:
sudo goaccess /var/log/nginx/access.log /var/log/nginx/access.log.1 --log-format=COMBINED
Just provide the path to the access.log defined in your web server (Apache, Nginx, etc.). To check logs in real-time add parameter
To generate an HTML report use:
sudo goaccess access.log -a > report.html
To see the HTML report in real time on your website:
sudo goaccess access.log -o /usr/share/nginx/html/your_site/report.html --real-time-html
This part is probably going to be boring for you, but I am excited to see and compare the results. Time range is 16.03.2021 to 12.03.2021.
Of course, this whole comparison should include a lot more, like network scanners, bots etc., but I don’t really need it, and someone inspired by this article might go a step further. Comparisons done by others report a difference of 40% to 60% between server logs and Google Analytics results.
I also have an onion address, and in my case it looks like 0ut3r.space is more popular on Tor than on the standard Internet. It is the same content on two domains, but the one served as a hidden service is more interesting. For the clearnet domain I experimented with artificially generated traffic, free points and SEO coupons for positioning phrases in search engines like Google and Bing. I have not done a stroke of work when it comes to the onion domain :) It looks like everything that is uploaded to the deep web is twice as interesting, useful, forbidden and hacker-friendly as on the regular Internet. The domain https://0ut3r.space is just another stupid and boring blog from some amateur. On the other hand, the domain http://reycdxyc24gf7jrnwutzdn3smmweizedy7uojsa7ols6sflwu25ijoyd.onion definitely belongs to a hacker, cracker, pyromaniac and éminence grise, who definitely uses drugs and takes baths in a bathtub full of Bitcoins.
Just one word about naming. Visitors and Hits in GoAccess are called Users and Pageviews in Google Analytics.
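To make the two GoAccess terms more concrete, here is a minimal sketch that approximates them with standard tools. It assumes a COMBINED-format log, that a hit is simply one request line, and that a visitor is a unique combination of IP, date and user agent (this is how I understand GoAccess to count them). The function names `count_hits` and `count_visitors` and the sample log lines are made up for illustration.

```shell
#!/bin/sh
# Rough approximation of "Hits" and "Visitors" for a COMBINED-format log.
# Assumption: visitor = unique (IP, date, user agent) triple.

count_hits() {
    # Every non-empty log line is one request, i.e. one hit.
    grep -c . "$1"
}

count_visitors() {
    # Split on double quotes: $1 holds IP and timestamp, $6 the user agent.
    awk -F'"' '{
        split($1, head, " ")          # head[1] = IP, head[4] = "[12/Mar/2021:10:00:00"
        day = substr(head[4], 2, 11)  # "12/Mar/2021"
        seen[head[1] SUBSEP day SUBSEP $6] = 1
    }
    END { n = 0; for (k in seen) n++; print n }' "$1"
}

# Hypothetical sample data, only to show the difference between the two numbers.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1.2.3.4 - - [12/Mar/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0"
1.2.3.4 - - [12/Mar/2021:10:00:05 +0000] "GET /style.css HTTP/1.1" 200 120 "-" "Mozilla/5.0"
5.6.7.8 - - [12/Mar/2021:11:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.74.0"
EOF
echo "hits: $(count_hits "$sample")"          # hits: 3
echo "visitors: $(count_visitors "$sample")"  # visitors: 2
rm -f "$sample"
```

Note that a hit is counted for every requested file (CSS, images and so on), while a Pageview in Google Analytics is counted per page, so the two are only loosely comparable.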
Google Analytics: 966 Users and 1994 Pageviews
GoAccess: 12929 Visitors and 186691 Hits. Narf!
This screen can explain a lot in the case of visitors from the Tor onion address, but the numbers are still quite a bit different, right? This is why I like to analyze logs directly from the server.
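To put the gap into a single number, here is a small sketch that computes what share of the server-side count showed up in Google Analytics. The helper name `share` is hypothetical; the inputs are the four numbers quoted above.

```shell
#!/bin/sh
# Share of the server-side count that Google Analytics reported.
share() {
    # $1 = Google Analytics count, $2 = GoAccess count
    awk -v ga="$1" -v logs="$2" 'BEGIN { printf "%.1f%%\n", 100 * ga / logs }'
}

share 966 12929     # Users vs Visitors   -> 7.5%
share 1994 186691   # Pageviews vs Hits   -> 1.1%
```

The second ratio is lower partly because Hits include every requested file, not only pages.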
The differences are visible. Is it wrong? It depends on what you want to achieve. Google Analytics shows what it can, and since the scripts that analyze traffic are often blocked, it will not count everything. The server logs do not lie, but they show everything they can and they are not enriched with what Google knows as a giant. They have nothing for the analyst who wants to see what profit the website will bring or what the estimated age of its users is. If you are interested in real traffic and the number of users, check the logs. If you are interested in earnings, ads, users' age, shoe size and which smartphone model was used, rely on Google Analytics or its alternatives. Google gives a lot that you cannot read from the server logs alone.
For example, I definitely can’t find out from the logs what search query users typed into Google Search (or any other available fancy stats), and how many of them visited my page after such a search.
On the other hand, the stats from “Not Found URLs” tell a lot. If a link points to something that never existed, it probably means that some vulnerability scanner is looking for something. Maybe it is a Google Bot? Maybe a hacker looking for weak points in the configuration, or for sensitive data? Who knows ;)
Good luck with the numbers and let me know how it looks on your side. If you are not a website admin, just a visitor, I definitely need to know what your shoe size is. I miss this data in Google Analytics.
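If you want the same kind of list without opening GoAccess, here is a minimal sketch that pulls the most requested missing URLs straight from the log. It assumes the COMBINED format, where the request path is the 7th whitespace-separated field and the status code is the 9th; malformed request lines may not fit that layout. The function name `top_not_found` and the sample log lines are made up for illustration.

```shell
#!/bin/sh
# List the most requested URLs that returned 404, most frequent first.
top_not_found() {
    # $1 = log file, $2 = number of rows to show (default 10)
    awk '$9 == 404 { print $7 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Hypothetical sample data that looks like a scanner probing for WordPress.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1.2.3.4 - - [12/Mar/2021:10:00:00 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "scanner"
1.2.3.4 - - [12/Mar/2021:10:00:01 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "scanner"
1.2.3.4 - - [12/Mar/2021:10:00:02 +0000] "GET /.env HTTP/1.1" 404 153 "-" "scanner"
5.6.7.8 - - [12/Mar/2021:11:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0"
EOF
top_not_found "$sample" 5
rm -f "$sample"
```

Each output row is a count followed by the URL, so paths like `/wp-login.php` on a site that never ran WordPress stand out quickly.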