How to configure Nginx logs to improve GoAccess precision

June 13, 2021

A year ago I published a post about how to install GoAccess, in which I stated:

My idea is to have insights (most visited pages, operating systems, browsers and referrals) about the visitors without any client-side code and cookie.

I finally removed the Google Analytics tracking code, and for the past month I have relied only on GoAccess-generated analytics.

In this post I share the configuration I am currently using to get the most precise data possible.

GoAccess comes with a lot of built-in filters to remove noise from server logs, but it cannot do all the work alone. It works best when the web server logs, in my case those of Nginx, are fine-tuned.

What I did was create an ad hoc log file for GoAccess using Nginx's conditional logging feature.

After a year of observing raw logs and some trial-and-error sessions, I defined a heuristic based on these four rules:

  1. My website is generated using Gatsby and all page URLs end with /.

  2. I am only interested in logged requests that use the GET HTTP method.

  3. I am only interested in logged requests with status code 200.

  4. The protocol used is HTTP/2.0.

Implementing this heuristic is very simple. I edited the configuration file that contains my server block and added this code:

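    # Rule 4: only HTTP/2.0 requests.
    map $server_protocol $goAccess_protocol {
        HTTP/2.0 1;
        default 0;
    }

    # Rule 3: only status code 200, then defer to the protocol check.
    map $status $goAccess_status {
        200 $goAccess_protocol;
        default 0;
    }

    # Rule 2: only GET requests, then defer to the status check.
    map $request_method $goAccess_method {
        GET $goAccess_status;
        default 0;
    }

    # Rule 1: only URLs ending with /, then defer to the method check.
    map $request_uri $goAccess {
        ~.*/$ $goAccess_method;
        default 0;
    }

    server {
        # ...
        access_log /var/log/nginx/ combined if=$goAccess;
    }

The resulting log file is the input of GoAccess.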
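To sanity-check the heuristic against existing logs before changing Nginx, the chained maps can be approximated offline. The following is only a sketch: the regular expression assumes the standard combined log format, and the names COMBINED and would_be_logged are made up for this illustration.

```python
import re

# Rough pattern for Nginx's "combined" log format; only the fields the
# heuristic needs are captured (method, URI, protocol, status).
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) '
)


def would_be_logged(line: str) -> bool:
    """Approximate the chained maps: True only if all four rules hold."""
    match = COMBINED.match(line)
    if match is None:
        return False  # malformed request lines never pass the heuristic
    return (
        match["uri"].endswith("/")            # rule 1: page URLs end with /
        and match["method"] == "GET"          # rule 2: GET only
        and match["status"] == "200"          # rule 3: status 200 only
        and match["protocol"] == "HTTP/2.0"   # rule 4: HTTP/2.0 only
    )
```

Feeding each line of an existing access log to would_be_logged gives a rough idea of how much traffic the conditional log would keep.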

Referral spam

Unfortunately, this configuration does not eliminate all the noise, especially referral spam. Searching on DuckDuckGo, I found this really useful post.

I added this other rule to my heuristic:

    # ...

    map $http_referer $referral_spam {
        default 0;
        include /etc/nginx/;
    }

    server {
        # ...

        if ($referral_spam) {
            return 444;
        }
    }
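As a rough illustration of what this rule does: Nginx treats map entries prefixed with ~* as case-insensitive regular expressions, and 444 is its non-standard code for closing the connection without sending a response. The sketch below mimics that behaviour in Python; the two patterns are hypothetical placeholders, not entries from my real file.

```python
import re

# Hypothetical patterns standing in for the entries of the included file.
SPAM_PATTERNS = [r"spam-domain\.example", r"free-traffic\.example"]


def referral_spam(http_referer: str) -> int:
    """Mimic the map: 1 if any pattern matches the Referer, else the default 0."""
    for pattern in SPAM_PATTERNS:
        # "~*" means case-insensitive; the pattern may match anywhere in the value.
        if re.search(pattern, http_referer, re.IGNORECASE):
            return 1
    return 0


def respond(http_referer: str) -> str:
    """Mimic the if block: drop the connection when the map yields 1."""
    if referral_spam(http_referer):
        return "close connection (444)"
    return "serve request"
```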

The file /etc/nginx/ contains lines such as:

"~*" 1; "~*" 1; "~*" 1; "~*" 1; "~*" 1; "~*" 1; "~*" 1; "~*" 1; "~*" 1; "~*" 1;

The boring part is that as soon as I see a suspicious domain in my logs, I have to manually add it to the file and restart Nginx.
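Spotting candidates can at least be made less tedious. The sketch below counts the referrer hosts found in combined-format log lines so that unfamiliar domains stand out. It is not part of my setup: the function name referrer_hosts and the own_host parameter are invented for this example, and deciding which hosts are spam remains a manual step.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# In the combined format the Referer is the quoted field that follows
# the status code and the response size.
REFERER = re.compile(r'" \d{3} \S+ "(?P<referer>[^"]*)" "')


def referrer_hosts(lines, own_host):
    """Count external referrer hosts in an iterable of combined log lines."""
    counts = Counter()
    for line in lines:
        match = REFERER.search(line)
        if match is None:
            continue
        try:
            host = urlsplit(match["referer"]).hostname
        except ValueError:
            continue  # skip referrers that are not parseable URLs
        # A missing Referer is logged as "-", which has no hostname.
        if host and host != own_host:
            counts[host] += 1
    return counts
```

Calling referrer_hosts(open(path), own_host="...") on the log file and looking at counts.most_common() lists the most frequent external referrers first.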

Written by
Elia Contini
Sardinian UX engineer and front-end web architect based in Switzerland. Marathoner, traveller, wannabe nature photographer.