Command Line Analytics, Part I

If you are at all interested in the activity or performance of a web server under your charge, it is vital that you become familiar with the contents of /var/log. For your reference, the contents of this directory include:

"Log files from the system and various programs/services, especially login (/var/log/wtmp, which logs all logins and logouts into the system) and syslog (/var/log/messages, where all kernel and system program messages are usually stored)." --The Linux Documentation Project (TLDP).

Log files, then, contain critical feedback from your system that you can use to monitor, optimize, or troubleshoot your server.

Rather than go into a full exposition of logs and the logging process in U*NX systems, this post will focus on a single log file on a particular web server. While we'll only be looking at the access.log file on an NGINX server, the principles covered apply across platforms.

At a minimum, using log files effectively requires that you understand the contents of the log itself. Once you've got a grip on reading the file, it's useful to learn how to modify the preset directives governing what gets logged and how that information is displayed.
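As a starting point for finding those directives, the sketch below searches a configuration directory for access_log and log_format, the NGINX directives that set where requests are logged and how each entry is laid out. The helper name find_log_directives and the default path /etc/nginx are assumptions made for illustration; your configuration may live elsewhere.

```shell
# find_log_directives is a hypothetical helper, not an NGINX tool.
# It prints every uncommented access_log or log_format line found under
# the given directory, prefixed with the file name and line number.
find_log_directives() {
  # /etc/nginx is an assumed default; pass another directory if needed.
  conf_dir="${1:-/etc/nginx}"
  grep -rnE '^[[:space:]]*(access_log|log_format)[[:space:]]' "$conf_dir" 2>/dev/null
}

find_log_directives || echo "no logging directives found under /etc/nginx"
```

If the search comes back empty, the server is most likely running with its built-in logging defaults.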

In our example, /var/log/nginx/access.log contains information about client requests made to the server (e.g. when a user on their phone clicks the "Contact Us" link on your site). If we attempt to look at this log with:

head -n 1 /var/log/nginx/access.log

where head -n 1 pulls the first line of our access.log file, we are met with the following cryptic entry:

66.249.79.156 - - [27/Feb/2018:07:14:16 +0000] "GET /robots.txt HTTP/1.1" 200 75 "https://geekberg.info" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

You may be able to decipher this entry off the cuff, but in case you can't, here's a breakdown:

66.249.79.156 → the IP address of a client accessing your site;

[27/Feb/2018:07:14:16 +0000] → the date and time of access;

"GET /robots.txt HTTP/1.1" → the request the client made to your server;

200 → the HTTP status code the server returned for the request (200 means the request succeeded);

75 → the size of the response body sent to the client (in bytes);

https://geekberg.info → the referrer URL, i.e. the page the client was on before it made the current request;

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) → the user agent, i.e. the software the client used to make the request (here it identifies itself as Googlebot, Google's crawler).

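Also, note that a single hyphen (-) is used as a placeholder for an empty field. In NGINX's default "combined" log format, the first hyphen in the entry above is a fixed separator, and the second stands in for the authenticated user name, which is empty here.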
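To make the breakdown concrete, here is a minimal sketch that splits the sample entry into labelled fields with awk. It assumes the log uses the layout shown above, with the request, referrer, and user agent each wrapped in double quotes; parse_entry is a hypothetical helper name, and a custom log format would need different field positions.

```shell
# Sample entry copied from this post. On a real server you would feed in
# lines from /var/log/nginx/access.log instead.
line='66.249.79.156 - - [27/Feb/2018:07:14:16 +0000] "GET /robots.txt HTTP/1.1" 200 75 "https://geekberg.info" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'

# parse_entry is a hypothetical helper, not part of NGINX.
# Splitting each line on double quotes yields:
#   $1 = IP, hyphens, timestamp   $2 = request   $3 = status and bytes
#   $4 = referrer                 $6 = user agent
parse_entry() {
  awk -F'"' '{
    split($1, pre, " ")        # pre[1]=IP, pre[3]=user, pre[4..5]=timestamp
    split($3, num, " ")        # num[1]=status, num[2]=bytes
    stamp = pre[4] " " pre[5]
    gsub(/\[|\]/, "", stamp)   # drop the surrounding square brackets
    print "ip=" pre[1]
    print "user=" pre[3]
    print "time=" stamp
    print "request=" $2
    print "status=" num[1]
    print "bytes=" num[2]
    print "referrer=" $4
    print "agent=" $6
  }'
}

printf '%s\n' "$line" | parse_entry
```

Run against the sample entry, this prints one key=value line per field, for example status=200 and bytes=75. To try it on a live log, replace the last line with something like head -n 5 /var/log/nginx/access.log | parse_entry.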

In our next post, we'll explore using the terminal to gather web traffic analytics.

Cheers.