Command Line Analytics, Part I
If you are at all interested in the activity or performance of a web server under your charge, it is vital that you become familiar with the contents of /var/log. For your reference, the contents of this directory include:
"Log files from the system and various programs/services, especially login (/var/log/wtmp, which logs all logins and logouts into the system) and syslog (/var/log/messages, where all kernel and system program messages are usually stored)." --The Linux Documentation Project (TLDP).
Log files, then, contain critical feedback from your system that you can use to monitor, optimize, or troubleshoot your server.
Rather than go into a full exposition of logs and the logging process in Unix-like systems, this post will focus on a single log file on a particular web server. While we'll only be looking at the access.log file on an NGINX server, the principles covered apply across platforms.
At a minimum, using log files effectively requires that you understand the contents of the log itself. Once you've got a grip on reading the file, it's useful to learn how to modify the preset directives governing what gets logged and how that information is displayed.
In our example, /var/log/nginx/access.log contains information about client requests made on the server (e.g. when, from their phone, a user clicks on the "Contact Us" link of your site). If we attempt to look at this log with:
head -n 1 /var/log/nginx/access.log
where head -n 1 pulls the first line of our access.log file, we are met with the following cryptic entry:
66.249.79.156 - - [27/Feb/2018:07:14:16 +0000] "GET /robots.txt HTTP/1.1" 200 75 "https://geekberg.info" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
You may be able to decipher this entry off the cuff, but in case you can't, here's a breakdown:
66.249.79.156 → the IP address of the client accessing your site;
[27/Feb/2018:07:14:16 +0000] → the date and time of the request, including the offset from UTC;
"GET /robots.txt HTTP/1.1" → the request the client made to your server: the method, the requested path, and the protocol version;
200 → the HTTP status code the server returned for the request;
75 → the size of the response to the request (in bytes);
https://geekberg.info → the referrer URL, i.e. the page the client was on before it made the current request;
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) → the user agent, i.e. the client software used to make the request (here, Google's crawler).
Also, note that - is used as a placeholder in the event of an empty field.
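To make the breakdown above concrete, here is a minimal sketch that splits the sample entry into labeled fields with awk. It assumes the entry follows the layout shown in this post (the same fields in the same order, with the request, referrer, and user agent wrapped in double quotes). The function name parse_entry and the field labels are hypothetical, chosen for illustration; they are not part of NGINX or any standard tool.

```shell
#!/bin/sh
# Sample entry copied from this post. Single quotes keep the inner
# double quotes intact.
line='66.249.79.156 - - [27/Feb/2018:07:14:16 +0000] "GET /robots.txt HTTP/1.1" 200 75 "https://geekberg.info" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'

# parse_entry (hypothetical helper): reads log lines on stdin and prints
# one labeled field per line. Assumes the layout shown in the post.
parse_entry() {
  # Splitting on double quotes isolates the three quoted fields:
  #   $1 = IP, placeholders, timestamp   $2 = request
  #   $3 = status and size               $4 = referrer
  #   $6 = user agent
  awk -F'"' '{
    split($1, lead, " ")        # IP, "-", "-", "[date:time", "offset]"
    split($3, resp, " ")        # status code, response size
    ts = lead[4] " " lead[5]
    gsub(/\[|\]/, "", ts)       # strip the square brackets
    print "client_ip: " lead[1]
    print "timestamp: " ts
    print "request: " $2
    print "status: " resp[1]
    print "bytes: " resp[2]
    print "referrer: " $4
    print "user_agent: " $6
  }'
}

printf '%s\n' "$line" | parse_entry
```

If your own log uses the same layout, the function could be fed a real entry instead, e.g. head -n 1 /var/log/nginx/access.log | parse_entry. A log written with a customized format would need the field positions adjusted.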
In our next post, we'll explore using the terminal to gather web traffic analytics.
Cheers.