(W)get at the Truth
In a previous post, we discussed HTTP response codes, and shared a snippet for using curl to extract a website's response code for the purpose of checking if that site was accessible. Today, we'll show how to use wget to achieve the same effect.
Here's the gist of how to do this:
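#!/usr/bin/env bash
url="https://geekberg.info"

wget_check() {
  # Check the URL without downloading it and keep the first status code wget reports.
  status_code=$(wget --spider --server-response "$url" 2>&1 | awk '/HTTP\/1.1/{print $2}' | head -1)
  printf "%s\\n" "$status_code"
  # Compare as strings so that an empty result (no response at all) counts as a bad URL.
  if [ "$status_code" != "200" ] ; then
    printf "%s\\n" "BAD URL"
  else
    printf "%s\\n" "GOOD URL"
  fi
}

wget_check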
We use wget's spider mode (--spider), in which it behaves like a web crawler and checks that a page exists without downloading it, together with --server-response, which prints the headers the server sends back:
URL transformed to HTTPS due to an HSTS policy
Spider mode enabled. Check if remote file exists.
--2019-05-17 00:26:59-- https://geekberg.info/
Resolving geekberg.info (geekberg.info)... 159.65.178.169
Connecting to geekberg.info (geekberg.info)|159.65.178.169|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx/1.10.3 (Ubuntu)
Date: Fri, 17 May 2019 00:26:59 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 13112
Connection: keep-alive
X-Powered-By: Express
Cache-Control: public, max-age=0
ETag: W/"3338-S/XfSVjCnN/lfdMnlmt2u8WQ478"
Vary: Accept-Encoding
Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Length: 13112 (13K) [text/html]
Then, using awk, we print the second field (the status code) of each line containing HTTP/1.1 and pipe the result to the head command. This is done because a single request can produce more than one response code, for example when the site redirects, and we're only concerned with the first.
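To see what the awk and head steps do on their own, here is a minimal sketch that runs them against canned text instead of a live request. The sample headers and the variable names sample_output and first_code are made up for illustration; they assume a site that redirects once before answering with 200.

```shell
#!/usr/bin/env bash
# Hypothetical sample of what wget --server-response might print for a
# site that redirects once; these headers are invented for illustration.
sample_output='  HTTP/1.1 301 Moved Permanently
  Location: https://example.com/
  HTTP/1.1 200 OK
  Content-Type: text/html'

# Same pipeline as in the script: print the second field of every line
# containing HTTP/1.1, then keep only the first match.
first_code=$(printf '%s\n' "$sample_output" | awk '/HTTP\/1.1/{print $2}' | head -1)
printf '%s\n' "$first_code"   # prints 301
```

Because only the first code is kept, a URL that redirects would be reported by its redirect status rather than by the status of the page it lands on.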
From there, we use an if/else construct to check if the response code from the site is not equal to 200. If that's the case, we'll call it a "BAD URL"; otherwise we know it's kosher, hence the "GOOD URL".
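The decision itself can be tried without touching the network by pulling it into its own function. This is only a sketch: classify_status is a hypothetical helper, not part of the script above. It matches the code as a string, so an empty value (as when wget never reaches the server) falls through to "BAD URL".

```shell
#!/usr/bin/env bash
# classify_status is a hypothetical helper, not part of the original script.
# It takes a status code as its first argument and prints the verdict.
classify_status() {
  case "$1" in
    200) printf '%s\n' "GOOD URL" ;;
    *)   printf '%s\n' "BAD URL" ;;
  esac
}

classify_status "200"   # prints GOOD URL
classify_status "301"   # prints BAD URL
classify_status ""      # prints BAD URL
```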
Cheers.