Are you looking at me?
Keeping track of the number of people visiting your website can be an important way to measure the effectiveness of your web content and marketing strategies. If you followed the instructions in our last newsletter regarding search engine optimisation, or have recently published an article with links to your website, you may want to check what impact your efforts are having on the traffic directed to your website. You may also want to know which pages are visited most frequently, where visitors are coming from, how they found your site, what search keywords they used, how these factors change over time, and much more about the people using your website. In this edition of Make the Web Work we start by looking at web logs and understanding web statistics. In a future edition we'll suggest ways in which you can use this information to inform decision-making around planning, developing and managing a more effective web presence.
What we know
Much of this information is recorded in a log file by the web server that hosts your website. The information recorded in the server log includes:
- the visitor's network (IP) address,
- the date and time of access,
- the address of the page requested,
- the referring page, if available: the page where the visitor clicked a link leading to your page (including any search terms they may have used to find your web page),
- the type and version of web browser the visitor was using, and
- the type and version of operating system (e.g. Windows, Mac) the visitor was using.
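Each line of a typical log records these items in a fixed order. As a minimal sketch, assuming the server writes the widely used Apache/NCSA "combined" log format (the format of the sample line quoted under "Making sense of logs"), the Python below splits one line into named fields. The function name parse_line and the field names are our own, chosen for illustration; your server may be configured to log different fields.

```python
import re
from datetime import datetime

# Pattern for one line in the "combined" log format (an assumption:
# servers can be configured to log other fields or a different order).
COMBINED_LOG = re.compile(
    r'(?P<address>\S+) \S+ \S+ '            # visitor address, then two fields usually logged as "-"
    r'\[(?P<time>[^\]]+)\] '                # date and time of access
    r'"(?P<request>[^"]*)" '                # request line, e.g. "GET /page HTTP/1.1"
    r'(?P<status>\d{3}) (?P<size>\d+|-) '   # HTTP status code and bytes sent
    r'"(?P<referrer>[^"]*)" '               # referring page ("-" if none was supplied)
    r'"(?P<user_agent>[^"]*)"'              # browser and operating system details
)

def parse_line(line):
    """Split one log line into named fields; return None if it doesn't match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # The requested page is the second word of the request line
    request_parts = entry["request"].split()
    entry["page"] = request_parts[1] if len(request_parts) > 1 else ""
    return entry

sample = ('130.95.86.59 - - [17/Jul/2006:15:02:13 +0800] '
          '"GET /administrative_services/mail_room/mail_service/mbdp HTTP/1.1" '
          '200 160468 "-" '
          '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.2) Gecko/20060405 SeaMonkey/1.0.1"')
entry = parse_line(sample)
print(entry["address"], entry["page"], entry["referrer"])
```

Analysis tools such as Webalizer apply this kind of parsing to every line in the log and then tally the results.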
What we don't know
One thing the logs can't record is exactly who is visiting your site and exactly what each individual user is looking at. This is because the web operates in a "stateless" way: each page is served independently, without any knowledge of the user's actions that came before it. The web's extensive use of proxy servers (to improve performance and sometimes to filter content) also causes a problem for web logs: many users at a company or university might visit your site via one proxy server, but your web logs will record only the proxy server's IP address, so all of those users look like a single visitor.
There are some methods of overcoming these limitations. Google Analytics, for example, uses a small piece of JavaScript code in your web pages to record and track additional information about visitors. While such methods can gather useful extra information, the results are not as accurate as web logs in one important respect: they depend on JavaScript, which is not available in all web browsers, whereas web logs do not rely on it.
Making sense of logs
The logs themselves are very much a raw data set: huge files with thousands of lines like this:
130.95.86.59 - - [17/Jul/2006:15:02:13 +0800] "GET /administrative_services/mail_room/mail_service/mbdp HTTP/1.1" 200 160468 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.2) Gecko/20060405 SeaMonkey/1.0.1"
Thankfully, a wide range of free and commercial tools is available to sort through and analyse this raw data. Web log analysis software generates website statistics and other useful information from the web logs that will help you track the use of your website and make decisions about how to manage your content more effectively. Most analysis tools generate similar information and describe it using similar terminology, so you may see reports from web log analysis software refer to hits, visits, pages, referrers, user agents and so on. Below we discuss the website statistics generated by one such tool, but the terminology and general principles are similar, if not identical, across most web log analysis software.
Understanding web statistics
Websites hosted in the MySource content management system have access to website statistics generated by a program called Webalizer. If your site is hosted in the centrally provided MySource facility, Webalizer reports can be accessed via the MySource backend: look under the Wizards tab for "Links to UWA specific MySource reports" and click the link to "Webalizer webstats report". Webalizer analyses the web logs for your site and presents the information in an easy-to-read way. The following image is from a Webalizer report.

Interpreting these statistics, however, requires a little specialist knowledge; most confusion arises around the difference between the terms "pages", "files" and "hits".
Let's begin with hits, probably the most (mis)used and most misunderstood term in website statistics. A hit is registered each time any item is requested from a web server. Displaying a single web page in a web browser may involve a number of hits: a web page with 5 images may register 6 hits in the server logs, one hit for the page and one hit for each of the 5 images. For this reason, a hit count on its own is not an accurate indication of web traffic.
Pages counts only those URLs considered to be the actual page being requested, not the individual items that make it up (such as graphics, stylesheets, JavaScript files, and video or audio clips). Some people call this metric page views or page impressions. Page views are often used in online advertising, where advertisers use the number of page views a site receives to decide where and how to advertise.
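To see why hits and pages differ, here is a minimal sketch of the 5-image example. It assumes a URL counts as a page if it ends in "/", has no file extension (as with the MySource address in the sample log line), or has one of a few page extensions; that rule and the name is_page are illustrative assumptions, since each analysis tool decides this in its own, usually configurable, way.

```python
from pathlib import PurePosixPath

# Assumed page extensions for this sketch only; real tools let you configure these.
PAGE_EXTENSIONS = {".htm", ".html", ".php", ".asp", ".cgi"}

def is_page(path):
    """Return True if a requested path should count as a page view (assumed rule)."""
    path = path.split("?", 1)[0]          # ignore any query string
    if path.endswith("/"):
        return True                       # directory-style URL, e.g. /news/
    suffix = PurePosixPath(path).suffix.lower()
    return suffix == "" or suffix in PAGE_EXTENSIONS

# Hypothetical requests made while displaying one page containing 5 images
requests = ["/news/index.html"] + [f"/news/images/photo{i}.jpg" for i in range(1, 6)]

hits = len(requests)                            # every request is a hit
pages = sum(1 for p in requests if is_page(p))  # only the page itself
print(f"hits={hits} pages={pages}")             # hits=6 pages=1
```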
Files represents the total number of hits (requests) that actually resulted in something being sent back to the user. Not all hits send data: examples include "404 Not Found" requests, and requests for pages the browser already has in its cache (the server can answer these with a "304 Not Modified" response instead of sending the page again).
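A minimal sketch of the difference between hits and files, assuming (for illustration only) that responses with status 200 OK or 206 Partial Content are the ones that send content back; the requests listed are hypothetical.

```python
# Hypothetical (path, HTTP status code) pairs taken from a log
requests = [
    ("/index.html", 200),       # page sent in full
    ("/images/logo.gif", 200),  # image sent in full
    ("/style.css", 304),        # "Not Modified": the browser reuses its cached copy
    ("/old-page.html", 404),    # "Not Found": no page content is sent
]

# Assumption for this sketch: only these status codes mean content was sent
CONTENT_SENT = {200, 206}  # 200 OK, 206 Partial Content

hits = len(requests)
files = sum(1 for _, status in requests if status in CONTENT_SENT)
print(f"hits={hits} files={files}")  # hits=4 files=2
```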
The next image shows a portion of Webalizer's Daily statistics table, which includes "Visits", "Sites" and "KBytes".

Visits begin when a remote site requests a page from your server for the first time. As long as the same site keeps making requests within a given timeout period, they are all considered part of the same visit. If the site makes a request to your server and the time since its last request is longer than the specified timeout period (30 minutes by default), a new visit is started and counted, and the sequence repeats. Since only pages trigger a visit, remote sites that link directly to graphics and other non-page URLs are not counted in the visit totals, which reduces the number of false visits.
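The visit rule above can be sketched as follows. This is a minimal illustration, not Webalizer's own code: the function name count_visits and the sample requests are hypothetical, and it assumes the requests are already sorted by time and marked as page or non-page.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # the default timeout described above

def count_visits(requests, timeout=VISIT_TIMEOUT):
    """Count visits from (site, time, is_page) tuples sorted by time.

    A page request starts a new visit when the site has no visit in progress
    or its previous request was more than `timeout` ago. Non-page requests
    extend a visit in progress but never start one."""
    last_request = {}  # site -> time of its latest request within a visit
    visits = 0
    for site, time, is_page in requests:
        previous = last_request.get(site)
        if previous is None or time - previous > timeout:
            if not is_page:
                continue  # e.g. an image linked directly from another site
            visits += 1
        last_request[site] = time
    return visits

start = datetime(2006, 7, 17, 15, 0)
requests = [
    ("130.95.86.59", start, True),                          # first page: visit 1
    ("130.95.86.59", start + timedelta(minutes=5), False),  # image in the same visit
    ("203.0.113.7", start + timedelta(minutes=10), False),  # image linked from elsewhere: no visit
    ("130.95.86.59", start + timedelta(minutes=20), True),  # 15-minute gap: same visit
    ("130.95.86.59", start + timedelta(minutes=65), True),  # 45-minute gap: visit 2
]
print(count_visits(requests))  # 2
```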
Sites is the number of unique IP addresses/hostnames that made requests to the server. Care should be taken when using this metric for anything other than that. Many users can appear to come from a single site (for example, behind a proxy server), and a single user can appear to come from many IP addresses, so this figure should be used only as a rough gauge of the number of visitors to your server.
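Counting sites amounts to counting distinct addresses. A minimal sketch, assuming the date and host have already been extracted from each log line (the name sites_per_day and the sample entries are hypothetical):

```python
from collections import defaultdict

def sites_per_day(entries):
    """entries: iterable of (date, host) pairs. Returns {date: number of unique hosts}."""
    hosts_by_day = defaultdict(set)
    for day, host in entries:
        hosts_by_day[day].add(host)  # a set ignores repeat requests from the same host
    return {day: len(hosts) for day, hosts in hosts_by_day.items()}

entries = [
    ("17/Jul/2006", "130.95.86.59"),
    ("17/Jul/2006", "130.95.86.59"),  # repeat request: still one site
    ("17/Jul/2006", "203.0.113.7"),   # could be a proxy shared by many people
    ("18/Jul/2006", "130.95.86.59"),
]
print(sites_per_day(entries))  # {'17/Jul/2006': 2, '18/Jul/2006': 1}
```

Note that nothing in this count can tell a proxy serving hundreds of users apart from a single visitor, which is exactly why the figure is only a rough gauge.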
KBytes shows the amount of data transferred between the server and remote machines, based on the data found in the server log. A KByte (KB, or kilobyte) is 1024 bytes.
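As a worked example of the conversion, the sample log line reports 160468 bytes sent, which is 160468 / 1024 ≈ 156.71 KB. A minimal sketch that totals the size fields of several lines (the name to_kbytes and the extra sizes are hypothetical; a "-" size field means no data was sent):

```python
def to_kbytes(size_fields):
    """Sum log size fields and convert bytes to KBytes (1 KB = 1024 bytes).

    A "-" size field means no data was sent, so it counts as zero."""
    total_bytes = sum(0 if s == "-" else int(s) for s in size_fields)
    return total_bytes / 1024

# The first size is from the sample log line; the others are hypothetical.
print(f"{to_kbytes(['160468', '2048', '-', '512']):.2f}")  # 159.21
```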
Other interesting metrics that can be obtained from the server's log files include "Referrers", "Search Strings" and "User Agents".

Referrers are the URLs that led a user to your site or caused the browser to request something from your server. The vast majority of requests will have one of your own URLs as the referrer, since most HTML pages contain links to other objects such as graphics files. If one of your HTML pages contains links to 10 graphic images, then each request for that HTML page will produce 10 more hits with the referrer specified as the URL of your own HTML page.
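Because your own pages dominate the referrer list, it is often more informative to set them aside and look only at outside referrers. A minimal sketch, assuming your site's host name is www.example.org (a placeholder) and using hypothetical referrer fields; split_referrers is an illustrative name, not a feature of any particular tool.

```python
from collections import Counter
from urllib.parse import urlparse

def split_referrers(referrers, own_host):
    """Separate referrals from your own site from those on other sites.

    Returns (number of internal referrals, Counter of external referrer URLs).
    A "-" means the browser supplied no referrer, so it is skipped."""
    internal = 0
    external = Counter()
    for referrer in referrers:
        if referrer == "-":
            continue
        if urlparse(referrer).hostname == own_host:
            internal += 1  # e.g. your own page requesting its images
        else:
            external[referrer] += 1
    return internal, external

# Hypothetical referrer fields: three images requested by one of your pages,
# one visitor arriving from a search engine, and one request with no referrer.
referrers = [
    "http://www.example.org/news/index.html",
    "http://www.example.org/news/index.html",
    "http://www.example.org/news/index.html",
    "http://www.google.com/search?q=uwa+mail+room",
    "-",
]
internal, external = split_referrers(referrers, own_host="www.example.org")
print(internal, dict(external))
```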

Search Strings are obtained by examining the referrer string and looking for known patterns from various search engines. The search engines and the patterns to look for can be specified in a configuration file; the default settings will catch most of the major search engines.
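A minimal sketch of the pattern-matching idea: look for a known search engine in the referrer's host name, then read the query parameter that carries the search terms. The pattern table here is a hypothetical stand-in for a tool's configuration file; Google and Bing both put the terms in a "q" parameter, but a real list would be longer.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical pattern table: text to find in the referrer's host name,
# mapped to the query parameter holding the search terms.
SEARCH_ENGINES = {
    "google.": "q",
    "bing.": "q",
}

def search_string(referrer):
    """Return the search terms in a referrer URL, or None if no known pattern matches."""
    parsed = urlparse(referrer)
    host = parsed.hostname or ""
    for host_fragment, parameter in SEARCH_ENGINES.items():
        if host_fragment in host:
            terms = parse_qs(parsed.query).get(parameter)  # decodes "+" to spaces
            if terms:
                return terms[0]
    return None

print(search_string("http://www.google.com/search?q=uwa+mail+room"))  # uwa mail room
print(search_string("http://www.example.org/news/"))                   # None
```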

User Agents lists the names that web browsers use to identify themselves. This information may include the operating system a visitor is using (Macintosh, Windows...) and the web browser and its version number. For example, "MSIE 6.0" refers to Microsoft Internet Explorer version 6.0. You may also come across Netscape, Opera, Konqueror and Mozilla as user agents. Keep in mind, however, that many browsers allow the user to change their reported name, so you might see some obviously fake names in the listing.
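Turning a user agent string into a readable label is again a matter of pattern matching. A minimal sketch, assuming a short hypothetical list of browser and operating system patterns (real tools maintain much longer lists); the function name describe_user_agent is illustrative. Because the string is whatever the browser chooses to send, the label only reflects what the browser claims to be.

```python
import re

# Illustrative patterns only (an assumption). Order matters because some
# user agent strings mention more than one browser name.
BROWSER_PATTERNS = [
    ("Opera", re.compile(r"Opera[/ ](\d+\.\d+)")),
    ("Internet Explorer", re.compile(r"MSIE (\d+\.\d+)")),
    ("SeaMonkey", re.compile(r"SeaMonkey/([\d.]+)")),
    ("Firefox", re.compile(r"Firefox/([\d.]+)")),
]
OS_PATTERNS = [("Windows", "Windows"), ("Macintosh", "Mac"), ("Linux", "Linux")]

def describe_user_agent(user_agent):
    """Return a rough 'browser version / operating system' label."""
    browser = "Other browser"
    for name, pattern in BROWSER_PATTERNS:
        match = pattern.search(user_agent)
        if match:
            browser = f"{name} {match.group(1)}"
            break
    system = next((name for name, token in OS_PATTERNS if token in user_agent), "Other OS")
    return f"{browser} / {system}"

print(describe_user_agent(
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.2) Gecko/20060405 SeaMonkey/1.0.1"))
print(describe_user_agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```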
Web logs are a useful way to measure the traffic on your website, to track patterns and to see which pages are the most popular. However, they are not precise: the data can be skewed by many things, such as a search engine visiting to index your site. Proxy servers blur the number of actual visitors, and the default timeout setting for the visits metric means the numbers recorded can only be considered an estimate.
Looking for more?
As mentioned earlier, there is a wide range of tools available for analysing web logs. If Webalizer doesn't provide the particular analysis you require, bear in mind that you can use other tools to analyse the same log files. Ask the University Website Office about your particular requirements. If you would like to try Google Analytics and need an "invitation" to create an account or participate in the program, please contact the University Website Office at weboffice@uwa.edu.au.
Further Reading: