Webalizer shows only PART of the very first day in a log

Kev asked:

I’m using webalizer-2.23-04-cygwin, the latest binary I could find, and it does the same thing an older version did on a certain 900MB logfile I have: it only shows the first 411 hits, everything before around 6pm. There’s nothing special about that point; when I look at the lines of the logfile myself, I don’t see much difference.

I’m using the sample.conf file with only these changes:

  1. output directory
  2. Incremental yes (I read somewhere that this might help with this issue, but it didn’t)
  3. Really_quiet yes

The latter is because I was getting a number of “user name truncated” messages, but my logfile doesn’t even have usernames, first 411 lines or not.

Example line 407: - - [24/Sep/2010:17:42:27 -0400] "GET /home/ HTTP/1.1" 200 13382 "http://intapp/task5394" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv: Gecko/20100914 Firefox/3.6.10"

Example line 435: [24/Sep/2010:18:20:17 -0400] "GET /home/ HTTP/1.1" 200 11644 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv: Gecko/20100914 Firefox/3.6.10"

Example suppressed warning:

Skipping bad record (3639)
Warning: Truncating oversized username

What am I doing wrong here?

My answer:

Your two example lines are in different log formats. Webalizer expects the first format throughout the file, so it can’t parse lines written in the second.

In the second example, fields 2 and 3 (each of which is a - here) have been removed.

You have a couple of options: you can edit the log file to restore the missing fields, or you can change webalizer’s configuration to ignore the missing fields. Either way, you’ll almost certainly have to split the log file at the point where the format changes to work with it.
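
As a rough illustration of the first option, here is a minimal Python sketch that locates the line where the format changes and pads the shortened lines back out with "-" placeholders. It is not part of webalizer; the function names are made up for this example, and it assumes the only difference between the two formats is the number of fields in front of the bracketed timestamp.

```python
import re

# Matches the bracketed timestamp, e.g. [24/Sep/2010:17:42:27 -0400]
TIMESTAMP = re.compile(r"\[\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]")


def fields_before_timestamp(line):
    """Count whitespace-separated fields that precede the timestamp.

    Returns None when the line has no recognizable timestamp.
    """
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    return len(line[:match.start()].split())


def find_format_change(lines):
    """Return the 1-based number of the first line whose field count
    differs from that of the first parseable line, or None if it never changes."""
    expected = None
    for number, line in enumerate(lines, start=1):
        count = fields_before_timestamp(line)
        if count is None:
            continue  # skip blank or unparseable lines
        if expected is None:
            expected = count
        elif count != expected:
            return number
    return None


def pad_missing_fields(line, expected):
    """Add "-" placeholders before the timestamp until `expected` fields
    precede it (hypothetical repair; assumes the missing fields were "-")."""
    match = TIMESTAMP.search(line)
    if match is None:
        return line
    head = line[:match.start()].split()
    if len(head) >= expected:
        return line  # already in the longer format
    head += ["-"] * (expected - len(head))
    return " ".join(head) + " " + line[match.start():]
```

Because `find_format_change` accepts any iterable of lines, you can pass it an open file object and it will read a 900MB log one line at a time rather than loading it all into memory. The line number it returns is where you would split the file, or from which you would start applying `pad_missing_fields`.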

View the full question and any other answers on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.