Reading the Apache log
http2pl reads the Apache access log and turns it into Prolog facts such as:
host_name("80.142.182.188","p508EB6BC.dip.t-dialin.net").
http_get("p508EB6BC.dip.t-dialin.net","Apr-25"," 00:56:21",/dns.html).
http_from("http://mserv.rrzn.uni-hannover.de/cgi-bin/meta/meta.ger1","Apr-25"," 00:56:21","p508EB6BC.dip.t-dialin.net").
This means that host p508EB6BC.dip.t-dialin.net, with IP address 80.142.182.188, requested /dns.html on April 25 at 00:56:21 and was led there from http://mserv.rrzn.uni-hannover.de/cgi-bin/meta/meta.ger1.
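To make the mapping from log line to facts concrete, here is a rough sketch in awk of the kind of conversion involved. It is not http2pl's code. It assumes the log uses Apache's predefined "combined" format (client, identity, user, [timestamp zone], "request", status, bytes, "Referer", "User-Agent"), ignores the time zone, and skips the host-name lookup. As a result, the client IP appears where http2pl writes the host name, and no host_name facts are produced. The function name log2facts is hypothetical.

```shell
# Hypothetical sketch, not http2pl itself: convert Apache "combined"
# log lines read from stdin into facts shaped like the ones above.
# Assumption: no DNS lookups, so the client IP stands in for the host
# name and no host_name/2 facts are produced.
log2facts() {
  awk '{
    host = $1
    # $4 is e.g. [25/Apr/2005:00:56:21 (the time zone in $5 is ignored)
    ts = $4
    sub(/^\[/, "", ts)
    # t[1]=day t[2]=month t[3]=year t[4..6]=hours, minutes, seconds
    split(ts, t, "[/:]")
    stamp_date = t[2] "-" t[1]
    stamp_time = " " t[4] ":" t[5] ":" t[6]
    # the request "GET /dns.html HTTP/1.1" is split over $6..$8
    path = $7
    printf "http_get(\"%s\",\"%s\",\"%s\",%s).\n", host, stamp_date, stamp_time, path
    # $11 is the quoted Referer header; "-" means there was none
    ref = $11
    gsub(/"/, "", ref)
    if (ref != "-")
      printf "http_from(\"%s\",\"%s\",\"%s\",\"%s\").\n", ref, stamp_date, stamp_time, host
  }'
}
```

Usage would be something like log2facts < access_log20050403 > sketch.pl. Because it splits on whitespace, a malformed request line or a referrer containing spaces would shift the fields. A real converter has to parse the quoted fields properly.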
Sorting makes a lot of sense here. Every fact starts with its predicate name, so sorting puts all facts of one kind together. Afterwards, the visited pages (http_get) are in one heap, and the pages that directed people to my homepage (http_from) are in another.
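Because each fact starts with its predicate name, the two heaps can be pulled out of the sorted file by matching the start of the line. The following is a minimal sketch using standard tools. The function names visited_pages and referrers are hypothetical and not part of http2pl. The first counts how often each page was requested, and the second counts how often each referring page sent someone here.

```shell
# Hypothetical helpers for a sorted fact file.

# Pages that were requested, most popular first.
# The path is the last argument of http_get(Host,Date,Time,Path).
visited_pages() {
  sed -n 's/^http_get(.*,\([^,]*\))\.$/\1/p' "$1" | sort | uniq -c | sort -rn
}

# Pages that referred visitors here, most frequent first.
# The referring URL is the first double-quoted field of http_from(...).
referrers() {
  grep '^http_from(' "$1" | cut -d'"' -f2 | sort | uniq -c | sort -rn
}
```

For example: referrers access_log20050403.sort. Both helpers do their own sort | uniq -c, so they would also work on the unsorted .pl file. The sort mainly makes the file itself easier to browse.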
My Apache runs without looking up host names, so the original log contains only IP addresses. http2pl looks the host names up, which may take some time. That is why I normally start http2pl as a background job and let IASON do its work:
http2pl < access_log20050403 > access_log20050403.pl &
Once http2pl has finished, I sort the result:
sort access_log20050403.pl > access_log20050403.sort
It may be interesting to see whether I have repeat visitors, for example robots that come back for robots.txt:
grep robots *.pl
access.log.16.pl:http_get("crawl-66-249-64-7.googlebot.com","Apr-22"," 08:38:49",/robots.txt).
access.log.16.pl:http_get("lj1018.inktomisearch.com","Apr-22"," 12:47:33",/robots.txt).
access.log.16.pl:http_get("ns.ww.de.plusline.net","Apr-22"," 14:46:31",/robots.txt).
access.log.16.pl:http_get("sv-fw.looksmart.com","Apr-22"," 16:34:36",/robots.txt).
access.log.16.pl:http_get("cpe-66-65-225-179.nycap.res.rr.com","Apr-22"," 20:01:22",/robots.txt).
access.log.17.pl:http_get("lj1018.inktomisearch.com","Apr-25"," 07:56:46",/robots.txt).
access.log.17.pl:http_get("crawl-66-249-64-49.googlebot.com","Apr-25"," 08:02:22",/robots.txt).
access.log.17.pl:http_get("ns.ww.de.plusline.net","Apr-25"," 12:34:26",/robots.txt).
access.log.17.pl:http_get("shroud.64-242-88-10.crawlers.looksmart.com","Apr-25"," 14:00:21",/robots.txt).
grep searches all my processed log files and prefixes each match with the name of the file it was found in.
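To answer the repeat-visitor question directly instead of reading through the matches, the host names can be counted across all files. This is a minimal sketch, and the function name robot_visits is hypothetical. It relies on grep's -h option (available in GNU and BSD grep, though not in POSIX) to suppress the file-name prefix, so that only the fact itself is left.

```shell
# Hypothetical helper: list hosts that fetched /robots.txt more than once
# across the fact files given as arguments (e.g. robot_visits *.pl).
# Steps: drop the "file:" prefix (-h), take the host (first double-quoted
# field), count visits per host, keep repeat guests, most frequent first.
robot_visits() {
  grep -h '/robots\.txt)' "$@" |
    cut -d'"' -f2 |
    sort | uniq -c |
    awk '$1 > 1' |
    sort -rn
}
```

On the matches shown above, this would report lj1018.inktomisearch.com and ns.ww.de.plusline.net with two visits each. Googlebot and LookSmart, however, arrive under a different host name each time, so counting by exact host name undercounts them.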