Apache Logging Filter Robots

[problem]

Sick of wading through loads of logs, or just trying to tell real hits from the robots? 🙂

Seriously reduce your Apache web logs by filtering out images, style sheets and your own hits.

[/problem]

[solution]

Simple with Apache’s CustomLog and SetEnvIf directives.

I’ve also included a separate log that captures the user-agent along with the referer, which is brill for seeing which Google searches brought traffic to you.
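As a rough illustration of mining that referer log, here is a minimal Python sketch. It assumes each line follows the agents log format from this post (host, then the referer and user-agent in bracketed quotes); the function name `search_terms` and the sample line are made up for the example, and many search engines no longer pass the query in the referer, so expect plenty of lines to yield nothing.

```python
import re
from urllib.parse import urlparse, parse_qs

# Matches one line of the agents log: host ["referer"] ["user-agent"]
AGENT_LINE = re.compile(r'^(\S+) \["(.*?)"\] \["(.*)"\]$')


def search_terms(log_line):
    """Return the Google search phrase found in the referer, or None."""
    match = AGENT_LINE.match(log_line.strip())
    if match is None:
        return None  # line is not in the expected format
    parsed = urlparse(match.group(2))
    host = parsed.hostname or ""
    if "google." not in host:
        return None  # referer is missing or is not a Google page
    terms = parse_qs(parsed.query).get("q")
    return terms[0] if terms else None


if __name__ == "__main__":
    # Hypothetical sample line, not taken from a real log
    sample = '203.0.113.7 ["http://www.google.com/search?q=apache+logging+filter"] ["Mozilla/5.0"]'
    print(search_terms(sample))
```

Run it over `logs/access_log.agents.techieblogs` line by line and count the phrases that come back to see which searches are sending visitors.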

You can even still capture robot hits and your own hits in separate logs; here is how.

[/solution]

[example]

    # Tag static assets so they stay out of the main log
    SetEnvIf Request_URI "\.(png|gif|jpg|js|css)" image-req
    SetEnvIf Request_URI "favicon\.ico" image-req
    SetEnvIf Request_URI "/icons" image-req
    SetEnvIf Request_URI "sitemap\.xml\.gz" image-req

    # Tag your own hits (swap 127.0.0.1 for your own address)
    SetEnvIf Remote_Addr "127\.0\.0\.1" image-req
    SetEnvIf Remote_Addr "127\.0\.0\.1" home-req

    # Tag robots
    SetEnvIf User-Agent "(Googlebot|msnbot|Spider|crawl|slurp|Jeeves|Mediapartners|FeedBurner)" image-req
    SetEnvIf User-Agent "(Googlebot|msnbot|Spider|crawl|slurp|Jeeves|Mediapartners|FeedBurner)" bot-req

    # Main log and agents log skip anything tagged image-req; bots and own hits get their own logs
    CustomLog logs/access_log.techieblogs "[\"%{Referer}i\"]\n %h %l %u %t \"%r\" %>s %b" env=!image-req
    CustomLog logs/access_log.agents.techieblogs "%h [\"%{Referer}i\"] [\"%{User-agent}i\"]" env=!image-req
    CustomLog logs/access_log.bots.techieblogs "[\"%{Referer}i\"] %h %l %u %t \"%r\" %>s %b" env=bot-req
    CustomLog logs/access_log.home.techieblogs "[\"%{Referer}i\"]\n %h %l %u %t \"%r\" %>s %b" env=home-req
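If you want to sanity-check the patterns before reloading Apache, the same rules can be mimicked offline. The Python sketch below is only an approximation: it copies the patterns from the SetEnvIf lines (with literal dots escaped and no space before Mediapartners), uses Python's `re` rather than Apache's regex engine, and the function names `env_flags` and `target_logs` are invented for the example.

```python
import re

# Patterns mirroring the SetEnvIf lines above (literal dots escaped)
ASSET_URI = re.compile(r"\.(png|gif|jpg|js|css)|favicon\.ico|/icons|sitemap\.xml\.gz")
BOT_AGENT = re.compile(r"(Googlebot|msnbot|Spider|crawl|slurp|Jeeves|Mediapartners|FeedBurner)")
HOME_ADDR = re.compile(r"127\.0\.0\.1")


def env_flags(uri, remote_addr, user_agent):
    """Return the environment variables the SetEnvIf rules would set."""
    flags = set()
    if ASSET_URI.search(uri):
        flags.add("image-req")
    if HOME_ADDR.search(remote_addr):
        flags.update({"image-req", "home-req"})
    if BOT_AGENT.search(user_agent):
        flags.update({"image-req", "bot-req"})
    return flags


def target_logs(uri, remote_addr, user_agent):
    """Return the log files whose env= condition this request satisfies."""
    flags = env_flags(uri, remote_addr, user_agent)
    logs = []
    if "image-req" not in flags:
        logs += ["access_log.techieblogs", "access_log.agents.techieblogs"]
    if "bot-req" in flags:
        logs.append("access_log.bots.techieblogs")
    if "home-req" in flags:
        logs.append("access_log.home.techieblogs")
    return logs


if __name__ == "__main__":
    # Hypothetical requests: a normal visitor, a stylesheet, and a robot
    print(target_logs("/index.html", "203.0.113.7", "Mozilla/5.0"))
    print(target_logs("/style.css", "203.0.113.7", "Mozilla/5.0"))
    print(target_logs("/index.html", "203.0.113.7", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

Feeding it a few URIs and user-agent strings from an existing log is a quick way to spot a pattern that catches too much or too little.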

[/example]

[reference]

[tags]Apache Logging, Unix Coding School[/tags]

[/reference]

