Cache poisoning - dirty SEO

Looking through the logs of my sites, I see very strange referrers from search engines pointing to nonexistent pages. The sites in question generate their pages from databases and use mod_rewrite (or a similar technique) to create search-engine-friendly URLs.

Here is an example of how it works, in case you haven't seen or used pages like this:

When you enter a URL, e.g. http://domain.tld/whatever_product-review.html, the webserver translates it
into something like reviews.php?product=whatever product
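
To make the translation concrete, here is a minimal sketch in Python of what such a rewrite rule does. The pattern and the function name rewrite are my own assumptions for illustration, not the actual mod_rewrite configuration of any site. Note that the rule matches ANY word in front of "-review.html"; it never checks whether the product exists.

```python
import re

# Hypothetical pattern mirroring the example URL: /<slug>-review.html
REVIEW_URL = re.compile(r"^/(?P<slug>[^/]+)-review\.html$")


def rewrite(path):
    """Return the internal script URL for a friendly path, or None if no rule matches."""
    match = REVIEW_URL.match(path)
    if match is None:
        return None
    # Underscores in the friendly URL stand for spaces in the product name.
    product = match.group("slug").replace("_", " ")
    return "reviews.php?product=" + product
```

A request for /evil-keyword-review.html passes through this rule just as happily as a request for a real product, which is exactly the opening described below.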

How does that pose a danger, and how does it give evil webmasters a perfect way to get your site's results completely messed up?

Well, if your site is programmed to protect against that kind of poisoning, there is no reason to fear. But I have seen that most sites do not handle the problem and end up at the bottom of the search engines' dustbin.


Most people (including myself) would first generate a header, a title and even a whole page before making the first database lookup. So it is basically enough to link to http://domain.tld/evil-keyword-review.html to generate a page full of errors and a bunch of unwanted keywords on your site.

So you will have "evil keyword prices", "evil keyword reviews", "buy evil keyword" etc. on your pages before you even notice it.

Besides the evil keywords appearing on your site, you will also end up with a bunch of identical pages with identical errors, which clearly shows search engines that your pages are generated from a database and may be nothing more than search engine spam.

Technically, it is enough for any indexed evil domain to link to non-existent pages on your site, let's say 500 times, to have a terrible effect on your search engine listings.

So here is my tip: TAKE THE TIME and do a QUERY in whatever database you are getting your results from BEFORE GENERATING THE PAGE. In case the review/product/prices are not found, do yourself and your visitors a favour: give a polite 404, then point them to the main page or offer a relevant search for the keywords (unless the keyword is in your EVIL KEYWORDS DATABASE), but DO NOT generate a page.
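
The tip above can be sketched roughly like this in Python. Everything here is an assumption made for illustration: PRODUCTS and EVIL_KEYWORDS are in-memory stand-ins for your real database tables, and handle_review and the /search?q= link are made-up names. The point is only the order of operations: look up first, build the page second.

```python
from urllib.parse import quote_plus

# Stand-ins for the real database; both are hypothetical.
PRODUCTS = {"whatever product": "A solid product with a few rough edges."}
EVIL_KEYWORDS = {"evil-keyword"}


def handle_review(product):
    """Return (status, body); look the product up BEFORE building any page."""
    review = PRODUCTS.get(product)
    if review is None:
        # Nothing found: answer 404 and do not echo the requested keyword
        # into a title or heading.
        body = "Sorry, that page does not exist. Main page: /"
        if product not in EVIL_KEYWORDS:
            # Offer a search only for keywords that are not blacklisted.
            body += "\nTry a search instead: /search?q=" + quote_plus(product)
        return 404, body
    # Only now is it safe to generate the header, title and content.
    title = product + " reviews"
    return 200, title + "\n" + review
```

With this order, a link to a nonexistent product gets a 404 instead of a freshly generated page, and a blacklisted keyword never shows up anywhere in the response.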

How many sites are affected? Hmm... zillions?

Lots of work to do

publishing date: Mon, 10 Oct 2005 20:13:09 -0600

