Christopher Schmidt wrote:
On Wed, Jun 25, 2008 at 04:06:28PM +0100, David Cantrell wrote:
Yes, what you're missing is that Google don't pay attention to robots.txt or the meta thingy. I expect that they cache it and then ignore changes for some time. Yahoo do the same.
Er, you seem to be misunderstanding how meta tags work: they have to *crawl* the page to see the tags... and there is no tag that says "never crawl this page again."
You seem to be misunderstanding the concept of a cache. If they read the meta tag once, they should remember what it said for a while, AND OBEY IT without asking for that page again. Likewise robots.txt.
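To make the caching argument concrete, here is a minimal sketch of a crawler-side robots.txt cache that remembers the rules for a while and obeys them without re-fetching. It is not how Google or Yahoo actually do it; the class name `CachedRobots`, the `fetch_text` callable, and the one-day TTL are all assumptions made for illustration. Only `urllib.robotparser` from the Python standard library is real.

```python
import time
from urllib.robotparser import RobotFileParser


class CachedRobots:
    """Hypothetical per-site robots.txt cache with a time-to-live."""

    def __init__(self, fetch_text, ttl_seconds=86400, clock=time.time):
        # fetch_text(robots_url) -> str is supplied by the caller (assumption),
        # so the sketch does not touch the network on its own.
        self._fetch_text = fetch_text
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # robots_url -> (fetched_at, parser)

    def _parser_for(self, robots_url):
        now = self._clock()
        entry = self._cache.get(robots_url)
        # Re-fetch only when nothing is cached or the cached copy has expired.
        if entry is None or now - entry[0] >= self._ttl:
            parser = RobotFileParser()
            parser.parse(self._fetch_text(robots_url).splitlines())
            entry = (now, parser)
            self._cache[robots_url] = entry
        return entry[1]

    def allowed(self, user_agent, robots_url, page_url):
        # Obey whatever the cached rules say, without asking the site again.
        return self._parser_for(robots_url).can_fetch(user_agent, page_url)
```

With a cache like this, a robots.txt added after the first visit is only noticed once the TTL runs out, so the TTL bounds how long stale rules can be followed.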
I've never had Google violate robots.txt.
Lucky you. I had them start crawling one of my sites, so I added a robots.txt, but they kept coming. I can understand them keeping going for a day or so cos they cached the fact that I didn't have a robots.txt file, but they were still requesting files other than robots.txt well over a month later. I hope all their programmers' children die in a fire.
The key thing to point to would be an instance of Google search results containing a piece of HTML that is blocked by noindex. If you can find one of those, I bet that Google would be interested in seeing it. (Cheap tricks like modifying the HTML after Google crawls by don't count.)
I have no interest in helping Google. My time is better spent by taking a few seconds to block their abusive bot than it is in figuring out how to contact the right person and convincing him that he's fucked up and then waiting months while he gets manglement approval to deploy a bugfix, while all the time the bot continues to prevent real users from having access to the service.
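For anyone wanting to do the same, a block of this kind can be as small as the sketch below. The post does not say how the bot was actually blocked, so this is just one possible approach, assuming the service is a Python WSGI application; `block_user_agents` and the substrings passed to it are hypothetical names chosen for the example.

```python
def block_user_agents(app, blocked_substrings):
    """Wrap a WSGI app and refuse requests whose User-Agent matches."""
    blocked = [s.lower() for s in blocked_substrings]

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(s in user_agent for s in blocked):
            # Answer cheaply instead of letting the bot reach the real service.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

Wrapping the application as `app = block_user_agents(app, ["examplebot"])` would turn away any client whose User-Agent contains that string. A bot that lies about its User-Agent gets through, so this only helps against crawlers that identify themselves.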