Christopher Schmidt wrote:
On Wed, Jun 25, 2008 at 04:06:28PM +0100, David
Yes, what you're missing is that Google don't pay attention to
robots.txt or the meta thingy. I expect that they cache it and then
ignore changes for some time. Yahoo do the same.
Er, you seem to be misunderstanding how meta tags work: they have to
*crawl* the page to see the tags... and there is no tag that says "never
crawl this page again."
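To make that point concrete, here is a minimal sketch of how a crawler might read the robots meta tag. It only illustrates that the directives live inside the page body, so they are visible only after the page has been fetched. The names RobotsMetaParser and robots_directives are made up for this example and are not taken from any real crawler.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes, so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html_text):
    """Return the set of robots directives found in already-fetched HTML."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives
```

Note that robots_directives takes the page content as its argument: there is nothing to inspect until a request for the page has already been made.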
You seem to be misunderstanding the concept of a cache. If they read
the meta tag once, they should remember what it said for a while, AND
OBEY IT without asking for that page again. Likewise robots.txt.
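The caching behaviour described here can be sketched roughly as follows, using Python's standard urllib.robotparser. This is an illustration under assumptions, not how any real search engine works: the CachedRobots class, the injected fetch_text callable, and the one-day default lifetime are all invented for the example.

```python
import time
from urllib.robotparser import RobotFileParser


class CachedRobots:
    """Hypothetical sketch: remember robots.txt for ttl_seconds and obey it."""

    def __init__(self, fetch_text, ttl_seconds=86400, clock=time.time):
        self._fetch_text = fetch_text  # callable returning the robots.txt body
        self._ttl = ttl_seconds        # assumed lifetime of a cached copy
        self._clock = clock            # injectable so expiry can be simulated
        self._parser = None
        self._fetched_at = None
        self.fetch_count = 0           # how many times robots.txt was re-read

    def _refresh_if_stale(self):
        now = self._clock()
        if self._parser is None or now - self._fetched_at >= self._ttl:
            parser = RobotFileParser()
            parser.parse(self._fetch_text().splitlines())
            self._parser = parser
            self._fetched_at = now
            self.fetch_count += 1

    def allowed(self, user_agent, url):
        """Answer from the cached rules, re-reading only after expiry."""
        self._refresh_if_stale()
        return self._parser.can_fetch(user_agent, url)
```

With a cache like this, a newly added robots.txt is ignored until the cached copy expires, and it should be honoured from then on.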
I've never had Google violate robots.txt.
Lucky you. I had them start crawling one of my sites, so I added a
robots.txt, but they kept coming. I can understand them keeping going
for a day or so cos they cached the fact that I didn't have a robots.txt
file, but they were still requesting files other than robots.txt well
over a month later. I hope all their programmers' children die in a fire.
The key thing to point to would be an
instance of Google search results containing a piece of HTML that is
blocked by noindex. If you can find one of those, I bet that Google
would be interested in seeing it. (Cheap tricks like modifying the HTML
after Google crawls by don't count.)
I have no interest in helping google. My time is better spent by taking
a few seconds to block their abusive bot than it is in figuring out how
to contact the right person and convincing him that he's fucked up and
then waiting months while he gets manglement approval to deploy a
bugfix, while all the time the bot continues to prevent real users from
having access to the service.
header FROM_DAVID_CANTRELL From =~ /david.cantrell/i
describe FROM_DAVID_CANTRELL Message is from David Cantrell
score FROM_DAVID_CANTRELL 15.72 # This figure from experimentation