blogger labels and robots.txt
after blogger introduced labels, i noticed that google was treating my label pages as the originals and the dedicated post pages as duplicates (and possibly as attempts to rig google page ranking). i'm not much concerned about page ranking, since this site's mainly a personal diary and yackfest i long ago stopped trying to whore traffic for, but it was a pain in the ass whenever i searched for something and google led me to the label page instead of the post. since i hide much of the front page text for the purpose of post continuations on the post page (where the full post's revealed), i regularly ended up searching the html source instead of the post itself. pain in the neck; i google this site plenty when writing new posts.
not a fan of clutter traffic. i lock down images (the biggest traffic draw to this site) so that much of the time when somebody clicks on a link from google, it brings up a nonsensical error page (i think that denial only works if the pic hasn't already loaded into their cache). experienced webbers know how to get around that easily and immediately, but it helps, and it also cuts 99% of hotlinking. with google indexing the new blogger labels, the potential for bogus hits increased radically, and through the recent addition of sitemeter.com i found that many of the hits here were garbage, connecting unrelated posts just because they appear on the same label page.
blogger solved this problem for blogspot users by adding a robots.txt disallowing the search folder of every site. i solved my problem via robots.txt by blocking the labels folder. while making mods, i zapped all the old archive pages and routed them to a new folder, blocking that as well, since those also tend to return irrelevant hits, combining all the post titles for a month into one page.
added a sitemap for the hell of it, but i'm probably going to delete it. don't see that it does anything for a site like this except add one more maintenance hassle. the value of sitemaps seems to be a big argument, with some alleging that a sitemap can make things worse. don't know about that, but i do know that the whole time i was compiling the sitemap (because google and godaddy sorta made it sound like a great thing for everybody), i was wondering if there were even the slightest possibility it would improve google searches here. seems google's been doing a helluva job without me pointing out where my pages are, with the only problem being, as explained above, that it saw too many pages. i don't have any flash, relevant java, etc., and haven't yet found a persuasive argument for why i should bother with a sitemap.
if you're an FTP blogger user and wondering why labels or your archive have whacked out your search results, give robots.txt a whirl. relevant portion of mine:
User-agent: *
Disallow: /blog/archive/
Disallow: /blog/labels/
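if you want to sanity-check rules like these before uploading, here's a rough sketch using python's standard urllib.robotparser. the helper name is_allowed and the example page paths are made up for illustration (they're not real pages on this site), and this only shows how python's parser reads the rules; real crawlers may interpret things a little differently.

```python
from urllib.robotparser import RobotFileParser

# the same rules as the excerpt above
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/archive/
Disallow: /blog/labels/
"""


def is_allowed(path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT.

    is_allowed is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # hypothetical paths: a label page, an archive page, a post page
    for path in (
        "/blog/labels/hints.html",
        "/blog/archive/2007_01.html",
        "/blog/2007/01/some-post.html",
    ):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

the label and archive paths should come back blocked and the post path allowed, since only the two folders are disallowed.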
Labels: hints from pigloise







