Indexing Robot Crawler Checklist

Topic: Technology 10:33 pm EST, Dec  1, 2005

This document provides both technical information and some background and insight into what search engine indexing robots should expect to encounter. Technically, the problems arise from misunderstandings and exploitation of anomalies by HTML creators (direct tagging, WYSIWYG and automated systems), and from the tendency of browser applications to be very forgiving in their interpretation of pages and links. Therefore, it's impossible to simply read the HTML and HTTP specifications and follow the rules there -- the real world is much messier than that.

I wish I had found this about 4 months ago! Easily the best checklist of the various issues and practical solutions you will face when writing a web crawler.
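
To give a flavor of the messiness, here is a minimal sketch of a forgiving link extractor in Python. It is not taken from the checklist. The names LinkExtractor and extract_links, the example.com URLs, and the choice of which hrefs to skip are all my own assumptions for illustration. It leans on the standard library's html.parser, which tolerates uppercase tags, unquoted attribute values, and unclosed elements instead of rejecting the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, tolerating sloppy markup.

    Hypothetical helper for illustration only.
    """

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us tag and attribute names already lowercased,
        # so <A HREF=...> and <a href=...> look the same here.
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            href = value.strip()  # stray whitespace inside quotes is common
            # Assumption: a crawler has no use for these kinds of links.
            if href.lower().startswith(("#", "javascript:", "mailto:")):
                continue
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            if absolute and absolute not in self.links:
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return absolute, de-duplicated links found in html."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    messy = "<A HREF=page2.html>two<a href=' /about/ '>about<a href=\"#top\">top</a>"
    for link in extract_links(messy, "http://example.com/dir/index.html"):
        print(link)
```

A real crawler would need much more than this (robots.txt, redirects, character encodings, the base tag, and so on), which is exactly the ground the checklist covers.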




 
 