- We spider blogs and match up their links to your blog; a link to any page on your blog counts
- In the inbound blog list, we use the outbound links from the blog homepage, not from the archives
- We do process RSS feeds and other metadata, but that doesn't affect your inbound blog stats.
- Nightly, we go through the database and re-calculate the number of inbound blogs and links, which helps us double-check our work and also allows us to create the interesting newcomers list, the interesting recent blogs list, etc.
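To make the counting step concrete, here is a minimal sketch in Python of how inbound blogs and inbound links could be tallied from homepage links. It is only an illustration, not the service's actual code: the names `LinkExtractor`, `host_of`, and `count_inbound` are hypothetical, and it assumes that a blog is identified by its hostname and that the homepage HTML has already been downloaded.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the absolute URL of every <a href=...> on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def host_of(url):
    """Assumption: one hostname == one blog; a leading 'www.' is ignored."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def count_inbound(homepages):
    """homepages maps a blog's homepage URL to that page's HTML.

    Returns {target_host: (inbound_blogs, inbound_links)}. Only homepage
    links are considered, and a link to any page on the target counts.
    """
    linking_blogs = defaultdict(set)
    link_totals = defaultdict(int)
    for home_url, html in homepages.items():
        source = host_of(home_url)
        parser = LinkExtractor(home_url)
        parser.feed(html)
        for link in parser.links:
            target = host_of(link)
            # Skip links without a host (e.g. mailto:) and self-links.
            if target and target != source:
                linking_blogs[target].add(source)
                link_totals[target] += 1
    return {t: (len(linking_blogs[t]), link_totals[t]) for t in linking_blogs}
```

A nightly job along these lines would rebuild the totals from scratch, which is what makes it usable as a double-check on the incrementally maintained numbers.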
I have seen that some of the spiders were written in Ruby, Python, and Perl. Take a look at this article, "A web crawler in Perl"; it may be a good lead.