My lack of posting and my failure to follow through on promised posts have been noted by Sol over on Federated Search Blog. I wish I had a good excuse, but I really don’t (aside from my usual excuses: school, work, baby). So… sorry about that. 😐
Things are going to be hectic for the next couple of weeks as the semester winds down, but after that, my schedule should improve tremendously. One of the things I’ll be reviving is my posts on Deep-Web crawling. Despite being quiet about it, I’ve actually been very active in this area lately. I just gave a 45-minute talk on the topic yesterday (if anyone wants to see the slides, let me know and I’ll post them), and I feel about 200% more knowledgeable on the topic than I did when I wrote the prototype version of DeepCrawler.NET. I’m still not sure when I’ll have time to pick that project back up and do it justice, but I will pick it up long enough to properly document its usage and post it on the blog for the world to make fun of. I am still very interested in the topic, too, so if anyone has anything they’d like to share, please let me know.
Aside from that, I am also planning on showing off a fully functional, front-to-back information retrieval system that I’ve created this semester for another class. It uses Lucene for the actual index, but it has a complete pipeline built around it to enable distributed indexing, including a document partitioner, indexers, and a simple search interface.
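Until I post the real thing, here is a rough toy sketch of the general shape of a pipeline like that: a partitioner, some indexers, and a search step that merges results. To be clear, this is not my actual system. It is plain Python with a tiny in-memory inverted index standing in for Lucene, and the hash-based partitioning scheme and every name in it (`partition`, `Indexer`, `build`, `search`) are made up purely for illustration.

```python
import hashlib
from collections import defaultdict


def partition(doc_id, num_indexers):
    """Hypothetical partitioner: map a document ID to an indexer slot
    using a stable hash, so the same ID always lands on the same indexer."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_indexers


class Indexer:
    """Toy stand-in for one index shard (an in-memory inverted index,
    not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document IDs

    def add(self, doc_id, text):
        # Naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return self.postings.get(term.lower(), set())


def build(docs, num_indexers):
    """Route each document to the indexer chosen by the partitioner."""
    indexers = [Indexer() for _ in range(num_indexers)]
    for doc_id, text in docs.items():
        indexers[partition(doc_id, num_indexers)].add(doc_id, text)
    return indexers


def search(indexers, term):
    """Simple search interface: query every shard and merge the hits."""
    results = set()
    for indexer in indexers:
        results |= indexer.search(term)
    return sorted(results)
```

Calling `build(docs, 3)` on a dictionary of document IDs and text gives back three shards, and `search(indexers, "deep")` returns a sorted list of matching IDs regardless of which shard each document ended up on.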