In the last post, I laid out DeepCrawler.NET’s (primitive) strategy for finding search forms, populating them, and submitting their contents using WatiN and a heuristic search mechanism. As I mentioned at the end of that post, though, submitting a query is only the first step in a complicated process. Assuming nothing goes wrong, submitting…
try-catch-FAIL
Category: DeepCrawler.NET
DeepCrawler.NET: Alive and Kicking
Much to my surprise, getting DeepCrawler.NET up and working with basic functionality was easy. It’s far from finished, and I haven’t exhaustively tested it, but it does work. In this post, I’ll describe the current implementation and how it addresses some of the barriers raised in my last post. How do we…
Deep-web crawling with .NET: Getting Started
Thanks go out to Sol over at FederatedSearchBlog.com for giving me some suggestions on things to watch out for. If you want more background information on federated search or information retrieval, go check out that site. In the last post, I introduced the idea of creating a deep-web crawler. I laid out the basic…
Creating a deep-web crawler with .NET: Background
For one of my graduate courses, I’ve decided to tackle the task of creating an intelligent agent for deep-web (AKA hidden-web) crawling. Unlike traditional crawlers, which work by following hyperlinks from a set of seed pages, a deep-web crawler digs through search interfaces and forms on the Internet to access the data underneath, data that…