Category: DeepCrawler.NET

Crawling results in DeepCrawler.NET

In the last post, I laid out DeepCrawler.NET’s (primitive) strategy for finding search forms, populating them, and submitting their contents using WatiN and a heuristic search mechanism. As I mentioned at the end of that post, though, submitting a query is only the first step in a complicated process. Assuming nothing goes wrong, submitting…


DeepCrawler.NET: Alive and Kicking

Much to my surprise, getting DeepCrawler.NET up and working with basic functionality was easy. It’s far from finished, and I haven’t exhaustively tested it, but it does work. In this post, I’ll describe the current implementation with respect to how I’ve addressed some of the barriers raised in my last post. How do we…


Deep-web crawling with .NET: Getting Started

Thanks go out to Sol over at FederatedSearchBlog.com for giving me some suggestions on things to watch out for. If you want more background information on federated search or information retrieval, go check out that site. In the last post, I introduced the idea of creating a deep-web crawler. I laid out the basic…


Creating a deep-web crawler with .NET: Background

For one of my graduate courses, I’ve decided to tackle the task of creating an intelligent agent for deep-web (AKA hidden-web) crawling.  Unlike traditional crawlers, which work by following hyperlinks from a set of seed pages, a deep-web crawler digs through search interfaces and forms on the Internet to access the data underneath, data that…
