
WebSPHINX (Website-Specific Processors for HTML INformation eXtraction) is a Java class library and interactive development environment for web crawlers. A web crawler (also called a robot or spider) is a program that automatically browses and processes web pages.
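To make "browses and processes web pages automatically" concrete, here is a minimal breadth-first crawler sketch. It is not the WebSPHINX API. The class `CrawlerSketch`, its `shouldVisit`/`visit` hooks and the `maxPages` limit are hypothetical names chosen for this illustration. To keep it runnable offline, it assumes an in-memory map from URL to HTML in place of real HTTP fetching, and it uses a crude `href="..."` regular expression instead of a real HTML parser.

```java
import java.util.*;
import java.util.regex.*;

/**
 * Minimal breadth-first crawler sketch (hypothetical; NOT the WebSPHINX API).
 * "Fetching" reads from an in-memory URL -> HTML map so the example runs offline.
 */
public class CrawlerSketch {
    // Crude link extractor: captures the value of href="..." attributes.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    private final Map<String, String> web; // assumed stand-in for the network
    private final int maxPages;            // stop after visiting this many pages

    public CrawlerSketch(Map<String, String> web, int maxPages) {
        this.web = web;
        this.maxPages = maxPages;
    }

    /** Hook: decide whether a newly discovered link should be followed. */
    protected boolean shouldVisit(String url) {
        return web.containsKey(url); // only follow links we can "fetch"
    }

    /** Hook: process a fetched page; this default just records its URL. */
    protected void visit(String url, String html, List<String> visited) {
        visited.add(url);
    }

    /** Crawls breadth-first from root and returns URLs in visit order. */
    public List<String> crawl(String root) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();        // never enqueue a URL twice
        Deque<String> frontier = new ArrayDeque<>(); // FIFO queue => breadth-first
        frontier.add(root);
        seen.add(root);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            String html = web.get(url);
            if (html == null) continue; // simulated fetch failure
            visit(url, html, visited);
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String link = m.group(1);
                if (seen.add(link) && shouldVisit(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, String> web = new HashMap<>();
        web.put("/a", "<a href=\"/b\">b</a> <a href=\"/c\">c</a>");
        web.put("/b", "<a href=\"/a\">a</a> <a href=\"/d\">d</a>");
        web.put("/c", "no links here");
        web.put("/d", "<a href=\"/missing\">gone</a>");
        System.out.println(new CrawlerSketch(web, 10).crawl("/a")); // [/a, /b, /c, /d]
    }
}
```

The hooks separate the traversal loop from per-page policy. A subclass could override `shouldVisit` to stay on one site, or override `visit` to extract text, without touching the queue logic. Cycles such as `/a` ↔ `/b` are handled by the `seen` set.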

WebSPHINX consists of two parts: the Crawler Workbench and the WebSPHINX class library.

Crawler Workbench

The Crawler Workbench is a graphical user interface that lets you configure and control a customizable web crawler. Using the Crawler Workbench, you can:

WebSPHINX class library

The WebSPHINX class library provides support for writing web crawlers in Java. The class library offers a number of features:

URL: http://www.cs.cmu.edu/~rcm/websphinx/
Licence: Apache License
