Search Engine

The Compass Framework is a first-class open source Java framework that brings search engine semantics to your application stack declaratively. Built on top of the amazing Lucene search engine, Compass integrates seamlessly with popular development frameworks such as Hibernate and Spring. It adds search capability to your application data model and synchronises changes with the data source. With Compass: write less code, find data quicker.

URL: http://www.compassframework.org
Licence: Apache License

Java Search Engine is a server-side search engine program for web sites. A search engine gives site visitors an easy and fast way to find what they want on your site. If you want a search engine on your site, you can try Java Search Engine. It is easy: just follow the instructions on this page.

Java Search Engine offers common Java API interfaces such as JSP, servlets and EJB. It can save results as XML and transform them into HTML using XSLT stylesheets.
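The XML-to-HTML step can be illustrated with the standard javax.xml.transform API that ships with the JDK. This is only a sketch: the result format (a results element containing hit elements with url and title attributes), the stylesheet and the class name ResultsToHtml are assumptions made for the example, not the format Java Search Engine actually produces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ResultsToHtml {
    // Hypothetical search result document; the real XML schema is not described here.
    static final String RESULTS_XML =
        "<results><hit url=\"http://example.org/a\" title=\"Page A\"/></results>";

    // Hypothetical stylesheet that renders each hit as a linked list item.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/results\">"
      + "<ul><xsl:for-each select=\"hit\">"
      + "<li><a href=\"{@url}\"><xsl:value-of select=\"@title\"/></a></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet to an XML string and returns the output as a string. */
    public static String transform(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(RESULTS_XML, STYLESHEET));
    }
}
```

Keeping the presentation in a stylesheet means the HTML layout can change without touching the code that produces the XML.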

Java Search Engine is a complete solution: you don't have to create a crawler for it, you don't have to install or integrate it with any database if you don't want to, and you don't have to use any additional software (except the JDK, of course). The search engine is familiar to your visitors, since it has the same query language and output interface as Google.

Features

  • Full featured text search engine software for web sites
  • Supports any operating system, completely in Java
  • Includes web crawler
  • It is FREE
  • Complete solution, no additional software required
  • Available as WAR (Web ARchive), servlet, JSP for Tomcat, Resin or other JSP engine
  • Simple installation using web interface
  • Available as EJB (Enterprise JavaBean) on J2EE Application Server
  • HTML, PDF, MS Word and plain text indexing
  • Available as Java API library
  • Supports incremental update, "hot update" and "hot rescan" (without stopping search), delete pages
  • International encodings support, multiple encodings in one storage, automatic detection
  • Stopwords and word stemming (suffix stripping) for every configured language
  • Uses the file system or a database (JDBC) for index storage
  • Supports META tags (description and keywords), image ALT tags, BASIC Authentication and forms crawling
  • Can store several separated sites in one index, even multilingual
  • Can group pages and limit search results to some group
  • Customizable page rank with options to boost on word position, number of appearances or by URL
  • Google style output, quotations with highlighted words, or you can use META tags to customize description
  • Google query language, includes AND, OR, "-", phrase, substrings and all possible combinations
  • Can transform results directly into XML, or HTML using XSLT

URL: http://www.me.lv/jse/
Licence: Proprietary

eSearch is a server-side, Java-based search engine which supplies basic search capabilities for Web use. Its basic capabilities can be extended to include intelligent agents and other expert-system behaviors. eSearch uses the Expresso Framework, so please also download and install Expresso.

URL: http://www.jcorporate.com/econtent/Content.do?state=template&template=2&resource=641&db=default
Licence: Apache License

This program is a result of my diploma thesis (with the same title). It is a distributed search engine, consisting of multiple search nodes which report their results to a master server. Each node should be responsible for indexing and querying a local node (ideally this is a single web server). The nodes are connected in a hierarchical way. Every super node can execute a query against its own index, and it can query all (or a subset) of its sub-nodes. This is done by determining the sub-nodes which can give the best results for the query.

It is possible to query every node directly, even if it is not the top-node. It will then use its own data and the data of its sub-nodes for answering the query.
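The hierarchical query flow described above can be sketched roughly as follows. Everything in this example is an assumption made for illustration: the class name SearchNode, the in-memory term-to-document map, and especially the routing rule (a sub-node is queried only if it or one of its descendants knows the term). The original program may choose sub-nodes quite differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SearchNode {
    private final String name;
    // Local index of this node: term -> documents containing it.
    private final Map<String, Set<String>> localIndex = new HashMap<>();
    private final List<SearchNode> subNodes = new ArrayList<>();

    public SearchNode(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void addSubNode(SearchNode node) {
        subNodes.add(node);
    }

    public void index(String term, String document) {
        localIndex.computeIfAbsent(term, k -> new TreeSet<>()).add(document);
    }

    /** Assumed routing heuristic: true if this node or any descendant has the term. */
    public boolean canAnswer(String term) {
        if (localIndex.containsKey(term)) {
            return true;
        }
        for (SearchNode node : subNodes) {
            if (node.canAnswer(term)) {
                return true;
            }
        }
        return false;
    }

    /** Answers from the local index, then merges results from promising sub-nodes. */
    public Set<String> query(String term) {
        Set<String> hits = new TreeSet<>(localIndex.getOrDefault(term, Collections.emptySet()));
        for (SearchNode node : subNodes) {
            if (node.canAnswer(term)) {
                hits.addAll(node.query(term));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SearchNode top = new SearchNode("top");
        SearchNode left = new SearchNode("left");
        SearchNode right = new SearchNode("right");
        top.addSubNode(left);
        top.addSubNode(right);
        left.index("lucene", "left/a.html");
        right.index("lucene", "right/b.html");
        right.index("spider", "right/c.html");
        System.out.println(top.query("lucene"));   // uses both sub-nodes
        System.out.println(right.query("lucene")); // a sub-node queried directly
    }
}
```

Because every node exposes the same query method, a query can start at any level of the hierarchy and only sees that node's own data and the data below it.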

Special features are:

  • distributed search engine
  • tolerant of spelling errors and other word forms
  • separated data server and data gatherer
  • can support many file formats via a plugin mechanism; currently supported are PDF, HTML, plain text, and ZIP and gzip files
  • uses any relational database (possibly with small changes because of differences in the SQL dialect); tested with InstantDB and Oracle Lite
  • can gather data via HTTP or from local file system
  • HTTP spider is resistant to loops (for example, when a document links to itself under a different path)
  • HTTP spider is resistant to HTML errors (such as missing quotation marks in parameters or unescaped ampersands)

For fault tolerance, a so-called "trigram index" is used. This index takes all trigrams (3-letter combinations) that a word contains and stores them in a reverse index. A second reverse index maps the words to the documents. This gives fast queries and tolerance of misspelled words (either in the searched documents or in the query). It can also find substrings in words.
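The two-level lookup can be sketched in a few lines of Java. This is a minimal in-memory illustration, not the program's actual implementation: the class name TrigramIndex, the minShared threshold and the simple word splitting are assumptions made for the example, and the real engine stores its indexes in a relational database.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TrigramIndex {
    // First reverse index: trigram -> words that contain it.
    private final Map<String, Set<String>> trigramToWords = new HashMap<>();
    // Second reverse index: word -> documents that contain it.
    private final Map<String, Set<String>> wordToDocs = new HashMap<>();

    /** Returns all 3-letter substrings of the lower-cased word. */
    public static Set<String> trigrams(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        Set<String> result = new LinkedHashSet<>();
        for (int i = 0; i + 3 <= w.length(); i++) {
            result.add(w.substring(i, i + 3));
        }
        return result;
    }

    /** Indexes every word of the text under the given document id. */
    public void add(String docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            wordToDocs.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            for (String t : trigrams(word)) {
                trigramToWords.computeIfAbsent(t, k -> new TreeSet<>()).add(word);
            }
        }
    }

    /** Finds documents with a word sharing at least minShared trigrams with the term. */
    public Set<String> search(String term, int minShared) {
        Map<String, Integer> sharedCount = new HashMap<>();
        for (String t : trigrams(term)) {
            for (String word : trigramToWords.getOrDefault(t, Collections.emptySet())) {
                sharedCount.merge(word, 1, Integer::sum);
            }
        }
        Set<String> docs = new TreeSet<>();
        for (Map.Entry<String, Integer> e : sharedCount.entrySet()) {
            if (e.getValue() >= minShared) {
                docs.addAll(wordToDocs.get(e.getKey()));
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        TrigramIndex index = new TrigramIndex();
        index.add("doc1", "A distributed search engine");
        index.add("doc2", "Plain text documents");
        System.out.println(index.search("searhc", 2));  // misspelled "search"
        System.out.println(index.search("distrib", 3)); // substring of "distributed"
    }
}
```

Both example queries find doc1: the misspelled "searhc" still shares the trigrams "sea" and "ear" with "search", and every trigram of "distrib" occurs in "distributed". A higher minShared value makes matching stricter, while a lower one tolerates more errors at the cost of more false hits.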

URL: http://www.hendriklipka.de/java/ldse.html
Licence: GPL

Spindle is a web indexing/search tool built on top of the Lucene toolkit. It includes an HTTP spider that is used to build the index, and a search class that is used to search the index. In addition, support is provided for the Bitmechanic listlib JSP TagLib, so that a search can be added to a JSP-based site without writing any Java classes.

URL: http://www.bitmechanic.com/projects/spindle/
Licence: GPL