Jericho HTML Parser

Jericho HTML Parser is a simple but powerful Java library for analysing and manipulating parts of an HTML document, including some common server-side tags, while reproducing any unrecognised or invalid HTML verbatim. It also provides high-level functions for manipulating HTML forms.

The library distinguishes itself from other HTML parsers by the following major features:

  • No parse tree of the entire document is ever generated. The document source text is searched only for the markup relevant to the current operation. This allows the library to analyse and modify documents containing incorrect or badly formatted HTML or any other server or client side code, script, macro or markup. Most other parsers can't handle content that they are not explicitly programmed to accept.
  • The beginning and end positions in the source text of all parsed segments are accessible, allowing modification of only selected segments of the document without having to reconstruct the entire document from a parse tree. This feature, in combination with the one above, makes the toolkit extremely powerful in its simplicity.
  • A simple but comprehensive interface is provided for the analysis and manipulation of HTML form controls, including the extraction and population of initial values, and conversion to read-only or data display modes. Analysis of the form controls also allows data received from the form to be stored and presented in an appropriate manner.
  • ASP, JSP, PSP, PHP and Mason server tags can be registered for recognition by the parser, and are recognised as accurately as is possible without incorporating actual parsers for these languages into the library. The library then allows any of these segments to be ignored when parsing the rest of the document, so that they do not interfere with the HTML syntax (see Segment.ignoreWhenParsing()).
  • Custom tag types can be easily defined and registered for recognition by the parser.
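
The first two features (no full parse tree, and known begin and end positions for every segment) can be illustrated with a minimal sketch. This is not Jericho's API: it uses only the Java standard library, and the class SegmentEditSketch, its nested Segment type, the methods findLinks and rewriteLinks, and the deliberately naive regular expression are all hypothetical names invented for this illustration. It searches the source text only for the markup of interest, records the positions, and rewrites just those spans, so the badly formed HTML and the server tag around them are copied through untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SegmentEditSketch {

    /** A located span of the source text: [begin, end) plus the text found there. */
    public static final class Segment {
        public final int begin;
        public final int end;
        public final String text;

        Segment(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // Naive pattern for <a ... href="..."> start tags (an assumption of this
    // sketch; a real parser handles quoting and attributes far more carefully).
    private static final Pattern A_START_TAG = Pattern.compile(
            "<a\\b[^>]*?\\bhref=\"([^\"]*)\"[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Finds the href values of all <a> start tags, with their source positions. */
    public static List<Segment> findLinks(String source) {
        List<Segment> found = new ArrayList<>();
        Matcher m = A_START_TAG.matcher(source);
        while (m.find()) {
            found.add(new Segment(m.start(1), m.end(1), m.group(1)));
        }
        return found;
    }

    /** Prepends prefix to every href value; all other text is copied verbatim. */
    public static String rewriteLinks(String source, String prefix) {
        StringBuilder out = new StringBuilder(source.length() + 32);
        int last = 0;
        for (Segment s : findLinks(source)) {
            out.append(source, last, s.begin);   // untouched text before the segment
            out.append(prefix).append(s.text);   // replacement for the segment only
            last = s.end;
        }
        out.append(source, last, source.length()); // untouched tail
        return out.toString();
    }

    public static void main(String[] args) {
        // Unclosed tags and a server tag: nothing here needs to be well formed.
        String html = "<p>unclosed <b>bold <% server code %> <a href=\"/docs\">docs</a>";
        System.out.println(rewriteLinks(html, "https://example.org"));
    }
}
```

Because the output is assembled from slices of the original text, anything the sketch does not recognise is reproduced exactly, which is the property the feature list above describes.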

URL: http://jerichohtml.sourceforge.net/doc/index.html
Licence: LGPL


Copyright 2005 - 2008 www.java-tips.org
Java is a trademark of Sun Microsystems, Inc.