Which jar contains org.htmlparser.Parser (htmlparser 1.6.jar)?

htmlparser.jar is a dedicated HTML parsing library. It is small and fast, and it covers the HTML parsing needs of beginners.
The core module of HTMLParser is the org.htmlparser.Parser class, which does the actual work of analyzing an HTML page. The class has the following constructors:

    public Parser ();
    public Parser (Lexer lexer, ParserFeedback fb);
    public Parser (URLConnection connection, ParserFeedback fb) throws ParserException;
    public Parser (String resource, ParserFeedback feedback) throws ParserException;
    public Parser (String resource) throws ParserException;
    public Parser (Lexer lexer);
    public Parser (URLConnection connection) throws ParserException;

and one static factory method:

    public static Parser createParser (String html, String charset);

Most users initialize a Parser from a URLConnection or from a string that holds the page content, or use the static method to create a Parser object. The ParserFeedback code is simple; it exists for debugging and tracing the parsing process and generally does not need to be changed. Using a Lexer is a more advanced topic and is left for later.

One interesting point: if you need to set the page encoding when creating the Parser and do not want to use a Lexer, the static method is the only option. For most Chinese pages this is probably the method that gets used most.

Below is an example of initializing a Parser.

    // package com.baizeju.…  (package name truncated in the source)

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.FileInputStream;
    import java.io.File;
    import java.net.HttpURLConnection;
    import java.net.URL;

    import org.htmlparser.visitors.TextExtractingVisitor;
    import org.htmlparser.Parser;

    public class Main {
        private static String ENCODE = "GBK";

        // Print a message, converting it from ENCODE to the platform encoding.
        private static void message( String szMsg ) {
            try {
                System.out.println(new String(szMsg.getBytes(ENCODE), System.getProperty("file.encoding")));
            } catch (Exception e) {}
        }

        // Read a whole file as text, decoding it with ENCODE; returns "" on failure.
        public static String openFile( String szFileName ) {
            try {
                BufferedReader bis = new BufferedReader(new InputStreamReader(new FileInputStream(new File(szFileName)), ENCODE));
                String szContent = "";
                String szTemp;

                while ((szTemp = bis.readLine()) != null) {
                    szContent += szTemp + "\n";
                }
                bis.close();
                return szContent;
            }
            catch (Exception e) {
                return "";
            }
        }

        public static void main(String[] args) {
            String szContent = openFile("E:/My Sites/HTMLParserTester.html");

            try {
                // Three alternative ways to initialize the Parser; enable one at a time.
                //Parser parser = Parser.createParser(szContent, ENCODE);
                //Parser parser = new Parser( szContent );
                Parser parser = new Parser((HttpURLConnection) (new URL("http://127.0.0.1:8080/HTMLParserTester.html")).openConnection());

                // Walk every node and collect the text of the page.
                TextExtractingVisitor visitor = new TextExtractingVisitor();
                parser.visitAllNodesWith(visitor);
                String textInPage = visitor.getExtractedText();
                message(textInPage);
            }
            catch (Exception e) {
                e.printStackTrace(); // report the failure instead of silently ignoring it
            }
        }
    }

The three Parser initializations in main (two of them commented out) test the different initialization methods; the code after them displays the result. For now it is enough to see that the Parser produces content; how to access and manipulate the parsed content is discussed later.

HTMLParser stores the parsed information as a tree. Node is the basic data type in which the information is stored. Here is the definition of Node:

    public interface Node extends Cloneable

The methods in Node fall into several groups.

Functions for traversing the tree structure, which are the easiest to understand:
    Node getParent (): get the parent node
    NodeList getChildren (): get the list of child nodes
    Node getFirstChild (): get the first child node
    Node getLastChild (): get the last child node
    Node getPreviousSibling (): get the previous sibling node
    Node getNextSibling (): get the next sibling node

Functions for getting the content of a Node:
    String getText (): get the text
    String toPlainTextString (): get the plain-text content
    String toHtml (): get the HTML (the original HTML)
    String toHtml (boolean verbatim): get the HTML (the original HTML)
    String toString (): get the string representation (the original HTML)
    Page getPage (): get the Page object this Node belongs to
    int getStartPosition (): get the start position of this Node in the HTML page
    int getEndPosition (): get the end position of this Node in the HTML page

Function used for Filter-based filtering:
    void collectInto (NodeList list, NodeFilter filter): filter this node with the given filter; matching nodes are put into list

Function used for Visitor-based traversal:
    void accept (NodeVisitor visitor): apply the visitor to this Node

Functions for modifying content, which are used less often:
    void setPage (Page page): set the Page object this Node belongs to
    void setText (String text): set the text
    void setChildren (NodeList children): set the list of child nodes

Other functions:
    void doSemanticAction (): perform the action associated with this Node (only a few tags have one)
    Object clone (): the abstract method from the Cloneable interface
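The traversal functions listed above can be illustrated with a small, JDK-only sketch. It does not use htmlparser.jar: ToyNode, ToyNodeTree, walk and sample are hypothetical stand-ins that only mimic the names getParent, getFirstChild and getNextSibling, so the shape of a depth-first walk can be seen without the library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyNodeTree {
    /** Hypothetical stand-in for a parsed node; this is NOT org.htmlparser.Node. */
    static class ToyNode {
        final String text;
        ToyNode parent;
        final List<ToyNode> children = new ArrayList<>();

        ToyNode(String text) { this.text = text; }

        /** Appends a child, sets its parent link, and returns this node for chaining. */
        ToyNode add(ToyNode child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        ToyNode getParent() { return parent; }

        ToyNode getFirstChild() { return children.isEmpty() ? null : children.get(0); }

        ToyNode getNextSibling() {
            if (parent == null) return null;
            int i = parent.children.indexOf(this);
            return i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        }
    }

    /** Depth-first walk that uses only getFirstChild and getNextSibling. */
    static void walk(ToyNode node, int depth, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) line.append("  ");
        out.add(line.append(node.text).toString());
        for (ToyNode c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, depth + 1, out);
        }
    }

    /** Builds a tiny tree: html -> (head -> title), (body -> p, a). */
    static ToyNode sample() {
        ToyNode head = new ToyNode("head").add(new ToyNode("title"));
        ToyNode body = new ToyNode("body").add(new ToyNode("p")).add(new ToyNode("a"));
        return new ToyNode("html").add(head).add(body);
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        walk(sample(), 0, lines);
        for (String line : lines) System.out.println(line);
    }
}
```

With the real library the same loop would be written against Node and NodeList; the sketch only shows why first-child and next-sibling links are enough to visit every node once.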
java.lang.NoClassDefFoundError: org/htmlparser/util/ParserException - ITeye Q&A
I ran into the following problem while writing an application with htmlparser:
java.lang.NoClassDefFoundError: org/htmlparser/util/ParserException
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Unknown Source)
    at java.lang.Class.getDeclaredMethod(Unknown Source)
    at com.exe4j.runtime.LauncherEngine.launch(Unknown Source)
    at com.exe4j.runtime.WinLauncher.main(Unknown Source)
I packaged the finished program as a jar and then used exe4j to build an exe from it. The error appears when I launch the exe.
Does anyone know what causes this?
The class file was not found...
1. The jar is wrong.
2. Rebuild the project.
java.lang.NoClassDefFoundError: org/htmlparser/util/ParserException
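The class definition could not be found; the program you packaged probably cannot find the jar.
It may be a problem with how the jar was imported; after importing it correctly you need to redeploy.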
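One way to check the diagnosis given in the answers is to ask the running program whether the class can be loaded at all. The following is a minimal sketch, assuming the cause is that htmlparser.jar is missing from the runtime classpath of the packaged launcher. ClasspathCheck and isClassAvailable are hypothetical names; only Class.forName and the java.class.path system property come from the JDK.

```java
public class ClasspathCheck {
    /** Returns true if the named class can be found by this program's class loader. */
    static boolean isClassAvailable(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.htmlparser.util.ParserException";
        System.out.println(name + " available: " + isClassAvailable(name));
        // Shows which jars and directories the JVM was actually started with.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If the check reports the class as unavailable when run from the packaged exe but available when run from the IDE, the jar was on the compile-time classpath only and needs to be added to the classpath that the launcher uses.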
HTML Parser
Last Published: 09/17/2006
HTML Parser is a Java library used to parse HTML in either a linear or nested fashion.
Primarily used for transformation or extraction, it features filters, visitors,
custom tags and easy to use JavaBeans. It is a fast, robust and well tested package.
Welcome to the homepage of HTMLParser - a super-fast real-time
parser for real-world HTML. What has attracted most developers to HTMLParser has
been its simplicity in design, speed and ability to handle streaming real-world
HTML.
The two fundamental use-cases that are handled by the parser are extraction and transformation
(the synthesis use-case, where HTML pages are created from scratch, is better
handled by other tools closer to the source of data). While prior versions
concentrated on data extraction from web pages, Version 1.4 of the
HTMLParser has substantial improvements in the area of transforming web
pages, with simplified tag creation and editing, and verbatim toHtml() method output.
In general, to use the HTMLParser you will need to be able to write code in
the Java programming language. Although some example programs are provided
that may be useful as they stand, it's more than likely you will need (or
want) to create your own programs or modify the ones provided to match your
intended application.
To use the library, you will need to add either the htmllexer.jar or
htmlparser.jar to your classpath when compiling and running. The
htmllexer.jar provides low level access to generic string, remark and tag nodes on
the page in a linear, flat, sequential manner. The htmlparser.jar, which
includes the classes found in htmllexer.jar, provides access to a page as a
sequence of nested differentiated tags containing string, remark and other
tag nodes. So where the output from calls to the lexer
is a flat sequence of nodes, the output from the parser
nests the tags as children of the <html>, <head> and other nodes.
The parser attempts to balance opening tags with ending tags to present the
structure of the page, while the lexer simply spits out nodes. If your
application requires only modest structural knowledge of the page, and is
primarily concerned with individual, isolated nodes, you should consider
using the lightweight lexer. But if your application requires knowledge of
the nested structure of the page, for example processing tables, you will
probably want to use the full parser.
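The difference between flat lexer output and nested parser output can be shown with a toy, JDK-only sketch. It does not use htmllexer.jar or htmlparser.jar and is not how the library is implemented: FlatVsNested, lex and nest are hypothetical stand-ins, the regular expression ignores comments, scripts and attributes that contain '>', and unclosed tags such as <br> are not balanced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlatVsNested {
    /** Toy "lexer": splits markup into a flat list of tags and text runs. */
    static List<String> lex(String html) {
        List<String> nodes = new ArrayList<>();
        Matcher m = Pattern.compile("<[^>]+>|[^<]+").matcher(html);
        while (m.find()) {
            String node = m.group().trim();
            if (!node.isEmpty()) nodes.add(node);
        }
        return nodes;
    }

    /** Toy "parser": balances opening and ending tags and indents by nesting depth. */
    static List<String> nest(List<String> flat) {
        List<String> out = new ArrayList<>();
        int depth = 0;
        for (String node : flat) {
            boolean endTag = node.startsWith("</");
            boolean openTag = node.startsWith("<") && !endTag && !node.endsWith("/>");
            if (endTag && depth > 0) depth--;   // an end tag closes the current level
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) line.append("  ");
            out.add(line.append(node).toString());
            if (openTag) depth++;               // following nodes become children
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Welcome</title></head><body>Hi</body></html>";
        List<String> flat = lex(html);
        System.out.println("Flat (lexer-like):");
        for (String n : flat) System.out.println(n);
        System.out.println("Nested (parser-like):");
        for (String n : nest(flat)) System.out.println(n);
    }
}
```

The flat list is enough when each node is handled in isolation; the nested view is what makes tasks like table processing practical, because a cell can be related to its row and table.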
Extraction
Extraction encompasses all the information retrieval programs that are not
meant to preserve the source page. This covers uses like:
text extraction, for use as input for text search engine databases for example
link extraction, for crawling through web pages or harvesting email addresses
screen scraping, for programmatic data input from web pages
resource extraction, collecting images or sound
a browser front end, the preliminary stage of page display
link checking, ensuring links are valid
site monitoring, checking for page differences beyond simplistic diffs
There are several facilities in the HTMLParser codebase to help with
extraction, including filters, visitors and JavaBeans.
Transformation
Transformation includes all processing where the input and the output
are HTML pages. Some examples are:
URL rewriting, modifying some or all links on a page
site capture, moving content from the web to local disk
censorship, removing offending words and phrases from pages
HTML cleanup, correcting erroneous pages
ad removal, excising URLs referencing advertising
conversion to XML, moving existing web pages to XML
During or after reading in a page, operations on the nodes can
accomplish many transformation tasks "in place", which can then be output
Depending on the purpose of your application, you will probably want to look
into node decorators,
in conjunction with the
