
News: jsoup 1.8.3 released, HTML parser

Posted by 漂亮的石头 on 2015-08-03. Forum: Software News

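    jsoup 1.8.3 has been released. The main improvements in this version are: performance gains when parsing large HTML files; automatic switching to the XML parser when fetching XML documents; and important bug fixes.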

    Changes:

    Improvements


    • Performance improvement when parsing larger HTML pages: around 1.7x faster on Android KitKat and about 1.3x faster on Android Lollipop. The improvements come largely from re-ordering the HtmlTreeBuilder methods based on analysis of various websites, along with further memory reduction for nodes with no children and other tweaks.


    • When fetching XML URLs, automatically switch to the XML parser instead of the HTML parser.


    • Improved support for boolean attributes in HTML5.


    • When serialising XML, ensure that '<' characters in attributes are escaped, per spec. Not required in HTML.
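
The last item above can be illustrated with a minimal sketch. This is not jsoup's implementation: `AttributeEscaper`, `escape`, and the `xmlMode` flag are hypothetical names used only to show the rule that a raw '<' must be escaped in an XML attribute value but may stay as-is in HTML.

```java
// Hypothetical helper (not part of jsoup): escapes an attribute value for
// serialisation, treating '<' differently for XML and HTML output.
final class AttributeEscaper {
    static String escape(String value, boolean xmlMode) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '<':
                    // XML forbids a raw '<' inside attribute values; HTML allows it.
                    out.append(xmlMode ? "&lt;" : "<");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b", true));   // XML output
        System.out.println(escape("a<b", false));  // HTML output
    }
}
```

With `xmlMode` set, `a<b` is written as `a&lt;b`; without it, the value is left unchanged.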

    Bug fixes


    • Fixed an issue in Element.elementSiblingIndex() (and related methods) where sibling elements with the same content would incorrectly have the same sibling index.


    • Fixed an issue where unexpected elements in a badly nested table could be moved to the wrong location in the document.


    • Fixed an issue where a table nested within a TH cell would parse to an incorrect tree.


    • When serializing a document using the XHTML encoding entities, if the character set did not support &nbsp; chars (such as Shift_JIS), the character would be skipped. For visibility, jsoup will now always output &#xa0; (the hex code for a non-breaking space) when using XHTML encoding entities (as &nbsp; is not defined there), regardless of the output character set.


    • Fixed an issue when resolving URLs, where if the absolute URL had no path, the relative URL was not normalized correctly.


    • Fixed an issue where connections that were redirected to a relative URL did not have the same normalization rules as a URL read from Node.absUrl(String).
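
The two URL fixes above both concern resolving a relative URL against an absolute one. The following is a minimal sketch of that idea using only `java.net.URI`; it is not jsoup's code, and `UrlResolver` and `resolve` are hypothetical names. It assumes one reasonable policy: a base URL with no path is treated as if its path were "/", so that dot segments in the relative part are normalized away.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of jsoup): resolves a relative reference
// against an absolute base URL, giving a path-less base the path "/".
final class UrlResolver {
    static String resolve(String base, String relative) {
        URI baseUri = URI.create(base);
        String path = baseUri.getPath();
        if (path == null || path.isEmpty()) {
            try {
                // Rebuild the base with an explicit root path.
                baseUri = new URI(baseUri.getScheme(), baseUri.getAuthority(), "/",
                        baseUri.getQuery(), baseUri.getFragment());
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException("Invalid base URL: " + base, e);
            }
        }
        // URI.resolve merges the paths and removes "." and ".." segments.
        return baseUri.resolve(relative).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://example.com", "./one/two?three"));
        System.out.println(resolve("http://example.com/a/b", "../c"));
    }
}
```

Under these assumptions, the first call yields `http://example.com/one/two?three` and the second yields `http://example.com/c`.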

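    This site uses jsoup to parse HTML.

    jsoup is a Java HTML parser that can directly parse a URL or HTML text content. It provides a very convenient API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods.

    The main features of jsoup are: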


    1. Parse HTML from a URL, file, or string;


    2. Find and extract data using DOM traversal or CSS selectors;


    3. Manipulate HTML elements, attributes, and text.
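
The three features listed above can be sketched in a few lines. This example assumes jsoup is on the classpath; `JsoupFeatures`, `linkSummary`, the sample HTML, and the `http://example.com/` base URI are illustrative choices, not taken from the release notes. It parses from a string; `Jsoup.connect(url).get()` and `Jsoup.parse(File, String)` cover the URL and file cases.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Illustrative class (hypothetical name) exercising parse, select, and modify.
final class JsoupFeatures {
    static String linkSummary(String html, String baseUri) {
        // 1. Parse HTML from a string; baseUri is used to resolve relative links.
        Document doc = Jsoup.parse(html, baseUri);

        // 2. Find data with a CSS selector: every <a> that has an href attribute.
        Elements links = doc.select("a[href]");

        StringBuilder summary = new StringBuilder();
        for (Element link : links) {
            // 3. Manipulate an attribute, then read the text and the absolute URL.
            link.attr("rel", "nofollow");
            summary.append(link.text())
                   .append(" -> ")
                   .append(link.absUrl("href"))
                   .append('\n');
        }
        return summary.toString();
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/docs\">Docs</a></p>";
        System.out.print(linkSummary(html, "http://example.com/"));
    }
}
```

For the sample input, the summary contains a single line pairing the link text `Docs` with the absolute URL `http://example.com/docs`.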

    jsoup is released under the MIT license, so it can be used safely in commercial projects.