News: jsoup 1.11.1 released, the most powerful Java HTML parser

Posted by 漂亮的石头 on 2017-11-06. Forum: Software News

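    jsoup 1.11.1 has been released. This version cuts DOM memory use by 30%, adds streaming parsing of HTML from the network, speeds up HTML generation, and includes many other improvements and bug fixes. Download: https://jsoup.org/download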

    The detailed changes are as follows:

    Improvements


    • When loading content from a URL or a file, the content is now parsed as it streams in from the network or disk, rather than being fully buffered before parsing. This substantially reduces memory consumption & large garbage objects when loading large files. Note that this change means that a response, once parsed, may not be parsed again from the same response object unless you call Connection.Response.bufferUp() first, which will buffer the full response into memory.


    • Updated language level to Java 7 from Java 5. To maintain Android support (of minversion 8), try-with-resources are not used.


    • Added Connection.Response.bodyStream(), a method to get the response body as an input stream. This is useful for saving a large response straight to a file, without buffering fully into memory first.


    • Performance improvements in text and HTML generation (through less GC).


    • Reduced memory consumption of text, scripts, and comments in the DOM by 40%, by refactoring the node hierarchy to not track child nodes or attributes by default for leaf nodes. For the average document, that's about a 30% memory reduction.


    • Reduced memory consumption of Elements by refactoring their Attributes to be a simple pair of arrays, vs a LinkedHashSet.


    • Added support for Element.selectFirst(), to efficiently find the first matching element.


    • Added Element.appendTo(parent) to simplify slinging elements about.


    • Added support for multiple headers with the same name in Jsoup.Connect


    • Added Element.shallowClone() and Node.shallowClone(), to allow cloning nodes without getting all their children.


    • Updated Element.text() and the :contains(text) selector to treat non-breaking space (&nbsp;) characters as spaces.


    • Updated Jsoup.connect().timeout() to implement a total connect + combined read timeout. Previously it specified connect and buffer read times only, so to implement a combined total timeout, you had to have another thread send an interrupt.


    • Improved performance of Node.addChildren() (was quadratic)


    • Added missing support for template tags in tables


    • In Jsoup.Connect file uploads, added the ability to set the uploaded files' mimetype.


    • Improved Node traversal, including less object creation, and partial and filtering traversor support.
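
    A few of the DOM additions listed above (Element.selectFirst(), Element.appendTo(parent), Element.shallowClone(), and the &nbsp; handling in Element.text()) can be tried without any network access. The following is only a minimal sketch, assuming jsoup 1.11.1 or later is on the classpath; the class name DomApiSketch, the helper moveFirstItem, and the sample HTML are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DomApiSketch {
    // Hypothetical sample markup: two items in #src and an empty #dest.
    static final String HTML =
            "<div id='src'><p class='item'>One&nbsp;two</p><p class='item'>Three</p></div>"
          + "<div id='dest'></div>";

    /** Moves the first p.item into #dest (if both exist) and returns #dest. */
    static Element moveFirstItem(Document doc) {
        // selectFirst stops at the first match and returns null when nothing matches.
        Element first = doc.selectFirst("p.item");
        Element dest = doc.getElementById("dest");
        if (first != null && dest != null) {
            // appendTo re-parents the element, so it leaves #src.
            first.appendTo(dest);
        }
        return dest;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // With the &nbsp; change, the non-breaking space should come back as a plain space.
        System.out.println(doc.selectFirst("p.item").text());

        Element dest = moveFirstItem(doc);
        System.out.println("#dest children: " + dest.children().size());

        // shallowClone copies the tag and attributes but none of the child nodes.
        Element src = doc.getElementById("src");
        Element copy = src.shallowClone();
        System.out.println(copy.attr("id") + ": copy has " + copy.childNodeSize()
                + " child nodes, original has " + src.childNodeSize());
    }
}
```

    Because appendTo() moves the element rather than copying it, use clone() or shallowClone() first if the original needs to stay where it is.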

    Bug Fixes


    • Bugfix: if a document was redecoded after character set detection, the HTML parser was not reset correctly, which could lead to an incorrect DOM.


    • Bugfix: attributes with the same name but different case would be incorrectly treated as different attributes.


    • Bugfix: self-closing tags for known empty elements were incorrectly treated as errors.


    • Bugfix: fixed an issue where a self-closing title, noframes, or style tag would cause the rest of the page to be incorrectly parsed as data or text.


    • Bugfix: fixed an issue with unknown mixed-case tags


    • Bugfix: fixed an issue where the entity resources were left open after startup, causing a warning.


    • Bugfix: fixed an issue where Element.getElementsByIndexLessThan(index) would incorrectly provide the root element


    • Improved parse time for pages with exceptionally deeply nested tags.


    • Improvement / workaround: modified the Entities implementation to load its data from a .class vs from a jar resource. Faster, and safer on Android.
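
    The Connection changes listed under Improvements (streamed parsing, Connection.Response.bodyStream(), bufferUp(), and the total timeout) might be used roughly as follows. This is a hedged sketch, assuming jsoup 1.11.1 or later on the classpath; the class name StreamingFetchSketch, the method names, and the 10-second timeout are illustrative choices, not anything prescribed by the release notes.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamingFetchSketch {

    /** Streams a response body straight to a file; returns the number of bytes written. */
    static long saveToFile(String url, Path target) throws IOException {
        Connection.Response response = Jsoup.connect(url)
                .timeout(10_000)          // per the notes above: total connect + read time
                .ignoreContentType(true)  // allow non-HTML downloads
                .maxBodySize(0)           // 0 = no size limit (assumed to apply to the stream too)
                .execute();
        // bodyStream() hands over the raw stream, so the body is not buffered in memory.
        try (InputStream body = response.bodyStream()) {
            return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Parses the same response twice, which needs bufferUp() now that parsing is streamed. */
    static Document[] parseTwice(String url) throws IOException {
        Connection.Response response = Jsoup.connect(url)
                .timeout(10_000)
                .execute();
        // Without this call, a second parse() on the same response is expected to fail,
        // because the stream has already been consumed by the first parse.
        response.bufferUp();
        return new Document[] { response.parse(), response.parse() };
    }
}
```

    bufferUp() trades the memory savings of streaming for the ability to re-read the response, so it is probably best reserved for responses that really need to be parsed or read more than once.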