News: Apache Nutch 1.14 released, web crawler download

漂亮的石头 · 2017-12-27

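Apache Nutch 1.14 has been released. Nutch is a mature, production-ready web crawler. Nutch 1.x enables fine-grained configuration and relies on Apache Hadoop™ data structures, which are well suited to batch processing.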

Changes:

Bug fixes

[NUTCH-2071] - A parser failure on a single document may fail crawling job

[NUTCH-2235] - Classpath discrepancy with protocol-selenium in deploy mode

[NUTCH-2269] - Clean not working after crawl

[NUTCH-2295] - Nutch master docker container broken

[NUTCH-2297] - CrawlDbReader -stats wrong values for earliest fetch time and shortest interval

[NUTCH-2316] - Library conflict with Parser-Tika Plugin and Lib Folder

Improvements

[NUTCH-1763] - Improving comments on the Injector Class

[NUTCH-2034] - CrawlDB filtered documents counter.

[NUTCH-2035] - Regex filter using case sensitive rules.

[NUTCH-2046] - The crawl script should be able to skip an initial injection.

[NUTCH-2135] - Ant Eclipse build does not include protocol-interactiveselenium

[NUTCH-2193] - Upgrade feed parser plugin to use rome 1.5

For the complete list of changes, see the release notes.

Download:

http://nutch.apache.org/downloads.html
