Scrapy 1.0 has been released. This version brings a number of new features and bug fixes; some of them are listed below.

New features and enhancements:

- Python logging (:issue:`1060`, :issue:`1235`, :issue:`1236`, :issue:`1240`, :issue:`1259`, :issue:`1278`, :issue:`1286`)
- FEED_EXPORT_FIELDS option (:issue:`1159`, :issue:`1224`)
- DNS cache size and timeout options (:issue:`1132`)
- Support namespace prefix in xmliter_lxml (:issue:`963`)
- Reactor threadpool max size setting (:issue:`1123`)
- Allow spiders to return dicts (:issue:`1081`)
- Add Response.urljoin() helper (:issue:`1086`)
- Look in ~/.config/scrapy.cfg for user config (:issue:`1098`)
- Handle TLS SNI (:issue:`1101`)
- SelectorList.extract_first() (:issue:`624`, :issue:`1145`)
- Added JmesSelect (:issue:`1016`)
- Add gzip compression to the filesystem HTTP cache backend (:issue:`1020`)
- CSS support in link extractors (:issue:`983`)
- httpcache dont_cache meta #19 #689 (:issue:`821`)
- Add a signal to be sent when a request is dropped by the scheduler (:issue:`961`)
- Avoid downloading large responses (:issue:`946`)
- Allow specifying the quotechar in CSVFeedSpider (:issue:`882`)
- Add referer to the "Spider error processing" log message (:issue:`795`)
- Process robots.txt once (:issue:`896`)
- GSoC: per-spider settings (:issue:`854`)
- Add project name validation (:issue:`817`)
- GSoC: API cleanup (:issue:`816`, :issue:`1128`, :issue:`1147`, :issue:`1148`, :issue:`1156`, :issue:`1185`, :issue:`1187`, :issue:`1258`, :issue:`1268`, :issue:`1276`, :issue:`1285`, :issue:`1284`)
- Be more responsive with IO operations (:issue:`1074` and :issue:`1075`)
- Do LevelDB compaction for httpcache on closing (:issue:`1297`)

Deprecations and removals:

- Deprecate the htmlparser link extractor (:issue:`1205`)
- Remove deprecated code from FeedExporter (:issue:`1155`)
- A leftover for.15 compatibility (:issue:`925`)
- Drop support for CONCURRENT_REQUESTS_PER_SPIDER (:issue:`895`)
- Drop old engine code (:issue:`911`)
- Deprecate SgmlLinkExtractor (:issue:`777`)

See the release notes for the full list of changes.

Download this release: Source code (zip)

Scrapy is an asynchronous crawling framework built on Twisted and written in pure Python. Users only need to implement a few custom modules to build a crawler that fetches web page content and images, which makes it very convenient to use.