Apache Arrow 0.9.0 has been released. Apache Arrow is a top-level project of the Apache Software Foundation. It aims to serve as a cross-platform data layer that speeds up big data analytics projects. It includes a set of canonical in-memory representations of flat and hierarchical data, along with bindings in multiple languages for manipulating those structures. It also provides low-overhead streaming and batch messaging, zero-copy interprocess communication (IPC), and vectorized in-memory analytics libraries.

Changes:

New features and improvements

ARROW-1021 - [Python] Add documentation about using pyarrow from other Cython and C++ projects
ARROW-1035 - [Python] Add ASV benchmarks for streaming columnar deserialization
ARROW-1394 - [Plasma] Add optional extension for allocating memory on GPUs
ARROW-1463 - [JAVA] Restructure ValueVector hierarchy to minimize compile-time generated code
ARROW-1579 - [Java] Add dockerized test setup to validate Spark integration
ARROW-1580 - [Python] Instructions for setting up nightly builds on Linux
ARROW-1623 - [C++] Add convenience method to construct Buffer from a string that owns its memory
ARROW-1632 - [Python] Permit categorical conversions in Table.to_pandas on a per-column basis
ARROW-1643 - [Python] Accept hdfs:// prefixes in parquet.read_table and attempt to connect to HDFS
ARROW-1705 - [Python] Create StructArray from sequence of dicts given a known data type
ARROW-1706 - [Python] StructArray.from_arrays should handle sequences that are coercible to arrays
ARROW-1712 - [C++] Add method to BinaryBuilder to reserve space for value data
ARROW-1757 - [C++] Add DictionaryArray::FromArrays alternate ctor that can check or sanitized “untrusted” indices
ARROW-1815 - [Java] Rename MapVector to StructVector

For more details, see the full changelog and the download page.
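To make ARROW-1705 more concrete, here is a minimal pure-Python sketch of the idea behind building a struct column from a sequence of dicts when the field names are known in advance. It is a conceptual illustration only, not pyarrow's implementation: the function name `dicts_to_struct_columns` and the returned layout (a validity list plus one child list per field) are assumptions made for this example. In pyarrow itself, the corresponding operation is expected to look like `pa.array(rows, type=pa.struct([...]))`.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def dicts_to_struct_columns(
    rows: Sequence[Optional[Dict[str, Any]]],
    field_names: Sequence[str],
) -> Tuple[List[bool], Dict[str, List[Any]]]:
    """Split row-oriented dicts into one child column per known field.

    Hypothetical helper for illustration: returns a validity list for the
    struct slots and a dict mapping each field name to its column values.
    """
    validity: List[bool] = []
    children: Dict[str, List[Any]] = {name: [] for name in field_names}
    for row in rows:
        # A None row becomes a null struct slot.
        validity.append(row is not None)
        for name in field_names:
            # Missing keys (or a null row) become null child values.
            children[name].append(None if row is None else row.get(name))
    return validity, children


rows = [{"id": 1, "name": "a"}, None, {"id": 3}]
validity, children = dicts_to_struct_columns(rows, ["id", "name"])
print(validity)
print(children)
```

Knowing the data type up front is what lets every child column be allocated before any row is inspected, so missing keys and null rows can be filled in without a separate type-inference pass.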