Commit 14bf39d
Arrow2 02092022 (#1795)
* feat: add join type for logical plan display (#1674)
* (minor) Reduce memory manager and disk manager logs from `info!` to `debug!` (#1689)
* Move `information_schema` tests out of execution/context.rs to `sql_integration` tests (#1684)
* Move tests from context.rs to information_schema.rs
* Fix up tests to compile
* Move timestamp related tests out of context.rs and into sql integration test (#1696)
* Move some tests out of context.rs and into sql
* Move support test out of context.rs and into sql tests
* Fixup tests and make them compile
* Add `MemTrackingMetrics` to ease memory tracking for non-limited memory consumers (#1691)
* Memory manager no longer track consumers, update aggregatedMetricsSet
* Easy memory tracking with metrics
* use tracking metrics in SPMS
* tests
* fix
* doc
* Update datafusion/src/physical_plan/sorts/sort.rs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* make tracker AtomicUsize
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Implement TableProvider for DataFrameImpl (#1699)
* Add TableProvider impl for DataFrameImpl
* Add physical plan in
* Clean up plan construction and names construction
* Remove duplicate comments
* Remove unused parameter
* Add test
* Remove duplicate limit comment
* Use cloned instead of individual clone
* Reduce the amount of code to get a schema
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Add comments to test
* Fix plan comparison
* Compare only the results of execution
* Remove println
* Refer to df_impl instead of table in test
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Fix the register_table test to use the correct result set for comparison
* Consolidate group/agg exprs
* Format
* Remove outdated comment
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refine test in repartition.rs & coalesce_batches.rs (#1707)
* Fuzz test for spillable sort (#1706)
* Lazy TempDir creation in DiskManager (#1695)
* Incorporate dyn scalar kernels (#1685)
* Rebase
* impl ToNumeric for ScalarValue
* Update macro to be based on
* Add floats
* Cleanup
* Newline
* add annotation for select_to_plan (#1714)
* Support `create_physical_expr` and `ExecutionContextState` or `DefaultPhysicalPlanner` for faster speed (#1700)
* Change physical_expr creation API
* Refactor API usage to avoid creating ExecutionContextState
* Fixup ballista
* clippy!
* Fix can not load parquet table form spark in datafusion-cli. (#1665)
* fix can not load parquet table form spark
* add Invalid file in log.
* fix fmt
* add upper bound for pub fn (#1713)
Signed-off-by: remzi <13716567376yh@gmail.com>
* Create SchemaAdapter trait to map table schema to file schemas (#1709)
* Create SchemaAdapter trait to map table schema to file schemas
* Linting fix
* Remove commented code
* approx_quantile() aggregation function (#1539)
* feat: implement TDigest for approx quantile
Adds a [TDigest] implementation providing approximate quantile
estimations of large inputs using a small amount of (bounded) memory.
A TDigest is most accurate near either "end" of the quantile range (that
is, 0.1, 0.9, 0.95, etc) due to the use of a scalaing function that
increases resolution at the tails. The paper claims single digit part
per million errors for q ≤ 0.001 or q ≥ 0.999 using 100 centroids, and
in practice I have found accuracy to be more than acceptable for an
apprixmate function across the entire quantile range.
The implementation is a modified copy of
https://github.com/MnO2/t-digest, itself a Rust port of [Facebook's C++
implementation]. Both Facebook's implementation, and Mn02's Rust port
are Apache 2.0 licensed.
[TDigest]: https://arxiv.org/abs/1902.04023
[Facebook's C++ implementation]: https://github.com/facebook/folly/blob/main/folly/stats/TDigest.h
* feat: approx_quantile aggregation
Adds the ApproxQuantile physical expression, plumbing & test cases.
The function signature is:
approx_quantile(column, quantile)
Where column can be any numeric type (that can be cast to a float64) and
quantile is a float64 literal between 0 and 1.
* feat: approx_quantile dataframe function
Adds the approx_quantile() dataframe function, and exports it in the
prelude.
* refactor: bastilla approx_quantile support
Adds bastilla wire encoding for approx_quantile.
Adding support for this required modifying the AggregateExprNode proto
message to support propigating multiple LogicalExprNode aggregate
arguments - all the existing aggregations take a single argument, so
this wasn't needed before.
This commit adds "repeated" to the expr field, which I believe is
backwards compatible as described here:
https://developers.google.com/protocol-buffers/docs/proto3#updating
Specifically, adding "repeated" to an existing message field:
"For ... message fields, optional is compatible with repeated"
No existing tests needed fixing, and a new roundtrip test is included
that covers the change to allow multiple expr.
* refactor: use input type as return type
Casts the calculated quantile value to the same type as the input data.
* fixup! refactor: bastilla approx_quantile support
* refactor: rebase onto main
* refactor: validate quantile value
Ensures the quantile values is between 0 and 1, emitting a plan error if
not.
* refactor: rename to approx_percentile_cont
* refactor: clippy lints
* suppport bitwise and as an example (#1653)
* suppport bitwise and as an example
* Use $OP in macro rather than `&`
* fix: change signature to &dyn Array
* fmt
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* fix: substr - correct behaivour with negative start pos (#1660)
* minor: fix cargo run --release error (#1723)
* Convert boolean case expressions to boolean logic (#1719)
* Convert boolean case expressions to boolean logic
* Review feedback
* substitute `parking_lot::Mutex` for `std::sync::Mutex` (#1720)
* Substitute parking_lot::Mutex for std::sync::Mutex
* enable parking_lot feature in tokio
* Add Expression Simplification API (#1717)
* Add Expression Simplification API
* fmt
* Add tests and CI for optional pyarrow module (#1711)
* Implement other side of conversion
* Add test workflow
* Add (failing) tests
* Get unit tests passing
* Use python -m pip
* Debug LD_LIBRARY_PATH
* Set LIBRARY_PATH
* Update help with better info
* Update parking_lot requirement from 0.11 to 0.12 (#1735)
Updates the requirements on [parking_lot](https://github.com/Amanieu/parking_lot) to permit the latest version.
- [Release notes](https://github.com/Amanieu/parking_lot/releases)
- [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md)
- [Commits](Amanieu/parking_lot@0.11.0...0.12.0)
---
updated-dependencies:
- dependency-name: parking_lot
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Prevent repartitioning of certain operator's direct children (#1731) (#1732)
* Prevent repartitioning of certain operator's direct children (#1731)
* Update ballista tests
* Don't repartition children of RepartitionExec
* Revert partition restriction on Repartition and Projection
* Review feedback
* Lint
* API to get Expr's type and nullability without a `DFSchema` (#1726)
* API to get Expr type and nullability without a `DFSchema`
* Add test
* publically export
* Improve docs
* Fix typos in crate documentation (#1739)
* add `cargo check --release` to ci (#1737)
* remote test
* Update .github/workflows/rust.yml
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Move optimize test out of context.rs (#1742)
* Move optimize test out of context.rs
* Update
* use clap 3 style args parsing for datafusion cli (#1749)
* use clap 3 style args parsing for datafusion cli
* upgrade cli version
* Add partitioned_csv setup code to sql_integration test (#1743)
* use ordered-float 2.10 (#1756)
Signed-off-by: Andy Grove <agrove@apache.org>
* #1768 Support TimeUnit::Second in hasher (#1769)
* Support TimeUnit::Second in hasher
* fix linter
* format (#1745)
* Create built-in scalar functions programmatically (#1734)
* create build-in scalar functions programatically
Signed-off-by: remzi <13716567376yh@gmail.com>
* solve conflict
Signed-off-by: remzi <13716567376yh@gmail.com>
* fix spelling mistake
Signed-off-by: remzi <13716567376yh@gmail.com>
* rename to call_fn
Signed-off-by: remzi <13716567376yh@gmail.com>
* [split/1] split datafusion-common module (#1751)
* split datafusion-common module
* pyarrow
* Update datafusion-common/README.md
Co-authored-by: Andy Grove <agrove@apache.org>
* Update datafusion/Cargo.toml
* include publishing
Co-authored-by: Andy Grove <agrove@apache.org>
* fix: Case insensitive unquoted identifiers (#1747)
* move dfschema and column (#1758)
* add datafusion-expr module (#1759)
* move column, dfschema, etc. to common module (#1760)
* include window frames and operator into datafusion-expr (#1761)
* move signature, type signature, and volatility to split module (#1763)
* [split/10] split up expr for rewriting, visiting, and simplification traits (#1774)
* split up expr for rewriting, visiting, and simplification
* add docs
* move built-in scalar functions (#1764)
* split expr type and null info to be expr-schemable (#1784)
* rewrite predicates before pushing to union inputs (#1781)
* move accumulator and columnar value (#1765)
* move accumulator and columnar value (#1762)
* fix bad data type in test_try_cast_decimal_to_decimal
* added projections for avro columns
Co-authored-by: xudong.w <wxd963996380@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Yijie Shen <henry.yijieshen@gmail.com>
Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
Co-authored-by: Matthew Turner <matthew.m.turner@outlook.com>
Co-authored-by: Yang <37145547+Ted-Jiang@users.noreply.github.com>
Co-authored-by: Remzi Yang <59198230+HaoYang670@users.noreply.github.com>
Co-authored-by: Dan Harris <1327726+thinkharderdev@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: Kun Liu <liukun@apache.org>
Co-authored-by: Dmitry Patsura <talk@dmtry.me>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: r.4ntix <r.4ntix@gmail.com>
Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
Co-authored-by: Andy Grove <agrove@apache.org>
Co-authored-by: Rich <jychen7@users.noreply.github.com>
Co-authored-by: Marko Mikulicic <mmikulicic@gmail.com>
Co-authored-by: Eduard Karacharov <13005055+korowa@users.noreply.github.com>1 parent 83f937a commit 14bf39d
102 files changed
Lines changed: 13702 additions & 8123 deletions
File tree
- .github/workflows
- ballista/rust
- client
- core
- executor
- scheduler
- datafusion-cli
- src
- datafusion-common
- src
- datafusion-expr
- src
- datafusion
- benches
- fuzz-utils
- src
- src
- avro_to_arrow
- datasource
- file_format
- listing
- execution
- logical_plan
- optimizer
- physical_optimizer
- physical_plan
- expressions
- file_format
- sql
- tests
- sql
- docs/source
- specification
- user-guide/sql
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
58 | 58 | | |
59 | 59 | | |
60 | 60 | | |
61 | | - | |
| 61 | + | |
62 | 62 | | |
63 | 63 | | |
64 | 64 | | |
65 | 65 | | |
66 | | - | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
67 | 73 | | |
68 | 74 | | |
69 | 75 | | |
| |||
230 | 236 | | |
231 | 237 | | |
232 | 238 | | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
233 | 288 | | |
234 | 289 | | |
235 | 290 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| 21 | + | |
| 22 | + | |
21 | 23 | | |
22 | 24 | | |
23 | 25 | | |
| |||
33 | 35 | | |
34 | 36 | | |
35 | 37 | | |
36 | | - | |
37 | | - | |
| 38 | + | |
| 39 | + | |
0 commit comments