Compute the minimum, maximum and mean temperatures per weather station, sorted alphabetically for one trillion rows.
The example is inspired from the One Trillion Row Challenge by Coiled - the company behind popular distributed pythonic open-source library Dask.
The source code is available in our Github repository at https://github.com/kubox-ai/1trc
This repository is a work in progress as we iterate and learn before the final submission. It contains the code to quickly spin up a Kubox cluster in AWS us-east-1
and process 1 trillion rows of data. It give you two options for tackling this challenge:
We’re excited to have you explore and experiment with Kubox. Feel free to dive in and share your feedback as we continue to enhance this project!
Metric/Framework | Daft + Ray | Clickhouse |
---|---|---|
Startup time | 320s | 313s |
Running time | 1189s | 527s |
Delete time | 122s | 123s |
Estimate cost | $2.75 | $1.37 |
Work in progress..
The One Trillion Row Challenge originated as an ambitious benchmark task:
24 MiB
in size with 10 million rows)s3://coiled-datasets-rp/1trc
(AWS S3 Requester Pays Bucket)Here is how to download one of the 24 MiB
file:
Compute the minimum, maximum and mean temperatures per weather station, sorted alphabetically for one trillion rows.
The example is inspired from the One Trillion Row Challenge by Coiled - the company behind popular distributed pythonic open-source library Dask.
The source code is available in our Github repository at https://github.com/kubox-ai/1trc
This repository is a work in progress as we iterate and learn before the final submission. It contains the code to quickly spin up a Kubox cluster in AWS us-east-1
and process 1 trillion rows of data. It give you two options for tackling this challenge:
We’re excited to have you explore and experiment with Kubox. Feel free to dive in and share your feedback as we continue to enhance this project!
Metric/Framework | Daft + Ray | Clickhouse |
---|---|---|
Startup time | 320s | 313s |
Running time | 1189s | 527s |
Delete time | 122s | 123s |
Estimate cost | $2.75 | $1.37 |
Work in progress..
The One Trillion Row Challenge originated as an ambitious benchmark task:
24 MiB
in size with 10 million rows)s3://coiled-datasets-rp/1trc
(AWS S3 Requester Pays Bucket)Here is how to download one of the 24 MiB
file: