Datafuse – Modern Real-Time Data Processing in Rust (github.com/datafuselabs)
80 points by implfuture on Aug 5, 2021 | 14 comments



Anyone can optimize a database for trivial queries. Would be nice to at least see TPCH results or any other more complex benchmark.


Sure, Datafuse is still a work in progress. TPC-H (mainly for JOIN) will be fully supported in the Beta version. The Datafuse team is currently focused on the Alpha version.


I almost mixed this up with Apache Arrow DataFusion for a second: https://github.com/apache/arrow-datafusion


That doesn't surprise me at all. I have politely requested that the project consider renaming to avoid this.

https://github.com/datafuselabs/datafuse/issues/654


Impressive and bold claims. I wish the team well. But I won’t tinker with it until it’s been jepsen tested and many core features on the roadmap have been finished.


I'll complete the Jepsen tests ASAP :)


Curious what the motivation behind rebuilding it in rust is, versus contributing more to Clickhouse? Obviously memory safety is a big one, but is that the only reason?

What are the other goals of the project?

Personally, I'd love to see an easier-to-manage system with replication considered as a first-class feature rather than bolted on at the end.


Well. 1. As the Rust ecosystem has improved, using Rust has made database development faster and easier. For example, Datafuse uses tokio to implement the pipeline: https://github.com/datafuselabs/datafuse/tree/master/fuseque....

2. Couldn't agree with you more about making easier-to-manage a first-class feature, but sometimes ease of management is built on stability, and that's what Datafuse is trying to do.


Awesome, great to hear! I've been using clickhouse for a long time and although we haven't contributed significantly to development, random bugs and issues have been quite painful in the past. Looking forward to what you're able to do!

p.s. Please don't add in-process DNS caching ;-)

https://github.com/ClickHouse/ClickHouse/issues/5287


The problem with these queries is that they just aren't realistic in a production system. Over time the queries become more complex, include more edge cases and cruft, and your main goal is that they complete with accuracy rather than whether they take 5s or 50s.


Would be good to know the background of this project, team etc.


The Datafuse team is mainly from the ClickHouse community, but more focused on cloud databases. Datafuse Labs team: https://github.com/orgs/datafuselabs/people


Suddenly things got a lot more interesting. The Clickhouse guys are really talented & have good taste.




