Datafuse – Modern Real-Time Data Processing in Rust

budabudimir · on Aug 5, 2021

Anyone can optimize a database for trivial queries. Would be nice to at least see TPCH results or any other more complex benchmark.

BohuTANG · on Aug 5, 2021

Sure, Datafuse is still a work in progress. TPC-H (mainly for JOIN) will be fully supported in the Beta version. The Datafuse team is currently focused on the Alpha version.

themaxdavitt · on Aug 5, 2021

I almost mixed this up with Apache Arrow DataFusion for a second: https://github.com/apache/arrow-datafusion

andygrove · on Aug 5, 2021

That doesn't surprise me at all. I have politely requested that the project consider renaming to avoid this.

https://github.com/datafuselabs/datafuse/issues/654

gigatexal · on Aug 5, 2021

Impressive and bold claims. I wish the team well. But I won’t tinker with it until it’s been jepsen tested and many core features on the roadmap have been finished.

AriesDevil · on Aug 5, 2021

I'll complete the Jepsen tests ASAP :)

caust1c · on Aug 5, 2021

Curious what the motivation behind rebuilding it in rust is, versus contributing more to Clickhouse? Obviously memory safety is a big one, but is that the only reason?

What are the other goals of the project?

Personally, I'd love to see an easier-to-manage system with replication considered as a first-class feature rather than bolted on at the end.

BohuTANG · on Aug 5, 2021

Well, 1. As the Rust ecosystem has improved, using Rust has made database development faster and easier. For example, Datafuse uses tokio to implement the pipeline: https://github.com/datafuselabs/datafuse/tree/master/fuseque....

2. Couldn't agree with you more about easier-to-manage being a first-class feature, but sometimes easier-to-manage has to be built on stability, and that's what Datafuse is trying to do.

caust1c · on Aug 5, 2021

Awesome, great to hear! I've been using clickhouse for a long time and although we haven't contributed significantly to development, random bugs and issues have been quite painful in the past. Looking forward to what you're able to do!

p.s. Please don't add in-process DNS caching ;-)

https://github.com/ClickHouse/ClickHouse/issues/5287

neilsense · on Aug 5, 2021

The problem with these queries is that they just aren't realistic in a production system. Over time the queries become more complex, include more edge-cases and cruft, and your main goal is that they complete with accuracy rather than if it was 5s or 50s.

threeseed · on Aug 5, 2021

Would be good to know the background of this project, team etc.

BohuTANG · on Aug 5, 2021

The Datafuse team comes mainly from the ClickHouse community, but is more focused on the cloud database. Datafuse Labs team: https://github.com/orgs/datafuselabs/people

MrBuddyCasino · on Aug 5, 2021

Suddenly things got a lot more interesting. The Clickhouse guys are really talented & have good taste.

wubx · on Aug 5, 2021

https://datafuse.rs/overview/architecture/