On coordinated omission #120

fee-mendes · 2025-02-05T05:34:38Z

Benchmarking Apache Cassandra with Rust states:

Use async programming model: don’t block immediately on the returned future after spawning a single request, but spawn many requests and collect results asynchronously when they are ready. This way we’re making sending requests independent from each other, thus avoiding coordinated omission.

This makes sense, as long as we account for the wait time of a request that misses (or gets no response within) its window. It looks like latte records requests' service time instead of response time and has no notion of a schedule when a target rate is specified (though I am far from familiar with the code, so correct me if I'm wrong).
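To make the distinction concrete, here is a minimal sketch of the two measurements. The `RequestTiming` type and its field names are hypothetical and are not taken from latte's code; the only assumption is that each request has an intended send time derived from the target rate.

```rust
use std::time::Duration;

/// Timestamps for one request, as offsets from the start of the run.
/// Hypothetical type for illustration; not part of latte.
struct RequestTiming {
    intended_start: Duration, // when the schedule says the request should go out
    actual_start: Duration,   // when it was actually sent
    end: Duration,            // when the response arrived
}

impl RequestTiming {
    /// Time between actually sending the request and receiving the response.
    fn service_time(&self) -> Duration {
        self.end - self.actual_start
    }

    /// Time between the scheduled send instant and the response.
    /// Includes any time the request spent waiting to be sent.
    fn response_time(&self) -> Duration {
        self.end - self.intended_start
    }
}

fn main() {
    // A request scheduled at 10 ms that could only be sent at 250 ms.
    let t = RequestTiming {
        intended_start: Duration::from_millis(10),
        actual_start: Duration::from_millis(250),
        end: Duration::from_millis(255),
    };
    println!("service time:  {:?}", t.service_time());
    println!("response time: {:?}", t.response_time());
}
```

In this made-up example the request looks fast if only the service time is recorded, while the response time also counts the delay before it could be sent.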

Consider this snapshot where I asked for a fixed rate of 150K:

Naturally, the stressed system is unable to keep up with the load. Yet the latencies are too good to be true. Since latencies do not grow, the load generator is effectively not accounting for the latency of requests that missed their window. In turn, the current latte implementation resembles cassandra-stress "throughput" or -throttle modes, where only service time gets measured instead of response time.

http://psy-lob-saw.blogspot.com/2016/07/fixing-co-in-cstress.html is a very good (wow, almost 10 years!) article addressing most of the points mentioned here. The expectation is that when --rate is specified, the load generator reports growing latencies whenever the server is unable to keep up with the schedule implied by that rate.
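As a rough illustration of that expectation, the sketch below simulates a fixed-rate schedule against a server that cannot keep up. It is a toy model under stated assumptions (a single server that handles one request at a time with a constant service time), not latte's scheduler, and the `simulate` function is hypothetical.

```rust
/// Simulate a fixed-rate schedule against a server that handles one request
/// at a time, each taking `service_us` microseconds.
/// Returns (service_time, response_time) in microseconds for every request.
/// Hypothetical model for illustration; not latte's scheduler.
fn simulate(rate_per_sec: u64, service_us: u64, requests: u64) -> Vec<(u64, u64)> {
    let interval_us = 1_000_000 / rate_per_sec; // spacing implied by the target rate
    let mut server_free_at = 0u64;
    let mut out = Vec::with_capacity(requests as usize);
    for i in 0..requests {
        // The schedule fixes the intended start regardless of server progress.
        let intended_start = i * interval_us;
        // The request cannot actually start before the server is free.
        let actual_start = intended_start.max(server_free_at);
        let end = actual_start + service_us;
        server_free_at = end;
        out.push((end - actual_start, end - intended_start));
    }
    out
}

fn main() {
    // Target 1000 req/s (1 ms spacing) against a server needing 2 ms per request.
    let results = simulate(1000, 2000, 5);
    for (i, (service, response)) in results.iter().enumerate() {
        println!("request {i}: service = {service} us, response = {response} us");
    }
}
```

In this model the service time stays at 2000 us for every request, while the response time grows by 1000 us per request (2000, 3000, 4000, ...), because each request falls further behind its intended start.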

Btw, latte performance is quite impressive - though unfortunately this effectively makes it a blocker for my use case :(
