[META] 8.x PQ improvements #11447

colinsurprenant · 2019-12-17T19:32:54Z

These are the PQ issues that should be looked at in 7.x

Phase 1

Permissions & .lock file

Page file size too small exception

Docs

Recovery (pkcheck/pqrepair)

Phase 2

Review, triage, and/or fix the issues in the "persistent queues" label list.

Needs fixing

[@kaisecheng] PQ AccessDeniedException on Windows in checkpoint move: add backoff to checkpoint write #13902
[@kaisecheng] queue.max_bytes >= queue.page_capacity check is not done for multiple pipelines: PQ size check for multiple pipelines #13877
Verify handling of an out-of-date firstUnackedPageNum that points to a purged page: verify handling of potential out-of-date firstUnackedPageNum in head checkpoint in Queue.open() #6592; Handle out-of-date firstUnackedPageNum in head checkpoint #14147
PQ signals the not-full state when a batch is read, although it is actually still full: PQ full/not full state #6801
Queue drain should not start the shutdown watcher until the queue is empty: Hide Shutdown Watcher stall message when PQ draining #13934
Allow metrics update when PQ draining #13935
PQ exception leads to crash upon reloading pipeline by not releasing PQ lock #12005
Document that PQ and DLQ should not be set on NFS #12097
[Doc] Add missing indicators of supported queue type on Logstash settings #12536

No fix required:

Available disk space check for PQ does not count unused space in page files #10047
Logstash fails to start when the queue size is bigger than the available memory #11785
Logstash seems to map all PQ pages in open()

Phase 3 & Beyond

Needs discussion

Enhanced Logstash Guaranteed Delivery
Change shutdown behaviour when PQ is enabled. #8458
PQ default 64mb page could hold fewer elements than a configured large batch size #12102
PQ does not use all resources: Bottleneck in PQ, why it doesn't use all resources #13906
[performance] look for a more efficient isFull() and getPersistedByteSize() interactions #9038
New queue type (circular queue) to configurably drop events to avoid upstream blocking from downstream back pressure #11601

Needs watch

File permissions can lead to PQ startup problems #10715
PQ AccessDeniedException on Windows in checkpoint move #12345

Recovery (pkcheck/pqrepair)

Add a new pqdump utility which can dump in JSON all/any data in a queue dir.
Improve automatic recovery at Logstash startup.

Timeouts and batching

Re-assess the state of queue write timeout handling with respect to plugins like the http input. Is anything else required to move forward?
[META] Queue timeouts + Batching #9389

Performance

There were a few issues about PQ design performance: the checkpointing strategy (Checkpoints Come with a Lot of Overhead when Writing #7162) and the memory mapping of (large) page files (Persisted Queue Performance Design Issue (Writing Data) #7317; PQ large page performance degradation #8801). I do not believe that substantial performance improvements will be prioritized in the foreseeable future.
Decouple write lock and read lock of PQ #16158


zez3 · 2021-02-18T22:37:50Z

I was reading this https://www.elastic.co/blog/using-parallel-logstash-pipelines-to-improve-persistent-queue-performance

I see that there is an issue: “To put this another way, a single pipeline can only drive the disk with a single thread. This is true even if a pipeline were to have multiple inputs, as additional inputs in a single pipeline do not increase disk I/O threads.”

I would call the proposed "solution" for improving overall performance a workaround, and unfortunately it does not work for all cases. We were planning to send syslog directly to LS, not through multiple Filebeat instances.

Does support for additional persistent queue threads running in parallel match any of the issues described in this meta?

zez3 · 2021-02-19T06:24:55Z

I would also add one more issue.

After the PQ file fills up, LS should check whether the ES cluster is in a read-only state and stop outputting to that cluster.
Rechecking every x seconds would help.

Similar to:
#10023

zalseryani · 2024-03-07T07:07:18Z

Greetings,
any update on this ?

Thank you.

colinsurprenant added meta persistent queues labels Dec 17, 2019

jsvd mentioned this issue Apr 14, 2020

Logstash fails to start when the queue size is bigger than the available memory #11785 (Closed)

roaksoax changed the title from [META] 7.x PQ improvements to [META] 8.x PQ improvements Jan 19, 2022

roaksoax assigned kaisecheng Mar 15, 2022

roaksoax added the status:work-in-progress label Mar 15, 2022

roaksoax unassigned kaisecheng Nov 7, 2022
