Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image print metrics and advanced printer alarms #41

Merged
merged 11 commits into from
Dec 20, 2024

Conversation

maehw
Copy link
Collaborator

@maehw maehw commented Dec 17, 2024

The contributions in this pull request shall help to support #39.

TL;DR:
Every print job submitted by self-o-mat now gets tracked with some timing information (e.g. timestamps of self-o-mat processing start, user confirmation, creation of CUPS print job). Another "logic" thread has been added which monitors the print job after handover to CUPS: when does it get processed by the printer and when is it completed - as well as the job's state. This allows to detect timing anomalies (in the future! e.g. unexpectedly long CUPS processing duration by the printer or duration in the pending state). I've also introduced new alarms when the printer needs attention (printer out of paper, printer out of ink, printer's input tray missing) which should improve self-o-mat operation.

Details
Timestamps, job ID and job state for every CUPS print job submitted by self-o-mat are kept in a list (std::vector of a struct). The struct is initially filled in the already existing printer thread. If the user does not cancel printing respectively does confirm printing, the entry (struct) is added to the list. This marks the handover to the new printer monitoring thread. The latter keeps asking CUPS for the current job state, the timestamps for CUPS job creation, processing and completion - updates the list item accordingly and takes actions depending on the current state. When a job is completed (or no longer visible in CUPS' print jobs, e.g. because it had been removed by the admin manually) - this includes a successful as well as an unsuccessful print (states canceled, aborted, completed), the print job entry gets removed from the list and hence is no longer being monitored (why should it). During removal from the list, the job details are emitted to the log (future idea: use a separate external log file, CSV?).

Unintentional by-catch (feature!)
When checking print jobs in Raspberry Pi's print queue UI manually, I found out that a print job could be queried for its IPP job-printer-state-reasons. This way, it is possible to detect actual problems while a print job is being processed (and never finishes!): at least for the SELPHY printer "media-empty-error" and "media-needed" indicate that the printer is out of paper, "marker-supply-empty-error" indicates that the printer has run out of ink and "input-tray-missing" indicates that there is no paper input tray at all (oops, forgot to put it back in?). I find this extremely helpful when operating self-o-mat, especially as my booth has the printer hidden halfway in the housing and its display cannot be seen by the users. Hence, I implemented a CUPS/IPP query for these job details and emit one out of the three new alarms when such a problem is detected.

Summary of changes

  • As PrintManager acts as the CUPS/IPP interface, I've extended and slightly modified its interface:
    • changed: printImage() does no longer return a boolean (had not been checked before) but now returns the CUPS job ID
    • added: getJobDetails() a method to query the job state (introduced new type PrinterJobState internally mapped from CUPS) and the relevant timestamps
    • added: printerJobStateToString() a simple helper method which converts the new magic values of the enum type to a human-readable string
    • added: checkPrinterAttentionFromJob(): internally queries the job-printer-state-reasons attribute for a CUPS job by its ID and returns newly introduced error flags (PrinterAttentionFlag) which are used by "logic" to emit alarms
  • BoothLogic:
    • changed: printerThread(): collecting timestamps and adding them to metrics struct, eventually stored as list elements
    • added: printMonitoringThread() which contains the core functionality of print job monitoring and checking for additional printer attention (alarming out of ink, out of paper and missing paper input tray)
  • IGui.h: added an alarm enum value for printer alerts (I did not want to interfere with the existing alarm sources)

Interesting finds

  • PrinterManager::resumePrinter() does not call httpClose(http) explicitely - is this handled automatically (e.g. when loosing context i.e. exiting the method) or a potential bug? (I haven't tried to track any open HTTP connections.. but maybe it could be done this way)
  • I wanted to backup my SD card image from the existing setup but had some troubles.. maybe the SD card was facing its EOL. So I had to setup from scratch (thanks to README.md and https://github.com/xtech/self-o-mat/wiki/Configuration#fresh-raspberry-pi-os-installation not a big deal!). The problem: the browser UI does not let me click any dropdown menus.. or nothing really happens, it only gets "darker" but no choices appear. A problem with the newer web browsers (firefox and chromium are installed by default; no versions at hand, sorry). Do you have any idea? I don't see this related to my changes.. as how could they have broken the web UI?
  • As my CUPS setup also got broken... the prints are not in landscape mode anymore but in portrait mode... adding white padding. Any idea how to configure this properly?
  • Printing on the SELPHY with the current setup really takes ages. IIRC around 1 minute and 50 seconds... around 50 seconds alone the printer's image processing time(?) until it starts an attempt to print (and also hence the delay for detecting that it has run out of ink, paper ... or is missing its paper input tray completely). Do you have an idea if your setup is comparably "fast"/slow (or are you only using the professional grade printer)? I think neither self-o-mat nor CUPS is the bottleneck when printing (this PR originated in Speed up Canon SELPHY printer / add support for second printer #39 and that's what I can tell so far).

Open questions

  • PrinterManager mutex: I was unsure about printerStateMutex - what should it really lock? Concurrent access to CUPS/IPP or some data structures? I think I need input from your side whether it shall be used in the new methods or not.
  • i18n/fr_frontend.json: No clue if French people would phrase it like that
  • Currently printer monitoring thread is activated periodically. As denoted by TODOs in the code, print thread could explicitly activate print monitoring thread - and print monitoring thread could periodically activate "itself" until the metrics list runs empty.. and then be activated again from print thread. Can you wait for a signal from another thread with a timeout? (Sorry, only familiar with few RTOSes, not with Boost threads and its signalling. Just saw something called "condition variables" in your code already in use). BTW the chosen period time also influences with what delay alarms may be shown/hidden (obvious when you know it) - this may influence usability/UX.
  • Decide what logs to keep... I've commented out some not to flood "the bus" with too many messages.
  • How to go on with storing+analyzing those metrics. What would you recommend? Enabling the filesystem logger (how?) and then parsing the logs? Or a separate file (e.g. CSV as mentioned above)?
  • Did you introduce this CLAssistant thingie which has been added to all pull requests? such as in [ADD] Countdown to delay triggering in software #36 -- Or is this something external/unauthorized?

Edit: I think the UI files are missing. This is due to working remotely on the RPi from another machine and with no direct GitHub access on the RPi. I'll add them soon. Fixed

@CLAassistant
Copy link

CLAassistant commented Dec 17, 2024

CLA assistant check
All committers have signed the CLA.

@maehw maehw changed the title Image print metrics Image print metrics and advanced printer alarms Dec 17, 2024
@maehw maehw self-assigned this Dec 18, 2024
@maehw maehw requested a review from ClemensElflein December 18, 2024 11:00
@ClemensElflein
Copy link
Member

Hi @maehw,
thank you for the PR! I have added my replies inline:

The contributions in this pull request shall help to support #39.

TL;DR: Every print job submitted by self-o-mat now gets tracked with some timing information (e.g. timestamps of self-o-mat processing start, user confirmation, creation of CUPS print job). Another "logic" thread has been added which monitors the print job after handover to CUPS: when does it get processed by the printer and when is it completed - as well as the job's state. This allows to detect timing anomalies (in the future! e.g. unexpectedly long CUPS processing duration by the printer or duration in the pending state). I've also introduced new alarms when the printer needs attention (printer out of paper, printer out of ink, printer's input tray missing) which should improve self-o-mat operation.

Details Timestamps, job ID and job state for every CUPS print job submitted by self-o-mat are kept in a list (std::vector of a struct). The struct is initially filled in the already existing printer thread. If the user does not cancel printing respectively does confirm printing, the entry (struct) is added to the list. This marks the handover to the new printer monitoring thread. The latter keeps asking CUPS for the current job state, the timestamps for CUPS job creation, processing and completion - updates the list item accordingly and takes actions depending on the current state. When a job is completed (or no longer visible in CUPS' print jobs, e.g. because it had been removed by the admin manually) - this includes a successful as well as an unsuccessful print (states canceled, aborted, completed), the print job entry gets removed from the list and hence is no longer being monitored (why should it). During removal from the list, the job details are emitted to the log (future idea: use a separate external log file, CSV?).

Unintentional by-catch (feature!) When checking print jobs in Raspberry Pi's print queue UI manually, I found out that a print job could be queried for its IPP job-printer-state-reasons. This way, it is possible to detect actual problems while a print job is being processed (and never finishes!): at least for the SELPHY printer "media-empty-error" and "media-needed" indicate that the printer is out of paper, "marker-supply-empty-error" indicates that the printer has run out of ink and "input-tray-missing" indicates that there is no paper input tray at all (oops, forgot to put it back in?). I find this extremely helpful when operating self-o-mat, especially as my booth has the printer hidden halfway in the housing and its display cannot be seen by the users. Hence, I implemented a CUPS/IPP query for these job details and emit one out of the three new alarms when such a problem is detected.

Summary of changes

* As `PrintManager` acts as the CUPS/IPP interface, I've extended and slightly modified its interface:
  
  * changed: `printImage()` does no longer return a boolean (had not been checked before) but now returns the CUPS job ID
  * added: `getJobDetails()` a method to query the job state (introduced new type `PrinterJobState` internally mapped from CUPS) and the relevant timestamps
  * added: `printerJobStateToString()` a simple helper method which converts the new magic values of the enum type to a human-readable string
  * added: `checkPrinterAttentionFromJob()`: internally queries the `job-printer-state-reasons` attribute for a CUPS job by its ID and returns newly introduced error flags (`PrinterAttentionFlag`) which are used by "logic" to emit alarms

* `BoothLogic`:
  
  * changed: `printerThread()`: collecting timestamps and adding them to metrics struct, eventually stored as list elements
  * added: `printMonitoringThread()` which contains the core functionality of print job monitoring and checking for additional printer attention (alarming out of ink, out of paper and missing paper input tray)

* `IGui.h`: added an alarm enum value for printer alerts (I did not want to interfere with the existing alarm sources)

Interesting finds

* `PrinterManager::resumePrinter()` does not call `httpClose(http)` explicitely - is this handled automatically (e.g. when loosing context i.e. exiting the method) or a potential bug? (I haven't tried to track any open HTTP connections.. but maybe it could be done this way)

Yes, this is a bug. It introduces a memory leak as well as keeping the connections open. Solution would be to either httpClose() the connection before method exit. Alternatively it looks to me like we could just keep the connection open forever and reuse it on later calls.

* I wanted to backup my SD card image from the existing setup but had some troubles.. maybe the SD card was facing its EOL. So I had to setup from scratch (thanks to README.md and https://github.com/xtech/self-o-mat/wiki/Configuration#fresh-raspberry-pi-os-installation not a big deal!). The problem: the browser UI does not let me click any dropdown menus.. or nothing really happens, it only gets "darker" but no choices appear. A problem with the newer web browsers (firefox and chromium are installed by default; no versions at hand, sorry). Do you have any idea? I don't see this related to my changes.. as how could they have broken the web UI?

No, sorry I haven't written the web interface, that was a buddy of mine. Are you sure it's just because of the new browsers, because then maybe recompiling the web interface with an updated version of the web framework will solve it?

I also can't see how your changes would break the web interface. Does the master code still run as expected?

* As my CUPS setup also got broken... the prints are not in landscape mode anymore but in portrait mode... adding white padding. Any idea how to configure this properly?

I haven't used it in ages, back then we had this manual: Manual_Drucker.pdf

* Printing on the SELPHY with the current setup really takes ages. IIRC around 1 minute and 50 seconds... around 50 seconds alone the printer's image processing time(?) until it starts an attempt to print (and also hence the delay for detecting that it has run out of ink, paper ... or is missing its paper input tray completely). Do you have an idea if your setup is comparably "fast"/slow (or are you only using the professional grade printer)? I think neither self-o-mat nor CUPS is the bottleneck when printing (this PR originated in [Speed up Canon SELPHY printer / add support for second printer #39](https://github.com/xtech/self-o-mat/issues/39) and that's what I can tell so far).

Yes, it takes ages. Even during development on my desktop PC. That's why we got the large printer instead for events.

Open questions

* `PrinterManager` mutex: I was unsure about `printerStateMutex` - what should it really lock? Concurrent access to CUPS/IPP or some data structures? I think I need input from your side whether it shall be used in the new methods or not.

It should lock the vector which is modified by two threads. If a new HTTP connection is made on each call, no need to lock those, if the connection is shared, it should also have a lock.

* `i18n/fr_frontend.json`: No clue if French people would phrase it like that

Me neither, ChatGPT says:

Your translations are quite good! They are clear, direct, and concise. However, a few adjustments can make them sound a bit more natural in French:

1. **"printer_no_paper": "Papier d'imprimante manquant"**  
   This is fine, but "Papier d'imprimante épuisé" (out of paper) might be a bit more typical in a printer context, as "manquant" is often used for something that’s physically absent, rather than something that has run out.

2. **"printer_no_ink": "Encre d'imprimante manquante"**  
   This is clear and correct. Alternatively, "Encre de l'imprimante épuisée" could also work, as "épuisée" is often used for ink running out.

3. **"printer_no_tray": "Bac à papier de l'imprimante manquant"**  
   This works well, but you might also consider "Bac à papier de l'imprimante absent" as "absent" is more commonly used for something that is missing from the printer.

So overall, your translations are good with just a slight preference for terms like "épuisé" (out of) in place of "manquant" in certain contexts.
* Currently _printer monitoring thread_ is activated periodically. As denoted by `TODO`s in the code, _print thread_ could explicitly activate _print monitoring thread_ - and _print monitoring thread_ could periodically activate "itself" until the metrics list runs empty.. and then be activated again from _print thread_. Can you wait for a signal from another thread with a timeout? (Sorry, only familiar with few RTOSes, not with Boost threads and its signalling. Just saw something called "condition variables" in your code already in use). BTW the chosen period time also influences with what delay alarms may be shown/hidden (obvious when you know it) - this may influence usability/UX.

Yes, you can either use condition variables or semaphores. Semaphore is the way to go here I think.
Initialize semaphore with count 0. The monitoring thread just tries to acquire the semaphore (will be blocking because 0). Every time you have added to the vector, increment the semaphore by one. The monitoring thread will then run for one iteration.

* Decide what logs to keep... I've commented out some not to flood "the bus" with too many messages.

Nicest would be to migrate to something like spdlog where we can set the log level on runtime.

* How to go on with storing+analyzing those metrics. What would you recommend? Enabling the filesystem logger (how?) and then parsing the logs? Or a separate file (e.g. CSV as mentioned above)?

When using spdlog, file logging with log rotation is free. CSV would be nice for advanced statistics, basically you need to decide whats best for you. I think logging is good enough for analysis.

* Did you introduce this CLAssistant thingie which has been added to all pull requests? such as in [[ADD] Countdown to delay triggering in software #36](https://github.com/xtech/self-o-mat/pull/36) -- Or is this something external/unauthorized?

Yes, after noticing how annoying it can be to change the license to something more liberal, I just added the CLA bot to the xtech organization: ClemensElflein/open_mower_ros#136

Edit: I think the UI files are missing. This is due to working remotely on the RPi from another machine and with no direct GitHub access on the RPi. I'll add them soon. Fixed

@maehw
Copy link
Collaborator Author

maehw commented Dec 20, 2024

Okay, I think this is it regarding my functional changes. I am quite happy with the results.

Still need to investigate the web interface issues though - I'll test it on master branch and if it's OK with you, I'd open another separate issue with the findings if it's reproducible on the master branch as well.

I've prepared CSV export, but I guess this is something which should not be there by default.
This leaves me with the question: how can I enable file system logging? Or: where do I find the log file(s)? So far I've only checked the screen or the command line outputs. But as there is src/tools/FilesystemLogger.cpp|h, I guess file system logging is available. BTW at the moment, I am not going to invest time to migrate to another logging system (you mentioned spdlog abvoe).

Any further thoughts on the PR from your side?

@ClemensElflein
Copy link
Member

ClemensElflein commented Dec 20, 2024

Okay, I think this is it regarding my functional changes. I am quite happy with the results.

Still need to investigate the web interface issues though - I'll test it on master branch and if it's OK with you, I'd open another separate issue with the findings if it's reproducible on the master branch as well.
Yes, no problem at all.

I've prepared CSV export, but I guess this is something which should not be there by default. This leaves me with the question: how can I enable file system logging? Or: where do I find the log file(s)? So far I've only checked the screen or the command line outputs. But as there is src/tools/FilesystemLogger.cpp|h, I guess file system logging is available. BTW at the moment, I am not going to invest time to migrate to another logging system (you mentioned spdlog abvoe).

Looks like I put this there for debugging, but in order to use it, we'd have to instantiate a FileSystemLogger instance in code, currently I can't find a place where it's actually constructed.

This would also explain the SEGFAULT during shutdown, the (in my opinion non-existing) FileSystemLogger gets shut down.

Integrating spdlog should be easy, I will give it a go after merging this PR.

Any further thoughts on the PR from your side?

No, the PR looks good to me as is. Thank you so much for working on this project!

@ClemensElflein
Copy link
Member

So the PR is ready to merge from your side (also asked on Discord, not sure where you will see it earlier)?

@maehw
Copy link
Collaborator Author

maehw commented Dec 20, 2024

Thanks, Clemens. Yes, ready to merge! :) Thanks for accepting this contribution.

@ClemensElflein ClemensElflein merged commit 9e61790 into xtech:master Dec 20, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants