Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate segments #478

Merged
merged 3 commits into from
Jan 25, 2025
Merged

Validate segments #478

merged 3 commits into from
Jan 25, 2025

Conversation

sw1313
Copy link
Contributor

@sw1313 sw1313 commented Jan 24, 2025

Related issues / PRs. Summarize issues.

Validate segments not empty in subtitle_manager.py

Summarize Changes

Perform defensive checking to avoid the "list indices must be integers or slices, not str" exception caused by null values.
test file: https://github.com/anars/blank-audio/blob/master/10-minutes-of-silence.mp3

@jhj0517 jhj0517 added the enhancement New feature or request label Jan 25, 2025
@jhj0517
Copy link
Owner

jhj0517 commented Jan 25, 2025

Hi.
I tried to reproduce it with the file, but it was not reproducible with lastest version.
Can you check?

@sw1313
Copy link
Contributor Author

sw1313 commented Jan 25, 2025

_whisper-webui_logs.txt
This is the log of me pulling the latest version files using git and running them.
FireShot Capture 001 - Gradio - 192 168 50 238

@jhj0517
Copy link
Owner

jhj0517 commented Jan 25, 2025

I assume this happens before merging #480?

Can you fetch upstream's and try again?
It leaves a message instead of throwing an error.

@sw1313
Copy link
Contributor Author

sw1313 commented Jan 25, 2025

I tested the latest code, but the error still occurs.
I consulted the reason for the error, and the response is:
If the upper layer Whisper reasoning or VAD result is empty, pass the empty list directly to subtitle_manager.
The code logic originally expected result to be a dict with the key "segments" inside.
The actual incoming result is a pure list, or other structure that does not contain "segments".
In the next step in iterate_result or write_result function, when doing result["segments"], if result is a list, then access it using string index ("segments") on the list.
Python will raise a "list indices must be integers or slices, not str" error.

@sw1313
Copy link
Contributor Author

sw1313 commented Jan 25, 2025

The issues #468 case1 is an error caused by whisper when no valid data is output for the audio(include audio track).
A more strict illusion penalty must be used to reproduce this error, so that no illusion data is written into the subtitles.
Case3 is an error caused by the failure to obtain the audio track during video detection phase. (case3 has been fixed)

@jhj0517
Copy link
Owner

jhj0517 commented Jan 25, 2025

I could reproduce it now.
But in my opinion it should not just silently return nothing, but leave a message and then pass the case.

May I edit some?

@sw1313
Copy link
Contributor Author

sw1313 commented Jan 25, 2025

Of course, I consulted OpenAI for a solution only after you were unable to reproduce it.

@jhj0517
Copy link
Owner

jhj0517 commented Jan 25, 2025

Confirmed that it is no longer reproducible

@jhj0517 jhj0517 merged commit cae0267 into jhj0517:master Jan 25, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants