
Not support m4a format audio? #492

Closed
flier268 opened this issue Feb 5, 2025 · 3 comments
Assignees
Labels
bug Something isn't working

Comments


flier268 commented Feb 5, 2025

Which OS are you using?

  • OS: Docker

When I upload m4a format audio and press "GENERATE SUBTITLE FILE", it fails.
After converting it to mp3, it works!
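As a workaround, the file can be converted to mp3 before upload. A minimal sketch of that conversion via an ffmpeg subprocess (assumes `ffmpeg` is on `PATH`; `build_ffmpeg_cmd` and `convert_to_mp3` are hypothetical helpers, not part of Whisper-WebUI):

```python
import shutil
import subprocess

def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    # -y overwrites an existing output file; -vn drops any video/cover-art
    # stream that m4a containers sometimes carry; -q:a 2 is a high-quality
    # VBR setting for libmp3lame.
    return ["ffmpeg", "-y", "-i", src, "-vn",
            "-codec:a", "libmp3lame", "-q:a", "2", dst]

def convert_to_mp3(src: str, dst: str) -> None:
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```

Usage would be `convert_to_mp3("input.m4a", "input.mp3")` and then uploading the mp3.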

2025-01-31 05:04:57 Use "faster-whisper" implementation
2025-01-31 05:04:57 Device "cuda" is detected
2025-01-31 05:04:58 * Running on local URL:  http://0.0.0.0:7860
2025-01-31 05:04:58 
2025-01-31 05:04:58 To create a public link, set `share=True` in `launch()`.
2025-02-04 22:08:45 WARNING:  Invalid HTTP request received.
2025-02-04 22:08:45 WARNING:  Invalid HTTP request received.
2025-02-04 22:10:43 /Whisper-WebUI/venv/lib/python3.11/site-packages/torch/cuda/memory.py:391: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
2025-02-04 22:10:43   warnings.warn(
2025-02-04 22:10:46 INFO:speechbrain.utils.quirks:Applied quirks (see `speechbrain.utils.quirks`): [disable_jit_profiling, allow_tf32]
2025-02-04 22:10:46 INFO:speechbrain.utils.quirks:Excluded quirks specified by the `SB_DISABLE_QUIRKS` environment (comma-separated list): []
2025-02-04 22:10:49 
pytorch_model.bin:   0% 0.00/5.91M [00:00<?, ?B/s]
pytorch_model.bin: 100% 5.91M/5.91M [00:00<00:00, 7.28MB/s]
pytorch_model.bin: 100% 5.91M/5.91M [00:00<00:00, 7.20MB/s]
2025-02-04 22:10:49 
config.yaml:   0% 0.00/399 [00:00<?, ?B/s]
config.yaml: 100% 399/399 [00:00<00:00, 2.86MB/s]
2025-02-04 22:10:53 
pytorch_model.bin:   0% 0.00/26.6M [00:00<?, ?B/s]
pytorch_model.bin:  39% 10.5M/26.6M [00:01<00:01, 8.12MB/s]
pytorch_model.bin:  79% 21.0M/26.6M [00:01<00:00, 12.6MB/s]
pytorch_model.bin: 100% 26.6M/26.6M [00:01<00:00, 15.6MB/s]
pytorch_model.bin: 100% 26.6M/26.6M [00:01<00:00, 13.4MB/s]
2025-02-04 22:10:54 
config.yaml:   0% 0.00/221 [00:00<?, ?B/s]
config.yaml: 100% 221/221 [00:00<00:00, 2.35MB/s]
2025-02-04 22:10:57 /Whisper-WebUI/venv/lib/python3.11/site-packages/pyannote/audio/utils/reproducibility.py:74: ReproducibilityWarning: TensorFloat-32 (TF32) has been disabled as it might lead to reproducibility issues and lower accuracy.
2025-02-04 22:10:57 It can be re-enabled by calling
2025-02-04 22:10:57    >>> import torch
2025-02-04 22:10:57    >>> torch.backends.cuda.matmul.allow_tf32 = True
2025-02-04 22:10:57    >>> torch.backends.cudnn.allow_tf32 = True
2025-02-04 22:10:57 See https://github.com/pyannote/pyannote-audio/issues/1370 for more details.
2025-02-04 22:10:57 
2025-02-04 22:10:57   warnings.warn(
2025-02-04 22:11:03 /Whisper-WebUI/venv/lib/python3.11/site-packages/pyannote/audio/models/blocks/pooling.py:104: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /pytorch/aten/src/ATen/native/ReduceOps.cpp:1831.)
2025-02-04 22:11:03   std = sequences.std(dim=-1, correction=1)
2025-02-04 22:11:04 /Whisper-WebUI/venv/lib/python3.11/site-packages/torch/cuda/memory.py:391: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
2025-02-04 22:11:04   warnings.warn(
2025-02-04 22:11:18 INFO:faster_whisper:Processing audio with duration 00:22.008
2025-02-04 22:11:22 INFO:faster_whisper:Detected language 'zh' with probability 1.00
2025-02-04 22:11:36 /Whisper-WebUI/venv/lib/python3.11/site-packages/pyannote/audio/models/blocks/pooling.py:104: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /pytorch/aten/src/ATen/native/ReduceOps.cpp:1831.)
2025-02-04 22:11:36   std = sequences.std(dim=-1, correction=1)
2025-02-04 22:11:37 /Whisper-WebUI/venv/lib/python3.11/site-packages/torch/cuda/memory.py:391: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
2025-02-04 22:11:37   warnings.warn(
2025-02-04 22:11:49 INFO:faster_whisper:Processing audio with duration 00:21.480
2025-02-04 22:11:53 INFO:faster_whisper:Detected language 'zh' with probability 1.00
2025-02-04 22:12:07 /Whisper-WebUI/venv/lib/python3.11/site-packages/pyannote/audio/models/blocks/pooling.py:104: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /pytorch/aten/src/ATen/native/ReduceOps.cpp:1831.)
2025-02-04 22:12:07   std = sequences.std(dim=-1, correction=1)
2025-02-04 22:12:07 /Whisper-WebUI/venv/lib/python3.11/site-packages/torch/cuda/memory.py:391: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
2025-02-04 22:12:07   warnings.warn(
2025-02-04 22:12:19 INFO:faster_whisper:Processing audio with duration 00:22.656
2025-02-04 22:12:24 INFO:faster_whisper:Detected language 'zh' with probability 1.00
2025-02-04 22:12:37 /Whisper-WebUI/venv/lib/python3.11/site-packages/pyannote/audio/models/blocks/pooling.py:104: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /pytorch/aten/src/ATen/native/ReduceOps.cpp:1831.)
2025-02-04 22:12:37   std = sequences.std(dim=-1, correction=1)
2025-02-04 22:12:37 /Whisper-WebUI/venv/lib/python3.11/site-packages/torch/cuda/memory.py:391: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
2025-02-04 22:12:37   warnings.warn(
2025-02-05 14:42:28 2025-02-05 06:42:28,647 - Whisper-WebUI - INFO - The file /tmp/gradio/0c47604add19b8751b51b7f3776943c9d8aeb6540b9c362c4500985d8ea6722c/FPS II修改20250205.m4a is not able to open or corrupted. Please check the file. Error opening '/tmp/gradio/0c47604add19b8751b51b7f3776943c9d8aeb6540b9c362c4500985d8ea6722c/FPS II修改20250205.m4a': Format not recognised.
2025-02-05 14:42:28 INFO:Whisper-WebUI:The file /tmp/gradio/0c47604add19b8751b51b7f3776943c9d8aeb6540b9c362c4500985d8ea6722c/FPS II修改20250205.m4a is not able to open or corrupted. Please check the file. Error opening '/tmp/gradio/0c47604add19b8751b51b7f3776943c9d8aeb6540b9c362c4500985d8ea6722c/FPS II修改20250205.m4a': Format not recognised.
2025-02-05 14:42:28 Traceback (most recent call last):
2025-02-05 14:42:28   File "/Whisper-WebUI/modules/whisper/base_transcription_pipeline.py", line 287, in transcribe_file
2025-02-05 14:42:28     subtitle, file_path = generate_file(
2025-02-05 14:42:28                           ^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/modules/utils/subtitle_manager.py", line 440, in generate_file
2025-02-05 14:42:28     file_writer(result=result, output_file_name=output_file_name, **kwargs)
2025-02-05 14:42:28   File "/Whisper-WebUI/modules/utils/subtitle_manager.py", line 88, in __call__
2025-02-05 14:42:28     self.write_result(result, file=f, options=options, **kwargs)
2025-02-05 14:42:28   File "/Whisper-WebUI/modules/utils/subtitle_manager.py", line 106, in write_result
2025-02-05 14:42:28     print(segment["text"].strip(), file=file, flush=True)
2025-02-05 14:42:28           ^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28 AttributeError: 'NoneType' object has no attribute 'strip'
2025-02-05 14:42:28 
2025-02-05 14:42:28 The above exception was the direct cause of the following exception:
2025-02-05 14:42:28 
2025-02-05 14:42:28 Traceback (most recent call last):
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/queueing.py", line 625, in process_events
2025-02-05 14:42:28     response = await route_utils.call_process_api(
2025-02-05 14:42:28                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/route_utils.py", line 322, in call_process_api
2025-02-05 14:42:28     output = await app.get_blocks().process_api(
2025-02-05 14:42:28              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/blocks.py", line 2044, in process_api
2025-02-05 14:42:28     result = await self.call_function(
2025-02-05 14:42:28              ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1591, in call_function
2025-02-05 14:42:28     prediction = await anyio.to_thread.run_sync(  # type: ignore
2025-02-05 14:42:28                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
2025-02-05 14:42:28     return await get_async_backend().run_sync_in_worker_thread(
2025-02-05 14:42:28            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2461, in run_sync_in_worker_thread
2025-02-05 14:42:28     return await future
2025-02-05 14:42:28            ^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 962, in run
2025-02-05 14:42:28     result = context.run(func, *args)
2025-02-05 14:42:28              ^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/utils.py", line 883, in wrapper
2025-02-05 14:42:28     response = f(*args, **kwargs)
2025-02-05 14:42:28                ^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/modules/whisper/base_transcription_pipeline.py", line 311, in transcribe_file
2025-02-05 14:42:28     raise RuntimeError(f"Error transcribing file: {e}") from e
2025-02-05 14:42:28 RuntimeError: Error transcribing file: 'NoneType' object has no attribute 'strip'
[The same "Format not recognised" warning and traceback repeat verbatim at 14:42:35 and 14:43:49.]
2025-02-05 14:52:54 INFO:faster_whisper:Processing audio with duration 01:05:56.400
2025-02-05 14:53:02 INFO:faster_whisper:Detected language 'zh' with probability 0.98
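The crash in the log above has two layers: soundfile cannot decode the m4a ("Format not recognised"), so the pipeline ends up with segments whose `text` is `None`, and the subtitle writer then calls `.strip()` on `None`. A defensive sketch of the writer loop (hypothetical helper, not the project's actual `write_result`), which skips empty segments instead of crashing:

```python
import io

def write_segments(segments: list[dict], file) -> None:
    """Write one subtitle line per segment, skipping segments with no text."""
    for segment in segments:
        text = segment.get("text")
        if text is None:  # guard: upstream decode failures can leave text unset
            continue
        print(text.strip(), file=file, flush=True)
```

With input `[{"text": " hello "}, {"text": None}]` this writes only `hello`, where the original code raised `AttributeError: 'NoneType' object has no attribute 'strip'`.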

@flier268 flier268 added the bug Something isn't working label Feb 5, 2025
Owner

jhj0517 commented Feb 5, 2025

Hi, thanks for reporting.
m4a is not supported by soundfile; I may refactor later to load audio with ffmpeg instead of soundfile.
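Loading through an ffmpeg subprocess sidesteps soundfile's container support entirely, since ffmpeg decodes m4a/AAC natively. A hedged sketch (assumes `ffmpeg` is on `PATH`; `decode_with_ffmpeg` and `pcm16_to_float` are hypothetical helpers, not the project's code) that decodes any ffmpeg-readable file to mono 16 kHz floats, similar in spirit to how Whisper itself loads audio:

```python
import struct
import subprocess

def pcm16_to_float(raw: bytes) -> list[float]:
    # Little-endian signed 16-bit PCM -> floats in [-1.0, 1.0)
    n = len(raw) // 2
    return [s / 32768.0 for s in struct.unpack("<%dh" % n, raw[: 2 * n])]

def decode_with_ffmpeg(path: str, sr: int = 16000) -> list[float]:
    """Decode any ffmpeg-readable file (m4a included) to mono PCM samples."""
    cmd = ["ffmpeg", "-nostdin", "-i", path,
           "-f", "s16le", "-ac", "1", "-ar", str(sr), "-"]
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    return pcm16_to_float(raw)
```

In practice one would convert the sample list to a NumPy array for the model, but the decoding step itself needs nothing beyond ffmpeg.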

Author

flier268 commented Feb 5, 2025

By the way, no error message is shown in the web UI, which is confusing.

Owner

jhj0517 commented Feb 5, 2025

This is now fixed with #493, please reopen the issue if the problem persists.

@jhj0517 jhj0517 closed this as completed Feb 5, 2025