
Not support m4a format audio? #492

Closed
flier268 opened this issue Feb 5, 2025 · 3 comments
Assignees
Labels
bug Something isn't working

Comments


flier268 commented Feb 5, 2025

Which OS are you using?

  • OS: Docker

When I upload m4a format audio and press "GENERATE SUBTITLE FILE", it fails.
After converting it to mp3, it works!
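As a workaround, the file can be converted to mp3 before upload. A minimal sketch of that conversion via an ffmpeg subprocess (assumes `ffmpeg` is on `PATH`; `build_ffmpeg_cmd` and `convert_to_mp3` are hypothetical helpers, not part of Whisper-WebUI):

```python
import shutil
import subprocess

def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    # -y overwrites an existing output file; -vn drops any video/cover-art
    # stream that m4a containers sometimes carry; -q:a 2 is a high-quality
    # VBR setting for libmp3lame.
    return ["ffmpeg", "-y", "-i", src, "-vn",
            "-codec:a", "libmp3lame", "-q:a", "2", dst]

def convert_to_mp3(src: str, dst: str) -> None:
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```

Usage would be `convert_to_mp3("input.m4a", "input.mp3")` and then uploading the mp3.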

2025-01-31 05:04:57 Use "faster-whisper" implementation
2025-01-31 05:04:57 Device "cuda" is detected
2025-01-31 05:04:58 * Running on local URL:  http://0.0.0.0:7860
2025-01-31 05:04:58 
2025-01-31 05:04:58 To create a public link, set `share=True` in `launch()`.
2025-02-04 22:08:45 WARNING:  Invalid HTTP request received.
2025-02-04 22:08:45 WARNING:  Invalid HTTP request received.
2025-02-04 22:10:43 /Whisper-WebUI/venv/lib/python3.11/site-packages/torch/cuda/memory.py:391: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
2025-02-04 22:10:43   warnings.warn(
2025-02-04 22:10:46 INFO:speechbrain.utils.quirks:Applied quirks (see `speechbrain.utils.quirks`): [disable_jit_profiling, allow_tf32]
2025-02-04 22:10:46 INFO:speechbrain.utils.quirks:Excluded quirks specified by the `SB_DISABLE_QUIRKS` environment (comma-separated list): []
2025-02-04 22:10:49 
pytorch_model.bin:   0% 0.00/5.91M [00:00<?, ?B/s]
pytorch_model.bin: 100% 5.91M/5.91M [00:00<00:00, 7.28MB/s]
pytorch_model.bin: 100% 5.91M/5.91M [00:00<00:00, 7.20MB/s]
2025-02-04 22:10:49 
config.yaml:   0% 0.00/399 [00:00<?, ?B/s]
config.yaml: 100% 399/399 [00:00<00:00, 2.86MB/s]
2025-02-04 22:10:53 
pytorch_model.bin:   0% 0.00/26.6M [00:00<?, ?B/s]
pytorch_model.bin:  39% 10.5M/26.6M [00:01<00:01, 8.12MB/s]
pytorch_model.bin:  79% 21.0M/26.6M [00:01<00:00, 12.6MB/s]
pytorch_model.bin: 100% 26.6M/26.6M [00:01<00:00, 15.6MB/s]
pytorch_model.bin: 100% 26.6M/26.6M [00:01<00:00, 13.4MB/s]
2025-02-04 22:10:54 
config.yaml:   0% 0.00/221 [00:00<?, ?B/s]
config.yaml: 100% 221/221 [00:00<00:00, 2.35MB/s]
2025-02-04 22:10:57 /Whisper-WebUI/venv/lib/python3.11/site-packages/pyannote/audio/utils/reproducibility.py:74: ReproducibilityWarning: TensorFloat-32 (TF32) has been disabled as it might lead to reproducibility issues and lower accuracy.
2025-02-04 22:10:57 It can be re-enabled by calling
2025-02-04 22:10:57    >>> import torch
2025-02-04 22:10:57    >>> torch.backends.cuda.matmul.allow_tf32 = True
2025-02-04 22:10:57    >>> torch.backends.cudnn.allow_tf32 = True
2025-02-04 22:10:57 See https://github.com/pyannote/pyannote-audio/issues/1370 for more details.
2025-02-04 22:10:57 
2025-02-04 22:10:57   warnings.warn(
2025-02-04 22:11:03 /Whisper-WebUI/venv/lib/python3.11/site-packages/pyannote/audio/models/blocks/pooling.py:104: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /pytorch/aten/src/ATen/native/ReduceOps.cpp:1831.)
2025-02-04 22:11:03   std = sequences.std(dim=-1, correction=1)
2025-02-04 22:11:04 /Whisper-WebUI/venv/lib/python3.11/site-packages/torch/cuda/memory.py:391: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
2025-02-04 22:11:04   warnings.warn(
2025-02-04 22:11:18 INFO:faster_whisper:Processing audio with duration 00:22.008
2025-02-04 22:11:22 INFO:faster_whisper:Detected language 'zh' with probability 1.00
2025-02-04 22:11:36 /Whisper-WebUI/venv/lib/python3.11/site-packages/pyannote/audio/models/blocks/pooling.py:104: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /pytorch/aten/src/ATen/native/ReduceOps.cpp:1831.)
2025-02-04 22:11:36   std = sequences.std(dim=-1, correction=1)
2025-02-04 22:11:37 /Whisper-WebUI/venv/lib/python3.11/site-packages/torch/cuda/memory.py:391: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
2025-02-04 22:11:37   warnings.warn(
2025-02-04 22:11:49 INFO:faster_whisper:Processing audio with duration 00:21.480
2025-02-04 22:11:53 INFO:faster_whisper:Detected language 'zh' with probability 1.00
2025-02-04 22:12:07 /Whisper-WebUI/venv/lib/python3.11/site-packages/pyannote/audio/models/blocks/pooling.py:104: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /pytorch/aten/src/ATen/native/ReduceOps.cpp:1831.)
2025-02-04 22:12:07   std = sequences.std(dim=-1, correction=1)
2025-02-04 22:12:07 /Whisper-WebUI/venv/lib/python3.11/site-packages/torch/cuda/memory.py:391: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
2025-02-04 22:12:07   warnings.warn(
2025-02-04 22:12:19 INFO:faster_whisper:Processing audio with duration 00:22.656
2025-02-04 22:12:24 INFO:faster_whisper:Detected language 'zh' with probability 1.00
2025-02-04 22:12:37 /Whisper-WebUI/venv/lib/python3.11/site-packages/pyannote/audio/models/blocks/pooling.py:104: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /pytorch/aten/src/ATen/native/ReduceOps.cpp:1831.)
2025-02-04 22:12:37   std = sequences.std(dim=-1, correction=1)
2025-02-04 22:12:37 /Whisper-WebUI/venv/lib/python3.11/site-packages/torch/cuda/memory.py:391: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
2025-02-04 22:12:37   warnings.warn(
2025-02-05 14:42:28 2025-02-05 06:42:28,647 - Whisper-WebUI - INFO - The file /tmp/gradio/0c47604add19b8751b51b7f3776943c9d8aeb6540b9c362c4500985d8ea6722c/FPS II修改20250205.m4a is not able to open or corrupted. Please check the file. Error opening '/tmp/gradio/0c47604add19b8751b51b7f3776943c9d8aeb6540b9c362c4500985d8ea6722c/FPS II修改20250205.m4a': Format not recognised.
2025-02-05 14:42:28 INFO:Whisper-WebUI:The file /tmp/gradio/0c47604add19b8751b51b7f3776943c9d8aeb6540b9c362c4500985d8ea6722c/FPS II修改20250205.m4a is not able to open or corrupted. Please check the file. Error opening '/tmp/gradio/0c47604add19b8751b51b7f3776943c9d8aeb6540b9c362c4500985d8ea6722c/FPS II修改20250205.m4a': Format not recognised.
2025-02-05 14:42:28 Traceback (most recent call last):
2025-02-05 14:42:28   File "/Whisper-WebUI/modules/whisper/base_transcription_pipeline.py", line 287, in transcribe_file
2025-02-05 14:42:28     subtitle, file_path = generate_file(
2025-02-05 14:42:28                           ^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/modules/utils/subtitle_manager.py", line 440, in generate_file
2025-02-05 14:42:28     file_writer(result=result, output_file_name=output_file_name, **kwargs)
2025-02-05 14:42:28   File "/Whisper-WebUI/modules/utils/subtitle_manager.py", line 88, in __call__
2025-02-05 14:42:28     self.write_result(result, file=f, options=options, **kwargs)
2025-02-05 14:42:28   File "/Whisper-WebUI/modules/utils/subtitle_manager.py", line 106, in write_result
2025-02-05 14:42:28     print(segment["text"].strip(), file=file, flush=True)
2025-02-05 14:42:28           ^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28 AttributeError: 'NoneType' object has no attribute 'strip'
2025-02-05 14:42:28 
2025-02-05 14:42:28 The above exception was the direct cause of the following exception:
2025-02-05 14:42:28 
2025-02-05 14:42:28 Traceback (most recent call last):
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/queueing.py", line 625, in process_events
2025-02-05 14:42:28     response = await route_utils.call_process_api(
2025-02-05 14:42:28                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/route_utils.py", line 322, in call_process_api
2025-02-05 14:42:28     output = await app.get_blocks().process_api(
2025-02-05 14:42:28              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/blocks.py", line 2044, in process_api
2025-02-05 14:42:28     result = await self.call_function(
2025-02-05 14:42:28              ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1591, in call_function
2025-02-05 14:42:28     prediction = await anyio.to_thread.run_sync(  # type: ignore
2025-02-05 14:42:28                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
2025-02-05 14:42:28     return await get_async_backend().run_sync_in_worker_thread(
2025-02-05 14:42:28            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2461, in run_sync_in_worker_thread
2025-02-05 14:42:28     return await future
2025-02-05 14:42:28            ^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 962, in run
2025-02-05 14:42:28     result = context.run(func, *args)
2025-02-05 14:42:28              ^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/utils.py", line 883, in wrapper
2025-02-05 14:42:28     response = f(*args, **kwargs)
2025-02-05 14:42:28                ^^^^^^^^^^^^^^^^^^
2025-02-05 14:42:28   File "/Whisper-WebUI/modules/whisper/base_transcription_pipeline.py", line 311, in transcribe_file
2025-02-05 14:42:28     raise RuntimeError(f"Error transcribing file: {e}") from e
2025-02-05 14:42:28 RuntimeError: Error transcribing file: 'NoneType' object has no attribute 'strip'
[The same "Format not recognised" warning and traceback repeat verbatim at 14:42:35 and 14:43:49.]
2025-02-05 14:52:54 INFO:faster_whisper:Processing audio with duration 01:05:56.400
2025-02-05 14:53:02 INFO:faster_whisper:Detected language 'zh' with probability 0.98
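The crash in the log above has two layers: soundfile cannot decode the m4a ("Format not recognised"), so the pipeline ends up with segments whose `text` is `None`, and the subtitle writer then calls `.strip()` on `None`. A defensive sketch of the writer loop (hypothetical helper, not the project's actual `write_result`), which skips empty segments instead of crashing:

```python
import io

def write_segments(segments: list[dict], file) -> None:
    """Write one subtitle line per segment, skipping segments with no text."""
    for segment in segments:
        text = segment.get("text")
        if text is None:  # guard: upstream decode failures can leave text unset
            continue
        print(text.strip(), file=file, flush=True)
```

With input `[{"text": " hello "}, {"text": None}]` this writes only `hello`, where the original code raised `AttributeError: 'NoneType' object has no attribute 'strip'`.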

@flier268 flier268 added the bug Something isn't working label Feb 5, 2025
Owner

jhj0517 commented Feb 5, 2025

Hi, thanks for reporting.
m4a is not supported by soundfile; I may refactor later to load audio with ffmpeg instead of soundfile.
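Loading through an ffmpeg subprocess sidesteps soundfile's container support entirely, since ffmpeg decodes m4a/AAC natively. A hedged sketch (assumes `ffmpeg` is on `PATH`; `decode_with_ffmpeg` and `pcm16_to_float` are hypothetical helpers, not the project's code) that decodes any ffmpeg-readable file to mono 16 kHz floats, similar in spirit to how Whisper itself loads audio:

```python
import struct
import subprocess

def pcm16_to_float(raw: bytes) -> list[float]:
    # Little-endian signed 16-bit PCM -> floats in [-1.0, 1.0)
    n = len(raw) // 2
    return [s / 32768.0 for s in struct.unpack("<%dh" % n, raw[: 2 * n])]

def decode_with_ffmpeg(path: str, sr: int = 16000) -> list[float]:
    """Decode any ffmpeg-readable file (m4a included) to mono PCM samples."""
    cmd = ["ffmpeg", "-nostdin", "-i", path,
           "-f", "s16le", "-ac", "1", "-ar", str(sr), "-"]
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    return pcm16_to_float(raw)
```

In practice one would convert the sample list to a NumPy array for the model, but the decoding step itself needs nothing beyond ffmpeg.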

Author

flier268 commented Feb 5, 2025

By the way, no error message is shown in the web UI, which is confusing.

Owner

jhj0517 commented Feb 5, 2025

This is now fixed with #493, please reopen the issue if the problem persists.

@jhj0517 jhj0517 closed this as completed Feb 5, 2025