
transcription in logs file is empty #18

Open · PiotrEsse opened this issue Jan 31, 2024 · 4 comments

PiotrEsse commented Jan 31, 2024

Hi,
thank you for your work, but I am having issues.
There's no error, but after running your example I get an almost empty file in logs.
The file contains only the following line:
zach (206.8 : 206.8) :

The terminal shows no errors:

(speechlib39) piotr@Legion7:~/speechlib/examples$ python3 transcribe.py
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
obama_zach.wav is already in WAV format.
obama_zach.wav is already a mono audio file.
The file already has 16-bit samples.
config.yaml: 100%|██████████| 500/500 [00:00<00:00, 292kB/s]
pytorch_model.bin: 100%|██████████| 17.7M/17.7M [00:00<00:00, 19.4MB/s]
config.yaml: 100%|██████████| 318/318 [00:00<00:00, 36.2kB/s]
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.0+cu121. Bad things might happen unless you revert torch to 1.x.
running diarization...
diarization done. Time taken: 17 seconds.
running speaker recognition...
speaker recognition done. Time taken: 4 seconds.
running transcription...
config.json: 100%|██████████| 2.26k/2.26k [00:00<00:00, 660kB/s]
vocabulary.txt: 100%|██████████| 460k/460k [00:00<00:00, 1.02MB/s]
tokenizer.json: 100%|██████████| 2.20M/2.20M [00:00<00:00, 3.03MB/s]
model.bin: 100%|██████████| 1.53G/1.53G [00:58<00:00, 26.0MB/s]
Cannot check for SPDIF
transcription done. Time taken: 140 seconds.
(speechlib39) piotr@Legion7:~/speechlib/examples$ ls
README.md  audio_cache  logs  obama1.mp3  obama1.wav  obama_zach.wav  preprocess.py  pretrained_models  segments  temp  transcribe.py  voices
(speechlib39) piotr@Legion7:~/speechlib/examples$ python3 transcribe.py
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
obama_zach.wav is already in WAV format.
obama_zach.wav is already a mono audio file.
The file already has 16-bit samples.
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.0+cu121. Bad things might happen unless you revert torch to 1.x.
running diarization...
diarization done. Time taken: 14 seconds.
running speaker recognition...
speaker recognition done. Time taken: 4 seconds.
running transcription...
Cannot check for SPDIF
transcription done. Time taken: 82 seconds.

Content of the file:
[screenshot: the log file contains only the single `zach (206.8 : 206.8) :` line]

I have Python 3.9 in a clean conda env. Whisper itself works flawlessly.

@NavodPeiris (Owner)

  1. Did you run the same example in this repo? If not, please post the code.
  2. What model size did you use?
  3. Did you input the path to the obama_zach file correctly?
  4. Can you run this in a normal Python environment instead of conda and tell me if the error persists?

@PiotrEsse (Author)

Ad 1. Yes, I've run the same example, without any changes. I use transcribe.py:
~/speechlib/examples$ python3 transcribe.py

obama_zach_143156_en.txt
Ad 2. I use the medium model.
Ad 3. Yes, it processes the file. It takes time: 79 seconds, to be precise.
Ad 4. Sure, I'll have to prepare a clean WSL VM.

@elia-morrison

This can happen for any number of reasons, because of an insane try/except block in this function.

It literally says:

try:
    trans = transcribe(file, language, modelSize, quantization)  
    
    # return -> [[start time, end time, transcript], [start time, end time, transcript], ..]
    texts.append([segment[0], segment[1], trans])
except:
    pass

I removed this via a monkeypatch and it revealed the actual issue:

ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
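For reference, here is a minimal sketch of the kind of monkeypatch I mean. The module path and attribute name below are assumptions; point them at wherever the bare except actually lives in your installed speechlib:

    # Wrap the inner transcribe() call so the real exception prints before
    # the library's bare `except: pass` can swallow it.
    import traceback

    from speechlib import core_analysis  # hypothetical module path; adjust

    _original_transcribe = core_analysis.transcribe  # hypothetical attribute

    def loud_transcribe(*args, **kwargs):
        try:
            return _original_transcribe(*args, **kwargs)
        except Exception:
            traceback.print_exc()  # surface what the bare except would hide
            raise

    # Patch the name in the module where the *caller* looks it up, not just
    # where it is defined, or the caller keeps using the old binding.
    core_analysis.transcribe = loud_transcribe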

This is a common issue for faster-whisper and is discussed here: SYSTRAN/faster-whisper#42
There may be a different error in your case.
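If you hit the same float16 error, the workaround discussed in that thread is to request a compute type your device actually supports. A minimal sketch against faster-whisper directly (whether speechlib exposes a way to pass compute_type through is something to verify against its own API):

    from faster_whisper import WhisperModel

    # int8 works on most CPUs; float32 is the safe-but-slow fallback.
    model = WhisperModel("medium", device="cpu", compute_type="int8")

    # transcribe() returns a generator of segments plus transcription info.
    segments, info = model.transcribe("obama_zach.wav", language="en")
    for segment in segments:
        print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")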

@tomich (Contributor) commented May 30, 2024

I'm having the same problem, and it could be partially solved with

#37

In the meantime, I'll try to create a branch in my fork that doesn't use faster-whisper.
