
transcription in logs file is empty #18

Open · PiotrEsse opened this issue Jan 31, 2024 · 4 comments

PiotrEsse commented Jan 31, 2024

Hi,
thank you for your work, but I am having issues.
There's no error, but after running your example I get an almost empty file in logs.
The file contains only the following line:
zach (206.8 : 206.8) :

The terminal shows no errors:

(speechlib39) piotr@Legion7:~/speechlib/examples$ python3 transcribe.py
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
obama_zach.wav is already in WAV format.
obama_zach.wav is already a mono audio file.
The file already has 16-bit samples.
config.yaml: 100%|██████████| 500/500 [00:00<00:00, 292kB/s]
pytorch_model.bin: 100%|██████████| 17.7M/17.7M [00:00<00:00, 19.4MB/s]
config.yaml: 100%|██████████| 318/318 [00:00<00:00, 36.2kB/s]
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.0+cu121. Bad things might happen unless you revert torch to 1.x.
running diarization...
diarization done. Time taken: 17 seconds.
running speaker recognition...
speaker recognition done. Time taken: 4 seconds.
running transcription...
config.json: 100%|██████████| 2.26k/2.26k [00:00<00:00, 660kB/s]
vocabulary.txt: 100%|██████████| 460k/460k [00:00<00:00, 1.02MB/s]
tokenizer.json: 100%|██████████| 2.20M/2.20M [00:00<00:00, 3.03MB/s]
model.bin: 100%|██████████| 1.53G/1.53G [00:58<00:00, 26.0MB/s]
Cannot check for SPDIF
transcription done. Time taken: 140 seconds.
(speechlib39) piotr@Legion7:~/speechlib/examples$ ls
README.md  audio_cache  logs  obama1.mp3  obama1.wav  obama_zach.wav  preprocess.py  pretrained_models  segments  temp  transcribe.py  voices
(speechlib39) piotr@Legion7:~/speechlib/examples$ python3 transcribe.py
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
/home/piotr/anaconda3/envs/speechlib39/lib/python3.9/site-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
obama_zach.wav is already in WAV format.
obama_zach.wav is already a mono audio file.
The file already has 16-bit samples.
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.0+cu121. Bad things might happen unless you revert torch to 1.x.
running diarization...
diarization done. Time taken: 14 seconds.
running speaker recognition...
speaker recognition done. Time taken: 4 seconds.
running transcription...
Cannot check for SPDIF
transcription done. Time taken: 82 seconds.

Content of the file:
[screenshot: the log file contains only the single `zach (206.8 : 206.8) :` line]

I have Python 3.9 in a clean conda env. Whisper itself works flawlessly.

@NavodPeiris (Owner)

  1. Did you run the same example in this repo? If not, please post the code.
  2. What model size did you use?
  3. Did you input the path to the obama_zach file correctly?
  4. Can you run this in a normal Python environment instead of conda and tell me if the error persists?

@PiotrEsse (Author)

Ad 1. Yes, I've run the same example, without any changes. I use transcribe.py:
~/speechlib/examples$ python3 transcribe.py

obama_zach_143156_en.txt
Ad 2. I use the medium model.
Ad 3. Yes, it processes the file. It takes time: 79 seconds, to be precise.
Ad 4. Sure, I'll have to prepare a clean WSL VM.

@elia-morrison

This can happen for any number of reasons, because of an insane try/except block in this function.

It literally says:

try:
    trans = transcribe(file, language, modelSize, quantization)  
    
    # return -> [[start time, end time, transcript], [start time, end time, transcript], ..]
    texts.append([segment[0], segment[1], trans])
except:
    pass

I removed this via a monkeypatch and it revealed the actual issue:

ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
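For reference, here is a minimal sketch of the kind of monkeypatch I mean. The module path and attribute name below are assumptions; point them at wherever the bare except actually lives in your installed speechlib:

    # Wrap the inner transcribe() call so the real exception prints before
    # the library's bare `except: pass` can swallow it.
    import traceback

    from speechlib import core_analysis  # hypothetical module path; adjust

    _original_transcribe = core_analysis.transcribe  # hypothetical attribute

    def loud_transcribe(*args, **kwargs):
        try:
            return _original_transcribe(*args, **kwargs)
        except Exception:
            traceback.print_exc()  # surface what the bare except would hide
            raise

    # Patch the name in the module where the *caller* looks it up, not just
    # where it is defined, or the caller keeps using the old binding.
    core_analysis.transcribe = loud_transcribe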

This is a common issue for faster-whisper and is discussed here: SYSTRAN/faster-whisper#42
There may be a different error in your case.
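If you hit the same float16 error, the workaround discussed in that thread is to request a compute type your device actually supports. A minimal sketch against faster-whisper directly (whether speechlib exposes a way to pass compute_type through is something to verify against its own API):

    from faster_whisper import WhisperModel

    # int8 works on most CPUs; float32 is the safe-but-slow fallback.
    model = WhisperModel("medium", device="cpu", compute_type="int8")

    # transcribe() returns a generator of segments plus transcription info.
    segments, info = model.transcribe("obama_zach.wav", language="en")
    for segment in segments:
        print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")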

@tomich (Contributor) commented May 30, 2024

I'm having the same problem, and it could be partially solved with

#37

In the meantime, I'll try to create a branch in my fork that doesn't use faster-whisper.
