adding words in trained model dictionary #382

Closed
Tortoise17 opened this issue Jan 22, 2021 · 21 comments

@Tortoise17

I want to ask: what is the way to add some words to the trained Vosk model's dictionary?
Is there any function that can add or customize the vocabulary?

@LuggerMan

@Tortoise17 this is one of the bigger problems; you need to update the vocabulary. Read this: https://alphacephei.com/vosk/adaptation

@dazzzed

dazzzed commented Feb 19, 2021

@Tortoise17 this is one of the bigger problems; you need to update the vocabulary. Read this: https://alphacephei.com/vosk/adaptation

On the adaptation page, we can read:

You can not introduce new words this way, that is something we will cover later.

So I'm still looking for a solution by exploring the Kaldi docs. So far that seems to be the way, but it doesn't seem easy. If anyone finds a good tutorial, article, or documentation on this, I'd appreciate it.

@nshmyrev nshmyrev mentioned this issue Feb 21, 2021
@LuggerMan

LuggerMan commented Feb 24, 2021

@dazzzed lmao, read past that line; "later" means two lines below:

Updating words and the vocabulary
For a more detailed guide, see this post:
https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/

This might get you on track.

@Tortoise17
Author

Thank you!

You said that:
The ones which have HCLG.fst are static

and also that there is a methodology to update that graph, but it is these static graphs I am interested in. What makes them static and is there any chance to make them dynamic? Or retrain from this accuracy point?
I am asking about the models which are around 1 GB in size: fr-pguyot-zamia-20191016-tdnn_f, vosk-model-de-0.6, vosk-model-nl-spraakherkenning-0.6

@nshmyrev
Collaborator

nshmyrev commented Mar 5, 2021

What makes them static and is there any chance to make them dynamic?

If you have all the necessary model files (tree, phonemes) you can build both a dynamic graph and a static graph with the mkgraph.sh and mkgraph_lookahead.sh Kaldi scripts.

or retrain from this accuracy point?

Accuracy is the same, speed is slightly slower

I am asking about the models which are around 1 GB in size: fr-pguyot-zamia-20191016-tdnn_f, vosk-model-de-0.6, vosk-model-nl-spraakherkenning-0.6

Yes, you can build dynamic graphs from these models with the mkgraph_lookahead script, which you can find in the Kaldi repo.
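
As an illustration of what the dynamic graph buys you on the API side, here is a minimal Python sketch, assuming the rebuilt model supports a runtime word list (the model path, audio file, and word list are placeholders, and the listed words must already exist in the model's dictionary):

```python
import json
import wave

from vosk import Model, KaldiRecognizer

# Placeholder path to a model rebuilt with mkgraph_lookahead.sh (dynamic graph).
model = Model("model-dynamic")

wf = wave.open("test.wav", "rb")  # mono PCM WAV at the model's sample rate

# With a dynamic graph the recognizer can take a JSON word list at runtime.
# This restricts decoding to these words; it does not add new words by itself.
grammar = json.dumps(["yes", "no", "covid", "[unk]"])
rec = KaldiRecognizer(model, wf.getframerate(), grammar)

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)

print(rec.FinalResult())
```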

@Tortoise17
Author

OMG..!! You are a super great professor; I should really be your student. There is a lot to learn from you. Thank you so much for this great effort, and I will keep asking some more questions. This is really great work.

@nshmyrev
Collaborator

nshmyrev commented Mar 5, 2021

Thank you, @Tortoise17, I hope it is useful for you. You always have an opportunity to join the Vosk project and learn more ;).

@Tortoise17
Author

How to join the project? Please let me know.

@nshmyrev
Collaborator

nshmyrev commented Mar 5, 2021

Pick up any issue and try to solve it. Like this one:

#180

@Tortoise17
Author

I will have a look and I will be in contact with you for my updates as well.

@FirasHm

FirasHm commented May 18, 2021

I am still working on this issue. Did anyone resolve it without using static graphs?

@nshmyrev
Collaborator

We have recently added documentation on the proper process:

https://alphacephei.com/vosk/lm

@ExtReMLapin

Hello,
I followed the process in the link posted above my message and it seems to work correctly.
As an example, I added the word COVID to the French model (the big one) by writing "covid" into extra.txt and the corresponding pronunciation into extra.dic, based on how other words sound; I'm not sure how (and if) I could have used SIMPA.

The main issue is that the compilation took a hell of a long time, about 24 minutes. It was using only 1 of my 8 cores and not the GPU at all.
From memory, it was on the ngram step.
Is there any way/plan to support GPU or multithreaded compilation?
I'm fairly new to this whole thing, so pardon me if I'm being a fool.
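
A minimal sketch (not an official recipe) of scripting the extra.txt / extra.dic step described above; the file locations and the pronunciation string are assumptions to be adapted to the actual layout documented at https://alphacephei.com/vosk/lm:

```python
from pathlib import Path

# Assumed locations inside the model compilation directory; adjust to the
# layout described at https://alphacephei.com/vosk/lm.
extra_txt = Path("db/extra.txt")   # extra words for the language model
extra_dic = Path("db/extra.dic")   # extra pronunciations for the dictionary

# New word plus a pronunciation written with the model's existing phone set,
# by analogy with similar entries (as was done for "covid" above).
word = "covid"
pronunciation = "k o v i d"        # placeholder phones, not a verified entry

with extra_txt.open("a", encoding="utf-8") as f:
    f.write(word + "\n")

with extra_dic.open("a", encoding="utf-8") as f:
    f.write(f"{word} {pronunciation}\n")

# After updating both files, rerun the graph compilation steps from the
# documentation; that recompilation is the slow part discussed in this thread.
```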

@nshmyrev
Collaborator

The main issue is that the compilation took a hell of a long time, about 24 minutes. It was using only 1 of my 8 cores and not the GPU at all.

It is not yet easy to speed things up. Maybe something on the SRILM side, but in general it is going to take about that long. If you update once a day it is ok.

@ExtReMLapin

ExtReMLapin commented Jan 10, 2022

It's true that it's a "non-problem" for the average application workflow.
However, say for example that an interface shows the output words, with each word highlighted according to the model's confidence in it.

Now say you could click on a word with low confidence and correct it by adding a new word (with its pronunciation).
If compilation time were reduced to a few seconds (in an imaginary world, obviously), you could directly show the user the fixed word, and they could see, without waiting, whether it works.

Obviously I'm not expecting that kind of seconds-only compilation time, but cutting it by a few minutes would be great.

Don't misunderstand me, even a 1-hour compilation time is good enough; but the faster we go (without losing accuracy), the wider the feature spectrum is.

@nshmyrev
Collaborator

Now say you could click on a word with low confidence and correct it by adding a new word (with its pronunciation).

You also need n-gram probabilities for that word, not just the pronunciation. So it is not that straightforward.

If compilation time were reduced to a few seconds (in an imaginary world, obviously), you could directly show the user the fixed word, and they could see, without waiting, whether it works.

You can check the kaldi-active-grammar project; it can do that.

@ClintonBrits

I was looking at extending the vocabulary as well, but the only source I have managed to find on how to do this is https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/.

Can anyone recommend a good starting model that can be used for this? The Aspire chain model used there apparently worked, but besides taking days to make a graph, it results in a graph that is too big for applications like the android-vosk library.

Is there perhaps a better starting point than the one they mention in the article?

@nshmyrev
Collaborator

nshmyrev commented Aug 1, 2023

https://github.com/alphacep/vosk-api/blob/master/python/example/colab/vosk-adaptation.ipynb

@ClintonBrits

Unbelievably well put together solution. Thanks for the link.

@DrunkJin

How do I configure the model for use after I have done the adaptation? I want to use the adapted model in Python.

@nshmyrev
Collaborator

Same as #1687
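
For the Python side, a minimal sketch of loading the adapted model from its directory with the standard vosk API and decoding a WAV file (the paths are placeholders):

```python
import wave

from vosk import Model, KaldiRecognizer

# Placeholder path: point this at the directory of the adapted/recompiled model.
model = Model("vosk-model-adapted")

# Audio is assumed to be a mono PCM WAV at the model's expected sample rate.
wf = wave.open("audio.wav", "rb")
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        print(rec.Result())    # finalized segment (JSON)

print(rec.FinalResult())       # remaining segment (JSON)
```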
