GEFS-language-detector

Download the model weights from Hugging Face

https://huggingface.co/ImranzamanML/GEFS-language-detector

Load model directly from Hugging Face using Transformers

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("ImranzamanML/GEFS-language-detector")
model = AutoModelForSequenceClassification.from_pretrained("ImranzamanML/GEFS-language-detector")

German, English, French and Spanish Language Detector

The GEFS-language-detector model achieves an F1 score close to 100% for each supported language on its test set, which indicates high accuracy and reliability in identifying these languages. It was fine-tuned from the xlm-roberta-base base model on the papluca Language Identification dataset.

100K downloads of the LLM model

Use a pipeline as a high-level helper

from transformers import pipeline

# One sample sentence per supported language
text = [
    "Mir gefällt die Art und Weise, Sprachen zu erkennen",
    "I like the way to detect languages",
    "Me gusta la forma de detectar idiomas",
    "J'aime la façon de détecter les langues",
]
pipe = pipeline("text-classification", model="ImranzamanML/GEFS-language-detector")
# top_k=1 keeps only the highest-scoring label for each input
lang_detect = pipe(text, top_k=1)
print("The detected languages are", lang_detect)

Predicted output:

The model returns each prediction as a language code:

  - de for German
  - en for English
  - fr for French
  - es for Spanish
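
Because top_k=1 is passed explicitly, the pipeline is expected to return one list of label/score dictionaries per input rather than a bare dictionary. The following is a minimal sketch, assuming that output shape, of how the result could be flattened and the codes mapped to language names. LANGUAGE_NAMES and summarize are hypothetical helpers, not part of the model or of Transformers, and the score in the usage lines is a placeholder rather than real model output.

```python
# Hypothetical helper: not part of the model or the Transformers library.
LANGUAGE_NAMES = {"de": "German", "en": "English", "fr": "French", "es": "Spanish"}


def summarize(texts, predictions):
    """Pair each input text with the name of its top predicted language."""
    rows = []
    for text, pred in zip(texts, predictions):
        # Accept either a list of candidates (top_k given) or a single dict
        top = pred[0] if isinstance(pred, list) else pred
        code = top["label"]
        # Fall back to the raw code if it is not one of the four known labels
        rows.append((text, LANGUAGE_NAMES.get(code, code), top["score"]))
    return rows


# Illustrative input only: the score below is a placeholder, not real model output.
sample_texts = ["Me gusta la forma de detectar idiomas"]
sample_predictions = [[{"label": "es", "score": 0.99}]]
for text, language, score in summarize(sample_texts, sample_predictions):
    print(f"{language} ({score:.2f}): {text}")
```

With real predictions, the same call would be summarize(text, lang_detect), using the variables from the pipeline example.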

Supported languages

The model currently supports four languages; more will be added in the future.

The model supports the following languages:

  - German (de)
  - English (en)
  - French (fr)
  - Spanish (es)

Model Training

Epoch   Training Loss   Validation Loss
1       0.002600        0.000148
2       0.001000        0.000015
3       0.000000        0.000011
4       0.001800        0.000009
5       0.002700        0.000016
6       0.001600        0.000012
7       0.001300        0.000009
8       0.001200        0.000008
9       0.000900        0.000007
10      0.000900        0.000007

Testing Results

Language   Precision   Recall   F1       Accuracy
de         0.9997      0.9998   0.9998   0.9999
en         1.0000      1.0000   1.0000   1.0000
fr         0.9995      0.9996   0.9996   0.9996
es         0.9994      0.9996   0.9995   0.9996
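
The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick sanity check, the sketch below recomputes it from the table above. The f1_score helper is hypothetical and is not the evaluation code used for the model; because the table values are already rounded to four decimals, a recomputed value may differ from the reported one in the last digit, so the comparison uses a tolerance.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above
reported = {
    "de": (0.9997, 0.9998, 0.9998),
    "en": (1.0000, 1.0000, 1.0000),
    "fr": (0.9995, 0.9996, 0.9996),
    "es": (0.9994, 0.9996, 0.9995),
}

for lang, (p, r, f1) in reported.items():
    recomputed = f1_score(p, r)
    # Allow for rounding in the last reported digit
    print(lang, f"{recomputed:.5f}", abs(recomputed - f1) < 1e-4)
```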

About Author

Name: Muhammad Imran Zaman

Company: Theum AG

Role: Lead Machine Learning Engineer

Professional Links:

Kaggle: Profile
LinkedIn: Profile
Google Scholar: Profile
YouTube: Channel
GitHub: Channel
