
Use langcodes to match to_lang to chat_sample name. #872

Closed. SamuelWN proposed merging 7 commits.


SamuelWN (Contributor) commented on Mar 7, 2025:

Change chat_sample handling:

  • Add a function get_chat_sample(to_lang) that returns the chat_sample for the requested language
  • Use langcodes to match to_lang to the closest available sample language
    • Currently implemented with a narrow margin (max_distance=5)
      • e.g. en-US vs en-GB and pt-BR vs pt-PT each have a langcodes distance of 5
  • Cache the chat_sample[to_lang] match in a variable, so the langcodes matching only has to run once

This provides additional flexibility and resilience when handling chat_sample language IDs, which are user-provided values.
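A minimal sketch of the lookup-and-cache behavior described above, assuming the chat_sample keys are BCP 47 tags such as `en-GB` (the real keys may be language names, which would need an extra parsing step). `ChatSampleLookup` and `langcodes_matcher` are hypothetical names; the matcher wraps `langcodes.closest_match`, which returns a `(tag, distance)` pair and `('und', 1000)` when nothing falls within `max_distance`. The matcher is injectable so the caching logic can be exercised without langcodes installed.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Matcher contract, modeled on langcodes.closest_match: (desired, supported, max_distance)
# -> (best_tag, distance), or ('und', 1000) when no candidate is close enough.
Matcher = Callable[[str, List[str], int], Tuple[str, int]]


def langcodes_matcher(desired: str, supported: List[str], max_distance: int) -> Tuple[str, int]:
    # Lazy import: langcodes is only needed when a fuzzy lookup actually happens.
    import langcodes
    return langcodes.closest_match(desired, supported, max_distance=max_distance)


class ChatSampleLookup:
    """Hypothetical helper: maps to_lang onto the closest chat_sample key and caches the result."""

    def __init__(self, chat_sample: Dict[str, Any], max_distance: int = 5,
                 matcher: Matcher = langcodes_matcher) -> None:
        self.chat_sample = chat_sample
        self.max_distance = max_distance
        self.matcher = matcher
        self._cache: Dict[str, Optional[Any]] = {}

    def get_chat_sample(self, to_lang: str) -> Optional[Any]:
        # Serve cached answers, including a cached "no match" (None).
        if to_lang in self._cache:
            return self._cache[to_lang]
        # An exact key needs no fuzzy matching.
        sample = self.chat_sample.get(to_lang)
        if sample is None:
            tag, _distance = self.matcher(to_lang, list(self.chat_sample), self.max_distance)
            sample = None if tag == "und" else self.chat_sample.get(tag)
        self._cache[to_lang] = sample
        return sample
```

Caching the miss as well as the hit means a language with no close sample does not trigger a fresh langcodes search on every request. With `max_distance=5`, a request for `en-US` would resolve to an `en-GB` sample only if langcodes scores that pair at 5 or less, as the PR description reports.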

Additional changes:

  • Consolidate _LANGUAGE_CODE_MAP into GPTConfig
  • Clean up unused imports
  • chatgpt.py: add English comments below the existing Chinese comments

SamuelWN and others added 7 commits March 3, 2025 14:11
Change `chat_sample` handling:
- Add function `get_chat_sample(to_lang)` to return chat_sample for requested language
- Use `langcodes` to match `to_lang` to the closest available sample languages (within narrow margin)
  * `max_distance = 5`, the distance between `en-US` and `en-GB` or between `pt-BR` and `pt-PT`
- Cache `chat_sample[to_lang]` match as variable (only need to do `langcodes` matching once)

Merge `ollama` --> `custom_openai` migration
- Default model: `deepseek-chat`
- Optional model: `deepseek-reasoner`

If `reasoning_content` is provided, print it to the `debug` logger.
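A minimal sketch of the `reasoning_content` logging, assuming an OpenAI-compatible client pointed at DeepSeek, where `deepseek-reasoner` returns `reasoning_content` alongside `content` on the response message. The helper `extract_content` and the logger name are hypothetical; `SimpleNamespace` stands in for an SDK message object.

```python
import logging
from types import SimpleNamespace
from typing import Any

logger = logging.getLogger("deepseek_sketch")  # hypothetical logger name


def extract_content(message: Any) -> str:
    """Return the final answer text; log any reasoning text at DEBUG level."""
    # getattr with a default: deepseek-chat messages may lack the field or set it to None.
    reasoning = getattr(message, "reasoning_content", None)
    if reasoning:
        logger.debug("reasoning_content:\n%s", reasoning)
    return message.content


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Stand-in for response.choices[0].message from a deepseek-reasoner call.
    fake_message = SimpleNamespace(content="Bonjour", reasoning_content="The user wants French.")
    print(extract_content(fake_message))
```

Routing the reasoning to `debug` keeps it out of normal output while still making it available when diagnosing a translation.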

Add `ConfigGPT` setup.
    - https://api-docs.deepseek.com/quick_start/token_usage#calculate-token-usage-offline

Use the `_assemble_prompts` function from the current ChatGPT script (zyddnys@c3bd2e9)
    - Modified to use the true token count
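A hypothetical sketch of token-budgeted batching in the spirit of "use the true token count". The real `_assemble_prompts` signature and prompt format are not shown in this PR, so `assemble_prompts`, `max_tokens`, and `count_tokens` are illustrative names, and the greedy packing strategy is an assumption.

```python
from typing import Callable, Iterator, List


def assemble_prompts(queries: List[str], max_tokens: int,
                     count_tokens: Callable[[str], int]) -> Iterator[List[str]]:
    """Greedily pack queries into batches whose summed token count stays within max_tokens.

    A single query larger than the budget is emitted as its own batch rather than dropped.
    """
    batch: List[str] = []
    used = 0
    for query in queries:
        cost = count_tokens(query)
        # Close the current batch if adding this query would exceed the budget.
        if batch and used + cost > max_tokens:
            yield batch
            batch, used = [], 0
        batch.append(query)
        used += cost
    if batch:
        yield batch


if __name__ == "__main__":
    # Crude stand-in counter: whitespace-separated words, not real tokens.
    def word_count(text: str) -> int:
        return len(text.split())

    for chunk in assemble_prompts(["a b", "c d e", "f", "g h i j k l"], 4, word_count):
        print(chunk)
```

In practice `count_tokens` could wrap a real tokenizer's `encode` call (for example the offline DeepSeek tokenizer described on the token-usage page linked above), so the budget reflects actual tokens rather than a character or word estimate.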

SamuelWN changed the title from “Change chat_sample handling” to “Use langcodes to match to_lang to chat_sample name.” on Mar 7, 2025
SamuelWN closed this on Mar 7, 2025