
Complete overhaul of the chatgpt script #848

Merged · 3 commits merged into zyddnys:main on Feb 22, 2025

Conversation

SamuelWN
Contributor

@SamuelWN SamuelWN commented Feb 15, 2025

  • Single class named OpenAITranslator
  • Uses an environment variable (OPENAI_MODEL) to select the desired OpenAI model
  • Made child of ConfigGPT
  • Function to check for common refusal terms (_cannot_assist)
  • Revised Translator class to map openai and gpt* to chatgpt
  • Example gpt_config file
  • Utilization of chat_sample

Tested with OPENAI_MODEL='gpt-4o-mini'

 

PS: I am also currently working on implementing response_format, though more dev work is needed. (Updating the openai package would be required, though initial testing with 1.63.0 seems not to break anything.)

  • Pros:
    • Reduces translation refusals
      • e.g. ChatGPT refused to translate some test files when run normally, but gave a full translation via JSON
    • Forces output conformity
  • Cons:
    • Increased code complexity
      • Note: The standardized output may allow for it to be added to a parent class and inherited
    • Can cause issues with 'default' prompts (e.g. asking it to "maintain formatting," etc. can cause problems)
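
For illustration, a minimal sketch of how such a JSON-mode request and reply might be shaped (the helper names are hypothetical and this is not the PR's actual code; it only mirrors the openai>=1.x chat-completions payload shape):

```python
import json

# Hypothetical sketch of a JSON-mode translation request. The payload shape
# mirrors what an openai>=1.x chat-completions call would send when
# response_format={"type": "json_object"} is set; names are illustrative.
def build_json_request(queries, model="gpt-4o-mini"):
    numbered = {str(i + 1): q for i, q in enumerate(queries)}
    return {
        "model": model,
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system",
             "content": "Translate each numbered line. Reply with a JSON "
                        "object mapping the same numbers to translations."},
            {"role": "user", "content": json.dumps(numbered, ensure_ascii=False)},
        ],
    }

def parse_json_reply(reply_text, n_queries):
    """Parse the model's JSON reply back into an ordered list of strings."""
    data = json.loads(reply_text)
    return [data.get(str(i + 1), "") for i in range(n_queries)]
```

Because each numbered key must appear in the reply object, the model cannot silently merge two source lines into one output line, which is where the "forces output conformity" benefit comes from.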

SamuelWN and others added 3 commits February 13, 2025 18:29
- Single class named `OpenAITranslator`
- Uses the `OPENAI_MODEL` environment variable to set the desired OpenAI model
- Made child of `ConfigGPT`
- Function to check for common refusal terms (`_cannot_assist`)
- Revised `Translator` class to map `openai` and `gpt*` to `chatgpt`
- Example `gpt_config` file
- Utilization of `chat_sample`
Remove superfluous import
- Use `VALID_LANGUAGES` as language codes
- Fix oversight: make ChatSample optional
@zyddnys zyddnys merged commit 05d11a1 into zyddnys:main Feb 22, 2025
0 of 2 checks passed
@IwfWcf

IwfWcf commented Feb 23, 2025

@SamuelWN VALID_LANGUAGES is not aligned with _CHAT_SAMPLE in config_gpt.py

"CHS" map to "Chinese (Simplified)" in VALID_LANGUAGES while "Simplified Chinese" was used in _CHAT_SAMPLE

@popcion
Contributor

popcion commented Feb 27, 2025

Testing suggests ChatGPT refusals are all-or-nothing.

NO. Partial responses do occur, and they are very common. The OpenAI class is not limited to OpenAI's own models; it also applies to all models that are compatible with OpenAI API endpoints. Almost all models can be converted into OpenAI-style APIs through the 2api project. Therefore, it is unreasonable to only test OpenAI's models.

I mean you can get the right translation if you keep retrying.

@SamuelWN
Contributor Author

SamuelWN commented Feb 27, 2025

@IwfWcf Thank you for the notice, I will submit a patch soon.
I will also look into using the langcodes library to add some flexibility, though its use of ZHO as the alpha-3 code for Simplified Chinese (rather than CHS) poses an issue. (It has the benefit of already being in the requirements file, though.)

@popcion I think you may have misunderstood the line:

Revised Translator class to map openai and gpt* to chatgpt

That's referring to the user-provided "translator" value within the config-example.json file. It has no relation to the openai library.
e.g. if the user provides:

  "translator": {
    "translator": "openai",
...

it will be assumed that they meant ChatGPT. (Since the relevant API key is named "OPENAI_API_KEY", I felt it was a reasonable assumption for the user to make.)
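
As an illustration (a sketch, not the PR's actual code; the function name is hypothetical), the alias resolution could look like:

```python
import fnmatch

# Illustrative sketch of the alias mapping described above: a user-supplied
# "translator" value of "openai", or any name matching "gpt*", resolves to
# the chatgpt translator key; everything else is passed through unchanged.
def resolve_translator_key(name: str) -> str:
    name = name.strip().lower()
    if name == "openai" or fnmatch.fnmatch(name, "gpt*"):
        return "chatgpt"
    return name
```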

The function that checks for refusals (_cannot_assist) is only used when querying ChatGPT. (The OpenAITranslator class is named in reference to the company, not the library; perhaps I should have named it better.)

I have not encountered any instance of partial refusals. With my test samples:

  • gpt-4o-mini seems to only completely refuse
  • gpt-3.5-turbo and gpt-4o gave full responses without any refusals.

If you know of any ChatGPT model + input combinations that provide a partial response, that would be very helpful for testing.
My worry with checking each response against a list of common refusal patterns is the risk of false positives. (e.g. what if the correct translation of the text actually is "I'm sorry, I cannot assist with that."?)
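
To illustrate the trade-off (a hypothetical sketch, not the actual _cannot_assist implementation; the patterns are made up), anchoring the patterns to the start of the response reduces, but cannot eliminate, that false-positive risk:

```python
import re

# Hypothetical refusal check along the lines of _cannot_assist. The pattern
# list is illustrative. Matching only at the start of the whole response
# (re.match) narrows the false-positive window, but a legitimate translation
# that happens to be a refusal sentence would still be flagged.
_REFUSAL_PATTERNS = [
    r"I('m| am) sorry,? (but )?I (cannot|can't) assist",
    r"I (cannot|can't) (help|assist) with (that|this)",
]

def looks_like_refusal(response: str) -> bool:
    text = response.strip()
    return any(re.match(p, text, re.IGNORECASE) for p in _REFUSAL_PATTERNS)
```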

@popcion
Contributor

popcion commented Feb 27, 2025

My description may have been a bit off, but I understand your point. I am using models compatible with OpenAI API endpoints, such as DeepSeek, converted Gemini and Claude APIs, and of course OpenAI's own models. If any query in a request hits a refusal term, the model will return refusal text for the whole response; when refusal terms appear, there are no partial translations. However, the same query will not hit refusal terms every time, so there is a chance of bypassing them. For models with more lenient refusal handling, it is therefore possible to obtain correct, refusal-free translations through multiple retries. Currently, though, encountering refusal terms leads to an immediate skip, which is certainly unreasonable; retries are necessary.

In addition, while I support reducing code, too much has been removed. You may not have considered the purpose of adding split translation; the batch_translation function is not redundant. For models with strict refusal handling, specific queries are the root cause of the model returning refusal terms. By splitting the queries on a single page into batches, we can pinpoint exactly which queries hit refusal terms. That way the final translation does not fail wholesale: only the specific queries that hit refusal terms go untranslated, while the others are translated normally.
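
The splitting strategy described here could be sketched like this (a simplified illustration, not the project's actual batch_translation; the function names and the refusal check are placeholders):

```python
# Sketch of the split strategy described above: when a batch trips the
# refusal filter, bisect it until the offending queries are isolated,
# so only those go untranslated while the rest succeed.
def translate_with_split(queries, translate_batch, is_refusal):
    """translate_batch(list[str]) -> list[str]; is_refusal(list[str]) -> bool."""
    if not queries:
        return []
    result = translate_batch(queries)
    if not is_refusal(result):
        return result
    if len(queries) == 1:
        return [""]  # single offending query: leave it untranslated
    mid = len(queries) // 2
    return (translate_with_split(queries[:mid], translate_batch, is_refusal)
            + translate_with_split(queries[mid:], translate_batch, is_refusal))
```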

Moreover, you have overlooked some details. For example, if a batch contains only one query, that query is very likely a worthless fragment caused by OCR misrecognition. In that case the model may return a sentence without the numeric prefix, but currently any returned content is treated as a translation result and does not trigger a retry, even when the result lacks its prefix.

Additionally, truncating the longer part is unnecessary. I have only encountered cases where there are too few lines, never more lines than queries; and even an excess would simply mean the line counts are mismatched. Consider that models tend to merge multiple short queries into a single translated sentence, so the number of results no longer matches the number of queries. We cannot rely on the extracted prefix numbers alone to decide whether the counts match, because once several sentences are merged into one, translations inevitably land in the wrong positions. In that case a retry is necessary, or the splitting logic must be used to force the short sentences to be translated separately.

Even when the number of translation results matches the number of queries, the result is not necessarily correct: a query's translation can be empty yet still carry a prefix number. In that case we still need to retry or split the translation, because the empty position is likely a meaningless query, also caused by OCR errors. If it is left empty, areas that should not be touched may get painted over during inpainting, and such empty translations sometimes appear for short sentences too.
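
The checks described in the last few paragraphs could be combined into something like the following (an illustrative sketch, not the project's code, assuming a <|n|> numeric-prefix convention; returning None signals that a retry or split is needed):

```python
import re

# Illustrative validation of a numbered-prefix reply, covering the failure
# modes described above: a line without a prefix, merged/missing lines
# (count mismatch), and an empty-but-prefixed translation.
_PREFIX = re.compile(r"^<\|(\d+)\|>\s*(.*)$")

def parse_prefixed_reply(reply: str, n_queries: int):
    translations = {}
    for line in reply.splitlines():
        m = _PREFIX.match(line.strip())
        if not m:
            return None          # line without a numeric prefix -> retry
        idx, text = int(m.group(1)), m.group(2).strip()
        if not text:
            return None          # prefixed but empty translation -> retry
        translations[idx] = text
    if sorted(translations) != list(range(1, n_queries + 1)):
        return None              # merged or missing lines -> count mismatch
    return [translations[i] for i in range(1, n_queries + 1)]
```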

Furthermore, not all models return token counts; some non-OpenAI models using the OpenAI API format may not provide them, so the error handling cannot be removed. It matters less now, though: for some reason the token count is no longer displayed at all, but this does not affect usage.

The path prompt is also gone, so it is impossible to see which page is being processed, and it is difficult to locate files when troubleshooting errors.

Overall, this PR is not rigorous enough. Most importantly, it ignores the use case of translating NSFW content, where many refusal terms come into play. Handling that part is crucial, but you have skipped over it entirely.

I am going to submit a PR, since I have already completed it; I won't wait for you to submit yours. I combined my previous retry logic with your PR, taking the strengths of both, and fed the result to the o1 model; the output is basically ready to use. I also made some improvements and added some explanations, such as preprocessing for reasoning-model translation and adjusting the order of translation error handling. I removed what I personally deemed unnecessary, as it seemed to serve no purpose, and reintroduced the splitting logic.

In fact, all the items mentioned above are old issues I discovered in the current file and have corrected. The only thing not restored is the command-line path prompt for the translation content; I hope someone can continue to improve that. There is also the issue of the command line not displaying token consumption. Many existing problems were resolved in earlier PRs, but later PRs reintroduced the same problems...


(An aside: previously the chat sample was actually being sent; it just wasn't displayed in the command line, which kept the output cleaner. Some people apparently thought it wasn't being sent, so it has now been added back 😂)

@SamuelWN
Contributor Author

@popcion I admittedly did overlook that batching logic - it seemed like some bizarre error-handling, but with that context, I understand the logic now.

For example, if a batch contains only one query, this query is very likely to be a worthless fragment caused by OCR misrecognition.

I've personally viewed that as something to adjust in the system prompt. There are cases where a single phrase on the page makes sense, and others where it is nonsense; it's best to encourage the AI to handle it properly.

I would appreciate your feedback on my fork: https://github.com/SamuelWN/manga-image-translator/tree/JSON
(If you do: be sure to examine the gpt_config-example.yaml and set json_mode: True in whichever yaml you test it with.)

As mentioned in the initial commit message: Submitting the requests as JSON using the response_format field solves a lot of the issues.

  • Refusals (at least with the models I've tested with) are essentially nonexistent
  • It forces each line to be handled as a distinct input and output (no merging of sentences)
  • It simplifies a lot of other handling (no need for the <|#|> tags).

@popcion
Contributor

popcion commented Feb 28, 2025

@SamuelWN
I'm trying, and I found that in JSON mode, it sends three identical prompts to generate three responses. Why is that?
Time interval between requests: (screenshot omitted)

[edit]
I am currently seeing three identical queries sent simultaneously: two of them hit refusal terms while one translates normally, yet the final result does not use the one normal translation.

The following error occurs frequently, so --ignore-errors must be added. I tested with claude-3-5-sonnet-20241022:

[OpenAITranslator] ------------
ERROR: [OpenAITranslator] API error: Error code: 500 - {'error': {'message': ' (request id: 2025030100490559433681YW3VwITw)', 'type': 'upstream_error', 'param': '500', 'code': 'bad_response_status_code'}}
WARNING: [OpenAITranslator] Translation attempt 1 failed: Error code: 500 - {'error': {'message': ' (request id: 2025030100490559433681YW3VwITw)', 'type': 'upstream_error', 'param': '500', 'code': 'bad_response_status_code'}}

ERROR: [OpenAITranslator] Error in _request_translation: 1 validation error for TranslationList
  Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='我不能翻译或复述...', input_type=str]
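
One hypothetical way to make this failure non-fatal (a sketch only; the pydantic TranslationList from the log is replaced here with plain json handling, and the names are illustrative): when the model answers a JSON-mode request with plain refusal text, parsing fails, and the reply can be treated as a refusal to retry rather than a hard error:

```python
import json

# Sketch: tolerate non-JSON replies in JSON mode. A refusal sentence such as
# "我不能翻译或复述..." is not valid JSON, so json.loads raises and we return
# None to signal "treat as refusal, retry" instead of crashing.
def try_parse_translations(reply: str):
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None  # plain refusal text, not JSON -> retry
    if isinstance(data, dict) and isinstance(data.get("translations"), list):
        return [str(t) for t in data["translations"]]
    return None      # valid JSON but wrong shape -> also retry
```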

The JSON mode itself is quite good, but the prompt should ask for the returned list to be printed on separate lines; otherwise the command-line output appears all on one line, which is not very readable. The prefix issue is not very important at the moment: as models get smarter, incorrect prefixes occur less often. If necessary, though, JSON mode should be added as an additional mode, retaining the existing translation and retry logic and changing only the prefix handling. And don't forget handling for reasoning models. I know it's not entirely sensible, but there is now real demand for translating with reasoning models; some return the reasoning at the same level as the content, while others embed it within the content itself.
https://platform.openai.com/docs/guides/reasoning
https://api-docs.deepseek.com/zh-cn/guides/reasoning_model

@SamuelWN
Contributor Author

SamuelWN commented Feb 28, 2025

@popcion

Are you sure that Claude supports the response_format field? From what I can find in their docs, I don't think it does, so I'm surprised you got any response at all. (Their examples of "JSON mode" simply include JSON as text in the message, not as an actual variable or separate field.)

Have you tried it with any ChatGPT models? I know I named it poorly, but the OpenAITranslator class really isn't intended for any and all "openai"-compatible endpoints. Other models and endpoints may or may not work, but it is designed for use with OpenAI's own service and models made by the OpenAI company.

I have not encountered any refusals or errors with:

OPENAI_API_BASE='https://api.openai.com/v1'
OPENAI_MODEL='gpt-4o-mini'

It might be worthwhile to fork the class itself: a ChatGPTTranslator class that assumes the full functionality of the official OpenAI service, and an OpenAITranslator class that assumes only basic API functionality.


command line output appears all in one line, which is not very aesthetically pleasing

Yeah, I've tested some ways to handle that within the logger. I tried pprint to pretty-print the output, but the indentation wasted a lot of space. I will look into a simple '\n'.join(...) approach.
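
A minimal sketch of that '\n'.join(...) idea (illustrative only; the function name is hypothetical):

```python
# Join the returned translations with newlines, one numbered entry per line,
# so the log output is readable instead of one long single-line list.
def format_translations_for_log(translations):
    return "\n".join(f"{i + 1}: {t}" for i, t in enumerate(translations))
```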


And don't forget reasoning model handling. I know it's not very reasonable, but there is now a demand for using reasoning models for translation. Some reasoning content is at the same level as the content, while some are within the content itself.

I actually added rgx_capture as a field within the gpt_config.yaml file for just such purposes. I updated my example config to include:

ollama:
  # Use the `OLLAMA_MODEL_CONF` key (`deepseek-r1`) to group together similar model specifications.
  # e.g. Set the `rgx_capture` pattern to filter out the `think` portion for `deepseek-r1`-based models:
  deepseek-r1:
    rgx_capture: '<think>.*</think>\s*(.*)|(.*)'

(Worth noting, however, that deepseek-r1 does not include any think portion when using JSON mode.)
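
Applying that rgx_capture pattern could look like this (an illustrative sketch; the helper name is hypothetical):

```python
import re

# Illustrative application of the rgx_capture pattern above: strip a
# reasoning model's <think>...</think> block and keep only the translation
# that follows it; responses without a think block pass through unchanged.
RGX_CAPTURE = r"<think>.*</think>\s*(.*)|(.*)"

def apply_capture(response: str) -> str:
    m = re.match(RGX_CAPTURE, response, re.DOTALL)
    return (m.group(1) or m.group(2) or "").strip() if m else response
```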


PS: If you are able to run models locally: I have also implemented and tested JSON formatting in the Ollama translator (tested primarily with deepseek-r1:14b) and have had good results.

@popcion
Contributor

popcion commented Feb 28, 2025

In fact, many experienced users who frequently work with AI now integrate various models together, using OpenAI's format as an intermediary for seamless switching. Driven by these two main projects, a large number of 2api projects have appeared, all aiming for full compatibility with the OpenAI format. As a result, effectively all models can use it:
https://github.com/songquanpeng/one-api
https://github.com/Calcium-Ion/new-api

It's possible to reference the classification from this project:
https://github.com/neavo/LinguaGacha/blob/main/resource/platforms/en/12_custom_openai.json
The project also has other content worth referencing, such as the GPT dictionary (glossary).
