[Bug]: GPT missed some translations after commit 89443fc #805

Open
joyeli opened this issue Dec 30, 2024 · 9 comments · May be fixed by #807
Labels
bug Something isn't working

Comments


joyeli commented Dec 30, 2024

Issue

After commit 89443fc, GPT missed some translations in some cases, as shown in this example:

Source image (Eiyuu Kikan: Chapter13-004):
[Image: Eiyuu Kikan_Chapter13_004]

commit 89443fc:
[Image: result at 89443fc]

commit 027c966 (before 89443fc):
[Image: result at 027c966]

Command Line Arguments

python3 -m manga_translator local -i ~/KissLove/ -o ~/Translated/ -f png --config-file=config.json

config.json:
{
  "filter_text": null,
  "render": {
    "renderer": "default",
    "alignment": "auto",
    "disable_font_border": false,
    "font_size_offset": 0,
    "font_size_minimum": -1,
    "direction": "auto",
    "uppercase": false,
    "lowercase": false,
    "gimp_font": "Sans-serif",
    "no_hyphenation": false,
    "font_color": null,
    "line_spacing": null,
    "font_size": null
  },
  "upscale": {
    "upscaler": "esrgan",
    "revert_upscaling": false,
    "upscale_ratio": null
  },
  "translator": {
    "translator": "gpt4",
    "target_lang": "CHT",
    "no_text_lang_skip": false,
    "skip_lang": null,
    "gpt_config": null,
    "translator_chain": null,
    "selective_translation": null
  },
  "detector": {
    "detector": "default",
    "detection_size": 1536,
    "text_threshold": 0.5,
    "det_rotate": false,
    "det_auto_rotate": false,
    "det_invert": false,
    "det_gamma_correct": false,
    "box_threshold": 0.7,
    "unclip_ratio": 2.3
  },
  "colorizer": {
    "colorization_size": 576,
    "denoise_sigma": 30,
    "colorizer": "none"
  },
  "inpainter": {
    "inpainter": "lama_large",
    "inpainting_size": 2048,
    "inpainting_precision": "fp32"
  },
  "ocr": {
    "use_mocr_merge": false,
    "ocr": "48px",
    "min_text_length": 0,
    "ignore_bubble": 0
  },
  "kernel_size": 3,
  "mask_dilation_offset": 0
}

Console logs

commit 89443fc:

[local] Loading models
[local] Running text detection
[DefaultDetector] Detection resolution: 1280x1536
[local] Running ocr
[Model48pxOCR] prob: 0.9998202323913574 いた場合 fg: (0, 2, 0) bg: (0, 2, 0)
[Model48pxOCR] prob: 0.9999129772186279 治安を乱す fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9997376799583435 反動分子が fg: (1, 1, 1) bg: (1, 1, 1)
[Model48pxOCR] prob: 0.992860734462738 …先生は今 fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9978466629981995 いるはず… fg: (2, 2, 1) bg: (2, 2, 1)
[Model48pxOCR] prob: 0.3897098898887634 ゙特務監察部゙ fg: (5, 3, 2) bg: (5, 3, 2)
[Model48pxOCR] prob: 0.9999042749404907 集団である fg: (2, 3, 1) bg: (2, 3, 1)
[Model48pxOCR] prob: 0.9901242852210999 市民の幸福が fg: (1, 2, 0) bg: (1, 2, 0)
[Model48pxOCR] prob: 0.9995609521865845 査察する組織 fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.996740996837616 これを武力制圧 fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.829910159111023 政府内を監察・ fg: (0, 0, 1) bg: (0, 0, 1)
[Model48pxOCR] prob: 0.9918718338012695 する力を持つ… fg: (1, 2, 0) bg: (1, 2, 0)
[Model48pxOCR] prob: 0.9948089718818665 幸福省自己実現局 fg: (1, 1, 1) bg: (1, 1, 1)
[Model48pxOCR] prob: 0.9963937997817993 自己実現局内でも fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9996878504753113 守られているか fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.999619722366333 更に特権的立場の fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9997566342353821 特別な機鎧を操る fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9871354699134827 実力部隊でもあり fg: (0, 1, 0) bg: (0, 1, 0)
[local] No pre-translation replacements made.
[local] Running text translation
[GPT4Translator] Translating into Chinese (Traditional)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 1)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 2)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 3)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 4)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 5)
[GPT4Translator] Used 485 tokens (Total: 13720)
WARNING: [GPT4Translator] Repeating because of invalid translation. Attempt: 2
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 1)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 2)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 3)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 4)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 5)
[GPT4Translator] Used 482 tokens (Total: 16174)
WARNING: [GPT4Translator] Repeating because of invalid translation. Attempt: 3
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 1)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 2)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 3)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 4)
WARNING: [GPT4Translator] Incomplete response, retrying... (Attempt 5)
[GPT4Translator] Used 490 tokens (Total: 18636)
[GPT4Translator] 0: …先生は今 =>
[GPT4Translator] 1: いるはず… =>
[GPT4Translator] 2: 幸福省自己実現局゙特務監察部゙ =>
[GPT4Translator] 3: 市民の幸福が守られているか政府内を監察・査察する組織 =>
[GPT4Translator] 4: 自己実現局内でも更に特権的立場の集団である =>
[GPT4Translator] 5: 特別な機鎧を操る実力部隊でもあり =>
[GPT4Translator] 6: 治安を乱す反動分子がいた場合 =>
[GPT4Translator] 7: これを武力制圧する力を持つ… =>
[local] No post-translation replacements made.
[local] Filtered out:
[local] Reason: Translation does not contain target language characters
[local] Running mask refinement
[mask]: 100%|███████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 383.51it/s]
[LamaLargeInpainter] Inpainting resolution: 824x1200
[local] Running rendering
[render]: 0it [00:00, ?it/s]

commit 027c966 (before 89443fc):

[local] Loading models
[local] Running text detection
[DefaultDetector] Detection resolution: 1280x1536
[local] Running ocr
[Model48pxOCR] prob: 0.9998202323913574 いた場合 fg: (0, 2, 0) bg: (0, 2, 0)
[Model48pxOCR] prob: 0.9999129772186279 治安を乱す fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9997376799583435 反動分子が fg: (1, 1, 1) bg: (1, 1, 1)
[Model48pxOCR] prob: 0.992860734462738 …先生は今 fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9978466629981995 いるはず… fg: (2, 2, 1) bg: (2, 2, 1)
[Model48pxOCR] prob: 0.3897098898887634 ゙特務監察部゙ fg: (5, 3, 2) bg: (5, 3, 2)
[Model48pxOCR] prob: 0.9999042749404907 集団である fg: (2, 3, 1) bg: (2, 3, 1)
[Model48pxOCR] prob: 0.9901242852210999 市民の幸福が fg: (1, 2, 0) bg: (1, 2, 0)
[Model48pxOCR] prob: 0.9995609521865845 査察する組織 fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.996740996837616 これを武力制圧 fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.829910159111023 政府内を監察・ fg: (0, 0, 1) bg: (0, 0, 1)
[Model48pxOCR] prob: 0.9918718338012695 する力を持つ… fg: (1, 2, 0) bg: (1, 2, 0)
[Model48pxOCR] prob: 0.9948089718818665 幸福省自己実現局 fg: (1, 1, 1) bg: (1, 1, 1)
[Model48pxOCR] prob: 0.9963937997817993 自己実現局内でも fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9996878504753113 守られているか fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.999619722366333 更に特権的立場の fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9997566342353821 特別な機鎧を操る fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9871354699134827 実力部隊でもあり fg: (0, 1, 0) bg: (0, 1, 0)
[local] No pre-translation replacements made.
[local] Running text translation
[GPT4Translator] Translating into Chinese (Traditional)
[GPT4Translator] Used 598 tokens (Total: 598)
[GPT4Translator] 0: …先生は今 => …老師現在
[GPT4Translator] 1: いるはず… => 應該在…
[GPT4Translator] 2: 幸福省自己実現局゙特務監察部゙ => 幸福省自我實現局「特務監察部」
[GPT4Translator] 3: 市民の幸福が守られているか政府内を監察・査察する組織 => 監察和檢查政府內市民的幸福是否受到保護的組織
[GPT4Translator] 4: 自己実現局内でも更に特権的立場の集団である => 在自我實現局內也是一個擁有特權地位的群體
[GPT4Translator] 5: 特別な機鎧を操る実力部隊でもあり => 也是操縱特殊機甲的實力部隊
[GPT4Translator] 6: 治安を乱す反動分子がいた場合 => 當有擾亂治安的反動分子出現時
[GPT4Translator] 7: これを武力制圧する力を持つ… => 擁有以武力鎮壓的能力…
[local] No post-translation replacements made.
[local] Running mask refinement
[mask]: 100%|███████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 339.43it/s]
[LamaLargeInpainter] Inpainting resolution: 824x1200
[local] Running rendering
[render]: 100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 73.66it/s]
joyeli added the bug label Dec 30, 2024
Contributor

popcion commented Dec 30, 2024

Model issue: 4o-mini is too dumb. Change the model on line 360 of chatgpt.py from 4o-mini to 4o. I forgot to change it when I copied the code over; the old code used 4o for gpt4. Also, I don't recommend removing the logger, otherwise you can't see what is going wrong. In your case, 10 sentences were sent but fewer than 10 came back, because 4o-mini returned sentences that had been split across bubbles as a single sentence. Many less capable models have this problem. The fix is to add extra system prompts to guide the model (though very weak models won't follow them) or to switch models. So this situation could also occur before this PR.
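The mismatch described above can be detected mechanically. The logs in this thread show each source line sent with a `<|n|>` prefix; the following is a minimal sketch, not the project's actual code, of building such a batch and mapping the reply back by prefix number. The names `build_batch` and `parse_response` are hypothetical.

```python
from __future__ import annotations

import re

# A "<|n|>" prefix at the start of a line, as in the prompts logged in this thread.
PREFIX = re.compile(r"^<\|(\d+)\|>(.*)$")


def build_batch(lines: list[str]) -> str:
    """Number each source line with a 1-based <|n|> prefix, one per line."""
    return "\n".join(f"<|{i}|>{text}" for i, text in enumerate(lines, start=1))


def parse_response(response: str, expected: int) -> list[str] | None:
    """Map the reply back to source lines by prefix number.

    Returns None when the set of indices is not exactly 1..expected, which is
    what happens when the model merges two source lines into one reply line.
    """
    found: dict[int, str] = {}
    for raw in response.splitlines():
        m = PREFIX.match(raw.strip())
        if m:
            found[int(m.group(1))] = m.group(2).strip()
    if sorted(found) != list(range(1, expected + 1)):
        return None
    return [found[i] for i in range(1, expected + 1)]


sent = ["…先生は今", "いるはず…", "幸福省自己実現局"]
print(build_batch(sent))
# The model merged lines 1 and 2, so only two numbered lines come back.
reply = "<|1|>…先生現在應該在…\n<|2|>幸福省自己實現局"
print(parse_response(reply, len(sent)))  # None
```

Matching on the prefix number rather than on line order is what lets a merged reply be caught: the missing index shows up no matter which line the model dropped.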

Author

joyeli commented Dec 30, 2024

Model issue: 4o-mini is too dumb....

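I don't think that's the whole story. Using GPT-4o-mini on commit 027c966 does not fail.
The output is of course not as good as GPT-4, but it doesn't fail to translate the way GPT-4o-mini does on commit 89443fc.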

commit 027c966 with GPT-4o-mini:
[Image: Eiyuu Kikan_Chapter13_004]

console log:

[local] Namespace(verbose=True, attempts=0, ignore_errors=False, model_dir=None, use_gpu=False, use_gpu_limited=False, font_path='', pre_dict=None, post_dict=None, kernel_size=3, mode='local', input=['/home/joyel/tmp/manga/'], dest='', format=None, overwrite=False, skip_no_text=False, use_mtpe=False, save_text=False, save_text_file='', prep_manual=False, save_quality=100, config_file='config.json')
[local] Running in local mode
[local] Loading models
[local] Running text detection
[DefaultDetector] Detection resolution: 1280x1536
[local] Running ocr
[Model48pxOCR] prob: 0.9998202323913574 いた場合 fg: (0, 2, 0) bg: (0, 2, 0)
[Model48pxOCR] prob: 0.9999129772186279 治安を乱す fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9997376799583435 反動分子が fg: (1, 1, 1) bg: (1, 1, 1)
[Model48pxOCR] prob: 0.992860734462738 …先生は今 fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9978466629981995 いるはず… fg: (2, 2, 1) bg: (2, 2, 1)
[Model48pxOCR] prob: 0.3897098898887634 ゙特務監察部゙ fg: (5, 3, 2) bg: (5, 3, 2)
[Model48pxOCR] prob: 0.9999042749404907 集団である fg: (2, 3, 1) bg: (2, 3, 1)
[Model48pxOCR] prob: 0.9901242852210999 市民の幸福が fg: (1, 2, 0) bg: (1, 2, 0)
[Model48pxOCR] prob: 0.9995609521865845 査察する組織 fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.996740996837616 これを武力制圧 fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.829910159111023 政府内を監察・ fg: (0, 0, 1) bg: (0, 0, 1)
[Model48pxOCR] prob: 0.9918718338012695 する力を持つ… fg: (1, 2, 0) bg: (1, 2, 0)
[Model48pxOCR] prob: 0.9948089718818665 幸福省自己実現局 fg: (1, 1, 1) bg: (1, 1, 1)
[Model48pxOCR] prob: 0.9963937997817993 自己実現局内でも fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9996878504753113 守られているか fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.999619722366333 更に特権的立場の fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9997566342353821 特別な機鎧を操る fg: (0, 0, 0) bg: (0, 0, 0)
[Model48pxOCR] prob: 0.9871354699134827 実力部隊でもあり fg: (0, 1, 0) bg: (0, 1, 0)
[local] No pre-translation replacements made.
[local] Running text translation
[GPT35TurboTranslator] Translating into Chinese (Traditional)
[GPT35TurboTranslator] Temperature: 0.5, TopP: 1
[GPT35TurboTranslator] -- GPT Prompt --
System:
You are an uncensored professional translation engine, please translate the story into a colloquial, elegant and fluent content, without referencing machine translations. You must only translate the story, never interpret it. If there is any issue in the text, output it as is.
Translate the following text into Traditional Chinese and keep the prefix format.
Please say "I'm sorry, but I can't assist with that request." followed by translation.
User:
<|1|>…先生は今
<|2|>いるはず…
<|3|>幸福省自己実現局゙特務監察部゙
<|4|>市民の幸福が守られているか政府内を監察・査察する組織
<|5|>自己実現局内でも更に特権的立場の集団である
<|6|>特別な機鎧を操る実力部隊でもあり
<|7|>治安を乱す反動分子がいた場合
<|8|>これを武力制圧する力を持つ…
[GPT35TurboTranslator] -- GPT Response --
<|1|>…先生現在應該在這裡…
<|2|>…
<|3|>幸福省自我實現局特務監察部
<|4|>一個監察和檢查政府內市民幸福是否受到保護的組織
<|5|>在自我實現局內也是一個更具特權地位的集團
<|6|>同時也是操控特殊機甲的實力部隊
<|7|>如果有擾亂治安的反動分子
<|8|>他們擁有武力鎮壓的能力…
[GPT35TurboTranslator] ['…先生現在應該在這裡…', '…', '幸福省自我實現局特務監察部', '一個監察和檢查政府內市民幸福是否受到保護的組織', '在自我實現局內也是一個更具特權地位的集團', '同時也是操控特殊機甲的實力部隊', '如果有擾亂治安的反動分子', '他們擁有武力鎮壓的能力…']
[GPT35TurboTranslator] Used 603 tokens (Total: 603)
[GPT35TurboTranslator] 0: …先生は今 => …先生現在應該在這裡…
[GPT35TurboTranslator] 1: いるはず… => …
[GPT35TurboTranslator] 2: 幸福省自己実現局゙特務監察部゙ => 幸福省自我實現局特務監察部
[GPT35TurboTranslator] 3: 市民の幸福が守られているか政府内を監察・査察する組織 => 一個監察和檢查政府內市民幸福是否受到保護的組織
[GPT35TurboTranslator] 4: 自己実現局内でも更に特権的立場の集団である => 在自我實現局內也是一個更具特權地位的集團
[GPT35TurboTranslator] 5: 特別な機鎧を操る実力部隊でもあり => 同時也是操控特殊機甲的實力部隊
[GPT35TurboTranslator] 6: 治安を乱す反動分子がいた場合 => 如果有擾亂治安的反動分子
[GPT35TurboTranslator] 7: これを武力制圧する力を持つ… => 他們擁有武力鎮壓的能力…
[local] No post-translation replacements made.
[local] Filtered out: …
[local] Reason: Translation does not contain target language characters
[local] Running mask refinement
[mask]: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 336.38it/s]
[LamaLargeInpainter] Inpainting resolution: 824x1200
[local] Running rendering
[render] font_size_minimum 10
[render]: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 63.21it/s]

Contributor

popcion commented Dec 30, 2024

Actually, the real problem is that the other commit you mentioned comes after this PR.... The chatgpt file has no changes in it, and that commit is unrelated to this issue, so you must have changed something that differs from the repository.

Your 4o-mini filled the slot left by the merged sentence with an ellipsis (<|2|>…), which may just be a coincidence. I tried five or six times, i.e. 5×15 requests, and didn't succeed even once. Retrying 15 times here is excessive: the _INVALID_REPEAT_COUNT parameter no longer seems useful and only multiplies the number of retries, so I'll set it to 0 in the next PR. Are you sure you used 4o-mini in both versions and didn't change the model yourself? Are your prompts identical in both? Are you using the same API? If anything differs, make them the same and retry.
EDIT:
Okay, I just tried again and it succeeded on the first attempt. It's still a matter of luck.
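To make the 15 concrete: the log at the top of the issue shows five "Incomplete response" retries inside each of three "Repeating because of invalid translation" passes. Assuming the two loops simply nest (a simplified model, not the project's code; `requests_per_batch`, `inner_retries`, and `extra_passes` are hypothetical names, and how `_INVALID_REPEAT_COUNT` maps onto `extra_passes` is an assumption), the worst-case number of requests per batch is their product:

```python
def requests_per_batch(inner_retries: int, extra_passes: int) -> int:
    """Worst-case API calls for one batch when every response is incomplete.

    Simplified model (assumption): each pass makes `inner_retries` calls,
    and the outer loop runs once plus `extra_passes` more times.
    """
    passes = 1 + extra_passes
    return inner_retries * passes


# Three passes of five retries, as in the log above: 15 calls per batch.
print(requests_per_batch(inner_retries=5, extra_passes=2))  # 15
# With the outer repeat disabled, the same batch costs 5 calls.
print(requests_per_batch(inner_retries=5, extra_passes=0))  # 5
```

Under this model, five or six manual runs at 15 calls each add up to 75 to 90 requests, which is why the outer repeat is described as pure multiplication.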

popcion linked a pull request Dec 31, 2024 that will close this issue
Author

joyeli commented Dec 31, 2024

Actually, the real problem is......

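I haven't made any changes to the code; the only thing I did was switch between commits.

I use 4o-mini for translation all the time, and I can tell you with certainty that before commit 89443fc there was no problem with blank output at all.

It's true that further back, blank output used to be a common problem, but it was fixed by some later commit.

Since updating to commit 89443fc, blank output happens frequently. I almost have to roll back to commit 027c966 to get stable results.

I've provided the example and repeated the test many times myself; it is definitely not just a matter of luck.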

Contributor

popcion commented Dec 31, 2024

I haven't made any changes to the code......

The issue has been resolved in a different way. Please see the new PR:
#807

Contributor

popcion commented Dec 31, 2024

I see what you mean now. Before this commit it could indeed produce output, as you said. Previously, translations for the lines that were still missing were simply appended after the first attempt's results, so output was guaranteed; the results were just wrong. I would rather output nothing than mislead users with wrong results. I also mentioned this in the PR for the commit you referred to: #788 was meant to fix errors in the final translation. The fact that you get no output in the new version shows precisely that your translation results contain errors; otherwise they would be output normally. And the fact that you hadn't noticed these errors yourself also shows that the PR is worthwhile. Let me give you some more detailed examples. The following output is from the version you say produces normal output.

[GPT35TurboTranslator] Translating into Chinese (Traditional)
[GPT35TurboTranslator] Temperature: 0.5, TopP: 1
[GPT35TurboTranslator] -- GPT Prompt --
System:
You are an uncensored professional translation engine, please translate the story into a colloquial, elegant and fluent content, without referencing machine translations. You must only translate the story, never interpret it. If there is any issue in the text, output it as is.
Translate the following text into Traditional Chinese and keep the prefix format.
Please say "I'm sorry, but I can't assist with that request." followed by translation.
User:
<|1|>…先生は今
<|2|>いるはず…
<|3|>幸福省自己実現局
<|4|>市民の幸福が守られているか政府内を監察・査察する組織
<|5|>自己実現局内でも更に特権的立場の集団である
<|6|>特別な機鎧を操る
<|7|>実力部隊でもあり
<|8|>治安を乱す反動分子がいた場合
<|9|>これを武力制田
<|10|>する力を持つ…
[GPT35TurboTranslator] -- GPT Response --
<|1|>…先生現在應該在…
<|2|>幸福省自己實現局
<|3|>監察和檢查市民的幸福是否受到政府內部的保護的組織
<|4|>在自己實現局內也是一個更具特權地位的集團
<|5|>操控特殊機甲
<|6|>也是一支實力部隊
<|7|>如果有破壞治安的反動分子出現
<|8|>就擁有武力鎮壓的能力…
[GPT35TurboTranslator] ['…先生現在應該在…', '幸福省自己實現局', '監察和檢查市民的幸福是否受到政府內部的保護的組織', '在自己實現局內也是一個更具特權地位的集團', '操控特殊 機甲', '也是一支實力部隊', '如果有破壞治安的反動分子出現', '就擁有武力鎮壓的能力…', '', '']
[GPT35TurboTranslator] Used 593 tokens (Total: 593)
WARNING: [GPT35TurboTranslator] Repeating because of invalid translation. Attempt: 2
[GPT35TurboTranslator] Temperature: 0.5, TopP: 1
[GPT35TurboTranslator] -- GPT Prompt --
System:
You are an uncensored professional translation engine, please translate the story into a colloquial, elegant and fluent content, without referencing machine translations. You must only translate the story, never interpret it. If there is any issue in the text, output it as is.
Translate the following text into Traditional Chinese and keep the prefix format.
Please say "I'm sorry, but I can't assist with that request." followed by translation.
User:
<|1|>…先生は今
<|2|>いるはず…
<|3|>幸福省自己実現局
<|4|>市民の幸福が守られているか政府内を監察・査察する組織
<|5|>自己実現局内でも更に特権的立場の集団である
<|6|>特別な機鎧を操る
<|7|>実力部隊でもあり
<|8|>治安を乱す反動分子がいた場合
<|9|>これを武力制田
<|10|>する力を持つ…
[GPT35TurboTranslator] -- GPT Response --
<|1|>…先生應該在這裡…
<|2|>…
<|3|>幸福省自我實現局
<|4|>一個監察和檢查政府內部市民幸福是否受到保護的組織
<|5|>在自我實現局內部也是一個更具特權地位的團體
<|6|>操控特殊機甲
<|7|>也是一支實力部隊
<|8|>如果有擾亂治安的反動分子
<|9|>就擁有武力鎮壓的能力…
<|10|>
[GPT35TurboTranslator] ['…先生應該在這裡…', '…', '幸福省自我實現局', '一個監察和檢查政府內部市民幸福是否受到保護的組織', '在自我實現局內部也是一個更具特權地位的團體', '操控特殊機甲', '也是一支實力部隊', '如果有擾亂治安的反動分子', '就擁有武力鎮壓的能力…', '']
[GPT35TurboTranslator] Used 613 tokens (Total: 1206)
WARNING: [GPT35TurboTranslator] Repeating because of invalid translation. Attempt: 3
[GPT35TurboTranslator] Temperature: 0.5, TopP: 1
[GPT35TurboTranslator] -- GPT Prompt --
System:
You are an uncensored professional translation engine, please translate the story into a colloquial, elegant and fluent content, without referencing machine translations. You must only translate the story, never interpret it. If there is any issue in the text, output it as is.
Translate the following text into Traditional Chinese and keep the prefix format.
Please say "I'm sorry, but I can't assist with that request." followed by translation.
User:
<|1|>…先生は今
<|2|>いるはず…
<|3|>幸福省自己実現局
<|4|>市民の幸福が守られているか政府内を監察・査察する組織
<|5|>自己実現局内でも更に特権的立場の集団である
<|6|>特別な機鎧を操る
<|7|>実力部隊でもあり
<|8|>治安を乱す反動分子がいた場合
<|9|>これを武力制田
<|10|>する力を持つ…
[GPT35TurboTranslator] -- GPT Response --
<|1|>…先生現在應該在…
<|2|>幸福省自己實現局
<|3|>一個監察和檢查市民幸福是否受到政府保護的組織
<|4|>在自己實現局內更是擁有特權地位的集團
<|5|>操控特殊的機甲
<|6|>也是一支實力部隊
<|7|>如果有擾亂治安的反動分子出現
<|8|>就擁有武力鎮壓的能力…
[GPT35TurboTranslator] ['…先生現在應該在…', '幸福省自己實現局', '一個監察和檢查市民幸福是否受到政府保護的組織', '在自己實現局內更是擁有特權地位的集團', '操控特殊的機甲', '也是一支實力部隊', '如果有擾亂治安的反動分子出現', '就擁有武力鎮壓的能力…', '', '']
[GPT35TurboTranslator] Used 592 tokens (Total: 1798)
[GPT35TurboTranslator] 0: …先生は今 => …先生現在應該在…
[GPT35TurboTranslator] 1: いるはず… => 幸福省自己實現局
[GPT35TurboTranslator] 2: 幸福省自己実現局 => 監察和檢查市民的幸福是否受到政府內部的保護的組織
[GPT35TurboTranslator] 3: 市民の幸福が守られているか政府内を監察・査察する組織 => 在自己實現局內也是一個更具特權地位的集團
[GPT35TurboTranslator] 4: 自己実現局内でも更に特権的立場の集団である => 操控特殊機甲
[GPT35TurboTranslator] 5: 特別な機鎧を操る => 也是一支實力部隊
[GPT35TurboTranslator] 6: 実力部隊でもあり => 如果有破壞治安的反動分子出現
[GPT35TurboTranslator] 7: 治安を乱す反動分子がいた場合 => 就擁有武力鎮壓的能力…
[GPT35TurboTranslator] 8: これを武力制田 => 就擁有武力鎮壓的能力…
[GPT35TurboTranslator] 9: する力を持つ… =>
[local] No post-translation replacements made.
[local] Running mask refinement
[mask]: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 101.80it/s]
[local] Running rendering
[render] font_size_minimum 10
[render]: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 42.14it/s]

The original logic, when the response contained too few lines, for example:

User:
<|1|>…先生は今
<|2|>いるはず…
<|3|>幸福省自己実現局
<|4|>市民の幸福が守られているか政府内を監察・査察する組織
<|5|>自己実現局内でも更に特権的立場の集団である
<|6|>特別な機鎧を操る
<|7|>実力部隊でもあり
<|8|>治安を乱す反動分子がいた場合
<|9|>これを武力制田
<|10|>する力を持つ…
[GPT35TurboTranslator] -- GPT Response --
<|1|>…先生現在應該在…
<|2|>幸福省自己實現局
<|3|>監察和檢查市民的幸福是否受到政府內部的保護的組織
<|4|>在自己實現局內也是一個更具特權地位的集團
<|5|>操控特殊機甲
<|6|>也是一支實力部隊
<|7|>如果有破壞治安的反動分子出現
<|8|>就擁有武力鎮壓的能力…

kept all of the first attempt's translations: it first built a translation list, then filled the remaining positions with empty strings to be filled in later:
['...先生现在应该在...', '幸福省自己实现局', '监察和检查市民的幸福是否受到政府内部的保护的组织', '在自己实现局内也是一个更具特权地位的集团', '操控特殊机甲', '也是一支实力部队', '如果有破坏治安的反动分子出现', '就拥有武力镇压的能力...', '','']
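A minimal sketch of the padding step described above, reconstructed from this description only (the function name `pad_first_attempt` is hypothetical and the real code may differ): the first attempt's lines are kept in order, and the list is padded with empty strings up to the number of source lines.

```python
from __future__ import annotations


def pad_first_attempt(first: list[str], expected: int) -> list[str]:
    """Keep the first attempt's lines as-is and pad with '' up to `expected`.

    Nothing here checks which source line each translation belongs to, so a
    merged reply shifts every later line one slot toward the front.
    """
    return first + [""] * max(0, expected - len(first))


# The 8-line first response from the log above, for 10 source lines.
first_attempt = [
    "…先生現在應該在…", "幸福省自己實現局",
    "監察和檢查市民的幸福是否受到政府內部的保護的組織",
    "在自己實現局內也是一個更具特權地位的集團",
    "操控特殊機甲", "也是一支實力部隊",
    "如果有破壞治安的反動分子出現", "就擁有武力鎮壓的能力…",
]
padded = pad_first_attempt(first_attempt, expected=10)
print(len(padded), padded[-2:])  # 10 ['', '']
# padded[1] is "幸福省自己實現局", but source line 2 is "いるはず…".
```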

It looks as if everything was translated, but in fact most of these translations are misaligned with their speech bubbles. The text of the first two bubbles was merged into one sentence and placed in the first bubble, and the text of bubbles 9 and 10 was merged, translated, and placed in the 8th bubble, whose source text is "治安を乱す反動分子がいた場合". This goes beyond hurting the reading experience: I have run into many cases where one character's line comes out of another character's mouth. In short, the result is absurd, you can't even tell where the problem occurred, and if you don't know the source language at all, it is even harder to spot:

<|1|>…先生は今
<|2|>いるはず…
↓
<|1|>…先生現在應該在…

<|9|>これを武力制田
<|10|>する力を持つ…
↓
<|8|>就擁有武力鎮壓的能力…

Now let's compare the first translation result with the two intermediate results and the final result.
This is the first translation result:

[GPT35TurboTranslator] -- GPT Response --
<|1|>…先生現在應該在…
<|2|>幸福省自己實現局
<|3|>監察和檢查市民的幸福是否受到政府內部的保護的組織
<|4|>在自己實現局內也是一個更具特權地位的集團
<|5|>操控特殊機甲
<|6|>也是一支實力部隊
<|7|>如果有破壞治安的反動分子出現
<|8|>就擁有武力鎮壓的能力…

These are the two intermediate translation results:

[GPT35TurboTranslator] -- GPT Response --
<|1|>…先生現在應該在…
<|2|>幸福省自己實現局
<|3|>一個監察和檢查市民幸福是否受到政府保護的組織
<|4|>在自己實現局內更是擁有特權地位的集團
<|5|>操控特殊的機甲
<|6|>也是一支實力部隊
<|7|>如果有擾亂治安的反動分子出現
<|8|>就擁有武力鎮壓的能力…

[GPT35TurboTranslator] -- GPT Response --
<|1|>…先生應該在這裡…
<|2|>…
<|3|>幸福省自我實現局
<|4|>一個監察和檢查政府內部市民幸福是否受到保護的組織
<|5|>在自我實現局內部也是一個更具特權地位的團體
<|6|>操控特殊機甲
<|7|>也是一支實力部隊
<|8|>如果有擾亂治安的反動分子
<|9|>就擁有武力鎮壓的能力…
<|10|>

This is the final result:

[GPT35TurboTranslator] -- GPT Response --
<|1|>…先生現在應該在…
<|2|>幸福省自己實現局
<|3|>監察和檢查市民的幸福是否受到政府內部的保護的組織
<|4|>在自己實現局內也是一個更具特權地位的集團
<|5|>操控特殊機甲
<|6|>也是一支實力部隊
<|7|>如果有破壞治安的反動分子出現
<|8|>就擁有武力鎮壓的能力…
<|9|>就擁有武力鎮壓的能力…
<|10|>

Details of part of the final result:

[GPT35TurboTranslator] 6: 実力部隊でもあり => 如果有破壞治安的反動分子出現
[GPT35TurboTranslator] 7: 治安を乱す反動分子がいた場合 => 就擁有武力鎮壓的能力…
[GPT35TurboTranslator] 8: これを武力制田 => 就擁有武力鎮壓的能力…
[GPT35TurboTranslator] 9: する力を持つ… =>

As you can see, the first 8 lines of the final result all come from the first attempt, and the first 8 lines of every retry are wasted, because the final translation always keeps the entire first attempt. The 9th line was appended only because a retry happened to have a translation at that position. If the last retry's 9th line is empty, the final 9th line is empty, even if other retries translated lines 9 or 10. This is why translations used to end with repeated content so often: the last line of the last retry happened to match the last line of the first attempt, but sat one position after it.
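The effect can be reproduced with a toy model. The sketch below is a simplified model under stated assumptions, not the project's code (`merge_keep_first` is a hypothetical name): every line of the first attempt is frozen, and a later retry may only fill the slots that were padded with empty strings. Even a perfectly aligned retry cannot repair the earlier lines, and the tail ends up duplicated.

```python
from __future__ import annotations


def merge_keep_first(first: list[str], retry: list[str], expected: int) -> list[str]:
    """Simplified model of the old merge (assumption, not the real code).

    Slots covered by the first attempt are frozen; only the padded empty
    slots may take the retry's non-empty value at the same index.
    """
    merged = first + [""] * max(0, expected - len(first))
    for i in range(len(first), expected):
        if i < len(retry) and retry[i]:
            merged[i] = retry[i]
    return merged


# Four source lines; the first attempt merged lines 1 and 2 into one reply.
first = ["T1+T2", "T3", "T4"]
# The retry happens to be perfectly aligned, but only slot 3 can use it.
retry = ["T1", "T2", "T3", "T4"]
print(merge_keep_first(first, retry, expected=4))
# ['T1+T2', 'T3', 'T4', 'T4']
```

Slot 1 now shows the translation of source line 3, and the final slot repeats the previous one: the same duplicated ending described above.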

Previously, we would also encounter this kind of problem:

User:
<|1|>…先生は今
<|2|>いるはず…
<|3|>幸福省自己実現局
[GPT35TurboTranslator] -- GPT Response --
I'm sorry, I can't assist with that.

Repeating because of invalid translation. Attempt: 1
User:
<|1|>…先生は今
<|2|>いるはず…
<|3|>幸福省自己実現局
[GPT35TurboTranslator] -- GPT Response --
<|1|>…老师現在
<|2|>應該在的…
<|3|>幸福省自己實現局

Repeating because of invalid translation. Attempt: 2
User:
<|1|>…先生は今
<|2|>いるはず…
<|3|>幸福省自己実現局
[GPT35TurboTranslator] -- GPT Response --
<|1|>…先生現在
<|2|>應該在…
<|3|>幸福省自己實現局
[GPT35TurboTranslator] ['I'm sorry, I can't assist with that.', '應該在…', '幸福省自己實現局']

If the first result is always kept, then when the first attempt is blocked by content moderation and returns a refusal, the first position keeps that refusal even though a later retry translated it correctly. Meanwhile, the second position keeps "應該在…" rather than "應該在的…". Often the translation at a given position in an intermediate retry is better than the one from the last retry, so this approach not only wastes tokens but may also pick the less reliable translation. And since the log shows you every intermediate translation while the program only takes the last attempt's translation at each position, which may well be the worst of the retries, it is frustrating to watch. It is better to require the whole batch to pass in a single attempt and leave no room for patching.
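By contrast, the approach argued for here treats a batch as all-or-nothing. The following is a hedged sketch of that idea, not the code from #788 or #807: a reply is accepted only if it contains exactly the numbered lines 1..n, all non-empty; otherwise the whole batch is retried, and if no attempt passes, nothing is returned. `accept`, `translate_batch`, and the `send` callback are hypothetical.

```python
from __future__ import annotations

import re
from typing import Callable

PREFIX = re.compile(r"^<\|(\d+)\|>(.*)$")


def accept(response: str, expected: int) -> list[str] | None:
    """Accept a reply only if it has exactly lines 1..expected, all non-empty."""
    found: dict[int, str] = {}
    for raw in response.splitlines():
        m = PREFIX.match(raw.strip())
        if m:
            found[int(m.group(1))] = m.group(2).strip()
    lines = [found.get(i, "") for i in range(1, expected + 1)]
    if len(found) != expected or not all(lines):
        return None
    return lines


def translate_batch(send: Callable[[], str], expected: int, attempts: int) -> list[str] | None:
    """Retry the whole batch; lines from different attempts are never mixed."""
    for _ in range(attempts):
        result = accept(send(), expected)
        if result is not None:
            return result
    return None  # Output nothing rather than output misaligned lines.


replies = iter([
    "<|1|>…先生現在應該在…\n<|2|>幸福省自己實現局",  # merged reply: index 3 missing
    "<|1|>…老师現在\n<|2|>應該在的…\n<|3|>幸福省自己實現局",  # complete reply
])
print(translate_batch(lambda: next(replies), expected=3, attempts=3))
# ['…老师現在', '應該在的…', '幸福省自己實現局']
```

Because each attempt is judged as a unit, a good intermediate attempt is used whole instead of being overwritten line by line, and a refusal can never survive into the output.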

Author

joyeli commented Dec 31, 2024

I see what you mean now......

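I understand what you're saying. I'll wait for #807 to be merged, then pull it and try again.

As you said, there are mistranslations and misaligned bubbles.
I've noticed them too, but it's machine translation after all, so some imperfection is acceptable.
The old behavior wasn't perfect, but a human reader
can still tell what the page is about from 80% of the content.
Even when the machine translation confuses genders and jumbles "you", "me", and "him",
more than 60% of each panel is still understandable.

But in the pursuit of perfection, nothing is output at all, not even the flawed text.
A completely blank page.... how are you supposed to read that? How are you supposed to fill in the gaps?
It's not that I don't understand your reasoning;
it's just that in situations like this there is usually a fault-tolerance option
that lets users decide for themselves whether to output possibly incorrect content.

That's fine, though.
I'll wait for #807 and see how it turns out.
Thank you for your help.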
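The fault-tolerance option suggested here could look something like the sketch below. It is purely hypothetical: the `allow_partial` flag and the `finalize` function are illustrations, not an existing setting in the project. Once every attempt has failed validation, the flag decides between leaving the bubbles empty and using the last attempt's lines in order, with a logged warning that they may be misaligned.

```python
from __future__ import annotations

import logging


def finalize(last_lines: list[str], expected: int, allow_partial: bool) -> list[str]:
    """Choose what to render once every attempt has failed validation.

    Hypothetical option, not an existing project setting:
    - allow_partial=False: strict, every bubble is left empty.
    - allow_partial=True: use the last attempt's lines in order, truncated or
      padded to `expected`, accepting that they may be misaligned.
    """
    if not allow_partial:
        return [""] * expected
    logging.warning("Using unvalidated translations; bubbles may be misaligned.")
    trimmed = last_lines[:expected]
    return trimmed + [""] * (expected - len(trimmed))


last = ["…先生現在應該在…", "幸福省自己實現局"]  # two replies for three bubbles
print(finalize(last, expected=3, allow_partial=False))  # ['', '', '']
print(finalize(last, expected=3, allow_partial=True))
# ['…先生現在應該在…', '幸福省自己實現局', '']
```

Keeping strict mode as the default and making the partial output opt-in would preserve the misalignment protection discussed above while still giving readers something on the page.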

Contributor

popcion commented Dec 31, 2024

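But in the pursuit of perfection, nothing is output at all, not even the flawed text. A completely blank page.... how are you supposed to read that? How are you supposed to fill in the gaps? It's not that I don't understand your reasoning; it's just that in situations like this there is usually a fault-tolerance option that lets users decide for themselves whether to output possibly incorrect content.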

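Deliberately outputting wrong content doesn't seem reasonable. Before that commit I had been plagued by the misalignment problem for a long time. I was just fixing a problem most users may run into, or rather one they don't realize they have: they assume the translation is unreadable because machine translation is bad, when it is actually a bug in the code logic. I didn't anticipate blank pages because I don't use 4o-mini. In fact, 4o-mini's misalignment problem is very severe; for me, content moderation used to block everything outright, so misalignment wasn't even my main problem. I recommend DeepSeek: smart and cheap.
You can check the leaderboards here. Even 4o is no longer competitive, since it has been noticeably dumbed down recently, let alone 4o-mini. Personally, I don't think 4o-mini is well suited to Japanese-to-Chinese translation; the results aren't very good.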
https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leaderboard
https://lmarena.ai/?leaderboard
https://openlm.ai/chatbot-arena/

Author

joyeli commented Jan 1, 2025

I recommend DeepSeek: smart and cheap...

I've actually been thinking about switching to DeepSeek recently too, so I'll take the chance to try it. Thanks.


2 participants