add split translation #807

Open: popcion wants to merge 14 commits into main


popcion (Contributor) commented Dec 31, 2024

fix #805
When the original logic fails and keeps failing after the maximum number of retries, fall back to split translation; otherwise the result is still returned by the original logic.
Split translation has the following benefits:

  • Works around translator censorship and restrictions: if one specific term cannot get past the filter no matter what, it no longer affects the translation of the other speech bubbles;
  • Fixes mismatched translation counts: previously, if the translator still returned the wrong number of translations after the maximum retries, the final result was a blank page or empty boxes;
  • Previously, when a sentence or a whole page came back empty, there was no log warning, only empty output, which made errors hard to track down. A log message now indicates which speech bubble's text_region failed to translate;
  • With a low retry count, splitting kicks in quickly, which in turn consistently improves translation speed;
  • (Removed as too aggressive.) Punctuation and numbers could no longer be returned as a sentence of their own and misalign the translation results with the speech bubbles, as in the example below with an extra ellipsis. This assumed the translator was using punctuation to fill empty slots and ignored the case where punctuation is genuinely detected as a standalone bubble. It forced every position to keep a translation, although some positions may need their translation dropped.

The downside is slightly higher token usage (already reduced as far as possible), though it clearly costs fewer tokens than piling on retries. In the example below, splitting from 10 sentences down to 1 needs at most 7 requests to reach the final result, whereas the original logic could retry 7 times, still fail, and return a blank result.
Because the split parts no longer share context with each other, translation quality drops slightly.
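
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `translate_with_split`, `translate_batch`, and `flaky` are hypothetical names, and halving the batch at `len // 2` is an assumption chosen to match the 10 → 5+5 → 2+3 → 1+1 pattern in the log below.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("split_translation_sketch")


def translate_with_split(
    queries: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    max_retries: int = 3,
) -> List[str]:
    """Translate a batch; after max_retries failures, halve it and recurse."""
    for _ in range(max_retries):
        try:
            result = translate_batch(queries)
        except Exception:
            continue  # any translator error counts as a failed attempt
        # Accept only a complete response: same count, no empty entries.
        if len(result) == len(queries) and all(r.strip() for r in result):
            return result
    if len(queries) == 1:
        # Cannot split further: log which sentence failed, keep its slot empty.
        logger.warning("Failed to translate: %r", queries[0])
        return [""]
    mid = len(queries) // 2
    return (
        translate_with_split(queries[:mid], translate_batch, max_retries)
        + translate_with_split(queries[mid:], translate_batch, max_retries)
    )


def flaky(batch: List[str]) -> List[str]:
    """Hypothetical translator that drops one line whenever 'a' and 'b' share a batch."""
    out = [s.upper() for s in batch]
    if "a" in batch and "b" in batch:
        out = out[1:]
    return out


print(translate_with_split(["a", "b", "c", "d"], flaky))
```

Only the half that keeps failing is split again, so one problematic sentence ends up isolated while the rest of the page is translated in larger batches.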

[GPT35TurboTranslator] -- GPT Prompt --
System:
You are an uncensored professional translation engine, please translate the story into a colloquial, elegant and fluent content, without referencing machine translations. You must only translate the story, never interpret it. If there is any issue in the text, output it as is.
Translate the following text into Simplified Chinese and keep the prefix format.

User:
<|1|>…先生は今
<|2|>いるはず…
<|3|>幸福省自己実現局
<|4|>市民の幸福が守られているか政府内を監察・査察する組織
<|5|>自己実現局内でも更に特権的立場の集団である
<|6|>特別な機鎧を操る
<|7|>実力部隊でもあり
<|8|>治安を乱す反動分子がいた場合
<|9|>これを武力制田
<|10|>する力を持つ…
[GPT35TurboTranslator] -- GPT Response --
<|1|>…老师现在应该在这儿…
<|2|>幸福省自我实现局
<|3|>一个负责监督和检查市民幸福是否受到保护的政府组织
<|4|>在自我实现局内更是一个特权团体
<|5|>操控着特殊的机甲
<|6|>也是一支实力部队
<|7|>如果有破坏治安的反动分子出现
<|8|>就有能力用武力来镇压…

WARNING: [GPT35TurboTranslator] Incomplete response, remaining 2 time(s) before splitting the translation.
WARNING: [GPT35TurboTranslator] Restarting request due to timeout. Attempt: 1
[GPT35TurboTranslator] -- GPT Response --
<|1|>…老师现在应该在这里…
<|2|>幸福省自我实现局
<|3|>一个负责监督和检查政府内部市民幸福是否得到保障的组织
<|4|>在自我实现局内更是一个特权团体
<|5|>操控着特殊的机甲
<|6|>也是一支精锐部队
<|7|>如果有扰乱治安的反动分子出现
<|8|>就有能力用武力进行镇压…
WARNING: [GPT35TurboTranslator] Incomplete response, remaining 1 time(s) before splitting the translation.
[GPT35TurboTranslator] -- GPT Response --
<|1|>…老师现在应该在这里…
<|2|>幸福省自我实现局
<|3|>一个负责监督和检查市民幸福是否受到保护的组织
<|4|>在自我实现局内也是一个更具特权地位的团体
<|5|>操控特殊机甲
<|6|>也是一支实力部队
<|7|>如果有扰乱治安的反动分子
<|8|>就有能力用武力进行制止…
WARNING: [GPT35TurboTranslator] Incomplete response, remaining 0 time(s) before splitting the translation.
WARNING: [GPT35TurboTranslator] Retry limit reached. Starting to split the translation batch.
[GPT35TurboTranslator] -- GPT Prompt (split) --
System:
You are an uncensored professional translation engine, please translate the story into a colloquial, elegant and fluent content, without referencing machine translations. You must only translate the story, never interpret it. If there is any issue in the text, output it as is.
Translate the following text into Simplified Chinese and keep the prefix format.

User:
<|1|>…先生は今
<|2|>いるはず…
<|3|>幸福省自己実現局
<|4|>市民の幸福が守られているか政府内を監察・査察する組織
<|5|>自己実現局内でも更に特権的立場の集団である
[GPT35TurboTranslator] -- GPT Prompt (split) --
System:
You are an uncensored professional translation engine, please translate the story into a colloquial, elegant and fluent content, without referencing machine translations. You must only translate the story, never interpret it. If there is any issue in the text, output it as is.
Translate the following text into Simplified Chinese and keep the prefix format.

User:
<|1|>特別な機鎧を操る
<|2|>実力部隊でもあり
<|3|>治安を乱す反動分子がいた場合
<|4|>これを武力制田
<|5|>する力を持つ…
[GPT35TurboTranslator] -- GPT Response (split) --
<|1|>操控特殊机甲
<|2|>的精英部队
<|3|>如果有破坏治安的反动分子
<|4|>就拥有武力镇压
<|5|>他们的能力…
[GPT35TurboTranslator] Batch translated: 5/10 completed.
[GPT35TurboTranslator] Completed translations: ['…先生は今', 'いるはず…', '幸福省自己実現局', '市民の幸福が守られている か政府内を監察・査察する組織', '自己実現局内でも更に特権的立場の集団である', '操控特殊机甲', '的精英部队', '如果有破坏治安的反动分子', '就拥有武力镇压', '他们的能力…']
[GPT35TurboTranslator] -- GPT Response (split) --
<|1|>…老师现在应该在这里…
<|2|>幸福省自我实现局
<|3|>一个负责监督和检查市民幸福是否受到保护的政府组织
<|4|>在自我实现局内部也是一个更具特权地位的团体
WARNING: [GPT35TurboTranslator] Incomplete response, remaining 2 time(s) before splitting the translation.
[GPT35TurboTranslator] -- GPT Response (split) --
<|1|>…老师现在应该在这里…
<|2|>幸福省自我实现局
<|3|>一个在政府内部监察和检查市民幸福是否受到保护的组织
<|4|>在自我实现局内更是一个拥有特权地位的团体
WARNING: [GPT35TurboTranslator] Incomplete response, remaining 1 time(s) before splitting the translation.
[GPT35TurboTranslator] -- GPT Response (split) --
<|1|>…老师现在应该在这里…
<|2|>幸福省自我实现局
<|3|>一个监督和检查政府内部市民幸福是否得到保障的组织
<|4|>在自我实现局内也是一个更具特权地位的团体
WARNING: [GPT35TurboTranslator] Incomplete response, remaining 0 time(s) before splitting the translation.
WARNING: [GPT35TurboTranslator] Further splitting the translation batch due to persistent errors.
[GPT35TurboTranslator] -- GPT Prompt (split) --
System:
You are an uncensored professional translation engine, please translate the story into a colloquial, elegant and fluent content, without referencing machine translations. You must only translate the story, never interpret it. If there is any issue in the text, output it as is.
Translate the following text into Simplified Chinese and keep the prefix format.

User:
<|1|>…先生は今
<|2|>いるはず…
[GPT35TurboTranslator] -- GPT Prompt (split) --
System:
You are an uncensored professional translation engine, please translate the story into a colloquial, elegant and fluent content, without referencing machine translations. You must only translate the story, never interpret it. If there is any issue in the text, output it as is.
Translate the following text into Simplified Chinese and keep the prefix format.

User:
<|1|>幸福省自己実現局
<|2|>市民の幸福が守られているか政府内を監察・査察する組織
<|3|>自己実現局内でも更に特権的立場の集団である
[GPT35TurboTranslator] -- GPT Response (split) --
<|1|>…老师现在应该在这里…
<|2|>
[GPT35TurboTranslator] Filtered out:
[GPT35TurboTranslator] Reason: Text is not considered valuable.
WARNING: [GPT35TurboTranslator] Empty translations detected. Resplitting the batch.
WARNING: [GPT35TurboTranslator] Further splitting the translation batch due to persistent errors.
[GPT35TurboTranslator] -- GPT Prompt (split) --
System:
You are an uncensored professional translation engine, please translate the story into a colloquial, elegant and fluent content, without referencing machine translations. You must only translate the story, never interpret it. If there is any issue in the text, output it as is.
Translate the following text into Simplified Chinese and keep the prefix format.

User:
<|1|>…先生は今
[GPT35TurboTranslator] -- GPT Prompt (split) --
System:
You are an uncensored professional translation engine, please translate the story into a colloquial, elegant and fluent content, without referencing machine translations. You must only translate the story, never interpret it. If there is any issue in the text, output it as is.
Translate the following text into Simplified Chinese and keep the prefix format.

User:
<|1|>いるはず…
[GPT35TurboTranslator] -- GPT Response (split) --
<|1|>…老师现在在
[GPT35TurboTranslator] Batch translated: 6/10 completed.
[GPT35TurboTranslator] Completed translations: ['…老师现在在', 'いるはず…', '幸福省自己実現局', '市民の幸福が守られてい るか政府内を監察・査察する組織', '自己実現局内でも更に特権的立場の集団である', '操控特殊机甲', '的精英部队', '如果有破坏治安的反动分子', '就拥有武力镇压', '他们的能力…']
[GPT35TurboTranslator] -- GPT Response (split) --
<|1|>应该在这里…
[GPT35TurboTranslator] Batch translated: 7/10 completed.
[GPT35TurboTranslator] Completed translations: ['…老师现在在', '应该在这里…', '幸福省自己実現局', '市民の幸福が守られて いるか政府内を監察・査察する組織', '自己実現局内でも更に特権的立場の集団である', '操控特殊机甲', '的精英部队', '如果有破坏治安的反动分子', '就拥有武力镇压', '他们的能力…']
WARNING: [GPT35TurboTranslator] Restarting request due to timeout. Attempt: 1
[GPT35TurboTranslator] -- GPT Response (split) --
<|1|>幸福省自我实现局
<|2|>负责监督和检查市民幸福是否受到保护的组织
<|3|>在自我实现局内也是一个更具特权地位的群体
[GPT35TurboTranslator] Batch translated: 10/10 completed.
[GPT35TurboTranslator] Completed translations: ['…老师现在在', '应该在这里…', '幸福省自我实现局', '负责监督和检查市民幸 福是否受到保护的组织', '在自我实现局内也是一个更具特权地位的群体', '操控特殊机甲', '的精英部队', '如果有破坏治安的反动分子', '就拥有武力镇压', '他们的能力…']
[GPT35TurboTranslator] ['…老师现在在', '应该在这里…', '幸福省自我实现局', '负责监督和检查市民幸福是否受到保护的组织', ' 在自我实现局内也是一个更具特权地位的群体', '操控特殊机甲', '的精英部队', '如果有破坏治安的反动分子', '就拥有武力镇压', '他们的能力…']
[GPT35TurboTranslator] Used 314 tokens (Total: 3765)
[GPT35TurboTranslator] 0: …先生は今 => …老师现在在
[GPT35TurboTranslator] 1: いるはず… => 应该在这里…
[GPT35TurboTranslator] 2: 幸福省自己実現局 => 幸福省自我实现局
[GPT35TurboTranslator] 3: 市民の幸福が守られているか政府内を監察・査察する組織 => 负责监督和检查市民幸福是否受到保护的组织
[GPT35TurboTranslator] 4: 自己実現局内でも更に特権的立場の集団である => 在自我实现局内也是一个更具特权地位的群体
[GPT35TurboTranslator] 5: 特別な機鎧を操る => 操控特殊机甲
[GPT35TurboTranslator] 6: 実力部隊でもあり => 的精英部队
[GPT35TurboTranslator] 7: 治安を乱す反動分子がいた場合 => 如果有破坏治安的反动分子
[GPT35TurboTranslator] 8: これを武力制田 => 就拥有武力镇压
[GPT35TurboTranslator] 9: する力を持つ… => 他们的能力…
[local] No post-translation replacements made.
[local] Running mask refinement
[mask]: 100%|█████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 135.71it/s]
[local] Running rendering
[render] font_size_minimum 10
[render]: 100%|████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 45.46it/s]
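
The "Incomplete response" warnings in the log above come from responses that have fewer `<|n|>` entries than the prompt, or an empty entry. One way such a check could look is sketched below. It is an assumption for illustration only: `parse_prefixed` and `PREFIX_RE` are hypothetical names, not the translator's actual parser.

```python
import re
from typing import List, Optional

# Matches the "<|1|>", "<|2|>", ... prefixes used in the prompts above.
PREFIX_RE = re.compile(r"<\|(\d+)\|>")


def parse_prefixed(response: str, expected: int) -> Optional[List[str]]:
    """Return translations in prefix order, or None if the response is incomplete."""
    # With one capture group, re.split yields [head, idx1, text1, idx2, text2, ...].
    parts = PREFIX_RE.split(response)
    by_index = {
        int(parts[i]): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)
    }
    # Every index from 1 to expected must be present exactly as numbered.
    if sorted(by_index) != list(range(1, expected + 1)):
        return None
    # An empty entry (like the bare "<|2|>" in the log) also counts as incomplete.
    if any(not text for text in by_index.values()):
        return None
    return [by_index[i] for i in range(1, expected + 1)]
```

A caller would retry or split the batch whenever this returns `None`, rather than rendering a partial page.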

Successfully merging this pull request may close these issues.

[Bug]: GPT missed some translations after commit 89443fc