ERROR: No document breaks were found in the input file! These are necessary to allow the script to ensure that random NextSentences are not sampled from the same document. Please add blank lines to indicate breaks between documents in your input file. If your dataset does not contain multiple documents, blank lines can be inserted at any natural boundary, such as the ends of chapters, sections or paragraphs. #24

ChhXiitaa · 2022-11-06T02:45:25Z

Thank you for open-sourcing this work.
I would like to know how to process my own dataset into N-gram.txt.

GuiminChen · 2022-11-06T02:45:48Z

Hello, your email has been received, and I will reply as soon as possible. ============ Wishing you all the best!

shizhediao · 2023-09-20T18:48:50Z

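Hello,
Thank you for your interest in our work. There are several different ways to build an n-gram dictionary. One option is the PMI method used in this paper (Section 3.1), for which the code has also been open-sourced: https://aclanthology.org/2021.acl-long.259.pdf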
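
For readers who want a feel for the PMI idea before opening the paper, here is a minimal Python sketch. It is not the authors' released code, and it may differ from Section 3.1 in its details. It assumes the corpus is already tokenized into lists of tokens, scores each adjacent token pair with pointwise mutual information, cuts a sentence wherever the score falls below a threshold, and counts the remaining multi-token segments as n-grams. The function name `pmi_ngrams`, its parameters (`threshold`, `min_count`, `max_len`), and the space-joined output are all hypothetical; the exact layout expected in N-gram.txt is not stated in this thread, so check the repository before writing a file.

```python
import math
from collections import Counter


def pmi_ngrams(sentences, threshold=0.0, min_count=2, max_len=5):
    """Collect candidate n-grams by cutting sentences at low-PMI token pairs.

    sentences: list of token lists (tokenization is assumed to be done already).
    Returns a dict mapping space-joined n-grams to their segment counts.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(a, b):
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), estimated from raw counts.
        p_ab = bigrams[(a, b)] / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        return math.log(p_ab / (p_a * p_b))

    counts = Counter()

    def flush(segment):
        # Keep only multi-token segments that are not longer than max_len.
        if 2 <= len(segment) <= max_len:
            counts[" ".join(segment)] += 1

    for tokens in sentences:
        if not tokens:
            continue
        segment = [tokens[0]]
        for a, b in zip(tokens, tokens[1:]):
            if pmi(a, b) >= threshold:
                segment.append(b)   # strongly associated pair: extend the segment
            else:
                flush(segment)      # weak association: treat as a boundary
                segment = [b]
        flush(segment)

    return {gram: c for gram, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    corpus = [
        ["new", "york", "is", "big"],
        ["i", "like", "new", "york"],
        ["new", "york", "is", "far"],
    ]
    for gram, count in sorted(pmi_ngrams(corpus, threshold=1.0).items()):
        print(gram, count)
```

The threshold and minimum count are tuning knobs rather than recommended values. For Chinese text tokenized into characters, joining with an empty string instead of a space would probably be the more natural choice.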

ChhXiitaa closed this as completed Nov 6, 2022

ChhXiitaa reopened this Nov 6, 2022
