
Generate synthetic dataset with specific documents locally.


kime541200/SyntheticWithFiles

How to use an LLM to generate a synthetic dataset based on specific documents ✨️

Table of contents

Background

  • What is synthetic data? 📃
    Simply put, it is data produced by generative AI. (For details, see "What is synthetic data?")

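  • Why generate synthetic data from domain-specific knowledge? 🤔

    1. Enterprises hold a great deal of specialized domain knowledge that only experts in that field understand, and most of this material is hard to read.
    2. Fine-tuning brings an LLM closer to domain-specific application scenarios, and fine-tuning requires preparing data first.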

Prerequisites
