[feat] Add EvolInstruct-like methods to camel/datagen #1747
base: master
Conversation
# simulate random scores in range (1, 10) for now
scores = [random.randint(1, 10) for _ in batch_results[1:]] if keep_original else [random.randint(1, 10) for _ in batch_results]
else:
# TODO: implement instruction scoring module, e.g., complexity/quality scorer or by reward advantage
Left a TODO for a future scorer feature, which evaluates instructions and can be rule-based or implemented by a generative agent. Some references:
- https://arxiv.org/pdf/2312.15685 uses instruction complexity (judged by an LLM) as the score
- https://arxiv.org/pdf/2411.00062 uses reward advantage as the score
- other metrics for data selection/sampling: perplexity, reward variance, ...
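The random fallback above could be wrapped so that a real scorer can be dropped in later. A minimal sketch, assuming a hypothetical `scorer` callable that maps an instruction string to an integer score (this interface is not part of the current PR):

```python
import random


def score_instructions(batch_results, keep_original=True, scorer=None):
    # When the original prompt is kept, it is the first element of
    # batch_results and is excluded from scoring.
    candidates = batch_results[1:] if keep_original else batch_results
    if scorer is None:
        # Simulate random scores in range (1, 10) for now.
        return [random.randint(1, 10) for _ in candidates]
    # A rule-based or LLM-judge scorer plugs in here.
    return [scorer(instruction) for instruction in candidates]
```

With this shape, swapping the simulated scores for a complexity or reward-advantage scorer only changes the `scorer` argument, not the call sites.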
IN_BREADTH_KEYS = ['persona', 'shift-in', 'shift-out', 'mix', 'abstract']
IN_DEPTH_KEYS = ['constraints', 'deepening', 'concretizing', 'reasoning', 'expansion']

EVOL_METHODS = {
Notes: we can define more domain-specific templates (e.g., for math/coding/...).
Also, evolving currently happens independently for each prompt (x' ~ LLM(· | x, ins)); we should improve this later so that the evolving becomes multi-prompt / group-based (x' ~ LLM(· | a cluster of x, ins)), where the LLM can crossover and mutate within a group.
Regarding the prompt groups: some time ago, @lightaime mentioned message-passing based sampling. We can also include support for this in our pipeline.
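The group-based idea could start as simply as packing a prompt cluster into one meta-prompt before calling the LLM. A rough sketch, where `build_group_evolve_prompt` and the `{prompts}` placeholder in `method_template` are hypothetical names for illustration, not existing code in this PR:

```python
def build_group_evolve_prompt(prompt_cluster, method_template):
    # Number the prompts so the LLM can reference them when it
    # crossovers and mutates within the group.
    numbered = "\n".join(
        f"{i + 1}. {p}" for i, p in enumerate(prompt_cluster)
    )
    return method_template.format(prompts=numbered)
```

The single-prompt path then becomes the special case of a cluster of size one, which keeps the two modes behind one interface.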
Great, thanks for your work @ZIYU-DEEP, but some docstrings need to be polished.
self,
agent: ChatAgent,
):
"""
""" | |
r""" |
Description
Describe your changes in detail (optional if the linked issue already contains a detailed description of the changes).
Fixes #1737. Changes made in:
./examples/datagen/evol_instruct
./camel/datagen/evol_instruct
Checklist
Go over all the following points, and put an x in all the boxes that apply.
- Fixes #issue-number in the PR description (required)
- pyproject.toml and poetry.lock
Notes for Reviewers
The current data handling of EvolInstruct and SelfInstruct differs and could be improved. Let's discuss how to better align them with a shared base class.