Hello Qwen team!
I am trying to reproduce the results of ExecRepoBench but got different numbers, so I have some questions:
Is it true that the verify environments command (described here) shouldn't produce any errors? When I ran it, 40% of the samples contained an error in the stderr field, e.g.
for /Chronyk/test_chronyk.py:
    warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
Traceback (most recent call last):
  File "/var/tmp/tmp87jcuy04/Chronyk/evaluate_repo.py", line 27, in <module>
    run_pytest_in_directory(test_path)
  File "/var/tmp/tmp87jcuy04/Chronyk/evaluate_repo.py", line 8, in run_pytest_in_directory
    raise Exception("Assert Error")
Exception: Assert Error
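To separate real failures from harmless warnings in stderr, a tally like the following might help. This is only a sketch: the record layout (a list of dicts with a `stderr` key), the helper name `stderr_error_rate`, and the sample records are assumptions for illustration, not the actual ExecRepoBench output format.

```python
TRACEBACK_MARKER = "Traceback (most recent call last)"


def stderr_error_rate(records):
    """Return the fraction of records whose stderr contains a Python traceback.

    Records whose stderr holds only warnings (or is empty or missing) are not
    counted as errors.
    """
    if not records:
        return 0.0
    failed = sum(
        1 for record in records
        if TRACEBACK_MARKER in (record.get("stderr") or "")
    )
    return failed / len(records)


# Hypothetical records; the real verification output may use other field names.
samples = [
    {
        "repo": "Chronyk",
        "stderr": (
            "warnings.warn(PytestDeprecationWarning(...))\n"
            "Traceback (most recent call last):\n"
            "Exception: Assert Error"
        ),
    },
    {"repo": "warning_only", "stderr": "warnings.warn(PytestDeprecationWarning(...))"},
    {"repo": "clean", "stderr": ""},
]

print(f"samples with a traceback in stderr: {stderr_error_rate(samples):.1%}")
```

Counting only tracebacks keeps a deprecation warning on its own from being reported as a failed sample.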
Could you share the environment in which you run the ExecRepoBench commands? I'm getting slightly different numbers.
Even with a large number of errors during verification, I was able to get an average pass@1 for Qwen-Coder-7B close to yours, but still not the same (19.1 in my experiments vs. 19.8 in your paper).