AI development experiment #8
Replies: 3 comments 1 reply
-
Hi @rezzie-rich, great initiative on leveraging the mentat-bot for our AI development experiment! Breaking the project down into smaller, manageable modules is a smart approach and aligns well with agile methodologies.

Regarding the generation of usable code, aiming for 60% directly usable code seems reasonable given the current capabilities of AI coding tools. I agree that the focus should initially be on getting a solid boilerplate and framework set up. This can serve as a strong foundation that we then improve iteratively. As you mentioned, even if the experiment doesn't fully succeed, the process will provide valuable insights into the practical challenges AI developers face, which is crucial for our learning and development strategy.

To make the most of your remaining subscription, perhaps we can prioritize tasks that maximize the bot's output quality. For instance, focusing on less complex, more structured tasks could improve the overall quality of the PRs. Let's discuss this in our next sync and outline a clear action plan for the coming weeks. Looking forward to your thoughts!
-
Oops, I missed this discussion (the comment and the closed event were also missed because of a bug in the Telegram Gmail Bot). The vision is to leverage SLMs effectively and work towards solving most of the issues in the SWE-Bench Lite evaluation.
-
@rezzie-rich, which proprietary services currently have the best performance-to-price ratio on SWE-Bench? Is it possible to create a system where AI reviewers examine the code written by different systems and file reports accordingly?
-
@SmartManoj, I still have almost 2 weeks left on my mentat-bot subscription, which allows 10 commits a day.
If we draw up a comprehensive vision of the ideal project, we can break that vision down into smaller modules.
I lack advanced coding knowledge, but I have many years of problem-solving experience, including project management. Combined with your technical experience, we can break those modules into smaller, manageable issues that serve as instructions for mentat to generate code.
I assume the generated code will not be perfect as-is. However, I do believe at least 60% of it should be usable. You can polish the generated code and merge it as it becomes ready.
mentat-bot has a 38% score on the SWE-Bench Lite eval, so it should at least be able to generate the boilerplate for the project. In the worst case, we will have a failed experiment, which could help us understand why AI developers aren't practical yet. On the bright side, I have 2 weeks of subscription left with no proper use for it, so we might as well create a bunch of drafts. 10 commits a day means 140 commits in 2 weeks. That should yield at least 4-5 solid PRs.
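
To make the commit budget concrete, here is a rough sketch of the arithmetic in Python. The function names are hypothetical and purely illustrative. It treats "almost 2 weeks" as a full 14 days, so the total is an upper bound, and it assumes commits are spread evenly across PRs, which real work would likely not follow.

```python
# Back-of-the-envelope commit budget using the figures from this post.
# Assumption: "almost 2 weeks" is rounded up to 14 days (an upper bound).
COMMITS_PER_DAY = 10  # daily limit of the mentat-bot subscription
DAYS_LEFT = 14


def commit_budget(commits_per_day: int, days: int) -> int:
    """Total commits available over the remaining subscription period."""
    return commits_per_day * days


def commits_per_pr(total_commits: int, target_prs: int) -> float:
    """Average commits available per PR if the budget is split evenly."""
    return total_commits / target_prs


if __name__ == "__main__":
    total = commit_budget(COMMITS_PER_DAY, DAYS_LEFT)
    print(f"Total budget: {total} commits")
    for prs in (4, 5):
        share = commits_per_pr(total, prs)
        print(f"{prs} PRs -> about {share:.0f} commits each")
```

With these numbers, 4-5 PRs works out to roughly 28-35 commits per PR, which leaves room for several failed attempts on each issue.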