AI development experiment #8

rezzie-rich · 2024-07-31T19:27:57Z

@SmartManoj, I still have almost 2 weeks left on my mentat-bot subscription, which lets me make 10 commits a day.

If we make a comprehensive vision of the ideal project, we can break down that vision into smaller modules.

I lack advanced coding knowledge, but I have many years of problem-solving experience, including project management. With your technical experience, we can break those modules into smaller, manageable issues that serve as instructions for mentat to generate code.

I assume the generated code will not be perfect as-is. However, I do believe at least 60% of it should be usable. You can polish the generated code and merge it as it becomes ready.

mentat-bot scores 38% on the SWE-bench Lite eval, so it should at least be able to generate the boilerplate for the project. In the worst case, we end up with a failed experiment that could help us understand why AI developers aren't practical yet. On the upside, I have 2 weeks of subscription left that would otherwise go unused, so I might as well create a bunch of drafts. 10 commits a day means 140 commits in 2 weeks. That should yield at least 4-5 solid PRs.
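
A rough sketch of the arithmetic above, assuming the full 14 days and the 10-commits-a-day limit quoted in this post. The function names are made up for illustration and are not part of mentat-bot.

```python
# Assumed inputs, taken from the figures quoted in this post.
COMMITS_PER_DAY = 10
DAYS_REMAINING = 14  # "almost 2 weeks", rounded up to a full 14 days


def commit_budget(commits_per_day: int, days: int) -> int:
    """Total commits available before the subscription ends."""
    return commits_per_day * days


def commits_per_pr(total_commits: int, target_prs: int) -> float:
    """Average number of commits that can be spent on each solid PR."""
    return total_commits / target_prs


total = commit_budget(COMMITS_PER_DAY, DAYS_REMAINING)
print(f"Commit budget: {total}")
for prs in (4, 5):
    print(f"{prs} PRs -> {commits_per_pr(total, prs):.0f} commits per PR")
```

Under these assumptions, the 4-5 PR target leaves roughly 28 to 35 commits of slack per PR for drafts that get thrown away.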

devmangel · 2024-08-03T19:48:39Z

Hi @rezzie-rich

Great initiative on leveraging the mentat-bot for our AI development experiment! Breaking down the project into smaller, manageable modules is a smart approach and aligns well with agile methodologies.

Regarding the generation of usable code, aiming for 60% directly usable code seems reasonable given the current capabilities of AI coding tools. I agree that the focus should initially be on getting a solid boilerplate and framework set up. This can serve as a strong foundation, which we can then iteratively improve upon.

As you mentioned, even if the experiment doesn't fully succeed, the process will provide valuable insights into the practical challenges AI developers face, which is crucial for our learning and development strategy.

To make the most out of your remaining subscription, perhaps we can prioritize tasks that maximize the bot's output quality. For instance, focusing on generating code for less complex, more structured tasks could enhance the overall quality of the PRs. Let's discuss this in our next sync to outline a clear action plan for the coming weeks.

Looking forward to your thoughts!

1 reply

BradKML Dec 20, 2024

If you can get AI to operate with design patterns and refactoring, that would be very useful. https://refactoring.guru/design-patterns
We can also bake awareness of anti-patterns and code smells into the AI, but that is either a tooling issue (e.g. scc for code complexity) or a matter of defining subtypes of bad code, which need to be documented systematically.

SmartManoj (Maintainer) · 2024-08-10T04:55:33Z

Oops, I missed this discussion. I also missed the comment and the closed event because of a bug in the Telegram Gmail Bot.

The vision is to leverage SLMs effectively and work towards solving most of the issues on the SWE-Bench Lite evaluation.

BradKML · 2024-12-20T03:24:26Z

@rezzie-rich, which proprietary services currently have the best performance-to-price ratio on SWE-Bench? Is it possible to create a system where AI reviewers examine the code written by different systems and file reports accordingly?

