Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image description with OpenAI #81

Merged
merged 21 commits into from
Mar 26, 2024
Merged

Conversation

mgax
Copy link
Member

@mgax mgax commented Mar 5, 2024

Adds a magic button to generate an image title by asking the AI to describe the image. Fixes. #32

You can test with the included testapp. By default it will use the echo backend, but you can use OpenAI by exporting two env vars:

export WAGTAIL_AI_DEFAULT_BACKEND=chatgpt
export OPENAI_API_KEY=...

https://www.loom.com/share/ce35e23565a74c4589e86e0670e6df90

Screen.Recording.2024-03-08.at.17.07.18.mov

src/wagtail_ai/views.py Outdated Show resolved Hide resolved
@mgax mgax marked this pull request as ready for review March 8, 2024 17:05
@mgax mgax requested a review from tomdyson March 8, 2024 17:05
Copy link
Member

@tomusher tomusher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great to me @mgax! Really excited to get this in.

I've left some initial comments but I'll do a full test soon and add any additional feedback as it comes up.

docs/editor-integration.md Outdated Show resolved Hide resolved
docs/images-integration.md Outdated Show resolved Hide resolved
src/wagtail_ai/ai/openai.py Show resolved Hide resolved
src/wagtail_ai/forms.py Outdated Show resolved Hide resolved
src/wagtail_ai/static_src/image_description/main.tsx Outdated Show resolved Hide resolved
src/wagtail_ai/views.py Outdated Show resolved Hide resolved
1. In the Django project settings, configure an AI backend, and a model, that support images. Set `IMAGE_DESCRIPTION_BACKEND` to the name of the model:
```python
WAGTAIL_AI = {
"BACKENDS": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: This seems pretty clear but I'm not sure it's obvious enough from this example that you need to configure two separate backends if you want both image description and text completion features.

Do you think specifying a IMAGE_DESCRIPTION_MODEL_ID config setting (with a default value) on the OpenAIBackend would make it less of a burden to configure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've extended the docs to have an example with two backends. I don't think it would help to add another config, and have the backend switch models on the fly, that feels confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems OK, I am still a bit stuck on the idea that the simplest configuration to get all the fancy Wagtail AI features should be:

WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
        },
    },
}

but appreciate that we have to balance technical complexity here too, and your configuration doesn't necessarily prevent us from doing that in the future.

We're still asking a lot of a user who might know what AI can do for them, but doesn't understand what models to use in what situations so will just be copying examples from the docs, but again I'm sure we can review at another time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're still asking a lot of a user who might know what AI can do for them, but doesn't understand what models to use in what situations so will just be copying examples from the docs, but again I'm sure we can review at another time.

I agree it's not the best experience, and ideally they should be able to just specify a default backend with minimal configuration. But, keeping in mind that the model we're suggesting for images is called gpt-4-vision-preview, I think we can hold off until the functionality becomes part of the mainline offering.

docs/images-integration.md Outdated Show resolved Hide resolved
src/wagtail_ai/static_src/image_description/main.tsx Outdated Show resolved Hide resolved
src/wagtail_ai/static_src/image_description/main.tsx Outdated Show resolved Hide resolved
src/wagtail_ai/static_src/image_description/main.tsx Outdated Show resolved Hide resolved
src/wagtail_ai/static_src/image_description/main.tsx Outdated Show resolved Hide resolved
src/wagtail_ai/static_src/image_description/main.tsx Outdated Show resolved Hide resolved
src/wagtail_ai/views.py Outdated Show resolved Hide resolved

try:
ai_response = backend.describe_image(image_file=rendition.file, prompt=prompt)
description = ai_response.text()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: We could limit the scope of what's inside try/except block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the failure modes of the .text() method call are, for the most part, AI-related, so it should be inside the try/except block.

src/wagtail_ai/views.py Show resolved Hide resolved
src/wagtail_ai/views.py Show resolved Hide resolved
const formData = new FormData();
formData.append('image_id', imageId);
formData.append('csrfmiddlewaretoken', csrfToken);
input.value = await fetchResponse('DESCRIBE_IMAGE', formData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Would there be an option for custom alt-text fields this way or are we limited by Wagtail in only being allowed to use the title field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is what you're asking, but the JS code will happily work with any text input field, if your custom image model has such extra fields. It's briefly mentioned in the docs: https://github.com/wagtail/wagtail-ai/pull/81/files#diff-da6a4f33a37dc72141b415345529ef7c066f01a4413477e569f89ddd51d39276R46.


export const fetchResponse = async (
action: keyof typeof ApiUrlName,
body: FormData,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Out of curiosity, why do we prefer FormData instead of JSON? Is that so we could send files in the future or is that what is a custom in Wagtail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering this too, but went with what the text completion endpoint was already doing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like an extra step to parse JSON in the view, whereas with FormData we're just passing request.POST to the form?

@mgax
Copy link
Member Author

mgax commented Mar 13, 2024

@tomusher @tm-kn thanks for the detailed feedback! I've addressed most things, and there are some open topics, not sure we should resolve them as part of this PR though.

@mgax mgax requested a review from tomusher March 13, 2024 09:08
@mgax mgax requested a review from tm-kn March 13, 2024 09:08
return " \n".join(errors_for_response)
class DescribeImageApiForm(ApiForm):
image_id = forms.CharField()
maxlength = forms.IntegerField(required=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Given this is now a user-supplied argument, we should at least limit this to positive numbers only. I can see we could also set a max value of something like 4096 so this is not abused to do something else than populating short descriptions.

src/wagtail_ai/views.py Show resolved Hide resolved
src/wagtail_ai/views.py Show resolved Hide resolved
@tomusher
Copy link
Member

Thanks @mgax - there doesn't seem to be anything major unresolved here so I'm going to get this merged in!

@tomusher tomusher merged commit 49eada0 into wagtail:main Mar 26, 2024
11 checks passed
@mgax mgax deleted the image-description-openai branch March 27, 2024 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants