-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Image description with OpenAI #81
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great to me @mgax! Really excited to get this in.
I've left some initial comments but I'll do a full test soon and add any additional feedback as it comes up.
1. In the Django project settings, configure an AI backend, and a model, that support images. Set `IMAGE_DESCRIPTION_BACKEND` to the name of the model: | ||
```python | ||
WAGTAIL_AI = { | ||
"BACKENDS": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: This seems pretty clear but I'm not sure it's obvious enough from this example that you need to configure two separate backends if you want both image description and text completion features.
Do you think specifying a IMAGE_DESCRIPTION_MODEL_ID
config setting (with a default value) on the OpenAIBackend
would make it less of a burden to configure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've extended the docs to have an example with two backends. I don't think it would help to add another config, and have the backend switch models on the fly, that feels confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems OK, I am still a bit stuck on the idea that the simplest configuration to get all the fancy Wagtail AI features should be:
WAGTAIL_AI = {
"BACKENDS": {
"default": {
"CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
},
},
}
but appreciate that we have to balance technical complexity here too, and your configuration doesn't necessarily prevent us from doing that in the future.
We're still asking a lot of a user who might know what AI can do for them, but doesn't understand what models to use in what situations so will just be copying examples from the docs, but again I'm sure we can review at another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're still asking a lot of a user who might know what AI can do for them, but doesn't understand what models to use in what situations so will just be copying examples from the docs, but again I'm sure we can review at another time.
I agree it's not the best experience, and ideally they should be able to just specify a default backend with minimal configuration. But, keeping in mind that the model we're suggesting for images is called gpt-4-vision-preview
, I think we can hold off until the functionality becomes part of the mainline offering.
|
||
try: | ||
ai_response = backend.describe_image(image_file=rendition.file, prompt=prompt) | ||
description = ai_response.text() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: We could limit the scope of what's inside try/except block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the failure modes of the .text()
method call are, for the most part, AI-related, so it should be inside the try/except block.
const formData = new FormData(); | ||
formData.append('image_id', imageId); | ||
formData.append('csrfmiddlewaretoken', csrfToken); | ||
input.value = await fetchResponse('DESCRIBE_IMAGE', formData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Would there be an option for custom alt-text fields this way or are we limited by Wagtail in only being allowed to use the title field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is what you're asking, but the JS code will happily work with any text input field, if your custom image model has such extra fields. It's briefly mentioned in the docs: https://github.com/wagtail/wagtail-ai/pull/81/files#diff-da6a4f33a37dc72141b415345529ef7c066f01a4413477e569f89ddd51d39276R46.
|
||
export const fetchResponse = async ( | ||
action: keyof typeof ApiUrlName, | ||
body: FormData, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Out of curiosity, why do we prefer FormData instead of JSON? Is that so we could send files in the future or is that what is a custom in Wagtail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering this too, but went with what the text completion endpoint was already doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like an extra step to parse JSON in the view, whereas with FormData we're just passing request.POST
to the form?
src/wagtail_ai/forms.py
Outdated
return " \n".join(errors_for_response) | ||
class DescribeImageApiForm(ApiForm): | ||
image_id = forms.CharField() | ||
maxlength = forms.IntegerField(required=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Given this is now a user-supplied argument, we should at least limit this to positive numbers only. I can see we could also set a max value of something like 4096
so this is not abused to do something else than populating short descriptions.
Thanks @mgax - there doesn't seem to be anything major unresolved here so I'm going to get this merged in! |
Adds a magic button to generate an image title by asking the AI to describe the image. Fixes. #32
You can test with the included testapp. By default it will use the echo backend, but you can use OpenAI by exporting two env vars:
https://www.loom.com/share/ce35e23565a74c4589e86e0670e6df90
Screen.Recording.2024-03-08.at.17.07.18.mov