Image description with OpenAI #81

mgax · 2024-03-05T14:28:46Z

Adds a magic button to generate an image title by asking the AI to describe the image. Fixes #32.

Based on Initial test of OpenAI alt-text generation #47 by @tm-kn.
Icon animation thanks to @Morsey187 and @jhancock532.
Refactored the existing logic that assumed all AI calls would be text completion.
Lots of commits; I think it would be good to squash the branch when merging.

You can test with the included testapp. By default it will use the echo backend, but you can use OpenAI by exporting two env vars:

export WAGTAIL_AI_DEFAULT_BACKEND=chatgpt
export OPENAI_API_KEY=...

https://www.loom.com/share/ce35e23565a74c4589e86e0670e6df90

src/wagtail_ai/views.py

tomusher

This is looking great to me @mgax! Really excited to get this in.

I've left some initial comments but I'll do a full test soon and add any additional feedback as it comes up.

docs/editor-integration.md

docs/images-integration.md

src/wagtail_ai/ai/openai.py

src/wagtail_ai/forms.py

src/wagtail_ai/static_src/image_description/main.tsx

src/wagtail_ai/views.py

tomusher · 2024-03-08T17:41:19Z

docs/images-integration.md

+1. In the Django project settings, configure an AI backend, and a model, that support images. Set `IMAGE_DESCRIPTION_BACKEND` to the name of the model:
+   ```python
+   WAGTAIL_AI = {
+       "BACKENDS": {


Suggestion: This seems pretty clear but I'm not sure it's obvious enough from this example that you need to configure two separate backends if you want both image description and text completion features.

Do you think specifying an `IMAGE_DESCRIPTION_MODEL_ID` config setting (with a default value) on the `OpenAIBackend` would make it less of a burden to configure?

I've extended the docs to have an example with two backends. I don't think it would help to add another config setting and have the backend switch models on the fly; that feels confusing.
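For reference, a two-backend configuration along the lines discussed here might look like the sketch below. The `CLASS` path, the `IMAGE_DESCRIPTION_BACKEND` setting, and the `gpt-4-vision-preview` model name come from this thread; the backend names `default` and `vision`, the `CONFIG` and `MODEL_ID` keys, and the text model are assumptions for illustration and should be checked against the docs added in this PR.

```python
# Sketch of a settings fragment with separate backends for text and images.
# Backend names, CONFIG keys, and the text model are illustrative assumptions.
WAGTAIL_AI = {
    "BACKENDS": {
        # Used for text completion features.
        "default": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {"MODEL_ID": "gpt-3.5-turbo"},
        },
        # Used only for image description; needs a model that accepts images.
        "vision": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {"MODEL_ID": "gpt-4-vision-preview"},
        },
    },
    # Points at the key of the backend above, not at a model name.
    "IMAGE_DESCRIPTION_BACKEND": "vision",
}
```

The point of the sketch is that the image feature refers to a second, separately configured backend rather than asking one backend to switch models per request.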

This seems OK. I am still a bit stuck on the idea that the simplest configuration to get all the fancy Wagtail AI features should be:

    WAGTAIL_AI = {
        "BACKENDS": {
            "default": {
                "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            },
        },
    }

but appreciate that we have to balance technical complexity here too, and your configuration doesn't necessarily prevent us from doing that in the future.

We're still asking a lot of a user who might know what AI can do for them, but doesn't understand what models to use in what situations so will just be copying examples from the docs, but again I'm sure we can review at another time.

> We're still asking a lot of a user who might know what AI can do for them, but doesn't understand what models to use in what situations so will just be copying examples from the docs, but again I'm sure we can review at another time.

I agree it's not the best experience, and ideally they should be able to just specify a default backend with minimal configuration. But, keeping in mind that the model we're suggesting for images is called gpt-4-vision-preview, I think we can hold off until the functionality becomes part of the mainline offering.

docs/images-integration.md

src/wagtail_ai/static_src/image_description/main.tsx

src/wagtail_ai/views.py

tm-kn · 2024-03-11T09:36:26Z

src/wagtail_ai/views.py

+
+    try:
+        ai_response = backend.describe_image(image_file=rendition.file, prompt=prompt)
+        description = ai_response.text()


Suggestion: We could limit the scope of what's inside try/except block.

I think the failure modes of the .text() method call are, for the most part, AI-related, so it should be inside the try/except block.
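A minimal sketch of the trade-off discussed here, assuming a hypothetical `AIBackendError` exception and fake backend classes that stand in for the real ones: both `describe_image()` and `.text()` stay inside the `try` block because either can fail for AI-related reasons, while post-processing that cannot raise such an error moves outside it.

```python
class AIBackendError(Exception):
    """Hypothetical stand-in for whatever error type the real backend raises."""


class FakeResponse:
    """Test double: returns text, or fails when text() is called."""

    def __init__(self, text=None, error=None):
        self._text = text
        self._error = error

    def text(self):
        if self._error is not None:
            raise AIBackendError(self._error)
        return self._text


class FakeBackend:
    """Test double exposing the describe_image() call shown in the diff."""

    def __init__(self, response):
        self._response = response

    def describe_image(self, *, image_file, prompt):
        return self._response


def describe(backend, image_file, prompt):
    """Return (description, error_message); exactly one of them is None."""
    try:
        # Both calls can fail for AI-related reasons, so both stay guarded.
        ai_response = backend.describe_image(image_file=image_file, prompt=prompt)
        description = ai_response.text()
    except AIBackendError as exc:
        return None, str(exc)
    # Post-processing that cannot raise an AI error lives outside the block.
    return description.strip(), None
```

Keeping the `except` clause tied to one specific exception type limits what gets swallowed even when two calls share the block.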

src/wagtail_ai/views.py

tm-kn · 2024-03-11T09:42:14Z

src/wagtail_ai/static_src/image_description/main.tsx

+      const formData = new FormData();
+      formData.append('image_id', imageId);
+      formData.append('csrfmiddlewaretoken', csrfToken);
+      input.value = await fetchResponse('DESCRIBE_IMAGE', formData);


Question: Would there be an option for custom alt-text fields this way or are we limited by Wagtail in only being allowed to use the title field?

I'm not sure if this is what you're asking, but the JS code will happily work with any text input field, if your custom image model has such extra fields. It's briefly mentioned in the docs: https://github.com/wagtail/wagtail-ai/pull/81/files#diff-da6a4f33a37dc72141b415345529ef7c066f01a4413477e569f89ddd51d39276R46.

src/wagtail_ai/static_src/image_description/main.tsx

tm-kn · 2024-03-11T09:58:54Z

src/wagtail_ai/static_src/api.tsx

+
+export const fetchResponse = async (
+  action: keyof typeof ApiUrlName,
+  body: FormData,


Question: Out of curiosity, why do we prefer FormData over JSON? Is that so we could send files in the future, or is it the convention in Wagtail?

I was wondering this too, but went with what the text completion endpoint was already doing.

Feels like an extra step to parse JSON in the view, whereas with FormData we're just passing request.POST to the form?

src/wagtail_ai/static_src/image_description/main.css

mgax · 2024-03-13T09:08:06Z

@tomusher @tm-kn thanks for the detailed feedback! I've addressed most things. Some topics are still open, though I'm not sure we should resolve them as part of this PR.

tm-kn · 2024-03-13T09:12:17Z

src/wagtail_ai/forms.py

-        return " \n".join(errors_for_response)
+class DescribeImageApiForm(ApiForm):
+    image_id = forms.CharField()
+    maxlength = forms.IntegerField(required=False)


Suggestion: Given this is now a user-supplied argument, we should at least limit this to positive numbers only. We could also set a max value of something like 4096 so this is not abused to do something other than populating short descriptions.
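The bounds suggested above can be sketched in plain Python as follows. The function name `clean_maxlength` and the constant `MAX_ALLOWED_LENGTH` are hypothetical; in the Django form itself, the same limits could presumably be expressed with the `min_value` and `max_value` arguments of `forms.IntegerField`.

```python
MAX_ALLOWED_LENGTH = 4096  # upper bound suggested in the review comment


def clean_maxlength(raw):
    """Return a validated maxlength, or None when the field was not supplied."""
    if raw is None or raw == "":
        return None  # the field is optional
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ValueError("maxlength must be an integer") from None
    # Reject zero, negatives, and values large enough to invite abuse.
    if not 1 <= value <= MAX_ALLOWED_LENGTH:
        raise ValueError(f"maxlength must be between 1 and {MAX_ALLOWED_LENGTH}")
    return value
```

Rejecting out-of-range values outright, rather than silently clamping them, keeps the API response honest about what the client sent.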

src/wagtail_ai/static_src/image_description/main.tsx

src/wagtail_ai/views.py

src/wagtail_ai/static_src/image_description/main.tsx

tests/views/test_describe_image.py

tomusher · 2024-03-26T11:34:56Z

Thanks @mgax - there doesn't seem to be anything major unresolved here so I'm going to get this merged in!

tm-kn and others added 7 commits February 28, 2024 18:06

Initial test of OpenAI alt-text generation

5fa4dc1

Test for DescribeImageOpenAI as it stands

9ad85e3

Rename and subclass OpenAIBackend from AIBackend

585111e

Look up the right backend for a feature

e0915b6

Configure OpenAIBackend using BaseAIBackendConfig

5055c4d

Show a button in the UI and make an API call

a0e949d

Generate image title on API call

a6b0f95

tomdyson reviewed Mar 5, 2024

src/wagtail_ai/views.py (outdated, resolved)

mgax added 7 commits March 5, 2024 18:58

Cleanup

cf01ee4

Fix tests

a9db39e

Implement text completion for OpenAIBackend

c34eef3

Check image permission; configure custom prompt

66e200d

Animated wand icon; button title

301eb7a

Documentation

e762c9f

Add image description to testapp

02a68c6

mgax marked this pull request as ready for review March 8, 2024 17:05

mgax requested a review from tomdyson March 8, 2024 17:05

tomusher reviewed Mar 8, 2024

tm-kn reviewed Mar 11, 2024

tomusher reviewed Mar 11, 2024

src/wagtail_ai/static_src/image_description/main.tsx (outdated, resolved)

tomusher reviewed Mar 11, 2024

src/wagtail_ai/static_src/image_description/main.tsx (outdated, resolved)

tm-kn reviewed Mar 11, 2024

tomusher reviewed Mar 11, 2024

src/wagtail_ai/static_src/image_description/main.css (outdated, resolved)

mgax added 5 commits March 13, 2024 10:30

Simplify SVG loading

b33135a

PR feedback

0cb9cf3

Configure rendition used for image description

9daa492

Get maxlength from the HTML input field

815ab08

Fix DOM dataset access

d5bb802

mgax requested a review from tomusher March 13, 2024 09:08

mgax requested a review from tm-kn March 13, 2024 09:08

tm-kn reviewed Mar 13, 2024

src/wagtail_ai/static_src/image_description/main.tsx (outdated, resolved)

mgax added 2 commits March 13, 2024 11:52

Style the button for dark mode

b0243d3

Max length validation

7cdf140

tm-kn reviewed Mar 13, 2024

tests/views/test_describe_image.py (resolved)

tm-kn reviewed Mar 13, 2024

tests/views/test_describe_image.py (resolved)

tomusher merged commit 49eada0 into wagtail:main Mar 26, 2024
11 checks passed

mgax deleted the image-description-openai branch March 27, 2024 09:07

tomusher mentioned this pull request Jun 17, 2024

Initial test of OpenAI alt-text generation #47

Closed
