Image description with OpenAI #81
Merged
Commits (21)
- `5fa4dc1` (tm-kn) Initial test of OpenAI alt-text generation
- `9ad85e3` (mgax) Test for `DescribeImageOpenAI` as it stands
- `585111e` (mgax) Rename and subclass OpenAIBackend from AIBackend
- `e0915b6` (mgax) Look up the right backend for a feature
- `5055c4d` (mgax) Configure OpenAIBackend using BaseAIBackendConfig
- `a0e949d` (mgax) Show a button in the UI and make an API call
- `a6b0f95` (mgax) Generate image title on API call
- `cf01ee4` (mgax) Cleanup
- `a9db39e` (mgax) Fix tests
- `c34eef3` (mgax) Implement text completion for OpenAIBackend
- `66e200d` (mgax) Check image permission; configure custom prompt
- `301eb7a` (mgax) Animated wand icon; button title
- `e762c9f` (mgax) Documentation
- `02a68c6` (mgax) Add image description to testapp
- `b33135a` (mgax) Simplify SVG loading
- `0cb9cf3` (mgax) PR feedback
- `9daa492` (mgax) Configure rendition used for image description
- `815ab08` (mgax) Get maxlength from the HTML input field
- `d5bb802` (mgax) Fix DOM dataset access
- `b0243d3` (mgax) Style the button for dark mode
- `7cdf140` (mgax) Max length validation
Changes to the documentation navigation (this PR adds the `images-integration.md` page):

```diff
@@ -1,6 +1,7 @@
 nav:
   - installation.md
   - editor-integration.md
+  - images-integration.md
   - ai-backends.md
   - text-splitting.md
   - contributing.md
```
# Images Integration

Wagtail AI integrates with the image edit form to provide AI-generated descriptions for images. The integration requires a backend that supports image descriptions, such as [the OpenAI backend](../ai-backends/#the-openai-backend).

## Configuration

1. In the Django project settings, configure an AI backend and model that support images, and set `IMAGE_DESCRIPTION_BACKEND` to the name of that backend:
    ```python
    WAGTAIL_AI = {
        "BACKENDS": {
            "vision": {
                "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
                "CONFIG": {
                    "MODEL_ID": "gpt-4-vision-preview",
                    "TOKEN_LIMIT": 300,
                },
            },
        },
        "IMAGE_DESCRIPTION_BACKEND": "vision",
    }
    ```
2. In the Django project settings, configure a [custom Wagtail image base form](https://docs.wagtail.org/en/stable/reference/settings.html#wagtailimages-image-form-base):

    ```python
    WAGTAILIMAGES_IMAGE_FORM_BASE = "wagtail_ai.forms.DescribeImageForm"
    ```

Now, when you upload or edit an image, a magic wand icon should appear next to the _title_ field. Clicking the icon invokes the AI backend to generate an image description.
## Separate backends for text completion and image description

Multi-modal models are fairly new, so you may want to configure two different backends: one for text completion and one for image description. The `default` backend will be used for text completion:
```python
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-3.5-turbo",
            },
        },
        "vision": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-4-vision-preview",
                "TOKEN_LIMIT": 300,
            },
        },
    },
    "IMAGE_DESCRIPTION_BACKEND": "vision",
}
```
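The lookup rule above can be sketched in plain Python. This helper is illustrative only, not wagtail_ai's actual code: the assumption is that a feature-specific key such as `IMAGE_DESCRIPTION_BACKEND` selects its named backend, and features without one fall back to `default`.

```python
# Illustrative sketch only -- not wagtail_ai's actual implementation.
# Shows how a feature could resolve its backend from the WAGTAIL_AI setting.
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
            "CONFIG": {"MODEL_ID": "gpt-3.5-turbo"},
        },
        "vision": {
            "CLASS": "wagtail_ai.ai.openai.OpenAIBackend",
            "CONFIG": {"MODEL_ID": "gpt-4-vision-preview", "TOKEN_LIMIT": 300},
        },
    },
    "IMAGE_DESCRIPTION_BACKEND": "vision",
}


def resolve_backend(settings, feature_key=None):
    """Return the backend config for a feature, falling back to "default"."""
    name = settings.get(feature_key, "default") if feature_key else "default"
    return settings["BACKENDS"][name]


# Image description resolves to "vision"; text completion uses "default".
vision = resolve_backend(WAGTAIL_AI, "IMAGE_DESCRIPTION_BACKEND")
text = resolve_backend(WAGTAIL_AI)
```

With this single-dictionary layout, adding a third feature later only means adding another `*_BACKEND` key, without touching the existing backends.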
## Custom prompt

Wagtail AI includes a simple prompt to ask the AI to generate an image description:

> Describe this image. Make the description suitable for use as an alt-text.

If you want to use a different prompt, override the `IMAGE_DESCRIPTION_PROMPT` value:
```python
WAGTAIL_AI = {
    "BACKENDS": {
        # ...
    },
    "IMAGE_DESCRIPTION_PROMPT": "Describe this image in the voice of Sir David Attenborough.",
}
```
## Custom form

Wagtail AI includes an image form that enhances the `title` field with an AI button. If you are using a [custom image model](https://docs.wagtail.org/en/stable/advanced_topics/images/custom_image_model.html), you can provide your own form to target another field. Check out the implementation of `DescribeImageForm` in [`forms.py`](https://github.com/wagtail/wagtail-ai/blob/main/src/wagtail_ai/forms.py), adapt it to your needs, and set it as `WAGTAILIMAGES_IMAGE_FORM_BASE`.
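For example, a project could point Wagtail at its own subclass via the same setting. The module and class names below are hypothetical; the form itself would subclass `wagtail_ai.forms.DescribeImageForm` and adapt it to the field you want on your custom image model:

```python
# settings.py -- "myapp.forms.MyDescribeImageForm" is a hypothetical path;
# the referenced form would subclass wagtail_ai.forms.DescribeImageForm,
# adapted to target a field on your custom image model.
WAGTAILIMAGES_IMAGE_FORM_BASE = "myapp.forms.MyDescribeImageForm"
```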
Suggestion: This seems pretty clear, but I'm not sure it's obvious enough from this example that you need to configure two separate backends if you want both the image description and text completion features.

Do you think specifying an `IMAGE_DESCRIPTION_MODEL_ID` config setting (with a default value) on the `OpenAIBackend` would make it less of a burden to configure?
I've extended the docs to include an example with two backends. I don't think it would help to add another config setting and have the backend switch models on the fly; that feels confusing.
This seems OK; I'm still a bit stuck on the idea that the simplest configuration to get all the fancy Wagtail AI features should be:

but I appreciate that we have to balance technical complexity here too, and your configuration doesn't necessarily prevent us from doing that in the future.

We're still asking a lot of a user who might know what AI can do for them, but doesn't understand which models to use in which situations and so will just be copying examples from the docs. But again, I'm sure we can review at another time.
I agree it's not the best experience, and ideally they should be able to just specify a default backend with minimal configuration. But, keeping in mind that the model we're suggesting for images is called `gpt-4-vision-preview`, I think we can hold off until the functionality becomes part of the mainline offering.