Description Logic for T2I Evaluations

Automated evaluation of text-to-image (T2I) generative models using description logic. The main T2I models evaluated are Stable Diffusion V1.4 and Stable Diffusion V2.1.

Automating Evaluations

There are multiple evaluation methods. Evaluations can be automated in two ways:

Creating a pipeline to generate a diverse set of prompts
Designing the evaluation procedure to check generated images

Challenges

There are a few challenges associated with this task:

Bias within evaluation data (e.g., an apple is always associated with red and green colors)

Can we create a better evaluation dataset?
Other kinds of biases: an apple is always evaluated on the basis of color, but not size. Can Stable Diffusion generate a big apple the size of an elephant?

Hallucination: if we ask the model to generate “A”, it generates “A + B” instead.

How to detect such hallucinations?
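
One simple way to frame this check is sketched below. It assumes that some external captioner or object detector (not shown here) returns the names of the objects it finds in a generated image; anything detected but not requested is flagged as a possible hallucination. The function `find_extra_objects` and the sample inputs are hypothetical and are not taken from the repository.

```python
def find_extra_objects(requested, detected):
    """Return detected object names that the prompt did not ask for."""
    requested_set = {name.lower() for name in requested}
    detected_set = {name.lower() for name in detected}
    # Set difference: objects present in the image but absent from the prompt.
    return sorted(detected_set - requested_set)


# Hypothetical example: the prompt asked for "A" (an apple),
# but the image contains "A + B" (an apple and a table).
extra = find_extra_objects(["apple"], ["Apple", "table"])
print(extra)  # ['table']
```

An empty result means no extra objects were detected; it does not by itself confirm that the requested objects are present, which would need a separate check.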

Prompt Generation Methodology

Prompt generation will take the following format and expand on it to form more complicated prompts:
C = Color = {Red, Green, Black}
D = Fruit = {Banana, Apple}
F = Furniture = {Chair, Table}
R = Relation = {“on top of”, “and”}

Level 1

C union D = {“red banana”, “black apple”}
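
A minimal sketch of Level 1 in Python. It assumes that the "C union D" notation above means pairing each color with each fruit (a Cartesian product), which is how the example prompts read. The function name `level1_prompts` is hypothetical and is not taken from the repository's code.

```python
from itertools import product

# Concept sets copied from the format above (lowercased for prompt text).
COLORS = ["red", "green", "black"]
FRUITS = ["banana", "apple"]


def level1_prompts(colors, fruits):
    """Pair every color with every fruit, e.g. 'red banana'."""
    return [f"{color} {fruit}" for color, fruit in product(colors, fruits)]


prompts = level1_prompts(COLORS, FRUITS)
print(len(prompts))  # 3 colors x 2 fruits = 6 prompts
```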

Level 2

R((C union D), F) = {“black apple on top of chair”}
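
Level 2 can be sketched the same way, assuming a relation simply joins a colored fruit and a piece of furniture into one phrase. The function name `level2_prompts` is hypothetical, and the sketch enumerates every combination without filtering implausible ones.

```python
from itertools import product

# Concept sets copied from the format above (lowercased for prompt text).
COLORS = ["red", "green", "black"]
FRUITS = ["banana", "apple"]
FURNITURE = ["chair", "table"]
RELATIONS = ["on top of", "and"]


def level2_prompts(colors, fruits, furniture, relations):
    """Build '<color> <fruit> <relation> <furniture>' prompts."""
    # Level 1 phrases act as the left-hand argument of the relation.
    subjects = [f"{c} {d}" for c, d in product(colors, fruits)]
    return [
        f"{subject} {relation} {item}"
        for subject, relation, item in product(subjects, relations, furniture)
    ]


prompts = level2_prompts(COLORS, FRUITS, FURNITURE, RELATIONS)
print(len(prompts))  # 6 subjects x 2 relations x 2 furniture items = 24
```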

Level 3

R((C union D), (C union D)) = {“black apple and red banana”}
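
A sketch of Level 3, where both arguments of the relation are colored fruits. Using ordered pairs of distinct phrases (so that "red banana and red banana" is skipped) is an assumption of this sketch and is not stated in the format above; the function name `level3_prompts` is hypothetical.

```python
from itertools import permutations, product

# Concept sets copied from the format above (lowercased for prompt text).
COLORS = ["red", "green", "black"]
FRUITS = ["banana", "apple"]
RELATIONS = ["on top of", "and"]


def level3_prompts(colors, fruits, relations):
    """Relate two different '<color> <fruit>' phrases to each other."""
    subjects = [f"{c} {d}" for c, d in product(colors, fruits)]
    return [
        f"{left} {relation} {right}"
        # permutations(..., 2) yields ordered pairs of distinct phrases.
        for left, right in permutations(subjects, 2)
        for relation in relations
    ]


prompts = level3_prompts(COLORS, FRUITS, RELATIONS)
print(len(prompts))  # 6 x 5 ordered pairs x 2 relations = 60
```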

Project Goals

Estimated Duration	Tasks
2 weeks	Learning description logics
	Playing with Stable Diffusion (and understanding where it is failing)
	Reading and analyzing the existing T2I evaluation strategies: DALL-Eval and HRS-Benchmark
2 weeks	Defining the description logic rules (i.e., knowledge graph)
	Creating a small diverse set of prompts using automated strategies
	Evaluating several T2I models
3 weeks	Scaling the description logic rules
	Performing automated evaluations of T2I models
1 week	Summarizing and report writing

For further details, see our detailed report (DL-T2I_Report.pdf).

Vihang26/dl_t2i