A curated publication list on visual dialog.
This repository is intended to make the mainstream visual dialog literature easier to navigate.
Please note that only papers accepted at conferences are listed here: accepted papers for reliability, and conference papers for brevity.
Last updated: 2023/5/7 (not finished yet)
Visual dialog models for both the generative and the discriminative tasks are evaluated with retrieval-based metrics: mean reciprocal rank (MRR), recall@k (R@k), mean rank (Mean), and normalized discounted cumulative gain (NDCG). In VisDial, every visual question comes with a list of 100 answer candidates, exactly one of which is the ground-truth answer. The model ranks the candidates by their log-likelihood scores, and the resulting ranking is scored with the four metrics. MRR, R@k, and Mean depend only on the rank of the single ground-truth answer, whereas NDCG takes every relevant answer in the 100-candidate list into account by using dense relevance annotations for all candidates. Higher is better for every metric except Mean, where lower is better (marked MEAN↓ in the tables). The community regards NDCG as the primary evaluation metric.
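As a rough illustration of how these metrics can be computed from per-candidate scores, here is a minimal sketch in plain Python. The function names (`rank_of`, `sparse_metrics`, `ndcg`) are hypothetical and are not taken from any of the listed implementations. The NDCG variant assumes the common convention of truncating at k, the number of candidates with nonzero relevance; treat that as an assumption and check it against the evaluation code you actually use. Results are fractions in [0, 1], whereas the tables report percentages.

```python
import math


def rank_of(scores, index):
    """1-based rank of candidate `index` when candidates are sorted by descending score.

    Ties are resolved optimistically (a tied candidate shares the best rank).
    """
    return 1 + sum(1 for s in scores if s > scores[index])


def sparse_metrics(all_scores, gt_indices, ks=(1, 5, 10)):
    """MRR, R@k and mean rank from the rank of the single ground-truth answer.

    all_scores: one list of candidate scores per question.
    gt_indices: index of the ground-truth candidate for each question.
    """
    ranks = [rank_of(s, g) for s, g in zip(all_scores, gt_indices)]
    n = len(ranks)
    metrics = {
        "mrr": sum(1.0 / r for r in ranks) / n,
        "mean": sum(ranks) / n,  # lower is better
    }
    for k in ks:
        metrics[f"r@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


def ndcg(scores, relevance):
    """NDCG for one question, using dense relevance scores for every candidate.

    Assumption: k is the number of candidates with nonzero relevance.
    """
    k = sum(1 for r in relevance if r > 0)
    if k == 0:
        return 0.0
    # Candidate indices ordered by descending model score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dcg = sum(relevance[i] / math.log2(pos + 2) for pos, i in enumerate(order[:k]))
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(r / math.log2(pos + 2) for pos, r in enumerate(ideal))
    return dcg / idcg


# Toy example with 4 candidates per question instead of 100.
toy_scores = [[0.1, 0.9, 0.5, 0.3], [0.8, 0.1, 0.2, 0.05]]
toy_gt = [2, 0]
toy_result = sparse_metrics(toy_scores, toy_gt)
toy_ndcg = ndcg([0.1, 0.9, 0.5, 0.3], [1.0, 0.0, 0.5, 0.0])
```

In the toy example the ground-truth answers land at ranks 2 and 1, so `toy_result["mrr"]` is 0.75 and `toy_result["mean"]` is 1.5. A model can score well on these rank-based metrics and still have a low NDCG, as `toy_ndcg` shows, because NDCG rewards placing all relevant candidates near the top rather than only the single ground-truth answer.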
In addition, links to implementations are provided where available, labeled with the framework they use. The prefixes 'o-' and 'u-' indicate official and unofficial implementations, respectively.
[Note]
*: re-implemented results
†: use of dense labels
‡: use of additional knowledge
ID | Year | Venue | Model (or Authors) | MRR | R@1 | R@5 | R@10 | MEAN↓ | code |
---|---|---|---|---|---|---|---|---|---|
1 | 2017 | CVPR | LF | 58.07 | 43.82 | 74.68 | 84.07 | 5.78 | [o-torch] |
2 | 2017 | CVPR | HRE | 58.46 | 44.67 | 74.50 | 84.22 | 5.72 | [o-torch] |
3 | 2017 | CVPR | HREA | 58.68 | 44.82 | 74.81 | 84.36 | 5.66 | [o-torch] |
4 | 2017 | CVPR | MN | 59.65 | 45.55 | 76.22 | 85.37 | 5.46 | [o-torch] |
5 | 2017 | NeurIPS | HCIAE | 62.22 | 48.48 | 78.75 | 87.59 | 4.81 | [o-pytorch] |
6 | 2017 | NeurIPS | AMEM | 62.27 | 48.53 | 78.66 | 87.43 | 4.86 | |
7 | 2018 | CVPR | CoAtt | 63.98 | 50.29 | 80.71 | 88.81 | 4.47 | |
8 | 2018 | CVPR | SF | 62.42 | 48.55 | 78.96 | 87.75 | 4.70 | |
9 | 2018 | ECCV | CorefNMN | 64.10 | 50.92 | 80.18 | 88.81 | 4.45 | |
10 | 2019 | CVPR | VGNN | 62.85 | 48.95 | 79.65 | 88.36 | 4.57 | [o-pytorch] |
11 | 2019 | CVPR | RvA | 66.34 | 52.71 | 82.97 | 90.73 | 3.93 | [o-pytorch] |
12 | 2019 | CVPR | FGA | 65.25 | 51.43 | 82.08 | 89.56 | 4.35 | |
13 | 2019 | IJCAI | DVAN | 66.67 | 53.62 | 82.85 | 90.72 | 3.93 | |
14 | 2019 | EMNLP | DAN | 66.38 | 53.33 | 82.42 | 90.38 | 4.04 | [o-pytorch] |
15 | 2019 | ICCV | HACAN | 67.92 | 54.76 | 83.03 | 90.68 | 3.97 | |
16 | 2020 | AAAI | DualVD | 62.94 | 48.64 | 80.89 | 89.94 | 4.17 | [o-pytorch] |
17 | 2020 | CVPR | CAG | 67.56 | 54.64 | 83.72 | 91.48 | 3.75 | |
18 | 2020 | ACL | MVAN | 67.65 | 54.65 | 83.85 | 91.47 | 3.73 | [o-pytorch] |
19 | 2020 | EMNLP | VD-BERT | 70.04 | 57.79 | 85.34 | 92.68 | 4.04 | [o-pytorch] |
20 | 2022 | ICASSP | VU-BERT | 63.33 | 48.71 | 81.03 | 89.10 | 4.19 | |
21 | 2022 | MM | AlignVD | 71.65 | 59.64 | 88.30 | 94.72 | 2.96 | |
ID | Year | Venue | Model (or Authors) | NDCG | MRR | R@1 | R@5 | R@10 | MEAN↓ | code |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2017 | CVPR | MN* | 55.13 | 60.42 | 46.09 | 78.14 | 88.05 | 4.63 | |
2 | 2017 | NeurIPS | HCIAE* | 57.65 | 62.96 | 48.94 | 80.50 | 89.66 | 4.24 | |
3 | 2018 | CVPR | CoAtt* | 57.72 | 62.91 | 48.86 | 80.41 | 89.83 | 4.21 | |
4 | 2019 | ACL | ReDan | 59.32 | 64.21 | 50.60 | 81.39 | 90.26 | 4.05 | |
6 | 2020 | ECCV | VisDial-BERT | 64.94 | 69.10 | 55.88 | 85.50 | 93.29 | 3.25 | [o-pytorch] |
8 | 2020 | ECCV | LTMI | 62.72 | 62.32 | 48.94 | 78.65 | 87.88 | 4.86 | |
9 | 2020 | ACL | MVAN | 60.17 | 65.33 | 51.86 | 82.40 | 90.90 | 3.88 | [o-pytorch] |
10 | 2020 | ACL | MCA | 60.27 | 64.33 | 51.12 | 80.91 | 89.65 | 4.24 | [o-pytorch] |
12 | 2020 | EMNLP | VD-BERT | 63.22 | 67.44 | 54.02 | 83.96 | 92.33 | 3.53 | |
13 | 2021 | EMNLP | SGLKT | 63.41 | 63.34 | - | - | - | - | |
14 | 2021 | EMNLP | SGLKT† | 74.54 | 59.10 | - | - | - | - | |
15 | 2022 | ICASSP | ICMU | 64.30 | 69.14 | 56.80 | 85.09 | 93.42 | 3.37 | |
16 | 2022 | CVPR | UTC | 63.22 | 68.58 | 55.48 | 85.38 | 93.20 | 3.28 | |
17 | 2022 | MM | AlignVD | 67.22 | 70.45 | 57.64 | 87.06 | 94.20 | 3.05 | |
ID | Year | Venue | Model (or Authors) | NDCG | MRR | R@1 | R@5 | R@10 | MEAN↓ | code |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2020 | CVPR | P1+P2† | 73.63 | 50.56 | 37.99 | 63.98 | 77.95 | 7.26 | [o-pytorch] |
2 | 2020 | ECCV | VisDial-BERT† | 75.24 | 52.22 | 39.92 | 65.05 | 80.63 | 6.17 | [o-pytorch] |
3 | 2020 | ACL | MCA† | 72.18 | 46.92 | 32.09 | 63.85 | 78.06 | 7.37 | [o-pytorch] |
ID | Year | Venue | Model (or Authors) | NDCG | MRR | R@1 | R@5 | R@10 | MEAN↓ | code |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2017 | CVPR | LF | 45.21 | 55.42 | 40.95 | 72.45 | 82.83 | 5.95 | [o-torch] |
2 | 2017 | CVPR | HRE | 45.46 | 54.16 | 39.93 | 70.45 | 81.50 | 6.41 | [o-torch] |
3 | 2017 | CVPR | MN | 47.50 | 55.49 | 40.98 | 72.30 | 83.30 | 5.92 | [o-torch] |
4 | 2018 | ECCV | CorefNMN | 54.70 | 61.50 | 47.55 | 78.10 | 88.80 | 4.40 | |
5 | 2019 | CVPR | VGNN | 52.82 | 61.37 | 47.33 | 77.98 | 87.83 | 4.57 | [o-pytorch] |
6 | 2019 | CVPR | Sync | 57.32 | 62.20 | 47.90 | 80.43 | 89.95 | 4.17 | |
7 | 2019 | CVPR | RvA | 55.59 | 63.03 | 49.03 | 80.40 | 89.83 | 4.18 | [o-pytorch] |
8 | 2019 | CVPR | FGA | 52.10 | 63.70 | 49.58 | 80.97 | 88.55 | 4.51 | |
9 | 2019 | IJCAI | DVAN | 54.70 | 62.58 | 48.90 | 79.35 | 89.03 | 4.36 | |
10 | 2019 | EMNLP | DAN | 57.59 | 63.20 | 49.63 | 79.75 | 89.35 | 4.30 | [o-pytorch] |
11 | 2019 | ICCV | HACAN | 57.17 | 64.22 | 50.88 | 80.63 | 89.45 | 4.20 | |
12 | 2019 | ACL | ReDan | 61.86 | 53.13 | 41.38 | 66.07 | 74.50 | 8.91 | |
13 | 2020 | AAAI | CDF | 59.49 | 64.40 | 50.90 | 81.18 | 90.40 | 3.99 | |
14 | 2020 | AAAI | DualVD | 56.32 | 63.23 | 49.25 | 80.23 | 89.70 | 4.11 | [o-pytorch] |
15 | 2020 | CVPR | CAG | 56.64 | 63.49 | 49.85 | 80.63 | 90.15 | 4.11 | |
16 | 2020 | MM | KBGN | 57.60 | 64.13 | 50.47 | 80.70 | 90.16 | 4.08 | |
17 | 2020 | ECCV | VisDial-BERT | 63.87 | 67.50 | 53.85 | 84.68 | 93.25 | 3.32 | [o-pytorch] |
18 | 2020 | ECCV | LTMI | 60.92 | 60.65 | 47.00 | 77.03 | 87.75 | 4.90 | |
19 | 2020 | ACL | MVAN | 59.37 | 64.84 | 51.45 | 81.12 | 90.65 | 3.97 | [o-pytorch] |
20 | 2020 | EMNLP | VD-BERT | 59.96 | 65.44 | 51.63 | 82.23 | 90.68 | 3.90 | [o-pytorch] |
21 | 2021 | EMNLP | SGLKT | 61.97 | 62.28 | 48.15 | 79.65 | 89.10 | 4.34 | |
22 | 2022 | ICASSP | ICMU | 61.30 | 66.82 | 53.50 | 83.05 | 92.05 | 3.59 | |
23 | 2022 | CVPR | UTC | 64.60 | 68.70 | 55.73 | 84.93 | 93.08 | 3.32 | |
24 | 2022 | MM | AlignVD | 67.23 | 68.17 | 54.57 | 85.65 | 93.38 | 3.23 | |
25 | 2023 | CVPR | GST‡ | 64.91 | 68.44 | 55.05 | 85.18 | 93.35 | 3.23 | [o-pytorch] |
ID | Year | Venue | Model (or Authors) | NDCG | MRR | R@1 | R@5 | R@10 | MEAN↓ | code |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2020 | CVPR | P1+P2† | 71.60 | 48.58 | 35.98 | 62.08 | 77.23 | 7.48 | [o-pytorch] |
2 | 2020 | ECCV | VisDial-BERT† | 74.47 | 50.74 | 37.95 | 64.13 | 80.00 | 6.28 | [o-pytorch] |
3 | 2020 | ACL | MCA† | 72.47 | 37.68 | 20.67 | 56.67 | 72.12 | 8.89 | [o-pytorch] |
4 | 2020 | EMNLP | VD-BERT† | 74.54 | 46.72 | 33.15 | 61.58 | 77.15 | 7.18 | [o-pytorch] |
5 | 2021 | EMNLP | SGLKT† | 72.60 | 58.01 | 46.20 | 71.01 | 83.20 | 5.85 | |
6 | 2022 | ICASSP | VU-BERT† | 72.87 | 49.09 | 33.60 | 67.20 | 81.60 | 6.12 | |
7 | 2022 | MM | AlignVD† | 78.70 | 45.75 | 29.50 | 65.70 | 82.45 | 6.64 | |
8 | 2023 | CVPR | GST†‡ | 71.76 | 68.09 | 55.18 | 83.68 | 91.93 | 3.57 | [o-pytorch] |
ID | Year | Venue | Model (or Authors) | Accuracy | code |
---|---|---|---|---|---|
1 | 2017 | NeurIPS | AMEM | 96.39 | |
2 | 2018 | ECCV | CorefNMN | 99.30 | |
ID | Year | Venue | Model (or Authors) | Train err | Val err | Test err | code |
---|---|---|---|---|---|---|---|
1 | 2017 | CVPR | LSTM+VGG | 26.1 | 38.5 | 39.2 | [o-tensorflow] |
2 | 2017 | CVPR | HRED+VGG | 27.4 | 38.4 | 39.6 | [o-tensorflow] |
3 | 2018 | CVPR | A-ATT | 26.7 | 33.7 | 34.2 | |
4 | 2019 | ICCV | HACAN | 26.1 | 32.3 | 33.2 | |
ID | Year | Venue | Model (or Authors) | MRR | R@1 | R@5 | R@10 | MEAN↓ | code |
---|---|---|---|---|---|---|---|---|---|
1 | 2017 | CVPR | LF | 51.99 | 41.83 | 61.78 | 67.59 | 17.07 | [o-torch] |
2 | 2017 | CVPR | HRE | 52.37 | 42.29 | 62.18 | 67.92 | 17.07 | [o-torch] |
3 | 2017 | CVPR | HREA | 52.42 | 42.28 | 62.33 | 68.17 | 16.79 | [o-torch] |
4 | 2017 | CVPR | MN | 52.59 | 42.29 | 62.85 | 68.88 | 17.06 | [o-torch] |
5 | 2017 | NeurIPS | HCIAE | 53.86 | 44.06 | 63.55 | 69.24 | 16.01 | [o-pytorch] |
6 | 2018 | CVPR | CoAtt | 55.78 | 46.10 | 65.69 | 71.74 | 14.43 | |
7 | 2018 | ECCV | CorefNMN | 53.50 | 43.66 | 63.54 | 69.93 | 15.69 | |
8 | 2019 | CVPR | RvA | 55.43 | 45.37 | 65.27 | 72.97 | 10.71 | [o-pytorch] |
9 | 2019 | IJCAI | DVAN | 55.94 | 46.58 | 65.50 | 71.25 | 14.79 | |
10 | 2020 | AAAI | DMRM | 55.96 | 46.20 | 66.02 | 72.43 | 13.15 | [o-pytorch] |
11 | 2020 | ECCV | LTMI* | 55.85 | 46.07 | 65.97 | 72.44 | 14.17 | |
12 | 2020 | EMNLP | VD-BERT | 55.95 | 46.83 | 65.43 | 72.05 | 13.18 | [o-pytorch] |
13 | 2021 | ACL | MITVG | 56.83 | 47.14 | 67.19 | 73.72 | 11.95 | |
14 | 2022 | ICASSP | VU-BERT | 54.04 | 44.50 | 62.60 | 71.70 | 12.49 | |
15 | 2023 | CVPR | GST‡ | 60.03 | 50.40 | 70.74 | 77.15 | 12.13 | [o-pytorch] |
ID | Year | Venue | Model (or Authors) | NDCG | MRR | R@1 | R@5 | R@10 | MEAN↓ | code |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2017 | CVPR | MN* | 56.99 | 47.83 | 38.01 | 57.49 | 64.08 | 18.76 | |
2 | 2017 | NeurIPS | HCIAE* | 59.70 | 49.07 | 39.72 | 58.23 | 64.73 | 18.43 | |
3 | 2018 | CVPR | CoAtt* | 59.24 | 49.64 | 40.09 | 59.37 | 65.92 | 17.86 | |
4 | 2019 | ACL | ReDan | 60.47 | 50.02 | 40.27 | 59.93 | 66.78 | 17.40 | |
5 | 2020 | AAAI | DMRM | - | 50.16 | 40.15 | 60.02 | 67.21 | 15.19 | [o-pytorch] |
6 | 2020 | IJCAI | DAM | 60.93 | 50.51 | 40.53 | 60.84 | 67.94 | 16.65 | [o-pytorch] |
7 | 2020 | MM | KBGN | 60.42 | 50.05 | 40.40 | 60.11 | 66.82 | 17.54 | |
8 | 2020 | ECCV | LTMI | 63.58 | 50.74 | 40.44 | 61.61 | 69.71 | 14.93 | |
9 | 2021 | ACL | MITVG | 61.47 | 51.14 | 41.03 | 61.25 | 68.49 | 14.37 | |
10 | 2022 | CVPR | UTC | 63.86 | 52.22 | 42.56 | 62.40 | 69.51 | 15.67 | |
11 | 2023 | CVPR | GST‡ | 65.47 | 53.19 | 43.08 | 64.09 | 71.51 | 14.34 | [o-pytorch] |
- [VisDial] | CVPR'17 | Visual Dialog | [pdf] | [o-torch]
- [GuessWhat] | CVPR'17 | GuessWhat?! Visual Object Discovery Through Multi-modal Dialogue | [pdf] | [o-tensorflow]
- [HCIAE] | NeurIPS'17 | Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model | [pdf] | [o-pytorch]
- [AMEM] | NeurIPS'17 | Visual Reference Resolution using Attention Memory for Visual Dialog | [pdf]
- [CoAtt] | CVPR'18 | Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning | [pdf]
- [SF] | CVPR'18 | Two Can Play This Game: Visual Dialog With Discriminative Question Generation and Answering | [pdf]
- [A-ATT] | CVPR'18 | Visual Grounding via Accumulated Attention | [pdf]
- [CorefNMN] | ECCV'18 | Visual Coreference Resolution in Visual Dialog using Neural Module Networks | [pdf]
- [VGNN] | CVPR'19 | Reasoning Visual Dialogs with Structural and Partial Observations | [pdf] | [o-pytorch]
- [Sync] | CVPR'19 | Image-Question-Answer Synergistic Network for Visual Dialog | [pdf]
- [RvA] | CVPR'19 | Recursive Visual Attention in Visual Dialog | [pdf] | [o-pytorch]
- [FGA] | CVPR'19 | Factor Graph Attention | [pdf]
- [DVAN] | IJCAI'19 | Dual Visual Attention Network for Visual Dialog | [pdf]
- [DAN] | EMNLP'19 | Dual Attention Networks for Visual Reference Resolution in Visual Dialog | [pdf] | [o-pytorch]
- [HACAN] | ICCV'19 | Making History Matter: History-Advantage Sequence Training for Visual Dialog | [pdf]
- [ReDan] | ACL'19 | Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog | [pdf]
- [DMRM] | AAAI'20 | DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog | [pdf] | [o-pytorch]
- [CDF] | AAAI'20 | Modality-Balanced Models for Visual Dialogue | [pdf]
- [DualVD] | AAAI'20 | DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | [pdf] | [o-pytorch]
- [P1+P2] | CVPR'20 | Two Causal Principles for Improving Visual Dialog | [pdf] | [o-pytorch]
- [CAG] | CVPR'20 | Iterative Context-Aware Graph Inference for Visual Dialog | [pdf]
- [DAM] | IJCAI'20 | DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue | [pdf] | [o-pytorch]
- [KBGN] | MM'20 | KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue | [pdf]
- [VisDial-BERT] | ECCV'20 | Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline | [pdf] | [o-pytorch]
- [LTMI] | ECCV'20 | Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs | [pdf] | [o-pytorch]
- [MVAN] | ACL'20 | Multi-View Attention Network for Visual Dialog | [pdf] | [o-pytorch]
- [MCA] | ACL'20 | History for Visual Dialog: Do we really need it? | [pdf] | [o-pytorch]
- [VD-BERT] | EMNLP'20 | VD-BERT: A Unified Vision and Dialog Transformer with BERT | [pdf] | [o-pytorch]
- [MITVG] | ACL'21 | Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation | [pdf]
- [SGLKT] | EMNLP'21 | Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer | [pdf]
- [VU-BERT] | ICASSP'22 | VU-BERT: A Unified Framework for Visual Dialog | [pdf]
- [ICMU] | ICASSP'22 | Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning | [pdf]
- [UTC] | CVPR'22 | UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog | [pdf]
- [AlignVD] | MM'22 | Unsupervised and Pseudo-Supervised Vision-Language Alignment in Visual Dialog | [pdf]
- [GST] | CVPR'23 | The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training | [pdf] | [o-pytorch]
If you have any suggestions or find missing papers, please feel free to contact me.