When the Distribution Is the Answer
VizWiz Challenge

Denis Dushi, Sandro Pezzelle, Tassilo Klein, Moin Nabi

Page 1: When the Distribution Is the Answer, VizWiz Challenge. Sandro Pezzelle. Contacts: sandro.pezzelle@unitn.it, skype: sandro.pezzelle, mobile: +39

When the Distribution Is the Answer
VizWiz Challenge

Sandro Pezzelle

Contacts:
sandro.pezzelle@unitn.it
skype: sandro.pezzelle
mobile: +39 349 0537325
sandropezzelle.github.io
researchgate, linkedIn, scholar, arXiv

Work address:
CIMeC, University of Trento
Corso Bettini, 31
38068 Rovereto (TN), Italy

Skills

Languages
• Italian • English • French

Programming
• Unix • Python • Keras • Tensorflow
• Matlab • Psychtoolbox
• Lua/Torch

Statistics & Others
• R/RStudio • lme4 • ggplot2
• LaTeX • LibreOffice • Inkscape • Html

Soft Skills
• Communication • Writing • Organization • Learning • Networking

Sandro Pezzelle
PhD Student

About me
PhD Student in Cognitive and Brain Sciences, track Language, Interaction and Computation. My current research - at the intersection between Computational Linguistics, Computer Vision and Cognition - is focused on the learning of quantity expressions (numbers, proportions, quantifiers). I’d define myself as an enthusiastic, communicative, multi-faceted person. Proactive and inclined to lifelong learning. “Let’s try!” as a personal motto. My code is full of print().

Education
2015 - present, PhD in Cognitive and Brain Sciences
CIMeC, University of Trento, Italy. Supervisor: Raffaella Bernardi
Computational Linguistics, Computer Vision, Cognitive Sciences, Machine Learning, AI

2012 - 2015, MSc in Linguistics, 110/110 cum laude
University of Padova, Italy. Supervisors: Laura Vanelli, Marco Marelli
Distributional Semantics, Psycholinguistics, Morphology

Jan 2014 - Jul 2014, Erasmus Program
Universite Catholique de Louvain, Belgium.
Applied Linguistics, Computational Linguistics, Statistics

2009 - 2012, BSc in Modern Literature, 110/110 cum laude
University of Padova, Italy. Supervisor: Luca Zuliani
Stylistic and Metrics, Formal Linguistics, Philology

Relevant Experience
Oct 2017, Research Intern
ILLC, University of Amsterdam. Supervisor: Jakub Szymanik
Distributional Semantics, Formal Linguistics, Language Modelling

Nov 2016 - Jun 2017, Language Specialist
Appen. Part-time, project-oriented remote position
Computational Linguistics, Formal Linguistics

Training
2017 Mini-Symposium on Deep Generative Models, Amsterdam
2017 iV&L Training School on Cognitive Robotics, Athens
2016 26th ESSLLI, Bolzano
2016 iV&L Training School on Deep Learning, Malta
2015 - 2016 Machine Learning by Stanford University, Coursera

Recent Presentations
Oct 24, 2017 Learning to Quantify from Language and Vision: Insights from Behavioral and Computational Studies. Talk at Comp. Ling. Series, Amsterdam.
Sep 28, 2017 Quantifiers and Proportions in Language and Vision: Insights from Behavioral and Computational Studies. Talk at CoSaQ Workshop, Amsterdam.
Sep 26, 2017 Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision. Poster at Google NLP Summit, Zurich.

Denis Dushi, Sandro Pezzelle, Tassilo Klein, Moin Nabi

Page 2

INTERNAL © 2018 SAP SE or an SAP affiliate company. All rights reserved.

VQA Task

Input

Q: “What is this?”

Annotations

A1   bottle
A2   bottle
A3   tv
A4   office
A5   bottle
A6   tv
A7   bottle
A8   room
A9   office
A10  bottle

answer   count
bottle   5
tv       2
office   2
room     1

Ground Truth

“bottle”

Page 3

VQA Evaluation metric

Annotations

answer   count
bottle   5
tv       2
office   2
room     1

Ground Truth

“bottle”

$$\text{accuracy} = \min\left(\frac{\#\,\text{annotators providing that answer}}{3},\ 1\right) \qquad (1)$$

$$L(x, c, w) = \sum_{i=1}^{|c|} w_i \left( -\log \frac{e^{x_{c_i}}}{\sum_{j=1}^{|x|} e^{x_j}} \right) \qquad (2)$$

Evaluation Accuracy [1]

prediction   accuracy
bottle       100%
tv           ~67%
office       ~67%
room         ~33%

Training Loss

[1] Antol et al. (2015). VQA: Visual Question Answering. Proceedings of the IEEE International Conference on Computer Vision: 2425–2433.
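The accuracy values on this slide can be reproduced with a short Python sketch of Eq. (1) as written here. The function name `vqa_accuracy` is hypothetical and chosen for illustration; the annotation list is the A1 to A10 example from the VQA Task slide.

```python
from collections import Counter


def vqa_accuracy(prediction, annotations):
    """Eq. (1): min(# annotators providing that answer / 3, 1)."""
    count = Counter(annotations)[prediction]
    return min(count / 3, 1.0)


# The ten annotations from the "What is this?" example.
annotations = [
    "bottle", "bottle", "tv", "office", "bottle",
    "tv", "bottle", "room", "office", "bottle",
]

for answer in ("bottle", "tv", "office", "room"):
    print(f"{answer}: {vqa_accuracy(answer, annotations):.0%}")
```

An answer given by three or more annotators scores full credit, so several different predictions can all be rewarded for the same question.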

Page 4

Subjectivity

[2] Jolly, Pezzelle et al. (2018). The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA

Page 5

Coverage analysis

• Coverage of samples considering all the annotations

num answers/classes   1      2      5      50     300    3000   40271
num samples (train)   9541   11570  12531  14963  17046  19425  20K
% samples (train)     47.70  57.85  62.65  74.81  85.23  97.12  100

Table 1: Number and percentage of samples covered by using the top-N answers (row 1).
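One way to compute such a coverage table is sketched below in Python. It assumes, as an interpretation not stated on the slide, that answers are ranked by how often they occur across all annotations and that a sample counts as covered when at least one of its annotations is among the top-N answers. The function name `coverage` and the toy data are hypothetical.

```python
from collections import Counter


def coverage(samples, top_n):
    """Return (number, percentage) of samples with an annotation in the top-N answers.

    samples: list of lists, one list of annotated answers per question.
    Assumption: answers are ranked by total frequency over all annotations.
    """
    freq = Counter(answer for answers in samples for answer in answers)
    top = {answer for answer, _ in freq.most_common(top_n)}
    covered = sum(1 for answers in samples if any(a in top for a in answers))
    return covered, 100.0 * covered / len(samples)


# Toy data (not from VizWiz): four questions with three annotations each.
toy_samples = [
    ["unanswerable", "unanswerable", "bottle"],
    ["bottle", "bottle", "tv"],
    ["unanswerable", "unanswerable", "dog"],
    ["room", "tv", "room"],
]

for n in (1, 2, 5):
    print(n, coverage(toy_samples, n))
```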

Page 6

Most frequent answer: unanswerable

count   covered samples   % covered samples
1       3059              32%
2       1878              20%
≥ 3     4604              48%
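The three rows sum to 9541, the number of training samples covered by the top-1 answer in the coverage table, which suggests that "count" is the number of annotators who answered "unanswerable" for a given sample. Under that assumption, a breakdown of this kind could be sketched as follows; the function name and toy data are hypothetical.

```python
from collections import Counter


def agreement_breakdown(samples, answer):
    """Bucket samples by how many annotators gave `answer`: 1, 2, or >=3."""
    buckets = Counter()
    for answers in samples:
        k = answers.count(answer)
        if k == 0:
            continue  # sample is not covered by this answer
        buckets["1" if k == 1 else "2" if k == 2 else ">=3"] += 1
    return dict(buckets)


# Toy data (not from VizWiz).
toy = [
    ["unanswerable", "cup", "mug"],
    ["unanswerable", "unanswerable", "mug"],
    ["unanswerable"] * 3,
    ["unanswerable"] * 5 + ["pen"],
    ["cup", "mug", "pen"],
]
print(agreement_breakdown(toy, "unanswerable"))
```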

Page 7

Uncertainty-aware training

• Methods that use only the most-frequent answer ignore:

1. Contribution of other answers

2. Uncertainty of each answer

Uncertainty-aware training: uncertainty is modeled as the agreement among human annotators

Page 8

Soft cross-entropy loss [3]

«What's the weather like outside on this photo? Thank you»

• Standard VQA model [4]

VQA Model

Answer counts over the 10 annotations:

count   answer
7       cloudy
0       unsuitable
0       yes
2       overcast
0       blue
0       dog
...

$$L(x, c, w) = \sum_{i=1}^{|c|} w_i \left( -\log \frac{e^{x_{c_i}}}{\sum_{j=1}^{|x|} e^{x_j}} \right) \qquad (2)$$

[3] Ilievski et al. (2017). A simple loss function for improving the convergence and accuracy of visual question answering models.

[4] Kazemi et al. (2017). Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering.
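A minimal Python sketch of Eq. (2) may make the loss concrete. Here x is the vector of model scores (logits), c the indices of the answers given by annotators, and w their weights. The slide does not say how w is derived, so the sketch assumes each weight is the answer count divided by the number of annotations (for example 7/10 for "cloudy"); the helper names and the six-word vocabulary are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_cross_entropy(logits, classes, weights):
    """Eq. (2): sum_i w_i * (-log softmax(x)[c_i])."""
    log_probs = log_softmax(logits)
    return sum(w * -log_probs[c] for c, w in zip(classes, weights))


def targets_from_counts(counts, num_annotations=10):
    """Assumption: weight = answer count / number of annotations."""
    classes = [i for i, n in enumerate(counts) if n > 0]
    weights = [counts[i] / num_annotations for i in classes]
    return classes, weights


vocab = ["cloudy", "unsuitable", "yes", "overcast", "blue", "dog"]
counts = [7, 0, 0, 2, 0, 0]  # counts shown on the slide for this vocabulary
classes, weights = targets_from_counts(counts)

uniform_loss = soft_cross_entropy([0.0] * len(vocab), classes, weights)
confident_loss = soft_cross_entropy([5.0, 0.0, 0.0, 3.0, 0.0, 0.0], classes, weights)
print(uniform_loss, confident_loss)
```

With a single class and weight 1, the expression reduces to the ordinary cross-entropy on the most frequent answer, which is the case the soft loss generalizes.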

Page 9

Results

• Accuracy on validation split

Dataset        Augmented VizWiz   50% VizWiz   Balanced VizWiz
manipulation   augmented train    50% train    balanced val
accuracy       0.501              0.446        0.111

num answers/classes          1      2      5      50     300    3000   40271
soft-loss model acc. (val)   0.349  0.402  0.424  0.481  0.504  0.516  0.512

Table 2: Accuracy of soft-loss model using N classes in prediction.

                            Actual class (most freq. answer)
predicted class             other   unanswerable / unsuitable
other                       1199    118
unanswerable / unsuitable   1052    804

Table 3: Confusion matrix. unanswerable and unsuitable are the answers with the highest coverage of samples in VizWiz.

                only Text   only Vision   Multimodal
unanswerable    0.784       0.796         0.803
other           0.138       0.299         0.340
yes/no          0.499       0.346         0.690
number          0.243       0.319         0.285
tot. accuracy   0.377       0.476         0.516

Table 4: Ablation study.

• Accuracy on test-challenge split

method    acc
SoA [5]   0.475
Ours      0.512

[5] Gurari et al. (2018). VizWiz Grand Challenge: Answering Visual Questions from Blind People.

Page 10

Preprocessing

1. Smartly stripping punctuation
   e.g. “can’t” → “cant”

2. Filtering conversational words
   e.g. “hello”, “please”, “thank you”, “goodbye” ...

• Accuracy on test-challenge

method          acc
SoA [5]         0.4750
Ours            0.5120
Ours + prepro   0.5163

[5] Gurari et al. (2018). VizWiz Grand Challenge: Answering Visual Questions from Blind People.
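The two preprocessing steps could look roughly like the Python sketch below. The slide gives only the examples, so the exact rules are assumptions: apostrophes are dropped inside words (so “can’t” becomes “cant”), other punctuation is replaced by a space, and only the four conversational phrases listed on the slide are filtered. All function names are hypothetical.

```python
import re

# Phrases listed on the slide; the real list is presumably longer.
CONVERSATIONAL = ("thank you", "hello", "please", "goodbye")


def strip_punctuation(text):
    """Lowercase, drop apostrophes, and turn other punctuation into spaces."""
    text = text.lower()
    text = re.sub(r"[’']", "", text)      # "can't" -> "cant"
    text = re.sub(r"[^\w\s]", " ", text)  # remaining punctuation -> space
    return " ".join(text.split())


def filter_conversational(text):
    """Remove conversational words and phrases that carry no visual content."""
    for phrase in CONVERSATIONAL:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    return " ".join(text.split())


def preprocess(question):
    return filter_conversational(strip_punctuation(question))


print(preprocess("What's the weather like outside on this photo? Thank you"))
```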

Page 11

Answerability task

Label: 0/1 (unanswerable/answerable)

1. Change output layer of multi-class model

2. Balance dataset
   Imbalanced dataset (71.3% answerable)
   • Up-sampling
   • Down-sampling

• Accuracy on test-dev

method      F1      AP
Ours        65.02   74.71
Ours + Up   68.84   74.73

• Accuracy on test-challenge

method      F1      AP
SoA [5]     -       71.7
Ours + Up   67.71   73.11

[5] Gurari et al. (2018). VizWiz Grand Challenge: Answering Visual Questions from Blind People.
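Up-sampling the minority class can be sketched in Python as below. This is one common way to do it (random duplication of minority samples until both classes have the same size), not necessarily the procedure behind the "Ours + Up" rows; the function name, the seed parameter, and the toy data are hypothetical.

```python
import random


def upsample_minority(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until classes are balanced.

    labels: 0 (unanswerable) or 1 (answerable), following the slide.
    Returns a shuffled list of (sample, label) pairs.
    """
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    minority = min(by_label, key=lambda y: len(by_label[y]))
    majority = 1 - minority
    if not by_label[minority]:
        raise ValueError("cannot up-sample an empty class")

    deficit = len(by_label[majority]) - len(by_label[minority])
    extra = [rng.choice(by_label[minority]) for _ in range(deficit)]

    balanced = [(s, majority) for s in by_label[majority]]
    balanced += [(s, minority) for s in by_label[minority] + extra]
    rng.shuffle(balanced)
    return balanced


# Toy data: 7 answerable and 3 unanswerable questions.
toy_samples = list(range(10))
toy_labels = [1] * 7 + [0] * 3
balanced = upsample_minority(toy_samples, toy_labels)
print(sum(1 for _, y in balanced if y == 0), sum(1 for _, y in balanced if y == 1))
```

Down-sampling would instead discard majority samples, which balances the classes at the cost of training data.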

Page 12

Conclusion

1. Multi-class task
   • Soft cross-entropy
   • Smart preprocessing

2. Answerability task
   Binary classifier with up-sampling of unanswerable samples

Page 13

Denis Dushi Sandro Pezzelle Tassilo Klein Moin Nabi

Thank you.
(Answerable) Questions?