Vision-Language Models

Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding

We study how concreteness-aware negative mining can improve compositional understanding in vision-language models, and introduce ConcretePlant, Cement loss, and Slipform to make …

eun-woo-im