
Multimodal Reranking

Score image-text or image-image similarity for retrieval and recommendation

Multimodal reranking models score the relevance between images and text (or between pairs of images). Use them as a second-stage ranker after embedding-based retrieval to improve the quality of the final results.
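The two-stage pattern can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not this service's API. The embeddings are made-up 2-D vectors, file names stand in for images, and `toy_score_pair` is a hypothetical placeholder for a call to a multimodal reranker. Stage 1 ranks the whole corpus by cosine similarity and keeps the top k. Stage 2 rescores only those k candidates with the pairwise scorer.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query_emb: Sequence[float],
    query: str,
    corpus_embs: Sequence[Sequence[float]],
    corpus: Sequence[str],
    score_pair: Callable[[str, str], float],
    k: int = 3,
    top_n: int = 2,
) -> List[str]:
    # Stage 1: cheap embedding similarity over the whole corpus; keep the k best candidates.
    candidates = sorted(
        range(len(corpus)),
        key=lambda i: cosine(query_emb, corpus_embs[i]),
        reverse=True,
    )[:k]
    # Stage 2: run the (more expensive) pairwise scorer only on those k candidates.
    reranked = sorted(candidates, key=lambda i: score_pair(query, corpus[i]), reverse=True)
    return [corpus[i] for i in reranked[:top_n]]


# Hypothetical stand-in for a multimodal reranker: counts query words found in the file name.
def toy_score_pair(query: str, item: str) -> float:
    return float(len(set(query.split()) & set(item.replace(".", "_").split("_"))))


corpus = ["cat_on_sofa.jpg", "dog_in_park.jpg", "cat_in_park.jpg", "car_on_road.jpg"]
corpus_embs = [[1.0, 0.1], [0.1, 1.0], [0.7, 0.7], [-1.0, 0.0]]  # made-up 2-D embeddings
result = retrieve_then_rerank([1.0, 0.5], "cat in park", corpus_embs, corpus, toy_score_pair)
print(result)  # ['cat_in_park.jpg', 'dog_in_park.jpg']
```

In this run the reranker promotes dog_in_park.jpg above cat_on_sofa.jpg, which the embedding stage had ranked higher. Keeping k small is the point of the design: the pairwise scorer runs k times rather than once per corpus item.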

Available Models

  • SigLIP Cross-Encoder – Image-text cross-encoder based on SigLIP for visual-text relevance scoring
  • CLIP Cross-Encoder – CLIP-based cross-encoder for image-text and image-image similarity scoring
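The base SigLIP and CLIP models differ in how they turn image-text similarity into a score, which matters when comparing scores across queries. The Python sketch below assumes scores are derived from a scaled cosine similarity, as in the original models. This page does not say whether the hosted cross-encoders return scores this way, and the temperature `T` and bias `B` are hypothetical values. SigLIP-style scoring applies a sigmoid to each pair independently, while CLIP-style scoring applies a softmax across the candidate set.

```python
import math
from typing import List, Sequence


def sigmoid_scores(cos_sims: Sequence[float], t: float, b: float) -> List[float]:
    """SigLIP-style: each pair is scored on its own as sigmoid(t * cos + b)."""
    return [1.0 / (1.0 + math.exp(-(t * s + b))) for s in cos_sims]


def softmax_scores(cos_sims: Sequence[float], t: float) -> List[float]:
    """CLIP-style: softmax of t * cos across all candidates, so scores sum to 1."""
    logits = [t * s for s in cos_sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


T, B = 10.0, -5.0  # hypothetical temperature and bias, not values from either model
sims = [0.9, 0.3, 0.2]
sig = sigmoid_scores(sims, T, B)
soft = softmax_scores(sims, T)

# Add a strong extra candidate: softmax scores of the others shift, sigmoid scores do not.
sig_more = sigmoid_scores(sims + [0.95], T, B)
soft_more = softmax_scores(sims + [0.95], T)
print([round(x, 3) for x in sig])  # [0.982, 0.119, 0.047]
```

Because a softmax score depends on the other candidates, adding a strong candidate lowers the CLIP-style scores of the rest, while the SigLIP-style scores stay unchanged.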
