Vision Language Models

Models that understand both images and text, used for captioning, visual question answering (VQA), and document understanding

Vision-language models jointly process images and text for captioning, visual question answering, and structured document extraction.

Available Models

  • BLIP-2 – Image captioning, visual question answering, and image-text retrieval
  • LayoutLMv3 – Document understanding combining text, layout, and image for forms, receipts, and invoices
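The two models above expect differently prepared inputs, and a minimal sketch may make that concrete. It assumes the Hugging Face `transformers` conventions: BLIP-2 checkpoints are commonly prompted for question answering with a "Question: ... Answer:" string, and LayoutLM-family models take word bounding boxes scaled to a 0-1000 integer grid. The helper names `build_vqa_prompt` and `normalize_box` are hypothetical, chosen for illustration only, and are not part of any library or of this documentation's API.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixels, origin at the top-left corner of the page.
Box = Tuple[float, float, float, float]


def build_vqa_prompt(question: str) -> str:
    """Format a question in the 'Question: ... Answer:' style (assumed BLIP-2 convention)."""
    return f"Question: {question.strip()} Answer:"


def normalize_box(box: Box, page_width: float, page_height: float) -> List[int]:
    """Scale a pixel-space box to the 0-1000 grid (assumed LayoutLMv3 convention)."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    x0, y0, x1, y1 = box
    scaled = [
        1000 * x0 / page_width,
        1000 * y0 / page_height,
        1000 * x1 / page_width,
        1000 * y1 / page_height,
    ]
    # Clamp to [0, 1000] so OCR boxes that spill slightly off the page stay valid.
    return [max(0, min(1000, int(v))) for v in scaled]


if __name__ == "__main__":
    print(build_vqa_prompt("What is the invoice total?"))
    print(normalize_box((50, 100, 300, 140), page_width=1000, page_height=2000))
```

In a real pipeline the prompt string would be passed to the model's processor together with the image, and the normalized boxes would accompany the OCR words for a form, receipt, or invoice. Both steps are left out here because they depend on the specific checkpoint and OCR engine in use.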
