Documentation (English)

ViT Base

Vision Transformer Base model for image classification

Vision Transformer Base (ViT-Base-Patch16-224) splits each 224×224-pixel input image into 16×16-pixel patches and applies self-attention across the resulting patch sequence to classify the image. It achieves strong accuracy and benefits significantly from fine-tuning on domain-specific images.
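The patch-splitting step can be illustrated with a small standard-library sketch. It is not the model's actual preprocessing code: the function name `split_into_patches` and the dummy single-channel image are assumptions made for illustration, and the real model additionally projects each flattened patch through a learned embedding.

```python
def split_into_patches(image, patch_size=16):
    """Split an H x W image (a list of pixel rows) into non-overlapping
    square patches, returned in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Take patch_size rows, and from each row patch_size columns.
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# Dummy 224 x 224 single-channel image (illustrative values only).
image = [[(y * 224 + x) % 256 for x in range(224)] for y in range(224)]

patches = split_into_patches(image)
num_patches = len(patches)          # (224 // 16) ** 2
flattened_dim_rgb = 16 * 16 * 3     # values per patch for an RGB image

print(num_patches, flattened_dim_rgb)  # prints "196 768"
```

With a 224×224 input and 16×16 patches, the image becomes a sequence of 196 patches, which is the sequence the self-attention layers operate on.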

When to use:

  • Custom image category classification after fine-tuning
  • Medical imaging, product categorization, quality inspection

Input: Image file (PNG, JPG) + optional fine-tuned checkpoint

Output: Predicted class label and confidence scores

Inference Settings

There are no dedicated inference-time settings. The model classifies images deterministically using the loaded checkpoint, so the same image and checkpoint always produce the same prediction. The output class labels are determined by the categories in the fine-tuned model.
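As a rough sketch of how a class label and confidence scores could be derived from a classifier's raw outputs, the example below applies a softmax and picks the highest-scoring category. The label mapping `id2label`, the logit values, and the helper names are hypothetical; the actual categories come from the loaded checkpoint, and this is not necessarily how the product computes its scores.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, id2label):
    """Return the top label and a per-label confidence mapping."""
    scores = softmax(logits)
    best = max(range(len(scores)), key=scores.__getitem__)
    confidences = {id2label[i]: score for i, score in enumerate(scores)}
    return id2label[best], confidences


# Hypothetical categories of a fine-tuned quality-inspection checkpoint.
id2label = {0: "defect", 1: "ok", 2: "needs_review"}
# Made-up logits standing in for the model's output on one image.
logits = [0.3, 2.1, -1.0]

label, confidences = predict(logits, id2label)
print(label)  # prints "ok"
```

Because no sampling is involved, calling `predict` again with the same logits returns the same label and scores.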
