Multimodal Classification
Universal multimodal classification combining multiple image sources and feature columns.
A universal multimodal classification model that combines images from multiple modalities with structured feature data. Feature and target columns are user-selectable.
When to use:
- Combining image data with tabular features for classification
- Multi-modal medical diagnosis (imaging + patient data)
- Industrial quality classification combining multiple sensor images
Input:
- Finetuned Checkpoint (optional): Fine-tuned model weights
- Input Images (required): Directory containing input images from multiple modalities
- Prompt (optional): Text prompt about the images
Output: Classification result and generation metadata
Model Settings (set during training, used at inference)
Feature Columns (required during training): list of feature column names used as model input. The same columns must be available at inference time.
Target Column (required during training): name of the target column the model was trained to predict; it determines the class labels at inference.
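Because the feature columns fixed at training time must also be present at inference, a schema check like the following can catch mismatches early. The helper name `check_inference_columns` is hypothetical, not part of the node's API.

```python
def check_inference_columns(available, feature_columns, target_column=None):
    """Verify that every training-time feature column is present in the
    inference data. The target column is only required when evaluating
    against known labels, so it is optional here."""
    missing = [c for c in feature_columns if c not in available]
    if missing:
        raise ValueError(f"missing feature columns at inference: {missing}")
    if target_column is not None and target_column not in available:
        raise ValueError(f"target column {target_column!r} not found")
```

Running this against the inference table's column names before prediction turns a silent feature mismatch into an explicit error.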
Inference Settings
No dedicated inference-time settings. Results depend on the feature and target configuration from the training run.