SAM (Segment Anything Model)
Foundation model for promptable instance segmentation with points, boxes, or masks
SAM (Segment Anything Model) is a foundation model that can segment any object in an image from a variety of prompts: point clicks, bounding boxes, or rough masks. Unlike traditional models, which require retraining for each new class, SAM's zero-shot capability enables interactive segmentation of arbitrary objects without additional training, making it revolutionary for annotation tools and flexible segmentation tasks.
When to Use SAM
SAM is ideal for:
- Interactive segmentation with user prompts
- Zero-shot segmentation without training data for specific classes
- Annotation tools for creating training data
- Flexible segmentation where object classes aren't predefined
- Research and prototyping requiring quick segmentation
Note: SAM is inference-only in this system: training is not supported, but fine-tuned checkpoints can be loaded for inference.
Strengths
- Promptable: Segment anything by pointing or boxing
- Zero-shot: Works on novel objects without training
- Interactive: Real-time feedback for user-guided segmentation
- Versatile: Multiple prompt types (points, boxes, masks)
- Foundation model: Pre-trained on the SA-1B dataset of over 1 billion masks
- Multi-mask output: Generates multiple plausible segmentations
Weaknesses
- Inference only: Cannot be trained in this system
- No semantic labels: Produces masks only, not class labels
- Interactive: Requires a user prompt for each segmentation
- Not batch-friendly: Not optimized for fully automatic processing
- Large checkpoints: ~2.4GB for the ViT-H variant
Parameters
Inference Configuration
- Input Image: Image to segment
- Finetuned Checkpoint (Optional): Fine-tuned SAM weights
- Prompt Points (Optional): List of (x, y) coordinates with labels (foreground/background)
- Prompt Boxes (Optional): Bounding box coordinates (x1, y1, x2, y2)
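As a rough sketch of how these prompt inputs are typically shaped (the array layout is assumed from common SAM usage; coordinates are hypothetical):

```python
import numpy as np

# Shape sketch of SAM-style prompt inputs (illustrative helper, not the real API).
def make_prompts():
    # N point prompts: (x, y) pixel coordinates plus one label per point,
    # where 1 = foreground (include) and 0 = background (exclude).
    point_coords = np.array([[250, 300], [100, 80]], dtype=np.float32)  # (N, 2)
    point_labels = np.array([1, 0], dtype=np.int32)                     # (N,)
    # One box prompt in XYXY pixel coordinates: (x1, y1, x2, y2).
    box = np.array([50, 60, 400, 500], dtype=np.float32)               # (4,)
    return point_coords, point_labels, box

coords, labels, box = make_prompts()
print(coords.shape, labels.shape, box.shape)  # (2, 2) (2,) (4,)
```

The same arrays work whether you pass one prompt type or several together.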
Multimask Output (Default: true)
- Generate multiple masks with different levels of granularity
- Recommended to keep true for flexibility
- Model automatically ranks masks by quality score
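Ranking the candidates can be sketched with mock arrays standing in for real model output:

```python
import numpy as np

# Mock multimask output: 3 candidate masks over a 4x4 image, plus quality scores.
masks = np.zeros((3, 4, 4), dtype=bool)
masks[0, :1] = True   # coarse candidate
masks[1, :2] = True   # medium candidate
masks[2, :3] = True   # fine candidate
scores = np.array([0.71, 0.88, 0.80])

# Keep the highest-scoring mask, following the model's own quality ranking.
best = masks[scores.argmax()]
print(scores.argmax(), best.sum())  # 1 8
```

In an interactive tool you might instead show all three candidates and let the user pick.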
Mask Threshold (Default: 0.0)
- Threshold applied to the model's mask logits when converting soft masks to binary
- 0.0 is the model's default
- Increase (e.g., 0.5) for tighter masks; decrease for looser ones
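The effect of the threshold can be sketched with mock logits: raising the cutoff shrinks the binary mask.

```python
import numpy as np

# Mock soft-mask logits; SAM binarizes logits at the chosen threshold.
logits = np.array([[-1.0, 0.2],
                   [ 0.4, 0.9]])

loose = logits > 0.0   # default threshold: 3 foreground pixels
tight = logits > 0.5   # higher threshold: tighter mask, 1 foreground pixel
print(loose.sum(), tight.sum())  # 3 1
```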
Usage Patterns
Point Prompts
Click on objects to segment them. Use positive points (foreground) and negative points (background) to refine.
Example: Click center of object (positive), click background areas (negative) to exclude unwanted regions
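The refinement loop above amounts to growing the prompt arrays click by click (array layout assumed; the coordinates are hypothetical):

```python
import numpy as np

# Start with a single positive click near the object center.
point_coords = np.array([[220.0, 180.0]])  # (1, 2) pixel (x, y)
point_labels = np.array([1])               # 1 = foreground

# If the first mask bleeds into the background, add a negative click there.
point_coords = np.vstack([point_coords, [[60.0, 40.0]]])
point_labels = np.append(point_labels, 0)  # 0 = background

print(point_coords.shape, point_labels.tolist())  # (2, 2) [1, 0]
```

Each refinement re-runs prediction with the enlarged arrays; the image itself is only encoded once.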
Box Prompts
Draw bounding box around object for quick segmentation.
Example: Drag box around person - SAM segments precise boundaries
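A box prompt is just XYXY pixel coordinates; one common source is the tight bounding box of an existing rough mask (e.g., from a detector). A small sketch:

```python
import numpy as np

# Rough binary mask, e.g. from a prior detector or a hand-drawn region.
mask = np.zeros((10, 10), dtype=bool)
mask[3:7, 2:8] = True

# Tight XYXY bounding box around the mask, usable as a box prompt.
ys, xs = np.nonzero(mask)
box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])
print(box)  # [2 3 7 6]
```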
Combining Prompts
Use both points and boxes for maximum control.
Example: Box around object + negative points to exclude overlapping objects
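Combined prompts are simply both structures passed together; a sketch of a box around the target plus a negative point inside an overlapping object (hypothetical coordinates, illustrative dict rather than the real API):

```python
import numpy as np

# Box around the target object, plus a negative point on the overlapping object.
prompt = {
    "box": np.array([120, 90, 420, 380]),    # XYXY around the target
    "point_coords": np.array([[400, 110]]),  # click on the overlapping object
    "point_labels": np.array([0]),           # 0 = background: exclude it
}
print(sorted(prompt))  # ['box', 'point_coords', 'point_labels']
```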
Configuration Tips
Best Practices
- Start with single positive point on object center
- Add negative points to refine boundaries
- Use boxes for quick rough segmentation
- Combine prompts for complex scenarios
- Keep multimask_output=true to review alternative masks
When to Use SAM
Interactive Annotation: Creating training data for other models - SAM accelerates manual annotation
Zero-shot Tasks: Need to segment objects without training data - SAM works immediately
Flexible Applications: Object classes change frequently - no retraining needed
Prototyping: Quick experimentation with segmentation - iterate without training
When NOT to Use SAM
Fully Automatic: Need batch processing without interaction - use trained segmentation models instead
Semantic Labels: Need class labels not just masks - SAM doesn't classify, only segments
Real-time Automatic: Need automatic detection + segmentation - use Mask R-CNN or DETR Segmentation
Output
- Segmentation Masks: NumPy arrays of binary masks
- Mask Image: Visualization of the masks overlaid on the input image
- Scores: Quality/confidence scores for each predicted mask (when multimask_output=true)
Example Use Cases
Creating Training Data
Scenario: Need to annotate 1,000 images for custom segmentation task
Why SAM: Dramatically faster than manual pixel-level annotation. Click object, review mask, accept/refine. Can create training set in hours instead of days.
Research Prototyping
Scenario: Testing segmentation idea on new object types
Why SAM: Zero-shot capability means immediate results without collecting and annotating training data.
Interactive Photo Editing
Scenario: Consumer app for selecting and editing objects in photos
Why SAM: Users click objects, get instant precise selections without technical knowledge.
Flexible Segmentation System
Scenario: Segmentation needs change based on user requirements
Why SAM: Can segment any object on-demand without model retraining for each new class.
Comparison with Alternatives
SAM vs Mask R-CNN
Choose SAM when:
- Interactive/promptable segmentation needed
- Zero-shot on novel objects
- Creating annotation tools
- Object classes undefined or changing
Choose Mask R-CNN when:
- Fully automatic segmentation required
- Fixed set of known classes
- Batch processing thousands of images
- Need semantic class labels
- Training data available
SAM vs DETR Segmentation
Choose SAM when:
- Promptable interaction needed
- No training data available
- Quick prototyping
- Flexible, undefined object classes
Choose DETR Segmentation when:
- Automatic panoptic segmentation
- Specific trained classes
- Batch inference
- Unified detection + segmentation
- Can train custom model
SAM vs SegFormer
Choose SAM when:
- Instance segmentation (separate objects)
- Interactive prompting
- Zero-shot capability needed
Choose SegFormer when:
- Semantic segmentation (pixel classes)
- Fully automatic processing
- Dense scene labeling
- Can train on custom data
Technical Notes
- Model Variants: SAM comes in ViT-B, ViT-L, and ViT-H; ViT-H (Huge) is the default and highest quality
- Inference Speed: Roughly 50-200ms per image, depending on prompt complexity and GPU; the image embedding is computed once and reused across prompts
- Memory: ~2-4GB GPU memory for inference
- Fine-tuning: Possible outside this system, load fine-tuned checkpoints for specialized domains