Mask R-CNN

Industry-standard instance segmentation extending Faster R-CNN with mask prediction

Mask R-CNN extends Faster R-CNN by adding a mask prediction branch, enabling instance segmentation alongside object detection. It remains the industry standard for instance segmentation due to its reliability, strong performance, and extensive production deployment history. The model predicts bounding boxes, class labels, and pixel-level masks for each object instance.

When to Use Mask R-CNN

Mask R-CNN is ideal for:

  • Production instance segmentation requiring proven reliability
  • Separating individual object instances with precise boundaries
  • Projects needing both detection and segmentation
  • Teams that want a mature, well-documented architecture
  • Datasets with 1,000+ annotated instances

Strengths

  • Industry standard: Widely deployed in production systems
  • Reliable and stable: Mature architecture with predictable behavior
  • Good accuracy: Strong performance across diverse tasks
  • Fast training: Converges faster than DETR-based approaches
  • Flexible backbones: ResNet-50 or ResNet-101 options
  • Well-optimized: Years of engineering improvements
  • Extensive documentation: Large community and resources

Weaknesses

  • Not state-of-the-art (newer transformers can be more accurate)
  • Anchor-based approach requires tuning
  • NMS post-processing needed (not end-to-end)
  • Less elegant than transformer architectures
  • Slower inference than YOLO-based methods
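
To make the NMS point above concrete, here is a minimal sketch of greedy non-maximum suppression in plain Python. It is a simplified, class-agnostic illustration, not the implementation used by this tool; the function names `box_iou` and `nms` and the (x1, y1, x2, y2) box format are assumptions chosen for the example.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (illustrative values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed and only the first and third boxes remain. The IoU threshold is the kind of value that may need tuning for crowded scenes.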

Parameters

Training Configuration

Training Images: Folder with images

Annotations: COCO-format JSON with instance masks (polygons or RLE)
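
For orientation, the following sketch builds a minimal COCO-style annotation structure with one polygon mask as a Python dict. The field names follow the public COCO format; the file name, category name, and coordinates are made-up placeholders, and `polygon_area` is a helper introduced only for this example.

```python
import json

# A minimal COCO-style annotation file with one image and one instance.
# File name, category and coordinates are illustrative placeholders.
coco = {
    "images": [
        {"id": 1, "file_name": "example_0001.jpg", "width": 640, "height": 480}
    ],
    "categories": [{"id": 1, "name": "defect"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            # Polygon mask: a list of rings, each a flat [x1, y1, x2, y2, ...] list.
            "segmentation": [[100.0, 100.0, 200.0, 100.0, 200.0, 180.0, 100.0, 180.0]],
            "bbox": [100.0, 100.0, 100.0, 80.0],  # [x, y, width, height]
            "area": 8000.0,
            "iscrowd": 0,
        }
    ],
}


def polygon_area(flat_xy):
    """Area of a simple polygon given as [x1, y1, x2, y2, ...] (shoelace formula)."""
    xs, ys = flat_xy[0::2], flat_xy[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


# Serialize to the JSON text that would be stored next to the image folder.
coco_json = json.dumps(coco, indent=2)
```

Comparing `polygon_area` of the polygon with the stored `area` field is a cheap sanity check for hand-edited annotations.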

Backbone (Default: "resnet50")

  • Options: resnet50, resnet101
  • ResNet-50 for speed, ResNet-101 for accuracy

Score Threshold (Default: 0.5)

  • Minimum confidence for predictions at inference
  • Range: 0.0-1.0
  • Lower values: more detections, more false positives
  • Higher values: fewer detections, higher precision
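
The effect of the threshold can be sketched in a few lines. This assumes predictions are available as dicts with a `score` key and that a score equal to the threshold is kept; both are assumptions for illustration, not a description of this tool's output format.

```python
def filter_by_score(predictions, score_threshold=0.5):
    """Keep predictions whose confidence is at least score_threshold."""
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be in [0.0, 1.0]")
    return [p for p in predictions if p["score"] >= score_threshold]


# Illustrative predictions; labels and scores are made up.
predictions = [
    {"label": "scratch", "score": 0.92},
    {"label": "dent", "score": 0.55},
    {"label": "scratch", "score": 0.31},
]

kept_default = filter_by_score(predictions)       # threshold 0.5
kept_loose = filter_by_score(predictions, 0.3)    # more detections
kept_strict = filter_by_score(predictions, 0.6)   # fewer, more confident
```

With these sample scores, the default keeps two predictions, a threshold of 0.3 keeps all three, and 0.6 keeps only the most confident one.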

Batch Size (Default: 2)

  • Range: 1-8
  • Typically 2-4 for instance segmentation
  • Memory-intensive due to mask head

Learning Rate (Default: 0.005)

  • Range: 0.001-0.01
  • Higher than DETR models (different optimizer)
  • Reduce for small datasets
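
The parameters above can be collected in a small validated config object. This is a sketch only: `MaskRCNNTrainingConfig` is a hypothetical name, not part of this tool's API, and the checks simply mirror the defaults and ranges documented in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskRCNNTrainingConfig:
    """Hypothetical container mirroring the documented defaults and ranges."""

    backbone: str = "resnet50"
    score_threshold: float = 0.5
    batch_size: int = 2
    learning_rate: float = 0.005

    def __post_init__(self):
        if self.backbone not in ("resnet50", "resnet101"):
            raise ValueError("backbone must be 'resnet50' or 'resnet101'")
        if not 0.0 <= self.score_threshold <= 1.0:
            raise ValueError("score_threshold must be in 0.0-1.0")
        if not 1 <= self.batch_size <= 8:
            raise ValueError("batch_size must be in 1-8")
        if not 0.001 <= self.learning_rate <= 0.01:
            raise ValueError("learning_rate must be in 0.001-0.01")


default_config = MaskRCNNTrainingConfig()
# A smaller learning rate, as suggested for small datasets.
small_data_config = MaskRCNNTrainingConfig(backbone="resnet101", learning_rate=0.002)
```

Validating at construction time means an out-of-range value fails immediately instead of partway through a training run.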

Configuration Tips

Backbone Selection

  • ResNet-50: Standard choice, good balance, faster training
  • ResNet-101: +2-3% mAP, slower, use for maximum accuracy

Training Settings

  • batch_size=2-4 typical with 12-16GB GPU
  • Converges faster than DETR (fewer epochs needed)
  • learning_rate=0.005 standard, reduce to 0.001-0.002 for small data
  • score_threshold=0.5 is a reasonable starting point; it filters predictions at inference, so tune it afterwards (0.3-0.7)

Dataset Requirements

  • Minimum: 500 images with 1,000+ object instances
  • Optimal: 2,000+ images with well-annotated masks
  • Instance masks must be accurate (polygons or RLE format)
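
A quick pre-flight check of an annotation file against these minimums could look like the sketch below. The function names are hypothetical, the thresholds default to the minimums listed above, and the file is assumed to use the standard COCO keys `images`, `annotations`, and `segmentation`.

```python
import json


def check_dataset_size(coco, min_images=500, min_instances=1000):
    """Summarize a COCO-style dict against minimum dataset sizes."""
    annotations = coco.get("annotations", [])
    n_images = len(coco.get("images", []))
    n_instances = len(annotations)
    # An annotation without a polygon or RLE mask cannot train the mask head.
    n_missing_masks = sum(1 for ann in annotations if not ann.get("segmentation"))
    return {
        "images": n_images,
        "instances": n_instances,
        "missing_masks": n_missing_masks,
        "enough_images": n_images >= min_images,
        "enough_instances": n_instances >= min_instances,
    }


def load_and_check(path):
    """Load a COCO JSON file from disk and run the size check on it."""
    with open(path, "r", encoding="utf-8") as f:
        return check_dataset_size(json.load(f))
```

Calling `load_and_check` with the path to an annotation file returns a small report dict; a non-zero `missing_masks` count points to annotations that need fixing before training.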

Expected Performance

  • Instance mAP@0.5: 40-55% on COCO-style datasets (ResNet-50 backbone)
  • Mask mAP: 35-50% depending on task difficulty
  • Training Time: 1-2 hours per epoch on 5k images (RTX 4090)
  • Inference Speed: 20-40ms per image (GPU), slower than YOLO but acceptable
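
The "@0.5" in mAP@0.5 refers to an IoU threshold: a predicted mask counts as a match when its intersection-over-union with a ground-truth mask of the same class is at least 0.5. A minimal sketch of mask IoU on small binary masks follows; masks are represented as nested lists of 0/1 purely for illustration, and `mask_iou` is a name chosen for this example.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as equally sized 2D lists of 0/1."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    return intersection / union if union else 0.0


# Ground truth covers 6 pixels, the prediction covers 4 of them.
ground_truth = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
prediction = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

iou = mask_iou(prediction, ground_truth)
is_match_at_05 = iou >= 0.5
```

Here the intersection is 4 pixels and the union is 6, so the IoU is 4/6 and the prediction counts as a match at the 0.5 threshold.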

Example Use Cases

Manufacturing Quality Control

Scenario: Segment individual defects on products for detailed analysis

Configuration:

Model: Mask R-CNN
Backbone: resnet50
Batch Size: 4
Learning Rate: 0.005
Images: 2,500 with instance annotations

Why Mask R-CNN: Proven reliability for production, precise instance separation, good for quality metrics per defect

Cell Segmentation (Medical)

Scenario: Segment individual cells in microscopy images

Configuration:

Model: Mask R-CNN
Backbone: resnet101
Batch Size: 2
Learning Rate: 0.002
Images: 1,500 microscopy images
Score Threshold: 0.6 (reduce false positives)

Why Mask R-CNN: High accuracy for medical use, separates touching cells, reliable for clinical settings

Retail Product Instance Segmentation

Scenario: Segment individual products on shelves for inventory tracking

Configuration:

Model: Mask R-CNN
Backbone: resnet50
Batch Size: 4
Learning Rate: 0.005
Images: 3,000 shelf images

Why Mask R-CNN: Handles occlusion, separates touching products, fast enough for automated systems

Common Issues and Solutions

Overlapping Instances Not Separated

Problem: Model merges touching objects into a single mask

Solutions:

  1. Ensure training masks properly separate instances
  2. Include diverse examples of overlapping objects
  3. Lower score_threshold to detect more instances
  4. Check annotation quality - masks must be distinct

Poor Mask Boundaries

Problem: Masks don't follow object edges precisely

Solutions:

  1. Use ResNet-101 backbone for better features
  2. Ensure training masks are pixel-accurate
  3. Train for more epochs
  4. Check whether the input resolution is high enough to resolve fine details

Out of Memory

Problem: CUDA out of memory during training

Solutions:

  1. Reduce batch_size to 2 or 1
  2. Use ResNet-50 instead of ResNet-101
  3. Reduce input image resolution
  4. Enable gradient accumulation if available
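
Gradient accumulation trades memory for time: several small micro-batches are processed one after another, and their gradients are combined before a single weight update. The sketch below shows the idea on a toy one-parameter model in plain Python, not on Mask R-CNN itself; the function names and the data are made up for illustration.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for y_hat = w * x over (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Accumulate micro-batch gradients so the result matches one large batch."""
    total = 0.0
    for start in range(0, len(data), micro_batch_size):
        micro = data[start:start + micro_batch_size]
        # Weight each micro-batch by its share of the full batch.
        total += grad_mse(w, micro) * len(micro) / len(data)
    return total


# Toy data: four (x, y) samples, roughly y = 2x.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]

full_batch_grad = grad_mse(0.5, data)                         # batch of 4 at once
accum_grad = accumulated_grad(0.5, data, micro_batch_size=1)  # 4 steps of 1
```

Both gradients agree up to floating-point rounding, which is why a batch size of 1 with four accumulation steps behaves like an effective batch size of 4 while only holding one sample in memory at a time.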

Comparison with Alternatives

Mask R-CNN vs DETR Segmentation

Choose Mask R-CNN when:

  • Need proven production reliability
  • Want faster training (2-3x faster convergence)
  • Mature tooling and documentation important
  • Instance segmentation sufficient (not panoptic)
  • Have existing R-CNN infrastructure

Choose DETR Segmentation when:

  • Want modern transformer architecture
  • Need panoptic segmentation (stuff + things)
  • Prefer end-to-end approach without NMS
  • Research or experimentation setting
  • Can afford slower training

Mask R-CNN vs SAM

Choose Mask R-CNN when:

  • Need fully automatic batch processing
  • Have training data for specific classes
  • Want semantic labels + masks
  • Production deployment at scale

Choose SAM when:

  • Interactive/promptable segmentation needed
  • Zero-shot on novel objects required
  • Creating annotation tools
  • Flexible, undefined object classes

Mask R-CNN vs SegFormer

Choose Mask R-CNN when:

  • Need instance segmentation (separate objects)
  • Object detection + segmentation together
  • Individual object masks required

Choose SegFormer when:

  • Need semantic segmentation (pixel classification)
  • Don't need instance separation
  • Want efficient transformer architecture
  • Dense scene labeling priority
