Object Detection
Train models to locate and classify multiple objects within images
Object detection combines classification and localization to identify where objects are in an image and what they are. Unlike image classification, which assigns a label to an entire image, object detection outputs a bounding box and class label for each detected object. This task is fundamental to applications like autonomous driving, surveillance, robotics, and visual inspection.
Learn About Object Detection
New to object detection? Visit our Object Detection Concepts Guide to learn about bounding boxes, IoU metrics, anchor-free vs anchor-based methods, and annotation formats like COCO.
Available Models
DETR (Detection Transformer) Family
DETR revolutionized object detection by eliminating hand-crafted components like anchor boxes and non-maximum suppression through a transformer-based approach.
- DETR ResNet-50 - Standard DETR with ResNet-50 backbone, balanced performance
- DETR ResNet-101 - Deeper backbone for higher accuracy
- DETR ResNet-50 DC5 - Dilated convolutions for improved small object detection
- DETR ResNet-101 DC5 - Deepest DETR variant with dilated convolutions
Advanced DETR Variants
Improvements on the DETR architecture addressing convergence speed and accuracy.
- Deformable DETR - Deformable attention for faster convergence and better small object detection
- Conditional DETR - Conditional spatial queries for faster training
YOLO Family
You Only Look Once (YOLO) models prioritize real-time detection speed while maintaining competitive accuracy.
- YOLOv8-Nano - Ultra-fast and lightweight for edge devices and real-time applications
Common Configuration
Data Requirements
Training Images: Directory containing your object images
Annotations: JSON file in COCO format containing:
- Image information (filename, dimensions)
- Bounding boxes (x, y, width, height)
- Object categories/classes
- Instance IDs
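Before training, it can help to sanity-check the annotation file against the fields listed above. A minimal sketch in plain Python (the function name and error messages are illustrative, not part of any platform API):

```python
import json

def check_coco_annotations(path):
    """Light sanity check of a COCO-format annotation file.

    Verifies the required top-level keys exist, that each bbox is
    [x, y, width, height] with positive size, and that annotations
    reference known image and category IDs.
    """
    with open(path) as f:
        coco = json.load(f)

    for key in ("images", "annotations", "categories"):
        assert key in coco, f"missing top-level key: {key}"

    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}

    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        assert w > 0 and h > 0, f"degenerate bbox in annotation {ann['id']}"
        assert ann["image_id"] in image_ids, "annotation references unknown image"
        assert ann["category_id"] in category_ids, "annotation references unknown category"

    return len(coco["images"]), len(coco["annotations"])
```

Running a check like this before a multi-hour training job catches common export mistakes (corner-format boxes, dangling IDs) cheaply.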
COCO Annotation Format Example:
{
  "images": [
    {"id": 1, "file_name": "image1.jpg", "height": 480, "width": 640}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1,
     "bbox": [100, 150, 200, 180], "area": 36000}
  ],
  "categories": [
    {"id": 1, "name": "car"},
    {"id": 2, "name": "person"}
  ]
}
Key Training Parameters
Batch Size: Number of images processed together
- DETR models: 2-8 (transformer overhead)
- YOLO models: 8-32 (more efficient architecture)
- Reduce if out-of-memory errors occur
Epochs: Complete passes through training data
- 1-5 epochs typical for fine-tuning
- More epochs for training from scratch or small datasets
- Object detection generally needs fewer epochs than classification
Learning Rate: Optimizer step size
- 5e-5 typical for DETR models
- Higher rates possible for YOLO (1e-3 to 1e-4)
- Lower rates for small datasets or when fine-tuning
Eval Steps: Evaluation frequency during training
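Putting the guidance above together, a fine-tuning run for a DETR-style model might start from values like these. The dictionary keys are hypothetical; adapt them to whatever configuration format your training framework expects:

```python
# Illustrative starting configuration for fine-tuning a DETR-style model.
# Key names are hypothetical, not a fixed API; values follow the
# guidance above and are a starting point, not a tuned setting.
detr_finetune_config = {
    "model": "detr-resnet-50",
    "batch_size": 4,        # DETR: 2-8; reduce on out-of-memory errors
    "epochs": 3,            # 1-5 is typical for fine-tuning
    "learning_rate": 5e-5,  # typical for DETR; YOLO tolerates 1e-3 to 1e-4
    "eval_steps": 500,      # evaluate on the validation set every 500 steps
}
```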
Understanding Metrics
mAP (mean Average Precision): Primary metric for object detection
- mAP@0.5: Average Precision at IoU threshold 0.5 (lenient)
- mAP@0.5:0.95: Average over IoU thresholds 0.5 to 0.95 (strict, COCO standard)
- Higher is better, ranges from 0 to 1 (or 0% to 100%)
IoU (Intersection over Union): Overlap between predicted and ground truth boxes
- IoU > 0.5: Generally considered a correct detection
- IoU > 0.75: High-quality detection
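For boxes in COCO's [x, y, width, height] format, IoU can be computed directly; a minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes in [x, y, w, h] (COCO) format."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle (empty if the boxes don't overlap)
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give 1.0 and disjoint boxes give 0.0; two same-size boxes overlapping by half their width give 1/3, which already fails the 0.5 correctness threshold.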
Precision: Fraction of detections that are correct
- High precision: Few false positives
Recall: Fraction of ground truth objects that are detected
- High recall: Few missed objects
Loss Components:
- Classification loss: How well classes are predicted
- Bounding box regression loss: How accurately boxes are localized
- Should both decrease during training
Choosing the Right Model
By Priority
Maximum Accuracy
- DETR ResNet-101 DC5 (best overall)
- Deformable DETR (great for small objects)
- DETR ResNet-101
Fastest Training
- YOLOv8-Nano (quickest to converge)
- Conditional DETR (improved DETR convergence)
- DETR ResNet-50
Fastest Inference
- YOLOv8-Nano (real-time capable)
- DETR ResNet-50
- Conditional DETR
Best for Small Objects
- Deformable DETR (designed for this)
- DETR ResNet-50/101 DC5 (dilated convolutions help)
- YOLOv8-Nano (with appropriate input size)
Edge Deployment
- YOLOv8-Nano (only practical option)
- Consider quantization for other models
By Use Case
Autonomous Vehicles
- Deformable DETR or YOLOv8-Nano
- Need real-time performance and small object detection
- Large, well-annotated datasets available
Security/Surveillance
- DETR ResNet-101 DC5 for maximum accuracy
- YOLOv8-Nano if real-time processing required
- Often dealing with small, distant objects
Manufacturing Quality Control
- DETR ResNet-50 for balanced performance
- Controlled environment, good lighting
- Precision important, speed often secondary
Retail Analytics
- YOLOv8-Nano for real-time people counting
- Deformable DETR for product detection
- Need balance of speed and accuracy
Wildlife Monitoring
- DETR ResNet-101 or Deformable DETR
- Animals often small in frame
- Accuracy more important than speed
Best Practices
Data Preparation
- Annotation Quality: Accurate bounding boxes are critical
  - Tight boxes around objects (no excessive padding)
  - Consistent annotation guidelines
  - Include partially visible objects if relevant
- Dataset Balance:
  - Aim for balanced instances across classes
  - At least 100 instances per class
  - More instances for difficult classes
- Image Diversity:
  - Various lighting conditions
  - Different angles and scales
  - Diverse backgrounds
  - Include edge cases
- Validation Split:
  - 10-20% of data for validation
  - Ensure the validation set represents the real-world distribution
Training Strategy
- Start with Default Config: Use default learning rates and batch sizes initially
- Monitor Training:
  - Loss should decrease steadily
  - Watch both classification and localization losses
  - Check mAP on the validation set
- Adjust Learning Rate:
  - Reduce if loss oscillates or increases
  - Increase if convergence is very slow
  - Consider learning rate scheduling
- Augmentation:
  - Less aggressive than for classification (preserve spatial information)
  - Common: horizontal flip, brightness/contrast adjustment
  - Avoid: heavy rotation or cropping that cuts off objects
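Box-aware augmentation means every geometric transform applied to the image must also be applied to its boxes. For a horizontal flip of a COCO [x, y, w, h] box this is a one-line transform; a sketch in plain Python (no augmentation library assumed):

```python
def hflip_bbox(bbox, img_width):
    """Mirror a COCO [x, y, w, h] box across the vertical image axis.

    Only x changes: the box's right edge (x + w) becomes the new
    distance from the left edge after flipping.
    """
    x, y, w, h = bbox
    return [img_width - x - w, y, w, h]
```

Augmentation libraries with bounding-box support apply this kind of bookkeeping for every transform, which is why they are preferred over image-only pipelines for detection.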
Common Pitfalls
Small Objects Not Detected
- Use Deformable DETR or models with DC5
- Increase input resolution if possible
- Ensure small objects well-annotated in training data
Many False Positives
- Raise the confidence threshold at inference
- Train longer for better classification
- Check if similar-looking objects confuse model
Poor Localization (Low IoU)
- Focus on bounding box loss during training
- Verify annotation quality and consistency
- May need more training data
Slow Convergence
- DETR models converge slower than YOLO
- Consider Conditional DETR or Deformable DETR
- Increase learning rate cautiously
Class Imbalance Issues
- Ensure adequate examples of rare classes
- Consider weighted sampling or loss reweighting
- May need to collect more data for rare classes
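One simple loss-reweighting scheme weights each class inversely to its instance count, so rare classes contribute more per instance. A sketch (the normalization choice is illustrative, not a prescribed method):

```python
from collections import Counter

def inverse_frequency_weights(category_ids):
    """Per-class loss weights inversely proportional to instance counts,
    normalized so the weights average to 1.0 across classes.

    category_ids: iterable of the category_id of every annotation instance.
    """
    counts = Counter(category_ids)
    raw = {c: 1.0 / n for c, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}
```

A class with a third as many instances ends up with three times the weight; the same counts can instead drive weighted sampling if the framework supports it.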
GPU Requirements
Memory Guidelines
DETR Models:
- 8GB minimum (batch_size=2)
- 12-16GB recommended (batch_size=4-8)
- Transformers memory-intensive
YOLO Models:
- 4-8GB sufficient
- More efficient architecture
- Can use larger batch sizes
Training Time Estimates
Small Dataset (1,000 images):
- DETR models: 30-60 minutes per epoch
- YOLO models: 10-20 minutes per epoch
Medium Dataset (5,000 images):
- DETR models: 2-5 hours per epoch
- YOLO models: 30-90 minutes per epoch
Large Dataset (20,000+ images):
- DETR models: 8+ hours per epoch
- YOLO models: 2-4 hours per epoch
Times assume a modern GPU (RTX 3080/4080 or better).
Dataset Size Guidelines
- Minimum: 500 annotated images with 50+ instances per class
- Good: 2,000-5,000 images with 200+ instances per class
- Excellent: 10,000+ images with 1,000+ instances per class
Object detection typically requires more data than classification due to the additional complexity of localization.