Conditional DETR
DETR with conditional spatial queries for faster training convergence
Conditional DETR improves on the original DETR by deriving a conditional spatial query from each decoder embedding, which leads to faster convergence and better performance. Instead of relying on static object queries, it updates the spatial queries dynamically based on content, so training converges in significantly fewer epochs than standard DETR.
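The idea of a conditional spatial query can be sketched in a few lines. The example below is a simplified illustration, not the model's actual implementation: it embeds a normalized 2D reference point with sine/cosine features and scales that embedding element-wise by a vector derived from the decoder embedding. In the real model that vector comes from a small feed-forward network; here `transform` is passed in directly as a stand-in, and the function names, `dim`, and `temperature` values are assumptions chosen for readability.

```python
import math


def sigmoid(v):
    """Squash an unbounded value into (0, 1) so it can act as a normalized coordinate."""
    return 1.0 / (1.0 + math.exp(-v))


def sinusoidal_embedding(x, y, dim=8, temperature=10000.0):
    """Embed a normalized 2D point as sine/cosine pairs: dim values for y, then dim for x."""
    def embed(value):
        out = []
        for i in range(dim // 2):
            freq = temperature ** (2 * i / dim)
            angle = 2 * math.pi * value / freq
            out.extend([math.sin(angle), math.cos(angle)])
        return out

    return embed(y) + embed(x)


def conditional_spatial_query(transform, reference_logits, dim=8):
    """Scale the reference-point embedding element-wise by a content-derived vector.

    `transform` stands in for the output of a feed-forward network applied to the
    decoder embedding (assumption: it is supplied by the caller in this sketch).
    """
    x, y = (sigmoid(v) for v in reference_logits)
    position = sinusoidal_embedding(x, y, dim=dim)
    if len(transform) != len(position):
        raise ValueError("transform must have 2 * dim entries")
    return [t * p for t, p in zip(transform, position)]


# A transform of all ones leaves the positional embedding unchanged.
query = conditional_spatial_query([1.0] * 16, (0.0, 0.0), dim=8)
print(len(query))
```

Because the scaling vector depends on the decoder embedding, each query can emphasize different positional frequencies, which is what makes the spatial query "conditional" on content.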
When to Use Conditional DETR
Conditional DETR is ideal for:
- Projects that need faster training than standard DETR
- Cases where you want the benefits of a transformer detector with improved convergence
- Medium to large datasets (2,000+ images)
- General object detection where standard DETR would apply
Strengths
- Faster convergence than standard DETR (fewer epochs needed)
- Improved localization through conditional spatial queries
- Better optimization landscape makes training more stable
- Same elegant end-to-end transformer approach as DETR
- Compatible with existing DETR infrastructure
Weaknesses
- Still converges more slowly than Deformable DETR
- Architecturally more complex than standard DETR
- Requires good understanding of attention mechanisms
- Not as widely adopted as standard or Deformable DETR
Parameters
Training Configuration
- Training Images: Folder with training images
- Annotations: COCO-format JSON file
- Batch Size (Default: 2) - Range: 2-8
- Epochs (Default: 1) - Range: 1-6 (faster than standard DETR)
- Learning Rate (Default: 5e-5)
- Eval Steps (Default: 1)
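The parameters above could be captured in a small validated configuration object. This is a minimal sketch only: the class name, field names, and example paths are hypothetical and do not correspond to any particular tool's API. The defaults and ranges are the ones listed in this section.

```python
from dataclasses import dataclass


@dataclass
class ConditionalDetrTrainingConfig:
    """Hypothetical container for the documented training parameters."""

    train_images_dir: str        # folder with training images
    annotations_path: str        # COCO-format JSON file
    batch_size: int = 2          # documented range: 2-8
    epochs: int = 1              # documented range: 1-6
    learning_rate: float = 5e-5
    eval_steps: int = 1

    def __post_init__(self):
        # Reject values outside the ranges documented for this model.
        if not 2 <= self.batch_size <= 8:
            raise ValueError("batch_size must be between 2 and 8")
        if not 1 <= self.epochs <= 6:
            raise ValueError("epochs must be between 1 and 6")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.eval_steps < 1:
            raise ValueError("eval_steps must be at least 1")


# Example paths are placeholders, not real locations.
config = ConditionalDetrTrainingConfig(
    train_images_dir="data/train",
    annotations_path="data/annotations.json",
    batch_size=4,
    epochs=6,
)
print(config)
```

Validating at construction time surfaces an out-of-range value before a training run starts rather than partway through it.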
Configuration Tips
- Converges in ~5-7 epochs for fine-tuning (vs 8-10 for standard DETR)
- Works well with 2,000+ annotated images
- Use batch_size=4 on a 16GB GPU
- learning_rate=5e-5 is the standard setting
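Before training, it can help to check the annotation file against the dataset-size guideline in the tips above. The sketch below assumes a standard COCO detection file with top-level `images`, `annotations`, and `categories` keys; the function names and the `min_images` parameter are illustrative, not part of any tool.

```python
import json

REQUIRED_KEYS = ("images", "annotations", "categories")


def summarize_coco(coco, min_images=2000):
    """Count images and annotations in a parsed COCO dict and compare to a size guideline."""
    missing = [key for key in REQUIRED_KEYS if key not in coco]
    if missing:
        raise ValueError(f"missing COCO keys: {missing}")

    image_ids = {image["id"] for image in coco["images"]}
    annotated_ids = {ann["image_id"] for ann in coco["annotations"]}
    # Only count images that are both listed and have at least one annotation.
    annotated_count = len(image_ids & annotated_ids)

    return {
        "images": len(image_ids),
        "annotated_images": annotated_count,
        "annotations": len(coco["annotations"]),
        "categories": len(coco["categories"]),
        "meets_guideline": annotated_count >= min_images,
    }


def summarize_coco_file(path, min_images=2000):
    """Load a COCO JSON file from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_coco(json.load(handle), min_images=min_images)
```

A result with `meets_guideline` set to `False` does not mean training will fail, only that the dataset is smaller than the size this page recommends.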
Expected Performance
- 20-30% faster convergence than standard DETR
- Comparable or slightly better accuracy
- Good middle ground between standard DETR and Deformable DETR
Comparison with Alternatives
vs Standard DETR: Choose Conditional DETR for faster training with a similarly simple architecture.
vs Deformable DETR: Deformable DETR is still better for small objects and overall performance, but Conditional DETR is simpler to understand.