Object Detection for Industry

Build and deploy object detection that works on factory floors, yards, and campuses: accurate, measurable, and maintainable (no hype)

🎯 What it is

Object detection finds where things are (bounding boxes) and what they are (classes) in images/video.

📝 Strong Pipeline Structure

ingest → pre-process → detect → post-process (NMS/tracking) → business events
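A minimal sketch of that flow in Python (names and the Detection shape are illustrative; the model is any callable returning detections):

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple

@dataclass
class Detection:
    cls: str                          # class label
    bbox: Tuple[int, int, int, int]   # (x, y, w, h) in pixels
    score: float

def preprocess(frames: Iterable) -> Iterator:
    for frame in frames:
        # resize/normalize for the model here
        yield frame

def detect(frames: Iterable, model: Callable[..., List[Detection]]) -> Iterator:
    for frame in frames:
        yield frame, model(frame)

def postprocess(results: Iterable, score_thr: float = 0.5) -> Iterator:
    # NMS and tracking would also live in this stage
    for frame, dets in results:
        yield frame, [d for d in dets if d.score >= score_thr]

def to_events(results: Iterable) -> Iterator[dict]:
    for frame, dets in results:
        for d in dets:
            yield {"event": "object_detected", "class": d.cls,
                   "bbox": list(d.bbox), "score": d.score}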

🏭 Use cases

🔍 Quality Control

  • Missing/wrong parts on products
  • Defects, scratches, stains
  • Orientation/alignment checks

🦺 Safety

  • PPE compliance (helmets/vests/boots)
  • Smoke/fire/spill detection
  • Zone intrusion alerts

📦 Logistics

  • Pallets, boxes, containers
  • Load verification
  • Damage assessment

⚡ Energy/Utility

  • Valve/switch positions
  • Gauge readings (combine with OCR)
  • Equipment status lights

🚪 Access Control

  • Vehicle/person counting
  • Badge/uniform verification
  • Behavioral anomalies

🤖 Model choices

⚡ YOLO family (YOLOv8, YOLOv10)

  • Single-pass detection
  • Good speed/accuracy balance
  • Works on edge (Jetson, RPi); minimal sketch below
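For a quick sanity check, a minimal inference sketch with the Ultralytics package (the weights file and image path are illustrative; the pretrained checkpoint downloads on first use):

from ultralytics import YOLO

# Load a small pretrained checkpoint; swap in fine-tuned weights for production.
model = YOLO("yolov8n.pt")

# Run inference; accepts file paths, URLs, numpy arrays, or streams.
results = model("factory_floor.jpg", conf=0.5)

for r in results:
    for box in r.boxes:
        name = model.names[int(box.cls)]
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"{name} {box.conf.item():.2f} at ({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")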

🎯 Anchor-free (CenterNet, FCOS)

  • No anchor tuning
  • Good for small objects
  • Less hyperparameter noise

🔬 DETR/RT-DETR

  • Transformer-based
  • Good for complex scenes
  • Heavier compute

🏗️ Two-stage (Faster R-CNN, Mask R-CNN)

  • Higher accuracy potential
  • Slower inference
  • Good for precision-critical tasks

📊 Data first

⚠️ Before picking models, nail the data

📋 Schema & Annotation

  • Class definitions: clear, non-overlapping
  • Bounding box format: COCO, YOLO, or Pascal VOC
  • Tool consistency (LabelImg, CVAT, Label Studio)

🎯 Quality Control

  • Inter-annotator agreement >85%
  • Review edge cases, ambiguous samples
  • Consistent box tightness

📈 Splits & Balance

  • 70/15/15 or 80/10/10 train/val/test
  • Stratify by class, site, and lighting (split sketch below)
  • Keep the test set truly unseen
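A sketch of a 70/15/15 stratified split with scikit-learn, assuming one stratification key per image such as "class|site|lighting"; images with many classes need a dominant-class key or group-aware splitting:

from sklearn.model_selection import train_test_split

def split_70_15_15(image_ids, strata, seed=42):
    """image_ids and strata are parallel lists; strata holds per-image keys."""
    train, rest, _, rest_strata = train_test_split(
        image_ids, strata, test_size=0.30, stratify=strata, random_state=seed)
    # Split the remaining 30% evenly into val and test
    val, test = train_test_split(
        rest, test_size=0.50, stratify=rest_strata, random_state=seed)
    return train, val, test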

🔄 Augmentations

  • Spatial: rotation, flip, crop
  • Color: brightness, contrast, saturation
  • Avoid unrealistic distortions (bbox-aware example below)
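A bbox-aware augmentation sketch with Albumentations: boxes move with the image, and mostly-cropped boxes are dropped (transform choices and probabilities are illustrative):

import albumentations as A
import numpy as np

transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.Rotate(limit=10, p=0.5),            # keep rotations realistic
        A.RandomBrightnessContrast(p=0.5),
        A.HueSaturationValue(p=0.3),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"],
                             min_visibility=0.4),
)

# Dummy inputs just to make the sketch runnable
image = np.zeros((480, 640, 3), dtype=np.uint8)
out = transform(image=image, bboxes=[(0.5, 0.5, 0.2, 0.3)],
                class_labels=["pallet"])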

🗂️ Versioning

  • DVC, Git LFS, or cloud datasets
  • Track annotation changes
  • Reproducible model training

📷 Cameras & optics that actually work

📐 Angles

  • Keep view angles ≤ 25–30°
  • Minimize roll when measuring or reading text

🎯 Pixel Density

  • Target ≥ 32–64 px on the smallest object dimension for reliable detection (quick check below)
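A quick way to sanity-check pixel density before mounting cameras, assuming a simple pinhole model with a known horizontal FOV:

import math

def pixels_on_target(target_m, distance_m, hfov_deg, image_width_px):
    """Approximate pixels across a target, assuming a simple pinhole model."""
    # Width of the scene covered at that distance
    scene_width_m = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    return target_m / scene_width_m * image_width_px

# 30 cm object, 8 m away, 90° HFOV, 1920 px wide sensor -> ~36 px (borderline)
print(round(pixels_on_target(0.30, 8.0, 90.0, 1920)))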

📸 Shutter

  • Freeze motion (e.g., 1/250–1/1000 s)
  • Lock shutter; manage exposure with gain/IR

💡 Illumination

  • Even light distribution
  • Add 850 nm IR for low light
  • WDR for backlit scenes

🔍 Lens

  • Frame the task (don't overshoot FOV)
  • Use a polarizer to cut glare
  • Global shutter for high speed

📊 Metrics that matter

🎯 Detection Metrics

  • mAP@50 and mAP@[50:95]
  • Per-class precision/recall/F1 (matching sketch below)
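A minimal sketch of single-class precision/recall via greedy IoU matching, with boxes as (x, y, w, h); for production numbers use an established evaluator such as pycocotools:

def iou(a, b):
    """IoU of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def precision_recall(pred_boxes, gt_boxes, iou_thr=0.5):
    """pred_boxes sorted by descending score; both lists for one class."""
    matched, tp = set(), 0
    for p in pred_boxes:
        best, best_iou = None, iou_thr
        for i, g in enumerate(gt_boxes):
            if i not in matched and iou(p, g) >= best_iou:
                best, best_iou = i, iou(p, g)
        if best is not None:          # each ground truth matches at most once
            matched.add(best)
            tp += 1
    precision = tp / len(pred_boxes) if pred_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    return precision, recall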

📈 Analysis Tools

  • PR curves & confusion matrix
  • Find look-alike classes

⚡ Performance

  • Latency & throughput measured end-to-end, not model-only (instrumentation sketch below)
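A minimal instrumentation sketch; decode_frame, run_model, and apply_nms are placeholders for your real stage functions:

import time

def timed(name, fn, metrics):
    """Wrap a stage so every call records wall-clock latency in ms."""
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        out = fn(*args, **kwargs)
        metrics.setdefault(name, []).append((time.perf_counter() - t0) * 1000)
        return out
    return wrapper

metrics = {}
# decode = timed("decode", decode_frame, metrics)
# infer  = timed("inference", run_model, metrics)
# post   = timed("postproc", apply_nms, metrics)
# Report p50/p95 per stage so decode and post-proc count, not just the model:
# for name, ms in metrics.items():
#     ms.sort(); print(name, ms[len(ms) // 2], ms[int(len(ms) * 0.95)])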

🖥️ Resource

  • GPU/CPU %, memory
  • Thermal throttling

📈 Business KPIs

  • Scrap/rework rates
  • MTTR, pick accuracy
  • Safety incidents

🛠️ Optimization playbook

🔧 Quantize & Prune

  • FP16/INT8 with proper calibration (quantization sketch below)
  • Channel pruning/distillation for edge
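One low-effort path is ONNX Runtime's dynamic quantization, sketched below; for conv-heavy detectors, static quantization with a calibration set usually preserves accuracy better (file names are illustrative):

from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic INT8 quantization: weights stored as int8, activations quantized
# at run time. Always re-validate mAP against the FP32 baseline afterwards.
quantize_dynamic(
    model_input="detector.onnx",
    model_output="detector_int8.onnx",
    weight_type=QuantType.QInt8,
)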

🎯 Right Resolution & ROI

  • Crop to region, adaptive FPS
  • Skip empty frames via triggered inference (sketch below)
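A triggered-inference sketch: cheap grayscale frame differencing gates the expensive detector so empty frames never reach it (thresholds need per-camera tuning):

import cv2
import numpy as np

def motion_gate(stream, min_changed_frac=0.002):
    """Yield only frames whose gray-diff from the previous frame exceeds a floor."""
    prev = None
    for frame in stream:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (5, 5), 0)
        if prev is not None:
            diff = cv2.absdiff(gray, prev)
            changed = np.count_nonzero(diff > 25) / diff.size
            if changed >= min_changed_frac:
                yield frame           # worth running the detector
        prev = gray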

🚀 Accelerators

  • TensorRT/OpenVINO/ONNX Runtime
  • Batch where latency allows

🔍 Tracking & Logic

  • Multi-frame voting (sketch below)
  • Line-crossing, dwell timers
  • Reduce false events
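A multi-frame voting sketch: an event fires only when a class appears in at least k of the last n frames, which suppresses single-frame flicker (k and n are illustrative):

from collections import deque

class VoteFilter:
    """Confirm a class only if it appears in >= k of the last n frames."""
    def __init__(self, k=4, n=6):
        self.k, self.history = k, deque(maxlen=n)

    def update(self, detected_classes):
        self.history.append(set(detected_classes))
        counts = {}
        for frame_classes in self.history:
            for c in frame_classes:
                counts[c] = counts.get(c, 0) + 1
        return {c for c, v in counts.items() if v >= self.k}

# Usage: per frame, pass the set of detected class names
gate = VoteFilter(k=4, n=6)
confirmed = gate.update({"no_helmet"})   # empty until enough frames agree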

⚙️ Operations

  • Autoscale workers, watchdogs
  • Back-pressure management
  • Log per-job latency & confidence

🏗️ Deployment patterns

📦 Edge node

Near cameras for low latency & privacy; send events/metadata upstream

🖥️ Server/cluster

For many streams or heavy models; ensure RBAC/MFA, encryption, and HA

🔀 Hybrid

Edge pre-filter → server analytics → cloud reporting

🔄 Pipeline Architecture

RTSP ingest → decode → pre-proc → inference → post-proc → MQTT/REST events → dashboards
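Publishing post-processed events over MQTT with paho-mqtt might look like this (broker address and topic are illustrative):

import json
import paho.mqtt.client as mqtt

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt 2.x API
client.connect("broker.local", 1883)                    # illustrative broker
client.loop_start()

event = {
    "event": "object_detected",
    "camera_id": "line_A_cam3",
    "objects": [{"class": "pallet", "bbox": [412, 120, 188, 160], "score": 0.92}],
    "latency_ms": 48,
    "ts": "2025-08-25T12:34:56Z",
}
client.publish("factory/line_A/detections", json.dumps(event), qos=1)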

🔒 Privacy & compliance

👥 When People are in Frame

If people are in frame (PPE/safety), treat video as personal data:

  • Clear signage and stated purpose
  • Retention policy (e.g., 30–90 days)
  • Encryption in transit and at rest
  • Role-based access control
  • Redact faces on export when identity isn't needed (sketch below)
  • Regular compliance audits
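A sketch of export-time face redaction using OpenCV's bundled Haar cascade; it's a baseline only, and a dedicated face detector is more robust:

import cv2

_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def redact_faces(frame):
    """Blur detected faces in place before exporting a frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in _cascade.detectMultiScale(gray, 1.1, 5):
        roi = frame[y:y + h, x:x + w]
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return frame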

🚩 Red flags

❌ "Works in any lighting/angle"

With no pixel-density or lighting plan

❌ Model-only FPS claims

No decode/post-proc/IO counted

❌ No per-class metrics

No confusion matrix

❌ Single camera expected

To do overview + detail + OCR at once

❌ No dataset/version control

No re-training plan

🔗 GaugeSnap integration

🏭 Edge AI Packs

For factory tasks: gauge/meter regions, PPE, pallets/boxes, valves/levers, smoke/flame cues

🔄 Event APIs

REST/MQTT webhooks to SCADA/MES/ERP/VMS, with images/crops/confidence/latency

📊 Dashboards

mAP/PR by class, per-camera latency, confidence histograms, drift alerts

🌱 Sustainable AI

INT8/FP16, ROI pipelines, energy & cost KPIs per 1k inferences

💻 Example event

{
  "event": "object_detected",
  "camera_id": "line_A_cam3",
  "objects": [
    {"class": "pallet", "bbox": [412,120,188,160], "score": 0.92},
    {"class": "no_helmet", "bbox": [220,96,70,88], "score": 0.87}
  ],
  "latency_ms": 48,
  "ts": "2025-08-25T12:34:56Z"
}

🚀 How to start (low-risk)

1. List 3–8 classes

That matter for your KPI; define pass/fail logic

2. Share 2–3 minutes of video

Per camera (day/night conditions)

3. Get pilot plan

Camera/lighting/pixel-density plan, baseline mAP/PR estimate, and clear KPIs

💡 Principle: Prove with site videos, per-class metrics, and business KPIs—then scale.