Canadian University Dubai

School of Engineering, Applied Science and Technology

Instructor: Dr. Najla Al Futaisi

BCS 407 Artificial Intelligence Spring 2026

Campus Infrastructure
Object Detection
with YOLOv11n

Group Members

Tahamid Hossain 20220001801

Sumaid Bin Omar 20220001454

Parth Aggarwal 20220001200

Rufaid Bin Omar 20230003171

Arham Bin Azad 20220001121

YOLOv11n · 2.6M params · CUDA · RTX 4060 · Batch 04 · 2026-04-24 · 100 Epochs


End-to-end pipeline for detecting four campus assets — projectors, whiteboards, fire extinguishers, and door signs — using a curated 800-image dataset with perfect class balance, achieving near-saturating accuracy on safety-critical classes.


Metric	Value	Note
mAP@0.5	93.3%	Test set · 80 images
mAP@0.5:0.95	79.6%	Tight IoU eval
Precision	1.000	Zero false positives
Recall	≈86%	Macro average

01 · Pipeline

End-to-End Workflow

Seven notebooks take raw dataset exports through aggregation, splitting, health-checking, training, evaluation, inference, and export — each stage's artefacts feeding the next.

Notebook	Stage	Description
NB01	Aggregate	200 (image, label) pairs per class from Roboflow/Kaggle exports → JPEG uniformity
NB02	Remap + Split	Global ID assignment · class ID normalisation · stratified 70/20/10 split
NB03	Health Check	Class balance, box distribution, leakage detection, dimension stats
NB04	Train YOLOv11n	100 epochs · SGD · AMP FP16 · mosaic augmentation · RTX 4060
NB05	Evaluate	Held-out test split · mAP, P/R curves · confusion matrix · qualitative preds
NB06	Live Inference	Single image · batch · video · webcam modes against best.pt
NB07	ONNX Export	Opset 12 · dynamic axes · onnx-simplifier · ~50% size reduction

02 · Dataset

Data Collection & Preparation

2.1 · Data Collection (NB01)

Raw images were sourced from Roboflow and Kaggle exports, except doorsign2–4, which the team captured from HUB campus door signs, annotated, and prepared. NB01 aggregated exactly 200 (image, label) pairs per class into data/aggregated/<class>/, re-encoding all images to JPEG.

Class	Available Pairs	After Cap	Source
● projector	319	200	Projector1, 2, 3 (Roboflow)
● whiteboard	200	200	Single Roboflow export
● fire_extinguisher	848	200	Kaggle train/valid/test
● door_sign	240	200	doorsign1 (Roboflow); doorsign2–4 (custom — HUB campus, annotated by team)

2.2 · Stratified Split (NB02)

NB02 merged all four classes into a unified Ultralytics-compatible dataset with global ID re-indexing and class ID normalisation (0=projector, 1=whiteboard, 2=fire_extinguisher, 3=door_sign). Stratified 140/40/20 split per class.

Class	Train	Val	Test	Boxes/img (train)
● projector	140	40	20	0.82
● whiteboard	140	40	20	1.09
● fire_extinguisher	140	40	20	1.19
● door_sign	140	40	20	1.86
Total	560	160	80	—
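The per-class 140/40/20 split can be illustrated with a minimal sketch. This is not NB02's actual code: the function name `stratified_split`, the synthetic file stems, and the use of seed 42 for shuffling are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def stratified_split(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Split (stem, class_name) pairs into train/val/test, class by class."""
    by_class = defaultdict(list)
    for stem, cls in items:
        by_class[cls].append(stem)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": [], "val": [], "test": []}
    for cls in sorted(by_class):
        stems = sorted(by_class[cls])
        rng.shuffle(stems)
        n_train = round(len(stems) * ratios[0])
        n_val = round(len(stems) * ratios[1])
        splits["train"] += [(s, cls) for s in stems[:n_train]]
        splits["val"] += [(s, cls) for s in stems[n_train:n_train + n_val]]
        # Whatever remains after train and val goes to test.
        splits["test"] += [(s, cls) for s in stems[n_train + n_val:]]
    return splits

# Synthetic stand-in for the 200 aggregated pairs per class.
CLASSES = ["projector", "whiteboard", "fire_extinguisher", "door_sign"]
items = [(f"{cls}_{i:03d}", cls) for cls in CLASSES for i in range(200)]
splits = stratified_split(items)
print({name: len(pairs) for name, pairs in splits.items()})
# {'train': 560, 'val': 160, 'test': 80}
```

Splitting inside each class, rather than over the pooled 800 images, is what guarantees the identical 140/40/20 counts in every row of the table.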

💡

door_sign averages ~1.9 boxes/image across all splits — exit + number placards commonly co-occur. projector falls below 1.0 boxes/image, indicating more background frames.

Figure 1. 3×4 spot-check grid of class samples drawn from the split dataset.

2.3 · Dataset Health Check (NB03)

Metric	Value
Train images	560
Val images	160
Test images	80
Total boxes	997
Empty labels	4.4%
Tiny boxes	0
Leakage	0 dups
Median dim	640×640
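Two of these checks, the empty-label rate and cross-split leakage, can be sketched as follows. This is an illustrative approximation rather than NB03's implementation: the helper names are hypothetical, the inputs are in-memory stand-ins for label files and image bytes, and hashing only catches byte-identical duplicates (near-duplicates would need a perceptual hash).

```python
import hashlib

def empty_label_rate(label_texts):
    """Fraction of YOLO label files that contain no box lines."""
    empty = sum(1 for text in label_texts if not text.strip())
    return empty / len(label_texts)

def cross_split_duplicates(images_by_split):
    """List (first_split, other_split, digest) for images whose bytes
    appear in two different splits."""
    seen = {}   # digest -> split where it was first seen
    dups = []
    for split, blobs in images_by_split.items():
        for blob in blobs:
            digest = hashlib.sha256(blob).hexdigest()
            if digest in seen and seen[digest] != split:
                dups.append((seen[digest], split, digest))
            seen.setdefault(digest, split)
    return dups

# Tiny synthetic example: two of four labels are empty,
# and one image leaks from train into test.
demo_labels = ["0 0.5 0.5 0.2 0.2\n", "", "3 0.1 0.1 0.05 0.05\n", "\n"]
demo_images = {"train": [b"img-a", b"img-b"], "val": [b"img-c"], "test": [b"img-a"]}
print(empty_label_rate(demo_labels))             # 0.5
print(len(cross_split_duplicates(demo_images)))  # 1
```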

Figure 2. Per-class image counts by split.

Figure 3. Normalised box area distribution per class.

Figure 4. Width vs height scatter — 800 images. Cluster at (640,640) dominates; sparse high-res outliers from Kaggle fire extinguisher photos.

03 · Methodology

Model Selection & Training Config

Architecture Comparison

Architecture	Params	COCO mAP@0.5:0.95	Latency (T4)	Verdict
Faster R-CNN (ResNet-50)	~41 M	42.9	~47 ms	Too slow
SSD MobileNetV2	~3.4 M	22.1	~1.2 ms	Low ceiling
YOLOv8n	3.2 M	37.3	~1.47 ms	Strong baseline
YOLOv11n	2.6 M	39.5	~1.55 ms	✓ Selected
YOLOv11s	9.4 M	47.0	~2.46 ms	+acc, +params
YOLOv11m	20.1 M	51.5	~4.70 ms	Not edge

🎯

Why YOLOv11n? 2.6M params — 0.6M fewer than YOLOv8n with 2.2 points higher COCO accuracy. C3k2 + SPPF backbone with depthwise convolutions exports cleanly to ONNX. Transfer learning from COCO provides critical feature initialisation for our small 560-image training set.

Training Configuration

Setting	Value
Weights	yolo11n.pt
Epochs	100
Patience	15
Batch size	16
Image size	640×640
Optimizer	SGD
lr0 / lrf	0.01 / 0.01
Warmup	3 epochs
AMP	True (FP16)
Seed	42
IoU thresh	0.7
Device	CUDA:0

Loss Weights

Loss term	Weight
Box (CIoU)	7.5
Classification (BCE)	0.5
DFL	1.5

Data Augmentation

Augmentation	Setting
Mosaic	1.0 (off for the last 10 epochs)
RandAugment	Enabled
Random Erasing	0.4 prob
Horizontal flip	0.5 prob
Scale jitter	±50%
Translation	±10%
HSV hue	0.015
HSV sat/val	0.7 / 0.4
MixUp/CutMix	Disabled
Vertical flip	Disabled

🔀

Mosaic concatenates 4 training images into a single 640×640 tile — effectively quadrupling context diversity and forcing the model to handle partially visible objects. Disabled for the final 10 epochs to consolidate clean features.
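The training, loss-weight and augmentation settings listed in this section map onto Ultralytics training arguments roughly as sketched below. This is an assumed reconstruction, not the contents of NB04: the dataset path `data/campus.yaml` and the `train` wrapper are hypothetical, and CutMix is left out because the settings above only say it is disabled.

```python
# Core training settings and loss weights from the tables above.
TRAIN_ARGS = dict(
    epochs=100, patience=15, batch=16, imgsz=640,
    optimizer="SGD", lr0=0.01, lrf=0.01, warmup_epochs=3,
    amp=True, seed=42, iou=0.7, device=0,
    box=7.5, cls=0.5, dfl=1.5,
)

# Augmentation settings; close_mosaic=10 turns mosaic off for the last 10 epochs.
AUG_ARGS = dict(
    mosaic=1.0, close_mosaic=10, auto_augment="randaugment", erasing=0.4,
    fliplr=0.5, flipud=0.0, scale=0.5, translate=0.1,
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, mixup=0.0,
)

# lrf is a fraction of lr0, so the schedule ends at lr0 * lrf.
final_lr = TRAIN_ARGS["lr0"] * TRAIN_ARGS["lrf"]

def train(data_yaml="data/campus.yaml"):  # hypothetical dataset YAML path
    """Fine-tune the COCO-pretrained yolo11n.pt checkpoint."""
    from ultralytics import YOLO  # imported lazily; requires `pip install ultralytics`
    model = YOLO("yolo11n.pt")
    return model.train(data=data_yaml, **TRAIN_ARGS, **AUG_ARGS)

print(f"final learning rate: {final_lr:.0e}")  # final learning rate: 1e-04
```

Keeping the arguments in plain dictionaries makes it easy to log the exact configuration alongside each batch of results.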

04 · Architecture

YOLOv11n — Model Structure

Single-stage anchor-free detector with task-aligned assignment (TAL) — eliminates anchor tuning and improves small-object recall.

Input

RGB Image · 640×640

↓

── Backbone ──

C3k2 Blocks

Compact CSP bottleneck
2×depthwise conv

SPPF

Spatial Pyramid Pooling – Fast
Multi-scale context

↓

── Neck ──

PANet

Bidirectional feature fusion
P3/8 · P4/16 · P5/32

↓

── Head ──

Classification Branch

BCE loss · 4 classes

Regression Branch

DFL · CIoU · anchor-free

↓

TAL Assignments → NMS → Detections

Task-Aligned Learning · IoU threshold 0.7

Property	Value
Parameters	2.6 M
Paradigm	One-stage · anchor-free
COCO mAP@0.5:0.95	39.5
T4 latency	~1.55 ms/img
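The NMS step at IoU threshold 0.7 can be illustrated with a small greedy sketch. It is a simplified, class-agnostic version written for clarity, not the routine Ultralytics runs internally; the box coordinates and scores are made up.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy NMS: keep the highest-scoring box, drop any later box
    that overlaps an already kept box by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

# Boxes 0 and 1 overlap heavily (IoU ≈ 0.92); box 2 is elsewhere.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.90, 0.80, 0.75]
print(nms(boxes, scores))  # [0, 2]
```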

05 · Training

Training Curves & Convergence

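Val mAP@0.5 peaked at epoch 60 (0.9489). Val box loss flattened around 0.84 while train loss kept decreasing, a sign of mild overfitting. The run is logged through epoch 100, so the patience=15 early-stopping rule does not appear to have fired; val mAP@0.5:0.95 was still edging upward (0.7251 at epoch 60, 0.735 at epoch 100).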

Val mAP over Epochs

Box Loss over Epochs

Epoch	Train Box Loss	Val mAP@0.5	Val mAP@0.5:0.95
1	1.068	0.199	0.122
10	~0.88	~0.650	~0.450
30	~0.77	~0.880	~0.680
60 (best)	0.744	0.9489	0.7251
100 (final)	0.511	0.938	0.735
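A patience-based early-stopping rule of the kind configured with patience=15 has this general shape. The sketch is generic: the function name and the synthetic fitness curves are assumptions, and the exact fitness score that the training framework monitors is not reproduced here.

```python
def early_stop_epoch(fitness_by_epoch, patience=15):
    """Return the 1-based epoch at which training would stop,
    or None if the run completes without triggering the rule."""
    best, best_epoch = float("-inf"), 0
    for epoch, fit in enumerate(fitness_by_epoch, start=1):
        if fit > best:
            best, best_epoch = fit, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return None

# Synthetic curve that improves until epoch 60 and then plateaus.
plateau = [min(e, 60) / 60 for e in range(1, 101)]
# Synthetic curve that keeps improving for all 100 epochs.
rising = [e / 100 for e in range(1, 101)]

print(early_stop_epoch(plateau))  # 75
print(early_stop_epoch(rising))   # None
```

A run only stops early when the monitored score itself stalls, so a score that keeps creeping upward lets training reach the final epoch even if another metric peaked earlier.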

Figure 6. Training curves — box + classification loss on train/val (left); mAP@0.5 and mAP@0.5:0.95 on val set across 100 epochs (right).

Figure 5. YOLO-generated label distribution: class frequency (left) and bounding-box spatial heatmap (right).

Validation Batch — Ground Truth vs. Predictions

Val batches 0–2, each shown as a ground-truth label panel paired with the model's prediction panel.

Training Batch Mosaic

Figure 7. Example mosaic training batch (batch 0) — augmented 4-tile compositions at 640×640.

Post-Training Sanity Check

Figure 9. 1×4 sanity inference grid — correct class assignment with high confidence scores confirmed.

06 · Results

Test-Set Evaluation

Evaluated exclusively on the held-out test split (80 images, 102 boxes) using best.pt at conf=0.25, IoU=0.5.

Metric	Value
mAP@0.5	93.3%
mAP@0.5:0.95	79.6%
Precision (macro)	100%
Recall (macro)	≈86%

Per-Class Performance

Class	Precision	Recall	mAP@0.5	mAP@0.5:0.95
● projector	1.000	0.804	0.896	0.729
● whiteboard	1.000	0.763	0.876	0.777
● fire_extinguisher	1.000	0.957	0.978	0.905
● door_sign	1.000	0.922	0.982	0.773
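The macro averages follow directly from the per-class figures above, as this short check shows. The dictionary layout and key names are illustrative; the numbers are the per-class values reported in this section.

```python
per_class = {
    "projector":         {"recall": 0.804, "map50": 0.896, "map50_95": 0.729},
    "whiteboard":        {"recall": 0.763, "map50": 0.876, "map50_95": 0.777},
    "fire_extinguisher": {"recall": 0.957, "map50": 0.978, "map50_95": 0.905},
    "door_sign":         {"recall": 0.922, "map50": 0.982, "map50_95": 0.773},
}

def macro(metric):
    """Unweighted mean of a metric over the four classes."""
    return sum(c[metric] for c in per_class.values()) / len(per_class)

map50 = macro("map50")
map50_95 = macro("map50_95")
print(f"mAP@0.5      = {map50:.3f}")              # 0.933
print(f"mAP@0.5:0.95 = {map50_95:.3f}")           # 0.796
print(f"recall       = {macro('recall'):.2f}")    # 0.86
print(f"gap          = {map50 - map50_95:.3f}")   # 0.137
```

Because the test split holds 20 images per class, the macro average weights each class equally regardless of how many boxes it contributes.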

Confusion Matrices

Figure 11. Raw-count confusion matrix (4×4 + background).

Figure 12. Row-normalised confusion matrix — diagonal dominance confirms class separability.

Figure 13. Custom dual-panel Seaborn confusion matrix.

Precision–Recall & Threshold Curves

Figure 14. Precision–Recall curves per class.

Figure 17. F1 score vs. confidence threshold.

Figure 15. Precision vs. confidence — all classes at P=1.0 above conf≈0.4.

Figure 16. Recall vs. confidence threshold.

Qualitative Predictions

Figure 18. 2×4 grid of test-set images with predicted bounding boxes and confidence scores overlaid.

Ultralytics Results Dashboard

Figure 10. 10-subplot training dashboard — loss curves, precision, recall, mAP across 100 epochs.

07 · Discussion

Strengths & Limitations

✓ Zero False Positives

Macro precision = 1.0 across all classes. On the 80-image test set the model produced no false positives above the 0.25 confidence threshold, which matters for inventory and safety-audit applications where false alarms erode trust.

✓ Strong mAP@0.5

93.3% mAP@0.5 on 80 test images with a 2.6M-parameter model. The figure is not directly comparable with the model's 39.5 COCO mAP@0.5:0.95, but it shows how well fine-tuning adapts the COCO weights to a narrow four-class domain.

✓ Safety-Critical Near-Saturation

fire_extinguisher (97.8%) and door_sign (98.2%) mAP@0.5 — the highest-value outcome for compliance auditing. Visually distinctive appearance aids detection.

✓ Edge-Ready Weight Size

22 MB PyTorch → 11 MB ONNX. ONNX export enables TensorRT, OpenVINO, and CoreML deployment without a PyTorch runtime dependency.

⚠ Projector/Whiteboard Recall

Recall of 0.804 (projector) and 0.763 (whiteboard). Both classes show high intra-class variance: ceiling-mounted vs. desktop projectors, reflective whiteboards with and without text. projector also has the lowest boxes/image ratio in the training split (0.82).

⚠ mAP@0.5:0.95 Gap

13.7-point gap between mAP@0.5 (0.933) and mAP@0.5:0.95 (0.796), meaning boxes are found reliably but localised less tightly at strict IoU thresholds. The lightweight regression head is a likely cause; YOLOv11s or cascaded refinement would be expected to narrow the gap.

⚠ Small Dataset

560 training images for 4 classes. 4.4% empty-label rate reduces effective positives. Scaling to ~500 images per class would likely push projector and whiteboard recall above 0.90.

⚠ Single Domain

All images from similar institutional settings. Performance on unusual campus layouts, diverse lighting, or non-standard equipment may degrade due to domain shift.

Improvement Directions

Issue	Suggested Fix
Low projector/whiteboard recall	100–200 additional images per class from varied viewpoints and lighting
mAP@0.5:0.95 gap	Upgrade to YOLOv11s; or add test-time augmentation (TTA)
Potential domain shift	Add online images from different institutions; apply colour jitter at inference
Overfitting after epoch 60	Add dropout (currently 0.0) or stronger weight decay

08 · Deployment

Inference & Export

Live Inference Modes — NB06

🖼

Single Image

Load, infer, display detection with latency measurement

📁

Batch / Folder

Per-class counts and throughput (images/s) over directory

🎬

Video File

Frame-by-frame annotation + annotated video + preview grid

📷

Webcam

Live feed with real-time overlay; stop via widget button

⚙

CONF_THRESH = 0.25 · IMGSZ = 640 · device auto-selected (CUDA → MPS → CPU)
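The CUDA → MPS → CPU fallback and the shared inference settings could look like the sketch below. It is an assumed outline, not NB06's code: `pick_device`, `auto_device` and `run_inference` are hypothetical helper names, and PyTorch and Ultralytics are imported lazily so the selection logic can be read and tested on its own.

```python
CONF_THRESH = 0.25
IMGSZ = 640

def pick_device(cuda_ok, mps_ok):
    """Prefer CUDA, then Apple MPS, then CPU."""
    if cuda_ok:
        return "cuda:0"
    if mps_ok:
        return "mps"
    return "cpu"

def auto_device():
    """Query PyTorch for the available backends."""
    import torch  # lazy import; requires PyTorch
    mps = getattr(torch.backends, "mps", None)
    return pick_device(torch.cuda.is_available(),
                       bool(mps and mps.is_available()))

def run_inference(source, weights="best.pt"):
    """Run detection on an image, folder, video file, or webcam index."""
    from ultralytics import YOLO  # lazy import; requires ultralytics
    model = YOLO(weights)
    return model.predict(source=source, conf=CONF_THRESH,
                         imgsz=IMGSZ, device=auto_device())

print(pick_device(cuda_ok=False, mps_ok=True))  # mps
```

With this layout, the four modes differ only in the `source` argument, for example an image path, a directory, a video path, or a webcam index such as 0.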

ONNX Export — NB07

Setting	Value
ONNX opset	12
Dynamic axes	True
Simplify	True
Size reduction	~50%

File	Format	Notes	Size
best.pt	PyTorch (.pt)	PyTorch checkpoint · full training state	~22 MB
best.onnx	ONNX	Framework-agnostic · TensorRT / OpenVINO / CoreML	~11 MB
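The export settings above correspond to an Ultralytics export call along these lines. This is a sketch under the assumption that NB07 uses the standard `export` method; the wrapper `export_onnx` and the helper `size_reduction` are hypothetical names.

```python
# Export settings from the table above.
EXPORT_ARGS = dict(format="onnx", opset=12, dynamic=True, simplify=True)

def export_onnx(weights="best.pt"):
    """Export the trained checkpoint to ONNX and return the output path."""
    from ultralytics import YOLO  # lazy import; requires ultralytics
    return YOLO(weights).export(**EXPORT_ARGS)

def size_reduction(before_mb, after_mb):
    """Relative size saved by the export, as a fraction."""
    return 1.0 - after_mb / before_mb

# Approximate file sizes reported in this section.
print(f"{size_reduction(22, 11):.0%}")  # 50%
```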

Conclusion

A complete object detection pipeline for four campus infrastructure classes was built and evaluated. YOLOv11n achieves mAP@0.5 = 93.3% with zero false positives on the test set at the chosen confidence threshold. Safety-critical classes fire_extinguisher and door_sign are detected at near-saturation accuracy (mAP@0.5 ≥ 97.8%). Both PyTorch and ONNX weights are available for real-time campus deployment.
