# Imagery Upload Workflow
In late 2025 we worked with a contractor to improve the imagery upload workflow.
The full discussion and plan can be seen here.
This summary of the workflow was generated by the Opus 4.6 LLM on 13/02/2026.
## High-Level Pipeline
```mermaid
flowchart LR
    A[Upload] --> B[Ingest]
    B --> C[Classify]
    C --> D[Review]
    D --> E[Process]
```

| Phase | Who / What | Summary |
|---|---|---|
| Upload | Browser → S3 | Resumable multi-part upload direct to S3 |
| Ingest | ARQ worker | Hash, EXIF extract, thumbnail, duplicate check |
| Classify | ARQ worker | Quality checks, GPS → task matching, tail removal |
| Review | User (UI) | Accept/reject images, verify tasks have coverage |
| Process | ARQ worker | Move files to task folders, trigger ODM |
## Image Statuses
Each image in `project_images` has a status that tracks where it
is in the pipeline:
```mermaid
stateDiagram-v2
    [*] --> staged : ingest OK
    [*] --> invalid_exif : missing EXIF / GPS
    [*] --> duplicate : same hash exists
    staged --> classifying : classification starts
    classifying --> assigned : quality OK + inside task area
    classifying --> rejected : failed quality check
    classifying --> unmatched : valid but outside all tasks
    classifying --> invalid_exif : EXIF/GPS issues found
    assigned --> rejected : flight tail detected
```

- `staged` - ingested and waiting for classification.
- `assigned` - passed all checks, matched to a task area.
- `rejected` - failed a quality check (blur, exposure, gimbal, flight tail).
- `unmatched` - valid image but GPS is outside all task boundaries.
- `invalid_exif` - EXIF or GPS data is missing / unreadable.
- `duplicate` - identical MD5 hash already exists in the project.
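The transitions in the diagram above can be captured as a small lookup table. This is a hypothetical sketch (the names mirror the `project_images` statuses, but `ALLOWED_TRANSITIONS` and `can_transition` are illustrative, not the actual implementation):

```python
# Allowed status transitions, keyed by current status (None = freshly ingested).
ALLOWED_TRANSITIONS = {
    None: {"staged", "invalid_exif", "duplicate"},  # ingest outcomes
    "staged": {"classifying"},
    "classifying": {"assigned", "rejected", "unmatched", "invalid_exif"},
    "assigned": {"rejected"},  # flight tail removal can still reject
}

def can_transition(current, new):
    """Return True if the pipeline allows moving from `current` to `new`."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

Terminal statuses (`rejected`, `unmatched`, `invalid_exif`, `duplicate`) have no outgoing entries, so any transition away from them is refused.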
## Phase 1 - Upload
Images are uploaded from the browser using S3 multi-part resumable uploads. The backend never proxies file bytes - the browser uploads chunks directly to S3 via presigned URLs.
S3 paths:

- Staging: `projects/{project_id}/user-uploads/`(unknown)
- Direct to task: `projects/{project_id}/{task_id}/images/`(unknown)
API flow (all under `/api/projects/`):

- `POST /initiate-multipart-upload/` → get `upload_id` + `file_key`
- `POST /sign-part-upload/` (per chunk) → get presigned URL
- Browser `PUT`s each chunk directly to S3
- `POST /complete-multipart-upload/` → finalise in S3, enqueue ingest job
- `GET /list-parts/` - for resuming interrupted uploads
- `POST /abort-multipart-upload/` - cancel and clean up
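For illustration, the chunking a client performs before signing each part might look like the sketch below. It assumes S3's standard 5 MiB minimum part size and the plain MD5-per-part ETag that S3 returns for non-encrypted uploads; the helper names are ours, not from the codebase:

```python
import hashlib

PART_SIZE = 5 * 1024 * 1024  # S3 minimum part size for all but the last part

def split_into_parts(data, part_size=PART_SIZE):
    """Yield (part_number, chunk) pairs; part numbers are 1-based per the S3 API."""
    for i in range(0, len(data), part_size):
        yield i // part_size + 1, data[i:i + part_size]

def etag_for_part(chunk):
    """For non-encrypted uploads, S3's ETag for a part is the part's MD5."""
    return hashlib.md5(chunk).hexdigest()
```

Recording the `(part_number, ETag)` pairs as each `PUT` succeeds is what makes resuming via `GET /list-parts/` possible: the client compares the parts S3 already has against its local list and re-uploads only the missing ones.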
## Phase 2 - Ingest
Each completed upload enqueues a `process_uploaded_image` ARQ job
(deferred 2 s for S3 consistency). Jobs are isolated - one corrupt
image won't affect others.
```mermaid
flowchart TD
    A[Download from S3] --> B[Calculate MD5 hash]
    B --> C{Duplicate?}
    C -- Yes --> D[Save as duplicate, stop]
    C -- No --> E[Extract EXIF + GPS]
    E --> F{Valid EXIF & GPS?}
    F -- No --> G[Status = invalid_exif]
    F -- Yes --> H[Status = staged]
    G --> I[Generate thumbnail]
    H --> I
    I --> J[Save to project_images]
```

A `project_images` row is always created so the UI can show
every upload attempt and its outcome.
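The ingest decision tree above boils down to a hash-then-branch step. A minimal sketch, where `existing_hashes` stands in for the duplicate lookup against `project_images` (the function name and signature are illustrative, not the real worker code):

```python
import hashlib

def ingest_outcome(image_bytes, existing_hashes, has_exif_gps):
    """Return (md5_hash, status) following the ingest decision tree.

    existing_hashes: set of MD5 hex digests already in the project.
    has_exif_gps: whether valid EXIF metadata and GPS were extracted.
    """
    digest = hashlib.md5(image_bytes).hexdigest()
    if digest in existing_hashes:
        return digest, "duplicate"      # same hash exists: stop early
    if not has_exif_gps:
        return digest, "invalid_exif"   # still thumbnailed and recorded
    return digest, "staged"             # ready for classification
```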
## Phase 3 - Classify
The user triggers classification via
`POST /{project_id}/classify-batch/`, which enqueues a
`classify_image_batch` ARQ job for all staged images in the batch.
Images are classified in parallel (up to 6 at a time). Each image goes through:
- EXIF check - must have metadata.
- GPS validation - valid coordinates in range.
- Gimbal pitch - camera must point down (≤ -20°).
- Sharpness - Laplacian variance must be ≥ 100.
- Exposure - rejects mostly black (lens cap) or mostly white (overexposed) images.
- Task matching - GPS point intersected against task polygons.
Images that pass all checks and fall inside a task area are `assigned`. Others are marked `rejected`, `unmatched`, or `invalid_exif`.
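The task-matching step intersects each image's GPS point with the task polygons. In practice this presumably uses a geometry library (e.g. Shapely or PostGIS), but the underlying test is the classic ray-casting algorithm, sketched here for illustration:

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting point-in-polygon test.

    polygon: list of (lon, lat) vertices; edges wrap from last back to first.
    Casts a horizontal ray from the point and counts edge crossings -
    an odd count means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray at this latitude cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside
```

An image whose GPS point returns `True` for some task polygon is a candidate for that task; one that returns `False` for every polygon ends up `unmatched`.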
After classification, flight tail removal runs per task to reject images captured during takeoff/landing transit (detected by analysing azimuthal shifts in the flight trajectory).
## Phase 4 - Review
The user reviews classified images in the UI:
- View images grouped by task, with thumbnails and status.
- Inspect image locations on a map overlaid with task boundaries.
- Check estimated coverage percentage per task.
- Accept previously rejected images, or delete bad ones.
- Mark tasks as verified ("fully flown") - this is required before processing can begin.
## Phase 5 - Process
The user triggers processing via
`POST /{project_id}/batch/{batch_id}/process/`, which enqueues a
`process_batch_images` ARQ job.
- Move files - `assigned` images in verified tasks are copied in S3 from `user-uploads/` to `{task_id}/images/`.
- Trigger ODM - for each task with ≥ 3 images, a `process_drone_images` job is enqueued for photogrammetric processing.
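The file move is a pure key rewrite within the same bucket. A sketch of the mapping, assuming staging keys of the form `projects/{project_id}/user-uploads/{name}` as listed in Phase 1 (the helper name is ours):

```python
def task_key(file_key, task_id):
    """Map a staging key under user-uploads/ to its task images/ key.

    e.g. projects/{project_id}/user-uploads/{name}
      -> projects/{project_id}/{task_id}/images/{name}
    """
    prefix, name = file_key.rsplit("/", 1)
    project_root = prefix.rsplit("/user-uploads", 1)[0]
    return f"{project_root}/{task_id}/images/{name}"
```

The actual worker would then issue an S3 server-side copy from the old key to the new one, so the bytes never leave S3.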
## End-to-End Sequence
```mermaid
sequenceDiagram
    actor User as Browser
    participant API as Backend API
    participant S3 as S3 / MinIO
    participant ARQ as ARQ Worker
    Note over User,ARQ: Phase 1 - Upload (per image)
    User->>API: Initiate multipart upload
    API->>S3: Create multipart upload
    S3-->>API: upload_id
    API-->>User: upload_id + file_key
    loop Each chunk
        User->>API: Sign part upload
        API-->>User: presigned URL
        User->>S3: PUT chunk to presigned URL
        S3-->>User: ETag
    end
    User->>API: Complete multipart upload
    API->>S3: Finalise upload
    API->>ARQ: Enqueue process_uploaded_image
    Note over User,ARQ: Phase 2 - Ingest (background)
    ARQ->>S3: Download file
    ARQ->>ARQ: Hash + EXIF + thumbnail
    ARQ->>S3: Upload thumbnail
    ARQ->>ARQ: Save to project_images (staged)
    Note over User,ARQ: Phase 3 - Classify
    User->>API: Trigger classification
    API->>ARQ: Enqueue classify_image_batch
    ARQ->>S3: Download images for quality checks
    ARQ->>ARQ: Quality checks + GPS → task matching
    ARQ->>ARQ: Flight tail removal
    Note over User,ARQ: Phase 4 - Review
    User->>API: Poll status, review images
    User->>API: Accept/reject, mark tasks verified
    Note over User,ARQ: Phase 5 - Process
    User->>API: Trigger processing
    API->>ARQ: Enqueue process_batch_images
    ARQ->>S3: Move files to task folders
    ARQ->>ARQ: Enqueue ODM jobs (per task)
```