# Imagery Upload Workflow
In late 2025 we worked with a contractor to improve the imagery upload workflow.
The full discussion and plan can be seen here.
This summary of the workflow was generated by the Opus 4.6 LLM on 13/02/2026.
## High-Level Pipeline
```mermaid
flowchart LR
    A[Upload] --> B[Ingest]
    B --> C[Classify]
    C --> D[Review]
    D --> E[Process]
```

| Phase | Who / What | Summary |
|---|---|---|
| Upload | Browser → S3 | Resumable multi-part upload direct to S3 |
| Ingest | ARQ worker | Hash, EXIF extract, thumbnail, duplicate check |
| Classify | ARQ worker | Quality checks, GPS → task matching, tail removal |
| Review | User (UI) | Accept/reject images, verify tasks have coverage |
| Process | ARQ worker | Move files to task folders, trigger ODM |
## Image Statuses
Each image in `project_images` has a status that tracks where it
is in the pipeline:
```mermaid
stateDiagram-v2
    [*] --> staged : ingest OK
    [*] --> invalid_exif : missing EXIF / GPS
    [*] --> duplicate : same hash exists
    staged --> classifying : classification starts
    classifying --> assigned : quality OK + inside task area
    classifying --> rejected : failed quality check
    classifying --> unmatched : valid but outside all tasks
    classifying --> invalid_exif : EXIF/GPS issues found
    assigned --> rejected : flight tail detected
```

- `staged` - ingested and waiting for classification.
- `assigned` - passed all checks, matched to a task area.
- `rejected` - failed a quality check (blur, exposure, gimbal, flight tail).
- `unmatched` - valid image but GPS is outside all task boundaries.
- `invalid_exif` - EXIF or GPS data is missing / unreadable.
- `duplicate` - identical MD5 hash already exists in the project.
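The transitions in the diagram above can be captured as a small lookup table. This is a hypothetical sketch (the names mirror the `project_images` statuses, but `ALLOWED_TRANSITIONS` and `can_transition` are illustrative, not the actual implementation):

```python
# Allowed status transitions, keyed by current status (None = freshly ingested).
ALLOWED_TRANSITIONS = {
    None: {"staged", "invalid_exif", "duplicate"},  # ingest outcomes
    "staged": {"classifying"},
    "classifying": {"assigned", "rejected", "unmatched", "invalid_exif"},
    "assigned": {"rejected"},  # flight tail removal can still reject
}

def can_transition(current, new):
    """Return True if the pipeline allows moving from `current` to `new`."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

Terminal statuses (`rejected`, `unmatched`, `invalid_exif`, `duplicate`) have no outgoing entries, so any transition away from them is refused.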
## Phase 1 - Upload
Images are uploaded from the browser using S3 multi-part resumable uploads. The backend never proxies file bytes - the browser uploads chunks directly to S3 via presigned URLs.
S3 paths:

- Staging: `projects/{project_id}/user-uploads/`(unknown)
- Direct to task: `projects/{project_id}/{task_id}/images/`(unknown)
API flow (all under `/api/projects/`):

- `POST /initiate-multipart-upload/` → get `upload_id` + `file_key`
- `POST /sign-part-upload/` (per chunk) → get presigned URL
- Browser `PUT`s each chunk directly to S3
- `POST /complete-multipart-upload/` → finalise in S3, enqueue ingest job
- `GET /list-parts/` - for resuming interrupted uploads
- `POST /abort-multipart-upload/` - cancel and clean up
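For illustration, the chunking a client performs before signing each part might look like the sketch below. It assumes S3's standard 5 MiB minimum part size and the plain MD5-per-part ETag that S3 returns for non-encrypted uploads; the helper names are ours, not from the codebase:

```python
import hashlib

PART_SIZE = 5 * 1024 * 1024  # S3 minimum part size for all but the last part

def split_into_parts(data, part_size=PART_SIZE):
    """Yield (part_number, chunk) pairs; part numbers are 1-based per the S3 API."""
    for i in range(0, len(data), part_size):
        yield i // part_size + 1, data[i:i + part_size]

def etag_for_part(chunk):
    """For non-encrypted uploads, S3's ETag for a part is the part's MD5."""
    return hashlib.md5(chunk).hexdigest()
```

Recording the `(part_number, ETag)` pairs as each `PUT` succeeds is what makes resuming via `GET /list-parts/` possible: the client compares the parts S3 already has against its local list and re-uploads only the missing ones.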
## Phase 2 - Ingest
Each completed upload enqueues a `process_uploaded_image` ARQ job
(deferred 2 s for S3 consistency). Jobs are isolated - one corrupt
image won't affect others.
```mermaid
flowchart TD
    A[Download from S3] --> B[Calculate MD5 hash]
    B --> C{Duplicate?}
    C -- Yes --> D[Save as duplicate, stop]
    C -- No --> E[Extract EXIF + GPS]
    E --> F{Valid EXIF & GPS?}
    F -- No --> G[Status = invalid_exif]
    F -- Yes --> H[Status = staged]
    G --> I[Generate thumbnail]
    H --> I
    I --> J[Save to project_images]
```

A `project_images` row is always created so the UI can show
every upload attempt and its outcome.
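The ingest decision tree above boils down to a hash-then-branch step. A minimal sketch, where `existing_hashes` stands in for the duplicate lookup against `project_images` (the function name and signature are illustrative, not the real worker code):

```python
import hashlib

def ingest_outcome(image_bytes, existing_hashes, has_exif_gps):
    """Return (md5_hash, status) following the ingest decision tree.

    existing_hashes: set of MD5 hex digests already in the project.
    has_exif_gps: whether valid EXIF metadata and GPS were extracted.
    """
    digest = hashlib.md5(image_bytes).hexdigest()
    if digest in existing_hashes:
        return digest, "duplicate"      # same hash exists: stop early
    if not has_exif_gps:
        return digest, "invalid_exif"   # still thumbnailed and recorded
    return digest, "staged"             # ready for classification
```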
## Phase 3 - Classify
The user triggers classification via
`POST /{project_id}/classify-batch/`, which enqueues a
`classify_image_batch` ARQ job for all staged images in the batch.
Images are classified in parallel (up to 6 at a time). Each image goes through:
- EXIF check - must have metadata.
- GPS validation - valid coordinates in range.
- Gimbal pitch - camera must point down (≤ -20°).
- Sharpness - Laplacian variance must be ≥ 100.
- Exposure - rejects mostly black (lens cap) or mostly white (overexposed) images.
- Task matching - GPS point intersected against task polygons.
Images that pass all checks and fall inside a task area are `assigned`. Others are marked `rejected`, `unmatched`, or `invalid_exif`.
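The task-matching step intersects each image's GPS point with the task polygons. In practice this presumably uses a geometry library (e.g. Shapely or PostGIS), but the underlying test is the classic ray-casting algorithm, sketched here for illustration:

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting point-in-polygon test.

    polygon: list of (lon, lat) vertices; edges wrap from last back to first.
    Casts a horizontal ray from the point and counts edge crossings -
    an odd count means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray at this latitude cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside
```

An image whose GPS point returns `True` for some task polygon is a candidate for that task; one that returns `False` for every polygon ends up `unmatched`.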
After classification, flight tail removal runs per task to reject images captured during takeoff/landing transit (detected by analysing azimuthal shifts in the flight trajectory).
## Phase 4 - Review
The user reviews classified images in the UI:
- View images grouped by task, with thumbnails and status.
- Inspect image locations on a map overlaid with task boundaries.
- Check estimated coverage percentage per task.
- Accept previously rejected images, or delete bad ones.
- Mark tasks as verified ("fully flown") - this is required before processing can begin.
## Phase 5 - Process
The user triggers processing via
`POST /{project_id}/batch/{batch_id}/process/`, which enqueues a
`process_batch_images` ARQ job.
- Move files - `assigned` images in verified tasks are copied in S3 from `user-uploads/` to `{task_id}/images/`.
- Trigger ODM - for each task with ≥ 3 images, a `process_drone_images` job is enqueued for photogrammetric processing.
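The file move is a pure key rewrite within the same bucket. A sketch of the mapping, assuming staging keys of the form `projects/{project_id}/user-uploads/{name}` as listed in Phase 1 (the helper name is ours):

```python
def task_key(file_key, task_id):
    """Map a staging key under user-uploads/ to its task images/ key.

    e.g. projects/{project_id}/user-uploads/{name}
      -> projects/{project_id}/{task_id}/images/{name}
    """
    prefix, name = file_key.rsplit("/", 1)
    project_root = prefix.rsplit("/user-uploads", 1)[0]
    return f"{project_root}/{task_id}/images/{name}"
```

The actual worker would then issue an S3 server-side copy from the old key to the new one, so the bytes never leave S3.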
## End-to-End Sequence
```mermaid
sequenceDiagram
    actor User as Browser
    participant API as Backend API
    participant S3 as S3 / MinIO
    participant ARQ as ARQ Worker
    Note over User,ARQ: Phase 1 - Upload (per image)
    User->>API: Initiate multipart upload
    API->>S3: Create multipart upload
    S3-->>API: upload_id
    API-->>User: upload_id + file_key
    loop Each chunk
        User->>API: Sign part upload
        API-->>User: presigned URL
        User->>S3: PUT chunk to presigned URL
        S3-->>User: ETag
    end
    User->>API: Complete multipart upload
    API->>S3: Finalise upload
    API->>ARQ: Enqueue process_uploaded_image
    Note over User,ARQ: Phase 2 - Ingest (background)
    ARQ->>S3: Download file
    ARQ->>ARQ: Hash + EXIF + thumbnail
    ARQ->>S3: Upload thumbnail
    ARQ->>ARQ: Save to project_images (staged)
    Note over User,ARQ: Phase 3 - Classify
    User->>API: Trigger classification
    API->>ARQ: Enqueue classify_image_batch
    ARQ->>S3: Download images for quality checks
    ARQ->>ARQ: Quality checks + GPS → task matching
    ARQ->>ARQ: Flight tail removal
    Note over User,ARQ: Phase 4 - Review
    User->>API: Poll status, review images
    User->>API: Accept/reject, mark tasks verified
    Note over User,ARQ: Phase 5 - Process
    User->>API: Trigger processing
    API->>ARQ: Enqueue process_batch_images
    ARQ->>S3: Move files to task folders
    ARQ->>ARQ: Enqueue ODM jobs (per task)
```