Switch between modes from the AI Analysis dock. The rest of the pipeline (frame extraction, KLV sync, georeferencing, and VMTI encoding) is identical regardless of mode.
The .onnx model file is loaded from the local filesystem; inference produces a [1, 84, 8400] output tensor → NMS → detections.

# Python (QGIS)
import onnxruntime as ort

session = ort.InferenceSession(
    "yolov11.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
# preprocessed_tensor: NCHW float32 frame prepared upstream
outputs = session.run(None, {"images": preprocessed_tensor})
# Apply NMS and confidence filter
detections = postprocess(outputs, conf=0.5, iou=0.45)
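The `postprocess` step is not shown above. A minimal sketch of what it might do, assuming the common YOLO export layout of 4 box values (cx, cy, w, h) plus 80 class scores per candidate in the `[1, 84, 8400]` tensor, with a greedy class-agnostic NMS (the helper name and exact thresholds mirror the call above; the internals are illustrative):

```python
import numpy as np

def postprocess(outputs, conf=0.5, iou=0.45):
    """Decode a YOLO [1, 84, 8400] output and apply confidence filter + NMS.
    Sketch only: assumes rows are [cx, cy, w, h, 80 class scores]."""
    preds = outputs[0][0].T                  # (8400, 84)
    boxes = preds[:, :4]                     # cx, cy, w, h in pixels
    scores = preds[:, 4:]                    # per-class confidences
    cls = scores.argmax(axis=1)
    best = scores.max(axis=1)
    keep = best >= conf                      # confidence filter
    boxes, cls, best = boxes[keep], cls[keep], best[keep]
    # Convert centre format to corner format (x1, y1, x2, y2)
    xyxy = np.empty_like(boxes)
    xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
    xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
    xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
    xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2
    # Greedy NMS: keep highest-scoring box, drop overlapping neighbours
    order = best.argsort()[::-1]
    kept = []
    while order.size:
        i = order[0]
        kept.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        xx1 = np.maximum(xyxy[i, 0], xyxy[rest, 0])
        yy1 = np.maximum(xyxy[i, 1], xyxy[rest, 1])
        xx2 = np.minimum(xyxy[i, 2], xyxy[rest, 2])
        yy2 = np.minimum(xyxy[i, 3], xyxy[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (xyxy[i, 2] - xyxy[i, 0]) * (xyxy[i, 3] - xyxy[i, 1])
        area_r = (xyxy[rest, 2] - xyxy[rest, 0]) * (xyxy[rest, 3] - xyxy[rest, 1])
        iou_v = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou_v < iou]
    return [(xyxy[i], int(cls[i]), float(best[i])) for i in kept]
```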
# Python (QGIS): async httpx client
payload = {
    "model": "qwen/qwen3-vl",
    "messages": [{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"}
        }, {
            "type": "text",
            "text": "Detect objects. Return JSON array: [{bbox:[x,y,w,h], label, confidence}]"
        }]
    }]
}
| Model | Mode | Provider | Best For | Air-Gap | Config |
|---|---|---|---|---|---|
| YOLOv11 (default) | Local | Ships with GeoFMV | Fast object detection, vehicles, personnel, infrastructure | ✓ | Loaded from local .onnx path |
| Custom ONNX model | Local | User-supplied | Domain-specific detection (aircraft, vessels, damage) | ✓ | Any YOLO-compatible ONNX |
| QWEN3-VL | Remote | Alibaba via OpenRouter | High-accuracy object detection, scene analysis, OCR | ✗ | qwen/qwen3-vl |
| Llama-3.2-Vision | Remote | Meta via OpenRouter | General vision, scene captioning, VQA | ✗ | meta-llama/llama-3.2-vision |
| ERNIE 4.5 | Remote | Baidu via OpenRouter | OCR, text spotting on signage and vehicle markings | ✗ | baidu/ernie-4.5 |
| GPT-4.1-mini | Remote | OpenAI direct | Scene summarization, fallback detection | ✗ | openai/gpt-4.1-mini |
When MISB ST 0601 Tags 82–89 (Corner Latitude/Longitude Points 1–4) are present, the georeferencing engine computes a perspective transform (homography matrix) from the four frame pixel corners to the four geographic corners. Any pixel coordinate, including AI bounding box centres, can then be projected to a WGS 84 Lat/Lon with sub-pixel accuracy.
import cv2
import numpy as np

# Compute 3×3 perspective transform matrix
src_pts = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)
dst_pts = np.array([nw, ne, se, sw], dtype=np.float32)  # KLV Tags 82-89
M, _ = cv2.findHomography(src_pts, dst_pts)
# Transform bounding box centre pixel → Lat/Lon
geo = cv2.perspectiveTransform(
    np.array([[[px, py]]], dtype=np.float32), M
)[0][0]  # → [longitude, latitude]
When Corner Coordinates are absent, the engine projects a unit look vector from the sensor's 3D position (Tags 13/14/15), rotated by the platform attitude (Tags 5/6/7) and adjusted for the pixel's position relative to the sensor FOV (Tags 16/17). The ray intersects the active DEM layer or the WGS 84 reference ellipsoid, and the intersection point is converted from ECEF to geodetic coordinates.
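A simplified sketch of the ellipsoid branch of this fallback (no DEM), assuming the look vector has already been rotated into ECEF coordinates; the function names and structure are illustrative, not the engine's actual implementation:

```python
import numpy as np

# WGS 84 ellipsoid constants
A = 6378137.0        # semi-major axis (m)
B = 6356752.314245   # semi-minor axis (m)

def geodetic_to_ecef(lat_deg, lon_deg, h):
    """Convert geodetic coordinates (degrees, metres) to ECEF metres."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    e2 = 1.0 - (B / A) ** 2
    n = A / np.sqrt(1.0 - e2 * np.sin(lat) ** 2)   # prime vertical radius
    return np.array([
        (n + h) * np.cos(lat) * np.cos(lon),
        (n + h) * np.cos(lat) * np.sin(lon),
        (n * (1.0 - e2) + h) * np.sin(lat),
    ])

def ray_ellipsoid_intersection(origin, direction):
    """First intersection of a ray with the WGS 84 ellipsoid, in ECEF.
    Solves the quadratic |diag(1/A, 1/A, 1/B) · (o + t·d)|² = 1 for t."""
    scale = np.array([1.0 / A, 1.0 / A, 1.0 / B])
    o, d = origin * scale, direction * scale
    a = d @ d
    b = 2.0 * (o @ d)
    c = o @ o - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None                      # ray misses the ellipsoid
    t = (-b - np.sqrt(disc)) / (2.0 * a)
    if t < 0:
        return None                      # intersection behind the sensor
    return origin + t * direction
```

The real engine would then convert the resulting ECEF point back to geodetic Lat/Lon, and would intersect the DEM instead of the ellipsoid when one is loaded.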
With homography (corner coordinates present) and a sensor altitude of 100–5,000 m AGL: <15 m CEP.
Homography residual is stored on each detection for quality assessment.
Residuals >50 m trigger a warning but the detection is not discarded.
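One way such a residual could be computed (a sketch of the idea, not necessarily the engine's method): reproject the frame corners through the homography and measure the mean distance to the surveyed KLV corners, with a constant degrees-to-metres factor as a simplifying assumption; the helper name is hypothetical:

```python
import numpy as np

def homography_residual_m(M, src_px, dst_lonlat, metres_per_degree=111320.0):
    """Mean corner reprojection error of homography M, in metres.
    src_px: (N, 2) pixel corners; dst_lonlat: (N, 2) KLV geographic corners."""
    src_h = np.hstack([src_px, np.ones((len(src_px), 1))])  # homogeneous coords
    proj = (M @ src_h.T).T
    proj = proj[:, :2] / proj[:, 2:3]                       # perspective divide
    err_deg = np.linalg.norm(proj - dst_lonlat, axis=1)
    return float(err_deg.mean() * metres_per_degree)
```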
Extract one frame per configurable interval. Default: 1 fps. Range: 0.1–30 fps. A 10-minute video at 1 fps yields 600 frames submitted for inference.
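The interval-mode arithmetic can be sketched as follows (helper name assumed for illustration):

```python
def frame_indices(duration_s: float, video_fps: float, extract_fps: float = 1.0) -> list:
    """Indices of source frames to extract at a fixed sampling rate.
    A 600 s video at 30 fps sampled at 1 fps yields 600 indices."""
    step = video_fps / extract_fps          # source frames between samples
    count = int(duration_s * extract_fps)   # total frames to extract
    return [round(i * step) for i in range(count)]
```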
Extract a frame whenever the Sensor HFOV changes by more than a configurable threshold (default: 2°) between consecutive KLV packets. Captures scene changes due to zoom or attitude shifts.
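The scene-change trigger can be sketched as a pass over decoded KLV packets; the function name and the packet representation (dicts with an `hfov` key) are assumptions for illustration:

```python
def scene_change_indices(klv_packets, threshold_deg: float = 2.0):
    """Yield indices of KLV packets whose Sensor HFOV (ST 0601 Tag 16)
    differs from the previous packet's by more than the threshold."""
    prev = None
    for i, pkt in enumerate(klv_packets):
        hfov = pkt["hfov"]
        if prev is not None and abs(hfov - prev) > threshold_deg:
            yield i   # zoom or attitude shift: extract this frame
        prev = hfov
```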
Analyst triggers extraction of the current frame on demand, or initiates full-video batch processing from start to finish. Progress indicator shows percentage and ETA.
The OpenRouter API key is stored in the QGIS Authentication Manager
(QgsAuthManager) as an authcfg entry.
It is never written to any config file, log file, or environment variable.
The settings dialog allows entry, update, and deletion without displaying
the key in plaintext.
On ArcGIS Pro, the key is stored in the Windows Credential Manager via
Windows.Security.Credentials.PasswordVault (DPAPI).
The key is retrieved at runtime from the vault; it never touches disk.
When Local Mode is active, zero outbound AI network calls are made.