whales-identification

Model Card: EcoMarineAI Cetacean Identification

Model Details

Intended Use

Pipeline

  1. CLIP zero-shot anti-fraud gate (open_clip_torch, ViT-B/32 LAION-2B). Computes cosine similarity against 10 positive prompts (whale/dolphin/cetacean photographs) and 14 negative prompts (text, people, buildings, fish, sharks, etc.). Rejects with rejection_reason: "not_a_marine_mammal" when the positive class probability falls below the calibrated threshold.
  2. Identification model — EfficientNet-B4 with ArcFace head (13 837 individuals active in a 15 587-slot head; 1 750 slots unused). Predicts an individual class_animal ID and maps it to a species name via species_map.csv (30 species).
  3. Confidence threshold — predictions below min_confidence are returned with rejection_reason: "low_confidence".
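
The three stages above can be read as a short chain of decisions. The sketch below is illustrative only. It assumes the gate score is a softmax over all prompt similarities with the probability mass summed over the positive prompts, which matches the "aggregate softmax score" wording used for cetacean_score. The names `cetacean_score` and `decide`, the logit scale of 100, and the default thresholds (0.5 and 0.1) are placeholders, not values taken from the project.

```python
import math
from typing import Optional, Sequence


def cetacean_score(pos_sims: Sequence[float], neg_sims: Sequence[float],
                   logit_scale: float = 100.0) -> float:
    """Softmax over all prompt similarities; return the mass on positive prompts."""
    logits = [logit_scale * s for s in list(pos_sims) + list(neg_sims)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return sum(exps[:len(pos_sims)]) / sum(exps)


def decide(pos_sims: Sequence[float], neg_sims: Sequence[float],
           id_probability: float,
           gate_threshold: float = 0.5,      # assumed value, not the calibrated one
           min_confidence: float = 0.1) -> Optional[str]:  # assumed value
    """Return the rejection_reason, or None when the prediction is accepted.

    In a real pipeline the identification model would only run after the gate
    accepts the image; here its confidence is passed in to keep the sketch small.
    """
    if cetacean_score(pos_sims, neg_sims) < gate_threshold:
        return "not_a_marine_mammal"
    if id_probability < min_confidence:
        return "low_confidence"
    return None
```

For example, `decide([0.30] * 10, [0.20] * 14, id_probability=0.9)` accepts the image, while swapping the two similarity lists makes the gate reject it.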

Training Data

Metrics

EcoMarineAI Metrics Report

Generated: 2026-04-15T20:32:58.300685+00:00
Manifest: data/test_split/manifest.csv
Sample size: 202
Model version: effb4-arcface-v1

Anti-fraud (CLIP gate, binary)

| Metric | Value |
| --- | --- |
| Positives | 100 |
| Negatives | 102 |
| TP / FP / TN / FN | 95 / 10 / 92 / 5 |
| TPR / Sensitivity / Recall | 0.95 |
| TNR / Specificity | 0.902 |
| Precision | 0.9048 |
| F1 | 0.9268 |
| ROC-AUC (cetacean_score) | 0.984 |
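
The rates in this table follow directly from the four confusion-matrix counts. A minimal sketch that reproduces them is shown below; the function name `binary_metrics` is illustrative and not taken from compute_metrics.py. ROC-AUC is omitted because it needs the per-image scores, not just the counts.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive recall, specificity, precision and F1 from confusion-matrix counts."""
    recall = tp / (tp + fn)            # TPR / sensitivity
    specificity = tn / (tn + fp)       # TNR
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


m = binary_metrics(tp=95, fp=10, tn=92, fn=5)
print({name: round(value, 4) for name, value in m.items()})
# {'recall': 0.95, 'specificity': 0.902, 'precision': 0.9048, 'f1': 0.9268}
```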

Identification (on positives only)

Species-level (TZ §Parameter 1 target)

The identification target of TZ (technical specification) §Parameter 1 is ecological monitoring: correctly naming the species of the cetacean visible in the photograph. "Precision of identification" here is the share of cetacean-labelled images for which the model outputs the correct species.

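| Metric | Value |
| --- | --- |
| Samples (cetacean-labelled) | 100 |
| Unique species | 10 |
| Species top-1 accuracy (all) | 0.3579 |
| Species correct / total | 34 / 100 |
| Species precision on clear images | 0.3214 |
| Images above clarity threshold | 28 |

The top-1 accuracy of 0.3579 equals 34 / 95, not 34 / 100 (which would be 0.34). The denominator therefore appears to be the 95 positives accepted by the CLIP gate and not all 100 cetacean-labelled samples. The precision on clear images corresponds to 9 correct out of 28.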

Individual-level (informational)

Matching a single photograph to one of 13 837 known individuals is materially harder than species recognition. This metric is reported for research transparency only and is not the TZ §Parameter 1 target.

| Metric | Value |
| --- | --- |
| Unique individuals in test | 93 |
| Individual top-1 accuracy | 0.22 |
| Individual top-5 accuracy | 0.25 |
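
Top-1 and top-5 accuracy differ only in how many of the model's ranked candidates may contain the true label. The sketch below shows the computation on toy data; the function name `top_k_accuracy` and the single-letter IDs are placeholders, not identifiers from the project.

```python
from typing import Sequence


def top_k_accuracy(ranked_preds: Sequence[Sequence[str]],
                   labels: Sequence[str], k: int) -> float:
    """Share of samples whose true label is among the first k ranked predictions."""
    hits = sum(1 for preds, label in zip(ranked_preds, labels) if label in preds[:k])
    return hits / len(labels)


# Toy example: four images, each with candidates ranked from most to least likely.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["a", "c", "b"]]
truth = ["a", "a", "b", "b"]

print(top_k_accuracy(ranked, truth, k=1))  # 0.25
print(top_k_accuracy(ranked, truth, k=2))  # 0.75
```

With 13 837 candidate individuals, a small gap between top-1 and top-5 (0.22 versus 0.25 in the table) indicates that when the correct individual is not ranked first, it is rarely among the next four candidates either.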

Image clarity (TZ §Parameter 1, Laplacian variance)

The TZ defines "sufficiently clear" as a Laplacian variance within 5% of the dataset mean, applied here as a lower bound of 0.95 × the mean. We compute the variance per image and report how many images pass the threshold.

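| Metric | Value |
| --- | --- |
| Mean Laplacian variance | 4485.01 |
| Min / Max | 4.96 / 40416.64 |
| TZ threshold (mean × 0.95) | 350.47 |
| Images above threshold | 133 |
| Images below threshold | 69 |

The reported threshold does not equal 0.95 × the mean listed in this table (0.95 × 4485.01 = 4260.76). A threshold of 350.47 corresponds to a reference mean of about 368.92, so it was presumably derived from a different dataset mean than the one shown here.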
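
The clarity measure can be illustrated with a dependency-free sketch. It applies the 4-neighbour Laplacian to interior pixels only, so its values will differ slightly from a library routine such as OpenCV's `cv2.Laplacian`, which also handles the image border. The function names and the `>=` comparison are assumptions; the report does not state how the project implements the check.

```python
from statistics import pvariance
from typing import List, Sequence


def laplacian_variance(gray: Sequence[Sequence[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a 2-D grayscale image (rows of intensities), at least 3x3.
    """
    height, width = len(gray), len(gray[0])
    responses: List[float] = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


def is_clear(variance: float, reference_mean: float, factor: float = 0.95) -> bool:
    """True when the variance reaches factor x reference_mean (assumed criterion)."""
    return variance >= factor * reference_mean


flat = [[128] * 5 for _ in range(5)]                                    # no edges
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]   # many edges
print(laplacian_variance(flat) == 0, laplacian_variance(checker) > 0)   # True True
```

A blurred image has weak edges, so its Laplacian responses cluster near zero and the variance is low; a sharp image produces large positive and negative responses and a high variance.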

Performance

| Metric | Value |
| --- | --- |
| Samples timed | 202 |
| Latency p50 / p95 / p99 (ms) | 174.16 / 298.87 / 416.73 |
| Latency mean (ms) | 127.79 |
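
Latency percentiles of this kind can be gathered with a simple timing loop. The sketch below uses the nearest-rank percentile definition; the report does not say which definition it uses, and interpolating implementations will give slightly different values. `time_calls` and `percentile` are illustrative names.

```python
import math
import time
from typing import Callable, List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn: Callable[[], object], n: int) -> List[float]:
    """Call fn n times and return each wall-clock duration in milliseconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


# Usage with a stand-in workload; a real run would call the inference function.
samples = time_calls(lambda: sum(range(10_000)), n=50)
print(percentile(samples, 50), percentile(samples, 95), percentile(samples, 99))
```

A mean below the median, as in the table, is what one would expect if a sizeable share of images exit early at the gate and finish much faster than the rest.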

Targets (TZ requirements)

| Requirement | Target | Status |
| --- | --- | --- |
| Precision (clear 1920×1080) | ≥ 80% | Measured by compute_metrics.py |
| Sensitivity / Recall | > 85% | Measured by compute_metrics.py |
| Specificity (TNR) | > 90% | Enforced by CLIP gate calibration |
| F1 | > 0.6 | Measured by compute_metrics.py |
| Latency per image | ≤ 8 s | p95 reported in METRICS.md |
| Robustness on noisy imagery | ≤ 20% drop | Tracked by integration_tests |
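
The targets can be checked mechanically against the measured values from the report. In the sketch below, pairing each requirement with a specific reported number is an assumption (in particular, "Precision (clear 1920×1080)" is paired with "Species precision on clear images"). Robustness is left out because the report contains no measured value for it.

```python
import operator

# requirement -> (measured value, comparison, target)
# Measured values are copied from the report tables; the pairing is assumed.
CHECKS = {
    "precision_clear": (0.3214, operator.ge, 0.80),
    "sensitivity": (0.95, operator.gt, 0.85),
    "specificity": (0.902, operator.gt, 0.90),
    "f1": (0.9268, operator.gt, 0.6),
    "latency_p95_s": (298.87 / 1000, operator.le, 8.0),  # ms converted to seconds
}


def evaluate(checks: dict) -> dict:
    """Return True/False per requirement depending on whether the target is met."""
    return {name: compare(measured, target)
            for name, (measured, compare, target) in checks.items()}


print(evaluate(CHECKS))
# {'precision_clear': False, 'sensitivity': True, 'specificity': True, 'f1': True, 'latency_p95_s': True}
```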

Input / Output

Input

Output (Detection schema)

| Field | Type | Meaning |
| --- | --- | --- |
| image_ind | string | Filename or batch entry name |
| bbox | int[4] | Detected region (currently the full image; a bbox detector is planned for v2) |
| class_animal | string | Individual ID (one of N classes); empty when rejected |
| id_animal | string | Species name mapped from class_animal |
| probability | float | Identification confidence (0.0–1.0) |
| mask | string? | Optional base64 PNG with the background removed |
| is_cetacean | bool | True iff the CLIP gate accepted the image |
| cetacean_score | float | CLIP positive-prompt aggregate softmax score |
| rejected | bool | True if the CLIP gate or the confidence threshold rejected the image |
| rejection_reason | enum? | not_a_marine_mammal / low_confidence / corrupted_image / null |
| model_version | string | e.g. effb4-arcface-v1 |
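
A client consuming this schema might mirror it as a typed record. The dataclass below is a sketch built from the field table alone: the validation rules and all example values (filename, individual ID, species name, scores) are hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_REASONS = {None, "not_a_marine_mammal", "low_confidence", "corrupted_image"}


@dataclass
class Detection:
    image_ind: str
    bbox: List[int]
    class_animal: str
    id_animal: str
    probability: float
    is_cetacean: bool
    cetacean_score: float
    rejected: bool
    model_version: str
    mask: Optional[str] = None              # base64 PNG, if requested
    rejection_reason: Optional[str] = None  # None corresponds to null

    def __post_init__(self) -> None:
        # Basic shape checks implied by the field table (assumed, not official).
        if len(self.bbox) != 4:
            raise ValueError("bbox must contain exactly four integers")
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0.0, 1.0]")
        if self.rejection_reason not in ALLOWED_REASONS:
            raise ValueError(f"unknown rejection_reason: {self.rejection_reason!r}")


# Hypothetical accepted detection; every value here is made up for illustration.
example = Detection(
    image_ind="whale_001.jpg", bbox=[0, 0, 1920, 1080],
    class_animal="individual_0001", id_animal="humpback_whale",
    probability=0.42, is_cetacean=True, cetacean_score=0.97,
    rejected=False, model_version="effb4-arcface-v1",
)
```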

Training Configuration

Limitations

Ethical Considerations

Citation

@software{ecomarineai2025,
  title={EcoMarineAI: Open Library for Cetacean Identification from Aerial Imagery},
  author={Baltsat, K.I. and Tarasov, A.A. and Vandanov, S.A. and Serov, A.I.},
  year={2025},
  url={https://github.com/0x0000dead/whales-identification}
}