Model Card: EcoMarineAI Cetacean Identification
Model Details
- Model Name: EcoMarineAI EfficientNet-B4 ArcFace + CLIP anti-fraud gate
- Version: 1.2.0 (effb4-arcface-v1)
- Type: Two-stage pipeline — CLIP zero-shot gate → multiclass identification with metric learning (ArcFace)
- Architecture: EfficientNet-B4 (identification, 512-dim embedding, ArcFace head) + OpenCLIP ViT-B/32 LAION-2B (anti-fraud gate)
- Framework: PyTorch 2.4.1 + timm 1.0.11 + open_clip_torch 2.24+
- License: MIT (code), CC-BY-NC-4.0 (model weights), CC-BY-NC-4.0 (training data) — see LICENSE, LICENSE_MODELS.md, LICENSE_DATA.md
- Repository: 0x0000dead/ecomarineai-cetacean-effb4
Intended Use
- Primary use: Automated identification of individual cetaceans (whales and dolphins) from aerial photography for conservation and scientific monitoring.
- Users: Marine biologists, ecology researchers, government environmental agencies, conservation organizations.
- Out-of-scope: Real-time video processing, underwater photography, species not in the training set, commercial wildlife exploitation.
Pipeline
- CLIP zero-shot anti-fraud gate (open_clip_torch, ViT-B/32 LAION-2B). Computes the cosine similarity between the image embedding and the text embeddings of 10 positive prompts (whale/dolphin/cetacean photographs) and 14 negative prompts (text, people, buildings, fish, sharks, etc.). Rejects the image with rejection_reason: "not_a_marine_mammal" when the aggregate positive-prompt probability (cetacean_score) falls below the calibrated threshold.
- Identification model: EfficientNet-B4 with an ArcFace head (13 837 individuals active in a 15 587-slot head; 1 750 slots unused). Predicts an individual ID (class_animal) and maps it to a species name via species_map.csv (30 species).
- Confidence threshold: predictions whose probability is below min_confidence are returned with rejection_reason: "low_confidence".
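The three stages above can be sketched as follows. This is a minimal, dependency-free illustration and not the project's code. Several details are assumptions: the function names, the Decision type, a logit_scale of 100 (the customary CLIP temperature), and the rule that cetacean_score is the softmax mass on the positive prompts. That last rule is one reading of "positive-prompt aggregate softmax score". Embeddings are plain lists here. In practice they would come from the OpenCLIP image and text encoders.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """L2-normalise a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cetacean_score(image_emb: Sequence[float],
                   prompt_embs: Sequence[Sequence[float]],
                   n_positive: int,
                   logit_scale: float = 100.0) -> float:
    """Softmax over all prompt similarities; return the mass on the positive prompts.

    Assumes the first `n_positive` rows of `prompt_embs` are the cetacean prompts.
    """
    img = _unit(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, _unit(p))) for p in prompt_embs]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return sum(exps[:n_positive]) / sum(exps)


@dataclass
class Decision:
    rejected: bool
    rejection_reason: Optional[str]
    class_animal: str = ""  # empty when rejected, as in the output schema
    id_animal: str = ""


def decide(score: float, gate_threshold: float, individual: str, probability: float,
           min_confidence: float, species_map: Dict[str, str]) -> Decision:
    """Hypothetical glue: gate first, then confidence threshold, then species lookup."""
    if score < gate_threshold:
        return Decision(True, "not_a_marine_mammal")
    if probability < min_confidence:
        return Decision(True, "low_confidence")
    return Decision(False, None, individual, species_map.get(individual, ""))
```

In this sketch the gate runs first, so low_confidence is only reported for images the gate has already accepted.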
Training Data
- Source of trained weights: Happy Whale Kaggle competition (51 034 images, 15 587 unique individual_id values). The currently deployed checkpoint was trained on this dataset only (upstream: ktakita/happywhale-exp004-effb4-trainall, fold 0).
- TZ (technical specification) aggregate target: 80 000 images / 1 000 individuals. The TZ specifies this as the combined training corpus of Happy Whale plus a private dataset from the Ministry of Natural Resources and Ecology of the Russian Federation. The Ministry portion is covered by the Innovation Promotion Fund (ФСИ) grant agreement and is not redistributable. Public verification of the 80 000 figure is therefore limited to the 51 034 public Happy Whale images that this checkpoint actually saw. The 13 837 active individuals (in a 15 587-slot head) exceed the TZ floor of 1 000 individuals by an order of magnitude.
- Classes: 13 837 unique individual_id values are active in the checkpoint. The remaining 1 750 of the 15 587 ArcFace slots are unused because training was restricted to individuals with ≥ 2 samples.
- Species: 30 species in total, including humpback whale, blue whale, fin whale, beluga, killer whale, minke whale, Bryde’s whale, Cuvier’s beaked whale, false killer whale, melon-headed whale, bottlenose dolphin, common dolphin, dusky dolphin and Commerson’s dolphin.
- License: CC-BY-NC-4.0 (inherited from the Happy Whale upstream; see LICENSE_DATA.md).
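The filtering rule in the Classes entry can be illustrated with a short sketch. It assumes the training metadata is a sequence of (image, individual_id) pairs. The function name head_slots and the sorted slot ordering are hypothetical choices and are not taken from the upstream notebook.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def head_slots(rows: Iterable[Tuple[str, str]],
               min_samples: int = 2) -> Tuple[Dict[str, int], Set[str]]:
    """Return (slot_index, active_ids) for (image, individual_id) rows.

    Every individual gets a slot in the classification head; only individuals
    with at least `min_samples` images are used for training ("active" slots).
    """
    counts = Counter(individual for _, individual in rows)
    slot_index = {ind: i for i, ind in enumerate(sorted(counts))}
    active = {ind for ind, n in counts.items() if n >= min_samples}
    return slot_index, active
```

The number of unused slots is `len(slot_index) - len(active)`, which is the 15 587 − 13 837 = 1 750 difference described above.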
Metrics
EcoMarineAI Metrics Report
- Generated: 2026-04-15T20:32:58.300685+00:00
- Manifest: data/test_split/manifest.csv
- Sample size: 202 images
- Model version: effb4-arcface-v1
Anti-fraud (CLIP gate, binary)
| Metric | Value |
|---|---|
| Positives | 100 |
| Negatives | 102 |
| TP / FP / TN / FN | 95 / 10 / 92 / 5 |
| TPR (sensitivity, recall) | 0.95 |
| TNR (specificity) | 0.902 |
| Precision | 0.9048 |
| F1 | 0.9268 |
| ROC-AUC (cetacean_score) | 0.984 |
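Every derived value in the table follows from the four confusion counts. The sketch below recomputes them with the standard definitions: TPR = TP / (TP + FN), TNR = TN / (TN + FP), precision = TP / (TP + FP), and F1 as the harmonic mean of precision and recall. The helper name is illustrative and is not taken from compute_metrics.py.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification rates from a confusion matrix."""
    tpr = tp / (tp + fn)            # sensitivity / recall
    tnr = tn / (tn + fp)            # specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"tpr": tpr, "tnr": tnr, "precision": precision, "f1": f1}


m = binary_metrics(tp=95, fp=10, tn=92, fn=5)
print({k: round(v, 4) for k, v in m.items()})
# → {'tpr': 0.95, 'tnr': 0.902, 'precision': 0.9048, 'f1': 0.9268}
```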
Identification (on positives only)
Species-level (TZ §Parameter 1 target)
The identification target of TZ §Parameter 1 is ecological monitoring: correctly naming the species of the cetacean visible in the photograph. "Precision of identification" here means the share of cetacean-labelled images for which the model outputs the correct species.
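| Metric | Value |
|---|---|
| Samples (cetacean-labelled) | 100 |
| Unique species | 10 |
| Species top-1 accuracy (all) | 0.3579 |
| Species correct / total | 34 / 100 |
| Species precision on clear images | 0.3214 |
| Images above clarity threshold | 28 |

The two accuracy rows use different denominators. 34 / 100 = 0.34, whereas 0.3579 equals 34 / 95, which matches the 95 cetacean images accepted by the CLIP gate (TP above).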
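Species accuracy can be derived from individual-level predictions by passing each predicted class_animal through species_map.csv. The following is a minimal sketch. It assumes the map has been loaded into a dict from individual ID to species name. It also assumes that rejected images carry an empty prediction and count as misses, which fixes the denominator at all labelled images.

```python
from typing import Mapping, Sequence, Tuple


def species_accuracy(pred_individuals: Sequence[str],
                     true_species: Sequence[str],
                     species_map: Mapping[str, str]) -> Tuple[int, int, float]:
    """Map each predicted individual to its species and score it against ground truth.

    Returns (correct, total, accuracy). An empty prediction (rejected image)
    counts as wrong but stays in the denominator.
    """
    correct = sum(
        1 for pred, truth in zip(pred_individuals, true_species)
        if pred and species_map.get(pred) == truth
    )
    total = len(true_species)
    return correct, total, correct / total
```

A wrong individual can still yield the correct species. For that reason, species accuracy is never lower than individual top-1 accuracy when both are computed over the same images.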
Individual-level (research transparency only)
Matching a single photograph to one of 13 837 known individuals is materially harder than species recognition. This metric is reported for research transparency only and is not the TZ §Parameter 1 target.
| Metric | Value |
|---|---|
| Unique individuals in test | 93 |
| Individual top-1 accuracy | 0.22 |
| Individual top-5 accuracy | 0.25 |
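Top-k accuracy counts an image as correct when its true individual_id appears anywhere among the model's k highest-scoring predictions. The following is a minimal sketch. It assumes each image's predictions are already sorted by descending probability, and the helper name is illustrative.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Share of images whose true individual_id is among the first k ranked predictions."""
    hits = sum(1 for preds, t in zip(ranked, truth) if t in list(preds)[:k])
    return hits / len(truth)
```

By construction the score cannot decrease as k grows, so top-5 ≥ top-1 always holds.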
Image clarity (TZ §Parameter 1, Laplacian variance)
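The TZ defines "sufficiently clear" as a Laplacian variance within 5% of the dataset mean. The report applies this as a lower bound (variance ≥ mean × 0.95): we compute the variance per image and count how many images pass. The reported threshold of 350.47 is not 0.95 × the sample mean listed below (0.95 × 4485.01 ≈ 4260.76). It corresponds instead to a reference mean of about 368.92, so the threshold was derived from a different mean than the one shown in the table.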
| Metric | Value |
|---|---|
| Mean Laplacian variance | 4485.01 |
| Min / Max | 4.96 / 40416.64 |
| TZ threshold (mean × 0.95) | 350.47 |
| Images above threshold | 133 |
| Images below threshold | 69 |
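The clarity measure can be sketched without OpenCV as the variance of the 4-neighbour Laplacian response. The kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] is the one cv2.Laplacian applies with its default ksize=1. This sketch scores interior pixels only and does not pad the border, so its values will differ slightly from cv2.Laplacian(gray, cv2.CV_64F).var(). The card does not state whether EcoMarineAI uses OpenCV. It also does not state which reference mean would be passed to is_clear, a hypothetical helper.

```python
from typing import List


def laplacian_variance(gray: List[List[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels of a grayscale image."""
    h, w = len(gray), len(gray[0])
    responses = [
        gray[i - 1][j] + gray[i + 1][j] + gray[i][j - 1] + gray[i][j + 1] - 4 * gray[i][j]
        for i in range(1, h - 1)
        for j in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_clear(variance: float, reference_mean: float, tolerance: float = 0.05) -> bool:
    """Lower-bound reading of the TZ rule: variance >= (1 - tolerance) * reference mean."""
    return variance >= (1.0 - tolerance) * reference_mean
```

A flat image scores 0, and any sharp edge raises the score. For this reason, heavily right-skewed distributions like the one in the table (mean 4485, max 40 417) are typical.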
Latency
| Metric | Value |
|---|---|
| Samples timed | 202 |
| Latency p50 / p95 / p99 (ms) | 174.16 / 298.87 / 416.73 |
| Latency mean (ms) | 127.79 |
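The percentile columns can be reproduced from raw per-image timings. This sketch uses linear interpolation between the two nearest ranks, which is NumPy's default percentile method. Whether compute_metrics.py uses the same rule is an assumption. Other rules, such as nearest-rank, give slightly different p95/p99 values on 202 samples.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between closest ranks."""
    data = sorted(samples)
    pos = (len(data) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def latency_summary(ms: Sequence[float]) -> Dict[str, float]:
    """Summary statistics in the same shape as the latency table."""
    return {
        "p50": percentile(ms, 50),
        "p95": percentile(ms, 95),
        "p99": percentile(ms, 99),
        "mean": sum(ms) / len(ms),
    }
```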
Targets (TZ requirements)
| Requirement | Target | Status |
|---|---|---|
| Precision (clear 1920×1080) | ≥ 80% | Measured by compute_metrics.py |
| Sensitivity / Recall | > 85% | Measured by compute_metrics.py |
| Specificity (TNR) | > 90% | Enforced by CLIP gate calibration |
| F1 | > 0.6 | Measured by compute_metrics.py |
| Latency per image | ≤ 8 s | p95 reported in METRICS.md |
| Robustness on noisy imagery | ≤ 20% drop | Tracked by integration_tests |
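A target check has to honour the mix of strict (>) and inclusive (≥, ≤) comparisons in the table. The robustness row also needs a relative drop rather than an absolute one. The following is a minimal sketch. The metric keys, the use of p95 latency for the ≤ 8 s row, and the relative-drop formula are assumptions, because the card does not spell out how compute_metrics.py or integration_tests evaluate them.

```python
import operator

# (comparison, threshold) per Target row; ">" is strict, "≥"/"≤" are inclusive.
TARGETS = {
    "precision_clear": (operator.ge, 0.80),
    "sensitivity": (operator.gt, 0.85),
    "specificity": (operator.gt, 0.90),
    "f1": (operator.gt, 0.6),
    "latency_p95_s": (operator.le, 8.0),
    "robustness_drop": (operator.le, 0.20),
}


def relative_drop(clean: float, noisy: float) -> float:
    """Fraction of a metric lost on noisy imagery, relative to the clean value."""
    return (clean - noisy) / clean


def check(measured: dict) -> dict:
    """Return {metric: passed} for each target whose value is present in `measured`."""
    return {
        name: op(measured[name], threshold)
        for name, (op, threshold) in TARGETS.items()
        if name in measured
    }


# Gate and latency values from the tables above (latency converted from ms to s).
print(check({"sensitivity": 0.95, "specificity": 0.902, "f1": 0.9268, "latency_p95_s": 0.29887}))
# → {'sensitivity': True, 'specificity': True, 'f1': True, 'latency_p95_s': True}
```

Specificity passes only narrowly (0.902 against a strict > 0.90), so the choice of comparison operator matters for that row.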
Input
- Format: RGB images (JPEG, PNG, WEBP).
- Resolution: Any (resized to 448×448 internally for identification, 224×224 for CLIP).
- Normalization: ImageNet stats (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]).
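Per pixel, this normalization scales 8-bit values to [0, 1] and then standardises each channel. In a PyTorch pipeline the same arithmetic is normally done by torchvision's ToTensor() followed by Normalize(mean, std). The following dependency-free sketch uses a hypothetical helper name and does not show resizing.

```python
from typing import Tuple

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: Tuple[int, int, int]) -> Tuple[float, ...]:
    """Scale an 8-bit RGB pixel to [0, 1], then standardise each channel with ImageNet stats."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))
```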
Output (Detection schema)
| Field | Type | Meaning |
|---|---|---|
| image_ind | string | Filename or batch entry name |
| bbox | int[4] | Detected region (currently the full image; a bbox detector is planned for v2) |
| class_animal | string | Individual ID (one of the N identification classes); empty when rejected |
| id_animal | string | Species name mapped from class_animal |
| probability | float | Identification confidence (0.0–1.0) |
| mask | string? | Optional base64 PNG with the background removed |
| is_cetacean | bool | True iff the CLIP gate accepted the image |
| cetacean_score | float | Aggregate softmax score of the CLIP positive prompts |
| rejected | bool | True if the CLIP gate or the confidence threshold rejected the image, or the input was corrupted |
| rejection_reason | enum? | not_a_marine_mammal / low_confidence / corrupted_image / null |
| model_version | string | e.g. effb4-arcface-v1 |
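The schema maps naturally onto a typed record. The sketch below is hypothetical: the class names, field order and JSON helper are not from the project. It assumes bbox is [x1, y1, x2, y2]. For the current full-image output, [0, 0, width, height] reads the same under an [x, y, w, h] convention.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import List, Optional


class RejectionReason(str, Enum):
    # Values copied from the rejection_reason row; None stands for null.
    NOT_A_MARINE_MAMMAL = "not_a_marine_mammal"
    LOW_CONFIDENCE = "low_confidence"
    CORRUPTED_IMAGE = "corrupted_image"


@dataclass
class Detection:
    image_ind: str
    bbox: List[int]            # assumed [x1, y1, x2, y2]; currently the full image
    class_animal: str          # individual ID, "" when rejected
    id_animal: str             # species name mapped from class_animal
    probability: float         # identification confidence in [0.0, 1.0]
    is_cetacean: bool          # True iff the CLIP gate accepted the image
    cetacean_score: float      # aggregate softmax score of the positive prompts
    rejected: bool
    model_version: str
    mask: Optional[str] = None                          # optional base64 PNG
    rejection_reason: Optional[RejectionReason] = None  # None serialises as null

    def to_json(self) -> str:
        payload = asdict(self)
        # Emit the plain string value so the JSON matches the schema's enum strings.
        if self.rejection_reason is not None:
            payload["rejection_reason"] = self.rejection_reason.value
        return json.dumps(payload)
```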
Training Configuration
- Optimizer: Adam (lr=1e-4, weight_decay=1e-6).
- Scheduler: CosineAnnealingLR (T_max=500, min_lr=1e-6).
- Loss: ArcFace (s=30.0, m=0.50) + CrossEntropy.
- Batch size: 32 (train), 64 (valid).
- Image size: 448×448.
- Augmentations: ShiftScaleRotate, HueSaturationValue, RandomBrightnessContrast.
- Seed: 2022 (fixed for reproducibility).
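Two pieces of the configuration can be written out in closed form. The ArcFace part adds the angular margin m to the target class and scales all cosines by s. The margin-adjusted logits would then feed an ordinary cross-entropy loss, which is one reading of "ArcFace + CrossEntropy". The fallback branch follows the widely used open-source ArcMarginProduct formulation; whether the upstream checkpoint uses it is an assumption. cosine_lr is the closed form that PyTorch's CosineAnnealingLR follows, with min_lr playing the role of eta_min. The card does not say whether T_max=500 counts epochs or optimizer steps.

```python
import math
from typing import List


def arcface_logits(cosines: List[float], target: int,
                   s: float = 30.0, m: float = 0.50) -> List[float]:
    """Additive angular margin on the target class, then scale every logit by s.

    `cosines` are cos(theta_j) between the L2-normalised embedding and each
    L2-normalised class weight.
    """
    cos_t = cosines[target]
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    phi = cos_t * math.cos(m) - sin_t * math.sin(m)  # cos(theta + m)
    # When theta + m would pass pi, cos(theta + m) stops decreasing; use a linear fallback.
    if cos_t <= math.cos(math.pi - m):
        phi = cos_t - math.sin(math.pi - m) * m
    out = [s * c for c in cosines]
    out[target] = s * phi
    return out


def cosine_lr(step: int, base_lr: float = 1e-4, min_lr: float = 1e-6, t_max: int = 500) -> float:
    """Cosine annealing from base_lr at step 0 down to min_lr at step t_max."""
    return min_lr + (base_lr - min_lr) * (1 + math.cos(math.pi * step / t_max)) / 2
```

The margin only lowers the target logit. A correct class therefore has to beat the others by an angular gap of m, which is what makes the 512-dim embeddings separable enough for thousands of individuals.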
Limitations
- Clear imagery required: Performance degrades on heavily occluded, underwater, or very low-resolution images.
- Known individuals only: Cannot identify individual animals that are absent from the training dataset. For such animals the model returns the closest known match, typically with low confidence.
- Single-animal focus: Best performance on images containing a single cetacean.
- Lighting conditions: Extreme backlighting or glare can reduce accuracy.
- Geographic bias: Training data predominantly from Northern Hemisphere populations.
Ethical Considerations
- Conservation purpose: Designed to support marine mammal conservation efforts.
- Data privacy: No personally identifiable human data in training.
- Dual use: Intended for scientific and conservation use only, not for commercial exploitation of marine resources.
- Bias: Under-represented species may have lower identification accuracy. The CLIP anti-fraud gate sometimes rejects rare species or extreme crops; an authenticated ?skip_anti_fraud=true query parameter is planned for expert use.
Citation
```bibtex
@software{ecomarineai2025,
  title={EcoMarineAI: Open Library for Cetacean Identification from Aerial Imagery},
  author={Baltsat, K.I. and Tarasov, A.A. and Vandanov, S.A. and Serov, A.I.},
  year={2025},
  url={https://github.com/0x0000dead/whales-identification}
}
```