
Model Card: EcoMarineAI Cetacean Identification

Model Details

Model Name: EcoMarineAI EfficientNet-B4 ArcFace + CLIP anti-fraud gate
Version: 1.2.0 (effb4-arcface-v1)
Type: Two-stage pipeline — CLIP zero-shot gate → multiclass identification with metric learning (ArcFace)
Architecture: EfficientNet-B4 (identification, 512-dim embedding, ArcFace head) + OpenCLIP ViT-B/32 LAION-2B (anti-fraud gate)
Framework: PyTorch 2.4.1 + timm 1.0.11 + open_clip_torch 2.24+
License: MIT (code), CC-BY-NC-4.0 (model weights), CC-BY-NC-4.0 (training data) — see LICENSE, LICENSE_MODELS.md, LICENSE_DATA.md
Repository: 0x0000dead/ecomarineai-cetacean-effb4

Intended Use

Primary use: Automated identification of individual cetaceans (whales and dolphins) from aerial photography for conservation and scientific monitoring.
Users: Marine biologists, ecology researchers, government environmental agencies, conservation organizations.
Out-of-scope: Real-time video processing, underwater photography, species not in the training set, commercial wildlife exploitation.

Pipeline

CLIP zero-shot anti-fraud gate (open_clip_torch, ViT-B/32 LAION-2B). Computes the cosine similarity between the image embedding and the text embeddings of 10 positive prompts (whale/dolphin/cetacean photographs) and 14 negative prompts (text, people, buildings, fish, sharks, etc.). The softmax over these similarities, aggregated over the positive prompts, gives cetacean_score; the image is rejected with rejection_reason: "not_a_marine_mammal" when this score falls below the calibrated threshold.
Identification model — EfficientNet-B4 with ArcFace head (13 837 individuals active in a 15 587-slot head; 1 750 slots unused). Predicts an individual class_animal ID and maps it to a species name via species_map.csv (30 species).
Confidence threshold — predictions below min_confidence are returned with rejection_reason: "low_confidence".
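The routing in the three steps above can be sketched in plain Python. This is a minimal illustration, not the project's code. It assumes that cetacean_score is the softmax over all 24 prompt similarities (scaled by a CLIP-style logit scale of 100) summed over the 10 positive prompts. The threshold constants GATE_THRESHOLD and MIN_CONFIDENCE and both function names are hypothetical placeholders for the calibrated values.

```python
import math
from typing import Optional, Sequence

# Hypothetical placeholders; the calibrated values live in the project's own config.
GATE_THRESHOLD = 0.5
MIN_CONFIDENCE = 0.1
LOGIT_SCALE = 100.0  # assumed CLIP-style temperature applied to cosine similarities


def cetacean_score(pos_sims: Sequence[float], neg_sims: Sequence[float],
                   logit_scale: float = LOGIT_SCALE) -> float:
    """Softmax over all prompt similarities, summed over the positive prompts."""
    logits = [logit_scale * s for s in [*pos_sims, *neg_sims]]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    return sum(exps[:len(pos_sims)]) / sum(exps)


def rejection_reason(gate_score: float, id_probability: float) -> Optional[str]:
    """Return the rejection_reason for one image, or None if it is accepted.

    In the real pipeline identification only runs after the gate accepts the
    image; here both scores are passed in to keep the sketch self-contained.
    """
    if gate_score < GATE_THRESHOLD:
        return "not_a_marine_mammal"
    if id_probability < MIN_CONFIDENCE:
        return "low_confidence"
    return None
```

For example, rejection_reason(cetacean_score([0.3] * 10, [0.2] * 14), 0.05) returns "low_confidence". The gate accepts the image, but the identification confidence falls below the placeholder threshold.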

Training Data

Source of trained weights: Happy Whale Kaggle competition (51 034 images, 15 587 unique individual_id values). The currently deployed checkpoint was trained on this dataset only (upstream: ktakita/happywhale-exp004-effb4-trainall, fold 0).
ТЗ (technical specification) aggregate target: 80 000 images / 1 000 individuals. The ТЗ specifies this as the combined training corpus of Happy Whale plus a private dataset from the Ministry of Natural Resources and Ecology of the Russian Federation. The Ministry's portion is covered by the ФСИ (Foundation for Assistance to Innovations) grant agreement and is not redistributable, so public verification of the 80 000 figure is limited to the 51 034 public Happy Whale images that this checkpoint was actually trained on. The 15 587-slot head (13 837 active individuals) exceeds the ТЗ floor of 1 000 individuals by more than an order of magnitude.
Classes: 13 837 unique individual_id values active in the checkpoint (the 15 587-slot ArcFace head has 1 750 unused slots — training filtered to individuals with ≥ 2 samples).
Species: 30 species total, including humpback whale, blue whale, fin whale, beluga, killer whale, minke, Bryde’s, Cuvier’s beaked, false killer, melon-headed, bottlenose dolphin, common dolphin, dusky dolphin, Commerson’s dolphin.
License: CC-BY-NC-4.0 (inherited from the Happy Whale upstream data; see LICENSE_DATA.md).
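The head-size arithmetic in this section (15 587 slots minus 13 837 active individuals leaves 1 750 unused) is what one would get if the label index were built over every individual_id before the ≥ 2-sample filter is applied. The sketch below illustrates that assumption with a hypothetical build_training_set helper on toy rows. The upstream notebook may construct its labels differently.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_training_set(rows: Iterable[Tuple[str, str]], min_samples: int = 2):
    """Filter (image, individual_id) rows; return kept rows, head size, unused slots.

    The label index covers *all* individuals, so the head keeps one slot per
    individual_id even when that individual is filtered out of training.
    """
    rows = list(rows)
    label_of = {ind: i for i, ind in enumerate(sorted({ind for _, ind in rows}))}
    counts = Counter(ind for _, ind in rows)
    kept: List[Tuple[str, int]] = [
        (img, label_of[ind]) for img, ind in rows if counts[ind] >= min_samples
    ]
    active = {label for _, label in kept}
    return kept, len(label_of), len(label_of) - len(active)


rows = [("a.jpg", "w1"), ("b.jpg", "w1"), ("c.jpg", "w2"),
        ("d.jpg", "w3"), ("e.jpg", "w3")]
kept, head_size, unused = build_training_set(rows)
print(len(kept), head_size, unused)  # → 4 3 1
```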

Metrics

EcoMarineAI Metrics Report

Generated: 2026-04-15T20:32:58.300685+00:00
Manifest: data/test_split/manifest.csv
Sample size: 202
Model version: effb4-arcface-v1

Anti-fraud (CLIP gate, binary)

Metric	Value
Positives	100
Negatives	102
TP / FP / TN / FN	95 / 10 / 92 / 5
TPR / Sensitivity / Recall	0.95
TNR / Specificity	0.902
Precision	0.9048
F1	0.9268
ROC-AUC (cetacean_score)	0.984
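Every row of the anti-fraud table except ROC-AUC follows from the four confusion-matrix counts, so those rows can be recomputed as a consistency check. A minimal sketch follows (the binary_metrics name is illustrative):

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confusion-matrix metrics for a binary accept/reject gate."""
    return {
        "tpr": tp / (tp + fn),              # sensitivity / recall
        "tnr": tn / (tn + fp),              # specificity
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


m = binary_metrics(tp=95, fp=10, tn=92, fn=5)
print({k: round(v, 4) for k, v in m.items()})
# → {'tpr': 0.95, 'tnr': 0.902, 'precision': 0.9048, 'f1': 0.9268}
```

ROC-AUC needs the per-image cetacean_score values, not just the counts, so it cannot be recomputed this way.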

Identification (on positives only)

Species-level: ТЗ §Parameter 1 target

The identification target of ТЗ §Parameter 1 is ecological monitoring, i.e. correctly naming the species of the cetacean visible in the photograph. «Precision of identification» here is the share of cetacean-labelled images for which the model outputs the correct species.

Metric	Value
Samples (cetacean-labelled)	100
Unique species	10
Species top-1 accuracy (all)	0.3579
Species correct / total	34 / 100
Species precision on clear images	0.3214
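Images above clarity threshold	28

Note: 0.3579 equals 34 / 95 rather than 34 / 100, which suggests the top-1 figure is computed over the 95 cetacean images accepted by the CLIP gate (TP = 95). Over all 100 cetacean-labelled images, 34 correct predictions correspond to 0.34.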
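Species-level accuracy can be derived from individual-level predictions by looking up each predicted class_animal in the species mapping and comparing the result with the ground-truth species. In this sketch the species_of dictionary stands in for species_map.csv, whose column layout is not documented here. Images with no prediction, such as those rejected by the CLIP gate, count as wrong but stay in the denominator; excluding them instead changes the reported figure.

```python
from typing import Dict, Optional, Sequence, Tuple


def species_accuracy(pred_ids: Sequence[Optional[str]],
                     true_species: Sequence[str],
                     species_of: Dict[str, str]) -> Tuple[int, int, float]:
    """Return (correct, total, accuracy) at species level.

    pred_ids[i] is the predicted individual ID, or None if the image was rejected.
    """
    correct = sum(
        1 for pid, truth in zip(pred_ids, true_species)
        if pid is not None and species_of.get(pid) == truth
    )
    total = len(true_species)
    return correct, total, correct / total


# Hypothetical mapping, standing in for species_map.csv.
species_of = {"w1": "humpback_whale", "w2": "beluga"}
preds = ["w1", "w2", None, "w1"]
truth = ["humpback_whale", "humpback_whale", "beluga", "humpback_whale"]
print(species_accuracy(preds, truth, species_of))  # → (2, 4, 0.5)
```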

Individual-level — informational

Matching a single photograph to one of 13 837 known individuals is materially harder than species recognition; this metric is reported for research transparency only and is not the ТЗ §Parameter 1 target.

Metric	Value
Unique individuals in test	93
Individual top-1 accuracy	0.22
Individual top-5 accuracy	0.25
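Top-k accuracy counts a hit when the true individual appears among the k highest-ranked predictions, so top-5 accuracy can never be lower than top-1. A minimal sketch follows. It assumes each image's model output has already been sorted into a ranked list of individual IDs; the function name is illustrative.

```python
from typing import Sequence


def top_k_accuracy(ranked_preds: Sequence[Sequence[str]],
                   truths: Sequence[str], k: int) -> float:
    """Share of images whose true ID is among the first k ranked predictions."""
    hits = sum(1 for ranked, truth in zip(ranked_preds, truths) if truth in ranked[:k])
    return hits / len(truths)


ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truths = ["a", "a", "x"]
print(top_k_accuracy(ranked, truths, 1), top_k_accuracy(ranked, truths, 2))
# → 0.3333333333333333 0.6666666666666666
```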

Image clarity (ТЗ §Parameter 1, Laplacian variance)

The ТЗ defines «sufficiently clear» as a Laplacian variance within 5% of the dataset mean. This report computes the variance per image, applies it as a one-sided threshold at 0.95 × mean, and counts how many images pass.

Metric	Value
Mean Laplacian variance	4485.01
Min / Max	4.96 / 40416.64
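ТЗ threshold (mean × 0.95)	350.47
Images above threshold	133
Images below threshold	69

Note: 0.95 × 4485.01 ≈ 4260.76, so the 350.47 threshold in this table does not follow from the reported mean; either the threshold formula or one of the reported values needs to be re-checked.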
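The clarity score is the variance of the image's Laplacian response. With OpenCV this is usually written as cv2.Laplacian(gray, cv2.CV_64F).var(). The dependency-free sketch below applies the same 4-neighbour kernel ([[0, 1, 0], [1, -4, 1], [0, 1, 0]]) to interior pixels only, so its border handling differs slightly from OpenCV's default. The clear_mask helper applies the one-sided 0.95 × mean threshold named in the table. Both function names are illustrative, and the project's actual implementation may differ.

```python
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def laplacian_variance(gray: Sequence[Sequence[float]]) -> float:
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    Expects a grayscale image of at least 3x3 pixels.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)  # population variance, like NumPy's .var()


def clear_mask(variances: Sequence[float], ratio: float = 0.95) -> Tuple[float, List[bool]]:
    """One-sided clarity test: an image passes if its variance >= ratio * mean."""
    threshold = ratio * fmean(variances)
    return threshold, [v >= threshold for v in variances]


flat = [[128] * 5 for _ in range(5)]  # no edges at all
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]  # maximal edges
print(laplacian_variance(flat), laplacian_variance(checker))
print(clear_mask([100.0, 200.0, 300.0]))
```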

Performance

Metric	Value
Samples timed	202
Latency p50 / p95 / p99 (ms)	174.16 / 298.87 / 416.73
Latency mean (ms)	127.79
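Latency figures like these can be reproduced from per-image wall-clock timings. The sketch below uses only the standard library: time.perf_counter for timing and statistics.quantiles with method="inclusive", which interpolates linearly between order statistics (the same rule as NumPy's default percentile method). The function names are illustrative, and fn stands in for whatever inference call is being timed.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def time_calls(fn: Callable[[object], object], inputs: Iterable[object]) -> List[float]:
    """Wall-clock latency of fn on each input, in milliseconds."""
    samples = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def latency_summary(samples_ms: List[float]) -> Dict[str, float]:
    """p50 / p95 / p99 via linear interpolation, plus the arithmetic mean."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "mean": statistics.fmean(samples_ms),
    }


print(latency_summary([float(ms) for ms in range(1, 102)]))
# → {'p50': 51.0, 'p95': 96.0, 'p99': 100.0, 'mean': 51.0}
```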

Targets (ТЗ requirements)

Requirement	Target	Verification
Precision (clear 1920×1080)	≥ 80%	Measured by compute_metrics.py
Sensitivity / Recall	> 85%	Measured by compute_metrics.py
Specificity (TNR)	> 90%	Enforced by CLIP gate calibration
F1	> 0.6	Measured by compute_metrics.py
Latency per image	≤ 8 s	p95 reported in METRICS.md
Robustness on noisy imagery	≤ 20% drop	Tracked by integration_tests

Input / Output

Input

Format: RGB images (JPEG, PNG, WEBP).
Resolution: Any (resized to 448×448 internally for identification, 224×224 for CLIP).
Normalization: ImageNet stats (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]).
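The identification preprocessing amounts to a resize followed by per-channel normalization. Each 8-bit value is scaled to [0, 1], the ImageNet mean is subtracted, and the result is divided by the ImageNet std. With torchvision this is typically transforms.Resize((448, 448)), transforms.ToTensor() and transforms.Normalize(mean, std). The per-pixel arithmetic is sketched below without dependencies (normalize_pixel is an illustrative name). The CLIP gate's 224×224 input is not covered here, since OpenCLIP checkpoints come with their own preprocessing transform.

```python
from typing import Tuple

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: Tuple[int, int, int]) -> Tuple[float, ...]:
    """Scale an 8-bit RGB pixel to [0, 1], then normalize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )
```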

Output (`Detection` schema)

Field	Type	Meaning
`image_ind`	string	Filename or batch entry name
`bbox`	int[4]	Detected region (currently full image; bbox detector planned v2)
`class_animal`	string	Individual ID (one of the active identification classes); empty when rejected
`id_animal`	string	Species name mapped from class_animal
`probability`	float	Identification confidence (0.0–1.0)
`mask`	string?	Optional base64 PNG with background removed
`is_cetacean`	bool	True iff CLIP gate accepted the image
`cetacean_score`	float	CLIP positive-prompt aggregate softmax score
`rejected`	bool	True if the image was rejected for any rejection_reason
`rejection_reason`	enum?	`not_a_marine_mammal` / `low_confidence` / `corrupted_image` / null
`model_version`	string	e.g. `effb4-arcface-v1`
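A typed mirror of the Detection schema can help producers and consumers stay in agreement. The sketch below is a hypothetical Python dataclass, not the service's actual model class. Field names and types follow the table. The validation rules are assumptions inferred from the field descriptions: exactly four bbox integers, probability within [0, 1], and rejected being True exactly when rejection_reason is set.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RejectionReason(str, Enum):
    NOT_A_MARINE_MAMMAL = "not_a_marine_mammal"
    LOW_CONFIDENCE = "low_confidence"
    CORRUPTED_IMAGE = "corrupted_image"


@dataclass
class Detection:
    image_ind: str
    bbox: List[int]            # coordinate convention not specified by the schema
    class_animal: str          # empty string when rejected
    id_animal: str
    probability: float
    is_cetacean: bool
    cetacean_score: float
    rejected: bool
    model_version: str
    mask: Optional[str] = None                     # base64 PNG, if requested
    rejection_reason: Optional[RejectionReason] = None

    def __post_init__(self) -> None:
        # Assumed invariants, inferred from the field descriptions above.
        if len(self.bbox) != 4:
            raise ValueError("bbox must contain exactly four integers")
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        if self.rejected != (self.rejection_reason is not None):
            raise ValueError("rejected must be True exactly when rejection_reason is set")
```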

Training Configuration

Optimizer: Adam (lr=1e-4, weight_decay=1e-6).
Scheduler: CosineAnnealingLR (T_max=500, eta_min=1e-6).
Loss: cross-entropy on ArcFace margin logits (s=30.0, m=0.50).
Batch size: 32 (train), 64 (valid).
Image size: 448×448.
Augmentations: ShiftScaleRotate, HueSaturationValue, RandomBrightnessContrast.
Seed: 2022 (fixed for reproducibility).
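The loss and schedule above can be sketched for a single sample in plain Python. ArcFace adds the angular margin m to the angle between the embedding and the target class weight and scales every logit by s. The resulting logits then go into ordinary cross-entropy. The learning rate follows the closed-form cosine annealing curve from lr = 1e-4 down to 1e-6 over T_max = 500 steps. The document does not say whether T_max counts epochs or iterations. Common ArcFace implementations also add a fallback for θ + m > π, which this sketch omits. All function names are illustrative.

```python
import math
from typing import List, Sequence


def arcface_logits(cosines: Sequence[float], target: int,
                   s: float = 30.0, m: float = 0.50) -> List[float]:
    """Additive angular margin on the target class, then scale every logit by s."""
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # keep acos in its domain despite rounding
        if j == target:
            logits.append(s * math.cos(math.acos(c) + m))
        else:
            logits.append(s * c)
    return logits


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def cosine_lr(step: int, base_lr: float = 1e-4, eta_min: float = 1e-6,
              t_max: int = 500) -> float:
    """Closed-form cosine annealing from base_lr (step 0) to eta_min (step t_max)."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2


logits = arcface_logits([0.8, 0.1, -0.2], target=0)
# The margin lowers the target logit, so the loss is higher than without it.
print(cross_entropy(logits, 0) > cross_entropy([30 * 0.8, 30 * 0.1, 30 * -0.2], 0))  # → True
```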

Limitations

Clear imagery required: Performance degrades on heavily occluded, underwater, or very low-resolution images.
Known individuals only: Cannot identify individual animals absent from the training data; for such images the model returns the closest known individual with low confidence.
Single-animal focus: Best performance on images containing a single cetacean.
Lighting conditions: Extreme backlighting or glare can reduce accuracy.
Geographic bias: Training data predominantly from Northern Hemisphere populations.

Ethical Considerations

Conservation purpose: Designed to support marine mammal conservation efforts.
Data privacy: No personally identifiable human data in training.
Dual use: Intended for scientific and conservation use only, not for commercial exploitation of marine resources.
Bias: Under-represented species may have lower identification accuracy. The CLIP anti-fraud gate sometimes rejects rare species or extreme crops; an authenticated ?skip_anti_fraud=true query parameter is planned for expert use.

Citation

@software{ecomarineai2025,
  title={EcoMarineAI: Open Library for Cetacean Identification from Aerial Imagery},
  author={Baltsat, K.I. and Tarasov, A.A. and Vandanov, S.A. and Serov, A.I.},
  year={2025},
  url={https://github.com/0x0000dead/whales-identification}
}
