whales-identification

Model License

CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International)

Copyright (c) 2024 Baltsat Konstantin, Tarasov Artem, Vandanov Sergey, Serov Alexandr

The trained model weights distributed through this project are licensed under the Creative Commons Attribution-NonCommercial 4.0 International licence — the same licence that governs the upstream training data (Happy Whale). A model is a derivative of its training set; we cannot relax the upstream restriction.

Historical note

Earlier drafts of this repository labelled the models as Apache 2.0. That labelling was inconsistent with the upstream Happy Whale dataset’s CC-BY-NC-4.0 terms, and the issue was flagged during expert review of the intermediate scientific and technical report (НТО), round 4. The correct canonical licence is CC-BY-NC-4.0. The Hugging Face mirror at 0x0000dead/ecomarineai-cetacean-effb4 matches this file.


Important Usage Restrictions

⚠️ COMMERCIAL USE RESTRICTIONS

The trained models in this repository were developed using datasets that include data licensed under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International). As a result:

  1. The trained models may NOT be used for commercial purposes without explicit permission from the original data providers (Happy Whale and Ministry of Natural Resources of the Russian Federation).

  2. This restriction applies specifically to the trained model weights (.ckpt, .pt, .pth, .onnx files) and any derivatives thereof.

  3. Under EU law, models trained on non-commercial data inherit those restrictions when the model is sold or used commercially.

  4. Pretrained Models (Transfer Learning): This project uses pretrained models (ResNet, EfficientNet, ViT, Swin Transformer) that were originally trained on ImageNet. ImageNet has non-commercial research-only terms, adding another layer of restriction beyond Happy Whale and Ministry RF data.


Pretrained Model Foundations

⚠️ CRITICAL: ImageNet Pretrained Weights Restrictions

Our fine-tuned models are built upon pretrained models from the following sources, all of which use ImageNet for initial training:

| Pretrained Model | Source | Code License | Pretrained Weights | ImageNet Terms |
|---|---|---|---|---|
| ResNet-50/101 | torchvision | BSD-3-Clause | ImageNet-1k | Non-commercial |
| EfficientNet-B0/B5 | TIMM (Google) | Apache 2.0 | ImageNet-1k | Non-commercial |
| tf_efficientnet_b0_ns | Google Noisy Student | Apache 2.0 | ImageNet + JFT-300M | Non-commercial |
| ViT-B/16, ViT-L/32 | Google ViT | Apache 2.0 | ImageNet-21k | Non-commercial |
| Swin-T, Swin-L | Microsoft Swin | MIT | ImageNet-22k | Non-commercial |
| ConvNeXt-L | Facebook ConvNeXt | CC-BY-NC 4.0 | ImageNet-22k | Non-commercial |

ImageNet Licensing Summary

ImageNet Dataset Terms:

Implications for This Project:

  1. The pretrained model code is mostly released under permissive licenses (Apache 2.0, BSD, MIT).
  2. The pretrained weights, however, are restricted by the ImageNet terms.
  3. Our fine-tuned models are derived from these ImageNet weights.
  4. Therefore, commercial use is prohibited.

Combined Restrictions

Our models face three separate restrictions on commercial use:

  1. Happy Whale Training Data: CC-BY-NC-4.0 (non-commercial)
  2. Ministry RF Training Data: Government research-only terms
  3. ImageNet Pretrained Weights: Non-commercial research-only terms

Any ONE of these restrictions is sufficient to prohibit commercial use. All three apply simultaneously.
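The "any one is sufficient" rule above can be read as a logical AND over the restriction layers: commercial use would be possible only if every layer allowed it. The following is a minimal illustrative sketch, not legal advice and not code from this repository. The names `Restriction`, `RESTRICTIONS`, `commercial_use_permitted` and `blocking_sources` are hypothetical, and the boolean flags simply restate the claims made in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restriction:
    """One licensing layer and whether it allows commercial use."""
    source: str
    allows_commercial_use: bool


# Hypothetical encoding of the three layers listed in this document.
RESTRICTIONS = (
    Restriction("Training data (CC-BY-NC-4.0)", False),
    Restriction("Ministry RF training data (research-only)", False),
    Restriction("ImageNet pretrained weights (non-commercial)", False),
)


def commercial_use_permitted(restrictions) -> bool:
    """Commercial use is permitted only if every layer permits it."""
    return all(r.allows_commercial_use for r in restrictions)


def blocking_sources(restrictions) -> list[str]:
    """Return the sources that currently block commercial use."""
    return [r.source for r in restrictions if not r.allows_commercial_use]


if __name__ == "__main__":
    print(commercial_use_permitted(RESTRICTIONS))
    for source in blocking_sources(RESTRICTIONS):
        print("blocked by:", source)
```

Under this reading, obtaining permission from a single provider would not be enough: the result stays negative until every blocking source has been cleared.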

Pretrained Model Attribution

When using our models, you must also acknowledge the pretrained model sources:

Pretrained Model Attributions:
- ResNet: torchvision (BSD-3-Clause), https://github.com/pytorch/vision, trained on ImageNet-1k
- EfficientNet: TIMM/Google (Apache 2.0), https://github.com/huggingface/pytorch-image-models, trained on ImageNet-1k
- Vision Transformer: Google Research (Apache 2.0), https://github.com/google-research/vision_transformer, trained on ImageNet-21k
- Swin Transformer: Microsoft (MIT), https://github.com/microsoft/Swin-Transformer, trained on ImageNet-22k
- ConvNeXt: Facebook/Meta (CC-BY-NC 4.0), https://github.com/facebookresearch/ConvNeXt, trained on ImageNet-22k

All pretrained weights are subject to ImageNet non-commercial terms (https://www.image-net.org/download.php).

Permitted Uses

Research and Educational Use

Personal and Non-Commercial Use

Government and Conservation Organizations


Models Covered by This License

The following trained models are subject to this license:

| Model Name | Architecture | Version | Status | File Location |
|---|---|---|---|---|
| efficientnet_b4_512_fold0.ckpt | EfficientNet-B4 (ArcFace head, 13 837 / 15 587 slots) | effb4-arcface-v1 | Production | whales_be_service/src/whales_be_service/models/efficientnet_b4_512_fold0.ckpt |
| encoder_classes.npy | Label encoder for ArcFace head | effb4-arcface-v1 | Production | whales_be_service/src/whales_be_service/models/encoder_classes.npy |
| resnet101.pth | ResNet-101 (ArcFace, fallback backbone) | v1.0 | Fallback | whales_be_service/src/whales_be_service/models/resnet101.pth |
| model-e15.pt | Vision Transformer L/32 (legacy Stage 1) | v1.0 (epoch 15) | Deprecated | models/model-e15.pt (not auto-downloaded; Yandex Disk only) |
| Other experimental models | Various | - | Research | models/*.pt, models/*.pth |

Note: ONNX-optimized models (.onnx files) are also subject to the same license terms.

The anti-fraud gate uses OpenCLIP ViT-B/32 LAION-2B pretrained weights, which are released under their own upstream licence (MIT / permissive). The EcoMarineAI calibrated threshold file (anti_fraud_threshold.yaml) is an artefact of this project and inherits CC-BY-NC-4.0 from the training data.


Model Storage and Distribution

The production models are distributed through:

Download via ./scripts/download_models.sh — the script automatically verifies SHA256 checksums against models/checksums.sha256 and retries up to 3 times on network errors.
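For readers who want to re-check downloaded files by hand, the checksum step can be approximated in a few lines of Python. This is a minimal sketch, not the implementation inside download_models.sh. It assumes that models/checksums.sha256 uses the common `sha256sum` layout (`<hex digest>  <file name>` per line) and that file names are relative to the model directory; both are assumptions. The function names are hypothetical, and the script's retry logic is not reproduced here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text: str) -> dict[str, str]:
    """Parse sha256sum-style lines into {file name: lowercase digest}."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary mode with a leading '*' on the file name
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_models(model_dir: Path, checksums_file: Path) -> dict[str, bool]:
    """Map each listed file to True if it exists and its digest matches."""
    expected = parse_checksums(checksums_file.read_text())
    results = {}
    for name, digest in expected.items():
        target = model_dir / name
        results[name] = target.is_file() and sha256_of(target) == digest
    return results
```

A call such as `verify_models(Path("models"), Path("models/checksums.sha256"))` would return a mapping like `{"some_model.pt": True}`, where any `False` entry indicates a missing or corrupted file that should be downloaded again. The model directory used here is an assumption.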

Models are NOT stored directly in the GitHub repository due to size constraints (.gitignore exclusion).


Attribution Requirements

When using these models, you must provide proper attribution:

@misc{whales-identification-2024,
  author = {Baltsat, Konstantin and Tarasov, Artem and Vandanov, Sergey and Serov, Alexandr},
  title = {EcoMarineAI: Automated Whale and Dolphin Identification from Aerial Photography},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/0x0000dead/whales-identification},
  note = {Models trained on Happy Whale and Ministry of Natural Resources RF data}
}

Additionally, you must acknowledge the original data sources:


Model Limitations and Responsible Use

Technical Limitations

Ethical Considerations

Prohibited Uses

The following uses are explicitly prohibited:

- ❌ Commercial exploitation without data provider consent
- ❌ Applications that harm marine mammals or their habitats
- ❌ Surveillance or tracking of marine mammals for hunting purposes
- ❌ Misrepresentation of model capabilities or accuracy
- ❌ Use in contexts that violate local wildlife protection laws


Model Versioning and Updates

Model versions follow semantic versioning: vMAJOR.MINOR.PATCH

Current Stable Version: v1.0 (January 2025)
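Tooling that compares model versions needs to turn these tags into something orderable. The sketch below parses the vMAJOR.MINOR.PATCH scheme into a tuple. It assumes that a two-part tag such as v1.0 means patch 0, which this document does not state explicitly. `ModelVersion` and `parse_version` are hypothetical names, not part of this repository.

```python
import re
from typing import NamedTuple


class ModelVersion(NamedTuple):
    """Orderable version; tuples compare field by field."""
    major: int
    minor: int
    patch: int


# 'v', then MAJOR.MINOR, then an optional .PATCH (assumed to default to 0).
_VERSION_RE = re.compile(r"^v(\d+)\.(\d+)(?:\.(\d+))?$")


def parse_version(tag: str) -> ModelVersion:
    """Parse a tag like 'v1.0' or 'v1.2.3'; raise ValueError otherwise."""
    match = _VERSION_RE.match(tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR[.PATCH] tag: {tag!r}")
    major, minor, patch = match.groups()
    return ModelVersion(int(major), int(minor), int(patch or 0))
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating v1.10.0 as older than v1.2.3. Tags that do not follow the scheme, such as effb4-arcface-v1, are rejected with a `ValueError` rather than guessed at.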

Model Card: See MODEL_CARD.md for detailed performance metrics, training data specifications, and evaluation results.


Contact for Commercial Licensing

For inquiries regarding commercial use, custom licensing, or partnerships:


License Compatibility

Compatible with:

NOT Compatible with:


Disclaimer

THE MODELS ARE PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODELS OR THE USE OR OTHER DEALINGS IN THE MODELS.

The models’ predictions should not be the sole basis for critical conservation decisions. Always validate model outputs with expert marine biologist review.


Updates to This License

This license may be updated to reflect changes in:

Last Updated: January 2025
Version: 1.0
Effective Date: January 2025