
Compliance & licensing

EcoMarineAI touches three regulated areas: data licenses, software licenses, and environmental / ethical use. This document consolidates the relevant terms so that a legal or ethics reviewer can audit them in one place.


1. Software license (source code)

2. Training data license

The combined training corpus has two sources; the derivative model inherits the most restrictive terms of both.

2.1 Happy Whale dataset

2.2 Ministry of Natural Resources RF dataset

2.3 Combined effect

| Use case | Permitted? | Notes |
| --- | --- | --- |
| Academic research | Yes | Attribute both sources |
| Educational use | Yes | Accredited institutions |
| Non-profit conservation | Yes | |
| Scientific publications | Yes | Cite both datasets |
| Government monitoring (RF) | Yes | With ФСИ approval |
| Commercial products | No | Blocked by CC-BY-NC and government restrictions |
| Open-source tools (MIT) | Yes | For non-commercial downstream use |
| Startups / for-profit orgs | No | Unless they obtain a commercial licence |

3. Pre-trained model licenses

| Component | Upstream | Licence |
| --- | --- | --- |
| OpenCLIP ViT-B/32 | laion/CLIP-ViT-B-32-laion2B-s34B-b79K | Apache 2.0 |
| EfficientNet-B4 (ImageNet) | timm efficientnet_b4 pre-trained weights | Apache 2.0 |
| ArcFace head (fine-tuned) | ktakita/happywhale-exp004-effb4-trainall | MIT (Kaggle user content; CC0/MIT mixture) |
| ResNet-101 (legacy) | baltsat/Whales-Identification | MIT |
| rembg background removal | danielgatis/rembg | MIT |

All of these licences are permissive, but each component's attribution requirement is preserved in LICENSE_MODELS.md.
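As an illustration of how the attribution entries could be kept in sync with the component table above, the following minimal sketch renders one attribution line per component. The `Component` class, the `attribution_lines` helper, and the output format are hypothetical; the actual layout of LICENSE_MODELS.md is not described here and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    upstream: str
    licence: str


# Rows copied from the pre-trained component table in this section.
COMPONENTS = [
    Component("OpenCLIP ViT-B/32", "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", "Apache 2.0"),
    Component("EfficientNet-B4 (ImageNet)", "timm efficientnet_b4 pre-trained weights", "Apache 2.0"),
    Component("ArcFace head (fine-tuned)", "ktakita/happywhale-exp004-effb4-trainall", "MIT"),
    Component("ResNet-101 (legacy)", "baltsat/Whales-Identification", "MIT"),
    Component("rembg background removal", "danielgatis/rembg", "MIT"),
]


def attribution_lines(components):
    """Return one attribution line per component (hypothetical format)."""
    return [f"{c.name}: {c.upstream} ({c.licence})" for c in components]


if __name__ == "__main__":
    print("\n".join(attribution_lines(COMPONENTS)))
```

Generating the lines from a single list keeps the table and the attribution file from drifting apart when a component is added or replaced.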

4. Hugging Face mirror

The Hugging Face model 0x0000dead/ecomarineai-cetacean-effb4 carries the combined licence, CC-BY-NC-4.0, which is the strictest of the input licences. The model card lists all upstream sources in the datasets front-matter field and in its ## Licensing section.
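The "strictest of the inputs" rule can be sketched as a lookup over a restrictiveness ranking. The ranking below is a hypothetical simplification for illustration only: real licences do not form a strict total order, and the numeric ranks and the `strictest_licence` helper are assumptions, not something the project is known to ship.

```python
# Hypothetical restrictiveness ranks: a higher number means more restrictive.
# The ordering is an illustrative assumption, not a legal determination.
RESTRICTIVENESS = {
    "CC0-1.0": 0,
    "MIT": 1,
    "Apache-2.0": 2,
    "CC-BY-NC-4.0": 3,
}


def strictest_licence(licences):
    """Return the most restrictive licence among the inputs."""
    licences = list(licences)
    if not licences:
        raise ValueError("at least one input licence is required")
    unknown = sorted(set(licences) - set(RESTRICTIVENESS))
    if unknown:
        # Refuse to guess: an unranked licence needs human review.
        raise ValueError(f"unranked licences: {unknown}")
    return max(licences, key=RESTRICTIVENESS.__getitem__)


if __name__ == "__main__":
    inputs = ["Apache-2.0", "MIT", "CC0-1.0", "CC-BY-NC-4.0"]
    print(strictest_licence(inputs))
```

Raising on an unranked licence, rather than defaulting to some rank, forces a reviewer to classify any new input before a combined licence is published.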

5. Third-party dependency audit

See LICENSES_ANALYSIS.md for the full 159-dependency license breakdown. Summary:
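Independently of that file, a rough licence tally for the currently installed environment can be produced with the standard library. This is a minimal sketch and not the method used to build LICENSES_ANALYSIS.md: it relies on the optional `License-Expression` and `License` metadata fields, which many packages leave empty or fill with free-form text, so the counts are only a starting point for manual review. The helper names are hypothetical.

```python
from collections import Counter
from importlib import metadata


def declared_licence(dist):
    """Return the licence a distribution declares, or "UNKNOWN"."""
    # Prefer the SPDX expression field (PEP 639), then the legacy field.
    for field in ("License-Expression", "License"):
        value = str(dist.metadata.get(field) or "").strip()
        if value and value.upper() != "UNKNOWN":
            # Some packages paste the full licence text; keep the first line.
            return value.splitlines()[0]
    return "UNKNOWN"


def licence_summary():
    """Count installed distributions by declared licence string."""
    return Counter(declared_licence(d) for d in metadata.distributions())


if __name__ == "__main__":
    for licence, count in licence_summary().most_common():
        print(f"{count:4d}  {licence}")
```

Entries that come back as "UNKNOWN" usually declare their licence only through trove classifiers or a bundled licence file, and need to be checked by hand.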

6. GDPR / personal data

7. Environmental impact

8. Dual-use / ethical considerations

9. ГОСТ 7.32-2017 alignment

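On the source-code side, alignment with ГОСТ 7.32-2017 is complete. The accompanying research report (НТО, the scientific and technical report) must still follow the ГОСТ rules for its Russian-language deliverable. The following non-exhaustive checklist maps ГОСТ requirements to the source artifacts that support them: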

| ГОСТ requirement | Source artifact |
| --- | --- |
| Bibliographic references to methods | RESEARCH_NOTES.md §6 |
| Reproducibility of results | scripts/compute_metrics.py, scripts/benchmark_* |
| Structured description of the architecture | ML_ARCHITECTURE.md |
| User interface documentation | USER_GUIDE_BIOLOGIST.md |
| API documentation | API_REFERENCE.md |
| Test plan | TESTING_STRATEGY.md |
| Deployment and operation | DEPLOYMENT.md, MLOPS_PLAYBOOK.md |
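A reviewer may want to confirm that every artifact in the checklist above actually exists in a checkout. The sketch below assumes that all paths are relative to a single root directory passed in by the caller; where the Markdown files really live in the repository is not stated here, so the root and the `missing_artifacts` helper are hypothetical.

```python
from pathlib import Path

# Artifact paths and patterns taken from the checklist above.
ARTIFACTS = [
    "RESEARCH_NOTES.md",
    "scripts/compute_metrics.py",
    "scripts/benchmark_*",
    "ML_ARCHITECTURE.md",
    "USER_GUIDE_BIOLOGIST.md",
    "API_REFERENCE.md",
    "TESTING_STRATEGY.md",
    "DEPLOYMENT.md",
    "MLOPS_PLAYBOOK.md",
]


def missing_artifacts(root, artifacts=ARTIFACTS):
    """Return the checklist entries that match nothing under root."""
    root = Path(root)
    # Path.glob handles literal names and wildcard patterns alike.
    return [name for name in artifacts if not any(root.glob(name))]


if __name__ == "__main__":
    for name in missing_artifacts("."):
        print(f"missing: {name}")
```

An empty result only shows that the files are present; whether their content satisfies the corresponding ГОСТ requirement still needs a human check.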

10. Contact for compliance questions

For data-license questions, contact the upstream providers (Happy Whale and the Ministry of Natural Resources RF). For code-license questions, open a GitHub issue tagged legal.