This roadmap maps the current state of the repo onto the three stages of the Foundation for Assistance to Innovations (ФСИ) grant and the technical parameters of the technical specification (ТЗ).

| Calendar plan (КП) work item, stage 1 | Status | Evidence |
|---|---|---|
| 1.1 Repository setup + automated PR checks | ✓ | .pre-commit-config.yaml, .github/workflows/ci.yml, pytest.ini |
| 1.2 Testing of pilot detection prototypes | ✓ | research/notebooks/06_benchmark_*, comparison_research/ |
| 1.3 Expert interviews | ✓ | wiki_content/Contributing.md, expert lists |
| 1.4 Development of detection + processing prototypes | ✓ | whales_identify/filter_processor.py, dataset.py, research/notebooks/01_* |
| 1.5 Neural network training with intermediate weights saved | ✓ | whales_identify/train.py + huggingface/ checkpoints |
| 1.6 Research into CV algorithms | ✓ | research/notebooks/02_*–05_* (ViT, EfficientNet, Swin, ResNet) |
| 1.7 Prototype ML algorithms for identification | ✓ | whales_identify/model.py (CetaceanIdentificationModel) |
| 1.8 Overall architecture of the ML system | ✓ | DOCS/ML_ARCHITECTURE.md |
| 1.9 System analysis + optimisation | ✓ | whales_be_service/src/whales_be_service/inference/ refactor |
| 1.10 Multi-class classification | ✓ | EfficientNet-B4 ArcFace, 13,837 classes |
| 1.11 Code review of prototypes | ✓ | .github/workflows/ci.yml + pre-commit |
| 1.12 Testing and comparison of architectures | ✓ | reports/METRICS.md, DOCS/PERFORMANCE_REPORT.md |
| 1.13 Data collection, enrichment and augmentation | ✓ | whales_identify/dataset.py, Happy Whale train set |
| 1.14 Data-stream ML algorithms | ✓ | whales_identify/filter_processor.py, monitoring/drift.py |

| Calendar plan (КП) work item, stage 2 | Status | Evidence |
|---|---|---|
| 2.1 Open repository + licence alignment | ✓ | LICENSE, LICENSE_DATA.md, LICENSE_MODELS.md, LICENSES_ANALYSIS.md |
| 2.2 CI/CD + MLOps | ✓ | 7 GH workflows (ci / metrics / smoke / security / docker / train / lint), models/registry.json, monitoring/drift.py |
| 2.3 Testing with feedback | ✓ | whales_be_service/tests/ (88 unit tests), whales_identify/tests/ |
| 2.4 Backend + ML models | ✓ | FastAPI + InferencePipeline (real inference, no mocks) |
| 2.5 Validation of ML algorithms, selection of the final solution | ✓ | scripts/compute_metrics.py + reports/metrics_baseline.json |
| 2.6 User interface | ✓ | frontend/ (React + Tailwind + RejectionCard + ConfidenceGauge) |
| 2.7 Integration of BE + FE + browser/mobile | ✓ | docker-compose, VITE_BACKEND, nginx.conf, responsive Tailwind |

| Calendar plan (КП) work item, stage 3 | Status | Evidence / next step |
|---|---|---|
| 3.1 Final technical documentation | ✓ | 15 docs under DOCS/: this file, ML_ARCHITECTURE, PERFORMANCE_REPORT, API_REFERENCE, etc. |
| 3.2 MLOps for high load | ✓ | monitoring/drift.py, /v1/drift-stats, availability gauge, baseline regression gate |
| 3.3 Training and demo materials | ✓ | research/demo-ui/, README.md "for grandma", DOCS/USER_GUIDE_BIOLOGIST.md |
| 3.4 API for FE-to-ML interaction | ✓ | /v1/predict-single, /v1/predict-batch, /v1/drift-stats |
| 3.5 Parameter optimisation experiments | ✓ | scripts/calibrate_clip_threshold.py, scripts/benchmark_*.py |
| 3.6 Integrated CV architecture | ✓ | CLIP gate + EffB4 ArcFace + confidence gating |
| 3.7 Containerisation + launch file | ✓ | Dockerfile, docker-compose.yml, docker-entrypoint.sh (auto-download from HF) |
| 3.8 Integration with external services | ✓ | integrations/sqlite_sink.py, integrations/postgres_sink.py, CSV export via CLI, HF Hub mirror |
| 3.9 Development of a full mobile version of the UI (Тарасов А.А.) | partial | Mobile-first UI with responsive Tailwind: done. PWA wrapper and native build: part of the stage 3 work |
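
Row 3.6 lists the CV architecture as a CLIP gate, an EfficientNet-B4 ArcFace identifier, and confidence gating. The decision order this implies can be sketched as below. This is only an illustration: `GateResult`, `gated_decision`, the reason strings, and both threshold values are hypothetical and not taken from the repo, and the real pipeline may combine the stages differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class GateResult:
    accepted: bool
    reason: str                 # "ok", "not_cetacean" or "low_confidence"
    class_index: Optional[int]  # index of the top class when accepted
    confidence: float           # top class probability (0.0 if gate 1 rejected)


def gated_decision(
    clip_score: float,
    class_probs: Sequence[float],
    clip_threshold: float = 0.5,  # hypothetical value, not the calibrated one
    min_confidence: float = 0.3,  # hypothetical value
) -> GateResult:
    """Apply the two gates in order: relevance gate, then confidence gate."""
    # Gate 1: reject images the relevance model does not consider a cetacean.
    if clip_score < clip_threshold:
        return GateResult(False, "not_cetacean", None, 0.0)

    # Pick the most probable class from the identification head.
    best = max(range(len(class_probs)), key=class_probs.__getitem__)
    confidence = class_probs[best]

    # Gate 2: reject predictions the classifier itself is unsure about.
    if confidence < min_confidence:
        return GateResult(False, "low_confidence", None, confidence)

    return GateResult(True, "ok", best, confidence)
```

Keeping the relevance gate first means the more expensive identification step can be skipped for clearly irrelevant images, and both rejection paths stay distinguishable for the UI.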

| Milestone | ETA | Notes |
|---|---|---|
| GPU inference mode | Q2 2026 | ONNX export + TensorRT — already demonstrated in research/notebooks/07_onnx_inference_compare.ipynb |
| Bbox detector (YOLOv8-cetacean) | Q2 2026 | Train on data/backfin_annotations.csv (5,201 rows) + re-label |
| Top-k results via nearest-neighbour retrieval | Q2 2026 | Pre-compute embeddings for all training images, use FAISS |
| Real-time video stream processing | Q3 2026 | Frame sampling + batched inference |
| Mobile-native app (Flutter) | Q3 2026 | REST API is already mobile-friendly |
| Federated learning for private fleet datasets | Q4 2026 | See Yandex ML blog ref in the review notes |
| Biologist feedback loop (active learning) | Q4 2026 | Collect disagreements, re-train head monthly |
| Public leaderboard + community dashboard | 2027 | Every new contributor gets a DOI |
| OpenAPI SDKs (Python, R, Julia) | 2027 | Auto-generated from /openapi.json |
| UN SDG 14 integration (GBIF + OBIS sync) | 2027 | One-way upload of anonymised observations |
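
The "Top-k results via nearest-neighbour retrieval" milestone pre-computes embeddings for all training images and searches them with FAISS. The idea can be illustrated with a brute-force cosine search in pure Python. This is a stand-in, not the planned implementation: a FAISS index would replace the scoring loop, and `build_index`, `top_k`, and the toy 2-D embeddings are hypothetical names and data.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def _normalise(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def build_index(embeddings: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Pre-compute unit-length embeddings for every reference image."""
    return {image_id: _normalise(vec) for image_id, vec in embeddings.items()}


def top_k(
    index: Dict[str, List[float]],
    query: Sequence[float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k reference images most similar to the query, best first."""
    q = _normalise(query)
    # Score every reference image; an approximate index would avoid this full scan.
    scored = (
        (sum(a * b for a, b in zip(q, vec)), image_id)
        for image_id, vec in index.items()
    )
    return [(image_id, score) for score, image_id in heapq.nlargest(k, scored)]


# Toy usage with made-up 2-D embeddings.
index = build_index({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})
neighbours = top_k(index, [1.0, 0.1], k=2)
```

Returning the neighbours with their similarity scores lets the caller show several candidate matches instead of a single label.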

The project is designed so that a third party can:

- edit species_map.csv and re-fit the ArcFace head (no backbone retraining needed);
- start from integrations/sqlite_sink.py and swap the driver, which takes under an hour;
- fork the HF repo, retrain, and set HF_REPO=yourorg/their-whales, since docker-entrypoint.sh downloads weights from any HF repo via the HF_REPO env var;
- provide the weights in the models/ directory in the Docker container, in which case the entrypoint skips the download.

See CONTRIBUTING.md and FAQ.md for the contributor workflow.
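
The weight-loading behaviour described above (download from the repo named in HF_REPO unless models/ is already populated) can be sketched as follows. The real logic lives in the shell script docker-entrypoint.sh, so this Python version is only an illustration: `ensure_weights`, the injected `download` callable, and the placeholder default repo id are assumptions, not the project's code.

```python
from pathlib import Path
from typing import Callable, Mapping


def ensure_weights(
    models_dir: Path,
    env: Mapping[str, str],
    download: Callable[[str, Path], None],
    default_repo: str = "example-org/example-whales",  # hypothetical placeholder
) -> str:
    """Return "skipped" if models_dir already holds files, else the repo id used."""
    # A pre-populated models/ directory means there is nothing to fetch.
    if models_dir.is_dir() and any(models_dir.iterdir()):
        return "skipped"

    # Otherwise fetch from the repo named in HF_REPO (or the placeholder default).
    repo_id = env.get("HF_REPO", default_repo)
    models_dir.mkdir(parents=True, exist_ok=True)
    download(repo_id, models_dir)
    return repo_id
```

Passing the downloader in as a parameter keeps the sketch testable offline. In a container it might be called with `os.environ` and a thin wrapper around `huggingface_hub.snapshot_download`.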