MobTranslate Models

Kuku Yalanji Translation

gvn

A versioned English → Kuku Yalanji model line. v24.3 is a runtime-verified, two-task research candidate: deterministic lookup stays first for known dictionary facts, while lexical reconstruction and sentence generation keep separate evidence and claims.

NLLB LoRA

Far North Queensland, Australia

Version v24.3-joint-lexeme-dose29-s3598-20260715

runtime verified research candidate

Base model: Exact v21.2 balanced-replay merged NLLB-1.3B project base plus v24.3 rank-32 LoRA and trainable <lexeme> token
Dataset: 115,136 materialized rows (2,724 governed lexemes ×29 plus 18,070 non-Bible sentence rows ×2); one-seed development screen
Directions: eng-gvn
Release date: 2026-07-15
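The dataset row count comes from fixed repetition doses: each governed lexeme appears 29 times and each non-Bible sentence row appears twice. The sketch below checks that arithmetic using only the figures above. The function name `materialized_rows` is illustrative and not part of the release tooling.

```python
def materialized_rows(lexemes: int, lexeme_dose: int,
                      sentences: int, sentence_dose: int) -> int:
    """Rows in a schedule that repeats each source row a fixed number of times."""
    return lexemes * lexeme_dose + sentences * sentence_dose


# Figures from the v24.3 dataset line.
lexical = 2_724 * 29      # governed lexemes, 29 exposures each
sentence = 18_070 * 2     # non-Bible sentence rows, 2 exposures each
total = materialized_rows(2_724, 29, 18_070, 2)
print(lexical, sentence, total)  # 78996 36140 115136
```

On these figures, the lexical task supplies 78,996 of the 115,136 rows, roughly 69% of the schedule.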

Not published

CC-BY-NC-4.0 upstream NLLB base plus source-specific project data terms. Noncommercial research and explicitly labelled drafts only; not speaker-certified, community-approved, or authoritative.

optimizer steps: 3598
train loss: 0.9482
closed set lexical rows: 2724
closed set lexical accepted exact: 2696
closed set lexical accepted exact percent: 99.0
closed set lexical wilson 95 low percent: 98.5
closed set lexical failures: 28
historical lexicon rows: 297
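The 98.5% lower bound matches a standard Wilson score interval for 2,696 accepted exact matches out of 2,724 rows. The sketch below assumes the conventional two-sided 95% Wilson interval with z = 1.96; the release's own evaluation code may differ in detail.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom, (centre + margin) / denom


low, high = wilson_interval(2_696, 2_724)
print(f"{2_696 / 2_724:.4%} exact, Wilson 95% low {low:.1%}")
# 98.9721% exact, Wilson 95% low 98.5%
```

The Wilson form stays inside [0, 1] and behaves well near 100%, where the simpler normal approximation becomes overconfident.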

Artifact | Kind | Format | Download

Comprehensive v24.3 model, training, evaluation, and hosting guide

Complete lineage, downloads, two-task API, RunPod recipe, all lexical failures, sentence limits, resource measurements, multi-adapter design, rights, and host acceptance checks.

documentation | html | Get

Comprehensive v24.3 guide source

documentation | markdown | Get

Complete v24.3 dynamic hosting bundle

Exact v21.2 merged base, v24.3 adapter and tokenizer, task-aware HTTP server, hosting manifest, CPU smoke evidence, and compact evaluation record.

bundle | tar.gz | Get

v24.3 adapter-only hosting bundle

Requires the exact project v21.2 merged base, whose SHA-256 is documented in the hosting manifest; stock NLLB is incompatible.

adapter | tar.gz | Get

v24.3 hosting manifest

metadata | json | Get

v24.3 CPU runtime verification

evaluation | json | Get

All 2,724 selected-checkpoint lexical predictions

evaluation | json | Get

Complete 28-row v24.3 lexical failure ledger

evaluation | jsonl | Get

v24.3 final research decision

evaluation | json | Get

Exact v24.3 training and evaluation data

Exact 115,136-row materialized train schedule, optimization monitor, governed lexical census, development suite, manifests, and checksums.

dataset | tar.gz | Get

v24.3 release metadata

metadata | json | Get

v24.3 release checksums

metadata | sha256 | Get
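Hosts should confirm that every downloaded file, above all the v21.2 merged base, matches its published digest before loading anything. The sketch below assumes the `.sha256` files use the GNU `sha256sum` line format (`<hex digest>  <path>`, with paths relative to the inventory file). The base digest itself lives in the hosting manifest, whose field names are not reproduced here.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB weight files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_inventory(inventory: Path) -> List[str]:
    """Check each '<hex digest>  <relative path>' line; return a list of problems.

    A malformed line raises ValueError, which should also block loading.
    """
    problems = []
    root = inventory.parent
    for line in inventory.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes binary-mode entries with '*'
        target = root / name
        if not target.is_file():
            problems.append(f"missing: {name}")
        elif sha256_file(target) != expected.lower():
            problems.append(f"mismatch: {name}")
    return problems
```

An empty list means every listed file is present and matches. Any reported problem should stop the host from loading the model.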

The headline 98.9721% score (2,696/2,724) measures accepted-reference reconstruction of 2,724 governed dictionary records, each shown 29 times during training. It does not measure generalisation to unseen words or sentence accuracy.
The selected step improved every reported development-set sentence corpus-chrF endpoint over the untouched v21.2 model, but the run used a single seed and development data only; elder exact match remained 0/43.
The older 297-prompt benchmark scored only 91 exact matches because just 208 of its prompts overlap the governed census and the two resources accept different surface forms; 206 of the 208 overlapping predictions were supported by the governed census.
All 28 remaining governed-census failures are published with token, part-of-speech, semantic-domain, closest-reference, and checkpoint-trajectory evidence.
Dynamic CPU loading was verified with both task prefixes. Memory peaked at 8.00 GiB (cgroup) and settled around 7.0-7.2 GiB RSS; request latency ranged from 4.9 to 23.2 seconds under variable shared-disk pressure.
The adapter is large because PEFT persisted full embedding and output matrices for the trainable <lexeme> row. Repacking or merging requires full adapter-parity evaluation before release.
Known dictionary queries should continue to use deterministic lookup. Every generated sentence must be visibly labelled as an unverified research draft.
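The routing rule above can be sketched as lookup first, with a mandatory draft label on every generated output. This is an illustration only. The lexicon dictionary, the `generate` callable, the query normalisation, and the label wording are all hypothetical stand-ins. The v24.3 task prefixes are documented in the hosting guide and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

DRAFT_LABEL = "Unverified research draft"  # hypothetical label wording


@dataclass
class Result:
    text: str
    route: str              # "lookup" or "model"
    label: Optional[str]    # always set when the text was generated


def normalise_query(query: str) -> str:
    # Hypothetical normalisation: lower-case and collapse whitespace.
    return " ".join(query.lower().split())


def translate(query: str, lexicon: Dict[str, str],
              generate: Callable[[str], str]) -> Result:
    """Serve known dictionary facts deterministically; label everything else as a draft."""
    key = normalise_query(query)
    if key in lexicon:
        return Result(lexicon[key], "lookup", None)
    return Result(generate(query), "model", DRAFT_LABEL)
```

Because the label is attached in the same function that calls the model, no generated text can leave this path unlabelled.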

Version v21.2-claude-balanced-replay-guarded-20260714

research only

Base model: facebook/nllb-200-distilled-1.3B + frozen v12.0 merged continuation with gvn_Latn tokenizer extension
Dataset: Byte-identical v21.2 balanced-replay weights trained on 22,164 rows per epoch; this release changes only the decoding policy, under a separately frozen transfer protocol
Directions: eng-gvn
Release date: 2026-07-14

Research preview

CC-BY-NC-4.0 base; project-approved synthetic and governed replay data for noncommercial research. The model is not speaker-certified and is not approved for authoritative or unrestricted translation.

weights train loss: 1.199
synthetic dev chrf: 55.2
synthetic dev exact: 394
synthetic dev segment loop rows: 1
synthetic test tagged chrf: 55.7
synthetic test tagged exact: 426
synthetic test tagged segment loop rows: 2
synthetic test untagged chrf: 55.7

Artifact | Kind | Format | Download

Comprehensive model, training, and hosting guide

Model selection, API use, data lineage, RunPod reproduction, Atlas loading, rights, and smoke tests.

documentation | html | Get

v22 step-matched replay and decoder-transfer experiment

Preregistered checkpoint rejection, paired audits, guarded-decoder transfer, costs, limitations, and exact hashes.

documentation | html | Get

RunPod evaluation and fast feedback loop

Complete one-command A40 benchmark method, frozen linguistic protocol, parity proof, resource evidence, failure ledger, and faster candidate loop.

documentation | html | Get

RunPod evaluation and fast-loop source

Public copy of the comprehensive reproducibility and handoff record.

documentation | markdown | Get

Portable RunPod evaluation kit

One-command benchmark code, frozen 340-row input, preregistration, manifests, pinned requirements, and three local baselines; model weights are intentionally separate.

evaluation | tar.gz | Get

Portable RunPod kit instructions

Exact paths, runtime contract, execution command, outputs, and interpretation rules.

documentation | markdown | Get

Sealed A40 benchmark result bundle

Deterministic archive of the complete checksummed A40 run: predictions, analysis, preflight, runtime, logs, and resource measurements.

evaluation | tar.gz | Get

Sealed A40 output checksums

Verified remotely before completion and again after download.

metadata | sha256 | Get

A40 version-probe analysis

GPU-native results and 10,000-resample paired intervals for the three frozen checkpoints.

evaluation | json | Get

A40 versus local prediction parity

All 1,020 row-level predictions were identical across the A40 and frozen local runs.

evaluation | json | Get

A40 repeatability comparison

Independent A40 rerun comparison: all 1,020 predictions were identical.

evaluation | json | Get

A40 resource summary

Sampled utilization, GPU memory, power, and temperature over the sealed run.

evaluation | json | Get

Lexicon and elder version-probe results

Multi-reference canonical-headword percentages and separately scored 43-row elder results for steps 2,770, 3,120, and 4,155.

evaluation | markdown | Get

Lexicon and elder version-probe analysis

Complete metrics, exposure strata, decoder contract, and 10,000-resample paired intervals.

evaluation | json | Get

Frozen lexicon and elder benchmark input

297 isolated English dictionary glosses plus the unchanged 43 rights-cleared elder sentence pairs; metrics are split by pair_kind, never pooled.

dataset | jsonl | Get

Lexicon-probe manifest

Mechanical row-selection rules, source hashes, exposure counts, and frozen input identity.

metadata | json | Get

Lexicon and elder benchmark preregistration

evaluation | markdown | Get

Lexicon and elder benchmark run contract

evaluation | markdown | Get

Lexicon and elder benchmark package checksums

Covers every frozen input, prediction file, resource log, protocol, and result in the public benchmark directory.

metadata | sha256 | Get

Step 2,770 row-level lexicon and elder predictions

evaluation | json | Get

Step 3,120 row-level lexicon and elder predictions

evaluation | json | Get

Step 4,155 guarded row-level lexicon and elder predictions

evaluation | json | Get

Atlas dynamic LoRA guarded hosting bundle

Exact frozen v12 base, v21.2 PEFT adapter, tokenizer, guarded decoding policy, hosting manifest, and smoke tests.

bundle | tar.gz | Get

Standalone guarded merged-model bundle

Complete Transformers-compatible merged model; archive bytes are identical to the historical v21.2 merged bundle and the guarded policy is documented alongside it.

model | tar.gz | Get

Guarded hosting manifest

metadata | json | Get

Release metadata

metadata | json | Get

Bundle README

documentation | markdown | Get

Release checksums

metadata | sha256 | Get

Local release verification

evaluation | text | Get

Frozen decoder-transfer protocol

evaluation | markdown | Get

Decoder-transfer promotion gate

PASS: all 14 frozen checks passed on the exact published v21.2 model.

evaluation | json | Get

Exact v21.2 training splits

Final train, validation, synthetic test, gate, manifest, and leakage-quarantine files; locked elder evaluation rows excluded.

dataset | tar.gz | Get

Complete synthetic process corpus v2

20,047 synthetic sentences plus dictionary, reviews, revisions, audits, process logs, and canonical splits.

dataset | zip | Get

The merged model's SHA-256 is identical to the published v21.2 identity; this is a decoding-policy release, not a renamed checkpoint.
The decoder was selected on the v22 development set and locked before v22 test/control inference and before any v21.2 transfer inference.
A first transfer attempt reconstructed non-identical float32 bytes and was rejected before evaluation; only the exact published v21.2 merged model was scored.
Relative to greedy decoding, the guarded decoder raised tagged synthetic corpus chrF by 1.8723 points and cut rows with a repeated segment from 37 to 2; all seven frozen sets had zero empty outputs.
Exact match fell on some sets and Bible chrF dipped slightly; these outcomes are reported in the experiment book rather than hidden.
A post-training 297-prompt canonical-headword probe scored this release at 48/297 normalized accepted-reference exact matches (16.16%); the result was 46/192 for target forms seen as training tokens and 2/105 for unseen targets.
Under the same fixed decoder, step 4,155 exceeded step 3,120 by 1.68 exact-match percentage points on the lexicon probe (paired 95% interval +0.34 to +3.37), while the 43-row elder difference remained inconclusive and every checkpoint was 0/43 exact.
An exact NVIDIA A40 reproduction evaluated all 340 frozen rows for steps 2,770, 3,120, and 4,155. All 1,020 GPU predictions matched the frozen local baselines, and a second A40 run reproduced all 1,020 predictions exactly.
The portable evaluator fails closed on input, model-file, runtime, decoder, determinism, GPU-use, empty-output, baseline-parity, and output-checksum violations. Its compressed kit is 100,526 bytes and excludes model weights.
A first complete A40 attempt was rejected when a post-seal log append invalidated its checksum inventory. The failed attempt and repair evidence remain in the private research archive; only the clean from-scratch rerun is published.
On 2026-07-14 the Kuku Yalanji and Mi'gmaq CPU inference workers were stopped and disabled, their web-service dependencies were removed, and custom-model endpoints were cleared. This release remains downloadable and reproducible but is not currently loaded for live translation.
Automatic metrics are regression evidence, not community validation. Lookup-first routing and explicit research-draft labelling remain required.
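The canonical-headword probe reports normalized accepted-reference exact matches and keeps strata separate: seen versus unseen targets here, and `pair_kind` for the lexicon and elder benchmark. The sketch below shows that scoring shape. The normalisation rules are assumptions; the probe manifest defines the real ones.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def normalise(text: str) -> str:
    # Hypothetical rules: NFC, case-folding, collapsed whitespace, trimmed end punctuation.
    text = unicodedata.normalize("NFC", text).casefold()
    return " ".join(text.split()).strip(".,;:!?\"'")


def exact_by_stratum(
    rows: Iterable[Tuple[str, Sequence[str], str]]
) -> Dict[str, Tuple[int, int]]:
    """rows: (prediction, accepted references, stratum). Returns {stratum: (hits, total)}."""
    counts = defaultdict(lambda: [0, 0])
    for prediction, references, stratum in rows:
        accepted = {normalise(ref) for ref in references}
        counts[stratum][0] += int(normalise(prediction) in accepted)
        counts[stratum][1] += 1
    return {stratum: (hits, total) for stratum, (hits, total) in counts.items()}
```

The published strata sum to 46 + 2 = 48 hits over 192 + 105 = 297 prompts, which gives the 16.16% headline. Reporting per stratum shows that almost all of those hits come from targets seen during training.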

Version v23.0-attested-narrative-adaptation-failed

negative result

Base model: Exact v21.2 merged weights, independently adapted under seeds 17, 42, and 73
Dataset: 2,360-row attested-narrative treatment: 624 replayed Patz training clauses, 712 dictionary-usage task rows, and 1,024 synthetic-retention rows; Bible and held-out speakers excluded
Directions: eng-gvn
Release date: 2026-07-14

Completed: not promoted

Noncommercial research result. Source materials retain their own terms. The candidate was not speaker-reviewed, community-approved, published as weights, or authorized for sentence generation.

selected seed: 73
optimizer steps per seed: 882
text3 natural test rows: 56
text3 baseline corpus chrf: 31.7
text3 candidate corpus chrf: 32.3
text3 corpus chrf delta: 0.6414
text3 mean sentence chrf delta: 0.4494
text3 paired ci95 low: -0.2448
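The corpus chrF delta (0.6414) and the mean sentence chrF delta (0.4494) differ because they aggregate differently. Corpus chrF pools character n-gram counts across all rows before computing one F-score, while the mean sentence figure averages per-row scores. The simplified sketch below shows the distinction. It follows the usual chrF shape (character 1- to 6-grams, whitespace removed, β = 2) but is not bit-identical to sacrebleu, which should be used for any reported number.

```python
from collections import Counter
from typing import List

MAX_ORDER = 6   # character n-gram orders 1..6
BETA = 2.0      # recall weighted more heavily than precision


def char_ngrams(text: str, n: int) -> Counter:
    chars = "".join(text.split())  # whitespace is ignored
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def ngram_stats(hyp: str, ref: str) -> List[List[int]]:
    """Per order: [clipped matches, hypothesis n-grams, reference n-grams]."""
    stats = []
    for n in range(1, MAX_ORDER + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        stats.append([sum((h & r).values()), sum(h.values()), sum(r.values())])
    return stats


def f_score(stats: List[List[int]]) -> float:
    # Simplification: orders where either side has no n-grams are skipped.
    usable = [(m / h, m / r) for m, h, r in stats if h and r]
    if not usable:
        return 0.0
    p = sum(x for x, _ in usable) / len(usable)
    r = sum(y for _, y in usable) / len(usable)
    if p + r == 0:
        return 0.0
    b2 = BETA ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


def corpus_chrf(hyps: List[str], refs: List[str]) -> float:
    """Pool counts over all rows, then score once."""
    totals = [[0, 0, 0] for _ in range(MAX_ORDER)]
    for hyp, ref in zip(hyps, refs):
        for total, row in zip(totals, ngram_stats(hyp, ref)):
            for i in range(3):
                total[i] += row[i]
    return f_score(totals)


def mean_sentence_chrf(hyps: List[str], refs: List[str]) -> float:
    """Score each row, then average the scores."""
    scores = [f_score(ngram_stats(h, r)) for h, r in zip(hyps, refs)]
    return sum(scores) / len(scores)
```

Under corpus pooling, long rows carry more n-grams and dominate the score. Under the mean sentence average, every row counts equally, so short rows can move it as much as long ones.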

Artifact | Kind | Format | Download

Complete v23 negative-result report

documentation | html | Get

Preregistered v23 protocol

evaluation | html | Get

The speaker-disjoint held-out natural test was opened only after development selection, but all texts share one published grammar and editorial tradition.
No fluent speaker reviewed model outputs; automatic character overlap cannot establish semantic, grammatical, dialectal, or cultural validity.
All three seeds, checkpoints, adapters, merged weights, optimizer states, and trainer caches were deleted after compact evidence and checksums were verified.
This registry row records a completed negative result. It is not a downloadable model edition and cannot become the latest serving alias.

Version v22.0-step-matched-replay-3120-failed

negative result

Base model: Exact v21.2 training checkpoint at step 2,770, continued under the original 4,155-step learning-rate schedule
Dataset: The same v21.2 balanced replay stream for exactly 350 additional optimizer steps; no test data was available to training or decoder selection
Directions: eng-gvn
Release date: 2026-07-14

Completed: not promoted

Noncommercial research artifact only. Rejected by the preregistered promotion gate; not speaker-certified, production-approved, or offered as a model download.

global step: 3120
additional optimizer steps: 350
greedy synthetic dev chrf: 52.9
greedy synthetic test tagged chrf: 53.2
greedy synthetic test tagged exact: 417
greedy synthetic test tagged segment loop rows: 37
greedy synthetic test untagged chrf: 53.3
greedy synthetic test untagged exact: 419

Artifact | Kind | Format | Download

Complete v22 experiment report

documentation | html | Get

Preregistered v22 protocol

evaluation | markdown | Get

v22 promotion gate

FAIL: the tagged synthetic repeated-segment gate failed.

evaluation | json | Get

RunPod cost and deletion record

metadata | json | Get

RunPod post-deletion pod list

metadata | json | Get

Remote output checksum inventory

metadata | sha256 | Get

Local 182-file pullback verification

evaluation | text | Get

The hypothesis, exact stop, hashes, promotion floors, and analysis rules were frozen before the locked test and control sets were opened.
The development-only guarded decoder controlled degeneration but could not rescue a checkpoint that failed the primary training-exposure gate.
The final v21.2 step 4,155 checkpoint remained better on tagged synthetic, untagged synthetic, and dictionary-usage paired comparisons.
All 182 remote-manifest files were verified after pullback with zero failures; the RunPod pod was then deleted and an empty post-deletion pod list was saved.
The rejected weights remain in the internal research archive but are intentionally not presented as a public model download.
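Several releases report paired intervals from 10,000 resamples. The sketch below is a minimal percentile bootstrap for the difference in exact-match rate between two checkpoints scored on the same rows. Its resampling scheme is an assumption: the scheme the project actually used is recorded in its analysis JSON and may differ.

```python
import random
from typing import List, Tuple


def paired_bootstrap_diff(a: List[int], b: List[int], resamples: int = 10_000,
                          seed: int = 0, alpha: float = 0.05) -> Tuple[float, float, float]:
    """Observed mean(a) - mean(b) and its percentile CI; row indices are resampled jointly."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(a)
    rng = random.Random(seed)           # fixed seed keeps the interval reproducible
    observed = (sum(a) - sum(b)) / n
    diffs = []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(a[i] - b[i] for i in idx) / n)
    diffs.sort()
    low = diffs[int(alpha / 2 * resamples)]
    high = diffs[int((1 - alpha / 2) * resamples) - 1]
    return observed, low, high
```

Resampling row indices jointly keeps each row's two outcomes together. The interval therefore reflects disagreement between the checkpoints rather than the overall difficulty of the rows.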

Version v21.2-claude-balanced-replay-v2-candidate

research only

Base model: facebook/nllb-200-distilled-1.3B + frozen v12.0 merged continuation with gvn_Latn tokenizer extension
Dataset: 22,164-row balanced treatment per epoch: 16,642 leakage-audited synthetic rows, 2,047 Bible-direct replay rows, 2,047 Bible-reference replay rows, and 1,428 dictionary-usage replay rows; elder rows excluded
Directions: eng-gvn
Release date: 2026-07-11

Research preview

train loss: 1.199
synthetic dev bleu: 26.2
synthetic dev chrf: 52.8
synthetic test tagged bleu: 26.8
synthetic test tagged chrf: 53.8
synthetic test tagged exact: 426
synthetic test untagged bleu: 26.8
synthetic test untagged chrf: 53.8

Artifact | Kind | Format | Download

Comprehensive model, training, and hosting guide

Start here for model selection, API use, training data, RunPod reproduction, Atlas LoRA loading, evaluation, rights, and smoke tests.

documentation | html | Get

Atlas dynamic LoRA hosting bundle

One-download bundle containing the exact frozen v12 base, portable v21.2 PEFT adapter, tokenizer/configuration files, hosting manifest, and smoke-test instructions.

bundle | tar.gz | Get

Standalone merged-model bundle

Complete Transformers-compatible merged model for hosts without dynamic encoder-decoder LoRA support.

model | tar.gz | Get

Hosting manifest

metadata | json | Get

Release checksums

metadata | sha256 | Get

Exact v21.2 training splits

Final train, validation, synthetic test, gate, manifest, and leakage-quarantine files; locked elder evaluation rows excluded.

dataset | tar.gz | Get

Complete synthetic process corpus v2

20,047 synthetic sentences plus dictionary, reviews, revisions, audits, process logs, and canonical splits.

dataset | zip | Get

Merged v2 candidate

Hash-verified local research artifact; served only by the guarded loopback preview.

model | safetensors | Local

LoRA adapter

adapter | safetensors | Local

Independent v21 comparison

evaluation | html | Get

The independent comparison revalidated ordered IDs, inputs, references, decoder settings, and archive checksums before scoring.
Compared with v21.1, corpus chrF changed by -0.87 on tagged synthetic test, +12.74 on dictionary usage, +15.23 on Bible direct, and +15.15 on Bible reference.
The 43-row elder control remains 0 exact with severe under-translation; its paired confidence interval versus v21.1 includes zero.
Thirty-seven tagged synthetic rows contain a segment repeated at least ten times, so generation guards and a frozen-battery rerun are mandatory before broader routing.
The original release record preserves its greedy evaluation; current serving uses the separately validated guarded-policy release and remains visibly labelled as a research draft.
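The loop criterion above (a segment repeated at least ten times) can be checked mechanically. The sketch below assumes a "segment" is a run of one to four whitespace tokens repeated back to back. The release's exact definition is not stated in this listing.

```python
def max_consecutive_repeats(text: str, max_ngram: int = 4) -> int:
    """Largest number of back-to-back copies of any 1..max_ngram token segment."""
    tokens = text.split()
    best = 1 if tokens else 0
    for n in range(1, max_ngram + 1):
        for start in range(len(tokens) - n + 1):
            segment = tokens[start:start + n]
            count, pos = 1, start + n
            # Keep stepping forward while the next n tokens repeat the segment.
            while tokens[pos:pos + n] == segment:
                count += 1
                pos += n
            best = max(best, count)
    return best


def is_loop_row(text: str, threshold: int = 10) -> bool:
    """Flag an output whose longest tandem repeat reaches the threshold."""
    return max_consecutive_repeats(text) >= threshold
```

A row-level flag like this makes the loop count a frozen, comparable number across checkpoints and decoders.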

Version v21.1-codex-synthetic-direct-v2-candidate

research only

Base model: facebook/nllb-200-distilled-1.3B + frozen v12.0 merged continuation with gvn_Latn tokenizer extension
Dataset: 20,047-row governed synthetic corpus; leakage-audited v21.1 treatment with 16,642 train, 1,609 validation, 1,606 test, and 190 quarantined rows
Directions: eng-gvn
Release date: 2026-07-10

Research preview

Project-approved synthetic research data pending elder verification. The model is not speaker-certified and is not approved for public translation routing.

train loss: 1.691
validation bleu: 26.2
validation chrf: 53.9
synthetic test tagged bleu: 26.1
synthetic test tagged chrf: 54.7
synthetic test untagged bleu: 26.3
synthetic test untagged chrf: 54.4
elder shared chrf: 28.8

Artifact | Kind | Format | Download

Merged v2 candidate

Local, hash-verified research artifact. Not served to public translation traffic.

model | safetensors | Local

LoRA adapter

adapter | safetensors | Local

Comprehensive v2 handoff

documentation | markdown | Local

Archive audit

evaluation | json | Local

The mandatory 30-epoch gate failed and correctly blocked full training; its complete model is retained as negative evidence.
A documented 100-epoch gate amendment passed 128/128 exact before the full treatment unlocked.
All eight frozen evaluation sets completed at exact row counts with zero empty outputs.
All 192 model files matched the RunPod SHA-256 inventory; local CPU and remote CUDA smoke outputs were identical; the pod was deleted.
Compare with Claude's independent v21.2 balanced-replay lane before any release decision.
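Both gates above are fail-closed: training or promotion proceeds only when every frozen check passes, and a missing metric counts as a failure. In the sketch below, the metric names and thresholds are hypothetical. The 128/128 exact figure is used only as an example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Check:
    name: str
    passed: Callable[[dict], bool]


def run_gate(metrics: dict, checks: List[Check]) -> Tuple[bool, List[str]]:
    """Fail closed: a check that raises (e.g. a missing metric) counts as a failure."""
    failures = []
    for check in checks:
        try:
            ok = bool(check.passed(metrics))
        except Exception:
            ok = False
        if not ok:
            failures.append(check.name)
    return (not failures, failures)


# Hypothetical checks echoing the listing: exact memorisation, no empty outputs, loop ceiling.
CHECKS = [
    Check("overfit_exact_all", lambda m: m["gate_exact"] == m["gate_rows"]),
    Check("no_empty_outputs", lambda m: m["empty_outputs"] == 0),
    Check("loop_rows_ceiling", lambda m: m["loop_rows"] <= m["loop_rows_ceiling"]),
]
```

The gate returns the names of failing checks as well as the verdict. A rejection can then cite the specific check, as the v22 promotion gate does.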

Version v20.0-full-candidate-corpus-gvn

internal proof

Base model: facebook/nllb-200-distilled-1.3B + v12.0 merged continuation
Dataset: v20.0 full candidate corpus: tagged Bible direct/reference tasks + DB usage replay + elder-shared sentence-pair replay
Directions: eng-gvn
Release date: 2026-07-02

Test in v2

Rights granted for elder-shared/DB examples; Bible-derived rows used as research diagnostics. Known approved resources remain lookup-first.

train loss: 1.485
trainer validation bleu: 13.4
trainer validation chrf: 39.9
trainer test bleu: 13.2
trainer test chrf: 40.3
validation bible direct chrf: 39.3
validation bible direct exact: 52.0
validation bible ref chrf: 39.1

Artifact | Kind | Format | Download

Merged model

model | directory | Get

LoRA adapter

adapter | directory | Get

Run report

report | markdown | Get

Live run note

report | markdown | Get

Training manifest

metadata | json | Get

Post-eval analysis

metadata | json | Get

Resource summary

metadata | json | Get

Source length preflight

metadata | json | Get

Test all predictions

evaluation | json | Get

Test Bible direct predictions

evaluation | json | Get

Test Bible reference predictions

evaluation | json | Get

Validation Bible direct predictions

evaluation | json | Get

Validation Bible reference predictions

evaluation | json | Get

Heldout usage predictions

evaluation | json | Get

Elder-shared sentence-pair predictions

evaluation | json | Get

Train sample predictions

evaluation | json | Get

This was the first full-candidate-corpus run after the smaller v18/v19 replay runs.
The corpus was expanded into tagged direct/reference Bible tasks plus DB usage and elder-shared sentence-pair replay.
The result shows full-corpus scale alone is not enough; route-specific lookup and mixture control remain necessary before another expensive run.
Use v20 as a diagnostic artifact, not as the product default for faithful output.

Version v19.0-balanced-replay-from-v12-gvn

internal proof

Base model: facebook/nllb-200-distilled-1.3B + v12.0 merged continuation
Dataset: v19.0 balanced Bible replay + DB usage + elder-shared sentence pairs
Directions: eng-gvn
Release date: 2026-07-02

Test in v2

Rights granted for elder-shared/DB examples; Bible-derived rows used as research diagnostics. Known approved resources remain lookup-first.

train loss: 0.2129
validation bleu: 95.8
validation chrf: 97.2
heldout usage chrf: 53.0
heldout usage exact: 6.000
heldout bible direct chrf: 44.2
heldout bible ref chrf: 44.3
heldout all chrf: 44.5

Artifact | Kind | Format | Download

Merged model

model | directory | Get

LoRA adapter

adapter | directory | Get

Release manifest

metadata | json | Get

Training manifest

metadata | json | Get

Post-eval analysis

metadata | json | Get

Heldout Bible reference predictions

evaluation | json | Get

Heldout usage predictions

evaluation | json | Get

Elder-shared sentence-pair predictions

evaluation | json | Get

Next move after v18: balanced replay continuation from v12, not another single-domain continuation.
Best broad heldout score after elder sentence-pair ingestion, but not a universal replacement.
Use for current research state; keep exact Bible/DB/elder sentence-pair resources lookup-first.

Version v18.0-usage-elder-sentence-continuation-from-v10

internal proof

Base model: facebook/nllb-200-distilled-1.3B + v10.0 merged continuation
Dataset: v18.0 DB usage + 43 elder-shared sentence pairs, oversampled
Directions: eng-gvn
Release date: 2026-07-02

Test in v2

Elder-shared sentence pairs and DB examples are approved for training; exact known rows remain lookup-first.

train loss: 0.2773
train usage elder sentence pair chrf: 100.0
train usage elder sentence pair exact: 408
elder sentence pair chrf: 100.0
elder sentence pair exact: 43
heldout usage chrf: 54.5
heldout usage exact: 3.000
heldout bible direct chrf: 41.4

Artifact | Kind | Format | Download

Merged model

model | directory | Get

LoRA adapter

adapter | directory | Get

Training manifest

metadata | json | Get

Post-eval analysis

metadata | json | Get

Elder-shared sentence-pair predictions

evaluation | json | Get

Heldout usage predictions

evaluation | json | Get

Proved the 16 uploaded source pages could be transcribed into 43 useful supervised sentence-pair rows.
Route-specific artifact: good proof for elder sentence-pair data ingestion, not a general model winner.

Version v15.0-soft-lexical-hint-bible-gvn-token

negative result

Base model: facebook/nllb-200-distilled-1.3B + v12.0 continuation
Dataset: v15.0 oracle lexical-hint Bible diagnostic
Directions: eng-gvn
Release date: 2026-07-02

Completed: not promoted

Research diagnostic; not an approved translation source.

heldout bible direct chrf: 44.1
heldout bible ref chrf: 43.7

Artifact | Kind | Format | Download

Merged model

model | directory | Get

LoRA adapter

adapter | directory | Get

Training manifest

metadata | json | Get

Post-eval analysis

metadata | json | Get

Heldout Bible reference predictions

evaluation | json | Get

Soft lexical hints were not the missing ingredient; do not scale this branch.

Version v13.0-retrieval-context-bible-gvn-token

negative result

Base model: facebook/nllb-200-distilled-1.3B + v12.0 continuation
Dataset: v13.0 retrieved Bible context prefix diagnostic
Directions: eng-gvn
Release date: 2026-07-02

Completed: not promoted

Research diagnostic; not an approved translation source.

heldout bible direct chrf: 29.5
heldout bible ref chrf: 29.7

Artifact | Kind | Format | Download

Merged model

model | directory | Get

LoRA adapter

adapter | directory | Get

Training manifest

metadata | json | Get

Post-eval analysis

metadata | json | Get

Heldout Bible reference predictions

evaluation | json | Get

Retrieval remains useful for lookup/evidence display, not as this NLLB prefix format.

Version v12.0-tagged-direct-plus-reference-bible-gvn-token

route candidate

Base model: facebook/nllb-200-distilled-1.3B + custom gvn_Latn token
Dataset: v12.0 tagged direct/reference Bible, 4096 rows
Directions: eng-gvn
Release date: 2026-07-02

Archived route candidate

Research diagnostic; exact Bible references should be served from approved lookup.

heldout bible direct chrf: 44.4
heldout bible ref chrf: 44.4
heldout exact: 0.0000

Artifact | Kind | Format | Download

Merged model

model | directory | Get

LoRA adapter

adapter | directory | Get

Training manifest

metadata | json | Get

Post-eval analysis

metadata | json | Get

Heldout Bible reference predictions

evaluation | json | Get

Useful as Bible draft fallback only; not faithful canonical reproduction.

Version v11.0-byt5-bible-control-32row

negative result

Base model: google/byt5-small
Dataset: v11.0 ByT5 32-row Bible control
Directions: eng-gvn
Release date: 2026-07-02

Completed: not promoted

Research diagnostic only.

heldout bible direct chrf: 6.090
heldout bible ref chrf: 5.710
heldout all chrf: 5.900

No downloadable artifacts are listed for this version.

Kept as a control result, not a candidate model.

Version v10.0-tagged-bible-plus-glossary-usage-tpi

route candidate

Base model: facebook/nllb-200-distilled-1.3B + tpi_Latn proxy token
Dataset: v10.0 Bible + glossary/DB usage multitask
Directions: eng-gvn
Release date: 2026-07-02

Archived route candidate

DB usage examples are approved; exact known DB rows remain lookup-first.

heldout usage chrf: 56.2
heldout usage exact: 3.000
heldout bible direct chrf: 43.2
heldout bible ref chrf: 43.3
heldout all chrf: 43.6

Artifact | Kind | Format | Download

Merged model

model | directory | Get

LoRA adapter

adapter | directory | Get

Training manifest

metadata | json | Get

Post-eval analysis

metadata | json | Get

Heldout usage predictions

evaluation | json | Get

Heldout Bible reference predictions

evaluation | json | Get

Keep as usage/general draft fallback until a later run clearly beats it on word-id heldout usage.

Version v9.8-tagged-bible-plus-db-usage-tpi

internal proof

Base model: facebook/nllb-200-distilled-1.3B + tpi_Latn proxy token
Dataset: v9.8 tagged Bible plus DB usage examples
Directions: eng-gvn
Release date: 2026-07-02

Test in v2

Research diagnostic with approved DB usage examples.

heldout usage chrf: 39.5
heldout bible ref chrf: 44.0
heldout all chrf: 43.9

Artifact | Kind | Format | Download

Merged model

model | directory | Get

LoRA adapter

adapter | directory | Get

Training manifest

metadata | json | Get

Post-eval analysis

metadata | json | Get

Heldout usage predictions

evaluation | json | Get

Heldout Bible reference predictions

evaluation | json | Get

Important because it proved DB examples can be learned, but its scores were not strong enough to route it as the usage fallback.

Version v9.7-tagged-direct-plus-reference-bible-tpi

internal proof

Base model: facebook/nllb-200-distilled-1.3B + tpi_Latn proxy token
Dataset: v9.7 tagged direct/reference Bible, 4096 rows
Directions: eng-gvn
Release date: 2026-07-02

Test in v2

Research diagnostic; exact Bible references remain lookup-first.

heldout bible direct chrf: 44.1
heldout bible ref chrf: 44.3
heldout exact: 0.0000

Artifact | Kind | Format | Download

Merged model

model | directory | Get

LoRA adapter

adapter | directory | Get

Training manifest

metadata | json | Get

Post-eval analysis

metadata | json | Get

Heldout Bible reference predictions

evaluation | json | Get

Reference-conditioned input helped; exact canonical Bible output still belongs to retrieval/lookup.

Version v8.0-diagnostic-gates-summary

internal proof

Base model: facebook/nllb-200-distilled-1.3B + LoRA gates
Dataset: v8 gated overfit diagnostics, 1->8->32->256 rows
Directions: eng-gvn
Release date: 2026-07-01

Test in v2

Research diagnostic only.

overfit gate: 1.000
pipeline alive: 1.000

No downloadable artifacts are listed for this version.

Not a product model; it is the reason later runs are meaningful.
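The v8 ladder grows the training subset (1 → 8 → 32 → 256 rows) only after the model has memorised the previous rung. The sketch below shows that control flow. It assumes a hypothetical `train_and_score(size)` callback that trains on the first `size` rows and returns exact-match accuracy on those rows.

```python
from typing import Callable, List, Sequence, Tuple


def run_overfit_ladder(sizes: Sequence[int], train_and_score: Callable[[int], float],
                       required: float = 1.0) -> List[Tuple[int, float, bool]]:
    """Train on each subset size in order; stop at the first rung that misses `required`."""
    results = []
    for size in sizes:
        score = train_and_score(size)
        passed = score >= required
        results.append((size, score, passed))
        if not passed:
            break  # a pipeline that cannot memorise this rung is not worth scaling
    return results
```

Stopping at the first failed rung keeps a broken data or loss pipeline from consuming a full-scale GPU run.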

Version 0.7.0-full-tpi-proxy-1.3b

internal proof

Base model: facebook/nllb-200-distilled-1.3B
Dataset: kuku_yalanji_ebible_parallel_v0.1.0
Directions: eng-gvn
Release date: 2026-07-01

Test in v2

Kuku Yalanji eBible snapshot rights granted for MobTranslate model training by project owner attestation on 2026-06-30. This is an internal experimental artifact for evaluation, not a community-approved production translator.

validation loss: 1.689
validation bleu: 8.745
validation chrf: 37.5
test loss: 1.678
test bleu: 8.401
test chrf: 37.1
standalone bleu: 6.377
standalone chrf: 34.5

Artifact | Kind | Format | Download

Merged model

Current live translate/v2 internal-proof model. Decode with source eng_Latn and target tpi_Latn.

model | safetensors | Get

LoRA adapter

adapter | safetensors | Get

Training manifest

metadata | json | Get

Evaluation report

evaluation | json | Get

MobTranslate DB usage example eval preview

48-row non-Bible preview generated through the live v0.7 CPU service from curated MobTranslate Kuku Yalanji usage_examples. Full 449-row reference set is published separately.

evaluation | json | Get

MobTranslate DB usage example reference set

449 English to Kuku Yalanji examples exported from MobTranslate Postgres usage_examples on 2026-07-01; rights_status=rights_granted and approved_for_training=true.

dataset | jsonl | Get

Best current experimental English to Kuku Yalanji model and the first run to clearly beat v0.4 across automatic and row-level comparison metrics.
Served through the existing NLLB tpi_Latn token as a proxy target, which outperformed the failed 1.3B custom-token probe.
Still not faithful enough for production or community-approved use: long passages compress and some clauses drift or disappear.
Added non-Bible MobTranslate DB usage-example eval preview so translate/v2 shows everyday dictionary-context failures, not only Bible verse rows.
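The usage-example reference set is described as filtered to `rights_status=rights_granted` and `approved_for_training=true`. The sketch below applies that filter to a JSONL export. It assumes both fields are top-level keys and that `approved_for_training` is a JSON boolean; neither assumption has been checked against the published file.

```python
import json
from typing import Iterable, Iterator


def approved_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only rows whose rights and training approval are both explicit."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the export
        row = json.loads(line)
        if (row.get("rights_status") == "rights_granted"
                and row.get("approved_for_training") is True):
            yield row
```

Comparing with `is True` means a missing field or a string such as "true" is excluded. Anything not explicitly approved stays out.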

Version 0.4.0-full-gvn-token

internal proof

Base model: facebook/nllb-200-distilled-600M
Dataset: kuku_yalanji_ebible_parallel_v0.1.0
Directions: eng-gvn
Release date: 2026-07-01

Test in v2

train loss: 2.561
validation loss: 1.966
validation bleu: 5.539
validation chrf: 33.1
test loss: 1.989
test bleu: 5.274
test chrf: 32.7
standalone bleu: 3.645

Artifact | Kind | Format | Download

Merged model

Local experimental artifact. Served on translate/v2 for internal model comparison.

model | safetensors | Get

LoRA adapter

adapter | safetensors | Get

Training manifest

metadata | json | Get

Evaluation report

evaluation | json | Get

Best current experimental English to Kuku Yalanji model and the first full 8-epoch high-confidence run.
Better than v0.3 on held-out metrics and samples, but still not faithful enough for production or community-approved release.
Primary remaining failure mode is semantic looseness and dropped content, especially on longer verses.

Version 0.1.0-mini-pilot

internal proof

Base model: facebook/nllb-200-distilled-600M
Dataset: kuku_yalanji_ebible_parallel_v0.1.0
Directions: eng-gvn
Release date: 2026-06-30

Test in v2

Kuku Yalanji eBible snapshot rights granted for MobTranslate model training by project owner attestation on 2026-06-30. Proof artifacts are local/internal and are not a public release.

train loss: 6.295
validation loss: 5.773
validation bleu: 0.0313
validation chrf: 4.744
test loss: 5.817
test bleu: 0.0275
test chrf: 4.272
standalone bleu: 0.0269

Artifact | Kind | Format | Download

Merged model

Local proof artifact. Publish only after community/project release decision.

model | safetensors | Local

LoRA adapter

Includes resized language-token embeddings, so the proof adapter is large.

adapter | safetensors | Local

Training manifest

metadata | json | Local

Evaluation report

evaluation | json | Local

Budget-safe proof run on the high-confidence corpus, not a production translator.
A40 utilization was healthy: 100% max GPU, 26.5 GiB max VRAM, and 295 W max power draw.
Use this release to test the translate/v2 model-lab path and inference server wiring.

Version 0.1.0-smoke

internal proof

Base model: facebook/nllb-200-distilled-600M
Dataset: kuku_yalanji_ebible_parallel_smoke_v0.1.0
Directions: eng-gvn
Release date: 2026-06-30

Test in v2

Kuku Yalanji eBible snapshot rights granted for MobTranslate model training by project owner attestation on 2026-06-30. Smoke artifacts are only a pipeline proof.

train loss: 6.537
validation loss: 6.013
validation bleu: 0.1085
validation chrf: 4.913
test loss: 6.416
test bleu: 0.0540
test chrf: 5.485
standalone bleu: 0.0552

Artifact | Kind | Format | Download

Merged model

Local smoke artifact.

model | safetensors | Local

LoRA adapter

adapter | safetensors | Local

Training manifest

metadata | json | Local

Evaluation report

evaluation | json | Local

First end-to-end RunPod proof: dataset upload, CUDA validation, LoRA training, merge, eval, artifact pullback.
Not useful as a translator; kept for regression testing the model project.

Version 0.1.0-baseline

training ready

Base model: facebook/nllb-200-distilled-600M
Dataset: kuku_yalanji_ebible_parallel_v0.1.0
Directions: eng-gvn
Release date: 2026-06-30

Awaiting training run

Kuku Yalanji eBible snapshot rights granted for MobTranslate model training by project owner attestation on 2026-06-30. Base model license and final release terms still apply to trained artifacts.

Artifact | Kind | Format | Download

Model card

documentation | markdown | Get

Merged model

Published after the full baseline passes evaluation and review.

model | safetensors | Pending

LoRA adapter

Published alongside the merged model.

adapter | safetensors | Pending

Training manifest

metadata | json | Pending

Evaluation report

evaluation | json | Pending

Full baseline is intentionally blocked until the model project, download registry, and translate/v2 test bench are working.
Planned first publishable candidate: high-confidence rows only, English to Kuku Yalanji, 8 epochs, A40/RTX A6000 first.

Download MobTranslate language models

Rights visible

Versioned artifacts

Dataset trace


Mi'gmaq Translation

Version 1.0.0-rc1