motqan ai

Multi-Provider Translation Engine

End-to-end translation for documents, data, and localization workflows.

The Problem

CI Pipelines Lack Options

Standard APIs hand back a blob. No batching control, no cost preview, and no way to protect inline tags or code identifiers.

Layout-Busting Copy-Paste

Translating formatted Word or PPTX files means spending hours fixing layouts after text is pasted back.

True RTL Is Ignored

Arabic & Hebrew require layout mirroring, digit shaping, and font changes. Text replacement alone breaks the file.

Inconsistent Terminology

A legal term translated three different ways across a forty-page contract isn't a style issue. It's a liability.

A single translation core that treats real-world content and complex formats as first-class inputs.

Predictable. Scalable. Format-Aware.

System Architecture

FastAPI Layer

Translate API
Documents Pipeline
Async Jobs

➔

Translation Orchestrator

Word-Count Batching
Cache Lookup
Two-Pass Terminology
Quality Scoring

➔

AI Providers

Gemini
OpenAI
Anthropic
DeepSeek

The Request Lifecycle

1. Intake & Normalize

2. Word-Count Batching

3. Cache Lookup

4. Concurrent Dispatch

5. Reassembly

Results are reassembled in their original order, merging cache hits with fresh translations and flagging any failures.
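The five steps can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: `translate_segments`, `Result`, the plain-dict cache, and the `fake_translate` provider stand-in are all hypothetical names, and fixed-size batches stand in for word-count batching to keep the example short.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Result:
    index: int                    # position in the original input
    source: str
    translation: Optional[str]
    from_cache: bool = False
    failed: bool = False


async def fake_translate(texts: list[str]) -> list[str]:
    """Stand-in for a provider call (assumption): uppercases, fails on 'boom'."""
    if any("boom" in t for t in texts):
        raise RuntimeError("provider error")
    return [t.upper() for t in texts]


async def translate_segments(
    segments: list[str],
    cache: dict[str, str],
    translate_batch: Callable[[list[str]], Awaitable[list[str]]],
    batch_size: int = 2,
) -> list[Result]:
    # 1. Intake & normalize: whitespace trimming stands in for real normalization.
    indexed = [(i, s.strip()) for i, s in enumerate(segments)]

    # 2. Batching: fixed-size here; the real engine sizes batches by word count.
    batches = [indexed[i:i + batch_size] for i in range(0, len(indexed), batch_size)]

    async def run(batch: list[tuple[int, str]]) -> list[Result]:
        # 3. Cache lookup: only misses are sent to the provider.
        hits = [Result(i, t, cache[t], from_cache=True) for i, t in batch if t in cache]
        misses = [(i, t) for i, t in batch if t not in cache]
        if not misses:
            return hits
        try:
            outputs = await translate_batch([t for _, t in misses])
            fresh = [Result(i, t, out) for (i, t), out in zip(misses, outputs)]
        except Exception:
            # A failed batch is flagged rather than aborting the whole job.
            fresh = [Result(i, t, None, failed=True) for i, t in misses]
        return hits + fresh

    # 4. Concurrent dispatch of independent batches.
    per_batch = await asyncio.gather(*(run(b) for b in batches))

    # 5. Reassembly: restore the original segment order.
    flat = [r for group in per_batch for r in group]
    return sorted(flat, key=lambda r: r.index)
```

Carrying the original index through every step is what makes reassembly a simple sort, regardless of which batches finish first.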

Performance & Predictability

20

Parallel Batches

✓ Word-Count Batching: Sizes batches by a strict word budget, so the cost of each API call is predictable.

✓ Segment Atomicity: A sentence is never split across batches.

✓ Concurrency: Independent batches run under an asyncio semaphore, cutting job latency.
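A minimal sketch of how these three properties could fit together, assuming whitespace-delimited word counts and a greedy packing strategy. The function names `batch_by_word_budget` and `dispatch` are hypothetical; the default of 20 concurrent batches mirrors the figure on this slide.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def batch_by_word_budget(segments: list[str], max_words: int) -> list[list[str]]:
    """Greedily pack whole segments into batches of at most max_words words.

    A segment is never split. One that exceeds the budget on its own
    gets a batch to itself.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for seg in segments:
        n = len(seg.split())  # assumption: words are whitespace-delimited
        if current and used + n > max_words:
            batches.append(current)
            current, used = [], 0
        current.append(seg)
        used += n
    if current:
        batches.append(current)
    return batches


async def dispatch(
    batches: list[list[str]],
    translate_batch: Callable[[list[str]], Awaitable[T]],
    max_parallel: int = 20,
) -> list[T]:
    """Run batches concurrently, with at most max_parallel in flight."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(batch: list[str]) -> T:
        async with sem:
            return await translate_batch(batch)

    # gather returns results in the same order as the input batches.
    return await asyncio.gather(*(guarded(b) for b in batches))
```

Because every batch stays within the same word budget, the number of requests for a job is known once batching finishes, before any provider call is made.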

Multi-Provider Setup

26

AI Models Supported

Across 4 vendors (Gemini, OpenAI, Anthropic, DeepSeek). A rule-based engine recommends a model based on language, priority, and domain, and falls back gracefully when an API key is missing.
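One way such a recommendation with fallback might look is sketched below. Everything specific here is an assumption made for illustration: the environment variable names, the rules themselves, and the vendor orderings are invented, and the sketch picks a vendor rather than one of the 26 models, which the source does not list.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable names, one per vendor.
KEY_ENV = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def preference(target_lang: str, priority: str, domain: str) -> list[str]:
    """Return vendors in preferred order. The rules are illustrative only."""
    if domain == "legal":
        return ["anthropic", "openai", "gemini", "deepseek"]
    if priority == "cost":
        return ["deepseek", "gemini", "openai", "anthropic"]
    if target_lang in {"ar", "he", "fa", "ur"}:
        return ["gemini", "openai", "anthropic", "deepseek"]
    return ["openai", "gemini", "anthropic", "deepseek"]


def recommend(
    target_lang: str,
    priority: str,
    domain: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick the first preferred vendor whose API key is configured."""
    env = os.environ if env is None else env
    for vendor in preference(target_lang, priority, domain):
        if env.get(KEY_ENV[vendor]):
            return vendor
    return None  # no vendor is configured at all
```

Returning an ordered preference list, rather than a single choice, is what makes the fallback trivial: a missing key just moves the selection to the next entry.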

Format Breadth

23

File Formats Supported

Office: DOCX, PPTX, XLSX
Structured: JSON, YAML, PO, SRT
XML-Family: HTML, SVG, .strings
CAT Tools: XLIFF, TMX, TBX

Two-Pass Terminology Lock

Guarantees that critical legal or technical terms are translated identically across the entire document, even in parallel batches.

PASS 1

Extract Glossary

The engine scans the entire document to identify and translate key recurring terms.

➔

PASS 2

Inject & Translate

The extracted glossary is injected into the prompt of every parallel batch, so all batches use the same translation for each term.
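The two passes could be sketched as below. This is a simplified illustration under stated assumptions: the term detector is a crude heuristic (recurring capitalized phrases) standing in for whatever extraction the engine really uses, `translate_term` is a hypothetical callable that would wrap a provider call, and the prompt wording is invented.

```python
import re
from collections import Counter
from typing import Callable

# Heuristic (assumption): two or more consecutive capitalized words form a term.
_TERM = re.compile(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b")


def extract_recurring_terms(segments: list[str], min_count: int = 2) -> list[str]:
    """Pass 1a: find candidate terms that recur across the whole document."""
    counts = Counter(m for seg in segments for m in _TERM.findall(seg))
    return sorted(term for term, c in counts.items() if c >= min_count)


def build_glossary(
    segments: list[str],
    translate_term: Callable[[str], str],
    min_count: int = 2,
) -> dict[str, str]:
    """Pass 1b: translate each term once, before any batch is dispatched."""
    return {t: translate_term(t) for t in extract_recurring_terms(segments, min_count)}


def build_prompt(batch: list[str], glossary: dict[str, str], target_lang: str) -> str:
    """Pass 2: inject the glossary entries relevant to this batch into its prompt."""
    relevant = {s: t for s, t in glossary.items() if any(s in seg for seg in batch)}
    lines = [f"Translate the numbered segments into {target_lang}."]
    if relevant:
        lines.append("Use exactly these term translations:")
        lines += [f"- {src} => {tgt}" for src, tgt in relevant.items()]
    lines += [f"{i}. {seg}" for i, seg in enumerate(batch, 1)]
    return "\n".join(lines)
```

Because the glossary is fixed before dispatch, batches that run in parallel never need to coordinate with each other to agree on a term.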

RTL Transformation
Beyond just text

Arabic, Hebrew, Persian, and Urdu break standard translation pipelines. Text replacement isn't enough.

✓ Document & reading direction flipped.
✓ Arabic-Indic digit shaping applied.
✓ Table layouts completely mirrored.
✓ 28 smart source-to-target font mappings.

Font Mapper Active

Source (Calibri)

The agreement is signed on 12 May.

Target (Noto Kufi Arabic + Digit Shaping)

تم توقيع الاتفاقية في ١٢ مايو.
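Digit shaping and font mapping, as in the example above, can be sketched in a few lines. The per-language digit policy is an assumption (Arabic-Indic digits for Arabic, Extended Arabic-Indic for Persian and Urdu, Western digits left alone for Hebrew), and only the one font pair shown on the slide is included, since the other mappings are not listed in the source. All function names are hypothetical.

```python
# Digit tables are built from code points to avoid copy-paste errors.
ARABIC_INDIC = "".join(chr(0x0660 + d) for d in range(10))           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + d) for d in range(10))  # U+06F0..U+06F9

# Assumed policy: which digit set each target language receives.
DIGITS_BY_LANG = {
    "ar": ARABIC_INDIC,
    "fa": EXTENDED_ARABIC_INDIC,
    "ur": EXTENDED_ARABIC_INDIC,
}

RTL_LANGS = {"ar", "he", "fa", "ur"}

# The single mapping shown on the slide; the remaining ones are not in the source.
FONT_MAP = {("Calibri", "ar"): "Noto Kufi Arabic"}


def is_rtl(lang: str) -> bool:
    """True when the document direction should be flipped for this language."""
    return lang in RTL_LANGS


def shape_digits(text: str, lang: str) -> str:
    """Replace ASCII digits with the target language's digit set, if it has one."""
    table = DIGITS_BY_LANG.get(lang)
    if table is None:
        return text
    return text.translate(str.maketrans("0123456789", table))


def map_font(source_font: str, lang: str) -> str:
    """Look up a target font, keeping the source font when no mapping exists."""
    return FONT_MAP.get((source_font, lang), source_font)
```

With this policy, `shape_digits("12", "ar")` yields the same two digits that appear in the Arabic target sentence above.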

Engine at Scale

10,000

Max segments batched per single API request.

30-Day

Per-segment cache TTL, eliminating costs for boilerplate text.
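A per-segment cache with a 30-day TTL might look like the sketch below. The key layout (language pair, model, and text hashed together) and the in-memory store are assumptions for illustration; `cache_key` and `SegmentCache` are hypothetical names, and a production deployment would more likely sit on an external store.

```python
import hashlib
import time
from datetime import timedelta
from typing import Callable, Optional

CACHE_TTL = timedelta(days=30)


def cache_key(text: str, source_lang: str, target_lang: str, model: str) -> str:
    """Derive a stable key; the unit separator keeps the fields from colliding."""
    payload = "\x1f".join([source_lang, target_lang, model, text])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SegmentCache:
    """In-memory stand-in for the real cache, with expiry checked on read."""

    def __init__(self, ttl: timedelta = CACHE_TTL,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl.total_seconds()
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._store: dict[str, tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (value, self._clock() + self._ttl)
```

Keying on the segment rather than the document is what lets boilerplate repeated across many files be translated once and served from cache afterwards.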

5 Retries

Resilient HMAC-signed webhook callbacks for async jobs.
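Signing and verifying a webhook body with HMAC could be sketched as follows. The choice of SHA-256, the hex encoding, and the canonical JSON serialization are assumptions, since the source only states that callbacks are HMAC-signed; `serialize`, `sign`, and `verify` are hypothetical names.

```python
import hashlib
import hmac
import json


def serialize(payload: dict) -> bytes:
    """Serialize deterministically so sender and receiver sign identical bytes."""
    return json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")


def sign(secret: bytes, body: bytes) -> str:
    """Sender side: compute the signature that accompanies the callback."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver side: recompute and compare in constant time."""
    return hmac.compare_digest(sign(secret, body), signature)
```

The receiver should verify against the raw bytes it received, not a re-serialized copy, because any difference in key order or whitespace changes the signature.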

3

Bounded API retries with backoff per segment failure.
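Bounded retries with backoff can be sketched as a small async helper. The exponential schedule, the jitter, and the reading of "3" as three retries after the first attempt are assumptions; `with_retries` is a hypothetical name.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    op: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call op, retrying up to max_retries times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return await op()
        except Exception:
            if attempt == max_retries:
                raise  # budget exhausted: surface the last error to the caller
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from batches that failed together.
            await sleep(delay + random.uniform(0, delay / 2))
            attempt += 1
```

Bounding the retries keeps one persistently failing segment from stalling the job; once the budget is spent, the error propagates and the segment can be flagged as failed.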

313

Automated tests covering the engine's edge cases.

24,000+

Lines of Python behind the engine.

What's Next

1. PDF Translation: End-to-end translation with full layout preservation.

2. Adobe InDesign (.idml): Direct support for publishing and design workflows.

3. Human-in-the-Loop UI: Reviewer workflow layered on top of back-translation quality scoring.

4. ZIP-Archive Batch Input: Submit and translate entire folders of files in a single upload.