From Bulk Translation to Terminology Consistency: Using DeepL Glossaries with MCP Server Composition

Background

A few months ago, I shared my automated translation pipeline that translated 180 pages of technical documentation using the DeepL MCP Server. That project proved DeepL + MCP can handle volume — 1.5 million characters translated in a day.

This time, I tackled the next challenge: terminology consistency.

The Problem

My work on electronic signature software requires frequent reference to the PDF specification (ISO 32000-2) — a 1,020-page standards document. When translating specification sections into Japanese without a glossary, DeepL produces inconsistent terminology:

| Source term | Without glossary | Expected |
|---|---|---|
| null object | NULLオブジェクト / Nullオブジェクト | nullオブジェクト |
| entries | 項目 / エントリ / エントリー | エントリー |
| indirect object | 間接的なオブジェクト / 間接オブジェクト | 間接オブジェクト |

In a standards document, this kind of inconsistency is a real problem. null is a specific PDF keyword (lowercase) — translating it as NULL or Null is technically incorrect.

The Solution: MCP Server Composition

I built pdf-spec-mcp — an MCP server that gives LLMs structured access to PDF specification documents. It's designed for querying specs during development — navigating sections, searching requirements, looking up definitions, and comparing versions between PDF 1.7 and 2.0.

While using it, I realized the get_definitions tool (which extracts all 71 defined terms from the spec) could also serve as the foundation for a translation glossary. Combining it with the DeepL MCP Server created an unexpected workflow:

  1. pdf-spec-mcp (get_definitions) → 71 terms extracted from ISO 32000-2

  2. Classify + format → 56-entry EN→JA glossary (TSV)

  3. Register glossary via DeepL API

  4. DeepL MCP (translate-text + glossaryId) → Consistent Japanese translation
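
As a rough illustration of step 4, here is what an MCP `tools/call` request for translate-text might look like when built in Python. The JSON-RPC envelope follows the MCP convention and `glossaryId` is the parameter named above. The other argument names (`text`, `sourceLangCode`, `targetLangCode`) are assumptions for illustration, so check the DeepL MCP Server's tool schema before relying on them.

```python
import json


def build_translate_call(text: str, glossary_id: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run translate-text."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "translate-text",
            "arguments": {
                "text": text,                # assumed argument name
                "sourceLangCode": "en",      # assumed argument name
                "targetLangCode": "ja",      # assumed argument name
                "glossaryId": glossary_id,   # parameter named in this post
            },
        },
    }


if __name__ == "__main__":
    # "your-glossary-id" is a placeholder for the ID returned at registration.
    call = build_translate_call("An indirect object may be a null object.", "your-glossary-id")
    print(json.dumps(call, ensure_ascii=False, indent=2))
```

In practice an MCP client (the LLM host) sends this request for you; the sketch only makes the shape of the call visible.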

Results

Real comparison from Section 7.3.7 "Dictionary objects":

Without glossary:

値がNULLである辞書項目(7.3.9「Nullオブジェクト」参照)は、その項目が存在しないのと同じように扱われるものとする。

With glossary:

値がnullである辞書項目(7.3.9「nullオブジェクト」参照)は、その項目が存在しないのと同じように扱われるものとする。

And something I didn't expect: in this comparison, the run with the glossary also avoided a sentence omission. Without the glossary, DeepL dropped an entire sentence from one paragraph; with it, the full text was translated.

| Aspect | Without Glossary | With Glossary |
|---|---|---|
| null keyword | NULL / Null (inconsistent) | null (correct PDF keyword) |
| "entries" | 項目 (generic) | エントリー (domain term) |
| Sentence omission | One sentence dropped | Fully translated |
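
A comparison like this can be spot-checked mechanically. The sketch below is an assumption about how one might do it, not part of the published example: it scans translated text for the unwanted variants listed in this post and reports which expected term each one should have been. The `DISALLOWED` mapping and the `find_variants` helper are hypothetical names.

```python
import re

# Expected term -> regex patterns for variants that should not appear.
# "エントリ(?!ー)" matches エントリ only when it is NOT followed by ー,
# so the correct form エントリー is not flagged.
DISALLOWED = {
    "nullオブジェクト": [r"NULLオブジェクト", r"Nullオブジェクト"],
    "エントリー": [r"エントリ(?!ー)"],
    "間接オブジェクト": [r"間接的なオブジェクト"],
}


def find_variants(text: str) -> list[tuple[str, str]]:
    """Return (found_variant, expected_term) pairs for every unwanted variant."""
    hits = []
    for expected, patterns in DISALLOWED.items():
        for pattern in patterns:
            for match in re.finditer(pattern, text):
                hits.append((match.group(0), expected))
    return hits


if __name__ == "__main__":
    sample = "7.3.9「Nullオブジェクト」参照"
    for found, expected in find_variants(sample):
        print(f"found {found!r}, expected {expected!r}")
```

A generic word such as 項目 is deliberately left out of the patterns, because it can be a legitimate translation elsewhere and would need human review rather than a string match.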

What I Learned

Building the glossary requires domain expertise, not just AI.

pdf-spec-mcp can extract all 71 defined terms automatically. But deciding how to translate them requires someone who works with the specification:

  • null object → nullオブジェクト (not NULLオブジェクト) — because PDF's null is a lowercase keyword

  • deprecated → 非推奨 (not katakana デプリケーテッド) — because native Japanese is clearer for developers

  • FDF file → FDFファイル — because Japanese developers read this more naturally

AI extracts the terms. Domain knowledge decides the translations. DeepL enforces them consistently. Each piece is necessary.
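
The division of labor above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the script from the example: `extracted_terms` stands in for whatever get_definitions returns, `curated` is the hand-written mapping, and `build_glossary_tsv` is a hypothetical helper. Terms without a curated translation are skipped and reported rather than guessed.

```python
def build_glossary_tsv(extracted_terms: list[str], curated: dict[str, str]) -> tuple[str, list[str]]:
    """Return (tsv_text, skipped_terms) for an EN->JA glossary."""
    rows, skipped, seen = [], [], set()
    for term in extracted_terms:
        source = term.strip()
        if not source or source in seen:
            continue  # ignore blanks and duplicate source terms
        seen.add(source)
        target = curated.get(source)
        if target is None:
            skipped.append(source)  # needs a human decision, so leave it out
            continue
        if any(ch in source + target for ch in "\t\n\r"):
            raise ValueError(f"tab or newline inside entry: {source!r}")
        rows.append(f"{source}\t{target}")
    return "\n".join(rows), skipped


if __name__ == "__main__":
    # Translations taken from this post; the last term is a placeholder.
    curated = {
        "null object": "nullオブジェクト",
        "deprecated": "非推奨",
        "FDF file": "FDFファイル",
    }
    terms = ["null object", "deprecated", "FDF file", "term without a translation yet"]
    tsv, skipped = build_glossary_tsv(terms, curated)
    print(tsv)
    print("skipped:", skipped)
```

Keeping the curated mapping as plain data makes the domain decisions reviewable on their own, separate from the extraction and registration steps.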

One Friction Point

The DeepL MCP Server supports glossary usage (glossaryId in translate-text) and reading (list-glossaries, get-glossary-info, get-glossary-dictionary-entries). But glossary creation requires leaving the MCP ecosystem: I had to use a shell script calling POST /v2/glossaries directly.

If create-glossary were available as an MCP tool, the entire workflow (extract terms → build glossary → translate) could stay within MCP. I've included a workaround script in the example for now.
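
For readers who prefer Python to shell, the same registration step can be sketched as follows. This is not the workaround script from the example, only a minimal illustration of a POST /v2/glossaries call. It assumes the usual DeepL conventions: Free-plan keys end in ":fx" and use the api-free.deepl.com host, and the request carries name, source_lang, target_lang, entries, and entries_format. The function names are hypothetical.

```python
import json
import urllib.request


def deepl_base_url(api_key: str) -> str:
    """Pick the API host: Free-plan keys end with ':fx'."""
    return "https://api-free.deepl.com" if api_key.endswith(":fx") else "https://api.deepl.com"


def build_glossary_request(api_key: str, name: str, tsv_entries: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v2/glossaries request."""
    payload = {
        "name": name,
        "source_lang": "en",
        "target_lang": "ja",
        "entries": tsv_entries,
        "entries_format": "tsv",
    }
    return urllib.request.Request(
        deepl_base_url(api_key) + "/v2/glossaries",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"DeepL-Auth-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def register_glossary(api_key: str, name: str, tsv_entries: str) -> str:
    """Send the request and return the new glossary ID from the response."""
    request = build_glossary_request(api_key, name, tsv_entries)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["glossary_id"]
```

Calling `register_glossary(key, "pdf-spec-en-ja", tsv_text)` would return the ID to pass as glossaryId in translate-text. Splitting request construction from sending keeps the Free/Pro detection testable without network access.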

Try It Yourself

The full workflow is documented as a reusable example with:

  • A 56-entry EN→JA glossary for PDF specification terms (TSV)

  • A registration script that auto-detects Free/Pro API

  • Before/after translation comparison

  • Step-by-step instructions

👉 pdf-spec-mcp/examples/translation-glossary

This approach works for any domain: API documentation, legal standards, medical specifications — anywhere consistent terminology matters.
