How publications managers can use big data to power localisation

by Richard Foskett on 13th March 2018

Big data powers localisation

As a publications manager you will be accustomed to being pulled in multiple directions. In a global organisation that requires multiple local versions of its publications, your role becomes even more challenging.

Your task is to deliver consistently high quality whilst increasing scope, reducing cost and meeting shorter deadlines. The problem is that when scope, cost and deadlines are squeezed, quality can suffer, and yet quality is the measure your organisation uses to determine your success. Without a magic wand, what are your options?

  1. Squeeze the lemon – is there anything else you can cut?
  2. Look at technological possibilities to enhance the process, e.g. big data

If you could control quality, time and cost to strike the right balance for a specific job, would that help you achieve your measurable goals? Our client, a global industrial organisation, believes it does. Together with our alliance partners XTM International, CrossLang and WK Automotive, we have built a managed service solution called TX Pipeline that gives them, and you, that level of control, and it is available now.

What is TX Pipeline?

It is a configurable managed service pipeline of tasks, forming a process that content moves through. Different tasks can be added or configured for different content types. The individual pipelines run in the AWS cloud and are cost efficient because they scale up and down in response to demand.
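As a purely illustrative sketch of the idea, the Python below models a pipeline as an ordered, configurable list of tasks per content type. The task names, content types and registry are invented for the example and do not reflect TX Pipeline's internal implementation.

```python
# Hypothetical sketch: a pipeline is an ordered, configurable list of tasks per
# content type. Task names and content types are illustrative only.
from typing import Callable, Dict, List

Task = Callable[[dict], dict]

def metadata_extraction(doc: dict) -> dict:
    # Record a trivial piece of metadata as a stand-in for real extraction.
    doc.setdefault("metadata", {})["word_count"] = len(doc.get("body", "").split())
    return doc

def machine_translation(doc: dict) -> dict:
    doc["translated"] = True  # stand-in for a real MT call
    return doc

TASKS: Dict[str, Task] = {
    "metadata_extraction": metadata_extraction,
    "machine_translation": machine_translation,
}

# Different content types get different task sequences.
PIPELINES: Dict[str, List[str]] = {
    "service-manual": ["metadata_extraction", "machine_translation"],
    "marketing-leaflet": ["metadata_extraction"],
}

def run_pipeline(content_type: str, doc: dict) -> dict:
    """Run each configured task for the given content type, in order."""
    for task_name in PIPELINES[content_type]:
        doc = TASKS[task_name](doc)
    return doc

print(run_pipeline("service-manual", {"body": "Install the filter as shown."}))
```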

There are three major components to the solution, based on our partners' capabilities. Together they provide deduplication, automated image handling, translation memory (TM) and machine translation (MT).

TX task examples

Metadata extraction

This is executed by the Pipe Manager, which uses the extracted metadata to optimise the tasks that follow. It supports XML and other attributes, as well as enrichment, for example fetching the popularity or importance of a particular piece of content.

TX Metadata Extraction
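To make this concrete, here is a short, purely illustrative Python sketch of extracting attributes from an XML topic and enriching them with a popularity score. The element name, attributes and popularity lookup are invented for the example and are not TX Pipeline's actual API.

```python
# Illustrative only: pull attributes out of an XML document and enrich them
# with an (assumed) external popularity lookup.
import xml.etree.ElementTree as ET

SAMPLE = '<topic id="brake-check" lang="en" audience="technician">...</topic>'

def extract_metadata(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    return {"tag": root.tag, **root.attrib}

def enrich(metadata: dict) -> dict:
    # Assumption: popularity could come from web analytics or a usage database.
    popularity_lookup = {"brake-check": 0.92}  # stand-in for a real service call
    metadata["popularity"] = popularity_lookup.get(metadata.get("id"), 0.0)
    return metadata

print(enrich(extract_metadata(SAMPLE)))
```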


Conversion

The solution can convert different types of structured data into a standardised format. The formatting is separated from the content in order to optimise matching and visual editing in XTM, and there is the option to convert back to the original format.

TX Conversion
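The principle of separating formatting from content can be shown with a small, hypothetical sketch: inline tags are swapped for numbered placeholders so matching and editing see clean text, and the markup can be restored afterwards. Real conversion to a standardised interchange format is far more involved; this only illustrates the idea.

```python
# Sketch of separating formatting from content: inline tags become numbered
# placeholders, and can be restored to rebuild the original segment.
import re

def to_standard(segment: str):
    tags = []
    def stash(match):
        tags.append(match.group(0))
        return f"{{{len(tags) - 1}}}"          # e.g. <b> becomes {0}
    text = re.sub(r"</?[^>]+>", stash, segment)
    return text, tags

def to_original(text: str, tags: list) -> str:
    for i, tag in enumerate(tags):
        text = text.replace(f"{{{i}}}", tag)
    return text

text, tags = to_standard("Press the <b>red</b> button.")
print(text)                      # Press the {0}red{1} button.
print(to_original(text, tags))   # Press the <b>red</b> button.
```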

Image Processing

This enables the deduplication of identical images and captions across publications. Intelligent caption placement and collision detection are handled by image recognition algorithms, significantly reducing the cost of manual image processing.

TX Image Processing
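As a simplified illustration of deduplication (not the actual algorithm used), identical image files can be grouped by a content hash so that only one representative per group needs processing. Recognising near-identical images and placing captions intelligently requires the image recognition described above and is beyond this sketch; the file paths in the usage comment are hypothetical.

```python
# Simplified illustration: identical files hash to the same digest, so only one
# copy per group needs processing and caption handling.
import hashlib
from pathlib import Path

def deduplicate(image_paths):
    groups = {}
    for path in image_paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups.setdefault(digest, []).append(path)
    # One representative per digest; the rest are duplicates.
    return groups

# Hypothetical usage:
# groups = deduplicate(["manual_en/fig1.png", "manual_de/fig1.png"])
```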

Pipeline Logic

This capability uses rules to determine which tasks to apply to each piece of content. It can integrate with both internal and external services, including the extracted metadata. For example, the least popular content can be sent for machine translation, optionally followed by a post-edit task to review quality.

TX Pipeline Logic
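The example above could be expressed as a small rule set. The sketch below is hypothetical: the thresholds, flags and task names are invented to show the shape of the logic, not the product's real rules.

```python
# Hypothetical routing rules: less popular content goes to machine translation,
# with an optional post-edit step; the rest goes to human translation.
def route(metadata: dict) -> list:
    tasks = []
    if metadata.get("popularity", 0.0) < 0.3:
        tasks.append("machine_translation")
        if metadata.get("quality_review", False):
            tasks.append("post_edit")
    else:
        tasks.append("human_translation")
    return tasks

print(route({"popularity": 0.1, "quality_review": True}))  # ['machine_translation', 'post_edit']
print(route({"popularity": 0.8}))                          # ['human_translation']
```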

User Interface

The user is empowered through dashboards and reports that show what is where in the pipeline, and has the ability to:

TX User Interface

Big data architecture

Using the proven Data Delta method and principles, TX Pipeline is architected for flexibility and control. It is cloud-based, running on AWS with MongoDB, and auto-scales through Elastic Beanstalk and Lambda. It is API-enabled, and integration with CrossLang and XTM is built in.

For our global clients, this means we can make the solution go faster or slower and scale up or down depending on their needs at the time, giving them huge flexibility.
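To make the architecture a little more concrete, the sketch below shows how a pipeline task running as an AWS Lambda-style handler might record its status in MongoDB so that dashboards can show what is where. The connection string, database, collection and field names are assumptions for illustration, not the product's actual schema.

```python
# Minimal sketch, not the actual TX Pipeline code: a Lambda-style handler that
# records the status of each pipeline task in MongoDB for dashboard reporting.
import os
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient(os.environ.get("MONGO_URI", "mongodb://localhost:27017"))
status = client["tx_pipeline"]["task_status"]  # assumed database and collection

def handler(event, context):
    """Record that a task finished for a given document."""
    status.update_one(
        {"document_id": event["document_id"]},
        {"$set": {
            f"tasks.{event['task']}": "done",
            "updated_at": datetime.now(timezone.utc),
        }},
        upsert=True,
    )
    return {"statusCode": 200}
```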


Our client can run:

They can then also choose to post-edit if they want to review the output, or run the process faster if they wish. They are in control; TX Pipeline is flexible.

The solution is run as a fully managed service, handling all their document types and giving them full control over the speed and quality of their publications. They can also scale up and down as needed, based on defined SLAs and KPIs.

It is also cost effective, since they pay only for what they need. The service is run by a fully trained team, with no lock-in to their in-house systems, giving them peace of mind.

TX Pipeline Managed Service


This is how big data powers localisation. TX Pipeline is a big-data-enabled solution that empowers publications managers to pay the price they want to get the value they need. Value is measured by the appropriate balance of speed and quality, with them always in control of the process.

This solution was presented live at the XTM live conference in March 2018.

If you’d like to talk about how TX Pipeline could help your publications team produce high-quality, localised content with complete control over the speed and cost of the process, please contact us.