
Quality translations with Azure Translator Service

Hack of the week

Translating a document can be a lot of work, especially because its structure is usually not carried over automatically and has to be rebuilt by hand once the text is translated. Azure Translator Service helps with that, and in this hack, I will show you how.

 

By Thomas Hafermalz

Azure uses AI in many beneficial ways. One example is the Azure Cognitive Services family, which combines services such as speech translation, speaker recognition, sentiment analysis and many more. Perhaps the most broadly useful of these is the Azure Translator Service: it offers translation in over 100 languages and dialects, lets you work with domain-specific terminology, and does not log your texts during translation.

What is especially appealing: the original document structure remains intact during translation. You can also translate several documents at once.

The service translates

  • Office documents (Excel, Outlook, Word, PowerPoint),
  • PDF files,
  • HTML,

and many other document types.

Noob Hack

Get the translator up and running in three steps:


1. Create a Translator Service.

First, create a Translator Service as a separate resource; it is not part of the Cognitive Services resource.
Make sure to choose the S1 tier to be able to translate documents, and note the name you choose. Replacing <mycustomendpoint> with that name gives you the custom domain endpoint needed for the translation: https://<mycustomendpoint>.cognitiveservices.azure.com/translator/text/batch/v1.0-preview.1





Tip: Make a note of the subscription key.
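The endpoint construction from step 1 can be sketched in C# roughly as follows. The names `TranslatorEndpoint` and `BuildBatchEndpoint` are hypothetical and only serve this illustration; the URL pattern itself is the one given above.

```csharp
using System;

public static class TranslatorEndpoint
{
    // URL pattern from step 1; {0} stands for <mycustomendpoint>.
    private const string Template =
        "https://{0}.cognitiveservices.azure.com/translator/text/batch/v1.0-preview.1";

    // Hypothetical helper: inserts the resource name into the URL pattern.
    public static string BuildBatchEndpoint(string resourceName)
    {
        if (string.IsNullOrWhiteSpace(resourceName))
            throw new ArgumentException("Resource name must not be empty.", nameof(resourceName));

        return string.Format(Template, resourceName.Trim());
    }
}
```

Calling `TranslatorEndpoint.BuildBatchEndpoint("myname")` with the name of your own resource yields the string to use as the endpoint in the client app.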


2. Create a Storage Account.

Second, set up a Storage Account (general-purpose v2) with two blob containers: one for the source files and one for the translated target files.

To access the files, create two Shared Access Signatures (SASs):

- One for the source container (needs read and list permissions)

- Another for the target container (needs write and list permissions)


Note the generated signatures.
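A wrong permission set on a SAS is an easy mistake to make, so it can be worth checking the generated URLs before using them. The sketch below reads the `sp` (signed permissions) query parameter of a SAS URL and checks it for the required letters (`r` = read, `w` = write, `l` = list). `SasCheck`, `GetPermissions` and `HasPermissions` are hypothetical names for this illustration, and the URL in the usage note is a made-up placeholder, not a real signature.

```csharp
using System;
using System.Linq;

public static class SasCheck
{
    // Returns the value of the "sp" (signed permissions) query parameter,
    // or an empty string if the URL does not contain one.
    public static string GetPermissions(string sasUrl)
    {
        string query = new Uri(sasUrl).Query.TrimStart('?');
        foreach (string pair in query.Split('&', StringSplitOptions.RemoveEmptyEntries))
        {
            string[] parts = pair.Split('=', 2);
            if (parts.Length == 2 && parts[0] == "sp")
                return Uri.UnescapeDataString(parts[1]);
        }
        return "";
    }

    // True if every required permission letter appears in the SAS.
    public static bool HasPermissions(string sasUrl, string required)
    {
        string granted = GetPermissions(sasUrl);
        return required.All(p => granted.Contains(p));
    }
}
```

For example, `SasCheck.HasPermissions("https://example.blob.core.windows.net/source?sv=2020-08-04&sp=rl&sig=abc", "rl")` tells you whether a source SAS carries read and list permissions; for the target container you would check for `"wl"` instead.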





3. Create a client app for the HTTP POST.

Third, create a client app that sends the HTTP POST request which starts the translation job. I am using a .NET Core console app here, but Postman or any other tool of your choice works as well.

Now enter the correct (custom) endpoint, the subscription key and both SAS URLs in the code; the SAS URLs go into the JSON string. The example translates from German to English.

Starting the programme translates every document in the source blob container and stores the result in the target blob container.


Program.cs:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class Program
{
    // Route appended to the endpoint to submit a batch translation job.
    private static readonly string route = "/batches";
    // Replace <mycustomendpoint> with the name of your Translator Service.
    private static readonly string endpoint = "https://<mycustomendpoint>.cognitiveservices.azure.com/translator/text/batch/v1.0-preview.1";
    // Replace key1 with the subscription key of your Translator Service.
    private static readonly string subscriptionKey = "key1";

    // Replace SAS-source and SAS-target with the two SAS URLs from step 2.
    private static readonly string json = "{\"inputs\": [{\"source\": {\"sourceUrl\": \"SAS-source\",\"storageSource\": \"AzureBlob\",\"language\": \"de\" }, \"targets\": [{\"targetUrl\": \"SAS-target\",\"storageSource\": \"AzureBlob\",\"category\": \"general\",\"language\": \"en\"}]}]}";

    static async Task Main(string[] args)
    {
        using HttpClient client = new HttpClient();
        using HttpRequestMessage request = new HttpRequestMessage();

        request.Method = HttpMethod.Post;
        request.RequestUri = new Uri(endpoint + route);
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        request.Content = new StringContent(json, Encoding.UTF8, "application/json");

        HttpResponseMessage response = await client.SendAsync(request);
        // Await the body instead of blocking on .Result.
        string result = await response.Content.ReadAsStringAsync();

        if (response.IsSuccessStatusCode)
        {
            Console.WriteLine($"Status code: {response.StatusCode}");
            Console.WriteLine();
            Console.WriteLine("Response Headers:");
            Console.WriteLine(response.Headers);
        }
        else
        {
            // Print the status and the response body so the failure can be diagnosed.
            Console.WriteLine($"Error: {response.StatusCode}");
            Console.WriteLine(result);
        }
    }
}
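The hand-escaped JSON string in Program.cs is easy to break when you paste in long SAS URLs. As an alternative, the request body could be built from an object and serialized with System.Text.Json (available from .NET Core 3.0). This is only a sketch: `BatchRequest` and `Build` are hypothetical names, while the property names and values mirror the JSON string used above.

```csharp
using System.Text.Json;

public static class BatchRequest
{
    // Builds the same body as the JSON string in Program.cs,
    // with the SAS URLs and language codes passed in as parameters.
    public static string Build(string sourceSasUrl, string targetSasUrl,
                               string sourceLanguage, string targetLanguage)
    {
        var body = new
        {
            inputs = new[]
            {
                new
                {
                    source = new
                    {
                        sourceUrl = sourceSasUrl,
                        storageSource = "AzureBlob",
                        language = sourceLanguage
                    },
                    targets = new[]
                    {
                        new
                        {
                            targetUrl = targetSasUrl,
                            storageSource = "AzureBlob",
                            category = "general",
                            language = targetLanguage
                        }
                    }
                }
            }
        };

        // The serializer takes care of quoting and escaping.
        return JsonSerializer.Serialize(body);
    }
}
```

In Program.cs, the `json` field would then be replaced by a call such as `BatchRequest.Build(sourceSas, targetSas, "de", "en")`, where `sourceSas` and `targetSas` hold the two SAS URLs from step 2. By default the serializer writes characters such as `&` as `\u0026`; that is still valid JSON and is decoded back to the original URL when the body is parsed.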

Pro Hack

Microsoft Translator uses Neural Machine Translation (NMT), which takes the context of full sentences into account when translating them and thus provides higher-quality translations.


To raise the quality further, you can train the translator: use previously translated documents that contain your domain-specific terminology and style to build a custom translation system. Given the same content in two or more languages, Custom Translator automatically matches sentences across documents.


Train the translator in three steps to get better translations:


1. Select the project where you want to build a model.


2. Manually select the documents you want to use for the training.


In the Data tab for the project, you can select training, tuning, and testing documents and see all the relevant information regarding the documents:

- Document name

- Pairing: Shows whether this is a parallel or monolingual document. (Note: Monolingual documents cannot be used for training yet.)

- Document type: Either training, tuning, testing, or dictionary.
Tip: When you select documents of the "Training" type for the training, a minimum of 10,000 parallel sentences is required. This minimum does not apply to other document types, e.g. "Dictionary" documents.

- Language pair: Shows source and target language.

- Source sentences: Shows the number of sentences extracted from the source file.

- Target sentences: Shows the number of sentences extracted from the target file.
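Before starting a training run, the 10,000-sentence minimum can be checked against the figures shown in the Data tab. The sketch below is not part of Custom Translator; `DocumentInfo` and `TrainingCheck` are hypothetical types that simply mirror the columns listed above. It assumes that the minimum applies to the sum over all selected "Training" documents and that the number of parallel sentences in a document is the smaller of its source and target sentence counts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type mirroring the columns of the Data tab.
public class DocumentInfo
{
    public string Name { get; set; } = "";
    public string DocumentType { get; set; } = "";   // "Training", "Tuning", "Testing" or "Dictionary"
    public int SourceSentences { get; set; }
    public int TargetSentences { get; set; }
}

public static class TrainingCheck
{
    public const int MinimumParallelSentences = 10000;

    // Assumption: a document contributes min(source, target) parallel sentences,
    // and only "Training" documents count towards the minimum.
    public static int CountTrainingSentences(IEnumerable<DocumentInfo> documents) =>
        documents
            .Where(d => d.DocumentType == "Training")
            .Sum(d => Math.Min(d.SourceSentences, d.TargetSentences));

    public static bool MeetsMinimum(IEnumerable<DocumentInfo> documents) =>
        CountTrainingSentences(documents) >= MinimumParallelSentences;
}
```

With two training documents of 6,000 and 4,200 matched sentences plus a dictionary, the count is 10,200 and the check passes; the dictionary is ignored because the minimum does not apply to it.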



Source: Microsoft


3. Create your model (and start the training).

Click the "Create model" button. Then give your model a name and choose either "Train immediately" to start the training right away or "Save as draft" to create the model metadata without starting the training yet. Then click "Create model" again.



Source: Microsoft


Good to know: You can check the status of the training in the Models tab. The BLEU score indicates how closely the model's translations match the reference translations, and thus how much the documents you chose for the training improve the translation quality.




Source: Microsoft


Have fun hacking!
