Structured XML content
Learn how to work with structured XML content with the DeepL API.
When enabling tag_handling by setting it to xml, the DeepL API is able to process structured XML content. This includes whole XML files, as the following example shows.
Please note that for XML structured content in classic models, we set the parameter split_sentences to nonewlines to make sure that newlines do not alter the results. The response is beautified for better readability.
Please note that for next-gen models, the parameter split_sentences passed by the user is ignored and a value of nonewlines is used for maximum translation quality.
Parameters and corresponding results:
Before sentences are translated, the XML file is parsed, and tags containing textual content other than white space are identified, in order to reproduce the XML structure in the translation.
In the example above, the title and the two par tags are found to contain text. These tags are considered sentence splitters. Therefore, each of the following three texts is treated separately:
A document's title
This is the first sentence. Followed by a second one.
This is the third sentence.
The second text is further split, as it contains two separate sentences. Each sentence is then translated separately and tags within the sentences (used here for formatting) are applied to the corresponding words in the translation.
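The identification step described above can be approximated locally. The following is a minimal sketch, not DeepL's actual parser: it uses Python's standard xml.etree.ElementTree module, and the surrounding document, meta, and content elements as well as the function name text_bearing_elements are assumptions made for illustration. Only the title and par texts are taken from the three texts listed above.

```python
import xml.etree.ElementTree as ET

# Assumed sample file: the document/meta/content wrappers are hypothetical;
# the title and par texts match the three texts listed in this section.
SAMPLE = """<document>
  <meta>
    <title>A document's title</title>
  </meta>
  <content>
    <par>This is the first sentence. Followed by a second one.</par>
    <par>This is the third sentence.</par>
  </content>
</document>"""


def text_bearing_elements(xml_source):
    """Return (tag, text) pairs for elements whose own text is not just white space."""
    root = ET.fromstring(xml_source)
    found = []
    for element in root.iter():  # document order, starting with the root
        if element.text and element.text.strip():
            found.append((element.tag, element.text.strip()))
    return found


for tag, text in text_bearing_elements(SAMPLE):
    print(f"{tag}: {text}")
```

Wrapper elements such as document, meta, and content hold only white space between their children, so they are skipped, which leaves the title and the two par elements.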
Splitting on New Lines
Please note that newlines will split sentences. You should therefore clean files to avoid breaking sentences or set the parameter split_sentences=nonewlines.
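As a rough illustration of the second option, the sketch below builds a JSON request body with tag_handling and split_sentences set as described. It assumes the v2 translate endpoint of a Pro account and uses only the Python standard library; the function names build_xml_request and translate and the sample sentence are hypothetical and not part of the DeepL API.

```python
import json
import urllib.request

# Assumption: Pro endpoint. Free accounts use https://api-free.deepl.com/v2/translate.
DEEPL_URL = "https://api.deepl.com/v2/translate"


def build_xml_request(xml_text, target_lang):
    """Build a request body that enables XML handling and does not split on newlines."""
    return {
        "text": [xml_text],
        "target_lang": target_lang,
        "tag_handling": "xml",
        "split_sentences": "nonewlines",
    }


def translate(payload, auth_key):
    """Send the payload to the API and return the decoded JSON response."""
    request = urllib.request.Request(
        DEEPL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"DeepL-Auth-Key {auth_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical input: a single sentence that a newline would otherwise break in two.
payload = build_xml_request("<par>This sentence is broken\nby a newline.</par>", "DE")
print(json.dumps(payload, indent=2))
```

Calling translate(payload, auth_key) with a valid key would send the request; it is left uncalled here so that the sketch runs without network access.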
Request
Response
Restricting Splitting
For some XML files, finding tags with textual content and splitting sentences using those tags won't yield the best translation results. The following examples show the difference in results when the engine splits sentences on par tags and proceeds to translate the parts separately, as opposed to translating the sentence as a whole.
As this can lead to bad translations, this type of structure should either be avoided, or the non_splitting_tags parameter should be set.
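A request body that sets non_splitting_tags might look like the following sketch. The helper name build_non_splitting_request and the sample sentence are hypothetical; the sample only imitates the kind of structure described above, where one sentence is spread over two par tags.

```python
import json


def build_non_splitting_request(xml_text, target_lang, non_splitting_tags):
    """Build a request body that asks the engine not to split sentences on the given tags."""
    return {
        "text": [xml_text],
        "target_lang": target_lang,
        "tag_handling": "xml",
        "non_splitting_tags": list(non_splitting_tags),
    }


# Hypothetical input: one sentence spread over two par tags.
xml_text = "<par>The committee said it had been </par><par>reviewing the proposal.</par>"
body = json.dumps(build_non_splitting_request(xml_text, "DE", ["par"]))
print(body)
```

With par listed in non_splitting_tags, the two fragments are meant to be translated as one sentence rather than as two separate parts.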
Parameters
Request
Response