Structured XML content
Learn how to work with structured XML content with the DeepL API.
When tag_handling is set to xml, the DeepL API is able to process structured XML content. This includes whole XML files, as the following example shows.
Please note that for XML structured content, we set the parameter split_sentences to nonewlines to make sure that newlines do not alter the results. The response is beautified for better readability.
Parameters and corresponding results:
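As a minimal sketch of how such a request body might be assembled, the snippet below builds the parameters in Python without sending anything. The sample document and the helper name build_translate_payload are hypothetical; only the parameter names tag_handling and split_sentences and their values come from the text above, and the JSON layout (text as a list, target_lang as a language code) is an assumption about the translate endpoint.

```python
import json

# Hypothetical sample document, modeled on the tag names mentioned in the text.
xml_document = """<document>
  <meta>
    <title>A document's title</title>
  </meta>
  <content>
    <par>This is the first sentence. Followed by a second one.</par>
    <par>This is the third sentence.</par>
  </content>
</document>"""


def build_translate_payload(text, target_lang):
    """Return a request body with XML handling enabled (hypothetical helper)."""
    return {
        "text": [text],                   # assumed: JSON bodies take a list of texts
        "target_lang": target_lang,       # e.g. "DE" for German
        "tag_handling": "xml",            # parse the input as XML
        "split_sentences": "nonewlines",  # do not split sentences on newlines
    }


payload = build_translate_payload(xml_document, "DE")
body = json.dumps(payload)  # serialized body, ready to be sent by an HTTP client
```

The sketch stops before the network call so that it runs without an API key; sending body to the translate endpoint with a valid authentication header is left to the HTTP client of your choice.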
Before sentences are translated, the XML file is parsed, and tags containing textual content other than white space are identified, in order to reproduce the XML structure in the translation.
In the example above, the title and the two par tags are found to contain text. These tags are considered sentence splitters. Therefore, each of the following three texts is treated separately:
A document's title
This is the first sentence. Followed by a second one.
This is the third sentence.
The second text is further split, as it contains two separate sentences. Each sentence is then translated separately and tags within the sentences (used here for formatting) are applied to the corresponding words in the translation.
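The parsing step described above can be approximated locally. The following sketch uses Python's standard xml.etree.ElementTree to list the tags whose text is more than white space. It is an illustration only, not the engine's actual parser; the sample document and the function name tags_with_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, modeled on the tag names mentioned in the text.
xml_document = """<document>
  <meta>
    <title>A document's title</title>
  </meta>
  <content>
    <par>This is the first sentence. Followed by a second one.</par>
    <par>This is the third sentence.</par>
  </content>
</document>"""


def tags_with_text(xml_string):
    """Return (tag, text) pairs for elements whose leading text is not only white space."""
    root = ET.fromstring(xml_string)
    found = []
    for element in root.iter():  # walks the tree in document order
        # element.text is the text directly after the opening tag (None if absent).
        # Text that follows a child element (element.tail) is ignored in this sketch.
        text = (element.text or "").strip()
        if text:
            found.append((element.tag, text))
    return found


segments = tags_with_text(xml_document)
for tag, text in segments:
    print(f"{tag}: {text}")
```

Under these assumptions, the wrapper tags document, meta, and content are skipped because they hold only white space, while title and both par tags are reported as text-bearing.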
Splitting on New Lines
Please note that newlines will split sentences. You should therefore clean files to avoid breaking sentences or set the parameter split_sentences=nonewlines.
Request
Response
ℹ️ Here, the two parts of the sentence were translated separately, which produced an error: "oat biscuits" was translated as "Hafer Kekse" instead of "Haferkekse".
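One way to clean a file before sending it is to collapse line breaks into single spaces. The sketch below is a rough illustration, not an official preprocessing step; the function name collapse_newlines and the sample input are hypothetical, and the approach assumes that no white space in the file is significant (it would also flatten preformatted content).

```python
import re


def collapse_newlines(xml_string):
    """Replace each line break, plus the white space around it, with one space."""
    return re.sub(r"\s*\n\s*", " ", xml_string).strip()


# Hypothetical input: a sentence broken across two lines inside one tag.
raw = "<par>He likes oat\n  biscuits.</par>"
cleaned = collapse_newlines(raw)
print(cleaned)
```

If the line breaks in a file carry meaning, setting split_sentences=nonewlines is the safer of the two options, since it leaves the content untouched.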
Restricting Splitting
For some XML files, finding tags with textual content and splitting sentences on those tags won't yield the best translation results. The following examples show the difference in results when the engine splits sentences on par tags and proceeds to translate the parts separately, as opposed to translating the sentence as a whole.
As this can lead to bad translations, this type of structure should either be avoided, or the non_splitting_tags parameter should be set.
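A request body that sets non_splitting_tags might look like the following sketch. The helper name build_payload and the sample sentence are hypothetical, and passing the tags as a JSON list is an assumption about the request format; form-encoded requests may expect a comma-separated string instead.

```python
import json


def build_payload(text, target_lang, non_splitting_tags):
    """Return a request body that keeps the given tags from splitting sentences."""
    return {
        "text": [text],                  # assumed: JSON bodies take a list of texts
        "target_lang": target_lang,
        "tag_handling": "xml",
        # Tags listed here are treated as markup inside a sentence, not as boundaries.
        "non_splitting_tags": list(non_splitting_tags),
    }


# Hypothetical input: one sentence spread over two par tags.
sentence = (
    "<par>The firm said it had been </par>"
    "<par>conducting an internal investigation.</par>"
)
payload = build_payload(sentence, "DE", ["par"])
body = json.dumps(payload)
```

With par listed as non-splitting, both tags stay inside one translation unit, so the sentence can be translated as a whole.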
Parameters
Request
Response
ℹ️ This time, the sentence is translated as a whole. The XML tags are now considered markup and copied into the translated sentence. As the translation of the words "had been" has moved to another position in the German sentence, the two par tags are duplicated (which is expected here).