With tag_handling set to xml, the DeepL API is able to process structured XML content. This includes whole XML files, as the following example shows.
Note that for XML-structured content in classic models, we set the parameter split_sentences to nonewlines to make sure that newlines do not alter the results. The response is beautified for better readability.
For next-gen models, the split_sentences value passed by the user is ignored and a value of nonewlines is used for maximum translation quality.
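The parameter combination described above can be sketched in Python as follows. This is only an illustration: the helper name build_xml_params, the sample text, and the target language are assumptions, and nothing is sent over the network. Only tag_handling and split_sentences are taken from the text above.

```python
def build_xml_params(xml_text, target_lang="DE"):
    """Return translation parameters for XML content (hypothetical helper)."""
    return {
        "text": xml_text,
        "target_lang": target_lang,  # assumption: any supported target language
        "tag_handling": "xml",       # enables processing of XML structure
        # With classic models this keeps newlines from splitting sentences.
        # Per the note above, next-gen models use this value regardless.
        "split_sentences": "nonewlines",
    }


params = build_xml_params("<par>This is the first sentence.</par>")
print(params["tag_handling"], params["split_sentences"])
```

How the dictionary is encoded into the request body depends on the HTTP client and is left out of this sketch.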
Parameters and corresponding results:
Example
Parameters
Example request
Example response
The title and the two par tags are found to contain text. These tags are considered sentence splitters. Therefore, each of the following three texts is treated separately:
- A document’s title
- This is the first sentence. Followed by a second one.
- This is the third sentence.
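To make the three units above concrete, the following Python sketch lists the elements of a sample document that directly contain text. The sample document is an assumption, reconstructed to match the three texts listed above. The function only approximates the idea locally with the standard library and is not the API's actual splitting logic.

```python
import xml.etree.ElementTree as ET

# Assumed sample document, shaped to match the three texts listed above.
SAMPLE = """<document>
  <meta>
    <title>A document’s title</title>
  </meta>
  <content>
    <par>This is the first sentence. Followed by a second one.</par>
    <par>This is the third sentence.</par>
  </content>
</document>"""


def text_bearing_elements(xml_text):
    """Return (tag, text) pairs for elements whose own text is not just whitespace."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, el.text.strip())
        for el in root.iter()
        if el.text and el.text.strip()
    ]


units = text_bearing_elements(SAMPLE)
for tag, text in units:
    print(f"{tag}: {text}")
```

Container elements such as document, meta, and content hold only indentation whitespace, so they are skipped and only title and the two par elements remain.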
Splitting on New Lines
Please note that newlines will split sentences. You should therefore clean files to avoid breaking sentences or set the parameter split_sentences=nonewlines.
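Both options can be sketched in Python. The helper name join_broken_lines and the sample sentence are hypothetical. The regular expression is one simple way to clean a file, not a method prescribed by the API.

```python
import re


def join_broken_lines(text):
    """Replace each line break, plus surrounding whitespace, with one space."""
    return re.sub(r"\s*\n\s*", " ", text).strip()


# Hypothetical input: a sentence broken across two lines inside one tag.
broken = "<par>He likes oat\nbiscuits.</par>"

# Option 1: clean the content before sending it.
cleaned = join_broken_lines(broken)
print(cleaned)

# Option 2: leave the content as-is and ask the API not to split on newlines.
params = {
    "text": broken,
    "tag_handling": "xml",
    "split_sentences": "nonewlines",
}
```

On pretty-printed XML the cleaning step also places a single space between tags that sat on separate lines, so setting split_sentences=nonewlines may be the less invasive choice.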
Incorrect translation due to new lines
Request
Response
Here, the two parts of the sentence were translated separately, which resulted in an error: “oat biscuits” has been translated as “Hafer Kekse” instead of “Haferkekse”.
Restricting Splitting
For some XML files, finding tags with textual content and splitting sentences using those tags won’t yield the best translation results. The following examples show the difference in results when the engine splits sentences on par tags and proceeds to translate the parts separately, as opposed to translating the sentence as a whole.
As this can lead to bad translations, this type of structure should either be avoided, or the non_splitting_tags parameter should be set.
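A minimal Python sketch of setting non_splitting_tags is shown here. The helper name build_params and the sample sentence are hypothetical. Passing the tag names as a comma-separated string is an assumption about the encoding; a JSON request body may expect a list of strings instead.

```python
def build_params(xml_text, non_splitting_tags=(), target_lang="DE"):
    """Return translation parameters, optionally marking tags as non-splitting."""
    params = {
        "text": xml_text,
        "target_lang": target_lang,  # German, as in the discussion above
        "tag_handling": "xml",
    }
    if non_splitting_tags:
        # Assumption: tag names are joined into one comma-separated value.
        params["non_splitting_tags"] = ",".join(non_splitting_tags)
    return params


# Hypothetical input: one sentence spread over two par elements.
sample = "<par>The firm said it had been </par><par> conducting an internal investigation.</par>"

default_params = build_params(sample)
restricted_params = build_params(sample, non_splitting_tags=["par"])
print(restricted_params["non_splitting_tags"])
```

With the default parameters each par element would be treated as a separate unit. With par listed as non-splitting, the sentence is intended to be translated as a whole.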
Parameters
Request
Response
This time, the sentence is translated as a whole. The XML tags are now considered markup and copied into the translated sentence. As the translation of the words “had been” has moved to another position in the German sentence, the two par tags are duplicated (which is expected here).