Metadata (i.e., machine-readable descriptions of digital resources) typically include natural-language fields. Titles, summaries, and free-text keywords are common examples of such metadata elements.
In a multilingual context, processing these free-text elements correctly requires accurately identifying the language of each text. In metadata standards such as IEEE LOM, this is achieved with a special construct called “language strings”, in which each text is explicitly tagged with its language.
An example of how language strings can be implemented in XML is shown below:
<metadata>
  <title>
    <string language="en">Title</string>
    <string language="fr">Titre</string>
  </title>
  <description>
    <string language="en">Description</string>
    <string language="fr">Description</string>
  </description>
</metadata>
This approach is not ideal: the strings for a given language are scattered across every element, so obtaining a complete description in that language requires traversing the full metadata record.
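To make the cost concrete, here is a minimal sketch of that extraction, assuming Python's standard `xml.etree.ElementTree` module and the element names used in the language-string example. The helper name `describe` is hypothetical and only serves the illustration.

```python
import xml.etree.ElementTree as ET

# The language-string record from the example, reproduced so the sketch runs on its own.
RECORD = """
<metadata>
  <title>
    <string language="en">Title</string>
    <string language="fr">Titre</string>
  </title>
  <description>
    <string language="en">Description</string>
    <string language="fr">Description</string>
  </description>
</metadata>
"""


def describe(record_xml, language):
    """Collect, for each metadata element, the string tagged with `language`."""
    root = ET.fromstring(record_xml)
    description = {}
    # Every element of the record has to be visited, because the strings
    # for a single language are spread across all of them.
    for element in root:
        for string in element.findall("string"):
            if string.get("language") == language:
                description[element.tag] = string.text
    return description


print(describe(RECORD, "fr"))
```

Even in this two-element record, the function has to loop over the whole document and filter each string by its `language` attribute.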
This is why we usually recommend language blocks, in which all the metadata elements in a given language are grouped together, making them easier to access and process.
<metadata>
  <block language="en">
    <title>Title</title>
    <description>Description</description>
  </block>
  <block language="fr">
    <title>Titre</title>
    <description>Description</description>
  </block>
</metadata>
The example above shows how language blocks simplify processing: displaying a complete description in a given language only requires extracting the corresponding language block.
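As a rough illustration, extracting a language block could look like the following sketch. It again assumes Python's standard `xml.etree.ElementTree` and the element names from the language-block example; `extract_block` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The language-block record from the example, reproduced so the sketch runs on its own.
RECORD = """
<metadata>
  <block language="en">
    <title>Title</title>
    <description>Description</description>
  </block>
  <block language="fr">
    <title>Titre</title>
    <description>Description</description>
  </block>
</metadata>
"""


def extract_block(record_xml, language):
    """Return the fields of the block tagged with `language`, or None if absent."""
    root = ET.fromstring(record_xml)
    for block in root.findall("block"):
        if block.get("language") == language:
            # The block already holds the complete description for this language.
            return {field.tag: field.text for field in block}
    return None


print(extract_block(RECORD, "fr"))
```

Once the matching block is found, no further filtering is needed: its children are the complete description in that language.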
In the same way, a translation can be added by submitting a block in a source language to a translation service and adding the translated block back to the record.
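The translation workflow can be sketched along the same lines. Everything about the translation service here is an assumption: `translate_text` is a hypothetical stand-in backed by a tiny lookup table, to be replaced by a call to whichever service is actually used. `add_translation` is likewise a hypothetical helper name.

```python
import copy
import xml.etree.ElementTree as ET

# A record that so far only contains an English block.
RECORD = """
<metadata>
  <block language="en">
    <title>Title</title>
    <description>Description</description>
  </block>
</metadata>
"""


def translate_text(text, source, target):
    """Hypothetical placeholder for a real translation service call."""
    lookup = {("en", "fr"): {"Title": "Titre", "Description": "Description"}}
    return lookup.get((source, target), {}).get(text, text)


def add_translation(root, source, target):
    """Copy the `source` block, translate its fields, and append it as `target`."""
    for block in root.findall("block"):
        if block.get("language") == source:
            translated = copy.deepcopy(block)
            translated.set("language", target)
            for field in translated:
                field.text = translate_text(field.text, source, target)
            # The translated block goes back into the record as a sibling.
            root.append(translated)
            return translated
    raise ValueError(f"no block found for language {source!r}")


root = ET.fromstring(RECORD)
add_translation(root, "en", "fr")
print(ET.tostring(root, encoding="unicode"))
```

Because the block is self-contained, the source block is handed over as a unit and the result is appended without touching the rest of the record.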