Controlled vocabularies are a valuable alternative to free text because the meaning of each term is explicitly stated, which avoids the ambiguities inherent in natural language. Controlled vocabularies range from flat lists of terms to more structured forms in which terms are linked by precisely defined relationships (e.g., synonym rings, authority files, classifications, thesauri).
Ideally, a controlled vocabulary has a unique identifier, as does each term in the vocabulary. Labels in multiple languages can then be associated with these identifiers. To get the most out of controlled vocabularies, identifiers are used by machines while labels are displayed to human end-users. This approach ensures that a vocabulary’s semantics are machine-processable and that multilingualism is supported. For example, an English-speaking indexer can select controlled terms based on their English labels, but only the identifiers are stored in metadata records and indexed. If a French-speaking user wants to search the resulting index, they can select one or more controlled terms using the French labels while the system uses the term identifiers to search the actual index. Similarly, when a metadata record is displayed, the term identifiers can be replaced by labels in the language of the user.
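The separation between identifiers and labels can be illustrated with a minimal Python sketch. The term identifiers, labels, and record names below are invented for illustration, and an in-memory dictionary stands in for a real index.

```python
# Hypothetical vocabulary: term identifiers mapped to labels per language.
LABELS = {
    "term:001": {"en": "Mathematics", "fr": "Mathématiques"},
    "term:002": {"en": "History", "fr": "Histoire"},
}

# Hypothetical metadata records: only term identifiers are stored and indexed.
RECORDS = {
    "record-a": ["term:001"],
    "record-b": ["term:001", "term:002"],
}


def term_for_label(label, lang):
    """Return the identifier of the term carrying this label in this language, or None."""
    for term_id, labels in LABELS.items():
        if labels.get(lang) == label:
            return term_id
    return None


def search(label, lang):
    """Find the records indexed with the term the user selected by its label."""
    term_id = term_for_label(label, lang)
    if term_id is None:
        return []
    return sorted(rec for rec, terms in RECORDS.items() if term_id in terms)


def display(record_id, lang):
    """Replace the stored identifiers with labels in the user's language."""
    return [LABELS[term_id][lang] for term_id in RECORDS[record_id]]
```

Here a French-speaking user calling `search("Histoire", "fr")` and an English-speaking user calling `search("History", "en")` reach the same records, because both labels resolve to the same identifier before the index is consulted.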
Since different systems can be used to create, consume, and search metadata records, it is important to store controlled vocabularies in machine-readable formats that are easily accessible to all the systems that need to reference them. Standard formats exist for controlled vocabularies (e.g., ZTHES, IMS VDEX, SKOS). The simplest way to share a controlled vocabulary is to encode it in one of these formats and publish it on a website. Alternatively, controlled vocabularies can be stored in specialized vocabulary banks that can be used to create, manage, and maintain vocabularies along with their translations.
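As an illustration of what a consuming system might do with such a published file, the sketch below reads multilingual labels from a small SKOS document serialized as RDF/XML, using only the Python standard library. The `example.org` concept URI and the labels are invented; the RDF and SKOS namespace URIs are the standard ones. A real vocabulary would normally be fetched from its published location and may use SKOS features not handled here.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical SKOS document; the example.org URI is invented for illustration.
SKOS_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/vocab/subjects/math">
    <skos:prefLabel xml:lang="en">Mathematics</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Mathématiques</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def load_labels(xml_text):
    """Map each concept URI to a {language: preferred label} dictionary."""
    root = ET.fromstring(xml_text)
    labels = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        labels[uri] = {
            label.get(XML_LANG): label.text
            for label in concept.findall(f"{{{SKOS}}}prefLabel")
        }
    return labels
```

The concept URI plays the role of the term identifier, and each `skos:prefLabel` supplies the label for one language.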
When a controlled term is used in a metadata record, three components are necessary to unambiguously identify it:
- The identifier of the system in which the vocabulary is defined,
- The identifier of the vocabulary in this system, and
- The identifier of the vocabulary term itself.
Note that if the vocabulary is published online, the first two components are expressed by the URL of the vocabulary, as in the example below.
<coverage type="http://vocabulary.bank/coverage/vocabulary/id">term_id</coverage>
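A consuming system could recover the three components from such an element along the following lines. This is a sketch only: the function name is hypothetical, and treating the host as the system identifier and the path as the vocabulary identifier is an assumption about how the URL is structured, not something the format prescribes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def parse_term_reference(xml_fragment):
    """Split a controlled-term element into system, vocabulary, and term.

    Assumption: the host of the vocabulary URL identifies the system and
    the path identifies the vocabulary within that system.
    """
    element = ET.fromstring(xml_fragment)
    parsed = urlparse(element.get("type"))
    return {
        "system": parsed.netloc,
        "vocabulary": parsed.path,
        "term": element.text.strip(),
    }


reference = parse_term_reference(
    '<coverage type="http://vocabulary.bank/coverage/vocabulary/id">term_id</coverage>'
)
```

With the example element, `reference["system"]` is `vocabulary.bank`, `reference["vocabulary"]` is `/coverage/vocabulary/id`, and `reference["term"]` is `term_id`.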