In our last post, “the facet solution”, we presented a generic metadata model that, thanks to a facet mechanism, can be used to describe potentially any type of resource.
A key aspect of this facet approach is the ability for an application to process only information relevant to its services while ignoring other information present in a record. Each facet encapsulates all the information necessary to describe a specific aspect of a resource so that this information can easily be consumed and processed by specialized applications. The only thing these applications have to do is look for the facet that contains the information they require without having to know anything about the other facets used to describe the resource.
Today we will look at how the JavaScript Object Notation (JSON) can be used to implement this metadata model in a way that maximizes its benefits.
A Very Brief Introduction to JSON
JSON is very simple (a complete description can be found here). It consists of:
- 4 simple value types:
  - String (e.g., "text"),
  - Number (e.g., 123, 4.56),
  - Boolean (e.g., true, false), and
  - Null (e.g., null)
- And 2 structured types:
  - Object, a collection of name/value pairs (e.g., {"title": "Title", "description": ""}) and
  - Array, an ordered list of values (e.g., [12,13,14]).
The types described above can be used to build arbitrarily complex data structures.
A JSON specification known as JSON Schema makes it possible to describe a JSON data structure in JSON and to validate JSON data against these schemas. However, although useful for precisely defining a JSON data structure, this notation leads to schemas that are difficult to read. This is why, in this post, we prefer a more graphical (and informal) way of describing JSON structures, based on mind maps in which the label of each node corresponds to the name of a name/value pair. The details of these conventions will be introduced with the metadata implementation itself.
The Container Structure
The JSON representation of the generic metadata container structure is a direct representation of the conceptual structure introduced in a previous post.

Work-level metadata
A metadata record is a JSON object consisting of 4 name/value pairs:
- Id: The identifier of the record,
- Type: The type of the resource described by this metadata record (e.g., video, simulation, data set). Ideally, these types are defined in a controlled vocabulary.
- Description: A container for the work-level facets used to describe the resource. The label [work facet name] indicates that the value of “description” is a collection of name/value pairs where the names are the names of different facets. Facets are described in detail in the next section.
- Expressions: A container for expression-level metadata. The star at the end of the name indicates that the associated value is an array. In this case, an array of expression objects.
Expression-level metadata
An expression is a JSON object consisting of 4 name/value pairs:
- Languages: An array containing the language(s) of the considered expression of the resource (ideally expressed as ISO-639-1 or ISO-639-2 codes).
- Versions: An array containing the version number(s) of the considered expression,
- Description: A container for the expression-level facets used to describe the resource.
- Manifestations: A container for manifestation-level metadata consisting of an array of manifestation objects.
Manifestation-level metadata
A manifestation is a JSON object consisting of 4 name/value pairs:
- Name: The name of the manifestation (i.e., the way the resource is presented to the users). Ideally, these manifestation names are defined in a controlled vocabulary.
- Parameters: An array of parameters further specifying the manifestation. For example, a manifestation named “web feed” can have a parameter “rss”.
- Description: A container for the manifestation-level facets used to describe the resource.
- Items: A container for item-level metadata consisting of an array of item objects.
Item-level metadata
An item is a JSON object consisting of 2 name/value pairs:
- Location: The location of the resource copy (e.g., the URL at which it can be obtained).
- Description: A container for the item-level facets used to describe the resource.
Example
Here is an example of what an actual record looks like (the facets are not shown to save space). It describes a data set available in CSV format in two languages: English (en) and French (fr).
{
  "id": "#12345",
  "type": "data set",
  "description": {},
  "expressions": [
    {
      "languages": ["en"],
      "versions": ["1.0"],
      "description": {},
      "manifestations": [{
        "name": "document",
        "parameters": ["spreadsheet"],
        "description": {},
        "items": [{
          "location": "http://somewhere.com/data-en.csv",
          "description": {}
        }]
      }]
    },
    {
      "languages": ["fr"],
      "versions": ["1.0"],
      "description": {},
      "manifestations": [{
        "name": "document",
        "parameters": ["spreadsheet"],
        "description": {},
        "items": [{
          "location": "http://somewhere.com/data-fr.csv",
          "description": {}
        }]
      }]
    }
  ]
}
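To illustrate how an application might walk this container structure, here is a minimal Java sketch. It assumes the record has already been parsed into plain java.util.Map and java.util.List values (which JSON library does the parsing is left open), and the class and method names (RecordWalker, locationsByLanguage, sampleRecord) are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: the record is assumed to be parsed into Maps and Lists.
class RecordWalker {

    // For each language code, collects the item locations found under the
    // expressions declared in that language.
    @SuppressWarnings("unchecked")
    static Map<String, List<String>> locationsByLanguage(Map<String, Object> record) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<Map<String, Object>> expressions =
                (List<Map<String, Object>>) record.getOrDefault("expressions", List.of());
        for (Map<String, Object> expression : expressions) {
            List<String> locations = new ArrayList<>();
            List<Map<String, Object>> manifestations =
                    (List<Map<String, Object>>) expression.getOrDefault("manifestations", List.of());
            for (Map<String, Object> manifestation : manifestations) {
                List<Map<String, Object>> items =
                        (List<Map<String, Object>>) manifestation.getOrDefault("items", List.of());
                for (Map<String, Object> item : items) {
                    Object location = item.get("location");
                    if (location != null) {
                        locations.add(location.toString());
                    }
                }
            }
            List<String> languages =
                    (List<String>) expression.getOrDefault("languages", List.of());
            for (String language : languages) {
                result.computeIfAbsent(language, k -> new ArrayList<>()).addAll(locations);
            }
        }
        return result;
    }

    // Builds one expression of the example record (facets left empty).
    static Map<String, Object> expression(String language, String location) {
        Map<String, Object> item = Map.of("location", location, "description", Map.of());
        Map<String, Object> manifestation = Map.of(
                "name", "document",
                "parameters", List.of("spreadsheet"),
                "description", Map.of(),
                "items", List.of(item));
        return Map.of(
                "languages", List.of(language),
                "versions", List.of("1.0"),
                "description", Map.of(),
                "manifestations", List.of(manifestation));
    }

    // In-memory equivalent of the example record above.
    static Map<String, Object> sampleRecord() {
        return Map.of(
                "id", "#12345",
                "type", "data set",
                "description", Map.of(),
                "expressions", List.of(
                        expression("en", "http://somewhere.com/data-en.csv"),
                        expression("fr", "http://somewhere.com/data-fr.csv")));
    }

    public static void main(String[] args) {
        System.out.println(locationsByLanguage(sampleRecord()));
    }
}
```

Note that the sketch never looks inside the “description” objects: the container structure can be navigated without knowing anything about the facets it carries.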
Facets
A facet consists of a name/value pair where the name is the name of the facet and the value is a JSON object consisting of 3 name/value pairs:
- Schema: A reference (ideally a URL pointing) to the JSON schema that can be used to validate the facet.
- Controlled block: A container for metadata elements whose values come from controlled vocabularies.
- Language block: A container for free text elements in different languages.

Controlled block
Controlled blocks are used to store all the controlled vocabulary elements of a facet. As explained in a previous post, it is assumed that these controlled vocabularies and their translations are managed independently (possibly in a vocabulary bank), so that the only things that need to be stored in metadata records are vocabulary term identifiers and a reference to the vocabulary they belong to.
As a consequence, the value of a controlled block is a JSON object containing one name/value pair per metadata element. The values of these pairs are arrays (a metadata element can be multi-valued) of JSON objects with 2 name/value pairs:
- Source that contains the identifier of the vocabulary and
- Values that contains the identifier of the term(s) from the source vocabulary.
{
  "coverage": [{
    "source": "http://vocabulary.bank/coverage/vocabulary/id",
    "values": ["term_id"]
  }],
  "subject": [{
    "source": "http://vocabulary.bank/subject/thesaurus/id",
    "values": [
      "descriptor_1",
      "descriptor_2"
    ]
  }]
}
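As a rough illustration of how a controlled block might be consumed, the Java sketch below groups the term identifiers of one metadata element by source vocabulary. It assumes the block has been parsed into plain Maps and Lists and that each entry stores its term identifiers under a “values” key, as described in the list above. ControlledBlockReader, termsBySource and sampleBlock are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: the controlled block is assumed to be parsed into Maps and Lists.
class ControlledBlockReader {

    // Returns the term identifiers of one metadata element, grouped by the
    // identifier of the vocabulary they come from.
    @SuppressWarnings("unchecked")
    static Map<String, List<String>> termsBySource(Map<String, Object> controlledBlock,
                                                   String element) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<Map<String, Object>> entries =
                (List<Map<String, Object>>) controlledBlock.getOrDefault(element, List.of());
        for (Map<String, Object> entry : entries) {
            Object source = entry.get("source");
            if (source == null) {
                continue; // skip entries that do not name their vocabulary
            }
            List<String> values = (List<String>) entry.getOrDefault("values", List.of());
            result.computeIfAbsent(source.toString(), k -> new ArrayList<>()).addAll(values);
        }
        return result;
    }

    // In-memory version of a controlled block with two elements.
    static Map<String, Object> sampleBlock() {
        Map<String, Object> coverage = Map.of(
                "source", "http://vocabulary.bank/coverage/vocabulary/id",
                "values", List.of("term_id"));
        Map<String, Object> subject = Map.of(
                "source", "http://vocabulary.bank/subject/thesaurus/id",
                "values", List.of("descriptor_1", "descriptor_2"));
        return Map.of("coverage", List.of(coverage), "subject", List.of(subject));
    }

    public static void main(String[] args) {
        System.out.println(termsBySource(sampleBlock(), "subject"));
    }
}
```

Resolving these identifiers to human-readable labels is deliberately left out, since the vocabularies and their translations are assumed to be managed outside the record.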
Language block
As explained in a previous post, language blocks are used to store all the free-text elements of a facet, organized by language so that they can easily be retrieved and used.
The value of a language block is a JSON object consisting of one name/value pair for each available metadata language, where the name identifies the metadata language (e.g., an ISO-639-1 language code) and the value is a JSON object containing the metadata elements and their values in that language, as shown in the example below.
{
  "en": {
    "title": "Title",
    "description": "Description"
  },
  "fr": {
    "title": "Titre",
    "description": "Description"
  }
}
This format makes it easy for an application to check if metadata is available in a given language and to use it (e.g., displaying it, indexing it, or even translating it).
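The language check described above could be sketched in Java as follows. The sketch assumes the language block has been parsed into nested Maps. LanguageBlockReader and its text method are hypothetical names, and the fallback over a list of preferred languages is an illustrative choice rather than part of the model.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: the language block is assumed to be parsed into nested Maps.
class LanguageBlockReader {

    // Returns the value of a free-text element in the first preferred
    // language for which the block provides it, if any.
    static Optional<String> text(Map<String, Map<String, String>> languageBlock,
                                 String element,
                                 List<String> preferredLanguages) {
        for (String language : preferredLanguages) {
            Map<String, String> elements = languageBlock.get(language);
            String value = (elements == null) ? null : elements.get(element);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    // In-memory version of the language block shown above.
    static Map<String, Map<String, String>> sampleBlock() {
        Map<String, String> english = Map.of("title", "Title", "description", "Description");
        Map<String, String> french = Map.of("title", "Titre", "description", "Description");
        return Map.of("en", english, "fr", french);
    }

    public static void main(String[] args) {
        // No German metadata is available, so the lookup falls back to English.
        System.out.println(text(sampleBlock(), "title", List.of("de", "en")).orElse("(none)"));
    }
}
```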
A note on implementing facets
In the proposed container structure, representing description elements as JSON objects of facets proves very convenient since it is a data structure that is natively supported by most programming languages (for example HashMap in Java).
This makes it very easy for an application that receives a description object to check if the facet it is interested in exists while totally ignoring the presence (or absence) of other facets in the description.
For example, in Java, assuming description is a JSONObject that behaves like a java.util.Map (as the JSONObject class of the json-simple library does), a simple if statement is sufficient.
if (description.containsKey("facet name"))
{
    // handle the facet content
}
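For readers who want something they can run without a JSON library, here is a self-contained sketch of the same idea using plain java.util.Map values. Everything specific in it is an assumption made for illustration: the facet names (“general”, “rights”), the key names used inside a facet (“schema”, “controlled”, “language”), the example.org schema URLs, and the FacetLookup class itself.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: the description is assumed to be parsed into Maps.
class FacetLookup {

    // Returns the named facet if the description carries it, ignoring all others.
    @SuppressWarnings("unchecked")
    static Optional<Map<String, Object>> findFacet(Map<String, Object> description,
                                                   String facetName) {
        Object facet = description.get(facetName);
        if (facet instanceof Map) {
            return Optional.of((Map<String, Object>) facet);
        }
        return Optional.empty();
    }

    // A description carrying two facets; names and keys are illustrative only.
    static Map<String, Object> sampleDescription() {
        Map<String, String> english = Map.of("title", "Title");
        Map<String, Object> general = Map.of(
                "schema", "http://example.org/schemas/general.json",
                "controlled", Map.of(),
                "language", Map.of("en", english));
        Map<String, Object> rights = Map.of(
                "schema", "http://example.org/schemas/rights.json",
                "controlled", Map.of(),
                "language", Map.of());
        return Map.of("general", general, "rights", rights);
    }

    public static void main(String[] args) {
        // An application interested only in the "general" facet never touches "rights".
        findFacet(sampleDescription(), "general")
                .ifPresent(facet -> System.out.println("schema: " + facet.get("schema")));
    }
}
```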