We have already presented the faceted mechanism, which lets us describe resources by encapsulating, in a precisely defined manner, all the information needed to describe one specific aspect of a resource, so that specialized applications or functionalities can easily consume and process it. We have also shown how the JavaScript Object Notation (JSON) can be used to maximize the benefits of this approach.
In this post and those that follow, we will look at how data can be integrated from different data sources to build such descriptions. This post introduces “actions” and “facts”, the two core concepts of the action-fact data model upon which ZettaDataNet’s approach to data integration is based.
The main idea is that resources are described with facts collected during successive data acquisition actions. These resources can be anything of interest. They are characterized by a type (e.g., they can be of type product, customer, geographical area) and a unique identifier (e.g., products might have a unique product id, customers a social security number, geographical areas a zip code).
Facts are the basic properties of resources. For example, a person can have a name, an age, a weight, etc. Facts are characterized by:
- A property name (e.g., weight),
- A property value (e.g., 155 pounds), and
- A timestamp (e.g., 2015-01-27), since a property value can vary over time (e.g., a person’s weight does not stay constant throughout their life).
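To make the role of the timestamp concrete, here is a minimal Python sketch of a fact and of a lookup that returns the value a property had on a given date. The names `Fact` and `value_as_of` are hypothetical and are not part of ZettaDataNet's implementation; the earlier weight of 150 pounds is made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Fact:
    """One property of a resource, valid at a given date (illustrative only)."""
    property_name: str   # e.g. "weight"
    value: str           # e.g. "155 pounds"
    timestamp: date      # date to which the value applies


def value_as_of(facts: Iterable[Fact], property_name: str, as_of: date) -> Optional[str]:
    """Return the most recent value of a property on or before `as_of`."""
    candidates = [
        f for f in facts
        if f.property_name == property_name and f.timestamp <= as_of
    ]
    if not candidates:
        return None  # nothing known about this property yet
    return max(candidates, key=lambda f: f.timestamp).value


# Two observations of the same property at different times.
facts = [
    Fact("weight", "150 pounds", date(2014, 6, 1)),   # made-up earlier value
    Fact("weight", "155 pounds", date(2015, 1, 27)),  # value from the text
]

print(value_as_of(facts, "weight", date(2014, 12, 31)))
print(value_as_of(facts, "weight", date(2015, 2, 1)))
```

Because each fact keeps its own timestamp, older values are never overwritten, and the history of a property remains available for queries like the one above.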
A data acquisition action occurs when a tool is used to acquire facts about one or more resources from a data source at a given time. Such actions are characterized by:
- An action identifier (e.g., action #1),
- A timestamp (e.g., 2015-01-27 17:20:34),
- A tool (e.g., a web scraper), and
- A data source (e.g., http://census.gov). Note that, in our approach, the concept of data source is very broad. A source of data can be a database or a web site but also a human filling in a form.
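The four characteristics of an action listed above could be modeled as a small immutable record. This is only a sketch: the class name `Action` and its field names are assumptions chosen for illustration, and the values are the examples given in the list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Action:
    """One data acquisition action (hypothetical structure)."""
    action_id: int       # e.g. action #1
    timestamp: datetime  # when the acquisition took place
    tool: str            # what performed the acquisition
    data_source: str     # database, web site, or even a human filling in a form


action = Action(
    action_id=1,
    timestamp=datetime(2015, 1, 27, 17, 20, 34),
    tool="web scraper",
    data_source="http://census.gov",
)

# Render the timestamp in the "YYYY-MM-DD HH:MM:SS" form used in this post.
print(action.timestamp.isoformat(sep=" "))
```

Keeping the data source as a plain string reflects the broad notion of source described above: it can hold a URL, a database name, or a label for a manually filled form.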
Putting these elements together, a fact collected by a tool from a data source during a data acquisition action can be represented as follows:
{
  "action id": 1,
  "action timestamp": "2015-01-12 19:30:01",
  "data source": "http://api.census.gov/data/2012/acs5?get=B25082_001E,B25111_001E,NAME&for=zip+code+tabulation+area:*",
  "resource type": "area",
  "resource id": "94114",
  "fact property": "value",
  "fact value": "8508810400",
  "fact timestamp": "2012-07-01"
}
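As a rough illustration, a record like the one above could be assembled and serialized with Python's standard `json` module. The helper `to_record` is hypothetical; only the key names and values are taken from the example.

```python
import json


def to_record(action_id, action_timestamp, data_source,
              resource_type, resource_id,
              fact_property, fact_value, fact_timestamp):
    """Flatten one action, one resource and one fact into a single record."""
    return {
        "action id": action_id,
        "action timestamp": action_timestamp,
        "data source": data_source,
        "resource type": resource_type,
        "resource id": resource_id,
        "fact property": fact_property,
        "fact value": fact_value,
        "fact timestamp": fact_timestamp,
    }


record = to_record(
    action_id=1,
    action_timestamp="2015-01-12 19:30:01",
    data_source=("http://api.census.gov/data/2012/acs5"
                 "?get=B25082_001E,B25111_001E,NAME"
                 "&for=zip+code+tabulation+area:*"),
    resource_type="area",
    resource_id="94114",
    fact_property="value",
    fact_value="8508810400",
    fact_timestamp="2012-07-01",
)

# Serialize to JSON; the keys keep the order in which they were inserted.
print(json.dumps(record, indent=2))
```

Each record is self-describing: it carries the action that produced it, the resource it concerns, and the fact itself, so records from different actions can be stored side by side without losing their provenance.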
Such data elements are the result of the data acquisition workflow that we will introduce in our next post.
Acknowledgement: Several concepts used by the action-fact data model described in this post took shape after hearing Daan Gerits’ Big Data BluePrint presentation.