Data Integration, Part 1: Actions & Facts

We have already presented the faceted mechanism for describing resources: each facet encapsulates, in a precisely defined manner, all the information needed to describe one specific aspect of a resource, so that specialized applications or functionalities can easily consume and process it. We have also shown how JavaScript Object Notation (JSON) can be used to maximize the benefits of this approach.

In this post and those that follow, we will look at how data can be integrated from different data sources to build such descriptions. This post introduces “actions” and “facts”, the two core concepts of the action-fact data model upon which ZettaDataNet’s approach to data integration is based.

The main idea is that resources are described with facts collected during successive data acquisition actions. These resources can be anything of interest. They are characterized by a type (e.g., they can be of type product, customer, geographical area) and a unique identifier (e.g., products might have a unique product id, customers a social security number, geographical areas a zip code).

Facts are the basic properties of resources. For example, a person can have a name, an age, a weight, etc. Facts are characterized by:

  • A property name (e.g., weight),
  • A property value (e.g., 155 pounds), and
  • A timestamp (e.g., 2015-01-27), since a property value can vary over time (e.g., a person's weight does not stay constant throughout their life).
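
As an illustration, resources and facts could be modeled roughly as follows in Python. This is only a sketch: the names `ResourceKey`, `Fact` and `latest_value`, the identifier "p-001" and the 150-pound reading are hypothetical and are not taken from ZettaDataNet's implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ResourceKey:
    """Identifies a resource by its type and its unique identifier."""
    resource_type: str  # e.g., "person", "product", "area"
    resource_id: str    # e.g., a product id or a zip code


@dataclass(frozen=True)
class Fact:
    """One property value of a resource, valid as of a given date."""
    resource: ResourceKey
    property_name: str
    value: str
    timestamp: date


def latest_value(facts: List[Fact], resource: ResourceKey,
                 property_name: str) -> Optional[str]:
    """Return the most recent value recorded for a property, or None."""
    matching = [f for f in facts
                if f.resource == resource and f.property_name == property_name]
    if not matching:
        return None
    return max(matching, key=lambda f: f.timestamp).value


# Hypothetical data: the same property observed at two different dates.
person = ResourceKey("person", "p-001")
facts = [
    Fact(person, "weight", "150 pounds", date(2014, 6, 1)),
    Fact(person, "weight", "155 pounds", date(2015, 1, 27)),
]
print(latest_value(facts, person, "weight"))
```

Keeping the timestamp on every fact means that older values are never overwritten; a consumer simply picks the value that is valid for the date it cares about.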

A data acquisition action occurs when a tool is used to acquire facts about one or more resources from a data source at a given time. Such actions are characterized by:

  • An action identifier (e.g., action #1),
  • A timestamp (e.g., 2015-01-27 17:20:34),
  • A tool (e.g., a web scraper), and
  • A data source (e.g., http://census.gov). Note that, in our approach, the concept of data source is very broad. A source of data can be a database or a web site but also a human filling in a form.
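
The four characteristics listed above can be sketched as a small Python record. The class name `Action` and its field names are assumptions made for illustration; only the example values come from the list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Action:
    """A data acquisition action: which tool read which source, and when."""
    action_id: int
    timestamp: datetime
    tool: str
    data_source: str  # a database, a web site, or even a person filling in a form


action = Action(
    action_id=1,
    timestamp=datetime(2015, 1, 27, 17, 20, 34),
    tool="web scraper",
    data_source="http://census.gov",
)
print(action.timestamp.isoformat(sep=" "))
```

The data source is kept as a free-form string here because the concept is deliberately broad; a stricter model might distinguish URLs, database connections and human operators.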

Putting these pieces together, a fact collected by a tool from a data source during a data acquisition action can be represented as shown below.


{
    "action id": 1,
    "action timestamp": "2015-01-12 19:30:01",
    "tool": "zettadownloader",
    "data source": "http://api.census.gov/data/2012/acs5?get=B25082_001E,B25111_001E,NAME&for=zip+code+tabulation+area:*",
    "resource type": "area",
    "resource id": "94114",
    "fact property": "value",
    "fact value": "8508810400",
    "fact timestamp": "2012-07-01"
}
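
To show how such a flat record carries all three concepts at once, here is a hedged Python sketch that parses the JSON above and separates it into its action, resource and fact parts. The function name `split_record` and the grouping of keys are assumptions for illustration; the record itself is the one shown above.

```python
import json

RECORD_JSON = """
{
    "action id": 1,
    "action timestamp": "2015-01-12 19:30:01",
    "tool": "zettadownloader",
    "data source": "http://api.census.gov/data/2012/acs5?get=B25082_001E,B25111_001E,NAME&for=zip+code+tabulation+area:*",
    "resource type": "area",
    "resource id": "94114",
    "fact property": "value",
    "fact value": "8508810400",
    "fact timestamp": "2012-07-01"
}
"""

ACTION_KEYS = ("action id", "action timestamp", "tool", "data source")
FACT_KEYS = ("fact property", "fact value", "fact timestamp")


def split_record(record):
    """Split one flat record into (action, resource, fact) parts."""
    action = {key: record[key] for key in ACTION_KEYS}
    resource = (record["resource type"], record["resource id"])
    fact = {key: record[key] for key in FACT_KEYS}
    return action, resource, fact


record = json.loads(RECORD_JSON)
action, resource, fact = split_record(record)
print(resource)
print(fact["fact property"], "=", fact["fact value"], "as of", fact["fact timestamp"])
```

Note that the action timestamp (when the data was acquired) and the fact timestamp (when the value was valid) are distinct fields and can differ by years, as they do in this record.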

Such data elements are the result of the data acquisition workflow that we will introduce in our next post.

Acknowledgement: Several concepts used by the action-fact data model described in this post took shape after hearing Daan Gerits’ Big Data BluePrint presentation.
