I attended a meetup presentation of Silk two weeks ago and decided to give it a try. For the last couple of months, I’ve been working on a data project that involves setting up a workflow for collecting and integrating a large quantity of open data from heterogeneous sources, and I was hoping that Silk could help me visualize and explore this newly integrated data. As I quickly realized, this is not Silk’s purpose. Silk is not a data visualization tool, nor even a data presentation tool as such, but rather an online publication and presentation tool that helps narrate a story and illustrate it with data.
The Silk data model is very simple and consists of ‘collections’, ‘pages’, and ‘facts’. If you consider a Silk collection as a single table in a relational database, a page is an instance (i.e., a row) and a fact is an attribute (i.e., a column) for which various data types are possible: text, numeric value, URL.
A page consists of at least a title (which must be unique for each page within the collection) and one or more facts listed as a two-column table: the first column for the fact names, the second for the fact values. This basic page structure can be enriched by manually adding content (e.g., texts, images, audio or video recordings), visualizations (e.g., maps, pies, bar charts, tables), and overviews (e.g., recent pages, tables of contents).
A collection can be created either by encoding each page manually or by uploading an Excel spreadsheet or a CSV file. In the latter case, the first row is used as fact names and one page is created for each of the following rows, with the first column used as the page title. Each collection must have a unique title.
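To make this mapping concrete, here is a minimal Python sketch of how a CSV file could be read as a set of pages. It only mirrors the rule described above; it is not Silk’s importer or API. The function name `rows_to_pages`, the dictionary layout, and the sample data are all made up for illustration.

```python
import csv
import io


def rows_to_pages(csv_text):
    """Turn CSV text into a list of page dicts, one page per data row.

    Hypothetical helper: it imitates the import rule described in the text,
    not Silk's actual implementation.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)      # first row: fact names
    fact_names = header[1:]    # the first column holds the page title
    pages = []
    seen_titles = set()
    for row in reader:
        if not row:
            continue  # skip blank lines
        title = row[0]
        # Page titles must be unique within a collection.
        if title in seen_titles:
            raise ValueError(f"duplicate page title: {title!r}")
        seen_titles.add(title)
        pages.append({"title": title, "facts": dict(zip(fact_names, row[1:]))})
    return pages


# Made-up sample data, for illustration only.
sample = "Name,Colour,Weight\nApple,red,150\nPear,green,180\n"
pages = rows_to_pages(sample)
print(pages[0])
```

Running a check like this before uploading is one way to catch duplicate titles in the first column early.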
A Silk project consists of at least one collection that is referenced from a homepage. As can be seen with these examples, in addition to references to its collections, a homepage can be completed with texts, images, and visualizations of its collections that allow people to go directly to (instance) pages. Going to an individual collection page, by contrast, allows you to interact with this collection by visualizing it (as a table, a list, a grid, a pie chart, a map) and filtering its (instance) pages in various ways.
In my opinion, it is important to remember that Silk is a presentation tool for telling stories and illustrating them with data. Using Silk assumes that you first have a good story to tell, which takes time (the same way preparing a good PowerPoint presentation takes time). It also means that the data you use need to be prepared to serve your story, which also takes time. Chances are that you will not be able to use the data sets you already have (even curated ones) as-is. So, depending on what your goal is, expect to spend ‘a certain amount of time’ in OpenRefine or whatever your favorite data preparation tool might be. Also, keep in mind that:
- A Silk project can contain a maximum of 3000 pages, so if you deal with large data sets, you will probably want to carefully select the data instances you publish in each collection. (During the meetup I attended, Alex Salkever from Silk announced that this limit would be raised in 2015, as Silk is improving the graph database engine that powers the service.)
- A Silk collection is a simple table. This means that nested data structures, like those typically expressed in JSON, need to be normalized before they can be uploaded to Silk.
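To give an idea of what this preparation might look like, here is a minimal Python sketch that flattens a nested JSON record into a single-level row and caps the number of rows at the page limit mentioned above. The `flatten` helper, the dotted key names, and the sample record are my own assumptions for illustration. Joining lists into one comma-separated string is just one possible choice (creating one row per list element is another).

```python
import json

MAX_PAGES = 3000  # project-wide page limit mentioned in the text


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, using dotted keys.

    Hypothetical helper: lists are joined into a single string, which is
    only one of several ways to normalize them.
    """
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            flat[name] = ", ".join(str(item) for item in value)
        else:
            flat[name] = value
    return flat


# Made-up sample record, for illustration only.
raw = json.loads(
    '[{"name": "Station A", "location": {"lat": 50.8, "lon": 4.3},'
    ' "tags": ["air", "noise"]}]'
)
rows = [flatten(item) for item in raw][:MAX_PAGES]
print(rows[0])
```

Each resulting row has only simple values, so it can be written out as one line of a CSV file, with the flattened keys as the header row.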
To conclude, I would say that Silk is a promising communication tool, as demonstrated by the quality of some of the presentations already available on the platform, and I look forward to seeing how it develops in 2015. In the meantime, I’ll keep looking for the tool that will allow me to easily and visually explore data.