Big Data technology consists of a set of tools and techniques used to handle data when the amount of this data, its (lack of) structure and/or the speed at which it needs to be processed exceed the (scaling-up) capacity of conventional database management systems. These technologies are usually based on multi-node architectures designed to easily scale out.
When talking about Big Data, there is a tendency to presume that the amount of data at hand is so ‘big’, so messy, and in such flux that it can only be handled by (big) data scientists, i.e., hackers who rely on Big Data technology to apply statistics, machine learning, and other techniques to extract meaningful information from Big Data sets which are regarded as ‘black boxes’.
But does it really need to be a black box?
There is no reason why Big Data technology cannot be used to collect, analyze, classify, manipulate, store, and retrieve Big Data in the way information scientists typically manage information.
Before any useful information can be extracted from Big Data, it must first be obtained and scrubbed. These two necessary evils, despite being considered secondary, are generally reported by data practitioners as accounting for up to 80% of the effort. It is therefore imperative to tackle them more systematically, by adapting information management techniques to this new 'big' environment.
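To make the 'scrubbing' step concrete, here is a minimal sketch in Python. The records, field names, and cleaning rules are purely illustrative assumptions, not a prescribed method: it normalizes string values, drops records with missing required fields, and removes duplicates.

```python
def scrub(records):
    """Normalize, filter, and de-duplicate a list of raw record dicts."""
    seen = set()
    clean = []
    for rec in records:
        # Normalize: strip whitespace and lowercase string values.
        norm = {k: v.strip().lower() if isinstance(v, str) else v
                for k, v in rec.items()}
        # Filter: drop records missing a required field
        # ("name" and "amount" are hypothetical requirements).
        if not norm.get("name") or norm.get("amount") is None:
            continue
        # De-duplicate on a stable key derived from the normalized values.
        key = (norm["name"], norm["amount"])
        if key in seen:
            continue
        seen.add(key)
        clean.append(norm)
    return clean

raw = [
    {"name": "  Alice ", "amount": 10},
    {"name": "alice", "amount": 10},   # duplicate after normalization
    {"name": "", "amount": 5},         # missing name
    {"name": "Bob", "amount": None},   # missing amount
    {"name": "Carol", "amount": 7},
]
print(scrub(raw))  # only the 'alice' and 'carol' records survive
```

Even a toy pipeline like this shows why the scrubbing stage dominates effort: every rule above encodes a judgment about the data that someone has to discover, justify, and maintain.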
What constitutes information management in a Big Data world? How can Big Data be managed? These are questions we are interested in and would like to discuss with others at the Meetup we have organized for Thursday, November 20th at 7pm Pacific. So, if you are in San Francisco at that time, feel free to join the discussion.