2nd “t.m.i.” Meetup: Going In-Depth on Aggregating Data

We are glad to announce that the “t.m.i.” group will meet for the 2nd time on Thursday, February 12, at 6:15 pm at the Workshop Cafe, 180 Montgomery Street, San Francisco.

We will hear and then discuss the following presentations:

Joe Nelson: Coordinating Web Scrapers

Joe Nelson will demo a computer cluster architecture for massively parallel web scraping built out of free, open-source components. Learn how to provision, coordinate, and monitor scrapers in real time. After reviewing the pieces of the architecture, we’ll see it in action scraping a real site. Finally, we’ll see how the data is consolidated in S3 storage and touch on the next steps for data transformation.

David Massart: A Data Model, Workflow, and Architecture for Integrating Data

The presentation proposes an approach for integrating data from different data sources. It starts by introducing “actions” and “facts”, the two core concepts of the data model upon which the proposed approach is based. Then it looks at the workflow that leads from the acquisition of raw data from various sources to its storage and integration as action-fact data. Finally, it proposes an architecture for supporting this workflow.

Feel free to join us if you are in SF next Thursday!

