Defining Data Science

“It is often said that a data scientist is someone who is better at software engineering than a statistician and better at statistics than any software engineer.” (Mike Becker)

A Data Science Taxonomy: Obtain, Scrub, Explore, Model, and iNterpret (Hilary Mason and Chris Wiggins)

Obtain
Obtain data programmatically, not manually (scraping, APIs, etc.).
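As a rough illustration of obtaining data programmatically, here is a minimal sketch using only Python's standard library. The function name `fetch_records` and the example URL in the comment are hypothetical, and the sketch assumes the endpoint returns a JSON document encoded as UTF-8.

```python
import json
from urllib.request import urlopen


def fetch_records(url, timeout=10):
    """Download a JSON document from `url` and return the parsed object.

    Assumption: the server responds with UTF-8 encoded JSON.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage (the URL is a placeholder, not a real service):
# records = fetch_records("https://example.com/api/measurements.json")
```

Wrapping the download in a function means the same step can be rerun whenever the source data changes, which copying by hand does not allow.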

Scrub
Scrubbing data makes the subsequent analysis much more efficient. “A simple analysis of clean data can be more productive than a complex analysis of noisy and irregular data.”
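A small sketch of what scrubbing can look like in practice. The field names `name` and `age` and the cleaning rules (trim whitespace, normalize capitalization, drop rows with missing or non-numeric values, remove duplicates) are assumptions chosen for illustration; real data will need its own rules.

```python
def scrub(rows):
    """Return cleaned copies of `rows`: trimmed names, integer ages, no duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        name = (row.get("name") or "").strip().title()
        raw_age = str(row.get("age", "")).strip()
        if not name or not raw_age.isdigit():
            continue  # drop rows with a missing name or a non-numeric age
        record = (name, int(raw_age))
        if record in seen:
            continue  # drop exact duplicates after normalization
        seen.add(record)
        cleaned.append({"name": name, "age": record[1]})
    return cleaned


messy = [
    {"name": " alice ", "age": "30"},
    {"name": "ALICE", "age": 30},      # duplicate once normalized
    {"name": "bob", "age": "n/a"},     # non-numeric age
    {"name": None, "age": "41"},       # missing name
    {"name": "carol", "age": " 25 "},
]
clean = scrub(messy)
average_age = sum(r["age"] for r in clean) / len(clean)
```

After scrubbing, only the Alice and Carol rows remain, and a simple analysis such as the average age becomes a one-line computation.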

Explore
“‘Exploratory’ in that no hypothesis is being tested, no predictions are attempted.” (George E. P. Box)

“The only way to find out what will happen when a complex system is disturbed is to disturb the system, not merely to observe it passively.” (Mosteller & Tukey)

Model
“All models are wrong, but some are useful.” (George E. P. Box) “The predictive power of a model lies in its ability to generalize in the quantitative sense: to make accurate quantitative predictions of data in new experiments.”
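One common way to check whether a model generalizes in this quantitative sense is to fit it on part of the data and measure its error on data it has not seen. The sketch below is only an illustration: the straight-line model, the helper names `fit_line` and `mean_squared_error`, and the toy data are all assumptions, not something the quoted authors prescribe.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and observed values."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [2 * x + 1 for x in xs]

# Fit on the first five points, then evaluate on the three held-out points.
model = fit_line(xs[:5], ys[:5])
held_out_error = mean_squared_error(model, xs[5:], ys[5:])
```

A low error on the held-out points suggests the model predicts new data well; a model that only fits the points it was trained on would show a much larger held-out error.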

Interpret
“The interpretability of a model lies in its ability to generalize in the qualitative sense: to suggest to the modeler which would be the most interesting experiments to perform next. In this step, domain expertise and intuition can be more important than technical or coding expertise.”

A good data scientist needs computer science and math skills as well as deep, wide-ranging curiosity, and should be innovative and guided by experience as well as data.
