“It is often said that a data scientist is someone who is better at software engineering than a statistician and better at statistics than any software engineer.” (Mike Becker)
A Data Science Taxonomy: Obtain, Scrub, Explore, Model, and iNterpret (Hilary Mason and Chris Wiggins)
Obtain data programmatically, not manually (scraping, APIs, etc.).
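A minimal sketch of scripted acquisition, assuming the source is a JSON-over-HTTP endpoint. The helper names fetch_json and fetch_all are hypothetical and stand for no particular service; only the standard library (urllib.request, json) is used. Because the download is code rather than a manual step, rerunning the script refreshes the data.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0):
    """Download a URL and decode its body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))


def fetch_all(urls):
    """Fetch several endpoints (e.g. pages of a hypothetical API) into one list.

    Assumes each endpoint returns a JSON array of records.
    """
    records = []
    for url in urls:
        records.extend(fetch_json(url))
    return records
```

For offline testing, urlopen also accepts data: URLs, so a URL such as "data:application/json,%5B%7B%22id%22%3A1%7D%5D" can stand in for a real endpoint.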
Scrubbing data makes the subsequent analysis much more efficient. “A simple analysis of clean data can be more productive than a complex analysis of noisy and irregular data.”
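As an illustration of scrubbing, here is a small sketch that assumes a hypothetical CSV with name and age columns. It normalizes whitespace and capitalization, drops rows with missing or non-numeric fields, and removes duplicates that only become visible after normalization. The rules are example choices, not a general recipe.

```python
import csv
import io


def scrub(rows):
    """Return cleaned records from raw CSV dicts (hypothetical 'name'/'age' schema)."""
    cleaned = []
    seen = set()
    for row in rows:
        # Normalize whitespace and capitalization; treat missing fields as empty.
        name = (row.get("name") or "").strip().title()
        age_text = (row.get("age") or "").strip()
        # Drop records missing a required field.
        if not name or not age_text:
            continue
        # Drop records whose age is not an integer.
        try:
            age = int(age_text)
        except ValueError:
            continue
        # Drop duplicates that only appear once values are normalized.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = io.StringIO(
    "name,age\n"
    "  alice ,34\n"
    "Alice,34\n"
    "bob,\n"
    "carol,unknown\n"
    "dave,29\n"
)
records = scrub(csv.DictReader(raw))
print(records)  # [{'name': 'Alice', 'age': 34}, {'name': 'Dave', 'age': 29}]
```

Five messy input rows reduce to two clean records, which any later analysis can consume without special cases.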
‘“exploratory” in that no hypothesis is being tested, no predictions are attempted’ (George E. P. Box)
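Exploration in this sense can be as simple as describing a sample and looking at its shape. The sketch below computes summary statistics with Python's statistics module and draws a plain-text histogram. The sample values are made up for illustration, and no hypothesis is tested.

```python
import statistics
from collections import Counter


def summarize(values):
    """Describe a numeric sample without testing any hypothesis."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def text_histogram(values, bin_width):
    """Count values in bins [start, start + bin_width) and draw one bar per bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return [
        f"[{start}, {start + bin_width}) {'#' * counts[start]}"
        for start in sorted(counts)
    ]


ages = [23, 25, 31, 34, 35, 38, 41, 47, 52, 29]  # made-up sample
print(summarize(ages))
print("\n".join(text_histogram(ages, 10)))
```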
“The only way to find out what will happen when a complex system is disturbed is to disturb the system, not merely to observe it passively.” (Mosteller & Tukey)
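A toy simulation can make the quote concrete. The system here is entirely an illustrative assumption: a hidden factor Z drives both X and Y, and X has no causal effect on Y. Passive observation shows a strong association between X and Y. Disturbing the system by assigning X with a coin flip, independent of Z, shows that the association disappears.

```python
import random


def simulate(n, intervene, seed=0):
    """Return P(Y=1 | X=1) - P(Y=1 | X=0) estimated from n simulated units.

    Hypothetical system: hidden Z drives both X and Y; X never affects Y.
    """
    rng = random.Random(seed)
    totals = {0: 0, 1: 0}
    hits = {0: 0, 1: 0}
    for _ in range(n):
        z = rng.random() < 0.5
        if intervene:
            # Disturb the system: assign X by coin flip, ignoring Z.
            x = int(rng.random() < 0.5)
        else:
            # Passive observation: X mostly follows the hidden factor Z.
            x = int(rng.random() < (0.9 if z else 0.1))
        # Y depends only on Z, never on X.
        y = rng.random() < (0.8 if z else 0.2)
        totals[x] += 1
        hits[x] += y
    return hits[1] / totals[1] - hits[0] / totals[0]


print(f"observed difference:   {simulate(100_000, intervene=False):+.3f}")
print(f"intervened difference: {simulate(100_000, intervene=True):+.3f}")
```

Under these assumptions, the observed difference is close to 0.74 - 0.26 = 0.48, while the intervened difference is close to zero. Only the intervention reveals that X does nothing to Y.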
“All models are wrong, but some are useful.” (George E. P. Box) “The predictive power of a model lies in its ability to generalize in the quantitative sense: to make accurate quantitative predictions of data in new experiments.”
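One way to check generalization in the quantitative sense is to score a model on data it was not fitted to. The sketch below assumes a hypothetical ground truth, y = 2x + 1 plus Gaussian noise, and uses a freshly generated held-out set to stand in for "new experiments". It compares a least-squares line with a 1-nearest-neighbour lookup. The lookup is chosen here as an example of a model that memorizes its training data.

```python
import random


def make_data(n, rng):
    """Hypothetical ground truth: y = 2x + 1 plus Gaussian noise (sd = 1)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns a predict function."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b


def fit_lookup(xs, ys):
    """Memorize the training set; predict the y of the nearest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(predict, xs, ys):
    """Mean squared error of predict over the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(42)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)  # stands in for "new experiments"

for name, fit in [("line", fit_line), ("lookup", fit_lookup)]:
    model = fit(train_x, train_y)
    print(f"{name:>6}: train MSE {mse(model, train_x, train_y):.2f}, "
          f"test MSE {mse(model, test_x, test_y):.2f}")
```

The lookup has zero training error but does worse on the held-out data, because it reproduces the training noise. The simpler, "wrong" line predicts new data better.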
“The interpretability of a model lies in its ability to generalize in the qualitative sense: to suggest to the modeler which would be the most interesting experiments to perform next. In this step, domain expertise and intuition can be more important than technical or coding expertise.”
A good data scientist needs computer science and math skills as well as deep, wide-ranging curiosity, and should be innovative and guided by experience as well as by data.