The self-study plan
Tutor: I would aim to hire a tutor with experience both in data science and in teaching data science or a similar technical topic.
Learners: I would also aim to find one to six other learners to join me in the program, to split the cost of the tutor and to work with each other for most of the day.
Curriculum: I would start with this curriculum, with a focus on NLP and time-series biodata.
The value of “Doing It Yourself”
This community values autodidacticism, so there may be a slight bias towards the romantic appeal of “doing better than the bootcamp through clever self-scholarship.” Historically, I prefer independent learning and hacking through things on my own. My goal is to get both the practical knowledge of data science and the “learning how to learn” experience.
Conditions necessary for the self-study plan to work
High-quality human feedback. A tutor who: (1) has sufficient background in statistics / programming / data science, (2) has some kind of teaching/communication experience, (3) is willing to commit 10-20 hours / week.
Study Partner(s). One to five other “learners” who: (1) have a similar level of technical background, (2) want to spend at least 5-10 hours / week working together.
Concrete project descriptions and deadlines
(with responses from Sebastian Benthall, PhD Candidate, UC Berkeley School of Information)
1) What would you add or change to the above plan for self-study?
My understanding is that industry positions are not only interested in understanding a variety of machine learning concepts and techniques. They are also interested in both: (1) the ability to correctly frame and address research questions, and (2) the ability to write maintainable software code in a team. I’ve heard it said that there is a big difference between data science aimed at people making decisions and data science aimed at building autonomous systems.
The former will need more emphasis on visualization and communication of results, and perhaps more methods for determining causal relations from observational data. The latter is going to be more about developing systems for real-time control or anomaly detection.
Depending on who or what you intend to eventually work for, you might want to focus your energy in one way or another.
Do you know people who could be potential “learners” or “teachers” for a self-study data science program? Please connect us!
Tons. I’m a teacher, for example. Consider us connected.
If you have successfully completed a self-study in a technical subject, why did the thing you did work?
How can we estimate the value of designing and executing a self-study program, compared to the bootcamp’s more frequent feedback from subject experts?
Do you have creative suggestions for how to build “stakes” into the program? What challenge could I commit to doing 4 months from now that would require a significant understanding of data science?
This may not be the sort of thing that interests you, but what I’ve been doing with my data science apprentices is teaching them data science by coaching them in participation in an open source software project:
The project harvests communications data available on the web (for example, email; I haven’t run this mailing list through it yet, but would like to, and that could be a cool project for you if you are interested) and prepares it for analysis as time series, text, and network data. My apprentices have been working with this very real data to derive insights and support research into the dynamics of collaboration and communication on-line.
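To make the time-series part of that pipeline concrete, here is a minimal sketch (not the project’s actual code) of how mailing-list messages might be bucketed into a monthly message-count series using only Python’s standard library. The sample Date headers are invented for the example:

```python
from email.utils import parsedate_to_datetime
from collections import Counter

# Hypothetical Date headers pulled from a mailing-list archive.
date_headers = [
    "Mon, 06 Jan 2014 10:00:00 +0000",
    "Tue, 14 Jan 2014 15:30:00 +0000",
    "Wed, 05 Feb 2014 09:15:00 +0000",
]

def monthly_counts(headers):
    """Bucket messages by (year, month) to form a simple time series."""
    counts = Counter()
    for h in headers:
        dt = parsedate_to_datetime(h)  # parse the RFC 2822 date string
        counts[(dt.year, dt.month)] += 1
    return dict(counts)

print(monthly_counts(date_headers))
# {(2014, 1): 2, (2014, 2): 1}
```

In a real harvest, the headers would come from parsing an mbox archive (e.g. with Python’s `mailbox` module) rather than a hard-coded list.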
I check their work by reviewing their pull requests. Slowly, we’re building an open source community around the project. The idea is that the project will be able to process the data that the community building it generates. One of my research goals is to use the project to understand how to rationally design social organizations.
If participating in that sort of project seems ‘high stakes’ to you, I’d be happy to talk to you about how to get involved over the summer.
Zipfian vs Metis comparison
Satvik Beri response
1) What is your goal? (Goal-factor this; it is central to the cohesion of my curriculum.)
If your goal is primarily to get a relatively fun job doing numerical stuff, then the right path is generally to pick up just a fraction of the skills in the nanodegree, get a job, and do as much on-the-job learning as possible. On the other hand, if your goal is to do serious research, then you need much more than 12 weeks’ worth of study.
2) How Satvik learned
I started by reading one data-sciencey textbook (Elements of Statistical Learning) and coding up everything I could in my free time. I told lots of people that I was interested in a job doing Machine Learning and Marketing, preferably in Lisp, and that got some attention, partially because people who want to work in Lisp tend to be pretty unusual :). I got a job before I could write a logistic regression on my own, and did the Andrew Ng Machine Learning course after work each day, while also writing an internal blog on what I learned. That got me some basic theory, and my applied skills mostly skyrocketed by doing Kaggle contests, where I had a chance to apply my work, get very quick feedback, and learn Python. I think the main reason this all worked so well was because it was a job-related skill and I constantly had people to talk to.
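Since “writing a logistic regression on my own” is used here as a milestone, it may help to see how small that milestone actually is. Below is a toy sketch of logistic regression trained by stochastic gradient descent in plain Python; the dataset and hyperparameters are invented for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return the predicted probability of the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy separable data: label 1 when the single feature is positive.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(predict(w, b, [1.5]))   # probability well above 0.5
print(predict(w, b, [-1.5]))  # probability well below 0.5
```

In practice you would reach for a library implementation (e.g. scikit-learn’s `LogisticRegression`), but being able to write the update rule by hand is roughly the level of theory the Andrew Ng course covers.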
As a challenge, I highly recommend trying to place within the top 10% in a Kaggle contest. It’s really good applied experience that will ingrain a lot of skills you don’t quite get in theoretical courses, while still letting you focus mostly on the machine learning aspect. In fact, I basically recommend spending a few hours a week on Kaggle contests even before you’ve taken any theoretical courses.
I’d be happy to discuss tutoring, especially with other EAs. I’ve also done a lot of work helping people get good jobs, negotiate their salaries, and earn more money, and would be happy to talk about that.