A life in data, a handbook for data scientists
By Gabey Goh July 13, 2015
- The Data Science Handbook based on interviews with 25 experts
- These data scientists share their insights, stories, and advice
MANY pursuits begin with a question.
For Max Song, William Chen, Carl Shan and Henry Wang, that first question came from a friend pursuing a doctorate in physics at MIT (the Massachusetts Institute of Technology). This was a highly intelligent and educated individual disgruntled about his career prospects.
“The number of open professorship positions has been monotonically decreasing year over year, and the path to an academic career is more like a lottery ticket,” said this friend. “What else can I do after I graduate?”
“We mentioned that there was a new career path called a Data Scientist. ‘Yeah, but how do I get started in this field?’ he asked. We didn’t know,” the four-man team told Digital News Asia (DNA) via email.
At the time, all four were studying statistics, mathematics and machine learning, and each was independently being drawn to the field of data science.
They saw that while the field was receiving increasing media attention, there was little clear guidance on how to begin a career in data science, or on what a data scientist did on a day-to-day basis.
“We were young, curious, just beginning our own journeys to do data science and filled with questions. We thought that if we could make a book that answered our own questions, it would also create something valuable for others,” the team said.
The Data Science Handbook (pic) project started in August 2013, but the desire to write a book had been with the group for a very long time.
“Growing up, we were inspired by the story of Ben Franklin’s Almanac, where he gathered and organised the wisdom of his time to share with his countrymen. We wanted to do something similar for our generation.”
One member of the team, Shan, had completed a digital book on how to be a product manager that became popular within its niche with more than 6,000 readers – giving them confidence that a digital book project on the topic of data science was feasible.
The lack of information about the emerging field of data science was also having a real-world economic impact, creating a supply-demand mismatch.
On the one hand, there were thousands of smart, quantitatively trained graduate students every year struggling to find jobs in the academic world – and on the other hand, there were companies hungry for data and analytics expertise that couldn’t find enough people to hire.
The path less taken
The team quickly realised the many parallels their project had with a startup, from identifying a market need for a focused resource that guides people to creating a product that meets that need.
“These parallels aren’t just by coincidence. In fact, when we were still originally considering publishing with a traditional publishing house, we had to fill out a ‘Book Proposal’ that looked suspiciously similar to a business plan, complete with market sizing, user personas and execution timelines,” they said.
While many startups and teams start locally and later try to expand internationally, the handbook was a global effort from the start, with the team working asynchronously over nights and weekends, across three continents and four time zones, for large swaths of the journey.
There isn’t even a group photo of them yet, as this summer will mark the first time all four will be in the same place at the same time since work began on The Data Science Handbook.
“Throughout this journey, we had a chance to learn about negotiations with traditional book publishers, how to draw and manage crowd-sourced resources such as oDesk and 99Designs. We gave talks in Amsterdam, Paris, and California. We have had our blog posts translated into Chinese.
“We learnt how to persevere through setbacks and failures, navigate through the work and the communication challenges of a globally distributed team, respond to the launch of competitors, and navigate our own product launch and get our first taste of internet virality … what a rush,” they said.
And these were lessons being learnt in the background as each team member embarked on his own journey of carving out a data-driven career.
Song joined topological machine learning startup Ayasdi, backed by the US Defence Advanced Research Projects Agency (Darpa), as its youngest data scientist by a decade, and worked on high-stakes predictive models for a Fortune 10 company.
Shan became a Data Science for Social Good (DSSG) Fellow, and worked with US President Barack Obama’s former chief scientist on applying machine learning and data science to pressing policy issues.
Chen graduated with a Master's in Statistics and joined Quora, where he has built a following of more than 17,000 by writing about resources for statistics and giving out data science advice.
Wang was an entrepreneur in Startup Chile before joining a New Zealand sovereign wealth fund, building mathematical models to inform hundred-million-dollar investment decisions with a focus on alternative energy technologies.
Working on this book proved to be a fortuitous arrangement, with the team collecting advice over a weekend interview and then applying it at work on Monday morning. After hands-on experience, they would go on to another interview, and ask more targeted and refined questions.
As they matured through their own experiences, the team started talking to more experienced individuals — such as the heads of data science at Uber, LinkedIn, Airbnb, Khan Academy and Cloudera.
Discussions grew from advice on crossing the academic chasm to higher considerations: how to build and manage a data science team, how to work effectively with engineering and product teams, and where the world was going next.
Leveraging data, pay what you want
These discoveries form the core of The Data Science Handbook, a compilation of in-depth interviews with 25 data scientists who share their insights, stories, and advice.
The team is quick to point out that it is not a technical guide to the fledgling profession, but a resource that addresses common career questions: from what to look for when evaluating data science roles at companies, to identifying the mindsets, techniques and skills that distinguish the good from the great.
And true to form, the team of practising statisticians and data scientists chose a rigorous experimental approach when deciding how to price the book.
The team had heard of Pay-What-You-Want (PWYW) models, where readers can purchase the book for any amount they want, or for any amount above a set threshold. However, the prevailing concern with such a model is that only a small percentage of people will contribute, and that those who do will opt for meagre amounts.
“On the other hand, we also felt that PWYW was an exciting thing to try. A PWYW model would allow us to get the book out to as many people as possible without putting the book behind a paywall.
“We also had an inkling that this experimental pricing model would increase exposure for our book,” the team said.
So the team conducted a large-scale pricing experiment with 5,700 readers, and the results came as a surprise.
“Much to our surprise, many of our readers who got this variant paid much more than $0. In fact, our average purchase price was about US$9. Some readers even paid US$30,” they said.
[Read the full breakdown and report on the pricing experiment here].
The entire process of creating The Data Science Handbook, from conception to digital publication, took about 18 months, and according to the team, work is still not done.
“The funny thing about a book is that it may not end once you hit ‘Publish.’ It’s only over when you choose for it to be over.
“For example, we have been busy at work making a physical book and have also been thinking of other complementary things we can create, such as a community forum,” said the team.
That physical book has become reality, and is now available via Amazon.
What’s next for the team?
“After that, much is in the air. Though one goal that we all agree on is that we want to focus on taking all of the wisdom and advice we have accumulated, and really apply it to ourselves to become the most effective and competent data scientists we can be!”
Next Up: A Q&A with the DSH team on data science, data scientists, and what the future holds