The 9 Pitfalls of Data Science

ISBN: 9780198844396

Gary Smith; Jay Cordes
272 pages
129 x 196 mm

Data science has never had more influence on the world. Large companies are now seeing the benefit of employing data scientists to interpret the vast amounts of data that now exist. However, the field is so new and is evolving so rapidly that the analysis produced can be haphazard at best.
The 9 Pitfalls of Data Science shows us real-world examples of what can go wrong. Written to be an entertaining read, this invaluable guide investigates the all too common mistakes of data scientists - who can be plagued by lazy thinking, whims, hunches, and prejudices - and indicates how they have been at the root of many disasters, including the Great Recession.
Gary Smith and Jay Cordes emphasise how scientific rigor and critical thinking skills are indispensable in this age of Big Data, as machines often find meaningless patterns that can lead to dangerous false conclusions. The 9 Pitfalls of Data Science is loaded with entertaining tales of both successful and misguided approaches to interpreting data, from grand successes to epic failures. These cautionary tales will not only help data scientists be more effective, but also help the public distinguish between good and bad data science.


1 Pitfall #1: Using Bad Data
2 Pitfall #2: Putting Data Before Theory
3 Pitfall #3: Worshiping Math
4 Pitfall #4: Worshiping Computers
5 Pitfall #5: Torturing Data
6 Pitfall #6: Fooling Yourself
7 Pitfall #7: Confusing Correlation with Causation
8 Pitfall #8: Being Surprised By Regression Toward the Mean
9 Pitfall #9: Doing Harm
10 Case Study: The Great Recession


Gary Smith is the Fletcher Jones Professor of Economics at Pomona College. He received his Ph.D. in Economics from Yale University and was an Assistant Professor there for seven years. He has won two teaching awards and written (or co-authored) more than 80 academic papers and twelve books, including Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie With Statistics, What the Luck? The Surprising Role of Chance in Our Everyday Lives, and Money Machine: The Surprisingly Simple Power of Value Investing. His research has been featured by Bloomberg Radio Network, CNBC, The Brian Lehrer Show, Forbes, The New York Times, Wall Street Journal, Motley Fool, Newsweek, and BusinessWeek.
Jay Cordes is a data scientist who enjoys tackling challenging problems, including how to guide future data scientists away from the common pitfalls he saw in the corporate world. He is a recent graduate of UC Berkeley's Master of Information and Data Science (MIDS) program and graduated from Pomona College with a mathematics major. He has worked as a software developer and a data analyst and was also a strategic advisor and sparring partner for the winning pokerbot in the 2007 AAAI Computer Poker Competition world championship.