Data Science From Scratch
Data science, also known as the sexiest job of the century, has become a dream job for many of us. But for some it feels like a difficult maze and they don’t know where to start. If you are one of them, keep reading.
If you have a computing background, you are probably already familiar with programming in Python, in which case you can skip this step. But if you have not yet been exposed to the fun of coding, you should start by learning Python. It is widely regarded as one of the easier programming languages to learn, and it is used for general software development as well as for data analysis. Perhaps the best tutorial for this learning path is Data Science From Scratch by Joel Grus (second edition, published in 2019 by O’Reilly).
If you are a complete beginner, you can search for free online tutorials that cover the basics of Python. I am listing some links where you can learn Python on your own in a short period of time. Try them out and choose the one that suits you.
Learn statistics and math
Data science is the ability to analyze data and generate useful and actionable insights. To do this, you must have a basic knowledge of statistics and mathematics. I’m not demanding that you be a great statistician, but you should know the basics to understand important things like how data is distributed and how algorithms work. Let’s see what you need to learn.
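To make "how data is distributed" concrete, here is a minimal sketch of a few descriptive statistics written in plain Python. The function names and the sample list `daily_minutes` are made up for illustration and are not taken from the book. The variance uses the sample (n - 1) convention, which is an assumption on my part.

```python
import math
from typing import List


def mean(xs: List[float]) -> float:
    """Arithmetic mean: the sum divided by the number of values."""
    return sum(xs) / len(xs)


def median(xs: List[float]) -> float:
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def variance(xs: List[float]) -> float:
    """Sample variance: average squared deviation from the mean, dividing by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("variance needs at least two values")
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def standard_deviation(xs: List[float]) -> float:
    """Square root of the sample variance, in the same units as the data."""
    return math.sqrt(variance(xs))


# Hypothetical data: one extreme value pulls the mean far away from the median.
daily_minutes = [1, 2, 3, 4, 100]

print("mean:", mean(daily_minutes))
print("median:", median(daily_minutes))
print("standard deviation:", standard_deviation(daily_minutes))
```

Comparing the mean with the median on data like this is a quick way to notice a skewed distribution, which is the kind of basic intuition the paragraph above is asking for.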
Data Science from Scratch: First Principles with Python
It has been almost exactly four years since the first edition came out, and in that time it has helped dozens of people learn data science, Python, or possibly a combination of both.
However, the first edition used Python 2.7. And over time, I’ve felt more and more guilty about having a book published under my name that tells people to use Python 2. Because in 2021, you shouldn’t use Python 2. Stop using Python 2!
Quotes from Data Science from Scratch
“Just run: pip install ipython and then search the Internet for solutions to whatever cryptic error messages that causes.”
“This means that, where appropriate, we will dive into mathematical equations, mathematical intuition, mathematical axioms, and cartoon versions of big mathematical ideas.”
Data science libraries, frameworks, modules, and toolkits are great for data science, but they’re also a great way to immerse yourself in the discipline without really understanding data science. In this book, you’ll learn how many of the most basic data science tools and algorithms work by implementing them from scratch.
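As a rough illustration of what "implementing from scratch" can look like, here is a sketch of simple linear regression (fitting y = alpha + beta * x by least squares) using only built-in Python. It is not code from the book. The function name `least_squares_fit` and the toy data are assumptions chosen for this example.

```python
from typing import List, Tuple


def mean(xs: List[float]) -> float:
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def least_squares_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (alpha, beta) minimizing the squared error of y = alpha + beta * x.

    Closed-form solution: beta = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    and alpha = y_bar - beta * x_bar.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


def predict(alpha: float, beta: float, x: float) -> float:
    """Predicted y for a given x under the fitted line."""
    return alpha + beta * x


# Toy data generated from y = 2x + 1, so the fit should recover those coefficients.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

alpha, beta = least_squares_fit(xs, ys)
print("intercept:", alpha, "slope:", beta)
print("prediction at x = 10:", predict(alpha, beta, 10))
```

A library would normally do this in a single call. Writing out the sums by hand is only meant to show where the fitted numbers come from.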
We live in a world that is drowning in data. Websites track every click of every user. Your smartphone records your location and speed every second of every day.
“Quantified selfers” wear pedometers on steroids that constantly record their heart rate, exercise habits, diet and sleep patterns. Smart cars collect driving habits, smart homes collect lifestyle habits and smart marketers collect shopping habits. The Internet itself is a huge knowledge graph that contains (among other things) a huge cross-referenced encyclopedia; domain-specific databases on movies, music, sports scores, pinball machines, memes and cocktails; and too many government statistics (some almost true!) from too many governments to wrap your head around.
Buried in this data are answers to countless questions that no one has ever thought to ask. This book explains how to find them.