This week was all about statistics and learning more Python packages. It was a tough week, and we covered the topic that intimidated me the most: the math.
I actually grew up loving math, and I know I can understand it with enough focus and time spent studying. Still, it was a lot of grad-level content that we pretty much squeezed into a few days. There is no way to fully learn all the concepts in a week, and that is a common theme throughout this class (and probably most bootcamps). Additionally, several people in the class have PhDs in STEM (science, tech, engineering & math) and understand the math at a whole other level. It is definitely helpful to have classmates like that to learn from, but it also makes it hard to keep up in the exercises at times.
I suspect many out there who have thought about data science decided against it because of the math (if not the programming), and I can vouch for the fact that you will literally be looking at Greek letters and reading somewhat dense materials on statistical concepts. I know I’m not making this sound any better, but seriously, if you are already coding or thinking of taking on coding, you can take on the math.
I’m not an expert in it yet, but after this week, I can already pseudo-read those Greek equations that Wikipedia loves to use in math model examples, and I actually understand why we want to use distributions (to help describe unknown and random variables). It’s hard, and it was a week of massive frustration (head banging against a literal brick wall; they have them in our classroom). Still, sometimes that’s what you have to go through to get started, and there were breakthroughs this week.
If you do decide to take on Zipfian and/or pursue data science in any shape, I cannot say this enough: start studying stats and linear algebra, and sprinkle in a little calc. A couple of resources we are using are:
- Coursera Stats Course (join and you get access to videos)
- Probabilistic Programming & Bayesian Methods for Hackers
When I get to concepts I don’t understand in some of the materials we are reading, I switch over to Khan Academy videos. If I’m still struggling, I search for explanations that put it in a form that works for me, or I talk to someone in class. Despite the abundant online resources, a classroom environment like this can’t be beat for learning quickly.
Key Stats Concepts Covered:
- Uniform Distributions
- Bernoulli & Binomial Distribution
- Poisson Distribution
- Exponential Distribution
- Beta & Gamma Distribution
- Normal Distribution
- T Distribution
- Sampling Techniques
- Hypothesis Testing & Confidence Intervals
- Kolmogorov-Smirnov Test
- Frequentist A/B Testing
- Bayesian A/B Testing
- Markov Chain Monte Carlo Algorithm
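To make one of these concrete, here is a minimal sketch of Bayesian A/B testing using Beta distributions. It is my own illustration, not necessarily how the class exercise was written. It assumes a uniform Beta(1, 1) prior on each conversion rate, and the function name `prob_b_beats_a` and the visitor counts are made up for the example. Because a Beta prior combined with binomial data gives a Beta posterior, we can simply sample from both posteriors with NumPy and count how often B comes out ahead.

```python
import numpy as np


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Estimate P(rate_B > rate_A), assuming uniform Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    # Beta(1, 1) prior + binomial data -> Beta(1 + successes, 1 + failures)
    post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=draws)
    post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=draws)
    # Fraction of paired posterior draws in which B's rate exceeds A's
    return float(np.mean(post_b > post_a))


# Made-up counts, purely for illustration
p = prob_b_beats_a(conv_a=40, n_a=1000, conv_b=60, n_b=1000)
print(f"P(B beats A) is roughly {p:.3f}")
```

The nice part of this approach is that the answer reads as a direct probability ("how likely is it that B is better?") instead of a p-value.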
Key Python Packages/Tools Covered (New & Reviewed):
- NumPy – good for arrays and matrices
- Matplotlib – data visualization
- SciPy – statistical functions
- Pandas – data structure/storage & analysis
- PyMC – MCMC (Markov Chain Monte Carlo) functions
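To give a flavor of how a couple of these packages fit together, here is a small sketch that draws samples with NumPy and runs a one-sample Kolmogorov-Smirnov test with SciPy's `scipy.stats.kstest`. The sample sizes, seed, and distribution parameters are arbitrary choices of mine for illustration, not anything from the class materials.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two samples: one really is standard normal, the other is uniform on [-3, 3]
normal_sample = rng.normal(loc=0.0, scale=1.0, size=500)
uniform_sample = rng.uniform(low=-3.0, high=3.0, size=500)

# One-sample KS test: compare each sample against the standard normal CDF
ks_normal = stats.kstest(normal_sample, "norm")
ks_uniform = stats.kstest(uniform_sample, "norm")

print(f"normal sample:  D={ks_normal.statistic:.3f}, p={ks_normal.pvalue:.3f}")
print(f"uniform sample: D={ks_uniform.statistic:.3f}, p={ks_uniform.pvalue:.3g}")
```

The uniform sample should show a much larger KS statistic and a tiny p-value, meaning it is very unlikely to have come from a standard normal distribution.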
Shout out to Giovanna for helping me interpret the proof behind the computational Beta version of Bayesian A/B testing, and to Linda for the very relevant cartoon today. Next week is all about machine learning.