307 Curated List of Free Data Science Ebooks, Courses and Resources
Data Science and Artificial Intelligence are two of the most influential technologies today. Data Science makes use of Artificial Intelligence in its operations, but the two are not the same thing, even though many people treat contemporary Data Science as if it were AI. Data science involves attempting to solve complex problems with data; AI consists of developing algorithms to find solutions to these problems. Data science is related to AI but is not a subset of it.
This list covers data science materials ranging from basic introductions to in-depth analytical topics. You’ll also find various curricula for learning Data Science. Foundational in both theory and technologies, the Open Source Data Science Masters (OSDSM) breaks down the core competencies necessary for making use of data. Topics include Python, machine learning, statistics, databases, and linear algebra / programming, along with courses, tutorials and a long list of blogs which you can follow and refer to.
Linear Algebra & Programming
- Linear Algebra – Khan Academy / Videos
- Linear Programming (Math 407) – University of Washington / Course
- An Intuitive Guide to Linear Algebra – Better Explained / Article
- A Programmer’s Intuition for Matrix Multiplication – Better Explained / Article
- Vector Calculus: Understanding the Cross Product – Better Explained / Article
- Vector Calculus: Understanding the Dot Product – Better Explained / Article
Convex Optimization
- Convex Optimization / Boyd – Stanford / Lectures
Statistics
- Think Stats: Probability and Statistics for Programmers – Digital
- Think Bayes – Digital
Computing
- Get your environment up and running with the Data Science Toolbox
Algorithms
- Algorithms Design & Analysis I – Stanford / Coursera
Distributed Computing Paradigms
- See Intro to Data Science – UW / Lectures on MapReduce
- Intro to Hadoop and MapReduce – Cloudera / Udacity Course. Includes select free excerpts of Hadoop: The Definitive Guide.
Databases
- Introduction to Databases – Stanford / Online Course
- SQL School – Mode Analytics / Tutorials
- SQL Tutorials – SQLZOO / Tutorials
Data Mining
Data Design
- Tidy Data in Python – Focuses on one aspect of cleaning up data, tidying data: structuring datasets to facilitate analysis.
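As a rough illustration of what "tidying" means here, the sketch below reshapes a small "wide" table (one column per measurement) into a "long" one (one row per observation). It uses only the Python standard library; the `melt` helper, the column names, and the sample records are hypothetical and are not taken from the linked tutorial.

```python
# Hypothetical "wide" table: one row per student, one column per exam.
wide_rows = [
    {"student": "ana", "exam1": 90, "exam2": 85},
    {"student": "ben", "exam1": 72, "exam2": 88},
]


def melt(rows, id_key, var_name="variable", value_name="value"):
    """Return one record per (id, variable) pair, i.e. the 'long' layout."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every output row
            tidy.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy


tidy_rows = melt(wide_rows, "student", var_name="exam", value_name="score")
for record in tidy_rows:
    print(record)
```

In the long layout each row holds a single observation, which tends to make filtering, grouping, and plotting simpler. Libraries such as pandas provide a comparable reshaping operation.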
Machine Learning
Foundational & Theoretical
- Machine Learning – Ng Stanford / Coursera
- A Course in Machine Learning – UMD / Digital Book
- The Elements of Statistical Learning / Stanford – Digital
- Machine Learning – Caltech / edX
Practical
- Machine Learning for Hackers – IPYNB / digital book
- Intro to scikit-learn, SciPy2013 – YouTube tutorials
Probabilistic Modeling
- Probabilistic Programming and Bayesian Methods for Hackers – GitHub / Tutorials
- Probabilistic Graphical Models – Stanford / Coursera
Deep Learning (Neural Networks)
- Neural Networks – Andrej Karpathy / Python Walkthrough
- Deep Learning for Natural Language Processing CS224d – Stanford
Social Network & Graph Analysis
- Social and Economic Networks: Models and Analysis – Stanford / Coursera
Natural Language Processing
- From Languages to Information / Stanford CS124
- NLP with Python (NLTK library)
- How to Write a Spelling Corrector / Norvig (Tutorial)
- Big Data Analysis with Twitter – UC Berkeley / Lectures
Analysis in Python
- Data Analysis in Python – Tutorial
Theoretical Courses / Design & Visualization
- Data Visualization – University of Washington / Slides & Resources
- Rice University’s Data Viz class – Rice University / Slides
Practical Visualization Resources
- D3 Library / Scott Murray – Blog / Tutorials
Other Related Posts
- More Data Science Ebooks & Resources
- 126 Free Artificial Intelligence (AI) Courses, Ebooks, Videos and Papers – 2021
This is a curated list of free Artificial Intelligence (AI) courses, ebooks, videos and papers pointing towards interesting directions and topics that you may be interested in.
- Other Programming Posts
Python (Learning)
- Python – Class / Google
- Think Python – Digital
Python (Libraries)
- Command Line Install Script – For Scientific Python Packages.
- numpy Tutorial / Stanford CS231N – This course expects that many of you will have some experience with Python and numpy; for the rest of you, this section will serve as a quick crash course on both the Python programming language and its use for scientific computing.
- Pandas Cookbook – Data structure library.
Data Structures & Analysis Packages
- Pandas Tutorials – Flexible and powerful data analysis / manipulation library with labeled data structures, statistical functions, etc.
Machine Learning Packages
- scikit-learn – Tools for Data Mining & Analysis.
Networks Packages
- networkx – Network Modeling & Viz
Statistical Packages
- PyMC – Bayesian Inference & Markov Chain Monte Carlo sampling toolkit.
- Statsmodels – Python module that allows users to explore data, estimate statistical models, and perform statistical tests.
- PyMVPA – Multivariate Pattern Analysis in Python.
Natural Language Processing & Understanding
- NLTK – Natural Language Toolkit.
- Gensim – Python library for topic modeling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.
Data APIs
- twython – Python wrapper for the Twitter API.
Visualization Packages
- matplotlib – Well-integrated with analysis and data manipulation packages like numpy and pandas
- Seaborn – A high-level statistical visualization package built on top of matplotlib
IPython Data Science Notebooks
- Data Science in IPython Notebooks – Linear Regression, Logistic Regression, Random Forests, K-Means Clustering.
Capstone Project
- Capstone Analysis of Your Own Design – Quora’s Idea Compendium.
- Analyze your LinkedIn Network – Generate & Download Adjacency Matrix.
Resources
Read
- DataTau – The “Hacker News” of Data Science.
- Wikipedia – The free encyclopedia.
- /r/MachineLearning – Machine learning subreddit.
Watch & Listen
Online Courses
- Metacademy – Search for a concept you want to learn.
- Coursera – Online university courses.
- Wolfram Alpha – The smart number and info cruncher.
- Khan Academy – High quality, free learning videos.
Lectures & Learning
Intro to Data Science
- https://www.youtube.com/watch?v=rpwZ_i-9U0o by Michael Manoochehri (Materials) – DataEDGE 2013
- https://www.youtube.com/watch?v=Zdh3p4EKLeQ by Buck Woody – The DRIVE/conference 2013
Machine Learning
- CS 229: Machine Learning by Dr. Andrew Ng (Materials) – Stanford Lecture series 2008
- Learning from Data by Dr. Yaser S. Abu-Mostafa (Materials) – Caltech
- Machine Learning with Scikit-Learn (I) by Jake VanderPlas (Materials) – PyCon 2015
Statistical Methods
- Statistical Thinking for Data Science by Chris Fonnesbeck – SciPy 2015
Intro to the Hadoop Ecosystem
- Intro to Hadoop by Bill Graham (Materials) – Part of Berkeley iSchool course Info290: Analyzing Big Data With Twitter, 2012
HDF5
- HDF5 is Eating the World by Andrew Collette – SciPy 2015
- Introduction to HDF5 by Quincey Koziol – 2014
Python-based analysis
- My Data Journey with Python by Wes McKinney (Materials) – SciPy 2015
- Hands-on Data Analysis with Python by Sarah Guido – PyCon 2015
- Analyzing and Manipulating Data with Pandas by Jonathan Rocher (Materials) – SciPy 2015 Tutorial
- Machine Learning with Scikit Learn / Part 2 by Andreas Mueller & Kyle Kastner (Materials) – SciPy 2015 Tutorial
R-based analysis
- Introduction to Data Science with R – Data Analysis by David Langer (Materials) – 2014
Data Science Blogs
- A Blog From a Human-engineer-being
- Adit Deshpande
- Advanced Analytics & R
- Adventures in Data Land
- Ahmed BESBES
- Ahmed El Deeb
- Airbnb Data blog
- Alex Perrier
- Algobeans | Data Analytics Tutorials & Experiments for the Layman
- Amazon AWS AI Blog
- Amit Chaudhary
- Analytics and Visualization in Big Data @ Sicara
- Analytics Vidhya
- Andreas Müller
- Andrej Karpathy blog
- Andrew Brooks
- Andrey Kurenkov
- Andrey Vasnetsov
- Anton Lebedevich’s Blog
- Arthur Juliani
- Audun M. Øygard
- Avi Singh
- Beautiful Data
- Beckerfuffle
- Becoming A Data Scientist
- Ben Frederickson
- Berkeley AI Research
- Big-Ish Data
- Blog on neural networks
- Blogistic Regression
- blogR | R tips and tricks from a scientist
- Brain of mat kelcey
- Brilliantly wrong thoughts on science and programming
- Bugra Akyildiz
- Carl Shan
- Casual Inference
- Chris Stucchio
- Christophe Bourguignat
- Christopher Nguyen
- cnvrg.io blog
- colah’s blog
- Daniel Forsyth
- Daniel Homola
- Data Blogger
- Data Double Confirm
- Data Miners Blog
- Data Mining Research
- Data Mining: Text Mining, Visualization and Social Media
- Data School
- Data Science @ Facebook
- Data Science 101
- Data Science Dojo Blog
- Data Science Insights
- Data Science Notebook
- Data Science Tutorials
- Dataaspirant
- Dataclysm
- DataGenetics
- Dataiku
- DataKind
- Datanice
- Dataquest Blog
- DataRobot
- Datascienceblog.net
- Datascope
- DatasFrame
- David Mimno
- David Robinson
- Dayne Batten
- Deep and Shallow
- Deepdish
- Delip Rao
- DENNY’S BLOG
- Dimensionless
- Distill
- District Data Labs
- Diving into data
- Domino Data Lab’s blog
- Dr. Randal S. Olson
- Drew Conway
- Dustin Tran
- Eder Santana
- Edwin Chen
- EFavDB
- Eigenfoo
- Emilio Ferrara, Ph.D.
- Entrepreneurial Geekiness
- Eric Jonas
- Eric Siegel
- Erik Bern
- ERIN SHELLMAN
- Ethan Rosenthal
- Eugenio Culurciello
- Fabian Pedregosa
- Fast Forward Labs
- FlowingData
- Full Stack ML
- Garbled Notes
- Grate News Everyone
- Greg Reda
- i am trask
- I Quant NY
- I’m a bandit
- inFERENCe
- Insight Data Science
- Ira Korshunova
- Jason Toy
- Java Machine Learning and DeepLearning
- jbencook
- Jesse Steinweg-Woods
- John Myles White
- Jonas Degrave
- Jovian
- Joy Of Data
- Julia Evans
- jWork.ORG.
- Kavita Ganesan’s NLP and Text Mining Blog
- KDnuggets
- Keeping Up With The Latest Techniques
- Kenny Bastani
- Kevin Davenport
- kevin frans
- korbonits | Math ∩ Data
- Large Scale Machine Learning
- LATERAL BLOG
- Lazy Programmer
- Learn Analytics Here
- LearnDataSci
- Learning With Data
- Life, Language, Learning
- Locke Data
- Louis Dorard
- M.E.Driscoll
- Machine Learning
- Machine Learning (Theory)
- Machine Learning Mastery
- Machine Learning, etc
- Machine Learning, Maths and Physics
- Machined Learnings
- MAPPING BABEL
- MAPR Blog
- MAREK REI
- MARGINALLY INTERESTING
- Mark White
- Math ∩ Programming
- Matthew Rocklin
- Mic Farris
- Mike Tyka
- Mirror Image
- Mitch Crowe
- MLWave
- MLWhiz
- Models are illuminating and wrong
- Moody Rd
- Natural language processing blog
- Neil Lawrence
- Neptune Blog: in-depth articles for machine learning practitioners
- Nikolai Janakiev
- no free hunch
- Nuit Blanche
- Number 2147483647
- On Machine Intelligence
- Opiate for the masses: Data is our religion.
- p-value.info
- Pete Warden’s blog
- Peter Laurinec – Time series data mining in R
- Plotly Blog
- Probably Overthinking It
- Prooffreader.com
- Publishable Stuff
- PyImageSearch
- Pythonic Perambulations
- ℚuantitative √ourney
- quintuitive
- R and Data Mining
- R-bloggers
- R2RT
- Ramiro Gómez
- Randy Zwitch
- RaRe Technologies
- Reinforcement Learning For Fun
- Revolutions
- Rinu Boney
- Robert Chang
- Rocket-Powered Data Science
- Sachin Joglekar’s blog
- samim
- Sebastian Raschka
- Sebastian Ruder
- Sebastian’s slow blog
- Simply Statistics
- Springboard Blog
- Statistical Modeling, Causal Inference, and Social Science
- Stitch Fix Tech Blog
- Stochastic R&D Notes
- StreamHacker
- Subconscious Musings
- TechnoCalifornia
- TEXT ANALYSIS BLOG | AYLIEN
- The Angry Statistician
- The Clever Machine
- The Data Camp Blog
- The Data Incubator
- The Data Science Lab
- The Data Science Swiss Army Knife
- THE ETZ-FILES
- The Science of Data
- The Shape of Data
- The unofficial Google data science Blog
- Tombone’s Computer Vision Blog
- Tommy Blanchard
- Towards Data Science
- Trevor Stephens
- Trey Causey
- UW Data Science Blog
- Victor Zhou
- Wellecks
- Wes McKinney
- While My MCMC Gently Samples
- WildML
- Will do stuff for stuff
- Will wolf
- WILL’S NOISE
- William Lyon
- Win-Vector Blog
- Yanir Seroussi
- Zac Stewart
- 大トロ