Comprehensive learning path – Data Science in Python

A good article on how to learn Python and use it for data science, data analysis, and machine learning.

to a Kaggler on Python

So, you want to become a data scientist, or maybe you are already one and want to. You have landed at the right place. The aim of this page is to provide a comprehensive learning path to people new to Python for data analysis. This path provides a comprehensive overview of the steps you need to learn to use Python for data analysis. If you already have some background, or don’t need all the steps, feel free to adapt your own paths and let us know how you made changes in the path.

Step 0: Warming up

Before starting your journey, the first question to answer is:

Why use Python?

or

How would Python be useful?

Watch the first 30 minutes of this talk from Jeremy, Founder of DataRobot, at PyCon 2014, Ukraine, to get an idea of how useful Python could be.

Step 1: Setting up your machine

Now that you have made up your mind, it is time to set up your machine. The easiest way to proceed is to just download Anaconda from Continuum.io. It comes packaged with most of the things you will ever need. The major downside is that you will need to wait for Continuum to update their packages, even when there might be an update available to the underlying libraries. If you are a starter, that should hardly matter.

If you face any challenges in installing, you can find more detailed instructions for various OS here.

Step 2: Learn the basics of Python language

You should start by understanding the basics of the language, libraries and data structures. The Python track from Codecademy is one of the best places to start your journey. By the end of this course, you should be comfortable writing small scripts in Python and also understand classes and objects.

Specifically learn: Lists, Tuples, Dictionaries, List comprehensions, Dictionary comprehensions
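As a rough illustration of the structures listed above, here is a minimal sketch. The variable names and values are made up for this example and do not come from any of the linked courses.

```python
# Core built-in data structures (all values are made-up examples)
scores = [72, 88, 95, 61]              # list: ordered and mutable
point = (3.5, -1.2)                    # tuple: ordered and immutable
ages = {"ann": 31, "bob": 27}          # dictionary: key -> value mapping

# List comprehension: keep scores of at least 70, scaled to the 0-1 range
passed = [s / 100 for s in scores if s >= 70]

# Dictionary comprehension: map each name to the age next year
next_year = {name: age + 1 for name, age in ages.items()}

print(passed)      # [0.72, 0.88, 0.95]
print(next_year)   # {'ann': 32, 'bob': 28}
```

Comprehensions like these replace short loops that build a new list or dictionary, which is a pattern you will meet constantly when preparing data.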

Assignment: Solve the Python tutorial questions on HackerRank. These should get your brain thinking about Python scripting.

Alternate resources: If interactive coding is not your style of learning, you can also look at the Google Class for Python. It is a 2-day class series and also covers some of the parts discussed later.

Step 3: Learn Regular Expressions in Python

You will need to use them a lot for data cleansing, especially if you are working on text data. The best way to learn regular expressions is to go through the Google class and keep this cheat sheet handy.
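To give a feel for regular expressions in data cleansing, here is a small sketch using Python's standard `re` module. The input string is invented, and the email pattern is deliberately simple. It is an assumption for illustration, not a complete validator.

```python
import re

# Made-up messy text field
raw = "  Contact:  JOHN.DOE@Example.com , phone: (555) 123-4567  "

# Collapse runs of whitespace into one space and trim both ends
clean = re.sub(r"\s+", " ", raw).strip()

# Find something that looks like an email address (simplified pattern)
email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", clean)

# Keep only the digits of the text that follows "phone:"
digits = re.sub(r"\D", "", clean.split("phone:")[1])

print(clean)
print(email.group(0).lower())   # john.doe@example.com
print(digits)                   # 5551234567
```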

Assignment: Do the baby names exercise

If you still need more practice, follow this in data wrangling.

Step 4: Learn Scientific libraries in Python – NumPy, SciPy, Matplotlib and Pandas

This is where the fun begins! Here is a brief introduction to various libraries. Let’s start practicing some common operations.

Practice the NumPy tutorial thoroughly, especially NumPy arrays. This will form a good foundation for things to come.

Next, look at the SciPy tutorials. Go through the introduction and the basics, and do the remaining ones based on your needs.

If you guessed Matplotlib tutorials next, you are wrong! They are too comprehensive for our need here. Instead, look at this IPython notebook till Line 68 (i.e. till animations).

Finally, let us look at Pandas. Pandas provides DataFrame functionality (like R) for Python. This is also where you should spend good time practicing. Pandas would become the most effective tool for all mid-size data analysis. Start with a short introduction, 10 minutes to pandas. Then move on to a more detailed tutorial on pandas.

You can also look at Exploratory Data Analysis with Pandas and Data munging with Pandas.
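As a taste of the NumPy array operations worth practicing, here is a minimal sketch. The array contents are arbitrary, and it assumes NumPy is installed (it ships with Anaconda).

```python
import numpy as np

# Build a 2-D array: 3 rows (samples) x 4 columns (features)
a = np.arange(12).reshape(3, 4)

print(a.shape)         # (3, 4)
print(a[:, 1])         # second column: [1 5 9]
print(a[a % 2 == 0])   # boolean mask keeps only the even values

# Vectorised arithmetic: no explicit Python loop needed
col_means = a.mean(axis=0)     # mean of each column
centered = a - col_means       # broadcasting subtracts the means from every row
print(col_means)               # [4. 5. 6. 7.]
```

Slicing, boolean masks and broadcasting are the three ideas that carry over directly to Pandas.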
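For a sense of what exploration and munging look like in Pandas, here is a short sketch on a tiny invented table. The column names and values are assumptions for illustration and are not taken from the linked tutorials.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai"],
    "sales": [120.0, 90.0, np.nan, 150.0, 80.0],
})

# Exploration: table size and number of missing values
print(df.shape)                   # (5, 2)
print(df["sales"].isna().sum())   # 1

# Munging: fill the missing value with the column median
df["sales"] = df["sales"].fillna(df["sales"].median())

# Aggregation: total sales per city
totals = df.groupby("city")["sales"].sum()
print(totals)
```

Filling with the median is only one possible choice. Whether to fill, drop or flag missing values depends on the data set at hand.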

Additional Resources:

If you need a book on Pandas and NumPy, see “Python for Data Analysis” by Wes McKinney. There are a lot of . You can have a look at them here.

Assignment: Solve this assignment from the CS109 course at Harvard.

Step 5: Effective Data Visualization

Go through this lecture from CS109. You can ! Follow this lecture up with this assignment.

Step 6: Learn Scikit-learn and Machine Learning
