Practical Machine Learning, Chapter 1


1.1 Prediction motivation

About this course

This course covers the basic ideas behind machine learning/prediction:
· Study design: training vs. test sets
· Conceptual issues: out-of-sample error, ROC curves
· Practical implementation: the caret package

What this course depends on:
· The Data Scientist's Toolbox
· R Programming

What would be useful:
· Exploratory analysis
· Reporting Data and Reproducible Research
· Regression models

What machine learning is used for

· Local governments -> pension payments
· Google -> whether you will click on an ad
· Amazon -> what movies you will watch
· Insurance companies -> what your risk of death is
· Johns Hopkins -> who will succeed in their programs

Recommended books and resources

The elements of statistical learning

Machine learning (more advanced material)

· List of machine learning resources on Quora
· List of machine learning resources from Science
· Advanced notes from MIT OpenCourseWare
· Advanced notes from CMU
· Kaggle machine learning competitions

1.2 What is prediction

The central dogma of prediction

Example task: given a set of dots, predict whether each one is red or blue.

Choosing the right dataset and knowing what the specific question is are again paramount.

What can go wrong

An example: the Google Flu Trends algorithm did not account for the fact that the search terms people use would change over time. People might use different terms when searching, and that would affect the algorithm's performance. Also, the way those terms were actually used in the algorithm was not well understood, so when the function of a particular search term changed in the algorithm, it could cause problems.

Components of a predictor

question -> input data -> features -> algorithm -> parameters -> evaluation

Note: the question step asks what you are trying to predict, and what you are trying to predict it with.

A prediction example: spam email

question -> input data -> features -> algorithm -> parameters -> evaluation

Start with a general question

Can I automatically detect which emails are SPAM and which are not?

Make it concrete

Can I use quantitative characteristics of the emails to classify them as SPAM/HAM?

Note: try to make the question as concrete as possible.

Input data: the spam dataset from the kernlab package, documented at
rss.acs.unt.edu/Rdoc/library/kernlab/html/spam.html

    library(kernlab)  # provides the spam dataset
    data(spam)        # load the dataset into the workspace
    head(spam)        # inspect the first rows of features

Our simple algorithm

Find a value C. If the frequency of 'your' > C, predict "spam"; otherwise predict "ham".

Note: the cutoff used here is C = 0.5. If the frequency of 'your' is above 0.5 we say the email is SPAM, and if it is below 0.5 we say it is HAM.
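The cutoff rule above can be sketched in a few lines of R. This is a minimal illustration, not the course's own code: the function names `predict_by_cutoff` and `accuracy` are hypothetical, and the toy values are made up purely to show the calls. The part that touches the real data assumes the kernlab `spam` dataset, whose `type` column uses the labels "spam" and "nonspam" (the notes call the latter HAM), and it only runs if kernlab is installed.

```r
# Sketch of the single-feature cutoff rule; function names are illustrative.

# Predict "spam" when the frequency of 'your' exceeds the cutoff, else "nonspam".
predict_by_cutoff <- function(your_freq, cutoff = 0.5) {
  ifelse(your_freq > cutoff, "spam", "nonspam")
}

# Fraction of predictions that match the true labels.
accuracy <- function(predicted, actual) {
  mean(predicted == as.character(actual))
}

# Made-up toy values (NOT taken from the spam dataset), only to show the calls.
toy_freq  <- c(0.00, 0.10, 0.90, 1.50, 0.40, 2.00)
toy_label <- c("nonspam", "nonspam", "spam", "spam", "spam", "nonspam")
toy_pred  <- predict_by_cutoff(toy_freq, cutoff = 0.5)
toy_acc   <- accuracy(toy_pred, toy_label)

# On the real data (runs only if the kernlab package is installed).
if (requireNamespace("kernlab", quietly = TRUE)) {
  data(spam, package = "kernlab")
  spam_pred <- predict_by_cutoff(spam$your, cutoff = 0.5)
  # Cross-tabulate predictions against the true labels, as proportions.
  print(table(spam_pred, spam$type) / nrow(spam))
  print(accuracy(spam_pred, spam$type))
}
```

The proportions on the diagonal of the table are the correctly classified emails, so their sum equals the value returned by `accuracy`. Because the cutoff is evaluated on the same data it was chosen on, this number is likely to be an optimistic estimate of performance on new emails.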


1.3 Relative importance of steps

This section covers the tradeoffs among the different components of building a machine learning algorithm.

Relative order of importance: question > data > features > algorithms

Creating features is an important step: if you don't compress the data in the right way, you might lose all of the relevant and valuable information. Finally, in my experience the algorithm is often the least important part of building a machine learning predictor, although it can be very important depending on the modality of the data you are using. For example, image data and voice data can require certain kinds of prediction algorithms.

An important point

"The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." - John Tukey
