XIII. Clustering (Week 8)

Machine Learning (Andrew Ng's course): study notes

Clustering

{An unsupervised learning algorithm, where we learn from unlabeled data instead of labeled data.}

Unsupervised Learning: Introduction

A typical supervised learning problem: we are given a labeled training set, and the goal is to find the decision boundary that separates the positively labeled examples from the negatively labeled examples.

The supervised learning problem in this case is: given a set of labeled examples, fit a hypothesis to it.

Unsupervised learning: we give this sort of unlabeled training set to an algorithm and simply ask the algorithm to find some structure in the data for us.

One type of structure we might have an algorithm find is that the points are grouped into two separate clusters. An algorithm that finds clusters like the ones circled is called a clustering algorithm.

Applications of clustering algorithms

Social network analysis: given information about who the people you email most frequently are, and who the people they email most frequently are, find coherent groups of people. That is, you'd want to find the coherent groups of friends in a social network.

K-Means Algorithm (one type of clustering algorithm)

In the clustering problem we are given an unlabeled data set, and we would like an algorithm to automatically group the data into coherent subsets, or coherent clusters, for us.

Graphical illustration of the K-means steps

Randomly initialize two points, called the cluster centroids.

Note: there are two cluster centroids because I want to group my data into two clusters.

K-means is an iterative algorithm, and it does two things. The first is a cluster assignment step, and the second is a move centroid step.

Cluster assignment step: go through each of the examples and, depending on whether it is closer to the red cluster centroid or the blue cluster centroid, assign that data point to one of the two cluster centroids.
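The cluster assignment step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the course: the function name `assign_clusters`, the sample points, and the use of Euclidean distance via `math.dist` (Python 3.8+) are all choices made here. Index 0 plays the role of the "red" centroid and index 1 the "blue" one.

```python
import math

def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    assignments = []
    for p in points:
        # Euclidean distance from this point to every centroid.
        distances = [math.dist(p, c) for c in centroids]
        # Ties go to the lowest index (an arbitrary choice made for this sketch).
        assignments.append(distances.index(min(distances)))
    return assignments

# Hypothetical 2-D data: two points near the origin, two near (10, 10).
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

labels = assign_clusters(points, centroids)
print(labels)  # [0, 0, 1, 1]
```

Each point is "coloured" with the index of the centroid it is closest to, which mirrors the red/blue colouring in the lecture's figures.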

Move centroid step: move each of the two cluster centroids to the average of the points that have the same colour as that centroid.
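A minimal sketch of the move centroid step follows. The function name `move_centroids` and the sample data are assumptions for illustration. The notes do not say what to do when a centroid has no points assigned to it, so leaving such a centroid where it is is an assumption of this sketch.

```python
def move_centroids(points, assignments, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == k]
        if not members:
            # Assumption: an empty cluster keeps its old centroid.
            new_centroids.append(old)
            continue
        # Average the members coordinate by coordinate.
        mean = tuple(sum(coord) / len(members) for coord in zip(*members))
        new_centroids.append(mean)
    return new_centroids

# Hypothetical data: the first two points are "red" (0), the last two "blue" (1).
points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
assignments = [0, 0, 1, 1]
centroids = [(0.0, 0.0), (10.0, 10.0)]

new_centroids = move_centroids(points, assignments, centroids)
print(new_centroids)  # [(1.5, 2.0), (9.0, 9.0)]
```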

The effect of iterating these two steps:

Converged: if you keep running additional iterations of K-means, the cluster centroids will not change any further and the colours of the points will not change any further. At this point, K-means has converged, and it has done a pretty good job of finding the two clusters in this data.

The K-means algorithm
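Putting the pieces together, the whole loop might look like the sketch below. It is a hedged illustration rather than the course's reference implementation. The name `k_means`, the `max_iters` and `seed` parameters, and the choice to initialize the centroids from randomly chosen training examples are assumptions made here. The loop stops when the assignments no longer change, which is the convergence condition described in these notes.

```python
import math
import random

def k_means(points, k, max_iters=100, seed=0):
    """Alternate cluster assignment and centroid moves until nothing changes."""
    rng = random.Random(seed)
    # Assumption: start from k distinct training examples chosen at random.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Cluster assignment step: index of the nearest centroid for each point.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: colours, and hence centroids, stopped changing
        assignments = new_assignments
        # Move centroid step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

# Hypothetical data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

Which group ends up with index 0 depends on the random initialization, so only the grouping itself is meaningful, not the label numbers.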
