K-Means and Principal Component Analysis
K-Means
In this part of the exercise, you will implement the K-means algorithm and use it for image compression. You will first work with a 2D dataset to build intuition for how K-means works. After that, you will use K-means to compress an image by reducing the number of colors it contains.
Task 1: Implementing K-Means
K-means is a method for automatically clustering similar data examples together. Its outline is as follows:
% Initialize centroids
centroids = kMeansInitCentroids(X, K);
for iter = 1:iterations
    % Cluster assignment step: Assign each data point to the
    % closest centroid. idx(i) corresponds to c^(i), the index
    % of the centroid assigned to example i
    idx = findClosestCentroids(X, centroids);
    % Move centroid step: Compute means based on centroid
    % assignments
    centroids = computeMeans(X, idx, K);
end
In the code above, each iteration of the loop carries out two steps: (1) assign each example x^(i) to its closest centroid, and (2) recompute each centroid as the mean of the examples currently assigned to it.
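For reference, both steps decrease the same objective, the distortion J = (1/m) * sum_i || x^(i) - mu_{c^(i)} ||^2: step 1 minimizes J over the assignments c^(i) while holding the centroids fixed, and step 2 minimizes it over the centroids mu_k while holding the assignments fixed. This is why K-means is guaranteed to converge, although possibly to a local optimum.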
Step 1: Finding the closest centroids
The cluster assignment step picks, for each example, the centroid with the smallest squared distance: c^(i) := argmin_j || x^(i) - mu_j ||^2. Accordingly, we add the following code to findClosestCentroids.m:
for i = 1 : length(idx)
    % Replicate the i-th example K times so it can be compared
    % against all K centroids at once
    T = repmat(X(i, :), K, 1);
    % sum(A, 2) sums along each row of A, so this yields the squared
    % distance from example i to every centroid
    [~, idx(i)] = min(sum((T - centroids) .^ 2, 2));
end
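As a side note, the assignment step can also be written without the outer loop. Below is a fully vectorized sketch (an alternative, not the assignment's reference solution), using the expansion ||x - mu||^2 = ||x||^2 - 2*x*mu' + ||mu||^2; it relies on implicit broadcasting, available in Octave and in MATLAB R2016b or later:

% m x K matrix of squared distances from every example to every centroid
D = sum(X .^ 2, 2) - 2 * X * centroids' + sum(centroids .^ 2, 2)';
% index of the closest centroid for each example
[~, idx] = min(D, [], 2);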
Step 2: Computing the centroid means
The centroid update step sets each centroid to the mean of the points assigned to it: mu_k := (1 / |C_k|) * sum of x^(i) over all i in C_k, where C_k is the set of examples assigned to centroid k. Accordingly, we add the following code to computeCentroids.m:
for k = 1 : K
    T = find(idx == k);   % indices of the examples assigned to cluster k
    centroids(k, :) = sum(X(T, :)) / size(T, 1);   % size(A, 1) is the number of rows of A
end
Finally, we run this part of the exercise. The relevant code in ex7.m is:
%% ================= Part 1: Find Closest Centroids ====================
% To help you implement K-Means, we have divided the learning algorithm
% into two functions -- findClosestCentroids and computeCentroids. In this
% part, you should complete the code in the findClosestCentroids function.
%
fprintf('Finding closest centroids.\n\n');
% Load an example dataset that we will be using
load('ex7data2.mat');
% Select an initial set of centroids
K = 3; % 3 Centroids
initial_centroids = [3 3; 6 2; 8 5];
% Find the closest centroids for the examples using the
% initial_centroids
idx = [];
idx = findClosestCentroids(X, initial_centroids);
fprintf('Closest centroids for the first 3 examples: \n')
fprintf(' %d', idx(1:3));
fprintf('\n(the closest centroids should be 1, 3, 2 respectively)\n');
fprintf('Program paused. Press enter to continue.\n');
pause;
Running this prints the closest centroids for the first three examples, which should be 1, 3, 2.
Task 2: Random Initialization (this part does not need to be submitted)
In practice, a good strategy is to initialize the centroids with randomly chosen examples from the training set. We add the following code to kMeansInitCentroids.m:
% Initialize the centroids to be random examples
% Randomly reorder the indices of examples
randidx = randperm(size(X, 1));
% Take the first K examples as centroids
centroids = X(randidx(1:K), :);
This code uses the randperm() function to randomly permute the indices of the examples and then selects the first K examples according to that permutation. This guarantees that the same example is never selected twice.
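Because K-means can converge to a poor local optimum depending on the initialization, a common practice (a sketch, not part of the required submission; K and max_iters as set in ex7.m, runkMeans as used later in this document) is to run it several times and keep the solution with the lowest distortion:

best_cost = Inf;
for trial = 1 : 10
    init = kMeansInitCentroids(X, K);
    [c, ids] = runkMeans(X, init, max_iters);
    cost = sum(sum((X - c(ids, :)) .^ 2));   % distortion of this run
    if cost < best_cost   % keep the best clustering seen so far
        best_cost = cost;
        centroids = c;
        idx = ids;
    end
end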
Task 3: Image Compression with K-Means (this part does not need to be submitted)
In a 24-bit color image, each pixel is represented by three 8-bit unsigned integers (ranging from 0 to 255) that specify the red, green, and blue intensity values. Our image contains thousands of colors; in this part of the exercise you will reduce that number to 16. This allows the image to be compressed effectively: you only need to store the RGB values of the 16 selected colors, and for each pixel of the image you only need to store the index of its color at that location.
In this exercise, you will use the K-means algorithm to select the 16 colors used to represent the compressed image. Concretely, you treat every pixel of the original image as a data example and use K-means to find the 16 colors that best cluster the pixels in RGB space.
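To see where the savings come from: the original 128 x 128 image needs 24 bits per pixel, or 128 x 128 x 24 = 393,216 bits in total. The compressed representation needs 16 x 24 = 384 bits for the palette plus 4 bits per pixel for the indices, or 384 + 128 x 128 x 4 = 65,920 bits, a reduction by roughly a factor of 6.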
In Octave or MATLAB, we use the following code to load the image:
% Load 128x128 color image (bird_small.png)
A = imread('bird_small.png');
% You will need to have installed the image package to use
% imread. If you do not have the image package installed, you
% should instead change the following line to
%
% load('bird_small.mat'); % Loads the image into the variable A
This creates a three-dimensional matrix A whose first two indices identify a pixel position and whose last index selects red, green, or blue. For example, A(50, 33, 3) gives the blue intensity of the pixel at row 50, column 33.
The code for this part is:
%% ============= Part 4: K-Means Clustering on Pixels ===============
% In this exercise, you will use K-Means to compress an image. To do this,
% you will first run K-Means on the colors of the pixels in the image and
% then you will map each pixel onto its closest centroid.
%
% You should now complete the code in kMeansInitCentroids.m
%
fprintf('\nRunning K-Means clustering on pixels from an image.\n\n');
% Load an image of a bird
A = double(imread('bird_small.png'));
% If imread does not work for you, you can try instead
% load ('bird_small.mat');
A = A / 255; % Divide by 255 so that all values are in the range 0 - 1
% Size of the image
img_size = size(A);
% Reshape the image into an Nx3 matrix where N = number of pixels.
% Each row will contain the Red, Green and Blue pixel values
% This gives us our dataset matrix X that we will use K-Means on.
X = reshape(A, img_size(1) * img_size(2), 3);
% Run your K-Means algorithm on this data
% You should try different values of K and max_iters here
K = 16;
max_iters = 10;
% When using K-Means, it is important to initialize the centroids
% randomly.
% You should complete the code in kMeansInitCentroids.m before proceeding
initial_centroids = kMeansInitCentroids(X, K);
% Run K-Means
[centroids, idx] = runkMeans(X, initial_centroids, max_iters);
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ================= Part 5: Image Compression ======================
% In this part of the exercise, you will use the clusters of K-Means to
% compress an image. To do this, we first find the closest cluster for
% each example. After that, we map each pixel onto its closest centroid.
fprintf('\nApplying K-Means to compress an image.\n\n');
% Find closest cluster members
idx = findClosestCentroids(X, centroids);
% Essentially, now we have represented the image X in terms of the
% indices in idx.
% We can now recover the image from the indices (idx) by mapping each pixel
% (specified by its index in idx) to the centroid value
X_recovered = centroids(idx,:);
% Reshape the recovered image into proper dimensions
X_recovered = reshape(X_recovered, img_size(1), img_size(2), 3);
% Display the original image
subplot(1, 2, 1);
imagesc(A);
title('Original');
% Display compressed image side by side
subplot(1, 2, 2);
imagesc(X_recovered)
title(sprintf('Compressed, with %d colors.', K));
fprintf('Program paused. Press enter to continue.\n');
pause;
Running this displays the original image next to the compressed image reconstructed from the 16 centroid colors.
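If you also want to save the compressed result to disk, a one-line sketch (the filename is hypothetical; imwrite scales double images in the range 0-1 to 0-255 automatically):

imwrite(X_recovered, 'bird_compressed.png');   % hypothetical output filename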
Principal Component Analysis
In this part of the exercise, you will use principal component analysis (PCA) to perform dimensionality reduction. You will first experiment with a 2D dataset to see how PCA works, and then apply it to a bigger dataset.
Task 1: Visualizing the Dataset
The code for this part is:
%% ================== Part 1: Load Example Dataset ===================
% We start this exercise by using a small dataset that is easy to
% visualize
%
fprintf('Visualizing example dataset for PCA.\n\n');
% The following command loads the dataset. You should now have the
% variable X in your environment
load ('ex7data1.mat');
% Visualize the example dataset
plot(X(:, 1), X(:, 2), 'bo');
axis([0.5 6.5 2 8]); axis square;
fprintf('Program paused. Press enter to continue.\n');
pause;
Running this displays a scatter plot of the 2D example dataset.
Task 2: Implementing PCA
PCA consists of two computational steps: first compute the covariance matrix of the data, then use Octave's or MATLAB's svd() function to compute the eigenvectors U1, ..., Un.
Before running PCA, however, we should first apply mean normalization to the training set; the implementation can be found in featureNormalize.m.
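For reference, a minimal sketch of what featureNormalize does (the file shipped with the assignment may differ in detail): each feature is centered on its mean and scaled by its standard deviation.

mu = mean(X);                  % 1 x n vector of per-feature means
sigma = std(X);                % 1 x n vector of per-feature standard deviations
X_norm = (X - mu) ./ sigma;    % implicit broadcasting (Octave / MATLAB R2016b+)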
The covariance matrix is computed as Sigma = (1/m) * X' * X, where X is the data matrix with one example per row and m is the number of examples. The svd() call is [U, S, V] = svd(Sigma); the columns of U are the principal components and S is a diagonal matrix holding the corresponding variances.
We add the following code to pca.m:
Sigma = X' * X / m;      % covariance matrix of the (already normalized) data
[U, S, V] = svd(Sigma);  % columns of U are the principal components
The corresponding code in ex7_pca.m is:
%% =============== Part 2: Principal Component Analysis ===============
% You should now implement PCA, a dimension reduction technique. You
% should complete the code in pca.m
%
fprintf('\nRunning PCA on example dataset.\n\n');
% Before running PCA, it is important to first normalize X
[X_norm, mu, sigma] = featureNormalize(X);
% Run PCA
[U, S] = pca(X_norm);
% mu, the mean of each feature, was returned by featureNormalize above
% Draw the eigenvectors centered at mean of data. These lines show the
% directions of maximum variations in the dataset.
hold on;
drawLine(mu, mu + 1.5 * S(1,1) * U(:,1)', '-k', 'LineWidth', 2);
drawLine(mu, mu + 1.5 * S(2,2) * U(:,2)', '-k', 'LineWidth', 2);
hold off;
fprintf('Top eigenvector: \n');
fprintf(' U(:,1) = %f %f \n', U(1,1), U(2,1));
fprintf('\n(you should expect to see -0.707107 -0.707107)\n');
fprintf('Program paused. Press enter to continue.\n');
pause;
Running this plots the dataset with the two eigenvectors drawn from the data mean, and prints the top eigenvector, which should be U(:,1) = -0.707107 -0.707107.
Task 3: Dimensionality Reduction with PCA
In this part, you project each example of the dataset X onto the top K components of the principal component matrix U. The code in projectData.m is:
U_reduce = U(:, 1 : K);   % keep only the first K principal components
Z = X * U_reduce;         % project the data onto the K-dimensional subspace
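A useful sanity check when choosing K is the fraction of variance retained by the top K components. A sketch, using the diagonal matrix S returned by pca() above:

s = diag(S);                               % variance along each principal component
variance_retained = sum(s(1:K)) / sum(s);  % e.g. aim for 0.99 to retain 99% of the variance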
Task 4: Reconstructing an Approximation of the Data
The code added to recoverData.m is:
U_reduce = U(:, 1 : K);   % the same first K principal components
X_rec = Z * U_reduce';    % map the projections back to the original space
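Optionally, you can measure how much information the round trip loses. A sketch, assuming X_norm and X_rec are the variables computed in ex7_pca.m:

avg_error = mean(sum((X_norm - X_rec) .^ 2, 2));  % average squared projection error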
Running Tasks 3 and 4 plots the normalized data points together with the points recovered from the one-dimensional projection.
This concludes the parts of this programming assignment that must be submitted. The remaining tasks in the handout are not covered here; please read through them and try them yourself. Thank you!