PCA原理解读和绘制方法

⚠️该教程的PCA绘图基于ggplot2,可以根据ggplot2语法对图片进行额外的修改和保存。

1. 基础

PCA:全称Principal Component Methods,也就是主成分分析。

主成分分析是一种通过协方差分析来对数据进行降维处理的统计方法。首先利用线性变换,将数据变换到一个新的坐标系统中;然后再利用降维的思想,使得任何数据投影的第一大方差在第一个坐标(称为第一主成分)上,第二大方差在第二个坐标(第二主成分)上。这种降维的思想首先减少数据集的维数,同时还保持数据集的对方差贡献最大的特征,最终使数据直观呈现在二维坐标系。主要适用于以下情况:已获得一定数目的变量的观测值,并希望能够构造出少数几个综合变量,在最大程度上反映原始数据的消息。

主要目的:
发现数据的内在模式
对高维数据进行降维,去除数据的噪音和冗余
识别相关变量

2. 计算

2.1 R包

很多R包中的functions可以用来计算PCA:
· prcomp()princomp() [built-in R stats package],
· PCA() [FactoMineR package],
· dudi.pca() [ade4 package],
· epPCA() [ExPosition package]

在这个教程里主要是用两个R包:FactoMineR (for the analysis) 和factoextra (for ggplot2-based visualization)。

安装和加载

install.packages(c("FactoMineR", "factoextra"))
library("FactoMineR")
library("factoextra")
2.2 数据格式

载入演示数据集

data(decathlon2)
View(decathlon2)
在这个表中,1:23行是Active individuals,也就是主成分分析需要是用的Individuals;24:27行是Supplementary individuals;1:10列是Active variables,也就是主成分分析需要是用的variables;11:12行是Supplementary variables。

把主成分分析所需要用的Active individuals和Active variables提取出来

decathlon2.active <- decathlon2[1:23, 1:10]
head(decathlon2.active[, 1:6], 4)
2.3 数据标准化

在主成分分析中,变量通常需要scaled(比如standardized),尤其当变量是使用不同的尺度的单位来衡量的时候。
标准化的目标是使变量之间具有可比性。通常scale后的数据要求i)标准差为1 ii)均值为0。
在基因表达数据的分析中,数据标准化是在PCA和聚类前的一个常用方法。当变量的均值和/或标准差差异非常大的时候,我们通常需要去做scale。
当对变量进行scale的时候,数据可以做这样的变换:

R的基础功能scale()可以被用于对数据进行标准化。这个函数输入的是一个numeric matrix,并且会对列进行标准化。

注意:在使用函数PCA() [in FactoMineR]的时候,数据会自动进行标准化,无需额外标准化处理。

2.4 R code

函数PCA()[in FactoMineR]的用法如下:

PCA(X, scale.unit = TRUE, ncp = 5, ind.sup = NULL, 
    quanti.sup = NULL, quali.sup = NULL,  graph = TRUE)
  • X: a data frame. Rows are individuals and columns are numeric variables
  • scale.unit: a logical value. If TRUE, the data are scaled to unit variance before the analysis. This standardization to the same scale avoids some variables to become dominant just because of their large measurement units. It makes variable comparable.
  • ncp: number of dimensions kept in the final results.
  • graph: a logical value. If TRUE a graph is displayed.
  • ind.sup : 指定Supplementary individuals所在的列
  • quanti.sup, quali.sup : 指定定性和定量观测值所在的列

实操:

library("FactoMineR")
res.pca <- PCA(decathlon2.active, graph = FALSE)

PCA()返回的是一个list,包含"$eig", "$var","$ind"和"$call"四类数据。四类数据的取法将在下一个部分介绍。

print(res.pca)
# **Results for the Principal Component Analysis (PCA)**
# The analysis was performed on 23 individuals, described by 10 variables
# *The results are available in the following objects:

#    name               description                          
# 1  "$eig"             "eigenvalues"                        
# 2  "$var"             "results for the variables"          
# 3  "$var$coord"       "coord. for the variables"           
# 4  "$var$cor"         "correlations variables - dimensions"
# 5  "$var$cos2"        "cos2 for the variables"             
# 6  "$var$contrib"     "contributions of the variables"     
# 7  "$ind"             "results for the individuals"        
# 8  "$ind$coord"       "coord. for the individuals"         
# 9  "$ind$cos2"        "cos2 for the individuals"           
# 10 "$ind$contrib"     "contributions of the individuals"   
# 11 "$call"            "summary statistics"                 
# 12 "$call$centre"     "mean of the variables"              
# 13 "$call$ecart.type" "standard error of the variables"    
# 14 "$call$row.w"      "weights for the individuals"        
# 15 "$call$col.w"      "weights for the variables" 

3. Visualization and Interpretation

3.1 Eigenvalues(特征值) / Variances ("$eig")

定义:参考主成分分析PCA以及特征值和特征向量的意义

PC1这个轴是原点到每个变量在轴上投影的距离平方和最大时的轴,每个变量在轴上的投影点到原点距离平方和就是这个PC的特征值。Eigenvalue for PC1开方=Singular value for PC1(图片出自StatQuest: Principal Component Analysis (PCA), Step-by-Step)

每个PC的Variation是特征值/n-1。该PC的Variation/所有PC的Variation的和=variance.percent

The eigenvalues measure the amount of variation retained by each principal component. 靠前的PCs的Eigenvalues比较大,后面的PCs的Eigenvalues比较小。That is, the first PCs corresponds to the directions with the maximum amount of variation in the data set.

#调取Eigenvalues信息
library("factoextra")
eig.val <- get_eigenvalue(res.pca)
eig.val
##        eigenvalue variance.percent cumulative.variance.percent
## Dim.1       4.124            41.24                        41.2
## Dim.2       1.839            18.39                        59.6
## Dim.3       1.239            12.39                        72.0
## Dim.4       0.819             8.19                        80.2
## Dim.5       0.702             7.02                        87.2
## Dim.6       0.423             4.23                        91.5
## Dim.7       0.303             3.03                        94.5
## Dim.8       0.274             2.74                        97.2
## Dim.9       0.155             1.55                        98.8
## Dim.10      0.122             1.22                       100.0

Eigenvalues的总和是10(total variance)。The proportion of variation explained by each eigenvalue is given in the second column.

Eigenvalues可用于决定我们后续分析是用多少个PC数(Kaiser 1961)。

但不幸的是,没有公认的客观的方法去决定多少PC是足够的。不同研究领域和不同数据集的情况是不同的。 In practice, we tend to look at the first few principal components in order to find interesting patterns in the data.

在上面的数据中,前三个PC可以解释72%的variation。这个数值是可以接受的。
另一个替代的方法来决定要使用多少PC是查看碎石图Scree Plotfviz_eig() or fviz_screeplot() [factoextra package])。

fviz_eig(res.pca, addlabels = TRUE, ylim = c(0, 50))
87%的信息(variances)被前五个PC保留
3.2 Graph of variables ("$var")
  • 3.2.1 结果查看
    我们 可以使用get_pca_var() [factoextra package]取出variables的结果。
var <- get_pca_var(res.pca)
var
## Principal Component Analysis Results for variables
##  ===================================================
##   Name       Description                                    
## 1 "$coord"   "Coordinates for the variables"                
## 2 "$cor"     "Correlations between variables and dimensions"
## 3 "$cos2"    "Cos2 for the variables"                       
## 4 "$contrib" "contributions of the variables"

get_pca_var()的结果包含如下变量(可以被用于 plot)
· var$coord: 绘制scatter plot的变量坐标
· var$cos2: 是var$coord值的平方。represents the quality of representation for variables on the factor map.
· var$contrib: 是cos2/所有cos2的和再乘100(因为单位是%)。contains the contributions (in percentage) of the variables to the principal components.

# Coordinates
head(var$coord)
#                   Dim.1       Dim.2      Dim.3       Dim.4      Dim.5
# X100m        -0.8506257 -0.17939806  0.3015564  0.03357320 -0.1944440
# Long.jump     0.7941806  0.28085695 -0.1905465 -0.11538956  0.2331567
# Shot.put      0.7339127  0.08540412  0.5175978  0.12846837 -0.2488129
# High.jump     0.6100840 -0.46521415  0.3300852  0.14455012  0.4027002
# X400m        -0.7016034  0.29017826  0.2835329  0.43082552  0.1039085
# X110m.hurdle -0.7641252 -0.02474081  0.4488873 -0.01689589  0.2242200

# Cos2: quality on the factore map
head(var$cos2)
#                  Dim.1        Dim.2      Dim.3        Dim.4      Dim.5
# X100m        0.7235641 0.0321836641 0.09093628 0.0011271597 0.03780845
# Long.jump    0.6307229 0.0788806285 0.03630798 0.0133147506 0.05436203
# Shot.put     0.5386279 0.0072938636 0.26790749 0.0165041211 0.06190783
# High.jump    0.3722025 0.2164242070 0.10895622 0.0208947375 0.16216747
# X400m        0.4922473 0.0842034209 0.08039091 0.1856106269 0.01079698
# X110m.hurdle 0.5838873 0.0006121077 0.20149984 0.0002854712 0.05027463

# Contributions to the principal components
head(var$contrib)
#                  Dim.1      Dim.2     Dim.3       Dim.4     Dim.5
# X100m        17.544293  1.7505098  7.338659  0.13755240  5.389252
# Long.jump    15.293168  4.2904162  2.930094  1.62485936  7.748815
# Shot.put     13.060137  0.3967224 21.620432  2.01407269  8.824401
# High.jump     9.024811 11.7715838  8.792888  2.54987951 23.115504
# X400m        11.935544  4.5799296  6.487636 22.65090599  1.539012
# X110m.hurdle 14.157544  0.0332933 16.261261  0.03483735  7.166193
  • 3.2.2 Correlation circle
    变量和主成分(PC)之间的correlation被用于每个PC上变量的坐标。
    The correlation between a variable and a principal component (PC) is used as the coordinates of the variable on the PC. The representation of variables differs from the plot of the observations: The observations are represented by their projections, but the variables are represented by their correlations (Abdi and Williams 2010).
# Coordinates of variables
head(var$coord, 4)
##            Dim.1   Dim.2  Dim.3   Dim.4  Dim.5
## X100m     -0.851 -0.1794  0.302  0.0336 -0.194
## Long.jump  0.794  0.2809 -0.191 -0.1154  0.233
## Shot.put   0.734  0.0854  0.518  0.1285 -0.249
## High.jump  0.610 -0.4652  0.330  0.1446  0.403
fviz_pca_var(res.pca, col.var = "black")

这个图就是variable correlation plots,它展示出所有变量间的关系。它的解读如下:
· 正相关的变量会聚在一起
· 负相关的变量会处于相反的方向,位于相反的象限。
· 变量和原点之间的距离衡量factor map上变量的质量。离原点越远的变量在factor map上被展示的更好。

  • 3.2.3 Quality of representation
head(var$cos2, 4)
##           Dim.1   Dim.2  Dim.3   Dim.4  Dim.5
## X100m     0.724 0.03218 0.0909 0.00113 0.0378
## Long.jump 0.631 0.07888 0.0363 0.01331 0.0544
## Shot.put  0.539 0.00729 0.2679 0.01650 0.0619
## High.jump 0.372 0.21642 0.1090 0.02089 0.1622

可以使用corrplot包可视化

library("corrplot")
corrplot(var$cos2, is.corr=FALSE)


也可以使用fviz_cos2()[in factoextra]来画条图

# Total cos2 of variables on Dim.1 and Dim.2
fviz_cos2(res.pca, choice = "var", axes = 1:2)

对于给定的变量,所有PCs的cos2之和=1。如果一个变量可以完美的被两个主成分所代表(如Dim.1 & Dim.2),这两个PCs的cos2之和=1。这样的话这个变量就会被画在variable factor map的圈上。对于一些变量来说,想要比较好的代表数据,需要超过两个PCs,这样的话变量就会被画在圈内。

# Color by cos2 values: quality on the factor map
fviz_pca_var(res.pca, col.var = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"), 
             repel = TRUE # Avoid text overlapping
             )
  • 3.2.4 Contributions of variables to PCs
library("corrplot")
corrplot(var$contrib, is.corr=FALSE) 
# Contributions of variables to PC1
fviz_contrib(res.pca, choice = "var", axes = 1, top = 10)
# Contributions of variables to PC2
fviz_contrib(res.pca, choice = "var", axes = 2, top = 10)
红线是expected average contribution。如果变量的贡献是一致的,那么贡献值应该是1/length(variables) = 1/10 = 10%。对于给定的PC,贡献度超过红线的variable 被认为对这个PC贡献较大。
fviz_contrib(res.pca, choice = "var", axes = 1:2, top = 10)

变量对某个PC的贡献度可以通过颜色深浅展示出来

fviz_pca_var(res.pca, col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07")
             )

根据贡献度改变透明度

fviz_pca_var(res.pca, alpha.var = "contrib")

自定义颜色

# Create a random continuous variable of length 10
set.seed(123)
my.cont.var <- rnorm(10)
# Color variables by the continuous variable
fviz_pca_var(res.pca, col.var = my.cont.var,
             gradient.cols = c("blue", "yellow", "red"),
             legend.title = "Cont.Var")

根据分组绘制颜色

# Create a grouping variable using kmeans
# Create 3 groups of variables (centers = 3)
set.seed(123)
res.km <- kmeans(var$coord, centers = 3, nstart = 25)
grp <- as.factor(res.km$cluster)
# Color variables by groups
fviz_pca_var(res.pca, col.var = grp, 
             palette = c("#0073C2FF", "#EFC000FF", "#868686FF"),
             legend.title = "Cluster")
3.3 Dimension description
res.desc <- dimdesc(res.pca, axes = c(1,2), proba = 0.05)
# Description of dimension 1
res.desc$Dim.1
## $quanti
##              correlation  p.value
## Long.jump          0.794 6.06e-06
## Discus             0.743 4.84e-05
## Shot.put           0.734 6.72e-05
## High.jump          0.610 1.99e-03
## Javeline           0.428 4.15e-02
## X400m             -0.702 1.91e-04
## X110m.hurdle      -0.764 2.20e-05
## X100m             -0.851 2.73e-07

# Description of dimension 2
res.desc$Dim.2
## $quanti
##            correlation  p.value
## Pole.vault       0.807 3.21e-06
## X1500m           0.784 9.38e-06
## High.jump       -0.465 2.53e-02

## $quanti means results for quantitative variables. Note that, variables are sorted by the p-value of the correlation.
3.4 Graph of individuals ("$ind")
  • 3.4.1 Results
ind <- get_pca_ind(res.pca)
ind
## Principal Component Analysis Results for individuals
##  ===================================================
##   Name       Description                       
## 1 "$coord"   "Coordinates for the individuals" 
## 2 "$cos2"    "Cos2 for the individuals"        
## 3 "$contrib" "contributions of the individuals"

To get access to the different components, use this:

# Coordinates of individuals
head(ind$coord)
# Quality of individuals
head(ind$cos2)
# Contributions of individuals
head(ind$contrib)
  • 3.4.2 Plots: quality and contribution
# fviz_pca_ind(res.pca)
fviz_pca_ind(res.pca, col.ind = "cos2", 
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE # Avoid text overlapping (slow if many points)
             )
相似的individuals会被聚在一起

改变点的大小

fviz_pca_ind(res.pca, pointsize = "cos2", 
             pointshape = 21, fill = "#E7B800",
             repel = TRUE # Avoid text overlapping (slow if many points)
             )
#同时改变点的大小和颜色by cos2
fviz_pca_ind(res.pca, col.ind = "cos2", pointsize = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE # Avoid text overlapping (slow if many points)
             )

画条图

fviz_cos2(res.pca, choice = "ind")
# Total contribution on PC1 and PC2
fviz_contrib(res.pca, choice = "ind", axes = 1:2)
  • 3.4.3 Color by a custom continuous variable
# Create a random continuous variable of length 23,
# Same length as the number of active individuals in the PCA
set.seed(123)
my.cont.var <- rnorm(23)
# Color individuals by the continuous variable
fviz_pca_ind(res.pca, col.ind = my.cont.var,
             gradient.cols = c("blue", "yellow", "red"),
             legend.title = "Cont.Var")
  • 3.4.4 Color by groups
head(iris, 3)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa

# The variable Species (index = 5) is removed
# before PCA analysis
iris.pca <- PCA(iris[,-5], graph = FALSE)

fviz_pca_ind(iris.pca,
             geom.ind = "point", # show points only (nbut not "text")
             col.ind = iris$Species, # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             legend.title = "Groups"
             )
# Add confidence ellipses
fviz_pca_ind(iris.pca, geom.ind = "point", col.ind = iris$Species, 
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, ellipse.type = "confidence",
             legend.title = "Groups"
             )
3.5 Graph customization

更多图形个性化绘制

  • 3.5.1 Dimensions
    默认情况下, variables/individuals代表的是PC1和PC2,如果想看其他PC比如PC2和PC3也是可以的。
# Variables on dimensions 2 and 3
fviz_pca_var(res.pca, axes = c(2, 3))
# Individuals on dimensions 2 and 3
fviz_pca_ind(res.pca, axes = c(2, 3))
  • 3.5.2 Plot elements: point, text, arrow

geom.var参数:
· geom.var = "point", to show only points;
· geom.var = "text" to show only text labels;
· geom.var = c("point", "text") to show both points and text labels
· geom.var = c("arrow", "text") to show arrows and labels (default).

geom.ind参数:
· geom.ind = "point", to show only points;
· geom.ind = "text" to show only text labels;
· geom.ind = c("point", "text") to show both point and text labels (default)

# Show variable points and text labels
fviz_pca_var(res.pca, geom.var = c("point", "text"))
# Show individuals text labels only
fviz_pca_ind(res.pca, geom.ind =  "text")
  • 3.5.3 Size and shape of plot elements
    · labelsize: font size for the text labels, e.g.: labelsize = 4.
    · pointsize: the size of points, e.g.: pointsize = 1.5.
    · arrowsize: the size of arrows. Controls the thickness of arrows, e.g.: arrowsize = 0.5.
    · pointshape: the shape of points, pointshape = 21. Type ggpubr::show_point_shapes() to see available point shapes.
# Change the size of arrows an labels
fviz_pca_var(res.pca, arrowsize = 1, labelsize = 5, 
             repel = TRUE)
# Change points size, shape and fill color
# Change labelsize
fviz_pca_ind(res.pca, 
             pointsize = 3, pointshape = 21, fill = "lightblue",
             labelsize = 5, repel = TRUE)
  • 3.5.4 Ellipses
    添加Ellipses:addEllipses = TRUE
    ellipse.type可以设置为"convex","confidence","t","norm","euclid"
    ellipse.level:设置椭圆的集中度。如设置ellipse.level = 0.95 or ellipse.level = 0.66。

  • 3.5.5 Group mean points

fviz_pca_ind(iris.pca,
             geom.ind = "point", # show points only (but not "text")
             group.ind = iris$Species, # color by groups
             legend.title = "Groups",
             mean.point = FALSE)
  • 3.5.6 Axis lines
fviz_pca_var(res.pca, axes.linetype = "blank")
  • 3.5.7 Graphical parameters
    使用ggpar()[ggpubr package]函数进行更精细的调整
ind.p <- fviz_pca_ind(iris.pca, geom = "point", col.ind = iris$Species)
ggpubr::ggpar(ind.p,
              title = "Principal Component Analysis",
              subtitle = "Iris data set",
              caption = "Source: factoextra",
              xlab = "PC1", ylab = "PC2",
              legend.title = "Species", legend.position = "top",
              ggtheme = theme_gray(), palette = "jco"
              )
3.6 Biplot双标图

绘制一张简单的双标图

fviz_pca_biplot(res.pca, repel = TRUE,
                col.var = "#2E9FDF", # Variables color
                col.ind = "#696969"  # Individuals color
                )
fviz_pca_biplot(iris.pca, 
                col.ind = iris$Species, palette = "jco", 
                addEllipses = TRUE, label = "var",
                col.var = "black", repel = TRUE,
                legend.title = "Species") 
fviz_pca_biplot(iris.pca, 
                # Fill individuals by groups
                geom.ind = "point",
                pointshape = 21,
                pointsize = 2.5,
                fill.ind = iris$Species,
                col.ind = "black",
                # Color variable by groups
                col.var = factor(c("sepal", "sepal", "petal", "petal")),
                
                legend.title = list(fill = "Species", color = "Clusters"),
                repel = TRUE        # Avoid label overplotting
             )+
  ggpubr::fill_palette("jco")+      # Indiviual fill color
  ggpubr::color_palette("npg")      # Variable colors
fviz_pca_biplot(iris.pca, 
                # Individuals
                geom.ind = "point",
                fill.ind = iris$Species, col.ind = "black",
                pointshape = 21, pointsize = 2,
                palette = "jco",
                addEllipses = TRUE,
                # Variables
                alpha.var ="contrib", col.var = "contrib",
                gradient.cols = "RdYlBu",
                
                legend.title = list(fill = "Species", color = "Contrib",
                                    alpha = "Contrib")
                )

4. Filtering results

# Visualize variable with cos2 >= 0.6
fviz_pca_var(res.pca, select.var = list(cos2 = 0.6))
# Top 5 active variables with the highest cos2
fviz_pca_var(res.pca, select.var= list(cos2 = 5))
# Select by names
name <- list(name = c("Long.jump", "High.jump", "X100m"))
fviz_pca_var(res.pca, select.var = name)
# top 5 contributing individuals and variable
fviz_pca_biplot(res.pca, select.ind = list(contrib = 5), 
               select.var = list(contrib = 5),
               ggtheme = theme_minimal())

5. Exporting results

5.1 Export plots to PDF/PNG files
# Print the plot to a pdf file
pdf("myplot.pdf")
print(myplot)
dev.off()

# Scree plot
scree.plot <- fviz_eig(res.pca)
# Plot of individuals
ind.plot <- fviz_pca_ind(res.pca)
# Plot of variables
var.plot <- fviz_pca_var(res.pca)
pdf("PCA.pdf") # Create a new pdf device
print(scree.plot)
print(ind.plot)
print(var.plot)
dev.off() # Close the pdf device

# Print scree plot to a png file
png("pca-scree-plot.png")
print(scree.plot)
dev.off()
# Print individuals plot to a png file
png("pca-variables.png")
print(var.plot)
dev.off()
# Print variables plot to a png file
png("pca-individuals.png")
print(ind.plot)
dev.off()

使用ggexport()

library(ggpubr)
ggexport(plotlist = list(scree.plot, ind.plot, var.plot), 
         filename = "PCA.pdf")

ggexport(plotlist = list(scree.plot, ind.plot, var.plot), 
         nrow = 2, ncol = 2,
         filename = "PCA.pdf")

ggexport(plotlist = list(scree.plot, ind.plot, var.plot),
         filename = "PCA.png")
5.2 Export results to txt/csv files
write.infile(res.pca, "pca.txt", sep = "\t")
# Export into a CSV file
write.infile(res.pca, "pca.csv", sep = ";")

6. 总结

  • 进行PCA的方法
    Using prcomp() [stats]
res.pca <- prcomp(iris[, -5], scale. = TRUE)

Read more: http://www.sthda.com/english/wiki/pca-using-prcomp-and-princomp

  • Using princomp() [stats]
res.pca <- princomp(iris[, -5], cor = TRUE)

Read more: http://www.sthda.com/english/wiki/pca-using-prcomp-and-princomp

  • Using dudi.pca() [ade4]
library("ade4")
res.pca <- dudi.pca(iris[, -5], scannf = FALSE, nf = 5)

Read more: http://www.sthda.com/english/wiki/pca-using-ade4-and-factoextra

  • Using epPCA() [ExPosition]
library("ExPosition")
res.pca <- epPCA(iris[, -5], graph = FALSE)
  • 可视化
fviz_eig(res.pca)     # Scree plot
fviz_pca_ind(res.pca) # Graph of individuals
fviz_pca_var(res.pca) # Graph of variables

7. Further reading

For the mathematical background behind CA, refer to the following video courses, articles and books:

  • Principal component analysis (article) (Abdi and Williams 2010). https://goo.gl/1Vtwq1.
  • Principal Component Analysis Course Using FactoMineR (Video courses). https://goo.gl/VZJsnM
  • Exploratory Multivariate Analysis by Example Using R (book) (Husson, Le, and Pagès 2017).
  • Principal Component Analysis (book) (Jollife 2002).

See also:


参考:

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
禁止转载,如需转载请通过简信或评论联系作者。
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 203,456评论 5 477
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 85,370评论 2 381
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 150,337评论 0 337
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 54,583评论 1 273
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 63,596评论 5 365
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,572评论 1 281
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 37,936评论 3 395
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,595评论 0 258
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,850评论 1 297
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,601评论 2 321
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,685评论 1 329
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,371评论 4 318
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,951评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,934评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,167评论 1 259
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 43,636评论 2 349
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,411评论 2 342

推荐阅读更多精彩内容