We Recommend a Singular Value Decomposition

Original article: http://www.ams.org/samplings/feature-column/fcarc-svd

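This article gives an intuitive explanation of how the singular value decomposition (SVD) of a matrix works. Experts have a way of making complicated things simple and clear. I learned a lot from it.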

Introduction

The topic of this article, the singular value decomposition, is one that should be a part of the standard mathematics undergraduate curriculum but all too often slips between the cracks. Besides being rather intuitive, these decompositions are incredibly useful. For instance, Netflix, the online movie rental company, is currently offering a $1 million prize for anyone who can improve the accuracy of its movie recommendation system by 10%. Surprisingly, this seemingly modest problem turns out to be quite challenging, and the groups involved are now using rather sophisticated techniques. At the heart of all of them is the singular value decomposition.

A singular value decomposition provides a convenient way of breaking a matrix, which perhaps contains some data we are interested in, into simpler, meaningful pieces. In this article, we will offer a geometric explanation of singular value decompositions and look at some of their applications.

The geometry of linear transformations

Let us begin by looking at some simple matrices, namely those with two rows and two columns. Our first example is the diagonal matrix

M = [ 3  0 ]
    [ 0  1 ]

Geometrically, we may think of a matrix like this as taking a point (x, y) in the plane and transforming it into another point using matrix multiplication:

[ 3  0 ] [ x ]   [ 3x ]
[ 0  1 ] [ y ] = [ y  ]

The effect of this transformation is shown below: the plane is horizontally stretched by a factor of 3, while there is no vertical change.
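To make the stretch concrete, here is a minimal sketch that applies a 2 × 2 matrix to a few points. It assumes the diagonal matrix has entries 3 and 1, which is what the stretch described above implies; the helper name `apply_matrix` is ours, not part of the article.

```python
def apply_matrix(m, point):
    """Multiply a 2x2 matrix, given as a tuple of rows, by the point (x, y)."""
    (a, b), (c, d) = m
    x, y = point
    return (a * x + b * y, c * x + d * y)


# Assumed entries: stretch by 3 horizontally, leave the vertical direction alone.
M = ((3, 0), (0, 1))

for p in [(1, 0), (0, 1), (2, 5)]:
    print(p, "->", apply_matrix(M, p))
```

Every point keeps its y coordinate while its x coordinate is tripled, which is the horizontal stretch described in the text.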

Now let's look at

M = [ 2  1 ]
    [ 1  2 ]

which produces this effect

It is not so clear how to describe simply the geometric effect of the transformation. However, let's rotate our grid through a 45 degree angle and see what happens.

Ah ha. We see now that this new grid is transformed in the same way that the original grid was transformed by the diagonal matrix: the grid is stretched by a factor of 3 in one direction.

This is a very special situation that results from the fact that the matrix M is symmetric; that is, the transpose of M, the matrix obtained by flipping the entries about the diagonal, is equal to M. If we have a symmetric 2 × 2 matrix, it turns out that we may always rotate the grid in the domain so that the matrix acts by stretching and perhaps reflecting in the two directions. In other words, symmetric matrices behave like diagonal matrices.

Said with more mathematical precision, given a symmetric matrix M, we may find a set of orthogonal vectors v_i so that Mv_i is a scalar multiple of v_i; that is

Mv_i = λ_i v_i

where λ_i is a scalar. Geometrically, this means that the vectors v_i are simply stretched and/or reflected when multiplied by M. Because of this property, we call the vectors v_i eigenvectors of M; the scalars λ_i are called eigenvalues. An important fact, which is easily verified, is that eigenvectors of a symmetric matrix corresponding to different eigenvalues are orthogonal.

If we use the eigenvectors of a symmetric matrix to align the grid, the matrix stretches and reflects the grid in the same way that it does the eigenvectors.
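The eigenvector relation can be checked numerically. The sketch below assumes the symmetric matrix with rows (2, 1) and (1, 2), which has the behaviour described above (a stretch by 3 along the 45 degree diagonal); the matrix entries and the helper name are our assumptions for illustration.

```python
import math


def apply_matrix(m, v):
    """Multiply a 2x2 matrix, given as a tuple of rows, by the vector v."""
    (a, b), (c, d) = m
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])


# Assumed symmetric example matrix.
M = ((2, 1), (1, 2))

s = 1 / math.sqrt(2)
v1 = (s, s)      # unit vector at 45 degrees
v2 = (-s, s)     # unit vector at 135 degrees

Mv1 = apply_matrix(M, v1)
Mv2 = apply_matrix(M, v2)

# If v is an eigenvector, M v is a scalar multiple of v; read off the scalar.
lambda1 = Mv1[0] / v1[0]
lambda2 = Mv2[0] / v2[0]

# Eigenvectors for different eigenvalues of a symmetric matrix are orthogonal.
dot = v1[0] * v2[0] + v1[1] * v2[1]

print(lambda1, lambda2, dot)
```

The two eigenvalues come out as 3 and 1 up to rounding, so in the rotated grid this matrix behaves like the diagonal matrix from the first example.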

The geometric description we gave for this linear transformation is a simple one: the grid is simply stretched in one direction. For more general matrices, we will ask if we can find an orthogonal grid that is transformed into another orthogonal grid. Let's consider a final example using a matrix that is not symmetric:

M = [ 1  1 ]
    [ 0  1 ]

This matrix produces the geometric effect known as a shear.

It's easy to find one family of eigenvectors along the horizontal axis. However, our figure above shows that these eigenvectors cannot be used to create an orthogonal grid that is transformed into another orthogonal grid. Nonetheless, let's see what happens when we rotate the grid first by 30 degrees.

Notice that the angle at the origin formed by the red parallelogram on the right has increased. Let's next rotate the grid by 60 degrees.

Hmm. It appears that the grid on the right is now almost orthogonal. In fact, by rotating the grid in the domain by an angle of roughly 58.28 degrees, both grids are now orthogonal.
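The rotation experiment can be repeated numerically. This sketch assumes the shear is the matrix with rows (1, 1) and (0, 1), so that it maps (x, y) to (x + y, y); the function names are ours. It measures the angle between the images of the two grid directions and then uses bisection to find the rotation at which that angle is exactly 90 degrees.

```python
import math


def image_vectors(theta_degrees):
    """Images M v1 and M v2 of the unit grid directions rotated by theta."""
    t = math.radians(theta_degrees)
    v1 = (math.cos(t), math.sin(t))
    v2 = (-math.sin(t), math.cos(t))
    # Assumed shear M = [[1, 1], [0, 1]]: it maps (x, y) to (x + y, y).
    return (v1[0] + v1[1], v1[1]), (v2[0] + v2[1], v2[1])


def image_dot(theta_degrees):
    """Dot product of M v1 and M v2; zero means the image grid is orthogonal."""
    w1, w2 = image_vectors(theta_degrees)
    return w1[0] * w2[0] + w1[1] * w2[1]


def image_angle(theta_degrees):
    """Angle in degrees between M v1 and M v2."""
    w1, w2 = image_vectors(theta_degrees)
    cosine = image_dot(theta_degrees) / (math.hypot(*w1) * math.hypot(*w2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


for theta in (30, 60):
    print(theta, round(image_angle(theta), 2))

# The dot product is positive at 0 degrees and negative at 90 degrees,
# so bisection homes in on the rotation where it vanishes.
low, high = 0.0, 90.0
for _ in range(60):
    middle = (low + high) / 2
    if image_dot(middle) > 0:
        low = middle
    else:
        high = middle
best_angle = (low + high) / 2
print(round(best_angle, 2))
```

Under these assumptions the search settles at roughly 58.28 degrees, the angle quoted in the text.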

The singular value decomposition

This is the geometric essence of the singular value decomposition for 2 × 2 matrices: for any 2 × 2 matrix, we may find an orthogonal grid that is transformed into another orthogonal grid.

We will express this fact using vectors: with an appropriate choice of orthogonal unit vectors v₁ and v₂, the vectors Mv₁ and Mv₂ are orthogonal.

We will use u₁ and u₂ to denote unit vectors in the direction of Mv₁ and Mv₂. The lengths of Mv₁ and Mv₂, denoted by σ₁ and σ₂, describe the amount that the grid is stretched in those particular directions. These numbers are called the singular values of M. (In this case, the singular values are the golden ratio and its reciprocal, but that is not so important here.)

We therefore have

Mv₁ = σ₁u₁
Mv₂ = σ₂u₂

We may now give a simple description for how the matrix M treats a general vector x. Since the vectors v₁ and v₂ are orthogonal unit vectors, we have

x = (v₁ · x)v₁ + (v₂ · x)v₂

This means that

Mx = (v₁ · x)Mv₁ + (v₂ · x)Mv₂
Mx = (v₁ · x)σ₁u₁ + (v₂ · x)σ₂u₂

Remember that the dot product may be computed using the vector transpose

v · x = v^T x

which leads to

Mx = u₁σ₁v₁^T x + u₂σ₂v₂^T x
M = u₁σ₁v₁^T + u₂σ₂v₂^T

This is usually expressed by writing

M = UΣV^T

where U is a matrix whose columns are the vectors u₁ and u₂, Σ is a diagonal matrix whose entries are σ₁ and σ₂, and V is a matrix whose columns are v₁ and v₂. The superscript T on the matrix V denotes the matrix transpose of V.

This shows how to decompose the matrix M into the product of three matrices: V describes an orthonormal basis in the domain, U describes an orthonormal basis in the co-domain, and Σ describes how much the vectors in V are stretched to give the vectors in U.
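As a sanity check, we can assemble U, Σ and V for the shear example and multiply them back together. This is only a sketch: it assumes the shear matrix has rows (1, 1) and (0, 1), it takes v₁ at the angle arctan of the golden ratio (roughly 58.28 degrees), and the helpers `matmul` and `transpose` are ours.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


M = [[1.0, 1.0], [0.0, 1.0]]           # assumed shear matrix

phi = (1 + math.sqrt(5)) / 2            # golden ratio
theta = math.atan(phi)                  # roughly 58.28 degrees
V = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]    # columns are v1 and v2

MV = matmul(M, V)                       # columns are M v1 and M v2
sigma = [math.hypot(MV[0][j], MV[1][j]) for j in range(2)]
U = [[MV[i][j] / sigma[j] for j in range(2)] for i in range(2)]  # columns u1, u2
Sigma = [[sigma[0], 0.0], [0.0, sigma[1]]]

product = matmul(matmul(U, Sigma), transpose(V))
print(sigma)
print(product)
```

The product reproduces M up to rounding, and the two singular values come out as the golden ratio and its reciprocal.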

How do we find the singular value decomposition?

The power of the singular value decomposition lies in the fact that we may find it for any matrix. How do we do it? Let's look at our earlier example and add the unit circle in the domain. Its image will be an ellipse whose major and minor axes define the orthogonal grid in the co-domain.

Notice that the major and minor axes are defined by Mv₁ and Mv₂. These vectors therefore are the longest and shortest vectors among all the images of vectors on the unit circle.

In other words, the function |Mx| on the unit circle has a maximum at v₁ and a minimum at v₂. This reduces the problem to a rather standard calculus problem in which we wish to optimize a function over the unit circle. It turns out that the critical points of this function occur at the eigenvectors of the matrix M^TM. Since this matrix is symmetric, eigenvectors corresponding to different eigenvalues will be orthogonal. This gives the family of vectors v_i.

The singular values are then given by σ_i = |Mv_i|, and the vectors u_i are obtained as unit vectors in the direction of Mv_i. But why are the vectors u_i orthogonal?

To explain this, we will assume that σ_i and σ_j are distinct singular values. We have

Mv_i = σ_iu_i
Mv_j = σ_ju_j.

Let's begin by looking at the expression Mv_i · Mv_j and assuming, for convenience, that the singular values are non-zero. On one hand, this expression is zero since the vectors v_i, which are eigenvectors of the symmetric matrix M^TM, are orthogonal to one another:

Mv_i · Mv_j = v_i^T M^T M v_j = v_i · M^TMv_j = λ_j v_i · v_j = 0.

On the other hand, we have

Mv_i · Mv_j = σ_iσ_j u_i · u_j = 0

Since σ_i and σ_j are non-zero, this forces u_i · u_j = 0. Therefore, u_i and u_j are orthogonal, so we have found an orthogonal set of vectors v_i that is transformed into another orthogonal set u_i. The singular values describe the amount of stretching in the different directions.

In practice, this is not the procedure used to find the singular value decomposition of a matrix since it is not particularly efficient or well-behaved numerically.
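For 2 × 2 matrices the procedure described in this section fits in a few lines. The sketch below follows the steps in the text (eigenvectors of M^TM, then σ_i = |Mv_i|, then u_i); the function name `svd_2x2` is ours, and it assumes both singular values are non-zero. As noted above, this is for illustration and is not how numerical libraries compute the decomposition.

```python
import math


def svd_2x2(m):
    """Return (sigmas, us, vs) for a 2x2 matrix via eigenvectors of M^T M.

    Assumes both singular values are non-zero.
    """
    (a, b), (c, d) = m
    # Entries of the symmetric matrix M^T M = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    # |Mx|^2 on the unit circle is maximised at angle t with tan(2t) = 2q / (p - r).
    t = 0.5 * math.atan2(2 * q, p - r)
    vs = [(math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))]
    sigmas, us = [], []
    for v in vs:
        w = (a * v[0] + b * v[1], c * v[0] + d * v[1])   # M v
        s = math.hypot(w[0], w[1])                       # sigma = |M v|
        sigmas.append(s)
        us.append((w[0] / s, w[1] / s))                  # unit vector along M v
    return sigmas, us, vs


# Assumed shear matrix with rows (1, 1) and (0, 1).
sigmas, us, vs = svd_2x2(((1, 1), (0, 1)))
print(sigmas)
print(math.degrees(math.atan2(vs[0][1], vs[0][0])))
print(us[0][0] * us[1][0] + us[0][1] * us[1][1])   # u1 . u2
```

For the shear, v₁ comes out at roughly 58.28 degrees and the dot product of u₁ and u₂ is zero up to rounding, in line with the argument above.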

Another example

Let's now look at the singular matrix

The geometric effect of this matrix is the following:

In this case, the second singular value is zero so that we may write:

M = u₁σ₁v₁^T.

In other words, if some of the singular values are zero, the corresponding terms do not appear in the decomposition for M. In this way, we see that the rank of M, which is the dimension of the image of the linear transformation, is equal to the number of non-zero singular values.
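The link between rank and singular values is easy to test on small matrices. The sketch below computes the singular values of a 2 × 2 matrix as square roots of the eigenvalues of M^TM and counts those above a small tolerance. The singular matrix with rows (1, 1) and (2, 2) is a hypothetical stand-in chosen for illustration, and the function names and tolerance are ours.

```python
import math


def singular_values_2x2(m):
    """Singular values of a 2x2 matrix, largest first."""
    (a, b), (c, d) = m
    # Entries of the symmetric matrix M^T M = [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    trace, det = p + r, p * r - q * q
    disc = math.sqrt(max(trace * trace - 4 * det, 0.0))
    lam1 = (trace + disc) / 2
    lam2 = max((trace - disc) / 2, 0.0)   # guard against tiny negative rounding
    return math.sqrt(lam1), math.sqrt(lam2)


def rank_2x2(m, tol=1e-9):
    """Rank as the number of singular values larger than tol."""
    return sum(1 for s in singular_values_2x2(m) if s > tol)


singular = ((1, 1), (2, 2))    # hypothetical singular matrix: second row = 2 * first
shear = ((1, 1), (0, 1))

print(singular_values_2x2(singular), rank_2x2(singular))
print(singular_values_2x2(shear), rank_2x2(shear))
```

The singular matrix has one non-zero singular value and rank one, while the invertible shear has two.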

Data compression

Singular value decompositions can be used to represent data efficiently. Suppose, for instance, that we wish to transmit the following image, which consists of an array of 15 × 25 black or white pixels.

Since there are only three types of columns in this image, as shown below, it should be possible to represent the data in a more compact form.

We will represent the image as a 15 × 25 matrix in which each entry is either a 0, representing a black pixel, or 1, representing white. As such, there are 375 entries in the matrix.

If we perform a singular value decomposition on M, we find there are only three non-zero singular values.

σ₁ = 14.72
σ₂ = 5.22
σ₃ = 3.31

Therefore, the matrix may be represented as

M = u₁σ₁v₁^T + u₂σ₂v₂^T + u₃σ₃v₃^T

This means that we have three vectors v_i, each of which has 15 entries, three vectors u_i, each of which has 25 entries, and three singular values σ_i. (The v_i live in the domain and the u_i in the co-domain, so the matrix here has 15 columns and 25 rows.) This implies that we may represent the matrix using only 3 × 15 + 3 × 25 + 3 = 123 numbers rather than the 375 that appear in the matrix. In this way, the singular value decomposition discovers the redundancy in the matrix and provides a format for eliminating it.

Why are there only three non-zero singular values? Remember that the number of non-zero singular values equals the rank of the matrix. In this case, we see that there are three linearly independent columns in the matrix, which means that the rank will be three.
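The storage count generalizes: keeping k terms of the decomposition costs k numbers for the singular values plus k vectors of each kind. A small sketch, with function names of our own choosing:

```python
def svd_storage(rows, cols, k):
    """Numbers needed to store k terms u_i * sigma_i * v_i^T."""
    return k * (rows + cols + 1)


def largest_useful_rank(rows, cols):
    """Largest k for which the k-term form is smaller than the full matrix."""
    return (rows * cols - 1) // (rows + cols + 1)


full = 25 * 15
compact = svd_storage(25, 15, 3)
print(full, compact, largest_useful_rank(25, 15))
```

For the image in the text this gives 123 numbers against 375. The second function shows where the saving stops: for a matrix of this size, the compact form only pays off for ranks up to 9.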

Noise reduction

The previous example showed how we can exploit a situation where many singular values are zero. Typically speaking, the large singular values point to where the interesting information is. For example, imagine we have used a scanner to enter this image into our computer. However, our scanner introduces some imperfections (usually called "noise") in the image.

We may proceed in the same way: represent the data using a 15 × 25 matrix and perform a singular value decomposition. We find the following singular values:

σ₁ = 14.15
σ₂ = 4.67
σ₃ = 3.00
σ₄ = 0.21
σ₅ = 0.19
...
σ₁₅ = 0.05

Clearly, the first three singular values are the most important, so we will assume that the others are due to the noise in the image and make the approximation

M ≈ u₁σ₁v₁^T + u₂σ₂v₂^T + u₃σ₃v₃^T

This leads to the following improved image.

[Figures: noisy image (left), improved image (right)]
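The idea of keeping only the largest singular values can be tried on a toy example. The sketch below is not the method used to produce the images in this article: it swaps in power iteration with deflation so that it needs only the standard library, and it uses a hypothetical 4 × 3 "image" with two kinds of columns plus small artificial noise. It assumes the k largest singular values are distinct and non-zero; all names are ours.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v):
    """Return (unit vector, length); a zero vector is returned unchanged."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        return v, 0.0
    return [x / length for x in v], length


def truncated_svd(m, k, iterations=200):
    """Approximate the k largest terms (sigma, u, v) by power iteration."""
    residual = [[float(x) for x in row] for row in m]
    n_rows, n_cols = len(residual), len(residual[0])
    terms = []
    for _term in range(k):
        transposed = [list(column) for column in zip(*residual)]
        # Arbitrary deterministic start; it must not be orthogonal to v.
        v, _length = normalize([1.0 + j for j in range(n_cols)])
        for _step in range(iterations):
            v, _length = normalize(matvec(transposed, matvec(residual, v)))
        u, sigma = normalize(matvec(residual, v))
        terms.append((sigma, u, v))
        # Deflate: remove the term just found before looking for the next.
        for i in range(n_rows):
            for j in range(n_cols):
                residual[i][j] -= sigma * u[i] * v[j]
    return terms


def reconstruct(terms, n_rows, n_cols):
    """Sum the rank-one pieces sigma * u * v^T."""
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for sigma, u, v in terms:
        for i in range(n_rows):
            for j in range(n_cols):
                out[i][j] += sigma * u[i] * v[j]
    return out


# Hypothetical 4 x 3 "image" with two kinds of columns, plus small noise.
clean = [[1, 0, 1], [1, 0, 1], [1, 0, 1], [0, 1, 0]]
noisy = [[clean[i][j] + 0.05 * (-1) ** (i + j) for j in range(3)]
         for i in range(4)]

terms = truncated_svd(noisy, k=2)
approx = reconstruct(terms, 4, 3)
cleaned = [[round(x) for x in row] for row in approx]
print([round(sigma, 2) for sigma, _u, _v in terms])
print(cleaned)
```

Because the clean image has rank two, keeping two terms of the noisy matrix and rounding each entry to the nearest pixel value recovers the clean image in this toy case.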

Data analysis

Noise also arises anytime we collect data: no matter how good the instruments are, measurements will always have some error in them. If we remember the theme that large singular values point to important features in a matrix, it seems natural to use a singular value decomposition to study data once it is collected.

As an example, suppose that we collect some data as shown below:

We may take the data and put it into a matrix:

-1.03	0.74	-0.02	0.51	-1.31	0.99	0.69	-0.12	-0.72	1.11
-2.23	1.61	-0.02	0.88	-2.39	2.02	1.62	-0.35	-1.67	2.46

and perform a singular value decomposition. We find the singular values

σ₁ = 6.04
σ₂ = 0.22

With one singular value so much larger than the other, it may be safe to assume that the small value of σ₂ is due to noise in the data and that this singular value would ideally be zero. In that case, the matrix would have rank one, meaning that all the data lies on the line defined by u₁.
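The two singular values can be reproduced from the data matrix above with a short computation. Since the matrix has only two rows, this sketch works with the 2 × 2 matrix MM^T, which has the same non-zero eigenvalues as M^TM; that shortcut and the variable names are our choices.

```python
import math

row1 = [-1.03, 0.74, -0.02, 0.51, -1.31, 0.99, 0.69, -0.12, -0.72, 1.11]
row2 = [-2.23, 1.61, -0.02, 0.88, -2.39, 2.02, 1.62, -0.35, -1.67, 2.46]

# Entries of the symmetric 2x2 matrix M M^T = [[p, q], [q, r]].
p = sum(x * x for x in row1)
q = sum(x * y for x, y in zip(row1, row2))
r = sum(y * y for y in row2)

# Eigenvalues of M M^T from its trace and determinant.
trace, det = p + r, p * r - q * q
disc = math.sqrt(trace * trace - 4 * det)
sigma1 = math.sqrt((trace + disc) / 2)
sigma2 = math.sqrt((trace - disc) / 2)

# u1 is the eigenvector of M M^T for the larger eigenvalue; it gives the
# direction of the line through the origin that the data nearly lies on.
angle = 0.5 * math.atan2(2 * q, p - r)
u1 = (math.cos(angle), math.sin(angle))
slope = u1[1] / u1[0]

print(round(sigma1, 2), round(sigma2, 2), round(slope, 2))
```

The computed values agree with the σ₁ and σ₂ quoted above to two decimal places, and the slope of u₁ is a little over 2, which matches the way the second row of the data is roughly twice the first.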

This brief example points to the beginnings of a field known as principal component analysis, a set of techniques that uses singular values to detect dependencies and redundancies in data.

In a similar way, singular value decompositions can be used to detect groupings in data, which explains why singular value decompositions are being used in attempts to improve Netflix's movie recommendation system. Ratings of movies you have watched allow a program to sort you into a group of others whose ratings are similar to yours. Recommendations may be made by choosing movies that others in your group have rated highly.

Summary

As mentioned at the beginning of this article, the singular value decomposition should be a central part of an undergraduate mathematics major's linear algebra curriculum. Besides having a rather simple geometric explanation, the singular value decomposition offers extremely effective techniques for putting linear algebraic ideas into practice. All too often, however, a proper treatment in an undergraduate linear algebra course seems to be missing.

This article has been somewhat impressionistic: I have aimed to provide some intuitive insights into the central idea behind singular value decompositions and then illustrate how this idea can be put to good use. More rigorous accounts may be readily found.
