Yinsen Miao Tech Blogs

Orthogonal Matching Pursuit (OMP)

Recently, I have been learning to use OMP methods to find sparse representations of new data via an overcomplete dictionary. I want to put my summary here as a look-up page for my further research. This is an incomplete version, since I also want to include several improved variants such as Cholesky-based, QR-based, and Batch OMP, along with some example code. I will add some reference papers and webpages to this material later.

Sparse Learning

We are interested in modeling a column signal $y \in \mathbb{R}^n$ (for example, a new data observation) by a linear combination of atoms (columns) selected from a dictionary $D \in \mathbb{R}^{n \times K}$, where $n$ is the number of features and $K$ the number of atoms. We want to find a sparse representation $x \in \mathbb{R}^K$ of $y$ such that the difference between $y$ and $Dx$ becomes negligible. Formally, the problem can be described by the following optimization problem:

$$\min_{x} \|x\|_0 \quad \text{subject to} \quad \|y - Dx\|_2^2 \le \epsilon$$

In the formula, $x$ is the sparse representation of $y$, $\epsilon$ is the error tolerance, and $\|\cdot\|_0$ is the $\ell_0$ pseudo-norm, which counts the number of non-zero entries. However, the above optimization problem is NP-hard; in practice it is tackled with approximation methods, and OMP (Orthogonal Matching Pursuit) is one of them.
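As a concrete sketch of the setup (a minimal numpy example with made-up dimensions; the names `D`, `y`, and `x_true` simply mirror the notation above):

```python
import numpy as np

rng = np.random.default_rng(0)

n, K = 20, 50                      # n features, K atoms; overcomplete since K > n
D = rng.standard_normal((n, K))
D /= np.linalg.norm(D, axis=0)     # normalize each atom to unit l2-length

x_true = np.zeros(K)               # a sparse code: only 3 non-zero entries
x_true[[4, 17, 33]] = [1.5, -2.0, 0.7]

y = D @ x_true                     # the signal is a sparse combination of atoms
print(np.count_nonzero(x_true))         # 3
print(np.linalg.norm(y - D @ x_true))   # 0.0
```

Recovering a code like `x_true` from `y` and `D` alone is exactly the sparse coding problem above.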

OMP Approach

OMP approximates the above optimization by solving one of the following two problems:

the sparsity-constrained sparse coding problem, given by

$$\min_{x} \|y - Dx\|_2^2 \quad \text{subject to} \quad \|x\|_0 \le T$$

the error-constrained sparse coding problem, given by

$$\min_{x} \|x\|_0 \quad \text{subject to} \quad \|y - Dx\|_2^2 \le \epsilon$$

Note that the columns of the dictionary $D$ should be normalized to unit $\ell_2$-length.
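For instance, if `D` is a numpy array of atoms (as in the earlier sketch), the normalization is a one-liner:

```python
import numpy as np

D = np.random.default_rng(0).standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)                      # scale each column to unit l2-length
assert np.allclose(np.linalg.norm(D, axis=0), 1.0)  # every atom now has norm 1
```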

Naive OMP

OMP uses a method conceptually similar to Gram-Schmidt, since Gram-Schmidt can also find up to $n$ orthogonal vectors to express a new observation $y$: for an $n$-dimensional vector space, you can find at most $n$ orthogonal directions.

The greedy OMP method selects, at each step, the column atom from the dictionary $D$ with the highest correlation to the current residual. Once the atom is selected, the signal is orthogonally projected onto the span of all selected atoms (that is, the set of all linear combinations of the selected atoms), the residual is recomputed, and the process repeats.

The detailed algorithm is given in the table below:

Step  Description
1     Input: Dictionary $D$, signal $y$, target sparsity $T$ or target error $\epsilon$
2     Output: Sparse representation $x$ such that $y \approx Dx$
3     Initialize: Set $I := (\,)$, $r := y$, $x := 0$
4     while (stopping criterion not met) do
5         $\hat{k} := \arg\max_k \, |d_k^T r|$
6         $I := (I, \hat{k})$
7         $x_I := (D_I)^+ y$
8         $r := y - D_I x_I$
9     end while

where $d_k$ is the $k$-th column of $D$, $I$ is an ordered list of selected column indexes, $D_I$ and $x_I$ are the columns of $D$ and entries of $x$ selected by the indexes in $I$, and $(D_I)^+$ denotes the Moore–Penrose pseudo-inverse of $D_I$.
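A direct translation of this table into numpy might look as follows (a sketch, not an optimized implementation; the function name `omp_naive` and the stopping arguments `target_sparsity` and `tol` are my own, covering the sparsity-constrained and error-constrained variants respectively):

```python
import numpy as np

def omp_naive(D, y, target_sparsity=None, tol=1e-10):
    """Naive OMP: re-solve the least-squares problem from scratch each iteration.

    D is assumed to have unit-norm columns. Stops once len(I) reaches
    target_sparsity or the squared residual norm drops below tol.
    """
    n, K = D.shape
    I = []                                   # ordered list of selected atom indexes
    r = y.copy()                             # current residual
    x = np.zeros(K)
    max_atoms = target_sparsity if target_sparsity is not None else n

    while len(I) < max_atoms and r @ r > tol:
        k_hat = int(np.argmax(np.abs(D.T @ r)))            # step 5: most correlated atom
        I.append(k_hat)                                    # step 6: grow the index set
        x_I, *_ = np.linalg.lstsq(D[:, I], y, rcond=None)  # step 7: x_I = (D_I)^+ y
        r = y - D[:, I] @ x_I                              # step 8: new residual
        x[I] = x_I
    return x, I
```

On well-conditioned toy data like the earlier sketch, `omp_naive(D, y, target_sparsity=3)` typically recovers the support of `x_true` exactly.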

Cholesky-based OMP

The naive OMP requires the pseudo-inverse of $D_I$, which brings a high computation cost at every iteration. However, the Gram matrix $D_I^T D_I$ is updated by simply adding a single row and column to it every iteration (details below). We can therefore use methods such as Cholesky or QR decomposition, which only require updating the last row of a triangular factor.


Let us denote $D_I^T D_I$ as $A$. $A$ is a symmetric positive-definite matrix, so the Cholesky decomposition can be used to factor it into the form $A = LL^*$, where $L$ is a lower triangular matrix and $L^*$ denotes the conjugate transpose of $L$ (for our real-valued case, $L^* = L^T$). For the $n$-th iteration, we have $A_n = D_{I_n}^T D_{I_n} = L_n L_n^*$ (again, $D_{I_n}$ denotes the subdictionary of $D$ selected by the ordered set $I_n$, and $I_n$ is the ordered set at the $n$-th iteration).

After selecting the atom $d_{\hat{k}}$ with the maximum correlation in step 5, $D_{I_n}$ gains a new column and $A_n$ becomes

$$A_{n+1} = \begin{pmatrix} A_n & v \\ v^T & 1 \end{pmatrix}, \qquad v = D_{I_n}^T d_{\hat{k}},$$

where each entry in $v$ is the dot product between a column of $D_{I_n}$ and $d_{\hat{k}}$, and the bottom-right entry is the self dot product of $d_{\hat{k}}$, i.e. $d_{\hat{k}}^T d_{\hat{k}} = 1$ since the atoms are normalized.

$L_n$ is updated by

$$L_{n+1} = \begin{pmatrix} L_n & 0 \\ w^T & \sqrt{1 - w^T w} \end{pmatrix}, \qquad \text{where } w \text{ solves the triangular system } L_n w = v.$$
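Putting the update together, here is a minimal sketch of the Cholesky-based variant (assuming unit-norm atoms and a fixed target sparsity of at least 1; the function name `omp_cholesky` is my own, and `scipy.linalg.solve_triangular` handles the forward/back substitutions):

```python
import numpy as np
from scipy.linalg import solve_triangular

def omp_cholesky(D, y, target_sparsity):
    """OMP with a progressively bordered Cholesky factor of A = D_I^T D_I.

    Assumes D has unit-norm columns, so each new diagonal entry of A is 1.
    """
    K = D.shape[1]
    I = []
    r = y.copy()
    x = np.zeros(K)
    L = np.ones((1, 1))                  # Cholesky factor for the first selected atom

    for _ in range(target_sparsity):
        k_hat = int(np.argmax(np.abs(D.T @ r)))
        if I:                            # border L with one new row: solve L w = v
            v = D[:, I].T @ D[:, k_hat]
            w = solve_triangular(L, v, lower=True)
            m = len(I)
            L_new = np.zeros((m + 1, m + 1))
            L_new[:m, :m] = L
            L_new[m, :m] = w
            L_new[m, m] = np.sqrt(1.0 - w @ w)
            L = L_new
        I.append(k_hat)
        b = D[:, I].T @ y                # solve A x_I = b via L (L^T x_I) = b
        z = solve_triangular(L, b, lower=True)
        x_I = solve_triangular(L.T, z, lower=False)
        r = y - D[:, I] @ x_I

    x[I] = x_I
    return x, I
```

Each iteration now costs one triangular bordering plus two triangular solves, instead of computing a pseudo-inverse from scratch.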

QR-based OMP

Batch OMP