Name four popular dimensionality reduction algorithms and briefly describe them.
- Principal component analysis (PCA) — uses an eigendecomposition of the data's covariance matrix to transform the original features into orthogonal eigenvectors (principal components). The most important components (those with the highest eigenvalues, i.e. the most explained variance) are then selected to represent the features in the transformed space
- Non-negative matrix factorization (NMF) — can be used to reduce dimensionality for certain problem types while preserving more information than PCA
- Embedding techniques — various embedding techniques, e.g. finding local neighbors as done in Local Linear Embedding, can be used to reduce dimensionality
- Clustering or centroid techniques — each value can be described as a member of a cluster, a linear combination of clusters, or a linear combination of cluster centroids

By far the most popular is PCA and similar eigen-decomposition-based variations.
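A minimal PCA sketch using scikit-learn, illustrating the eigendecomposition-based reduction described above (the toy data and component count are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: three features, two of which are strongly correlated
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
X = np.hstack([
    x,
    2 * x + rng.normal(scale=0.1, size=(100, 1)),  # nearly a multiple of x
    rng.normal(size=(100, 1)),                     # independent noise
])

# Keep the two components with the largest eigenvalues
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_) # fraction of variance per component
```

Because two of the three features are nearly collinear, the first component captures most of the variance, which is exactly why PCA can discard the remaining directions with little information loss.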
After doing dimensionality reduction, can you transform the data back into the original feature space? How?

Yes and no. Most dimensionality reduction techniques have inverse transformations, but signal is often lost when reducing dimensions, so the inverse transformation is usually only an approximation of the original data.
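A short sketch of this using scikit-learn's PCA, whose `inverse_transform` maps reduced data back to the original space (the data shapes here are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)  # 5-D -> 2-D

# inverse_transform projects back into the original 5-D space,
# but only using the 2 retained components, so the result is lossy
X_approx = pca.inverse_transform(X_reduced)
print(X_approx.shape)  # (200, 5)

# Nonzero reconstruction error: the variance carried by the
# discarded components cannot be recovered
mse = np.mean((X - X_approx) ** 2)
```

The reconstruction error is exactly the variance in the components that were thrown away, which is the "no" part of "yes and no".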