---
title: "Normalizing Numeric Features for Machine Learning Algorithms"
params:
  category: 3
  stacks: 0
  number: 206
  time: 45
  level: beginner
  tags: regression,statistics
  description: "Explains how to normalize continuous numeric features for
                distance-based machine learning algorithms such as kNN,
                k-means, and SVM. Demonstrates the use of min-max and 
                z-score normalization and explains mean-normalization and
                unit vector normalization."
date: "<small>`r Sys.Date()`</small>"
author: "<small>Martin Schedlbauer</small>"
email: "m.schedlbauer@neu.edu"
affilitation: "Northeastern University"
output: 
  bookdown::html_document2:
    toc: true
    toc_float: true
    collapsed: false
    number_sections: false
    code_download: true
    theme: spacelab
    highlight: tango
---

---
title: "<small>`r params$category`.`r params$number`</small><br/><span style='color: #2E4053; font-size: 0.9em'>`r rmarkdown::metadata$title`</span>"
---

```{r code=xfun::read_utf8(paste0(here::here(),'/R/_insert2DB.R')), include = FALSE}
```

------------------------------------------------------------------------

## Objectives

Upon completion of this lesson, you will be able to:

-   list the different methods for feature normalization
-   know when to apply normalization to numeric features
-   use min-max and z-score standardization

------------------------------------------------------------------------

## Motivation {#motiv}

Normalizing numeric features in machine learning means rescaling the values of each numeric feature in the dataset to a standard range. The goal is for each feature to contribute roughly proportionately to the final prediction. Normalization is critical for algorithms that measure the Euclidean distance between two data points: if one feature lies in the range of 0 to 1 and another in the range of 0 to 1000, the distance, and therefore the algorithm, is likely to be dominated by the second feature.

The normalization of numeric features before applying any kind of distance-based machine learning algorithm is a critical step in feature engineering.
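
To make the dominance effect concrete, the following minimal sketch compares Euclidean distances before and after rescaling. The three data points, the feature names `ratio` and `income`, and the helper `euclid()` are made up for illustration; the rescaling simply divides `income` by its assumed range of 0 to 1000.

```{r}
# Hypothetical data points: 'ratio' lies in 0-1, 'income' lies in 0-1000
p1 <- c(ratio = 0.10, income = 200)
p2 <- c(ratio = 0.90, income = 210)   # very different ratio, similar income
p3 <- c(ratio = 0.12, income = 400)   # similar ratio, different income

# Euclidean distance between two numeric vectors
euclid <- function(p, q) sqrt(sum((p - q)^2))

# On the raw scale, income differences swamp the ratio differences
d12_raw <- euclid(p1, p2)
d13_raw <- euclid(p1, p3)

# Assumption: both features have a known minimum of 0, so dividing by the
# maximum (1 for ratio, 1000 for income) puts both on a 0-1 scale
max_vals <- c(1, 1000)
d12_scaled <- euclid(p1 / max_vals, p2 / max_vals)
d13_scaled <- euclid(p1 / max_vals, p3 / max_vals)

# Which point is closer to p1 before and after rescaling?
c(raw_closest = ifelse(d12_raw < d13_raw, "p2", "p3"),
  scaled_closest = ifelse(d12_scaled < d13_scaled, "p2", "p3"))
```

On the raw scale `p2` appears closer to `p1` because only `income` matters; after rescaling, the large difference in `ratio` is taken into account and `p3` becomes the nearer point.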

## Feature Normalization Methods

The most common types of normalization are:

-   Min-Max Normalization
-   z-Score Standardization
-   Mean Normalization
-   Unit Vector Normalization

Normalization must be applied separately to each numeric feature. Different normalization methods could conceivably be used for different features. However, all features must be scaled to the same range.

> Feature normalization **only** applies to numeric features. It is not defined for categorical values. Those must be encoded differently in order for distance measures to be meaningful.
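
As a minimal sketch of applying normalization feature by feature, the chunk below rescales only the numeric columns of a small data frame and leaves the categorical column untouched. The data frame `df`, its column names, and the helper `min_max()` are hypothetical and chosen for illustration; min-max rescaling stands in for whichever method is selected.

```{r}
# Hypothetical data frame with two numeric features and one categorical feature
df <- data.frame(
  age    = c(23, 35, 47, 59),
  income = c(28000, 52000, 61000, 90000),
  region = factor(c("north", "south", "south", "west"))
)

# Min-max rescaling of a single numeric vector to the range 0 to 1
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Identify the numeric columns; factors and characters are skipped
is_num <- sapply(df, is.numeric)

# Rescale each numeric column separately, using its own minimum and maximum
df_norm <- df
df_norm[is_num] <- lapply(df[is_num], min_max)

df_norm
```

Each numeric column is rescaled with its own minimum and maximum, so `age` and `income` end up on the same 0 to 1 scale, while `region` still needs a separate encoding step.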

### Min-Max Normalization

*Min-Max Normalization*, or *Feature Scaling*, is a common technique that rescales a numeric feature to a fixed range of 0 to 1. It is important that all features be scaled to the same range.

The formula for min-max normalization of a feature value $x_i$ for the vector $x = (x_1, x_2, ..., x_n)$ is given below:

$$
\frac{x_i - \min(x)}{\max(x) - \min(x)}
$$

Essentially, we find the minimum and maximum values for a feature (*i.e.*, a column in the data), and for each feature value we subtract the minimum from the value and divide it by the range. This scales all values to a range from 0 to 1.
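
The formula translates almost directly into R. The function name `normalize_min_max()` and the sample values are assumptions made for this sketch, as is the choice to map a constant feature to 0 rather than divide by zero.

```{r}
# Rescale a numeric vector to the range 0 to 1 (min-max normalization)
normalize_min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)   # c(minimum, maximum)
  if (rng[1] == rng[2]) {
    # Assumption: a constant feature has a range of zero; return zeros
    # instead of dividing by zero
    return(x - rng[1])
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

x <- c(10, 20, 25, 40, 50)        # made-up feature values
x_mm <- normalize_min_max(x)
x_mm                              # values: 0, 0.25, 0.375, 0.75, 1
```

The smallest value always maps to 0 and the largest to 1, so a single extreme outlier compresses all other values into a narrow band.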

### *z*-Score Standardization

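*Standardization* (or *z-score normalization*) is another common method for feature normalization in which each feature is rescaled to have a mean of 0 and a standard deviation of 1 (µ=0 and σ=1), matching the parameters of a standard normal distribution. Standardization only shifts and rescales the values; it does not change the shape of their distribution, so a skewed feature remains skewed. It is useful when the model being trained expects features that are centered and on a comparable scale, or when the features are roughly Gaussian (normal) to begin with. Linear regression is often cited as an algorithm that assumes normally distributed feature values, but its normality assumption concerns the residuals rather than the features; standardizing the features mainly helps with gradient-based fitting and with comparing coefficients.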

The formula for *z*-score standardization of a feature value $x_i$ for the vector $x = (x_1, x_2, ..., x_n)$ is given below:

$$
\frac{x_i - \mu(x)}{\sigma(x)}
$$

In the formula, $\mu(x)$ is the mean of the feature values in $x$ and $\sigma(x)$ is their standard deviation. The standardized values are centered around 0 and are in principle unbounded (from -∞ to +∞), but for approximately normally distributed data about 95% of them fall within two standard deviations of 0.
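
A minimal sketch of the calculation in R follows. The function name `standardize_z()` and the sample values are assumptions for illustration. Note that R's `sd()` computes the sample standard deviation (denominator n - 1), which is also what base R's `scale()` uses by default.

```{r}
# z-score standardization: subtract the mean, divide by the standard deviation
standardize_z <- function(x) (x - mean(x)) / sd(x)

x <- c(10, 20, 25, 40, 50)   # made-up feature values
z <- standardize_z(x)

# Base R's scale() centers and scales the same way; it returns a
# one-column matrix, so convert it back to a plain numeric vector
z_scale <- as.numeric(scale(x))

mean(z)   # approximately 0, up to floating-point error
sd(z)     # 1
```

Using `scale()` is convenient when standardizing several columns of a matrix or data frame at once, since it processes each column separately.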

### Mean Normalization

*Mean Normalization* is a variant of *min-max normalization* where we subtract the mean of the feature vector from the values rather than the minimum. The resulting values are centered around zero and always fall between -1 and 1, but they span an interval of width 1 rather than the full range from -1 to 1.

The formula for mean normalization of a feature value $x_i$ for the vector $x = (x_1, x_2, ..., x_n)$ is given below:

$$
\frac{x_i - \mu(x)}{\max(x) - \min(x)}
$$

In the formula, $\mu(x)$ is the mean of the values in the feature vector $x$.
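
The same made-up values used for illustration elsewhere can show the effect of the formula. The function name `normalize_mean()` is an assumption for this sketch.

```{r}
# Mean normalization: subtract the mean, divide by the range
normalize_mean <- function(x) (x - mean(x)) / (max(x) - min(x))

x <- c(10, 20, 25, 40, 50)   # made-up feature values; mean is 29, range is 40
m <- normalize_mean(x)
m                            # values: -0.475, -0.225, -0.1, 0.275, 0.525

mean(m)          # approximately 0, up to floating-point error
diff(range(m))   # 1: the rescaled values span an interval of width one
```

Because the divisor is the range, the difference between the largest and smallest rescaled value is always one, while the mean of the rescaled values is zero.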

### Unit Vector Normalization

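*Unit Vector Normalization* rescales the feature vector so that it has a length of one: each value $x_i$ is divided by the Euclidean length of the vector, $\sqrt{x_1^2 + x_2^2 + ... + x_n^2}$. This is also known as *Vector Normalization* or *Euclidean Normalization*.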
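
As a minimal sketch, rescaling to unit length can be written in a few lines of R. The function name `normalize_unit()` and the example vector are assumptions for illustration, as is the decision to return an all-zero vector unchanged.

```{r}
# Unit vector normalization: divide each value by the vector's Euclidean length
normalize_unit <- function(x) {
  len <- sqrt(sum(x^2))      # Euclidean (L2) length of the vector
  if (len == 0) return(x)    # Assumption: leave an all-zero vector as-is
  x / len
}

x <- c(3, 4)                 # made-up vector with length 5
u <- normalize_unit(x)
u                            # values: 0.6, 0.8

sqrt(sum(u^2))               # length of the rescaled vector: 1
```

The direction of the vector is preserved; only its length changes.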

## Summary

In general, normalization improves the performance and stability of machine learning algorithms by making them less sensitive to the scale of the input features, which can also make the learning process faster and more effective. This is particularly important for algorithms that use a distance function (like k-Nearest Neighbors (kNN) or k-Means), algorithms trained with gradient descent (like neural networks or linear regression fit by gradient descent), and algorithms that use regularization (like ridge or lasso regression).

## Tutorial

The video below demonstrates the application of the min-max and *z*-score normalization schemes.

```{=html}
<iframe src="https://player.vimeo.com/video/829963218?h=c637f915d5&amp;title=0&amp;byline=0&amp;portrait=0&amp;speed=0&amp;badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" width="640" height="360" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen title="Normalizing Feature Values" data-external="1"></iframe>
```
## **Slide Deck**: [Feature Normalization](s-3-206-feature-normalization.pptx)

## Files & Resources

```{r zipFiles, echo=FALSE}
zipName = sprintf("LessonFiles-%s-%s.zip", 
                 params$category,
                 params$number)

textALink = paste0("All Files for Lesson ", 
               params$category,".",params$number)

# downloadFilesLink() is included from _insert2DB.R
knitr::raw_html(downloadFilesLink(".", zipName, textALink))
```

------------------------------------------------------------------------

## References

No references.

## Errata

None collected yet. Let us know.

```{=html}
<script src="https://form.jotform.com/static/feedback2.js" type="text/javascript">
  new JotformFeedback({
    formId: "212187072784157",
    buttonText: "Feedback",
    base: "https://form.jotform.com/",
    background: "#F59202",
    fontColor: "#FFFFFF",
    buttonSide: "left",
    buttonAlign: "center",
    type: false,
    width: 700,
    height: 500,
    isCardForm: false
  });
</script>
```
```{r code=xfun::read_utf8(paste0(here::here(),'/R/_deployKnit.R')), include = FALSE}
```
