Standardization in ML

KoshurAI
2 min read · Dec 31, 2022


Standardization is a common preprocessing step in machine learning (ML) that involves transforming the input data onto a common scale. The goal of standardization is to improve the performance of the ML model by preventing features with large numeric ranges from dominating the learning process, stabilizing the variance of the input data, and improving the convergence of the model during training.

Here are a few examples of standardization in ML:

Scaling: Scaling (often called min-max scaling) is a technique that transforms the input data by subtracting the minimum value of each feature and dividing the result by the feature's range (maximum value minus minimum value). This transformation maps the data into the range [0, 1]. Scaling is often used to standardize features that have a large range of values or that are measured on different scales.

For example, consider a dataset with two features: age (measured in years) and income (measured in dollars). The age feature ranges from 18 to 80, while the income feature ranges from 20,000 to 200,000. To standardize these features using scaling, you would transform the age feature as follows:

age_standardized = (age - age_min) / (age_max - age_min)

Where age_min and age_max are the minimum and maximum values of the age feature, respectively. The resulting age_standardized feature would have a range of [0, 1].
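As a minimal sketch of this formula, here is min-max scaling applied to some hypothetical age values (the sample data is illustrative, not from the article):

```python
import numpy as np

# Hypothetical sample values for the age feature (in years)
age = np.array([18.0, 25.0, 40.0, 62.0, 80.0])

# Min-max scaling: (age - age_min) / (age_max - age_min)
age_standardized = (age - age.min()) / (age.max() - age.min())

print(age_standardized)  # values now lie in [0, 1]
```

The minimum value maps to 0 and the maximum to 1; every other value lands proportionally in between.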

Normalization: Normalization (this variant is also widely known as z-score normalization, or simply standardization) is a technique that transforms the input data by subtracting the mean value of each feature from each data point and dividing the result by the standard deviation of the feature. This transformation results in a standardized feature with a mean of 0 and a standard deviation of 1.

For example, consider the same dataset with the age and income features. To standardize these features using normalization, you would transform the age feature as follows:

age_standardized = (age - age_mean) / age_std

Where age_mean and age_std are the mean and standard deviation of the age feature, respectively. The resulting age_standardized feature would have a mean of 0 and a standard deviation of 1.
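The z-score formula can be sketched the same way, again using hypothetical age values:

```python
import numpy as np

# Hypothetical sample values for the age feature (in years)
age = np.array([18.0, 25.0, 40.0, 62.0, 80.0])

# Z-score normalization: (age - age_mean) / age_std
age_standardized = (age - age.mean()) / age.std()

# The transformed feature has mean 0 and standard deviation 1
print(age_standardized.mean(), age_standardized.std())
```

Unlike min-max scaling, the result is not bounded to [0, 1]; instead each value expresses how many standard deviations it sits from the mean.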

Whitening: Whitening is a technique that transforms the input data by decorrelating the features and scaling them to have unit variance. This transformation is often used to improve the convergence of neural networks and other types of ML models.

For example, consider a dataset with three features: x1, x2, and x3. To standardize these features using whitening, you would first compute the covariance matrix of the features and then apply a linear transformation to the data using the eigenvectors and eigenvalues of the covariance matrix. The resulting transformed features would be uncorrelated and have unit variance.
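The steps above (covariance matrix, eigendecomposition, linear transformation) can be sketched as PCA whitening; the correlated sample data below is synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: 500 samples of three correlated features x1, x2, x3
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

# Center the data and compute the covariance matrix of the features
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# Eigendecomposition of the (symmetric) covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Whitening: rotate onto the eigenvectors, then scale each
# component by 1/sqrt(eigenvalue) to get unit variance
X_white = X_centered @ eigvecs / np.sqrt(eigvals)

# The whitened features are uncorrelated with unit variance,
# so their covariance matrix is (approximately) the identity
print(np.cov(X_white, rowvar=False).round(6))
```

Because the rotation diagonalizes the covariance matrix and the scaling normalizes each eigenvalue to 1, the covariance of the whitened data is the identity matrix up to floating-point error.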

Standardization is a useful technique for many types of ML models, including linear models, support vector machines, and neural networks. By putting all features on a comparable scale, it prevents any single feature from dominating the model, stabilizes the variance of the input data, and improves the convergence of the model during training.
