Label Encoding

tags: #python/data_science/preprocessing

What is label encoding and how is it used?

LabelEncoder is a scikit-learn class that encodes categorical labels as integers from 0 to n_classes-1.

It is designed for encoding the target variable (y) in supervised learning tasks. It can also be applied to a single nominal (unordered) column, subject to the caveat in the next section.
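A minimal sketch of the integer mapping, assuming a small made-up list of labels (the `labels` values are hypothetical and only for illustration). After fitting, the `classes_` attribute holds the unique labels in sorted order, and each label's code is its position in that array.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical example labels, not taken from any real dataset
labels = ["dog", "cat", "bird", "cat"]

le = LabelEncoder()
codes = le.fit_transform(labels)

# classes_ holds the unique labels in sorted order; each label's
# integer code is its index in this array
print(le.classes_)  # ['bird' 'cat' 'dog']
print(codes)        # [2 1 0 1]
```

Because the codes follow sorted label order, they carry no meaning beyond identifying the class.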

Caveat for using LabelEncoder on Categorical Variables

Converting the classes of a categorical variable into integers can lead an algorithm to assume an inherent order between the classes when there is none. For nominal features, prefer one-hot (dummy) encoding.
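One way to one-hot encode a nominal feature is pandas' `get_dummies`. The sketch below assumes a hypothetical `color` column in a throwaway DataFrame named `df_demo`; neither name comes from the note itself.

```python
import pandas as pd

# Hypothetical feature column, used only for illustration
df_demo = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One indicator column per category, so no ordering is implied
dummies = pd.get_dummies(df_demo["color"], prefix="color")
print(dummies.columns.tolist())  # ['color_blue', 'color_green', 'color_red']
```

Each row has exactly one active indicator, so the model sees the categories as unrelated rather than ranked.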

Import library

from sklearn.preprocessing import LabelEncoder

Create instance of the label encoder object

# create an instance of LabelEncoder
le = LabelEncoder()

Fit and transform classes

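# fit and transform the categorical labels
# pass the column's values, not the column name string
# (df is assumed to be an existing pandas DataFrame)
encoded_labels = le.fit_transform(df["TARGET COL"])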

print(encoded_labels)

# working with a DataFrame: overwrite the column with its encoded values
df['TARGET COL'] = le.fit_transform(df['TARGET COL'])
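A self-contained sketch of the DataFrame workflow, assuming a small made-up DataFrame in place of the real data. The `TARGET ENCODED` column name and the sample values are hypothetical. It also shows `inverse_transform`, which maps codes back to the original labels, and that `transform` rejects labels it did not see during fitting.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical DataFrame standing in for the real data
df = pd.DataFrame({"TARGET COL": ["yes", "no", "yes", "maybe"]})

le = LabelEncoder()
# Write to a new (hypothetical) column so the original labels stay visible
df["TARGET ENCODED"] = le.fit_transform(df["TARGET COL"])

# inverse_transform maps integer codes back to the original labels
decoded = le.inverse_transform(df["TARGET ENCODED"])

# transform raises ValueError for a label that was not seen during fit
try:
    le.transform(["unknown"])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True

print(df)
print(decoded)
print(unseen_rejected)
```

Keeping the fitted `le` object around is what makes decoding predictions back to readable labels possible later.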