Various ways to create a one-hot encoding

Using numpy:

In [1]: import numpy as np
In [2]: x = np.array([0, 1, 2, 0, 0])

# Add a trailing axis so x becomes a column of shape (5, 1).
In [4]: x[:, np.newaxis]
Out[4]: 
array([[0],
       [1],
       [2],
       [0],
       [0]])

# Broadcasting: comparing shape (3,) with shape (5, 1) yields a (5, 3) boolean array.
In [5]: np.arange(3) == x[:, np.newaxis]
Out[5]: 
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True],
       [ True, False, False],
       [ True, False, False]], dtype=bool)
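
To see why the comparison produces one row per label, it can help to look at the shapes involved. The sketch below is only an illustration; the names `classes`, `column`, and `mask` are introduced here and are not part of the session above.

```python
import numpy as np

x = np.array([0, 1, 2, 0, 0])

classes = np.arange(3)        # shape (3,): one entry per class
column = x[:, np.newaxis]     # shape (5, 1): one row per sample

# NumPy stretches (3,) and (5, 1) to a common shape (5, 3),
# then compares element by element.
mask = classes == column

print(classes.shape, column.shape, mask.shape)

# Each row has exactly one True, in the column matching that sample's label.
print(mask.sum(axis=1))
```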

# Cast the booleans to floats. np.float was removed in NumPy 1.24, so use the builtin float.
In [35]: (np.arange(3) == x[:, np.newaxis]).astype(float)
Out[35]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.]])
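
The same result can also be obtained by indexing the rows of an identity matrix. This is a minimal sketch rather than part of the original session: the helper name `one_hot` and its `num_classes` parameter are made up for illustration, and it assumes the labels are non-negative integers.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Hypothetical helper: one-hot encode non-negative integer labels."""
    labels = np.asarray(labels)
    if num_classes is None:
        # Assumption: classes are numbered 0..max(labels).
        num_classes = int(labels.max()) + 1
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing with the label array picks one row per sample.
    return np.eye(num_classes, dtype=float)[labels]

x = np.array([0, 1, 2, 0, 0])
encoded = one_hot(x)

# argmax along each row recovers the original labels.
decoded = encoded.argmax(axis=1)
print(encoded)
print(decoded)
```

Passing `num_classes` explicitly is useful when a batch happens not to contain the largest class.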

Using sklearn:

In [8]: x
Out[8]: array([0, 1, 2, 0, 0])

In [9]: from sklearn.preprocessing import OneHotEncoder

# reshape(-1, 1) turns x into a column of shape (5, 1); OneHotEncoder expects 2-D input.
In [10]: x.reshape(-1, 1)
Out[10]: 
array([[0],
       [1],
       [2],
       [0],
       [0]])

# fit_transform() returns a SciPy sparse matrix (CSR); todense() converts it to a numpy matrix.
In [11]: OneHotEncoder().fit_transform(x.reshape(-1, 1)).todense()
Out[11]: 
matrix([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.],
        [ 1.,  0.,  0.],
        [ 1.,  0.,  0.]])
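
If a plain ndarray is preferred over a numpy matrix, the sparse result also has a `toarray()` method. The sketch below additionally shows the fitted `categories_` attribute and `inverse_transform()`; the variable names are illustrative only, and it assumes a reasonably recent scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

x = np.array([0, 1, 2, 0, 0])

encoder = OneHotEncoder()
sparse_encoded = encoder.fit_transform(x.reshape(-1, 1))

# toarray() gives a regular ndarray instead of a numpy matrix.
dense = sparse_encoded.toarray()

# categories_ holds the distinct values found in each input column.
categories = encoder.categories_[0]

# inverse_transform maps one-hot rows back to the original labels.
decoded = encoder.inverse_transform(dense).ravel()

print(dense)
print(categories)
print(decoded)
```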

Using Keras's to_categorical utility:

In [12]: x
Out[12]: array([0, 1, 2, 0, 0])

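# Newer Keras releases no longer ship keras.utils.np_utils; import from keras.utils instead.
In [13]: from keras.utils import to_categorical

# The second argument is the number of classes. x.max() + 1 stays correct
# even when some label never occurs in x, unlike len(set(x)).
In [14]: to_categorical(x, x.max() + 1)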
Out[14]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.]])
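
to_categorical places the 1 in the column given by the label value, so the class count has to exceed the largest label. When labels are not contiguous (say 0, 2, 5), counting distinct values gives too small a number. The NumPy-only sketch below shows the difference and one possible workaround, which is to remap the labels to 0..k-1 with `np.unique` first; the variable names are illustrative.

```python
import numpy as np

labels = np.array([0, 2, 5, 2])

# Number of distinct labels versus the width needed to index by label value.
num_distinct = len(set(labels.tolist()))
width_needed = int(labels.max()) + 1
print(num_distinct, width_needed)

# Workaround: map each label to its position among the sorted distinct labels.
classes, indices = np.unique(labels, return_inverse=True)

# One column per distinct label; classes[j] tells which label column j stands for.
encoded = np.eye(len(classes))[indices]

print(classes)
print(indices)
print(encoded)
```

The `classes` array needs to be kept around to translate column positions back to the original label values.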