Curriculum Learning

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.4701

Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them “curriculum learning”.

The paper's experiments show that significant improvements in generalization can be achieved.
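As a rough sketch of the idea (not the paper's exact experimental setup; the per-example difficulty score is an assumption you must supply, e.g., sentence length or a teacher model's loss), a curriculum can be implemented by sorting examples by difficulty and gradually widening the pool that batches are sampled from:

import numpy as np

def curriculum_batches(x, y, difficulty, n_stages=5, batch_size=32):
    # Sort indices so the easiest examples come first.
    order = np.argsort(difficulty)
    n = len(x)
    for stage in range(1, n_stages + 1):
        # Widen the candidate pool stage by stage: 1/5, 2/5, ... of the data.
        pool = order[: max(batch_size, n * stage // n_stages)]
        for _ in range(n // batch_size):
            idx = np.random.choice(pool, size=batch_size)
            yield x[idx], y[idx]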


Scheduled Sampling

During RNN training, the model's own predicted value is fed as input instead of the true sequence. This addresses the failure mode where one wrong prediction sends the model down a completely wrong output sequence. It also has the merit of training the model under the same conditions it faces at prediction time.

We propose a curriculum learning strategy to gently change the training process from a fully guided scheme using the true previous token, towards a less guided scheme which mostly uses the generated token instead.

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
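A minimal sketch of the mechanism (the RNN cell is stubbed out as step_fn, and the linear decay at the end is only one of the schedules the paper discusses):

import numpy as np

def train_step(step_fn, true_seq, epsilon):
    # step_fn(prev_token) stands in for one decoder step of the RNN.
    prev = true_seq[0]
    predictions = []
    for t in range(1, len(true_seq)):
        pred = step_fn(prev)
        predictions.append(pred)
        # Coin flip: feed the ground truth with probability epsilon,
        # otherwise feed the model's own previous prediction.
        prev = true_seq[t] if np.random.rand() < epsilon else pred
    return predictions

# epsilon starts near 1 (fully guided) and decays toward 0, e.g. linearly:
# epsilon = max(0.0, 1.0 - step / float(total_steps))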


Geometric Mean on Skewed Data

If a random variable X follows a log-normal distribution, the geometric mean of X equals its median. The geometric mean is also always less than or equal to the arithmetic mean. This makes the geometric mean useful for data containing outliers.

For example, if website loading latency is modeled as log-normal, the geometric mean of the latencies equals the median. Since the median is less affected by outliers, this yields a more representative latency figure.
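A quick numpy check (the distribution parameters are arbitrary; with mu = 0 both the geometric mean and the median should come out near exp(0) = 1):

import numpy as np

latency = np.random.lognormal(mean=0.0, sigma=1.0, size=100000)

arithmetic = latency.mean()
geometric = np.exp(np.log(latency).mean())  # geometric mean via log space
median = np.median(latency)

# The arithmetic mean is pulled up by the heavy right tail
# (for a log-normal, E[X] = exp(mu + sigma^2/2), about 1.65 here),
# while geometric ~ median ~ 1.
print(arithmetic, geometric, median)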


Changing numpy array columns or shape

Changing the order of columns. This is useful when you want to reorder image data, e.g., RGB -> BGR.

In [14]: x = np.arange(10)

In [15]: x
Out[15]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [16]: np.resize(x, (5, 2))
Out[16]: 
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [17]: np.resize(x, (5, 2))[:, ::-1]
Out[17]: 
array([[1, 0],
       [3, 2],
       [5, 4],
       [7, 6],
       [9, 8]])

Changing the order of axes. For images, this is useful when you want to move the channel axis to an arbitrary position. As an example, matplotlib.pyplot.imshow() accepts images in the form (x, y, channel), but your data might be in the form (channel, x, y).

# 256x256 image. Channel (or rgb) is at the front.
In [26]: x = np.ones((3, 256, 256))

In [27]: x.shape
Out[27]: (3, 256, 256)

# Move the channel axis to the last.
In [29]: np.rollaxis(x, 0, 3).shape
Out[29]: (256, 256, 3)

np.transpose can be used to change the order of axes.

In [25]: x = np.ones((1, 2, 3))
In [26]: x.shape
Out[26]: (1, 2, 3)
In [28]: x.transpose((2, 0, 1)).shape
Out[28]: (3, 1, 2)
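In recent numpy versions (1.11+), np.moveaxis expresses the same channel move more directly than np.rollaxis:

import numpy as np

x = np.ones((3, 256, 256))          # channel-first image
# Move axis 0 (channels) to the last position.
print(np.moveaxis(x, 0, -1).shape)  # (256, 256, 3)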


Why Does Deep Learning Work So Well?

I plan to keep updating this post as I collect related articles.

  1. The laws of the universe happen to be well suited for deep learning to learn.
  2. A function that can be represented simply with K layers can blow up exponentially in size when implemented with only 2 layers.
  3. It is a class of models that can exploit the ever-growing data from IoT, mobile devices, and so on.


Pseudo Label – a Semi-Supervised Learning Method

Toward the end of training, run predictions on the unlabeled data, then fine-tune the model as if the predicted labels were the true labels.
http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf
Why pseudo labels? – Kaggle Forum

Note that this is distinct from self-training.
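A minimal sketch of the loop (a sklearn classifier stands in for the network, and the paper's schedule for weighting the pseudo-label loss term is omitted):

import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_fit(x_labeled, y_labeled, x_unlabeled):
    model = LogisticRegression()
    # 1. Train on the labeled data only.
    model.fit(x_labeled, y_labeled)
    # 2. Predict labels for the unlabeled data and treat them as real.
    pseudo_y = model.predict(x_unlabeled)
    # 3. Fine-tune on the combined data.
    x_all = np.concatenate([x_labeled, x_unlabeled])
    y_all = np.concatenate([y_labeled, pseudo_y])
    model.fit(x_all, y_all)
    return model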


Batch Normalization

Covariate shift – A Literature Survey on Domain Adaptation of Statistical Classifiers
Why does batch normalization help? – Quora
Batch Normalization – SanghyukChun’s Blog


Saving and loading numpy arrays

import bcolz

# save_array writes arr to disk as a compressed bcolz carray;
# load_array reads it back as a plain numpy array.
def save_array(fname, arr):
    bcolz.carray(arr, rootdir=fname, mode='w').flush()

def load_array(fname):
    return bcolz.open(fname)[:]

Code snippet from http://course.fast.ai/.
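If compression is not needed, plain numpy does the same job:

import numpy as np

arr = np.arange(10)
np.save('arr.npy', arr)     # '.npy' is appended automatically if missing
loaded = np.load('arr.npy')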


Various ways to create One Hot Encoding

Using numpy:

In [2]: x = np.array([0, 1, 2, 0, 0])
In [4]: x[:, np.newaxis]
Out[4]: 
array([[0],
       [1],
       [2],
       [0],
       [0]])

# Broadcasting.
In [5]: np.arange(3) == x[:, np.newaxis]
Out[5]: 
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True],
       [ True, False, False],
       [ True, False, False]], dtype=bool)

# Just convert the booleans to floats.
In [35]: (np.arange(3) == x[:, np.newaxis]).astype(np.float)
Out[35]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.]])

Using sklearn:

In [8]: x
Out[8]: array([0, 1, 2, 0, 0])

In [9]: from sklearn.preprocessing import OneHotEncoder

# reshape() changes the shape of x to (5, 1).
In [10]: x.reshape(-1, 1)
Out[10]: 
array([[0],
       [1],
       [2],
       [0],
       [0]])

# Return value of fit_transform() is csr_matrix. todense() changes it to numpy matrix.
In [11]: OneHotEncoder().fit_transform(x.reshape(-1, 1)).todense()
Out[11]: 
matrix([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.],
        [ 1.,  0.,  0.],
        [ 1.,  0.,  0.]])

Using Keras's numpy utils:

In [12]: x
Out[12]: array([0, 1, 2, 0, 0])

In [13]: from keras.utils.np_utils import to_categorical
In [14]: to_categorical(x, len(set(x)))
Out[14]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.]])
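pandas offers yet another way, if it is already a dependency (the astype(int) is only there because some pandas versions return boolean indicator columns):

import numpy as np
import pandas as pd

x = np.array([0, 1, 2, 0, 0])
# Each distinct value becomes an indicator column.
print(pd.get_dummies(x).astype(int).values)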


Linear Regression using Keras

This is example code that performs linear regression using Keras.

import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

import numpy as np
from numpy.random import random

x = random((30, 2))
w = np.array([3., 2.])
b = 1.
y = np.dot(x, w) + b
print(x)
print(y)
model = Sequential()
# 30 observations. Each observation has 2 features.
model.add(Dense(1, input_shape=(2,)))
# MSE loss is the natural choice for linear regression.
model.compile(optimizer=SGD(lr=0.1), loss='mse')
model.fit(x, y, nb_epoch=60, batch_size=1)
print(model.get_weights())

Output:

Using Theano backend.
Using gpu device 0: GeForce GTX 1080 (CNMeM is disabled, cuDNN 5105)
[[ 0.66716875  0.44536398]
 [ 0.58253148  0.45428442]
 [ 0.49094032  0.77897022]
 [ 0.16345683  0.27807741]
 [ 0.29913647  0.87666064]
 [ 0.41150342  0.4329423 ]
 [ 0.30724994  0.56194767]
 [ 0.1448368   0.61572276]
 [ 0.89288341  0.45568394]
 [ 0.20008006  0.1446671 ]
 [ 0.58501705  0.1407299 ]
 [ 0.8924096   0.58803216]
 [ 0.76954948  0.95146172]
 [ 0.17315788  0.03576668]
 [ 0.01515587  0.36599027]
 [ 0.77232613  0.35686848]
 [ 0.4897022   0.52092717]
 [ 0.50756237  0.20100097]
 [ 0.8372522   0.53871228]
 [ 0.2223611   0.5919245 ]
 [ 0.89898591  0.24163213]
 [ 0.571022    0.2140571 ]
 [ 0.55041835  0.00383233]
 [ 0.08253098  0.64526628]
 [ 0.3512973   0.53963146]
 [ 0.73578765  0.65469051]
 [ 0.91344962  0.40350727]
 [ 0.74023006  0.34414037]
 [ 0.41329666  0.22543498]
 [ 0.82787326  0.41838276]]
[ 3.8922342   3.65616327  4.03076141  2.0465253   3.6507307   3.10039487
  3.04564516  2.66595593  4.59001811  1.88957438  3.03651096  4.85329311
  5.21157187  1.591007    1.77744815  4.03071534  3.51096094  2.92468906
  4.58918115  2.8509323   4.18022197  3.14118018  2.65891972  2.53812548
  3.13315483  4.51674398  4.54736341  3.90897091  2.69075993  4.32038531]
Epoch 1/60
30/30 [==============================] - 0s - loss: 1.6538      
Epoch 2/60
30/30 [==============================] - 0s - loss: 0.1377     
Epoch 3/60
30/30 [==============================] - 0s - loss: 0.0793     
Epoch 4/60
30/30 [==============================] - 0s - loss: 0.0370     
Epoch 5/60
30/30 [==============================] - 0s - loss: 0.0265     
Epoch 6/60
30/30 [==============================] - 0s - loss: 0.0146     
Epoch 7/60
30/30 [==============================] - 0s - loss: 0.0096     
Epoch 8/60
30/30 [==============================] - 0s - loss: 0.0043     
Epoch 9/60
30/30 [==============================] - 0s - loss: 0.0030     
Epoch 10/60
30/30 [==============================] - 0s - loss: 0.0020     
Epoch 11/60
30/30 [==============================] - 0s - loss: 0.0012     
Epoch 12/60
30/30 [==============================] - 0s - loss: 6.9193e-04     
Epoch 13/60
30/30 [==============================] - 0s - loss: 3.6242e-04     
Epoch 14/60
30/30 [==============================] - 0s - loss: 2.2359e-04     
Epoch 15/60
30/30 [==============================] - 0s - loss: 1.1410e-04     
Epoch 16/60
30/30 [==============================] - 0s - loss: 7.5656e-05     
Epoch 17/60
30/30 [==============================] - 0s - loss: 4.6557e-05     
Epoch 18/60
30/30 [==============================] - 0s - loss: 2.9460e-05     
Epoch 19/60
30/30 [==============================] - 0s - loss: 1.6638e-05     
Epoch 20/60
30/30 [==============================] - 0s - loss: 1.0647e-05     
Epoch 21/60
30/30 [==============================] - 0s - loss: 6.4342e-06     
Epoch 22/60
30/30 [==============================] - 0s - loss: 3.5493e-06     
Epoch 23/60
30/30 [==============================] - 0s - loss: 1.8375e-06     
Epoch 24/60
30/30 [==============================] - 0s - loss: 1.3024e-06     
Epoch 25/60
30/30 [==============================] - 0s - loss: 8.3916e-07     
Epoch 26/60
30/30 [==============================] - 0s - loss: 5.3163e-07     
Epoch 27/60
30/30 [==============================] - 0s - loss: 2.8679e-07     
Epoch 28/60
30/30 [==============================] - 0s - loss: 1.5040e-07     
Epoch 29/60
30/30 [==============================] - 0s - loss: 1.1201e-07     
Epoch 30/60
30/30 [==============================] - 0s - loss: 6.0981e-08     
Epoch 31/60
30/30 [==============================] - 0s - loss: 4.7074e-08     
Epoch 32/60
30/30 [==============================] - 0s - loss: 2.9919e-08     
Epoch 33/60
30/30 [==============================] - 0s - loss: 1.6059e-08     
Epoch 34/60
30/30 [==============================] - 0s - loss: 9.3970e-09     
Epoch 35/60
30/30 [==============================] - 0s - loss: 5.7633e-09     
Epoch 36/60
30/30 [==============================] - 0s - loss: 3.3312e-09     
Epoch 37/60
30/30 [==============================] - 0s - loss: 2.1822e-09     
Epoch 38/60
30/30 [==============================] - 0s - loss: 1.2432e-09     
Epoch 39/60
30/30 [==============================] - 0s - loss: 6.8956e-10     
Epoch 40/60
30/30 [==============================] - 0s - loss: 4.4050e-10     
Epoch 41/60
30/30 [==============================] - 0s - loss: 2.5711e-10     
Epoch 42/60
30/30 [==============================] - 0s - loss: 1.5499e-10     
Epoch 43/60
30/30 [==============================] - 0s - loss: 8.9069e-11     
Epoch 44/60
30/30 [==============================] - 0s - loss: 4.6494e-11     
Epoch 45/60
30/30 [==============================] - 0s - loss: 3.0863e-11     
Epoch 46/60
30/30 [==============================] - 0s - loss: 1.5283e-11     
Epoch 47/60
30/30 [==============================] - 0s - loss: 8.9050e-12     
Epoch 48/60
30/30 [==============================] - 0s - loss: 4.8648e-12     
Epoch 49/60
30/30 [==============================] - 0s - loss: 3.5887e-12     
Epoch 50/60
30/30 [==============================] - 0s - loss: 2.0066e-12     
Epoch 51/60
30/30 [==============================] - 0s - loss: 1.1189e-12     
Epoch 52/60
30/30 [==============================] - 0s - loss: 6.5086e-13     
Epoch 53/60
30/30 [==============================] - 0s - loss: 3.3727e-13     
Epoch 54/60
30/30 [==============================] - 0s - loss: 1.7621e-13     
Epoch 55/60
30/30 [==============================] - 0s - loss: 9.8529e-14     
Epoch 56/60
30/30 [==============================] - 0s - loss: 5.6370e-14     
Epoch 57/60
30/30 [==============================] - 0s - loss: 5.3054e-14     
Epoch 58/60
30/30 [==============================] - 0s - loss: 4.9738e-14     
Epoch 59/60
30/30 [==============================] - 0s - loss: 3.9790e-14     
Epoch 60/60
30/30 [==============================] - 0s - loss: 4.4527e-14     
[array([[ 3.        ],
       [ 1.99999952]], dtype=float32), array([ 1.00000024], dtype=float32)]
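The learned weights recover w = [3, 2] and b = 1. As a sanity check, the same coefficients can be obtained in closed form with least squares (reusing x and y from the script above):

import numpy as np

# Append a column of ones so the bias is estimated jointly.
design = np.hstack([x, np.ones((len(x), 1))])
coef = np.linalg.lstsq(design, y)[0]
print(coef)  # approximately [3., 2., 1.]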
