Tag: deep learning

  • Polyak averaging and gradient accumulation in Keras

    I discovered both of these on Twitter. As the paper Polyak Parameter Ensemble: Exponential Parameter Growth Leads to Better Generalization shows, the Polyak ensemble improves generalization. The other find was gradient accumulation. I don’t know if I’ll need it, since I usually make the batch size as large as possible.
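    Both ideas fit in a few lines. Here is a plain-NumPy sketch of mine (not code from either paper); note that recent Keras optimizers also ship EMA weight averaging built in via the use_ema and ema_momentum arguments:

    ```python
    import numpy as np

    def polyak_update(avg, current, decay=0.999):
        # Polyak / EMA weight averaging: keep a slow-moving copy of the
        # weights; train with `current`, evaluate/checkpoint with `avg`.
        return [decay * a + (1.0 - decay) * w for a, w in zip(avg, current)]

    def accumulate(grads, micro_grads):
        # Gradient accumulation: sum gradients over several micro-batches,
        # then apply a single optimizer step to emulate a larger batch.
        return [g + mg for g, mg in zip(grads, micro_grads)]

    avg = polyak_update([np.array([0.0])], [np.array([1.0])], decay=0.9)
    ```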

  • Positional encoding for timeseries data

    This is positional encoding for timeseries data, inspired by A Transformer-based Framework for Multivariate Time Series Representation Learning. Assuming the input is (batch_size, sequence_length, feature_dim) and each feature is a float, we add the positional encoding directly to the input. See PositionalEncoding in keras_nlp here for a generic implementation.
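    As a minimal sketch (my own NumPy version of the standard sinusoidal encoding, not the post's code), the encoding is broadcast over the batch axis and added to the features:

    ```python
    import numpy as np

    def positional_encoding(seq_len, dim):
        # Standard sinusoidal positional encoding, shape (seq_len, dim):
        # even feature indices use sin, odd indices use cos.
        pos = np.arange(seq_len)[:, None]
        i = np.arange(dim)[None, :]
        angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
        return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

    x = np.random.rand(4, 16, 8)           # (batch_size, sequence_length, feature_dim)
    x_pe = x + positional_encoding(16, 8)  # broadcasts over the batch axis
    ```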

  • A trick to ignore data points in Keras categorical cross entropy

    When there’s a data point you want to ignore in the loss computation, you can use the ignore_class parameter of tf.keras.losses.SparseCategoricalCrossentropy. But the same parameter doesn’t exist in tf.keras.losses.CategoricalCrossentropy. I don’t know why, and it’s troublesome when the need arises. Even SparseCategoricalCrossentropy’s ignore_class isn’t easy to use, since it requires one to add a class…
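    One workaround I would sketch (not the post's solution): mask the ignored class yourself and average only over the kept samples. The math, in plain NumPy:

    ```python
    import numpy as np

    def masked_categorical_crossentropy(y_true, y_pred, ignore_class):
        # y_true: one-hot labels (n, num_classes); y_pred: probabilities.
        labels = y_true.argmax(axis=-1)
        mask = labels != ignore_class          # drop samples of the ignored class
        per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)
        return (per_sample * mask).sum() / mask.sum()

    y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
    y_pred = np.array([[0.5, 0.5], [0.25, 0.75]])
    loss = masked_categorical_crossentropy(y_true, y_pred, ignore_class=0)
    ```

    In Keras the same masking effect can be had by passing the mask as sample_weight to the loss call.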

  • Better dict for configuration

    When I use a dict as the config of a deep learning algorithm, I often launch multiple training runs, varying the config values. As a trivial example, I might run multiple iterations varying ‘lr’. When doing that, I worry about making mistakes like updating the dict with the wrong key. To address such concerns,…
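    One way to catch such mistakes (my own sketch of the idea, with a hypothetical StrictDict name) is a dict that refuses to create keys it wasn't constructed with, so a typo fails loudly instead of silently leaving the real key unchanged:

    ```python
    class StrictDict(dict):
        """A dict that rejects assignments to unknown keys,
        so a typo like config['lrr'] raises instead of adding a new key."""

        def __setitem__(self, key, value):
            if key not in self:
                raise KeyError(f"Unknown config key: {key!r}")
            super().__setitem__(key, value)

        def update(self, *args, **kwargs):
            # Route update() through __setitem__ so it is checked too.
            for k, v in dict(*args, **kwargs).items():
                self[k] = v

    config = StrictDict({'lr': 1e-3, 'batch_size': 256})
    config['lr'] = 3e-4      # fine: the key exists
    ```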

  • My impressions from porting code from Keras to Jax and Flax

    For better performance, and out of technical curiosity, I have been porting my Keras code to Jax. I’m sharing what I felt along the way, in the hope that it helps anyone considering the same migration. My first impression of Jax was that it is hard to learn. The Sharp Bits document summarizes the difficulties well, and most of them boil down to data being immutable. This was quite bewildering at first, though I gradually got used to it. A few things, however, caused problems I hadn’t anticipated.…
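    The immutability sharp bit in one line, for anyone who hasn't hit it yet (a minimal example of mine, not from the post): in-place indexed assignment is forbidden, and JAX's functional .at[...].set(...) update returns a new array instead.

    ```python
    import jax.numpy as jnp

    x = jnp.zeros(3)
    # x[0] = 1.0          # TypeError: JAX arrays are immutable
    y = x.at[0].set(1.0)  # functional update: returns a new array, x is unchanged
    ```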

  • A deep learning architecture to apply Dow theory in stock price prediction

    What are the features for predicting a stock price? Beyond the obvious choices, when predicting the price of, say, AAPL, it’s critical to consider other stocks’ prices, e.g., MSFT, GOOG, META, NVDA, etc. If all the tickers on the NASDAQ are going down, we can say with confidence that we’re in a bear market. Dow theory…

  • Weighted Categorical Cross Entropy in Keras

    It took me a while to find where weighted categorical cross entropy lives in Keras. It’s actually supported by CategoricalFocalCrossentropy: if you set gamma=0 and give alpha=[… list of class weights …], it becomes the weighted loss. Since you’re probably looking for a weighted loss due to class imbalance, I suggest looking…
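    The equivalence is easy to check by hand: focal loss multiplies the weighted cross entropy term by (1 − p)^gamma, which is 1 when gamma=0. A plain-NumPy sketch of both formulas (my own, mirroring the definitions rather than calling Keras):

    ```python
    import numpy as np

    def focal_cce(y_true, y_pred, alpha, gamma):
        # Categorical focal cross entropy: -sum alpha * (1 - p)^gamma * y * log(p)
        return -np.sum(alpha * (1 - y_pred) ** gamma * y_true * np.log(y_pred), axis=-1)

    def weighted_cce(y_true, y_pred, alpha):
        # Class-weighted categorical cross entropy: -sum alpha * y * log(p)
        return -np.sum(alpha * y_true * np.log(y_pred), axis=-1)

    y_true = np.array([[0.0, 1.0]])
    y_pred = np.array([[0.3, 0.7]])
    alpha = np.array([0.2, 0.8])  # per-class weights
    ```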

  • A caveat when using Keras’s timeseries_dataset_from_array

    A GitHub issue about timeseries_dataset_from_array. For example, given X=[1, 2, 3] and y=[2, 3, 4] with sequence_length=2, you’d expect X=[1, 2] to be paired with y=[3], but it isn’t: you get y=[2]. In other words, y is the value at the start of each x window. The simple fix is to shift y before calling, but this quirk of the API is far too easy to miss. It’s all the more disappointing that the issue was simply closed.
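    To make the pairing concrete, here is a NumPy-free mimic of how the API aligns windows and targets, plus the shift fix (my own illustration, not the Keras source):

    ```python
    def windows_with_targets(data, targets, sequence_length):
        # Mimics keras.utils.timeseries_dataset_from_array's pairing:
        # window i is data[i : i + sequence_length], its target is targets[i].
        n = min(len(data) - sequence_length + 1, len(targets))
        xs = [list(data[i:i + sequence_length]) for i in range(n)]
        ys = [targets[i] for i in range(n)]
        return xs, ys

    X, y, L = [1, 2, 3], [2, 3, 4], 2
    xs, ys = windows_with_targets(X, y, L)            # ys[0] == 2: start of window
    xs2, ys2 = windows_with_targets(X, y[L - 1:], L)  # shift fix: ys2[0] == 3
    ```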

  • MASE, RMSSE

    MASE and RMSSE are scaled versions of MAE and RMSE, respectively. The scaling factor uses a naive forecast that predicts each future value from the previous observation.
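    A minimal NumPy sketch of both metrics (my own; the scale is the in-sample error of the one-step naive forecast):

    ```python
    import numpy as np

    def mase(y_true, y_pred, y_train):
        # MAE scaled by the MAE of the naive forecast on the training series.
        mae = np.mean(np.abs(y_true - y_pred))
        scale = np.mean(np.abs(np.diff(y_train)))
        return mae / scale

    def rmsse(y_true, y_pred, y_train):
        # RMSE scaled by the RMSE of the naive forecast on the training series.
        mse = np.mean((y_true - y_pred) ** 2)
        scale = np.mean(np.diff(y_train) ** 2)
        return np.sqrt(mse / scale)
    ```

    A value below 1 means the forecast beats the naive baseline on average.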

  • CNN performs well for trend detection

    STOCK PRICE PREDICTION USING LSTM,RNN AND CNN-SLIDING WINDOW MODEL compares RNN, CNN, and LSTM performance in predicting stock prices. Very interestingly, RNN and LSTM couldn’t detect trend changes (fluctuations without a specific direction VS an upward direction) while CNN could. Although the experiments in the paper use just two stocks, I see other papers also suggesting CNN…
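    For reference, a sliding-window CNN of this kind is a few lines in Keras. This is a minimal sketch under my own assumptions (windows of 20 past prices, one feature, predicting the next price), not the paper's exact architecture:

    ```python
    import numpy as np
    import tensorflow as tf

    # 1D convolution over a window of past prices -> one-step-ahead prediction.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(20, 1)),
        tf.keras.layers.Conv1D(32, kernel_size=3, activation='relu'),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')

    windows = np.random.rand(8, 20, 1).astype('float32')  # 8 sliding windows
    preds = model.predict(windows, verbose=0)
    ```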