Tag: deep learning

  • Early stopping and timeout in optuna

    I have previously explained how to run optuna in parallel. Here, I’d like to explain how one can apply a timeout and early stopping. The first thing to understand is the use of Manager, which is required to safely pass variables around in a multiprocessing context. In the code below, success, pruned, and failure counters are passed so that multiprocessing…
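
    A minimal sketch of the Manager idea, assuming a shared counter dict and a study callback that stops the study after enough successes (max_success and the counter names are illustrative, and this is shown with a single optimize call for brevity; in the parallel setup, each worker would receive the same Manager dict):

    ```python
    import multiprocessing

    import optuna


    def objective(trial):
        x = trial.suggest_float("x", -10, 10)
        return (x - 2) ** 2


    def make_callback(counters, max_success):
        # Count trial outcomes in the shared dict and stop the study
        # once `max_success` trials have completed successfully.
        def callback(study, trial):
            if trial.state == optuna.trial.TrialState.COMPLETE:
                counters["success"] += 1
            elif trial.state == optuna.trial.TrialState.PRUNED:
                counters["pruned"] += 1
            else:
                counters["failure"] += 1
            if counters["success"] >= max_success:
                study.stop()
        return callback


    if __name__ == "__main__":
        manager = multiprocessing.Manager()
        counters = manager.dict(success=0, pruned=0, failure=0)
        study = optuna.create_study()
        # `timeout` caps the wall-clock time; the callback adds early stopping.
        study.optimize(objective, timeout=60,
                       callbacks=[make_callback(counters, max_success=20)])
    ```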

  • Using a list of tuples or a list of lists in optuna

    Optuna, a hyperparameter tuning library, doesn’t allow non-primitive values (anything other than str, int, float, or None) in a categorical proposal. See Support tuples for Categorical values #4893. The post there has a workaround, and here I’d like to write down a slightly more generic approach. The idea is feeding in string representations of the tuples, i.e., [‘(1, 2,…
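
    A minimal sketch of the string-encoding idea; the candidate tuples and the use of ast.literal_eval to decode them are my own illustration:

    ```python
    import ast

    import optuna

    # Candidate tuples encoded as strings, since suggest_categorical
    # only accepts str/int/float/bool/None choices.
    CHOICES = [(1, 2), (3, 4), (5, 6)]


    def objective(trial):
        encoded = trial.suggest_categorical(
            "layers", [str(c) for c in CHOICES])
        layers = ast.literal_eval(encoded)  # decode back to a real tuple
        return sum(layers)


    study = optuna.create_study()
    study.optimize(objective, n_trials=10)
    ```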

  • Polyak averaging and gradient accumulation in Keras

    I discovered both of these on Twitter. As the paper Polyak Parameter Ensemble: Exponential Parameter Growth Leads to Better Generalization shows, the Polyak ensemble is for better generalization. The other one from the tweets is gradient accumulation. I don’t know if I’ll need it, since I often make the batch size as large as possible.
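
    As a rough sketch of what both look like in recent Keras versions, where optimizers expose use_ema and gradient_accumulation_steps flags (these arguments are fairly new, so check your Keras version; this is not necessarily the paper’s exact scheme):

    ```python
    import keras

    # Polyak/EMA averaging of the weights via the optimizer's built-in flags.
    ema_opt = keras.optimizers.Adam(
        learning_rate=1e-3,
        use_ema=True,       # keep an exponential moving average of the weights
        ema_momentum=0.99,  # decay rate of the running average
    )

    # Gradient accumulation: apply one weight update every 4 batches,
    # emulating a 4x larger effective batch size.
    accum_opt = keras.optimizers.Adam(
        learning_rate=1e-3,
        gradient_accumulation_steps=4,
    )
    ```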

  • Positional encoding for timeseries data

    This is positional encoding for timeseries data, inspired by A Transformer-based Framework for Multivariate Time Series Representation Learning. Assuming that the input is (batch_size, sequence_length, feature_dim) and each feature is a float, the post shows how to build the positional encoding and how it’s applied. See PositionalEncoding of keras_nlp here for a generic implementation.
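
    A minimal sketch of one way to do this, using fixed sinusoidal encodings broadcast over the batch; the post’s own implementation may differ:

    ```python
    import numpy as np
    import tensorflow as tf


    def positional_encoding(sequence_length, feature_dim):
        # Standard sinusoidal encoding of shape (sequence_length, feature_dim).
        positions = np.arange(sequence_length)[:, None]
        dims = np.arange(feature_dim)[None, :]
        angles = positions / np.power(
            10000.0, (2 * (dims // 2)) / feature_dim)
        enc = np.zeros((sequence_length, feature_dim))
        enc[:, 0::2] = np.sin(angles[:, 0::2])  # even dims: sine
        enc[:, 1::2] = np.cos(angles[:, 1::2])  # odd dims: cosine
        return tf.constant(enc, dtype=tf.float32)


    # Applied by broadcasting over the batch dimension:
    x = tf.random.normal((8, 100, 64))    # (batch, seq_len, feature_dim)
    x = x + positional_encoding(100, 64)  # same shape, now position-aware
    ```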

  • A trick to ignore a data point in categorical cross entropy of Keras

    When there’s a data point that you want to ignore in the loss computation, you can use the ignore_class parameter of tf.keras.losses.SparseCategoricalCrossentropy. But the same parameter doesn’t exist in tf.keras.losses.CategoricalCrossentropy. I don’t know why, but it’s troublesome when the need arises. SparseCategoricalCrossentropy’s ignore_class isn’t easy to use either, since it requires one to add a class…
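
    The post’s exact trick is cut off above; one common workaround I can sketch is zeroing out the ignored rows with sample_weight, which works with plain CategoricalCrossentropy:

    ```python
    import tensorflow as tf

    y_true = tf.constant([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0],
                          [0.0, 0.0, 1.0]])
    y_pred = tf.constant([[0.8, 0.1, 0.1],
                          [0.2, 0.7, 0.1],
                          [0.1, 0.2, 0.7]])

    # Ignore the second example: its weight of 0 removes it from the loss.
    sample_weight = tf.constant([1.0, 0.0, 1.0])

    loss_fn = tf.keras.losses.CategoricalCrossentropy()
    loss = loss_fn(y_true, y_pred, sample_weight=sample_weight)
    ```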

  • Better dict for configuration

    When I use a dict as the config of a deep learning algorithm, I often run multiple training runs, varying the config values. As a trivial example, I define a config dict and then run multiple iterations varying ‘lr’. When doing that, I worry about making a mistake like updating the dict with a wrong key. To address such concerns,…
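
    As an illustration of the kind of guard the post is after (the StrictConfig name and implementation here are mine, not necessarily the post’s solution):

    ```python
    # A dict that only allows updates to keys it was created with,
    # so a typo like cfg["lr_"] = 0.01 fails loudly instead of silently
    # adding a new, unused key.
    class StrictConfig(dict):
        def __setitem__(self, key, value):
            if key not in self:
                raise KeyError(f"Unknown config key: {key!r}")
            super().__setitem__(key, value)


    cfg = StrictConfig(lr=0.001, batch_size=64)
    for lr in (1e-3, 3e-4, 1e-4):
        cfg["lr"] = lr     # fine: 'lr' exists
        # cfg["lr_"] = lr  # would raise KeyError
    ```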

  • Thoughts on porting code from Keras to Jax and Flax

    For better performance, and out of technical curiosity, I’ve been porting Keras code to Jax. I’d like to share what I noticed along the way, in the hope that it helps anyone attempting the same migration. My first impression of Jax was that it is hard to learn. The Sharp Bits document summarizes this well, and most of it boils down to data being immutable. This was very bewildering at first, but I gradually got used to it. A few things, however, caused problems I hadn’t anticipated.…
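
    A small illustration of the immutability point from Jax’s Sharp Bits:

    ```python
    import jax.numpy as jnp

    x = jnp.zeros(3)
    # x[0] = 1.0          # TypeError: JAX arrays are immutable
    x = x.at[0].set(1.0)  # functional update: returns a new array instead
    ```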

  • A deep learning architecture to apply Dow theory in stock price prediction

    What are the features to predict a stock price? Obvious choices include:… When predicting the stock price of, say, AAPL, it’s critical to consider other stocks’ prices, e.g., MSFT, GOOG, META, NVDA, etc. If all the tickers in the NASDAQ are going down, we can say with confidence that we’re in a bear market. Dow theory…

  • Weighted Categorical Cross Entropy in Keras

    It took me a while to find where weighted categorical cross entropy lives in Keras. It’s actually supported by CategoricalFocalCrossentropy: if you set gamma=0 and give alpha=[… list of class weights …], it becomes the weighted one. Since you’re probably looking for the weighted one due to class imbalance, I suggest looking…
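
    A minimal sketch (the class weights below are made up for illustration):

    ```python
    import tensorflow as tf

    # Weighted categorical cross entropy via focal loss with gamma=0:
    # gamma=0 disables the focal term, leaving only the alpha weighting.
    class_weights = [0.2, 0.3, 0.5]
    loss_fn = tf.keras.losses.CategoricalFocalCrossentropy(
        alpha=class_weights,
        gamma=0.0,
    )

    y_true = tf.constant([[0.0, 0.0, 1.0]])
    y_pred = tf.constant([[0.1, 0.2, 0.7]])
    loss = loss_fn(y_true, y_pred)
    ```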

  • A caveat when using keras’s timeseries_dataset_from_array

    A GitHub issue about timeseries_dataset_from_array: for example, with X=[1, 2, 3] and y=[2, 3, 4], given sequence_length=2 you’d expect X=[1, 2] to come with y=[3], but it doesn’t. You get y=[2]. In other words, y is the value at the starting point of x. Simply shifting y before the call solves it, but this quirk of the API is far too easy to miss. It’s all the more disappointing that the issue was simply closed.
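
    A small sketch of the pitfall and the shift fix, following the offset pattern from the Keras docs (the example values mirror the ones above):

    ```python
    import numpy as np
    import tensorflow as tf

    X = np.array([1, 2, 3])
    y = np.array([2, 3, 4])  # y[i] is the value following X[i]

    # Pitfall: targets[i] is paired with the window *starting* at index i,
    # so the first element is ([1, 2], 2), not ([1, 2], 3).
    naive = tf.keras.utils.timeseries_dataset_from_array(
        X, y, sequence_length=2, batch_size=None)

    # Fix: shift y so each window is paired with the value that follows it.
    # First element is now ([1, 2], 3).
    fixed = tf.keras.utils.timeseries_dataset_from_array(
        X[:-1], y[1:], sequence_length=2, batch_size=None)
    ```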