Passion is like genius; a miracle. – Page 2 – Blog on Software, Statistics, and Quant

Sort values in each row of polars df

If your data size isn’t too big, here’s an easy solution. Let’s say you have this data: Use this function to sort values across chosen columns but within each row. As an example, if we sort across [“A”, “C”]:

June 21, 2024

Tags:

software
How to suppress “Using categorical units to plot a list of strings that are all parsable as floats or dates.”

If log level is set to INFO, “Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.” may be observed when it’s not really relevant. Here’s how to suppress it.

June 18, 2024

Tags:

software
Outlier, drift detection

I learned alibi-detect today and it looks great. It has many algorithms for outlier and drift detection. The page even has a link for youtube video that explains drift detection.

June 10, 2024

Tags:

machine learning
웹사이트 성능 최적화

소프트웨어 품질의 끝은 결국 측정과 최적화라고 생각하고, 측정이 모든 접근의 시작이자 끝이라고 생각한다. 측정은 목표를 설정할 수 있게 하고, 그 과정에서 목표 자체에 의미가 있는지 따져보게 한다. 하지만 기술적 지식으로 무장한 엔지니어들은 측정을 먼저 시작하기보다는 해결 방법과 수단을 먼저 떠올리게 되고 그것부터 적용 해 놓고 보려고 한다. 대부분의 경우에 적용되는 훌륭한 룰들은 있어서 그게 항상…

June 6, 2024

Tags:

software, web
ASUS Chromebook Plus CX34

ASUS Chromebook Plus CX34. 또 크롬북을 샀다. 이제 써본게 총 6대째인가. 너무 많이 사봤다. 처음 느낌은 이 정도면 크롬북 수준에서 괜찮은 터치패드, 이 정도면 괜찮은 키감. 살짝 키 간격이 넓은 느낌도 있는데 느낌인 것으로… 터치패드 누를때 들어가는 느낌이 별로라는 리뷰가 맞긴한데, 그런 기대는 크롬북에 바라는게 너무 많은거다. 터치패드의 기본 동작도 잘 안되는 모델이 많으니. 사진만큼…

June 2, 2024

Tags:

review
Two interesting cross validation in scikit learn

Scikit is excellent esp when considering these advanced tools. One is calibrated classifier cv. It tries to match model’s probability with the actually observed probability. The other is TunedThresholdClassifierCV yet another interesting cv. If application requires different scores, e.g. F1, one can tune the decision threshold using it.

June 1, 2024

Tags:

machine learning
Getting a probability given a prediction score

Platt scaling is a method to scale score to probability. It uses logistics transformation with some learable parameters. What’s interesting is the use of laplace smoothing (or, uniform prior) to avoid overfitting.

June 1, 2024

Tags:

machine learning
Modern linux cli tools

I personally use bat, fd, exa, rg, jq, fzf, sd. https://github.com/ibraheemdev/modern-unix?tab=readme-ov-file Not mentioned in the above but z is also an awesome tool.

June 1, 2024

Tags:

software
Conformal prediction for dummies

Here I’m writing the simplest form of the concept so that anyone can quickly get the idea. If you want a serious post, read paper or other blog article. This isn’t for you. Conformal prediction outputs range for regression and multiple lables for classifications. Its purpose is to have output contains the correct answer for…

May 30, 2024

Tags:

machine learning
OpenAI batch API python example

OpenAI announced batch api which “returns completions within 24 hours for a 50% discount.” To test it, I wrote a trivial python example of the API. I didn’t test the response retrieval yet since my run will take 24h, but I expect it works fine, hopefully. Use jupyter notebook to persist the print output!

April 30, 2024

Tags:

llm, software