Statistical modeling: The two cultures
I like review articles, especially because I’m still learning machine learning and statistics. This article discusses why the author thinks statistics has not played much of a role in machine learning.
Here are the author’s arguments from the article that I found interesting:
1) Standard tests of goodness-of-fit do not reject linearity until the nonlinearity (in the data) is extreme.
2) (In statistical modeling) The question of how well the model fits the data is of secondary importance compared to the construction of an ingenious stochastic model.
3) … as data becomes more complex, the data models become more cumbersome and are losing the advantage of presenting a simple and clear picture of nature’s mechanism.
4) Unfortunately, in prediction, accuracy and simplicity are in conflict. (The author claims that accuracy is more important than simplicity; e.g., a single decision tree is easy to interpret, while a random forest predicts better but is much harder to interpret.)
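To get a feel for point 4, here is a minimal sketch in plain Python. It is my own toy setup, not code from the article: a one-split regression tree (a "stump") is compared with an average of stumps fitted on bootstrap resamples (bagging), which is only a simplified stand-in for a random forest. The data, the noise level, the number of stumps, and all function names are assumptions made for illustration.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a "stump") by least squares."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    best_gain, best_stump = None, None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal x values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Maximizing this quantity is equivalent to minimizing squared error.
        gain = i * left_mean ** 2 + (n - i) * right_mean ** 2
        if best_gain is None or gain > best_gain:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain, best_stump = gain, (threshold, left_mean, right_mean)
    return best_stump


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_stumps, rng):
    """Fit one stump per bootstrap resample of the training data."""
    n = len(xs)
    stumps = []
    for _ in range(n_stumps):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the predictions of all stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical data: a noisy linear signal y = x + noise on [0, 1].
train_x = [rng.random() for _ in range(200)]
train_y = [x + rng.gauss(0, 0.1) for x in train_x]
test_x = [i / 100 for i in range(101)]
test_y = test_x  # noise-free truth y = x

stump = fit_stump(train_x, train_y)
bagged = fit_bagged_stumps(train_x, train_y, 50, rng)

stump_mse = mse(lambda x: predict_stump(stump, x), test_x, test_y)
bagged_mse = mse(lambda x: predict_bagged(bagged, x), test_x, test_y)
# Number of distinct predicted values: a rough proxy for model simplicity.
stump_levels = len({predict_stump(stump, x) for x in test_x})
bagged_levels = len({predict_bagged(bagged, x) for x in test_x})

print(f"single stump : MSE = {stump_mse:.4f}, distinct predictions = {stump_levels}")
print(f"bagged stumps: MSE = {bagged_mse:.4f}, distinct predictions = {bagged_levels}")
```

The single stump can be described in one sentence (one threshold, two predicted values), while the averaged model has no such short description. On a smooth signal like this one I would expect the averaged model to have the lower error, but the exact numbers depend on the seed and the settings.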
The author argues that statisticians need to spend more time on algorithmic models (e.g., neural nets, random forests, and support vector machines) instead of data models (e.g., linear regression).
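Related to point 1 and to linear regression as a data model, here is a small sketch in plain Python of how good a straight-line fit can look on data that are clearly curved. Note the assumptions: R² is used as a simple stand-in for fit quality and is not one of the formal goodness-of-fit tests the article refers to, and the noise-free curves y = x^p on [0, 1] are my own illustrative choice.

```python
def r_squared_of_line(xs, ys):
    """R^2 of the ordinary least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For simple linear regression, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


# Illustrative, noise-free data: x on an even grid over [0, 1].
xs = [i / 100 for i in range(101)]

# Fit a straight line to y = x**p for increasingly curved p.
r2_by_power = {p: r_squared_of_line(xs, [x ** p for x in xs]) for p in (1, 2, 3)}

for p, r2 in r2_by_power.items():
    print(f"y = x^{p}: R^2 of straight-line fit = {r2:.3f}")
```

Even for the quadratic and cubic curves the straight line still explains most of the variance (R² stays above 0.9 and 0.8 respectively), so a summary statistic alone would not flag the nonlinearity.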
According to CiteSeer, this article has been cited 743 times.