-
Decision tree for iris in R
Install the tree package: Let’s check how iris looks: Build the tree: Let’s check how it looks: There are 6 terminal nodes in the tree. Let’s plot it: See how accurate it is: We have a pretty small misclassification rate. But, to avoid overfitting, let’s do k-fold cross-validation and prune the tree: Deviance (an entropy-like measure used in…
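The code from the original post is not shown in this excerpt, so the following is only a rough sketch of the steps it names (build, inspect, measure accuracy, cross-validate, prune). It assumes the CRAN package tree is installed; the seed, K = 10, and the pruned size best = 3 are illustrative choices, not values taken from the post.

```r
# Sketch only: runs if the CRAN package 'tree' is available.
if (requireNamespace("tree", quietly = TRUE)) {
  # Build a classification tree predicting Species from the four measurements.
  fit <- tree::tree(Species ~ ., data = iris)

  # Count terminal nodes: leaves are marked "<leaf>" in the tree's frame.
  n_leaves <- sum(fit$frame$var == "<leaf>")

  # Misclassification rate on the training data.
  pred <- predict(fit, iris, type = "class")
  train_error <- mean(pred != iris$Species)

  # K-fold cross-validation over the pruning sequence (seed is arbitrary).
  set.seed(1)
  cv <- tree::cv.tree(fit, FUN = tree::prune.misclass, K = 10)

  # Prune to an illustrative size; in practice pick it from cv$size / cv$dev.
  pruned <- tree::prune.misclass(fit, best = 3)

  print(n_leaves)
  print(train_error)
  print(data.frame(size = cv$size, cv_misclassified = cv$dev))
}
```

Plotting works with plot(fit) followed by text(fit), which draws the split labels onto the tree.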
-
Why C++? by Herb Sutter
-
Cluster analysis of iris in R
First, remove the species column: Draw the hierarchical clustering: It’s very tempting to pick three clusters, as we already know that there are three species. So, cut the tree into three clusters: Now we have three clusters for the iris data. We can check its accuracy using table: It’s easy to see that versicolor is often confused with virginica in…
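A minimal sketch of these steps in base R. The excerpt does not say which linkage the post used, so method = "average" here is an assumption.

```r
# Drop the Species column so clustering only sees the four measurements.
features <- iris[, 1:4]

# Hierarchical clustering on Euclidean distances.
# "average" linkage is an assumption; the post's choice is not shown.
hc <- hclust(dist(features), method = "average")
plot(hc, labels = FALSE)

# Cut the dendrogram into three clusters.
clusters <- cutree(hc, k = 3)

# Cross-tabulate the true species against the cluster assignments.
confusion <- table(iris$Species, clusters)
print(confusion)
```

Cluster numbers are arbitrary labels, so read the table by which species dominates each column rather than by the column index.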
-
Drawing scatter plot matrix for iris in R
Use splom rather than pairs. This renders: Reference: R: Scatter Plot Matrices. http://stat.ethz.ch/R-manual/R-devel/library/lattice/html/splom.html
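The post's own call is not included in this excerpt. A minimal sketch of a splom call on iris, using the lattice package that ships with R, could look like this; the grouping by Species and the legend layout are illustrative choices.

```r
library(lattice)

# Scatter plot matrix of the four measurements, colored by species.
# auto.key adds a legend; columns = 3 puts the three species on one row.
p <- splom(~ iris[, 1:4], groups = Species, data = iris,
           auto.key = list(columns = 3))
print(p)
```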
-
Clustering in R and using tapply for finding centroids
There are two useful functions for clustering: hclust (for hierarchical clustering) and plclust (for plotting the resulting cluster tree). Given a matrix: Here, “average” is the way we compute the distance between clusters. With average linkage, we compute the average distance over every possible pair (a, b), where a and b come from two different clusters. That’s the easy part.…
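A small sketch of the idea in the title: cluster with hclust, then use tapply to average each column within each cluster. The matrix below is made up for illustration and is not the post's data. As far as I know, plclust has been removed from recent versions of R, so the sketch uses plot() on the hclust object instead.

```r
# Made-up 2-D points: three near (1, 2) and two near (8, 9).
m <- matrix(c(1.0, 2.0,
              1.5, 2.5,
              8.0, 9.0,
              8.5, 9.5,
              1.2, 2.1), ncol = 2, byrow = TRUE)

# Average-linkage hierarchical clustering.
hc <- hclust(dist(m), method = "average")
plot(hc)  # stands in for plclust(hc)

# Assign each row to one of two clusters.
clusters <- cutree(hc, k = 2)

# Centroids: for every column, take the mean within each cluster.
centroids <- apply(m, 2, function(column) tapply(column, clusters, mean))
print(centroids)
```

Each row of centroids is one cluster and each column is one coordinate.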
-
ANOVA in R
Call summary on the return value of aov to get the following statistics. Because the p-value is very small, H0 is rejected: mean Petal.Width differs across Species. We can draw a boxplot to visualize this: Some texts[1] explain that the F statistic in ANOVA is (between-class variance) / (within-class variance)…
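A minimal sketch of this analysis in base R. It also recomputes the F statistic as the ratio of the two mean squares in the ANOVA table, to make the between/within interpretation concrete. The variable names are illustrative.

```r
# One-way ANOVA: does mean Petal.Width differ across Species?
fit <- aov(Petal.Width ~ Species, data = iris)

# summary() returns a list; the first element is the ANOVA table.
anova_table <- summary(fit)[[1]]
print(anova_table)

# F = (between-group mean square) / (within-group mean square).
mean_sq <- anova_table[["Mean Sq"]]
f_manual <- mean_sq[1] / mean_sq[2]
f_reported <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# Visualize the group differences.
boxplot(Petal.Width ~ Species, data = iris)
```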
-
Mahalanobis Distance in R #2
Continuing from the previous post, let’s compute the Mahalanobis distance directly from its equation instead of using the mahalanobis function: We can also compute the distance between the means of two groups by using the pooled covariance matrix in the Mahalanobis distance.
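A sketch of both computations on iris. Note that R's mahalanobis function returns the squared distance. The choice of setosa and versicolor as the two groups is an assumption for illustration; the excerpt does not say which groups the post used.

```r
x <- as.matrix(iris[, 1:4])
first <- x[1, ]
S <- cov(x)

# Built-in: squared Mahalanobis distance of every row from the first row.
d2_builtin <- mahalanobis(x, center = first, cov = S)

# By hand: d' S^-1 d for each row, where d is the difference vector.
diffs <- sweep(x, 2, first)
d2_manual <- rowSums((diffs %*% solve(S)) * diffs)

# Distance between two group means using the pooled covariance matrix.
a <- x[iris$Species == "setosa", ]
b <- x[iris$Species == "versicolor", ]
pooled <- ((nrow(a) - 1) * cov(a) + (nrow(b) - 1) * cov(b)) /
  (nrow(a) + nrow(b) - 2)
delta <- colMeans(a) - colMeans(b)
d2_groups <- drop(t(delta) %*% solve(pooled) %*% delta)

print(head(d2_manual))
print(d2_groups)
```

Take sqrt() of any of these values to get the distance itself rather than its square.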
-
Mahalanobis Distance in R
Mahalanobis distance measures the distance between two points while taking the covariance of the data into account. Informally it is a deviation scaled by the covariance; precisely, the squared Mahalanobis distance is d' C^{-1} d, where d is the difference vector between the two points and C is the covariance matrix. In R[1]: Now we compute the Mahalanobis distance between the first data point and the rest. Click here for the next article on this topic. References)…
-
Principal Component Analysis in R
Principal component analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components[1]. PCA reduces dimensionality while keeping the axes along which the data has the largest variance. The problem boils down to getting principal…
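A short sketch of PCA on the iris measurements, assuming the unscaled covariance form (prcomp's default). It also checks the standard fact that the component variances are the eigenvalues of the covariance matrix and the loadings are its eigenvectors, up to sign. The excerpt is cut off before the post's own derivation, so this may not match it exactly.

```r
x <- iris[, 1:4]

# prcomp centers the data by default and does not rescale it.
pca <- prcomp(x)

# Eigen decomposition of the sample covariance matrix.
eig <- eigen(cov(x))

# Variances of the principal components equal the eigenvalues.
component_var <- pca$sdev^2
print(cbind(component_var, eigenvalue = eig$values))

# Share of total variance captured by each component.
explained <- component_var / sum(component_var)
print(round(explained, 4))

# Project the data onto the first two components.
scores_2d <- pca$x[, 1:2]
```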
-
Factor Analysis in R
Factor analysis assumes the following[1]: x - mu = LF + epsilon, where L is the factor loading matrix and epsilon is the error term (or uniqueness, the part that is not explained by the common latent factor F). This is basically saying that x is generated from a low-dimensional F. F is multiplied by L (so that it can be high-dimensional),…
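To make the generative reading of the model concrete, here is a small simulation sketch. All numbers (loadings, uniquenesses, means, sample size, seed) are made up. It assumes the usual conditions that F has unit variance and is uncorrelated with epsilon, under which the covariance of x should be close to L L' + Psi.

```r
set.seed(42)
n <- 5000

# Hypothetical parameters: one latent factor, three observed variables.
L <- matrix(c(0.9, 0.7, 0.5), ncol = 1)  # factor loadings (3 x 1)
psi <- c(0.19, 0.51, 0.75)               # uniquenesses (error variances)
mu <- c(1, 2, 3)                         # variable means

# Latent factor scores and independent error terms.
f_scores <- matrix(rnorm(n), ncol = 1)
eps <- sapply(psi, function(v) rnorm(n, sd = sqrt(v)))

# Generate x so that x - mu = L F + epsilon for every observation.
x <- sweep(f_scores %*% t(L) + eps, 2, mu, "+")

# Covariance implied by the model versus the sample covariance.
implied <- L %*% t(L) + diag(psi)
max_gap <- max(abs(cov(x) - implied))
print(round(cov(x), 2))
print(implied)
```

Fitting goes the other way: given only x, a routine such as factanal estimates L and the uniquenesses.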