  • Decision tree for iris in R

    Install the tree package, check how iris looks, and build the tree. There are 6 terminal nodes in the tree. After plotting it and checking its accuracy, the misclassification rate turns out to be pretty small. But to avoid overfitting, let’s do k-fold cross-validation and prune the tree. Deviance (an entropy-like measure used in…

  • Why C++? by Herb Sutter

  • Cluster analysis of iris in R

    First, remove the species column, then draw the hierarchical clustering. It’s very tempting to pick three clusters, as we already know that there are three species. So, cut the tree at three clusters. Now we have three clusters for the iris data. We can check their accuracy using table. It’s easy to see that versicolor is often confused with virginica in…
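
    The steps in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the post's original code: the excerpt does not say which linkage method it uses, so "average" is an assumption here.

    ```r
    # Hierarchical clustering of iris with the species label removed.
    x <- iris[, 1:4]                            # drop the Species column
    hc <- hclust(dist(x), method = "average")   # linkage method is an assumption
    plot(hc)                                    # draw the dendrogram
    clusters <- cutree(hc, k = 3)               # cut the tree into three clusters
    confusion <- table(clusters, iris$Species)  # compare clusters with true species
    print(confusion)
    ```

    Each row of the table is a cluster and each column a species, so off-diagonal mass shows which species get mixed together.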

  • Drawing scatter plot matrix for iris in R

    Use splom rather than pairs. Reference: R: Scatter Plot Matrices. http://stat.ethz.ch/R-manual/R-devel/library/lattice/html/splom.html
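
    A minimal sketch of such a call, assuming the lattice package (which provides splom) is installed. The grouping by Species and the legend are illustrative choices, not necessarily what the post used.

    ```r
    library(lattice)  # splom is provided by lattice

    # Scatter plot matrix of the four measurements, grouped by species.
    p <- splom(~ iris[1:4], groups = Species, data = iris, auto.key = TRUE)
    print(p)          # lattice plots are drawn when the object is printed
    ```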

  • Clustering in R and using tapply for finding centroids

    There are two useful functions for clustering: hclust (for hierarchical clustering) and plclust (for plotting the clusters). Given a matrix, we can cluster it with hclust. Here, “average” is how we compute the distance between clusters: we take the average distance over every possible pair (a, b), where a and b come from two different clusters. That’s the easy part.…
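
    The idea can be sketched as below. The matrix m is hypothetical data made up for illustration, and the way tapply is combined with apply is one possible approach rather than the post's own code. As far as I know, plclust is no longer available in recent R versions, so plot() is used for the dendrogram instead.

    ```r
    # Hypothetical 5 x 2 matrix: two tight pairs of points and one outlier.
    m <- matrix(c(1.0, 1.0,
                  1.2, 0.9,
                  5.0, 5.0,
                  5.1, 4.8,
                  9.0, 1.0), ncol = 2, byrow = TRUE)

    hc <- hclust(dist(m), method = "average")  # average linkage
    plot(hc)                                   # dendrogram (stand-in for plclust)
    groups <- cutree(hc, k = 3)                # cluster label for each row

    # Centroid of each cluster: per-column mean within each group.
    centroids <- apply(m, 2, function(col) tapply(col, groups, mean))
    print(centroids)
    ```

    tapply splits one column by cluster label and averages each piece; apply repeats that for every column, giving one centroid per row of the result.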

  • ANOVA in R

    Note that you need to call summary on the return value of aov to get the following statistics. Because the p-value is very small, H0 is rejected; Petal.Width differs depending on Species. We can draw a boxplot to visualize this. Some texts[1] explain that the F statistic in ANOVA is (between-class variance) / (within-class variance)…
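
    A short sketch of this workflow on the built-in iris data. The variable names are illustrative, and the last lines only check the standard definition of F as a ratio of mean squares.

    ```r
    # One-way ANOVA: does Petal.Width differ across Species?
    fit <- aov(Petal.Width ~ Species, data = iris)
    s <- summary(fit)            # the statistics live in the summary, not in fit
    print(s)

    tab <- s[[1]]                # the ANOVA table as a data frame
    f_value <- tab[["F value"]][1]
    p_value <- tab[["Pr(>F)"]][1]

    # F is the between-group mean square over the within-group mean square.
    f_manual <- tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]

    boxplot(Petal.Width ~ Species, data = iris)  # visualize the group differences
    ```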

  • Mahalanobis Distance in R #2

    Continuing from the previous post, let’s compute the Mahalanobis distance directly from its equation instead of using the mahalanobis function. We can also compute the distance between the means of two groups by using the pooled covariance matrix in the Mahalanobis distance.
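
    Both computations can be sketched as follows. This is an illustration under assumptions: the choice of iris, of the overall mean as the center, and of setosa and versicolor as the two groups is mine, not taken from the post.

    ```r
    x <- as.matrix(iris[, 1:4])
    mu <- colMeans(x)
    C <- cov(x)

    # Squared Mahalanobis distance from the equation d' C^-1 d, for every row.
    d <- sweep(x, 2, mu)                      # subtract the mean from each row
    manual <- rowSums((d %*% solve(C)) * d)
    builtin <- mahalanobis(x, mu, C)          # built-in, also returns squared values

    # Distance between two group means using the pooled covariance matrix.
    a <- x[iris$Species == "setosa", ]
    b <- x[iris$Species == "versicolor", ]
    pooled <- ((nrow(a) - 1) * cov(a) + (nrow(b) - 1) * cov(b)) /
      (nrow(a) + nrow(b) - 2)
    delta <- colMeans(a) - colMeans(b)
    d2_means <- drop(t(delta) %*% solve(pooled) %*% delta)
    ```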

  • Mahalanobis Distance in R

    Mahalanobis distance measures the distance between two points while taking the covariance of the data into account. Loosely, it is (x – AVG(x)) / covariance; precisely, the squared Mahalanobis distance is d’C⁻¹d, where d is the difference vector between the two points and C is the covariance matrix. In R[1]: Now we compute the Mahalanobis distance between the first data point and the rest. Click here for the next article on this topic. References)…
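
    A minimal sketch of that computation, assuming the iris measurements as the data (the excerpt does not show which data set the post uses). Note that R's mahalanobis function returns squared distances, so a square root is needed for the distance itself.

    ```r
    x <- as.matrix(iris[, 1:4])
    C <- cov(x)                                # covariance of the whole data set

    # Squared distance from the first observation to each of the others.
    d2 <- mahalanobis(x[-1, ], center = x[1, ], cov = C)
    dists <- sqrt(d2)
    print(head(dists))
    ```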

  • Principal Component Analysis in R

    Principal component analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components[1]. PCA reduces the dimension while keeping the axes that have the largest variance. And the problem boils down to getting principal…
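
    As a rough illustration, here is PCA on the iris measurements with prcomp. The data set and the scaling choice are assumptions; the comparison with eigen only demonstrates the standard link between component variances and the eigenvalues of the correlation matrix.

    ```r
    x <- iris[, 1:4]
    pca <- prcomp(x, scale. = TRUE)   # center and scale each variable
    print(summary(pca))               # proportion of variance per component

    # With scaled inputs, component variances equal the eigenvalues
    # of the correlation matrix.
    eig <- eigen(cor(x))
    variances <- pca$sdev^2

    # The principal component scores are uncorrelated with each other.
    score_cor <- cor(pca$x)
    ```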

  • Factor Analysis in R

    Factor analysis assumes the following[1]: x – mu = LF + epsilon, where L is the factor loading matrix and epsilon is the error term (or uniqueness, the part that is not explained by the common latent factor F). This is basically saying that x is generated from a low-dimensional F. F is multiplied by L (so that it can be high-dimensional),…
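
    To make the model concrete, the sketch below simulates data from x – mu = LF + epsilon and fits it with factanal. Everything here is hypothetical: the loadings, the sample size, and the single-factor setup are chosen only for illustration.

    ```r
    set.seed(1)
    n <- 1000
    L <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4)   # hypothetical loadings: 6 variables, 1 factor
    mu <- rep(0, 6)                         # hypothetical mean vector

    f <- rnorm(n)                                               # latent factor F
    eps <- matrix(rnorm(n * 6), n, 6) %*% diag(sqrt(1 - L^2))   # uniqueness part
    x <- sweep(outer(f, L) + eps, 2, mu, "+")                   # x = mu + LF + epsilon

    fa <- factanal(x, factors = 1)          # maximum-likelihood factor analysis
    est <- as.numeric(fa$loadings[, 1])     # estimated loadings (sign is arbitrary)
    print(round(cbind(true = L, estimated = abs(est)), 2))
    ```

    The estimated loadings should land close to the true ones up to sign, and fa$uniquenesses should be close to 1 - L^2.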
