Prototypes.Rd
Prototypes are 'representative' cases of a group of data points, given the similarity matrix among the points. They are very similar to medoids.
the response variable. Should be a factor.
matrix or data frame of predictor variables.
the proximity (or similarity) matrix, assumed to be symmetric with 1 on the diagonal and values in [0, 1] off the diagonal. The order of its rows and columns must match the order of the rows of x.
number of prototypes to compute for each value of the response variable.
number of nearest neighbors used to find the prototypes.
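The assumptions on the proximity matrix can be checked before computing prototypes. The following is only a minimal sketch: check_prox is a hypothetical helper that is not part of any package, and the toy similarity built from Euclidean distances merely stands in for a forest proximity.

```r
# Hypothetical helper (not part of any package): verify the stated
# assumptions on a proximity matrix before using it.
check_prox <- function(prox, x, tol = 1e-8) {
  prox <- as.matrix(prox)
  stopifnot(
    nrow(prox) == ncol(prox),              # square
    nrow(prox) == nrow(x),                 # one row/column per case in x
    isSymmetric(unname(prox), tol = tol),  # symmetric (dimnames ignored)
    all(abs(diag(prox) - 1) < tol),        # 1 on the diagonal
    all(prox >= 0 & prox <= 1)             # all values in [0, 1]
  )
  invisible(TRUE)
}

# Toy similarity for illustration only (assumption: NOT a forest proximity).
x_toy <- iris[, 1:4]
prox_toy <- exp(-as.matrix(dist(x_toy)))
ok <- check_prox(prox_toy, x_toy)

# A matrix that violates the [0, 1] range is rejected.
bad <- prox_toy
bad[1, 2] <- 2
bad_result <- try(check_prox(bad, x_toy), silent = TRUE)
```

The check does not verify that the rows of the proximity matrix are in the same order as the rows of x; that has to be ensured when the matrix is built.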
For each case in x, the nNbr nearest neighbors are found. Then, for each class, the case that has the most neighbors of that class is identified. The prototype for that class is then the medoid of these neighbors (coordinate-wise medians for numerical variables and modes for categorical variables). One then removes the neighbors used and iterates the first steps to find a second prototype, etc.
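The steps above can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation: sketch_prototypes and its arguments n_proto and n_nbr are hypothetical names, only numeric predictors are handled (medians, no modes), and counting only the neighbors that have not yet been used is an assumption about how the iteration works. The toy similarity stands in for a forest proximity.

```r
# Hypothetical sketch of the described procedure (numeric predictors only).
sketch_prototypes <- function(label, x, prox, n_proto = 2, n_nbr = 10) {
  label <- as.character(label)
  x <- as.matrix(x)
  n <- nrow(x)
  # Row i holds the indices of the n_nbr cases most similar to case i
  # (this typically includes case i itself, since prox[i, i] = 1).
  nbr <- matrix(0L, n, n_nbr)
  for (i in seq_len(n)) {
    nbr[i, ] <- order(prox[i, ], decreasing = TRUE)[seq_len(n_nbr)]
  }
  out <- list()
  for (cl in unique(label)) {
    available <- rep(TRUE, n)  # cases not yet used for a prototype of cl
    protos <- matrix(NA_real_, n_proto, ncol(x),
                     dimnames = list(NULL, colnames(x)))
    for (k in seq_len(n_proto)) {
      # For each case, count its still-available neighbors of class cl.
      counts <- apply(nbr, 1, function(i) sum(label[i] == cl & available[i]))
      if (max(counts) == 0) break
      centre <- which.max(counts)  # case with the most such neighbors
      used <- nbr[centre, ]
      used <- used[label[used] == cl & available[used]]
      # Prototype: coordinate-wise medians of these neighbors.
      protos[k, ] <- apply(x[used, , drop = FALSE], 2, median)
      # Remove the neighbors used, then iterate.
      available[used] <- FALSE
    }
    out[[cl]] <- protos
  }
  out
}

# Toy similarity for illustration only (assumption: NOT a forest proximity).
x_toy <- iris[, 1:4]
label_toy <- factor(iris$Species == "versicolor")
prox_toy <- exp(-as.matrix(dist(x_toy)))
res <- sketch_prototypes(label_toy, x_toy, prox_toy)
```

Here res is a list with one numeric matrix per class, each holding one prototype per row. Categorical predictors would need a mode instead of a median, which the sketch leaves out.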
A list of data frames with prototypes. The number of data frames is equal to the number of classes of the response variable.
Random Forests, by Leo Breiman and Adele Cutler https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#prototype
The code is an extension of the classCenter function in the randomForest package.
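data(iris)
# Binary response: is the flower a versicolor?
iris2 = iris
iris2$Species = factor(iris$Species == "versicolor")
# Fit a conditional random forest (50 trees, 2 candidate predictors per split)
iris.cf = party::cforest(Species ~ ., data = iris2,
            control = party::cforest_unbiased(mtry = 2, ntree = 50))
# Proximity matrix of the forest, then prototypes for each class
prox = party::proximity(iris.cf)
Prototypes(iris2$Species, iris2[, 1:4], prox)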
#> $`FALSE`
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] "6.7" "2.8" "5.8" "1.9"
#> [2,] "4.8" "3" "1.4" "0.2"
#> [3,] "5" "3.4" "1.4" "0.2"
#> [4,] "6.75" "2.55" "5.65" "2.05"
#> [5,] "4.7" "3.2" "1.4" "0.2"
#>
#> $`TRUE`
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] "5.5" "2.4" "3.7" "1.1"
#> [2,] "5.6" "2.7" "3.9" "1.4"
#> [3,] "5.9" "2.7" "4.2" "1.4"
#> [4,] "6.1" "2.9" "4.5" "1.3"
#> [5,] "6.3" "2.9" "4.6" "1.4"
#>