catdesc.Rd
Measures the association between a categorical variable and some continuous and/or categorical variables
catdesc(y, x, weights = NULL,
na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE,
measure = "phi", limit = NULL, correlation = "kendall", robust = TRUE,
nperm = NULL, distrib = "asympt", digits = 2)
y: the categorical variable to describe (must be a factor)
x: a data frame with continuous and/or categorical variables
weights: numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
na.rm.cat: logical, indicating whether NA values in the categorical variables should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variables (see na.value.cat argument).
na.value.cat: character. Name of the level for the NA category. Default is "NAs". Only used if na.rm.cat = FALSE.
na.rm.cont: logical, indicating whether NA values in the continuous variables should be silently removed before the computation proceeds. Default is FALSE.
measure: character. The measure of local association between categories of categorical variables. Can be "phi" for phi coefficient (default), "or" for odds ratios, "std.residuals" for standardized (i.e. Pearson) residuals, "adj.residuals" for adjusted standardized residuals or "pem" for local percentages of maximum deviation from independence.
limit: for the relationship between y and a categorical variable, only associations greater than or equal to limit are displayed. If NULL (default), all associations are displayed.
correlation: character. The correlation measure to use between two continuous variables: "pearson", "spearman" or "kendall" (default).
robust: logical. If TRUE (default), median and mad are used instead of mean and standard deviation.
nperm: numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
distrib: the null distribution of the permutation test of independence can be approximated by its asymptotic distribution ("asympt", default) or via Monte Carlo resampling ("approx").
digits: numeric. Number of digits for mean, median, standard deviation and mad. Default is 2.
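To illustrate what the robust option changes, here is a minimal base R sketch comparing the two pairs of summaries on made-up data with one extreme value. The vector budget is hypothetical, and the sketch uses unweighted stats::mad() with its default scaling constant; whether catdesc computes mad in exactly this way (in particular when weights are supplied) is an assumption.

```r
# Hypothetical budgets with one extreme value (illustration only)
budget <- c(2, 3, 4, 5, 6, 200)

# Classical summaries: strongly pulled by the extreme value
classical <- c(center = mean(budget), spread = sd(budget))

# Robust summaries: median and median absolute deviation
# (stats::mad() multiplies by 1.4826 by default)
robust <- c(center = median(budget), spread = mad(budget))

print(round(rbind(classical, robust), 2))
```

The median and mad barely react to the single large value, which is why they can be preferable for skewed variables such as budgets.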
A list with the following items:
variables: associations between y and the variables in x
bylevel: a list with one element for each level of y
Each element in bylevel has the following items:
categories: a data frame with categorical variables from x and local associations
continuous.var: a data frame with continuous variables from x and associations measured by correlation coefficients
If nperm is not NULL, permutation tests of independence are computed and the p-values from these tests are provided.
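The idea of a Monte Carlo permutation test of independence can be sketched in base R as follows. This is only an illustration under stated assumptions: the factors y and x are made-up data, the helper chisq_stat is hypothetical, and the chi-squared statistic is chosen here for simplicity; the test statistic and implementation actually used by catdesc may differ.

```r
set.seed(1)

# Hypothetical data: a two-level factor y and a two-level factor x
y <- factor(rep(c("No", "Yes"), times = c(30, 30)))
x <- factor(c(rep(c("A", "B"), times = c(22, 8)),
              rep(c("A", "B"), times = c(9, 21))))

# Hypothetical helper: chi-squared statistic of the cross-tabulation
chisq_stat <- function(y, x) {
  unname(suppressWarnings(chisq.test(table(y, x), correct = FALSE)$statistic))
}

observed <- chisq_stat(y, x)

# Shuffling y breaks any link with x, which simulates the null hypothesis
nperm <- 999
permuted <- replicate(nperm, chisq_stat(sample(y), x))

# Share of permuted statistics at least as large as the observed one
p_value <- (sum(permuted >= observed) + 1) / (nperm + 1)
print(p_value)
```

A larger number of permutations gives a finer-grained p-value at the cost of computation time.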
Rakotomalala R., 'Comprendre la taille d'effet (effect size)', [http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf]
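# catdesc() and the Movies data are assumed here to come from the GDAtools package
library(GDAtools)
data(Movies)
catdesc(Movies$ArtHouse, Movies[, c("Budget", "Genre", "Country")])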
#> $variables
#>   variable  measure association
#> 1    Genre Cramer V       0.554
#> 2  Country Cramer V       0.469
#> 3   Budget     Eta2       0.181
#>
#> $bylevel
#> $bylevel$No
#> $bylevel$No$categories
#>           categories freq pct.y.in.x pct.x.in.y overall.pct.x    phi
#> 1        Country.USA  257       86.5       50.0          29.7  0.457
#> 2       Genre.Comedy  161       72.5       31.3          22.2  0.226
#> 3       Genre.Action  123       74.5       23.9          16.5  0.206
#> 4        Genre.SciFi   44       89.8        8.6           4.9  0.174
#> 5       Genre.Horror   25      100.0        4.9           2.5  0.156
#> 6    Genre.Animation   38       82.6        7.4           4.6  0.137
#> 7        Genre.Other   15       57.7        2.9           2.6  0.021
#> 8     Country.Europe   39       54.2        7.6           7.2  0.015
#> 9      Country.Other    6       23.1        1.2           2.6 -0.093
#> 10     Genre.ComDram   50       33.6        9.7          14.9 -0.149
#> 11 Genre.Documentary    9       11.7        1.8           7.7 -0.229
#> 12       Genre.Drama   49       20.3        9.5          24.1 -0.350
#> 13    Country.France  212       35.0       41.2          60.5 -0.405
#>
#> $bylevel$No$continuous.var
#>   variables median.in.category overall.median mad.in.category overall.mad
#> 1    Budget           17218500        6127500        12309532     5156921
#>   correlation
#> 1       0.426
#>
#>
#> $bylevel$Yes
#> $bylevel$Yes$categories
#>           categories freq pct.y.in.x pct.x.in.y overall.pct.x    phi
#> 1     Country.France  393       65.0       80.9          60.5  0.405
#> 2        Genre.Drama  192       79.7       39.5          24.1  0.350
#> 3  Genre.Documentary   68       88.3       14.0           7.7  0.229
#> 4      Genre.ComDram   99       66.4       20.4          14.9  0.149
#> 5      Country.Other   20       76.9        4.1           2.6  0.093
#> 6     Country.Europe   33       45.8        6.8           7.2 -0.015
#> 7        Genre.Other   11       42.3        2.3           2.6 -0.021
#> 8    Genre.Animation    8       17.4        1.6           4.6 -0.137
#> 9       Genre.Horror    0        0.0        0.0           2.5 -0.156
#> 10       Genre.SciFi    5       10.2        1.0           4.9 -0.174
#> 11      Genre.Action   42       25.5        8.6          16.5 -0.206
#> 12      Genre.Comedy   61       27.5       12.6          22.2 -0.226
#> 13       Country.USA   40       13.5        8.2          29.7 -0.457
#>
#> $bylevel$Yes$continuous.var
#>   variables median.in.category overall.median mad.in.category overall.mad
#> 1    Budget            2281690        6127500         1629608     5156921
#>   correlation
#> 1      -0.426
#>
#>
#>
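The columns of a categories table can be checked by hand from the printed frequencies. The sketch below rebuilds the Country by ArtHouse counts from the freq columns of the two tables above and recomputes the Country.USA row for level No, using the standard phi coefficient formula for a 2 x 2 table. It assumes uniform weights (the default) and does not call catdesc itself; the object names are made up for illustration.

```r
# Counts read off the freq columns of the two printed tables (Country rows)
counts <- matrix(
  c(257,  40,   # USA
    212, 393,   # France
     39,  33,   # Europe
      6,  20),  # Other
  ncol = 2, byrow = TRUE,
  dimnames = list(Country = c("USA", "France", "Europe", "Other"),
                  ArtHouse = c("No", "Yes"))
)

# Collapse to a 2 x 2 table: USA vs. other countries, by ArtHouse No vs. Yes
n11 <- counts["USA", "No"]
n12 <- counts["USA", "Yes"]
n21 <- sum(counts[, "No"]) - n11
n22 <- sum(counts[, "Yes"]) - n12

# Phi coefficient of the 2 x 2 table
phi <- (n11 * n22 - n12 * n21) /
  sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))

pct_y_in_x    <- 100 * n11 / (n11 + n12)          # share of "No" among USA films
pct_x_in_y    <- 100 * n11 / (n11 + n21)          # share of USA films among "No"
overall_pct_x <- 100 * (n11 + n12) / sum(counts)  # share of USA films overall

print(round(c(pct.y.in.x = pct_y_in_x, pct.x.in.y = pct_x_in_y,
              overall.pct.x = overall_pct_x, phi = phi), 3))
```

The rounded results agree with the Country.USA row of $bylevel$No$categories (86.5, 50.0, 29.7 and 0.457).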