Measures the association between a categorical variable and a continuous variable

assoc.catcont(x, y, weights = NULL,
              na.rm.cat = FALSE, na.value.cat = "NAs", na.rm.cont = FALSE,
              nperm = NULL, distrib = "asympt", digits = 3)

Arguments

x: the categorical variable (must be a factor)
y: the continuous variable (must be a numeric vector)
weights: numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.
na.rm.cat: logical, indicating whether NA values in the categorical variable (i.e. x) should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the categorical variable (see na.value.cat argument).
na.value.cat: character. Name of the level for NA category. Default is "NAs". Only used if na.rm.cat = FALSE.
na.rm.cont: logical, indicating whether NA values in the continuous variable (i.e. y) should be silently removed before the computation proceeds. Default is FALSE.
nperm: numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.
distrib: the null distribution of the permutation test of independence can be approximated by its asymptotic distribution ("asympt", default) or via Monte Carlo resampling ("approx").
digits: integer. The number of digits used for rounding (default is 3).
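
The handling of missing values described for na.rm.cat, na.value.cat and na.rm.cont can be pictured with a few lines of base R. This is only a sketch of the documented behaviour, not the function's actual code; the helper add_na_level and the toy vectors are hypothetical.

```r
# Toy data, made up for illustration
x <- factor(c("a", "b", NA, "a", "b", NA))
y <- c(1.0, 2.5, 3.0, NA, 4.0, 5.5)

# na.rm.cat = FALSE: missing categories become an extra level
# (hypothetical helper mimicking the documented behaviour)
add_na_level <- function(f, na.value = "NAs") {
  if (anyNA(f)) {
    f <- factor(f, levels = c(levels(f), na.value))
    f[is.na(f)] <- na.value
  }
  f
}
x_kept <- add_na_level(x)
levels(x_kept)  # "a" "b" "NAs"

# na.rm.cat = TRUE: observations with a missing category are dropped
keep_cat  <- !is.na(x)
x_dropped <- droplevels(x[keep_cat])
y_dropped <- y[keep_cat]

# na.rm.cont = TRUE: observations with a missing continuous value are dropped
keep_cont <- !is.na(y)
x_cont <- x[keep_cont]
y_cont <- y[keep_cont]
```

Keeping the NAs as a level of their own lets missingness show up in the per-level results, whereas dropping them restricts the analysis to complete observations.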

Value

A list with the following elements:

summary: summary statistics (mean, median, etc.) of the continuous variable for each level of the categorical variable
eta.squared: eta-squared between the two variables
permutation.pvalue: p-value from a permutation (i.e. non-parametric) test of independence
cor: point-biserial correlation between the continuous variable and each level of the categorical variable (i.e. the level coded as a 0/1 indicator)
cor.perm.pval: permutation p-value of the correlation between the two variables, for each level of the categorical variable
test.values: test-values as proposed by Lebart et al. (1984)
test.values.pval: p-values corresponding to the test-values
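
To make the returned statistics more concrete, the sketch below recomputes unweighted versions of three of them in base R: eta-squared as the ratio of between-group to total sum of squares, the point-biserial correlation as the correlation between a 0/1 level indicator and the continuous variable, and test-values following the usual formulation attributed to Lebart et al. The helper names and toy data are hypothetical, the variance convention (var(), with an n - 1 denominator) is an assumption, and the function's own implementation may differ in details such as weighting and rounding.

```r
# Toy data, made up for illustration
x <- factor(c("a", "a", "a", "b", "b", "b"))
y <- c(1, 2, 3, 4, 5, 6)

# Eta-squared: between-group sum of squares over total sum of squares
eta_squared <- function(x, y) {
  grand  <- mean(y)
  g_mean <- ave(y, x, FUN = mean)  # group mean mapped back to each observation
  sum((g_mean - grand)^2) / sum((y - grand)^2)
}

# Point-biserial correlation: each level as a 0/1 indicator against y
point_biserial <- function(x, y) {
  sapply(levels(x), function(l) cor(as.numeric(x == l), y))
}

# Test-values: standardized gap between a level's mean and the overall mean
# (assumption: overall variance taken from var(), i.e. n - 1 denominator)
test_values <- function(x, y) {
  n  <- length(y)
  nk <- tapply(y, x, length)
  mk <- tapply(y, x, mean)
  (mk - mean(y)) / sqrt((n - nk) / (n - 1) * var(y) / nk)
}

eta2 <- eta_squared(x, y)
pb   <- point_biserial(x, y)
tv   <- test_values(x, y)

# Two-sided p-values from the standard normal distribution
tv_pval <- 2 * pnorm(-abs(tv))
```

On the example output shown in this page, the same normal approximation links the reported test-value for Europe (about 2.008) to its reported p-value (about 0.045).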

References

Rakotomalala R., 'Comprendre la taille d'effet (effect size)', http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf

Lebart L., Morineau A. and Warwick K., 1984, *Multivariate Descriptive Statistical Analysis*, John Wiley and Sons, New York.

Author

Nicolas Robette

Examples

data(Movies)
with(Movies, assoc.catcont(Country, Budget, nperm = 10))
#> $summary
#>            mean       sd       min       q1   median       q3       max
#> Europe 25171951 38337712   500.000  2338250 10310500 24305750 163400000
#> France  5915391 10329243 29500.000  1303662  3399000  7159867 181621894
#> Other  14178774 25915170 40850.000  1382000  3401171  8170000 103759000
#> USA    42053782 39735878     0.817 16340000 30229000 53105000 245100000
#>             mad
#> Europe  8594800
#> France  2359928
#> Other   2461621
#> USA    17157000
#> 
#> $eta.squared
#> [1] 0.2868784
#> 
#> $permutation.pvalue
#> [1] 0
#> 
#> $cor
#> Europe France  Other    USA 
#>  0.064 -0.503 -0.022  0.510 
#> 
#> $cor.perm.pval
#>       Europe       France        Other          USA 
#> 2.024832e-02 2.754788e-81 3.329593e-01 0.000000e+00 
#> 
#> $test.values
#>      Europe      France       Other         USA 
#>   2.0081076 -15.8983644  -0.6927347  16.1140394 
#> 
#> $test.values.pval
#>     Europe     France      Other        USA 
#> 0.04463186 0.00000000 0.48847608 0.00000000 
#>
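
The idea behind a Monte Carlo permutation test of independence can be sketched by hand in base R: shuffle the continuous variable, recompute an association statistic, and see how often the shuffled statistic reaches the observed one. The version below uses eta-squared as the statistic and a hypothetical helper perm_pvalue; it is an illustration of the general technique under these assumptions, not the procedure the function itself runs.

```r
set.seed(1)  # only to make this illustration reproducible

# Hypothetical helper: Monte Carlo permutation p-value based on eta-squared
perm_pvalue <- function(x, y, nperm = 999) {
  stat <- function(yy) summary(lm(yy ~ x))$r.squared  # eta-squared via R^2
  observed <- stat(y)
  permuted <- replicate(nperm, stat(sample(y)))
  # share of permutations at least as extreme as the observed statistic;
  # the +1 correction keeps the estimate away from exactly 0
  (sum(permuted >= observed) + 1) / (nperm + 1)
}

# Toy data, made up for illustration: level "c" is shifted upwards
x <- factor(rep(c("a", "b", "c"), each = 10))
y <- rnorm(30) + c(0, 0, 3)[as.integer(x)]

p <- perm_pvalue(x, y, nperm = 199)
p
```

With a resampling scheme of this kind the smallest attainable p-value is 1 / (nperm + 1), so a small number of permutations only gives a coarse estimate.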