Cross-tabulation and measures of association between two categorical variables

assoc.twocat(x, y, weights = rep.int(1, length(x)), na_value = NULL, nperm = NULL, distrib = "asympt")

Arguments

x

the first categorical variable (must be a factor)

y

the second categorical variable (must be a factor)

weights

an optional numeric vector of weights (by default, a vector of 1s, i.e. uniform weights)

na_value

character. Name of the level used for the NA category. If NULL (default), NA values are ignored.

nperm

numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.

distrib

the null distribution of the permutation test of independence can be approximated by its asymptotic distribution ("asympt", default) or via Monte Carlo resampling ("approx").
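
To illustrate what a Monte Carlo permutation test of independence does (the idea behind nperm and distrib="approx"), here is a minimal base R sketch. It is not the function's own implementation: the toy data, the helper chi2_stat and the variable names are all hypothetical, and the function may rely on a different test statistic or resampling scheme.

```r
# Hypothetical toy data (not the Music data set used in the Examples)
set.seed(1)
x <- factor(sample(c("No", "Yes"), 200, replace = TRUE))
y <- factor(sample(c("15-24", "25-49", "50+"), 200, replace = TRUE))

# Pearson chi-squared statistic of the two-way table of a and b
chi2_stat <- function(a, b) {
  obs <- table(a, b)
  expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)
  sum((obs - expd)^2 / expd)
}

nperm <- 1000
observed <- chi2_stat(x, y)

# Shuffling y breaks any link with x while keeping both margins fixed
permuted <- replicate(nperm, chi2_stat(x, sample(y)))

# Share of permuted statistics at least as large as the observed one
p_value <- mean(permuted >= observed)
p_value
```

A larger nperm gives a more stable p-value at the cost of computing time.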

Value

A list with the following elements:

freq

cross-tabulation

prop

percentages of the grand total

rprop

row percentages

cprop

column percentages

expected

expected values

chi.squared

chi-squared value

cramer.v

Cramer's V between the two variables

permutation.pvalue

p-value from a permutation (i.e. non-parametric) test of independence

pearson.residuals

the table of Pearson residuals, i.e. (observed - expected) / sqrt(expected).

phi

the table of the phi coefficients for each pair of levels

phi.perm.pval

the table of permutation p-values for each pair of levels

gather

a data frame gathering this information, with one row per cell of the cross-tabulation
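
As a rough check on how several of these elements relate to each other, the following base R sketch recomputes them from the $freq table printed in the Examples section. The formulas for Cramér's V and for the per-cell phi coefficient (the correlation between the row-level and column-level indicators) are the usual textbook definitions and are assumed here, not taken from the function's source; the unweighted case is assumed as well.

```r
# Observed counts copied from the $freq table in the Examples section
freq <- matrix(c(67, 159, 169,
                  8,  43,  44,
                  3,   2,   5),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("No", "Yes", "NA"),
                               c("15-24", "25-49", "50+")))
n <- sum(freq)
r <- rowSums(freq)   # row totals
k <- colSums(freq)   # column totals

# Expected counts under independence: row total * column total / n
expected <- outer(r, k) / n

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson.residuals <- (freq - expected) / sqrt(expected)

# The chi-squared statistic is the sum of squared Pearson residuals
chi.squared <- sum(pearson.residuals^2)

# Cramér's V (assumed standard definition)
cramer.v <- sqrt(chi.squared / (n * (min(dim(freq)) - 1)))

# phi for each cell, from the 2x2 table "this row vs. the others"
# by "this column vs. the others" (assumed definition)
phi <- (n * freq - outer(r, k)) / sqrt(outer(r * (n - r), k * (n - k)))

round(chi.squared, 4)
round(cramer.v, 4)
round(phi, 4)
```

Under these assumptions the results agree with the chi.squared, cramer.v, pearson.residuals and phi values shown in the Examples output.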

References

Rakotomalala R., 'Comprendre la taille d'effet (effect size)', http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf

Author

Nicolas Robette

Examples

data(Music)
assoc.twocat(Music$Jazz, Music$Age, nperm=100)
#> $freq
#>     15-24 25-49 50+ Sum
#> No     67   159 169 395
#> Yes     8    43  44  95
#> NA      3     2   5  10
#> Sum    78   204 218 500
#>
#> $prop
#>     15-24 25-49  50+   Sum
#> No   13.4  31.8 33.8  79.0
#> Yes   1.6   8.6  8.8  19.0
#> NA    0.6   0.4  1.0   2.0
#> Sum  15.6  40.8 43.6 100.0
#>
#> $rprop
#>         15-24    25-49      50+ Sum
#> No  16.962025 40.25316 42.78481 100
#> Yes  8.421053 45.26316 46.31579 100
#> NA  30.000000 20.00000 50.00000 100
#> Sum 15.600000 40.80000 43.60000 100
#>
#> $cprop
#>          15-24       25-49        50+ Sum
#> No   85.897436  77.9411765  77.522936  79
#> Yes  10.256410  21.0784314  20.183486  19
#> NA    3.846154   0.9803922   2.293578   2
#> Sum 100.000000 100.0000000 100.000000 100
#>
#> $expected
#>     15-24  25-49    50+
#> No  61.62 161.16 172.22
#> Yes 14.82  38.76  41.42
#> NA   1.56   4.08   4.36
#>
#> $chi.squared
#> [1] 6.805458
#>
#> $cramer.v
#> [1] 0.0824952
#>
#> $permutation.pvalue
#> [1] 0.1438201
#>
#> $pearson.residuals
#>          15-24      25-49        50+
#> No   0.6853642 -0.1701473 -0.2453658
#> Yes -1.7715780  0.6810421  0.4008802
#> NA   1.1529227 -1.0297534  0.3065044
#>
#> $phi
#>           15-24       25-49         50+
#> No   0.07280405 -0.02158090 -0.03188452
#> Yes -0.09582119  0.04398308  0.02652452
#> NA   0.05669319 -0.06046087  0.01843738
#>
#> $phi.perm.pval
#>          15-24      25-49        50+
#> No  0.06015095 0.33364299 0.20831222
#> Yes 0.01480271 0.16719800 0.26415346
#> NA  0.08283646 0.07358127 0.29396837
#>
#> $gather
#>   Var1  Var2 Freq  prop      rprop       cprop expected std.residuals
#> 1   No 15-24   67 0.134 0.16962025 0.858974359    61.62     0.6853642
#> 2  Yes 15-24    8 0.016 0.08421053 0.102564103    14.82    -1.7715780
#> 3   NA 15-24    3 0.006 0.30000000 0.038461538     1.56     1.1529227
#> 4   No 25-49  159 0.318 0.40253165 0.779411765   161.16    -0.1701473
#> 5  Yes 25-49   43 0.086 0.45263158 0.210784314    38.76     0.6810421
#> 6   NA 25-49    2 0.004 0.20000000 0.009803922     4.08    -1.0297534
#> 7   No   50+  169 0.338 0.42784810 0.775229358   172.22    -0.2453658
#> 8  Yes   50+   44 0.088 0.46315789 0.201834862    41.42     0.4008802
#> 9   NA   50+    5 0.010 0.50000000 0.022935780     4.36     0.3065044
#>           phi  perm.pval
#> 1  0.07280405 0.06015095
#> 2 -0.09582119 0.01480271
#> 3  0.05669319 0.08283646
#> 4 -0.02158090 0.33364299
#> 5  0.04398308 0.16719800
#> 6 -0.06046087 0.07358127
#> 7 -0.03188452 0.20831222
#> 8  0.02652452 0.26415346
#> 9  0.01843738 0.29396837