Cross-tabulation and measures of association between two categorical variables

assoc.twocat(x, y, weights = NULL, na.rm = FALSE, na.value = "NAs",
             nperm = NULL, distrib = "asympt")

Arguments

x

the first categorical variable (must be a factor)

y

the second categorical variable (must be a factor)

weights

numeric vector of weights. If NULL (default), uniform weights (i.e. all equal to 1) are used.

na.rm

logical, indicating whether NA values should be silently removed before the computation proceeds. If FALSE (default), an additional level is added to the variables (see na.value argument).

na.value

character. Name of the level for NA category. Default is "NAs". Only used if na.rm = FALSE.

nperm

numeric. Number of permutations for the permutation test of independence. If NULL (default), no permutation test is performed.

distrib

character. How the null distribution of the permutation test of independence is obtained: from its asymptotic distribution ("asympt", default) or by Monte Carlo resampling ("approx"). Only relevant if nperm is not NULL.
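To make the weights, na.rm and na.value arguments more concrete, here is a minimal base R sketch of the behaviour they describe. The toy vectors and all object names are hypothetical, and this is only an illustration of the documented behaviour, not the function's internal code.

```r
# Hypothetical toy data with one missing value in each variable
d <- data.frame(
  x = factor(c("a", "b", NA, "a", "b", "a")),
  y = factor(c("u", "v", "u", NA, "v", "u")),
  w = c(1, 2, 1, 1, 0.5, 1.5)
)

# na.rm = FALSE: NA becomes an explicit level, named after na.value
na.value <- "NAs"
x_lev <- addNA(d$x, ifany = TRUE)
levels(x_lev)[is.na(levels(x_lev))] <- na.value
y_lev <- addNA(d$y, ifany = TRUE)
levels(y_lev)[is.na(levels(y_lev))] <- na.value

# Weighted cross-tabulation: each cell is the sum of the weights of its cases
tab_keep <- xtabs(d$w ~ x_lev + y_lev)

# na.rm = TRUE: incomplete cases are dropped before tabulating
d_ok <- d[complete.cases(d), ]
tab_drop <- xtabs(w ~ x + y, data = d_ok)
```

With uniform weights (the default), the same tables reduce to plain counts.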

Value

A list of lists with the following elements:

tables list :

freq

cross-tabulation frequencies

prop

percentages of the grand total

rprop

row percentages

cprop

column percentages

expected

expected values
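The five tables above can be sketched in base R from the observed counts alone. The counts below are copied from the Movies output in the Examples section; the object names are illustrative, and this is not necessarily how the function builds the tables internally.

```r
# Observed counts from the Movies example (rows: Country, columns: ArtHouse)
freq <- matrix(c(39, 212, 6, 257,
                 33, 393, 20, 40),
               ncol = 2,
               dimnames = list(c("Europe", "France", "Other", "USA"),
                               c("No", "Yes")))
n <- sum(freq)

prop  <- 100 * freq / n                      # percentages of the grand total
rprop <- 100 * prop.table(freq, margin = 1)  # each row sums to 100
cprop <- 100 * prop.table(freq, margin = 2)  # each column sums to 100

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(freq), colSums(freq)) / n
```

The printed tables additionally carry "Sum" margins, which `addmargins()` can add if needed.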

global list :

chi.squared

chi-squared value

cramer.v

Cramer's V between the two variables

permutation.pvalue

p-value from a permutation (i.e. non-parametric) test of independence

global.pem

global PEM (Percentage of Maximum Deviation from Independence)

GK.tau.xy

Goodman and Kruskal tau (forward association, i.e. x is the predictor and y is the response)

GK.tau.yx

Goodman and Kruskal tau (backward association, i.e. y is the predictor and x is the response)
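As a rough check on the global measures, the sketch below recomputes the chi-squared statistic, Cramer's V and both Goodman and Kruskal taus from the Movies counts using their textbook formulas, then runs a plain Monte Carlo permutation test. The helper names (`gk_tau`, `chisq_stat`) are hypothetical, and the permutation loop is a generic illustration rather than the procedure the function actually uses.

```r
freq <- matrix(c(39, 212, 6, 257,
                 33, 393, 20, 40),
               ncol = 2,
               dimnames = list(c("Europe", "France", "Other", "USA"),
                               c("No", "Yes")))
n <- sum(freq)
expected <- outer(rowSums(freq), colSums(freq)) / n

# Pearson chi-squared and Cramer's V
chi2 <- sum((freq - expected)^2 / expected)
cramer_v <- sqrt(chi2 / (n * (min(dim(freq)) - 1)))

# Goodman and Kruskal tau: rows of p predict columns of p
gk_tau <- function(p) {
  row_marg <- rowSums(p)
  col_marg <- colSums(p)
  (sum(p^2 / row_marg) - sum(col_marg^2)) / (1 - sum(col_marg^2))
}
tau_xy <- gk_tau(freq / n)     # x is the predictor, y the response
tau_yx <- gk_tau(t(freq / n))  # y is the predictor, x the response

# Generic Monte Carlo permutation test on the chi-squared statistic
chisq_stat <- function(x, y) {
  tab <- table(x, y)
  e <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - e)^2 / e)
}
# Rebuild one observation per case from the table
x <- rep(rownames(freq)[row(freq)], times = as.vector(freq))
y <- rep(colnames(freq)[col(freq)], times = as.vector(freq))
set.seed(1)
perm <- replicate(200, chisq_stat(x, sample(y)))
p_perm <- mean(perm >= chisq_stat(x, y))
```

With a binary response, the forward tau equals the chi-squared statistic divided by n, which is why GK.tau.xy and chi.squared share the same digits in the Movies output.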

local list :

std.residuals

the table of standardized (i.e. Pearson) residuals.

adj.residuals

the table of adjusted standardized residuals.

adj.res.pval

the table of p-values of adjusted standardized residuals.

odds.ratios

the table of odds ratios.

local.pem

the table of local PEM values, one per cell

phi

the table of the phi coefficients for each pair of levels

phi.perm.pval

the table of permutation p-values for each pair of levels
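The cell-level odds ratios and phi coefficients can be read as statistics of a 2x2 table that contrasts one cell's row and column with all the others. The sketch below follows that reading, which reproduces the values printed in the Examples section. The local PEM formula used here is an assumption, since this page does not spell it out, but it also reproduces the printed values. All object and function names are illustrative.

```r
freq <- matrix(c(39, 212, 6, 257,
                 33, 393, 20, 40),
               ncol = 2,
               dimnames = list(c("Europe", "France", "Other", "USA"),
                               c("No", "Yes")))
n <- sum(freq)

# Collapse the table to 2x2 around cell (i, j): "this level" vs "all others"
cell_vs_rest <- function(i, j) {
  n11 <- freq[i, j]
  n12 <- sum(freq[i, ]) - n11
  n21 <- sum(freq[, j]) - n11
  n22 <- n - n11 - n12 - n21
  c(or  = (n11 * n22) / (n12 * n21),
    phi = (n11 * n22 - n12 * n21) /
      sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)))
}

odds <- phi <- freq  # reuse the shape and dimnames of the table
for (i in seq_len(nrow(freq))) {
  for (j in seq_len(ncol(freq))) {
    res <- cell_vs_rest(i, j)
    odds[i, j] <- res["or"]
    phi[i, j]  <- res["phi"]
  }
}

# Local PEM (assumed definition): deviation from independence as a
# percentage of the largest deviation the margins allow in that direction
rs <- rowSums(freq)
cs <- colSums(freq)
expected <- outer(rs, cs) / n
dev <- freq - expected
max_pos <- outer(rs, cs, pmin) - expected
max_neg <- expected - pmax(outer(rs, cs, "+") - n, 0)
local_pem <- 100 * ifelse(dev >= 0, dev / max_pos, dev / max_neg)
```

For example, `odds["USA", "No"]` gives 11.15, matching the printed odds.ratios table.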

gather: a data frame gathering the information above, with one row per cell of the cross-tabulation.

Note

The adjusted standardized residuals are strictly equivalent to the test-values for nominal variables proposed by Lebart et al. (1984).
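For readers who want to see how the two kinds of residuals differ, the following base R sketch recomputes them from the Movies counts with the usual formulas (see Agresti, 2007). Object names are illustrative, and the two-sided normal p-value is an assumption that happens to match the adj.res.pval values printed in the Examples section.

```r
freq <- matrix(c(39, 212, 6, 257,
                 33, 393, 20, 40),
               ncol = 2,
               dimnames = list(c("Europe", "France", "Other", "USA"),
                               c("No", "Yes")))
n <- sum(freq)
expected <- outer(rowSums(freq), colSums(freq)) / n

# Standardized (Pearson) residuals: (observed - expected) / sqrt(expected)
std_res <- (freq - expected) / sqrt(expected)

# Adjusted residuals also divide by sqrt((1 - row share) * (1 - column share)),
# so that each one is approximately standard normal under independence
row_p <- rowSums(freq) / n
col_p <- colSums(freq) / n
adj_res <- std_res / sqrt(outer(1 - row_p, 1 - col_p))

# Two-sided p-values from the standard normal distribution
adj_pval <- 2 * pnorm(-abs(adj_res))
```

The same two tables are available from base R as `chisq.test(freq)$residuals` and `chisq.test(freq)$stdres`.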

References

Agresti, A. (2007). An Introduction to Categorical Data Analysis, 2nd ed. New York: John Wiley & Sons.

Rakotomalala, R. Comprendre la taille d'effet (effect size). http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf

Lebart, L., Morineau, A. and Warwick, K. (1984). Multivariate Descriptive Statistical Analysis. New York: John Wiley & Sons.

Author

Nicolas Robette

Examples

data(Movies)
assoc.twocat(Movies$Country, Movies$ArtHouse, nperm=100)
#> $tables
#> $tables$freq
#>          No  Yes  Sum
#> Europe   39   33   72
#> France  212  393  605
#> Other     6   20   26
#> USA     257   40  297
#> Sum     514  486 1000
#> 
#> $tables$prop
#>           No   Yes   Sum
#> Europe   3.9   3.3   7.2
#> France  21.2  39.3  60.5
#> Other    0.6   2.0   2.6
#> USA     25.7   4.0  29.7
#> Sum     51.4  48.6 100.0
#> 
#> $tables$rprop
#>              No      Yes Sum
#> Europe 54.16667 45.83333 100
#> France 35.04132 64.95868 100
#> Other  23.07692 76.92308 100
#> USA    86.53199 13.46801 100
#> Sum    51.40000 48.60000 100
#> 
#> $tables$cprop
#>                No        Yes   Sum
#> Europe   7.587549   6.790123   7.2
#> France  41.245136  80.864198  60.5
#> Other    1.167315   4.115226   2.6
#> USA     50.000000   8.230453  29.7
#> Sum    100.000000 100.000000 100.0
#> 
#> $tables$expected
#>             No     Yes
#> Europe  37.008  34.992
#> France 310.970 294.030
#> Other   13.364  12.636
#> USA    152.658 144.342
#> 
#> 
#> $global
#> $global$chi.squared
#> [1] 220.1263
#> 
#> $global$cramer.v
#> [1] 0.4691762
#> 
#> $global$permutation.pvalue
#> [1] 0
#> 
#> $global$global.pem
#> [1] 64.04814
#> 
#> $global$GK.tau.xy
#> [1] 0.2201263
#> 
#> $global$GK.tau.yx
#> [1] 0.1537807
#> 
#> 
#> $local
#> $local$std.residuals
#>                No        Yes
#> Europe  0.3274474 -0.3367479
#> France -5.6123445  5.7717531
#> Other  -2.0143992  2.0716146
#> USA     8.4449945 -8.6848595
#> 
#> $local$adj.residuals
#>                No        Yes
#> Europe   0.487584  -0.487584
#> France -12.809366  12.809366
#> Other   -2.927844   2.927844
#> USA     14.447862 -14.447862
#> 
#> $local$adj.res.pval
#>                 No         Yes
#> Europe 0.625844564 0.625844564
#> France 0.000000000 0.000000000
#> Other  0.003413213 0.003413213
#> USA    0.000000000 0.000000000
#> 
#> $local$odds.ratios
#>                No        Yes
#> Europe  1.1270813  0.8872474
#> France  0.1661190  6.0197809
#> Other   0.2751969  3.6337625
#> USA    11.1500000  0.0896861
#> 
#> $local$local.pem
#>         y
#> x               No       Yes
#>   Europe   5.69273  -5.69273
#>   France -51.55493  51.55493
#>   Other  -55.10326  55.10326
#>   USA     72.28804 -72.28804
#> 
#> $local$phi
#>                 No         Yes
#> Europe  0.01541876 -0.01541876
#> France -0.40506773  0.40506773
#> Other  -0.09258656  0.09258656
#> USA     0.45688150 -0.45688150
#> 
#> $local$phi.perm.pval
#>                  No          Yes
#> Europe 3.298231e-01 3.298231e-01
#> France 2.163444e-41 0.000000e+00
#> Other  4.122879e-04 4.122879e-04
#> USA    0.000000e+00 6.777456e-50
#> 
#> 
#> $gather
#>   var.y  var.x freq  prop     rprop      cprop expected std.residuals
#> 1    No Europe   39 0.039 0.5416667 0.07587549   37.008     0.3274474
#> 2    No France  212 0.212 0.3504132 0.41245136  310.970    -5.6123445
#> 3    No  Other    6 0.006 0.2307692 0.01167315   13.364    -2.0143992
#> 4    No    USA  257 0.257 0.8653199 0.50000000  152.658     8.4449945
#> 5   Yes Europe   33 0.033 0.4583333 0.06790123   34.992    -0.3367479
#> 6   Yes France  393 0.393 0.6495868 0.80864198  294.030     5.7717531
#> 7   Yes  Other   20 0.020 0.7692308 0.04115226   12.636     2.0716146
#> 8   Yes    USA   40 0.040 0.1346801 0.08230453  144.342    -8.6848595
#>   adj.residuals         or       pem         phi    perm.pval freq.x freq.y
#> 1      0.487584  1.1270813   5.69273  0.01541876 3.298231e-01     72    514
#> 2    -12.809366  0.1661190 -51.55493 -0.40506773 2.163444e-41    605    514
#> 3     -2.927844  0.2751969 -55.10326 -0.09258656 4.122879e-04     26    514
#> 4     14.447862 11.1500000  72.28804  0.45688150 0.000000e+00    297    514
#> 5     -0.487584  0.8872474  -5.69273 -0.01541876 3.298231e-01     72    486
#> 6     12.809366  6.0197809  51.55493  0.40506773 0.000000e+00    605    486
#> 7      2.927844  3.6337625  55.10326  0.09258656 4.122879e-04     26    486
#> 8    -14.447862  0.0896861 -72.28804 -0.45688150 6.777456e-50    297    486
#>   prop.x prop.y
#> 1  0.072  0.514
#> 2  0.605  0.514
#> 3  0.026  0.514
#> 4  0.297  0.514
#> 5  0.072  0.486
#> 6  0.605  0.486
#> 7  0.026  0.486
#> 8  0.297  0.486
#>