Descriptive statistics for bivariate data: introduction
Student learning outcomes
By the end of this chapter, the student should be able to:
- Identify different types of relationships between variables: categorical-categorical, categorical-numerical, and numerical-numerical
Categorical - Categorical
- Know how to make and summarize a contingency table between two categorical variables
- Discuss and describe patterns in contingency tables and supporting graphic summaries
Categorical - Numerical
- Build side-by-side histograms, back-to-back stemplots, and multiple box plots to compare a numerical variable across the groups of a categorical variable
- Compare and contrast multiple groups based on their shape, center, and spread
Numerical - Numerical
- Discuss basic ideas of the relationship between two numerical variables, including scatter plots, linear regression, and correlation
- Create and interpret a line of best fit
- Calculate and interpret the correlation coefficient
Introduction
In the previous chapters you explored how to organize, summarize, and discuss a single categorical or numerical variable (univariate data). The more interesting questions usually involve relationships between two variables. For example, are men or women more likely to purchase coffee at a coffee shop? Do students who spend more time studying for an exam really do better? We can start to answer these questions by examining the relationship between the two variables, which may be categorical and categorical, categorical and numerical, or numerical and numerical. We will do this by producing graphs, calculating summary statistics, and making comparisons.
Categorical-categorical relationships: contingency tables
When we want to examine the relationship between two categorical variables, we build a contingency table: a table that summarizes the counts for every combination of the two variables' values. These tables are also called two-way tables, crosstabulations, summary tables, or pivot tables (in Microsoft Excel).
The table below is an example of a contingency table. It presents the number of students who fall into six different groups. The groups are based on answers to two categorical questions. The first question asked for gender (Female or Male) and the second asked for the type of transportation a student typically uses to go to school each day (bicycle, car, or walking).
In this situation, we get two measurements from each person: the person’s gender and the person’s type of transportation. One variable is represented by the rows (GENDER) and the other variable is represented by the columns (TRANSPORTATION). There are two values for gender (Female or Male) and three possible values for transportation (Bike, Car, or Walk). Each person can have only one value for gender and only one value for transportation. Together, the two variables are said to make a 2 x 3 table: two rows and three columns. This creates a table with six cells. The cells are the boxes that are outlined with a heavy line.
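To see how the counts in a 2 x 3 table come from the individual responses, the tallying can be sketched in a few lines of Python. This is only an illustration: the ten responses are made up for the example and are not the data in the table above, and the names `responses`, `ROWS`, `COLS`, and `contingency_table` are hypothetical choices for this sketch.

```python
from collections import Counter

# Hypothetical survey responses (made up for illustration only).
# Each tuple is one student's (gender, transportation) pair.
responses = [
    ("Female", "Bike"), ("Female", "Car"), ("Female", "Car"),
    ("Female", "Walk"), ("Female", "Walk"),
    ("Male", "Bike"), ("Male", "Bike"), ("Male", "Car"),
    ("Male", "Walk"), ("Male", "Walk"),
]

ROWS = ["Female", "Male"]        # values of the row variable (GENDER)
COLS = ["Bike", "Car", "Walk"]   # values of the column variable (TRANSPORTATION)


def contingency_table(pairs, rows, cols):
    """Count how many observations fall into each (row, column) cell."""
    counts = Counter(pairs)  # a missing combination counts as 0
    return [[counts[(r, c)] for c in cols] for r in rows]


table = contingency_table(responses, ROWS, COLS)

# Print the table with the column labels as a header row.
print(f"{'':8}" + "".join(f"{c:>6}" for c in COLS))
for label, row in zip(ROWS, table):
    print(f"{label:8}" + "".join(f"{n:>6}" for n in row))
```

Because each student contributes exactly one gender and one transportation value, every response lands in exactly one cell, so the six cell counts add up to the number of students surveyed. Statistical software offers the same tally directly; for example, the pandas library provides a `crosstab` function.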