2.1 Summarizing Categorical Data
Frequency Distribution:
A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes.
Relative Frequency:
Relative frequency of a class = Frequency of the class / n
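As a minimal sketch of the two definitions above, the snippet below counts how often each class occurs and divides by n. The sample data and the function name `frequency_table` are made up for illustration.

```python
from collections import Counter

def frequency_table(items):
    """Return {class: (frequency, relative frequency)} for categorical data."""
    n = len(items)
    counts = Counter(items)
    # Relative frequency of a class = frequency of the class / n
    return {label: (freq, freq / n) for label, freq in counts.items()}

# Hypothetical sample of 10 categorical observations
observations = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]
table = frequency_table(observations)

for label in sorted(table):
    freq, rel = table[label]
    print(f"{label}  frequency={freq}  relative frequency={rel:.2f}")
```

The relative frequencies always sum to 1, which is a quick sanity check on the table.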
2.2 Summarizing Quantitative Data
1. Determine the number of nonoverlapping classes (typically 5 to 20). Question: how can the number of classes be chosen in a principled way?
2. Determine the width of each class.
Approximate class width = (largest data value - smallest data value) / Number of classes
3. Determine the class limits (decide the lowest and highest value of each class).
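The three steps can be sketched as follows, assuming integer data and inclusive class limits. The data set and the helper names are hypothetical. For the open question in step 1, one commonly cited heuristic is Sturges' rule (number of classes = ceil(log2 n) + 1); it is included only as one possible option and does not come from these notes.

```python
import math

def sturges_classes(n):
    """Sturges' rule: one common heuristic for the number of classes (an assumption here)."""
    return math.ceil(math.log2(n)) + 1

def build_classes(data, num_classes):
    """Build equal-width classes for integer data and count each class's frequency."""
    smallest, largest = min(data), max(data)
    # Approximate class width = (largest - smallest) / number of classes,
    # rounded up to the next whole number so the classes cover the largest value.
    width = (largest - smallest) // num_classes + 1
    classes = []
    for i in range(num_classes):
        lower = smallest + i * width
        upper = lower + width - 1  # inclusive upper class limit
        freq = sum(lower <= x <= upper for x in data)
        classes.append((lower, upper, freq))
    return width, classes

# Hypothetical data set of 20 integer observations
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

width, classes = build_classes(data, num_classes=5)
print("class width:", width)
for lower, upper, freq in classes:
    print(f"{lower}-{upper}: {freq}")
print("Sturges' rule suggests", sturges_classes(len(data)), "classes")
```

In practice the lower limit of the first class is often moved to a round number (for example 10 instead of 12) to make the table easier to read; the sketch simply starts at the smallest data value.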
Plot the data:
1. Dot plot
2. Histogram
3. Cumulative frequency distribution: a variation of the frequency distribution that provides another tabular summary of quantitative data.
An ogive is a graph of a cumulative distribution.
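A cumulative frequency distribution can be derived from an ordinary frequency distribution by taking running totals. The sketch below uses hypothetical class frequencies and pairs each upper class limit with its cumulative frequency, which is one simple convention for the points an ogive would plot.

```python
from itertools import accumulate

# Hypothetical frequency distribution: (lower limit, upper limit, frequency)
classes = [(12, 16, 7), (17, 21, 7), (22, 26, 3), (27, 31, 2), (32, 36, 1)]

frequencies = [freq for _, _, freq in classes]
cumulative = list(accumulate(frequencies))  # running totals of the frequencies

# Points for an ogive: (upper class limit, cumulative frequency)
ogive_points = [(upper, cum) for (_, upper, _), cum in zip(classes, cumulative)]

for upper, cum in ogive_points:
    print(f"less than or equal to {upper}: {cum}")
```

The last cumulative frequency always equals the total number of observations.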
2.3 Stem-and-leaf display
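A stem-and-leaf display can be used to show both the rank order and shape of a data set simultaneously. Advantages over a histogram: it is easy to construct by hand, and it carries more information because the actual data values are shown.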
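A rough sketch of how such a display could be built, assuming nonnegative integer data with the units digit as the leaf and the remaining digits as the stem. The scores and the function name `stem_and_leaf` are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):  # sorting first keeps the leaves in rank order
        stems[v // 10].append(v % 10)
    return dict(stems)

# Hypothetical test scores
scores = [112, 72, 69, 97, 107, 73, 92, 76, 86, 73]
display = stem_and_leaf(scores)

for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>2} | {leaves}")
```

Each row reads like a histogram bar turned on its side, but every original value can still be recovered from its stem and leaf.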
1. Crosstabulation
A crosstabulation is a tabular summary of data for two variables.
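As an illustration, a crosstabulation can be built by counting how often each pair of values occurs together. The two variables below (a quality rating and a price band) and their values are hypothetical.

```python
from collections import Counter

def crosstab(row_values, col_values):
    """Count co-occurrences of two variables; returns {row label: {column label: count}}."""
    counts = Counter(zip(row_values, col_values))
    row_labels = sorted(set(row_values))
    col_labels = sorted(set(col_values))
    # Counter returns 0 for pairs that never occur, so every cell is filled
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}

# Hypothetical observations: one (quality, price) pair per item
quality = ["Good", "Very Good", "Good", "Excellent", "Very Good", "Good"]
price = ["low", "mid", "low", "high", "mid", "mid"]

table = crosstab(quality, price)
for row_label, row in table.items():
    print(row_label, row)
```

Row and column totals of the table give back the frequency distribution of each variable on its own.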
2. Simpson's Paradox
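The reversal of conclusions based on aggregate and unaggregated data is called Simpson’s paradox.
In the aggregate data, Kendall performs better than Luckett. But when we look at the unaggregated data separately, Luckett actually performs better in both Common Pleas and Municipal Court. The reversal arises because the underlying case counts differ so greatly between the two.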
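The reversal can be checked numerically. The counts below are illustrative assumptions chosen to reproduce the pattern described above, not figures taken from these notes, and "performance" is assumed to mean the fraction of verdicts upheld.

```python
# (upheld, total) verdict counts per judge and court; illustrative numbers only
cases = {
    "Luckett": {"Common Pleas": (29, 32), "Municipal Court": (100, 118)},
    "Kendall": {"Common Pleas": (90, 100), "Municipal Court": (20, 25)},
}

def court_rate(judge, court):
    """Upheld rate for one judge in one court (unaggregated data)."""
    upheld, total = cases[judge][court]
    return upheld / total

def aggregate_rate(judge):
    """Upheld rate for one judge with both courts combined (aggregate data)."""
    upheld = sum(u for u, _ in cases[judge].values())
    total = sum(t for _, t in cases[judge].values())
    return upheld / total

for judge in cases:
    print(f"{judge}: aggregate {aggregate_rate(judge):.3f}")
    for court in cases[judge]:
        print(f"  {court}: {court_rate(judge, court):.3f}")
```

With these numbers Kendall leads in the aggregate while Luckett leads in each court, because most of Luckett's cases fall in the court with the lower upheld rates and most of Kendall's fall in the court with the higher ones.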
3. Scatter diagram & Trendline
A scatter diagram is a graphical presentation of the relationship between two quantitative variables,
and a trendline is a line that provides an approximation of the relationship.
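The notes do not say how the trendline is obtained. A common choice, assumed here, is the least squares line. The sketch below computes its slope and intercept for a small hypothetical data set; the function name `fit_trendline` is also made up.

```python
def fit_trendline(xs, ys):
    """Least squares line y = intercept + slope * x (one assumed way to fit a trendline)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired observations of two quantitative variables
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_trendline(xs, ys)
print(f"trendline: y = {intercept:.2f} + {slope:.2f} * x")
```

A positive slope suggests a positive relationship between the two variables, a negative slope a negative relationship, and a slope near zero no apparent linear relationship.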