Chapter 4: Correlation and Reliability
This exercise illustrates correlation, reliability of measurement, and aggregation. It uses the PERS dataset, consisting of 90 cases and 968 variables. The variables represent measures of traits and relevant behaviors for the dimensions of extraversion (outgoingness) and conscientiousness, reported each week for three weeks by a group of undergraduate psychology students.
Do different behavioral indicators of a personality trait correlate well with each other? If you are neat about your personal appearance, are you also neat in your school class notes, and neat about your home? If you are talkative, do you also like to be around a lot of people, like to be assertive, and like to show enthusiasm?
This exercise will examine the relationships between different behaviors which are related to a common personality trait. It will illustrate that when these behaviors are measured at one point in time, they are related less strongly to each other than when they are measured over several points in time and aggregated (totaled up). Single-occasion behavioral measures are less reliable than those which are taken repeatedly over time and aggregated. This is because situational, circumstantial, and idiosyncratic factors vary from one occasion to another, and contribute to error in the measurement of the behavior at a single occasion.
Unreliability of single-occasion measurements.
For many years, psychologists looked for evidence of human "personality" in the consistency of various behaviors thought to be related to a particular personality trait. For example, if the trait of interest was outgoingness, or extraversion as it is termed by researchers, a list of relevant behaviors which would illustrate this trait might be measured for a group of people. This might include sociability indicators such as the number of people you have contact with or interact with, indicators about the nature of your social interactions, such as whether you are dominant, assertive, and energetic, and perhaps the quality and quantity of talking that you do in various situations. The assumption would be that outgoing people have a general personality trait which predisposes them to be consistent across these indicators. Thus, a person high in outgoingness should interact with more people, be more assertive in their interactions, talk more, etc., while a less outgoing person should do these all to a lesser degree. Historically, studies would measure these various trait indicators at one point in time—ignoring the inherent unreliability of such a measure, since these behaviors may vary from day to day or week to week in the same person, even if the personality of such a person was relatively consistent and "fixed."
We will use a simple example of three behaviors, the daily measures for neatness: DCN1, time spent on appearance today; DCN2, time spent on home neatness today; DCN3, neatness of class notes today. Start by running simple Pearson (Bivariate) correlations between these 3 variables for the Week #1 reports only (W1DCN1, W1DCN2, W1DCN3), and include the Option for Means and Standard deviations in your output. Now run simple Pearson (Bivariate) correlations between these same 3 variables for Week #2 (W2DCN1, W2DCN2, W2DCN3), again for Week #3 (W3DCN1, W3DCN2, W3DCN3), and finally for the aggregated three-week averages (DCN1, DCN2, DCN3), also including the Option for Means and Standard deviations in your outputs.
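For readers working outside SPSS, the same step can be sketched in Python with pandas. This is only an illustration under stated assumptions: the data frame below is filled with random placeholder values standing in for the PERS data (how the real file is loaded is not covered here), and only the column names come from the exercise.

```python
import numpy as np
import pandas as pd

# Placeholder data: 90 cases and the three Week #1 neatness items.
# These are random numbers, NOT the real PERS values.
rng = np.random.default_rng(0)
pers = pd.DataFrame(
    rng.normal(size=(90, 3)),
    columns=["W1DCN1", "W1DCN2", "W1DCN3"],
)

week1 = ["W1DCN1", "W1DCN2", "W1DCN3"]

# Means and standard deviations, one row per variable.
descriptives = pers[week1].agg(["mean", "std"]).T

# Pearson correlation matrix for the three variables.
corr_week1 = pers[week1].corr(method="pearson")

print(descriptives.round(3))
print(corr_week1.round(3))
```

Repeating the last two steps with the Week #2, Week #3, and aggregated column names would give the other three sets of results.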
What you should notice about the sets of results is that, in many cases, the correlation between corresponding measures is higher when the three-week averages are used than when a single-week measure is used. For example, the correlation between W1DCN1 (time spent on appearance) and W1DCN3 (neatness of class notes) is .316. The same correlation for the Week #2 report is .284, and for the Week #3 report it is .369. For the three-week average, however, the correlation between DCN1 and DCN3 is .393. The explanation is that, in any given week, consistent personal tendencies of the participant (to be neat in appearance or neat in class notetaking) may be overshadowed by situational and circumstantial factors. As the number of weeks increases, these fluctuating situational and circumstantial factors average out, and any consistencies in personal tendencies begin to show up. In other words, the aggregated 3-week measures are more reliable: they are more consistent and have less error associated with them.
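The averaging-out argument can be illustrated with a small simulation. This is a minimal sketch built on a hypothetical model, not on the PERS data: each weekly score is assumed to be a stable personal tendency plus independent occasion-specific noise, and the sample size, noise level, and variable names are all chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
n_people, n_weeks = 5000, 3   # large sample so sampling error stays small

# Two behaviors share a common trait, so their stable parts are correlated.
trait = rng.normal(size=n_people)
stable_a = trait + rng.normal(size=n_people)
stable_b = trait + rng.normal(size=n_people)

# Each weekly report adds independent situational noise (assumed SD = 2).
noise_sd = 2.0
weekly_a = stable_a[:, None] + rng.normal(scale=noise_sd, size=(n_people, n_weeks))
weekly_b = stable_b[:, None] + rng.normal(scale=noise_sd, size=(n_people, n_weeks))

# Correlation between the two behaviors within each single week.
single_week_r = [
    np.corrcoef(weekly_a[:, w], weekly_b[:, w])[0, 1] for w in range(n_weeks)
]

# Correlation between the two behaviors after averaging over the three weeks.
aggregate_r = np.corrcoef(weekly_a.mean(axis=1), weekly_b.mean(axis=1))[0, 1]

print("single-week r:", np.round(single_week_r, 3))
print("aggregated r: ", round(aggregate_r, 3))
```

Under these assumptions the noise variance of a three-week mean is one third of the single-week noise variance, so the aggregated scores track the stable tendencies more closely and correlate more strongly with each other.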
Examine the corresponding correlations for the single-week variables and the aggregated variables. In each case, there are three correlations between neatness measures to compare: 1 with 2, 1 with 3, and 2 with 3. In every case, you will find that the correlation for the three-week aggregate measures (e.g., DCN1 with DCN2) is larger than the corresponding correlation for the single-week measures (e.g., W1DCN1 with W1DCN2; W2DCN1 with W2DCN2; W3DCN1 with W3DCN2). It appears that aggregating across weeks improves the reliability of the measures, which increases their correlations with other variables.
You can quantify this difference by computing an average correlation for each set of results. Because the sampling distribution of the Pearson correlation is skewed, statisticians suggest either transforming the correlations using the "Fisher r to Z transformation" and then finding the mean of the transformed values, or simply using the median of the original correlations, since the median is relatively unaffected by extreme scores. We will find the median of the three correlations for the Week #1, Week #2, Week #3, and aggregated measures.
Remember that in correlation matrix output (such as from SPSS), each correlation appears twice—above, as well as below the diagonal values of 1.000 (the correlation of the variable with itself). You should use only the correlations below the diagonal OR above the diagonal (not both), and DO NOT use the "1.000" correlations.
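If the matrix is available as an array, the unique correlations can be pulled out programmatically. The sketch below uses NumPy and a 3 x 3 matrix built from the three Week #1 correlations reported in this exercise; only the .316 value is tied to a specific pair in the text (W1DCN1 with W1DCN3), so the placement of the other two values is an assumption made for illustration.

```python
import numpy as np

# Week #1 correlation matrix. Placement of .462 and .262 is assumed;
# only W1DCN1-W1DCN3 = .316 is stated in the text.
corr = np.array([
    [1.000, 0.462, 0.316],
    [0.462, 1.000, 0.262],
    [0.316, 0.262, 1.000],
])

# Indices strictly below the diagonal (k=-1 excludes the 1.000 entries).
rows, cols = np.tril_indices_from(corr, k=-1)
unique_r = corr[rows, cols]

print(unique_r)  # one value per pair of variables, no duplicates
```

Using `np.triu_indices_from(corr, k=1)` instead would select the values above the diagonal, which are the same three numbers.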
For example, for the Week #1 measures, you would find the median for the following correlations: .462, .316, .262. Since the median is simply the middle value when the values are put in order, the median is .316. Find the medians for Week #2, Week #3, and the aggregated measures. You should have found the following: Week #2 = .290; Week #3 = .369; aggregated measures = .394. So on average (using the median), the three neatness behaviors are more strongly correlated when 3-week aggregates are used (median = .394) than when using data for any single week (medians = .316, .290, .369). By the way, if the correlations are transformed using the Fisher Z procedure, the means of these transformed correlations, converted back to the correlation scale, follow the same pattern we see in the medians (e.g., Week #1 = .350; aggregated measures = .420).
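Both summaries for Week #1 can be checked in a few lines. This sketch uses NumPy, where `np.arctanh` is the Fisher r-to-z transformation and `np.tanh` converts a z value back to the correlation scale; the variable names are illustrative.

```python
import numpy as np

# The three Week #1 correlations from the text.
week1_r = np.array([0.462, 0.316, 0.262])

# Option 1: the median of the raw correlations.
median_r = np.median(week1_r)

# Option 2: Fisher r-to-z, average the z values, then back-transform to r.
mean_z = np.arctanh(week1_r).mean()
mean_r = np.tanh(mean_z)

print(f"median r          = {median_r:.3f}")   # 0.316
print(f"mean Fisher z     = {mean_z:.3f}")     # 0.365
print(f"back-transformed  = {mean_r:.3f}")     # 0.350
```

The back-transformed mean of .350 matches the Week #1 Fisher value quoted above, and it sits slightly above the median of .316 because the largest correlation (.462) pulls the mean upward.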
Try the same exercise with a different set of measures, for example the daily responsibility measures (W1DCR1 to W1DCR7, W2DCR1 to W2DCR7, etc.). Remember to first transform the reverse-scored items (1, 2, 5, 6, 7) according to the suggestions in the codebook under "Recoding Suggestions for Reverse-scored Items." You will have 21 correlations to examine among these seven responsibility measures. As with the neatness measures above, you can check whether the aggregate measures correlate with each other better than the single-week measures. You will find that this is not true in some cases, suggesting that situational, circumstantial, or idiosyncratic influences play a large role in these behaviors.
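The bookkeeping for this follow-up can be sketched as below. Two things here are assumptions, not taken from the codebook: the response scale is treated as running from 1 to 5, and reverse scoring is done with the common "max + min minus score" rule. The codebook's own recoding suggestions take precedence, and the four data rows are made up purely to show the mechanics.

```python
from itertools import combinations

import pandas as pd

# ASSUMED response scale; replace with the range given in the codebook.
SCALE_MIN, SCALE_MAX = 1, 5

# Items listed in the exercise as reverse-scored.
reverse_items = [1, 2, 5, 6, 7]

# Made-up responses for four cases on the seven Week #1 responsibility items.
data = pd.DataFrame({f"W1DCR{i}": [1, 3, 5, 2] for i in range(1, 8)})

# Reverse-score: on a 1-5 scale, 1 becomes 5, 2 becomes 4, and so on.
for i in reverse_items:
    col = f"W1DCR{i}"
    data[col] = SCALE_MAX + SCALE_MIN - data[col]

# Every unordered pair of the seven items: 7 * 6 / 2 = 21 correlations.
pairs = list(combinations(data.columns, 2))
print(len(pairs))  # 21
```

The pair list is a convenient checklist when comparing each single-week correlation with its aggregated counterpart.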