analysis of public records: using data collected by government, industry or other organizations for social science analysis.
average: a term that refers to the central tendency or typical score of a distribution. It may refer to the arithmetic mean (sum of all values/number of cases), the median (the middle point in a distribution), or the mode (the most frequently occurring category of a variable).
alternative hypothesis: the alternative hypothesis for chi square is that the two variables are related to each other (same as research hypothesis).
analyze: to break into component parts and study those parts to gain a better understanding of the whole.
antecedent variable: variable that is causally prior to both independent and dependent variables.
bivariate analysis: the analysis of the relationship between two variables.
binomial variable: dichotomous variable (has only two categories or values).
census: a collection of data from all the units (individuals, groups, events, etc.) of a population that a researcher is interested in making statements about or referring to in his/her hypothesis. An example is the U.S. Census, conducted every 10 years.
codebook: written documentation giving the names, locations, and descriptions of the variables in a dataset, e.g., the CSURVY codebook in this appendix.
conceptualization: the process of clarifying and specifying the meaning(s) of variables in a problem statement or hypothesis in order to facilitate examination of relevant research by refining and developing a clear, precise, testable hypothesis
confidence levels and confidence intervals: a confidence interval is a range of values around a point estimate (a statistic) that is likely to contain the population parameter; the confidence level expresses how confident we are that it does. Common confidence levels are 90%, 95%, and 99%. For example, one would say that we are 95% confident that the population mean lies within plus or minus 7.5 percentage points of the observed statistic.
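As a rough illustration of the confidence interval idea, the sketch below computes an approximate 95% interval for a sample mean. The sample values are made up, the function name is hypothetical, and the multiplier 1.96 assumes a normal sampling distribution, which is a reasonable approximation only for larger samples.

```python
import math
import statistics

# Hypothetical sample values (made up for illustration only).
sample = [12, 15, 9, 14, 10, 13, 11, 16, 12, 8]

def mean_confidence_interval(values, z=1.96):
    """Approximate interval: point estimate +/- z * standard error."""
    n = len(values)
    point_estimate = statistics.mean(values)
    # Estimated standard error of the mean: sample SD / sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

low, high = mean_confidence_interval(sample)
print(f"approximate 95% interval for the mean: {low:.2f} to {high:.2f}")
```

Passing a larger multiplier (for example z=2.576 for roughly 99% confidence) widens the interval: more confidence comes at the cost of a less precise range.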
control variable (test variable): a variable believed to be like the independent variable in that it is also related to the dependent variable (e.g., gender and race both seem to be related to income). The researcher uses it to clarify the relationship among the three variables by dividing (controlling) the dataset into subgroups by categories of the new variable (e.g., gender, an antecedent variable, would divide the dataset into two groups: (1) female, (2) male).
correlation: an association or degree of agreement between two or more variables. The relationship may be linear (positive [direct] or negative [inverse]) or curvilinear, e.g., a relationship in which income rises steadily as years of education rise is a positive linear relationship.
causal relationship: a relationship in which one variable (i.e., the independent variable) is assumed to affect or influence the other variable (i.e., the dependent variable).
chi square: statistic used to test the hypothesis that two variables are not related to each other.
cohort study: a study that follows one or more cohorts over a period of time.
column percents: percentages in a table that add down to 100.
concordant pair: a pair of cases in which one member of the pair is higher than the other member on both variables.
contingency table: (same as crosstabulation).
control variable: the variable that is held constant in a three-variable table (same as test variable).
Cramer's V: measure of association appropriate when one or both of the variables consists of unordered categories.
cross-sectional study: a study that includes data gathered at one point in time.
crosstabulation: table showing the number of cases in each combination of categories of the column and row variables (same as contingency table).
data: values, numerical or symbolic, that represent observations for a variable
deductive reasoning: a logical process of developing specific hypotheses from a general principle or "theory". For example, from the general statement "Birds of a feather flock together" we could hypothesize "hummingbirds will fly about in groups," or, for criminal behavior, "those juveniles associating with juvenile delinquents have a high probability of becoming juvenile delinquents."
dependent variable: a variable that is hypothesized to be caused by, or depend on, another variable (the independent variable). For example, in a hypothesized relationship between gender and income, income is the dependent variable in that it occurs after gender occurs and is thought to be determined or caused to some extent by the gender of a person.
descriptive statistics: data summary techniques describing characteristics of a variable (means, median, mode, range, standard deviation) or the relationship between variables (correlation).
dispersion: the distribution or variability of values for a variable e.g. range, standard deviation
data: information organized for analysis. (It is a plural noun. Although it is often used with a singular construction in casual speech, the singular is datum.)
discordant pair: a pair of cases in which one member of the pair is higher than the other member on one of the variables, but lower on the other variable.
experiment: a research method that involves attempts to create changes using controlled, systematic, carefully observed conditions
explanation: a term used in the analysis of the relationship between three variables (the elaboration model) to refer to the situation where a control variable causes the previous relationship between the independent and dependent variables to disappear. An example would be where the subgroups divided by the control variable "race" no longer showed a difference between males and females in terms of wages.
elaboration: the process of adding a control variable into the analysis.
expected frequencies: for chi square, the number of cases that would be expected in a particular cell of the crosstab if the two variables were unrelated to each other.
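To make the expected-frequency idea concrete, the sketch below uses the standard formulas: each expected frequency is (row total x column total) / total cases, and chi square is the sum over all cells of (observed - expected)^2 / expected. The 2x2 table is made up for illustration and the function names are hypothetical.

```python
# Hypothetical observed crosstab (made-up counts):
# rows are categories of one variable, columns are categories of the other.
observed = [
    [30, 10],
    [20, 40],
]

def expected_frequencies(table):
    """Counts expected in each cell if the two variables were unrelated."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]

def chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells.

    Assumes every expected frequency is greater than zero.
    """
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )

print(expected_frequencies(observed))
print(round(chi_square(observed), 3))
```

The larger the gap between the observed and expected frequencies, the larger chi square becomes and the less plausible the null hypothesis that the variables are unrelated.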
explanation: outcome of elaboration in which the control variable reduces the relationship between the independent and dependent variables.
frequency distribution: a list of categories of a variable and their corresponding frequencies (the number of occurrences for each category), percents and relative percents
frequency distribution: set of categories and the number of cases in each category.
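A frequency distribution can be sketched in a few lines. The responses below are made up, and the function name is hypothetical; the sketch simply counts the cases in each category and converts the counts to percents.

```python
from collections import Counter

# Hypothetical survey responses (made up for illustration only).
responses = ["agree", "disagree", "agree", "neutral",
             "agree", "disagree", "agree", "neutral"]

def frequency_distribution(values):
    """Map each category to (frequency, percent of all cases)."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, 100 * count / n)
            for category, count in counts.items()}

for category, (count, percent) in frequency_distribution(responses).items():
    print(f"{category:10s} {count:3d} {percent:6.1f}%")
```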
Gamma: measure of association appropriate when both of the variables consist of ordered categories.
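Gamma is built from the concordant and discordant pairs defined elsewhere in this glossary: gamma = (C - D) / (C + D), where C is the number of concordant pairs and D the number of discordant pairs. The sketch below is a minimal illustration that assumes each case is coded as a pair of ordered numeric category codes; the function name and data are hypothetical.

```python
from itertools import combinations

def gamma(cases):
    """Gamma = (C - D) / (C + D) for cases given as (x, y) ordinal codes.

    Pairs tied on either variable are ignored. Assumes at least one
    untied pair exists (otherwise the denominator would be zero).
    """
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # same case is higher on both variables
            concordant += 1
        elif product < 0:    # higher on one variable, lower on the other
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical cases: (education level code, income level code).
cases = [(1, 1), (2, 3), (3, 2)]
print(gamma(cases))
```

Gamma ranges from -1 (every untied pair is discordant) to +1 (every untied pair is concordant).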
hypothesis testing: an empirical test to see if there is support for a proposed hypothesis. An example is surveying workers on their yearly income and gender to determine if there is a relationship between gender and income.
hypothesis: a proposed relationship between two or more variables. It may be causal or simply a relationship, e.g., a proposed relationship between a person's gender and income.
index: composite measures of variables, a sum of individual variables that represent a more general variable. For example the FBI Crime index, a summary of criminal offenses reported, and the seven majors index (same as the FBI Crime index but does not include arson or thefts under $200).
inductive reasoning: a logical process of developing generalizations ("theories") based on specific observations, e.g., one sees ducks flying south together for the winter and generalizes "Birds of a feather flock together" or, for criminal behavior, "Juvenile delinquents associate with other juvenile delinquents." [Note: this example is used to encourage you to think about possible errors such as overgeneralization, inaccurate observation, etc. in this type of reasoning.]
inferential statistics: techniques designed to specify estimates of a population, and the confidence in those estimates, based on data collected from a sample.
interpretation: a term used in the analysis of the relationship between three variables, the elaboration model, to refer to the situation where a control variable is found to be the mediating factor in a bivariate relationship
interval measure: the level of measurement that is characterized as being mathematically isomorphic with arithmetic. It allows all math functions for nominal and ordinal measures and adds equal distances between units of measurement, with an arbitrary zero point. Statistically appropriate techniques include the arithmetic mean, standard deviation, r², Z, t, and F. Examples include temperature and time measurement.
interview: a research method in which the researcher (interviewer) asks questions of another person (the respondent). Interviews are person to person and are conducted either face to face or by telephone.
independent variable: the variable that affects another variable.
interpretation: outcome of elaboration in which the control variable shows how one variable is related to another.
intervening variable: a third variable that is causally prior to the dependent variable, but not to the independent variable.
kurtosis: (interval/ratio level) measures the degree of pointedness or flatness of a distribution curve compared to the normal curve, where 0 is a normal curve, a plus value is more pointed, and a negative value is flatter than a normal curve.
longitudinal study: a study that focuses on data gathered at two or more points in time
mean: a term that refers to the central tendency, specifically the statistical average of a set of observations (sum of all values/number of cases). For example, if you had the values 1, 2, 3, 4, 5 the mean would be 3. The mean is symbolized by X̄ (X-bar) for a sample and by the Greek letter µ for the population.
median: a term that refers to the central tendency or typical score of a distribution; specifically, the median is the numerical value of the middle case in an ordered distribution. For example, if you had the values 6, 10, 12, 12, 13, 14, 16 the median would be 12.
mode: a term that refers to the central tendency or typical score of a distribution, specifically the most frequently occurring category of a variable. For example, if you had the values 6, 10, 12, 12, 13, 14, 16 the mode would be 12.
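The three measures of central tendency can be checked against the values used in the median and mode entries. This is a small sketch using Python's standard statistics module; the variable names are only illustrative.

```python
import statistics

# Values taken from the median and mode entries.
values = [6, 10, 12, 12, 13, 14, 16]

mean_value = statistics.mean(values)      # sum of values / number of cases
median_value = statistics.median(values)  # middle case of the ordered values
mode_value = statistics.mode(values)      # most frequently occurring value

print(round(mean_value, 2), median_value, mode_value)
```

Here the median and mode are both 12, while the mean (83/7, a little under 12) is pulled slightly lower by the value 6.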
multivariate analysis: an examination of the relationship between three or more variables, e.g., analysis using the elaboration model.
measure of association: measure of the strength of the relationship between two variables.
missing data: cases for which we do not know the proper category into which to place the case.
multistage cluster sample: type of probability sample in which the population is divided into clusters, and then the clusters are sampled.
multivariate analysis: the analysis of three or more variables simultaneously.
negative relationship: a relationship in which large values of one variable tend to go with small values of the other variable.
nonprobability sample: type of sample in which not every case in the population has a known chance of selection.
nonsampling error: sources of error other than sampling error.
null hypothesis: the null hypothesis for chi square is that the two variables are unrelated to each other.
ordinal measurement: a level of measurement characterized by a rank or order among the attributes of the variable. Mathematically, this allows all functions of nominal measures plus ranking, that is, making greater-than and less-than comparisons. Statistically acceptable techniques include the median, interquartile range, Spearman's rho, and Mann-Whitney U. Examples include military grades (private, sergeant, etc.) and most attitude question choices [(1) strongly agree to (6) strongly disagree].
observed frequencies: for chi square, the number of cases actually observed in a particular cell of the table.
opinion: statement of one's beliefs or feelings.
participant observation: a type of field study in which the researcher both participates in and observes the setting being studied.
population (universe): the complete set of units (individuals, groups, events, families, cities, counties, automobiles) that a researcher wants to study, make statements about, or refer to in his/her hypotheses and conclusions.
panel mortality: the loss of cases in a panel study at later points in time
panel study: a study that compares the same cases at two or more points in time.
partial table: the table for each category of the control variable in a three-variable table.
percent distribution: the set of categories and the percent in each.
positive relationship: a relationship in which large values of one variable tend to go with large values of the other variable and small values of one variable tend to go with small values of the other variable.
population: all the individuals or cases of interest.
probability sample: type of sample in which every individual in the population has a known, nonzero, chance of being selected in the sample.
quota sample: type of nonprobability sample which assigns interviewers quotas on preselected variables.
rate: the number of occurrences of an event for a standard base, e.g. electrical rate (cost of .06898 for each kilowatt per day), water rate (cost per 1000 gallons of water used), crime rate (number of crimes per 100,000 population)
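A rate is simple arithmetic: divide the number of events by the population and multiply by the standard base. The sketch below uses made-up figures for a hypothetical city, and the function name is only illustrative.

```python
def rate_per(events, population, base=100_000):
    """Number of events per `base` members of the population."""
    return events / population * base

# Hypothetical city: 450 reported crimes, 150,000 residents (made-up figures).
crime_rate = rate_per(450, 150_000)
print(f"{crime_rate:.1f} crimes per 100,000 population")
```

Expressing counts on a common base is what lets places of very different size be compared.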
ratio measure: a level of measurement that is characterized by being isomorphic with arithmetic and by having a natural zero point, so that it meets all the assumptions of arithmetic. Appropriate statistical functions include all those for nominal, ordinal, and interval measures as well as techniques that depend on a natural zero point, such as the geometric mean. Examples of ratio measures include weight, population density, age, and income.
recode: classification of a variable's attributes into a smaller number of discrete categories for presentation or in order to test and observe the results of the new classification. An example is the classification of educational achievement into grades 1-12 as category (1) and more than 12 years as category (2). This would allow one to more easily examine data and see if a high school degree made a significant difference. Other examples of recoding include classification of crime rates into (1) High, (2) Medium, and (3) Low.
replication (elaboration model): a term used with the elaboration model to refer to the situation where the partial tables for each category of a control variable show the same distributions that occurred in the original table of the dependent by the independent variable. For example, if a sample were divided into ethnic groups and a relationship were still found between gender and income within each group, the original relationship would be replicated.
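The education example can be sketched as a small recoding function. The function name and the data are hypothetical, and treating anything at or below 12 years as category 1 is an assumption about how the boundary is handled.

```python
def recode_education(years):
    """Recode years of education: 12 or fewer -> 1, more than 12 -> 2."""
    return 1 if years <= 12 else 2

# Hypothetical years of education for six respondents (made-up data).
years_of_education = [8, 12, 13, 16, 10, 14]
recoded = [recode_education(years) for years in years_of_education]
print(recoded)
```

Recoding discards detail (10 and 12 years become indistinguishable), so the original variable is usually kept alongside the recoded one.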
representativeness: the attempt to ensure that a sample has the same characteristics as the population from which it is selected. Statistically this is sought by random sampling techniques
random-digit dialing: method of selecting telephone numbers for a sample in which numbers are dialed randomly within working exchanges (i.e., the first three digits of a telephone number).
reactivity: the ways in which the process of asking people questions affects their answers to other questions.
recoding: the process of combining categories within a variable.
replication: outcome of elaboration in which the control variable has no effect on the original relationship between the two variables.
research hypothesis: the hypothesis for chi square that states that two variables are related to each other (same as alternative hypothesis).
row percents: percentages in a table that add across to 100 percent.
rule for computing percents: if the independent variable is the column variable, the percents should sum down to 100; if the independent variable is the row variable, the percents should sum across to 100.
rule for interpreting percents: compare in the direction opposite to the way the percents sum to 100.
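The two rules for percents can be illustrated with a small crosstab in which the independent variable is the column variable, so the percents sum down each column and are compared across. The cases are made up and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical cases as (independent value, dependent value); made-up data.
cases = (
    [("female", "high")] * 20 + [("female", "low")] * 30
    + [("male", "high")] * 30 + [("male", "low")] * 20
)

def column_percents(pairs):
    """Percent of each column (independent category) falling in each cell."""
    cell_counts = Counter(pairs)
    column_totals = Counter(independent for independent, _ in pairs)
    return {
        (independent, dependent): 100 * count / column_totals[independent]
        for (independent, dependent), count in cell_counts.items()
    }

percents = column_percents(cases)
# Percents sum down each column to 100, so compare ACROSS the columns:
difference = percents[("male", "high")] - percents[("female", "high")]
print(f"high income: male minus female = {difference:.0f} percentage points")
```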
sampling error: the error in estimates of a population due to the variation possible in samples
scale: composite measures of variables that form a logical or empirical pattern. Scores thus indicate a pattern, not just a sum of component variables, e.g., the Bogardus social distance scale.
secondary analysis: a form of research where data collected by others (researchers, government agencies, or organizations) for their own purposes are used for a different research purpose, e.g., the officially collected census and criminal justice data used in this module for research purposes.
secondary data analysis: a research method where data collected by one researcher is reanalyzed by another researcher for either different or the same purpose
significance (statistical): refers to the probability that an occurrence, a relationship, or a distribution did not occur by chance, that is, was not due to sampling error (obtaining an unusual, nontypical sample). A test of significance determines the probability that a relationship or distribution is real and not due to chance. See significance (substantive) for popular usage.
significance (statistical): the likelihood that a table distribution, a relationship, etc. could have occurred by chance.
significance (substantive): refers to the socially defined importance of a finding e.g.
simple random sample (SRS): a sample of a population where all units in the population have an equal chance of being chosen. This is a basic assumption for data analysis using statistics. An example would be if the name of each person in a school were placed on an index card, the cards mixed in a bowl, and cards drawn one at a time to determine the sample.
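The index-cards-in-a-bowl procedure has a direct software equivalent. The sketch below draws a simple random sample without replacement using Python's standard random module; the population of student labels and the function name are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical school roster of 100 students (labels made up).
roster = [f"student_{i}" for i in range(1, 101)]
chosen = simple_random_sample(roster, 10, seed=1)
print(chosen)
```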
skewness: (interval/ratio level) measures the degree to which a variable approximates a normal curve. Mathematically, it measures the deviation from symmetry. A 0 is completely symmetrical, while a plus number is skewed to the right and a minus number is skewed to the left.
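Skewness, and the related kurtosis measure defined earlier in this glossary, can be sketched with simple moment-based formulas: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3, where mk is the average k-th power of the deviations from the mean. Statistical packages often apply sample-size adjustments, so their values may differ somewhat from this sketch; the function names are hypothetical.

```python
def central_moment(values, k):
    """Average k-th power of the deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    """0 for symmetric data, positive if skewed right, negative if left."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """0 for a normal curve, positive if more pointed, negative if flatter."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

print(skewness([1, 2, 3, 4, 5]))    # symmetric data
print(skewness([1, 1, 1, 2, 10]))   # long tail to the right
print(excess_kurtosis([1, 2, 3, 4, 5]))
```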
specification: an outcome in using the elaboration model in which the control variable identifies the conditions under which the relationship between the independent and dependent variables occurs. e.g.
standard deviation: (interval/ratio level of measurement) a measure of the dispersion of the data around the mean. Mathematically, the standard deviation is the square root of the variance; because it is expressed in the same units as the original measurements, it is intrinsically more understandable than the variance. For example, the standard deviation for the age of an adult sample could be 25 years, which means that, if ages are approximately normally distributed, about 68.2% of the population is between the mean - 25 and the mean + 25. If the mean is 31, then about 68.2% of the population is between 6 and 56 years of age.
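The link between variance and standard deviation can be checked with a few made-up values. The sketch uses the population formulas (dividing by the number of cases, the "average squared deviation"); sample formulas that divide by n - 1 give slightly larger values.

```python
import math
import statistics

# Hypothetical values (made up for illustration); their mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(values)       # average squared deviation
standard_deviation = statistics.pstdev(values)  # square root of the variance

print(variance, standard_deviation)

# The glossary's own figures fit the same relationship:
# a variance of 625 corresponds to a standard deviation of 25.
print(math.sqrt(625))
```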
standard error: the standard deviation of all possible sample means that could be drawn from a population. The standard error estimates for us how much disagreement there might be between our sample mean and the population mean.
statistic: number value that represents a summary characteristic of a sample drawn from a population. It is a point estimate of the parameter value for the population. Statistics are usually designated by roman letters (e.g., the mean weight of a sample, X̄ = 135 lbs).
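The definition can be verified directly on a tiny population by listing every possible sample. The sketch below assumes sampling with replacement (so that the textbook shortcut, population standard deviation / sqrt(n), holds exactly) and uses a made-up population of five values.

```python
import itertools
import math
import statistics

# Hypothetical tiny population (made up so every sample can be listed).
population = [1, 2, 3, 4, 5]
n = 2  # sample size

# Every possible ordered sample of size n, drawn with replacement.
sample_means = [sum(sample) / n
                for sample in itertools.product(population, repeat=n)]

# Standard error by definition: SD of all possible sample means.
standard_error = statistics.pstdev(sample_means)

# Shortcut formula: population SD divided by the square root of n.
shortcut = statistics.pstdev(population) / math.sqrt(n)

print(round(standard_error, 4), round(shortcut, 4))
```

Both routes give the same value, which is why the shortcut can stand in for the impossible task of drawing every sample from a real population.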
survey: a research method using standardized questionnaires or interviews to gather data.
sample: subset of the population used for study.
sampling: the process of selecting a subset of the population for study.
sampling error: random or unsystematic error resulting from selecting a sample from a population.
simple random sample: sample in which every case or combination of cases has the same chance of being selected in the sample.
specification: outcome of elaboration in which the control variable specifies the conditions under which the relationship between the independent and dependent variables varies.
spurious: a relationship that disappears completely or decreases substantially when an antecedent control variable is introduced into the analysis.
statistical inference: the process of making generalizations about the population from sample data.
statistical significance: the observed difference is large enough to conclude that it is probably not due to chance factors.
substantive significance: the observed difference is large enough to be considered important.
trend study: a study that uses cross-sections at two or more points in time to examine change over time.
univariate analysis: the analysis of one variable at a time, e.g., frequency or percent distributions, averages.
variable: a condition that changes, a property of interest, the basis on which values or attributes are assigned to members of the population e.g., sex, height, weight, income.
variance: (interval/ratio level) a measure of dispersion of the data around the mean. Mathematically, the variance is the average squared deviation from the mean. For example, the variance for the age of an adult sample could be 625.
variable: trait or characteristic that may change from case to case.
volunteer sample: type of nonprobability sample in which individuals volunteer to be part of the sample.