©SSRIC; Last Modified  14 August 2000



A

alternative hypothesis (H1): the statement that is believed to describe the relationship between an independent and a dependent variable. H1 is retained when the null hypothesis is rejected. H1 reflects the operationalization of the research hypothesis, e.g., there is a relationship between crime and population density.

analysis of public records: using data collected by government, industry or other organizations for social science analysis.

average: a term that refers to the central tendency or typical score of a distribution. It could refer to the mean (the statistical average: sum of all cases/number of cases), the median (the middle point in a distribution), or the mode (the most frequently occurring classification of a variable).

alternative hypothesis: the alternative hypothesis for chi square is that the two variables are related to each other (same as research hypothesis).

analyze: to break into component parts and study those parts to gain a better understanding of the whole.

antecedent variable: variable that is causally prior to both independent and dependent variables.

 



B

birth cohort: group of people born in the same time period so they are likely to share many common experiences.

bivariate analysis: the analysis of the relationship between two variables.

binomial variable: dichotomous variable (has only two categories or values).

 



C

cause: the reason(s) for an event, situation or state occurring. There are many philosophical issues about cause, but for statistical causality four criteria must be met: (1) concomitant variation, (2) spurious relationships must be ruled out, (3) the cause must precede the effect in time, and (4) there must be a theoretical rationale.

census: a collection of data from all the units (individuals, groups, events, etc.) of a population that a researcher is interested in making statements about or referring to in his/her hypothesis. An example is the U.S. Census given every 10 years.

codebook: written documentation giving the names, locations and descriptions of the variables in a dataset, e.g., the CSURVY codebook in this appendix.

conceptualization: the process of clarifying and specifying the meaning(s) of variables in a problem statement or hypothesis in order to facilitate examination of relevant research by refining and developing a clear, precise, testable hypothesis

confidence levels and confidence intervals: a confidence interval is a range of values around the point estimate (a statistic value) that is likely to contain the population parameter; the confidence level expresses the degree to which we are confident that it does. Common confidence levels are 90%, 95%, and 99%. For example, one would say that we are 95% confident that the population mean lies within plus or minus 7.5 percentage points of the observed statistic value.
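
As a rough illustration of the entry above, the sketch below builds a 95% confidence interval around a sample mean. The sample size, mean and standard deviation are hypothetical, and the normal (z) approximation is an assumption that is only reasonable for fairly large samples.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics for a large sample (not from any real dataset).
n = 100
sample_mean = 135.0
sample_sd = 50.0
confidence_level = 0.95

# Standard error of the mean: sample standard deviation / square root of n.
standard_error = sample_sd / math.sqrt(n)

# z value that leaves (1 - confidence_level) / 2 in each tail of the normal curve.
z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

margin = z * standard_error
lower, upper = sample_mean - margin, sample_mean + margin
print(f"{confidence_level:.0%} confidence interval: {lower:.1f} to {upper:.1f}")
```

Raising the confidence level to 99% widens the interval; lowering it to 90% narrows it.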

control variable (test variable): a variable that, like the independent variable, is believed to be related to the dependent variable (e.g., gender and race both seem to be related to income). The researcher tries to clarify the relationship among the three variables by dividing (controlling) the dataset into subgroups by categories of the new variable (e.g., gender, an antecedent variable, would divide the dataset into two groups: (1) female, (2) male).

correlation: an association or degree of agreement between two or more variables. The relationship may be linear (positive [direct] or negative [inverse]) or curvilinear, e.g., the relationship between years of education and income is typically a positive linear relationship.

causal relationship: a relationship in which one variable (i.e., the independent variable) is assumed to affect or influence the other variable (i.e., the dependent variable).

chi square: statistic used to test the hypothesis that two variables are not related to each other.

cohort study: a study that follows one or more cohorts over a period of time.

column percents: percentages in a table that add down to 100.

concordant pair: a pair of cases in which one member of the pair is higher than the other member on both variables.

contingency table: (same as crosstabulation).

control variable: the variable that is held constant in a three-variable table (same as test variable).

Cramer's V: measure of association appropriate when one or both of the variables consists of unordered categories.
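
The glossary does not give a formula for Cramer's V, so the sketch below uses the standard textbook one: the square root of chi square divided by (number of cases times one less than the smaller of the number of rows and columns). The function name and the crosstab counts are hypothetical.

```python
import math

def cramers_v(table):
    """Cramer's V for a crosstab given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Chi square: sum over cells of (observed - expected)^2 / expected,
    # where expected = row total * column total / n.
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi_square / (n * (k - 1)))

# Hypothetical 2x2 crosstab.
v = cramers_v([[30, 20], [10, 40]])
print(round(v, 3))
```

A value of 0 means the two variables are unrelated in the table; values closer to 1 indicate a stronger relationship.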

cross-sectional study: a study that includes data gathered at one point in time.

crosstabulation: table showing the number of cases in each combination of categories of the column and row variables (same as contingency table).

 



D

data set: the collection of all data for a sample or population. It may include commands, making it a "system file" or SPSS "save file." An example is CJQLSAV, the dataset with computer description commands used for this module.

data: values, numerical or symbolic, that represent observations for a variable

deductive reasoning: a logical process of developing specific hypotheses from a general principle or "theory". For example, from the general statement "Birds of a feather flock together" we could hypothesize "hummingbirds will fly about in groups," or for criminal behavior "those juveniles associating with juvenile delinquents have a high probability of becoming juvenile delinquents."

dependent variable: a variable that is hypothesized to be caused by, or depend on, another variable (the independent variable). For example, in a hypothesized relationship between gender and income, income is the dependent variable in that it occurs after gender occurs and is thought to be determined or caused to some extent by the gender of a person.

descriptive statistics: data summary techniques describing characteristics of a variable (means, median, mode, range, standard deviation) or the relationship between variables (correlation).

dispersion: the distribution or variability of values for a variable e.g. range, standard deviation

data: information organized for analysis. (It is a plural noun. Although it is often used with a singular construction in casual speech, the singular is datum.)

discordant pair: a pair of cases in which one member of the pair is higher than the other member on one of the variables, but lower on the other variable.

 



E

ecological fallacy: making statements about one unit of analysis based on a study of a different unit of analysis e.g. studying crime rates in California counties and making statements about individuals and crime.

experiment: a research method that involves attempts to create changes using controlled, systematic, carefully observed conditions

explanation: a term used in the analysis of the relationship between three variables (the elaboration model) to refer to the situation where a control variable causes the previous relationship between the independent and dependent variables to disappear. An example would be where the subgroups divided by the control variable "race" no longer showed a difference between males and females in terms of wages.

elaboration: the process of adding a control variable into the analysis.

expected frequencies: for chi square, the number of cases that would be expected in a particular cell of the crosstab if the two variables were unrelated to each other.
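
A small sketch of how expected frequencies are usually computed for chi square: each cell's expected count is its row total times its column total, divided by the total number of cases. The crosstab counts below are hypothetical.

```python
# Hypothetical 2x2 crosstab of observed frequencies (rows x columns).
observed = [
    [30, 20],
    [10, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for a cell = (row total * column total) / n,
# i.e. the count we would expect if the two variables were unrelated.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi square = sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)
print(round(chi_square, 2))
```

The larger the gap between observed and expected frequencies, the larger chi square becomes and the less plausible the null hypothesis of no relationship.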

explanation: outcome of elaboration in which the control variable reduces the relationship between the independent and dependent variables.

 



F

field study (field research): a research method involving observation in natural settings. Field studies vary in the degree the researcher observes or participates.

frequency distribution: a list of categories of a variable and their corresponding frequencies (the number of occurrences for each category), percents and relative percents

frequency distribution: set of categories and the number of cases in each category.

 

G

Gamma: measure of association appropriate when both of the variables consist of ordered categories.
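
Gamma is built from the concordant and discordant pairs defined elsewhere in this glossary. The formula is not stated here, so the sketch below assumes the standard one, (concordant - discordant) / (concordant + discordant), with tied pairs ignored. The cases are hypothetical pairs of ordinal ranks.

```python
from itertools import combinations

# Hypothetical cases: (rank on variable X, rank on variable Y), both ordinal.
cases = [(1, 1), (1, 2), (2, 2), (3, 2), (3, 3), (2, 1)]

concordant = 0
discordant = 0
for (x1, y1), (x2, y2) in combinations(cases, 2):
    direction = (x1 - x2) * (y1 - y2)
    if direction > 0:
        concordant += 1   # same case is higher on both variables
    elif direction < 0:
        discordant += 1   # higher on one variable, lower on the other
    # pairs tied on either variable are ignored

gamma = (concordant - discordant) / (concordant + discordant)
print(concordant, discordant, gamma)
```

Gamma runs from -1 (all untied pairs discordant) to +1 (all untied pairs concordant).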

 


H

histogram: a graph of a frequency distribution in which bars extend from the exact limits of the class intervals to indicate the frequency of individual classifications.

hypothesis testing: an empirical test to see if there is support for a proposed hypothesis. An example is surveying workers on their yearly income and gender to determine if there is a relationship between gender and income.

hypothesis: a proposed relationship between two or more variables. It may be causal or simply a relationship, e.g., a proposed relationship between a person's gender and income.

 


I

independent variable: the causal variable in a hypothesized relationship. It is proposed to be independent of, prior to, and a determinant of a dependent variable, e.g., in a hypothesized relationship between gender and income, gender is the independent variable in that it is prior to and thought to be a determiner of income.

index: composite measures of variables, a sum of individual variables that represent a more general variable. For example the FBI Crime index, a summary of criminal offenses reported, and the seven majors index (same as the FBI Crime index but does not include arson or thefts under $200).


inductive reasoning: a logical process of developing generalizations "theories" based on specific observations e.g. one sees ducks flying south together for the winter and generalizes "Birds of a feather flock together" or for criminal behavior "Juvenile delinquents associate with other juvenile delinquents" [Note: this example is used to encourage you to think about possible errors such as over generalization, inaccurate observation, etc. in this type of reasoning]

inferential statistics: techniques designed to produce estimates of population characteristics, and to specify the confidence in those estimates, based on data collected from a sample.

interpretation: a term used in the analysis of the relationship between three variables, the elaboration model, to refer to the situation where a control variable is found to be the mediating factor in a bivariate relationship


interval measure: the level of measurement that is characterized as being mathematically isomorphic with arithmetic. It allows all math functions for nominal and ordinal measures and adds equal distances between units of measurement, with an arbitrary zero point. Statistically appropriate techniques include the arithmetic mean, standard deviation, r², Z, t, and F. Examples include temperature and time measurement.

interview: a research method in which the researcher (interviewer) asks questions of another person (the respondent). Interviews are person to person and are conducted either face to face or by telephone.

independent variable: the variable that affects another variable.

interpretation: outcome of elaboration in which the control variable shows how one variable is related to another.

intervening variable: a third variable that is causally prior to the dependent variable, but not to the independent variable.

 


K

kurtosis: (Interval/ratio level) measures the degree of pointedness or flatness of a distribution curve compared to the normal curve, where 0 is a normal curve, a plus value is more pointed and a negative value is flatter than a normal curve.

 


L

level of significance: used in connection with tests of significance to indicate the probability that a finding could have occurred by chance. Typically used are the .001 level (could have occurred only 1 out of 1000 samples by chance), .01 and .05 level

longitudinal study: a study that focuses on data gathered at two or more points in time

 


M

mapping: creating and displaying data on a map of a geographical or political area that uses different patterns or colors to indicate the distribution categories of a variable

mean: a term that refers to the central tendency, specifically the statistical average of a set of observations (sum of all cases/number of cases). For example, if you had the values 1, 2, 3, 4, 5 the mean would be 3. The mean is symbolized by the roman letter X with a bar over it (X̄) for a sample and by the Greek letter µ for the population.

median: a term that refers to the central tendency or typical score of a distribution; specifically, the median is the numerical value of the middle case in an ordered distribution. For example, if you had the values 6, 10, 12, 12, 13, 14, 16 the median would be 12.

mode: a term that refers to the central tendency or typical score of a distribution; specifically, the most frequently occurring classification of a variable. For example, if you had the values 6, 10, 12, 12, 13, 14, 16 the mode would be 12.
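
The three measures of central tendency defined above can be checked with a few lines of Python. This is only a sketch using the standard library's statistics module and the example values from the median and mode entries.

```python
import statistics

values = [6, 10, 12, 12, 13, 14, 16]

mean_value = statistics.mean(values)      # sum of all cases / number of cases
median_value = statistics.median(values)  # value of the middle case
mode_value = statistics.mode(values)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

Here the median and mode are both 12, while the mean is slightly lower because the low value 6 pulls it down.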

multivariate analysis: an examination of the relationship between three or more variables e.g. in this model the analysis using the elaboration model.

measure of association: measure of the strength of the relationship between two variables.

missing data: cases for which we do not know the proper category into which to place the case.

multistage cluster sample: type of probability sample in which the population is divided into clusters, and then the clusters are sampled.

multivariate analysis: the analysis of three or more variables simultaneously.

 


N

null hypothesis (H0): a statistical hypothesis stating that there is no relationship between the independent and dependent variables in the population. The research goal is to reject the null and thus lend support to (not "prove") the alternative hypothesis. An example of a null hypothesis is "there is no relationship between crime and population density."

negative relationship: a relationship in which large values of one variable tend to go with small values of the other variable.

nonprobability sample: type of sample in which not every case in the population has a known chance of selection.

nonsampling error: sources of error other than sampling error.

null hypothesis: the null hypothesis for chi square is that the two variables are unrelated to each other.

 


O

operational definition: a specific, precise statement of a concept (variable) in terms of the operations used to measure and categorize observations. For example the operational definition for gender could be the choice selected by a person answering the question "What is your gender ___(1) female ____(2) male" or in secondary analysis the operational definition of crimes could be the number of crimes reported by official agencies (police, sheriff) of a county in their annual report.

ordinal measurement: a level of measurement that is characterized by having a rank or order to the measurement of the variable attributes, mathematically this allows all functions of nominal measures plus ranking -- making greater than and less than comparisons. Statistically acceptable techniques are median, Interquartile range, Spearman's rho, Mann-Whitney U. Examples include: military grades (private, sergeant etc.), most attitude question choices [(1)strongly agree to (6) strongly disagree]

observed frequencies: for chi square, the number of cases actually observed in a particular cell of the table.

opinion: statement of one's beliefs or feelings.

 


P

population parameters: number values that represent summary characteristics of a population (in contrast to the statistics that describe sample data). The Greek letter mu (µ) signifies a population's mean. An example would be: the mean weight of a human population is 135 lbs.

participant observation: a type of field study in which the researcher both participates in and observes the setting being studied.

populations: (universe) the complete set of units (individuals, groups, events, families, cities, counties, automobiles) that a researcher wants to study, make statements about, or refer to in his/her hypotheses and conclusions.

panel mortality: the loss of cases in a panel study at later points in time

panel study: a study that compares the same cases at two or more points in time.

partial table: the table for each category of the control variable in a three-variable table.

percent distribution: the set of categories and the percent in each.

positive relationship: a relationship in which large values of one variable tend to go with large values of the other variable and small values of one variable tend to go with small values of the other variable.

population: all the individuals or cases of interest.

probability sample: type of sample in which every individual in the population has a known, nonzero, chance of being selected in the sample.

 


Q

quota sample: type of nonprobability sample which assigns interviewers quotas on preselected variables.

 


R

range: (requires ordinal measurement) a measure of dispersion; the lowest observed value (minimum) to the highest observed value (maximum), or a number representing the difference between the highest and lowest values for a set of observations. For example, the weight range for a sample of adults could be 90 lbs. (minimum) to 300 lbs. (maximum), or, expressed as the difference between 300 and 90, a range of 210.

rate: the number of occurrences of an event for a standard base, e.g. electrical rate (cost of .06898 for each kilowatt per day), water rate (cost per 1000 gallons of water used), crime rate (number of crimes per 100,000 population)
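
As a quick illustration of a rate, the sketch below converts a raw count of crimes into a crime rate per 100,000 population. The function name and the county figures are hypothetical.

```python
def rate_per_base(occurrences, population, base=100_000):
    """Number of occurrences per `base` units of population."""
    return occurrences * base / population

# Hypothetical county: 450 reported crimes, 150,000 residents.
crime_rate = rate_per_base(450, 150_000)
print(crime_rate)
```

Expressing counts on a common base lets a small county and a large county be compared directly.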

ratio measure: a level of measurement that is isomorphic with arithmetic (it meets all assumptions of arithmetic) and has a natural zero point. Appropriate statistical functions include all functions of nominal, ordinal, and interval measures as well as techniques that depend on a natural zero point. An example of a statistical technique for this level of measurement is the geometric mean. Examples of ratio measures include weight, population density, age, and income.

recode: classification of a variable's attributes into a smaller number of discrete categories for presentation or in order to test and observe the results of the new classification. An example is the classification of educational achievement into grades 1-12 as category (1) and more than 12 years as category (2). This would allow one to more easily examine data and see if a high school degree made a significant difference. Other examples of recoding include classification of crime rates into (1) High, (2) Medium, and (3) Low.
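
The education example above can be sketched as a simple recoding function. The function name and the sample values are hypothetical; the category codes follow the entry (1 for 12 years or fewer, 2 for more than 12 years).

```python
def recode_education(years):
    """Collapse years of schooling into two categories."""
    return 1 if years <= 12 else 2

# Hypothetical years of schooling for five respondents.
years_of_school = [8, 12, 13, 16, 11]
recoded = [recode_education(y) for y in years_of_school]
print(recoded)
```

The original values are kept in a separate list, so the recode can be changed later without losing information.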

replication (elaboration model): a term used with the elaboration model to refer to the situation where a control variable results in the same distributions for the partial tables that occurred in the original table of dependent by independent variable. An example would be if a sample were divided into ethnic groups and a relationship were still found between gender and income.

representativeness: the attempt to ensure that a sample has the same characteristics as the population from which it is selected. Statistically this is sought by random sampling techniques

random-digit dialing: method of selecting telephone numbers for a sample in which numbers are dialed randomly within working exchanges (i.e., the first three digits of a telephone number).

reactivity: the ways in which the process of asking people questions affects their answers to other questions.

recoding: the process of combining categories within a variable.

replication: outcome of elaboration in which the control variable has no effect on the original relationship between the two variables.

research hypothesis: the hypothesis for chi square that states that two variables are related to each other (same as alternative hypothesis).

row percents: percentages in a table that add across to 100 percent.

rule for computing percents: if the independent variable is the column variable, the percents should sum down to 100; if the independent variable is the row variable, the percents should sum across to 100.

rule for interpreting percents: compare in the direction opposite to the way the percents sum to 100.
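
The two percentaging rules above can be sketched as follows. The crosstab is hypothetical: the independent variable (gender) is the column variable, so percents are computed down each column and then compared across the row.

```python
# Hypothetical crosstab: rows are categories of the dependent variable,
# columns are categories of the independent variable.
counts = {
    "high income": {"female": 20, "male": 30},
    "low income": {"female": 30, "male": 20},
}
columns = ["female", "male"]

# Column totals: number of cases in each category of the independent variable.
column_totals = {c: sum(row[c] for row in counts.values()) for c in columns}

# Column percents: each cell as a percent of its column total (sums down to 100).
column_percents = {
    label: {c: 100 * row[c] / column_totals[c] for c in columns}
    for label, row in counts.items()
}

# Interpret by comparing across, opposite to the direction the percents sum.
difference = column_percents["high income"]["male"] - column_percents["high income"]["female"]
print(column_percents)
print(difference)
```

In this made-up table, 60 percent of males and 40 percent of females fall in the high income row, a 20 percentage point difference.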

 


S

sample: a selection of units from a population (universe). Of particular importance is how the sample was selected (see simple random sample [SRS])

sampling error: the error in estimates of a population due to the variation possible in samples

scale: composite measures of variables that form a logical or empirical pattern. Scores thus indicate a pattern, not just a sum of component variables, e.g., the Bogardus social distance scale.

secondary analysis: a form of research where data collected by others, researchers, government agencies or organizations for their own purposes is used for a different research purpose e.g. the officially collected census and criminal justice data used in this module for research purposes


secondary data analysis: a research method where data collected by one researcher is reanalyzed by another researcher for either different or the same purpose

significance (statistical): refers to the probability that an occurrence, a relationship or a distribution did not occur by chance, was not due to sampling error (obtaining an unusual, non typical sample). A test of significance determines the probability a relationship or distribution is real and not due to chance. See significance (substantive) for popular usage.

significance (statistical): the likelihood that a table distribution, a relationship, etc. could have occurred by chance.

significance (substantive): refers to the socially defined importance of a finding e.g.

simple random sample (SRS): a sample of a population where all units in the population have an equal chance of being chosen. This is a basic assumption for data analysis using statistics. An example would be if the name of each person in a school were placed on an index card, and the cards were mixed in a bowl and drawn one at a time to determine the sample.

skewness(Interval/ratio level): measures the degree to which a variable approximates a normal curve. Mathematically it measures the deviation from symmetry. A 0 is completely symmetrical while a plus number is skewed to the right and a minus number is skewed to the left.

specification: an outcome of using the elaboration model in which the control variable identifies the conditions under which the relationship between the independent and dependent variables occurs. e.g.

standard deviation: (Interval/ratio level of measurement) a measure of dispersion of the data around the mean. Mathematically the standard deviation is the square root of the variance; it is therefore expressed in the same units as the variable and is intrinsically more understandable. For example, the standard deviation for the age of an adult sample could be 25 years. If ages are approximately normally distributed, this means that about 68% of the population is between the mean -25 and the mean +25. If the mean is 31, then about 68% of the population is between 6 and 56 years of age.

standard error: the standard deviation of all possible sample means that could be drawn from a population. The standard error estimates for us how much disagreement there might be between our sample mean and the population mean.
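
In practice we have only one sample, so the standard error of the mean is usually estimated as the sample standard deviation divided by the square root of the sample size. The sketch below assumes that estimate; the weights are hypothetical.

```python
import math
import statistics

# Hypothetical sample of weights in lbs.
sample = [120, 130, 135, 140, 150]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Estimated standard error of the mean.
standard_error = sample_sd / math.sqrt(n)
print(sample_mean, round(standard_error, 2))
```

Larger samples give smaller standard errors, because n appears in the denominator.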

statistic: number value that represents a summary characteristic of a sample drawn from a population. It is a point estimate of the parameter value for the population. Statistics are usually designated by roman letters (e.g., the mean weight of a sample X̄ = 135 lbs.).

survey: a research method using standardized questionnaires or interviews to gather data.

sample: subset of the population used for study.

sampling: the process of selecting a subset of the population for study.

sampling error: random or unsystematic error resulting from selecting a sample from a population.

simple random sample: sample in which every case or combination of cases has the same chance of being selected in the sample.

specification: outcome of elaboration in which the control variable specifies the conditions under which the relationship between the independent and dependent variables varies.

spurious: a relationship that disappears completely or decreases substantially when an antecedent control variable is introduced into the analysis.

statistical inference: the process of making generalizations about the population from sample data.

statistical significance: the observed difference is large enough to conclude that it is probably not due to chance factors.

substantive significance: the observed difference is large enough to be considered important.

T

trend study: a study that uses cross-sections at two or more points in time to examine change over time.

 



U

unit of analysis: (element) what or whom you want to study, e.g., individuals, groups, social actions, etc.

univariate analysis: the analysis of one variable at a time, e.g., frequency or percent distributions, averages.

 



V

values: describe the degree or type of a variable property that is possessed e.g. for the variable sex the values would be male or female and for the variable weight a number describing the weight e.g. 150 lbs.

variable: a condition that changes, a property of interest, the basis on which values or attributes are assigned to members of the population e.g., sex, height, weight, income.

variance: (Interval/ratio level), a measure of dispersion of the data around the mean, mathematically the variance is the average squared deviation from the mean. For example the variance for the age of an adult sample could be 625.
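
The sketch below works through the variance and standard deviation definitions step by step. The ages are hypothetical, chosen so the results match the figures used in this glossary (mean 31, variance 625, standard deviation 25). It uses the population formula, dividing by the number of cases; many statistical packages divide by n - 1 for sample data.

```python
import math

# Hypothetical ages chosen so the numbers come out round.
ages = [6, 56, 6, 56]

mean_age = sum(ages) / len(ages)

# Variance: the average squared deviation from the mean.
variance = sum((a - mean_age) ** 2 for a in ages) / len(ages)

# Standard deviation: the square root of the variance, in the original units.
std_dev = math.sqrt(variance)

print(mean_age, variance, std_dev)
```

Because the variance is in squared units (years squared here), the standard deviation is usually the easier of the two to interpret.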

variable: trait or characteristic that may change from case to case.

volunteer sample: type of nonprobability sample in which individuals volunteer to be part of the sample.

 
