
National Research Council; Division of Behavioral and Social Sciences and Education; Commission on Behavioral and Social Sciences and Education; Committee on Basic Research in the Behavioral and Social Sciences; Gerstein DR, Luce RD, Smelser NJ, et al., editors. The Behavioral and Social Sciences: Achievements and Opportunities. Washington (DC): National Academies Press (US); 1988.


5 Methods of Data Collection, Representation, and Analysis

This chapter concerns research on collecting, representing, and analyzing the data that underlie behavioral and social sciences knowledge. Such research, methodological in character, includes ethnographic and historical approaches, scaling, axiomatic measurement, and statistics, with its important relatives, econometrics and psychometrics. The field can be described as including the self-conscious study of how scientists draw inferences and reach conclusions from observations. Since statistics is the largest and most prominent of methodological approaches and is used by researchers in virtually every discipline, statistical work draws the lion’s share of this chapter’s attention.

Problems of interpreting data arise whenever inherent variation or measurement fluctuations make it difficult to interpret the data or to judge whether observed relationships are significant, durable, or general. Some examples: Is a sharp monthly (or yearly) increase in the rate of juvenile delinquency (or unemployment) in a particular area a matter for alarm, an ordinary periodic or random fluctuation, or the result of a change or quirk in reporting method? Do the temporal patterns seen in such repeated observations reflect a direct causal mechanism, a complex of indirect ones, or just imperfections in the data? Is a decrease in auto injuries an effect of a new seat-belt law? Are the disagreements among people describing some aspect of a subculture too great to draw valid inferences about that aspect of the culture?

Such issues of inference are often closely connected to substantive theory and specific data, and to some extent it is difficult and perhaps misleading to treat methods of data collection, representation, and analysis separately. This report does so, as do all sciences to some extent, because the methods developed often are far more general than the specific problems that originally gave rise to them. There is much transfer of new ideas from one substantive field to another—and to and from fields outside the behavioral and social sciences. Some of the classical methods of statistics arose in studies of astronomical observations, biological variability, and human diversity. The major growth of the classical methods occurred in the twentieth century, greatly stimulated by problems in agriculture and genetics. Some methods for uncovering geometric structures in data, such as multidimensional scaling and factor analysis, originated in research on psychological problems, but have been applied in many other sciences. Some time-series methods were developed originally to deal with economic data, but they are equally applicable to many other kinds of data.

Methodological advances have been applied across a wide range of substantive and policy areas, for example:

  • In economics: large-scale models of the U.S. economy; effects of taxation, money supply, and other government fiscal and monetary policies; theories of duopoly, oligopoly, and rational expectations; economic effects of slavery.
  • In psychology: test calibration; the formation of subjective probabilities, their revision in the light of new information, and their use in decision making; psychiatric epidemiology and mental health program evaluation.
  • In sociology and other fields: victimization and crime rates; effects of incarceration and sentencing policies; deployment of police and fire-fighting forces; discrimination, antitrust, and regulatory court cases; social networks; population growth and forecasting; and voting behavior.

Even such an abridged listing makes clear that improvements in methodology are valuable across the spectrum of empirical research in the behavioral and social sciences as well as in application to policy questions. Methodological research clearly serves many different purposes, and different approaches are needed to serve them, including exploratory data analysis, scientific inference about hypotheses and population parameters, individual decision making, forecasting what will happen in the event or absence of intervention, and assessing causality from both randomized experiments and observational data.

This discussion of methodological research is divided into three areas: design, representation, and analysis. The efficient design of investigations must take place before data are collected because it involves how much, what kind of, and how data are to be collected. What type of study is feasible: experimental, sample survey, field observation, or other? What variables should be measured, controlled, and randomized? How extensive a subject pool or observational period is appropriate? How can study resources be allocated most effectively among various sites, instruments, and subsamples?

The construction of useful representations of the data involves deciding what kind of formal structure best expresses the underlying qualitative and quantitative concepts that are being used in a given study. For example, cost of living is a simple concept to quantify if it applies to a single individual with unchanging tastes in stable markets (that is, markets offering the same array of goods from year to year at varying prices), but as a national aggregate for millions of households and constantly changing consumer product markets, the cost of living is not easy to specify clearly or measure reliably. Statisticians, economists, sociologists, and other experts have long struggled to make the cost of living a precise yet practicable concept that is also efficient to measure, and they must continually modify it to reflect changing circumstances.

Data analysis covers the final step of characterizing and interpreting research findings: Can estimates of the relations between variables be made? Can some conclusion be drawn about correlation, cause and effect, or trends over time? How uncertain are the estimates and conclusions and can that uncertainty be reduced by analyzing the data in a different way? Can computers be used to display complex results graphically for quicker or better understanding or to suggest different ways of proceeding?

Advances in analysis, data representation, and research design feed into and reinforce one another in the course of actual scientific work. The intersections between methodological improvements and empirical advances are an important aspect of the multidisciplinary thrust of progress in the behavioral and social sciences.

Designs for Data Collection

Four broad kinds of research designs are used in the behavioral and social sciences: experimental, survey, comparative, and ethnographic.

Experimental designs, in either laboratory or field settings, systematically manipulate a few variables while others that may affect the outcome are held constant, randomized, or otherwise controlled. The purpose of randomized experiments is to ensure that only one or a few variables can systematically affect the results, so that causes can be attributed. Survey designs include the collection and analysis of data from censuses, sample surveys, and longitudinal studies and the examination of various relationships among the observed phenomena. Randomization plays a different role here than in experimental designs: it is used to select members of a sample so that the sample is as representative of the whole population as possible. Comparative designs involve the retrieval of evidence that is recorded in the flow of current or past events in different times or places and the interpretation and analysis of this evidence. Ethnographic designs, also known as participant-observation designs, involve a researcher in intensive and direct contact with a group, community, or population being studied, through participation, observation, and extended interviewing.
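
To make the two roles of randomization concrete, the short Python sketch below (a toy illustration, not part of the report) puts them side by side: random sampling chooses whom to observe so the sample mirrors the population, while random assignment decides which condition each sampled person receives so that treatment status is not confounded with other characteristics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of 10,000 people, identified by index.
population = np.arange(10_000)

# Survey-style randomization: draw a simple random sample so that the
# sample is as representative of the whole population as possible.
sample = rng.choice(population, size=500, replace=False)

# Experiment-style randomization: assign each sampled person at random to
# a treatment or a control condition so that only the manipulated variable
# differs systematically between the two groups.
assignment = rng.permutation(np.repeat(["treatment", "control"], 250))

for person, condition in zip(sample[:5], assignment[:5]):
    print(f"person {person}: {condition}")
```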

Experimental Designs

Laboratory experiments.

Laboratory experiments underlie most of the work reported in Chapter 1, significant parts of Chapter 2, and some of the newest lines of research in Chapter 3. Laboratory experiments extend and adapt classical methods of design first developed, for the most part, in the physical and life sciences and agricultural research. Their main feature is the systematic and independent manipulation of a few variables and the strict control or randomization of all other variables that might affect the phenomenon under study. For example, some studies of animal motivation involve the systematic manipulation of amounts of food and feeding schedules while other factors that may also affect motivation, such as body weight, deprivation, and so on, are held constant. New designs are currently coming into play largely because of new analytic and computational methods (discussed below, in “Advances in Statistical Inference and Analysis”).

Two examples of empirically important issues that demonstrate the need for broadening classical experimental approaches are open-ended responses and lack of independence of successive experimental trials. The first concerns the design of research protocols that do not require the strict segregation of the events of an experiment into well-defined trials, but permit a subject to respond at will. These methods are needed when what is of interest is how the respondent chooses to allocate behavior in real time and across continuously available alternatives. Such empirical methods have long been used, but they can generate very subtle and difficult problems in experimental design and subsequent analysis. As theories of allocative behavior of all sorts become more sophisticated and precise, the experimental requirements become more demanding, so the need to better understand and solve this range of design issues is an outstanding challenge to methodological ingenuity.

The second issue arises in repeated-trial designs when the behavior on successive trials, even if it does not exhibit a secular trend (such as a learning curve), is markedly influenced by what has happened in the preceding trial or trials. The more naturalistic the experiment and the more sensitive the measurements taken, the more likely it is that such effects will occur. But such sequential dependencies in observations cause a number of important conceptual and technical problems in summarizing the data and in testing analytical models, which are not yet completely understood. In the absence of clear solutions, such effects are sometimes ignored by investigators, simplifying the data analysis but leaving residues of skepticism about the reliability and significance of the experimental results. With continuing development of sensitive measures in repeated-trial designs, there is a growing need for more advanced concepts and methods for dealing with experimental results that may be influenced by sequential dependencies.

Randomized Field Experiments

The state of the art in randomized field experiments, in which different policies or procedures are tested in controlled trials under real conditions, has advanced dramatically over the past two decades. Problems that were once considered major methodological obstacles—such as implementing randomized field assignment to treatment and control groups and protecting the randomization procedure from corruption—have been largely overcome. While state-of-the-art standards are not achieved in every field experiment, the commitment to reaching them is rising steadily, not only among researchers but also among customer agencies and sponsors.

The health insurance experiment described in Chapter 2 is an example of a major randomized field experiment that has had and will continue to have important policy reverberations in the design of health care financing. Field experiments with the negative income tax (guaranteed minimum income) conducted in the 1970s were significant in policy debates, even before their completion, and provided the most solid evidence available on how tax-based income support programs and marginal tax rates can affect the work incentives and family structures of the poor. Important field experiments have also been carried out on alternative strategies for the prevention of delinquency and other criminal behavior, reform of court procedures, rehabilitative programs in mental health, family planning, and special educational programs, among other areas.

In planning field experiments, much hinges on the definition and design of the experimental cells, the particular combinations needed of treatment and control conditions for each set of demographic or other client sample characteristics, including specification of the minimum number of cases needed in each cell to test for the presence of effects. Considerations of statistical power, client availability, and the theoretical structure of the inquiry enter into such specifications. Current important methodological thresholds are to find better ways of predicting recruitment and attrition patterns in the sample, of designing experiments that will be statistically robust in the face of problematic sample recruitment or excessive attrition, and of ensuring appropriate acquisition and analysis of data on the attrition component of the sample.
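
The cell-size reasoning described above can be sketched with a standard power calculation. The snippet below is a minimal illustration using the statsmodels package; the effect size, significance level, and power target are assumed values chosen for the example, not figures from the report.

```python
# Minimal sketch of a per-cell sample-size calculation for comparing a
# treatment cell with a control cell; requires the statsmodels package.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Assumed design targets (hypothetical): detect a standardized mean
# difference of 0.3 with 80 percent power at the 5 percent level.
n_per_cell = analysis.solve_power(effect_size=0.3, alpha=0.05, power=0.8,
                                  alternative="two-sided")
print(f"cases needed in each cell: {n_per_cell:.1f}")  # on the order of 175 per cell
```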

Also of major significance are improvements in integrating detailed process and outcome measurements in field experiments. To conduct research on program effects under field conditions requires continual monitoring to determine exactly what is being done—the process—and how it corresponds to what was projected at the outset. Relatively unintrusive, inexpensive, and effective implementation measures are of great interest. There is, in parallel, a growing emphasis on designing experiments to evaluate distinct program components in contrast to summary measures of net program effects.

Finally, there is an important opportunity now for further theoretical work to model organizational processes in social settings and to design and select outcome variables that, in the relatively short time of most field experiments, can predict longer-term effects: For example, in job-training programs, what are the effects on the community (role models, morale, referral networks) or on individual skills, motives, or knowledge levels that are likely to translate into sustained changes in career paths and income levels?

Survey Designs

Many people have opinions about how societal mores, economic conditions, and social programs shape lives and encourage or discourage various kinds of behavior. People generalize from their own cases, and from the groups to which they belong, about such matters as how much it costs to raise a child, the extent to which unemployment contributes to divorce, and so on. In fact, however, effects vary so much from one group to another that homespun generalizations are of little use. Fortunately, behavioral and social scientists have been able to bridge the gaps between personal perspectives and collective realities by means of survey research. In particular, governmental information systems include volumes of extremely valuable survey data, and the facility of modern computers to store, disseminate, and analyze such data has significantly improved empirical tests and led to new understandings of social processes.

Within this category of research designs, two major types are distinguished: repeated cross-sectional surveys and longitudinal panel surveys. In addition, and cross-cutting these types, there is a major effort under way to improve and refine the quality of survey data by investigating features of human memory and of question formation that affect survey response.

Repeated cross-sectional designs can either attempt to measure an entire population—as does the oldest U.S. example, the national decennial census—or they can rest on samples drawn from a population. The general principle is to take independent samples at two or more times, measuring the variables of interest, such as income levels, housing plans, or opinions about public affairs, in the same way. The General Social Survey, collected by the National Opinion Research Center with National Science Foundation support, is a repeated cross-sectional data base that was begun in 1972. One methodological question of particular salience in such data is how to adjust for nonresponses and “don’t know” responses. Another is how to deal with self-selection bias. For example, to compare the earnings of women and men in the labor force, it would be mistaken to first assume that the two samples of labor-force participants are randomly selected from the larger populations of men and women; instead, one has to consider and incorporate in the analysis the factors that determine who is in the labor force.
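
A small simulation can make the self-selection point concrete. The sketch below uses an entirely hypothetical data-generating process in which labor-force participation depends on the wage a person is offered; the naive comparison of observed earners then misstates the population difference in offered wages.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical potential (offered) log wages for women and men,
# with a true population gap of 0.2.
wage_w = rng.normal(2.8, 0.5, n)
wage_m = rng.normal(3.0, 0.5, n)

# Self-selection: people are more likely to be in the labor force
# when their offered wage exceeds a personal reservation wage.
reservation_w = rng.normal(2.8, 0.5, n)
reservation_m = rng.normal(2.6, 0.5, n)
in_lf_w = wage_w > reservation_w
in_lf_m = wage_m > reservation_m

true_gap = wage_m.mean() - wage_w.mean()
naive_gap = wage_m[in_lf_m].mean() - wage_w[in_lf_w].mean()

print(f"true population gap in offered wages: {true_gap:.3f}")
print(f"naive gap among labor-force participants: {naive_gap:.3f}")
# The two figures differ because the samples of participants are not
# randomly selected from the populations of men and women.
```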

In longitudinal panels, a sample is drawn at one point in time and the relevant variables are measured at this and subsequent times for the same people. In more complex versions, some fraction of each panel may be replaced or added to periodically, such as expanding the sample to include households formed by the children of the original sample. An example of panel data developed in this way is the Panel Study of Income Dynamics (PSID), conducted by the University of Michigan since 1968 (discussed in Chapter 3).

Comparing the fertility or income of different people in different circumstances at the same time to find correlations always leaves a large proportion of the variability unexplained, but common sense suggests that much of the unexplained variability is actually explicable. There are systematic reasons for individual outcomes in each person’s past achievements, in parental models, upbringing, and earlier sequences of experiences. Unfortunately, asking people about the past is not particularly helpful: people remake their views of the past to rationalize the present and so retrospective data are often of uncertain validity. In contrast, generation-long longitudinal data allow readings on the sequence of past circumstances uncolored by later outcomes. Such data are uniquely useful for studying the causes and consequences of naturally occurring decisions and transitions. Thus, as longitudinal studies continue, quantitative analysis is becoming feasible about such questions as: How are the decisions of individuals affected by parental experience? Which aspects of early decisions constrain later opportunities? And how does detailed background experience leave its imprint? Studies like the two-decade-long PSID are bringing within grasp a complete generational cycle of detailed data on fertility, work life, household structure, and income.

Advances in Longitudinal Designs

Large-scale longitudinal data collection projects are uniquely valuable as vehicles for testing and improving survey research methodology. In ways that lie beyond the scope of a cross-sectional survey, longitudinal studies can sometimes be designed—without significant detriment to their substantive interests—to facilitate the evaluation and upgrading of data quality; the analysis of relative costs and effectiveness of alternative techniques of inquiry; and the standardization or coordination of solutions to problems of method, concept, and measurement across different research domains.

Some areas of methodological improvement include discoveries about the impact of interview mode on response (mail, telephone, face-to-face); the effects of nonresponse on the representativeness of a sample (due to respondents’ refusal or interviewers’ failure to contact); the effects on behavior of continued participation over time in a sample survey; the value of alternative methods of adjusting for nonresponse and incomplete observations (such as imputation of missing data, variable case weighting); the impact on response of specifying different recall periods, varying the intervals between interviews, or changing the length of interviews; and the comparison and calibration of results obtained by longitudinal surveys, randomized field experiments, laboratory studies, onetime surveys, and administrative records.

It should be especially noted that incorporating improvements in methodology and data quality has been and will no doubt continue to be crucial to the growing success of longitudinal studies. Panel designs are intrinsically more vulnerable than other designs to statistical biases due to cumulative item non-response, sample attrition, time-in-sample effects, and error margins in repeated measures, all of which may produce exaggerated estimates of change. Over time, a panel that was initially representative may become much less representative of a population, not only because of attrition in the sample, but also because of changes in immigration patterns, age structure, and the like. Longitudinal studies are also subject to changes in scientific and societal contexts that may create uncontrolled drifts over time in the meaning of nominally stable questions or concepts as well as in the underlying behavior. Also, a natural tendency to expand over time the range of topics and thus the interview lengths, which increases the burdens on respondents, may lead to deterioration of data quality or relevance. Careful methodological research to understand and overcome these problems has been done, and continued work as a component of new longitudinal studies is certain to advance the overall state of the art.

Longitudinal studies are sometimes pressed for evidence they are not designed to produce: for example, in important public policy questions concerning the impact of government programs in such areas as health promotion, disease prevention, or criminal justice. By using research designs that combine field experiments (with randomized assignment to program and control conditions) and longitudinal surveys, one can capitalize on the strongest merits of each: the experimental component provides stronger evidence for causal statements that are critical for evaluating programs and for illuminating some fundamental theories; the longitudinal component helps in the estimation of long-term program effects and their attenuation. Coupling experiments to ongoing longitudinal studies is not often feasible, given the multiple constraints of not disrupting the survey, developing all the complicated arrangements that go into a large-scale field experiment, and having the populations of interest overlap in useful ways. Yet opportunities to join field experiments to surveys are of great importance. Coupled studies can produce vital knowledge about the empirical conditions under which the results of longitudinal surveys turn out to be similar to—or divergent from—those produced by randomized field experiments. A pattern of divergence and similarity has begun to emerge in coupled studies; additional cases are needed to understand why some naturally occurring social processes and longitudinal design features seem to approximate formal random allocation and others do not. The methodological implications of such new knowledge go well beyond program evaluation and survey research. These findings bear directly on the confidence scientists—and others—can have in conclusions from observational studies of complex behavioral and social processes, particularly ones that cannot be controlled or simulated within the confines of a laboratory environment.

Memory and the Framing of Questions

A very important opportunity to improve survey methods lies in the reduction of nonsampling error due to questionnaire context, phrasing of questions, and, generally, the semantic and social-psychological aspects of surveys. Survey data are particularly affected by the fallibility of human memory and the sensitivity of respondents to the framework in which a question is asked. This sensitivity is especially strong for certain types of attitudinal and opinion questions. Efforts are now being made to bring survey specialists into closer contact with researchers working on memory function, knowledge representation, and language in order to uncover and reduce this kind of error.

Memory for events is often inaccurate, biased toward what respondents believe to be true—or should be true—about the world. In many cases in which data are based on recollection, improvements can be achieved by shifting to techniques of structured interviewing and calibrated forms of memory elicitation, such as specifying recent, brief time periods (for example, in the last seven days) within which respondents recall certain types of events with acceptable accuracy.

For example, in one survey experiment respondents were asked the following two questions in sequence:

  • “Taking things altogether, how would you describe your marriage? Would you say that your marriage is very happy, pretty happy, or not too happy?”
  • “Taken altogether how would you say things are these days—would you say you are very happy, pretty happy, or not too happy?”

Presenting this sequence in both directions on different forms showed that the order affected answers to the general happiness question but did not change the marital happiness question: responses to the specific issue swayed subsequent responses to the general one, but not vice versa. The explanations for and implications of such order effects on the many kinds of questions and sequences that can be used are not simple matters. Further experimentation on the design of survey instruments promises not only to improve the accuracy and reliability of survey research, but also to advance understanding of how people think about and evaluate their behavior from day to day.

Comparative Designs

Both experiments and surveys involve interventions or questions by the scientist, who then records and analyzes the responses. In contrast, many bodies of social and behavioral data of considerable value are originally derived from records or collections that have accumulated for various nonscientific reasons, quite often administrative in nature, in firms, churches, military organizations, and governments at all levels. Data of this kind can sometimes be subjected to careful scrutiny, summary, and inquiry by historians and social scientists, and statistical methods have increasingly been used to develop and evaluate inferences drawn from such data. Some of the main comparative approaches are cross-national aggregate comparisons, selective comparison of a limited number of cases, and historical case studies.

Among the more striking problems facing the scientist using such data are the vast differences in what has been recorded by different agencies whose behavior is being compared (this is especially true for parallel agencies in different nations), the highly unrepresentative or idiosyncratic sampling that can occur in the collection of such data, and the selective preservation and destruction of records. Means to overcome these problems form a substantial methodological research agenda in comparative research. An example of the method of cross-national aggregate comparisons is found in investigations by political scientists and sociologists of the factors that underlie differences in the vitality of institutions of political democracy in different societies. Some investigators have stressed the existence of a large middle class, others the level of education of a population, and still others the development of systems of mass communication. In cross-national aggregate comparisons, a large number of nations are arrayed according to some measures of political democracy and then attempts are made to ascertain the strength of correlations between these and the other variables. In this line of analysis it is possible to use a variety of statistical cluster and regression techniques to isolate and assess the possible impact of certain variables on the institutions under study. While this kind of research is cross-sectional in character, statements about historical processes are often invoked to explain the correlations.
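
A minimal sketch of such an aggregate cross-national regression is shown below. The country-level variables and the democracy index are simulated placeholders, not real cross-national data; the point is only the form of the analysis, an ordinary least-squares regression of a democracy measure on aggregate characteristics.

```python
# Sketch of a cross-national aggregate comparison: regress a democracy
# score on aggregate country characteristics. Data are simulated
# placeholders; requires numpy, pandas, and statsmodels.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_countries = 60

data = pd.DataFrame({
    "middle_class_share": rng.uniform(0.1, 0.6, n_countries),
    "mean_years_schooling": rng.uniform(2, 14, n_countries),
    "media_access": rng.uniform(0.0, 1.0, n_countries),
})
# Hypothetical outcome: a 0-10 democracy index loosely related to the
# three explanatory variables, plus noise.
data["democracy_index"] = (
    2.0
    + 5.0 * data["middle_class_share"]
    + 0.3 * data["mean_years_schooling"]
    + 2.0 * data["media_access"]
    + rng.normal(0, 1.0, n_countries)
)

X = sm.add_constant(data[["middle_class_share",
                          "mean_years_schooling",
                          "media_access"]])
model = sm.OLS(data["democracy_index"], X).fit()
print(model.summary())
```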

More limited selective comparisons, applied by many of the classic theorists, involve asking similar kinds of questions but over a smaller range of societies. Why did democracy develop in such different ways in America, France, and England? Why did northeastern Europe develop rational bourgeois capitalism, in contrast to the Mediterranean and Asian nations? Modern scholars have turned their attention to explaining, for example, differences among types of fascism between the two World Wars, and similarities and differences among modern state welfare systems, using these comparisons to unravel the salient causes. The questions asked in these instances are inevitably historical ones.

Historical case studies involve only one nation or region, and so they may not be geographically comparative. However, insofar as they involve tracing the transformation of a society’s major institutions and the role of its main shaping events, they involve a comparison of different periods of a nation’s or a region’s history. The goal of such comparisons is to give a systematic account of the relevant differences. Sometimes, particularly with respect to the ancient societies, the historical record is very sparse, and the methods of history and archaeology mesh in the reconstruction of complex social arrangements and patterns of change on the basis of few fragments.

Like all research designs, comparative ones have distinctive vulnerabilities and advantages: One of the main advantages of using comparative designs is that they greatly expand the range of data, as well as the amount of variation in those data, for study. Consequently, they allow for more encompassing explanations and theories that can relate highly divergent outcomes to one another in the same framework. They also contribute to reducing any cultural biases or tendencies toward parochialism among scientists studying common human phenomena.

One main vulnerability in such designs arises from the problem of achieving comparability. Because comparative study involves studying societies and other units that are dissimilar from one another, the phenomena under study usually occur in very different contexts—so different that in some cases what is called an event in one society cannot really be regarded as the same type of event in another. For example, a vote in a Western democracy is different from a vote in an Eastern bloc country, and a voluntary vote in the United States means something different from a compulsory vote in Australia. These circumstances make for interpretive difficulties in comparing aggregate rates of voter turnout in different countries.

The problem of achieving comparability appears in historical analysis as well. For example, changes in laws and enforcement and recording procedures over time change the definition of what is and what is not a crime, and for that reason it is difficult to compare the crime rates over time. Comparative researchers struggle with this problem continually, working to fashion equivalent measures; some have suggested the use of different measures (voting, letters to the editor, street demonstration) in different societies for common variables (political participation), to try to take contextual factors into account and to achieve truer comparability.

A second vulnerability is controlling variation. Traditional experiments make conscious and elaborate efforts to control the variation of some factors and thereby assess the causal significance of others. In surveys as well as experiments, statistical methods are used to control sources of variation and assess suspected causal significance. In comparative and historical designs, this kind of control is often difficult to attain because the sources of variation are many and the number of cases few. Scientists have made efforts to approximate such control in these cases of “many variables, small N.” One is the method of paired comparisons. If an investigator isolates 15 American cities in which racial violence has been recurrent in the past 30 years, for example, it is helpful to match them with 15 cities of similar population size, geographical region, and size of minorities—such characteristics are controls—and then search for systematic differences between the two sets of cities. Another method is to select, for comparative purposes, a sample of societies that resemble one another in certain critical ways, such as size, common language, and common level of development, thus attempting to hold these factors roughly constant, and then seeking explanations among other factors in which the sampled societies differ from one another.
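
The paired-comparison strategy can be sketched as a simple nearest-neighbor matching exercise. In the toy example below, each of 15 hypothetical "case" cities is matched to the most similar comparison city on standardized control characteristics (population, minority share, region); the data and variable names are invented for illustration.

```python
# Minimal sketch of the paired-comparison idea: match each city where an
# outcome of interest has occurred to the most similar city (on a few
# control characteristics) where it has not. Data are simulated
# placeholders; requires numpy and scipy.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)

# Columns: population (thousands), minority share, region code.
cases = rng.normal([500, 0.30, 2], [200, 0.10, 1], size=(15, 3))
candidates = rng.normal([480, 0.25, 2], [220, 0.12, 1], size=(60, 3))

# Standardize each characteristic so no single one dominates the match.
mean = candidates.mean(axis=0)
std = candidates.std(axis=0)
z_cases = (cases - mean) / std
z_candidates = (candidates - mean) / std

# Pair each case city with its nearest comparison city (matching with
# replacement, for simplicity).
distances = cdist(z_cases, z_candidates)
matches = distances.argmin(axis=1)

for i, j in enumerate(matches):
    print(f"case city {i} matched to comparison city {j}")
```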

Ethnographic Designs

Traditionally identified with anthropology, ethnographic research designs are playing increasingly significant roles in most of the behavioral and social sciences. The core of this methodology is participant-observation, in which a researcher spends an extended period of time with the group under study, ideally mastering the local language, dialect, or special vocabulary, and participating in as many activities of the group as possible. This kind of participant-observation is normally coupled with extensive open-ended interviewing, in which people are asked to explain in depth the rules, norms, practices, and beliefs through which (from their point of view) they conduct their lives. A principal aim of ethnographic study is to discover the premises on which those rules, norms, practices, and beliefs are built.

The use of ethnographic designs by anthropologists has contributed significantly to the building of knowledge about social and cultural variation. And while these designs continue to center on certain long-standing features—extensive face-to-face experience in the community, linguistic competence, participation, and open-ended interviewing—there are newer trends in ethnographic work. One major trend concerns its scale. Ethnographic methods were originally developed largely for studying small-scale groupings known variously as village, folk, primitive, preliterate, or simple societies. Over the decades, these methods have increasingly been applied to the study of small groups and networks within modern (urban, industrial, complex) society, including the contemporary United States. The typical subjects of ethnographic study in modern society are small groups or relatively small social networks, such as outpatient clinics, medical schools, religious cults and churches, ethnically distinctive urban neighborhoods, corporate offices and factories, and government bureaus and legislatures.

As anthropologists moved into the study of modern societies, researchers in other disciplines—particularly sociology, psychology, and political science—began using ethnographic methods to enrich and focus their own insights and findings. At the same time, studies of large-scale structures and processes have been aided by the use of ethnographic methods, since most large-scale changes work their way into the fabric of community, neighborhood, and family, affecting the daily lives of people. Ethnographers have studied, for example, the impact of new industry and new forms of labor in “backward” regions; the impact of state-level birth control policies on ethnic groups; and the impact on residents in a region of building a dam or establishing a nuclear waste dump. Ethnographic methods have also been used to study a number of social processes that lend themselves to its particular techniques of observation and interview—processes such as the formation of class and racial identities, bureaucratic behavior, legislative coalitions and outcomes, and the formation and shifting of consumer tastes.

Advances in structured interviewing (see above) have proven especially powerful in the study of culture. Techniques for understanding kinship systems, concepts of disease, color terminologies, ethnobotany, and ethnozoology have been radically transformed and strengthened by coupling new interviewing methods with modern measurement and scaling techniques (see below). These techniques have made possible more precise comparisons among cultures and identification of the most competent and expert persons within a culture. The next step is to extend these methods to study the ways in which networks of propositions (such as boys like sports, girls like babies) are organized to form belief systems. Much evidence suggests that people typically represent the world around them by means of relatively complex cognitive models that involve interlocking propositions. The techniques of scaling have been used to develop models of how people categorize objects, and they have great potential for further development, to analyze data pertaining to cultural propositions.

Ideological Systems

Perhaps the most fruitful area for the application of ethnographic methods in recent years has been the systematic study of ideologies in modern society. Earlier studies of ideology were in small-scale societies that were rather homogeneous. In these studies researchers could report on a single culture, a uniform system of beliefs and values for the society as a whole. Modern societies are much more diverse both in origins and number of subcultures, related to different regions, communities, occupations, or ethnic groups. Yet these subcultures and ideologies share certain underlying assumptions or at least must find some accommodation with the dominant value and belief systems in the society.

The challenge is to incorporate this greater complexity of structure and process into systematic descriptions and interpretations. One line of work carried out by researchers has tried to track the ways in which ideologies are created, transmitted, and shared among large populations that have traditionally lacked the social mobility and communications technologies of the West. This work has concentrated on large-scale civilizations such as China, India, and Central America. Gradually, the focus has generalized into a concern with the relationship between the great traditions—the central lines of cosmopolitan Confucian, Hindu, or Mayan culture, including aesthetic standards, irrigation technologies, medical systems, cosmologies and calendars, legal codes, poetic genres, and religious doctrines and rites—and the little traditions, those identified with rural, peasant communities. How are the ideological doctrines and cultural values of the urban elites, the great traditions, transmitted to local communities? How are the little traditions, the ideas from the more isolated, less literate, and politically weaker groups in society, transmitted to the elites?

India and southern Asia have been fruitful areas for ethnographic research on these questions. The great Hindu tradition was present in virtually all local contexts through the presence of high-caste individuals in every community. It operated as a pervasive standard of value for all members of society, even in the face of strong little traditions. The situation is surprisingly akin to that of modern, industrialized societies. The central research questions are the degree and the nature of penetration of dominant ideology, even in groups that appear marginal and subordinate and have no strong interest in sharing the dominant value system. In this connection the lowest and poorest occupational caste—the untouchables—serves as an ultimate test of the power of ideology and cultural beliefs to unify complex hierarchical social systems.

Historical Reconstruction

Another current trend in ethnographic methods is their convergence with archival methods. One joining point is the application of descriptive and interpretative procedures used by ethnographers to reconstruct the cultures that created historical documents, diaries, and other records, to interview history, so to speak. For example, a revealing study showed how the Inquisition in the Italian countryside between the 1570s and 1640s gradually worked subtle changes in an ancient fertility cult in peasant communities; the peasant beliefs and rituals assimilated many elements of witchcraft after learning them from their persecutors. A good deal of social history—particularly that of the family—has drawn on discoveries made in the ethnographic study of primitive societies. As described in Chapter 4, this particular line of inquiry rests on a marriage of ethnographic, archival, and demographic approaches.

Other lines of ethnographic work have focused on the historical dimensions of nonliterate societies. A strikingly successful example of this kind of effort is a study of head-hunting. By combining an interpretation of local oral tradition with the fragmentary observations that were made by outside observers (such as missionaries, traders, colonial officials), historical fluctuations in the rate and significance of head-hunting were shown to be partly in response to such international forces as the Great Depression and World War II. Researchers are also investigating the ways in which various groups in contemporary societies invent versions of traditions that may or may not reflect the actual history of the group. This process has been observed among elites seeking political and cultural legitimation and among hard-pressed minorities (for example, the Basques in Spain, the Welsh in Great Britain) seeking roots and political mobilization in a larger society.

Ethnography is a powerful method to record, describe, and interpret the system of meanings held by groups and to discover how those meanings affect the lives of group members. It is a method well adapted to the study of situations in which people interact with one another and the researcher can interact with them as well, so that information about meanings can be evoked and observed. Ethnography is especially suited to exploration and elucidation of unsuspected connections; ideally, it is used in combination with other methods—experimental, survey, or comparative—to establish with precision the relative strengths and weaknesses of such connections. By the same token, experimental, survey, and comparative methods frequently yield connections, the meaning of which is unknown; ethnographic methods are a valuable way to determine them.

Models for Representing Phenomena

The objective of any science is to uncover the structure and dynamics of the phenomena that are its subject, as they are exhibited in the data. Scientists continuously try to describe possible structures and ask whether the data can, with allowance for errors of measurement, be described adequately in terms of them. Over a long time, various families of structures have recurred throughout many fields of science; these structures have become objects of study in their own right, principally by statisticians, other methodological specialists, applied mathematicians, and philosophers of logic and science. Methods have evolved to evaluate the adequacy of particular structures to account for particular types of data. In the interest of clarity we discuss these structures in this section and the analytical methods used for estimation and evaluation of them in the next section, although in practice they are closely intertwined.

A good deal of mathematical and statistical modeling attempts to describe the relations, both structural and dynamic, that hold among variables that are presumed to be representable by numbers. Such models are applicable in the behavioral and social sciences only to the extent that appropriate numerical measurement can be devised for the relevant variables. In many studies the phenomena in question and the raw data obtained are not intrinsically numerical, but qualitative, such as ethnic group identifications. The identifying numbers used to code such questionnaire categories for computers are no more than labels, which could just as well be letters or colors. One key question is whether there is some natural way to move from the qualitative aspects of such data to a structural representation that involves one of the well-understood numerical or geometric models or whether such an attempt would be inherently inappropriate for the data in question. The decision as to whether or not particular empirical data can be represented in particular numerical or more complex structures is seldom simple, and strong intuitive biases or a priori assumptions about what can and cannot be done may be misleading.

Recent decades have seen rapid and extensive development and application of analytical methods attuned to the nature and complexity of social science data. Examples of nonnumerical modeling are increasing. Moreover, the widespread availability of powerful computers is probably leading to a qualitative revolution: it is affecting not only the ability to compute numerical solutions to numerical models but also the ability to work out the consequences of all sorts of structures that do not involve numbers at all. The following discussion gives some indication of the richness of past progress and of future prospects, although it is by necessity far from exhaustive.

In describing some of the areas of new and continuing research, we have organized this section on the basis of whether the representations are fundamentally probabilistic or not. A further useful distinction is between representations of data that are highly discrete or categorical in nature (such as whether a person is male or female) and those that are continuous in nature (such as a person’s height). Of course, there are intermediate cases involving both types of variables, such as color stimuli that are characterized by discrete hues (red, green) and a continuous luminance measure. Probabilistic models lead very naturally to questions of estimation and statistical evaluation of the correspondence between data and model. Those that are not probabilistic involve additional problems of dealing with and representing sources of variability that are not explicitly modeled. At the present time, scientists understand some aspects of structure, such as geometries, and some aspects of randomness, as embodied in probability models, but do not yet adequately understand how to put the two together in a single unified model. Table 5-1 outlines the way we have organized this discussion and shows where the examples in this section lie.

Table 5-1. A Classification of Structural Models.

Probability Models

Some behavioral and social sciences variables appear to be more or less continuous, for example, utility of goods, loudness of sounds, or risk associated with uncertain alternatives. Many other variables, however, are inherently categorical, often with only two or a few values possible: for example, whether a person is in or out of school, employed or not employed, identifies with a major political party or political ideology. And some variables, such as moral attitudes, are typically measured in research with survey questions that allow only categorical responses. Much of the early probability theory was formulated only for continuous variables; its use with categorical variables was not really justified, and in some cases it may have been misleading. Recently, very significant advances have been made in how to deal explicitly with categorical variables. This section first describes several contemporary approaches to models involving categorical variables, followed by ones involving continuous representations.

Log-Linear Models for Categorical Variables

Many recent models for analyzing categorical data of the kind usually displayed as counts (cell frequencies) in multidimensional contingency tables are subsumed under the general heading of log-linear models, that is, linear models in the natural logarithms of the expected counts in each cell in the table. These recently developed forms of statistical analysis allow one to partition variability due to various sources in the distribution of categorical attributes, and to isolate the effects of particular variables or combinations of them.

Present log-linear models were first developed and used by statisticians and sociologists and then found extensive application in other social and behavioral sciences disciplines. When applied, for instance, to the analysis of social mobility, such models separate factors of occupational supply and demand from other factors that impede or propel movement up and down the social hierarchy. With such models, for example, researchers discovered the surprising fact that occupational mobility patterns are strikingly similar in many nations of the world (even among disparate nations like the United States and most of the Eastern European socialist countries), and from one time period to another, once allowance is made for differences in the distributions of occupations. The log-linear and related kinds of models have also made it possible to identify and analyze systematic differences in mobility among nations and across time. As another example of applications, psychologists and others have used log-linear models to analyze attitudes and their determinants and to link attitudes to behavior. These methods have also diffused to and been used extensively in the medical and biological sciences.
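As a hedged illustration of the basic machinery, the sketch below fits a baseline log-linear model (the model of independence) to an invented 3x3 mobility table by treating the cell counts as Poisson and modeling their logarithms with row and column effects; the residual deviance then measures the association between origin and destination categories. The table values are placeholders, not real mobility data.

```python
# Sketch of a log-linear analysis of a two-way contingency table:
# father's occupational category by respondent's occupational category.
# Counts are invented for illustration; requires pandas and statsmodels.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Flattened 3x3 table of cell counts (hypothetical data).
table = pd.DataFrame({
    "father": ["manual", "manual", "manual",
               "clerical", "clerical", "clerical",
               "professional", "professional", "professional"],
    "child":  ["manual", "clerical", "professional"] * 3,
    "count":  [120, 60, 20, 50, 90, 40, 15, 45, 110],
})

# Independence model: log(expected count) = row effect + column effect.
independence = smf.glm("count ~ C(father) + C(child)", data=table,
                       family=sm.families.Poisson()).fit()

# The residual deviance measures how far the table departs from
# independence; a large deviance signals association between the
# father's and child's categories.
print(independence.summary())
print("residual deviance:", round(independence.deviance, 1))
```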

Regression Models for Categorical Variables

Models that permit one variable to be explained or predicted by means of others, called regression models, are the workhorses of much applied statistics; this is especially true when the dependent (explained) variable is continuous. For a two-valued dependent variable, such as alive or dead, models and approximate theory and computational methods for one explanatory variable were developed in biometry about 50 years ago. Computer programs able to handle many explanatory variables, continuous or categorical, are readily available today. Even now, however, the accuracy of the approximate theory on given data is an open question.
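For a two-valued dependent variable with several explanatory variables, the workhorse today is logistic regression. The sketch below is a minimal illustration on simulated data; the variable names and coefficients are assumptions made for the example, not estimates from any study.

```python
# Minimal sketch of a regression model for a two-valued dependent
# variable (logistic regression). Data are simulated placeholders;
# requires numpy, pandas, and statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 1_000

df = pd.DataFrame({
    "age": rng.uniform(18, 65, n),
    "years_schooling": rng.uniform(8, 20, n),
})
# Hypothetical binary outcome (e.g., employed or not) generated from a
# logistic model with assumed coefficients.
logit = -4.0 + 0.03 * df["age"] + 0.25 * df["years_schooling"]
df["employed"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

model = smf.logit("employed ~ age + years_schooling", data=df).fit()
print(model.summary())
```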

Using classical utility theory, economists have developed discrete choice models that turn out to be somewhat related to the log-linear and categorical regression models. Models for limited dependent variables, especially those that cannot take on values above or below a certain level (such as weeks unemployed, number of children, and years of schooling) have been used profitably in economics and in some other areas. For example, censored normal variables (called tobits in economics), in which observed values outside certain limits are simply counted, have been used in studying decisions to go on in school. It will require further research and development to incorporate information about limited ranges of variables fully into the main multivariate methodologies. In addition, with respect to the assumptions about distribution and functional form conventionally made in discrete response models, some new methods are now being developed that show promise of yielding reliable inferences without making unrealistic assumptions; further research in this area promises significant progress.

One problem arises from the fact that many of the categorical variables collected by the major data bases are ordered. For example, attitude surveys frequently use a 3-, 5-, or 7-point scale (from high to low) without specifying numerical intervals between levels. Social class and educational levels are often described by ordered categories. Ignoring order information, which many traditional statistical methods do, may be inefficient or inappropriate, but replacing the categories by successive integers or other arbitrary scores may distort the results. (For additional approaches to this question, see sections below on ordered structures.) Regression-like analysis of ordinal categorical variables is quite well developed, but their multivariate analysis needs further research. New log-bilinear models have been proposed, but to date they deal specifically with only two or three categorical variables. Additional research extending the new models, improving computational algorithms, and integrating the models with work on scaling promise to lead to valuable new knowledge.
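One now-standard way to use the order information without assigning arbitrary integer scores is an ordered (proportional-odds) logit model. The sketch below is illustrative only: the data are simulated, and it assumes a statsmodels version (0.12 or later) that provides the OrderedModel class.

```python
# Sketch of a regression-like analysis of an ordered categorical
# response (ordered logit). Data are simulated; OrderedModel is
# available in statsmodels 0.12 and later.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(5)
n = 800

x = rng.normal(size=n)                   # e.g., a standardized explanatory variable
latent = 0.8 * x + rng.logistic(size=n)  # latent attitude scale
# Cut the latent scale into three ordered categories (low < medium < high)
# instead of treating the categories as arbitrary integer scores.
response = pd.cut(latent, bins=[-np.inf, -0.5, 0.5, np.inf],
                  labels=["low", "medium", "high"], ordered=True)

model = OrderedModel(pd.Series(response), pd.DataFrame({"x": x}),
                     distr="logit").fit(method="bfgs", disp=False)
print(model.summary())
```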

Models for Event Histories

Event-history studies yield the sequence of events that respondents to a survey sample experience over a period of time; for example, the timing of marriage, childbearing, or labor force participation. Event-history data can be used to study educational progress, demographic processes (migration, fertility, and mortality), mergers of firms, labor market behavior, and even riots, strikes, and revolutions. As interest in such data has grown, many researchers have turned to models that pertain to changes in probabilities over time to describe when and how individuals move among a set of qualitative states.

Much of the progress in models for event-history data builds on recent developments in statistics and biostatistics for life-time, failure-time, and hazard models. Such models permit the analysis of qualitative transitions in a population whose members are undergoing partially random organic deterioration, mechanical wear, or other risks over time. With the increased complexity of event-history data that are now being collected, and the extension of event-history data bases over very long periods of time, new problems arise that cannot be effectively handled by older types of analysis. Among the problems are repeated transitions, such as between unemployment and employment or marriage and divorce; more than one time variable (such as biological age, calendar time, duration in a stage, and time exposed to some specified condition); latent variables (variables that are explicitly modeled even though not observed); gaps in the data; sample attrition that is not randomly distributed over the categories; and respondent difficulties in recalling the exact timing of events.
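A minimal sketch of a hazard-model analysis of event-history data is shown below. It fits a Cox proportional-hazards model to simulated durations with right-censoring, the kind of structure described above; the lifelines package and all data values are assumptions made for the illustration, not methods or figures from the report.

```python
# Sketch of an event-history (hazard) analysis: time until a transition
# such as leaving unemployment, with one covariate. Data are simulated;
# requires numpy, pandas, and the lifelines package.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n = 500

years_schooling = rng.uniform(8, 20, n)
# Hypothetical exponential durations whose exit rate rises with schooling.
rate = 0.05 * np.exp(0.08 * (years_schooling - 12))
duration = rng.exponential(1 / rate)

# Right-censoring: observation ends at 36 months for everyone still in
# the initial state (event = 0 means the transition was not observed).
observed = duration <= 36
df = pd.DataFrame({
    "duration": np.minimum(duration, 36),
    "event": observed.astype(int),
    "years_schooling": years_schooling,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()
```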

Models for Multiple-Item Measurement

For a variety of reasons, researchers typically use multiple measures (or multiple indicators) to represent theoretical concepts. Sociologists, for example, often rely on two or more variables (such as occupation and education) to measure an individual’s socioeconomic position; educational psychologists ordinarily measure a student’s ability with multiple test items. Despite the fact that the basic observations are categorical, in a number of applications this is interpreted as a partitioning of something continuous. For example, in test theory one thinks of the measures of both item difficulty and respondent ability as continuous variables, possibly multidimensional in character.

Classical test theory and newer item-response theories in psychometrics deal with the extraction of information from multiple measures. Testing, which is a major source of data in education and other areas, results in millions of test items stored in archives each year for purposes ranging from college admissions to job-training programs for industry. One goal of research on such test data is to be able to make comparisons among persons or groups even when different test items are used. Although the information collected from each respondent is intentionally incomplete in order to keep the tests short and simple, item-response techniques permit researchers to reconstitute the fragments into an accurate picture of overall group proficiencies. These new methods provide a better theoretical handle on individual differences, and they are expected to be extremely important in developing and using tests. For example, they have been used in attempts to equate different forms of a test given in successive waves during a year, a procedure made necessary in large-scale testing programs by legislation requiring disclosure of test-scoring keys at the time results are given.
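The core idea of item-response models can be shown with the simplest (one-parameter, or Rasch) form, in which the probability of a correct answer depends only on the gap between a respondent's ability and an item's difficulty. The sketch below, with invented item difficulties and answers, estimates one respondent's ability by maximum likelihood over a grid; it is a toy version of the machinery, not the procedure used in any particular testing program.

```python
# Minimal sketch of the item-response idea in its simplest (Rasch) form.
# Item difficulties and responses are invented for illustration.
import numpy as np

def p_correct(ability, difficulty):
    """Rasch model: P(correct) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

# Assumed difficulties of the items this respondent happened to receive,
# and the respondent's right/wrong answers.
difficulties = np.array([-1.5, -0.5, 0.0, 0.8, 1.6])
responses = np.array([1, 1, 1, 0, 0])

# Maximum-likelihood estimate of ability by a simple grid search.
grid = np.linspace(-4, 4, 801)
log_lik = np.array([
    np.sum(responses * np.log(p_correct(a, difficulties)) +
           (1 - responses) * np.log(1 - p_correct(a, difficulties)))
    for a in grid
])
ability_hat = grid[log_lik.argmax()]
print(f"estimated ability: {ability_hat:.2f}")
```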

An example of the use of item-response theory in a significant research effort is the National Assessment of Educational Progress (NAEP). The goal of this project is to provide accurate, nationally representative information on the average (rather than individual) proficiency of American children in a wide variety of academic subjects as they progress through elementary and secondary school. This approach is an improvement over the use of trend data on university entrance exams, because NAEP estimates of academic achievements (by broad characteristics such as age, grade, region, ethnic background, and so on) are not distorted by the self-selected character of those students who seek admission to college, graduate, and professional programs.

Item-response theory also forms the basis of a new class of psychometric instruments, known as computerized adaptive tests, currently being implemented by the U.S. military services and under further development in many testing organizations. In adaptive tests, a computer program selects items for each examinee based upon the examinee’s success with previous items. Generally, each person gets a slightly different set of items, and the equivalence of scale scores is established by using item-response theory. Adaptive testing can greatly reduce the number of items needed to achieve a given level of measurement accuracy.
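
Below is a minimal sketch of the adaptive item-selection loop, assuming a simple Rasch (one-parameter logistic) model, a maximum-information selection rule, and a grid-search ability update; the function names, item bank, and tuning choices are illustrative, not a description of any operational testing system.

    import numpy as np

    def rasch_prob(theta, b):
        # Probability of a correct response under the Rasch model.
        return 1.0 / (1.0 + np.exp(-(theta - b)))

    def next_item(theta_hat, difficulties, administered):
        # Pick the unused item with the largest Fisher information p(1 - p)
        # at the current ability estimate.
        p = rasch_prob(theta_hat, difficulties)
        info = p * (1.0 - p)
        info[list(administered)] = -np.inf
        return int(np.argmax(info))

    def update_theta(responses, items, difficulties):
        # Crude maximum-likelihood update of the ability estimate over a coarse grid.
        grid = np.linspace(-4.0, 4.0, 161)
        loglik = np.zeros_like(grid)
        for resp, item in zip(responses, items):
            p = rasch_prob(grid, difficulties[item])
            loglik += resp * np.log(p) + (1 - resp) * np.log(1 - p)
        return grid[np.argmax(loglik)]

    # Hypothetical bank of 20 items and a simulated examinee with true ability 0.5.
    rng = np.random.default_rng(0)
    bank = rng.normal(0.0, 1.0, size=20)
    administered, responses = [], []
    theta_hat = 0.0
    for _ in range(10):
        item = next_item(theta_hat, bank, administered)
        correct = int(rng.random() < rasch_prob(0.5, bank[item]))
        administered.append(item)
        responses.append(correct)
        theta_hat = update_theta(responses, administered, bank)

Operational systems typically add refinements such as item-exposure control and content balancing, but the core select-administer-update cycle is of this form.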

Nonlinear, Nonadditive Models

Virtually all statistical models now in use impose a linearity or additivity assumption of some kind, sometimes after a nonlinear transformation of variables. Imposing these forms on relationships that do not, in fact, possess them may well result in false descriptions and spurious effects. Unwary users, especially of computer software packages, can easily be misled. But more realistic nonlinear and nonadditive multivariate models are becoming available. Extensive use with empirical data is likely to force many changes and enhancements in such models and stimulate quite different approaches to nonlinear multivariate analysis in the next decade.

Geometric and Algebraic Models

Geometric and algebraic models attempt to describe underlying structural relations among variables. In some cases they are part of a probabilistic approach, such as the algebraic models underlying regression or the geometric representations of correlations between items in a technique called factor analysis. In other cases, geometric and algebraic models are developed without explicitly modeling the element of randomness or uncertainty that is always present in the data. Although this latter approach to behavioral and social sciences problems has been less researched than the probabilistic one, there are some advantages in developing the structural aspects independent of the statistical ones. We begin the discussion with some inherently geometric representations and then turn to numerical representations for ordered data.

Although geometry is a huge mathematical topic, little of it seems directly applicable to the kinds of data encountered in the behavioral and social sciences. A major reason is that the primitive concepts normally used in geometry—points, lines, coincidence—do not correspond naturally to the kinds of qualitative observations usually obtained in behavioral and social sciences contexts. Nevertheless, since geometric representations are used to reduce bodies of data, there is a real need to develop a deeper understanding of when such representations of social or psychological data make sense. Moreover, there is a practical need to understand why geometric computer algorithms, such as those of multidimensional scaling, work as well as they apparently do. A better understanding of the algorithms will increase the efficiency and appropriateness of their use, which becomes increasingly important with the widespread availability of scaling programs for microcomputers.

Over the past 50 years several kinds of well-understood scaling techniques have been developed and widely used to assist in the search for appropriate geometric representations of empirical data. The whole field of scaling is now entering a critical juncture in terms of unifying and synthesizing what earlier appeared to be disparate contributions. Within the past few years it has become apparent that several major methods of analysis, including some that are based on probabilistic assumptions, can be unified under the rubric of a single generalized mathematical structure. For example, it has recently been demonstrated that such diverse approaches as nonmetric multidimensional scaling, principal-components analysis, factor analysis, correspondence analysis, and log-linear analysis have more in common in terms of underlying mathematical structure than had earlier been realized.

Nonmetric multidimensional scaling is a method that begins with data about the ordering established by subjective similarity (or nearness) between pairs of stimuli. The idea is to embed the stimuli into a metric space (that is, a geometry with a measure of distance between points) in such a way that distances between points corresponding to stimuli exhibit the same ordering as do the data. This method has been successfully applied to phenomena that, on other grounds, are known to be describable in terms of a specific geometric structure; such applications were used to validate the procedures. Such validation was done, for example, with respect to the perception of colors, which are known to be describable in terms of a particular three-dimensional structure known as the Euclidean color coordinates. Similar applications have been made with Morse code symbols and spoken phonemes. The technique is now used in some biological and engineering applications, as well as in some of the social sciences, as a method of data exploration and simplification.
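
A brief sketch of how such an analysis might be run with current software, using the nonmetric option of the multidimensional scaling routine in the scikit-learn library; the dissimilarity matrix is a made-up toy example, and the two-dimensional solution is only for illustration.

    import numpy as np
    from sklearn.manifold import MDS

    # Hypothetical symmetric dissimilarity judgments among four stimuli (0 = identical).
    D = np.array([[0.0, 1.0, 3.0, 4.0],
                  [1.0, 0.0, 2.0, 3.5],
                  [3.0, 2.0, 0.0, 1.5],
                  [4.0, 3.5, 1.5, 0.0]])

    # metric=False requests the nonmetric (ordinal) variant: only the rank order of the
    # dissimilarities is required to match the rank order of the fitted distances.
    mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(D)      # coordinates of the four points in the plane
    print(coords)
    print(mds.stress_)                 # badness-of-fit of the ordinal embedding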

One question of interest is how to develop an axiomatic basis for various geometries using as a primitive concept an observable such as the subject’s ordering of the relative similarity of one pair of stimuli to another, which is the typical starting point of such scaling. The general task is to discover properties of the qualitative data sufficient to ensure that a mapping into the geometric structure exists and, ideally, to discover an algorithm for finding it. Some work of this general type has been carried out: for example, there is an elegant set of axioms based on laws of color matching that yields the three-dimensional vectorial representation of color space. But the more general problem of understanding the conditions under which the multidimensional scaling algorithms are suitable remains unsolved. In addition, work is needed on understanding more general, non-Euclidean spatial models.

Ordered Factorial Systems

One type of structure common throughout the sciences arises when an ordered dependent variable is affected by two or more ordered independent variables. This is the situation to which regression and analysis-of-variance models are often applied; it is also the structure underlying the familiar physical identities, in which physical units are expressed as products of the powers of other units (for example, energy has the unit of mass times the square of the unit of distance divided by the square of the unit of time).

There are many examples of these types of structures in the behavioral and social sciences. One example is the ordering of preference of commodity bundles—collections of various amounts of commodities—which may be revealed directly by expressions of preference or indirectly by choices among alternative sets of bundles. A related example is preferences among alternative courses of action that involve various outcomes with differing degrees of uncertainty; this is one of the more thoroughly investigated problems because of its potential importance in decision making. A psychological example is the trade-off between delay and amount of reward, yielding those combinations that are equally reinforcing. In a common, applied kind of problem, a subject is given descriptions of people in terms of several factors, for example, intelligence, creativity, diligence, and honesty, and is asked to rate them according to a criterion such as suitability for a particular job.

In all these cases and a myriad of others like them the question is whether the regularities of the data permit a numerical representation. Initially, three types of representations were studied quite fully: the dependent variable as a sum, a product, or a weighted average of the measures associated with the independent variables. The first two representations underlie some psychological and economic investigations, as well as a considerable portion of physical measurement and modeling in classical statistics. The third representation, averaging, has proved most useful in understanding preferences among uncertain outcomes and the amalgamation of verbally described traits, as well as some physical variables.
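
In symbols, for two independent variables these three candidate representations take the familiar forms (the notation is generic)

    \psi(a, x) = \phi_1(a) + \phi_2(x), \qquad
    \psi(a, x) = \phi_1(a)\,\phi_2(x), \qquad
    \psi(a, x) = \frac{w_1 \phi_1(a) + w_2 \phi_2(x)}{w_1 + w_2},

where \psi is the numerical value assigned to the dependent variable and \phi_1, \phi_2 are scales constructed for the levels of the two independent variables (with w_1, w_2 the weights in the averaging case).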

For each of these three cases—adding, multiplying, and averaging—researchers know what properties or axioms of order the data must satisfy for such a numerical representation to be appropriate. On the assumption that one or another of these representations exists, and using numerical ratings by subjects instead of ordering, a scaling technique called functional measurement (referring to the function that describes how the dependent variable relates to the independent ones) has been developed and applied in a number of domains. What remains problematic is how to encompass at the ordinal level the fact that some random error intrudes into nearly all observations and then to show how that randomness is represented at the numerical level; this continues to be an unresolved and challenging research issue.

During the past few years considerable progress has been made in understanding certain representations inherently different from those just discussed. The work has involved three related thrusts. The first is a scheme of classifying structures according to how uniquely their representation is constrained. The three classical numerical representations are known as ordinal, interval, and ratio scale types. For systems with continuous numerical representations and of scale type at least as rich as the ratio one, it has been shown that only one additional type can exist. A second thrust is to accept structural assumptions, like factorial ones, and to derive for each scale the possible functional relations among the independent variables. And the third thrust is to develop axioms for the properties of an order relation that leads to the possible representations. Much is now known about the possible nonadditive representations of both the multifactor case and the one where stimuli can be combined, such as combining sound intensities.

Closely related to this classification of structures is the question: What statements, formulated in terms of the measures arising in such representations, can be viewed as meaningful in the sense of corresponding to something empirical? Statements here refer to any scientific assertions, including statistical ones, formulated in terms of the measures of the variables and logical and mathematical connectives. These are statements for which asserting truth or falsity makes sense. In particular, statements that remain invariant under certain symmetries of structure have played an important role in classical geometry, dimensional analysis in physics, and in relating measurement and statistical models applied to the same phenomenon. In addition, these ideas have been used to construct models in more formally developed areas of the behavioral and social sciences, such as psychophysics. Current research has emphasized the commonality of these historically independent developments and is attempting both to uncover systematic, philosophically sound arguments as to why invariance under symmetries is as important as it appears to be and to understand what to do when structures lack symmetry, as, for example, when variables have an inherent upper bound.

Many subjects do not seem to be correctly represented in terms of distances in continuous geometric space. Rather, in some cases, such as the relations among meanings of words—which is of great interest in the study of memory representations—a description in terms of tree-like, hierarchical structures appears to be more illuminating. This kind of description is appropriate both because of the categorical nature of the judgments and because of the hierarchical, rather than trade-off, nature of the structure. Individual items are represented as the terminal nodes of the tree, and groupings by different degrees of similarity are shown as intermediate nodes, with the more general groupings occurring nearer the root of the tree. Clustering techniques, requiring considerable computational power, have been and are being developed. Some successful applications exist, but much more refinement is anticipated.
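
A small sketch of agglomerative (hierarchical) clustering of the kind just described, using routines from the scipy library; the dissimilarity matrix among five hypothetical word meanings and the choice of average linkage are purely illustrative.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    # Hypothetical dissimilarities among five word meanings (symmetric, zero diagonal).
    D = np.array([[0.0, 0.2, 0.7, 0.8, 0.9],
                  [0.2, 0.0, 0.6, 0.8, 0.9],
                  [0.7, 0.6, 0.0, 0.3, 0.4],
                  [0.8, 0.8, 0.3, 0.0, 0.2],
                  [0.9, 0.9, 0.4, 0.2, 0.0]])

    Z = linkage(squareform(D), method="average")      # build the tree from the leaves up
    groups = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into two broad groupings
    print(groups)                                     # items 0-1 form one cluster, items 2-4 the other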

Network Models

Several other lines of advanced modeling have progressed in recent years, opening new possibilities for empirical specification and testing of a variety of theories. In social network data, relationships among units, rather than the units themselves, are the primary objects of study: friendships among persons, trade ties among nations, cocitation clusters among research scientists, interlocking among corporate boards of directors. Special models for social network data have been developed in the past decade, and they give, among other things, precise new measures of the strengths of relational ties among units. A major challenge in social network data at present is to handle the statistical dependence that arises when the units sampled are related in complex ways.
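
As a minimal illustration of treating the ties, rather than the units, as the data, the sketch below builds a tiny hypothetical friendship graph with the networkx library and computes a few standard summaries; the particular measures shown are simple examples, not the specialized network models referred to above.

    import networkx as nx

    # Hypothetical friendship ties among six people.
    G = nx.Graph()
    G.add_edges_from([("Ann", "Bob"), ("Ann", "Cal"), ("Bob", "Cal"),
                      ("Cal", "Dee"), ("Dee", "Eve"), ("Eve", "Fay")])

    print(nx.density(G))                     # share of possible ties that are present
    print(nx.betweenness_centrality(G))      # how often each person lies on shortest paths between others
    print(list(nx.connected_components(G)))  # cohesive groupings of units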

Statistical Inference and Analysis

As was noted earlier, questions of design, representation, and analysis are intimately intertwined. Some issues of inference and analysis have been discussed above as related to specific data collection and modeling approaches. This section discusses some more general issues of statistical inference and advances in several current approaches to them.

Causal Inference

Behavioral and social scientists use statistical methods primarily to infer the effects of treatments, interventions, or policy factors. Previous chapters included many instances of causal knowledge gained this way. As noted above, the large experimental study of alternative health care financing discussed in Chapter 2 relied heavily on statistical principles and techniques, including randomization, in the design of the experiment and the analysis of the resulting data. Sophisticated designs were necessary in order to answer a variety of questions in a single large study without confusing the effects of one program difference (such as prepayment or fee for service) with the effects of another (such as different levels of deductible costs), or with effects of unobserved variables (such as genetic differences). Statistical techniques were also used to ascertain which results applied across the whole enrolled population and which were confined to certain subgroups (such as individuals with high blood pressure) and to translate utilization rates across different programs and types of patients into comparable overall dollar costs and health outcomes for alternative financing options.

A classical experiment, with systematic but randomly assigned variation of the variables of interest (or some reasonable approach to this), is usually considered the most rigorous basis from which to draw such inferences. But random samples or randomized experimental manipulations are not always feasible or ethically acceptable. Then, causal inferences must be drawn from observational studies, which, however well designed, are less able to ensure that the observed (or inferred) relationships among variables provide clear evidence on the underlying mechanisms of cause and effect.

Certain recurrent challenges have been identified in studying causal inference. One challenge arises from the selection of background variables to be measured, such as the sex, nativity, or parental religion of individuals in a comparative study of how education affects occupational success. The adequacy of classical methods of matching groups on background variables and adjusting for covariates needs further investigation. Statistical adjustment of biases linked to measured background variables is possible, but it can become complicated. Current work on adjustment for selectivity bias is aimed at weakening implausible assumptions, such as normality, when carrying out these adjustments. A second challenge is that, even after adjustment has been made for the measured background variables, other, unmeasured variables are almost always still affecting the results (such as family transfers of wealth or reading habits). Analysis of how the conclusions might change if such unmeasured variables could be taken into account is essential in attempting to make causal inferences from an observational study, and systematic work on useful statistical models for such sensitivity analyses is just beginning.

A third issue arises from the need to distinguish among competing hypotheses when the explanatory variables are measured with different degrees of precision. Both the estimated size and the statistical significance of an effect are diminished when the corresponding variable is measured with large error, and the coefficients of other, correlated variables are affected even when those variables are measured perfectly. Similar problems arise from conceptual errors, when one measures only proxies for a theoretical construct (such as years of education to represent amount of learning). In some cases, there are procedures for simultaneously or iteratively estimating both the precision of complex measures and their effect on a particular criterion.
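
The best-known case of the measurement-error problem is classical attenuation in a simple regression: if the true explanatory variable x* is observed only as x = x* + u, with measurement error u independent of x*, then the estimated slope converges not to the true coefficient \beta but to

    \operatorname{plim} \hat{\beta} = \beta \, \frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_u^2},

so the estimated effect is shrunk toward zero by the reliability ratio of the measure; with several correlated and imperfectly measured explanatory variables, the biases need not even be toward zero.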

Although complex models are often necessary to infer causes, once their output is available, it should be translated into understandable displays for evaluation. Results that depend on the accuracy of a multivariate model and the associated software need to be subjected to appropriate checks, including the evaluation of graphical displays, group comparisons, and other analyses.

New Statistical Techniques

Internal resampling.

One of the great contributions of twentieth-century statistics was to demonstrate how a properly drawn sample of sufficient size, even if it is only a tiny fraction of the population of interest, can yield very good estimates of most population characteristics. When enough is known at the outset about the characteristic in question—for example, that its distribution is roughly normal—inference from the sample data to the population as a whole is straightforward, and one can easily compute measures of the certainty of inference, a common example being the 95 percent confidence interval around an estimate. But population shapes are sometimes unknown or uncertain, and so inference procedures cannot be so simple. Furthermore, more often than not, it is difficult to assess even the degree of uncertainty associated with complex data and with the statistics needed to unravel complex social and behavioral phenomena.

Internal resampling methods attempt to assess this uncertainty by generating a number of simulated data sets similar to the one actually observed. The definition of similar is crucial, and many methods that exploit different types of similarity have been devised. These methods provide researchers the freedom to choose scientifically appropriate procedures and to replace procedures that are valid under assumed distributional shapes with ones that are not so restricted. Flexible and imaginative computer simulation is the key to these methods. For a simple random sample, the “bootstrap” method repeatedly resamples the obtained data (with replacement) to generate a distribution of possible data sets. The distribution of any estimator can thereby be simulated and measures of the certainty of inference be derived. The “jackknife” method repeatedly omits a fraction of the data and in this way generates a distribution of possible data sets that can also be used to estimate variability. These methods can also be used to remove or reduce bias. For example, the ratio-estimator, a statistic that is commonly used in analyzing sample surveys and censuses, is known to be biased, and the jackknife method can usually remedy this defect. The methods have been extended to other situations and types of analysis, such as multiple regression.
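
A minimal sketch of both ideas for a simple random sample, written with numpy; the data, the number of bootstrap replications, and the choice of the ratio estimator as the statistic of interest are arbitrary illustrations.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(2.0, size=50)
    y = 3.0 * x + rng.normal(0.0, 1.0, size=50)

    def ratio(xs, ys):
        # The ratio estimator mentioned in the text: mean of y over mean of x.
        return ys.mean() / xs.mean()

    # Bootstrap: resample the n observations with replacement many times.
    boot = []
    for _ in range(2000):
        idx = rng.integers(0, len(x), len(x))
        boot.append(ratio(x[idx], y[idx]))
    boot = np.array(boot)
    boot_se = boot.std(ddof=1)                  # simulated standard error of the ratio
    ci = np.percentile(boot, [2.5, 97.5])       # simple percentile confidence interval

    # Jackknife: omit one observation at a time.
    n = len(x)
    jack = np.array([ratio(np.delete(x, i), np.delete(y, i)) for i in range(n)])
    jack_se = np.sqrt((n - 1) / n * np.sum((jack - jack.mean()) ** 2))
    jack_bias = (n - 1) * (jack.mean() - ratio(x, y))   # jackknife estimate of bias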

There are indications that under relatively general conditions, these methods, and others related to them, allow more accurate estimates of the uncertainty of inferences than do the traditional ones that are based on assumed (usually, normal) distributions when that distributional assumption is unwarranted. For complex samples, such internal resampling or subsampling facilitates estimating the sampling variances of complex statistics.

An older and simpler, but equally important, idea is to use one independent subsample in searching the data to develop a model and at least one separate subsample for estimating and testing a selected model. Otherwise, it is next to impossible to make allowances for the excessively close fitting of the model that occurs as a result of the creative search for the exact characteristics of the sample data—characteristics that are to some degree random and will not predict well to other samples.
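
A sketch of that discipline, assuming a single data matrix split at random into an exploratory half and a confirmatory half; the 50/50 proportion is arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.normal(size=(200, 5))    # hypothetical observations on five variables

    perm = rng.permutation(len(data))
    explore = data[perm[:100]]          # searched freely to develop and select a model
    confirm = data[perm[100:]]          # reserved for estimating and testing the chosen model

All model-selection steps (choice of variables, transformations, cut points) are confined to the exploratory half; only the final, fixed model is fitted to the confirmatory half, so its apparent fit is not inflated by the search.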

Robust Techniques

Many technical assumptions underlie the analysis of data. Some, like the assumption that each item in a sample is drawn independently of other items, can be weakened when the data are sufficiently structured to admit simple alternative models, such as serial correlation. Usually, these models require that a few parameters be estimated. Assumptions about shapes of distributions, normality being the most common, have proved to be particularly important, and considerable progress has been made in dealing with the consequences of different assumptions.

More recently, robust techniques have been designed that permit sharp, valid discriminations among possible values of parameters of central tendency for a wide variety of alternative distributions by reducing the weight given to occasional extreme deviations. It turns out that by giving up, say, 10 percent of the discrimination that could be provided under the rather unrealistic assumption of normality, one can greatly improve performance in more realistic situations, especially when unusually large deviations are relatively common.

These valuable modifications of classical statistical techniques have been extended to multiple regression, in which procedures of iterative reweighting can now offer relatively good performance for a variety of underlying distributional shapes. They should be extended to more general schemes of analysis.
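
A compact sketch of the iterative-reweighting idea for regression, using Huber-type weights; the tuning constant 1.345 and the median-absolute-deviation scale estimate are conventional choices, and the code is an illustration rather than a production algorithm.

    import numpy as np

    def huber_irls(X, y, c=1.345, n_iter=50):
        # Iteratively reweighted least squares with Huber weights.
        beta = np.linalg.lstsq(X, y, rcond=None)[0]            # start from ordinary least squares
        for _ in range(n_iter):
            resid = y - X @ beta
            scale = np.median(np.abs(resid)) / 0.6745 + 1e-12  # robust scale estimate
            u = resid / scale
            w = np.ones_like(u)
            big = np.abs(u) > c
            w[big] = c / np.abs(u[big])                        # downweight large standardized residuals
            sw = np.sqrt(w)
            beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        return beta

    # Example: a regression with an intercept, contaminated by a few gross errors.
    rng = np.random.default_rng(2)
    X = np.column_stack([np.ones(100), rng.normal(size=100)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
    y[:5] += 15.0                                              # occasional extreme deviations
    print(huber_irls(X, y))                                    # stays close to (1, 2) despite the outliers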

In some contexts—notably the most classical uses of analysis of variance—the use of adequate robust techniques should help to bring conventional statistical practice closer to the best standards that experts can now achieve.

Many Interrelated Parameters

In trying to give a more accurate representation of the real world than is possible with simple models, researchers sometimes use models with many parameters, all of which must be estimated from the data. Classical principles of estimation, such as straightforward maximum-likelihood, do not yield reliable estimates unless either the number of observations is much larger than the number of parameters to be estimated or special designs are used in conjunction with strong assumptions. Bayesian methods do not draw a distinction between fixed and random parameters, and so may be especially appropriate for such problems.

A variety of statistical methods have recently been developed that can be interpreted as treating many of the parameters as random quantities, or as closely analogous to random quantities, even when they are regarded as representing fixed quantities to be estimated. Theory and practice demonstrate that such methods can improve on the simpler fixed-parameter methods from which they evolved, especially when the number of observations is not large relative to the number of parameters. Successful applications include college and graduate school admissions, where the quality of an applicant’s previous school is treated as a random parameter when the data are insufficient to estimate it well separately. Efforts to create appropriate models using this general approach for small-area estimation and undercount adjustment in the census are important potential applications.
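
The central idea can be written for the simplest case of many group means; the expression below is the standard random-effects (empirical Bayes) shrinkage form, shown only to fix ideas.

    \hat{\theta}_i = \hat{\mu} + \left( 1 - \frac{\sigma_i^2}{\sigma_i^2 + \hat{\tau}^2} \right) (\bar{y}_i - \hat{\mu}),

where \bar{y}_i is the observed mean for unit i, \sigma_i^2 its sampling variance, \hat{\mu} the estimated overall mean, and \hat{\tau}^2 the estimated between-unit variance; units that are measured imprecisely (large \sigma_i^2) are pulled more strongly toward the overall mean, which is what stabilizes the estimates when observations are scarce relative to parameters.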

Missing Data

In data analysis, serious problems can arise when certain kinds of information (quantitative or qualitative) are partially or wholly missing. Various approaches to dealing with these problems have been or are being developed. One recently developed method for dealing with certain aspects of missing data is called multiple imputation: each missing value in a data set is replaced by several values representing a range of possibilities, with statistical dependence among missing values reflected by linkage among their replacements. It is currently being used to handle a major problem of incompatibility between the 1980 and previous Bureau of the Census public-use tapes with respect to occupation codes. The extension of these techniques to address such problems as nonresponse to income questions in the Current Population Survey has been examined in exploratory applications with great promise.
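
A toy sketch of the multiple-imputation workflow, assuming a single variable with missing values imputed from a regression on an observed covariate; the imputation model, the number of imputations (m = 5), and all variable names are illustrative, while the combining step follows Rubin’s standard rules.

    import numpy as np

    rng = np.random.default_rng(3)

    def impute_once(y, x, rng):
        # Fill in missing y values from a regression on x, adding a resampled residual
        # so the imputations reflect uncertainty rather than a single "best guess".
        obs = ~np.isnan(y)
        slope, intercept = np.polyfit(x[obs], y[obs], 1)
        resid = y[obs] - (slope * x[obs] + intercept)
        y_filled = y.copy()
        y_filled[~obs] = slope * x[~obs] + intercept + rng.choice(resid, size=(~obs).sum())
        return y_filled

    def rubin_combine(estimates, variances):
        # Pool one scalar estimate and its variance across m imputed data sets.
        m = len(estimates)
        qbar = np.mean(estimates)                    # combined point estimate
        within = np.mean(variances)                  # average within-imputation variance
        between = np.var(estimates, ddof=1)          # between-imputation variance
        return qbar, within + (1 + 1 / m) * between  # total variance

    # Hypothetical data: income (y) missing for a quarter of respondents, age (x) always observed.
    x = rng.uniform(20, 65, size=300)
    y = 1000 + 40 * x + rng.normal(0, 200, size=300)
    y[rng.random(300) < 0.25] = np.nan

    estimates, variances = [], []
    for _ in range(5):
        y_m = impute_once(y, x, rng)
        estimates.append(y_m.mean())
        variances.append(y_m.var(ddof=1) / len(y_m))
    print(rubin_combine(estimates, variances))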

Computer Packages and Expert Systems

The development of high-speed computing and data handling has fundamentally changed statistical analysis. Methodologies for all kinds of situations are rapidly being developed and made available for use in computer packages that may be incorporated into interactive expert systems. This computing capability offers the hope that data analysis will be done more carefully and more effectively than previously and that better strategies for data analysis will move from the practice of expert statisticians, some of whom may not have tried to articulate their own strategies, to both wide discussion and general use.

But powerful tools can be hazardous, as witnessed by occasional dire misuses of existing statistical packages. Until recently the only strategies available were to train more expert methodologists or to train substantive scientists in more methodology; but without continual updating, such training tends to become outmoded. Now there is the opportunity to capture in expert systems the current best methodological advice and practice. If that opportunity is exploited, standard methodological training of social scientists will shift toward strategies for using good expert systems, including understanding the nature and importance of the comments such systems provide, rather than toward patching together analyses on one’s own. With expert systems, almost all behavioral and social scientists should become able to conduct any of the more common styles of data analysis more effectively and with more confidence than all but the most expert do today. However, the difficulties in developing expert systems that work as hoped for should not be underestimated. Human experts cannot readily explicate all of the complex cognitive network that constitutes an important part of their knowledge. As a result, the first attempts at expert systems were not especially successful (as discussed in Chapter 1). Additional work is expected to overcome these limitations, but it is not clear how long that will take.

Exploratory Analysis and Graphic Presentation

The formal focus of much statistics research in the middle half of the twentieth century was on procedures to confirm or reject precise, a priori hypotheses developed in advance of collecting data—that is, procedures to determine statistical significance. There was relatively little systematic work on realistically rich strategies for the applied researcher to use when attacking real-world problems with their multiplicity of objectives and sources of evidence. More recently, a species of quantitative detective work, called exploratory data analysis, has received increasing attention. In this approach, the researcher seeks out possible quantitative relations that may be present in the data. The techniques are flexible and include an important component of graphic representations. While current techniques have evolved for single responses in situations of modest complexity, extensions to multiple responses and to single responses in more complex situations are now possible.

Graphic and tabular presentation is a research domain in active renaissance, stemming in part from suggestions for new kinds of graphics made possible by computer capabilities, for example, hanging histograms and easily assimilated representations of numerical vectors. Research on data presentation has been carried out by statisticians, psychologists, cartographers, and other specialists, and attempts are now being made to incorporate findings and concepts from linguistics, industrial and publishing design, aesthetics, and classification studies in library science. Another influence has been the rapidly increasing availability of powerful computational hardware and software, now available even on desktop computers. These ideas and capabilities are leading to an increasing number of behavioral experiments with substantial statistical input. Nonetheless, criteria of good graphic and tabular practice are still too much matters of tradition and dogma, without adequate empirical evidence or theoretical coherence. To broaden the respective research outlooks and vigorously develop such evidence and coherence, extended collaborations between statistical and mathematical specialists and other scientists are needed, a major objective being to understand better the visual and cognitive processes (see Chapter 1 ) relevant to effective use of graphic or tabular approaches.

Combining Evidence

Combining evidence from separate sources is a recurrent scientific task, and formal statistical methods for doing so go back 30 years or more. These methods include the theory and practice of combining tests of individual hypotheses, sequential design and analysis of experiments, comparisons of laboratories, and Bayesian and likelihood paradigms.

There is now growing interest in more ambitious analytical syntheses, which are often called meta-analyses. One stimulus has been the appearance of syntheses explicitly combining all existing investigations in particular fields, such as prison parole policy, classroom size in primary schools, cooperative studies of therapeutic treatments for coronary heart disease, early childhood education interventions, and weather modification experiments. In such fields, a serious approach to even the simplest question—how to put together separate estimates of effect size from separate investigations—leads quickly to difficult and interesting issues. One issue involves the lack of independence among the available studies, due, for example, to the effect of influential teachers on the research projects of their students. Another issue is selection bias: only some of the studies carried out, usually those with “significant” findings, are available, and the literature search may not locate all relevant studies that are available. In addition, experts agree, although informally, that the quality of studies from different laboratories and facilities differs appreciably and that such information probably should be taken into account. Inevitably, the studies to be included used different designs and concepts and controlled or measured different variables, making it difficult to know how to combine them.
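
For the simplest version of that question, the classical fixed-effect answer weights each study’s estimate by the inverse of its variance. The sketch below is that textbook rule only; it ignores, among other things, the between-study heterogeneity, dependence, and selection problems just described.

    import numpy as np

    def fixed_effect_pool(effects, variances):
        # Inverse-variance weighted average of per-study effect sizes.
        w = 1.0 / np.asarray(variances, dtype=float)
        pooled = np.sum(w * np.asarray(effects, dtype=float)) / np.sum(w)
        return pooled, 1.0 / np.sum(w)        # pooled estimate and its variance

    # Hypothetical standardized effect sizes and their variances from four studies.
    print(fixed_effect_pool([0.30, 0.10, 0.45, 0.25], [0.010, 0.020, 0.050, 0.015]))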

Rich, informal syntheses, allowing for individual appraisal, may be better than catch-all formal modeling, but the literature on formal meta-analytic models is growing and may be an important area of discovery in the next decade, relevant both to statistical analysis per se and to improved syntheses in the behavioral and social and other sciences.

Opportunities and Needs

This chapter has cited a number of methodological topics associated with behavioral and social sciences research that appear to be particularly active and promising at the present time. As throughout the report, they constitute illustrative examples of what the committee believes to be important areas of research in the coming decade. In this section we describe recommendations for an additional $16 million annually to facilitate both the development of methodologically oriented research and, equally important, its communication throughout the research community.

Methodological studies, including early computer implementations, have for the most part been carried out by individual investigators with small teams of colleagues or students. Occasionally, such research has been associated with quite large substantive projects, and some of the current developments of computer packages, graphics, and expert systems clearly require large, organized efforts, which often lie at the boundary between grant-supported work and commercial development. As such research is often a key to understanding complex bodies of behavioral and social sciences data, it is vital to the health of these sciences that research support continue on methods relevant to problems of modeling, statistical analysis, representation, and related aspects of behavioral and social sciences data. Researchers and funding agencies should also be especially sympathetic to the inclusion of such basic methodological work in large experimental and longitudinal studies. Additional funding for work in this area, both in terms of individual research grants on methodological issues and in terms of augmentation of large projects to include additional methodological aspects, should be provided largely in the form of investigator-initiated project grants.

Ethnographic and comparative studies also typically rely on project grants to individuals and small groups of investigators. While this type of support should continue, provision should also be made to facilitate the execution of studies using these methods by research teams and to provide appropriate methodological training through the mechanisms outlined below.

Overall, we recommend an increase of $4 million in the level of investigator-initiated grant support for methodological work. An additional $1 million should be devoted to a program of centers for methodological research.

Many of the new methods and models described in the chapter, if and when adopted to any large extent, will demand substantially greater amounts of research devoted to appropriate analysis and computer implementation. New user interfaces and numerical algorithms will need to be designed and new computer programs written. And even when generally available methods (such as maximum-likelihood) are applicable, model application still requires skillful development in particular contexts. Many of the familiar general methods that are applied in the statistical analysis of data are known to provide good approximations when sample sizes are sufficiently large, but their accuracy varies with the specific model and data used. To estimate the accuracy requires extensive numerical exploration. Investigating the sensitivity of results to the assumptions of the models is important and requires still more creative, thoughtful research. It takes substantial efforts of these kinds to bring any new model on line, and the need becomes increasingly important and difficult as statistical models move toward greater realism, usefulness, complexity, and availability in computer form. More complexity in turn will increase the demand for computational power. Although most of this demand can be satisfied by increasingly powerful desktop computers, some access to mainframe and even supercomputers will be needed in selected cases. We recommend an additional $4 million annually to cover the growth in computational demands for model development and testing.

Interaction and cooperation between the developers and the users of statistical and mathematical methods need continual stimulation—both ways. Efforts should be made to teach new methods to a wider variety of potential users than is now the case. Several ways appear effective for methodologists to communicate to empirical scientists: running summer training programs for graduate students, faculty, and other researchers; encouraging graduate students, perhaps through degree requirements, to make greater use of the statistical, mathematical, and methodological resources at their own or affiliated universities; associating statistical and mathematical research specialists with large-scale data collection projects; and developing statistical packages that incorporate expert systems in applying the methods.

Methodologists, in turn, need to become more familiar with the problems actually faced by empirical scientists in the laboratory and especially in the field. Several ways appear useful for communication in this direction: encouraging graduate students in methodological specialties, perhaps through degree requirements, to work directly on empirical research; creating postdoctoral fellowships aimed at integrating such specialists into ongoing data collection projects; and providing for large data collection projects to engage relevant methodological specialists. In addition, research on and development of statistical packages and expert systems should be encouraged to involve the multidisciplinary collaboration of experts with experience in statistical, computer, and cognitive sciences.

A final point has to do with the promise held out by bringing different research methods to bear on the same problems. As our discussions of research methods in this and other chapters have emphasized, different methods have different powers and limitations, and each is designed especially to elucidate one or more particular facets of a subject. An important type of interdisciplinary work is the collaboration of specialists in different research methodologies on a substantive issue, examples of which have been noted throughout this report. If more such research were conducted cooperatively, the power of each method pursued separately would be increased. To encourage such multidisciplinary work, we recommend increased support for fellowships, research workshops, and training institutes.

Funding for fellowships, both pre- and postdoctoral, should be aimed at giving methodologists experience with substantive problems and at upgrading the methodological capabilities of substantive scientists. Such targeted fellowship support should be increased by $4 million annually, of which $3 million should be for predoctoral fellowships emphasizing the enrichment of methodological concentrations. The new support needed for research workshops is estimated to be $1 million annually. And new support needed for various kinds of advanced training institutes aimed at rapidly diffusing new methodological findings among substantive scientists is estimated to be $2 million annually.

  • Cite this Page National Research Council; Division of Behavioral and Social Sciences and Education; Commission on Behavioral and Social Sciences and Education; Committee on Basic Research in the Behavioral and Social Sciences; Gerstein DR, Luce RD, Smelser NJ, et al., editors. The Behavioral and Social Sciences: Achievements and Opportunities. Washington (DC): National Academies Press (US); 1988. 5, Methods of Data Collection, Representation, and Analysis.
  • PDF version of this title (16M)

In this Page

Other titles in this collection.

  • The National Academies Collection: Reports funded by National Institutes of Health

Recent Activity

  • Methods of Data Collection, Representation, and Analysis - The Behavioral and So... Methods of Data Collection, Representation, and Analysis - The Behavioral and Social Sciences: Achievements and Opportunities

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

Connect with NLM

National Library of Medicine 8600 Rockville Pike Bethesda, MD 20894

Web Policies FOIA HHS Vulnerability Disclosure

Help Accessibility Careers

statistics

data analysis for social research

Data Analysis for Social Science

Elena Llaudet

30% off with code PUP30

Before you purchase audiobooks and ebooks

Please note that audiobooks and ebooks purchased from this site must be accessed on the Princeton University Press app. After you make your purchase, you will receive an email with instructions on how to download the app. Learn more about audio and ebooks .

Support your local independent bookstore.

  • United States
  • United Kingdom

Data Analysis for Social Science: A Friendly and Practical Introduction

  • Elena Llaudet and Kosuke Imai

An ideal textbook for complete beginners—teaches from scratch R, statistics, and the fundamentals of quantitative social science

data analysis for social research

  • Look Inside
  • Request Exam Copy
  • Download Cover

Data Analysis for Social Science provides a friendly introduction to the statistical concepts and programming skills needed to conduct and evaluate social scientific studies. Assuming no prior knowledge of statistics and coding and only minimal knowledge of math, the book teaches the fundamentals of survey research, predictive models, and causal inference while analyzing data from published studies with the statistical program R. It teaches not only how to perform the data analyses but also how to interpret the results and identify the analyses’ strengths and limitations.

  • Progresses by teaching how to solve one kind of problem after another, bringing in methods as needed. It teaches, in this order, how to (1) estimate causal effects with randomized experiments, (2) visualize and summarize data, (3) infer population characteristics, (4) predict outcomes, (5) estimate causal effects with observational data, and (6) generalize from sample to population.
  • Flips the script of traditional statistics textbooks. It starts by estimating causal effects with randomized experiments and postpones any discussion of probability and statistical inference until the final chapters. This unconventional order engages students by demonstrating from the very beginning how data analysis can be used to answer interesting questions, while reserving more abstract, complex concepts for later chapters.
  • Provides a step-by-step guide to analyzing real-world data using the powerful, open-source statistical program R, which is free for everyone to use. The datasets are provided on the book’s website so that readers can learn how to analyze data by following along with the exercises in the book on their own computer.
  • Assumes no prior knowledge of statistics or coding.
  • Specifically designed to accommodate students with a variety of math backgrounds. It includes supplemental materials for students with minimal knowledge of math and clearly identifies sections with more advanced material so that readers can skip them if they so choose.
  • Provides cheatsheets of statistical concepts and R code.
  • Comes with instructor materials (upon request), including sample syllabi, lecture slides, and additional replication-style exercises with solutions and with the real-world datasets analyzed.

Looking for a more advanced introduction? Consider Quantitative Social Science by Kosuke Imai. In addition to covering the material in Data Analysis for Social Science , it teaches diffs-in-diffs models, heterogeneous effects, text analysis, and regression discontinuity designs, among other things.

data analysis for social research

  • 1.1 Book Overview
  • 1.2 Chapter Summaries
  • 1.3 How to Use This Book
  • 1.4.1 Learning to Code
  • 1.5 Getting Ready
  • 1.6.1 Doing Calculations in R
  • 1.6.2 Creating Objects in R
  • 1.6.3 Using Functions in R
  • 1.7.1 Setting the Working Directory
  • 1.7.2 Loading the Dataset
  • 1.7.3 Understanding the Data
  • 1.7.4 Identifying the Types of Variables Included
  • 1.7.5 Identifying the Number of Observations
  • 1.8.1 Accessing Variables inside Dataframes
  • 1.8.2 Means
  • 1.9 Summary
  • 1.10.1 Concepts and Notation
  • 1.10.2 R Symbols and Operators
  • 1.10.3 R Functions
  • 2.1 Project STAR
  • 2.2.1 Treatment Variables
  • 2.2.2 Outcome Variables
  • 2.3 Individual Causal Effects
  • 2.4.1 Randomized Experiments and the Difference-in-Means Estimator
  • 2.5.1 Relational Operators in R
  • 2.5.2 Creating New Variables
  • 2.5.3 Subsetting Variables
  • 2.6 Summary
  • 2.7.1 Concepts and Notation
  • 2.7.2 R Symbols and Operators
  • 2.7.3 R Functions
  • 3.1 The EU Referendum in the UK
  • 3.2.1 Random Sampling
  • 3.2.2 Potential Challenges
  • 3.3.1 Predicting the Referendum Outcome
  • 3.3.2 Frequency Tables
  • 3.3.3 Tables of Proportions
  • 3.4.1 Handling Missing Data
  • 3.4.2 Two-Way Frequency Tables
  • 3.4.3 Two-Way Tables of Proportions
  • 3.4.4 Histograms
  • 3.4.5 Density Histograms
  • 3.4.6 Descriptive Statistics
  • 3.5.1 Scatter Plots
  • 3.5.2 Correlation
  • 3.6 Summary
  • 3.7.1 Concepts and Notation
  • 3.7.2 R Symbols and Operators
  • 3.7.3 R Functions
  • 4.1 GDP and Night-Time Light Emissions
  • 4.2 Predictors, Observed vs. Predicted Outcomes, andPrediction Errors
  • 4.3.1 The Linear Regression Model
  • 4.3.2 The Intercept Coefficient
  • 4.3.3 The Slope Coefficient
  • 4.3.4 The Least Squares Method
  • 4.4.1 Relationship between GDP and Prior GDP
  • 4.4.2 With Natural Logarithm Transformations
  • 4.5 Predicting GDP Growth Using Night-Time LightEmissions
  • 4.6.1 How Well Do the Three Predictive Modelsin This Chapter Fit the Data?
  • 4.7 Summary
  • 4.8 Appendix: Interpretation of the Slope in the Log-Log Linear Model
  • 4.9.1 Concepts and Notation
  • 4.9.2 R Functions
  • 5.1 Russian State-Controlled TV Coverage of 2014Ukrainian Affairs
  • 5.2.1 Confounding Variables
  • 5.2.2 Why Are Confounders a Problem?
  • 5.2.3 Confounders in Randomized Experiments
  • 5.3.1 Using the Simple Linear Model to Computethe Difference-in-Means Estimator
  • 5.3.2 Controlling for Confounders Using aMultiple Linear Regression Model
  • 5.4.1 Using the Simple Linear Model to Computethe Difference-in-Means Estimator
  • 5.4.2 Controlling for Confounders Using aMultiple Linear Regression Model
  • 5.5.1 Randomized Experiments vs.Observational Studies
  • 5.5.2 The Role of Randomization
  • 5.5.3 How Good Are the Two Causal Analysesin This Chapter?
  • 5.5.4 How Good Was the Causal Analysis inChapter 2?
  • 5.5.5 The Coefficient of Determination, R 2
  • 5.6 Summary
  • 5.7.1 Concepts and Notation
  • 5.7.2 R Functions
  • 6.1 What Is Probability?
  • 6.2 Axioms of Probability
  • 6.3 Events, Random Variables, and ProbabilityDistributions
  • 6.4.1 The Bernoulli Distribution
  • 6.4.2 The Normal Distribution
  • 6.4.3 The Standard Normal Distribution
  • 6.4.4 Recap
  • 6.5.1 The Law of Large Numbers
  • 6.5.2 The Central Limit Theorem
  • 6.5.3 Sampling Distribution of the Sample Mean
  • 6.6 Summary
  • 6.7 Appendix: For Loops
  • 6.8.1 Concepts and Notation
  • 6.8.2 R Symbols and Operators
  • 6.8.3 R Functions
  • 7.1 Estimators and Their Sampling Distributions
  • 7.2.1 For the Sample Mean
  • 7.2.2 For the Difference-in-Means Estimator
  • 7.2.3 For Predicted Outcomes
  • 7.3.1 With the Difference-in-Means Estimator
  • 7.3.2 With Estimated Regression Coefficients
  • 7.4 Statistical vs. Scientific Significance
  • 7.5 Summary
  • 7.6.1 Concepts and Notation
  • 7.6.2 R Symbols and Operators
  • 7.6.3 R Functions
  • Index of Concepts
  • Index of Mathematical Notation
  • Index of R and RStudio

“This is the book that I plan to teach from next time I teach introductory statistics. As it is, I recommend it as a reference for students in more advanced classes such as Applied Regression and Causal Inference, if they want a clean refresher from first principles.”—Andrew Gelman, coauthor of Regression and Other Stories

“This is without doubt the best book to get started with data analysis in the social sciences. Readers learn best practices in research design, measurement, data analysis, and data visualization, all in an approachable and engaging way. My students—all of them complete novices—were easily able to conduct their own analyses after working through this book.”—Simon Weschle, Syracuse University

“My favorite feature of Data Analysis for Social Science is that it puts causal inference first, before probability and statistical inference. I have found that this unconventional order is gentler and more engaging for complete beginners than the approach used in many other books. It also allows students with some prior knowledge of statistics to learn something new from the start.”—Max Goplerud, University of Pittsburgh

“I love this book. More importantly, my students love this book. Data Analysis for Social Science is the perfect introduction to causal inference, probability and statistics, and the open-source programming language R, for students without prior experience. With multiple exercises using R Markdown and a variety of datasets drawn from the research literature, Data Analysis for Social Science gives students a hands-on path to build their skills and confidence.”—Anna Harvey, New York University

“Data science from zero to sixty—gently, expertly, quickly.”—Gary King, Weatherhead University Professor, Harvard University

“This book will transform the way we teach data science in the social sciences. Assuming zero background knowledge, it takes readers step-by-step through the most important concepts of data analysis and coding without sacrificing rigor. With clear explanations, beautiful visuals, and engaging examples, Data Analysis for Social Science is the obvious choice for any student looking to build their data science tool kit.”—Molly Roberts, University of California, San Diego

“I have been teaching statistics for twenty-five years and I have never seen a book this well done. Data Analysis for Social Science is such a perfect combination of what students need to know. The authors’ descriptions of the basic logic of causality, along with the many practical examples and visuals, are amazing features. Also, I have been resisting teaching intro students R because I am very watchful of overloading their bandwidth and I worry about killing their spirit with buggy code; I want them to love data analysis as much as I do! This book made me a convert. I am going to spend the time to learn R so that I can assign this book.”—Vanessa Baird, University of Colorado, Boulder

“I have used Data Analysis for Social Science to teach required undergraduate courses with great success. Students liked the clear explanations and relevant real-world examples, and they even found coding in R fun! By the end, they walked away excited about how these skills opened up new career opportunities and helped them understand the research discussed in other classes.”—Alicia Cooperman, George Washington University

“Looking to get started with data science, but scared it’d be too complicated? This book has you covered. Data Analysis for Social Science truly delivers what the title claims: friendly and practical. The focus is on experimental data and causal inference much more than on multiple regression analysis, reflecting recent developments in the social sciences. I don’t think I’ve seen a more accessible introduction to R and RStudio—cheat sheets included!”—Didier Ruedin, University of Neuchâtel

“Following the step-by-step guidance provided in this book, I built my skills in R rather than another expensive proprietary software, allowing me to share my growing knowledge with my working-class, first-generation students. I am confident I can continue to independently develop these skills in ways that support both my teaching and research.”—Jamie D. Gravell, California State University, Stanislaus

“At last, we have a truly modern introduction to social science statistics. The authors do not shy away from topics like causal inference, and they gently and seamlessly integrate instructions on how to use R. This textbook is a generous gift to both students and teachers.”—Valerio Baćak, School of Criminal Justice, Rutgers University, Newark

“A very sensible and intuitive introduction to data science. Llaudet and Imai do an excellent job of explaining the why of data analysis along with the how. I would recommend this book to anyone looking for a nice primer on data science coupled with a good set of tools using the R software.”—Craig Depken, University of North Carolina, Charlotte

“ Data Analysis for Social Science is a great textbook for any undergraduate research methods course. I especially like that it teaches point estimates and uncertainty separately. In the past, when I taught these concepts together, I found students were overwhelmed. Breaking them up makes the statistics easier to understand. It’s a genius idea! I truly can’t recommend this book enough!”—Christopher Ojeda, University of California, Merced

Stay connected for new books and special offers. Subscribe to receive a welcome discount for your next order. 

  • ebook & Audiobook Cart

Library Home

Social Data Analysis

data analysis for social research

Mikaila Mariel Lemonik Arthur, Rhode Island College

Roger Clark, Rhode Island College

Copyright Year: 2021

Last Update: 2023

Publisher: Rhode Island College Digital Publishing

Language: English

Formats Available

Conditions of use.

Attribution-NonCommercial-ShareAlike

Learn more about reviews.

Reviewed by Alice Cheng, Associate Professor, North Carolina State University on 12/19/23

Social Data Analysis: A Comprehensive Guide" truly lives up to its title by offering a comprehensive exploration of both quantitative and qualitative data analysis in the realm of social research. The book provides an in-depth understanding of the... read more

Comprehensiveness rating: 4 see less

Social Data Analysis: A Comprehensive Guide" truly lives up to its title by offering a comprehensive exploration of both quantitative and qualitative data analysis in the realm of social research. The book provides an in-depth understanding of the subject matter, making it a valuable resource for readers seeking a thorough grasp of social data analysis.

The comprehensiveness of the book is evident in several key aspects:

Coverage of Quantitative and Qualitative Methods:

The book effectively covers both quantitative and qualitative data analysis, acknowledging the importance of a balanced approach in social research. Readers benefit from a holistic understanding of various analytical methods, allowing them to choose the most suitable approach for their research questions. Focus on SPSS for Quantitative Analysis:

The dedicated section on quantitative data analysis with SPSS demonstrates the book's commitment to providing practical guidance. Readers are taken through the nuances of using SPSS, from basic functions to more advanced analysis, enhancing their proficiency in a widely used statistical software. Real-World Application Using GSS Data:

The integration of data from the 2021 General Social Survey (GSS) and the modified GSS Codebook adds a practical dimension to the book. Readers have the opportunity to apply their learning to real-world scenarios, fostering a deeper understanding of social data analysis in action. Consideration of Ethical Practices:

The book's mention of survey weights and their exclusion from the learning dataset reflects a commitment to ethical data analysis practices. This attention to ethical considerations enhances the comprehensiveness of the book by addressing important aspects of responsible research. Supplementary Resources and Glossary:

The inclusion of a glossary ensures that readers, especially those new to the field, can easily grasp the terminology used. The availability of supplementary resources, such as a modified GSS Codebook, further supports readers in applying their knowledge beyond theoretical discussions. Recognition of Alternative Tools:

Acknowledging the existence of alternative tools, such as R, demonstrates the book's awareness of the diversity in data analysis approaches. While focusing on SPSS, the book encourages readers to explore other options, contributing to a more nuanced and well-rounded education in social data analysis. Overall, the book's comprehensiveness lies not only in its coverage of various data analysis methods but also in its commitment to providing practical, ethical, and diverse perspectives on social data analysis. It serves as an inclusive and accessible guide for readers at different levels of expertise.

Content Accuracy rating: 4

"Social Data Analysis: A Comprehensive Guide" maintains a commendable level of accuracy throughout its content. The authors demonstrate a meticulous approach to presenting information, ensuring that concepts are explained with precision and clarity. The accuracy is particularly notable in the sections covering quantitative data analysis with SPSS, where step-by-step instructions are provided for readers to follow, minimizing the risk of misinterpretation.

The use of real-world examples from the 2021 General Social Survey enhances the book's accuracy by grounding theoretical discussions in practical applications. The modified GSS Codebook is a thoughtful addition, contributing to the accuracy of the learning experience by providing a clear reference for variables used in the examples.

The authors' acknowledgment of the limitation regarding survey weights in the learning dataset reflects a commitment to transparency and ethical research practices. While the book focuses on a specific statistical software (SPSS), it accurately recognizes alternative tools like R, allowing readers to make informed decisions based on their preferences and requirements.

The glossary aids in maintaining accuracy by providing clear definitions of key terms, ensuring that readers have a precise understanding of the terminology used. Additionally, the reference to external resources, such as IBM's list of resellers and related guides from Kent State, contributes to the accuracy of the book by directing readers to authoritative sources for further information.

In conclusion, "Social Data Analysis: A Comprehensive Guide" upholds a high level of accuracy, presenting information in a manner that is both reliable and accessible. The book's attention to detail, reliance on real-world examples, and commitment to ethical considerations collectively contribute to its overall accuracy as a valuable resource for those engaging in social data analysis.

Relevance/Longevity rating: 4

"Social Data Analysis: A Comprehensive Guide" stands out for its relevance in the field of social research and data analysis. Several key aspects contribute to the book's contemporary and practical relevance:

Integration of Current Data:

The incorporation of data from the 2021 General Social Survey (GSS) ensures that the book's examples and applications are based on recent and relevant datasets. This contemporary approach allows readers to engage with real-world scenarios and analyze data reflective of current social trends.

Focus on SPSS and Alternative Tools:

The book's emphasis on using SPSS for quantitative data analysis aligns with the software's widespread use in the social sciences. This focus enhances the book's relevance for readers in academic and professional settings where SPSS is commonly employed. Moreover, the acknowledgment of alternative tools, such as R, adds relevance by catering to a diverse audience with varying software preferences.

Practical Applications:

The inclusion of practical examples, screenshots, and step-by-step instructions in the section on quantitative data analysis with SPSS enhances the book's relevance. Readers can directly apply the concepts learned, fostering a hands-on learning experience that is directly applicable to their research or academic pursuits.

Ethical Considerations:

The discussion on ethical considerations, particularly the mention of survey weights and their exclusion from the learning dataset, adds relevance by addressing contemporary concerns in research methodology. This ethical awareness aligns with current discussions surrounding responsible and transparent research practices.

Diversity of Analytical Approaches:

The book's acknowledgment of alternative methods, such as qualitative and mixed methods data analysis with Dedoose, contributes to its relevance by recognizing the diversity of approaches within the social sciences. This inclusivity allows readers to explore different analytical methods based on their research needs.

Supplementary Resources:

The provision of supplementary resources, including the modified GSS Codebook and references to external guides, enhances the book's relevance. These resources offer readers additional tools and information to extend their learning beyond the book, ensuring that they stay updated on best practices and advancements in social data analysis.

In summary, "Social Data Analysis: A Comprehensive Guide" remains relevant by incorporating current data, addressing ethical considerations, and catering to a diverse audience with practical examples and alternative tools. The book's contemporary approach aligns with the evolving landscape of social research and data analysis, making it a valuable and relevant resource for students, researchers, and practitioners alike.

Clarity rating: 4

"Social Data Analysis: A Comprehensive Guide" excels in clarity, offering readers a lucid and accessible journey through the intricate landscape of social data analysis. Several factors contribute to the clarity of the book:

Clear Explanations and Language:

The authors employ clear and concise language, making complex concepts in social data analysis accessible to a broad audience. Technical terms are explained in a straightforward manner, enhancing comprehension for readers regardless of their prior knowledge in the field.

Step-by-Step Instructions:

The section on quantitative data analysis with SPSS stands out for its clarity due to the inclusion of step-by-step instructions. Readers are guided through processes, ensuring that they can follow and replicate actions easily. This approach fosters a practical understanding of how to apply the theoretical concepts discussed.

Visual Aids and Examples:

The use of visual aids, such as screenshots and examples, enhances clarity by providing readers with visual cues to reinforce textual explanations. Real-world examples from the 2021 General Social Survey help readers connect theoretical concepts to practical applications, furthering their understanding.

Logical Organization:

The book follows a logical and well-organized structure, moving from introducing social data analysis to specific tools and methods. This logical progression aids in the clarity of the learning journey, allowing readers to build on their understanding progressively.

Glossary for Terminology:

The inclusion of a glossary ensures that readers can easily reference and understand key terminology. This contributes to overall clarity by preventing confusion about specialized terms used in the context of social data analysis.

Consideration of Different Audiences:

The book is mindful of different audiences by providing options for both students and faculty. This consideration adds clarity by tailoring content to the specific needs and perspectives of these distinct reader groups.

Transparency Regarding Limitations:

The book's transparency regarding limitations, such as the exclusion of survey weights from the learning dataset, contributes to clarity. Readers are made aware of the scope and purpose of the dataset, avoiding potential confusion about its applicability to real-world scenarios.

In summary, "Social Data Analysis: A Comprehensive Guide" is characterized by its clarity, achieved through clear explanations, practical examples, logical organization, and thoughtful consideration of the diverse needs of its readership. The book effectively demystifies social data analysis, making it an approachable and enlightening resource for individuals at various levels of expertise.

Consistency rating: 4

"Social Data Analysis: A Comprehensive Guide" maintains a high level of consistency throughout its content, ensuring a cohesive and reliable learning experience. The consistency is evident in the uniform and clear language used across chapters, providing a seamless transition for readers as they navigate different sections of the book. The logical organization of topics and the structured approach to quantitative data analysis with SPSS contribute to a consistent learning curve, allowing readers to progressively build on their knowledge. Additionally, the inclusion of real-world examples and visual aids is consistently applied, enhancing the practicality of the book. The authors' commitment to ethical considerations, such as the transparency about the exclusion of survey weights in the learning dataset, reflects a consistent adherence to responsible research practices. Overall, the book's internal coherence, both in language and content, ensures that readers experience a consistent and reliable guide in their exploration of social data analysis.

Modularity rating: 3

"Social Data Analysis: A Comprehensive Guide" excels in modularity, providing a well-organized and modular structure that enhances the learning experience. The book is divided into distinct sections, each focusing on specific aspects of social data analysis. This modular approach allows readers to navigate the content efficiently, catering to different learning preferences and enabling targeted study.

The modularity is evident in the clear demarcation of chapters, from the introduction of social data analysis to the practical application of quantitative data analysis with SPSS and qualitative data analysis with Dedoose. Each section is designed as a standalone module, contributing to a structured and cohesive learning path.

Furthermore, within each module, the book maintains a modular design with sub-sections, ensuring that readers can easily locate and focus on specific topics of interest. The step-by-step instructions provided in the quantitative data analysis section exemplify this modular design, breaking down complex processes into manageable and easily digestible components.

The inclusion of supplementary resources, such as the modified GSS Codebook and glossary, adds to the modularity by offering readers standalone references that complement the main content. This modularity enhances the accessibility of the book, allowing readers to customize their learning experience based on their specific needs and interests.

In conclusion, the modularity of "Social Data Analysis: A Comprehensive Guide" contributes to the book's effectiveness as an educational resource. The well-structured and modular design facilitates a flexible and user-friendly learning experience, making it a valuable tool for readers seeking to navigate the complexities of social data analysis at their own pace.

Organization/Structure/Flow rating: 4

"Social Data Analysis: A Comprehensive Guide" is a well-structured and informative book that serves as an invaluable resource for students and faculty delving into the realm of social data analysis. The authors adeptly navigate readers through the intricacies of both quantitative and qualitative data analysis, placing a specific emphasis on the use of SPSS (Statistical Package for the Social Sciences) for quantitative analysis.

The book begins with a solid foundation, introducing readers to the concept of social data analysis. The initial sections provide a clear understanding of the importance and application of both quantitative and qualitative methods in social research. Notably, the authors strike a balance between theory and practical application, ensuring that readers can grasp the concepts and implement them effectively.

The heart of the book lies in its detailed exploration of quantitative data analysis with SPSS. The authors guide readers through the usage of this powerful statistical software, offering practical insights and step-by-step instructions. The inclusion of screenshots and examples using data from the 2021 General Social Survey enhances the book's accessibility, allowing readers to follow along seamlessly.

Furthermore, the book goes beyond theoretical discussions and provides a modified GSS Codebook for the data used in the text. This resource is invaluable for readers who wish to apply their knowledge to real-world scenarios. The authors' emphasis on the importance of survey weights and their exclusion from the learning dataset demonstrates a commitment to ethical and accurate data analysis practices.

The inclusion of a glossary enriches the learning experience by providing clear definitions of key terms. Additionally, the section on qualitative and mixed methods data analysis with Dedoose broadens the scope of the book, catering to readers interested in a diverse range of analytical approaches.

While the book excels in elucidating complex topics, it does not shy away from acknowledging alternative tools. The authors rightly introduce R as an open-source alternative, recognizing its significance and suggesting that R supplements to the book may be available in the future.

In conclusion, "Social Data Analysis: A Comprehensive Guide" stands out as a comprehensive and accessible resource for individuals venturing into the field of social data analysis. The authors' expertise, coupled with practical examples and supplementary resources, make this book a valuable companion for students, faculty, and anyone keen on mastering the art and science of social data analysis.

Interface rating: 4

The text is free of significant interface issues, including navigation problems, distortion of images/charts, and any other display features that may distract or confuse the reader.

Grammatical Errors rating: 5

The book contains no grammatical errors.

Cultural Relevance rating: 5

The text is not culturally insensitive or offensive in any way.

Table of Contents

  • Acknowledgements
  • How to Use This Book
  • Section I. Introducing Social Data Analysis
  • Section II. Quantitative Data Analysis
  • Section III. Qualitative Data Analysis
  • Section IV. Quantitative Data Analysis with SPSS
  • Section V. Qualitative and Mixed Methods Data Analysis with Dedoose
  • Modified GSS Codebook for the Data Used in this Text
  • Works Cited
  • About the Authors


About the Book

Social data analysis enables you, as a researcher, to organize the facts you collect during your research. Your data may have come from a questionnaire survey, a set of interviews, or observations. They may be data that have been made available to you from some organization, national or international agency or other researchers. Whatever their source, social data can be daunting to put together in a way that makes sense to you and others.

This book is meant to help you in your initial attempts to analyze data. In doing so it will introduce you to ways that others have found useful in their attempts to organize data. You might think of it as like a recipe book, a resource that you can refer to as you prepare data for your own consumption and that of others. And, like a recipe book that teaches you to prepare simple dishes, you may find this one pretty exciting. Analyzing data in a revealing way is at least as rewarding, we’ve found, as it is to cook up a yummy cashew carrot paté or a steaming corn chowder. We’d like to share our pleasure with you.

About the Contributors

Mikaila Mariel Lemonik Arthur is Professor of Sociology at Rhode Island College, where she has taught a wide variety of courses including Social Research Methods, Social Data Analysis, Senior Seminar in Sociology, Professional Writing for Justice Services, Comparative Law and Justice, Law and Society, Comparative Perspectives on Higher Education, and Race and Justice. She has written a number of books and articles, including both those with a pedagogical focus (including Law and Justice Around the World, published by the University of California Press) and those focusing on her scholarly expertise in higher education (including Student Activism and Curricular Change in Higher Education, published by Routledge). She has expertise and experience in academic program review, translating research findings for policymakers, and disability accessibility in higher education, and has served as a department chair and as Vice President of the RIC/AFT, her faculty union. Outside of work, she enjoys reading speculative fiction, eating delicious vegan food, visiting the ocean, and spending time with amazing humans.

Roger Clark is Professor Emeritus of Sociology at Rhode Island College, where he continues to teach courses in Social Research Methods and Social Data Analysis and to coauthor empirical research articles with undergraduate students. He has coauthored two textbooks, An Invitation to Social Research (with Emily Stier Adler) and Gender Inequality in Our Changing World: A Comparative Approach (with Lori Kenschaft and Desirée Ciambrone). He has been ranked by the USTA in its New England 60- and 65-and-older divisions, shot four holes in one on genuine golf courses, and run multiple half and full marathons. Like the Energizer Bunny, he keeps on going and going, but, given his age, leaves it to your imagination where


The Importance of Critical Data Analysis for the Social Sciences

Current social science research and writing faces a number of possibilities that seem to be constrained by three major challenges. The first is the limits of the imagination; the second is knowing what kinds of data are now out there; and the third is having the tools to aggregate and mine them.

Extend this beyond the act of thinking about the publication of the work to doing the research itself—that is, to almost any other question in any social science field. Sensors are everywhere—traffic sensors, security footage, the digital tracks that we strew all over now that we are citizens of the Internet. These digital traces are everywhere: there are records that are being kept, sometimes passively, sometimes actively, sometimes curated, sometimes not; there are tracks of data that we are all leaving and have been leaving for at least the past two decades that could answer questions, or pose interesting questions to ask, that require the active stirring of human curiosity to imagine. Add to that the text and data mining of enormous collections of literary texts and the digitizing of earlier analog data sources, and the possibilities within which to apply our cognitive skills grow further.

The second challenge is tied to the first: for that curiosity to be activated, one would need to have a sense of what data are actually out there and therefore what one would use and what one would need to gather anew. What do we know about real-time traffic patterns, or the movements of people from one domicile to another, or the economic transactions of a certain group of people over time? We know that there are certain kinds of archives out there—like the collection of tweets at the Library of Congress or the Internet Archive of the web—but how many social scientists are aware of the work of the New York firm Sparks and Honey that has been tracking trends across the planet in ways that are crucial to corporations but could also be the most valuable kind of research data archive for any number of issues of immense interest to various social science fields? They claim that they work in a space of “people and platforms, man and machine, ideas and algorithms, magic and math.” The data they collect are curated, carefully housed, and searchable in myriad ways. And what about those tech companies who know what we read and when we read it and how much of what we are reading we actually page through, what we buy and how much it costs, where we are at a certain moment in time, and can sort those data in categories that might not yet have been imagined by social scientists?

The third challenge—let us posit that we can imagine what it is that we want to ask and that we can start to get a sense of what possible data are out there (I am more sanguine about the first than the second)—is to get the access and the permission to publish from it. Some privileged researchers can get to Google searches at a more finely grained level—how finely grained it actually may be is likely not known fully by the researchers themselves, because they may not know how much is really available or what portion they are being allowed to see. Then there are the kinds of issues that may strike terror into an institutional review board: Can one conceivably use the kinds of information that have been collected in ways that would pass muster through the traditional process? What if we are dealing with materials across national borders with different legal privacy regimes? What private agreements have corporations who collect data as a matter of course made in different countries over time, and how is that reflected in the data that are extant?

In the face of these three challenges, we continue to do research about questions that we pose and write up the results. One way that some of us deal with these issues is to delimit our work: we confine our questions to the worlds of research with which we are most comfortable, avoid privacy issues as far as we can, and so never wander into areas that would require us to think about all of the sources we might imagine. But I suspect that none of us is willing to constrain ourselves in the long run to tools of the trade that are losing their finely honed edges. Learning what is out there is becoming more and more of a profession in itself: the data curator and the data scientist are two of our newest job titles, and the holders of these positions are now working in research libraries and research universities in all fields of the social sciences. The Alfred P. Sloan Foundation has been one of the leaders in this area, developing the careers of recent PhDs in conjunction with the Council on Library and Information Resources (CLIR) in placing postdocs in libraries. Working with the Moore Foundation, Sloan is funding data scientists at the University of Washington, at Berkeley, and at New York University. So finding a colleague in one's field who is a data curator or a data scientist when one is posing a research question would be a good way to start. The murky legal terrain of privacy and of copyright across national borders is a much more complicated issue, and the fear of putting people's lives in danger—because multiple sources could be triangulated to uncover the identity of a supposedly anonymous person—makes it very difficult to rely upon older methods of protecting one's informants.

Let me end on another note, not a fourth challenge to the social sciences per se, but to the entire world of scholarship. We are increasingly in a position where the basic way of interacting with words, numbers, images, and data of all sorts is no longer possible without machine intervention. We can read distantly, we search electronically, we collect and file digitally, we write on a machine. It seems to me that the one tool that once remained within our own individual control was the act of reading and what followed from that: thinking about and writing about what we read. Our only technical aid was perhaps a pair of glasses and the organizational skills of those who collected and curated what was published, making it available to us. When reading becomes distant, when interaction with data becomes entirely electronic, and when the searching for materials is mediated by software, the scholar cannot in good faith rely on algorithms into whose workings one has had no input, on searches that are mediated by a third party, or on software that makes assumptions of which the researcher is unaware. Building our own tools, understanding how the tools that others build actually operate, and knowing that we are working with data in a way that we understand and can rely upon might be the greatest challenge of the years ahead. Searching for and finding everything relevant on a certain topic—work that used to be guided by the values of the library and university community—is key to that challenge: finding a way to take that work back into the hands of those whose commitment is to the stewardship of information about, and knowledge of, our cultural heritage should be central to the work of scholarship.


Elliott Shore

In January 2013, Elliott Shore took on the role of executive director of the Association of Research Libraries, a nonprofit organization of 124 research libraries at comprehensive, research-extensive institutions in the US and Canada. In 1997, he joined Bryn Mawr College as director of libraries and professor of history. He served the previous twelve years as a library director at the Institute for Advanced Study, a post he assumed in 1985, just after he was awarded his PhD in history from Bryn Mawr College. Shore received his MS in library science from Drexel University, and he earned his MA in...


Theory-Based Data Analysis for the Social Sciences

  • By: Carol S. Aneshensel
  • Publisher: SAGE Publications, Inc.
  • Publication year: 2013
  • Online pub date: December 23, 2015
  • Discipline: Anthropology
  • Methods: Independent variables, Dependent variables, Secondary data analysis
  • DOI: https://doi.org/10.4135/9781506335094
  • Keywords: emotion, estimates, mediation, mental health, population, social data, social science
  • Print ISBN: 9781412994354
  • Online ISBN: 9781506335094


This book presents a method for bringing data analysis and statistical technique into line with theory. The author begins by describing the elaboration model for analyzing the empirical association between variables. She then introduces a new concept into this model, the focal relationship. Building upon the focal relationship as the cornerstone for all subsequent analysis, two analytic strategies are developed to establish its internal validity: an exclusionary strategy to eliminate alternative explanations, and an inclusive strategy which looks at the interconnected set of relationships predicted by theory. Using real examples of social research, the author demonstrates the use of this approach for two common forms of analysis, multiple linear regression and logistic regression. Whether learning data analysis for the first time or adding new techniques to your repertoire, this book provides an excellent basis for theory-based data analysis.
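
As a rough, illustrative sketch of the exclusionary strategy described above (not an example taken from the book), the Python snippet below fits a hypothetical focal relationship with and without a possible confounder using statsmodels; all variable names and data are invented. A logistic variant could be fit the same way with smf.logit for a binary outcome.

# Illustrative sketch only: a hypothetical focal relationship (stress -> distress)
# checked against a possible confounder (income), in the spirit of the elaboration model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
income = rng.normal(50, 10, n)                                      # potential control variable
stress = 10 - 0.1 * income + rng.normal(0, 2, n)                    # focal independent variable
distress = 5 + 0.8 * stress - 0.05 * income + rng.normal(0, 2, n)   # dependent variable
df = pd.DataFrame({"income": income, "stress": stress, "distress": distress})

focal = smf.ols("distress ~ stress", data=df).fit()                 # focal relationship alone
adjusted = smf.ols("distress ~ stress + income", data=df).fit()     # add the control variable

# If the stress coefficient shrinks toward zero once income is controlled, the focal
# association may be partly spurious; if it persists, that alternative explanation
# is ruled out (the exclusionary strategy).
print(focal.params["stress"], adjusted.params["stress"])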

Front Matter

  • Acknowledgments
  • About the Author
  • Chapter 1 | Introduction to Theory-Based Data Analysis
  • Chapter 2 | The Logic of Theory-Based Data Analysis
  • Chapter 3 | Relationships as Associations
  • Chapter 4 | The Focal Relationship: Causal Inference
  • Chapter 5 | The Elaboration Model with Multiple Linear Regression
  • Chapter 6 | Regression with Survey Data from Complex Samples
  • Chapter 7 | Ruling Out Alternative Explanations: Spuriousness and Control Variables
  • Chapter 8 | Ruling Out Alternative Theoretical Explanations: Rival Independent Variables
  • Chapter 9 | Elaborating the Focal Relationship: Mediation and Intervening Variables
  • Chapter 10 | Elaborating the Focal Relationship: Antecedent and Consequent Variables
  • Chapter 11 | Specifying Conditions of Influence: Moderating Variables
  • Chapter 12 | The Elaboration Model with Logistic Regression
  • Chapter 13 | Synthesis and Comments

Back Matter

  • Abbreviations



Adventures in Social Research Data Analysis Using IBM SPSS Statistics

  • Earl Babbie - Chapman University, USA
  • William E. Wagner, III - California State University, Dominguez Hills, USA
  • Jeanne Zaino - Iona College, USA

Supplements

This text has been a lifesaver! Although the material is challenging, I have been continually impressed with my student’s ability to come away from this course with the ability to perform their own (small) data analysis project in the final week using what they learned. . . . Many start with zero knowledge or experience with research, and in a very short time period are able to get up to speed with the terminology, and to sift through all of the various ‘rules’ of data analysis (which measures of association, tests of significance, etc. to use based on their variables) like pros.

Was already adopted by the prior prof. It is a good book and I kept it adopted. It does need help with modernizing some phrases and data though -- esp in the lab assignments.

Very practical and student-friendly!

Uses SPSS. Provides good instructions, generally, but there continue to be the same errors edition after edition.

Good support for teaching SPSS

Easy to understand for beginners

Sample Materials & Chapters

Chapter 1. Introduction

Chapter 2. The Logic of Measurement



13 Qualitative analysis

Qualitative analysis is the analysis of qualitative data such as text data from interview transcripts. Unlike quantitative analysis, which is statistics driven and largely independent of the researcher, qualitative analysis is heavily dependent on the researcher’s analytic and integrative skills and personal knowledge of the social context where the data is collected. The emphasis in qualitative analysis is ‘sense making’ or understanding a phenomenon, rather than predicting or explaining. A creative and investigative mindset is needed for qualitative analysis, based on an ethically enlightened and participant-in-context attitude, and a set of analytic strategies. This chapter provides a brief overview of some of these qualitative analysis strategies. Interested readers are referred to more authoritative and detailed references such as Miles and Huberman’s (1984) [1] seminal book on this topic.

Grounded theory

How can you analyse a vast set of qualitative data acquired through participant observation, in-depth interviews, focus groups, narratives of audio/video recordings, or secondary documents? One such technique for analysing text data is grounded theory—an inductive technique of interpreting recorded data about a social phenomenon to build theories about that phenomenon. The technique was developed by Glaser and Strauss (1967) [2] in their method of constant comparative analysis of grounded theory research, and further refined by Strauss and Corbin (1990) [3] to further illustrate specific coding techniques—a process of classifying and categorising text data segments into a set of codes (concepts), categories (constructs), and relationships. The interpretations are ‘grounded in’ (or based on) observed empirical data, hence the name. To ensure that the theory is based solely on observed evidence, the grounded theory approach requires that researchers suspend any pre-existing theoretical expectations or biases before data analysis, and let the data dictate the formulation of the theory.

Strauss and Corbin (1998) describe three coding techniques for analysing text data: open, axial, and selective. Open coding is a process aimed at identifying concepts or key ideas that are hidden within textual data, which are potentially related to the phenomenon of interest. The researcher examines the raw textual data line by line to identify discrete events, incidents, ideas, actions, perceptions, and interactions of relevance that are coded as concepts (hence called in vivo codes). Each concept is linked to specific portions of the text (coding unit) for later validation. Some concepts may be simple, clear, and unambiguous, while others may be complex, ambiguous, and viewed differently by different participants. The coding unit may vary with the concepts being extracted. Simple concepts such as ‘organisational size’ may include just a few words of text, while complex ones such as ‘organisational mission’ may span several pages. Concepts can be named using the researcher’s own naming convention, or standardised labels taken from the research literature. Once a basic set of concepts is identified, these concepts can then be used to code the remainder of the data, while simultaneously looking for new concepts and refining old concepts. While coding, it is important to identify the recognisable characteristics of each concept, such as its size, colour, or level—e.g., high or low—so that similar concepts can be grouped together later. This coding technique is called ‘open’ because the researcher is open to and actively seeking new concepts relevant to the phenomenon of interest.

Next, similar concepts are grouped into higher order categories. While concepts may be context-specific, categories tend to be broad and generalisable, and ultimately evolve into constructs in a grounded theory. Categories are needed to reduce the number of concepts the researcher must work with and to build a ‘big picture’ of the issues salient to understanding a social phenomenon. Categorisation can be done in phases, by combining concepts into subcategories, and then subcategories into higher order categories. Constructs from the existing literature can be used to name these categories, particularly if the goal of the research is to extend current theories. However, caution must be taken while using existing constructs, as such constructs may bring with them commonly held beliefs and biases. For each category, its characteristics (or properties) and the dimensions of each characteristic should be identified. The dimension represents a value of a characteristic along a continuum. For example, a ‘communication media’ category may have a characteristic called ‘speed’, which can be dimensionalised as fast, medium, or slow. Such categorisation helps differentiate between different kinds of communication media, and enables researchers to identify patterns in the data, such as which communication media is used for which types of tasks.
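
To make the idea of categories, properties, and dimensions concrete, here is a brief sketch (not from the original chapter) that represents a hypothetical coding scheme as Python data structures; the category, property values, and coded segment are invented.

# Hypothetical open-coding output: concepts grouped into a category whose property
# ("speed") is dimensionalised along a continuum (fast/medium/slow).
from dataclasses import dataclass, field

@dataclass
class Category:
    name: str
    properties: dict = field(default_factory=dict)   # property -> possible dimensions
    concepts: list = field(default_factory=list)     # related concepts (in vivo codes)

communication_media = Category(
    name="communication media",
    properties={"speed": ["fast", "medium", "slow"]},
    concepts=["email", "face-to-face meeting", "memo"],
)

# A coded segment links a concept to a portion of text and records where it falls on
# each dimension, so patterns (e.g., which media are used for which tasks) can be sought.
coded_segment = {
    "concept": "email",
    "category": communication_media.name,
    "speed": "fast",
    "text": "I just fire off an email when I need a quick answer.",
}
print(coded_segment["category"], coded_segment["speed"])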

The second phase of grounded theory is axial coding, where the categories and subcategories are assembled into causal relationships or hypotheses that can tentatively explain the phenomenon of interest. Although distinct from open coding, axial coding can be performed simultaneously with open coding. The relationships between categories may be clearly evident in the data, or may be more subtle and implicit. In the latter instance, researchers may use a coding scheme (often called a ‘coding paradigm’, but different from the paradigms discussed in Chapter 3) to understand which categories represent conditions (the circumstances in which the phenomenon is embedded), actions/interactions (the responses of individuals to events under these conditions), and consequences (the outcomes of actions/interactions). As conditions, actions/interactions, and consequences are identified, theoretical propositions start to emerge, and researchers can start explaining why a phenomenon occurs, under what conditions, and with what consequences.

The third and final phase of grounded theory is selective coding, which involves identifying a central category or a core variable, and systematically and logically relating this central category to other categories. The central category can evolve from existing categories or can be a higher order category that subsumes previously coded categories. New data is selectively sampled to validate the central category, and its relationships to other categories—i.e., the tentative theory. Selective coding limits the range of analysis, and makes it move fast. At the same time, the coder must watch out for other categories that may emerge from the new data that could be related to the phenomenon of interest (open coding), which may lead to further refinement of the initial theory. Hence, open, axial, and selective coding may proceed simultaneously. Coding of new data and theory refinement continues until theoretical saturation is reached—i.e., when additional data does not yield any marginal change in the core categories or the relationships.

The ‘constant comparison’ process implies continuous rearrangement, aggregation, and refinement of categories, relationships, and interpretations based on increasing depth of understanding, and an iterative interplay of four stages of activities: comparing incidents/texts assigned to each category (to validate the category), integrating categories and their properties, delimiting the theory by focusing on the core concepts and ignoring less relevant concepts, and writing theory using techniques like memoing, storylining, and diagramming. Having a central category does not necessarily mean that all other categories can be integrated nicely around it. In order to identify key categories that are conditions, action/interactions, and consequences of the core category, Strauss and Corbin (1990) recommend several integration techniques, such as storylining, memoing, or concept mapping, which are discussed here. In storylining, categories and relationships are used to explicate and/or refine a story of the observed phenomenon. Memos are theorised write-ups of ideas about substantive concepts and their theoretically coded relationships as they evolve during grounded theory analysis, and are important tools to keep track of and refine ideas that develop during the analysis. Memoing is the process of using these memos to discover patterns and relationships between categories using two-by-two tables, diagrams, figures, or other illustrative displays. Concept mapping is a graphical representation of concepts and relationships between those concepts—e.g., using boxes and arrows. The major concepts are typically laid out on one or more sheets of paper, blackboards, or using graphical software programs, linked to each other using arrows, and readjusted to best fit the observed data.

After a grounded theory is generated, it must be refined for internal consistency and logic. Researchers must ensure that the central construct has the stated characteristics and dimensions, and if not, the data analysis may be repeated. Researchers must then ensure that the characteristics and dimensions of all categories show variation. For example, if behaviour frequency is one such category, then the data must provide evidence of both frequent performers and infrequent performers of the focal behaviour. Finally, the theory must be validated by comparing it with raw data. If the theory contradicts observed evidence, the coding process may need to be repeated to reconcile such contradictions or unexplained variations.

Content analysis

Content analysis is the systematic analysis of the content of a text—e.g., who says what, to whom, why, and to what extent and with what effect—in a quantitative or qualitative manner. Content analysis is typically conducted as follows. First, when there are many texts to analyse—e.g., newspaper stories, financial reports, blog postings, online reviews, etc.—the researcher begins by sampling a selected set of texts from the population of texts for analysis. This process is not random, but instead, texts that have more pertinent content should be chosen selectively. Second, the researcher identifies and applies rules to divide each text into segments or ‘chunks’ that can be treated as separate units of analysis. This process is called unitising. For example, assumptions, effects, enablers, and barriers in texts may constitute such units. Third, the researcher constructs and applies one or more concepts to each unitised text segment in a process called coding. For coding purposes, a coding scheme is used based on the themes the researcher is searching for or uncovers as they classify the text. Finally, the coded data is analysed, often both quantitatively and qualitatively, to determine which themes occur most frequently, in what contexts, and how they are related to each other.
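
As a rough sketch of the unitising and coding steps just described (a generic illustration, not a procedure from the chapter), the Python snippet below splits hypothetical texts into sentence-level units, applies a small keyword-based coding scheme, and counts theme frequencies; the texts and coding scheme are invented.

# Minimal content-analysis sketch: unitise texts into sentences, code each unit with a
# keyword-based scheme, then count how often each theme occurs. Texts and scheme are hypothetical.
import re
from collections import Counter

texts = [
    "Tuition keeps rising. Students say financial aid is hard to navigate.",
    "The new advising centre helped me plan my courses. Costs are still a barrier.",
]

coding_scheme = {                                  # theme -> keywords to search for
    "cost": ["tuition", "cost", "financial aid"],
    "support services": ["advising", "centre", "counselling"],
}

def unitise(text):
    """Split a text into sentence-level units of analysis."""
    return [u.strip() for u in re.split(r"[.!?]", text) if u.strip()]

theme_counts = Counter()
for text in texts:
    for unit in unitise(text):
        for theme, keywords in coding_scheme.items():
            if any(k in unit.lower() for k in keywords):
                theme_counts[theme] += 1

print(theme_counts)   # Counter({'cost': 3, 'support services': 1})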

A simple type of content analysis is sentiment analysis—a technique used to capture people’s opinion or attitude toward an object, person, or phenomenon. Reading online messages about a political candidate posted on an online forum and classifying each message as positive, negative, or neutral is an example of such an analysis. In this case, each message represents one unit of analysis. This analysis will help identify whether the sample as a whole is positively or negatively disposed, or neutral towards that candidate. Examining the content of online reviews in a similar manner is another example. Though this analysis can be done manually, for very large datasets—e.g., millions of text records—natural language processing and text analytics based software programs are available to automate the coding process, and maintain a record of how people’s sentiments fluctuate with time.
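
A bare-bones, lexicon-based version of the sentiment analysis described above might look like the following sketch; the word lists and messages are invented, and real systems rely on much richer lexicons or trained models.

# Toy sentiment analysis: classify each message as positive, negative, or neutral by
# counting hits against small, hypothetical word lists.
POSITIVE = {"great", "support", "honest", "strong"}
NEGATIVE = {"corrupt", "weak", "fail", "dishonest"}

def classify(message):
    words = set(message.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

messages = [
    "She has been a strong and honest voice for the district",
    "Another corrupt politician who will fail us",
    "The debate is scheduled for Thursday evening",
]
print([classify(m) for m in messages])   # ['positive', 'negative', 'neutral']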

A frequent criticism of content analysis is that it lacks a set of systematic procedures that would allow the analysis to be replicated by other researchers. Schilling (2006) [4] addressed this criticism by organising different content analytic procedures into a spiral model. This model consists of five levels or phases in interpreting text: convert recorded tapes into raw text data or transcripts for content analysis, convert raw data into condensed protocols, convert condensed protocols into a preliminary category system, use the preliminary category system to generate coded protocols, and analyse coded protocols to generate interpretations about the phenomenon of interest.

Content analysis has several limitations. First, the coding process is restricted to the information available in text form. For instance, if a researcher is interested in studying people’s views on capital punishment, but no such archive of text documents is available, then the analysis cannot be done. Second, sampling must be done carefully to avoid sampling bias. For instance, if your population is the published research literature on a given topic, then you have systematically omitted unpublished research or the most recent work that is yet to be published.

Hermeneutic analysis

Hermeneutic analysis is a special type of content analysis where the researcher tries to ‘interpret’ the subjective meaning of a given text within its sociohistoric context. Unlike grounded theory or content analysis—which ignores the context and meaning of text documents during the coding process—hermeneutic analysis is a truly interpretive technique for analysing qualitative data. This method assumes that written texts narrate an author’s experience within a sociohistoric context, and should be interpreted as such within that context. Therefore, the researcher continually iterates between singular interpretation of the text (the part) and a holistic understanding of the context (the whole) to develop a fuller understanding of the phenomenon in its situated context, which German philosopher Martin Heidegger called the hermeneutic circle. The word hermeneutic (singular) refers to one particular method or strand of interpretation.

More generally, hermeneutics is the study of interpretation and the theory and practice of interpretation. Derived from religious studies and linguistics, traditional hermeneutics—such as biblical hermeneutics—refers to the interpretation of written texts, especially in the areas of literature, religion and law—such as the Bible. In the twentieth century, Heidegger suggested that a more direct, non-mediated, and authentic way of understanding social reality is to experience it, rather than simply observe it, and proposed philosophical hermeneutics, where the focus shifted from interpretation to existential understanding. Heidegger argued that texts are the means by which readers can not only read about an author’s experience, but also relive the author’s experiences. Contemporary or modern hermeneutics, developed by Heidegger’s students such as Hans-Georg Gadamer, further examined the limits of written texts for communicating social experiences, and went on to propose a framework of the interpretive process, encompassing all forms of communication, including written, verbal, and non-verbal, and exploring issues that restrict the communicative ability of written texts, such as presuppositions, language structures (e.g., grammar, syntax, etc.), and semiotics—the study of written signs such as symbolism, metaphor, analogy, and sarcasm. The term hermeneutics is sometimes used interchangeably and inaccurately with exegesis, which refers to the interpretation or critical explanation of written text only, and especially religious texts.

Finally, standard software programs, such as ATLAS.ti.5, NVivo, and QDA Miner, can be used to automate coding processes in qualitative research methods. These programs can quickly and efficiently organise, search, sort, and process large volumes of text data using user-defined rules. To guide such automated analysis, a coding schema should be created, specifying the keywords or codes to search for in the text, based on an initial manual examination of sample text data. The schema can be arranged in a hierarchical manner to organise codes into higher-order codes or constructs. The coding schema should be validated using a different sample of texts for accuracy and adequacy. However, if the coding schema is biased or incorrect, the resulting analysis of the entire population of texts may be flawed and non-interpretable. Moreover, software programs cannot decipher the meaning behind certain words or phrases, or the context within which they are used—such as sarcasm or metaphors—which may lead to significant misinterpretation in large-scale qualitative analysis.
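
To illustrate in principle what such a schema-driven, automated coding pass involves (a generic sketch, not the workflow of ATLAS.ti, NVivo, or QDA Miner), the snippet below applies a small hierarchical keyword schema to text segments and rolls the counts up to higher-order codes; the schema and segments are invented.

# Generic sketch of schema-driven automated coding. A hierarchical schema maps
# higher-order codes to sub-codes and each sub-code to keywords; counts roll up to the parent.
from collections import Counter

schema = {                                   # hypothetical hierarchical coding schema
    "barriers": {
        "cost": ["tuition", "fees", "afford"],
        "time": ["schedule", "workload", "hours"],
    },
    "enablers": {
        "support": ["mentor", "advising", "tutoring"],
    },
}

segments = [
    "I could not afford the fees this semester.",
    "My mentor helped me manage the workload.",
]

sub_counts, parent_counts = Counter(), Counter()
for seg in segments:
    low = seg.lower()
    for parent, subcodes in schema.items():
        for code, keywords in subcodes.items():
            if any(k in low for k in keywords):
                sub_counts[code] += 1
                parent_counts[parent] += 1

print(sub_counts)      # Counter({'cost': 1, 'time': 1, 'support': 1})
print(parent_counts)   # Counter({'barriers': 2, 'enablers': 1})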

  • Miles, M. B., & Huberman, A. M. (1984). Qualitative data analysis: A sourcebook of new methods. Newbury Park, CA: Sage Publications.
  • Glaser, B., & Strauss, A. (1967). The discovery of grounded theory: Strategies for qualitative research. New York: Aldine Pub Co.
  • Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Beverly Hills: Sage Publications.
  • Schilling, J. (2006). On the pragmatics of qualitative assessment: Designing the process for content analysis. European Journal of Psychological Assessment, 22(1), 28–37.

Social Science Research: Principles, Methods and Practices (Revised edition) Copyright © 2019 by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.



Examining Data Analysis Techniques in Social Research: Qualitative vs. Quantitative


Data analysis provides social researchers with the tools to unlock insights and understand complex social phenomena. It lets you interpret the data and uncover relationships and patterns that shed light on human behavior and social experiences.

Social research focuses on expanding our knowledge of social dynamics. Data analysis in social science research provides you with empirical evidence to dig deeper and explore human experiences, attitudes, interactions, and social structures. Social data analysis also enables you to assess the effectiveness of policies and programs, helping you make informed decisions and design effective interventions.

In this blog, we explore quantitative and qualitative data analysis in social science research.

What is data analysis in research?

In research, data analysis refers to employing statistical and logical techniques to evaluate and synthesize the data collected. It allows researchers to extract meaningful insights from an unstructured mass of data. 

Extracting insights and meaning from data gives us a better understanding of the world and different phenomena and empowers improved decision-making. 

Different data will need to be analyzed using different techniques. Within this article, we will explore the different kinds of data in research and the different methods of data analysis used to analyze them. 


Types of Data in Research

There are three main types of data in research:

  • Qualitative Data: Qualitative data describes qualities or characteristics and generally refers to the descriptive findings collected through different methods of research. It is non-numerical in nature and therefore not quantifiable. Some examples of qualitative data are blood type, ethnic group, and color.
  • Quantitative Data: Quantitative data takes distinct figures or counts associated with a numerical value. It refers to quantifiable information that can be used for statistical analysis and mathematical computations. Some examples of quantitative data are cost, age, and weight.
  • Categorical Data: Categorical data refers to data that can be divided into groups. Categorical variables can only take one of a limited and usually fixed number of possible values. Some examples of categorical data are race, gender, and age group. A short illustration of how these three types appear in a dataset follows this list.
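
To make the distinction concrete, the short sketch below (with invented data) shows how the three types typically appear in a pandas DataFrame: free-text qualitative responses, numeric quantitative measures, and categorical variables with fixed levels.

# Hypothetical survey extract illustrating the three data types.
import pandas as pd

df = pd.DataFrame({
    "open_comment": ["Loved the flexibility", "Too expensive for me"],        # qualitative (text)
    "age": [29, 47],                                                          # quantitative (numeric)
    "age_group": pd.Categorical(["18-34", "35-54"],
                                categories=["18-34", "35-54", "55+"]),        # categorical (fixed levels)
})

print(df.dtypes)   # object, int64, category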

Key objectives of data analysis in social research


The following are the primary objectives of data analysis in social research.

  • Data analysis techniques help you describe and summarize the social phenomenon you are studying. They provide statistical values such as means, medians, frequencies, and standard deviations, giving you a snapshot of the collected data (a short sketch of such a summary follows this list).
  • Data analysis facilitates exploratory analysis, allowing you to uncover previously unknown insights. The analysis provides a foundation for further research by identifying patterns and relationships for hypothesis generation.
  • Data analysis in social science research enables you to make inferences and draw meaningful conclusions about the target population based on the research sample. By gathering empirical evidence, you can generalize the research results to the larger population, ensuring external validity.
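
As a quick illustration of the descriptive objective above (a generic pandas sketch with invented data, not tied to any particular survey platform), the snippet below computes means, medians, standard deviations, and a frequency table.

# Descriptive snapshot of a hypothetical survey extract: means, medians, and standard
# deviations for numeric variables, plus a frequency table for a categorical variable.
import pandas as pd

df = pd.DataFrame({
    "age": [34, 51, 28, 45, 39, 62],
    "hours_online": [2.5, 1.0, 4.0, 3.5, 2.0, 0.5],
    "employment": ["full-time", "retired", "student", "full-time", "part-time", "retired"],
})

print(df[["age", "hours_online"]].agg(["mean", "median", "std"]))   # numeric summary
print(df["employment"].value_counts())                              # frequencies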


Challenges in social research data analysis

While data analysis is central to social research and offers multiple benefits, it is not without its challenges. Here are some common obstacles you may encounter when performing data analysis in social science research. 

  • Data quality – It is important to handle missing or inconsistent data to maintain data integrity and validity.
  • Selecting the proper data analysis technique – You must have a good understanding of various analysis techniques to select the one that is appropriate for your research.
  • Interpreting complex results – You need to communicate the findings effectively and provide a clear explanation of the implications of your research results.


Data analysis in social research using a qualitative approach


Let’s take a look at how data analysis is conducted in qualitative research and the different methods that are commonly used to do so. 

Data preparation for qualitative data analysis – 

Before you dive into analyzing your qualitative social research data, you need to prepare the data to make sense of the rich information. 

Step 1: Data familiarization: 

You need to start by getting familiar with the qualitative or textual data you have gathered. Take the time to read and re-read the interviews or feedback to gain a holistic understanding of the content. 

Step 2: Coding and categorization: 

This step involves assigning codes or labels to segments of data. Coding helps you identify themes, concepts, and patterns within your data. Organize your codes into categories (grouping related codes together) and themes (overarching ideas that arise from the data). 

Step 3: Theme and pattern identification: 

Once you have assigned codes, you can start identifying common themes. Look for recurring responses to questions, or identify shared experiences. You can now identify similarities and differences across the data and participants. 
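A minimal sketch of the coding and theme-identification steps above, assuming Python; the interview excerpts, codes, and themes are invented for illustration, and real qualitative coding is an interpretive judgment rather than a mechanical count.

```python
# Hypothetical coded segments: each excerpt is tagged with one or more codes.
from collections import Counter

coded_segments = [
    ("I never know which bus will actually arrive", ["uncertainty", "transport"]),
    ("The schedule app is usually wrong",           ["technology", "uncertainty"]),
    ("Walking is faster than waiting",              ["transport", "coping"]),
]

# Categorization: group related codes into broader themes.
themes = {
    "unreliable_service": {"uncertainty", "technology"},
    "adaptation":         {"coping", "transport"},
}

# Count code occurrences to surface recurring patterns across participants.
code_counts = Counter(code for _, codes in coded_segments for code in codes)
theme_counts = {t: sum(code_counts[c] for c in codes) for t, codes in themes.items()}

print(code_counts.most_common())
print(theme_counts)
```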

How do we identify patterns in qualitative data?

When looking for patterns in textual information, there are several methods that can be used, including:

  • Word-based Method: The word-based method generally involves manually reading through the gathered data to find repetitive themes or commonly used words (a minimal sketch of this idea follows the list).
  • Scrutiny-based Technique: The scrutiny-based technique derives conclusions by scrutinizing the text in light of what the researcher already knows. It is a popular text analysis method for identifying correlations and patterns within textual information.
  • Variable Partitioning: Variable partitioning, or dynamic partitioning, can be used to split variables so that more coherent descriptions and explanations can be extracted from vast amounts of data.
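A minimal sketch of the word-based method, assuming Python; the responses and stop-word list are hypothetical.

```python
# Count commonly used words across open-ended responses to surface candidate themes.
from collections import Counter
import re

responses = [
    "Food prices keep rising and food is hard to find",
    "Hunger is the biggest problem in my village",
    "We worry about food and clean water every day",
]

stop_words = {"and", "is", "the", "in", "my", "we", "about", "every", "to", "keep"}

words = []
for text in responses:
    words += [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]

# The most frequent words (e.g. "food", "hunger") point to recurring themes.
print(Counter(words).most_common(5))
```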

6 data analysis methods in qualitative social research

There are six main analysis methods in qualitative research that you can use in data analysis for social research. Let’s look at these six methods.

  • Narrative analysis. 
  • Qualitative content analysis. 
  • Grounded theory. 
  • Discourse analysis.
  • Thematic analysis.
  • Interpretive phenomenological analysis. 

Let’s explore these six qualitative data analysis methods. 

1. Narrative Analysis: 

Narrative analysis, or narrative inquiry, is a qualitative research method in which researchers interpret texts or visual data in storied form. There are different approaches to narrative analysis, including functional, thematic, structural, and dialogic approaches.

2. Qualitative Content Analysis: 

This is a straightforward method of qualitative research where patterns within a piece of content are evaluated. It can be used with different forms of content, such as words, phrases, and/or images.

3. Grounded Theory: 

This method of qualitative analysis is used to develop new theories from the collected data through a series of “tests” and “revisions”. Grounded theory (GT) follows a structured but flexible methodology focusing on social processes or actions.

4. Discourse Analysis: 

This method is used to study written, spoken, or signed language, or any significant semiotic event, in relation to its social context. It allows researchers to examine language beyond the level of individual sentences and to explain how those sentences function in a social context.

5. Thematic Analysis: 

Thematic analysis involves looking for patterns by taking large bodies of data and grouping them based on shared themes or similarities in order to answer the research question being addressed. This method of qualitative data analysis is widely used in the field of psychology.

6. Interpretive Phenomenological Analysis (IPA): 

It is an approach to psychological qualitative research with an idiographic focus. It provides a detailed examination of a person and their lived experiences. The aim of IPA is to understand how participants make sense of their personal and social world.

Leverage online survey tools that enable you to perform text analysis and sentiment analysis to extract insights from your qualitative research data. 


Data analysis in social research using a quantitative approach

Let’s now delve into how you can conduct data analysis in quantitative research and the different methods that are commonly used to do so. 

Data preparation for quantitative data analysis

Before quantitative data can be analyzed, it must first be prepared using the following three steps:

Step 1: Data Validation: 

Data validation refers to comparing the gathered data against defined rules to ensure that it meets the required quality parameters without bias. It generally involves checking the following: fraud, screening, procedure, and completeness.

Step 2: Data Editing: 

Data editing refers to reviewing the data records and correcting missing, invalid, or inconsistent entries.

Step 3: Data Coding: 

As the name suggests, data coding involves deriving codes from observed data. It refers to transforming and organizing gathered information into a set of meaningful and cohesive categories. 

2 data analysis methods in quantitative social research

There are two main methods of data analysis used in quantitative research:

  • Descriptive analysis. 
  • Inferential analysis. 

1. Descriptive Statistics: 

This quantitative method of data analysis is used to describe the basic features of data in a study and provides simple summaries about the measures and sample. 

It helps researchers understand the details of a sample group and doesn’t aim to make assumptions or predictions about the entire population. Descriptive analysis generally includes the first set of statistics covered before moving on to inferential statistics. 

Some common statistical measures used in descriptive analysis are the mean, median, mode, skewness, and standard deviation (a minimal sketch of these follows).
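A minimal sketch of these descriptive measures, assuming Python with pandas and scipy; the scores are invented.

```python
# Compute the basic descriptive measures for a hypothetical set of scores.
import pandas as pd
from scipy.stats import skew

scores = pd.Series([55, 61, 67, 67, 70, 72, 75, 80, 84, 93])

print("mean:              ", scores.mean())
print("median:            ", scores.median())
print("mode:              ", scores.mode().tolist())  # mode() can return several values
print("standard deviation:", scores.std())            # sample (n-1) standard deviation
print("skewness:          ", skew(scores))
```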

2. Inferential Statistics: 

Inferential statistics differs from descriptive statistics as it aims to make inferences about the population rather than about a specific data set or sample. It, therefore, allows researchers to make assumptions and predictions about an entire population. 

There are two main kinds of predictions made using inferential statistics, including predictions about the differences between groups within a population and predictions about the relationships between variables relevant to a population. 

Some common inferential methods used in quantitative data analysis are regression analysis, frequency tables, analysis of variance (ANOVA), cross-tabulation, and correlational research (a minimal sketch of two of these follows). A data analysis tool that streamlines quantitative analysis and automates manual work can speed up this stage considerably.
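A minimal sketch of two of the methods just mentioned, cross-tabulation and a one-way ANOVA, assuming Python with pandas and scipy; the survey columns and values are invented.

```python
# Cross-tabulate two categorical variables and test group differences with ANOVA.
import pandas as pd
from scipy.stats import f_oneway

df = pd.DataFrame({
    "region":       ["North", "North", "South", "South", "East", "East"],
    "gender":       ["F", "M", "F", "M", "F", "M"],
    "hours_online": [2.0, 3.5, 4.0, 4.5, 1.5, 2.5],
})

# Cross-tabulation: counts of respondents by region and gender.
print(pd.crosstab(df["region"], df["gender"]))

# One-way ANOVA: do mean hours online differ across regions?
groups = [g["hours_online"].values for _, g in df.groupby("region")]
f_stat, p_value = f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```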


Data analysis encompasses both quantitative and qualitative methods. Quantitative methods in social science research provide objective insights through statistical analysis, while qualitative methods provide exploratory insights through textual analysis. Through data analysis in social science research, you uncover patterns, establish correlations, and gain a deeper understanding of social systems. You can contribute to the discipline with evidence-based insights and generate knowledge that informs decision-making, policies, and interventions, advancing our understanding of human behavior and social phenomena.



Graduate Programs

Social Data Analytics

The STEM–designated master's program in Social Data Analytics in the Department of Sociology at Brown trains students in advanced techniques for data collection and analysis.

Careers in the 21st century increasingly place a premium on the ability to collect, process, analyze and interpret large-scale data on human attributes, preferences, attitudes, and behaviors and complex systems of human interactions. Such skills have concrete application and relevance to a wide variety of careers, including market research, program evaluation, policy work, advanced study in the social sciences and financial analysis.

The McKinsey Global Institute, a management consulting group, estimates that by 2018, the U.S. may face a 50-60 percent gap between the need for individuals who can analyze complex data and the supply of those with the training and skills to do so. The demand for data analysts requires professionals that are not only technically skilled, but also thoughtful about how best to use and interpret data.

The hallmarks of the program are focused methodological training in both quantitative and qualitative methods of data collection and analysis, with cores in spatial analysis and market research, classroom instruction by active and internationally renowned researchers, and individualized supervision of applied, hands-on data analytic research on a faculty project or with an off-campus organization.

Through this program, students will develop the pragmatic and logical skills that will prepare them for a career in social research, whether basic research (such as found in academia or research institutions) or applied (such as found in policy and market research). Students will put these newly developed skills to work, as they apply the techniques they learn to the analysis of actual data from the social sciences.

The master's program is ideal for early–career students who have an existing foundation in basic statistics and social science research and who desire more focused training in order to be not only prepared, but highly competitive, in acquiring careers in market or social research or as analysts at research and policy institutions.

While this degree is based in the Sociology Department, its value extends across disciplines. Many of the master's program faculty are associates of the Population Studies and Training Center (PSTC) at Brown, an interdisciplinary social science research and graduate training center. The Spatial Analysis courses in this program are taught by experts affiliated with the Spatial Structures in the Social Sciences (S4) Initiative; and several of the courses in market research are taught by staff affiliated with the undergraduate program in Business, Entrepreneurship and Organizations (BEO).

Application Information

If you have any questions regarding the application process for this program, please email  [email protected] .

Application Requirements

GRE General:

Required; the GRE General Test at home version is accepted

TOEFL/IELTS:

Required for any non-native English speaker who does not have a degree from an institution where English is the sole language of instruction or from a university in one of the following countries: Australia, Bahamas, Botswana, Cameroon, Canada (except Quebec), Ethiopia, Ghana, Ireland, Kenya, Lesotho, Liberia, Malawi, New Zealand, Nigeria, Zimbabwe, South Africa, Sierra Leone, Swaziland, Tanzania, Gambia, Uganda, United Kingdom (England, Scotland, Northern Ireland, Wales), West Indies, and Zambia. The TOEFL iBT Special Home Edition and the IELTS Indicator exam are accepted. Students from mainland China may submit the TOEFL ITP Plus exam.

Official Transcripts:

Required. All applicants may upload unofficial transcripts for application submission. Official transcripts are ONLY required for enrolling students before class start. An international transcript evaluation (WES, ECE, or SpanTran) is required for degrees from non-U.S. institutions before enrollment.

Letters of Recommendation:

Three (3) recommendations required

Personal Statement:

A 600-1,000 word personal statement that gives your reasons for pursuing graduate work in your field of study. The statement should include examples of your past work in your chosen field, your plans for study at Brown, issues and problems you would like to address in your field, and your professional goals.

Dates, Deadlines, Tuition, and Funding

Graduate Tuition & Fees : Please visit the Student Financial Services Office  for up-to-date tuition rates.

Completion Requirements

The master's program in Social Data Analytics is a terminal degree program designed to be completed in two semesters. The program requires eight courses including an optional intensive Research Internship that is attached to a faculty Directed Research Practicum. Brown undergraduates who enter the program as fifth–year master's students are allowed to use up to two undergraduate courses to count towards the eight credit requirements if the courses are among the required or elective courses for the program. All entering students are required to have (1) a one–semester introductory statistics course (SOC 1100 Introductory Statistics for Social Research or an equivalent), (2) a more advanced course in statistics or a course in college calculus (MATH 0050 and 0060, or MATH 0090 or an equivalent), and (3) a one–semester course in research methods (SOC 1020 Methods of Social Research or an equivalent).


Contact and Location

Department of Sociology, Brown University

  • Program Faculty
  • Program Handbook
  • Graduate School Handbook
  • Social Data Analytics Newsletter

Data Analysis in Research: Types & Methods


Content Index

  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of research data analysis: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction through summarization and categorization, which helps find patterns and themes in the data for easy identification and linking. The third is the analysis itself, which researchers conduct in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that “the data analysis and data interpretation is a process representing the application of deductive and inductive logic to the research and data analysis.”

Why analyze data in research?

Researchers rely heavily on data as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But, what if there is no question to ask? Well! It is possible to explore data even without a problem – we call it ‘Data Mining’, which often reveals some interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience’s vision guide them in finding the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Sometimes data analysis tells the most unforeseen yet exciting stories that were not anticipated when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.


Types of data in research

Every kind of data describes something once a specific value is assigned to it. For analysis, these values need to be organized, processed, and presented in a given context to make them useful. Data can come in different forms; here are the primary data types.

  • Qualitative data: When the data presented consists of words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze, especially for comparison. Example: data describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: age, rank, cost, length, weight, scores, and the like all come under this type of data. You can present such data in graphs and charts or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups; an item included in categorical data cannot belong to more than one group. Example: a survey respondent’s living situation, marital status, smoking habit, or drinking habit is categorical data. A chi-square test is a standard method used to analyze this data (a minimal sketch follows this list).
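A minimal sketch of the chi-square test mentioned in the last bullet, assuming Python with scipy; the contingency table is hypothetical.

```python
# Chi-square test of independence on a hypothetical marital status x smoking table.
import pandas as pd
from scipy.stats import chi2_contingency

table = pd.DataFrame(
    {"smoker": [18, 7], "non_smoker": [42, 33]},
    index=["single", "married"],
)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")
```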


Data analysis in qualitative research

Data analysis in qualitative research works a little differently from the analysis of numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Extracting insight from such complex information is an involved process; hence it is typically used for exploratory research and data analysis.

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: researchers usually read the available data and look for repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.


The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’
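A minimal sketch of the keyword-in-context idea, assuming Python; the keyword, window size, and responses are invented.

```python
# Show a window of words around each occurrence of a keyword in hypothetical responses.
def keyword_in_context(text, keyword, window=3):
    words = text.lower().split()
    for i, w in enumerate(words):
        if keyword in w:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            yield f"...{left} [{w}] {right}..."

responses = [
    "My mother was diagnosed with diabetes last year and changed her diet",
    "I worry that diabetes runs in the family",
]

for text in responses:
    for line in keyword_in_context(text, "diabetes"):
        print(line)
```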

The scrutiny-based technique is also one of the highly recommended text analysis methods for identifying patterns in qualitative data. Compare-and-contrast is the most widely used approach under this technique, examining how a specific text is similar to or different from others.

For example, to find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare-and-contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.


There are several techniques to analyze the data in qualitative research, but here are some commonly used methods,

  • Content Analysis:  It is widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze the documented information from text, images, and sometimes from the physical items. It depends on the research questions to predict when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions people share are examined with a focus on answering the research questions.
  • Discourse Analysis:  Similar to narrative analysis, discourse analysis is used to analyze the interactions with people. Nevertheless, this particular method considers the social context under which or within which the communication between the researcher and respondent takes place. In addition to that, discourse analysis also focuses on the lifestyle and day-to-day environment while deriving any conclusion.
  • Grounded Theory:  When you want to explain why a particular phenomenon happened, then using grounded theory for analyzing quality data is the best resort. Grounded theory is applied to study data about the host of similar cases occurring in different settings. When researchers are using this method, they might alter explanations or produce new ones until they arrive at some conclusion.


Data analysis in quantitative research

The first stage in quantitative research and data analysis is to prepare the data for analysis so that raw, nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to determine whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent answered every question in an online survey, or that the interviewer asked every question devised in the questionnaire (a minimal sketch of this check follows the list).
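A minimal sketch of the completeness check, assuming Python with pandas; the survey table, question columns, and missing values are hypothetical.

```python
# Flag respondents who skipped questions (NaN marks an unanswered item).
import numpy as np
import pandas as pd

responses = pd.DataFrame({
    "respondent_id": [101, 102, 103],
    "q1": [4, 5, np.nan],
    "q2": [3, np.nan, np.nan],
    "q3": [5, 4, 2],
})

question_cols = ["q1", "q2", "q3"]
responses["n_missing"] = responses[question_cols].isna().sum(axis=1)
incomplete = responses[responses["n_missing"] > 0]
print(incomplete[["respondent_id", "n_missing"]])
```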

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is the process whereby researchers confirm that the provided data is free of such errors. They conduct the necessary checks, including outlier checks, to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher might create age brackets to distinguish the respondents based on their age. It then becomes easier to analyze small data buckets rather than deal with the massive data pile.
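A minimal sketch of this kind of coding, assuming Python with pandas; the ages and bracket edges are invented.

```python
# Group raw ages into a handful of brackets so the analysis works with categories.
import pandas as pd

ages = pd.Series([19, 23, 31, 37, 44, 52, 61, 68])

age_bracket = pd.cut(
    ages,
    bins=[17, 25, 35, 50, 65, 100],
    labels=["18-25", "26-35", "36-50", "51-65", "66+"],
)

print(age_bracket.value_counts().sort_index())
```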


After the data is prepared for analysis, researchers can use different research and data analysis methods to derive meaningful insights. Statistical analysis is the most favored way to analyze numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels while numerical data consists of measurable quantities. Statistical methods fall into two groups: descriptive statistics, used to describe the data, and inferential statistics, which help in comparing and generalizing from the data.

Descriptive statistics

This method is used to describe the basic features of versatile types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not support conclusions beyond the data itself; any conclusions drawn still rest on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to demonstrate distribution by various points.
  • Researchers use this method when they want to showcase the most commonly or averagely indicated response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest scores.
  • Variance and standard deviation summarize the typical difference between each observed score and the mean.
  • These measures identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is and how strongly that spread affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count.

In quantitative research, descriptive analysis often yields absolute numbers, but those numbers alone are rarely sufficient to demonstrate the rationale behind them. Nevertheless, it is necessary to choose the method of analysis best suited to your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when the researchers intend to keep the research or outcome limited to the provided sample without generalizing it. For example, when you want to compare the average votes cast in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you can ask some 100-odd audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.
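A minimal sketch of the movie-theater example, assuming Python with statsmodels; the counts are invented.

```python
# Estimate the population share that likes the film from a sample of 100 viewers.
from statsmodels.stats.proportion import proportion_confint

liked, sampled = 85, 100
low, high = proportion_confint(count=liked, nobs=sampled, alpha=0.05, method="wilson")
print(f"sample share: {liked / sampled:.0%}, 95% CI: {low:.0%} to {high:.0%}")
```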

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested to understand whether a newly launched shade of lipstick is good or not, or whether multivitamin capsules help children perform better at games.

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the data has age and gender categories presented in rows and columns; a two-dimensional cross-tabulation shows the number of males and females in each age category, making the analysis seamless.
  • Regression analysis: For understanding the relationship between variables, researchers commonly turn to regression analysis, which is also a type of predictive analysis. In this method you have a dependent variable and one or more independent variables, and you estimate the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free, random manner. (A minimal sketch appears after this list.)
  • Frequency tables: A frequency table summarizes how often each value or category occurs in the data and is often the starting point for comparing groups.
  • Analysis of variance: This statistical procedure tests the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation often means the research findings are significant. In many contexts, ANOVA testing and variance analysis are treated as synonymous.
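A minimal sketch of the regression analysis referenced in the list above, assuming Python with statsmodels; the data are simulated and the variable names are invented.

```python
# Ordinary least squares: does (simulated) study time predict exam score?
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=50)
score = 55 + 3.2 * hours + rng.normal(0, 5, size=50)   # built-in linear relationship

X = sm.add_constant(pd.DataFrame({"hours": hours}))    # intercept + predictor
model = sm.OLS(score, X).fit()
print(model.params)      # estimated intercept and slope
print(model.rsquared)    # share of variance explained
```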
Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers should have more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Research and data analytics projects usually differ by scientific discipline; therefore, getting statistical advice at the beginning of the analysis helps in designing the survey questionnaire, selecting data collection methods, and choosing samples.


  • The primary aim of research data analysis is to derive insights that are unbiased. Any mistake in, or bias introduced while, collecting data, selecting an analysis method, or choosing an audience sample is likely to lead to a biased inference.
  • No amount of sophistication in the analysis can rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are unclear, the resulting lack of clarity can mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find a way to deal with everyday challenges like outliers, missing data, data altering, data mining , or developing graphical representation.

The sheer amount of data generated daily is staggering, especially since data analysis took center stage in 2018; in the preceding year, the total data supply amounted to 2.8 trillion gigabytes. It is clear that enterprises hoping to survive in a hypercompetitive world must be able to analyze complex research data, derive actionable insights, and adapt to new market needs.



Adventures in Social Research: Data Analysis Using IBM® SPSS® Statistics

Student Resources

Welcome to the Companion Website

This site is intended to enhance your use of  Adventures in Social Research  by Earl Babbie, William E. Wagner, III and Jeanne Zaino. Please note that all the materials on this site are especially geared toward maximizing your understanding of the material. 

Written by esteemed social science research authors,  Adventures in Social Research: Data Analysis Using IBM® SPSS® Statistics, Ninth Edition  encourages students to practice SPSS as they read about it, providing a practical, hands-on introduction to conceptualization, measurement, and association through active learning. This fully revised workbook will guide students through step-by-step instruction on data analysis using the latest version of SPSS and the most up to date General Social Survey data. Arranged to parallel most introductory research methods texts, this text starts with an introduction to computerized data analysis and the social research process, then takes readers step-by-step through univariate, bivariate, and multivariate analysis using SPSS Statistics. In this revised edition, active and collaborative learning will be emphasized as students engage in a series of practical investigative exercises.

Acknowledgments

We gratefully acknowledge Earl Babbie, William E. Wagner III, and Jeanne Zaino for writing an excellent text and for reviewing the assets on this site. Special thanks are also due to William E. Wagner III for creating the ancillaries on this site.


Master of Science in Social Data Analytics and Research

Program Description

The Master of Science in Social Data Analytics and Research degree program builds on faculty expertise in criminology, economics, geospatial information sciences, political science, public and nonprofit management, public policy, political economy and sociology to equip students with advanced training that is widely applicable in a variety of fields.

As demand for social data production, collection and analysis increases — driven by government, nonprofit and private sector organizations, as well as doctoral programs and advanced research institutions — students with advanced training and expertise in both quantitative and qualitative methodologies will have the skills and experience needed to thrive in many different industries.

Taught by internationally recognized faculty at The University of Texas at Dallas, graduates of the Social Data Analytics and Research program develop advanced expertise in:

  • Social science research design and evaluation.
  • Quantitative and qualitative data discovery and analysis.
  • Social science methodologies, theories and philosophical dimensions, as well as the ethics of social science practice.
  • Applied social science in the service of public policy and action.

Responding to an ever-increasing need across public and private sectors for social data analytics and research, the program equips students with rigorous multidisciplinary proficiencies in social data production, collection and investigation.

The Social Data Analytics and Research master’s program ensures that students gain a broad understanding of their discipline, apply their knowledge and analytical skills to create effective and novel solutions to practical problems and communicate and work effectively in collaborative environments.

Other benefits include:

  • World-Class Faculty : The program is led by faculty of the School of Economic, Political and Policy Sciences who are widely cited experts in their respective fields.
  • Comprehensive Curriculum : Courses in the Social Data Analytics and Research master’s program will introduce students to new ideas, technologies and competencies while preparing them to succeed in both public and private sectors.
  • Facilities: Students have full access to four state-of-the-art computer laboratories housed in the School of Economic, Political and Policy Sciences. All computers are network linked and hold full suites of leading survey, qualitative, spatial and statistical analysis software, including Qualtrics, NVivo, ArcGIS, ENVI, EViews, R, Stata, and SAS.
  • Location : Situated in the greater Dallas region—recently rated by Forbes magazine as the #1 “Best City for Jobs”—UT Dallas provides students with easy access to employers and internship opportunities, not to mention a large and supportive alumni population.

Career Opportunities

Graduates of the Social Data Analytics and Research master’s program have gone on to pursue a wide variety of careers, such as:

  • Data Scientist or Analyst
  • Data Mining Specialist
  • Database Manager
  • Statistician
  • Program Evaluation Analyst
  • Decision Support Analyst
  • Research Analyst
  • Opinion Polling Statistician
  • Community Intelligence Expert
  • Information Resource Analyst.

Marketable Skills

Review the marketable skills for this academic program.

Application Deadlines and Requirements

Please take note of all application deadlines and visit the Apply Now webpage to begin the application process.

Applicants to the Social Data Analytics and Research master’s degree program should have:

  • A baccalaureate degree or its equivalent from an accredited institution of higher education.
  • A grade point average (GPA) of 3.0 out of a 4.0 scale.
  • Test Scores: A verbal score of 150 and a quantitative score of 150 on the GRE.
  • Letters of Recommendation: Applicants must submit three letters of recommendation from individuals who can judge the candidate’s potential for success in the master’s degree program.
  • Admissions Essay: Applicants must submit a one-page essay outlining personal background, education and professional objectives.
  • International applicants must submit a TOEFL score of at least 80 on the internet-based test. Scores must be less than two years old. See the Graduate Catalog for additional information regarding English proficiency requirements for international applicants.

About the School of Economic, Political and Policy Sciences

Every new generation inherits a world more complex than that of its predecessors, which prompts a need for new thinking about public policies that impact people’s daily lives. In the School of Economic, Political and Policy Sciences (EPPS), we examine the implications of innovation and change for individuals and communities. The social sciences are where the world turns to for answers to the important issues of today and the future such as education and health policy, financial crises, globalization, policing, political polarization, public management, terrorism, and the application of geographical information sciences to study social, economic and environmental issues.

As an undergraduate in EPPS, you will have the opportunity to work with professors who are probing issues that will affect your future. You will develop the vital skills you need to thrive in a rapidly evolving, highly competitive job market. EPPS will prepare you for careers in government, non-profits and the private sector that enable you to make a real difference in the world of today and tomorrow. EPPS is at the forefront of leadership, ethics and innovation in the public and nonprofit sectors. Our students and faculty look forward to new opportunities to study and address the complex and evolving issues of the future. Research informs much of the instruction. The school has four centers of excellence:

  • Center for Global Collective Action
  • Texas Schools Project
  • Institute for Urban Policy Research
  • The Negotiations Center

Degrees Offered

Bachelor of Science and Bachelor of Arts : Criminology , economics , geospatial information sciences , international political economy , political science , public affairs , public health , public policy , sociology

Master of Science : Applied sociology , criminology , economics , geospatial information sciences , international political economy , social data analytics and research

Master of Arts : Political science

Master of Public Affairs : Public affairs

Master of Public Policy : Public policy

Doctor of Philosophy : Criminology , economics , geospatial information sciences , political science , public affairs , public policy and political economy

Certificates

EPPS offers the following 15-hour graduate certificates, which generally can be completed in one year of part-time evening classes:

  • Economic and Demographic Data Analysis : focusing on the understanding and application of quantitative analysis of demographic and economic data.
  • Geographic Information Systems (GIS) : focusing on the application of GIS in government, private sector and scientific areas.
  • Geospatial Intelligence : focusing on the application of geospatial ideas and techniques to national security and other intelligence activity.
  • Local Government Management : designed to broaden knowledge of important issues and approaches employed by professional local public administrators.
  • Nonprofit Management : designed to provide an overview of the nature and context of nonprofit organizations and develop competencies needed by nonprofit managers.
  • Program Evaluation : designed to provide students the opportunity to gain competencies in the design and implementation of program evaluations in fields such as education, health care, human services, criminal justice and economic development.
  • Remote Sensing : focusing on remote sensing and digital image processing.

Contact Information

For additional information or inquiries, please contact [email protected] .

School of Economic, Political and Policy Sciences The University of Texas at Dallas 800 W. Campbell Rd., GR 31 Richardson, TX  75080-3021

epps.utdallas.edu


Exploratory Data Analysis in Social Science Research


Political science has taken a turn towards causal inference in the last two decades, evidenced by the focus of methods courses in graduate school and the methodological leanings of publications in top journals of the field. Though understanding the causes of effects and effects of causes is an important enterprise, this trend has, at times, come at the expense of grounding research in good research questions and theory. Finding the right research question and building good theories is a difficult task. A core component of this task is descriptive inference, or the process of describing the world as it exists. Descriptive research can help us establish patterns and puzzles - empirical realities - in the world around us and therefore, craft research questions worth asking. Describing the state of the world can also contribute to building theories to answer those questions.

Often the starting point for descriptive research is exploring existing datasets. This process, which I am calling exploratory data analysis, can be critical in unearthing puzzling empirical patterns, establishing associations between variables, finding predictors of outcomes, and being in conversation with the existing literature on a topic. Consequently, exploratory data analysis also lends itself to a variety of techniques, skills, and methods, such as data cleaning, recoding variables, regression analysis, and of course, machine learning. As a PhD student in the process of proposing my dissertation project, exploring existing datasets has been at the center of my research. My proposed dissertation aims to ask whether there is a gender gap in political ambition for political careers such as elected office, political activism, and leadership in political party organizations, and how women’s political ambition can be increased. I explore these research questions in India.

Exploring the 2022 YouGov-CPR-Mint Data

I conducted exploratory data analysis on survey data collected in India by YouGov-Center for Policy Research-Mint in 2022, which asked citizens questions about their political ambition for a career in politics. Specifically, the survey asked whether individuals would consider making politics their career and if they said no, what the reason was. The survey also collected respondents’ demographic information, opinions on Indian politics and the state of the Indian economy, participation in political activities, and level of satisfaction with their personal freedoms.

Some of the questions I explored through this dataset were:

  • Previous political science research has found a gender gap in political ambition for office (Fox and Lawless 2014, Schneider et al. 2016); that is, women are less likely to have considered running for office than men. Does this gender gap in political ambition for office exist in India?
  • What are the reasons for lack of political ambition among individuals, and do these reasons differ for men and women?
  • Is the gender gap in ambition particular to political careers, or are women in general less ambitious than men?
  • How do politically ambitious women compare to non-politically ambitious women on other indicators of political participation?
  • What are the most important predictors of women’s political ambition?

My exploratory analysis consisted of three key components. First, I cleaned and recoded the data. Second, I created cross-tables of different variables and conducted difference-in-means t-tests. This was to explore whether the differences I observed were significant or purely due to chance. Third, I trained a machine learning model (random forest) to find important predictors of political ambition.

I find that there is a substantial gender gap in political ambition, but not an ambition gap writ large. The most important inhibitor of women’s political ambition is that they are not interested in politics as a career and have other interests instead. Political participation indicators are among the leading predictors of women’s political ambition. Many of these findings will motivate the proposal for my dissertation.

Data Exploration Results

Political scientists have consistently found that women are less likely to have considered running for elected political office (Fox and Lawless 2014, Schneider et al. 2016). I wanted to know if this pattern existed in India as well. The survey asked respondents if, “Given an opportunity, would you make politics your career?” and respondents could choose to answer yes, no or don’t know/can’t say. Figure 1 below shows the crosstabulation of respondents’ answers by their gender. I found a large gender gap in political ambition – women were more than 8 percent less likely to consider making politics their career than men (Figure 1).


Figure 1: Respondent Political Ambition by Gender

I then conducted a difference-in-means test for average political ambition by gender – testing whether the average political ambition among men and women differed significantly or purely by chance – and found that the difference was not only large but also statistically significant, as shown by the non-overlapping confidence intervals (Figure 2).


Figure 2: Difference in Means of Political Ambition by Gender
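A minimal sketch of this kind of difference-in-means test, assuming Python with scipy; the data are simulated for illustration and are not the YouGov-CPR-Mint survey responses.

```python
# Two-sample t-test on a simulated binary "would consider politics" indicator by gender.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
ambition_men = rng.binomial(1, 0.32, size=500)     # 1 = would consider politics
ambition_women = rng.binomial(1, 0.24, size=500)

t_stat, p_value = stats.ttest_ind(ambition_men, ambition_women)
gap = ambition_men.mean() - ambition_women.mean()
print(f"gap = {gap:.3f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```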

Next, I wanted to know whether women in India were less ambitious than men in general. Given that India is a patriarchal society, with strong gender hierarchies, it is possible women would express lower desire for any profession outside the household, beyond politics.

The survey asked respondents whether they would want to be businesspeople or entrepreneurs if they had the opportunity. I used this question as a proxy for ambition for an alternative career outside the home. Not only were women more likely to be interested in being businesspeople or entrepreneurs relative to politics, they were also only 3 percent less likely than men to be interested in being businesspeople or entrepreneurs (Figure 3). In other words, the lack of ambition for politics as a career was not a story about lack of ambition at large.


Figure 3: Respondent Entrepreneurial Ambition by Gender

To examine the reasons why some men and women said they do not wish to make politics their career, I created a crosstable of their reasons by gender (Table 1). The most common reason across genders is that respondents were either not interested in politics or they had other career interests and options. As expected, more women than men felt they did not have the requisite skills to be successful politicians. Surprisingly, men and women felt that they didn’t have the personal ties to succeed in politics and that politics is corrupt at similar rates.


Table 1: Crosstable for Lack of Political Ambition by Gender

Lastly, I used a random forest model, trained to predict whether a woman responded that she had political ambition, to find the most important predictors of political ambition. Figure 4 shows a random forest importance plot, which uses the mean decrease in accuracy (on the x-axis) to capture the importance of each feature. The mean decrease in accuracy tells us how much the model’s predictive accuracy drops when that variable’s information is removed (for example, by permuting its values), so larger values indicate more important predictors.

Strikingly, variables capturing an individual’s political participation are the most important predictors of women’s political ambition. This observation is intuitive – women who are more active participants in politics (they vote, protest, attend election meetings and rallies, or volunteer for social causes) would also be more likely to have considered a more active role in politics. Respondents’ area of residence and birth year are also important predictors of political ambition. This would indicate that where an individual lives could influence their political ambition – for instance, states in India (such as Kerala) with more matriarchal norms may have a differential effect on political ambition of women than states with more patriarchal norms. Age can also influence a woman’s political ambition – older women may express lower ambition than younger women. Surprisingly, predictors such as caste or income of the respondent exhibited low importance in predicting political ambition.


Figure 4: Random Forest Importance Plot
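A minimal sketch of a random forest importance analysis of this kind, assuming Python with scikit-learn; the features and outcome are simulated for illustration, not the survey data used above, and permutation importance is used here as a stand-in for the mean-decrease-in-accuracy measure.

```python
# Fit a random forest on simulated data and rank features by permutation importance.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 1000
X = pd.DataFrame({
    "votes":            rng.integers(0, 2, n),
    "attends_rallies":  rng.integers(0, 2, n),
    "birth_year":       rng.integers(1950, 2005, n),
    "household_income": rng.normal(50, 15, n),
})
# Simulated outcome in which participation-related features matter most.
y = (0.8 * X["votes"] + 0.6 * X["attends_rallies"] + rng.normal(0, 0.5, n)) > 1.0

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# How much accuracy drops when each feature's values are shuffled.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name:18s} {imp:.3f}")
```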

This exploratory data analysis has given me ample insight into what political ambition for office could look like in India, why individuals choose not to make politics their career, and predictors of women’s political ambition in the country. In conducting this data analysis, I was able to find evidence, though not causal, that either supported or contradicted existing theories in political science that attempt to explain women’s political ambition or lack thereof. Going forward, my dissertation proposal will use these insights to propose the following research directions:

This survey, like others used in political science research, conceptualized political ambition as a career in politics which is akin to asking if one wants to be a politician or run for elected office. This may be a narrow conceptualization of what political ambition means. So I ask, does a gender gap still persist if we conceptualize political ambition more broadly to include everyday forms of politics that are increasingly found in democracies around the world, such as grassroots activism, political non-profit work, and other forms of social mobilization? If so, why does this gender gap in political ambition exist?

Given the reasons why certain women do not have political ambition, how do we increase their ambition for various political careers? Can we design interventions, perhaps targeting women who are already ambitious, that encourage them to run for office or become political activists or involve themselves in politics in some way?

Some social scientists once said that good description is better than a bad explanation (King, Keohane, and Verba 2021) - doing careful descriptive research can provide invaluable insight into how the world works and exploratory data analysis is one important way to do this. Social scientists should endeavor to use the rich sources of existing data to motivate and formulate their research questions, ground their theories in reality, and explain phenomena in the world.

  • Fox, R. L., & Lawless, J. L. (2014). Uncovering the Origins of the Gender Gap in Political Ambition. American Political Science Review, 108(3), 499–519. https://doi.org/10.1017/S0003055414000227
  • Schneider, M. C., Holman, M. R., Diekman, A. B., & McAndrew, T. (2016). Power, Conflict, and Community: How Gendered Views of Political Power Influence Women’s Political Ambition. Political Psychology, 37(4), 515–531. https://doi.org/10.1111/pops.12268
  • King, G., Keohane, R. O., & Verba, S. (2021). Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton University Press.

Kamya Yadav


Statistics: A Tool for Social Research and Data Analysis | 11th Edition


With a new emphasis on the same "real data" professionals use to make evidence-based decisions, Healey's STATISTICS: A TOOL FOR SOCIAL RESEARCH AND DATA ANALYSIS, 11th edition, and the MindTap digital learning solution introduce the fundamental concepts of statistics and their practical application to contemporary social issues. Examples from daily life illustrate the practical value of statistics in government, education, business, media, politics and sports. A student friendly approach breaks down complex material and helps learners understand the importance of statistics -- no advanced mathematical knowledge required. Students gain skills they will need as professionals in a social science field and as statistically literate consumers of social research.



Data Collection Methods

This InfoGuide assists students starting their research proposal and literature review.


Quantitative and qualitative data can be collected using various methods. It is important to choose a data collection method that helps answer your research question(s).

Many data collection methods can be either qualitative or quantitative. For example, in surveys, observational studies or case studies, your data can be represented as numbers (e.g., using rating scales or counting frequencies) or as words (e.g., with open-ended questions or descriptions of what you observe).

However, some methods are more commonly used in one type or the other.
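As a rough illustration (the column names and responses below are invented, not drawn from any particular study), the same survey instrument can yield both kinds of data: a closed rating item stored as numbers for statistical summary, and an open-ended item kept as text for later qualitative coding.

```python
# Illustrative only: one survey producing both quantitative and qualitative data.
import pandas as pd

responses = pd.DataFrame({
    # Quantitative: a 1-5 rating scale, analysed numerically.
    "satisfaction_rating": [4, 2, 5, 3, 4],
    # Qualitative: open-ended text, kept verbatim for thematic or content coding.
    "open_comment": [
        "The service was quick and friendly.",
        "Waited too long and nobody followed up.",
        "Exactly what I needed.",
        "It was fine, nothing special.",
        "Staff went out of their way to help.",
    ],
})

print(responses["satisfaction_rating"].describe())  # numeric summary of the scale item
print(responses["open_comment"].tolist())           # raw text held for qualitative coding
```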

Quantitative & Qualitative Data Collection Methods




Social Research and Analysis (MA) – STEM Designated Degree Program

The Master of Arts in Social Research and Analysis is a dynamic degree program that trains students to harness the power of data to improve programs, change social policies, market a product, execute an advertising campaign and inform business decision making.

Students in the program gain valuable skills in survey writing, focus groups moderating, ethnographic research, data analysis and data and text mining. They can tailor their program by taking electives in social policy research, business analytics, communications and media, or earth and environmental sciences.

We Are Flexible: Our 10-course/30-credit Social Research and Analysis MA program is designed for full- or part-time study, with program completion in 16–20 months.

You may start in the Fall or Spring semester and enroll in both hybrid modalities (blend of in-person and virtual) and online courses:

  • Hybrid Program Option – Take a mix of on-campus and online courses for a customized learning experience. Choose this option if you can come to campus for at least one course per semester. You will still be able to choose online courses. This choice will give you more opportunities to interact with students and faculty in person. We are accepting students seeking F1/J1 visas for the Fall 2024 semester.*
  • Online Program Option – Take all your classes online from any location, in the United States or anywhere in the world. Online courses are offered in both synchronous and asynchronous formats. All of the course offerings provide you with opportunities to engage fully with your peers and professors online. 

Master’s Students can also add Certificates in Customer and User Experience Research (CX and UX Research) , Data Collection and Management , and Business Analytics with little additional coursework.

Students learn to use software like  SPSS ,  R ,  Qualtrics  and  NVivo  to analyze data and make evidence based decisions. The Master’s can also become a stepping stone to a PhD.

The Office of Graduate Admissions requires a U.S. bachelor’s degree, or the equivalent, in order to be eligible to apply for the MA in Social Research and Analysis graduate program. Some advanced undergraduates qualify to begin the program before they graduate. Applicants with non-U.S. degrees, please visit the International Applicants  page to review the U.S. degree equivalency information.

In order to make applying for graduate school as seamless as possible for you, we have created an application checklist. This checklist can be a reference point for you during the application process to ensure that you have a comprehensive understanding of the steps needed to apply, as well as all corresponding supplemental materials for your specific program of interest.

  • Application Deadline: Rolling Admission.
  • Submit Online Application: Please create your online account and submit your application by following the general application instructions. The $60 application fee will be waived for attendees at Open Houses, Information Sessions and Information Webinars.

The following is a list of the supplemental materials that will accompany your application for the Social Research and Analysis (MA) program:

  • Transcript: One from each college attended.
  • Essay: What are your goals for graduate study and your future career? In what ways do your academic background and your professional experiences provide evidence of your potential for success in the program you selected and in your eventual career? Please give specific examples of relevant coursework and/or experience. Is there any further information we should consider in assessing your candidacy?
  • Letters of Recommendation (Optional): You may optionally submit up to two (2) letters of recommendation, from persons qualified to evaluate the applicant’s promise of academic achievement and potential for professional growth. However, we do encourage applicants with an undergraduate GPA below a 3.0 to consider submitting at least one recommendation letter.
  • GRE: Not required.
  • Applicants with non-U.S. degrees, please visit the  International Applicants page  to review the US degree equivalency information.

If you have any general questions regarding the application process and requirements, please email or call us: Office of Graduate Admissions Email: [email protected] Telephone:  973-655-5147 Fax: 973-655-7869

If you have specific inquiries regarding your program of interest, please contact the Social Research and Analysis (MA) Program Coordinator:

Program Coordinator: Christopher Donoghue Office: Dickson Hall 312 Email: [email protected]


A look at Black-owned businesses in the U.S.

The owner of Marcus Book Store, the oldest Black-owned bookstore in the U.S., talks with her employee about a shop display in Oakland, California, in December 2021. (Amy Osborne/The Washington Post via Getty Images)

More than one-in-five Black adults in the United States say owning a business is essential to financial success, according to a September 2023 Pew Research Center survey . While Black-owned businesses have grown significantly in the U.S. in recent years, they still make up a small share of overall firms and revenue, according to our analysis of federal data.

Pew Research Center conducted this analysis to examine the characteristics of Black-owned businesses in the United States. The analysis relies primarily on data from the 2022  Annual Business Survey  (ABS), conducted by the U.S. Census Bureau and the National Science Foundation’s National Center for Science and Engineering Statistics.

The survey – conducted annually since 2017 – includes all non-farm U.S. firms with paid employees and receipts of $1,000 or more in 2021. Firms are defined as businesses “consisting of one or more domestic establishments under its ownership or control.” Majority business ownership is characterized in the survey as having 51% or more of the stock or equity in the firm. The Census Bureau counts multiracial firm owners under all racial categories they identify with; Hispanic firm owners may be of any race. Read more about the ABS methodology .

A bar chart showing that about 3% of U.S. businesses were Black- or African American-owned in 2021.

In 2021, there were 161,031 U.S. firms with majority Black or African American ownership , up from 124,004 in 2017, according to the latest estimates from the Annual Business Survey  (ABS), conducted by the U.S. Census Bureau and the National Science Foundation. Black-owned firms’ gross revenue soared by 43% during this timespan, from an estimated $127.9 billion in 2017 to $183.3 billion in 2021.

Despite this growth, majority Black-owned businesses made up only about 3% of all U.S. firms that were classifiable by the race and ethnicity of their owners in 2021. And they accounted for just 1% of gross revenue from all classifiable companies that year. By comparison, in 2021, roughly 14% of all Americans were Black.

As has  long been the case , White majority-owned businesses made up the greatest share of classifiable firms (85%) and their revenue (93%) in 2021. About one-in-ten classifiable firms (11%) were majority-owned by Asian Americans, and no more than 7% had majority ownership by someone from another racial and ethnic group.

The Annual Business Survey classifies businesses as “majority Black- or African American-owned” if a Black owner has at least 51% equity in the firm. The same standard holds for business owners of other racial and ethnic backgrounds. The U.S. Census Bureau counts multiracial firm owners under all racial categories they identify with; Hispanic firm owners may be of any race. 

Not all U.S. businesses are classifiable by the race or ethnicity of their owners. In 2021, about 4% of all businesses in the U.S. were  not  classifiable by the race and ethnicity of their owners – though these firms accounted for 61% of total revenue. Ownership and revenue figures in this analysis are based on the roughly 5.7 million firms that  were  classifiable by the race and ethnicity of their owners in 2021, most of which are smaller businesses.

How many workers do Black-owned businesses employ?

Black or African American majority-owned firms provided income for roughly 1.4 million workers in 2021. Their annual payrolls were estimated at $53.6 billion.

Still, most Black-owned firms tend to be smaller businesses. Two-thirds had fewer than 10 employees in 2021 ; 13% had 10 to 49 employees and just 3% had 50 or more. Another 16% reported having no employees. (The ABS determines employment size by the number of paid workers during the March 12 pay period.)

What’s the most common sector for Black-owned businesses?

By far, health care and social assistance. About 45,000 of the roughly 161,000 U.S. companies with majority Black or African American ownership, or 28% of the total, were part of this sector in 2021.

Looked at a different way, 7% of  all  classifiable U.S. businesses in the health care and social assistance sector were majority Black-owned that year .

A chart showing that health care and social assistance is the most common sector among Black- or African American-owned businesses.

Other common sectors that year included:

  • Professional, scientific and technical services (comprising 14% of all Black-owned businesses)
  • Administrative and support and waste management and remediation services (8%)
  • Transportation and warehousing (8%)
  • Retail trade (6%)
  • Construction (6%)

Where are Black-owned businesses located?

A map showing that Black- or African American-owned businesses made up greatest share of firms in District of Columbia, Georgia and Maryland in 2021.

Most Black or African American majority-owned businesses (87%) are located in urban areas. Just 5% are in rural areas – that is, places with fewer than 2,500 inhabitants, under  the Census Bureau’s definition .

Some of the most populous states also have the greatest number of Black majority-owned businesses. Florida had 18,502 such businesses in 2021, California had 15,014 and Georgia had 14,394.

Black majority-owned businesses made up the greatest  share  of all classifiable firms in the District of Columbia (15%), Georgia and Maryland (8% each).

Who are Black business owners?

  • They’re more likely to be men than women. Some 53% of Black-owned firms in 2021 had men as their majority owners, while 39% had women majority owners. Another 8% had equal male-female ownership. The gender gap is larger among classifiable U.S. firms overall: 63% were majority-owned by men in 2021, 22% were majority-owned by women and 14% had equal male-female ownership.
  • They tend to be middle-aged. Roughly half (49%) of Black or African American business owners who reported their age group were ages 35 to 54 in 2021. Another 28% were 55 to 64, and just 7% were younger than 35.
  • A majority have a college degree. Among owners who reported their highest level of education completed, 27% had a bachelor’s degree and 34% had a graduate or professional degree in 2021.

What motivates Black entrepreneurs?

When asked to choose from a list of reasons why they opened their firm, about nine-in-ten Black or African American majority owners who responded said an important reason was the opportunity for greater income; a desire to be their own boss; or wanting the best avenue for their ideas, goods and services. Balancing work and family life (88%) and having flexible hours (85%) were also commonly cited.

For most Black or African American majority owners, their business is their primary source of income . Seven-in-ten of those who reported income information in 2021 said this was the case.

Note: This is an update of a post originally published on Feb. 21, 2023.



Published February 23rd 2024

How B2B Brands Can Benefit from Social Listening

While it’s clear how social listening benefits large consumer brands, what about B2B companies that sell their services to other businesses?

Many B2B organizations wrongly assume that monitoring conversations online is less valuable for them compared to consumer-facing brands. Those brands have hundreds of thousands of buyers and millions more potential ones. They generate high volumes of online discussions about them, have a dedicated social media team, and plenty of data to analyze for insights.

But what about B2B companies that, though hugely successful and often large, don’t get mentioned much online and perhaps have fewer customers?

Based on Forrester’s 2023 B2B Brand And Communications Survey, 90% of B2B buyers rely on peer recommendations, and 85% trust user reviews from vendors in their industry. And there are many third-party review websites, like TrustPilot, G2, Capterra, TripAdvisor, Glassdoor, and even Reddit , that collect user feedback independently of the businesses featured. 

Without monitoring those conversations, B2B brands miss key consumer intelligence that provides opportunities to build awareness, engagement, and trust throughout the (often) long buyer’s journey. 

An active listening and engagement strategy allows B2B companies to better influence customers and decision makers in their industry.

How can brands quantify the available information online and determine what really matters?

7 of the biggest use cases for social listening in B2B

Below, we’ve outlined the biggest use cases for social listening in the B2B space to show how companies can mine online networks for insights and to meet their higher-level business objectives.

1. Gaining market intelligence and deep consumer insights

Unless you’re in an industry that generates a lot of online chatter, like retail or entertainment , you can expect the volume of online conversation about your brand to be much lower. But fewer conversations don’t necessarily equal fewer valuable data points. It can be quite the opposite.

Online discussions with fewer mentions often tend to be more focused, and it takes less time to analyze all relevant posts. When gathering online mentions, do make sure you’ve checked all available data sources a social listening tool can track. 

To quote the Executive Managing Director at RF|Binder, Jeff Melton: “Focus groups are helpful, but I love social listening because there’s not as much bias. I love that saying that it’s the ‘largest untapped focus group.’ People are pretty damn honest.” 


2. Generating prospects and driving sales

Most analytics tools recommend that companies begin by monitoring branded conversations. However, many B2B companies lack a strong online brand presence, and they rarely initiate discussions related to their brand with their online audience.

In our recent report, The State of Social, we revealed a noteworthy statistic that may raise concerns for brands. Did you know that across all industries, brand-owned accounts contribute just 1.51% to the overall brand-related conversation?

As an alternative, brands can monitor all relevant online discussions, including those about competitors, and keep an eye on the audience that matters most to them. They can contact these people and companies and attempt to nurture them into sales.

Tracking and proactively responding to online conversations from potential prospects is more meaningful than trying to surpass your competitors in ad spending as part of your customer acquisition plan.


3. Handling your next issue (before it turns into a crisis)

With social listening, you can easily find out when someone complains on social media.

Don’t shy away from criticism. By responding immediately and resolving the issue, your customers become even more loyal. If you don’t know how to address a customer complaint, you can split up mentions into categories and distribute them to different team members or experts in the field.

When a crisis is looming for your company, there is a good chance you will hear about it on the web first. That is, if you are listening for it.

When Slack experienced a widespread outage lasting five hours, it left many users unable to access the platform. 

With social listening, Slack was able to understand why they were trending on social and address consumer concerns quickly and authentically, generating positive sentiment in conversations along the way. 


4. Identifying brand advocates and influencers in your space

One of the biggest benefits of monitoring social media is fostering strong connections with brand champions and those who are most influential in the industry. B2B companies can find company or industry critics, listen to what they are saying, and respond suitably. 

When creative agency Supernatural used social listening to uncover data about consumer airplane behavior and air travel etiquette, they gained valuable insights from millions of travel enthusiasts. Armed with this information, the agency helped KAYAK launch a successful campaign aimed at decoding modern travel norms. 

Similarly, global nonprofit Potential Energy Coalition (PEC) used social listening to track online discussions related to climate change. Through their research, the PEC team found important topics and public opinions and uncovered a highly influential audience within their communities: moms. The nonprofit then launched its Science Moms campaign, aiming to engage influencers in the space to inspire change.

5. Keeping tabs on the competition

Although branded conversation for B2B companies is generally lower than for B2C brands, you might be surprised to see the amount of chatter surrounding your competitors, including discussions about what they’re not doing.

Doing an online competitive analysis can be highly beneficial for B2B brands. You can estimate your share of voice compared to your competitors and assess changes in online performance. Brands can also monitor channel growth and media coverage and gain insights into customer perception and competitive advantages. 

For instance, media and creative agency Helen & Gertrude regularly use social listening to help clients better understand their competitive landscape. When monitoring competitors for their clients, the agency can spot competitor announcements or launches that could’ve been overlooked otherwise, and see how competitor campaigns are running.

The agency team can then use this data to show their clients how competitors’ campaigns are doing, both in numbers and in the type of response they're getting online.


6. Discovering and leveraging product feedback

Social media gives companies the ability to get quick, unincentivized feedback they require to stay agile. This can be especially helpful for businesses that don’t have the budget for research and development.

Identifying what people like or dislike about your product will help you understand how to better satisfy customers’ and prospects’ needs.

We once explored what people were saying online about music streaming services . What we found is that many people complained about unwanted ads, getting wrong recommendations, and dealing with in-app bugs. 

In our State of Social report, we showed that consumers often compare different streaming services online and complain about problems with buffering and freezing. 

B2B brands that proactively monitor brand and industry-related conversations are more likely to discover valuable feedback they can use to improve their products and services and to enhance customer satisfaction . 

When Fetch Rewards used social listening to gather feedback on product features, they were able to make improvements based on real-time user insights. On one occasion, they monitored online user reactions around a new feature, which allowed people to connect with friends in the app. This helped Fetch Rewards analyze consumer perceptions in real time, identifying areas for product enhancement and leading to a more refined and user-friendly app experience.

7. Checking your brand health and establishing benchmarks

Social listening can help B2B brands establish benchmarks for themselves. Whether it’s comparing brands’ current and past performance or measuring against established industry metrics, benchmarks can help brands in their decision-making and strategic adjustments.

Here are just a few metrics brands should consider and benchmark against (a minimal calculation sketch follows the list):

  • Share of voice
  • Sentiment analysis
  • Emotion analysis
  • Brand associations
  • Price perception
  • Net Promoter Score (NPS)
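As a rough illustration with made-up mention counts (not output from Brandwatch or any other tool), two of these benchmarks reduce to simple ratios: share of voice is a brand's mentions divided by all tracked mentions, and net sentiment contrasts positive and negative mentions.

```python
# Illustrative benchmark calculations with made-up mention counts.
mentions = {"OurBrand": 1200, "CompetitorA": 3400, "CompetitorB": 900}
total_mentions = sum(mentions.values())

# Share of voice: each brand's mentions as a share of all tracked mentions.
for brand, count in mentions.items():
    print(f"{brand}: {count / total_mentions:.1%} share of voice")

# Net sentiment: (positive - negative) / all classified mentions.
sentiment_counts = {"positive": 420, "neutral": 610, "negative": 170}
classified = sum(sentiment_counts.values())
net_sentiment = (sentiment_counts["positive"] - sentiment_counts["negative"]) / classified
print(f"Net sentiment: {net_sentiment:+.2f}")
```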


Good luck on your quest to social media intelligence

Navigating the B2B business landscape can be complex, but social listening unveils untapped potential for brands seeking to understand their audience, competitors, position in the market, and the industry at large. 

By using social listening to understand what consumers genuinely care about, brands can gain a competitive edge and improve their strategies, leading to better business outcomes.

Ksenia Newton

Content Marketing Manager



  • Open access
  • Published: 29 February 2024

What methods are used to examine representation of mental ill-health on social media? A systematic review

  • Lucy Tudehope (ORCID: orcid.org/0000-0002-9544-1006)
  • Neil Harris (ORCID: orcid.org/0000-0002-1786-3967)
  • Lieke Vorage (ORCID: orcid.org/0000-0002-5744-189X)
  • Ernesta Sofija (ORCID: orcid.org/0000-0002-4761-9762)

BMC Psychology, volume 12, Article number: 105 (2024)


There has been an increasing number of papers which explore the representation of mental health on social media using various social media platforms and methodologies. It is timely to review methodologies employed in this growing body of research in order to understand their strengths and weaknesses. This systematic literature review provides a comprehensive overview and evaluation of the methods used to investigate the representation of mental ill-health on social media, shedding light on the current state of this field. Seven databases were searched with keywords related to social media, mental health, and aspects of representation (e.g., trivialisation or stigma). Of the 36 studies which met inclusion criteria, the most frequently selected social media platforms for data collection were Twitter ( n  = 22, 61.1%), Sina Weibo ( n  = 5, 13.9%) and YouTube ( n  = 4, 11.1%). The vast majority of studies analysed social media data using manual content analysis ( n  = 24, 66.7%), with limited studies employing more contemporary data analysis techniques, such as machine learning ( n  = 5, 13.9%). Few studies analysed visual data ( n  = 7, 19.4%). To enable a more complete understanding of mental ill-health representation on social media, further research is needed focussing on popular and influential image and video-based platforms, moving beyond text-based data like Twitter. Future research in this field should also employ a combination of both manual and computer-assisted approaches for analysis.


Introduction

In the last few decades, and particularly in the wake of the COVID-19 pandemic, the threat mental illness poses to public health has been increasingly recognised. The World Health Organization defines mental health as “a state of mental well-being that enables people to cope with the stresses of life, realize their abilities, learn well and work well, and contribute to their community” (World Health Organization, 2022, p. 8). However, this review is focused on mental ill-health, an umbrella term to refer to an absence of this state of well-being either through mental illness/disorder or mental health problems [ 1 , 2 ]. A global burden of disease study to quantify the impact of mental and addictive disorders estimated that 16% of the world’s population were affected by some form of mental or addictive disorder in 2019, and suggests these conditions contribute to 7% of total disease burden as measured by disability adjusted life years (DALYs) [ 3 ]. Although the age-adjusted rates of DALYs and mortality for all disease causes have steadily declined in the last 15 years by 30.4% and 16.3% respectively, these rates have only increased for mental disorders by 4.3% and 12% respectively [ 3 ].

Despite the benefits and effectiveness of modern medicine, therapies and community support programs for those with mental health conditions, engagement with mental health support is often very poor [ 4 ]. Even for individuals who do eventually seek mental health care, the delay between symptom onset and treatment averages more than a decade [ 5 ]. The consequences of such delays in help-seeking can include adverse pathways to care [ 6 ], worse mental health outcomes [ 7 ], drug and alcohol abuse [ 8 ] and suicide [ 9 ]. While there are many potential barriers to the help-seeking process, significant previous research has demonstrated that attitudes towards mental illness, in particular stigma, are key factors preventing individuals from translating a need for help into action [ 9 , 10 , 11 ]. Stigma is a term often used in a broad sense to refer to discriminatory and negative beliefs attributed to a person or group of people [ 12 ]. However, in order to design evidence-based and effective stigma reduction interventions, a nuanced understanding of current societal views and attitudes towards mental ill-health is first necessary.

Historically, many studies investigating public stigma towards mental illness have focussed on traditional media (e.g., print or television news media), but more recently the wealth of information provided by social media has been recognised. Researchers are now harnessing social media as a powerful tool for public health research, for example in the fields of epidemiology and disease surveillance [ 13 , 14 ], chronic disease management and prevention [ 15 ], health communication [ 16 ] and as an effective platform for intervention strategies [ 17 ].

Social media allows individuals to share user-generated or curated content and to interact with others [ 18 ]. It has become a central means for people to share their experiences and express their thoughts, opinions, and feelings towards issues. Access to such information and opinion has significant potential to influence the attitudes and health behaviours of social media users [ 19 ]. It can perpetuate negative stereotypes and increase stigma, but it can also provide a platform for discussion and sharing of personal experiences, potentially helping to reduce stigma and, in turn, facilitate help-seeking behaviour. It must also be noted that persons living with mental illness are known to have higher rates of social media use in comparison to the general population, and are therefore at high risk of exposure to potentially negative or misrepresenting mental health content [ 20 ]. As such, social media presents a valuable research tool for investigating the attitudes of society toward mental ill-health.

Much of the previous research surrounding mental health and social media focuses on the effects of extensive social media use on psychological health and wellbeing [ 21 ] and utilizing machine learning to detect and predict the mental health status of users [ 22 ]. However, there has been a recent surge in studies using social media data to reveal attitudes and perceptions towards mental-ill health more broadly and towards specific mental health conditions. Despite the growing interest in this field and its importance to public mental health, no attempts have been made to systematically review these studies. The current state of research is heterogenous with various research designs, data collection and data analysis techniques employed to analyse social media data. A methodological review is needed to provide researchers and health professionals with an overview of the current state of the literature, demonstrate the utility of various methods and provide direction for future research.

Therefore, the aim of this systematic literature review is to provide a comprehensive overview and evaluation of the current research methods used to investigate the representation of mental ill-health on social media. The review critically appraises the quality of these studies, summarises their methodological approaches, and identifies priorities and future opportunities for research and study design.

Search strategy and screening procedure

Seven databases were systematically searched on September 27, 2022, including Ovid MEDLINE (via Ovid), PsycINFO (via Ovid), CINAHL (via EBSCO), SCOPUS and the ProQuest Public Health, Psychology and Computer Science Databases. Searches were filtered to present only peer-reviewed journal articles and studies published in English, and terms were applied to the title and abstract fields for each database where possible. Search terms related to [ 1 ] social media (e.g., “social platform”, “online social network*”, “user-generated”), [ 2 ] mental health (e.g., “depress*”, “anxiety”, “schizo*”) and [ 3 ] either relevant method (e.g., “(content or discourse or thematic) adj3 analy*”) or terms to reflect representation (e.g., “represent*”, “attitude*”, “stigma*”). The full search strategy employed for each database can be found in Additional File 1.

The abstract and citation information for 9,576 records were downloaded and imported into Covidence systematic review software (Version 2), a web-based software specifically designed to facilitate screening, extraction, and quality appraisal. Once imported, duplicate records were automatically identified and removed by Covidence. Each stage of the screening process was carried out by two authors (LT and LV), independently. The title and abstract of 5,373 articles were screened to determine eligibility. If the two reviewers marked a different decision in Covidence, the articles were discussed and reviewers came to a consensus, and if a decision could not be made a third reviewer was consulted (ES or NH). Articles included at the title/abstract level ( n  = 136) were then screened in full text to determine relevance. Reviewers recorded the reason for exclusion. The reference list for each eligible article was then screened for any relevant publications.

This systematic review is registered with the International Prospective Register of Systematic Reviews (PROSPERO, ID: CRD42022361731). The review is reported in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines 2020 [ 23 ]. Figure  1 presents a PRISMA flowchart detailing the systematic review procedure.

Figure 1: PRISMA flow diagram of identification, screening, and inclusion procedure

Eligibility criteria

Peer-reviewed journal articles were considered for inclusion if authors conducted an analysis of user-generated social media content regarding mental ill-health and its representation. To be considered for inclusion, social media content must be posted by individual users, as opposed to content posted on behalf of a group or organisation, e.g., news media or a non-government organisation. All social media platforms except for those considered discussion forum websites such as Reddit and Quora were included. These were excluded from the review because they are considered distinct forms of social media in which content is arranged and centred on subject matter, in contrast to traditional social networking sites which focus on people and their profiles. As a result, the networking dynamics are distinctly different from traditional social media platforms and bring together individuals with specific shared interests and may therefore be less appropriate for analysis of wider public perceptions and representations of mental ill-health. As per the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-V), the scope of the systematic review was narrowed to include social media content regarding any condition classified under ‘schizophrenia spectrum and other psychotic disorders’, ‘bipolar and related disorders’, ‘depressive disorders’, ‘anxiety disorders’ and ‘obsessive-compulsive and related disorders.’ Studies must evaluate content regarding mental health more broadly or focus on a specific mental health condition as listed under these DSM-V classifications. It is beyond the scope of this review to include studies which focus on mental health in a positive sense, i.e., wellbeing, happiness, and positive functioning.

In terms of study design, articles were included if they analysed the content of social media posts and/or comment responses, whether this be text, photo and/or video-based content. Data analysis methods may include but are not limited to content, discourse, thematic or linguistic analysis, and may also include studies which utilised machine learning to facilitate the analysis process. Conference proceedings, articles without accessible full-text or published in a language other than English were also excluded.

Data extraction and synthesis

The data extraction template was developed using a sample of 5 studies. It was then piloted using an additional 5 studies and further refined. Extraction was completed through Covidence by one reviewer (LT) and subsequently checked by a second reviewer (LV). Any issues or questions were discussed and agreed upon by the two reviewers, and a third reviewer (ES or NH) was consulted if a decision could not be made. Extracted data included bibliographic information as well as methodological details, including: (1) aim/objective (2) social media platform and language (3) mental health condition/s (4) comparison to physical condition (yes/no) (5) hashtags/keywords used for search (6) data range and timeframe of collected data (7) number of posts analysed (8) type of data analysis (9) coding framework and development process and (10) coding protocol. The extracted results are presented in a narrative synthesis due to the heterogeneity of the included studies and because this review focuses on the methods of included studies.

Article appraisal

Critical appraisal of the included studies was conducted based on the Critical Appraisal Skills Programme (CASP) guidelines for qualitative research (Critical Appraisal Skills Programme, 2022). This tool contains a checklist of 10 items which assist in the assessment of the appropriateness of the qualitative research design, consideration of ethical issues, the rigour of data collection, analysis and presentation of results and value of the research. Each item was answered with ‘Yes’, ‘Can’t tell’ or ‘No’. Two reviewers (LT and LV) independently applied the CASP checklist for each of the extracted studies. Any disagreements were discussed and resolved between the two reviewers or, if this was not possible, a third independent reviewer (ES or NH) assisted.

The included studies primarily involved the analysis of text-based data derived from social media. When considering the range of critical appraisal tools which could be employed in this systematic review, the CASP tool was selected by the authors because it included items most applicable to this type of analysis, as opposed to qualitative studies involving interview or focus group-based data collection. The authors decided to exclude item 4, “Was the recruitment strategy appropriate to the aims of the research?” and item 6, “Has the relationship between researcher and participants been adequately considered?”, as there was no recruitment of active participants in the included studies. Researcher bias was instead considered when answering ‘Was the data analysis sufficiently rigorous?” by identifying whether authors demonstrated consistency in coding and factored in potential biases. The identification and selection of posts for analysis was considered in the question regarding data collection (item 5).

It must be noted that some of the studies selected for inclusion in the review analyse text-based data in a quantitative manner or conduct additional quantitative analysis of social media reach metrics. These studies were still appraised using the CASP tool, however questions such as “Is a qualitative methodology appropriate?” and “Was the data analysis sufficiently rigorous?” were modified or expanded to include consideration of any quantitative analysis elements. This was deemed more appropriate than employing a mixed-methods appraisal tool, which included items inappropriate or irrelevant to the included studies.

A total of 36 articles met all inclusion criteria and were synthesised in the results. The search yielded 10 articles (27.8%) which were published in 2022, the year the search was conducted. A further 15 articles were published within the previous three years from 2019 to 2021 (41.7%) and 11 were published in 2018 or earlier (30.6%). Figure  2 illustrates the growth in the cumulative number of peer-reviewed publications each year.

Figure 2: Cumulative number of articles published each year and their primary method of analysis

Social media platforms and unit of analysis

As shown in Table 1, various social media platforms were used for the collection of data. Of the 36 included studies, the majority (n = 22, 61.1%) analysed data collected from Twitter. This was followed by 5 studies analysing Sina Weibo (13.9%), 4 studies analysing YouTube (11.1%), 2 studies analysing Instagram (5.6%), 1 study analysing TikTok (2.8%) and 1 study analysing Pinterest (2.8%). One study collected data from a variety of social media platforms (2.8%).

The unit/s of analysis (element of social media post analysed) also varied between studies (refer to Table 2). A total of 28 studies (77.8%), primarily comprising the Twitter and Sina Weibo studies, analysed text-based data. Three studies analysed images (8.3%), two of which also involved analysis of associated captions (5.6%). Four studies analysed video-based content (11.1%). In total, 8 studies (22.2%) conducted an analysis of comments associated with social media posts and 15 (41.7%) analysed reach metrics such as post likes and shares. Only 3 studies (8.3%) included an analysis of any content linked in a social media post, such as an external website, and 14 (38.9%) collected and analysed data based on the social media profile type or demographics of content posters.

Mental health condition/s in focus

The studies analysed social media content relating to one or more mental health conditions as per the review inclusion criteria (refer to Table 1). The most frequent mental health condition was schizophrenia/psychosis, with content analysed in 14 studies (38.9%). This was closely followed by studies focused on mental health/mental illness content more broadly (n = 13, 36.1%), for example by searching for posts using #mentalhealth or ‘mental illness’, and studies which analysed depression (n = 12, 33.3%). Four included studies focused on bipolar disorder (11.1%), three studies focused on obsessive compulsive disorder (8.3%), only two focused on anxiety (5.6%) and one specifically focused on trichotillomania (2.8%).

Although the majority of studies focus solely on social media content related to one mental health condition, four studies (11.1%) include multiple health conditions and compare analysis results between each condition. Budenz et al. [ 25 ] compares content related to mental health/mental illness to content specific to bipolar disorder, while Jansli et al. [ 29 ] compares seven different mental health conditions. Both Li et al. [ 45 ] and Reavley and Pilkington [ 39 ] offer a comparison of schizophrenia/psychosis and depression related social media content. Four studies also incorporated a comparison between mental and physical health conditions into research aims. Studies compare mental ill-health content to diabetes [ 24 , 31 , 40 , 43 ], cancer [ 40 , 43 ], Alzheimer’s disease [ 43 ], HIV/AIDS [ 40 , 43 ], asthma [ 40 ] and epilepsy [ 40 ].

Social media content language and location of researchers

The inclusion criteria specified that studies must be published in English, but studies did not necessarily need to analyse English-based social media content. While 75.0% of studies did analyse English content ( n  = 27), five studies analysed Chinese content (13.9%), two studies analysed Greek content (5.6%), and Turkish, French, and Finnish social media content were each analysed in one study (8.4%) (Refer to Table  1 ).

Over half of the literature in this field is published by researchers affiliated with institutions within the United States ( n  = 19, 52.8%). This is followed by five studies from researchers in the United Kingdom (13.9%), four studies from China (11.1%), four studies from Canada (11.1%), and the remaining articles from researchers in Finland, Greece, Israel, Australia, New Zealand, Spain, Netherlands, and Turkey ( n  = 11, 30.6%).

Study design

Data collection methods

The specific method of data collection varied based on the social media platform analysed. In most studies, authors applied a specific hashtag search relevant to the mental health topic of interest (e.g., #mentalhealth) or entered keywords into the social media platform search bar (e.g., “schizophrenia”). Given the volume of data posted to social media, most studies limited the collection of data to a specified time period, which ranged widely between studies, from 1 day to 10 years.

Several studies aimed to analyse mental health-related social media content based on a particular event or public health campaign, which dictated the timeframe of data collection. Makita et al. [ 33 ] collected data and analysed discourse specifically during Mental Health Awareness Week and Saha et al. [ 41 ] collected data only on World Mental Health Awareness Day. A study by Budenz et al. [ 20 ] collected data before and after a mass shooting event in the United States to identify changes in mental illness stigma messaging. Two studies analysed social media responses to the mental ill-health disclosure of professional athletes [ 36 , 54 ], and one study collected data using the hashtag ‘#InHonorofCarrie’ to examine mental health-related content after the death of mental health advocate and actress Carrie Fisher [ 35 ].

While some authors analysed all posts identified in their social media search, others used specific inclusion/exclusion criteria and/or selection methods to limit the number of posts for further analysis. These included random selection of posts in the search result, selecting only every ‘x’th post, selecting the most viewed/liked/commented posts and/or selecting the first ‘x’ number of posts appearing in search results or each page of search results.
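As a rough sketch (the file and column names are hypothetical, not any included study's actual pipeline), those selection strategies are straightforward to express once the collected posts are in a table.

```python
# Illustrative post-selection strategies for a hypothetical table of collected posts
# with 'text' and 'likes' columns.
import pandas as pd

posts = pd.read_csv("collected_posts.csv")  # assumed file; columns are placeholders

random_sample = posts.sample(n=500, random_state=0)  # simple random selection
every_tenth = posts.iloc[::10]                       # every 10th post in the search results
most_liked = posts.nlargest(500, "likes")            # the 500 most-liked posts

print(len(random_sample), len(every_tenth), len(most_liked))
```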

Primary data analysis methods

While all included studies involved analysis of data extracted from social media, the method of analysis differed between studies (refer to Table 2). The majority of studies conducted analysis through manual human-based coding (n = 25, 69.4%), of which 24 (66.7%) utilised some form of content analysis. A total of eight (22.2%) content analysis studies employed an inductive coding approach in which themes were generated from the ‘ground up’ based on the data, while nine studies (25%) employed a deductive approach in which a coding framework was developed prior to the commencement of coding based on previous research and/or author expertise. However, six studies (16.7%) used a combination of approaches, in which a codebook was initially developed, but was inductively refined through a preliminary coding process. Only one study performed an inductive thematic analysis of social media content (2.8%), and one study used a combination of deductive content analysis and inductive thematic analysis to answer research questions (2.8%).

In total, five studies (13.9%) used human-based coding in combination with computer-assisted coding, whereby an initial sample of human-coded data was used to develop a machine learning model which could subsequently analyse a large volume of data. Aside from content analysis and thematic analysis, three studies conducted software-mediated linguistic analysis (8.3%), two studies involved sentiment analysis and topic modelling (5.6%) and one used language modelling (2.8%). Figure 2 illustrates the cumulative number of articles published each year and the primary analysis employed. The figure demonstrates that an article utilising a computer-assisted approach was first published in 2018, and there has since been a surge in the number of studies adopting these tools for analysis.
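A minimal sketch of that computer-assisted workflow, under the assumption of a hand-coded sample stored with a binary 'stigma' label (the file and column names are illustrative, not taken from any included study): a classifier is trained on the human-coded posts, checked against the human coders, and then applied to the remaining uncoded corpus.

```python
# Minimal sketch of computer-assisted coding: train on a human-coded sample,
# validate, then label the rest of the corpus automatically.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

coded = pd.read_csv("hand_coded_posts.csv")    # assumed: 'text' plus binary 'stigma' label
uncoded = pd.read_csv("remaining_posts.csv")   # assumed: 'text' only

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # word/bigram features
    LogisticRegression(max_iter=1000),
)

# Check agreement with the human-coded labels before trusting the model at scale.
scores = cross_val_score(model, coded["text"], coded["stigma"], cv=5, scoring="f1")
print("Cross-validated F1:", scores.mean())

model.fit(coded["text"], coded["stigma"])
uncoded["predicted_stigma"] = model.predict(uncoded["text"])
```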

Coding frameworks

The authors who utilised a deductive approach to content analysis, either developed their own coding framework, or adopted a framework previously developed and reported in the literature. Frameworks varied greatly between studies but often included coding the type of social media profile (e.g., individual, consumer, health professional, organisation), the type of mental health-related content (e.g., personal experience, awareness promotion, advertising, news media, personal opinion/dyadic interaction) and/or the broader topic or context of posts (e.g., politics, everyday social chatter, culture/entertainment, mental health, news, awareness campaigns). Some studies also chose to categorise mental health-related content as either ‘medical’ (e.g., diagnosis, treatment, prognosis) or ‘non-medical’ before further classification.

In terms of coding for representation or attitudes towards mental ill-health, most studies coded for stigma, variously defined. In some studies, this was merely the presence or absence of stigma for each unit of analysis (e.g., was there stigmatising content in the tweet or not), but in others stigma was further broken down into more specific types of stigma. For example, the coding framework developed by Reavley and Pilkington [ 39 ] includes stigmatising attitude subthemes such as ‘social distance’, ‘dangerousness’, and ‘personal weakness’. In some studies, trivialisation has been classed as stigma, while in others a separate coding category has been created for any posts which are deemed to be trivialising, mocking or sarcastic towards mental ill-health. Another common approach in the included studies was to code for the valence or overall sentiment of each unit of analysis, in which categories included positive, neutral or negative polarity, or classified tone as positive or pejorative. Some authors analysed the use of mental health related terminology and categorised this based on whether terms are misused or employed metaphorically.
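As a toy illustration only (the tiny lexicon below is invented and far cruder than the dictionaries and human judgments used in the included studies), valence coding assigns each post a positive, negative, or neutral label.

```python
# Toy valence coder: counts matches against a tiny, invented lexicon and
# assigns each post a positive/negative/neutral label.
POSITIVE = {"recovery", "support", "hope"}
NEGATIVE = {"crazy", "dangerous", "weak"}

def code_valence(post: str) -> str:
    words = set(post.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(code_valence("Peer support gave me hope during recovery"))  # positive
print(code_valence("People like that are dangerous and weak"))    # negative
```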

Quality appraisal

The studies were appraised using the CASP tool for qualitative research, which does not calculate a final score or provide an overall grade of quality. A total of 37 studies met all the review inclusion criteria and were appraised by reviewers. A breakdown of appraisal results for each CASP item is presented in Additional File 2. The criteria in which the highest number of studies received a rating of ‘no’ related to the rigour of data analysis ( n  = 6, 16.7%) and clarity of stating findings ( n  = 6, 16.7%). Based on the results of the appraisal and after discussion between all authors, one study was excluded from the review synthesis due to lack of clarity in reporting methods [ 58 ].

This review summarised the current literature investigating the representation of mental ill-health on social media, in particular focussing on methodological design. While human-based content analysis was the dominant means of qualitative data analysis, a limited number of studies employed computer-based techniques. The results also indicated an uneven distribution in the social media platforms selected for data collection, as well as the unit/s of analysis. These findings suggest some important methodological gaps in the literature.

A growing area of research interest

The results demonstrate that almost 70% of all studies in this field were published within the last four years, from 2019 to 2022, suggesting this is an emerging area of interest in the academic literature. Social media research has been used to identify the attitudes and opinions of the public regarding many topics, but appears to have rapidly gained favour amongst researchers during the COVID-19 pandemic, researching public perceptions of issues such as vaccination [ 59 ], healthcare staff [ 60 ], restrictions [ 61 ] and the pandemic more broadly [ 62 , 63 , 64 ]. Perhaps the surge in publications relating to the representation of mental ill-health on social media is reflective of a wider trend towards this type of research and an acknowledgement amongst researchers of the power of social media data. Social media presents real-time data to capture current public perceptions about a topic and the opportunity to monitor changes over time [ 62 ]. However, it must also be acknowledged that the recent growth in publications found may also be reflective of a societal shift towards increased acceptance of using online social media as an appropriate forum for mental health-related discourse, triggering subsequent research interest [ 65 , 66 ].

The dominance of Twitter-based research

Our review revealed an uneven distribution of social media platforms studied within the current literature. Over 50% of the included studies collected data from the text-based social media platform Twitter, and a further five studies analysed Sina Weibo data, a Chinese microblogging site highly reminiscent of Twitter. These results align with the findings from other systematic reviews into social media-based research, which demonstrate a skewed focus towards text-based data sources [ 67 , 68 ]. This dominance in the research landscape is likely due to methodological considerations. Twitter is an open platform and users can choose not to reveal their identity in profile ‘handles’. The text-based nature of the data also makes analysis relatively easier and permits the use of machine learning approaches.

Unfortunately, the emphasis on Twitter limits the scope of this body of research and does not accurately reflect the relative popularity of social media platforms. As of 2023, Facebook had the highest number of global monthly active users (MAUs), at more than 2.9 billion, yet none of the studies included in this review collected data from this platform. This is likely because collecting data from Facebook and other direct messaging platforms without breaching user privacy remains an ethical challenge [ 67 ]. Image and video-sharing platforms have grown rapidly in popularity in recent years, yet they account for only a minority of the studies in this review. Instagram has over 2 billion MAUs and the video-based platform TikTok over 1 billion, a much larger share of the social media market than Twitter, at 556 million MAUs, and Sina Weibo, at 584 million [ 69 ].

Such dominance of Twitter means that certain populations and age groups are underrepresented in the current research. Twitter is known to have an older user demographic, with 38.5% of users aged 25–34 years and 20.7% aged 35–49 years [ 70 ]. By comparison, TikTok has become a popular platform for teenagers and young adults, with 67.3% of users aged under 24 years and only 5.97% aged 35–44 years [ 71 ]. Young people are known to experience higher rates of mental illness than older age groups, yet their engagement with mental health care is often poor, causing delays in help-seeking [ 4 , 9 , 72 ]. Future social media-based research into the representation of mental health conditions on platforms predominantly frequented by younger users therefore has the potential to add significant value to this body of literature.

Analysis of social media content

While the majority of the included studies employed content analysis (n = 24, 66.7%), their processes varied considerably. It is worth noting that nine of these 24 studies followed a predominantly deductive approach to coding, while a further seven included a deductive element. A deductive (sometimes termed ‘directed’) approach to content analysis is most appropriate where existing research findings, conceptual frameworks or theories can be used to guide codebook development [ 73 , 74 ]. Given the extensive previous literature on the representation of mental ill-health (albeit not necessarily on social media), and in particular the existing frameworks for mental illness stigma, a deductive approach is appropriate [ 75 , 76 ]. However, introducing an inductive element, in which the initial codebook is refined through early coding stages, can produce coding categories better suited to the specific social media data extracted from the platform of interest and potentially support a more nuanced analysis [ 77 ].
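
As a concrete illustration of the hybrid approach described above, the sketch below (a simplified assumption, not any included study’s actual procedure) starts from a literature-derived deductive codebook, pilot-codes a sample of posts, and flags posts that fit no existing category as candidates for new, inductively derived codes. The keyword rules and example posts are invented for demonstration.

# Hedged sketch of deductive coding with an inductive refinement step.
# Keyword rules and sample posts are illustrative assumptions only.
deductive_codebook = {
    "dangerousness": ["dangerous", "violent"],
    "personal weakness": ["weak", "attention seeking"],
    "social distance": ["stay away", "avoid"],
}

def pilot_code(posts, codebook):
    """Return posts that match no existing category (candidates for new codes)."""
    uncoded = []
    for post in posts:
        text = post.lower()
        matched = [cat for cat, kws in codebook.items() if any(k in text for k in kws)]
        if not matched:
            uncoded.append(post)
    return uncoded

sample_posts = ["They are just weak", "Using #schizophrenic as a joke again"]
print(pilot_code(sample_posts, deductive_codebook))  # posts needing a new category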

Also notable is the apparent dearth of studies in this field adopting thematic analysis. There are several possible reasons for this, the foremost being the volume of data available for analysis on social media. It is widely held in the literature that the choice between content analysis and thematic analysis is a question of breadth of application versus depth of analysis [ 78 ]. Due to its alignment with quantitative research, content analysis can be better suited to larger data sets, whereas thematic analysis allows for greater immersion in the data and depth of understanding [ 78 ]. While both are of value, in the case of social media data where researchers aim to understand public representations of and attitudes towards mental illness, content analysis can provide the breadth of analysis that such research questions require.

Review of the current literature also suggested that the coding frameworks adopted by the included studies vary greatly, making comparison of their findings challenging. Each study defined the concept of stigma differently through its approach to coding: for example, both Jansli et al. [ 29 ] and Jilka et al. [ 30 ] simply identified whether content was stigmatising or not. Conversely, Budenz et al. [ 25 ] coded for the presence or absence of mental illness stigma and then specifically for violence-related mental illness stigma, as the study aimed to identify changes in tweet content before and after a mass shooting event. Meanwhile, Reavley and Pilkington [ 39 ] took the coding process a step further and developed a detailed coding framework grouping different types of stigmatising attitudes, such as ‘beliefs that mental illness is due to personal weakness’, ‘people with mental illness are dangerous’ and ‘desire for social distance from the person’. In critically analysing the methodological approaches of these studies, it must be acknowledged that stigma is a broad concept containing many nuances. To gain a deep understanding of societal perceptions of and attitudes towards mental ill-health, coding frameworks should be developed with these nuances in mind and should reflect the many facets of stigmatising attitudes. Content analysis should avoid a ‘tick box’ approach to the identification of stigma and instead aim for a richer understanding of perceptions of mental ill-health.

Of the studies which employed content analysis, the vast majority used a manual approach in which human researchers hand-coded the data. More recently, however, machine learning techniques have been applied to the field. For example, Saha et al. [ 41 ] hand-coded a sample of 700 tweets and used these to develop a machine learning framework to automatically infer the topics of the remaining 13,517 tweets. Several studies also used specialised packages such as the Linguistic Inquiry and Word Count (LIWC) software to extract psycholinguistic features from social media data and obtain quantitative counts [ 37 , 44 , 45 ]. The clear advantage of these computerised methods is that they allow researchers to analyse much larger volumes of data and reduce the manual labour and time involved in analysis. While such studies undoubtedly add value to the body of literature, there remains a place for manual human coding, especially where more detailed coding frameworks can offer more nuanced insights. Although technology is rapidly advancing, manual human coding also remains the only viable means of analysis for researchers intending to interpret image and video-based data.
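
The general workflow described here, training a classifier on a hand-coded sample and then applying it to the remaining uncoded posts, can be sketched as follows. This is a minimal illustration using scikit-learn with toy texts and an assumed label set; it is not the pipeline used by Saha et al. [ 41 ], nor the LIWC software.

# Minimal sketch: train on a hand-coded sample, then infer labels for uncoded posts.
# Requires scikit-learn; texts and labels are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

hand_coded_texts = ["Reach out if you are struggling, support is available",
                    "People like that are dangerous and should be locked up"]
hand_coded_labels = ["supportive", "stigmatising"]
uncoded_texts = ["Mental illness is not a personal weakness"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(hand_coded_texts, hand_coded_labels)   # learn from the manually coded sample
print(model.predict(uncoded_texts))              # inferred labels for the remaining posts

In practice a training sample of this size would be far too small; the point is only the structure of the hand-code-then-classify approach.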

Quality of studies and frequent issues

Critical appraisal of the included studies was conducted using the CASP tool for qualitative research [ 79 ]. As described in the methods, this was deemed the most appropriate tool available, yet the authors still needed to modify and adapt it for the purposes of this review. Given the difficulty of finding a suitable critical appraisal tool for studies which analyse social media-based content, and the apparent growth in researcher interest in this study design, the authors advocate for the development of a more specific appraisal tool.

The authors noted several frequent issues which lowered the quality of the included studies and should be addressed in future research in the field. Firstly, multiple studies did not describe the process of codebook development transparently and, where the approach was deductive, did not indicate the previous literature that informed this process. The coding framework is key to ensuring rigorous data analysis and generating meaningful findings, and its development should therefore be described in sufficient detail. The reviewers also noted inconsistency in coding protocols across the content analysis studies. In this type of analysis, reliability is of paramount importance, and previous methodological literature highlights the need to establish intercoder reliability (ICR) [ 80 , 81 ]. At least two coders are needed to analyse data independently [ 81 ]; alternatively, two coders can analyse a sample of the data and, if sufficient intercoder reliability is achieved, one coder can complete the remaining analysis [ 82 ]. Yet some studies utilised only a single coder, did not establish or report measures of intercoder reliability, or were unclear in their reporting of the coding protocol. Content analysis is susceptible to human biases during the coding process, and it is therefore essential to minimise these risks through a robust protocol.
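
Where two coders independently code the same sample, intercoder reliability can be quantified with a chance-corrected agreement statistic before any single coder continues alone. The sketch below uses Cohen’s kappa as one common choice; the cited studies may instead report percentage agreement or Krippendorff’s alpha, and the coded labels here are invented.

# Hedged sketch: quantify agreement between two independent coders on a shared sample.
# Requires scikit-learn; labels are illustrative only.
from sklearn.metrics import cohen_kappa_score

coder_one = ["stigma", "no stigma", "stigma", "stigma", "no stigma"]
coder_two = ["stigma", "no stigma", "no stigma", "stigma", "no stigma"]

kappa = cohen_kappa_score(coder_one, coder_two)
print(f"Cohen's kappa: {kappa:.2f}")  # proceed to single coding only if agreement is acceptable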

Limitations of social media-based research

Although the strengths of social media-based research are numerous, there are several key limitations to this type of research. Many studies use ‘hashtags’ to search for and identify content relevant to their topic of interest. However, not everyone who posts on social media uses hashtags, and hashtags are often employed as a means of generating followers [ 83 ]. There are also technical challenges in the data collection process: researchers must use external tools such as the Twitter Application Programming Interface (API) to search for data, which typically permits access to only a portion of all tweets.
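
For readers unfamiliar with this collection step, the sketch below shows the general shape of a hashtag-based query against the Twitter (X) API v2 recent-search endpoint. The bearer token is a placeholder, the query is an example, and access tiers, rate limits and platform terms (which have changed considerably in recent years) determine how much data is actually returned.

# Hedged sketch of hashtag-based collection via the Twitter/X API v2 recent-search endpoint.
# Requires the requests library and a valid developer bearer token (placeholder below).
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder credential
url = "https://api.twitter.com/2/tweets/search/recent"
params = {"query": "#schizophrenia -is:retweet lang:en", "max_results": 100}
headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}

response = requests.get(url, headers=headers, params=params)
tweets = response.json().get("data", [])  # only a portion of all matching tweets is returned
print(f"Collected {len(tweets)} tweets")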

Another important consideration is that findings cannot necessarily be generalised to the wider community. Although social media is a significant part of life for many, some demographic groups, for example women and younger age groups, use and post on social media more frequently than others [ 84 , 85 ]. Not everyone uses and interacts with social media in the same way, so this type of research cannot be used to interpret the opinions and perspectives of the broader population.

Social media-based research is also somewhat constrained by ethical concerns regarding user privacy. Studies are often limited to the use of data extracted from public profiles, which in turn may bias the type of data collected. Mental health is an inherently sensitive topic, and thus analysis of mental health content posted to private social media profiles may yield additional insights.

Limitations of the review

This systematic review is subject to several limitations which must be noted. Firstly, the scope of this review was limited to the identification and analysis of the methods used in the included studies and did not extend to a synthesis of their results. Future review articles may wish to focus on synthesising results, although their highly heterogeneous nature is likely to preclude meta-analysis. Secondly, the search was filtered to include only articles published in the English language. This may have missed relevant studies published in other languages, although the review did include several studies focused on social media content posted in Chinese, Greek, Turkish, French, and Finnish. The database searches were also limited to peer-reviewed publications, as is conventional for systematic literature reviews; however, this approach could miss peer-reviewed conference proceedings and industry reports [ 67 ].

Conclusions

This review is the first to systematically identify, summarise and critically evaluate the available literature focused on the representation of mental ill-health on social media. The review analysed the methodologies currently employed by these studies and critically evaluated the strengths and weaknesses of the various approaches adopted by researchers. The results highlight the need to shift away from text-based social media research, such as that based on Twitter, towards the more popular and emerging image and video-based platforms. The utility of both manual and computer-assisted content analysis was discussed, and the reviewers concluded that both make valuable contributions to the body of research. Future research could investigate how social media representations of mental illness translate into ‘real-life’ attitudes and instances of stigmatising behaviour, as well as into the help-seeking behaviours of those experiencing symptoms of mental ill-health. Along with many other non-communicable chronic conditions, the rate of mental illness continues to grow, presenting an urgent public health challenge. This field of research can help to develop a deeper understanding of societal attitudes towards mental ill-health and reveal the information to which those experiencing mental ill-health are exposed on social media. Through this knowledge, mental and public health professionals can create more targeted and effective campaigns, using social media as a medium, to combat negative representations of mental ill-health.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

DALY: Disability-adjusted life year
COVID-19: Coronavirus disease 2019
PROSPERO: International Prospective Register of Systematic Reviews
PRISMA: Preferred Reporting Items for Systematic reviews and Meta-Analyses
DSM-5: Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition
CASP: Critical Appraisal Skills Programme
ASD: Autism spectrum disorder
ED: Eating disorder
HIV/AIDS: Human immunodeficiency virus/Acquired immunodeficiency syndrome
MAU: Monthly active users

References

Allen D. The relationship between challenging behaviour and mental ill-health in people with intellectual disabilities: a review of current theories and evidence. J Intellect Disabil. 2008;12(4):267–94.

Everymind. Understanding mental ill-health n.d. [cited 2023 Mar 14]. Available from: https://everymind.org.au/understanding-mental-health/mental-health/what-is-mental-illness .

Rehm J, Shield KD. Global burden of disease and the impact of mental and addictive disorders. Curr Psychiatry Rep. 2019;21(2):1–7.

Elias CL, Gorey KM. Online social networking among clinically depressed Young people: scoping review of potentially supportive or harmful behaviors. J Technol Hum Serv. 2022;40(1):79–96.

Wang PS, Berglund PA, Olfson M, Kessler RC. Delays in initial treatment contact after first onset of a mental disorder. Health Serv Res. 2004;39(2):393–415.

Morgan C, Mallett R, Hutchinson G, Leff J. Negative pathways to psychiatric care and ethnicity: the bridge between social science and psychiatry. Soc Sci Med. 2004;58(4):739–52.

Dell’Osso B, Glick ID, Baldwin DS, Altamura AC. Can Long-Term outcomes be improved by shortening the duration of untreated illness in Psychiatric disorders? A conceptual Framework. Psychopathology. 2013;46(1):14–21.

Sullivan LE, Fiellin DA, O’Connor PG. The prevalence and impact of alcohol problems in major depression: a systematic review. Am J Med. 2005;118(4):330–41.

Clement S, Schauman O, Graham T, Maggioni F, Evans-Lacko S, Bezborodovs N, et al. What is the impact of mental health-related stigma on help-seeking? A systematic review of quantitative and qualitative studies. Psychol Med. 2015;45(1):11–27.

Xu Z, Huang F, Kösters M, Staiger T, Becker T, Thornicroft G, et al. Effectiveness of interventions to promote help-seeking for mental health problems: systematic review and meta-analysis. Psychol Med. 2018;48(16):2658–67.

Schnyder N, Panczak R, Groth N, Schultze-Lutter F. Association between mental health-related stigma and active help-seeking: systematic review and meta-analysis. Br J Psychiatry. 2017;210(4):261–8.

Dudley JR. Confronting stigma within the Services System. Soc Work. 2000;45(5):449.

Salathé M, Bengtsson L, Bodnar TJ, Brewer DD, Brownstein JS, Buckee C et al. Digital Epidemiology. PLoS Comput Biol. 2012;8(7).

Brownstein JS, Freifeld CC, Madoff LC. Digital disease detection-harnessing the web for public health surveillance. N Engl J Med. 2009;360(21):2153–5.

Patel R, Chang T, Greysen SR, Chopra V. Social Media Use in Chronic Disease: a systematic review and Novel Taxonomy. Am J Med. 2015;128(12):1335–50.

Eysenbach G, Schulz P, Auvinen A-M, Crotty B, Moorhead SA, Hazlett DE et al. A New Dimension of Health Care: systematic review of the uses, benefits, and Limitations of Social Media for Health Communication. JMIR. 2013;15(4).

Zhang Y, Cao B, Wang Y, Peng T-Q, Wang X. When Public Health Research Meets Social Media: knowledge mapping from 2000 to 2018. JMIR. 2020;22(8):e17582.

Wongkoblap A, Vadillo MA, Curcin V. Researching mental health disorders in the era of social media: systematic review. JMIR. 2017;19(6):e228.

Passerello GL, Hazelwood JE, Lawrie S. Using Twitter to assess attitudes to schizophrenia and psychosis. BJPsych Bull. 2019;43(4):158–66.

Budenz A, Purtle J, Klassen A, Yom-Tov E, Yudell M, Massey P. The case of a mass shooting and violence-related mental illness stigma on Twitter. Stigma Health. 2019;4(4):411–20.

Liu D, Feng XL, Ahmed F, Shahid M, Guo J. Detecting and measuring Depression on Social Media using a machine Learning Approach: systematic review. JMIR Ment Health. 2022;9(3):e27244.

Chancellor S, De Choudhury M. Methods in predictive techniques for mental health status on social media: a critical review. NPJ Digit Med. 2020;3(1).

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ (Clinical Res ed). 2021;372:n71.

Athanasopoulou C, Sakellari E. Schizophrenia’ on Twitter: Content Analysis of Greek Language tweets. Stud Health Technol Inf. 2016;226:271–4.

Budenz A, Klassen A, Purtle J, Yom Tov E, Yudell M, Massey P. Mental illness and bipolar disorder on Twitter: implications for stigma and social support. J Ment Health. 2020;29(2):191–9.

Cavazos-Rehg PA, Krauss MJ, Sowles S, Connolly S, Rosas C, Bharadwaj M, et al. A content analysis of depression-related tweets. Comput Hum Behav. 2016;54:351–7.

Delanys S, Benamara F, Moriceau V, Olivier F, Mothe J. Psychiatry on Twitter: Content Analysis of the Use of Psychiatric terms in French. JMIR Form Res. 2022;6(2):e18539.

Hernandez MY, Hernandez M, Lopez DH, Gamez D, Lopez SR. What do health providers and patients tweet about schizophrenia? Early Interv Psychiatry. 2020;14(5):613–8.

Jansli SM, Hudson G, Negbenose E, Erturk S, Wykes T, Jilka S. Investigating mental health service user views of stigma on Twitter during COVID-19: a mixed-methods study. J Ment Health. 2022;31(4):576–84.

Jilka S, Odoi CM, van Bilsen J, Morris D, Erturk S, Cummins N et al. Identifying schizophrenia stigma on Twitter: a proof of principle model using service user supervised machine learning. Schizophr. 2022;8(1).

Joseph AJ, Tandon N, Yang LH, Duckworth K, Torous J, Seidman LJ, et al. #Schizophrenia: use and misuse on Twitter. Schizophr Res. 2015;165(2–3):111–5.

Kara UY, Şenel Kara B. Schizophrenia on Turkish Twitter: an exploratory study investigating misuse, stigmatization and trivialization. Soc Psychiatry Psychiatr Epidemiol. 2022;57(3):531–9.

Makita M, Mas-Bleda A, Morris S, Thelwall M. Mental Health Discourses on Twitter during Mental Health Awareness Week. Issues Ment Health Nurs. 2021;42(5):437–50.

Nelson A. Ups and Downs: Social Media Advocacy of bipolar disorder on World Mental Health Day. Front Commun. 2019;4.

Park S, Hoffner C. Tweeting about mental health to honor Carrie Fisher: how #InHonorOfCarrie reinforced the social influence of celebrity advocacy. Comput Hum Behav. 2020;110:106353.

Parrott B. Hakim, Gentile. From #endthestigma to #realman: Stigma-Challenging Social Media Responses to NBA players’ Mental Health disclosures. Commun Rep. 2020;33(3):148–60.

Pavlova A, Berkers P. Mental health discourse and social media: which mechanisms of cultural power drive discourse on Twitter. Soc Sci Med. 2020;263:113250.

Pavlova A, Berkers P. Mental Health as defined by Twitter: frames, emotions, Stigma. Health Commun. 2022;37(5):637–47.

Reavley NJ, Pilkington PD. Use of Twitter to monitor attitudes toward depression and schizophrenia: an exploratory study. PeerJ. 2014;2:e647.

Robinson P, Turk D, Jilka S, Cella M. Measuring attitudes towards mental health using social media: investigating stigma and trivialisation. Soc Psychiatry Psychiatr Epidemiol. 2019;54(1):51–8.

Saha K, Torous J, Ernala SK, Rizuto C, Stafford A, De Choudhury M. A computational study of mental health awareness campaigns on social media. Transl Behav Med. 2019;9(6):1197–207.

Stupinski AM, Alshaabi T, Arnold MV, Adams JL, Minot JR, Price M, et al. Quantifying changes in the Language used around Mental Health on Twitter over 10 years: Observational Study. JMIR Ment Health. 2022;9(3):e33685.

Alvarez-Mon MA, Llavero-Valero M, Sánchez-Bayona R, Pereira-Sanchez V, Vallejo-Valdivielso M, Monserrat J, et al. Areas of interest and stigmatic attitudes of the General Public in five relevant Medical conditions: thematic and quantitative analysis using Twitter. JMIR. 2019;21(5):e14110.

Li A, Zhu T, Jiao D. Detecting depression stigma on social media: a linguistic analysis. J Affect Disord. 2018;232:358–62.

Li A, Jiao D, Liu X, Zhu T. A comparison of the psycholinguistic styles of Schizophrenia-Related Stigma and Depression-Related Stigma on Social Media: Content Analysis. JMIR. 2020;22(4):e16470.

Pan J, Liu B, Kreps GL. A content analysis of depression-related discourses on Sina Weibo: attribution, efficacy, and information sources. BMC Public Health. 2018;18(1):772.

Wang W, Liu Y. Discussing mental illness in Chinese social media: the impact of influential sources on stigmatization and support among their followers. Health Commun. 2016;31(3):355–63.

Yu L, Jiang W, Ren Z, Xu S, Zhang L, Hu X. Detecting changes in attitudes toward depression on Chinese social media: a text analysis. J Affect Disord. 2021;280(Pt A):354–63.

Athanasopoulou C, Suni S, Hätönen H, Apostolakis I, Lionis C, Välimäki M. Attitudes towards schizophrenia on YouTube: A content analysis of Finnish and Greek videos. Inf Health Soc Care. 2016;41(3):307–24.

Devendorf A, Bender A, Rottenberg J. Depression presentations, stigma, and mental health literacy: a critical review and YouTube content analysis. Clin Psychol Rev. 2020;78:101843.

Ghate R, Hossain R, Lewis SP, Richter MA, Sinyor M. Characterizing the content, messaging, and tone of trichotillomania on YouTube: A content analysis. J Psychiatr Res. 2022;151:150–6.

McLellan A, Schmidt-Waselenchuk K, Duerksen K, Woodin E. Talking back to mental health stigma: an exploration of YouTube comments on anti-stigma videos. Comput Hum Behav. 2022;131:107214.

Wu J, Hong T. The picture of #Mentalhealth on Instagram: congruent vs. incongruent emotions in Predicting the sentiment of comments. Front Commun. 2022;7.

Pavelko RL, Wang T. Love and basketball: audience response to a professional athlete’s mental health proclamation. Health Educ J. 2021;80(6):635–47.

Shigeta N, Ahmed S, Ahmed SW, Afzal AR, Qasqas M, Kanda H, et al. Content analysis of Canadian newspapers articles and readers’ comments related to schizophrenia. Int J Cult Ment Health. 2017;10(1):75–81.

Basch CH, Donelle L, Fera J, Jaime C. Deconstructing TikTok videos on Mental Health: cross-sectional, descriptive content analysis. JMIR Form Res. 2022;6(5):e38340.

Guidry J, Zhang Y, Jin Y, Parrish C. Portrayals of depression on Pinterest and why public relations practitioners should care. Public Relat Rev. 2016;42.

Vidamaly S, Lee SL. Young adults’ Mental Illness aesthetics on Social Media. Int J Cyber Behav. 2021;11:13–32.

Cascini F, Pantovic A, Al-Ajlouni YA, Failla G, Puleo V, Melnyk A, et al. Social media and attitudes towards a COVID-19 vaccination: a systematic review of the literature. eClinicalMedicine. 2022;48:101454.

Tokac U, Brysiewicz P, Chipps J. Public perceptions on Twitter of nurses during the COVID-19 pandemic. Contemp Nurse. 2022:1–10.

Ölcer S, Yilmaz-Aslan Y, Brzoska P. Lay perspectives on social distancing and other official recommendations and regulations in the time of COVID-19: a qualitative study of social media posts. BMC Public Health. 2020;20(1):963.

Ugarte DA, Cumberland WG, Flores L, Young SD. Public attitudes about COVID-19 in response to President Trump’s Social Media posts. JAMA Netw Open. 2021;4(2):e210101–e.

De Falco CC, Punziano G, Trezza D. A mixed content analysis design in the study of the Italian perception of COVID-19 on Twitter. Athens J Soc Sci. 2021;8(3):191–210.

Shorey S, Ang E, Yamina A, Tam C. Perceptions of public on the COVID-19 outbreak in Singapore: a qualitative content analysis. J Public Health. 2020;42(4):665–71.

Bucci S, Schwannauer M, Berry N. The digital revolution and its impact on mental health care. Psychol Psychother. 2019;92(2):277–97.

Naslund JA, Aschbrenner KA, Marsch LA, Bartels SJ. The future of mental health care: peer-to-peer support and social media. Epidemiol Psychiatr Sci. 2016;25(2):113–22.

Fung IC-H, Duke CH, Finch KC, Snook KR, Tseng P-L, Hernandez AC, et al. Ebola virus disease and social media: a systematic review. AM J Infect Control. 2016;44(12):1660–71.

Hawks JR, Madanat H, Walsh-Buhi ER, Hartman S, Nara A, Strong D et al. Narrative review of social media as a research tool for diet and weight loss. Comput Hum Behav. 2020;111.

Statista. Most popular social networks worldwide as of January 2023, ranked by number of monthly active users 2023 [cited 2023 Feb 23]. Available from: https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/ .

Statista. Distribution of Twitter users worldwide as of April 2021, by age group 2021 [cited 2023 Feb 23]. Available from: https://www.statista.com/statistics/283119/age-distribution-of-global-twitter-users/ .

Oberlo. TikTok age demographics 2023 [cited 2023 Feb 23]. Available from: https://au.oberlo.com/statistics/tiktok-age-demographics .

Australian Bureau of Statistics. National Study of Mental Health and Wellbeing. 2022 [cited 2023 Feb 23]. Available from: https://www.abs.gov.au/statistics/health/mental-health/national-study-mental-health-and-wellbeing/2020-21#cite-window2 .

Liamputtong P. Qualitative research methods. Fifth ed. Melbourne, Australia: Oxford University Press Australia and New Zealand; 2020.

Cho JY, Lee E-H. Reducing confusion about grounded theory and qualitative content analysis: similarities and differences. Qual Rep. 2014;19(32):1–20.

Fox AB, Earnshaw VA, Taverna EC, Vogt D. Conceptualizing and measuring Mental Illness Stigma: the Mental Illness Stigma Framework and critical review of measures. Stigma Health. 2018;3(4):348–76.

Corrigan P. How stigma interferes with mental health care. Am Psychol. 2004;59(7):614–25.

Forman J, Damschroder L. Qualitative content analysis. Empirical methods for bioethics: a primer. Emerald Group Publishing Limited; 2007. pp. 39–62.

Humble N, Mozelius P, editors. Content analysis or thematic analysis: Similarities, differences and applications in qualitative research. European Conference on Research Methodology for Business and Management Studies; 2022 June 2–3; Portugal.

Critical Appraisal Skills Programme. CASP Qualitative Studies Checklist 2022 [cited 2023 Feb 23]. Available from: https://casp-uk.net/images/checklist/documents/CASP-Qualitative-Studies-Checklist/CASP-Qualitative-Checklist-2018_fillable_form.pdf .

Kleinheksel AJP, Rockich-Winston NP, Tawfik HPMD, Wyatt TRP. Demystifying content analysis. Am J Pharm Educ. 2020;84(1).

O’Connor C, Joffe H. Intercoder Reliability in Qualitative Research: debates and practical guidelines. Int J Qual Methods. 2020;19.

Campbell JL, Quincy C, Osserman J, Pedersen OK. Coding in-depth semistructured interviews: problems of unitization and intercoder reliability and agreement. Sociol Methods Res. 2013;42:294–320.

Martín EG, Lavesson N, Doroud M. Hashtags and followers. Social Netw Anal Min. 2016;6(1):12.

Svensson R, Johnson B, Olsson A. Does gender matter? The association between different digital media activities and adolescent well-being. BMC Public Health [Internet]. 2022; 22(1).

Twenge JM, Martin GN. Gender differences in associations between digital media use and psychological well-being: evidence from three large datasets. J Adolesc. 2020;79:91–102.

Acknowledgements

Not applicable.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Author information

Authors and Affiliations

School of Medicine and Dentistry, Griffith University, Gold Coast Campus, 1 Parklands Drive, 4222, Southport, Gold Coast, QLD, Australia

Lucy Tudehope, Neil Harris, Lieke Vorage & Ernesta Sofija

Contributions

LT, ES and NH conceptualised the study. LT conducted the systematic literature search, and LV and LT completed the article screening and appraisal process. LT wrote the first draft of the manuscript, which was subsequently edited and reviewed by ES and NH. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lucy Tudehope .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Tudehope, L., Harris, N., Vorage, L. et al. What methods are used to examine representation of mental ill-health on social media? A systematic review. BMC Psychol 12 , 105 (2024). https://doi.org/10.1186/s40359-024-01603-1

Received : 24 July 2023

Accepted : 18 February 2024

Published : 29 February 2024

DOI : https://doi.org/10.1186/s40359-024-01603-1

Keywords

  • Mental health
  • Public health
  • Content analysis
  • Research methods



A Dear Colleague Letter (DCL) is an informal correspondence which is written by a Requesting Office and distributed to communities within a specific program area, to attract individuals eligible under a Visiting Scientist, Engineer, and Educator (VSEE) appointment, an Intergovernmental Personnel Act (IPA) assignment and/or a Federal Temporary appointment. These letters may be circulated in paper form through internal mail, distributed electronically using listservs or accessed through NSF.gov’s Career Page.

Behavioral Scientist (Program Director)

Application timeline

Position summary

The National Science Foundation is seeking qualified candidates for a Program Director in Cognitive Science – Artificial Intelligence (AI) in the Division of Behavioral and Cognitive Sciences (BCS) within the Directorate for Social, Behavioral and Economic Sciences (SBE), Alexandria, VA.

BCS is a division within SBE and is responsible for the support of fundamental research in all behavioral and cognitive science fields. The Division is composed of standing programs in the Psychological and Language Sciences as well as the Anthropological and Geographic Sciences.

For more information on SBE please click here.

For more information on BCS please click here.

The successful applicant is expected to assume responsibilities as Program Director for Cognitive Science-AI. The responsibilities of NSF Program Directors are dynamic and constantly evolving. Program Directors have an opportunity and responsibility to ensure NSF-funded research is at the forefront of advancing fundamental knowledge. In support of that, Program Directors are responsible for extensive interaction with academic research communities, as well as interaction with other Federal agencies that may lead to development of interagency collaborations. Within this context, Program Directors solicit, receive and review research proposals, make funding recommendations, administer awards, and undertake interaction with research communities in their fields.

Fundamental tasks include the administration of the merit review process and proposal recommendations, program budget administration, participation in strategic planning for the program, division, directorate, and agency, the preparation of public-facing materials highlighting advances in the supported research, as well as coordination with related programs in NSF or in other agencies and organizations.

The Program Director is guided by the goals of NSF's Strategic Plan: (1) enable the United States to uphold a position of world leadership in all aspects of science, mathematics, and engineering, (2) promote the discovery, integration, dissemination, and employment of new knowledge in service to society, and (3) achieve excellence in U.S. science, mathematics, engineering, and technology education at all levels. The core strategies NSF staff employ include developing intellectual capital, strengthening the physical infrastructure, integrating research and education, and promoting partnerships.

Position Description

We seek a Program Director who will cultivate a cognitive science portfolio that uses novel AI techniques and theories to address behavioral science issues. The Program Director will advance foundational AI research that promotes AI systems and technologies that contribute to an understanding of human perception, behavior, and social interaction, expands the understanding of the brain, and leverages AI capabilities to enhance learning.

Successful candidates are expected to develop initiatives that advance theory in core BCS disciplines by application of artificial intelligence models and network science methods. This includes multi-scale, multi-level network data and techniques of network analysis about individual and group behavior. Candidates will support basic research on the interaction between cognitive science and artificial intelligence, encouraging proposals that develop new theories or methods for understanding behavior that engage with the rapid advances being made in artificial intelligence, as well as interdisciplinary, multidisciplinary and convergent research approaches. Research is expected to yield results that will enhance, expand, and transform theory and methods, and that generate novel understandings of human behavior and how humans interact with emerging AI technologies – particularly understandings that can lead to significant opportunities or societal benefits. Research areas of interest include studying systems using AI and machine learning to investigate perception, image and video analyses, object detection, recognition, and tracking, social and spatial cognition, decision-making, attention, and control processes, as well as natural language processing (NLP). Successful candidates are also expected to utilize their expertise to engage with other programs in the division, the directorate, and across the agency in relevant cross-directorate programs and initiatives relating to their expertise in AI.

Appointment options

The position recruited under this announcement will be filled under the following appointment option(s):

Intergovernmental Personnel Act (IPA) Assignment: Individuals eligible for an IPA assignment with a Federal agency include employees of State and local government agencies or institutions of higher education, Indian tribal governments, and other eligible organizations in instances where such assignments would be of mutual benefit to the organizations involved. Initial assignments under IPA provisions may be made for a period up to two years, with a possible extension for up to an additional two-year period. The individual remains an employee of the home institution and NSF provides the negotiated funding toward the assignee's salary and benefits. Initial IPA assignments are made for a one-year period and may be extended by mutual agreement. 

Temporary Excepted Service Appointment: Appointment to this position will be made under the Excepted Authority of the NSF Act. Candidates who do not have civil service status or reinstatement eligibility will not obtain civil service status if selected. Candidates currently in the competitive service will be required to waive competitive civil service rights if selected. Usual civil service benefits (retirement, health benefits and life insurance) are applicable for appointments of more than one year. Temporary appointments may not exceed three years.

Visiting Scientist, Engineer, and Educator (VSEE) Program: Appointment to this position will be made under the Excepted Authority of the NSF Act. Visiting Scientists are on non-paid leave status from their home institution and placed on the NSF payroll. NSF withholds Social Security taxes and pays the home institution's contributions to maintain retirement and fringe benefits (i.e., health benefits and life insurance), either directly to the home institution or to the carrier. Appointments are usually made for a one-year period and may be extended for an additional year by mutual agreement.

Eligibility information

It is NSF policy that NSF personnel employed at or IPAs detailed to NSF are not permitted to participate in foreign government talent recruitment programs.  Failure to comply with this NSF policy could result in disciplinary action up to and including removal from Federal Service or termination of an IPA assignment and referral to the Office of Inspector General. https://www.nsf.gov/careers/Definition-of-Foreign-Talent-HRM.pdf .

Applications will be accepted from U.S. Citizens. Recent changes in Federal Appropriations Law require Non-Citizens to meet certain eligibility criteria to be considered. Therefore, Non-Citizens must certify eligibility by signing and attaching this Citizenship Affidavit to their application. Non-Citizens who do not provide the affidavit at the time of application will not be considered eligible. Non-Citizens are not eligible for positions requiring a security clearance.

To ensure compliance with an applicable preliminary nationwide injunction, which may be supplemented, modified, or vacated, depending on the course of ongoing litigation, the Federal Government will take no action to implement or enforce the COVID-19 vaccination requirement pursuant to Executive Order 14043 on Requiring Coronavirus Disease 2019 Vaccination for Federal Employees. Federal agencies may request information regarding the vaccination status of selected applicants for the purposes of implementing other workplace safety protocols, such as protocols related to masking, physical distancing, testing, travel, and quarantine.

Qualifications

Candidates must have a Ph.D. or equivalent in an appropriate field such as Cognitive Science, Cognitive Neuroscience, Computational Linguistics, Computer Vision, or other relevant fields. In addition, the candidate must have six or more years of successful research after award of the doctorate degree, and research administration, editorial, and/or managerial experience pertinent to the position.

A successful candidate will also demonstrate effective oral and written communication skills. Familiarity with NSF programs and activities is highly desirable. The candidate is expected to function effectively as a member of crosscutting and interactive teams as well as be an individual contributor. The candidate must also demonstrate a capability to promote NSF activities and to work closely with a broad spectrum of behavioral and social sciences.

How to apply

To apply, send a cover letter outlining qualifications and reason for interest in the position and an up-to-date curriculum vitae to the Chair of the search committee, Dr. Simon Fischer-Baum, [email protected] .

Consideration of applications will begin March 6, 2024.
