How the Sausage Gets Made: A Primer

Mark M. Gray

March 27, 2024

A social science research glossary for the uninitiated

In his 2011 book “Everything Is Obvious: *Once You Know the Answer,” Duncan Watts notes that as a former physicist, he understood rocket science, that famously challenging discipline, to be relatively easy. Working out where a rocket would go was a combination of physics and mathematics that was, for the most part, predictable. When he tried his hand at social science, he realized that predicting human behavior was far more challenging.

He had a point. Social science has three potential goals, which are related to one another: describing the way the world works, establishing cause and effect and predicting future behavior. Accurate description, tough in and of itself, is probably the easiest of these tasks; establishing cause and effect is more challenging; the hardest of them all is attempting to predict future behavior based on the first two.

The methods available to social scientists to achieve these goals are very different than those available to rocket scientists. We don’t have laboratories. The world we study through hopefully unbiased observation is messy and layered and full of interacting forces and ambiguities and contradictions. Our subjects are human beings and our most important ethical concern is to do no harm. Sometimes we collect qualitative data, meaning information that’s not easily distillable to numbers, such as opinions gathered in focus groups — but increasingly, we live in the world of distilling our findings to numbers and formulas that, when analyzed, tell us the shape of a given phenomenon or the effect of a given intervention on changing the world and human behavior. Sometimes we do both.

Even when we limit our discussion to quantitative methods, there are nearly as many ways to break down and study the world as there are people who seek to do it.

Many applied social scientists who seek answers or solutions to real-world problems rely on surveys, trend analysis and projections drawn from data collections. Among the quantitative methods social scientists often use to find lessons in the mountains of collected information, depending on the type of data they have, are cross-tabulations and regression methods with tests for statistical significance.

Described in the simplest possible terms, these are attempts to disaggregate — meaning disentangle — factors that aren’t being studied from those that are.

Which method applies depends on the type of data. The first two types of data are nominal and ordinal. Nominal measurements simply divide observations into different groups (e.g., the religion of respondents might be measured as 1=Protestant, 2=Catholic, 3=Jewish, etc.). Ordinal measurements are more complicated; these are measurements where the numbers begin to take on meaning in terms of directionality (e.g., the political ideology of respondents might be measured as 1=conservative, 2=moderate, 3=liberal). There is a direction in the measurement, but the differences between categories do not measure something precisely or consistently. When social scientists work with nominal and ordinal data, they often use cross-tabulation to discover differences between subgroups of respondents or participants (e.g., how people of different religious affiliations differ in how they self-identify their political ideology).
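A cross-tabulation of the kind described above can be sketched in a few lines of Python. This is only an illustration: the ten survey responses are invented, and the helper names `crosstab` and `row_percentages` are hypothetical, not part of any survey package.

```python
from collections import Counter

# Hypothetical survey responses as (religion, ideology) pairs, invented for illustration.
responses = [
    ("Protestant", "conservative"), ("Protestant", "moderate"),
    ("Protestant", "conservative"), ("Catholic", "moderate"),
    ("Catholic", "liberal"), ("Catholic", "moderate"),
    ("Jewish", "liberal"), ("Jewish", "moderate"),
    ("Jewish", "liberal"), ("Protestant", "liberal"),
]

def crosstab(pairs):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(pairs)

def row_percentages(table, row):
    """Share of each column category within one row category, in percent."""
    total = sum(n for (r, _), n in table.items() if r == row)
    return {c: 100 * n / total for (r, c), n in table.items() if r == row}

table = crosstab(responses)
ideologies = ["conservative", "moderate", "liberal"]  # kept in their ordinal order
for religion in ["Protestant", "Catholic", "Jewish"]:
    counts = {ideology: table[(religion, ideology)] for ideology in ideologies}
    print(religion, counts)
```

Each printed row is one subgroup, and each column is one answer category. Reading across rows is how an analyst would spot differences between subgroups before testing whether they are statistically significant.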

If we’re trying to isolate the impact of a public policy intervention on people, two questions remain once we have gone through the complicated mathematical work of trying to establish, using data, whether anything changed among those who were subjected to a given intervention. Was this change confounded by other variables? Was it merely a correlation, or is there reason to believe it had something to do with causation?

Answering this last question is crucial, and it’s not simple. Sometimes, it’s clear that correlation doesn’t equal causation. Studies can show that either eating breakfast or not doing so can “result in” weight loss. Trouble is, if they fail to control for whether those studied are also working out, or cutting back on lunch or dinner, they don’t tell us anything of significance.
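The breakfast example can be made concrete with a small sketch of what “controlling for” exercise means. Every number below is invented, and the setup assumes, purely for illustration, that exercise alone drives weight loss while breakfast eaters happen to exercise more often.

```python
# Each tuple: (eats_breakfast, exercises, number_of_people, average_pounds_lost).
# Every value here is invented for illustration.
groups = [
    (True, True, 30, 5.0),
    (True, False, 10, 1.0),
    (False, True, 10, 5.0),
    (False, False, 30, 1.0),
]

def mean_loss(rows):
    """People-weighted average weight loss across the given groups."""
    people = sum(n for _, _, n, _ in rows)
    return sum(n * loss for _, _, n, loss in rows) / people

# Naive comparison that ignores exercise
breakfast = mean_loss([g for g in groups if g[0]])
no_breakfast = mean_loss([g for g in groups if not g[0]])
print("Ignoring exercise:", breakfast, "vs", no_breakfast)

# Controlled comparison: breakfast eaters vs. skippers within each exercise group
gaps = {}
for exercises in (True, False):
    stratum = [g for g in groups if g[1] == exercises]
    with_breakfast = mean_loss([g for g in stratum if g[0]])
    without_breakfast = mean_loss([g for g in stratum if not g[0]])
    gaps[exercises] = with_breakfast - without_breakfast
    print("Exercises:", exercises, "breakfast gap:", gaps[exercises])
```

In this made-up data the naive comparison shows breakfast eaters losing twice as much weight (4.0 vs. 2.0 pounds), yet the gap is zero once people are compared only with others who share their exercise habits.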

But even when researchers properly control for other factors, our work isn’t done. One of the statistical tests used to determine whether already isolated differences between groups are statistically significant, a first hurdle before any claim of causation rather than mere correlation can be entertained, is the chi-square statistic. In a tradition rooted in history, social scientists generally assume that when this test obtains a p<.05, the result is statistically significant and one can reject the null hypothesis that two variables are unrelated. In other words, if the two variables were truly unrelated in the broader population, there would be less than a 5% chance of seeing an association this strong in the sample data through chance alone.

This is, needless to say, a rather arbitrary standard. A 2016 statement by the American Statistical Association cautions against the misinterpretations this standard invites: “A conclusion does not immediately become ‘true’ on one side of the divide and ‘false’ on the other.”
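For readers who want to see the arithmetic, here is a minimal sketch of the chi-square test for the simplest case, a table with two rows and two columns. The counts are hypothetical, the function name `chi_square_2x2` is made up for this example, and the p-value shortcut used here (the complementary error function) is valid only for this one-degree-of-freedom case.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). With one degree of freedom, the p-value
    equals erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows = received the intervention or not, columns = voted or not
statistic, p_value = chi_square_2x2([[30, 20], [20, 30]])
print(statistic, round(p_value, 4))  # 4.0 and roughly 0.0455, just under .05

# Nearly identical hypothetical counts land on the other side of the line
statistic_2, p_value_2 = chi_square_2x2([[29, 21], [21, 29]])
print(round(statistic_2, 2), round(p_value_2, 4))  # 2.56 and roughly 0.11
```

The two tables differ by a single respondent in each cell, yet one result would conventionally be called significant and the other would not, which is the kind of sharp divide the American Statistical Association statement warns about.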

But that’s not the only way to parse data, because data come in many shapes and sizes. When using interval or ratio-level data, different types of statistical methods are needed. Interval and ratio-level data are real measurements with precision and direction; the difference between the two is that ratio-level data have a true zero point, so ratios between values are meaningful, whereas interval measurements have no true zero and can be negative. Examples of interval data would be temperature, bank account balance, etc. Examples of ratio measurements would be age, years of education, household income, etc. Social scientists will often use correlation, scatterplots and regression methods (complicated mathematical formulas meant to disentangle variables so that the thread one is seeking to observe can be clearly seen and measured) to understand cause and effect between variables in datasets that contain these types of data.
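One way to see why a true zero point matters is to ask whether “twice as much” survives a change of units. The short sketch below uses made-up values, and the helper `celsius_to_fahrenheit` is written here only for the example.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: the zero point is arbitrary, so the "ratio" depends on the unit.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)
print(ratio_celsius, ratio_fahrenheit)  # 2.0 in Celsius, but 68/50 in Fahrenheit

# Ratio scale: zero means "none," so the ratio is the same in any unit.
ratio_dollars = 80_000 / 40_000
ratio_thousands = 80 / 40
print(ratio_dollars, ratio_thousands)  # 2.0 either way
```

A household earning $80,000 really does earn twice as much as one earning $40,000, but a 20-degree day is not “twice as hot” as a 10-degree day.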

To the uninitiated observer, these often resemble the most complicated theorems from calculus or physics.

When a dependent variable has just two outcome categories (e.g., whether someone voted or did not vote) or multiple discrete categories (e.g., a person voted for candidate A, B, C or D), forms of what is called logistic regression are used. When the dependent variable is an interval or ratio-level measurement (e.g., household income), ordinary least squares regression is often the model of choice. Whatever the regression method used, the statistical significance of variables in models relies again on the same p<.05 standard mentioned previously, and the same cautions by the American Statistical Association remain in place.
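Ordinary least squares with a single explanatory variable is less mysterious than it sounds: it finds the straight line that minimizes the squared distances between the line and the data points. The sketch below assumes five invented households and a hypothetical helper called `fit_line`; real analyses involve many more observations and variables, and logistic regression is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years of education and household income in thousands of dollars
years_of_education = [10, 12, 14, 16, 18]
income_thousands = [30, 38, 41, 52, 59]

slope, intercept = fit_line(years_of_education, income_thousands)
print(round(slope, 2), round(intercept, 2))  # 3.6 and -6.4

# Income the fitted line predicts for someone with 15 years of education
predicted = intercept + slope * 15
print(round(predicted, 1))  # 47.6
```

In this made-up sample, each additional year of education is associated with about $3,600 more in household income. Whether that association reflects cause and effect is exactly the question the fitted line cannot answer on its own.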

In recent years, as the era of “big data” has emerged and computers can help us crunch ever more complicated datasets in ever more complicated ways, an additional concern has become clearer. As Tyler Vigen has shown in his work on spurious correlations, large datasets contain many highly statistically significant correlations between variables that arise by random chance alone. These correlations are often significant at the p<.01 level, with coefficients above .900 (1.000 is a perfect correlation and .000 is no correlation). For example, the trends in milk consumption and the divorce rate in Colorado between 1999 and 2021 have a correlation coefficient of .932, and this is statistically significant at the p<.01 level. While the result is statistically impressive by current journal publication standards, there is no reasonable theoretical connection between these two trends.
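How easily such coincidences arise can be shown with a small simulation. This is a sketch under stated assumptions and not a reconstruction of Tyler Vigen’s method: it generates 60 unrelated series of 23 points each (the number of years from 1999 through 2021), models each one as a random walk so that it drifts the way real trends do, and then checks every possible pair. The seed, the number of series and the helper names are all arbitrary choices made for the example.

```python
import itertools
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def random_walk(length, rng):
    """A series in which each value is the previous value plus random noise."""
    level, series = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        series.append(level)
    return series

rng = random.Random(2024)  # fixed seed so the run is repeatable
series = [random_walk(23, rng) for _ in range(60)]  # 60 unrelated "trends"

# Correlate every series with every other one: 60 * 59 / 2 = 1,770 pairs
correlations = [pearson_r(a, b) for a, b in itertools.combinations(series, 2)]
strongest = max(abs(r) for r in correlations)
above_900 = sum(abs(r) > 0.9 for r in correlations)
print(len(correlations), "pairs;", above_900, "with |r| above .900;",
      "strongest:", round(strongest, 3))
```

None of these series has anything to do with any other, yet searching through enough pairs reliably turns up correlations that look striking. The more variables a dataset contains, the more such pairs there are to find.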

So good social science requires much more than statistical significance. It requires judgment. First and foremost, a good hypothesis and/or theory of why one thing may cause another to happen is necessary. Even with this and a seemingly statistically significant result, repeated replication is necessary to strengthen our belief that one thing is related to another.

A statistical result requires far more than one run through the steps of the scientific method. The process of peer review and publication follows, in which fellow scholars pull apart one’s research methodology, analysis and conclusions — a process that is changing like so many others in this self-scrutinizing era. For a theory of change to become well established, it must pass scrutiny, and ultimately an experiment must be replicated. The passing of one research test doesn’t make for an idea’s graduation.

As Thomas Kuhn pointed out in the 1962 volume “The Structure of Scientific Revolutions,” scientists are human beings and are subject to all the tendencies of the human condition that we see elsewhere — including ambition and groupthink, which can stifle legitimate scientific breakthroughs. Often, too much confidence may be put in the marginally statistically significant result, while at other times, a potentially worthy new point of view may struggle to make it past peer-reviewed publication, at least for a time, because of the personalities involved in the process.

In the end, we might want to add science to the likes of sausages and laws as a thing we would be better off not seeing being made. With that, we should also be more skeptical of the ability of the scientific process to robustly establish cause and effect in the social world. Describing behavior and establishing associations between different factors are often achieved in the social world. Proving cause and effect and/or predicting human behavior is far rarer.