By Sean McShee
The GLAAD report “2017 Accelerating
Acceptance” suffers from common errors in interpreting survey data. While
not as serious a transgression as “alternative facts” and fake news, this type
of misreporting contributes to public misunderstanding of social data. Three major errors occurred in reporting
this data: 1) generalizing without
justification, 2) generalizing from too small a sample size, and 3) comparing non-comparable
groups.
Generalizing without justification
Political campaigns require
accurate polling. As a result, they have to report results that can be
generalized from the sample (those responding to the survey) to the population
(in a political poll, the voters in a given election). Results from an
electoral poll consist of two sets of numbers: the proportion of a candidate’s
supporters in the sample and the margin of error, usually expressed as ±
2 to 3 percentage points.
In order to generalize from the
sample to the larger population, the sample should be randomly drawn from the
population of interest. Almost everyone generalizes the proportion from the
sample directly to the larger population. That is wrong.
The proportion in the sample, taken together with the margin of error, indicates the range of possible values in
the population. The margin of error or confidence interval defines this
range.
Thus if a political poll reports that
Candidate A got 51% of the vote in the sample with a ±3% margin of
error or confidence interval, it means that the survey is predicting he will
get between 48% and 54% of the vote. If his opponent got 47%, it means that the
survey predicts that his opponent will get between 44% and 50% of the vote. About
2% of the vote is undecided or supporting someone else. As the confidence intervals
for the two candidates overlap by 2 percentage points (48% to 50%), neither is predicted to be
a clear winner, but Candidate A is slightly favored.
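The arithmetic in this example can be sketched in a few lines of Python. The function names `interval` and `overlap` are illustrative only, not part of any polling library, and the sketch treats the margin of error as a simple plus-or-minus band around the sample share.

```python
def interval(share, margin):
    """Return the (low, high) range implied by a sample share and a margin of error."""
    return (share - margin, share + margin)


def overlap(a, b):
    """Return the range shared by two intervals, or None if they do not overlap."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return (low, high) if low <= high else None


candidate_a = interval(51, 3)  # 48 to 54
opponent = interval(47, 3)     # 44 to 50

# The shared range is what keeps the poll from naming a clear winner.
print(overlap(candidate_a, opponent))  # (48, 50)
```

With a margin of 1 point instead of 3, the two ranges (50 to 52 and 46 to 48) would no longer overlap, and the poll would point to a clear leader.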
A 3% margin of error might be too
large for a political race, but would be narrow for other types of
surveys. Narrower margins of error or
confidence intervals should generate more confidence in the results; larger ones
should generate less confidence. The margin of error or confidence interval has
more importance than the proportion from the sample.
The Harris-GLAAD survey contained
much higher estimates of the LGBT population in the US than other estimates. It
did not, however, report confidence intervals. I emailed Matt Goodman, the
contact person listed in the report, to ask him about the confidence
intervals. In his response, he said that
most surveys lacked enough statistical rigor to report confidence intervals. In
short, general tendencies can be noted, but precise figures should be
avoided.
According to Goodman, the purpose
of the survey was to explore the effects of adding “the broad spectrum of
sexual orientation experiences (like pansexual or asexual) as well as gender
expression separate from gender identity (like genderqueer or agender).” The purpose of the survey was not to produce
population estimates for specific identity categories.
Generalizing from a small sample size
While the total number of people in
the survey was adequate (2037), the total number identifying as LGBTQ was only
229. Among the various age groups, the number ranged from 6 in the 72+ age
group to 93 in the 18-34 age group. These sample sizes are simply too small to
produce meaningful results. For the other
age cohorts, the number would be even smaller than in the 18-34 group.
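To give a rough sense of scale, the sketch below computes a worst-case 95% margin of error for the sample sizes quoted above. It rests on assumptions the Harris-GLAAD report does not make: a simple random sample and the standard normal approximation for a proportion. The function name `margin_of_error` is illustrative, and the approximation itself is unreliable for a group as small as 6.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a proportion p
    estimated from a simple random sample of size n (normal approximation).
    p=0.5 is the worst case, giving the widest margin."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


groups = [
    ("all respondents", 2037),
    ("LGBTQ respondents", 229),
    ("LGBTQ, ages 18-34", 93),
    ("LGBTQ, ages 72+", 6),
]

for label, n in groups:
    print(f"{label:20s} n={n:5d}  margin of error about +/-{margin_of_error(n):.1f} points")
```

Under these assumptions, the margin is roughly 2 points for the full sample, between 6 and 7 points for the 229 LGBTQ respondents, about 10 points for the 93 respondents aged 18-34, and about 40 points for the 6 respondents aged 72 and over.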
Comparing non-comparable groups
The Harris-GLAAD survey report
compares generational cohorts at one point in time. It would be different if
they compared 18-to-24-year-old millennials with baby boomers when they (we) were
18 to 24. The process of aging can affect
attitudes in multiple ways.
Adolescence and young adulthood are
times of exploring and trying on different identities and seeing which ones
stick. Twenty years later what seemed important may seem much less important. This
may be particularly true for two groups: lesbian until graduation, and straight
until the third beer (or 2nd line of ecstasy).
Having children also frequently
makes people more conservative (and religious).
People who are more conservative
tend to take fewer risks and thus tend to live longer. For example, deaths from
HIV impacted gay and bi men in the Baby Boomer and Generation X cohorts more
than any other group.
Many of these expanded categories
are not “new” but were around in the 70s. Bisexual, asexual, pansexual, along
with the ever-popular “try sexual,” were all known identities then.
Reporting survey results also
requires a certain amount of education about innumeracy. Similar to illiteracy,
innumeracy involves the inability to understand quantitative data. Combating innumeracy
may not be as important as combating fake news, but it is critical to creating an
informed citizenry.