how standard is your deviation

By Sean McShee

The GLAAD report “2017 Accelerating Acceptance” suffers from common errors in interpreting survey data. While not as serious a transgression as “alternative facts” and fake news, this type of misreporting contributes to public misunderstanding of social data. Three major errors occurred in reporting this data: 1) generalizing without justification, 2) generalizing from too small a sample size, and 3) comparing non-comparable groups.

Generalizing without justification

Political campaigns require accurate polling. As a result, they have to report results that can be generalized from the sample (those responding to the survey) to the population (in a political poll, the voters in a given election). Results from an electoral poll consist of two numbers: the proportion of a candidate’s supporters in the sample and the margin of error, usually expressed as ± 2 to 3 percentage points.

In order to generalize from the sample to the larger population, the sample should be randomly drawn from the population of interest. Almost everyone treats the proportion in the sample as if it were the proportion in the larger population. That is wrong. The proportion in the sample only indicates a range of plausible values in the population. The margin of error or confidence interval defines this range.

Thus if a political poll reports that Candidate A got 51% of the vote in the sample with a ± 3 percentage point margin of error or confidence interval, it means that the survey is predicting he will get between 48% and 54% of the vote. If his opponent got 47%, it means that the survey predicts that his opponent will get between 44% and 50% of the vote. About 2% of the vote is undecided or supporting someone else. As the confidence intervals for the two candidates overlap by 2 percentage points (48% to 50%), neither is predicted to be a clear winner, but Candidate A is slightly favored.
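The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation described above; the function names (interval, overlap) are my own and do not come from any polling software.

```python
def interval(share, margin):
    """Return the (low, high) range implied by a poll share and its margin of error."""
    return (share - margin, share + margin)


def overlap(first, second):
    """Return the range two intervals share, or None if they do not overlap."""
    low = max(first[0], second[0])
    high = min(first[1], second[1])
    return (low, high) if low <= high else None


# Figures from the example: 51% and 47%, each with a 3 point margin of error.
candidate_a = interval(51, 3)
opponent = interval(47, 3)
shared = overlap(candidate_a, opponent)

print("Candidate A:", candidate_a)
print("Opponent:", opponent)
print("Overlap:", shared)
```

If the overlap comes back as None, the poll separates the two candidates; if it returns a range, as it does here, the race is too close to call from this poll alone.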

A 3% margin of error might be too large for a political race, but would be narrow for other types of surveys. Narrower margins of error or confidence intervals should generate more confidence in the results; wider ones should generate less confidence. The margin of error or confidence interval has more importance than the proportion from the sample.

The Harris-GLAAD survey contained much higher estimates of the LGBT population in the US than other estimates. It did not, however, report confidence intervals. I emailed Matt Goodman, the contact person listed in the report, to ask him about the confidence intervals. In his response, he said that most surveys lacked enough statistical rigor to report confidence intervals. In short, general tendencies can be noted, but precise figures should be avoided.

According to Goodman, the purpose of the survey was to explore the effects of adding “the broad spectrum of sexual orientation experiences (like pansexual or asexual) as well as gender expression separate from gender identity (like genderqueer or agender).” The purpose of the survey was not to produce population estimates for specific identity categories.

Generalizing from a small sample size

While the total number of people in the survey was adequate (2037), the total number identifying as LGBTQ was only 229. Among the various age groups, the number ranged from 6 in the 72+ age group to 93 in the 18-34 age group. These sample sizes are simply too small to produce meaningful results. Each of the other age cohorts had fewer than 93.
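To give a rough sense of how small these numbers are, the sketch below computes an approximate 95% margin of error for each sample size mentioned above. It rests on assumptions that are mine, not the survey's: a simple random sample, the textbook normal approximation, and the worst case of a 50% proportion. The survey reported no confidence intervals, so these figures are illustrative only, and the approximation itself is unreliable for a group as small as 6. The function name margin_of_error is hypothetical.

```python
import math


def margin_of_error(n, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample and the normal approximation;
    proportion=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(proportion * (1 - proportion) / n)


# Sample sizes taken from the article.
groups = {
    "All respondents": 2037,
    "LGBTQ respondents": 229,
    "LGBTQ, ages 18-34": 93,
    "LGBTQ, ages 72+": 6,
}

for label, n in groups.items():
    points = 100 * margin_of_error(n)
    print(f"{label} (n={n}): about +/- {points:.1f} percentage points")
```

Under these assumptions the margin for the full sample is close to the 2 to 3 points typical of political polls, while the margin for the smallest age group is so wide that the result says almost nothing.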

Comparing non-comparable groups

The Harris-GLAAD survey report compares generational cohorts at one point in time. It would be different if they compared 18-24 year old millennials with baby boomers when they/we were 18-24. The process of aging can affect attitudes in multiple ways.

Adolescence and young adulthood are times of exploring and trying on different identities and seeing which ones stick. Twenty years later what seemed important may seem much less important. This may be particularly true for two groups: lesbian until graduation, and straight until the third beer (or second line of ecstasy).

Having children also frequently makes people more conservative (and religious).

People that are more conservative tend to take fewer risks and thus tend to live longer. For example, deaths from HIV impacted gay and bi men in the Baby Boomer and Generation X cohorts more than any other group.

Many of these expanded categories are not “new” but were around in the 70s. Bisexual, asexual, pansexual, along with the ever popular “try sexual” were all known identities then.

Reporting survey results also requires a certain amount of education about innumeracy. Similar to illiteracy, innumeracy involves the inability to understand quantitative data. Combating innumeracy may not be as important as combating fake news, but it is critical to creating an informed citizenry.


Thursday, April 13, 2017

Challenging innumeracy
