Data Literacy: Margins of Error

The site has been up for a while now, and if you’ve done any browsing on the different data pages, you may have noticed that a lot of the tables have columns titled “MOE” – “Margin of error.” So in our first data literacy post, we’d like to go over what an MOE is; when MOEs usually appear; how to read, interpret, and share data with MOEs; and why some of them are so huge. All without anybody needing a statistics degree.

Margins of error are related to other statistical concepts that we’re not going to get into here. Our goal is not to turn this blog into a statistics textbook, but we can offer a crash course in working with some basic ideas that are immediately relevant to the content on the rest of the site.

What is a margin of error?

A margin of error (MOE) accompanies data that is estimated, rather than counted, and indicates how uncertain that estimate is. It shows the amount that the estimate may be off by and, together with the estimate, the range of values that the actual figure could take.

Margins of error are often denoted by “+/-.” If the estimate of average speed on a stretch of road is 55.0 mph, and the MOE is 2.5, you may see it written as “55.0 mph, +/- 2.5.” This means that the actual average speed on that road could be as high as 57.5 mph, or as low as 52.5 mph. It also means that it could be any value between 52.5 and 57.5. Occasionally, you may see MOEs written as the range they describe: “55.0 mph, +/- 2.5” would be “55.0 mph, 52.5-57.5.” Effectively, both ways of writing it mean the same thing. (Please note: MOEs should always be in the same units as the estimate: if the estimate is in miles per hour, joules, or people, the MOE is, respectively, in miles per hour, joules, or people. However, if the estimate is a percentage, remember that the MOE is in percentage points to be added to or subtracted from the percentage value, and is not describing the percent by which the estimate could be off.)
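For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of turning an estimate and its MOE into the range they describe. The function name moe_range is our own invention for illustration; it uses the road speed figures from the example above.

```python
def moe_range(estimate, moe):
    """Return the (low, high) range described by an estimate and its MOE.

    The MOE must be in the same units as the estimate (for a percentage
    estimate, that means percentage points).
    """
    return (estimate - moe, estimate + moe)


# Road speed example from the text: 55.0 mph, +/- 2.5
low, high = moe_range(55.0, 2.5)
print(f"55.0 mph, +/- 2.5 is the same as 55.0 mph, {low}-{high}")
```

Both notations carry the same information, so a helper like this can convert the "+/-" form into the range form whenever the range is easier for your audience to read.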

When do we usually see margins of error?

Margins of error exist because estimates, by definition, are not exact counts and carry a degree of uncertainty. That’s why you never see MOEs associated with a piece of data from a Decennial Census, but you always see MOEs associated with the U.S. Census Bureau’s American Community Survey (ACS). The Decennial Census is an exact count: when it says that there were 12,941 people in the Village of Rantoul in 2010[1], we know that figure is the result of survey responses and door-to-door enumerators that accounted for each person in the village at that time. However, the ACS presents estimates: it collects responses from a sample of the population on a rolling basis, and extrapolates the estimates from that sample. There’s some wiggle room in this extrapolation process, and that’s where MOEs come in.

How do we read, interpret, and share data with margins of error?

Understanding MOEs is important. If you’re sharing data from here, or from anywhere else, and that data includes an MOE, we strongly recommend sharing the MOE with the data. If you don’t, the information that you’re passing on – whether it’s for a college paper, a presentation, or a report for your job – is incomplete. This is especially important when the MOEs are particularly large.

For example, if I’m planning a presentation about the need for senior services in Hensley Township, just northwest of the City of Champaign, it will be important to include how many people in Hensley Township are over the age of 65. For additional detail, let’s say I also include a breakdown of that population by five-year age cohorts (65-69 years, 70-74 years, 75-79 years, 80-84 years, and 85+ years).

The ACS 2010-2014 5-Year Estimates present this data as the percentage of the total population in each age cohort. The following is a table showing those percentages without the MOE:

Table: Elderly Population by Cohort in Hensley Township, Champaign County, Illinois

Source: U.S. Census Bureau; American Community Survey, 2010-2014 American Community Survey 5-Year Estimates, Table S0101; generated by CCRPC staff; using American FactFinder; (14 July 2016).

Helpful stuff, right? And it looks like a significant portion of the population is elderly. The five-year cohort of 65-69 alone accounts for an estimated 23.1% of the total population. But before we get too excited, let’s take a look at the same estimates, but include the MOEs this time:

Table: Elderly Population by Cohort in Hensley Township, Champaign County, Illinois

Source: U.S. Census Bureau; American Community Survey, 2010-2014 American Community Survey 5-Year Estimates, Table S0101; generated by CCRPC staff; using American FactFinder; (14 July 2016).

Changes things, doesn’t it? For all five age cohorts, the margin of error is over 66% of the value of the actual estimate (the 70-74 age cohort: (3.3/4.9)*100 = 67.3%, and the 75-79 age cohort: (4.8/5.5)*100 = 87.3%). For the 80-84 and 85+ age cohorts, the MOE is larger than the actual estimate. And the 65-69 five-year cohort, at 23.1%, isn’t looking so solid with an MOE of +/- 15.3. That means that the percentage of the population accounted for by the 65-69 cohort could actually be as low as 7.8%. Alternatively, it could be as high as 38.4%. That could make a huge difference in the need for senior services, but with only this data to go on, we don’t know.
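If you want to run this kind of check on your own figures, the comparison of an MOE to its estimate can be sketched in a few lines of Python. The function and variable names here are hypothetical, and the sketch uses only the two cohorts whose estimate and MOE are both quoted in the text above (65-69 and 70-74).

```python
def moe_as_percent_of_estimate(estimate, moe):
    """Express an MOE as a percentage of the estimate it belongs to."""
    return moe / estimate * 100


# (estimate, MOE) pairs in percent and percentage points, as quoted in the text
cohorts = {
    "65-69": (23.1, 15.3),
    "70-74": (4.9, 3.3),
}

for label, (estimate, moe) in cohorts.items():
    share = moe_as_percent_of_estimate(estimate, moe)
    low, high = estimate - moe, estimate + moe
    print(f"{label}: MOE is {share:.1f}% of the estimate; "
          f"range {low:.1f}% to {high:.1f}%")
```

A large result does not make the estimate wrong; it is simply a signal that the estimate should be shared with its MOE and treated with caution.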

Now, it might help my case to omit the MOEs. Saying that almost a quarter of the township’s population is between the ages of 65 and 69 is an attention-getter. But MOEs mean something, and they’re there for a reason. Choosing to omit or include MOEs based on convenience to your message, or knowingly presenting an estimated figure as an exact figure, is unethical. So if your data comes with MOEs, include them in whatever you’re using the data for. It’s part and parcel of being a good data user. (The first table in this post is an exception for two reasons: 1) it’s an example to illustrate the importance of MOEs, and 2) the MOEs are included, for the same data, in the following table.)

There are cases where the MOEs are so large that the estimate can become almost completely unhelpful. When that happens, you have a few options. You can present it anyway, with the MOE. You can decrease the level of detail (e.g., examining one age category, 65+, instead of five-year age cohorts). You can look for another data source. Different agencies collect and publish different data. Returning to the senior services example, maybe a local nonprofit did a door-to-door survey three years ago and has an exact count of how many elderly residents were in Hensley Township at that time, or a state agency has put together estimates at a larger geographic scale (e.g., Champaign County) with lower MOEs.

You can also look for local sources of qualitative data or anecdotal evidence. If there are local agencies already providing some senior services, ask about their population served. If there’s a local library or other community hub, it might be useful to ask staff members there if they see a lot of elderly patrons, and how they tend to utilize the library and its services. Discussing how local seniors use existing services could help build a case about whether additional services are needed, and how they might be used. This type of information can seem less official than hard figures, but it does have the advantages of being hyper-local to the relevant area and based on the knowledge of local experts.

Why is this margin of error so large? Why are some of them larger than their corresponding estimate?

Margins of error often become larger as the population of the geography you’re working with shrinks. Let’s look at the same estimate from two different areas: percentage of the total workforce that commutes by carpool, in Champaign County and in Brown Township, a township in the far northwest corner of Champaign County.

In Champaign County, according to the 2010-2014 ACS 5-Year Estimates, an estimated 8.7% of workers commuted to work via carpool, with an MOE of +/- 0.8[2]. (Meaning somewhere between 7.9% and 9.5% of workers actually commuted to work via carpool.) An estimated 10.8% of workers in Brown Township commuted to work via carpool, a figure not so different from the Champaign County estimate; however, the MOE for Brown Township’s estimate is +/- 3.8, much larger than Champaign County’s MOE[3].
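One rough way to compare the two carpool estimates is to check whether the ranges described by their MOEs overlap. The sketch below is only an informal check under our own assumptions (the helper names are made up, and overlapping ranges are not a formal statistical test of whether two estimates differ), but it shows why the two figures are hard to tell apart.

```python
def moe_range(estimate, moe):
    """Return the (low, high) range for an estimate, rounded to one decimal."""
    return (round(estimate - moe, 1), round(estimate + moe, 1))


def ranges_overlap(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


county = moe_range(8.7, 0.8)     # Champaign County: 8.7%, +/- 0.8
township = moe_range(10.8, 3.8)  # Brown Township: 10.8%, +/- 3.8

print("Champaign County:", county)
print("Brown Township:", township)
print("Ranges overlap:", ranges_overlap(county, township))
```

The county's range sits entirely inside the township's much wider range, so the data alone can't tell us whether carpooling is really more common in Brown Township.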

The disparity in size of the MOEs is due to the size of the sample considered. The same principle applies as when you’re calculating averages: the calculation of an average based on 500 points of data will be more precise than the calculation of an average based on 50 points of data. So estimates for townships or smaller municipalities often have larger MOEs than those for the county or larger municipalities. Champaign County had a population of over 200,000, according to the 2010-2014 ACS 5-Year Estimates, while Brown Township and Hensley Township had populations of 1,841 (+/- 222) and 922 (+/- 274), respectively[4].
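The 500-versus-50 comparison can be tried out with a small simulation. This is a sketch on made-up data, not ACS methodology: it builds a hypothetical population of 20,000 random values, draws many samples of each size, and measures how much the sample averages bounce around. The function name, the population, and the random seeds are all our own assumptions.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, trials=1000, seed=0):
    """Draw repeated samples and return the standard deviation of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        means.append(sum(sample) / sample_size)
    return statistics.stdev(means)


# A made-up population of 20,000 values between 0 and 100
population_rng = random.Random(42)
population = [population_rng.uniform(0, 100) for _ in range(20000)]

small = spread_of_sample_means(population, 50)
large = spread_of_sample_means(population, 500)

print(f"Spread of averages from samples of 50:  {small:.2f}")
print(f"Spread of averages from samples of 500: {large:.2f}")
```

The averages based on 50 data points should vary noticeably more from sample to sample than the averages based on 500, which is the same reason a small township's estimate tends to carry a larger MOE than the county's.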

Some ACS estimates of 0% can also have MOEs that look frankly implausible, but they’re a reflection of estimation methodology, beginning with a sample. To calculate estimates, instead of counting characteristics of an entire population, data is collected on a portion of that population. That group is called a sample, and data about that sample is extrapolated into the estimate(s) for a whole population. An estimate of 0% indicates that no respondent in the sample had the relevant characteristic, but that does not mean that no one in the population does, because not everyone in the population was in the sample[5]. The MOE shows the greatest percentage of the population that could have that characteristic, despite no respondents in the sample reporting it[6].
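To see how a sample can easily miss a characteristic that really does exist in the population, here is a minimal sketch. It assumes a simple random sample with independent draws and a hypothetical characteristic held by 2% of the population; it is not the Census Bureau's method for calculating MOEs on zero estimates, just an illustration of why a 0% estimate can't be read as an exact zero.

```python
def chance_sample_misses(true_share, sample_size):
    """Probability that nobody in a simple random sample has a characteristic
    that a fraction `true_share` of the population has (independent draws)."""
    return (1 - true_share) ** sample_size


# Hypothetical: 2% of the population has the characteristic
for sample_size in (25, 100, 400):
    chance = chance_sample_misses(0.02, sample_size)
    print(f"Sample of {sample_size}: {chance:.1%} chance of a 0% estimate")
```

The smaller the sample, the more likely it is to come back with a 0% estimate even though the characteristic is present, which is why small geographies can pair an estimate of 0% with a sizable MOE.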

We hope that our first data literacy post was informative and at least somewhat entertaining. To sum up: margins of error show the amount of uncertainty associated with an estimate. They can range from small to very large, relative to their corresponding estimate, and they’re always important to include when you share or publish data. If you have more questions, try the U.S. Census Bureau’s Glossary and FAQ pages; we’ve certainly found them helpful.

Data literacy posts will be a quarterly feature of this blog, so the next one will be in December. If you have any data literacy questions that you’d like to see featured in future blog posts, drop us a comment or an email!

[1] U.S. Census Bureau; 2010 Decennial Census, Table P1; generated by CCRPC staff; using American FactFinder; (26 July 2016).

[2] U.S. Census Bureau; American Community Survey, 2010-2014 American Community Survey 5-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (28 July 2016).

[3] Ibid.

[4] U.S. Census Bureau; American Community Survey, 2010-2014 American Community Survey 5-Year Estimates, Table B01003; generated by CCRPC staff; using American FactFinder; (4 August 2016).

[5] U.S. Census Bureau; Frequently Asked Questions, FAQ1551; retrieved by CCRPC staff; (28 July 2016).

[6] Ibid.
