Data Is or Data Are: Settling the Singular vs Plural Grammar Debate

Writers, analysts, and scientists routinely pause at the keyboard because “data” refuses to behave like an ordinary noun. The hesitation stems from a centuries-old tension between Latin grammar and modern English usage.

Search engines surface conflicting guidance, style guides differ, and editors argue in comment threads. This article delivers a definitive, practical map for choosing “is” or “are” in any context without second-guessing yourself.

Etymology and Historical Usage

From Latin Datum to English Data

The Latin word datum means “something given,” and it appeared in English scholarly texts as early as the 17th century. Scholars naturally pluralized it as data when referring to multiple observations.

By the 1800s, English treatises on astronomy and medicine preserved the plural sense, writing “the data are conclusive.” This pattern remained dominant in academic prose until the mid-20th century.

Corpora show a sharp rise in singular usage after 1970, coinciding with the growth of computing and the mass media.

Corpus Evidence Across Decades

The Google Books Ngram Viewer records “data is” surpassing “data are” in American English around 1980. British English followed suit roughly a decade later, though “data are” still appears more often in UK journals.

Linguists attribute the shift to conceptual change: people began treating “data” as a collective mass rather than a countable set of points.

Grammatical Number in Modern English

Mass Nouns vs Count Nouns

Mass nouns like “water” or “information” trigger singular verbs and resist pluralization. Count nouns like “apple” or “report” take plurals and pair with plural verbs when needed.

“Data” straddles both categories, so the verb choice signals which interpretation the writer intends.

Collective Nouns as a Parallel

Consider “team” or “government,” which swing between singular and plural depending on emphasis. When the focus is on the unit acting as one entity, the singular verb feels natural.

Apply the same logic to “data” and the path forward becomes clearer.

Academic and Scientific Conventions

Journal Style Guide Sampling

Nature and Science instruct authors to use “data are,” reinforcing the plural tradition. Conversely, The Lancet accepts both forms but defaults to “data is” in news sections.

Check the target journal’s instructions for authors before submission.

Grant Proposals and Technical Reports

Funding bodies such as the NIH and ERC rarely police the issue unless the usage is inconsistent. Reviewers care more about clarity than dogma, so pick one convention and apply it rigorously.

Corporate and Marketing Realities

Annual Reports and Investor Decks

In SEC filings, singular usage dominates because the document treats “data” as a bulk asset. Phrases like “our data is encrypted” sound more natural to non-technical stakeholders.

Switching to plural may distract readers who expect business English norms.

Brand Voice Guides

Slack’s style guide recommends “data is” for consistency with conversational tone. IBM’s internal manual allows either form but flags mixed usage in a single document.

Align with the established brand lexicon to avoid copy-editing churn.

Technical Writing and API Documentation

Code Comments and README Files

Developers usually write “the data is loaded into memory” because programming languages treat datasets as single objects. This mirrors how variables are referenced in code.

Consistency reduces cognitive load for readers toggling between prose and code snippets.

Endpoint Descriptions

RESTful documentation favors singular for resource names: “GET /data returns a JSON array.” The surrounding explanatory text should echo that convention.

Deviating into plural may confuse implementers who map endpoints to object models.

Data Journalism and Newsrooms

Headline Constraints

Headlines prize brevity, so “data is” wins by one character. Editors also believe the singular sounds less intimidating to general audiences.

Story Body Standards

The Associated Press allows both forms but recommends choosing based on context. If the paragraph stresses individual figures, “data are” fits; if it refers to a dataset as a whole, “data is” works.

Reuters and Bloomberg follow similar flexible policies.

Machine Learning and AI Papers

Dataset vs Datum Distinction

Authors often introduce “the data is split into train and test sets,” treating the corpus as one block. Later they might note “the data are drawn from 42 countries,” emphasizing collection diversity.

Use the shift in perspective to reinforce narrative flow.

Conference Templates

NeurIPS provides LaTeX macros that expand to “Data (plural of datum) are…” in footnotes, yet the main text uses singular for fluency. This hybrid approach satisfies purists without forcing awkward phrasing.

Legal and Regulatory Language

GDPR Recitals

European Union texts oscillate: “personal data is processed lawfully” appears alongside “the data are adequate, relevant and limited.” The difference hinges on whether the sentence addresses the concept or the individual pieces.

Lawyers draft clauses to mirror the regulation’s wording, ensuring interpretive alignment.

Patent Applications

USPTO examiners accept singular usage in method claims: “the data is encrypted using AES-256.” The focus is on the process, not the discrete values.

Switching to plural could unintentionally narrow claim scope.

Everyday Blogging and Social Media

SEO Keyword Optimization

Google’s keyword planner shows “data is” queries outranking “data are” by 12:1 in global volume. Optimizing for the singular increases click-through rates from non-experts.

Reserve the plural for niche academic audiences to avoid alienation.

Voice Search Alignment

Smart assistants parse “data is” more reliably because it aligns with spoken patterns. Voice snippets that mirror natural phrasing earn higher SERP placements.

Global English Variations

American vs British Preferences

Corpus linguistics reveals a 70% singular preference in US publications, while UK texts hover near 50%. Canadian and Australian English lean toward the US pattern.

Localize usage if the audience is predominantly from one region.

Second-Language Speakers

Non-native writers often default to the plural because textbooks emphasize Latin roots. Provide them with region-specific style sheets to reduce friction.

Teams with mixed linguistic backgrounds should codify one standard in the onboarding wiki.

Pronunciation and Cognitive Load

Auditory Smoothness

“Data is” rolls off the tongue in rapid speech, whereas many speakers find the back-to-back vowels of “data are” awkward to articulate. Podcast hosts and webinar presenters favor the singular for listener comfort.

Screen Reader Behavior

Accessibility tests show that screen readers pronounce “data are” with a slight pause that can break sentence rhythm. Singular usage improves flow for visually impaired users.

Tooling and Linting Solutions

Vale and Other Prose Linters

Open-source linters like Vale allow custom rules to enforce “data is” or “data are” across markdown repositories. Configure the YAML file once and let CI enforce it.

This prevents editorial drift during multi-author sprints.
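For teams that have not adopted Vale yet, the same kind of check can be approximated in a few lines of Python. This is not a Vale rule. It is a plain-Python stand-in, and the names `PREFERRED`, `find_violations`, and `check_files` are hypothetical choices made for this sketch.

```python
import re

# Hypothetical project setting: the verb this repository has standardized on.
PREFERRED = "is"
OTHER = "are" if PREFERRED == "is" else "is"

# Whole-word match, so "metadata are" is not reported as a violation.
VIOLATION = re.compile(rf"\bdata\s+{OTHER}\b", re.IGNORECASE)


def find_violations(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs where the non-preferred form appears."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if VIOLATION.search(line)
    ]


def check_files(paths: list[str]) -> int:
    """Print each violation; return 1 if any were found, otherwise 0."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for number, line in find_violations(handle.read()):
                print(f"{path}:{number}: expected 'data {PREFERRED}': {line}")
                failed = True
    return 1 if failed else 0
```

A CI step could pass the return value of `check_files` to `sys.exit` so that a violation fails the build. Because the sketch scans line by line, it will miss a phrase that is split across a line break, and it will also flag quoted material that should keep its original verb.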

Grammarly and LanguageTool

Both services default to accepting either form but flag inconsistency. Override the suggestion with a dedicated style guide link embedded in the document header.

Practical Decision Framework

Step 1: Identify Audience Domain

Ask whether the primary readers are academics, engineers, executives, or general consumers. Each group carries expectations that override abstract grammar rules.

Step 2: Audit Existing Content

Run a regex search for “data\s+(is|are)” across your codebase or documentation. Note the dominant form and align future edits to match.

This single pass often resolves 90% of inconsistencies.
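The audit can be sketched in Python as follows. The pattern `\bdata\s+(is|are)\b` is an assumption about what the search should match, and the helper names `count_forms` and `dominant_form` are hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# Matches "data is" / "data are" as whole words, case-insensitively,
# tolerating any run of whitespace (including a line break) between them.
DATA_VERB = re.compile(r"\bdata\s+(is|are)\b", re.IGNORECASE)


def count_forms(text: str) -> Counter:
    """Count how often each verb follows 'data' in the given text."""
    return Counter(match.group(1).lower() for match in DATA_VERB.finditer(text))


def dominant_form(counts: Counter) -> Optional[str]:
    """Return 'is' or 'are', or None when the counts are tied or empty."""
    if counts["is"] == counts["are"]:
        return None
    return "is" if counts["is"] > counts["are"] else "are"


sample = (
    "The data is loaded into memory. Later, the data are validated.\n"
    "Metadata is ignored here, but Data\nis counted across the line break."
)
counts = count_forms(sample)
print(counts["is"], counts["are"], dominant_form(counts))
```

The word boundary keeps “metadata is” out of the tally. The pattern only catches the verb directly after the noun, so inverted questions (“Is the data clean?”) and phrases with an intervening word (“the data really are”) still need a manual read.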

Step 3: Document the Choice

Create a one-line entry in your style guide: “Use data is in all user-facing copy; reserve data are for peer-reviewed appendices.” Link the rule to examples for quick onboarding.

Edge Cases and Troubleshooting

Parenthetical Clarifications

When both nuances matter, write “the data (each observation) are validated, and the dataset is stored securely.” The parenthesis removes ambiguity without forcing a rewrite.

Quotations and Historical Texts

Preserve the original verb when quoting sources, even if it conflicts with your style guide. Add “[sic]” only if the discrepancy could confuse readers.

Multilingual Figures and Tables

Captions often compress information: “Figure 3: Data is normalized.” If the figure legend later lists individual points, maintain the singular to stay consistent with the caption.

Future Trajectory

Generative AI Training Data

Large language models increasingly favor singular usage because their training corpora skew toward web text. Expect this trend to accelerate as synthetic text floods the internet.

Standards Bodies in Flux

IEEE is debating a style update that would accept singular as default while permitting plural in footnotes. Monitor draft ballots if you publish in that ecosystem.

Quick Reference Cheat Sheet

Academic journal: data are. Startup blog: data is. API docs: data is. Legal brief: mirror source text.

Store this line in your note-taking app and paste it into contributor guidelines to save editorial time.
