Data Is or Data Are: Settling the Singular vs Plural Grammar Debate
Writers, analysts, and scientists routinely pause at the keyboard because “data” refuses to behave like an ordinary noun. The hesitation stems from a centuries-old tension between Latin grammar and modern English usage.
Search engines surface conflicting guidance, style guides differ, and editors argue in comment threads. This article delivers a definitive, practical map for choosing “is” or “are” in any context without second-guessing yourself.
Etymology and Historical Usage
From Latin Datum to English Data
The Latin word datum means “something given”; it is the neuter past participle of dare, “to give.” English scholars borrowed it in the 17th century and used the Latin plural, data, when referring to multiple observations.
By the 1800s English treatises on astronomy and medicine preserved the plural sense, writing “the data are conclusive.” This pattern remained dominant in academic prose until the mid-20th century.
Corpora show a sharp rise in singular usage after 1970, coinciding with the growth of computing and the mass media.
Corpus Evidence Across Decades
The Google Books Ngram Viewer records “data is” surpassing “data are” in American English around 1980. British English followed suit roughly a decade later, though “data are” still appears more often in UK journals.
Linguists attribute the shift to conceptual change: people began treating “data” as a collective mass rather than a countable set of points.
Grammatical Number in Modern English
Mass Nouns vs Count Nouns
Mass nouns like “water” or “information” trigger singular verbs and resist pluralization. Count nouns like “apple” or “report” take plurals and pair with plural verbs when needed.
“Data” straddles both categories, so the verb choice signals which interpretation the writer intends.
Collective Nouns as a Parallel
Consider “team” or “government,” which, especially in British English, swing between singular and plural depending on emphasis. When the focus is on the unit acting as one entity, the singular verb feels natural; when the focus is on the individual members, the plural does.
Apply the same logic to “data” and the path forward becomes clearer.
Academic and Scientific Conventions
Journal Style Guide Sampling
Nature and Science instruct authors to use “data are,” reinforcing the plural tradition. Conversely, The Lancet accepts both forms but defaults to “data is” in news sections.
Check the target journal’s instructions for authors before submission.
Grant Proposals and Technical Reports
Funding bodies such as the NIH and ERC rarely police the issue unless the usage is inconsistent. Reviewers care more about clarity than dogma, so pick one convention and apply it rigorously.
Corporate and Marketing Realities
Annual Reports and Investor Decks
In SEC filings, singular usage dominates because the document treats “data” as a bulk asset. Phrases like “our data is encrypted” sound more natural to non-technical stakeholders.
Switching to plural may distract readers who expect business English norms.
Brand Voice Guides
Slack’s style guide recommends “data is” for consistency with conversational tone. IBM’s internal manual allows either form but flags mixed usage in a single document.
Align with the established brand lexicon to avoid copy-editing churn.
Technical Writing and API Documentation
Code Comments and README Files
Developers usually write “the data is loaded into memory” because code typically holds a dataset in a single object, such as a list or a data frame, and refers to it through one variable.
Consistency reduces cognitive load for readers toggling between prose and code snippets.
Endpoint Descriptions
RESTful documentation typically describes each endpoint as a single resource and gives it a singular verb: “GET /data returns a JSON array.” The surrounding explanatory text should echo that convention.
Deviating into plural may confuse implementers who map endpoints to object models.
Data Journalism and Newsrooms
Headline Constraints
Headlines prize brevity, so “data is” wins by one character. Editors also believe the singular sounds less intimidating to general audiences.
Story Body Standards
The Associated Press allows both forms but recommends choosing based on context. If the paragraph stresses individual figures, “data are” fits; if it refers to a dataset as a whole, “data is” works.
Reuters and Bloomberg follow similar flexible policies.
Machine Learning and AI Papers
Dataset vs Datum Distinction
Authors often introduce “the data is split into train and test sets,” treating the corpus as one block. Later they might note “the data are drawn from 42 countries,” emphasizing collection diversity.
Use the shift in perspective to reinforce narrative flow.
Conference Templates
NeurIPS provides LaTeX macros that expand to “Data (plural of datum) are…” in footnotes, yet the main text uses singular for fluency. This hybrid approach satisfies purists without forcing awkward phrasing.
Legal and Regulatory Language
GDPR Recitals
European Union texts oscillate: “personal data is processed lawfully” appears alongside “the data are adequate, relevant and limited.” The difference hinges on whether the sentence addresses the concept or the individual pieces.
Lawyers draft clauses to mirror the regulation’s wording, ensuring interpretive alignment.
Patent Applications
USPTO examiners accept singular usage in method claims: “the data is encrypted using AES-256.” The focus is on the process, not the discrete values.
Switching to plural could unintentionally narrow claim scope.
Everyday Blogging and Social Media
SEO Keyword Optimization
Google’s keyword planner shows “data is” queries outranking “data are” by 12:1 in global volume. Optimizing for the singular increases click-through rates from non-experts.
Reserve the plural for niche academic audiences to avoid alienation.
Voice Search Alignment
Smart assistants parse “data is” more reliably because it aligns with spoken patterns. Voice snippets that mirror natural phrasing earn higher SERP placements.
Global English Variations
American vs British Preferences
Corpus linguistics reveals a 70% singular preference in US publications, while UK texts hover near 50%. Canadian and Australian English lean toward the US pattern.
Localize usage if the audience is predominantly from one region.
Second-Language Speakers
Non-native writers often default to the plural because textbooks emphasize Latin roots. Provide them with region-specific style sheets to reduce friction.
Teams with mixed linguistic backgrounds should codify one standard in the onboarding wiki.
Pronunciation and Cognitive Load
Auditory Smoothness
“Data is” rolls off the tongue in rapid speech and sounds less formal than “data are.” Podcast hosts and webinar presenters favor the singular for listener comfort.
Screen Reader Behavior
Accessibility tests show that screen readers pronounce “data are” with a slight pause that can break sentence rhythm. Singular usage improves flow for visually impaired users.
Tooling and Linting Solutions
Vale Rules
Open-source linters like Vale allow custom rules to enforce “data is” or “data are” across markdown repositories. Configure the YAML file once and let CI enforce it.
This prevents editorial drift during multi-author sprints.
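As a minimal sketch, assuming Vale’s built-in substitution rule type, the helper below writes a project config and one rule that flags “data are” and suggests “data is.” The style name House, the file name DataVerb.yml, and the function write_vale_config are hypothetical; reverse the pair under swap to enforce the plural instead.

```python
from pathlib import Path

# Hypothetical style name; Vale looks for rules in <StylesPath>/<style name>/.
STYLE_NAME = "House"

# Vale "substitution" rule: each key under swap is a pattern to flag and
# its value is the suggested replacement. Reverse the pair to enforce the plural.
RULE_YAML = """\
extends: substitution
message: "Use '%s' instead of '%s'."
level: error
ignorecase: true
swap:
  data are: data is
"""

# Minimal project config: where styles live and which files use the house style.
VALE_INI = f"""\
StylesPath = styles

[*.md]
BasedOnStyles = {STYLE_NAME}
"""


def write_vale_config(root: Path) -> Path:
    """Write .vale.ini and styles/House/DataVerb.yml under root; return the rule path."""
    rule_dir = root / "styles" / STYLE_NAME
    rule_dir.mkdir(parents=True, exist_ok=True)
    rule_path = rule_dir / "DataVerb.yml"
    rule_path.write_text(RULE_YAML, encoding="utf-8")
    (root / ".vale.ini").write_text(VALE_INI, encoding="utf-8")
    return rule_path
```

Call write_vale_config(Path(".")) once from the repository root, commit the generated files, and have the CI job run vale against the documentation folder.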
Grammarly and LanguageTool
Both services accept either form by default but flag inconsistency. When a suggestion conflicts with your house rule, dismiss it and point contributors to the style guide linked in the document header.
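A rough, independent sketch of the kind of inconsistency check described above, not how Grammarly or LanguageTool implement it: the function reports the line numbers of each form, but only when both appear in the same text. The name flag_mixed_usage is hypothetical.

```python
import re
from typing import Dict, List

# Whole-word "data is" / "data are", any capitalization ("metadata is" does not match).
DATA_VERB = re.compile(r"\bdata\s+(is|are)\b", re.IGNORECASE)


def flag_mixed_usage(text: str) -> Dict[str, List[int]]:
    """Return line numbers per form if both forms occur, else an empty dict.

    Checks line by line, so a phrase split across a line break is not caught.
    """
    hits: Dict[str, List[int]] = {"is": [], "are": []}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in DATA_VERB.finditer(line):
            hits[match.group(1).lower()].append(lineno)
    # Consistent usage (only one form, or neither) is not flagged.
    return hits if hits["is"] and hits["are"] else {}


if __name__ == "__main__":
    sample = "The data is loaded.\nLater, the data are validated.\n"
    print(flag_mixed_usage(sample))  # {'is': [1], 'are': [2]}
```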
Practical Decision Framework
Step 1: Identify Audience Domain
Ask whether the primary readers are academics, engineers, executives, or general consumers. Each group carries expectations that override abstract grammar rules.
Step 2: Audit Existing Content
Run a case-insensitive regex search for “data\s+(is|are)” across your codebase or documentation. Note the dominant form and align future edits to match.
This single pass often resolves most inconsistencies.
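The audit in Step 2 can be scripted in a few lines. This sketch assumes documentation lives in .md, .rst, or .txt files (a hypothetical default; pass other suffixes, such as .py, to include code comments) and that files are UTF-8; audit and dominant_form are illustrative names.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Container, Optional

# Whole-word "data is" / "data are", any case; \s+ also matches a line break.
DATA_VERB = re.compile(r"\bdata\s+(is|are)\b", re.IGNORECASE)

# Hypothetical default: treat these suffixes as documentation files.
DOC_SUFFIXES = frozenset({".md", ".rst", ".txt"})


def audit(root: Path, suffixes: Container[str] = DOC_SUFFIXES) -> Counter:
    """Count "is" vs "are" after "data" in every matching file under root."""
    counts: Counter = Counter()
    for path in root.rglob("*"):
        if not (path.is_file() and path.suffix.lower() in suffixes):
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the audit
        counts.update(m.group(1).lower() for m in DATA_VERB.finditer(text))
    return counts


def dominant_form(counts: Counter) -> Optional[str]:
    """Return "data is" or "data are", whichever is more frequent; None on a tie."""
    if counts["is"] == counts["are"]:
        return None
    return "data is" if counts["is"] > counts["are"] else "data are"
```

For example, audit(Path("docs")) returns a Counter keyed by "is" and "are", and dominant_form names the form to standardize on, or returns None when the counts tie and the team has to decide.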
Step 3: Document the Choice
Create a one-line entry in your style guide: “Use data is in all user-facing copy; reserve data are for peer-reviewed appendices.” Link the rule to examples for quick onboarding.
Edge Cases and Troubleshooting
Parenthetical Clarifications
When both nuances matter, write “the data (each observation) are validated, and the dataset is stored securely.” The parenthesis removes ambiguity without forcing a rewrite.
Quotations and Historical Texts
Preserve the original verb when quoting sources, even if it conflicts with your style guide. Add “[sic]” only if the discrepancy could confuse readers.
Figures and Tables
Captions often compress information: “Figure 3: Data is normalized.” If the figure legend later lists individual points, maintain the singular to stay consistent with the caption.
Future Trajectory
Generative AI Training Data
Large language models increasingly favor singular usage because their training corpora skew toward web text. Expect this trend to accelerate as synthetic text floods the internet.
Standards Bodies in Flux
IEEE is debating a style update that would accept singular as default while permitting plural in footnotes. Monitor draft ballots if you publish in that ecosystem.
Quick Reference Cheat Sheet
Academic journal: data are. Startup blog: data is. API docs: data is. Legal brief: mirror source text.
Store this line in your note-taking app and paste it into contributor guidelines to save editorial time.
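For scripts that scaffold contributor templates or pick a lint profile by content type, the cheat sheet can also live as a lookup table. This is a minimal sketch: the content-type labels and the preferred_form helper are hypothetical, and “mirror source text” is modeled by requiring the caller to pass the form the source uses.

```python
from typing import Dict, Optional

# Mirrors the quick reference line; labels are hypothetical keys.
# None marks "mirror source text": the caller supplies the form.
CHEAT_SHEET: Dict[str, Optional[str]] = {
    "academic journal": "data are",
    "startup blog": "data is",
    "api docs": "data is",
    "legal brief": None,
}


def preferred_form(content_type: str, source_form: Optional[str] = None) -> str:
    """Return the verb phrase to use for a given content type."""
    key = content_type.strip().lower()
    if key not in CHEAT_SHEET:
        raise KeyError(f"no rule for content type {content_type!r}")
    form = CHEAT_SHEET[key]
    if form is None:
        # Legal briefs follow the quoted source, so the caller must say which form it uses.
        if source_form is None:
            raise ValueError(f"{content_type!r} mirrors the source; pass source_form")
        return source_form
    return form
```

Raising on unknown content types, rather than silently defaulting, keeps new document categories from slipping past the style guide unnoticed.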