How to use this. A data contract is an agreement between the people who produce a dataset and the people who consume it. It exists to replace assumption with accountability, not to generate paperwork.
Fill the Minimum Viable Contract (sections 1–5) for every governed dataset. Add sections 6–9 only when the data’s scale, risk, or number of consumers actually justifies them. If a section does not reduce a real risk, leave it out. A short contract that people honour beats a complete one that they ignore.
Replace every
[bracketed placeholder]and delete the guidance notes in italics before publishing.
Contract Header
| Field | Value |
|---|---|
| Dataset / product name | [e.g., Customer Master — EU] |
| Contract ID | [e.g., DC-CUST-EU-001] |
| Version | [e.g., 1.0] |
| Status | [Draft / Active / Deprecated] |
| Effective date | [YYYY-MM-DD] |
| Review date | [YYYY-MM-DD — when this contract is next revisited] |
| Producing domain | [team / system that publishes the data] |
Minimum Viable Contract (Sections 1–5)
1. Purpose
One or two sentences. What is this dataset, and what decision or process does it serve? If you cannot say why it exists, it does not need a contract yet.
[e.g., The authoritative view of active EU customers, used by billing, CRM, and the churn model.]
2. Definitions
The most expensive conflicts are semantic, not technical. Define the terms people argue about before they argue about them. Add a row for every entity, status, or attribute that could be interpreted two ways.
| Term | Definition | Notes / edge cases |
|---|---|---|
[Active customer] | [Precise business meaning] | [e.g., excludes accounts dormant > 18 months] |
[Country of origin] | [Precise business meaning] | [e.g., manufacturing origin, NOT supplier HQ] |
[Transaction complete] | [The exact condition that makes this true] | |
[Official shipment date] | [Which date, from which system] |
Units and formats: [currency, units of measure, date format, time zone]
3. Ownership & Accountability
Contribution can be shared. Accountability cannot. Name one accountable owner. If the answer is “several teams”, the contract has already failed.
| Role | Name / team | Contact | Responsibility |
|---|---|---|---|
| Data Owner (accountable) | [name] | [email / channel] | Final accountability; resolves disputes; approves changes |
| Data Steward (coordinates) | [name] | [email / channel] | Day-to-day quality, definitions, issue coordination |
| Producing team | [team] | [channel] | Delivers data to standard; gives notice of change |
| Primary consumers | [teams / systems] | [channel] | Report issues; validate changes |
Escalation path: [who is contacted, in what order, when something breaks]
4. Schema & Rules
The structure consumers can rely on. A change here can silently break a dashboard or a model, so this is the part that must be versioned and never changed quietly.
Delivery: [table / API / file / stream] at [location or endpoint]
Format: [e.g., Parquet / JSON / CSV]
| Field | Type | Required | Description | Validation / allowed values |
|---|---|---|---|---|
[customer_id] | [string] | Yes | [unique identifier] | [primary key; format ...] |
[country_of_origin] | [string] | Yes | [see definition above] | [ISO 3166 code] |
[status] | [string] | Yes | [lifecycle state] | [Active / Inactive / Pending] |
[...] |
Primary key / identifiers: [field(s) that uniquely identify a record]
Reference data dependencies: [lookups or hierarchies this data relies on]
Versioning policy: [how schema versions are numbered and how long old versions are supported]
5. Quality Standards
State the bar, and what happens when data falls below it. A threshold with no consequence is decoration.
| Dimension | Standard / threshold | If breached |
|---|---|---|
| Completeness | [e.g., country_of_origin ≥ 99% populated] | [flag / hold / notify] |
| Accuracy | [e.g., validated against source X] | |
| Uniqueness | [e.g., duplicate rate < 0.5%] | |
| Timeliness | [e.g., reflects source within 24h] | |
| Consistency | [e.g., status aligns with billing system] |
On breach, the pipeline: [stops / passes with warning / flags for review] — stopping bad data early is cheaper than auditing a model that already learned from it.
Extended Contract (add only when justified)
6. SLAs & Service Expectations
Consumers build assumptions into their processes. Make the assumptions explicit so they don’t become surprises.
| Commitment | Value |
|---|---|
| Refresh frequency | [e.g., daily by 06:00 CET] |
| Availability | [e.g., 99.5%] |
| Latency | [max delay from source change to availability] |
| Incident response | [time to acknowledge / resolve] |
| Change notification | [notice period before breaking changes, e.g., 30 days] |
| Retention | [how long data / history is kept] |
7. Access & Security
| Item | Value |
|---|---|
| Classification | [Public / Internal / Confidential / Restricted] |
| Access permissions | [who may read; how access is requested] |
| Usage restrictions | [permitted and prohibited uses] |
| Compliance scope | [GDPR / sector regulation / audit requirements] |
8. Change Management
How this contract changes without breaking the people who depend on it.
- Proposing a change:
[who can, and how] - Notice period for breaking changes:
[e.g., 30 days, with consumer sign-off] - Deprecation process:
[how a version is retired and over what period] - Backward compatibility:
[what is guaranteed not to break within a major version]
9. RACI
Use this when more than a couple of teams touch the data. The dangerous condition it prevents: shared ownership with no defined accountability — when everyone owns the data, nobody owns the problem.
| Activity | Owner | Steward | Engineer | Governance Lead | Consumer |
|---|---|---|---|---|---|
| Define meaning & standards | A | R | C | C | C |
| Implement schema & pipeline | C | C | R | I | I |
| Monitor quality | I | A | R | I | I |
| Approve breaking change | A | C | C | C | C |
| Report issues | I | R | I | I | R |
| Resolve cross-domain conflict | C | C | I | A | I |
R = Responsible · A = Accountable · C = Consulted · I = Informed
Sign-off
| Party | Name | Date |
|---|---|---|
| Data Owner (accountable) | [ ] | [ ] |
| Producing team lead | [ ] | [ ] |
| Primary consumer(s) | [ ] | [ ] |
A data contract is not a technical artefact. It is an agreement about how the organisation will behave with its own data — and it only works when the ownership behind it is real.
Leave a Reply