Can you train on this data or not?
We use five tiers so buyers can see the answer quickly before a sample or contract moves forward.
Every source gets one clear tier
The tier says what you can do, what you cannot do, and what paperwork backs the decision.
Green light — train on it
We have a signed license that explicitly allows training. Use it the way the contract says.
Allowed
- Train your model on it
- Use it for internal evaluation
- Build derived models
Restricted
- Resell it to someone else
- Sublicense it outside the contract
Proof we hold
Signed agreement + source paperwork + review date
Train with guardrails
Training is fine, but the license adds limits — like which markets, which fields, or what you can ship downstream.
Allowed
- Train internally
- Evaluate within the limits we agree on
Restricted
- Republish or redistribute it
- Use it outside the agreed scope
Proof we hold
License notes + reviewer memo + source files
Talk to legal first
Useful for research and small experiments. Production training needs written approval from both legal teams.
Allowed
- Research and small experiments
- Pilot work while written approval is pending
Restricted
- Production training without written approval
- Publishing benchmark slices
Proof we hold
Review notes + escalation log
Do not train on it
Either the license doesn't allow training or the paper trail isn't strong enough. You can still review it internally.
Allowed
- Internal legal review
- Hold and re-evaluate later
Restricted
- Training your model
- Any commercial use
Proof we hold
Restriction memo + source references
Blocked
We don't ship it. The source has a problem we can't resolve — unclear origin, broken license, or a hard restriction.
Allowed
- Nothing — we don't release this to buyers
Restricted
- Training, samples, anything else
Proof we hold
Block decision + audit trail
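For teams that want to enforce the tiers in a data pipeline, the five tiers above can be sketched as a lookup table. This is a minimal illustration, not our internal tooling: the action names (`train`, `pilot`, and so on) and the `is_allowed` helper are hypothetical, and the deny-by-default behavior is an assumption chosen so that anything not listed under "Allowed" is refused.

```python
from enum import Enum


class Tier(Enum):
    GREEN_LIGHT = "Green light"
    GUARDRAILS = "Train with guardrails"
    LEGAL_FIRST = "Talk to legal first"
    DO_NOT_TRAIN = "Do not train on it"
    BLOCKED = "Blocked"


# Hypothetical action names, one per "Allowed" bullet in each tier.
ALLOWED = {
    Tier.GREEN_LIGHT: {"train", "internal_evaluation", "derived_models"},
    Tier.GUARDRAILS: {"train_internal", "evaluate_within_limits"},
    Tier.LEGAL_FIRST: {"research", "pilot"},
    Tier.DO_NOT_TRAIN: {"internal_legal_review", "hold"},
    Tier.BLOCKED: set(),  # nothing is released to buyers
}


def is_allowed(tier: Tier, action: str) -> bool:
    """Deny by default: only actions listed for the tier are allowed."""
    return action in ALLOWED[tier]


if __name__ == "__main__":
    print(is_allowed(Tier.GREEN_LIGHT, "train"))
    print(is_allowed(Tier.DO_NOT_TRAIN, "train"))
```

The contract and the proof documents remain the source of truth. A table like this only mirrors them so a pipeline can fail early.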
How we decide the tier
We trace the source, read the license, then have a second reviewer check the tier.
Check where it came from
Trace the data back to its source: collector, license, consent, and evidence chain.
Read the license
Confirm whether model training is allowed and whether commercial, geographic, or downstream limits apply.
Two people sign off
One reviewer assigns the tier, another checks it, and the decision enters the release record.
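The two-person sign-off can be sketched as a small record check. This is an illustration under stated assumptions: the `TierDecision` fields, the `record_decision` helper, and the list used as a release record are all hypothetical names, and the reviewer IDs and date are placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TierDecision:
    source_id: str
    tier: str
    assigned_by: str   # reviewer who assigned the tier
    checked_by: str    # second reviewer who checked it
    review_date: date


def record_decision(release_record: list, decision: TierDecision) -> TierDecision:
    """Append a decision to the release record only if two different people signed off."""
    if not decision.assigned_by or not decision.checked_by:
        raise ValueError("both an assigning and a checking reviewer are required")
    if decision.assigned_by == decision.checked_by:
        raise ValueError("the checking reviewer must be a different person")
    release_record.append(decision)
    return decision


if __name__ == "__main__":
    record = []
    record_decision(
        record,
        TierDecision("src-001", "Green light", "reviewer_a", "reviewer_b", date(2024, 1, 15)),
    )
    print(len(record))
```

Making the record entry frozen is a design choice in this sketch: once a decision is written down, a change of tier becomes a new entry instead of an edit.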
What comes with the dataset
These four items ship with every reviewed package.
License summary
Plain-English version of what can and cannot be done with each dataset.
Usage notes
Specific limits such as commercial use, regions, exclusions, and redistribution scope.
Source paperwork
Contracts, consent records, source references, and evidence files for each source group.
Review date + reviewer
Who signed off, when, and what is still under active review.
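On the receiving side, a buyer could check that a package contains all four items before it goes to legal. A minimal sketch, assuming the package arrives as a dictionary. The key names below are hypothetical and do not describe our actual delivery format.

```python
# Hypothetical keys, one per item listed above.
REQUIRED_ITEMS = (
    "license_summary",
    "usage_notes",
    "source_paperwork",
    "review_record",  # review date + reviewer
)


def missing_items(package: dict) -> list:
    """Return the required items that are absent or empty in the package."""
    return [key for key in REQUIRED_ITEMS if not package.get(key)]


if __name__ == "__main__":
    package = {
        "license_summary": "Plain-English summary text",
        "usage_notes": "Regions and redistribution limits",
        "source_paperwork": ["contract.pdf"],
    }
    print(missing_items(package))
```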
When we say no
These are the patterns that stop a release.
- We can't trace where the data originally came from
- The license language conflicts with model training
- Your use case is wider than what we reviewed
- Part of the bundle is blocked — we don't ship the rest as a workaround
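The four stop patterns above can be sketched as a single pre-release check. The `Source` fields and the `stop_reasons` function are hypothetical names used for illustration. The one rule taken directly from the list is that a single blocked source holds the whole bundle.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    origin_traced: bool
    license_allows_training: bool
    blocked: bool = False


def stop_reasons(sources, requested_uses, reviewed_uses):
    """Return every reason a release should stop; an empty list means none were found."""
    reasons = []
    for src in sources:
        if not src.origin_traced:
            reasons.append(f"{src.name}: origin cannot be traced")
        if not src.license_allows_training:
            reasons.append(f"{src.name}: license conflicts with model training")
        if src.blocked:
            reasons.append(f"{src.name}: blocked, so the whole bundle is held")
    # Any requested use that was never reviewed makes the use case too wide.
    unreviewed = set(requested_uses) - set(reviewed_uses)
    if unreviewed:
        reasons.append("use case wider than reviewed: " + ", ".join(sorted(unreviewed)))
    return reasons


if __name__ == "__main__":
    bundle = [
        Source("source_a", origin_traced=True, license_allows_training=True),
        Source("source_b", origin_traced=True, license_allows_training=True, blocked=True),
    ]
    for reason in stop_reasons(bundle, {"training"}, {"training"}):
        print(reason)
```

Collecting all reasons, instead of stopping at the first, gives legal and sourcing the full picture in one pass.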
Download the one-page summary.
Use it to brief legal, sourcing, or anyone else who just wants the answer fast.