Licensing

Can you train on this data or not?

We use five tiers so buyers can see the answer quickly before a sample or contract moves forward.

5 tiers · Source docs attached
5-tier system

Every source gets one clear tier

The tier says what you can do, what you cannot do, and what paperwork backs the decision.

Tier 1

Green light — train on it

We have a signed license that explicitly allows training. Use it the way the contract says.

Allowed

  • Train your model on it
  • Use it for internal evaluation
  • Build derived models

Restricted

  • Resell it to someone else
  • Sublicense it outside the contract

Proof we hold

Signed agreement + source paperwork + review date

Tier 2

Train with guardrails

Training is allowed, but the license adds limits, such as which markets, which fields, or what you can ship downstream.

Allowed

  • Train internally
  • Evaluate with the limits we agree on

Restricted

  • Republish or redistribute it
  • Use it outside the agreed scope

Proof we hold

License notes + reviewer memo + source files

Tier 3

Talk to legal first

Useful for research and small experiments. Production training needs written approval from both legal teams, ours and yours.

Allowed

  • Research and small experiments
  • Pilot work while written approval is pending

Restricted

  • Production training without written approval
  • Publishing benchmark slices

Proof we hold

Review notes + escalation log

Tier 4

Do not train on it

Either the license doesn't allow training or the paper trail isn't strong enough. You can still review it internally.

Allowed

  • Internal legal review
  • Hold and re-evaluate later

Restricted

  • Training your model
  • Any commercial use

Proof we hold

Restriction memo + source references

Tier 5

Blocked

We don't ship it. The source has a problem we can't resolve: unclear origin, broken license, or a hard restriction.

Allowed

  • Nothing — we don't release this to buyers

Restricted

  • Training, samples, anything else

Proof we hold

Block decision + audit trail
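
If you want to enforce the tiers inside a data pipeline, the five tiers above can be expressed as a small deny-by-default lookup. This is a minimal sketch, not our internal tooling: the `Tier` enum, the action names, and `can_use` are hypothetical names chosen for illustration, and each set only mirrors the "Allowed" list for that tier.

```python
from enum import IntEnum


class Tier(IntEnum):
    GREEN_LIGHT = 1    # signed license explicitly allows training
    GUARDRAILS = 2     # training allowed within the agreed limits
    LEGAL_FIRST = 3    # research only until written approval exists
    DO_NOT_TRAIN = 4   # internal legal review only
    BLOCKED = 5        # never released to buyers


# Hypothetical action names; each set mirrors the "Allowed" list of its tier.
ALLOWED = {
    Tier.GREEN_LIGHT: {"train", "internal_eval", "derived_models"},
    Tier.GUARDRAILS: {"train_internal", "eval_within_limits"},
    Tier.LEGAL_FIRST: {"research", "pilot"},
    Tier.DO_NOT_TRAIN: {"legal_review", "hold"},
    Tier.BLOCKED: set(),
}


def can_use(tier: Tier, action: str) -> bool:
    """Deny by default: an action passes only if the tier lists it as allowed."""
    return action in ALLOWED[tier]
```

Anything not listed for a tier is refused, so `can_use(Tier.DO_NOT_TRAIN, "train")` is false and every action is refused for `Tier.BLOCKED`. The contract and usage notes still govern the actual limits; a lookup like this only catches obvious mismatches early.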

Tier assignment

How we decide the tier

We trace the source, read the license, then require a second reviewer.

Step 01

Check where it came from

Trace the data back to its source: collector, license, consent, and evidence chain.

Step 02

Read the license

Confirm whether model training is allowed and whether commercial, geographic, or downstream limits apply.

Step 03

Two people sign off

One reviewer assigns the tier, another checks it, and the decision enters the release record.
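
The two-person sign-off in Step 03 can be modeled as a record that is not final until a different reviewer has checked it. This is a sketch under assumptions: `TierDecision`, its field names, and `ready_for_release_record` are hypothetical and do not describe our actual release system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TierDecision:
    source_id: str
    tier: int                        # 1 to 5, as assigned by the first reviewer
    assigned_by: str
    checked_by: Optional[str] = None
    review_date: Optional[date] = None

    def sign_off(self, reviewer: str, on: date) -> None:
        """Record the second reviewer, who must not be the person who assigned the tier."""
        if reviewer == self.assigned_by:
            raise ValueError("second reviewer must be a different person")
        self.checked_by = reviewer
        self.review_date = on

    @property
    def ready_for_release_record(self) -> bool:
        """True only once a second reviewer has checked the decision."""
        return self.checked_by is not None
```

Rejecting a self-check in `sign_off` is the design choice that matters here: it keeps the second review independent of the first.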

What buyers receive

What comes with the dataset

These four items ship with every reviewed package.

01

License summary

Plain-English version of what can and cannot be done with each dataset.

02

Usage notes

Specific limits such as commercial use, regions, exclusions, and redistribution scope.

03

Source paperwork

Contracts, consent records, source references, and evidence files for each source group.

04

Review date + reviewer

Who signed off, when, and what is still under active review.
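
On the receiving side, a buyer could check that a delivered package contains all four items before it goes to legal. A minimal sketch, assuming the package is described by a dictionary; the key names and `missing_items` are hypothetical and not a format we publish.

```python
# Hypothetical keys for the four items that ship with every reviewed package.
REQUIRED_ITEMS = (
    "license_summary",
    "usage_notes",
    "source_paperwork",
    "review_record",  # review date + reviewer
)


def missing_items(package: dict) -> list:
    """Return the required items that are absent or empty, in the order listed above."""
    return [name for name in REQUIRED_ITEMS if not package.get(name)]
```

An empty result means all four items are present; it says nothing about whether their contents are sufficient for your use case.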

Red-flag scenarios

When we say no

These are the patterns that stop a release.

Release blocked
  • We can't trace where the data originally came from
  • The license language conflicts with model training
  • Your use case is wider than what we reviewed
  • Part of the bundle is blocked — we don't ship the rest as a workaround
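
The four patterns above can be read as a pre-release check that collects every reason to stop. This is an illustrative sketch only: `ReleaseRequest`, its fields, and `blocking_reasons` are hypothetical names, and treating "blocked" as Tier 5 is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReleaseRequest:
    source_traced: bool            # we can trace where the data originally came from
    license_allows_training: bool  # license language does not conflict with training
    use_case_within_review: bool   # buyer's use case fits what was reviewed
    bundle_tiers: List[int]        # tier of every source in the bundle


def blocking_reasons(req: ReleaseRequest) -> List[str]:
    """Return every red flag that applies; an empty list means none were found."""
    reasons = []
    if not req.source_traced:
        reasons.append("origin cannot be traced")
    if not req.license_allows_training:
        reasons.append("license conflicts with model training")
    if not req.use_case_within_review:
        reasons.append("use case is wider than what was reviewed")
    # Assumption: "blocked" means Tier 5. One blocked source stops the whole bundle.
    if 5 in req.bundle_tiers:
        reasons.append("part of the bundle is blocked")
    return reasons
```

Returning all reasons rather than stopping at the first one gives the buyer the full list to resolve in a single pass.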

Download the one-page summary.

Use it to brief legal, sourcing, or anyone else who just wants the answer fast.