A consent layer for video AI.
Crowdsourced video clips with per-clip permissions — purpose, model class, modality, expiry — cryptographically signed by the contributor on ATProto. Revocations propagate through the public firehose in real time, and every dataset row traces back to its signed receipt.
The problem
Video dataset curation has a consent problem.
YouTube rate limits make dataset-scale download infeasible. Bluesky video is short-form, and its users never consented to AI use. The nearest alternatives each miss part of the gap.
Spawning · Do Not Train
Opt-out only — 80M URLs honored on goodwill, applied to data that already left.
Sony FHIBE
Right idea on images (Nature 2025): ~10k frames, centrally curated, revocable — but small, and not video.
Mozilla Common Voice
Gold-standard opt-in crowdsourcing — but audio, with no per-contribution cryptographic provenance.
ConsentVid
Federated · video · consent-first · cryptographically signed. The empty intersection — filled.
How it works
Three parties, one open protocol.
Every record lives in the contributor's own ATProto repo, signed by their key. No central database. Revocation is a record deletion. Storage is decoupled from metadata.
Contributor
Sign consent
Register a clip and a granular receipt on your own PDS — purpose × model class × modality × expiry × revocable. Your repo key signs both.
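To make the consent axes concrete, here is a minimal Python sketch of how a receipt could be checked against a requested use. The class, the `permits` helper, the `modalities` and `expires_at` field names, and the "video" value are illustrative assumptions, not the published org.consentvid.* lexicon; only the purposes and model-class values are taken from the manifest excerpt on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ConsentReceipt:
    """Hypothetical in-memory view of one consent receipt (not the real lexicon)."""
    purposes: FrozenSet[str]
    model_classes: FrozenSet[str]
    modalities: FrozenSet[str]
    expires_at: Optional[datetime] = None  # None means no expiry was set
    revocable: bool = True


def permits(receipt, purpose, model_class, modality, now=None):
    """Return True only if every consent axis allows the requested use."""
    now = now or datetime.now(timezone.utc)
    if receipt.expires_at is not None and now >= receipt.expires_at:
        return False  # consent has expired
    return (
        purpose in receipt.purposes
        and model_class in receipt.model_classes
        and modality in receipt.modalities
    )


# Example receipt: training and evaluation, academic or open-weights models, video only.
receipt = ConsentReceipt(
    purposes=frozenset({"training", "evaluation"}),
    model_classes=frozenset({"academic", "openWeights"}),
    modalities=frozenset({"video"}),
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
```

Treating the axes as a conjunction means a use is allowed only when purpose, model class, and modality all match and the receipt has not expired. Revocation is handled separately, since it is a record deletion rather than a field on the receipt.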
AppView
Index the firehose
Subscribes to Jetstream and indexes every org.consentvid.* record, validating each one on ingest. Deletions mark consent revoked in real time.
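The indexing step can be sketched as a pure function that folds one event into an in-memory index. This is not the AppView's actual code: the event shape (`kind`, `did`, and a `commit` object with `operation`, `collection`, `rkey`, `record`) reflects our understanding of Jetstream's JSON output and should be checked against its documentation, and `looks_valid` is a hypothetical stand-in for real lexicon validation.

```python
from typing import Dict, Optional

NSID_PREFIX = "org.consentvid."


def looks_valid(record: Optional[dict], collection: str) -> bool:
    """Placeholder check; a real AppView would validate against the lexicon."""
    return isinstance(record, dict) and record.get("$type") == collection


def apply_event(index: Dict[str, dict], event: dict) -> None:
    """Fold one Jetstream-style event into an in-memory consent index."""
    if event.get("kind") != "commit":
        return  # non-commit events carry no records
    commit = event.get("commit") or {}
    collection = commit.get("collection", "")
    if not collection.startswith(NSID_PREFIX):
        return  # not one of our collections
    uri = f"at://{event['did']}/{collection}/{commit['rkey']}"
    operation = commit.get("operation")
    if operation in ("create", "update"):
        record = commit.get("record")
        if looks_valid(record, collection):
            index[uri] = {"record": record, "revoked": False}
    elif operation == "delete":
        # Keep a tombstone so downstream consumers can see the revocation.
        entry = index.setdefault(uri, {"record": None})
        entry["revoked"] = True
```

Keeping a tombstone on delete, rather than removing the entry, lets consumers distinguish "revoked" from "never seen" when they re-sync.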
Researcher
Cite signed receipts
Export a Croissant-RAI manifest. Each file embeds its consent receipt's AT-URI. Re-sync revocations before each training run.
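As an illustration of what re-syncing before a run amounts to on the consumer side, the sketch below filters a manifest against a set of revoked AT-URIs. The `drop_revoked` function is hypothetical and is not the behavior of the consentvid CLI; the `distribution`, `cr:consentReceipt`, and `atUri` keys follow the manifest excerpt shown under Status, and the set of revoked URIs is assumed to come from the AppView.

```python
from typing import Iterable


def drop_revoked(manifest: dict, revoked_uris: Iterable[str]) -> dict:
    """Return a copy of the manifest without files whose consent was revoked."""
    revoked = set(revoked_uris)
    kept = []
    for file_obj in manifest.get("distribution", []):
        at_uri = (file_obj.get("cr:consentReceipt") or {}).get("atUri")
        # Consent-first: a file with no receipt is dropped as well.
        if at_uri is not None and at_uri not in revoked:
            kept.append(file_obj)
    return {**manifest, "distribution": kept}
```

The function returns a new manifest and leaves the input untouched, so the pre-sync manifest can still be kept for audit.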
Status
Working today, on the live network.
Registered, validated, and revoked on live ATProto. Bytes on R2, IPFS, or PeerTube; the platform holds only signed records and hashes.
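Because the bytes live elsewhere, a consumer would typically check each downloaded clip against the hash recorded for it. A minimal sketch follows, assuming the manifest's `sha256` field is the lowercase hex digest of the raw file bytes; the helper names are hypothetical.

```python
import hashlib
from typing import Iterable


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a clip incrementally so large videos need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_manifest_hash(chunks: Iterable[bytes], file_obj: dict) -> bool:
    """Compare downloaded bytes against the sha256 recorded for the file."""
    expected = file_obj.get("sha256", "").lower()
    # An entry with no recorded hash never verifies.
    return bool(expected) and sha256_of_chunks(chunks) == expected
```

A clip that fails this check should be discarded rather than trained on, since its link to the signed receipt can no longer be established.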
The AppView indexes Jetstream in real time, validating on ingest. Datasets export a Croissant-RAI manifest, and consumers re-sync revocations with the consentvid resync command. Consent can be embedded in the bytes as a signed C2PA manifest, and the org.consentvid.* lexicons are published and DNS-resolvable.
"@type": "sc:Dataset",
"conformsTo": "…/croissant/1.0",
"distribution": [{
  "@type": "cr:FileObject",
  "sha256": "29506f94…",
  "cr:consentReceipt": {
    "atUri": "at://…/org.consentvid.consent/3mkm6zmigrx2v",
    "purposes": ["training", "evaluation"],
    "modelClasses": ["academic", "openWeights"],
    "revocable": true
  }
}]
Compared
Where ConsentVid sits.
| Project | Modality | Approach | Signed | Federated | Revocable |
|---|---|---|---|---|---|
| ConsentVid | Video | Consent-first | yes | yes | yes |
| Sony FHIBE | Image | Consent-first | partial | no | yes |
| Mozilla Common Voice | Audio | Consent-first | no | no | partial |
| Spawning DNT | Generic | Opt-out | no | no | n/a |
| DECORAIT | Generic | Registry (DLT) | yes | partial | yes |
Why now
The substrate finally exists.
EU AI Act, Art. 53
GPAI providers must publish training-data summaries with rights-holder opt-outs. ConsentVid is compliance infrastructure — signed, machine-readable, auditable.
C2PA v2.2
Added video provenance and AI-training-consent assertions. Sony shipped a C2PA video camera; OpenAI committed to C2PA metadata on Sora outputs.
ATProto #3617
The community proposed "User Intents for Data Reuse." The proposal remains active but has no implementation. ConsentVid ships a concrete answer.
Roadmap
Shipped — and what's next.
Consent, provenance, storage, revocation
Signed records on the live network. R2 / IPFS / PeerTube storage. Revocation re-sync + feed. C2PA in-file provenance. Lexicons published, DNS-resolvable, client-side validated.
Web uploader. First contributor community
A browser uploader for non-CLI contributors; hosted manifests; a hosted AppView. Onboarding one motivated community — egocentric, sign language, or oral history.
Loader. Compensation. Compliance
A consentvid-loader that re-syncs revocations and verifies signatures in one call. Optional compensation. EU AI Act Art. 53 alignment.
Three ways in.
Contribute video
A Bluesky account, the CLI, consent terms you choose. Revoke any time. Your repo, your key.
CLI quickstart →
Train on consented data
Run the AppView, filter by purpose and model class, export a Croissant-RAI manifest, re-sync before each run.
AppView guide →
Collaborate
Funders, academic collaborators, ATProto and dataset-tooling contributors: get in touch.
evangelos.kazakos@cvut.cz