Artificial Intelligence and Data Festival (AIDFest)

Data for AI in practice.

One place to align on what "good data" means for AI: from labeling and evaluation sets to governance and production monitoring—so pilots turn into repeatable practice across teams and sectors.

Context

Background

Models get the headlines; data does the work. If the data is wrong, drifting, or locked away, the best model in the world will still fail in production.

AI adoption increasingly depends on the quality, availability, and governability of data—not just the choice of model. A data-centric approach treats AI success as a lifecycle discipline: deliberately engineering training and operational data, building fit-for-purpose evaluation data, and maintaining data assets as systems evolve. That is the thread AIDFest pulls on—end to end, not just a one-off dataset handoff.

Diagram-style illustration of a data lifecycle: collect, label, evaluate, operate and maintain.
Diagram of three groups—solution builders, practitioners, and agencies—connected to a shared hub for data-for-AI practice.
Intent

Purpose of the Community Event

This is a working meeting, not a slide deck tour: compare notes, surface blockers, and leave with shared language and next steps.

Bring together practitioners, solution builders, and government agencies to strengthen shared capability on data for AI—how to design, create, evaluate, govern, and continuously improve data assets that enable effective, reliable, and responsible AI in real operational settings. The event aims to move organizations beyond isolated pilots toward repeatable, scalable practices: fewer one-off demos, more documented pipelines, evaluation criteria, and governance that teams can actually run.

Why it Matters

Model improvements alone rarely deliver durable value when data is incomplete, biased, drifting, inconsistently labeled, or constrained by governance and access barriers. Weak data practices compound downstream issues, undermine performance, and increase operational and reputational risk—especially in high-stakes and regulated environments. At the same time, the scale of modern data requires balancing automation (to scale and standardize) with human expertise (to preserve meaning, context, and accountability).

Data quality and drift

Incomplete, biased, or drifting data undermines reliability when ground truth and operational inputs do not keep pace with change.
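One lightweight way to make "drifting data" concrete is a population stability index (PSI) check comparing training-time and operational distributions. This is an illustrative sketch, not a prescribed festival method; the data, bin count, and 0.1 threshold are common conventions, not requirements from this document:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.
    Values below 0.1 are commonly read as 'no meaningful shift'."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against constant data

    def share(sample, i):
        left, right = lo + i * width, lo + (i + 1) * width
        n = sum(left <= x < right or (i == bins - 1 and x == hi) for x in sample)
        return max(n / len(sample), 1e-6)  # avoid log(0) for empty bins

    return sum(
        (share(actual, i) - share(expected, i))
        * math.log(share(actual, i) / share(expected, i))
        for i in range(bins)
    )

reference = [i / 10 for i in range(100)]       # distribution at training time
shifted = [i / 10 + 3.0 for i in range(100)]   # operational inputs after drift

print(psi(reference, reference) < 0.1)  # True: identical samples, no drift
print(psi(reference, shifted) > 0.1)    # True: the shift is flagged
```

A check like this only works if the reference sample is kept current, which is exactly the "ground truth keeping pace with change" problem named above.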

Governance and access

Constraints and access barriers must be navigated without freezing innovation in regulated or high-stakes settings.

Automation and human oversight

Scale with automation where it helps; preserve expert judgment where meaning, context, and accountability matter.


Goals for the community event

What we want to walk away with: a common playbook, honest visibility into cross-sector blockers, practical methods you can reuse, and collaboration hooks for pilots—not another shelf of strategy PDFs.

Lifecycle playbook

Create a shared "data for AI" lifecycle playbook with common vocabulary and reference practices spanning collection, labeling, preparation, evaluation data design, monitoring, and maintenance.

Cross-sector challenges

Identify and prioritize cross-sector challenges that block AI adoption (data readiness, governance constraints, interoperability, workforce gaps, procurement and partnership needs).

Methods & robustness

Exchange proven methods and tools for improving data quality, representativeness, and robustness, including testing for edge cases and distribution shifts.
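One widely used method for testing representativeness and edge cases is slice-based evaluation: report the metric per segment so a weak slice is not averaged away by a healthy aggregate. A minimal sketch (the `region` field and records are illustrative):

```python
from collections import defaultdict

def sliced_accuracy(records, slice_key):
    """Accuracy per data slice, so weak segments are visible
    instead of being hidden inside the overall average."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        s = r[slice_key]
        totals[s] += 1
        hits[s] += int(r["prediction"] == r["label"])
    return {s: hits[s] / totals[s] for s in totals}

records = [
    {"region": "urban", "label": 1, "prediction": 1},
    {"region": "urban", "label": 0, "prediction": 0},
    {"region": "rural", "label": 1, "prediction": 0},  # underrepresented slice
    {"region": "rural", "label": 1, "prediction": 1},
]
print(sliced_accuracy(records, "region"))  # {'urban': 1.0, 'rural': 0.5}
```

The aggregate accuracy here is 75%, which looks fine; the per-slice view shows the rural segment failing half the time.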

Collaboration mechanisms

Establish collaboration mechanisms that connect agencies and practitioners for pilots, benchmarking, and reusable assets.

Deliverables

Expected Outcomes

Tangible outputs you can point to after the festival—not vague "alignment," but artifacts and commitments.

Each outcome below is something participants can co-own: a charter, a prioritized backlog, working groups, reusable templates, pilots with clear evaluation gates, playbooks for automation vs. human review, and a path to keep collaborating after the room clears.

  • A community charter with agreed scope, principles, and shared terminology for data-centric AI work.
  • A prioritized backlog of real-world "data for AI" challenges contributed by participating agencies and practitioners.
  • Working groups organized around high-impact themes (e.g., data readiness and governance, labeling and ground truth, evaluation data design, monitoring and maintenance).
  • Reusable starter artifacts such as data readiness checklists, labeling guidelines, evaluation set design patterns, and monitoring metric templates.
  • A small set of jointly defined pilot initiatives with clear problem statements, datasets (or data access pathways), evaluation criteria, and governance constraints documented up front.
  • A practical set of "how-to" playbooks clarifying what to automate, where expert oversight is required, and how to operationalize continuous data improvement.
  • A shared pathway for sustained collaboration (regular meetups, knowledge repository, and a mechanism to onboard new partners and contribute reusable assets).
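The automation-vs-oversight playbook above is often operationalized as a confidence-threshold router: automate above a reviewed threshold, escalate the rest to experts with a reason attached. This is an assumption-laden sketch (the function, threshold, and item names are illustrative, not a festival deliverable):

```python
def route(item_id, confidence, threshold=0.90):
    """Send high-confidence decisions down the automated path;
    everything else goes to an expert review queue with context."""
    if confidence >= threshold:
        return {"item": item_id, "path": "auto"}
    return {
        "item": item_id,
        "path": "human_review",
        "reason": f"confidence {confidence:.2f} below threshold {threshold:.2f}",
    }

print(route("doc-17", 0.97)["path"])  # auto
print(route("doc-18", 0.62)["path"])  # human_review
```

The threshold itself is the governance decision: it should be set and revisited by the people accountable for the outcome, not hard-coded once.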

Who the festival is for

A community event for people who build, buy, regulate, or operate AI systems—and care that the data behind them is intentional, evaluable, and governable.

Practitioners

Engineers, data scientists, and operators improving training and operational data, evaluation sets, and monitoring in production.

Solution builders

Teams shipping platforms, tooling, and integrations that make data quality, labeling, and lifecycle practices repeatable at scale.

Government agencies

Public-sector partners aligning data readiness, interoperability, procurement, and responsible use in real operational settings.

Collaboration & discussion

Connect on data readiness, governance, labeling and ground truth, evaluation design, monitoring, and cross-agency pilots—aligned with the festival outcomes and working groups.


Curated sources

Recommended reading and listening.

Link · Apr 2, 2026

Fireship

🚨 AI Industry Shock: Anthropic Leak Exposes Claude's Secrets

In a wild turn of events, Anthropic—known for its safety-first, closed-source stance—accidentally leaked over 500,000 lines of Claude Code due to a packaging mistake. The code spread instantly, despite takedown attempts.

💡 Key takeaways:
  • Claude isn't "magic"—it's a complex system of prompts, guardrails, and tooling stitched together.
  • Heavy use of hardcoded instructions shows how much effort goes into controlling AI behavior.
  • "Anti-distillation" tricks were used to mislead competitors—but are now exposed.
  • Features like "undercover mode" aim to make AI-generated code look human.
  • Hidden roadmap hints reveal experimental features like AI companions and autonomous agents.

⚠️ Bigger picture: This leak highlights a harsh reality—today's most advanced AI systems are still built on familiar programming techniques, and even top labs are one mistake away from going fully open.
Anthropic · AI · LLM
Article · Apr 2, 2026

Super Data Science podcast - Rohit Choudhary & Jon Krohn talk

Agentic Data Management (ADM) is a new platform category designed to bring AI automation to data governance, pipeline optimisation, and data operations. Evolving from data observability, ADM aims to reduce the manual effort historically required in data management.

The core problem ADM solves: enterprise data is expanding rapidly, currently growing 4 to 5 times year-over-year, with 10x growth expected soon. Much of this is driven by activity logs generated by AI agents. As data volume grows, the cost of errors compounds—fixing bad data at the point of consumption is roughly a thousand times more expensive than correcting it as it enters your system. ADM addresses this by monitoring data as it moves and transforms, catching problems early rather than at the end of the pipeline.

How the ADM platform works: ADM platforms use Large Language Models (LLMs) to help users search metadata, identify important data assets, diagnose issues, and apply fixes. The platform is designed for different user types:
  • Business users can use plain-English prompts and drag-and-drop interfaces to execute data management tasks.
  • Technical users can rely on the platform to generate, deploy, and maintain complex workflow code.
A practical example: a Chief Marketing Officer trying to improve audience targeting with complex zip code data can use ADM to automatically identify faulty pipelines, apply data quality rules, and generate remediation steps, rather than manually parsing petabytes of data.

Active governance and AI-ready data: traditional data governance treats compliance as an end-of-pipeline process. ADM shifts organisations toward active governance, a real-time approach that monitors data across its entire lifecycle, from origin to consumption. This continuous monitoring is essential for making enterprise data AI-ready. To be AI-ready, data must meet two requirements:
  • Technical accuracy: data types are correct, for example numbers are numbers and strings are strings.
  • Business context: structured database records are aligned with the broader context found in unstructured documents and policies.
ADM systems automatically recommend rules and execute remediation steps to bridge this gap, making previously unusable data accessible to machine learning models.

The future of human-agent collaboration: as ADM matures, runtime environments will be shared by humans and agents. Agents will handle routine workflows and verify each other's work, reducing the alert noise that creates cognitive overload. Rather than humans sorting through constant notifications, agents will surface only critical issues alongside ready-to-deploy fixes. Human workers can then focus on strategic decisions that require judgement.
Data Science · AI · Governance · Agent · ADM
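The "technical accuracy" half of AI-readiness described in the talk can be sketched as a simple schema check. The field names and types below are illustrative assumptions, not anything from the episode or from an ADM product:

```python
# Declared schema: every field must hold the stated Python type.
SCHEMA = {"zip_code": str, "order_count": int, "revenue": float}

def technically_accurate(record, schema=SCHEMA):
    """Checks the first AI-readiness requirement: numbers are numbers,
    strings are strings, for every field the schema declares."""
    return all(isinstance(record.get(field), typ) for field, typ in schema.items())

good = {"zip_code": "94105", "order_count": 3, "revenue": 19.99}
bad = {"zip_code": 94105, "order_count": "3", "revenue": 19.99}  # types swapped

print(technically_accurate(good))  # True
print(technically_accurate(bad))   # False
```

The second requirement, business context, has no one-liner equivalent: it means reconciling records like these with the unstructured documents and policies that give them meaning.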