How multi-agent systems revolutionize data workflows

Sponsored feature The biggest challenge to AI initiatives is the data they rely on. More powerful computing and higher-capacity storage at lower cost have created a flood of information, and not all of it is clean. It is often fragmented, duplicated, poorly governed, or inadequately structured. The old rule of data processing remains garbage in, garbage out, and it haunts any business that wants to be more data-driven.

AI came along just in time to help solve the problem. It scales to analyze and improve this data at speed. Now, multi-agent AI systems can do it even more quickly and efficiently. Agentic AI uses autonomous agents, with humans in the loop, to transform how data is prepared, governed, and made usable. They promise to ready our data for AI workloads by tackling specialized tasks together, making it easier to squeeze value from our information. The future of data management is multi-agentic, with sophisticated systems that continuously learn and adapt to deliver trustworthy, high-quality data at scale.

Clearing the bottlenecks

Traditional data engineering is resource-intensive. It relies on expert manual coding, schema mapping, and error-prone troubleshooting, and many generative AI projects falter due to inadequate data governance and preparation.

"To move the needle, we have to drastically reduce the manual toil involved in making a computer understand what you want to achieve," explains Firat Tekiner, senior staff product manager at Google Cloud.

There are three components to this. The first is an evolution in how we talk to computer systems. This used to require extensive coding. Now, plain natural language prompts can trigger context-aware pipeline creation, modification, and testing. Large language models help the system understand intent without requiring sophisticated coding skills. "An AI agent can translate what you're asking it to do into concrete actions," Tekiner explains.

The second component is ensuring that data is AI-ready with governance baked in. It must be trustworthy, secure, and compliant with the organization's rules. Systems must manage quality and access controls on a per-application basis, because doing so at a global level would slow things down too much.

The third effort-saving measure is breaking down data silos and allowing more people to handle data engineering tasks. That prevents data engineers from becoming bottlenecks in the data pipeline. "We need to empower a broader range of data workers that include analysts, scientists, or even business users who are involved in contributing to and making sense of the data," Tekiner says.

The multi-agent advantage

We can assign roles to AI agents that enable them to handle different jobs contributing to a common goal, completing it more quickly than a single monolithic program could. Google Cloud is designing a collaborative ecosystem of multiple agents within a common framework of metadata, driven by Gemini, Google's advanced family of LLMs. Each agent can be an expert in its own area, such as data engineering, data science, governance, or data analytics. These agents can also pass information and tasks to each other.

"We are heading to a future where one agent ingests data, another handles complex transformation, another focuses on data quality, and another handles validation," Tekiner explains. He compares this type of collaboration to an ant colony: individual ants can only carry out simple tasks, but when they come together they can solve really sophisticated problems.
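To make that division of labor concrete, here is a minimal, hypothetical sketch of specialized agents handing a batch of records along a pipeline. The class names, the shared run interface, and the hard-coded rules are illustrative assumptions for this article, not Google Cloud's actual agent framework.

```python
# Hypothetical sketch: specialized agents collaborating on one pipeline.
# Each agent owns a narrow task and passes its result to the next one.
from dataclasses import dataclass, field


@dataclass
class Record:
    """A single row moving through the pipeline."""
    customer_id: str
    amount: str                 # raw input arrives as text
    clean_amount: float | None = None
    issues: list[str] = field(default_factory=list)


class IngestionAgent:
    def run(self, raw_rows):
        # Turn raw dictionaries into typed records.
        return [Record(r["customer_id"], r["amount"]) for r in raw_rows]


class TransformationAgent:
    def run(self, records):
        # Normalize the amount field into a float where possible.
        for rec in records:
            try:
                rec.clean_amount = float(rec.amount)
            except ValueError:
                rec.issues.append(f"unparseable amount: {rec.amount!r}")
        return records


class QualityAgent:
    def run(self, records):
        # Flag business-rule violations rather than silently dropping rows.
        for rec in records:
            if rec.clean_amount is not None and rec.clean_amount < 0:
                rec.issues.append("negative amount")
        return records


class ValidationAgent:
    def run(self, records):
        # Split the batch into publishable rows and rows needing human review.
        ok = [r for r in records if not r.issues]
        review = [r for r in records if r.issues]
        return ok, review


if __name__ == "__main__":
    raw = [
        {"customer_id": "C1", "amount": "19.99"},
        {"customer_id": "C2", "amount": "-5.00"},
        {"customer_id": "C3", "amount": "oops"},
    ]
    batch = raw
    for agent in [IngestionAgent(), TransformationAgent(), QualityAgent()]:
        batch = agent.run(batch)
    published, needs_review = ValidationAgent().run(batch)
    print(f"published {len(published)} rows, {len(needs_review)} sent for review")
```

In the kind of system Tekiner describes, each of these roles would be driven by an LLM and shared metadata rather than hard-coded rules, but the hand-off pattern is the same.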
"The system becomes much more robust, resilient, and adaptive with multi-agent systems," Tekiner observes.

Collaboration and context-aware intelligence

Tekiner offers the analogy of a successful football team. "A striker could be really good at scoring goals, but the team also needs defenders like an ingestion agent, and midfielders to control the play like a transformation agent." These players must all work together, taking instructions from the coach based on strategy and data about the opposing team's strengths and weaknesses.

In the same way, a multi-agent system has to understand and operate within an organizational context. That context, and the metadata each agent has access to, defines its intelligence.

From a logical point of view, an agent system builds its intelligence through a hierarchy that operates across multiple interconnected layers. The base level ensures that agents understand best practices, including standard data formats and structures, common data quality checks and validation rules, analytical methodologies, and basic security and compliance principles. This foundational knowledge, available through LLMs such as Gemini, is key to enabling agents to manage data operations consistently and reliably.

It is also possible to layer sector-specific knowledge atop this base level. An agent working with healthcare data would need to comply with HIPAA regulations, just as a financial agent would need to know the relevant privacy and anti-money-laundering regulations. This industry layer ensures that agents can navigate the specific complexities and requirements of different business domains.

Another layer of metadata can incorporate company-specific elements like naming conventions, security policies, and existing data models. These critical insights are not uploaded anywhere outside the customer's environment, which is vital for maintaining data privacy and security.

Learning to preempt problems

The system can be taught this hierarchical understanding through explicit business rules and workflow instructions. These include approved data processing procedures, compliance requirements and constraints, and custom field mappings and transformations. In other words, the agent can be given a data engineering contract or spec.

The agent can also learn organizational patterns itself by processing historical workflows. Based on what it observes and learns, an agent can detect and flag deviations from established patterns and recommend a course of action. It might monitor your pipelines to identify schema and data drift and suggest fixes. This capacity for autonomous learning makes the agent more relevant and efficient, greatly reducing the need for manual input from the human in the loop.
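To illustrate the drift-monitoring idea, here is a minimal sketch that compares a table's observed schema against the schema declared in a data contract. The contract format, column names, and types are hypothetical stand-ins for whatever metadata a real agent would consume.

```python
# Hypothetical sketch: flag schema drift against a declared data contract.
# The "contract" is simply a mapping of column name to expected type.
expected_schema = {
    "customer_id": "STRING",
    "amount": "FLOAT64",
    "created_at": "TIMESTAMP",
}

observed_schema = {
    "customer_id": "STRING",
    "amount": "STRING",        # type changed upstream
    "created_at": "TIMESTAMP",
    "channel": "STRING",       # new column appeared
}


def detect_schema_drift(expected, observed):
    """Return human-readable findings an agent could flag for review."""
    findings = []
    for column, expected_type in expected.items():
        if column not in observed:
            findings.append(f"missing column: {column}")
        elif observed[column] != expected_type:
            findings.append(
                f"type drift on {column}: expected {expected_type}, "
                f"found {observed[column]}"
            )
    for column in observed:
        if column not in expected:
            findings.append(f"unexpected new column: {column}")
    return findings


if __name__ == "__main__":
    for finding in detect_schema_drift(expected_schema, observed_schema):
        print("flag for review:", finding)
```

An agent with this kind of check at its disposal could go a step further and propose the fix, such as updating the transformation or amending the contract, for a human to approve.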
The benefits of automated efficiency

We are heading to a future where multiple data agents replicate this kind of autonomous capability, increasing individual productivity as agents take on repetitive tasks. For example, a business might face a regulatory change that impacts multiple pipelines. A low-tech fix involves writing the code and copying it to many locations, which is time-consuming and error-prone. An agent could apply the change consistently to thousands of places in seconds. If the agent can get 95 percent of the work done error-free, it could potentially reduce six months of work to around a week: the remaining five percent of roughly 26 working weeks is little more than a week of manual effort.

Agents can also automate manual metadata management and documentation tasks. As you interact with them, they can capture knowledge from across the organization, document processing rules and patterns, and suggest and implement updates systematically. Automating metadata creation and maintenance preserves institutional knowledge and keeps it updated as an organization grows.

Democratizing data

The future of AI lies in agents that understand not just what to do, but how to do it within specific business environments. As trust in these systems grows, organizations can expect increasing autonomy to reduce the manual process burden.

With BigQuery, Google Cloud has taken the lead at this frontier. It is empowering a new generation of data workers, from engineers to analysts, by providing them with intelligent data engineering agents as collaborative partners. These agents have the potential to manage complex data operations with embedded governance to deliver trustworthy, high-quality data at scale. By continuously learning and autonomously adapting, they make it easier and faster to get insights from our data while human engineers get on with higher-order work.

Sponsored by Google Cloud.