Building Autonomous Agents Using Gemini 3 Pro's Tool Calling


By now, in early 2026, the industry has firmly moved past the "chatbot" phase. We are no longer asking Large Language Models (LLMs) simply to talk; we are asking them to act. For CTOs and senior engineers, the focus has shifted from prompt engineering to custom AI agent development: building deterministic, reliable systems in which models like Gemini 3 Pro act as the reasoning engine behind complex business workflows.

The release of Gemini 3 Pro has streamlined this significantly. Unlike previous iterations, which required heavy middleware to manage tool schemas (the JSON definitions of what a model can do), the modern google-genai SDK combined with Gemini 3 Pro's native reasoning lets the model select and invoke tools in a single pass, with minimal orchestration overhead.

In this article, we will build a fully functional autonomous agent that can interact with a local database and perform external actions (mocked as email delivery), demonstrating the architectural patterns required for enterprise-grade agentic systems.

On-Demand Shared Software Engineering Team

Access a flexible, shared software product engineering team on demand through a predictable monthly subscription. Expert developers, designers, QA engineers, and a free project manager help you build MVPs, scale products, and innovate with modern technologies like React, Node.js, and more.

Try 4Geeks Teams

The Architecture of a Modern Agent

Before writing code, we must understand the "Loop" that differentiates a standard API call from an Agent.

In a traditional RAG (Retrieval-Augmented Generation) setup, the flow is linear: Input -> Retrieve -> Answer.

In an Agentic setup, the flow is circular and stateful:

  1. Observation: The user provides input ("Find the overdue invoices and email the clients").
  2. Reasoning (The Brain): Gemini 3 Pro analyzes the intent and selects a tool from its registered toolbox.
  3. Execution (The Body): The application executes the Python function (e.g., query_database).
  4. Reflection: The tool's output is fed back into the model.
  5. Termination: The model decides if the task is complete or requires further steps.
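Before touching the Gemini SDK, the five steps above can be sketched as a minimal control structure. This is a hedged, framework-agnostic sketch: `call_model` and `execute_tool` are hypothetical stand-ins for the model call and tool dispatch we build later.

```python
# Minimal sketch of the agentic loop. Hypothetical helpers:
# call_model(history) returns either a tool request or a final answer;
# execute_tool(name, args) runs the matching Python function.

def agent_loop(user_input, call_model, execute_tool, max_steps=10):
    """Run observe -> reason -> execute -> reflect until the model terminates."""
    history = [{"role": "user", "content": user_input}]    # 1. Observation
    for _ in range(max_steps):
        step = call_model(history)                         # 2. Reasoning
        if step["type"] == "final":                        # 5. Termination
            return step["content"]
        result = execute_tool(step["name"], step["args"])  # 3. Execution
        history.append({"role": "tool", "name": step["name"],
                        "content": result})                # 4. Reflection
    raise RuntimeError("Agent did not terminate within max_steps")
```

The `max_steps` cap matters in production: without it, a confused model can loop on the same tool indefinitely.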

Technical Implementation

We will use Python 3.12+ and the google-genai library. While older versions required verbose JSON schema definitions, the modern SDK allows us to pass Python functions directly—Gemini handles the type inference and schema generation.

1. Setup and Initialization

First, ensure your environment is configured.

pip install -q -U google-genai

We initialize the client using the standard pattern. Note that the SDK offers automatic_function_calling, which handles the execution loop for you in simple use cases; here we will build a manual loop instead for maximum control, a requirement for complex enterprise logic.

import os
from google import genai
from google.genai import types

# Initialize the client (assumes GEMINI_API_KEY is set in environment)
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Model Configuration
MODEL_ID = "gemini-3.0-pro" # Or gemini-2.0-flash-exp for current testing

2. Defining the Toolbelt

The power of an agent lies in its tools. We will define two functions: one to retrieve data (read) and one to perform an action (write).

Crucial Note: The docstrings are not comments; they are part of the prompt. Gemini uses them to understand when and how to use the tool.

# Mock Database
INVOICE_DB = {
    "INV-001": {"client": "Acme Corp", "amount": 5000, "status": "overdue", "contact": "finance@acme.com"},
    "INV-002": {"client": "Globex", "amount": 12000, "status": "paid", "contact": "billing@globex.com"},
    "INV-003": {"client": "Soylent Corp", "amount": 3400, "status": "overdue", "contact": "pay@soylent.com"},
}

def search_invoices(status: str) -> dict:
    """
    Searches the internal invoice database by status.
    
    Args:
        status: The status to filter by (e.g., 'overdue', 'paid').
        
    Returns:
        A dictionary of invoice details matching the criteria.
    """
    print(f"\n[System] Searching database for status: {status}...")
    results = {k: v for k, v in INVOICE_DB.items() if v["status"] == status}
    return results

def send_email_reminder(invoice_id: str, email_address: str) -> str:
    """
    Sends a payment reminder email to a client.
    
    Args:
        invoice_id: The ID of the invoice (e.g., 'INV-001').
        email_address: The recipient's email address.
        
    Returns:
        Confirmation string of the action.
    """
    print(f"\n[System] Sending email for {invoice_id} to {email_address}...")
    # In a real scenario, this would integrate with SendGrid or SES
    return f"Email successfully queued for {invoice_id}."

# Create the toolbox
toolbox = [search_invoices, send_email_reminder]


3. The Execution Loop

This is where the engineering happens. We don't just call generate_content once; we enter a loop that services the model's requests to invoke functions. This pattern is essential for custom AI agent development, as it handles multi-step reasoning (e.g., finding the invoices first, then extracting the emails, then sending the reminders).

def run_agent(user_query: str):
    print(f"--- Agent Task: {user_query} ---")
    
    # Initialize chat history with the user's request
    chat = client.chats.create(
        model=MODEL_ID,
        config=types.GenerateContentConfig(
            tools=toolbox,
            temperature=0.0,  # Deterministic behavior for tool use
            # Disable the SDK's automatic tool execution so we can run
            # the loop ourselves for maximum control
            automatic_function_calling=types.AutomaticFunctionCallingConfig(
                disable=True
            ),
        )
    )

    response = chat.send_message(user_query)

    # This manual loop mirrors what automatic function calling does
    # under the hood:
    # 1. Model predicts one or more function calls
    # 2. We execute the matching Python functions
    # 3. We feed the results back to the model
    # 4. The model calls more tools or generates the final answer

    while response.function_calls:
        tool_responses = []
        for tool_call in response.function_calls:
            name = tool_call.name
            args = tool_call.args

            # Dynamic dispatch
            if name == "search_invoices":
                result = search_invoices(**args)
            elif name == "send_email_reminder":
                result = send_email_reminder(**args)
            else:
                result = f"Error: tool '{name}' not found."

            tool_responses.append(
                types.Part.from_function_response(
                    name=name,
                    response={"result": result},
                )
            )

        # Feed every result back in a single turn so the model can
        # continue its train of thought (this also supports parallel
        # tool calls issued in one response)
        response = chat.send_message(tool_responses)

    print(f"\n[Agent Final Answer]: {response.text}")

# --- Execution ---
run_agent("Find all overdue invoices and send an email reminder to their contacts.")
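As the toolbox grows, the if/elif dispatch inside run_agent becomes brittle: every new tool means editing the loop. A dictionary keyed by function name scales better. The sketch below stubs the two tools so it is self-contained; in the real agent you would build the registry directly from the toolbox list.

```python
# Stubs standing in for the search_invoices / send_email_reminder tools above.
def search_invoices(status: str) -> dict:
    return {"INV-001": {"status": status}}

def send_email_reminder(invoice_id: str, email_address: str) -> str:
    return f"Email successfully queued for {invoice_id}."

# Build the dispatch table once; adding a tool is now a one-line change.
TOOL_REGISTRY = {fn.__name__: fn for fn in (search_invoices, send_email_reminder)}

def dispatch(name: str, args: dict):
    """Look up and execute a tool by name, with a safe fallback
    so a hallucinated tool name never crashes the agent."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return f"Error: tool '{name}' not found."
    return fn(**args)
```

Inside run_agent, the entire if/elif chain collapses to `result = dispatch(name, args)`.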

Output Analysis

When you run the code above, you will observe the agent performing "Chain of Thought" reasoning without explicit instruction:

  1. Step 1: The agent recognizes it cannot "send emails" yet because it doesn't know who is overdue.
  2. Step 2: It calls search_invoices(status='overdue').
  3. Step 3: It receives the data (INV-001 and INV-003).
  4. Step 4: It iterates internally. It sees two results. It triggers send_email_reminder for INV-001.
  5. Step 5: It triggers send_email_reminder for INV-003.
  6. Step 6: It summarizes the actions to the user.

This autonomy is what distinguishes a script from an agent.

Advanced Patterns: Parallel Execution & Error Handling

In a production environment (like those we see when consulting on custom AI agent development), simple loops aren't enough. You need:

  • Parallel Tool Use: Gemini 3 Pro can generate multiple function calls in a single turn. If the agent needs to check the stock price of Apple, Google, and Microsoft, it should issue three get_stock_price calls simultaneously, not sequentially. The function_calls list in the response object supports this natively.
  • Guardrails: Never let an agent execute "write" operations (like delete_database or refund_payment) without a "Human-in-the-Loop" check. You can implement this by having the delete tool return a request for confirmation code, which the user must provide in the next turn.
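One lightweight way to implement the confirmation guardrail is to make destructive tools two-phase: the first call only returns a pending-confirmation code, and the action runs only when the approved code is passed back on a later turn. The refund_payment tool and the in-memory token store below are hypothetical, a sketch of the pattern rather than a production implementation.

```python
import secrets

# Pending destructive actions awaiting human approval (in-memory sketch;
# production systems would persist this with an expiry).
_PENDING: dict = {}

def refund_payment(invoice_id: str, confirmation_code: str = "") -> str:
    """Hypothetical 'write' tool guarded by two-phase confirmation.
    First call: returns a code the human must approve and echo back.
    Second call with the code: actually executes the refund."""
    if not confirmation_code:
        code = secrets.token_hex(4)
        _PENDING[code] = {"invoice_id": invoice_id}
        return (f"CONFIRMATION REQUIRED: ask the user to approve code "
                f"{code} before retrying.")
    pending = _PENDING.pop(confirmation_code, None)
    if pending is None or pending["invoice_id"] != invoice_id:
        return "Error: invalid or expired confirmation code."
    return f"Refund executed for {invoice_id}."
```

Because the confirmation text flows back through the model as a tool result, the agent naturally pauses and asks the human before retrying, without any special-case logic in the loop itself.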

Why Logic Belongs in Code, Not Prompts

A common mistake is asking the LLM to perform arithmetic or complex data transformation.

  • Bad: Asking Gemini to "Calculate the sum of all overdue invoices."
  • Good: Providing a tool calculate_total(invoices) or simply writing code that sums the results of the search_invoices tool.

As software engineers, we should view the LLM as a router and reasoner, not a calculator. It excels at unstructured-to-structured transformation (User Text -> JSON Tool Call) and decision making (If X -> Call Y).
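Following that principle, the sum over search results belongs in a deterministic tool rather than in the model's head. A sketch of the hypothetical calculate_total helper, operating on the same shape search_invoices returns:

```python
def calculate_total(invoices: dict) -> int:
    """Deterministically sum invoice amounts: the LLM routes, code computes."""
    return sum(inv["amount"] for inv in invoices.values())

# Example with the overdue invoices from the mock database above:
overdue = {
    "INV-001": {"client": "Acme Corp", "amount": 5000, "status": "overdue"},
    "INV-003": {"client": "Soylent Corp", "amount": 3400, "status": "overdue"},
}
print(calculate_total(overdue))  # 8400
```

Registered as a tool, this turns "what do the overdue clients owe in total?" into one guaranteed-correct function call instead of token-by-token arithmetic.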

Conclusion

The release of Gemini 3 Pro has validated the architectural shift toward agentic workflows. By leveraging native tool calling, we can build systems that are flexible enough to handle natural language but rigid enough to execute critical business logic reliably.

For organizations looking to scale these architectures, the challenge often lies not in the API integration, but in the surrounding infrastructure: vector databases for memory, evaluation pipelines to prevent drift, and secure deployment. If you are exploring enterprise-grade custom AI agent development, partnering with a specialized engineering firm is often the fastest path to production.


FAQs

How does an autonomous AI agent differ from a traditional RAG chatbot?

While a traditional RAG (Retrieval-Augmented Generation) chatbot typically follows a linear flow of retrieving information and generating an answer, an autonomous agent operates in a circular, stateful loop. This "Agentic" loop consists of Observation, Reasoning, Execution, and Reflection. Instead of just talking, the agent uses reasoning to select specific tools, execute actions (like querying a database), and analyze the results to determine if the task is complete or if further steps are required.

How does Gemini 3 Pro allow developers to connect LLMs with local Python functions?

Gemini 3 Pro utilizes a feature known as tool calling (or function calling) to bridge the gap between natural language and code. Developers can pass standard Python functions directly to the model using the google-genai SDK. The model analyzes the function's docstrings and type hints to understand its purpose and arguments, then generates the structured data needed to execute that function. This allows the AI to act as a reasoning engine that "drives" the code, handling tasks like database searches or email delivery with low-latency decision-making.

What are the best practices for ensuring security and reliability in enterprise AI agents?

To build reliable systems, it is critical to separate reasoning from computation; complex logic or arithmetic should remain in the code, while the LLM acts as a router. Furthermore, for high-stakes operations (like writing to a database or sending payments), developers should implement "Human-in-the-Loop" guardrails to require user confirmation before execution. Specialized engineering services, such as 4Geeks Custom AI Agents Development, focus on establishing these secure infrastructures, including parallel execution patterns and evaluation pipelines to prevent model drift in production environments.
