From Text Generation to Real-World Action
A language model that can only produce text is fundamentally limited. It can draft an email but cannot send it. It can recommend a database query but cannot execute it. It can describe the steps to process a refund but cannot actually process one. Function calling and tool use are the capabilities that bridge this gap, transforming language models from sophisticated text generators into autonomous agents that take action in the real world.
Function calling allows a language model to recognize when a user's request requires an external action, select the appropriate function from a defined set, extract the necessary parameters from the conversation context, and output a structured function call that a runtime system can execute. The model doesn't execute the function itself. It generates a precise specification that tells the execution layer what to call and with what arguments.
This capability has become the foundational mechanism for every serious agentic AI system. According to a 2025 survey by Weights & Biases, 87% of production AI agents rely on function calling as their primary mechanism for interacting with external systems. The adoption curve reflects the transformative impact: organizations report that adding tool use to their AI systems increases task completion rates by 3-5x compared to text-only AI assistants.
How Function Calling Works
The Function Definition
Every tool-using AI system starts with function definitions: structured descriptions of the external capabilities available to the model. A well-crafted function definition includes the function name; a clear description of what it does and when to use it; parameter specifications with types, constraints, and descriptions; required vs. optional parameters; and example usage patterns.
Here's what a function definition looks like conceptually for a CRM lookup tool:
The function is named `lookup_customer`. Its description explains that it searches the CRM for customer records by various identifiers and returns profile information, order history, and support ticket history. Parameters include `search_term` (required, string, the customer identifier to search for), `search_type` (optional, enum of email/phone/name/account_id, defaults to email), and `include_history` (optional, boolean, whether to include full order and support history, defaults to false).
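One way to express that definition, sketched here as a Python dict in the JSON-Schema style most model APIs accept (the exact envelope varies by provider):

```python
# Tool definition for the CRM lookup described above, in JSON-Schema style.
# Provider formats differ slightly; this follows the common shape.
lookup_customer_tool = {
    "name": "lookup_customer",
    "description": (
        "Search the CRM for a customer record by email, phone, name, or "
        "account ID. Returns profile information and, optionally, full "
        "order and support-ticket history. Use before any action that "
        "needs an existing customer's details."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "search_term": {
                "type": "string",
                "description": "The customer identifier to search for.",
            },
            "search_type": {
                "type": "string",
                "enum": ["email", "phone", "name", "account_id"],
                "default": "email",
                "description": "Which field to match search_term against.",
            },
            "include_history": {
                "type": "boolean",
                "default": False,
                "description": "Include full order and support history.",
            },
        },
        "required": ["search_term"],
    },
}
```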
The quality of function definitions directly determines how well the model uses tools. Vague descriptions lead to incorrect tool selection. Missing parameter constraints lead to malformed calls. Inadequate examples lead to poor parameter extraction. Investing in precise, comprehensive function definitions is the highest-leverage activity in building tool-using agents.
The Model's Decision Process
When a user sends a message, the model evaluates whether any available tools are relevant. This evaluation considers the user's intent (what are they trying to accomplish?), the conversation history (what context has been established?), the available functions (which tools match this intent?), and the information available (are there enough details to populate required parameters?).
The model may decide that no tool is needed and respond with text. It may decide a single tool call is sufficient. Or it may determine that multiple tool calls are needed, either in parallel (independent operations) or in sequence (where the output of one call informs the next).
Modern models from Anthropic, OpenAI, and Google support parallel function calling, where the model generates multiple function calls in a single response. This is critical for efficiency. If a user asks "What's the status of orders 1234 and 5678?" the model should call the order lookup function twice in parallel rather than sequentially.
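For the two-order example, the model's single response carries two independent tool calls, and the runtime is free to execute them concurrently. A minimal sketch, with `lookup_order` as a hypothetical stand-in for a real order-service call:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model response containing two parallel tool calls.
parallel_calls = [
    {"id": "call_1", "name": "lookup_order", "arguments": {"order_id": "1234"}},
    {"id": "call_2", "name": "lookup_order", "arguments": {"order_id": "5678"}},
]

def lookup_order(order_id: str) -> dict:
    # Stand-in for a real order-service API call.
    return {"order_id": order_id, "status": "shipped"}

# Execute independent calls concurrently and match results back by call id.
with ThreadPoolExecutor() as pool:
    futures = {c["id"]: pool.submit(lookup_order, **c["arguments"])
               for c in parallel_calls}
    results = {call_id: f.result() for call_id, f in futures.items()}
```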
The Execution Loop
Function calling operates in a loop: the user sends a message, the model generates one or more function calls, the runtime system executes those calls against real APIs, the results are fed back to the model, and the model either generates another set of function calls (if more work is needed) or produces a final text response to the user.
This loop can iterate multiple times for complex tasks. An agent helping a user troubleshoot a technical issue might first query the user's account details, then check system logs for errors, then search the knowledge base for solutions, then apply a fix through an API call, then verify the fix by checking system status, all within a single conversation. Each iteration of the function calling loop moves the task closer to completion.
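The loop above can be sketched in a few lines. Here `model` and `execute` are caller-supplied stand-ins: `model(messages, tools)` returns a dict with `"text"` and `"tool_calls"`, and `execute(name, arguments)` runs one tool; real provider SDKs differ in detail but follow this shape:

```python
def run_agent(user_message, model, tools, execute, max_turns=10):
    """Minimal function-calling loop: call model, run tools, feed results back."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):  # cap iterations to avoid runaway loops
        response = model(messages, tools)
        messages.append({"role": "assistant", "content": response["text"],
                         "tool_calls": response["tool_calls"]})
        if not response["tool_calls"]:  # no tool calls: this is the final answer
            return response["text"]
        for call in response["tool_calls"]:
            result = execute(call["name"], call["arguments"])
            # Feed each result back so the model can plan the next step.
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": result})
    raise RuntimeError("agent exceeded iteration limit")
```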
Tool Integration Patterns
Direct API Integration
The simplest pattern wraps existing REST or GraphQL APIs as functions. Each API endpoint becomes a function definition. The model generates structured calls that the runtime translates into HTTP requests. This pattern works well when you have well-documented APIs with clear input/output schemas, the API operations map naturally to user intents, and authentication and authorization are manageable at the integration layer.
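A minimal sketch of this translation layer, assuming a hypothetical CRM base URL and two placeholder endpoints (the runtime, not the model, builds and sends the HTTP request):

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; in practice this comes from configuration.
BASE_URL = "https://crm.example.com/api/v1"

def build_request(name: str, arguments: dict) -> urllib.request.Request:
    """Translate a structured tool call into an HTTP request for its endpoint."""
    routes = {
        "lookup_customer": ("GET", "/customers/search"),
        "create_ticket": ("POST", "/tickets"),
    }
    method, path = routes[name]
    if method == "GET":
        # GET calls carry their parameters in the query string.
        url = f"{BASE_URL}{path}?{urllib.parse.urlencode(arguments)}"
        return urllib.request.Request(url, method="GET")
    # Write calls carry parameters as a JSON body.
    body = json.dumps(arguments).encode("utf-8")
    return urllib.request.Request(f"{BASE_URL}{path}", data=body, method=method,
                                  headers={"Content-Type": "application/json"})
```

Authentication headers and retries would be added at this same layer, keeping them out of the model's view entirely.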
Direct API integration can connect agents to virtually any existing business system: CRMs, ERPs, HRIS platforms, marketing tools, payment processors, cloud infrastructure, and more. The Girard AI platform provides pre-built connectors for over 200 common business APIs, reducing integration time from weeks to hours.
Database Query Tools
Giving agents the ability to query databases is powerful but requires careful design. Rather than exposing raw SQL execution (which creates security and performance risks), best practice is to create parameterized query functions that accept structured inputs and return formatted results.
For example, instead of a general `execute_sql` function, define specific functions like `get_sales_by_region(region, start_date, end_date)`, `find_customers_by_criteria(industry, revenue_range, last_contact_date)`, and `get_inventory_levels(product_category, warehouse_location)`. Each function translates to a specific, optimized, and security-reviewed query. This approach prevents SQL injection, limits query scope, and ensures consistent performance.
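A sketch of one such function using SQLite and placeholder bindings; because the model supplies only the parameter values, never the SQL text, it cannot alter the query structure:

```python
import sqlite3

def get_sales_by_region(conn, region, start_date, end_date):
    """Narrow, parameterized query function -- no raw SQL from the model.

    The ? placeholders mean model-supplied values are bound as data,
    which rules out SQL injection by construction.
    """
    query = """
        SELECT region, SUM(amount) AS total_sales
        FROM sales
        WHERE region = ? AND sale_date BETWEEN ? AND ?
        GROUP BY region
    """
    return conn.execute(query, (region, start_date, end_date)).fetchall()
```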
Code Execution Tools
Some tasks require computational capabilities that go beyond API calls. Code execution tools let agents write and run code to perform data analysis, generate visualizations, manipulate files, or implement custom logic. Sandboxed code execution environments like E2B, Modal, or Docker-based sandboxes provide safe execution contexts.
The pattern works as follows: the agent generates code to accomplish a task, the code runs in an isolated sandbox with limited permissions, results (data, files, visualizations) are returned to the agent, and the agent interprets results and continues its workflow.
Code execution is particularly valuable for data analysis tasks where the specific computation isn't known in advance. An agent asked "What's the correlation between our marketing spend and lead volume over the past year?" can write Python code to pull the data, calculate the correlation, generate a chart, and interpret the results.
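The code the agent writes for that question might look like the following. The monthly figures here are made up for the sketch; a real run would pull both series from the data warehouse:

```python
# Illustrative agent-generated analysis code with hard-coded sample data.
monthly_spend = [12000, 15000, 11000, 18000, 21000, 19000]
monthly_leads = [340, 410, 320, 500, 560, 530]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

correlation = pearson(monthly_spend, monthly_leads)
print(f"Correlation between spend and leads: {correlation:.2f}")
```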
Multi-Step Orchestration
Complex business tasks require orchestrating multiple tools in sequence. An agent processing a new customer order might validate the customer's payment information (payment API), check inventory availability (inventory system), calculate shipping costs and delivery estimates (logistics API), apply promotional discounts (pricing engine), create the order record (order management system), send confirmation to the customer (email API), and update the sales dashboard (analytics system).
Each step depends on the results of previous steps, and the agent must handle edge cases at every stage: payment declined, item out of stock, shipping address invalid. This kind of multi-step orchestration is where agentic AI delivers its greatest value, automating workflows that previously required human judgment and manual system interactions. For a broader look at how agents manage complex workflows, see our article on [agentic AI explained](/blog/agentic-ai-explained).
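The early stages of that pipeline can be sketched as a chain where each step depends on the previous one and failures stop with a clear reason. All service names here are hypothetical stand-ins for the real payment, inventory, and logistics integrations:

```python
def process_order(order, services):
    """Run an order through each stage; stop with a clear reason on failure."""
    payment = services["validate_payment"](order["payment"])
    if not payment["approved"]:
        return {"status": "failed", "reason": "payment_declined"}
    stock = services["check_inventory"](order["items"])
    if not stock["available"]:
        return {"status": "failed", "reason": "out_of_stock"}
    shipping = services["quote_shipping"](order["address"], order["items"])
    order_id = services["create_order"](order, shipping["cost"])
    services["send_confirmation"](order["customer_email"], order_id)
    return {"status": "created", "order_id": order_id}
```

In an agentic system the model drives this sequencing itself, call by call; the fixed pipeline above just shows the dependency structure it has to respect.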
Safety Patterns for Tool Use
The Permission Model
Not all actions carry the same risk. Reading data is generally safe. Modifying data requires more caution. Deleting data or making financial transactions demands the highest level of control. Implement a tiered permission model:
**Read-only actions.** Execute automatically without confirmation. Querying databases, searching knowledge bases, checking system status.
**Reversible write actions.** Execute with logging and optional confirmation based on context. Creating draft documents, updating CRM notes, sending internal messages.
**Irreversible or high-stakes actions.** Always require explicit human confirmation. Processing payments, sending external communications, modifying production infrastructure, deleting records.
The permission tier for each function should be defined at the function definition level and enforced by the runtime system, not by the model. Never rely solely on prompt instructions to prevent dangerous actions.
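A minimal sketch of runtime-enforced tiers, with hypothetical tool names. The key property is that the gate lives in ordinary code the model cannot talk its way past:

```python
from enum import Enum

class Tier(Enum):
    READ = "read"                 # execute automatically
    REVERSIBLE = "reversible"     # execute with logging
    HIGH_STAKES = "high_stakes"   # require explicit human confirmation

# Tier declared alongside each function definition, enforced by the runtime.
TOOL_TIERS = {
    "lookup_customer": Tier.READ,
    "update_crm_note": Tier.REVERSIBLE,
    "process_refund": Tier.HIGH_STAKES,
}

def authorize(tool_name: str, human_confirmed: bool = False) -> bool:
    """Runtime gate: prompt instructions never substitute for this check."""
    tier = TOOL_TIERS.get(tool_name)
    if tier is None:
        return False              # unknown tools are always denied
    if tier is Tier.HIGH_STAKES:
        return human_confirmed
    return True
```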
Input Validation
Every function call generated by the model should be validated before execution. Validation checks include type checking (are parameters the correct data types?), range checking (are numeric values within acceptable bounds?), format checking (do strings match expected patterns like email addresses or phone numbers?), authorization checking (does the current user have permission for this operation?), and rate limiting (is this call within acceptable frequency limits?).
Validation failures should be reported back to the model as error messages, allowing it to correct the call or inform the user of the limitation. Well-designed error messages help the model recover gracefully rather than getting stuck in retry loops.
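A sketch of type and enum checking against one function's parameter spec. In a real system this spec would be derived from the function definition's JSON schema rather than written by hand:

```python
# Hypothetical parameter spec for lookup_customer, hand-written for the sketch.
SPEC = {
    "search_term": {"type": str, "required": True},
    "search_type": {"type": str,
                    "choices": {"email", "phone", "name", "account_id"}},
    "include_history": {"type": bool},
}

def validate_call(arguments: dict, spec: dict = SPEC) -> list:
    """Return a list of error messages; an empty list means the call may run.

    Errors go back to the model as text so it can correct the call.
    """
    errors = []
    for name, rules in spec.items():
        if name not in arguments:
            if rules.get("required"):
                errors.append(f"missing required parameter: {name}")
            continue
        value = arguments[name]
        if not isinstance(value, rules["type"]):
            errors.append(f"{name}: expected {rules['type'].__name__}")
        elif "choices" in rules and value not in rules["choices"]:
            errors.append(f"{name}: must be one of {sorted(rules['choices'])}")
    for name in arguments:
        if name not in spec:
            errors.append(f"unknown parameter: {name}")
    return errors
```

Authorization and rate-limit checks would slot in alongside these, at the same pre-execution point.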
Output Sanitization
Results returned from tool executions may contain sensitive data that shouldn't be exposed to the user. Implement output sanitization that redacts PII, masks credentials, filters internal system identifiers, and removes debug information before results reach the model or the user.
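A minimal regex-based redaction pass, shown here for emails and a made-up API-key pattern; production systems typically pair this with field-level allowlists drawn from the API schema:

```python
import re

# Redaction patterns; the sk- key format is an illustrative assumption.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "api_key": re.compile(r"sk-[A-Za-z0-9]{8,}"),
}

def sanitize(text: str) -> str:
    """Replace each sensitive match with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text
```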
Audit Logging
Every function call, its parameters, execution result, and the context in which it was invoked should be logged in an immutable audit trail. This logging serves multiple purposes: debugging when things go wrong, compliance with regulatory requirements, detecting misuse or anomalous patterns, and providing training data for improving function definitions.
The audit log should capture the full function call specification, execution timestamp, the user identity and conversation context, execution success/failure status, sanitized input parameters and output results, and the model's reasoning for making the call (when available). For comprehensive guidance on maintaining safe AI systems, see our article on [AI guardrails for business](/blog/ai-guardrails-safety-business).
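One way to sketch the immutability requirement is hash-chaining each entry to its predecessor, so after-the-fact tampering is detectable. This is a simplified in-memory illustration, not a full audit implementation:

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only log where each entry's hash covers the previous entry's hash."""

    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64  # genesis value for the first entry

    def record(self, user, tool, arguments, status):
        entry = {
            "timestamp": time.time(),
            "user": user,
            "tool": tool,
            "arguments": arguments,   # sanitize before logging in production
            "status": status,
            "prev_hash": self._prev_hash,
        }
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self._prev_hash = digest
        self.entries.append(entry)
        return entry
```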
Real-World Examples
Customer Service Agent
A customer service agent equipped with function calling can resolve issues end-to-end. When a customer reports a billing discrepancy, the agent calls `lookup_customer` to retrieve the account, `get_billing_history` to review recent charges, `get_service_usage` to verify actual usage, identifies the discrepancy, calls `create_billing_adjustment` to apply a correction, calls `send_notification` to email the customer a confirmation, and calls `update_ticket` to close the support ticket with a resolution summary.
Organizations deploying function-calling agents for customer service report 55-70% first-contact resolution rates for issues that previously required multiple interactions and manual processing.
DevOps Automation Agent
A DevOps agent with access to infrastructure tools can manage routine operations autonomously. It monitors deployment pipelines via `get_pipeline_status`, investigates failures by calling `get_build_logs` and `search_error_database`, implements fixes by calling `update_configuration` or `rollback_deployment`, verifies resolution through `run_health_check`, and reports actions via `post_to_slack`.
Engineering teams using tool-equipped DevOps agents report 40% reduction in mean time to resolution for common infrastructure issues and 60% fewer after-hours pages for on-call engineers.
Sales Intelligence Agent
A sales agent connected to CRM, email, calendar, and research tools can prepare account executives for meetings by pulling the prospect's company profile and recent news, reviewing CRM activity history and previous interactions, analyzing similar deals that closed successfully, drafting a personalized meeting agenda with talking points, and scheduling follow-up tasks in the CRM.
What previously took 45 minutes of manual preparation per meeting now happens in under 3 minutes, allowing sales teams to be consistently prepared for every customer interaction.
Best Practices for Production Tool Use
Start Narrow, Expand Carefully
Begin with a small set of well-defined, low-risk tools. Validate that the model uses them correctly across a wide range of inputs. Only then add more tools. Each additional tool increases the combinatorial complexity of the system and the potential for misuse. Organizations that start with 5-10 carefully designed tools and expand to 20-30 over three months report significantly better outcomes than those that launch with 50+ tools on day one.
Invest in Function Descriptions
The natural language descriptions in your function definitions are the primary mechanism the model uses to decide which tool to use and how to use it. Spend more time writing and refining these descriptions than you think is necessary. Include explicit guidance on when to use the function, when not to use it, common edge cases, and what the function's output means.
Test with Adversarial Inputs
Users will ask your agent to do things you didn't anticipate. They'll ask it to bypass permissions, access data they shouldn't see, or perform actions outside its intended scope. Test your function calling system with adversarial prompts that attempt to manipulate tool use. Ensure your validation and permission layers hold under pressure.
Monitor and Iterate
Track function call patterns in production. Which tools are used most frequently? Which calls fail most often? Where do users get frustrated? This data is invaluable for improving function definitions, adding missing tools, and optimizing the overall agent experience. Continuous monitoring and iteration separate good tool-using agents from great ones.
Build Agents That Get Things Done
Function calling and tool use are what make AI agents genuinely useful in business. Without these capabilities, AI is a conversational partner. With them, AI is a capable worker that interacts with your systems, executes your processes, and delivers measurable results.
Ready to build AI agents that take real action? [Contact our team](/contact-sales) to see how the Girard AI platform simplifies tool integration with pre-built connectors, safety frameworks, and monitoring dashboards. Or [sign up](/sign-up) and start connecting your first tools today.