How to Build Better AI Agent Tools: Cut Costs by 70% (MCP Server Case Study)

Building tools for AI agents isn’t the same as building regular APIs. This guide shows you how to design tools that reduce token costs by 60-70% while improving accuracy. Whether you’re building Model Context Protocol (MCP) servers, LangChain tools, or custom agent functions—these principles apply.

Quick Take

I reduced my AI tool count from 30 to 8 (73% reduction) and cut token usage by 60-70% per response. This guide shows you how to:

  • Consolidate tools using action parameters
  • Optimize response formats to reduce costs
  • Write tool descriptions that AI agents understand
  • Avoid common pitfalls in AI tool design

The Problem: Why Multiple AI Tools Are Costing You Money

I thought I was being smart when I built 30 separate tools for my AI agent. Each tool did exactly one thing. Clean. Organized. Professional.

Then I got the token bill.

And watched my AI agent pick the wrong tool on 38% of test queries: calling three tools when it only needed one, requesting detailed responses when summaries would work, and burning through my budget.

Here’s what happened. I was building a Model Context Protocol (MCP) server for SharePoint integration, and I did what seemed logical: create one tool for every API endpoint. Need to get site info? That’s a tool. Need to list subsites? Another tool. Need to search? Yet another tool.

I ended up with 30 tools. It seemed organized on paper.

But when I tested it, the reality hit hard. The AI agent kept making mistakes. It would call the wrong tool, or call three tools when it only needed one. And the token costs? They were way higher than expected.

The Solution: Consolidating AI Tools for Better Performance

I took a step back and asked: “What are people actually trying to do?”

The key insight: Think about tasks, not API endpoints.

Instead of wrapping each API call in its own tool, I focused on what users were trying to accomplish. This single mindset shift changed everything.

I combined 30 tools into 8. That’s a 73% reduction. Here’s what it looked like:

Visual: The Transformation

MCP Tool Consolidation Strategy

Before

❌ get_site_info
❌ get_site_lists
❌ get_site_libraries
❌ get_site_pages
❌ search_sites
… (25 more tools)

After

✅ sharepoint_site (actions: get_info, list_subsites, search)
✅ sharepoint_list (actions: get_lists, get_items, create_item)
✅ sharepoint_files (actions: search, get_metadata, download)

Two Small Changes That Made a Big Difference

1. Action parameter - One tool can do multiple things

action: Literal["get_info", "list_subsites", "search"]

2. Response format parameter - Control how much detail you get back

response_format: Literal["concise", "detailed"] = "concise"

How Token Costs Impact AI Development Budget

Every API call your AI agent makes costs money. When your agent calls the wrong tool or requests more data than needed, those costs add up fast.

Tokens Are Expensive

Here’s a real example from my SharePoint server that made me rethink everything. When you ask for a site’s information, you can get back a lot of detail:

Detailed response (~280 tokens):

{
    "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#sites/$entity",
    "@microsoft.graph.tips": "Use $select to choose only the properties your app needs, as this can lead to performance improvements. For example: GET sites('<key>')/microsoft.graph.getByPath(path=<key>)?$select=displayName,error",
    "createdDateTime": "2025-04-12T16:40:22.963Z",
    "description": "A centralized repository for accessing country-specific HR policies and procedures across ACME Corporation's global operations.",
    "id": "spridermvp.sharepoint.com,506b7692-04ba-4be9-afc6-df146925948b,c7f4ceb0-f301-4280-8cc4-a8dba8560b64",
    "lastModifiedDateTime": "2026-01-24T13:42:56Z",
    "name": "acme-global-hr-policies-portal",
    "webUrl": "https://spridermvp.sharepoint.com/sites/acme-global-hr-policies-portal",
    "displayName": "ACME Global HR Policies Portal",
    "root": {},
    "siteCollection": {
        "hostname": "spridermvp.sharepoint.com"
    }
}

Concise response (~88 tokens):

{
    "description": "A centralized repository for accessing country-specific HR policies and procedures across ACME Corporation's global operations.",
    "lastModifiedDateTime": "2026-01-24T13:42:56Z",
    "name": "acme-global-hr-policies-portal",
    "webUrl": "https://spridermvp.sharepoint.com/sites/acme-global-hr-policies-portal"
}

Most of the time, you just need the name and URL. You don’t need all those IDs and timestamps. So I made concise the default. If the agent needs the technical details for a follow-up call, it can ask for detailed.

This approach can reduce token usage by 60-70% per response.
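Here’s a minimal sketch of how that trimming can work, assuming the raw Graph API payload arrives as a dict (the helper name and field whitelist are my own illustration, not from the Graph API):

def to_concise(site: dict) -> dict:
    """Keep only the fields an agent usually needs from a site payload."""
    essential = ("name", "webUrl", "description", "lastModifiedDateTime")
    return {key: site[key] for key in essential if key in site}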

How Does the AI Know Which Action and Format to Use?

You might be wondering: “How does the AI agent pick the right action and response format?”

The Tool Description Pattern

The answer is in your tool description. The AI reads it like instructions. Here’s an example:

@mcp.tool()
def sharepoint_site(
    action: Literal["get_info", "list_subsites", "search"],
    site_url: str | None = None,
    query: str | None = None,
    response_format: Literal["concise", "detailed"] = "concise"
) -> str:
    """
    Work with SharePoint sites.
    
    Actions:
    - get_info: Get details about a specific site (requires site_url)
    - list_subsites: List all subsites under a parent site (requires site_url)
    - search: Find sites matching a query (requires query)
    
    Response formats:
    - concise: Returns only essential information (names, titles, URLs)
    - detailed: Returns full metadata including IDs for follow-up operations
    
    Use 'detailed' only when you need technical IDs for subsequent tool calls.
    """

Example Walkthrough: Finding a Marketing Site

When a user asks “Find the marketing site”, the AI:

  1. Reads the tool description
  2. Sees that search action requires a query
  3. Picks action="search" and query="marketing"
  4. Uses default response_format="concise" since it just needs to show results
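In code, the agent’s resulting call looks roughly like this (a hypothetical invocation for illustration):

sharepoint_site(action="search", query="marketing")
# response_format is omitted, so the "concise" default applies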

Example Walkthrough: Fetching Documents

If the user then says “Get all the documents from that site”, the AI:

  1. Remembers it needs the site ID for the next call
  2. Goes back and calls the same tool with response_format="detailed"
  3. Gets the technical IDs it needs
  4. Uses those IDs in the next tool call
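Roughly, that follow-up sequence looks like this (the site URL is hypothetical):

# Step 1: re-fetch the site in detailed mode to obtain its technical ID
site_info = sharepoint_site(
    action="get_info",
    site_url="https://contoso.sharepoint.com/sites/marketing",
    response_format="detailed",
)
# Step 2: parse the ID from the detailed response and use it in the next tool call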

The Key Principle

💡 Key Insight: The AI isn’t magic—it’s following your instructions. The better you explain what each action does and when to use each format, the better it performs.

The Tradeoffs

Nothing is perfect. Here are the downsides I ran into:

1. More Complex Tool Descriptions

Before, each tool was simple: “Get site info.” Done.

Now, I have to explain multiple actions in one description. The tool description got longer. If you have 5-6 actions in one tool, it can get messy and the AI might get confused.

My rule: Keep it to 3-4 actions max per tool. If you need more, split it into two tools.

2. Harder to Debug

When something goes wrong, it’s trickier to figure out what happened. With 30 separate tools, if get_site_info failed, I knew exactly where to look.

Now, if sharepoint_site fails, I have to check: Which action was called? What parameters were passed? Was it a problem with the action logic or the parameter validation?

My solution: Add detailed logging for each action within the tool. Log the action name, parameters, and response format every time.
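A minimal sketch of that logging, using Python’s standard logging module (the logger name and helper are my own choice):

import logging

logger = logging.getLogger("mcp_tools")

def log_tool_call(tool: str, action: str, response_format: str, **params) -> None:
    """Record each dispatch so a failure can be traced to a specific action."""
    logger.info("%s: action=%s format=%s params=%s", tool, action, response_format, params)

# Inside sharepoint_site, before dispatching:
# log_tool_call("sharepoint_site", action, response_format, site_url=site_url, query=query)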

3. The AI Can Still Pick Wrong

Even with clear descriptions, the AI sometimes picks the wrong action or forgets to use detailed when it needs IDs for the next call.

This happens maybe 5-10% of the time. That’s better than the 38% wrong-tool rate I had with 30 tools, but it’s not zero.

What helps:

  • Add examples in your tool description
  • Test with real user queries
  • Use clear parameter names (site_url not just url)

4. Not Every Tool Should Be Consolidated

Some tools are better left separate. If two operations are completely different and rarely used together, don’t force them into one tool just to reduce the count.

For example, I kept user_profile and user_search as separate tools. They serve different purposes and combining them would make the description confusing.

The test: Ask yourself: “Would a person naturally think these actions belong together?” If not, keep them separate.

Reflection point: Which of these tradeoffs concerns you most for your use case? The debugging complexity or the risk of AI confusion?

When This Approach Works Best

This works great when:

  • You have multiple tools that operate on the same resource (sites, files, users)
  • The actions are related and often used in sequence
  • You’re dealing with high token costs
  • Your users do varied tasks (not just one specific workflow)

This might not work if:

  • You have very specialized, single-purpose tools
  • Each tool has completely different parameters
  • You need extremely precise error handling for each operation
  • Your users only do one or two specific tasks

What I Learned: Key Principles for AI Tool Design

1. Think about tasks, not API endpoints (Most Important!)

Don’t just wrap your API. Think about what people are trying to accomplish. This is the most important principle that drives everything else.

❌ Three separate tools: list_users, list_events, create_event
✅ One tool: schedule_event (finds availability and creates the event)
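A hypothetical signature for that task-oriented tool might look like this (the names and parameters are illustrative, not from my actual server):

@mcp.tool()
def schedule_event(
    attendees: list[str],            # names or emails; resolved to users internally
    duration_minutes: int,
    time_window: str = "next 7 days",
) -> str:
    """Find a slot that works for every attendee and create the event.
    Internally wraps the list_users, list_events, and create_event calls."""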

2. Return information people can actually read

AI agents do better with names than with cryptic IDs.

❌ user_uuid: "e1b2c3d4-e5f6-7890"
✅ user_name: "Sarah Chen, Engineering Manager"

3. Use smart defaults

  • Start with concise responses
  • Add pagination (I limit responses to 25,000 tokens; see the sketch after this list)
  • Let agents filter results to get exactly what they need
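Here’s a rough sketch of enforcing that response budget (the four-characters-per-token ratio is an approximation; use a real tokenizer if you need precision):

CHARS_PER_TOKEN = 4  # rough heuristic for English text

def truncate_response(text: str, max_tokens: int = 25_000) -> str:
    """Cap a tool response at an approximate token budget."""
    limit = max_tokens * CHARS_PER_TOKEN
    if len(text) <= limit:
        return text
    return text[:limit] + "\n[truncated: narrow your query or request a specific page]"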

4. Write tool descriptions like you’re explaining to a coworker

The AI reads your tool description. Make it clear and helpful.

❌ “Searches SharePoint”
✅ “Search across SharePoint sites, documents, and lists. Use filters to narrow results. Returns top 10 matches by default.”

The Results

Based on the consolidation and MCP best practices:

  • Total tools: 73% reduction (30 → 8)
  • Token efficiency: ~70% fewer tokens per response
  • Agent performance: faster tool selection, fewer errors
  • Monthly cost savings: 50-80% reduction (varies by query complexity)*

Note: These are projected savings based on tool consolidation and response format optimization. Actual results will vary depending on your specific use cases and query patterns.

How to Do This Yourself

Here’s a basic template you can use:

from enum import Enum
from typing import Literal

class ResponseFormat(Enum):
    DETAILED = "detailed"
    CONCISE = "concise"

@mcp.tool()  # assumes an MCP server instance, e.g. mcp = FastMCP("my-server")
def my_action_tool(
    action: Literal["search", "get", "list"],
    query: str,
    response_format: ResponseFormat = ResponseFormat.CONCISE
) -> str:
    """
    Multi-purpose tool for [resource].
    
    Actions:
    - search: Find items matching query
    - get: Retrieve specific item details
    - list: Show all available items
    
    Use 'concise' for human-readable summaries.
    Use 'detailed' when you need IDs for follow-up calls.
    """
    # perform_action, format_concise, and format_detailed are placeholders
    # for your own API call and response-shaping logic
    result = perform_action(action, query)
    
    if response_format == ResponseFormat.CONCISE:
        return format_concise(result)
    return format_detailed(result)

Quick Summary

Before you dive in, here’s the roadmap:

  1. Combine related tools using action parameters → reduces tool count and confusion
  2. Add a response_format option (concise vs detailed) → cuts token usage by 60-70%
  3. Default to concise to save tokens → agents request detailed only when needed
  4. Return human-readable information, not just IDs → improves agent decision-making
  5. Write clear tool descriptions → think of them as instructions for a coworker
  6. Test with real tasks and measure results → validate your optimizations

Other Token Reduction Techniques

Beyond tool design, consider these approaches:

  • TOON Format – A JSON alternative designed for LLMs, reducing tokens by 30-60%
  • Prompt Caching – Cache repeated context for 75% cheaper tokens
  • Model Cascading – Use cheaper models for simple tasks, up to 90% savings
  • RAG – Retrieve only relevant context instead of full documents

Want to Learn More?

The official MCP documentation has a great guide on this topic: Writing Effective Tools for Agents

Take Action

Ready to optimize your AI tools?

Next Steps:

  1. Audit your current tools - how many could be combined?
  2. Identify which tools could benefit from response format options
  3. Start with your highest-traffic tools for maximum impact
  4. Measure token usage before and after

Questions or feedback? I’d love to hear about your optimization results or challenges you’re facing. What’s your tool count, and which optimization would help your use case most?

The Bottom Line

The key insight: Think about tasks, not API endpoints. This single principle drives everything else in AI tool design.

Building tools for AI isn’t the same as building regular APIs. I cut my tool count by 73%, and the response-format change can reduce token usage by 60-70% per response, depending on data complexity. The agent worked better, costs went down, and maintenance became simpler.

Sometimes less really is more.
