A Tale of Three MCP Servers

Carlisia Campos
MCP Technical Strategist

Published August 15, 2025

Let me tell you about Eduardo and Monica (😬 if you know, you know, lolz), and Bruno, three developers who learned this lesson very differently.

Keep in mind:

A tool (handler) can either contain all the code to be executed or simply call another function that serves as an entry point for the actual work. Below, I’m showing only these entry-point functions.
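
To make that distinction concrete, here is a minimal sketch of both shapes. It isn’t tied to any particular MCP SDK, and the handler and Summarize names are made up for illustration (imports assumed: errors, strings):

// Option 1: the tool handler contains all of the code to be executed.
func summarizeHandlerInline(args map[string]any) (string, error) {
    text, ok := args["text"].(string)
    if !ok {
        return "", errors.New("missing or invalid 'text' argument")
    }
    // All of the work happens directly inside the handler.
    sentences := strings.SplitAfter(text, ". ")
    if len(sentences) > 3 {
        sentences = sentences[:3]
    }
    return strings.Join(sentences, ""), nil
}

// Option 2: the handler is a thin adapter that validates arguments and then
// delegates to an entry-point function; the examples below show only these
// entry-point functions.
func summarizeHandlerDelegating(args map[string]any) (string, error) {
    text, ok := args["text"].(string)
    if !ok {
        return "", errors.New("missing or invalid 'text' argument")
    }
    return Summarize(text, 3), nil
}

// Summarize is the entry point that does the actual work.
func Summarize(text string, maxSentences int) string {
    sentences := strings.SplitAfter(text, ". ")
    if len(sentences) > maxSentences {
        sentences = sentences[:maxSentences]
    }
    return strings.Join(sentences, "")
}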

Monica’s FlexiServer: why just one thing?

Monica built FlexiServer, aiming to create the “Swiss Army knife of MCP servers”, one that could handle any text processing task and be extensible:

Code:

type OperationType string // stringly-typed
type ProcessConfig map[string]any // a grab bag
type ProcessResult struct {
    Data interface{} `json:"data"` // returns anything
    Type string      `json:"type"` // requires interpretation
}
 
// Process executes any text processing operation based on the operation type.
// The config parameter requirements vary by operation.
// Returns ProcessResult whose structure depends on the operation.
func Process(input string, operation OperationType, config ProcessConfig) (*ProcessResult, error)
 
// Apply runs a processing pipeline on text.
// Each step is executed in sequence with the output of one feeding the next.
// Returns a PipelineResult whose structure varies based on the final step.
func Apply(text string, steps []PipelineStep) (*PipelineResult, error)
 
// Execute performs text operations based on natural language commands.
// Interprets commands like "make this shorter" or "find key points".
// Returns an ExecutionResult with command-specific structure.
func Execute(content string, command NaturalCommand) (*ExecutionResult, error)
 
// Transform modifies text based on the provided ruleset.
// Rules can be regex patterns, templates, or custom transformers.
// Output structure depends on the rule type applied.
func Transform(input TextInput, rules TransformRules) (*TransformOutput, error)
 
// RegisterExtension adds new processing capabilities at runtime.
// Extensions are identified by ExtensionID like "sentiment" or "translate.spanish".
func RegisterExtension(id ExtensionID, handler ExtensionHandler) error

With unconstrained flexibility and ambiguous interface definitions, this design creates uncertainty for LLM usage:

  • Tool selection token burn – With Process, Execute, and Transform all modifying text, LLMs waste tokens reasoning through which to use and explaining their choice to users
  • Configuration discovery overhead – ProcessConfig map[string]any forces trial-and-error (see the sketch after this list). Each failed attempt adds request + error + retry reasoning to the context window
  • Verbose operation explanations – Since OperationType values like “enhance” have no clear meaning, LLMs burn tokens explaining what they think might happen
  • Compound ambiguity costs – Execute with natural language commands like “make it better” requires tokens for: interpreting the command + explaining uncertainty + handling unpredictable results
  • Runtime capability confusion – RegisterExtension means LLMs can’t cache which operations exist, requiring fresh discovery and adding explanation tokens each session
  • Error cascade verbosity – When Process fails due to wrong config keys, the LLM needs tokens to explain the error, guess correct parameters, and retry
  • Result interpretation overhead – With ProcessResult.Data interface{}, LLMs waste tokens explaining what type of data they received and how they’re interpreting it, instead of just presenting results
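
To make that trial-and-error concrete, here is a hedged sketch of an LLM-driven call to Process. The guessTheConfig wrapper and the config keys are hypothetical, which is exactly the problem: nothing in the schema says which keys “enhance” actually reads.

func guessTheConfig() {
    result, err := Process(
        "Our API returns JSON.",
        OperationType("enhance"),
        ProcessConfig{
            "mode":  "professional", // guess #1: maybe the key is "mode"?
            "level": 3,              // guess #2: or a numeric "level"?
        },
    )
    if err != nil {
        // Typical failure: an unknown config key, or an option that is
        // silently ignored. The LLM now explains the error, guesses new
        // keys, and calls again, all inside the context window.
        return
    }
    _ = result // even on success, result.Data is interface{} and still needs interpreting
}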

Note

The flexible design undermines reliability at every stage: selection (ambiguous boundaries), execution (unpredictable behavior), and integration (inconsistent outputs).

Bruno’s DocInspector: bring my own APIs, all of them

Bruno built DocInspector. He already had a comprehensive REST API for document analytics, so his intention was simple: be efficient and reuse all 47 existing endpoints by mapping each one directly to an MCP tool. Why reinvent the wheel when the APIs were already tested and deployed?

Code:

// DocumentWordCount returns the total word count
func DocumentWordCount(path string) (int, error)
 
// DocumentCharacterCount returns the total character count
func DocumentCharacterCount(path string) (int, error)
 
// DocumentLineCount returns the total line count
func DocumentLineCount(path string) (int, error)
 
// DocumentParagraphCount returns the total paragraph count
func DocumentParagraphCount(path string) (int, error)
 
// DocumentHeading1Count returns count of H1 headings
func DocumentHeading1Count(path string) (int, error)
 
// DocumentHeading2Count returns count of H2 headings
func DocumentHeading2Count(path string) (int, error)
 
// DocumentCodeBlockCount returns count of code blocks
func DocumentCodeBlockCount(path string) (int, error)
 
// DocumentPythonCodeBlockCount returns count of Python code blocks
func DocumentPythonCodeBlockCount(path string) (int, error)
 
// DocumentJavaScriptCodeBlockCount returns count of JavaScript code blocks
func DocumentJavaScriptCodeBlockCount(path string) (int, error)
 
// CheckDocumentHasTableOfContents returns true if doc has a TOC
func CheckDocumentHasTableOfContents(path string) (bool, error)
 
// CheckDocumentHasIntroduction returns true if doc has an intro section
func CheckDocumentHasIntroduction(path string) (bool, error)
 
// DocumentReadingTimeInSeconds returns estimated reading time
func DocumentReadingTimeInSeconds(path string) (int, error)
 
// DocumentFleschScore returns Flesch readability score
func DocumentFleschScore(path string) (float64, error)
 
// ... 30+ more ultra-specific tools

With excessive granularity and fragmented operations, this design creates orchestration overhead for LLM usage:

  • Tool discovery overhead – Before executing any request, LLMs must parse and evaluate 50+ tool definitions (each 100-200 tokens), consuming 5,000-10,000 tokens just to understand available capabilities
  • Orchestration complexity – A “document analysis” request requires the LLM to construct a directed acyclic graph of 15-30 tool calls, reason about dependencies, and maintain execution order
  • Context window exhaustion – Each tool call adds ~200-500 tokens (request + response + reasoning). A 20-tool sequence consumes 4,000-10,000 tokens in intermediate state alone, leaving insufficient room for document content
  • Latency amplification – Serial dependencies prevent parallelization. If ReadingTimeInSeconds depends on WordCount, and FleschScore depends on both, you have a critical path of 3+ sequential round-trips
  • High failure surface area – More tools mean more failure points. When tool 15 of 20 fails, the LLM must decide whether partial results are acceptable or if the entire analysis is compromised
  • Semantic ambiguity at scale – LLMs must infer relationships between DocumentHeading1Count, DocumentHeading2Count…DocumentHeading6Count versus a single DocumentHeadingStructure that returns hierarchical data (see the sketch below)
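
Here is a rough sketch of what that consolidated DocumentHeadingStructure could return; the exact field names are an assumption, one plausible shape among several:

// HeadingNode is one heading plus its nested children.
type HeadingNode struct {
    Level    int           `json:"level"` // 1 through 6
    Text     string        `json:"text"`
    Children []HeadingNode `json:"children"`
}

// HeadingStructure summarizes the whole hierarchy in a single response.
type HeadingStructure struct {
    CountsByLevel map[int]int   `json:"counts_by_level"` // replaces six counter tools
    Outline       []HeadingNode `json:"outline"`
}

// DocumentHeadingStructure returns the full heading hierarchy in one call,
// instead of DocumentHeading1Count ... DocumentHeading6Count.
func DocumentHeadingStructure(path string) (HeadingStructure, error)

One call, one structured result, and the relationship between heading levels is explicit instead of something the LLM has to infer.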

Note

The atomic design degrades user experience at every stage: selection (overwhelming choices), execution (excessive latency), and integration (fragmented results).

Eduardo’s DocQualityAdvisor: why so picky?

Eduardo built DocQualityAdvisor. He had one precise intention: help developers understand and improve their documentation quality with clear, actionable feedback, so that their tools become reliably discoverable and easy to use:

Code:

// AnalyzeDocumentationQuality examines technical documentation for common quality issues
// including missing sections, unclear explanations, and structural problems.
// Use this to get specific improvement recommendations for developer documentation.
// Returns a QualityReport with scored issues and actionable suggestions for each problem found.
func AnalyzeDocumentationQuality(path string) (QualityReport, error)
 
// CheckLinkValidity validates all links in documentation, identifying broken links,
// redirects, and invalid anchor references. Use this to ensure all references work correctly.
// Set checkExternal to true to also validate external URLs (slower but more thorough).
// Returns a LinkReport containing broken links, redirect chains, and anchor mismatches.
func CheckLinkValidity(path string, checkExternal bool) (LinkReport, error)
 
// CalculateReadabilityMetrics analyzes text complexity and reading difficulty.
// Use this to ensure documentation matches your target audience's reading level.
// Returns grade level, estimated reading time, and technical jargon density.
func CalculateReadabilityMetrics(path string) (ReadabilityMetrics, error)
 
// ExtractDocumentStructure parses the hierarchical organization of a document.
// Use this to analyze navigation flow and identify structural issues like
// missing sections or imbalanced content depth.
// Returns a DocumentStructure with heading hierarchy, section balance, and navigation tree.
func ExtractDocumentStructure(path string) (DocumentStructure, error)

With intentional constraints and boundaries, this design optimizes for LLM efficiency:

  • Deterministic tool selection – Names like CheckLinkValidity and CalculateReadabilityMetrics create clear decision boundaries, minimizing selection tokens because LLMs don’t need exploratory reasoning
  • Zero overlap design – Each tool owns exclusive functionality (links OR readability OR structure), preventing token waste because LLMs never compare similar tools
  • Predictable parameter contracts – Fixed, typed parameters like (path string, checkExternal bool) eliminate configuration discovery overhead because LLMs can see exactly what’s required from the schema
  • Structured return types – LinkReport and ReadabilityMetrics provide consistent schemas, allowing direct result formatting because LLMs know the exact structure in advance (see the sketch after this list)
  • Composability without coupling – Tools can be called independently or together, enabling simpler orchestration because there are no hidden dependencies between tools
  • Context window efficiency – Clear, focused tools need minimal description tokens while providing maximum clarity because each tool does one thing well
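
I haven’t shown Eduardo’s return types above, so treat these shapes as assumptions rather than the real definitions; the point is that every field is named and typed, which is why the LLM can format results directly instead of interpreting an interface{} blob:

type BrokenLink struct {
    URL        string `json:"url"`
    Location   string `json:"location"`    // file and heading where the link appears
    StatusCode int    `json:"status_code"` // 0 if the request never completed
}

type LinkReport struct {
    TotalLinks  int          `json:"total_links"`
    BrokenLinks []BrokenLink `json:"broken_links"`
    Redirects   []string     `json:"redirects"`
    BadAnchors  []string     `json:"bad_anchors"`
}

type ReadabilityMetrics struct {
    GradeLevel         float64 `json:"grade_level"`
    ReadingTimeSeconds int     `json:"reading_time_seconds"`
    JargonDensity      float64 `json:"jargon_density"` // technical terms per 100 words
}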

Note

The design optimizes for LLM effectiveness at every stage: selection (obvious choice), execution (predictable behavior), and integration (composable results).

Six months later

Monica’s FlexiServer:

  • 47 open GitHub issues: “Process returns string but AI expects array”, “What does operation=‘enhance’ actually do?”, “ProcessConfig completely undocumented”
  • Monica spends more time explaining parameter combinations than adding features
  • 3 frustrated blog posts: “Why I Gave Up on MCP After 2 Weeks”
  • Average token usage: 3x higher due to LLMs repeatedly guessing valid combinations

Bruno’s DocInspector:

  • 12 open issues: “Why does analyzing a document require 47 API calls?”, “Timing out after 30 seconds”
  • Bruno’s server works perfectly for direct API users, but fails for LLM interactions
  • One viral tweet: “MCP servers: Death by a thousand tool calls”
  • 90% of users only use the newly added AnalyzeDocument aggregate tool

Eduardo’s DocQualityAdvisor:

  • Integrated into 4 major documentation platforms
  • Pull request: “This is exactly what we needed, each tool does one thing perfectly!”
  • Featured in MCP showcase: “Example of thoughtful MCP server design that LLMs love”
  • A community fork expanded it, following the same easy-to-follow agentic intention framework

The key differences? Eduardo designed for agentic interaction. Monica designed for flexibility. Bruno designed for API reuse.

The pitfalls of unclear intention

After seeing Eduardo, Monica, and Bruno’s six-month outcomes, you might be ready to craft your own clear intention for your MCP servers and tools using The agentic intention framework.

But if you’re still unconvinced that unclear intentions create real problems, here are specific pitfalls across three critical categories. These aren’t theoretical concerns; they’re inevitable challenges for LLMs.

1) LLM interaction pitfalls

a) The tool granularity trap

Without clear intention, developers swing between extremes:

Too granular (Bruno): 50+ atomic tools force LLMs to orchestrate complex call sequences for simple requests.

Too broad (Monica): Ambiguous mega-tools leave LLMs guessing about parameters, outputs, and chaining behavior.

Just right (Eduardo): Each tool serves one specific need with clear boundaries.

b) The discovery paradox

The more flexible the tools, the less discoverable they become:

  • Monica’s “process any text” → When would an LLM choose this?
  • Bruno’s “get any metric” → Which of 50 tools to call?
  • Eduardo’s “find quality issues” → Clear purpose, obvious choice

The paradox: trying to be useful for everything makes you useful for nothing specific.

c) The composition breakdown

Unclear intention sabotages tool composition, imo MCP’s most powerful feature:

  • Monica: Unpredictable outputs (enhance → ??? → format → ???)
  • Bruno: Excessive orchestration (4 calls for what should be one)
  • Eduardo: Natural chaining (structured outputs enable parallelization)

Good composition requires:

  • Predictable outputs that become inputs
  • Independence (tool B doesn’t break if tool A fails)
  • Clear data flow (obvious what chains with what)
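
Here is a rough sketch of that chaining written out as plain Go; the DocReview type and reviewDocs function are illustrative, not part of Eduardo’s server:

type DocReview struct {
    Quality *QualityReport
    Links   *LinkReport
    Metrics *ReadabilityMetrics
}

func reviewDocs(path string) DocReview {
    var review DocReview

    // Independent calls: none depends on another's output, so an agent can
    // issue them in parallel, and one failure doesn't poison the rest.
    if q, err := AnalyzeDocumentationQuality(path); err == nil {
        review.Quality = &q
    }
    if l, err := CheckLinkValidity(path, false); err == nil { // skip external URLs for speed
        review.Links = &l
    }
    if m, err := CalculateReadabilityMetrics(path); err == nil {
        review.Metrics = &m
    }

    // Clear data flow: each field is a typed, structured result, so a partial
    // failure still yields a useful, partial review.
    return review
}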

2) Technical design pitfalls

a) The context window waste

Every token counts, and unclear designs burn them recklessly:

Discovery: Bruno’s 50+ tool definitions consume 5,000+ tokens before work begins. Monica’s vague descriptions require lengthy explanations. Eduardo’s focused tools need minimal description.

Execution: Bruno needs 20 calls × 200 tokens = 4,000 tokens of orchestration. Monica multiplies tokens through failed attempts and retries. Eduardo uses single-purpose calls with predictable results.

Results: Bruno accumulates 20 small results. Monica requires explanation of ambiguous outputs. Eduardo’s structured results speak for themselves.
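
Put as a back-of-envelope calculation, using the rough per-tool and per-call estimates above rather than measurements (the roughTokenBudgets helper is just for illustration), the gap looks like this:

func roughTokenBudgets() (bruno, eduardo int) {
    // Discovery: tokens spent just loading tool definitions into context.
    brunoDiscovery := 50 * 150  // ~50 tools at ~100-200 tokens each
    eduardoDiscovery := 4 * 150 // 4 focused tools at the same rate

    // Execution: request + response + reasoning for each call.
    brunoExecution := 20 * 200  // ~20 chained calls for one analysis request
    eduardoExecution := 3 * 200 // a few independent calls (assumed)

    // Roughly 11,500 tokens for Bruno versus 1,200 for Eduardo, before any
    // document content enters the context window.
    return brunoDiscovery + brunoExecution, eduardoDiscovery + eduardoExecution
}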

b) The hidden dependencies trap

Unclear intention creates non-obvious coupling:

  • Monica: Runtime registration (RegisterExtension before Process) creates fragile dependencies (see the sketch after this list)
  • Bruno: Each call depends on previous results, filling context with intermediate state
  • Eduardo: Every tool is self-contained and independently callable
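
Here is a hedged sketch of what that fragility looks like in practice; the “sentiment” extension and the analyzeSentiment wrapper are hypothetical, and it assumes registered extensions become valid OperationType values, something the interface never states:

func analyzeSentiment() {
    // Works only if some earlier call in this session ran, for example:
    //   RegisterExtension("sentiment", sentimentHandler)
    // Nothing in the tool schema tells the LLM that registration is a
    // prerequisite.
    res, err := Process("I love this docs site!", OperationType("sentiment"), nil)
    if err != nil {
        // Likely error: unknown operation "sentiment". The LLM discovers
        // mid-conversation that a hidden setup step was required.
        return
    }
    _ = res
}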

The real cost isn’t the dependencies themselves; it’s forcing LLMs to become orchestration engines instead of problem solvers, burning tokens on state management rather than solutions.

3) Evolution pitfalls

The feature creep spiral

Without clear intention, every user feature request seems reasonable:

Monica’s spiral:

  1. “Can Process handle JSON?” → Add JSON mode
  2. “What about YAML?” → Add YAML mode
  3. “Can it validate too?” → Add validation modes
  4. “Transform between formats?” → More modes

Result: 47 modes, none work reliably

Bruno’s spiral:

  1. “We need WordCount” → Add endpoint
  2. “Also need CharCount” → Add endpoint
  3. “And LineCount” → Add endpoint

Result: 50+ endpoints, terrible UX

Eduardo’s triage:

  1. “Can it fix the issues it finds?” → No, that’s a different intention
  2. “Add markdown-to-HTML conversion?” → No, that’s transformation, not quality analysis
  3. “Add auto-translation?” → No, that’s content generation, not quality checking
  4. “Add plagiarism detection?” → No, that’s content verification, not documentation quality
  5. “Add SEO optimization?” → No, that’s marketing, not documentation quality

Result: Focused tools that excel at their purpose

A clear, upfront intention is our defense against the feature creep spiral. It gives us clarity and permission to say “that’s a great idea for a different MCP server.”