Research

The research and patterns behind Papaya's optimization engine.

Papaya's 200+ analyses are grounded in current agent and LLM research — from context economy and trajectory health to model routing and verification design. Below: the research areas we draw from, the questions our engines ask, and the optimization patterns they look for in production runs.

Research areas

What we study.

Six of many — growing with every release.

01
Context economy
Detect what the model actually reads vs. what it's sent — informed by recent context-distillation research.
02
Tool-call quality
Score tools on consistency, signal-to-noise, and whether their output changes the model's behavior.
03
Verification design
Check whether self-checks actually catch failures the run produced — not just whether they ran.
04
Plan-and-execute shape
Identify reasoning that should be staged, parallelized, or collapsed based on observed run traces.
05
Retry pathology
Cluster retries by root cause: parsing, tool flakiness, or instruction conflicts.
06
Prompt-program structure
Surface scaffolding patterns that consistently outperform on similar workloads.
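
To make the context-economy idea concrete, here is a minimal sketch of a sent-versus-read check. It is not Papaya's method: it assumes a crude proxy where a context field counts as "read" if any of its longer words shows up in the model's output. The function name `unused_context_fields`, the `min_token_len` parameter, and the sample data are all hypothetical.

```python
import re


def unused_context_fields(context: dict[str, str], output: str,
                          min_token_len: int = 4) -> list[str]:
    """Return context field names with no word overlap against the output.

    Assumption: lexical overlap is only a rough stand-in for "the model
    used this field"; real analyses would need stronger signals.
    """
    output_tokens = set(re.findall(r"\w+", output.lower()))
    unused = []
    for name, value in context.items():
        # Keep only longer words so stopwords like "the" don't count as use.
        tokens = {t for t in re.findall(r"\w+", value.lower())
                  if len(t) >= min_token_len}
        if tokens and not (tokens & output_tokens):
            unused.append(name)
    return unused


# Hypothetical example data.
context = {
    "order_id": "Order A1734 placed on Tuesday",
    "shipping_policy": "Returns accepted within thirty days",
    "legal_footer": "Copyright notice, trademarks reserved",
}
output = "Your order A1734 can be returned within thirty days."
print(unused_context_fields(context, output))  # ['legal_footer']
```

A word-overlap check will miss paraphrased use and flag fields the model relied on silently, so treat the result as a prompt for review rather than a verdict.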

Sample questions

Sample questions we will answer.

01
Did the workflow actually complete the task?
Or did it claim success without doing the work the user asked for?
02
Are you sending the right context?
Critical fields the model needs may be missing, while bloat drowns out what is there.
03
Is the model actually reading what you send?
Half the prompt may be invisible to the answer.
04
Are your tools returning the right amount?
Tool output that quietly pollutes the next step's context.
05
Are your tool calls at the right layer?
Some belong in a sub-task; others belong inline.
06
Are sub-agents redoing the parent's work?
Hand-offs dropping context, roles unclear.

Sample optimization patterns

200+ research-backed analyses checking your traces.

Papaya runs every analysis on your workflows. Individual findings are tailored to a specific workflow, while these pages explain the broader pattern, impact, and common fixes.

Missing context
Inputs the model needed to succeed weren't in the prompt or available via tools.
Unused context
Context was sent but never referenced — bloat without payoff.
Oversized context
Prompts so large the signal gets drowned or truncated downstream.
Repeated prompt fragments
The same instructions or examples sent over and over across steps.
Clarification burden
The agent keeps asking the user for things it could resolve itself.
Retry loops
The same step retried without changing inputs, tools, or strategy.
Truncation / resume churn
Runs cut off mid-thought and resumed with lossy state.
Tool misuse
Right tool, wrong arguments — or wrong tool for the job entirely.
Tool error loop
Repeated failing tool calls without escalation or fallback.
Composite tool opportunity
A sequence of tool calls that should be collapsed into one operation.
Mutation safety
Writes happening without confirmation, idempotency, or rollback.
State delta grounding
Decisions made on stale state instead of the latest observed change.
Output contract mismatch
Returned shape doesn't match what the caller — or the next step — expects.
Outcome cohort gap
Whole classes of runs missing from your evals or success metrics.
Evaluation coverage
Evals that don't exercise the failure modes production actually hits.
Model right-sizing
Over- or under-powered model choices for the step's real difficulty.
Delegated task overhead
Sub-agents adding latency and tokens without earning the hand-off.
Workflow ordering
Steps run in an order that forces rework or blocks parallelism.
Template opportunity
An ad-hoc workflow that recurs often enough to deserve a reusable template.
Missing verification gate
No structured check between a risky step and what it affects.
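
As an illustration of one pattern above, the retry-loop description (the same step retried without changing inputs, tools, or strategy) can be sketched as a scan over a run trace. This is a hedged example, not Papaya's implementation: the `Step` record, its fields, and the `min_attempts` threshold are assumptions made for the sketch.

```python
import json
from dataclasses import dataclass
from itertools import groupby
from typing import Any


@dataclass(frozen=True)
class Step:
    """Hypothetical trace record: one attempt at one step."""
    name: str
    tool: str
    inputs: dict[str, Any]
    ok: bool


def signature(step: Step) -> tuple[str, str, str, bool]:
    # Serialize inputs with sorted keys so equal dicts compare equal.
    return (step.name, step.tool,
            json.dumps(step.inputs, sort_keys=True, default=str), step.ok)


def find_retry_loops(steps: list[Step],
                     min_attempts: int = 3) -> list[tuple[str, int]]:
    """Report runs of consecutive failed attempts with an identical signature."""
    loops = []
    for sig, group in groupby(steps, key=signature):
        attempts = len(list(group))
        name, _tool, _inputs, ok = sig
        if not ok and attempts >= min_attempts:
            loops.append((name, attempts))
    return loops


# Hypothetical trace: three identical failures, then a changed input succeeds.
trace = [
    Step("fetch_invoice", "http_get", {"id": 42}, ok=False),
    Step("fetch_invoice", "http_get", {"id": 42}, ok=False),
    Step("fetch_invoice", "http_get", {"id": 42}, ok=False),
    Step("fetch_invoice", "http_get", {"id": 42, "timeout": 30}, ok=True),
    Step("summarize", "llm", {"doc": "invoice"}, ok=True),
]
print(find_retry_loops(trace))  # [('fetch_invoice', 3)]
```

Grouping only consecutive attempts keeps the check simple; retries interleaved with other steps would need a per-step counter instead.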

See it on your workflows

The fastest way to understand the research is to see it applied to your own runs.

Talk to us
