Compresr docs

Quick start

Send your first compression request in 30 seconds.

By the end of this page you will have made your first compression request and seen the token savings.

  1. Install the SDK

    Install the client for your language. The cURL path needs no installation, since curl ships with most operating systems.

    bash
  2. Get an API key

    Create a key in the dashboard, copy its cmp_... value, and export it as COMPRESR_API_KEY. See Authentication for the full key model and security guidance.

  3. Send a compression request

    Pass the long context you would otherwise send to your LLM, plus the query you want it to answer. Always set compression_model_name: "latte_v1".

    python
  4. Inspect the result

    The response contains the compressed text plus token-savings stats. This is the actual response from the live API for the call above (duration_ms depends on load, so the numbers vary slightly from run to run):

    text

    What the fields mean

    • compressed_context - the shortened text. Forward this to your LLM exactly as you would the original input. [N tokens dropped] markers show where spans were cut; pass disable_placeholders=True if you want a clean concatenation without them.
    • actual_compression_ratio - the fraction of input tokens removed (here 0.6375, so about 64% removed). It is not an Nx factor.
    • target_compression_ratio - the value you asked for, echoed back. 0–1 = removal strength; >1 = Nx factor (max 200).
    • tokens_saved - original_tokens minus compressed_tokens.
    • duration_ms - server-side compression time. Network round-trip time is added on top of this.
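The relationship between the stats fields can be checked with a few lines of arithmetic. The sketch below is illustrative only: the CompressionStats class, the describe_target helper, and the token counts (80 and 29) are hypothetical, chosen so that the removal fraction comes out at the 0.6375 quoted above. It also shows how a removal fraction converts to an equivalent Nx factor, since the two are easy to confuse.

```python
from dataclasses import dataclass

MAX_NX_FACTOR = 200  # upper bound for Nx targets, as stated in the field notes


@dataclass
class CompressionStats:
    """Hypothetical holder for the two token counts in a response."""
    original_tokens: int
    compressed_tokens: int

    @property
    def tokens_saved(self) -> int:
        return self.original_tokens - self.compressed_tokens

    @property
    def removal_fraction(self) -> float:
        # Same meaning as actual_compression_ratio: share of input tokens removed.
        return self.tokens_saved / self.original_tokens

    @property
    def nx_factor(self) -> float:
        # Equivalent "Nx" shrink factor: original size divided by compressed size.
        return self.original_tokens / self.compressed_tokens


def describe_target(ratio: float) -> str:
    """Interpret a target_compression_ratio: 0-1 is removal strength, >1 is Nx."""
    if ratio < 0 or ratio > MAX_NX_FACTOR:
        raise ValueError(f"ratio must be between 0 and {MAX_NX_FACTOR}")
    if ratio <= 1:
        return f"removal strength {ratio:g}"
    return f"{ratio:g}x factor"


# Made-up counts that reproduce the 0.6375 figure from the field notes.
stats = CompressionStats(original_tokens=80, compressed_tokens=29)
print(stats.tokens_saved)          # 51
print(stats.removal_fraction)      # 0.6375
print(round(stats.nx_factor, 2))   # 2.76
print(describe_target(0.5))        # removal strength 0.5
print(describe_target(10))         # 10x factor
```

Under these assumed counts, removing about 64% of the tokens corresponds to roughly a 2.76x reduction, which is why actual_compression_ratio should not be read as an Nx factor.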

What just happened?

latte_v1 scored every paragraph in your context against the query, kept the ones that answer it, and dropped the rest. The paragraph with the mirror-diameter sentence stayed; the L2 orbit, sunshield, launch, and operator paragraphs did not. You sent ~64% fewer input tokens to your downstream model without losing the answer.
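To make the keep-or-drop idea concrete, here is a toy stand-in written for illustration. It is not how latte_v1 works internally: it scores paragraphs by plain word overlap with the query, counts whitespace-separated words instead of real tokens, and uses a made-up sample context. The toy_compress function, its threshold, and the stop-word list are all hypothetical.

```python
import re

# Small hypothetical stop-word list so filler words do not count as matches.
STOP_WORDS = {"what", "is", "the", "of", "a", "an", "in", "and", "to", "it"}


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric words, minus stop words."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS


def toy_compress(context: str, query: str, threshold: float = 0.5,
                 disable_placeholders: bool = False) -> str:
    """Keep paragraphs sharing at least `threshold` of the query's words."""
    query_words = content_words(query)
    if not query_words:
        return context  # nothing to score against, so keep everything
    pieces = []
    for paragraph in context.split("\n\n"):
        score = len(content_words(paragraph) & query_words) / len(query_words)
        if score >= threshold:
            pieces.append(paragraph)
        elif not disable_placeholders:
            # Whitespace word count stands in for a real token count.
            pieces.append(f"[{len(paragraph.split())} tokens dropped]")
    return "\n\n".join(pieces)


# Made-up sample text, loosely shaped like the walkthrough's telescope context.
CONTEXT = "\n\n".join([
    "The primary mirror has a diameter of 6.5 meters.",
    "The observatory orbits near the L2 point.",
    "A five-layer sunshield keeps the instruments cold.",
])
QUERY = "What is the diameter of the primary mirror?"

print(toy_compress(CONTEXT, QUERY))
print(toy_compress(CONTEXT, QUERY, disable_placeholders=True))
```

With this sample, only the mirror paragraph survives; the other two are replaced by placeholders, or omitted entirely when disable_placeholders is set.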

Want a cleaner output without the [N tokens dropped] placeholders? Pass disable_placeholders=True. For finer-grained, sentence-level cuts inside a paragraph, pass coarse=False. See the models reference for the full parameter list.
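The options mentioned above could be gathered into a single request payload along these lines. This is a sketch under assumptions, not the SDK's actual interface: the build_payload helper and the "context" and "query" key names are hypothetical, and only compression_model_name, target_compression_ratio, disable_placeholders, and coarse are taken from this page. Check the SDK or HTTP reference for the real request shape.

```python
import json

# Option names mentioned on this page; anything else is rejected.
ALLOWED_OPTIONS = {"target_compression_ratio", "disable_placeholders", "coarse"}


def build_payload(context: str, query: str, **options) -> dict:
    """Assemble a request body; 'context' and 'query' keys are assumed names."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise TypeError(f"unknown options: {sorted(unknown)}")
    return {
        "context": context,                    # assumed key name
        "query": query,                        # assumed key name
        "compression_model_name": "latte_v1",  # always set, per step 3
        **options,                             # only options the caller passed
    }


payload = build_payload(
    "long context goes here",
    "question goes here",
    disable_placeholders=True,  # no [N tokens dropped] markers
    coarse=False,               # sentence-level cuts inside paragraphs
)
print(json.dumps(payload, indent=2))
```

Passing only the options you set explicitly avoids guessing at server-side defaults, which this page does not state.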

Next steps

  • Python SDK - full method reference, async variants, streaming, batching.
  • TypeScript SDK - same surface, camelCase params.
  • cURL / HTTP - raw REST reference.
  • Models - tune target_compression_ratio and other latte-only options.
  • Agent client - drop-in for anthropic.Anthropic() / openai.OpenAI() with automatic tool-output compression.
  • Web search - add Tavily or Brave to your agent loop in one line.
  • LangChain integration - first-party middleware for tool outputs, history, and outbound prompts, plus a BaseDocumentCompressor for RAG.
  • LangGraph integration - state-graph node, lossy checkpoint serializer, store wrapper, and multi-agent handoff tool.
  • LlamaIndex integration - query-engine postprocessor, tool wrapper, and Memory API block.
  • LiteLLM integration - drop the compresr guardrail into the proxy and compress tool messages across every provider transparently.