Quick start

Send your first compression request in 30 seconds.

By the end of this page you will have made your first compression request and seen the token savings.

Install the SDK
Install the client for your language. The cURL path needs no install: curl ships with most operating systems.
Get an API key
Create a key in the dashboard, copy its cmp_... value, and export it as COMPRESR_API_KEY. See Authentication for the full key model and security guidance.
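Before sending anything, it can help to confirm the key is actually visible to your process. The following is a minimal sketch, not part of the SDK: load_api_key is a hypothetical helper, and the check on the cmp_ prefix assumes every key starts that way, as the dashboard value suggests.

```python
import os

KEY_ENV_VAR = "COMPRESR_API_KEY"  # environment variable named in this step
KEY_PREFIX = "cmp_"               # assumption: all keys share this prefix


def load_api_key(env=None):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{KEY_ENV_VAR} is not set; export it before running.")
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError(f"{KEY_ENV_VAR} does not start with {KEY_PREFIX!r}.")
    return key
```

Failing early here gives a clearer message than an authentication error from the API later on.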
Send a compression request
Pass the long context you would otherwise send to your LLM, plus the query you want it to answer.
- query: what the LLM needs to answer. Compression keeps the tokens relevant to it, so be specific — good: "What was the project's Q3 churn rate?", bad: "churn" (no intent, degrades to generic compression). Learn more about queries in Query-specific compression.
- target_compression_ratio: how hard to compress, from 0 to 1 (e.g. 0.75 removes ~75% of tokens). Try a few values, or set dynamic=True to auto-pick. See the models reference.
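The two parameters above can be validated locally before a request goes out. This is a sketch under stated assumptions rather than the SDK's own API: build_compression_payload is a hypothetical helper, context is an assumed name for the long input text, and treating dynamic as an alternative to a fixed ratio is an interpretation of the description above. Only query, target_compression_ratio, and dynamic are names taken from this page.

```python
import json


def build_compression_payload(context, query, target_compression_ratio=None, dynamic=False):
    """Assemble and validate the arguments for a compression request.

    `context` is an assumed field name; the other names come from this page.
    """
    if not query or not query.strip():
        raise ValueError("query must be a non-empty, specific question")
    payload = {"context": context, "query": query.strip()}
    if dynamic:
        # Assumption: dynamic mode picks the ratio, so no fixed ratio is sent.
        payload["dynamic"] = True
        return payload
    if target_compression_ratio is None:
        raise ValueError("set target_compression_ratio or dynamic=True")
    if not 0 <= target_compression_ratio <= 1:
        raise ValueError("target_compression_ratio must be between 0 and 1")
    payload["target_compression_ratio"] = target_compression_ratio
    return payload


# Example: ask for roughly 75% of tokens to be removed.
payload = build_compression_payload(
    context="...long document text...",
    query="What was the project's Q3 churn rate?",
    target_compression_ratio=0.75,
)
print(json.dumps(payload, indent=2))
```

Pass the resulting values to whichever client you installed; check the SDK reference for the exact call and parameter names.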
Inspect the result
The response contains the compressed text plus token-savings stats. This is the actual response from the live API for the call above (numbers will vary slightly run-to-run as duration_ms depends on load):
What the fields mean
compressed_context: the shortened text. Forward this to your LLM exactly as you would the original input. [N tokens dropped] markers show where spans were cut; pass disable_placeholders=true if you want a clean concatenation without them.
actual_compression_ratio: fraction of input tokens removed (here 0.6375 = ~64% removed). It is not an Nx factor.
target_compression_ratio: the value you asked for, echoed back — a number from 0 to 1 giving the fraction of tokens to remove.
tokens_saved: original_tokens − compressed_tokens.
duration_ms: server-side compression time. Network round-trip is on top of this.
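The relationship between these fields can be checked with a few lines of arithmetic. The sketch below uses illustrative token counts (800 in, 290 out), not values from the API response, chosen so the ratio comes out at 0.6375. Both functions are hypothetical helpers, and ratio_to_factor shows how the fraction removed differs from an Nx factor.

```python
def compression_stats(original_tokens, compressed_tokens):
    """Derive tokens_saved and the fraction of input tokens removed."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    if not 0 <= compressed_tokens <= original_tokens:
        raise ValueError("compressed_tokens must be between 0 and original_tokens")
    tokens_saved = original_tokens - compressed_tokens
    return {
        "tokens_saved": tokens_saved,
        "actual_compression_ratio": tokens_saved / original_tokens,
    }


def ratio_to_factor(ratio):
    """Convert a fraction-removed ratio into an Nx size-reduction factor."""
    if not 0 <= ratio < 1:
        raise ValueError("ratio must be in [0, 1)")
    return 1 / (1 - ratio)


stats = compression_stats(original_tokens=800, compressed_tokens=290)
print(stats["tokens_saved"])              # 510
print(stats["actual_compression_ratio"])  # 0.6375
print(round(ratio_to_factor(stats["actual_compression_ratio"]), 2))  # 2.76
```

So a ratio of 0.6375 means about 64% of tokens were removed, which corresponds to an input roughly 2.76x the size of the output.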

From here, see the models reference for everything you can tune, or wire Compresr into your stack with one of the integrations below.

Python SDK - full method reference, async variants, streaming, batching.
TypeScript SDK - same surface, camelCase params.
cURL / HTTP - raw REST reference.
Models - tune target_compression_ratio and other latte-only options.
Agent client - drop-in for anthropic.Anthropic() / openai.OpenAI() with automatic tool-output compression.
Web search - add Tavily or Brave to your agent loop in one line.
LangChain integration - first-party middleware for tool outputs, history, and outbound prompts, plus a BaseDocumentCompressor for RAG.
LangGraph integration - state-graph node, lossy checkpoint serializer, store wrapper, and multi-agent handoff tool.
LlamaIndex integration - query-engine postprocessor, tool wrapper, and Memory API block.
LiteLLM integration - drop the compresr guardrail into the proxy and compress tool messages across every provider transparently.