<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="">
  
  <title>Griphcode | Blog</title>
  <subtitle>My personal blog :)</subtitle>
  <link href="https://blog.griphcode.dev/rss.xml" rel="self" />
  <link href="https://blog.griphcode.dev/" />
  <updated>2026-05-03T00:00:00Z</updated>
  <id>https://blog.griphcode.dev/</id>
  <author>
    <name>griphcode</name>
  </author>
  <entry>
    <title>My FreeBSD Experience</title>
    <link href="https://blog.griphcode.dev/posts/Freebsd-experience/" />
    <updated>2026-05-03T00:00:00Z</updated>
    <id>https://blog.griphcode.dev/posts/Freebsd-experience/</id>
    <content type="html">&lt;h1&gt;Why I switched to FreeBSD on my laptop&lt;/h1&gt;
&lt;p&gt;You might be wondering why I switched from NixOS to FreeBSD. The answer mostly comes down to storage and RAM usage. I noticed a big difference in speed on my ThinkPad 13. On NixOS my idle RAM usage was about 3 GB out of 8 GB, but on FreeBSD it was a different story: approximately 1.5 GB out of 8 GB at idle.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;I really enjoyed the FreeBSD installer. Even though it was a bit of a hassle in the beginning, it was enjoyable overall. Something I really like about FreeBSD is the command that lets you configure networking, package channels, and users:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;bsdconfig
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;FreeBSD&#39;s native support for ZFS is also great. (To be continued)&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Claude API Usage</title>
    <link href="https://blog.griphcode.dev/posts/Claude-api-usage/" />
    <updated>2026-05-03T00:00:00Z</updated>
    <id>https://blog.griphcode.dev/posts/Claude-api-usage/</id>
    <content type="html">&lt;h1&gt;The Complete Developer&#39;s Guide to the Claude API (2026)&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A practical, no-fluff guide to integrating Anthropic&#39;s Claude into your applications — from your first API call to production-grade patterns.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#what-is-the-claude-api&quot;&gt;What Is the Claude API?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#understanding-models-and-pricing&quot;&gt;Understanding Models and Pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#getting-your-api-key&quot;&gt;Getting Your API Key&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#your-first-api-call&quot;&gt;Your First API Call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#the-messages-api--core-patterns&quot;&gt;The Messages API — Core Patterns&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#system-prompts&quot;&gt;System Prompts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#multi-turn-conversations&quot;&gt;Multi-Turn Conversations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#streaming-responses&quot;&gt;Streaming Responses&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#tool-use-function-calling&quot;&gt;Tool Use (Function Calling)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#prompt-caching--cut-costs-by-90&quot;&gt;Prompt Caching — Cut Costs by 90%&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#batch-processing--cut-costs-by-50&quot;&gt;Batch Processing — Cut Costs by 50%&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#vision-and-multimodal-inputs&quot;&gt;Vision and Multimodal Inputs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#error-handling-and-reliability&quot;&gt;Error Handling and Reliability&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#production-best-practices&quot;&gt;Production Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.griphcode.dev/posts/Claude-api-usage/#cost-optimization-summary&quot;&gt;Cost Optimization Summary&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h2&gt;What Is the Claude API?&lt;/h2&gt;
&lt;p&gt;The Claude API is a RESTful HTTP interface hosted at &lt;code&gt;https://api.anthropic.com&lt;/code&gt; that gives you programmatic access to Anthropic&#39;s family of Claude language models. Unlike using Claude through claude.ai (a subscription product), the API is billed per token — meaning you pay only for exactly what you use.&lt;/p&gt;
&lt;p&gt;The API is centered around a single, unified endpoint — the &lt;strong&gt;Messages API&lt;/strong&gt; — which handles everything from simple question-answering to complex agentic workflows with tool calling, file analysis, and long-context reasoning.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Who should use it?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Developers building AI-powered applications&lt;/li&gt;
&lt;li&gt;Teams integrating Claude into internal tooling&lt;/li&gt;
&lt;li&gt;Engineers running high-volume, automated pipelines&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Understanding Models and Pricing&lt;/h2&gt;
&lt;p&gt;Anthropic uses a three-tier model family. Choosing the right tier is the single most impactful cost and performance decision you will make.&lt;/p&gt;
&lt;h3&gt;Current Model Tiers (May 2026)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;API String&lt;/th&gt;
&lt;th&gt;Input (per MTok)&lt;/th&gt;
&lt;th&gt;Output (per MTok)&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Haiku 4.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-haiku-4-5-20251001&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;$1.00&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;200K&lt;/td&gt;
&lt;td&gt;High-volume, low-latency tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Sonnet 4.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-sonnet-4-6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Balanced: most production workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Opus 4.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$25.00&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Complex reasoning, agentic workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;p&gt;MTok = Million Tokens. A token is roughly 4 characters or 0.75 words in English.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;What Is a Token?&lt;/h3&gt;
&lt;p&gt;This is foundational. The API does not charge per request; it charges per token. Every token you send (your prompt) and every token Claude generates (the response) is counted.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;quot;Hello, world!&amp;quot; ≈ 4 tokens
A typical paragraph ≈ 75–100 tokens
A 10-page document ≈ 2,000–3,000 tokens
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Input tokens (what you send) and output tokens (what Claude returns) are billed separately, with output consistently more expensive. This reflects the additional compute required to generate tokens versus reading them.&lt;/p&gt;
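&lt;p&gt;As a rough illustration of the 4-characters-per-token rule of thumb, the sketch below estimates a token count from text length. The helper name &lt;code&gt;estimate_tokens&lt;/code&gt; is hypothetical and the result is only an approximation; real counts depend on the tokenizer and are reported in the response&#39;s &lt;code&gt;usage&lt;/code&gt; field.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import math

CHARS_PER_TOKEN = 4  # Rule of thumb for English text, not an exact tokenizer

def estimate_tokens(text):
    # Hypothetical helper: rough estimate based on character count only
    return math.ceil(len(text) / CHARS_PER_TOKEN)

print(estimate_tokens(&quot;Hello, world!&quot;))  # 13 characters, prints 4
&lt;/code&gt;&lt;/pre&gt;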
&lt;h3&gt;Model Selection Strategy&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Use Haiku 4.5&lt;/strong&gt; for classification, routing, summarization, extraction, and any task requiring sub-second latency at scale.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use Sonnet 4.6&lt;/strong&gt; for the vast majority of production workloads, such as coding, customer assistants, and document analysis. It is a sensible default for most tasks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use Opus 4.6&lt;/strong&gt; when you need the absolute best reasoning — legal analysis, complex multi-step agents, advanced coding tasks.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; Start with Sonnet 4.6. Drop to Haiku if latency or cost is a constraint. Escalate to Opus only if output quality falls short.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;Getting Your API Key&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Go to &lt;a href=&quot;https://console.anthropic.com&quot;&gt;console.anthropic.com&lt;/a&gt; and create an account.&lt;/li&gt;
&lt;li&gt;Navigate to &lt;strong&gt;API Keys&lt;/strong&gt; in your account settings.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Create Key&lt;/strong&gt;, give it a descriptive name (e.g., &lt;code&gt;prod-app&lt;/code&gt;, &lt;code&gt;dev-testing&lt;/code&gt;), and copy it immediately.&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Critical:&lt;/strong&gt; Your API key is displayed only once. Store it securely: in an environment variable, a secrets manager, or a &lt;code&gt;.env&lt;/code&gt; file that is excluded from version control. Never hardcode it in source code.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# Store it as an environment variable
export ANTHROPIC_API_KEY=&amp;quot;sk-ant-...&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Use workspaces to segment API keys by environment (dev, staging, prod) for cleaner billing and access control.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Your First API Call&lt;/h2&gt;
&lt;p&gt;The simplest way to test the API is with &lt;code&gt;curl&lt;/code&gt;. This sends a single message to Claude and returns a JSON response.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;curl https://api.anthropic.com/v1/messages &#92;
  -H &amp;quot;x-api-key: $ANTHROPIC_API_KEY&amp;quot; &#92;
  -H &amp;quot;anthropic-version: 2023-06-01&amp;quot; &#92;
  -H &amp;quot;content-type: application/json&amp;quot; &#92;
  -d &#39;{
    &amp;quot;model&amp;quot;: &amp;quot;claude-sonnet-4-6&amp;quot;,
    &amp;quot;max_tokens&amp;quot;: 1024,
    &amp;quot;messages&amp;quot;: [
      {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;Explain the concept of recursion in one paragraph.&amp;quot;}
    ]
  }&#39;
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Breaking Down the Request&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;x-api-key&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Authentication — your API key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;anthropic-version&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API version header; &lt;code&gt;2023-06-01&lt;/code&gt; is the value used throughout this guide&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;content-type&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tells the server to parse the body as JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Which Claude model to use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hard cap on output length — prevents runaway costs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;messages&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The conversation — an array of &lt;code&gt;role&lt;/code&gt;/&lt;code&gt;content&lt;/code&gt; pairs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;The Response&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;id&amp;quot;: &amp;quot;msg_01XFDUDYJgAACzvnptvVoYEL&amp;quot;,
  &amp;quot;type&amp;quot;: &amp;quot;message&amp;quot;,
  &amp;quot;role&amp;quot;: &amp;quot;assistant&amp;quot;,
  &amp;quot;content&amp;quot;: [
    {
      &amp;quot;type&amp;quot;: &amp;quot;text&amp;quot;,
      &amp;quot;text&amp;quot;: &amp;quot;Recursion is a programming technique where a function calls itself...&amp;quot;
    }
  ],
  &amp;quot;model&amp;quot;: &amp;quot;claude-sonnet-4-6&amp;quot;,
  &amp;quot;stop_reason&amp;quot;: &amp;quot;end_turn&amp;quot;,
  &amp;quot;usage&amp;quot;: {
    &amp;quot;input_tokens&amp;quot;: 20,
    &amp;quot;output_tokens&amp;quot;: 85
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;usage&lt;/code&gt; field is important — it tells you exactly how many tokens were consumed, which maps directly to your cost.&lt;/p&gt;
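&lt;p&gt;To make that mapping concrete, here is a minimal sketch that turns the &lt;code&gt;usage&lt;/code&gt; counts into a dollar figure. It assumes the per-MTok prices listed in the pricing table earlier in this guide; &lt;code&gt;PRICES&lt;/code&gt; and &lt;code&gt;request_cost&lt;/code&gt; are illustrative names, not part of the SDK.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# (input, output) USD per million tokens, copied from the pricing table
PRICES = {
    &quot;claude-haiku-4-5-20251001&quot;: (1.00, 5.00),
    &quot;claude-sonnet-4-6&quot;: (3.00, 15.00),
    &quot;claude-opus-4-6&quot;: (5.00, 25.00),
}

def request_cost(model, input_tokens, output_tokens):
    # Input and output tokens are billed at different rates
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Usage counts from the sample response above
cost = request_cost(&quot;claude-sonnet-4-6&quot;, input_tokens=20, output_tokens=85)
print(f&quot;${cost:.6f}&quot;)  # prints $0.001335
&lt;/code&gt;&lt;/pre&gt;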
&lt;h3&gt;Using the Python SDK&lt;/h3&gt;
&lt;p&gt;For production applications, Anthropic&#39;s official Python SDK is the recommended approach. It handles authentication, request formatting, retries, and error parsing automatically.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;pip install anthropic
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from environment

message = client.messages.create(
    model=&amp;quot;claude-sonnet-4-6&amp;quot;,
    max_tokens=1024,
    messages=[
        {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;Explain the concept of recursion in one paragraph.&amp;quot;}
    ]
)

print(message.content[0].text)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Using the Node.js SDK&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;npm install @anthropic-ai/sdk
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;import Anthropic from &amp;quot;@anthropic-ai/sdk&amp;quot;;

const client = new Anthropic(); // Reads ANTHROPIC_API_KEY from environment

const message = await client.messages.create({
  model: &amp;quot;claude-sonnet-4-6&amp;quot;,
  max_tokens: 1024,
  messages: [
    { role: &amp;quot;user&amp;quot;, content: &amp;quot;Explain the concept of recursion in one paragraph.&amp;quot; }
  ],
});

console.log(message.content[0].text);
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2&gt;The Messages API — Core Patterns&lt;/h2&gt;
&lt;p&gt;The Messages API is built around a conversation model. Every request contains a &lt;code&gt;messages&lt;/code&gt; array where each item has a &lt;code&gt;role&lt;/code&gt; (&lt;code&gt;user&lt;/code&gt; or &lt;code&gt;assistant&lt;/code&gt;) and &lt;code&gt;content&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;Key Parameters&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;The Claude model to use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;integer&lt;/td&gt;
&lt;td&gt;Maximum tokens to generate (required)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;messages&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;array&lt;/td&gt;
&lt;td&gt;Conversation history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;system&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string or array&lt;/td&gt;
&lt;td&gt;System prompt (instructions for Claude)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;temperature&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;float&lt;/td&gt;
&lt;td&gt;Randomness, from 0 (most focused, though not guaranteed deterministic) to 1 (most varied)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stop_sequences&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;array&lt;/td&gt;
&lt;td&gt;Strings that stop generation when encountered&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Stop Reasons&lt;/h3&gt;
&lt;p&gt;The response&#39;s &lt;code&gt;stop_reason&lt;/code&gt; field tells you why generation ended:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;end_turn&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Claude finished naturally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hit the &lt;code&gt;max_tokens&lt;/code&gt; limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stop_sequence&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;A stop sequence was triggered&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tool_use&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Claude wants to call a tool&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you see &lt;code&gt;max_tokens&lt;/code&gt; frequently, your limit is too low for the task.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;System Prompts&lt;/h2&gt;
&lt;p&gt;A system prompt is a set of instructions that define Claude&#39;s behavior, persona, and constraints for the entire conversation. It is passed in the top-level &lt;code&gt;system&lt;/code&gt; parameter rather than in the &lt;code&gt;messages&lt;/code&gt; array, and because the API is stateless it must be included with every request.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;message = client.messages.create(
    model=&amp;quot;claude-sonnet-4-6&amp;quot;,
    max_tokens=1024,
    system=&amp;quot;You are a senior software engineer specializing in Python. &amp;quot;
           &amp;quot;Always provide concise, idiomatic code. &amp;quot;
           &amp;quot;When explaining code, focus on the &#39;why&#39; not just the &#39;what&#39;.&amp;quot;,
    messages=[
        {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;How do I read a CSV file efficiently?&amp;quot;}
    ]
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Best practices for system prompts:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Be explicit about the persona, tone, and output format&lt;/li&gt;
&lt;li&gt;Define what Claude should and should not do&lt;/li&gt;
&lt;li&gt;Include examples if the task is complex or has edge cases&lt;/li&gt;
&lt;li&gt;Keep it focused — bloated system prompts add cost to every call&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Multi-Turn Conversations&lt;/h2&gt;
&lt;p&gt;To maintain a conversation, you pass the entire history with each request. The API is stateless — it has no memory between calls. You are responsible for managing and passing the conversation state.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;conversation_history = []

def chat(user_message: str) -&amp;gt; str:
    conversation_history.append({
        &amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;,
        &amp;quot;content&amp;quot;: user_message
    })

    response = client.messages.create(
        model=&amp;quot;claude-sonnet-4-6&amp;quot;,
        max_tokens=1024,
        system=&amp;quot;You are a helpful coding assistant.&amp;quot;,
        messages=conversation_history
    )

    assistant_message = response.content[0].text

    conversation_history.append({
        &amp;quot;role&amp;quot;: &amp;quot;assistant&amp;quot;,
        &amp;quot;content&amp;quot;: assistant_message
    })

    return assistant_message

# Usage
print(chat(&amp;quot;What is a Python decorator?&amp;quot;))
print(chat(&amp;quot;Can you show me an example?&amp;quot;))
print(chat(&amp;quot;How does that differ from a closure?&amp;quot;))
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Token cost grows with conversation length.&lt;/strong&gt; Every turn re-sends the full history. For long conversations, consider summarizing older turns to reduce costs.&lt;/p&gt;
&lt;/blockquote&gt;
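&lt;p&gt;Summarizing needs an extra model call, so a simpler alternative is to keep only the most recent messages. The sketch below does that; &lt;code&gt;trim_history&lt;/code&gt; and its &lt;code&gt;max_messages&lt;/code&gt; parameter are hypothetical names, and it assumes the trimmed list should still begin with a &lt;code&gt;user&lt;/code&gt; message. Note that this drops older context entirely instead of condensing it.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def trim_history(history, max_messages=6):
    # Keep at most max_messages of the most recent messages
    start = max(0, len(history) - max_messages)
    recent = history[start:]
    # Assumption: the messages array should open with a user message,
    # so drop any leading assistant messages left over from the cut.
    while recent and recent[0][&quot;role&quot;] != &quot;user&quot;:
        recent = recent[1:]
    return recent

history = [
    {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;q1&quot;},
    {&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: &quot;a1&quot;},
    {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;q2&quot;},
    {&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: &quot;a2&quot;},
    {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;q3&quot;},
]
trimmed = trim_history(history, max_messages=3)
print([m[&quot;content&quot;] for m in trimmed])  # prints [&#39;q2&#39;, &#39;a2&#39;, &#39;q3&#39;]
&lt;/code&gt;&lt;/pre&gt;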
&lt;hr&gt;
&lt;h2&gt;Streaming Responses&lt;/h2&gt;
&lt;p&gt;By default, the API waits until Claude finishes generating the full response before returning it. For user-facing applications, this creates a noticeable lag. Streaming sends tokens back as they are generated, creating the real-time &amp;quot;typing&amp;quot; effect.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;with client.messages.stream(
    model=&amp;quot;claude-sonnet-4-6&amp;quot;,
    max_tokens=1024,
    messages=[{&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;Write a short story about a robot.&amp;quot;}]
) as stream:
    for text in stream.text_stream:
        print(text, end=&amp;quot;&amp;quot;, flush=True)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Streaming is strongly recommended for any interactive, user-facing interface.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Tool Use (Function Calling)&lt;/h2&gt;
&lt;p&gt;Tool use allows you to define functions that Claude can invoke. Instead of trying to answer a question directly, Claude can request to call a tool with structured arguments, and your code executes it and returns the result.&lt;/p&gt;
&lt;p&gt;This is the foundation of agentic applications.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;tools = [
    {
        &amp;quot;name&amp;quot;: &amp;quot;get_weather&amp;quot;,
        &amp;quot;description&amp;quot;: &amp;quot;Get the current weather for a given city.&amp;quot;,
        &amp;quot;input_schema&amp;quot;: {
            &amp;quot;type&amp;quot;: &amp;quot;object&amp;quot;,
            &amp;quot;properties&amp;quot;: {
                &amp;quot;city&amp;quot;: {
                    &amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;,
                    &amp;quot;description&amp;quot;: &amp;quot;The name of the city, e.g. &#39;Stockholm&#39;&amp;quot;
                }
            },
            &amp;quot;required&amp;quot;: [&amp;quot;city&amp;quot;]
        }
    }
]

response = client.messages.create(
    model=&amp;quot;claude-sonnet-4-6&amp;quot;,
    max_tokens=1024,
    tools=tools,
    messages=[
        {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;What&#39;s the weather like in Tokyo?&amp;quot;}
    ]
)

# Claude may respond with a tool_use block, possibly alongside text
if response.stop_reason == &amp;quot;tool_use&amp;quot;:
    tool_call = next(b for b in response.content if b.type == &amp;quot;tool_use&amp;quot;)
    print(f&amp;quot;Claude wants to call: {tool_call.name}&amp;quot;)
    print(f&amp;quot;With arguments: {tool_call.input}&amp;quot;)
    # -&amp;gt; {&amp;quot;city&amp;quot;: &amp;quot;Tokyo&amp;quot;}
    
    # Your code executes the function and passes the result back
    result = get_weather(tool_call.input[&amp;quot;city&amp;quot;])  # Your actual function
    
    # Continue the conversation with the tool result
    final_response = client.messages.create(
        model=&amp;quot;claude-sonnet-4-6&amp;quot;,
        max_tokens=1024,
        tools=tools,
        messages=[
            {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;What&#39;s the weather like in Tokyo?&amp;quot;},
            {&amp;quot;role&amp;quot;: &amp;quot;assistant&amp;quot;, &amp;quot;content&amp;quot;: response.content},
            {
                &amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;,
                &amp;quot;content&amp;quot;: [{
                    &amp;quot;type&amp;quot;: &amp;quot;tool_result&amp;quot;,
                    &amp;quot;tool_use_id&amp;quot;: tool_call.id,
                    &amp;quot;content&amp;quot;: str(result)
                }]
            }
        ]
    )
    print(final_response.content[0].text)
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2&gt;Prompt Caching — Cut Costs by 90%&lt;/h2&gt;
&lt;p&gt;Prompt caching is one of the most powerful cost-reduction features available. If you repeatedly send the same large content — a system prompt, document, or tool definition — Claude can cache that content and read from cache on subsequent calls at a fraction of the cost.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cache hit pricing on Sonnet 4.6: $0.30/MTok vs. $3.00/MTok standard (90% reduction)&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;response = client.messages.create(
    model=&amp;quot;claude-sonnet-4-6&amp;quot;,
    max_tokens=1024,
    system=[
        {
            &amp;quot;type&amp;quot;: &amp;quot;text&amp;quot;,
            &amp;quot;text&amp;quot;: &amp;quot;You are a legal document analyst. &amp;quot; + legal_guidelines,  # Large doc
            &amp;quot;cache_control&amp;quot;: {&amp;quot;type&amp;quot;: &amp;quot;ephemeral&amp;quot;}  # Cache this block
        }
    ],
    messages=[
        {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;Summarize section 3 of the contract.&amp;quot;}
    ]
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;When caching saves the most:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Long system prompts (&amp;gt;1,000 tokens) sent on every request&lt;/li&gt;
&lt;li&gt;Document analysis where the same document is queried multiple times&lt;/li&gt;
&lt;li&gt;Tool definitions for large tool schemas&lt;/li&gt;
&lt;li&gt;RAG applications where context documents repeat&lt;/li&gt;
&lt;/ul&gt;
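&lt;p&gt;The first call pays a cache &lt;strong&gt;write&lt;/strong&gt; cost on the cached portion of the prompt (1.25x the standard input rate for a 5-minute TTL). Subsequent calls within that TTL pay only 0.1x the standard input rate for the cached portion, a 90% reduction. Tokens outside the cached block are billed at the normal rate.&lt;/p&gt;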
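&lt;p&gt;A quick back-of-the-envelope calculation shows how the write and read multipliers play out. This sketch assumes the Sonnet 4.6 input price quoted above, that every call lands within the cache TTL, and at least one call; &lt;code&gt;cached_prefix_cost&lt;/code&gt; is an illustrative helper, not an SDK function.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;BASE_INPUT_PRICE = 3.00        # USD per MTok, Sonnet 4.6 input price quoted above
CACHE_WRITE_MULTIPLIER = 1.25  # First call writes the cache
CACHE_READ_MULTIPLIER = 0.10   # Later calls read from it

def cached_prefix_cost(prefix_tokens, calls):
    # Returns (uncached, cached) USD cost of sending the same prefix repeatedly
    per_call = prefix_tokens * BASE_INPUT_PRICE / 1_000_000
    uncached = per_call * calls
    cached = per_call * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1))
    return uncached, cached

uncached, cached = cached_prefix_cost(prefix_tokens=10_000, calls=20)
print(f&quot;uncached: ${uncached:.4f}, cached: ${cached:.4f}&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For a 10,000-token prefix sent 20 times, that works out to about $0.60 without caching versus about $0.09 with it. With a single call, caching costs slightly more because of the write premium.&lt;/p&gt;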
&lt;hr&gt;
&lt;h2&gt;Batch Processing — Cut Costs by 50%&lt;/h2&gt;
&lt;p&gt;For workloads that do not require real-time responses, the Batch API processes requests asynchronously and returns results within 24 hours at 50% off standard pricing.&lt;/p&gt;
&lt;p&gt;This is ideal for: data pipelines, bulk document analysis, offline report generation, and evaluation runs.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;batch = client.messages.batches.create(
    requests=[
        {
            &amp;quot;custom_id&amp;quot;: f&amp;quot;request-{i}&amp;quot;,
            &amp;quot;params&amp;quot;: {
                &amp;quot;model&amp;quot;: &amp;quot;claude-sonnet-4-6&amp;quot;,
                &amp;quot;max_tokens&amp;quot;: 1024,
                &amp;quot;messages&amp;quot;: [{&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: doc}]
            }
        }
        for i, doc in enumerate(documents)
    ]
)

print(f&amp;quot;Batch ID: {batch.id}&amp;quot;)  # Poll this later for results
&lt;/code&gt;&lt;/pre&gt;
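&lt;p&gt;Collecting the results later might look like the sketch below. It relies on the SDK&#39;s &lt;code&gt;batches.retrieve&lt;/code&gt; and &lt;code&gt;batches.results&lt;/code&gt; methods and the &lt;code&gt;processing_status&lt;/code&gt; field as documented at the time of writing, so check them against your SDK version. The &lt;code&gt;wait_for_batch&lt;/code&gt; helper and its &lt;code&gt;poll_seconds&lt;/code&gt; and &lt;code&gt;sleep&lt;/code&gt; parameters are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import time

def wait_for_batch(client, batch_id, poll_seconds=60, sleep=time.sleep):
    # Poll until the batch has finished processing
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == &quot;ended&quot;:
            break
        sleep(poll_seconds)

    # Map each custom_id to the text of its response
    outputs = {}
    for entry in client.messages.batches.results(batch_id):
        # Only succeeded entries carry a message; others are skipped here
        if entry.result.type == &quot;succeeded&quot;:
            outputs[entry.custom_id] = entry.result.message.content[0].text
    return outputs

# Usage, given the client and batch from the example above:
# outputs = wait_for_batch(client, batch.id)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Failed, canceled, or expired entries are silently skipped in this sketch; a real pipeline would want to log or retry them.&lt;/p&gt;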
&lt;hr&gt;
&lt;h2&gt;Vision and Multimodal Inputs&lt;/h2&gt;
&lt;p&gt;Claude can analyze images, PDFs, and other documents alongside text. Pass them as base64-encoded content in the messages array.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import base64

with open(&amp;quot;diagram.png&amp;quot;, &amp;quot;rb&amp;quot;) as f:
    image_data = base64.standard_b64encode(f.read()).decode(&amp;quot;utf-8&amp;quot;)

response = client.messages.create(
    model=&amp;quot;claude-sonnet-4-6&amp;quot;,
    max_tokens=1024,
    messages=[
        {
            &amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;,
            &amp;quot;content&amp;quot;: [
                {
                    &amp;quot;type&amp;quot;: &amp;quot;image&amp;quot;,
                    &amp;quot;source&amp;quot;: {
                        &amp;quot;type&amp;quot;: &amp;quot;base64&amp;quot;,
                        &amp;quot;media_type&amp;quot;: &amp;quot;image/png&amp;quot;,
                        &amp;quot;data&amp;quot;: image_data
                    }
                },
                {
                    &amp;quot;type&amp;quot;: &amp;quot;text&amp;quot;,
                    &amp;quot;text&amp;quot;: &amp;quot;Describe the architecture shown in this diagram.&amp;quot;
                }
            ]
        }
    ]
)
&lt;/code&gt;&lt;/pre&gt;
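&lt;p&gt;Supported image formats: &lt;code&gt;image/jpeg&lt;/code&gt;, &lt;code&gt;image/png&lt;/code&gt;, &lt;code&gt;image/gif&lt;/code&gt;, and &lt;code&gt;image/webp&lt;/code&gt;. PDFs use the &lt;code&gt;application/pdf&lt;/code&gt; media type and are sent in a &lt;code&gt;document&lt;/code&gt; content block rather than an &lt;code&gt;image&lt;/code&gt; block.&lt;/p&gt;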
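&lt;p&gt;For PDFs, the content block looks slightly different from the image example. The sketch below builds one, assuming the &lt;code&gt;document&lt;/code&gt; block shape described in Anthropic&#39;s PDF support documentation; &lt;code&gt;pdf_block&lt;/code&gt; is a hypothetical helper and the bytes are a placeholder, not a real PDF.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import base64

def pdf_block(pdf_bytes):
    # Build a document content block from raw PDF bytes
    return {
        &quot;type&quot;: &quot;document&quot;,
        &quot;source&quot;: {
            &quot;type&quot;: &quot;base64&quot;,
            &quot;media_type&quot;: &quot;application/pdf&quot;,
            &quot;data&quot;: base64.standard_b64encode(pdf_bytes).decode(&quot;utf-8&quot;),
        },
    }

# Placeholder bytes for illustration; in practice read them from a real file
block = pdf_block(b&quot;%PDF-1.4 placeholder&quot;)

# Pass this list as the content of a user message
content = [block, {&quot;type&quot;: &quot;text&quot;, &quot;text&quot;: &quot;Summarize this document.&quot;}]
print(block[&quot;source&quot;][&quot;media_type&quot;])  # prints application/pdf
&lt;/code&gt;&lt;/pre&gt;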
&lt;hr&gt;
&lt;h2&gt;Error Handling and Reliability&lt;/h2&gt;
&lt;p&gt;The API returns structured HTTP error codes. Your application must handle these gracefully.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;HTTP Status&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;400&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;invalid_request_error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fix your request — bad parameters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;401&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;authentication_error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check your API key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;403&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;permission_error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check key permissions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;429&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;rate_limit_error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Back off and retry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;500&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;api_error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Retry with exponential backoff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;529&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;overloaded_error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Retry — Anthropic is at capacity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import anthropic
import time

def call_with_retry(client, max_retries=3, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:
            wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
            print(f&amp;quot;Rate limited. Retrying in {wait}s...&amp;quot;)
            time.sleep(wait)
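        except anthropic.APIStatusError as e:
            # APIStatusError always has status_code; the broader APIError
            # also covers connection errors, which do not.
            if e.status_code &amp;gt;= 500:
                time.sleep(2 ** attempt)
            else:
                raise  # Don&#39;t retry other 4xx errors: they won&#39;t fix themselves
    raise RuntimeError(&amp;quot;Max retries exceeded&amp;quot;)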
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2&gt;Production Best Practices&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;1. Always set &lt;code&gt;max_tokens&lt;/code&gt; intentionally.&lt;/strong&gt;
Do not set it to an arbitrarily large number. Know what your task requires and cap it there.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Use environment variables for API keys.&lt;/strong&gt;
Never hardcode credentials. Use &lt;code&gt;python-dotenv&lt;/code&gt;, AWS Secrets Manager, or similar tooling.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Log token usage on every request.&lt;/strong&gt;
Instrument &lt;code&gt;response.usage.input_tokens&lt;/code&gt; and &lt;code&gt;response.usage.output_tokens&lt;/code&gt; as first-class metrics to catch cost anomalies early.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Wrap the API in a service layer.&lt;/strong&gt;
Do not scatter raw API calls throughout your codebase. Centralize them in a service module that enforces token budgets, handles retries, and logs all interactions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Use separate API keys per environment.&lt;/strong&gt;
Dev, staging, and production should each have isolated API keys and workspace budgets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6. Implement circuit breakers for agentic workflows.&lt;/strong&gt;
If Claude is calling tools in a loop, enforce a maximum number of tool cycles and exit gracefully with a user-visible message if exceeded.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;7. Pick the right model, then profile.&lt;/strong&gt;
Start with Sonnet 4.6 as your default. Profile latency and cost in production. Drop to Haiku for tasks that don&#39;t need Sonnet-level intelligence.&lt;/p&gt;
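&lt;p&gt;The circuit breaker from point 6 can be sketched in a few lines. This is a simplified illustration, not SDK code: &lt;code&gt;call_model&lt;/code&gt; and &lt;code&gt;run_tool&lt;/code&gt; are hypothetical stand-ins for your own wrappers around the API and your tools, and the response is reduced to a plain dict.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;FALLBACK = &quot;Sorry, this request needed too many tool calls. Please try again.&quot;

def run_tool_loop(call_model, run_tool, messages, max_cycles=5):
    # call_model and run_tool are stand-ins for your own API and tool wrappers;
    # the dict shapes here are simplified for illustration.
    for _ in range(max_cycles):
        response = call_model(messages)
        if response[&quot;stop_reason&quot;] != &quot;tool_use&quot;:
            return response[&quot;text&quot;]
        # Record the tool request and its result, then ask the model again
        messages = messages + [
            {&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: response[&quot;tool_call&quot;]},
            {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: run_tool(response[&quot;tool_call&quot;])},
        ]
    return FALLBACK  # Breaker tripped: exit with a user-visible message

def always_tool(messages):
    # Stub model that never stops asking for a tool
    return {&quot;stop_reason&quot;: &quot;tool_use&quot;, &quot;tool_call&quot;: &quot;lookup&quot;}

print(run_tool_loop(always_tool, lambda call: &quot;result&quot;, [], max_cycles=3))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the stub above, the loop gives up after three cycles and prints the fallback message instead of spinning forever.&lt;/p&gt;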
&lt;hr&gt;
&lt;h2&gt;Cost Optimization Summary&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Savings Potential&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Use Haiku instead of Sonnet for simple tasks&lt;/td&gt;
&lt;td&gt;~67%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt caching for repeated large contexts&lt;/td&gt;
&lt;td&gt;Up to 90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch API for non-real-time workloads&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt caching + Batch API combined&lt;/td&gt;
&lt;td&gt;Up to 95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trim unnecessary content from prompts&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use streaming to reduce perceived latency (not cost)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2&gt;Further Reading&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.anthropic.com&quot;&gt;Official API Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://platform.claude.com/docs/en/api/overview&quot;&gt;Claude API Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview&quot;&gt;Prompt Engineering Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.anthropic.com/en/docs/build-with-claude/tool-use&quot;&gt;Tool Use Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching&quot;&gt;Prompt Caching Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.anthropic.com/en/docs/build-with-claude/batch-processing&quot;&gt;Batch API Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Last updated: May 2026 | Pricing verified against official Anthropic documentation&lt;/em&gt;&lt;/p&gt;
</content>
  </entry>
</feed>