This version is still in development and is not considered stable yet. For the latest stable version, please use Spring AI 1.1.3!
Anthropic Chat
Spring AI supports Anthropic’s Claude models through the official Anthropic Java SDK, providing access to Claude through Anthropic’s API.
Prerequisites
Create an account at the Anthropic Console and generate an API key on the API Keys page.
Add Repositories and BOM
Spring AI artifacts are published in Maven Central and Spring Snapshot repositories. Refer to the Artifact Repositories section to add these repositories to your build system.
To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the Dependency Management section to add the Spring AI BOM to your build system.
Auto-Configuration
Spring Boot auto-configuration is available via the spring-ai-starter-model-anthropic starter.
Add it to your project’s Maven pom.xml file:
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-anthropic</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
implementation 'org.springframework.ai:spring-ai-starter-model-anthropic'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Configuration Properties
Use the spring.ai.anthropic.* properties to configure the Anthropic connection and chat options:
| Property | Description | Default |
|---|---|---|
| | Anthropic API key | - |
| | API base URL | |
| | Model name | |
| | Maximum tokens | |
| | Sampling temperature | - |
| | Top-p sampling | - |
| | Top-k sampling | - |
Manual Configuration
The AnthropicChatModel implements the ChatModel interface and uses the official Anthropic Java SDK to connect to Claude.
Add the spring-ai-anthropic dependency to your project’s Maven pom.xml file:
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-anthropic</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
implementation 'org.springframework.ai:spring-ai-anthropic'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Authentication
Configure your API key either programmatically or via environment variable:
var chatOptions = AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.maxTokens(1024)
.apiKey(System.getenv("ANTHROPIC_API_KEY"))
.build();
var chatModel = new AnthropicChatModel(chatOptions);
Or set the environment variable and let the SDK auto-detect it:
export ANTHROPIC_API_KEY=<your-api-key>
// API key will be detected from ANTHROPIC_API_KEY environment variable
var chatModel = new AnthropicChatModel(
AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.maxTokens(1024)
.build());
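The two approaches above can be read as a precedence rule: a key supplied in code is used when present, and the ANTHROPIC_API_KEY environment variable is the fallback. The plain-Java sketch below illustrates that rule under the assumption that an explicit key takes precedence. `ApiKeyResolver` is a hypothetical helper written for illustration; it is not part of Spring AI or the Anthropic SDK, whose own detection logic may differ in detail.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Spring AI or the Anthropic SDK. */
final class ApiKeyResolver {

    private ApiKeyResolver() {
    }

    /** Returns the explicit key if it is non-blank, otherwise the ANTHROPIC_API_KEY entry of env. */
    static Optional<String> resolve(String explicitKey, Map<String, String> env) {
        if (explicitKey != null && !explicitKey.isBlank()) {
            return Optional.of(explicitKey);
        }
        String fromEnv = env.get("ANTHROPIC_API_KEY");
        return (fromEnv == null || fromEnv.isBlank()) ? Optional.empty() : Optional.of(fromEnv);
    }

    public static void main(String[] args) {
        // No explicit key: fall back to the process environment.
        Optional<String> key = resolve(null, System.getenv());
        System.out.println(key.isPresent() ? "API key found" : "No API key configured");
    }
}
```

Failing fast when the result is empty tends to give a clearer error than a rejected request at call time.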
Runtime Options
The AnthropicChatOptions.java class provides model configurations such as the model to use, temperature, max tokens, etc.
On start-up, configure default options with the AnthropicChatModel(options) constructor.
At run-time, you can override the default options by adding new, request-specific options to the Prompt call.
For example, to override the default model and temperature for a specific request:
ChatResponse response = chatModel.call(
new Prompt(
"Generate the names of 5 famous pirates.",
AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.temperature(0.4)
.build()
));
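Conceptually, a request-specific option replaces the start-up default only for the fields it sets; everything else falls through to the default. The sketch below shows that idea in plain Java. The `Options` record and `merge` method are hypothetical stand-ins, and this is not Spring AI's actual merge code.

```java
/** Illustrative sketch of "request options override defaults"; not Spring AI code. */
final class OptionMergeSketch {

    /** Hypothetical stand-in for chat options; a null field means "not set". */
    record Options(String model, Double temperature, Integer maxTokens) {
    }

    private OptionMergeSketch() {
    }

    /** Request values win; unset (null) request fields fall back to the defaults. */
    static Options merge(Options defaults, Options request) {
        return new Options(
                request.model() != null ? request.model() : defaults.model(),
                request.temperature() != null ? request.temperature() : defaults.temperature(),
                request.maxTokens() != null ? request.maxTokens() : defaults.maxTokens());
    }

    public static void main(String[] args) {
        Options defaults = new Options("claude-sonnet-4-20250514", 1.0, 1024);
        Options request = new Options(null, 0.4, null); // only temperature is overridden
        System.out.println(merge(defaults, request));
    }
}
```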
Chat Options
| Option | Description | Default |
|---|---|---|
| model | Name of the Claude model to use. Models include: | |
| maxTokens | The maximum number of tokens to generate in the response. | 4096 |
| temperature | Controls randomness in the response. Higher values make output more random, lower values make it more deterministic. Range: 0.0-1.0 | 1.0 |
| topP | Nucleus sampling parameter. The model considers tokens with top_p probability mass. | - |
| topK | Only sample from the top K options for each token. | - |
| stopSequences | Custom sequences that will cause the model to stop generating. | - |
| apiKey | The API key for authentication. Auto-detects from | - |
| baseUrl | The base URL for the Anthropic API. | |
| timeout | Request timeout duration. | 60 seconds |
| maxRetries | Maximum number of retry attempts for failed requests. | 2 |
| proxy | Proxy settings for the HTTP client. | - |
| customHeaders | Custom HTTP headers to include on all requests (client-level). | - |
| httpHeaders | Per-request HTTP headers. These are added to individual API calls via | - |
| thinking | Thinking configuration. Use the convenience builders | - |
| outputConfig | Output configuration for structured output (JSON schema) and effort control. Use | - |
Tool Calling Options
| Option | Description | Default |
|---|---|---|
| toolChoice | Controls which tool (if any) is called by the model. Use | AUTO |
| toolCallbacks | List of tool callbacks to register with the model. | - |
| toolNames | Set of tool names to be resolved at runtime. | - |
| internalToolExecutionEnabled | If false, tool calls are proxied to the client for manual handling. If true, Spring AI handles tool calls internally. | true |
| disableParallelToolUse | When true, the model will use at most one tool per response. | false |
In addition to the model-specific AnthropicChatOptions, you can use a portable ChatOptions instance, created with ChatOptions#builder().
Tool Calling
You can register custom Java functions or methods with the AnthropicChatModel and have Claude intelligently choose to output a JSON object containing arguments to call one or many of the registered functions/tools.
This is a powerful technique to connect the LLM capabilities with external tools and APIs.
Read more about Tool Calling.
Basic Tool Calling
var chatOptions = AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.toolCallbacks(List.of(
FunctionToolCallback.builder("getCurrentWeather", new WeatherService())
.description("Get the weather in location")
.inputType(WeatherService.Request.class)
.build()))
.build();
var chatModel = new AnthropicChatModel(chatOptions);
ChatResponse response = chatModel.call(
new Prompt("What's the weather like in San Francisco?", chatOptions));
Tool Choice Options
Control how Claude uses tools with the toolChoice option:
import com.anthropic.models.messages.ToolChoiceAny;
import com.anthropic.models.messages.ToolChoiceTool;
import com.anthropic.models.messages.ToolChoiceNone;
// Force Claude to use any available tool
var anyToolOptions = AnthropicChatOptions.builder()
    .toolChoice(ToolChoiceAny.builder().build())
    .toolCallbacks(...)
    .build();
// Force Claude to use a specific tool
var specificToolOptions = AnthropicChatOptions.builder()
    .toolChoice(ToolChoiceTool.builder().name("getCurrentWeather").build())
    .toolCallbacks(...)
    .build();
// Prevent tool use entirely
var noToolOptions = AnthropicChatOptions.builder()
    .toolChoice(ToolChoiceNone.builder().build())
    .toolCallbacks(...)
    .build();
The Anthropic Java SDK provides convenient static factory methods for common tool choices, which can make your code more concise.
Streaming Tool Calling
The Anthropic SDK module fully supports tool calling in streaming mode. When Claude decides to call a tool during streaming:
- Tool call arguments are accumulated from partial JSON deltas
- Tools are executed when the content block completes
- Results are sent back to Claude
- The conversation continues recursively until Claude provides a final response
Flux<ChatResponse> stream = chatModel.stream(
    new Prompt("What's the weather in Paris, Tokyo, and New York?", chatOptions));
// Collect all streamed chunks and join their text into a single string
String response = stream
    .collectList()
    .block()
    .stream()
    .map(r -> r.getResult().getOutput().getText())
    .filter(Objects::nonNull)
    .collect(Collectors.joining());
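The first two steps in the list above (accumulating partial JSON deltas, then acting when the content block completes) can be sketched in plain Java. `ToolArgumentAccumulator` and its method names are hypothetical and only illustrate the buffering idea; the module's real streaming code is not shown on this page and may be structured differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical accumulator for illustration; not the Spring AI implementation. */
final class ToolArgumentAccumulator {

    // Partial JSON fragments keyed by content block index.
    private final Map<Integer, StringBuilder> buffers = new LinkedHashMap<>();

    /** Appends one partial JSON fragment for the given content block. */
    void onDelta(int blockIndex, String partialJson) {
        buffers.computeIfAbsent(blockIndex, i -> new StringBuilder()).append(partialJson);
    }

    /** Called when the content block completes; returns the full argument JSON and clears the buffer. */
    String onBlockStop(int blockIndex) {
        StringBuilder buffer = buffers.remove(blockIndex);
        // A block that produced no deltas is treated here as an empty argument object.
        return (buffer == null || buffer.length() == 0) ? "{}" : buffer.toString();
    }

    public static void main(String[] args) {
        ToolArgumentAccumulator accumulator = new ToolArgumentAccumulator();
        accumulator.onDelta(1, "{\"location\": \"Par");
        accumulator.onDelta(1, "is\"}");
        System.out.println(accumulator.onBlockStop(1));
    }
}
```

Only the complete string returned by `onBlockStop` is valid JSON; individual fragments generally are not, which is why parsing has to wait for the block to finish.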
Streaming
The Anthropic SDK module supports both synchronous and streaming responses. Streaming allows Claude to return responses incrementally as they’re generated.
Flux<ChatResponse> stream = chatModel.stream(new Prompt("Tell me a story"));
// Print each text chunk as it arrives
stream.subscribe(response -> {
    String content = response.getResult().getOutput().getText();
    if (content != null) {
        System.out.print(content);
    }
});
Extended Thinking
Anthropic Claude models support a "thinking" feature that allows the model to show its reasoning process before providing a final answer. This is especially useful for complex questions that require step-by-step reasoning, such as math, logic, and analysis tasks.
Supported Models
The thinking feature is supported by the following Claude models:
Model capabilities:
API request structure is the same across all supported models, but output behavior varies.
Thinking Configuration
To enable thinking, configure the following:
- Set a thinking budget: The budgetTokens must be >= 1024 and less than maxTokens.
- Set temperature to 1.0: Required when thinking is enabled.
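The two constraints above can be checked before a request is sent. The following is a minimal plain-Java sketch of such a pre-flight check. `ThinkingConfigCheck` is a hypothetical helper, not part of Spring AI or the Anthropic SDK, and whether the library performs an equivalent check itself is not stated here.

```java
/** Hypothetical pre-flight check; not part of Spring AI or the Anthropic SDK. */
final class ThinkingConfigCheck {

    static final long MIN_BUDGET_TOKENS = 1024;

    private ThinkingConfigCheck() {
    }

    /** Throws IllegalArgumentException if the settings violate the constraints listed above. */
    static void validate(long budgetTokens, long maxTokens, double temperature) {
        if (budgetTokens < MIN_BUDGET_TOKENS) {
            throw new IllegalArgumentException("budgetTokens must be >= 1024 but was " + budgetTokens);
        }
        if (budgetTokens >= maxTokens) {
            throw new IllegalArgumentException("budgetTokens must be less than maxTokens (" + maxTokens + ")");
        }
        if (temperature != 1.0) {
            throw new IllegalArgumentException("temperature must be 1.0 when thinking is enabled");
        }
    }

    public static void main(String[] args) {
        validate(10_000, 16_000, 1.0); // budget 10000, maxTokens 16000, temperature 1.0
        System.out.println("thinking configuration is valid");
    }
}
```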
Convenience Builder Methods
AnthropicChatOptions.Builder provides convenience methods for the three thinking modes:
// Enable thinking with a specific token budget
var budgetOptions = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-20250514")
    .temperature(1.0)
    .maxTokens(16000)
    .thinkingEnabled(10000L) // budget must be >= 1024 and < maxTokens
    .build();
// Let Claude adaptively decide whether to think
var adaptiveOptions = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-20250514")
    .thinkingAdaptive()
    .build();
// Explicitly disable thinking
var disabledOptions = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-20250514")
    .thinkingDisabled()
    .build();
You can also use the raw SDK ThinkingConfigParam directly:
import com.anthropic.models.messages.ThinkingConfigParam;
import com.anthropic.models.messages.ThinkingConfigEnabled;
var options = AnthropicChatOptions.builder()
.thinking(ThinkingConfigParam.ofEnabled(
ThinkingConfigEnabled.builder().budgetTokens(10000L).build()))
.build();
Non-streaming Example
var options = AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.temperature(1.0)
.maxTokens(16000)
.thinkingEnabled(10000L)
.build();
ChatResponse response = chatModel.call(
new Prompt("Are there an infinite number of prime numbers such that n mod 4 == 3?", options));
// The response contains multiple generations:
// - ThinkingBlock generations (with "signature" in metadata)
// - TextBlock generations (with the final answer)
for (Generation generation : response.getResults()) {
AssistantMessage message = generation.getOutput();
if (message.getMetadata().containsKey("signature")) {
// This is a thinking block - contains Claude's reasoning
System.out.println("Thinking: " + message.getText());
System.out.println("Signature: " + message.getMetadata().get("signature"));
}
else if (message.getMetadata().containsKey("data")) {
// This is a redacted thinking block (safety-redacted reasoning)
System.out.println("Redacted thinking data: " + message.getMetadata().get("data"));
}
else if (message.getText() != null && !message.getText().isBlank()) {
// This is the final text response
System.out.println("Answer: " + message.getText());
}
}
Streaming Example
Thinking is fully supported in streaming mode. Thinking deltas and signature deltas are emitted as they arrive:
var options = AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.temperature(1.0)
.maxTokens(16000)
.thinkingEnabled(10000L)
.build();
Flux<ChatResponse> stream = chatModel.stream(
new Prompt("Are there an infinite number of prime numbers such that n mod 4 == 3?", options));
stream.subscribe(response -> {
Generation generation = response.getResult();
AssistantMessage message = generation.getOutput();
if (message.getMetadata().containsKey("thinking")) {
// Incremental thinking content
System.out.print(message.getText());
}
else if (message.getMetadata().containsKey("signature")) {
// Thinking block signature (emitted at end of thinking)
System.out.println("\nSignature: " + message.getMetadata().get("signature"));
}
else if (message.getText() != null) {
// Final text content
System.out.print(message.getText());
}
});
Response Structure
When thinking is enabled, the response contains different types of content:
| Content Type | Metadata Key | Description |
|---|---|---|
| Thinking Block | | Claude’s reasoning text with a cryptographic signature. In sync mode, the thinking text is in |
| Redacted Thinking | | Safety-redacted reasoning. Contains only a |
| Signature (streaming) | | In streaming mode, the signature arrives as a separate delta at the end of a thinking block. |
| Thinking Delta (streaming) | | Incremental thinking text chunks during streaming. The |
| Text Block | (none) | The final answer text in |
Multi-Modal Support
The Anthropic SDK module supports multi-modal inputs, allowing you to send images and PDF documents alongside text in your prompts.
Image Input
Send images to Claude for analysis using the Media class:
var imageResource = new ClassPathResource("/test-image.png");
var userMessage = UserMessage.builder()
.text("What do you see in this image?")
.media(List.of(new Media(MimeTypeUtils.IMAGE_PNG, imageResource)))
.build();
ChatResponse response = chatModel.call(new Prompt(List.of(userMessage)));
Supported image formats: PNG, JPEG, GIF, WebP. Images can be provided as:
- Byte arrays (automatically base64-encoded)
- HTTPS URLs (passed directly to the API)
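For byte-array input the encoding happens automatically, so no manual step is needed. Purely to show what "base64-encoded" means for image bytes, here is a small plain-Java sketch using the JDK's `java.util.Base64`. The class name `Base64ImageSketch` is hypothetical, and this is not the code path Spring AI uses internally.

```java
import java.util.Base64;

/** Illustrates base64 encoding of raw image bytes with the JDK encoder. */
final class Base64ImageSketch {

    private Base64ImageSketch() {
    }

    /** Encodes raw bytes as a standard (non URL-safe) base64 string. */
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        // The 8-byte signature that every PNG file starts with.
        byte[] pngSignature = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(encode(pngSignature));
    }
}
```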
PDF Document Input
Send PDF documents for Claude to analyze:
var pdfResource = new ClassPathResource("/document.pdf");
var userMessage = UserMessage.builder()
.text("Please summarize this document.")
.media(List.of(new Media(new MimeType("application", "pdf"), pdfResource)))
.build();
ChatResponse response = chatModel.call(new Prompt(List.of(userMessage)));
Citations
Anthropic’s Citations API allows Claude to reference specific parts of provided documents when generating responses. When citation documents are included in a prompt, Claude can cite the source material, and citation metadata (character ranges, page numbers, or content blocks) is returned in the response metadata.
Citations help improve:
- Accuracy verification: Users can verify Claude’s responses against source material
- Transparency: See exactly which parts of documents informed the response
- Compliance: Meet requirements for source attribution in regulated industries
- Trust: Build confidence by showing where information came from
Supported Models
Citations are supported on Claude 3.7 Sonnet and Claude 4 models (Opus and Sonnet).
Document Types
Three types of citation documents are supported: plain text, PDF, and custom content blocks.
Creating Citation Documents
Use the AnthropicCitationDocument builder to create documents that can be cited:
Plain Text Documents
AnthropicCitationDocument document = AnthropicCitationDocument.builder()
.plainText("The Eiffel Tower was completed in 1889 in Paris, France. " +
"It stands 330 meters tall and was designed by Gustave Eiffel.")
.title("Eiffel Tower Facts")
.citationsEnabled(true)
.build();
PDF Documents
// From file path
AnthropicCitationDocument document = AnthropicCitationDocument.builder()
    .pdfFile("path/to/document.pdf")
    .title("Technical Specification")
    .citationsEnabled(true)
    .build();
// From byte array
byte[] pdfBytes = loadPdfBytes();
AnthropicCitationDocument documentFromBytes = AnthropicCitationDocument.builder()
    .pdf(pdfBytes)
    .title("Product Manual")
    .citationsEnabled(true)
    .build();
Custom Content Blocks
For fine-grained citation control, use custom content blocks:
AnthropicCitationDocument document = AnthropicCitationDocument.builder()
.customContent(
"The Great Wall of China is approximately 21,196 kilometers long.",
"It was built over many centuries, starting in the 7th century BC.",
"The wall was constructed to protect Chinese states from invasions."
)
.title("Great Wall Facts")
.citationsEnabled(true)
.build();
Using Citations in Requests
Include citation documents in your chat options:
ChatResponse response = chatModel.call(
new Prompt(
"When was the Eiffel Tower built and how tall is it?",
AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.maxTokens(1024)
.citationDocuments(document)
.build()
)
);
Multiple Documents
You can provide multiple documents for Claude to reference:
AnthropicCitationDocument parisDoc = AnthropicCitationDocument.builder()
.plainText("Paris is the capital city of France with a population of 2.1 million.")
.title("Paris Information")
.citationsEnabled(true)
.build();
AnthropicCitationDocument eiffelDoc = AnthropicCitationDocument.builder()
.plainText("The Eiffel Tower was designed by Gustave Eiffel for the 1889 World's Fair.")
.title("Eiffel Tower History")
.citationsEnabled(true)
.build();
ChatResponse response = chatModel.call(
new Prompt(
"What is the capital of France and who designed the Eiffel Tower?",
AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.citationDocuments(parisDoc, eiffelDoc)
.build()
)
);
Accessing Citations
Citations are returned in the response metadata:
ChatResponse response = chatModel.call(prompt);
// Get citations from metadata (may be null when the response has no citations)
List<Citation> citations = (List<Citation>) response.getMetadata().get("citations");
// Optional: Get citation count directly from metadata
Integer citationCount = (Integer) response.getMetadata().get("citationCount");
System.out.println("Total citations: " + citationCount);
// Process each citation, guarding against a missing citations entry
if (citations != null) {
    for (Citation citation : citations) {
        System.out.println("Document: " + citation.getDocumentTitle());
        System.out.println("Location: " + citation.getLocationDescription());
        System.out.println("Cited text: " + citation.getCitedText());
        System.out.println("Document index: " + citation.getDocumentIndex());
        System.out.println();
    }
}
Citation Types
Citations contain different location information depending on the document type:
Character Location (Plain Text)
For plain text documents, citations include character indices:
Citation citation = citations.get(0);
if (citation.getType() == Citation.LocationType.CHAR_LOCATION) {
int start = citation.getStartCharIndex();
int end = citation.getEndCharIndex();
String text = citation.getCitedText();
System.out.println("Characters " + start + "-" + end + ": " + text);
}
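Character indices make it possible to locate the cited span in the original document text. The sketch below assumes the start index is 0-based and the end index is exclusive, which matches Java's `String.substring` convention; treat that as an assumption and confirm it against the Anthropic Citations documentation for your API version. `CharLocationSketch` is a hypothetical helper, and Java indices count UTF-16 code units, so text with supplementary characters may need extra care.

```java
/** Hypothetical helper that recovers a cited span from character indices. */
final class CharLocationSketch {

    private CharLocationSketch() {
    }

    /** Returns documentText[startCharIndex, endCharIndex), assuming an exclusive end index. */
    static String citedSpan(String documentText, int startCharIndex, int endCharIndex) {
        if (startCharIndex < 0 || endCharIndex > documentText.length() || startCharIndex > endCharIndex) {
            throw new IllegalArgumentException(
                    "Invalid range " + startCharIndex + "-" + endCharIndex
                            + " for text of length " + documentText.length());
        }
        return documentText.substring(startCharIndex, endCharIndex);
    }

    public static void main(String[] args) {
        String document = "The Eiffel Tower was completed in 1889 in Paris, France.";
        System.out.println(citedSpan(document, 0, 38));
    }
}
```

Comparing the recovered span with the citation's cited text is a cheap way to confirm that the indices refer to the document you think they do.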
Page Location (PDF)
For PDF documents, citations include page numbers:
Citation citation = citations.get(0);
if (citation.getType() == Citation.LocationType.PAGE_LOCATION) {
int startPage = citation.getStartPageNumber();
int endPage = citation.getEndPageNumber();
System.out.println("Pages " + startPage + "-" + endPage);
}
Content Block Location (Custom Content)
For custom content, citations reference specific content blocks:
Citation citation = citations.get(0);
if (citation.getType() == Citation.LocationType.CONTENT_BLOCK_LOCATION) {
int startBlock = citation.getStartBlockIndex();
int endBlock = citation.getEndBlockIndex();
System.out.println("Content blocks " + startBlock + "-" + endBlock);
}
Complete Example
Here’s a complete example demonstrating citation usage:
// Create a citation document
AnthropicCitationDocument document = AnthropicCitationDocument.builder()
.plainText("Spring AI is an application framework for AI engineering. " +
"It provides a Spring-friendly API for developing AI applications. " +
"The framework includes abstractions for chat models, embedding models, " +
"and vector databases.")
.title("Spring AI Overview")
.citationsEnabled(true)
.build();
// Call the model with the document
ChatResponse response = chatModel.call(
new Prompt(
"What is Spring AI?",
AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.maxTokens(1024)
.citationDocuments(document)
.build()
)
);
// Display the response
System.out.println("Response: " + response.getResult().getOutput().getText());
System.out.println("\nCitations:");
// Process citations
List<Citation> citations = (List<Citation>) response.getMetadata().get("citations");
if (citations != null && !citations.isEmpty()) {
for (int i = 0; i < citations.size(); i++) {
Citation citation = citations.get(i);
System.out.println("\n[" + (i + 1) + "] " + citation.getDocumentTitle());
System.out.println(" Location: " + citation.getLocationDescription());
System.out.println(" Text: " + citation.getCitedText());
}
} else {
System.out.println("No citations were provided in the response.");
}
Best Practices
- Use descriptive titles: Provide meaningful titles for citation documents to help users identify sources in the citations.
- Check for null citations: Not all responses will include citations, so always validate the citations metadata exists before accessing it.
- Consider document size: Larger documents provide more context but consume more input tokens and may affect response time.
- Leverage multiple documents: When answering questions that span multiple sources, provide all relevant documents in a single request rather than making multiple calls.
- Use appropriate document types: Choose plain text for simple content, PDF for existing documents, and custom content blocks when you need fine-grained control over citation granularity.
Citation Document Options
Context Field
Optionally provide context about the document that won’t be cited but can guide Claude’s understanding:
AnthropicCitationDocument document = AnthropicCitationDocument.builder()
.plainText("...")
.title("Legal Contract")
.context("This is a merger agreement dated January 2024 between Company A and Company B")
.build();
Controlling Citations
By default, citations are disabled for all documents (opt-in behavior).
To enable citations, explicitly set citationsEnabled(true):
AnthropicCitationDocument document = AnthropicCitationDocument.builder()
.plainText("The Eiffel Tower was completed in 1889...")
.title("Historical Facts")
.citationsEnabled(true) // Explicitly enable citations for this document
.build();
You can also provide documents without citations for background context:
AnthropicCitationDocument backgroundDoc = AnthropicCitationDocument.builder()
.plainText("Background information about the industry...")
.title("Context Document")
// citationsEnabled defaults to false - Claude will use this but not cite it
.build();
Anthropic requires consistent citation settings across all documents in a request. You cannot mix citation-enabled and citation-disabled documents in the same request.
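Because mixed settings are not allowed, it can help to check a set of documents before building the request. The plain-Java sketch below shows one way to do that. The `Doc` record is a hypothetical stand-in that carries only a title and the citations flag; it is not the `AnthropicCitationDocument` class, whose accessors are not shown on this page.

```java
import java.util.List;

/** Hypothetical consistency check for citation settings; not part of Spring AI. */
final class CitationSettingsCheck {

    /** Stand-in for a citation document, reduced to the fields needed here. */
    record Doc(String title, boolean citationsEnabled) {
    }

    private CitationSettingsCheck() {
    }

    /** True when every document has the same citationsEnabled value (or the list is empty). */
    static boolean hasConsistentSettings(List<Doc> documents) {
        return documents.stream().map(Doc::citationsEnabled).distinct().count() <= 1;
    }

    public static void main(String[] args) {
        List<Doc> mixed = List.of(
                new Doc("Historical Facts", true),
                new Doc("Context Document", false));
        System.out.println("consistent: " + hasConsistentSettings(mixed));
    }
}
```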
Prompt Caching
Anthropic’s Prompt Caching reduces costs and latency by caching repeated context across API calls. The Anthropic SDK module supports prompt caching with configurable strategies, TTL, and per-message-type settings.
Caching Strategies
Five caching strategies are available via AnthropicCacheStrategy:
| Strategy | Description |
|---|---|
| | No caching (default). No cache control headers are added. |
| | Cache system message content. Uses 1 cache breakpoint. |
| | Cache tool definitions only. Uses 1 cache breakpoint. |
| | Cache both system messages and tool definitions. Uses 2 cache breakpoints. |
| | Cache system messages, tool definitions, and conversation messages. Uses up to 4 cache breakpoints. |
Anthropic allows a maximum of 4 cache breakpoints per request. The implementation tracks breakpoint usage and stops adding cache control once the limit is reached.
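The breakpoint limit can be pictured as a small counter that hands out at most four slots per request. The sketch below is a plain-Java illustration of that bookkeeping. `CacheBreakpointTracker` is a hypothetical class written for this page and is not the tracker Spring AI uses internally.

```java
/** Hypothetical per-request breakpoint counter; illustrative only. */
final class CacheBreakpointTracker {

    static final int MAX_BREAKPOINTS = 4;

    private int used;

    /** Reserves one breakpoint if any are left; returns false once the limit is reached. */
    boolean tryUse() {
        if (used >= MAX_BREAKPOINTS) {
            return false;
        }
        used++;
        return true;
    }

    /** Number of breakpoints still available for this request. */
    int remaining() {
        return MAX_BREAKPOINTS - used;
    }

    public static void main(String[] args) {
        CacheBreakpointTracker tracker = new CacheBreakpointTracker();
        for (int i = 1; i <= 5; i++) {
            System.out.println("breakpoint " + i + " granted: " + tracker.tryUse());
        }
    }
}
```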
Basic Usage
var options = AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.maxTokens(1024)
.cacheOptions(AnthropicCacheOptions.builder()
.strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
.build())
.build();
ChatResponse response = chatModel.call(
new Prompt(List.of(
new SystemMessage("You are an expert assistant with deep domain knowledge..."),
new UserMessage("What is the capital of France?")),
options));
Cache Configuration Options
AnthropicCacheOptions provides fine-grained control over caching behavior:
var cacheOptions = AnthropicCacheOptions.builder()
.strategy(AnthropicCacheStrategy.SYSTEM_AND_TOOLS)
.messageTypeTtl(MessageType.SYSTEM, AnthropicCacheTtl.ONE_HOUR) // 1 hour TTL
.messageTypeMinContentLength(MessageType.SYSTEM, 100) // Min 100 chars
.multiBlockSystemCaching(true) // Per-block caching
.build();
| Option | Description | Default |
|---|---|---|
| | The caching strategy to use. | |
| | TTL per message type. Available values: | |
| | Minimum content length required before caching a message type. | |
| | Custom function to compute content length (e.g., token counting). | |
| | When | |
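The minimum-length and custom-length options work together: the length function decides how content is measured, and the minimum decides whether it is worth caching. The plain-Java sketch below illustrates that interaction with a rough token estimate. Everything in it is hypothetical, including the `ToIntFunction<String>` parameter type and the four-characters-per-token heuristic; the functional interface that `AnthropicCacheOptions` actually expects is not shown on this page.

```java
import java.util.function.ToIntFunction;

/** Illustrates a minimum-content-length check with a pluggable length function. */
final class MinContentLengthSketch {

    private MinContentLengthSketch() {
    }

    /** Rough token estimate: about four characters per token (a heuristic, not an exact count). */
    static int estimateTokens(String text) {
        return (text.length() + 3) / 4;
    }

    /** True when the measured length of content reaches the configured minimum. */
    static boolean meetsMinimum(String content, int minLength, ToIntFunction<String> lengthFunction) {
        return lengthFunction.applyAsInt(content) >= minLength;
    }

    public static void main(String[] args) {
        String systemPrompt = "You are an expert assistant. ".repeat(10);
        // Measured in characters versus measured in estimated tokens.
        System.out.println("by characters: " + meetsMinimum(systemPrompt, 100, String::length));
        System.out.println("by est. tokens: " + meetsMinimum(systemPrompt, 100, MinContentLengthSketch::estimateTokens));
    }
}
```

The same content can pass a character-based minimum and fail a token-based one, so the minimum should be chosen in the unit the length function returns.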
Multi-Block System Caching
When you have both a static system prompt and dynamic instructions, use multi-block system caching to cache only the static portion:
var cacheOptions = AnthropicCacheOptions.builder()
.strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
.multiBlockSystemCaching(true)
.build();
ChatResponse response = chatModel.call(
new Prompt(List.of(
new SystemMessage("You are an expert knowledge base assistant..."), // Static (cached)
new SystemMessage("Today's date is 2025-02-23. User timezone: PST"), // Dynamic
new UserMessage("What are the latest updates?")),
AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.cacheOptions(cacheOptions)
.build()));
Accessing Cache Token Usage
Cache token metrics are available through the native SDK Usage object:
ChatResponse response = chatModel.call(prompt);
com.anthropic.models.messages.Usage sdkUsage =
(com.anthropic.models.messages.Usage) response.getMetadata().getUsage().getNativeUsage();
long cacheCreation = sdkUsage.cacheCreationInputTokens().orElse(0L);
long cacheRead = sdkUsage.cacheReadInputTokens().orElse(0L);
System.out.println("Cache creation tokens: " + cacheCreation);
System.out.println("Cache read tokens: " + cacheRead);
On the first request, cacheCreationInputTokens will be non-zero (tokens written to cache). On subsequent requests with the same cached prefix, cacheReadInputTokens will be non-zero (tokens read from cache at reduced cost).
Conversation History Caching
The CONVERSATION_HISTORY strategy caches the entire conversation context, including system messages, tool definitions, and the last user message. This is useful for multi-turn conversations where the growing context would otherwise be re-processed on every request:
var cacheOptions = AnthropicCacheOptions.builder()
.strategy(AnthropicCacheStrategy.CONVERSATION_HISTORY)
.build();
var options = AnthropicChatOptions.builder()
.model("claude-sonnet-4-20250514")
.cacheOptions(cacheOptions)
.build();
// First turn
ChatResponse response1 = chatModel.call(
new Prompt(List.of(
new SystemMessage("You are a helpful assistant."),
new UserMessage("What is machine learning?")),
options));
// Second turn - previous context is cached
ChatResponse response2 = chatModel.call(
new Prompt(List.of(
new SystemMessage("You are a helpful assistant."),
new UserMessage("What is machine learning?"),
new AssistantMessage(response1.getResult().getOutput().getText()),
new UserMessage("Can you give me an example?")),
options));
Structured Output
Structured output constrains Claude to produce responses conforming to a JSON schema. The Anthropic SDK module also supports Anthropic’s effort control for tuning response quality vs speed.
Model Requirement
Structured output and effort control require
Schema Requirements
When using JSON schema output, Anthropic requires
JSON Schema Output
Constrain Claude’s responses to a specific JSON schema using the outputSchema convenience method:
var options = AnthropicChatOptions.builder()
.model("claude-sonnet-4-6")
.outputSchema("""
{
"type": "object",
"properties": {
"name": {"type": "string"},
"capital": {"type": "string"},
"population": {"type": "integer"}
},
"required": ["name", "capital"],
"additionalProperties": false
}
""")
.build();
ChatResponse response = chatModel.call(new Prompt("Tell me about France.", options));
// Response text will be valid JSON conforming to the schema
Effort Control
Control how much compute Claude spends on its response. Lower effort means faster, cheaper responses; higher effort means more thorough reasoning.
| Effort Level | Description |
|---|---|
| | Fast and concise responses with minimal reasoning |
| | Balanced trade-off between speed and thoroughness |
| | More thorough reasoning and detailed responses |
| | Maximum compute for the most thorough possible responses |
var options = AnthropicChatOptions.builder()
.model("claude-sonnet-4-6")
.effort(OutputConfig.Effort.LOW)
.build();
ChatResponse response = chatModel.call(new Prompt("What is the capital of France?", options));
Combined Schema and Effort
You can combine JSON schema output with effort control:
var options = AnthropicChatOptions.builder()
.model("claude-sonnet-4-6")
.outputSchema("""
{
"type": "object",
"properties": {
"answer": {"type": "integer"},
"explanation": {"type": "string"}
},
"required": ["answer", "explanation"],
"additionalProperties": false
}
""")
.effort(OutputConfig.Effort.HIGH)
.build();
ChatResponse response = chatModel.call(
new Prompt("What is 15 * 23? Show your reasoning.", options));
Direct OutputConfig
For full control, use the SDK’s OutputConfig directly:
import com.anthropic.models.messages.OutputConfig;
import com.anthropic.models.messages.JsonOutputFormat;
import com.anthropic.core.JsonValue;
var outputConfig = OutputConfig.builder()
.effort(OutputConfig.Effort.HIGH)
.format(JsonOutputFormat.builder()
.schema(JsonOutputFormat.Schema.builder()
.putAdditionalProperty("type", JsonValue.from("object"))
.putAdditionalProperty("properties", JsonValue.from(Map.of(
"name", Map.of("type", "string"))))
.putAdditionalProperty("additionalProperties", JsonValue.from(false))
.build())
.build())
.build();
var options = AnthropicChatOptions.builder()
.model("claude-sonnet-4-6")
.outputConfig(outputConfig)
.build();
ChatResponse response = chatModel.call(new Prompt("Tell me about France.", options));
Per-Request HTTP Headers
The Anthropic SDK module supports per-request HTTP headers, which are injected into individual API calls. This is distinct from customHeaders (which are set at the client level for all requests).
Per-request headers are useful for:
- Request tracking: Adding correlation IDs or trace headers per request
- Beta API access: Including beta feature headers for specific requests
- Routing: Adding routing or priority headers for load balancing
var options = AnthropicChatOptions.builder()
.httpHeaders(Map.of(
"X-Request-Id", "req-12345",
"X-Custom-Tracking", "my-tracking-value"))
.build();
ChatResponse response = chatModel.call(new Prompt("Hello", options));
httpHeaders are per-request and set via MessageCreateParams.putAdditionalHeader(). They do not affect other requests. For headers that should apply to all requests, use customHeaders instead.
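For the request-tracking use case, a fresh correlation ID is typically generated for every call. The plain-Java sketch below builds such a header map with the JDK's `UUID` class. `RequestHeaderSketch` is a hypothetical helper, and the `X-Request-Id` header name simply follows the example above.

```java
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper that builds per-request tracking headers. */
final class RequestHeaderSketch {

    private RequestHeaderSketch() {
    }

    /** Returns a single-entry header map with a newly generated correlation ID. */
    static Map<String, String> trackingHeaders() {
        return Map.of("X-Request-Id", UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        System.out.println(trackingHeaders());
    }
}
```

The returned map has the same shape as the one passed to httpHeaders in the example above, so it could be supplied there on each request.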
Sample Controller
Here is an example of a simple @RestController class that uses the chat model for text generation:
@RestController
public class ChatController {

    private final AnthropicChatModel chatModel;

    public ChatController() {
        var options = AnthropicChatOptions.builder()
            .model("claude-sonnet-4-20250514")
            .maxTokens(1024)
            .apiKey(System.getenv("ANTHROPIC_API_KEY"))
            .build();
        this.chatModel = new AnthropicChatModel(options);
    }

    @GetMapping("/ai/generate")
    public Map<String, String> generate(
            @RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", chatModel.call(message));
    }

    @GetMapping("/ai/generateStream")
    public Flux<ChatResponse> generateStream(
            @RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return chatModel.stream(prompt);
    }
}
Accessing the Raw Response
The full Anthropic SDK Message object is available in the response metadata under the "anthropic-response" key. This provides access to any fields not explicitly mapped by Spring AI’s abstraction:
ChatResponse response = chatModel.call(new Prompt("Hello"));
com.anthropic.models.messages.Message rawMessage =
(com.anthropic.models.messages.Message) response.getMetadata().get("anthropic-response");
// Access native SDK fields
rawMessage.stopReason(); // Optional<StopReason>
rawMessage.content(); // List<ContentBlock>
rawMessage.usage(); // Usage with cache token details
The raw response is available for synchronous calls only. Streaming responses do not include it.
Skills
Anthropic’s Skills API extends Claude’s capabilities with specialized, pre-packaged abilities for document generation. Skills enable Claude to create actual downloadable files — Excel spreadsheets, PowerPoint presentations, Word documents, and PDFs — rather than just describing what these documents might contain.
|
Supported Models: Skills are supported on Claude Sonnet 4, Claude Sonnet 4.5, Claude Opus 4, and later models.
Requirements
|
Pre-built Anthropic Skills
Spring AI provides type-safe access to Anthropic’s pre-built skills through the AnthropicSkill enum:
| Skill | Description | Generated File Type |
|---|---|---|
|
Excel spreadsheet generation and manipulation |
|
|
PowerPoint presentation creation |
|
|
Word document generation |
|
|
PDF document creation |
|
Basic Usage
Enable skills by adding them to your AnthropicChatOptions:
ChatResponse response = chatModel.call(
new Prompt(
"Create an Excel spreadsheet with Q1 2025 sales data. " +
"Include columns for Month, Revenue, and Expenses with 3 rows of sample data.",
AnthropicChatOptions.builder()
.model(Model.CLAUDE_SONNET_4_5)
.maxTokens(4096)
.skill(AnthropicSkill.XLSX)
.build()
)
);
// Claude will generate an actual Excel file
String responseText = response.getResult().getOutput().getText();
System.out.println(responseText);
// Output: "I've created an Excel spreadsheet with your Q1 2025 sales data..."
Multiple Skills
You can enable multiple skills in a single request (up to 8):
ChatResponse response = chatModel.call(
new Prompt(
"Create a sales report with both an Excel file containing the raw data " +
"and a PowerPoint presentation summarizing the key findings.",
AnthropicChatOptions.builder()
.model(Model.CLAUDE_SONNET_4_5)
.maxTokens(8192)
.skill(AnthropicSkill.XLSX)
.skill(AnthropicSkill.PPTX)
.build()
)
);
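When the skill list is assembled dynamically, it may be worth checking it against the limit of 8 skills per request before handing it to the options builder. The following is a minimal sketch under that assumption. SkillLimit and requireWithinLimit are hypothetical names, and the helper is generic so that it does not depend on any Spring AI type.

```java
import java.util.List;

// Hypothetical guard (not part of Spring AI): rejects skill lists that exceed
// the documented per-request limit before a request is built.
final class SkillLimit {

    // Limit taken from the documentation text above.
    static final int MAX_SKILLS_PER_REQUEST = 8;

    private SkillLimit() {
    }

    static <T> List<T> requireWithinLimit(List<T> skills) {
        if (skills.size() > MAX_SKILLS_PER_REQUEST) {
            throw new IllegalArgumentException(
                    "At most " + MAX_SKILLS_PER_REQUEST + " skills are allowed per request, got "
                            + skills.size());
        }
        // Return an immutable copy so later changes to the input do not affect the result.
        return List.copyOf(skills);
    }
}
```

The returned list can then be iterated to call .skill(...) once per entry on the options builder.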
Using AnthropicSkillContainer for Advanced Configuration
For more control over skill types and versions, use AnthropicSkillContainer directly:
AnthropicSkillContainer container = AnthropicSkillContainer.builder()
.skill(AnthropicSkill.XLSX)
.skill(AnthropicSkill.PPTX, "20251013") // Specific version
.build();
ChatResponse response = chatModel.call(
new Prompt(
"Generate the quarterly report",
AnthropicChatOptions.builder()
.model(Model.CLAUDE_SONNET_4_5)
.maxTokens(4096)
.skillContainer(container)
.build()
)
);
Downloading Generated Files
When Claude generates files using Skills, the response contains file IDs that can be used to download the actual files via the Files API.
Spring AI provides the AnthropicSkillsResponseHelper utility class for extracting file IDs and downloading files.
Extracting File IDs
import org.springframework.ai.anthropic.AnthropicSkillsResponseHelper;
ChatResponse response = chatModel.call(prompt);
// Extract all file IDs from the response
List<String> fileIds = AnthropicSkillsResponseHelper.extractFileIds(response);
for (String fileId : fileIds) {
System.out.println("Generated file ID: " + fileId);
}
Downloading All Files
The AnthropicSkillsResponseHelper provides a convenience method to download all generated files at once.
This requires the AnthropicClient instance (the same one used to create the chat model):
import com.anthropic.client.AnthropicClient;
@Autowired
private AnthropicClient anthropicClient;
// Download all files to a target directory
Path targetDir = Path.of("generated-files");
Files.createDirectories(targetDir);
List<Path> savedFiles = AnthropicSkillsResponseHelper.downloadAllFiles(
response, anthropicClient, targetDir);
for (Path file : savedFiles) {
System.out.println("Downloaded: " + file.getFileName() +
" (" + Files.size(file) + " bytes)");
}
Complete Example
Here’s a complete example showing Skills usage with file download:
@Service
public class DocumentGenerationService {
private final AnthropicChatModel chatModel;
private final AnthropicClient anthropicClient;
public DocumentGenerationService(AnthropicChatModel chatModel,
AnthropicClient anthropicClient) {
this.chatModel = chatModel;
this.anthropicClient = anthropicClient;
}
public Path generateSalesReport(String quarter, Path outputDir) throws IOException {
// Generate Excel report using Skills
ChatResponse response = chatModel.call(
new Prompt(
"Create an Excel spreadsheet with " + quarter + " sales data. " +
"Include Month, Revenue, Expenses, and Profit columns.",
AnthropicChatOptions.builder()
.model(Model.CLAUDE_SONNET_4_5)
.maxTokens(4096)
.skill(AnthropicSkill.XLSX)
.build()
)
);
// Extract file IDs from the response
List<String> fileIds = AnthropicSkillsResponseHelper.extractFileIds(response);
if (fileIds.isEmpty()) {
throw new RuntimeException("No file was generated");
}
// Download all generated files
List<Path> savedFiles = AnthropicSkillsResponseHelper.downloadAllFiles(
response, anthropicClient, outputDir);
return savedFiles.get(0);
}
}
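The complete example returns the first downloaded path, which assumes the first file is the spreadsheet. If a response could contain several files, selecting by extension is a more defensive option. The sketch below uses JDK classes only; GeneratedFiles and firstWithExtension are hypothetical names introduced for illustration.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper (not part of Spring AI): picks a downloaded file by its
// extension instead of relying on its position in the list.
final class GeneratedFiles {

    private GeneratedFiles() {
    }

    static Optional<Path> firstWithExtension(List<Path> files, String extension) {
        String suffix = "." + extension.toLowerCase(Locale.ROOT);
        return files.stream()
                // getFileName() returns null for a path with no name elements.
                .filter(path -> path.getFileName() != null)
                // Compare case-insensitively so "report.XLSX" matches "xlsx".
                .filter(path -> path.getFileName().toString()
                        .toLowerCase(Locale.ROOT).endsWith(suffix))
                .findFirst();
    }
}
```

In generateSalesReport, the last statement could then become GeneratedFiles.firstWithExtension(savedFiles, "xlsx") followed by orElseThrow(), which also covers the case where the download list is empty.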
Best Practices
-
Use appropriate models: Skills work best with Claude Sonnet 4 and later models. Ensure you’re using a supported model.
-
Set sufficient max tokens: Document generation can require significant tokens. Use maxTokens(4096) or higher for complex documents.
-
Be specific in prompts: Provide clear, detailed instructions about document structure, content, and formatting.
-
Handle file downloads promptly: Generated files expire after 24 hours. Download files soon after generation.
-
Check for file IDs: Always verify that file IDs were returned before attempting downloads. Some prompts may result in text responses without file generation.
-
Use defensive error handling: Wrap file operations in try-catch blocks to handle network issues or expired files gracefully.
List<String> fileIds = AnthropicSkillsResponseHelper.extractFileIds(response);
if (fileIds.isEmpty()) {
// Claude may have responded with text instead of generating a file
String text = response.getResult().getOutput().getText();
log.warn("No files generated. Response: {}", text);
return;
}
try {
List<Path> files = AnthropicSkillsResponseHelper.downloadAllFiles(
response, anthropicClient, targetDir);
// Process files...
} catch (IOException e) {
// Pass the exception itself so the stack trace is logged, not only the message
log.error("Failed to download generated files", e);
}
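For transient network failures, a bounded retry around the download call is one possible extension of the try-catch pattern above. The sketch below is a generic helper built only on the JDK. DownloadRetry, IoCall, and withRetries are hypothetical names, and the attempt count and delay are values the caller has to choose. Retrying does not help once a file has expired, so the number of attempts should stay small.

```java
import java.io.IOException;

// Hypothetical helper (not part of Spring AI): retries an I/O operation a fixed
// number of times, doubling the delay between attempts.
final class DownloadRetry {

    // A supplier-like callback that is allowed to throw IOException.
    @FunctionalInterface
    interface IoCall<T> {
        T run() throws IOException;
    }

    private DownloadRetry() {
    }

    static <T> T withRetries(IoCall<T> call, int maxAttempts, long initialDelayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Wait before the next attempt, then double the delay.
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        // All attempts failed: rethrow the most recent failure.
        throw last;
    }
}
```

A call such as DownloadRetry.withRetries(() -> AnthropicSkillsResponseHelper.downloadAllFiles(response, anthropicClient, targetDir), 3, 500L) would then make up to three attempts, assuming downloadAllFiles reports failures as IOException as in the snippet above.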
Observability
The Anthropic SDK implementation supports Spring AI’s observability features through Micrometer. All chat model operations are instrumented for monitoring and tracing.
Limitations
The following features are not yet supported:
-
Amazon Bedrock backend
-
Google Vertex AI backend
These features are planned for future releases.