This version is still under development and is not yet considered stable. For the latest stable release, use Spring AI 1.1.3.

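Text-to-Speech (TTS) API

Spring AI provides a unified API for text-to-speech (TTS) through the TextToSpeechModel and StreamingTextToSpeechModel interfaces. This lets you write portable code that works across different TTS providers.

Common Interfaces

All TTS providers implement the following shared interfaces:

TextToSpeechModel

The TextToSpeechModel interface provides methods for converting text to speech: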

public interface TextToSpeechModel extends Model<TextToSpeechPrompt, TextToSpeechResponse>, StreamingTextToSpeechModel {

    /**
     * Converts text to speech with default options.
     */
    default byte[] call(String text) {
        // Default implementation
    }

    /**
     * Converts text to speech with custom options.
     */
    TextToSpeechResponse call(TextToSpeechPrompt prompt);

    /**
     * Returns the default options for this model.
     */
    default TextToSpeechOptions getDefaultOptions() {
        // Default implementation
    }
}
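The default method bodies are elided in the listing above. As a minimal sketch of the pattern, and not the actual Spring AI source, a convenience overload such as call(String) could wrap the text in a prompt and unwrap the audio from the response. The SpeechPrompt, SpeechResponse, and SpeechModelSketch types below are simplified stand-ins invented for this example.

```java
import java.nio.charset.StandardCharsets;

// Simplified stand-ins for illustration only; these are NOT the Spring AI types.
record SpeechPrompt(String text) {}

record SpeechResponse(byte[] audio) {}

interface SpeechModelSketch {

    // The one method a provider would have to implement.
    SpeechResponse call(SpeechPrompt prompt);

    // Convenience overload: wrap the text in a prompt, then unwrap the audio bytes.
    default byte[] call(String text) {
        return call(new SpeechPrompt(text)).audio();
    }
}

final class DelegationDemo {

    // Fake provider that "synthesizes" by returning the text's UTF-8 bytes.
    static final SpeechModelSketch FAKE =
        prompt -> new SpeechResponse(prompt.text().getBytes(StandardCharsets.UTF_8));

    public static void main(String[] args) {
        System.out.println(FAKE.call("hi").length + " bytes");
    }
}
```

Because only one method is abstract, a provider (or a test fake, as here) can be supplied as a lambda while callers still get the String overload for free.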

StreamingTextToSpeechModel

The StreamingTextToSpeechModel interface provides methods for streaming audio in real time:

@FunctionalInterface
public interface StreamingTextToSpeechModel extends StreamingModel<TextToSpeechPrompt, TextToSpeechResponse> {

    /**
     * Streams text-to-speech responses with metadata.
     */
    Flux<TextToSpeechResponse> stream(TextToSpeechPrompt prompt);

    /**
     * Streams audio bytes for the given text.
     */
    default Flux<byte[]> stream(String text) {
        // Default implementation
    }
}

TextToSpeechPrompt

The TextToSpeechPrompt class encapsulates the input text and the options:

TextToSpeechPrompt prompt = new TextToSpeechPrompt(
    "Hello, this is a text-to-speech example.",
    options
);

TextToSpeechResponse

The TextToSpeechResponse class contains the generated audio and metadata:

TextToSpeechResponse response = model.call(prompt);
byte[] audioBytes = response.getResult().getOutput();
TextToSpeechResponseMetadata metadata = response.getMetadata();
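Once the audio is available as a byte array, a common next step is to persist it. The sketch below uses only the JDK; AudioFileWriter is a hypothetical helper name, and the three-byte array is a placeholder for the bytes returned by response.getResult().getOutput().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class AudioFileWriter {

    // Writes the audio bytes to the target path, replacing any existing file content.
    static Path save(byte[] audioBytes, Path target) throws IOException {
        return Files.write(target, audioBytes);
    }

    public static void main(String[] args) throws IOException {
        byte[] audioBytes = {1, 2, 3}; // placeholder for response.getResult().getOutput()
        Path saved = save(audioBytes, Files.createTempFile("speech", ".mp3"));
        System.out.println("Wrote " + Files.size(saved) + " bytes to " + saved);
    }
}
```

The file extension should match the format the provider actually returns; the ".mp3" suffix here is only an assumption for the example.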

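Writing Provider-Agnostic Code

One of the main advantages of the shared TTS interfaces is that you can write code that works with any TTS provider without modification. The actual provider (OpenAI, ElevenLabs, and so on) is determined by your Spring Boot configuration, so you can switch providers without changing application code.

Basic Service Example

A service that depends only on the shared interface works with any TTS provider: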

@Service
public class NarrationService {

    private final TextToSpeechModel textToSpeechModel;

    public NarrationService(TextToSpeechModel textToSpeechModel) {
        this.textToSpeechModel = textToSpeechModel;
    }

    public byte[] narrate(String text) {
        // Works with any TTS provider
        return textToSpeechModel.call(text);
    }

    public byte[] narrateWithOptions(String text, TextToSpeechOptions options) {
        TextToSpeechPrompt prompt = new TextToSpeechPrompt(text, options);
        TextToSpeechResponse response = textToSpeechModel.call(prompt);
        return response.getResult().getOutput();
    }
}

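This service works unchanged with OpenAI, ElevenLabs, or any other TTS provider; the actual implementation is determined by your Spring Boot configuration.

Advanced Example: Multi-Provider Support

You can build applications that support several TTS providers at the same time: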

@Service
public class MultiProviderNarrationService {

    private final Map<String, TextToSpeechModel> providers;

    public MultiProviderNarrationService(List<TextToSpeechModel> models) {
        // Spring will inject all available TextToSpeechModel beans
        this.providers = models.stream()
            .collect(Collectors.toMap(
                model -> model.getClass().getSimpleName(),
                model -> model
            ));
    }

    public byte[] narrateWithProvider(String text, String providerName) {
        TextToSpeechModel model = providers.get(providerName);
        if (model == null) {
            throw new IllegalArgumentException("Unknown provider: " + providerName);
        }
        return model.call(text);
    }

    public Set<String> getAvailableProviders() {
        return providers.keySet();
    }
}
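Keying the map by getClass().getSimpleName() works as long as every bean has a distinct class; Collectors.toMap throws IllegalStateException on duplicate keys, so two beans of the same class would fail at startup. Spring can also inject a Map<String, TextToSpeechModel> keyed by bean name, which avoids that collision. The sketch below shows the lookup logic under that assumption, using a one-method stand-in interface (Narrator) and hypothetical bean names so that it runs without Spring.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Stand-in for TextToSpeechModel, reduced to the single method this sketch needs.
interface Narrator {
    byte[] call(String text);
}

final class NamedProviderRegistry {

    private final Map<String, Narrator> providers;

    // In a Spring application this map could be constructor-injected as
    // Map<String, TextToSpeechModel>, which Spring keys by bean name.
    NamedProviderRegistry(Map<String, Narrator> providers) {
        this.providers = new HashMap<>(providers); // defensive copy
    }

    byte[] narrateWithProvider(String text, String providerName) {
        Narrator model = providers.get(providerName);
        if (model == null) {
            throw new IllegalArgumentException(
                "Unknown provider: " + providerName + " (available: " + availableProviders() + ")");
        }
        return model.call(text);
    }

    // Sorted so the listing is stable across runs.
    Set<String> availableProviders() {
        return new TreeSet<>(providers.keySet());
    }

    public static void main(String[] args) {
        Narrator first = text -> new byte[] {1};
        Narrator second = text -> new byte[] {2};
        // Hypothetical bean names, chosen for the example only.
        NamedProviderRegistry registry = new NamedProviderRegistry(
            Map.of("openAiSpeech", first, "elevenLabsSpeech", second));
        System.out.println(registry.availableProviders());
    }
}
```

Bean names are also more stable than class names as a public-facing key, since they do not change when a provider's implementation class is renamed.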

Streaming Audio Example

The shared interfaces also support streaming for real-time audio generation:

@Service
public class StreamingNarrationService {

    private final TextToSpeechModel textToSpeechModel;

    public StreamingNarrationService(TextToSpeechModel textToSpeechModel) {
        this.textToSpeechModel = textToSpeechModel;
    }

    public Flux<byte[]> streamNarration(String text) {
        // TextToSpeechModel extends StreamingTextToSpeechModel
        return textToSpeechModel.stream(text);
    }

    public Flux<TextToSpeechResponse> streamWithMetadata(String text, TextToSpeechOptions options) {
        TextToSpeechPrompt prompt = new TextToSpeechPrompt(text, options);
        return textToSpeechModel.stream(prompt);
    }
}
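Each element emitted by the byte stream is one chunk of audio. When a caller needs the whole clip, for instance to cache it, the chunks can be concatenated in arrival order. The sketch below takes a plain List<byte[]> so that it runs without Reactor; with Reactor, such a list could be obtained from Flux.collectList(). AudioChunks is a hypothetical helper name.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

final class AudioChunks {

    // Concatenates the chunks, in list order, into a single byte array.
    static byte[] concat(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] clip = concat(List.of(new byte[] {1, 2}, new byte[] {3}));
        System.out.println(clip.length + " bytes");
    }
}
```

Buffering the full stream gives up the latency benefit of streaming, so this is best reserved for cases where the complete audio is genuinely needed.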

REST Controller Example

Build a REST API on top of provider-agnostic TTS:

@RestController
@RequestMapping("/api/tts")
public class TextToSpeechController {

    private final TextToSpeechModel textToSpeechModel;

    public TextToSpeechController(TextToSpeechModel textToSpeechModel) {
        this.textToSpeechModel = textToSpeechModel;
    }

    @PostMapping(value = "/synthesize", produces = "audio/mpeg")
    public ResponseEntity<byte[]> synthesize(@RequestBody SynthesisRequest request) {
        byte[] audio = textToSpeechModel.call(request.text());
        return ResponseEntity.ok()
            .contentType(MediaType.parseMediaType("audio/mpeg"))
            .header("Content-Disposition", "attachment; filename=\"speech.mp3\"")
            .body(audio);
    }

    @GetMapping(value = "/stream", produces = MediaType.APPLICATION_OCTET_STREAM_VALUE)
    public Flux<byte[]> streamSynthesis(@RequestParam String text) {
        return textToSpeechModel.stream(text);
    }

    record SynthesisRequest(String text) {}
}
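The /synthesize endpoint above hard-codes audio/mpeg, which is only correct when the provider returns MP3. If the output format is configurable, the Content-Type should follow it. The sketch below is one possible lookup from a format name to a media type; the format names are assumptions based on common audio formats, not values guaranteed by any provider, and AudioMediaTypes is a hypothetical helper.

```java
import java.util.Locale;
import java.util.Map;

final class AudioMediaTypes {

    private static final String FALLBACK = "application/octet-stream";

    // Commonly used media types for a few widespread audio formats.
    private static final Map<String, String> BY_FORMAT = Map.of(
        "mp3", "audio/mpeg",
        "wav", "audio/wav",
        "flac", "audio/flac",
        "aac", "audio/aac");

    // Returns the media type for a format name, or a generic binary type if unknown.
    static String forFormat(String format) {
        if (format == null) {
            return FALLBACK;
        }
        return BY_FORMAT.getOrDefault(format.toLowerCase(Locale.ROOT), FALLBACK);
    }

    public static void main(String[] args) {
        System.out.println(forFormat("MP3"));
    }
}
```

The returned string could then be passed to MediaType.parseMediaType(...) in the controller in place of the literal.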

Configuration-Based Provider Selection

Use Spring profiles or properties to switch between providers:

# application-openai.yml
spring:
  ai:
    model:
      audio:
        speech: openai
    openai:
      api-key: ${OPENAI_API_KEY}
      audio:
        speech:
          options:
            model: gpt-4o-mini-tts
            voice: alloy

# application-elevenlabs.yml
spring:
  ai:
    model:
      audio:
        speech: elevenlabs
    elevenlabs:
      api-key: ${ELEVENLABS_API_KEY}
      tts:
        options:
          model-id: eleven_turbo_v2_5
          voice-id: your_voice_id

Then activate the desired provider:

# Use OpenAI
java -jar app.jar --spring.profiles.active=openai

# Use ElevenLabs
java -jar app.jar --spring.profiles.active=elevenlabs

Using Portable Options

For maximum portability, use only the methods of the common TextToSpeechOptions interface:

@Service
public class PortableNarrationService {

    private final TextToSpeechModel textToSpeechModel;

    public PortableNarrationService(TextToSpeechModel textToSpeechModel) {
        this.textToSpeechModel = textToSpeechModel;
    }

    public byte[] createPortableNarration(String text) {
        // Use provider's default options for maximum portability
        TextToSpeechOptions defaultOptions = textToSpeechModel.getDefaultOptions();
        TextToSpeechPrompt prompt = new TextToSpeechPrompt(text, defaultOptions);
        TextToSpeechResponse response = textToSpeechModel.call(prompt);
        return response.getResult().getOutput();
    }
}

Using Provider-Specific Features

When you need provider-specific features, you can still use them while keeping the rest of the code portable:

@Service
public class FlexibleNarrationService {

    private final TextToSpeechModel textToSpeechModel;

    public FlexibleNarrationService(TextToSpeechModel textToSpeechModel) {
        this.textToSpeechModel = textToSpeechModel;
    }

    public byte[] narrate(String text, TextToSpeechOptions baseOptions) {
        TextToSpeechOptions options = baseOptions;

        // Apply provider-specific optimizations if available
        if (textToSpeechModel instanceof OpenAiAudioSpeechModel) {
            options = OpenAiAudioSpeechOptions.builder()
                .from(baseOptions)
                .model("tts-1-hd")  // OpenAI-specific: use high-quality model
                .speed(1.0)
                .build();
        } else if (textToSpeechModel instanceof ElevenLabsTextToSpeechModel) {
            // ElevenLabs-specific options could go here
        }

        TextToSpeechPrompt prompt = new TextToSpeechPrompt(text, options);
        TextToSpeechResponse response = textToSpeechModel.call(prompt);
        return response.getResult().getOutput();
    }
}

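Best Practices for Portable Code

  1. Depend on interfaces: always inject the TextToSpeechModel interface rather than a concrete implementation.

  2. Use common options: for maximum portability, stick to the TextToSpeechOptions interface methods.

  3. Handle metadata gracefully: different providers return different metadata, so handle it generically.

  4. Test with multiple providers: make sure your code works with at least two TTS providers.

  5. Document provider assumptions: if you rely on provider-specific behavior, document it explicitly.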
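Practices 1 and 4 can be combined in a small contract check: code that depends only on the interface is run against several implementations with the same expectation. The sketch below uses a one-method stand-in (SpeechSynth) and two fake providers so that it runs without Spring AI; in a real test suite the list would hold actual TextToSpeechModel beans.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Stand-in for TextToSpeechModel, reduced to one method for this sketch.
interface SpeechSynth {
    byte[] call(String text);
}

final class ProviderContractCheck {

    // Code under test: depends only on the interface, never on a concrete provider.
    static byte[] narrate(SpeechSynth model, String text) {
        if (text == null || text.isBlank()) {
            throw new IllegalArgumentException("text must not be blank");
        }
        return model.call(text);
    }

    // Applies the same expectation to every implementation; true only if all return audio.
    static boolean allProduceAudio(List<SpeechSynth> models, String text) {
        for (SpeechSynth model : models) {
            byte[] audio = narrate(model, text);
            if (audio == null || audio.length == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SpeechSynth fakeA = text -> text.getBytes(StandardCharsets.UTF_8); // fake provider A
        SpeechSynth fakeB = text -> new byte[] {0, 1, 2};                  // fake provider B
        System.out.println(allProduceAudio(List.of(fakeA, fakeB), "hello"));
    }
}
```

The check deliberately asserts only what every provider can promise (non-empty audio), leaving provider-specific details such as format or metadata out of the shared contract.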

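Provider-Specific Features

The shared interfaces provide portability, but each provider also exposes its own features through a provider-specific options class (for example, OpenAiAudioSpeechOptions or ElevenLabsSpeechOptions). These classes implement the TextToSpeechOptions interface while adding provider-specific capabilities.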