Reference documentation | Package (download) | Additional samples on GitHub
In this quickstart, you create and run an application to recognize and transcribe speech to text in real-time.
To instead transcribe audio files asynchronously, see What is batch transcription. If you're not sure which speech to text solution is right for you, see What is speech to text?
Prerequisites
- An Azure subscription. You can create one for free.
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys.
Set up the environment
The Speech SDK for Swift is distributed as a framework bundle. The framework supports both Objective-C and Swift on both iOS and macOS.
The Speech SDK can be used in Xcode projects as a CocoaPod, or downloaded directly and linked manually. This guide uses a CocoaPod. Install the CocoaPod dependency manager as described in its installation instructions.
Set environment variables
You need to authenticate your application to access Azure AI services. For production, use a secure way to store and access your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine that runs the application.
Important If you use an API key, store it securely somewhere else, such as in Azure Key Vault. Don't include the API key directly in your code, and never post it publicly. For more information about AI services security, see Authenticate requests to Azure AI services.
To set the environment variables for your Speech resource key and region, open a console window, and follow the instructions for your operating system and development environment.
The application reads two environment variables, SPEECH_KEY and SPEECH_REGION. In the following commands, replace your-key with one of the keys for your Speech resource and your-region with the region of your resource, such as westus.

Windows

setx SPEECH_KEY your-key
setx SPEECH_REGION your-region

If you only need to access the environment variables in the current console window, you can set them with set instead of setx. After you add the environment variables, you might need to restart any programs that need to read them, including the console window.

Linux

export SPEECH_KEY=your-key
export SPEECH_REGION=your-region

After you add the environment variables, run source ~/.bashrc from your console window to make the changes effective.

macOS

Add the environment variables to your ~/.bash_profile:

export SPEECH_KEY=your-key
export SPEECH_REGION=your-region

After you add the environment variables, run source ~/.bash_profile from your console window to make the changes effective.
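To verify that your app can see these variables, here's a minimal Swift sketch (not part of the quickstart code) that reads them with ProcessInfo, the same API the sample app uses later:

import Foundation

// Minimal check: read the Speech resource credentials from the environment.
// The quickstart app reads the same variables in applicationDidFinishLaunching.
let env = ProcessInfo.processInfo.environment
guard let speechKey = env["SPEECH_KEY"], let speechRegion = env["SPEECH_REGION"] else {
    fatalError("Set the SPEECH_KEY and SPEECH_REGION environment variables before running.")
}
print("Using Speech resource in region \(speechRegion), key length \(speechKey.count)")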
Recognize speech from a microphone
Follow these steps to recognize speech in a macOS application.
1. Clone the Azure-Samples/cognitive-services-speech-sdk repository to get the recognize speech from a microphone in Swift on macOS sample project.
2. In a terminal, navigate to the directory of the downloaded helloworld sample app.
3. Run the command pod install. This command generates a helloworld.xcworkspace Xcode workspace that contains both the sample app and the Speech SDK as a dependency.
4. Open the helloworld.xcworkspace workspace in Xcode.
5. Open the file named AppDelegate.swift, and locate the applicationDidFinishLaunching and recognizeFromMic methods as shown here.
import Cocoa

@NSApplicationMain
class AppDelegate: NSObject, NSApplicationDelegate {
    // The SPX* types used below are provided by the Speech SDK framework added through CocoaPods.
    var label: NSTextField!
    var fromMicButton: NSButton!

    var sub: String!
    var region: String!

    @IBOutlet weak var window: NSWindow!

    func applicationDidFinishLaunching(_ aNotification: Notification) {
        print("loading")
        // load subscription information
        sub = ProcessInfo.processInfo.environment["SPEECH_KEY"]
        region = ProcessInfo.processInfo.environment["SPEECH_REGION"]

        // Build a simple UI: a label for results and a button to start recognition.
        label = NSTextField(frame: NSRect(x: 100, y: 50, width: 200, height: 200))
        label.textColor = NSColor.black
        label.lineBreakMode = .byWordWrapping

        label.stringValue = "Recognition Result"
        label.isEditable = false

        self.window.contentView?.addSubview(label)

        fromMicButton = NSButton(frame: NSRect(x: 100, y: 300, width: 200, height: 30))
        fromMicButton.title = "Recognize"
        fromMicButton.target = self
        fromMicButton.action = #selector(fromMicButtonClicked)
        self.window.contentView?.addSubview(fromMicButton)
    }

    @objc func fromMicButtonClicked() {
        // Run recognition off the main thread so the UI stays responsive.
        DispatchQueue.global(qos: .userInitiated).async {
            self.recognizeFromMic()
        }
    }

    func recognizeFromMic() {
        var speechConfig: SPXSpeechConfiguration?
        do {
            try speechConfig = SPXSpeechConfiguration(subscription: sub, region: region)
        } catch {
            print("error \(error) happened")
            speechConfig = nil
        }
        speechConfig?.speechRecognitionLanguage = "en-US"

        // The default audio configuration uses the system microphone.
        let audioConfig = SPXAudioConfiguration()

        let reco = try! SPXSpeechRecognizer(speechConfiguration: speechConfig!, audioConfiguration: audioConfig)

        // Show intermediate (partial) results while the user is speaking.
        reco.addRecognizingEventHandler { reco, evt in
            print("intermediate recognition result: \(evt.result.text ?? "(no result)")")
            self.updateLabel(text: evt.result.text, color: .gray)
        }

        updateLabel(text: "Listening ...", color: .gray)
        print("Listening...")

        // Recognize a single utterance and wait for the final result.
        let result = try! reco.recognizeOnce()
        print("recognition result: \(result.text ?? "(no result)"), reason: \(result.reason.rawValue)")
        updateLabel(text: result.text, color: .black)

        if result.reason != SPXResultReason.recognizedSpeech {
            let cancellationDetails = try! SPXCancellationDetails(fromCanceledRecognitionResult: result)
            print("cancelled: \(result.reason), \(cancellationDetails.errorDetails)")
            print("Did you set the speech resource key and region values?")
            updateLabel(text: "Error: \(cancellationDetails.errorDetails)", color: .red)
        }
    }

    func updateLabel(text: String?, color: NSColor) {
        // UI updates must happen on the main thread.
        DispatchQueue.main.async {
            self.label.stringValue = text!
            self.label.textColor = color
        }
    }
}
The app reads the Speech resource key and region from the environment variables that you set previously:

sub = ProcessInfo.processInfo.environment["SPEECH_KEY"]
region = ProcessInfo.processInfo.environment["SPEECH_REGION"]
To change the speech recognition language, replace en-US with another supported language, such as es-ES for Spanish (Spain). If you don't specify a language, the default is en-US.
Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described earlier. If you don't set these variables, recognition fails with an error. Then build and run the app in Xcode.
After you select the button in the app and say a few words, you should see the text that you spoke on the lower part of the screen. When you run the app for the first time, it prompts you to give the app access to your computer's microphone.
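macOS shows this prompt only for apps that declare microphone usage; if it doesn't appear, confirm that the app's Info.plist contains an NSMicrophoneUsageDescription entry. If you want to check or request microphone access explicitly before starting recognition, a minimal sketch using AVFoundation (not part of the quickstart sample) could look like this:

import AVFoundation

// Sketch only: check the current microphone permission and request it if needed,
// then run the recognition work once access is granted.
func ensureMicrophoneAccess(then start: @escaping () -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .audio) {
    case .authorized:
        start()
    case .notDetermined:
        AVCaptureDevice.requestAccess(for: .audio) { granted in
            if granted { start() }
        }
    default:
        print("Microphone access is denied or restricted; enable it in System Settings > Privacy & Security.")
    }
}

For example, fromMicButtonClicked could wrap its call to recognizeFromMic in this check, although the Speech SDK already triggers the system prompt the first time it uses the microphone.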
Remarks
This example uses the recognizeOnce operation to transcribe utterances of up to 30 seconds, or until silence is detected.
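The recognizeOnce operation handles a single utterance; for longer audio, the Speech SDK also supports continuous recognition. The following is only a sketch of that pattern, assuming speechConfig and audioConfig are created the same way as in recognizeFromMic and that the hypothetical helper name is yours to choose; check the Speech SDK reference for the exact API in your SDK version.

// Sketch only (not from this quickstart): continuous recognition as an
// alternative to recognizeOnce, for audio longer than a single utterance.
// Assumes the Speech SDK is available to the app as in the sample above.
func recognizeContinuouslyFromMic(speechConfig: SPXSpeechConfiguration,
                                  audioConfig: SPXAudioConfiguration) throws -> SPXSpeechRecognizer {
    let reco = try SPXSpeechRecognizer(speechConfiguration: speechConfig, audioConfiguration: audioConfig)

    // Fires once per recognized utterance with the final text.
    reco.addRecognizedEventHandler { _, evt in
        print("final result: \(evt.result.text ?? "(no result)")")
    }

    try reco.startContinuousRecognition()
    // Keep a reference to the returned recognizer and call stopContinuousRecognition()
    // (for example, from a Stop button) when you're done.
    return reco
}

The caller holds on to the returned recognizer and later calls try reco.stopContinuousRecognition() to end the session.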
Objective-C
The Speech SDK for Objective-C shares client libraries and reference documentation with the Speech SDK for Swift. For Objective-C code examples, see the recognize speech from a microphone in Objective-C on macOS sample project in GitHub.
Clean up resources
You can use the Azure portal or the Azure Command-Line Interface (CLI) to remove the Speech resource you created.