Reference documentation | Package (download) | Additional samples on GitHub

In this quickstart, you create and run an application to recognize and transcribe speech to text in real time.

To instead transcribe audio files asynchronously, see What is batch transcription. If you're not sure which speech to text solution is right for you, see What is speech to text?

Prerequisites

- An Azure subscription. You can create one for free.
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys.

Set up the environment

The Speech SDK for Swift is distributed as a framework bundle. The framework supports both Objective-C and Swift on both iOS and macOS.

The Speech SDK can be used in Xcode projects as a CocoaPod, or downloaded directly and linked manually. This guide uses a CocoaPod. Install the CocoaPod dependency manager as described in its installation instructions.
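If you declare the dependency yourself instead of using the sample project's Podfile, the general shape is shown below. The pod name and version here are assumptions to verify against the current Speech SDK release notes, not values confirmed by this article:

```ruby
# Sketch of a Podfile for a macOS target; the pod name and version are
# assumptions to check against the Speech SDK documentation.
platform :osx, '10.14'

target 'helloworld' do
  use_frameworks!
  pod 'MicrosoftCognitiveServicesSpeech-macOS', '~> 1.30'
end
```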

Set environment variables

You need to authenticate your application to access Azure AI services. For production, use a secure way to store and access your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine that runs the application.

Important: If you use an API key, store it securely somewhere else, such as in Azure Key Vault. Don't include the API key directly in your code, and never post it publicly. For more information about AI services security, see Authenticate requests to Azure AI services.

To set the environment variables for your Speech resource key and region, open a console window, and follow the instructions for your operating system and development environment.

Windows:

```console
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region
```

Note: If you only need the environment variables in the current console, you can set them with set instead of setx. After you add the environment variables, you might need to restart any programs that need to read them, including the console window.

Linux:

```bash
export SPEECH_KEY=your-key
export SPEECH_REGION=your-region
```

After you add the environment variables, run source ~/.bashrc from your console window to make the changes effective.

macOS:

```bash
export SPEECH_KEY=your-key
export SPEECH_REGION=your-region
```

After you add the environment variables, run source ~/.bash_profile from your console window to make the changes effective.

Replace your-key with one of your Speech resource keys and your-region with your Speech resource region (for example, westus). The SPEECH_KEY and SPEECH_REGION names are read by the sample code that follows.
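One gotcha worth knowing: a macOS GUI app launched from Finder or the Dock doesn't inherit variables exported in your shell profile. Either launch Xcode from the same terminal session or add SPEECH_KEY and SPEECH_REGION to the run scheme's environment variables. A small Swift check like this (not part of the sample) confirms whether the running process can see them:

```swift
import Foundation

// Prints whether the process can see the Speech credentials; useful when
// diagnosing an app that was launched without the exported shell variables.
let env = ProcessInfo.processInfo.environment
print("SPEECH_KEY set: \(env["SPEECH_KEY"] != nil)")
print("SPEECH_REGION set: \(env["SPEECH_REGION"] != nil)")
```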

Recognize speech from a microphone

Follow these steps to recognize speech in a macOS application.

1. Clone the Azure-Samples/cognitive-services-speech-sdk repository to get the Recognize speech from a microphone in Swift on macOS sample project, and open the helloworld sample directory in a terminal.
2. Run the command pod install. This command generates a helloworld.xcworkspace Xcode workspace that contains both the sample app and the Speech SDK as a dependency.
3. Open the helloworld.xcworkspace workspace in Xcode.
4. Open the file named AppDelegate.swift and locate the applicationDidFinishLaunching and recognizeFromMic methods as shown here.

```swift
import Cocoa

@NSApplicationMain
class AppDelegate: NSObject, NSApplicationDelegate {
    var label: NSTextField!
    var fromMicButton: NSButton!

    var sub: String!
    var region: String!

    @IBOutlet weak var window: NSWindow!

    func applicationDidFinishLaunching(_ aNotification: Notification) {
        print("loading")
        // Load subscription information from environment variables.
        sub = ProcessInfo.processInfo.environment["SPEECH_KEY"]
        region = ProcessInfo.processInfo.environment["SPEECH_REGION"]

        label = NSTextField(frame: NSRect(x: 100, y: 50, width: 200, height: 200))
        label.textColor = NSColor.black
        label.lineBreakMode = .byWordWrapping
        label.stringValue = "Recognition Result"
        label.isEditable = false
        self.window.contentView?.addSubview(label)

        fromMicButton = NSButton(frame: NSRect(x: 100, y: 300, width: 200, height: 30))
        fromMicButton.title = "Recognize"
        fromMicButton.target = self
        fromMicButton.action = #selector(fromMicButtonClicked)
        self.window.contentView?.addSubview(fromMicButton)
    }

    @objc func fromMicButtonClicked() {
        DispatchQueue.global(qos: .userInitiated).async {
            self.recognizeFromMic()
        }
    }

    func recognizeFromMic() {
        var speechConfig: SPXSpeechConfiguration?
        do {
            try speechConfig = SPXSpeechConfiguration(subscription: sub, region: region)
        } catch {
            print("error \(error) happened")
            speechConfig = nil
        }
        speechConfig?.speechRecognitionLanguage = "en-US"

        let audioConfig = SPXAudioConfiguration()

        let reco = try! SPXSpeechRecognizer(speechConfiguration: speechConfig!, audioConfiguration: audioConfig)

        reco.addRecognizingEventHandler() { reco, evt in
            print("intermediate recognition result: \(evt.result.text ?? "(no result)")")
            self.updateLabel(text: evt.result.text, color: .gray)
        }

        updateLabel(text: "Listening ...", color: .gray)
        print("Listening...")

        let result = try! reco.recognizeOnce()
        print("recognition result: \(result.text ?? "(no result)"), reason: \(result.reason.rawValue)")
        updateLabel(text: result.text, color: .black)

        if result.reason != SPXResultReason.recognizedSpeech {
            let cancellationDetails = try! SPXCancellationDetails(fromCanceledRecognitionResult: result)
            print("cancelled: \(result.reason), \(cancellationDetails.errorDetails)")
            print("Did you set the speech resource key and region values?")
            updateLabel(text: "Error: \(cancellationDetails.errorDetails)", color: .red)
        }
    }

    func updateLabel(text: String?, color: NSColor) {
        DispatchQueue.main.async {
            self.label.stringValue = text!
            self.label.textColor = color
        }
    }
}
```

5. In AppDelegate.swift, the app reads the environment variables that you previously set for your Speech resource key and region:

```swift
sub = ProcessInfo.processInfo.environment["SPEECH_KEY"]
region = ProcessInfo.processInfo.environment["SPEECH_REGION"]
```

6. To change the speech recognition language, replace en-US with another supported language. For example, use es-ES for Spanish (Spain). If you don't specify a language, the default is en-US.
7. Build and run the app. Important: Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described earlier. If you don't set these variables, the sample fails with an error message.

After you select the button in the app and say a few words, you should see the text that you spoke on the lower part of the screen. When you run the app for the first time, it prompts you to give the app access to your computer's microphone.
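macOS shows that prompt because microphone access requires user consent. If you build your own app bundle instead of using the sample, include an NSMicrophoneUsageDescription string in Info.plist (and the audio input entitlement if the app is sandboxed). Optionally, you can check or request access yourself with AVFoundation before starting recognition; this sketch isn't part of the sample:

```swift
import AVFoundation

// Check the current microphone authorization and request it if undetermined.
switch AVCaptureDevice.authorizationStatus(for: .audio) {
case .authorized:
    print("Microphone access already granted")
case .notDetermined:
    AVCaptureDevice.requestAccess(for: .audio) { granted in
        print("Microphone access granted: \(granted)")
    }
default:
    print("Enable microphone access in System Settings > Privacy & Security > Microphone")
}
```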

Remarks

This example uses the recognizeOnce operation to transcribe utterances of up to 30 seconds, or until silence is detected. For information about continuous recognition of longer audio, see How to recognize speech.
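As a minimal sketch of continuous recognition, assuming the same speechConfig and audioConfig objects shown in the sample and omitting error handling, the pattern looks along these lines:

```swift
// Continuous recognition keeps transcribing until you explicitly stop it,
// instead of returning after a single utterance like recognizeOnce.
let reco = try SPXSpeechRecognizer(speechConfiguration: speechConfig,
                                   audioConfiguration: audioConfig)

reco.addRecognizedEventHandler { _, evt in
    // Fires once per recognized phrase with the final text.
    print("final result: \(evt.result.text ?? "(no result)")")
}

try reco.startContinuousRecognition()
// ... keep the recognizer alive while audio is being captured ...
try reco.stopContinuousRecognition()
```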

Objective-C

The Speech SDK for Objective-C shares client libraries and reference documentation with the Speech SDK for Swift. For Objective-C code examples, see the recognize speech from a microphone in Objective-C on macOS sample project in GitHub.

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
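For example, with the Azure CLI you can delete the resource with a command along these lines; the name and resource group are placeholders for your own values:

```azurecli
az cognitiveservices account delete --name <your-speech-resource-name> --resource-group <your-resource-group>
```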