Reference documentation | Package (download) | Additional samples on GitHub
In this quickstart, you create and run an application to recognize and transcribe speech to text in real-time.
To instead transcribe audio files asynchronously, see What is batch transcription. If you're not sure which speech to text solution is right for you, see What is speech to text?
Prerequisites
- An Azure subscription. You can create one for free.
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys.
Set up the environment
The Speech SDK for Swift is distributed as a framework bundle. The framework supports both Objective-C and Swift on both iOS and macOS.
The Speech SDK can be used in Xcode projects as a CocoaPod, or downloaded directly and linked manually. This guide uses a CocoaPod. Install the CocoaPod dependency manager as described in its installation instructions.
Set environment variables
You need to authenticate your application to access Azure AI services. For production, use a secure way to store and access your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine that runs the application.
Important If you use an API key, store it securely somewhere else, such as in Azure Key Vault. Don't include the API key directly in your code, and never post it publicly. For more information about AI services security, see Authenticate requests to Azure AI services.
To set the environment variables for your Speech resource key and region, open a console window, and follow the instructions for your operating system and development environment.
The application reads two environment variables, SPEECH_KEY and SPEECH_REGION. In the following commands, replace your-key with one of the keys for your Speech resource and your-region with the region of your resource, such as westus.

Windows

setx SPEECH_KEY your-key
setx SPEECH_REGION your-region

If you only need to access the environment variables in the current console window, you can set them with set instead of setx. After you add the environment variables, you might need to restart any programs that need to read them, including the console window.

Linux

export SPEECH_KEY=your-key
export SPEECH_REGION=your-region

After you add the environment variables, run source ~/.bashrc from your console window to make the changes effective.

macOS

Add the environment variables to your ~/.bash_profile:

export SPEECH_KEY=your-key
export SPEECH_REGION=your-region

After you add the environment variables, run source ~/.bash_profile from your console window to make the changes effective.
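To verify that your app can see these variables, here's a minimal Swift sketch (not part of the quickstart code) that reads them with ProcessInfo, the same API the sample app uses later:

import Foundation

// Minimal check: read the Speech resource credentials from the environment.
// The quickstart app reads the same variables in applicationDidFinishLaunching.
let env = ProcessInfo.processInfo.environment
guard let speechKey = env["SPEECH_KEY"], let speechRegion = env["SPEECH_REGION"] else {
    fatalError("Set the SPEECH_KEY and SPEECH_REGION environment variables before running.")
}
print("Using Speech resource in region \(speechRegion), key length \(speechKey.count)")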
Recognize speech from a microphone
Follow these steps to recognize speech in a macOS application.
1. Clone the Azure-Samples/cognitive-services-speech-sdk repository to get the recognize speech from a microphone in Swift on macOS sample project.
2. In a terminal, navigate to the directory of the downloaded helloworld sample app.
3. Run the command pod install. This command generates a helloworld.xcworkspace Xcode workspace that contains both the sample app and the Speech SDK as a dependency.
4. Open the helloworld.xcworkspace workspace in Xcode.
5. Open the file named AppDelegate.swift, and locate the applicationDidFinishLaunching and recognizeFromMic methods as shown here.
import Cocoa

@NSApplicationMain
class AppDelegate: NSObject, NSApplicationDelegate {
    // The SPX* types used below are provided by the Speech SDK framework added through CocoaPods.
    var label: NSTextField!
    var fromMicButton: NSButton!

    var sub: String!
    var region: String!

    @IBOutlet weak var window: NSWindow!

    func applicationDidFinishLaunching(_ aNotification: Notification) {
        print("loading")
        // load subscription information
        sub = ProcessInfo.processInfo.environment["SPEECH_KEY"]
        region = ProcessInfo.processInfo.environment["SPEECH_REGION"]

        // Build a simple UI: a label for results and a button to start recognition.
        label = NSTextField(frame: NSRect(x: 100, y: 50, width: 200, height: 200))
        label.textColor = NSColor.black
        label.lineBreakMode = .byWordWrapping

        label.stringValue = "Recognition Result"
        label.isEditable = false

        self.window.contentView?.addSubview(label)

        fromMicButton = NSButton(frame: NSRect(x: 100, y: 300, width: 200, height: 30))
        fromMicButton.title = "Recognize"
        fromMicButton.target = self
        fromMicButton.action = #selector(fromMicButtonClicked)
        self.window.contentView?.addSubview(fromMicButton)
    }

    @objc func fromMicButtonClicked() {
        // Run recognition off the main thread so the UI stays responsive.
        DispatchQueue.global(qos: .userInitiated).async {
            self.recognizeFromMic()
        }
    }

    func recognizeFromMic() {
        var speechConfig: SPXSpeechConfiguration?
        do {
            try speechConfig = SPXSpeechConfiguration(subscription: sub, region: region)
        } catch {
            print("error \(error) happened")
            speechConfig = nil
        }
        speechConfig?.speechRecognitionLanguage = "en-US"

        // The default audio configuration uses the system microphone.
        let audioConfig = SPXAudioConfiguration()

        let reco = try! SPXSpeechRecognizer(speechConfiguration: speechConfig!, audioConfiguration: audioConfig)

        // Show intermediate (partial) results while the user is speaking.
        reco.addRecognizingEventHandler { reco, evt in
            print("intermediate recognition result: \(evt.result.text ?? "(no result)")")
            self.updateLabel(text: evt.result.text, color: .gray)
        }

        updateLabel(text: "Listening ...", color: .gray)
        print("Listening...")

        // Recognize a single utterance and wait for the final result.
        let result = try! reco.recognizeOnce()
        print("recognition result: \(result.text ?? "(no result)"), reason: \(result.reason.rawValue)")
        updateLabel(text: result.text, color: .black)

        if result.reason != SPXResultReason.recognizedSpeech {
            let cancellationDetails = try! SPXCancellationDetails(fromCanceledRecognitionResult: result)
            print("cancelled: \(result.reason), \(cancellationDetails.errorDetails)")
            print("Did you set the speech resource key and region values?")
            updateLabel(text: "Error: \(cancellationDetails.errorDetails)", color: .red)
        }
    }

    func updateLabel(text: String?, color: NSColor) {
        // UI updates must happen on the main thread.
        DispatchQueue.main.async {
            self.label.stringValue = text!
            self.label.textColor = color
        }
    }
}
The app reads the Speech resource key and region from the environment variables that you set previously:

sub = ProcessInfo.processInfo.environment["SPEECH_KEY"]
region = ProcessInfo.processInfo.environment["SPEECH_REGION"]
To change the speech recognition language, replace en-US with another supported language, such as es-ES for Spanish (Spain). If you don't specify a language, the default is en-US.
Make sure that you set the SPEECH_KEY and SPEECH_REGION environment variables as described earlier. If you don't set these variables, recognition fails with an error. Then build and run the app in Xcode.
After you select the button in the app and say a few words, you should see the text that you spoke on the lower part of the screen. When you run the app for the first time, it prompts you to give the app access to your computer's microphone.
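macOS shows this prompt only for apps that declare microphone usage; if it doesn't appear, confirm that the app's Info.plist contains an NSMicrophoneUsageDescription entry. If you want to check or request microphone access explicitly before starting recognition, a minimal sketch using AVFoundation (not part of the quickstart sample) could look like this:

import AVFoundation

// Sketch only: check the current microphone permission and request it if needed,
// then run the recognition work once access is granted.
func ensureMicrophoneAccess(then start: @escaping () -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .audio) {
    case .authorized:
        start()
    case .notDetermined:
        AVCaptureDevice.requestAccess(for: .audio) { granted in
            if granted { start() }
        }
    default:
        print("Microphone access is denied or restricted; enable it in System Settings > Privacy & Security.")
    }
}

For example, fromMicButtonClicked could wrap its call to recognizeFromMic in this check, although the Speech SDK already triggers the system prompt the first time it uses the microphone.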
Remarks
This example uses the recognizeOnce operation to transcribe utterances of up to 30 seconds, or until silence is detected.
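The recognizeOnce operation handles a single utterance; for longer audio, the Speech SDK also supports continuous recognition. The following is only a sketch of that pattern, assuming speechConfig and audioConfig are created the same way as in recognizeFromMic and that the hypothetical helper name is yours to choose; check the Speech SDK reference for the exact API in your SDK version.

// Sketch only (not from this quickstart): continuous recognition as an
// alternative to recognizeOnce, for audio longer than a single utterance.
// Assumes the Speech SDK is available to the app as in the sample above.
func recognizeContinuouslyFromMic(speechConfig: SPXSpeechConfiguration,
                                  audioConfig: SPXAudioConfiguration) throws -> SPXSpeechRecognizer {
    let reco = try SPXSpeechRecognizer(speechConfiguration: speechConfig, audioConfiguration: audioConfig)

    // Fires once per recognized utterance with the final text.
    reco.addRecognizedEventHandler { _, evt in
        print("final result: \(evt.result.text ?? "(no result)")")
    }

    try reco.startContinuousRecognition()
    // Keep a reference to the returned recognizer and call stopContinuousRecognition()
    // (for example, from a Stop button) when you're done.
    return reco
}

The caller holds on to the returned recognizer and later calls try reco.stopContinuousRecognition() to end the session.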
Objective-C
The Speech SDK for Objective-C shares client libraries and reference documentation with the Speech SDK for Swift. For Objective-C code examples, see the recognize speech from a microphone in Objective-C on macOS sample project in GitHub.
Clean up resources
You can use the Azure portal or the Azure Command-Line Interface (CLI) to remove the Speech resource you created.