Voice and Speech
Use speech transcription and text-to-speech providers for voice input, generated narration, and channel delivery.
Voice and speech features help agents work with audio. Depending on which providers are configured, the app can transcribe audio, synthesize speech, and deliver audio output through tasks or channels.
Configure Speech Providers
- Open Settings
- Find the speech or provider configuration area
- Add the required API key or provider credentials
- Save the settings
- Run a small test task before using speech in a production workflow
Provider availability depends on your build and configured credentials.
Transcribe Audio
Use transcription when a task includes meetings, voice notes, interviews, or media files.
- Attach an audio or video file to a task
- Ask the agent to transcribe it
- Specify whether you need a verbatim transcript, summary, action items, or timestamps
- Review the transcript before using it as source material
For long files, ask for a section-by-section outline or summary first, then request full detail only for the sections you need.
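One way to work through a long recording section by section is to split it into fixed time ranges and request each range separately. The sketch below is illustrative only: the function names, the 10-minute default, and the prompt wording are assumptions, not part of the app.

```python
def split_into_sections(total_seconds: int, section_seconds: int = 600) -> list[tuple[int, int]]:
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if total_seconds < 0 or section_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and section_seconds must be > 0")
    return [
        (start, min(start + section_seconds, total_seconds))
        for start in range(0, total_seconds, section_seconds)
    ]


def format_timestamp(seconds: int) -> str:
    """Format a second offset as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def section_prompts(total_seconds: int, section_seconds: int = 600) -> list[str]:
    """Build one hypothetical summary request per section."""
    return [
        f"Summarize the audio from {format_timestamp(start)} to {format_timestamp(end)}."
        for start, end in split_into_sections(total_seconds, section_seconds)
    ]
```

For a 25-minute file, `section_prompts(1500)` yields three requests, the last covering 00:20:00 to 00:25:00. After reviewing the summaries, ask for a verbatim transcript of only the ranges that matter.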
Generate Speech
Use text-to-speech when the output should become narration, a voice message, or spoken media.
Useful requests include:
- "Create a 45-second narration for this product demo."
- "Generate a friendly voiceover script and speech file for the onboarding clip."
- "Turn this summary into a short audio update for the team."
Review the script before synthesis when the message is customer-facing.
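When a request names a target length, such as the 45-second narration above, a rough word budget helps during script review. The sketch below assumes a speaking rate of 150 words per minute. That figure is a common rule of thumb, not a value taken from the app or any provider, and actual pacing varies by voice and language.

```python
def word_budget(duration_seconds: float, words_per_minute: int = 150) -> int:
    """Approximate number of words that fit in the target duration."""
    return int(duration_seconds * words_per_minute / 60)


def estimated_duration(script: str, words_per_minute: int = 150) -> float:
    """Approximate spoken length of a script in seconds."""
    return len(script.split()) * 60 / words_per_minute
```

Under that assumption, `word_budget(45)` gives about 112 words. If `estimated_duration` for a draft is well over the target, trim the script before generating audio.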
Speech in Channels
Channel delivery can include audio or media outputs when the target platform supports the file type. Use this for:
- Daily spoken summaries
- Incident update voice notes
- Generated narration for media reviews
- Accessibility-friendly recap formats
Check platform file limits before sending large audio or video files.
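A size check like this can be scripted before delivery. The sketch below is a minimal example. The platform names and limits in `EXAMPLE_LIMITS_BYTES` are placeholders, not real platform values, so replace them with the limits documented by the platforms you use.

```python
from pathlib import Path

# Hypothetical per-platform upload limits in bytes.
# These are placeholders for illustration, not real platform values.
EXAMPLE_LIMITS_BYTES = {
    "example-chat": 25 * 1024 * 1024,
    "example-email": 10 * 1024 * 1024,
}


def fits_limit(size_bytes: int, limit_bytes: int) -> bool:
    """Return True when the file size does not exceed the limit."""
    return size_bytes <= limit_bytes


def check_upload(path: str, platform: str, limits: dict = EXAMPLE_LIMITS_BYTES) -> str:
    """Compare a file's size on disk with the limit configured for a platform."""
    size = Path(path).stat().st_size
    limit = limits[platform]
    if fits_limit(size, limit):
        return f"OK: {size} bytes is within the {limit}-byte limit for {platform}"
    return f"Too large: {size} bytes exceeds the {limit}-byte limit for {platform}"
```

If a file is too large, consider a shorter clip or a more compressed format before sending.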
Privacy and Consent
Audio can contain sensitive personal data. Follow these rules:
- Transcribe only files you are allowed to process
- Remove or redact sensitive excerpts before sharing
- Do not send private recordings to public channels
- Confirm provider policy before uploading regulated or confidential audio
- Keep source files in the workspace only as long as needed
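As a first pass at the redaction rule above, simple patterns can mask obvious identifiers in a transcript before it is shared. This is a sketch under loose assumptions: the two regular expressions and the placeholder labels are illustrative, they catch only common email and phone formats, and they will miss names, addresses, and other sensitive details. Manual review is still required.

```python
import re

# Illustrative patterns only; they cover common formats, not every case.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(transcript: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    masked = EMAIL_PATTERN.sub("[REDACTED EMAIL]", transcript)
    return PHONE_PATTERN.sub("[REDACTED PHONE]", masked)
```

For example, `redact("Call me at +1 (555) 010-2345 or mail jo@example.com.")` returns `"Call me at [REDACTED PHONE] or mail [REDACTED EMAIL]."`.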
Troubleshooting
- If transcription fails, confirm the file format and size are supported
- If speech output sounds wrong, revise the script and the voice instructions, then generate it again
- If provider calls fail, check API keys and quota
- If channel delivery fails, export the audio file and upload it manually