Generate speech using partner models

Last updated on 27 Oct 2025

Learn how to use partner models to generate audio clips with varying voices, tones, and accents.

With Generate speech (beta), you can use partner models to generate audio clips and quickly create natural-sounding voice-overs. You can vary the voice and tone by adjusting speech generation settings such as speed and stability.

On the Firefly homepage, select Generate from the left panel and then Generate speech (beta).

On the Generate speech page, copy and paste the text you want to convert into speech or select Add Text and upload a file in DOCX or TXT format.

After adding the text, navigate to the Speech settings panel on the left and use the Model dropdown to select ElevenLabs Multilingual v2.

Note

You can also create audio clips using the Firefly Speech model. 

Use the Voice dropdown menu and select a voice.

The Text-to-speech window on Firefly displays the Speech settings panel with a highlight over the Voice dropdown menu.
Pick the voice that best suits your project’s requirements and aligns with your creative goals.

You can give the selected voice character by adjusting its speed, stability, similarity, style exaggeration, and other general settings.

  • Speed: Drag the Speed bar right to speed up the generated audio, or left to slow it down.
  • Stability: Drag the Stability bar right to increase the stability of the generated audio, or left to decrease it.
  • Similarity: Drag the Similarity bar right to increase the similarity to the selected voice, or left to decrease it.
  • Style Exaggeration: Drag the Style Exaggeration bar right to increase adherence to the selected audio style, or left to decrease it.
  • Speaker boost: Toggle on the Speaker boost option to increase the similarity of the synthesized speech to the selected voice.
Tip
  • At the bottom of the left panel, select the play icon to hear a sample of the selected voice, then adjust the controls as needed.
  • You can also add the voice to your favourites by selecting the favourites icon.

In the main text editor window, you can make additional edits to the text entered:

  • Play: Preview selected text in your uploaded content before generating it.
The Text-to-speech screen displays the text editor window with the Play button highlighted to preview voice output.
Use the Play button to quickly preview how the text sounds with the selected voice settings.

  • Fix Pronunciation: Correct pronunciation and add guidance on how certain words should sound.
  • Find & Replace: Select words and replace them.
  • Add Text: Add more text to the uploaded content by importing a TXT or DOCX file.
  • Add Pause: Add pauses to make the audio track sound more natural.
  • Add Tone: Add tonality to your audio and define the intonation of the generated speech.

Select Generate.

Once you’re satisfied with how the generated audio sounds, select Download to save a copy of the audio file in WAV or MP3 format.