
OpenAI vs. ElevenLabs for natural text-to-speech: comparison + code

Both are excellent choices for “human-level” TTS, but they serve different needs. OpenAI is the budget-friendly, high-speed workhorse, while ElevenLabs is the premium “creative studio” for maximum realism and voice cloning.

The Breakdown

| Feature | OpenAI (TTS-1) | ElevenLabs |
|---|---|---|
| Best For | Developers needing affordable, fast, consistently good audio. | Creators needing specific voices, emotions, or voice cloning. |
| Cost | $15.00 per 1 million characters. | $100+ per 1 million characters (via subscription tiers).* |
| Voices | Limited to 6 preset voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer). | Thousands of community voices + ability to clone your own. |
| Quality | Excellent, but limited emotional control. | “Gold Standard.” Can whisper, shout, and laugh realistically. |
| Latency | Very low (great for real-time apps). | Low with “Turbo” models, but generally slightly slower. |

*Note: ElevenLabs uses a credit system. The “Creator” plan is roughly $11/mo for 100,000 characters. If you scale that to 1 million, it is significantly more expensive than OpenAI.
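As a quick sanity check, you can estimate the rendering cost of a script before sending anything to either API. This is a minimal sketch using the per-character prices quoted above as assumptions; it is simple arithmetic, not official billing logic from either vendor.

```python
# Rough cost estimate per provider, using the prices quoted above
# (assumed figures, not real billing data).
OPENAI_PER_MILLION = 15.00       # $ per 1M characters (TTS-1)
ELEVENLABS_PER_MILLION = 100.00  # $ per 1M characters (tier-dependent)

def estimate_cost(text: str, price_per_million: float) -> float:
    """Estimate TTS cost for a script, billed per character."""
    return len(text) / 1_000_000 * price_per_million

script = "Hello world. " * 1000  # ~13,000 characters
print(f"OpenAI:     ${estimate_cost(script, OPENAI_PER_MILLION):.4f}")
print(f"ElevenLabs: ${estimate_cost(script, ELEVENLABS_PER_MILLION):.4f}")
```

At these assumed rates, 100,000 characters costs about $1.50 on OpenAI versus roughly $10 on ElevenLabs, which is the gap the note above is describing.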


Creating a natural conversation

Suppose I have a conversation between three characters. For example:


Data Scientist: So, we’ve trained this model, it’s got 97% accuracy, but… we really don’t know if it’ll generalize.

Statistician: Don’t know? Just give it a confidence interval, call it a day.

Engineer: Wait, so… the model doesn’t even know if it knows? Is it that unsure?

Data Scientist: Exactly. We need to quantify how much it doesn’t know. That’s where uncertainty modeling comes in.

Engineer: So let me get this straight. We’re building a model… to model the things our model doesn’t know?

Statistician: Welcome to the world of nested probabilistic models. It’s like Russian dolls, but with more Greek letters.

Data Scientist: We’re adding Gaussian processes, Bayesian neural networks… we’ll throw all of it at the problem.

Engineer: So we're basically teaching the model to say, “I have no idea, but in a very precise way”?

Statistician: Exactly. Think of it as making the model more… honest.

Data Scientist: Right! And if we can quantify its uncertainty, then we can make decisions with more confidence.

Engineer: So now we’re confidently uncertain?

Statistician: Exactly! The more confident we are in our uncertainty, the better.

Engineer: …and we call this progress?

Data Scientist: Absolutely! Look, without uncertainty, our model would just be pretending to know everything. And that’s worse.

Statistician: In short, we’re certain that uncertainty is essential.

Engineer: Okay, but what happens if the uncertainty model itself is uncertain?

Data Scientist: Then we add another uncertainty model… to model the uncertainty of our uncertainty model.

Engineer: I think I need a high-confidence drink.

Now, I want to use text-to-speech in Python to create an audio file of the conversation.

  • Use OpenAI if you are okay with the 6 standard voices. You can map them like this:
    • Data Scientist → onyx (deep, serious)
    • Statistician → nova (clear, articulate)
    • Engineer → fable (British-leaning, distinct)
  • Use ElevenLabs if you want the Statistician to sound like an old British professor and the Engineer to sound like a grumpy New Yorker. You would clone or select those specific “actor” voices from their library.

Python Code: OpenAI TTS-1 Version

Since OpenAI is much cheaper and easier to set up for a quick test, here is the code to generate your conversation using the OpenAI API.

Prerequisites:

Bash

pip install openai pydub
# pydub needs ffmpeg installed and on your PATH for MP3 handling.
# You also need an OpenAI API key.

The Code:

Python

from openai import OpenAI
from pydub import AudioSegment
import io
import os

# 1. Setup OpenAI client
# Read the key from the environment rather than hardcoding it in source
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# 2. Define the conversation
conversation = [
    ("Data Scientist", "So, we’ve trained this model, it’s got 97% accuracy, but… we really don’t know if it’ll generalize."),
    ("Statistician", "Don’t know? Just give it a confidence interval, call it a day."),
    ("Engineer", "Wait, so… the model doesn’t even know if it knows? Is it that unsure?"),
    ("Data Scientist", "Exactly. We need to quantify how much it doesn’t know. That’s where uncertainty modeling comes in."),
    ("Engineer", "So let me get this straight. We’re building a model… to model the things our model doesn’t know?"),
    ("Statistician", "Welcome to the world of nested probabilistic models. It’s like Russian dolls, but with more Greek letters."),
    ("Data Scientist", "We’re adding Gaussian processes, Bayesian neural networks… we’ll throw all of it at the problem."),
    ("Engineer", "So we're basically teaching the model to say, “I have no idea, but in a very precise way”?"),
    ("Statistician", "Exactly. Think of it as making the model more… honest."),
    ("Data Scientist", "Right! And if we can quantify its uncertainty, then we can make decisions with more confidence."),
    ("Engineer", "So now we’re confidently uncertain?"),
    ("Statistician", "Exactly! The more confident we are in our uncertainty, the better."),
    ("Engineer", "…and we call this progress?"),
    ("Data Scientist", "Absolutely! Look, without uncertainty, our model would just be pretending to know everything. And that’s worse."),
    ("Statistician", "In short, we’re certain that uncertainty is essential."),
    ("Engineer", "Okay, but what happens if the uncertainty model itself is uncertain?"),
    ("Data Scientist", "Then we add another uncertainty model… to model the uncertainty of our uncertainty model."),
    ("Engineer", "I think I need a high-confidence drink.")
]

# 3. Map Characters to OpenAI Voices
# Available voices: alloy, echo, fable, onyx, nova, shimmer
voice_map = {
    "Data Scientist": "onyx",   # Deep, authoritative
    "Statistician": "nova",    # Professional, slightly higher pitch
    "Engineer": "fable"      # Distinct tone, often sounds a bit more casual
}

full_audio = AudioSegment.empty()
pause = AudioSegment.silent(duration=300)

print("Generating audio with OpenAI...")

for speaker, text in conversation:
    voice = voice_map.get(speaker, "alloy")
    
    try:
        # Request audio from OpenAI
        response = client.audio.speech.create(
            model="tts-1",
            voice=voice,
            input=text
        )
        
        # Stream response into memory
        # OpenAI returns binary data directly
        audio_data = io.BytesIO(response.content)
        
        # Convert to AudioSegment
        segment = AudioSegment.from_file(audio_data, format="mp3")
        full_audio += segment + pause
        
        print(f"Generated: {speaker} ({voice})")
        
    except Exception as e:
        print(f"Error generating line for {speaker}: {e}")

# 4. Export
output_file = "conversation_openai.mp3"
full_audio.export(output_file, format="mp3")
print(f"Done! Saved to {output_file}")

If you choose ElevenLabs:

The code logic is almost identical, but instead of client.audio.speech.create you use the elevenlabs Python library. You will need to copy the voice_id (a long string like TxGEqnHW...) for each character's voice that you picked from their website.


To use ElevenLabs, you will need to install their official Python library. This code is slightly more complex than the OpenAI version because ElevenLabs gives you granular control over Voice Settings (stability, similarity, etc.), which is what lets you make one character sound “excited/unstable” and another “serious/stable.”

Prerequisites

Bash

pip install elevenlabs pydub

The ElevenLabs Code

  • Voice IDs: I have filled in default “pre-made” voice IDs (Adam, Antoni, Rachel) below.
  • Custom Voices: If you clone your own voices on the ElevenLabs website, you will get a voice_id (e.g., TxGEqnHW...). You should paste those IDs into the voice_map dictionary below.

Python

import io
import os
from elevenlabs.client import ElevenLabs
from elevenlabs import VoiceSettings
from pydub import AudioSegment

# 1. Setup client
# Read the key from the environment rather than hardcoding it in source
client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# 2. Define the conversation
conversation = [
    ("Data Scientist", "So, we’ve trained this model, it’s got 97% accuracy, but… we really don’t know if it’ll generalize."),
    ("Statistician", "Don’t know? Just give it a confidence interval, call it a day."),
    ("Engineer", "Wait, so… the model doesn’t even know if it knows? Is it that unsure?"),
    ("Data Scientist", "Exactly. We need to quantify how much it doesn’t know. That’s where uncertainty modeling comes in."),
    ("Engineer", "So let me get this straight. We’re building a model… to model the things our model doesn’t know?"),
    ("Statistician", "Welcome to the world of nested probabilistic models. It’s like Russian dolls, but with more Greek letters."),
    ("Data Scientist", "We’re adding Gaussian processes, Bayesian neural networks… we’ll throw all of it at the problem."),
    ("Engineer", "So we're basically teaching the model to say, “I have no idea, but in a very precise way”?"),
    ("Statistician", "Exactly. Think of it as making the model more… honest."),
    ("Data Scientist", "Right! And if we can quantify its uncertainty, then we can make decisions with more confidence."),
    ("Engineer", "So now we’re confidently uncertain?"),
    ("Statistician", "Exactly! The more confident we are in our uncertainty, the better."),
    ("Engineer", "…and we call this progress?"),
    ("Data Scientist", "Absolutely! Look, without uncertainty, our model would just be pretending to know everything. And that’s worse."),
    ("Statistician", "In short, we’re certain that uncertainty is essential."),
    ("Engineer", "Okay, but what happens if the uncertainty model itself is uncertain?"),
    ("Data Scientist", "Then we add another uncertainty model… to model the uncertainty of our uncertainty model."),
    ("Engineer", "I think I need a high-confidence drink.")
]

# 3. Configure Voices
# You can find Voice IDs in your VoiceLab on the ElevenLabs website.
# These are standard pre-made IDs:
# Adam (Deep, male) | Rachel (Clear, female) | Antoni (Sharp, male)
voice_map = {
    "Data Scientist": {
        "id": "pNInz6obpgDQGcFmaJgB", # 'Adam'
        "settings": VoiceSettings(stability=0.5, similarity_boost=0.75, style=0.0, use_speaker_boost=True)
    },
    "Statistician": {
        "id": "21m00Tcm4TlvDq8ikWAM", # 'Rachel'
        # Higher stability = more consistent/professional
        "settings": VoiceSettings(stability=0.8, similarity_boost=0.75, style=0.0, use_speaker_boost=True)
    },
    "Engineer": {
        "id": "ErXwobaYiN019PkySvjV", # 'Antoni'
        # Lower stability = more expressive/sarcastic
        "settings": VoiceSettings(stability=0.35, similarity_boost=0.75, style=0.5, use_speaker_boost=True)
    }
}

full_audio = AudioSegment.empty()
pause = AudioSegment.silent(duration=400) # 400ms pause

print("Generating audio with ElevenLabs...")

for speaker, text in conversation:
    config = voice_map.get(speaker)
    
    if config:
        print(f"Generating: {speaker}...")
        try:
            # Generate the audio stream (generator)
            audio_generator = client.text_to_speech.convert(
                voice_id=config["id"],
                voice_settings=config["settings"],
                model_id="eleven_multilingual_v2", # Best for nuance
                text=text
            )
            
            # Consume the generator into a single bytes object
            audio_bytes = b"".join(chunk for chunk in audio_generator)
            
            # Convert bytes to AudioSegment
            segment = AudioSegment.from_file(io.BytesIO(audio_bytes), format="mp3")
            full_audio += segment + pause
            
        except Exception as e:
            print(f"Error for {speaker}: {e}")
    else:
        print(f"Voice not found for {speaker}")

# 4. Export
output_file = "conversation_elevenlabs.mp3"
full_audio.export(output_file, format="mp3")
print(f"Done! Saved to {output_file}")

Pro Tip for ElevenLabs Costs

Since ElevenLabs is expensive (credits disappear quickly if you make mistakes), test your conversation with the OpenAI script first.


Once you are happy with the pacing (pauses) and the text content, swap to this ElevenLabs script for the final “production” render. This prevents you from burning credits on typos or bad dialogue.
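One lightweight guard along these lines is to count the characters a render will consume and abort before calling the API if it exceeds your remaining credit budget. This is a sketch: the truncated `conversation` sample and `credit_budget` value are illustrative, not real quota data, and the same check works in front of either script.

```python
# Count the characters a render will consume before spending any credits.
# `conversation` here is a shortened stand-in for the full script above;
# `credit_budget` is an illustrative number, not data from your account.
conversation = [
    ("Data Scientist", "So, we've trained this model, but will it generalize?"),
    ("Engineer", "I think I need a high-confidence drink."),
]
credit_budget = 10_000  # characters you are willing to spend

total_chars = sum(len(text) for _, text in conversation)
print(f"This render will consume ~{total_chars} characters.")

if total_chars > credit_budget:
    raise SystemExit("Over budget - trim the script or test with OpenAI first.")
```

Run this before the production render; if it exits, you have saved the credits that a typo or over-long draft would have burned.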
