Both are excellent choices for “human-level” TTS, but they serve different needs. OpenAI is the budget-friendly, high-speed workhorse, while ElevenLabs is the premium “creative studio” for maximum realism and voice cloning.
The Breakdown
| Feature | OpenAI (TTS-1) | ElevenLabs |
| --- | --- | --- |
| Best For | Developers needing affordable, fast, consistently good audio. | Creators needing specific voices, emotions, or voice cloning. |
| Cost | $15.00 per 1 million characters. | $100+ per 1 million characters (via subscription tiers).* |
| Voices | Limited to 6 preset voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer). | Thousands of community voices + ability to clone your own. |
| Quality | Excellent, but limited emotional control. | "Gold Standard." Can whisper, shout, and laugh realistically. |
| Latency | Very low (great for real-time apps). | Low with "Turbo" models, but generally slightly slower. |
*Note: ElevenLabs uses a credit system. The "Creator" plan is roughly $11/mo for 100,000 characters; scaled to 1 million characters that works out to roughly $110, several times OpenAI's $15.
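To make the pricing gap concrete, here is a quick back-of-the-envelope comparison using the rates quoted above (verify current prices on each vendor's pricing page before relying on these numbers):

```python
# Rough per-character cost comparison. Rates are the ones quoted above;
# check the vendors' pricing pages for current figures.
OPENAI_PER_MILLION = 15.00          # USD per 1M characters (TTS-1)
ELEVENLABS_CREATOR_MONTHLY = 11.00  # USD for ~100k characters ("Creator" tier)

def cost_openai(chars: int) -> float:
    """Estimated OpenAI TTS-1 cost in USD for a given character count."""
    return chars / 1_000_000 * OPENAI_PER_MILLION

def cost_elevenlabs_creator(chars: int) -> float:
    """Estimated ElevenLabs cost, assuming Creator-tier credits cover the volume."""
    return chars / 100_000 * ELEVENLABS_CREATOR_MONTHLY

script_chars = 1_000_000
print(f"OpenAI:     ${cost_openai(script_chars):.2f}")              # $15.00
print(f"ElevenLabs: ${cost_elevenlabs_creator(script_chars):.2f}")  # $110.00
```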
How do I create a natural conversation?
Suppose I have a conversation between three characters. For example:
Data Scientist: So, we’ve trained this model, it’s got 97% accuracy, but… we really don’t know if it’ll generalize.
Statistician: Don’t know? Just give it a confidence interval, call it a day.
Engineer: Wait, so… the model doesn’t even know if it knows? Is it that unsure?
Data Scientist: Exactly. We need to quantify how much it doesn’t know. That’s where uncertainty modeling comes in.
Engineer: So let me get this straight. We’re building a model… to model the things our model doesn’t know?
Statistician: Welcome to the world of nested probabilistic models. It’s like Russian dolls, but with more Greek letters.
Data Scientist: We’re adding Gaussian processes, Bayesian neural networks… we’ll throw all of it at the problem.
Engineer: So we're basically teaching the model to say, “I have no idea, but in a very precise way”?
Statistician: Exactly. Think of it as making the model more… honest.
Data Scientist: Right! And if we can quantify its uncertainty, then we can make decisions with more confidence.
Engineer: So now we’re confidently uncertain?
Statistician: Exactly! The more confident we are in our uncertainty, the better.
Engineer: …and we call this progress?
Data Scientist: Absolutely! Look, without uncertainty, our model would just be pretending to know everything. And that’s worse.
Statistician: In short, we’re certain that uncertainty is essential.
Engineer: Okay, but what happens if the uncertainty model itself is uncertain?
Data Scientist: Then we add another uncertainty model… to model the uncertainty of our uncertainty model.
Engineer: I think I need a high-confidence drink.
Now, I want to use text-to-speech in Python to create an audio file of this conversation.
- Use OpenAI if you are okay with the 6 standard voices. You can map them like this:
  - Data Scientist → onyx (deep, serious)
  - Statistician → shimmer (clear, articulate)
  - Engineer → fable (British-leaning, distinct)
- Use ElevenLabs if you want the Statistician to sound like an old British professor and the Engineer to sound like a grumpy New Yorker. You would clone or select those specific “actor” voices from their library.
Python Code: OpenAI TTS-1 Version
Since OpenAI is much cheaper and easier to set up for a quick test, here is the code to generate your conversation using the OpenAI API.
Prerequisites:
Bash
pip install openai pydub
# You also need an OpenAI API Key
The Code:
Python
from openai import OpenAI
from pydub import AudioSegment
import io
# 1. Setup OpenAI Client
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")
# 2. Define the conversation
conversation = [
    ("Data Scientist", "So, we’ve trained this model, it’s got 97% accuracy, but… we really don’t know if it’ll generalize."),
    ("Statistician", "Don’t know? Just give it a confidence interval, call it a day."),
    ("Engineer", "Wait, so… the model doesn’t even know if it knows? Is it that unsure?"),
    ("Data Scientist", "Exactly. We need to quantify how much it doesn’t know. That’s where uncertainty modeling comes in."),
    ("Engineer", "So let me get this straight. We’re building a model… to model the things our model doesn’t know?"),
    ("Statistician", "Welcome to the world of nested probabilistic models. It’s like Russian dolls, but with more Greek letters."),
    ("Data Scientist", "We’re adding Gaussian processes, Bayesian neural networks… we’ll throw all of it at the problem."),
    ("Engineer", "So we're basically teaching the model to say, “I have no idea, but in a very precise way”?"),
    ("Statistician", "Exactly. Think of it as making the model more… honest."),
    ("Data Scientist", "Right! And if we can quantify its uncertainty, then we can make decisions with more confidence."),
    ("Engineer", "So now we’re confidently uncertain?"),
    ("Statistician", "Exactly! The more confident we are in our uncertainty, the better."),
    ("Engineer", "…and we call this progress?"),
    ("Data Scientist", "Absolutely! Look, without uncertainty, our model would just be pretending to know everything. And that’s worse."),
    ("Statistician", "In short, we’re certain that uncertainty is essential."),
    ("Engineer", "Okay, but what happens if the uncertainty model itself is uncertain?"),
    ("Data Scientist", "Then we add another uncertainty model… to model the uncertainty of our uncertainty model."),
    ("Engineer", "I think I need a high-confidence drink.")
]
# 3. Map Characters to OpenAI Voices
# Available voices: alloy, echo, fable, onyx, nova, shimmer
voice_map = {
    "Data Scientist": "onyx",   # Deep, serious
    "Statistician": "shimmer",  # Clear, articulate
    "Engineer": "fable"         # British-leaning, distinct
}
full_audio = AudioSegment.empty()
pause = AudioSegment.silent(duration=300)
print("Generating audio with OpenAI...")
for speaker, text in conversation:
    voice = voice_map.get(speaker, "alloy")
    try:
        # Request audio from OpenAI
        response = client.audio.speech.create(
            model="tts-1",
            voice=voice,
            input=text
        )
        # OpenAI returns binary MP3 data directly; load it into memory
        audio_data = io.BytesIO(response.content)
        # Convert to a pydub AudioSegment and append it plus a pause
        segment = AudioSegment.from_file(audio_data, format="mp3")
        full_audio += segment + pause
        print(f"Generated: {speaker} ({voice})")
    except Exception as e:
        print(f"Error generating line for {speaker}: {e}")
# 4. Export
output_file = "conversation_openai.mp3"
full_audio.export(output_file, format="mp3")
print(f"Done! Saved to {output_file}")
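Before spending money with either provider, it can help to count the billable characters in the script up front; both services bill on the text you send, and the speaker labels are never part of the request. A small stand-alone check (the two-line conversation here is just a truncated sample of the full script above):

```python
# Estimate billable characters and OpenAI cost before generating any audio.
# Only the spoken text is sent to the API, so speaker labels cost nothing.
conversation = [
    ("Data Scientist", "So, we’ve trained this model, it’s got 97% accuracy, "
                       "but… we really don’t know if it’ll generalize."),
    ("Engineer", "I think I need a high-confidence drink."),
]

total_chars = sum(len(text) for _, text in conversation)
openai_cost = total_chars / 1_000_000 * 15.00  # $15 per 1M characters (TTS-1)
print(f"Billable characters: {total_chars}")
print(f"Estimated OpenAI cost: ${openai_cost:.4f}")
```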
If you choose ElevenLabs:
The code logic is almost identical, but instead of client.audio.speech.create you use the elevenlabs Python library. For each character, copy the voice_id (a long string like TxGEqnHW...) of the specific “actor” voice you picked from their library.
To use ElevenLabs, you will need to install their official Python library. This code is slightly more complex than the OpenAI version because ElevenLabs gives you granular control over Voice Settings (stability, similarity, etc.) which is what allows you to make one character sound “excited/unstable” and another “serious/stable.”
Prerequisites
Bash
pip install elevenlabs pydub
The ElevenLabs Code
- Voice IDs: I have filled in default “pre-made” voice IDs (Adam, Antoni, Rachel) below.
- Custom Voices: If you clone your own voices on the ElevenLabs website, you will get a voice_id (e.g., TxGEqnHW...). Paste those IDs into the voice_map dictionary below.
Python
import io
from elevenlabs.client import ElevenLabs
from elevenlabs import VoiceSettings
from pydub import AudioSegment
# 1. Setup Client
# Make sure you have your API key ready
client = ElevenLabs(api_key="YOUR_ELEVENLABS_API_KEY")
# 2. Define the conversation
conversation = [
    ("Data Scientist", "So, we’ve trained this model, it’s got 97% accuracy, but… we really don’t know if it’ll generalize."),
    ("Statistician", "Don’t know? Just give it a confidence interval, call it a day."),
    ("Engineer", "Wait, so… the model doesn’t even know if it knows? Is it that unsure?"),
    ("Data Scientist", "Exactly. We need to quantify how much it doesn’t know. That’s where uncertainty modeling comes in."),
    ("Engineer", "So let me get this straight. We’re building a model… to model the things our model doesn’t know?"),
    ("Statistician", "Welcome to the world of nested probabilistic models. It’s like Russian dolls, but with more Greek letters."),
    ("Data Scientist", "We’re adding Gaussian processes, Bayesian neural networks… we’ll throw all of it at the problem."),
    ("Engineer", "So we're basically teaching the model to say, “I have no idea, but in a very precise way”?"),
    ("Statistician", "Exactly. Think of it as making the model more… honest."),
    ("Data Scientist", "Right! And if we can quantify its uncertainty, then we can make decisions with more confidence."),
    ("Engineer", "So now we’re confidently uncertain?"),
    ("Statistician", "Exactly! The more confident we are in our uncertainty, the better."),
    ("Engineer", "…and we call this progress?"),
    ("Data Scientist", "Absolutely! Look, without uncertainty, our model would just be pretending to know everything. And that’s worse."),
    ("Statistician", "In short, we’re certain that uncertainty is essential."),
    ("Engineer", "Okay, but what happens if the uncertainty model itself is uncertain?"),
    ("Data Scientist", "Then we add another uncertainty model… to model the uncertainty of our uncertainty model."),
    ("Engineer", "I think I need a high-confidence drink.")
]
# 3. Configure Voices
# You can find Voice IDs in your VoiceLab on the ElevenLabs website.
# These are standard pre-made IDs:
# Adam (Deep, male) | Rachel (Clear, female) | Antoni (Sharp, male)
voice_map = {
    "Data Scientist": {
        "id": "pNInz6obpgDQGcFmaJgB",  # 'Adam'
        "settings": VoiceSettings(stability=0.5, similarity_boost=0.75, style=0.0, use_speaker_boost=True)
    },
    "Statistician": {
        "id": "21m00Tcm4TlvDq8ikWAM",  # 'Rachel'
        # Higher stability = more consistent/professional
        "settings": VoiceSettings(stability=0.8, similarity_boost=0.75, style=0.0, use_speaker_boost=True)
    },
    "Engineer": {
        "id": "ErXwobaYiN019PkySvjV",  # 'Antoni'
        # Lower stability = more expressive/sarcastic
        "settings": VoiceSettings(stability=0.35, similarity_boost=0.75, style=0.5, use_speaker_boost=True)
    }
}
full_audio = AudioSegment.empty()
pause = AudioSegment.silent(duration=400) # 400ms pause
print("Generating audio with ElevenLabs...")
for speaker, text in conversation:
    config = voice_map.get(speaker)
    if config:
        print(f"Generating: {speaker}...")
        try:
            # convert() returns a generator of MP3 byte chunks
            audio_generator = client.text_to_speech.convert(
                voice_id=config["id"],
                voice_settings=config["settings"],
                model_id="eleven_multilingual_v2",  # Best for nuance
                text=text
            )
            # Consume the generator into a single bytes object
            audio_bytes = b"".join(audio_generator)
            # Convert bytes to an AudioSegment and append it plus a pause
            segment = AudioSegment.from_file(io.BytesIO(audio_bytes), format="mp3")
            full_audio += segment + pause
        except Exception as e:
            print(f"Error for {speaker}: {e}")
    else:
        print(f"Voice not found for {speaker}")
# 4. Export
output_file = "conversation_elevenlabs.mp3"
full_audio.export(output_file, format="mp3")
print(f"Done! Saved to {output_file}")
Pro Tip for ElevenLabs Costs
Since ElevenLabs credits are expensive and disappear fast if you make mistakes, test your conversation with the OpenAI script first.
Once you are happy with the pacing (pauses) and the text content, swap to this ElevenLabs script for the final “production” render. This prevents you from burning credits on typos or bad dialogue.
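One way to make that swap painless is to hide both backends behind a single function, so only a provider flag changes between the cheap draft and the final render. A sketch under the same assumptions as the scripts above (the synthesize_line helper and the choice of default voices are illustrative, not part of either SDK):

```python
import io

def synthesize_line(text: str, voice: str, provider: str = "openai"):
    """Return one spoken line as a pydub AudioSegment from the chosen backend.

    For provider="openai", `voice` is a preset name (e.g. "onyx");
    for provider="elevenlabs", `voice` is a voice_id from their website.
    """
    if provider not in ("openai", "elevenlabs"):
        raise ValueError(f"Unknown provider: {provider}")
    # Imports are deferred so each run only needs the SDK it actually uses.
    from pydub import AudioSegment
    if provider == "openai":
        from openai import OpenAI
        resp = OpenAI().audio.speech.create(model="tts-1", voice=voice, input=text)
        return AudioSegment.from_file(io.BytesIO(resp.content), format="mp3")
    from elevenlabs.client import ElevenLabs
    chunks = ElevenLabs().text_to_speech.convert(
        voice_id=voice,
        model_id="eleven_multilingual_v2",
        text=text,
    )
    return AudioSegment.from_file(io.BytesIO(b"".join(chunks)), format="mp3")
```

While iterating on the dialogue, build the whole file with provider="openai"; once the pacing and text are final, flip the flag (and swap in your ElevenLabs voice_ids) for the production render.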