Commit ea2e1b6

LLM Gateway: Estimating Input Tokens (#469)

* Add file
* Remove

1 parent 18c5d85

File tree: 3 files changed (+154, −42 lines)
fern/cookbooks/lemur/custom-vocab-lemur.mdx

Lines changed: 1 addition & 1 deletion

@@ -19,7 +19,7 @@

 ### Side Note: Costs

-We've optimized the LeMUR prompt to minimize the number of output tokens produced by the model, which will reduce the overall cost of this solution significantly, but if you do want to calculate the total number of input / output tokens and cost associated with them, please use our [LeMUR pricing cookbook](https://www.assemblyai.com/docs/guides/counting-tokens) as a reference on how to do so.
+We've optimized the LeMUR prompt to minimize the number of output tokens produced by the model, which will reduce the overall cost of this solution significantly, but if you do want to calculate the total number of input / output tokens and cost associated with them, please use our [LLM Gateway Input Token Estimation Guide](https://www.assemblyai.com/docs/guides/counting-tokens) as a reference on how to do so.

 ## Step-by-Step Instructions
Lines changed: 152 additions & 40 deletions

---
title: "Estimate Input Token Costs for LLM Gateway"
---

AssemblyAI's [LLM Gateway](/docs/llm-gateway/overview) is a unified API providing access to 15+ models from Claude, GPT, and Gemini through a single interface. It's a powerful way to extract insights from transcripts generated from audio and video files. Because the type of input and output varies so widely across use cases, [pricing](https://www.assemblyai.com/pricing) for LLM Gateway is based on **both** input and output tokens.

Output tokens will vary depending on the model and the complexity of your request, but how do you determine the number of input tokens you'll be sending to LLM Gateway? How many tokens do an audio file and your prompt contain? This guide shows you how to roughly calculate that information to help predict LLM Gateway's input token cost ahead of time.

<Note>
This guide calculates **input token costs only**. Output token costs will vary based on the model used and the length of the generated response.

To see the specific cost of each model (per 1M input and output tokens) applicable to your AssemblyAI account, refer to the Rates table on the [Billing page](https://www.assemblyai.com/dashboard/account/billing) of the dashboard.
</Note>

## Quickstart

```python
import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}

# Transcribe audio file
audio_url = "https://assembly.ai/wildfires.mp3"
data = {"audio_url": audio_url}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for completion
print("Waiting for transcription to complete...")

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    time.sleep(3)

# Define your prompt
prompt = "Provide a brief summary of the transcript."

# Calculate character count (transcript + prompt)
transcript_chars = len(transcript["text"])
prompt_chars = len(prompt)
total_chars = transcript_chars + prompt_chars
print(f"\nTotal characters: {total_chars}")

# Estimate tokens (roughly 4 characters = 1 token)
estimated_tokens = total_chars / 4
tokens_in_millions = estimated_tokens / 1_000_000

# Calculate input costs for different models (rates per 1M input tokens)
gpt5_cost = 1.25 * tokens_in_millions
claude_sonnet_cost = 3.00 * tokens_in_millions
gemini_pro_cost = 1.25 * tokens_in_millions

print(f"Estimated input tokens: {estimated_tokens:,.0f}")
print("\nEstimated input costs:")
print(f"GPT-5: ${gpt5_cost:.4f}")
print(f"Claude 4.5 Sonnet: ${claude_sonnet_cost:.4f}")
print(f"Gemini 2.5 Pro: ${gemini_pro_cost:.4f}")
```
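The estimation arithmetic above can be wrapped in a small reusable helper. This is an illustrative sketch, not part of any AssemblyAI SDK; the function name and the default 4-characters-per-token ratio are our own assumptions:

```python
def estimate_input_cost(text: str, rate_per_million: float,
                        chars_per_token: float = 4.0) -> float:
    """Estimate the input cost in USD for `text` at a given per-1M-token rate."""
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens / 1_000_000 * rate_per_million

# Example: ~4,922 characters at Claude 4.5 Sonnet's $3.00/1M input rate
cost = estimate_input_cost("a" * 4922, rate_per_million=3.00)
print(f"${cost:.4f}")  # → $0.0037
```

Passing a different `rate_per_million` lets you reuse the same helper for any model on your account's Rates table.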
## Step-by-Step Guide

### Install dependencies

Install the `requests` library if you haven't already:

```bash
pip install requests
```

### Set up your API key

Import the necessary libraries and set your AssemblyAI API key, which can be found on your account [dashboard](https://www.assemblyai.com/dashboard/api-keys):

```python
import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}
```
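Rather than hard-coding the key, you may prefer to read it from an environment variable. A minimal sketch; the variable name `ASSEMBLYAI_API_KEY` is just a convention we chose, not something the API requires:

```python
import os

# Fall back to a placeholder if the environment variable is unset
api_key = os.environ.get("ASSEMBLYAI_API_KEY", "YOUR_API_KEY")
headers = {"authorization": api_key}
```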

### Transcribe your audio file

Transcribe your audio file using AssemblyAI:

```python
audio_url = "https://assembly.ai/wildfires.mp3"
data = {"audio_url": audio_url}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for completion
print("Waiting for transcription to complete...")

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    time.sleep(3)
```
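The polling loop can also be factored into a helper with an overall timeout so a stuck job doesn't loop forever. This is a sketch, not part of any AssemblyAI SDK; `fetch` is any callable that returns the transcript JSON as a dict, for example `lambda: requests.get(polling_endpoint, headers=headers).json()`:

```python
import time

def wait_for_transcript(fetch, timeout_s=600, interval_s=3):
    """Call `fetch()` until the transcript completes, errors, or times out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        transcript = fetch()
        if transcript["status"] == "completed":
            return transcript
        if transcript["status"] == "error":
            raise RuntimeError(f"Transcription failed: {transcript['error']}")
        time.sleep(interval_s)
    raise TimeoutError(f"Transcript not completed within {timeout_s}s")
```

Injecting `fetch` as a parameter also makes the helper easy to test without a network call.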
### Calculate character count

We'll count the characters in both the transcript and your prompt:

```python
# Define your prompt
prompt = "Provide a brief summary of the transcript."

# Calculate character count (transcript + prompt)
transcript_chars = len(transcript["text"])
prompt_chars = len(prompt)
total_chars = transcript_chars + prompt_chars
print(f"\nTotal characters: {total_chars}")
```

For this specific file with the example prompt, the transcript contains approximately 4,880 characters and the prompt contains 42 characters, for a total of 4,922 characters.
### Estimate tokens

Different LLM providers use different tokenization methods, but a rough estimate is that **4 characters equals approximately 1 token**. This is based on guidance from:

- [Claude tokenization documentation](https://docs.claude.com/en/docs/about-claude/pricing#frequently-asked-questions)
- [OpenAI token counting guide](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them)
- [Gemini tokens documentation](https://ai.google.dev/gemini-api/docs/tokens?lang=python)
```python
# Estimate tokens (roughly 4 characters = 1 token)
estimated_tokens = total_chars / 4
tokens_in_millions = estimated_tokens / 1_000_000

print(f"Estimated input tokens: {estimated_tokens:,.0f}")
```
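The character-based heuristic can be cross-checked against a word-based one: OpenAI's guidance suggests one token is roughly three-quarters of an English word. A sketch combining the two; averaging them is our own illustrative choice, not provider guidance:

```python
def estimate_tokens(text: str) -> int:
    by_chars = len(text) / 4              # ~4 characters per token
    by_words = len(text.split()) / 0.75   # ~0.75 words per token
    # Average the two heuristics for a slightly more robust estimate
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Provide a brief summary of the transcript."))  # → 10
```

If the two heuristics disagree wildly for your text, that's a hint the content tokenizes unusually and you should check with the provider's own tokenizer.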
<Note title="Language considerations">
Token counts can differ significantly across languages. Non-English text often requires more tokens for the same amount of content: languages such as Spanish, Chinese, or Arabic may average 2-3 characters per token instead of 4, resulting in higher token costs for text of the same length.
</Note>
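To see how the characters-per-token ratio moves the estimate, you can recompute with a few different ratios. The ratios below are illustrative assumptions; measure with each provider's tokenizer for real numbers:

```python
total_chars = 4922  # transcript + prompt from the example above

# Hypothetical chars-per-token ratios for different language families
for label, chars_per_token in [("English (~4)", 4.0),
                               ("Spanish (~3)", 3.0),
                               ("Chinese/Arabic (~2)", 2.0)]:
    tokens = total_chars / chars_per_token
    print(f"{label}: ~{tokens:,.0f} estimated tokens")
```

Halving the ratio doubles the token estimate, and therefore the estimated input cost.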
### Calculate input token costs

LLM Gateway's pricing is calculated per 1M input tokens. Here are the [current rates](https://www.assemblyai.com/pricing) for popular models:

```python
# Calculate input costs for different models (rates per 1M tokens)
gpt5_cost = 1.25 * tokens_in_millions
claude_sonnet_cost = 3.00 * tokens_in_millions
gemini_pro_cost = 1.25 * tokens_in_millions

print("\nEstimated input costs:")
print(f"GPT-5: ${gpt5_cost:.4f}")
print(f"Claude 4.5 Sonnet: ${claude_sonnet_cost:.4f}")
print(f"Gemini 2.5 Pro: ${gemini_pro_cost:.4f}")
```
For our example file with approximately 1,230 input tokens:

- **GPT-5** (`gpt-5`): ~$0.0015
- **Claude 4.5 Sonnet** (`claude-sonnet-4-5-20250929`): ~$0.0037
- **Gemini 2.5 Pro** (`gemini-2.5-pro`): ~$0.0015
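If you compare several models regularly, the per-model rates can be kept in a single table. A sketch using the rates quoted above; check the Billing page for the rates that actually apply to your account:

```python
INPUT_RATES_PER_M = {  # USD per 1M input tokens, from the examples above
    "gpt-5": 1.25,
    "claude-sonnet-4-5-20250929": 3.00,
    "gemini-2.5-pro": 1.25,
}

def input_cost(estimated_tokens: float, model: str) -> float:
    """Estimated input cost in USD for a model in the rates table."""
    return estimated_tokens / 1_000_000 * INPUT_RATES_PER_M[model]

for model in INPUT_RATES_PER_M:
    print(f"{model}: ${input_cost(1230, model):.4f}")
```

Keeping rates in one dict means updating a price touches one line instead of every cost calculation.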
<Note>
These calculations estimate **input token costs only**. Output tokens are not included and will vary based on:

- The model you choose
- The complexity of your request
- The length of the generated response

To see the complete pricing for both input and output tokens for all available models, visit the Rates table on the [Billing page](https://www.assemblyai.com/dashboard/account/billing) of your dashboard.
</Note>
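If you're willing to assume a ceiling on output length, you can sketch a rough total-cost estimate as well. The output-token count and both rates below are assumptions you must supply yourself (actual rates are on the Billing page); this helper is illustrative, not an AssemblyAI API:

```python
def estimate_total_cost(input_tokens: float, output_tokens: float,
                        input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Rates are USD per 1M tokens; output_tokens is your own assumption."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# Example: ~1,230 input tokens, assuming at most 500 output tokens,
# with hypothetical rates of $3.00/1M input and $15.00/1M output
print(f"${estimate_total_cost(1230, 500, 3.00, 15.00):.4f}")  # → $0.0112
```

Because output rates are typically several times the input rate, the output-token assumption usually dominates the total.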
## Next steps

- [LLM Gateway Overview](/docs/llm-gateway/overview) - Learn about all available models and capabilities
- [Apply LLM Gateway to Audio Transcripts](/docs/lemur/apply-llms-to-audio-files) - Complete guide to using LLM Gateway with transcripts
- [Billing page](https://www.assemblyai.com/dashboard/account/billing) - View LLM pricing for your account

fern/pages/05-guides/cookbooks/lemur/custom-vocab-lemur.mdx

Lines changed: 1 addition & 1 deletion

@@ -94,7 +94,7 @@

 ### Side Note: Costs

-We've optimized the LeMUR prompt to minimize the number of output tokens produced by the model, which will reduce the overall cost of this solution significantly, but if you do want to calculate the total number of input / output tokens and cost associated with them, please use our [LeMUR pricing cookbook](https://www.assemblyai.com/docs/guides/counting-tokens) as a reference on how to do so.
+We've optimized the LeMUR prompt to minimize the number of output tokens produced by the model, which will reduce the overall cost of this solution significantly, but if you do want to calculate the total number of input / output tokens and cost associated with them, please use our [LLM Gateway Input Token Estimation Guide](https://www.assemblyai.com/docs/guides/counting-tokens) as a reference on how to do so.

 ## Step-by-Step Instructions
