Do Hindi users pay more for AI? New data says Claude is expensive for non-English speakers
Are you talking to AI in Hindi or another non-English language? If so, your AI chatbot could be costing you more. Companies like Anthropic, OpenAI and Google often present their latest AI models as tools that work equally well for everyone, regardless of where they live or what language they speak. But new data shared by researchers suggests that users who interact with AI in languages such as Hindi, Arabic and Chinese may effectively pay more than English speakers to convey the same amount of information.
The reason lies in how AI models process language. The same prompt in Hindi can generate significantly more tokens (the units AI systems use to read and understand text) than its English equivalent. Put simply, saying the same thing in Hindi costs more tokens than saying it in English.
This makes AI more expensive for non-English speakers. Researchers, developers and AI users increasingly refer to this phenomenon as a "language tax" or "linguistic tax": a hidden cost created by the way AI models process different languages.
What exactly is going on?
A few weeks ago, OpenAI researcher Aran Komatsuzaki shared an experiment comparing how OpenAI's and Anthropic's tokenizers handle text in different languages. Using AI researcher Rich Sutton's influential essay ‘The Bitter Lesson’ as a benchmark, Komatsuzaki translated the text into multiple languages and measured how many tokens each AI system generated. The results revealed a significant gap between English and several non-English languages. According to the analysis, Hindi text required 1.37 times as many tokens as English on OpenAI's tokenizer. On Anthropic's Claude tokenizer, however, the figure rose to 3.24 times. Arabic required 2.86 times as many tokens on Claude, while Chinese required 1.71 times as many.
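To see what these multipliers could mean for a bill, here is a minimal sketch in Python. The multipliers are the ones reported for the benchmark above. The English token count and the price per million tokens are hypothetical placeholders, not any provider's actual pricing, and the function names are illustrative only.

```python
# Token multipliers relative to English, as reported for the
# 'The Bitter Lesson' benchmark described in the article.
MULTIPLIERS = {
    ("OpenAI", "Hindi"): 1.37,
    ("Claude", "Hindi"): 3.24,
    ("Claude", "Arabic"): 2.86,
    ("Claude", "Chinese"): 1.71,
}


def estimated_tokens(english_tokens, multiplier):
    """Tokens needed for the same content, given a multiplier over English."""
    return english_tokens * multiplier


def estimated_cost(english_tokens, multiplier, price_per_million):
    """Cost of the same content when billing is proportional to tokens."""
    return estimated_tokens(english_tokens, multiplier) * price_per_million / 1_000_000


# Hypothetical workload and price, chosen only to make the arithmetic visible.
ENGLISH_TOKENS = 1_000_000
PRICE_PER_MILLION = 1.00  # assumed flat rate, not a real price

for (tokenizer, language), multiplier in MULTIPLIERS.items():
    tokens = estimated_tokens(ENGLISH_TOKENS, multiplier)
    cost = estimated_cost(ENGLISH_TOKENS, multiplier, PRICE_PER_MILLION)
    print(f"{tokenizer:>6} / {language:<7}: {tokens:>11,.0f} tokens, ${cost:.2f}")
```

Because the assumed price is the same in every row, the cost scales directly with the multiplier. Real bills also depend on each provider's own rates, which the benchmark does not cover.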
In simpler terms, if an English-speaking user spends a given token budget to communicate an idea, a Hindi-speaking user may need more than three times that budget on Claude to express the same information. Komatsuzaki noted that these figures are based on a specific benchmark rather than every possible type of text. Still, the findings have sparked a wider conversation about how AI systems treat non-English languages.
But why do some languages cost more than others?
The answer lies in how AI models break down and process text. Before an AI model can understand a prompt, it converts the text into smaller units called tokens. This process is handled by a component known as a tokenizer. According to researchers, the "language tax" arises because AI models were trained mostly on English data, so the systems handle English much more efficiently. Other languages, such as Hindi, Arabic and Chinese, use different scripts and structures and get broken into many more tokens, which makes them costlier to process.
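The effect can be illustrated with a deliberately simplified toy tokenizer in Python. It is not the tokenizer used by OpenAI or Anthropic. It assumes a tiny hypothetical vocabulary that contains only English pieces, matches the longest known piece first, and falls back to one token per UTF-8 byte for anything it does not recognise.

```python
# Hypothetical toy vocabulary: multi-character pieces exist only for English.
VOCAB = {"hello", " world"}


def toy_tokenize(text, vocab=VOCAB, max_len=8):
    """Greedy longest-match tokenizer with a per-byte fallback."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, then shorter ones.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit one token per UTF-8 byte.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens


english = "hello world"
hindi = "नमस्ते"  # a Hindi greeting

print(len(toy_tokenize(english)), "tokens for", repr(english))
print(len(toy_tokenize(hindi)), "tokens for", repr(hindi))
```

The English phrase matches two vocabulary entries, while every Devanagari character falls back to three separate byte tokens. Real tokenizers do learn pieces for many scripts, so the gap is far smaller in practice (1.37 to 3.24 times in the benchmark above), but the principle is the same: text the vocabulary covers poorly gets split into more tokens.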
