Updated: March 18, 2025
Understanding Tokens and Context Windows
Tokens are the core unit of consumption measurement for generative AI applications. A token is the smallest unit of text—or the smallest piece of an image—managed by a large language model (LLM). Azure OpenAI charges based on the number of input and output tokens. Tokenization, and the number of tokens produced, can vary by language; for example, a sentence in German may generate more tokens than an equivalent English sentence.
In text, a token can be a word, a part of a word (“Directions” might be decomposed into three tokens — “Di,” “rec,” and “tions”), or punctuation, such as a semicolon. Azure AI Foundry’s Chat Playground can track the number of tokens used in a session. In the illustration, there are 52 tokens, the word “generative” is split into two tokens, and each punctuation mark is a token. (OpenAI’s open-source Python library tiktoken was used to analyze this sentence.)
Context windows cap the total number of tokens a model can process in a single request, because long prompts and responses strain inferencing resources and add cost. The limit covers the tokens in the input prompt, tokens added by retrieval-augmented generation (RAG) searches (for example, when the prompt also includes content from a data source such as an HR database), and the output tokens.
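The accounting described above can be sketched as a simple budget check. All numbers here are hypothetical, including the 8,192-token window; real models have a range of context-window sizes:

```python
# Minimal sketch of context-window accounting: the window must hold the
# user prompt, any RAG-retrieved text, and the reserved output budget.

def fits_in_context(prompt_tokens: int, rag_tokens: int,
                    max_output_tokens: int, context_window: int) -> bool:
    """Return True if the full request fits within the model's context window."""
    return prompt_tokens + rag_tokens + max_output_tokens <= context_window

# Example with a hypothetical 8,192-token window:
print(fits_in_context(1200, 3000, 1000, 8192))  # True: 5,200 <= 8,192
print(fits_in_context(1200, 6500, 1000, 8192))  # False: 8,700 > 8,192
```

In practice this is why trimming RAG results or capping the output length is a common tactic: every retrieved passage consumes budget that would otherwise be available for the prompt or the response.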