Updated: March 18, 2025
Understanding Tokens and Context Windows
Tokens are the core unit of consumption measurement for generative AI applications. A token is the smallest unit of text—or the smallest piece of an image—managed by a large language model (LLM). Azure OpenAI charges based on the number of input and output tokens. Tokenization, and the number of tokens produced, can vary by language; for example, a sentence in German may generate more tokens than an equivalent English sentence.
In text, a token can be a word, a part of a word (“Directions” might be decomposed into three tokens — “Di,” “rec,” and “tions”), or punctuation, such as a semicolon. Azure AI Foundry’s Chat Playground can track the number of tokens used in a session. In the illustration, there are 52 tokens, the word “generative” is split into two tokens, and each punctuation mark is a token. (OpenAI’s open-source Python library tiktoken was used to analyze this sentence.)
Context windows cap the total number of tokens a model can process in a single request, because long prompts and responses strain inferencing resources and add cost. The limit covers the tokens in the input prompt, tokens added by retrieval-augmented generation (RAG) searches (for example, when the prompt also includes content from a data source such as an HR database), and the output tokens.
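The accounting described above can be sketched as a simple budget check. All numbers here are hypothetical, including the 8,192-token window; real models have a range of context-window sizes:

```python
# Minimal sketch of context-window accounting: the window must hold the
# user prompt, any RAG-retrieved text, and the reserved output budget.

def fits_in_context(prompt_tokens: int, rag_tokens: int,
                    max_output_tokens: int, context_window: int) -> bool:
    """Return True if the full request fits within the model's context window."""
    return prompt_tokens + rag_tokens + max_output_tokens <= context_window

# Example with a hypothetical 8,192-token window:
print(fits_in_context(1200, 3000, 1000, 8192))  # True: 5,200 <= 8,192
print(fits_in_context(1200, 6500, 1000, 8192))  # False: 8,700 > 8,192
```

In practice this is why trimming RAG results or capping the output length is a common tactic: every retrieved passage consumes budget that would otherwise be available for the prompt or the response.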