What is Temperature?
AI Engineering
A parameter controlling the randomness and creativity of AI model outputs.
Temperature ranges from 0 (deterministic) to 2 (highly random). Lower temperatures produce focused, consistent outputs ideal for factual tasks. Higher temperatures increase creativity and variation.
Temperature: A Comprehensive Guide
Temperature is a parameter that controls the randomness and creativity of outputs generated by large language models. It is one of the most important inference-time settings for tuning AI behavior, and understanding how it works is essential for anyone building AI applications. Temperature typically ranges from 0 to 2, where 0 produces the most deterministic (greedy) output and higher values introduce increasing randomness and diversity into the generated text.
Technically, temperature works by dividing the logits (raw prediction scores) by the temperature value before they are converted into probabilities through the softmax function. At temperature 0, sampling reduces to greedy decoding: the model always selects the single most probable next token, producing identical outputs for identical inputs. At temperature 1.0, the model samples from the probability distribution exactly as learned during training. Between 0 and 1, the distribution is sharpened, concentrating probability mass on the most likely tokens. Above 1.0, the distribution is flattened, giving low-probability tokens a higher chance of being selected, which increases diversity but also the risk of incoherent or off-topic outputs.
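The scaling described above can be sketched in a few lines of plain Python. This is a minimal illustration of temperature-scaled softmax, not any particular model's implementation; the function name is ours.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature.

    Dividing logits by the temperature before softmax sharpens the
    distribution when T < 1 and flattens it when T > 1.
    """
    if temperature <= 0:
        # Temperature 0 is treated as greedy decoding:
        # all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 1.0))  # distribution as learned
print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates
print(softmax_with_temperature(logits, 2.0))  # flatter: closer to uniform
```

Running this with the same logits at several temperatures makes the effect concrete: the ordering of tokens never changes, only how concentrated the probability mass is.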
Choosing the right temperature depends on the use case. For factual question answering, data extraction, code generation, and tasks requiring accuracy and consistency, low temperatures (0 to 0.3) are preferred. For creative writing, brainstorming, generating diverse marketing copy, or exploratory tasks, moderate temperatures (0.7 to 1.0) produce more varied and interesting results. Temperatures above 1.0 are rarely used in production, as outputs tend to become unreliable. Many AI applications use different temperature settings for different parts of their pipeline — low temperature for classification and extraction, moderate temperature for drafting content.
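One common way to organize per-stage settings like those above is a simple lookup table. The sketch below is purely illustrative: the stage names, values, and helper function are our own assumptions, not a real API or a prescribed set of values.

```python
# Hypothetical per-stage temperature settings for an AI pipeline.
# Stage names and values are illustrative; tune them for your workload.
PIPELINE_TEMPERATURES = {
    "classify_intent": 0.0,  # deterministic: same input, same label
    "extract_fields": 0.2,   # near-deterministic structured extraction
    "draft_copy": 0.8,       # moderate: varied, creative drafts
}

def temperature_for(stage: str) -> float:
    """Return the temperature for a pipeline stage, with a
    conservative default for stages not listed in the table."""
    return PIPELINE_TEMPERATURES.get(stage, 0.3)
```

Centralizing the settings this way keeps temperature choices visible and easy to audit, rather than scattering magic numbers across individual model calls.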
Temperature is often used in conjunction with other sampling parameters like top-p (nucleus sampling) and top-k. Top-p limits sampling to the smallest set of tokens whose cumulative probability exceeds a threshold P, while top-k limits sampling to the K most probable tokens. In practice, most developers set either temperature or top-p (not both) along with the model provider's recommended defaults for the other parameter. When debugging unexpected model behavior, temperature is one of the first parameters to check.
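The top-k and top-p truncation rules described above can be sketched directly over a probability vector. This is a simplified reference implementation, assuming an already-normalized distribution; real inference stacks apply these filters to logits on tensors, but the logic is the same.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p (nucleus sampling), then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))    # only the two most probable tokens survive
print(top_p_filter(probs, 0.9))  # nucleus: tokens covering 90% of the mass
```

Note the practical difference: top-k always keeps a fixed number of candidates, while top-p adapts to the shape of the distribution, keeping fewer tokens when the model is confident and more when it is uncertain.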