“Annealing”, in this sense, is an analogy to the physical process of annealing, in which a material is heated and then gradually cooled.
Here, too, the ‘temperature’ term T in the softmax is gradually reduced, sharpening the output distribution: with high T, softmax(z/T) is ‘soft’ and close to uniform; as T approaches zero it becomes a ‘sharp’ argmax (one-hot).
Try plotting the softmax function for yourself at different temperatures — you’ll see the difference.
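Here is a minimal NumPy sketch of the effect (the logits are chosen arbitrarily, and values are printed rather than plotted):

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Temperature-scaled softmax: exp(z/T) / sum(exp(z/T))."""
    z = np.asarray(z, dtype=float) / temperature
    z -= z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.5]
# High T -> flatter, near-uniform; low T -> peakier, near-argmax.
for T in (10.0, 1.0, 0.1):
    print(f"T={T:>4}: {np.round(softmax(logits, T), 3)}")
```

Swap the `print` loop for `matplotlib` bar charts if you want the visual version.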
[Solved] Terms in neural networks: what is the annealing temperature parameter in a softmax activation function?