Researchers study the implicit bias of self-attention in transformers. Under certain conditions, the key-query matrix is shown to converge. Two adaptive step-size strategies, normalized gradient descent (GD) and the Polyak step size, are analyzed; both are shown to accelerate parameter convergence, deepening the understanding of the implicit bias of self-attention.
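
To make the two step-size rules concrete, below is a minimal sketch of both updates on a generic differentiable loss. This is illustrative, not the paper's implementation: the function names are hypothetical, and the Polyak rule assumes the optimal loss value is known (taken as 0 here, as in interpolating settings).

    import numpy as np

    def normalized_gd_step(w, grad, eta=0.1, eps=1e-12):
        # Normalized GD: the gradient is rescaled to unit norm,
        # so every step has length eta regardless of gradient scale.
        return w - eta * grad / (np.linalg.norm(grad) + eps)

    def polyak_step(w, grad, loss, loss_star=0.0, eps=1e-12):
        # Polyak step size: eta_t = (L(w_t) - L*) / ||grad||^2,
        # where L* is the optimal loss value (assumed known; 0 here).
        eta = (loss - loss_star) / (np.linalg.norm(grad) ** 2 + eps)
        return w - eta * grad

    # Toy usage on the quadratic loss L(w) = 0.5 * ||w||^2 (minimum 0).
    w = np.array([3.0, -2.0])
    for _ in range(50):
        grad = w                      # gradient of 0.5 * ||w||^2
        loss = 0.5 * np.dot(w, w)
        w = polyak_step(w, grad, loss)
    print(w)  # close to the minimizer [0, 0]

In separable settings of the kind studied for self-attention, the training loss can approach zero, which is why taking loss_star = 0 is a natural choice for the Polyak rule.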