We consider the problem of a learning agent playing a general sum game against an unknown strategic opponent.The agent knows their own payoff function but is uncertain about the opponent's payoff distribution.A polynomial-time algorithm is presented to maximize the agent's utility within a small epsilon of optimal utility.When the algorithm is constrained to be a no-regret algorithm, an optimal learning algorithm can be constructed in polynomial time.