Actor-Critic is a temporal-difference (TD) version of policy gradient. It uses two networks: the actor, which decides which action to take, and the critic, which evaluates that action. The two-network architecture loosely resembles a Generative Adversarial Network (GAN).
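
To make the division of labor concrete, here is a minimal sketch of a single one-step actor-critic update. It assumes a tabular setting instead of neural networks: the actor is a table of action preferences `theta` passed through a softmax, and the critic is a table of state values `V`. The function name `actor_critic_step`, the learning rates, and the discount `gamma` are illustrative choices, not something the text above specifies.

```python
import math


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, V, s, a, r, s_next, done,
                      gamma=0.99, alpha_actor=0.1, alpha_critic=0.1):
    """Apply one one-step actor-critic update in place; return the TD error.

    theta[s][b] is the actor's preference for action b in state s.
    V[s] is the critic's value estimate for state s.
    All hyperparameter defaults are assumptions for illustration.
    """
    # Critic evaluates the action via the TD error:
    # delta = r + gamma * V(s') - V(s), with V(s') = 0 at episode end.
    target = r if done else r + gamma * V[s_next]
    delta = target - V[s]

    # Action probabilities under the policy that chose `a` (before updating).
    probs = softmax(theta[s])

    # Critic update: move V(s) toward the TD target.
    V[s] += alpha_critic * delta

    # Actor update: policy-gradient step weighted by the TD error.
    # For a softmax policy, d log pi(a|s) / d theta[s][b] = 1[b == a] - pi(b|s).
    for b in range(len(theta[s])):
        grad_log = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += alpha_actor * delta * grad_log

    return delta


# Two states, two actions, everything initialized to zero.
theta = [[0.0, 0.0], [0.0, 0.0]]
V = [0.0, 0.0]
delta = actor_critic_step(theta, V, s=0, a=1, r=1.0, s_next=1, done=False)
```

A positive TD error means the action turned out better than the critic expected, so the actor raises its preference for that action and lowers the others. With neural networks the same two updates would typically be expressed as a critic loss on `delta` and an actor loss of `-delta * log_prob`.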