- The DeepSeek-R1 model gained attention for its reasoning abilities and its cost-efficiency compared to other models; assessing those reasoning abilities programmatically, rather than anecdotally, offers deeper insight.
- DeepSeek-R1's distilled models, which come in a range of sizes, aim to replicate the larger model's performance: distillation transfers reasoning ability to smaller, more efficient models for complex tasks.
- Which distilled size to pick depends on your hardware capabilities and performance needs.
- Benchmarks like GPQA-Diamond are used to evaluate reasoning capabilities in LLMs.
- Tools like Ollama and OpenAI's simple-evals make such evaluations practical to run locally (see the sketches after this list).
- Setting up Ollama and simple-evals for benchmarking involves specific configuration, and evaluating a DeepSeek-R1 distilled model on GPQA-Diamond highlighted some challenges.
- Although distilled models may have limitations on complex tasks, they offer opportunities for efficient deployment.
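
To make this concrete, here is a minimal sketch of querying a distilled model through Ollama's OpenAI-compatible endpoint (served at http://localhost:11434/v1 by default). It assumes a distilled tag such as `deepseek-r1:8b` has already been pulled with `ollama pull deepseek-r1:8b`; substitute whichever size fits your hardware.

```python
from openai import OpenAI  # pip install openai

# Ollama serves an OpenAI-compatible API at this URL by default.
# The client requires an api_key argument, but Ollama ignores its value.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="deepseek-r1:8b",  # assumed distilled tag; swap in 1.5b/7b/14b/... as needed
    messages=[{"role": "user", "content": "How many primes lie between 10 and 30?"}],
    temperature=0.6,
)

# The distilled R1 models typically emit their chain of thought inside
# <think>...</think> tags, followed by the final answer.
print(response.choices[0].message.content)
```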
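
For the GPQA-Diamond run itself, the sketch below points simple-evals at the same Ollama endpoint through a hand-rolled sampler. Treat it as an illustration under assumptions: the `GPQAEval` constructor arguments and the sampler protocol (a `_pack_message` helper plus a `__call__` that returns the completion text) follow the simple-evals repo at the time of writing, so check the source if they have changed.

```python
from openai import OpenAI
from gpqa_eval import GPQAEval  # assumes a simple-evals checkout is on PYTHONPATH


class OllamaSampler:
    """Minimal stand-in for simple-evals' sampler interface, backed by Ollama."""

    def __init__(self, model: str):
        self.model = model
        self.client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    def _pack_message(self, content, role="user"):
        # simple-evals calls this helper on the sampler when building prompts.
        return {"role": role, "content": content}

    def __call__(self, message_list):
        response = self.client.chat.completions.create(
            model=self.model,
            messages=message_list,
            temperature=0.6,
        )
        return response.choices[0].message.content


# A small slice (num_examples=20, single repeat) keeps a local smoke test cheap;
# drop num_examples to run the full Diamond set.
gpqa = GPQAEval(n_repeats=1, num_examples=20)
result = gpqa(OllamaSampler("deepseek-r1:8b"))
print(result.score, result.metrics)
```

Writing a small sampler directly, as above, avoids depending on the internals of simple-evals' stock `ChatCompletionSampler`; an alternative is to reuse the stock sampler and point it at Ollama by setting the `OPENAI_BASE_URL` environment variable, which the openai package reads.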