Choosing the best generative AI model for a task requires systematic evaluation, using tools such as the Vertex AI evaluation service and LLM Comparator. Pairwise model evaluation, which compares the outputs of two models directly, is crucial for assessing relative performance: it supports informed model choices, defines 'better' quantitatively, and provides a way to monitor models through continuous retraining.

The Vertex AI evaluation service aids in model selection and configuration, prompt engineering, fine-tuning, and migration. It supports model-based metrics, including pairwise evaluation, alongside computation-based metrics, and it lets you define custom metrics, compare models, track evaluation datasets, and access results through its API.

LLM Comparator, an open-source tool from the People + AI Research (PAIR) team at Google, complements automated LLM evaluation with a human-in-the-loop workflow. It enables side-by-side comparison of model outputs, visualizations of the results, and extensions for custom metrics. Feeding pairwise evaluation results into LLM Comparator yields a deeper understanding of how and where one model outperforms another. Together, this semi-automated approach streamlines model assessment and comparison across diverse LLMs, improving overall quality.
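
To make the pairwise workflow concrete, here is a minimal sketch of a pairwise evaluation with the Vertex AI evaluation SDK. The project ID, prompts, experiment name, and model IDs are placeholders, and the class and parameter names reflect one version of the `vertexai.evaluation` API, which may differ in your SDK release.

```python
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples, PairwiseMetric
from vertexai.generative_models import GenerativeModel

# Placeholder project and location; replace with your own.
vertexai.init(project="my-project", location="us-central1")

# A tiny illustrative dataset; a real evaluation needs a representative prompt set.
eval_dataset = pd.DataFrame(
    {
        "prompt": [
            "Summarize the key risks of deploying an unevaluated LLM.",
            "Explain pairwise model evaluation in two sentences.",
        ]
    }
)

# A model-based pairwise metric: an autorater judges, prompt by prompt,
# whether the candidate's response is better than the baseline's.
pairwise_quality = PairwiseMetric(
    metric="pairwise_text_quality",
    metric_prompt_template=MetricPromptTemplateExamples.get_prompt_template(
        "pairwise_text_quality"
    ),
    baseline_model=GenerativeModel("baseline-model-id"),  # placeholder ID
)

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[pairwise_quality],
    experiment="pairwise-eval-demo",  # hypothetical experiment name
)

# The candidate model is supplied at evaluation time.
result = eval_task.evaluate(model=GenerativeModel("candidate-model-id"))

# summary_metrics reports aggregates such as the candidate's win rate;
# metrics_table holds per-prompt judgments and explanations.
print(result.summary_metrics)
print(result.metrics_table.head())
```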
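The bridge between the two tools can be as simple as reshaping the pairwise results into the JSON file that the LLM Comparator app loads. The sketch below assumes the field names published in the example schema of the PAIR-code/llm-comparator repository, and the results-table column names (`prompt`, `baseline_model_response`, `response`, `pairwise_choice`) are assumptions about the evaluation output; adapt both to the actual schemas you are working with.

```python
import json

import pandas as pd


def to_llm_comparator_json(
    metrics_table: pd.DataFrame, model_a: str, model_b: str, path: str
) -> None:
    """Write a pairwise results table as an LLM Comparator input file.

    Column names below are assumptions about the evaluation output;
    the JSON field names follow the repository's example schema.
    """
    # Assumed score convention from the repo's example data:
    # positive favors model A, negative favors model B.
    choice_to_score = {"BASELINE": 1.0, "CANDIDATE": -1.0, "TIE": 0.0}

    examples = [
        {
            "input_text": row["prompt"],
            "tags": [],
            "output_text_a": row["baseline_model_response"],
            "output_text_b": row["response"],
            "score": choice_to_score.get(row["pairwise_choice"], 0.0),
        }
        for _, row in metrics_table.iterrows()
    ]

    payload = {
        "metadata": {"source_path": path},
        "models": [{"name": model_a}, {"name": model_b}],
        "examples": examples,
    }
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)


# Usage: write the file, then load it in the LLM Comparator web app
# at https://pair-code.github.io/llm-comparator/ for side-by-side review.
# to_llm_comparator_json(result.metrics_table, "baseline-model-id",
#                        "candidate-model-id", "pairwise_results.json")
```

Keeping this export step as a small, explicit function makes the human-in-the-loop review reproducible: the same JSON artifact can be versioned alongside the evaluation run that produced it.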