Phi-3-Vision, a multimodal LLM, performs well across many tasks but struggles with high-level reasoning and occasionally generates ungrounded (hallucinated) outputs, raising reliability concerns in sensitive domains such as finance.
Safety post-training has improved robustness, but Phi-3-Vision still sometimes answers harmful or sensitive questions, reflecting the familiar trade-off between helpfulness and harmlessness.
Future work will integrate more reasoning-focused and hallucination-related DPO data into post-training to address these limitations.
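The report does not detail how the DPO data would be used; as background, a minimal sketch of the standard DPO objective (per preference pair) is shown below. The function name and the example log-probabilities are illustrative, not from the report: the loss rewards a policy that raises the likelihood of the preferred (e.g. grounded or well-reasoned) response relative to a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper).

    Inputs are summed log-probabilities of the chosen (preferred) and
    rejected responses under the trained policy and the frozen reference.
    beta controls how strongly the policy may deviate from the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)): small when the policy prefers the chosen answer
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative values: a policy shifted toward the chosen response
# incurs a lower loss than one shifted toward the rejected response.
low = dpo_loss(-10.0, -30.0, -20.0, -20.0)
high = dpo_loss(-30.0, -10.0, -20.0, -20.0)
print(low < high)  # True
```

In this framing, hallucination-related DPO pairs would label the grounded response as chosen and the ungrounded one as rejected, steering the model away from fabricated outputs without full retraining.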