Training conversational question-answering (QA) systems is challenging because in-domain dialogue data is scarce.
Traditional top-down methods use a large language model to generate entire multi-turn dialogues directly, but they offer little control over content and are prone to hallucination.
We introduce a bottom-up approach that first generates individual QA pairs and then combines them into coherent multi-turn dialogues, offering greater control over content and factual precision (sketched below).
Human and automated evaluations show that the bottom-up approach produces more realistic, higher-quality dialogues than top-down methods.
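
The following is a minimal sketch of the two-stage bottom-up pipeline described above, not the paper's actual implementation: the function names (`generate_qa_pairs`, `compose_dialogue`), the prompts, and the injectable `llm` callable are all hypothetical illustrations of the idea that grounded QA pairs are produced first and only then woven into a conversation.

```python
# Hypothetical sketch of a bottom-up dialogue-generation pipeline.
# All names and prompts are illustrative assumptions, not the paper's code.
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def generate_qa_pairs(passage: str, llm: Callable[[str], str], n: int = 3) -> List[QAPair]:
    """Stage 1: generate standalone QA pairs, each grounded in the source passage."""
    pairs: List[QAPair] = []
    for i in range(n):
        question = llm(
            f"Write question #{i + 1} that is answerable from this passage:\n{passage}"
        )
        answer = llm(
            "Answer the question using only the passage.\n"
            f"Passage: {passage}\nQuestion: {question}"
        )
        pairs.append((question, answer))
    return pairs


def compose_dialogue(pairs: List[QAPair], llm: Callable[[str], str]) -> str:
    """Stage 2: rewrite the independent QA pairs into one coherent multi-turn
    dialogue (e.g., introducing coreference across turns) while keeping the
    answer content fixed -- this is where the content control comes from."""
    flat = "\n".join(f"Q: {q}\nA: {a}" for q, a in pairs)
    return llm(
        "Rewrite these QA pairs as a natural multi-turn conversation, "
        f"keeping every answer's content unchanged:\n{flat}"
    )


if __name__ == "__main__":
    # A mock LLM stands in for a real model so the sketch runs end to end.
    mock_llm = lambda prompt: f"<model output for: {prompt[:40]}...>"
    qa_pairs = generate_qa_pairs("Example in-domain passage text.", mock_llm)
    print(compose_dialogue(qa_pairs, mock_llm))
```

Because the answers are fixed before dialogue composition, hallucination can only enter at the rewriting stage, which is constrained to preserve answer content; under the top-down alternative, every turn is free-form generation.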