PMOA-TTS is introduced as the first openly available dataset of 124,699 PubMed Open Access case reports, each converted into a structured timeline; together the timelines contain over 5.6 million timestamped clinical events.
The dataset was created with a pipeline that combines heuristic filtering with Llama 3.3 and DeepSeek R1: it first identifies single-patient case reports and then generates their timelines through prompt-driven extraction.
Evaluation against a clinician-curated reference set showed high quality in event-level matching, temporal concordance, and timestamp alignment, and the dataset offers wide diagnostic and demographic coverage.
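To make "temporal concordance" concrete, the sketch below shows one common way such a score can be computed: the fraction of event pairs whose relative order in the extracted timeline agrees with the reference. This is an assumption for illustration, not the definition used for PMOA-TTS; the function name `temporal_concordance`, the use of hours as the time unit, and the handling of ties are all hypothetical.

```python
from itertools import combinations


def temporal_concordance(ref_times, pred_times):
    """Fraction of event pairs ordered the same way in both timelines.

    Assumes the two lists are aligned: index k refers to the same
    matched event in the reference and in the extracted timeline.
    Pairs tied in the reference are skipped; pairs tied only in the
    prediction count as disagreements (an illustrative choice).
    """
    if len(ref_times) != len(pred_times):
        raise ValueError("timelines must contain the same matched events")

    agree = 0
    total = 0
    for i, j in combinations(range(len(ref_times)), 2):
        if ref_times[i] == ref_times[j]:
            continue  # no defined order in the reference
        total += 1
        ref_before = ref_times[i] < ref_times[j]
        pred_tied = pred_times[i] == pred_times[j]
        if not pred_tied and (pred_times[i] < pred_times[j]) == ref_before:
            agree += 1

    if total == 0:
        raise ValueError("no comparable event pairs")
    return agree / total


# Hypothetical timestamps in hours relative to admission.
reference = [0, 24, 48, 72]
extracted = [0, 48, 24, 72]  # second and third events swapped
score = temporal_concordance(reference, extracted)
```

In this toy case one of the six event pairs is out of order, so the score is 5/6. A pairwise score of this kind rewards correct ordering even when absolute timestamps are off, which is why it is usually reported alongside a separate timestamp alignment measure.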
In a survival prediction task, embeddings from extracted timelines demonstrated predictive value with time-dependent concordance indices up to 0.82, showcasing the potential for temporal reasoning and longitudinal modeling in biomedical NLP.
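As a reminder of what a concordance index measures, the sketch below computes Harrell's C-index, the simpler, time-independent relative of the time-dependent index reported above. It is an illustrative stand-in rather than the evaluation code behind the 0.82 figure; the function name and the toy data are hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up time per subject
    events:      1 if the event was observed, 0 if censored
    risk_scores: model output, higher means higher predicted risk

    A pair is comparable when the subject with the shorter time had an
    observed event. Pairs with tied times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # the earlier subject was censored, order unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions get half credit

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four subjects, the third is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c_index = harrell_c_index(times, events, risk)
```

Here four of the five comparable pairs are ranked correctly, giving 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported values of up to 0.82.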