Supervised learning approaches for causal discovery from observational data often achieve competitive performance despite seemingly avoiding the explicit identifiability assumptions that traditional methods rely on.
Researchers investigated CSIvA, a transformer-based model, examining whether it can be trained on synthetic data and transfer to real data.
The study found that constraints on the training data distribution implicitly define a prior over the test observations: good performance requires both that this prior matches the test data and that the underlying causal model is identifiable.
The study also showed that training on datasets generated from different classes of causal models, each individually identifiable, improves test generalization, because the ambiguous cases arising from the mixture of identifiable classes are unlikely to occur.
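To make the mixing idea concrete, here is a minimal, hypothetical sketch of how a training corpus could be assembled from two identifiable model classes (a LiNGAM-style linear model with non-Gaussian noise, and a nonlinear additive-noise model with Gaussian noise). All function names and parameter choices are illustrative assumptions, not CSIvA's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_linear_non_gaussian(n):
    # Identifiable class 1: linear additive-noise model with
    # non-Gaussian (uniform) noise, in the spirit of LiNGAM.
    x = rng.uniform(-1, 1, n)
    y = 0.8 * x + rng.uniform(-0.5, 0.5, n)
    return np.stack([x, y], axis=1)

def sample_nonlinear_gaussian(n):
    # Identifiable class 2: nonlinear additive-noise model with
    # Gaussian noise.
    x = rng.normal(0.0, 1.0, n)
    y = np.tanh(x) + 0.3 * rng.normal(0.0, 1.0, n)
    return np.stack([x, y], axis=1)

def make_training_corpus(n_datasets=100, n_samples=200):
    # Mix the two identifiable classes; each dataset is labelled with
    # the ground-truth causal direction (0 for X -> Y, 1 for Y -> X,
    # obtained here by swapping the columns of half the datasets).
    datasets, labels = [], []
    for i in range(n_datasets):
        sampler = (sample_linear_non_gaussian if i % 2 == 0
                   else sample_nonlinear_gaussian)
        d = sampler(n_samples)
        if i % 4 >= 2:
            d = d[:, ::-1]
            labels.append(1)
        else:
            labels.append(0)
        datasets.append(d)
    return np.stack(datasets), np.array(labels)

data, labels = make_training_corpus()
print(data.shape, labels.shape)  # (100, 200, 2) (100,)
```

A supervised learner trained on such a mixed corpus sees examples from both model classes, so it is not tied to the assumptions of either one alone.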