A study explores hierarchical meta-learning in dynamical system reconstruction (DSR) using a Mixture of Experts (MoE) approach.
Because conventional MoEs struggle in hierarchical DSR settings due to slow gating updates and conflicting routing decisions, the study introduces a new method called MixER.
MixER is a sparse top-1 MoE layer that uses a custom gating-update algorithm based on $K$-means and least squares, enabling more effective training and better scalability.
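The summary does not spell out the exact update rule, but the idea can be illustrated with a minimal, hypothetical sketch: context vectors are clustered with $K$-means so that each context is assigned to a single expert, and a linear gate is then fit to those assignments by least squares, giving sparse top-1 routing at inference time. Function names such as `fit_top1_gate` and `route_top1` below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_top1_gate(contexts, n_experts, n_iters=10, seed=0):
    """Illustrative gating update (assumed, not the paper's exact algorithm):
    cluster context vectors with K-means, then fit a linear gate to the
    resulting one-hot assignments by least squares."""
    rng = np.random.default_rng(seed)

    # --- K-means on context vectors: each context is assigned to one expert ---
    centroids = contexts[rng.choice(len(contexts), n_experts, replace=False)]
    for _ in range(n_iters):
        dists = np.linalg.norm(contexts[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(n_experts):
            members = contexts[assign == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)

    # --- Least-squares fit of a linear gate to the one-hot assignments ---
    targets = np.eye(n_experts)[assign]                     # (N, n_experts)
    X = np.hstack([contexts, np.ones((len(contexts), 1))])  # add bias column
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)         # (d + 1, n_experts)
    return W, assign

def route_top1(context, W):
    """Sparse top-1 routing: send the context to the expert with the highest gate score."""
    scores = np.append(context, 1.0) @ W
    return int(scores.argmax())

# Example usage with random context vectors (64 contexts of dimension 8)
ctx = np.random.default_rng(1).normal(size=(64, 8))
W, assign = fit_top1_gate(ctx, n_experts=4)
print(route_top1(ctx[0], W), assign[0])
```

Because the gate is refit in closed form rather than trained end-to-end by gradient descent, this style of update sidesteps the slow, conflicted routing signals mentioned above; the exact objective and update schedule used by MixER may differ.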
Experiments validate MixER's efficiency and scalability in handling systems with up to ten parametric ordinary differential equations.
However, MixER underperforms existing meta-learners in data-abundant regimes, particularly when each expert is constrained to process only a fraction of a dataset composed of highly related data points.
Analysis with synthetic and neuroscientific time series data indicates that MixER's performance depends on the presence of clear hierarchical structure in the data.