Complex System Forecasting with Expert Knowledge
Published:
Why complex time-series system?
Real-world information is complex. From climate to the immune system, from quantum dynamics to the financial market, information comes sequentially in interrelated and multi-modal forms. In other words, time-series data is naturally a system of multiple temporal sequences where each may have an independent process, represented by multiple variables per timestep. Analysing a single sequence in the system is unlikely to reveal the hidden dynamics and correlations between the data. This explains why most theoretical forecasting systems for a single stock price often fail in practice (otherwise everyone would be rich).
The same story applies to deep learning models–the current workhorse of modern AI. They fit well with training data, which only consider one time series at a time, yet cannot work during testing time simply because they ignore the relationships of the considering time series and other data available in the system. This is opposite to how human experts make predictions. For example, when a doctor decides on a treatment plan for a particular patient, many health time series are evaluated (ECG, EEG, MRI signals, historical diagnoses, etc.) and a global consideration of all these signals is crucial to ensure the safety of the treatment approach. Another example is forecasting the movement of atoms in materials. It is impossible to predict where a single atom will jump to in the next nanoseconds without considering the position, velocity and force of other atoms, which constructs the surrounding quantum dynamics.
What can be done about it?
To cope with multiple time-series systems, we can consider attention mechanisms in deep learning to capture relationships within and across variables. Recent Transformer-based architectures have been designed for multivariate time series and shown better results than traditional approaches such as recurrent neural networks. Yet, they still consider a single sequence and ignore others prevalent in the system.
Worse still, along the temporal dimension, there can be distributional shifts, i.e. the statistics of data change over time. This problem is serious for any unimodal architectures if they assume a single set of parameters to represent the dynamics of the sequence.
One potential solution for both aforementioned issues is Program Memory. Using multiple programs for different sequences or parts of a sequence enables flexibility and adaptation for the model to handle rapid changes in the data. In addition to learning the programs specialized for certain dynamics, the model meta-learns program scheduling mechanisms to assign programs to the right context.
Predictions under control of domain knowledge
It is common to see a time-series forecaster produce unexpected predictions. For example, molecular dynamic forecasting may result in an atom in a prohibited location that violates quantum laws. Or a treatment recommender suggests an overdosed drug that is not allowed by the hospital rules. The models fail to obey the domain-specific laws and knowledge simply because they are unaware of them. The knowledge is not apparent in the training data, thus, it is very hard for the model to capture.
Recent attempts to bring human feedback into the training of deep learning models have demonstrated the importance of aligning large language model’s output with human values. Inspired by these successes, we can introduce an expert-in-the-loop paradigm to time-series problems through reinforcement learning. Treating a forecaster as an RL agent, we have the freedom to integrate domain knowledge to control the model’s behaviours and outputs.