Symbolic Representation and Modeling of Time-Series Data

Purpose: To consider how "sequences of patterns" may be identified and used to forecast time-series datasets, specifically using Markov Chains.

Background: While a graduate student at the University of Florida, I completed this project independently as the final assignment for the Safe Autonomous Systems course (Fall 2022). The deliverable was a final paper (not peer reviewed), which received a grade of 96%.

Source Code: The software was implemented in Jupyter Notebooks.

Dataset: The dataset used was ECG data from this source.

Paper Abstract: Autonomous systems frequently use probabilistic models, reinforcement learning, and machine learning techniques built upon Markov Chains (MC) and Markov Decision Processes (MDP) to model problem spaces, perform forecasting, and learn policies for task completion. However, explicit structures of Markov models can be challenging to define, and models handcrafted by experts become challenging to maintain. Related approaches use high-dimensional and continuous state spaces and neural networks to define MDPs for reinforcement learning applications. However, the resulting models can require increased data size and training time. Furthermore, high-dimensional models are often inexplicable due to complex mathematical abstraction and are challenging to visualize. This paper proposes a method to construct MCs by partitioning a state-space using Gaussian mixture models to generate symbolic representations of time-series datasets. The algorithm is demonstrated on an Electrocardiogram (ECG) dataset, and is extended to the problem of forecasting the next symbolic state using a MC representing learned states and state transition times.
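
Illustrative Sketch: The pipeline described in the abstract (partition the values with a Gaussian mixture, label each sample with a symbol, then build a Markov Chain over the symbols) might look roughly like the following. This is a minimal sketch under stated assumptions, not the code from the notebooks or the paper: it uses a small hand-written 1-D EM routine instead of a library such as scikit-learn's GaussianMixture, it interprets "state transition times" as the mean number of samples spent in a state before leaving it, and every function name (fit_gmm_1d, posteriors, symbolize, build_chain, forecast) and the toy signal are hypothetical.

```python
import math
from collections import Counter, defaultdict


def posteriors(x, weights, means, variances):
    """Component posteriors for one sample, computed in log space."""
    logp = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logp)
    p = [math.exp(lp - top) for lp in logp]
    total = sum(p)
    return [pi / total for pi in p]


def fit_gmm_1d(xs, k, iters=50, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative)."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = [posteriors(x, weights, means, variances) for x in xs]
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an (effectively) empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, var_floor)
    # Order components by mean so symbol 0 is always the lowest-valued state.
    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order], [means[j] for j in order],
            [variances[j] for j in order])


def symbolize(xs, gmm):
    """Label each sample with the index of its most probable component."""
    weights, means, variances = gmm
    symbols = []
    for x in xs:
        post = posteriors(x, weights, means, variances)
        symbols.append(max(range(len(post)), key=post.__getitem__))
    return symbols


def build_chain(symbols):
    """Run-length encode the symbols, then count transitions and dwell times."""
    segments = []  # [symbol, run length] pairs
    for s in symbols:
        if segments and segments[-1][0] == s:
            segments[-1][1] += 1
        else:
            segments.append([s, 1])
    transitions = defaultdict(Counter)
    dwell = defaultdict(list)
    # The final segment has no successor, so its (censored) dwell is skipped.
    for (a, length), (b, _) in zip(segments, segments[1:]):
        transitions[a][b] += 1
        dwell[a].append(length)
    return transitions, dwell


def forecast(state, transitions, dwell):
    """Return (most likely next symbol, its probability, mean dwell time).

    Assumes `state` was observed with at least one outgoing transition.
    """
    counts = transitions[state]
    nxt, c = counts.most_common(1)[0]
    return nxt, c / sum(counts.values()), sum(dwell[state]) / len(dwell[state])


# Synthetic two-level signal (an assumption for illustration, not ECG data).
signal = ([0.0, 0.1, -0.1, 0.05] + [5.0, 5.1, 4.9]) * 10
gmm = fit_gmm_1d(signal, k=2)
symbols = symbolize(signal, gmm)
transitions, dwell = build_chain(symbols)
next_symbol, prob, mean_dwell = forecast(0, transitions, dwell)
```

Collapsing repeated symbols into runs before counting keeps self-transitions out of the chain, so the transition counts describe which state follows which, while the run lengths carry the timing information separately. A real ECG signal would likely need more components and possibly multi-dimensional features, which this sketch does not attempt.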