The process of Topological Data Analysis. (Photo credit: Riñon et al. 2024)
A data analysis method that borrows concepts from topology – the so-called “rubber-sheet geometry” – may soon find its way into the stock trader’s toolkit for analyzing market movements. This method, called Topological Data Analysis (TDA), uncovers patterns in large datasets, preparing them for deeper analysis.
In a study published in the Philippine Journal of Science, Ela Mae Riñon and Dr. Rachelle Sambayan of the UP Diliman College of Science Institute of Mathematics (UPD-CS IM) demonstrated that TDA can be used as an early warning signal for stock market crashes. Their study, which analyzed stock data of three Philippine companies from 2019 to 2021, successfully anticipated periods when the market was about to crash.
“The findings suggest that [TDA] can help identify industries most affected during economic downturns, such as the COVID-19 pandemic, aiding investors and policymakers in minimizing risks,” they said.
The T in Topological Data Analysis
To understand TDA, it may be helpful to compare it with how we observe stars. At first glance, the night sky appears as a seemingly random plot of stars scattered across the vast canvas of the universe. However, with extended observation, patterns emerge: clusters of stars, constellations, and voids become apparent the moment we recognize them. As we continue to gaze at the sky, the random stars fade into the background, while the clusters, constellations, and voids take the spotlight.
This effect is similar to what TDA does with large sets of data points. Essentially, TDA reveals hidden geometric structures within datasets, allowing insights to be drawn from what previously looked like an ambiguous clump of points. “TDA helps uncover patterns, such as clusters forming constellations, loops, or voids,” said the authors.
One tool in TDA, called persistent homology, follows a series of steps to systematically uncover patterns. First, researchers draw a small ball around each data point. Next, they gradually expand the balls until some of them overlap. When this happens, they connect the corresponding points. The balls continue to expand until the points form various figures, which are then classified into what are called homology groups.
Researchers classify the figures into three homology groups: connected components, which are open shapes like line segments; loops, which are closed shapes like triangles; and cavities, which are shapes that extend to the third dimension. As the balls expand, the figures can change from one homology group to another. A connected component, for example, might attach to a data point that closes it off, transforming it into a loop.
As the balls expand, the points connect, forming different homology groups. (Photo credit: Riñon et al. 2024)
Throughout this process, researchers observe how quickly the figures change homology groups. Figures that switch groups over a short range of ball sizes are considered noise, while those that persist for wider ranges are deemed more significant. Eventually, when the balls are large enough, they will all overlap and all points will be connected into one big structure, marking the end of the process.
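The expanding-ball process can be sketched in a few lines of Python for the simplest case, connected components. This is a minimal illustration, not the authors' code: the function name h0_persistence and the toy points are hypothetical, and the sketch assumes the scale at which two points connect is simply the distance between them (balls of radius r touch when their centers are 2r apart). Loops and cavities need more machinery and are left out.

```python
from itertools import combinations
from math import dist


def h0_persistence(points):
    """Return (birth, death) pairs for connected components.

    Every point starts as its own component at scale 0. When two
    components are first linked, one of them "dies" at that scale.
    The last surviving component never dies.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component's representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Sort all pairs of points by distance: the order in which the
    # growing balls come into contact.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # merge the two components
            pairs.append((0.0, d))   # one component dies at scale d
    pairs.append((0.0, float("inf")))  # the final component persists
    return pairs


# Hypothetical toy data: two tight clusters far apart.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
pairs = h0_persistence(points)
```

On this toy data, four components die at a scale of about 0.1 as points merge within each cluster, and one survives until a scale of about 7, when the two clusters finally join. The short-lived features correspond to what the article calls noise, while the long-lived one reflects the real two-cluster structure.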
Tracking which figures change quickly and which persist is the core idea of TDA, and it is what reveals the key features of the data. Moreover, the authors explained that “unlike standard methods, TDA is robust to noise and can detect complex patterns that might otherwise go unnoticed, offering a deeper understanding of high-dimensional data.”
Putting TDA to the Test
Dr. Sambayan and Riñon applied TDA to the stock price data of three Philippine companies – Cebu Air (CEB), PAL Holdings (PAL), and Century Pacific Food (CNPF) – from January 2019 to January 2021. They found that when the market was about to crash, the data points began to cluster together, which made their corresponding figures more likely to change homology groups. In other words, the persistence of homology groups weakens as the stock prices start to plummet.
“This weakening of persistence is unexpected because it reveals a distinct change in the data's topological structure during market downturns, which contrasts with the scattered point clouds observed during stable periods,” the authors explained.
To track these changes, they used a persistence landscape, a chart that maps out the crash probability of a stock. In 2019, during a period when CEB was about to crash, its persistence landscape spiked, indicating a high crash probability of 40-60%. Similarly, in early 2020, the landscape showed a high crash probability prior to another CEB crash, which was attributed to pandemic travel bans and lockdowns.
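For readers curious about what sits underneath such a chart: in the TDA literature, a persistence landscape is commonly built by turning each (birth, death) pair into a "tent" function and, at every scale t, ranking the tent heights from largest to smallest. The sketch below assumes that standard definition; the function name landscape and the sample pairs are hypothetical, and how the study converts the landscape into a crash probability is not described in this article.

```python
def landscape(pairs, k, t):
    """Value of the k-th landscape layer (k = 1 is the top layer) at scale t.

    Expects finite (birth, death) pairs.
    """
    # Each pair contributes a tent that is zero outside [birth, death]
    # and peaks at the midpoint with height (death - birth) / 2.
    tents = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in pairs),
        reverse=True,
    )
    # The k-th layer is the k-th largest tent height at this scale.
    return tents[k - 1] if k <= len(tents) else 0.0


# Hypothetical (birth, death) pairs, not taken from the study.
pairs = [(0.0, 4.0), (1.0, 2.0), (1.0, 3.0)]
top = [landscape(pairs, 1, t) for t in (0.0, 1.0, 2.0, 3.0, 4.0)]
second = landscape(pairs, 2, 2.0)
```

Here the top layer rises to a peak of 2.0 at t = 2 and falls back to zero, driven by the most persistent pair (0.0, 4.0). The second layer at t = 2 equals 1.0, contributed by the pair (1.0, 3.0). Long-lived features produce tall tents, so a change in persistence shows up as a change in the landscape's height.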
Their study also revealed that TDA is indeed robust to noise, which refers to ordinary fluctuations within the data. Stock prices naturally fluctuate, but not every dip signals an impending crash. Unlike CEB, both PAL and CNPF were more stable during the onset of the pandemic. Although both saw minor dips during this period, neither experienced a significant crash, an outcome the model correctly anticipated.
However, the authors noted that they only applied the analysis to three companies over a brief period and that other markets or time frames might yield different results. “Additionally, the study's approach to determining thresholds and window sizes may need further refinement to ensure consistency and accuracy in different contexts,” they said.
To further test TDA, the authors recommend extending the analysis to other markets and time frames. “Another avenue for future research is applying the TDA approach to other types of time series data, such as exchange rates, to explore its effectiveness in detecting structural changes and understanding the behavior of stock prices and other financial indicators under different economic conditions,” they concluded.
By: Harvey Sapigao