Use of Algebraic Topology for Big Data Analysis in Advanced Computing Environments
Abstract
The rapid growth of Big Data has created an urgent need for analysis techniques that can handle large, complex datasets efficiently. Algebraic topology, and topological data analysis (TDA) in particular, offers powerful tools for simplifying high-dimensional data while preserving its critical structural features. This paper presents a framework that combines algebraic topology with advanced computing environments, such as cloud computing and distributed systems, to address major challenges in Big Data analysis. The framework enables scalable, fast, and accurate computation of persistent homology through parallel processing techniques such as MapReduce. Experimental evaluation on point cloud, Earth observation, and IoT sensor datasets shows performance gains of up to 35%, an accuracy improvement of 8%, and a scalability improvement of 55%. These results illustrate the promise of combining algebraic topology with state-of-the-art computing environments as a scalable methodology for analyzing complex datasets.