New Algorithm by Pusan National University Scientists Can Repair Missing Data in Event Logs with Superior Accuracy
The high accuracy of the new data restoration algorithm points to applications not only in today's enterprises but also in future AI systems
Process models and optimization processes rely on the quality of data, and missing data can lead to models that produce incorrect analyses. In a new study, researchers from Pusan National University, South Korea, have developed an improved algorithm that uses correlations within the available data to restore missing data in an event log with a high degree of accuracy.
Digitalization has enabled businesses to record their operations in event logs, where each activity in a business process is recorded as data with attributes such as a timestamp and an event name. These logs are helpful because they give an overview of operations and can be used to develop process models that optimize the business process. However, the optimization is only as good as the stored data, and event logs with missing events lead to poor analyses and models.
In a collaborative study, researchers from Pusan National University, South Korea, including Dr. Sunghyun Sim and Prof. Hyerim Bae, along with Prof. Ling Liu from the Georgia Institute of Technology, have developed a method that can restore missing data in an event log. The study, published in IEEE Transactions on Services Computing, relies on imputation methods that exploit correlations between the available data to recover missing information. “Since data is collected from multiple perspectives in numerous information systems, there is a relationship between the collected data. Starting from this point, our study suggested a method of restoring missing event values by utilizing the relationship among entities in the event log, which can overcome human error or system [errors],” explains Dr. Sim.
In event logs, an event's attributes are linked to those of other events through either “single event” or “multiple event” relationships. In a single event relationship, each attribute of an event corresponds to a unique attribute in another event. Based on this relationship, the researchers developed a systematic event imputation (SEI) method that restores a missing value simply by referring to the available value it is linked to.
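To make the single-relationship idea concrete, here is a minimal sketch of lookup-based imputation. It is not the authors' SEI implementation: it assumes an event log represented as a list of Python dictionaries, uses hypothetical attribute names (`resource`, `activity`), and treats a link as usable only when a key value has always been observed with exactly one target value.

```python
from typing import Optional

# An event maps attribute names to values; None marks a missing value.
Event = dict[str, Optional[str]]


def learn_single_links(log: list[Event], key: str, target: str) -> dict[str, str]:
    """Return key -> target pairs observed in events where both are present,
    keeping only key values that always co-occur with exactly one target value."""
    seen: dict[str, set[str]] = {}
    for event in log:
        k, v = event.get(key), event.get(target)
        if k is not None and v is not None:
            seen.setdefault(k, set()).add(v)
    return {k: next(iter(vs)) for k, vs in seen.items() if len(vs) == 1}


def impute_single(log: list[Event], key: str, target: str) -> list[Event]:
    """Fill missing `target` values by looking up the value linked to `key`."""
    links = learn_single_links(log, key, target)
    repaired = []
    for event in log:
        event = dict(event)  # copy so the caller's log is not modified
        k = event.get(key)
        if event.get(target) is None and k is not None and k in links:
            event[target] = links[k]
        repaired.append(event)
    return repaired
```

For instance, if resource `R1` is always recorded with the activity `Approve`, an event with resource `R1` and a missing activity is filled with `Approve`. A resource seen with two different activities is left unfilled, because that link is not single-valued and calls for the multiple-relationship handling described next.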
In a multiple event relationship, however, attributes have multiple correspondences, so simple matching of attributes is not possible. For such cases, the researchers developed a multiple event imputation (MEI) method in which missing events are first estimated and then used to build event sequences, or event chains. These sequences can be compared with an event log without missing data to restore the missing event attributes.
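The chain-comparison idea can be illustrated with a deliberately simplified stand-in. This sketch is not the MEI method from the paper: it assumes each trace is a list of activity names with `None` for a missing event, and it swaps in a plain context-counting rule. Each gap is filled, left to right, with the candidate whose (previous, candidate, next) chain occurs most often in complete traces, so earlier estimates become context for later gaps. The padding markers `START` and `END` are hypothetical names.

```python
from collections import Counter
from typing import Optional

START, END = "^", "$"  # hypothetical padding markers for trace boundaries


def context_counts(complete_traces: list[list[str]]) -> Counter:
    """Count (previous, current, next) activity triples in complete traces."""
    counts: Counter = Counter()
    for trace in complete_traces:
        padded = [START, *trace, END]
        for i in range(1, len(padded) - 1):
            counts[(padded[i - 1], padded[i], padded[i + 1])] += 1
    return counts


def impute_chain(trace: list[Optional[str]], complete_traces: list[list[str]]) -> list[str]:
    """Fill each missing activity (None) left to right using chain frequencies
    from complete traces; filled values serve as context for later gaps."""
    counts = context_counts(complete_traces)
    pairs: Counter = Counter()  # (previous, current) counts, used to break ties
    for (p, c, _), n in counts.items():
        pairs[(p, c)] += n
    vocab = sorted({a for t in complete_traces for a in t})
    filled = list(trace)
    for i, activity in enumerate(filled):
        if activity is not None:
            continue
        prev = filled[i - 1] if i > 0 else START
        nxt = trace[i + 1] if i + 1 < len(trace) else END
        # When the next event is also missing (nxt is None), no triple matches,
        # so the (prev, candidate) pair count decides.
        filled[i] = max(vocab, key=lambda a: (counts[(prev, a, nxt)], pairs[(prev, a)]))
    return filled
```

Limiting the context to one event on each side keeps the counts dense. Longer chains would follow the event-chain idea more closely, but each specific chain would be observed less often in the complete traces.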
The two imputation methods are applied together in a bagging recurrent event imputation (BREI) algorithm, which uses bootstrap sampling and recurrent event imputation (REI) to repair the event log. In tests on real-world event logs, the researchers found that their algorithm improved restoration accuracy by 10–30% compared with existing restoration algorithms. Moreover, it restored missing values with nearly 90% accuracy even when more than half of the data was missing.
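The bagging step can be sketched generically. The following is ordinary bootstrap aggregation with a majority vote, not the paper's BREI: `next_activity_imputer` is a hypothetical base imputer that stands in for REI, whose internals are not described here. Each run fits the base imputer on a bootstrap sample of complete reference traces, drawn with replacement. Each gap then takes the value most often proposed across the runs.

```python
import random
from collections import Counter
from typing import Callable, Optional

Trace = list[Optional[str]]
Imputer = Callable[[Trace, list[list[str]]], list[str]]


def next_activity_imputer(trace: Trace, reference: list[list[str]]) -> list[str]:
    """Hypothetical base imputer: fill each gap, left to right, with the activity
    that most often follows the preceding one in the reference traces."""
    follows: dict[Optional[str], Counter] = {}
    for t in reference:
        for a, b in zip(t, t[1:]):
            follows.setdefault(a, Counter())[b] += 1
    overall = Counter(a for t in reference for a in t)  # fallback: most common activity
    filled = list(trace)
    for i, activity in enumerate(filled):
        if activity is None:
            prev = filled[i - 1] if i > 0 else None
            options = follows.get(prev) or overall
            filled[i] = options.most_common(1)[0][0]
    return filled


def bagged_impute(trace: Trace, reference: list[list[str]], base: Imputer,
                  n_bags: int = 25, seed: int = 0) -> list[str]:
    """Run `base` on n_bags bootstrap samples of the reference traces and
    fill each gap with the majority vote across the runs."""
    rng = random.Random(seed)
    votes = [Counter() for _ in trace]
    for _ in range(n_bags):
        sample = rng.choices(reference, k=len(reference))  # draw with replacement
        for i, activity in enumerate(base(trace, sample)):
            votes[i][activity] += 1
    return [a if a is not None else votes[i].most_common(1)[0][0]
            for i, a in enumerate(trace)]
```

Voting over resampled references smooths out estimates that depend on a few unusual traces. This is the usual motivation for bagging, and the lead-in's caveat applies: it is an illustration, not a reproduction of the reported results.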
Beyond optimizing business processes, the researchers are optimistic that the algorithm can be extended to other applications that depend on data quality. One promising avenue is improving the data fed to AI systems, and the method could help accelerate the development of AI technologies. “It is possible to improve the performance of artificial intelligence by improving the quality of data in its learning process. The algorithm will also help prevent model malfunction by improving the quality of data it collects in a real-time environment,” elaborates Prof. Bae.
The new algorithm's high accuracy and versatility could lead to its widespread application in industry in the near future.
Reference
Authors: Sunghyun Sim (1), Hyerim Bae (1), and Ling Liu (2)
Title of original paper: Bagging Recurrent Event Imputation for Repair of Imperfect Event Log With Missing Categorical Events
Journal: IEEE Transactions on Services Computing
DOI: https://doi.org/10.1109/TSC.2021.3118381
Affiliations:
(1) Pusan National University, South Korea
(2) Georgia Institute of Technology, USA
Corresponding authors’ emails: ssh@pusan.ac.kr, hrbae@pusan.ac.kr, lingliu@cc.gatech.edu
ORCID ID:
Sunghyun Sim: https://orcid.org/0000-0002-3410-8744
Hyerim Bae: https://orcid.org/0000-0003-2602-5911
About Pusan National University
Pusan National University, located in Busan, South Korea, was founded in 1946 and is now the No. 1 national university in South Korea in research and educational competency. The multi-campus university also has smaller campuses in Yangsan, Miryang, and Ami. The university prides itself on the principles of truth, freedom, and service, and has approximately 30,000 students, 1,200 professors, and 750 faculty members. It is composed of 14 colleges (schools) and one independent division, with 103 departments in all.
Website: https://www.pusan.ac.kr/eng/Main.do
About the authors
Dr. Sunghyun Sim received his M.S. and Ph.D. in Industrial Engineering from Pusan National University, South Korea, in 2021. His research interests include automatic process mining, event log quality improvement, and process optimization based on deep learning methods.
Prof. Hyerim Bae received his Ph.D. in Industrial Engineering from Seoul National University, South Korea. Since 2004, he has been a professor with the Department of Industrial Engineering, Pusan National University, South Korea. His interests include AI-based smart ports, cloud computing, process mining for smart factories, and big data analytics for operational intelligence.
Website address: http://baelab.pusan.ac.kr/