A PSO-Based Cloud Framework for Knowledge Extraction
Abstract
Many industries, such as oil, construction, banking, and insurance, hold substantial volumes of historical physical data. Companies store these records in geographically distributed physical warehouses, which are usually managed by record-management companies. Storing large volumes of historical physical data poses several critical challenges, such as rising maintenance costs, long recovery times, and data that cannot be searched. To address these challenges, many companies digitize the data and consolidate it into cloud repositories as part of their Digital Transformation (DT) journey. The DT process, however, introduces further technical challenges: poor-quality scans, very large files, geographically distributed files, and confidential documents. Although options exist to resolve each of these limitations individually, no framework handles digitization and historical data storage in its entirety. Moreover, existing options cannot handle large numbers of documents with variable file sizes. This paper presents a generic cloud-based high-performance computing framework for knowledge extraction that comprises document classification based on neural networks and particle swarm optimization (PSO), data extraction, metadata enrichment, image enhancement using image processing (IP) techniques, and high data availability to users through cloud-based search. To test its efficacy, the proposed framework is executed on two cloud providers, Azure and AWS.
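To illustrate the PSO component named above, the following is a minimal sketch of standard global-best particle swarm optimization in Python. It is not the authors' implementation: the abstract does not say how PSO is combined with the neural network (for example, whether it searches network weights or hyperparameters). The function name `pso_minimize`, the toy `sphere` objective, and the coefficients (inertia weight 0.7, acceleration coefficients 1.5) are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(
    objective: Callable[[List[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_particles: int = 30,
    n_iters: int = 200,
    w: float = 0.7,   # inertia weight (assumed value)
    c1: float = 1.5,  # cognitive coefficient: pull toward each particle's own best
    c2: float = 1.5,  # social coefficient: pull toward the swarm's global best
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Hypothetical helper: global-best PSO returning the best position and its value."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialise positions uniformly inside the bounds and velocities at zero.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + personal-best pull + global-best pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            # Update the personal best and, if improved, the global best.
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x: List[float]) -> float:
    # Toy objective standing in for, e.g., a classifier's validation loss.
    return sum(v * v for v in x)


best, best_val = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_val)
```

In a document-classification setting, `objective` would presumably be replaced by a function that decodes a particle's position into model weights or hyperparameters and returns a validation loss; that mapping is hypothetical here and is not described in the abstract.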