Cross User Bigdata Deduplication

Yash Karanje, Ankita Jadhav, Nikhita Biradar, Ketaki Kadam, Prof. M. P. Navale

Abstract

Today’s world has been digitized to a large extent. More than 2.5 exabytes of data are generated per day, with social media contributing the largest share alongside business transactional data and sensor-generated data. Such a huge amount of data must be managed properly before it can be used for business domain-specific decision making. Storing and managing it is challenging because the data is mostly redundant and is spread over multiple cloud platforms for multiple users; it demands substantial resources, including storage cost, backup time, and processing time, which reduces system throughput. We therefore propose data deduplication as the preferred way to address this issue. We propose a model that performs deduplication across multiple users so that the textual data (only) uploaded by them is stored uniquely, while keeping data access efficient and preserving the privacy of the data against brute-force attacks. This will be achieved by employing a fixed-size blocking algorithm, an encryption algorithm, and effective data organization. The approach not only saves space by reducing storage allocation but also makes effective use of network bandwidth.
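As an illustration of the fixed-size blocking idea only, the following minimal sketch splits each uploaded text into equal-sized blocks, fingerprints every block with SHA-256, and stores a block once regardless of which user uploaded it. The class name `DedupStore`, the 4096-byte block size, the choice of SHA-256, and the in-memory dictionaries are all assumptions made for this example and are not taken from the proposed model; the encryption step is omitted entirely.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size in bytes; not specified in the text


class DedupStore:
    """Toy in-memory block store shared by all users (hypothetical)."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block bytes, each stored once
        self.files = {}   # (user, filename) -> ordered list of fingerprints

    def upload(self, user, filename, text):
        """Store the text; return how many previously unseen blocks were added."""
        data = text.encode("utf-8")
        recipe = []
        new_blocks = 0
        # Fixed-size blocking: cut the byte stream at regular offsets.
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in self.blocks:
                self.blocks[fingerprint] = block
                new_blocks += 1
            recipe.append(fingerprint)
        self.files[(user, filename)] = recipe
        return new_blocks

    def download(self, user, filename):
        """Rebuild the original text from the stored block fingerprints."""
        recipe = self.files[(user, filename)]
        return b"".join(self.blocks[fp] for fp in recipe).decode("utf-8")
```

In this sketch, a second user uploading the same text adds no new blocks, which is where the storage and bandwidth savings would come from. A real system would also need to encrypt blocks and persist the index, neither of which is shown here.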