Elimination of Duplicate Records from Multiple Resources by Applying Normalization

  • Mohini Markad, Renuka Shaikh, Shivani Chaudhari, Mayuri Patil

Abstract

Data duplication is a significant problem in many organizations. Data redundancy is a major cause of wasted storage and wasted time: data often remains in a system long after it is no longer needed, and at scale it becomes impractical to identify such data. Organizations therefore end up spending large amounts of money and time, often buying additional cloud storage instead of clearing previously stored but unused data. Normalization helps considerably with this problem, since records can be processed faster in normalized form than in non-normalized form. Furthermore, if files are registered with a time server, they no longer need to be tracked manually: once a file's time limit expires, it is deleted automatically, saving both memory and time. In this work we mine big data using a keyword search algorithm to find duplicate records in the database. We developed a system for detecting redundant data; the proposed system includes algorithms that identify duplicate records and store only normalized records. A time server is used for time and storage utilization: when a file's time limit is exceeded, that file is automatically deleted from the database. We further improve security by applying encryption to the files, and we detect duplicate records using a file signature or tag. Experimental results show that our system is more efficient than existing systems.
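As a minimal illustration of the signature-based duplicate detection and time-limited storage the abstract describes, the following Python sketch is a hypothetical implementation; the names `file_signature` and `DeduplicatingStore` are our own and do not come from the paper, and encryption of the stored files (e.g., with a symmetric cipher) would be layered on top of this.

```python
import hashlib
import time


def file_signature(data: bytes) -> str:
    """Compute a content signature (tag) used to detect duplicate records."""
    return hashlib.sha256(data).hexdigest()


class DeduplicatingStore:
    """Stores each unique file once, keyed by its signature, and deletes
    files whose time limit has expired (the 'time server' behavior)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.records = {}  # signature -> (data, stored_at)

    def add(self, data: bytes) -> bool:
        """Return True if stored, False if a duplicate was detected."""
        sig = file_signature(data)
        if sig in self.records:
            return False  # duplicate: keep only the first stored copy
        self.records[sig] = (data, time.time())
        return True

    def purge_expired(self) -> int:
        """Delete records whose time limit has been exceeded."""
        now = time.time()
        expired = [s for s, (_, t) in self.records.items()
                   if now - t > self.ttl]
        for s in expired:
            del self.records[s]
        return len(expired)


# Usage: the second add() is rejected because the signatures match.
store = DeduplicatingStore(ttl_seconds=3600)
assert store.add(b"record A") is True
assert store.add(b"record A") is False  # duplicate detected via signature
```

The design choice here mirrors the abstract: equality of content signatures stands in for full record comparison, so duplicates can be rejected without scanning the whole database, and `purge_expired` realizes the automatic deletion once a file's time limit passes.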

Published
2020-08-01
Section
Articles