Saturday 14 June 2014


Java code to remove redundant data / duplicate entries spread across multiple Files

The scenario here was that there were several text files with redundant data in them. The objective was to remove the duplicate entries spread across these files and collate all the data in a single file. This task can of course be done using Excel macros, but not being too fond of Excel, I preferred writing a Java program for it.
Below is the Java code to accomplish this task:


package aj;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Scanner;
 
public class Load {
 public static void main(String[] args) throws IOException {
 
  System.out.println("Enter the no of files");
  Scanner scr=new Scanner(System.in);
  int no=scr.nextInt();
  int i=1;
  String source;
  String dest = "Filepath\\Dest.txt";
  String result = "Filepath\\Result.txt";


  // Append File1.txt, File2.txt, ... up to File<no>.txt to Dest.txt, one after another
  while(i<=no)
  {
   source = "Filepath\\File"+i+".txt";
   copyContent(source, dest);
   ++i;
  }
  
  removeDuplicates(dest, result);
 }
 
 static void copyContent(String source, String dest)throws IOException {
  // try-with-resources closes both streams even if reading or writing fails
  try (BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(source)));
    BufferedWriter out = new BufferedWriter(new FileWriter(dest, true))) { // true = append to dest
 
   // Copy the source file line by line
   String aLine;
   while ((aLine = in.readLine()) != null) {
    out.write(aLine);
    out.newLine();
   }
   out.newLine();
   out.newLine();
 
   // Separator line marking the end of this file's content
   out.write("***************************************************");
   out.newLine();
   out.newLine();
  }
 }
 
 static void removeDuplicates(String dest, String result) throws IOException{

  File merged = new File(dest);

  // false = overwrite, so a re-run does not append to an old Result.txt;
  // try-with-resources closes the streams even if an exception is thrown
  try (BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(merged)));
    BufferedWriter out = new BufferedWriter(new FileWriter(result, false))) {

   String aLine;
   int i = 1; // 1-based number of the line currently held in aLine
   while ((aLine = in.readLine()) != null) {
    boolean duplicate = false;

    // Separator lines and the very first line are always kept
    if (!aLine.equals("***************************************************") && i > 1) {
     // Re-read the merged file and compare aLine with the i-1 lines before it
     try (BufferedReader in1 = new BufferedReader(new InputStreamReader(new FileInputStream(merged)))) {
      for (int j = 1; j < i && !duplicate; j++) {
       duplicate = aLine.equals(in1.readLine());
      }
     }
    }

    if (!duplicate) {
     out.write(aLine);
     out.newLine();
    }
    ++i;
   }
  }
 }
}


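The function copyContent() appends the contents of each input file to a single text file ('Dest.txt'), followed by a separator line of asterisks.
Once this is done, the function removeDuplicates() reads that merged file line by line and writes a line to the output only if the same line has not already appeared earlier in the file; separator lines are always kept. The final cleaned data is now available in the 'Result.txt' file.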
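
For larger files, re-reading the merged file once per line can get slow. Below is a minimal sketch of an alternative that does the same job in a single pass, by collecting the lines in a LinkedHashSet (which drops repeats and preserves first-seen order). This is not the code used above: the class name DedupeSketch, the method mergeUnique() and the command-line arguments are hypothetical, the files are assumed to be UTF-8 and small enough to fit in memory, and no separator lines are written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of the original Load class
class DedupeSketch {

    // Reads every source file and writes each distinct line once to result,
    // in the order in which the line was first seen.
    static void mergeUnique(List<Path> sources, Path result) throws IOException {
        Set<String> seen = new LinkedHashSet<>(); // ignores repeats, keeps insertion order
        for (Path source : sources) {
            // Assumption: files are UTF-8 and fit in memory
            seen.addAll(Files.readAllLines(source, StandardCharsets.UTF_8));
        }
        // Creates result, or overwrites it if it already exists
        Files.write(result, seen, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("Usage: java DedupeSketch <result> <source1> [source2 ...]");
            return;
        }
        List<Path> sources = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            sources.add(Paths.get(args[i]));
        }
        mergeUnique(sources, Paths.get(args[0]));
    }
}
```

Note that with this approach blank lines are treated like any other line, so at most one blank line survives in the output, and there is no intermediate 'Dest.txt' file.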
