Class FileToTextConverter

java.lang.Object
com.chatmotorapi.api.util.FileToTextConverter

public class FileToTextConverter
extends Object
A utility class that converts HTML, DOC, DOCX, RTF, and PDF files to plain text by extracting their text content and writing it to a text file. The output encoding is UTF-8 by default.

Note: This class is not designed to handle very large files, because the entire content is loaded into memory during conversion.
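
Because the whole file is held in memory, a caller may want to reject oversized inputs before converting. The following is a minimal sketch of such a guard and is not part of this class: SizeGuardedConversion, Converter, convertIfSmall, and the size limit are hypothetical names chosen for illustration. The converter is passed in as a functional interface so that the sketch compiles without the library on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SizeGuardedConversion {

    /** Hypothetical callback with the same shape as convert(String, String). */
    @FunctionalInterface
    public interface Converter {
        void convert(String inputFilePath, String outputFilePath) throws IOException;
    }

    /**
     * Runs the converter only if the input file is no larger than maxBytes.
     * The limit is the caller's choice; the class documentation gives no number.
     */
    public static void convertIfSmall(Path input, Path output, long maxBytes,
                                      Converter converter) throws IOException {
        long size = Files.size(input); // throws IOException if the file cannot be read
        if (size > maxBytes) {
            throw new IOException("Input is " + size + " bytes, above the limit of "
                    + maxBytes + " bytes: " + input);
        }
        converter.convert(input.toString(), output.toString());
    }
}
```

With the library available, a call could look like SizeGuardedConversion.convertIfSmall(in, out, 50L * 1024 * 1024, FileToTextConverter::convert), where the 50 MB limit is an arbitrary example value.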

  • Constructor Summary

    Constructors
    Constructor Description
    FileToTextConverter()  
  • Method Summary

    Modifier and Type Method Description
    static void convert(String inputFilePath, String outputFilePath)
    Converts an HTML, DOC, DOCX, RTF, or PDF file to a text file by extracting text content and dumping it into a plain text file.
    static void convert(String inputFilePath, String outputFilePath, Charset charset)
    Converts an HTML, DOC, DOCX, RTF, or PDF file to a text file by extracting text content and dumping it into a plain text file encoded with the given charset.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • FileToTextConverter

      public FileToTextConverter()
  • Method Details

    • convert

      public static void convert(String inputFilePath, String outputFilePath) throws IOException
      Converts an HTML, DOC, DOCX, RTF, or PDF file to a text file by extracting text content and dumping it into a plain text file. The encoding used for the output file is UTF-8.
      Parameters:
      inputFilePath - the path of the file to be converted
      outputFilePath - the path of the file where the extracted text will be written
      Throws:
      IOException - if any error occurs during the conversion process
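
Since this method takes both paths explicitly, callers usually need to derive the output path themselves. The sketch below shows one way to do that: it checks the extension against the formats listed above and swaps it for .txt. TextOutputPaths and its methods are hypothetical helpers, and the idea that the format can be recognized from the file extension is an assumption; the documentation does not say how the class detects the input format.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TextOutputPaths {

    // Extensions for the formats named in the method description.
    // Assumption: the input format can be recognized from the extension.
    private static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("html", "doc", "docx", "rtf", "pdf"));

    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String extensionOf(String path) {
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int dot = path.lastIndexOf('.');
        return dot > slash ? path.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    /** True if the extension is one of the formats listed in the description. */
    static boolean isSupported(String inputFilePath) {
        return SUPPORTED.contains(extensionOf(inputFilePath));
    }

    /** Replaces the extension of a supported input path with ".txt". */
    static String toTextPath(String inputFilePath) {
        if (!isSupported(inputFilePath)) {
            throw new IllegalArgumentException("Unsupported input file: " + inputFilePath);
        }
        return inputFilePath.substring(0, inputFilePath.lastIndexOf('.')) + ".txt";
    }
}
```

A call would then read FileToTextConverter.convert(input, TextOutputPaths.toTextPath(input)), with the output written in UTF-8 as stated above.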
    • convert

      public static void convert(String inputFilePath, String outputFilePath, Charset charset) throws IOException
      Converts an HTML, DOC, DOCX, RTF, or PDF file to a text file by extracting text content and dumping it into a plain text file. The output file is encoded with the given charset.
      Parameters:
      inputFilePath - the path of the file to be converted
      outputFilePath - the path of the file where the extracted text will be written
      charset - the charset used to encode the output file
      Throws:
      IOException - if any error occurs during the conversion process
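
Because the output is written in the charset passed to this method, any later read of that file has to decode it with the same charset. A minimal sketch of such a read follows; ConvertedTextReader and read are hypothetical names and are not part of the library.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class ConvertedTextReader {

    /**
     * Reads a converted text file back into a String.
     * The charset must match the one that was passed to convert;
     * a mismatch can silently corrupt non-ASCII characters.
     */
    static String read(Path textFile, Charset charset) throws IOException {
        return new String(Files.readAllBytes(textFile), charset);
    }
}
```

For example, after FileToTextConverter.convert("in.docx", "out.txt", StandardCharsets.ISO_8859_1), the text would be read with ConvertedTextReader.read(Paths.get("out.txt"), StandardCharsets.ISO_8859_1). The file names here are placeholders.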