Class CSVReader

  • Direct Known Subclasses:
    CSVFromPropertiesReader, FixedLengthCSVReader, ImpExReader.ResultSetCSVReader, SyncScheduleReader, UNSPSCReader

    public class CSVReader
    extends java.lang.Object
    This class parses a CSV InputStream into a list of maps. Each list entry represents a line of the stream, and each entry in the map is a parsed CSV field. By default, the reader ignores comments and empty lines. The CSV source is expected to follow the format defined in RFC 4180, although the separator and comment characters may differ from those in the RFC.

    Use of this CSVReader:

    1. Create the Reader
      CSVReader csvreader = new CSVReader(inputStream, null); // null for default encoding
      Use the constructor to set the encoding and the stream. There are also other constructors available.
    2. Configure the Reader
      Example: csvreader.setTextSeparator('?');
      If you skip this step, the following defaults apply:
      • commentOut = '#' // chars that mark comment lines
      • fieldseparator = ';' // separates the CSV fields
      • textseparator = '\"' // encloses a CSV text
      • showComments = false // all comment lines are ignored
      • toSkip = 0 // no lines are skipped
      ATTENTION: If toSkip is set to 3, the first 3 lines of data are skipped, regardless of whether they are empty lines, comments, real header lines or data lines!
    3. Use the Reader
       ArrayList<Map<Integer, String>> i_am_the_parsed_csv_stream = new ArrayList<>();
       while (csvreader.readNextLine())
       {
              i_am_the_parsed_csv_stream.add(csvreader.getLine());
       }
       

      After this while loop, the ArrayList contains all parsed CSV lines. Each line is a Map of the parsed CSV fields. See parseLine(String) for the layout of the map.

    4. Close the Reader
      Do not forget to call csvreader.close() to close the reader.
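
    The four steps above can be sketched without the library on the classpath. MiniCsvReader below is a hypothetical stand-in, not the real CSVReader: it only mirrors the call sequence and the documented defaults ('#' comments, ';' separator, empty lines ignored, toSkip counting raw lines), and it assumes a naive split with no support for quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for CSVReader; it is NOT the library class.
class MiniCsvReader implements AutoCloseable {
    private final BufferedReader in;
    private int toSkip = 0;                       // documented default: nothing skipped
    private Map<Integer, String> current;

    MiniCsvReader(Reader reader) {
        this.in = new BufferedReader(reader);
    }

    void setLinesToSkip(int n) {
        this.toSkip = n;
    }

    // Returns false at end of input; otherwise stores the parsed line for getLine().
    boolean readNextLine() throws IOException {
        String raw;
        while ((raw = in.readLine()) != null) {
            if (toSkip > 0) {                     // counts every raw line, whatever it holds
                toSkip--;
                continue;
            }
            if (raw.trim().isEmpty() || raw.startsWith("#")) {
                continue;                         // empty lines and comments are ignored
            }
            current = new LinkedHashMap<>();
            String[] fields = raw.split(";", -1); // naive split, no quoting support
            for (int i = 0; i < fields.length; i++) {
                current.put(i, fields[i]);
            }
            return true;
        }
        return false;
    }

    Map<Integer, String> getLine() {
        return current;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // Steps 1 to 4 in one place: create, configure, read in a loop, close.
    static List<Map<Integer, String>> readAll(String data, int linesToSkip) {
        List<Map<Integer, String>> parsed = new ArrayList<>();
        try (MiniCsvReader reader = new MiniCsvReader(new StringReader(data))) {
            reader.setLinesToSkip(linesToSkip);
            while (reader.readNextLine()) {
                parsed.add(reader.getLine());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parsed;
    }
}
```

    The try-with-resources block plays the role of step 4, so the reader is closed even if reading fails. Calling readAll with linesToSkip set to 4 on a five-line input that starts with a header, a comment and an empty line drops the first data line as well, which is the situation the ATTENTION note warns about.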
    • Constructor Summary

      Constructors 
      Constructor Description
      CSVReader​(java.io.File file, java.lang.String encoding)
      Opens the file and sets the given encoding.
      CSVReader​(java.io.InputStream is, java.lang.String encoding)
      Opens the given InputStream with the given encoding.
      CSVReader​(java.io.Reader reader)
      Opens the given reader.
      CSVReader​(java.lang.String lines)
      A convenience constructor for passing CSV data as a plain string.
      CSVReader​(java.lang.String fileName, java.lang.String encoding)
      Opens the file with the given filename and sets the given encoding.
    • Method Summary

      Modifier and Type Method Description
      static java.util.Map applyDecorators​(java.util.Map<java.lang.Integer,​CSVCellDecorator> decoratorMap, java.util.Map line)
      Applies given decorators to columns in given line.
      void clearAllCellDecorators()
      Removes all declared cell decorators.
      void clearCellDecorator​(int position)
      Removes one declared cell decorator.
      void close()
      Close the reader.
      void closeQuietly()
      Close the reader quietly.
      boolean finished()
      Has reading from the stream already finished (has the stream reached its end)?
      CSVCellDecorator getCellDecorator​(int position)
      Returns the decorator mapped to the given column position.
      char[] getCommentOut()  
      int getCurrentLineNumber()  
      protected java.util.Map<java.lang.Integer,​CSVCellDecorator> getDecoratorMap​(boolean create)
      Returns a map containing the current csv decorators mapped to column positions.
      char[] getFieldSeparator()  
      java.util.Map<java.lang.Integer,​java.lang.String> getLine()
      Returns the parsed line as map.
      java.lang.String getSourceLine()
      Gets the last line read from stream.
      char getTextSeparator()  
      boolean hasCellDecorators()
      Are there declared cell decorators?
      protected boolean isCommentedOut​(java.lang.String line)
      Returns true if the passed line is a comment (depends on setCommentOut(char[])).
      boolean isFinished()
      Is reading of input stream finished (has reached end)?
      boolean isMultiLineMode()
      Tells whether or not the reader supports csv lines spread across multiple lines by putting a \ (backslash) at the end of each unfinished line.
      protected boolean isReading()
      Checks if reader is already reading from stream.
      boolean isShowComments()  
      protected void markFinished()
      Sets the finished flag which indicates the reaching of stream end.
      protected boolean mustSkip()
      Checks whether lines still have to be skipped.
      protected void notifyNextLine()
      Increments the line number and decreases the line skip counter.
      static java.util.Map<java.lang.Integer,​java.lang.String>[] parse​(CSVReader reader)
      Convenience method which parses csv lines directly from the given reader.
      static java.util.Map<java.lang.Integer,​java.lang.String>[] parse​(java.lang.String lines)
      Convenience method which parses csv lines directly from the given string.
      static java.util.Map<java.lang.Integer,​java.lang.String>[] parse​(java.lang.String lines, char[] fieldSeparator)
      Convenience method which parses csv lines directly from the given string.
      static java.util.Map<java.lang.Integer,​java.lang.String>[] parse​(java.lang.String lines, char[] fieldSeparator, char textSeparator)
      Convenience method which parses csv lines directly from the given string.
      protected java.util.Map<java.lang.Integer,​java.lang.String> parseLine​(java.lang.String line)
      Tokenises the given line and returns a Map with the following content:

      Map{
      { 0:Integer, Field_1:String },
      { 1:Integer, Field_2:String },
      ...
      { n-1:Integer, Field_n:String }
      }

      boolean readNextLine()
      Reads and parses next line from stream and returns true if the line was read and parsed successfully.
      protected java.lang.String readSrcLineFromStream()
      Reads next line from stream.
      void setCellDecorator​(int position, CSVCellDecorator decorator)
      Maps a decorator to a column position.
      void setCommentOut​(char[] commentOut)
      Sets the characters that indicate a comment line.
      void setFieldSeparator​(char[] fieldseparator)
      Sets the CSV field separator char(s).
      void setLinesToSkip​(int i)
      Set the number of real lines which are skipped when readNextLine() is called the first time.
      void setMaxBufferLines​(int number)
      Set maxBufferLines to a new value.
      void setMultiLineMode​(boolean on)
      Changes whether or not the reader supports csv lines spread across multiple lines by putting a \ (backslash) at the end of each unfinished line.
      void setShowComments​(boolean showComments)
      Set to true if comment lines should also be parsed.
      void setTextSeparator​(char textseparator)
      Sets the text separator char that encloses text in CSV (default is ").
      protected java.lang.String trim​(java.lang.String src, boolean fromStart, boolean fromEnd)
      Trims the given string like the trim() method of java.lang.String, but allows trimming from the start or the end of the string to be disabled.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • CSVReader

        public CSVReader​(java.lang.String fileName,
                         java.lang.String encoding)
                  throws java.io.UnsupportedEncodingException,
                         java.io.FileNotFoundException
        Opens the file with the given filename and sets the given encoding.
        Parameters:
        fileName - the filename of the CSV file
        encoding - the given encoding, default is CSVConstants.DEFAULT_ENCODING
        Throws:
        java.io.UnsupportedEncodingException - thrown if the encoding is not supported
        java.io.FileNotFoundException - thrown if file not found
      • CSVReader

        public CSVReader​(java.io.File file,
                         java.lang.String encoding)
                  throws java.io.UnsupportedEncodingException,
                         java.io.FileNotFoundException
        Opens the file and sets the given encoding.
        Parameters:
        file - the CSV file
        encoding - the given encoding, default is CSVConstants.DEFAULT_ENCODING
        Throws:
        java.io.UnsupportedEncodingException - thrown if the encoding is not supported
        java.io.FileNotFoundException - thrown if file not found
      • CSVReader

        public CSVReader​(java.io.InputStream is,
                         java.lang.String encoding)
                  throws java.io.UnsupportedEncodingException
        Opens the given InputStream with the given encoding.
        Parameters:
        is - the InputStream
        encoding - the given encoding, default is CSVConstants.DEFAULT_ENCODING
        Throws:
        java.io.UnsupportedEncodingException - thrown if the encoding is not supported
      • CSVReader

        public CSVReader​(java.io.Reader reader)
        Opens the given reader. The default encoding is set to CSVConstants.DEFAULT_ENCODING.
        Parameters:
        reader - the reader
      • CSVReader

        public CSVReader​(java.lang.String lines)
        A convenience constructor for passing CSV data as a plain string.
        Parameters:
        lines - the csv data as string
    • Method Detail

      • getDecoratorMap

        protected java.util.Map<java.lang.Integer,​CSVCellDecorator> getDecoratorMap​(boolean create)
        Returns a map containing the current CSV decorators mapped to column positions. If no decorators are declared, a new map is created if the create flag is set.
        Parameters:
        create - if no cell decorators are declared and this flag is set, a new map is returned
        Returns:
        map holding the currently declared cell decorators
      • clearAllCellDecorators

        public void clearAllCellDecorators()
        Removes all declared cell decorators.
      • clearCellDecorator

        public void clearCellDecorator​(int position)
        Removes one declared cell decorator.
        Parameters:
        position - the column position the decorator is mapped to
      • setCellDecorator

        public void setCellDecorator​(int position,
                                     CSVCellDecorator decorator)
        Maps a decorator to a column position.
        Parameters:
        position - the column position the decorator will be mapped to
        decorator - the decorator
      • hasCellDecorators

        public boolean hasCellDecorators()
        Are there declared cell decorators?
        Returns:
        true if decorators are declared
      • getCellDecorator

        public CSVCellDecorator getCellDecorator​(int position)
        Returns the decorator mapped to the given column position.
        Parameters:
        position - position of column for which the decorator is needed
        Returns:
        the decorator mapped to the given position, or null if none is declared
      • readSrcLineFromStream

        protected java.lang.String readSrcLineFromStream()
        Reads next line from stream. If stream has reached end, null is returned and finished flag will be set.
        Returns:
        next line from stream or null
      • finished

        public boolean finished()
        Has reading from the stream already finished (has the stream reached its end)?
        Returns:
        true if the stream has reached its end and the last call of readNextLine() has returned false
      • readNextLine

        public final boolean readNextLine()
        Reads and parses next line from stream and returns true if the line was read and parsed successfully.
        Returns:
        false if the end of the stream is reached, else true.
      • trim

        protected java.lang.String trim​(java.lang.String src,
                                        boolean fromStart,
                                        boolean fromEnd)
        Trims the given string like the trim() method of java.lang.String, but allows trimming from the start or the end of the string to be disabled. Another difference is that declared field separators are never trimmed.
        Parameters:
        src - the string which will be trimmed
        fromStart - is trimming from the left side of the string enabled?
        fromEnd - is trimming from the right side of the string enabled?
        Returns:
        if there are no changes, the given string is returned, else a new trimmed copy
        See Also:
        String.trim()
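
        The behaviour described for trim can be illustrated with a small standalone sketch. It is not the library's implementation: the field separators are passed in as an extra parameter (an assumption, since the real method reads them from the reader), and "trimmable" is taken to mean any character up to and including the space character, as in String.trim().

```java
// Standalone sketch of a trim that can be limited to one side and that
// never removes declared field separators (for example a tab separator).
class TrimSketch {
    static String trim(String src, boolean fromStart, boolean fromEnd, char[] fieldSeparators) {
        int start = 0;
        int end = src.length();
        if (fromStart) {
            while (start < end && isTrimmable(src.charAt(start), fieldSeparators)) {
                start++;
            }
        }
        if (fromEnd) {
            while (end > start && isTrimmable(src.charAt(end - 1), fieldSeparators)) {
                end--;
            }
        }
        // Unchanged input is returned as is; otherwise a trimmed copy is created.
        return (start == 0 && end == src.length()) ? src : src.substring(start, end);
    }

    // Same notion of whitespace as String.trim(), minus the field separators.
    private static boolean isTrimmable(char c, char[] fieldSeparators) {
        if (c > ' ') {
            return false;
        }
        for (char separator : fieldSeparators) {
            if (separator == c) {
                return false;
            }
        }
        return true;
    }
}
```

        With a tab declared as field separator, a trailing tab survives trimming, whereas String.trim() would remove it and silently drop an empty last column.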
      • getLine

        public java.util.Map<java.lang.Integer,​java.lang.String> getLine()
        Returns the parsed line as map.
        Returns:
        the map
      • getSourceLine

        public java.lang.String getSourceLine()
        Gets the last line read from stream.
        Returns:
        the last read source line from the stream.
      • parseLine

        protected java.util.Map<java.lang.Integer,​java.lang.String> parseLine​(java.lang.String line)
        Tokenises the given line and returns a Map with the following content:

        Map{
        { 0:Integer, Field_1:String },
        { 1:Integer, Field_2:String },
        ...
        { n-1:Integer, Field_n:String }
        }

        Parameters:
        line - the line
        Returns:
        a map with the parsed CSV fields or null if failure
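
        To make the shape of the returned map concrete, here is a minimal tokeniser sketch. It is an illustration under stated assumptions, not the parseLine implementation: separators are passed as parameters, a doubled text separator inside an enclosed field stands for one literal separator character (as in RFC 4180), and no trimming is applied.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: split one line into a map of column index (starting at 0) to field value.
class ParseLineSketch {
    static Map<Integer, String> parseLine(String line, char[] fieldSeparators, char textSeparator) {
        Map<Integer, String> fields = new LinkedHashMap<>();
        StringBuilder cell = new StringBuilder();
        boolean enclosed = false;
        int column = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (enclosed) {
                if (c != textSeparator) {
                    cell.append(c);                   // separators inside enclosed text are data
                } else if (i + 1 < line.length() && line.charAt(i + 1) == textSeparator) {
                    cell.append(textSeparator);       // doubled separator = one literal character
                    i++;
                } else {
                    enclosed = false;                 // closing text separator
                }
            } else if (c == textSeparator) {
                enclosed = true;                      // opening text separator
            } else if (isFieldSeparator(c, fieldSeparators)) {
                fields.put(column++, cell.toString());
                cell.setLength(0);
            } else {
                cell.append(c);
            }
        }
        fields.put(column, cell.toString());          // last field, possibly empty
        return fields;
    }

    private static boolean isFieldSeparator(char c, char[] fieldSeparators) {
        for (char separator : fieldSeparators) {
            if (separator == c) {
                return true;
            }
        }
        return false;
    }
}
```

        A line with n fields yields the keys 0 to n-1, matching the layout shown above. An enclosed field may itself contain the field separator.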
      • close

        public void close()
                   throws java.io.IOException
        Closes the reader. Should always be called once parsing is finished.
        Throws:
        java.io.IOException - thrown if an error occurs
      • closeQuietly

        public void closeQuietly()
        Closes the reader quietly. The IOException is caught and, if debug mode is enabled, the exception message is written to the log.
      • setTextSeparator

        public void setTextSeparator​(char textseparator)
        Sets the text separator char that encloses text in CSV (default is "). If a line has already been read, an IllegalStateException is thrown.
        Parameters:
        textseparator - the text separator char
      • isShowComments

        public boolean isShowComments()
        Returns:
        true if comment lines are parsed as well
      • isCommentedOut

        protected boolean isCommentedOut​(java.lang.String line)
        Returns true if the passed line is a comment (depends on setCommentOut(char[])).
        Parameters:
        line - the passed line
        Returns:
        false if line is not a comment line
      • setShowComments

        public void setShowComments​(boolean showComments)
        Set to true if comment lines should also be parsed.
        Parameters:
        showComments - default value is false
      • isReading

        protected boolean isReading()
        Checks if reader is already reading from stream.
        Returns:
        true if reader is already reading stream
      • setMultiLineMode

        public void setMultiLineMode​(boolean on)
        Changes whether or not the reader supports csv lines spread across multiple lines by putting a \ (backslash) at the end of each unfinished line.

        An example:

         cell 1 ; cell 2 ; cell 3 starts here \
         ...and continues here ... \
         ... finally ends here ; cell4
         

        This is read as one single line.

        Parameters:
        on - will multi line mode be enabled?
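
        The joining of continued lines can be sketched as follows. This is only an illustration: it assumes the trailing backslash is removed and the next raw line is appended without any extra character in between, which this page does not specify, and it treats a backslash as a continuation marker only when it is the very last character of the raw line.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: merge raw lines ending in '\' with the lines that follow them.
class MultiLineSketch {
    static List<String> joinContinuedLines(List<String> rawLines) {
        List<String> logicalLines = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String raw : rawLines) {
            if (raw.endsWith("\\")) {
                // Unfinished line: keep everything except the trailing backslash.
                pending.append(raw, 0, raw.length() - 1);
            } else {
                pending.append(raw);
                logicalLines.add(pending.toString());
                pending.setLength(0);
            }
        }
        if (pending.length() > 0) {
            // Input ended on a continued line; emit what was collected so far.
            logicalLines.add(pending.toString());
        }
        return logicalLines;
    }
}
```

        Applied to the three raw lines of the example above, the sketch produces a single logical line, which would then be tokenised into four cells.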
      • isMultiLineMode

        public boolean isMultiLineMode()
        Tells whether or not the reader supports csv lines spread across multiple lines by putting a \ (backslash) at the end of each unfinished line.

        By default this feature is switched off.

        An example:

         cell 1 ; cell 2 ; cell 3 starts here \
         ...and continues here ... \
         ... finally ends here ; cell4
         

        This is read as one single line.

        Returns:
        is multi line mode enabled?
      • setCommentOut

        public void setCommentOut​(char[] commentOut)
        Sets the characters that indicate a comment line.
        Parameters:
        commentOut - the comment characters; the default is '#'
      • setFieldSeparator

        public void setFieldSeparator​(char[] fieldseparator)
        Sets the CSV field separator char(s). Default is ';'.
        Parameters:
        fieldseparator - the char(s)
      • getCommentOut

        public char[] getCommentOut()
        Returns:
        the comment chars
      • getFieldSeparator

        public char[] getFieldSeparator()
        Returns:
        the CSV field separator char(s)
      • getTextSeparator

        public char getTextSeparator()
        Returns:
        the CSV text separator char
      • setLinesToSkip

        public void setLinesToSkip​(int i)
        Set the number of real lines which are skipped when readNextLine() is called the first time.
        Parameters:
        i - must be a positive integer value
        See Also:
        getCurrentLineNumber()
      • getCurrentLineNumber

        public int getCurrentLineNumber()
        Returns:
        the position of the last line read via readNextLine() within the input data (file).
      • notifyNextLine

        protected void notifyNextLine()
        Increments the line number and decreases the line skip counter.
      • mustSkip

        protected boolean mustSkip()
        Checks whether lines still have to be skipped.
        Returns:
        true if lines have to be skipped, otherwise false
      • setMaxBufferLines

        public void setMaxBufferLines​(int number)
        Set maxBufferLines to a new value. Must be larger than 5.
        Parameters:
        number - the number
      • isFinished

        public boolean isFinished()
        Is reading of input stream finished (has reached end)?
        Returns:
        true if input stream has reached end.
      • markFinished

        protected void markFinished()
        Sets the finished flag which indicates the reaching of stream end.
      • parse

        public static final java.util.Map<java.lang.Integer,​java.lang.String>[] parse​(CSVReader reader)
        Convenience method which parses csv lines directly from the given reader.
        Parameters:
        reader - the reader holding the lines to be read
        Returns:
        an array of maps describing one parsed line each
      • parse

        public static final java.util.Map<java.lang.Integer,​java.lang.String>[] parse​(java.lang.String lines,
                                                                                            char[] fieldSeparator,
                                                                                            char textSeparator)
        Convenience method which parses csv lines directly from the given string.
        Parameters:
        lines - text which will be read
        fieldSeparator - e.g. ;
        textSeparator - e.g. "
        Returns:
        an array of maps describing one parsed line each
      • parse

        public static final java.util.Map<java.lang.Integer,​java.lang.String>[] parse​(java.lang.String lines,
                                                                                            char[] fieldSeparator)
        Convenience method which parses csv lines directly from the given string.
        Parameters:
        lines - text which will be read
        fieldSeparator - e.g. ;
        Returns:
        an array of maps describing one parsed line each
      • parse

        public static final java.util.Map<java.lang.Integer,​java.lang.String>[] parse​(java.lang.String lines)
        Convenience method which parses csv lines directly from the given string.
        Parameters:
        lines - text which will be read
        Returns:
        an array of maps describing one parsed line each
      • applyDecorators

        public static java.util.Map applyDecorators​(java.util.Map<java.lang.Integer,​CSVCellDecorator> decoratorMap,
                                                    java.util.Map line)
        Applies given decorators to columns in given line.
        Parameters:
        decoratorMap - map containing the decorators mapped to column positions
        line - map containing columns of a line
        Returns:
        the line after applying given decorators
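
        The idea of column-bound decorators can be shown with a short sketch. The CellDecorator interface below is a hypothetical stand-in, because the CSVCellDecorator interface is not described on this page; its single method and the decision to return a modified copy of the line are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: apply per-column decorators to one parsed line.
class DecoratorSketch {
    // Hypothetical stand-in for CSVCellDecorator.
    interface CellDecorator {
        String decorate(int position, Map<Integer, String> srcLine);
    }

    static Map<Integer, String> applyDecorators(Map<Integer, CellDecorator> decoratorMap,
                                                Map<Integer, String> line) {
        if (decoratorMap == null || decoratorMap.isEmpty()) {
            return line;                              // nothing to decorate
        }
        Map<Integer, String> result = new HashMap<>(line);
        for (Map.Entry<Integer, CellDecorator> entry : decoratorMap.entrySet()) {
            int position = entry.getKey();
            // Each decorator sees the undecorated source line and replaces its own column.
            result.put(position, entry.getValue().decorate(position, line));
        }
        return result;
    }
}
```

        Columns without a decorator pass through unchanged. A decorator registered for column 1 that upper-cases its cell would turn the line {0: "1", 1: "ada"} into {0: "1", 1: "ADA"}.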