Class ExcelCSVParser

java.lang.Object
com.Ostermiller.util.ExcelCSVParser
All Implemented Interfaces:
CSVParse

public class ExcelCSVParser extends Object implements CSVParse
Read files in comma separated value format as output by the Microsoft Excel spreadsheet program. More information about this class is available from ostermiller.org.

Excel CSV is a file format used as a portable representation of a database. The file format is described by RFC 4180.

Each line is one entry or record, and the fields in a record are separated by commas. If a field includes a comma or a new line, the whole field must be surrounded with double quotes. When the field is in quotes, any quote literals must be escaped by two quotes (""). Text that comes after a closing quote but before the next comma is ignored.
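The quoting rules above can be sketched for a single record as follows. This is a hypothetical illustration, not the ostermillerutils source: the names RecordSplitSketch and splitRecord are invented, and the sketch assumes that a quote only opens a quoted section when it is the first character of a field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of Excel-style quoting for one record (not the library's code). */
public class RecordSplitSketch {

    /** Splits one record into fields; delimiters and newlines inside quotes are literal. */
    static List<String> splitRecord(String record, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;   // currently inside a quoted section
        boolean afterClose = false; // a quoted section closed; skip text until the next delimiter
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < record.length() && record.charAt(i + 1) == '"') {
                    field.append('"'); // "" inside quotes is one literal quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;  // closing quote
                    afterClose = true;
                } else {
                    field.append(c);   // may be a delimiter or a newline
                }
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
                afterClose = false;
            } else if (afterClose) {
                // Text after a closed quote but before the next delimiter is ignored.
            } else if (c == '"' && field.length() == 0) {
                inQuotes = true;       // assumption: only a leading quote starts quoting
            } else {
                field.append(c);       // whitespace and backslashes are kept as-is
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Input: a,"b,c","say ""hi"""x,d
        System.out.println(splitRecord("a,\"b,c\",\"say \"\"hi\"\"\"x,d", ','));
        // prints [a, b,c, say "hi", d]
    }
}
```

The afterClose flag is what lets the sketch discard the stray x after the closing quote instead of appending it to the field.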

Empty fields are returned as a String of length zero: "". The following line has three empty fields and three non-empty fields in it. There is an empty field on each end, and one in the middle. The fourth field is returned as a single space, because whitespace is significant.

,second,, ,fifth,
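Because the example line above contains no quotes, its six fields can be checked with plain String.split. This is only an illustration of the field count: String.split is not a CSV parser and ignores quoting, and the class and method names below are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical check of the field count for a line with no quoted fields. */
public class EmptyFieldsSketch {

    /** Splits on commas, keeping leading and trailing empty fields. Ignores quotes. */
    static String[] splitUnquoted(String line) {
        // A negative limit keeps trailing empty strings; split(",") alone would drop them.
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String[] fields = splitUnquoted(",second,, ,fifth,");
        System.out.println(fields.length); // prints 6
        for (String f : fields) {
            System.out.println("[" + f + "]"); // brackets make the empty and space fields visible
        }
        System.out.println(Arrays.toString(fields));
    }
}
```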

Blank lines are always ignored. Other lines will be ignored if they start with a comment character as set by the setCommentStart() method.
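The blank-line and comment rules can be sketched as a line filter. This is a simplified, hypothetical illustration: it looks at each physical line on its own, so unlike a real CSV parser it would misjudge a line that is actually the continuation of a quoted field. It also assumes that only a truly empty line counts as blank, since whitespace is otherwise significant.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, line-based sketch of blank-line and comment skipping. */
public class CommentFilterSketch {

    /** Returns the lines that would be parsed, given the set of comment-start characters. */
    static List<String> dataLines(List<String> lines, String commentStart) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // blank lines are always ignored
            }
            if (commentStart != null && commentStart.indexOf(line.charAt(0)) >= 0) {
                continue; // line begins with one of the comment characters
            }
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = List.of("# header comment", "", "a,b", "; note", "c,d");
        System.out.println(dataLines(input, "#;!")); // prints [a,b, c,d]
    }
}
```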

An example of how ExcelCSVParser might be used:

 ExcelCSVParser shredder = new ExcelCSVParser(System.in);
 String t;
 // nextValue() returns null at end of input and may throw IOException
 while ((t = shredder.nextValue()) != null) {
     System.out.println(shredder.lastLineNumber() + " " + t);
 }
 

The CSV that Excel outputs differs from the format read by com.Ostermiller.util.CSVParser:

  • Leading and trailing whitespace is significant.
  • A backslash is not a special character and is not used to escape anything.
  • Quotes inside quoted strings are escaped with a double quote rather than a backslash.
  • Excel may convert data before putting it in CSV format:
    • Tabs are converted to a single space.
    • New lines in the data are always represented as the UNIX new line. ("\n")
    • Numbers with more than 12 digits may be represented in truncated scientific notation.
    This parser does not attempt to fix these Excel conversions, but users should be aware of them.
Since:
ostermillerutils 1.00.00
Author:
Stephen Ostermiller https://ostermiller.org/contact.pl?regarding=Java+Utilities
  • Constructor Details

    • ExcelCSVParser

      public ExcelCSVParser(InputStream in, char delimiter) throws BadDelimiterException
      Create a parser to parse delimited values from an InputStream.
      Parameters:
      in - stream that contains delimited values.
      delimiter - field separator (used in place of the comma)
      Throws:
      BadDelimiterException - if the specified delimiter cannot be used
      Since:
      ostermillerutils 1.02.24
    • ExcelCSVParser

      public ExcelCSVParser(InputStream in)
      Create a parser to parse comma separated values from an InputStream.
      Parameters:
      in - stream that contains comma separated values.
      Since:
      ostermillerutils 1.00.00
    • ExcelCSVParser

      public ExcelCSVParser(Reader in, char delimiter) throws BadDelimiterException
      Create a parser to parse delimited values from a Reader.
      Parameters:
      in - reader that contains delimited values.
      delimiter - field separator (used in place of the comma)
      Throws:
      BadDelimiterException - if the specified delimiter cannot be used
      Since:
      ostermillerutils 1.02.24
    • ExcelCSVParser

      public ExcelCSVParser(Reader in)
      Create a parser to parse comma separated values from a Reader.
      Parameters:
      in - reader that contains comma separated values.
      Since:
      ostermillerutils 1.00.00
  • Method Details

    • close

      public void close() throws IOException
      Close any stream upon which this parser is based.
      Specified by:
      close in interface CSVParse
      Throws:
      IOException - if an error occurs while closing the stream.
      Since:
      ostermillerutils 1.02.22
    • nextValue

      public String nextValue() throws IOException
      Get the next value.
      Specified by:
      nextValue in interface CSVParse
      Returns:
      the next value or null if there are no more values.
      Throws:
      IOException - if an error occurs while reading.
      Since:
      ostermillerutils 1.00.00
    • lastLineNumber

      public int lastLineNumber()
      Get the line number that the last token came from.

      Line breaks that occur in the middle of a token are not counted in the line number count.

      Specified by:
      lastLineNumber in interface CSVParse
      Returns:
      line number or -1 if no tokens have been returned yet.
      Since:
      ostermillerutils 1.00.00
    • getLine

      public String[] getLine() throws IOException
      Get all the values from a line.

      If the line has already been partially read, only the values that have not already been read will be included.

      Specified by:
      getLine in interface CSVParse
      Returns:
      all the values from the line or null if there are no more values.
      Throws:
      IOException - if an error occurs while reading.
      Since:
      ostermillerutils 1.00.00
    • getAllValues

      public String[][] getAllValues() throws IOException
      Get all the values from the file.

      If the file has already been partially read, only the values that have not already been read will be included.

      Each line of the file that has at least one value will be represented. Comments and empty lines are ignored.

      The resulting two-dimensional array may be jagged: different lines may contain different numbers of values.

      Specified by:
      getAllValues in interface CSVParse
      Returns:
      all the values from the file or null if there are no more values.
      Throws:
      IOException - if an error occurs while reading.
      Since:
      ostermillerutils 1.00.00
    • changeDelimiter

      public void changeDelimiter(char newDelim) throws BadDelimiterException
      Change this parser so that it uses a new delimiter.

      The initial delimiter is a comma. The delimiter cannot be changed to a quote or any other character that has special meaning in CSV.

      Specified by:
      changeDelimiter in interface CSVParse
      Parameters:
      newDelim - delimiter to which to switch.
      Throws:
      BadDelimiterException - if the character cannot be used as a delimiter.
      Since:
      ostermillerutils 1.02.08
    • changeQuote

      public void changeQuote(char newQuote) throws BadQuoteException
      Change this parser so that it uses a new character for quoting.

      The initial quote character is a double quote ("). The quote character cannot be changed to a comma or any other character that has special meaning in CSV.

      Specified by:
      changeQuote in interface CSVParse
      Parameters:
      newQuote - character to use for quoting.
      Throws:
      BadQuoteException - if the character cannot be used as a quote.
      Since:
      ostermillerutils 1.02.16
    • setCommentStart

      public void setCommentStart(String commentDelims)
      Set the characters that indicate a comment at the beginning of the line. For example if the string "#;!" were passed in, all of the following lines would be comments:
       # Comment
       ; Another Comment
       ! Yet another comment
      By default there are no comments in CSV files. Commas and quotes may not be used to indicate comment lines.
      Parameters:
      commentDelims - list of characters a comment line may start with.
      Since:
      ostermillerutils 1.00.00
    • getLastLineNumber

      public int getLastLineNumber()
      Get the number of the line from which the last value was retrieved.
      Specified by:
      getLastLineNumber in interface CSVParse
      Returns:
      line number or -1 if no tokens have been returned.
      Since:
      ostermillerutils 1.00.00
    • parse

      public static String[][] parse(String s)
      Parse the comma delimited data from a string.
      Parameters:
      s - string with comma delimited data to parse.
      Returns:
      parsed data.
      Since:
      ostermillerutils 1.02.03
    • parse

      public static String[][] parse(String s, char delimiter) throws BadDelimiterException
      Parse the delimited data from a string.
      Parameters:
      s - string with delimited data to parse.
      delimiter - field separator (used in place of the comma)
      Returns:
      parsed data.
      Throws:
      BadDelimiterException - if the character cannot be used as a delimiter.
      Since:
      ostermillerutils 1.02.24
    • parse

      public static String[][] parse(Reader in) throws IOException
      Parse the comma delimited data from a stream.
      Parameters:
      in - Reader with comma delimited data to parse.
      Returns:
      parsed data.
      Throws:
      IOException - if an error occurs while reading.
      Since:
      ostermillerutils 1.02.03
    • parse

      public static String[][] parse(Reader in, char delimiter) throws IOException, BadDelimiterException
      Parse the delimited data from a stream.
      Parameters:
      in - Reader with delimited data to parse.
      delimiter - field separator (used in place of the comma)
      Returns:
      parsed data.
      Throws:
      BadDelimiterException - if the character cannot be used as a delimiter.
      IOException - if an error occurs while reading.
      Since:
      ostermillerutils 1.02.24
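The static parse methods above return a String[][] in which each row is one line and rows may differ in length. The following is a hypothetical, simplified sketch of that behavior for a String with a custom delimiter, not the ostermillerutils source: parseSketch is an invented name, IllegalArgumentException stands in for BadDelimiterException, and the set of rejected delimiters (quote, CR, LF) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of parsing delimited text into a jagged String[][]. */
public class ParseSketch {

    static String[][] parseSketch(String s, char delimiter) {
        // Assumption: these are the characters that cannot serve as a delimiter.
        if (delimiter == '"' || delimiter == '\r' || delimiter == '\n') {
            throw new IllegalArgumentException("unusable delimiter: " + delimiter);
        }
        List<String[]> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean afterClose = false;     // ignore text between a closing quote and the next delimiter
        boolean lineHasContent = false; // distinguishes a blank line from a line with one empty field
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < s.length() && s.charAt(i + 1) == '"') {
                    field.append('"'); // escaped quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;
                    afterClose = true;
                } else {
                    field.append(c);   // quoted delimiters and line breaks are kept
                }
            } else if (c == '\n' || c == '\r') {
                if (c == '\r' && i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                    i++;               // treat CRLF as a single line break
                }
                if (lineHasContent) {  // blank lines produce no row
                    row.add(field.toString());
                    rows.add(row.toArray(new String[0]));
                }
                row = new ArrayList<>();
                field.setLength(0);
                afterClose = false;
                lineHasContent = false;
            } else if (c == delimiter) {
                row.add(field.toString());
                field.setLength(0);
                afterClose = false;
                lineHasContent = true;
            } else if (afterClose) {
                // ignored
            } else if (c == '"' && field.length() == 0) {
                inQuotes = true;
                lineHasContent = true;
            } else {
                field.append(c);
                lineHasContent = true;
            }
        }
        if (lineHasContent) {          // last line without a trailing line break
            row.add(field.toString());
            rows.add(row.toArray(new String[0]));
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        // The blank second line is skipped; the quoted field spans two physical lines.
        String data = "name;note\n\nAnn;\"two\nlines\"\nBob\n";
        for (String[] row : parseSketch(data, ';')) {
            System.out.println(row.length + " " + Arrays.toString(row));
        }
    }
}
```

Callers iterating the result should use each row's own length rather than assuming every row matches the first.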