CSV File Format

CSVIconThis document describes a standard CSV file format for exchanging tabular data using text files.

Header Row

The first row of the file must contain a header with the name of each column. the header follows the conventions below and the title of each column is considered a string and hence must be quoted.

The header row also determines the number of columns for every row in the reminder of the file. Should a row contain fewer or more columns than in the header – this is considered an error in the file.

Column Delimiter

Columns should be delimited by a pipe (| – 0x7C). This character is chosen because it is unlikely to occur in plain text and it makes it easy for a human to read the format and distinguish columns from each other.

Row Delimiter

Rows must be delimited by the Line Feed (LF – 0x0A) character. Note that this is not the default on Windows systems and old Macintosh machines. The reason LF is chosen is that this makes the files smaller (as compared with the Windows standard of CR+LF)

Data Type Representation

The subsections here describe details on how to represent common data types.

Representing Strings

String values should be quoted using double quotes (” – 0x22). For example, the following is correct:

“Year”|”Country”|”Value”

2010|”SE”|42

2011|”SE”|43

2010|”DK”|7

2011|”DK”|7

String Escape Sequences
If data contains quotes, LF or other reserved characters in text strings, the data itself should be escaped like this:

Character ASCII Description Escape Sequence
0x22 Double Quotes
| 0x7C Pipe |
LF 0x0A Line Feed n
0x5C Backslash \

Representing Numbers

Decimals should be separated with period (. – 0x2E). Thousand separators should not be used. Zero padding should not be used either.

Representing Date and Time

Date and time should be represented using the ISO 8601 and RFC 3339 format. For reference, the format is:

YYYY-MM-DDThh:mm:ss

Unless specified otherwise, times are assumed to be in UTC. Dates should use the Gregorian Calendar. Year should always include at least four digits.

See:

  • ISO 8601 WikiPedia entry on ISO 8601 (Since the ISO standard itself is behind a paywall)
  • RFC 3339 Describes time zone and year offsets
  • Gregorian Calendar A detailed description of the problems arising with dates before 1582.

Representing Countries

Data about countries should use the ISO-3166-1 alpha-2 country codes instead of the country name.

See:

  • ISO-3166-1 WikiPedia entry on ISO 3166 (Since the ISO standard itself is behind a paywall)
  • ISO Countries CSV file with all ISO-3166-1 countries, formatted like described on this page.

Representing Currencies

Currencies should be represented using the ISO-4217 currency codes. Bit coins can be represented using the code BTC.

See:

Encoding

The entire file should be encoded with UTF-8. This is the most compact format that allows representation of unicode while remaining reasonably compatible with ASCII.

MIME Type

The MIME type of CSV files is text/csv.

See:

  • RFC 2048 – Details of MIME types
  • RFC 4180 – Defines that text/csv is the MIME type of CSV