CSV File Format
This document describes a standard CSV file format for exchanging tabular data using text files.
Header Row
The first row of the file must contain a header with the name of each column. The header follows the conventions below; each column title is a string and must therefore be quoted.
The header row also determines the number of columns for every row in the remainder of the file. A row that contains fewer or more columns than the header is considered an error in the file.
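The column-count rule can be checked mechanically. The following is a minimal sketch, assuming the rows have already been split into lists of fields; the function name `check_column_counts` and the header names in the sample data are illustrative and not part of this format.

```python
def check_column_counts(rows):
    """Return (line_number, actual, expected) for each row whose column
    count differs from the header row. Line numbers are 1-based and the
    header is line 1."""
    if not rows:
        raise ValueError("file has no header row")
    expected = len(rows[0])
    errors = []
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != expected:
            errors.append((line_number, len(row), expected))
    return errors


# Hypothetical sample data: a three-column header and one short row.
rows = [
    ['"year"', '"country"', '"value"'],
    ['2010', '"SE"', '42'],
    ['2011', '"SE"'],  # one column short
]
print(check_column_counts(rows))  # [(3, 2, 3)]
```

An empty result means every row matches the header; any entry should be treated as an error in the file.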
Column Delimiter
Columns should be delimited by a pipe (| – 0x7C). This character is chosen because it is unlikely to occur in plain text and it makes it easy for a human to read the format and distinguish columns from each other.
Row Delimiter
Rows must be delimited by the Line Feed (LF – 0x0A) character. Note that this is not the default on Windows systems and old Macintosh machines. LF is chosen because it makes files smaller than the Windows standard of CR+LF.
Data Type Representation
The following subsections describe how to represent common data types.
Representing Strings
String values should be quoted using double quotes (" – 0x22). For example, the following is correct:
2010|"SE"|42
2011|"SE"|43
2010|"DK"|7
2011|"DK"|7
String Escape Sequences
If a text string contains double quotes, pipes, line feeds, or backslashes, those characters should be escaped as follows:
Character | ASCII | Description | Escape Sequence |
---|---|---|---|
" | 0x22 | Double Quotes | \" |
| | 0x7C | Pipe | \| |
LF | 0x0A | Line Feed | \n |
\ | 0x5C | Backslash | \\ |
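The quoting and escaping rules can be sketched as a pair of functions. This sketch assumes that each escape sequence is a backslash followed by one character (`\"`, `\|`, `\n`, `\\`); the names `quote_string` and `unquote_string` are illustrative only.

```python
# Assumed escape sequences: a backslash followed by a single character.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "|": "\\|", "\n": "\\n"}
_UNESCAPES = {"\\": "\\", '"': '"', "|": "|", "n": "\n"}


def quote_string(value):
    """Escape reserved characters and wrap the value in double quotes."""
    escaped = "".join(_ESCAPES.get(ch, ch) for ch in value)
    return '"' + escaped + '"'


def unquote_string(field):
    """Reverse quote_string; raise ValueError on malformed input."""
    if len(field) < 2 or field[0] != '"' or field[-1] != '"':
        raise ValueError("string field is not quoted")
    body = field[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # A backslash must be followed by a known escape character.
            if i + 1 >= len(body) or body[i + 1] not in _UNESCAPES:
                raise ValueError("invalid escape sequence")
            out.append(_UNESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(quote_string("a|b"))
```

Because the pipe is escaped inside strings, a reader cannot split a row on every pipe; it has to skip pipes that are preceded by an escaping backslash.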
Representing Numbers
The decimal separator should be a period (. – 0x2E). Thousand separators should not be used. Zero padding should not be used either.
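These rules can be approximated with a regular expression. The sketch below makes two assumptions that go beyond the text above: "zero padding" is read as leading zeros (so `007` is rejected while `0.5` is accepted), and an optional leading minus sign is allowed. The name `is_valid_number` is illustrative.

```python
import re

# Optional minus sign, an integer part without leading zeros, and an
# optional fractional part separated by a period. No thousand separators.
_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?")


def is_valid_number(text):
    """Return True if text matches the assumed number representation."""
    return _NUMBER.fullmatch(text) is not None


for sample in ["42", "3.14", "1,000", "007"]:
    print(sample, is_valid_number(sample))
```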
Representing Date and Time
Date and time should be represented using the ISO 8601 and RFC 3339 format. For reference, the format is:
Unless specified otherwise, times are assumed to be in UTC. Dates should use the Gregorian Calendar. Year should always include at least four digits.
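As an illustration, a timestamp can be written in one common RFC 3339 form, `YYYY-MM-DDThh:mm:ssZ`. This particular layout is an assumption, since RFC 3339 permits other variants such as explicit offsets and fractional seconds. The sketch treats timezone-naive values as UTC, in line with the default stated above; the name `format_timestamp` is illustrative.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(moment):
    """Format a datetime as an RFC 3339 UTC timestamp with a four-digit year."""
    if moment.tzinfo is None:
        # Assumption taken from the text: unspecified times are UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    m = moment.astimezone(timezone.utc)
    # Fields are formatted by hand so the year is always zero-filled to
    # four digits, which strftime("%Y") does not guarantee on every platform.
    return (
        f"{m.year:04d}-{m.month:02d}-{m.day:02d}"
        f"T{m.hour:02d}:{m.minute:02d}:{m.second:02d}Z"
    )


print(format_timestamp(datetime(2010, 6, 1, 12, 30, 0)))  # 2010-06-01T12:30:00Z

# A value with an explicit +02:00 offset is converted to UTC first.
local = datetime(2010, 6, 1, 14, 30, 0, tzinfo=timezone(timedelta(hours=2)))
print(format_timestamp(local))  # 2010-06-01T12:30:00Z
```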
See:
- ISO 8601: Wikipedia entry on ISO 8601 (the ISO standard itself is behind a paywall)
- RFC 3339: describes time zone and year offsets
- Gregorian Calendar: a detailed description of the problems arising with dates before 1582
Representing Countries
Data about countries should use the ISO-3166-1 alpha-2 country codes instead of the country name.
See:
- ISO-3166-1: Wikipedia entry on ISO 3166 (the ISO standard itself is behind a paywall)
- ISO Countries: CSV file with all ISO-3166-1 countries, formatted as described on this page
Representing Currencies
Currencies should be represented using the ISO-4217 currency codes. Bitcoin, which has no official ISO-4217 code, can be represented using the code BTC.
See:
- ISO-4217 on Wikipedia
Encoding
The entire file should be encoded with UTF-8. UTF-8 can represent all of Unicode, is compact for text that is mostly ASCII, and remains backward compatible with ASCII.
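A minimal sketch of writing a file that satisfies the encoding, column delimiter, and row delimiter rules is shown below. It assumes every field has already been formatted and, where needed, quoted and escaped; the function name `write_rows`, the file name, and the sample data are illustrative.

```python
import os
import tempfile


def write_rows(path, rows):
    """Write rows as pipe-delimited, LF-terminated, UTF-8 encoded text."""
    # newline="\n" stops Python from translating LF into CR+LF on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for row in rows:
            handle.write("|".join(row) + "\n")


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.csv")
    write_rows(path, [['"year"', '"city"'], ['2010', '"Malmö"']])
    with open(path, "rb") as handle:
        raw = handle.read()

print(raw)
```

Reading the file back in binary mode makes it easy to confirm that no carriage returns were written and that non-ASCII characters were encoded as multi-byte UTF-8 sequences.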
MIME Type
The MIME type of CSV files is text/csv.