Windows CSV File Format

CSV Windows IconThis document describes a standard Windows CSV file format for exchanging tabular data on the Windows platform and between Windows applications like Excel, SQL Server and Access. Because Windows typically uses UTF-16 and has poor support for UTF-8, it is often desirable (though less efficient) to exchange data in this format instead of the one specified in: CSV File Format.

Header Row

The first row of the file must contain a header with the name of each column. the header follows the conventions below for strings.

The header row also determines the number of columns for every row in the reminder of the file. Should a row contain fewer or more columns than in the header – this is considered an error in the file.

Column Delimiter

Columns should be delimited by a tab (, – 0x09). This character is chosen because it is the standard choice when Excel saves in the .txt file format (Save option: UTF-16 Unicode Text).

Row Delimiter

Rows must be delimited by the Carriage Return (CR – 0x0D) character followed the Line Feed (LF – 0x0A) character. Note that this is not the default on Windows systems and old Macintosh machines and on Unix and Linux. Files originating from those operating system must typically be converted before importing to Windows. (HELP WANTED: If anyone has written a utility to convert, I would happily link it here)

Data Type Representation

The subsections here describe details on how to represent common data types.

Representing Strings

String values should be quoted using double quotes (” – 0x22) when they contain either the comma, CR+LF or quote character. For example, the following is correct:

String Escape Sequences

If data contains quotes, CR, LF or other reserved characters in text strings, the data itself should be formatted like this:

Character ASCII Description Escape Sequence
0x22 Double Quotes “”
LF 0x0A Line Feed Quote the string that contains it
CR 0x0D Carriage Return Quote the string that contains it
TAB 0x09 Tab Quote the string that contains it

The rules for quoting special characters apply to the header file too.

Representing Numbers

Decimals should be separated with period (. – 0x2E) not with comma. Thousand separators should not be used. Zero padding should not be used either.

Representing Date and Time

Date and time should be represented using the ISO 8601 and RFC 3339 format. For reference, the format is:

YYYY-MM-DDThh:mm:ss

Unless specified otherwise, times are assumed to be in UTC. Dates should use the Gregorian Calendar. Year should always include at least four digits.

See:

  • ISO 8601 WikiPedia entry on ISO 8601 (Since the ISO standard itself is behind a paywall)
  • RFC 3339 Describes time zone and year offsets
  • Gregorian Calendar A detailed description of the problems arising with dates before 1582.

Representing Countries

Data about countries should use the ISO-3166-1 alpha-2 country codes instead of the country name.

See:

  • ISO-3166-1 WikiPedia entry on ISO 3166 (Since the ISO standard itself is behind a paywall)
  • ISO Countries CSV file with all ISO-3166-1 countries, formatted like described on this page.

Representing Currencies

Currencies should be represented using the ISO-4217 currency codes. Bit coins can be represented using the code BTC.

See:

Encoding

The entire file should be encoded with UTF-16. While this is not the most compact format, it is the format understood by nearly all Windows applications

MIME Type and Extension

The MIME type of CSV files is text/csv. Excel traditionally uses the extension .txt for tab separated, unicode UTF-16 values.

See:

  • RFC 2048 – Details of MIME types
  • RFC 4180 – Defines that text/csv is the MIME type of CSV

Leave a Reply

Your email address will not be published. Required fields are marked *