Fact Table Conventions
When building fact tables, follow these conventions
Place fact tables in the schema Fact
Use the default schema of the database (dbo in the case of SQL server)
This does not allow the fact tables to be identified easily. While is pre or postfix could be used for that, this would simply mimic the behaviour of a schema.
Using a schema also makes it convenient to grant business users access to all facts and dimension, but not to other tables that are used by the ETL process.
Name the fact table after the event or measurement it represents
Unlike dimension tables, fact represents events or measurements. Use the name the best represents the event that happened instead of the transaction that caused the event.
The transactions that generate sales in TPC made up of Order and Lineitem. The fact table that contains the orders that turn into sales (and which is the join of Order and Lineitem should be named Fact.Sales not Order.
Fact tables contain the following columns
Column Name | Purpose | Data Type | Default Value | NULL'able |
---|---|---|---|---|
EK_[Dimension] | Foreign key to dimension table. | Same as the key in [Dimension] | None | No |
EK_[Dimension]_[Role] | When a dimension is a role playing dimension, use this naming convention for the fact table foreign key | Same as the key in [Dimension] | None | No |
[Measure Name] | Measures | As needed | 0 or None | Yes, when there is a default value |
In the case of factless fact tables, add a column of type bigint named Count with a default value of 1 and NOT NULL
This makes it a lot easier to write queries that count the number of times an event has happened by using the SUM aggregate. If the grain is later reduced, it also makes it easier to represent that more than one even with the same granularity has occurred (using COUNT would require the adding of multiple rows
Don’t add the column, use COUNT instead. This saves space in the fact table.
This idea is rejected because engines like SQL Server requires that counts larger than 2**32 use the aggregation COUNT_BIG instead of COUNT. This means that queries may break if the table exceeds about 2B rows.
Storing a constant value of 1 in a bigint will not take up the full 8 bytes if compression in the database is enabled.