SELECT product, SUM(profit) FROM table GROUP BY product;
order | line_number | profit | month | product |
---|---|---|---|---|
1 | 1 | $1.50 | 1 | 53 |
1 | 2 | $3.00 | 1 | 989 |
2 | 1 | $2.78 | 1 | 78 |
3 | 1 | $1.98 | 2 | 211 |
3 | 2 | $5.67 | 2 | 987 |
3 | 3 | $1.98 | 2 | 78 |
SELECT quarter, product, SUM(profit) FROM table GROUP BY quarter, product;
order | line_number | profit | month | product | quarter |
---|---|---|---|---|---|
1 | 1 | $1.50 | 1 | 53 | 1 |
1 | 2 | $3.00 | 1 | 989 | 1 |
2 | 1 | $2.78 | 1 | 78 | 1 |
3 | 1 | $1.98 | 2 | 211 | 1 |
3 | 2 | $5.67 | 2 | 987 | 1 |
3 | 3 | $1.98 | 2 | 78 | 1 |
SELECT manufacturer, SUM(profit) FROM table GROUP BY manufacturer;
order | line_number | profit | month | product | quarter | manufacturer |
---|---|---|---|---|---|---|
1 | 1 | $1.50 | 1 | 53 | 1 | TruxRUs |
1 | 2 | $3.00 | 1 | 989 | 1 | TruxRUs |
2 | 1 | $2.78 | 1 | 78 | 1 | Holidayz |
3 | 1 | $1.98 | 2 | 211 | 1 | Holidayz |
3 | 2 | $5.67 | 2 | 987 | 1 | TruxRUs |
3 | 3 | $1.98 | 2 | 78 | 1 | Disney |
SELECT man_country, SUM(profit) FROM table GROUP BY man_country;
order | ln | profit | month | product | quarter | manufacturer | man_country |
---|---|---|---|---|---|---|---|
1 | 1 | $1.50 | 1 | 53 | 1 | TruxRUs | USA |
1 | 2 | $3.00 | 1 | 989 | 1 | TruxRUs | USA |
2 | 1 | $2.78 | 1 | 78 | 1 | Holidayz | Russia |
3 | 1 | $1.98 | 2 | 211 | 1 | Holidayz | Russia |
3 | 2 | $5.67 | 2 | 987 | 1 | TruxRUs | USA |
3 | 3 | $1.98 | 2 | 78 | 1 | Disney | Canada |
We can see that the tables we designed to answer our queries consisted of two parts:
The star schema will consist of two main parts:
These are the things we want to "measure," and generally correspond one-to-one to a business process, often a "transaction" of some sort.
We think of these being big N! Many rows of the measure and we want to aggregate them.
Because we aggregate them, we usually think of these being continuous variables, preferably that are additive.
Grain!
How do we aggregate the facts? We aggregate them by dimensions.
Dimension tables usually consist of the discrete variables by which we want to aggregate our measures to ask different questions.
id | profit | quantity | product_id | time_id | order_id |
---|---|---|---|---|---|
1 | $1.50 | 1 | 53 | 987 | 1 |
2 | $3.00 | 15 | 989 | 876 | 1 |
3 | $2.78 | 13 | 78 | 788 | 2 |
4 | $1.98 | 2 | 211 | 333 | 3 |
5 | $5.67 | 1 | 53 | 400 | 3 |
6 | $1.98 | 43 | 78 | 93 | 3 |
SELECT dow, SUM(profit) FROM facts left join time GROUP BY dow;
time_id | date | month | dow |
---|---|---|---|
93 | 23 | may | tue |
987 | 15 | june | wed |
788 | 10 | may | wed |
400 | 30 | may | sat |
SELECT holiday, SUM(profit) FROM facts left join time GROUP BY holiday;
time_id | date | month | dow | holiday |
---|---|---|---|---|
93 | 23 | may | tue | true |
987 | 15 | june | wed | false |
788 | 10 | may | wed | false |
400 | 30 | may | sat | false |