About Data Cubes
Once data has been processed, the data is stored in a data cube, which is a multi-dimensional array of complex values. Having a cursory understanding of data cubes is important as you review the data because it will help you better understand SQL query results.
Cube Concepts: Measures and Dimensions
Cube data is grouped into different dimensions, indexed, and have precomputed queries frequently performed. Because of this, data cubes speed query processing.
There are two dimensions groupings:
- Measure - Provides information that you can perform mathematical operations on. For example, temperature or humidity. Measure dimensions (often referred to as simply "measures") Measures are dependent variables.
- Dimensions - Typically qualitative, such as a row or column number. Often referred to as simply "dimensions", these are independent variables. Both measures and dimensions are arrays. An array is an ordered collection of elements.
Let's take a look at an example to better understand these concepts.
NOTE
The following example has plate location (row, column) as the dimensions, but other dimensions can be used. For example, with kinetic plate reads time can be a key dimension in a datacube. Please refer to the specific schema documentation for information on how to interpret the data.
{
"datacubes": [
{
"name": "Plate Reads",
"description": "More information (Optional)",
"mode": "absorbance",
"another_property": "you decide",
"measures": [
{
"name": "OD_600",
"unit": "ArbitraryUnit",
"value": [
[1.19, 1.05, 1.05, 1.01],
[1.11, 0.90, 0.95, 0.98],
[1.11, 0.93, 0.95, 0.99]
]
}
],
"dimensions": [
{
"name": "row",
"unit": "count",
"scale": [0, 1, 2]
},
{
"name": "column",
"unit": "count",
"scale": [0, 1, 2, 3]
}
]
}
]
}
RAW data from a spectrophotometer that contains the following variables. Which are measures and dimensions?
- Optical density at 600 nm wavelength
• Row of the plate well
• Column of the plate well
In this example, the row and columns for the well are descriptive; they simply indicate where the samples were taken from. These are independent variables. Row of the plate well and column of the plate well are dimensions.
The optical density at 600 nm wavelength is a dependent variable (the data depends on the location of the well). The optical density is a measure.
A few other examples appear in the table below.
Example | Measure(s) | Dimension(s) |
---|---|---|
Chromatogram | 1. Detector Intensity | 1. Wavelength 2. Retention Time |
Weather | 1. Temperature 2. Humidity | 1. Longitude 2. Latitude |
Plate Reader | 1. Absorbance 2. Concentration | 1. Row Position 2. Column Position |
Mapping the Cube to Table Structure
Note that data cubes are often visualized as a "rubik's cube-like structure" but are typically stored in SQL tables. Measures and dimensions are mapped to table rows, columns, and values. To better understand this, let’s take a look at the following figure.
In the figure, there are two dimensions and one measure. Their names are “row”, “column” and “OD_600”.
Table column headers are the measures and dimensions with a number appended to them. The number simply indicates the numerical order, starting with the number 0, of the dimension or measure in the JSON file. Since there are two dimensions and one measure, the column headers are dimension_0 (row dimension), dimension_1 (column dimension), and measure_0 (OD_600 measure).
Each dimension has an array. An array is simply an ordered list. In this example, the measure has three arrays. Each array contains elements (values).
Values in the dimension and measure arrays are collated based on the order in which the element is listed in the array. For example in the second measure array [1.11, 0.90, 0.95, and 0.98] the elements are in these positions:
Array Element Position | Measure Value |
---|---|
0 (first) | 1.11 |
1 (second) | .90 |
2 (third) | .95 |
3 (fourth) | .98 |
The position of elements in an array are important because values in each row of the table collated based on the position of the elements across different arrays.
In the example, for the three rows in the well, there are four records. Each record contains the element of the dimension and measure arrays that share the same position. For example, the element in the second position of the column dimension's array is “1”. The element in the second position of the OD_600 measure for the first array is .90.
Note that if there is more than one measure array, the values for the first dimension (the first one that appears in the JSON file) match the element number of the array. Because we are using the second array for the OD_600 measure, we are using the element in the first position of the row dimension's array, which is "1".
Therefore in the table for that row, the dimension_0 (row) value is 1, the dimension_1 (column) value is 1, and the measure_1 (OD_600) value is .90.
Note that if the measure contains several different arrays, the dimension arrays repeat.
Updated about 1 year ago