When you connect Crisp to object storage applications (such as Google Cloud Storage, AWS S3, or Azure Blob Storage), data is exported using structured paths that function like a folder system.
On the connector configuration screen for an object storage application, Crisp requires you to specify the storage bucket or container that data is exported to. You can also specify a destination path and a relative path to customize the export path (as shown in the following image).
Crisp uses a multi-file export process to support large data sets and improve efficiency, so a single export may consist of multiple files, distinguished by split numbers. The Crisp exporter builds export paths using the following pattern:
DESTINATION_PATH/TABLE_NAME/RELATIVE_PATH
If you do not specify a destination or relative path, Crisp uses the following default path structure:
- Path: <table-name>/<from-date>_<to-date>
- File name: <from-date>_<to-date>-<export-timestamp>-<split-number>, followed by the file format's extension (for example, .json.gz)
- Combined (path and file name): <table-name>/<from-date>_<to-date>/<from-date>_<to-date>-<export-timestamp>-<split-number>
For example:
harmonized_retailer_sales/2023-01-30_2023-01-30/2023-01-30_2023-01-30-1675054800000-0000000000000.json.gz
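The default layout can be expressed as a small path builder. The following Python sketch is illustrative only, not Crisp's exporter: the function name `default_export_path` is hypothetical, the 13-digit zero padding of the split number is inferred from the example above, and the `.json.gz` extension assumes compressed JSON output.

```python
from datetime import date


def default_export_path(table_name: str, from_date: date, to_date: date,
                        export_timestamp_ms: int, split_number: int,
                        extension: str = ".json.gz") -> str:
    """Build a default-layout path (no destination or relative path configured).

    Hypothetical helper: the split padding and extension are assumptions
    inferred from the documented examples.
    """
    # Date range segment, e.g. "2023-01-30_2023-01-30" (isoformat gives YYYY-MM-DD).
    date_range = f"{from_date.isoformat()}_{to_date.isoformat()}"
    # Split numbers in the examples are zero-padded to 13 digits (assumption).
    split = f"{split_number:013d}"
    file_name = f"{date_range}-{export_timestamp_ms}-{split}{extension}"
    return f"{table_name}/{date_range}/{file_name}"
```

For example, `default_export_path("harmonized_retailer_sales", date(2023, 1, 30), date(2023, 1, 30), 1675054800000, 0)` reproduces the example path above.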
Note: Snapshot reports (i.e., reports that are not time series, such as product details) use a different default export path. For more: Handling snapshot tables.
Data restatements
Sometimes the sources that we get data from change that data after the initial capture. To account for these changes, when Crisp receives updated data from a source, we re-export the data for the affected dates (this is known as restatement). Crisp may also re-export data for other reasons, such as at your request. The default path configuration restates data in place: files for duplicate dates are removed and replaced. Because file names include the export timestamp, a restated file's name differs slightly from the original's. If you do not want newly exported files to replace existing ones, you can customize your export path to create a separate path for each export. For more: Customizing default paths.
Example re-export filepaths
When Crisp restates or re-exports files, they have an updated timestamp, but the rest of the file path remains the same, as shown in the following example:
Existing file path:
2023-01-29_2023-01-29/2023-01-29_2023-01-29-1675054800000-0000000000000.json.gz
Re-exported file path:
2023-01-29_2023-01-29/2023-01-29_2023-01-29-1675659600000-0000000000000.json.gz
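If a downstream job needs to recognize a restated file, one approach is to compare paths with the export timestamp removed. In this hypothetical Python sketch, `logical_key` parses a default-layout file name (using a regular expression written for the examples on this page, not a published Crisp format) and keeps everything except the timestamp, including the split number.

```python
import re

# Matches "<from>_<to>-<timestamp>-<split><extension>" as in the examples
# on this page (assumed format, not an official specification).
FILE_NAME_RE = re.compile(
    r"^(?P<from_date>\d{4}-\d{2}-\d{2})_(?P<to_date>\d{4}-\d{2}-\d{2})"
    r"-(?P<timestamp>\d+)-(?P<split>\d+)(?P<ext>\..+)$"
)


def _parse(path: str) -> tuple[str, re.Match]:
    """Split a path into its folder and the parsed file name."""
    folder, _, file_name = path.rpartition("/")
    match = FILE_NAME_RE.match(file_name)
    if match is None:
        raise ValueError(f"not a default-layout file name: {file_name}")
    return folder, match


def logical_key(path: str) -> tuple:
    """Return the parts of a path that stay the same across a restatement."""
    folder, m = _parse(path)
    # Everything except the export timestamp identifies the logical file.
    return (folder, m["from_date"], m["to_date"], int(m["split"]), m["ext"])


def export_timestamp(path: str) -> int:
    """Return the export timestamp (milliseconds) embedded in the file name."""
    return int(_parse(path)[1]["timestamp"])
```

Two paths with equal keys refer to the same logical file; the one with the larger timestamp is the newer export.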
Example paths
You can use the following examples to help you understand how Crisp creates export paths with different tables and export configurations.
Example 1 - Default configuration, daily frequency table
For a table with the following characteristics and export criteria,
Destination path: Default (i.e., this field is empty in your connector configuration)
Relative path: Default (i.e., this field is empty in your connector configuration)
Table name: harmonized_retailer_sales
Table refresh frequency: Daily
Export format: Compressed JSON
Export date: Feb 1 2023
Most recent data: Jan 30 2023
the export files would be structured as follows:
/crisp/harmonized_retailer_sales/
2023-01-30_2023-01-30/2023-01-30_2023-01-30-1675054920000-0000000000002.json.gz
2023-01-30_2023-01-30/2023-01-30_2023-01-30-1675054860000-0000000000001.json.gz
2023-01-30_2023-01-30/2023-01-30_2023-01-30-1675054800000-0000000000000.json.gz
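Each split file in this listing carries its own timestamp, so a loader that wants to read the files of one export in a deterministic order can sort them by split number. This Python sketch assumes default-layout file names; `split_number` and `order_splits` are hypothetical helpers.

```python
def split_number(path: str) -> int:
    """Extract the split number from a default-layout file path."""
    file_name = path.rpartition("/")[2]
    # Drop the extension (e.g. ".json.gz"), then take the last "-"-separated field.
    stem = file_name.split(".", 1)[0]
    return int(stem.rsplit("-", 1)[1])


def order_splits(paths: list[str]) -> list[str]:
    """Return the split files of one export ordered by split number."""
    return sorted(paths, key=split_number)
```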
Example 2 - Default configuration, weekly frequency table
Datasets that are updated weekly (typically by the retailer) are placed in paths whose start and end dates reflect the period the data covers. For example, for a table with the following characteristics and export criteria,
Destination path: Default value
Relative path: Default value
Table name: direct_retailer_weekly_sales
Refresh frequency: Weekly
Export format: Compressed JSON
Export date: Feb 1 2023
Most recent data: Jan 24 2023 - Jan 30 2023
the export paths and files would be structured as follows:
/crisp/direct_retailer_weekly_sales/
2023-01-24_2023-01-30/2023-01-24_2023-01-30-1738431172000-0000000000000.json.gz
2023-01-17_2023-01-23/2023-01-17_2023-01-23-1738431172000-0000000000000.json.gz
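For weekly tables, the folder name is the inclusive date range of the week. This Python sketch assumes a seven-day period beginning on `week_start`; actual week boundaries depend on the retailer, and `weekly_folder` is a hypothetical helper.

```python
from datetime import date, timedelta


def weekly_folder(week_start: date, days: int = 7) -> str:
    """Folder name for a period starting on week_start.

    toDate is inclusive, so a seven-day week ends six days after it starts.
    """
    week_end = week_start + timedelta(days=days - 1)
    return f"{week_start.isoformat()}_{week_end.isoformat()}"
```

With `week_start=date(2023, 1, 24)` this yields the `2023-01-24_2023-01-30` folder shown above.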
Handling snapshot tables
Snapshot tables are tables that are not time series (such as product details). They use a different default export path format:
DESTINATION_PATH/TABLE_NAME/EXPORT_DATE
For example, a table with the following characteristics and export criteria
Destination path: /crisp
Relative path: Default value
Table name: harmonized_retailer_sales
Refresh frequency: Daily
Export format: Compressed JSON
Export date: Feb 1 2023
will result in the following export path:
/crisp/harmonized_retailer_sales/
2023-01-31_2023-01-31/2023-01-31_2023-01-31-1738431172000-0000000000000.json.gz
If you customize the relative path of a snapshot table, the EXPORT_DATE segment is replaced with the value of the Relative Path field.
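The snapshot folder pattern DESTINATION_PATH/TABLE_NAME/EXPORT_DATE can be sketched the same way. In this Python example, `snapshot_prefix` is hypothetical, `product_details` is an illustrative table name, and the export date is assumed to be rendered as YYYY-MM-DD. Only the folder is built; file naming inside it is not covered.

```python
from datetime import date
from typing import Optional


def snapshot_prefix(destination_path: str, table_name: str, export_date: date,
                    relative_path: Optional[str] = None) -> str:
    """Folder for a snapshot export: DESTINATION_PATH/TABLE_NAME/EXPORT_DATE.

    A custom relative path replaces the export-date segment.
    """
    last = relative_path if relative_path else export_date.isoformat()
    parts = [destination_path.rstrip("/"), table_name, last.strip("/")]
    # Skip empty segments, e.g. when no destination path is configured.
    return "/".join(p for p in parts if p)
```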
Customizing default paths
The relative path portion of the export path can be customized with macro expressions, in which path variables are enclosed in ${}. The default relative path expression is ${fromDate}_${toDate}/${fromDate}_${toDate}.
You may want to customize the relative path to keep separate folders or files for each export (to avoid overwriting data) or to follow your organization's path conventions. You can use path variables in the Relative Path field on the connector configuration screen to adjust the default behavior (as shown in the following image).
The supported path variables are:
Key | Description |
fromDate | The earliest time partition key in the exported data. Date format is YYYY-MM-DD. |
toDate | The latest time partition key in the exported data. The date is inclusive, so if the data covers a single day, fromDate and toDate are the same. Date format is YYYY-MM-DD. |
exportDate | The date that the export takes place, independent of the dates covered by the exported data. |
outboundConnectorConfigId | The ID of the Crisp connector configuration. |
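Python's standard `string.Template` class uses the same `${name}` placeholder syntax, which makes it a convenient way to preview how a relative path expression expands. This is a sketch for checking expressions, not Crisp's implementation; `expand_relative_path` is hypothetical, the export date format (YYYY-MM-DD) is assumed from the example later on this page, and the connector configuration ID is a placeholder.

```python
from datetime import date
from string import Template


def expand_relative_path(expression: str, from_date: date, to_date: date,
                         export_date: date, config_id: str) -> str:
    """Substitute the supported ${...} path variables into an expression."""
    variables = {
        "fromDate": from_date.isoformat(),      # YYYY-MM-DD
        "toDate": to_date.isoformat(),          # inclusive end date
        "exportDate": export_date.isoformat(),  # format assumed YYYY-MM-DD
        "outboundConnectorConfigId": config_id,
    }
    # substitute() raises KeyError for a variable not listed in the table above.
    return Template(expression).substitute(variables)
```

A typo such as `${fromdate}` raises `KeyError` instead of silently producing a literal folder name, which is why `substitute` is used rather than `safe_substitute`.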
Example - Custom configuration with export date
By default, when new data is available from the source, Crisp re-exports the affected dates and overwrites the existing data for those dates. We do this because past data sometimes changes at the source, and we want to ensure you have the most current data. Note: Crisp replaces files for duplicate dates, but the new file names may differ because of updated timestamps.
If you want to avoid overwriting files and maintain a separate path for each export, you can structure your connector configuration as in the following example.
For a table with the following characteristics and export criteria,
Destination path: /crisp
Relative path: ${exportDate}/data-${fromDate}
Table name: harmonized_retailer_sales
Refresh frequency: Daily
Export format: Compressed JSON
Export date: Feb 1 2023
Most recent data: Jan 30 2023
the export paths and files would be structured as follows:
/crisp/harmonized_retailer_sales/2023-02-01/
data-2023-01-30-1675054800000-0000000000000.json.gz
data-2023-01-29-1674968400000-0000000000000.json.gz
data-2023-01-28-1674882000000-0000000000000.json.gz
...
data-2023-01-17-1673916000000-0000000000000.json.gz
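Comparing this listing with the default layout suggests that the last segment of the relative path becomes the file-name prefix, followed by `-<export-timestamp>-<split-number>` and the extension. The following Python sketch encodes that inferred rule; `export_file_path` is hypothetical, and the 13-digit split padding and `.json.gz` extension are assumptions based on the examples on this page.

```python
from string import Template


def export_file_path(destination_path: str, table_name: str,
                     relative_expression: str, variables: dict[str, str],
                     export_timestamp_ms: int, split_number: int,
                     extension: str = ".json.gz") -> str:
    """Build a full export file path from a relative path expression."""
    relative = Template(relative_expression).substitute(variables)
    # Inferred rule: the expanded relative path's last segment is the file-name
    # prefix; timestamp, zero-padded split number, and extension follow it.
    file_name = f"{relative}-{export_timestamp_ms}-{split_number:013d}{extension}"
    parts = [destination_path.rstrip("/"), table_name, file_name]
    # Skip empty segments, e.g. when no destination path is configured.
    return "/".join(p for p in parts if p)
```

Calling it once per date in the export, with the matching timestamps, reproduces the listing above.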