When you connect Crisp to an object storage application (such as Google Cloud Storage, AWS S3, or Azure Blob Storage), data is exported into paths, which function like a folder system.
On the connector configuration screen for an object storage application, Crisp requires you to specify the storage bucket or container that data will be exported to. You can also specify a destination path and a relative path to customize the export path (as shown in the following image).
Note: To account for changes to the source data, Crisp exports 14 days' worth of data each day for daily datasets. The default path configuration restates data in place: when a new export contains a date that was already exported, the file for that date is overwritten. If you do not want newly exported files to overwrite existing ones, you can customize your export path to create a separate path for each export. For more information, see Customizing default paths.
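The overlap between consecutive daily exports can be sketched in Python. This assumes that each daily export covers the 14 most recent days of data and that, under the default path, a given date always maps to the same file. The export dates are hypothetical, and this is an illustration rather than Crisp's code.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 14  # daily datasets are re-exported with a 14-day lookback

def lookback_dates(most_recent: date, days: int = LOOKBACK_DAYS) -> list[date]:
    """Return the daily partitions included in one export, newest first."""
    return [most_recent - timedelta(days=i) for i in range(days)]

# Two consecutive daily exports (hypothetical most-recent data dates).
export_a = set(lookback_dates(date(2023, 1, 30)))
export_b = set(lookback_dates(date(2023, 1, 31)))

# With the default path, each shared date is written to the same file,
# so the later export overwrites (restates) every overlapping date.
overwritten = sorted(export_a & export_b)
print(len(overwritten), overwritten[0], overwritten[-1])
```

Only the oldest date of the earlier export falls outside the next export's window; every other date is restated.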
The Crisp exporter uses the following path structure:
DESTINATION_PATH/TABLE_NAME/RELATIVE_PATH
If you do not specify a destination or relative path, Crisp uses the following default path structure:
crisp/table_name/from_date-to_date/from_date-to_date
Note: Snapshot reports (that is, reports that are not time series, such as product details) use a different default export path. For more information, see Handling snapshot tables.
Example paths
Use the following examples to understand how Crisp creates export paths for different tables and export configurations.
Example 1 - Default configuration, daily frequency table
For a table with the following characteristics and export criteria,
Destination path: Default (i.e., this field is empty in your connector configuration)
Relative path: Default (i.e., this field is empty in your connector configuration)
Table name: harmonized_retailer_sales
Refresh frequency: Daily
Export format: Compressed JSON
Export date: Feb 1 2023
Most recent data: Jan 30 2023
the export files would be structured as follows:
/crisp/harmonized_retailer_sales/
2023-01-30_2023-01-30/2023-01-30_2023-01-30.json.gz
2023-01-29_2023-01-29/2023-01-29_2023-01-29.json.gz
…
2023-01-17_2023-01-17/2023-01-17_2023-01-17.json.gz
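The listing above can be reproduced with a minimal Python sketch. It assumes the FROM_TO folder and file naming (with an underscore separator) shown in the listing, a 14-day lookback, and a compressed JSON extension. The function name daily_export_files and its parameters are hypothetical.

```python
from datetime import date, timedelta

def daily_export_files(table: str, most_recent: date, days: int = 14,
                       destination: str = "/crisp",
                       ext: str = ".json.gz") -> list[str]:
    """Build default export paths for a daily table, newest date first.

    Each daily partition becomes a folder and a file named FROM_TO;
    for daily data, FROM and TO are the same date.
    """
    paths = []
    for offset in range(days):
        day = (most_recent - timedelta(days=offset)).isoformat()  # YYYY-MM-DD
        stem = f"{day}_{day}"
        paths.append(f"{destination}/{table}/{stem}/{stem}{ext}")
    return paths

files = daily_export_files("harmonized_retailer_sales", date(2023, 1, 30))
print(files[0])
print(files[-1])
```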
Example 2 - Default configuration, weekly frequency table
Datasets that are updated weekly (typically by the retailer) are placed in paths whose start and end dates reflect the time period the data covers. For example, for a table with the following characteristics and export criteria,
Destination path: Default value
Relative path: Default value
Table name: direct_retailer_weekly_sales
Refresh frequency: Weekly
Export format: Compressed JSON
Export date: Feb 1 2023
Most recent data: Jan 22 2023 - Jan 30 2023
the export paths and files would be structured as follows:
/crisp/direct_retailer_weekly_sales/
2023-01-22_2023-01-30/2023-01-22_2023-01-30.json.gz
2023-01-15_2023-01-21/2023-01-15_2023-01-21.json.gz
Handling snapshot tables
Snapshot tables are tables that are not time series, such as product details. They use a different default export path format:
DESTINATION_PATH/TABLE_NAME/EXPORT_DATE
For example, a table with the following characteristics and export criteria
Destination path: /crisp
Relative path: Default value
Table name: normalize_walmart_dim_product
Refresh frequency: Daily
Export format: Compressed JSON
Export date: Feb 1 2023
will result in the following export path:
/crisp/normalize_walmart_dim_product/
2023-02-01_2023-02-01/2023-02-01_2023-02-01.json.gz
If you customize the relative path of a snapshot table, the EXPORT_DATE portion of the path is replaced with the value you specify in the Relative path field.
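The snapshot behavior can be sketched in Python for the example table normalize_walmart_dim_product. This assumes that the export-date segment is rendered as EXPORTDATE_EXPORTDATE (folder and file), as in the example output, and that a custom relative path (taken here as already expanded) replaces that segment while the file extension is still appended. The function name and parameters are hypothetical.

```python
from datetime import date
from typing import Optional

def snapshot_export_file(table: str, export_date: date,
                         destination: str = "/crisp",
                         relative_path: Optional[str] = None,
                         ext: str = ".json.gz") -> str:
    """Build the export file path for a snapshot (non-time-series) table."""
    if relative_path:
        # A custom Relative path replaces the export-date portion.
        return f"{destination}/{table}/{relative_path}{ext}"
    day = export_date.isoformat()  # YYYY-MM-DD
    stem = f"{day}_{day}"  # the export date is used as both start and end
    return f"{destination}/{table}/{stem}/{stem}{ext}"

print(snapshot_export_file("normalize_walmart_dim_product", date(2023, 2, 1)))
```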
Customizing default paths
The relative path portion of the export path can be customized with macro expressions, in which path variables are enclosed in ${}. The default relative path expression is ${fromDate}-${toDate}/${fromDate}-${toDate}.
You may want to customize the relative path to keep separate folders or files for each export (to avoid overwriting data) or to follow your organization's path conventions. To change the default behavior, use path variables in the Relative path field on the connector configuration screen (as shown in the following image).
The supported path variables are:
Variable | Description |
fromDate | The earliest time partition key in the exported data. Date format is YYYY-MM-DD. |
toDate | The latest time partition key in the exported data. The date is inclusive, so if the data covers a single day, fromDate and toDate are the same. Date format is YYYY-MM-DD. |
exportDate | The date on which the export runs. It is independent of the dates covered by the table content. Date format is YYYY-MM-DD. |
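Python's standard string.Template class happens to use the same ${name} syntax, so it can illustrate how a relative path expression expands. This is a sketch, not Crisp's implementation: Template also treats a bare $name as a variable and $$ as an escaped dollar sign, which Crisp may not do. The function name expand_relative_path is hypothetical.

```python
from string import Template

def expand_relative_path(expression: str, from_date: str, to_date: str,
                         export_date: str) -> str:
    """Substitute the supported path variables into a relative path expression.

    Raises KeyError if the expression uses an unsupported variable.
    """
    variables = {"fromDate": from_date, "toDate": to_date, "exportDate": export_date}
    return Template(expression).substitute(variables)

print(expand_relative_path("${exportDate}/data-${fromDate}",
                           from_date="2023-01-30", to_date="2023-01-30",
                           export_date="2023-02-01"))
```

Using substitute rather than safe_substitute means a misspelled variable fails loudly instead of leaving a literal ${...} in the path.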
Example - Custom configuration with export date
By default, for daily tables, Crisp exports 14 days' worth of data in each export and overwrites the files for dates that were already exported. We do this because past data sometimes changes at the source, and we want to ensure you have the most current data. If you want to avoid overwriting files and maintain a separate path for each export, you can configure your connector as in the following example.
For a table with the following characteristics and export criteria,
Destination path: /crisp
Relative path: ${exportDate}/data-${fromDate}
Table name: harmonized_retailer_sales
Refresh frequency: Daily
Export format: Compressed JSON
Export date: Feb 1 2023
Most recent data: Jan 30 2023
the export paths and files would be structured as follows:
/crisp/harmonized_retailer_sales/2023-02-01/
data-2023-01-30.json.gz
data-2023-01-29.json.gz
...
data-2023-01-17.json.gz
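A minimal Python sketch can show why this configuration avoids overwriting. It assumes a 14-day lookback, the /crisp destination, a compressed JSON extension, and string.Template expansion of the ${...} variables. The second export (Feb 2 2023, with data through Jan 31 2023) is hypothetical, and custom_export_files is an illustrative name.

```python
from datetime import date, timedelta
from string import Template

RELATIVE_PATH = "${exportDate}/data-${fromDate}"  # from the example above

def custom_export_files(table: str, export_date: date, most_recent: date,
                        days: int = 14, destination: str = "/crisp",
                        ext: str = ".json.gz") -> list[str]:
    """List the files one daily export writes with the custom relative path."""
    files = []
    for offset in range(days):
        day = (most_recent - timedelta(days=offset)).isoformat()
        relative = Template(RELATIVE_PATH).substitute(
            fromDate=day, toDate=day, exportDate=export_date.isoformat())
        files.append(f"{destination}/{table}/{relative}{ext}")
    return files

feb1 = custom_export_files("harmonized_retailer_sales", date(2023, 2, 1), date(2023, 1, 30))
feb2 = custom_export_files("harmonized_retailer_sales", date(2023, 2, 2), date(2023, 1, 31))
print(feb1[0])
# Each export lives under its own exportDate folder, so no paths collide.
print(set(feb1) & set(feb2))
```

The data dates still overlap between the two exports, but because each export writes under a different exportDate folder, the earlier files are kept.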