At Crisp, we understand the importance of delivering data in a timely manner and on a predictable schedule, so we take many measures to provide a consistent and reliable data pipeline. Since the way retailers and distributors provide data varies greatly, ingesting and delivering data from a wide variety of sources is a complex task. This article will help you understand the strategies Crisp uses to provide reliable and timely data while tailoring our approach to the data source.
Detecting and pulling new data
Most Crisp data sources (that is, upstream systems) neither publish a schedule for when new data becomes available nor notify downstream systems like Crisp when it does. So, we check for new data on a regular schedule based on how often new data has typically become available from that source. If a check returns no data or only partial data, we check again later. However, we also have to avoid overloading the source data portals we pull from. To balance these considerations with providing timely data, we take the following factors into account when scheduling data pulls:
- When we expect new data to become available (based on our testing of data availability over time)
- How long we wait before checking again if the data was unavailable or incomplete
- How many simultaneous data pulls we believe the source system can handle. Many people and services depend on the upstream system, so we strive to play fair and not overload our sources.
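The three factors above can be illustrated with a minimal Python sketch. This is not Crisp's actual implementation: the names `PullPolicy`, `check_times`, `pull_until_complete`, and `fake_fetch`, the `max_attempts` cap, and all of the times and limits are hypothetical and chosen only for illustration.

```python
import threading
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional, Tuple


@dataclass
class PullPolicy:
    """Hypothetical per-source settings; all values are assumptions."""
    expected_at: datetime    # when new data usually appears (learned over time)
    retry_delay: timedelta   # how long to wait before checking again
    max_attempts: int        # assumed cap on checks per cycle
    max_concurrent: int      # simultaneous pulls the source is believed to handle


def check_times(policy: PullPolicy) -> List[datetime]:
    """Return the scheduled check times: the expected time, then evenly spaced retries."""
    return [policy.expected_at + i * policy.retry_delay
            for i in range(policy.max_attempts)]


def pull_until_complete(
    policy: PullPolicy,
    fetch: Callable[[datetime], Optional[dict]],
    limiter: threading.BoundedSemaphore,
) -> Optional[Tuple[datetime, dict]]:
    """Check at each scheduled time until the source returns a complete report."""
    for when in check_times(policy):
        # Skip this check if the source is already at its concurrency limit.
        if not limiter.acquire(blocking=False):
            continue
        try:
            report = fetch(when)
        finally:
            limiter.release()
        # No data (None) or partial data means we try again at the next check.
        if report is not None and report.get("complete"):
            return when, report
    return None


# Simulated source: nothing before 06:30, partial until 07:00, then complete.
def fake_fetch(when: datetime) -> Optional[dict]:
    if when < datetime(2024, 1, 1, 6, 30):
        return None
    if when < datetime(2024, 1, 1, 7, 0):
        return {"complete": False}
    return {"complete": True}


policy = PullPolicy(
    expected_at=datetime(2024, 1, 1, 6, 0),
    retry_delay=timedelta(minutes=30),
    max_attempts=4,
    max_concurrent=2,
)
limiter = threading.BoundedSemaphore(policy.max_concurrent)
result = pull_until_complete(policy, fake_fetch, limiter)
```

In this simulation, the first check finds no data, the second finds a partial report, and the third succeeds. The semaphore stands in for whatever mechanism keeps simultaneous pulls within what the source can handle.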
Balancing timeliness and completeness
Crisp also takes the following measures to ensure that data is not only timely, but also complete and actionable, to help you confidently perform analysis upon delivery.
- We strive to pull the most important data first, like current sales and inventory, then proceed with data that might be less time sensitive, like changes to product and store lists or restating historical data.
- We try to provide a complete set of updates at once, rather than letting changes trickle in slowly. For example, we may update inventory and sales data at the same time, even if the inventory data becomes available minutes or even hours sooner.
- We designed our system to slightly prioritize completeness over timeliness. To achieve this, we buffer changes for a predefined amount of time before updating dashboards and destination connectors, so that reports and dashboard visualizations are less likely to be missing data that was not yet available at the time of the data pull.
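The buffering idea in the last point can be sketched as follows. This is an illustration under stated assumptions, not a description of Crisp's internals: the `ChangeBuffer` class, its method names, and the two-hour hold window are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


class ChangeBuffer:
    """Hold incoming changes for a fixed window, then release them together."""

    def __init__(self, hold: timedelta) -> None:
        self.hold = hold                            # assumed predefined buffering time
        self._pending: Dict[str, List[dict]] = {}   # dataset name -> latest rows
        self._first_seen: Optional[datetime] = None

    def add(self, dataset: str, rows: List[dict], now: datetime) -> None:
        """Buffer an update; the hold window starts at the first buffered change."""
        if self._first_seen is None:
            self._first_seen = now
        # A later update for the same dataset replaces the earlier one.
        self._pending[dataset] = rows

    def flush_if_ready(self, now: datetime) -> Optional[Dict[str, List[dict]]]:
        """Return all buffered changes once the hold window has elapsed, else None."""
        if self._first_seen is None or now - self._first_seen < self.hold:
            return None
        batch = self._pending
        self._pending = {}
        self._first_seen = None
        return batch


buffer = ChangeBuffer(hold=timedelta(hours=2))

# Inventory arrives first; sales arrives more than an hour later.
buffer.add("inventory", [{"sku": "A1", "on_hand": 40}], datetime(2024, 1, 1, 6, 0))
early = buffer.flush_if_ready(datetime(2024, 1, 1, 6, 30))   # still inside the window
buffer.add("sales", [{"sku": "A1", "units": 3}], datetime(2024, 1, 1, 7, 15))
batch = buffer.flush_if_ready(datetime(2024, 1, 1, 8, 0))    # window has elapsed
```

Here the inventory update is held back rather than published alone, so the dashboard or destination connector receives inventory and sales in a single batch. The trade-off is the one described above: delivery is slightly later, but the update is more likely to be complete.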
Crisp data delivery
Given the complexity of balancing timeliness, completeness, and fair use of our source systems, Crisp cannot guarantee exactly when data will be available in our system. However, we do strive to provide data as quickly as possible once we pull it from the source. We internally track and measure data delivery timeliness to make sure we are providing the best service we can. Once you connect a data source to Crisp, we can provide a history of data availability upon request. For destination connectors, you can generally expect data to be available within 3 hours of the time we ingest a complete report from the source.