Data First / Data Centric
Data First
What does it mean to be Data First?
Data-first is a corollary to API-first design, in which APIs are designed up front as the primary interface to an application. Being data-first means that data considerations drive application development, integration decisions, and business processes.
For example, when trying to answer questions about whether an LMS (Learning Management System) has had an effect on conversions or retention, we need to ensure our data model captures the right information.
The final shape of the model might look like:
LMS_ACTIVITY:
- tenant_id
- tenant_name
- current_csm
- tenant_status
- trial_start_date
- subscription_start_date
- subscription_expiry_date
- health_score_trend
LMS_USER:
- user_id
- enrollment_id
ENROLLMENTS:
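As a rough illustration, the listed entities could be expressed as typed records. This is only a sketch: the field names come from the model above, but the Python types, the class names, and the `converted` helper (including its definition of "conversion") are assumptions made for the example. ENROLLMENTS is left out because its fields are not listed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LmsActivity:
    """One record per tenant, mirroring the LMS_ACTIVITY fields (types assumed)."""
    tenant_id: str
    tenant_name: str
    current_csm: str
    tenant_status: str
    trial_start_date: Optional[date]
    subscription_start_date: Optional[date]
    subscription_expiry_date: Optional[date]
    health_score_trend: str


@dataclass
class LmsUser:
    """Mirrors the LMS_USER fields (types assumed)."""
    user_id: str
    enrollment_id: str


def converted(activity: LmsActivity) -> bool:
    """Hypothetical rule: a tenant converted if a subscription began on or after its trial."""
    return (
        activity.trial_start_date is not None
        and activity.subscription_start_date is not None
        and activity.subscription_start_date >= activity.trial_start_date
    )
```

With the dates captured in the model, a conversion question becomes a simple predicate over `LmsActivity` records rather than a search through raw API payloads.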
The ELT (Extract, Load, Transform) approach suggested loading the entire Intellum dataset via the API into a data lake and processing it after the fact.
- Type 2 changes (tracking historical values in slowly changing dimensions) become a reality

A data-first approach instead starts from the questions and works forward:
- Define the business questions we need to answer
- Identify the exact data points required to answer those questions
- Design our data collection, storage, and processing to optimize for those insights
- Ensure that data governance and quality are built into the system from the beginning
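
The steps above can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the `BusinessQuestion` type, the two example questions, the choice of fields per question, and the `project` helper are assumptions, not part of any existing pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessQuestion:
    """A question the business wants answered, plus the fields it needs."""
    text: str
    required_fields: frozenset


# Steps 1 and 2: state the questions and the data points each one requires.
QUESTIONS = [
    BusinessQuestion(
        "Does LMS usage affect trial conversion?",
        frozenset({"tenant_id", "trial_start_date", "subscription_start_date"}),
    ),
    BusinessQuestion(
        "Does LMS usage affect retention?",
        frozenset({"tenant_id", "subscription_expiry_date", "health_score_trend"}),
    ),
]


def required_fields(questions):
    """Union of every field needed by at least one question."""
    fields = set()
    for question in questions:
        fields |= question.required_fields
    return fields


def project(record, fields):
    """Step 3: keep only the required fields from a raw record.

    Step 4 (a very basic quality gate): reject records missing a required field.
    """
    missing = fields - set(record)
    if missing:
        raise ValueError(f"record is missing required fields: {sorted(missing)}")
    return {name: record[name] for name in sorted(fields)}
```

A raw API record passed through `project(record, required_fields(QUESTIONS))` comes out with only the fields some question actually needs, which is the sense in which the questions drive collection and storage.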
This approach minimizes redundant data collection, reduces processing overhead, and creates a clearer connection between business needs and data assets. It also makes it easier to adapt as requirements evolve, since the focus remains on the business value rather than technical implementations.
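
For readers unfamiliar with the Type 2 changes mentioned earlier, the following is a minimal sketch of how a Type 2 slowly changing dimension keeps history: instead of overwriting a value, the current row is closed and a new row is appended. The `TenantVersion` type, the `valid_from`/`valid_to` columns, and the `apply_change` helper are assumptions for illustration; `current_csm` is used as the tracked attribute only because it appears in the model above.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TenantVersion:
    """One historical version of a tenant's tracked attribute."""
    tenant_id: str
    current_csm: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the currently active version


def apply_change(history: List[TenantVersion], tenant_id: str,
                 new_csm: str, changed_on: date) -> List[TenantVersion]:
    """Return a new history with the change recorded as a Type 2 update."""
    updated = []
    for row in history:
        if row.tenant_id == tenant_id and row.valid_to is None:
            if row.current_csm == new_csm:
                return list(history)  # value unchanged, nothing to record
            # Close the current version rather than overwriting it.
            updated.append(replace(row, valid_to=changed_on))
        else:
            updated.append(row)
    # Append the new current version.
    updated.append(TenantVersion(tenant_id, new_csm, changed_on))
    return updated
```

Each change adds a row, so questions such as "who was the CSM when the subscription started?" stay answerable, at the cost of extra storage and processing.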