Data First / Data Centric

Data First

What does it mean to be Data First?

This is the data counterpart to API-first design, in which APIs are designed first and treated as the primary interface to an application. Being data-first means that data considerations drive application development, integration decisions, and business processes.

When we adopt a data-first mindset, we prioritize how data will be structured, accessed, and utilized before building features or integrating systems. This approach ensures that the organization can answer critical business questions and derive meaningful insights without being limited by technical constraints.

For example, when trying to answer questions about whether an LMS (Learning Management System) has had an effect on conversions or retention, we need to ensure our data model captures the right information.

The final shape of the model might look like:

LMS_ACTIVITY:

  • tenant_id
  • tenant_name
  • current_csm
  • tenant_status
  • trial_start_date
  • subscription_start_date
  • subscription_expiry_date
  • health_score_trend

LMS_USER:

  • user_id
  • enrollment_id

ENROLLMENTS:

  • enrollment_id
  • course_id
  • user_id
  • enrollment_date
  • completion_status
  • completion_date
  • score
  • time_spent
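To make the model concrete, here is a minimal sketch of it as Python dataclasses, plus a first-pass query aimed at the business question. The field types, the example status values, and the `user_tenant` mapping are assumptions: the model above does not specify types or say how an LMS user is tied to a tenant. The class and function names are hypothetical, and the resulting comparison is descriptive only (correlation, not a measured effect).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LmsActivity:
    """One row per tenant (customer account)."""
    tenant_id: str
    tenant_name: str
    current_csm: str
    tenant_status: str  # assumed values, e.g. "trial", "active", "churned"
    trial_start_date: Optional[date]
    subscription_start_date: Optional[date]
    subscription_expiry_date: Optional[date]
    health_score_trend: Optional[float]  # assumed numeric; not specified in the model


@dataclass(frozen=True)
class LmsUser:
    user_id: str
    enrollment_id: str


@dataclass(frozen=True)
class Enrollment:
    enrollment_id: str
    course_id: str
    user_id: str
    enrollment_date: date
    completion_status: str  # assumed values, e.g. "completed", "in_progress"
    completion_date: Optional[date]
    score: Optional[float]
    time_spent: Optional[int]  # assumed unit: minutes


def completion_rate_by_status(
    tenants: list[LmsActivity],
    enrollments: list[Enrollment],
    user_tenant: dict[str, str],  # hypothetical user_id -> tenant_id mapping
) -> dict[str, float]:
    """Share of completed enrollments, grouped by the owning tenant's status."""
    status_of = {t.tenant_id: t.tenant_status for t in tenants}
    totals: dict[str, int] = defaultdict(int)
    completed: dict[str, int] = defaultdict(int)
    for e in enrollments:
        tenant_id = user_tenant.get(e.user_id)
        if tenant_id not in status_of:
            continue  # skip enrollments we cannot attribute to a known tenant
        status = status_of[tenant_id]
        totals[status] += 1
        if e.completion_status == "completed":
            completed[status] += 1
    return {s: completed[s] / totals[s] for s in totals}
```

Comparing completion rates between, say, active and churned tenants is only a starting point; answering the retention question properly would also need the date fields to order training before renewal or churn.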

The ELT (Extract, Load, Transform) approach suggested loading the entirety of the Intellum datasets via the API into a data lake environment and processing the data after the fact.

  • Type 2 changes (tracking historical changes in slowly changing dimensions) become feasible

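With full extracts landed on a schedule, each new load can be compared with the previous state and recorded as a Type 2 slowly changing dimension: the current row is closed with an end date and a new current row is opened. The sketch below is illustrative only; which attributes are tracked (`tenant_status` and `current_csm`), the `valid_from`/`valid_to` column names, and the `TenantVersion`/`apply_type2` names are all assumptions, and in a warehouse this logic is usually expressed in SQL rather than Python.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TenantVersion:
    """One historical version of a tenant's tracked attributes (Type 2 row)."""
    tenant_id: str
    tenant_status: str
    current_csm: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_type2(
    history: list[TenantVersion],
    tenant_id: str,
    tenant_status: str,
    current_csm: str,
    as_of: date,
) -> list[TenantVersion]:
    """Return a new history with one snapshot applied as a Type 2 change.

    If the tracked attributes changed, the current row is closed at `as_of`
    and a new current row is appended; an unchanged snapshot is a no-op.
    """
    current = next(
        (v for v in history if v.tenant_id == tenant_id and v.is_current), None
    )
    if current is not None and (current.tenant_status, current.current_csm) == (
        tenant_status,
        current_csm,
    ):
        return list(history)  # nothing changed; keep history as-is
    # Close the previous current row (if any) and open a new one.
    updated = [replace(v, valid_to=as_of) if v is current else v for v in history]
    updated.append(TenantVersion(tenant_id, tenant_status, current_csm, valid_from=as_of))
    return updated
```

Because old rows are closed rather than overwritten, a question such as "what was this tenant's status when its users completed onboarding?" can still be answered after the status changes.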
What if, instead, we were to design a contract (via the API or otherwise) and extract only what we needed, with the correct semantic meaning? Defining reports in the LMS interface does similar work, but a true data-first approach would:
  1. Define the business questions we need to answer
  2. Identify the exact data points required to answer those questions
  3. Design our data collection, storage, and processing to optimize for those insights
  4. Ensure that data governance and quality are built into the system from the beginning
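Steps 2 and 4 in the list above can be pictured as an explicit contract: the agreed fields and their types, applied at extraction time so that anything outside the contract is dropped and violations surface immediately. This is a minimal sketch; `FieldSpec`, `ENROLLMENTS_CONTRACT`, and `apply_contract` are hypothetical names, and the field types are assumptions rather than anything the Intellum API is known to guarantee.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type_: type
    required: bool = True


# Hypothetical contract for ENROLLMENTS: only the fields the business
# questions need, with the Python types we expect for each.
ENROLLMENTS_CONTRACT = [
    FieldSpec("enrollment_id", str),
    FieldSpec("course_id", str),
    FieldSpec("user_id", str),
    FieldSpec("enrollment_date", date),
    FieldSpec("completion_status", str),
    FieldSpec("completion_date", date, required=False),
    FieldSpec("score", float, required=False),
    FieldSpec("time_spent", int, required=False),
]


def apply_contract(
    record: dict[str, Any], contract: list[FieldSpec]
) -> tuple[dict[str, Any], list[str]]:
    """Project a raw record onto the contract and report violations.

    Fields not named in the contract are dropped, so nothing beyond the
    agreed shape flows downstream. Mistyped values are replaced with None.
    """
    projected: dict[str, Any] = {}
    errors: list[str] = []
    for spec in contract:
        value = record.get(spec.name)
        if value is None:
            if spec.required:
                errors.append(f"missing required field: {spec.name}")
        elif not isinstance(value, spec.type_):
            errors.append(
                f"{spec.name}: expected {spec.type_.__name__}, got {type(value).__name__}"
            )
            value = None  # do not pass a mistyped value downstream
        projected[spec.name] = value
    return projected, errors
```

Keeping the contract as data, rather than as checks scattered across transforms, means a change in the business questions becomes a reviewable change to one list.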

This approach minimizes redundant data collection, reduces processing overhead, and creates a clearer connection between business needs and data assets. It also makes it easier to adapt as requirements evolve, since the focus remains on business value rather than on technical implementation details.

I’ve been fortunate to work with and learn from data, product and growth leaders at companies such as InVision, CB Insights, Breather, Karbon, FreshBooks, Wealthsimple, among others. I love getting my hands dirty in helping build infrastructure to power insights and growth. I’ve helped build data teams from scratch, and I’m learning how to be a better manager every day.