Module 01 // Foundation
Available

Introducing the Synthetic School Dataset

Before you can find the signal, you need to understand the structure.

// module overview
This module is your orientation to the teaching dataset. You won't run any analysis here; instead, you'll learn to navigate. Understanding the shape and relationships of your data before you analyze it is one of the most important habits you can build as an analyst.

We'll walk through all 11 tables, from the student roster to the social-emotional learning (SEL) surveys, and see how they connect to each other. By the end, you'll be able to look at any record in the dataset and know where it came from, what it means, and what it doesn't tell you.
// key insight
Raw data doesn't come with a narrative; building one is the analyst's job. This module teaches you to read the map before you start the journey.
// what you'll learn
🏫
What Educators Will Learn
  • What a relational database actually is, and why school data is structured that way
  • How attendance, grades, behavior, and engagement data connect to a single student record
  • What 'raw data' means — and why it looks nothing like a report card
  • The difference between a record and a metric, and why that distinction matters for interventions
  • An introduction to FERPA: what data you can use, how to protect it, and what 'synthetic' means
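To make the record-versus-metric distinction concrete, here is a minimal sketch using a few made-up attendance rows. The column names (`student_id`, `status`) and the values are assumptions for illustration only, not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical attendance records: one row per student per school day.
# Column names and values are invented for this illustration.
records = pd.DataFrame({
    "student_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "status": ["present", "absent", "present", "present",
               "present", "present", "present", "present"],
})

# A metric is computed from records. Here: the share of days marked present,
# calculated separately for each student.
attendance_rate = (
    records["status"].eq("present").groupby(records["student_id"]).mean()
)

print(attendance_rate)
```

Each row of `records` is a single observed event, while `attendance_rate` is a summary derived from many rows. An intervention decision usually starts from the metric, but checking the underlying records shows which specific days drove it.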
🐍
Python Walkthrough
  • Loading all 11 CSV files into pandas DataFrames
  • Inspecting dtypes, non-null counts, and row counts with .info(), and summary statistics with .describe()
  • Drawing the entity-relationship diagram in code with a simple schema map
  • Performing your first join: linking students to their attendance records
  • Writing a helper function to pull all records for a single student across every table
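The walkthrough steps above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the table names (`students`, `attendance`), the `student_id` key, the sample rows, and the helpers `load_tables` and `records_for_student` are all hypothetical, and in-memory CSV text stands in for the real files so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two of the CSV files. Table names, column names,
# and rows are assumptions for illustration, not the real dataset's schema.
STUDENTS_CSV = """student_id,first_name,grade_level
1,Ana,6
2,Ben,7
3,Cy,6
"""

ATTENDANCE_CSV = """student_id,date,status
1,2024-09-03,present
1,2024-09-04,absent
2,2024-09-03,present
"""


def load_tables(sources):
    """Read each CSV source (a file path or file-like object) into a DataFrame."""
    return {name: pd.read_csv(src) for name, src in sources.items()}


tables = load_tables({
    "students": io.StringIO(STUDENTS_CSV),
    "attendance": io.StringIO(ATTENDANCE_CSV),
})

# Quick shape check: row count and total null count for each table.
for name, df in tables.items():
    print(name, "rows:", len(df), "nulls:", int(df.isna().sum().sum()))

# A simple schema map: the column that links each table back to students.
schema_map = {"attendance": "student_id"}

# First join: a left join keeps students who have no attendance rows.
student_attendance = tables["students"].merge(
    tables["attendance"], on=schema_map["attendance"], how="left"
)


def records_for_student(tables, student_id, key="student_id"):
    """Return the rows for one student from every table that has the key column."""
    return {
        name: df[df[key] == student_id]
        for name, df in tables.items()
        if key in df.columns
    }


one_student = records_for_student(tables, 1)
```

In this sketch, `one_student["attendance"]` holds the two attendance rows for student 1. The left join is a deliberate choice: student 3 has no attendance rows and still appears in `student_attendance` with missing values, which makes gaps in the data visible instead of silently dropping the student.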