Module 01: Introducing the Synthetic School Dataset

// module overview

This module is your orientation to the teaching dataset. You won't run any analysis here — instead, you'll learn to navigate. Understanding the shape and relationships of your data before you touch it is one of the most important habits you can build as an analyst.

We'll walk through all 11 tables, from the student roster to the social-emotional learning (SEL) surveys, and see how they connect to one another. By the end, you'll be able to look at any record in the dataset and know exactly where it came from, what it means, and what it doesn't tell you.

// key insight

Raw data doesn't come with a narrative; building one is the analyst's job. This module teaches you to read the map before you start the journey.

// what you'll learn

🏫

What Educators Will Learn

What a relational database actually is, and why school data is structured that way
How attendance, grades, behavior, and engagement data connect to a single student record
What 'raw data' means — and why it looks nothing like a report card
The difference between a record and a metric, and why that distinction matters for interventions
An introduction to FERPA: what data you can use, how to protect it, and what 'synthetic' means
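
To make the record-versus-metric distinction concrete, here is a small sketch using made-up attendance rows. The column names (`student_id`, `date`, `present`) and all values are assumptions for illustration, not the dataset's actual schema. Each row is a record (one student on one day); the attendance rate is a metric computed from many records.

```python
import pandas as pd

# Records: one row per student per day (made-up values, hypothetical columns).
attendance = pd.DataFrame({
    "student_id": [1, 1, 1, 1, 2, 2],
    "date": ["2024-09-03", "2024-09-04", "2024-09-05", "2024-09-06",
             "2024-09-03", "2024-09-04"],
    "present": [True, False, True, True, True, True],
})

# Metric: a summary derived from the records, here the share of days present.
attendance_rate = attendance.groupby("student_id")["present"].mean()
print(attendance_rate)
```

The metric tells you how often a student was present but not which days were missed. That detail lives only in the records, which is one reason the distinction matters when planning an intervention.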

🐍

Python Walkthrough

Loading all 11 CSV files into pandas DataFrames
Inspecting dtypes, nulls, and row counts with .info() and .describe()
Drawing the entity-relationship diagram in code with a simple schema map
Performing your first join: linking students to their attendance records
Writing a helper function to pull all records for a single student across every table
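
The walkthrough steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the module's actual code: the table names `students` and `attendance`, the shared key `student_id`, and the helpers `load_tables` and `records_for_student` are all hypothetical. Two tiny in-memory frames stand in for the 11 CSV files so the example runs on its own.

```python
from pathlib import Path

import pandas as pd


def load_tables(data_dir):
    """Load every CSV in data_dir into a dict of DataFrames keyed by file stem."""
    return {p.stem: pd.read_csv(p) for p in sorted(Path(data_dir).glob("*.csv"))}


# Stand-in data so the sketch runs without the real files (values are made up).
# With the real dataset you would call load_tables() on its folder instead.
tables = {
    "students": pd.DataFrame({"student_id": [1, 2, 3], "grade_level": [9, 10, 9]}),
    "attendance": pd.DataFrame({
        "student_id": [1, 1, 2],
        "date": ["2024-09-03", "2024-09-04", "2024-09-03"],
        "status": ["present", "absent", "present"],
    }),
}

# Inspect row counts, dtypes, and non-null counts for each table.
for name, df in tables.items():
    print(name, df.shape)
    df.info()

# Simple schema map: which tables carry the (assumed) student_id key.
schema_map = {name: "student_id" in df.columns for name, df in tables.items()}

# First join: attach roster columns to each attendance record.
# A left join from students keeps students who have no attendance rows;
# validate raises if student_id is not unique in the roster.
student_attendance = tables["students"].merge(
    tables["attendance"], on="student_id", how="left", validate="one_to_many"
)


def records_for_student(tables, student_id, key="student_id"):
    """Return {table_name: rows for this student} for every table that has the key."""
    return {
        name: df[df[key] == student_id]
        for name, df in tables.items()
        if key in df.columns
    }


one_student = records_for_student(tables, 1)
```

Keeping the tables in a dictionary lets the same loop or helper run over every table without naming each one, which is convenient when there are 11 of them.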