The growing availability of digitized transcript data holds great promise for understanding students’ pathways through a college curriculum, offering insight not just into the structure of academic curricula but also into how students navigate that structure through their course-taking decisions. However, there are no widely established modeling approaches for revealing those pathways and assessing how they differ among demographically distinct student groups. One challenge in using transcript data to study pathways is that the course-taking space is prohibitively large (over 4,000 classes at a large university), while the number of courses any given student actually takes is comparatively tiny (about 40). Additionally, raw transcript data does not reveal which course-taking sequences are indicative of a particular academic trajectory.

We present a conceptually appealing, data-driven framework for translating transcript data into information on students’ pathways. Our framework delivers information about students’ movements both through the space of possible majors and within a particular program. This information is remarkably detailed, but its richness creates statistical challenges: the analyst must allow for temporal dynamics, heterogeneity, and the possibility that students from a given demographic background have distinct experiences in different majors. We therefore develop a multilevel statistical model that leverages the richness of these data, with each level tuned to nonparametrically extract a different kind of substantive information about trajectories, student demographics, and major types, as well as how these interrelate.

We apply the model to reveal the diverse pathways students take within majors, and show how this analysis yields novel insights into how experiences differ by gender, ethnic group, and economic background in STEM versus non-STEM fields.