Data Practices and Documentation

This project draws on multiple forms of data, including archival records from the Leeds General Infirmary Nurse Training Register (1856–1888), structured datasets organised in Google Sheets, and publicly available Reddit discussions. Because these sources differ in format, completeness, and reliability, each required its own approach to data preparation.

The archival register contained handwritten and partially legible entries, which were digitised and organised into structured datasets for analysis. This involved transcribing records, standardising categories, and managing incomplete or ambiguous entries. In many cases, missing or unclear data points were retained rather than removed, because they are themselves evidence of the limits of the historical record.
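The handling of incomplete entries described above can be sketched in code. This is a minimal illustration rather than the project's actual workflow: the category names, the mapping table, and the convention that a trailing "?" or an "[illegible]" marker signals a doubtful reading are all assumptions made for the example. The point it shows is that each entry keeps its raw transcription and gains a status flag, so gaps stay visible instead of being dropped.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from transcribed spellings to one standard category.
CATEGORY_MAP = {
    "probationer": "Probationer",
    "prob.": "Probationer",
    "staff nurse": "Staff Nurse",
}


@dataclass
class CleanedEntry:
    raw: str               # transcription as read from the register
    value: Optional[str]   # standardised category, or None if unresolved
    status: str            # "ok", "missing", or "ambiguous"


def standardise(raw: Optional[str]) -> CleanedEntry:
    """Standardise one transcribed field without discarding gaps."""
    text = (raw or "").strip()
    if not text:
        # Empty cell: keep the row and record that the value is missing.
        return CleanedEntry(raw="", value=None, status="missing")
    # Assumed transcription convention for doubtful readings.
    if text.endswith("?") or "[illegible]" in text.lower():
        return CleanedEntry(raw=text, value=None, status="ambiguous")
    standard = CATEGORY_MAP.get(text.lower())
    if standard is None:
        # Unrecognised spelling: flag it for review rather than guessing.
        return CleanedEntry(raw=text, value=None, status="ambiguous")
    return CleanedEntry(raw=text, value=standard, status="ok")
```

Keeping the status as its own column means the proportion of missing or ambiguous entries can later be counted and reported alongside the resolved ones.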

The Reddit dataset was collected manually from publicly available discussion threads and then cleaned to remove personally identifiable information. This process required balancing the usability of the data against ethical considerations, particularly privacy and anonymisation.
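One way such a cleaning step could look is sketched below. It is an assumption-laden example, not a record of the procedure used here: the regular expressions, the placeholder tokens, and the helper names `redact` and `pseudonym` are all hypothetical. It replaces user mentions, email addresses, and links in post text, and swaps author names for salted-hash pseudonyms so that posts by the same author can still be grouped.

```python
import hashlib
import re

# Illustrative patterns only; real data may need broader coverage.
USER_MENTION = re.compile(r"(?<![\w/])/?u/[A-Za-z0-9_-]+")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL = re.compile(r"https?://\S+")


def pseudonym(author: str, salt: str) -> str:
    """Return a stable pseudonym for an author name.

    The salt is a project-specific secret (hypothetical here) that
    should be stored separately from the dataset.
    """
    digest = hashlib.sha256((salt + author).encode("utf-8")).hexdigest()
    return "user_" + digest[:8]


def redact(text: str) -> str:
    """Replace links, email addresses, and user mentions with placeholders."""
    text = URL.sub("[link]", text)      # links first, since they may contain "@" or "u/"
    text = EMAIL.sub("[email]", text)
    text = USER_MENTION.sub("[user]", text)
    return text
```

Pattern-based redaction of this kind is only a first pass. Names mentioned in free text are not caught by it, and verbatim quotations can still be traced through a search engine, so a manual review of the cleaned text remains necessary.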

Once structured, the datasets were organised in tabular formats to enable comparison across historical and contemporary sources. They were then used to generate visualisations and to support interactive storytelling outputs, so that fragmented and heterogeneous data could be interpreted coherently and accessibly.
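As a rough sketch of what a shared tabular layout might involve, the example below maps records from both sources onto the same four columns and counts categories per source. The column names, the `category` and `status` fields, and the helper functions are hypothetical and chosen only for illustration. The CSV output is the kind of file that can be imported into Google Sheets.

```python
import csv
import io
from collections import Counter

# Hypothetical shared column layout for both sources.
FIELDS = ["source", "period", "category", "status"]


def to_rows(records, source, period):
    """Map source-specific records onto the shared column layout."""
    return [
        {
            "source": source,
            "period": period,
            "category": rec.get("category") or "",  # keep gaps as empty cells
            "status": rec.get("status", "ok"),
        }
        for rec in records
    ]


def counts_by_source(rows):
    """Count non-empty categories for each source."""
    return Counter((r["source"], r["category"]) for r in rows if r["category"])


def to_csv(rows):
    """Serialise the combined rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

A long format of this kind, with one row per record and a column naming the source, keeps both datasets in a single table, so the same charting or filtering step can be applied to either.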

Throughout this process, data handling was treated not as a purely technical step but as part of the analytical workflow: decisions about inclusion, structure, and omission directly shaped how the material could be interpreted.