DttG

RKD STUDIES

2.1 The Limits of Traditional Data Repositories in Humanities Research


Art historical research, like much work in the humanities, relies on bringing together a broad and often heterogeneous collection of source material. Whether consulting archival documents on painting technique and material trade, auction catalogues, or early descriptions of works of art to trace provenance, the data we work with is typically diverse in both format and origin. In practice, such materials are often compiled informally: stored locally on researchers’ computers, in personal databases, or, most commonly, in Microsoft Excel or Google Sheets files that serve as data repositories. While these tools are familiar, flexible, and easy to adopt, they are rarely designed to accommodate the complexity or scale of multi-year research projects. As data accumulates in the form of additional fields and columns, such spreadsheets can become unwieldy and increasingly difficult to query or maintain.

This was precisely the situation faced by the Down to the Ground (DttG) project as it entered its third year. The project’s database had grown into a single ‘master’ spreadsheet used to record technical observations of coloured grounds in Netherlandish paintings from c. 1550 to 1650. Each row represented a painting, containing metadata such as artist, date, support, and current collection, alongside observations on the preparatory layers: the number of grounds, their colour, stratigraphy, and identified materials. By that stage, the spreadsheet contained over 30 columns and several hundred entries drawn from published literature, unpublished reports, and new research conducted within the project.

Though adequate as a way to store data, this format was ill-suited to actively supporting the (technical) art historical research that the project sought to enable. Furthermore, as the dataset grew, so too did the challenges of ensuring internal consistency. Basic but essential operations, such as identifying all paintings with a red ground by a given artist, were difficult to execute reliably, in part because the same information might be entered inconsistently across rows. Data redundancy (e.g. the same artist or pigment name repeated across multiple entries) made such inconsistencies inevitable. A minor spelling variation or typographic error could fragment research results and compromise data continuity and comparative analysis. Spreadsheet software lacks mechanisms for enforcing relationships between different categories of data. There is nothing to prevent a user from entering contradictory metadata, or from entering a pigment or artist name in a form that does not match other entries, since these fields exist independently with no built-in checks.
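The fragmentation described above can be illustrated with a short sketch. The rows, the artist names, and the helper functions below are invented for illustration and are not drawn from the project's spreadsheet; the sketch only assumes that each row stores the artist and ground colour as free text.

```python
# Invented rows standing in for a flat spreadsheet; not project data.
# The first three rows are meant to describe works by the same (fictional) artist.
rows = [
    {"artist": "Jan Voorbeeld", "ground_colour": "red"},
    {"artist": "Jan Voorbeeld ", "ground_colour": "Red"},  # trailing space, capitalised colour
    {"artist": "J. Voorbeeld", "ground_colour": "red"},    # abbreviated name
    {"artist": "Piet Proef", "ground_colour": "grey"},
]


def find_exact(rows, artist, colour):
    """Return rows whose artist and ground colour match the query exactly."""
    return [
        r for r in rows
        if r["artist"] == artist and r["ground_colour"] == colour
    ]


def normalise(value):
    """Trim whitespace and fold case; this catches only the simplest variations."""
    return value.strip().casefold()


def find_normalised(rows, artist, colour):
    """Return rows that match after trimming whitespace and ignoring case."""
    return [
        r for r in rows
        if normalise(r["artist"]) == normalise(artist)
        and normalise(r["ground_colour"]) == normalise(colour)
    ]


exact = find_exact(rows, "Jan Voorbeeld", "red")
loose = find_normalised(rows, "Jan Voorbeeld", "red")

# Of the three rows intended for the same artist, the exact query finds one
# and the normalised query finds two; the abbreviated name is still missed.
print(len(exact), len(loose))
```

Cleaning values at query time recovers some of the lost rows, but it cannot reconcile an abbreviated or misspelled name. That requires each artist to be recorded once and referred to by a stable identifier.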

To address these limitations, we decided to migrate the dataset to a structured relational database. This transformation, from a single static, locally stored spreadsheet to a normalized, interlinked data model, represented a further rethinking of how the data could be structured: not simply to store information, but to actively support and sustain research.
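As a minimal sketch of what such a normalized model might look like, the example below uses Python's built-in `sqlite3` module. The table and column names (`artist`, `painting`, `ground_layer`) and the sample records are hypothetical and are not the project's actual schema; they only show how a relational database can store each artist once and reject entries that refer to an artist who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: one table per category of data, linked by identifiers.
conn.executescript("""
CREATE TABLE artist (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- each artist is recorded exactly once
);
CREATE TABLE painting (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    artist_id INTEGER NOT NULL REFERENCES artist(id)
);
CREATE TABLE ground_layer (
    id          INTEGER PRIMARY KEY,
    painting_id INTEGER NOT NULL REFERENCES painting(id),
    position    INTEGER NOT NULL,      -- 1 = lowest layer in the stratigraphy
    colour      TEXT NOT NULL
);
""")

# Invented sample records, for illustration only.
conn.execute("INSERT INTO artist (id, name) VALUES (1, 'Jan Voorbeeld')")
conn.execute(
    "INSERT INTO painting (id, title, artist_id) VALUES (1, 'Untitled example', 1)"
)
conn.execute(
    "INSERT INTO ground_layer (painting_id, position, colour) VALUES (1, 1, 'red')"
)

# A painting that points at a non-existent artist is refused by the database.
try:
    conn.execute("INSERT INTO painting (title, artist_id) VALUES ('Orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The query that was unreliable in the spreadsheet becomes a join on identifiers.
query = """
SELECT p.title
FROM painting AS p
JOIN artist AS a ON a.id = p.artist_id
JOIN ground_layer AS g ON g.painting_id = p.id
WHERE a.name = ? AND g.colour = ?
"""
red_ground_titles = [row[0] for row in conn.execute(query, ("Jan Voorbeeld", "red"))]
print(rejected, red_ground_titles)
```

Because paintings refer to an artist by identifier rather than by retyped name, a spelling correction is made in one place and applies to every linked record. In this sketch the colour is still free text; a fuller model would presumably move colours and materials into controlled lookup tables as well.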