Singapore's digital archiving effort has hit a familiar snag. Across government agencies, statutory boards, and public institutions, administrators are sitting on vast repositories of duplicated images — the same photograph filed under multiple entries, the same architectural render saved in conflicting formats, the same heritage scan catalogued twice by separate teams. The question of what to do next is no longer theoretical. It is a live operational problem with budget, legal, and preservation consequences.
The pressure has sharpened in 2026. The National Library Board's digital infrastructure, which underpins the National Archives of Singapore at Canning Rise, has been expanding its ingest capacity as part of a broader push to digitise records before the physical media degrades beyond recovery. But ingesting more material without first resolving the duplicate problem compounds it. Every redundant file consumes server space, distorts search results, and creates legal ambiguity about which version of a document is the authoritative record.
Where the Problem Is Concentrated
The issue is particularly acute in two areas. The first is the urban planning and heritage space, where agencies including the Urban Redevelopment Authority and the National Heritage Board have been running parallel digitisation programmes for years, sometimes photographing the same shophouses along Emerald Hill Road or the same temple facades in Kampong Glam without adequate cross-referencing. The second is the healthcare and public administration sector, where the transition to unified data platforms, part of the Government Technology Agency's Whole-of-Government approach, has exposed thousands of image files that exist in duplicate or triplicate across legacy systems.
GovTech's Smart Nation infrastructure team has been working since at least 2024 to implement deduplication protocols at the file-hash level, a technical process that compares the underlying digital fingerprint of each image rather than relying on file names or metadata tags, which are frequently inconsistent. The challenge is that automated hash-matching catches exact duplicates but misses near-duplicates — slightly cropped versions, colour-corrected scans, or images resaved at different resolutions, all of which may carry different archival value.
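To make the distinction concrete, here is a minimal sketch of file-hash deduplication. It assumes SHA-256 as the fingerprint (the hash GovTech actually uses is not stated), and the file names and byte contents are hypothetical stand-ins for real image files.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def group_exact_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for name, data in files.items():
        by_hash[fingerprint(data)].append(name)
    # Only fingerprints shared by two or more files indicate duplicates.
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


# Hypothetical in-memory "files"; a real run would read bytes from disk.
original = bytes(range(256)) * 4
files = {
    "ura/shophouse_001.tif": original,
    "nhb/emerald_hill_17.tif": original,                # same bytes, different name
    "nhb/emerald_hill_17_cropped.tif": original[:-1],   # one byte shorter
}

duplicate_groups = group_exact_duplicates(files)
print(duplicate_groups)
```

The two byte-identical files are grouped despite their different names, while the third, which differs by a single byte, gets an unrelated fingerprint and is not flagged. That is the near-duplicate gap in miniature: any crop, colour correction, or resave changes the bytes and therefore the hash.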
Singapore's public sector holds more than 200 terabytes of digitised image content across major repositories, according to figures cited in parliamentary discussions on digital governance in 2025. Even a conservative duplication rate of 15 percent would mean more than 30 terabytes of redundant data, and the true rate, based on comparable digitisation projects in cities like London and Tokyo, is often higher in the early phases of consolidation.
The Decisions That Cannot Wait
Three choices are now pressing. The first is governance: which single agency holds authority to declare an image redundant and authorise deletion? Currently, no single body has that mandate across all of government. The National Archives of Singapore has custodial authority over public records, but statutory boards with their own digitisation programmes operate with considerable autonomy. Without a centralised adjudicator, deduplication stalls at the point of interagency negotiation.
The second decision is technical standard-setting. Institutions need a common metadata schema so that images from the URA's conservation database can be matched against those in the National Heritage Board's iremember portal without manual intervention. Adopting an international standard such as the Dublin Core Metadata Initiative, adapted for local context, would help — but it requires commitment and resourcing across agencies that have historically built their own silos.
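What a shared schema buys can be sketched in a few lines. The example below is illustrative only: the agency field names, record values, and matching rule are all hypothetical, and only the target element names (title, date, coverage, identifier) come from the Dublin Core element set.

```python
from typing import Dict, Tuple

# Hypothetical agency-specific field names mapped to Dublin Core elements.
URA_TO_DC = {"building_name": "title", "photo_date": "date",
             "street": "coverage", "ref_no": "identifier"}
NHB_TO_DC = {"caption": "title", "taken_on": "date",
             "location": "coverage", "accession": "identifier"}


def to_dublin_core(record: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename an agency's fields to Dublin Core element names."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def match_key(dc: Dict[str, str]) -> Tuple[str, ...]:
    """Normalised title, date and coverage, used to flag candidate matches."""
    return tuple(dc.get(f, "").strip().lower() for f in ("title", "date", "coverage"))


# Hypothetical records describing the same photograph in two systems.
ura_record = {"building_name": "Shophouse 17", "photo_date": "2019-03-02",
              "street": "Emerald Hill Road", "ref_no": "URA-0001"}
nhb_record = {"caption": " shophouse 17 ", "taken_on": "2019-03-02",
              "location": "Emerald Hill Road", "accession": "NHB-9"}

ura_dc = to_dublin_core(ura_record, URA_TO_DC)
nhb_dc = to_dublin_core(nhb_record, NHB_TO_DC)
is_candidate = match_key(ura_dc) == match_key(nhb_dc)
print(is_candidate)
```

Once both records use the same element names, the comparison needs no knowledge of either agency's internal schema. A match on these fields only flags a candidate pair for further checking; it does not prove the two images are duplicates.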
The third is the hardest: what constitutes an acceptable loss? Some duplicates are genuinely redundant. Others differ in ways that matter — a photograph taken seconds apart that captures a different crowd angle at the 2015 SG50 celebrations at the Padang, for instance, may look identical to an algorithm but carry distinct documentary value to a historian. Human curatorial review at scale is expensive. Skipping it risks deleting material that cannot be recovered.
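One way to contain the review cost is to triage rather than delete: treat only pixel-identical pairs as exact, and route lookalikes to a curator. The sketch below uses a simple average hash as a stand-in similarity measure, which is an assumption made for illustration and not a method attributed to any agency. The tiny 4x4 grayscale grids and the threshold of 2 are likewise hypothetical; a real pipeline would first downscale actual images with an imaging library.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255


def average_hash(pixels: Grid) -> List[int]:
    """One bit per pixel: 1 if brighter than the image's mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a: List[int], b: List[int]) -> int:
    """Count the positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def triage(a: Grid, b: Grid, threshold: int = 2) -> str:
    """Classify a pair as 'exact', 'review' (near-duplicate) or 'distinct'."""
    if a == b:
        return "exact"  # identical pixel values
    if hamming(average_hash(a), average_hash(b)) <= threshold:
        return "review"  # looks alike: send to a curator, never auto-delete
    return "distinct"


base = [[(r * 4 + c) * 16 for c in range(4)] for r in range(4)]
brightened = [[min(255, p + 10) for p in row] for row in base]  # colour-corrected copy
inverted = [[240 - p for p in row] for row in base]  # stands in for a different picture

results = [triage(base, base), triage(base, brightened), triage(base, inverted)]
print(results)
```

The design choice that matters is the middle category. An algorithm can say two images look alike, but it cannot say whether the difference carries documentary value, so the sketch never deletes a "review" pair on its own.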
The National Library Board is expected to release updated digitisation guidelines later in 2026, which could set the framework for how public institutions approach this problem. In the meantime, agencies running active digitisation projects would be prudent to pause new ingest where duplication is suspected, document their metadata practices, and engage GovTech's data governance team before the backlog grows further. The cost of getting this right now is considerably lower than the cost of sorting it out after another decade of uncoordinated archiving.