Thousands of duplicate images are clogging the digital archives of Singapore's public agencies and cultural institutions, and the people responsible for fixing the problem are running out of easy options. The National Library Board, which oversees holdings across the Lee Kong Chian Reference Library on Victoria Street and multiple regional branches, confirmed last year that it was conducting a structured audit of its digital asset management systems. The duplication problem, in which images are filed under multiple catalogue entries, sometimes with conflicting metadata, has compounded over roughly a decade of digitisation drives.
The problem matters now because Singapore is at a decision point. The country has invested heavily in positioning itself as a regional AI and data hub, and messy underlying data undermines that ambition at its roots. Institutions that want to feed visual archives into machine-learning pipelines, train heritage-recognition models, or offer public search tools cannot do so reliably when the same photograph might appear under four different filenames with four different accession dates.
What the Duplication Actually Costs
Storage is the obvious line item. Cloud storage for large image repositories in Singapore runs, at current commercial rates, between S$0.023 and S$0.05 per gigabyte per month, depending on the provider and tier. For a mid-sized institution holding several hundred terabytes of visual material (not unusual for a body like the Asian Civilisations Museum on Empress Place), even a 15 percent duplication rate translates into a meaningful recurring expense. The more significant cost, though, is human: staff time spent resolving conflicting records, fielding queries from researchers who find multiple versions of the same image, and manually tagging assets that automated systems cannot process cleanly because of inconsistent metadata.
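To put the storage line item in rough numbers, the sketch below applies the 15 percent duplication rate and the S$0.023 to S$0.05 per-gigabyte range quoted above. It assumes a hypothetical 300-terabyte holding (the only figure given is "several hundred terabytes") and decimal units (1 TB = 1,000 GB). The function name `duplicate_storage_cost` is illustrative, and the output is an estimate, not any institution's actual bill.

```python
# Rough estimate of what duplicate images add to a monthly cloud storage bill.
# Assumptions (illustrative only): a hypothetical 300 TB holding, decimal units
# (1 TB = 1,000 GB), and the per-GB price range quoted in the article.


def duplicate_storage_cost(total_tb: float, dup_rate: float, price_per_gb: float) -> float:
    """Monthly cost in S$ of storing the duplicated share of an archive."""
    duplicate_gb = total_tb * 1_000 * dup_rate  # terabytes -> gigabytes, duplicated share only
    return duplicate_gb * price_per_gb


if __name__ == "__main__":
    TOTAL_TB = 300   # hypothetical archive size
    DUP_RATE = 0.15  # duplication rate from the article
    for price in (0.023, 0.05):  # low and high ends of the quoted range
        monthly = duplicate_storage_cost(TOTAL_TB, DUP_RATE, price)
        print(f"S${price}/GB: S${monthly:,.0f}/month, S${monthly * 12:,.0f}/year")
```

Under those assumptions, about 45,000 GB of redundant files cost roughly S$1,035 to S$2,250 a month, or about S$12,400 to S$27,000 a year. That is modest next to the staff time described above, which is why the storage bill is the obvious line item but not the largest one.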
The Infocomm Media Development Authority has been pressing agencies toward the Whole-of-Government data standards framework, which sets baseline requirements for metadata consistency across public sector digital holdings. But compliance has been uneven. Smaller statutory boards and town councils, which maintain their own photographic records of estates from Tampines to Buona Vista, were not always brought into harmonisation exercises early enough, and the gap shows.
The Decisions That Cannot Be Delayed
Three choices are now pressing. The first is tool selection. Institutions must decide whether to deploy AI-assisted deduplication software (several vendors have pitched solutions to agencies along Fusionopolis Way in one-north) or rely on manual curation workflows. AI tools are faster but need clean training data to work well, which is precisely what is in short supply. Manual workflows are accurate but slow; one heritage-sector estimate, cited in a 2025 Government Technology Agency discussion paper, put the cost of manually resolving a single duplicate record cluster at between S$12 and S$40, depending on complexity.
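The per-cluster range scales linearly, so a manual-curation budget is simple multiplication. The sketch below uses the S$12 and S$40 bounds cited above; the backlog of 5,000 clusters and the function name `manual_resolution_budget` are hypothetical, chosen only to illustrate the calculation.

```python
# Back-of-envelope budget for manually resolving duplicate record clusters,
# using the S$12 to S$40 per-cluster range cited in the article.
# The cluster count used below is hypothetical.

LOW_COST_PER_CLUSTER = 12   # S$, simple clusters
HIGH_COST_PER_CLUSTER = 40  # S$, complex clusters


def manual_resolution_budget(clusters: int) -> tuple[int, int]:
    """Return the (low, high) S$ cost range for resolving `clusters` by hand."""
    if clusters < 0:
        raise ValueError("cluster count cannot be negative")
    return clusters * LOW_COST_PER_CLUSTER, clusters * HIGH_COST_PER_CLUSTER


if __name__ == "__main__":
    low, high = manual_resolution_budget(5_000)  # hypothetical backlog
    print(f"5,000 clusters: S${low:,} to S${high:,}")
```

At those rates, a hypothetical backlog of 5,000 clusters would cost S$60,000 to S$200,000 to clear by hand. A figure of that kind is what an institution would set against a vendor's licence fee and the cost of preparing clean training data for an AI tool.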
The second decision is governance: who owns the canonical version of a disputed image? When the National Archives of Singapore and a separate agency both hold copies of the same photograph from, say, the 1970s Orchard Road redevelopment, with slightly different crop and colour correction, neither version is obviously wrong. A cross-agency arbitration protocol does not yet exist in published form.
The third, and arguably most consequential, is public access policy. Some institutions have floated the idea of exposing deduplicated archives through an open API, similar to the open-access programme and API that the Smithsonian Institution in Washington, D.C., launched in 2020. That would let researchers, educators, and app developers query verified, single-instance records directly. The upside is transparency and utility. The downside is that any errors in the deduplication process become publicly visible immediately.
The timeline is not abstract. Several agencies have digitisation grant cycles ending in the first quarter of 2027, meaning procurement decisions for new archive infrastructure need to be made before the end of this calendar year. Institutions that delay selecting a deduplication approach risk inheriting an even larger backlog as new material continues to be ingested. The next six months, from now through December 2026, are when the foundational architecture gets locked in, and changing course after that point will cost considerably more than getting it right the first time.