The Daily Singapore

Singapore's Duplicate Image Problem: The Key Decisions That Will Shape What Comes Next

As agencies grapple with a backlog of misfiled and duplicated visual records, the choices made in the next six months could define how the city-state manages its digital heritage for decades.

By Singapore News Desk · Published 5 July 2026 at 3:25 am

Updated 5 July 2026 at 12:02 pm

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence.

Photo: Robert Stokoe on Pexels

Thousands of duplicate images are clogging the digital archives of Singapore's public agencies and cultural institutions, and the people responsible for fixing the problem are running out of easy options. The National Library Board, which oversees holdings across the Lee Kong Chian Reference Library at Victoria Street and multiple regional branches, confirmed last year that it was conducting a structured audit of its digital asset management systems. The duplication issue — images filed under multiple catalogue entries, sometimes with conflicting metadata — has compounded over roughly a decade of digitisation drives.

The problem matters now because Singapore is at a decision point. The country has spent heavily positioning itself as a regional AI and data hub, and messy underlying data undermines that ambition at its roots. Institutions that want to feed visual archives into machine-learning pipelines, train heritage-recognition models, or offer public search tools cannot do so reliably when the same photograph might appear under four different filenames with four different accession dates.
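For readers who want a concrete picture of the simplest case, the one where the same file sits under several names, here is a minimal Python sketch. It is not any agency's actual tooling: the function name, the sample filenames and the placeholder byte strings are all hypothetical, and it assumes the copies are byte-for-byte identical.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group filenames whose contents are byte-for-byte identical.

    `files` maps a filename to its raw bytes (hypothetical in-memory input).
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        # Identical bytes always produce the same SHA-256 digest.
        digest = hashlib.sha256(data).hexdigest()
        by_digest[digest].append(name)
    # Keep only the digests shared by more than one filename.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical sample: three names for one file, plus one unrelated file.
sample = {
    "orchard_road_1974.jpg": b"image-bytes-A",
    "IMG_0042.jpg": b"image-bytes-A",
    "scan_final_v2.jpg": b"image-bytes-A",
    "empress_place.jpg": b"image-bytes-B",
}
duplicate_groups = group_exact_duplicates(sample)
print(duplicate_groups)
```

A check like this only catches exact copies. Two scans of the same photograph with different crops or colour correction would produce different digests and would not be grouped, and conflicting metadata would still need to be reconciled separately.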

What the Duplication Actually Costs

Storage is the obvious line item. Cloud storage for large image repositories in Singapore runs, at current commercial rates, between S$0.023 and S$0.05 per gigabyte per month depending on the provider and tier. For a mid-sized institution holding several hundred terabytes of visual material — not unusual for a body like the Asian Civilisations Museum on Empress Place — even a 15 percent duplication rate translates into a meaningful recurring expense. The more significant cost, though, is human: staff time spent resolving conflicting records, fielding queries from researchers who find multiple versions of the same image, and manually tagging assets that automated systems cannot process cleanly because of inconsistent metadata.
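The storage arithmetic can be sketched in a few lines of Python. The per-gigabyte rates and the 15 percent duplication rate come from the figures above; the 300-terabyte holding is an assumption chosen only to stand in for "several hundred terabytes", and the function name is hypothetical.

```python
def monthly_duplicate_cost(total_tb, duplicate_share, rate_per_gb):
    """Monthly storage cost, in S$, attributable to duplicated files."""
    total_gb = total_tb * 1000  # decimal terabytes to gigabytes
    return total_gb * duplicate_share * rate_per_gb


HOLDINGS_TB = 300        # assumption: illustrative size, not a reported figure
DUPLICATE_SHARE = 0.15   # the 15 percent duplication rate cited above
RATE_LOW, RATE_HIGH = 0.023, 0.05  # S$ per GB per month, as cited above

low = monthly_duplicate_cost(HOLDINGS_TB, DUPLICATE_SHARE, RATE_LOW)
high = monthly_duplicate_cost(HOLDINGS_TB, DUPLICATE_SHARE, RATE_HIGH)

print(f"Monthly: S${low:,.0f} to S${high:,.0f}")
print(f"Yearly:  S${low * 12:,.0f} to S${high * 12:,.0f}")
```

Under those assumptions the duplicated 45 terabytes would cost roughly S$1,035 to S$2,250 a month, or about S$12,400 to S$27,000 a year. The result scales linearly, so a larger holding or a higher duplication rate raises the bill in proportion.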

The Infocomm Media Development Authority has been pressing agencies toward the Whole-of-Government data standards framework, which sets baseline requirements for metadata consistency across public sector digital holdings. But compliance has been uneven. Smaller statutory boards and town councils, which maintain their own photographic records of estates from Tampines to Buona Vista, were not always brought into harmonisation exercises early enough, and the gap shows.

The Decisions That Cannot Be Delayed

Three choices are now pressing. The first is tool selection. Institutions must decide whether to deploy AI-assisted deduplication software — several vendors have pitched solutions to agencies along Fusionopolis Way in one-north — or rely on manual curation workflows. AI tools are faster but require clean training data to work well, which is precisely what is in short supply. Manual workflows are accurate but slow; one heritage sector estimate, cited in a 2025 Government Technology Agency discussion paper, put the cost of manually resolving a single duplicate record cluster at between S$12 and S$40 depending on complexity.

The second decision is governance: who owns the canonical version of a disputed image? When the National Archives of Singapore and a separate agency both hold copies of the same photograph from, say, the 1970s Orchard Road redevelopment, with slightly different crop and colour correction, neither version is obviously wrong. A cross-agency arbitration protocol does not yet exist in published form.

The third, and arguably most consequential, is public access policy. Some institutions have floated the idea of exposing deduplicated archives through an open API, similar to what the Smithsonian Institution implemented in Washington, D.C. in 2020. That would let researchers, educators, and app developers query verified, single-instance records directly. The upside is transparency and utility. The downside is that any errors in the deduplication process become publicly visible immediately.

The timeline is not abstract. Several agencies have digitisation grant cycles ending in the first quarter of 2027, meaning procurement decisions for new archive infrastructure need to be made before the end of this calendar year. Institutions that delay selecting a deduplication approach risk inheriting an even larger backlog as new material continues to be ingested. The next six months, from now through December 2026, are when the foundational architecture gets locked in — and changing course after that point will cost considerably more than getting it right the first time.

About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
