Singapore's push to digitise everything, from HDB flat records to National Library Board collections, created a problem nobody planned for: the same photograph, the same scanned document, the same infographic appearing dozens of times across different servers, departments and public-facing portals. Duplicate image replacement, once a technical footnote in IT procurement papers, is now a formal line item in agency budgets and a recognised discipline inside GovTech Singapore's engineering teams.
The issue matters right now because the government's drive toward integrated smart services — consolidating citizen touchpoints under the Singpass and LifeSG platforms — has forced agencies to merge legacy content repositories that were never designed to talk to each other. When databases merge, duplicate images do not cancel each other out. They multiply. A single photograph of Toa Payoh's iconic dragon playground, scanned separately by the National Heritage Board, the Urban Redevelopment Authority, and a town council communications team, can occupy three separate storage buckets under slightly different filenames, each treated by automated systems as a distinct asset.
How the Problem Accumulated
The roots go back to the early 2000s, when individual statutory boards rushed to build their own digital asset management systems without a common standard. The National Library Board at Victoria Street and the National Archives of Singapore at Canning Rise both digitised overlapping collections of historical images during that period. Because file-naming conventions differed (one system used date-first codes, the other used accession numbers), deduplication tools that matched on filenames flagged almost nothing. By the time a government-wide content audit was attempted around 2018 under the Smart Nation initiative, the backlog of redundant assets across major agencies ran into the hundreds of thousands of files, according to technology procurement documents reviewed at the time.
Commercial sectors faced the same headache. Singapore Press Holdings — before its restructuring in 2021 — maintained photo archives across multiple newsroom systems. Mediacorp, at its Caldecott campus, dealt with broadcast asset libraries where the same promotional still for a programme might exist in four different resolution variants, each logged separately. The duplication was not malicious. It was the natural consequence of decentralised workflows in organisations that grew fast and standardised slowly.
GovTech's Data and AI Practice, which began consolidating tooling across agencies from around 2022, identified duplicate image detection as one of the higher-priority data hygiene tasks sitting underneath more visible projects like the National AI Strategy 2.0, launched in December 2023. Perceptual hashing — a technique that compares image content rather than filenames or metadata — became the recommended detection method in internal technical guidance circulated to agencies in 2024.
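To make the idea of comparing image content concrete, here is a minimal sketch of one simple perceptual hash, the average hash. It is an illustration only: the source does not say which algorithm the guidance recommends, and the function names and synthetic test images below are assumptions. Images are represented as plain grids of grayscale values so the example runs without an imaging library.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def shrink(pixels: Gray, size: int = 8) -> Gray:
    """Downsample to size x size by averaging blocks of source pixels."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for r in range(size):
        top = r * height // size
        bottom = max((r + 1) * height // size, top + 1)
        row = []
        for c in range(size):
            left = c * width // size
            right = max((c + 1) * width // size, left + 1)
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def average_hash(pixels: Gray, size: int = 8) -> int:
    """One bit per cell: 1 if the cell is brighter than the mean, else 0."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic stand-ins for real scans (hypothetical data).
def horizontal_gradient(n: int, offset: int = 0) -> Gray:
    return [[min(255, x * 255 // (n - 1) + offset) for x in range(n)]
            for _ in range(n)]


def vertical_gradient(n: int) -> Gray:
    return [[y * 255 // (n - 1) for _ in range(n)] for y in range(n)]


original = average_hash(horizontal_gradient(32))
rescan = average_hash(horizontal_gradient(16, offset=10))  # smaller, brighter copy
unrelated = average_hash(vertical_gradient(32))

print(hamming(original, rescan), hamming(original, unrelated))
```

The two scans of the same content differ in size and brightness yet land on nearby hashes, while the unrelated image sits far away. In practice the distance threshold for calling two images duplicates has to be tuned per collection.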
What Replacement Actually Involves
Replacing a duplicate image is not as simple as deleting extras. Each copy may carry different metadata or licensing annotations, or may be embedded in a live webpage or PDF that will break if the asset is removed without a redirect in place. The Housing & Development Board, which manages image libraries for thousands of BTO project pages on its website, has had to build semi-automated workflows that verify downstream dependencies before any replacement action is committed. A similar process is underway at the Singapore Tourism Board, whose digital asset library feeds imagery to regional marketing partners across Southeast Asia.
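A dependency check of this kind might look roughly like the sketch below. It is not HDB's workflow, which the source does not describe in detail. The `Asset` and `ReplacementPlan` types, the `references` index and every file path are hypothetical. The plan redirects a duplicate only when its licence matches the canonical copy, and it lists the pages that embed it so they can be rechecked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Asset:
    path: str
    licence: str


@dataclass
class ReplacementPlan:
    redirects: Dict[str, str] = field(default_factory=dict)   # duplicate -> canonical
    pages_to_recheck: Set[str] = field(default_factory=set)   # pages embedding a duplicate
    needs_review: List[str] = field(default_factory=list)     # licence mismatch, decide manually


def plan_replacement(canonical: Asset,
                     duplicates: List[Asset],
                     references: Dict[str, Set[str]]) -> ReplacementPlan:
    """Build a plan without deleting anything; committing is a separate step."""
    plan = ReplacementPlan()
    for dup in duplicates:
        if dup.licence != canonical.licence:
            # Differing annotations: do not merge automatically.
            plan.needs_review.append(dup.path)
            continue
        plan.redirects[dup.path] = canonical.path
        plan.pages_to_recheck |= references.get(dup.path, set())
    return plan


# Hypothetical example data.
canonical = Asset("assets/site-plan.jpg", "agency-owned")
duplicates = [
    Asset("legacy/siteplan_v2.jpg", "agency-owned"),
    Asset("partners/site-plan-copy.jpg", "third-party"),
]
references = {
    "legacy/siteplan_v2.jpg": {"/bto/project-a", "/bto/project-a/brochure.pdf"},
    "partners/site-plan-copy.jpg": {"/press/2019-launch"},
}

plan = plan_replacement(canonical, duplicates, references)
print(plan)
```

Keeping planning separate from committing means a reviewer can inspect the redirects and the affected pages before any file is removed.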
The practical cost of doing this badly is measurable. A broken image on a BTO flat listing page, for example, can generate a spike in call centre enquiries — a cost that HDB's service quality teams track per incident. Industry estimates for comparable government digital estate cleanups in cities like London and Tokyo have put per-agency remediation costs in the range of hundreds of thousands of dollars when done retroactively, versus a fraction of that when deduplication is built into ingest workflows from the start.
The lesson being applied across Singapore's agencies now is straightforward: embed duplicate detection at the point of upload, not years later. GovTech has made this part of its Digital Services Standards, updated in 2025, which all new government web projects are required to follow. For organisations still sitting on legacy libraries — and there are many — the practical next step is a controlled audit using perceptual hashing tools, prioritising assets that feed public-facing pages before moving to internal repositories. The unglamorous work of cleaning up digital clutter, it turns out, is what keeps the smarter systems running cleanly underneath.
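Detection at the point of upload can be sketched as an index that is consulted before a new asset is stored. This is an assumption-laden illustration, not the mechanism the Digital Services Standards prescribe: the `IngestIndex` class, the distance threshold of 5 bits, the asset names and the 64-bit hash values are all hypothetical, and the hashes are assumed to come from whatever perceptual hashing tool the organisation already uses.

```python
from typing import Dict, Optional


class IngestIndex:
    """Keeps perceptual hashes of stored assets and flags near-matches on upload."""

    def __init__(self, max_distance: int = 5) -> None:
        self.max_distance = max_distance
        self._hashes: Dict[str, int] = {}

    def find_match(self, phash: int) -> Optional[str]:
        # Linear scan: fine for a sketch, too slow for a very large library.
        for asset_id, known in self._hashes.items():
            if bin(known ^ phash).count("1") <= self.max_distance:
                return asset_id
        return None

    def ingest(self, asset_id: str, phash: int) -> str:
        """Return the id to reference: an existing near-match, or the new asset."""
        match = self.find_match(phash)
        if match is not None:
            return match  # reuse the stored copy instead of adding a duplicate
        self._hashes[asset_id] = phash
        return asset_id


# Hypothetical uploads with made-up 64-bit hashes.
index = IngestIndex()
first = index.ingest("nhb/dragon-playground.jpg", 0x0F0F0F0F0F0F0F0F)
second = index.ingest("ura/img_0042.jpg", 0x0F0F0F0F0F0F0F0E)   # differs by one bit
third = index.ingest("stb/skyline.jpg", 0x00000000FFFFFFFF)     # unrelated image

print(first, second, third)
```

The second upload resolves to the asset already stored, so no new copy enters the library. A larger collection would likely replace the linear scan with an index built for Hamming-distance search, such as a BK-tree.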