Singapore's push to become a leading AI and smart-city hub has exposed a persistent, unglamorous problem: duplicate images embedded across government portals, public housing platforms and civic databases are slowing automated systems, distorting machine-learning training sets and wasting server storage that agencies pay for by the terabyte. The question now is not whether to fix it, but how — and who decides the standard.
The issue has grown more urgent in 2026 because Singapore's Smart Nation and Digital Government Office has accelerated the integration of AI tools across public services. When image datasets contain duplicates — the same photograph of a Tampines void deck appearing under three different property listings, or an identical infrastructure photo filed twice in the National Parks Board's green corridor archive — automated systems trained on those sets produce skewed outputs. Garbage in, garbage out, as data engineers say. The downstream costs are real: wasted compute cycles, inaccurate analytics and, in some cases, incorrect information surfaced to citizens.
Where the Problem Lives
Two platforms illustrate the challenge clearly. The resale portal run by the Housing and Development Board (HDB), which handles thousands of flat transactions a month across towns from Woodlands to Bedok, relies on user-uploaded images for listings. Sellers frequently upload the same photograph several times, either by accident or to push their listings higher in search results. HDB has not publicly said how many duplicate images its moderation systems flag each month, but the structural problem is common to any large crowdsourced image repository. GovTech, the agency responsible for the Singapore Government Tech Stack, acknowledged in its 2025 annual report that image asset management across whole-of-government systems is an area earmarked for improvement, though the report did not specify a remediation timeline.
Meanwhile, the digitisation drive at the National Archives of Singapore, which sits under the National Library Board and has been scanning physical records at its Canning Rise building and cataloguing them through the Archives Online portal, has produced its own duplication headache. Batch scanning sometimes creates near-identical files that differ only in compression artefacts. Because a conventional cryptographic hash changes completely when even a single byte differs, simple hash-based duplicate detection treats such files as unrelated. More sophisticated perceptual hashing or AI-assisted deduplication tools, which compare what an image looks like rather than its exact bytes, are needed, and procurement decisions for those tools are expected before the end of financial year 2026, which closes in March 2027.
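To illustrate why byte-level hashing falls short, here is a minimal, self-contained Python sketch. It compares a SHA-256 digest with an average hash (aHash), one of the simplest perceptual hashes, on two synthetic 32x32 grayscale "scans" that differ only by small pixel-level noise standing in for compression artefacts. The helper names, the synthetic checkerboard images and the 5-bit threshold are illustrative assumptions, not anything an agency has specified; a production tool would decode real image files and would more likely use a sturdier perceptual hash.

```python
import hashlib

def exact_hash(pixels):
    """SHA-256 over raw pixel bytes: changing any single value changes the digest."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()

def average_hash(pixels, size=8):
    """Average hash (aHash): shrink to size x size by block averaging, then emit
    one bit per cell, set when the cell is brighter than the mean of all cells.
    Assumes a grayscale image whose height and width are multiples of size."""
    bh, bw = len(pixels) // size, len(pixels[0]) // size
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [pixels[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for c in cells:
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def checkerboard(n=32, block=4, invert=False, noise=False):
    """Hypothetical stand-in for a decoded scan: a light/dark checkerboard.
    noise=True adds a deterministic -2..+2 perturbation to mimic re-compression."""
    img = []
    for y in range(n):
        row = []
        for x in range(n):
            light = ((x // block) + (y // block)) % 2 == 0
            if invert:
                light = not light
            v = 220 if light else 30
            if noise:
                v += (x * 7 + y * 3) % 5 - 2
            row.append(max(0, min(255, v)))
        img.append(row)
    return img

scan = checkerboard()
rescan = checkerboard(noise=True)   # same picture, slightly different bytes
other = checkerboard(invert=True)   # genuinely different picture

NEAR_DUP_BITS = 5  # hypothetical threshold; would need tuning on real data

print(exact_hash(scan) == exact_hash(rescan))                                # False
print(hamming(average_hash(scan), average_hash(rescan)))                     # 0
print(hamming(average_hash(scan), average_hash(rescan)) <= NEAR_DUP_BITS)    # True
print(hamming(average_hash(scan), average_hash(other)))                      # 64
```

The byte-level digest treats the re-scan as an unrelated file, while the perceptual hash places it at distance 0 from the original and a genuinely different image at the maximum distance of 64 bits.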
What Happens Next
Three decisions will determine how quickly and cleanly Singapore resolves this. First, GovTech must settle on a deduplication standard that works across agencies with very different image types, from satellite shots of Jurong Island's industrial zones to portrait photographs in MyInfo citizen profiles. The options include mandating perceptual hashing, vector-embedding comparison (in which a machine-learning model turns each image into a numeric vector and near-duplicates are pairs whose vectors lie close together), or a hybrid of the two. A working group is understood to be examining options, though no public consultation has been announced.
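One possible shape for the hybrid option is sketched below, assuming each image already carries a 64-bit perceptual hash and an embedding vector from some image model. Both are assumptions: the article does not say which hash, model or thresholds a standard would adopt, and the record fields, IDs and values here are hypothetical. The cheap hash comparison screens each pair, and only pairs that pass it pay for the embedding comparison.

```python
import math
from itertools import combinations

def hamming(a, b):
    """Number of differing bits between two perceptual hashes."""
    return bin(a ^ b).count("1")

def cosine(u, v):
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def find_duplicates(records, max_bits=10, min_cos=0.95):
    """Hybrid check with hypothetical thresholds: a pair is reported only if
    its hashes are close AND its embeddings are similar."""
    pairs = []
    for a, b in combinations(records, 2):
        if hamming(a["phash"], b["phash"]) > max_bits:
            continue  # visually far apart: skip the costlier embedding check
        if cosine(a["embedding"], b["embedding"]) >= min_cos:
            pairs.append((a["id"], b["id"]))
    return pairs

# Hypothetical records. Real embeddings would come from an image model and
# have hundreds of dimensions; three are used here for readability.
BASE = 0xF0F0F0F0F0F0F0F0
records = [
    {"id": "img-1", "phash": BASE,               "embedding": [0.90, 0.10, 0.40]},
    {"id": "img-2", "phash": BASE ^ 1,           "embedding": [0.88, 0.12, 0.41]},
    {"id": "img-3", "phash": BASE ^ 3,           "embedding": [0.10, 0.90, -0.30]},
    {"id": "img-4", "phash": 0x0F0F0F0F0F0F0F0F, "embedding": [0.20, 0.30, 0.90]},
]

print(find_duplicates(records))  # [('img-1', 'img-2')]
```

Here img-3 has a hash almost identical to img-1 but a dissimilar embedding, the kind of false match the second stage is meant to catch. Comparing every pair is quadratic, so a real deployment would index hashes or embeddings rather than loop over all combinations, and a standard could just as plausibly flag a pair when either signal fires rather than requiring both.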
Second, agencies must decide between retroactive cleaning of existing databases and prospective checks applied only to new uploads. Scrubbing existing databases is expensive and risks deleting images that are technically duplicates but contextually distinct, such as the same photograph legitimately attached to two different records. Agencies must also decide whether to quarantine suspected duplicates for human review or apply automated deletion with an appeal window. The latter is faster but carries reputational risk if valid images are lost.
Third, there is the question of citizen-facing transparency. Platforms such as the OneService app, run by the Municipal Services Office, allow residents to upload photos of municipal issues, from broken pavements in Ang Mo Kio to flooding near Buona Vista MRT station. Those uploads feed directly into case-management workflows. If deduplication rules are applied too aggressively, a genuine second report of the same pothole, filed by a different resident, could be silently merged or suppressed, leaving one complainant without acknowledgment.
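One way to avoid that outcome is to deduplicate the evidence without deduplicating the complainant: a near-duplicate photo is linked to the existing case, and the new reporter is added to that case rather than discarded. The sketch below assumes hypothetical `Case` records, toy 16-bit image hashes and a made-up 5-bit threshold; it is not a description of how OneService actually routes reports.

```python
from dataclasses import dataclass, field

def hamming(a, b):
    """Number of differing bits between two perceptual hashes."""
    return bin(a ^ b).count("1")

@dataclass
class Case:
    case_id: str
    image_hash: int
    reporters: list[str] = field(default_factory=list)

def file_report(cases, reporter_id, image_hash, max_bits=5):
    """Link a near-duplicate photo to an existing case instead of dropping it,
    so every distinct reporter stays on the case and can be acknowledged."""
    for case in cases:
        if hamming(case.image_hash, image_hash) <= max_bits:
            if reporter_id not in case.reporters:
                case.reporters.append(reporter_id)
            return case.case_id, "linked to existing case"
    case = Case(f"case-{len(cases) + 1}", image_hash, [reporter_id])
    cases.append(case)
    return case.case_id, "new case opened"

cases = []
print(file_report(cases, "resident-A", 0xAAAA))  # ('case-1', 'new case opened')
print(file_report(cases, "resident-B", 0xAAAB))  # ('case-1', 'linked to existing case')
print(file_report(cases, "resident-A", 0xAAAA))  # ('case-1', 'linked to existing case')
print(file_report(cases, "resident-C", 0x5555))  # ('case-2', 'new case opened')
print(cases[0].reporters)                        # ['resident-A', 'resident-B']
```

Resident B's near-identical photo does not open a second case, but B is still recorded on case-1 and receives an acknowledgment, while resident A's repeat upload adds nothing new.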
Agencies have until roughly the end of the third quarter of this year to present unified recommendations. For residents and businesses using government digital services, the practical takeaway is straightforward: keep originals of any images submitted to government platforms, note your submission reference numbers, and check portal confirmation screens carefully. The cleanup is coming. Whether it arrives smoothly depends on decisions being made right now in offices along Victoria Street and at Mapletree Business City.