Singapore's push to digitise everything from HDB flat applications to medical records at Tan Tock Seng Hospital has quietly produced a sprawling secondary problem: databases riddled with duplicate images, redundant files, and conflicting visual records that are costing agencies time, storage budget, and, in some cases, processing accuracy. The question of what happens next—who decides the standards, who funds the cleanup, and which technology gets the contract—is now pressing in ways it was not two years ago.
The urgency is not accidental. The Smart Nation and Digital Government Office has been driving agencies toward consolidated data platforms since at least 2022, and the pace picked up again this year as Singapore positioned itself as a regional AI hub. Duplicate image data is not a cosmetic nuisance. In computer vision and machine-learning pipelines, repeated training images skew model outputs, inflate storage costs, and compromise the integrity of public-facing systems—everything from the MyInfo digital identity service to visual inspection tools used at Jurong Port.
Where the Bottlenecks Are Building
The problem shows up most visibly in two areas: government document archives and the consumer-facing digital platforms operated by agencies under GovTech Singapore. Staff at service centres along Orchard Road and at the Central Community Development Council offices have flagged backlogs in document verification that, according to internal workflow reviews cited in trade discussions, can be traced partly to duplicate scans created when physical files were digitised across different departments without a unified naming or hashing protocol.
HDB's resale and rental portal is one specific pressure point. The portal processes hundreds of thousands of document uploads annually—floor plans, identity photographs, renovation permits—and without a robust deduplication layer, the same file often lives in multiple folders under different transaction IDs. Estimates within the govtech procurement community, though not officially published, put the proportion of redundant image files in legacy government repositories at somewhere between 15 and 30 percent, a range consistent with international benchmarks from comparable city-state digitisation programmes in Tallinn and Seoul.
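The simplest form of this problem, the byte-identical file stored under several transaction IDs, can be caught with ordinary content hashing. What follows is a minimal sketch in Python, not a description of any agency's actual system: the function name, the in-memory mapping of file names to contents, and the transaction-style paths in the usage note are all hypothetical, chosen only for illustration.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(named_blobs):
    """Group file names whose contents are byte-for-byte identical.

    `named_blobs` maps a file name (hypothetical, e.g. a path under a
    transaction ID) to the raw bytes of that file.
    Returns a list of groups; each group is a sorted list of names
    that share the same SHA-256 digest.
    """
    by_digest = defaultdict(list)
    for name in sorted(named_blobs):
        digest = hashlib.sha256(named_blobs[name]).hexdigest()
        by_digest[digest].append(name)
    # Only digests seen more than once indicate redundant copies.
    return [names for names in by_digest.values() if len(names) > 1]
```

Called on a mapping in which "TXN-001/floorplan.png" and "TXN-002/floorplan_copy.png" hold the same bytes, the function returns those two names as one group. A content hash of this kind only catches exact copies; a rescanned or recompressed version of the same page produces a different digest.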
Private sector platforms are in no better shape. PropNex Realty and ERA Singapore, two of the largest real estate agencies with significant listing volumes on platforms like 99.co and PropertyGuru, have each had to manage listing images that appear duplicated across multiple property IDs—a problem that degrades search results and frustrates both agents and buyers hunting for flats in Tampines or Queenstown.
The Decisions That Cannot Wait Much Longer
Three choices are coming to a head. First, the standard: GovTech needs to decide whether to mandate a single perceptual hashing standard (algorithms such as pHash and dHash are already in wide use internationally) or leave agencies to procure their own deduplication solutions. Without a mandated standard, hashes computed by one agency cannot be meaningfully compared with those computed by another, and the interoperability problem only deepens as agencies exchange image files more frequently under the Whole-of-Government data sharing framework launched in 2024.
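For readers unfamiliar with how a perceptual hash differs from a file checksum, the idea behind dHash can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a reference implementation: it takes an image as a grid of grayscale values (a list of rows), and it shrinks the image with nearest-neighbour sampling where production libraries typically use a smoothed resize. The function names are illustrative.

```python
def resize_nearest(pixels, width, height):
    """Shrink a grayscale grid (list of rows) by nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per horizontally adjacent pixel pair.

    The image is reduced to (hash_size + 1) x hash_size pixels, and each
    bit records whether a pixel is brighter than its right-hand neighbour.
    With hash_size=8 the result is a 64-bit integer.
    """
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")
```

Two scans of the same page that differ only in brightness or resolution tend to produce hashes a few bits apart, while unrelated images differ in many bits. The threshold at which two images count as duplicates is a policy choice, which is part of what a mandated standard would have to fix.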
Second, the budget model. Deduplication at scale is not free. Running a retroactive cleanup across, say, the National Archives of Singapore's digitised collection, which holds records going back to the colonial era, including Raffles Institution documents, requires compute time and staff hours. Whether that cost sits with individual agencies or gets pooled under a central GovTech budget line is a procurement question with real political weight, particularly as the government manages cost-of-living pressures and scrutinises discretionary IT spending heading into the next budget cycle in February 2027.
Third, the private sector boundary. The government must clarify how much of the deduplication burden falls on commercial platforms that interface with public databases. PropTech firms and healthcare imaging companies operating out of one-north's Biopolis cluster have been waiting for clearer guidance on data standards before committing to infrastructure upgrades.
Practically speaking, organisations sitting on large image repositories should begin auditing now rather than waiting for a government mandate. Running an open-source perceptual hashing tool across existing archives costs relatively little and produces an immediate picture of the scale of the problem. Agencies that can demonstrate they have already reduced redundancy will be better positioned when GovTech sets compliance timelines—a step most observers expect before the end of 2026.
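As a rough illustration of what such an audit produces, the Python sketch below takes perceptual hashes that have already been computed (as integers) and estimates the share of files that are near-duplicates of something else in the archive. It is a sketch under assumptions rather than a recommended method: the function name, the greedy grouping strategy, and the default threshold of 5 differing bits are all illustrative choices, and a real audit would need to tune the threshold against known duplicates.

```python
def audit_redundancy(hashes, max_distance=5):
    """Estimate near-duplicate redundancy from precomputed perceptual hashes.

    `hashes` maps a file name to its hash as an integer.
    Files are grouped greedily: each file joins the first earlier
    "representative" whose hash differs in at most `max_distance` bits,
    otherwise it becomes a new representative.
    """
    representatives = []  # (name, hash) of the first file in each group
    groups = {}           # representative name -> all names in the group
    for name, value in hashes.items():
        for rep_name, rep_value in representatives:
            if bin(value ^ rep_value).count("1") <= max_distance:
                groups[rep_name].append(name)
                break
        else:
            representatives.append((name, value))
            groups[name] = [name]

    total = len(hashes)
    redundant = total - len(groups)  # every file beyond one per group
    return {
        "total": total,
        "redundant": redundant,
        "share": redundant / total if total else 0.0,
        "duplicate_groups": [g for g in groups.values() if len(g) > 1],
    }
```

The "share" figure is the number an organisation would track over time: the fraction of files that could be removed while still keeping one copy of every distinct image. The pairwise comparison here is quadratic in the worst case, which is acceptable for a sample audit but would need an indexed search for a full national-scale archive.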