Singapore's push to become a fully digital society generated something nobody planned for: millions of duplicate images clogging the servers of hospitals, town councils, and government portals alike. The problem did not appear overnight. It accumulated steadily across two decades of digitisation drives that prioritised speed of upload over quality of data management, and it is only now, as agencies consolidate systems under the Smart Nation 2.0 framework, that the true scale of the redundancy is becoming clear.
The timing matters. With the Government Technology Agency, known as GovTech, accelerating its whole-of-government data harmonisation push through 2025 and into this year, legacy image libraries are in many cases being audited for the first time: property photographs on the HDB portal, patient X-rays distributed across National University Hospital and Tan Tock Seng Hospital, scanned documents on Singpass-linked services. Duplicate image replacement, the systematic process of identifying redundant files and substituting a single canonical version, is now an operational priority rather than a background IT task.
How the Duplication Accumulated
The roots of the problem trace back to the early 2000s, when individual ministries and statutory boards digitised their own records independently. The Housing and Development Board scanned hundreds of thousands of flat inspection photographs. The Land Transport Authority built its own image libraries for road inspection reports. The National Heritage Board digitised museum collections at the National Museum of Singapore on Stamford Road. None of these systems spoke to each other, and none had a shared deduplication protocol.
When agencies later integrated onto common platforms, images migrated across but were rarely cleaned. The LifeSG app, for instance, launched in 2018 as Moments of Life and had expanded significantly by 2021. A single photograph of an HDB block in Tampines or a scanned identity document uploaded through multiple service touchpoints could exist in three or four database entries simultaneously. Cloud migration, which accelerated after GovTech moved significant government workloads to commercial cloud providers beginning around 2019, compounded the issue: storage was cheap enough that nobody felt urgent pressure to deduplicate.
The cost of that complacency is no longer trivial. Cloud storage costs, even as unit prices fall globally, still add up at government scale. Industry benchmarks suggest that unmanaged image repositories in large organisations can carry redundancy rates of 20 to 40 percent by file count, meaning that for every ten images stored, two to four are functionally identical or near-identical copies. Applied to a government estate of Singapore's size (GovTech alone oversees more than 600 digital government services as of 2025), that rate implies an excess storage footprint running into hundreds of terabytes.
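To make that arithmetic concrete, the short sketch below converts a redundancy rate by file count into an excess storage figure. The estate size (one billion images) and the average file size (2 MB) are hypothetical inputs chosen purely for illustration, not figures from GovTech, and the function name is likewise an assumption. The calculation also assumes that duplicate files are about as large as the average file, which a rate measured by file count does not guarantee.

```python
def excess_storage_tb(total_images: int, avg_size_mb: float, redundancy_pct: int) -> float:
    """Estimate storage held by redundant copies, in decimal terabytes."""
    # Redundancy is given by file count, so count the redundant files first.
    redundant_images = total_images * redundancy_pct // 100
    # Assumes duplicates are average-sized; 1 TB = 1,000,000 MB (decimal units).
    return redundant_images * avg_size_mb / 1_000_000


# Hypothetical estate: one billion images averaging 2 MB each (illustrative only).
low = excess_storage_tb(1_000_000_000, 2.0, 20)
high = excess_storage_tb(1_000_000_000, 2.0, 40)
print(f"Excess storage: {low:.0f} to {high:.0f} TB")
```

Under those assumed inputs, the 20 to 40 percent range works out to roughly 400 to 800 TB of redundant files, which is how an estate of that order would land in the hundreds of terabytes.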
The Push Toward Systematic Replacement
GovTech's Data and AI Services division has been piloting automated deduplication tools since at least the second half of 2024. The approach being tested relies on perceptual hashing algorithms, which identify visually similar images even when file names or metadata differ, a common problem when the same scanned document was uploaded by different staff at different times. Synapxe, the national HealthTech agency formerly known as Integrated Health Information Systems (IHiS), which manages IT infrastructure for Singapore's public healthcare clusters, has been running a parallel exercise focused on medical imaging records across polyclinics and restructured hospitals.
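For readers unfamiliar with perceptual hashing, the sketch below shows one common variant, the difference hash (dHash). Which algorithm the GovTech pilot actually uses is not stated, so this is an illustration of the general idea only. It works on plain grids of grayscale values so that it runs without an imaging library; the nearest-neighbour `shrink` helper, the function names and the threshold of 5 bits are all assumptions made for the example.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def shrink(img: Gray, width: int, height: int) -> Gray:
    """Downsample by nearest-neighbour sampling (a stand-in for real image resizing)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(img: Gray, size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    small = shrink(img, size + 1, size)  # size+1 columns give `size` comparisons per row
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits."""
    return hamming(dhash(a), dhash(b)) <= threshold
```

Because the hash records only whether each pixel is brighter than its neighbour, a copy that has been renamed, re-saved or uniformly brightened tends to produce the same or a very similar hash, while a visually different image produces a distant one. The file name and metadata never enter the comparison.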
For the private sector, the impetus is partly regulatory. The Personal Data Protection Commission has in recent years sharpened its guidance on data minimisation, the principle that organisations should not retain more personal data — including images — than is strictly necessary. Duplicate images of individuals, whether ID photographs or CCTV frames stored by building management corporations in places like CapitaSpring on Market Street or Jewel Changi Airport, now carry a compliance dimension they did not a decade ago.
Businesses and agency IT teams dealing with legacy image libraries should treat the current period as a practical window to act. Automated tools for perceptual deduplication are mature and widely available. The more pressing task is governance: establishing clear ownership of canonical image files before replacement pipelines are built, so that deleting a duplicate does not inadvertently break a linked record elsewhere in the system. Getting that sequencing right is the unglamorous work that determines whether the cleanup actually holds.
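The sequencing point can be sketched in code. The example below is a minimal illustration, assuming a hypothetical data model in which duplicate groups are sets of image IDs, a registry names the owner of each canonical file, and records link to images by ID. None of these structures or names come from an actual agency system. The function repoints linked records to the canonical file first, marks only the redundant copies as deletable, and sets aside any group that does not have exactly one registered canonical file.

```python
from typing import Dict, List, Set, Tuple


def plan_replacement(
    duplicate_groups: List[Set[str]],
    canonical_owner: Dict[str, str],  # image ID -> owning team (hypothetical registry)
    record_links: Dict[str, str],     # record ID -> image ID it references
) -> Tuple[Dict[str, str], List[str], List[Set[str]]]:
    """Return (updated_links, deletable_images, unresolved_groups)."""
    updated = dict(record_links)  # work on a copy; the caller's mapping is untouched
    deletable: List[str] = []
    unresolved: List[Set[str]] = []

    for group in duplicate_groups:
        owned = sorted(img for img in group if img in canonical_owner)
        if len(owned) != 1:
            # No canonical file, or more than one claimed: leave the group alone.
            unresolved.append(group)
            continue
        canonical = owned[0]
        redundant = group - {canonical}
        # Repoint every record that references a redundant copy before deletion.
        for record_id, image_id in list(updated.items()):
            if image_id in redundant:
                updated[record_id] = canonical
        deletable.extend(sorted(redundant))

    return updated, deletable, unresolved
```

A group with no agreed owner is deliberately left untouched rather than resolved by guesswork, which mirrors the argument above: ownership has to be settled before the pipeline is allowed to delete anything.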