Singapore's public and private digital repositories hold an estimated 40 to 60 million redundant image files — duplicates created through years of re-uploads, format conversions, and database migrations — according to data management assessments conducted across several government-linked technology programmes. The figure, drawn from internal audits reviewed by The Daily Singapore, points to a storage problem that costs organisations here real money every quarter.
The issue matters now because Singapore is accelerating its Smart Nation 2.0 push, a whole-of-government initiative that funnels billions of dollars into consolidated cloud infrastructure. When storage is bloated with duplicate assets, the financial waste compounds: cloud hosting is billed by the gigabyte, and duplicates contribute nothing to retrieval quality or user experience. The Infocomm Media Development Authority, which oversees the national digital architecture, has flagged data hygiene — including deduplication — as a priority in its ongoing Digital Government Blueprint refresh.
The Numbers Behind the Clutter
Consider HDB's public property portal. The Housing & Development Board lists tens of thousands of resale flat transactions annually, and each listing typically carries between six and twelve photographs. When sellers re-list, they routinely re-upload photographs rather than link to existing records. An independent audit of publicly accessible property portals covering the Toa Payoh and Tampines estates, two of Singapore's most active resale markets, found duplicates accounting for roughly 23 percent of all stored images, based on hash comparison. That is not a fringe problem. A 500,000-image repository at that rate carries about 115,000 redundant files. At a conservative cloud storage rate of S$0.025 per gigabyte per month on a hyperscaler platform such as AWS's Singapore Region, storing them costs approximately S$1,800 to S$2,400 a year if they occupy 6 to 8 terabytes, which implies high-resolution masters averaging roughly 52 to 70 MB per file. Web-ready photographs of 2 to 5 MB each would cost closer to S$70 to S$170 a year to store. Bandwidth and indexing overhead come on top of either figure.
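The storage arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes flat per-gigabyte pricing at the quoted S$0.025 rate and decimal units (1 GB = 1,000 MB); actual hyperscaler bills use tiered pricing and may count binary gigabytes. Average file size is the input the estimate hinges on, so the example tries several values. The function name `duplicate_storage_cost` is illustrative, and bandwidth and indexing costs are deliberately left out.

```python
def duplicate_storage_cost(
    total_images: int,
    duplication_rate: float,
    avg_file_mb: float,
    sgd_per_gb_month: float = 0.025,  # assumed flat rate, as quoted in the text
) -> tuple[int, float, float]:
    """Return (duplicate file count, wasted GB, wasted S$ per year).

    Storage only: bandwidth, indexing and tiered pricing are ignored.
    Uses decimal units for simplicity (1 GB = 1,000 MB).
    """
    duplicates = round(total_images * duplication_rate)
    wasted_gb = duplicates * avg_file_mb / 1000
    annual_cost = wasted_gb * sgd_per_gb_month * 12  # 12 monthly bills
    return duplicates, wasted_gb, annual_cost


# Same repository, different assumptions about how large each file is.
for avg_mb in (3, 52, 70):
    dups, gb, cost = duplicate_storage_cost(500_000, 0.23, avg_mb)
    print(f"{avg_mb:>3} MB/file: {dups:,} duplicates, {gb:,.0f} GB, S${cost:,.0f}/year")
```

Because the cost scales linearly with file size, knowing a repository's average image size matters as much as knowing its duplication rate.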
The National Archives of Singapore, located on Canning Rise and responsible for preserving millions of physical and digital records, digitised more than 1.1 million items between 2018 and 2024 under its digitisation roadmap. Archivists working on similar collections in comparable city-states have documented deduplication savings of up to 18 percent of total storage footprint after systematic image-hash audits. If a similar share of files turned out to be duplicates here, the 1.1 million digitised items alone would yield close to 200,000 files flagged for review.
Across the private sector, the pattern repeats. Lazada Singapore and Shopee both operate product image databases containing hundreds of millions of assets, and both have invested in automated deduplication pipelines since 2022. The commercial stakes are direct: faster page loads improve conversion rates, and Singapore's e-commerce sector recorded gross merchandise value of roughly S$8.5 billion in 2024, according to figures published by the Singapore Department of Statistics. Shaving even fractions of a second off load times has documented revenue impact in the sector, and leaner image libraries contribute: when the same photograph is stored under several URLs, each copy occupies its own entry in content-delivery and browser caches, so consolidating duplicates raises cache hit rates.
What Comes Next for Organisations Carrying the Load
The practical toolkit for deduplication has matured quickly. Perceptual hashing algorithms reduce each image to a compact fingerprint that changes little when a photo is resized, recompressed, or lightly edited, so they can match visually near-identical images even when file names and metadata differ. They are now embedded in enterprise content management platforms used by organisations such as the Government Technology Agency at Sandcrawler Building in one-north, Buona Vista. GovTech has been piloting automated image governance tools across several government websites as part of the Whole-of-Government Application Analytics programme.
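The tooling GovTech uses is not described in detail, so the sketch below illustrates one widely used perceptual hash, the difference hash (dHash), rather than any particular platform's method. It assumes an image that has already been decoded and converted to grayscale, represented as nested lists of 0-255 intensities so the example runs without an imaging library. The function names and the 8x8 hash size are illustrative choices. Two images are treated as near-duplicates when their fingerprints differ in only a few bits; the cut-off is a tuning decision, not a fixed standard.

```python
Pixels = list[list[float]]  # grayscale image: rows of 0-255 intensities


def resize_box(img: Pixels, out_w: int, out_h: int) -> Pixels:
    """Downscale by averaging the source pixels that fall in each output cell."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for oy in range(out_h):
        y0, y1 = oy * in_h // out_h, (oy + 1) * in_h // out_h
        row = []
        for ox in range(out_w):
            x0, x1 = ox * in_w // out_w, (ox + 1) * in_w // out_w
            # max(...) guarantees at least one pixel per cell for tiny inputs
            cell = [img[y][x]
                    for y in range(y0, max(y1, y0 + 1))
                    for x in range(x0, max(x1, x0 + 1))]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def dhash(img: Pixels, hash_size: int = 8) -> int:
    """Fingerprint with one bit per 'is this cell brighter than its right neighbour'."""
    small = resize_box(img, hash_size + 1, hash_size)  # 9x8 grid gives 64 comparisons
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count differing bits; small distances suggest near-duplicate images."""
    return bin(a ^ b).count("1")


# Synthetic 72x64 "photo", a uniformly brightened copy, and a mirror image.
original = [[200 - 2 * x + y // 4 for x in range(72)] for y in range(64)]
brighter = [[v + 40 for v in row] for row in original]
mirrored = [row[::-1] for row in original]

print(hamming(dhash(original), dhash(brighter)))  # brightness change: same fingerprint
print(hamming(dhash(original), dhash(mirrored)))  # mirroring: every bit flips
```

Because dHash records only relative brightness between neighbouring cells, a uniform brightness shift leaves the fingerprint unchanged, while a mirror image produces a completely different one. Whether mirrored copies should count as duplicates is a policy choice that this simple hash does not make for you.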
For smaller operators, such as Orchard Road retailers managing their own e-commerce storefronts or Jurong West community clubs archiving event photography, the barrier remains human bandwidth rather than technology cost. Open-source tools including dupeGuru and rdfind can process thousands of files in minutes on standard hardware, at zero licensing cost. rdfind matches byte-identical files only, while dupeGuru's picture mode also flags visually similar images.
Organisations that have not yet conducted a baseline image audit should treat the first step as a pure data exercise: count total stored images, run a hash comparison, and establish a duplication rate, meaning the share of stored files that are redundant copies of another file. That single number, once known, makes the business case for remediation almost automatic. In Singapore's cloud-first digital environment, redundancy is not merely an aesthetic inconvenience; it is a line item on every quarterly infrastructure bill.
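The three-step audit described above can be sketched in Python using only the standard library. This sketch assumes the images sit in a local directory tree; a cloud bucket would first need to be listed and downloaded, or its stored checksums compared instead. Files are recognised by an assumed list of extensions, and contents are compared by SHA-256 digest, so the sketch catches byte-identical copies only. Resized or re-encoded versions need a perceptual hash. The names `audit` and `IMAGE_EXTENSIONS` are illustrative.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; adjust to what the repository actually stores.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".tif", ".tiff"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in 1 MiB chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root: str) -> dict:
    """Step 1: count images. Step 2: group by content hash. Step 3: duplication rate."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            groups[sha256_of(path)].append(path)
    total = sum(len(paths) for paths in groups.values())
    unique = len(groups)
    redundant = total - unique  # every copy beyond the first in each hash group
    rate = redundant / total if total else 0.0
    return {"total": total, "unique": unique,
            "redundant": redundant, "duplication_rate": rate}
```

Calling `audit("/path/to/images")` (a placeholder path) returns the counts and the duplication rate as a fraction. Keeping the full `groups` mapping, rather than only the totals, would also give a review list of which files duplicate which.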