Singapore generates roughly 4.2 billion digital images annually across its public and private sectors, and a growing body of evidence suggests that somewhere between 30 and 40 percent of those files are exact or near-exact duplicates. That single statistic sits at the heart of a quiet but expensive storage problem that IT managers at institutions from Changi Business Park to one-north are being pressed to solve.
The urgency is real. Data storage costs in Singapore have risen steadily since 2023, pushed upward by constrained land supply for data centre construction, tightened power allocation rules introduced by the Infocomm Media Development Authority, and surging enterprise demand tied to the city-state's push to become a regional artificial intelligence hub. Against that backdrop, paying to store the same image file two, three or five times is no longer a minor inconvenience — it is a measurable budget line.
What the Numbers Actually Show
The problem is concentrated in a handful of high-volume environments, and healthcare is among the most acute. The National University Hospital and Singapore General Hospital collectively process tens of thousands of medical imaging scans each month: CT, MRI and X-ray files that are routinely saved to multiple system folders by different clinical staff, then archived again during records migration cycles. Industry estimates from the healthcare IT sector, though not officially published, put duplicate imaging rates in large hospital networks at between 25 and 45 percent of total radiology storage.
Government digital archives face a comparable challenge. The National Archives of Singapore, housed along Canning Rise, has been digitising physical records since the 1990s. When files move between archival systems or get re-scanned during format upgrades, duplication enters the pipeline almost automatically. A 2024 review of similar national archive digitisation programmes in comparable city-economies found that without automated deduplication tools running at the point of ingest, duplicate rates in document image libraries commonly exceed 35 percent within five years of a major migration.
On the commercial side, e-commerce platforms operating out of logistics hubs in Jurong and Paya Lebar routinely upload product images multiple times across separate content management systems — one for the storefront, one for the warehouse management layer, one for marketing. A mid-sized Singapore-based retailer with a catalogue of 50,000 SKUs can accumulate more than 800 gigabytes of redundant image data within 18 months, according to cloud storage benchmarks published by AWS Singapore in 2025.
The Cost of Redundancy — and the Fix
Storage costs are not abstract. Standard-tier object storage in the AWS Asia Pacific (Singapore) region runs at approximately USD 0.025 per gigabyte per month as of mid-2026. That sounds trivial until scaled: a government ministry sitting on 10 terabytes of duplicate image files is spending around USD 250 every month, or roughly USD 3,000 a year, for the privilege of keeping copies of copies.
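As a rough check on that arithmetic, the short Python sketch below multiplies a duplicate volume by a per-gigabyte rate. The function name and the decimal convention of 1,000 gigabytes per terabyte are assumptions made for illustration, and the default rate is simply the figure quoted above rather than a live price.

```python
def monthly_duplicate_cost(duplicate_tb: float,
                           usd_per_gb_month: float = 0.025,
                           gb_per_tb: int = 1000) -> float:
    """Return the monthly cost in USD of storing `duplicate_tb` of redundant data.

    Assumes a flat per-gigabyte rate and decimal terabytes (1 TB = 1,000 GB).
    """
    return duplicate_tb * gb_per_tb * usd_per_gb_month


# The article's example: 10 TB of duplicate image files.
monthly = monthly_duplicate_cost(10)
print(f"Monthly: USD {monthly:,.2f}")
print(f"Yearly:  USD {monthly * 12:,.2f}")
```

At the quoted rate, 10 terabytes of duplicates comes to about USD 250 a month, which adds up to about USD 3,000 over a year.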
Deduplication software addresses this through two main approaches: exact hash matching, which compares a digest of each file's bytes and identifies bit-for-bit identical files almost instantly, and perceptual hashing, which catches near-duplicates such as slightly re-sized, re-compressed or colour-corrected versions of the same source image. Perceptual hashing is what most enterprise systems now deploy, because duplicates created by people rarely survive transfer processes entirely unchanged.
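The difference between the two approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular product: SHA-256 stands in for the exact-match digest, and a simple "average hash" stands in for perceptual hashing, which is only one of several such algorithms. The tiny 4x4 grayscale grids are hypothetical stand-ins for thumbnails that a real system would produce by decoding and downscaling each image with an imaging library.

```python
import hashlib


def exact_digest(data: bytes) -> str:
    """SHA-256 of the raw bytes; identical bytes always give identical digests."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Simple perceptual hash: one bit per pixel, set when the pixel is
    brighter than the mean of the whole grid."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests a near-duplicate."""
    return bin(a ^ b).count("1")


def to_bytes(pixels: list[list[int]]) -> bytes:
    """Flatten a grid of 0-255 values into raw bytes for exact hashing."""
    return bytes(p for row in pixels for p in row)


# Hypothetical 4x4 grayscale thumbnail: dark left half, bright right half.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
brightened = [[p + 10 for p in row] for row in original]   # mild colour correction
inverted = [[255 - p for p in row] for row in original]    # a genuinely different image

same_bytes = exact_digest(to_bytes(original)) == exact_digest(to_bytes(brightened))
near = hamming_distance(average_hash(original), average_hash(brightened))
far = hamming_distance(average_hash(original), average_hash(inverted))

print("Exact match with brightened copy:", same_bytes)
print("Perceptual distance to brightened copy:", near)
print("Perceptual distance to inverted image:", far)
```

In this toy case the brightened copy fails the exact match, because every byte has changed, yet its perceptual hash is identical to the original's. The inverted image differs in all 16 bits. A production system would choose a distance threshold below which two images are treated as near-duplicates; that threshold is a tuning decision the sketch does not attempt to make.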
The Smart Nation and Digital Government Office has encouraged whole-of-government data hygiene practices under its Digital Government Blueprint since 2022, though specific mandates around image deduplication have not been publicly detailed. Private sector uptake has been more visible. Tech firms at one-north's Fusionopolis towers and at Mapletree Business City in the Alexandra corridor have increasingly built deduplication into their data governance frameworks as part of broader ESG reporting requirements, which count unnecessary energy consumption against sustainability scores.
For organisations that have not yet audited their image libraries, the practical starting point is a storage analysis tool run against existing repositories to generate a duplication rate baseline. From there, automated deduplication at the point of upload — rather than retrospective cleaning — cuts the problem off before it compounds. Given Singapore's data centre power constraints and the IMDA's continued scrutiny of energy usage per rack, getting that baseline number down is less an IT housekeeping exercise than a business continuity calculation.
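A minimal sketch of both steps, the baseline audit and the check at the point of upload, might look like the following. It assumes exact-duplicate detection with SHA-256 only, so near-duplicates would need a perceptual hash on top. The names `duplication_baseline` and `IngestIndex` are hypothetical and do not refer to any specific storage analysis tool.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large scans need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplication_baseline(root: str) -> dict:
    """Walk `root` and report how much of it consists of exact duplicates."""
    sizes_by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            sizes_by_digest[file_digest(path)].append(path.stat().st_size)

    groups = list(sizes_by_digest.values())
    total_files = sum(len(sizes) for sizes in groups)
    total_bytes = sum(sum(sizes) for sizes in groups)
    # Every copy beyond the first in a group counts as redundant.
    duplicate_files = sum(len(sizes) - 1 for sizes in groups)
    duplicate_bytes = sum(sum(sizes[1:]) for sizes in groups)
    return {
        "total_files": total_files,
        "duplicate_files": duplicate_files,
        "total_bytes": total_bytes,
        "duplicate_bytes": duplicate_bytes,
        "duplicate_rate": duplicate_bytes / total_bytes if total_bytes else 0.0,
    }


class IngestIndex:
    """In-memory index of digests already stored, consulted at upload time."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def accept(self, data: bytes) -> bool:
        """Return True if `data` is new and should be stored, False if it
        is an exact duplicate of something already accepted."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Running `duplication_baseline` against an existing repository gives the duplicate rate by bytes, which is the figure that maps onto the storage bill. An `IngestIndex` placed in the upload path rejects repeat files before they are written. In practice the index would live in a database rather than in memory, which is a design choice this sketch leaves open.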