Singapore's public agencies collectively hold tens of millions of digital image files across their content management systems — and a growing share of those files are exact or near-exact duplicates. That is the central finding emerging from ongoing digital asset audits being conducted across several statutory boards this year, with technology teams flagging the problem as a significant drain on storage budgets, retrieval speeds, and content integrity.
The timing matters because Singapore is mid-way through a S$3.8 billion Smart Nation 2.0 investment cycle, with the Government Technology Agency (GovTech) pushing agencies to consolidate and rationalise their digital infrastructure before the end of the current fiscal year in March 2027. Redundant image files are not a cosmetic concern — they inflate cloud storage costs, slow down citizen-facing portals, and complicate the training of government AI systems that rely on clean, deduplicated datasets.
What the Data Shows
Industry benchmarks from enterprise content management research suggest that between 25 and 40 percent of images stored in large organisational repositories are duplicates or near-duplicates: files that are pixel-identical, resized versions of the same original, or watermarked variants of a single source photograph. Applied to Singapore's public sector, which the Infocomm Media Development Authority (IMDA) estimated held over 60 petabytes of total digital data as of 2024, the numbers add up quickly: even if images make up only a small fraction of that total, a conservative duplication rate still represents hundreds of terabytes of redundant visual content.
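The simplest class of duplicate, a byte-for-byte copy of the same file, can be caught without any image analysis by grouping files on a cryptographic hash of their contents. The sketch below is illustrative only: it assumes files are already loaded into memory as bytes, and the paths are hypothetical. Two pixel-identical images saved with different metadata or compression settings produce different bytes, so this pass catches only exact file copies; resized or watermarked variants need a perceptual hash.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file paths whose contents are byte-for-byte identical.

    `files` maps a (hypothetical) path to the file's raw bytes.
    Only exact copies are grouped; re-encoded variants are not.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for path, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(path)
    # Keep only digests shared by more than one file, each group sorted.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Hypothetical in-memory sample standing in for files on disk.
sample = {
    "dept_a/toa_payoh.jpg": b"\xff\xd8 same bytes",
    "dept_b/toa_payoh_copy.jpg": b"\xff\xd8 same bytes",
    "dept_c/bishan_park.jpg": b"\xff\xd8 different bytes",
}
print(group_exact_duplicates(sample))
# [['dept_a/toa_payoh.jpg', 'dept_b/toa_payoh_copy.jpg']]
```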
Cloud storage pricing in the Singapore market typically ranges from S$0.023 to S$0.035 per gigabyte per month for enterprise-tier services, depending on the provider and redundancy tier. At those rates, a single agency storing 500 terabytes of duplicate image data could be spending roughly S$11,500 to S$17,500 a month, or upward of S$138,000 a year, on files that deliver no unique informational value. Across dozens of agencies, the aggregate figure becomes material against any IT operations budget.
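The storage arithmetic in this paragraph can be checked in a few lines. The sketch assumes decimal units (1 TB = 1,000 GB); under binary units (1 TiB = 1,024 GiB) each figure would be 2.4 percent higher. The 500-terabyte volume is the hypothetical agency described above, and the prices are the quoted range.

```python
# Storage cost of duplicate images at the quoted Singapore price range.
# Assumption: decimal units, 1 TB = 1,000 GB.

DUPLICATE_TB = 500  # hypothetical agency holding 500 TB of duplicates
PRICE_LOW_SGD_PER_GB_MONTH = 0.023
PRICE_HIGH_SGD_PER_GB_MONTH = 0.035


def monthly_cost_sgd(terabytes: float, price_per_gb_month: float) -> float:
    """Return the monthly storage bill in S$ for the given volume."""
    gigabytes = terabytes * 1_000
    return gigabytes * price_per_gb_month


low = monthly_cost_sgd(DUPLICATE_TB, PRICE_LOW_SGD_PER_GB_MONTH)
high = monthly_cost_sgd(DUPLICATE_TB, PRICE_HIGH_SGD_PER_GB_MONTH)

print(f"Monthly: S${low:,.0f} to S${high:,.0f}")
print(f"Annual:  S${low * 12:,.0f} to S${high * 12:,.0f}")
# Monthly: S$11,500 to S$17,500
# Annual:  S$138,000 to S$210,000
```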
The Housing and Development Board, which maintains image libraries covering more than one million residential units across towns from Woodlands to Tampines, and the National Parks Board, which documents green spaces including the Southern Ridges corridor and Bishan-Ang Mo Kio Park, are among the agencies understood to be running the most image-intensive repositories. Neither agency has publicly disclosed the precise scope of its duplication problem, but both have been named in GovTech's Whole-of-Government Digital Infrastructure review as participants in the current rationalisation exercise.
Automated Detection and What Comes Next
The technical fix is well established. Perceptual hashing algorithms generate a compact numerical fingerprint for each image; deduplication tools then compare those fingerprints and flag any pair whose difference falls within a defined similarity threshold. Run at scale, such pipelines can process millions of images in hours. Several Singapore-based technology firms, including those operating out of the one-north innovation district in Queenstown, have built deduplication pipelines specifically calibrated for the multi-format image libraries that characterise local public-sector archives, which hold assets in all four official languages: English, Mandarin, Malay, and Tamil.
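To make the fingerprint-and-threshold idea concrete, the sketch below implements a difference hash (dHash), one common perceptual hash. The article does not say which algorithm agencies or vendors use, so the choice of dHash, the 10-bit threshold, and the toy images are all assumptions. Each image is shrunk to a 9 by 8 grayscale grid, each pixel is compared with its right-hand neighbour to produce 64 bits, and two images count as near-duplicates when their fingerprints differ in only a few bits.

```python
# Minimal difference-hash (dHash) sketch. Real pipelines decode and
# downscale image files with an imaging library; here an "image" is a
# grayscale grid: a list of rows of 0-255 brightness values.

Grid = list[list[int]]


def downscale(img: Grid, width: int, height: int) -> Grid:
    """Nearest-neighbour resize of img to width x height."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(img: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint of horizontal brightness changes."""
    small = downscale(img, size + 1, size)  # one extra column for comparisons
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = 10) -> bool:
    """Flag two images whose fingerprints differ in at most `threshold` bits."""
    return hamming(dhash(a), dhash(b)) <= threshold


# Hypothetical test images: a repeating horizontal ramp, plus variants.
original = [[(x * 4) % 200 for x in range(64)] for _ in range(64)]
brighter = [[p + 20 for p in row] for row in original]  # uniform brightness shift
enlarged = [[original[r // 2][c // 2] for c in range(128)] for r in range(128)]
different = [[255 - p for p in row] for row in original]  # photographic negative

print(hamming(dhash(original), dhash(brighter)))   # 0
print(hamming(dhash(original), dhash(enlarged)))   # 0
print(hamming(dhash(original), dhash(different)))  # 64
```

Brightening and a 2x enlargement leave the fingerprint unchanged, while the negative flips every bit. Comparing millions of fingerprints pairwise would be slow, so production tools generally pair a hash like this with an index structure rather than an all-against-all scan.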
The harder problem is governance. Deleting a file flagged as a duplicate requires confirming that no downstream system, archive record, or published URL depends on that specific file instance. A photograph of the Toa Payoh town centre used in a 2019 annual report PDF, for example, may exist as three separate uploads across three departments, each of which may be referenced by different pages, documents, or systems; deleting two of them without checking those references risks breaking legacy material that remains publicly accessible. GovTech has been developing a dependency-mapping protocol to address exactly this, with a pilot scheduled for the fourth quarter of 2026.
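GovTech's protocol has not been published, so the following is only a hypothetical illustration of the general check it would need to perform: given a cluster of copies, a chosen canonical file, and a map of which pages or documents point at each file, separate the copies that can be deleted outright from those whose referrers must first be repointed. The `DuplicateCluster` type, the `plan_cleanup` function, and every path below are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class DuplicateCluster:
    """A group of files judged to be copies of one image (hypothetical)."""
    files: list[str]
    canonical: str  # the copy that will be kept


def plan_cleanup(
    cluster: DuplicateCluster, references: dict[str, set[str]]
) -> tuple[list[str], dict[str, list[str]]]:
    """Split non-canonical copies into safe deletions and copies still in use.

    `references` maps a file path to the pages, documents or systems that
    point at that exact file. Unreferenced copies can be deleted; referenced
    ones must have their referrers repointed (or a redirect added) first.
    """
    safe_to_delete: list[str] = []
    needs_repointing: dict[str, list[str]] = {}
    for path in cluster.files:
        if path == cluster.canonical:
            continue
        referrers = references.get(path, set())
        if referrers:
            needs_repointing[path] = sorted(referrers)
        else:
            safe_to_delete.append(path)
    return sorted(safe_to_delete), needs_repointing


# Hypothetical example: three uploads of the same Toa Payoh photograph.
cluster = DuplicateCluster(
    files=["comms/toa_payoh.jpg", "planning/tp_centre.jpg", "estates/IMG_0042.jpg"],
    canonical="comms/toa_payoh.jpg",
)
references = {
    "comms/toa_payoh.jpg": {"reports/annual-report-2019.pdf"},
    "planning/tp_centre.jpg": {"legacy/town-guide.html"},
}
safe, pending = plan_cleanup(cluster, references)
print(safe)     # ['estates/IMG_0042.jpg']
print(pending)  # {'planning/tp_centre.jpg': ['legacy/town-guide.html']}
```

The hard part in practice is building the `references` map itself, which means crawling published pages, documents, and system configurations; the decision logic once that map exists is simple.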
For private-sector operators — e-commerce platforms along the Orchard Road retail corridor, media companies, and the real estate portals that list HDB resale flats starting from around S$450,000 in mature estates — the commercial calculus is more straightforward. Automated deduplication tools are available as managed services, and the return on investment typically materialises within two to three billing cycles of cloud storage savings. The public sector's challenge is not the technology. It is the institutional discipline to act on what the numbers already make obvious.