The Daily Singapore

Singapore news, every day

Singapore's Duplicate Image Problem: The Numbers Driving a Digital Clean-Up

As government agencies and private platforms audit their digital archives, the scale of redundant visual data across Singapore's public sector is larger than most realise.

By Singapore News Desk · Published 5 July 2026 at 2:45 am

Updated 5 July 2026 at 10:17 am

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence. Read our editorial standards.

Singapore's public agencies collectively hold tens of millions of digital image files across their content management systems, and a growing share of those files are exact or near-exact duplicates. That is the central finding emerging from digital asset audits under way at several statutory boards this year, with technology teams warning that the duplicates inflate storage budgets, slow retrieval, and undermine content integrity.

The timing matters because Singapore is mid-way through a S$3.8 billion Smart Nation 2.0 investment cycle, with the Government Technology Agency (GovTech) pushing agencies to consolidate and rationalise their digital infrastructure before the end of the current fiscal year in March 2027. Redundant image files are not a cosmetic concern — they inflate cloud storage costs, slow down citizen-facing portals, and complicate the training of government AI systems that rely on clean, deduplicated datasets.

What the Data Shows

Industry benchmarks from enterprise content management research suggest that between 25 and 40 percent of images stored in large organisational repositories are duplicates or near-duplicates: files that are pixel-identical, resized versions of the same original, or watermarked variants of a single source photograph. The Infocomm Media Development Authority (IMDA) estimated that Singapore's public sector held more than 60 petabytes of digital data in total as of 2024. Even if images make up only a few percent of that total, a conservative duplication rate would represent hundreds of terabytes of redundant visual content.
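The audits described here do not say how the different kinds of duplicate are told apart. As a minimal illustration, byte-identical copies can be found by grouping files on a cryptographic digest of their contents. The sketch assumes the files have already been read into memory; the paths are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group paths whose file contents are byte-for-byte identical.

    `files` maps a (hypothetical) file path to its raw bytes.
    Only groups with two or more members are returned.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for path, data in files.items():
        # Identical bytes always produce the same SHA-256 digest.
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    sample = {
        "hdb/toa_payoh_a.jpg": b"\xff\xd8original-photo-bytes",
        "comms/toa_payoh_copy.jpg": b"\xff\xd8original-photo-bytes",
        "parks/bishan_park.jpg": b"\xff\xd8different-photo-bytes",
    }
    print(group_exact_duplicates(sample))
```

A digest only catches exact copies. Resizing, re-encoding or watermarking changes the bytes and therefore the digest, and even pixel-identical images can differ in their embedded metadata; catching those variants is the job of perceptual hashing.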

Cloud storage pricing in the Singapore market typically ranges from S$0.023 to S$0.035 per gigabyte per month for enterprise-tier services, depending on the provider and redundancy tier. At those rates, a single agency storing 500 terabytes (500,000 gigabytes) of duplicate image data would be spending roughly S$11,500 to S$17,500 a month, or about S$138,000 to S$210,000 a year, on files that deliver no unique informational value. Across dozens of agencies, the aggregate figure becomes material against any IT operations budget.
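The storage cost of redundant data is volume multiplied by rate. A short sketch using the quoted price range and the 500-terabyte example, assuming decimal units (1 TB = 1,000 GB) and ignoring request and data-transfer charges:

```python
def monthly_storage_cost_sgd(terabytes: float, rate_per_gb_month: float) -> float:
    """Return the monthly cost in S$ of storing `terabytes` at a per-GB monthly rate."""
    gigabytes = terabytes * 1_000  # assumes decimal units: 1 TB = 1,000 GB
    return gigabytes * rate_per_gb_month


if __name__ == "__main__":
    duplicate_tb = 500  # the single-agency example
    for rate in (0.023, 0.035):  # the quoted S$ per GB-month range
        monthly = monthly_storage_cost_sgd(duplicate_tb, rate)
        print(f"At S${rate}/GB-month: S${monthly:,.0f} a month, S${monthly * 12:,.0f} a year")
```

Run as written, it reports about S$11,500 to S$17,500 a month, or roughly S$138,000 to S$210,000 a year. If a provider bills in binary units (1 TiB = 1,024 GiB), the figures rise by about 2.4 percent.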

The Housing and Development Board, which maintains image libraries covering more than one million residential units across towns from Woodlands to Tampines, and the National Parks Board, which documents green spaces including the Southern Ridges corridor and Bishan-Ang Mo Kio Park, are among the agencies understood to be running the most image-intensive repositories. Neither agency has publicly disclosed the precise scope of its duplication problem, but both have been named in GovTech's Whole-of-Government Digital Infrastructure review as participants in the current rationalisation exercise.

Automated Detection and What Comes Next

The technical fix is well established. Perceptual hashing algorithms, which generate a compact numerical fingerprint for each image and flag files whose fingerprints fall within a defined similarity threshold, can process millions of images in hours. Several Singapore-based technology firms, including those operating out of the one-north innovation district in Queenstown, have built deduplication pipelines calibrated specifically for the multilingual, multi-format image libraries that characterise local public-sector archives, where assets carry text and captions in English, Mandarin, Malay, and Tamil.
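Which perceptual-hashing algorithm the local pipelines use is not stated. One of the simplest is the average hash (aHash): shrink the image to a small grid, mark each cell as brighter or darker than the grid's mean, and compare two fingerprints by counting the bits that differ (the Hamming distance). The sketch assumes each image has already been decoded to a grayscale pixel grid whose sides are multiples of 8. A production system would decode and resize with an imaging library and would likely prefer a more robust hash such as pHash or dHash (the Python imagehash package provides all three). The 5-bit threshold is an illustrative assumption, not a figure from the audits.

```python
def average_hash(pixels: list[list[int]], hash_size: int = 8) -> int:
    """Average hash (aHash) of a grayscale image given as rows of 0-255 values.

    Assumes height and width are multiples of hash_size; a real pipeline
    would resize the image with an imaging library first.
    """
    block_h = len(pixels) // hash_size
    block_w = len(pixels[0]) // hash_size
    # Shrink to a hash_size x hash_size grid by averaging each block of pixels.
    cells = []
    for by in range(hash_size):
        for bx in range(hash_size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the grid's mean.
    fingerprint = 0
    for value in cells:
        fingerprint = (fingerprint << 1) | int(value > mean)
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, threshold: int = 5) -> bool:
    """Flag two fingerprints as near-duplicates (threshold is illustrative)."""
    return hamming_distance(a, b) <= threshold


if __name__ == "__main__":
    # A 16x16 diagonal gradient standing in for a photograph.
    original = [[(x + y) * 8 for x in range(16)] for y in range(16)]
    # A 2x enlarged copy: every pixel repeated in a 2x2 block.
    resized = [[original[y // 2][x // 2] for x in range(32)] for y in range(32)]
    # A genuinely different picture: the same gradient, inverted.
    different = [[255 - v for v in row] for row in original]

    h_orig, h_resized, h_diff = (average_hash(p) for p in (original, resized, different))
    print(hamming_distance(h_orig, h_resized), is_near_duplicate(h_orig, h_resized))  # 0 True
    print(hamming_distance(h_orig, h_diff), is_near_duplicate(h_orig, h_diff))  # 56 False
```

The resized copy lands at distance 0 and is flagged, while the unrelated image sits far above the threshold. Comparing every pair of fingerprints grows quadratically with archive size, so pipelines handling millions of files usually index the hashes (for example in a BK-tree) rather than checking all pairs.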

The harder problem is governance. Deleting a file flagged as a duplicate requires confirming that no downstream system, archive record, or published URL depends on that specific file instance. A photograph of the Toa Payoh town centre used in a 2019 annual report PDF, for example, may exist as three separate uploads across three departments; deleting two of them without checking embedded links risks breaking legacy documents that remain publicly accessible. GovTech has been developing a dependency-mapping protocol to address exactly this, with a pilot scheduled for the fourth quarter of 2026.
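The dependency check this paragraph describes can be sketched as a planning step that runs before anything is deleted. The sketch assumes a hypothetical dependency map, from each file copy to the documents or URLs that embed it, of the kind a dependency-mapping scan might produce. It keeps the most-referenced copy as canonical and blocks, rather than deletes, any other copy that something still points to. It is an illustration only, not a description of GovTech's protocol.

```python
from dataclasses import dataclass, field


@dataclass
class DeletionPlan:
    keep: str  # canonical copy that stays
    delete: list[str] = field(default_factory=list)  # copies nothing depends on
    blocked: dict[str, list[str]] = field(default_factory=dict)  # copy -> its dependents


def plan_deletion(group: list[str], references: dict[str, list[str]]) -> DeletionPlan:
    """Plan which copies in one duplicate group are safe to delete.

    `references` is a hypothetical dependency map from a file path to the
    documents or URLs that embed that specific copy.
    """
    # Keep the most-referenced copy; break ties alphabetically for repeatability.
    keep = min(group, key=lambda path: (-len(references.get(path, [])), path))
    plan = DeletionPlan(keep=keep)
    for path in sorted(group):
        if path == keep:
            continue
        dependents = references.get(path, [])
        if dependents:
            # Still embedded somewhere: these links must be repointed first.
            plan.blocked[path] = dependents
        else:
            plan.delete.append(path)
    return plan


if __name__ == "__main__":
    # Three uploads of the same Toa Payoh photograph (hypothetical paths).
    group = ["comms/tp_centre.jpg", "estates/tp_centre.jpg", "planning/tp_centre.jpg"]
    references = {
        "estates/tp_centre.jpg": ["annual-report-2019.pdf"],
        "planning/tp_centre.jpg": ["legacy-page.html"],
    }
    print(plan_deletion(group, references))
```

In this example the copy embedded in the annual report is kept, the unreferenced copy can be deleted, and the copy linked from the legacy page stays blocked until that link is repointed to the canonical file.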

For private-sector operators (e-commerce platforms along the Orchard Road retail corridor, media companies, and the real estate portals that list HDB resale flats starting from around S$450,000 in mature estates), the commercial calculus is more straightforward. Automated deduplication tools are available as managed services, and the savings on cloud storage typically recover their cost within two to three billing cycles. The public sector's challenge is not the technology. It is the institutional discipline to act on what the numbers already make obvious.

About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
