The Daily Singapore

Singapore news, every day

News

Singapore's Hidden Data Problem: The Numbers Behind Duplicate Images Clogging Government and Commercial Platforms

Redundant digital assets cost organisations storage budget and credibility — and Singapore's push to become a smart nation is forcing the reckoning.

By Singapore News Desk · Published 5 July 2026 at 2:40 am

4 min read

Updated 5 July 2026 at 10:17 am

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence. Read our editorial standards →

Duplicate images now account for an estimated 30 to 40 percent of total unstructured data stored across enterprise systems globally, and Singapore's public and private sector organisations are not immune. As the city-state accelerates its Digital Government Blueprint targets — the Smart Nation and Digital Government Office has set benchmarks for full digital service delivery — bloated image libraries are quietly draining IT budgets and undermining the accuracy of public-facing platforms.

The issue came into sharper focus this year after the Infocomm Media Development Authority flagged data hygiene as a priority area in its 2026 operational guidance for agencies seeking to integrate generative AI tools into their workflows. Duplicate or near-duplicate images fed into training datasets distort model outputs, a problem that compounds quickly when agencies share asset repositories. The IMDA guidance, published in March 2026, urged public sector entities to conduct structured data audits before deploying AI on internal document and image libraries.

What the Numbers Actually Show

Storage costs alone tell part of the story. Commercial cloud pricing on platforms used by Singapore government-linked companies typically runs between S$0.023 and S$0.045 per gigabyte per month for standard object storage tiers. A mid-sized statutory board managing a public portal with 500,000 image assets (a realistic figure for an agency such as the Housing and Development Board, which publishes tens of thousands of flat listings, construction updates and estate photographs) could be paying for 15 to 20 percent of that storage unnecessarily if routine deduplication is not performed. How much that costs each year depends on average file size and on how many renditions and backup copies of each image are kept, and bandwidth and retrieval charges come on top.
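The storage arithmetic above can be sketched in a few lines. This is a rough illustration rather than a costing model: the function name and the 2 MB average file size are assumptions for the example, since the article gives only the asset count, the price range and the duplicated share.

```python
def wasted_storage_cost_per_year(
    asset_count: int,
    avg_size_mb: float,
    price_per_gb_month: float,
    duplicate_fraction: float,
) -> float:
    """Annual cost in S$ of storing the duplicated share of an image library."""
    # Treats 1 GB as 1,000 MB to keep the arithmetic simple.
    total_gb = asset_count * avg_size_mb / 1000
    monthly_cost = total_gb * price_per_gb_month
    return monthly_cost * 12 * duplicate_fraction


# Figures from the article, plus an ASSUMED average file size of 2 MB.
low = wasted_storage_cost_per_year(500_000, 2.0, 0.023, 0.15)
high = wasted_storage_cost_per_year(500_000, 2.0, 0.045, 0.20)
print(f"Avoidable storage spend: S${low:,.2f} to S${high:,.2f} per year")
```

The result scales linearly with the assumed file size and with the number of stored renditions, so an organisation should substitute figures from its own inventory before drawing conclusions.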

For the private sector, the numbers are starker. A 2025 industry survey by data management consultancy Aparavi, covering 300 organisations across Southeast Asia including Singapore-registered firms, found that 26 percent of all files stored in enterprise environments were exact or near-exact duplicates. Among media companies and e-commerce operators — both sectors with major regional headquarters along one-north and the Mapletree Business City corridor in Pasir Panjang — image duplication rates ran higher, at close to 34 percent.

The downstream effects matter beyond cost. Duplicate product images on Singapore-based e-commerce platforms create SEO penalties and confuse price-comparison algorithms. Public sector portals that carry repeated images across pages — a common outcome when staff upload assets without checking existing libraries — draw complaints during government usability audits. The Government Technology Agency, which runs the Singapore Government Tech Stack, has included image asset deduplication checks in its web standards assessment criteria since 2024.

Practical Solutions Gaining Traction Locally

Several Singapore organisations have moved to address this systematically. The National Library Board, which manages digital image archives across its branches including the Lee Kong Chian Reference Library at Victoria Street, began rolling out perceptual hashing tools — software that detects visually similar images even when file names or metadata differ — across its digital collections in late 2025. Perceptual hashing compares image fingerprints rather than raw file data, catching near-duplicates that byte-level comparison misses.
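The difference between byte-level comparison and perceptual hashing can be shown with a deliberately simplified average hash. This is a sketch under stated assumptions, not the National Library Board's tooling or any library's implementation: each image is represented as a small grid of grayscale values, standing in for a picture that has already been downscaled, and all function names are hypothetical.

```python
import hashlib


def average_hash(pixels):
    """Bit string with 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


def hamming_distance(hash_a, hash_b):
    """Number of positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def byte_digest(pixels):
    """SHA-256 of the raw pixel bytes, i.e. a byte-level fingerprint."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


original = [
    [200, 200, 50, 50],
    [200, 200, 50, 50],
    [50, 50, 200, 200],
    [50, 50, 200, 200],
]
# The same picture slightly brightened, as a re-export might produce.
brightened = [[min(255, v + 5) for v in row] for row in original]
# A visually different picture.
different = [
    [200, 50, 200, 50],
    [50, 200, 50, 200],
    [200, 50, 200, 50],
    [50, 200, 50, 200],
]

print(byte_digest(original) == byte_digest(brightened))  # False
print(hamming_distance(average_hash(original), average_hash(brightened)))  # 0
print(hamming_distance(average_hash(original), average_hash(different)))  # 8
```

The byte-level digests of the original and the brightened copy differ, while their perceptual hashes are identical. In practice a small Hamming distance threshold, chosen by testing on real data, decides what counts as a near-duplicate.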

The technology is not expensive. Open-source libraries such as ImageHash can be deployed with minimal infrastructure cost, and commercial solutions from vendors including Cloudinary and ImageKit, both of which have Singapore-based client bases, offer automated deduplication as part of their digital asset management packages, typically starting at around S$400 per month for mid-tier plans.

For organisations yet to act, the practical starting point is an audit. IT teams should run a baseline scan to establish what percentage of their image library is duplicated, segment assets by creation date and source system, and set a retention policy before purging. The Government Technology Agency's Government on Commercial Cloud framework provides a structured approach for public agencies, while private firms can reference the Personal Data Protection Commission's advisory on data minimisation, updated in January 2026, which explicitly covers redundant file storage as a compliance consideration.
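A baseline scan of the kind described above might look like the following minimal sketch. It assumes the image library is a directory tree on disk and counts exact duplicates only, by SHA-256 of file contents; the function names, the suffix list and the report keys are illustrative choices, not part of any framework named in this article.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# ASSUMED set of extensions to treat as images; adjust to the library in question.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def baseline_scan(root: Path) -> dict:
    """Group image files by content hash and report the duplicated share."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            groups[file_digest(path)].append(path)

    total = sum(len(paths) for paths in groups.values())
    # Every copy beyond the first in a group is redundant.
    redundant = sum(len(paths) - 1 for paths in groups.values())
    return {
        "total_images": total,
        "redundant_copies": redundant,
        "duplicate_percent": 100 * redundant / total if total else 0.0,
        "duplicate_groups": [p for p in groups.values() if len(p) > 1],
    }
```

Calling `baseline_scan(Path("/path/to/library"))` returns the duplicated percentage along with the groups of identical files, which can then be segmented by creation date or source system before any purge. Near-duplicates such as resized or re-encoded copies are not detected by this approach and would need a perceptual hash.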

The arithmetic is straightforward. Fewer duplicate images mean lower storage bills, cleaner AI training data and more credible public platforms. For Singapore's ambitions as a regional data hub, that is not a minor housekeeping matter — it is a foundational one.

About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
