The Daily Singapore

Singapore's Duplicate Image Problem: The Numbers Hiding in Plain Sight

New data reveals the scale of redundant digital imagery clogging government portals, commercial platforms and public archives across the island.

By Singapore News Desk · Published 5 July 2026 at 3:00 am

Updated 5 July 2026 at 11:17 am

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence.

Photo: Cyrill on Pexels

Singapore's public and commercial digital repositories collectively hold tens of millions of images — and a significant share of them are exact or near-exact duplicates. That is the central finding emerging from audit work carried out by technology teams across several statutory boards and private platforms in the first half of 2026, as the city-state's push to rationalise its data infrastructure enters a more aggressive phase.

The timing matters. Singapore's Infocomm Media Development Authority has been driving a broader data hygiene initiative under the Smart Nation 2.0 framework, which places cloud cost efficiency and AI-readiness at the centre of government IT spending. Duplicate images are not merely an aesthetic inconvenience. They consume storage, slow down machine-learning pipelines, and distort the training datasets that underpin the AI tools Singapore is betting heavily on: repeated images are over-represented during training, and copies can leak between training and test sets.

What the Numbers Actually Show

Internal audits reviewed by technology teams at several government-linked platforms suggest that duplicate or near-duplicate images can account for between 15 and 30 percent of total image libraries, depending on how aggressively an organisation has historically enforced upload governance. For large consumer platforms, the ratio climbs higher. One analysis of a mid-sized e-commerce portal operating out of one-north, the research and business park in Buona Vista, found that roughly one in four product images was a functional duplicate of another file already stored in the same database.

The cost is not trivial. Singapore enterprises rely heavily on the major cloud providers: Amazon Web Services runs its Asia Pacific (Singapore) region here, while Google Cloud and Microsoft Azure both operate Singapore regions with local availability zones. Standard object storage on these platforms is typically priced at a few US cents per gigabyte per month, with archival tiers cheaper still. At scale, however, duplicated image libraries translate into measurable waste. A library of 10 million images averaging 2 megabytes each consumes roughly 20 terabytes. If 20 percent of those are duplicates, that is 4 terabytes of redundant data, generating recurring costs every billing cycle with no operational benefit.
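The storage figures in the paragraph can be checked with a few lines of arithmetic. This is a sketch assuming decimal units (1 TB = 1,000 GB = 1,000,000 MB) and a uniform average file size; `redundant_storage_tb` and `monthly_cost` are illustrative helpers, and the per-gigabyte price passed to `monthly_cost` is a placeholder to be replaced with a provider's current rate, not a quoted figure.

```python
# Back-of-envelope storage arithmetic for an image library containing duplicates.
# Assumes decimal units (1 TB = 1,000 GB = 1,000,000 MB) and a uniform
# average file size, as in the article's example.

def redundant_storage_tb(num_images: int, avg_size_mb: float,
                         duplicate_share: float) -> tuple[float, float]:
    """Return (total_tb, redundant_tb) for the library."""
    total_tb = num_images * avg_size_mb / 1_000_000  # MB -> TB
    redundant_tb = total_tb * duplicate_share
    return total_tb, redundant_tb


def monthly_cost(redundant_tb: float, price_per_gb_month: float) -> float:
    """Recurring monthly bill for the redundant data.

    price_per_gb_month is a placeholder: substitute your provider's rate.
    """
    return redundant_tb * 1_000 * price_per_gb_month  # TB -> GB, then price


total_tb, redundant_tb = redundant_storage_tb(10_000_000, 2.0, 0.20)
print(f"total: {total_tb:.0f} TB, redundant: {redundant_tb:.0f} TB")
# prints "total: 20 TB, redundant: 4 TB"
```

Because the charge recurs every month and scales linearly with both library size and duplicate share, the waste compounds over time even when a single month's figure looks small.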

The National Library Board, which manages digital archive programmes including the National Archives of Singapore on Canning Rise, has been among the more public-facing institutions grappling with this issue. Digitisation drives over the past decade have pulled in material from multiple sources, and without centralised deduplication checkpoints at the point of ingest, parallel copies accumulate across collections. The Archives' digital holdings now run into the hundreds of terabytes across all media types.

Detection Tools and What Comes Next

The technical approaches to deduplication have matured considerably. Perceptual hashing algorithms, which generate a compact fingerprint of an image based on its visual content rather than its raw bytes or file metadata, can identify near-duplicates that differ only in compression, resizing, or minor colour adjustment: such copies produce fingerprints that differ in only a few bits. Simple perceptual hashes are less reliable against heavy cropping, rotation or flipping. Tools built on these algorithms are now being evaluated by teams at GovTech, the government's central technology agency, as part of a broader data quality toolkit.
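To make the idea concrete, here is a minimal sketch of one common perceptual hash, the difference hash (dHash), in pure Python. It illustrates the general technique and is not the tooling GovTech is evaluating. It assumes the image has already been decoded into a grid of 0-255 grayscale values (a real pipeline would use an image library for decoding and resizing), and `downscale`, `dhash`, `hamming`, `is_near_duplicate` and its 5-bit threshold are hypothetical names and values chosen for the example.

```python
# Minimal difference-hash (dHash) sketch. Images are grayscale grids:
# a list of rows, each a list of ints in 0-255.

def downscale(pixels: list[list[int]], width: int, height: int) -> list[list[float]]:
    """Shrink the grid to width x height by averaging rectangular blocks."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)  # at least one row
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)  # at least one column
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: list[list[int]], hash_size: int = 8) -> int:
    """Fingerprint with one bit per left-vs-right brightness comparison (64 bits by default)."""
    small = downscale(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_distance: int = 5) -> bool:
    # max_distance=5 is an illustrative threshold, not a recommended value.
    return hamming(a, b) <= max_distance


# Synthetic 64x64 test images (all values stay within 0-255).
original = [[2 * x + y for x in range(64)] for y in range(64)]
# Brightened by 10 plus small deterministic "noise", standing in for re-encoding.
edited = [[2 * x + y + 10 + (x * y) % 3 for x in range(64)] for y in range(64)]
# Mirrored left-to-right: a geometric edit this hash does not tolerate.
mirrored = [list(reversed(row)) for row in original]

print(hamming(dhash(original), dhash(edited)))    # 0
print(hamming(dhash(original), dhash(mirrored)))  # 64
```

The brightened, noisy copy matches exactly, while the mirrored copy, which a person might still call the same photo, lands at the maximum distance. That trade-off is typical of simple perceptual hashes: they tolerate compression and tonal changes but not geometric edits.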

For commercial operators, the Singapore Business Federation has flagged data rationalisation as a cost-reduction lever for retail and logistics members navigating tighter margins. Product catalogue management — where duplicate imagery is endemic — is an area where automated deduplication can reduce both storage bills and the manual effort required to maintain clean listings.

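The practical advice for organisations reviewing their own image libraries is straightforward: establish a deduplication check at the point of upload rather than attempting retrospective cleanup. Retrospective audits on libraries exceeding one million files typically require dedicated compute time measured in days, not hours, even with optimised hashing, because every stored file must be read and fingerprinted and near-duplicate matching means comparing fingerprints across the whole library. A hash gate at ingest costs almost nothing operationally and prevents the problem from compounding. A SHA-256 checksum catches byte-identical re-uploads, while a perceptual hash is needed to catch copies that have been re-compressed or resized.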
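A minimal sketch of an exact-duplicate gate using Python's standard `hashlib` module is shown here. `IngestGate` and `check_and_register` are hypothetical names, and the in-memory dictionary stands in for whatever persistent index a real upload service would use. Because a SHA-256 digest changes completely if a single byte changes, this gate only catches byte-identical re-uploads; re-compressed or resized copies need a perceptual hash instead.

```python
import hashlib
from typing import Dict, Optional


class IngestGate:
    """Hypothetical upload gate that flags byte-identical images at ingest.

    Digests live in an in-memory dict here; a production service would
    persist them (for example in a database index) so checks survive restarts.
    """

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # SHA-256 hex digest -> first file id

    def check_and_register(self, file_id: str, data: bytes) -> Optional[str]:
        """Return the id of an already-stored identical file, or None.

        New content is registered under file_id; duplicates are not.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is None:
            self._seen[digest] = file_id
        return existing


gate = IngestGate()
print(gate.check_and_register("product-001.jpg", b"\xff\xd8 example bytes"))       # None
print(gate.check_and_register("product-001-copy.jpg", b"\xff\xd8 example bytes"))  # product-001.jpg
```

Hashing an upload is a single pass over its bytes plus one index lookup, which is why the check adds so little overhead compared with scanning the entire library after the fact.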

Singapore's ambition to position itself as a regional AI hub depends partly on the quality of the data assets its institutions and companies bring to the table. Redundant imagery is a solvable problem — the numbers just need to be taken seriously first.

About this article

This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing.
