The Daily Singapore

Singapore news, every day


Singapore's Duplicate Image Problem: The Numbers Driving a Digital Clean-Up

As government agencies and private platforms accelerate their AI-driven content audits, new data reveals the sheer scale of redundant imagery clogging Singapore's digital infrastructure.


By Singapore News Desk · Published 5 July 2026 at 2:45 am

4 min read

Updated 5 July 2026 at 10:17 am

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence.

Singapore's public and private digital repositories are carrying a heavier load than most users realise. Across government portals, e-commerce platforms, and media archives operating out of one-north and the Mapletree Business City cluster in Pasir Panjang, duplicate images now account for a measurable share of stored data — and the cost of ignoring them is no longer trivial.

The timing matters. Singapore's Infocomm Media Development Authority has been pushing agencies under the Digital Government Blueprint to rationalise cloud expenditure and reduce data sprawl by the end of 2026. Duplicate image files (the same photograph uploaded multiple times under different filenames, or the same product shot replicated across vendor catalogues) sit at the centre of that problem. They inflate storage bills, slow retrieval, and compromise the accuracy of the AI training datasets that Singapore's tech sector depends on.

What the Data Actually Shows

Cloud storage pricing in Singapore gives the issue a concrete dimension. Hyperscalers operating out of the Jurong West and Tuas data centre corridors typically bill enterprise clients at rates between S$0.023 and S$0.025 per gigabyte per month for standard object storage tiers. A mid-sized e-commerce operator carrying 500,000 product SKUs can accumulate duplicate image libraries running to several terabytes if uploads are not deduplicated at the point of ingestion. At those rates, each redundant terabyte costs roughly S$275 to S$300 a year in storage alone, so several terabytes of duplicates run to the low thousands of dollars annually before replication, backup copies, and data-transfer charges multiply the figure.
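The arithmetic behind the storage figure can be checked with a short calculation. The sketch below uses the per-gigabyte rates quoted above; the 5 TB duplicate volume and the decimal terabyte (1 TB = 1,000 GB) are assumptions chosen for illustration, not figures from any operator.

```python
# Back-of-envelope storage cost for duplicate images.
# The rates are the per-GB monthly figures quoted in the article;
# the 5 TB duplicate volume is a hypothetical input.

GB_PER_TB = 1_000  # assumption: decimal terabyte


def annual_duplicate_cost(duplicate_tb: float, rate_per_gb_month: float) -> float:
    """Return the yearly storage cost (S$) of carrying duplicate_tb of redundant data."""
    return duplicate_tb * GB_PER_TB * rate_per_gb_month * 12


if __name__ == "__main__":
    for rate in (0.023, 0.025):
        cost = annual_duplicate_cost(5, rate)
        print(f"5 TB at S${rate}/GB/month: S${cost:,.0f} per year")
```

The calculation covers raw storage only. Replicated copies, backups, and retrieval charges would add to it, and those vary by provider and contract.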

For government, the numbers scale differently. The Government Technology Agency, better known as GovTech and headquartered at Mapletree Business City in Pasir Panjang, oversees the central data infrastructure that feeds portals including the LifeSG app, formerly known as Moments of Life. Internal audits under the whole-of-government cloud migration programme, which moved agencies onto the Government on Commercial Cloud from 2018 onward, have repeatedly flagged unstructured data bloat as a residual challenge. Image deduplication is a specific sub-category of that broader problem.

The National Library Board, which maintains digital archives stretching back through its NewspaperSG and PhotoSG collections stored partly at the Lee Kong Chian Reference Library on Victoria Street, has run deduplication exercises on digitised historical photographs. The scale of duplication found in crowdsourced upload programmes, where the same historical image is submitted by multiple contributors, can run as high as 30 to 40 percent of total submissions in any given batch — a figure consistent with industry benchmarks published by storage research firms tracking large-scale media repositories globally.

Tools, Timelines, and What Comes Next

The commercial response is accelerating. Several startups incubated at the JTC LaunchPad cluster at one-north's Ayer Rajah Crescent have built perceptual hashing and vector-similarity tools specifically for image deduplication, positioning Singapore as a testbed for the technology before selling into larger markets in Japan and the Gulf. Perceptual hashing assigns a fingerprint to each image based on its visual content rather than its filename or metadata, so it catches resized, re-encoded, or renamed copies that a file-size or byte-level checksum comparison would miss.
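To make the idea concrete, here is a minimal sketch of one common perceptual hash, the average hash. It is not the method used by any of the startups mentioned, and it works on grayscale images supplied as lists of pixel rows (0 to 255) so that it needs no imaging library. The function names and the 8x8 grid size are illustrative choices.

```python
# Average-hash sketch on grayscale images given as lists of rows (values 0-255).
# Decoding real image files is omitted; a production pipeline would need that step.

from typing import List

Image = List[List[int]]


def shrink(img: Image, size: int = 8) -> List[List[float]]:
    """Downscale to size x size by averaging the pixels that fall in each grid cell."""
    h, w = len(img), len(img[0])
    out = []
    for gy in range(size):
        y0 = gy * h // size
        y1 = max((gy + 1) * h // size, y0 + 1)
        row = []
        for gx in range(size):
            x0 = gx * w // size
            x1 = max((gx + 1) * w // size, x0 + 1)
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def average_hash(img: Image, size: int = 8) -> int:
    """Build a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    cells = [v for row in shrink(img, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    base = [[(x * 16) % 256 for x in range(16)] for _ in range(16)]
    # The same picture at twice the resolution: different bytes, same content.
    enlarged = [[base[y // 2][x // 2] for x in range(32)] for y in range(32)]
    inverted = [[255 - v for v in row] for row in base]
    print("base vs enlarged:", hamming(average_hash(base), average_hash(enlarged)))
    print("base vs inverted:", hamming(average_hash(base), average_hash(inverted)))
```

A small Hamming distance between two fingerprints suggests the same picture. The threshold that separates a near-duplicate from a different image has to be tuned against real data.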

For platform operators and agency IT teams, the practical path forward involves three distinct steps: running a baseline audit to establish what percentage of stored images are true duplicates versus near-duplicates with minor edits; setting an automated deduplication policy at the point of upload rather than as a retrospective clean-up; and integrating image deduplication into broader data governance frameworks that already cover text and structured data.
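The first two steps can be sketched in a few lines for the simplest case, byte-identical files. The example below indexes a SHA-256 digest of each upload and keeps a running duplicate rate. The class and method names are hypothetical, the index lives in memory rather than in a database, and the sketch does not detect near-duplicates with minor edits, which need a perceptual or vector-similarity comparison.

```python
import hashlib
from typing import Dict, Optional


class UploadDeduplicator:
    """Flags byte-identical uploads by indexing a SHA-256 digest of each file's contents."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # digest -> filename of the first stored copy
        self.total = 0
        self.duplicates = 0

    def ingest(self, filename: str, data: bytes) -> Optional[str]:
        """Record a new file and return None, or return the name of the existing copy."""
        self.total += 1
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            self.duplicates += 1
            return existing
        self._seen[digest] = filename
        return None

    def duplicate_rate(self) -> float:
        """Share of ingested files that were exact duplicates of an earlier upload."""
        return self.duplicates / self.total if self.total else 0.0


if __name__ == "__main__":
    dedup = UploadDeduplicator()
    photo = b"\x89PNG-example-bytes"  # stand-in for real image data
    print(dedup.ingest("shoe_front.png", photo))          # first copy is stored
    print(dedup.ingest("IMG_0042.png", photo))            # same bytes, new filename
    print(dedup.ingest("shoe_side.png", b"other-bytes"))  # different content
    print(f"duplicate rate: {dedup.duplicate_rate():.0%}")
```

Checking at ingestion means a duplicate is never written to storage in the first place. The same counters give the baseline figure when the check is run over an existing archive.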

Singapore's Personal Data Protection Commission has also signalled through its advisory guidelines that data minimisation — keeping only what is necessary — applies to image files where those files contain identifiable individuals. That regulatory nudge gives organisations a compliance reason to act beyond pure cost savings.

The window for organisations to get ahead of this is narrowing. With GovTech's centralised cloud contracts coming up for renewal and private-sector AI projects demanding cleaner training data, the cost of carrying duplicate imagery is shifting from an abstract inefficiency to a line item that finance directors are beginning to question. The organisations that run the audit first will spend less fixing it later.



About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
