The Daily Singapore

Singapore's Duplicate Image Problem: The Numbers Driving a Digital Clean-Up Push

New data reveals the staggering scale of redundant visual assets clogging government portals, corporate databases and e-commerce platforms across the island.

By Singapore News Desk · Published 5 July 2026 at 2:44 am

Updated 5 July 2026 at 10:17 am

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence.

More than 40 percent of images stored across Singapore's major public-sector digital repositories are duplicates or near-duplicates, according to internal assessments circulated within the Smart Nation Group earlier this year. The figure, drawn from a review of assets held across GovTech-managed platforms, points to a problem that costs both money and processing time, and one that Singapore's push toward leaner, AI-ready infrastructure is finally forcing agencies to confront.

The timing matters. Singapore is midway through its Digital Government Blueprint refresh cycle, with agencies under pressure to migrate legacy data to cloud environments before the end of financial year 2027. Duplicate image files are not a cosmetic nuisance. They inflate storage costs, slow retrieval for citizen-facing services and, crucially, degrade the training datasets that government and private-sector teams are feeding into large language and vision models. Garbage in, as engineers at one-north's Fusionopolis complex have repeatedly noted in public-sector tech forums, is still garbage out.

What the Numbers Actually Show

The scale of the issue becomes clearer when broken down by sector. In e-commerce, Lazada and Shopee sellers operating out of warehouses in Jurong and Chai Chee collectively upload an estimated 2.3 million new product images every month on the Singapore storefronts alone, based on figures the platforms have disclosed in regional seller briefings. Independent audits of mid-sized merchant catalogues, those carrying between 500 and 5,000 SKUs, routinely find duplication rates of 18 to 35 percent, typically arising from re-uploads after listing errors, seasonal re-promotions, or simple organisational lapses.

On the public side, the National Heritage Board, which manages digital collections spanning the National Museum on Stamford Road and the Asian Civilisations Museum along the Singapore River, has acknowledged in its annual report for FY2025 that rationalising digital asset management is a priority goal for the current financial year. The board's digitisation programme has produced over 1.2 million catalogued items since 2018, a volume at which manual deduplication is no longer viable.

The cost arithmetic is straightforward. Cloud storage in the Singapore region of AWS, the dominant provider for both government-linked companies and private enterprise here, runs at roughly S$0.025 per gigabyte per month for standard-tier object storage. A corpus of one million redundant JPEG files, each averaging 3 MB, represents approximately 3 terabytes of wasted capacity and a monthly bill of around S$75, or about S$900 a year, before data transfer charges. At the same rate, 50 million redundant images would cost roughly S$45,000 a year, and the annual waste reaches six figures only once the redundant files number well over 100 million.
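
The calculation can be reproduced in a few lines of Python. This is a rough sketch rather than a billing tool: the price and file size are the figures quoted above, the function names are illustrative, and it assumes decimal units (1 GB = 1,000 MB) and ignores transfer and request charges.

```python
# Figures quoted in the article; actual cloud bills vary by tier and contract.
PRICE_PER_GB_MONTH_SGD = 0.025  # standard-tier object storage
AVG_FILE_MB = 3                 # average size of one JPEG


def monthly_waste_sgd(redundant_files: int,
                      avg_file_mb: float = AVG_FILE_MB,
                      price_per_gb: float = PRICE_PER_GB_MONTH_SGD) -> float:
    """Monthly storage cost of redundant files, assuming 1 GB = 1,000 MB."""
    wasted_gb = redundant_files * avg_file_mb / 1000
    return wasted_gb * price_per_gb


def annual_waste_sgd(redundant_files: int) -> float:
    """Twelve months of the same charge, with no growth in the archive."""
    return 12 * monthly_waste_sgd(redundant_files)


if __name__ == "__main__":
    files = 1_000_000
    print(f"Monthly: S${monthly_waste_sgd(files):,.2f}")
    print(f"Annual:  S${annual_waste_sgd(files):,.2f}")
```

Changing `files` to an organisation's own estimate of redundant images gives a first-order figure to set against the cost of running an audit.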

Automated Detection and What Comes Next

The practical response is accelerating. Perceptual hashing, a technique that generates a compact fingerprint for each image so that near-identical pairs can be flagged regardless of file name or metadata, has moved from research papers to production deployments at several Singapore-based firms. Razer, headquartered at one-north, and Sea Group, whose offices occupy towers in the Mapletree Business City cluster near Pasir Panjang, both use automated deduplication pipelines as standard practice in their digital asset workflows, according to publicly available engineering blog posts from both companies.
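
To make the idea concrete, here is a minimal sketch of one common perceptual hash, the difference hash (dHash). It is an illustration only: the sources do not say which algorithm any of the named companies use, the function names are made up for this example, and the input is assumed to be an already decoded grayscale image (a list of pixel rows, at least 9 pixels wide and 8 tall) rather than a JPEG file.

```python
from typing import List

Gray = List[List[float]]  # rows of grayscale pixel values


def shrink(pixels: Gray, width: int, height: int) -> Gray:
    """Downsample by averaging the source pixels that fall in each target cell."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Gray, hash_size: int = 8) -> int:
    """Fingerprint of hash_size * hash_size bits: 1 where a cell is brighter than its right neighbour."""
    small = shrink(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def near_duplicates(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits.

    The threshold of 5 is an assumed starting point, not a standard value.
    """
    return hamming(dhash(a), dhash(b)) <= max_distance
```

Because the fingerprint records only whether each region is brighter than its neighbour, a copy that has been uniformly brightened or saved under a different file name produces the same hash, while a genuinely different picture differs in many bits.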

For smaller operators, the Infocomm Media Development Authority's SMEs Go Digital programme, which ran its latest cohort intake in March 2026, includes pre-approved digital asset management vendors whose tools incorporate deduplication as a baseline feature. Grants under the scheme cover up to 50 percent of qualifying software costs, capped at S$30,000 per company.

The practical advice for any organisation sitting on a large image library is to run a perceptual hash audit before the next cloud contract renewal, not after. Storage costs are only part of the equation. With Singapore firms increasingly building or fine-tuning AI vision models on their own proprietary datasets, the quality of that underlying image corpus is becoming a direct competitive variable. A catalogue padded with duplicates does not just waste server space. It warps the model, because every repeated image is counted again during training and so carries more weight than it should. That is a problem measured not in gigabytes, but in the reliability of every automated decision the system makes downstream.
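
In outline, such an audit groups files whose fingerprints are close and reports what share of the library is redundant. The sketch below assumes the 64-bit hashes have already been computed and are held in a dictionary keyed by file name. The function names, the 5-bit threshold and the greedy grouping are illustrative choices, and the pairwise comparison would be too slow for millions of images without an index.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def audit(hashes: Dict[str, int], max_distance: int = 5) -> Dict[str, List[str]]:
    """Greedy grouping: each file joins the first kept file within max_distance bits.

    Returns a mapping from each kept file to the files judged to duplicate it.
    """
    groups: Dict[str, List[str]] = {}
    for name, value in hashes.items():
        for keeper in groups:
            if hamming(hashes[keeper], value) <= max_distance:
                groups[keeper].append(name)
                break
        else:
            # No existing keeper is close enough, so this file starts a new group.
            groups[name] = []
    return groups


def duplicate_rate(groups: Dict[str, List[str]]) -> float:
    """Share of all files that duplicate some kept file."""
    duplicates = sum(len(d) for d in groups.values())
    total = len(groups) + duplicates
    return duplicates / total if total else 0.0
```

The same grouping can be used before model training: keeping one file per group removes the repeated examples that would otherwise be over-weighted.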

About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
