The Daily Singapore

Singapore's Duplicate Image Problem: The Numbers Driving a Digital Storage Crisis

Government agencies and businesses are drowning in redundant visual data — and the scale of the waste is only now becoming clear.

By Singapore News Desk · Published 5 July 2026 at 2:43 am

Updated 5 July 2026 at 10:17 am

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence. Read our editorial standards →

Singapore's public and private sector databases collectively hold tens of millions of duplicate digital images, a problem that costs organisations measurable money every year and is quietly undermining the city-state's ambition to run lean, AI-ready data infrastructure. The issue sits at the intersection of two converging pressures: explosive growth in image-heavy digital records and a push by agencies to clean their datasets before feeding them into machine learning pipelines.

The timing matters because Singapore's Smart Nation and Digital Government Office has been accelerating AI adoption across government ministries since 2024. Dirty data, of which duplicate images are among the most common forms, degrades model accuracy and inflates cloud storage bills at the same time. One industry benchmark widely cited in data engineering circles puts the share of duplicate or near-duplicate images in large unstructured enterprise archives at between 20 and 35 percent. Apply that range to an organisation storing 50,000 product photos, identity documents, or property images and roughly 10,000 to 17,500 of those files are redundant. At that point the redundancy problem becomes a line item.

What the Numbers Actually Look Like on the Ground

The Housing and Development Board manages image records for more than one million residential flats across towns from Woodlands to Tampines. Property listings, resale flat inspections, renovation permits and estate maintenance requests all generate photographs. When the same flat is listed, inspected and re-listed over several years, near-identical images accumulate across different case files without automated deduplication in place. The HDB did not provide a specific figure for duplicate image volumes when contacted for this story, but the structural conditions — high transaction turnover, multi-department workflows, legacy document management systems — are textbook generators of image redundancy.

The Inland Revenue Authority of Singapore and the Ministry of Manpower face analogous pressures on the identity document side. Passport photos, work pass headshots, and supporting application images are submitted repeatedly across different agencies, often as separate uploads rather than through a centralised pull from a single verified source. The National Digital Identity framework, anchored by Singpass, was partly designed to reduce exactly this kind of re-submission friction, but full integration across all document workflows remains incomplete as of mid-2026.

In the private sector, e-commerce platforms operating out of one-north and the Mapletree Business City cluster near Alexandra Road deal with duplicate product imagery at scale. A single SKU, whether a bottle of shampoo or a kitchen appliance, can have dozens of near-identical images uploaded by different sellers or at different resolutions. Platforms that have run internal deduplication audits typically report storage reductions of 15 to 28 percent after a first pass, according to data engineering practitioners familiar with local deployments. At current AWS Singapore regional pricing of roughly USD 0.025 per gigabyte per month for standard S3 storage, a company sitting on 100 terabytes of images pays about USD 2,500 a month to store them and could save roughly USD 375 to USD 700 a month simply by removing confirmed duplicates, before any compression or tiering strategy is applied.
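The storage arithmetic can be checked with a few lines of Python. This is a back-of-envelope sketch only: it uses the per-gigabyte price and the 15 to 28 percent reduction range quoted above, assumes decimal terabytes (1 TB = 1,000 GB), and ignores tiered pricing, request charges and currency conversion. The function name is illustrative.

```python
def monthly_saving_usd(archive_tb, price_per_gb_month, duplicate_share):
    """Estimate the monthly storage saving from deleting confirmed duplicates."""
    archive_gb = archive_tb * 1000  # assumption: decimal terabytes
    monthly_bill = archive_gb * price_per_gb_month
    return monthly_bill * duplicate_share


PRICE_PER_GB_MONTH = 0.025  # approximate figure quoted in the article, in USD

full_bill = monthly_saving_usd(100, PRICE_PER_GB_MONTH, 1.0)
low = monthly_saving_usd(100, PRICE_PER_GB_MONTH, 0.15)
high = monthly_saving_usd(100, PRICE_PER_GB_MONTH, 0.28)

print(f"Full monthly bill: USD {full_bill:,.0f}")
print(f"Estimated saving: USD {low:,.0f} to USD {high:,.0f} per month")
```

Because the saving is a fixed share of the bill, it scales linearly with archive size: a 1-petabyte archive at the same price would save ten times as much.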

Detection Tools and What Organisations Are Doing About It

Perceptual hashing — a technique that generates a compact fingerprint from an image's visual content rather than its file metadata — has become the standard first-line tool for duplicate detection. Libraries such as ImageHash and open-source pipelines built on Python can process several thousand images per minute on modest hardware. The more demanding problem is near-duplicate detection: images of the same subject taken seconds apart, or the same document scanned at slightly different angles. That requires embedding-based similarity search, the kind of vector database work that companies at the Jurong Innovation District and AI Singapore's 100E Pasir Panjang Road campus have been actively building capability around since 2023.
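To illustrate the idea behind perceptual hashing, here is a minimal sketch of a difference hash in plain Python. It is a simplified teaching version, not the ImageHash library's implementation: it assumes each image has already been converted to a small grid of grayscale values (real pipelines typically downscale to something like 8 rows by 9 columns first), and the function names and sample grids are hypothetical.

```python
def dhash(pixels):
    """Difference hash: one bit per pair of horizontally adjacent pixels.

    `pixels` is a list of rows of grayscale values, assumed to be already
    downscaled. A bit is 1 when the left pixel is brighter than the right.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Tiny 2x3 grids stand in for downscaled images (illustrative data only).
original = [[10, 20, 30], [90, 60, 30]]
brighter = [[15, 25, 35], [95, 65, 35]]   # same picture, uniformly brighter
different = [[30, 20, 10], [30, 60, 90]]  # gradients reversed

print(hamming(dhash(original), dhash(brighter)))   # distance 0: likely duplicate
print(hamming(dhash(original), dhash(different)))  # distance 4: every bit differs
```

Because the hash encodes relative brightness rather than raw bytes, a re-saved or uniformly brightened copy keeps the same fingerprint. A small Hamming-distance threshold then flags likely duplicates, while images taken seconds apart or scanned at different angles usually need the embedding-based comparison described above.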

Organisations that have not yet audited their image archives should start with a storage inventory: identifying which systems hold unstructured image data, what formats are in use, and when files were last accessed. Files that have gone untouched for more than 24 months and fall below a defined resolution threshold are the lowest-risk candidates for automated deduplication review. Singapore's Personal Data Protection Commission guidelines require that images containing identifiable individuals be handled under data minimisation principles, which adds both a legal incentive and a compliance framework for organisations to act. The commission's advisory on data protection by design, updated in January 2025, specifically references storage reduction as a concrete implementation of that principle. The cost savings are real. So is the regulatory nudge. The question for most organisations is no longer whether to run deduplication. It is how long they can afford to wait.
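The first-pass filter described above can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed method: the `ImageRecord` fields, the 730-day cutoff standing in for 24 months, and the one-megapixel threshold are all hypothetical choices that an organisation would set for itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ImageRecord:
    path: str
    last_accessed: datetime
    width: int
    height: int


def review_candidates(records, now, max_pixels, stale_days=730):
    """Return records untouched for longer than `stale_days` and below `max_pixels`."""
    cutoff = now - timedelta(days=stale_days)
    return [
        r for r in records
        if r.last_accessed < cutoff and r.width * r.height < max_pixels
    ]


# Illustrative inventory; paths and dates are made up.
inventory = [
    ImageRecord("a.jpg", datetime(2023, 1, 10), 640, 480),    # stale and small
    ImageRecord("b.jpg", datetime(2026, 3, 1), 640, 480),     # recently accessed
    ImageRecord("c.jpg", datetime(2022, 5, 1), 4000, 3000),   # stale but high resolution
]

candidates = review_candidates(inventory, now=datetime(2026, 7, 5), max_pixels=1_000_000)
print([r.path for r in candidates])
```

The output is a shortlist for review rather than a deletion list. Images containing identifiable individuals would still need to be handled under the data protection obligations mentioned above.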

About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
