The Daily Singapore


Singapore's Digital Archives Push Tackles Duplicate Image Problem This Week

Government agencies and tech firms accelerated efforts to clean up redundant visual data across public databases, with new tools and a pilot programme drawing attention in the first week of July.

By Singapore News Desk · Published 5 July 2026 at 2:45 am

Updated 5 July 2026 at 10:17 am

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence.

Singapore's infocomm authorities moved this week to address a quietly growing problem in the city-state's digital infrastructure: thousands of duplicate images clogging government and institutional databases, slowing retrieval systems and inflating storage costs across the public sector. The issue surfaced prominently in discussions at the Infocomm Media Development Authority's ongoing Digital Infrastructure Review, which entered a new phase in early July 2026.

The problem is more consequential than it sounds. As Singapore pushes to position itself as a regional AI and data hub, anchored by facilities like the National Supercomputing Centre on Queenstown's Science Park Drive and data centres clustered in Jurong, the cleanliness of underlying datasets directly affects the reliability of machine-learning models trained on them. Duplicate or near-duplicate images over-weight repeated examples during training, which can bias a model, and they waste compute and storage along the way. AI researchers and database administrators have flagged the concern with increasing urgency over the past 18 months.

What Happened This Week

On Tuesday, the Smart Nation and Digital Government Office confirmed it was piloting an automated duplicate-detection framework across three statutory boards, with the National Library Board among the participants. The NLB holds digitised collections spanning decades of Singapore's documentary heritage (photographs, newspaper clippings, maps), stored and indexed through its NewspaperSG and other online portals. Some of those assets have been scanned multiple times across different digitisation drives, resulting in near-identical files with minor resolution or compression differences. Standard deduplication tools, which typically look for exact file matches, have historically failed to catch them.

The new framework, developed in partnership with a local deep-tech firm based at one-north's Fusionopolis cluster, uses perceptual hashing combined with convolutional neural network comparisons to flag images that look the same even when the underlying files and their metadata differ. The pilot runs through September 2026 across an estimated 4.2 million image files, according to programme documentation circulated at this week's review briefing. Early internal testing reportedly reduced redundant storage load by around 18 percent in a sandboxed environment, though those figures have not yet been independently verified.
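For readers unfamiliar with the technique, the idea behind perceptual hashing can be shown in a minimal sketch. The Python below is not the pilot's framework, whose internals have not been published. It is a simple "difference hash" written for illustration, with hypothetical function names and synthetic pixel data, and it leaves out the neural-network comparison stage entirely. The point is that a hash built from relative brightness between neighbouring pixels stays the same when a scan is uniformly brighter, whereas an exact file checksum would change.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def resize_nearest(img: Gray, width: int, height: int) -> Gray:
    """Downscale by nearest-neighbour sampling (a stand-in for a real resampler)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(img: Gray, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    small = resize_nearest(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic data (an assumption for illustration): a 36x32 gradient,
# a uniformly brighter "re-scan" of it, and a mirrored image.
original = [[x * 5 + y * 2 for x in range(36)] for y in range(32)]
brighter = [[value + 10 for value in row] for row in original]
flipped = [row[::-1] for row in original]

print(hamming(dhash(original), dhash(brighter)))  # 0: treated as the same image
print(hamming(dhash(original), dhash(flipped)))   # 64: every bit differs
```

In practice a small Hamming distance, not only zero, would be treated as a likely duplicate, and the cut-off would need tuning against real scans.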

Beyond the library system, the Housing and Development Board has separately flagged duplicate imagery as a maintenance issue within its estate photo documentation system. HDB estate managers photograph common areas from Tampines to Buona Vista for maintenance records, and staff have reported that re-uploads following system migrations — particularly after the 2024 consolidation of town council digital platforms — created significant redundancy. HDB did not release a specific figure for the number of affected files this week.

Why It Matters for Residents and Businesses

The financial dimension is real. Cloud storage is not cheap, even at government-negotiated rates. Singapore's public sector cloud bill has grown considerably since the shift towards commercial cloud platforms accelerated under the Digital Government Blueprint, first published in 2018. Trimming redundant image data is one of the lower-effort ways to claw back storage expenditure without cutting programmes.

For private-sector companies — particularly the media agencies, e-commerce platforms and property portals clustered along Cecil Street and in the Tanjong Pagar business district — the week's developments carry a practical signal. The IMDA is expected to release updated guidelines on image data hygiene for businesses that operate under its digital trust framework before the end of the third quarter. Companies that sell or license image datasets to public agencies will likely face stricter deduplication requirements as part of procurement conditions.

For organisations reviewing their own image libraries, the practical upshot is straightforward. Open-source tools like pHash and ImageHash have been available for years; the Singapore pilot is testing whether locally calibrated neural models outperform such generic solutions on the types of imagery that dominate public-sector collections here, such as tropical outdoor environments, HDB corridors and hawker centres. Results from the pilot, which runs through September, will inform whether the framework is expanded across all 16 ministries. Organisations running their own audits should not wait for the government timeline: the IMDA's existing Data Quality guidelines, updated in 2025, already recommend periodic deduplication checks as best practice for any dataset exceeding one million files.
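As a rough illustration of what such a periodic check involves, the sketch below groups files whose perceptual hashes are within a few bits of each other. It assumes every file already has an integer hash from a tool such as ImageHash. The file names, the hash values and the 4-bit threshold are all hypothetical, and nothing here is drawn from the IMDA guidelines or the government pilot.

```python
from itertools import combinations
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def group_near_duplicates(hashes: Dict[str, int], max_distance: int = 4) -> List[List[str]]:
    """Return groups of file names whose hashes are within max_distance bits."""
    parent = {name: name for name in hashes}

    def find(name: str) -> str:
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Compare every pair; merge the two groups when the hashes are close.
    for a, b in combinations(hashes, 2):
        if hamming(hashes[a], hashes[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in hashes:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one member are duplicate candidates.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical hashes for four files; the first two differ by one bit.
demo = {
    "scan_1998.tif": 0b1111000011110000,
    "scan_2011.jpg": 0b1111000011110001,
    "map.png": 0b0000111100001111,
    "corridor.jpg": 0b0000000000000000,
}
print(group_near_duplicates(demo))  # [['scan_1998.tif', 'scan_2011.jpg']]
```

Comparing every pair of files is quadratic in the number of files, so a collection above the one-million mark would need an index or bucketing scheme rather than this brute-force loop. Flagged groups are candidates for human review, not automatic deletion.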

About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing.
