The Daily Singapore

Singapore news, every day

Singapore Agencies Push to Stamp Out Duplicate Images in Public Records and Digital Archives This Week

A quiet but consequential effort to clean up duplicated visual data across government platforms is gaining pace, with real implications for AI training sets and public-sector efficiency.

By Singapore News Desk · Published 5 July 2026 at 3:45 am

Updated 5 July 2026 at 12:01 pm

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence.

Photo by CK Seng on Pexels

Singapore's Infocomm Media Development Authority flagged duplicate image contamination as a live operational problem this week, as multiple public-sector agencies moved to audit their digital repositories ahead of a third-quarter deadline tied to the Smart Nation 2.0 infrastructure refresh. The issue is more prosaic than it sounds — and more expensive.

Duplicate images clog storage, distort AI training datasets, and inflate licensing costs. For a city-state positioning itself as a regional hub for artificial intelligence and data services, messy back-end repositories are a reputational and commercial liability. The urgency increased after the Government Technology Agency, known as GovTech, circulated an internal guidance note — confirmed by people familiar with the matter but not publicly released as of Saturday — encouraging statutory boards to conduct duplicate-detection sweeps before migrating legacy data to the new Whole-of-Government data platform.

What Triggered the Week's Activity

The immediate catalyst was a routine infrastructure audit at the National Library Board's Lee Kong Chian Reference Library on Victoria Street, where archivists discovered that digitised historical photographs uploaded to the NLB's online portal, BookSG, and its successor platforms included a significant share of duplicates. Some image batches, particularly those sourced from heritage collections scanned between 2018 and 2022, had been ingested multiple times during platform migrations, according to a person familiar with the audit who was not authorised to speak on record. The NLB declined to provide specific figures before publication.

That finding reverberated because it is not isolated. The Housing and Development Board, which maintains hundreds of thousands of images tied to flat listings, renovation permits, and estate documentation, has been running its own deduplication exercise since May. HDB's digital estate covers more than one million residential units across towns from Tampines to Bukit Batok, and image duplication at that scale creates non-trivial storage overhead. Object storage costs on commercial cloud platforms have hovered around US$0.023 per gigabyte per month for standard tiers, a cost that compounds quickly when duplicated assets are stored again in every backup cycle.
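To make the compounding concrete, here is a rough back-of-envelope sketch. Only the US$0.023 per gigabyte per month price comes from the figure quoted above; the duplicate volume, the number of backup copies, and the function name are illustrative assumptions, not numbers from HDB or any other agency.

```python
# Standard-tier object storage price quoted in the article (USD per GB per month).
PRICE_PER_GB_MONTH = 0.023


def duplicate_storage_cost(
    duplicate_gb: float,
    backup_copies: int = 0,
    months: int = 12,
    price_per_gb_month: float = PRICE_PER_GB_MONTH,
) -> float:
    """Estimate the cost of holding redundant data over a period.

    Assumes every backup copy stores the duplicates in full, so the billed
    volume is the primary copy plus one full copy per backup.
    """
    stored_gb = duplicate_gb * (1 + backup_copies)
    return stored_gb * price_per_gb_month * months


# Hypothetical example: 10,000 GB of redundant images, kept in 3 backup copies,
# billed for 12 months.
yearly_cost = duplicate_storage_cost(10_000, backup_copies=3, months=12)
print(f"US${yearly_cost:,.2f}")
```

Under those assumed inputs, the redundant copies alone would cost about US$11,040 a year. Real bills depend on storage tier, backup retention policy, and whether the backup system already deduplicates.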

GovTech has been testing perceptual hashing tools as part of its data-quality toolkit. Such software generates a fingerprint for each image and flags near-identical copies even when file names, formats, or compression settings differ. The technology is not new. What is new is the institutional pressure to deploy it systematically, driven partly by the AI governance frameworks Singapore published in 2024 and updated earlier this year, which require agencies to document the provenance and integrity of data used in automated decision-making.
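For readers curious how this kind of fingerprinting works, the sketch below implements one of the simplest perceptual hashes, commonly called an average hash, in plain Python. It is an illustration only: the article's sources do not say which algorithm or tools GovTech is testing, and the function names, the 8x8 grid size, and the distance threshold of 5 are all assumptions chosen for the example. Images are represented as lists of grayscale pixel rows so that the code needs no image library.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def shrink(pixels: Grid, size: int = 8) -> List[List[float]]:
    """Downsample to size x size by averaging rectangular blocks.

    Assumes the image is at least `size` pixels in each dimension.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Flag two images whose fingerprints differ in at most max_distance bits."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical test images: a horizontal gradient, a slightly brightened copy
# of it, and an unrelated vertical gradient.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(255, v + 10) for v in row] for row in original]
unrelated = [[y * 16 for _ in range(16)] for y in range(16)]

print(is_near_duplicate(original, brighter))   # True
print(is_near_duplicate(original, unrelated))  # False
```

Because the fingerprint is based on the coarse brightness pattern and not the exact bytes, the brightened copy matches the original while the unrelated image does not. Production tools generally use more robust variants and need a threshold tuned to the collection.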

Why This Matters Beyond Housekeeping

The stakes are higher than they appear on a storage invoice. Singapore's AI strategy depends on clean, well-labelled local datasets. Duplicated images in government training sets produce models that overfit to repeated examples, skewing outputs in subtle ways that are hard to detect downstream. The Smart Nation and Digital Government Office has been explicit, in published policy documents, that data quality is a prerequisite for responsible AI deployment in public services.

For the private sector, the government's deduplication drive carries practical signals. Companies in the Marina Bay and one-north tech clusters that supply data-management services to public agencies say procurement conversations have shifted in recent weeks, with buyers asking more detailed questions about deduplication capabilities and audit trails. None of those conversations have produced announced contracts as of this weekend.

Citizens are unlikely to notice any immediate change in the services they use. MyInfo, the government's personal-data platform, and Singpass-linked services already operate with tighter data-quality controls. The clean-up work is happening in the layers that support back-office functions — archiving, planning, estate management — rather than in consumer-facing applications.

The practical advice for anyone dealing with public-sector digital submissions right now is straightforward: label image files with unique identifiers before uploading, avoid resubmitting documents from earlier applications, and check agency portals for updated file-naming conventions, several of which were quietly revised this month. GovTech has indicated it will publish updated developer guidelines covering image submission standards before the end of July.
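One way to follow the first piece of advice is to derive the identifier from the file's contents, so that the same image always gets the same name and a different image never does. The sketch below is an assumption-laden illustration, not an agency convention: the function names, the 16-character identifier length, and the `name-identifier.ext` pattern are hypothetical, and submitters should defer to whatever naming rules an agency portal actually publishes.

```python
import hashlib
from pathlib import Path


def content_id(data: bytes, length: int = 16) -> str:
    """Return the first `length` hex characters of the SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()[:length]


def unique_name(original_name: str, data: bytes) -> str:
    """Build a file name of the form <stem>-<content id><lowercased extension>."""
    p = Path(original_name)
    return f"{p.stem}-{content_id(data)}{p.suffix.lower()}"


# Hypothetical usage with stand-in bytes; a real script would read the file,
# for example with Path("Floorplan.JPG").read_bytes().
print(unique_name("Floorplan.JPG", b"abc"))
```

A content-derived name catches only byte-for-byte identical files. A resized or recompressed copy would get a different identifier, which is the gap perceptual hashing is meant to cover.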

About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
