The Daily Singapore

Singapore news, every day

How Singapore's Digital Records Ended Up Full of Duplicate Images — and What's Being Done to Fix It

Years of rapid digitisation across government agencies and public housing databases left a sprawling mess of repeated files; cleaning it up is now a national data-integrity priority.

By Singapore News Desk · Published 5 July 2026 at 2:51 am

Updated 5 July 2026 at 10:30 am

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence. Read our editorial standards →

Singapore's push to digitise everything — from HDB flat inspection reports to hawker-centre licensing documents — has produced an unintended consequence: tens of thousands of duplicate images clogging the databases of public agencies, slowing retrieval systems and inflating storage costs across the civil service. The problem did not emerge overnight. It is the accumulated result of more than a decade of digitisation drives that prioritised speed of upload over data hygiene.

The issue matters today because Singapore is in the middle of repositioning itself as a regional AI and data hub, a goal that depends on the quality of the underlying data that government systems hold. Dirty data, including repeated image files with inconsistent metadata, undermines machine-learning models before they are even trained. The Government Technology Agency of Singapore, known as GovTech, has flagged data standardisation as a prerequisite for the public-sector AI projects it is rolling out under the Smart Nation 2.0 framework announced in 2024.

How the Backlog Built Up

The duplication problem has roots in the mid-2010s, when agencies began scanning paper records in bulk. The Housing & Development Board, which manages more than one million residential flats across estates from Punggol to Queenstown, digitised decades of floor-plan photographs, renovation-permit images and structural-inspection shots in parallel batches. Because different departments used different naming conventions and uploaded to separate servers, the same photograph could enter the system three or four times under different file names. A 2023 review by the Public Service Division identified this kind of siloed digitisation as a systemic vulnerability, though the full scale of duplication across all agencies has not been made public.

The National Library Board faced a similar reckoning with its digital archive at the National Archives of Singapore on Canning Rise. Newspaper photograph collections and oral-history session images digitised before 2018 contained significant overlap because scanning was outsourced to multiple vendors without a unified deduplication protocol. The NLB subsequently invested in hash-based deduplication software as part of its ArchiveSG refresh, which began in earnest in 2022.
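To illustrate what hash-based deduplication means in practice, here is a minimal sketch in Python. It is not the NLB's or HDB's actual tooling, which the sources do not describe; the function names and sample file names are hypothetical. The idea is that files with identical bytes produce the same SHA-256 digest regardless of what they are called, so the same photograph uploaded under several names can be grouped together.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def group_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its contents. This is a hypothetical
    in-memory stand-in for files read from separate servers.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_digest[content_digest(data)].append(name)
    # A digest seen more than once means the same bytes were stored repeatedly.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical sample: two names for one image, plus one distinct image.
    sample = {
        "floorplan_blk12.jpg": b"same image bytes",
        "BLK12-plan-scan.jpg": b"same image bytes",
        "inspection_0043.jpg": b"different image bytes",
    }
    print(group_duplicates(sample))
```

Exact hashing of this kind only catches byte-identical copies. Two separate scans of the same paper photograph will usually differ by at least a few bytes and would not be matched.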

Costs compound the problem. Government cloud storage in Singapore is procured through the Government Commercial Cloud framework, and redundant files translate directly into redundant expenditure. Industry benchmarks suggest that unmanaged duplication can inflate storage requirements by between 20 and 40 per cent in large document repositories — a range that, applied to the public sector's scale, represents material budget waste at a time when ministries are managing tighter operational budgets after the post-pandemic spending cycle.
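The 20 to 40 per cent range can be turned into a share of total storage with simple arithmetic. The sketch below assumes, as an interpretation not stated in the benchmarks, that the inflation figure is measured against the deduplicated baseline, so that total storage equals unique data multiplied by one plus the inflation rate. The function name is hypothetical.

```python
def duplicate_share(inflation: float) -> float:
    """Fraction of total stored bytes that are duplicates.

    Assumes `inflation` is measured against the deduplicated baseline,
    so total = unique * (1 + inflation).
    """
    if inflation < 0:
        raise ValueError("inflation must be non-negative")
    return inflation / (1.0 + inflation)


if __name__ == "__main__":
    for inflation in (0.20, 0.40):
        share = duplicate_share(inflation)
        print(f"{inflation:.0%} inflation -> {share:.1%} of storage is duplicate")
```

Under that assumption, roughly 17 to 29 per cent of a repository's total footprint would be reclaimable.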

What Comes Next for Agencies and Residents

GovTech is now piloting an automated image-deduplication layer within the Whole-of-Government Data Architecture, which is designed to sit upstream of any agency-level upload portal. Under this approach, a perceptual-hashing algorithm flags near-identical images before they are committed to long-term storage, routing them to a human reviewer rather than simply deleting them — a safeguard against losing genuinely distinct files that happen to look similar.
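The flag-and-review flow described above might look something like the following sketch. The sources do not say which perceptual-hashing algorithm GovTech uses, so this example substitutes a simple average hash as one illustrative option. The function names, the threshold of 5 bits and the routing labels are all assumptions. The input is a small grayscale grid (for example 8x8) that a real pipeline would first produce by downscaling the image with an imaging library.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Compute an average hash from a small grayscale grid (e.g. 8x8).

    Each bit is 1 when the pixel is brighter than the grid's mean.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def route_upload(new_hash: int, stored: dict[str, int],
                 threshold: int = 5) -> tuple[str, list[str]]:
    """Send near-matches to a human reviewer instead of deleting them.

    `threshold` (hypothetical value) is the maximum number of differing
    bits for two images to count as near-identical.
    """
    matches = [name for name, h in stored.items()
               if hamming_distance(new_hash, h) <= threshold]
    if matches:
        return "human_review", sorted(matches)
    return "commit", []


if __name__ == "__main__":
    # Synthetic 8x8 gradient, a slightly brightened copy, and an inverted copy.
    base = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
    brighter = [[min(255, p + 3) for p in row] for row in base]
    inverted = [[255 - p for p in row] for row in base]

    stored = {"existing.jpg": average_hash(base)}
    print(route_upload(average_hash(brighter), stored))
    print(route_upload(average_hash(inverted), stored))
```

Unlike a cryptographic hash, a perceptual hash changes only slightly when an image is re-scanned, re-compressed or brightened, which is why a distance threshold is used rather than an exact match. Routing matches to a reviewer rather than deleting them reflects the safeguard the pilot describes.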

For ordinary Singaporeans, the practical effect will be felt most in interactions with MyInfo and the Singpass app, where the same documents (identity photographs, property images, supporting files for grant applications) have historically been uploaded again by applicants for each new transaction without the system recognising them as identical. Streamlining that backend means faster verification and fewer requests for re-submission.

Businesses on the CorpPass system, particularly the roughly 570,000 registered entities that interact with government licensing portals, stand to benefit from shorter processing queues once legacy duplicate files are purged and active queues run on cleaner data.

The deduplication work is unglamorous and largely invisible to the public. But it is the kind of foundational remediation that determines whether Singapore's AI ambitions run on solid ground or on a landfill of repeated files. Getting the data right now, before the next wave of AI procurement begins, is the more pragmatic path — and agencies appear to know it.

About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
