Singapore's accelerating drive to digitise public records hit a practical wall this week when technology teams across several government agencies flagged a growing problem with duplicate images embedded in shared databases — redundant files that are inflating storage costs, degrading search performance, and complicating the rollout of AI-powered public services scheduled for later this year.
The timing is sensitive. The Smart Nation and Digital Government Office has set a year-end target for expanding AI-assisted citizen services across platforms including Singpass and LifeSG, both of which draw on consolidated image repositories for identity verification and document processing. Duplicate image records, which can number in the tens of thousands once scanning backlogs from legacy paper systems are processed, slow retrieval times and, in some cases, cause verification systems to return conflicting results.
Where the Problem Is Showing Up
The duplication issue has been most visible in two places. At the National Library Board's digitisation facility on Victoria Street, archivists working on the NewspaperSG expansion project discovered this month that a batch of scanned historical images uploaded between March and May 2026 contained a duplication rate estimated internally at roughly 12 percent, meaning more than one in ten images had been stored at least twice, sometimes in different resolutions. Separately, the Housing & Development Board's document management system, which handles renovation permit drawings and flat inspection photos for estates from Tampines to Bukit Batok, is understood to have triggered an internal audit after storage consumption jumped unexpectedly in the second quarter.
Neither the National Library Board nor HDB has issued a public statement on the matter this week, and the figures cited above have not been confirmed in official releases. The Daily Singapore is seeking responses from both agencies.
The technical cause is not exotic. Multiple teams often scan the same physical document, a common occurrence when both a regional archive and a central repository independently process the same batch. Deduplication tools that should catch the overlap sometimes fail to reconcile files stored in different formats, such as TIFF versus JPEG: a tool that relies on exact file matching sees two different byte streams, even though the page they depict is the same. The result is a dataset that looks complete but carries significant redundancy.
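For readers who want to see why format differences defeat exact matching, the sketch below is a minimal illustration and not the method any agency is known to use. It assumes each scan has already been decoded and shrunk to a tiny grayscale thumbnail (the 4x4 grids and all function names here are hypothetical). It compares an exact SHA-256 digest with an "average hash", a simple perceptual fingerprint that tolerates the small pixel changes lossy compression introduces.

```python
import hashlib

def exact_digest(pixels):
    """SHA-256 over the raw pixel values: changes if even one value changes."""
    flat = bytes(p for row in pixels for p in row)
    return hashlib.sha256(flat).hexdigest()

def average_hash(pixels):
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def hamming_distance(a, b):
    """Count the positions at which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))

def is_near_duplicate(pixels_a, pixels_b, max_distance=2):
    """Treat two thumbnails as the same page if their fingerprints nearly match.

    max_distance is an illustrative threshold, not a recommended setting.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance

# Hypothetical thumbnails: the same page saved losslessly and after lossy
# compression (values shifted slightly), plus an unrelated page.
lossless_scan = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
lossy_scan = [
    [12, 197, 9, 203],
    [198, 13, 202, 8],
    [11, 201, 12, 199],
    [203, 9, 198, 11],
]
different_page = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

same_bytes = exact_digest(lossless_scan) == exact_digest(lossy_scan)
same_page = is_near_duplicate(lossless_scan, lossy_scan)
```

In this toy case the exact digests differ while the perceptual fingerprints agree, which is the gap the article describes. Real pipelines would need proper image decoding, larger fingerprints, and a tuned threshold, and near-matches would typically go to human review rather than being merged automatically.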
What's Being Done, and What Residents Should Know
GovTech, which sits at Mapletree Business City in Pasir Panjang and serves as the central technology arm for Singapore's public sector, has been working with agencies since at least early 2026 on a duplicate image replacement framework — essentially a standardised protocol for identifying, flagging, and replacing or merging redundant files before they propagate further into AI training datasets. Progress has been uneven. Agencies with older content management systems require manual review pipelines that are slower and more labour-intensive than automated hashing tools available to newer platforms.
For ordinary Singaporeans, the immediate practical effect is minor but real. Users of MyInfo, the data platform that pre-fills government and private-sector forms using verified personal records, may occasionally encounter delays when image-heavy documents — such as scanned property titles or educational certificates — are pulled from repositories affected by the cleanup process. The delays are typically measured in seconds rather than minutes, but they become more noticeable during peak usage windows such as the 9 a.m. to 10 a.m. slot when CPF and HDB transactions spike.
The longer-term stakes are higher. Singapore's positioning as a regional AI hub depends substantially on the quality and cleanliness of the datasets that train and feed public-sector models. Duplicate images are not merely a storage inconvenience — they can skew model outputs, introduce bias in visual recognition tasks, and produce inconsistent results in document authentication pipelines. A deduplication pass that seems like mundane housekeeping is, in practice, infrastructure work that underpins the credibility of every downstream AI application built on top of it.
GovTech has not confirmed a timeline for completing the deduplication sweep across all affected agencies. Agencies are advised to implement SHA-256 hashing at the point of ingestion, a check that stops byte-identical duplicates from entering the system rather than cleaning them out after the fact. Because a cryptographic hash matches only identical bytes, it does not by itself catch the same scan saved in a different format or resolution. Residents who encounter errors or unexpected delays on government digital platforms can report them through the Singpass app feedback function or by contacting the relevant agency helpdesk directly.
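The ingestion-time hashing advice can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GovTech's framework: the `IngestionIndex` class, its in-memory dictionary, and the record IDs are all hypothetical, and a production system would keep the digest index in a database with a uniqueness constraint.

```python
import hashlib

class IngestionIndex:
    """Hypothetical gate that rejects files whose bytes have been seen before."""

    def __init__(self):
        # Maps SHA-256 hex digest -> record ID of the first file with those bytes.
        self._first_seen = {}

    def ingest(self, record_id, data):
        """Return (accepted, canonical_id).

        accepted is True when the bytes are new; otherwise canonical_id names
        the earlier record that already holds identical bytes.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._first_seen.get(digest)
        if existing is not None:
            return False, existing
        self._first_seen[digest] = record_id
        return True, record_id

# Illustrative use with made-up record IDs and placeholder byte strings.
index = IngestionIndex()
first = index.ingest("central-0001", b"scanned page bytes")
repeat = index.ingest("regional-0417", b"scanned page bytes")
other = index.ingest("central-0002", b"a different page")
```

Here the second upload is turned away and pointed at the record already stored, while the third passes because its bytes differ. The check is cheap enough to run on every upload, but it only blocks exact copies.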