Singapore's drive to digitise its public records ran into a concrete obstacle this week, as agencies managing national image repositories acknowledged that duplicate and near-duplicate photographs have been quietly degrading the quality of several flagship digital archives. The problem, which has been building for at least two years, came to a head after the National Library Board flagged internal audit results at a closed-door working session on July 2 at the National Archives of Singapore building on Canning Rise.
The timing matters. Singapore is midway through its Digital Government Blueprint 2023–2028, which commits to making government data — including multimedia assets — searchable, deduplicated and machine-readable across ministries. Persistent image duplication undermines that goal directly, clogging storage, distorting search results and making AI-assisted cataloguing less reliable. With the government positioning Singapore as a regional AI hub, the integrity of training data held in public repositories is not a minor housekeeping matter.
Where the Problem Shows Up
Three specific databases have drawn attention. The National Archives of Singapore's PictureSG portal, which holds over 50,000 digitised photographs of local heritage, contains clusters of near-identical images that were uploaded separately during batch migrations in 2023 and 2024. The Roots.sg portal, run by the National Heritage Board, has similar duplication in its community photograph collections, particularly in the Chinatown and Kampong Glam heritage sections. Staff at both institutions have been relying on manual checks, a process that is slow and inconsistent when applied across tens of thousands of files.
Elsewhere in government, the Urban Redevelopment Authority's OneMap platform, which is widely used by developers, architects and the public, has faced complaints about duplicate aerial and streetscape images appearing in layered map views, occasionally causing display errors on mobile browsers. The URA has not publicly detailed the scope, but the issue has been raised in developer forums on GitHub linked to the OneMap API since at least March this year.
The root cause is not unusual. When agencies migrate legacy systems or onboard photographs from external contributors — community groups, schools, grassroots organisations — automated deduplication tools are often not applied consistently. Standard perceptual hashing tools, which compare image fingerprints rather than file names, exist commercially for well under S$500 per licence annually, yet procurement cycles and inter-agency coordination have slowed adoption.
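To make the idea of an image fingerprint concrete, the following is a minimal sketch of one common perceptual hashing technique, average hashing, written in plain Python. It is an illustration only and not the method used by any agency or product named here. It assumes the image has already been reduced to a small grid of grayscale values (real tools typically downscale to something like 8x8 first), and the function names and sample grids are hypothetical.

```python
from typing import List


def average_hash(pixels: List[List[int]]) -> int:
    """Fingerprint a small grayscale grid.

    Each pixel contributes one bit: 1 if it is brighter than the
    grid's mean brightness, 0 otherwise.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical 4x4 grayscale grids standing in for downscaled photographs.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
# The same picture, slightly brightened (e.g. a re-scan or re-export).
brightened = [[min(255, p + 12) for p in row] for row in original]
# A visually different picture.
different = [[200, 200, 10, 10] for _ in range(4)]

near_duplicate_distance = hamming_distance(
    average_hash(original), average_hash(brightened)
)
unrelated_distance = hamming_distance(
    average_hash(original), average_hash(different)
)
```

A small distance suggests the two files show the same picture even if their names, sizes or exact pixel values differ; the threshold that separates "near-duplicate" from "different" is a tuning choice that depends on the collection.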
A Fix on the Table, But No Deadline Yet
The Government Technology Agency, known as GovTech, confirmed in a June 30 advisory to agencies that a centralised deduplication module will be piloted under the Singapore Government Tech Stack before the end of the third quarter of 2026. The module is designed to run perceptual hashing at ingestion, flagging duplicates before they enter production databases rather than requiring retroactive cleanup. The pilot will initially cover image assets managed by five agencies, though GovTech has not published the full list.
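GovTech has not published the module's design, so the sketch below is only an assumption about what "flagging at ingestion" could look like in principle. The class name `IngestionGate`, the `max_distance` threshold and the asset identifiers are all hypothetical, and fingerprints are taken as precomputed integers from any perceptual hash.

```python
from typing import Dict, Optional


class IngestionGate:
    """Hypothetical ingestion check: hold back near-duplicates before storage."""

    def __init__(self, max_distance: int = 5) -> None:
        # Fingerprints differing in at most this many bits count as duplicates.
        self.max_distance = max_distance
        self._accepted: Dict[str, int] = {}

    def submit(self, asset_id: str, fingerprint: int) -> Optional[str]:
        """Return the ID of an existing near-duplicate, or None if accepted."""
        for known_id, known_fp in self._accepted.items():
            if bin(fingerprint ^ known_fp).count("1") <= self.max_distance:
                return known_id  # flagged: not added to the accepted set
        self._accepted[asset_id] = fingerprint
        return None


gate = IngestionGate(max_distance=2)
first = gate.submit("batch-2023/img-001", 0b11110000)   # new image
second = gate.submit("batch-2024/img-417", 0b11110001)  # one bit away
third = gate.submit("batch-2024/img-418", 0b00001111)   # unrelated image
```

The linear scan keeps the example short; a collection of tens of thousands of images would more plausibly use an index built for nearest-neighbour lookups on hashes, and flagged items would go to a human reviewer rather than being silently dropped.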
For institutions that cannot wait, the practical options are limited but clear. Existing tools such as Microsoft's PhotoDNA, adapted for non-commercial archival use, and the open-source, Python-based imagededup library have both been successfully deployed by smaller cultural institutions in London and Tokyo to clean existing collections before migrating to new systems. Singapore's own Infocomm Media Development Authority published guidance in 2024 recommending perceptual hashing as a baseline standard for any public-sector media archive, though that guidance is advisory rather than mandatory.
The NLB and NHB have not confirmed a timeline for cleaning their existing backlog. Archivists familiar with the PictureSG collection note that a full retroactive audit of 50,000-plus images, even with automated tools, typically takes three to four months of iterative work before human review. If GovTech's pilot runs on schedule and results are favourable, a government-wide rollout could realistically begin in the first quarter of 2027. Until then, institutions uploading new batches of images to shared repositories are being advised internally to apply deduplication checks locally before submission — a practical stopgap, but not a systemic answer.