Singapore's Digital Archivists Tackle the Duplicate Image Problem Head-On This Week

Libraries, government agencies and tech firms are moving to clean up redundant visual data clogging Singapore's growing digital repositories.

By Singapore News Desk · Published 5 July 2026 at 3:16 am

Updated 5 July 2026 at 11:21 am

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence.

Photo: Ravish Maqsood on Pexels

Singapore's major digital archives have a clutter problem. Thousands of duplicate and near-duplicate images — scanned heritage photographs, government records, HDB estate documentation, and public infrastructure files — are piling up across repositories, driving up storage costs and slowing down retrieval systems that public agencies and researchers depend on daily.

The issue landed firmly in the spotlight this week after the National Library Board flagged the scale of the redundancy challenge facing its digital collections, which span the National Archives of Singapore at Canning Rise and the National Library building on Victoria Street. The push to address it is now being treated as an operational priority, not a background maintenance task.

Why This Week's Push Matters

The timing is not coincidental. Singapore has been accelerating its Smart Nation digital infrastructure drive, and agencies across Jurong, Woodlands and the Central Business District have been migrating decades of paper-based and analogue records into centralised cloud environments over the past 18 months. That migration, while necessary, has created an avalanche of duplicated files — in some cases the same image uploaded four or five times across different departmental silos before consolidation checks were in place.

Duplicate image data is more than a housekeeping annoyance. Storage on government-grade cloud infrastructure carries real cost, and redundant files inflate the computational load on AI-assisted search and cataloguing tools that agencies like the Urban Redevelopment Authority and the Housing and Development Board increasingly rely on to index visual records. Every irrelevant duplicate that surfaces in a search result is time a civil servant or researcher does not get back.

The problem also affects public-facing tools. The National Heritage Board's Roots.sg portal, which allows Singaporeans to trace family histories and browse digitised civic photographs dating back to the Straits Settlements era, has seen user complaints about duplicate image results cluttering searches — particularly for images of pre-independence neighbourhoods like Tiong Bahru, Tanjong Pagar, and Kampong Glam.

The Technology Being Deployed

The approach being rolled out draws on perceptual hashing algorithms and convolutional neural network models trained to detect near-identical images — not just pixel-perfect copies, but photographs that are slightly cropped, colour-adjusted, or scanned at different resolutions from the same original. These tools can flag suspect pairs for human review rather than automatically deleting files, preserving archival integrity.
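As a rough illustration of the perceptual hashing half of that approach, here is a minimal sketch of a difference hash (dHash) in Python. It is not the pipeline the agencies are using, which the article does not describe in detail. It assumes images have already been decoded into rows of 0-255 grayscale values, and the function names and the review threshold of 10 bits are hypothetical choices made for the example. The neural network side is not shown.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Gray = List[List[int]]  # assumption: image already decoded to rows of 0-255 values


def shrink(img: Gray, width: int, height: int) -> List[List[float]]:
    """Downsample by averaging the block of source pixels behind each output cell."""
    src_h, src_w = len(img), len(img[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max(y0 + 1, (y + 1) * src_h // height)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max(x0 + 1, (x + 1) * src_w // width)
            block = [img[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img: Gray, size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair, set when left is brighter."""
    small = shrink(img, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def suspect_pairs(hashes: Dict[str, int], max_distance: int = 10) -> List[Tuple[str, str, int]]:
    """List pairs close enough to send to a human reviewer. Nothing is deleted here."""
    pairs = []
    for (id_a, hash_a), (id_b, hash_b) in combinations(hashes.items(), 2):
        distance = hamming(hash_a, hash_b)
        if distance <= max_distance:
            pairs.append((id_a, id_b, distance))
    return pairs
```

Because the hash only records whether brightness rises or falls between neighbouring regions, a uniformly brightened copy or a lower-resolution rescan of the same photograph tends to land on the same or a very close hash, while an unrelated image does not. Comparing every pair is quadratic in collection size, so a collection of millions of items would need an index rather than this brute-force loop.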

GovTech Singapore, based at Sandcrawler Building in one-north, has been central to the technical implementation. The agency has been coordinating with both the National Library Board and the National Archives to standardise the deduplication pipeline across agencies using the Singapore Government Tech Stack. Pilot work began in the first quarter of 2026, and this week's developments reflect the programme moving from pilot into broader deployment across at least three additional ministries.

The National Archives alone holds more than 11 million items in its collection, a figure that has grown sharply since the pandemic-era push to digitise physical records accelerated from 2021 onward. Even a duplication rate of two to three percent across a collection that size represents hundreds of thousands of redundant files consuming server capacity and distorting search rankings.
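The arithmetic behind that estimate is easy to check. The short calculation below takes the 11 million figure as a lower bound and, as a simplifying assumption the article does not state, treats every item as a file that could be duplicated. The function name is hypothetical.

```python
ITEMS = 11_000_000  # "more than 11 million items", so this is a lower bound


def redundant_files(items: int, rate_percent: int) -> int:
    """Files that are duplicates at a given whole-number percentage rate."""
    return items * rate_percent // 100


low = redundant_files(ITEMS, 2)
high = redundant_files(ITEMS, 3)
print(f"{low:,} to {high:,} redundant files")  # 220,000 to 330,000 redundant files
```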

For institutions like Nanyang Technological University's library system and the Singapore Management University, which maintain their own digitised visual research collections, the national push offers a useful template. Both universities have begun internal reviews aligned with the standards GovTech is establishing, according to publicly available statements on their respective library portals.

Practically speaking, members of the public using Roots.sg or the NLB's digital catalogue at nlb.overdrive.com should expect less cluttered image search results over the coming months as the deduplication passes complete. Researchers who have already downloaded archive batches for academic work are being advised to cross-check their local copies against the updated catalogue once the cleaned dataset goes live, which is expected before the end of the third quarter of 2026. For agencies still mid-migration, the consistent advice from GovTech is to run deduplication checks before ingestion, a far cheaper fix than cleaning up a contaminated archive after the fact.
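The advice to check before ingestion can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GovTech's pipeline: it catches only byte-for-byte identical files using a SHA-256 content hash, the `ingest_batch` function and the in-memory `index` dictionary are hypothetical, and near-duplicates such as rescans or crops would still need a perceptual comparison.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def ingest_batch(
    files: Iterable[Tuple[str, bytes]],
    index: Dict[str, str],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Check each file against the index before it enters the archive.

    `index` maps a SHA-256 digest to the name of the first file seen with
    that content. Returns the names accepted and, separately, the names held
    back together with the existing file each one duplicates.
    """
    accepted: List[str] = []
    held: List[Tuple[str, str]] = []
    for name, data in files:
        digest = hashlib.sha256(data).hexdigest()
        if digest in index:
            # Hold for review instead of storing a second copy.
            held.append((name, index[digest]))
        else:
            index[digest] = name
            accepted.append(name)
    return accepted, held


# Usage: the same scan arriving from two departmental silos.
index: Dict[str, str] = {}
accepted, held = ingest_batch(
    [("hdb/block_12.tif", b"scan-1"), ("ura/block_12_copy.tif", b"scan-1")],
    index,
)
```

Here the second upload is held back and paired with the first, so the duplicate never reaches storage. Keeping the index across batches is what lets a later upload from another department be caught at the door.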

About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
