Singapore's archival and public sector technology teams spent much of this week grappling with a practical but stubborn problem: tens of thousands of duplicate images clogging digital repositories that the government has been expanding rapidly since 2023. The issue surfaced publicly after the National Library Board flagged the challenge in internal documentation reviewed by The Daily Singapore, pointing to redundant image files across multiple platforms as a drag on storage costs and search accuracy.
The timing matters. Singapore has been aggressively digitising physical collections — from housing records to Chinatown streetscape photography — as part of its Smart Nation push, and the volume of inbound digital assets has outpaced the quality-control tools deployed to manage them. When duplicates multiply unchecked, search results degrade, storage bills climb, and users — whether civil servants or members of the public using the NLB's BiblioAsia portal — end up wading through redundant results instead of finding what they need.
Where the Bottleneck Is Hitting Hardest
Two repositories in particular have been flagged this week. The National Archives of Singapore, whose main reading room sits along Canning Rise in Fort Canning, holds millions of digitised photographs spanning the colonial era through independence. Separately, the Urban Redevelopment Authority's SPACE portal, used by planners and researchers at its Maxwell Road headquarters, stores large volumes of aerial and ground-level imagery tied to development applications. Staff at both institutions have been running deduplication scripts, but the process still needs so much manual review that it is eating into hours that would otherwise go toward cataloguing new acquisitions.
The duplicate image problem is not unique to Singapore, but local conditions sharpen it. Agencies frequently share image assets across platforms without a centralised metadata standard, meaning the same photograph of, say, the old Kallang Airport terminal can exist under three different file names, two different date stamps, and no consistent rights classification. That makes automated matching harder. GovTech, which oversees Singapore's whole-of-government digital infrastructure from its Mapletree Business City offices in Pasir Panjang, has been working on a unified digital asset management framework since late 2024, but the framework has not yet been rolled out to all agencies.
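Why inconsistent names and date stamps trip up metadata-based matching, but not a content checksum, can be shown with a minimal Python sketch. It assumes the files are already loaded into memory as a mapping of file names to raw bytes. The function `group_exact_duplicates` and the sample file names are hypothetical illustrations, not an NLB, NAS or GovTech tool. The sketch groups files by the SHA-256 digest of their contents, so names and any dates kept only in a catalogue or file system play no part. A date stamp embedded inside the file itself, such as EXIF data, would change the bytes and therefore the digest.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical.

    Only the SHA-256 digest of the raw bytes decides a match; the file
    name (the dict key) is never compared.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Keep digests shared by two or more files; sort names for stable output.
    return sorted(sorted(names) for names in by_digest.values() if len(names) > 1)


if __name__ == "__main__":
    # Hypothetical repository: one scan stored under three names.
    # The byte strings are placeholders, not real PNG data.
    scan = b"placeholder-bytes-for-a-kallang-terminal-scan"
    repository = {
        "kallang_airport_1955.png": scan,
        "IMG_0042.png": scan,
        "heritage/terminal_final.png": scan,
        "tanjong_pagar_1970.png": b"placeholder-bytes-for-a-different-image",
    }
    print(group_exact_duplicates(repository))
```

The same property is also the approach's weakness. If the photograph is re-saved at a different resolution, every byte changes and the three copies no longer share a digest.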
The scale of the backlog is significant. The National Library Board's annual report for financial year 2024/2025 noted that its digital collection had crossed 1.2 million items, a figure that has continued to grow through heritage digitisation grants offered to community organisations. Industry benchmarks from digital asset management research suggest that large institutional collections typically carry a duplication rate of between eight and fifteen percent once automated hash-matching tools are applied. That means the NLB alone could be sitting on more than 90,000 redundant files even at the low end of that range, and closer to 138,000 at the midpoint. The board has not publicly confirmed its own duplication count.
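The range above comes from simple multiplication. The short Python sketch below reproduces it under two stated assumptions: the collection is taken as exactly 1.2 million items, although the report says it has crossed that figure, and the duplication rate is treated as the share of all items that are redundant copies. The function name `estimate_duplicates` is illustrative only, and the outputs are back-of-envelope estimates, not NLB figures.

```python
def estimate_duplicates(collection_size: int, duplication_rate: float) -> int:
    """Expected redundant files, treating the rate as a share of all items."""
    return round(collection_size * duplication_rate)


if __name__ == "__main__":
    # Assumed inputs: 1.2 million items and the 8-15% industry range.
    size = 1_200_000
    low, high = 0.08, 0.15
    mid = (low + high) / 2  # 11.5%
    for label, rate in [("low end", low), ("midpoint", mid), ("high end", high)]:
        print(f"{label}: {estimate_duplicates(size, rate):,} files")
```

Because the collection has kept growing past 1.2 million items, each of these figures is more likely a floor than a ceiling.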
What Comes Next for Users and Institutions
GovTech's framework, when it lands, is expected to introduce perceptual hashing across participating agencies. The technique identifies visually similar images by comparing what they look like rather than their exact bytes, so a match survives changes to file names, embedded metadata or pixel dimensions. That would be a meaningful upgrade from the current checksum-based matching, which catches only byte-for-byte copies and misses the more common case of the same image saved at a different resolution or with minor cropping.
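The difference between the two approaches can be illustrated with a self-contained Python sketch. It uses a difference hash (dHash), one common perceptual-hashing variant, chosen here only for illustration. GovTech has not said which perceptual technique its framework will use. To avoid depending on an imaging library, the sketch works on a synthetic grayscale image stored as a list of pixel rows. The helper names (`resize`, `dhash`, `hamming`, `sha256_of`) are hypothetical, and the test image is made up.

```python
import hashlib


def resize(pixels: list[list[int]], out_w: int, out_h: int) -> list[list[float]]:
    """Downscale a grayscale image by averaging the source pixels in each output cell."""
    in_h, in_w = len(pixels), len(pixels[0])
    out = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max(y0 + 1, (oy + 1) * in_h // out_h)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max(x0 + 1, (ox + 1) * in_w // out_w)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: list[list[int]], hash_size: int = 8) -> int:
    """Difference hash: one bit per pair of horizontally adjacent thumbnail cells."""
    small = resize(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests visually similar images."""
    return bin(a ^ b).count("1")


def sha256_of(pixels: list[list[int]]) -> str:
    """Checksum of the raw pixel bytes, standing in for a file checksum."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


if __name__ == "__main__":
    # Synthetic 72x64 grayscale "photo" whose brightness rises left to right.
    original = [[2 * x + y // 2 for x in range(72)] for y in range(64)]
    # The same picture saved at twice the resolution (each pixel becomes a 2x2 block).
    enlarged = [[original[y // 2][x // 2] for x in range(144)] for y in range(128)]
    # A genuinely different picture: the original mirrored left to right.
    mirrored = [row[::-1] for row in original]

    print(sha256_of(original) == sha256_of(enlarged))  # False: checksums differ
    print(hamming(dhash(original), dhash(enlarged)))  # 0: perceptual hashes match
    print(hamming(dhash(original), dhash(mirrored)))  # 64: every bit differs
```

In this toy case the enlarged copy matches exactly because its dimensions divide evenly into the thumbnail grid. With real scans, resizing, re-compression or a small crop usually flips a few bits. Deployments therefore typically treat two images as duplicates when the Hamming distance falls below a threshold tuned for the collection, rather than requiring identical hashes.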
For everyday users of platforms like the NLB's eResources portal or the Archives Online search tool, the practical advice for now is straightforward: if a search for historical images of areas like Tanjong Pagar or Buona Vista returns suspiciously repetitive results, narrow it by date range or file format to filter out lower-quality duplicates. The NLB has also encouraged users to flag obvious duplicates through the feedback function on Archives Online, which feeds into a corrections queue reviewed monthly.
A broader review of digital asset governance across the public sector is expected to form part of the next Smart Nation update, which the government typically releases in the third quarter of the calendar year. Whether the deduplication tools get budget priority before then will depend on how urgently agencies quantify the storage cost — and how loudly researchers complain in the meantime.