The Daily Singapore

Singapore news, every day


Singapore's Digital Archives Tackle Duplicate Image Problem With New AI-Assisted Screening Tool

Libraries and government agencies are moving fast to clean up redundant visual records before a major digitisation milestone this quarter.


By Singapore News Desk · Published 5 July 2026 at 3:25 am


Updated 5 July 2026 at 12:02 pm

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence.

Photo: Towfiqu barbhuiya on Pexels

Singapore's National Library Board (NLB) quietly rolled out a new duplicate-image detection pipeline this week across its digital repository at Victoria Street. Archivists have been pressure-testing the system since May 2026. The tool, built on perceptual hashing combined with machine-learning similarity scoring, is designed to flag near-identical scans and photographs before they are formally ingested into the national digital collection.

The timing is not accidental. NLB has set a target of processing its two-millionth digitised asset by the end of the third quarter of 2026. Duplicate records inflate that count artificially, and more practically, they consume server storage that the Board's data centre at Punggol Digital District is already managing under expanding demand.

Why Duplicates Became a Pressing Problem

The issue crept up gradually. When government agencies accelerated digitisation drives during and after the COVID-19 disruptions, multiple teams — often working in parallel — scanned the same physical collections from the National Archives of Singapore on Canning Rise. Newspapers, civic photographs, maps of the old Kampong Glam district, and building records from the Urban Redevelopment Authority all went through scanners at different resolutions and colour profiles. The result was thousands of near-identical image files with marginally different metadata, making manual review impractical at scale.

Duplicate images cause problems beyond storage bloat. Search results return redundant hits, researchers waste time triaging results, and automated cataloguing systems assign conflicting tags to files that represent the same original object. For a city-state that has positioned itself as a regional leader in AI-augmented public services — and that spent S$1 billion on its Smart Nation and Digital Government initiatives across the 2021–2025 fiscal cycle — messy back-end data is an awkward gap in the narrative.

The new screening tool processes image batches overnight. According to documentation published on the NLB's developer portal on 30 June, the system uses a two-stage check: a fast perceptual hash pass that catches obvious duplicates in milliseconds, followed by a convolutional neural network comparison for near-duplicates that share composition but differ in crop, brightness, or compression artefacts. Files flagged as probable duplicates are quarantined for human review rather than deleted automatically, a deliberate safeguard against false positives stripping unique records from the archive.
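For readers curious how a two-stage check of this kind might look in practice, here is a minimal Python sketch. It is not NLB's code. Every name in it (`average_hash`, `screen`, `Repository`) and both thresholds are hypothetical. Stage one uses a simple average hash, one common form of perceptual hashing. Stage two uses plain cosine similarity on raw pixels as a stand-in for the neural network comparison, which a real system would run on learned image embeddings. Images are represented as small grayscale grids so the example needs no external libraries.

```python
from dataclasses import dataclass, field
from math import sqrt

# Hypothetical representation: a small grayscale grid, values 0-255.
Image = list[list[int]]


def average_hash(img: Image) -> int:
    """Stage 1: one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def cosine_similarity(a: Image, b: Image) -> float:
    """Stage 2 stand-in (assumption): a real pipeline would compare CNN embeddings."""
    va = [p for row in a for p in row]
    vb = [p for row in b for p in row]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def screen(candidate: Image, archive: dict[str, Image],
           max_bits: int = 2, min_similarity: float = 0.98) -> list[str]:
    """Return IDs of archived images the candidate probably duplicates."""
    cand_hash = average_hash(candidate)
    matches = []
    for image_id, stored in archive.items():
        if hamming(cand_hash, average_hash(stored)) <= max_bits:
            matches.append(image_id)  # fast hash pass caught it
        elif cosine_similarity(candidate, stored) >= min_similarity:
            matches.append(image_id)  # slower similarity pass caught it
    return matches


@dataclass
class Repository:
    collection: dict[str, Image] = field(default_factory=dict)
    # Quarantined files are kept, along with the IDs they appear to duplicate.
    quarantine: dict[str, tuple[Image, list[str]]] = field(default_factory=dict)

    def ingest(self, image_id: str, img: Image) -> str:
        matches = screen(img, self.collection)
        if matches:
            # Held for human review; nothing is deleted automatically.
            self.quarantine[image_id] = (img, matches)
            return "quarantined"
        self.collection[image_id] = img
        return "ingested"
```

In this sketch, a uniformly brightened copy of an already ingested scan produces the same average hash as the original, so `ingest` routes it to quarantine. A tonally inverted image fails both stages and is ingested as a new record. The thresholds would need tuning against real collections.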

Agencies Align on a Common Standard

The NLB is not working in isolation. The Infocomm Media Development Authority, headquartered at South Beach Avenue, has been coordinating with at least four statutory boards since January 2026 to establish a shared image-quality and deduplication standard for government digital assets. The goal is a single metadata framework that would allow agencies to cross-check holdings without duplicating effort across departmental silos.

For Singapore's creative and research communities based around institutions like Lasalle College of the Arts on McNally Street and the National University of Singapore's Centre for Digital Humanities, cleaner archives mean more reliable datasets for computational research. Researchers using the NLB's NewspaperSG portal have long flagged duplicate pages as one of the more tedious friction points in their workflow.

The practical implications extend to the private sector too. Digital asset management is a growing service line among Singapore-based firms that handle marketing libraries, legal document repositories, and property listing photographs — categories where duplicates carry real financial cost in storage and licensing confusion.

NLB's deduplication project is expected to complete its first full system scan of the existing collection by September 2026. After that, the detection pipeline will run as a standing pre-ingest check. For researchers and members of the public using the NLB's digital services, the most visible change will be cleaner, faster search results with fewer redundant thumbnails cluttering catalogue pages. For the archivists on Victoria Street, it means fewer late nights triaging files by hand — and a two-millionth asset count that will actually mean something.


About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
