The Daily Singapore

Singapore news, every day

How Singapore's Digital Records Crisis Led to a Reckoning With Duplicate Images

Decades of overlapping digitisation drives across government agencies left a sprawling mess of redundant files — and fixing it is costing real money.

By Singapore News Desk · Published 5 July 2026 at 3:23 am

Updated 5 July 2026 at 11:42 am

How we reported this

This article was generated by AI from the linked public sources. The Daily Singapore is independently owned and covers Singapore news free from advertiser or sponsor influence. Read our editorial standards →

Photo: Kenny Foo on Pexels

Singapore's push to digitise everything, from land titles and HDB flat records to medical histories and court documents, has produced an unexpected problem: nobody agreed on a single standard, and the nation's public databases are now riddled with duplicate image files running into the tens of millions. The Infocomm Media Development Authority (IMDA) confirmed earlier this year that a whole-of-government deduplication exercise is underway, touching agencies from the Housing & Development Board (HDB) to the National Archives of Singapore on Canning Rise.

The timing matters. Singapore is spending heavily to position itself as a regional artificial intelligence hub, with the National AI Strategy 2.0, launched in late 2023, calling for high-quality, machine-readable datasets across the public sector. Duplicate images are not a cosmetic irritant. They skew model training, inflate storage costs, and compromise the integrity of search results in systems that citizens use daily, from the HDB's flat application portal to the Singpass document vault.

How the Mess Accumulated

The duplication problem has roots in the 1990s, when individual ministries began scanning paper records independently. The National Archives ran its own programme. The Land Transport Authority digitised vehicle and licensing documents on a separate track. Hospitals under the National University Health System and Singapore Health Services scanned patient records using different resolution standards and naming conventions. When these systems were later connected — first through eCitizen, then through the Singpass app — no one wrote a master deduplication rule into the integration layer.

By 2018, when GovTech took over the Smart Nation infrastructure role in full, internal audits reportedly flagged the redundancy problem. But fixing it required every agency to temporarily freeze updates to shared repositories, a disruption that kept getting deferred. The National Library Board's digitised newspaper archive — accessible from its reference library at Victoria Street — is one of the cleaner datasets, because it was built under a single contractor with strict file-naming rules from the outset. Most other collections were not so lucky.

A 2024 benchmark study by the Singapore Management University's School of Computing and Information Systems, drawing on figures cited in the school's published research, found that government-held image repositories in comparable city-states carry duplication rates of between 18 and 34 percent once cross-agency indexing is applied. Singapore has not officially published its own figure, but the IMDA's current deduplication tender, awarded in the first quarter of 2026, covers more than 200 terabytes of scanned government documents.

What the Fix Actually Involves

Deduplication at this scale is not a single software patch. The approach being rolled out uses perceptual hashing, a technique that generates a short fingerprint from an image's visual content rather than from its exact bytes, to cluster near-identical scans. A page scanned at 200 dpi and the same page scanned at 300 dpi produce entirely different values under a traditional checksum tool but nearly identical perceptual hashes. GovTech's data engineering team at Mapletree Business City in Pasir Panjang is running the matching engine, with results piped back to each originating agency for human review before any file is deleted.
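The contrast between a checksum and a perceptual hash can be sketched in a few lines of Python. This is an illustrative sketch only, not the matching engine described above: it assumes a difference hash (dHash), one common perceptual-hashing variant, and it uses synthetic grayscale pages in place of real scans. The function names, image sizes (chosen in a 2:3 ratio to mimic 200 and 300 dpi) and the test pattern are all hypothetical.

```python
import hashlib
import math


def make_page(width, height, phase=0.0):
    """Synthetic grayscale 'scan' (rows of 0-255 ints), sampled at pixel centres."""
    return [
        [
            int(127.5 + 127.5
                * math.sin(6.0 * (x + 0.5) / width + phase)
                * math.cos(4.0 * (y + 0.5) / height))
            for x in range(width)
        ]
        for y in range(height)
    ]


def checksum(pixels):
    """Byte-exact fingerprint: changes if even one pixel changes."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


def shrink(pixels, out_w, out_h):
    """Downscale by averaging the block of pixels behind each output cell."""
    in_h, in_w = len(pixels), len(pixels[0])
    small = []
    for oy in range(out_h):
        y0, y1 = oy * in_h // out_h, (oy + 1) * in_h // out_h
        row = []
        for ox in range(out_w):
            x0, x1 = ox * in_w // out_w, (ox + 1) * in_w // out_w
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def dhash(pixels):
    """64-bit difference hash: is each cell brighter than its right-hand neighbour?"""
    small = shrink(pixels, 9, 8)  # 9 columns give 8 comparisons per row
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# The same page at two resolutions, plus a visually different page.
scan_200 = make_page(180, 240)
scan_300 = make_page(270, 360)
other = make_page(180, 240, phase=math.pi / 2)

print("checksums match:", checksum(scan_200) == checksum(scan_300))
print("bits differing, same page:", hamming(dhash(scan_200), dhash(scan_300)))
print("bits differing, other page:", hamming(dhash(scan_200), dhash(other)))
```

In a pipeline of this kind, pairs whose hashes differ by only a few bits would be grouped as candidate duplicates and passed to a reviewer. The cut-off is a tuning choice, and nothing in the reporting indicates what threshold or hash variant the agencies use.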

The practical stakes are significant for ordinary residents. The HDB's resale portal processes tens of thousands of flat transactions each year — resale prices averaged around S$570,000 island-wide in early 2026 — and each transaction requires verified document images. A wrongly flagged duplicate can stall a sale. Lawyers at firms along Cecil Street handling conveyancing work say document retrieval delays have been a recurring friction point, though the agencies themselves have not publicly attributed those delays to the duplication backlog.

The current deduplication contract runs through the third quarter of 2027. Agencies are expected to migrate cleaned image libraries into a new central object store before the end of that year. Residents who use Singpass to store personal documents, such as identity cards, birth certificates and tenancy agreements, are not directly affected during the transition, but the back-end clean-up should eventually make document verification faster. For now, anyone dealing with time-sensitive transactions involving government-held scanned records should build in extra lead time and confirm document status directly with the relevant agency rather than assuming the portal is fully current.

About this article

Published by The Daily Singapore

Covering news in Singapore. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
