A quiet but consequential reckoning is underway across Singapore's digital landscape. Duplicate images — the same photograph appearing under multiple listings, records or identities across government and commercial databases — have accumulated at a scale that administrators are only beginning to quantify. The problem cuts across sectors: HDB resale flat listings on property portals carry recycled stock photos, national identity verification systems flag mismatched visuals, and municipal records held by agencies including the Urban Redevelopment Authority contain imagery that was uploaded multiple times over successive system migrations.
The timing matters because Singapore is mid-stride in a significant push to position itself as a regional AI and data governance hub. Dirty image data — duplicated, mislabelled or misattributed — undermines the training sets that underpin the computer vision tools the Smart Nation and Digital Government Office has been rolling out across public services since at least 2022. Garbage in, garbage out: the phrase is old, but it lands with fresh urgency when the systems involved are being used to assess property conditions, verify identities at Changi Airport's automated immigration lanes, and support the National Library Board's digital archive projects.
The problem is not unique to real estate. The National Heritage Board's digital collections, accessible through its roots.sg portal, have undergone several archival migrations since the platform launched. Each migration carried a risk of duplication, particularly for digitised photographs from the 1960s and 1970s covering Kampong Glam, the old Kallang Airport site, and the early HDB estates of Queenstown. Archivists have been working through a backlog, but the process is labour-intensive and no automated resolution has been publicly announced.
At the commercial level, e-commerce platforms operating out of Singapore — including Lazada's regional operations, which are headquartered here — face the same structural challenge. A single product photograph can appear across thousands of seller storefronts, confusing recommendation algorithms and, more practically, misleading buyers about the condition of second-hand goods.
The Decisions That Cannot Be Deferred
Three choices are now pressing. First, who owns the deduplication mandate? The Infocomm Media Development Authority is the natural coordinating body for digital infrastructure standards, but it has not, as of July 2026, gazetted any binding image-data quality standard applicable across both public agencies and licensed commercial platforms. A decision on jurisdiction and enforcement (which body sets the standard, and whether it is voluntary or binding) cannot wait much longer if the next generation of AI tools is to be trained on reliable local data.
Second, what technology standard applies? Perceptual hashing, which reduces each image to a compact fingerprint so that near-identical images can be matched even after minor edits, is the current industry baseline. It produces false positives, however, when applied to the similar-but-distinct images that dominate a dense urban environment like Toa Payoh or Jurong East, where dozens of corridors are genuinely almost identical. Singapore's Government Technology Agency, better known as GovTech, has the engineering capacity to develop a local benchmark, but that requires a policy brief it has not yet received.
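For readers who want to see what perceptual hashing involves, here is a minimal sketch of one common variant, the difference hash. It is illustrative only and does not describe any agency's or platform's actual pipeline. The function names, the nearest-neighbour downscaling and the threshold of 5 bits are all assumptions made for the example; production systems typically use an imaging library and a smoother resize.

```python
from typing import List

# A grayscale image as rows of 0-255 luminance values (an assumption for
# this sketch; real pipelines would decode files with an imaging library).
Gray = List[List[int]]


def resize_nearest(img: Gray, width: int, height: int) -> Gray:
    """Downscale by nearest-neighbour sampling (a simplification)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(img: Gray, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    small = resize_nearest(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            # Bit is 1 when the left pixel is brighter than its right neighbour.
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(a: Gray, b: Gray, threshold: int = 5) -> bool:
    """Flag a pair as a likely duplicate; the threshold is a hypothetical default."""
    return hamming(dhash(a), dhash(b)) <= threshold
```

Because the hash records only whether each pixel is brighter than its neighbour, a uniformly brightened copy of a photograph hashes identically to the original. The same property explains the false positives: two different corridors with the same layout of light and shadow can fall within the threshold, and tightening the threshold trades those false matches for missed duplicates.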
Third, there is the question of legacy records. Purging duplicates from active commercial databases is relatively straightforward; removing them from official archives without losing metadata or provenance information is not. The National Archives of Singapore, based at Canning Rise, is the natural custodian of that decision, and any framework it develops will set a precedent for how Southeast Asian neighbours approach the same problem.
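One way to picture the archival problem is to consolidate duplicates rather than delete them. The sketch below groups records that share a content fingerprint under a single canonical entry while keeping every original record, with its metadata, as provenance. It is a hypothetical illustration, not a description of how the National Archives of Singapore or any other body handles its holdings; the class names, field names and sample identifiers are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImageRecord:
    record_id: str             # hypothetical catalogue identifier
    content_hash: str          # fingerprint of the image content
    metadata: Dict[str, str]   # e.g. source system, caption, upload date


@dataclass
class CanonicalImage:
    content_hash: str
    # Every original record is retained, so no metadata is discarded.
    provenance: List[ImageRecord] = field(default_factory=list)


def consolidate(records: List[ImageRecord]) -> Dict[str, CanonicalImage]:
    """Group records by fingerprint without dropping any of them."""
    canonical: Dict[str, CanonicalImage] = {}
    for rec in records:
        entry = canonical.setdefault(
            rec.content_hash, CanonicalImage(rec.content_hash)
        )
        entry.provenance.append(rec)
    return canonical


# Invented sample data: two catalogue entries point at the same image.
records = [
    ImageRecord("A-001", "h1", {"source": "migration-1"}),
    ImageRecord("B-417", "h1", {"source": "migration-2"}),
    ImageRecord("A-002", "h2", {"source": "migration-1"}),
]
merged = consolidate(records)
```

The design choice here is that storage can be deduplicated, since one image is kept per fingerprint, while the record of where each copy came from survives intact.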
Singapore's Digital Government Blueprint, last updated for the 2023 to 2025 cycle, committed the government to improving data quality across agencies. The next iteration, expected to be published before the end of 2026, is the clearest near-term vehicle for turning that commitment into something binding. Whether the image deduplication question makes it into that document — or gets deferred again — will tell practitioners a great deal about how seriously the city-state takes the foundations beneath its AI ambitions.