Singapore's digital estate has a duplication problem. Across government portals, property listing platforms, and heritage archives, the same images are being stored, licensed, and served multiple times, costing agencies storage and bandwidth, creating legal exposure over intellectual property, and muddying the public record. The question of what to do next is no longer theoretical. Decisions made in the coming months will determine how the city manages its visual infrastructure through the next decade of AI-driven content systems.
The issue has sharpened because of timing. Singapore's Smart Nation and Digital Government Office has been consolidating data assets across ministries since 2023, and duplicate imagery buried inside those datasets has surfaced as a non-trivial compliance problem. At the same time, the Housing & Development Board's MyNiceHome portal and its resale listings rely on uploaded flat photographs that are frequently duplicated when agents relist units. Industry observers have flagged that without a deduplication layer, the same Tampines or Bukit Timah flat photograph can exist in a dozen separate database entries simultaneously.
Why Deduplication Is Now a Policy Question, Not Just a Technical One
Storage costs are one thing. Legal liability is another. Under Singapore's Copyright Act 2021, which came into force on 21 November 2021 and introduced new rules on authorship and orphan works, a duplicated image that cannot be traced to its original licensor creates genuine exposure for the platform hosting it. The Intellectual Property Office of Singapore has published guidance encouraging organisations to conduct rights audits before embedding imagery into AI training datasets — a pressure point that has become acute as agencies across one-north and the Jurong Lake District tech cluster begin feeding visual content into machine learning pipelines.
The Urban Redevelopment Authority's digitised planning records and the National Archives of Singapore, which holds more than 10 million images and documents from colonial-era surveys to independence-era construction, are both understood to be reviewing their deduplication workflows. Neither agency has issued a public timeline, but the Archives' digitisation roadmap covers the period to 2027, making the next 18 months the natural window for any structural fix.
Commercially, the stakes are clearest in property. PropertyGuru, which operates the dominant listing portal in Singapore, has previously disclosed that user-submitted photographs represent the largest single category of uploaded content on its platform. When landlords or agents recycle photographs across multiple listings — a common practice along corridors like Orchard Road and in dense estates like Woodlands — the platform's image index balloons with near-identical files. Deduplication tools using perceptual hashing, already standard in newsroom content management systems at outlets in London and New York, remain unevenly adopted locally.
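For readers unfamiliar with the technique, perceptual hashing reduces an image to a short fingerprint that stays stable under small changes such as brightness shifts or recompression, so near-identical files can be matched by comparing fingerprints. The sketch below shows one common variant, a difference hash, in plain Python. It is an illustration only: it assumes images have already been decoded into grayscale pixel grids, uses synthetic data in place of real photographs, and does not describe what any platform named here actually runs. All function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # rows of grayscale pixel values, 0-255


def resize_nearest(pixels: Grid, width: int, height: int) -> Grid:
    """Resample a grayscale grid with nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic 32x32 stand-ins for photographs (assumed data, not real images).
original = [[(x * 7 + y * 3) % 240 for x in range(32)] for y in range(32)]
brighter = [[min(255, p + 10) for p in row] for row in original]
different = [[(x * x + y * 13) % 256 for x in range(32)] for y in range(32)]

print("original vs brighter :", hamming(dhash(original), dhash(brighter)))
print("original vs different:", hamming(dhash(original), dhash(different)))
```

A small Hamming distance between two hashes suggests the same photograph. The threshold that separates "duplicate" from "merely similar" is a tuning decision for each library, not a fixed constant.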
The Decisions That Cannot Be Deferred
Three choices are converging. First, whether deduplication is mandated at the point of upload, which shifts processing costs to platforms, or handled retrospectively in bulk, which is cheaper per image but slower. Second, whether duplicate images are deleted, archived, or merged with metadata chains that preserve provenance. The National Heritage Board's ongoing work on the Singapore Memory Project suggests a preference for preservation over deletion, but that approach requires significantly more storage provisioning. Third, who pays. Technology solutions acquired by public agencies go through the Government Technology Agency's vendor panels; a centralised deduplication service could theoretically be tendered as a shared service, reducing individual agency costs.
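The second option, merging duplicates while keeping a provenance chain, can be pictured as storing one file per distinct image and attaching every original upload record to it. The sketch below is a minimal illustration under stated assumptions: the record fields, class names, and sample data are hypothetical, hashes are assumed to be computed elsewhere, and grouping is by exact hash match only (near-duplicate matching is left out for brevity).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImageRecord:
    record_id: str
    image_hash: int            # perceptual hash computed elsewhere
    source: str                # listing or collection the upload came from
    uploaded: str              # ISO date, so string order is date order
    licensor: str = "unknown"


@dataclass
class CanonicalImage:
    image_hash: int
    stored_copy: str           # record_id of the single file that is kept
    provenance: List[ImageRecord] = field(default_factory=list)


def merge_duplicates(records: List[ImageRecord]) -> Dict[int, CanonicalImage]:
    """Keep the earliest upload per hash; chain every record as provenance."""
    merged: Dict[int, CanonicalImage] = {}
    for rec in sorted(records, key=lambda r: r.uploaded):
        canon = merged.get(rec.image_hash)
        if canon is None:
            canon = CanonicalImage(rec.image_hash, stored_copy=rec.record_id)
            merged[rec.image_hash] = canon
        canon.provenance.append(rec)
    return merged


# Hypothetical sample data for illustration only.
records = [
    ImageRecord("r3", 0xABCD, "listing-2024-relist", "2024-02-10"),
    ImageRecord("r1", 0xABCD, "listing-2022", "2022-06-01", licensor="Agent A"),
    ImageRecord("r2", 0x1234, "archive-batch-7", "2023-01-15"),
]
merged = merge_duplicates(records)
for canon in merged.values():
    chain = [r.record_id for r in canon.provenance]
    print(f"keep {canon.stored_copy}, provenance chain {chain}")
```

Only one copy of the file is retained per image, but the metadata for every upload survives. That is the trade-off in miniature: the duplicated pixels go, while the record of who uploaded what, and when, stays available for a later rights audit.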
Private platforms face a parallel set of choices without a centralised mandate. The Personal Data Protection Commission's guidelines on data minimisation, last updated in March 2024, provide indirect pressure to avoid retaining redundant copies of images that include identifiable individuals — a category that covers almost every photograph of an occupied HDB flat interior.
The practical path forward for any organisation sitting on a large image library starts with an audit using open-standard perceptual hashing tools, followed by a legal review of licensing chains for images acquired before 2021. Agencies with archives predating Singapore's current copyright framework face the most complex remediation. The window to act before AI ingestion pipelines embed duplicate content permanently into model weights is narrowing, and the cost of cleaning up after that point will be considerably higher than the cost of acting now.
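As a rough picture of what such an audit might look like in practice, the sketch below groups assets whose perceptual hashes fall within a small Hamming distance of each other and separately flags anything acquired before a cutoff date for legal review. Everything here is an assumption made for illustration: the field names, the distance threshold of 5, the 1 January 2021 cutoff, and the sample data are all hypothetical, and the hashes are assumed to come from whichever open-standard tool an organisation chooses.

```python
from datetime import date
from typing import Dict, List, NamedTuple


class Asset(NamedTuple):
    asset_id: str
    phash: int        # 64-bit perceptual hash from the chosen tool
    acquired: date


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def audit(assets: List[Asset], max_distance: int = 5,
          review_before: date = date(2021, 1, 1)) -> Dict[str, list]:
    """Cluster near-duplicates and list assets needing a licence review."""
    parent = list(range(len(assets)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison: fine for a sketch, quadratic in library size.
    for i in range(len(assets)):
        for j in range(i + 1, len(assets)):
            if hamming(assets[i].phash, assets[j].phash) <= max_distance:
                parent[find(j)] = find(i)

    clusters: Dict[int, List[str]] = {}
    for i, asset in enumerate(assets):
        clusters.setdefault(find(i), []).append(asset.asset_id)

    return {
        "duplicate_groups": [ids for ids in clusters.values() if len(ids) > 1],
        "needs_legal_review": [a.asset_id for a in assets
                               if a.acquired < review_before],
    }


# Hypothetical sample library.
assets = [
    Asset("a1", 0b11110000, date(2019, 5, 1)),
    Asset("a2", 0b11110001, date(2022, 3, 1)),
    Asset("a3", 0xFFFF0000FFFF0000, date(2020, 1, 1)),
]
report = audit(assets)
print(report["duplicate_groups"])
print(report["needs_legal_review"])
```

The pairwise loop would not scale to an archive of millions of images; a production audit would need an index built for nearest-neighbour search over hashes. The two outputs are kept separate on purpose, since an image can need a licensing review whether or not it turns out to be a duplicate.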