Singapore's push to become a world-class digital government has a persistent, unglamorous problem hiding underneath it: thousands of duplicate images lodged inside public databases, citizen-facing portals and agency content management systems, inflating storage costs, slowing retrieval and, in some cases, serving the wrong photo to the wrong citizen at a critical moment.
The problem did not appear overnight. It is the accumulated consequence of more than a decade of rapid digitalisation — waves of platform migrations, agency mergers and emergency deployments, most recently the scramble to shift services online during the Covid-19 pandemic years of 2020 and 2021, when lead times collapsed and data hygiene protocols were deprioritised in favour of speed.
How the Backlog Built Up
Three structural factors drove the duplication. First, agencies historically operated in silos. The Housing and Development Board, which manages over one million flats across estates from Tampines to Buona Vista, maintained its own image libraries entirely separate from the Urban Redevelopment Authority's GIS-linked photo repositories. When cross-agency portals like Singpass and LifeSG were built to present a unified citizen interface, engineers pulled assets from both pools without systematic deduplication checks.
Second, the Government Technology Agency of Singapore — GovTech — oversaw multiple platform consolidations between 2018 and 2023, each of which involved migrating legacy content. Standard migration practice at the time did not include mandatory hash-based duplicate detection before ingestion, according to GovTech's published technical guidelines on the Singapore Government Developer Portal. Assets were copied forward, not rationalised.
Third, vendors. Singapore's Smart Nation initiatives brought in dozens of third-party contractors to build and maintain sub-systems. Each contractor delivered image assets named according to its own file-naming conventions, meaning the same photograph of, say, the Central Provident Fund Board's Bishan Service Centre could exist under four different filenames across four separate content delivery systems, and a basic filename search would flag none of those copies as identical.
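To see why filenames are a poor guide, consider a minimal sketch of content-based matching. It assumes the copies are byte-for-byte identical and uses a SHA-256 digest as the fingerprint. The filenames and placeholder bytes are hypothetical and do not come from any agency system.

```python
import hashlib
from collections import defaultdict


def group_by_content(files: dict[str, bytes]) -> list[list[str]]:
    """Return groups of filenames whose contents are byte-identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        # The digest depends only on the bytes, never on the filename.
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests that more than one filename shares.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical filenames; placeholder bytes stand in for real image files.
photo = b"placeholder bytes for one photograph"
assets = {
    "cpf_bishan_sc.jpg": photo,
    "IMG_0042.jpg": photo,
    "vendorA/bishan-front.jpg": photo,
    "svc-centre-01.jpg": photo,
    "unrelated.jpg": b"placeholder bytes for a different photograph",
}
duplicate_groups = group_by_content(assets)
```

Here the four differently named copies land in one group while the unrelated file is left out. A byte-level digest only catches exact copies; a resized or re-encoded version of the same photograph would produce a different digest.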
By the time GovTech commissioned an internal audit in late 2024, the scale was significant. While precise aggregate figures have not been released publicly, the agency acknowledged in its FY2025 Annual Report that storage rationalisation across whole-of-government systems was a named efficiency target for the financial year ending March 2026, with digital asset deduplication listed as a component workstream.
What Deduplication Actually Involves
Replacing or removing duplicate images is not simply a matter of hitting delete. Agencies must first run perceptual hashing — a technique that identifies visually identical or near-identical files regardless of filename or format — across terabytes of stored content. Images flagged as duplicates then need a lineage check: which version is canonical, which system references it, and whether removing a copy breaks a downstream link in a citizen portal or a printed form template.
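One simple form of perceptual hashing is the "average hash", sketched below. This is an illustration only, not a description of the method any agency uses. It assumes each image has already been reduced to a small grayscale grid (a real pipeline would do that with an imaging library), and the pixel values and the threshold of 2 bits are made up for the example.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Build a bit string: 1 where a pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_bits: int = 2) -> bool:
    # max_bits is an assumed threshold; real systems tune it on their own data.
    return hamming_distance(a, b) <= max_bits


# Hypothetical 4x4 grayscale grids standing in for downscaled images.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
# The same picture, slightly brightened, as might happen after re-encoding.
brightened = [[min(255, p + 12) for p in row] for row in original]
# A picture with a different layout of light and dark areas.
different = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]

h_original = average_hash(original)
h_brightened = average_hash(brightened)
h_different = average_hash(different)
```

Because the hash records only which regions are lighter or darker than average, the brightened copy matches the original exactly, while the differently composed image differs in half of its bits and is not flagged.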
The National Library Board, which manages digital repositories at its Lee Kong Chian Reference Library at Victoria Street and its network of community libraries, completed a smaller-scale version of this exercise for its digital heritage collections in 2023. That project, carried out over roughly eight months, involved perceptual hash scanning of approximately 400,000 digitised items and resulted in a reported storage reduction of around 18 percent — a figure the Board cited in its 2023-24 corporate report.
GovTech has since developed a shared deduplication microservice available to agencies through the Singapore Government Tech Stack, the standardised toolset that public agencies are required to adopt for new systems. Agencies building or rebuilding portals after January 2025 must run new image assets through this service at ingestion. The harder work — retroactively cleaning up what already exists — falls to each agency individually, with no unified public deadline set yet.
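The interface of GovTech's service has not been described publicly, so the following is only a hypothetical sketch of what an ingestion-time check could look like. The class name `AssetRegistry`, its `ingest` method and the file paths are invented for illustration, and the use of an exact SHA-256 digest is an assumption.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class AssetRegistry:
    """Hypothetical in-memory stand-in for an ingestion-time duplicate check."""

    _by_digest: dict[str, str] = field(default_factory=dict)

    def ingest(self, name: str, data: bytes) -> tuple[str, bool]:
        """Return (canonical_name, stored); stored is False for a duplicate."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_digest:
            # Content already registered: point the caller at the first copy.
            return self._by_digest[digest], False
        self._by_digest[digest] = name
        return name, True


registry = AssetRegistry()
# Hypothetical paths and placeholder bytes.
first = registry.ingest("hdb/resale-banner.png", b"banner bytes")
second = registry.ingest("lifesg/banner-copy.png", b"banner bytes")
```

In this sketch the second upload is not stored; the caller receives the name of the copy already on record and can link to that instead.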
For citizens, the practical implications are subtle but real. Duplicate assets lengthen page-load times on mobile connections, and mismatched images, a common symptom when two versions of an edited photo coexist, can surface incorrect information on pages like HDB's resale portal or the Ministry of Health's MediShield Life explainers. Agencies have been advised to start with their highest-traffic public pages before working through internal systems. The full rationalisation is expected to run well into 2027.