Singapore's public sector is sitting on millions of duplicate digital images, a problem years in the making that has quietly ballooned into a governance and cost headache for agencies stretching from the Housing and Development Board in Toa Payoh to the Integrated Land Information Service offices at HDB Hub. The scale of the duplication, estimated by technology consultants working in the sector to run into the tens of millions of individual file copies across government repositories, has finally forced a coordinated rethink of how the city-state manages its visual data assets.
The urgency is real. Singapore's Smart Nation and Digital Government Office, which sits under the Prime Minister's Office, has been pushing agencies since at least 2022 to consolidate data infrastructure as part of a broader effort to cut cloud hosting costs and improve retrieval speed. Duplicate image files — scanned identity documents, resale flat photographs, planning blueprints, heritage records — represent dead weight on both storage budgets and staff time. Every redundant copy costs money to back up, secure and audit.
A Long Road of Siloed Systems
The duplication problem did not appear overnight. Through the 1990s and 2000s, agencies digitised their paper records independently, each building its own file-naming conventions, storage hierarchies and backup protocols. The National Archives of Singapore, headquartered at Canning Rise, ingested millions of documents with one system. The Urban Redevelopment Authority scanned planning submissions with another. When the Government Technology Agency, known as GovTech, took shape in 2016 and began pushing agencies onto shared cloud infrastructure, the legacy mess came with them.
The jump to cloud hosting through GovTech's Government on Commercial Cloud programme, which began migrating services from 2018 onwards, exposed just how tangled the situation had become. Agencies uploading existing records to platforms like Amazon Web Services and Microsoft Azure found they were paying to store multiple identical files, sometimes four or five copies of the same scanned page, because different units within the same ministry had archived the same document at different points in time. Internal reviews flagged the problem, but cross-agency coordination remained slow.
By 2024, GovTech had started rolling out automated deduplication tooling for a handful of pilot agencies, with SingHealth — which manages patient imaging records across facilities including Singapore General Hospital on Outram Road and Tan Tock Seng Hospital in Novena — among the first to trial the approach at scale. Patient imaging generates enormous volumes of data. A single CT scan can produce hundreds of image slices, and without systematic deduplication, archiving systems routinely stored multiple copies of the same study across different departmental nodes.
What the Clean-Up Looks Like in Practice
Deduplication is not simply a matter of deleting obvious copies. Agencies must first run perceptual hashing algorithms, tools that assign each image a fingerprint based on its visual content rather than its filename or exact bytes, to identify near-duplicates that differ slightly because of compression or format conversion. That process requires computing resources upfront, legal sign-off on what constitutes a disposable duplicate under the National Archives Act, and staff retraining.
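For readers curious how such a fingerprint works, the sketch below implements a difference hash (dHash), one common perceptual hashing scheme. It is an illustration only: nothing here indicates which algorithm the pilot tooling actually uses, and the function names, the 64-bit hash size and the five-bit match threshold are assumptions made for the example. Images are represented as plain grids of grayscale values so the code runs without any imaging library.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale pixel values (0-255)


def resize_box(img: Gray, width: int, height: int) -> List[List[float]]:
    """Shrink an image by averaging the source pixels that fall in each output cell."""
    src_h, src_w = len(img), len(img[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max(y0 + 1, (y + 1) * src_h // height)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max(x0 + 1, (x + 1) * src_w // width)
            cells = [img[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def dhash(img: Gray, size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair of cells."""
    small = resize_box(img, size + 1, size)  # 9 columns give 8 comparisons per row
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 5) -> bool:
    """Hypothetical rule: treat images as near-duplicates if few hash bits differ."""
    return hamming(dhash(a), dhash(b)) <= threshold


# Synthetic 9x8 test image plus three variants of it.
original = [[(x * 37 + y * 11) % 200 for x in range(9)] for y in range(8)]
brighter = [[v + 10 for v in row] for row in original]      # uniform brightness shift
upscaled = [[original[y // 4][x // 4] for x in range(36)]   # 4x enlargement
            for y in range(32)]
inverted = [[255 - v for v in row] for row in original]     # visually different

print(is_near_duplicate(original, brighter))  # True
print(is_near_duplicate(original, upscaled))  # True
print(is_near_duplicate(original, inverted))  # False
```

Because the hash records only whether each cell is brighter than its neighbour, a copy that has been uniformly brightened or enlarged produces the same fingerprint, while a byte-level checksum would treat the same copy as a different file. In practice the threshold would need tuning against real records, since a looser setting risks flagging distinct images as disposable duplicates.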
The National Library Board, which operates branches including the Central Public Library at Victoria Street and manages digital heritage collections, has faced its own version of this challenge. Photograph collections donated by community groups often arrive with multiple prints of the same image. Digitising each print separately, without cross-referencing, has compounded the storage load.
GovTech has said publicly that its commercial cloud programme is central to Singapore's goal of improving government digital services, though the agency has not released a consolidated figure for how much storage cost reduction the deduplication push is expected to yield.
For agencies still mid-migration, the practical next step is joining GovTech's Central Data Exchange framework, which sets common metadata standards that make deduplication easier to automate. Agencies that have already completed the initial audit phase are expected to begin active file removal in the second half of 2026. The lesson from this episode is straightforward: shared infrastructure without shared data standards just moves the clutter to a more expensive location.