Singapore's public and private sectors are carrying what is estimated to be tens of millions of duplicate digital images across government portals, e-commerce marketplaces and news archives, a largely invisible data inefficiency that is now drawing attention from the Infocomm Media Development Authority as the city-state pushes to consolidate its AI and cloud infrastructure ahead of a 2027 national digital-readiness review.
The issue matters now for a precise reason. Singapore's three major hyperscale data centre zones — at Jurong, Tuas and the Loyang corridor in Pasir Ris — are running at elevated utilisation rates following the government's partial lift of the data centre moratorium in 2022. Every redundant gigabyte retained on commercial storage racks translates into measurable energy draw, a problem the National Environment Agency has flagged under its Green Plan 2030 targets, which include cutting data centre energy intensity by 10 percent by 2030 compared to a 2018 baseline.
What the Numbers Actually Show
Industry benchmarks from cloud storage audits conducted in comparable high-density digital markets — including Tokyo and London — consistently find that between 20 and 40 percent of stored image assets on large content platforms are exact or near-exact duplicates. Applied conservatively to Singapore's context, that figure is not trivial. The Singapore Tourism Board alone manages image libraries spanning destination photography, event archives and partner-submitted promotional material accumulated across more than a decade of campaigns. The National Heritage Board's digital collections, accessible via the National Library Board's BookSG and Roots.sg portals, similarly hold overlapping photographic records from digitisation drives that ran in parallel between 2018 and 2023.
On the commercial side, Lazada and Shopee — both with significant Singapore operational infrastructure — together host product listings in the hundreds of millions. Sellers routinely re-upload identical product photographs across multiple listing categories, creating layered duplication. A 2024 analysis published by a Southeast Asian cloud cost-optimisation firm estimated that regional e-commerce platforms could cut object storage costs by up to 18 percent through aggressive deduplication alone. At current AWS S3 standard storage rates of approximately USD 0.023 per gigabyte per month, the savings at scale run into six figures annually for a platform of Shopee's size in this market.
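The arithmetic behind that estimate can be sketched in a few lines. The per-gigabyte rate and the 18 percent reduction are the figures quoted above; the stored volumes are purely illustrative assumptions, not reported figures for any platform, and the calculation ignores tiered pricing, request charges and replication.

```python
# Rough storage-savings arithmetic. The volumes used below are hypothetical.
S3_STANDARD_USD_PER_GB_MONTH = 0.023  # approximate rate quoted in the article
DEDUP_REDUCTION = 0.18                # "up to 18 percent", treated as an upper bound


def annual_savings_usd(stored_tb: float,
                       reduction: float = DEDUP_REDUCTION,
                       rate: float = S3_STANDARD_USD_PER_GB_MONTH) -> float:
    """Yearly saving from removing `reduction` of `stored_tb` terabytes."""
    stored_gb = stored_tb * 1000  # decimal terabytes to gigabytes
    return stored_gb * reduction * rate * 12


def tb_needed_for_savings(target_usd: float,
                          reduction: float = DEDUP_REDUCTION,
                          rate: float = S3_STANDARD_USD_PER_GB_MONTH) -> float:
    """Stored volume (TB) at which yearly savings reach `target_usd`."""
    return target_usd / (1000 * reduction * rate * 12)


if __name__ == "__main__":
    for tb in (500, 2000, 5000):  # illustrative volumes only
        print(f"{tb:>5} TB stored: about USD {annual_savings_usd(tb):,.0f} saved per year")
    print(f"Break-even for USD 100,000: {tb_needed_for_savings(100_000):,.0f} TB")
```

Under these assumptions each terabyte of images yields just under USD 50 a year in savings, so the six-figure threshold is crossed at roughly 2,000 TB, or about 2 petabytes, held in standard storage.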
Tools, Timelines and What Comes Next
The technical fix exists and is well-understood. Perceptual hashing algorithms — tools that generate a compact numerical fingerprint of an image and flag near-identical copies even when filenames or metadata differ — have been commercially available since the early 2010s. What has lagged is institutional will and workflow integration. The IMDA's Digital Access for All initiative, updated in March 2025, now includes a sub-programme encouraging statutory boards to conduct annual digital asset audits, though compliance reporting under that sub-programme is not yet mandatory.
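To make the idea of a numerical fingerprint concrete, here is a minimal sketch of average hashing, one of the simplest members of the perceptual-hash family. It is an illustration only: the article does not name a specific algorithm, the input is assumed to be an already-decoded grayscale pixel grid, and the distance threshold of 5 bits is a hypothetical value that a real audit would need to tune.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def average_hash(pixels: Image, hash_size: int = 8) -> int:
    """Return a (hash_size * hash_size)-bit fingerprint of a grayscale image."""
    height, width = len(pixels), len(pixels[0])
    if height < hash_size or width < hash_size:
        raise ValueError("image is smaller than the hash grid")
    # Downsample: mean brightness of each cell in a hash_size x hash_size grid.
    cells = []
    for gy in range(hash_size):
        y0, y1 = gy * height // hash_size, (gy + 1) * height // hash_size
        for gx in range(hash_size):
            x0, x1 = gx * width // hash_size, (gx + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Image, b: Image, max_distance: int = 5) -> bool:
    """Flag two images as near-duplicates (threshold is an assumed value)."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


if __name__ == "__main__":
    original = [[x * 16 for x in range(16)] for _ in range(16)]  # left-to-right gradient
    brighter = [[min(255, v + 10) for v in row] for row in original]
    inverted = [[255 - v for v in row] for row in original]
    print(is_near_duplicate(original, brighter))
    print(is_near_duplicate(original, inverted))
```

Because the fingerprint depends on relative brightness rather than on exact bytes, a uniformly brightened copy hashes to the same value as the original, while a genuinely different image does not. Production systems generally decode real image files with an imaging library and often use more robust variants of the same idea.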
Several Singapore polytechnics, including Ngee Ann Polytechnic's School of InfoComm Technology in Clementi, have incorporated deduplication exercises into final-year data engineering modules, reflecting industry demand for graduates who understand storage hygiene alongside the more glamorous work of model training and inference. That pipeline will matter: the Smart Nation Group under the Prime Minister's Office has earmarked SGD 1 billion over five years for public-sector digital infrastructure upgrades, and storage rationalisation is among the cost-saving levers officials have identified in budget documentation tabled in Parliament in February 2026.
For businesses and content managers operating in Singapore, the practical steps are concrete. Neither Amazon S3 nor Google Cloud Storage offers a bucket-level switch that deduplicates objects automatically, so platforms running on AWS or Google Cloud need to handle deduplication in their own pipelines, typically by comparing content hashes or the checksums the storage service already records for each object. Organisations managing on-premises archives at facilities like the Alexandra Technopark data hub should schedule a perceptual-hash audit before the next government carbon reporting cycle closes in December 2026. The arithmetic is not complicated: fewer duplicate images mean lower storage expenditure, lower energy consumption and cleaner datasets for any AI model trained on that content. In a city that has staked significant credibility on being a lean, efficient digital hub, the case for acting on this is straightforward.
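As a starting point for such an audit, exact byte-for-byte duplicates can be grouped by content hash before any perceptual comparison is attempted. The sketch below is a minimal illustration using SHA-256 from Python's standard library; the function name and the in-memory list of files are assumptions made for the example, and a real pipeline would stream objects from storage rather than hold them in memory.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_exact_duplicates(files: Iterable[Tuple[str, bytes]]) -> List[List[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, data in files:
        digest = hashlib.sha256(data).hexdigest()  # identical bytes give identical digests
        groups[digest].append(name)
    # Keep only digests shared by more than one file.
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    sample = [  # hypothetical listing images
        ("listing-101/front.jpg", b"\xff\xd8 image bytes A"),
        ("listing-102/front.jpg", b"\xff\xd8 image bytes B"),
        ("listing-205/front.jpg", b"\xff\xd8 image bytes A"),
    ]
    for group in find_exact_duplicates(sample):
        print("duplicate set:", group)
```

Exact hashing catches only identical files. Copies that have been resized, recompressed or stripped of metadata need the perceptual-hash pass described earlier.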