Singapore's Infocomm Media Development Authority confirmed this week that it is expanding a data-quality initiative targeting duplicate and near-identical images embedded in government-linked digital archives, a move that has drawn attention from both public agencies and private AI developers operating out of one-north and the Central Business District.
The timing is deliberate. As Singapore accelerates its push to become a regional hub for AI model development — anchored by the National AI Strategy 2.0 framework launched in late 2023 — the integrity of training datasets has become a pressing operational concern. Duplicate images inflate dataset sizes, skew model outputs, and in some cases cause AI classifiers to develop systematic biases toward repeated visual content. That is a real problem when those models are being deployed in healthcare triage, public-housing maintenance inspections, or traffic-camera analytics.
What Moved This Week
On Tuesday, the Smart Nation Group released updated data-hygiene guidelines on the Singapore Government Developer Portal, specifying that agencies uploading images to shared government data lakes must run deduplication checks before submission. The guidelines name perceptual hashing, a technique that flags visually similar images even when the files differ in name, metadata, resolution or compression, as the preferred first-pass method. Agencies have until 31 October 2026 to align their existing image repositories with the new standards.
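For readers unfamiliar with the technique, here is a minimal sketch of one common perceptual hash, the difference hash (dHash). The guidelines as reported do not name a specific algorithm, so this is an illustration and not the method the guidelines prescribe. It assumes each image has already been decoded into rows of 0-255 grayscale values (a library such as Pillow would normally handle that step), and the function names and the distance threshold of 5 are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def shrink(pixels: Gray, width: int, height: int) -> Gray:
    """Downsample to width x height by nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels: Gray, size: int = 8) -> int:
    """Difference hash: one bit per pair of horizontally adjacent pixels."""
    small = shrink(pixels, size + 1, size)  # size+1 columns give size comparisons per row
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Assumed threshold: treat hashes within max_distance bits as the same picture."""
    return hamming(dhash(a), dhash(b)) <= max_distance
```

Because the hash records only whether each pixel is brighter than its neighbour, a uniformly brightened or resized copy tends to produce the same or a nearby hash, while an unrelated picture usually lands many bits away. A cryptographic file hash, by contrast, changes completely after any re-encoding.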
Separately, AI Singapore, the national programme office headquartered at NUS Enterprise at 21 Heng Mui Keng Terrace, held a closed-door working session on Wednesday with a dozen local startups and research labs. The session focused on open-source deduplication tools, including implementations compatible with datasets held on the government's CODEX cloud infrastructure. Participants included teams from the National University of Singapore's School of Computing and at least two firms based at Mapletree Business City in Pasir Panjang.
The Housing and Development Board has its own stake in this. HDB's Digital Services arm has been digitising decades of flat-inspection photographs, records covering more than one million residential units, as part of a broader estate-management modernisation effort. Duplicate images, often generated when field officers upload the same photograph multiple times through the HDB Mobile app, have reportedly created storage redundancies that complicate automated defect-detection workflows. HDB has not disclosed the scale of the problem publicly, but the guidelines released Tuesday specifically cite built-environment inspection workflows as a priority use case.
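Repeat uploads of the very same file are the easy case, since the bytes are identical and no perceptual comparison is needed. The sketch below groups uploads by a SHA-256 digest of their contents. It is an illustration only: nothing in the reporting indicates HDB uses this approach, and the upload identifiers and function name are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_exact_duplicates(
    uploads: Iterable[Tuple[str, bytes]]
) -> Dict[str, List[str]]:
    """Map each content digest to the upload IDs that share it.

    Only digests seen more than once are returned, so every value
    is a list of byte-for-byte identical uploads.
    """
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for upload_id, content in uploads:
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(upload_id)
    return {d: ids for d, ids in by_digest.items() if len(ids) > 1}
```

A check like this would miss a photograph that was resized or recompressed on the way in, which is where a perceptual hash earns its place as a second pass.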
Why It Matters Beyond the Technical
The cost dimension is tangible. Cloud storage in Singapore runs at roughly SGD 0.023 per gigabyte per month on major commercial platforms, and government data estates can scale into petabytes. Deduplication rates of 20 to 40 percent — a range commonly cited in enterprise storage literature — would translate to meaningful recurring savings at that scale, quite apart from the downstream gains in model quality.
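To put rough numbers on that, the sketch below applies the quoted storage price to a hypothetical one-petabyte estate. The estate size is an assumption chosen for illustration, as is the decimal definition of a petabyte (1,000,000 GB), and the calculation treats the deduplication rate as the share of stored bytes removed, ignoring volume discounts and storage tiering.

```python
PRICE_SGD_PER_GB_MONTH = 0.023  # figure quoted in the article
GB_PER_PB = 1_000_000           # assumption: decimal petabyte


def monthly_saving_sgd(petabytes: float, dedup_rate: float) -> float:
    """Monthly storage cost avoided if dedup_rate of the bytes is removed."""
    stored_gb = petabytes * GB_PER_PB
    return round(stored_gb * PRICE_SGD_PER_GB_MONTH * dedup_rate, 2)


if __name__ == "__main__":
    for rate in (0.20, 0.40):
        saving = monthly_saving_sgd(1, rate)
        print(f"{rate:.0%} dedup on 1 PB: SGD {saving:,.2f} per month")
```

Under those assumptions, one petabyte costs about SGD 23,000 a month to store, so removing 20 to 40 percent of it would save roughly SGD 4,600 to SGD 9,200 a month, or about SGD 55,000 to SGD 110,000 a year per petabyte.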
There is also a regional competitive angle. Singapore is actively pitching itself as a trustworthy AI jurisdiction to multinationals choosing where to anchor Southeast Asian operations. Data-quality governance, including image deduplication standards, feeds directly into that pitch. The EU's AI Act, which took fuller effect earlier this year, has pushed European clients to scrutinise training-data provenance, and Singapore wants to be the partner of choice for companies navigating those requirements.
For smaller players, the independent developers and research teams working out of spaces like Pixel @ Ayer Rajah or the Sandbox @ Jurong Innovation District, the practical upshot this week is a clearer set of expectations from the government. The guidelines are not yet mandatory for private-sector datasets, but agencies drafting AI procurement contracts are already referencing them as a baseline standard.
The 31 October deadline gives agencies roughly four months to audit and clean their image stores. Developers seeking technical guidance can access the updated specifications through the Singapore Government Developer Portal, while AI Singapore is expected to publish a summary of Wednesday's working session, including recommended open-source tooling, by the end of this month.