Singapore's push to overhaul its digital record-keeping reached a concrete milestone this week, as institutions managing large image libraries accelerated work to identify and consolidate duplicate photographs and scanned documents clogging their public-facing databases. The effort, which spans agencies from the National Library Board to the Urban Redevelopment Authority, reflects a broader government drive to sharpen data hygiene ahead of a planned expansion of AI-assisted public services in the second half of 2026.
The timing matters. Singapore is positioning itself as Southeast Asia's primary AI development hub, and the credibility of that claim rests partly on the quality of training data held in public repositories. Duplicate imagery — the same photograph indexed under multiple file names, or scanned twice from legacy print archives — degrades search results, inflates storage costs, and introduces errors into machine-learning pipelines that government agencies are building out. When the same image of, say, the old National Theatre site appears forty times under different metadata tags, it distorts both public history records and the datasets built on top of them.
What Happened This Week
At the National Library Board's Lee Kong Chian Reference Library on Victoria Street, archivists completed a first-pass audit of the PictureSG collection — a publicly searchable archive of historical photographs covering Singapore from the 1800s to the present day. The audit flagged a substantial volume of entries that either shared pixel-identical source files or were scans of the same physical print from different points in time. Staff are now working through a de-duplication protocol developed in-house, replacing redundant entries with a single canonical record while preserving all associated metadata and provenance notes.
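The in-house protocol itself has not been published, so the following is only a minimal sketch of how exact-duplicate consolidation of this kind could work. It assumes each entry carries its raw file bytes and a metadata dictionary; the field names (`id`, `data`, `metadata`, `merged_from`) and the sample entries are hypothetical. It groups entries by a SHA-256 digest of the file bytes, which catches byte-identical files only. Separate scans of the same physical print would need a perceptual comparison that this sketch does not attempt.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def deduplicate(entries):
    """Collapse entries with byte-identical files into one canonical record each.

    Each entry is a dict with hypothetical keys 'id', 'data' (bytes) and
    'metadata' (dict). The first entry seen in each group becomes canonical.
    """
    groups = defaultdict(list)
    for entry in entries:
        groups[content_digest(entry["data"])].append(entry)

    canonical_records = []
    for digest, group in groups.items():
        first = group[0]
        canonical_records.append({
            "id": first["id"],
            "sha256": digest,
            # Keep every source entry's metadata so no provenance is lost.
            "merged_from": [
                {"id": e["id"], "metadata": e["metadata"]} for e in group
            ],
        })
    return canonical_records


# Hypothetical sample entries: two share identical bytes, one is distinct.
entries = [
    {"id": "A1", "data": b"sample-image-1", "metadata": {"title": "National Theatre"}},
    {"id": "B7", "data": b"sample-image-1", "metadata": {"title": "Old National Theatre site"}},
    {"id": "C3", "data": b"sample-image-2", "metadata": {"title": "Emerald Hill terrace"}},
]
records = deduplicate(entries)
```

Here the two entries with identical bytes collapse into one canonical record that still lists both original titles, which is the property the archivists describe: redundancy removed, metadata and provenance retained.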
Separately, the Urban Redevelopment Authority updated its SPACE database — which holds planning and conservation images tied to properties across the island — to reflect new file-management standards introduced under the Smart Nation and Digital Government Office's data governance framework. The URA manages visual records for more than 7,000 conserved buildings, including shophouses in Tanjong Pagar and pre-war terrace rows in Emerald Hill. Redundant images in that system had accumulated partly because multiple departments submitted overlapping photo sets during development applications over the years.
The Infocomm Media Development Authority has for several years cited data quality as a prerequisite for responsible AI deployment, and the duplicate-image problem falls squarely within that agenda. Storage is a factor too: cloud infrastructure costs for Singapore's public sector have risen steadily, and even modest reductions in redundant file volume translate into measurable savings at scale. Industry benchmarks suggest duplicate files can account for between 10 and 30 percent of total image library size in large institutional archives, though figures for Singapore's own repositories have not been publicly released.
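To give a sense of scale, the benchmark range can be turned into a rough estimate of reclaimable storage. The sketch below is illustrative arithmetic only: the 100-terabyte library size is a hypothetical figure, not one released by any agency, and it assumes every duplicate file can be removed outright.

```python
def reclaimable_storage_tb(library_size_tb, duplicate_percent):
    """Estimate storage freed if all duplicate files are removed.

    library_size_tb is a hypothetical total; duplicate_percent is the share
    of the library taken up by duplicates, as a whole-number percentage.
    """
    return library_size_tb * duplicate_percent / 100


# Hypothetical 100 TB archive, using the 10 to 30 percent benchmark range.
low_estimate = reclaimable_storage_tb(100, 10)
high_estimate = reclaimable_storage_tb(100, 30)
print(f"Reclaimable: {low_estimate} to {high_estimate} TB")
```

Under those assumptions, a library of that size would free somewhere between a tenth and just under a third of its footprint.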
Why This Affects Ordinary Singaporeans
The practical consequences reach beyond government server rooms. Researchers at Singapore Management University and Nanyang Technological University regularly draw on PictureSG and related public collections for urban history and heritage studies. Duplicate records without clear canonical status create citation problems — a researcher cannot be certain which version of an image is authoritative when multiple entries exist with slightly different metadata. The de-duplication work aims to resolve that ambiguity by the end of the third quarter of 2026.
Members of the public who use the National Archives of Singapore's online portal, which operates from a building off Canning Rise in Fort Canning, should see cleaner search results as the project progresses. Searches for heritage images of Chinatown or Kampong Glam, for instance, have sometimes filled the first page with near-identical results, a minor frustration that archivists say the new protocol is designed to eliminate.
Institutions involved have not announced a single centralised completion date, but the Lee Kong Chian audit is expected to produce a revised public-facing PictureSG interface by September 2026. Anyone who has flagged duplicate entries through the National Library Board's public feedback form — accessible via the NLB website — is encouraged to check their submissions again once the updated records go live.