Public data in many countries, including the U.S., once seemed like a reliable source of information, but now that data is fragile and subject to political intervention and systemic neglect. For CIOs, the implications can be profound: without stable external datasets, internal information assets must evolve from being mere operational records into strategic differentiators, new revenue opportunities, and organizational lifelines. 

“We are rapidly running out of public data that is credible and usable. More and more enterprises will start to assign value to their data and go beyond partnerships to monetize it. For example, wind measurements captured by a wind turbine company could be helpful to many businesses that are not competitors,” said Olga Kupriyanova, principal consultant of AI and data engineering at ISG. 

While data manipulation is a timeless tale in politics, this year the U.S. government accelerated efforts to manipulate publicly accessible data. Even seemingly nonpolitical, innocuous data, such as climate and weather records, economic indicators, and scientific research, has been scrubbed or skewed toward one bias or another. This is a much bigger problem than some may realize. 

“We’re entering a defining moment in AI where access to reliable, scalable, and ethical data is quickly becoming the central bottleneck, and also the most valuable asset. As legal and regulatory pressure tightens access to public data, due to copyright lawsuits, privacy concerns, or manipulation of open data repositories, enterprises are being forced to rethink where their AI advantage will come from,” said Farshid Sabet, CEO and co-founder at Corvic AI, developer of a GenAI management platform.


Disappearing Public Data 

In early 2025, for example, the U.S. government removed thousands of datasets and web pages across agencies such as the EPA, NOAA, and CDC, according to The New York Times, effectively scrubbing key sources of climate, health, and environmental justice data from the public record. It was a serious and appalling move that continues to pose substantial risks for the private sector and individuals alike. Organizations depend on public data to function, and the public needs visibility into risks from climate disasters and communicable disease outbreaks, as well as economic indicators like unemployment and inflation rates.  

“Through our monthly Evidence Capacity Pulse Reports, we’ve documented specific operational impacts that have real-world implications for data users,” said Nick Hart, president & CEO of the Data Foundation, a non-profit organization based in Washington, D.C. that champions the use of open data and evidence-informed public policy. “For example, the National Weather Service reduced its workforce by over 500 employees, with 52 of 122 forecasting offices now having vacancy rates above 20%, leading to operational changes in weather forecasting that affects everything from agriculture to transportation planning.” 


Among the casualties was FEMA’s “Future Risk Index,” a sophisticated tool that mapped community-level exposure to floods, fires, extreme heat, and hurricanes. Its deletion not only undermined disaster planning but also erased a resource that insurers, city planners, and businesses depended on to understand climate risk. The tool was considered of such significance to public safety that The Guardian recreated it.  

The economic consequences of such data loss are already visible. U.S. public data underpinned nearly $750 billion of business activity as recently as 2022, according to Department of Commerce estimates. Losing that data blinds companies that build models for everything from supply chain forecasting to investment strategies and predictions. Removing or destabilizing these resources not only damages confidence in the government but also clouds economic outlooks, leaving enterprises and markets vulnerable, according to Reuters.   


These disruptions are not confined to the U.S. According to Reuters, officials in Europe have recognized the fragility of relying on American scientific datasets. Countries across the EU are accelerating efforts to build alternative systems for collecting and storing critical environmental and climate information. Activists, researchers, and civil servants have also launched “guerrilla archiving” projects to mirror and preserve data before it disappears.  

Global trust in shared information infrastructure is indisputably fractured. But trust in American scientists remains firm. “In March, more than a dozen European countries urged the EU Commission to move fast to recruit American scientists who lose their jobs to those cuts,” according to Reuters. The resulting brain drain further diminishes access to information in the U.S. 

Saving and Finding Public Data in Unexpected Places 

Meanwhile, private researchers and some nonprofit organizations sprang into action to monitor and preserve public data. Two examples are the aforementioned data rescue efforts via guerrilla archiving in the EU and the Future Risk Index, which was recreated by The Guardian after FEMA was mandated to destroy it. 

Another example is found in a group of researchers and students at the Harvard T.H. Chan School of Public Health who immediately began a data preservation marathon in an unholy race to scrape and download public data from websites faster than government agencies could take it down. The public data they managed to save was then distributed back to the public through repositories such as the Harvard Dataverse. Unfortunately, the changes to government websites happened faster than the researchers could react. Not all of the data was preserved.  

Fortunately, all is not lost. For example, federal open data continues to expand. “Data.gov includes over 317,000 datasets as of our July 31 report, up from about 308,000 data assets in January. This demonstrates that while there are capacity concerns in some areas, data access continues to grow in others. We also observed that at the Department of Education’s National Center for Education Statistics — a federal statistical agency — a decision to remove remote access for restricted use education data was reversed which allows researchers access to data through the end of 2025,” said Hart. 

Hart also said that the National Secure Data Service (NSDS) at NSF has continued issuing contracts to build an effective multilateral data-sharing capacity across agencies, rapidly scaling secure, responsible data linkage for research. The NSDS relies on existing data infrastructure from federal agencies, states, and other partners.  

“Recently the Department of Transportation published its Open Data Plan required by the OPEN Government Data Act signed by President Trump in 2019 and following guidance issued by former President Biden. Other agencies ranging from the Securities and Exchange Commission to NASA have already published plans too, with more expected in coming weeks,” Hart added. 

The Journalist’s Resource by the Harvard Kennedy School offers solid advice for journalists and others looking for clean public data or replacements for it. The following are tips for CIOs and other company leaders looking for data the government has manipulated or deleted:  

  1. To find the missing websites, go to the Wayback Machine and enter the website’s URL in the search bar; a programmatic approach is sketched just after this list. 

  2. Check with the CAFE Research Coordinating Center, which is working with dozens of researchers across the country to preserve health and climate data. Key programs include the CAFE Dataverse and CAFE GitHub. 
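
For teams that need to recover many pages at once, the Internet Archive also exposes a public availability API. The Python sketch below is a minimal illustration of querying it for the closest archived snapshot of a page; the example URL is a placeholder, and error handling is kept deliberately simple.

# Minimal sketch: look up the closest archived Wayback Machine snapshot of a URL
# using the Internet Archive's public availability API.
import requests

def closest_snapshot(url: str, timestamp: str = "20250101"):
    """Return the archived snapshot URL closest to `timestamp`, or None."""
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url, "timestamp": timestamp},
        timeout=30,
    )
    resp.raise_for_status()
    closest = resp.json().get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest and closest.get("available") else None

# Example (placeholder URL -- substitute the page you are trying to recover):
# print(closest_snapshot("https://www.fema.gov/"))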

 

The Muhlenberg College Trexler Library’s data rescue guide offers the following tips, reproduced verbatim: 

  • Data Rescue Efforts: an evolving list of crowd-sourced efforts to preserve and maintain accessibility to data. The website for the Data Rescue Project, which evolved from this data rescue initiative, is now available here, and the Data Rescue Tracker is available here.

  • End of Term Crawl: an Internet Archive cache of government web sites, crawled and collected in the months between a presidential election and a presidential inauguration.  

  • GovWayback: a simple method for accessing historical versions of U.S. government websites from before January 20, 2025.  Some resources, like interactive websites, web forms, and contents behind password authentication are likely not included in GovWayback caches. 

  • Harvard Library Innovation Lab: an effort from the Harvard Law School Library to provide access to major datasets from data.gov, PubMed, and federal GitHub repositories  

  • DataLumos: an Inter-university Consortium for Political and Social Research (ICPSR) archive for valuable government data resources. This international consortium of more than 760 academic institutions and research organizations maintains a data archive of more than 500,000 files of research in the social sciences, including 16 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.  

  • Restored CDC “is an independent project, not affiliated with CDC or any federal entity. Visit CDC.gov for free official information. Due to archival on January 6, 2025, recent outbreak data is unavailable. Videos are not restored. Access data.restoredcdc.org for restored data.”

New Data Monetization Opportunities  

Even with the many heroic efforts to rescue, retain, recover or recreate public data, not everyone believes that will be enough.

“Public data is challenging in many ways because the quality is often questionable and therefore, so is the value it drives. Even if data quality is not an issue, data scientists often look to public data for information that can supplement their own models but, in many instances, the data is essentially useless for this purpose. To help supplement models and fill gaps, enterprises are more likely to turn to partnerships for reliable external data,” said Kupriyanova. 

The doors are opening on new opportunities for CIOs to better leverage their data for internal use and external sales. 

“I foresee the normalization of fine‑grained licensing frameworks that embed cryptographic watermarks and usage telemetry in each dataset shard. Provenance chains recorded via distributed ledgers will become standard evidence for downstream audit, enabling enforceable royalty structures and faster dispute resolution,” said Nic Adams, co-founder and CEO at Orcus, provider of cybersecurity solutions.
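
What such watermarking and provenance infrastructure will look like in practice remains to be seen. The Python sketch below is only a hypothetical illustration of the hash-chaining idea, in which each shipped dataset shard gets a record committing to the shard’s content hash and to the previous record, so later tampering with a shard or its history is detectable; it is not drawn from Orcus or any particular ledger product.

# Hypothetical sketch: a hash-linked provenance chain for licensed dataset shards.
# Each record commits to the shard's content hash and to the previous record,
# so altering any shard or any earlier entry breaks verification.
import hashlib
import json
import time

def append_record(chain: list, shard_bytes: bytes, licensee: str) -> dict:
    prev_hash = chain[-1]["record_hash"] if chain else "0" * 64
    record = {
        "shard_hash": hashlib.sha256(shard_bytes).hexdigest(),
        "licensee": licensee,            # who received this shard
        "issued_at": time.time(),
        "prev_record_hash": prev_hash,   # links the record to the chain
    }
    record["record_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    chain.append(record)
    return record

def verify_chain(chain: list) -> bool:
    prev_hash = "0" * 64
    for record in chain:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if record["prev_record_hash"] != prev_hash or record["record_hash"] != expected:
            return False
        prev_hash = record["record_hash"]
    return True

In a production setting, such records would typically be anchored to a shared ledger or notarization service so that neither licensor nor licensee can unilaterally rewrite the history.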

That may be where things are headed, but there are ways to cash in right now too. 

“License internal data. For example, IoT telemetry, operating logs, or user analytics-enabled companies can bundle such streams as a subscription service or APIs. You can also create vertical data platforms or cooperatives. Smaller organizations can share revenue from external licensing, along with costs, through resource pooling. And you can offer synthetic data. It is possible for privacy‑safe synthetic data to meet outside demand without exposing sensitive data,” said Sandro Shubladze, CEO and founder at data extraction service Datamam.
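
Synthetic data can be generated in many ways, from simple statistical resampling to trained generative models. The sketch below is only a naive illustration of the idea, assuming a tabular pandas DataFrame: each column is resampled independently so that no original row is exposed as-is. A real offering would also need to preserve cross-column correlations and provide formal privacy guarantees (for example, differential privacy), which this sketch does not.

# Naive sketch: synthetic tabular rows sampled column-by-column from the
# real data's empirical distributions. Illustrative only -- it preserves
# per-column statistics, not correlations, and is not a formal privacy guarantee.
import numpy as np
import pandas as pd

def synthesize(real: pd.DataFrame, n_rows: int, seed: int = 0) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    synthetic = {}
    for col in real.columns:
        values = real[col].dropna()
        if pd.api.types.is_numeric_dtype(values):
            # Resample numeric columns and add small jitter so exact values leak less.
            sampled = rng.choice(values.to_numpy(), size=n_rows, replace=True)
            synthetic[col] = sampled + rng.normal(0.0, values.std(ddof=0) * 0.05, size=n_rows)
        else:
            # Sample categorical columns according to their observed frequencies.
            freqs = values.value_counts(normalize=True)
            synthetic[col] = rng.choice(freqs.index.to_numpy(), size=n_rows, p=freqs.to_numpy())
    return pd.DataFrame(synthetic)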

Creative minds are hard at work thinking of ways to cash in on this data shortage.  But there’s no time to waste if increased data monetization is your game.  

“The smart money has already started to flow. Those who monetize their data assets now will capture premium prices before the market gets saturated with other options,” said Fergal Glynn, AI security advocate and chief marketing officer at Mindgard, an automated AI red teaming and security testing company.  

“Companies that possess scientific data, climate records, economic databases, and government information are sitting on goldmines. Even specialized datasets, such as camera footage or regional climate measurements, can generate revenue through Data-as-a-Service models,” Glynn added. 




