What are the storage requirements for AI training and inference? | Computer Weekly

Despite ongoing speculation around an investment bubble that may be set to burst, artificial intelligence (AI) technology is here to stay. And while an over-inflated market may exist at the level of the suppliers, AI is well-developed and has a firm foothold among organisations of all sizes.

But AI workloads place specific demands on IT infrastructure and on storage in particular. Data volumes can start big and then balloon, in particular during training phases as data is vectorised and checkpoints are created. Meanwhile, data must be curated, gathered and managed throughout its lifecycle.

In this article, we look at the key characteristics of AI workloads, the particular demands of training and inference on storage I/O, throughput and capacity, whether to choose object or file storage, and the storage requirements of agentic AI.

What are the key characteristics of AI workloads?

AI workloads can be broadly categorised into two key stages – training and inference.

During training, processing focuses on what is effectively pattern recognition. Large volumes of data are examined by an algorithm – likely part of a deep learning framework like TensorFlow or PyTorch – that aims to recognise features within the data.

This could be visual elements in an image or particular words or patterns of words within documents. These features, which might fall under the broad categories of “a cat” or “litigation”, for example, are given values and stored in a vector database.

The assigned values provide further detail. So, for example, “a tortoiseshell cat” would comprise discrete values for “cat” and “tortoiseshell” that make up the whole concept and allow comparison and calculation between images.
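
To make that concrete, here is a minimal sketch of the idea, assuming toy four-dimensional embeddings invented purely for illustration (real models learn vectors with hundreds or thousands of dimensions). Comparing two items then reduces to comparing their vectors, for example with cosine similarity:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 means very similar, close to 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings; the dimensions and values are made up for illustration.
tortoiseshell_cat = np.array([0.9, 0.8, 0.1, 0.0])  # strong "cat" and "tortoiseshell" features
black_cat = np.array([0.9, 0.1, 0.1, 0.0])          # "cat" feature present, different coat
litigation_doc = np.array([0.0, 0.0, 0.9, 0.7])     # an unrelated concept entirely

print(cosine_similarity(tortoiseshell_cat, black_cat))       # high: both are cats
print(cosine_similarity(tortoiseshell_cat, litigation_doc))  # low: different concepts
```

A vector database stores millions of such vectors and indexes them so that this kind of similarity search stays fast at scale.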

Once the AI system is trained on its data, it can then be used for inference – literally, to infer a result from production data that can be put to use for the organisation.

So, for example, we may have an animal-tracking camera and want it to alert us when a tortoiseshell cat crosses our garden. To do that, it would infer whether or not a cat is present, and whether it is tortoiseshell, by reference to the dataset built during the training described above.

But while AI processing falls into these two broad categories, the split is not so clear cut in real life. Training will always be done on an initial dataset, but after that, while inference runs as an ongoing process, training often becomes perpetual too, as new data is ingested and new inference results flow from it.

So, to labour the example, our cat-garden-camera system may record new cats of unknown types and begin to categorise their features and add them to the model.

What are the key impacts on data storage of AI processing?

At the heart of AI hardware are specialised chips called graphics processing units (GPUs). These do the grunt processing work of training and are incredibly powerful, costly and often difficult to procure. For these reasons their utilisation rates are a major operational IT consideration – storage must be able to handle their I/O demands so they are optimally used.

Therefore, data storage that feeds GPUs during training must be fast, so it’s almost certainly going to be built with flash storage arrays.

Another key consideration is capacity. That’s because AI datasets can start big and get much bigger. As datasets undergo training, the conversion of raw information into vector data can see data volumes expand by up to 10 times.

Also, during training, checkpointing is carried out at regular intervals, often after every “epoch” or pass through the training data, or after changes are made to parameters.

Checkpoints are similar to snapshots, and allow training to be rolled back to a point in time if something goes wrong so that existing processing does not go to waste. Checkpointing can add significant data volume to storage requirements.
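
As a minimal sketch of what a checkpoint write and restore might look like in a PyTorch training loop (the function names and file path here are placeholders, not a prescription for any particular setup):

```python
import torch

def save_checkpoint(model, optimizer, epoch, loss, path="checkpoint.pt"):
    """Persist everything needed to resume training from this point."""
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "loss": loss,
    }, path)

def load_checkpoint(model, optimizer, path="checkpoint.pt"):
    """Restore model and optimizer state; returns the next epoch to run."""
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"] + 1
```

Each such write pushes the full model and optimiser state out to storage, so checkpoint frequency and model size translate directly into capacity and write-throughput requirements.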

So, sufficient storage capacity must be available, and will often need to scale rapidly.
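
A rough back-of-the-envelope sketch shows why; every figure below is an assumption chosen purely for illustration, not a benchmark:

```python
# Hypothetical capacity estimate for a single training project.
raw_dataset_tb = 10            # raw data gathered for training
vector_expansion_factor = 10   # raw-to-vector growth, per the "up to 10 times" figure above
checkpoint_size_tb = 0.5       # assumed size of one checkpoint (model + optimiser state)
epochs = 20                    # one checkpoint written per epoch

vectorised_tb = raw_dataset_tb * vector_expansion_factor
checkpoints_tb = checkpoint_size_tb * epochs
total_tb = raw_dataset_tb + vectorised_tb + checkpoints_tb

print(f"Raw data:    {raw_dataset_tb} TB")
print(f"Vectorised:  {vectorised_tb} TB")
print(f"Checkpoints: {checkpoints_tb} TB")
print(f"Total:       {total_tb} TB")  # 120 TB from a 10 TB starting point in this example
```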

What are the key impacts of AI processing on I/O and capacity in data storage?

The I/O demands of AI processing on storage are huge. It is often the case that the model data in use will simply not fit into a single GPU's memory, and so is parallelised across many of them.

Also, AI workloads and I/O differ significantly between training and inference. As we’ve seen, the massive parallel processing involved in training requires low latency and high throughput.

While low latency is a universal requirement during training, throughput demands may differ depending on the deep learning framework used. PyTorch, for example, stores model data as a large number of small files while TensorFlow uses a smaller number of large model files.

The model used can also impact capacity requirements. TensorFlow checkpointing tends towards larger file sizes, plus dependent data states and metadata, while PyTorch checkpointing can be more lightweight. TensorFlow deployments tend to have a larger storage footprint generally.

If the model is parallelised across numerous GPUs, this affects checkpoint writes and restores, which means storage I/O must be up to the job.

Does AI processing prefer file or object storage?

While AI infrastructure isn’t necessarily tied to one or other storage access method, object storage has a lot going for it.

Most enterprise data is unstructured and exists at scale, and it is often what AI has to work with. Object storage is supremely well suited to unstructured data because of its ability to scale. It also comes with rich metadata capabilities that can help data discovery and classification before AI processing begins in earnest.

File storage stores data in a tree-like hierarchy of files and folders. That can become unwieldy to access at scale. Object storage, by contrast, stores data in a “flat” structure, by unique identifier, with rich metadata. It can mimic file and folder-like structures by addition of metadata labels, which many will be familiar with in cloud-based systems such as Google Drive, Microsoft OneDrive and so on.
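
As an illustrative sketch, assuming the widely used boto3 S3 client and an S3-compatible object store (the bucket name, key and metadata fields are invented for this example), writing an object with a folder-like key and descriptive metadata might look like this:

```python
import boto3

s3 = boto3.client("s3")  # credentials and endpoint are taken from the environment

# The "folder" structure is only a naming convention inside the flat key;
# the metadata travels with the object and can help later data discovery.
with open("cat-0001.jpg", "rb") as image:
    s3.put_object(
        Bucket="training-data",          # hypothetical bucket
        Key="images/cats/cat-0001.jpg",  # flat key that mimics a folder path
        Body=image,
        Metadata={
            "species": "cat",
            "coat": "tortoiseshell",
            "source": "garden-camera",
        },
    )
```

The flat namespace is what lets object stores scale to billions of items without the directory-traversal overhead of a deep file hierarchy.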

Object storage can, however, be slow to access and lacks file-locking capability, though this is likely to be of less concern for AI workloads.

What impact will agentic AI have on storage infrastructure?

Agentic AI uses autonomous AI agents that carry out specific tasks without human oversight, making decisions within predetermined boundaries.

Examples would include the use of agents in IT security to scan for threats and take action without human involvement, to spot and initiate actions in a supply chain, or in a call centre to analyse customer sentiment, review order history and respond to customer needs.

Agentic AI is largely an inference-phase phenomenon, so compute infrastructure will not need to handle training-type workloads. Having said that, agentic AI agents will potentially access multiple data sources across on-premises systems and the cloud. That will span the full range of storage types in terms of performance.

But, to work at its best, agentic AI will need high-performance, enterprise-class storage that can handle a wide variety of data types with low latency and with the ability to scale rapidly. That’s not to say datasets in less performant storage cannot form part of the agentic infrastructure. But if you want your agents to work at their best you’ll need to provide the best storage you can.



Fermented fibers could tackle both world hunger and fashion waste


The leftover yeast from brewing beer or wine, or even from making some pharmaceuticals, can be repurposed to produce high-performance fibers stronger than natural fibers with significantly less environmental impact, according to a new study led by researchers at Penn State. Credit: Penn State

A fermentation byproduct might help to solve two major global challenges: world hunger and the environmental impact of fast fashion. The leftover yeast from brewing beer or wine, or even from making some pharmaceuticals, can be repurposed to produce high-performance fibers stronger than natural fibers with significantly less environmental impact, according to a new study led by researchers at Penn State and published in the Proceedings of the National Academy of Sciences.

The yeast biomass—composed of proteins, fatty molecules called lipids, and sugars—left over from alcohol production is typically regarded as waste, but lead author Melik Demirel, Pearce Professor of Engineering and Huck Chair in Biomimetic Materials at Penn State, said his team realized they could repurpose the material to make fibers using a previously developed process.

The researchers successfully achieved pilot-scale production of the fiber—producing more than 1,000 pounds—in a factory in Germany, with continuous and batch production for more than 100 hours per run of fiber spinning.

They also used data collected during this production run for a lifecycle assessment, which tracked the needs, impacts and costs of the product from sourcing the raw fermentation byproduct through its life to disposal, and evaluated the economic viability of the technology. The analysis predicted the cost, production output, greenhouse gas emissions and more at every stage.

Ultimately, the researchers found that the commercial-scale production of the fermentation-based fiber could compete with wool and other fibers at scale but with considerably fewer resources, including far less land—even when accounting for the land needed to grow the crops used in the fermentation processes that eventually produce the yeast biomass.

“Just as humans domesticated sheep for wool 11,000 years ago, we’re domesticating yeast for a fiber that could shift the agricultural lens to focus far more resources on food,” said Demirel, who is also affiliated with the Materials Research Institute and the Institute of Energy and the Environment, both at Penn State.

“We successfully demonstrated that this material can be made cheaply—for $6 or less per kilogram, which is about 2.2 pounds, compared to wool’s $10 to $12 per kilogram—with significantly less water and land but improved performance compared to any other natural or processed fibers, while also nearly eliminating greenhouse gas emissions. The saved resources could be applied elsewhere, like repurposing land to grow food crops.”

Waste not, want not

Demirel’s team has spent over a decade developing a process to produce a fiber from proteins. Inspired by nature, the fiber is durable and free of the chemicals other fibers can leave in the environment for years.

“We can pull the proteins as an aggregate—mimicking naturally occurring protein accumulations called amyloids—from the yeast, dissolve the resulting pulp in a solution, and push that through a device called a spinneret that uses tiny spigots to make continuous fibers,” Demirel said, explaining the fibers are then washed, dried and spun into yarn that can then be woven into fabric for clothes.

He also noted that the fibers are biodegradable, meaning they would break down after disposal, unlike the millions of tons of polyester clothing discarded every year that pollutes the planet.

“The key is the solution used to dissolve the pulp. This solvent is the same one used to produce Lyocell, the fiber derived from cellulose, or wood pulp. We can recover 99.6% of the solvent used to reuse it in future production cycles.”

The idea of using proteins to make fiber is not new, according to Demirel, who pointed to Lanital as an example. The material was developed in the 1930s from milk protein, but it fell out of fashion due to low strength with the advent of polyester.

“The issue has always been performance and cost,” Demirel said, noting the mid-20th century also saw the invention of fibers made from peanut proteins and from corn proteins before cheap and stronger polyester ultimately reigned.

Fermentation waste used to make natural fabric
Replacing conventional fabric fibers — like cotton — with the novel material could free up land, water and other resources to grow more food crops and reduce fast fashion waste, according to the project’s lead researcher Penn State Professor Melik Demirel. Credit: Penn State

Freeing land from fiber to produce food

Beyond producing a quality fiber, Demirel said, the study also indicated the fiber’s potential on a commercial scale. The researchers rolled their pilot-scale findings into simulated scenarios of commercial production. For comparison, about 55 billion pounds of cotton are produced globally every year, and just 2.2 pounds—about what it takes to make one T-shirt and one pair of jeans—requires up to 2,642 gallons of water. Raw cotton is relatively cheap, Demirel said, but the environmental cost is staggering.

“Cotton crops also use about 88 million acres of farmable land around the world—just under 40% of that is in India, which ranks as ‘serious’ on the Global Hunger Index,” Demirel said.

“Imagine if instead of growing cotton, that land, water, resources and energy could be used to produce crops that could feed people. It’s not quite as simple as that, but this analysis demonstrated that biomanufactured fibers require significantly less land, water and other resources to produce, so it’s feasible to picture how shifting from crop-based fibers could free up a significant amount of land for food production.”

In 2024, 733 million people—about one in 12—around the world faced food insecurity, a continued trend that has led the United Nations to declare a goal of Zero Hunger to eliminate this issue by 2030. One potential solution may be to free land currently used to grow fiber crops to produce more food crops, according to Demirel.

Current production methods not only use significant resources, he said, but more than 66% of clothing produced annually in the U.S. alone ends up in landfills. Demirel’s approach offers a solution for both problems, he said.

“By leveraging biomanufacturing, we can produce sustainable, high-performance fibers that do not compete with food crops for land, water or nutrients,” Demirel said. “Adopting biomanufacturing-based protein fibers would mark a significant advancement towards a future where fiber needs are fulfilled without compromising the planet’s capacity to nourish its growing population. We can make significant strides towards achieving the Zero Hunger goal, ensuring everyone can access nutritious food while promoting sustainable development goals.”

Future of fiber

Demirel said the team plans to further investigate the viability of fermentation-based fibers at a commercial scale.

The team includes Benjamin Allen, chief technology officer, and Balijit Ghotra, Tandem Repeat Technologies, Inc., the spin-off company founded by Demirel and Allen based on this fiber production approach. The work has a patent pending, and the Penn State Office of Technology Transfer licensed the technology to Tandem Repeat Technologies. Other co-authors include Birgit Kosan, Philipp Köhler, Marcus Krieg, Christoph Kindler and Michael Sturm, all with the Thüringisches Institut für Textil- und Kunststoff-Forschung (TITK) e. V. in Germany.

“In my lab at Penn State, we demonstrated we could physically make the fiber,” Demirel said. “In this pilot production at the factory, together with Tandem and TITK, we demonstrated we could make the fiber a contender in the global fiber market. Sonachic, an online brand formed by Tandem Repeat, makes this a reality. Next, we will bring it to mass market.”

More information:
Impact of biomanufacturing protein fibers on achieving sustainable development, Proceedings of the National Academy of Sciences (2025). DOI: 10.1073/pnas.2508931122

Citation:
Fermented fibers could tackle both world hunger and fashion waste (2025, November 3)
retrieved 3 November 2025
from https://techxplore.com/news/2025-10-fermented-fibers-tackle-world-hunger.html

Student trust in AI coding tools grows briefly, then levels off with experience


Screenshot of the code students worked on in the study. Credit: University of California – San Diego

How much do undergraduate computer science students trust chatbots powered by large language models like GitHub Copilot and ChatGPT? And how should computer science educators modify their teaching based on these levels of trust?

These were the questions that a group of U.S. computer scientists set out to answer in a study that will be presented at the Koli Calling conference Nov. 11 to 16 in Finland. Over the study’s few weeks, researchers found that trust in generative AI tools increased in the short run for a majority of students.

But in the long run, students said they realized they needed to be competent programmers without the help of AI tools. This is because these tools often generate incorrect code or fail to help students with code comprehension tasks.

The study was motivated by the dramatic change in the skills required from undergraduate students since the advent of generative AI tools that can create code from scratch. The work is published on the arXiv preprint server.

“Computer science and programming is changing immensely,” said Gerald Soosairaj, one of the paper’s senior authors and an associate teaching professor in the Department of Computer Science and Engineering at the University of California San Diego.

Today, students are tempted to overly rely on chatbots to generate code and, as a result, might not learn the basics of programming, researchers said. These tools also might generate code that is incorrect or vulnerable to cybersecurity attacks. Conversely, students who refuse to use chatbots miss out on the opportunity to program faster and be more productive.

But once they graduate, computer science students will most likely use generative AI tools in their day-to-day work, and they need to be able to do so effectively. This means they will still need a solid understanding of the fundamentals of computing and how programs work, so they can evaluate the AI-generated code they will be working with, researchers said.

“We found that student trust, on average, increased as they used GitHub Copilot throughout the study. But after completing the second part of the study–a more elaborate project–students felt that using Copilot to its full extent requires a competent programmer that can complete some tasks manually,” said Soosairaj.

The study surveyed 71 junior and senior computer science students, half of whom had never used GitHub Copilot. After an 80-minute class in which researchers explained how GitHub Copilot works and had students use the tool, half of the students said their trust in the tool had increased, while about 17% said it had decreased. Students then took part in a 10-day-long project in which they worked on a large open-source codebase, using GitHub Copilot throughout to add a small new piece of functionality to the codebase.

At the end of the project, about 39% of students said their trust in Copilot had increased. But about 37% said their trust in Copilot had decreased somewhat while about 24% said it had not changed.

The results of this study have important implications for how computer science educators should approach the introduction of AI assistants in introductory and advanced courses. Researchers make a series of recommendations for computer science educators in an undergraduate setting.

  • To help students calibrate their trust and expectations of AI assistants, computer science educators should provide opportunities for students to use AI programming assistants for tasks with a range of difficulty, including tasks within large codebases.
  • To help students determine how much they can trust AI assistants’ output, computer science educators should ensure that students can still comprehend, modify, debug, and test code in large codebases without AI assistants.
  • Computer science educators should ensure that students are aware of how AI assistants generate output via large language model processing, so that students understand the AI assistants’ expected behavior.
  • Computer science educators should explicitly inform and demonstrate key features of AI assistants that are useful for contributing to a large code base, such as adding files as context while using the ‘explain code’ feature and using keywords such as “/explain,” “/fix,” and “/docs” in GitHub Copilot.

“CS educators should be mindful that how we present and discuss AI assistants can impact how students perceive such assistants,” the researchers write.

Researchers plan to repeat their experiment and survey with a larger pool of 200 students this winter quarter.

More information:
Anshul Shah et al, Evolution of Programmers’ Trust in Generative AI Programming Assistants, arXiv (2025). DOI: 10.48550/arxiv.2509.13253

Conference: www.kolicalling.fi/

Journal information:
arXiv


Citation:
Student trust in AI coding tools grows briefly, then levels off with experience (2025, November 3)
retrieved 3 November 2025
from https://techxplore.com/news/2025-11-student-ai-coding-tools-briefly.html

Denmark inaugurates rare low-carbon hydrogen plant


Using eight electrolysers powered by solar and wind energy, the plant will produce around eight tonnes of hydrogen a day in its first phase.

Denmark inaugurated one of Europe’s few low-carbon hydrogen plants on Monday, a sector touted as a key to cleaner energy but plagued with challenges.

Using eight electrolyzers powered by solar and wind energy, the HySynergy project will produce around eight tonnes of hydrogen a day in its first phase, to be transported to a nearby refinery and to Germany.

Hydrogen has been touted as a potential energy game-changer that could decarbonize industry and heavy transport.

Unlike fossil fuels, which emit planet-warming carbon, hydrogen simply produces water when burned.

But producing so-called “green hydrogen” remains a challenge, and the sector is still struggling to take off in Europe, with a multitude of projects abandoned or delayed.

Originally scheduled to open in 2023, the HySynergy project, based in Fredericia in western Denmark, has suffered from delays.

According to the International Energy Agency (IEA), only four plants currently produce low-carbon hydrogen in Europe, none with a capacity greater than one megawatt.

HySynergy will initially produce 20 megawatts, but “our ambitions grow far beyond” that, said Jakob Korsgaard, founder and CEO of Everfuel, which owns 51% of the project.

“We have power connection, we have land, we have utilities starting to be ready for expansions right here, up to 350 megawatts,” he told AFP.

With the technology not yet fully mature, hydrogen often remains far too expensive compared to the gas and oil it aims to replace, mainly due to the cost of electricity required for its production.

Outside of China, which is leading the sector, the “slower-than-expected deployment” is limiting the potential cost reductions from larger-scale production, the IEA said in a recent report.

It added that “only a small share of all announced projects are expected to be operative by 2030.”

“The growth of green hydrogen depends on the political momentum,” Korsgaard said, urging European Union countries and politicians to push for ambitious implementation of the EU’s so-called RED III renewable energy directive.

The directive sets a goal of at least 42.5% renewable energy in the EU’s gross final energy consumption by 2030, and highlights fuels such as low-carbon hydrogen.

© 2025 AFP

Citation:
Denmark inaugurates rare low-carbon hydrogen plant (2025, November 3)
retrieved 3 November 2025
from https://techxplore.com/news/2025-11-denmark-inaugurates-rare-carbon-hydrogen.html
