Using generative AI to diversify virtual training grounds for robots



Chatbots like ChatGPT and Claude have experienced a meteoric rise in usage over the past three years because they can help you with a wide range of tasks. Whether you’re writing Shakespearean sonnets, debugging code, or need an answer to an obscure trivia question, artificial intelligence systems seem to have you covered. The source of this versatility? Billions, or even trillions, of textual data points across the internet.

Those data aren’t enough to teach a robot to be a helpful household or factory assistant, though. To understand how to handle, stack, and place various arrangements of objects across diverse environments, robots need demonstrations. You can think of robot training data as a collection of how-to videos that walk the systems through each motion of a task. Collecting these demonstrations on real robots is time-consuming and not perfectly repeatable, so engineers have created training data by generating simulations with AI (which often don’t reflect real-world physics) or by tediously handcrafting each digital environment from scratch.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Toyota Research Institute may have found a way to create the diverse, realistic training grounds robots need. Their “steerable scene generation” approach creates digital scenes of things like kitchens, living rooms, and restaurants that engineers can use to simulate lots of real-world interactions and scenarios. Trained on over 44 million 3D rooms filled with models of objects such as tables and plates, the tool places existing assets in new scenes, then refines each one into a physically accurate, lifelike environment.

Steerable scene generation creates these 3D worlds by “steering” a diffusion model — an AI system that generates a visual from random noise — toward a scene you’d find in everyday life. The researchers used this generative system to “in-paint” an environment, filling in particular elements throughout the scene. You can imagine a blank canvas suddenly turning into a kitchen scattered with 3D objects, which are gradually rearranged into a scene that imitates real-world physics. For example, the system ensures that a fork doesn’t pass through a bowl on a table — a common glitch in 3D graphics known as “clipping,” where models overlap or intersect.
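To make that in-painting step concrete, here is a minimal toy sketch, not the researchers’ actual model, of how a diffusion-style loop can fill in part of a scene while holding the rest fixed. The “denoiser” below cheats by pulling points toward a known target; in the real system, a trained diffusion model predicts that update from data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "scene": 2D tabletop positions for five objects. The first two are
# fixed context that must survive in-painting; the rest start as noise.
target = np.array([[0.2, 0.3], [0.8, 0.7], [0.5, 0.5], [0.3, 0.8], [0.7, 0.2]])
fixed = np.array([True, True, False, False, False])

def denoise_step(x):
    # Stand-in for a trained denoiser: nudge the noisy scene toward
    # plausible positions. A real diffusion model learns this from data.
    return x + 0.3 * (target - x)

x = rng.normal(0.5, 0.5, size=target.shape)  # begin from pure noise
for _ in range(20):
    x = denoise_step(x)
    x[fixed] = target[fixed]  # the in-painting trick: re-clamp fixed objects

print(np.round(x, 2))  # fixed rows untouched; free rows settled into place
```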

How exactly steerable scene generation guides its creation toward realism, however, depends on the strategy you choose. Its main strategy is “Monte Carlo tree search” (MCTS), where the model creates a series of alternative scenes, filling them out in different ways toward a particular objective (like making a scene more physically realistic, or including as many edible items as possible). The same search technique powered AlphaGo’s victories over human Go champions: the system considers potential sequences of moves before committing to the most advantageous one.
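For a feel of what framing scene generation as sequential decision-making looks like in code, here is a hedged, self-contained toy: MCTS placing objects one at a time on a one-dimensional shelf, rewarding scenes that fit the most objects without overlap. The paper’s actual search steers diffusion-model completions of full 3D scenes; everything below is invented for illustration:

```python
import math, random

SLOTS = (0.1, 0.25, 0.4, 0.55, 0.7, 0.85)  # candidate positions on a shelf
WIDTH = 0.2  # objects closer than this would "clip" through each other

def actions(state):
    # Placements that keep the scene physically valid (no overlaps)
    return [s for s in SLOTS if all(abs(s - p) >= WIDTH for p in state)]

def reward(state):
    return len(state)  # objective: as many objects as possible

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self):
        return self.value / self.visits + 1.4 * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def mcts(root_state=(), iters=300):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1. Select: descend through fully expanded nodes via UCB
        while node.children and len(node.children) == len(actions(node.state)):
            node = max(node.children, key=Node.ucb)
        # 2. Expand: try one placement not explored from this node yet
        tried = {child.state[-1] for child in node.children}
        untried = [a for a in actions(node.state) if a not in tried]
        if untried:
            node.children.append(Node(node.state + (random.choice(untried),), node))
            node = node.children[-1]
        # 3. Roll out: finish the scene with random placements
        state = node.state
        while actions(state):
            state += (random.choice(actions(state)),)
        # 4. Back up the score to the root
        score, walker = reward(state), node
        while walker:
            walker.visits += 1
            walker.value += score
            walker = walker.parent
    return max(root.children, key=lambda c: c.visits).state

print(mcts())  # best first placement; re-run from that state to keep building
```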

“We are the first to apply MCTS to scene generation by framing the scene generation task as a sequential decision-making process,” says MIT Department of Electrical Engineering and Computer Science (EECS) PhD student Nicholas Pfaff, who is a CSAIL researcher and a lead author on a paper presenting the work. “We keep building on top of partial scenes to produce better or more desired scenes over time. As a result, MCTS creates scenes that are more complex than what the diffusion model was trained on.”

In one particularly telling experiment, MCTS added the maximum number of objects to a simple restaurant scene. It featured as many as 34 items on a table, including massive stacks of dim sum dishes, after training on scenes with only 17 objects on average.

Steerable scene generation also allows you to generate diverse training scenarios via reinforcement learning — essentially, teaching a diffusion model to fulfill an objective by trial-and-error. After you train on the initial data, your system undergoes a second training stage, where you outline a reward (basically, a desired outcome with a score indicating how close you are to that goal). The model automatically learns to create scenes with higher scores, often producing scenarios that are quite different from those it was trained on.
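In spirit, that second stage is a policy-gradient loop. Here is a deliberately tiny REINFORCE-style sketch in which the “generator” is just a categorical distribution over scene sizes (a stand-in for the diffusion model) and the reward favors denser scenes; all specifics are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(10)  # stand-in generator: distribution over scene sizes 1..10

def reward(n_objects):
    return float(n_objects)  # made-up objective: denser scenes score higher

for _ in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()
    k = rng.choice(10, p=probs)              # sample a scene size index
    r = reward(k + 1)
    baseline = probs @ np.arange(1, 11)      # expected reward, reduces variance
    grad = -probs
    grad[k] += 1.0                           # gradient of log prob of the sample
    logits += 0.05 * (r - baseline) * grad   # REINFORCE update

print(np.argmax(logits) + 1)  # the tuned generator now prefers the largest scenes
```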

Users can also prompt the system directly by typing in specific visual descriptions (like “a kitchen with four apples and a bowl on the table”). Then, steerable scene generation can bring your requests to life with precision. For example, the tool accurately followed users’ prompts at rates of 98 percent when building scenes of pantry shelves, and 86 percent for messy breakfast tables. Both marks are at least a 10 percent improvement over comparable methods like “MiDiffusion” and “DiffuScene.”

The system can also complete specific scenes via prompting or light directions (like “come up with a different scene arrangement using the same objects”). You could ask it to place apples on several plates on a kitchen table, for instance, or put board games and books on a shelf. It’s essentially “filling in the blank” by slotting items in empty spaces, but preserving the rest of a scene.

According to the researchers, the strength of their project lies in its ability to create many scenes that roboticists can actually use. “A key insight from our findings is that it’s OK for the scenes we pre-trained on to not exactly resemble the scenes that we actually want,” says Pfaff. “Using our steering methods, we can move beyond that broad distribution and sample from a ‘better’ one. In other words, generating the diverse, realistic, and task-aligned scenes that we actually want to train our robots in.”

Such vast scenes became the testing grounds where the researchers could record a virtual robot interacting with different items. The machine carefully placed forks and knives into a cutlery holder, for instance, and rearranged bread onto plates in various 3D settings. Each simulation appeared fluid and realistic, resembling the adaptable, real-world robots that steerable scene generation could one day help train.

While the system could be an encouraging path forward in generating lots of diverse training data for robots, the researchers say their work is more of a proof of concept. In the future, they’d like to use generative AI to create entirely new objects and scenes, instead of using a fixed library of assets. They also plan to incorporate articulated objects that the robot could open or twist (like cabinets or jars filled with food) to make the scenes even more interactive.

To make their virtual environments even more realistic, Pfaff and his colleagues may incorporate real-world objects, drawing on a library of objects and scenes pulled from images on the internet and on their previous work, “Scalable Real2Sim.” By expanding how diverse and lifelike AI-constructed robot testing grounds can be, the team hopes to build a community of users that will create lots of data, which could then be used as a massive dataset to teach dexterous robots different skills.

“Today, creating realistic scenes for simulation can be quite a challenging endeavor; procedural generation can readily produce a large number of scenes, but they likely won’t be representative of the environments the robot would encounter in the real world. Manually creating bespoke scenes is both time-consuming and expensive,” says Jeremy Binagia, an applied scientist at Amazon Robotics who wasn’t involved in the paper. “Steerable scene generation offers a better approach: train a generative model on a large collection of pre-existing scenes and adapt it (using a strategy such as reinforcement learning) to specific downstream applications. Compared to previous works that leverage an off-the-shelf vision-language model or focus just on arranging objects in a 2D grid, this approach guarantees physical feasibility and considers full 3D translation and rotation, enabling the generation of much more interesting scenes.”

“Steerable scene generation with post training and inference-time search provides a novel and efficient framework for automating scene generation at scale,” says Toyota Research Institute roboticist Rick Cory SM ’08, PhD ’10, who also wasn’t involved in the paper. “Moreover, it can generate ‘never-before-seen’ scenes that are deemed important for downstream tasks. In the future, combining this framework with vast internet data could unlock an important milestone towards efficient training of robots for deployment in the real world.”

Pfaff wrote the paper with senior author Russ Tedrake, who is the Toyota Professor of Electrical Engineering and Computer Science, Aeronautics and Astronautics, and Mechanical Engineering at MIT, a senior vice president of large behavior models at the Toyota Research Institute, and a CSAIL principal investigator. Other authors were Toyota Research Institute robotics researcher Hongkai Dai SM ’12, PhD ’16; team lead and senior research scientist Sergey Zakharov; and Carnegie Mellon University PhD student Shun Iwase. Their work was supported, in part, by Amazon and the Toyota Research Institute. The researchers presented their work at the Conference on Robot Learning (CoRL) in September.



Rednote Draws a Line Between China and the World


Some Rednote users have reported that their accounts were automatically converted from the Chinese to the international version of the website recently. One American user, who asked to remain anonymous to avoid being punished by the platform, shared a screenshot with WIRED showing that when he logged into the platform in April, a banner appeared that read “Your account is a rednote account. We have automatically redirected you to rednote.com.”

The user says he registered his account with a Chinese phone number years ago, but suspects his account was converted because of using a non-Chinese IP address. “I have never posted from China. It’s always been in the United States. Obviously, in one glance, they can see this is an American posting in English,” he says.

Looming Split

After TikTok sidestepped a US shutdown by selling a majority stake in its American business, most of the “refugees” who had fled to Rednote went back to the video app or to other platforms. Those who stayed often did so because they value reading about and talking directly with Chinese people living in China. They now worry that a corporate split could destroy what had been one of the strongest bridges between the Chinese internet and the wider world.

Jerry Liu, a Vancouver-based TikTok influencer known for sharing funny content about Rednote itself, said in a November video that he was told by staff at the company’s Shanghai office that international users should expect to see less Chinese content and more North American content in the future. “I feel frustrated. I think it’s just gonna be less fun,” he said in the video.

Rednote had tried the TikTok localization playbook before. Roughly three years ago, it launched a slew of regionally focused apps with names like Uniik, Spark, Catalog, Takib, habU, and S’More, each catering to a specific country outside China, but they failed to catch on. The effort could have been a lesson for the company about the value of its massive Chinese content ecosystem to people in other countries, but as is often the case, regulatory and political considerations appear to have taken priority.

“I don’t want to see Americans talking about Coachella. I did that on Instagram, I didn’t join Xiaohongshu to see Instagram,” says the American user who was recently redirected to Rednote.

Security Concerns

As Rednote goes global, the company is no doubt looking to Chinese predecessors like WeChat and TikTok for ideas about how to navigate the minefield of content moderation and data privacy. So far, its approach appears to more closely resemble WeChat’s.

For over a decade, WeChat has sorted users based largely on one criterion: whether they used a Chinese or a foreign number to sign up. That has allowed users to cross Tencent’s digital border by unlinking and relinking their WeChat accounts to different mobile numbers.

Jeffrey Knockel, an assistant professor of computer science at Bowdoin College, found that Tencent censors content on WeChat and Weixin differently, even though the two platforms are integrated with one another and users can communicate across them. He says Chinese users are subject to a real-time keyword-matching filter to censor politically sensitive speech, but “if you registered for WeChat using a Canadian or an American phone number, your messages aren’t necessarily under that kind of censorship.”
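Purely as an illustration of the mechanism Knockel describes, region-keyed moderation can be as simple as branching on the country code an account registered with. The keyword list, names, and logic below are hypothetical, not Tencent’s code:

```python
# Hypothetical sketch only; not Tencent's implementation.
BLOCKED_KEYWORDS = {"placeholder_sensitive_term"}

def registered_in_china(signup_number: str) -> bool:
    return signup_number.startswith("+86")  # mainland China's calling code

def deliver(signup_number: str, text: str) -> bool:
    """Return True if the message goes through, False if filtered."""
    if registered_in_china(signup_number):
        # Real-time keyword matching applies to Chinese-registered accounts
        return not any(kw in text.lower() for kw in BLOCKED_KEYWORDS)
    return True  # foreign-registered accounts skip this filter

print(deliver("+8613800000000", "placeholder_sensitive_term"))  # False
print(deliver("+15550100000", "placeholder_sensitive_term"))    # True
```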

Knockel says WeChat’s blended content moderation approach may have made some people wary about using the app. “Users are generally distrustful of the platform. They don’t know if they’re being watched and censored,” he says. As Rednote moves in a similar direction, it will be worth watching whether international audiences end up having similar misgivings.


This is an edition of Zeyi Yang and Louise Matsakis’ Made in China newsletter. Read previous newsletters here.





Google Cloud Next: It’s time to create value, not slop, from the AI boom


If there was any doubt, AI mania was on full display at Google Cloud Next in Las Vegas this week, but history shows us that when humans start getting manic about things, it doesn’t always work out great.

Lately, I’ve seen a few commentators bringing up the horrible story of the radium girls to try to make this point. Have you ever heard of them? They were factory workers of the 1920s hired to paint watch faces with newfangled luminous paint containing deadly radium.

The camel hair paintbrushes the workers used lost their shape after a few brush strokes so they were encouraged to reshape the brushes by licking the tips. Many of the workers also used the paint as lipstick or nail polish, because why not?

This did not go well for anybody involved. Many radium girls experienced dental issues, lost teeth, and suffered oral lesions and ulcers. Others developed anaemia and necrosis of the jaw. Some experienced disruption to their menstrual cycles or were even rendered sterile.

At least 50 women died prematurely as a result.

This wasn’t the only misuse of radium. In a short-lived mania for the radioactive metal – first discovered by Marie and Pierre Curie in 1898 – humans also put it in toothpaste, hair cream, and a medicinal tonic drink called Radithor. Doctors even used it to try to treat cancer.

AI is manifestly not a radioactive element, but there are clear parallels between its widespread application and the reckless use of radium a century ago. And I believe there is a warning here for us, or a lesson if we care to hear it: we need to figure AI out before we do something really dumb.

Put your hands in the air, the use cases aren’t there

Just look at the application of AI to the ‘creation’ of art and music and other forms of self-expression. Here, take-up has become so pervasive that the well of human creativity, perhaps our most awesome trait, is rapidly being poisoned with utter slop.

As a case in point, ahead of the opening keynote at Google Cloud Next, 32,000 humans and a handful of AIs were treated to a Google Gemini-enhanced DJ set accompanied by AI-generated visuals created by the complex ‘art’ of waving your hands about in midair.

To be fair to the performers, the results were quite impressive and the audience was bopping along.

But it’s worth a sidenote that Italian DJ Robert Miles created his breakthrough 1995 track Children using nothing more than a Korg 01/W FD synthesiser, its 16’ Piano patch, and his own skill.

My point is that Children remains an iconic piece of genre-defining ‘90s dance music, but nobody in the Google audience will be able to hum today’s set in 30 years’ time.

Next, in a demonstration of the power of Google’s Gemini Agent Platform – officially unveiled at the show – Google Cloud’s Erica Chuong, manager for applied AI forward deployed engineering, designed a ground-up interior design campaign for a fictional furniture company that had found itself lumbered with dead stock that nobody wants.

Analysing current ‘modern organic’ interior design trends, the agent designed a campaign for Chuong in which relevant dead stock was repriced to undercut the competition, and it created a series of videos showing off its flair for interior design.

Unfortunately for the agent, the result was a banal and unimaginative sofa and coffee table combo dominated by dull neutral tones and devoid of personality. It would have looked okay in a Travelodge lobby.

In a world where interior design trends are being dictated by consumers asking their AI assistants about the latest interior design trends while interior designers ask their AI agents what interior design trends consumers are into, you may be wondering how any new information about interior design trends gets into this loop. If you find out how, please let someone at Computer Weekly know.

But at this point, the AI cat is not only out of the bag, it’s on top of your living room shelves knocking over your good wine glasses. Three quarters of Google Cloud customers already leverage Google’s AI, says CEO Thomas Kurian. “You have moved beyond the pilot, the experimental phase is behind us and now the real challenge begins,” he told the audience.

Moving AI into production of course needs a unified stack, and happily for Google Cloud, right on cue, here comes a Google-branded one. As Google iterates its tensor processing units (TPUs) at an ever-increasing pace, the stack also arrives with a whole new chipset generation: TPU 8i to support inference and TPU 8t to run training.

Lest his existence be forgot during the love-in, Kurian’s boss, Google and Alphabet CEO Sundar Pichai, appeared on a big screen to tell everyone how glad he was that they were in Las Vegas even though he hadn’t made the trip himself, and revealed just how much money – almost two hundred billion dollars – Google will spend on capex investment in innovation this year, a good portion of it to the cloud unit and much of it supporting AI.

“We are more on the front foot than ever before,” remarked Pichai. “We are moving in a bold and responsible way.”

So if that’s true, where are the bold and responsible use cases? Do they even exist? Or are they just the usual conference waffle? I went looking.

Resident agents

Resident Evil developer Capcom says it is using Google Cloud to enhance its videogame development processes, not by taking over the creativity but by enabling creatives to be creative.

A big challenge for videogame developers is playtesting their products prior to release, and as their properties grow in scale – many now encompass vast digital worlds with unthinkable numbers of permutations – the strain on developers has ramped up, big time, leading to a phenomenon known as defensive development.

Defensive development is a situation where the cost of making technical changes to an in-progress project gets so high that the human engineers feel pressurised to prioritise maintenance over innovation. In gaming this often occurs late in the production cycle, leading to problems with titles being released that seem, well, unfinished in some way.

It’s not an issue that’s unique to companies like Capcom, though. Take the manufacturing sector, where facility managers might see similar challenges when trying to simulate how a hardware update will work within their current procedures, or in retail, where logistics experts must navigate dynamic data reserves when trying to optimise supply chains without disrupting their current inventory systems.

Working with Google Cloud, Capcom has now launched an in-house agentic platform that not only relieves some of this burden but also serves as a blueprint for where AI might be used better in the creative sector, and others. 

It describes its approach as a multimodal workbench, and at its core, it comprises a small group of distinct agents that optimise the playtesting process using vision and reasoning to understand the intent of a system.

The first of these, the visual inspection agent, uses Gemini Vision to look at the screen through near-human eyes, working out what is an intentional design choice and what is a technical failure.

The second, the predictive agent, pores over historical data to work out where a system might break next and directs a mini army of test bots to ‘swarm’ high-risk areas, rather than testing randomly.

The third, the institutional knowledge agent, enables new team members, human ones, to learn how their colleagues or predecessors worked similar problems before, preserving decades of expertise – three of them in the case of the Resident Evil franchise.

The fourth, the data inefficiency agent, spots inefficiencies within datasets to optimise overall game performance. Developers can query it to help summarise complex technical logs and make more advanced data more widely available to their teams.

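Capcom’s platform and its Gemini integration are proprietary, so the following is only a hypothetical sketch of the dispatcher pattern that this four-agent division of labor suggests; every class, task kind, and string here is invented:

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str     # e.g. "screenshot", "risk_forecast", "history", "log_summary"
    payload: str

class VisualInspectionAgent:
    def handle(self, task: Task) -> str:
        return f"judged '{task.payload}': design choice or technical failure?"

class PredictiveAgent:
    def handle(self, task: Task) -> str:
        return f"sent test bots to swarm high-risk area: {task.payload}"

class InstitutionalKnowledgeAgent:
    def handle(self, task: Task) -> str:
        return f"found how past engineers solved: {task.payload}"

class DataInefficiencyAgent:
    def handle(self, task: Task) -> str:
        return f"summarized technical log: {task.payload}"

ROUTES = {
    "screenshot": VisualInspectionAgent(),
    "risk_forecast": PredictiveAgent(),
    "history": InstitutionalKnowledgeAgent(),
    "log_summary": DataInefficiencyAgent(),
}

def dispatch(task: Task) -> str:
    # Route each task to the specialist agent responsible for its kind
    return ROUTES[task.kind].handle(task)

print(dispatch(Task("screenshot", "flickering texture in the save room")))
```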

Collectively, Capcom’s agents are now running for 30,000 human hours every month and the firm’s developers say they now feel empowered to focus on higher value creative tasks, while Google Cloud, for its part, says that many of the tasks the agents are performing have applications in many other industries.

Citi Sky lines up

Elsewhere, Citi Wealth, the wealth management arm of Citibank parent Citigroup, unveiled an AI team member called Citi Sky, which it says will help reshape how its clients access market insights, act on potential opportunities, and work with their human financial advisors. Bilingual in English and Spanish, in time it will be integrated into Citi Wealth’s platforms – although in the US only for now.

Citi head of wealth, Andy Sieg, said that for decades, managing your financial life has meant navigating calls, meetings, and more recently apps. With the new agentic service, you simply ask and then act. It’s a shift from interface to intelligence and transactions to outcomes, he says, with a universal question at the centre: am I financially okay?

Citi Sky will answer this question in real time, marrying insight and execution simply and clearly – not replacing human advisors, but extending their reach and deepening their impact. In fact, Citi Wealth plans to hire advisors in the years ahead.

For Citi Wealth as a business, Sieg says the goal is to unlock massive scale and apply basically unlimited cognitive resources to its clients. “And the real need that we’ve met … is creating a relationship that can evoke the same kind of trust, we believe, that clients have with their human financial advisors,” he says.

Citi Wealth invoked Google’s full AI stack to build Sky, from Google Cloud infrastructure to Google DeepMind and, of course, Gemini models running on Gemini Enterprise Agent Platform. It worked closely with both teams to incorporate DeepMind’s real-time avatar technology and Gemini’s live application programming interface (API) to solve challenges around providing low-latency audio and video conversations.

A plea for rational thinking

I must acknowledge that Google Cloud’s customer stories are carefully curated by its communications teams – not every customer wants to talk, some will be forbidden from doing so, even more are still shivering on the edge of the pool with their inflatable armbands on, too scared to jump in.

And to be blunt, some customers will be at the deep end doing really stupid things with AI that will blow up in their faces.

But in the examples of Capcom and Citi Wealth – and others that would have pushed the word count unreasonably high – I think there is some hope.

With forethought – not even very much of it – and a rational head, we can turn AI loose on both the small challenges we face in our daily lives, and the grand challenges we face collectively.

But to do this we need to resist the advances of the snake oil salesmen, the charmers and grifters, and especially the tech bros who want to disrupt something that doesn’t need disrupting, like the habit of art, for the sake of making themselves richer.

And I fear we may be running out of time to do so.



This Is the Only Office Lamp That Does Double Duty on My Nightstand


The base of the lamp has two slider buttons. One toggle adjusts the warmth, from cold white light all the way to red. The other adjusts the intensity, from ultra-bright down to a glareless glow. Hard taps on each button skip ahead, while holding a toggle down on one side or the other adjusts the light settings quite slowly, slowly enough that at first I sometimes questioned whether anything was happening.

The maximum brightness is 1,000 lumens, roughly the light output of a 75-watt incandescent bulb. At this brightness, the battery lasts about five hours. At a lower intensity, that can stretch to as long as a dozen hours.
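Those two figures are consistent with a crude model in which runtime scales inversely with light output. Real LED drivers add fixed overhead, so treat this as a sanity check rather than a spec:

```python
# Back-of-envelope check of the quoted battery figures, assuming runtime
# is inversely proportional to brightness (ignores driver overhead).
HOURS_AT_MAX = 5      # quoted: about five hours at full brightness
MAX_LUMENS = 1000

def runtime_hours(lumens: float) -> float:
    return HOURS_AT_MAX * MAX_LUMENS / lumens

print(runtime_hours(1000))  # 5.0 hours at full blast
print(runtime_hours(400))   # 12.5 hours, near the quoted "dozen hours"
```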

Red Shift


There’s an added feature I have come to appreciate at night, which is the red-light mode. There’s little evidence that blue light from your little smartphone is keeping you awake at night. But numerous studies do show that blue light wavelengths can affect melatonin levels and thus your body’s circadian rhythm, while red light doesn’t do this.

Red light therapy is, of course, the province of TikTok as much as science—a field where wild exaggerations live alongside legitimate uses and benefits. For every sleep study showing that red light is superior to blue light when it comes to melatonin levels, there’s another showing that red light is associated with “negative emotions” before bed.

So I can only offer my own experience, which is that the Edge Light Go’s red reading light offers me a pleasant liminal space between awake time and sleepy time, one not offered by a basic nightstand lamp. It allows me to sort of bask in a darkroom-like space that still lets me see and read, and to drift off a little easier.

If I fall asleep, the light has an automatic 25-minute shut-off, which means I won’t do what I far too often do, which is drift off while reading and then wake up, alarmed, to a room filled with bright light in the middle of the night.

Caveats and Quirks


This said, for all the virtues of portability, the Edge Light Go does not boast a base that’s heavy enough to stop the lamp from tipping over if I bend it forward from its lowest hinge. This can be an annoyance when trying to use the lamp as a reading light from a bedside table or the arm of a couch.


