Why Are AI-Generated Hands So Messed Up?

But why do these programs mess up hands (not to mention bare feet) so badly? It’s a question that many people have asked.

To find out, I emailed Midjourney; Stability AI, which makes Stable Diffusion; and OpenAI, which created DALL-E 2. Only Stability AI responded to my questions.

“It’s generally understood that within AI datasets, human images display hands less visibly than they do faces,” a Stability AI spokesperson told BuzzFeed News. “Hands also tend to be much smaller in the source images, as they are relatively rarely visible in large form.”

To understand more, I got in touch with Amelia Winger-Bearskin, an artist and an associate professor of AI and the arts at the University of Florida, who has been analyzing the aesthetics of AI art on her blog. “I am obsessed with this question!” Winger-Bearskin exclaimed on our video call. 

Generative artificial intelligence that’s trained on billions of images scraped from the internet, Winger-Bearskin explained, does not really understand what a “hand” is, at least not in the way it connects anatomically to a human body. 

“It’s just looking at how hands are represented” in the images that it has been trained on, she said. “Hands, in images, are quite nuanced,” she added. “They’re usually holding on to something. Or sometimes, they’re holding on to another person.”

In the photographs, paintings, and screenshots that AI learns from, hands may be holding onto drapery or clutching a microphone. They may be waving, or angled toward the camera so that only a few fingers are visible. Or they may be balled into fists that show no fingers at all.

“In images, hands are rarely like this,” Winger-Bearskin said, holding up her hands with fingers spread apart. “If they were like this in all images, the AI would be able to reproduce them perfectly.” AI, she said, needs to understand what it is to have a human body, how exactly hands are connected to it, and what their constraints are. 
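If you want to see this failure mode for yourself, the sketch below is one way to do it, assuming the open-source Hugging Face diffusers library and one public Stable Diffusion checkpoint (my choices for illustration; the article does not specify any tooling). It prompts the model for exactly the kind of spread-fingered pose Winger-Bearskin held up on the call.

```python
# Minimal sketch: prompting a public Stable Diffusion checkpoint for a picture
# of hands with the Hugging Face diffusers library. The model ID and prompt are
# illustrative assumptions, not anything the article specifies.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # one widely available public checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # move the pipeline to a GPU if one is available

# Prompts that ask for fully visible, spread fingers tend to expose the failure
# mode described above: the model has mostly seen hands partially occluded, so
# it struggles when asked to render all five fingers at once.
image = pipe("a photo of two open human hands, palms forward, fingers spread").images[0]
image.save("hands.png")
```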

Hands have a fundamental place in the art world (imprints of hands on cave walls are the earliest known art made by Homo sapiens) and are considered some of the most difficult subjects to draw or paint. In paintings from ancient Greece and medieval Europe, representations of human hands were still flat and lacked intricacy.

It was only in the era of Renaissance art, between the 14th and the 16th centuries in Europe — when artists like Leonardo da Vinci started studying and sketching hands, including their structural elements like bones and ligaments — that human hands began to be represented in all their complexity. (This era also gave us one of the most recognizable frescos involving two hands — Michelangelo’s The Creation of Adam, which depicts God as a bearded man stretching out his right arm to touch Adam’s outstretched left.)

“Da Vinci was actually quite obsessed with hands and did many, many studies of hands,” Winger-Bearskin said. Meanwhile, when AI is trained on an image, “it’s just looking at that and saying, ‘Oh, in this case, there’s only half of a thumb,’ because the rest of it is hidden under fabric or grabbing on to something, and so when it reproduces it, it’s somewhat deformed.”

One day, though, generative AI will get significantly better at rendering pictures of hands and feet and teeth. “It has to,” Winger-Bearskin said. “For AI to become a useful tool for humanity, it has to understand what it is to be human, and the anatomical reality of being human.”
