Graphic design plays a fundamental role in shaping how clients and consumers perceive your business, but not every founder has the budget or the time to hire a professional designer for every project. Those limits could soon become a thing of the past thanks to text-to-image generation, a new class of machine-learning model that can create original images from simple text prompts.

OpenAI, a self-described research and deployment company, is pioneering the technology with its program Dall-E 2, released in April to a closed beta audience. The program is trained on huge numbers of images with corresponding descriptions in order to learn how to visually identify objects (think "cat") and the relationships between objects (think "cat driving a car"). When you enter a prompt, it draws on this training to create its best approximation of your request. The model can even identify and replicate different artists' styles (think "cat driving a car in the style of Jack Kirby").
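For readers who want to try the prompt-to-image loop themselves, the sketch below shows roughly what a request to a text-to-image service looks like. The helper function is hypothetical, and the commented-out client call assumes OpenAI's `openai` Python package and an API key; exact parameters and model availability may differ from the beta version described here.

```python
# Minimal sketch of assembling a text-to-image request. The helper is
# hypothetical; the commented call below it assumes the `openai` package.

def build_generation_request(prompt: str, n: int = 1, size: str = "1024x1024") -> dict:
    """Collect the parameters a generation request typically needs:
    the text prompt, how many images to return, and the output size."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"prompt": prompt, "n": n, "size": size}

# Sending the request (illustrative; requires an OPENAI_API_KEY):
#
#   from openai import OpenAI
#   client = OpenAI()
#   result = client.images.generate(**build_generation_request(
#       "cat driving a car in the style of Jack Kirby"))
#   print(result.data[0].url)
```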

Text-to-image technology went viral back in June, after Craiyon, a less advanced, third-party model inspired by OpenAI's work (formerly called Dall-E Mini), exploded on social media, with thousands of people posting their creations online. Images such as a chicken nugget smoking a cigarette in the rain, or Darth Vader competing on the cooking show Chopped, were widely shared as people fed the model their most ridiculous prompts to find the limits of the technology.


The value of text-to-image as a neat toy is immediately apparent, but what about the potential business applications? An OpenAI spokesperson told Inc. that the researchers behind Dall-E are still discovering how people want to use it, but that they see the program as being "a useful creative tool for artists, architects, product designers, and magazine cover designers." 

Another potential use for the technology offered by OpenAI is in video games and interactive experiences, like the metaverse. According to the company's spokesperson, text-to-image tech could be used by game designers and developers as a tool to "inspire designs for AR avatars or experiences."

The purpose of text-to-image tech isn't to replace artists and graphic designers, according to OpenAI, but rather to assist them in their jobs while giving anyone with an imagination the ability to create original images. In a blog post published in June 2022, Google software engineer Yonghui Wu and research scientist David Fleet wrote that Google's text-to-image models, known as Imagen and Parti, will "bring user experiences based on these models to the world in a safe, responsible way that will inspire creativity."

To assist artists, Dall-E 2 has a function called Inpainting, which allows users to highlight part of an image they'd like to change. An interior designer could use the tool to remove a throw pillow from a picture of a living room by simply highlighting the pillow and typing in "plain couch." 
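The interior-designer workflow above maps naturally onto an image-edit API call: the original picture, a mask marking the highlighted region, and a prompt describing the replacement. The sketch below is a hypothetical helper around that idea; the commented call assumes OpenAI's image-edit endpoint, which at the time of writing expects PNG files, and the file names are made up for illustration.

```python
# Hedged sketch of an Inpainting-style edit request. The helper is
# hypothetical; the PNG requirement mirrors OpenAI's image-edit endpoint.

def build_inpainting_request(image_path: str, mask_path: str, prompt: str) -> dict:
    """Collect the three inputs an inpainting edit needs: the original
    image, a mask whose marked region is redrawn, and a prompt
    describing what should appear there (e.g. "plain couch")."""
    for path in (image_path, mask_path):
        if not path.lower().endswith(".png"):
            raise ValueError("image and mask should be PNG files")
    return {"image": image_path, "mask": mask_path, "prompt": prompt}

# Sending the request (illustrative; requires the `openai` package):
#
#   from openai import OpenAI
#   client = OpenAI()
#   req = build_inpainting_request("living_room.png", "pillow_mask.png",
#                                  "plain couch")
#   result = client.images.edit(image=open(req["image"], "rb"),
#                               mask=open(req["mask"], "rb"),
#                               prompt=req["prompt"])
```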

Another possibility for monetizing the tech is creating NFTs, though OpenAI says that it is taking time to understand the capabilities and limits of its models in creating digital tokens before taking any official steps in that direction. A key question: Who actually owns an NFT created from a text prompt? OpenAI currently owns all images produced using the program, but the company says it will revisit that decision after the program's official launch.

One of the main risks of artificially generated images is that they can easily be used to spread disinformation or to create deepfakes, so providing ways to easily verify whether an image is real or artificial will be critical to the technology's success. For now, each image generated by Dall-E 2 displays a small series of colored boxes in the lower right-hand corner, a kind of signature, according to OpenAI.

The company is quick to point out that text-to-image technology isn't perfect yet, and that's by design. Dall-E 2 has barriers in place to prevent photorealistic depictions of real people's faces, and the program has very little ability to depict violent or hateful imagery because researchers removed such explicit content from its training data.

For budding entrepreneurs with big imaginations and little artistic ability, though, the tech could serve as both a source of inspiration and a practical solution for an image-obsessed world.